Master’s thesis on finding missing links in data using machine learning at Danmarks Statistik

Are you curious about machine learning (ML)? Do you want to take part in developing and improving the education data that provides the basis for novel research and data-driven policy-making?

 Statistics Denmark is seeking one student with a passion for ML to explore the usefulness of ML methods in finding missing links between data in a relational database.

Matched Educational Data

Statistics Denmark is developing a new internationally ground-breaking database, Matched Educational Data (MED), that provides detailed knowledge of which lessons students attend, which teachers have taught these lessons and the subjects of these lessons.

 The database receives live data directly from schools on every student, teacher, class and lesson taught in the schools every day. The volume of data therefore exceeds 100 million rows every year, which makes traditional methods of quality assurance infeasible.

 ML methods have already proven useful for the MED database in terms of label recognition and filling in missing data. However, the usefulness of ML in finding missing links between data remains unexplored.

 While most of the MED data on mandatory school subjects is linked via unique keys, there is considerable potential for improvement when it comes to elective subjects. As of 2022, it is not possible to know which students have attended lessons in elective subjects, since no unambiguous link exists between students and lessons in this case.

 To solve this issue, ML can be used to find the correct links and reduce ambiguity. For this task, all of the MED database’s auxiliary information, such as students’ grades and class composition, is available to validate and improve the ML model.

Supportive learning environment

Statistics Denmark offers a beneficial environment with great opportunities for learning and collaborating. Our Data Science Lab has experience with and expertise in ML, and can provide guidance and help. Furthermore, you will mostly be collaborating with our office, the education section, which holds a deep understanding of the data you will be working with, as well as some knowledge of ML.

 You are required to involve the employees of our office in your work throughout the process. This is important both for finding the best solution to a given problem and for sharing insights and thoughts about the ML model being built.

Contact

If you want to apply for this master’s thesis project, please submit your application by September 1, 2022. If you want to know more or discuss any details, you are encouraged to contact Alex Skøtt Nielsen:

 +45 39 17 31 35

 axn@dst.dk

Attention: You often need pre-approval from your university or study counselor to ensure that projects or theses found on AU Job- og Projektbank will be accepted as part of your education. Please contact the right entity in due time to ensure that you are picking the right project.