Using early assessment performance as early warning signs to identify at-risk students in programming courses

Research Full Paper
Ashok Kumar Veerasamy1, Daryl D'Souza2, Mikko-Ville Apiola1, Mikko-Jussi Laakso1, Tapio Salakoski1
1 University of Turku, Turku, Finland
2 RMIT University, Melbourne, Australia

This research full paper presents the results of a parsimonious model that uses early assessment tasks as predictors to identify at-risk students. To date, several studies have been conducted to identify and retain at-risk students in CS courses; nevertheless, both CS education researchers and teachers have long sought to understand early warning signs for identifying at-risk students. Coursework-based predictive models have been developed, employing a range of preliminary statistical techniques and machine learning methods, but they warrant further investigation owing to inconsistencies in the identified factors and in the techniques employed. In addition, they did not consider the use of parsimonious models. This study investigates a validated, parsimonious model, with the added objective of visualising student performance on early continuous assessments as early warning signs that instructors can use to assist students identified as low-motivated learners. Specifically, this paper presents a manually created classification tree analysis and a machine learning-based predictive model that uses two variables to predict student performance in introductory programming. The model uses Random forest classification analysis to identify at-risk students, and visualisation of the decision tree results is employed to facilitate early instructor identification of at-risk students. Student data for the academic years 2016, 2017, and 2018 are analysed, for both descriptive and classification analysis. Data for the formative assessment tasks were collected in the first two weeks of the semester and used for model development, validation, and testing. The overall prediction accuracy of the Random forest classification-based model on unseen data was 60%. The unseen-data test results showed that it is possible to predict 77% of students who need support, as early as week 3, based on student performance in continuous formative assessment tasks in a 12-week introductory programming course. Moreover, our classification tree analysis revealed that students who scored 25% or less in the formative assessment tasks of the first two weeks are unlikely to attend the final exam, or are likely to fail it. Hence, these results provide insights for early interventions to prevent attrition and failure and to increase student retention and student success.
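As a rough illustration only, and not the authors' code, the following Python sketch shows how a two-variable early-warning model of the kind described above could be assembled with scikit-learn: a Random forest classifier trained on week 1 and week 2 formative assessment scores, alongside a shallow decision tree whose printed rules stand in for the visualisation an instructor would inspect. The column names (week1_score, week2_score, passed_exam), the toy data, and the pass/fail labelling are hypothetical assumptions.

# Minimal sketch, assuming formative assessment scores on a 0-100 scale and a
# binary outcome label (1 = passed the final exam, 0 = failed or did not attend).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical dataset: one row per student with week 1-2 formative scores.
df = pd.DataFrame({
    "week1_score": [10, 80, 25, 60, 95, 15, 70, 30, 5, 55],
    "week2_score": [20, 75, 10, 65, 90, 25, 60, 40, 0, 50],
    "passed_exam": [0, 1, 0, 1, 1, 0, 1, 0, 0, 1],
})

X = df[["week1_score", "week2_score"]]
y = df["passed_exam"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Random forest model built on the two early-assessment predictors.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print(classification_report(y_test, rf.predict(X_test)))

# A shallow decision tree yields interpretable cut-off rules that instructors
# can read as an early-warning visualisation.
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(X_train, y_train)
print(export_text(tree, feature_names=["week1_score", "week2_score"]))

In a setup of this kind, the shallow tree is what surfaces interpretable thresholds (comparable to the roughly 25% cut-off reported above), while the Random forest provides the headline predictive accuracy on unseen data.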