3. Results and Discussion
Students' data were collected and preprocessed as explained in steps one to four. After the data were
cleaned, correlation analysis using the Spearman correlation coefficient was conducted because it is suitable for
datasets containing both continuous and discrete variables (Hussain et al., 2018). Every variable in the study
received a correlation coefficient (r) from the Spearman correlation analysis, which represents the strength
and direction of the monotonic relationship between the tested pair of variables.
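As a rough illustration of the computation described above, the Spearman coefficient can be obtained by ranking each variable (ties receive the average rank) and then computing the Pearson correlation of the ranks. The sketch below is not the authors' code; the function names and the toy values are hypothetical and are not taken from the study's dataset.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman coefficient: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical coded values (NOT from the study): a CGPA band per student
# and a binary employment status (1 = employed, 0 = not employed).
cgpa_band = [1, 2, 2, 3, 3, 3, 1, 2]
employed = [0, 0, 1, 1, 1, 0, 0, 1]
print(round(spearman(cgpa_band, employed), 3))
```

In practice a library routine such as `scipy.stats.spearmanr` would normally be used, since it also returns the p-value; the pure-Python version is shown only to make the ranking step explicit.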
Table 2. Correlation analysis and descriptive statistics for all student-related features.
Variables                  r         P       Std. deviation   Mean
Jantina                    0.030*    0.000   0.469            1.672
Perkahwinan               -0.043*    0.000   0.085            1.007
Negeri_Lahir               0.076*    0.000   1.398            2.861
Kediaman_Penginapan        0.016*    0.000   0.456            1.705
Kelas_B40                 -0.011*    0.000   0.763            1.523
Sekolah                   -0.019*    0.000   0.941            1.364
Universiti                -0.040*    0.000   0.656            2.210
Umur_Daftar               -0.009*    0.000   0.816            2.070
Kelayakan                 -0.018*    0.000   0.806            2.047
Bidang_Pengajian          -0.052*    0.000   1.538            3.792
Tempoh_Pengajian          -0.018*    0.000   0.784            2.521
Tajaan                     0.020*    0.000   0.782            2.347
Cgpa                       0.043*    0.000   0.707            2.072
Keputusan_Li               0.049*    0.000   0.982            2.167
Bil_Aktiviti_Keseluruhan  -0.024*    0.000   6.573            2.945
Note: * Correlation is significant at both the 0.05 and 0.01 levels (2-tailed).
The statistical results shown in Table 2 suggest that the correlations of jantina, perkahwinan, negeri lahir,
universiti, bidang pengajian, cgpa and keputusan LI with the employment status are slightly stronger (|r|
between 0.030 and 0.076) than those of kediaman penginapan, kelas B40, sekolah, umur daftar, kelayakan,
tempoh pengajian, tajaan and bil aktiviti keseluruhan (|r| of at most 0.024). Moreover, Table 2 indicates that
all variables were significantly correlated with the employment status, with P-values less than 0.05.
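One way to see how coefficients this small can still yield P-values below 0.05 is the usual large-sample test for a correlation, t = r * sqrt((n - 2) / (1 - r^2)). The sketch below is an illustration only: the sample sizes are hypothetical (this section does not state n), the function name is ours, and t is treated as standard normal, which is a simplification that is reasonable only for large n.

```python
from math import erfc, sqrt

def approx_two_tailed_p(r, n):
    """Approximate two-tailed p-value for a correlation r from n pairs.

    Computes t = r * sqrt((n - 2) / (1 - r**2)) and, as a large-sample
    simplification, treats t as a standard normal deviate.
    """
    t = r * sqrt((n - 2) / (1.0 - r * r))
    return erfc(abs(t) / sqrt(2.0))

# Hypothetical sample sizes; r = 0.030 matches the smallest of the
# "slightly stronger" coefficients reported in Table 2.
for n in (100, 1_000, 10_000, 100_000):
    print(n, approx_two_tailed_p(0.030, n))
```

Under these assumptions the same coefficient is far from significant with a hundred observations and highly significant with a hundred thousand, so statistical significance here reflects sample size as much as the strength of the association.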
This study concludes that the variables in our dataset are meaningful and can be used in subsequent studies
to develop a prediction model of student performance. Machine learning algorithms such as random forest,
decision tree, naïve Bayes and k-nearest neighbour can be applied to our dataset because they are widely used
and produce high accuracy according to prior research (Casuat & Festijo, 2020).
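To make the proposed follow-up concrete, a minimal k-nearest neighbour classifier might look like the sketch below. It is not a model from this study: the training rows, the choice of two features, and the function name are hypothetical, and Euclidean distance on coded categories is a simplifying assumption.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training rows nearest to query."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical rows (NOT from the study): (Cgpa band, Keputusan_Li band),
# with labels 1 = employed and 0 = not employed.
train_X = [(1, 1), (1, 2), (2, 1), (3, 3), (3, 2), (2, 3)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (3, 3)))
```

A real experiment would more likely rely on an established implementation such as scikit-learn's `KNeighborsClassifier`, with a held-out test set to estimate accuracy.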
Acknowledgements
This publication was supported by the Universiti Kebangsaan Malaysia (UKM) under the Research
University Grant (project code: GUP-2019-060).
E-Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)
Artificial Intelligence in the 4th Industrial Revolution