Analysis of academic performance using machine learning techniques with ensemble methods
Keywords
Boosting, Educational data analytics, Ensemble, Machine learning, Student academic performance
Abstract
In recent years, data analysis models and algorithms have spread through the educational field, with the aim of extracting knowledge from data to improve academic performance and other indicators. The main objective of this research is to predict the academic performance of university students using machine learning techniques. Feature selection methods are applied to 324 variables in order to identify the most influential ones. The prediction model is evaluated with supervised algorithms (KNN, SVC, Naive Bayes and decision tree), which are implemented and optimized in Python. In addition, ensemble algorithms are implemented to improve on the accuracy of these base classifiers: Bagging (CART, Random Forest, ExtraTreesClassifier), Boosting (AdaBoost, GBM, XGBoost, CatBoost, Light Boost) and Voting (Blending, Stacking). The results show that the Stacking and Blending algorithms perform best, with accuracy in each semester of around 85% on training data and around 75% on test data.
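The pipeline described in the abstract (feature selection, four base classifiers, then a Stacking ensemble) can be illustrated with a minimal scikit-learn sketch. It is not the authors' configuration: the synthetic data from `make_classification`, the choice of `SelectKBest` with `k=10`, every hyperparameter, and the logistic regression meta-learner are assumptions made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student records (assumption: the study's
# 324-variable dataset is not reproduced here).
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four base classifiers named in the abstract; hyperparameters are
# illustrative guesses, not tuned values from the study.
base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=7)),
    ("svc", SVC(kernel="rbf", random_state=0)),
    ("nb", GaussianNB()),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
]

# Stacking: a meta-learner is trained on the cross-validated outputs
# of the base classifiers.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Scale, keep the 10 features with the highest ANOVA F-score, then stack.
model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), stack)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Comparing `train_acc` with `test_acc` mirrors the training and testing accuracy comparison reported in the abstract; the values printed here come from synthetic data and say nothing about the study's results.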