A Machine Learning Prediction Model to a Scholarship Program for New Undergraduate Students at a Private University


  • Gideon Budiyanto, Parahyangan Catholic University
  • Dedy Suryadi, Parahyangan Catholic University




Keywords: machine learning, scholarship program, private higher education, prediction model


Competition in higher education, especially among private higher education institutions (PTS) in the digital era, is becoming increasingly tough. Institutions use various methods to meet their admission targets for new students in each academic year, and offering a scholarship program is one way to attract prospective students. Awarding a scholarship must take into account factors such as the seriousness or commitment of the prospective student. A scholarship offer may be refused, which becomes an obstacle to achieving the target. The machine learning prediction model uses variables such as high school name, high school "category", province or area where the high school is located, specialization in high school, high school grade, type of parental income, and selected major of study in higher education. Together these variables yield probability values that serve as an indicator for prioritizing scholarship applications, taking into account the likelihood of acceptance or rejection by prospective students. Currently there is no accurate measure of acceptance or rejection by prospective students. The purpose of this research is to build and compare machine learning models, namely Logistic Regression, Artificial Neural Networks, Support Vector Machines, Decision Trees, Naïve Bayes, and K Nearest Neighbors, in order to obtain the model with the best predictions for awarding scholarships. The results show that the Logistic Regression model has the highest average accuracy on the training data (62.05%) compared with the other models. The highest accuracy of the Logistic Regression model (62.29%) is achieved on the testing data. The Logistic Regression model also yields the highest AUC value (0.818), which places it in the "Good Classification" category compared with the other models.
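The comparison described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and a synthetic dataset. The data, the seven numeric features, and all hyperparameters below are illustrative assumptions, not the authors' dataset or settings; the real variables (high school name, province, parental income type, and so on) are categorical and would need encoding first. The sketch cross-validates the six model families on accuracy and AUC, then ranks hypothetical applicants by predicted acceptance probability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the applicant data (assumption): y = 1 means the
# prospective student accepts the scholarship offer, y = 0 means rejection.
X, y = make_classification(
    n_samples=600, n_features=7, n_informative=5, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The six model families named in the abstract, with illustrative settings.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Artificial Neural Network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC(random_state=0)),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Naive Bayes": GaussianNB(),
    "K Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
}

# Average accuracy and AUC over stratified 5-fold cross-validation on the
# training split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_validate(
        model, X_train, y_train, cv=cv, scoring=("accuracy", "roc_auc")
    )
    results[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "auc": scores["test_roc_auc"].mean(),
    }

for name, metrics in results.items():
    print(f"{name:26s} accuracy={metrics['accuracy']:.3f} auc={metrics['auc']:.3f}")

# Use predicted acceptance probabilities as a priority indicator:
# applicants with a higher probability of accepting are listed first.
ranker = models["Logistic Regression"].fit(X_train, y_train)
acceptance_proba = ranker.predict_proba(X_test)[:, 1]
priority_order = np.argsort(-acceptance_proba)
print("Top 5 applicant indices by acceptance probability:", priority_order[:5])
```

The numbers printed by this sketch come from synthetic data and are not comparable to the accuracy and AUC values reported in the abstract.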

Author Biographies

Gideon Budiyanto, Parahyangan Catholic University

Industrial Engineering Department

Dedy Suryadi, Parahyangan Catholic University

Industrial Engineering Department

