Playing Smart with Numbers: Predicting Student Graduation Using the Magic of Naive Bayes

ABSTRACT

In the realm of education, higher education institutions are challenged to deliver quality education that produces graduates who are not only competent but also creative and competitive. The quality of Indonesian higher education institutions is reflected in the accreditation granted by the National Accreditation Agency for Higher Education, commonly known as BAN-PT. Among the accreditation criteria, the on-time graduation rate is a key indicator in the evaluation [1], as set out in the Appendix of National Accreditation Agency for Higher Education Regulation No. 23 of 2022.
The evaluation of graduation rates conducted so far has relied mostly on graduation registration data, often overlooking students who may be facing academic or administrative difficulties. Meanwhile, the university can respond to students who do not graduate on time through persuasion, guidance, and mentoring that encourage them to complete their studies promptly.
The challenge faced by universities is the absence of an integrated system for predicting student graduation. As a consequence, the Academic and Student Administration Bureau cannot ensure that an entire cohort graduates on time, and students facing delayed graduation are left without a solution.
Drawing on these issues, there is a need for a student graduation prediction application within the university environment [2]. This application holds comprehensive data on the students of a single cohort across various study programs. With such an application, student quality and university accreditation can be not only monitored but also continually improved on the journey toward excellence.

LITERATURE REVIEW
2.1. Literature Review
Numerous studies on predicting student graduation have been conducted at various universities using a variety of methods. Armansyah and Rakhmat Kurniawan Ramli conducted a study titled "A Naive Bayes Approach to Predicting Timely Graduation of Students" [3]. They addressed declining graduation rates caused by the gap between the number of incoming students and the number who graduate, which harms study programs in multiple respects. Using an experimental approach with the naive Bayes method, they obtained a graduation prediction model that reached an accuracy of 100%.
Next is the study by Lydia Yohana Lumban Gaol, M. Safii, and Dedi Suhendro titled "Anticipating Successful Student Graduations in Stikom Tunas Bangsa's Information Systems Program Through the Implementation of the C4.5 Algorithm" [4]. Their problem stems from the role of graduation as a key yardstick in the accreditation of higher education institutions: the more students graduate within the designated time frame, the higher the institution's accreditation assessment. Their methodology applies the C4.5 classification algorithm, which handles both numeric and categorical attributes in the dataset. Their results provide predictive data on student graduation and show that the most influential factor in student success is the GPA attribute [5].
The next study, by Ray Mondow Sagala, is titled "Unveiling Student Graduation Forecasts Through the K-means Algorithm in the World of Data Mining" [6]. Its motivation is the urgency surrounding student graduation, since the courses a student takes are inevitably interconnected. K-means is used as the tool for interpreting the research data numerically. With k = 3 applied to 118 data points, the clustering found 13 students who did not complete their studies, 36 students with satisfactory grades, and 69 students with excellent grades.
Playing Smart with Numbers: Predicting Student Graduation Using the Magic of Naive Bayes (Shilpa Mehta) P-ISSN: 2963-6086, E-ISSN: 2963-1939
In the next study, Nursetia Wati presents "Envisioning Student Graduations by Applying the K-Nearest Neighbor Approach Based on Particle Swarm Optimization" [7]. The underlying problem is low graduation rates, especially in the Faculty of Engineering, which drives every study program to work at raising its graduation rate toward the desired quality. The study uses K-Nearest Neighbor (KNN), an experimental method that classifies new data by its distance to existing data. Testing showed that the best value was obtained when the K-Nearest Neighbor algorithm was applied.

Theoretical Framework
In the realm of data, "data mining" is the tool for excavating intellectual treasures from database vaults. Mathematics, statistical techniques, machine learning, and artificial intelligence all play pivotal roles in this process, working together to identify and extract valuable information and useful knowledge from large databases [8]. Maulana and Fajrin add that data mining is not a novel research topic; rather, it adds value by increasing the effectiveness of previously used techniques in addressing common problems [9].
Like a ready-to-eat dish, an application is a program that serves its users: it executes their commands and produces the detailed results they ask for. An application's role does not stop at being a digital cook, however; it also solves problems by applying one of the available data processing methods, working as a data-processing machine aligned with specific goals. Another perspective defines an application as a pre-assembled component ready to be operated by its users [10].
Al Khawarizmi, a scholar from Persia, is credited with first formulating algorithms. Initially, algorithms were used to solve arithmetic problems; over time, they came to be applied to a wide range of mathematical problems. Algorithms are closely tied to mathematics and sit at the heart of many fields of knowledge [11]. From another angle, T.S. Alasi describes an algorithm as a sequence of logical steps arranged in order, which leads us through a forest of problems along a well-organized, systematic route [12].
Prediction is the attempt to peek through the door of the future and estimate the possibilities that may unfold there, building forecasts from past data guided by a set of indicators. Many problems call for prediction, such as forecasting prices, estimating production, or analyzing graduation rates [13].
Classification is like assembling puzzle pieces of data: just as a detective groups evidence by the clues left behind, classification uses existing, labeled data as a foundation to predict the class of new, unfamiliar data. Classification involves two main ingredients: training data, which guides the learning process, and test data, on which the learned model is examined [14].
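The training/test division described above can be sketched as a shuffled split. This is a minimal illustration, not the procedure reported in this study: the helper name `split_records`, the 20% test share, and the seed are hypothetical choices, and the 302 placeholder IDs only mirror the size of the cleaned dataset.

```python
import random

def split_records(records, test_ratio=0.2, seed=42):
    """Shuffle a copy of `records` and split it into (train, test).

    Hypothetical helper: the 0.2 test ratio and the seed are illustrative
    assumptions, not values reported in this study.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# 302 placeholder record IDs, matching the size of the cleaned dataset
train, test = split_records(range(302))
print(len(train), len(test))
```

The model learns only from `train`; `test` is held back so that the evaluation measures performance on records the model has not seen.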
Imagine Jupyter Notebook as an enchanting laboratory that welcomes languages such as Julia, the clever wizard; Python, the versatile magician; and R, the alchemist of numbers. Jupyter Notebook weaves code, its output, and explanatory text into a single interactive document. Like a sorcerer turning objects into gold, this web application turns ideas into computational documents, keeping the focus on the document itself rather than on the tooling [15].
Python is a cross-platform language that runs on many systems. It is interactive: like a conversation with a genie, it responds quickly to each command. It is also known for a readable syntax that comes close to plain human language. Before execution, Python source code is translated into an intermediate form known as bytecode. Like embarking on a journey to learn the art of magic, understanding classification and Python is the first step toward mastering this field; along the way, you learn how to piece together clues from the past to unlock the gates of the future [16].
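The bytecode step mentioned above can be observed with the standard `dis` module. This sketch assumes the CPython interpreter (the reference implementation), where a function's compiled bytecode is stored on its code object; the function `is_on_time` is a hypothetical example, and the exact instructions printed differ between Python versions.

```python
import dis

def is_on_time(semesters):
    """Hypothetical rule used only for illustration: 8 semesters or fewer."""
    return semesters <= 8

# CPython has already compiled the function body to bytecode;
# the raw instruction bytes live on the function's code object.
print(type(is_on_time.__code__.co_code))  # <class 'bytes'>

# dis prints a human-readable listing of those instructions
# (opcode names and layout vary across Python versions).
dis.dis(is_on_time)
```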

RESEARCH METHODOLOGY
3.1. Stages of Research
This research journey begins by formulating the research questions and reviewing the relevant literature. Like a detective gathering clues from various angles, we collect data through observation and documentation, laying the foundation for answering those questions [17].
Having completed the initial phase, we move to the next chapter: gathering the data trails of the students concerned. Like assembling puzzle pieces, we collect 395 records of 2018-cohort students who have completed their studies, each described by 16 attributes. The next step is preparing and cleaning this data, akin to arranging bricks before building a house. Of the 395 records, 302, each with 14 relevant attributes, serve as the main ingredients for the naive Bayes algorithm. This is the spotlight moment, where naive Bayes takes center stage as the hero of this narrative: implemented in the Python programming language, the algorithm works through the data to uncover the answers it may hold, not unlike a wizard concocting a magical potion [18].
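For reference, the standard naive Bayes decision rule is given below. It rests on the "naive" assumption that attributes are conditionally independent given the class. The Gaussian likelihood on the right is one common choice for numeric attributes and is an assumption here, since this section does not state which variant was used. Here y is graduation status (0 = late, 1 = on time), x_1 to x_13 are the 13 predictor attributes that, together with y, make up the 14 mining attributes, and μ_{c,i} and σ²_{c,i} are the mean and variance of attribute i among training records of class c.

```latex
\hat{y} = \arg\max_{c \in \{0,1\}} \; P(y = c) \prod_{i=1}^{13} P(x_i \mid y = c),
\qquad
P(x_i \mid y = c) = \frac{1}{\sqrt{2\pi\sigma_{c,i}^{2}}}
  \exp\!\left(-\frac{(x_i - \mu_{c,i})^{2}}{2\sigma_{c,i}^{2}}\right)
```

The prior P(y = c) is simply the share of training records that belong to class c.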
As the experiments unfold, the constructed model is put to the test. Behind the curtain, evaluation and validation take the leading roles: like a sorcerer assessing whether an incantation worked, we carefully evaluate the test results and verify their validity.
Thus, from scientific steps to magical performances, this research is a journey to unveil mysteries, gather truths, and carve new pathways in the realm of knowledge.

Data Collection
To gather the information needed, the researcher used the following methods:
1. Observation Method
Direct observation in the field, attending to every hidden detail. The insights gained are not merely visual snapshots but core components that shape the design of a dynamic system for the institution [19]. The author also acts as an observer on the university campus under study.

2. Interview Method
Direct interviews were conducted using a carefully prepared set of questions, directed to the Bureau of Academic Administration and Student Affairs. Like an explorer unearthing treasures from conversations, we identified the pieces of information needed.

3. Library Method
We delved into the realm of knowledge, excavating intellectual treasures from books, scholarly journals, and online sources. Like mind archaeologists, we unearthed valuable artifacts that strengthen the foundations and outcomes of this research.

Data Analysis
The student data in this research portrays the 2018 cohort at Universitas. Each student is described by gender, student status, marital status, age, Semester Grade Point Average (IPS) from the first to the eighth semester, and Cumulative Grade Point Average (IPK), which summarizes overall academic achievement. At center stage is the target class: graduation status, indicating whether a student graduated on time or was delayed [20].

Pre-processing
The collected research data reveal some interesting findings. A total of 395 records of students who completed their studies were collected. By study duration, three groups emerge: students who graduated in 7 semesters (3.5 years) and students who graduated in 8 semesters (4 years), both of which count as on time, and students who needed more than 8 semesters, which counts as late [22].
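The duration rule above can be expressed as a small labeling function. This is a sketch: the function name is hypothetical, the 1 = on time / 0 = late encoding follows the preprocessing described in this paper, and treating durations under 7 semesters as on time is an assumption, since the text only mentions 7 and 8 semesters.

```python
ON_TIME, LATE = 1, 0  # encoding used in preprocessing: 1 = on time, 0 = late

def graduation_label(semesters_taken):
    """Map study duration (in semesters) to the binary target class.

    7 semesters (3.5 years) and 8 semesters (4 years) count as on time;
    more than 8 semesters counts as late. Durations below 7 are also
    treated as on time here, which is an assumption.
    """
    if semesters_taken < 1:
        raise ValueError("semesters_taken must be positive")
    return ON_TIME if semesters_taken <= 8 else LATE

for n in (7, 8, 9, 10):
    print(n, graduation_label(n))
```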
However, just as when sifting for gems in the sand, not all of the collected data can be used directly. An initial preparation process is needed to refine it, akin to tending a garden to yield better results. The data initially carries 16 attributes that have not yet undergone this process [23]. The preprocessing techniques employed by the author are:
1. Polishing the Data (data cleaning): removing all empty and incomplete records, for instance those of inactive or withdrawn students whose course grades are incomplete. As a result, 302 usable records remain from the initial 395; that is, 93 records (23.54%) were removed. This step ensures that no missing values remain.
2. Squeezing the Data Spectrum (attribute selection): keeping only the records and attributes relevant to the mining process. Attributes such as SIN and name are left out as unrelated or of little influence. Only the following attributes are kept: gender, age, student status, marital status, Semester Grade Point Average (IPS) for semesters 1 to 8, Cumulative Grade Point Average (IPK), and graduation status.
3. Interweaving Data (transformation from categorical to numeric values): the "gender" attribute, originally "male" or "female", is encoded as 0 for male and 1 for female. The "student status" attribute, originally "working" or "student", is encoded as 0 for working and 1 for student. The "marital status" attribute, originally "married" or "unmarried", is encoded as 0 for married and 1 for unmarried. Finally, the "graduation" attribute, originally "on time" or "late", is encoded as 0 for late and 1 for on time.
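The three preprocessing steps above can be sketched in plain Python as follows. The column names (`ips_1` to `ips_8`, `ipk`, and so on) and the exact category strings are hypothetical, since the source lists the attributes but not their spreadsheet headers; the encodings themselves follow the mapping stated in step 3.

```python
# Hypothetical column names; the source lists the attributes, not their headers.
SEMESTER_GPA = [f"ips_{k}" for k in range(1, 9)]  # IPS, semesters 1 to 8
FEATURES = ["gender", "age", "student_status", "marital_status", *SEMESTER_GPA, "ipk"]
TARGET = "graduation"
KEEP = FEATURES + [TARGET]                         # 14 mining attributes

# Categorical-to-numeric encodings from step 3
ENCODING = {
    "gender": {"male": 0, "female": 1},
    "student_status": {"working": 0, "student": 1},
    "marital_status": {"married": 0, "unmarried": 1},
    "graduation": {"late": 0, "on time": 1},
}

def preprocess(raw_records):
    """Select the 14 attributes, drop incomplete rows, encode categories."""
    cleaned = []
    for row in raw_records:
        subset = {col: row.get(col) for col in KEEP}  # leaves out SIN, name, etc.
        if any(value is None or value == "" for value in subset.values()):
            continue                                   # incomplete record: discard
        for col, mapping in ENCODING.items():
            subset[col] = mapping[subset[col]]
        cleaned.append(subset)
    return cleaned

def removal_rate(before, after):
    """Share of records removed, as a percentage rounded to two decimals."""
    return round((before - after) / before * 100, 2)

print(len(KEEP))               # 14
print(removal_rate(395, 302))  # 23.54
```

Selecting attributes before the completeness check means that a gap in a discarded column such as name does not cause an otherwise usable record to be dropped.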
Once preprocessing is complete, the mining process is applied to the 302 student records, each with 14 attributes that have been converted to a consistent numeric form and contain no missing values. The resulting dataset is shown on the following page. The author uses Jupyter to experiment with the student graduation data using the Naive Bayes method, blending information in pursuit of meaningful discoveries, much like a digital alchemist [24], [25].
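A minimal sketch of what the mining step computes, assuming the Gaussian variant of naive Bayes (this section does not say which variant the author's notebook used). The class name `GaussianNaiveBayes`, the `epsilon` smoothing term, and the toy two-attribute data (IPK and semester-1 IPS) are hypothetical; the actual study uses 302 records with 13 predictor attributes. Log-probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes for numeric features (illustrative sketch)."""

    def __init__(self, epsilon=1e-9):
        self.epsilon = epsilon  # added to every variance so it stays positive

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.classes_ = sorted(groups)
        self.priors_, self.means_, self.vars_ = {}, {}, {}
        for c in self.classes_:
            rows = groups[c]
            cols = list(zip(*rows))  # one tuple of values per attribute
            means = [sum(col) / len(col) for col in cols]
            variances = [sum((v - m) ** 2 for v in col) / len(col) + self.epsilon
                         for col, m in zip(cols, means)]
            self.priors_[c] = len(rows) / len(X)  # P(y = c)
            self.means_[c], self.vars_[c] = means, variances
        return self

    def _log_posterior(self, row, c):
        # log P(c) + sum of log Gaussian likelihoods (up to a shared constant)
        total = math.log(self.priors_[c])
        for x, m, v in zip(row, self.means_[c], self.vars_[c]):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total

    def predict(self, X):
        return [max(self.classes_, key=lambda c: self._log_posterior(row, c))
                for row in X]

# Toy data: [IPK, IPS semester 1]; 1 = on time, 0 = late
X = [[3.6, 3.5], [3.4, 3.3], [3.8, 3.7],
     [2.4, 2.5], [2.6, 2.2], [2.3, 2.6]]
y = [1, 1, 1, 0, 0, 0]
model = GaussianNaiveBayes().fit(X, y)
preds = model.predict([[3.5, 3.4], [2.5, 2.4]])
print(preds)  # [1, 0]
```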
1. Summoning the Required Library
In this research, the Naive Bayes algorithm was examined and analyzed comprehensively. Evaluation and validation show that it reaches an accuracy of 85% on the 302 student records. For the late-graduation class, precision is 0.42, recall is 0.65, and F1-score is 0.51; for the on-time class, precision is 0.95, recall is 0.88, and F1-score is 0.91.
In application, the Naive Bayes algorithm predicts whether students graduate on time or late. It performs strongly on the on-time class (precision 0.95, recall 0.88) but is considerably weaker on the late class, where only 42% of the students predicted as late actually graduated late.
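The reported metrics can be reproduced from predictions, and the F1-scores can be cross-checked against the reported precision and recall, with the sketch below. The function names are hypothetical; the definitions are the standard ones (F1 is the harmonic mean of precision and recall), and the toy label lists in the usage are illustrative only.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1(precision, recall)

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Cross-check the reported F1-scores from the reported precision and recall
print(round(f1(0.42, 0.65), 2))  # late class:    0.51
print(round(f1(0.95, 0.88), 2))  # on-time class: 0.91
```

Both reported F1-scores are consistent with the reported precision and recall values.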

Int. Transactions on Artificial Intelligence, Vol. 2, No. 1, November 2023: 60-75
RESULTS AND DISCUSSION
4.1. Data Collection
Like tracing the footsteps of an adventure, data was collected in the field and from the observed university's website. Piece by piece, the data was gathered into Microsoft Excel files in xlsx format. It covers the 2018 cohort who have completed their studies: 395 records with 16 attributes. Examples of the collected data are shown in the accompanying figure [21].

Figure 6. Command to Invoke a Library
2. Reading student graduation data from an Excel file

Figure 7. Command to Read Excel Data
3. The displayed results of the data are as follows:

Figure 22. Printing Accuracy Score
The classification report is as follows; for the on-time class it shows a recall of 0.88, meaning that 88% of the students who actually graduated on time are predicted as on time: