Student Performance Factors Analysis

Introduction

This project explores the “Student Performance Factors” dataset, which is publicly available on Kaggle. The dataset contains individual-level student records that combine academic, family, socioeconomic, and behavioral variables. The goal is to analyze which factors are associated with academic performance, measured as the score obtained in the final evaluation.

The dataset includes variables related to study habits, attendance, access to educational resources, motivation level, family environment, school type, and other factors relevant to school performance. This range of variables supports a multivariate approach to academic performance, reflecting the view that educational outcomes depend not only on individual factors but also on structural and contextual conditions.

From an applied perspective, the dataset is particularly relevant to educational research. It makes it possible to identify patterns associated with academic performance and to group students into performance profiles. In the context of educational policy and school support programs, this type of analysis can help guide targeted interventions, optimize resource allocation, and strengthen strategies for improving learning.

Summary of Findings

  • Data Quality: The dataset is of high quality and required only minimal cleaning.
  • Best Model: Linear Regression performed best among the models evaluated (R² ≈ 0.825), suggesting that factors such as attendance and study hours have a strong linear relationship with performance.
  • Segmentation: K-Means clustering identified distinct student profiles, which can help in designing targeted educational interventions.
  • Reflection: Machine learning provides powerful tools for early intervention in education, shifting the focus from reacting to failure to preventing it.
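
The R² value in the findings above is the coefficient of determination, R² = 1 − SS_res / SS_tot. It is the share of the variance in the observed scores that the model's predictions account for. A value of about 0.825 therefore means the model explains roughly 82.5% of the variance in final scores on the data it was evaluated on. The minimal plain-Python sketch below shows the calculation. The function name `r_squared` is chosen here for illustration, and the sample numbers are made up rather than taken from the dataset.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observed scores around their mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Illustrative numbers only (NOT taken from the dataset)
observed = [65, 70, 72, 68, 75]
predicted = [66, 69, 71, 69, 74]
print(round(r_squared(observed, predicted), 3))  # → 0.914
```

Predicting the mean score for every student gives R² = 0. Predictions worse than that give a negative value. R² describes the model as a whole, so it does not by itself show which individual factors matter.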

Score Prediction Feature

Use the sliders to estimate a student’s performance based on their habits and environment.
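
With a linear model, a slider-based estimate like this can be explained additively. The predicted score is the intercept plus each coefficient times its feature value. Each slider's contribution in points can then be read as its coefficient times the slider's deviation from a reference value. The sketch below assumes exactly that scheme. The feature names, coefficients, intercept, and baseline values are hypothetical placeholders, not the model fitted in this project.

```python
# HYPOTHETICAL linear model: these coefficients, the intercept, and the
# baseline (reference) values are placeholders for illustration, NOT the
# values fitted in this project.
INTERCEPT = 40.0
COEFFICIENTS = {"hours_studied": 0.3, "attendance": 0.2, "motivation_level": 0.5}
BASELINE = {"hours_studied": 20.0, "attendance": 80.0, "motivation_level": 1.0}


def predict(features):
    """Linear prediction: intercept + sum of coefficient * feature value."""
    return INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)


def contributions(features):
    """Points each feature adds or removes relative to its baseline value."""
    return {
        name: COEFFICIENTS[name] * (features[name] - BASELINE[name])
        for name in COEFFICIENTS
    }


student = {"hours_studied": 25.0, "attendance": 90.0, "motivation_level": 2.0}
base_score = predict(BASELINE)
for name, pts in contributions(student).items():
    print(f"{name}: {pts:+.2f} pts")
# The baseline prediction plus all contributions equals the full prediction
print(f"predicted score: {base_score + sum(contributions(student).values()):.2f}")
```

Measuring each contribution against a baseline rather than against zero keeps every contribution at zero until a slider moves away from its reference position. The contributions always sum to the difference between the student's prediction and the baseline prediction. This exact additivity holds only because the model is linear.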

Project Sections

  1. Data Preparation: Data loading, cleaning, and initial exploration.
  2. Exploratory Data Analysis: Univariate and bivariate analysis of the variables.
  3. Modeling (Supervised & Unsupervised): Prediction of scores and student segmentation.
  4. Conclusions: Summary and final reflections.
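
The student segmentation in section 3 uses K-Means. The plain-Python version of Lloyd's algorithm below is a rough sketch of what that step does. It alternates between assigning each student to the nearest centroid and moving each centroid to the mean of its assigned students. The two features (weekly study hours and attendance percentage), the data points, k = 2, and the initial centroids are all hypothetical. In practice the features would typically be standardized first, because Euclidean distance is dominated by the variable with the largest scale. A library implementation such as scikit-learn's `KMeans`, run with several initializations, could be used instead.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: returns (centroids, labels)."""
    centroids = [list(c) for c in initial_centroids]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean)
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep an empty cluster's centroid where it was
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# HYPOTHETICAL (study hours per week, attendance %) pairs, for illustration only
students = [(5, 62), (6, 65), (4, 60), (25, 95), (27, 92), (24, 97)]
centroids, labels = kmeans(students, initial_centroids=[students[0], students[3]])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

The algorithm is never told which students performed well. Each cluster therefore has to be interpreted afterwards as a profile, for example by inspecting its centroid.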