Predicting Diabetes Using Health & Socioeconomic Indicators
- Tejaswi Rupa Neelapu
- Apr 20
Tags: Machine Learning | Python | Logistic Regression | Scikit-learn
Link: GitHub
🔍 Overview
This project uses the CDC’s 2015 Behavioral Risk Factor Surveillance System (BRFSS) dataset to build a predictive model for early detection of diabetes from lifestyle, demographic, and health metrics. The goal is to flag high-risk individuals before clinical onset.
❓ Key Questions
Can diabetes risk be predicted using demographic and behavioral indicators?
Which features most influence diabetes prediction?
🧪 Methodology
Cleaned 250K records, removed duplicates, and engineered key features
Compared Logistic Regression, Decision Tree, and Quadratic classifiers
Evaluated each model using F1-score, recall, and precision
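The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It runs on synthetic data from `make_classification` rather than the BRFSS table. It assumes the "Quadratic" classifier refers to scikit-learn's `QuadraticDiscriminantAnalysis`. The train/test split and the hyperparameters (`max_depth=5`, `max_iter=1000`) are placeholders chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned dataset (NOT the real BRFSS data).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, n_redundant=0,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumption: "Quadratic" is taken to mean Quadratic Discriminant Analysis.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Quadratic (QDA)": QuadraticDiscriminantAnalysis(),
}

# Fit each model and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }

for name, scores in results.items():
    print(f"{name}: " + ", ".join(f"{k}={v:.3f}" for k, v in scores.items()))
```

The scores printed here describe the synthetic data only and say nothing about the results reported in this post.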
📈 Results
Best F1-score: 90.6% using Logistic Regression
Key predictors: BMI, blood pressure, and cholesterol levels
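One common way to read predictors off a logistic regression model is to standardize the features and rank them by absolute coefficient size. The sketch below shows that idea on synthetic data. The column names (`BMI`, `HighBP`, `HighChol`, `noise`) and the effect sizes used to generate the labels are hypothetical. This is not necessarily how the predictors above were identified.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 3000
feature_names = ["BMI", "HighBP", "HighChol", "noise"]  # hypothetical columns

# Synthetic features (NOT the real BRFSS data).
bmi = rng.normal(28, 6, n)
high_bp = rng.integers(0, 2, n)
high_chol = rng.integers(0, 2, n)
noise = rng.normal(0, 1, n)  # unrelated to the label by construction
X = np.column_stack([bmi, high_bp, high_chol, noise])

# Synthetic labels: risk depends on the first three features only.
logit = 0.12 * (bmi - 28) + 1.0 * high_bp + 0.7 * high_chol - 2.0
y = rng.random(n) < 1 / (1 + np.exp(-logit))

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=1000).fit(X_scaled, y)

# Rank features by absolute coefficient, largest first.
ranking = sorted(
    zip(feature_names, model.coef_[0]), key=lambda p: abs(p[1]), reverse=True
)
for name, coef in ranking:
    print(f"{name}: {coef:+.3f}")
```

Coefficient size is only a rough guide to importance, since correlated features can share or mask each other's weight.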