AI accurately predicts depression risk using lifestyle and health data

CO-EDP, VisionRI | Updated: 03-06-2025 18:19 IST | Created: 03-06-2025 18:19 IST
Representative Image. Credit: ChatGPT

Machine learning models are showing unprecedented potential in detecting depression by analyzing patterns in social, demographic, and health data. A new study titled “Explainable Machine Learning in the Prediction of Depression”, published in Diagnostics, evaluates how four AI models can identify high-risk individuals based on lifestyle and health characteristics. The research, conducted in the diverse region of Thrace in northeastern Greece, used explainable AI to pinpoint the most influential factors driving depression risk in adults.

The study employed logistic regression (LR), support vector machines (SVM), neural networks (NN), and XGBoost classifiers. Among them, XGBoost significantly outperformed its peers, achieving nearly 98% accuracy in identifying individuals with depression. The team used genetic algorithms for feature selection and SHAP (SHapley Additive exPlanations) to provide interpretability to the model’s predictions, ensuring transparency in AI-assisted diagnostics.
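The paper does not spell out its genetic-algorithm configuration here, but the general idea of GA-based feature selection can be sketched in plain Python. In this toy version, candidate feature subsets are bit masks evolved by selection, crossover, and mutation; the fitness function is a hypothetical stand-in (in the actual study it would be something like cross-validated classifier accuracy on the selected features), and all names and constants below are illustrative assumptions, not the authors' settings.

```python
import random

random.seed(0)

N_FEATURES = 40      # total candidate features (toy number)
TARGET_SIZE = 15     # the study kept the top 15 features
POP, GENS = 30, 60

# Stand-in fitness: rewards picking from a hidden "informative" set and
# penalizes oversized subsets. A real run would score each subset by
# cross-validated model accuracy instead.
INFORMATIVE = set(range(15))

def fitness(mask):
    chosen = {i for i in range(N_FEATURES) if mask[i]}
    return len(chosen & INFORMATIVE) - 0.5 * max(0, len(chosen) - TARGET_SIZE)

def crossover(a, b):
    # Single-point crossover of two parent bit masks.
    cut = random.randrange(1, N_FEATURES)
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.02):
    # Flip each bit independently with a small probability.
    return [bit ^ (random.random() < rate) for bit in mask]

# Generational GA with elitism: keep the top half, breed the rest.
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children

best = max(pop, key=fitness)
selected = [i for i in range(N_FEATURES) if best[i]]
print(f"selected {len(selected)} features, fitness={fitness(best):.1f}")
```

The evolved mask converges toward the informative subset; swapping the toy fitness for a model-accuracy score turns this skeleton into the kind of wrapper-style feature selection the study describes.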

What are the key predictors of depression identified by machine learning?

The researchers analyzed data from 1,227 adults aged 19 to 76, selected through stratified random sampling across urban and rural areas. Depression status was determined using the Beck Depression Inventory. The study found that 28.7% of participants exhibited depressive symptoms, with prevalence disproportionately higher among older adults, rural residents, divorced and unemployed participants, and those with low income or education.

After applying genetic algorithms to select the top 15 predictive features, SHAP analysis highlighted that anxiety was the most influential factor, followed by education level, alcohol consumption, body mass index (BMI), and coffee intake. Notably, high anxiety scores strongly predicted depression, consistent with global literature linking anxiety disorders to depressive conditions.

Other high-risk indicators included chronic disease presence, unemployment, high coffee consumption (over four cups daily), and short sleep duration. Conversely, protective factors identified by the AI model included higher education levels, female gender, high income, rural residence, and longer sleep durations.

Interestingly, the study also identified a counterintuitive negative association between heavy alcohol consumption and depression, as well as between higher BMI and depression risk, challenging long-standing assumptions. These findings point to a need for more nuanced interpretations of such variables, especially when analyzed in context through explainable AI frameworks.

How well do machine learning models predict depression in diverse populations?

Using a 70/30 train-test split with 10-fold cross-validation, the XGBoost model achieved 97.83% accuracy, followed closely by neural networks at 97.02%. XGBoost also delivered 98.96% sensitivity and 97.44% specificity, outperforming traditional methods such as logistic regression, which achieved only 79.95% accuracy.
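Accuracy, sensitivity, and specificity are all simple functions of the four confusion-matrix counts. The short sketch below shows how they are computed; the counts are illustrative values chosen so the resulting rates land near the figures reported above, not numbers taken from the study.

```python
# Metrics from confusion-matrix counts (illustrative, not the study's data).
tp, fn = 95, 1    # depressed participants: correctly / incorrectly flagged
tn, fp = 228, 6   # non-depressed participants: correctly / incorrectly cleared

accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true-positive rate: recall on the depressed class
specificity = tn / (tn + fp)   # true-negative rate

print(f"accuracy={accuracy:.4f} sensitivity={sensitivity:.4f} "
      f"specificity={specificity:.4f}")
```

High sensitivity matters most in a screening context like this one: it is the fraction of genuinely depressed participants the model does not miss, while specificity controls how many healthy participants are incorrectly flagged.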

These performance metrics underscore XGBoost’s ability to handle complex, non-linear interactions among depression risk factors. It effectively processed categorical variables such as marital status, residence, and health metrics while preserving prediction reliability. SHAP plots visually demonstrated how each factor individually pushed a person toward or away from a positive depression prediction, reinforcing the importance of transparency and interpretability in AI diagnostics.
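The "push toward or away" reading of a SHAP plot rests on the Shapley additivity property: each feature's contributions sum exactly to the gap between the model's prediction for a person and the baseline prediction. The stdlib-only sketch below computes exact Shapley values for a tiny hypothetical risk-score model (the feature names and weights are invented for illustration, not the study's classifier) by averaging marginal contributions over all feature orderings.

```python
from itertools import permutations

# Hypothetical risk-score model with an interaction term, purely for
# illustration of the SHAP decomposition.
def model(anxiety, sleep, coffee):
    return 2.0 * anxiety - 1.0 * sleep + 0.5 * coffee + 1.5 * anxiety * coffee

baseline = {"anxiety": 0.0, "sleep": 0.0, "coffee": 0.0}
x        = {"anxiety": 1.0, "sleep": 1.0, "coffee": 1.0}
names = list(x)

def f(point):
    return model(point["anxiety"], point["sleep"], point["coffee"])

# Exact Shapley values: average each feature's marginal contribution over
# every ordering in which features switch from baseline to the person's value.
shap_vals = dict.fromkeys(names, 0.0)
orders = list(permutations(names))
for order in orders:
    point = dict(baseline)
    prev = f(point)
    for name in order:
        point[name] = x[name]
        cur = f(point)
        shap_vals[name] += (cur - prev) / len(orders)
        prev = cur

# Additivity: the contributions sum to f(x) - f(baseline).
print(shap_vals, sum(shap_vals.values()), f(x) - f(baseline))
```

In this toy model, sleep gets a negative Shapley value (it pulls the risk score down), mirroring how the study's SHAP plots flag longer sleep as protective; libraries such as `shap` approximate the same quantities efficiently for tree ensembles like XGBoost rather than enumerating orderings.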

The study emphasized that while neural networks excel in pattern recognition, XGBoost offers a better balance between performance and computational efficiency, especially when analyzing structured health data in moderate-size datasets like this one. The researchers recommend further external validation to ensure generalizability across other cultural contexts and populations.

Moreover, XGBoost’s compatibility with SHAP explainability methods enabled not just accurate classification but also meaningful insights into the relative importance and directionality of each risk factor, a critical step toward clinical acceptance of AI tools.

What are the broader implications for AI in mental health diagnostics?

This research represents a significant advance in the application of machine learning to mental health, especially in regions with limited psychiatric infrastructure. By identifying high-risk individuals based on easily measurable environmental and behavioral features, the study opens the door to early, targeted interventions.

The findings have practical implications for designing personalized public health strategies. For instance, screening programs in rural areas or among unemployed populations can be enhanced using predictive algorithms informed by the study’s model. Additionally, mental health platforms and mobile applications can integrate these models for proactive mental wellness tracking, especially where stigma or access issues prevent timely clinical intervention.

The research also adds to the growing body of literature supporting the value of AI in mental health. Previous studies have explored depression detection using EEG, speech patterns, or social media content. This study contributes by combining traditional survey data with state-of-the-art explainable machine learning, offering a scalable and low-cost model applicable in real-world settings.

Despite its strengths, the study acknowledges limitations. Being cross-sectional, it cannot establish causality. The reliance on self-reported data introduces potential biases, and results may not fully generalize to other populations without further testing. However, the authors argue that the model’s high performance and explainability make it a strong candidate for integration into clinical decision support systems and workplace wellness tools.

The study also calls for future research into multimodal AI models that incorporate genetic, imaging, and behavioral data. Larger, more diverse training datasets, paired with rigorous external validation and regulatory oversight, are essential to translate these AI models into trusted clinical tools.

  • FIRST PUBLISHED IN:
  • Devdiscourse