Interpretable Machine Learning Framework for Early Depression Detection Using Socio-Demographic Features with Dual Feature Selection and SMOTE
Abstract
Depression is the most widespread psychological disorder globally, impacting individuals across all age groups; when left undiagnosed or untreated, it significantly elevates the risk of severe outcomes, including suicidality. This study explores the efficacy of eight machine learning (ML) classifiers utilizing socio-demographic and psychosocial data to discern signs of depression. A depression dataset available on GitHub was acquired, comprising 604 instances with 30 predictors and 1 target variable indicating depression status. Preprocessing included normalization, handling missing values, and encoding categorical variables. Two feature selection methodologies, Analysis of Variance (ANOVA) and Boruta were employed to extract pertinent features. ANOVA selected 19 features, while Boruta retained 13 for model training. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was utilized to enhance prediction accuracy (ACC). Results demonstrate that Logistic Regression (LR), combined with ANOVA feature selection, exhibits superior performance, achieving an ACC of 92.56% and an AUC of 92.69%. With Boruta, LR achieved an ACC of 91.74% and an AUC of 91.65%. Without feature selection, LR yielded an ACC of 87.75%, a precision of 91.73%, and an AUC of 89.98%. SHapley Additive exPlanations (SHAP) analysis revealed that anxiety (ANXI) is the most influential predictor within the ML model designed for depression prediction. This study identifies the most effective model for predicting depression through evaluation metrics, while also addressing societal biases and supporting clinicians with interpretable insights for early intervention.DOI:
https://doi.org/10.31449/inf.v49i4.10245Downloads
Published
Issue
Section
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







