Machine Learning Models for the Development of a Probabilistic Screening Tool for Polycystic Ovary Syndrome

Authors

Narni H, Ananthasetty VR, Jilani S, Sailaja PS

DOI:

https://doi.org/10.47203/IJCH.2025.v37i02.027

Keywords:

Polycystic Ovary Syndrome, Screening, Machine Learning, Decision Tree, Naive Bayes

Abstract

Background: Polycystic ovary syndrome (PCOS) is a common hormonal disorder in women of reproductive age that can lead to infertility and other long-term health problems. Early detection using simple, non-invasive tools is important to support timely intervention and improve outcomes. Objective: The study aimed to compare the performance of decision tree and naive Bayes models in predicting the likelihood of PCOS using non-invasive clinical features. Methodology: The study included 100 diagnosed cases of PCOS and 100 controls, classified on the basis of ultrasonographic findings. Clinical and lifestyle information was collected through a structured questionnaire. The models were validated with five-fold cross-validation and evaluated using accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve. Results: The decision tree model had high training accuracy but lower test accuracy, indicating overfitting. The naive Bayes model showed more consistent performance, with 81 percent test accuracy and an F1 score of 0.81. Conclusion: The naive Bayes model shows promise as a simple, non-invasive screening tool for early identification of PCOS, particularly in primary care and low-resource settings.
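The evaluation described in the abstract (five-fold cross-validation scored by accuracy, precision, recall and F1) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names `binary_metrics` and `k_fold_indices`, the random seed, and the predicted labels are all hypothetical. The predictions are invented solely to show the arithmetic on a balanced 100/100 sample; the abstract does not report a confusion matrix.

```python
import random
from typing import Dict, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels coded 1 = PCOS, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and split them into k (train, test) pairs."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary illustrative choice
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Hypothetical labels: 100 cases and 100 controls, matching the study's sample sizes.
y_true = [1] * 100 + [0] * 100
# Hypothetical predictions (NOT from the paper): 19 errors in each class.
y_pred = [1] * 81 + [0] * 19 + [0] * 81 + [1] * 19

m = binary_metrics(y_true, y_pred)
print({name: round(value, 2) for name, value in m.items()})

folds = k_fold_indices(len(y_true), k=5)
print([len(test) for _, test in folds])
```

With balanced classes and the same number of errors in each class, accuracy, precision, recall and F1 all take the same value, which is one way an accuracy of 81 percent and an F1 score of 0.81 can coincide. In a real analysis each fold would be used to fit the model on the training indices and score it on the held-out indices, and a stratified split would keep the case-to-control ratio equal across folds.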

Published

2025-04-30

How to Cite

Narni H, Ananthasetty VR, Jilani S, Sailaja PS. Machine Learning Models for the Development of a Probabilistic Screening Tool for Polycystic Ovary Syndrome. Indian Journal of Community Health [Internet]. 2025 Apr. 30 [cited 2025 Jul. 9];37(2):339-42. Available from: https://iapsmupuk.org/journal/index.php/IJCH/article/view/3138

Section

Short Article
