Traditional sleep questionnaires have long been regarded as gold standards for assessing various aspects of sleep disorders [1,2]. Despite their effectiveness, these tools are often criticized for being too lengthy and time-consuming, which can lead to reduced patient compliance and data quality. This is particularly problematic in busy clinical settings and large-scale research studies. Given these challenges, a growing body of research has explored the potential of machine learning to develop shortened versions of these questionnaires without compromising their psychometric properties. I believe that these innovative approaches mark a significant advancement in sleep disorder assessment and could pave the way for more efficient and scalable clinical and research practices.
Recent studies have demonstrated that machine learning algorithms, such as eXtreme Gradient Boosting (XGBoost) and Random Forest, can be leveraged to identify a subset of items from traditional questionnaires that provide high predictive power for total scores and clinical classifications [3-8]. For example, Jo et al. [3] developed a data-driven shortened version of the Dysfunctional Beliefs and Attitudes about Sleep (DBAS)-16, called the DBAS-6, using exploratory factor analysis (EFA) and XGBoost. The DBAS-6, which consists of just six items, achieved an R² value of 0.90 for predicting the DBAS-16 total score, making it a highly efficient tool for clinical settings. Similarly, Lee et al. [5] applied the random forest algorithm to create two shortened versions of the Metacognitions Questionnaire-Insomnia (MCQ-I), the MCQI-6 and the MCQI-14. The six-item version (MCQI-6) showed a high area under the receiver operating characteristic curve (AUROC > 0.97), demonstrating its capacity to distinguish individuals with clinically significant insomnia from those without.
One of the main advantages of using machine learning for questionnaire reduction is the preservation of psychometric properties. Traditional methods for shortening questionnaires often rely on classical test theory or principal component analysis, which may not fully capture complex interactions among items. Machine learning algorithms, on the other hand, allow for the selection of items based on their importance in predicting the total score or classification outcome, thereby ensuring that the shortened questionnaire retains its predictive validity. For example, Jo et al. [4] utilized both EFA and XGBoost to develop the Insomnia Severity Index (ISI)-3m, a three-item version of the ISI. This shortened version outperformed several previously developed shortened versions of the ISI, achieving an R² value of 0.91 and an accuracy of 0.965 for classifying insomnia severity levels.
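To make the idea of prediction-driven item selection concrete, the following is a minimal sketch and not the procedure used in the cited studies. It swaps in greedy forward selection with ordinary least squares as a simple stand-in for the tree-based importance rankings (XGBoost, random forest) those studies report. The data are synthetic, and every name here (`select_items`, `r_squared`, the 16-item, 0 to 4 response format) is an assumption made for illustration.

```python
import numpy as np


def r_squared(X_train, y_train, X_test, y_test):
    """Fit ordinary least squares on the training split; return R^2 on the test split."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    residual = y_test - A_test @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((y_test - y_test.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def select_items(responses, k, train_fraction=0.7):
    """Greedily pick k item indices whose responses best predict the full total score."""
    total = responses.sum(axis=1)  # total score of the full questionnaire
    n_train = int(len(responses) * train_fraction)
    X_train, X_test = responses[:n_train], responses[n_train:]
    y_train, y_test = total[:n_train], total[n_train:]

    selected, history = [], []
    for _ in range(k):
        best_item, best_r2 = None, -np.inf
        for j in range(responses.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            r2 = r_squared(X_train[:, cols], y_train, X_test[:, cols], y_test)
            if r2 > best_r2:
                best_item, best_r2 = j, r2
        selected.append(best_item)
        history.append(best_r2)  # held-out R^2 after adding this item
    return selected, history


# Synthetic respondents (assumption): 16 items scored 0-4, driven by one latent trait.
rng = np.random.default_rng(0)
n_respondents, n_items = 500, 16
latent = rng.normal(size=(n_respondents, 1))
noise = rng.normal(size=(n_respondents, n_items))
responses = np.clip(np.rint(2 + latent + noise), 0, 4)

selected, history = select_items(responses, k=6)
print("selected items:", selected)
print("held-out R^2 after each step:", [round(r, 3) for r in history])
```

Because the same held-out split is used both to choose items and to report R², the values printed here are optimistic; a real study would score the final short form on a separate validation sample.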
However, despite these clear benefits, machine learning-based shortened questionnaires often face practical challenges in clinical settings. Integrating machine learning methods into existing medical systems can be complex and resource-intensive. Additionally, the application of these methods typically requires extensive training or the involvement of specialized professionals, both of which are costly and time-consuming. Moreover, these machine learning-based questionnaires are often perceived as ‘black boxes,’ making them difficult for medical professionals to understand and trust. To tackle these problems and align with the principles of explainable artificial intelligence, new methodologies have been developed. For instance, Xie et al. [9] introduced AutoScore, an automatic clinical score generator that combines machine learning with regression modeling. AutoScore uses a random forest algorithm to select key questions from the original questionnaire and then groups responses to form logistic models that predict risk scores. This simple conversion of model coefficients to response weights results in a user-friendly, shortened questionnaire. However, AutoScore’s manual response grouping introduces subjectivity, and it lacks monotonicity constraints, limiting its clinical interpretability. To address these issues, Cawiding et al. [10] developed SymScore. SymScore automates response grouping, enforces monotonicity, and enhances flexibility, offering a more robust and interpretable solution.
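The conversion of model coefficients to response weights, and the monotonicity concern, can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not the actual AutoScore or SymScore implementation: the coefficients below are invented, the scaling rule (divide by the smallest nonzero magnitude, then round) is one common convention assumed here, and the helper names `coefficients_to_points`, `is_monotone`, and `score` are hypothetical.

```python
def coefficients_to_points(coefs):
    """Scale coefficients so the smallest nonzero magnitude maps to 1 point, then round."""
    nonzero = [abs(c) for groups in coefs.values() for c in groups if c != 0]
    unit = min(nonzero)
    return {item: [round(c / unit) for c in groups] for item, groups in coefs.items()}


def is_monotone(points):
    """True if points never decrease as the response group becomes more severe."""
    return all(a <= b for a, b in zip(points, points[1:]))


def score(points_table, answers):
    """Sum the points of the response group chosen for each item."""
    return sum(points_table[item][group] for item, group in answers.items())


# Hypothetical logistic-regression coefficients (assumption): one per ordered
# response group of each item, with the lowest group as the reference (0.0).
coefs = {
    "item_1": [0.0, 0.4, 0.9],
    "item_2": [0.0, 0.7, 0.5],  # deliberately non-monotone
    "item_3": [0.0, 1.3],
}

points = coefficients_to_points(coefs)
monotone = {item: is_monotone(p) for item, p in points.items()}
total = score(points, {"item_1": 2, "item_2": 1, "item_3": 1})

print(points)
print(monotone)
print(total)
```

In this toy example, `item_2` assigns fewer points to its most severe response group than to the middle one. That is the kind of pattern a monotonicity constraint is meant to rule out, since a clinician would expect a worse response never to lower the risk score.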
In conclusion, the application of machine learning to shorten sleep questionnaires is a promising development that could revolutionize the field of sleep medicine. By reducing the burden of lengthy assessments on both patients and clinicians, these shortened tools can improve compliance and data quality, ultimately leading to better diagnosis and treatment of sleep disorders. However, to realize their full potential, further research and innovation are needed to address the current limitations of integrating these models into clinical practice. With continued advancements in machine learning, we can expect to see more efficient and effective tools for assessing a wide range of psychological and medical conditions.