Background: Coronary artery disease (CAD) is one of the leading causes of morbidity and mortality, both in the European population and globally. All established clinical risk stratification scores and models require blood lipids and physical measurements. The latest reports of the European Commission suggest that attracting health professionals to collect these data can be challenging from both a logistical and a cost perspective, which limits the usefulness of established models and makes them unsuitable for population-wide screening in resource-limited settings such as rural areas. Therefore, the aim of this study was to develop and externally validate a questionnaire-based model, the Questionnaire-Based Evaluation for Estimating Coronary Artery Disease (QUES-CAD), that stratifies the 10-year incidence of CAD on a population scale at minimal cost.
Methods: Cox proportional hazards (CoxPH) and Cox gradient boosting (CoxGBT) models were trained with 10-fold cross-validation using combinations of ten questionnaire variables on the White population of the UK Biobank (n = 448,818) and internally validated in all ethnic minorities (n = 27,433). The Lifelines cohort was employed as an independent external validation population (n = 97,770). Additionally, we compared the performance of QUES-CAD, which contains only questionnaire variables, with that of clinically established risk prediction tools, i.e., the Framingham Coronary Heart Disease Risk Score, the American College of Cardiology/American Heart Association pooled cohort equations, the World Health Organization cardiovascular disease risk charts, and Systematic Coronary Risk Estimation 2 (SCORE2). We conducted partial log-likelihood ratio (PLR) tests and C-index comparisons between QUES-CAD and the established clinical prediction models.
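As an illustration of the discrimination metric named above, the following is a minimal sketch of Harrell's C-index for right-censored data. It is not the study's code: the function name `concordance_index`, its arguments, and the choice to skip pairs with tied follow-up times are assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       -- observed follow-up time per individual
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter follow-up ended in an event.
        # Simplifying assumption: pairs with tied times are skipped.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of individuals by time to event.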
Findings: In the external validation set, QUES-CAD achieved C-index values of 0.692 (CoxPH; 95% confidence interval [CI]: 0.673-0.71) and 0.699 (CoxGBT; 95% CI: 0.681-0.717) in the male population, and 0.771 (CoxPH; 95% CI: 0.748-0.794) and 0.759 (CoxGBT; 95% CI: 0.736-0.783) in the female population. The addition of measurement-based variables and variables that require a prior medical examination (i.e., insulin use, number of treatments/medications taken, prevalent cardiovascular disease [other than CAD, and stroke diagnosed by a doctor]) and the further addition of biomarkers/other measurements (i.e., high-density lipoprotein [HDL] cholesterol, total cholesterol, and glycated haemoglobin) did not significantly improve QUES-CAD's performance in most instances. C-index comparisons and PLR tests showed that QUES-CAD performs and fits the data at least as well as the clinical prediction models.
Interpretation: QUES-CAD performs comparably to established clinical prediction models and enables population-wide identification of individuals at high risk of CAD. The model developed and validated herein relies solely on ten questionnaire variables, overcoming the limitations of existing models that depend on physical measurements or biomarkers.
Funding: University Medical Center Groningen.
Keywords: Coronary artery disease; Data-driven prediction; Discriminative abilities; Machine learning; Population screening; Risk stratification.