Prediction of type 2 diabetes mellitus onset using logistic regression-based scoreboards
========================================================================================

* Yochai Edlitz
* Eran Segal

## Abstract

Type 2 diabetes mellitus (T2D) accounts for ∼90% of all cases of diabetes, a disease estimated to have caused 1.6 million deaths worldwide in 2016. Early detection of patients at high risk of T2D can reduce the incidence of the disease through changes in lifestyle, diet, or medication. Since populations of lower socio-demographic status are more susceptible to T2D and might have limited resources for laboratory testing, there is a need for accurate yet accessible prediction models based on non-laboratory parameters. This paper introduces one non-laboratory model that is highly accessible to the general population and one highly precise yet simple laboratory model. Both models are provided in an accessible scoreboard form and as logistic regression models. We based the models on data from 44,873 non-diabetic UK Biobank participants, aged 40-65, predicting the risk of T2D onset within the next 7.3 years (SD 2.3). The non-laboratory prediction model for T2D onset probability incorporated the following covariates: sex, age, weight, height, waist and hip circumferences, waist-to-hip ratio (WHR) and body mass index (BMI). This logistic regression model achieved an area under the receiver operating characteristic curve (auROC) of 0.82 (95% CI 0.79-0.85) and an odds ratio (OR) between the upper and lower prevalence deciles of x77 (28-98). We further analysed the contribution of laboratory-based parameters and devised a blood-test model based on just five blood tests. In this model, we included age, sex, glycated haemoglobin (HbA1c%), reticulocyte count, gamma-glutamyl transferase (GGT), triglycerides and HDL cholesterol to predict T2D onset. This logistic regression model achieved an auROC of 0.89 (0.86-0.91) and a deciles' OR of x87 (27-152).
Using the scoreboard results, the anthropometric model classified participants into three risk groups: a group with a 1% (1-2%) probability, a group with a 9% (7-11%) probability, and a group with a 15% (7-23%) probability of developing T2D. The five blood tests scoreboard model further classified participants into four risk groups: 0.9% (0.7-1%), 8% (6-11%), 18% (14-22%), and a high-risk group with a 38% (23-54%) probability of developing T2D. We analysed several more comprehensive models, which included genotyping data and other environmental factors, and found that they did not provide a cost-efficient benefit over the five blood tests model. The five blood tests and anthropometric models, in both their logistic regression and scoreboard forms, outperform the commonly used non-laboratory models, the Finnish Diabetes Risk Score (FINDRISC) and the German Diabetes Risk Score (GDRS). When trained on our data, the FINDRISC and the GDRS achieved auROCs of 0.75 (0.71-0.78) and 0.58 (0.54-0.62), respectively.

## 1. Introduction

Diabetes mellitus is a group of metabolic diseases characterised by chronic hyperglycemia. It is becoming one of the world's most challenging epidemics: the global prevalence of diabetes increased from 4.7% in 1980 to 8.5% in 2014, and an estimated 1.6 million deaths were directly caused by diabetes in 2016. T2D is generally characterised by insulin resistance, resulting in hyperglycemia, and it accounts for ∼90% of all diabetes cases 1,2. In recent years, the prevalence of diabetes has been rising more rapidly in low- and middle-income countries (LMICs) than in high-income countries 3. In 2014, Beagley et al. estimated that 45.8%, or 174.8 million, of all diabetes cases in adults are undiagnosed, and that 83.8% of all cases of undiagnosed diabetes are in LMICs 4, where access to laboratory diagnostic testing is limited for parts of the population 5.
According to several studies, a healthy diet, regular physical activity, maintaining a normal body weight and avoiding tobacco use can prevent or delay T2D onset 3,6,7,8,9. A screening tool that identifies individuals at risk enables a timely lifestyle or medication intervention. Ideally, such a screening tool should be accurate, simple and low-cost. It should also be widely available, allowing populations who have difficulty accessing laboratories to be screened by other means. Several such tools are in use today 10,11,12. The Finnish Diabetes Risk Score (FINDRISC), a commonly used, non-invasive T2D risk score, estimates the probability that a person aged 35 to 64 will develop T2D within the next ten years. FINDRISC was derived from two prospective cohorts of 4,746 and 4,615 individuals, recruited in Finland in 1987 and 1992, respectively. The FINDRISC model uses sex, age, body mass index (BMI), use of blood pressure medication, a history of high blood glucose, physical activity, daily consumption of fruits, berries or vegetables, and family history of diabetes as its parameters. The FINDRISC can be used as a scoreboard model or as a logistic regression (LR) model 13,14,15. Another commonly used prediction model is the German Diabetes Risk Score (GDRS), which estimates the five-year risk of developing T2D. The GDRS is based on 9,729 men and 15,438 women aged 35-65 years from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam study 16. The GDRS is a Cox regression model based on age, height, waist circumference, presence of hypertension (yes/no), smoking behaviour, physical activity, moderate alcohol consumption, coffee consumption, intake of whole-grain bread, intake of red meat, and parent and sibling history of T2D 17,18. This model, too, can be presented as an accessible scoreboard. The objective of the present research was to develop clinically usable models that are easy to use and highly predictive of T2D onset.
We developed two simple models and compared their predictive power to that of the well-established FINDRISC and GDRS, which served as our baselines. We trained all models on a training data set and tested them on a holdout data set, both taken from the UK Biobank (UKB) observational cohort. We based one of the models on easily accessible anthropometric measures and the other on just five blood tests, which require an invasive blood draw. As our models were trained and evaluated on the UKB database, they are most relevant for the U.K. population aged 40-65. Still, they can also be used for people similar to our research cohort (as presented in Table 1) and might be adapted to additional populations. Both models are given in their logistic regression form and as accessible scoreboards.

Table 1. Cohort statistical data

## 2. Results

We analysed the data of 20,346 participants from the UKB cohort who revisited the UKB assessment centers during 2012-2013 and 48,705 participants who revisited the centers from 2014 onwards (see Figure 1 and Methods). During the screening process, we kept the data of participants who returned for a second or third visit, tested negative for T2D and were not treated for T2D. The final cohort included 44,873 participants, of whom 2.16% developed T2D during a follow-up period of 7.3±2.3 years (see Table 1, Figure 1A and Methods).

Figure 1. A flow chart of the cohort selection process and an illustrative figure of the model extraction. **A**. A flowchart showing the selection process of participants in this study.
Participants who came for a repeated second or third visit were selected from the 502,536 participants of the UKB. Next, we excluded 1,652 participants who self-reported having T2D. We then split the data into an 80% training and validation set and a 20% holdout test set. We excluded an additional 2,115 participants due to (1) having 25% or more missing values from the full feature list, (2) having HbA1c levels greater than or equal to 6.5%, or (3) being treated with metformin or insulin. Finally, the training set included 25,122 participants (56% of the cohort), the validation set included 10,757 (24% of the cohort), and the test set included 8,994 participants (20% of the cohort). **B**. Process flow during training and testing of the models. We first split the data and kept a holdout test set. We then explored several models using the training and validation data sets. In the final stage, we compared the selected models on the holdout test set and reported the results. The output of the models is calibrated to predict the probability that a participant will develop T2D.

Before training the models, we partitioned our data into training, validation and holdout test sets to avoid overfitting. The training dataset consisted of 25,122 participants, and the validation dataset included 10,757 participants. We explored the training and validation datasets to select the optimal features for our models. We used the holdout test set, which included 8,994 participants, to report the final models' results (see Figure 1S and Methods).

### 2.1 Anthropometric based model

To provide an accessible, simple, non-laboratory and non-invasive T2D prediction model, we built an anthropometric scoreboard model in which patients can easily mark their own values for each scoreboard item. It consists of the following eight parameters: age, sex, weight, height, hip circumference, waist circumference, body mass index (BMI) and waist-to-hip ratio (WHR) (Figure 2A).
The patient then sums the item scores into a final score, which falls into one of three risk groups: the first group (score range 1-70) has a 1% [1-2% 95% CI] probability of developing T2D; the second group (score range 71-83) has a 9% [7-11% 95% CI] probability; and the third group (score range 84-92) has a 15% [7-23% 95% CI] probability (Figure 2C).

Figure 2. Anthropometrics and blood tests scoreboards. A) Anthropometrics-based scoreboard. Summing the scores of the various features provides a final score that is quantised into one of three risk groups. B) Five blood tests scoreboard. Summing the scores of the various features provides a final score that is quantised into one of four risk groups (Figure 2D). C) Anthropometrics scoreboard risk groups: first group, score range 1-70, 1% [1-2% 95% CI] probability of developing T2D; second group, score range 71-83, 9% [7-11% 95% CI]; third group, score range 84-92, 15% [7-23% 95% CI]. D) Five blood tests scoreboard risk groups: first group, score range 1-114, <1% [0.7-1% 95% CI] probability of developing T2D; second group, score range 115-125, 8% [6-11% 95% CI]; third group, score range 126-145, 18% [14-22% 95% CI]; fourth group, score range 146-156, 38% [23-54% 95% CI].

We also provide the anthropometric model in its logistic regression form for more accurate, computer-aided predictions.
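Once the item points are summed, assigning a risk group is a simple range lookup. The following is a minimal Python sketch, assuming the score ranges and group probabilities transcribed from the Figure 2C and 2D captions. The per-item points are shown only in Figure 2A and 2B and are not reproduced here, so the item names and point values in the usage example are hypothetical.

```python
from bisect import bisect_right

# (lowest score, highest score, estimated T2D risk in %) per risk group,
# transcribed from the Figure 2C and Figure 2D captions.
ANTHROPOMETRIC_GROUPS = [(1, 70, 1.0), (71, 83, 9.0), (84, 92, 15.0)]
FIVE_BLOOD_TESTS_GROUPS = [(1, 114, 0.9), (115, 125, 8.0), (126, 145, 18.0), (146, 156, 38.0)]


def total_score(item_points):
    """Sum the points marked for each scoreboard item (hypothetical dict input)."""
    return sum(item_points.values())


def risk_group(score, groups):
    """Return (1-based risk group, estimated T2D risk in %) for a summed score."""
    lower_bounds = [low for low, _, _ in groups]
    idx = bisect_right(lower_bounds, score) - 1  # last group whose lower bound <= score
    if idx < 0 or score > groups[idx][1]:
        raise ValueError(f"score {score} is outside the scoreboard range")
    return idx + 1, groups[idx][2]


# Hypothetical item points for illustration; the real points are in Figure 2A.
example = {"age": 10, "sex": 5, "weight": 8, "height": 4,
           "hip": 6, "waist": 10, "bmi": 15, "whr": 20}
print(risk_group(total_score(example), ANTHROPOMETRIC_GROUPS))  # (2, 9.0)
```

Raising an error for out-of-range totals makes a mis-summed scoreboard visible instead of silently assigning it to the nearest group.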
Tested on the holdout test set, the logistic regression form of the anthropometric model achieved an area under the receiver operating characteristic curve (auROC) of 0.82 (0.79-0.85) and an average precision score (APS) of 0.12 (0.09-0.16) (95% CI). In its scoreboard form, the model achieved an auROC of 0.81 (0.78-0.84) and an APS of 0.09 (0.07-0.12). Both forms of the model outperformed the two reference models: the FINDRISC model, with an auROC of 0.75 (0.71-0.78) and an APS of 0.07 (0.05-0.09), and the GDRS model, with an auROC of 0.58 (0.54-0.62) and an APS of 0.03 (0.02-0.04) (see Figure 3A-B and Methods). With the cohort's baseline prevalence of 2.17%, the LR anthropometric model achieved a deciles' OR of x77 (27.7-98.1), and its scoreboard form achieved a deciles' OR of x61 (17.7-101), compared to x23 (6.80-70.4) for the FINDRISC and x4.1 (1.75-9.24) for the GDRS (see Figure 3C and Table 2). In the feature importance analysis of the anthropometric model, the WHR and the BMI were the most predictive features, having the largest log-odds ratios (β) (Figure 3D). These two body habitus measures are commonly mentioned in the literature as indicators associated with chronic illness 19,20,21,22.

Table 2. Comparison of the models' main results

Figure 3. Main results, calculated using 1000 bootstraps of the cohort population. Each point in the graphs represents the result of one bootstrap iteration. The color legend is shown at the bottom of the figure. **A**. ROC curves comparing the models developed in this research (a GBDT model of all features, and logistic regression models of the five blood tests and of the anthropometric features) with the well-established GDRS and FINDRISC.
**B**. Precision-recall (P-R) curves showing the precision versus the recall of each model, with the prevalence of the population marked by the dashed line. **C**. Deciles' odds-ratio graph: the ratio of the prevalence in each decile to the prevalence in the first decile. We bounded the prevalence in the first decile to be at least a tenth of the T2D prevalence in the full cohort. **D**. Feature importance graph of the logistic regression anthropometric model, with normalised feature values. The bars indicate the standard deviation (SD) of the feature importance values. The top predictive features of this model are body mass index (BMI) and waist-to-hip ratio (WHR). **E**. Feature importance graph of the logistic regression blood-tests model, with SD bars. HbA1c% and reticulocyte count contribute positively to the T2D prediction and HDL cholesterol lowers the predicted T2D probability, whereas the information on age and sex that is relevant to predicting T2D onset is overpowered by the other features. **F**. Calibration plot of the anthropometric, five blood tests, full blood-test and FINDRISC models. Calibrating the models' predictions allows reporting the probability of developing T2D (see Methods). Calibration was performed using isotonic regression.

Figure 4. Calibration graphs of the anthropometric, five blood tests, FINDRISC and GDRS scoreboards. While the five blood tests, anthropometric and GDRS scoreboards rise monotonically, the FINDRISC curve starts to decline after the third bin. The calibration of the anthropometric and five blood tests scoreboards deteriorates compared to the continuous logistic regression models due to score quantisation.
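The deciles' odds ratio (Figure 3C) and the binned calibration check (Figure 3F, Section 4.10) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: deciles are taken as ten near-equal groups of participants sorted by predicted risk, the data on which the isotonic regression is fitted (`fit_scores`, `fit_labels`) is an assumption because the text does not specify it, and the 1000-iteration bootstrap is omitted for brevity.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def deciles_odds_ratio(y_true, y_score):
    """T2D prevalence in each predicted-risk decile divided by the first-decile prevalence.

    The first-decile prevalence is floored at one tenth of the overall prevalence,
    as described for Figure 3C, so the ratio stays finite when that decile has no cases.
    """
    y_true = np.asarray(y_true, dtype=float)
    order = np.argsort(y_score, kind="stable")  # lowest predicted risk first
    deciles = np.array_split(y_true[order], 10)  # ten (near-)equal-sized groups
    prevalence = np.array([d.mean() for d in deciles])
    return prevalence / max(prevalence[0], y_true.mean() / 10)


def isotonic_calibrate(fit_scores, fit_labels, new_scores):
    """Map raw model scores to probabilities with a monotone (isotonic) fit."""
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    iso.fit(fit_scores, fit_labels)
    return iso.predict(new_scores)


def calibration_bins(y_true, y_prob, n_bins=10):
    """(bin start, mean predicted probability, observed prevalence, count) per non-empty bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)  # width-0.1 bins over [0, 1]
    rows = []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            rows.append((b / n_bins, y_prob[mask].mean(), y_true[mask].mean(), int(mask.sum())))
    return rows
```

A well-calibrated model has, in every bin, a mean predicted probability close to the observed prevalence; comparing the second and third columns of `calibration_bins` gives the deviation described in Section 4.10.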
### 2.2 Five blood tests based model

In addition, for cases where laboratory testing is available, we developed a more accurate tool for predicting T2D onset. This tool uses five blood test results as inputs to a logistic regression model, which we also simplified into a scoreboard model (Figure 2B,D). Using the five blood tests scoreboard (Figure 2B), we bin the resulting scores into four groups: the first group (score range 1-114) has a 0.9% [0.7-1% 95% CI] probability of developing T2D; the second group (score range 115-125) has an 8% [6-11% 95% CI] probability; the third group (score range 126-145) has an 18% [14-22% 95% CI] probability; and participants in the fourth group (score range 146-156) have a 35% [24-46% 95% CI] probability of developing T2D. To derive this model, we started with a feature selection process based on a full-feature GBDT model, using only the training and validation data sets. We clustered the features of this model into 13 categories, such as lifestyle, diet and anthropometrics. Based on this process, we concluded that the blood tests have higher predictive power than the other feature clusters assessed. We thus trained a full blood test model using the 59 blood tests available in the training dataset. Applying a recursive feature elimination process to the top 10 predictive features, we established the features of our final model, which is based on five blood tests (see Methods). The five blood tests logistic regression model achieved the following results on the test set: an auROC of 0.89 (0.86-0.91), an APS of 0.26 (0.2-0.33), and a deciles' OR of x87 (27-152). The scoreboard form achieved an auROC of 0.88 (0.85-0.9), an APS of 0.18 (0.14-0.23), and a deciles' OR of x78 (23-139) (Figure 3 A-C, Table 2). The five blood tests model results are superior to those of our non-laboratory anthropometric model, as well as to those of the well-established FINDRISC and GDRS models (Figure 3 A-C, Table 2).
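The 95% confidence intervals reported for auROC and APS come from 1000 bootstrap resamples (see the Figure 3 caption). A minimal sketch of a percentile bootstrap on a fixed test set, assuming scikit-learn's `roc_auc_score` and `average_precision_score` as the metric implementations; the percentile method and the skipping of single-class resamples are assumptions, not details stated in the text.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score


def bootstrap_metrics(y_true, y_score, n_boot=1000, seed=0, alpha=0.05):
    """Point estimate and percentile bootstrap CI for auROC and APS on a fixed test set."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs, apss = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample participants with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # both classes are needed; possible at ~2% prevalence in small sets
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
        apss.append(average_precision_score(y_true[idx], y_score[idx]))
    q = [100 * alpha / 2, 100 * (1 - alpha / 2)]
    return {
        "auROC": (roc_auc_score(y_true, y_score), *np.percentile(aucs, q)),
        "APS": (average_precision_score(y_true, y_score), *np.percentile(apss, q)),
    }
```

With a low-prevalence outcome, the APS baseline equals the prevalence (about 0.02 here), which is why APS values look small next to auROC values.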
We then compared the five blood tests model with a logistic regression model using all 59 blood tests as input features and with a GBDT model including all 13 feature clusters, which consisted of the 279 individual features available in the dataset plus genetics data. These two models achieved auROCs of 0.91 (0.88-0.93) and 0.92 (0.9-0.94), APSs of 0.32 (0.25-0.39) and 0.34 (0.28-0.42), and deciles' ORs of x117 (37-163) and x133 (45-167), respectively. The five blood tests that we used are: the glycated haemoglobin test (HbA1c%), which measures the average blood sugar over the past two to three months and is one of the means of diagnosing diabetes; the reticulocyte count; the gamma-glutamyl transferase test (GGT); the HDL cholesterol test; and the triglycerides test. We also included the time to prediction (time between visits), sex, age at the repeated visit, and a bias term related to the prevalence in the population. We computed these features' coefficients with their 95% CIs to enable reconstruction of the models (Figure 3E). As expected, HbA1c% had the highest predictive power, since it is one of the criteria for T2D diagnosis. The second most predictive feature was the high-light-scatter reticulocyte count, which reflects the number of new red blood cells in the body 23. HDL cholesterol, which is known to be beneficial for health, especially in the context of cardiovascular diseases and T2D 15,24,25, was inversely correlated with the predicted probability of T2D onset. Interestingly, age and sex had very low feature importance, meaning that they hardly contributed to the model, probably because the T2D-relevant information in these features is already latent within the blood tests' data. In the five blood tests scoreboard model, we removed age, as we found that it did not contribute to the final score of the model.
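Because the coefficients are reported for z-scored features (Figure 3E), reconstructing a prediction means standardising each input with its training mean and SD, forming the linear predictor with the bias term, and applying the logistic function. A minimal sketch; all coefficient, mean and SD values below are hypothetical placeholders, not the published values, and the output is the uncalibrated logistic regression probability (the paper additionally calibrates predictions with isotonic regression).

```python
import math


def predict_t2d_probability(values, means, sds, coefs, intercept):
    """Logistic regression probability from raw feature values.

    Each feature is z-scored with its training mean and SD, weighted by its
    coefficient, summed with the intercept (bias term), and passed through the
    logistic (sigmoid) function.
    """
    log_odds = intercept
    for name, beta in coefs.items():
        z = (values[name] - means[name]) / sds[name]
        log_odds += beta * z
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical numbers for illustration only; they are NOT the published coefficients.
coefs = {"hba1c": 1.2, "hdl": -0.4}
means = {"hba1c": 5.4, "hdl": 1.45}
sds = {"hba1c": 0.35, "hdl": 0.38}
intercept = math.log(0.02 / 0.98)  # baseline log-odds for a ~2% prevalence
person = {"hba1c": 6.1, "hdl": 1.0}
print(round(predict_t2d_probability(person, means, sds, coefs, intercept), 3))
```

A negative coefficient, as for HDL cholesterol here, lowers the predicted probability as the value rises above the training mean.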
### 2.3 Prediction within an HbA1c% stratified population

To verify that our models can discriminate within a group of normoglycemic participants and within a group of pre-diabetic participants, we tested the models separately on each group extracted from our data. We separated the groups based on their HbA1c% levels at the first visit to the UKB assessment centers. We allocated participants with 4%=20 units per day, current smoker >=20 units per day or <20 units per day); whole bread intake; coffee intake; red meat consumption; one parent with diabetes; both parents with diabetes and a sibling with diabetes. We performed a random hyperparameter search in the same way as for our models. The hyperparameters we searched were the penaliser, in the range 0-10 with a 0.1 resolution, and a variance threshold, in the range 0-1 with a 0.01 resolution, used to drop columns whose variance was lower than the threshold.

### 4.7 Model building procedures

To prevent overfitting and biased models, we set aside twenty percent of the data as a holdout test set, which we used only for the final reporting of results. We split the remaining data again into a thirty percent validation set and a seventy percent training set. We then use a two-stage process to evaluate the models' performance: an exploration phase and a test phase (Figure 1, S1). During the exploration phase, we select the optimal features for our models using the training and validation data sets. For each group of features, we optimise the hyperparameters using two hundred iterations of a random search. In each iteration, we measure the performance using the auROC metric with five-fold cross-validation within the training set. We then train a model on the full training set with the top-ranked hyperparameters from the previous step and test this model on the validation data set.
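The data split and the exploration-phase hyperparameter search described above can be sketched with scikit-learn as follows. This is a simplified illustration, not the authors' pipeline: it uses plain `LogisticRegression` rather than `LogisticRegressionCV`, maps a penaliser λ to scikit-learn's inverse regularisation strength `C = 1/λ` (an assumption), excludes λ = 0, and stratifies the splits by outcome (also an assumption). Placing `StandardScaler` inside the pipeline refits the z-scoring within each cross-validation fold, so no statistics leak from the held-out fold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def split_data(X, y, seed=0):
    """20% holdout test set, then a 70/30 train/validation split of the rest (56/24/20 overall)."""
    X_dev, X_test, y_dev, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_dev, y_dev, test_size=0.3, stratify=y_dev, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


def random_search_lr(X_train, y_train, n_iter=200, seed=0):
    """Random search over the L2 penaliser, scored by 5-fold cross-validated auROC."""
    rng = np.random.default_rng(seed)
    grid = np.round(np.arange(0.02, 2.0 + 1e-9, 0.02), 2)  # penaliser 0.02-2.0, step 0.02
    sampled = rng.choice(grid, size=min(n_iter, grid.size), replace=False)
    best_c, best_auc = None, -np.inf
    for penaliser in sampled:
        model = make_pipeline(StandardScaler(),
                              LogisticRegression(C=1.0 / penaliser, max_iter=1000))
        auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc").mean()
        if auc > best_auc:
            best_c, best_auc = 1.0 / penaliser, auc
    return best_c, best_auc
```

Because the grid has only 100 values, sampling without replacement evaluates each penaliser at most once even when `n_iter=200`.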
We use the exploration phase to compare various models and to select the features for our models. In the final phase, the test phase, we evaluate the selected models on the holdout test set and report their results. To do so, we rerun the hyperparameter selection process using the training and validation data sets, train the selected models with the selected hyperparameters on the pooled training and validation data sets, and finally calculate the results of the trained models on the holdout test set. We use the same data sets for all of the discussed models. For the logistic regression models, we used scikit-learn's LogisticRegressionCV model 30. For the GBDT models, we used Microsoft's LightGBM package 33, and for the survival analysis models, we used the lifelines package 32. During model training, we used two hundred iterations of random hyperparameter search. For the GBDT models, we searched the following parameter values: number of leaves - [2, 4, 8, 16, 32, 64, 128]; number of boosting iterations - [50, 100, 250, 500, 1000, 2000, 4000]; learning rate - [0.005, 0.01, 0.05]; minimum child samples - [5, 10, 25, 50]; subsample - [0.5, 0.7, 0.9, 1]; feature fraction - [0.01, 0.05, 0.1, 0.25, 0.5, 0.7, 1]; lambda l1 - [0, 0.25, 0.5, 0.9, 0.99, 0.999]; lambda l2 - [0, 0.25, 0.5, 0.9, 0.99, 0.999]; bagging frequency - [0, 1, 5]; bagging fraction - [0.5, 0.75, 1] 33. For the logistic regression models, we searched the l2 penaliser in the range 0-2 with a 0.02 resolution.

### 4.8 SHAP

For the feature importance analysis of the GBDT model, we used the SHAP method, which approximates Shapley values. SHAP (SHapley Additive exPlanations) originates in game theory and is intended to explain the output of any machine learning model.
SHAP approximates the average marginal contribution of each feature of a model across all permutations of the other features in the same model 34.

### 4.9 Predictors

To estimate the contribution of each feature domain and for the initial screening of features, we started by building a GBDT model based on 279 features plus genetics data originating from the UKB SNP array. We used T2D-related summary statistics from genome-wide association studies (GWAS), which are genetic studies designed to find correlations between known genetic variants and a phenotype of interest. To avoid data leakage, we used only GWASs derived from populations outside the UKB (see the Supplementary material for the full list of polygenic risk scores (PRSs)). For the feature importance analysis of the GBDT model, we used the SHAP method 34, which approximates Shapley values (see Section 4.8). To select the most predictive features for the anthropometric and blood-tests models, we trained and tested the full-features model using the training and validation cohorts, and then used this model's feature importance to extract the most predictive features. We also analysed models that included data on family relatives with T2D, using only the training and validation sets. As we did not observe any major improvement over the anthropometric model, we omitted this feature for the sake of simplicity. In the last step, we tested the model on the holdout test set and reported its results. To extract the five blood tests model, we performed a feature selection process by evaluating logistic regression models on the training and validation datasets. We ran models with twenty, ten, and down to four blood-test features, together with age and sex, each time removing the blood test with the lowest feature importance score.
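The backward elimination of blood tests can be sketched as below. The text does not state which importance score was used, so this hypothetical helper ranks the non-forced features by the absolute value of their logistic regression coefficients on z-scored inputs and drops one feature per round (the paper stepped through twenty, ten and down to four features). Age and sex are kept in every model, as described, and each subset is scored by auROC on the validation set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler


def backward_eliminate(X_train, y_train, X_val, y_val, names, forced, stop_at):
    """Drop the least important candidate feature one at a time, scoring each subset on validation.

    names: column names of X; forced: columns always kept (e.g. age, sex).
    Importance = |coefficient| on z-scored inputs (an assumption for this sketch).
    Returns [(kept feature names, validation auROC), ...] from largest to smallest subset.
    """
    candidates = [n for n in names if n not in forced]
    history = []
    while True:
        cols = [names.index(n) for n in forced + candidates]
        scaler = StandardScaler().fit(X_train[:, cols])  # z-score with training statistics only
        model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train[:, cols]), y_train)
        val_prob = model.predict_proba(scaler.transform(X_val[:, cols]))[:, 1]
        history.append((forced + candidates, roc_auc_score(y_val, val_prob)))
        if len(candidates) <= stop_at:
            return history
        weights = np.abs(model.coef_[0][len(forced):])  # importance of the candidate features
        candidates.pop(int(np.argmin(weights)))
```

Inspecting how the validation auROC changes along `history` supports the trade-off the authors describe between fewer features and higher accuracy.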
We then selected the model with five blood tests (HbA1c%, reticulocyte count, gamma-glutamyl transferase (GGT), triglycerides and HDL cholesterol, plus age and sex) as the optimal balance between simplicity (fewer features) and accuracy (more features), and reported its results on the holdout test set. We normalised all continuous predictors using the standard z-score. To avoid data leakage, the training and validation sets were normalised separately from the holdout test set.

### 4.10 Models calibration

For each model, we calculated the deviation of the mean predicted probability from the actual T2D prevalence in each bin. We split the probability range (0-1) into ten bins of width 0.1 (Figure 3F) and assigned each sample to a bin according to its calibrated predicted probability of T2D onset. Since our data are highly imbalanced, with a T2D onset prevalence of 2.17%, we used one thousand bootstrap iterations to better calibrate the models. As a result, a participant might appear in several bins across the iterations of the bootstrapping process. We repeated this process for the scoreboard models.

### 4.11 Extracting scoreboards

To extract our scoreboards, we explored the training and validation data sets and reported the results on the holdout data set. We calculated the weight of evidence (WoE) of our data by splitting each feature into bins. Features of greater importance were binned at a higher resolution, while maintaining a monotonically increasing WoE (Anon n.d.). For the quantisation of the scoreboard risk groups, we performed one thousand bootstrap iterations on the validation data set. We considered several potential score limits separating the T2D onset probability of each score group, and chose boundaries that showed a separation between the risk groups.
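The weight-of-evidence computation used for the scoreboards can be sketched as follows. The sign convention (ln of the share of T2D cases over the share of non-cases, so that WoE rises with risk) and the additive smoothing for empty bins are assumptions; the text specifies only that WoE was computed per bin and kept monotonically increasing. How WoE values were turned into scoreboard points is not described, so that step is not shown.

```python
import numpy as np


def weight_of_evidence(feature, y, edges, smoothing=0.5):
    """WoE per bin: ln(share of T2D cases in the bin / share of non-cases in the bin).

    edges are the interior bin boundaries; the additive smoothing avoids log(0)
    in bins without cases, which are common at ~2% prevalence.
    """
    feature = np.asarray(feature, dtype=float)
    y = np.asarray(y, dtype=int)
    bins = np.digitize(feature, edges)  # bin index 0..len(edges)
    n_bins = len(edges) + 1
    cases = np.bincount(bins[y == 1], minlength=n_bins) + smoothing
    non_cases = np.bincount(bins[y == 0], minlength=n_bins) + smoothing
    return np.log((cases / cases.sum()) / (non_cases / non_cases.sum()))


def is_monotonic_increasing(woe):
    """True when WoE strictly increases from the lowest to the highest bin."""
    return bool(np.all(np.diff(woe) > 0))
```

A natural refinement loop, consistent with the text, is to add bin edges to the more important features and keep them only while `is_monotonic_increasing` still holds.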
We then measured the prevalence in each risk group on the test set and report these results.

### 4.12 References for PRS summary statistics articles

HbA1c 35,36,37; cigarettes per day, ever smoked, age started smoking 38; HOMA-IR, HOMA-B, diabetes BMI unadjusted, diabetes BMI adjusted, fasting glucose 39; fasting glucose, 2-hour glucose level, fasting insulin, fasting insulin adjusted for BMI (MAGIC_Scott) 40; fasting glucose, fasting glucose adjusted for BMI, fasting insulin adjusted for BMI 41; 2-hour glucose level 42; fasting insulin 43; fasting proinsulin 44; leptin adjusted for BMI, leptin unadjusted for BMI 45; triglycerides, cholesterol, LDL, HDL 46; BMI 47; obesity class 1, obesity class 2, overweight 48; anorexia 49; height 50; waist circumference, hip circumference 51; cardio 52; heart rate 53; Alzheimer's 54; asthma 55

## Supporting information

Supplementary material (supplements/165092_file02.pdf)

## Data Availability

The UKB data are available through the UK Biobank Access Management System: [https://www.ukbiobank.ac.uk/](https://www.ukbiobank.ac.uk/)

## 5. Acknowledgements

This research has been conducted using the UK Biobank Resource under Application Number 28784.

## Footnotes

* This version was updated to include scoreboard models based on the anthropometric and five blood tests logistic regression models. We identified six participants with HbA1c% = 6.5 and dropped them from the cohort of the final models. The results and figures were updated accordingly. Figures of the scoreboards were added, including figures of the type 2 diabetes prevalence within each score group.
* Received August 2, 2020.
* Revision received July 1, 2021.
* Accepted July 3, 2021.
* © 2021, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## 6. Bibliography

1. Zimmet, P., Alberti, K. G., Magliano, D. J. & Bennett, P. H. Diabetes mellitus statistics on prevalence and mortality: facts and fallacies. Nat. Rev. Endocrinol. 12, 616–622 (2016).
2. International Diabetes Federation - Type 2 diabetes. at <https://www.idf.org/aboutdiabetes/type-2-diabetes.html>
3. WHO | Diabetes programme. at <https://web.archive.org/web/20140329084830/http://www.who.int/diabetes/en/>
4. Beagley, J., Guariguata, L., Weil, C. & Motala, A. A. Global estimates of undiagnosed diabetes in adults. Diabetes Res. Clin. Pract. 103, 150–160 (2014).
5. Wilson, M. L. et al. Access to pathology and laboratory medicine services: a crucial gap. Lancet 391, 1927–1938 (2018).
6. Home | ADA. at <https://www.diabetes.org/>
7. Knowler, W. C. et al. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N. Engl. J. Med. 346, 393–403 (2002).
8. Lindström, J. et al. Sustained reduction in the incidence of type 2 diabetes by lifestyle intervention: follow-up of the Finnish Diabetes Prevention Study. Lancet 368, 1673–1679 (2006).
9. Diabetes Prevention Program Research Group. Long-term effects of lifestyle intervention or metformin on diabetes development and microvascular complications over 15-year follow-up: the Diabetes Prevention Program Outcomes Study. Lancet Diabetes Endocrinol. 3, 866–875 (2015).
10. Noble, D., Mathur, R., Dent, T., Meads, C. & Greenhalgh, T. Risk models and scores for type 2 diabetes: systematic review. BMJ 343, d7163 (2011).
11. Collins, G. S., Mallett, S., Omar, O. & Yu, L.-M. Developing risk prediction models for type 2 diabetes: a systematic review of methodology and reporting. BMC Med. 9, 103 (2011).
12. Kengne, A. P. et al. Non-invasive risk scores for prediction of type 2 diabetes (EPIC-InterAct): a validation of existing models. The Lancet Diabetes & Endocrinology 2, 19–29 (2014).
13. Bernabe-Ortiz, A., Perel, P., Miranda, J. J. & Smeeth, L. Diagnostic accuracy of the Finnish Diabetes Risk Score (FINDRISC) for undiagnosed T2DM in Peruvian population. Prim. Care Diabetes 12, 517–525 (2018).
14. Lindström, J. & Tuomilehto, J. The diabetes risk score: a practical tool to predict type 2 diabetes risk. Diabetes Care 26, 725–731 (2003).
15. Meijnikman, A. S., De Block, C. E. M., Verrijken, A., Mertens, I. & Van Gaal, L. F. Predicting type 2 diabetes mellitus: a comparison between the FINDRISC score and the metabolic syndrome. Diabetol. Metab. Syndr. 10, 12 (2018).
16. EPIC Centres - GERMANY. at <https://epic.iarc.fr/centers/germany.php>
17. Schulze, M. B. et al. An accurate risk score based on anthropometric, dietary, and lifestyle factors to predict the development of type 2 diabetes. Diabetes Care 30, 510–515 (2007).
18. Mühlenbruch, K. et al. Update of the German Diabetes Risk Score and external validation in the German MONICA/KORA study. Diabetes Res. Clin. Pract. 104, 459–466 (2014).
19. Eckel, R. H., Grundy, S. M. & Zimmet, P. Z. The metabolic syndrome. Lancet 365, 1415–1428 (2005).
20. Cheng, C.-H. et al. Waist-to-hip ratio is a better anthropometric index than body mass index for predicting the risk of type 2 diabetes in Taiwanese population. Nutr. Res. 30, 585–593 (2010).
21. Jafari-Koshki, T., Mansourian, M., Hosseini, S. M. & Amini, M. Association of waist and hip circumference and waist-hip ratio with type 2 diabetes risk in first-degree relatives. J. Diabetes Complicat. 30, 1050–1055 (2016).
22. Qiao, Q. & Nyamdorj, R.
Is the association of type II diabetes with waist circumference or waist-to-hip ratio stronger than that with body mass index? Eur. J. Clin. Nutr. 64, 30–34 (2010). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ejcn.2009.93&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19724291&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000273563700006&link_type=ISI) 23. 23.Fekete, T. & Sopon, E. Glycaemic control and reticulocyte count in diabetic patients. Horm. Metab. Res. 18, 141 (1986). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1055/s-2007-1012251&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=3699688&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 24. 24.Kontush, A. & Chapman, M. J. Why is HDL functionally deficient in type 2 diabetes? Curr. Diab. Rep. 8, 51–59 (2008). 25. 25.Bitzur, R., Cohen, H., Kamari, Y., Shaish, A. & Harats, D. Triglycerides and HDL cholesterol: stars or second leads in diabetes? Diabetes Care 32 Suppl 2, S373–7 (2009). [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiZGlhY2FyZSI7czo1OiJyZXNpZCI7czoxNToiMzIvc3VwcGxfMi9TMzczIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDcvMDMvMjAyMC4wOC4wMi4yMDE2NTA5Mi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 26. 26.Understanding A1C | ADA. at <[https://www.diabetes.org/a1c](https://www.diabetes.org/a1c)> 27. 27.Diabetes Prevalence 2019 | Diabetes UK. at <[https://www.diabetes.org.uk/professionals/position-statements-reports/statistics/diabetes-prevalence-2019](https://www.diabetes.org.uk/professionals/position-statements-reports/statistics/diabetes-prevalence-2019)> 28. 28.Fry, A. et al. 
Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. Am. J. Epidemiol. 186, 1026–1034 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/aje/kwx246&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28641372&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 29. 29.Hernán, M. A., Hernández-Díaz, S. & Robins, J. M. A structural approach to selection bias. Epidemiology 15, 615–625 (2004). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/01.ede.0000135174.63482.43&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15308962&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000223382600020&link_type=ISI) 30. 30.Alex, F., Alex, G., Bertr, R. G. F., Bertr, T. & THIRION. Scikit-learn: Machine Learning in Python. 31. 31.Rufibach, K. Use of Brier score to assess binary predictions. J. Clin. Epidemiol. 63, 938–9; author reply 939 (2010). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jclinepi.2009.11.009&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20189763&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 32. 32.Davidson-Pilon, C. et al. CamDavidsonPilon/lifelines: v0.24.16. Zenodo (2020). doi:10.5281/zenodo.3937749 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.5281/zenodo.3937749&link_type=DOI) 33. 33.Ke, G. et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. (2017). 34. 34.Lundberg, S. M. & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. (2017). 35. 35.Soranzo, N. et al. Common variants at 10 genomic loci influence hemoglobin A1(C) levels via glycemic and nonglycemic pathways. Diabetes 59, 3229–3239 (2010). 
[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiZGlhYmV0ZXMiO3M6NToicmVzaWQiO3M6MTA6IjU5LzEyLzMyMjkiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wNy8wMy8yMDIwLjA4LjAyLjIwMTY1MDkyLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 36. 36.Walford, G. A. et al. Genome-Wide Association Study of the Modified Stumvoll Insulin Sensitivity Index Identifies BCL2 and FAM19A2 as Novel Insulin Sensitivity Loci. Diabetes 65, 3200–3211 (2016). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiZGlhYmV0ZXMiO3M6NToicmVzaWQiO3M6MTA6IjY1LzEwLzMyMDAiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wNy8wMy8yMDIwLjA4LjAyLjIwMTY1MDkyLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 37. 37.Wheeler, E. et al. Impact of common genetic determinants of Hemoglobin A1c on type 2 diabetes risk and diagnosis in ancestrally diverse populations: A transethnic genome-wide meta-analysis. PLoS Med. 14, e1002383 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pmed.1002383&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28898252&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 38. 38.Tobacco and Genetics Consortium. Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nat. Genet. 42, 441–447 (2010). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.571&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20418890&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000277179500017&link_type=ISI) 39. 39.Morris, G. P. et al. 
Population genomic and genome-wide association studies of agroclimatic traits in sorghum. Proc Natl Acad Sci USA 110, 453–458 (2013). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czo5OiIxMTAvMi80NTMiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wNy8wMy8yMDIwLjA4LjAyLjIwMTY1MDkyLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 40. 40.Scott, R. A. et al. Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nat. Genet. 44, 991–1005 (2012). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2385&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22885924&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 41. 41.Manning, A. K. et al. A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nat. Genet. 44, 659–669 (2012). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2274&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22581228&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 42. 42.Saxena, R. et al. Genetic variation in GIPR influences the glucose and insulin responses to an oral glucose challenge. Nat. Genet. 42, 142–148 (2010). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.521&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20081857&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000274084400011&link_type=ISI) 43. 43.Morris, A. P. et al. 
Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990 (2012). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2383&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22885922&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 44. 44.Strawbridge, R. J. et al. Genome-wide association identifies nine common variants associated with fasting proinsulin levels and provides new insights into the pathophysiology of type 2 diabetes. Diabetes 60, 2624–2634 (2011). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiZGlhYmV0ZXMiO3M6NToicmVzaWQiO3M6MTA6IjYwLzEwLzI2MjQiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wNy8wMy8yMDIwLjA4LjAyLjIwMTY1MDkyLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 45. 45.Kilpeläinen, T. O. et al. Genome-wide meta-analysis uncovers novel loci influencing circulating leptin levels. Nat. Commun. 7, 10494 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ncomms10494&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26833098&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 46. 46.Willer, C. J. et al. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2797&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24097068&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 47. 47.Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015). 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature14177&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25673413&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 48. 48.Berndt, S. I. et al. Genome-wide meta-analysis identifies 11 new loci for anthropometric traits and provides insights into genetic architecture. Nat. Genet. 45, 501–512 (2013). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2606&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23563607&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 49. 49.Boraska, V. et al. A genome-wide association study of anorexia nervosa. Mol. Psychiatry 19, 1085–1094 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/mp.2013.187&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24514567&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000343661700008&link_type=ISI) 50. 50.Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3097&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25282103&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 51. 51.Shungin, D. et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature 518, 187–196 (2015). 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature14132&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25673412&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000349190300030&link_type=ISI) 52. 52.CARDIoGRAMplusC4D Consortium et al. Large-scale association analysis identifies new risk loci for coronary artery disease. Nat. Genet. 45, 25–33 (2013). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2480&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23202125&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 53. 53.den Hoed, M. et al. Identification of heart rate-associated loci and their effects on cardiac conduction and rhythm disorders. Nat. Genet. 45, 621–631 (2013). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2610&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23583979&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 54. 54.Lambert, J. C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2802&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24162737&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) 55. 55.Moffatt, M. F. et al. A large-scale, consortium-based genomewide association study of asthma. N. Engl. J. Med. 363, 1211–1221 (2010). 
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa0906312&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20860503&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F03%2F2020.08.02.20165092.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000282050000005&link_type=ISI)