Genetic risk scores used in cardiovascular disease prediction models: a systematic review

Background: Cardiovascular disease is caused by a combination of genetic and environmental risk factors. Some risk factors can change with age, but a genetic predisposition is permanent. Therefore, identifying genotypes associated with cardiovascular disease and using them alone or in combination with existing risk algorithms can improve risk prediction. This systematic review was conducted to examine existing studies on predictive models for cardiovascular disease that use genetic risk scores and to determine their clinical utility. Methods: An electronic database search was conducted to identify studies published from January 2005 to July 2020. The literature search was performed using the search terms “coronary artery disease”, “coronary heart disease”, “cardiovascular diseases”, “genetic risk score”, and “polygenic risk score”. Results: Through systematic review, 29 studies were identified. In most studies, the genetic risk score was associated with the incidence of cardiovascular disease. In 23 studies, clinical utility was improved based on discrimination between or reclassification of subjects who did and did not experience an event, but the improvement was modest. Conclusions: The predictive model for cardiovascular disease using genetic risk scores has limited usefulness in clinical practice due to methodological heterogeneity in how genetic risk scores are constructed. Further research to develop a standardized protocol for constructing genetic risk scores and validation studies with various cohorts from diverse populations are required.


Introduction
Cardiovascular disease (CVD) is the leading cause of mortality worldwide, causing an estimated 17.9 million deaths each year [1]. Early and accurate identification of individuals at high risk of CVD facilitates timely prevention and treatment and can lower public health costs by reducing unnecessary disease burden [2].
Conventional risk scores such as the Framingham Risk Score (FRS) [3], the American College of Cardiology/American Heart Association 2013 risk score (ACC/AHA13) [4], and QRESEARCH cardiovascular risk (QRISK1 and QRISK2) [5] have been developed and used in clinics. Conventional risk scores are useful for both the individual and the clinician because they identify individuals at increased risk of future cardiovascular events and help them select appropriate lifestyle modifications and preventive medical treatment [6]. However, these risk scores focus on relatively short-term risk (5-10 years), which is insufficient to identify people with subclinical disease [7,8]. In particular, conventional risk scores might not identify younger individuals who would be likely to attain long-term benefits [9].
The cause of CVD is a combination of genetic and environmental risk factors [10]. Some risk factors can change with age; however, genetic predisposition is permanent. Therefore, identifying genotypes for CVD and using them alone or in combination with existing risk algorithms can improve risk prediction [11]. To associate genotype and phenotype, genetic researchers have performed many genome-wide association studies (GWAS) and have made significant advances in identifying CVD-associated genetic variations/single nucleotide polymorphisms (SNPs) [12]. In 2007, the use of multi-locus genetic risk scores (GRSs) was proposed to integrate the relatively small effects of individual genes and to improve the accuracy of conventional risk scores [13]. Because GRSs allow analysis of high genetic risk at any age, people at higher risk for the disease can be identified before clinical signs appear [14]. Therefore, using GRSs for CVD prediction can help detect and prevent disease earlier.
The clinical utility of GRSs for CVD prediction depends not on the strength of their association with typical CVDs, but on their capability to predict future CVD events [15]. Although research on GRSs for CVD has been progressing, the effects of GRSs on clinical decision-making are unclear and the predictive power remains limited. Comparisons of models in which GRSs were added to well-established and validated risk prediction models are scarce [16] and have shown mixed results. Because information on the current development of CVD prediction models using GRSs and comprehensive evaluation of those models are lacking, systematic efforts are needed. Therefore, in the present study, the methodological characteristics of individual studies were identified, and the clinical utility of the GRS prediction model was evaluated by systematic review of the literature on CVD prediction models using GRSs.

Methods
This review was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline [17].

Search strategy
To retrieve published studies for this review, systematic searches were conducted using three electronic databases: PubMed, Embase, and SCOPUS. We attempted to reconcile the definition of CVD prior to selecting a search term. CVD is a general term for conditions affecting the heart or blood vessels. The major contributing factor to CVD is atherosclerosis, which is narrowing of the arteries resulting from subendothelial deposition of cholesterol, cholesterol esters, and calcium within the vessel walls. Rupture of atherosclerotic plaques yields blood clots that result in myocardial infarction or stroke [18,19]. Based on these mechanisms, CVD was defined as a composite of coronary heart death, stable or unstable angina, fatal or non-fatal myocardial infarction, coronary artery bypass grafting or percutaneous coronary intervention, and ischemic stroke events. Depending on the type of CVD in the included study, coronary artery disease (CAD) or coronary heart disease (CHD) was sometimes used instead of CVD, and all terms were included in the search. The following MeSH terms and keywords were used: ("coronary artery disease" OR "coronary heart disease" OR "cardiovascular diseases" OR "ischemic heart disease" OR "angina pectoris" OR "myocardial infarction" OR "stroke") AND ("genetic risk score" OR "polygenic risk score" OR "genomewide association study"). Studies were searched from January 2005, when a public database of common variations in the human genome was reported, to July 2020. To identify additional studies, the bibliography of each included study was searched manually.

Selection criteria
After elimination of duplicates, studies related to the subject were screened through titles and abstracts. Next, studies were selected by full-text review based on the inclusion and exclusion criteria. Three authors independently selected the studies, and inconsistencies were resolved through discussion.
The inclusion and exclusion criteria were as follows: (1) The study population was adults dwelling in the community. Studies in which subjects were recruited from hospitals or clinical trials were excluded. Studies focusing on animals, children, and diseased adults were excluded. Although some studies began when subjects were children, studies in which disease occurred in adulthood were included. (2) After the observation period, only those outcomes referring to CVD were considered. However, hemorrhagic stroke was excluded due to differences in the underlying pathology. (3) Studies in which GRSs were used to predict CVD were included. All studies using GRSs consisting of direct and intermediate risk factors for CVD were included, and studies that used only a single SNP or reported only an association between GRS and CVD were excluded. (4) Studies were published in academic journals in English.

Data extraction
Two authors (HY and NIN) extracted data using a standardized form. To increase the accuracy of coding and data entry, the other author (EYL) independently verified all extracted data. The following items were extracted: study characteristics (cohort name, ethnic group, sample size, age [mean], % of females, incident CVD, and follow-up period); development of a GRS (reference for SNP selection, selected phenotypes, number of SNPs used in GRS construction, and GRS calculation method); and evaluation of GRSs to predict CVD risk (base model for comparison with GRSs, whether family history was included in the base model, association between GRS and CVD, risk discrimination and reclassification to determine clinical utility of the GRS for CVD).

Assessment of risk of bias
Each selected study was assessed for risk of bias using the Risk of Bias Assessment Tool for Non-randomized Studies (RoBANS 2.0) developed in 2011 by Health Insurance Review and Assessment Service (Wonju-Si, Republic of Korea) [20]. Evaluation items included target group comparability, target group selection, confounding variables, measurement of exposure, evaluator blinding, evaluation of results, incomplete data, and selective outcome reporting. Individual studies were rated as "high", "low", or "unclear" with respect to bias and were assessed independently by the three authors. Where inconsistency was noted, a consensus was reached through discussion among the authors.

Results
The search retrieved 27,485 studies; 28 studies were selected based on the inclusion and exclusion criteria and one study was added from the manual search. Fig. 1 shows the search and selection processes.
Table 1 (Ref.[9,13,15,) shows the characteristics of the 29 studies. The sample size of the cohort used in the analysis varied from 1306 to 482,629 (median 6041). Studies in which prediction of CVD using GRSs was conducted primarily included Caucasian or European-ancestry populations, and only three studies included Asians (included as a category in one study).
Table 2 (Ref.[9,13,15,) shows how GRSs were developed in the reviewed studies. To construct GRSs, most studies have used large-scale data such as the National Human Genome Research Institute (NHGRI) catalog or large international consortia, such as CARDIoGRAMplusC4D (Coronary ARtery DIsease Genome-wide Replication and Meta-analysis plus The Coronary Artery Disease Genetics consortium). The number of SNPs used in GRSs ranged from 8 to 6,630,149. In the past, only a limited number of SNPs associated with CVD or intermediate risk factors were used. In recent years, however, millions of SNPs have been used for GRS calculations. In 19 studies, GRSs were calculated by assigning weights based on the effect of each SNP, and in six studies, a simple count of the total number of risk alleles was used. Three studies analyzed both weighted and unweighted values.

Table 2. Development of genetic risk scores.
Study; Reference for SNP selection; Phenotype; Number of SNPs; GRS calculation method
[44]; GWAS consortium; CAD; 1,745,180; Weighted
Liu R. (2019) [45]; GWAS; CAD; 267; Weighted
Mosley J.D. (2020) [15]; GWAS consortium; CHD; 6,630,149; Weighted
Elliott J. (2020) [46]; Cplus4D consortium; CAD; 1,037,385; Weighted
a NHGRI catalog: The NHGRI-EBI Catalog of human genome-wide association studies.
b GWAS consortium: combining two or more of "International Consortium for Blood Pressure GWAS" or "Global BPgen" or "The Cohorts for Heart and Aging Research in Genomic Epidemiology" or "Myocardial Infarction Genetics" or "Coronary Artery Disease".

Table 3 (Ref.[9,13,15,) shows the comparison between base models and models with added GRS. In most base models, covariates included age, systolic blood pressure, total cholesterol, high-density lipoproteins, diabetes, and smoking status. In addition, sex, body mass index, diastolic blood pressure, lipid-lowering and antihypertensive agents, and serum markers were added or omitted depending on the existing risk score used (data not shown). In 17 studies, family history was included in the base model. An association between incidence of CVD and GRS was apparent in 27 studies. In a number of studies, the C-statistic, the net reclassification index (NRI), or the integrated discrimination index (IDI) was calculated to assess improvement between the base models and models with added GRS. In examination of the C-statistic, discrimination was improved in some or all models in 18 studies. NRI results showed improved risk reclassification in some or all models in 17 studies. In two of those, classification improved only in the intermediate risk group. In 11 studies, clinical utility was confirmed by showing improvements in both discrimination and reclassification. The C-statistic value that discriminated CVD was 0.650 to 0.880 (mean [± SD]: 0.751 [± 0.057], median: 0.747) in the base models and 0.640 to 0.881 (mean [± SD]: 0.756 [± 0.057], median: 0.753) in models with added GRS. Increments ranged from -0.030 to 0.043 (mean [± SD]: 0.006 [± 0.010]).
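The discrimination and reclassification measures reported above can be illustrated with a minimal Python sketch. It is not the method of any reviewed study: it assumes a binary outcome with no censoring (many cohort studies instead use a survival-based C-statistic), the risk cutoffs of 10% and 20% are hypothetical, and the eight subjects are invented toy data.

```python
from itertools import product


def c_statistic(risks, events):
    """Share of (event, non-event) pairs in which the event subject has the
    higher predicted risk; tied pairs count as 0.5. Binary outcome assumed."""
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for r_case, r_control in product(cases, controls):
        if r_case > r_control:
            concordant += 1.0
        elif r_case == r_control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


def risk_category(risk, cutoffs):
    """Index of the risk category: number of cutoffs at or below the risk."""
    return sum(risk >= c for c in cutoffs)


def categorical_nri(base_risks, new_risks, events, cutoffs):
    """Categorical NRI: net upward moves among events plus net downward
    moves among non-events, each as a proportion of its group."""
    up_e = down_e = up_n = down_n = 0
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    for base, new, event in zip(base_risks, new_risks, events):
        move = risk_category(new, cutoffs) - risk_category(base, cutoffs)
        if event == 1:
            up_e += move > 0
            down_e += move < 0
        else:
            up_n += move > 0
            down_n += move < 0
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


# Invented toy data: 1 = CVD event, 0 = no event.
events = [1, 1, 1, 0, 0, 0, 0, 0]
base = [0.22, 0.08, 0.12, 0.15, 0.06, 0.04, 0.11, 0.03]  # base model risks
new = [0.25, 0.11, 0.12, 0.09, 0.06, 0.04, 0.12, 0.03]   # base model + GRS
cutoffs = (0.10, 0.20)  # hypothetical category boundaries

c_base = c_statistic(base, events)
c_new = c_statistic(new, events)
nri = categorical_nri(base, new, events, cutoffs)
print(f"C-statistic: base={c_base:.3f}, with GRS={c_new:.3f}")
print(f"Increment: {c_new - c_base:.3f}, NRI: {nri:.3f}")
```

The increment in the C-statistic corresponds to the quantity summarized in the text, whereas the NRI depends on the chosen category boundaries, which is one reason results are difficult to compare across studies.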
On assessment of risk of bias in each selected study, one study showed high bias in the 'incomplete data' category and another study showed high bias in the 'selective outcome reporting' category. However, we included the two studies because the bias was insufficient to question the quality of the study results.

Discussion
This study was conducted to systematically review existing studies in which GRSs were used for CVD prediction and to determine the clinical relevance. Based on a systematic review process, 29 studies were identified. The GRSs developed in the reviewed studies were associated with incidence of CVD. A total of 23 studies showed clinical utility by improving discrimination or reclassification between subjects who did and did not experience an event.
The association between incidence of CVD and GRS implied that a genetic signal was present among the selected markers and that GRSs can be used to predict individual traits. Although individual SNPs have minimal effect on CVD prediction, GRSs containing multiple SNPs potentially can be a strong predictor of disease [47].
In the present study, the ability of GRS predictive models to discriminate was improved, but the improvement was modest. The first procedure in generating a GRS is 'variable selection' to determine which SNPs should be included in the model [48]. In GWAS for CVD, 163 loci were reported through 2018, after the discovery of the chromosome 9p21 risk locus in 2007. In addition, over 300 additional loci with false discovery rate values <5% indicate CAD risk and might be useful for improving CAD risk prediction [49]. In several studies, genomic predictive models that consider all accessible genetic variants were shown to identify individuals at high risk of complex diseases more efficiently [50,51]. More recently, very large GRSs have been constructed using more than 1 million SNPs. However, because large GRSs included many SNPs below the genome-wide significance threshold for association with CVD, many SNPs might not contribute to the explanatory power of GRSs [15]. Our review shows a recent trend toward constructing GRSs using a large number of SNPs, but there has been no noticeable trend in predictive ability. These inconsistent results prevent prediction of the number of SNPs required for accurate and robust GRSs for CVD. Although there are a number of challenges in this regard [52], due to the genetic structure of CVD, much larger sample sizes will be required to detect a sufficiently large number of variants to make meaningful contributions to risk prediction models and to construct useful predictive risk scores [53]. The selection of the best set of true susceptibility SNPs to increase the impact of GRSs on clinical decision-making is likely to be stable only after GWAS have reached huge sample sizes containing hundreds of thousands of individuals [54].
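The variable-selection step and the two calculation methods found in the reviewed studies (a simple count of risk alleles and an effect-size-weighted sum) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure of any particular study: the SNP identifiers, effect sizes, and p-values are invented, effects are assumed to be log odds ratios oriented to the risk allele, and 5e-8 is the conventional genome-wide significance threshold rather than a value taken from this review.

```python
GENOME_WIDE_P = 5e-8  # conventional threshold; an assumption, not from the review


def select_snps(summary_stats, p_threshold=GENOME_WIDE_P):
    """Variable selection: keep SNPs whose GWAS p-value is below the
    threshold. summary_stats maps SNP id -> (effect size, p-value)."""
    return {
        snp: beta
        for snp, (beta, p_value) in summary_stats.items()
        if p_value < p_threshold
    }


def unweighted_grs(genotype, weights):
    """Simple count of risk alleles (0, 1, or 2 per SNP) over selected SNPs."""
    return sum(genotype[snp] for snp in weights)


def weighted_grs(genotype, weights):
    """Risk-allele count of each selected SNP multiplied by its effect size."""
    return sum(beta * genotype[snp] for snp, beta in weights.items())


# Hypothetical GWAS summary statistics: SNP id -> (log odds ratio, p-value).
summary_stats = {
    "snp_a": (0.25, 1e-12),
    "snp_b": (0.10, 3e-9),
    "snp_c": (0.05, 2e-4),  # below genome-wide significance
    "snp_d": (0.08, 4e-8),
}
# Hypothetical individual: number of risk alleles carried at each SNP.
genotype = {"snp_a": 2, "snp_b": 1, "snp_c": 2, "snp_d": 0}

strict = select_snps(summary_stats)                      # significant SNPs only
lenient = select_snps(summary_stats, p_threshold=1e-3)   # relaxed threshold

for label, weights in (("strict", strict), ("lenient", lenient)):
    print(
        label,
        len(weights),
        unweighted_grs(genotype, weights),
        round(weighted_grs(genotype, weights), 3),
    )
```

Relaxing the threshold admits SNPs with small, uncertain effects; in the weighted score they add little, which is consistent with the observation that very large GRSs do not necessarily gain explanatory power.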
The choice of phenotypes for deriving SNPs to be used for GRSs should be considered. GRSs can be constructed using SNPs that are clinically disease-associated [55]. The current review included studies in which GRSs consisted of only CVD-associated SNPs, only intermediate risk factor-associated SNPs, or a combination of the two types of SNPs. Although GRSs consisting of only CVD-associated SNPs as well as CVD plus intermediate risk factor-associated SNPs showed improvement in discrimination over conventional risk scores, GRSs consisting of only CVD-associated SNPs were the best predictor of CVD. These results indicate that intermediate risk factor-associated SNPs do not improve the prediction of CVD. Although intermediate traits might not be useful for predicting individual future risk of CVD, studies in which the risk variants linked to underlying causal genes are evaluated could identify new therapeutic targets for preventing disease [49]. Early discrimination of dyslipidemia patients with or without CVD can lead to timely treatment with lipid-lowering drugs and consequently lower the risk of CVD to a level equivalent to that of the general public [56]. Therefore, further studies are needed to identify SNPs strongly associated with symptoms, because SNPs associated with intermediate risk factors might be useful for explaining variations in the subclinical phenotype [57].
A family history is not always a risk factor, but can be easily identified [58]. According to the American Cholesterol Guidelines, family history of CVD is a relative indicator in the evaluation for primary preventive support to reinforce statin recommendations [59]. In several studies, family history was included in the base model to identify genetic associations. However, inclusion of family history did not lead to a difference in predicted values. Phenotypes are the result of both genetic and environmental interactions [60]. Significant risk factors due to the shared nature of family genes should be elucidated in future genetic studies.
Furthermore, studies in which CVD was predicted using GRSs mainly included subjects of Caucasian or European ancestry and only three studies included Asians. The predictive capacity and diagnostic accuracy of GWAS findings show biases when GRSs derived from European-based GWAS are tested in non-European cohorts. Until recently, 80% of the subjects involved in genetic studies were of European ancestry, 14% were Asians, and 6% were others [61]. The involvement of individuals from diverse ethnicities in medical genomics is needed to evaluate the link between disease and related genetic variants for various populations for use in generalized risk prediction models [62].
The current review was limited to participants without CVD or intermediate risk disease in a population-based cohort because hyperselection of patients with these diseases can overestimate the effect size and prediction value [63]. Although meta-analysis is the best evidence-based method to confirm the clinical utility of predictive models, such research has not been possible due to the methodological heterogeneity of GRSs between studies. Studies published after the literature search in 2020 were not included, and studies whose eligibility could not be determined from the abstract and title were likely excluded, which is another limitation. Despite these limitations, current development of CVD prediction models using GRSs and comprehensive evaluation based on these models were systematically reviewed to emphasize the use of genetic information in predictive models for CVD traits. The findings would be helpful for future investigations and clinically useful if considered in the appropriate context.

Conclusions
Based on the results obtained in this review, GRSs were a significant predictor of CVD, and the predictive ability was improved but modest compared with traditional models. However, the methodological heterogeneity was too high to use the model as a guideline. The slight improvement and methodological heterogeneity of the predictive model limit the generalization of GRSs as predictors and the application of GRS predictive models in clinical practice. Therefore, further research is needed to develop a standardized protocol for constructing GRSs and to validate the findings in various cohorts from diverse populations.


Fig. 1. PRISMA flow chart for inclusion in the systematic review. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Table 3. Evaluation of genetic risk score for predicting cardiovascular disease risk.
GRS, Genetic risk score; CVD, Cardiovascular disease; HR, Hazard ratios; ACRS, ARIC CHD risk score; ATPIII, Adult treatment panel III; NS, not significant; TRF, Traditional risk factor; CHD, Coronary heart disease; MI, Myocardial infarction; FRS, Framingham heart study risk score; NA, not available; SBP, systolic blood pressure; DBP, diastolic blood pressure; ACS, Acute coronary syndrome; TC, Total cholesterol; LDL, Low density lipoproteins; CAD, Coronary artery disease; ACC/AHA, American College of Cardiology/American Heart Association risk score; QRISK2, QRESEARCH cardiovascular risk 2014 version.