Candidate SNP Markers of Atherogenesis Significantly Shifting the Affinity of TATA-Binding Protein for Human Gene Promoters Show Stabilizing Natural Selection as a Sum of Neutral Drift Accelerating Atherogenesis and Directional Natural Selection Slowing It

(1) Background: The World Health Organization (WHO) regards atherosclerosis-related myocardial infarction and stroke as the main causes of death in humans. Susceptibility to atherogenesis-associated diseases is caused by single-nucleotide polymorphisms (SNPs). (2) Methods: Using our previously developed public web-service SNP_TATA_Comparator, we estimated the statistical significance of SNP-caused alterations in TATA-binding protein (TBP) binding affinity for 70 bp proximal promoter regions of human genes clinically associated with diseases syntonic or dystonic with atherogenesis. Additionally, we did the same for several genes related to the maintenance of mitochondrial genome integrity, in line with present-day active research aimed at retarding atherogenesis. (3) Results: In dbSNP, we found 1186 SNPs that, according to our estimates, alter this affinity to the same extent as clinical SNP markers do. In particular, clinical SNP marker rs2276109 can prevent autoimmune diseases via reduced TBP affinity for the human MMP12 gene promoter and, therefore, via macrophage elastase deficiency, which is a well-known physiological marker of accelerated atherogenesis that could be retarded nutritionally using dairy fermented by lactobacilli. (4) Conclusions: Our results uncovered SNPs near clinical SNP markers as the basis of neutral drift accelerating atherogenesis, and SNPs of genes encoding proteins related to mitochondrial genome integrity and of microRNA genes associated with instability of the atherosclerotic plaque as the basis of directional natural selection slowing atherogenesis. Their sum may amount to stabilizing natural selection that sets the normal level of atherogenesis.


Introduction
Atherosclerosis is an inflammatory disorder of arteries that can lead to myocardial infarction and stroke (i.e., the two most frequent causes of death in humans according to the World Health Organization (WHO) [1]). The current conventional view is that low-density lipoprotein accumulation near the inner vessel wall is the precondition for atherogenesis initiation [2]. Next, monocytes surround such lipoprotein clusters to absorb them and differentiate into macrophage foam cells, which can temporarily take up excess lipoproteins and return them when lipoproteins are in deficit, thereby maintaining homeostasis, until such cells die over time and build these clusters up into fatty streaks [3]. Such streaks can gradually transform first into thrombogenic bands, then into their fibrous agglomerates, and finally into atherosclerotic plaques, which can become calcified and turn into inflammation hotbeds within blood vessels [4]. Eventually, this process leads to thrombosis of the artery and, as a result, to the death of the affected tissue [5], in particular, to myocardial infarction or stroke [1]. Clinical observations indicate that atherogenesis is a nonmonotonic, slow, step-by-step cyclic process developing throughout the lifespan, mainly postprandially (i.e., after a meal [6]), during acute infectious events [7], and near an injury site in the vessel endothelium during hypertension [2,4]. Indeed, precursors of fatty streaks in an embryo have been clinically documented [8], whereas atherosclerosis is mostly a disease of old age [1]. Thus, at any given moment, the current atherogenic status of an individual reflects all his/her living conditions, lifestyle, and diseases in previous years, which can occasionally accelerate or retard atherogenesis depending on his/her atherogenesis susceptibility in accordance with his/her individual genome [2].
Therefore, the sequenced genome of an individual makes it possible to increase his/her life expectancy by slowing his/her atherogenesis if this person chooses living conditions and a lifestyle that minimize dietary fat, infectious diseases, extreme physical exertion, injuries, foreign substances in the blood, and the risks of comorbidities related to atherosclerosis. This is what predictive-preventive personalized participatory (4P) medicine can already do [9]. Its keystone is the top scientific project of the 21st century: "1000 Genomes" [10]; under this project, scientists have already sequenced thousands of individual genomes and assembled them into a reference human genome (i.e., Ensembl [11]) and variome (i.e., dbSNP [12] containing hundreds of millions of single-nucleotide polymorphisms [SNPs]) and made them publicly available via the UCSC Genome Browser [13]. Additionally, to help physicians deal with the individual genomes of their patients, scientists working on the dbWGFP database [14] continuously search for, compile, systematize, and prioritize any available information on each of the 10¹⁰ potential SNPs throughout the human genome, as do the researchers behind two other databases, ClinVar [15] and OMIM [16], which contain only clinically documented and only well-studied SNPs, respectively.
According to OMIM data [16], the absolute majority of well-studied SNPs are within protein-coding regions of human genes, where they invariably damage protein structure and/or functions throughout the human body. These SNPs seem easy to detect but cannot be neutralized either therapeutically or by changing living conditions or lifestyle (see, e.g., [17]) without repair of the damage to gene sequences; such repair is only beginning to be studied in laboratory animals and is widely discussed hypothetically in relation to humans. In contrast, a negligible percentage of well-studied SNPs are within regulatory gene regions [16,18], where, without damaging a protein, they can modulate only gene expression levels, which vary from cell to cell, from tissue to tissue, and so on for many reasons; these SNPs are very hard to detect but easy to neutralize by medications, living conditions, and lifestyle. For example, exogenous insulin successfully helps those with hypoinsulinemia who have no insulin resistance and who try to lead a lifestyle that does not provoke acute aggravation of this disorder. Of note, the best-studied regulatory SNPs are often located within a 70 bp proximal core promoter [19], where they modulate both TATA-binding protein (TBP) binding affinity for these promoters and transcription activity of these promoters (these parameters are proportionally related [20]), thereby acting as indispensable switches between inactive nucleosome packages and active preinitiation complexes [21].
The key concept of 4P medicine [9] is a clinical SNP marker of a given disease; alleles of this SNP make it possible to distinguish, with statistical significance, between representative cohorts of patients with this disease and conventionally healthy volunteers. This is the only acceptable criterion because it quantifies the expected likelihood of a medical error associated with each clinical SNP marker used (see, e.g., [22]). Nonetheless, it takes too much time, manual labor, and money to test how each of the 10¹⁰ human SNPs [14] affects each of 55,000 diseases (see ICD-11 [23]), and why should we, if Kimura's theory [24] and Haldane's dilemma [25] predict no phenotypic manifestations for the vast majority of such SNPs?
As a first step, why not try cheap and fast bioinformatic genome-wide calculations to find this vast majority of neutral SNPs in humans? This approach should exclude them from clinical studies, which consequently could become more targeted and less expensive. Although the accuracy of current genome-wide computational predictions remains below the applicability threshold for clinical SNP tests [26], this accuracy increases every year [27][28][29][30][31][32].
In our previous studies, we first measured the TBP-promoter affinity in vitro [33][34][35], next revealed partial correlations "sequence => activity" in silico [34], and then generalized them as the three-step TBP-promoter binding process (i.e., TBP slides along DNA <=> TBP stops at a TBP-binding site (TBP-site) <=> the TBP-promoter complex is fixed by a 90° bend of DNA [36]) as observed in vitro [37]. After that, we verified our three-step model using all 68 independent experimental datasets that we could find within the PubMed database [38] (for review, see [39]) as well as using our own experiments under equilibrium [40], nonequilibrium [41], and real-time conditions [42][43][44] in vitro. On this basis, we created our publicly available Web service SNP_TATA_Comparator [45], whose input data are two DNA sequences, namely, one for the ancestral allele of the promoter under study and the other for the minor allele of this promoter. Using these input data, SNP_TATA_Comparator calculates TBP binding affinity estimates for the two corresponding promoter variants and the standard errors of these estimates as well as the statistical significance of the difference between them according to the Fisher Z-test [45]. Next, we validated selected predictions of SNP_TATA_Comparator [45] in our own experiments ex vivo by means of human cell lines transfected with the pGL4.10 vector carrying a reporter LUC gene under the control of a promoter containing the SNP in question (for review, see [46]). In this way, we have repeatedly confirmed that an increase in the TBP-promoter affinity estimate predicted by our SNP_TATA_Comparator [45] statistically significantly corresponds to an increase in the expression level of the reporter gene regulated by this promoter and vice versa [46].
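The comparison step described above can be illustrated by a minimal Python sketch. It is not the SNP_TATA_Comparator implementation: it assumes only that each allele yields an affinity estimate on a scale where larger values mean stronger TBP binding (for example, a negative logarithm of the dissociation constant) together with its standard error, and it applies a generic two-sided Z-test to their difference. The function names, the 0.05 threshold, and the numeric values are hypothetical.

```python
import math

def z_test(est_wt, se_wt, est_min, se_min):
    """Generic two-sided Z-test for the difference of two estimates.

    est_* are affinity estimates (assumed scale: larger = stronger binding),
    se_* are their standard errors. Returns (z, p).
    """
    z = (est_min - est_wt) / math.sqrt(se_wt ** 2 + se_min ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail of N(0, 1)
    return z, p

def expression_change(est_wt, se_wt, est_min, se_min, alpha=0.05):
    """Classify the minor allele relative to the ancestral one.

    The threshold alpha is an illustrative assumption.
    """
    z, p = z_test(est_wt, se_wt, est_min, se_min)
    if p >= alpha:
        return "no significant change"
    return "increase" if z > 0 else "decrease"

# Hypothetical numbers, not taken from the paper.
z, p = z_test(19.0, 0.2, 18.0, 0.2)
print(round(z, 3), p, expression_change(19.0, 0.2, 18.0, 0.2))
```

In this sketch, a significant drop in the affinity estimate is reported as "decrease", in line with the correspondence between TBP-promoter affinity and reporter gene expression stated above.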
Finally, we applied this software to predict candidate SNP markers of resistance to anticancer therapy [47], obesity [48], and autoimmune [49] and Alzheimer's [50] diseases as well as markers of nonclinical aberrations in humans such as circadian rhythm disturbances [51], aggressiveness [52], female reproductive potential anomalies [53], and domination and subordination within a social hierarchy [54]. In these works of ours, we studied only SNPs located within 70 bp proximal promoter regions in front of transcription start sites of the human genes because TBP-sites are necessary here [21]. Among such TBP-binding promoter regions, the most valuable for us are those that contain known clinically proven SNP markers of a human pathology that reliably change TBP-binding affinity for the corresponding promoters and expression levels of the human genes regulated by these promoters in accordance with the clinically confirmed markers of these pathologies. This information allows us to most reliably predict clinical manifestations of other SNPs in such a promoter region that alter TBP-binding affinity for this promoter as do clinically proven SNP markers located within this promoter (according to our calculations using SNP_TATA_Comparator [45]). Therefore, we constantly update our collection of such clinically proven SNP markers of human pathologies on the basis of our curated search for relevant articles in the PubMed database [38] as well as other publicly available databases (for example, ClinVar [15]), and we have published its updates in many studies depending on the human pathologies considered there.
Our recent study on this topic concerned 34 candidate SNP markers of atherosclerosis [55] that are located around the clinical SNP markers in TBP-sites of 17 human gene promoters. These clinical SNP markers seem to be subject to neutral drift as genetic loads, according to the statistical estimates from our previous works [49][50][51][54] within the framework of Kimura's theory [24] and Haldane's dilemma [25] using a previous dbSNP build (No. 147 dated 2016) [12].
Here, we followed the same approach by means of current dbSNP build No. 151 dated 2017 [12] (i.e., almost four times the size of the previous build, No. 147) to compare our results updated in this way with our above-mentioned previous results [55]. This approach allowed us to examine the robustness of genome-wide patterns [49-51,54] of the candidate SNP markers predicted by our SNP_TATA_Comparator [45] in the face of annual growth of the SNP count.
Finally, we analyzed several genes encoding proteins related to mitochondrial genome integrity [56] and microRNA (miRNA, miR) genes associated with instability of the atherosclerotic plaque [57] that are currently widely studied regarding the slowing of atherogenesis.
In addition, with the help of our SNP_TATA_Comparator [45], we analyzed all the 464 and 81 SNPs in question within all known promoters of four protein-coding genes and four miRNA genes, respectively, which comprehensively exemplify two gene networks related to the maintenance of mitochondrial genome integrity [56] and instability of the atherosclerotic plaque [57] in humans. As a result, we predicted 77 and 16 candidate SNP markers of this disorder and characterized them (rows 5 and 6 of Table 1).
We will describe in depth only our predictions regarding human gene CETP, whose promoter contains the only clinically proven SNP marker of atherosclerosis, so that we can then review all our other predictions briefly in a similar way.
Finally, we will evaluate them all as a whole regarding their statistical significance with respect to the widely accepted genome-wide patterns of SNP occurrence.

The Only Clinically Proven SNP Marker of Atherosclerosis in a TBP-Site of a Human Gene Promoter
Human gene CETP (plasma lipid transfer protein) carries the only known clinical SNP marker rs1427119663 (i.e., an 18 bp deletion, 5′-G₇₁GGCGGACATACATATAC₅₄-3′, containing the TBP-site within the 70 bp proximal promoter region of this gene: a frame and a double-headed red arrow, respectively, in Figure 1a), which reduces CETP expression and thus retards atherogenesis [58], as depicted by the "↓" symbol (down arrow) in Table 2. Figure 1 illustrates how we predicted this rs1427119663-dependent CETP underexpression: we retrieved the proper input data from (a) the UCSC Genome Browser [13] via (b) the dbSNP database [12] and entered these data into (c) our SNP_TATA_Comparator [45], which processed them using (d) the standard bioinformatics-related software R [59], as indicated by the arrows there. This correct in silico prediction (Figure 1) of our SNP_TATA_Comparator [45] in the case of the only known clinical SNP marker of atherosclerosis (Table 2) indicates its applicability to atherogenesis research.
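As a minimal sketch of how the two input sequences for such a deletion could be prepared, the following Python code derives a minor-allele sequence from an ancestral one by removing the deleted segment. The 18 bp segment is the one reported for CETP in this paper; the flanking sequence is a placeholder made of `n` characters rather than the real promoter, and the function name `apply_deletion` is hypothetical.

```python
def apply_deletion(ancestral: str, deleted: str) -> str:
    """Return the minor allele obtained by removing `deleted` from `ancestral`.

    The segment must occur exactly once so that the result is unambiguous.
    """
    anc = ancestral.lower()
    motif = deleted.lower()
    if anc.count(motif) != 1:
        raise ValueError("deleted segment must occur exactly once")
    start = anc.index(motif)
    return anc[:start] + anc[start + len(motif):]

# 18 bp deletion reported for CETP (rs1427119663) in this paper.
DELETED = "gggcggacatacatatac"

# Placeholder flanks (assumption): the real promoter sequence would be
# retrieved from a genome browser, not typed in like this.
LEFT_FLANK = "n" * 10
RIGHT_FLANK = "n" * 10

ancestral = LEFT_FLANK + DELETED + RIGHT_FLANK
minor = apply_deletion(ancestral, DELETED)
print(len(ancestral), len(minor))
```

The pair (`ancestral`, `minor`) corresponds to the two DNA sequences that a tool comparing alleles would take as input.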
Near the known SNP marker (rs1427119663), there are 22 SNPs (Figure 1a), none of which has been characterized so far regarding any link to human diseases (for brevity, here we will refer to such SNPs as unannotated). Among these 22 unannotated SNPs, using SNP_TATA_Comparator [45], we predicted four SNPs (e.g., rs1002690375) that can cause CETP overexpression as a physiological marker of accelerated atherogenesis [60]. Accordingly, we can suggest four candidate SNP markers of accelerated atherogenesis, which are italicized in Table 2. To show the reader the effect of each predicted candidate SNP marker on gene expression in comparison either to the clinical SNP marker used or to one another, we prioritized these predictions heuristically, converting the p-value of Fisher's Z-test into heuristic rank ρ-values, which vary in alphabetical order from "A" (the best) to "E" (the worst; Table 2: column ρ). In the example considered, we designated the known clinically proven SNP marker rs1427119663 as "the best" (A), whereas another SNP (candidate marker rs569033466) received a lower rating, "B".
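The ranking idea can be sketched in a few lines of Python. The text specifies only that ρ runs from "A" (the best) to "E" (the worst) and is derived from the p-value of the Z-test; the cutoffs below are purely illustrative assumptions and are not the thresholds used in the study, and the function name `heuristic_rank` is hypothetical.

```python
from bisect import bisect_right

# Illustrative cutoffs only (assumption): four boundaries give five bins A..E.
ILLUSTRATIVE_CUTOFFS = (1e-6, 1e-4, 1e-3, 1e-2)

def heuristic_rank(p_value, cutoffs=ILLUSTRATIVE_CUTOFFS):
    """Map a p-value to a letter rank: smaller p-value, better rank."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # bisect_right counts how many cutoffs are <= p_value.
    return "ABCDE"[bisect_right(cutoffs, p_value)]

for p in (1e-7, 1e-5, 5e-4, 5e-3, 0.5):
    print(p, heuristic_rank(p))
```

Keeping the cutoffs in a single tuple makes it easy to substitute the thresholds actually used for a given table.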
Finally, using a PubMed keyword search [38], we found that swimming can slow atherogenesis down [61].
Notes. Atherogenesis: accelerated (↑) and slowed down (↓). N_GENE and N_SNP: total numbers of the human genes and of their SNPs meeting the criteria of this study. N_RES: the total number of the candidate SNP markers predicted in this work that can increase (n_>) or decrease (n_<) the affinity of TATA-binding protein (TBP) for these promoters and, respectively, affect the expression of these genes. n_↑ and n_↓: the total numbers of the candidate SNP markers that can accelerate or slow down atherogenesis, respectively. P(H₀): the estimate of probability for the acceptance of this H₀ hypothesis, according to a binomial distribution. TBP-site: TATA-binding protein-binding site.
Notes. Hereinafter, Alleles: wt, ancestral; min, minor; "-", deletion. K_D, dissociation constant of the TBP-DNA complex; α = 1 − p, significance (where the p value is given in Figure 1); Gene expression changes (∆): an increase (>) and decrease (<); Atherogenesis (AS): accelerated (↑) and slowed down (↓); ρ, heuristic rank of candidate SNP markers from the "best" (A) to the "worst" (E). * This SNP also includes other neutral alleles. Diseases: RA, rheumatoid arthritis. Genes: CETP, cholesteryl ester transfer protein; MBL2, mannose-binding lectin 2 (synonym: collectin-1); F3, coagulation factor III (synonyms: thromboplastin, tissue factor); TPI1, triosephosphate isomerase 1. Deletions, CETP: 18 bp = gggcggacatacatatac.
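The binomial estimate P(H₀) mentioned in the notes can be illustrated as follows. This is a minimal sketch assuming a one-sided test of an observed split between two classes of candidate markers against a null proportion `p0`; the function name, the null proportion, and the counts are hypothetical, and the exact null hypotheses of the study are not reproduced here.

```python
from math import comb

def binom_tail_ge(k, n, p0=0.5):
    """P(X >= k) for X ~ Binomial(n, p0): upper-tail probability under H0."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return sum(comb(n, i) * p0 ** i * (1.0 - p0) ** (n - i)
               for i in range(k, n + 1))

# Hypothetical counts, not taken from the tables: 8 markers of one class
# and 2 of the other, tested against an equal split (p0 = 0.5).
n_up, n_down = 8, 2
print(binom_tail_ge(n_up, n_up + n_down, p0=0.5))
```

A small tail probability would argue against the null proportion; a different `p0` can be supplied when the expected split is not 1:1.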

Human Genes Associated with Cardiovascular Diseases
Human gene MBL2 encodes mannose-binding lectin 2 and has a clinically proven SNP marker (rs72661131) of stroke [64], preeclampsia [65], and variable immunodeficiency [66] due to MBL2 deficiency, as shown in Table 2. By searching PubMed [38], we found a retrospective clinical review [67] on MBL2 insufficiency as a physiological marker of accelerated atherogenesis via both formation and destabilization of atherosclerotic plaques in the course of thrombogenesis. With this in mind and without any cause-effect assumptions, we predicted that this known clinical SNP marker rs72661131 can also be a candidate SNP marker of acceleration of late atherogenesis (Table 2).
Around the known SNP marker rs72661131, we found an unannotated SNP (rs567653539) causing MBL2 overexpression as well as two unannotated SNPs (rs1471733364 and rs562962093) causing its underexpression (Table 2). Using PubMed search software [38], we learned that the MBL2 excess and deficit correspond to acceleration of early and late atherogenesis in rheumatoid arthritis [68] and on a Western-pattern (standard American) diet [69]. That is why we predicted three more candidate SNP markers (rs567653539, rs1471733364, and rs562962093) of accelerated atherogenesis (Table 2).
Human gene F3 (thromboplastin) contains clinically proven SNP marker rs563763767, which increases the expression of this gene and thus causes either myocardial infarction or thrombosis [70] as complications of atherogenesis [71], which can be prevented by low-dose aspirin therapy [72], according to searches in PubMed [38]. Near rs563763767, there are two unannotated SNPs (rs1439518731 and rs966076891:t) that can also increase the thromboplastin level and thus cause these diseases as atherogenesis complications (Table 2). In addition, within the same promoter, we found two more unannotated SNPs (rs966076891:g and rs1190659847) able to reduce thromboplastin expression, whereas antithromboplastin therapy slows down atherogenesis in a murine model of human atherosclerosis [73], as we learned from the PubMed database [38] (Table 2). Summing up all our findings about the human F3 gene within this table, we predicted five candidate SNP markers of the atherogenesis types listed there.
Human gene TPI1 coding for triosephosphate isomerase includes a known SNP marker (rs1800202) reducing this enzyme's amount [74], and thereby leading to hemolytic anemia and neuromuscular diseases [75], which correspond to traumatic complications of atherogenesis [76] and a risk factor for atherosclerosis development [77], according to PubMed [38]. In the vicinity of rs1800202, we uncovered two unannotated SNPs (rs1386262216 and rs781835924) that can lead to TPI1 underexpression too. This finding allows us to propose them as candidate SNP markers of enhanced atherogenesis for the same reasons [74][75][76][77], as one can see in Table 2.

Human Genes Associated with Blood Disorders
Human genes HBB and HBD correspond to β- and δ-subunits of hemoglobin, promoters of which carry the greatest number (seven) of clinical SNP markers (e.g., HBB: rs33981098 and HBD: rs35518301) of hemoglobin insufficiency responsible for malaria resistance and thalassemia [78], which are atheroprotective [79]. Therefore, they can all be candidate SNP markers of delayed atherogenesis (Table 3). Looking through these promoters, we found four unannotated SNPs (e.g., HBB: rs281864525 and HBD: rs996092254), which can also cause hemoglobin deficiency and therefore may be candidate SNP markers of an atherogenesis slowdown (Table 3).
In addition, there is a substitution A>T at position −27 of the HBB promoter (hereinafter: HBB:−27A>T) [80] (not covered by either the "1000 Genomes" project [10] or dbSNP [12]), which causes HBB overexpression that has been clinically associated with the norm of the two above-mentioned biomedical traits [80] (Table 3). Nonetheless, our keyword search via PubMed [38] led to a clinical research article [81] on the heme released by extracellular HBB (after hemolysis) as an accelerator of atherogenesis. That is why we predicted that the known SNP marker HBB:−27A>T of the norm of both malaria resistance and thalassemia [80] can also be a candidate SNP marker of accelerated atherogenesis (Table 3). Finally, within the promoters in question, we found three more unannotated SNPs (e.g., HBB: rs34500389, our prediction for which is presented in Figure 2a as an illustrative example) that can raise the expression of the genes in question too. On this basis, we proposed three candidate SNP markers of accelerated atherogenesis, which are listed in Table 3.
Human gene ACKR1 (atypical chemokine receptor 1, synonyms: glycoprotein D, Duffy blood group) contains the known SNP marker (rs2814778) of leukopenia [82] and resistance to malaria [83] due to low TBP-promoter affinity and therefore ACKR1 underexpression. Thanks to PubMed [38], we learned that ACKR1 underexpression reduces atherogenesis in a mouse model of human atherogenesis [84]. For this reason, we suggested rs2814778 as a candidate SNP marker of an atherogenesis slowdown (Table 3).
By analyzing the same promoter, we selected an unannotated SNP, rs1185314734, corresponding to glycoprotein D overexpression, which is a known physiological marker of atherosclerosis-related cardiovascular diseases, according to a retrospective clinical review [85] that was found using the PubMed keyword search utility [38] (Table 3).
The promoter of human gene NOS2 carries a known SNP marker (NOS2:−51T>C) of malaria resistance [86,87], which is caused by overexpression of inducible nitric oxide synthase encoded by this gene. From PubMed [38], we learned that an NO excess is an accelerator of atherogenesis according to both physiological [88] and nutritional [89] original research articles. Therefore, the known SNP marker (NOS2:−51T>C) of malaria resistance [86,87] could be additionally regarded as a candidate SNP marker of accelerated atherogenesis (Table 3).
Additionally, looking through the same promoter, we encountered an unannotated SNP, rs1339255364, which can reduce NOS2 expression, whereas any inhibition of NOS2 yields an atherosclerosis slowdown in line with a mouse model of human atherosclerosis [90], as revealed by a keyword search in PubMed [38]. With this in mind, we propose that rs1339255364 is a candidate SNP marker of retarded atherogenesis as readers can see in Table 3.

Human Genes Associated with Autoimmunity-Associated Diseases
Human gene SOD1 encodes Cu/Zn superoxide dismutase and carries a well-known SNP marker (rs7277748) of this gene's underexpression (Figure 2b) and of familial amyotrophic lateral sclerosis [96] as a risk factor of atherosclerosis development; this complication is often provoked by various autoimmune diseases according to epidemiological reviews [97][98][99] (Table 4). Near rs7277748, there are three unannotated SNPs (e.g., rs1438766715, a 26 bp deletion of the wild-type TBP-site of this promoter), which can reduce the SOD1 level and thus have the same clinical manifestations (Table 4). Thus, we suggest them all as candidate SNP markers of atherogenesis acceleration. Finally, using PubMed keyword search software [38], we learned that short-term physical exercise [100], onion extract nutritional supplements [101], and antioxidants [102] can slow down these health problems (Table 4).
Human gene INS (insulin) contains a known SNP marker (rs5505) of neonatal diabetes mellitus mediated by hyperinsulinemia [15], which in turn is a well-known physiological marker of atherosclerosis [103,104]. Around rs5505, we selected three more unannotated SNPs able to cause the same hyperinsulinemia (e.g., rs1389349459) and thereby atherogenesis (Table 4). In addition, here we found only one unannotated SNP (rs11557611) associated with hypoinsulinemia, and this SNP speeds up atherogenesis too [105]. As readers can see in Table 4, within the examined promoter region of this gene, we identified five candidate SNP markers, all of them markers of accelerated atherogenesis. Nevertheless, thanks to the PubMed keyword search utility [38], we eventually learned about a dietary soy isoflavone as a norminsulinemic atheroprotector [106].
Human gene MMP12 contains a clinical SNP marker (rs2276109) of lower risk of many immune diseases, such as psoriasis [107], systemic scleroderma [108], and asthma [109], because of underexpression of the macrophage elastase encoded by this gene. Because MMP12 performs degradation of fibrinogen in the blood [110,111], its underexpression accelerates atherogenesis. Within the same promoter, we identified one more unannotated SNP (rs572527200) reducing MMP12 abundance, which can also be a candidate SNP marker of atherogenesis acceleration (Table 4). In addition, here we found one more unannotated SNP (rs1401366377) elevating the macrophage elastase level, which is known as a physiological marker of accelerated progression of atherosclerosis in transgenic rabbits [112] (Table 4). Looking through our predictions about this gene, we see three candidate SNP markers, all of them accelerating atherogenesis, whereas the PubMed keyword search utility [38] pointed us to milk products fermented by lactobacilli [113] and estrogen-like nutritional supplements [114], which can decelerate these health problems (Table 4).

Human Genes Associated with Obesity
Human gene GCG encodes glucagon, a so-called hunger hormone, because feelings of hunger decrease with a decrease in its concentration [115], i.e., during hypoglucagonemia [116] (Table 5), which can be caused by any of the five unannotated SNPs that we found in the promoter of this gene (e.g., rs183433761, as shown in Figure 2c). After searching PubMed, we learned that atherogenesis is accelerated postprandially in proportion to the ratio of insulin concentration to glucagon concentration [117]. This observation allows us to propose five candidate SNP markers of accelerated atherogenesis (Table 5).
Our keyword search in PubMed [38] also returned a sports medicine article on some atherosclerosis-related postprandial effects, namely, that in childhood, high-fat food ingestion just before physical exercise can slow down both growth and development [118].
Human gene LEP codes for leptin, which is also called obesity hormone because laboratory animals with a protein-damaging mutation in this gene are abnormally obese [119], as shown in Table 5. As readers can see in this table, within the human leptin promoter, we identified two unannotated SNPs able to cause obesity that can accelerate atherogenesis in line with a review article [120] found in PubMed using its keyword search utility [38]. In addition, here we found three unannotated SNPs increasing the LEP level (and thus protecting against both obesity and atherosclerosis [121]) as candidate SNP markers of an atherogenesis slowdown (e.g., rs34104384, Table 5).
Human gene APOA1 for apolipoprotein A1 carries a known SNP marker (APOA1:−35A>C) of hematuria, fatty liver, and obesity [122], and is a commonly accepted risk factor of atherogenesis acceleration [120]. Indeed, this prediction is consistent with the outcome of treatment of atherosclerosis using exogenous apolipoprotein A1 [123] as well as by both a low-protein diet and exercise increasing the endogenous APOA1 level in obese post-menopausal women [124]. Near this substitution APOA1:−35A>C, we found three more unannotated SNPs, which can cause APOA1 underexpression and, as a consequence, obesity and atherosclerosis. Therefore, we propose them as candidate SNP markers of accelerated atherogenesis (e.g., rs1428975217), as readers can see in Table 5.
Human gene HTR2C (5-hydroxytryptamine receptor 2C, synonym: serotonin receptor 2C) contains a known SNP marker (rs3813929) of overexpression of this gene causing obesity as a complication of antipsychotic treatment with olanzapine [15]. This SNP can therefore speed up atherogenesis, consistent with the above-mentioned review [120] and independent clinical data [125,126] (Table 5). Around rs3813929, we recognized five unannotated SNPs (e.g., rs1348095721), all of which seem to yield HTR2C overexpression too, thereby allowing us to recommend them as candidate SNP markers of an atherogenesis increase, as presented in Table 5.
Human gene IL1B encodes interleukin 1β and has a known biomedical SNP marker (rs1143627) of IL1B overexpression causing the greatest number and variety of human diseases, including obesity [127], Graves' disease [128], major recurrent depression [129], non-small cell lung cancer [130], hepatocellular carcinoma in hepatitis C virus infection [131], gastric cancer [132], gastric ulcer, and chronic gastritis [133] (Table 5). Using the known association between this clinical SNP marker and the above-mentioned obesity together with independent single-cell RNA-seq data on IL1B excess as a pro-atherogenic factor [134], we can recommend verifying it as another candidate SNP marker of atherogenesis acceleration (Table 5).
Within the IL1B promoter region in question, we found only one unannotated SNP (rs549858786), which can reduce the expression of this gene (Table 5). Our keyword search in PubMed [38] returned a pharmaceutical original research article [135] on IL1B deficiency as a physiological marker of successful treatment of atherosclerosis using oligomeric proanthocyanidins from Rhodiola rosea, as detailed in Table 5. This time, we surprisingly found an article [136] on proinflammatory effects of cigarette smoke condensates as an atherogenic risk factor, suggesting that passive smoking can be as dangerous as active smoking.

Human Genes Associated with Carcinogenesis
Human gene ADH7 (alcohol dehydrogenase 7) contains a clinically proven SNP marker (rs17537595) of ADH7 underexpression responsible for esophageal cancer [137], as shown in Table 6. According to the outcome of our keyword search in PubMed [38], post-esophagectomy necrosis is comorbid with atherosclerosis in this case [138]. This result allows us to propose this known SNP marker of carcinogenesis as a candidate SNP marker of accelerated atherogenesis (Table 6).
Finally, within the ADH7 gene promoter, we found three unannotated SNPs, which can also reduce its expression and thus speed up atherogenesis too (e.g., rs372329931, Table 6).
Human gene HSD17B1 codes for hydroxysteroid (17-β) dehydrogenase 1, whose deficit in the case of the well-known SNP marker rs201739205 is clinically detectable in patients with hereditary breast cancer [139], as readers can see in Table 6. Due to a PubMed search, we learned about insufficiency of this enzyme as a physiological marker of a successful antiatherosclerotic therapy based on the IMM-H007 drug [140]. That is why we predicted a candidate SNP marker (rs201739205) of an atherogenesis slowdown (Table 6). Furthermore, near it, we found two unannotated SNPs, which can also reduce this enzyme's level and therefore may be candidate SNP markers of atherogenesis retardation (Table 6; e.g., rs748743528).
Finally, within the HSD17B1 promoter region in question, we detected three unannotated SNPs able to raise the HSD17B1 level (e.g., rs1282820277), which is a well-known physiological marker of accelerated atherogenesis [141] according to PubMed [38] (Table 6). On this basis, we predicted three candidate SNP markers of atherogenesis acceleration due to an HSD17B1 excess (Table 6).
Human gene MLH1 encodes DNA mismatch repair protein MLH1, and its promoter contains two clinically confirmed SNP markers (rs63750527 and rs756099600) of nonpolyposis colon cancer [15] because an excess of this repair protein can prevent cancer cell apoptosis during either an immune response or anticancer chemotherapy (Table 6); an MLH1 excess is also known as a physiological marker of atherogenesis retardation [142,143]. With this in mind, we predicted two candidate SNP markers (rs63750527 and rs756099600) of atherogenesis deceleration (Table 6). Near them, we selected three unannotated SNPs, which seem to increase this repair protein's abundance (e.g., rs753671152) and thus to retard atherosclerosis development, as candidate SNP markers of an atherogenesis delay (Table 6). In addition, there we found three other unannotated SNPs, which decrease MLH1 expression and, on the contrary, accelerate atherogenesis (e.g., rs587778905 as a 21 bp deletion with the wild type TBP-site and its local surroundings within the promoter being considered), which can be tested as candidate SNP markers of atherogenesis acceleration (Table 6). Finally, in this case, our keyword search in PubMed [38] unexpectedly returned a nutritional original research article [144] about atheroprotective effects of restrictions on alcohol drinking and red meat intake, in line with their effects on DNA mismatch repair status.
Human gene RET has two clinically documented SNP markers, rs10900296 and rs10900297, corresponding to deficiency and excess of the Ret proto-oncogene encoded by this gene [15] (Table 6). From our keyword search in PubMed [38], we learned about atheroprotective abilities of RET, whose deficiency dysregulates atheroprotector pentraxin-3 and vice versa [145].
With this in mind, we predicted a couple (rs551321384 and rs1191017949) of candidate SNP markers of atherogenesis acceleration during RET downregulation caused by them, whereas two others (rs1237152255 and rs1372293149) can be candidate SNP markers of slower atherogenesis due to a RET excess as their manifestation (Table 6).
Human gene ESR2 (estrogen receptor 2 (β)) contains a clinical SNP marker (rs35036378) of an ESR2-deficient primary pT1 tumor of the mammary gland [146], whereas an ESR2 deficit is also a well-known physiological marker of the calcification stage of atherogenesis [147] (Table 6). Thus, it can be considered a candidate SNP marker of enhanced atherogenesis (Table 6). Around rs35036378, we chose the only unannotated SNP (rs766797386), which can reduce ESR2 concentration similarly to the above-mentioned candidate SNP marker of atherogenesis acceleration (Table 6). As for PubMed keyword search results [38], nutritionists recommend bioactivated calcium (Ca) in natural food as an atheroprotective nutrient in contrast to Ca-enriched dietary supplements elevating the risk of coronary artery calcification [148].
Human gene DHFR codes for dihydrofolate reductase and carries a known SNP marker (rs10168) of methotrexate resistance during anticancer chemotherapy because of this enzyme's overexpression [149], which is in turn well known as an atheroprotector [150,151]. For this reason, we suggest a candidate SNP marker (rs10168) of retarded atherogenesis, as readers can see in Table 6. Near rs10168, we noticed the only unannotated SNP, rs750793297, which can also elevate the DHFR level, and thus be a candidate SNP marker of atherogenesis deceleration (Table 6). Finally, looking through this DHFR promoter, we recognized five unannotated SNPs reducing this enzyme's abundance as candidate SNP markers of atherogenesis acceleration (e.g., rs1464445339); this acceleration could be slowed down by restricting both active and secondhand smoking because they reduce the blood folate level [151] (Table 6).

Human Genes Associated with Developmental Disorders
Human gene COMT carries two known SNP markers (rs370819229 and rs777650793) corresponding to dilated cardiomyopathy and cardiovascular disease [15] as a consequence of either a deficit or excess of catechol-O-methyltransferase encoded by this gene (Table 7). Using PubMed [38], we learned that both a substrate (estradiol) and metabolite (2-methoxy estradiol) of this enzyme are atheroprotectors [152]. That is why we propose any SNP-caused statistically significant changes in the COMT level as candidate SNP markers of an atherogenesis delay (Table 7, e.g., rs370819229 exemplified in Figure 2d).
Human gene TGFBR2 (transforming growth factor beta receptor 2) carries a clinical SNP marker (rs138010137) of underexpression of this gene resulting in aortic thoracic aneurysm and dissection [15]. Our keyword search in PubMed [38] resulted in an original research article on cellular genetics [153] about a TGFBR2 deficit provoking uncontrolled T-cell activation and maturation, which yields an inflammatory atherosclerotic plaque phenotype during hypercholesterolemia (Table 7). Consequently, we proposed rs138010137 as a candidate SNP marker of accelerated atherogenesis (Table 7). Next, in the vicinity of rs138010137, we chose the only unannotated SNP, rs1300366819, which can decrease TGFBR2 expression, as a candidate SNP marker of atherogenesis acceleration (Table 7).
Finally, there we found another unannotated SNP, rs1310294304, which, on the contrary, can cause overexpression of this gene, as presented in Table 7. Nevertheless, the outcome of our keyword search in PubMed [38] revealed an association of TGFBR2 overexpression with hypertension, which is a well-known atherogenic risk factor [154]. Therefore, we predict that rs1310294304 is a candidate SNP marker of atherogenesis acceleration, as one can see in Table 7.
Human gene FGFR2 encodes fibroblast growth factor receptor 2 (synonyms: keratinocyte growth factor receptor, bacteria-expressed kinase) and carries two clinically proven SNP markers, namely, rs886046768 for FGFR2 overexpression associated with craniosynostosis and rs777650793 for its downregulation linked to bent bone dysplasia [15] (Table 7). Thanks to the PubMed keyword search utility [38], we found an original experimental research article on a murine model of human atherosclerosis, where a synthetic small-molecule FGFR2 antagonist called SSR128129E successfully blocked atherogenesis [155], enabling us to propose rs886046768 and rs777650793 as candidate SNP markers of atherogenesis acceleration and retardation, respectively (Table 7).
Similarly, within promoters of this gene, we found six and three unannotated SNPs increasing and decreasing its expression, respectively, as candidate SNP markers of speeding up atherogenesis (e.g., rs1212347974) and an atherogenesis slowdown (e.g., rs1377663539), which are listed in Table 7. Finally, this sort of atherogenesis acceleration has been successfully retarded using a heparin-derived oligosaccharide in a rat model of human atherosclerosis [156], as we found after a keyword search in PubMed [38].

Selective In Vitro Validation
After our bioinformatics prediction of 145 candidate SNP markers of atherogenesis near the clinically proven ones (Tables 2-7), we first selectively verified them using an electrophoretic mobility shift assay (EMSA) under nonequilibrium conditions in vitro, as described in the section Materials and Methods. The primary experimental data in vitro on the five selected candidate SNP markers of atherogenesis predicted here (i.e., GCG: rs183433761, LEP: rs201381696, HBB: rs34500389, HBD: rs35518301, and F3: rs563763767) are exemplified by GCG: rs183433761 in Figure 3a,b. Finally, Figure 3c,d present their comparison with the predictions made in this work.
As readers can see in Figure 3c, according to four statistical criteria (i.e., Pearson's linear correlation (r), the Goodman-Kruskal generalized correlation (γ), and Spearman's (R) and Kendall's (τ) rank correlations), our predictions are in significant agreement with the data measured experimentally in terms of absolute values of equilibrium dissociation constant K_D of TBP-promoter complexes, which are expressed in natural logarithm units. Independently, for the same four statistical tests, Figure 3d presents significant correlations between our predicted and experimental data, which are expressed here on a relative scale of differences between ancestral and minor alleles of a given SNP.
Considering all these robust correlations together, we can say that in this work, this application of our Web service SNP_TATA_Comparator [45] to study atherosclerosis in humans is valid and useful.
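The four agreement criteria named above can be sketched in a few lines of code. The following is a minimal illustration in plain Python, not the STATISTICA procedure used in this work; the function names and the input values in the usage example are hypothetical placeholders, not measurements from Figure 3.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson's linear correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation R: Pearson's r computed on ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def concordant_discordant(x, y):
    """Count concordant and discordant pairs; tied pairs are skipped."""
    conc = disc = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            conc += 1
        elif sign < 0:
            disc += 1
    return conc, disc


def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma: (C - D) / (C + D)."""
    conc, disc = concordant_discordant(x, y)
    return (conc - disc) / (conc + disc)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (C - D) divided by the total number of pairs."""
    conc, disc = concordant_discordant(x, y)
    n = len(x)
    return (conc - disc) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical ln(K_D) values for illustration only.
    predicted = [-19.2, -18.7, -18.1, -17.9, -17.0]
    measured = [-19.0, -18.9, -18.0, -17.5, -17.2]
    for name, fn in [("r", pearson_r), ("gamma", goodman_kruskal_gamma),
                     ("R", spearman_r), ("tau", kendall_tau_a)]:
        print(name, round(fn(predicted, measured), 3))
```

The rank-based criteria depend only on the ordering of the values, so they complement Pearson's r when the relation between predicted and measured values is monotonic but not strictly linear.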

In Silico Validation of Our Predictions Based on Clinical SNP Markers as a Whole
After experimental verification of our predictions, next we compared them (using current dbSNP build No. 151 dated 2017 [12]) with those obtained with the previous build No. 147 of this database dated 2016 (i.e., almost four times fewer SNPs as compared to the current one) [12] to estimate the robustness of our method with respect to the annual growth of the SNP count.
In row 1 of Table 1, readers can see that the genome-wide norm is a greater number of SNPs damaging the TBP-sites within human gene promoters (n ≤ 800) than those improving these sites (n ≥ 200). This genome-wide pattern was first noted empirically [62] and then confirmed significantly (p > 0.99, binomial distribution) using ChIP-seq data within the "1000 Genomes" project [63]. This observation allows us to estimate the cumulative effect of all the newly predicted candidate SNP markers at the genome-wide level in comparison with the clinical SNP markers used to predict them, as we have done in many studies [49][50][51]54] to verify in silico the reliability of our predictions.
Row 2 of this table shows the results of biomedical verification of our Web service SNP_TATA_Comparator using the clinically known SNP markers of diseases in TBP-sites that was conducted when the software was just created [45]. As one can see, its predictions fit the genome-wide norm shown by row 1 of Table 1, which means neutral drift of the majority of SNPs causing human diseases, in agreement with both Haldane's dilemma [25] and neutral evolution theory [24]. Row 3 here presents the results of our previous article on the atherosclerosis-related candidate SNP markers identified using one of the previous builds (No. 147) of dbSNP [12]. These data seem to follow the same pattern, indeed. Finally, as presented in row 4 of Table 1, our current predictions (made here by means of current build No. 151 of dbSNP [12]) reproduce the same pattern. This observation proves the robustness of our approach with respect to the annual growth of the SNP count.
As readers can see, the three rightmost columns of rows 1-4 of Table 1 present the numbers of candidate SNP markers accelerating (n↑ = 91) and slowing (n↓ = 54) atherogenesis as well as an estimate of the statistical significance of their difference from one another according to the binomial distribution (p < 0.01). Consequently, readers can see reliable predominance of candidate SNP markers of atherogenesis acceleration over those of its deceleration, which means that the above-mentioned neutral drift of this kind of human SNPs supports atherogenesis acceleration. This finding seems to be in line with one of the well-known gerontological health care strategies: in utero, in childhood, adolescence, and reproductive adulthood, the human body can successfully respond to deadly stressors (e.g., acute infection) by subthreshold slowly progressing pathologies (e.g., atherogenesis), with clinically over-threshold manifestations (e.g., stroke) seen only in the very old [157]. It is important that this observation is statistically significant here, whereas in our previous article [55], there was only a tendency for this pattern (i.e., n↑ = 19, n↓ = 16, p > 0.33).
Therefore, the fourfold increase in the SNP count in the current build No. 151 of dbSNP [12] versus its previous build No. 147 allows us to see a new genome-wide pattern.
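The binomial comparison of the two counts can be reproduced with a short calculation. The sketch below assumes a one-sided exact binomial test against equal probabilities of the two outcomes; whether the original analysis was one- or two-sided is not stated here, so this is an illustration rather than the exact procedure behind Table 1, and the function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def predominance_p_value(n_up, n_down):
    """One-sided p-value for the larger count under equal odds (assumption)."""
    total = n_up + n_down
    return binomial_upper_tail(max(n_up, n_down), total)


if __name__ == "__main__":
    # Counts quoted in the text: current build (91 vs 54), previous article (19 vs 16).
    print(predominance_p_value(91, 54))
    print(predominance_p_value(19, 16))
```

Under this assumption, the counts 91 versus 54 fall below the 0.01 threshold, whereas 19 versus 16 stay above 0.33, in line with the values quoted above.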

Human Genes Related to Mitochondrial Genome Integrity and Instability of the Atherosclerotic Plaque
Following both in vitro and in silico verification of our predictions, we finally applied our newly validated Web service to atherosclerosis-related research to investigate several human genes associated with the maintenance of mitochondrial genome integrity [56], which are currently researched widely to slow atherogenesis.
Human gene POLG encodes a catalytic subunit of DNA polymerase γ (hereinafter: without any clinically proven SNP markers of human diseases within TBP-sites of its promoters) and carries six and 23 unannotated SNPs corresponding to under- and overexpression of this gene, respectively (Table 8). Using the PubMed keyword search utility [38], we found an original research article [158] on POLG deficiency as a physiological marker of an elevated level of mitochondrial DNA damage, which accelerates atherogenesis and vice versa. That is why we proposed six and 23 candidate SNP markers of speeding up atherogenesis (e.g., rs1266453407:g) and an atherogenesis slowdown (e.g., rs776506626, see Figure 2e), respectively, as one can see in Table 8. Finally, in this case, from a PubMed search [38], we learned that this sort of atherogenesis acceleration could be prevented using antioxidant MitoQ [159], whereas the aforementioned type of atherogenesis retardation can be attenuated by both a fructose-rich diet [160] and alcohol without green tea [161].
Human gene TFAM (mitochondrial transcription factor A) carries seven unannotated SNPs reducing concentrations of this regulatory protein and as many unannotated SNPs upregulating it (Table 8). According to PubMed [38], in a murine model of a human disease, an adipose-tissue-specific deletion of TFAM is atheroprotective [164], whereas a TFAM excess is proatherogenic in a rat model of human atherosclerosis [165]. In addition, nutritionists have reported that chronic alcohol consumption upregulates TFAM as a proatherogenic risk factor [166]. All these independent observations allow us to suggest 14 candidate SNP markers of retarded (e.g., rs1349790536) and sped up (e.g., rs943871999) atherogenesis, which are listed in Table 8.
Human gene ATM has 14 and 16 unannotated SNPs, which can respectively decrease and increase the amount of the ATM serine/threonine kinase encoded by this gene (Table 8). Our keyword search through PubMed [38] yielded biomedical data on this enzyme's excess as an atheroprotector [167], and there is an ATM-deficient murine model of human atherosclerosis [168][169][170][171][172]. With this in mind, we propose 14 and 16 candidate SNP markers enhancing (e.g., rs773550815:g) and weakening (e.g., rs773550815:t) atherogenesis, respectively (Table 8). In particular, this time using PubMed [38], we found a nutritionists' report that postprandial tea is an atheroprotector in contrast to proatherogenic effects of postprandial coffee [169] (Table 8).
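The reasoning applied to these three genes follows one rule: the literature gives the effect of a gene's excess on atherogenesis, and an SNP is labeled according to whether it raises or lowers expression. A minimal sketch of that rule is given below; the dictionary and function names are hypothetical, and the gene-level effects are taken only from the statements cited above.

```python
# Effect of an EXCESS of each gene product on atherogenesis, as stated in the text.
GENE_EXCESS_EFFECT = {
    "POLG": "slows",        # POLG deficiency accelerates atherogenesis and vice versa [158]
    "TFAM": "accelerates",  # a TFAM excess is proatherogenic [165]
    "ATM": "slows",         # an ATM excess is atheroprotective [167]
}

OPPOSITE = {"slows": "accelerates", "accelerates": "slows"}


def classify_snp(gene, expression_change):
    """Label an SNP by the predicted direction of its effect on atherogenesis.

    expression_change is "over" for overexpression or "under" for underexpression.
    """
    if expression_change not in ("over", "under"):
        raise ValueError("expression_change must be 'over' or 'under'")
    effect_of_excess = GENE_EXCESS_EFFECT[gene]
    if expression_change == "over":
        return effect_of_excess
    return OPPOSITE[effect_of_excess]


if __name__ == "__main__":
    for gene in GENE_EXCESS_EFFECT:
        for change in ("under", "over"):
            print(gene, change, classify_snp(gene, change))
```

Keeping the gene-level evidence in one lookup table separates the literature-derived step from the SNP-level prediction, so a revised literature finding changes a single entry.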
Looking through Table 8, we did not find any neutral drift (row 5 of Table 1: n_< = 29 and n_> = 48 at p < 0.02) or predominance of candidate SNP markers accelerating atherogenesis (row 5 of Table 1: n↑ = 27 and n↓ = 50 at p < 0.01). By contrast, these patterns were significant in the case of local surroundings of clinical SNP markers of human diseases (Tables 1-7).
As an independent test of this discrepancy, we next similarly analyzed the same number of miRNA genes involved in another gene network of instability of the atherosclerotic plaque [57]. Let us consider what this test yielded (see Table 9).
Human gene MIR10B contains one unannotated SNP, rs1388274194, able to elevate the miR-10b level (Figure 2f), and another one, rs564940769, which can reduce it (Table 9). Next, a keyword search of PubMed [38] pointed to a biochemical research article [173] about an miR-10b deficit as an atheroprotector. That is why we recommend two candidate SNP markers (rs1388274194 and rs564940769) of speeding up and slowing down atherogenesis, respectively (Table 9).
Finally, in this case, in the PubMed database, we also learned that this sort of atherogenesis acceleration could be prevented using dietary supplements of berberine [174].
Human gene MIR21 carries four unannotated SNPs of miR-21 downregulation, which is atheroprotective [175] according to PubMed [38]. This result allows us to suggest all of them as candidate SNP markers of retarded atherogenesis (e.g., rs752908264), as shown in Table 9.
Human gene MIR143 carries four unannotated SNPs, which can cause its overexpression, whereas miR-143 upregulation in cells overloaded with cholesterol prevents their transformation into foam cells thereby slowing down atherogenesis [176][177][178], in line with our outcome of a keyword search in PubMed [38], as summed up in Table 9. On this basis, we predict four candidate SNP markers of an atherogenesis delay (e.g., rs369969688, Table 9). In addition, within the promoter of this gene, we recognized three unannotated SNPs (rs1369382070, rs568314295, and rs1033081876) causing miR-143 downregulation, which is a physiological marker of atherogenesis progression [179]. Thus, we classified them as candidate SNP markers of atherogenesis acceleration (Table 9).
Human gene MIR145 contains one unannotated SNP, rs909856793, decreasing the miR-145 amount, and two unannotated SNPs (rs746241408 and rs778670319), which can elevate the miR-145 level, as readers can see in Table 9. Because our keyword search in PubMed [38] resulted in a clinical cohort-based report [180] on an miR-145 excess as an atheroprotector and vice versa, we predicted rs909856793 to be a candidate SNP marker of speeding up atherogenesis and two candidate SNP markers (rs746241408 and rs778670319) of its slowdown (Table 9).
Row 6 of Table 1 summarizes our predictions here for the core promoters of the miRNA genes involved in the gene network of instability of the atherosclerotic plaque [57] (Table 9) in comparison with the protein-coding genes from the same gene network (Table 8). As readers can see, there is no neutral drift (row 6 of Table 1: n_< = 5 and n_> = 11 at p < 0.05) and no predominance of candidate SNP markers accelerating atherogenesis (row 6 of Table 1: n↑ = 2 and n↓ = 14 at p < 0.01). This means that the two independent predictions are in agreement that natural selection against underexpression of the human genes dealing with mitochondrial genome integrity [56] and atherosclerotic plaque instability [57] supports an atherogenesis slowdown.

DNA Sequences
We retrieved SNPs from the dbSNP database [12] and DNA sequences from the Ensembl database [11] in reference human genome assembly GRCh38/hg38 via the UCSC Genome Browser [13].

Analysis of DNA Sequences
We analyzed SNPs within DNA sequences using our previously created public Web service SNP_TATA_Comparator [45], which implements our model of three-step TBP-promoter binding, as described in depth within Supplementary Materials [181][182][183] (i.e., Section S1 "Supplementary DNA sequence analysis").
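The comparison of two alleles of an SNP can be illustrated with a generic significance test. The sketch below assumes that each allele has an estimate of ln(K_D) with a standard error and that the two estimates are compared with a two-sided Z-test; this is an assumption made for illustration and not a description of the internals of SNP_TATA_Comparator, whose model is given in the Supplementary Materials. All names and input values are hypothetical.

```python
from math import erf, sqrt


def z_test(estimate_a, se_a, estimate_b, se_b):
    """Two-sided Z-test for the difference between two independent estimates."""
    z = (estimate_b - estimate_a) / sqrt(se_a ** 2 + se_b ** 2)
    # 2 * (1 - Phi(|z|)) simplifies to 1 - erf(|z| / sqrt(2)).
    p_value = 1.0 - erf(abs(z) / sqrt(2.0))
    return z, p_value


def expression_change(ln_kd_ancestral, se_ancestral, ln_kd_minor, se_minor, alpha=0.05):
    """Translate a significant affinity shift into a predicted expression change.

    A lower K_D means a higher TBP affinity, which is read here as overexpression.
    """
    z, p_value = z_test(ln_kd_ancestral, se_ancestral, ln_kd_minor, se_minor)
    if p_value >= alpha:
        return "no significant change"
    return "under" if z > 0 else "over"


if __name__ == "__main__":
    # Hypothetical ln(K_D) estimates and standard errors for two alleles.
    print(expression_change(-19.5, 0.1, -18.9, 0.1))
```

The returned label follows the convention used throughout this work: reduced TBP affinity is interpreted as underexpression of the gene and increased affinity as its overexpression.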

Keyword Searches in the PubMed Database
For each candidate SNP marker predicted in this work, we performed a standard keyword search in the PubMed database [38] as illustrated in Figure 4.
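A keyword search of this kind can also be scripted through the public NCBI E-utilities interface. The sketch below only builds the request URL for the ESearch endpoint; the query terms are hypothetical placeholders, because the exact keywords used in this work are those shown in Figure 4, and the helper names are illustrative.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for searching Entrez databases.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(gene, keywords=("atherosclerosis",)):
    """Join a gene symbol and keywords with AND (hypothetical query scheme)."""
    return " AND ".join([gene, *keywords])


def build_esearch_url(gene, keywords=("atherosclerosis",), retmax=20):
    """Build an ESearch URL that returns matching PubMed identifiers as JSON."""
    params = {
        "db": "pubmed",
        "term": build_pubmed_query(gene, keywords),
        "retmode": "json",
        "retmax": retmax,
    }
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # The URL can be opened in a browser or fetched with urllib.request.urlopen.
    print(build_esearch_url("POLG"))
```

Scripting the search makes it repeatable for every gene in Tables 2-9, although the retrieved articles still need to be read and interpreted manually.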

In Vitro Measurements
For each of the five chosen candidate SNP markers of atherogenesis predicted here (GCG: rs183433761, LEP: rs201381696, HBB: rs34500389, HBD: rs35518301, and F3: rs563763767), using an electrophoretic mobility shift assay (EMSA), we experimentally measured absolute values of equilibrium dissociation constant K_D of TBP-promoter complexes, as described in detail within Supplementary Materials [184] (i.e., Section S2 "Supplementary in vitro measurement").

Statistical Analysis
A comparison of our predictions with the experimental values of equilibrium dissociation constant K_D of TBP-promoter complexes was conducted using two options, "Multiple Regression" and "Nonparametrics," in a standard toolbox, STATISTICA (StatSoft™, Tulsa, USA).

Conclusions
Because TBP-binding regions are the best-studied regulatory sequences within the human genome [21], here, using SNP_TATA_Comparator [45], we analyzed 1186 SNPs within these regions and as a result predicted 238 candidate SNP markers of atherosclerosis in addition to the only clinically proven SNP marker of this disorder (Tables 2-9), as shown in row 7 of Table 1.
This result increases targetability fivefold, and thus reduces the costs as compared to the traditional random heuristic selection of candidate SNP markers for clinical testing in patients and conventionally healthy volunteers (e.g., see [22]). Given that the regulatory and protein-coding regions of different transcripts can overlap with each other, any given SNP can simultaneously be a missense mutation, an intronic or 5′- or 3′-untranslated variant, or some other variant (e.g., rs370819229 and rs777650793). Thus, further stratification of the list of 238 candidate SNP markers predicted here is possible thanks to a number of public Web services (for review, see [18]).
Those atherosclerosis-related candidate SNP markers that can survive clinical examination on cohorts of patients and healthy volunteers, as whole-genome landmarks of this disorder, may become useful for physicians (may help to optimize treatment of a patient according to his/her individual sequenced genome) as well as for the general population (may help to choose a lifestyle slowing the inevitable atherogenesis).
Additionally, when the dbSNP database [12] includes the occurrence rates of minor alleles of SNPs in various human races and large ethnic groups (e.g., Slavs), in addition to such data already present in dbSNP for narrow aboriginal ethnic groups (e.g., Scotland scotlades) as migration-related SNP markers, the list of candidate SNP markers predicted by SNP_TATA_Comparator [45] can be stratified in this regard.
Summing up all the above results, we unexpectedly detected, for the first time, two statistically significant patterns in the genome: (1) neutral drift that leads to accelerated atherogenesis and (2) directional natural selection that impedes atherogenesis. Therefore, their superposition stabilizes the total effect and thus establishes the norm for the trait of atherogenesis development.
Finally, it should be noted that the atherogenesis rate is a trait that can be controlled via adjustment of the lifestyle (e.g., diet), as unambiguously demonstrated in Tables 2-9.