A Gene-Set Enrichment and Protein–Protein Interaction Network-Based GWAS with Regulatory SNPs Identifies Candidate Genes and Pathways Associated with Carcass Traits in Hanwoo Cattle

Non-synonymous SNPs and protein coding SNPs within the promoter region of genes (regulatory SNPs) might have a significant effect on carcass traits. Imputed sequence level data of 10,215 Hanwoo bulls, annotated and filtered to include only regulatory SNPs (450,062 SNPs), were used in a genome-wide association study (GWAS) to identify loci associated with backfat thickness (BFT), carcass weight (CWT), eye muscle area (EMA), and marbling score (MS). A total of 15, 176, and 1 SNPs were found to be significantly associated (p < 1.11 × 10−7) with BFT, CWT, and EMA, respectively. The significant loci were BTA4 (CWT), BTA6 (CWT), BTA14 (CWT and EMA), and BTA19 (BFT). BayesR estimated that 1.1%~1.9% of the SNPs contributed to more than 0.01% of the phenotypic variance. So, the GWAS was complemented by a gene-set enrichment (GSEA) and protein–protein interaction network (PPIN) analysis in identifying the pathways affecting carcass traits. At p < 0.005 (~2,261 SNPs), 25 GO and 18 KEGG categories, including calcium signaling, cell proliferation, and folate biosynthesis, were found to be enriched through GSEA. The PPIN analysis showed enrichment for 81 candidate genes involved in various pathways, including the PI3K-AKT, calcium, and FoxO signaling pathways. Our finding provides insight into the effects of regulatory SNPs on carcass traits.


Introduction
Hanwoo (Bos taurus coreanae) is the indigenous premium beef cattle of South Korea. It has been intensively selected for higher meat and carcass quality for the last four decades [1]. It is well known for its extensive marbling, texture, flavor, and juiciness, making it the most economically important species in Korea [2,3]. In spite of its premium price, due to its superior meat quality, Hanwoo beef is very popular amongst both Korean and foreign consumers. The breeding value and the genetic worth of Hanwoo is estimated based on the marbling score (MS), carcass weight (CWT), backfat thickness (BFT), concentrate and rice straw and were slaughtered between 17 and 62 months of age. The carcass and meat quality traits analyzed in this study were carcass weight (CWT), backfat thickness (BFT), eye muscle area (EMA), and marbling score (MS). Traits were measured as described by Bhuiyan et al. [29]. A descriptive statistical summary of the phenotypes are given in Table 1.

SNP Genotyping, Imputation, and Filtering for Regulatory SNPs
Genomic DNA was isolated from 10,215 Hanwoo tissue samples using the DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA, USA). After checking for quality and quantity on a Nanodrop 1000 (Thermo Fisher Scientific, Wilmington, DE, USA), the samples were genotyped on a Customized Hanwoo SNP50 BeadChip (58,990 SNPs). Only autosomes were included in the analysis. The genotypes were then phased using Eagle v.2.3.2 [30] and then imputed, one chromosome at a time, using Minimac3 [31], to a high-density level, using a reference population consisting of 480 animals genotyped with Bovine HD BeadChip (777,962 SNPs). They were then imputed to the sequence level one chromosome at a time, using whole genome sequence data of a reference population of 311 progeny tested Hanwoo bulls [1], resulting in a total of 27,980,473 SNPs; the imputation pipeline followed for imputing from 770K to the sequence level was the same as the one used for 50K to HD imputation. Following a previous study [32], only SNPs with Minimac3 imputation quality statistic (R 2 ) higher than 0.4 were retained for further analysis, resulting in a total of 12,980,473 SNPs. The overall average imputation accuracy, post quality control (R 2 ) was 0.76, which was similar to a previously reported study [33].
The physical position of the imputed SNPs was determined using the bovine genome assembly UMD3.1 [34] and the SNPs were annotated with SnpEff version 4.3 [35]. The SNPs were then filtered with SnpSift [36] to include only non-synonymous (missense) SNPs and protein coding SNPs within 5-kb upstream of a gene. These SNPs were considered as regulatory SNPs due to their effect on the protein structure and gene expression [26,37]. The SNPs were further filtered for quality with PLINK v.1.9 software [38] under the following criteria: minor allele frequency (MAF) < 0.01, genotype call rate < 0.10, individual call rate < 0.10, and Hardy-Weinberg equilibrium <0.000001. This resulted in a final set of 450,062 SNPs and 9693 animals for further analysis. All relevant data generated in this study are available within the paper or in the supplementary files. The SNP file generated in this study is freely available for download from National Agricultural Biotechnology information center (www.nabic.rda.go.kr) under the accession no NV-0622.
The genetic contribution of SNPs was estimated using a Bayesian mixture model implemented in BayesR software [41] (https://github.com/syntheke/bayesR) that fitted all markers simultaneously with four posterior distributions for each marker. The method assumes the SNPs in the mixture model to be normally distributed and that the SNP effects are derived from a combination of four distributions with effect size proportions between 0 and 1 (i.e., explaining 0, 0.01, 0.1, and 1% of the genetic variance). For mixing, a chain length of 50,000 samples was used with the first 20,000 samples being discarded as burn-in [42].

Gene-Set Enrichment Analysis and Protein-Protein Interaction Analysis
The analysis was performed following the method described by Dadousis et al. [13,21]; briefly, for each trait a nominal p < 0.005 was used to filter for SNPs from the GWAS analysis. Using the SNP ID's, gene names assigned to SNPs were filtered from the VCF file that was previously annotated with SnpEff version 4.3 [35]. The genes were then assigned to functional categories using the Gene Ontology (GO) [43] database under biological process and molecular function categories, and the Kyoto Encyclopedia of Genes and Genomes (KEGG) [44] pathway database. The gene-set enrichment analysis was performed using goseq R package [45]. A Fisher's exact test was performed to test for overrepresentation of the genes in each function/pathway (gene-sets) and a false discovery rate (FDR) correction (5%) was applied to account for multiple testing. The background for the GSEA analysis was all the genes to which the SNPs used in the GWAS was annotated to (14,267 genes).
Finally, a protein-protein interaction network (PPIN) analysis was performed using the STRING database [46]. The PPIN analysis was limited to high confidence interactions with scores > 0.99; the network was clustered with an MCL with the default inflation parameter. The KEGG pathway enrichment of the PPI network was carried out using ClueGO v.2.5.5 [47].

Results and Discussion
Regulatory SNPs that control gene expression and genetic variants that cause protein coding changes can contribute to phenotypic variation. However, since complex traits are controlled due the additive genetic effects of a large number of genes that have small effects, several of these SNPs fail to reach the stringent thresholds adopted in GWAS to control for multiple testing. Therefore in this study, we performed a regulatory SNP GWAS (promoter and non-synonymous protein coding SNPs) and complemented it with GSEA and PPIN analysis to understand the genetic contribution and regulatory role of these SNPs on four carcass traits (BFT, CWT, EMA, and MS) in Hanwoo cattle.

Phenotypes and Genomic Heritability Estimates
The carcass traits of the sample were normally distributed ( Table 1). The mean values of CWT, BFT, EMA, and MS measured at a similar slaughter age were consistent with those of Roh et al. [48] but differed marginally with [49]. Several previous studies have reported lower estimates for these traits [29,50], which could be attributed to differences in the number of animals investigated or nutritional status during finishing period or due to differences in the methods employed for measuring the phenotypes. The lower phenotypic CV for CWT and EMA (12% and 13%) and higher estimates for BFT and MS (35% and 31%), are consistent with previous studies [29,51]. The Genetic and phenotypic correlation between traits are given in Figure 1; the phenotypic correlation among traits was stronger than their respective genetic correlation. CWT was genetically and phenotypically positively correlated with BFT, EMA, and MS, with the strongest correlation between CWT and EMA, whereas BFT had a low phenotypic correlation with EMA and MS, while it had a low negative genetic correlation with EMA. MS was highly positively correlated with EMA, both phenotypically and genetically. The high positive correlation between CWT and EMA, both genetically and phenotypically, is consistent with previous reports in Hanwoo [29,50,52,53]; similar correlations have been reported in Brahman and Japanese Brown cattle [54,55]. Kim et al. [51] had also reported a medium negative correlation between EMA and BFT. The genetic and phenotypic correlation estimated between MS and EMA was higher than what was reported previously by Roh et al. and Son et al. [48,56] but consistent with the estimates of Hwang et al. [49]. These results suggest that a selection based on CWT will have a low to medium improvement in BFT, EMA, and MS, while a selection based on BFT or MS will have minimal improvement on BFT.
Genes 2019, 10, x FOR PEER REVIEW 5 of 25 nutritional status during finishing period or due to differences in the methods employed for measuring the phenotypes. The lower phenotypic CV for CWT and EMA (12% and 13%) and higher estimates for BFT and MS (35% and 31%), are consistent with previous studies [29,51]. The Genetic and phenotypic correlation between traits are given in Figure 1; the phenotypic correlation among traits was stronger than their respective genetic correlation. CWT was genetically and phenotypically positively correlated with BFT, EMA, and MS, with the strongest correlation between CWT and EMA, whereas BFT had a low phenotypic correlation with EMA and MS, while it had a low negative genetic correlation with EMA. MS was highly positively correlated with EMA, both phenotypically and genetically. The high positive correlation between CWT and EMA, both genetically and phenotypically, is consistent with previous reports in Hanwoo [29,50,52,53]; similar correlations have been reported in Brahman and Japanese Brown cattle [54,55]. Kim et al. [51] had also reported a medium negative correlation between EMA and BFT. The genetic and phenotypic correlation estimated between MS and EMA was higher than what was reported previously by Roh et al. and Son et al. [48,56] but consistent with the estimates of Hwang et al. [49]. These results suggest that a selection based on CWT will have a low to medium improvement in BFT, EMA, and MS, while a selection based on BFT or MS will have minimal improvement on BFT. The proportions of genomic variance attributed to the regulatory SNPs were found to be 0.29, 0.25, 0.29, and 0.38 for BFT, CWT, EMA, and MS, respectively (Table 1). This is between 44% and 77% of the heritability estimated in a previous study using 11.2 million SNPs from across the genome of Hanwoo [1], suggesting that the regulatory SNPs might have a large effect on the traits evaluated in this study. This is consistent with previous reports that had reported that the missense variants (non-synonymous variants) are able to capture a large proportion of the genetic variance in beef and dairy cattle [12,57]. The genic region, i.e., the protein coding regions, were also able to capture more genetic variances than other regions for complex traits in humans [58].

Genome-Wide Association Study.
The GWAS was performed with 450,062 regulatory SNPs (protein coding, promoter SNPs, and non_synonymous SNPs) to find regions associated with the four traits studied. The mixed linear model-based GWAS revealed that 15, 176, and 1 SNPs were significantly associated with BFT, CWT, and EMA, respectively (Figure 2, Supplementary Table S1). The significantly associated SNPs for The proportions of genomic variance attributed to the regulatory SNPs were found to be 0.29, 0.25, 0.29, and 0.38 for BFT, CWT, EMA, and MS, respectively (Table 1). This is between 44% and 77% of the heritability estimated in a previous study using 11.2 million SNPs from across the genome of Hanwoo [1], suggesting that the regulatory SNPs might have a large effect on the traits evaluated in this study. This is consistent with previous reports that had reported that the missense variants (non-synonymous variants) are able to capture a large proportion of the genetic variance in beef and dairy cattle [12,57]. The genic region, i.e., the protein coding regions, were also able to capture more genetic variances than other regions for complex traits in humans [58].

Genome-Wide Association Study
The GWAS was performed with 450,062 regulatory SNPs (protein coding, promoter SNPs, and non_synonymous SNPs) to find regions associated with the four traits studied. The mixed linear model-based GWAS revealed that 15, 176, and 1 SNPs were significantly associated with BFT, CWT, and EMA, respectively (Figure 2, Supplementary Table S1). The significantly associated SNPs for BFT ware located on BTA19, BTA4, BTA6, BTA14, and BTA19 for CWT; and for EMA on BTA 14. No significant loci were detected for MS. These significant regions were harbored by 4 (BFT), 115 (CWT), 7 (EMA), and 2 (MS) genes ( Table 2). The most significant SNPs were rs109467607 (BFT), rs210030313 (CWT), and rs210030313 (EMA) located in NOG (Noggin) and CHCHD7 (Coiled-coil-helix-coiled-coil-helix Domain Containing 7) (CWT and EMA) ( Figure 2, Table 2). The majority of the significantly associated SNPs for BFT were located on a 3.933 Kb region on BTA19 spanning the NOG gene for BFT; all the 15 significant SNPs on this gene were promoter SNPs.   The most significantly associated SNPs for CWT was located in a 42.01 Mb region on BTA14, which included genes such as CHCHD7, UBXN2b (UBX Domain Protein 2B), C8orf34 (Chromosome 8 Open Reading Frame 34), and TRIM55 (Tripartite Motif Containing 55).
A previous GWAS with 50K and 777K data have revealed major QTL for CWT on BTA 14, BTA4, and BTA6 in Hanwoo [1,8]; here we also detected significant QTLs on BTA17 and BTA19. Among the most significant SNPs for CWT on BTA14 were variants located in CHCHD7, UBXN2b, C8or34, FAM184B, TRIM55, POLR2K, CYP7A1, SDCBP, PRKDC, TOX, and PLAG1. Variants around PLAG1, CHCHD7, UBXN2B, FAM184B, and TOX have been previously reported to be associated with CWT [8,59], stature [60], live weight [61], reproductive traits [62], and puberty [63] in Hanwoo, Japanese black cattle, Nellore cattle, and Brahman cattle. A few of these loci were also associated with EMA ( Figure 3). Most of these variants were all 5 upstream promoter variants, and the large number of variants over several genes suggests a synergistic effect for the major QTL on BTA14 for CWT in Hanwoo, confirming the findings of Bhuiyan et al. [1]. Among the significantly associated SNPs for CWT, 9 SNPs were non _synonymous SNPs; these were located on TBC1D31 (TBC1 Domain Family Member 31), SPIDR (Scaffold Protein Involved in DNA Repair), PRKDC (Protein Kinase, DNA activated Catalytic Subunit), DNAJC5B (DNAJ Heat Shock Protein Family (Hsp40) member C5 β), CRH (Corticotropin Releasing Hormone), ADHFE1 (Alcohol Dehydrogenase Iron Containing 1), and NCAPG (Non-SMC Condensin I complex Subunit G). These included rs449968016 and rs41726906 on BTA14 and rs109570900 on BTA6. NCAPG, which is involved in chromatin condensation [64], has been found to be associated with CWT and body frame size [65,66]. Mutations in this gene has been implicated with cattle growth in three cattle populations [65,67]. Previous studies had not detected any significant QTL for BFT in Hanwoo [1,8]; however, we detected a significant QTL on BTA19, nearby the NOG (Noggin) gene, which induces stem cell adipogenesis [68] and was found to be associated with meat quality traits in Nellore cattle [69].

Contribution of Genomic Variants
The SNP effects was estimated using BayesR, to understand the genetic architecture and the proportion of variance explained the SNPs in each of the four distributions (with the variance 0* σ 2 A , 10 −4 * σ 2 A , 10 −3 * σ 2 A , and 10 −2 * σ 2 A ). The results are given in Tables 2 and 3. The SNPs that had the largest effects for the traits analyzed were located on BTA3 and BTA19 (BFT), BTA6 and BTA14 (CWT), BTA14 (EMA), and BTA 11 (MS) (Supplementary Figure S1). The effects were small, with~98% of the analyzed SNPs having close to zero effects, with the rest having different degrees of genetic contribution to the traits studied. The percentage of SNPs that had the largest effect (10 −3 * σ 2 A and 10 −2 * σ 2 A ) varied between 0.001%-0.04% but they explained only 16.06%-50.83% of the total genetic variance. The effect size estimated are in agreement with previous reports [1,42,70] that reported a large proportion of SNPs contributing a zero to close to zero effect, and the infinitesimal theory. The number of SNPs having a large effect size varied between 5050 (CWT) and 8524 (MS), indicating the polygenic nature of these traits. The effect sizes of the top markers are given in Supplementary Table S2. The SNPs with the largest effect size for the traits analyzed were rs109974824 on BTA19 for BFT, rs109062047 on BTA6 for CWT, rs210030313 on BTA14 for EMA, and rs136161587 on BTA11 for MS.

Gene-Set Enrichment and Protein-Protein Interaction Network Analysis Analyses
Though several regulatory SNPs were found to be significant for CWT, very few SNPs reached the significance threshold for other traits. Therefore, we decided to supplement the GWAS analysis with GSEA and PPIN analyses. Out of the 450,062 SNPs tested in GWAS, 348,088 were located in annotated genes or within 5 Kb windows upstream of the genes. In total, 14,267 background genes were annotated (Figure 1) in the Bos taurus UMD3.1 assembly. A total of 2657, 3064, 2261, and 2362 SNPs had a nominal b < 0.005 (Table 4) for BFT, CWT, EMA, and MS, respectively. They were mapped to 759, 731, 626, and 628 genes (Supplementary Table S2). GSEA showed that 25 GO and 18 KEGG terms were significantly enriched (Table 5). Genes involved in positive regulation of transcription from the RNA polymerase II promoter, Neuron projection development, Phospholipase activity, Extracellular matrix binding, and calcium ion binding ABC transporters were enriched amongst BFT associated SNPs, while regulation of cell proliferation, cell adhesion, the PI3K-Akt signaling pathway, Calcium signaling pathway, and cell cycle were enriched among genes harboring SNPs associated with CWT. Valine, leucine and isoleucine degradation, folate biosynthesis, glycerophospholipid metabolism, choline metabolism, and the insulin receptor signaling pathways were enriched amongst genes harboring SNPs associated with MS. For EMA, the associated SNP-bearing genes for cell cycle, the p53 signaling pathway, cell adhesion molecules, cell adhesion, and blood vessel development were enriched. Since a large number of promoter SNPs were used in this study, we performed a PPIN (protein-protein interaction network) analysis using all the genes previously used for GSEA, to identify significant SNP harboring genes that physically interact ("Functional epistasis"). Sixty-five genes were found to interact (Figure 3a) through the PPIN analysis. These three clusters were found to have a high degree of interaction. Cluster 1 (Figure 3a . Genes in this pathway have been previously found to be associated with growth and carcass traits in cattle [71]. Ubiquitin-mediated proteolysis ensures cell survival through protein turnover, and ubiquitination is also essential for signal transduction, endocytosis, and chromatin rearrangement and repair [72,73]. Functional enrichment analysis of the genes in the cluster showed that they were part of the calcium signaling, spliceosome, and ubiquitin-mediated proteolysis pathways. Genes belonging to the TGF-β signaling pathway, pathways in cancer, PI3KT signaling pathway, and ECM receptor interaction pathways were also enriched ( Figure 3b).

Calcium Signaling Pathway
GSEA and PPIN revealed that calcium-related processes such as calcium ion binding, cellular calcium ion homeostasis, and the calcium signaling pathways were amongst the enriched terms. In total, 109 calcium-related genes were part of the gene set (Table 4, Figure 3). Calcium plays an important role in meat tenderization, feed efficiency, and muscle contraction, and several genes involved in calcium-related processes were also found to affect meat quality in Angus cattle [74,75]. Calcium signaling is also key for regulating muscle growth is beef cattle [76]. Moreover, the calpain/calpastatin system, which is a key regulator of meat tenderness and is associated with carcass and meat quality traits such are tenderness, flavor and juiciness, and marbling score, [77][78][79][80][81] is calcium dependent. Some key calcium signaling pathway genes enriched were AVPR1A (Arginine vasopressin Receptor 1A), CCKAR (Cholecystokinin A), CHRM2 (Cholinergic Receptor Muscarinic 2), EDNRB (Endothelin Receptor Type B), GHRL (Ghrelin and Obestatin Prepropeptide), PROKR1 (Prokineticin Receptor 1), and NMS (Neuromedin-S). Polymorphisms in several of these genes were found to be associated with meat quality and productivity traits [82][83][84].

ECM Receptor Interaction, PI3K-Akt Signaling, and Pathways in Cancer
The extra cellular matrix (ECM) is critical for tissue architecture and is involved in adipogenesis [85]. ECM comprises of a mixture of macromolecules, including glycosaminoglycans and fibrous proteins such as lammin, elastin, collagen, and fibronectin [85]. Several ECM-related terms, such as cell adhesion and extra cellular matrix interaction, were also enriched ( Table 4). ECM receptor interaction has been previously implicated in adipogenesis and meat tenderness and was found to be upregulated in subcutaneous fat and intramuscular fat [86]. The PI3K-Akt signaling pathway plays a central role in controlling skeletal muscle mass and metabolism by increasing protein synthesis together with inhibition of protein degradation [87,88]. Members of the PI3K-Akt signaling pathway were enriched amongst cattle with a larger eye muscle area and also affected their intramuscular fatty acid content [89,90]. Several genes that function in cellular proliferation and cell division were part of pathways in cancer, including GHRL (Ghrelin and obestatin Prepropeptide), polymorphisms which are associated with growth and economically important traits in beef cattle [91,92].

Other Important Pathways and Terms Enriched
Several other important pathways were enriched. These included the insulin receptor signaling pathway, glycerophospohlipid metabolism, choline metabolism, and cell cycle. Insulin has a critical effect on adipogenesis [99]. The genes enriched included FOXO1, a forkhead box transcription factor, which plays an important role in energy metabolism, stress resistance, apoptosis, and cell cycle arrest; polymorphisms in this gene are associated with growth traits in Qinchuan cattle [100].
Glycerophospholipid are a major class of complex lipid with an esterified glycerol backbone, two fatty acids, and a polar head group [101]. Glycerophospholipid metabolism regulates beef fatty acid content and affects beef taste. GPD2 (glycerol-3-phophate dehydrogenase 2), which is involved in glyerophospholipid metabolism, was previously found to be associated with marbling score in Hanwoo cattle [93,102]. Choline is an essential nutrient that improves lipogenesis [103]. PLCG1 (Phospholipase C, gama 1), a gene involved in choline metabolism, was found to be significantly associated with carcass traits in Hanwoo [104].

Conclusions
The pathways and genes identified in this study enrich our standing of the molecular mechanisms underlying complex traits in Hanwoo. The candidate SNPs identified to be associated with the evaluated traits will help in breeding Hanwoo cattle with superior carcass traits. Our result shows that the regulatory SNPs are able to capture a large proportion of the total genetic variation. The genes in the associated pathways identified in this study, such as calcium signaling, ECM receptor signaling, PI3K-Akt signaling, regulation of cell proliferation, insulin signaling, glycerophospholipid, and choline metabolism, might be good candidates for identifying markers that might be associated with carcass traits in cattle. Integrating gene expression data along with the regulatory SNPs used in this study might help in identifying genes and SNPs that have a significant effect on carcass traits.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/3/316/s1, Figure S1: Manhattan plot with estimated genetic variation explained by individual SNPs for BFT, CWT, EMA and MS. Combination of normally distributed variance ranging between 0 to 1% were used for estimating SNP effects using BayesR. Table S1: List of significant SNPs (p < 1 × 10 −5 ) associated with BFT, CWT, EMA and MS, Table S2: List of all SNPs (p < 0.005) associated with BFT, CWT, EMA and MS that were used for GSEA and pathway analysis.