Screening the Reference Genes for Quantitative Gene Expression by RT-qPCR During SE Initial Dedifferentiation in Four Gossypium hirsutum Cultivars that Have Different SE Capability

RNA sequencing (RNA-Seq)-based gene expression analysis is applicable to a wide range of biological purposes in various species. Reverse transcription quantitative PCR (RT-qPCR) is also used to assess target gene expression utilizing stably expressed reference genes as internal control under a given set of conditions. However, investigations of the reference genes for RT-qPCR normalization in the process of somatic embryogenesis (SE) initial dedifferentiation in Gossypium hirsutum are rarely reported. In this study, on the basis of our previous transcriptome data of three different induction stages during SE initial dedifferentiation process in four G. hirsutum cultivars that have different SE capability, 15 candidate genes were selected during SE initial dedifferentiation process, and their expression stability was evaluated by geNorm, NormFinder, and BestKeeper. The results indicated that the two genes of endonuclease 4 (ENDO4) and 18S ribosomal RNA (18S rRNA) showed stable expression in the four different G. hirsutum cultivars, endowing them to be appropriate reference genes during three induction stages in the four cotton cultivars. In addition, the stability and reliability of the two reference genes of ENDO4 and 18S rRNA were further verified by comparing the expressions of auxin-responsive protein 22 (AUX22) and ethylene-responsive transcription factor 17 (ERF17) between RT-qPCR results and the RNA-seq data, which showed strong positive correlation coefficient (R2 = 0.8396–0.9984), validating again the steady expression of ENDO4 and 18S rRNA as the reliable reference genes. Our results provide effective reference genes for RT-qPCR normalization during SE process in different G. hirsutum cultivars.


Introduction
Gene expression level analysis is crucial in many fields of biological research [1]. Currently, RNA sequencing (RNA-Seq) has become the prevalent method used to analyze the transcriptional gene expression levels in various species [2,3]. Reverse transcription quantitative PCR (RT-qPCR) is the widely applied method to quantify gene expression and to validate transcriptomic data [4,5] for its prominent advantages of high sensitivity, reproducibility, and specificity [6]. However, inaccurate quantification and low-quality RNA may lead to a decrease in RT-qPCR specificity [7]. Thus, it is necessary and crucial to screen stably expressed reference genes to normalize target gene expression level under a given set of conditions. Reliable reference genes are usually selected from the stably

Total RNA Extraction and cDNA Synthesis
Total RNA extracted from different cotton tissues were subjected to check the purity using a NanoPhotometer spectrophotometer (IMPLEN, Calabasas, CA, USA), with the concentration and integrity measured by a Qubit®2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and the Bioanalyzer 2100 system (Agilent Technologies, Carlsbad, CA, USA). The cDNA was synthesized from each RNA sample using the PurelinkTM RNA Mini Kit (Life Technologies, Carlsbad, CA, USA) and the PrimerScript RT reagent Kit (Takara, Shiga, Japan) following the manufacturer's protocols. The detailed steps were established according to our previously described method [23].

Selection of Reference Genes and Design of Primers
We performed high throughput RNA-Seq on four cultivars of G. hirsutum at three different induction stages (0h, 3h, and 3d). The four cultivars have different SE differentiation rate. Hypocotyl of 6-d sterile seedlings was cut into 1 cm segments with successive induction at 0 h, 3 h, and 3 d on callus-induction medium (MS medium plus B5 vitamins, supplemented with 0.05 mg/L IAA, 0.05 mg/L kinetin, 0.05 mg/L 2,4-D, pH 5.8) [23]. To evaluate the gene expression stability, q-value≥0.05, FPKM≥5, and |log2FoldChange|<1 were used as the criteria for screening reference genes at all sampling points [24], and 15 reference genes were selected. Gene-specific primers were designed based on the sequences of the 15 genes using the online software NCBI/Primer-BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) according to the following parameters: PCR product size of 100-150 bp, primer melting temperature (Tm) of 57 • C-60 • C, and primer pairs separated by at least one intron on the corresponding genomic DNA [25]. All primer pairs were synthesized by BGI TECH (BGI TECH, Shenzhen, China), and the PCR products were verified with electrophoresis on 1% agarose gels.

RT-qPCR Analysis
RT-qPCR was executed in 96-well plates on a LightCycler®480 Real-Time PCR System (Roche Diagnostics, Mannheim, Germany) with an SYBR Green-based PCR assay. Reactions with a total volume of 10 µL, including 1 µL of a template (first-strand cDNA), 0.4 µL each of 10 µM forward and reverse gene-specific primers, 5 µL of 2×SYBR Premix Ex Taq II (TLi RanseH Plus) (Takara, Dalian, China), and 3.2 µL of ddH 2 O. The RT-qPCR conditions were as follows: initial denaturation at 95 • C for 30 s, followed by 45 cycles of 95 • C for 10 s, 60 • C for 10 s, and 72 • C for 10 s. The RT-qPCR analysis was tested in three biological replicates. Additionally, three technical replicates were used for each RT-qPCR analysis, and the R 2 and amplification efficiency (E) were counted by the standard curves with the diluted series on the basis of the diluted cDNA series (version 1.5, Roche Diagnostics, Mannheim, Germany). The PCR efficiency was detected by the equation (E = (10 [−1/slope] − 1) × 100%) [19].

Data Analysis
The raw Cq (quantification cycle) values are listed in Supplementary Table S1. Three common software programs, geNorm, NormFinder, and BestKeeper, were applied to calculate the expression stability of candidate reference genes.
For geNorm and NormFinder, Cq values were converted into relative quantities according to the formula: 2 −∆Ct (∆Ct = the corresponding Cq value − minimum Cq) [26]. The BestKeeper calculations were directly based on raw Cq values of each gene.

Validation of Reference Genes
To test the stability and reliability of the reference genes, the results of RT-qPCR were compared with the RNA-Seq data. The expression profiles of ERF17 and AUX22 were analyzed and compared with the values of Fragments Per Kilobase of transcript sequence per millions of base pairs sequenced (FPKM) obtained from the RNA-Seq data. The RT-qPCR analysis was conducted in the three biological replicates. Data were analyzed using the Origin 8 software and R package.

Isolation of Candidate Reference Genes in Different G. hirsutum Cultivars
According to our previous RNA-Seq-based transcriptome data of three different induction stages during SE initial dedifferentiation process in four G. hirsutum cultivars that have different SE capability, the screen rules of q-value ≥ 0.05, FPKM ≥ 5, |log2FoldChange| < 1, and relatively lower coefficient of variance (CV) of FPKM were set as the higher criteria at all sampling points to select the candidate reference genes. A total of 15 candidate reference genes were obtained, with the details of all reference genes listed in Supplementary Table S2 and the heatmap of FPKM during three induction stages in four cotton cultivars shown in Figure 1. FPKM-based heatmap of the 15 candidate reference genes during three induction stages in four cotton cultivars. Fifteen candidate reference genes were selected from the Gossypium hirsutum transcriptome datasets with the screening conditions of q-value ≥ 0.05, FPKM ≥ 5, and |log2FoldChange| < 1. Higher or lower FPKM of candidate reference genes were colored by red or green in four cotton cultivars, respectively. The color bar was shown at the upper apex. The visualized heatmap was generated using the R package based on the FPKM from the G. hirsutum transcriptome datasets. The sequences of the 15 candidate reference genes, including 18S rRNA, ENDO4, EF1α, ADP-ribosylation factor 1 (ARF1), ADP-ribosylation factor 2 (ARF2), eukaryotic peptide chain release factor 3A (ERF3A), eukaryotic translation initiation factor isoform 4E-2 (IF4E2), NEDD8 ultimate buster 1 (NUB1), polypyrimidine tract-binding protein homolog 3 (PTBP3), RNA polymerases I, II, and III subunit (RPAB5), transcription initiation factor (IIF), beta subunit (T2FB), TBP-associated factor 11 (TAF11), U-box domain-containing protein (UBE4), ubiquitin carrier protein 7 (UBC7), and ubiquitin fusion degradation 1 (UFD1), were obtained from NCBI GenBank nucleotide sequence database with the detailed information of the gene names and the corresponding accession numbers provided in Table 1.

Verification of Primer Specificity and PCR Amplification Efficiency
The 15 candidate reference genes were amplified by RT-qPCR using cDNA as the template to verify the specificity of the primers. One distinctive peak was observed in each melting curve, indicating that all the specific primers were appropriate for further RT-qPCR detection (Supplementary Figure S1). Gel electrophoresis results indicated that all specific primers could obtain a corresponding PCR product as designed in Table 1 successfully. The RT-qPCR amplification efficiency and R 2 of the candidate reference genes were calculated based on the slopes of the standard curves in the four cotton cultivars, which showed that the RT-qPCR amplification efficiency ranged from 96.13% (TAF11) to 120.30% (UBE4), and the R 2 values ranged from 0.9677 to 1.0977 ( Table 1), suggesting that the primers had high specificity and amplification efficiency and were appropriate for RT-qPCR.

Expression Profile Analysis of the Candidate Reference Genes at Different Induction Stages in Different Cotton Cultivars
RT-qPCR was used to measure the gene expression levels of the 15 candidate reference genes at three different induction stages in the four different cotton cultivars. Gene expression level was calculated based on the quantification cycle (Cq) that referred to the amplification-related fluorescence to reach a specific threshold level of test, with a smaller Cq representing a higher gene expression.  Table S1). The most highly expressed gene was ERF3A in the X42 cultivar at the induction stage of 3 d, while END04 showed the lowest level in R15 at the induction stage of 0 h. Additionally, 18S rRNA showed the least gene expression variation, while UBE4 showed the greatest variation among the different cotton cultivars.

Expression Stability Analysis of Reference Genes
To identify the stably expressed gene for cotton RT-qPCR normalization, the gene expression stability was assessed using three publicly available statistical tools, geNorm, NormFinder, and BestKeeper. geNorm can be used to check the most suitable number of reference genes by estimating M value that corresponds to the stability, with M = 1.5 or <1.5 indicating the standard or increased degree of stability of the candidate reference genes [19]. As shown in Table 2 and Figure 3, the results of the geNorm analysis showed that 18S rRNA, END04, and ARF1 were the most suitable reference genes in the four different cotton cultivars at three induction stages, while UBC7 and UBE4 were the least suitable reference genes. For the cotton cultivar YZ1, ARF1 had the lowest M values, and TAF11 had the highest M values under the same conditions. END04, IF4E2, and 18S rRNA in R15, END04, ARF2, and ARF1 in X33, and ARF2, T2FB, and UFD1 in X42 were detected as the more stably expressed genes, respectively. The variation V/(V n/n+1 ) between gene pairs can also establish the optimal number of reference genes for normalization. Generally, V n/n+1 is the cut-off value. If V n/n+1 = 0.15, the optimal number of reference genes for accurate normalization should be optimal n+1; if V n/n+1 < 0.15, the optimal number of reference genes should be n [27]. As shown in Figure 3, the V 2/3 value was below 1.5; in conclusion, according to the geNorm analysis, the most stable reference genes were 18S rRNA and END04 in four cotton cultivars.   The NormFinder software is a Visual Basic application tool that establishes the expression stability of reference genes based on the stability value (Sv). The NormFinder analysis differs slightly from geNorm analysis, which takes into account intra-and inter-group variation when calculating the normalization factor (NF) [20]. Genes that have lower levels than an average of expression stability are more stable, and thus may be suitable as reference genes. As shown in Table 3, the results of NormFinder analysis were highly consistent with those of the geNorm analysis. The most stable genes were 18S rRNA, END04, and ARF1 in four cotton cultivars. PRAB5, ARF1, and 18S rRNA were the most stable genes in YZ1. END04, IF4E2, and 18S rRNA had high expression stabilities in R15. For the cotton cultivar X33, END04, ARF2, and 18S rRNA appeared to have the most stable expressions. ARF2, T2FB, and UFD1 were highly stably expressed in X42. The BestKeeper software can be used to analyze the stability and expression of reference genes based on the coefficient of variance (CV), standard deviation (SD), and correlation coefficient (R) [21]. Reference genes are considered to be stable if R is high, but SD and CV are low. Additionally, if SD > 1, the gene is considered to be unacceptable [9]. As shown in Table 4, END04, 18S rRNA, and ARF1 were relatively stable expressed genes under all conditions. TAF11 was an unstable gene with a low CV± SD value and lower R value. For R15 cotton cultivar, 18S rRNA, UBC7, and ARF1 were identified as relatively suitable reference genes. END04, 18S rRNA, and PTBP3 in X33 and T2FB, ARF2, and EDD04 in X44 were the more stably expressed genes and could be considered as acceptable reference genes with high R values and low CV ± SD values. These results of comprehensive consideration of the geNorm, NormFinder, and BestKeeper analyses indicate that 18S rRNA and ENDO4 could be appropriate reference genes.

Expression Stability Validation of the Reference Genes of 18S rRNA and ENDO4
Through the identity analysis between the RT-qPCR results and the RNA-Seq data, two target genes, AUX22 and ERF17, were selected to further test the stability and reliability of the validated reference genes of 18S rRNA and ENDO4. The relative expression levels of AUX22 and ERF17 were calculated based on the validated reference genes and then were compared with the relative expression levels and the FPKM values of these reference genes in the RNA-Seq data. The 18S rRNA and ENDO4 were used as reference genes for the normalization of the target gene expression in the four cotton cultivars (Figure 4). Strong positive correlation coefficients (R 2 = 0.8396-0.9984) between the RT-qPCR results and the RNA-Seq data, and similar patterns between relative expression profiles and the FPKM values, were also observed ( Figure 5). The results indicated that the 18S rRNA and ENDO4 were suitable as the reference genes to obtain the reliable RT-qPCR results in the four cotton cultivars.

Discussion
Gene expression analysis is an effective basic method to predict plant gene function [28]. RT-qPCR is also widely used for gene expression detection using proper reference gene as an internal control [6]. Whereas, the expression of the reference gene is unavoidably influenced by different tissues and treatments, thus leading to unreliable results [29]. Therefore, it is essential to select valid and reliable reference genes as internal controls for normalization to ensure the reliability and accuracy of RT-qPCR reactions, without consideration of different experimental conditions [8]. In this study, on the basis of our high throughout RNA-Seq data of four G. hirsutum cultivars at three induction stages, we analyzed 15 candidates as reference genes according to the conditions of |log2FoldChange|<1 and a small coefficient of variation (CV) of FPKM values (Figure 1), which are widely considered parameters for choosing candidate reference genes [30].
Three public statistical algorithms, geNorm, NormFinder, and BestKeeper, can be used to determine the stability of reference gene expression [19][20][21]. Some studies of the two programs of geNorm and NormFinder were mostly used to identify candidate reference genes, showing the results that differed slightly in Oxytropis ochrocephala [30]. Several reports also showed that the NormFinder results were consistent with geNorm results in a variety of organisms, including the Sapium sebiferum [31], Chrysanthemum morifolium and Chrysanthemum lavandulifolium [32], Oryza sativa [33], and Rhododendron molle [34]. In this study, the ranking of NormFinder was relatively consistent with the results of geNorm analysis (Tables 2 and 3). However, the results differed from those obtained using BestKeeper, indicating that UBC7 was the most reliable reference gene in R15 analyzed by BestKeeper, but was the least reliable one analyzed by geNorm and NormFinder (Table 4). The divergent ranking may be caused due to the different algorithms [35]. By BestKeeper analysis, most studies utilized CV and SD of the Cq values, as well as the R values, to evaluate the stability and expression of reference genes [2,34,35]. TAF11 showed the lowest CV± SD values across all cultivars with the lowest R values in YZ1 and X42 (Table 4) and was identified as the least reliable gene by geNorm and NormFinder ( Figure 3 and Table 3). Through integrated analysis of the three programs, better accuracy for each reference gene can be obtained, and our results showed that 18S rRNA and ENDO4 were appropriate and reliable candidates as reference genes for RT-qPCR normalization in the four cotton cultivars under the condition of three different induction stages. ENDO4 encodes a putative endonuclease without enzyme activity, and 18S rRNA is part of the ribosomal RNA that constitutes the basic components of the eukaryotic cells. Ribosomal genes are often recognized as suitable reference genes [9] but are observed as the least stable expressions in peaches [36]. Generally, EF1α and UBCs were often used as reference genes in plants, such as Platycladus orientalis and Caragana intermedia [3,37], while EF1α, UBC7, and UBE4 showed unstable expressions in the different cotton cultivars (Tables 2 and 3).
The selected reference genes based on RNA-Seq were often further verified by RT-qPCR [38]. The RT-qPCR results were compared with that of RNA-Seq data to further validate the stability of ENDO4 and 18S rRNA expression, with the appearance of ENDO4 and 18S rRNA as reference genes in the four cotton cultivars showing strong positive R 2 between the RT-qPCR results and the RNA-Seq data ( Figure 4). When 18S rRNA and ENDO4 were used as the reference genes in the four cotton cultivars, the expression patterns of ERF17 and AUX22 indicated similar correlation between the RT-qPCR results and the RNA-Seq data ( Figure 5), suggesting that ENDO4 and 18S rRNA are reliable reference genes for studies of gene expression normalization in these four cultivars of G. hirsutum under three different induction stages.

Conclusion
We selected 15 candidate reference genes for RT-qPCR normalization in different cotton cultivars at three different induction stages according to the transcriptome datasets of G. hirsutum. After assessment of the 15 candidates by three statistical software geNorm, NormFinder, and BestKeeper, two genes of ENDO4 and 18S rRNA were identified as appropriate reference genes during three induction stages in the four cotton cultivars. The results of further validation of 18S rRNA and ENDO4 genes by comparing the RT-qPCR results and the RNA-Seq data showed a strong positive correlation, indicating that ENDO4 and 18S rRNA are reliable reference genes for studies of gene expression normalization. Our results identify necessary and appropriate reference genes as internal controls for RT-qPCR normalization and provide an effective basis for accurate quantification of target gene expression in different G. hirsutum cultivars.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/10/7/497/s1, Figure S1: Melting curves of the 15 selected reference genes, Table S1: Cq value of the 15 candidate reference genes, Table S2: The details of the 15 candidate reference genes based on RNA-Seq data.

Conflicts of Interest:
The authors declare no conflict of interest.