The Complete Plastid Genome of Rhododendron pulchrum and Comparative Genetic Analysis of Ericaceae Species

Abstract: Background and Objectives: Rhododendron pulchrum Sweet (R. pulchrum) belongs to the genus Rhododendron (Ericaceae) and is a valuable horticultural and medicinal plant species widely used in Western Europe and the US. Despite its importance, this is the first member to have its cp genome sequenced. Materials and Methods: In this study, the complete chloroplast (cp) genome of R. pulchrum was sequenced on the Illumina HiSeq 2500 NGS platform, analyzed, and compared with those of eight species in the Ericaceae family. Results: Our study reveals that the cp genome of R. pulchrum is 136,249 bp in length, with an overall GC content of 35.98% and no inverted repeat regions. The R. pulchrum chloroplast genome encodes 73 genes, including 42 protein-coding genes, 29 tRNA genes, and two rRNA genes. The synonymous (Ks) and nonsynonymous (Ka) substitution rates were estimated and the Ka/Ks ratios of R. pulchrum plastid genes were categorized; the results indicated that most of the genes have undergone purifying selection. A total of 382 forward and 259 inverted long repeats, as well as 221 simple-sequence repeat (SSR) loci, were detected in the R. pulchrum cp genome. Comparison between different Ericaceae cp genomes revealed significant differences in genome size, structure, and GC content. Conclusions: The phylogenetic relationships among eight Ericaceae species suggested that R. pulchrum is closely related to Vaccinium oldhamii Miq. and Vaccinium macrocarpon Aiton. This study provides a theoretical basis for species identification and future biological research of Rhododendron resources.
were and 0.05, while the values of rrn23, trnS-GCU, trnV-GAC, rrn5 and rps19 were higher than 0.05. The results indicate that, in general, the nucleotide diversity among the nine Ericaceae species is high.


Introduction
Rhododendron species belong to the family Ericaceae and have been widely used as valuable horticultural and medicinal plants in China, Western Europe, the US, and Japan because of their attractive vegetative forms and brightly colored flowers [1,2]. The family Ericaceae consists of nine subfamilies and includes exclusively autotrophic species, fully mycoheterotrophic (MH) species, and partially MH species [3]. The transition from autotrophy to heterotrophy is associated with drastic changes in plant morphology (such as the loss and/or reduction of vegetative organs) [4,5], physiology (for example, loss of chlorophyll and high stomatal conductivity) [5,6], and genome (including rampant sequence divergence and gene loss) [7,8]. The chloroplast genome has been widely employed for studying the transition from autotrophy to heterotrophy because of its conserved size, gene content, linear gene order, and structure.
Rhododendron contains between 600 and 1000 species and is among the largest genera of Ericaceae [9]. Previous Rhododendron classification was mainly based on phenotypic characteristics, and the classification systems proposed by David Chamberlain (Edinburgh Botanic Garden, UK) and

Whole Plastid Genome Comparison
To investigate the phylogenetic position of R. pulchrum, MUMmer [38] was used for pairwise sequence alignment of the cp genomes, and the mVISTA program (http://genome.lbl.gov/vista/mvista) [39] was employed to compare the complete cp genome of R. pulchrum with those of eight related species whose cp genomes have been sequenced. These eight species were divided into two groups at the genus level, including Vaccinieae (Vaccinium macrocarpon, NC_019616.1

Phylogenetic Analysis
The Ericaceae cp genomes were obtained from the Organelle Genome and Nucleotide Resources database on NCBI. The sequences were initially aligned using MAFFT [36] (version 7, https://mafft.cbrc.jp/alignment/software/), and the resulting multiple sequence alignment was visualized and manually adjusted in BioEdit [40]. Actinidia deliciosa C.F.Liang & A.R.Ferguson (NC_026691.1) and Actinidia chinensis Planch (NC_026690.1) were used as outgroups. The phylogenetic tree was constructed using the GTRGAMMA model implemented in RAxML.

Features of the R. pulchrum cp Genome
A total of 3.68 Gb of clean data, consisting of 12.28 million paired-end reads, was produced. All reads were deposited in the NCBI Sequence Read Archive (SRA) under accession number MN182619. The complete R. pulchrum cp genome is 136,249 bp in length (Figure 1) and does not take the typical quadripartite form because it lacks inverted repeats (IRs). The overall GC content of the R. pulchrum cp genome is 35.98% (Table 1), and we identified 73 functional genes, including two rRNA genes, 29 tRNA genes, and 42 protein-coding genes (Table 2). We then estimated codon usage frequency based on the protein-coding and tRNA genes. As shown in Figure 2, the cp genome was composed of 8,693 codons (65 different types) encoding 20 amino acids, among which leucine (Leu) was the most frequently used amino acid (948 codons, 10.90%) and cysteine (Cys) was the least abundant (76 codons, 0.87%) (Table S1). The results suggest that the R. pulchrum cp genome prefers synonymous codons ending in A or U, which have a relative synonymous codon usage (RSCU) value > 1.
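The RSCU statistic mentioned above can be illustrated with a short sketch. It is not the pipeline used in this study; it assumes the standard codon assignments, excludes stop codons, and takes the usual definition of RSCU as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. The function name `rscu` and the toy input are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard codon assignments in TCAG order (assumed here; stop codons are "*").
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_sequences):
    """Return {codon: RSCU} for in-frame coding sequences (hypothetical helper)."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    # Group sense codons into synonymous families.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU undefined
        expected = total / len(codons)  # count expected under equal usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy input: four alanine codons, three GCT and one GCA.
toy = rscu(["GCTGCTGCTGCA"])
print({codon: toy[codon] for codon in ("GCT", "GCC", "GCA", "GCG")})
```

A codon with RSCU > 1 is used more often than expected under equal usage of its synonyms, which is the sense in which A- or U-ending codons are described as preferred.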

Ka/Ks Analysis of Base Variation
To test whether the remaining cp genes in R. pulchrum have undergone selection, we estimated the synonymous (Ks) and nonsynonymous (Ka) substitution rates (Table S2). The Ka/Ks ratios were then categorized, with Ka/Ks < 1, Ka/Ks = 1, and Ka/Ks > 1 denoting purifying, neutral, and positive selection, respectively, in the context of a codon substitution model. According to our results, only two genes, rps15 and psbZ, underwent positive selection in comparison with the eight Ericaceae species (Table S2). By contrast, most of the remaining genes appear to have undergone purifying selection, as evidenced by a Ka/Ks ratio below 1 and the presence of negatively selected sites within some genes. For the transition of Ericaceae from exclusively autotrophic (R. pulchrum, V. macrocarpon and V. oldhamii) to heterotrophic (Monotropeae, six species), we found signs of purifying selection in rpl32, which is the only gene annotated across this transition, in four heterotrophic species (Allotropa virgata, Hemitomes congestum, Monotropsis odorata, and Pityopus californicus); the rpl32 gene was absent from the other two heterotrophic species, Monotropa hypopitys and Monotropa uniflora.
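The categorization rule stated above can be written as a small function. This is a minimal sketch rather than the software used to produce Table S2: the name `classify_selection`, the tolerance used to decide whether a ratio equals 1, and the numeric inputs in the usage lines are illustrative assumptions, not values from the study.

```python
import math


def classify_selection(ka, ks, rel_tol=1e-9):
    """Map Ka and Ks to a selection category (hypothetical helper).

    Returns "NA" when the ratio is undefined (missing value or Ks == 0),
    mirroring the NA entries mentioned for Table S2.
    """
    if ka is None or ks is None or ks == 0:
        return "NA"
    ratio = ka / ks
    # Exact equality is fragile with floats, so a small tolerance is assumed.
    if math.isclose(ratio, 1.0, rel_tol=rel_tol):
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


# Illustrative inputs only; these are not values from Table S2.
print(classify_selection(0.02, 0.10))
print(classify_selection(0.30, 0.10))
print(classify_selection(0.10, 0.0))
```

In practice a ratio is rarely exactly 1, so the choice of tolerance (or a statistical test against neutrality) determines how many genes fall in the neutral category.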

Long-Repeat and SSR Analysis
A total of 576 long repeats were identified in the R. pulchrum cp genome, including 382 forward (F) and 259 inverted (I) repeats (Table S3). The long repeats varied substantially in length: we found 460 (79.86%), 98 (17.01%), and 18 (3.13%) repeats of 15-30 bp, 30-100 bp and 100-1000 bp, respectively. The longest forward repeat was 951 bp in length and was identified at a single location involving the ycf3 gene and the intergenic space.
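The percentage shares quoted for the three length classes follow directly from the counts, as the sketch below shows. The bin counts are taken from the text; the helper name `percent_shares` and the choice of half-up rounding (needed to reproduce 3.13% from the exact value 3.125%) are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP

# Counts per length class, as quoted in the text.
BIN_COUNTS = {"15-30 bp": 460, "30-100 bp": 98, "100-1000 bp": 18}


def percent_shares(bin_counts):
    """Return each bin's share of the total, rounded half-up to two decimals."""
    total = sum(bin_counts.values())
    return {
        label: (Decimal(n) * 100 / Decimal(total)).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP
        )
        for label, n in bin_counts.items()
    }


shares = percent_shares(BIN_COUNTS)
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

Python's built-in `round` uses round-half-to-even, so it would give 3.12 for the last class; `Decimal` with `ROUND_HALF_UP` is used here to match the figure as reported.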

Comparative Analysis of Gene Content and Genome Structure
Differences in cp sequences can help to infer gene flow between species [41]. Thus, the complete R. pulchrum cp genome was compared with those of exclusively autotrophic and mycoheterotrophic species of Ericaceae. As shown in Table S5, the gene content of R. pulchrum is quite distinct from that of the other eight Ericaceae species. Whereas 9-12 rps genes were found among the other eight species, only two (rps15 and rps19) were found in R. pulchrum. rps15 was annotated only in the three exclusively autotrophic species (R. pulchrum, V. macrocarpon and V. oldhamii); rps19 was annotated in all nine Ericaceae species. Similarly, 5-8 rpl genes were found among the other eight species but only one (rpl32) in R. pulchrum, and four rrn genes were found among the other eight species but only two (rrn23 and rrn5) in R. pulchrum. However, the gene content of R. pulchrum also shares characteristics with the other eight species. For example, no rpo genes coding for RNA polymerase subunits were found in R. pulchrum or the six species of Monotropeae, while four were found in the two species of Vaccinieae; 28-30 trn genes coding for transfer RNAs were found in the three exclusively autotrophic species (R. pulchrum, V. macrocarpon and V. oldhamii), while only 15-18 were found in the six species of Monotropeae. Photosynthesis-related genes (psb, psa, pet, atp, rbc, ndh) were lost in the cp genomes of the six nonphotosynthetic species of Monotropeae. Moreover, 37, 41 and 44 photosynthesis-related genes were found in the genomes of R. pulchrum, V. macrocarpon and V. oldhamii, respectively, with 34 of these genes in common. Four genes (atpF, ndhG, ndhK and psbZ) were lost in V. macrocarpon, and seven genes (atpF, ndhB, ndhD, ndhF, petA, psaI and psaJ) were lost in R. pulchrum.
Besides self-replication and photosynthesis-related genes, eight other genes (accD, ccsA, cemA, clpP, infA, lhbA, matK and rp3) were found in the other eight Ericaceae species. Of these, only one (ccsA) was found in the genome of R. pulchrum, and three (accD, clpP and infA) were found only in the six nonphotosynthetic species of Monotropeae.
As shown in Figure 4, Figure 5 and Figure S1, the R. pulchrum cp genome is quite distinct from those of the eight Ericaceae species. Specifically, the R. pulchrum cp genome is more divergent from the mycoheterotrophic Ericaceae species, and sequence diversity is higher in the noncoding than in the coding regions (Figures 4 and 5). Only one gene (trnL-CAU) in R. pulchrum exhibited higher similarity to its counterparts in the six mycoheterotrophic Ericaceae species (Figure 4A). By contrast, twenty R. pulchrum cp genes showed relatively higher similarity to those of the exclusively autotrophic species of Ericaceae (Figure 4B). Together, these data reveal a high level of genetic variation among species of different genera within Ericaceae, especially between mycoheterotrophic (Monotropeae) and exclusively autotrophic species (Vaccinieae and Rhododendron).
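Gene-content comparisons of the kind reported in this section (genes shared by all species, genes found in only one) reduce to set operations on per-species gene lists. The sketch below illustrates the idea with made-up placeholder data; the species labels and gene sets are hypothetical and do not reproduce Table S5, and `shared_and_unique` is an illustrative name rather than a tool used in the study.

```python
def shared_and_unique(gene_sets):
    """Return (genes present in every species, {species: genes found only there})."""
    shared = set.intersection(*gene_sets.values())
    unique = {}
    for name, genes in gene_sets.items():
        # Union of every other species' genes, then subtract it.
        others = set().union(*(g for n, g in gene_sets.items() if n != name))
        unique[name] = genes - others
    return shared, unique


# Hypothetical toy annotation, NOT the content of Table S5.
toy_annotation = {
    "species_A": {"psbA", "psbZ", "rbcL", "atpA"},
    "species_B": {"psbA", "rbcL", "atpA", "atpF"},
    "species_C": {"psbA", "psbZ", "rbcL", "atpF", "ndhB"},
}

shared, unique = shared_and_unique(toy_annotation)
print("shared by all:", sorted(shared))
for name, genes in unique.items():
    print(name, "only:", sorted(genes))
```

Applied to real annotations, the same two operations would yield counts such as the number of photosynthesis-related genes common to the three autotrophic species, provided gene names are normalized first (for example, consistent capitalization of psbZ and petA).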

The Phylogenetic Tree of Ericaceae
Phylogenetic analysis was performed based on an alignment of concatenated nucleotide sequences of all ten angiosperm cp genomes (Figure 6). A phylogenetic tree was built using the GTRGAMMA model and the Bayesian inference (BI) method based on RAxML, with Actinidia deliciosa and Actinidia chinensis as outgroups. All relationships inferred from these cp genomes were well supported, with support values ranging between 83 and 100. It is worth noting that the nine species from family Ericaceae did not form a single clade. Six heterotrophic species of Monotropeae clustered into one clade, and the three exclusively autotrophic Ericaceae species (R. pulchrum, V. oldhamii and V. macrocarpon) formed another (Figure 6).

Discussion
The complete cp genome of R. pulchrum differs significantly from those of the other eight Ericaceae species with regard to genome size, structure, GC content, and gene structure, but is similar to those of the two Vaccinieae species. Compared with exclusively autotrophic Ericaceae species (photosynthetic, Vaccinieae and Rhododendron), the cp genomes of mycoheterotrophic species (nonphotosynthetic, Monotropeae) are substantially smaller in size (ca. 33-41 kb) and reduced in gene content [10,11,42]. The R. pulchrum cp genome was found to have 73 functional genes, whereas 110 and 133 genes were annotated in those of Vaccinium macrocarpon and Vaccinium oldhamii, respectively, and 40-45 genes were annotated in species of Monotropeae, suggesting that Ericaceae cp genomes are highly variable. The gene content of R. pulchrum is quite distinct from that of the other eight Ericaceae species in that most of the self-replication-related genes (rps, rpl, rrn) are missing, but similar in that it has maintained most of the trn self-replication genes. Interestingly, both R. pulchrum and the six Monotropeae species lack the rpo genes that are present in the two Vaccinieae species. Photosynthesis-related genes (psb, psa, pet, atp, rbc, ndh) were lost in the cp genomes of nonphotosynthetic Ericaceae species [10], while 37, 41 and 44 photosynthesis-related genes were found in the genomes of R. pulchrum, V. macrocarpon and V. oldhamii, respectively. These observations support the position of Rhododendron between Vaccinieae and Monotropeae during evolution. According to published data, the plastid genomes of most photosynthetic land plants are conserved in size (140-160 kb) and display the typical quadripartite structure comprising the large single-copy (LSC) region, the small single-copy (SSC) region, and two IRs [14,19,20]. However, our results indicate that the R. pulchrum cp genome lacks the IRs, a feature that has also been reported in Monotropeae [10], Medicago [43,44] and Erodium (Erodium carvifolium, HQ713469.1).
The R. pulchrum cp genome has an overall GC content of 35.98%, which is lower than that of the exclusively autotrophic Ericaceae species Vaccinium oldhamii (36.75%) and Vaccinium macrocarpon (36.80%) but higher than that of the heterotrophic Ericaceae species (Monotropeae), including Pityopus californicus (34.37%), Monotropa hypopitys (34.31%), Hemitomes congestum (33.71%), Allotropa virgata (33.09%), Monotropsis odorata (31.20%), and Monotropa uniflora (28.47%). In plants, GC content is often associated with the degree of primitiveness of a taxon [45], whereas in MH species it is often related to the level of plastome degradation [7]. Further, consistent with observations in Q. acutissima [46], the ycf3 and trnL-UAA genes in R. pulchrum have the largest and the smallest intron, respectively. The ycf3 gene has been reported to be necessary for the stable accumulation of photosystem I complexes [47], and intron gain is usually considered to be closely related to the evolution of photosynthesis [46]. This result indicates that ycf3 is likely a key player in photosynthesis in R. pulchrum. Thus, the complete cp genomes of Vaccinieae, Monotropeae, and Rhododendron are useful for studying the evolution of Ericaceae species, photosynthetic gene degradation, and the phylogenetic relationships among Ericaceae species.
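The GC comparison above can be reproduced in a few lines. The sketch below defines GC content as the percentage of G and C among the A, C, G and T bases of a sequence (ambiguous symbols are ignored, which is an assumption), and then ranks the values exactly as quoted in the preceding paragraph. The function name `gc_content` is illustrative and is not taken from the study's tooling.

```python
def gc_content(sequence):
    """Return GC content (%) of a nucleotide string, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


# GC values (%) as quoted in the paragraph above.
REPORTED_GC = {
    "Vaccinium macrocarpon": 36.80,
    "Vaccinium oldhamii": 36.75,
    "Rhododendron pulchrum": 35.98,
    "Pityopus californicus": 34.37,
    "Monotropa hypopitys": 34.31,
    "Hemitomes congestum": 33.71,
    "Allotropa virgata": 33.09,
    "Monotropsis odorata": 31.20,
    "Monotropa uniflora": 28.47,
}

# Rank species from highest to lowest GC content.
ranked = sorted(REPORTED_GC.items(), key=lambda item: item[1], reverse=True)
for species, gc in ranked:
    print(f"{species}: {gc:.2f}%")
```

Sorting the reported values places R. pulchrum directly below the two Vaccinium species and above all six Monotropeae species, which is the ordering the paragraph describes.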
In this study, a total of 221 cp SSRs were identified in R. pulchrum, more than has been reported for other Ericaceae species (62) [11]. This may be a result of the complex genetic background of Rhododendron. Meanwhile, the cpSSRs identified here rarely contain tandem G or C repeats, which is in line with published data [48]. These cpSSR markers offer an alternative to morphological features and nuclear DNA markers in defining Rhododendron subsections. The cpSSRs obtained in this study can be employed for determining genetic structure, as well as for studies of diversity, differentiation, and maternity in R. pulchrum and its related species.
Compared with the eight Ericaceae species, only two genes, rps15 and psbZ, which encode a small-subunit ribosomal protein and a protein of the photosystem II core complex, respectively, showed signs of positive selection. Co-annotated genes were found in all eight Ericaceae species during the transition of Ericaceae from exclusively autotrophic (Rhododendron, R. pulchrum) to heterotrophic (Monotropeae, six species). In addition, we observed a high level of genetic diversity among Ericaceae genera, especially between mycoheterotrophic (Monotropeae) and exclusively autotrophic species (Vaccinieae and Rhododendron), which is consistent with the genomic rearrangement events in Ericaceae plants reported by previous studies [11]. In general, the cp genomes of species from the same family are conserved [23], and cp sequences have been successfully applied to phylogenetic studies of angiosperms [49,50]. Our phylogenetic analysis clustered the nine Ericaceae species (from three genera) and two outgroup species into three distinct phylogenetic groups: one outgroup (Actinidia deliciosa and Actinidia chinensis), one exclusively autotrophic group (R. pulchrum, Vaccinium macrocarpon, and V. oldhamii), and one heterotrophic group (Monotropa hypopitys, Allotropa virgata, Hemitomes congestum, Monotropa uniflora, Monotropsis odorata, and Pityopus californicus). In addition, the cp genome of R. pulchrum is closely related to those of Vaccinium oldhamii and Vaccinium macrocarpon. Taken together, our results reveal the potential of the Ericaceae family as suitable material for studying plastid genome development and evolution.

Conclusions
In this study, we report the characterization of the complete cp genome of R. pulchrum, a horticultural and medicinal species cultivated worldwide. The R. pulchrum cp genome displays unique characteristics compared with species from other genera in Ericaceae, especially Monotropeae. The phylogenetic analysis revealed that R. pulchrum is closely related to Vaccinium oldhamii and Vaccinium macrocarpon. The complete cp genome assembly of R. pulchrum provided in this research is valuable for future population genomic and phylogenomic studies, and will benefit Rhododendron conservation and utilization.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/11/2/158/s1. Table S1: Codon-anticodon recognition patterns and codon usage of the R. pulchrum cp genome. Table S2: Ka/Ks ratios of the cp genes from R. pulchrum and related species (NA = not available). Table S3: Long repeats identified from the R. pulchrum cp genome (F = forward, I = inverted repeat; IGS = intergenic space). Table S4: Simple sequence repeats (SSRs) identified in the R. pulchrum cp genome. Table S5: Annotated genes of the nine species of Ericaceae cp genomes.