Characterization and Comparison of Two Complete Plastomes of Rosaceae Species (Potentilla dickinsii var. glabrata and Spiraea insularis) Endemic to Ulleung Island, Korea

Potentilla dickinsii var. glabrata and Spiraea insularis in the family Rosaceae are species endemic to Ulleung Island, Korea, the latter of which is listed as endangered. In this study, we characterized the complete plastomes of these two species and compared them with previously reported plastomes of other Ulleung Island endemic species of Rosaceae (Cotoneaster wilsonii, Prunus takesimensis, Rubus takesimensis, and Sorbus ulleungensis). The highly conserved complete plastomes of P. dickinsii var. glabrata and S. insularis are 155,524 and 158,637 base pairs with GC contents of 37% and 36.9%, respectively. Comparative phylogenomic analysis identified three highly variable intergenic regions (trnT-UGU/trnL-UAA, rpl32/trnL-UAG, and ndhF/rpl32) and one variable genic region (ycf1). Only six of the 75 protein-coding genes have been subjected to strong positive selection. Phylogenetic analysis of 23 representative plastomes within the Rosaceae supported the monophyly of Potentilla and the sister relationship between Potentilla and Fragaria and indicated that S. insularis is sister to a clade containing Cotoneaster, Malus, Pyrus, and Sorbus. The plastome resources generated in this study will contribute to elucidating the plastome evolution of insular endemic Rosaceae on Ulleung Island and to assessing the genetic consequences of anagenetic speciation for various endemic lineages on the island.

In this study, we characterized the complete chloroplast genome sequences of Potentilla dickinsii var. glabrata (subfamily Rosoideae) and Spiraea insularis (subfamily Amygdaloideae), which are the only Rosaceae species endemic to Ulleung for which plastomes are yet to be sequenced, and compared these with the plastomes of the aforementioned four Ulleung Rosaceae species. The comparative analysis of these six plastomes will shed light on the plastome structure and evolution of endemic insular species in the family Rosaceae, which have evolved through the speciation mechanism of anagenesis. We anticipate that further analyses of these plastome sequences will enable us to identify hotspot regions that contribute to determining population genetic diversity and structure, thereby allowing us to assess genetic differences between pairs of continental progenitor and insular derivative species.

Genome Size and Features
The plastome of P. dickinsii var. glabrata is 155,524 bp long and comprises a large single-copy (LSC) region of 85,213 bp, a small single-copy (SSC) region of 18,657 bp, and two inverted repeat (IR) regions of 25,827 bp each. The complete plastome of S. insularis is slightly larger, at 158,637 bp, and comprises an LSC region of 86,997 bp, an SSC region of 18,910 bp, and two IR regions of 26,365 bp each (Figures 1 and 2, and Table 1). The plastomes of P. dickinsii var. glabrata and S. insularis contain 131 and 132 genes, respectively, with the difference in gene number being attributable to the presence of an rps19 pseudogene in S. insularis. Both plastomes contain 84 protein-coding, eight ribosomal RNA, and 37 transfer RNA genes. The overall guanine-cytosine (GC) contents of the P. dickinsii var. glabrata and S. insularis plastomes are 37.0% and 36.91%, respectively (Table 1). Of the six Rosaceae endemic to Ulleung Island, C. wilsonii has the longest plastome (159,997 bp), whereas P. dickinsii var. glabrata has the shortest. The plastomes of R. takesimensis and Sorbus ulleungensis have the highest (37.1%) and lowest (36.5%) GC contents, respectively. The plastomes of both P. dickinsii var. glabrata and S. insularis contain a total of 17 genes duplicated in the IR regions (seven tRNA, four rRNA, and six protein-coding genes). Fifteen genes (ndhA, ndhB, petB, petD, rpl2, rpl16, rpoC1, rps12, rps16, trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) contain a single intron, whereas clpP and ycf3 each contain two introns. The plastomes of P. dickinsii var. glabrata and S. insularis contain a partial ycf1 gene of 1227 and 1301 bp, respectively, located in the IRb/SSC junction region, whereas a complete ycf1 gene of 5808 and 5613 bp, respectively, is located at the SSC/IRa junction. The infA gene located in the LSC region of the P. dickinsii var. glabrata and S. insularis plastomes has become a pseudogene.
Interestingly, the highly conserved group II intron of atpF has been lost in P. dickinsii var. glabrata, as has previously been observed in the Rubus species R. boninensis, R. crataegifolius, R. takesimensis, and R. trifidus [47,48,60]. Among representative plastomes of species in the Rosaceae, those of Potentilla, Fragaria, Rosa, and Rubus in the subfamily Rosoideae all show atpF intron loss, whereas the plastomes of Cotoneaster, Malus, Prunus, Pyrus, and Sorbus in the subfamily Amygdaloideae retain intron-containing atpF genes (Table 1). On the basis of the current phylogenetic framework, it appears that loss of the atpF intron has occurred only once, in the subfamily Rosoideae; however, it remains to be determined whether this loss has also occurred in other lineages of Rosoideae, as well as in a broader phylogenetic framework that includes the rest of the Rosaceae and the rosids.
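The region lengths reported above can be checked against the quadripartite plastome structure, in which the total length equals the LSC plus the SSC plus two copies of the IR. The following is a minimal sketch of that arithmetic; the function name `quadripartite_length` and the dictionary layout are illustrative assumptions, and the numbers are those given in the text.

```python
# Region lengths (bp) as reported in the text for the two new plastomes.
plastomes = {
    "P. dickinsii var. glabrata": {"LSC": 85_213, "SSC": 18_657, "IR": 25_827, "total": 155_524},
    "S. insularis": {"LSC": 86_997, "SSC": 18_910, "IR": 26_365, "total": 158_637},
}


def quadripartite_length(lsc, ssc, ir):
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


for name, p in plastomes.items():
    computed = quadripartite_length(p["LSC"], p["SSC"], p["IR"])
    status = "matches" if computed == p["total"] else "differs from"
    print(f"{name}: {computed} bp ({status} the reported {p['total']} bp)")
```

Both sums agree with the reported totals, which also confirms that P. dickinsii var. glabrata has the shorter of the two plastomes.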
The frequency of codon usage in the P. dickinsii var. glabrata and S. insularis plastomes was calculated based on the sequences of protein-coding and tRNA genes (Figure 3), which revealed that the average codon usage in these two species was nearly identical, i.e., 26,008 for P. dickinsii var. glabrata and 26,015 for S. insularis. Moreover, we found the distribution of codon types to be consistent. The relative synonymous codon usage (RSCU) values were also similar to those in R. takesimensis. Consistent with the patterns detected in Rubus [60], other angiosperms [61], and algal lineages [62], we found that codon usage in the P. dickinsii var. glabrata and S. insularis plastomes is biased toward high RSCU values for codons with U or A at the third codon position.
The predicted number of RNA editing sites in the plastomes of P. dickinsii var. glabrata and S. insularis is 42 and 51, respectively, with the same cut-off value, and 15 and 19 of 35 protein-coding genes are predicted to undergo RNA editing, respectively (Table S1). These genes include photosynthesis-related genes (atpF, atpI, ndhA, ndhB, ndhD, ndhF, ndhG, petB, petG, psbE, and psbF), self-replication genes (rpl23, rpoA, rpoB, rpoC2, rps2, rps14, and rps16), and others (accD, clpP, and matK). We detected no RNA editing sites in the accD, atpF, atpI, psbE, psbF, and rps16 genes of the P. dickinsii var. glabrata plastome, whereas no RNA editing sites were found in petG and rpl23 in the S. insularis plastome. Compared with other species, the ndhF gene of S. insularis showed an exceptionally high frequency (i.e., three-fold higher) of RNA editing sites. The ndhB gene has the highest number of potential editing sites (11 sites), followed by the ndhD gene (6 sites), which is consistent with the findings of previous studies [63,64,65].
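For readers unfamiliar with RSCU, the value for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of that calculation, not the MEGA7 implementation used in this study; the function name `rscu` and the toy sequence are illustrative assumptions. The codon table uses the standard amino acid assignments, which the plastid code shares.

```python
from collections import Counter
from itertools import product

# Standard amino acid assignments, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_sequences):
    """Return {codon: RSCU} for every codon whose amino acid occurs at least once."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        # Walk the sequence in frame, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)  # equal usage within the family
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Hypothetical toy coding sequence: three GCT and one GCA codon (all alanine).
toy = rscu(["GCTGCTGCTGCA"])
print({codon: round(value, 2) for codon, value in toy.items()})
```

An RSCU above 1 indicates a codon used more often than expected under equal usage, which is the sense in which U- and A-ending codons are described as preferred.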

Comparative Analysis of Genome Structure
The plastomes of the six Rosaceae species endemic to Ulleung Island (i.e., C. wilsonii, S. insularis, P. dickinsii var. glabrata, Prunus takesimensis, R. takesimensis, and Sorbus ulleungensis) were plotted with mVISTA, using the annotated R. takesimensis plastome as a reference (Figure 4). The results indicated that the LSC region is the most divergent, whereas the two IR regions are highly conserved. In addition, the non-coding regions were found to be more divergent and variable than the coding regions. These findings are consistent with the patterns commonly observed in angiosperms [44,47,49,60,61]. The six plastomes are highly conserved even though their lineages diverged as early as the Late Cretaceous, with the crown ages of Rosoideae and Amygdaloideae estimated to be 75.78 million and 90.18 million years, respectively [10].
Sliding window analysis performed using the DnaSP program revealed highly variable regions in the plastomes of the six endemic Rosaceae taxa (Figure 5). Comparison of the six plastomes revealed that the average value of nucleotide diversity (Pi) over the entire chloroplast genome was 0.042, with the most variable region (a Pi value of 0.14508) being the trnT-UGU/trnL-UAA intergenic region. We also detected high variability in two other intergenic regions (rpl32/trnL-UAG (Pi = 0.14342) and ndhF/rpl32 (Pi = 0.13267)) and one genic region (ycf1 (Pi = 0.1285)). In addition, we detected several variable regions with Pi values greater than 0.1, namely, rps16/trnQ-UUG, trnR-UCU/atpA, rpoB/trnC-GCA/petN, trnT-GGU/psbD, trnP-UGG/psaJ/rpl33, and ycf3/trnS-GGA/rps4. Among these highly variable regions, those with Pi values greater than 0.12 can be used to generate chlorotype diversity data to infer the origin and evolution of endemic species on Ulleung Island. Although the two newly sequenced species are only rarely found on Ulleung Island, R. takesimensis is among the more commonly occurring species on the island, and highly variable regions, including rpl32/trnL, rps4/trnT, trnT/trnL, and psbZ/trnG, were similarly detected in this species [48]. Furthermore, we found that the ycf1 gene shows the highest sequence divergence among the genic regions and, thus, would appear to have potential value for the phylogenetic analysis of Rosaceae and angiosperms in general [46].
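The sliding-window Pi values discussed above are, in essence, the mean number of pairwise differences per site within each window of the alignment. The following is a simplified sketch of that idea, assuming an 800 bp window and a 200 bp step as in this study; it is not the DnaSP implementation, and the function names and the rule for skipping gapped or ambiguous columns are assumptions made for illustration.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site, skipping columns with gaps or ambiguous bases."""
    seqs = [s.upper() for s in seqs]
    valid = [col for col in zip(*seqs) if all(base in "ACGT" for base in col)]
    if len(seqs) < 2 or not valid:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(col[i] != col[j] for col in valid for i, j in pairs)
    return diffs / (len(pairs) * len(valid))


def sliding_pi(alignment, window=800, step=200):
    """Return (start, end, Pi) tuples with 1-based inclusive window coordinates."""
    length = len(alignment[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        chunk = [seq[start:end] for seq in alignment]
        results.append((start + 1, end, nucleotide_diversity(chunk)))
    return results


# Hypothetical toy alignment of three sequences, scanned with a tiny window.
toy_alignment = ["AAAA", "AAAT", "AATT"]
for start, end, pi in sliding_pi(toy_alignment, window=2, step=2):
    print(f"{start}-{end}: Pi = {pi:.3f}")
```

Windows whose Pi exceeds a chosen threshold (0.12 in the text) would then be flagged as candidate hotspot regions.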
Positive selection analysis, performed using the EasyCodeML program [66] with the site-specific model based on CodeML algorithms [67], identified positively selected genes among the endemic Rosaceae on Ulleung Island (Table 2). Among the conserved genes, six genes with positively selected sites were identified with statistically significant likelihood ratio test (LRT) p values (Table 2). These six genes include one subunit of acetyl-CoA carboxylase (accD), one Rubisco gene (rbcL), one ribosomal small subunit gene involved in self-replication (rps3), and three NADH-dehydrogenase subunit genes involved in photosynthesis (ndhB, ndhD, and ndhF). Based on the M8 model, the rbcL gene had five positively selected sites, followed by ndhF (three sites) and rps3 (two sites). The other three genes each had only one positively selected site. However, most of the genes (69 of the 75) had an average Ka/Ks ratio below 1, indicating that these genes have been subjected to strong purifying selection in the Rosaceae chloroplast. In general, previous studies have shown that Ka/Ks values are usually less than one [68], because synonymous nucleotide substitutions occur more frequently than nonsynonymous substitutions. Additionally, most genes of the chloroplast genome have evolved under purifying selection owing to functional constraints during chloroplast genome evolution [69,70,71,72]. Furthermore, positive selection of the rbcL gene and of the NADH dehydrogenase subunit genes has previously been reported in several studies and has been related to temperature, drought, carbon dioxide concentration, and photosynthetic rate [71,72,73,74]. Positive selection is considered to be indicative of adaptation to environmental change, ecological niche, or coevolutionary processes [73,75], and we can, thus, speculate that the selection patterns detected for the Rosaceae endemic taxa on Ulleung Island may be associated with adaptation to an oceanic climate in the insular setting. However, any correlation between the insular environment and positive selection pressures on genes will require further study.
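The LRT p values mentioned above come from comparing the log-likelihoods of nested codon models, with twice the log-likelihood difference referred to a chi-square distribution. The following is a minimal sketch of that test for comparisons with an even number of degrees of freedom (for example, 2 for M7 versus M8 or M1a versus M2a); it is not EasyCodeML's code, the function names are illustrative, and the log-likelihood values in the usage lines are hypothetical rather than taken from Table 2.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive, even degrees of freedom only")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (test statistic, p value) for a nested model comparison."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(max(stat, 0.0), df)


# Hypothetical log-likelihoods for a null (e.g., M7) and alternative (e.g., M8) model.
stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(f"2*deltaLnL = {stat:.2f}, p = {p_value:.4f}")
```

A gene would be reported as carrying positively selected sites only when the alternative model fits significantly better and site-level posterior probabilities under M8 support individual codons. Comparisons with odd degrees of freedom, such as M8a versus M8, are outside what this sketch covers.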

Phylogenetic Analysis
Maximum likelihood analysis conducted under the best-fit model of "K3Pu + F + G4" enabled us to resolve the phylogenetic positions of the endemic Rosaceae taxa on Ulleung Island (Figure 6). However, given that the phylogenetic tree was constructed based on only a partial representation of the entire Rosaceae family, the positions determined should be considered provisional and interpreted with caution. Nevertheless, our phylogenetic analysis of 29 representative plastomes within the rose family strongly supports the monophyly of Potentilla (100% bootstrap support) and the sister relationship between Potentilla and Fragaria (100% bootstrap support). We found that P. dickinsii var. glabrata was sister to a clade containing P. freyniana, P. freyniana var. chejuensis, P. stolonifera, and P. stolonifera var. quelpaertensis, whereas the clade of the genus Spiraea (S. insularis and S. martini) is sister to a clade containing Cotoneaster, Malus, Pyrus, and Sorbus (Amygdaloideae; 100% bootstrap support). In addition, the monophyly of Cotoneaster, Spiraea, and Sorbus was strongly supported, with 100% bootstrap support for each genus. Cotoneaster wilsonii was sister to two congeneric species, C. horizontalis and C. franchetii, whereas Sorbus ulleungensis was sister to the congeneric S. helenae and S. rufopilosa. Lastly, we found that the genus Prunus represented the earliest-diverging lineage within the subfamily Amygdaloideae, and the monophyly of both Amygdaloideae and Rosoideae was strongly supported (100% bootstrap support).

Plant Sampling, DNA Isolation, and Plastome Sequencing/Annotation
To characterize plastome sequences among endemic species of the family Rosaceae on Ulleung Island, we collected samples from Potentilla dickinsii var. glabrata and Spiraea insularis, the two Ulleung endemic species in this family for which the plastomes have yet to be sequenced. Fresh leaves were collected from the Key-Chungsan Botanical Garden, which was specifically designated by the Ministry of Environment, Korea for the ex situ conservation of numerous native and endemic plant species from Ulleung Island. Voucher specimens were collected and deposited in the Ha Eun Herbarium of Sungkyunkwan University, Korea. Total DNA was isolated using a DNeasy Plant Mini Kit (Qiagen, Carlsbad, CA, USA) and sequenced using an Illumina HiSeq 4000 sequencer (Illumina, Inc., San Diego, CA, USA) at Macrogen Corporation (Seoul, Korea). A total of 33,832,832 and 29,396,622 paired-end reads (150 bp) were generated for P. dickinsii var. glabrata and S. insularis, respectively, and these were subsequently assembled de novo using Velvet v. 1.2.10 (EMBL-EBI, Cambridge, UK) with multiple k-mers [76]. tRNAs in the sequences were confirmed using tRNAscan-SE 2.0 (The Lowe lab, Santa Cruz, CA, USA) [77]. Annotation was conducted using Geneious R10 (Biomatters, Auckland, New Zealand) [78], and the annotated plastome sequences were deposited in GenBank (with accession numbers of MT412406 and MT412405 for P. dickinsii var. glabrata and S. insularis, respectively). The annotated GenBank (NCBI, Bethesda, MD, USA) format sequence file was used to draw a circular map with the OGDRAW program v1.2 (CHLOROBOX, Potsdam-Golm, Germany) [79].

Comparative Plastome Analysis
The complete plastomes of P. dickinsii var. glabrata and S. insularis were compared with those previously obtained for four Rosaceae species endemic to Ulleung, namely, C. wilsonii (NC046834), Prunus takesimensis (NC039379), R. takesimensis (NC037991), and Sorbus ulleungensis (NC03702). The six Rosaceae plastomes were aligned using MAFFT v. 7 [80] and adjusted manually using Geneious [78]. Using DnaSP v. 6.10 software [81], we performed a sliding window analysis, with a step size of 200 bp and window length of 800 bp, to determine plastome nucleotide diversity (Pi). The codon usage frequency was calculated using MEGA7 [82], yielding RSCU values [83], which are a simple measure of the non-uniform usage of synonymous codons in a coding sequence. For this purpose, we employed the genetic code used by bacteria, archaea, prokaryotic viruses, and plant chloroplasts [84]. Putative RNA editing sites in the protein-coding genes of the six plastomes were predicted using the online Predictive RNA Editor for Plants (PREP) suite [85], with 22 genes used as references and a cut-off value of 0.8. The complete chloroplast genomes and the concatenated sequences of the 75 protein-coding genes shared among the studied species were aligned with MAFFT v. 7 [80] in Geneious R10 [78], and the maximum likelihood phylogenetic tree was constructed with IQ-TREE v. 1.4.2 [86]. To evaluate natural selection pressure on the protein-coding genes of the six plastomes, the site-specific model was applied using EasyCodeML [66] with CODEML algorithms [67]. Seven codon substitution models (M0, M1a, M2a, M3, M7, M8, and M8a) were compared using likelihood ratio tests to detect positively selected sites.

Phylogenetic Analysis
For the purposes of phylogenetic analysis, we analyzed the complete plastome sequences of 23 representative species from the family Rosaceae: seven species of Potentilla, including P. centigrana (NC041209), P. freyniana (NC041210), and P. stolonifera (NC044418); two species each from the genera Fragaria, Malus, Prunus, Pyrus, Rosa, and Rubus; one species each from the genera Cotoneaster, Physocarpus, and Sorbus. Dryas drummondii of the subfamily Dryadoideae was included as an outgroup species. The sequences of all species were aligned using MAFFT v. 7 [80] in Geneious [78]. Maximum likelihood analysis based on the best-fit model of K3Pu + F + G4 was conducted using IQ-TREE v. 1.4.2 [86], and non-parametric bootstrap analysis was performed with 1000 replicates.

Conclusions
In this study, we determined the complete plastome sequences of two species in the family Rosaceae (Potentilla dickinsii var. glabrata and Spiraea insularis) that are endemic to Ulleung Island, Korea. We found very few structural or organizational differences among the plastomes of the six Rosaceae species endemic to this island. The frequency of codon usage was biased toward high RSCU values of U and A at the third codon position, and we found that the ndhB and ndhD genes are characterized by a high number of potential RNA editing sites. Comparative analysis among the six endemic Rosaceae species revealed three highly variable intergenic regions (trnT-UGU/trnL-UAA, rpl32/trnL-UAG, and ndhF/rpl32) and a single highly variable genic region (ycf1). These hotspot regions could be used to assess the genetic consequences of anagenetic speciation for endemic Rosaceae taxa on Ulleung Island. We also confirmed that a majority of the protein-coding genes (69 of 75) common to the chloroplast genomes of the endemic Rosaceae have been subjected to purifying selection, whereas six genes carry positively selected sites. Phylogenomic analysis based on selected Rosaceae plastomes supported the monophyly of the genus Potentilla and the sister relationship between S. insularis and a clade containing Cotoneaster, Malus, Pyrus, and Sorbus. The plastome resources reported in this study will enable us to gain a better understanding of the plastome evolution of insular endemics among the Rosaceae on Ulleung Island, as well as that of other members within the family Rosaceae.