Small RNA NGS Revealed the Presence of Cherry Virus A and Little Cherry Virus 1 on Apricots in Hungary

Fruit trees, such as apricot trees, are constantly exposed to virus attack. As they are propagated vegetatively, this risk is present not only in the field, where they remain for decades, but also during their propagation. Metagenomic diagnostic methods based on next generation sequencing (NGS) offer unique possibilities to reveal all the pathogens present in the investigated sample. Using NGS of small RNAs, a special field of these techniques, we tested leaf samples of different apricot varieties originating from an isolator house or an open field stock nursery. As a result, we identified Cherry virus A (CVA) and little cherry virus 1 (LChV-1) for the first time in Hungary. The NGS results were validated by RT-PCR and, in the case of CVA, also by Northern blot. Cloned and Sanger-sequenced virus-specific PCR products enabled us to investigate their phylogenetic relationships. However, since these pathogens have not been described in our country before, their role in symptom development and modification during co-infection with other viruses requires further investigation.


Introduction
Apricot (Prunus armeniaca) is one of the most popular fruits in central Europe, especially in Hungary, where it is not only consumed as a fresh fruit, but also serves as a raw material for jam and "Palinka" (a distilled spirit) production. Thanks to intensive breeding programs since 1950, many varieties have become available with improved characteristics for these specific purposes.
In accordance with the usual routine [1], virus-free mother trees of the new, approved varieties are kept in isolator houses to prevent subsequent exposure to, and infection by, viruses, especially Plum pox virus (PPV). They are also kept in open field stock nurseries, which provide propagation material for the future.

Plant Material, Sample Preparation
Samples were collected from an isolator house and an open field stock nursery in Érd, at the Research Station of the Fruticulture Research Institute of NARIC. Leaf samples were collected from four different branches of each tree, from three different varieties: Ligeti óriás (Parkland giant), Pannónia kajszi (Pannonian apricot) and Magyar kajszi (Hungarian apricot). Leaf samples of the in vitro cultured plantlets of all three varieties were also collected. RNA was extracted from the leaf samples by the CTAB method [42]. RNA pools representing each variety at each location (isolator house or stock nursery) were prepared by mixing equal amounts of RNA originating from different individuals. These pools were used for sRNA library preparation (five libraries in total) using the Truseq Small RNA Library Preparation Kit (Illumina, San Diego, CA, USA) and our modified protocol [42]. Samples were sequenced using a single index on a HiScan2000 by UD Genomed (Debrecen, Hungary) in 50 bp, single-end mode (8 samples per sequencing lane). Fastq files of the sequenced libraries were deposited in GEO and can be accessed through series accession number GSE114251.
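The equal-amount pooling step described above comes down to simple arithmetic: the volume taken from each extract is the target RNA mass divided by that extract's concentration. The following is a minimal sketch under stated assumptions. The function name `pooling_volumes`, the sample names, the concentrations and the 500 ng target are all hypothetical and are not taken from the study.

```python
def pooling_volumes(concentrations_ng_per_ul, mass_per_sample_ng):
    """Return the volume (in µL) to take from each extract so that every
    individual contributes the same RNA mass to the pool."""
    volumes = {}
    for name, conc in concentrations_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"concentration of {name} must be positive")
        # volume = mass / concentration
        volumes[name] = mass_per_sample_ng / conc
    return volumes


# Hypothetical concentrations (ng/µL); not measurements from the study.
toy_concentrations = {"tree_1": 250.0, "tree_2": 500.0, "tree_3": 125.0}
toy_volumes = pooling_volumes(toy_concentrations, 500.0)
for name, vol in toy_volumes.items():
    print(f"{name}: {vol:.1f} µL")
```

In practice the target mass would be chosen so that the most dilute extract still yields a pipettable volume.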

Pipeline for Data Evaluation of NGS Results (Bioinformatics)
For bioinformatics analysis, we used our published pipeline [42]. Briefly, the resulting reads were sorted according to their indexes. Adapters were removed from the sequenced reads with Trimmomatic [43], read quality was checked with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) and the reads were deduplicated with the Picard MarkDuplicates tool (http://broadinstitute.github.io/picard). For virus detection, we used two different pipelines in parallel. (A) Short reads were mapped to viral reference genomes (only the plant- and invertebrate-host entries of the NCBI RefSeq viral database were used) with the BWA-aln short read aligner [44] using default options. Mapped reads were counted both with and without deduplication using samtools idxstats [45]. Counts of redundant reads for the resulting hits were normalized to reads per million reads. Consensus viral sequences were generated from the aligned deduplicated reads using the samtools/bcftools [45] pipeline. Coverage of a given genome was calculated as the percentage of the genome covered by nucleotide information from the mapped sRNA reads. (B) De novo assembly of the deduplicated reads was performed with Velvet using k-mer sizes of 13, 15 and 17 [46]. The generated contigs were annotated by MegaBLAST [47] against the NCBI RefSeq and nr databases (viral sequences from plant and invertebrate hosts).
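The two summary statistics used in pipeline (A), normalization to reads per million reads and percentage genome coverage, can be illustrated with a short sketch. This is not the published pipeline [42]. The function names, the (start, length) input format and the toy numbers are assumptions made only for illustration.

```python
def reads_per_million(mapped_reads, total_reads):
    """Normalize a raw count of mapped reads to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads * 1_000_000 / total_reads


def genome_coverage_percent(alignments, genome_length):
    """Percentage of genome positions covered by at least one read.

    alignments: iterable of (start, read_length) pairs with 0-based starts.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = [False] * genome_length
    for start, length in alignments:
        # Clip each read to the genome boundaries before marking positions.
        for pos in range(max(start, 0), min(start + length, genome_length)):
            covered[pos] = True
    return 100.0 * sum(covered) / genome_length


# Hypothetical toy data; not values from the study.
toy_alignments = [(0, 21), (10, 22), (50, 21)]
toy_rpm = reads_per_million(250, 1_000_000)
toy_coverage = genome_coverage_percent(toy_alignments, 100)
print(toy_rpm, toy_coverage)
```

For real data, the read positions would come from the BWA alignments rather than from a hand-written list; a per-position boolean array is sufficient here because viral genomes are short.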

Validation of Predicted Virus Diagnostics by RT-PCR
cDNA was synthesized from pooled RNA extracts representing each library, or from each individual tree, using the random primer of the RevertAid First Strand cDNA Synthesis Kit (Thermo Fisher Scientific, Waltham, MA, USA), according to the manufacturer's instructions. The generated cDNA was used as a template for PCR reactions using Phire Hot Start II DNA Polymerase (Thermo Fisher Scientific) and either published diagnostic primers (for PPV [48]) or new ones (see Table S1), which were designed according to the sequenced sRNA reads. PCR products were analysed by agarose gel electrophoresis. For Sanger sequencing, cDNA was synthesized from pooled RNA extracts of individual plants and virus-specific PCR was performed using Phusion Hot Start II High-Fidelity DNA Polymerase (Thermo Fisher Scientific). The purified products were cloned into the pGEM®-T Vector System I (Promega, Fitchburg, WI, USA) and sequenced. Sequences were deposited in GenBank (GenBank Accession Numbers: MH321189-91).

Phylogenetic Analysis
To compare the cloned and sequenced PCR products, we used MEGA 7.0.21 [49] with the implemented neighbor-joining algorithm. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) is shown next to the branches.
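The bootstrap test resamples alignment columns with replacement and reports how often a grouping recurs across replicates. The sketch below illustrates only this resampling idea and is not what MEGA does internally. Instead of rebuilding a neighbor-joining tree for every replicate, it uses a deliberately simplified criterion (whether one sequence is the nearest neighbour of another by p-distance). All names and the toy alignment are hypothetical.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement (one bootstrap replicate)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def pair_support(alignment, a, b, replicates=500, seed=0):
    """Percentage of replicates in which b is the closest sequence to a.

    Simplified stand-in for clade support; ties are resolved in favour of
    the sequence that appears first in the alignment dictionary.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = resample_columns(alignment, rng)
        others = [name for name in rep if name != a]
        nearest = min(others, key=lambda name: p_distance(rep[a], rep[name]))
        hits += nearest == b
    return 100.0 * hits / replicates


# Hypothetical toy alignment; not sequences from the study.
toy_alignment = {
    "isolate_A": "ACGTACGTACGTACGT",
    "isolate_B": "ACGTACGTACGAACGT",
    "isolate_C": "ACTTACGAACGAATGT",
}
print(pair_support(toy_alignment, "isolate_A", "isolate_B"))
```

A fixed seed keeps the replicate draws reproducible; 500 replicates mirrors the number used in the text.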

Validation by Northern Blot
For Northern blot analyses, 3 µg of total RNA was separated on a formaldehyde-1.5% agarose gel and blotted onto an Amersham Hybond-NX membrane (GE Healthcare, Chicago, IL, USA) by the capillary method using 20× SSC (3 M NaCl and 0.3 M Na-citrate; pH 7.0). Hybridization was carried out overnight at 65 °C in Church buffer (0.5 M sodium phosphate buffer, pH 7.2, containing 1% BSA, 1 mM EDTA and 7% SDS) with the appropriate radioactively labelled probe. The membrane was washed for 5 min in 2× SSC, 0.1% SDS and for 15 min in 0.5× SSC, 0.1% SDS at the hybridization temperature and exposed to X-ray film. Virus-specific, 32P-labelled DNA probes were prepared using the DecaLabel DNA Labelling Kit (Thermo Fisher Scientific). A PCR-amplified and purified product of the cloned region of CVA was used as the template for the Northern blot probe.
Basic information on the sampled varieties and the corresponding sRNA libraries is given in Table S2. In vitro cultures, which served as propagation material for the mother trees, were also available and tested. RNA pools were prepared from the RNA extracts of leaves according to plant variety and place of origin. In total, five sRNA libraries were sequenced by Illumina HiScan (the Magyar kajszi variety from the isolator was not sequenced). As a result, 11.9-13.3 million raw reads per library were generated (Table S3). After quality control and adapter trimming, duplicates were removed and the non-redundant reads (530,000-1.7 million per library) were used for virus diagnostics without removing the host-specific sRNAs. In the different libraries, 4.75-22.86% of the total non-redundant reads mapped to viral reference genomes. In parallel, 3444-16,284 contigs per library were assembled in the five libraries with different k-mers (13, 15, 17) using Velvet (Table S4), of which only 0-154 contigs were of viral origin.

Small RNA NGS-Based Virus Diagnostics
In order to identify viruses in the samples, sRNA reads or longer contigs were aligned and mapped to reference genomes and partial sequences of viruses originating from plant or insect hosts. During this analysis, PPV-, CVA- and LChV-1-specific contigs were identified among all the contigs (Table S4). The length of the contigs ranged from 31 to 188 nt. There were 14 PPV-specific contigs in 2_LO_sn and 128 in 5_M_sn, whereas there were only six and eight CVA-specific contigs in 3_P_ih and 4_P_sn, respectively, and 26 LChV-1-specific contigs in 5_M_sn (Table S5). When sRNA reads were aligned directly to the viral reference genome, the numbers of matched non-redundant and normalized redundant reads were counted and the coverage (in %) of the whole viral reference genome was also calculated (Table S5). According to our analysis, two libraries, 2_LO_sn and 5_M_sn, represented plant material that was infected with PPV. Besides PPV, we identified CVA infection in the 3_P_ih and 4_P_sn libraries and LChV-1 infection in the 5_M_sn library. Virus-specific reads originated from the entire genome in each case (Figure S1a,b upper panels). CVA- and LChV-1-specific reads were mainly 21 nt long, with both sense and antisense origins, while more than one-third of the LChV-1-specific sRNA reads were 22 nt long, suggesting a key role of antiviral DCL4 and DCL2 during their biogenesis (Figure S1a,b lower panel) [20,25].
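The size and polarity profile of virus-specific sRNA reads described above can be tabulated with a few lines of code. This is a minimal sketch which assumes that each mapped read is available as a (sequence, strand) pair. The function name and the toy reads are hypothetical and do not reproduce the data in Figure S1.

```python
from collections import Counter


def size_distribution(reads):
    """Tabulate mapped sRNA reads by length and strand.

    reads: iterable of (sequence, strand) pairs, strand being '+' or '-'.
    Returns (counts keyed by (length, strand), fraction of reads per length).
    """
    counts = Counter((len(seq), strand) for seq, strand in reads)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    by_length = Counter()
    for (length, _strand), n in counts.items():
        by_length[length] += n
    fractions = {length: n / total for length, n in sorted(by_length.items())}
    return counts, fractions


# Hypothetical toy reads; not data from the study.
toy_reads = [
    ("A" * 21, "+"), ("C" * 21, "+"), ("G" * 21, "-"),
    ("A" * 22, "+"), ("T" * 22, "-"),
    ("A" * 24, "+"),
]
toy_counts, toy_fractions = size_distribution(toy_reads)
for length, fraction in toy_fractions.items():
    print(f"{length} nt: {fraction:.2f}")
```

A profile dominated by 21 and 22 nt reads from both strands is the pattern that the text interprets as the signature of DCL4 and DCL2 processing.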

Validation of the Small RNA NGS Virus Diagnostics
In order to validate our deep sequencing results, we synthesized cDNA from pooled RNAs representing the sequenced libraries and set up RT-PCR reactions with published diagnostic primers (for PPV [48]) or with primers designed according to the sequenced sRNA reads (Table S1).
Positive controls (cDNA from virus-containing samples) and negative controls were always included. According to the sRNA NGS analyses, two libraries, those of Ligeti óriás and Magyar kajszi (2_LO_sn and 5_M_sn), both originating from the open field stock nursery, were infected with PPV. To reveal how widespread this infection was, the presence of the virus was validated by RT-PCR using cDNA produced from the individual trees (Figure S2). According to this analysis, only one tree from each variety was infected. PPV was rare: it was absent from the isolator and occurred only in the open field; consequently, it could be the result of an on-site infection that originated from the neighborhood.
sRNA NGS also showed the presence of two additional viruses, never before described in Hungary: CVA and LChV-1.

Validation of the Presence of Cherry Virus A
The presence of CVA indicated by the sRNA reads in the 3_P_ih and 4_P_sn libraries could be validated both by RT-PCR amplifying the entire putative MP region (Figure 1a) and by Northern blot analysis (Figure 1b), using pooled RNA representing the sequenced libraries. This indicates that Pannónia kajszi trees, even in the isolator house, were infected with this virus. Moreover, each tested individual tree proved to be positive for CVA (Figure 1c). This result raised the possibility that the infection did not derive from the environment, but instead originated from the established in vitro cultures. When two batches of plantlets from two different lines of in vitro cultures of the Pannónia kajszi variety were tested, line 2 showed the presence of the virus (Figure 1d), supporting this theory. As CVA is not on the quarantine list, its presence was not checked during sanitation. Even though its presence is not connected to any visual symptom, especially not on apricot, it may interfere with other viruses; therefore, its presence should be avoided in any propagation material.

Validation of the Presence of Little Cherry Virus 1
Apricot was not considered a natural host of LChV-1; moreover, the mechanical inoculation of this host with this virus failed [50], but its presence on apricots was recently reported in the Czech Republic [17]. With sRNA NGS, we detected its presence in the 5_M_sn library, produced from trees of the Magyar kajszi variety in the open field stock nursery. The presence of the virus was validated by RT-PCR (Figure 2a). Although Magyar kajszi trees in the isolator were not NGS sequenced, according to RT-PCR they were found to be uninfected with LChV-1 (Figure 2a). Moreover, RT-PCR of the individuals showed that only one tree in the open field was infected (Figure 2b). To find out whether in vitro clones of this variety contain the virus, we checked two batches of two different lines of the in vitro cultures for the presence of LChV-1 and did not find any infection. These findings suggest that infection of the apricot occurred in the open field stock nursery and not by grafting during its propagation. Although its vector transmission is not proven, LChV-1 could be transmitted by insect vectors [51]. The apricot stock nursery is close to a sweet and sour cherry variety collection, which could serve as a reservoir for the virus. However, the infection rate was low and we found LChV-1 in Magyar kajszi, one of the varieties which was shown to be infected in the Czech Republic. This coincidence, even in the absence of symptoms, suggests the susceptibility of this variety to LChV-1.

[Sample labels from a figure or table: LOih, LOsn, Pih, Psn, Mih, Msn]
Results of the sRNA NGS and its comparison with RT-PCR, summarized in Table 1, show that this high-throughput method can be reliably used for virus diagnostics.

Phylogenetic Relationship of Hungarian CVA and LChV-1 Isolates
During RT-PCR validation, cDNAs were synthesized not only from RNA pools, but also from RNAs extracted from individuals. Using these individual-specific cDNAs, the PCR experiment was repeated with a proofreading DNA polymerase. Amplified products were cloned and sequenced by traditional Sanger sequencing. Sequences were deposited in GenBank (Accession numbers: MH321189-91) and used for phylogenetic comparison (Table 2). According to the phylogenetic analysis of the MP coding region of the CVA isolate from Pannónia kajszi, this isolate belongs to Group V, together with isolates from non-cherry hosts and, more importantly, together with other CVA isolates originating from the P. armeniaca host (Figure 3). Pairwise comparison of CVA sequences from this host showed high similarity (98-99% nt identity), except for one Canadian isolate (16C256_N6), which is more divergent (92% identity for this part of the movement protein coding region). This supports the observation of Gao, which raised questions about the origin of CVA, its mode of exchange between hosts and its adaptation to various hosts (Table S6) [30].
Phylogenetic analysis of the Hungarian LChV-1 isolate (according to both HSP70h and CP sequences) shows that it is very closely related to the Italian isolate ITMAR, which has been found to be the main causative agent of Kwanzan stunt disease, clustering in Group III (Figure 4) [35]. In a comparison of sequences derived from different countries and hosts, the LChV-1 partial CP coding region from the Hungarian apricot showed the highest similarity to the isolate EU716000 from P. salicina in Italy and to HG792407 from P. avium in Greece. Comparative analysis of the HSP70h coding part of the genome showed that the German (NC_001836) and the Italian (EU715989) ITMAR isolates had the highest similarity to the Hungarian HSP70h sequence (Table S7), supporting their close relation in Group III, as also demonstrated by the phylogenetic analysis of their CP coding sequences. Unfortunately, no sequence data are available for the HSP70h or CP region of the Czech LChV-1 isolate from apricot, which prevents the identification of its relationship to the Hungarian isolate.
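The pairwise comparisons summarized in Tables S6 and S7 are percent identity matrices. The following sketch shows one common way to compute such a matrix from pre-aligned sequences. It assumes that alignment columns containing a gap in either sequence are skipped, which may differ from the convention of the tool actually used. The function names and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent nucleotide identity between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def identity_matrix(aligned):
    """Return {(name1, name2): percent identity} for every pair of sequences."""
    names = list(aligned)
    return {(m, n): percent_identity(aligned[m], aligned[n])
            for m in names for n in names}


# Hypothetical toy alignment; not sequences from the study.
toy_aligned = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACG-ACGTAC",
}
toy_matrix = identity_matrix(toy_aligned)
for pair, value in toy_matrix.items():
    print(pair, f"{value:.1f}")
```

The matrix is symmetric with 100% on the diagonal, so only one triangle needs to be reported.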

Conclusions
New high-throughput sequencing-based methods can provide valuable information about the presence of different viral pathogens. Due to their speed and reliability, they have proved to be a suitable alternative to labor- and time-intensive biological indexing [52,53]. Here we show that, in addition to dsRNA seq, other NGS-based techniques such as sRNA NGS can also be reliably used for virus diagnostics in woody plants such as fruit trees. Results gained by sRNA NGS could be validated by other molecular biology methods such as RT-PCR and Northern blot. Although the sensitivity of NGS can sometimes lead to false positive results [54], our results imply that virus elimination during the production of propagation material is essential and needs to be monitored by the most sensitive virus diagnostic methods in order to minimize the possibility of viruses passing through this control. sRNA NGS is one alternative that can be used during the production of virus-free propagation material to avoid unknown and unwanted infections. However, before it can become a diagnostic tool adopted in certification protocols, the sequencing cost must be reduced and the bioinformatics pipeline must be standardized and improved [55].
sRNA NGS technology helped us to describe the previously unknown and rare presence of LChV-1 in apricot. The virus diagnostics of the in vitro cultured plants, the trees under the isolator net and the trees in the open field stock nursery suggested that, while infection with LChV-1 occurred in the open field from the surroundings, CVA infection originated from the propagation material. However, to test this hypothesis, the presence of LChV-1 in the surrounding cherry and sour cherry plantations, together with the presence of mealybugs or other possible vectors, should be tested.
In this work, CVA and LChV-1 were described for the first time in Hungary. To get a more detailed picture of their prevalence and distribution in different cultivated Prunus species and in species present in the natural vegetation, further investigation is necessary.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/10/6/318/s1. Figure S1: Schematic representation of the (a) CVA- and (b) LChV-1-specific sRNA reads, Figure S2: RT-PCR validation of the presence of PPV in the sampled individual trees using primers amplifying 242 bp of the coat protein, Table S1: Sequences of the PCR primers used for virus detection with their appropriate references, Table S2: Basic information on the sampled varieties with the number of the small RNA library, Table S3: Initial statistics of the sequenced reads of the different libraries, Table S4: Initial statistics of the Velvet-built contigs, Table S5: Summary of the bioinformatics analysis, Table S6: Percent identity matrix from pairwise comparison of the sequences available at GenBank coding for the MP of CVA, originating from the P. armeniaca host, Table S7: Percent identity matrix from pairwise comparison of the sequences available at GenBank coding for the HSP70 of LChV-1.
Author Contributions: D.B., N.J-C., T.V. and J.B. performed the experiments and analyzed the data; J.M. and G.E.T. performed the bioinformatics work; L.K.S., Z.K. and É.P. established the in vitro cultures and analyzed data; E.V. conceptualized the experiments, analyzed the data and wrote the manuscript. All authors discussed the results, and contributed to and reviewed the final manuscript.