Using Next Generation Sequencing to Study the Genetic Diversity of Candidate Live Attenuated Zika Vaccines

Zika virus (ZIKV) is a mosquito-transmitted positive-sense RNA virus in the family Flaviviridae. Candidate live-attenuated vaccine (LAV) viruses with engineered deletions in the 3’ untranslated region (UTR) provide immunity and protection in animal models of ZIKV infection, and phenotypic studies show that LAVs retain protective abilities following in vitro passage. The present study investigated the genetic diversity of wild-type (WT) parent ZIKV and its candidate LAVs using next generation sequencing analysis of five sequential in vitro passages. The results show that genomic entropy of WT ZIKV steadily increases during in vitro passage, whereas that of LAVs also increased by passage number five but was variable throughout passaging. Additionally, clusters of single nucleotide variants (SNVs) were found to be present in the pre-membrane/membrane (prM), envelope (E), nonstructural protein NS1 (NS1), and other nonstructural protein genes, depending on the specific deletion, whereas in the parent WT ZIKV, they are more abundant in prM and NS1. Ultimately, both the parental WT and LAV derivatives increase in genetic diversity, with evidence of adaptation following passage.


Introduction
Zika virus (ZIKV) is a mosquito-borne flavivirus that is transmitted by Aedes species mosquitoes. For decades after its discovery in 1947 in Uganda, ZIKV infections were infrequent and caused mild disease, if any [1]. However, ZIKV infections spread rapidly through the Americas in 2015, leading to severe fetal neurological malformations and deficiencies, termed congenital Zika syndrome [2]. In adults, ZIKV infections cause fever, rash, and in some cases Guillain-Barré syndrome.
Other medically important mosquito-transmitted flaviviruses are yellow fever, West Nile, Japanese encephalitis, and dengue serotype 1-4 viruses. Flavivirus genome organization consists of a single-stranded, positive-sense RNA that contains a 5' untranslated region (UTR) followed by a single open reading frame encoding a polyprotein, and a 3'UTR, which consists of sequence components that are conserved among flaviviruses. Processing of the polyprotein results in 10 proteins: three structural proteins: capsid, pre-membrane/membrane (prM), and envelope (E), and seven nonstructural (NS) proteins: NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5. The structural proteins make up the mature virion, whereas the nonstructural proteins function primarily in replication.
Efforts to combat ZIKV infections led to the development of several vaccine candidates, including live-attenuated vaccines (LAVs). Recombinant infectious clones of ZIKV strain FSS13025 lacking a sequence of 10-30 nucleotides in the 3'UTR have been evaluated as LAV candidates [3][4][5]. The rationale for these mutant viruses is based on recombinant live attenuated dengue viruses lacking 30-262 nucleotides in the 3'UTR. The shortest of these deletions (den∆30) has been further developed into a candidate tetravalent LAV, which is currently undergoing phase III clinical trials (TV003/Instituto Butantan and other vaccine developers) [6,7]. The ZIKV 3'UTR∆10 (∆10), 3'UTR∆20 (∆20), and 3'UTR∆30 (∆30) viruses were generated as candidate LAVs in order to identify the vaccine with the best efficacy [3]. Initial in vitro and in vivo tests showed that the three ZIKV LAVs are attenuated in mice, and additional studies showed that ∆10 and ∆20 were protective in mice and nonhuman primates, as well as the fact that maternal vaccination in mice protects against ZIKV infection and transmission in utero [3][4][5]8]. In vitro passaging of LAVs is necessary in order to generate working and seed lots, and phenotypic studies have shown that 3'UTR deletion mutants retain protective abilities following in vitro passage. Sanger sequencing comparisons of the genomes after passage 0 (P0) or P5 in Vero cells indicated that ∆10, ∆20, and ∆30 retained the engineered deletions but also gained up to four amino acid substitutions in the E protein and one in the NS1 protein [3]. Importantly, no phenotypic changes were observed between P0 and P5 of ∆10. Additional testing of P4, P7, and P10 of ∆10 and ∆20 showed that up to three amino acid substitutions per clone were detected at P7 and P10; however, there was little phenotypic variability in mice between the immunogenicity of P0 and P10, suggesting that ∆10 and ∆20 are valuable LAV candidates for ZIKV [9]. In the present study, the genetic diversity of the wild-type (WT) parental virus FSS13025 and 3'UTR deletion mutants from P0 to P5 in Vero cells were examined using next generation sequencing (NGS) to establish subconsensus changes and monitor genetic diversity following passaging that may contribute to attenuation. The results demonstrated that genetic diversity increases for both WT and attenuated ZIKVs and showed clusters of genetic diversity in prM and NS1 gene regions that contribute to adaptation. Furthermore, the attenuated ZIKVs have an additional adaptation in the E gene region that may contribute to the attenuated phenotype, as it is not detected in WT parental ZIKV.

Sequence Analysis
NGS determination of viral RNA genomes was performed using the Illumina platform and analyzed with Shannon entropy calculations and variant determination using Vphaser2.0, as previously detailed [11]. Briefly, RNA was extracted from culture supernatants of each virus stock, and complementary DNA libraries were constructed and sequenced using the Illumina platform. Paired-end reads were processed and de novo sequences were generated. Consensus FASTA sequences were aligned to confirm the engineered deletions. The variability (or uncertainty) at each nucleotide position was determined using Shannon entropy calculations, as previously described [11][12][13]. Single nucleotide variants (SNVs) were detected using very sensitive local conditions and controlled for false discovery and strand bias (α = 0.05).

Statistical Analyses
Sequence coverage and mapped read statistics were determined using Qualimap v2.1.2; non-biased SNVs were determined in Vphaser2; and means, standard error, Kruskal-Wallis with post-tests, and Spearman correlation were performed using GraphPad Prism v8. Microsoft Excel v16 was used to organize and sort the data.

Next Generation Sequencing Results
Vero cells are widely used and accepted by regulators for the production of viral vaccines, including LAVs, and in vitro passaging of LAVs is necessary in order to generate primary and secondary seeds as well as vaccine lots. In order to study the sub-consensus level genetic diversity of the 3'UTR deletion mutants, NGS was performed and the results were compared to those acquired with the infectious clone of the WT parent virus. Every passage from P0 to P5 of WT, ∆10, ∆20, and ∆30 was sequenced using the Illumina platform. The mean coverage depth of individual viruses ranged from 540 to 10,609 (Table 1). Consensus sequence alignments confirmed that the engineered deletions were retained after passaging and that the consensus genomic WT, ∆20, and ∆30 sequences were unchanged following five passages, confirming genetic stability until P5 (Table 1). However, from P4 to P5 of ∆10, five nucleotide changes occurred (C2298U, G2306U, G2797A, C8963U, and C9293U), resulting in two amino acid substitutions (E K443N and NS1 R103K). Both substitutions were previously detected in ∆10 P5 and have been reported [3].

Shannon Entropy Increases during In Vitro Passage
Shannon entropy was calculated in order to determine genetic diversity at each nucleotide position along each genome ( Figure S1). In all ZIKVs, several high entropy positions were identified throughout the genome and collectively increased during passaging (Figure 1a). Furthermore, mean genomic entropy was higher at P4/P5 than at P0, indicating the fact that overall entropy increases during passaging ( Figure 1a). Kruskal-Wallis and Dunnett's post-tests showed significant differences across passages (p < 0.0001) ( Table S1). The mean genomic entropy of WT gradually increased and became stabilized by P4 (Figure 1a), and high positional entropy was most abundant in the prM and NS1 genes, as previously discovered [11], but the E, NS2B, NS3, and NS5 genes had at least one position of notable entropy ( Figure S1). In comparison, positional entropy values of ∆10 P0 and P1 were low; however, mean genomic entropy increased dramatically from P0 to P1, then steadily decreased through P2 and P3, followed by another sharp increase at P4 that was retained (Figure 1a). High positional entropy values clustered in prM, E, NS1, and NS5 in ∆10 P3-P5 ( Figure S1). The mean genomic entropy values of ∆20 and ∆30 were similar to each other because both increased at every passage, with the exception of a slight decrease at P3 (Figure 1a). High entropy positions clustered in the C, prM, E, and NS1 genes of both ∆20 and ∆30, and were also detected in the NS3 and NS5 genes ( Figure S1).   Supplementary Table S1. Positions with increasing entropy values across passages were tracked as indicators of possible genetic adaptation sites and were detected in the prM, E, and NS1 gene regions of all ZIKVs ( Figure 1b); however, all gene regions had at least one possible adaptation position in at least one ZIKV. Notably, WT had 17 positions in NS1 with increasing entropy, compared with only 4-5 in the NS1 of ∆10, ∆20, and ∆30. Conversely, WT ZIKV contained only one position with increasing entropy in the prM (798) and E (2390), whereas approximately 4-7 positions increased in entropy in the attenuated ZIKVs (Figure 1b). Thus, all ZIKVs may undergo adaptation in response to passaging.

Variants Increase during In Vitro Passage
Single nucleotide variants (SNVs) present in each passage of WT, ∆10, ∆20, and ∆30 were identified (Figure 2a). In WT, the number of SNVs decreased from 67 at P0 to 35 at P1, then increased gradually, peaking at 93 in P4. In WT P1-P5, the majority (approximately 74%) were non-synonymous SNVs. Interestingly, in ∆10, the number of SNVs increased from 22 at P0 to 169 at P1; P2-P4 had approximately 40 SNVs and increased to 69 at P5. Therefore, although individual positional entropy values were low in ∆10 P1, it is possible that the abundance of SNVs increased the mean genomic entropy (Figure 1a) observed for this passage. Next, both ∆20 and ∆30 had fewer SNVs than did ∆10 through P3. With the exception of ∆10 P1, the passages of attenuated ZIKVs had fewer SNVs than WT.
Across the genome, SNV frequencies increased similarly to the positional entropy values. Passaging of WT led to the emergence of SNVs in the prM, E, and NS1, as previously reported [11]; the highest frequency SNVs identified in WT were at P5: position 798 in prM at 13% and 3282 in NS1 at 15% (Figure 2b and Figure S1a). For ∆10 P1, the overwhelming majority of SNVs (162 of 169) were very low frequency (less than 1%), and by P3, SNV frequencies began to increase, including those at positions 2298, 2306, 2797, 8963, and 9293, which eventually become consensus changes in ∆10 P5 (Figure 2b and Figure S1b). SNVs in ∆20 were present at low frequency (less than 5%), except for clusters in prM, E, and NS1 (Figure 2b and Figure S1c). In ∆30, SNVs clustered in prM and E at P2 and P3 had the highest frequencies (approximately 10%), but SNV mean frequencies decreased to approximately 1% at P4 and P5 ( Figure S1d). Overall, the SNV analyses show that the diversity of WT during serial passaging is dominated by non-synonymous SNVs at low frequencies, whereas the frequency of SNVs in the attenuated mutants is broader in range due to increased frequencies in the prM, E, NS1, and, to a lesser extent, NS5.

Sequence Coverage Affects Genomic Diversity Measurements
As stated above, genome coverage varied among the stocks. In order to determine the effects of genomic sequence coverage on diversity, correlation tests were used. Sequence coverage was found to be significantly correlated with the mean genomic entropy (p < 0.0001) and with the total number of SNVs detected (p = 0.0007), but not with the frequency of the SNVs detected (p = 0.0888) (Figure 3). These data underscore the importance of providing dataset descriptions when comparing subconsensus diversity among different viruses.

Sequence Coverage Affects Genomic Diversity Measurements
As stated above, genome coverage varied among the stocks. In order to determine the effects of genomic sequence coverage on diversity, correlation tests were used. Sequence coverage was found to be significantly correlated with the mean genomic entropy (p < 0.0001) and with the total number of SNVs detected (p = 0.0007), but not with the frequency of the SNVs detected (p = 0.0888) ( Figure 3). These data underscore the importance of providing dataset descriptions when comparing subconsensus diversity among different viruses.

Discussion
The genetic diversity of in vitro passaged parental WT ZIKV strain FSS13025 and its candidate LAV derivatives based on 3'UTR deletion mutants, namely, Δ10, Δ20, and Δ30, were examined using NGS. Two methods of diversity analysis were applied in this study: entropy calculations and statistically significant SNV identification. Consensus sequence analysis confirmed that the three candidate LAVs did not incorporate changes in the consensus sequence following five passages in Vero cells, except that Δ10 incurs two coding substitutions after P4: E K443N (2306) and NS1 R103K (2797), as previously shown using conventional sequencing [3]. Thus, all three candidate LAVs retained the attenuating deletions following passage in a cell substrate acceptable for producing live human vaccines. Interestingly, the NS1 103 substitution was also recently detected in a liveattenuated ZIKV strain PA259249 (Panama, 2015) [14] generated through sequential passaging in HeLa cells, indicating that this position undergoes adaptation as a result of passaging in both Vero and HeLa cell culture. Less is known regarding the E 443 substitution, yet we have previously shown that American ZIKV isolates contain SNVs at this position [11]. The prM and NS1 provide genomic diversity to WT ZIKVs, and the current entropy and SNV analyses indicate that infectious clones of attenuated ZIKVs also possess high diversity in these genes and in the E gene region.
Individual passages of each of the four ZIKVs led to multiple positions with high sequence diversity. Some positions had high entropy value or SNV frequency in a single passage, but others increased with passaging. The increase of high positional entropy and SNV frequency across passages indicated that genome optimization was occurring as a result of serial passaging, or as adaptation to Vero cells. The present study provides additional evidence for a role of prM and NS1 in adaptation of ZIKVs. Specifically for WT, nucleotide positions 798 (prM 109) and 3282 (NS1 265) were found to have SNVs that reach 13% and 15% frequency, respectively, at P5. The former SNV was previously identified in ZIKVs Dak41524 and FSS13025 [11]. Furthermore, the NS1 K265E substitution was previously characterized as a Vero adaptation that increases virus assembly but does not affect mouse virulence of ZIKVs FSS10325 and PRVABC59 [15]. In the attenuated ZIKVs, the E protein was also important for adaptation. For example, Δ10, Δ20, and Δ30 each had increasing high frequency SNVs in positions 2304 and 2306 encoding a substitution for E K443 (Figure 2c), which is conserved among related flaviviruses with the exception of Japanese encephalitis virus, which has an R instead of a K. E 443 is located in the second helix of the stem, which may play an important role in dengue virus assembly and entry [16]; however, its function in ZIKV has not been determined. Interestingly, the results of the SNV analysis varied for E 443 in the present study. In Δ10, the result was a consensus substitution of K to N at P5. In Δ20, two different SNVs (G to U at 19% and G to C at 15%) were detected at 2306, both encoding a K to N substitution that did not reach consensus level, presumably due to competition of three nucleotides (G, U, and C) at that position. In Δ30, a SNV at

Discussion
The genetic diversity of in vitro passaged parental WT ZIKV strain FSS13025 and its candidate LAV derivatives based on 3'UTR deletion mutants, namely, ∆10, ∆20, and ∆30, were examined using NGS. Two methods of diversity analysis were applied in this study: entropy calculations and statistically significant SNV identification. Consensus sequence analysis confirmed that the three candidate LAVs did not incorporate changes in the consensus sequence following five passages in Vero cells, except that ∆10 incurs two coding substitutions after P4: E K443N (2306) and NS1 R103K (2797), as previously shown using conventional sequencing [3]. Thus, all three candidate LAVs retained the attenuating deletions following passage in a cell substrate acceptable for producing live human vaccines. Interestingly, the NS1 103 substitution was also recently detected in a live-attenuated ZIKV strain PA259249 (Panama, 2015) [14] generated through sequential passaging in HeLa cells, indicating that this position undergoes adaptation as a result of passaging in both Vero and HeLa cell culture. Less is known regarding the E 443 substitution, yet we have previously shown that American ZIKV isolates contain SNVs at this position [11]. The prM and NS1 provide genomic diversity to WT ZIKVs, and the current entropy and SNV analyses indicate that infectious clones of attenuated ZIKVs also possess high diversity in these genes and in the E gene region.
Individual passages of each of the four ZIKVs led to multiple positions with high sequence diversity. Some positions had high entropy value or SNV frequency in a single passage, but others increased with passaging. The increase of high positional entropy and SNV frequency across passages indicated that genome optimization was occurring as a result of serial passaging, or as adaptation to Vero cells. The present study provides additional evidence for a role of prM and NS1 in adaptation of ZIKVs. Specifically for WT, nucleotide positions 798 (prM 109) and 3282 (NS1 265) were found to have SNVs that reach 13% and 15% frequency, respectively, at P5. The former SNV was previously identified in ZIKVs Dak41524 and FSS13025 [11]. Furthermore, the NS1 K265E substitution was previously characterized as a Vero adaptation that increases virus assembly but does not affect mouse virulence of ZIKVs FSS10325 and PRVABC59 [15]. In the attenuated ZIKVs, the E protein was also important for adaptation. For example, ∆10, ∆20, and ∆30 each had increasing high frequency SNVs in positions 2304 and 2306 encoding a substitution for E K443 (Figure 2c), which is conserved among related flaviviruses with the exception of Japanese encephalitis virus, which has an R instead of a K. E 443 is located in the second helix of the stem, which may play an important role in dengue virus assembly and entry [16]; however, its function in ZIKV has not been determined. Interestingly, the results of the SNV analysis varied for E 443 in the present study. In ∆10, the result was a consensus substitution of K to N at P5. In ∆20, two different SNVs (G to U at 19% and G to C at 15%) were detected at 2306, both encoding a K to N substitution that did not reach consensus level, presumably due to competition of three nucleotides (G, U, and C) at that position. In ∆30, a SNV at 2304 that encodes a K to Q substitution reached 7% frequency. In WT, SNVs at 2305 and 2306 were present at P1-P5, but the frequencies remained consistent throughout passaging at or below 1%, and were unlikely to contribute to adaptation. Additional SNVs with increasing frequencies were detected in the E gene region of ∆10, ∆20, and ∆30, but were not shared ( Figure 2).
Correlation tests revealed a possible relationship between genome sequence coverage and mean entropy values and, to a lesser extent, the number of SNVs (Figure 3). The sequence coverage data for the 24 ZIKVs in this study ranged from 500 to 10,600 ( Table 1). Coverage of the attenuated mutants at P5 was greater than 8000, whereas coverage of P4 was approximately 2000, yet entropy and SNV data were not proportionally increased. Therefore, coverage levels may present a limitation, but other factors, such as in vitro passage, exerted pressures on the genetic variability of passaged ZIKVs. Passaging in Vero cells may not mirror the mutational landscape that LAVs generate in the human host, yet the data are important in understanding mutation dynamics in the context of characterization of candidate Zika LAVs.

Conclusions
Overall, the genetic diversity of WT began to increase immediately during early passages and leveled after P4, indicating that WT ZIKV adapted during the passage series. ∆20 and ∆30 had mean entropy and SNV numbers that remained low through P3 then increased. Genetic diversity of ∆10 differed in that the number of synonymous SNVs and entropy increased sharply at P1 and then decreased by P2 only to rise again after P3, indicating that in vitro culture placed immediate stress on the virus sequence. Additionally, adaptation of SNVs in ∆10 led to consensus sequence changes by P5. Ultimately, all ZIKVs increased in genetic diversity after passage, but the kinetics differed between WT and attenuated ZIKVs. This study demonstrated that adaptation in mammalian cells to propagate vaccine virus includes both consensus and subconsensus changes that may contribute phenotypically, underscoring the importance of tracking genetic diversity of LAV candidates currently in development in order to determine which virus passage will be utilized for vaccine seed stocks. Ultimately, the application of NGS diversity analysis to study the genetic stability of LAVs remains under development and should be considered in the context of detailed phenotypic and immunogenicity studies.