Functional Characterisation of New Sesquiterpene Synthase from the Malaysian Herbal Plant, Polygonum Minus

Polygonum minus (syn. Persicaria minor) is a herbal plant that is well known for producing sesquiterpenes, which contribute to its flavour and fragrance. This study describes the cloning and functional characterisation of PmSTPS1 and PmSTPS2, two sesquiterpene synthase genes that were identified by mining P. minus transcriptome data. The full-length sequences of the PmSTPS1 and PmSTPS2 genes were expressed in E. coli using the pQE-2 expression vector. The sizes of PmSTPS1 and PmSTPS2 were 1098 bp and 1967 bp, respectively, with open reading frames (ORFs) of 1047 and 1695 bp encoding polypeptides of 348 and 564 amino acids, respectively. The proteins contain three conserved motifs, namely, Asp-rich substrate binding (DDxxD), metal binding residues (NSE/DTE), and cytoplasmic ER retention (RxR), as well as the terpene synthase family N-terminal domain and C-terminal metal-binding domain. In in vitro enzyme assays using farnesyl pyrophosphate (FPP) as the substrate, the PmSTPS1 enzyme produced the acyclic sesquiterpenes β-farnesene, α-farnesene, and farnesol, while the PmSTPS2 enzyme additionally produced nerolidol. The results confirmed the roles of PmSTPS1 and PmSTPS2 in the biosynthesis of aromatic sesquiterpenes in P. minus.


Introduction
Over the last 25 years, nearly 65,000 chemical structures of terpenoids have been discovered, making terpenoids the class of natural products with the greatest structural diversity [1,2]. Terpenoids are involved in a variety of important functions in regulating plant growth (especially for terpenoid lactones) and play an ecological role in attracting pollinators [3]. Terpenoids are grouped into different classes based on the number of 5-carbon building blocks [4][5][6][7]. All terpenoids are derived from the common phosphorylated five-carbon (C5) building units, isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP) [8]. There are two major pathways involved in the biosynthesis of terpenoids, namely, the mevalonate (MVA) pathway, which is primarily found in eukaryotes, and the methylerythritol phosphate (MEP) pathway (non-mevalonate pathway), which is primarily found in prokaryotes and plant chloroplasts [9][10][11]. For sesquiterpene biosynthesis, IPP and DMAPP undergo condensation to form farnesyl pyrophosphate

Screening and Isolation of Sesquiterpene Synthase Gene from P. minus
Two new candidates of sesquiterpene synthase genes, PmSTPS1 (comp62410_co_seq6) and PmSTPS2 (comp47018_c0_seq1), were successfully identified through the sequence analysis of the P. minus transcriptome [40]. The 1098 bp PmSTPS1 transcript contained an open reading frame (ORF) of 1047 bp, encoding 348 amino acids with a calculated molecular mass of 40.9 kDa and an isoelectric point (pI) of 6.15 (Figure S1). The ORF of PmSTPS1 started at nucleotide position 25 and ended at position 1071. The deduced amino acid sequence of PmSTPS1 (GenBank accession no. MG921605) showed no signal peptide. A ProtParam analysis of the predicted amino acid sequence of PmSTPS1 revealed 47 negatively charged residues (Asp and Glu) and 52 positively charged residues (Arg and Lys), which represented the aliphatic index of this protein. This was a positive factor for the increased thermostability of the globular protein. The second sesquiterpene synthase transcript, PmSTPS2 (GenBank accession no. MG921606), was 1974 bp long and had an ORF of 1695 bp, which encoded a polypeptide of 564 amino acids (Figure S2). The ORF of PmSTPS2 started at nucleotide position 97 and ended at position 1798. The calculated molecular mass of the mature protein was approximately 65.96 kDa, with a predicted pI of 5.75. A ProtParam analysis of the predicted amino acid sequence of PmSTPS2 identified 83 negatively charged residues (Asp and Glu) and 70 positively charged residues (Arg and Lys).
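The relationship between the reported ORF coordinates, ORF length, and polypeptide length can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes 1-based inclusive coordinates and an ORF length that includes one stop codon.

```python
def orf_length(start: int, end: int) -> int:
    """Length in bp of an ORF given 1-based, inclusive start and end positions."""
    return end - start + 1


def encoded_residues(orf_bp: int) -> int:
    """Number of amino acids encoded, assuming the ORF includes one stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_bp // 3 - 1


# PmSTPS1: ORF reported from nucleotide 25 to 1071
pmstps1_orf = orf_length(25, 1071)
print(pmstps1_orf, encoded_residues(pmstps1_orf))  # 1047 348
```

The same two functions can be applied to the coordinates reported for PmSTPS2 to confirm that a 1695 bp ORF corresponds to 564 residues.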
Based on the BLASTx analysis, the predicted amino acid sequences of PmSTPS1 (Table 1) and PmSTPS2 (Table 2) had the closest hit to drimenol synthase from Persicaria hydropiper, with 96% and 46% identity, respectively. The predicted amino acid sequence of PmSTPS2 was consistent with those of other sesquiterpene synthases, which encode proteins of 550-580 amino acids with molecular weights of 60-70 kDa. Conversely, PmSTPS1, with only 348 amino acids, was much shorter than the other sesquiterpene synthases. Therefore, only PmSTPS2 fell within the size range of other reported plant terpene synthases [21][41][42][43][44][45].
The conserved domains present in the PmSTPS1 and PmSTPS2 proteins were consistent with those of other terpene synthases. The terpene synthase family N-terminal domain (PF01397) and the terpene synthase family C-terminal metal-binding domain (PF03936) contained highly conserved aspartate-rich motifs (DDxxD), which were essential for enzyme-substrate binding and catalytic function. The first aspartate-rich motif played a role in the determination of the chain length for the resulting prenyl pyrophosphate.
Based on the multiple sequence alignment of PmSTPS1 (Figure 1), several conserved motifs that are found in typical terpene synthases were identified, including the DDxxD (residues 100-104) and NSE/DTE (residues 245-253) motifs. The DDxxD and NSE/DTE motifs flanked the entrance of the active site. In addition to these motifs, there was a highly conserved arginine-rich RxR motif, which was involved in the complexing of the diphosphate group after the ionisation of FPP [18,39]. The RxR motif was located 45 amino acids upstream of the first DDxxD motif. For PmSTPS2, the arginine-rich (RxR) region at amino acid positions 278-281 was conserved in all of the terpene synthases [19]. Moreover, the aspartate-rich DDxxD motif, which might have been the Mg2+ binding site, was located at positions 314-318 of the amino acid sequence. Another metal binding motif, the NSE/DTE motif, was detected at amino acid positions 461-467 (Figure 1). The less conserved NSE/DTE motif apparently evolved from a second motif that was conserved in prenyltransferases. In general, these motifs were located on opposite sides of the active site [6,7]. The metal binding residues appeared as NSE in most microbial and fungal cyclases and as DTE in most plant cyclases.
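Motif positions such as those quoted above can be located in a protein sequence with regular expressions. The following is a minimal sketch, not the procedure used in this study: the toy sequence is hypothetical, and the NSE/DTE pattern assumes a commonly cited consensus, (N,D)Dxx(S,T)xxxE, which may differ from the exact definition applied by the authors.

```python
import re

# Motif patterns; "." stands for any residue ("x" in the motif names).
# The NSE/DTE pattern is an assumed consensus, (N,D)Dxx(S,T)xxxE.
MOTIFS = {
    "RxR": r"R.R",
    "DDxxD": r"DD..D",
    "NSE/DTE": r"[ND]D..[ST]...E",
}


def find_motifs(seq: str) -> dict:
    """Return 1-based inclusive (start, end) positions for each motif hit."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # Wrapping the pattern in a lookahead also reports overlapping matches.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [
            (m.start() + 1, m.start() + len(m.group(1)))
            for m in regex.finditer(seq.upper())
        ]
    return hits


# Hypothetical toy sequence, used only to exercise the function.
toy = "MKRSRAAADDLYDGGGNDLASFEKEQ"
print(find_motifs(toy))
```

Applied to a full-length deduced protein sequence, the same function would return all candidate sites, which would then need to be checked against the alignment, since short patterns such as RxR also match by chance.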

Phylogenetic Analysis of P. minus Sesquiterpene Synthase (PmSTPS)
The PmSTPS1 and PmSTPS2 amino acid sequences were aligned and compared with other flowering plant terpene synthase sequences, using Clustal Omega (Figure 2), and they showed a low sequence similarity (42.94%). The phylogenetic analysis showed a particularly close relationship between the PmSTPS1 and PmSTPS2 amino acid sequences. PmSTPS1 was clustered in the same clade with sesquiterpene synthase (PmSTS) from Persicaria minor and drimenol synthase from Persicaria hydropiper. The results showed that PmSTPS1 from P. minus was grouped into a single clade with a 43.04% identity, which suggested a monophyletic origin of the gene. Additionally, PmSTPS2 was placed in the same clade with (+)-delta-cadinene synthase from Ricinus communis. Moreover, PmSTPS1 and PmSTPS2 were grouped together with the terpene synthases from the Santalum and Vitis vinifera species. Multiple sequence alignments of the PmSTPS1 and PmSTPS2 amino acid sequences with sesquiterpene synthases from other plant species showed a high sequence similarity (42-96%).
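Percent identity values like those quoted above are derived from aligned sequences. The sketch below shows one common way to compute such a value from two rows of an alignment. It is an assumption that identity is counted over columns where neither sequence has a gap; alignment tools differ in the denominator they use, so this may not reproduce the reported figures exactly. The example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns in which neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments: 5 identical residues out of 7 ungapped columns.
print(round(percent_identity("MDDLY-DA", "MDEIYSDA"), 2))
```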

Expression of PmSTPS1 and PmSTPS2 in E. coli
For the analysis of the protein expression, recombinant E. coli M15 strains harbouring pQE2 with PmSTPS1 and PmSTPS2 were compared with those harbouring the control empty pQE2 vector. The cells were harvested at different times (1, 3, and 5 h) post-induction. After sonication and centrifugation of the bacteria, soluble and insoluble crude fractions were separated with 10% SDS-PAGE. The SDS-PAGE analysis (Figure S3) showed unclear corresponding protein bands at the expected size at different post-induction times, as well as in the control sample. However, Western blotting (Figure S4) confirmed the correct size of the recombinant proteins. No band was observed in the control sample, as expected. Correct protein sizes of 40.9 and 65.9 kDa were obtained for PmSTPS1 and PmSTPS2, respectively. From these findings, the recombinant PmSTPS1 and PmSTPS2 proteins from P. minus were successfully expressed in E. coli, and the activities of these enzymes were further investigated via enzymatic assays.

Identification of PmSTPS1 and PmSTPS2 Assay Products
A functional characterisation of the PmSTPS1 and PmSTPS2 genes was performed by an in vitro enzyme assay of the recombinant proteins. In this crude protein assay, the E. coli strain harbouring the empty pQE-2 vector was used as the control strain. A GC-MS analysis showed that PmSTPS1 and PmSTPS2 produced β-farnesene, α-farnesene, and farnesol as the final products. Additionally, PmSTPS2 also produced nerolidol. For PmSTPS1, the products formed were β-farnesene (14.49 min), α-farnesene (15.79 min), and farnesol (16.98 min). The principal products from the PmSTPS2 enzyme were β-farnesene (14.49 min), α-farnesene (15.83 min), farnesol (16.84 min), and nerolidol (17.25 min). Based on the GC-MS analysis, both extracts from PmSTPS1 and PmSTPS2 showed multiple peaks for the corresponding sesquiterpene products, whereas the control sample did not exhibit any major products, although exogenous substrates were added (Figure 3). These findings therefore demonstrated the successful production of sesquiterpenes in the recombinant E. coli strains overexpressing the PmSTPS1 and PmSTPS2 enzymes, respectively. All of the peaks were further confirmed by comparison with the NIST and Wiley libraries, mass spectra, authentic sesquiterpene standards, and the control (Figure S5). Interestingly, although the sizes of PmSTPS1 and PmSTPS2 were different, the two enzymes were capable of producing similar sesquiterpene products, β-farnesene, α-farnesene, and farnesol (Figure 3), but they did so at different levels. PmSTPS1 converted the precursor FPP to 9.50% β-farnesene as the main product, followed by 8.86% α-farnesene and 5.08% farnesol. PmSTPS2 synthesised nerolidol (48.33%) as the major product, followed by farnesol (15.30%), β-farnesene (5.07%), and α-farnesene (2.76%).
Although the (E,E)-farnesyl pyrophosphate (FPP) substrate was added to the enzymatic assay, the control pQE-2 sample did not produce any significant products, indicating that endogenous metabolites did not affect the protein expression and analysis in this study.
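The product percentages reported for the two enzymes can be compared more directly by expressing each product as a share of the identified sesquiterpenes. The sketch below is an illustrative calculation only; the variable and function names are hypothetical, and it assumes that the reported percentages are on a common scale within each enzyme's chromatogram.

```python
# Percentages as reported in the text for each recombinant enzyme.
REPORTED_PCT = {
    "PmSTPS1": {"β-farnesene": 9.50, "α-farnesene": 8.86, "farnesol": 5.08},
    "PmSTPS2": {
        "nerolidol": 48.33,
        "farnesol": 15.30,
        "β-farnesene": 5.07,
        "α-farnesene": 2.76,
    },
}


def major_product(products: dict) -> str:
    """Name of the product with the highest reported percentage."""
    return max(products, key=products.get)


def relative_shares(products: dict) -> dict:
    """Each product as a percentage of the identified sesquiterpenes only."""
    total = sum(products.values())
    return {name: 100.0 * pct / total for name, pct in products.items()}


for enzyme, products in REPORTED_PCT.items():
    shares = relative_shares(products)
    print(enzyme, major_product(products),
          {name: round(value, 1) for name, value in shares.items()})
```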

Discussion
In this study, we provided the first cloning and functional characterisation of PmSTPS1, encoding a putative β-farnesene synthase, and PmSTPS2, encoding a putative nerolidol synthase, in P. minus. A sequence comparison between PmSTPS1 and PmSTPS2 indicated that the two enzymes had different protein and nucleotide sequences. However, both of the enzymes were structurally similar to other plant sesquiterpene synthases and contained all of the conserved motifs, including DDxxD, RxR, and NSE/DTE, which were important for terpene synthase functionality [5,45,46]. Based on the phylogenetic analysis, PmSTPS1 and PmSTPS2 clustered in the same group with four distinct sesquiterpene synthases.
In addition, the unexpected band in the upper layer (Figure S4) might have been caused by protease aggregation, according to [47]. As shown in several recent studies, a few plant sesquiterpene synthases have similar molecular weights, and PmSTPS2 falls within the reported range of 60-70 kDa [24,48,49]. In general, terpene synthases can be classified into monoterpene, sesquiterpene, and diterpene synthases, with 550-860 amino acids encoding a 50-100 kDa protein [4,50]. As a result of the absence of the signal peptide sequence of 50-70 amino acids, sesquiterpene synthases are typically smaller than monoterpene and diterpene synthases [51]. Ee et al. [37] also reported that the protein size of the P. minus β-sesquiphellandrene synthase was 65.1 kDa. Additionally, studies on β-caryophyllene synthase, encoded by OkBCS (GenBank accession no. KP226502) from Ocimum kilimandscharicum Gürke, showed a molecular weight of 63.6 kDa [24]. Until now, no sesquiterpene synthase sequence as short as PmSTPS1 had been characterised. However, the short chain length of this enzyme could be associated with several prenyltransferase (PT) enzymes. Based on previous findings, two prenyltransferases, Santalum farnesyl diphosphate synthase (SaFDS) and Hedychium farnesyl pyrophosphate synthase (HcFPPs), comprising 1029 and 1068 bp nucleotide sequences and encoding polypeptides of 343 and 356 amino acids, respectively, were reported [3,52].
Many terpene synthases (TPSs) have the ability to synthesise one or multiple products from a single substrate, regardless of whether it is farnesyl pyrophosphate (FPP) or geranyl pyrophosphate (GPP) [53,54]. In addition, sesquiterpene synthases from different plant species have been shown to produce more than one product [25,55,56]. Interestingly, although the sizes of PmSTPS1 and PmSTPS2 were different, the two enzymes produced similar sesquiterpene products (β-farnesene, α-farnesene, and farnesol, Figure 3), albeit at different percentages. PmSTPS1 and PmSTPS2 could catalyse the formation of β-farnesene, α-farnesene, and farnesol (and, in the case of PmSTPS2, nerolidol), and their functions were different from those of the previous STPS enzymes characterised in P. minus [36][37][38][39]. In addition, the enzymes demonstrated the inherent capacity of TPS enzymes to evolve different products and substrate specificities [57]. Moreover, the main factors behind sesquiterpene diversity are the large number of different sesquiterpene synthases expressed in plants and the ability of some sesquiterpene synthases to form multiple products from a single FPP substrate [58].
Based on previous chemical profiling studies of P. minus essential oils from hydro-distillation extraction, low levels of nerolidol and farnesol were detected at 0.24% and 0.14%, respectively [34]. A similar finding was reported, which indicated that β-farnesene and α-farnesene were also found at 0.92% and 0.82%, respectively [31,32,34]. The percentage of terpenes that was obtained directly from the GC-MS analysis of P. minus leaf essential oil was lower than that of the products that were produced by the enzymatic assay of crude protein PmSTPS1 and PmSTPS2. The variation in composition could have been because of the variable amounts of sesquiterpenes that were produced in the plants, depending on environmental factors. Nevertheless, the sesquiterpenoids that were produced by the in vitro assay potentially contributed to the plant fragrance, because most of the acyclic sesquiterpene compounds, namely, farnesene, nerolidol, and farnesol, were previously reported in various plant essential oils [21,59,60]. These compounds could potentially be commercialised as fragrances, flavouring agents, or pharmaceutical products. In addition, farnesene is an important compound for diesel and jet fuels [61]. Understanding the physiological and ecological roles of plant volatile sesquiterpenes has been challenging. Several sesquiterpene compounds might act as defence chemicals against biological stresses. For instance, E,E-α-farnesene was reported to have potential for use as an alarm pheromone in the control of aphid pests [62]. Furthermore, the existence of farnesol in P. minus essential oil could be related to the biosynthetic pathway of juvenile hormone (JH) III [63]. Nerolidol is not only used in cosmetic and non-cosmetic products [64], but has also been shown to possess pharmacological and biological activities [65,66]. Therefore, the advantages of nerolidol have made it a promising drug candidate for industrial production [67,68].

Plant Material
P. minus plants were grown in an experimental plot at Universiti Kebangsaan Malaysia (UKM) under natural light and environmental conditions. The samples were originally collected from Ulu Yam, Selangor, Malaysia (UY; 3°16′14.63″ N, 101°41′11.32″ E), and were identified using ITS sequences [69]. The voucher specimens were deposited at the UKM herbarium. Leaf samples from P. minus plants were harvested in the morning, between 8 and 9 am, frozen in liquid nitrogen, and stored at −80 °C for RNA extraction.

RNA Isolation and cDNA Synthesis
Total RNA was isolated and extracted as previously reported [70]. The quantity, purity, and integrity of the RNA were determined using standard methods. Three micrograms of RNA were reverse transcribed into cDNA using the OneTaq® One-Step RT-PCR kit (New England Biolabs, Ipswich, MA, USA), according to the manufacturer's instructions.

Candidate Gene Selection and Isolation of Full-Length PmSTPS1 and PmSTPS2
The candidate gene selection was achieved by mining the P. minus transcriptome data [40] for transcripts that were related to the sesquiterpene biosynthetic pathway. The assembled transcripts were classified as sesquiterpene synthases, based on a homology search, and the terpene synthases were selected and fully sequenced prior to further analysis. Two new P. minus sesquiterpene synthase (PmSTPS) candidate genes were identified. The predicted ORFs for PmSTPS1 (GenBank accession no. MG921605) and PmSTPS2 (GenBank accession no. MG921606) were amplified by PCR, using the Q5 High-Fidelity DNA Polymerase (New England Biolabs, Ipswich, MA, USA). The gene-specific PCR primers PmSTPS1_F (5′-AAAGGTACCATGCCAAGGCTCG-3′) and PmSTPS1_R (5′-TTTGTCGACAATCGGAATGGGAT-3′), and PmSTPS2_F (5′-AAAGGTACCATGTCATCCCAAA-3′) and PmSTPS2_R (5′-TTTAAGCTTAATGGAGAGAGGTT-3′), were synthesised to amplify from the 5′ end and 3′ end, respectively.
The PCR reaction mixture contained 1× reaction buffer, 2 mM MgCl2, 0.2 mM dNTPs, 5 units of Taq polymerase (Promega, Madison, WI, USA), 0.5 µM of forward and reverse primers, and 20 ng of template cDNA. The reaction was performed under the following conditions: pre-denaturation at 98 °C for 30 s, followed by 32 cycles of 98 °C for 10 s, 60 °C for 10 s, and 72 °C for 20 s, with a final extension at 72 °C for 2 min. The PmSTPS1 and PmSTPS2 amplicons were digested with KpnI/SalI and KpnI/HindIII before being cloned into the pUC19 and pUC57-Kan cloning vectors, respectively. The ligation mixtures were transformed into E. coli TOP10 competent cells, before sub-cloning into the pQE-2 plasmid. The positive transformants were screened on LB agar that was supplemented with 100 µg/mL ampicillin, 20 µg/mL X-gal, and 0.1 mM IPTG (ThermoFisher Scientific, Waltham, MA, USA). The recombinant plasmids pQE-2:PmSTPS1 and pQE-2:PmSTPS2 were then transformed into E. coli M15 competent cells, and the transformants were selected on LB agar that was supplemented with 50 µg/mL kanamycin. The positive transformants were confirmed by colony PCR, and the gene sequences were verified via DNA sequencing (First BASE Laboratories, Seri Kembangan, Selangor, Malaysia).
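The primer sequences listed above can be checked against the restriction enzymes used for cloning, since each primer should carry the matching recognition site in its 5′ extension. The sketch below is a simple illustration; it uses the standard recognition sequences for KpnI (GGTACC), SalI (GTCGAC), and HindIII (AAGCTT), and it assumes that the spaces inside the primer sequences as printed are typesetting artefacts. The dictionary and function names are hypothetical.

```python
# Standard recognition sequences of the restriction enzymes named in the text.
RESTRICTION_SITES = {
    "KpnI": "GGTACC",
    "SalI": "GTCGAC",
    "HindIII": "AAGCTT",
}

# Primer sequences (5'->3') as given in the text, with internal spaces removed.
PRIMERS = {
    "PmSTPS1_F": "AAAGGTACCATGCCAAGGCTCG",
    "PmSTPS1_R": "TTTGTCGACAATCGGAATGGGAT",
    "PmSTPS2_F": "AAAGGTACCATGTCATCCCAAA",
    "PmSTPS2_R": "TTTAAGCTTAATGGAGAGAGGTT",
}


def sites_in(primer: str) -> list:
    """Names of the enzymes whose recognition site occurs in the primer."""
    return [name for name, site in RESTRICTION_SITES.items() if site in primer.upper()]


for name, seq in PRIMERS.items():
    print(name, sites_in(seq))
```

Each forward primer carries a KpnI site immediately followed by an ATG start codon, and the reverse primers carry SalI (PmSTPS1) and HindIII (PmSTPS2) sites, which matches the KpnI/SalI and KpnI/HindIII digests described for the two amplicons.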

Full-Length cDNA Sequence Analysis and Phylogenetic Tree Construction
The ORFs for PmSTPS1 and PmSTPS2 were predicted using the ORF finder program (http://www.ncbi.nlm.nih.gov) and were subjected to BLASTX and BLASTP analyses. Multiple sequence alignment was achieved using the Clustal Omega alignment algorithm. The amino acid sequence, theoretical isoelectric point (pI), and predicted molecular weight (MW) were determined using ExPASy Proteomic tools (http://www.cn.expasy.org/tools/protscale.html). The physical and chemical characteristics of all of the deduced amino acid sequences were analysed by the ProtParam tool (http://web.expasy.org/program/). The signal peptide and targeting location of the deduced proteins were predicted using the SignalP method (http://www.cbs.dtu.dk/services/SignalP) and the ChloroP program (http://www.cbs.dtu.dk/services/ChloroP/). A protein domain analysis was performed using the SMART (Simple Modular Architectural Research Tool) database (http://smart.embl-heidelberg.de/).
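Two of the quantities that ProtParam reports, the counts of charged residues and the aliphatic index, are separate properties computed from the amino acid composition. The sketch below illustrates both calculations with the standard library only. It assumes the aliphatic index formula of Ikai, AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percentage of each residue; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def charged_residue_counts(seq: str) -> tuple:
    """Return (negative, positive) counts: Asp + Glu and Arg + Lys."""
    counts = Counter(seq.upper())
    negative = counts["D"] + counts["E"]
    positive = counts["R"] + counts["K"]
    return negative, positive


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq.upper())
    length = len(seq)

    def mole_pct(residue: str) -> float:
        return 100.0 * counts[residue] / length

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


# Hypothetical toy sequence, used only to exercise the functions.
toy = "MKRSRAAADDLYDGGGNDLASFEKEQ"
print(charged_residue_counts(toy), round(aliphatic_index(toy), 2))
```

A higher aliphatic index is generally interpreted as an indicator of greater thermostability for globular proteins, independently of the charged-residue counts.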

Phylogenetic Analysis
Phylogenetic and molecular evolutionary analyses of the amino acid sequences of PmSTPS1 and PmSTPS2, together with terpene synthases from different plant species, were conducted using the default parameters of the PhyML software, which is available at the Phylogeny.fr web service (www.phylogeny.fr/version2_cgi/simplephylogeny.cgi) [71]. PhyML was employed to construct a phylogenetic tree, by generating multiple alignments through the neighbour-joining computational method.

Expression of PmSTPS1 and PmSTPS2 in E. coli
Single colonies of recombinant E. coli M15 cells harbouring pQE-2:PmSTPS1, pQE-2:PmSTPS2, or empty pQE-2 (as a negative control) were each inoculated into 10 mL of LB medium containing kanamycin (50 µg/mL) and grown overnight at 37 °C. Approximately 2 mL of each culture was added to 200 mL of fresh LB, which contained 50 µg/mL kanamycin. The cultures were induced with 0.5 mM IPTG at OD600 ~0.5. The cultures were incubated for 1, 3, and 5 h at 37 °C, and were then harvested by centrifugation at 4000× g for 30 min at 4 °C. Subsequently, the bacteria were resuspended in 100 mL of 25 mM sodium phosphate buffer, pH 7.5, containing 0.5 M Tris-HCl, 5% glycerol, 1 mM dithiothreitol (DTT), 10 mM MgCl2, 1 mM MnCl2, pH 7.5, and 1 mM lysozyme (Sigma-Aldrich, St. Louis, MO, USA) [20]. The cells were sonicated on ice for 2 min in 5 s pulses, with 5 s between the pulses, using the Sonic Dismembrator Model 100 (Fisher Scientific, Hampton, NH, USA). The cell lysate was then centrifuged at 10,000× g for 30 min at 4 °C.

Enzyme Assay
A standard assay was performed according to a previous method, with slight modifications. Standard assays were performed in 2.5 mL glass GC vials containing 200 µg of crude protein mixed with 50 mM Tris (pH 7.5), 10 mM MgCl2, and 100 µM of (E,E)-farnesyl pyrophosphate (FPP) (Sigma-Aldrich, St. Louis, MO, USA). The reaction mixture, with a total volume of 200 µL, was vortexed, overlaid with 500 µL hexane, and incubated at 30 °C for 2 h. The hexane phase was concentrated to 200 µL by passing N2 over the opening of the tube and was then used for the GC-MS analysis.

Detection of Sesquiterpenes Using GC-MS
The samples were analysed using a Clarus 600 GC-MS (PerkinElmer Inc., Waltham, MA, USA) that was equipped with a capillary column (Elite-5, 30 m × 0.25 mm, film thickness 0.25 µm). The GC was operated at a flow rate of 2 mL/min, and the mass selective detector (MSD) was operated at 70 eV. Splitless injections (1.5 µL) were performed with an injector temperature of 250 °C. The GC system was programmed with an initial oven temperature of 50 °C (5 min hold), which was then increased to 180 °C at 10 °C/min (4 min hold), followed by a 100 °C/min ramp to 240 °C (1 min hold). A solvent delay of 8.5 min was allowed before the acquisition of the MS data. The MS system was operated in selected ion monitoring (SIM) mode to scan for the molecular ions at product peaks, which were quantified by the integration of peak areas with a library search, using the NIST library [72].
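The oven programme described above determines the total run time and the oven temperature at which each product elutes. The following sketch models the programme as a list of ramp-and-hold steps. It is an illustration under the assumption that the final step is a 100 °C/min ramp up to 240 °C; the names used are hypothetical.

```python
INITIAL_TEMP_C = 50.0

# Each step: (ramp rate in °C/min, target temperature in °C, hold time in min).
# A rate of 0 means the step is a hold only.
STEPS = [
    (0.0, 50.0, 5.0),
    (10.0, 180.0, 4.0),
    (100.0, 240.0, 1.0),
]


def total_run_time() -> float:
    """Total duration of the oven programme in minutes."""
    elapsed, temp = 0.0, INITIAL_TEMP_C
    for rate, target, hold in STEPS:
        if rate > 0:
            elapsed += (target - temp) / rate
        temp = target
        elapsed += hold
    return elapsed


def oven_temperature(t_min: float) -> float:
    """Oven temperature in °C at a given time after injection."""
    elapsed, temp = 0.0, INITIAL_TEMP_C
    for rate, target, hold in STEPS:
        if rate > 0:
            ramp_time = (target - temp) / rate
            if t_min <= elapsed + ramp_time:
                return temp + rate * (t_min - elapsed)
            elapsed += ramp_time
        temp = target
        if t_min <= elapsed + hold:
            return temp
        elapsed += hold
    raise ValueError("time is beyond the end of the oven programme")


print(round(total_run_time(), 1))
for retention_time in (14.49, 17.25):
    print(retention_time, round(oven_temperature(retention_time), 1))
```

Under these assumptions, the reported retention times of 14.49 to 17.25 min all fall within the 10 °C/min ramp, after the 8.5 min solvent delay.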

Conclusions
In summary, two new sesquiterpene synthases, PmSTPS1 and PmSTPS2, which were identified from P. minus leaf transcriptomics analysis, were cloned and characterised. Both of the enzymes produced the industrially important acyclic sesquiterpenes β-farnesene, α-farnesene, and farnesol. PmSTPS2 also produced nerolidol as the major product from FPP conversion. This study demonstrated the production of characteristic P. minus fragrance-related sesquiterpenes by both PmSTPS1 and PmSTPS2, as well as the potential of further metabolic engineering in E. coli, using PmSTPS2 for the microbial production of nerolidol.
Supplementary Materials: The following are available online, Figure S1: Nucleotide and predicted amino acid sequence of PmSTPS1 from P. minus; Figure S2: Nucleotide and predicted amino acid sequence of PmSTPS2 from P. minus; Figure S3: SDS-PAGE analysis of recombinant pQE2_PmSTPS1 and pQE2-PmSTPS2 proteins, marked with a red box, with molecular mass markers indicated; Figure S4: The expression analysis of PmSTPS1 and PmSTPS2 in E. coli M15 after 0.5 mM IPTG induction at 1, 3, and 5 h; and Figure S5: Mass spectra of the three major sesquiterpenes produced by recombinant PmSTPS1 and PmSTPS2 in comparison with the mass spectra from authentic standards.