Classification of Raw Stingless Bee Honeys by Bee Species Origins Using the NMR- and LC-MS-Based Metabolomics Approach

The official standard for quality control of honey is currently based on physicochemical properties. However, this method is time-consuming, cost intensive, and does not lead to information on the originality of honey. This study aims to classify raw stingless bee honeys by bee species origins as a potential classifier using the NMR-LCMS-based metabolomics approach. Raw stingless bee honeys were analysed and classified by bee species origins using proton nuclear magnetic resonance (1H-NMR) spectroscopy and an ultra-high performance liquid chromatography-quadrupole time of flight mass spectrometry (UHPLC-QTOF MS) in combination with chemometrics tools. The honey samples were able to be classified into three different groups based on the bee species origins of Heterotrigona itama, Geniotrigona thoracica, and Tetrigona apicalis. d-Fructofuranose (H. itama honey), β-d-Glucose, d-Xylose, α-d-Glucose (G. thoracica honey), and l-Lactic acid, Acetic acid, l-Alanine (T. apicalis honey) ident d-Fructofuranose identified via 1H-NMR data and the diagnostic ions of UHPLC-QTOF MS were characterized as the discriminant metabolites or putative chemical markers. It could be suggested that the quality of honey in terms of originality and purity can be rapidly determined using the classification technique by bee species origins via the 1H-NMR- and UHPLC-QTOF MS-based metabolomics approach.

The untargeted metabolomics approach mainly consists of two interrelated steps; metabolite fingerprinting and metabolite profiling. Metabolite fingerprinting can be defined as high-throughput qualitative screening of metabolic composition of an organism or tissue with the primary aim of sample comparison and discrimination analysis. All steps from sample preparation, separation, and detection should be rapid and as simple as is feasible. Mostly, no attempt is initially made to identify the metabolites present. Metabolite fingerprinting is often used as a forerunner to metabolite profiling [19].
The next step is metabolite profiling of which can be defined as identification and quantification of the metabolites present in an organism. However, metabolite profiling is only feasible for a limited number of components, which are usually chosen on the basis of discriminant analysis or on molecular relationships based upon molecular pathways/networks [19].
It is possible to verify the botanical origin and exclude adulteration with sugars for honey [20]. Several metabolomics studies such as classification of honeys by different botanical origins [4,[21][22][23][24][25][26][27] and geographical origins [23,28,29] have been carried out to address honey originality issues. To the extent of our knowledge, the untargeted metabolomics approach has not yet been applied in the classification of raw stingless bee honeys by different bee species origins within a specific geographical region.
There were also several reports from previous studies that used various types of predictors to determine the entomological origins of stingless bee honeys. For example, hexoses-to-maltose ratio in high performance liquid chromatography (HPLC) analysis was proposed as predictor (taxonomic marker) to predict entomological origins of stingless bee honeys [30]. In addition, Ramón-Sierra et al. [31] demonstrated the utility of protein profiles obtained from electrophoresis as predictor of entomological origins for stingless bee honeys. On the other hand, Kek et al. [32] showed that the entomological origins of stingless bee honeys (four different bee species) can be used as classifier to classify raw honeys using their chemical profiles and mineral contents. According to Ramón-Sierra et al. [31], it is possible to differentiate honey samples in terms of bee species origins using metabolomics tools. Therefore, this study aims to classify raw stingless bee honeys by bee species origins as a potential classifier using the NMR-and LC-MS-based metabolomics approach.

Collection of Honey Samples
Honey samples from the three stingless bee species were collected from the meliponiculture site of the Syamille Agrofarm and Resort (4.8992 • N, 100.8964 • E), Perak, Malaysia ( Figure S1). The stingless bee hives were distributed in random spots within a land area of 48,562.3 m 2 and were surrounded by several species of fruit trees (Myrtaceae, Meliaceae, Oxalidaceae, Moraceae, Rutaceae, Arecaceae), ornamental trees (Polygonaceae, Asteraceae, Rubiaceae), and resin-secreting trees (Dipterocarpaceae). These hives were made from naturally hollowed tree trunks, topped with wooden nest boxes, inside were cerumen pots of resins and wax mixtures to store nectar collected by the stingless bee [33]. For each species of stingless bee, honey samples were collected from 30 cerumen pots from six different nest boxes. Hence, five cerumen pots from each nest box. The collection was made between 7:00 a.m. and 11:30 a.m. All samples were kept in a cooler box with ice for transportation to the laboratory, prior to storing in the chiller at 4 • C until further analysis.

Sample Preparation
The chilled honey samples were allowed to sit at room temperature for at least 30 min prior preparation for data acquisition by 1 H-NMR spectroscopy [34]. The phosphate buffer was prepared by dissolving 1.232 g of KH 2 PO 4 and 10 mg of TMSP (0.01%) in 100 mL of D 2 O. The buffer solution was then adjusted to pH 6 with 1 M NaOD solution.
Honey samples for 1 H-NMR measurement were prepared according to the procedure described by Kim et al. [34] with slight modifications. Honey in 5 mg was dissolved in 120 µL of deuterated methanol, followed by 480 µL of phosphate buffer. After centrifugation at 10,000 revolutions per min (RPM) for 2 min, 600 µL of the supernatant was pipetted into a round bottom NMR tube (4.97 mm × 4.2 mm, 178 mm) by Norell (Morganton, VA, USA) for data acquisition.

Data Acquisition
The 1 H-NMR spectra of stingless bee honey samples were acquired in duplicates, at 25 • C, on a 500 MHz Unity Inova NMR spectrometer (Varian Inc., California, CA, USA). Each spectrum was acquired over a spectral width of 0-10 ppm, using 64 scans and acquisition time of 256.8 s. The presaturation pulse sequence was used to suppress residual water signal with low power selective irradiation. D 2 O was used as internal lock and TMSP-2,2,3,3-d4 was used as reference standard at 0.00 ppm.
2.3.4. Data Pre-processing 1 H-NMR spectra were automatically phased using VnmrJ version 2.3 A (Varian Inc., Palo Alto, CA, USA) and further pre-processed with automatic baseline correction (spline) using Chenomx NMR Suite version 6.1 (Chenomx Inc., Edmonton, AB, Canada). The spectral intensities were scaled to TMSP (set to 0.0 ppm) and residual methanol (3.24-3.33 ppm) and water (4.68-4.88 ppm) signals were excluded. The remaining spectral regions were divided into 0.04 ppm bins, giving a total of 238 integrated regions (X-variables) per spectrum. The data were then converted to ASCII format and imported to Microsoft Excel 2010. A total of 79 spectra of 1 H-NMR were acquired, binned, and used to develop the classification model.

Data Analysis
The resulting Microsoft Excel file were imported into SIMCA version 14.1 (MKS Data Analytics Solutions, Umeå, Sweden) and pareto-scaled for multivariate data analysis (MVDA). Prior to classification of the honey samples by OPLS-DA model, principal component analysis (PCA) was performed to obtain an overview of the basic variation among the honey samples and to determine the presence of outliers. The robustness of OPLS-DA model was validated by means of cross-validation and response permutation test using 100 random permutations.

Characterization of Discriminant Metabolites
Variables in the OPLS-DA model with both values of variable importance in projection of greater than one (VIP > 1) and error bar not exceeding zero were considered as discriminant metabolites.
Characterization of discriminant metabolites was carried out based on the match between experimental 1 H-NMR signals and reference compound of in-house library of Chenomx Profiler version 8.2 (Chenomx Inc., Edmonton, AB, Canada).
The identity of characterized metabolites was further supported by comparison of two-dimensional (2D) 1

Sample Preparation
Stingless bee honey samples (in three biological replicates per species) were randomly selected from the collected 30 replicates (Section 2.2). An aliquot of 100 µL of each honey sample was mixed with 500 µL of MeOH:ACN:Water (1:1:1 v/v) and vortexed until fully mixed. The mixture was centrifuged at 9520 relative centrifugal force (RCF) at 4 • C for 30 min. The supernatant was then filtered and transferred into sample vial and stored at −80 • C until further analysis. For quality control (QC) purpose, all the extracted samples (n = 9) were mixed.

Data Acquisition
The extracted samples were subjected to Vanquish TM Horizon UHPLC system (Thermo Fisher Scientific, Waltham, MA, USA) coupled with electrospray ionisation Impact II QTOF-mass spectrometry system (Bruker Daltonics, Bremen, Germany). Honey sample of 10 µL was injected into Kinetex F5 LC column (2.1 mm × 100 mm, 2.6 µm; Phenomenex, Torrance, CA, USA). The column was maintained at 40 • C and eluted at a flow rate of 600 µL/min during analysis. The mobile phase was composed of solvent A (mixture of H 2 O, 0.1% HCOOH, and 1% NH 4 OAc (10 mM)) and solvent B (mixture of acetonitrile/methanol [6:4 v/v], 1% of 0.1% HCOOH and 1% NH 4 OAc (10 mM)). The gradient elution program was initiated from 1% to 40% solvent B in 5 min, followed by 100% solvent B from 5.1 min to 8 min and maintained for the next 2 min. The column was conditioned with the initial gradient for 3 min before each sample injection.
The mass data acquisition was set to m/z values of 50-1500 amu. Positive and negative mode of electrospray ionisation (ESI) were deployed at 4200 V and −4200 V, respectively. The ion source conditions were set as follows: gas temperature of 300 • C, drying gas flow at 12 L/min, and nebulizer flow at 5.0 bar. Mass calibration standard of sodium formate (10 mM) was introduced post-column via a 6-port valve at 0.1-0.3 min. The m/z values of acquired data were calibrated against the introduced sodium formate and subsequently converted into netCDFdata format (*.cdf) using ACD/Spectrus Processor version 2017.1.3.

Data Processing
The mass netCDF files were processed using MZmine version 2 [35] to compensate for variations in the retention times and m/z values between each analysis. Upon completion of the data processing, the mass spectral data were exported into a file with a comma-separated values data format (*.csv) as a peak list table, with rows representing the honey samples and columns representing the integrated and normalized peak areas (m/z values and retention times).

Data Analysis
The generated csv file (*.csv) was imported to SIMCA version 14.1 (MKS Data Analytics Solutions, Umeå, Sweden) and set with unit variance (UV) scaling. As with the NMR data, PCA overview was carried out prior to the PLS-DA model. The validity of the PLS-DA model was also verified by cross-validation and response permutation tests of 100 random permutations.

Characterization of Diagnostic Ions
ESI ionizes polar compounds more efficiently with basic sites (ESI + ) or acidic sites (ESI − ) [36]. The m/z values of precursor ions (ESI + and ESI − ) in the PLS-DA model with VIP > 1 and with error bar not exceeding zero were considered as diagnostic ions. Characterization of diagnostic ions could be performed based on comparison with online databases such as Metlin, KEGG, PubChem, ChemSpider, and MetFrag. Fragmentation of diagnostic ions was also generated for essential uses in structural elucidation and confirmation of the metabolite identity.

PCA of 1 H-NMR Spectral Data
An overview of PCA on 1 H-NMR metabolite fingerprints was performed to identify outliers in the data using Hotelling's T2. The PCA score plot (R2X = 0.991; Q2 = 0.946), showed no observations outside the tolerance ellipse based on Hotelling's T2, indicating that there were no strong outliers present in the data ( Figure S2). The three types of honey samples were seen to be separated into two main clusters by PC2. Although the H. itama honey samples were separated from those of T. apicalis, there was partial overlap of the G. thoracica honey samples with those of both H. itama and G. thoracica. Therefore, a discriminant-based classification model was developed to classify the honey samples.

PCA of UHPLC-QTOF Mass Spectrometric Data
The PCA score plots were constructed using positive (ESI + ) ( Figure S3) and negative (ESI − ) precursor ions ( Figure S4). Both score plots (ESI + : R2X = 0.61; Q2 = 0.253, ESI − : R2X = 0.901; Q2 = 0.799) showed no strong outliers in the mass data of honey samples. The honey samples of three species of stingless bee were clustered well into three separate groups, justifying further data analysis using the PLS-DA classification model.

OPLS-DA ( 1 H-NMR Spectral Data)
An OPLS-DA model was built using the 1 H-NMR data of the honey samples. As shown by the OPLS-DA score plot (Figure 1), the honey samples were classified into three separate groups with 100% correct classification (Table 1) according to their bee species origins. Honey samples from T. apicalis were discriminated from those of H. itama and G. thoracica honey by PC1, while H. itama honey samples were discriminated from those of G. thoracica by PC2.
For each component in PLS and OPLS models and their corresponding DA-extensions, summary of fit plot displays R2Y and Q2 bars. R2Y is the percent of variation of the training set-Y with OPLS-explained by the Y-predictive components. R2Y is a measure of fit, i.e., how well the model fits the data wherein a poor R2Y value is given when there is poor reproducibility (noisy data) in the training data set, or when for other reasons X does not explain Y. Q2 indicates how well a model could predict a new data set. Q2 > 0.5 indicates good predictability and Q2 > 0.9 is excellent. The difference between R2Y and Q2 larger than 0.2-0.3 indicates the presence of a few outlying data points [37]. A poor Q2 value is given when the data is noisy, or when the relationship between X to Y is poor, or when the model is dominated by a few scattered outliers. In this study, the OPLS-DA model demonstrated an optimum goodness of fit (R2Y = 0.921) and satisfied the criteria for good predictability (Q2 = 0.838), and the difference between R2Y and Q2 was 0.083.   Response permutation testing aims to assess the risk whether the OPLS-DA model is spurious or not. A model could fit well to the training set, however it might not predict Y well for the new observations. R2Y and Q2 of the original model are compared with R2Y and Q2 of several models based on data whereby the order of the Y-observations has been randomly permuted, while the X-matrix was kept constant. The model is validated when R2Y-intercept does not exceed 0.3-0.4 and that the Q2-intercept does not exceed 0.05 (Eriksson et al., 2006). The present results gave Y intercepts of 0.382 (R2), −0.58 (Q2) for H. itama honey; 0.373 (R2) and −0.604 (Q2) for G. thoracica honey and, 0.364 (R2) and −0.641 (Q2) for T. apicalis honey ( Figure S5). Hence, the OPLS-DA model was verified as valid.

PLS-DA (UHPLC-QTOF Mass Spectrometric Data)
PLS-DA classification model was used instead of OPLS-DA model for UHPLC-QTOF mass spectral data. This is due to the response permutation tests of OPLS-DA models for m/z values of both ESI + ( Figure S6) and ESIprecursor ions ( Figure S7) were shown to be overfitting.
The PLS-DA score plots for m/z values of both ESI + ( Figure 2) and ESI − precursor ions (Figure 3) for the honey samples were clearly classified into three discernible groups with 100% correct  Response permutation testing aims to assess the risk whether the OPLS-DA model is spurious or not. A model could fit well to the training set, however it might not predict Y well for the new observations. R2Y and Q2 of the original model are compared with R2Y and Q2 of several models based on data whereby the order of the Y-observations has been randomly permuted, while the X-matrix was kept constant. The model is validated when R2Y-intercept does not exceed 0.3-0.4 and that the Q2-intercept does not exceed 0.05 (Eriksson et al., 2006). The present results gave Y intercepts of 0.382 (R2), −0.58 (Q2) for H. itama honey; 0.373 (R2) and −0.604 (Q2) for G. thoracica honey and, 0.364 (R2) and −0.641 (Q2) for T. apicalis honey ( Figure S5). Hence, the OPLS-DA model was verified as valid.

PLS-DA (UHPLC-QTOF Mass Spectrometric Data)
PLS-DA classification model was used instead of OPLS-DA model for UHPLC-QTOF mass spectral data. This is due to the response permutation tests of OPLS-DA models for m/z values of both ESI + ( Figure S6) and ESIprecursor ions ( Figure S7) were shown to be overfitting.
The PLS-DA score plots for m/z values of both ESI + (Figure 2) and ESI − precursor ions (Figure 3) for the honey samples were clearly classified into three discernible groups with 100% correct classification (Table 2) Figure S9). Hence, all of these PLS-DA classification models for both precursor ions were verified valid.
Based on the results of OPLS-DA and PLS-DA classification models, it was obvious that the honey samples were able to be classified into three different groups by bee species origins. The metabolite contents of honey differ according to bee species origins because stingless bees might have selective affinity towards certain types of floral nectars. There are a few types of flowers that stingless bees do not frequent, typically flowers with long corolla and have very little nectar or low brix value. However, stingless bees have an affinity for inflorescences, flowers with long anters, short corollas, and clustered flower head [38]. In addition, the metabolites contents are varied due to different stingless bee species might secrete different types of enzymes to the foraged nectar. According to Kek et al. [32], the composition of honey is probably affected by the type of bee because honey-making process is highly related to enzymes added by the bees. Therefore, bee species origins could be suggested as reliable classifier in rapid determination of honey quality in terms of originality and purity using NMR-LCMS-based metabolomics approach.

Characterization of Discriminant Metabolites ( 1 H-NMR Spectral Data)
The loading column plots were generated to display the interclass separation based on binned regions. Variables with VIP > 1 ( Figure S10) and with error bar not exceeding zero (Figure S11a-c) were designated as discriminant metabolites. L-Lactic acid was determined as the most influential metabolite in VIP plot of OPLS-DA model for 1 H-NMR spectral data.
The discriminant metabolites listed in Table 3 were proposed as chemical markers for each honey type. D-Fructofuranose was the major discriminant for H. itama honey ( Figure S12), while β-D-Glucose, and D-Xylose, and α-D-Glucose were the discriminant metabolites for G. thoracica honey ( Figure S12). On the other hand, the variables contributing most significantly to discrimination of T. apicalis honey from both H. itama and G. thoracica honey were L-Lactic acid, Acetic acid, and L-Alanine ( Figure S12). The identities of all discriminants were further verified based on the results obtained from 1 H-1 H J-resolved experiments (Figure 4 and Figure S12).

Characterization of Diagnostic Ions (UHPLC-QTOF Mass Spectrometric Data)
The diagnostic ions of ESI-MS fingerprints responsible for group separation among H. itama, G. thoracica, and T. apicalis honeys were as listed in Table 4a-c, respectively. There were uncharacterized metabolites that differentiated the three honey types with m/z value of 446.203 [M + H] + being the most influential diagnostic ion (VIP = 1.91) in PLS-DA model of UHPLC-QTOF mass spectrometric data (Table 4a). The major contributor to H. itama honey was 446.203 m/z [M + H] + (Table 4a) (Table 4c).     VIP value of more than one (VIP > 1), which has an above average influence on Y summarised the influence of every term in the matrix X on all the Y's [37]. The variability in the metabolite fingerprints of stingless bee honey related to the bee species origins observed via 1 H-NMR spectroscopy (Q2 = 0.838) and UHPLC-QTOF mass spectrometry (ESI + : Q2 = 0.833; ESI − : Q2 = 0.976) can function as good predictors to determine bee species origins of stingless bee honey. An accurate prediction of bee species origins by such good predictors i.e., discriminant metabolites ( 1 H-NMR spectral data) and diagnostic ions (UHPLC-QTOF mass spectrometric data) for the determination of honey originality has been established.

Discussion and Conclusions
D-Fructofuranose was described as a discriminant metabolite for H. itama honey, whereas β-D-Glucose, D-Xylose, and α-D-Glucose were responsible for the discrimination of G. thoracica honey ( Table 3). The hydrolysis of sucrose from nectar into D-Glucose and D-Fructofuranose is mainly catalysed by invertase enzymes (sucrase, glucosidase, transglucosidase), which are secreted by cephalic glands of stingless bee workers [39,40]. D-Xylose was also found in the honey as reported by Ohmenhaeuser et al. [25].
Acetic and L-Lactic acids of T. apicalis honey (Table 3) are the products of fermentation from carbohydrates. Basically, there are three main categories of fermentation in stingless bee honey: Alcoholic, acetic, and lactic fermentation. The alcoholic fermentation (indicated by bubbles and foam) is performed by yeasts, which convert carbohydrates (sugar) into alcohol and CO 2 . Subsequently, acetic fermentation is performed under aerobic conditions by certain strains of bacteria (commonly Bacillus), which convert alcohol molecules and O 2 into acetic acid and water. In addition, lactic fermentation can occur when bacteria mostly convert carbohydrates into lactic acid and water or other organic molecules although yeasts and other fungi can perform similar function. However, those three categories of fermentation can naturally be mixed in stingless bee honey via enzymatic reaction of those microorganisms [40]. Another discriminant metabolite of T. apicalis was L-Alanine (Table 3), which also found in the honey as reported by Boffo et al. [41].
QTOF MS-based identification of metabolites may consist of three main steps, i.e., characterization of metabolites guided by mass spectral database, validation of metabolite structure, and confirmation of metabolite identity (targeted metabolite profiling) [42]. In this study, diagnostic RT, m/z values of precursor ions (ESI + and ESI − ), and MS-MS fragment ions as well as their corresponding intensities of detected ions represent as distinct ESI-MS fingerprints for each type of stingless bee honey. However, characterization of all diagnostic ions could not be completed m/z values of precursor ions could not be matched with MS-MS fragment ions in any databases.
The present study demonstrates the feasibility of using untargeted metabolomics approach for rapid classification of raw stingless bee honeys by bee species origins. Two complementary approaches, 1 H-NMR and UHPLC-QTOF MS are more appropriate for honey quality control. 1 H-NMR spectroscopy provides rapid detection by minimum sample preparation, non-destruction of the samples, and high reproducibility of data. These advantages are complemented by the UHPLC-QTOF MS, which enables high sensitivity or trace-level detection of metabolites with only minimal amounts of sample. LCMS also provides the possibility to extend the range of compounds detected by using different ion sources (e.g., electrospray ionization or atmospheric chemical ionization) and/or ion modes (positive and negative). The variability of metabolites detected exhibits the complementary nature of 1 H-NMR spectroscopy and UHPLC-QTOF MS techniques (Tables 3 and 4(a-c)).
Nevertheless, the numbers of bee species origins involved in the present study were limited to only three stingless bee species. Further research need to be conducted on many other species of stingless bee, as mentioned by Schwarz [43] and Rasmussen [44] in more diversified geographical coverage of beekeeping areas.
In conclusion, this study showed that stingless bee honeys can be classified by bee species origin (as reliable classifier) using a robust untargeted metabolomics approach to determine the honey originality. As a result, this study may give a significant impact to the global meliponiculture and apiculture industries in improving the efficiency of honey quality control.