High-Resolution Vegetation Mapping Using eXtreme Gradient Boosting Based on Extensive Features

: Accurate mapping of vegetation is a premise for conserving, managing, and sustainably using vegetation resources, especially in conditions of intensive human activities and accelerating global changes. However, it is still challenging to produce high-resolution multiclass vegetation map in high accuracy, due to the incapacity of traditional mapping techniques in distinguishing mosaic vegetation classes with subtle di ﬀ erences and the paucity of ﬁeldwork data. This study created a workﬂow by adopting a promising classiﬁer, extreme gradient boosting (XGBoost), to produce accurate vegetation maps of two strikingly di ﬀ erent cases (the Dzungarian Basin in China and New Zealand) based on extensive features and abundant vegetation data. For the Dzungarian Basin, a vegetation map with seven vegetation types, 17 subtypes, and 43 associations was produced with an overall accuracy of 0.907, 0.801, and 0.748, respectively. For New Zealand, a map of 10 habitats and a map of 41 vegetation classes were produced with 0.946, and 0.703 overall accuracy, respectively. The workﬂow incorporating simpliﬁed ﬁeld survey procedures outperformed conventional ﬁeld survey and remote sensing based methods in terms of accuracy and e ﬃ ciency. In addition, it opens a possibility of building large-scale, high-resolution, and timely vegetation monitoring platforms for most terrestrial ecosystems worldwide with the aid of Google Earth Engine and citizen science programs.


Introduction
Earth's diverse ecosystems provide generous resources for human populations. Accurate mapping of the diverse ecosystems or communities could facilitate environmental planning, resource management, and social wellbeing [1][2][3][4]. Since the rapid growth of the human population and climate warming is accelerating the pace of vegetation dynamics [5,6], timely information on the distribution of vegetation over regional to global scales has growing significance for a wide range of end users.
Mapping vegetation at the initial stage mainly depends on experts' empirical knowledge in defining boundaries between vegetation classes [7]. Even though fairly accurate at a small regional scale, this method is not only time-consuming but also has limited extendibility. Use of remote sensing (RS) technology has greatly increased the efficiency and accuracy of the mapping [8]. Availability of large stacks of various RS data, such as the Moderate Resolution Imaging Spectroradiometer (MODIS), Landsat, Polarization Synthetic Aperture Radar (PolSAR), Light Detection And Ranging (LiDAR), and aerial photography, enable mappers to produce vegetation maps from regional to global scales with considerable accuracy [9][10][11]. However, it still remains challenging to increase the mapping To test the extendibility of our method to different ecosystems, we selected DzB and NZ as our study cases, due to the striking differences in main vegetation types (desert vs. grassland and forest), climate (dry vs. wet), location (northern vs. southern hemisphere) and topography (plain vs. mountain). Moreover, DzB and NZ are representative places that include most of the major terrestrial To test the extendibility of our method to different ecosystems, we selected DzB and NZ as our study cases, due to the striking differences in main vegetation types (desert vs. grassland and forest), climate (dry vs. wet), location (northern vs. southern hemisphere) and topography (plain vs. mountain). Moreover, DzB and NZ are representative places that include most of the major terrestrial habitat types including forests, grasslands, shorelines, wetlands, and deserts so that the extendibility of our method could be better substantiated. The simplified field survey we proposed is based on the reasoning that characteristics of dominant and subdominant species of each synusia (upper canopy, lower canopy, shrub layer, and herbaceous layer) determine the vegetation types [32,33]. During implementation, one should make sure that the following information is recorded with precision: (1) geographical coordinates, (2) dominant and subdominant plant species in each synusia, and (3) their relative coverages (RLCs). In addition, ground photographs should be taken with special care. At each site, more than two photographs should be taken, with at least one wide-angle or close-range photo (should be taken perpendicular to the ground), respectively. After the fieldwork, the wide-angle photos were helpful for confirming the accuracy of the information of dominant or subdominant species. The close-range photos should be calibrated according to view angle, enhanced, and then segmented to correct the initial RLC records. Sampling routes should be designed based on the main habitat distribution in the mapping region, according to previous research or RS images. For mapping accuracy, we recommend the sample size of each type to be more than 10.
Based on the simplified field survey procedures, we conducted extensive field sampling in DzB, which is located in northwestern China and covers about a 200,000 km 2 area, about 1.5 times larger than England, from mid-May 2015 to late August 2016 (summers only). With the small amount of rainfall and long hot summer, the vegetation mainly consists of xerophytes such as Haloxylon spp., Calligonum spp., Agriophyllum squarrosum and Erodium oxyrrhynchum. We took between two and four photos within every 1-2 km distance point using a Canon digital camera. Overall 3006 points (506 farmland and 2500 natural vegetation points) with the key information (two plant species for both woody and herb layers) were photographed. Plant species that were difficult to recognize were collected and sent back to the laboratory for further identification. To ensure the accuracy of information of plant species, all the ground photographs were reassessed and corresponding corrections were made. We preferred to assemble the species matrix into community-level entries before model generation due to possible niche overlaps among observed species and the spatial scale of the study areas [30]. According to the RLC of the plants, only the points of natural vegetation were divided into woody (2026 points with >5% RLC of woody or semi-woody plants) and herbaceous plant groups (474 points with <5% RLC of woody or semi-woody plants) before vegetation classification. After that, we classified the plots into 32 (27 shrub and 5 tree layer classes) woody and 17 herbaceous plant groups (Figures S1 and S2) by adopting Ward's minimum variance hierarchical clustering method based on a Mantel correlation test, by which the optimal number of classes for dividing the cluster tree were determined [34]. Based on the Chinese vegetation classification system and the classification results [35], we developed a hierarchical vegetation classification system (HVCS) of DzB, composed of seven vegetation types, 20 subtypes, and 50 associations (including farmland) (Table S1).

NZ: Simulated Survey Data
The second case, NZ, is an island country located southeast of Australia in the southern hemisphere. It has a temperate climate with plentiful rainfall, which makes NZ one of world's biodiversity hotspots [36]. Natural vegetation in NZ is mainly composed of grasslands, tussocks, and complex temperate forest consisting of Pinaceae Lindl., Fagales, giant tree Pteridiaceae, and creepers [37,38].
Because a large part of NZ consists of agricultural types such as croplands and pastures, which are comparatively easy for classification but have less significance in vegetation mapping than natural vegetation types (forests, scrubs, etc.), evaluations on totally randomly chosen points would be less convincing. To tackle this problem, we collected 11,234 vegetation survey plots between 1978 and 2017 from the Global Biodiversity Information Facility (GBIF) [39] and added 1035 randomly chosen points of agricultural areas and glaciers, defined based on the field photographs from EOMF field photo library (http://www.eomf.ou.edu/photos/), by singling out one point every 1 km. Then, the vegetation types were extracted from the latest-available vegetation cover map of NZ (VCMNZ, 2015 updated) [40]. Finally, classes with less than eight occurrences were filtered out, and a total of 41 vegetation classes remained (Table S2). To test the performance of the XGBoost model in a different number of classes, we also predicted a map of habitats (10 classes available from GBIF), including the original field survey types, agricultural areas, and glaciers.

RS Data
RS data used in this study includes features computed on MODIS and Landsat 8 Operational Land Imager (OLI) images, which were preprocessed on GEE. The details are addressed below.
(2) Landsat data. Landsat 8 OLI image collections of tier 1 calibrated top-of-atmosphere (TOA) reflectance in the last 10 years during summer were selected and then filtered for a cloud-free image via GEE [52]. Based on the cloud-free image, spectral variables, textures, color spaces, and moving window statistical variables were computed.
(ii) Textures are widely used to denote the differences in objects with similar reflectance, since they can provide the essential information about spatial orders of color patterns or reflectance intensity of the object [57,58]. In this study, 13 out of the 14 textures (maximal correlation coefficient not included) based on gray level co-occurrence matrix (GLCM) were computed with compute unified device architecture (CUDA) programming [59]. We calculated a set of textures for both the panchromatic band (band 8) and the above spectral variables by setting the window size to 13 × 13 for band 8 and to 11 × 11 for the spectral variables, and by setting shift distance to center to 1. As suggested, the value for every texture in each pixel was calculated as the average in four directions [57]. Then, principle component analysis (PCA) and linear discriminant analysis (LDA) were optionally adopted for dimensionality reduction (DR).
(iv) Moving window statistical variables, including coefficient of variance (CV), standard deviation (Std.), skewness, and kurtosis, can be beneficial for discriminating objects that have special spectral frequency distributions [61,62]. Hence, we calculated moving window CV, Std., skewness, and kurtosis values with a 9 × 9 moving window for each band and spectral variable, with CUDA programming on GPUs.
Because 30 m resolution (15 m resolution for band 8) Landsat 8 images contained roads and buildings irrelevant to vegetation mapping, filtering them out was a necessary step to ensure mapping accuracy. First, all of the calculated features were down-sampled to 120 m resolution via the nearest neighbor method. Then, median filter with size 1320 m (11 pixel width × 120 m/pixel width), Gaussian filter with sigma 3.5 along with mean filter with size 1080 m (9 pixel width × 120 m/pixel width) were employed in sequence. Finally, the resulting images were down-sampled to 1 km resolution by the nearest neighbor method.
(3) Geographic data. Distances (d(x)) to the saline (DzB) and fresh waterbody (DzB and NZ) and sea shoreline (NZ) were computed through the following eikonal equation (Equation (1)) solved by the fast marching method [71], where high-resolution water masks were collected from the JRC Global Surface Water Mapping Layers, v1.0 on GEE [72].
Furthermore, slope, aspect, altitude, and hill-shade were calculated using ArcGIS software (version 10.3, ESRI ® ), based on the Shuttle Radar Topography Mission (SRTM) 30 m digital elevation model (DEM) obtained from the United States Geological Survey (USGS) Earth Resources Observation and Science (EROS) data center [73].
Finally, all the features were resampled to 1 km × 1 km spatial resolution to match the vegetation data. The feature processing procedures are specified in Figure 2.

Modeling and Mapping
Developed from gradient boosting machine (GBM), an ensemble learning model adding new decision trees to complement those already built for better outcomes, XGBoost inherits the merits of GBM and provides state-of-the-art results with higher computing speed under many circumstances. Compared with GBM, XGBoost ameliorates the iterative optimization procedure [21,74]. In addition, several programming skills, such as pre-sorting feature values combined with histogram splitting and parallel and heterogeneous programming also promote classification performance and computing efficiency. The XGBoost model is not only suitable for data mining proposes, but it is also appropriate for image classification tasks such as land cover mapping in RS [21][22][23]75,76].
Bootstrap aggregating (bagging) ensemble learning, an effective ML framework known as subsampling samples (rows) or features (columns) with replacement as training sets and aggregating the trained base models, could avoid the negative effect of interdependency among variables, control overfitting, and increase robustness to noise. The framework was also proven to be efficient in enhancing model performances for various classifiers [77][78][79][80][81], which was also supported by our comparison of evaluations of mapping results via bagging vs. single model (Table S3). Besides, it may reduce the noise and increase the connectivity between patches of mapping results so that prediction would be more representative of natural vegetation patches (Box S1) [27]. Given these advantages, we used XGBoost as the base model of the bagging ensemble framework, then built the XGBoost bagging model as the classifier for vegetation mapping.
To evaluate the vegetation map, we randomly chose 497 or 2045 plots out of the total 3006 or 12269 plots (approximately 1/6) for DzB or NZ, respectively, as the test set, and the remaining, training set, for cross-validation and further evaluations. Distribution maps of train and test points are available in Figure 3. In addition, to prevent loss of information, the points in the test set from those classes with fewer than eight presences in association (DzB) were put back to the training set for class balance (in section 2.2.2) and then removed before model training.

Modeling and Mapping
Developed from gradient boosting machine (GBM), an ensemble learning model adding new decision trees to complement those already built for better outcomes, XGBoost inherits the merits of GBM and provides state-of-the-art results with higher computing speed under many circumstances. Compared with GBM, XGBoost ameliorates the iterative optimization procedure [21,74]. In addition, several programming skills, such as pre-sorting feature values combined with histogram splitting and parallel and heterogeneous programming also promote classification performance and computing efficiency. The XGBoost model is not only suitable for data mining proposes, but it is also appropriate for image classification tasks such as land cover mapping in RS [21][22][23]75,76].
Bootstrap aggregating (bagging) ensemble learning, an effective ML framework known as subsampling samples (rows) or features (columns) with replacement as training sets and aggregating the trained base models, could avoid the negative effect of interdependency among variables, control overfitting, and increase robustness to noise. The framework was also proven to be efficient in enhancing model performances for various classifiers [77][78][79][80][81], which was also supported by our comparison of evaluations of mapping results via bagging vs. single model (Table S3). Besides, it may reduce the noise and increase the connectivity between patches of mapping results so that prediction would be more representative of natural vegetation patches (Box S1) [27]. Given these advantages, we used XGBoost as the base model of the bagging ensemble framework, then built the XGBoost bagging model as the classifier for vegetation mapping.
To evaluate the vegetation map, we randomly chose 497 or 2045 plots out of the total 3006 or 12269 plots (approximately 1/6) for DzB or NZ, respectively, as the test set, and the remaining, training set, for cross-validation and further evaluations. Distribution maps of train and test points are available in Figure 3. In addition, to prevent loss of information, the points in the test set from those classes with fewer than eight presences in association (DzB) were put back to the training set for class balance (in Section 2.2.2) and then removed before model training. Remote Sens. 2019, 11, x FOR PEER REVIEW 9 of 24 Green circles represent train set while white circles or red triangles represent test sets.

Feature Selection
Finding a more discriminating and informative subset from the extensive feature pool is essential for increasing computing efficiency. Generally, the feature selection methods could be categorized as the filter-based method and the wrapper-based method [82]. However, without considering the combined effects of features, the filter-based method could not work quite well if the dataset contains multiple classes (Table S4). Therefore, we adopted a widely used wrapper-based feature selection method-the sequential forward floating selection (SFFS)-as the basic framework for feature selection [83][84][85][86]. SFFS is a bottom up search procedure that adds a remaining feature to the original empty set on every loop, which gives the highest evaluations under the objective function, followed by a backtracking step that excludes the worst features under certain conditions. However, even though it is improved from its predecessors, SFFS still suffers from the nesting effect, because forward inclusion is always unconditional [83,84]. A new improvement on SFFS, the improved forward floating selection (IFFS) method was reported for better solutions, because it incorporates a replacing step into the basic framework after the backtracking step [87]. However, our test on IFFS based on the extensive feature pool showed that when collinearity among features were very high, IFFS still suffered from the nesting effect.
Therefore, in this study, we made a minor improvement on IFFS; namely, revised IFFS, by adding a plus-L-minus-R search into the streamline for noting that non-useful features may be included into the selected set when highly correlated features exist. The plus-L-minus-R search excluded R features after incorporating L features from the feature pool. Its parameters R and L were set to 1 and 5 respectively, because the computing time would increase a lot if L is small, and nonuseful features would be included if L is large. Besides, the replacing step from basic IFFS was optional for substituting either the most or the least important feature. The feature evaluations for plus-L-minus-R search were gathered from XGBoost inner feature importance scores, though feature evaluations of basic IFFS are kappa metrics computed by a stratified 10-fold cross-validation on the training set.
Additionally, several steps were conducted to improve the efficiency of the feature selection process. First, before computing, identical features with Pearson's correlations > 0.98 were removed. Second, after the loop completed, the set with the best evaluation (denoted as bestSET) and its location in the IFFS set sequence were recorded. The sets ahead of the bestSET with local maximum evaluations in the IFFS set sequence were then collected (denoted as maxSETs). Lastly, bestSET and the sets in maxSETs with evaluations >0.95*(evaluation of bestSET) were aggregated as the final feature set. The detailed feature selection procedures are specified in Figure 4.

Feature Selection
Finding a more discriminating and informative subset from the extensive feature pool is essential for increasing computing efficiency. Generally, the feature selection methods could be categorized as the filter-based method and the wrapper-based method [82]. However, without considering the combined effects of features, the filter-based method could not work quite well if the dataset contains multiple classes (Table S4). Therefore, we adopted a widely used wrapper-based feature selection method-the sequential forward floating selection (SFFS)-as the basic framework for feature selection [83][84][85][86]. SFFS is a bottom up search procedure that adds a remaining feature to the original empty set on every loop, which gives the highest evaluations under the objective function, followed by a backtracking step that excludes the worst features under certain conditions. However, even though it is improved from its predecessors, SFFS still suffers from the nesting effect, because forward inclusion is always unconditional [83,84]. A new improvement on SFFS, the improved forward floating selection (IFFS) method was reported for better solutions, because it incorporates a replacing step into the basic framework after the backtracking step [87]. However, our test on IFFS based on the extensive feature pool showed that when collinearity among features were very high, IFFS still suffered from the nesting effect.
Therefore, in this study, we made a minor improvement on IFFS; namely, revised IFFS, by adding a plus-L-minus-R search into the streamline for noting that non-useful features may be included into the selected set when highly correlated features exist. The plus-L-minus-R search excluded R features after incorporating L features from the feature pool. Its parameters R and L were set to 1 and 5 respectively, because the computing time would increase a lot if L is small, and non-useful features would be included if L is large. Besides, the replacing step from basic IFFS was optional for substituting either the most or the least important feature. The feature evaluations for plus-L-minus-R search were gathered from XGBoost inner feature importance scores, though feature evaluations of basic IFFS are kappa metrics computed by a stratified 10-fold cross-validation on the training set.
Additionally, several steps were conducted to improve the efficiency of the feature selection process. First, before computing, identical features with Pearson's correlations > 0.98 were removed. Second, after the loop completed, the set with the best evaluation (denoted as bestSET) and its location in the IFFS set sequence were recorded. The sets ahead of the bestSET with local maximum evaluations in the IFFS set sequence were then collected (denoted as maxSETs). Lastly, bestSET and the sets in maxSETs with evaluations >0.95*(evaluation of bestSET) were aggregated as the final feature set. The detailed feature selection procedures are specified in Figure 4. of 356 features for habitats and vegetation classes (NZ) were selected, respectively. The texture variables played an important role in vegetation mapping, especially for cases with numerous classes. For example, the feature set of associations (DzB) comprised 34 textures, accounting for 76% in total. Also, the distances to water were incorporated in all the cases, indicating their importance in vegetation mapping. In addition, LST and its seasonal changes, which could reflect the energy balance of the land surface, were also included in almost all the cases. To validate the contribution of the feature selection process to the mapping results, we compared the evaluations of results based on the selected set (by the revised IFFS, and the original IFFS) vs. the original feature pool (Table S4). The performances of mapping using the selected variables were not inferior to mapping using the original feature pool, indicating that the revised IFFS successfully selected a subset (~10% of total), useful in discriminating vegetation classes, from the extensive feature pool.

Vegetation Mapping
For vegetation data with HVCS, generally two ways for vegetation mapping are used: first, mapping the lower level classes directly and merging the result with higher level classes and second, mapping higher level classes as base map layer and then splitting the result with lower level classes. Deciding which one to choose always depends on expert knowledge and overall evaluations of maps. In DzB we found that mapping associations directly is less convincing, because some associations from the desert vegetation type with few observations had low accuracies. Thus, for DzB, we first computed a vegetation subtype map as the base map layer, and then split each subtype into association(s) according to HVCS; however, directly mapping was the only choice for NZ.
Before model training, we adopted synthetic minority over-sampling technique (SMOTE) to balance the training set after noting that an extremely imbalanced dataset could cause prediction errors for classes with few presences. The parameters of SMOTE were set to default, because it had optimal evaluations in this study. Then, a multiclass XGBoost bagging model was built on bootstrapped datasets (subsampling both columns and rows) from the training set, with the parameters of the XGBoost (the base model) set to default. The predicted results of each base model were then aggregated by weighted voting using kappa as the metric (Equations 2-3). The probability For example, the feature set of associations (DzB) comprised 34 textures, accounting for 76% in total. Also, the distances to water were incorporated in all the cases, indicating their importance in vegetation mapping. In addition, LST and its seasonal changes, which could reflect the energy balance of the land surface, were also included in almost all the cases.
To validate the contribution of the feature selection process to the mapping results, we compared the evaluations of results based on the selected set (by the revised IFFS, and the original IFFS) vs. the original feature pool (Table S4). The performances of mapping using the selected variables were not inferior to mapping using the original feature pool, indicating that the revised IFFS successfully selected a subset (~10% of total), useful in discriminating vegetation classes, from the extensive feature pool.

Vegetation Mapping
For vegetation data with HVCS, generally two ways for vegetation mapping are used: first, mapping the lower level classes directly and merging the result with higher level classes and second, mapping higher level classes as base map layer and then splitting the result with lower level classes. Deciding which one to choose always depends on expert knowledge and overall evaluations of maps. In DzB we found that mapping associations directly is less convincing, because some associations from the desert vegetation type with few observations had low accuracies. Thus, for DzB, we first computed a vegetation subtype map as the base map layer, and then split each subtype into association(s) according to HVCS; however, directly mapping was the only choice for NZ.
Before model training, we adopted synthetic minority over-sampling technique (SMOTE) to balance the training set after noting that an extremely imbalanced dataset could cause prediction errors for classes with few presences. The parameters of SMOTE were set to default, because it had optimal evaluations in this study. Then, a multiclass XGBoost bagging model was built on bootstrapped datasets (subsampling both columns and rows) from the training set, with the parameters of the XGBoost (the base model) set to default. The predicted results of each base model were then aggregated by weighted voting using kappa as the metric (Equations (2) and (3)). The probability maps of each vegetation class derived from the bagging framework were reserved for the uncertainty calculation (in Section 2.3). As for the parameters of the bagging framework, the optimal rates of 0.7 for both column and row sampling were found by grid search via stratified 10-fold cross-validation on the training set; the runtime (total base model number) was set to 610 because no further improvements on the evaluations or modelling stability were observed as the runtime increased ( Figure S3).
The subtype map of DzB was used as the base map layer for association mapping in which each subtype was divided to corresponding association(s) according to HVCS with small multiclass XGBoost bagging models. SMOTE was also implemented for each subtype before model training. The vegetation type map was obtained from the vegetation subtype map by merging the corresponding classes based on HVCS (Figure 1). Because the observation data of DzB and NZ did not include any urban areas or bodies of water, these land use types were directly extracted from the resampled land use map of China, 2010, for DzB [88] and the land cover database classes version 4.1, for NZ (download website: https://lris.scinfo.org.nz/layer/48423-lcdb-v41-land-cover-database-version-41mainland-new-zealand/). These land use types were not included in the evaluations.
Finally, the results were filtered with a majority filter of 3 × 3 window size to increase the connectivity of each class so that the distribution of vegetation patches would be more lifelike.

Evaluations
To evaluate the vegetation mapping results, metrics including overall accuracy (OA, Equation (2)), Cohen's kappa (kappa, Equations (3) and (4)), and the original confusion matrix were computed based on the predicted and real class of the reserved test points. In addition, to further analyze the mapping uncertainty, we computed the uncertainty of the base map layer via the confusion index (CI, Equation (5) where e, and N in Equations (2) and (3) are number of correct predictions, and number of total observations; p e , k, n r k , and n p k in Equations (3) and (4) are the hypothetical probability of chance agreement, the class predicted, number of observations of class k, and number of predictions of class k, respectively; µ max and µ max−1 in Equation (5) are the largest and the second largest probability values of each pixel of all possible classes generated from the bagging framework.

Results
Following the above workflow, we produced the map of vegetation types (seven classes), subtypes   An uncertainty analysis showed that 83.54% of the mapping area in DzB (according to the subtype map) was in complete agreement (CI ≤ 0.66), and increased to 94.96% if partial agreement (0.66 < CI ≤ 0.90) was included. For NZ, 89.71% and 71.20% of the mapping area showed complete agreement, which increased to 96.88% and 89.70% when partial agreement was included ( Figure A1). We found no significant relationships between distance to sampling points and uncertainty, indicating that uncertainty variations may largely stem from the predictive power of features and the number of observations in each class.
We also recorded the time cost of computation of model training, testing, and prediction, as presented in Table S5. The results showed that XGBoost almost cost less time than any other ML algorithm in prediction. However, in the process of model training and testing, XGBoost did not necessarily perform better than the others. An uncertainty analysis showed that 83.54% of the mapping area in DzB (according to the subtype map) was in complete agreement (CI ≤ 0.66), and increased to 94.96% if partial agreement (0.66 < CI ≤ 0.90) was included. For NZ, 89.71% and 71.20% of the mapping area showed complete agreement, which increased to 96.88% and 89.70% when partial agreement was included ( Figure A1). We found no significant relationships between distance to sampling points and uncertainty, indicating that uncertainty variations may largely stem from the predictive power of features and the number of observations in each class.
We also recorded the time cost of computation of model training, testing, and prediction, as presented in Table S5. The results showed that XGBoost almost cost less time than any other ML algorithm in prediction. However, in the process of model training and testing, XGBoost did not necessarily perform better than the others.

Discussion
In this study, we produced high-resolution (~1 km 2 ) multiclass (>40 classes) vegetation maps in two regions (DzB and NZ) with relatively high accuracy based on a combination of extensive features and field survey data, using XGBoost as the ML classifier. As the confusion matrices of the two cases illustrated (Figure A2), the vegetation classes of DzB and NZ were effectively separated by our method. Because DzB and NZ consist of most of the terrestrial habitat types worldwide, it could be reasonable to say that the method used in this study has potential to be used in vegetation mapping in various terrestrial ecosystems.
To further assess our mapping results, we evaluated the climate change initiative (CCI) land cover map produced by the European Space Agency (ESA) [10], MODIS vegetation continuous fields (VCF) produced by the National Aeronautics and Space Administration (NASA) [11], and remap (RS online land cover mapping pipeline) [91], based on the test set. The general mapping procedures for these projects mainly depend on the surface spectral reflectance in RS imagery and limited

Discussion
In this study, we produced high-resolution (~1 km 2 ) multiclass (>40 classes) vegetation maps in two regions (DzB and NZ) with relatively high accuracy based on a combination of extensive features and field survey data, using XGBoost as the ML classifier. As the confusion matrices of the two cases illustrated ( Figure A2), the vegetation classes of DzB and NZ were effectively separated by our method. Because DzB and NZ consist of most of the terrestrial habitat types worldwide, it could be reasonable to say that the method used in this study has potential to be used in vegetation mapping in various terrestrial ecosystems.
To further assess our mapping results, we evaluated the climate change initiative (CCI) land cover map produced by the European Space Agency (ESA) [10], MODIS vegetation continuous fields (VCF) produced by the National Aeronautics and Space Administration (NASA) [11], and remap (RS online land cover mapping pipeline) [91], based on the test set. The general mapping procedures for these projects mainly depend on the surface spectral reflectance in RS imagery and limited environmental data (remap) [9,92]. We found that our approach was more accurate than the projects above (DzB vegetation types: OA CCI = 0.760, kappa CCI = 0.334; OA VCF = 0.609, kappa VCF = 0.334 and OA remap = 0.156, kappa remap = 0.191; OA this study = 0.907, kappa this study = 0.821; and for NZ habitats: OA CCI = 0.381, OA VCF = 0.771, kappa VCF = 0.473, OA remap = 0.905, kappa remap = 0.817, OA this study = 0.946, kappa this study = 0.830). The evaluations of the compared projects were even lower when the number of classes increased (Tables S6-S8), indicating that our method outperformed them although they are effective in large scale mapping with a limited number of classes (< 20 classes) [9,92].
As the growth of plants is affected by both biotic and abiotic factors, a combination of environmental data and RS features could contribute to vegetation mapping, especially where the spectral differences of some classes are subtle or the effect of the microenvironment is significant. Here, we compared the mapping results with and without environmental data. For mapping without environmental data, the climatic, edaphic, and geographic data, as well as several MODIS products or indices (LST, Snow Index, TVDI, BCI, SMI, and SRSI) were removed from the feature pool. The feature selection was re-conducted, followed by training multiclass XGBoost bagging models directly on vegetation subtypes  Table 1). The differences in evaluation were subtle (~0.05 OA) when the number of classes was small, but the gaps were large (~0.10 OA) as the number of classes increased. Meanwhile, the differences in sensitivity (a metric, true positive/condition positive) showed that natural vegetation types, e.g., deserts (−0.08 on average, associations of DzB), grasslands or meadows (−0.134 on average, associations of DzB), and forests and scrubs (−0.170 on average, vegetation classes of NZ), presented an obvious decrease in classification accuracy, while the sensitivities of artificial types like agricultural land or glaciers were almost the same or even increased. Also, without the information on the bodies of water, the sensitivities of wetlands suffered great losses of −0.333 in DzB and −0.666 in NZ. Thus, for cases with numerous vegetation classes, the mapping results could be more accurate if incorporating environmental data. To compare the performance of XGBoost with other prevailing benchmark ML classifiers, we selected extremely randomized trees (extra-trees, ET), GBM, random forest (RF), quadratic discriminative analysis (QDA), SVM, and kNN as comparing models in this study. All the classifiers were programmed with scikit-learn (version 0.19.1), by setting all parameters to default [89]. The feature selection was conducted individually for each model on the extensive feature pool, with the plus-L-minus-R search removed if the inner feature importance score was unavailable (kNN, SVM, QDA). Then, we built multiclass predicting models using a bagging framework, the same as XGBoost, with the training set, and tested with the test set directly on vegetation types (DzB), vegetation subtypes (DzB), associations (DzB), habitats (NZ), and vegetation classes (NZ). The calculations were repeated 16 times, by setting the base model number to 72 in the bagging framework, considering the computing limitations. We found that the DT-based methods, i.e., XGBoost, ET, RF, and GBM could provide promising results (~10 classes,~0.9 OA;~40 classes,~0.7 OA), but kNN, SVM, and QDA could only perform fairly well when the number of classes was small (Figure 7). These results evinced the extendibility of DT-based algorithms in vegetation mapping. To further compare the DT-based methods, we re-conducted the mapping processes using the bagging framework, and setting the base model number to 610. Then, we computed the uncertainty according to Equation (5). The comparison of mean uncertainty values indicated that XGBoost performed the best in all the cases ( Table 2). In addition, the percentages of area of complete agreement or little agreement were the highest or lowest when using XGBoost as the base model ( Figure S4), highlighting the accuracy and robustness of the mapping results of XGBoost among the DT-based methods. Furthermore, a comparison of the computing time showed that the predicting time of XGBoost (using GPUs) was less than other compared methods in most cases. Although during model training and testing XGBoost was not the most time-saving ML algorithm, as prediction is crucial and the most time-consuming part in the mapping processes when the target resolution is high, it is reasonable to say that XGBoost is efficient in vegetation mapping. Compared with its predecessor GBM, XGBoost improved a lot in model performances, especially under conditions with numerous classes. Further evidence could be found in the realm of ecology. For example, Sandino, Pegg, Gonzalez and Smith [24] found that with reliable field data, XGBoost can distinguish forests infected with pathogens as high as OA = 97.32%. Dong, Xu, Wang and Pu [23] also successfully mapped submerged plants using XGBoost. Therefore, considering the advantages in programming and computing, XGBoost is a suitable method for vegetation mapping.
of mean uncertainty values indicated that XGBoost performed the best in all the cases ( Table 2). In addition, the percentages of area of complete agreement or little agreement were the highest or lowest when using XGBoost as the base model ( Figure S4), highlighting the accuracy and robustness of the mapping results of XGBoost among the DT-based methods. Furthermore, a comparison of the computing time showed that the predicting time of XGBoost (using GPUs) was less than other compared methods in most cases. Although during model training and testing XGBoost was not the most time-saving ML algorithm, as prediction is crucial and the most time-consuming part in the mapping processes when the target resolution is high, it is reasonable to say that XGBoost is efficient in vegetation mapping. Compared with its predecessor GBM, XGBoost improved a lot in model performances, especially under conditions with numerous classes. Further evidence could be found in the realm of ecology. For example, Sandino, Pegg, Gonzalez and Smith [24] found that with reliable field data, XGBoost can distinguish forests infected with pathogens as high as OA = 97.32%. Dong, Xu, Wang and Pu [23] also successfully mapped submerged plants using XGBoost. Therefore, considering the advantages in programming and computing, XGBoost is a suitable method for vegetation mapping.    As the plant community is characterized by its dominant species, the simplified field survey procedures we advocated could be an alternative solution to the second problem: the paucity of vegetation field survey data due to the costly, labor-intensive and time-consuming fieldwork. With targeted information on the main species aided by carefully taken ground photographs, the simplified field survey could successfully collect the information needed for vegetation mapping with reduced labor intensity, which was proven in our experiment where RLCs from the simplified method were almost the same as those from a standard investigation (R 2 = 0.97, p < 0.01) ( Figure S5).
By far, the studies on vegetation mapping showed no highly accurate results with numerous classes like ours [93]. By comparison, our method performed better, partly because of the extensive features we computed. Interdependency between features may be of potential concern for some mapping scientists. However, first, with an effective feature selection method, highly redundant or irrelevant features could be removed; second, due to improvements on the algorithm level (including the bootstrapping design of the bagging framework to control overfitting), our method could eliminate the possible negative effect of collinearity [21,77,80]. Some features may be pointless to ecological explanation, but the interpretability of variables should be prioritized in explanation-oriented research [94], and for a mapping perspective, more attention should be paid to the predictive power and the generality of the proposed methods instead; in other words, evaluation standards should be question-oriented.
Although outstanding in performance, our workflow still has room for further improvements. First, the simplified field survey might not be feasible in tropical and subtropical regions where the communities are more diverse and complex with fewer dominant species, defining many of which is quite hard if not impossible. Therefore, the accuracy of species information and RLCs derived from ground photos could not be guaranteed. In light of this, we suggest to use the simplified field survey procedures in temperate and polar zones. Also, the RLCs derived from ground photos might contain bias without careful treatment of plant overlaps. Second, the feature pool could be more extensive. By incorporating data from various satellites or ground monitoring sites, more details about the spectral characteristics of vegetation or environmental conditions could be unveiled, that is beneficial to enhancing performance. A change of phenology patterns or temporal features could contribute to discriminating vegetation classes [26,95]. To detect accurately, RS reflectance data with a shorter cadence (~8 day or less) should be employed, on which frequency analysis methods (e.g., Fourier transform) could reveal the patterns. Third, determining the scale of a vegetation map is still problematic. Since the time of using RS techniques in vegetation mapping, the mismatch of scales between field survey data and RS data is still an unsolved problem. Generally, the scale of a large part of RS or environmental data (>500 m) is far beyond the scale of field survey (1-100 m), because one can only accurately identify plant species within~10 m boundary. The basic logic of field surveying in vegetation mapping is to choose one point within a certain region as representative. However, the process could not be bias-free in the sight from an adult height. In ML fashion vegetation mapping, the scale should be accordant with RS data. With the development of RS technology, ever-increasing satellite data with higher resolution (<20 m), indeed close to the scale of field survey, have been generated, like reflectance data from Sentinels, GaoFens, and WorldViews. Thus, a robust fusion method of low-resolution and high-resolution data should be considered in feature computing. More object-based methods and convolutional neural networks (CNN)-based feature extraction methods, as approaches to combining different data, should be addressed [96,97]. Also, using an unmanned aerial vehicle (UAV) may be an ideal solution to this problem [76]. With the sight from a higher elevation, photos from UAVs could provide information with the scale between traditional field surveys and RS, which may be helpful to ensure the field survey data being more accurate and representative. In this sense, the necessity of data fusion methods in feature computing should be more emphasized. Lastly, although the workflow was tested on two different cases, the extendibility may still need further assessment through more cases. However, due to the lack of sound vegetation inventory data, testing the method with more cases is quite hard at present. Besides, as our cases included most of the terrestrial habitat types, it is safe to say that our method shed some light on future vegetation monitoring.
As supported by various studies in monitoring land use, habitat, or vegetation cover changes, etc., the ground photography-aided methods have broader application prospects [98][99][100]. In recent years, citizen science (or crowd sourcing) is becoming an alternative data source for plant phenology, as well as for wild life observation and disease monitoring [101]. Numerous projects or monitoring platforms have been launched or built where volunteers all over the world contribute their ground photos useful to researchers, such as PhenoCam [102], Penguin Watch [102], eBird [103], Season Spotter [104], and Field Photo Library by SciStarter [9]. With the workflow in this study, it is possible to launch and build timely vegetation monitoring programs in which volunteers upload ground photographs with geographical references and identify the main species with the guidance of experts. In addition, with an efficient ML algorithm, the platform could automatically map the vegetation in high resolution with relatively high accuracy based on extensive features. Indeed, increasing demands on computing power, especially when dealing with raster imagery of a large spatial extent, may be a potential concern. Fortunately, with emerging advancements in computational technology (e.g., super computers and heterogeneous computing devices) and revolutionary cloud computing platforms like GEE, ML model training and predicting could be accelerated [14,105]. Thus, with the advanced technologies, relatively high accuracy, and potential extendibility, building high-resolution vegetation monitoring systems using the workflow we proposed in various spatial scales (even worldwide) matching various needs could be a possible and important path we may consider in the next step.

Conclusions
In this study, we produced high-resolution multiclass vegetation maps using extensive features and vegetation data from simplified field surveys or simulation through XGBoost models in the cases containing most of the habitats of terrestrial ecosystems. The overall accuracies of DzB (0.907, for vegetation types; 0.801, for vegetation subtypes; 0.748, for associations) and NZ (0.946, for habitats; 0.703, for vegetation classes) were higher than many of the proposed vegetation mapping projects.
The findings indicated that our approach has potential in producing high accuracy vegetation mapping in various terrestrial ecosystems. In addition, it opened a possibility of building a timely vegetation monitoring platform with the aid of citizen science programs, Google Earth Engine, and advanced computing technologies.  Table S1 (DzB) and S2 (NZ) in Supplementary Materials.