Mapping Urban Land Cover of a Large Area Using Multiple Sensors Multiple Features

: Concerning the strengths and limitations of multispectral and airborne LiDAR data, the fusion of such datasets can compensate for the weakness of each other. This work have investigated the integration of multispectral and airborne LiDAR data for the land cover mapping of large urban area. Different LiDAR-derived features are involoved, including height, intensity, and multiple-return features. However, there is limited knowledge relating to the integration of multispectral and LiDAR data including three feature types for the classiﬁcation task. Furthermore, a little contribution has been devoted to the relative importance of input features and the impact on the classiﬁcation uncertainty by using multispectral and LiDAR. The key goal of this study is to explore the potenial improvement by using both multispectral and LiDAR data and to evaluate the importance and uncertainty of input features. Experimental results revealed that using the LiDAR-derived height features produced the lowest classiﬁcation accuracy (83.17%). The addition of intensity information increased the map accuracy by 3.92 percentage points. The accuracy was further improved to 87.69% with the addition multiple-return features. A SPOT-5 image produced an overall classiﬁcation accuracy of 86.51%. Combining spectral and spatial features increased the map accuracy by 6.03 percentage points. The best result (94.59%) was obtained by the combination of SPOT-5 and LiDAR data using all available input variables. Analysis of feature relevance demonstrated that the normalized digital surface model (nDSM) was the most beneﬁcial feature in the classiﬁcation of land cover. LiDAR-derived height features were more conducive to the classiﬁcation of urban area as compared to LiDAR-derived intensity and multiple-return features. Selecting only 10 most important features can result in higher overall classiﬁcation accuracy than all scenarios of input variables except the feature of entry scenario using all available input features. The variable importance varied a very large extent in the light of feature importance per land cover class. Results of classiﬁcation uncertainty suggested that feature combination can tend to decrease classiﬁcation uncertainty for different land cover classes, but there is no “one-feature-combination-ﬁts-all” solution. The values of classiﬁcation uncertainty exhibited signiﬁcant differences between the land cover classes, and extremely low uncertainties were revealed for the water class. However, it should be noted that using all input variables resulted in relatively lower classiﬁcation uncertainty values for most of the classes when compared to other input features scenarios.


Introduction
Detailed knowledge of the land cover types and their aerial distribution are essential components for the management and conservation of the land resource and are of critical importance to a series of studies such as climate change assessment and policy purpose [1][2][3].
In recent decades, satellite remote sensing has exhibited its ability to achieve the land cover information with different temporal and spatial scales in the urban area. A multispectral satellite image can collect spectral information of land surfaces, and supply extra advantages to discriminate differences between urban land cover classes [4][5][6][7]. Even though many studies have successfully employed multispectral data for the classification of urban land cover, classification accuracy is more likely to be lower using spectral signature alone in the urban environment as compared to other environments such as forest environment. This is due to the fact that the urban environment possesses larger spectral and spatial heterogeneity of surface materials and the more complex pattern of land use [8]. In addition to devoting attention to the improving classification techniques, the development of input variables can be treated as an alternative way to improve the classification accuracy of a land cover map [7,9,10].
Aside from the spectral information, spatial features concerning geometric and textural, etc., have been incorporated into the classification of land cover. Many previous studies showed that the addition of spatial features had been found to be valuable for improving the performance [10][11][12][13]. In particular, spatial features derived from Attribute Profiles (APs) have attracted more attention in the classification due to the capabilities for providing the complementary information of spectral features [14,15]. APs is an extension of Morphological Profiles (MPs), aiming to overcome the limitation of MPs that is fit for providing multilevel variability of structures in an image. Recent studies performed the fusion of spectral and spatial features with extended APs, and pointed out its effectiveness [15,16]. While the classification result using the union of spectral and spatial features can present a good representation of urban land cover, some urban land cover types with similar materials prove difficult to be identified by using multispectral image alone. One solution could be to include an additional third dimension into the classification [9].
The availability of the Light Detection and Ranging (LiDAR) system succeeds to map vertical structure of surface objects. The LiDAR systems emit laser pulses using the wavelength in near infrared and record the returned laser pulse signals after backscattered from the targets [17][18][19]. The LiDAR system plays an increasingly vital role in the urban land cover mapping due to its power to acquire the vertical structure of surface objects with high positional accuracy. Different LiDAR-derived features can be extracted from LiDAR data, including LiDAR-derived height, intensity and multiple-return features [18,20,21]. LiDAR-derived features can describe the 3D topographic features of the earth surface, and a range of LiDAR-derived height features such as normalized DSM (nDSM) and height variation, has proved its help in improving map accuracy [18,20,22,23]. Intensity data as the radiometric component of LiDAR data, which is the peak backscattered laser energy from the illuminated object, can provide additional information for the land cover classification [24,25]. Song et al. [26] initially investigated the possibility of intensity data as the input features for the urban land cover mapping, and concluded that intensity data could be conducive to the land cover classification. In addition, a small number of researchers investigated the contribution of multiple-return features to the land cover classification [27,28]. Charaniya et al. [27] adopted the difference between the first and last return as the auxiliary feature, leading to an accuracy improvement of 5% to 6% of roads and buildings classes. However, it has been shown that classification accuracy derived from using LiDAR data alone is limited owing to the lack of spectral information.
The combination of multispectral and LiDAR data can compensate for the shortcomings of each other, and generate better classification results than those obtained from using an individual sensor, considering the merits and limitations of multispectral and LiDAR data. Despite the fact that many studies have dedicated to combining multispectral and LiDAR data for improving classification performance, there are still some problems worth our deep concern. Firstly, the valuable information acquired from LiDAR data involves in elevation, to the author's knowledge, intensity and multiple-return features, whereas most of the research was based on one or two feature types derived from LiDAR data for classification [18,21,22,[29][30][31][32][33][34]. Therefore, there is limited knowledge concerning the integration of three feature types (i.e., height, intensity and multiple-return) acquired from airborne LiDAR and multispectral for the classification task. Secondly, while some LiDAR-dervied features such as mean absolute deviation (MAD) from median height based on height or intensity information have been increasingly used for the classification of tree species, only a few studies have used such features to aid in the classification of urban land cover [35]. APs were usually used to get the spatial information from the high-resolution images, whereas there is a lack of research dedicated to extracting spatial information from SPOT-5 image, and exploring its effects on the classification accuracy especially for the urban area. Lastly, another important aspect that should be noticed is that classification uncertainty as an additional accuracy measure can be employed to evaluate the spatial variation of the classification performances [36,37]. However, there is very little research that has explored the impacts of the integration of multispectral and LiDAR datasets on the classification uncertainty. This study is aimed at bridging these gaps.
In addition to evaluating the classification accuracy, the contribution of each feature to the classification accuracy was also explored by an assessment of the relative importance of all input features for the land cover classification in this study. The key blueobjectives of our study are presented as follows: (i) to investigate how much classification accuracy can be improved by integrating different input features provided by multispectral and LiDAR data; (ii) to quantify the relative importance of all input variables and explore the contribution of each feature to the classification accuracy; (iii) to assess the influence of different input features on the classification uncertainty.
Section 2 describes the study area and presents an overview of datasets as well as the preprocessing. Section 3 reports the methodological details, including feature extraction, classification algorithm, accuracy assessment in conjunction with classification uncertainty. The detailed experimental results are summarized in Section 4. Section 5 provides the discussion and summarizes the paper with remarks and future lines of the research.

Study Area
In this work, the study area is located in the central part of the city of Nanjing, China (Figure 1), which is situated on the south bank of the lower reaches of the Yangtze River. It extends approximately 150 km 2 (118 • 42 28 E-118 • 54 29 E, 32 • 2 40 -32 • 7 7 N), with altitude ranging between −7 m and 447 m above mean sea level. The climate of this area belongs to a humid north-subtropical climate with well-defined seasons [38]. The topography is characterized by mountains, hills, plains, and rivers. Rainfall averages 1033 mm per year, occurring mainly in summer. There is a variety of land cover classes in this area, including bare soil, buildings, cropland, grass and woodland (tree and shrub), which makes the study area as a representative test of land cover classification performance for an urban area.

LiDAR Data
Airborne laser scanning data was acquired by Optech ALTM Gemini instrument on 21 April 2009. The wavelength and frequency of the laser pulse were 1.06 µm and 167 kHz, respectively. The returned laser pulse signals after backscattered from the targets were recorded with the mean point density of 4.1 points/m 2 and up to four returns. The dataset contains point cloud data from a series of overlapping flight lines, each with an overlap of 20-30% between adjacent flight lines. LiDAR data provided accurate height data and contained multiple returns per laser pulse and the intensity information, which reflects the surface characteristics. The identified overlap and noisy points were removed from all LiDAR point clouds. Raster layers for both bare earth surface, digital elevation model (DEM), and the first return, digital surface model (DSM), were generated from a triangular irregular network (TIN) of the LiDAR data points with the pixel size of 10 m. Additionally, by subtracting the DSM with DEM, normalized DSM (nDSM) was created, which represents the height of the above-ground surface features.

SPOT-5 Image
A SPOT-5 multispectral image was acquired over the study area on 1 January 2010. The SPOT-5 data set was composed of shortwave infrared (SWIR) band (1.58-1.75 µm) with 20m spatial resolution, and three bands with the spatial resolution of 10m covering green (0.50-0.59 µm), red (0.61-0.68 µm), and near-infrared (0.78-0.89 µm). It was orthorectified and resampled to a 10 m pixel size using the software ENVI.
The details of ground reference data is shown in Table 1. We have collected them by using the visual interpretation of the orthorectified SPOT-5, aerial photograph and high-spatial-resolution images from Google Earth (http://earth.google.com/).

Spatial Features
Attribute Profiles (APs), an extension of the Morphological profiles (MPs), were employed to extract spatial features from the SPOT-5 image. APs is a multi-level decomposition of an image with a sequence of morphological attribute filters, which is well suited to modeling the geometrical characteristic instead of the size of the objects [15,39,40]. APs can be formulated as a concatenation of n morphological attribute thinning (δ C ) and n attribute thickening (φ C ), obtained by processing the image I according to a criterion T: Different spatial information, belonging to features present in the nDSM data, can be obtained by APs by many different types of attributes and criterions considered. In this work, two attributes are exploited: area (a) of the regions and standard deviation (s) of the pixels' grey-level values in the regions. The area can extract information on the scale of the objects, while the standard deviation is associated with the homogeneity of grey-level values of the pixels. In terms of each attribute, the rational values of λ should be selected to initialize each APs. To solve this problem, an automatic scheme is introduced [41]. As far as the λ s is concerned, it is initialized in a way that involves many deviations in the SPOT-5 image data, which can be expressed as follows: where σ min , σ max and ν s are separately assigned to 2.5, 20.5 and 6, which results in thickening and thinning operations. The spatial resolution of the SPOT-5 image ought to be considered in adjusting λ a with respect to area attribute, which is mathematically formulated as: where a min and a max are initialized with respective values of 1 and 22 with a step size ν a of 7, and r represents the pixel size of the remote sensing data. Here, all the bands except SWIR band provided by SPOT-5 image were employed to extract spatial features through the APs. As a consequence, a certain number of thickening and thinning operations are acquired for the area attribute.

LiDAR-Derived Features
Given that LiDAR and multispectral can provide complementary information, the LiDAR-based features were extracted to classify land cover except spectral features. The LiDAR feature vector comprises multi-return and intensity LiDAR features, where multi-return LiDAR features can be separate into height-based, return-based LiDAR features. The resulting feature vector f LiDAR can be formulated as follow: (

1) Height-based Features
Height information in LiDAR data has demonstrated an importance for the precise description of vertical structure [18,35]. In this regard, height-based LiDAR features were extracted based on 3D points heights within each pixel of 10 m spatial resolution.
(2) Intensity-based Features Given the high separability of spectral reflectance among different materials in the LiDAR sensor's spectral range (i.e., near-infrared spectral region), intensity, the radiometric component of LiDAR data, can be added as an another feature useful for classifying land cover [26,42]. Intensity variables used in this study were presented: mean, mode, standard deviation, variance, CV, skewness, kurtosis, AAD, L-moments (l1, l2), L-moment CV, L-moment skewness, L-moment kurtosis, and percentile values as with the Height-based features.

(3) Multiple-return Features
Based on the geometry of illuminated surfaces, several types of returns could be achieved in terms of a single pulse emission. However, there is a limited amount of research that has shown the potential use of multiple returns for land cover classification [43]. In this study, the following multiple-return variables were adopted: • abovemean: percentage first returns above mean. • abovemode: percentage first returns above mode. • allabovemean: (all returns above mean height)/(total returns). • allabovemode: (all returns above height mode)/(total returns). • afabovemean: (all returns above mean height)/(total first returns). • afafbovemode: (all returns above mode height)/(total first returns).
Moreover, it should be noted that all the datasets were co-georeferenced and resampled to a 10 m pixel size, which ensures the consistency of corresponding spatial position.
Different feature sets can contribute to improving classification accuracy in a variety of ways. Feature combination can integrate the advantages of different feature sets and has proved to be effective in many projects of urban land cover classification. To gain a detailed understanding of what level of classification result could be obtained by using different scenarios of input features using Random Forest classifier, seven different scenarios were employed for the image classification experiments, as shown in Table 2.

Algorithm Principle
Random Forests is a popular ensemble learning approach that has been proved to enhance classification performance significantly and robust to the noise [44,45]. It must be pointed out that Random Forest has been successfully used for land cover mapping using multisource remote sensing data [46]. Random Forests is an ensemble of many classification and regression trees (CARTs) [47]. In training, L CARTs are grown, and each tree is created as follows: (1) bootstrapped samples of the original training set are generated; (2) the tree with no pruning is grown by using the randomly selected feature at each code to do the split. During the classification phase, a new pixel is put down each CART tree, and the output is determined by giving a majority solution over all the trees. As aforementioned, two parameters are needed to be carefully determined: the number of trees (n tree ) and the number of features used for the best split at each node (m try ). In this work, n tree is set to be the default value of 500 [46]. m try is fixed to the square root of the number of input features.

Feature Importance
Random Forests can generate a quantification of the relative importance of input features, which is of great value for land cover classification using multi-source variables.
The importance of input feature X i can be described as follows. For each tree t of the forest, bootstrapped samples of the training set are applied to the model training, about one-third of the remaining sample set, refer to an out-of-bag (OOB) sample, is used to assess the model performance. Denote by errOOB t the error on this OOB t samples. errOOB i t is defined as the error, which is computed by using the randomly permuted values of X i in OOB t . The importance of variable of X i can be expressed as: where the sum is over all CARTs t of the Random Forests, and n tree is the amount of CARTs included in the Random Forests.

Accuracy Assessment
Ground reference data was divided into training and validation. 5514 reference points were randomly selected for the training, and the remaining 90% reference data were used for the validation. The number of pixels belonging to training and test data is depicted in Table 1. Classification accuracies were summarized based on confusion matrices and derived accuracy metrics. The accuracy metrics consisted of overall accuracy (OA), class-specific accuracy (CA), User's accuracy (UA) and Producer's accuracy (PA). The UAs and PAs present information about the commission and omission errors in connection with individual classes, respectively, while the OA represents the percentage of correctly classified pixels. We run 30 times to report the averaged classification accuracies for the purposes of avoiding biased evaluation. In addition, the McNemar z-score was performed to quantitatively measure the differences between different classification scenarios.
The McNemar z-score were conducted by the results generated from only one independent run, whose classification result was closest to the averaged accuracy.

Classification Uncertainty
Classification uncertainty make the use of the spatial variation of the classifier performance and can be regarded as an advantageous measure to supplement the statistical accuracy metrics from the confusion matrix [48,49]. It should be noted that Random Forests has been employed to provide uncertainty information in the classification of land cover. A significant output of the RF classifier is a probability vector. It contains the class probabilities associated with a pixel x for all classes under consideration: p x = (p(1), p(2), ..., p(c)), where p(i) represents the probability of a pixel being classified into class i, and c is set to be the total number of land cover categories (seven in this study).
Shannon entropy (H) as a quantitative measure of uncertainty can provide the information contained in the probability vector P x , which has been shown the capable of indicating the classifier performance [50]: The value of entropy H can reach the maximum value 0.85 when all classes have equal probability (p(i) = 1/7), whereas entropy is equal to 0 for a pixel whose maximum probability is 1. The value of uncertainty was scaled to the interval [0, 1] in this study. To acquire the uncertainty values per class, we calculate the median values of entropy H per land cover category.

Classification Using LiDAR Data Alone
First, the classification results of only LiDAR are presented. As shown in Figure 2, the exclusive use of LiDAR-derived height features (Scenario 1 in Table 2) gave rise to the lowest overall classification accuracy 83.17%. The inclusion of intensity information (Scenario 2 in Table 2) increased overall map accuracy to 87.09%, 3.92% higher than Scenario 1. The result of McNemar z-score statistical test suggested the statistically significantly improvement with a 95% confidence level. Another 0.60% improvement was gained with the incorporation of multiple-return features (Scenario 3 in Table 2). In spite of the slight improvement, the result from the McNemar z-score statistical test between Scenario 2 and Scenario 3 indicated that the addition of intensity measures could achieve significantly better classification results. Table 3 displays the accuracies per class using LiDAR data only. It is apparent that using all input features provided by LiDAR data (Scenario 3) produced better or comparable producer's and user's results compared to that of Scenario 1 and Scenario 2. While there are five out of the seven land cover categories were recorded with User's accuracies higher than 80%, and Producer's results higher than 80% were produced for four out of the seven land cover classes. The urban land cover classes can be reclassified into three groups. The first group is comprised of classes with high producer's and user's accuracies, including building land, cropland, water, and woodland. The second group consists of classes with higher user's accuracy but lower producer's accuracy, the grassland class is included. From Table 4, this can be attributable to the fact that grassland was mainly misclassified bare soil and building land. The third group contains the classes with lower producer's and user's results, including building land and road. We can take bare soil as an example to explain this phenomenon. The class bare soil was confused by falsely including pixels from cropland and road, resulting in lower user's accuracy. Bare soil was mainly misclassified as building land and road, thus causing lower producer's accuracy.
The land cover map produced by using all LiDAR-derived features revealed some problems (Figure 3a). Woodland was misclassified as building land over a large area. Bare soil was not well distinguished from other land cover classes and was primarily misclassified as building land.

Classification Based Only on SPOT-5 Image
We explored the potential of spectral and spatial features derived from the SPOT-5 image for classifying urban land cover classes. The potential of spectral information for classification (Scenario 4 in Table 2) was firstly investigated, leading to an overall classification accuracy of 86.51% (Figure 2). In contrast, the overall classification increased by 6.03% when spatial features were concatenated to spectral measures (Scenario 5 in Table 2). The result of the McNemar z-score statistical test indicated Scenario 5 significantly outperformed the result achieved with dataset Scenario 4 at the 95% level.
Averaged producer's and user's accuracies per land cover category with SPOT-5 image only were presented in Table 3. As we can see from the table, producer's and user's accuracies with approximate or higher than 90% were recorded for building land, water, and woodland classes by using spectral features only. Producer's and user's accuracies were substantially increased when combined with spatial information. The increases were over 10% in all classes except water, which has been well discriminated from other land cover classes. The most noticeable increases in user's accuracies were for road (24.32%), cropland (17.21%), bare soil (17.09%), and grassland (11.51%). It is interesting to observe that although road class achieved relatively high user's accuracy, the producer's accuracy was very low (59.18%). What we can see from the Table 5, class road was mainly misclassified as building land. This confusion may be caused by the fact that class road looks spectrally similar to building land, which makes it very difficult to identify these two land cover classes by using features derived from SPOT-5 image only. Figure 3b shows the classification map derived from the integration of spectral and spatial information. Even though most of the classes were well identified, the road class still has the problem on this map. Many of the road sites were misclassified as building land. This may be in part owing to the reason that there are mixed pixels of building land and road in the urban area. Table 5. Example confusion matrix for 7-class urban land cover classification derived from SPOT-5 image and its spatial information using the Random Forests classifier (Scenario 5).

Reference Data
Tot   Table 2 were implemented. It can be seen from Figure 2 that classification accuracy derived from Scenario 6 using all available input features except spatial information was capable of classifying different land cover classes to an overall classification accuracy of 91.96%, which is similar to the classification performance achieved from the best classification using the SPOT-5 image. When the spatial features were combined for classification, all features together provided the maximum ability to separate different land cover classes, resulting in an overall map accuracy of 94.59%, and it is significant at the 95% level as compared to Scenario 6. Table 6 shows the averaged producer's and user's values derived from the combination of SPOT-5 and LiDAR data. When spatial features were employed for classification, the producer's and user's results tended to have significant increases for all of the urban land cover categories except already well-discriminated water class. The most noticeable increases were for bare soil (10.49%) and grassland (9.18%), concerning the producer's accuracy. What stands out in this table is that using all input features together afforded the maximum discriminative power to distinguish land cover types, leading to the highest producer's and user's accuracies for almost all land cover classes. As the Table 7 illustrates, although some of the road sites were falsely labeled as building land, higher user's and producer's accuracies were obtained compared to the other scenarios of input features.
By a visual inspection of the classification map (Figure 3c) gained by using all input variables, there was a better representation of all land cover classes though there are still some pixels belonging to some road class that is misclassified as building land.

Feature Importance for Urban Scenes
Aside from evaluating the influences of different scenarios of input variables on the overall accuracies, an assessment of the relative importance of all input variables was carried out to explore the contribution of each feature to the overall classification accuracy.
By integrating SPOT-5 image with LiDAR data, 102 input variables were contained for each pixel (Scenario 7 in Table 2). To gain insight into the contribution of each input feature, we conduct a feature importance analysis. Figure 4 presented the resulting importance of the 102 variables. From the figure, the nDSM appears the most useful feature in the urban land cover classification. Furthermore, the feature importance scores for the following LiDAR-derived height features were also very high, including variance, cubic mean, 75th percentile value and so on. On the contrary, LiDAR-derived intensity and multiple-return features were found to have little measurable impact on the classification result. Regarding the spectral features derived from a SPOT-5 image, SWIR was the most discriminating. The top 10 most important variable consists of four LiDAR-derived height features (i.e., nDSM, variance, cubic mean, 75th percentiles), SPOT-5 SWIR band and five spatial features.
To evaluate the contribution of each variable to the map accuracy, the number of input variables was subsequently reduced on account of the least importance ranking scores, and the corresponding overall classification accuracy was calculated ( Figure 5). From the figure, it can be seen that the overall map accuracies tended to decrease slightly when the first 92 lowest important features were removed. However, the overall accuracies began to fall off rapidly when the remaining 10 most important features are eliminated one by one.     Figure 6 illustrated the relative importances of all available variables for each land cover class. What we can discover from this figure is that the per-class variable importance varied in a large extent. Regarding the bare soil class (Figure 6a), the most important features involved in height features were nDSM and variance, SPOT-5 SWIR band and some spatial features. However, LiDAR-derived intensity and multiple-return features were not very relevant for the classification of bare soil class. As far as the building land class is concerned (Figure 6b), high values of importances were focused on the elevation and spatial variables. The nDSM belonging to height features exhibited an extremely high value of relevance in terms of the cropland class (Figure 6c). Furthermore, it is worth noting that LiDAR-derived intensity and multiple-return features seemed to be less valuable than other kinds of variables. As for the grassland class (Figure 6d), the features with higher values of importances covered all types of variables except multiple-return features. The feature relevances were more dispersed between height, intensity, spectral and spatial features for road class, which makes the road class more difficult to be distinguished (cf. Table 2). As shown in the (Figure 6f), the SPOT-5 SWIR band played the most important role in discriminating the water class. In addition, some of the height and spatial features also contributed greatly to the classification of the water class due to the higher relative importances. As for the woodland class (Figure 6g), the dominating features concentrated mainly on height and spatial features, among which the most important variables were the spatial features derived from SPOT-5 image.

Classification Uncertainty
As a complementary metric to the accuracy indices derived from the confusion matrix, classification uncertainty was implemented to provide another indicator to evaluate the classification quality within the acquired urban land cover map, which can be conducive to exploring and spatially locating the superiority of Random Forests with greater detail.
To evaluate the uncertainty of the produced classification maps, median values of the class-specific uncertainties were computed by means of H (Table 8). What is remarkable in this table is that the class-specific uncertainties showed obvious differences with respect to different scenarios of input variables. Results indicated that feature combination can tend to decrease classification uncertainty for different land cover classes, but there is no "one-feature-combination-fits-all" solution. In general terms, relatively high classification uncertainty was obtained for the classification results derived from using LiDAR data alone (i.e., Scenario 1-3). Scenario 7 using all input variables resulted in relatively lower classification uncertainty values for most of the classes when compared to other input features scenarios. The values of median uncertainty (H) differed greatly between the land cover classes in the study area.  Considering the uncertainty values of the land cover class water, all scenarios of input features showed low values of Shannon entropy H equal to 0. One more point we can conclude is that it is not always true that lower uncertainty values correspond to better classification accuracies for all land cover classes. For instance, Scenario 4 and 7 generated similarly low classification uncertainties for road class. However, although the classification uncertainty for class road was the lowest when Scenario 5 was used, road class showed very low classification accuracies in Table 8, which means that class road was misclassified but with little doubt about the final result. A similar result was found with respect to the grassland class. Regarding the woodland class, Scenario 5 and 7 all achieved the lowest value of uncertainty (0.03), corresponding to similar and higher class-specific accuracies. As for the bare soil class, the results on the classification uncertainty presented significant differences among the seven different scenarios of input features. The lowest value of classification uncertainty was obtained by using Scenario 5, but Scenario 5 and 7 all achieved similar and the relatively higher classification accuracy for the bare soil.
The frequency distribution of H was calculated for the incorrect and correct predictions as shown in Figure 7, and the mode of these distributions was discussed. As far as the correctly classified pixels are concerned, a high proportion of pixels were assigned to low values within the [0, 0.1] interval, and fewer correct predictions were associated with an uncertainty H above 0.5. This illustrates that a majority of correctly classified pixels were characterized by lower classification uncertainty, meaning that there was little doubt about the final classification result. Two observations are worth noting. First, correct predictions mostly had low uncertainties, independent of the input feature scenarios. Second, combining SPOT-5 and LiDAR gave rise to a higher proportion of low uncertainties in comparison with the single data alternatives, namely, the integration of SPOT-5 and LiDAR data decreased the uncertainty of classification (Figure 7a). A very large proportion of incorrectly classified pixels yielded the uncertainty values lying within the interval [0. 3, 0.9]. This implies that the class decision was uncertain if the mode voted for the incorrect class. Moreover, it was observed that there was a large difference between the seven scenarios of input variables (Figure 7b).
To spatially locate and analyze the merits and deficiencies of the classification results by different input features scenarios, the corresponding classification uncertainty maps were displayed in Figure 8. As the figure shows, distinct patterns in the distribution of classification uncertainty were depicted. It is clear that relatively higher values of Shannon entropy H were observed in the uncertainty map derived from LiDAR data alone, while the uncertainty map derived from Scenario 7 produced lower uncertainty values when compared to the uncertainty maps based on Scenario 3 and Scenario 5, especially for building land. The values of classification uncertainty tended to be higher in the eastern and south-western part of all the classification uncertainty maps, which were located in the peri-urban area. This phenomenon may have resulted from the mixed pixels, containing mixed characteristics from two or more land cover classes. Because of this, it can result in confusion in the process of classification, and assign approximate probabilities to these land cover class, which leads to higher classification uncertainty H.

Discussion
In this work, SPOT-5 and LiDAR data were used to classify different urban land cover classes using Random Forest classifier. We investigated the extent of improvement that various scenarios of input features can bring in classification tasks. Seven feature scenarios were used to incorporate the advantages of different feature sets, which is summarized in Table 2. When using LiDAR data only, height-based features generated the lowest classification accuracy when compared to other scenarios of input features. The addition of intensity information substantially improved the overall classification accuracy. By adding the return-based features, the overall classification was further improved. When using spectral features alone derived from the SPOT-5 image, the second lowest overall classification accuracy was achieved. The overall map accuracy increased by up to approximately 5% when spatial features were added. It should be noted that by combining SPOT-5 and LiDAR data and using all available features, we obtained the maximum power to discriminate different land cover categories, resulting in the best classification result.
By implementing analysis of the relative importance of all input variables, we can conclude that the nDSM from LiDAR-derived height features appears to be the most important feature in the land cover classification; this is similar to the findings of some previous studies [18,43]. In addition, the feature importance scores for the following LiDAR-derived height features were also high, including variance, cubic mean, 75th percentile value and so on, which means that those LiDAR-derived height features are also of great importance to the urban land cover classification. However, experimental results in this study suggested that LiDAR-derived intensity and return information contributed less to the increment of overall classification accuracy. The SPOT-5 SWIR band is the most beneficial band for the spectral information. On the other side, many spatial features also achieve high feature importance scores. The different input features have different contributions to the overall map accuracy ( Figure 5). It is not always the case that more input variables ought to generate higher overall classification accuracy. In this study, it has been demonstrated that the overall classification accuracy obtained by using only the 10 most important features is higher than all scenarios of input variables except Scenario 7 using all available input features. Moreover, feature importance per class revealed that the per-class variable importances appeared to be greatly variable.
Classification uncertainty analysis can be used as a tool to evaluate the spatial variation of classification performance, and it has been employed in some previous research [48,50]. However, there are very few studies focusing on the impacts of the fusion of multispectral and LiDAR data on the classification uncertainty. The classification uncertainty analysis described in this study are a first step towards the evaluation of classification performance obtained by the fusion of multispectral and LiDAR data. Results of classification uncertainty analysis revealed that feature combination can tend to reduce the classification uncertainty for different land cover classes, but there is no "one-feature-combination-fits-all" solution. The values of uncertainty (H) showed large differences between the land cover classes. It is interesting that the water class has extremely low classification uncertainty, independent of different scenarios of input features. Using all input variables resulted in lower class-specific uncertainties for most of the land cover types as compared to other scenarios of input features. In addition to lower classification uncertainty, all input features can tend to generate a larger proportion of correctly classified pixels with lower uncertainty, which means that there was little doubt about the final decision if the pixel was allocated with correct land cover class. The spatial uncertainty analysis showed that there were higher values of classification in the peri-urban area owing to the effects of mixed pixels.

Conclusions
In this work, we explored the use of multi-source remote sensing data to map urban land cover, with a particular focus on available input variables provided by airborne LiDAR and SPOT-5 data. The integration of three feature types (i.e., height, intensity and multiple-return) derived from LiDAR data and multispectral image was firstly used to map the urban land cover. In addition to evaluating the feature importance of all input features for the land cover classification, we firstly explored the impacts of the fusion of multispectral and airborne LiDAR data on the classification uncertainty in this study.
The following findings can be concluded according to the experimental results: • We found that the integration of LiDAR and multispectral can provide complementary information and improve the classification performance. The addition of intensity and spatial features are of immense value for improving the classification accuracy. The exclusive use of LiDAR-derived height features produces the land cover map with the lowest map accuracy. The best result is obtained by the combination of SPOT-5 and LiDAR data using all input features. • Analysis of feature relevance indicated that LiDAR-derived height features were more conducive to the classification of urban area when compared to LiDAR-derived intensity and multiple-return features. While the nDSM was the most useful feature in improving the classification performance of urban land cover, the feature importance scores for the following LiDAR-derived height features was also very high, including variance, cubic mean, 75th percentile value and so on. Selecting only the 10 most important features can result in higher overall classification accuracy than all scenarios of input variables, except the input feature scenario using all available input features. As for feature importance per class, the variable importance varied to a very large extent. • Results of classification uncertainty suggested that feature combination can tend to decrease classification uncertainty for different land cover classes, but there is no "one-feature-combinationfits-all" solution. The values of classification uncertainty showed marked differences between the land cover classes. Lower uncertainties were revealed for the water class. Furthermore, using all input variables usually resulted in relatively lower classification uncertainty values for most of the classes when compared to other input features scenarios.
There are some possible developments of this study to be included: (1) to incorporate more beneficial three-dimensional features from LiDAR to further enhance the classification performance; (2) to explore the influence of feature selection on the accuracy and uncertainty of urban land cover classification; (3) to investigate the role of increasing the size of training sets as a possibility to improve the results.
Author Contributions: Jike Chen conceived and designed the experiments; Jike Chen performed the experiments, analyzed the data and wrote the paper. Peijun Du, Junshi Xia, Changshan Wu and Jocelyn Chanussot gave comments, suggestions to the manuscript and checked the writing.