Scattering Feature Set Optimization and Polarimetric SAR Classification Using Object-Oriented RF-SFS Algorithm in Coastal Wetlands

The use of advanced remote sensing methods to monitor coastal wetlands is essential for their conservation and sustainable development. With multiple polarimetric channels, polarimetric synthetic aperture radar (PolSAR) carries more scattering information than regular SAR images and is increasingly employed in land cover classification and information extraction. Polarimetric decomposition is often used to extract scattering information from polarimetric SAR. However, distinguishing all land cover types with only one polarimetric decomposition is difficult in complex ecological environments such as coastal wetlands, so integrating multiple decomposition algorithms is an effective means of land cover classification. More than 20 decompositions were used in this research to extract polarimetric scattering features. Furthermore, a new algorithm combining random forest (RF) with sequential forward selection (SFS) was applied, in which the importance values of all polarimetric features can be evaluated quantitatively and the polarimetric feature set can be optimized. The experiments were conducted in the Jiangsu coastal wetlands in eastern China. This research demonstrated that the classification accuracies were improved relative to regular decision tree methods and that the process of polarimetric scattering feature set optimization was intuitive. Furthermore, the scattering matrix elements and the scattering features derived from the H/α, Yamaguchi3, VanZyl3, and Krogager decompositions were found to be very supportive of land cover identification in the Jiangsu coastal wetlands.


Introduction
Coastal wetlands are important parts of coastal zones, providing major ecological services such as maintaining biodiversity, producing food, regulating the climate, and supporting tourism [1,2]. With the continuous development of coastal wetlands, natural wetland areas are continuously shrinking and landscape patterns are seriously fragmented, which causes many problems, such as a decline in the value of wetland ecosystem services, accelerated extinction of rare and endangered species, and invasion of harmful species [3][4][5]. Therefore, advanced technical methods for regular monitoring and baseline surveys of coastal wetlands are essential for the protection and sustainable development of this ecosystem type [6].
Random forests are obtained by combining decision trees. In theory, classification using multiple trees should outperform a single tree; comparisons between the proposed method and two decision tree methods were performed to test this.

Study Area and Data
The experiment was conducted in the Jiangsu coastal area, which is located along the Yangtze River Basin in eastern China (Figure 1). The typical land cover types in the selected study area include sea, fish pond, irrigable land, reed and alterniflora, Suaeda salsa, rice paddy, river, road, and sand. Reed and alterniflora were merged into one category because the scattering characteristics of these two land covers are very similar and the resolution of the SAR data used is relatively limited, which makes them difficult to distinguish in polarimetric SAR images; they were therefore treated as a mixed land cover type. L-band fully polarimetric ALOS PALSAR data with a resolution of 9.37 m × 3.57 m were used in the experiment. In addition, QuickBird high-resolution optical images and Google Earth images were acquired as auxiliary data for visual interpretation of the results. The polarimetric SAR data were preprocessed with terrain correction, geocoding, multi-looking, and filtering. Multi-look processing of 6:1 (6 looks in azimuth, 1 in range) was applied to improve image readability, and a Refined Lee filter with a 3 × 3 window was employed to reduce speckle noise [35]. A total of 44,961 sample points were collected during fieldwork, of which 28,654 were used to construct the random forest model and 16,307 to validate the classification results. Photographs of typical land covers were taken during the fieldwork (Figure 2). The sample points are summarized in Table 1.

The scattering matrix, the coherency matrix, and the covariance matrix can all be used to represent polarimetric SAR data [36]. Each matrix element carries different polarimetric scattering information, so these elements can be used as features for PolSAR classification. Equations (1)-(3) give the elements of the three representations.
where $S_{hh}$, $S_{hv}$, and $S_{vv}$ are the elements of the polarimetric scattering matrix, $\langle\cdot\rangle$ denotes ensemble (spatial) averaging, $|\cdot|$ denotes the modulus, and $S_{vv}^{*}$ denotes the complex conjugate of $S_{vv}$. The elements of the coherency matrix and of the covariance matrix are obtained from second-order statistics of the scattering matrix elements. Each of the three matrix representations contains all of the information in fully polarimetric SAR data, so using any one group of matrix elements for polarimetric scattering information extraction is equivalent to using either of the others [36]. In this study, the three scattering matrix elements $S_{hh}$, $S_{hv}$, and $S_{vv}$ were used in classification.

Remote Sens. 2020, 12, 407
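For reference, the standard forms of the three representations under the reciprocity assumption are sketched below. They follow the common textbook (Lee and Pottier) notation, with Pauli vector $\mathbf{k}_P$ and lexicographic vector $\mathbf{k}_L$; the layout of the paper's own Equations (1)-(3) may differ. Both $T_3$ and $C_3$ have trace equal to the total power $|S_{hh}|^2 + 2|S_{hv}|^2 + |S_{vv}|^2$.

```latex
S = \begin{bmatrix} S_{hh} & S_{hv} \\ S_{vh} & S_{vv} \end{bmatrix},
\qquad S_{hv} = S_{vh} \ \text{(reciprocity)}

\mathbf{k}_P = \frac{1}{\sqrt{2}}
\begin{bmatrix} S_{hh}+S_{vv} \\ S_{hh}-S_{vv} \\ 2S_{hv} \end{bmatrix},
\qquad T_3 = \left\langle \mathbf{k}_P\,\mathbf{k}_P^{*T} \right\rangle,
\qquad T_{11} = \tfrac{1}{2}\left\langle |S_{hh}+S_{vv}|^2 \right\rangle

\mathbf{k}_L = \begin{bmatrix} S_{hh} \\ \sqrt{2}\,S_{hv} \\ S_{vv} \end{bmatrix},
\qquad C_3 = \left\langle \mathbf{k}_L\,\mathbf{k}_L^{*T} \right\rangle =
\begin{bmatrix}
\langle |S_{hh}|^2 \rangle & \sqrt{2}\langle S_{hh}S_{hv}^{*} \rangle & \langle S_{hh}S_{vv}^{*} \rangle \\
\sqrt{2}\langle S_{hv}S_{hh}^{*} \rangle & 2\langle |S_{hv}|^2 \rangle & \sqrt{2}\langle S_{hv}S_{vv}^{*} \rangle \\
\langle S_{vv}S_{hh}^{*} \rangle & \sqrt{2}\langle S_{vv}S_{hv}^{*} \rangle & \langle |S_{vv}|^2 \rangle
\end{bmatrix}
```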

Polarimetric Decomposition Features
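Polarimetric decomposition is an important approach for extracting scattering information in polarimetric SAR applications [35,37]. As a well-known and basic decomposition, the Pauli decomposition expresses the scattering matrix S in the so-called Pauli basis:

$$S = \begin{bmatrix} S_{hh} & S_{hv} \\ S_{vh} & S_{vv} \end{bmatrix} = \frac{a}{\sqrt{2}}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \frac{b}{\sqrt{2}}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} + \frac{c}{\sqrt{2}}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + \frac{d}{\sqrt{2}}\begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}$$

where the four matrices form the Pauli basis, $S_{hh}$, $S_{hv}$, $S_{vh}$, and $S_{vv}$ are the elements of the polarimetric scattering matrix, and

$$a = \frac{S_{hh} + S_{vv}}{\sqrt{2}}, \qquad b = \frac{S_{hh} - S_{vv}}{\sqrt{2}}, \qquad c = \frac{S_{hv} + S_{vh}}{\sqrt{2}}, \qquad d = \frac{i\,(S_{hv} - S_{vh})}{\sqrt{2}}.$$

Under the reciprocity theorem, $S_{hv} = S_{vh}$, so $d = 0$, $c = \sqrt{2}\,S_{hv}$, and the expansion reduces to the first three matrices. The combinations $S_{hh} + S_{vv}$, $S_{hh} - S_{vv}$, and $2S_{hv}$ are associated with the physical scattering mechanisms of odd-bounce (surface), even-bounce (double-bounce), and volume scattering, respectively [35]. The total power (span) of the polarimetric SAR data is

$$\mathrm{Span} = |S_{hh}|^2 + 2|S_{hv}|^2 + |S_{vv}|^2 = |a|^2 + |b|^2 + |c|^2.$$

Thus the Pauli image, in which $|a|^2$ is assigned to the blue band, $|b|^2$ to the red band, and $|c|^2$ to the green band, represents all the information in PolSAR data and can be used for visual interpretation [36].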
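The Pauli coefficients and the RGB assignment described above can be sketched per pixel as follows. This is a minimal illustration assuming reciprocity ($S_{hv} = S_{vh}$) and single-look complex values; the function names are hypothetical and not part of any SAR toolbox.

```python
import math


def pauli_components(s_hh: complex, s_hv: complex, s_vv: complex):
    """Pauli coefficients (a, b, c), assuming reciprocity (S_hv == S_vh, so d == 0)."""
    a = (s_hh + s_vv) / math.sqrt(2)   # odd-bounce (surface) scattering
    b = (s_hh - s_vv) / math.sqrt(2)   # even-bounce (double-bounce) scattering
    c = math.sqrt(2) * s_hv            # volume scattering
    return a, b, c


def span(s_hh: complex, s_hv: complex, s_vv: complex) -> float:
    """Total power |S_hh|^2 + 2|S_hv|^2 + |S_vv|^2."""
    return abs(s_hh) ** 2 + 2 * abs(s_hv) ** 2 + abs(s_vv) ** 2


def pauli_rgb(s_hh: complex, s_hv: complex, s_vv: complex):
    """(R, G, B) = (|b|^2, |c|^2, |a|^2), the usual Pauli color assignment."""
    a, b, c = pauli_components(s_hh, s_hv, s_vv)
    return abs(b) ** 2, abs(c) ** 2, abs(a) ** 2


# Usage: one hypothetical pixel; the three Pauli intensities sum to the span
pixel = (0.8 + 0.3j, 0.1 - 0.2j, -0.4 + 0.5j)
r, g, b = pauli_rgb(*pixel)
print(round(r + g + b, 6), round(span(*pixel), 6))
```

A pure surface scatterer ($S_{hh} = S_{vv}$, $S_{hv} = 0$) puts all its power in the blue band, while a dihedral ($S_{hh} = -S_{vv}$) puts it all in red.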
In addition to the Pauli decomposition, many other polarimetric decomposition algorithms have been proposed, each with its own inherent limitations. Moreover, polarimetric features obtained from different decomposition algorithms differ in their sensitivity to land covers [35]. Integrating multiple polarimetric decomposition algorithms is therefore an effective way to adapt polarimetric SAR analysis to complex coastal environments [23,24]. In this study, 93 polarimetric scattering features were extracted using the 20 polarimetric decomposition algorithms listed in Table 2, and adding the 3 scattering matrix elements gave a total of 96 polarimetric features. These 96 polarimetric parameters were combined into a multi-band image, on which object-oriented segmentation and classification were performed.

Object-Oriented Method
In recent years, object-oriented methods have been increasingly applied in remote sensing image classification. Unlike pixel-based methods, object-oriented methods consider not only the spectral properties (color) of pixels but also their spatial relationships (shape, texture, size, etc.), and treat groups of homogeneous pixels as objects [25][26][27]. Because the object is the unit of segmentation and classification, the method is efficient and the classification results match the real land covers more closely. As the first step of object-oriented analysis, image segmentation directly affects the final classification performance [28]. Multi-scale segmentation, one of the most commonly used segmentation algorithms in object-oriented methods, is a bottom-up approach that successively merges adjacent pixels or small objects [25]. It delineates objects according to shape and color homogeneity and performs region merging so that the average heterogeneity of the resulting objects is minimized, that is, so that the pixels within each object are as homogeneous as possible. Three parameters are crucial to the result of multi-scale segmentation: the image layer weights, the scale parameter, and the composition of the homogeneity criterion.

Table 2 (excerpt). Polarimetric decompositions and extracted features. Touzi: TSVM_alpha_s, TSVM_alpha_s1, TSVM_alpha_s2, TSVM_alpha_s3, TSVM_tau_m, TSVM_tau_m1, TSVM_tau_m2, TSVM_tau_m3, TSVM_phi_s, TSVM_phi_s1, TSVM_phi_s2, TSVM_phi_s3, TSVM_psi, TSVM_psi1, TSVM_psi2, TSVM_psi3; An_Yang3: An_Yang3_Vol, An_Yang3_Odd, An_Yang3_Dbl; Arii3_ANNED: Arii3_ANNED_Odd, … Decompositions are cited from [36,38–53].

When too many layers participate in segmentation, processing becomes very slow, and the noise in each polarimetric feature layer blurs the segmentation results and fragments the objects.
In this study, if all 96 polarimetric scattering features were used in segmentation, the computational load would be too large and the segmentation result would be unsatisfactory. As noted above, the Pauli image represents all the information contained in a PolSAR image, and the three bands of the Pauli RGB image correspond to the physical scattering mechanisms of real land covers, so it can be used to analyze qualitatively the physical meaning of the land covers in a polarimetric SAR image. Therefore, the Pauli RGB image was adopted for multi-scale segmentation. Moreover, because the three bands of the Pauli image correspond to three scattering mechanisms of equal importance, equal weights were assigned to the bands during segmentation [54].
The scale parameter determines the maximum allowed heterogeneity of the generated image objects [55]. The larger the scale value, the larger the generated image objects, and vice versa. If the scale parameter is too small, the segmented objects will be too fragmented to reflect the shapes of the land covers; if it is too large, details of the land covers will be lost [56,57]. The homogeneity criterion defines the heterogeneity to be minimized and consists of two parts, color (spectrum) and shape, whose weights add up to 1.0. Shape, in turn, is described by smoothness and compactness, whose weights also add up to 1.0. Color and shape, and likewise smoothness and compactness, are therefore complementary: increasing one weight decreases the other. If the shape of a land cover in the image reflects its characteristics well, the shape weight can be increased and the color weight reduced accordingly. Smoothness describes how smooth the border of a segmented object is, and compactness describes how compact the whole object is. In practice, the optimal scale parameter and homogeneity criterion are usually determined by trial and error [23]. A segmentation result is judged by whether each object is large enough while still containing only one land cover category.
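The interplay of layer weights, color/shape weights, compactness/smoothness weights, and the scale parameter can be illustrated with the region-merging cost commonly attributed to Baatz and Schäpe, on which multi-scale (multiresolution) segmentation is based. The sketch below is an assumption-laden illustration, not the software used in the paper: `RegionStats` and the function names are hypothetical, and commercial implementations may differ in detail. Two adjacent objects are merged when the increase in heterogeneity stays below the square of the scale parameter.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RegionStats:
    """Summary of one image object; all fields are illustrative inputs."""
    n: int                   # number of pixels
    std: Sequence[float]     # per-band standard deviation of pixel values
    perimeter: float         # border length l
    bbox_perimeter: float    # perimeter b of the object's bounding box


def color_heterogeneity(r1, r2, rm, band_weights):
    """Weighted increase in size-weighted spectral std when r1 and r2 merge into rm."""
    return sum(
        w * (rm.n * sm - (r1.n * s1 + r2.n * s2))
        for w, s1, s2, sm in zip(band_weights, r1.std, r2.std, rm.std)
    )


def shape_heterogeneity(r1, r2, rm, w_compact):
    """Increase in compactness (l / sqrt(n)) and smoothness (l / b), mixed by w_compact."""
    def cmpct(r):
        return r.perimeter / math.sqrt(r.n)

    def smooth(r):
        return r.perimeter / r.bbox_perimeter

    h_cmpct = rm.n * cmpct(rm) - (r1.n * cmpct(r1) + r2.n * cmpct(r2))
    h_smooth = rm.n * smooth(rm) - (r1.n * smooth(r1) + r2.n * smooth(r2))
    return w_compact * h_cmpct + (1.0 - w_compact) * h_smooth


def merge_cost(r1, r2, rm, band_weights, w_color, w_compact):
    # Color/shape weights sum to 1.0, as do compactness/smoothness weights.
    return (w_color * color_heterogeneity(r1, r2, rm, band_weights)
            + (1.0 - w_color) * shape_heterogeneity(r1, r2, rm, w_compact))


def should_merge(cost, scale):
    # The scale parameter caps the allowed heterogeneity increase.
    return cost < scale ** 2


# Usage: two 2x2 objects on a 3-band Pauli image (equal layer weights) forming a 2x4 object
a = RegionStats(n=4, std=[1.0, 1.0, 1.0], perimeter=8, bbox_perimeter=8)
b = RegionStats(n=4, std=[1.0, 1.0, 1.0], perimeter=8, bbox_perimeter=8)
ab = RegionStats(n=8, std=[2.0, 1.5, 1.2], perimeter=12, bbox_perimeter=12)
cost = merge_cost(a, b, ab, band_weights=[1.0, 1.0, 1.0], w_color=0.9, w_compact=0.5)
print(should_merge(cost, scale=50))  # True
```

Raising the scale value admits merges with larger heterogeneity increases, which is why larger scales yield larger objects.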

Random Forest and Feature Set Optimization
Random forest (RF) is an ensemble machine learning model whose base learner is the decision tree; an RF is obtained by combining many decision trees [30,31]. Specifically, a collection of decision trees $h(X, \theta_k)$, $k = 1, \dots, N$, is generated by randomizing both the variables and the training data, and the trees are built independently of one another. The parameter vectors $\theta_k$ are independent and identically distributed random vectors. Given an input $X$, each decision tree casts a vote for a class: when data are entered into the random forest model, every tree classifies them, and the final result is the class that receives the most votes across all trees.
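The two sources of randomness and the voting rule described above can be sketched as follows. This is a minimal illustration of the ingredients only (the tree-growing step is omitted), with hypothetical function names; the per-split feature count `m_try` is an assumption, since the paper does not state it, and about $\sqrt{96} \approx 10$ is merely a common default for classification.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def bootstrap_indices(n_samples: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw one bootstrap sample (with replacement); return (in-bag, out-of-bag) indices."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    oob = [i for i in range(n_samples) if i not in chosen]  # never drawn: OOB samples
    return in_bag, oob


def random_feature_subset(n_features: int, m_try: int, rng: random.Random) -> List[int]:
    """Features a tree may consider at one split (random subspace)."""
    return sorted(rng.sample(range(n_features), m_try))


def majority_vote(tree_predictions: Sequence[Hashable]) -> Hashable:
    """Forest output: the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Usage with the paper's sample and feature counts (tree growing not shown)
rng = random.Random(42)
in_bag, oob = bootstrap_indices(28654, rng)
split_features = random_feature_subset(96, round(96 ** 0.5), rng)
print(majority_vote(["sea", "fish pond", "sea"]))  # sea
```

On average about a third of the samples fall out of bag for each tree, and these OOB samples are what the importance measure below relies on.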
During random forest training, the importance of each variable can be calculated [32][33][34]. There are two common importance measures: mean decrease impurity (MDI) and mean decrease accuracy (MDA). The former is based on the decrease in Gini impurity at the tree nodes, and the latter on the out-of-bag (OOB) error [58,59]. The MDA method is employed here because it directly measures the effect of each feature on prediction accuracy. The basic idea is to randomly perturb the values of one feature (for example, by permuting that column or adding random noise to it) and observe how much the model accuracy decreases. Perturbing an unimportant feature has little effect on accuracy, whereas perturbing an important feature reduces it considerably. The importance of a feature f in a random forest is calculated as follows:

Step 1. For each decision tree in the random forest, compute the error on its OOB data and record it as errOOB1;
Step 2. Randomly add noise to feature f in all OOB samples, compute the OOB error again, and record it as errOOB2;
Step 3. Assuming there are N trees in the random forest, the importance of feature f is

$$IM = \frac{1}{N}\sum_{t=1}^{N}\left(errOOB2_t - errOOB1_t\right) \qquad (7)$$

where IM is the importance of feature f and the subscript t indexes the trees. The larger the IM value of a feature, the more important it is. In this study, the importance values of the polarimetric features extracted from the 20 decomposition algorithms, together with the scattering matrix elements, were calculated with random forest models, and the features were then sorted in descending order of importance. The sequential forward selection (SFS) algorithm [60] was adopted to optimize the feature set.
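The three steps and Formula (7) can be sketched as follows. This is a minimal illustration, not the authors' code: each tree is assumed to be any callable that maps a feature row to a class label (the hypothetical `Tree` type), the OOB index lists come from each tree's bootstrap draw, and Step 2 is implemented by permuting feature f among the OOB samples, which is Breiman's original form of MDA; adding random noise instead would change only that step.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Tree = Callable[[Row], int]  # hypothetical: a fitted tree exposed as a predict function


def oob_error(tree: Tree, X, y: Sequence[int], oob: Sequence[int]) -> float:
    """Misclassification rate of one tree on its out-of-bag samples."""
    if not oob:
        return 0.0
    wrong = sum(1 for i in oob if tree(X[i]) != y[i])
    return wrong / len(oob)


def mda_importance(trees: Sequence[Tree], oob_sets: Sequence[Sequence[int]],
                   X: Sequence[Row], y: Sequence[int], feature: int,
                   rng: random.Random) -> float:
    """Mean decrease accuracy of one feature: average of errOOB2 - errOOB1 over N trees."""
    total = 0.0
    for tree, oob in zip(trees, oob_sets):
        err1 = oob_error(tree, X, y, oob)                   # Step 1: errOOB1
        shuffled = [X[i][feature] for i in oob]
        rng.shuffle(shuffled)                               # Step 2: perturb feature f
        X_perm = {i: list(X[i]) for i in oob}               # copy only the OOB rows
        for i, v in zip(oob, shuffled):
            X_perm[i][feature] = v
        err2 = oob_error(tree, X_perm, y, oob)              # errOOB2
        total += err2 - err1
    return total / len(trees)                               # Step 3: Formula (7)
```

A feature that the trees never use gets an importance of exactly zero, because perturbing it cannot change any prediction.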
The idea is as follows. Let F be the original feature set and X the current feature set containing n features. For each feature $f_i$ not yet in X (i.e., each remaining feature in $F \setminus X$), the overall classification accuracy is calculated after adding $f_i$ to X, and the feature yielding the highest overall accuracy is added. Repeating this step records an overall accuracy for every subset size, and the subset with the highest overall accuracy is selected as the optimal set; among subsets with equal accuracy, the one with fewer polarimetric parameters is preferred. Because the overall accuracy is evaluated for all subset sizes along the selection path before the best subset is chosen, the procedure does not stop at the first local peak in accuracy, although, as a greedy search, it does not guarantee the globally optimal subset. The process of optimizing the polarimetric scattering feature set with the RF-SFS algorithm is shown in Figure 3.
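A minimal sketch of the greedy SFS loop described above follows. It assumes a hypothetical `evaluate(subset)` callback that trains the classifier on the given features and returns the overall accuracy on held-out samples; in the paper this role is played by the object-oriented random forest, which is not reproduced here.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def sequential_forward_selection(
    features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float, List[Tuple[List[str], float]]]:
    """Greedy SFS; returns (best subset, its accuracy, accuracy path over subset sizes)."""
    selected: List[str] = []
    remaining = list(features)
    history: List[Tuple[List[str], float]] = []
    while remaining:
        # Try every unselected feature and keep the one that maximizes accuracy
        best_f: Optional[str] = None
        best_acc = float("-inf")
        for f in remaining:
            acc = evaluate(selected + [f])
            if acc > best_acc:
                best_f, best_acc = f, acc
        selected.append(best_f)
        remaining.remove(best_f)
        history.append((list(selected), best_acc))
    # Highest accuracy wins; strict '>' keeps the smaller subset on ties
    best_subset, best_score = history[0]
    for subset, acc in history[1:]:
        if acc > best_score:
            best_subset, best_score = subset, acc
    return best_subset, best_score, history


# Usage with a toy scoring function (one feature, "d", hurts accuracy)
weights = {"a": 0.5, "b": 0.3, "c": 0.1, "d": -0.2}
subset, score, path = sequential_forward_selection(
    list(weights), lambda s: sum(weights[f] for f in s))
print(subset)  # ['a', 'b', 'c']
```

Each step costs one model evaluation per remaining feature, so a full pass over 96 features needs on the order of 96 × 97 / 2 evaluations.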

Experimental Flowchart
In this research, the object-oriented RF-SFS algorithm, a feature set optimization and classification method combining the random forest (RF) model with sequential forward selection (SFS), was proposed. The method proceeds as follows:
(1) Filtering and other preprocessing were applied to the original polarimetric SAR data;
(2) The twenty polarimetric decompositions listed in Table 2 were applied to the filtered coherency matrix, yielding 93 polarimetric decomposition features. The three scattering matrix elements S11 ($S_{hh}$), S12 ($S_{hv}$), and S22 ($S_{vv}$) were used as matrix features, giving a total of 96 polarimetric scattering features;
(3) The scattering features obtained in the previous step were combined into a multi-band image, on which object-oriented multi-scale segmentation was performed;
(4) Training samples were randomly selected, using the segmented object as the basic unit;
(5) Formula (7) was used to calculate the importance of all polarimetric scattering features of the training samples;
(6) The features were ranked by importance value;
(7) The sequential forward selection algorithm was used to optimize the feature set, yielding the feature subset with the highest classification accuracy and the fewest features;
(8) The selected optimal polarimetric feature subset was classified with the object-oriented random forest model;
(9) The classification accuracy was calculated using the validation samples.
A flowchart of the proposed methodology is given in Figure 4.
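Steps (3) and (4) treat each segmented object, rather than each pixel, as the sample unit. One common way to obtain object-level features is to average every feature layer over the pixels of each segment; the paper does not state which object statistic was used, so the per-object mean below is an assumption, and the function name and data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def object_mean_features(labels: Sequence[Sequence[int]],
                         layers: Sequence[Sequence[Sequence[float]]]) -> Dict[int, List[float]]:
    """labels[r][c] is a segment id; layers[k][r][c] is feature layer k at pixel (r, c).

    Returns {segment_id: [mean of layer 0, mean of layer 1, ...]}.
    """
    sums = defaultdict(lambda: [0.0] * len(layers))
    counts = defaultdict(int)
    for r, row in enumerate(labels):
        for c, seg in enumerate(row):
            counts[seg] += 1
            for k, layer in enumerate(layers):
                sums[seg][k] += layer[r][c]
    return {seg: [s / counts[seg] for s in sums[seg]] for seg in counts}


# Usage: a 2x2 image with two segments and two feature layers
labels = [[1, 1], [2, 2]]
layers = [[[1.0, 3.0], [5.0, 7.0]],   # e.g. one decomposition feature
          [[0.0, 2.0], [4.0, 4.0]]]   # e.g. another
print(object_mean_features(labels, layers))  # {1: [2.0, 1.0], 2: [6.0, 4.0]}
```

With the 96-layer stack, each object then becomes one 96-dimensional sample for the importance ranking and the random forest.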


Importance Analysis of Polarimetric Features and Feature Set Optimization
The importance values of the 96 polarimetric scattering features were calculated with Formula (7) and are listed in Table 3; Figure 5 displays them ranked in descending order. As stated previously, the larger the IM value of a feature, the more important it is. Table 3 and Figure 5 show the importance of each feature and thus which features matter most for the experimental results. However, different combinations of features produce different results in classification. The next problem, therefore, is to determine which polarimetric features should be selected to obtain the highest classification accuracy.

For the feature set composed of 96 polarimetric features, the RF-SFS algorithm proposed in this study was used to optimize the feature set according to the importance value of each feature. Starting from an empty target feature subset, the feature with the highest remaining importance value was added to the subset at each iteration, the features in the subset were used for classification, and the overall accuracy of the result was calculated; this was repeated until all features had been added and evaluated. Figure 6 shows the relationship between the classification accuracy of each iteration and the number of polarimetric features. For the nine most important features, each additional feature greatly improved the classification accuracy. Beyond nine features, the accuracy changed slowly. With 23 features, the classification accuracy reached its maximum of 87.29%. Adding further features did not change the overall accuracy significantly; with 38 features, the accuracy was even slightly lower, by about 2.5%, than with 23 features. Meanwhile, the computational burden grew as more features participated in the classification. This result indicates that involving too many polarimetric features increases the computational burden, and that information redundancy between features can degrade the classification. Therefore, the classification was based on the first 23 polarimetric features in the importance ranking, which maximizes classification accuracy while limiting information redundancy between features.
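The importance-ranked procedure just described, where the candidate at each iteration is simply the next feature in the ranking, can be sketched as follows. As before, `evaluate(subset)` is a hypothetical callback returning overall accuracy; ties are broken toward fewer features, matching the preference for the smallest subset.

```python
from typing import Callable, List, Sequence, Tuple


def ranked_forward_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], List[float]]:
    """Add features in descending importance order; keep the most accurate prefix."""
    ranked = list(ranked_features)
    accuracies = [evaluate(ranked[:k]) for k in range(1, len(ranked) + 1)]
    # Highest accuracy first; on ties, the smaller k (larger -i) wins
    best_i = max(range(len(accuracies)), key=lambda i: (accuracies[i], -i))
    return ranked[:best_i + 1], accuracies


# Usage with a toy accuracy curve that peaks at two features
toy_curve = {1: 0.60, 2: 0.80, 3: 0.80, 4: 0.75}
subset, curve = ranked_forward_selection(["f1", "f2", "f3", "f4"],
                                         lambda s: toy_curve[len(s)])
print(subset)  # ['f1', 'f2']
```

This variant needs only one model evaluation per subset size (96 in total here), which is what produces an accuracy-versus-feature-count curve like the one in Figure 6.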
Table 3 and Figure 5 also show that, among the 23 polarimetric features with the highest importance values, $S_{hh}$, $S_{hv}$, and $S_{vv}$ are the three scattering matrix elements; Entropy_shannon, Entropy, Lambda, HAA_T11, Pedestal, and Alpha are six features from the H/α decomposition; Yamaguchi3_Dbl and Yamaguchi3_Odd are two features from the Yamaguchi3 decomposition; VanZyl3_Dbl and VanZyl3_Vol are two features from the VanZyl3 decomposition; and Krogager_Kd and Krogager_Ks are two features from the Krogager decomposition. In total, 15 of the first 23 features (more than 65%) are derived from the scattering matrix and these four polarimetric decomposition methods, indicating that these features are more important and that these decompositions have clear advantages over the other decomposition methods for coastal wetland classification.

Classification Results
In this research, three methods were compared in the study area: (a) the proposed method; (b) the QUEST decision tree algorithm without artificial pruning; and (c) the QUEST decision tree algorithm with artificial pruning. In method (b), the tree depth was not set manually, and the decision tree grew freely. In method (c), the tree depth was set to 5 to prevent overfitting caused by unrestricted growth of the decision tree. The classification results of the three methods are shown in Figure 7. In addition, the user's accuracy (UA), producer's accuracy (PA), overall accuracy (OA), and Kappa coefficient of the three classification results were calculated for quantitative analysis, as shown in Tables 4-6.
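The four accuracy measures used in Tables 4-6 are standard functions of a confusion matrix built from the validation samples. A minimal sketch follows; the row/column convention (rows are classified labels, columns are reference labels) is an assumption, since the tables' layout is not reproduced here, and the function name is hypothetical.

```python
from typing import Dict, Sequence


def accuracy_report(cm: Sequence[Sequence[int]]) -> Dict[str, object]:
    """UA, PA, OA and Kappa from a square confusion matrix.

    Assumed convention: cm[i][j] = number of validation samples classified as
    class i whose reference (ground-truth) class is j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(k)]
    row_tot = [sum(cm[i]) for i in range(k)]                       # classified totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # reference totals
    nan = float("nan")
    ua = [diag[i] / row_tot[i] if row_tot[i] else nan for i in range(k)]  # commission side
    pa = [diag[j] / col_tot[j] if col_tot[j] else nan for j in range(k)]  # omission side
    oa = sum(diag) / n
    pe = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return {"UA": ua, "PA": pa, "OA": oa, "Kappa": kappa}


# Usage with a toy two-class matrix (not the paper's data)
report = accuracy_report([[50, 10], [5, 35]])
print(report["OA"])  # 0.85
```

Kappa discounts the agreement expected by chance, which is why it rises less than OA when one method merely predicts the dominant classes better.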


Discussion
A decision tree is a tree built from a sequence of decision choices. It is simple to implement and widely used in remote sensing classification [54,61]. It summarizes the training data set, learns a series of rules, and classifies pixels or pixel groups of remote sensing images according to the generated rules. Sometimes the tree becomes overgrown, that is, it has too many nodes, and overfitting may occur during tree building; artificial pruning is then needed to remove unnecessary branches. The authors' previous research demonstrated the effectiveness of the QUEST decision tree algorithm for classifying coastal wetlands [23]. However, it required artificial pruning of the tree and did not provide a way to quantitatively evaluate the validity of features. Random forests are obtained by combining decision trees, so their theoretical basis is the decision tree. In theory, classification using multiple trees should outperform a single tree, and the comparative experiments in this research test this expectation.
Comparison of the experimental results shows that, relative to the object-oriented QUEST decision tree method without artificial pruning, the object-oriented RF-SFS method improved the overall accuracy by more than 11% and increased the Kappa coefficient by 0.14. As shown in the boxed area of Figure 7, when the QUEST decision tree method without artificial pruning was adopted, some sporadic plots in the alterniflora region were classified as rice paddy and fish ponds, and some Suaeda salsa areas were not properly separated, so the classification accuracy for these wetland land covers was lower than that of the proposed algorithm. In addition, the Suaeda salsa in the elliptical area was not completely identified: most of these regions were misclassified as alterniflora, and some Suaeda salsa areas closer to the sea were misclassified as irrigable land. The proposed method produced relatively satisfactory classification results for these wetland types. As shown in Table 4, its user's and producer's accuracies for Suaeda salsa were 84.86% and 80.78%, respectively, and those for irrigable land were 87.89% and 77.65%. For the QUEST decision tree method without artificial pruning (Table 5), the user's accuracies of Suaeda salsa and irrigable land were 61.37% and 75.39%, which were 23.49 and 12.50 percentage points lower, respectively, than those of the proposed method, and the producer's accuracies for these two land types were 72.90% and 57.06%, also markedly lower. When artificial pruning was performed on the QUEST decision tree (Figure 7c and Table 6), the user's and producer's accuracies for Suaeda salsa were 80.69% and 77.16%, respectively, and those for irrigable land were 79.00% and 67.14%.
These indicators were all higher than those of the unpruned algorithm but still lower than those obtained with the proposed method. The comparison also showed that, without artificial pruning, some fish ponds were misclassified as sea and some paddy fields as irrigable land (Figure 7b). Artificial pruning reduced these misclassifications; however, compared with the results of the proposed method, small misclassified blocks remained and the accuracy was still lower (Figure 7c and Table 6). Thus, the object-oriented RF-SFS method not only offers advantages in optimizing the polarimetric feature set but also achieves high classification accuracy without artificial pruning in the modeling process.

Conclusions
A single polarimetric decomposition can hardly extract all the scattering information contained in a PolSAR image, and the decision tree method requires artificial pruning and cannot quantitatively evaluate the validity of features. Therefore, this research proposed the RF-SFS algorithm, a feature set optimization and polarimetric SAR image classification method that integrates multiple polarimetric decompositions, the random forest model, and the sequential forward selection algorithm, for land cover classification in coastal wetlands using fully polarimetric ALOS PALSAR data. The comparative experiments support the following conclusions: (1) The proposed method calculates the importance of each polarimetric feature while constructing the random forest model and, guided by these importance values, applies the sequential forward selection algorithm to select the optimal polarimetric feature set for classifying the Jiangsu coastal wetlands, providing a quantitative basis for optimizing feature sets; (2) The features from the scattering matrix and from four decomposition algorithms, namely the H/α, Yamaguchi3, VanZyl3, and Krogager decompositions, had higher importance values than the other features, indicating that they are more important and very supportive of land cover identification in the Jiangsu coastal wetlands; (3) Compared with the object-oriented QUEST decision tree algorithm, whether pruned or not, the proposed object-oriented RF-SFS method achieves higher classification accuracies without artificial pruning.
Overall, the findings of this study demonstrate that the proposed object-oriented RF-SFS algorithm contributes substantially to coastal wetland classification with fully polarimetric ALOS PALSAR data. However, because the scattering characteristics of reed and alterniflora are very similar and the resolution of the SAR data used was relatively limited, these two land covers were merged into one category in this research. Future studies could employ other novel classification algorithms or integrate optical and SAR data to distinguish them.