An Efficient Representation-Based Subspace Clustering Framework for Polarized Hyperspectral Images

Recently, representation-based subspace clustering algorithms for hyperspectral images (HSIs) have been developed under the assumption that pixels belonging to the same land-cover class lie in the same subspace. Polarization is regarded as a complement to spectral information, but related research has focused only on the clustering of HSIs without considering polarization, and cannot effectively process large-scale hyperspectral datasets. In this paper, we propose an efficient representation-based subspace clustering framework for polarized hyperspectral images (PHSIs). Combining spectral and polarized information, this framework is extensible to most existing representation-based subspace clustering algorithms. In addition, with a sampling-clustering-classification strategy, which first clusters selected in-sample data into several classes and then matches the out-of-sample data into these classes by collaborative representation-based classification, the proposed framework significantly reduces the computational complexity of clustering algorithms for PHSIs. Experiments were carried out to demonstrate the accuracy, efficiency and potential capabilities of algorithms under the proposed framework.


Introduction
Polarization, which can describe the surface roughness and edge properties of objects and increase the contrast between objects and background, has been demonstrated over recent decades to provide useful information for atmospheric monitoring, land-surface characterization and material classification [1][2][3][4]. In addition, hyperspectral images (HSIs) provide high-resolution spectral information and rich spatial information that support land-cover classification and clustering [5][6][7][8]. As a combination of HSIs and polarization, polarized hyperspectral images (PHSIs), which provide multidimensional information on polarization, spectral, spatial and radiant features, are expected to possess great potential in object detection and clustering tasks.
There have been some studies on the clustering of Polarimetric Synthetic Aperture Radar (PolSAR) images, which are the product of active detection by radars and receivers [9]. However, in the field of passive detection, which relies on the natural light reflected by objects, existing studies have so far only focused on the clustering of HSIs [10,11]. With the development of spectropolarimeters, for example the channeled interference imaging spectropolarimeter (CIISP) [12], PHSIs have become convenient to obtain, which makes it possible to add polarization into HSI clustering. However, there are still only a few related studies on polarized hyperspectral clustering. Therefore, proposing a clustering method that considers both polarization and spectral information is a necessary complement to hyperspectral clustering and an extension of polarization applications as well.
Recently, a robust technique known as the representation-based clustering model has achieved remarkable success in the face recognition and motion segmentation fields [13][14][15]. Based on the assumption that pixels with different spectra from one kind of target lie in the same subspace, some clustering algorithms for HSIs were proposed to make full use of the spectral-spatial information [16,17]. However, due to the high computational complexity of the representation-based clustering model, these algorithms usually require a long computation time and a large computational memory. Especially for HSIs and PHSIs, which are usually accompanied by both high dimensionality and large scale, this shortcoming limits the application of these algorithms.
In this paper, a clustering framework for representation-based clustering models combining the polarization and spectral properties of targets is presented. This framework, whose architecture is shown in Figure 1, is scalable to most existing representation-based clustering algorithms. In order to reduce the computational complexity, a new sampling-clustering-classification strategy is adopted in the proposed framework. This strategy first samples the points of PHSIs by applying superpixel segmentation twice, then clusters the selected in-sample data with the proposed polarized hyperspectral clustering method, and finally matches the out-of-sample data into the clusters of the in-sample data. The experimental results demonstrate that the proposed clustering framework significantly reduces the running time while ensuring overall accuracy.
The remainder of this paper is structured as follows. Section 2 reviews the general form of representation-based clustering algorithms and briefly introduces the representation-based clustering for hyperspectral images. In Section 3, we propose the efficient representation-based clustering framework for polarized hyperspectral images. The experimental results are given in Section 4. Finally, Section 5 concludes this paper.

Representation-Based Subspace Clustering
Subspace clustering usually contains two tasks: (1) projecting the data into low-dimensional subspaces and (2) calculating the cluster membership of the dataset using statistical methods or spectral clustering. The core of spectral clustering is the construction of a similarity graph in which each vertex denotes a data point and the edge weights represent the similarities between connected points. The pairwise distance (PD), which computes the similarity based on the distance (e.g., the Euclidean distance) between two data points, and the reconstruction coefficients (RC), which represent each data point as a linear combination of the other points, are two widely used approaches to build a similarity graph. Many recent studies, for example, sparse subspace clustering (SSC) [13], low-rank representation subspace segmentation (LRR) [14] and least squares regression subspace segmentation (LSR) [15], have shown that the RC approach has superior performance. These methods, collectively called representation-based subspace clustering, share a similar form. They first compute a self-representation matrix C by solving

min_C ‖C‖_x + λ Ω(Y, C),    (1)

where Y ∈ R^(d×n) is the data matrix with each column being a sample vector, ‖·‖_x denotes a proper norm, Ω(Y, C) is the regularization or penalty term, and λ > 0 is a trade-off parameter. After solving the optimization problem (1) by the Alternating Direction Method of Multipliers (ADMM) [18,19], the affinity matrix is defined as:

A = |C| + |C|^T.    (2)

Finally, spectral clustering is applied to the affinity matrix to obtain the clustering result [20,21]. For different representation-based subspace clustering algorithms, ‖·‖_x and Ω(Y, C) take different forms. Table 1 summarizes the choices of several existing representation-based models, where ‖·‖_1 is the l1-norm, ‖·‖_2 is the l2-norm, ‖·‖_* is the nuclear norm, ‖·‖_2,1 is the l2,1-norm and ‖·‖_F is the Frobenius norm.

Table 1. The choices of ‖·‖_x and Ω(Y, C) of several existing representation-based algorithms.

Algorithms | ‖·‖_x | Ω(Y, C) | Subject to
SSC [13] | ‖C‖_1 | ‖Y − YC‖_F^2 | diag(C) = 0
LRR [14] | ‖C‖_* | ‖Y − YC‖_2,1 | —
LSR [15] | ‖C‖_F^2 | ‖Y − YC‖_F^2 | diag(C) = 0
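As a concrete illustration of this pipeline, the self-representation and affinity steps can be sketched with the LSR model, which admits the closed-form solution C = (YᵀY + λI)⁻¹YᵀY. The toy data and the λ value below are illustrative assumptions, not settings from this paper:

```python
import numpy as np

def lsr_self_representation(Y, lam=0.1):
    """Closed-form LSR self-representation: C = (Y^T Y + lam*I)^(-1) Y^T Y.

    Y is d x n with each column a sample vector."""
    n = Y.shape[1]
    G = Y.T @ Y
    return np.linalg.solve(G + lam * np.eye(n), G)

def affinity(C):
    """Symmetric affinity matrix A = |C| + |C|^T for spectral clustering."""
    return np.abs(C) + np.abs(C).T

# Toy data: 20 points on each of two orthogonal lines in R^3
t = np.linspace(1.0, 2.0, 20)
Y = np.hstack([np.outer([1, 0, 0], t), np.outer([0, 1, 0], t)])  # 3 x 40

C = lsr_self_representation(Y)
A = affinity(C)

# Points from the same subspace reconstruct each other, so the affinity
# matrix is (approximately) block diagonal for well-separated subspaces.
within = A[:20, :20].mean()
between = A[:20, 20:].mean()
print(within > 10 * between)
```

Spectral clustering applied to A then recovers the two subspaces; the block-diagonal structure of A is what makes that final step work.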

Representation-Based Subspace Clustering for HSIs
In representation-based clustering, an HSI with a size of m × n × D (where m × n is the spatial size of the image and D is the number of spectral bands) can be reshaped into a 2-D mn × D dataset, where mn is the total number of pixels. All the pixels can then be considered as a dataset Y = [y_1, y_2, . . . , y_mn] in R^D, in which every data point can be regarded as drawn from a union of k affine subspaces. Compared with other high-dimensional datasets, HSIs have some special features, especially spatial and spectral features. Directly applying representation-based clustering (the SSC algorithm, for example) to HSIs usually fails to exploit this spectral and spatial information. It is easy to see that similar data points have a high probability of sharing the same subspace. Based on this fact, some constraints from the spatial and spectral perspectives are added [16,17]. Thus, the representation coefficient matrix C for HSIs can be obtained by solving the following optimization problem:

min_C ‖W ∘ C‖_x + λ Ω(Y, C),    (3)

where W is an added weight matrix and ∘ denotes the Hadamard product of two matrices. Considering that the function of the weight matrix is to choose the more highly correlated hyperspectral pixels to represent the target signals, the more highly correlated pixels should be given larger weights. For example, in [16] the weight matrix W is defined as:

w_ij = 1 / (‖y_i − y_j‖_2 + ∆),    (4)

where y_i and y_j are the spectral vectors corresponding to the ith and jth pixels of a given HSI, and ∆ is a very small constant that avoids the extreme case of the weight of two very similar neighboring pixels.
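A minimal sketch of building such a weight matrix, assuming a reciprocal-distance weight w_ij = 1/(‖y_i − y_j‖_2 + ∆) so that more similar pixels receive larger weights; the pixel spectra below are synthetic:

```python
import numpy as np

def weight_matrix(Y, delta=1e-6):
    """w_ij = 1 / (||y_i - y_j||_2 + delta): similar pixels get larger weights.

    Y: D x n matrix whose columns are pixel spectra; delta avoids a
    blow-up for two (nearly) identical pixels."""
    diff = Y[:, :, None] - Y[:, None, :]   # D x n x n pairwise differences
    dist = np.linalg.norm(diff, axis=0)    # n x n Euclidean distances
    return 1.0 / (dist + delta)

# Three synthetic pixels: the first two have nearly identical spectra
Y = np.array([[1.0, 1.0, 5.0],
              [2.0, 2.1, 9.0]])            # D = 2, n = 3
W = weight_matrix(Y)
print(W[0, 1] > W[0, 2])  # similar pixels are weighted more heavily
```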

Representation-Based Clustering Framework for PHSIs
Although existing representation-based subspace clustering algorithms can reach good performance in HSI clustering, they are not able to deal with two input datasets, namely the spectral information dataset and the polarized information dataset, at the same time. In order to add polarization information into HSI clustering, it is necessary to modify subspace clustering for the characteristics of polarized spectral information. The most important part of subspace clustering algorithms is the composition of the self-representation matrix C: if C can reflect both kinds of information, the clustering result will be a combination of the two datasets. Considering that two pixels in the same cluster should be more likely to be assigned to the same subspace whether viewed from the perspective of spectra or of polarization, the optimization problem (1) becomes:

min_C ‖C‖_x + λ_1 Ω(Y^(s), C) + α λ_2 Ω(Y^(p), C),    (5)

where α, λ_1 and λ_2 are trade-off parameters, and Y^(s) and Y^(p) are the spectral dataset and the polarized dataset, respectively. Furthermore, we decouple the optimization problem into the following form:

min_{C^(s)} ‖C^(s)‖_x + β_1 Ω(Y^(s), C^(s)),  min_{C^(p)} ‖C^(p)‖_x + β_2 Ω(Y^(p), C^(p)),    (6)

where β_1 and β_2 are trade-off parameters. Optimization problem (6) can be solved using ADMM [18,19]. Then, we get two coefficient matrices C^(s) and C^(p). In addition, we use the following ideas to get the final representation matrix C: (1) if two pixels are similar in Y^(p), their corresponding columns in Y^(s) should be more likely to be in the same subspace; (2) likewise, if two pixels are similar in Y^(s), their corresponding columns in Y^(p) should be more likely to be in the same subspace.
So, we can get the final coefficient matrix C by cross-weighting the two coefficient matrices:

C = W_p ∘ |C^(s)| + W_s ∘ |C^(p)|,    (7)

where W_p and W_s are weight matrices built from the polarized and spectral data, respectively. Based on the Euclidean distance, W_p is defined as:

W_p(i, j) = 1 / (‖y_i^(p) − y_j^(p)‖_2 / σ_p + ∆),    (8)

where ∆ is a very small constant that prevents the denominator from being zero, and σ_p is a parameter that makes W_p adjustable on different datasets. Additionally, in order to better represent the similarity of spectral shapes, W_s based on the spectral angle is proposed as:

W_s(i, j) = 1 / (arccos(⟨y_i^(s), y_j^(s)⟩ / (‖y_i^(s)‖_2 ‖y_j^(s)‖_2)) / σ_s + ∆),    (9)

where σ_s plays the same role as σ_p.
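The fusion step can be sketched as follows, assuming the final matrix is formed by cross-weighting (spectral coefficients weighted by the polarization-based similarity and vice versa) and then symmetrized into an affinity matrix; the coefficient and weight matrices below are random placeholders standing in for the ADMM output and the weight definitions:

```python
import numpy as np

def fuse_coefficients(C_s, C_p, W_s, W_p):
    """Cross-weighted fusion of spectral and polarization coefficients:
    C = W_p * |C_s| + W_s * |C_p| (Hadamard products), symmetrized into
    an affinity matrix for spectral clustering."""
    C = W_p * np.abs(C_s) + W_s * np.abs(C_p)
    return np.abs(C) + np.abs(C).T

rng = np.random.default_rng(0)
n = 6
C_s, C_p = rng.random((n, n)), rng.random((n, n))  # stand-ins for ADMM output
W_s, W_p = rng.random((n, n)), rng.random((n, n))  # stand-ins for the weights
A = fuse_coefficients(C_s, C_p, W_s, W_p)
print(np.allclose(A, A.T), (A >= 0).all())
```

The resulting graph is symmetric and nonnegative, as spectral clustering requires.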

In the same way as in Section 2, C is utilized to build the similarity graph. The clustering result can then be achieved by applying the spectral clustering algorithm to the similarity graph [20,21]. The modified versions of algorithms listed in Table 1 are called Polarized Spectral Sparse Subspace Clustering (PS-SSC), Polarized Spectral Low-Rank Representation clustering (PS-LRR), and Polarized Spectral Least Squares Regression clustering (PS-LSR), respectively.

The Sampling-Clustering-Classification Strategy
Due to the large size of polarized hyperspectral images, representation-based subspace clustering programs usually require a large computer memory and a long computing time, which is not in line with actual needs. For example, if the size of the image is 512 × 512, the coefficient matrix C will be a 262,144 × 262,144 matrix, which makes clustering a hard task. In some related works, superpixel theory is used to reduce the time complexity and space complexity of the algorithms [22,23]. However, even if we reduce the size of the input data from 512 × 512 pixels to thousands of superpixels, it still takes a lot of time to solve the optimization problem. For this reason, taking the SSC model as an example, a new sampling-clustering-classification strategy is proposed, and the PS-SSC based on this strategy is called Fast Polarized Spectral Sparse Subspace Clustering (FPS-SSC).
The first step of the FPS-SSC algorithm is sampling. The original polarized hyperspectral images are first segmented by the simple linear iterative clustering (SLIC) algorithm into N_1 superpixels S = [S_1, S_2, . . . , S_N1] [24]. Then, we select a small number of data points as in-sample data and apply the PS-SSC algorithm over them. It is a challenging problem to select some key points as in-sample data. Although random sampling performs well in some cases, there is still some instability [25]. Due to the spatial characteristics of HSIs, here we propose a superpixel-based method to select in-sample data, whose architecture is shown in Figure 2.
In addition to the superpixel segmentation mentioned above, the SLIC is applied a second time to segment the original images into another superpixel set S' = [S'_1, S'_2, . . . , S'_N2] whose number N_2 is much smaller than N_1. We can easily get the geometric center coordinates GC' = [(x'_1, y'_1), (x'_2, y'_2), . . . , (x'_N2, y'_N2)] of each superpixel of S'. Then, the geometric centers GC' are projected onto the superpixel set S, and the superpixels in which the projected points are located are the selected in-sample superpixels.
By applying superpixel segmentation twice, the in-sample data can be selected stably. Then we can get the cluster assignment of in-sample data by applying the PS-SSC over them, which is the second step of FPS-SSC.
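The two-pass sampling step can be sketched as follows. Instead of running SLIC, two synthetic label maps stand in for the fine segmentation S and the coarse segmentation S'; only the center-projection logic is shown:

```python
import numpy as np

def select_insample_superpixels(fine_labels, coarse_labels):
    """For each coarse superpixel, project its geometric center onto the
    fine segmentation and select the fine superpixel it falls into."""
    selected = set()
    for label in np.unique(coarse_labels):
        ys, xs = np.nonzero(coarse_labels == label)
        cy = int(np.rint(ys.mean()))       # geometric center (row)
        cx = int(np.rint(xs.mean()))       # geometric center (column)
        selected.add(int(fine_labels[cy, cx]))
    return sorted(selected)

# Synthetic stand-ins for SLIC output on an 8 x 8 image:
# fine map: 16 superpixels (2x2 blocks); coarse map: 4 superpixels (4x4 blocks)
idx = np.arange(8)
fine = (idx[:, None] // 2) * 4 + (idx[None, :] // 2)
coarse = (idx[:, None] // 4) * 2 + (idx[None, :] // 4)

picked = select_insample_superpixels(fine, coarse)
print(picked)  # one fine superpixel selected per coarse superpixel
```

Because every coarse center lands in exactly one fine superpixel, the number of selected in-sample superpixels is at most N_2, which is what keeps the subsequent clustering step small.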
Naturally, the last part of FPS-SSC is to obtain the cluster membership of the out-of-sample data by performing classification over them. Inspired by [25], we apply collaborative representation-based classification to measure the adjacency relationship among data points. In addition, the collaborative representation-based classification algorithm is modified to accommodate polarized hyperspectral images.
For each out-of-sample data point O_i, we can get the representation vector c_i by regarding all the in-sample data as a dictionary:

c_i* = argmin_{c_i} ‖O_i − [I^(s); I^(p)] c_i‖_2^2 + λ ‖c_i‖_2^2,    (10)

where O_i = [O_i^(s); O_i^(p)] stacks the spectral data and the polarization data of the out-of-sample point, and I^(s) and I^(p) are the spectral data and the polarization data of all the in-sample data, respectively. Writing I = [I^(s); I^(p)], the solution of optimization problem (10) can be written in closed form as:

c_i* = (I^T I + λ·Id)^(−1) I^T O_i,    (11)

where Id is the identity matrix. Once the optimal c_i* is achieved, for each cluster R_k, the residual between O_i and the in-sample points in R_k can be obtained as:

r_k(O_i) = ‖O_i − I D_k c_i*‖_2,    (12)

where D_k is a diagonal matrix based on the in-sample clustering result:

(D_k)_jj = 1 if the jth in-sample data point belongs to the kth cluster, and (D_k)_jj = 0 otherwise.    (13)

O_i is then matched into the cluster that has the minimum residual:

identity(O_i) = argmin_k r_k(O_i),    (14)

where identity(O_i) denotes the cluster assignment of the out-of-sample data point O_i. The proposed efficient clustering framework for polarized hyperspectral images is also scalable to LRR, LSR and other representation-based subspace clustering models. In addition, algorithms applying this framework can be summarized as Algorithm 1.
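A sketch of this matching step: ridge-regularized collaborative representation over the whole in-sample dictionary, followed by per-cluster residuals through a diagonal selection mask. For brevity a single stacked feature matrix stands in for the concatenated spectral and polarization data, and the dictionary, labels and λ are illustrative:

```python
import numpy as np

def crc_assign(o, I_dict, labels, lam=1e-3):
    """Collaborative representation classification: solve the ridge problem
    c* = (I^T I + lam*Id)^(-1) I^T o, then assign o to the cluster whose
    members reconstruct it with the smallest residual."""
    n = I_dict.shape[1]
    c = np.linalg.solve(I_dict.T @ I_dict + lam * np.eye(n), I_dict.T @ o)
    residuals = {}
    for k in np.unique(labels):
        D_k = np.diag((labels == k).astype(float))   # mask for cluster k
        residuals[k] = np.linalg.norm(o - I_dict @ D_k @ c)
    return min(residuals, key=residuals.get)

# In-sample dictionary: two clusters living on different coordinate axes
I_dict = np.array([[1.0, 0.9, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 1.1]])
labels = np.array([0, 0, 1, 1])

print(crc_assign(np.array([0.95, 0.05]), I_dict, labels))  # near cluster 0
```

Because the representation is solved once per out-of-sample point and reused for every cluster residual, the classification step stays cheap even for many clusters.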

The output of Algorithm 1 is a 2-D matrix that records the labels of the clustering result of the polarized hyperspectral images.

Experimental Results and Discussion
In this section, some experiments are carried out to demonstrate the accuracy, efficiency and potential capabilities of the proposed polarized hyperspectral clustering framework. As described above, this framework can be applied to most representation-based clustering algorithms. Here we mainly use the polarized spectral and fast polarized spectral versions of SSC as examples, and take LRR and LSR as supplementary analysis. We also compare the clustering performance of these methods with the benchmark k-means method and with rank-two nonnegative matrix factorization (R2NMF), a non-subspace-based clustering algorithm [26].
This section consists of four parts. The first subsection introduces the datasets used for the experiments, and the second shows the clustering results of the different algorithms. The analyses of the sensitivity of parameters and of the selection of in-sample data and the number of superpixels are given in the third and fourth subsections, respectively.

Instrument and Data
In order to evaluate the performance of the proposed method, we obtained two polarized hyperspectral datasets. Dataset A was measured with a channeled interference imaging spectropolarimeter (CIISP) proposed by Zhang et al. The CIISP can simultaneously obtain the four Stokes parameters (S_0, S_1, S_2 and S_3) in 128 wave bands from 480 to 960 nm of a two-dimensional space target through one-dimensional push sweep. The experimental data were collected by the authors of this paper at a distance of 2 km, and the size of the images is 512 × 512. Since the CIISP instrument is designed for spaceborne use, images collected at this distance are somewhat blurred, which posed a challenge to the clustering accuracy of the algorithms.
After getting the four Stokes parameters, since S_3 is usually negligible under natural conditions, the degree of linear polarization (DOLP) can be obtained by:

DOLP = sqrt(S_1^2 + S_2^2) / S_0.    (15)

Then we can regard S_0 as ordinary hyperspectral information and the DOLP as polarization information (Figure 3a). Based on the SLIC algorithm, three bands (RGB) of S_0 were selected to construct a false color image, and then the false color image was segmented into thousands of superpixels. Note that for dataset A, since the spectral range (480-960 nm) of the CIISP does not cover the blue wavelength of 435.8 nm, the false color image is just an approximation. If the number of superpixels is too small, image spatial resolution is sacrificed, which leads to a reduction in clustering accuracy, while if the number is too large, the computational time required for clustering increases dramatically. In this paper, the number was set at about 3000. Figure 3j,l shows the superpixel segmentation results.
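The DOLP computation above reduces to a few lines per pixel; the Stokes values below are synthetic, and a small epsilon guards against division by zero:

```python
import numpy as np

def dolp(S0, S1, S2, eps=1e-12):
    """Degree of linear polarization: sqrt(S1^2 + S2^2) / S0 (S3 neglected)."""
    return np.sqrt(S1**2 + S2**2) / (S0 + eps)

# Synthetic Stokes bands: an unpolarized pixel and a partially polarized one
S0 = np.array([1.0, 2.0])
S1 = np.array([0.0, 0.6])
S2 = np.array([0.0, 0.8])
print(dolp(S0, S1, S2))  # unpolarized pixel -> 0, second pixel -> 0.5
```

Applied band-by-band to the full cubes, this yields the DOLP cube that serves as the polarization input Y^(p) alongside the S_0 cube.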

Clustering Results and Discussion
Dataset A includes five different kinds of targets: blue-gray painted wall; windows without curtains; windows with curtains; white painted wall and window edges; and leaves. Although the window edges are composed of metal, their surface is a layer of white paint, so they belong to the same cluster as the white painted wall. Additionally, if there are curtains in a window, a large part of the light passing through the glass will be reflected by the curtains; otherwise the light enters the room and only a small part is reflected back. Therefore, the existence of curtains causes differences in S_0 and DOLP intensity. In addition, dataset B includes seven kinds of targets: white wall, blue wall, white smooth ground, black smooth ground, rough ground, vegetation 1 and vegetation 2. The cluster maps of the various clustering methods on dataset A and dataset B are shown in Figures 4 and 5, respectively, and the corresponding quantitative evaluation of the clustering results is provided in Tables 2 and 3. Considering the limitations of the spatial resolution of the instruments themselves, it is difficult to evaluate the clustering performance at the boundaries between targets and in some vague areas, so these pixels, which are unlabeled in Figures 4j and 5j, were ignored when calculating the accuracy (AC).
It can be clearly observed from Tables 2 and 3 that adding polarization into HSI clustering improved the overall accuracy (OA) significantly. For dataset A, directly applying the SSC algorithm over the S_0 dataset can distinguish the blue-gray painted wall, windows without curtains, white painted wall and window edges, and leaves, but the accuracy on windows with curtains is only 1.78%. In contrast, directly applying the SSC algorithm over the DOLP dataset can achieve a high accuracy of 89.08% on windows with curtains, with low accuracy on the blue-gray painted wall, windows without curtains, and white painted wall and window edges. Both methods perform well on leaves, because leaves have distinct features in both S_0 and DOLP. As a combination of spectral and polarized information, the PS-SSC algorithm has the advantages of SSC (S_0) and SSC (DOLP) simultaneously: the accuracy on windows with curtains is 86.15%, while the accuracy on the other classes remains at a good level, which leads to a higher OA. The results are similar for PS-LRR and PS-LSR, which reach OAs of 72.46% and 71.02%, respectively.

Unlike dataset A, dataset B contains no target like the windows with curtains, which are totally different from the other targets in DOLP. Except for edge contours, the DOLP intensity of the different targets in dataset B does not differ significantly. Even in this case, as can be seen from Table 3, adding polarization still improves the clustering accuracy by about 5%.
On the other hand, due to the complexity of the optimization process, the time cost of these three methods is too high. The PS-SSC and PS-LRR take more than 350 s of running time, and even though no iteration is needed to solve the optimization problem in the PS-LSR algorithm, it still takes about 2 min. The k-means algorithm takes much less time than these subspace-based methods, but its OAs on both datasets are much lower. Although R2NMF can reach a higher OA than k-means with less running time, its OA is still not as good as that of the subspace-based methods.
When the sampling-clustering-classification strategy is adopted, the running time is greatly reduced to less than 1 min. The FPS-SSC, FPS-LRR and FPS-LSR all achieved OAs of about 80% on dataset A and 85% on dataset B, which are even much higher than those of the PS-SSC, PS-LRR and PS-LSR. The PS version algorithms are equivalent to taking all 3249 superpixels as in-sample data in the FPS version algorithms. Usually, for sampling strategies, the more sampling points there are, the higher the accuracy should be. However, the experimental results show a different trend. This is because the representation-based models themselves can achieve high accuracies on small-scale datasets, but on large-scale datasets they may not perform as well. In addition to reducing the computational complexity, this is another reason why we adopt a sampling strategy instead of clustering directly on polarized hyperspectral datasets. The relationship between the OA and the number of selected in-sample data is discussed in the fourth subsection.

Sensitivity of Parameters
Obviously, the various parameters required in the algorithms have an important impact on the final clustering results. In different practical application scenarios, the parameters need to be reset. It is desired for parameters to achieve the best performance by taking the same values on different datasets, so that users can easily set the parameters when applying these algorithms on various datasets collected by different instruments and different measurement environments.
In addition to the number of in-sample data, there are five trade-off parameters (α, λ_1, λ_2, β_1 and β_2) and two weight parameters (σ_s and σ_p) in the proposed framework. For subspace clustering algorithms, the regularization parameter λ is decided by the following formulation [13]:

λ = γ / µ,    (16.a)
µ = min_i max_{j≠i} |y_i^T y_j|,    (16.b)

where γ is the adjustment coefficient and µ is fixed for a certain dataset. Extending (16.a) and (16.b) to the five trade-off parameters, each of them can be replaced with a dataset-independent adjustment coefficient divided by the µ computed on the corresponding dataset. By changing one of those parameters while the other parameters are fixed, the sensitivity of this parameter is determined. A change curve of the overall accuracy and in-sample accuracy with various values of the different parameters is drawn in Figure 6.
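The dataset-dependent quantity µ can be computed directly, assuming the SSC convention µ = min_i max_{j≠i} |y_iᵀ y_j| [13]; the data matrix and adjustment coefficient below are synthetic:

```python
import numpy as np

def mu(Y):
    """mu = min_i max_{j != i} |y_i^T y_j| over the columns of Y (cf. SSC [13])."""
    G = np.abs(Y.T @ Y)
    np.fill_diagonal(G, -np.inf)   # exclude the j = i terms
    return G.max(axis=1).min()

def trade_off(gamma, Y):
    """Dataset-independent adjustment coefficient -> dataset-specific lambda."""
    return gamma / mu(Y)

Y = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])    # columns are sample vectors
print(trade_off(10.0, Y))
```

Because µ rescales the adjustment coefficient for each dataset, the same γ can be reused across datasets collected by different instruments, which is the property examined in the sensitivity analysis below.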
Taking an OA of 79% or more on dataset A and an OA of 83% or more on dataset B as acceptable ranges, we can observe the following phenomena from Figure 6: (1) λ_1 performs well on both datasets over eight orders of magnitude (10 to 10^8) with peaks at 10^3, and λ_2 keeps a high level of OA for both datasets in the range of 50-200; (2) β_1 reaches an acceptable OA from 10^2 to 10^6 for dataset A and from 10^4 to 10^8 for dataset B, and β_2 meets this requirement over eight orders of magnitude (10 to 10^8) with peaks at 10^3 for both datasets; (3) α achieves the highest OA for the two datasets in the range of 10-20, with peaks at α = 15; (4) for σ_s and σ_p, it is difficult to find fixed values across different datasets as for the trade-off parameters, but they have a wide range of acceptable values on each dataset.
Summarizing the above points, we can reach the conclusion that the five trade-off parameters (λ 1 , λ 2 , β 1 , β 2 and α ) and the two weight parameters (σ s and σ p ) all have low parameter sensitivity. In addition, especially for the trade-off parameters, the best values on different datasets are very close.

Selection of In-Sample Data and the Number of Superpixels
In the proposed framework, images are first segmented into many superpixels by the SLIC algorithm. Intuitively, the final OA as well as the running time is closely related to the number of superpixels. Figure 7a,b respectively indicate the OA and running time of FPS-SSC, FPS-LRR and FPS-LSR in the cases where the numbers of superpixels are 1024, 3249, 5329, 7225 and 10,404, with the number of in-sample points fixed at 429. As the number of superpixels increases, the OA does not keep increasing but remains at a stable level once the number exceeds 3000, while the running time grows monotonically with the number of superpixels. Therefore, as a trade-off, the suggested number of superpixels for 512 × 512 images is 2000-5000.

The OA also depends on how the in-sample data are selected. Figure 7c compares random sampling with the superpixel-based method presented in Section 3; the red dotted line indicates the mean and standard deviation of the results of applying the FPS-SSC algorithm with random sampling 10 times. Since random sampling cannot guarantee that the selected in-sample data are suitable, it is accompanied by a large standard deviation and great uncertainty. Although the OA of the superpixel-based method may be lower than the best results of random sampling, it achieves an OA above 75% under different numbers of sampled data, which is much higher than the mean of the random-sampling results. Another benefit of the superpixel-based method is its stability: only one experiment is needed to obtain a reliable clustering result.
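The stability of superpixel-based selection comes from its determinism. A minimal sketch of one plausible selection rule, picking for each superpixel the pixel closest to that superpixel's mean feature (the exact rule in Section 3 may differ; this is an illustrative assumption):

```python
import numpy as np

def superpixel_representatives(features, labels, n_samples):
    """Pick in-sample points deterministically from superpixels.

    features: (H, W, B) pixel features; labels: (H, W) superpixel map.
    From each of the n_samples largest superpixels, select the pixel
    closest to the superpixel's mean feature. Illustrative rule only.
    Returns flat pixel indices into features.reshape(-1, B).
    """
    h, w, b = features.shape
    flat = features.reshape(-1, b)
    lab = labels.ravel()
    ids, counts = np.unique(lab, return_counts=True)
    chosen = ids[np.argsort(-counts)][:n_samples]  # largest superpixels first
    picks = []
    for k in chosen:
        idx = np.flatnonzero(lab == k)
        mean = flat[idx].mean(0)
        d = ((flat[idx] - mean) ** 2).sum(1)       # distance to superpixel mean
        picks.append(idx[np.argmin(d)])
    return np.array(picks)
```

Unlike random sampling, calling this twice on the same inputs returns the same in-sample set, which is why a single run suffices.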
The number of in-sample data affects the OA in two ways: (1) as the number increases, the OA generally shows an upward trend; (2) the number also affects the clustering accuracy of the in-sample data, which is another important factor in the OA. Figure 7d shows the OA of FPS-SSC, FPS-LRR and FPS-LSR when the number of in-sample data is 214, 309, 429, 509, 622 or 764, and Figure 7e,f shows the corresponding in-sample accuracy (IA) and running time, respectively. Overall, the larger the number of in-sample data, the larger the OA. However, the OA drops when the number is 509 or 622, because the accuracy of the in-sample data is significantly reduced at these values.
Additionally, as expected, the running time increases monotonically with the number of in-sample data. Taking the time cost into consideration, the suggested number of in-sample data is about one tenth of the number of superpixels. Specifically, for images with a size of 512 × 512, a high OA and a relatively short running time can be achieved when the number of in-sample data is 400-500.
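Once the in-sample data are clustered, the framework matches the remaining out-of-sample pixels to those clusters by collaborative representation-based classification (CRC): each out-of-sample pixel is coded over all in-sample pixels by ridge regression and assigned to the class whose coefficients reconstruct it best. A minimal sketch, with the ridge parameter `lam` as an assumed default:

```python
import numpy as np

def crc_classify(X_in, y_in, X_out, lam=1e-3):
    """Collaborative representation-based classification sketch.

    X_in: (n, B) in-sample pixels with cluster labels y_in (length n);
    X_out: (m, B) out-of-sample pixels. lam is an assumed ridge
    parameter. Returns predicted labels of shape (m,).
    """
    D = X_in.T                                     # (B, n) dictionary
    # Ridge coding: A = (D^T D + lam I)^{-1} D^T X_out^T, shape (n, m)
    G = D.T @ D + lam * np.eye(D.shape[1])
    A = np.linalg.solve(G, D.T @ X_out.T)
    classes = np.unique(y_in)
    resid = np.empty((len(classes), X_out.shape[0]))
    for i, c in enumerate(classes):
        mask = y_in == c
        recon = D[:, mask] @ A[mask]               # class-wise reconstruction
        resid[i] = np.linalg.norm(X_out.T - recon, axis=0)
    return classes[np.argmin(resid, axis=0)]       # minimum-residual class
```

Because the coding reduces to one linear solve shared by all out-of-sample pixels, this stage stays cheap even when the out-of-sample set is much larger than the in-sample set, which is the source of the framework's efficiency on large images.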

Conclusions
In this paper, we introduced polarization into the clustering of hyperspectral images and proposed a representation-based subspace clustering framework for polarized hyperspectral images by treating each land-cover class as a subspace. Furthermore, to address the shortcoming that representation-based subspace clustering requires too much running time, we presented an improved efficient version adopting the sampling-clustering-classification strategy. Extensive experimental results clearly show that algorithms using the proposed framework (FPS-SSC, FPS-LRR and FPS-LSR) can achieve superior clustering performance with short running time and low parameter sensitivity.
The proposed method can be further improved by adaptive determination of the weight parameters and automatic determination of the cluster number via subspace detection, which we leave to future work.
Author Contributions: All the authors made significant contributions to this work. Z.C. devised the approach and analyzed the data; C.Z. and T.Y. designed the experiments and provided the data; T.M., Z.C. and Y.W. provided advice for the preparation and revision of the work.