Scatter Matrix Based Domain Adaptation for Bi-Temporal Polarimetric SAR Images

Abstract: Time series analysis (TSA) based on multi-temporal polarimetric synthetic aperture radar (PolSAR) images can deeply mine the scattering characteristics of objects in different stages and improve the interpretation effect, or help to extract the range of surface changes. However, as far as classification is concerned, it is difficult to directly generate the classification map for a new temporal image by conventional TSA or change detection methods. Once some labeled samples exist in historical temporal images, semi-supervised domain adaptation (DA) is able to use historical label information to infer the categories of pixels in the new image, which is a potential solution to the above problem. In this paper, a novel semi-supervised DA algorithm is proposed, which inherits the merits of the maximum margin criterion and principal component analysis in the DA learning scenario. Using a kernel mapping function established on the statistical distribution of PolSAR data, the proposed algorithm aims to find an optimal subspace for eliminating domain influence and keeping the key information of bi-temporal images. Experiments on both UAVSAR and Radarsat-2 multi-temporal datasets show that superior classification results, with an average accuracy of about 80%, can be obtained by a simple classifier trained with historical labeled samples in the learned low-dimensional subspaces.


Introduction
Owing to its advantages of all-day, all-weather and multi-polarization imaging, polarimetric synthetic aperture radar (PolSAR) has become an important part of the earth observation system [1]. In recent years, it has been widely used in land cover classification [2][3][4], target detection [5], hazard assessment [6,7], surface parameter inversion [8,9] and other fields. Time series analysis (TSA) based on multi-temporal PolSAR images can deeply mine the backscattering characteristics of objects in different stages [10,11] and improve the interpretation effect [12][13][14], or help to extract the range of surface changes [15][16][17]. However, as far as classification is concerned, it is difficult to directly generate the classification map for a new temporal image by conventional TSA or change detection methods, for the following reasons. On the one hand, many TSA research articles focus on investigating the scattering behavior evolution of specific targets in different time frames; e.g., Mascolo et al. [11] and Marechal et al. [13] have successfully analyzed the seasonal impact in wetland extraction and identified crop phenological stages by using time series PolSAR images, respectively. However, investigating only a specific target of interest is not sufficient in general classification cases. On the other hand, the bi-temporal [17] and multi-temporal [18] change detection methods usually focus on distinguishing changed and unchanged regions, or further divide the changed regions into several types of changes [16]; but it is difficult to reveal category attributes, such as waterbody, buildings, grass and kinds of crops, from these types.
In addition, although a few classification-based change detection methods (e.g., post-classification comparison [19,20]) can generate the classification result of post-temporal image, the category-labeled samples in post-temporal image are still required in the training phase, in order to keep the high-quality interpretation.
Essentially, the problem we want to solve is how to infer the category labels of pixels in a new temporal image by employing label information only from the historical temporal image. As implementing in-situ surveys is very time-consuming and laborious, and the remotely sensed data volume has been growing explosively in recent years, it is conceivable that the above setting will become a bottleneck for classification timeliness in the near future. It seems simple and intuitive to directly train a classifier on historical labeled samples and then employ it to classify new temporal samples. However, whether for the classification models proposed in the PolSAR field, such as the multivariate complex Gaussian [21] and complex Wishart classifiers [22,23], or for those proposed in the machine learning community, such as the support vector machine (SVM) [24], random forest (RF) [25] and deep neural networks [26,27], reliability rests on the condition that training and test samples are independent and identically distributed (i.i.d.). Due to the high complexity of the backscattering process between the transmitted microwave and the ground surface, differences in space-time attributes, incidence angles and other factors sometimes make the backscattering characteristics of similar, and even identical, objects very different across multiple PolSAR images. This phenomenon results in non-i.i.d. samples, and thus seriously hinders the historical category-label information from playing a key role in new temporal image classification.
As one of the research hotspots in the machine learning community, transfer learning (TL) aims at applying knowledge previously accumulated in one field to another different but related field [28]. The field with a wealth of knowledge for a certain task is referred to as the source domain (SD), and the field with scarce knowledge for another related task is referred to as the target domain (TD). For instance, Segev et al. [29] have proposed two model TL methods based on the RF model, and combined them to deal with several cross-domain image recognition problems. The fundamental purpose of TL is to solve the problem of adapting pre-existing data to new tasks; e.g., a mass of labeled email data can be used to train a good classifier for junk email recognition in the SD, but only scarce labeled message data exists in the TD, and TL is able to improve junk message recognition precision by using the information in the SD. This provides a potential way to deal with the problem we care about. An early overview of TL techniques can be found in [30]. In accordance with the TL terminology, hereinafter the historical temporal image and the new temporal image are considered as the SD data and the TD data, respectively. The dual-domain data are drawn from the same feature space, and own different but related probability distributions. In the TL field, domain adaptation (DA) is a main branch which learns domain-invariant features by matching the distributions of the dual-domain data. DA theory assumes that, via a specific mapping transformation, the samples in both domains can approximately obey the i.i.d. condition. In this case, any classifier trained with the SD samples can be directly re-used on the TD data, so employing DA methods makes it very easy to take full advantage of pre-existing classification models. In this respect, two regularization frameworks [31,32] proposed by Argyriou et al. can learn a low-dimensional representation shared between SD and TD tasks, and Blitzer et al. [33] introduced structural correspondence learning to automatically induce correspondences among features from different domains. Moreover, dimensionality reduction and low-rank representation [34] have been applied to build DA models, such as maximum mean discrepancy embedding [35], transfer component analysis [36] and maximum independence domain adaptation [37,38], etc. The above feature-based algorithms are also collectively known as transfer subspace learning (TSL), and inspired by this, a series of deep learning models have been transplanted into the TL field recently [39][40][41][42][43]. Different from the coarse "fine-tuning" operation (i.e., starting with a pretrained deep learning model and updating its parameters for a new task), the models proposed in [39-43] involve specially designed layer modules, training strategies and so on, in order to align the joint distributions of data across domains.
In this paper, the DA theory is introduced into bi-temporal PolSAR image processing, to deal with the discrepancy of distributions between the SD and TD data. In this regard, we design a novel TSL algorithm, named scatter matrix based domain adaptation (SMbDA): firstly, it constructs two objective subfunctions to keep the category separability or unsupervised structural information in two domains, by the use of graph embedding theory and scatter matrices; later in reproducing kernel Hilbert spaces (RKHSs), the proposed algorithm employs Hilbert-Schmidt independence criterion to reduce and even remove domain influence. Furthermore, a dissimilarity measure established on the statistical distribution of PolSAR data can be used to build a specific kernel mapping function, which helps the SMbDA find a better subspace for promoting information transfer effect of bi-temporal images. Via SMbDA projection, dual-domain data approximately keeps the i.i.d. condition and valuable category information, so we can train kinds of conventional classifiers with historical labeled samples and test them with unknown samples in new temporal image.
The rest of this paper is organized as follows: Section 2 first gives an overview of two relevant TSL methods, and then introduces the proposed SMbDA in detail. These methods are comparatively analyzed using the UAVSAR and Radarsat-2 multi-temporal datasets; the experimental results and a brief discussion are given in Sections 3 and 4, respectively. Finally, Section 5 summarizes the main content and contributions, and also presents our future work.

Relevant Works
Let x denote the D-dimensional feature vector of a sample; the total sample set X consists of two parts, the SD sample set X_S and the TD sample set X_T. A domain is characterized by two components: its feature space and the marginal probability distribution P(X) of its dataset.
In this paper, our focus needs to be on the marginal probability distributions of X_S and X_T, because the time series data are drawn from the same feature space. As TSL assumes that the SD and TD data have a similar low-dimensional feature structure, the discrepancy of data distributions between the two domains can be reduced by mapping the original data X into a new feature space, that is, into the generated dataset Y = f(X). In this subsection, two TSL algorithms are briefly reviewed; they give us a few patterns for transferring information across different domains.

Transfer Component Analysis
Transfer component analysis (TCA) [36] is a well-known unsupervised TSL method proposed by Pan et al. in 2011. It mainly utilizes the dual-domain unlabeled data to achieve the goal of DA. In terms of image classification, the preferable adaptation effect is matching the conditional probabilities P(C_S | Y_S) and P(C_T | Y_T), where Y_S and Y_T are respectively the generated SD and TD data using f(·).
However, the absence of C_T makes it difficult to estimate the above conditional probabilities, so TCA adopts an alternative approach. This method tries to learn f by meeting the following two conditions, and Pan et al. believe that such an f can make Y_S and Y_T satisfy P(C_S | Y_S) ≈ P(C_T | Y_T):
- Shorten the distribution distance between P(Y_S) and P(Y_T) as much as possible
- Preserve the valuable information of the original data X_S and X_T after the transformation f
For the first condition, TCA applies maximum mean discrepancy (MMD) to estimate the discrepancy of different marginal probability distributions. As a nonparametric estimation method, MMD simply calculates the distance between the SD and TD sample centers in an RKHS, and does not require an intermediate density estimate. For the second condition, TCA chooses to preserve data variance, and thus a principal component analysis (PCA) process is performed on the dual-domain Gram kernel matrix. In addition, a regularization term used for controlling the model complexity and avoiding rank deficiency is also taken into account. In conclusion, the overall objective of TCA is minimizing both the MMD value between P(Y_S) and P(Y_T) and the regularization term, under the constraint of preserving data variance.
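For intuition, with a linear kernel the empirical MMD reduces to the distance between the SD and TD sample means in the original feature space. The following NumPy sketch (toy data, hypothetical variable names) shows how a distribution shift inflates this value:

```python
import numpy as np

def mmd_squared(Xs, Xt):
    """Empirical squared MMD with a linear kernel: the squared distance
    between the SD and TD sample means in feature space."""
    mu_s = Xs.mean(axis=0)  # SD sample center
    mu_t = Xt.mean(axis=0)  # TD sample center
    diff = mu_s - mu_t
    return float(diff @ diff)

# Two toy domains with shifted means: MMD grows with the shift
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(200, 9))
Xt = rng.normal(0.5, 1.0, size=(200, 9))
print(mmd_squared(Xs, Xs[:100]))  # same distribution -> small value
print(mmd_squared(Xs, Xt))        # shifted distribution -> larger value
```

In the full TCA formulation the means live in an RKHS, so the same quantity is computed from the Gram kernel matrix rather than from raw features.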
TCA utilizes unsupervised information but ignores category labels. However, although the TD label set is scarce, it is easy to acquire the SD label set C_S in many cases. Once the label information in C_S is considered, a semi-supervised extension known as semi-supervised transfer component analysis (SSTCA) can be built on TCA. Besides distribution matching as in TCA, another two conditions are also investigated in the SSTCA model:
- Reduce the empirical error on the SD labeled data as much as possible
- Preserve the local structure information of the original data X_S and X_T after the transformation f
For the first condition, SSTCA applies the Hilbert-Schmidt independence criterion (HSIC) to estimate the dependence between samples and the corresponding labels; increasing this dependence is roughly equivalent to reducing the empirical error. Similar to MMD, HSIC is a nonparametric criterion [44]; a detailed description of this criterion is given in Section 2.3. For the second condition, with reference to manifold learning theory [45], a locality preserving projection [46] process is performed on the dual-domain Gram kernel matrix. Comparatively speaking, SSTCA is much more complicated and usually performs better than TCA.

Maximum Independence Domain Adaptation
As a criterion for estimating the dependence between two sets, HSIC can also be used to measure the independence between data and the corresponding domain. After a DA transformation, intuitively the more independent the data is, the better the information transfer effect.
So each sample is assigned a domain feature vector describing its background, and the domain feature set is D = [d_1, …, d_N]^T. A feature augmentation operation appends the domain features to the original inputs, increasing the initial input dimension before DA, to the benefit of searching for a better transformation. Using the above two kernel matrices (one built on the augmented data and one built on the domain features), the independence (or in fact, the dependence) between the dual-domain data and the domain features is evaluated by HSIC.
On the other hand, PCA process is also performed on Gram kernel matrix to preserve data variance.
In conclusion, the objective of MIDA is simultaneously reducing domain influence (minimizing the HSIC criterion) and preserving variance (maximizing the trace of data covariance matrix).
Considering SD label information, the semi-supervised method named semi-supervised maximum independence domain adaptation (SMIDA) is built on MIDA. The idea is similar to SSTCA, videlicet, HSIC is applied again to estimate the dependence between samples and their category labels. As a result, in addition to reducing domain influence and preserving data variance, SMIDA needs to reduce the empirical error on SD labeled data.
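The one-hot domain features and the feature augmentation step described above can be sketched as follows (a minimal NumPy illustration for the two-domain case; the function names are ours, not MIDA's):

```python
import numpy as np

def domain_features(n_source, n_target):
    """One-hot domain labels: [1, 0] for every SD sample and
    [0, 1] for every TD sample (two domains assumed)."""
    d_s = np.tile([1.0, 0.0], (n_source, 1))
    d_t = np.tile([0.0, 1.0], (n_target, 1))
    return np.vstack([d_s, d_t])

def augment(X, D):
    """Feature augmentation: append the domain features to the original
    features, enlarging the input dimension before DA."""
    return np.hstack([X, D])

# 3 SD samples + 2 TD samples with 9 features each
X = np.random.default_rng(1).normal(size=(5, 9))
D = domain_features(3, 2)
X_aug = augment(X, D)
print(X_aug.shape)  # (5, 11)
```

A kernel matrix built on X_aug and a kernel matrix built on D are then the two inputs to the HSIC evaluation.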

PolSAR Data Description
In general, a PolSAR sensor alternately transmits and receives horizontally and vertically polarized electromagnetic waves. In each resolution cell, PolSAR data is represented in brief as a 2 × 2 Sinclair matrix

S = [ S_HH  S_HV
      S_VH  S_VV ],

where all the items in the Sinclair matrix are complex backscattering coefficients, the symbol "H" indicates horizontal polarization, and "V" indicates vertical polarization. Obviously, the matrix S contains abundant scattering information in different polarization state combinations, which is related to the sizes, orientations and dielectric properties of the observed targets in the resolution cell. The reciprocity principle (S_HV = S_VH) is satisfied in most cases, and therefore the Sinclair matrix can be equivalently vectorized as a 3 × 1 complex Lexicographic vector Ω = [S_HH, √2·S_HV, S_VV]^T. The superscript "T" represents the transpose operation.
Because distributed targets vary with time or space and always show stochastic behaviour in SAR images, the second-order statistics of the Lexicographic vector are more suitable for describing these targets than the vector itself. In practice, the covariance matrix of Ω is adopted more often:

C = ⟨Ω Ω^H⟩,    (4)

where the 3 × 3 Hermitian matrix C is known as the polarimetric covariance matrix, the superscript "H" in equation (4) denotes the conjugate transpose operation, and the angle brackets denote the ensemble average operation. There are in total three real-valued diagonal elements and six complex-valued off-diagonal elements in this matrix, but only nine real-valued variables are mutually independent. As the vector form widely serves as the input of DA and classification algorithms, the 9 × 1 vector consisting of the nine independent real-valued variables will be used as the feature descriptor of PolSAR targets and be input into several DA and classification models in Section 3. It is worth mentioning that an alternative input form is the magnitudes and phase angles of the elements in C; however, we have not observed stable and better experimental results with it. For simplicity, we skip the relevant content in the following part.
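Under the convention stated above (three real diagonal elements plus the real and imaginary parts of the three independent upper-triangular off-diagonal elements), the 9 × 1 feature descriptor can be extracted as in this sketch (the exact element ordering used in the paper is an assumption on our part):

```python
import numpy as np

def covariance_to_features(C):
    """Flatten a 3x3 Hermitian polarimetric covariance matrix into the
    9 mutually independent real-valued variables: the 3 real diagonal
    elements, then the real parts and imaginary parts of the 3
    upper-triangular off-diagonal elements."""
    assert np.allclose(C, C.conj().T), "C must be Hermitian"
    diag = np.real(np.diag(C))            # C11, C22, C33
    off = [C[0, 1], C[0, 2], C[1, 2]]     # independent off-diagonals
    return np.concatenate([diag, np.real(off), np.imag(off)])

# Toy Hermitian covariance matrix built from a random scattering vector
rng = np.random.default_rng(0)
k = rng.normal(size=3) + 1j * rng.normal(size=3)
C = np.outer(k, k.conj())
v = covariance_to_features(C)
print(v.shape)  # (9,)
```

Each pixel (after ensemble averaging over a local window) would yield one such 9-dimensional real vector.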

Scatter Matrix Based Domain Adaptation
Two DA algorithms and their two semi-supervised extensions have already been introduced above. It is easy to see that TSL takes both inter-domain influence reduction and intra-domain information preservation into account in the DA process. SD data is able to guide TD classification only if, to some extent, the data consistency across domains and the key information integrity in each domain can be guaranteed. Given historical labeled samples, our emphasis in this paper is on post-temporal supervised classification, so the category information preservation of the SD data is pivotal. However, as semi-supervised TSL extensions, SSTCA and SMIDA both put unsupervised structural information first, and put empirical classification error reduction in second place.
Starting from category information preservation, a novel semi-supervised TSL algorithm named scatter matrix based domain adaptation (SMbDA) is proposed in this subsection. During the process of eliminating domain influence, this algorithm gives priority to keeping the category separability in the SD, and then preserves the structural information in both domains. Different from the previous TSL methods, SMbDA prefers to investigate category distinction and thus intuitively benefits the subsequent TD classification. The objective function F of SMbDA consists of three parts:

F(Y) = α·F_S + β·F_U − F_DA,    (5)

where F_S, F_U and F_DA are respectively the supervised information preservation term, the unsupervised information preservation term, and the domain adaptation term. α and β are trade-off hyperparameters, and both of them need to be nonnegative numbers. Y is the dataset generated via DA processing. As the feature descriptor of PolSAR targets is a 9 × 1 vector, the feature dimension d of the samples in Y is less than 9.
The integrated DA effect can be evaluated by (5). To promote the nonlinear mapping ability, SMbDA employs a kernel trick similar to TCA and MIDA. First, let a mapping function φ(·) map X into an extremely high, or even infinite, dimensional RKHS, leading to the implicit dataset Φ = [φ(x_1), …, φ(x_N)]. The corresponding Gram kernel matrix is

K_G = [ K_{S,S}  K_{S,T}
        K_{T,S}  K_{T,T} ].    (6)

Equation (6) shows that the Gram kernel matrix includes four block matrices. The diagonal ones are conventional kernel matrices built on a single domain and have been widely used in kernel-based machine learning models [24]; the off-diagonal ones are cross-domain kernel matrices. In the next step, a linear dimensionality reduction is performed on Φ with a projection matrix U, and thus Y = U^T·K_G. Obviously, it does not matter that the explicit form of the function φ is undefined, as Y is related only to the Gram kernel matrix and the projection matrix. Once the inner product operation, also known as the kernel mapping function, is selected, the matrix K_G is determinate. As a consequence, Y changes only when U changes, and hence (5) can be rewritten as a function of U:

F(Y) = F(U).    (7)

The optimal U can be obtained by maximizing (7), and later Y is generated based on U. When an out-of-sample x comes, the inner product of φ(x) and each φ(x_i) in Φ should be calculated first, and then the corresponding projection vector of x is obtained from the product result of the previous step and the projection matrix U. In order to avoid trivial solutions, some specific constraints need to be added, e.g., the orthogonal constraint U^T·U = I. From the above, the objective of SMbDA is

max_U F(U)  s.t.  U^T·U = I.    (8)

The remainder of this subsection will describe the main components F_S, F_U and F_DA in detail.
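A minimal sketch of the dual-domain Gram matrix and the kernelized projection Y = U^T·K_G might look like this (Gaussian RBF assumed as the kernel mapping function; U is random here purely to illustrate the shapes, not a learned projection):

```python
import numpy as np

def gaussian_rbf(a, b, sigma=1.0):
    """Pairwise Gaussian RBF kernel between the rows of a and b."""
    d = a[:, None, :] - b[None, :, :]
    return np.exp(-sigma * np.sum(d * d, axis=2))

def gram_block(Xs, Xt, sigma=1.0):
    """Dual-domain Gram matrix (equation (6)): the diagonal blocks are
    single-domain kernel matrices, the off-diagonal blocks are
    cross-domain kernel matrices."""
    X = np.vstack([Xs, Xt])
    return gaussian_rbf(X, X, sigma)

rng = np.random.default_rng(0)
Xs, Xt = rng.normal(size=(4, 9)), rng.normal(size=(3, 9))
K = gram_block(Xs, Xt)
print(K.shape)   # (7, 7)

# Projection to a d-dimensional subspace: Y = U^T K, so Y depends only
# on the Gram matrix and on U
U = rng.normal(size=(7, 2))
Y = U.T @ K
print(Y.shape)   # (2, 7)
```

Each column of Y is the low-dimensional representation of one sample; an out-of-sample is projected by computing its kernel values against all training samples and multiplying by U.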

Supervised Information Preservation
The difficulty level of object identification depends on category separability, which is decided by both the inter-category scatter and the intra-category scatter. Linear discriminant analysis (LDA) [47] maximizes the former and minimizes the latter using a trace ratio operation. Instead of the trace ratio, maximum margin criterion (MMC) [48], inspired by the well-known SVM classifier, adopts a trace difference operation to achieve a similar goal. In comparison to LDA, MMC avoids the rank defect problem and therefore improves the robustness of the solutions. Here we introduce this criterion and further generalize it to the DA scope.
Consistent with the previous works, the SD sample set X_S, the SD label set C_S and the TD sample set X_T are given, but there is no TD label set in our semi-supervised setting. Denote the separability between the i-th and j-th categories as J_ij; then the total category separability J is a weighted sum:

J = (1/2)·Σ_i Σ_j P_i·P_j·J_ij,    (9)

where P_i and P_j are the prior probabilities of the i-th and j-th categories. Because the prior distributions of the categories are all unknown, these probabilities are assumed to be equal, so P_i = P_j = 1/N_c. The factor 1/2 in equation (9) balances the total separability, as separability is symmetric, i.e., J_ij = J_ji.
J is an indicator to judge whether the labeled samples are easy to classify. In other words, it is able to evaluate the effectiveness of the supervised information in a certain feature space.
The separability J_ij needs to comprehensively investigate inter- and intra-category dispersions.
As for the generated SD sample set Y_S, a natural definition of J_ij is the squared distance between the i-th and j-th category centers minus the traces of the two intra-category scatter matrices (equations (10)-(11)). The squared distance can be expanded with trace identities (equation (12)), yielding two terms. Using the relations between the category centers and the overall sample center of Y_S (equations (13) and (14)), the first term in (12) can be derived as Tr(S_B), which is basically consistent with the total inter-category scatter in LDA; based on graph embedding [49], we can directly use the matrix description of S_B (equation (15)). On the other hand, the second term in (12) can be expanded as Tr(S_W) (equation (16)), which is basically consistent with the total intra-category scatter in LDA; similar to (15), a matrix description of S_W is available. Taking (15) and (16) into account, the total category separability is

J = (1/N_c)·Tr(S_B − S_W).    (17)

The capacity of supervised information preservation of Y_S can be evaluated by J. Besides, the connection between Y and J should be built in the DA learning scenario. We put Y into (17), and accordingly adopt two generalized matrices Ŝ_B and Ŝ_W instead of S_B and S_W. Finally, we obtain the first subfunction F_S of SMbDA based on MMC.

F_S = Tr(Ŝ_B − Ŝ_W),    (18)

where the generalized matrices Ŝ_B and Ŝ_W (equations (19) and (20)) extend S_B and S_W to the dual-domain data by padding the TD-related blocks with O, a matrix with all zero elements, because only the SD samples are labeled. Compared with (17), the scaling constant 1/N_c is omitted in (18). It is easy to see that J and F_S are equivalent, except for the difference of input modes.
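For intuition, the trace-difference criterion Tr(S_B) − Tr(S_W) on labeled samples can be computed as in the sketch below (a plain, single-domain MMC illustration, before the generalization to Ŝ_B and Ŝ_W and the kernelization):

```python
import numpy as np

def mmc_separability(Y, labels):
    """Maximum margin criterion on labeled samples:
    Tr(S_B) - Tr(S_W), the trace difference between the inter-category
    and intra-category scatter matrices."""
    mean_all = Y.mean(axis=0)
    Sb = np.zeros((Y.shape[1], Y.shape[1]))
    Sw = np.zeros_like(Sb)
    for c in np.unique(labels):
        Yc = Y[labels == c]
        mc = Yc.mean(axis=0)
        Sb += len(Yc) * np.outer(mc - mean_all, mc - mean_all)
        Sw += (Yc - mc).T @ (Yc - mc)
    return np.trace(Sb) - np.trace(Sw)

# Two well-separated toy categories give a large positive criterion
rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
print(mmc_separability(Y, labels))
```

A subspace that increases this value pulls the category centers apart while compressing each category, which is exactly what the F_S term rewards.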

Unsupervised Information Preservation
Learning the projection matrix U by only preserving category separability is not enough. If the fundamental structural information of the dual-domain samples is distorted after projection, the subsequent classification inevitably becomes difficult. However, in most cases only the SD samples are labeled, so unsupervised information preservation is also necessary. Considering the high simplicity and practicability of PCA, the proposed SMbDA aims at maximizing the data variance, and the second subfunction F_U is accordingly the trace of the covariance matrix of the projected data Y (equation (21)).

Domain Influence Reduction
Besides investigating the supervised and unsupervised information of the dual-domain data, the domain influence ought to be reduced. As mentioned in Section 1, one of the most important goals in the DA field is to approximately hold the i.i.d. condition. That is to say, after the projection in RKHSs, it should look as if the SD and TD data were drawn from the same distribution. But this requirement is always hindered by inter-domain variable factors. Therefore, we consider achieving this goal by reducing the dependency between data and domains. If the projected data Y are independent of the relevant domains, the domain to which any sample in Y belongs cannot be distinguished, and thus, in the specific feature space, the inter-domain discrepancy is diminished.
As HSIC is a simple and nonparametric approach to estimating the dependency between two sets, we employ it to measure the dependency between Y and the domain features. Similar to MIDA, the domain features are defined in the one-hot encoding form [37] and are used to describe the background information of samples. When there are only one SD and one TD, the domain feature d_i of a sample x_i is shown in equation (1), and the domain feature set is D = [d_1, …, d_N]^T. Using the linear kernel function, the kernel matrix of the domain features is K_D = D·D^T. As K_G can be centered in advance, we can also omit the centering matrix H and the scaling factor in the HSIC definition (22). As a consequence, the last subfunction F_DA of SMbDA can be written as:

F_DA = Tr(U^T·K_G·K_D·K_G·U).    (23)

From all the above, our SMbDA aims at maximizing the category separability and the data variance, and simultaneously minimizing the domain dependence, by a linear projection in the RKHS, combining the three equations (18), (21) and (23).
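The HSIC dependence between data and one-hot domain features can be sketched as follows (here we keep the centering matrix H and the scaling factor, which the derivation above omits after pre-centering K_G):

```python
import numpy as np

def hsic(K, L):
    """Empirical HSIC between two kernel matrices K and L:
    Tr(K H L H) / (n - 1)^2, with H the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Linear kernels on data and on one-hot domain features
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(30, 5))
Xt = rng.normal(3.0, 1.0, size=(30, 5))   # strong domain shift
X = np.vstack([Xs, Xt])
D = np.vstack([np.tile([1.0, 0.0], (30, 1)),
               np.tile([0.0, 1.0], (30, 1))])
K, L = X @ X.T, D @ D.T
print(hsic(K, L))                    # large: data depends on the domain

X_mixed = rng.permutation(X)         # destroy the data-domain link
print(hsic(X_mixed @ X_mixed.T, L))  # much smaller
```

Minimizing this quantity over the projection, as F_DA does, drives the projected samples toward domain-indistinguishability.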

Under the orthogonal constraint, the three subfunctions share the common form Tr(U^T·M·U), so the overall objective becomes a single trace maximization:

max_U Tr(U^T·M·U)  s.t.  U^T·U = I,    (24)

where M denotes the matrix assembled from the three subfunctions (18), (21) and (23). Solving (24) is equivalent to eigen-decomposing M. The eigenvectors corresponding to the d largest eigenvalues are the column vectors in U.
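The eigen-solution can be sketched generically: for any symmetric matrix M, the trace maximization under the orthogonality constraint is solved by the d leading eigenvectors (M below is a random positive semi-definite stand-in, not the actual SMbDA matrix):

```python
import numpy as np

def top_d_eigenvectors(M, d):
    """Solve max Tr(U^T M U) s.t. U^T U = I by taking the eigenvectors
    of the (symmetrized) matrix M with the d largest eigenvalues."""
    M = (M + M.T) / 2                      # guard against asymmetry
    w, V = np.linalg.eigh(M)               # eigenvalues in ascending order
    return V[:, np.argsort(w)[::-1][:d]]   # columns = leading eigenvectors

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
M = A @ A.T                                # symmetric PSD test matrix
U = top_d_eigenvectors(M, 2)
print(U.shape)                             # (6, 2)
print(np.allclose(U.T @ U, np.eye(2)))     # orthonormal columns
```

The achieved objective value Tr(U^T·M·U) then equals the sum of the d largest eigenvalues, which is the maximum attainable under the constraint.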

Wishart-Based Radial Basis Function
The selection of the kernel mapping function has a great influence on the algorithm performance. The Gaussian radial basis function (RBF) is a widely-used kernel function in image processing, defined as

RBF(a, b) = exp(−σ·‖a − b‖²),

where a and b are two arbitrary real-valued vectors, and σ is a smoothing parameter that should be a positive number. This function is the exponent of the negative weighted square of the distance between feature vectors. It is evident that a suitable distance measure benefits the potential of the RBF. In our previous work [1], a Wishart distribution-derived dissimilarity measure dm has been used to build a simple classification model, which achieves better experimental results than the classical Wishart classifier [22] and several mainstream models. We believe that this measure, substituted for the Euclidean distance above, is helpful to build a new RBF that is more suitable for PolSAR data. The symmetry property of dm indicates that W-RBF is a positive semi-definite function, so W-RBF meets the Mercer kernel theorem. As dm is derived from the Wishart distribution, W-RBF is hereinafter named the Wishart-based RBF. In general, the SMbDA algorithm is easy to implement and can be summarized as follows.

Algorithm 1. SMbDA
Input: SD and TD sample sets X_S, X_T, and SD label set C_S
Output: projection matrix U
Step 1. Define the domain feature of each sample based on (1) and form the domain feature matrix D
Step 2. Construct the Gram kernel matrix K_G based on (6) (Wishart-based RBF is recommended)
Step 3. Normalize K_G by centering
Step 4. Construct the two scatter-related matrices Ŝ_B and Ŝ_W based on (19) and (20)
Step 5. Calculate the kernel matrix K_D of the domain features
Step 6. Eigen-decompose the combined matrix in (24)
Step 7. Select the d leading eigenvectors to construct the projection matrix U
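The two RBFs differ only in the plugged-in distance, which can be sketched with a pluggable-distance kernel (the Euclidean distance below is a stand-in; the exact dm formula is given in [1] and is not reproduced here):

```python
import numpy as np

def rbf(dist, sigma=1.0):
    """Generic RBF: the exponent of a negative weighted squared distance.
    With the Euclidean distance this is the Gaussian RBF; plugging in the
    Wishart-derived dissimilarity dm of [1] instead would yield the
    Wishart-based RBF. sigma must be positive."""
    return np.exp(-sigma * dist ** 2)

def euclidean(a, b):
    """Stand-in distance; replace with dm for PolSAR covariance data."""
    return np.linalg.norm(a - b)

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(rbf(euclidean(a, a)))   # identical inputs -> kernel value 1
print(rbf(euclidean(a, b)))   # farther apart -> smaller kernel value
```

As long as the substituted dissimilarity is symmetric and yields a positive semi-definite kernel matrix, the resulting function remains a valid Mercer kernel for K_G.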

Relationship with Other Methods
Leaving the difference of kernel mapping functions aside, the proposed SMbDA algorithm is closely related to a series of dimensionality reduction and TSL methods:
- If the SD and TD data are regarded as a whole without considering the inter-domain discrepancy, the Gram kernel matrix K_G degrades into a traditional kernel matrix, and accordingly the unsupervised information preservation term degrades into the objective function of standard PCA in kernel spaces. Then, if we set α = 0, SMbDA is the same as kernel PCA.
- If we only pay attention to the SD samples, SMbDA further simplifies to a kernel-based combination of MMC and PCA, which can be seen as a semi-supervised dimensionality reduction algorithm. We use the two core matrices S_B and S_W to capture scatter information and preserve category separability. Originally, the two matrices were used in LDA and kernel LDA; in this paper, they are generalized and reused for the dual-domain kernel matrix. In the source domain, our SMbDA is similar to the idea in [51], in which a combination of local LDA and PCA was discussed.
- As the inter- and intra-category scatter matrices have been reformed in [49], the proposed algorithm has an implicit relationship with the graph embedding framework. From this perspective, the domain kernel term can be regarded as a special Laplacian matrix.
- SMbDA, TCA and MIDA have some points in common. All three algorithms use the covariance matrix of the data to keep unsupervised information, and try to reduce the negative cross-domain influence. However, TCA, MIDA and their semi-supervised extensions primarily consider unsupervised information. On the contrary, SMbDA primarily makes full use of label information to keep category separability, which makes it more beneficial for classification in theory. Besides, SMbDA avoids the matrix inversion operation when solving for the projection matrix, and is thus more efficient than TCA and SSTCA.

Experimental Datasets and Parameter Settings
As TD samples are usually unlabeled in practice, it is very difficult to transfer valuable information from SD to TD. Therefore, two groups of experiments have been conducted in this section. The first group is based on a multi-temporal PolSAR dataset obtained by the airborne UAVSAR system in Winnipeg, Canada; the time intervals are only several days, so the difficulty of DA and classification is relatively low. The second group is based on another multi-temporal PolSAR dataset obtained by the Radarsat-2 satellite in Erguna, China; the time intervals are as long as several months and the spatial distribution of objects was very different across the time frames, so this group is much closer to reality. In both groups, a discriminant analysis classifier (DAC) was selected as our classification model, and we investigated the classification performance when applying different DA algorithms, including TCA, SSTCA, MIDA, SMIDA and SMbDA. On the one hand, the Gaussian RBF was used for all five algorithms to compare their effectiveness; on the other hand, the Wishart-based RBF was also used for SMbDA to compare the effectiveness of the different RBFs. The SMbDA model with the Wishart-based RBF is called WSMbDA for short. Our goal is to use the classifier trained with historical labeled samples to classify the samples in a new temporal image. So in the training phase, the samples in X_S and X_T were picked randomly, and only the labels of the SD samples were given to the DAC and the DA methods. In the test phase, many TD out-of-samples were projected into the learned subspaces and then classified. The sample selection, domain adaptation and classification steps were repeated 10 times to obtain reliable performance estimates. Before the training phase, we picked some labeled TD samples to compare the classification results under different hyperparameter values and projection dimensions, and finally decided the optimal parameters for the subsequent experiments.
In our experiments, the search strategy refers to [36,37]. For the proposed SMbDA and WSMbDA, we first fixed α = 1 and β = 1e−4, and searched for the best kernel parameter σ in [10⁻⁶, 10]. Afterwards, we fixed σ and searched for the best α value in [0, 10]. Finally, both σ and α were fixed and we searched for the best β value in [0, 10]. The big difference between the initial values of α and β comes from the assumption that, in a cross-domain classification task, supervised information is more likely to be helpful than unsupervised information. As the ranges of the hyperparameters are continuous, logarithmic sampling was implemented. The same strategy was applied to TCA, SSTCA, MIDA and SMIDA, in order to fairly evaluate and compare these algorithms.
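The sequential search strategy can be sketched as a coordinate-wise grid search (the symbols sigma, alpha and beta follow our reading of the objective; the score function below is a synthetic stand-in for validation accuracy):

```python
import numpy as np

def sequential_search(score, sigma_grid, alpha_grid, beta_grid,
                      alpha0=1.0, beta0=1e-4):
    """Coordinate-wise grid search: fix alpha and beta at their initial
    values and tune the kernel parameter sigma; then fix sigma and tune
    alpha; finally tune beta."""
    sigma = max(sigma_grid, key=lambda s: score(s, alpha0, beta0))
    alpha = max(alpha_grid, key=lambda a: score(sigma, a, beta0))
    beta = max(beta_grid, key=lambda b: score(sigma, alpha, b))
    return sigma, alpha, beta

# A toy unimodal score with a known optimum, just to exercise the search
def score(sigma, alpha, beta):
    return -(np.log10(sigma)) ** 2 - (alpha - 2.0) ** 2 - (beta - 5.0) ** 2

sigma_grid = np.logspace(-6, 1, 8)     # logarithmic sampling of [1e-6, 10]
alpha_grid = np.linspace(0, 10, 11)
beta_grid = np.linspace(0, 10, 11)
print(sequential_search(score, sigma_grid, alpha_grid, beta_grid))
```

In the actual experiments, the score would be the DAC accuracy on the small set of labeled TD samples held out for parameter selection.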

Experiments on UAVSAR Dataset
Because the time intervals are very short, the categories of objects in the three images have not changed, so the correlation between any two images is very strong, and the difficulty of DA is relatively low. Although it seems that DA has no practical application significance for this dataset, it can still help us test DA effects under ideal conditions. Moreover, since most of the crops in this area were in their growing stage, there are indeed some backscattering differences within the same category at different times. The classification maps generated by the different methods are shown in figures 2-7. By comparing the classification maps with the ground truth map, it is found that the overall DA effects between Domain A and Domain B are significantly better than those between Domain A and Domain C, and between Domain B and Domain C. The reason is that the data distribution discrepancy between Domain A and Domain B is very small (the time interval is just 3 days), but the discrepancies between Domain A and Domain C, and between Domain B and Domain C, are larger (the time intervals are 12 days and 9 days). Furthermore, by comparing the DA effect between Domain A and Domain C with that between Domain B and Domain C, we find the latter is better, because the imaging times of Domain B and Domain C are closer. This also proves that the inter-domain correlation directly affects the difficulty level of DA, which is consistent with our intuition.

Experiments on Radarsat-2 Dataset
The ground truth of the Radarsat-2 dataset is shown in figure 11(a). There are five main types of ground objects in the imaging area: wheat, rapeseed, birches, shrubs and waterbody. As two kinds of crops, wheat and rapeseed both exhibit different scattering characteristics at different growth stages. The backscattering of birches and shrubs also varies with season, local incidence angle and other factors. All these conditions have an adverse impact on cross-domain learning. Moreover, the spatial distribution of crops also changed from 2012 to 2013. Therefore, this dataset can serve as typical verification data for testing the performance of DA algorithms. In this subsection, we conducted three challenging tasks: A->B, A->C and A->D, i.e., only the labeled samples acquired in 2012 were used to classify the unlabeled samples acquired in 2013. 200 samples per category were randomly selected in the training phase. The classification maps generated by the different methods are shown in figures 9-11. Obviously, the three tasks are much more difficult than those in the previous subsection. Because the time intervals in the UAVSAR dataset are less than two weeks, its distribution discrepancies are not very large; even if we skip the DA step and directly use DAC to classify TD samples, the classification maps shown in figures 2-7(b) are still partly acceptable. In contrast, without any DA processing, the classification maps generated directly by DAC in figures 9-11(b) are almost completely wrong. Fortunately, all the DA methods take the inter-domain distribution discrepancy into account. As a result, the classification results shown in figures 9-11(c)-(h) show significant improvements to different degrees. Besides, the performance of direct classification in the last task is significantly better than in the first two, and a similar tendency is observed for the other methods as well. The reason is that the backscattering of the same categories can be relatively similar in adjacent months. The imaging times in the last task satisfy this condition (Domain A: September, Domain D: August).
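The training-set construction above (200 randomly selected samples per category from the labeled source-domain image) is a standard stratified draw. A minimal sketch, with the function name and NumPy-based implementation being our own rather than the paper's, could look like this:

```python
import numpy as np

def stratified_sample(labels, per_class=200, seed=0):
    """Randomly draw a fixed number of pixel indices per category,
    mirroring the 200-samples-per-category training setup."""
    rng = np.random.default_rng(seed)
    idx = []
    for c in np.unique(labels):
        pool = np.flatnonzero(labels == c)
        take = min(per_class, pool.size)  # guard against small classes
        idx.append(rng.choice(pool, size=take, replace=False))
    return np.concatenate(idx)
```

Sampling without replacement per class keeps the training set balanced across the five land-cover types, so no category dominates the supervised term of the DA objective.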

Discussion
In this part, two quantitative indices, overall accuracy (OA) and the Kappa coefficient (Kappa), are selected to evaluate the performances of the different DA algorithms. The precision evaluation results of the two experiments are listed in Tables 1 and 2. The results given in Table 1 are consistent with our findings in Section 3. In particular, WSMbDA further improves the performance of SMbDA in most cases and obtains the best results in general, which verifies the superiority of the Wishart-based RBF. The well-designed DA model SMbDA, coupled with the suitable kernel mapping function, is able to achieve an average OA value of more than 80% and an average Kappa value of more than 0.75. Table 2 reveals an interesting phenomenon: although the two evaluation indices of each method (except DAC) in the last task are very high, a large proportion of wheat was mistakenly classified into the rapeseed category in figure 11(c)-(g), resulting in the disappearance of blue areas in these classification maps. In contrast, although the overall performances of most DA methods are poor in the first two tasks, the blue areas of wheat still exist in figures 9 and 10. This is because the two kinds of crops are both mature in the third task, so the volume scattering components of both are large, which causes confusion between wheat and rapeseed. In addition, the wavelength of the C-band microwave used by Radarsat-2 is short and its penetrability is accordingly weak, which further aggravates the above dilemma. As a consequence, most DA methods fail to preserve the backscattering differences between the two crops. This situation would change with a longer wavelength. As seen from Table 2, the direct DAC classification results are very poor: the OA values are only 11%-20% and the Kappa values are around zero. However, WSMbDA always performs well. Especially in the task Domain A -> Domain D, WSMbDA is still able to accurately distinguish the main categories.
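For reference, the two indices used above can be computed directly from a confusion matrix. The following is a generic sketch of OA and Cohen's Kappa, not the authors' evaluation code:

```python
import numpy as np

def overall_accuracy_and_kappa(y_true, y_pred):
    """Compute OA and the Kappa coefficient from reference and predicted labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    k = classes.size
    # Confusion matrix: rows = reference labels, columns = predictions.
    cm = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(np.searchsorted(classes, y_true),
                    np.searchsorted(classes, y_pred)):
        cm[t, p] += 1
    n = cm.sum()
    oa = np.trace(cm) / n
    # Expected chance agreement from the row/column marginals.
    pe = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
    kappa = (oa - pe) / (1 - pe)
    return oa, kappa
```

Because Kappa discounts chance agreement, it drops toward zero when a classifier collapses most pixels into one category, which is why the direct DAC results show near-zero Kappa even when some OA remains.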

Conclusions
With the rapid growth of remotely sensed data volume, inefficient in-situ surveys will limit classification timeliness in the near future. Domain adaptation helps to adapt pre-existing data to new tasks, which provides a potential way to deal with this problem. In this paper, a novel semi-supervised domain adaptation algorithm, named scatter matrix based domain adaptation, has been proposed to transfer and share valuable information between bi-temporal PolSAR images. Different from previous methods, the proposed algorithm pays more attention to supervised information preservation and is hence very helpful for supervised classification tasks. Empirical results have demonstrated that, after applying it, superior post-temporal classification maps can be obtained by a simple classifier trained with labeled samples from pre-temporal PolSAR imagery. Moreover, the performance of this algorithm can be further improved by the use of the Wishart-based kernel mapping function. Apart from time series image processing, we believe the proposed algorithm also has the potential to adapt cross-regional PolSAR images. However, how to determine the hyperparameter values is still an open issue, and we plan to design an adaptive hyperparameter selection strategy in the future. Besides, this paper mainly focuses on the situation of a single pre-existing source domain; we would like to generalize the proposed algorithm and make it suitable for multiple domains.