Quantifying Long-Term Land Surface and Root Zone Soil Moisture over Tibetan Plateau

It is crucial to monitor the dynamics of soil moisture over the Tibetan Plateau, while considering its important role in understanding the land-atmosphere interactions and their influences on climate systems (e.g., Eastern Asian Summer Monsoon). However, it is very challenging to have both the surface and root zone soil moisture (SSM and RZSM) over this area, especially the study of feedbacks between soil moisture and climate systems requires long-term (e.g., decadal) datasets. In this study, the SSM data from different sources (satellites, land data assimilation, and in-situ measurements) were blended while using triple collocation and least squares method with the constraint of in-situ data climatology. A depth scaling was performed based on the blended SSM product, using Cumulative Distribution Function (CDF) matching approach and simulation with Soil Moisture Analytical Relationship (SMAR) model, to estimate the RZSM. The final product is a set of long-term (~10 yr) consistent SSM and RZSM product. The inter-comparison with other existing SSM and RZSM products demonstrates the credibility of the data blending procedure used in this study and the reliability of the CDF matching method and SMAR model in deriving the RZSM.


Introduction
Soil moisture is a crucial land state variable that plays a critical role in energy partitioning and land-atmosphere interactions [1][2][3]. Comprehensions of soil moisture dynamics and trends can facilitate the identification of the interaction and feedbacks between the hydrological cycle and climate system [1,2,4,5]. In this context, the Tibetan Plateau represents as a natural laboratory, because it is an essential water tower in Asia and has significant effects on the Asian monsoon system, the atmospheric circulation, and the climate patterns [6]. Quantifying land surface soil moisture (SSM) and root zone soil moisture (RZSM) is indispensable for investigating the water and energy balance in the land-atmosphere system and the corresponding trends of climate change over the Tibetan Plateau. The observable climate change at the plateau scale can reshape the local environment and influence the hydrological cycles [7,8], but the systematic knowledge of the SSM and RZSM over the plateau are still limited [1,6,9].
There are three primary sources for the retrieval of soil moisture estimates, including in-situ measurements, satellite observations, and model simulations [10]. Several in-situ soil moisture (include AMSRE, AMSR2, SMOS, SMAP, and ESA-CCI products). Each dataset is available over different time spans, as displayed in Table 1.  The Tibetan Plateau Observatory (Tibet-Obs) [1] includes SSM and RZSM measurements in Naqu semi-arid network (five stations), Maqu sub-humid network (twenty stations), and Ali-Shiquanhe arid network (twenty stations), which are using ECH2O probes to measure the dielectric permittivity and obtain volumetric soil moisture content with calibration equations. The averaged SSM and RZSM data of each network were used to produce the in-situ data reference product while using the climate zone classification based on FAO aridity index map. As a result, the averaged in-situ measured SSM and RZSM values of Naqu network are used to indicate SSM and RZSM in the semi-arid regions in FAO Aridity Index map, and Maqu network corresponds to the sub-humid area, and Ali and Shiquanhe both correspond to the arid area [2]. Figure 1 presents the location of Tibetan Plateau and the distribution of Tibet-Obs networks.

Satellite Datasets
All of the satellite data and reanalysis data in this research were resampled to a resolution of 0.25 • , which is often adopted by land surface models. This is a trade-off between the higher resolution scatterometer data and the generally coarse passive microwave observations. The nearest-neighbor interpolation was used in the resampling step [34].
The VUA-NASA SSM product (AMSR-E) was derived from the Advanced Microwave Scanning Radiometer-Earth Observing System [15,35] while using the Land Parameter Retrieval Model. The SSM was derived from a forward radiative transfer model using a nonlinear iterative optimisation method and the microwave polarisation difference index of the brightness temperature. The SSM products that were used in this research were derived from C-band, as it has less signal attenuation from atmosphere and vegetation and less radio-frequency interference than the X-band [36,37].
The AMSR-2 SSM product is derived from the Advanced Microwave Scanning Radiometer 2 loaded on the Water GCOM-W1 satellite that was launched by the Japan Aerospace Exploration Agency [38]. The AMSR-2 is a follow-up of AMSR-E, since 3 July 2012 to provide a long-term satellite observation [39]. The data retrieval algorithm is similar to AMSR-E. Only the descending mode (01:30 a.m. local time) AMSR-E and AMSR-2 products were used while considering the limited uncertainties that were caused by temperature variations during night-time.
The SMOS product that was used in this research was the Level 3 descending SSM product derived by France National Centre for Space Studies (CNES). As one of the Earth Explorer Opportunity Missions from the European Space Agency (ESA), the SMOS satellite was launched with a 1.4 GHz L-Band radiometer on 2-November-2009 [40]. The L-band microwave emission is obtained from a zero-order radiative transfer equation while using the L-MEB biosphere model [41]. The multiangular observation helped SMOS to simultaneously obtain the SSM and ancillary information [41,42].
The SMAP product used in this research is the SMAP L3 Radiometer Global Daily data of descending orbit retrieved from L-band (1.41 GHz) brightness temperature of the passive microwave radiometer via the National Snow and Ice Data Center. NASA launched the SMAP satellite on 31-January-2015, and the observation target is a volumetric accuracy of 0.04 m 3 ·m −3 in the top layer of the land surface every two to three days. The tau-omega model was operated after the open water area was corrected [43].
The ESA Climate Change Initiative (ESA-CCI) [44][45][46] ACTIVE products are merged by two of the SSM products that were retrieved from the Advanced Scatterometer (ASCAT), which loaded on the Meteorological Operational (METOP-A and METOP-B) satellites [16], using the change detection method [16,23]. The original SSM products are provided as saturation degrees (0%-100%), and they were used to multiply porosity values that were provided by ESA-CCI [47] over the Tibetan Plateau to yield the volumetric soil moisture content.

Reanalysis Dataset
ERA-Interim SSM and RZSM product (ERA-Interim) is a part of the Land Data Assimilation System that was produced by the European Centre for Medium-Range Weather Forecasts [17]. The volumetric soil moisture content was produced at four layers, and the daily average soil moisture of the first layer (0-7 cm) is the one to be used in this research as a reference SSM dataset in climatology scaling before data blending [33]. Additionally, the daily average value of the root zone layer is used for comparison with the RZSM products generated in this research.
NASA and the National Oceanic and Atmospheric Administration (NOAA) developed the Global Land Data Assimilation System (GLDAS) [19]. It applies the land surface models with meteorological observations as the forcing. The data assimilation approaches are used to adjust the model states. The GLDAS daily SSM and RZSM data were compared with the SSM and RZSM products in this research.

Methods Description (a) Cumulative Distribution Function (CDF) Matching
The CDF matching approach has been widely used for removing the systematic differences between two series, such as bias reduction in satellite-observed SSM [31][32][33]48]. The method can also be used to transfer the data of different areas [25] and upscale the point data measurements [26]. To implement CDF matching, the first step is ranking the reference, and the to-be scaled data. Second, calculate the differences between the corresponding data of two ranked datasets and then perform linear regression analysis segment by segment on the CDF curves of the calculated differences and the to-be scaled data. Lastly, use the CDF matching parameters to scale the to-be scaled data for each segment [49]. The scaled data would show a similar pattern with the CDF curve of the reference data, which indicates that the systematic difference between the original observation and the reference data has been eliminated. It is worth noting that, among four conventional scaling methods (linear regression, linear rescaling, MIN/MAX correction, and CDF matching), the CDF matching shows the best performance [48], and it only requires one year data to calibrate [33].
The objective of least squares method is determining optimal weights for merging two or three independent datasets while using a weighted average method. It is one of the most widely used data assimilation methods, which has been used in numerous studies since it was shaped into the current form. It was used to blend remotely sensed and model-simulated SSM products with or without in-situ data constraint [50].
When the target product is merged as a linear combination of single products, the equation of data merging can be expressed as: where ω a , ω b , ω c are the relative weights of data sets a, b, c, and SM m is the target merged product. The merged product is unbiased optimal if the summation of optimal weights is one. It is a constraint of the solution to the estimation error variance minimization problem. Accordingly, the solution to minimize the error variance of SM m relate to weights ω a , ω b , ω c could be calculated from relative error variance σ 2 a , σ 2 b , σ 2 c . The to be minimised error variance of SM m can be expressed as Assume ∂σ 2 /∂ω 2 a = 0 and ∂σ 2 /∂ω 2 c = 0, the equations to determine the relative weights by using relative errors are presented below: The method can also work in two datasets situation. The equations are presented below: Triple collocation was used to determine the relative errors of each input datasets in this research. Triple collocation is an error estimation method that can be used to estimate the random error variances and systematic biases in different datasets without reliable reference data sets. It improved the accuracy of calibration or validation when compared with the dual comparisons that were widely used before. To implement the triple collocation method, three independent datasets should be used jointly to determine the relative errors [51]. The error variances of each dataset can be presented as: The SMAR model is a physically based infiltration model with two layers, which aims to estimate the RZSM (the second soil layer) from the SSM (the first soil layer) time series.
is the relative saturation of the second soil layer, S w2 [-] is the wilting point of the second soil layer, and y t j [-] represents the fraction of soil water infiltrating from the top layer to the lower layer. Coefficients a and b represent: is the soil water loss (evapotranspiration and percolation) coefficient, n i [-] represents the porosity of i th layer of soil, and Z ri [L] is the soil depth of i th layer.
As it is an analytical method, the parameters of the SMAR model were mathematically derived under some assumptions. First, when compared with infiltration, capillary rise and lateral flow are negligible in water mass exchange between two layers. Second, the water exchange happens immediately and ends within one day when soil water in the first layer exceeds field capacity, with an infinite permeability. Third, the soil water loss of the second layer decreases linearly from a relatively humid (does not include the real humid condition with a significant non-linear water loss function) condition to the wilting point [24,29,52].

Processing Procedures
The central processing procedures include satellite data merging, SSM data blending, and RZSM generating (the flowchart of processing procedure is presented in Figure 2). The division of the calibration period, the product period, and the validation period were based on data availability.

(a) Satellite Data Merging
Satellite data merging aims to merge all of the available passive microwave observation data into one PASSIVE product. The ESA-CCI PASSIVE product [44][45][46] was merged in the same way using TRMM Microwave Imager, WindSat Radiometer, AMSRE, AMSR2, and SMOS, but without SMAP, so, it will be used for comparison. As for active microwave observations, the ESA-CCI Merged ACTIVE products were used directly, as it was merged using Advanced Scatterometer (Metop) which will be applied the same way for this research. Nevertheless, the approach in our research differs from ESA-CCI, in that the in-situ SSM climatology is taken into account before the final SSM blending.
• Rescaling while using CDF matching Differences in sensor specifications, particularly in microwave frequency and spatial resolution, result in different absolute SSM values from AMSR2, SMOS, SMAP, and AMSR-E. It is needed to scale datasets into a common climatology using the CDF matching method. The climatology of AMSR-E was selected as the reference, because the AMSR-E SSM retrievals were identified as more accurate than other passive products due to the relatively low microwave frequency and high spatio-temporal resolution of the sensor [33]. • Error Characterization using triple collocation Error characterization aims to obtain the relative errors (stationary average random errors) of each dataset while using triple collocation analysis. The triple collocation analysis was performed for those pixels, where two or three datasets overlapped and were uncorrelated [53]. Furthermore, the uncertainties of the target product (PASSIVE product) can also be determined from the error variances of every single product.
• Optimal weights calculated while using the least-squares method and weighted averaging The scaled satellites data were merged using a weighted average on a pixel basis which considers the error properties of each dataset. The optimal weights for the weighted average were determined by the relative error variances of all of the input datasets over each specific merging period using the least-squares method. The merging method works in both three datasets and two datasets cases, as relative errors of each dataset have been determined. However, for specific locations, triple collocation analysis does not yield valid error estimates. In such cases, the weights were equally distributed amongst the available datasets.

(b) SSM Data Blending
• Rescaling while using CDF matching method First, as a reference dataset, ERA-Interim SSM products over the Calibration Period was scaled based on the obtained in-situ data climatology using CDF matching, which can also produce the seasonal CDF matching parameters for climatology scaling of ERA-Interim data over the Product Period (see Figure 2). The above-mentioned seasons are defined as Monsoon (May-October), Transaction1 (April), Winter (December-March), and Transction2 (November). Subsequently, the ERA-Interim SSM data over Product Period were scaled using the seasonal CDF matching parameters.
Next, another CDF matching was performed to scale each merged satellite dataset (PASSIVE and ACTIVE products generated in satellite data merging step) based on the in-situ scaled ERA-Interim data climatology over the Product Period.

•
Error Characterization using triple collocation • The relative errors among the scaled PASSIVE, ACTIVE, and ERA-Interim products were calculated while using the triple collocation method, which used three collocated datasets to constrain the relative error variance determination without a manually decided reference.

•
Optimal weight calculation using the least-squares method and weighted averaging • Similar with satellite data merging, a weighted average was used to merge scaled PASSIVE, ACTIVE, and ERA-Interim products over the Product Period and the optimal weights were obtained based on the relative errors while using least squares method.
(c) Deriving RZSM • Depth Scaling Using CDF Matching CDF matching method was used to calculate RZSM from the blended SSM data. The Tibet-Obs in-situ measured SSM and RZSM datasets [1] over the Calibration Period were used to generate the depth scaling parameters. Note that the RZSM refers to a mean value of the SM in the 0-80 cm depth: where i is the index of the soil layer; θ p [L 3 L −3 ] is the RZSM; L i [L] is the soil layer depth; and, θ i [L 3 L −3 ] is soil moisture in the i th soil layer. The specific Tibet-Obs product used in this research includes the soil moisture data at the depth of 5 cm, 10 cm, 20 cm, 40 cm, and 80 cm. The scaling parameters of the fifth-order polynomial fitting function were generated from the SSM and the differences between the corresponding values of ranked SSM and RZSM. Gao et al. [30] identified the fifth-order polynomial as the optimal choice based on a pre-analysis to define the fitting function for depth scaling. The scaling parameters were used to estimate the predicted difference between SSM and RZSM over the Product Period; then, the predicted RZSM were generated while using blended SSM and the predicted differences.

• RZSM Estimation Using SMAR model
In the four in-situ measurement networks of Tibet-Obs (Ali, Shiquanhe in an arid region, Naqu in semi-arid region, and Maqu in sub-humid region), SMAR model parameters were derived mathematically over the one-year Calibration Period while using the SSM and RZSM (same with the datasets used for depth scaling using CDF Matching). The parameters were applied over the entire Product Period.

Merged Satellites SSM Product
The rescaling procedure kept the seasonal dynamics and adjusted the absolute values of original datasets (e.g., SMOS, AMSR2, and SMAP SM products), follows the procedures by Liu et al. [33]. The consistency is a necessary condition for the following merging procedures. Figure 3 highlights such consistency of the time sires, where all the time series of raw data, scaled data, and merged data are presented. The merged PASSIVE data are close to the ESA-CCI PASSIVE product, but with a wider fluctuation. The raw satellites data were scaled and merged over each merging period, as presented in Table 1. For the merging periods of S1 and S3, as only one satellite dataset available, the merged SSM uses only that dataset. For S2, the merging time is relatively short, and the data of SMOS is much less than AMSRE, but these conditions did not influence the performance of triple collocation. For S4, the relative errors of SMOS and AMSR2 are similar, so the optimal weights for merging the two are similar. Table 2 presents the specific averaged relative errors and Optimal Weights over each merging period (for the explanation of parameters see Section 2.2). In general, AMSR data were assigned higher weights than SMOS data. Table 2. Summary of the relative errors and optimal weights (averaged values over the entire study area) used in satellites data merging.

Merging Period
Date σ 2 x ω x In the merging period S1 and S3, only one satellite dataset included in the merged product, which means that the weight is 1 for that satellite dataset. Otherwise, the merging weights were calculated from relative errors; the pixels without valid relative errors were set as the equal weights. As AMSRE and AMSR2 have low errors, the weights of them are relatively high. SMOS and SMAP have similar relative errors and they are higher than those of AMSRE and AMSR2, as such, their weights are relatively low. However, the difference is small, especially in the period of S5.

Blended SSM Product
First, the ERA-Interim SSM data were scaled based on the in-situ climatology. Figure 4 presents the time-longitude diagrams of in-situ data, the original ERA-Interim, and the scaled ERA-Interim, as well as an example illustrating how the CDF matching works (Figure 4c). Figure 4a shows the in-situ data climatology over the Calibration Period. The sub-humid regions (eastern part of the time-longitude diagram) show a significant seasonal difference, while the SM data in the western part show relatively low values in both the monsoon season and the winter. The original ERA-Interim SM data did not show apparent seasonal change as compared with the in-situ climatology in Figure 4a, as presented in Figure 4d. After scaling, the scaled ERA-Interim product shows a similar pattern with the time-longitude diagram of in situ climatology (Figure 4d). The volumetric water content is around 0.35 (m 3 /m 3 ) in the monsoon season, while it is around 0.07 (m 3 /m 3 ) in the winter.
Subsequently, the scaled ERA-Interim SM product was used to scale the PASSIVE and ACTIVE products. Although the original PASSIVE and ACTIVE products show some seasonal and spatial dynamics, they show different dynamic ranges and absolute values. Moreover, the PASSIVE product overestimated the SSM over Tibetan Plateau, especially during the monsoon seasons. While the ACTIVE one underestimates SSM over the winter. After scaling, scaled PASSIVE and ACTIVE data both show a similar climatology with the scaled ERA-Interim data, as well as the dynamic range of SSM. The effect of climatology scaling was significant, and the averaged SSM in the sub-humid region of Tibetan Plateau was up to 0.40 (m 3 /m 3 ) before scaling. After scaling, the absolute values of PASSIVE products changed based on the systematic difference with the reference data (scaled ERA-Interim).
The blending of these scaled SSM products was implemented in two situations: first, blending the three collocated SSM data, where the datasets have enough triplets; second, blending the rest of data, where one satellite data individually collocates with the scaled ERA-Interim data, but not collocate with another satellite data (i.e., no triplets). To tackle the first situation, the data number of each product and the corresponding triplets' number was checked. The ERA-interim product has more than 3500 data across the southern and eastern part of Tibetan Plateau and have many data in the arid zones. PASSIVE and ACTIVE products have less data number. The eastern regions have around 1200 triplets, while some of the western regions have less than 400 triplets. Only the data with more than 100 observation triplets were presented, as 100 observation triplets are required for a reliable estimation of the relative errors among the three SSM products [54]. Next, the minimum correlation coefficient among the three collocated SSM products was calculated to check whether the identified number of triplets is statistically significant to apply with triple collocation method. The distribution of minimum correlation coefficient is relatively equal when compared with the triplets' number and most of the areas are with a minimum correlation coefficient of more than 0.15, which is required for a sample with more than 100 data to achieve the statistical significance. The area with triplet number > 100 (where the P-value < 0.05) are all statistically significant to be applied with triple collocation method to identify the relative errors.  Figure 5 presents the time series of data that are involved in the data blending procedure. It is evident that scaled ERA-Interim, PASSIVE, ACTIVE data became align with each other, and the blended SSM, which constrained by the in-situ climatology, captured better the in-situ SSM dynamics when compared with ESA-CCI SSM product. Figure 6 shows the optimal weights of scaled ERA-Interim (a), PASSIVE (b), and ACTIVE (c) products, which were determined by the relative errors while using the least-squares method. Among the three SSM products, the ACTIVE product has the smallest average relative error (0.0032 cm 3 cm −3 ), while the highest weight (0.60) contributing to the blended products. The average weight of the PASSIVE product is 0.14 and the scaled ERA-Interim product is 0.30. The reason why the merging weights of PASSIVE product are relatively low is the low SSM retrieval rate of passive satellites over the central and western Tibetan Plateau, which are arid or semi-arid regions [16].  The other situation is to blend the scaled satellite SSM data individually collocating with the scaled ERA-Interim data, but not collocating with each other. The weights for blending were determined from the scaled PASSIVE product and ERA-Interim, or the scaled ACTIVE product and ERA-Interim. For example, there were scaled PASSIVE data individually collocated with the scaled ERA-Interim, the average weight of scaled ERA-Interim is 0.4598, and the average weight of the scaled PASSIVE is 0.5403, corresponding with a relative error of 0.0123 cm 3 cm −3 and 0.0063 cm 3 cm −3 , respectively.

RZSM Product
CDF matching is the first method that is used to generate RZSM. The calibration aims to generate CDF parameters of the scaling function for every single cell, which is a fifth-order polynomial aims to illustrate the non-linear relationship between the surface-root zone difference and the SSM data. Figure 7a presents the surface and root zone relationships found in the semiarid region. The RZSM is higher than SSM in the winter and lower in the monsoon season. The seasonal differences are low in semi-arid areas and are relatively evident in sub-humid regions. The RZSM in semi-arid region (Figure 7a) shows a smooth pattern with absolute values that are slightly higher than SSM in the winter and lower in the monsoon season. The results present that the RZSM estimation qualitatively kept the in-situ data climatology. The time series in Figure 7b is the averaged SSM and RZSM data in the sub-humid region. The way to generate the averaged in-situ RZSM was averaging soil moisture layer by layer across all stations due to the high heterogeneity of SSM and RZSM across Tibet-Obs stations. Using the SMAR model is the second method to generate RZSM. The calibration of the model was based on the blended SSM data and in-situ RZSM over two years in three climate zones. The calibration of the SMAR model needs a continuous input time series, which requires some procedures to fill or skip the gaps. In this research, the gaps less than three days are interpolated, and the long gaps are skipped, and the time series were cut into several sections, and the calibration was carried out over the relatively long time series. Table 3 presents the calibrated parameters and root mean square errors. In semi-arid region, the calibration parameters are the most trustworthy as the root mean square error is the lowest. Figure 7 compares the RZSM results from both CDF method and SMAR model. In Semiarid region, the results are similar, while the SMAR model generated a smooth result and the CDF method result is highly dependent on the input data (blended SSM). In sub-humid region, the SMAR result is very close to the reference in-situ data after a short model warming up period. The results for arid region are not shown, due to too many gaps in the data.

SSM Product
The homogenized and merged product presents SSM with global coverage and spatial resolution of 0.25 • . The time length of the product spans the entire Product Period covered by different sensors, i.e., 2007-2016. The blended SSM data were compared with different SSM products over the Product Period while using the Taylor diagrams in Figure 8. The Taylor diagram represents the correlation coefficient, the centred unbiased root mean square difference, and the standard deviation while using a two-dimensional plot [48]. The in-situ measurements of Tibet-Obs networks (Naqu, Maqu, Ali, and Shiquanhe) were served as the reference. The original SSM products in Figure 8   Comparing the blended SSM results in this research (A) with other SSM product (B, C, D, E). In the Ali network (Figure 8a), the blended SSM shows a better quality than SMOS and ERA-Interim SSM products and it has a similar RMSE and standard deviation and a higher correlation coefficient with AMSR2 and SMAP products. In the Shiquanhe and Naqu network (Figure 8b,c), the blended SSM gives a better estimation than all of these products. In Maqu network (Figure 8d), the blended SSM show a better estimation than AMSR2 and SMAP SSM product, but SMOS and ERA-Interim show lower standard deviation. However, the correlation coefficients are similar.
When comparing the simple average of original data (H) with the simple average of scaled data (I), the I data performs much better in arid and semi-arid regions with a low standard deviation, and root mean squares. It that means a climatology scaling is needed before merging and CDF matching. Comparing the simple average of scaled data (I) with the blended SSM data (A), it shows that the least-squares method performs better than arithmetic averaging in Ali and SQ networks, which gives a better result close to the in-situ measured data with a similar correlation coefficient and root mean squares (Figure 8a,b). Although they perform similar well in Maqu Figure 8d, it was still verified that the least-squares method is needed and performed better in the arid areas over the Tibetan Plateau.
The merged PASSIVE (F), merged ACTIVE data (G), original satellites data SMOS (B), AMSR2 (C), SMAP (D), and ERA-Interim (E) have higher standard deviations before climatology scaling. The more significant standard deviation indicates more fluctuations, especially for PASSIVE and ACTIVE. After a scaling based on the in-situ climatology, the standard deviation of them became smaller. It is one of the reasons why a climatology scaling is needed before a merging step. It is essential to eliminate the systematic difference between the different datasets before further merging can be performed.
The PASSIVE and ACTIVE products did not perform well before climatology scaling, while SMOS and AMSR2 performed well in most of the conditions. The SMOS data accurately estimated SSM and captured the in-situ SSM dynamics. Especially in the Shiquanhe and Naqu network (Figure 8b,c), it performs similarly to the blended SSM data. When comparing PASSIVE with ACTIVE, the scaled ACTIVE performs better, which means that the average weight of ACTIVE is greater than the weight of PASSIVE in the Blended product.

RZSM Product
The inter-comparison of RZSM products is a comparison between the RZSM products that were generated in this research (e.g., RZSM product from CDF matching and SMAR model) and other products (e.g., ERA-Interim RZSM data, GLDAS RZSM data, and SMAP L4 RZSM data) over the one-year validation period, when the RZSM measurements were available. The first RZSM product (presented with the tag "CDF" and red dots in Figure 9) was generated while using CDF Matching based on in-situ SM measurements. The other RZSM product (presented with the tag "SMAR" and red dots in Figure 9) was generated using the SMAR model. The other RZSM data ERA-Interim, GLDAS, and SMAP were presented with cyan dots in Figure 9.
In different climate zones, each RZSM product shows different characteristics. In the sub-humid region (Figure 9c), the SMAR product shows a lower standard deviation (0.058) when compared with CDF product (0.080), and it also shows a slightly lower root mean square error (0.037 vs. 0.047) and similar correlation coefficient (0.85 vs 0.81). Both methods of deriving RZSM are satisfactory when compared with in-situ measurements. Both are better than the GLDAS product refers to RMSE and standard deviation. Additionally, both are similar or slightly better than SMAP L4 product and ERA-Interim RZSM product refers to RMSE and correlation coefficient. In the semi-arid climate zone (Figure 9b), the SMAR model performs much better when compared with other products with a low standard deviation (0.049) and root mean square error (0.03). The relatively unsatisfactory performances of CDF matching may due to the SM dynamics of semi-arid region, as suggested Manfreda et al. (2007) [54]. The relationship between SSM and RZSM is more complicated, and the two variables are more likely to be decoupled in such regions. As such, a physically-based estimation (e.g., SMAR model) can be better than a simple depth scaling method [24,55]. When compared with other products, the SMAR model gives a better estimation than ERA-Interim, and SMAP RZSM product, and a lower RMSE than GLDAS RZSM product. The CDF method gives a better estimation than SMAP RZSM product and a higher correlation coefficient than the others. In the arid region (Figure 9a), the SMAR product and CDF product both show a mediocre performance, which might be due to the low availability of in-situ RZSM data in the arid region, as both of the methods rely on the dynamics of in-situ RZSM time series. Nevertheless, using SMAR model and CDF Matching to generate the RZSM is reliable and straightforward, which can give a robust prediction of RZSM from satellite SSM. It is worth to note that the relationship between SSM and RZSM is obtained while using zonally averaged data, which ignored some specific difference between different stations or different soil layers. For example, the 40cm soil moisture of some stations in Shiquanhe networks are extremely low, and the seasonal change is small, while, in another station of the same network, it is incredibly high and fluctuates wildly.

Conclusions
In this study, performing satellite data merging, climatology scaling, SSM data blending, and depth scaling produced consistent SSM and RZSM products. As discussed above, these procedures composed an integrated method to study SSM and RZSM using the sparse in-situ measurement networks over Tibetan Plateau. First, different satellites observed data were merged into two satellites-based products (i.e., ACTIVE and PASSIVE). Subsequently, the satellite products were constrained by in-situ measurements using CDF matching and the in-situ scaled land surface model-simulated data. Next, all of the input data were merged into a consistent SSM product. Last, the SSM product was scaled while using CDF matching to obtain the RZSM. Furthermore, an analytical model SMAR was also applied to enable a physically based estimation of RZSM.
Beyond the typical data blending research, this research constrained the blending data sets with the in-situ climatology [2]. It can eliminate the influence by using different land surface model simulations. It should be noticed that the choice of the climate classification method could also influence the in-situ climatology. Nevertheless, the generated consistent SSM and RZSM products captured the in-situ dynamics.
The methods for generating RZSM products are CDF matching and the SMAR model, which are recommended due to their simplicities and reliable performances. Both of them are an economical and efficient method for maximizing the use of limited in-situ measured data. The calibration of them can work while using a two-year coarse resolution in-situ RZSM measurement. The predicted RZSM products from the blended SSM product can follow the in-situ measurement reasonably well. The high correlation between RZSM products and in-situ measured RZSM and low uncertainties indicate that the CDF Matching and SMAR model can give a robust RZSM prediction. Nevertheless, the RZSM states and data availability vary from region to region. For example, the SMAR model performs well as a physical-based model in the semi-arid region, while CDF Matching performs slightly better in the arid region.