Evaluation of Evapotranspiration for Exorheic Catchments of China during the GRACE Era: From a Water Balance Perspective

Evapotranspiration (ET) is usually difficult to estimate at the regional scale due to scarce direct measurements. This study uses the water balance equation to calculate the regional ET with observations of precipitation, runoff, and terrestrial water storage changes (TWSC) in nine exorheic catchments of China. We compared the regional ET estimates from a water balance perspective with and without considering TWSC (ETWB: ET estimates with considering TWSC, and ETPQ: ET estimates from precipitation minus runoff without considering TWSC). Results show that the regional annual ET ranges from 417.7 mm/yr to 831.5 mm/yr in the nine exorheic catchments based on the water balance equation. The impact of ignoring TWSC on calculating ET is notable, as the root mean square errors (RMSEs) of annual ET between ETWB and ETPQ range from 12.0–105.8 mm/yr (2.6–12.7% in corresponding annual ET) among the exorheic catchments. We also compared the estimated regional ET with other ET products. Different precipitation products are assessed to explain the inconsistency between different ET products and regional ET from a water balance perspective. The RMSEs between ET estimates from Gravity Recovery and Climate Experiment (GRACE) and ET from land surface models can be reduced if the deviation of precipitation forcing data is considered. ET estimates from Global Land Evaporation Amsterdam Model (GLEAM) can be improved by reducing the uncertainty of precipitation forcing data in three semiarid catchments. This study emphasizes the importance of considering TWSC when calculating the regional ET using a water balance equation and provides more accurate ET estimates to help improve modeled ET results.


Introduction
Evapotranspiration (ET) is one of the most important components of the climate system, connecting the water, energy, and carbon cycles [1,2]. ET changes can be used as an indicator of climate change, especially in areas where the water cycle is accelerated [3,4]. However, regional ET is often difficult to estimate. The flux tower observing station network can provide accurate ET observations at each site [5], but its sites are often too sparse for basin-scale studies. Remote sensing provides an opportunity to monitor spatial-temporal changes in ET [6,7], but regional calibration and uncertainty from vegetation and Xue et al. [30]. However, terrestrial water storage anomalies (TWSA) can have large variability on seasonal and interannual scales due to human water consumption [31,32] and the building of reservoirs [22,23]. Zeng et al. [33] also acknowledged that ignoring TWSC would introduce considerable bias into ET estimation, especially in regions with low ET. Wang [34] points out the importance of considering interannual TWSC in the estimation of ET. Hence, we compare ET estimates with and without TWSC in the water balance equation to explore the impact of TWSC on the ET estimate on interannual and monthly scales.
This study aims to (1) estimate the regional ET of nine exorheic catchments in China using the water balance equation considering TWSC; (2) analyze the impact of not considering TWSC and different TWSC products on the ET estimates; (3) explain the inconsistency between different ET products and regional ET from a water balance perspective. The flowchart of this study is shown in Figure 1.

Study Area
Nine exorheic catchments were delineated from eleven hydrological gauge stations reported in the River Sediment Bulletin of China (RSBC) [35]. The catchment boundaries are derived from the locations of the gauge stations and Shuttle Radar Topography Mission (SRTM) elevation data (https://www2.jpl.nasa.gov/srtm/), processed in ArcGIS with the ArcSWAT plugin (https://swat.tamu.edu/software/arcswat/). ArcSWAT is an ArcGIS-ArcView extension and a graphical user input interface for the Soil and Water Assessment Tool (SWAT) model. Because the Gaoyao, Shijiao, and Boluo hydrological gauge stations are all in the Pearl River Basin (PRB), we merge them into one catchment (Figure 2). The Yangtze River Basin (YRB) is divided into two sub-basins based on its two hydrological gauge stations: Yichang and Datong. Information on the nine catchments is listed in Table 1, and the catchments are also presented in Figure 2.
Figure 2 (caption fragment): The blue curves represent the main rivers and tributaries in China. The country boundary is shown with the dashed line.
Table 1. Descriptions for the nine exorheic catchments of China (sorted by area and location). The climate categories are based on annual precipitation and dryness [36].

Water Balance Equation
ET can be estimated from the surface water balance at basin or continental scales, and such estimates usually serve as a benchmark for other products. The equation is as follows:
$$ET_{WB} = P - Q - \frac{ds}{dt} \qquad (1)$$
where ET WB is the calculated ET, P is precipitation, Q is river discharge, and ds/dt is the change in terrestrial water storage for a specific time period [4,11,15]. TWSC is estimated as the temporal derivative of TWSA from the GRACE products [37,38]. ET, P, and Q are accumulated over a full month [4,39], so ds/dt (TWSC) is the difference between the TWSA values at the beginnings of two consecutive months. To obtain TWSA at the beginning of every month, we interpolate the TWSA time series; a similar procedure can be found in Li et al. [40]. First, seasonal and trend signals are estimated using unweighted least squares and evaluated at the beginning of every month. Second, the residuals, obtained by subtracting the seasonal and trend signals from the TWSA time series, are interpolated linearly. Finally, the sum of the interpolated residuals and the seasonal and trend signals gives the interpolated TWSA time series. Root mean square error (RMSE) is used to evaluate the deviation between ET WB and other ET estimates. The equation is as follows:
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - Y_i\right)^2} \qquad (2)$$
where N is the data length (time series), X i is the ith ET estimate from other methods, and Y i is the ith ET WB result.
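The procedure in this section (interpolating TWSA to month beginnings, differencing it to obtain TWSC, computing ET as precipitation minus runoff minus TWSC, and scoring with RMSE) can be sketched in Python as follows. This is a minimal illustration, not the study's code: the function names, the reference epoch `T_REF`, and the choice of annual plus semiannual harmonics for the seasonal signal are assumptions, since the text does not state which seasonal terms were fitted.

```python
import numpy as np

T_REF = 2009.0  # assumed reference epoch (decimal year), only centres the trend term

def design_matrix(t):
    """Columns: constant, linear trend, annual and semiannual harmonics (assumed model)."""
    t = np.asarray(t, dtype=float)
    w = 2.0 * np.pi * t
    return np.column_stack([
        np.ones_like(t), t - T_REF,
        np.cos(w), np.sin(w),
        np.cos(2.0 * w), np.sin(2.0 * w),
    ])

def interpolate_twsa(t_obs, twsa_obs, t_new):
    """Interpolate TWSA from observation epochs to new epochs (decimal years).

    t_obs must be increasing, as required by np.interp.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    twsa_obs = np.asarray(twsa_obs, dtype=float)
    t_new = np.asarray(t_new, dtype=float)
    a_obs = design_matrix(t_obs)
    # Step 1: unweighted least-squares fit of the trend and seasonal signals.
    coef, *_ = np.linalg.lstsq(a_obs, twsa_obs, rcond=None)
    # Step 2: residuals after removing the fit, interpolated linearly.
    residual = twsa_obs - a_obs @ coef
    residual_new = np.interp(t_new, t_obs, residual)
    # Step 3: sum of the fitted signals and the interpolated residuals.
    return design_matrix(t_new) @ coef + residual_new

def et_water_balance(p, q, twsa_month_start):
    """ET = P - Q - TWSC; twsa_month_start holds len(p) + 1 month-beginning values."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    twsc = np.diff(np.asarray(twsa_month_start, dtype=float))
    return p - q - twsc

def rmse(x, y):
    """Root mean square error between two equally long series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))
```

For a month with P = 100 mm, Q = 30 mm, and TWSA rising from 0 to 10 mm, `et_water_balance([100.0], [30.0], [0.0, 10.0])` gives 60 mm, whereas precipitation minus runoff alone would give 70 mm.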

Data
An overview of the datasets can be found in Table 2, which lists relevant information about the datasets used in this study (TWSC, precipitation, runoff, and ET), including spatial resolution and data access links. A detailed description of these data is provided below. We used the Center for Space Research (CSR) GRACE RL05 Mascons data to estimate the TWSC. The mascon solutions are global and can be better applied to hydrology, oceanography, and the cryosphere without any post-processing and without applying any empirical scaling factors [41]. The data can be downloaded from http://www2.csr.utexas.edu/grace. The Mascons data are represented on a 0.5-degree longitude-latitude grid and are estimated with the same standards as the CSR RL05 spherical harmonics solutions using GRACE Level-1 observations. C20 coefficients were replaced, and degree-1 coefficients (geocenter) and glacial isostatic adjustment (GIA) corrections were applied. More details about the CSR GRACE RL05 Mascons (CSR-M) can be found in Save et al. [41]. With the development of GRACE post-processing, several GRACE solutions can be used for hydrology applications. However, different solutions lead to different TWSC estimates.
To evaluate the impact of TWSC from different GRACE solutions on the estimate of ET, we also take JPL Mascons [42], CSR GRCTellus Land data [43], and CSR RL05 spherical harmonics solutions with the DDK4 filter applied (CSR-DDK4) [44] for comparison. All of these solutions are processed with the same C20 coefficients replaced, the same degree-1 coefficients, and the same GIA corrections.
The processing of JPL Mascons uses external information provided by near-global geophysical models to constrain the solution. JPL Mascons use coarse 3-degree spherical cap mascons, which are downscaled to 0.5° × 0.5° using downscaling factors (dsf) calculated from the Community Land Model (CLM ver. 4.0) [42]; the grid values of JPL Mascons are multiplied by the downscaling factors (JPL-M.dsf). The CSR and JPL mascon solutions can be used directly without leakage corrections. CSR GRCTellus Land data were developed by Landerer and Swenson [43] from CSR data, and scaling factors are provided to account for the signal loss during processing related to truncation to degree and order 60 and the application of a 300 km Gaussian smoothing filter. The grid values of CSR GRCTellus Land are multiplied by the scaling factors (CSRT-GSH.sf). The DDK filter was proposed by Kusche et al. [44], and the DDK4 filter performs well in the upper Yellow River [45]. The processing of CSR-DDK4 is similar to that of the CSR GRCTellus Land data, except that the 300 km Gaussian smoothing filter and destriping filter are replaced with the DDK4 filter. No leakage correction is applied to CSR-DDK4 in this study, as in Yi et al. [45]; in fact, the results for CSR-DDK4 are at the same level as those of the other solutions.

In Situ Precipitation and Runoff Data
Gridded precipitation data were obtained from the China Meteorological Data Service Center (CMDC; hereafter, P CMDC ). The gridded precipitation data were generated by a thin plate spline spatial interpolation of precipitation observations from 2472 weather stations. They have a monthly temporal resolution and a 0.5° × 0.5° spatial resolution over all of China [46]. Cross-validation and error analysis against gauge-based precipitation indicate good quality. This precipitation dataset has been used in several studies [13,22,47]. The monthly runoff datasets are from eleven gauge stations recorded in the RSBC (hereafter, Q RSBC ) (http://www.mwr.gov.cn/sj/tjgb/zghlnsgb/) [35]; each record integrates all runoff from the area upstream of the corresponding catchment outlet. The runoff measurements are all from well-gauged rivers in China.

Land Evapotranspiration Products
We use two kinds of land evapotranspiration products for comparison: GLDAS ET and GLEAM ET (Table 2). Two versions of GLDAS LSM data are used for inter-comparison in this paper, i.e., GLDAS version 1 (GLDAS-1) and GLDAS version 2.1 (GLDAS-2.1) [9]. The ET outputs of both GLDAS versions are generated by the Noah LSM [48]. GLDAS-1 datasets cover the period from 1979 to the present, and GLDAS-2.1 datasets cover the period from 2000 to the present. The temporal resolution used here is monthly. More information about GLDAS-1 and the improvements and changes in GLDAS-2.1 is available at https://ldas.gsfc.nasa.gov/gldas/. The ET outputs from GLDAS-1 and GLDAS-2.1 are expressed as ET GLDAS-1 and ET GLDAS-2.1 .
We use GLEAM v3.2a ET products (hereafter, ET GLEAM ), which were published jointly by Vrije Universiteit Amsterdam, Netherlands, and Ghent University, Belgium [6]. The data have a spatial resolution of 0.25° × 0.25° and a daily temporal resolution. We sum the daily values to monthly totals in this study. GLEAM uses a set of algorithms to separately estimate the different components (transpiration, bare-soil evaporation, interception loss, open-water evaporation, and sublimation) of land ET. The Priestley and Taylor equation is used in GLEAM to calculate potential evaporation based on observations of surface net radiation and near-surface air temperature. The rationale of GLEAM is to maximize the recovery of information on evaporation contained in current satellite observations of climatic and environmental variables [6].

Precipitation Forcing Data and Modeled Runoff Data
The precipitation forcing data from the GLDAS-1, GLDAS-2.1 (hereafter, P GLDAS-1 and P GLDAS-2.1 ), and the Multi-Source Weighted-Ensemble Precipitation (MSWEP, precipitation forcing data of GLEAM, hereafter, P MSWEP ) datasets [49] are used to explain the difference of ET results. They are also computed as regional averages. As runoff is another critical variable in the water balance equation, we also calculate the mean runoff outputs of the nine exorheic catchments from GLDAS-1 and GLDAS-2.1 Noah LSM (hereafter, Q GLDAS-1 and Q GLDAS-2.1 ) and compare the results with those for in situ runoff.

Methods
The above-mentioned datasets are provided at different spatial resolutions, but because they are used only as regional averages, the differing resolutions have little impact on the ET estimates. The grid cells within each catchment are used to extract the regional average estimates, which are computed on a monthly scale. The ET results are shown at monthly mean and annual scales, since the amount of annual ET, interannual changes, and mean annual cycles are the main characteristics of ET. In addition, the differences between the ET estimates are clearer at monthly mean and annual scales.
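As an illustration of how a regional average can be extracted from a gridded product, the sketch below averages the grid cells that fall inside a catchment mask. The cosine-of-latitude area weighting, the boolean mask input, and the function name `regional_mean` are assumptions made here for illustration; the text does not specify how cells were weighted.

```python
import numpy as np

def regional_mean(field, lats, mask):
    """Area-weighted mean of a (nlat, nlon) grid over the cells where mask is True.

    field : 2-D array of gridded values (e.g. monthly precipitation in mm)
    lats  : 1-D array of cell-centre latitudes in degrees, one per grid row
    mask  : 2-D boolean array, True for cells inside the catchment
    """
    field = np.asarray(field, dtype=float)
    # Assumed weighting: cell area on a regular lon-lat grid scales with cos(latitude).
    row_weight = np.cos(np.deg2rad(np.asarray(lats, dtype=float)))
    weights = row_weight[:, None] * np.ones_like(field)
    # Ignore cells outside the catchment and cells with missing values.
    valid = np.asarray(mask, dtype=bool) & np.isfinite(field)
    return float(np.sum(field[valid] * weights[valid]) / np.sum(weights[valid]))
```

Applying the same mask-and-average step to every product puts them on a common catchment scale, whatever their native resolution.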
We use the TWSC from GRACE (CSR-M), P CMDC , and Q RSBC to derive ET WB . To explore the impact of TWSC on ET estimate, we also estimate the ET from precipitation minus runoff directly (expressed as ET PQ ) without considering TWSC. The results are shown in Section 3.1.1.
Here we compute TWSC results from different GRACE solutions to evaluate their impact on ET while keeping all other inputs (P and Q) unchanged (Section 3.1.2). The GRACE products include CSR-M, JPL-M.dsf, CSRT-GSH.sf, and CSR-DDK4. The TWSC used for ET estimates from JPL-M.dsf, CSRT-GSH.sf, and CSR-DDK4 are expressed as ET JPL-M.dsf , ET CSRT-GSH.sf , and ET CSR-DDK4 , respectively. They are further discussed in Section 4.1.
We then compare the ET estimates from GLDAS and GLEAM with ET WB (Section 3.2). The RMSEs between ET WB and other ET estimates are discussed in Section 4.2. We analyze how deviations of precipitation and runoff affect the regional ET estimate from a water balance perspective (see Sections 3.3 and 4.3), quantifying the deviations between P CMDC and the precipitation forcing data and between Q RSBC and the Q GLDAS estimates. Previous studies demonstrate that the TWSA (or TWSC) from GRACE and GLDAS are comparable [50][51][52]. Hence, we do not compare the TWSC component in the water balance equation.
We compute the deviations between the water balance estimates and the other products for ET, precipitation, runoff, and precipitation minus runoff (Section 3.3). We assess the impact of the deviations of precipitation and runoff on the ET estimate based on the RMSE and the change in RMSE (Section 4.3). From a water balance perspective, RMSE (ET-P) is calculated between annual ET WB minus annual P CMDC and the other ET estimates minus their precipitation forcing data. Similarly, RMSE (ET-Q) is calculated between annual ET WB minus Q RSBC and ET GLDAS minus Q GLDAS , and RMSE (ET-(P-Q)) is calculated between annual ET WB minus (P CMDC minus Q RSBC ) and annual ET GLDAS minus (P GLDAS minus Q GLDAS ) for 2003-2015. The changes of RMSE (ET-P), RMSE (ET-Q), and RMSE (ET-(P-Q)) relative to RMSE (ET) are then computed as proportions.
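The four RMSE variants defined above can be sketched as follows. This is an illustrative implementation under stated assumptions: the function and key names are hypothetical, and the relative change is taken here as (RMSE variant minus RMSE (ET)) divided by RMSE (ET), expressed in percent, which is one plausible reading of "relative to RMSE (ET)".

```python
import numpy as np

def rmse(x, y):
    """Root mean square error between two equally long series."""
    return float(np.sqrt(np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)))

def rmse_decomposition(et_wb, p_obs, q_obs, et_mod, p_mod, q_mod):
    """RMSE (ET), RMSE (ET-P), RMSE (ET-Q), RMSE (ET-(P-Q)) and their % change.

    et_wb, p_obs, q_obs : annual water balance ET, in situ precipitation and runoff
    et_mod, p_mod, q_mod: annual modeled ET with its precipitation forcing and runoff
    """
    et_wb, p_obs, q_obs, et_mod, p_mod, q_mod = (
        np.asarray(a, dtype=float) for a in (et_wb, p_obs, q_obs, et_mod, p_mod, q_mod)
    )
    base = rmse(et_mod, et_wb)
    values = {
        "ET": base,
        "ET-P": rmse(et_mod - p_mod, et_wb - p_obs),
        "ET-Q": rmse(et_mod - q_mod, et_wb - q_obs),
        "ET-(P-Q)": rmse(et_mod - (p_mod - q_mod), et_wb - (p_obs - q_obs)),
    }
    # Assumed convention: negative values mean the RMSE shrinks once the
    # input deviation is accounted for.
    change_pct = {k: 100.0 * (v - base) / base for k, v in values.items() if k != "ET"}
    return values, change_pct
```

If a model's ET bias comes entirely from its precipitation forcing, RMSE (ET-P) drops to zero and the corresponding change is -100%.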

Uncertainty Estimation
The TWSC estimates used in the estimate of ET WB are from CSR-M. Hence, we only estimate the uncertainty of TWSC based on CSR-M. The uncertainty estimate follows the method used in Landerer and Swenson [43] and Scanlon et al. [53]; details can be found in the supporting information of Scanlon et al. [53]. As the TWSC is the difference between two consecutive months, the uncertainty of TWSC is √2 times the uncertainty of TWSA. The uncertainties of monthly precipitation and runoff data collected by gauges are estimated to be 10% and 5%, respectively [2,4,13]. The uncertainty of monthly ET WB is estimated from the uncertainties of TWSC, precipitation, and runoff based on the error propagation law.
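A minimal sketch of this propagation is given below. It assumes the three error sources are independent, so that their variances add in quadrature; the function name and the example numbers are illustrative and not taken from the study.

```python
import math

def monthly_et_uncertainty(p, q, sigma_twsa, rel_p=0.10, rel_q=0.05):
    """Uncertainty of monthly water balance ET (same units as p and q).

    p, q       : monthly precipitation and runoff
    sigma_twsa : uncertainty of a single monthly TWSA value
    rel_p/rel_q: relative uncertainties of gauge precipitation and runoff
    """
    # TWSC is a difference of two TWSA values, hence the factor sqrt(2).
    sigma_twsc = math.sqrt(2.0) * sigma_twsa
    sigma_p = rel_p * p
    sigma_q = rel_q * q
    # Assumed independent errors: combine in quadrature.
    return math.sqrt(sigma_p ** 2 + sigma_q ** 2 + sigma_twsc ** 2)
```

With hypothetical inputs of P = 100 mm, Q = 40 mm, and a TWSA uncertainty of 10 mm, the three terms are 10, 2, and about 14.1 mm, giving roughly 17.4 mm for ET; the TWSC term dominates.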
The monthly mean ET for each calendar month is the mean of 13 values (one per year) over the study period of 2003-2015. From the error propagation law, the uncertainty of the monthly mean ET can therefore be calculated as the uncertainty in monthly ET WB divided by √13. The uncertainty of annual ET WB is estimated from the uncertainties of annual P, Q, and TWSC based on the error propagation law. Since the annual TWSC is estimated from the difference between the TWSA at the beginning of one year and that at the beginning of the next, we take the uncertainty of annual TWSC to be equal to that of monthly TWSC.
Figure 3 shows the monthly mean of ET WB following Equation 1, ET PQ , P, Q, and TWSC for 2003-2015 in the nine catchments. The negative ET WB values in January and February in the SRB and in December in the Haihe River Basin (HRB) may result from the uncertainties of in situ precipitation and TWSC [13,40]. The TWSC has a significant impact on ET estimates in most catchments. The deviation between the monthly means of ET WB and ET PQ reaches 34.2 mm/month in June (accounting for 52.9% of ET WB ) in the Upper Yangtze River Basin (UYRB). The deviations between ET WB and ET PQ range from 6.7 to 37.2 mm/month over the twelve months in the Middle Yangtze River Basin (MYRB), and the RMSE accounts for 37.1% of the variations of ET WB . The RMSEs are computed following Equation 2. In the YeRB and SRB, the deviations between ET WB and ET PQ are small, with RMSEs of only 6.4 and 8.7 mm/month, respectively. In the MRB, the RMSE between ET WB and ET PQ is the largest, 27.2 mm/month, and accounts for 33.5% of the variations of monthly mean ET WB .
The annual ET WB and ET PQ estimates are shown in Figure 4; the mean annual P CMDC , ET WB , and ET PQ results are shown in Table 3. In the UYRB, the largest deviation between annual ET WB and ET PQ is only 27.5 mm/yr, in 2014, and the RMSE between ET WB and ET PQ makes up only 2.6% of the mean annual ET. 
In the SRB, the TWSC has a large impact on annual ET: large deviations between ET WB and ET PQ occur in almost all years, and the RMSE reaches 11.5% of the mean annual ET WB . In the Minjiang River Basin (MRB), the RMSE represents 12.7% of the mean annual ET, with the largest deviation (291.8 mm/yr, 39.1% of total ET WB in that year) occurring in 2003.
The monthly mean TWSC from different GRACE solutions is shown in Figure A1, where the mean TWSC is the arithmetic mean of all TWSC estimates for the corresponding calendar month. Note that the TWSC derived from CSR-M compares favorably with the mean TWSC (Figure A1); thus the CSR-M TWSC is used for the estimate of ET WB in this study. The TWSC from CSRT-GSH.sf differs markedly from the other three TWSC results, as it exaggerates the monthly mean TWSC in the UYRB, MYRB, YeRB, SRB, and PRB; the differences may result from the scaling factors derived from CLM4.5 [43]. We checked the spatial distribution of the scaling factors (not shown) and found that they vary greatly within the basins, which suggests exaggerated TWSC. The maximum deviations between each pair of monthly mean TWSC estimates range from 10.7 mm/month in the YeRB to 35.6 mm/month in the MRB (Figure A1c,i). As the MRB has the smallest area and the most abundant precipitation, it is understandable that it shows the largest deviation of TWSC. The large deviation of TWSC between JPL-M.dsf and the other GRACE solutions in the HRB and MRB (Figure A1g,i) may result from the processing strategy and the coarse spatial resolution of JPL-M, since the areas of the two basins are small [54].

ET Estimated by Ignoring TWSC
The annual ET estimates based on different GRACE solutions are shown in Figure A2. The RMSEs among ET CSR-M (=ET WB ), ET JPL-M.dsf , ET CSRT-GSH.sf , and ET CSR-DDK4 are understandably smaller than the RMSEs between ET WB and ET PQ , and their interannual fluctuations are more consistent with each other than with those of ET PQ . We compute the standard deviations (STDs) among the four ET estimates from different GRACE solutions for every single year; the maximum STD is only 51.2 mm/yr, occurring in the MRB. The mean STD over 2003-2015 in each catchment ranges from 9.7 to 27.1 mm/yr (accounting for 1.8-3.9% of annual ET WB ), with the smallest in the YeRB and the largest in the MRB. In three catchments located in Southeast China (the PRB, the Huaihe River Basin (HuRB), and the MRB), the maximum STDs occur in 2003. In three other catchments in North China (the YeRB, SRB, and Liaohe River Basin (LRB)), the maximum STDs appear in 2011.
Figure 5 shows the monthly mean of ET estimates from different ET products. Their mean annual cycles are similar in all the catchments. Among the humid catchments, in the UYRB the other three ET products overestimate ET compared with ET WB for all months except December, and the maximum deviation occurs in July, the month with the most precipitation (Figure 5a). In the MYRB, the other ET estimates are larger than ET WB for all months except November, when ET WB increases in response to increased precipitation. In the PRB, the mean ET WB in July is less than that in June and August, and the mean ET WB in October is also less than that in September and November (Figure 5e). In the HuRB, ET WB increases rapidly in response to the sharp increase in precipitation in July (Figure 5h), which the three other ET results do not capture. ET WB also captures the irregular monthly mean precipitation changes from June to December in the MRB (Figure 5i). 
Among the semihumid and semiarid catchments, the two versions of GLDAS both show their maximum deviation from ET WB in September in the YeRB. In the LRB, ET GLDAS-2.1 exceeds the other ET estimates in most months. During the intense irrigation period of April and May, ET WB is significantly greater than the other ET estimates in the HRB. Overall, these ET results all show similar annual cycles, while ET WB can capture some irregular variations in monthly precipitation.  (Table A1). In the YeRB, the RMSEs are the smallest: 7.2, 7.5, and 13.0 mm/month. We also compute the proportion of each RMSE relative to the average of the monthly mean ET WB . The proportions are largest in the UYRB: 50.5% (vs ET GLDAS-1 ), 43.3% (vs ET GLDAS-2.1 ), and 43.1% (vs ET GLEAM ). The HuRB has the smallest proportions: 20.4%, 22.3%, and 25.9%, respectively.

Comparison of Different ET Products
The annual ET from different sources is illustrated in Figure 6, which shows large gaps among the different ET estimates. Among the humid catchments, in the UYRB and MYRB the other annual ET estimates are all clearly larger than ET WB . The mean deviations between ET WB and ET GLEAM reach 144.7 mm/yr (31.3% of mean annual ET WB ) and 88.0 mm/yr (12.8% of mean annual ET WB ) in the two catchments (Figure 6a,b). In the PRB, ET GLDAS-1 and ET GLEAM both overestimate the annual ET, and ET GLDAS-2.1 shows different interannual variations with respect to ET WB (Figure 6e). In the HuRB, all the ET estimates capture the drop of ET in 2011 due to reduced precipitation ( Figure A3h

Comparison of Different Precipitation and Runoff Inputs for ET Estimation
In all the catchments, the interannual fluctuations of precipitation from different sources show similar patterns (Figure A3), while the mean annual precipitation shows some differences (Figure 7). In the MYRB and YeRB, P CMDC is higher than P GLDAS-1 , which is similar to the comparison by Lv et al. [55] in the corresponding regions. In the SRB and LRB (Northeast China), P GLDAS-2.1 is prominently larger than the other three precipitation sources (Figure A3d,f and Figure 7b). Caution should be taken when using P GLDAS-2.1 in these two catchments. It should be noted that the annual P MSWEP is the smallest in all the catchments (Figure A3), and the mean P GLDAS-1 values are all less than those of P CMDC (Figure 7a).  Figure A4. Since the runoff is a modeled result, it faces more uncertainties than precipitation. Similar to the results of Lv et al. [55] for the YRB (UYRB and MYRB) and YeRB, the in situ runoff is significantly larger than Q GLDAS-1 . The interannual variations of runoff from different sources show similar patterns in most catchments except the YeRB and HRB, where the amounts of runoff are small. In all the catchments, Q GLDAS-2.1 is larger than Q GLDAS-1 and is closer to Q RSBC in most catchments, presumably due to some modifications in GLDAS-2.1 [56].
To explore the impact of precipitation and modeled runoff (GLDAS Noah LSM outputs) on ET estimates, we analyze the differences on both sides of the water balance equation. We first compute the deviations of mean annual ET, precipitation, runoff, and precipitation minus runoff for 2003-2015 (Figure 7). In the UYRB, the mean annual deviation between ET WB and ET GLDAS-1 reaches 200.9 mm/yr, and the mean annual deviation between P CMDC minus Q RSBC (expressed as P-Q) and P GLDAS-1 minus Q GLDAS-1 is close to the deviation of ET (Figure 7a). This deviation is mostly contributed by the deviation of runoff (−241.6 mm/yr). As Figure 7a shows, the deviations of P-Q are close to the deviations of ET in all the catchments. For the water balance (WB) with GLDAS-2.1 (Figure 7b), the deviations of P-Q are close to the deviations of ET in all the catchments except the PRB. In the PRB, the deviation of mean annual ET is only 5.3 mm/yr, while the deviation of mean annual precipitation reaches −27.9 mm/yr and the deviation of mean annual runoff is −4.34 mm/yr (Figure 7b). For GLEAM ET, we assess only the precipitation forcing data with the water balance method. As Figure 7c shows, in the YeRB, LRB, and HRB, the deviation of precipitation may explain most of the difference between annual ET WB and ET GLEAM . In the other catchments, the deviations of annual precipitation and of ET are somewhat opposite.

Uncertainty Estimation Results
Uncertainties of TWSC, P CMDC , Q RSBC , and ET WB are shown in Table 4. Large uncertainties of TWSC, exceeding 30 mm/month, appear in the PRB, HuRB, and MRB. These may result from the small study area (HuRB and MRB) and from large variations of TWSC caused by abundant precipitation (PRB and MRB). In three catchments (MYRB, PRB, and MRB), the annual precipitation is more than 1300 mm/yr, and the uncertainties of precipitation are also more than 11 mm/month. The uncertainties of runoff are similar to those of precipitation except in the YeRB, LRB, and HRB, where water use is intense and runoff is small. From Table 4, we can conclude that the uncertainties of monthly ET come mainly from TWSC, which is similar to the conclusions of Long et al. [1] and Pan et al. [13]. In almost all the catchments, the uncertainties of TWSC are two or even three times larger than the uncertainties of P CMDC , and they are also much larger than the uncertainties of Q RSBC . The uncertainties of annual ET are all larger than 45 mm/yr and come mainly from the uncertainties of annual precipitation.

Impact of TWSC on ET Estimates in Local Catchments
In the UYRB, MYRB, PRB, LRB, and HuRB, ET WB is typically smaller than ET PQ in the wet season from May to July (Figure 3), whereas from September to December ET WB is larger than ET PQ . The water balance equation explains this: during the wet season, TWSA usually increases and TWSC (ds/dt) is greater than 0, so ET WB is smaller than ET PQ (Figure 3). In contrast, in the dry season with less precipitation, TWSA generally decreases and TWSC (ds/dt) is smaller than 0, so ET WB is larger than ET PQ (Figure 3).
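The sign argument follows directly from the two definitions used in this study, ET WB as precipitation minus runoff minus TWSC and ET PQ as precipitation minus runoff. Written out with the same symbols:

```latex
ET_{WB} = P - Q - \frac{ds}{dt}, \qquad ET_{PQ} = P - Q
\quad\Longrightarrow\quad
ET_{WB} - ET_{PQ} = -\frac{ds}{dt}
% wet season:  ds/dt > 0  =>  ET_{WB} < ET_{PQ}
% dry season:  ds/dt < 0  =>  ET_{WB} > ET_{PQ}
```

The monthly deviation between the two estimates is therefore equal in magnitude to the monthly TWSC and opposite in sign.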
It should be noted that the impacts of TWSC on ET estimates are region-specific. On the monthly scale, ET WB is clearly larger than ET PQ from March to May in the HRB (Figure 3g), which is caused by the spring irrigation of wheat [13]. The differences for March, April, and May are 14.4, 22.1, and 21.5 mm/month (sum: 57.9 mm), respectively. This result is similar to the human-induced ET (60.0 ± 24.2 mm) estimated by Pan et al. [13] for the same months. In the SRB and LRB, by contrast, ET WB is clearly larger than ET PQ from August to October. Since the main crop in this region is corn, water consumption during its growth period is ongoing in these months. Meanwhile, there is a significant reduction in P CMDC (−40.3 mm/month) in September relative to August (Figure 3f), which differs from the HRB. Because agricultural water consumption differs between regions, the monthly TWSC is region-specific, and so are the deviations between ET WB and ET PQ .
On the other hand, in the MYRB, PRB, and MRB, the RMSEs between the monthly means of ET WB and ET PQ are significantly larger than those for other catchments. As Figure A2 shows, the amplitudes of monthly mean TWSC are larger than in other catchments. These catchments are all located in South China, with abundant precipitation [57]. During the rainy season, as water is stored and TWSC is positive, the monthly ET WB is smaller than ET PQ in all the catchments. Because the months of the rainy season differ between catchments, these catchments also show regional heterogeneity. The region-specific impacts of TWSC on ET deserve more research.
On the annual scale, the sizeable differences between annual ET WB and ET PQ can mostly be explained by large precipitation anomalies (Figure A3). Notably, all the STDs of mean annual ET WB are less than those for ET PQ (see Table 3), indicating smaller interannual ET WB fluctuations. In years with more precipitation, such as 2008 and 2014 in the UYRB and 2012 and 2013 in the SRB, TWSA increases and TWSC is greater than 0, so, based on the water balance equation, ET WB should be less than ET PQ in the given year. Conversely, in years with a precipitation deficit, TWSA usually decreases and ET WB is higher than ET PQ . Terrestrial water storage thus acts as a reservoir in the terrestrial water cycle, impounding water and reducing the amount that returns to the atmosphere through evapotranspiration or other forms in wet years, but releasing water in dry years. We can conclude that estimating annual ET simply by subtracting runoff from precipitation would overestimate the interannual fluctuations of ET.
The difference between mean annual ET WB and ET PQ reflects the long-term rate of change of TWSA in a catchment. ET WB is significantly higher than ET PQ in the HRB, where water depletion (mainly from groundwater) is fast [58]. In the LRB and YeRB, the mean annual ET WB is also larger than ET PQ , which likewise indicates water depletion there [58]. Conversely, TWSA increases in the UYRB, MYRB, SRB, and PRB, and therefore the mean annual ET WB is typically less than ET PQ .
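Averaging the water balance equation over the whole study period makes this link explicit. In the short derivation below, s denotes TWSA, t_0 the start of the period, and T its length; the overbar for period means and the symbols t_0 and T are notation introduced here for illustration.

```latex
\overline{ET_{WB}} - \overline{ET_{PQ}}
  = -\frac{1}{T}\int_{t_0}^{t_0+T}\frac{ds}{dt}\,dt
  = -\frac{s(t_0+T) - s(t_0)}{T}
```

A catchment whose storage declines over the period, s(t_0+T) < s(t_0), therefore has a mean ET WB above the mean ET PQ, while a catchment that gains storage shows the opposite.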

The Differences between ET WB and other ET Estimates
On the monthly scale, the RMSEs between ET WB and ET from the other GRACE solutions are comparatively small, and ET CSR-DDK4 is closest to ET WB among the three GRACE solutions (Table A1). In the YeRB, SRB, and LRB, the maximum RMSEs are between ET WB and ET GLEAM ; all of these catchments are semiarid and located in North China. In the UYRB and MRB, the RMSEs between ET WB and ET GLDAS-1 are the largest. In the MYRB, PRB, HRB, and HuRB, the RMSEs between ET WB and ET PQ are the largest, which indicates that ignoring TWSC has the greatest impact on the ET estimate there. It should be noted that all of these catchments are humid except the HRB, which has intense water consumption.
On the annual scale, the RMSEs between ET WB and ET from the three GRACE solutions are small, and ET CSRT-GSH.sf is closest to ET WB among the three solutions (Table A2). This indicates some differences in the TWSC estimates between the monthly and annual scales. The RMSEs between ET WB and ET from other products markedly exceed those between ET WB and ET from other GRACE solutions. In the UYRB, MYRB, PRB, and MRB, which are all humid regions, the RMSEs between ET WB and ET GLDAS-1 are the largest. It should be noted that in the UYRB, the RMSE between ET WB and ET PQ is even less than the RMSEs between ET WB and ET from other GRACE solutions, which indicates that the interannual variations of TWSC are very small in this catchment. ET estimates from different GRACE solutions generally show relatively small deviations in all the catchments, whereas ET estimates from different products generally show relatively large deviations in the humid catchments.

Impact of Precipitation and Modeled Runoff from a Water Balance Perspective
The RMSEs between annual ET WB and other ET results are further analyzed. The results are shown in Table A3 (WB - GLDAS-1), Table A4 (WB - GLDAS-2.1), and Table A5 (WB - GLEAM). In Table A3, for the UYRB, MYRB, SRB, PRB, and MRB, the RMSEs between ET WB and ET GLDAS-1 can be markedly reduced if the deviations of both P GLDAS-1 and Q GLDAS-1 are taken into consideration. Although GLDAS ET outputs are not computed with the water balance method [9,59], if the accuracy of P GLDAS-1 and Q GLDAS-1 can be improved in China (e.g., by verifying modeled Q GLDAS-1 against in situ runoff), the ET estimate would also benefit from improved runoff outputs based on the water balance equation during the simulation process. In the YeRB, LRB, HRB, and HuRB, the RMSEs are also reduced, but by smaller proportions than in the catchments above. In the YeRB and LRB, if we only consider the difference of runoff, the RMSEs would even increase, whereas in the HRB the RMSE is slightly reduced. Since the outflows are much smaller in these three catchments than in the other catchments, the deviation of runoff is itself small (Figure 7a). Unlike in the humid catchments, in the three semiarid catchments (YeRB, LRB, and HRB), the RMSEs of ET-P are reduced relative to those of ET, which indicates that deviations of precipitation forcing data indeed contribute to deviations of ET.
In Table A4, the RMSEs between ET WB and ET GLDAS-2.1 are reduced in all the catchments except the HRB when the deviations of precipitation and runoff are considered. In the YRB (UYRB and MYRB), the RMSEs between Q RSBC and Q GLDAS-2.1 account for most of the deviations. In the HRB and MRB, the deviations between ET WB and ET GLDAS-2.1 do not result from the precipitation difference. It should be noted that in the LRB, if the precipitation inconsistency is considered (Figure A3f), the RMSE between ET WB and ET GLDAS-2.1 is dramatically reduced, which can explain the overestimation of the annual ET by ET GLDAS-2.1. In the HRB, with the deviations of precipitation and modeled runoff considered, the proportion of the RMSE increased (Table A4). Since the HRB is heavily influenced by human activities [13,31], the RMSE between mean annual ET WB and GLDAS ET outputs is mainly attributable to anthropogenic activities [13].
As for the RMSEs between ET WB and ET GLEAM, we only compute their precipitation difference (Table A5). In the YeRB, LRB, HRB, and HuRB, the RMSEs between ET WB and ET GLEAM are reduced if the precipitation difference is taken into consideration; the YeRB, LRB, and HRB are semiarid catchments. Figure A3 also shows a large deviation between P CMDC and P MSWEP. The proportion of the RMSE reduced in the YeRB reaches 72.8%. In the UYRB, MYRB, SRB, PRB, and MRB, the RMSEs would even increase, which indicates that the deviations of precipitation contribute little or nothing to the deviation between ET GLEAM and ET WB. In the MRB, the RMSE rapidly increases from 86.9 mm/yr for ET to 228.7 mm/yr for ET-P, although the difference between their annual precipitation is actually small (Figure A3i).
Here we explore the deviation between ET WB and GLDAS or GLEAM ET based on the water balance equation. RMSE(ET) would decrease if the deviations of P GLDAS and modeled Q GLDAS in the GLDAS LSM are taken into consideration. In four catchments (YeRB, LRB, HRB, and HuRB), precipitation differences contribute to the deviation between ET WB and ET GLEAM. However, the cases in which RMSE(ET-P), RMSE(ET-Q), and RMSE(ET-(P-Q)) increase relative to RMSE(ET) should be further explored. We do not investigate forcing variables other than precipitation that are used to derive ET, e.g., radiation, air temperature, and snow water equivalent [6,9,60]. Therefore, a future intercomparison could be performed to identify the impact of these variables on ET estimates.
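The four quantities compared in this section can be sketched as follows, using the definitions given in the caption of Table A3: each RMSE is taken between an observation-based series and the corresponding model series after subtracting precipitation, runoff, or both. The percentage change relative to RMSE(ET) is an assumed form of the "proportion of RMSE changed". All names and values are hypothetical.

```python
import math


def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def sub(x, y):
    """Element-wise difference of two series."""
    return [xi - yi for xi, yi in zip(x, y)]


def rmse_breakdown(et_wb, p_obs, q_obs, et_mod, p_mod, q_mod):
    """RMSE(ET), RMSE(ET-P), RMSE(ET-Q), and RMSE(ET-(P-Q)) as in the Table A3 caption."""
    pq_obs = sub(p_obs, q_obs)
    pq_mod = sub(p_mod, q_mod)
    return {
        "ET": rmse(et_wb, et_mod),
        "ET-P": rmse(sub(et_wb, p_obs), sub(et_mod, p_mod)),
        "ET-Q": rmse(sub(et_wb, q_obs), sub(et_mod, q_mod)),
        "ET-(P-Q)": rmse(sub(et_wb, pq_obs), sub(et_mod, pq_mod)),
    }


def percent_change(value, base):
    """Change of an RMSE relative to RMSE(ET), in percent (assumed definition)."""
    return 100.0 * (value - base) / base


# Hypothetical case: the model ET is 50 mm/yr too high only because its
# precipitation forcing is 50 mm/yr too high; runoff is identical.
et_wb = [400.0, 420.0, 410.0]
p_obs = [500.0, 530.0, 515.0]
q_obs = [100.0, 110.0, 105.0]
et_mod = [450.0, 470.0, 460.0]
p_mod = [550.0, 580.0, 565.0]
q_mod = [100.0, 110.0, 105.0]

out = rmse_breakdown(et_wb, p_obs, q_obs, et_mod, p_mod, q_mod)
for name, value in out.items():
    print(name, round(value, 1), round(percent_change(value, out["ET"]), 1))
```

In this constructed case RMSE(ET-P) drops to zero, which is the pattern the text interprets as a precipitation-driven deviation. An RMSE that grows after the subtraction suggests that the forcing difference does not explain the ET mismatch.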

Impact of Groundwater Baseflow and Water Diversion on ET Estimates
Based on the water balance equation, groundwater inflow and outflow across the basin boundary would also affect the estimate of ET. As an example, in the LRB, according to the estimate of groundwater outflow from Zhang and Li [61], the outflow is 0.61 × 10⁸ m³/yr, and its impact on the annual ET is only ~0.3 mm/yr. Therefore, it is negligible relative to the annual ET (417.7 ± 46.5 mm/yr).
Water diversion into and out of a basin is also part of the basin water balance. In China, the South-to-North Water Diversion includes the east route, the middle route, and the west route projects (http://nsbd.mwr.gov.cn/). The west route project has not been built yet. The east route starts in the mainstream of the Lower Yangtze River and transports water to Shandong Province, which is not in our study area. The middle route transports water from the MYRB to the HRB, passing through the HuRB and the YeRB. It first transported water to the north in October 2014, with a water volume of 21.67 × 10⁸ m³ in the first year. The impact on the ET estimate is 3.1 mm/yr for the MYRB, which is relatively small compared to the annual ET (689.7 ± 50.3 mm/yr). If the water is supplied entirely to the HRB, the impact on the ET estimate will reach 15.18 mm/yr, exerting a certain influence on the ET estimate in the HRB (494.2 ± 37.2 mm/yr). If we estimate the ET after 2015 in this region, it is necessary to account for the water diversion.
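The impacts quoted above are water volumes converted to an equivalent water depth over the catchment, i.e., volume divided by area. A minimal sketch of that conversion follows. Because the catchment areas are not restated in this section, the example back-calculates the area implied by each quoted depth instead of assuming one; those implied areas are illustrative and are not values reported by the study. The function names are hypothetical.

```python
def volume_to_depth_mm(volume_m3, area_km2):
    """Convert a water volume (m^3) into an equivalent depth (mm) over an area (km^2)."""
    area_m2 = area_km2 * 1.0e6
    return volume_m3 / area_m2 * 1000.0


def implied_area_km2(volume_m3, depth_mm):
    """Area (km^2) over which a volume (m^3) corresponds to a given depth (mm)."""
    return volume_m3 / (depth_mm / 1000.0) / 1.0e6


# First-year middle-route diversion volume quoted in the text.
diversion_m3 = 21.67e8

# Areas implied by the quoted impacts (back-calculated, not reported values).
hrb_area = implied_area_km2(diversion_m3, 15.18)
myrb_area = implied_area_km2(diversion_m3, 3.1)

print(f"implied HRB area:  {hrb_area:,.0f} km^2")
print(f"implied MYRB area: {myrb_area:,.0f} km^2")
print(f"round trip HRB depth: {volume_to_depth_mm(diversion_m3, hrb_area):.2f} mm")
```

The same volume produces a much larger depth over the smaller receiving catchment, which is why the diversion matters more for the HRB than for the MYRB.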

Impact of Spatial Scale on ET Estimate
The area of the MRB is only 5.45 × 10⁴ km², which is less than the typical GRACE footprint (20 × 10⁴ km²). However, some studies have demonstrated that GRACE is capable of detecting TWSA in local regions with an area smaller than the GRACE resolution if the signal amplitude is large enough [44,62,63]. As the MYRB receives the most abundant precipitation among these catchments (Table 3), TWSA should have a higher signal-to-noise ratio (SNR), and TWSC tends to have higher reliability. On the other hand, the maximum uncertainty of the monthly ET estimate is indeed in the MRB, where the uncertainties of monthly TWSC, precipitation, and runoff are also large (Table 3). Thus, we recommend caution when using TWSA estimates in regions with a small area.

Conclusions
In this study, ET was calculated based on the water balance equation in nine exorheic catchments of China. The impacts on ET estimates of ignoring terrestrial water storage changes, and of using terrestrial water storage changes from different GRACE solutions, were analyzed. An intercomparison between ET WB and ET estimates from GLEAM and the GLDAS land surface models was also conducted. The comparison was carried out on the monthly and annual scales.
We found that the impact of ignoring terrestrial water storage changes on the estimate of ET is noteworthy. The RMSEs between monthly mean ET WB and ET PQ range from 6.4–27.2 mm/month (17.5–45.2% of the corresponding mean monthly ET). The annual RMSEs between ET WB and ET PQ range from 12.0–105.8 mm/yr (2.6–12.7% of the corresponding annual ET) among these catchments. The STDs of annual ET WB over the study periods are all less than those of ET PQ, which indicates that simply estimating the annual ET by subtracting runoff from precipitation would overestimate the interannual variations of ET. Thus, TWSC should not be ignored in the estimate of ET.
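One way to see why ET PQ can show larger interannual variability than ET WB is to write ET PQ in terms of ET WB and take variances. This is a generic identity under the assumed balance ET WB = P - Q - TWSC, not a derivation given by the authors:

```latex
\begin{aligned}
ET_{PQ} &= P - Q = ET_{WB} + TWSC, \\
\operatorname{Var}(ET_{PQ}) &= \operatorname{Var}(ET_{WB}) + \operatorname{Var}(TWSC)
  + 2\,\operatorname{Cov}(ET_{WB}, TWSC).
\end{aligned}
```

The STD of ET PQ therefore exceeds that of ET WB whenever Var(TWSC) + 2 Cov(ET WB, TWSC) > 0, that is, unless annual TWSC is sufficiently negatively correlated with ET WB.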
The ET estimates from different GRACE solutions show relatively small deviations. The RMSEs among different GRACE solutions in most catchments are less than 10 mm/month on the monthly scale and 30 mm/yr on the annual scale. In all the catchments except the HRB and MRB, CSR-GSH.sf solutions exaggerate the monthly mean TWSC, and caution should be taken when applying this solution to derive TWSC.
Different precipitation products are assessed to explain the inconsistency between different ET products and regional ET from a water balance perspective. The difference between ET WB and ET from the GLDAS land surface models can be partly explained by deviations in the precipitation forcing data in several catchments, especially in the LRB. Furthermore, the ET estimates would also benefit from improved runoff outputs during the simulation process. In the three semiarid catchments and the HuRB, the RMSEs between ET WB and ET GLEAM can be reduced, provided that the difference of precipitation is taken into consideration. However, the increased RMSEs obtained when deviations of precipitation forcing data and modeled runoff are considered in the estimate of ET deserve further exploration.
The ET estimates show some notable interannual fluctuations, which warrant further study. In the SRB and MRB, there may be some positive trends, which likely result from increased precipitation or other effects. These trends are also worthy of further research. In summary, our study emphasizes the capability of GRACE in estimating ET on the basin scale. The ET estimate based on water balance can serve as a benchmark for other ET products, which would benefit the GLDAS LSMs and remote sensing ET estimates.

Acknowledgments: The authors thank Fei Li, Wei Feng, and Haoming Yan for their insightful suggestions and discussions and thank Fan Xie for her help. We thank four anonymous reviewers for their comments, which helped to improve this paper.

Table A3. The RMSEs between mean annual ET WB and ET GLDAS-1 (expressed as RMSE(ET)). The RMSEs between annual ET WB minus P CMDC and annual ET GLDAS-1 minus P GLDAS-1 (expressed as RMSE(ET-P)). The RMSEs between annual ET WB minus Q RSBC and annual ET GLDAS-1 minus Q GLDAS-1 (expressed as RMSE(ET-Q)). The RMSEs between annual ET WB minus (P CMDC minus Q RSBC) and annual ET GLDAS-1 minus (P GLDAS-1 minus Q GLDAS-1) from 2003-2015 (expressed as RMSE(ET-(P-Q))). The proportional changes in RMSE are given for RMSE(ET-P), RMSE(ET-Q), and RMSE(ET-(P-Q)), each relative to RMSE(ET).