Methods for Earth-Observing Satellite Surface Reflectance Validation

In this study, an initial validation of the Landsat 8 (L8) Operational Land Imager (OLI) Surface Reflectance (SR) product was performed. The OLI SR product is derived from the L8 Top-of-Atmosphere product via the Landsat Surface Reflectance Code (LaSRC) software and is generated by the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center. The goal of this study is to develop and evaluate a validation methodology for the OLI Level 2 (L2) SR product. Validation was performed using near-simultaneous ground truth SR measurements during Landsat 8 overpasses at 13 sites located in the U.S., Brazil, Chile and France. The ground truth measurements consisted of field spectrometer measurements, automated hyperspectral ground measurements operated by the Radiometric Calibration Network (RadCalNet) and derived SR measurements from Airborne Observation Platforms (AOP) operated by the National Ecological Observatory Network (NEON). The 13 sites cover a broad range of 0–0.5 surface reflectance units across the reflective solar spectrum. Results show that the mean reflectance difference between OLI L2 SR products and ground truth measurements for the 13 validation sites and all bands was under 2.5%. The largest uncertainties, 11% and 8%, were found in the Coastal Aerosol (CA) and Blue bands, respectively, whereas the longer wavelength bands were within 4%. Results consistently indicated similarity between the OLI L2 SR product and ground truth data, especially in longer wavelengths over dark and bright targets, while less reliable performance was observed in shorter wavelengths and over sparsely vegetated targets.


Introduction
Since 1972, users have relied upon Landsat satellite data for historical study of land surface change. However, post-production processing must be performed to analyze land surface change and create application-ready data sets. To alleviate this burden, a "higher-level" Landsat Surface Reflectance (SR) product was developed by the U.S. Geological Survey (USGS) as an initial effort to support land surface change studies. Landsat SR is an essential product for users who need to monitor the land surface reliably, and it is an input for developing higher-level surface geophysical parameters used to detect overall land cover changes [1].
This study focuses on the validation of the Landsat 8 Operational Land Imager (OLI) Level 2 (L2) SR product as generated by the USGS Earth Resources Observation and Science (EROS) Center. SR is derived from satellite Level 1 (L1) top-of-atmosphere (TOA) reflectance corrected for the temporally, spatially and spectrally varying scattering and absorbing effects of atmospheric gases and aerosols. Generally, the calibration of Earth-observing satellites uses the TOA L1 product to monitor the satellite's radiometric response over time, whereas validation provides an accuracy assessment of the SR product (L2), which is derived from the TOA (L1) product. Figure 1 illustrates the difference between the L1 and L2 products. The spatial, spectral and radiometric characteristics of the L8 OLI L2 SR product provide the remote sensing community with a vital source of information on environmental change.
For instance, vegetation biophysical characteristics such as leaf area index values, canopy cover and biomass have been extensively retrieved from the use of SR data.
Thematic forest classifications can be obtained from SR data to quantify forest productivity and cover density. Additionally, through evaluating land cover of any type, the SR data can be used to detect seasonal dynamics by using temporal Landsat data [3].
Given the high demand for SR product applications, it is necessary to validate the OLI L2 surface reflectance products to ensure the accuracy and precision of the sensor measurements. However, because surface measurements at the required spatial and spectral resolution are often lacking for a given site, direct validation of OLI L2 products is problematic. Thus, the primary focus of this work is to develop methods to validate the atmospherically corrected OLI L2 SR product and to provide a preliminary validation of the product.

Landsat Surface Reflectance Code (LaSRC) Description
The Landsat Surface Reflectance Code (LaSRC) algorithm was developed to derive the OLI L2 SR products through atmospheric correction of the OLI L1 products [6]. The LaSRC code is publicly available [7] and has been used operationally by the USGS and NASA to generate Landsat analysis-ready data and SR products [8]. LaSRC is based on the 6S radiative transfer code [9]; it performs aerosol inversion with an improved determination based on the red-to-blue band reflectance ratio.
[...] and South Dakota (SD). A broad geographic distribution provides two main opportunities for validation: first, the OLI L2 SR product is analyzed and compared over different land cover types, and, second, the OLI L2 SR product validation is considered under different atmospheric conditions. More details on ground truth measurement sites and data processing follow in Section 2.
OLI validation via ground truth measurements used two sets of acquired SR measurements: (i) field SR measurements made by a field spectrometer carried over the sites in SD (USA), Brazil and Chile [11]; and (ii) automated hyperspectral ground measurements operated by the Radiometric Calibration Network (RadCalNet).
The field SR measurements were made with a handheld hyperspectral field spectrometer designed to cover the solar spectral range. The spectrometer was carried across the field in a predetermined pattern and was equipped with a fiber optic probe to retrieve the spectral signatures of the ground sites used in this study. Figure 3 shows a surface reflectance collection at the South Dakota State University (SDSU) site [13]. The foreoptic is mounted on a boom arm that keeps it away from the user, thereby ensuring that the surface being measured is free of shadows. All SR measurements are made at predetermined points throughout the site collection [14]. The field spectrometer gives a 1.4-nm spectral resolution from 350 to 1000 nm and a 10-nm spectral resolution for the 1000-2500-nm spectral range. The spectrometer output is interpolated within the data collection software, and the results are sampled at a 1-nm spacing across the entire spectral range [15-19]. Figure 4 shows the hyperspectral surface reflectance curves of the sites analyzed in this study. The SR measurements are determined by spatially averaging over the test area and band averaging for each of the OLI bands [13]; more details on the sites and data processing are presented in Section 2.
The second set of ground truth measurements used in this study was based on SR measurements provided by RadCalNet [13].
The RadCalNet operates automated ground instrumentation that provides continuous measurement of atmosphere and in-situ SR on cloud-free days. Surface measurements are provided at nadir view every 30 min from 9:00 to 15:00 local time, over a spectral range of 400 nm up to 2500 nm at a 10 nm spectral resolution [20].
RadCalNet-operated instrumentation consists of (i) four ground-viewing radiometers (GVRs) installed to make the in-situ hyperspectral SR measurements [21] (Figure 5a); and (ii) a Cimel sun photometer (part of the Aerosol Robotic Network) used to make atmospheric measurements (Figure 5b) such as the aerosol optical depth, the Angstrom exponent and water vapor [22,23].
The RadCalNet GVR is a multispectral eight-channel radiometer that covers the visible to the SWIR bands [24]. Data obtained from the GVR must be converted to hyperspectral and spatially averaged SR results similar to those obtained via a field spectrometer. Therefore, the SR of each GVR multispectral channel is calculated using Equation (1):

$$\rho = \frac{\pi\, C\, V}{\dfrac{E_0 \cos\theta\; \tau_\alpha}{d^2} + E_{sky}} \qquad (1)$$

where $\rho$ is the multispectral SR for a given band (unitless), $C$ is the GVR calibration coefficient ((W m−2 sr−1 μm−1)·V−1), $V$ is the GVR output voltage (V), $d$ is the Earth-Sun distance normalized to its average, $\theta$ is the solar zenith angle, $\tau_\alpha$ is the solar beam atmospheric transmission (unitless), $E_0$ is the spectral solar exoatmospheric irradiance (W m−2·μm−1), and $E_{sky}$ is the diffuse spectral sky irradiance (W m−2·μm−1). The variables $E_0$ and $E_{sky}$ are determined using MODTRAN, whereas the calibration coefficient $C$ is determined using the solar-radiation-based calibration (SRBC) technique [25].
Figure 5. (a) Ground-viewing radiometer, 1.5 m above the surface [13]. A solar panel (top) is used to charge the battery that powers the system; (b) Cimel sun photometer [22].
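The per-channel calculation can be illustrated with a short sketch. It assumes the usual radiance-over-irradiance form, ρ = πCV / (E0·cosθ·τ/d² + E_sky), built from the variables defined above; the function name and every numeric input below are hypothetical placeholders, not RadCalNet values.

```python
import math

def gvr_surface_reflectance(voltage, cal_coeff, e0, e_sky, tau, sza_deg, d_au=1.0):
    """Surface reflectance (unitless) of one GVR channel.

    Assumed form: rho = pi * C * V / (E0 * cos(theta) * tau / d^2 + E_sky).
    """
    # Upwelling radiance seen by the radiometer (W m-2 sr-1 um-1)
    radiance = cal_coeff * voltage
    # Direct solar irradiance reaching the surface (W m-2 um-1)
    direct = e0 * math.cos(math.radians(sza_deg)) * tau / d_au ** 2
    # Total downwelling irradiance: direct beam plus diffuse sky
    total_irradiance = direct + e_sky
    return math.pi * radiance / total_irradiance

# Hypothetical inputs chosen only to exercise the function
rho = gvr_surface_reflectance(
    voltage=2.0, cal_coeff=50.0, e0=1500.0, e_sky=100.0, tau=0.9, sza_deg=30.0
)
print(f"channel surface reflectance: {rho:.4f}")
```

In practice E0 and E_sky would come from a radiative transfer run (MODTRAN in this study) and C from the SRBC calibration.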
The calculated multispectral GVR SR data (from Equation (1)) are then converted to hyperspectral SR by rescaling a hyperspectral "reference" spectrum at the time of the Landsat 8 overpass. This reference spectrum was selected from a reference library created from 12 years of in situ measurements at the RadCalNet Railroad Valley site. An illustration of how multispectral GVR SR data are used to scale reference hyperspectral SR data is shown in Figure 6 [26-28].
Figure 6. Ground-viewing radiometer (GVR) multispectral SR data used to scale original hyperspectral SR reference data to create hyperspectral SR data for RadCalNet [13]. The reference data (blue) are scaled up or down based on the GVR data (black dots).
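One simple way to picture the rescaling step is sketched below. The exact RadCalNet scaling scheme is not described here, so the approach is an assumption: ratios of GVR to reference reflectance are computed at each channel, interpolated linearly across wavelength, and applied to the reference spectrum. The channel wavelengths and the reference spectrum are hypothetical.

```python
import numpy as np

def scale_reference_spectrum(ref_wl, ref_sr, gvr_wl, gvr_sr):
    """Scale a hyperspectral reference spectrum to match multispectral GVR data.

    Assumed scheme: per-channel ratio, linearly interpolated in wavelength
    and held constant outside the first and last channel.
    """
    ref_at_gvr = np.interp(gvr_wl, ref_wl, ref_sr)   # reference sampled at GVR channels
    ratios = gvr_sr / ref_at_gvr                     # scale factor per channel
    scale = np.interp(ref_wl, gvr_wl, ratios)        # scale factor per reference wavelength
    return ref_sr * scale

# Hypothetical reference spectrum on a 10 nm grid from 400 to 2500 nm
ref_wl = np.arange(400.0, 2501.0, 10.0)
ref_sr = 0.30 + 0.05 * np.sin(ref_wl / 400.0)

# Hypothetical centre wavelengths (nm) for the eight GVR channels
gvr_wl = np.array([450.0, 550.0, 650.0, 850.0, 1000.0, 1250.0, 1550.0, 2200.0])
# Synthetic GVR readings: the surface is 10% brighter than the reference
gvr_sr = 1.1 * np.interp(gvr_wl, ref_wl, ref_sr)

scaled = scale_reference_spectrum(ref_wl, ref_sr, gvr_wl, gvr_sr)
```

With the synthetic inputs above, every channel ratio is 1.1, so the whole reference curve is lifted by 10%, which mirrors the "scaled up or down" behavior described for Figure 6.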

OLI L2 Validation Approach Using Airborne Observation Platforms (AOPs)
While ground truth measurements remain the most direct validation approach, the associated footprint is usually very limited. The goal of this section is to provide an alternative method to validate the performance of the OLI L2 SR product by looking at derived SR from Airborne Observation Platforms (AOP) operated by the National Ecological Observatory Network (NEON). NEON SR products are, in turn, considered as independent truth for the purposes of this study. Scale effects between the Landsat and NEON AOP will be discussed in more detail in Section 2.
On board the NEON AOPs is an imaging spectrometer, which provides potential for high-precision observations [29]. The AOP flies at an approximate altitude of 1000 m; therefore, the radiance received by the NEON Imaging Spectrometer (NIS) is subject to atmospheric effects such as scattering and absorption caused by gases and aerosols. Hence, it is necessary to convert the measured NIS spectral radiance to surface reflectance and remove atmospheric effects. The NEON-derived SR product is a calibrated and atmospherically corrected product distributed as scaled reflectance. Therefore, the NIS-derived SR product can be used as ground truth to validate the OLI L2 SR products [30].
NEON flights are conducted annually over strategically located sites across the U.S. within 20 eco-climatic domains [31], allowing OLI L2 SR product validation over numerous domains that represent regions of distinct landforms, vegetation, climate and ecosystem dynamics [32]. Figure 7 shows a map of the NEON domains that were analyzed in this study (highlighted in red); the sites are explained in more detail in Section 2.
Figure 7. NEON domains (highlighted in red) used in this study across the U.S., generated using MATLAB [31].
The pushbroom NIS measures upwelling radiance over 426 spectral bands in the solar region between 380 and 2500 nm with a spectral sampling of 5 nm. The NIS cross-track swath is ~928 m, with an instantaneous field of view (IFOV) of 1.0 mrad at an altitude above ground level of ~1000 m, and covers an area of 5-20 km × 0.600 km × number of flightlines. The ground sampling distance (GSD) is 1 m [33]. The NIS concept is illustrated in Figure 8, and the main differences between the NIS and OLI sensors are listed in Table 1 [34,35].
This study validated the atmospherically corrected OLI L2 SR product. The L2 product scenes for the sites in Table 2 were obtained from the USGS Earth Explorer on-demand processing system web portal using the 2013 version of the LaSRC code [9]. The number of OLI scenes was based on the L8 coincident overpasses listed in Table 3. The USGS system generates the full suite of LaSRC-based parameters, including TOA reflectance, AOT, SR, and pixel-level quality flags [40,41].
Hyperspectral SR data for the SDSU, Bahia, and Atacama sites were extracted from ASCII text files retrieved from the SDSU Image Processing Laboratory archive.
Hyperspectral SR data for the RVUS, LCFR, and GBNA sites were obtained from the RadCalNet portal [12].

Data Processing
A match-up procedure in both time and location was performed to pair the ground truth data with the OLI data in order to validate the OLI SR product. The ground measurements were taken during OLI overpasses; therefore, uncertainties caused by changing atmospheric conditions were minimized. The following procedure was used to process the OLI L2 product data:
1. Identify the set of OLI overpass dates when ground measurements were simultaneously acquired.
2. Perform cloud/shadow/artifact screening in each OLI image using the associated Quality Assurance (QA) band information. This information consists of integer values in which each bit represents a quality condition. Visual inspection of the images was also performed to verify the QA band assessment; ROI pixels visually showing clouds or shadows were excluded from further analysis.
3. Convert the artifact-free ROI DNs of the OLI images into units of reflectance using the scaling factors given in the associated OLI product metadata:

$$\rho_{OLI,i} = SF \times DN_i \qquad (2)$$

where $\rho_{OLI,i}$ is the OLI L2 SR in band $i$; $DN_i$ is the digital number (pixel value) of the OLI L2 product in band $i$; and $SF$ is the multiplicative scale factor used to convert DN to SR in band $i$.
4. Convert the ground truth reflectance to the OLI multispectral bands. Ground truth measurements are hyperspectral in nature (as noted in Section 1.2); therefore, they must be matched to the OLI reflective bands using the OLI prelaunch Relative Spectral Response (RSR) curves for the CA-SWIR2 bands, shown in Figure 9 [42]. The multispectral reflectance of the ground truth measurements is calculated by weighting the continuous ground truth reflectance with the RSR function of the corresponding OLI band:

$$\rho_{GT,i} = \frac{\int_{\lambda_1}^{\lambda_2} \rho_{GT}(\lambda)\, RSR_i(\lambda)\, d\lambda}{\int_{\lambda_1}^{\lambda_2} RSR_i(\lambda)\, d\lambda} \qquad (3)$$

where $\rho_{GT,i}$ is the multispectral ground truth SR in band $i$; $RSR_i(\lambda)$ is the OLI spectral response function of the corresponding band; $\rho_{GT}(\lambda)$ is the hyperspectral ground truth reflectance; and $\lambda_1$, $\lambda_2$ are the lower and upper wavelengths of the spectral range of band $i$.
RadCalNet data are generated at a 30-min temporal interval. Therefore, linear interpolation in time was applied to the RadCalNet data to estimate the measurement at the OLI overpass time.
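The band-averaging and time-interpolation steps above can be sketched as follows. This is a minimal illustration, assuming the RSR-weighted mean described in the text; the Gaussian response curve, the flat reflectance spectrum and the two RadCalNet time stamps are hypothetical stand-ins for the real RSR tables and data.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal integration of y over x (kept local to avoid NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def band_average(wl, sr, rsr):
    """RSR-weighted mean of a hyperspectral reflectance spectrum over one band."""
    return trapezoid(sr * rsr, wl) / trapezoid(rsr, wl)

def interpolate_to_overpass(times_h, sr_series, overpass_h):
    """Linear interpolation of a reflectance time series to the overpass time (hours)."""
    return float(np.interp(overpass_h, times_h, sr_series))

# Hypothetical 1 nm spectrum and a Gaussian stand-in for a Green-band RSR
wl = np.arange(400.0, 701.0, 1.0)
sr = np.full_like(wl, 0.25)
rsr = np.exp(-0.5 * ((wl - 560.0) / 20.0) ** 2)
green = band_average(wl, sr, rsr)

# Hypothetical band reflectance reported at 10:00 and 10:30 local time
times_h = np.array([10.0, 10.5])
series = np.array([0.20, 0.22])
at_overpass = interpolate_to_overpass(times_h, series, 10.25)
```

Dividing by the integral of the RSR normalizes the weighting, so a spectrally flat target returns its own reflectance regardless of the shape of the response curve.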

Analysis
After processing the ground truth and OLI L2 SR data as described in Section 2.1.2 for all six ground truth sites and the overpasses in Table 3, statistical analyses were performed to assess the accuracy of the OLI L2 SR ($\rho^{OLI}_{i,\lambda}$) against the ground truth SR measurements ($\rho^{GT}_{i,\lambda}$). The analysis procedure was as follows.
First, for each band, scatterplots of the OLI L2 SR (vertical axis) vs. the corresponding ground truth SR measurements (horizontal axis) were generated for all of the overpass dates at a given site.
Second, the mean reflectance difference between the OLI L2 SR and ground truth measurements was calculated for the ROI at each of the sites listed in Table 3, on all overpass dates. The mean reflectance difference was used to describe the bias of the OLI L2 product (negative if OLI SR is underestimated and positive if OLI SR is overestimated):

$$\overline{\Delta\rho}_{\lambda} = \frac{1}{n_\lambda} \sum_{i=1}^{n_\lambda} \Delta\rho_{i,\lambda} \qquad (4)$$

where $\overline{\Delta\rho}_{\lambda}$ is the mean reflectance difference between the OLI L2 SR and ground truth measurements, and $n_\lambda$ is the number of ROI pixels for each band $\lambda$. $\Delta\rho_{i,\lambda}$ is the reflectance difference between the OLI L2 reflectance and the ground truth measurement in band $\lambda$ and pixel $i$:

$$\Delta\rho_{i,\lambda} = \rho^{OLI}_{i,\lambda} - \rho^{GT}_{i,\lambda} \qquad (5)$$

where $\rho^{OLI}_{i,\lambda}$ and $\rho^{GT}_{i,\lambda}$ are the OLI and ground truth surface reflectance, respectively. Absolute reflectance differences were used to identify possible bias between the ground truth and OLI measurements.
Similarly, the standard deviation ($\sigma_{MRD}$) of the mean reflectance difference ($\overline{\Delta\rho}_{\lambda}$) was defined as:

$$\sigma_{MRD} = \sqrt{\frac{1}{n_\lambda - 1} \sum_{i=1}^{n_\lambda} \left(\Delta\rho_{i,\lambda} - \overline{\Delta\rho}_{\lambda}\right)^2} \qquad (6)$$

Finally, the root-mean-square error (RMSE) represents the actual statistical deviation of the OLI L2 SR from the truth estimate, including the mean bias, and is computed as:

$$RMSE_{\lambda} = \sqrt{\frac{1}{n_\lambda} \sum_{i=1}^{n_\lambda} \left(\rho^{OLI}_{i,\lambda} - \rho^{GT}_{i,\lambda}\right)^2} \qquad (7)$$

The RMSE value was expressed as a relative percentage of the mean OLI L2 SR to characterize the uncertainty of the OLI L2 SR product. Such analysis was applied to estimate the expected uncertainty of the OLI L2 SR product [40]. Similar approaches have been used in the past to characterize Landsat products [42], VIIRS [44] and MODIS SR products [45].
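The three statistics can be computed in a few lines. The sketch below assumes the difference is taken as OLI minus ground truth (so that underestimation by OLI gives a negative bias, as stated above) and uses the sample (n − 1) form for the standard deviation, which is an assumption; the reflectance values are made up for illustration.

```python
import numpy as np

def validation_stats(sr_oli, sr_gt):
    """Bias, spread and RMSE of OLI SR against ground truth SR for one band."""
    diff = sr_oli - sr_gt                          # per-pixel reflectance difference
    bias = float(diff.mean())                      # mean reflectance difference
    sigma = float(diff.std(ddof=1))                # sample standard deviation (assumed n - 1)
    rmse = float(np.sqrt(np.mean(diff ** 2)))      # includes the mean bias
    rel_rmse = 100.0 * rmse / float(sr_oli.mean()) # RMSE as % of mean OLI SR
    return bias, sigma, rmse, rel_rmse

# Hypothetical matched reflectance pairs for one band
sr_gt = np.array([0.10, 0.20, 0.30, 0.40])
sr_oli = np.array([0.11, 0.19, 0.32, 0.40])

bias, sigma, rmse, rel_rmse = validation_stats(sr_oli, sr_gt)
print(f"bias={bias:+.4f}  sigma={sigma:.4f}  rmse={rmse:.4f}  rel_rmse={rel_rmse:.2f}%")
```

Because the RMSE folds the bias and the scatter together, it is never smaller than the magnitude of the bias, which is why it serves as the overall uncertainty figure.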

ROI and Site Selection
To validate the OLI SR product with the NIS, six sites across the U.S. were selected at which the OLI and NIS imaged the surface near-simultaneously (to within ±10 min). Near-simultaneous scene pairs were selected in order to minimize potential uncertainties due to atmospheric and solar geometry effects between the OLI and NIS overpass times.
Overall, seven scenes were processed; they were mainly composed of mixed vegetation. Table 3 shows the metadata for the scenes used. ROIs were selected to be cloud-free and homogeneous, based on available product quality information and visual inspection. The "pixel_qa" band of the OLI L2 product was used to identify cloudy regions.
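Screening with a bit-packed QA band can be sketched as follows. The bit positions used here (0 fill, 3 cloud shadow, 5 cloud) are an assumption based on the Landsat 8 Collection 1 "pixel_qa" layout and should be checked against the product guide for the data version in use; the function names are hypothetical.

```python
import numpy as np

# Assumed bit positions in the "pixel_qa" band (verify against the product guide)
FILL_BIT = 0
CLOUD_SHADOW_BIT = 3
CLOUD_BIT = 5

def bit_is_set(qa, bit):
    """Boolean array that is True where the given bit of the QA value is 1."""
    return ((qa >> bit) & 1).astype(bool)

def usable_pixel_mask(qa):
    """True for pixels that are not fill, cloud or cloud shadow."""
    bad = (
        bit_is_set(qa, FILL_BIT)
        | bit_is_set(qa, CLOUD_SHADOW_BIT)
        | bit_is_set(qa, CLOUD_BIT)
    )
    return ~bad

# Example QA values: 322 (clear), 328 (cloud shadow), 352 (cloud), 1 (fill)
qa = np.array([322, 328, 352, 1], dtype=np.uint16)
mask = usable_pixel_mask(qa)
```

A mask of this kind complements, but does not replace, the visual inspection described in the text.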
The NIS SR data for all the reported sites in Table 3 were obtained from the NEON data portal [46]. The derived NEON SR is a UTM projection hyperspectral raster product. It is distributed in an open HDF5 format including all 426 bands obtained from the on-board NIS. The HDF5 file includes the derived SR reflectance data, QA and ancillary rasters used as inputs for atmospheric correction, all metadata and ancillary data for every flight line.

Data Processing
Prior to validating the OLI SR with NIS SR, several necessary corrections to the NIS SR data were made to account for scale differences. Figure 10 outlines the processing flow used to validate OLI SR products with respect to NEON.
Figure 10. Flowchart of NIS SR correction process.
As described in Section 2.1.2, spectral conversion of the NIS data to match the OLI spectral response was performed prior to the analysis.

Geometrically Align the OLI and NIS Images
Remote sensing image data potentially contain some degree of geometric distortion due to changes in sensor orientation, the Earth's rotation about its axis, and terrain effects due to mountains and hills (geometry effects). The OLI is reported to have a 12 m geometric uncertainty [47], while the NIS is reported to have a 0.2 m geometric uncertainty [33]. To ensure accurate validation between the OLI and NIS, image registration was applied to the NIS scenes, using the OLI scenes as a reference.
For the purposes of this work, an intensity-based registration approach was used because no set of quality ground control points was available. Intensity-based image registration is an iterative process based on a scalar metric representing the degree of similarity between the "moving" NIS image to be registered and the reference "fixed" OLI image [48,49]. Figure 11 shows a basic flowchart of the steps implementing the algorithm.
Figure 11. Intensity-based image registration algorithm flowchart [48].
The process starts with a user-selected image transformation type and an internally determined initial transformation matrix based on a 'similarity' model. For the purposes of this work, a subset of the affine transformation called the similarity transformation was used [50]. The similarity transformation comprises the translation, rotation, and scaling components of the affine transformation. Because a similarity transformation changes all distances within an image by the same ratio, it preserves shape [51]. The selected similarity transformation and the internally determined matrix together determine the specific 2-D image transformation that is applied to the NIS image, via bilinear interpolation, to align it with the reference OLI image [44]. Next, the 'Metric' and 'Optimizer' blocks analyze the transformed NIS image and adjust the transformation matrix to begin the next iteration. The similarity of the OLI and NEON images is measured using the mean square error [52]. The algorithm iterates until it finds a transformation matrix that yields the best possible NIS registration result. In this case, the process stops when the mean square error reaches a point of diminishing returns, at a value of 0.0022 or less [53,54]. A registered multispectral NIS image overlaid on the OLI image is shown in Figure 12; for clarity, an RGB band composite display was used. In Figure 12, the registered OLI and NIS RGB composites are difficult to tell apart visually, indicating very good registration. The process was repeated for each NEON site in this study.
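The loop of transform, metric and optimizer can be illustrated with a small self-contained sketch. It is not the implementation used in this study: it assumes SciPy's general-purpose Powell optimizer as a stand-in for the 'Optimizer' block, uses the optimizer's default stopping tolerances rather than the 0.0022 threshold, and runs on synthetic images. All function names are hypothetical.

```python
import numpy as np
from scipy import ndimage, optimize

def similarity_warp(moving, params):
    """Resample `moving` with scale, rotation (rad) and translation (pixels), bilinear."""
    scale, angle, t_row, t_col = params
    c, s = np.cos(angle), np.sin(angle)
    matrix = scale * np.array([[c, -s], [s, c]])
    centre = (np.array(moving.shape) - 1) / 2.0
    # Rotate and scale about the image centre, then translate
    offset = centre - matrix @ centre + np.array([t_row, t_col])
    return ndimage.affine_transform(moving, matrix, offset=offset, order=1, mode="nearest")

def mse(a, b):
    """Mean square error between two images of equal shape (the similarity metric)."""
    return float(np.mean((a - b) ** 2))

def register(fixed, moving, x0=(1.0, 0.0, 0.0, 0.0)):
    """Search for similarity parameters that minimise the MSE to the fixed image."""
    cost = lambda p: mse(fixed, similarity_warp(moving, p))
    result = optimize.minimize(cost, x0, method="Powell")
    return result.x, float(result.fun)

# Synthetic "fixed" image (two smooth blobs) and a shifted "moving" copy
rows, cols = np.mgrid[0:64, 0:64]
fixed = (np.exp(-((rows - 24) ** 2 + (cols - 30) ** 2) / 60.0)
         + 0.5 * np.exp(-((rows - 44) ** 2 + (cols - 40) ** 2) / 90.0))
moving = ndimage.shift(fixed, (2.0, -3.0), order=1, mode="nearest")

mse_before = mse(fixed, moving)
params, mse_after = register(fixed, moving)
print(f"MSE before: {mse_before:.6f}, after: {mse_after:.6f}")
```

For real OLI and NIS scenes the two images would first need to be brought to a common band and grid, and a multi-resolution search is usually more robust than a single-level optimization.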

Resample NIS Product to Correct Spatial Resolution Differences
After performing the geometric registration on the multispectral and geometrically corrected NIS scenes, the image data from both sensors were georeferenced to the WGS 84 coordinate system. However, a spatial resolution mismatch remained: the OLI SR product has a spatial resolution of 30 × 30 m, whereas the NIS product has a 1 × 1 m spatial resolution. The NIS SR images were therefore resampled to match the OLI SR image spatial resolution. The resampling was performed using the map coordinates of every scene to locate the OLI pixels. A binary mask was then created for every OLI pixel and applied to the corresponding NIS pixels. The mean of the NIS pixels within the binary mask was taken to represent one OLI pixel at 30 × 30 m resolution, yielding a NIS SR product that is multispectral and geometrically and spatially corrected.
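The aggregation from 1 m to 30 m can be sketched with a block mean. This simplified version assumes the NIS grid is already aligned so that each 30 × 30 block of NIS pixels coincides with one OLI pixel; the study locates OLI pixels from map coordinates instead, so the reshape below stands in for the per-pixel binary mask.

```python
import numpy as np

def block_mean(fine, factor=30):
    """Average non-overlapping factor x factor blocks of a 2-D array."""
    rows, cols = fine.shape
    rows -= rows % factor        # drop partial blocks at the bottom edge
    cols -= cols % factor        # drop partial blocks at the right edge
    trimmed = fine[:rows, :cols]
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Synthetic 1 m reflectance grid covering 2 x 3 OLI pixels
fine = np.arange(60 * 90, dtype=float).reshape(60, 90)
coarse = block_mean(fine, factor=30)
```

Pixels flagged as invalid in the NIS data would need to be excluded before averaging, for example by storing them as NaN and switching to `np.nanmean`.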

Correct BRDF Effects in Resampled NIS Product with Respect to OLI
The bidirectional reflectance distribution function (BRDF) describes how SR varies with geometries such as view zenith angle (VZA), solar zenith angle (SZA) and azimuth angles [55]. The main discrepancy in the observing geometries between the OLI and NIS sensors was in their view angle ranges. As shown in Figure 13, the OLI VZA varied only from 3.05° to 3.4°, whereas the NIS VZA ranged from -25° to 25° over the same area. Depending on the surface and the spectral band, such differences in viewing geometry are likely to introduce a BRDF effect [56]. Since scattering and directional reflectance effects vary with wavelength and cover type, individual BRDF models were created for all NEON sites and each NIS band. The NIS BRDF models did not account for solar geometry because the solar zenith angles changed little between the sensor overpasses at the selected NEON sites. For the ONAQ site, for example, the OLI and NIS solar zenith angles were 25° and 26°, respectively. Figure 14 shows the resulting model for the SWIR2 band applicable to the ONAQ (UT) NEON site.
In Equation (8), $\rho_{model}$ is the model-predicted reflectance difference for a given pixel at the corresponding NIS VZA. The BRDF correction was accomplished by using the difference model in Equation (8) to normalize the OLI and NIS SR difference measurements to one common OLI VZA "reference" geometry (Equation (9)), where $\rho_{obs}$ is the observed reflectance difference for a given pixel at the corresponding VZA and $\rho_{Ref}$ is the "reference" predicted reflectance difference as a function of the reference VZA (mean OLI VZA within the ROI). The BRDF correction ($\rho_{BRDF\,Diff}$) from Equation (9) was applied to each NIS ROI (from Table 3), and the BRDF-corrected NIS SR for a given pixel was then obtained from this correction. The validation of the OLI L2 product using the BRDF-corrected NIS SR ($\rho_{NIS}$) data followed the same analysis procedure described in Section 2.1.3 for the ground truth measurements.

Results of L8 OLI Validation Using Ground Truth Measurements
The first method considered to validate the OLI L2 SR product used direct ground truth measurements. Figure 15a-g shows the scatter plots of the ground truth reflectance vs. the OLI L2 surface reflectance over the six sites reported in Table 2. Figure 16 shows the mean reflectance difference (expressed as a percentage) and the 1σ standard deviation of the ratio between the site-specific measured ground truth reflectance and the observed OLI L2 SR obtained during the corresponding six field campaign measurements. The mean reflectance difference was used to describe the bias of the OLI L2 product: negative if OLI SR is underestimated, and positive if OLI SR is overestimated. A smaller mean reflectance difference magnitude represents better agreement between the OLI L2 SR product and ground truth measurements.
Figure 16 demonstrates that the ground truth reflectance measurements show good consistency with the OLI-derived L2 SR, with a bias within ±2% across all bands and sites. The Green, Red and NIR bands tend to have mean reflectance differences of approximately ±1% or less for all bright and vegetation targets at the ground truth sites. In the shorter wavelengths, the mean reflectance difference is largest at the RVUS site, at approximately -2% and -1.5% in the CA and Blue bands, respectively. In the longer wavelengths, OLI had a mean reflectance difference of approximately 2% in the SWIR1 band over GBNA and -2% in the SWIR2 band over LCFR.
The OLI L2 SR product validation was within the stated uncertainty region of 10% in the Blue-SWIR2 bands, which indicates a result consistent with truth estimates at the ground level. Retrieval in the CA band was the most troublesome, as indicated by the RMSE of 13.6%. The larger differences observed in the CA band and, to a lesser extent, the Blue band are primarily due to difficulties in aerosol estimation and lower signal levels received by the OLI [57,58]. The validation of the OLI in the Green and Red bands exhibited less deviation and fell within the L2 uncertainty region, with RMSE percentages of approximately 6.2% and 4.9%, respectively. At the longer wavelengths, the atmospheric transmittance is typically higher, resulting in a higher measured reflectance level, and atmospheric effects tend to be minimal. The agreement in these bands is consistent across different cover types (soil and vegetation) and atmospheric conditions. Some of the LCFR and RVUS data in the SWIR2 band fall just outside the estimated uncertainty, primarily due to the effects of water vapor absorption, which dominates the SWIR spectral region [59].
The validation results of the OLI L2 SR product varied spectrally across vegetation and bright sites. Therefore, further analysis was performed to validate the product. Table 4 shows the RMSE and the 1σ standard deviation (expressed in percent) of the ratio between the observed OLI L2 SR and the ground truth site-specific measurements for RVUS, GBNA, SDSU and LCFR. The RMSEs and standard deviations of the Chile and Brazil sites were not included in Table 4 because there was only one OLI overpass per site. The largest differences were observed in the SWIR2 band at the RVUS and LCFR sites, with RMSEs of 2.27% and 2.81% and standard deviations of 1.71% and 2.03%, respectively. In the CA and Blue bands (Figure 15a,b), the OLI L2 SR product showed good agreement with ground truth measurements for SR values of 0.1 or less. Reflectance at that level is associated with the vegetation-dominant sites: SDSU, LCFR and Brazil. The RMSEs (from Table 4) for the SDSU and LCFR sites are less than 0.79%, with a standard deviation of ~0.5%. SRs larger than 0.1 are associated with the bright sites: RVUS, GBNA, and Chile. The GBNA bright site showed more linear agreement between the OLI L2 SR product and ground truth measurements, with an RMSE of 1% and a standard deviation of 0.12%. In contrast, the RVUS bright site showed more scatter and variation between the OLI L2 SR and ground truth measurements, with an RMSE of 2.3% and a standard deviation of 1.2%.
In the Green and Red bands (Figure 15c,d), the OLI SR and ground truth measurements over the SDSU site showed similar agreement, with an RMSE on the order of 0.41% and a variation of 0.38% or less. However, the LCFR site showed more variation and scatter than SDSU, as shown by the larger standard deviation of 1.05%. The RVUS and GBNA sites indicated better agreement between OLI and ground truth data points in the Green and Red bands than in the CA and Blue bands, with RMSEs of 1.42% and 0.51%, respectively. Nevertheless, the OLI L2 SR product values over all three bright sites in the CA-Green bands are generally lower than the ground truth measurements. This result suggests a potential aerosol overestimation by LaSRC in deriving the OLI L2 product, mainly due to the difficulties of atmospheric characterization in the CA and Blue bands.
In the NIR band (Figure 15e), the OLI L2 SRs within 0.2-0.3 at the LCFR site are generally lower than the ground truth measurements. The SDSU, LCFR and RVUS sites exhibited similar scatter, with standard deviations of 1.14%, 0.84% and 1.39%, respectively. On the other hand, the performance of the OLI L2 product at the GBNA bright site is more precise, as shown by the variation of only 0.56%. The measurements at the GBNA site indicate a higher SR estimate from the OLI L2 SR product than from the ground truth measurements, with an RMSE of 1.28%.
The SWIR1 band (Figure 15f) showed that the SR retrievals over vegetation sites were lower than the ground truth measurements, with RMSEs of 1.4% and 1.8% for the SDSU and LCFR sites, respectively. As shown in Figure 15g, the OLI SR had its largest difference from ground truth measurements, an RMSE of 2.81% with a variation of 2.03%, at the LCFR site in the SWIR2 band. In contrast, OLI demonstrated consistent agreement with ground truth measurements at the SDSU site, with an RMSE of 0.7% and a variation of 0.34%. The OLI L2 SR product over the bright sites in the SWIR bands was higher than the ground truth measurements, showing a clear high bias driven largely by the GBNA site, with an RMSE of 2.01% in the SWIR1 band.
Generally, the RVUS and LCFR sites showed the most spread and variation between OLI and ground truth measurements across all bands, with standard deviations varying from 1.1% to 1.84% for the RVUS site and from 0.39% to 2.03% for the LCFR site.
For vegetated cases, retrievals at the SDSU site provided a closer match to the ground truth measurements and also covered a wide range of SRs, from 0 to 0.4. Consistency between the OLI L2 SR product and ground truth measurements at the SDSU site can be seen in the VIS and SWIR bands, with standard deviations less than 0.59%, whereas a larger spread and variation of 1.14% from ground truth measurements was observed in the NIR band. Finally, the OLI L2 SR product values retrieved from the GBNA bright site seem to be in agreement with the ground truth measurements across all bands. The variations between the OLI L2 and ground truth measurements at the GBNA site were less than 0.71%. Thus, the GBNA site exhibits the most accurate ground truth SR retrieval for the validation of the OLI L2 SR product.
The OLI L2 SR product bias was measured for vegetation and bright covers. Table 5 shows the mean reflectance differences and standard deviations, expressed as percentages, for the combined vegetation sites (SDSU, Brazil and LCFR) and the combined bright sites (RVUS, Chile, and GBNA). In general, the estimated mean reflectance difference between the OLI L2 SR product and ground truth measurements indicated a bias of 1.24% or less for all sites across all bands. There appears to be greater variability between the OLI L2 product and ground truth measurements as the wavelength increases, ranging from a minimum standard deviation of approximately 0.43% in the CA band to a maximum of approximately 1.3% in the SWIR bands. For the bright sites, the mean reflectance differences tended to be somewhat larger and the overall variability somewhat smaller than for the vegetation sites, as indicated by the larger mean reflectance differences and smaller standard deviations. The mean reflectance differences across all bands ranged between approximately 0.33% and 1.2% in magnitude, while the standard deviations were approximately 0.98% or less.

Discussion of L8 OLI Validation Using Ground Truth Measurements
The best path for surface reflectance validation is the use of ground truth measurements, owing to their direct traceability and high accuracy of 2% for ground campaign field measurements [16] and 3%-4% for RadCalNet measurements [22]. The overall results of the ground truth validation of the L8 OLI L2 SR product were considered in aggregate, by combining the mean reflectance differences of all sites across all bands. Table 6 shows the mean reflectance differences and standard deviations, expressed as percentages, for the six ground truth sites combined. When considering the combined ground truth data from Table 6, the overall validation estimate shows that the OLI L2 SR product is off, at most, by a mean reflectance difference of 0.81% ± 0.97%, which encompasses the ground truth uncertainty values. Thus, this result indicates that the OLI L2 SR product is in consistent agreement with truth measurements across all the bands and sites. The largest negative bias of 0.51% ± 0.93% was observed in the CA band, whereas the largest positive bias of 0.81% ± 0.97% was in the SWIR2 band. This result is expected, due to the larger effects of aerosols in the CA band and of water vapor in the SWIR2 band. However, the OLI L2 product reflectances are consistently less than the ground truth measurements at shorter wavelengths and consistently greater at longer wavelengths. Differences at the shorter wavelengths are possibly due to an excessive aerosol amount being estimated and input into the LaSRC algorithm, which was primarily developed to transform TOA reflectance to SR [9,40]; as a result, the retrieved SR is underestimated.
At longer wavelengths, by contrast, the SR is overestimated, predominantly due to the spatial and temporal variability of water vapor, which makes it more difficult to quantify. Furthermore, water vapor can only be directly estimated from satellite data if there is a designated water vapor absorption channel [60]. Thus, LaSRC relies on auxiliary data to perform the necessary corrections [61,62].

Results of L8 Validation Using NEON Imaging Spectrometer (NIS) SR
The second method of the OLI L2 SR validation was based on the multispectral, geometrically and spatially corrected NIS SR, as described in Section 2.2. Comparison of OLI SR to NIS SR was based on homogeneous ROIs selected as shown in Table 3. Figure 18 shows the mean reflectance difference estimates of the OLI L2 SR using NIS derived SR to estimate the bias. The error bars represent the 1σ standard deviation of the ratio between the site-specific measured NEON reflectance and observed OLI L2 SR reflectances obtained during the corresponding airborne measurements. In general, validation of the OLI L2 products against the NIS SR products exceeded expectations, given the reported 10% uncertainty for the OLI products and 5%-10% uncertainty for the NIS products.
Figure 18. Mean reflectance difference between OLI and NIS SR measurements.
Overall, the initial validation attempts of the OLI L2 SR product using the NIS Table 7 shows the OLI L2 SR combined mean reflectance difference estimates, with all NEON sites combined. Thus, averaging the mean reflectance differences between NEON and OLI L2 SR across all the sites in Table 3 produces a significantly better validation assessment of the OLI L2 SR product than validating the product with individual NEON sites.
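The effect of combining sites can be sketched as follows, assuming the combined estimate is an unweighted mean of the per-site mean reflectance differences (the weighting used for Table 7 is not stated here). The site labels and bias values are hypothetical placeholders, not values from Table 3 or Table 7.

```python
import numpy as np

# Hypothetical per-site mean differences (OLI minus NIS), in percent
# reflectance, for two illustrative bands. Not values from the study.
site_bias = {
    "site_A": [-2.4, 0.6],
    "site_B": [1.8, -0.9],
    "site_C": [-0.6, 1.5],
    "site_D": [0.4, -0.4],
}

bias = np.array(list(site_bias.values()))   # shape (n_sites, n_bands)
combined = bias.mean(axis=0)                # unweighted all-site mean per band
worst_single = np.abs(bias).max(axis=0)     # largest single-site magnitude per band

print("combined bias per band:", combined)
print("largest single-site |bias| per band:", worst_single)
```

In this illustration the combined bias is small because site-level differences of opposite sign partially cancel. The all-site mean therefore describes the average behavior of the product and should not be read as a bound on the difference at any single site.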
The results of this evaluation indicate that SR derived from the NEON on-board NIS sensor can be used with confidence to provide an alternative method to accurately validate SR measurements, especially in the VNIR bands. In the SWIR bands, the validation attempt using NIS SR estimated lower mean reflectance differences than those estimated by the ground truth validation method. Since ground truth is a more direct and accurate validation method than NIS, it is not advised to use the NIS SR for validation in the SWIR bands.

Summary and Conclusion
The validation of the Landsat 8 OLI L2 SR product was performed using two methods: (i) OLI L2 product validation using direct surface reflectance measurements; (ii) OLI L2 product validation using SR derived from AOP operated by NEON, corrected to match OLI geometry and spatial and spectral resolutions. Ground truth data are capable of providing direct surface reflectance at the 2% level for ground campaigns and 3%-4% for RadCalNet [22]. This provides the most accurate method to validate surface reflectance products. Due to the limited availability of truth measurements, NIS validation was also considered, which is capable of producing SR at the 5%-10% level [30].
When considering the ground truth data, the results indicate that the OLI SR product is validated at the mean reflectance difference level of ±0.81% (0.0081 units of reflectance) across all bands and sites. The largest differences between the OLI and ground truth measurements were mainly observed at the shorter wavelengths, specifically the CA and Blue bands, with mean reflectance differences of ±1.2% and ±0.8%, respectively. The percent difference in the CA band was outside the expected uncertainty, on the order of 13.6% RMSE. The Blue band's percent difference was just within the expected uncertainty, on the order of 9% RMSE. Therefore, initial efforts to validate the OLI L2 SR product suggest there are more difficulties in SR retrieval at the CA and Blue bands, especially over dense dark vegetation targets. This is attributed to a potential LaSRC aerosol overcorrection at the CA and Blue bands: SRs at six out of seven sites appear to be smaller in OLI than in the ground truth measurements.
The OLI validation using NIS SR indicated that the SR derived from the NEON on-board NIS sensor can be used with confidence to provide an alternative method to accurately validate SR measurements, especially in the VNIR. The OLI validation using NIS derived SR was consistent with validation using ground truth measurements in the VIS bands, with NIS reporting larger RMSE percentages on the order of 31.7%, 17.3%, 9.2% and 7.7%. However, NIS reported lower RMSE in the SWIR bands (4.2% and 5.1%, respectively). Since ground truth validation is more accurate and provides a more direct validation of SR than airborne measurements, it is not recommended to validate the SWIR bands of the OLI L2 SR product using derived NIS SR products. The mean reflectance difference estimates between the OLI L2 SR product and NIS SR showed variation across all seven NEON sites and bands, with a bias larger than 2% in magnitude. However, on average, the seven NEON sites resulted in a bias of less than ~1% between OLI L2 SR and NIS SR, which is consistent with the direct surface reflectance validation bias estimates.
Considering both validation methods, the reflectance dynamic range detected by OLI varied from 0 to 0.5 across all bands. Over that range, the results indicate that the OLI SR is validated at the mean reflectance difference level of ±0.91% (0.0091 units of reflectance) across all bands and all 13 sites. The OLI L2 SR product validation was more accurate in the longer wavelengths.
As a measure of overall difference from the "truth" measurements, the RMSE across the Green-SWIR2 bands between the OLI and ground truth/NIS validation techniques was less than 9%. Bias in the OLI L2 SR product retrievals from ground truth estimates was ±0.63% (0.0063 units of reflectance) or less for the Green-SWIR2 bands. These results are well within the associated uncertainties of the L2 validation methods used (~10% for ground truth and 15% for NIS). The possible overcorrection in the Landsat surface reflectance due to the variable nature of water vapor at the SWIR bands might contribute to the positive bias value. On the other hand, more difficulties were noted in the OLI L2 SR retrieval in the CA band, as indicated by the ground truth RMSE of 13.6% and NIS RMSE of 31.7%, which were not within the expected uncertainty of the OLI L2 SR product.
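The contrast between small absolute biases and large percent RMSE values in the CA and Blue bands can be illustrated with a short sketch. The exact RMSE normalization is not restated in this section, so the sketch assumes that percent RMSE is the root-mean-square difference divided by the mean reference reflectance. The function name `percent_rmse` and all reflectance values are hypothetical.

```python
import numpy as np

def percent_rmse(oli_sr, ref_sr):
    """RMSE of (OLI - reference), as a percentage of the mean reference
    reflectance. The normalization by the mean is an assumption."""
    oli_sr = np.asarray(oli_sr, dtype=float)
    ref_sr = np.asarray(ref_sr, dtype=float)
    rmse = np.sqrt(np.mean((oli_sr - ref_sr) ** 2))
    return 100.0 * rmse / np.mean(ref_sr)

# Hypothetical dark target (e.g., short wavelengths over dense vegetation).
ref_dark = np.array([0.04, 0.05, 0.06])
oli_dark = ref_dark - 0.005      # 0.005 reflectance units too low

# Hypothetical bright target with the same absolute offset.
ref_bright = np.array([0.39, 0.40, 0.41])
oli_bright = ref_bright - 0.005

dark_pct = percent_rmse(oli_dark, ref_dark)
bright_pct = percent_rmse(oli_bright, ref_bright)
print(dark_pct, bright_pct)
```

Under this assumed definition, the same absolute offset of 0.005 reflectance units corresponds to a much larger percent RMSE over the dark target than over the bright one, which is one plausible reason why the short-wavelength bands over dark vegetation show the largest percent RMSE despite sub-percent mean reflectance differences.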