Quantifying Western U.S. Rangelands as Fractional Components with Multi-Resolution Remote Sensing and In Situ Data

: Quantifying western U.S. rangelands as a series of fractional components with remote sensing provides a new way to understand these changing ecosystems. Nine rangeland ecosystem components, including percent shrub, sagebrush ( Artemisia ), big sagebrush, herbaceous, annual herbaceous, litter, and bare ground cover, along with sagebrush and shrub heights, were quantiﬁed at 30 m resolution. Extensive ground measurements, two scales of remote sensing data from commercial high-resolution satellites and Landsat 8, and regression tree models were used to create component predictions. In the mapped area (2,993,655 km 2 ), bare ground averaged 45.5%, shrub 15.2%, sagebrush 4.3%, big sagebrush 2.9%, herbaceous 23.0%, annual herbaceous 4.2%, and litter 15.8%. Component accuracies using independent validation across all components averaged R 2 values of 0.46 and an root mean squared error (RMSE) of 10.37, and cross-validation averaged R 2 values of 0.72 and an RMSE of 5.09. Component composition strongly varies by Environmental Protection Agency (EPA) level III ecoregions ( n = 32): 17 are bare ground dominant, 11 herbaceous dominant, and four shrub dominant. Sagebrush physically covers 90,950 km 2 , or 4.3%, of our study area, but is present in 883,449 km 2 , or 41.5%, of the mapped portion of our study area.


Introduction
Arid and semiarid rangeland ecosystems comprise one-third of the Earth's terrestrial environment and provide critical ecosystem services to human populations around the world [1,2]. Increasing direct and diffuse anthropogenic usage and threats on these ecosystems create an urgent need to better understand their biodiversity, function, mechanisms of change, biogeochemical cycles, and current U.S. This paper has three specific objectives. First, we report on total quantities and distributions of mapped components with associated validation accuracies. Second, we report U.S. Environmental Protection Agency (EPA) level III ecoregion [41] average component cover and climate associations. Third, we evaluate key fractional component proportions to better quantify and visualize component spatial distributions.

Study Area Description
This paper focuses on a large area of the western U.S. over a region covering parts of 16 states. We mapped the region as a series of 27 smaller mapping units over a 5-year span to accommodate field data collection logistics ( Figure 1). Elevation varies across the region from a low of −86 m to a high of 4421 m. Vegetation varies widely and spans all altitudinal zones including alpine, subalpine, montane, foothills, and plains vegetation communities. In the alpine zone, vegetation communities are often above the tree line and consist of native shrub and perennial forb and grassland communities. Forest and woodland tree communities dominate the subalpine and montane zones, but substantial portions are occupied by herbaceous and shrubland communities, which often occur as an early successional response to disturbance before being ultimately replaced by trees. Shrub and herbaceous communities of the plains and foothills zones of the Great Plains and North American deserts EPA Level I ecoregions [41] constitute the greatest portion of our study area. Shrub vegetation is dominated by native shrub species, with herbaceous vegetation containing both native and exotic perennial and annual forb and grassland areas.

Methods
We quantified nine rangeland ecosystem components, including percent shrub, sagebrush, big sagebrush, herbaceous, annual herbaceous, litter, and bare ground cover, along with sagebrush and shrub heights, at 30 m resolution. This process was completed by independent mapping regions

Methods
We quantified nine rangeland ecosystem components, including percent shrub, sagebrush, big sagebrush, herbaceous, annual herbaceous, litter, and bare ground cover, along with sagebrush and shrub heights, at 30 m resolution. This process was completed by independent mapping regions ( Figure 1), which were subsequently mosaicked into a cohesive regional product. Mapping regions required extensive ground measurements for model training and validation, and two scales of remote sensing data. High-resolution satellite imagery (HRS) commercially available from WorldView-2, WorldView-3, QuickBird, or Pleaides resampled to 2 m nominal resolution provided the first scale for ground level interaction and measurement. Landsat 8 imagery acquired between 2013 and 2017 provided the second scale for landscape modeling. Mapping regions represented relatively similar ecological conditions and were identified by grouping one or more EPA level III ecoregions [41]. Mapping based on regions allowed for the collection of field and HRS data from the correct season of a single year. Also, since regression tree models tend to flatten the dynamic range of each component, dividing mapping into regions tends to limit this effect while exposing the regional regression tree model to more locally pertinent data, which also tends to improve model accuracy and relevancy. The main processing steps are described in Figure 2. Methods covering field measurement, image processing, component modeling, product masking, product validation and product analysis are described in detail below.
Remote Sens. 2020, 12, x FOR PEER REVIEW 5 of 26 ( Figure 1), which were subsequently mosaicked into a cohesive regional product. Mapping regions required extensive ground measurements for model training and validation, and two scales of remote sensing data. High-resolution satellite imagery (HRS) commercially available from WorldView-2, WorldView-3, QuickBird, or Pleaides resampled to 2 m nominal resolution provided the first scale for ground level interaction and measurement. Landsat 8 imagery acquired between 2013 and 2017 provided the second scale for landscape modeling. Mapping regions represented relatively similar ecological conditions and were identified by grouping one or more EPA level III ecoregions [41]. Mapping based on regions allowed for the collection of field and HRS data from the correct season of a single year. Also, since regression tree models tend to flatten the dynamic range of each component, dividing mapping into regions tends to limit this effect while exposing the regional regression tree model to more locally pertinent data, which also tends to improve model accuracy and relevancy. The main processing steps are described in Figure 2. Methods covering field measurement, image processing, component modeling, product masking, product validation and product analysis are described in detail below. Overview of major processing steps and data inputs for characterization of rangeland fractional component cover.

Field Sampling
Our field measurements created the foundation for all component modeling and validation. We targeted field measurements to coincide with HRS observation whenever possible to ensure correspondence in vegetation phenology. We used two field measurement protocols: a validation plot protocol using a randomized design for model assessment (described in the component validation section) and a more dynamic training plot protocol described here. Achieving acceptable model accuracy at the mapping region scale requires adequate distribution and quantity of training plots. To ensure training data collection was logistically feasible, field measurement was done on pre-selected HRS sites, which were located to collectively contain the full range of biophysical, ecological, and climatic conditions across each mapping region. HRS sites were strategically located in each mapping region through analysis of Landsat imagery, elevation, ecoregion boundaries, National Land Cover Database (NLCD) 2011 land cover, road accessibility, and public land access [39]. HRS images from the WorldView-2, WorldView-3, Pleiades-1, or QuickBird sensors were tasked for collection over each sample area at the approximate time the field crew was expected to be doing ground measurement. Hence, field crews typically had the satellite images to assist with analysis and placement of ground sampling. Further, any cloud or shadow cover present in the HRS image could be avoided in the field. If satellite images were not available, we used 1 m National Aerial Image Program (NAIP) imagery only to aid in field sampling. In these cases, HRS imagery was still used for component modeling. HRS sites were typically a single acquired image, approximately 15 by 15 km in size. All HRS imagery was ortho-rectified and resampled to 2 m nominal resolution prior to field data collection.
Initially, in the 2013 mapping region, we sampled predetermined locations on HRS deemed to be ecologically and spectrally critical to mapping with the goal of covering the range of cover values present for each component. This approach was cumbersome and quite slow. So, in later years (2014-2017), we evolved to an approach in which individual plots were dynamically selected for sampling, while still maintaining the goal of covering the range of cover values for each component. Specifically, on each HRS site, we simultaneously examined imagery displayed on a tablet (equipped with Environmental Systems Research Institute ArcMap and sub-meter accuracy global positioning system (GPS) units (Geneq inc. SX Blue II, Anjou, QB, Canada) and ground conditions. Criteria for plot selection included collecting adequate samples to represent the 1) cover histogram of each component, 2) topographic conditions; various elevation, aspect, and slopes, 3) range of management practices (i.e., grazed versus ungrazed, intensity of grazing, etc.), vegetation condition, or disturbances, and 5) the gross range of color and brightness of bare ground (referencing soil color charts when necessary) within each HRS site. Field plots needed to be homogenous over a large enough patch size to find the corresponding pixels on the HRS imagery, and so the vegetation patches measured were at minimum 2 m by 2 m in size. Overall, the sampling approach was dynamic based on the complexity of the sample area, available access, timing of the satellite collection and field sampling logistics. Although this approach required in-the-field judgment on where and how many samples should be collected, it dramatically reduced field sampling time and improved predictions over other more inflexible pre-determined methods used in 2013 by allowing ad hoc selection of plots by field personnel. To maintain consistency and rigor in the sampling procedures, the same field personnel collected data during the entire course of the research. Field personnel were extensively and regularly trained together to improve measurement consistency among personnel, HRS sites, and mapping regions and to calibrate field plot measurements to target components.
HRS plots varied in size and shape depending on the area of the patch being sampled, ranging from one 2 m pixel (4 m 2 ) to 100 2 m pixels (400 m 2 ). The average plot size on HRS imagery was approximately eight 2 m pixels (32 m 2 ), which typically reflected the size of vegetation patch needed to gather reliable measurements. The goal was to make plots as large and homogeneous as possible, with a maximum size of 400 m 2 . Small, single pixel, plots were sometimes necessary to capture more extreme component values. Plots were edited/digitized in the field using ArcMap with component cover or height attributed to each plot on site. At each plot, we recorded an ocular estimate of cover for each component [36], with the total cover of all primary components (shrub, herbaceous, litter, and bare ground) and tree cover (not mapped) in a plot summing to 100%. We found that the potential reduction of accuracy and consistency in ocular estimation of component cover relative to pin drop methods, Daubenmire frames, etc., was more than compensated for in the vastly increased number of data points collected.
The shrub component encompassed numerous species and was discriminated by the presence of woody stems on a plant approximately <6 m tall (Table S1). The sagebrush component encompassed almost all species of sagebrush (Artemisia spp.) including big sagebrush (Artemisia tridentata), low sagebrush (Artemisia arbuscula), black sagebrush (Artemisia nova), three-tip sagebrush (Artemisia triparta), and silver sagebrush (Artemisia cana). However, we strived to exclude prairie sage (Artemisia frigida) and white sagebrush (Artemisia ludoviciana) from the sagebrush component as these suffrutescent shrubs were more spectrally and ecologically similar to herbaceous vegetation and subsequently grouped with the herbaceous component. Distinguishing the big sagebrush subspecies for sagebrush is important as big sagebrush is a critical indicator of sage grouse habit and landscape status/function. The big sagebrush component was dominated by big sagebrush (Artemisia tridentata spp.) but, because of spectral and ecological similarities, may also contain areas of three-tip sagebrush and silver sagebrush. The herbaceous component consisted of all grasses (live and residual standing), forbs, and cacti. The annual herbaceous component included only annual grasses and forbs (based on [42]), which in many portions of the study area were dominated by invasive grass species such as cheatgrass (Bromus tectorum), medusahead (Taeniatherum caput-medusae), red brome (Bromus rubens), or annual mustards such as tumble mustard (Sisymbrium altissimum), and tansy mustard (Descurainia pinnata). Annual herbaceous native species also are present in this component, but their cover is typically insignificant. The litter component included the combined cover of dead standing woody vegetation, detached plant and animal organic matter, and biological soil crusts. The bare ground component included any exposed soil or rocks. The average height of all shrubs and sagebrush in the plot was measured (in cm) using meter sticks on a representative shrub. Though we did not produce a fractional tree canopy cover map, a fractional component map developed by the U.S. Forest Service [43] was considered throughout production. See Table S1 for a complete description of each component.
We also collected additional training data outside of HRS sample areas to supplement the overall training pool. These Landsat-scale training plots (n = 5382) were designed to focus on large landscape features that may have been underrepresented in HRS sample areas, but still needed representation in the training pool ( Figure 1). Using the same plot measurement technique employed with the high-resolution data collection, these plots were located directly on Landsat imagery. Additionally, Assessment, Inventory and Monitoring (AIM) data collected by the Bureau of Land Management (BLM) plots were added in mapping regions where available [44] (Figure 1). AIM data are standardized field observations of rangeland component cover collected to assess the condition and trend of natural resources. Further processing steps were required to ensure AIM data were directly relevant to our other training data, including aggregating the AIM classes into six components (bare ground, annual herbaceous, herbaceous, sagebrush, litter, and shrub). Next, we ensured AIM plots occurred on enough homogenous Landsat pixels to justify inclusion in our training data. Plots located on spectrally heterogeneous areas were deemed too diverse for training samples and were excluded. Following these procedures approximately half (n = 3226) of the available AIM data were included in model development.

Satellite High-Resolution Image Processing and Modeling
Field data samples from within HRS sites were used as training data to predict fractional cover of nine different components on HRS images using a Cubist regression tree algorithm [45] following [38,39] at 2 m resolution. Our default condition choices were 10 committee members, 500 maximum rules and 10% extrapolation. Based on testing numerous settings, these parameters represented the best options to maximize the robustness of the models. All HRS bands were used in predictions applied across the entire HRS footprint for each of the nine components. Once predictions were completed, masking to exclude non-rangeland areas such as urban areas, agriculture, water bodies, and forests was performed. This exclusion process was typically done in three steps. First, an unsupervised classification on the HRS was completed to identify target clusters that represented forested areas for masking exclusion. Next, non-rangeland areas such as urban, water, and agricultural were identified using ancillary data and operator interpretation for exclusion. Finally, cloud or cloud shadow areas and snow were identified manually and excluded. The result was nine separate component predictions, with non-relevant areas excluded, ready for aggregation to 30 m scale model training data.

Landsat-Scale Image Processing and Modeling
For each mapping region, we aggregated the 2 m HRS predictions by averaging within 30 m pixels to serve as training data for regional predictions. Once aggregated to 30 m training pixels, data were filtered through summation of the four primary components (percent bare ground, shrub, herbaceous, and litter) for each pixel. Summation of the primary components should equal 100%, as they did so in all field plots. Deviation in summation from 100% suggests non-rangeland spectral contaminants in the pixel, or one or more inaccurate component predictions, which render the pixel less desirable for training. Hence, only pixels that had summation values that ranged from 90% to 110% were retained for potential training, with over 90% of the training data pool in each region meeting this criterion. Errors that exist in the HRS predictions can be propagated into the regional predictions but are critical to increasing the sample size and to scale plot observations up to Landsat scale. Additionally, HRS predictions form most of our training data pool ( Figure 1). From the potential training pool for each region, 120,000 points were randomly selected as training for each component. Several, non-rangeland dominant ecoregions including the Arizona/New Mexico Mountains, Southern California, San Juan and Sangre de Cristo Mountains, Yellowstone, and Idaho regions were trained using a combination of Landsat-scale observations within the region and HRS predictions, Landsat-scale observations, and AIM data from a~100 km buffer into the surrounding regions.
Training data for each region were stratified into three roughly equal bins of low, medium, and high values defined by the average and standard deviation (SD) of the pool. Values less than the average minus 1 SD were grouped into a low bin, values greater than the average plus 1 SD were grouped into a high bin, and the remaining values were considered in the middle bin. The final component training pool represented an approximately equal ratio of training data from each bin. Binning values helps regression tree models better represent high and low component cover values, spreading the prediction histogram [36,46]. Supplemental Landsat-scale training data collected outside HRS footprints were then added to the final pool of training for each component.
Landsat imagery was the key independent input for our fractional component models. We used U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center Moderate Resolution Imaging Spectroradiometer (eMODIS) weekly Normalized Difference Vegetation Index (NDVI) data [47] to select the ideal dates of Landsat imagery for each region based on periods of best phenological separation among our primary components. For each region, we selected three image dates corresponding to green-up (spring), peak greenness (summer), and leaf off/post-peak (fall). While there is spatial variation in phenology within regions, we 1) strove to reduce this variation by delineating mapping regions based on common ecology, and 2) chose seasonal dates that represent the average phenology of regions. Though 8 day repeat Landsat datasets are available, we chose to maintain our methods as consistently as practical to those developed in 2013 prior to widespread adoption of automated image compositing methods such as Google Earth Engine (GEE).  [50]. Nominal year 2017 LEDAPS imagery was used for the Central California, Northeast California, Grand Canyon, Middle Rockies, and Southwest Tablelands regions. In the Wasatch Mountains, Arizona/New Mexico Mountains, Southern California, San Juan and Sangre de Cristo Mountains, Yellowstone, and Idaho regions, we used GEE to produce composite seasonal images from 2016-2018. All four protocols standardize Landsat imagery to at-satellite reflectance with Albers Equal Area projection. We created seasonal seamless imagery mosaics across Landsat path/rows in each region [39]. All seven Landsat 8 spectral bands (1-7) from each of three seasonal images were applied for the 2013-2015 mapping regions, with six Landsat 8 spectral bands (2-7) applied for the 2016-2017 regions. Band 1 was excluded in the 2016-2017 regions as we found that it provided little additional information relative to our mapping, and occasionally introduced noise. We also generated three spectral indices from each image: Normalized Difference Water Index (NDWI), Normalized Built-up Index (NBDI), and Soil Adjusted Vegetation Index (SAVI).
In addition to Landsat 8 imagery, other ancillary and image data were also key to the regression tree modeling. Ancillary data included slope and aspect products derived from a digital elevation model (DEM) to help stratify the model to appropriate topographical features. We also used the Landsat 8 thermal band (band 10, from the summer date), which functioned in our model as a DEM surrogate sensitive to summer topographic differences, which helped better stratify component elevation responses. Using a DEM product directly in our models produced model artifacts in preliminary predictions, but using the thermal band avoided these artifacts. We also used 30 m rescaled eMODIS data [47] when necessary to complement Landsat data seasonality, especially in regions with more dynamic phenology. Specifically, if regional analysis deemed the Landsat seasonal mosaics insufficiently captured important phenological stages, we created a 30 m Landsat 8 surrogate for the necessary seasonal date using downscaled eMODIS data. For this process, two methods were used based on regional applicability, either a regression tree [45] process [51] or the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) algorithm [52]. Both created "Landsat-like" images in temporally missing portions of the seasonal Landsat record to better discriminate components. Analysis showed that inclusion of these images analogous to Landsat improved regional predictions, especially for herbaceous and annual herbaceous components, as evidenced by more robust regional independent and cross-validation statistics.
In the eastern and northern portions of the Montana region, where abundant herbaceous cover was commonly interspersed with low to moderate shrub cover ( Figure 3, [53]), Landsat spectral data often produced insufficient component accuracies. To improve component discrimination, we developed a shrub/herbaceous index to improve regression tree modeling of the shrub and herbaceous components. This index was calculated as: (shrub − herbaceous cover) (shrub + herbaceous cover) .
The index was applied to our 30 m scale training data, then modeled across the Montana region using Cubist regression trees for subsequent use as an independent variable in shrub and herbaceous component modeling. We found that inclusion of the shrub/herbaceous index in the independent variable stack resulted in a~5% improvement in the accuracy assessment of the shrub and herbaceous cover components in the Montana region.
We mapped components individually with regression models produced by Cubist version 2.08 [45] for each of the nine components using all imagery, spectral indices, and ancillary data layers. Shrub and herbaceous secondary components (sagebrush, big sagebrush, and annual herbaceous components) were restricted to occur only in pixels with predicted values >0% of their respective primary component (i.e., parent component; Table S1). Shrub and sagebrush height were also restricted to occur in pixels with predicted values >0% in their respective parent component. To mitigate relatively rare cloudy areas in the Landsat 8 mosaics, a supplementary model that identified and ignored cloudy areas was completed. This model predicted components from the remaining cloud-free seasons to make component cover estimates in cloudy areas or mosaic artifacts in the original prediction.  To ensure recently burned areas were properly modeled, burned areas were identified using Geospatial Multi-Agency Coordination [54] fire perimeters. Fire events in the mapping year, and previous 4 years, were filled using a prediction generated with the season of imagery corresponding to post-burn conditions. In the Montana and Black Hills regions, data from the Monitoring Trends in Burn Severity fire perimeters after 2009 were also used to recode shrub, sagebrush, and big sagebrush to 0% within burn areas exhibiting high or moderate fire severity. We assumed after a severe fire that remaining live shrubs in the sagebrush ecosystem are rare [55,56], and since using the remote sensing prediction alone to identify remaining live shrubs created unacceptable commission error, setting the prediction to 0 in the year of the fire created less overall error.
Regional fractional cover maps were mosaicked to form a study area-wide component product. Each mapping region shared an overlap boundary with the adjacent region, and a cutline was used within a masked area or along a terrain feature to stitch regional predictions together.

Component Masking
We used two types of component masking to ensure we mapped only on relevant areas; non-rangeland and extent. Non-rangeland component masking removed non-rangeland portions of our study area, such as agricultural, urban, water bodies, and forest. Extent masking was designed to restrict component distribution to areas where the component occurs, which is only necessary for annual herbaceous, sagebrush, and big sagebrush. Extent masking was typically based on thresholds defined by a combination of elevation, aspect, and latitude. Specifically, these maps restricted the distribution of annual herbaceous at high elevation >~2300 m and sagebrush and big sagebrush at high elevation >~2700 m. In the Montana, Black Hills, and Wyoming regions, decision tree classifications were used to separate the mapping regions into occurrence and non-occurrence areas for sagebrush, big sagebrush, and annual herbaceous. Decision tree models were trained on occurrence from field plots and included seasonal Landsat 8 composites and ancillary data (slope and aspect). Final masks derived from the decision tree output were hand edited to ensure occurrence and non-occurrence areas were appropriate. Regions outside the occurrence area for a given component were re-coded to 0%.
For the non-rangeland mask, we identified forest areas using a combination of the NLCD 2016 fractional tree canopy cover product [43] with a threshold greater than 40% and NDVI or Modified Soil-Adjusted Vegetation Index (MSAVI) greenness thresholds. The MSAVI threshold remained relatively constant by mapping region season of imagery. In order to be masked as forest, all three seasons of Landsat image needed to be greater than the MSAVI threshold. Next, urban areas, major roads, snow, and ice were masked according to NLCD 2011 land cover classes [32]. Third, cultivated crop and pasture/hay fields were masked using a combination of the 2013 Cropland Data Layer for the U.S. from the U.S. Department of Agriculture, National Agricultural Statistics Service [57] and the NLCD 2011 agricultural classes, supplemented in certain regions with a combination of NDVI and MSAVI greenness thresholds. Fourth, open water areas in the Landsat mosaics were masked using the Normalized Difference Water Index (NDWI), defined as: where SWIR is the shortwave infrared. A NDWI threshold (held constant among mapping region and season) was used to isolate water pixels, which was combined with the water class from NLCD 2011. Available HRS imagery was also used when available to help calibrate these masking models [39]. The final non-rangeland mask was validated by comparison to Landsat imagery and random samples from high-resolution images on Google Earth. Supplemental hand edits were then applied where issues needed correction.

Component Validation
We validated component predictions using three approaches: independent and cross-validation across the entire study area, and error maps within each mapping area. Independent validation was our primary validation approach, consisting of field measurements of component cover at stratified-random locations. Independent validation data were collected by the same field crews that collected field training samples to minimize human sample bias between training and validation measurements. Cross-validation data were obtained randomly from portions of HRS 30 m training sites withheld from model predictions.
Independent validation point placement used a stratified random design, with two levels of stratified restrictions to simplify logistics of field sampling. The first level of stratification randomly selected 15, 8 km in diameter, sites across each mapping region. First level sites excluded areas less than 30 km away from HRS training sites and other validation sites. The second level stratification randomly placed 6-10 points within each 8 km validation site (total n = 1860 points at n = 227 sites). Only sites on public land, between 100 and 1000 m from the nearest road, and in rangeland vegetation cover within each site were considered. The random points within a site were evenly allocated to three NDVI thresholds from a leaf-on Landsat image (low, medium, and high). Sites with relatively high spatial variance within a 90 m by 90 m patch (3 × 3 Landsat pixels) were excluded to minimize plot-pixel locational error. Using NDVI as a stratum ensured plot locations were distributed across the range of validation site productivity. At each validation point, we measured component cover in a 1 m 2 quadrat every 5 m along a 30 m transect, averaging the seven observations for single plot values.
The secondary validation, cross-validation compared predicted component values to the values of 30 m training data at HRS sites withheld from model development. From each region, we selected 40,000 random pixels for evaluation. Results were proportionally weighted by the mapped area of each region to produce spatially balanced statistics for the entire study area. Validation parameters in both the independent and cross-validation included the coefficient of determination (R 2 ), slope, root mean squared error (RMSE), and normalized RMSE (nRMSE). For the independent validation, we also calculated the 95% confidence limits of linear least-square regression models of field-measured versus predicted component cover, and the distance (in cover percent) between the upper and lower confidence limits.
A third validation approach used regression tree error maps generated in model development to characterize each component's spatial error. This product was designed to assist users in understanding how model uncertainty is distributed spatially across the landscape and provide additional information to help determine the appropriate scale and confidence at which to apply the predictions. Component error maps from the Cubist regression modeling represented the mean absolute error between prediction and field-based estimates, based on rules, and were in the same unit as the prediction (either component percent error or centimeter height error). Error maps were not analyzed in the current study because we used them chiefly as internal reference, but they can be provided with each corresponding component prediction on a region-by-region basis.

Component Analysis
We analyzed the spatial distribution of components using multiple metrics. First, we evaluated the histograms of components across the study area to derive the area (km 2 ) covered by each component, calculating by summing the fractional cover across pixels. We calculated the portion of the mapped area with predictions greater than zero cover/height and the average value by component. Next, we calculated the quartile statistics (1 st quartile, mediation, 3 rd quartile, maximum) for each component to generate box plots, with whiskers set to minimum and maximum values. Second, we determined EPA level III ecoregion average fractional component cover. For each level III ecoregion in the study area (n = 32), we calculated the 1981-2015 average water year (October-September) precipitation, maximum temperature, and minimum temperature using Daymet climate data [58]. Third, key component proportions were derived, including (1) shrub cover proportion of total rangeland vegetation (shrub plus herbaceous) cover, (2) annual herbaceous proportion of herbaceous cover, (3) big sagebrush proportion of sagebrush cover, and (4) other sagebrush (sagebrush minus big sagebrush) proportion of sagebrush cover.

Component Predictions
A total geography of 2,993,655 km 2 was assessed-of which, 863,836 km 2 was in an exclusion mask, leaving 2,129,819 km 2 (71% of total study area) with the presence of at least one of the seven canopy components (i.e., mapped). When fractional predictions are converted to total area covered, bare ground occupies 968,792 km 2 , herbaceous 490,517 km 2 , litter 335,688 km 2 , shrub 324,247 km 2 , sagebrush 90,949 km 2 , annual herbaceous 89,165 km 2 , and big sagebrush 63,838 km 2 of the study area ( Table 1). The balance of the mapped area, 10,575 km 2 , consists of tree canopy cover in a rangeland environment. Of the four primary components, shrub is present on 95.1% of the pixels, herbaceous on 99.5%, litter on 99.7% and bare ground on 99.9% of pixels. Sagebrush occurs on 41.5% of study area pixels. Annual herbaceous is present on the lowest proportion of pixels (30.1%) ( Table 1).

Spatial Patterns
Training from 331 HRS sites was integrated into mapping the entire study area (Figure 1). Four commercial satellite sensors were tasked to collect imagery over HRS sites including 5 images from QuickBird, 96 from WorldView-2, 191 from WorldView-3, and 39 from Pleaides. For modeling components at the HRS level, 31,321 field training plots were collected across all HRS sites, averaging 95 plots per site. For the Landsat-scale models, an additional 3226 plots from BLM AIM data and 5382 Landsat field plots complemented HRS predictions (Figure 1). Primary ( Figure 3) and secondary ( Figure 4) component maps follow expected ecological and biophysical patterns (e.g., shrub cover declines from west to east [19], sagebrush extent matches previous maps [59]). Shrub cover for example, is positively correlated with average water year precipitation (R 2 = 0.14, p < 0.05), while bare ground is negatively correlated (R 2 = 0.26, p < 0.05). Patterns related to disturbance history, land use, and topography are also evident. Fractional cover relationships among pixels are as expected. The fractional cover of herbaceous, for example, is strongly negatively related to bare ground (R 2 = 0.57, p < 0.05) and shrub height is positively associated with shrub cover (R 2 = 0.76, p < 0.05). All components, except bare ground and litter, have a distribution that is skewed to the right ( Figure 5). Bare ground has a near normal distribution of fractional cover values, and litter cover is skewed slightly to the left. The median value of sagebrush, big sagebrush and annual herbaceous is 0% (Figures 4 and 5).

Ecoregion Averages
Fractional component composition varies widely when summarized by EPA level III ecoregion ( Figure 6, Figure S1 Conversely, shrub cover averages ~10% in eastern grassland ecoregions. Sagebrush cover patterns are more complex than shrub overall, and weakly related to precipitation. Instead sagebrush cover patterns are primarily driven by temperature, with a strong, negative, relationship with yearly averaged minimum temperature (R 2 = 0.40, p < 0.05). Bare ground has the greatest fractional cover among primary components in 17, herbaceous in 11, and shrub in four of the 32 EPA level III ecoregions ( Figure 6).  Conversely, shrub cover averages~10% in eastern grassland ecoregions. Sagebrush cover patterns are more complex than shrub overall, and weakly related to precipitation. Instead sagebrush cover patterns are primarily driven by temperature, with a strong, negative, relationship with yearly averaged minimum temperature (R 2 = 0.40, p < 0.05). Bare ground has the greatest fractional cover among primary components in 17, herbaceous in 11, and shrub in four of the 32 EPA level III ecoregions ( Figure 6).  Figure S1 for locations of level III ecoregions.

Component Proportions
Spatial patterns of the secondary components annual herbaceous and big sagebrush are more readily interpreted when viewed as a proportion of their primary component (Figure 7). Shrub cover comprises an average of 40% of total rangeland vegetation cover (shrub plus herbaceous cover) across the study area ( Figure 7A). Shrub cover proportion tends to be greatest in the more arid parts of the study area, especially in the Mojave and Sonoran Deserts and Central Basin and Range, and in the California chaparral. Annual herbaceous vegetation is present in 30.1% of the study area, with an average cover of 4.2%, comprising 18.1% of total herbaceous cover ( Figure 7B). Annual herbaceous proportion of total herbaceous cover is greatest west of the Sierras in California. Portions of the Mojave and Sonoran Deserts, Snake River Plain, Wasatch Mountains, and Central Basin and Range also have high annual herbaceous proportions.
Sagebrush cover over the study area averages 4.3%, big sagebrush 2.9%, and other sagebrush 1.4% In terms of spatial extent, sagebrush, big sagebrush, and other sagebrush occurs on 41.5%, 35.7%, and 28.8% of the study area, respectively. Within their respective ranges, average sagebrush, big sagebrush, and other sagebrush cover is 10.3%, 8.4%, and 4.4%, respectively. Of pixels with sagebrush present, big sagebrush and other sagebrush co-exist on 55.9% of pixels. Big sagebrush is the predominate sagebrush type in most of the study area ( Figure 7C). Other sagebrush is most abundant in the margins of the sagebrush range, especially in the Northwestern Great Plains and Southwest Tablelands ecoregions ( Figure 7D). Sagebrush and annual herbaceous cover are nested (i.e., secondary) to shrub cover and herbaceous cover, respectively, shown as boxes within their primary component. Big sagebrush is nested within sagebrush and is shown as the hatched box within the secondary sagebrush box. Primary components plus tree cover are designed to sum to 100% Ecoregions are listed in order of decreasing mapped area from left to right. Secondary y-axis shows average water year precipitation (WYPRCP) within the mapped portion of the ecoregion, indicated by black points. See Figure S1 for locations of level III ecoregions.

Component Proportions
Spatial patterns of the secondary components annual herbaceous and big sagebrush are more readily interpreted when viewed as a proportion of their primary component (Figure 7). Shrub cover comprises an average of 40% of total rangeland vegetation cover (shrub plus herbaceous cover) across the study area ( Figure 7A). Shrub cover proportion tends to be greatest in the more arid parts of the study area, especially in the Mojave and Sonoran Deserts and Central Basin and Range, and in the California chaparral. Annual herbaceous vegetation is present in 30.1% of the study area, with an average cover of 4.2%, comprising 18.1% of total herbaceous cover ( Figure 7B). Annual herbaceous proportion of total herbaceous cover is greatest west of the Sierras in California. Portions of the Mojave and Sonoran Deserts, Snake River Plain, Wasatch Mountains, and Central Basin and Range also have high annual herbaceous proportions.  Sagebrush cover over the study area averages 4.3%, big sagebrush 2.9%, and other sagebrush 1.4% In terms of spatial extent, sagebrush, big sagebrush, and other sagebrush occurs on 41.5%, 35.7%, and 28.8% of the study area, respectively. Within their respective ranges, average sagebrush, big sagebrush, and other sagebrush cover is 10.3%, 8.4%, and 4.4%, respectively. Of pixels with sagebrush present, big sagebrush and other sagebrush co-exist on 55.9% of pixels. Big sagebrush is the predominate sagebrush type in most of the study area ( Figure 7C). Other sagebrush is most abundant in the margins of the sagebrush range, especially in the Northwestern Great Plains and Southwest Tablelands ecoregions ( Figure 7D).

Component Accuracy
We conducted accuracy assessments of each component prediction using both independent and cross-validation approaches. Table 2a and Figure 8 demonstrate the independent validation results, pooling across regions. Across the cover components, R 2 values averaged 0.46, while the average RMSE was 10.37 and the nRMSE was 0.12. Since the range and skewness vary among components ( Figure 5), R 2 and RMSE need to be considered together in interpreting model performance. Slope is another critical metric; in our case a slope near 1 is preferred as it demonstrates a more unbiased model. Considering all metrics, herbaceous cover and bare ground tended to be the best performers (R 2 = 0.67 and 0.70; slope = 0.61 and 0.73, respectively) while shrub height and sagebrush height results had lower accuracies (R 2 = 0.19 and 0.24; slope = 0.29 and 0.31, respectively). The width of 95% regression confidence intervals is somewhat related to component histograms, as inferred by box plots (Figure 5), where proximity to the mean component value is related to tighter confidence limits ( Figure 8f). Conversely, less common values tend to have broader confidence limits. From a broad scale, however, regression confidence limits do not meaningfully vary across the range of component values. Table 2. Validation results of component predictions compared to (a) independent field-measured observations (n = 1860) and (b) cross-validation 30 m training data (n = 840,000) used in model development. Cross-validation statistics were area-weighted by mapping region. Units for average, max and range are in percent, except for shrub and sagebrush height (ht), which are in centimeters.

Component Accuracy
Our mapping approach focused on combining extensive field data (Figure 1), HRS imagery, robust modeling, Landsat imagery from critical phenological periods, and relevant ancillary data into a fractional product designed to retain as much local relevancy as possible, while still being regionally consistent. Validation accuracy results were generally equal to or greater than our previous mapping For the cross-validation assessment (Table 2b), we area-weighted 840,000 data points across all 21 regions according to the proportional contribution of each region to the total mapped area. Patterns were like those of independent validation in terms of relative performance of components and regions, though the accuracies were substantially greater than the independent validation. Like the independent validation, herbaceous cover and bare ground tended to be the best performers in cross-validation (R 2 = 0.79 and 0.85; RMSE = 6.3 and 8.0, respectively) while shrub height and sagebrush height results had lower accuracies (R 2 = 0.62 and 0.59; RMSE = 17.8 and 7.8 respectively). Across cover components, the average R 2 value was 0.69, and the RMSE was 6.8.

Component Accuracy
Our mapping approach focused on combining extensive field data (Figure 1), HRS imagery, robust modeling, Landsat imagery from critical phenological periods, and relevant ancillary data into a fractional product designed to retain as much local relevancy as possible, while still being regionally consistent. Validation accuracy results were generally equal to or greater than our previous mapping work [36,39], despite covering much larger geographies. For example, similar fractional component mapping in Wyoming [36] across the five main components (shrub, sagebrush, herbaceous, bare ground, and litter) averaged an R 2 of 0.29. Our independent validation results for the same components have an average R 2 of 0.54 for Wyoming and R 2 of 0.50 across the entire study area (Table 2a). Similarly, Xian et al. [39] demonstrated an average independent validation accuracy of R 2 of 0.38 across the same five components, again less robust than found in the present study. However, Xian et al., [39] reported lower RMSE results with an average of 7.49 versus the 10.6 reported here (Table 2a), likely attributed to a larger dynamic range of values in our study area. Relative to independent validation results of fractional component maps developed across the western U.S. by Jones et al. [40] our results show higher accuracy. Of the four components (shrub, bare ground, perennial herbaceous, and annual herbaceous) with independent accuracy reported by Jones et al. [40] the mean R 2 was 0.22, verus 0.58 for our equivalent products. Predicted values near the mean of each component tend to be more accurate, but variation in accuracy is minimal across the range of each component (Figure 8). Cross-validation accuracies of our components were substantially better than independent validation results, averaging an R 2 of 0.69 and an RMSE of 6.8 across the seven cover components (Table 2b). Our cross-validation results also represent a substantial improvement over the same five components reported in Xian et al. [39].
In judging the product, users should consider both independent and cross-validation accuracy together, since the combination of both provides a more realistic understanding of the performance the user can expect. Cross-validation statistics approximately represent the "best-case" component accuracy, while independent statistics reflect more typical accuracy. One must consider the spatial rate of decay in mapping accuracy between training sites and independent validation sites in interpreting overall accuracy. In other words, consider that the spectral and ecological similarity of a pixel decreases, on average, with distance from training pixels, which /'is related to decreasing environmental similarity with distance [60] in factors such as climate, land use, geology, and species assemblies. We strived to limit spatial decay in accuracy by broadly distributing training data in ecologically and spectrally diverse locations, and by placing Landsat-scale training points in critical locations (Figure 1). The spatial rate of decay tends to be proportionally greatest on shrub, big sagebrush, shrub height, and sagebrush height, which are the most difficult targets to map. Errors do exist in our fractional components as rangelands pose numerous difficulties to mapping, resulting from their high degree of spatial and temporal variation, high amount of bare soil and senescent vegetation [34], and variation in the physical and chemical properties of soils. Errors related to modeling are also present, chief among these is the tendency of component predictions to have a flattened histogram relative to training data.
Height components tend to be the most prone to error ( Table 2, [39]), and so users should proceed with caution. Because multispectral data often contain insufficient information to predict canopy height, many researchers have augmented spectral data with light detection and ranging (lidar) data to increase accuracy (e.g., [61]). Lidar data are not yet available at the scale of our study area: even so, we retain our shrub and sagebrush height predictions as they are critical in many rangeland applications such as fire fuel modeling and determining and modeling sage grouse habitat [28,62].
Our mapping region approach was important to localize the prediction and allow customization of each regional model to local conditions. Edge matching of adjacent mapping region was typically quite seamless, suggesting robust modeling that was independently replicating the same component cover in adjacent regions with different models. Methodological improvements did occur as regions were completed. We completed western regions first and then made additional methodological improvements as new issues were encountered in eastern regions. Our advancements relative to Xian et al. [39] included (1) the use of improved handheld tablets in the field to display high-resolution imagery for better plot location and interpretation, (2) sub-meter accuracy GPS units to increase field locational positioning, (3) collecting more plot data on HRS imagery, (4) increasing ground plot supplementation of Landsat-scale training data, and (5) increased experience and consistency of field collection and modeling staff. Regional independent and cross-validation statistics revealed that these improvements yielded higher accuracy numbers even though these regions were generally a more difficult mapping environment primarily because of the higher herbaceous cover which often confounded the spectral signature of shrubs.

Component Distributions
Component prediction abundance in general follows latitude, longitude, precipitation, and elevation trends (Figures 3 and 4). Northern, eastern and high elevation ecoregions of our study area generally represent the more mesic portions of the landscape ( Figure 6). As moisture availability increases in these mesic areas, higher proportions of shrub, herbaceous and litter and lower proportions of bare ground result (Figure 3, Figure 6). Herbaceous vegetation comprises 60% of total rangeland vegetation cover in the study area, with 18.1% of that being annual herbaceous vegetation. The highest proportions of annual herbaceous occur in the western portion of the region, especially in California where invasive annual grasses have supplanted native perennial bunch grasses [9,63]. Significant concentrations of annual herbaceous cover also occur in the Central and Northern Basin and Range, Blue Mountains and Columbia Plateau ecoregions (Figures 4 and 7), where a positive feedback between cheatgrass and fire has altered sagebrush steppe [24,25].
Shrub vegetation comprises 40% of total vegetation cover in the study area, averaging 15.2% cover (Figures 3 and 5). Shrub cover is greatest in the mountain ecoregions, where it is mainly composed of non-sagebrush shrubs such as manzanita (Arctostaphylos spp,), mountain mahogany (Cercocarpus spp.), shrub maples (e.g., Acer grandidentatum), and shrub oaks (e.g., Quercus gambelii). Sagebrush occurs on 41.5% of pixels, mostly across the north central two-thirds of the study area ( Figure 4). Big sagebrush habitats, critical to many species including sage grouse [26], though reduced from its historical range [15,29,59] is still widespread ( Figure 7C). Other sagebrush cover, particularly silver sagebrush, is also regionally important sage grouse habitat [64]. The concentration of other sagebrush in the Northwestern Great Plains ecoregion is composed mainly of silver sage, while the concentration in the Southwest Tablelands ecoregion is predominately sand sagebrush (Artemisia filifolia) ( Figure 7D). Our near range-wide distribution map of sagebrush is similar to that of previously published maps (e.g., [27,65,66]). The additional information presented on our fractional sagebrush cover map versus presence/absence distribution is invaluable to resource managers [29,30].
Litter covers 15.8% of the study area and is most abundant in more mesic areas as a result if th occurrence of more dead plant material in general (Figure 3). More explicitly, litter cover is strongly related to herbaceous cover (R 2 = 0.24, p< 0.05). Bare ground occupies the greatest amount of the study area, with an average of 45.5% cover ( Figure 5) with the highest proportions in the southern xeric parts of our study area, especially in the Mojave and Sonoran Basin and Range (Figures 3 and 6). Bare ground is the component with the highest mapping accuracy (Table 2a,b); this feature combined with its usefulness as an important indicator of vegetative abundance and rangeland health make it especially useful [67,68]. Relative to an analysis completed by Karl et al. [69] on rangelands managed by the BLM, our study indicated higher amounts of bare ground (Figures 3 and 5). This difference may be attributed to a variety of factors such as different methods of field data collection, different ground targets included as bare ground, and relatively lower sample size in Karl et al. [69]. Additionally, our models tend to over-predict bare ground at the low end of its distribution.

Next-Generation Work
Accurate characterization of the rangeland landscape is essential to successful understanding of how to manage, monitor and study rangeland processes. Our fractional components provide flexible and accurate remote-sensing-derived products that offer flexible application utility and further enable rangeland monitoring [36]. With this characterization baseline now established, next-generation work is quantifying component change back in time to 1984 using the Landsat archive and analyzing change trends across time [70,71]. Further work is developing site potential and accompanying departure scores to understand where current component predictions have deviated from the expected component amounts based on site condition [72]. Finally, future annual monitoring is underway to quantify how these components are changing. Understanding changing trend rates on the landscape and associated change drivers of those trends will ultimately allow prediction of future change events showing where the landscape is most vulnerable. We also strive to improve fractional component mapping accuracy of all components, particularly shrub and sagebrush height and big sagebrush cover. We are currently testing the usage of 20 m Sentinel imagery with several red edge bands to improve mapping accuracy. Integrating training data from lidar sources could improve the accuracy of height components [61].

Conclusions
Our approach for inventorying and quantifying western U.S. rangeland ecosystems as fractional components with remote sensing provides a new way to characterize and monitor these changing regions. Nine rangeland ecosystem components, including percent shrub, sagebrush, big sagebrush, herbaceous, annual herbaceous, litter, and bare ground cover, along with sagebrush and shrub heights in centimeters, were quantified at 30 m resolution across most of the western U.S. using independent mapping regions. Component prediction abundance in general followed latitude, longitude and elevation trends, with the more mesic northern, eastern and high elevation regions of our study area having significantly more shrub, herbaceous and litter amounts, and much less bare ground. Validation accuracy results were generally equal to and in some cases greater than our previous mapping work despite covering much larger geographies. Component composition strongly varies by EPA level III ecoregions, where most ecoregions are bare ground dominant, a minority are herbaceous dominant, and only one is shrub dominant. Sagebrush physically covers 90,950 km 2 , or 4.3%, of our study area, but is present in 883,449 km 2 , or 41.5%, of the non-masked area of our study area, underscoring its widespread distribution. Our near range-wide distribution map of sagebrush is like that of previously published maps. Component products detailed in this paper can be downloaded from www.mrlc.gov.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2072-4292/12/3/412/s1, Figure S1. Level III ecoregions boundaries across the study area and Table S1. Description of the fractional cover and height components used to characterize western U.S. rangelands.