Estimating Wheat Grain Yield Using Sentinel-2 Imagery and Exploring Topographic Features and Rainfall Effects on Wheat Performance in Navarre, Spain

Reliable methods for estimating wheat grain yield before harvest could help improve farm management and, if applied on a regional level, also help identify spatial factors that influence yield. Regional grain yield can be estimated using conventional methods, but the typical process is complex and labor-intensive. Here we describe the development of a streamlined approach using publicly accessible agricultural data, field-level yield, and remote sensing data from Sentinel-2 satellite to estimate regional wheat grain yield. We validated our method on wheat croplands in Navarre in northern Spain, which features heterogeneous topography and rainfall. First, this study developed stepwise multilinear equations to estimate grain yield based on various vegetation indices, which were measured at various phenological stages in order to determine the optimal timings. Second, the most suitable model was used to estimate grain yield in wheat parcels mapped from Sentinel-2 satellite images. We used a supervised pixel-based random forest classification and the estimates were compared to government-published post-harvest yield statistics. When tested, the model achieved an R2 of 0.83 in predicting grain yield at field level. The wheat parcels were mapped with an accuracy close to 86% for both overall accuracy and compared to official statistics. Third, the validated model was used to explore potential relationships of the mapped per-parcel grain yield estimation with topographic features and rainfall by using geographically weighted regressions. Topographic features and rainfall together accounted for an average for 11 to 20% of the observed spatial variation in grain yield in Navarre. These results highlight the ability of our method for estimating wheat grain yield before harvest and determining spatial factors that influence yield at the regional scale.


Introduction
Bread wheat (Triticum aestivum L.) is an important grain in Spain, representing 3% of the total of European production [1]. The majority of wheat production is located in central and southern Spain, which corresponds to the Castille and Andalusia regions. Navarre's wheat croplands in northern Spain, where this paper focuses, also have significant production. Furthermore, Navarre hosts several Spanish agroclimates and a wide range of water regimes in a relatively compact spatial area due to its topographical heterogeneity, which can serve as a good proxy for helping to understand wheat performance across all Spanish agrosystems. Estimating wheat grain yield, mapping wheat crop lands, and studying spatial interactions that affect grain yields can be of great help for farmers and agricultural institutions. This is especially relevant in the European Union, as the Common Agricultural Policy (CAP) subsidies requirements are linked to cropland uses and member states have to verify the declared field area, the sowed crop type and harvest [2].
Regarding wheat grain yield, there are two main ways to estimate it that take advantage of remote sensing data; on the one hand empirical models based on actual field data and spectral-based vegetation indices, which have been successfully used with Sentinel-2 data and wheat grain yield estimation [3,4], and on the other hand growth models [5] involving physiological parameter estimations based on, for instance, radiative transfer models. The availability of data [6] and computer processing requirements needed for crop growth modelling make empirical models more versatile to use in practice for limited datasets, which is the case of this study. Furthermore, in order to scale field-level grain yield predictions regionally, crop type mapping is essential as it can target the fields on which to apply the grain yield prediction models. Both field-level classifications and total wheat cropland area estimations are of great interest for improving agricultural management.
Wheat grain yield estimation at the field level before harvest plays an important role in easing farmers' management challenges and livelihoods. Studying field grain yield at the regional scale also allows for analysis of the most suitable sites for wheat cropland growth by exploring which spatial factors affect crop performance. Moreover, it can potentially be useful to estimate local and regional taxes and subsidies on agricultural production. More often than not, estimating grain yield at field level has been laborious and complex, also regarding the available remote sensing technologies in temporal frequency and spatial and spectral resolution. One challenging factor is the availability of the information regarding field-level agricultural yields, which may be considered sensitive information and hard to obtain due to market interests and other socioeconomic factors, as well as due to the limited ground measures capacity of researchers. This can jeopardize the training and validation dataset availability for crop yield models at a field level in actual agricultural contexts.
Moreover, besides crop growing conditions, such models depend on the local geomorphological contexts (i.e., topographic features) [7]. Defourny et al. [8] also stated that a significant agro-climatic gradient spanning over a large territory gradually shifts the cropping calendar of most crops, as well as the crop type distribution, and accordingly affects crop performances. Thus, regional specific approaches need to be considered when analyzing field-level performances, such as in the study here presented, where phenology is matched by combining different satellite dates for each separate ecozone in a geographically diverse region that may otherwise be captured by a single satellite scene. Several global and country/region-level initiatives assessing crop yields have been developed [9][10][11][12][13][14][15]. However, there is a reduced number of studies working with accurate field-level crop yield estimates [16] due to the difficult accessibility of actual crop yield data and its uncommon combination with field-level crop type classifications, both of which are very dependent on ground validation measurements.
Regarding the available remote sensing technologies for estimating grain yield, the launches of Sentinel-2 a and b satellites brought significant improvements towards functional wheat grain yield estimation, thanks to its improved temporal, spectral, and spatial resolution and open accessibility. Several studies showed Sentinel-2 data capacity for building crop yield models in wheat [4,[17][18][19], as well as crop yields prediction in other crops such as corn [20,21] or sorghum [16,22]. Nonetheless, despite the promising results, such models have been trained and validated based on one single or, at most, a dozen fields in one or two-season periods [4,18,20,21], with clearly constrained data availability and limited regional representativeness or adaptation to geographic variability. It is also common that governments only make crop yields at the regional or district level publicly available [17,19], resulting in limitations for training field-level grain yield estimation models. Various models have been trained and validated with partial in-field grain yield measurements (i.e., partial field crop cuts) [16,22], which are limited regarding pixel errors (reduced harvested area vs. satellite The region has an advanced agricultural intensification with high agronomical standards in terms of cultivation (e.g., certified seeds and widespread access to fertilizers). The regional threeseason crop rotation guidelines for bread wheat are fallow, followed by fava beans, oat or sunflower, and bread wheat again in the following third season. Rotation practices are widely encouraged by agrarian institutions. Wheat is sown between the middle to the end of October depending on the agrarian region within Navarre, harvesting happens between the end of June and the middle of July. Grains are mainly commercialized to markets through various agrarian cooperatives distributed throughout the region.  The region has an advanced agricultural intensification with high agronomical standards in terms of cultivation (e.g., certified seeds and widespread access to fertilizers). The regional three-season crop rotation guidelines for bread wheat are fallow, followed by fava beans, oat or sunflower, and bread wheat again in the following third season. Rotation practices are widely encouraged by agrarian institutions. Wheat is sown between the middle to the end of October depending on the agrarian region within Navarre, harvesting happens between the end of June and the middle of July. Grains are mainly commercialized to markets through various agrarian cooperatives distributed throughout the region. follow the phenological stage of the crop and report any specific growth-related issues. As reported by the farmers the analyzed fields received a basal fertilization with either superphosphate 45% or pig slurry, top-dressing fertilization was also applied with granulated urea at the majority of fields. In the Montaña region wheat was sown around the 20th October, in the Media region around the 28th October, and in the Ribera Alta around the 30th October. The harvest was collected around the middle of July in Montaña and during the first week of July in Media and Ribera Alta. The grain yield (Kg·ha −1 ) per field was reported to the researchers by the participating farmers and standardized in terms of humidity.

Meteorological Data
Temperature data, in order to estimate the crop phenological stages through the calculation of growing degree days (GDD) in zones II, V, and VI, was obtained from 30 openly accessible regional government meteorological stations (www.meteo.navarra.es/estaciones), shown in Figure 2. Rainfall data was also obtained from the same 30 meteorological stations distributed throughout the three agricultural zones under study. The GDDs for each of the two varieties for rain-fed management in Navarre is shown in Table 1 [35].
Remote Sens. 2020, 12, x FOR PEER REVIEW 6 of 26 Figure 2. Meteorological station locations from which temperature and rainfall data was obtained for the three zones (II, V and VI). UTM coordinates are expressed in meters.

Sentinel-2 Imagery and Phenology
The specific dates of Sentinel-2 images for 2018 were selected as those closest to the estimated phenological date (regarding GDD calculations) per zone as detailed in Table 2 and with the minimal cloud cover thresholds; the specific Sentinel-2 images used in this study are shown in Table 3. The Sentinel-2 a + b satellite constellation's combined global coverage every 5 days enables this level of phenology matching for the first time ever since its second satellite was launched in 2017.    Following average minimum and maximum mean temperatures of the 30 stations at the three zones, the accumulated GDD was calculated (Equation (1)) following calculations by Arnold [37] to estimate zonal phenological dates for 2018 and 2019 seasons. The reported dates that adjust to the phenological stage regarding GDD (Table 1) are reported in Table 2; as Camargo and Marcopolo had slightly different GDD for each stage, the average of both was considered to estimate the zonal phenological date.
where GDD is the growing degree days, nov jun indicates the sum throughout the season, November to June, of daily maximum and minimum temperatures (Tmax and Tmin) divided by 2, minus the base temperature, which in this case was considered to be 0 • C.  The specific dates of Sentinel-2 images for 2018 were selected as those closest to the estimated phenological date (regarding GDD calculations) per zone as detailed in Table 2 and with the minimal cloud cover thresholds; the specific Sentinel-2 images used in this study are shown in Table 3. The Sentinel-2 a + b satellite constellation's combined global coverage every 5 days enables this level of phenology matching for the first time ever since its second satellite was launched in 2017. Sentinel-2 multispectral bands (Table 4) were downloaded from Copernicus Open Access Hub (https://scihub.copernicus.eu/). Despite the high 5-day temporal resolution of Sentinel-2 a + b constellation, some images were discarded due to the image-wide cloud cover areas exceeding 40%. Four Sentinel-2 tiles cover the region under study. The images were downloaded for the 2018 (15-03, 30-03, 14-04, 24-04, 09-05, 19-05, 24-05, 03-06, 18-06 and 23-06) and 2019 (15-03, 20-03, 25-03, 30-03, 09-04, 29-04, 09-05, 14-05, 08-06, 18-06 and 28-06) seasons. Images were downloaded as an L1C product and were corrected to level 2A using the Sen2Cor tool on SNAP (Sentinel Application Platform), obtaining Bottom-Of-Atmosphere (BOA) and cirrus corrected reflectance images. Moreover, a cloud and cloud shadow mask were also applied with this tool.
From the Sentinel-2 multispectral data bands (Table 4), the following spectral reflectance vegetation indices (VIs), which are simple mathematic combinations of spectral bands, were calculated: the chlorophyll index green (CI green) and the chlorophyll index red edge (CI red edge), related to chlorophyll content [38]; the normalized difference water index (NDWI), related to water content [39]; the modified soil-adjusted vegetation index (MSAVI) [40] and the optimized soil adjusted vegetation index (OSAVI) [41], associated with vegetation cover; and the renormalized difference vegetation index Remote Sens. 2020, 12, 2278 7 of 24 (RDVI) [42], the ratio vegetation index (RVI) [43], the modified simple ration (MSR) [44], the normalized difference vegetation index (NDVI) [45], and the green normalized difference vegetation index (GNDVI) [46] sensitive to biomass (Table 5). Table 4. Spectral bands and spatial resolutions of the Sentinel-2 Multispectral Instrument (MSI). Novelty in spectral coverage includes three red-edge spectral bands and the vegetation red-edge (RE) at 20 m as well as improved SWIR (Short-Wave InfraRed) coverage at 20 and 60 m spatial resolutions. Broadband spectral coverage of the visible and near infrared are provided at 10 m spatial resolution.

MSI
The agricultural parcels delineations were obtained from Sistema de Información Geográfica de Parcelas Agrícolas (SIGPAC), a Spanish governmental application (http://sigpac.mapa.gob.es/) that allows identifying specific declared agricultural fields subjected to Common Agricultural Policy (CAP) subsidies. GPS coordinates were recorded to delineate the 39 fields used for training and validating the grain yield model. Crop type points were coordinated in fields throughout the three zones ( Figure 3) by obtaining GPS coordinates from each of the classified types (wheat, bare soil, and other crops). The whole regional agricultural parcels polygons were downloaded for the attributes herbaceous crops and rain-fed for the three agrarian areas under study within Navarre. Furthermore, fields were 10-m-buffered with the ArcGis Pro 2.3.0 buffer tool in order to reduce edge effects. Due to their better spatial resolution (10 m) and contrasting capacities in comparison with its other bands, Sentinel-2 B3 (green), B4 (red), and B8 (near infrared) bands were used for the crop type classification. Crop classifications [26,51] and field boundary delimitation [52] were successfully mapped using B3-B4-B8 images in previous studies with Sentinel-2 data. The trained datasets were divided as follows, 2/3 for training the model and 1/3 for validation. The number of trees was 500 [53]. RF classification and object filters were computed with ArcGis Pro 2.3.0. We applied majority voting of the classified pixels within the agricultural parcels (SIGPAC), as field crops are exclusively grown in monocultures in the region of Navarre, which we confirmed during farmer visits. The pixel's majority from the zonal statistics tool was then used to object filter the classification. With the validation data and the output of classification, a confusion matrix was computed at parcel level in order to obtain classification accuracies. At parcel level, the Fscore of each class was calculated with the Equation (2), where PA refers to producer's accuracy while UA refers to user's accuracy.
The most suitable multilinear stepwise regression model was extended into all the classified parcels of bread wheat in order to estimate grain yield per field for the three agricultural sub-regions. The resulting wheat cropland and grain yield average per zone were compared to the Navarre regional government's official statistics (www.navarra.es).

Grain Yield Estimation
At each of the three studied phenological stages (tillering, heading, and ripening), the previously described vegetation indices were calculated per field on ArcGis Pro 2.3.0 using the downloaded Sentinel-2 imagery. Four different statistical summary extractions of each vegetation index (maximum, minimum, mean, and median) were obtained for each field at each different phenological stage, considering the regional differences outlined in Tables 2 and 3. Mean and median were calculated with the closest available images to the phenological stage using zonal statistics, while minimum and maximum were respectively the averages of the lowest and highest pixel values between the two closest dates to the phenological stage. These calculations were performed on ArcGis Pro 2.3.0 and Microsoft Excel. First, the Raster to Point and Spatial Join tools on ArcGis were used in order to list, per field, all the pixel values; second, with the corresponding attribute tables the maximum and minimum vegetation indices pixel values between the dates were calculated with Microsoft Excel. Finally, the maximum and minimum were calculated on the points' attribute table were converted into two rasters in ArcGis in order to perform zonal statistics and obtain maximum and minimum averages per field over the specific periods. Pearson regressions against grain yield were calculated on R studio ("cor.test") for each phenological stage and summary (min, max, mean, and median). Following this, in order to find the combination of the best vegetation indices and statistical summary to estimate grain yield, multilinear stepwise regressions were studied at the most suitable phenological stage in R Studio (library "MASS"). The selected best model was chosen on the basis of Akaike Information Criterion, computed with the stepAIC function from the MASS library. The normal distribution of residuals and collinearity of the independent variables were studied with the library "olsrr", the variance inflator factor (VIF) was measured as it is an efficient method to asses collinearity [47][48][49], which strongly limits stepwise selection method. Furthermore, the residuals normality Shapiro-Wilk test was calculated.
Remote Sens. 2020, 12, 2278 9 of 24 The 39 monitored fields were separated into two groups, one for training and the other one for validation of the multilinear stepwise equations. The 20 fields from 2018 season were used for modeling and the 19 fields from 2019 were used for validation. For the 2019 season, only one phenological stage, heading, was eventually analyzed, as it appeared to be the most suitable for grain yield estimation (Table 7), with the following images: Montaña, 09-05 and 14-05; Media, 14-05 and 08-06; and Ribera Alta, 29-04 and 09-05.

Crop Type Classification and Regional Per-Parcel Grain Yield Estimation
We performed a pixel-based supervised random forest (RF) classification to differentiate wheat fields from fields of other crops. The classified area was stratified in three zones: II, V, and VI, corresponding to three of Navarre's agricultural zones shown in Figure 1. Each one located in Montaña, Media, and Ribera Alta areas, respectively. Following Colditz [50] and Noi and Kappas [27], which state that for RF classifications the training sample should correspond to around 0.25% of the total area, 513 points of parcels of the ground dataset in Zone II (33,345 pixels), 927 in Zone V (58,401 pixels), and 1050 in Zone VI (60900 pixels) formed the training dataset. Figure 3 indicates the points of the parcels from which the pixels from the various analyzed classes were obtained. The points collected were classified into three classes: wheat, bare soil, and other crops. As mentioned before, the target classification areas were delimitated with the available rain-fed and herbaceous crops polygons mask from SIGPAC.
Considering that the majority of cropland under study (83%) is either bread wheat or barley, we took advantage of the later ripening of bread wheat in order to differentiate it from other crops. After considering both preliminary results and phenological characteristics, differences between wheat and barley were clearer late in the season. This is a consequence of the shorter crop duration of barley, which makes it reach maturity and change color before wheat. Hence, late season Sentinel-2 images (24-05-2018, 03-06-2018 and 18-06-2018) were used for the classification as the moment of greatest contrast.
Due to their better spatial resolution (10 m) and contrasting capacities in comparison with its other bands, Sentinel-2 B3 (green), B4 (red), and B8 (near infrared) bands were used for the crop type classification. Crop classifications [26,51] and field boundary delimitation [52] were successfully mapped using B3-B4-B8 images in previous studies with Sentinel-2 data. The trained datasets were divided as follows, 2/3 for training the model and 1/3 for validation. The number of trees was 500 [53]. RF classification and object filters were computed with ArcGis Pro 2.3.0. We applied majority voting of the classified pixels within the agricultural parcels (SIGPAC), as field crops are exclusively grown in monocultures in the region of Navarre, which we confirmed during farmer visits. The pixel's majority from the zonal statistics tool was then used to object filter the classification. With the validation data and the output of classification, a confusion matrix was computed at parcel level in order to obtain classification accuracies. At parcel level, the Fscore of each class was calculated with the Equation (2), where PA refers to producer's accuracy while UA refers to user's accuracy.
The most suitable multilinear stepwise regression model was extended into all the classified parcels of bread wheat in order to estimate grain yield per field for the three agricultural sub-regions. The resulting wheat cropland and grain yield average per zone were compared to the Navarre regional government's official statistics (www.navarra.es).

Rainfall Interpolation and Topographic Models
A 2 m resolution digital elevation model (DEM) was obtained from Navarre's regional Government at ftp://ftp.cartografia.navarra.es/. The downloaded tiles were mosaicked and slope and aspect were calculated from the DEM using the slope and aspect tools.
The sum of the accumulated rainfall (mm) during the wheat growing season, from November to May, at every station was gathered in a georeferenced-points dataset. This dataset was used to interpolate rainfall using two strategies: inverse distance weighting (IDW) and kriging. All DEM, topographic feature and subsequent GIS processing were completed on ArcGis Pro 2.3.0.

Ordinary Least Square and Geographically Weighted Regression
The ordinary least square (OLS) was used in the first term to explore the relationship between grain yield and the explanatory variables. In OLS the existence of local variation is not taken into account in the regression, hence the regression coefficient remains constant for each variable (Equation (3)).
where y i is the dependent variable (grain yield), β 0 is the intercept; β k is the coefficient of the independent variable (x ik ) and the random error is ε i . Alternatively, GWR 4.0 software, developed by Nakaya et al. [54], was used. A geographically weighted regression (GWR) model was also calculated in order to explore the relationship between grain yield and the explanatory variables (Equation (4)) taking into account the spatial factor, which is an essential feature when dealing with spatial heterogeneity, something that is especially central to the objectives of this study. GWR uses the least square method given the location as a weighting factor. The optimal bandwidth, namely the optimum number of neighbors, was determined by the Akaike Information Criterion (AIC). A multiple comparison of AIC to find the best bandwidth (the one with the lowest AIC) was computed. For OLS and GWR comparison, an ANOVA was performed to compare the accuracy of both levels. Furthermore, for both OLS and GWR, the spatial autocorrelation (Moran's I) of residuals were tested.
where y i is the dependent variable (grain yield), (µ i ,v i ) are the coordinates (x,y) at the location i, β 0 is the intercept, β k is the coefficient of the independent variable (x ik ) at the specific weighted location, and ε i is the random error. A geographic variability test was performed in order to determine if the explanatory variables were spatially heterogeneous. In order to do so, two models were developed, one that considers all of the variables (slope, altitude, height, and rainfall) to be spatially heterogeneous, and on the other hand one that considers all of the variables to be spatially homogenous. If the variables were indeed spatially heterogeneous determinants of grain yield, i.e., if the coefficients varied significantly in space, then the AIC size of the second model should be larger and result in a negative diff-criterion value for this test (GWR4 manual) [55].

Grain Yield Estimation
In Table 6 we show the Pearson's correlation between the calculated vegetation indices and the actual grain yield at the three studied phenological stages (tillering, heading, and ripening) for the various extracted zonal statistics (min, max, mean, and median). The correlations were substantially higher at heading in comparison with tillering and ripening. The resulting correlations against grain yield were similar among the chlorophyll content, water content, vegetation cover, and biomass-sensitive vegetation indices, with no obvious results to determine better correlating indices with the Pearson's correlation. While at heading the correlations averaged between 0.79 to 0.84 for the mean, 0.76 to 0.84 for the median, 0.72 to 0.81 for the minimum, and 0.82 to 0.84 for the maximum, at tillering the results were between 0.41 and 0.74 for the mean, 0.40 and 0.71 for the median, 0.38 and 0.63 for the minimum, and 0.43 and 0.65 for the maximum. Regarding ripening, the correlations yielded between 0.32 and 0.49 for the mean, 0.31 and 0.49 for the median, 0.38 and 0.45 for the minimum, and 0.34 and 0.45 for the maximum. Furthermore, the statistical significance of the correlations at heading was greater than that at both tillering and ripening, which demonstrated lesser (<0.05) or no significance differences in the majority of the cases. At heading, the results from the multilinear stepwise regression (Table 7) show the most suitable combination of vegetation indices in order to estimate grain yield. In this sense, the best-suited equation uses minimum averages of MSAVI pixels and maximum averages of RDVI pixels. VIF was 2.16 for both variables. The histogram of the residuals is shown in Figure A1 in the Appendix A together with the Shapiro-Wilk normality test results, the null hypotheses of which assumes that the data is normal while the alternative assumes it is not. The p value of the test was 0.297 and the statistic 0.982.
The resulting multilinear stepwise equation (Table 7) was tested at heading for the studied fields for the following year 2019 using the corresponding calculated vegetation indices and summary statistics. The equations worked accurately as shown in Figure 4, with an R 2 of 0.83 (RSE = 733.1 kg·ha −1 ). with the Shapiro-Wilk normality test results, the null hypotheses of which assumes that the data is normal while the alternative assumes it is not. The p value of the test was 0.297 and the statistic 0.982. The resulting multilinear stepwise equation (Table 7) was tested at heading for the studied fields for the following year 2019 using the corresponding calculated vegetation indices and summary statistics. The equations worked accurately as shown in Figure 4, with an R 2 of 0.83 (RSE=733.1 kg·ha −1 ).

Crop type classification
The overall classification accuracy reached 86.55%, 89.32%, and 83.14% for the II, V, and VI zones, respectively (Table 8). Wheat was well-mapped at the three classified sites with an Fscore of 88% at the agrarian sub-regions II, 91.90% at zone V, and 84.05% at zone VI. On the other hand, bare soil was reliably classified with an Fscore of 91.43% at zone II, 93.26% at zone V, and 84.06% at zone VI. The class "other crops" was the least well classified, the corresponding Fscores at the three areas where 78.15%, 78.26%, and 81.82% for the zones II, V and VI, respectively. Figure 5 shows the crop type classified map.

Crop Type Classification
The overall classification accuracy reached 86.55%, 89.32%, and 83.14% for the II, V, and VI zones, respectively (Table 8). Wheat was well-mapped at the three classified sites with an Fscore of 88% at the agrarian sub-regions II, 91.90% at zone V, and 84.05% at zone VI. On the other hand, bare soil was reliably classified with an Fscore of 91.43% at zone II, 93.26% at zone V, and 84.06% at zone VI. The class "other crops" was the least well classified, the corresponding Fscores at the three areas where 78.15%, 78.26%, and 81.82% for the zones II, V and VI, respectively. Figure 5 shows the crop type classified map. Table 8. Confusion matrixes for the pixel-based crop type classification at the three stratified agrarian regions under study. Zone II in the northern Montaña region, zone V in the Media region, and zone VI in the Ribera Alta Southern region. The confusion matrix was calculated after the object filtering of fields. Thus, the number indicates the classification accuracy of fields and not pixels. UA indicates users' accuracy and PA producers' accuracy. The overall accuracy at the bottom right of each zone is highlighted in bold. The estimated grain yield average for zone II corresponded to 103% of the official data, while zone V and VI correspond to 95% and 94%, respectively (Table 9). Regarding wheat crop land, the classified wheat field area can be an indicator of the crop type area. In this sense, the estimated wheat crop land area for the whole region corresponded to 85.77% of the estimate in the official statistics. Regarding each zone, the surface corresponded to 81.41%, 81.15%, and 123.16% with the zones I, V, and VI respectively. Regarding zone VI, the estimated wheat cropland was overestimated if compared with official statistics, while the opposite happened at sites I and V. We trust the area estimation indicator as the polygons used for the object-filtering are official trusted vectors, as well as the mask used for the classification itself. All Spanish cereal producers are required to declare their croplands, and thus the polygons used in this study should be fairly accurate. The estimated grain yield average for zone II corresponded to 103% of the official data, while zone V and VI correspond to 95% and 94%, respectively (Table 9). Regarding wheat crop land, the classified wheat field area can be an indicator of the crop type area. In this sense, the estimated wheat crop land area for the whole region corresponded to 85.77% of the estimate in the official statistics. Regarding each zone, the surface corresponded to 81.41%, 81.15%, and 123.16% with the zones I, V, and VI respectively. Regarding zone VI, the estimated wheat cropland was overestimated if compared with official statistics, while the opposite happened at sites I and V. We trust the area estimation indicator as the polygons used for the object-filtering are official trusted vectors, as well as the mask used for the classification itself. All Spanish cereal producers are required to declare their croplands, and thus the polygons used in this study should be fairly accurate.  Table 9. Statistics for grain yield (GY), total wheat crop land surface studied, and farmer field average surface from both the official regional Government of Navarre (www.navarra.es) and the calculated results for the season 2017-2018 at the regions Montaña (zone II), Media (zone V), and Ribera Alta (zone VI).

Topographic Features and Rainfall Effects on Performance
OLS and GWR are both models that seek to explain the relationship between a dependent variable, in this case the calculated grain yield at all the classified wheat parcels, and explanatory independent variables, topographic features (aspect, slope, and altitude), and rainfall for this study. Table 10 presents GWR as the most suitable model for explaining these effects. In this case AIC, sigma and both R 2 and adjusted R 2 improved with GWR in comparison with OLS for the three zones (II, V, and VI) ( Table 11). For instance, OLS yielded an adjusted R 2 of 0.05, 0.06, and 0.07 for zones II, V, and VI respectively. Meanwhile, for the statistically significant (p < 0.05) independent variables (rainfall and altitude in zone II; rainfall, slope, and altitude in zone V; and rainfall, slope, and altitude in zone VI) OLS can explain a 5%, a 6%, and a 7% of grain yield variability at each zone (II, V, and VI). On the other hand, GWR presents an adjusted R 2 of 0.20, 0.11, and 0.20 for zones II, V, and VI respectively; therefore 20%, 11%, and 20% of GY variability at each zone can be explained with GWR. The local R 2 of each wheat field is shown in Figure 6. Furthermore, in Table 10, the ANOVA results support significant improvement (p < 0.05) when using GWR in all the three studied zones compared to the global model (OLS). The GWR model explains the effects of topographic factors (aspect, altitude, and slope) and rainfall on grain yield significantly better than the OLS model for all three zones by adding information on the spatial variation of grain yield with these local phenomena.
The autocorrelation test of both GWR and OLS for the three zones (Table 12) presents Moran's I values closer to the expected for GWR in comparison with OLS. The complete spatial randomness of GWR residuals was accepted as the p-value is not statistically significant, making the null hypothesis, that there is not significant spatial autocorrelation, correct. In contrast with this, p-value is statistically significant for OLS and therefore shows spatial autocorrelation of OLS residues.
Regarding the rainfall interpolation, kriging interpolation outperformed IDW (RMSE = 91.90 mm and 105.08 mm respectively), hence the previous was used. Grain yield correlated positively with rainfall, namely when rainfall increased, so did grain yield. Although in GWR models the regression coefficients vary locally, the fact that in zone II the mean coefficient is 13.87 (Table 11) suggests the strong effect of the rainfall gradient distribution on wheat grain yield in this northern area. Regarding rainfall at the southern region (Zone VI), the mean coefficient is also positive and relatively high, 12.32 (Table 11). Meanwhile, the mean regression coefficient in zone V was lower at 1.73. Concerning the topographic variables, they were spatially heterogeneous except for aspect in the middle zone.    Figure 6. GWR local R 2 distribution showed on the four analyzed parameters (altitude, slope, aspect, and rainfall maps, in order from top to bottom) zoomed in to zone II (north). GWR: geographically weighted regression. UTM coordinates are expressed in meters.

Discussion
Testing the applicability of empirical vegetation index models in different years of data allows for evaluating the reliability and reproducibility of that model through time. In this case the stepwise multilinear model split between the training dataset, season 2018, and validation dataset, season 2019, revealed the efficiency of this approach for at least a two-year period in Navarre. This is uncommon, as generally such empirical models are developed and validated with same-year data [16,21], something that likely happens due to the difficulty to obtain data from various growing seasons, and thus gives robustness to the techniques and analyses that we have presented here.
The full operationality of Sentinel-2 constellation reached in 2018 allowed focusing on phenological stages thanks to its improved frequency, as the results presented here show. Until this recent date, phenology-based grain yield prediction was unlikely to be assessed with alike satellites (i.e. Landsat 8). Our results suggest that heading is the most suitable phenological stage in order to study empirical relations of vegetation indices with grain yield for rain-fed wheat in Navarre. This is consistent with other ground spectral measurements for estimating wheat grain yield under rain-fed conditions as reported by Fernández-Gallego et al. [56]. Moreover, this stage was also found optimal for grain yield prediction with proximal remote sensing (i.e., UAV) and equivalent growing conditions [57]. We argue that the results here obtained showed that focusing on phenology for grain yield estimation is now possible using Sentinel-2 imagery and not only applicable with ground and proximal sensing. Furthermore, the combination of climatic (i.e. temperature, growing degree days) and satellite data has resulted in an adequate approach for estimating phenology, as previous studies have suggested [58,59], and wheat grain yield.
The minimum average of pixel values at heading stage of MSAVI and the maximum of RDVI provided the most optimal combination of vegetation indices in the stepwise multilinear regression produced in this study. The improved features of Sentinel-2 regarding spectral and spatial resolution Figure 6. GWR local R 2 distribution showed on the four analyzed parameters (altitude, slope, aspect, and rainfall maps, in order from top to bottom) zoomed in to zone II (north). GWR: geographically weighted regression. UTM coordinates are expressed in meters.

Discussion
Testing the applicability of empirical vegetation index models in different years of data allows for evaluating the reliability and reproducibility of that model through time. In this case the stepwise multilinear model split between the training dataset, season 2018, and validation dataset, season 2019, revealed the efficiency of this approach for at least a two-year period in Navarre. This is uncommon, as generally such empirical models are developed and validated with same-year data [16,21], something that likely happens due to the difficulty to obtain data from various growing seasons, and thus gives robustness to the techniques and analyses that we have presented here.
The full operationality of Sentinel-2 constellation reached in 2018 allowed focusing on phenological stages thanks to its improved frequency, as the results presented here show. Until this recent date, phenology-based grain yield prediction was unlikely to be assessed with alike satellites (i.e., Landsat 8). Our results suggest that heading is the most suitable phenological stage in order to study empirical relations of vegetation indices with grain yield for rain-fed wheat in Navarre. This is consistent with other ground spectral measurements for estimating wheat grain yield under rain-fed conditions as reported by Fernández-Gallego et al. [56]. Moreover, this stage was also found optimal for grain yield prediction with proximal remote sensing (i.e., UAV) and equivalent growing conditions [57]. We argue that the results here obtained showed that focusing on phenology for grain yield estimation is now possible using Sentinel-2 imagery and not only applicable with ground and proximal sensing. Furthermore, the combination of climatic (i.e., temperature, growing degree days) and satellite data has resulted in an adequate approach for estimating phenology, as previous studies have suggested [58,59], and wheat grain yield.
The minimum average of pixel values at heading stage of MSAVI and the maximum of RDVI provided the most optimal combination of vegetation indices in the stepwise multilinear regression produced in this study. The improved features of Sentinel-2 regarding spectral and spatial resolution have allowed increasing the VIs able to be calculated and the most optimal statistic summary, which with coarser spectral and spatial resolutions would be very limited. The improved capacities of these indices could be explained due to the reduced soil background influences, regarding MSAVI, and, as detailed by Rougean and Breon [42], the capacity to minimize background reflectance effects of RDVI. In this case, MSAVI together with RDVI show the ability to maintain sensitivity to total vegetation biomass for fully developed canopies corresponding to heading, while other vegetation indices might be saturated at this phenological stage. These vegetation indices have already been successfully correlated with wheat performance using remote sensing data. In the scientific literature, we find examples of RDVI, which is used for green biomass estimation as it is sensitive to vegetation coverage [60] and has also been successfully used in empirical models estimating wheat yields [61]. Around the phenological stage of heading, Bao et al. [62] reported a correlation of 0.79 (R 2 ) between RDVI and wheat yield, while Zhao et al. [63] found a correlation of 0.76 (R 2 ), which show the relative high correlation ability of RDVI with wheat yield. Examples are also available for MSAVI, which has shown relative high correlation with wheat yields [64,65]. Regarding the goodness of the model, the selected vegetation indices showed absence of collinearity, as a VIF of 2.16, obtained for both cases, was lower than the collinearity threshold value of 10 [66]. Moreover, the normality of the residuals was assumed, as the Shapiro-Wilk test p value was over 0.05 (0.297) and the null hypotheses, which assumes that the data is normal, was accepted. The plot of residuals also suggested it as shown in Figure A1 of Appendix A.
The fact that maximum and minimum pixel value averages were the most suitable statistical summary combination shows that, for studying grain yield, exploring crop canopy evolution through time may provide an improved output in comparison with single date image data. However, it shows that focusing on the phenological period of heading can provide relatively higher correlations with grain yield, explaining 83% of variability in this case. Namely, segregating phenological periods is a a useful approach in order to calculate empirical grain yield estimations in contrast with following whole season pixel values without explicitly considering phenological periods itself. Estimating the phenological stage with the GDD and focusing on the specific period, heading in this case, eases processing demands and yields accurate results, while clearly diminishing costs. Sentinel-2 spatial resolution allows for the fine evaluation of the statistical summaries and consequently more precise estimations.
Using late-season imagery in order to take advantage of barley's earlier senescence proved to be efficient for wheat crop land classification, again, taking maximal advantage of the S2 return interval temporal frequency to identify the optimal date for the separation of like cereal crops. Our results are congruent with the Navarre regional government's official statistics, as Table 9 demonstrates. The average grain yield obtained after applying the grain yield stepwise equation from Table 7, for all of the classified wheat fields, showed similar results with official (i.e., after harvest) grain yield averages. Sentinel-2 spatial resolution allowed accurate mapping and estimation of grain yield, while other satellites would not have yielded such precise results regarding field delimitations, for instance. The grain yield estimation model was applied to the wheat classified parcels without considering genotypic differences within the region (both concerning classification and grain yield modelling issues) due to the fact that for the 2017-2018 season 80% of the fields across the whole of Navarre used either Camargo or Marcopolo [67]. Therefore, a general wheat genotypic homogeneity was supposed in the whole region.
In order to be able to use the yield estimation based on spectral data from Sentinel-2 to estimate the yield across the entire study area, it is necessary to also develop a pixel or better field-level image classification. The slightly lower Fscore of wheat classification accuracy in the Ribera Alta southern sub-region might be due to the limited surface area devoted to this crop type in comparison with the other two sites-a higher difficulty in accurately classifying wheat fields was observed with 13 misclassifications with a proportionally smaller number of ground data points for wheat at this site (Table 8). Moreover, it could be a consequence of the smaller size of the fields in this southern zone. Bare soil did present substantial misclassification with "other crops" in this southern zone; this likely occurred due to the major presence of barley in zone VI and the earlier senescence it features, especially considering that this zone is substantially drier. Relevant misclassifications happened with wheat and bare soil in all of the three zones. Unfortunately, ground reference points in which the cultivar was specified were only obtained for wheat, the other available ground points could not be specified among the other cultivated crops, namely barley and other minor crops such as oats were not well differentiated. Despite targeting wheat crop lands, specifically classifying each of the various crop types in the region could have improved the classification results overall. Nonetheless, the lack of this exact information did not jeopardize the classification of only the wheat crop lands too much, which showed relatively high Fscores for all of the zones and was the central aim of this study.
Topographic features and rainfall together accounted for an average of 11% to 20% of the observed spatial variation in grain yield in Navarre; this percentage range is consistent with other studies dealing with these factors. Regarding topography effects on wheat at the field level in central Europe, slope and altitude have been estimated to account for an average of 5% to 42.5% variation in grain yield [68], meanwhile in North America those same topographic attributes were estimated to account for an average of 39% to 65% of the variation in grain yield [69]. Moreover, Basso et al. showed that in the Mediterranean context aspect, altitude, and slope are directly linked with wheat grain yield [70] and that rainfall distribution is a major limiting factor for wheat yields [71]. It has been observed that topographic features affect grain yield more dramatically in dry years or in areas with naturally irregular rainfalls (i.e., Mediterranean rain-fed wheat croplands) [72], which explains the relatively wide range of variability of the rainfall and topographic effects on grain yield in this study as well as in the scientific literature. Thus, an increased effect of topographic features might be expected in Spanish wheat agrosystems in a climate change scenario, and these features consequently deserve attention. In this sense, the results of this study demonstrate novelty as the relations between topography and grain yield found in the scientific literature have been, on the one hand, studied in few or single fields [73], presumably due to the difficulty in obtaining precise grain yield predictions for multiple fields from both traditional methods (i.e., time consuming field-work) and from remote sensing techniques (i.e., access restrictions or coarser resolutions to correlate with grain yields and delineate fields). On the other hand, when focusing on regional-scale topographic and climatic data, previous studies have defined suitability zones solely within the agroecological landscape [74][75][76]; none have completed an estimation of the effects of those factors directly on grain yield. We here took advantage of both topographic and climatic data of Navarre, Spain to explore the potential effects on grain yield in various regions, while estimating precisely the grain yield per field at a regional level using to the best of our advantage all of the Sentinel-2 improved characteristics. We believe that studying climatic and topographic impacts on grain yield at a larger scale can provide new insights that in turn will be of interest for managing croplands systemically while advancing towards improved agroecological landscape assessments.
Regarding the previously-mentioned GWR results, approximately 80%, 89%, and 80% of grain yield variability are yet to be explained in each region respectively, as the tested variables cannot explain the whole of the variability. Factors such as soil type, evapotranspiration, radiation, and fertilization or farming practices, among others, could not be evaluated with the remote sensing techniques employed in this study, and might help to explain the remaining variability. Be that as it may, the mean coefficient of regression is positive for rainfall at all three zones. This suggests a relatively strong effect of the rainfall gradient distribution on wheat grain yield in the northern zone. This might be explained with the varying precipitation patterns of the area due to the rugged topography and the numerous valleys that the area includes and that may consequently affect wheat grain yield. Regarding zone VI, the result links the drier features of the southern climate and its positive strong dependence on rainfall.
In zone II, the slope showed a negative coefficient or inverse relationship (the steeper the terrain the lower the GY), while altitude and aspect were positive, hence suggesting that southern-oriented fields at higher altitudes experienced improved performance for that season. Regarding the middle zone, the regression coefficient for the slope was also negative, while the altitude coefficient was positive. In zone VI, the slope and altitude regression coefficients were negative, while aspect was positive. This suggest that at this site that grain yield decreases the higher and steeper the parcel is; while orientation, towards the South in this case, affects grain yield positively. The lower adjusted average of R 2 results in zone V suggest only a moderate influence of the tested factors in comparison with the northern zone II and southern zone VI, agricultural areas. This might be due to the milder weather with sufficient rainfall, in comparison with the southern zone, and the fairly flatter ground in comparison with the rugged northern zone.

Conclusions
We believe that Sentinel-2 s improved temporal frequency benefitted the grain yield prediction models by adapting to phenology across ecozones in the geographically diverse region of Navarre and improved the spectral separation of bread wheat from other crop types by using precisely timed images at a moment of phenological differences. The improved Sentinel-2 spatial detail was also optimally harnessed by capturing field level variability with the use of zonal statistics as model input parameters, and contributed to improving total estimates related to both yield and crop types. Furthermore, the use of field-level grain yield data for both training and validation of the wheat grain yield estimation model proved to be an efficient approach. The matching of crop type classification and field-level grain yield estimation models are scarce in the scientific literature, as often studies either focus on classification [23][24][25][26] or grain yield estimation [17][18][19][20][21][22] alone. Notwithstanding, the combination of both as presented in this study provided improved data for a more robust model (i.e., comparison to official cropland area and regional yield statistics) and for testing its applicability in actual agricultural contexts. Moreover, another novel aspect that this research addresses is the study of large-scale field-level topographic effects on grain yield, which has not been studied extensively, as single fields or reduced-scale approaches are used in the majority of studies on this topic [73].
In addition, we have applied here various techniques to take maximal advantage of several different types of openly accessible data sources for the Navarre region in northern Spain, where no such study had been previously reported. It is especially relevant, as in 2020 the CAP in the European Union will be reformulated aiming to optimize resources and advance towards an integrated regional management. The CAP accounts for around 30% of the total European budget [77], and member states are responsible of supervising the declared croplands and harvests. The results obtained and the methodology here developed can be easily used and reproduced for both Navarre's regional government and other European regions with equivalent data availability and frameworks, easing their expensive field works. We here used polygon databases from agricultural fields declared in the CAP subvention system, public topographic and environmental data, as well as European Union and European Space Agency-launched Sentinel-2 openly accessible imagery. Furthermore, we believe that in order to obtain field-level crop yields in actual agricultural contexts, coordination with local farmers is central to this endeavor.
In summary, this research successfully showed the ability of our relatively straightforward and potentially operational method for estimating wheat grain yield at field-level before harvest and determining the spatial factors that influence grain yield. Concerning grain yield estimation, two observations were deemed most pertinent: on the one hand this approach allows grain yield at approximately two months before harvesting (at the phenological stage of heading) to be estimated correctly, and on the other hand, it is applicable across at least a two-season period. The combination of our grain yield estimation model and crop type mapping also achieved accurate results with regards to official statistics. The techniques and models presented can be said to have successfully mapped wheat croplands and calculated regional grain yields. Furthermore, the observed spatial variation caused by topographic features (slope, aspect, and altitude) and rainfall could be attributed to explain on average 11% to 20% of spatial grain yield variation. Additional work is warranted in relation to the other genotype/varietal, environmental, or management (GxExM) factors that may account for regional grain yield variability across local to regional scales, such as those presented here for Navarre, Spain.
Author Contributions: J.S. processed the Sentinel-2 images, mapped the crop types, developed the performance estimation models and analyzed the topographic and rainfall effects on it under the supervision of J.L.A. and S.C.K., J.G.-T. and Í.A. managed the field work in Navarre and organized the datasets. J.S. and S.C.K. wrote the paper. All authors have read and agreed to the published version of the manuscript.