Potential of Different Optical and SAR Data in Forest and Land Cover Classiﬁcation to Support REDD+ MRV

: The applicability of optical and synthetic aperture radar (SAR) data for land cover classiﬁcation to support REDD+ (Reducing Emissions from Deforestation and Forest Degradation) MRV (measuring, reporting and veriﬁcation) services was tested on a tropical to sub-tropical test site. The 100 km by 100 km test site was situated in the State of Chiapas in Mexico. Land cover classiﬁcations were computed using RapidEye and Landsat TM optical satellite images and ALOS PALSAR L-band and Envisat ASAR C-band images. Identical sample plot data from Kompsat-2 imagery of one-metre spatial resolution were used for the accuracy assessment. The overall accuracy for forest and non-forest classiﬁcation varied between 95% for the RapidEye classiﬁcation and 74% for the Envisat ASAR classiﬁcation. For more detailed land cover classiﬁcation, the accuracies varied between 89% and 70%, respectively. A combination of Landsat TM and ALOS PALSAR data sets provided only 1% improvement in the overall accuracy. The biases were small in most classiﬁcations, varying from practically zero for the Landsat TM based classiﬁcation to a 7% overestimation of forest area in the Envisat ASAR classiﬁcation. Considering the pros and cons of the data types, we recommend optical data of 10 m spatial resolution as the primary data source for REDD MRV purposes. The results with L-band SAR data were nearly as accurate as the optical data but considering the present maturity of the imaging systems and image analysis methods, the L-band SAR is recommended as a secondary data source. The C-band SAR clearly has poorer potential than the L-band but it is applicable in stratiﬁcation for a statistical sampling when other image types are unavailable.


Introduction
The global yearly deforestation rate during 2000-2010 was 13 million hectares [1]. Although the rate has recently been decreasing in several countries such as Brazil, it is still high in the tropical zone [2]. The carbon densities of tropical and boreal forests are similar but the differences in the speed of nutrient recycling result in a situation where over 50% of tropical forest carbon is stored in the living biomass, whereas the corresponding figure is 20% in boreal forest [3,4]. In the tropical forest, a large amount of carbon is lost when trees are removed. Hence, prevention of tropical deforestation has a key role to play in the fight against climate change.
The United Nations Framework Convention on Climate Change has established the REDD+ (Reducing Emissions from Deforestation and Forest Degradation) program to provide incentives to developing countries to prevent the loss of their forests [5]. The plus sign refers to the extended role of REDD to support conservation, sustainable management of forests and enhancement of carbon stocks. The measurement, reporting and verification (MRV) activity of the REDD+ process requires application of reliable and robust methods for the assessment of forest resources and their changes. Information is required on the extent of land cover and land use categories at the beginning of the reporting period and on the changes between categories during the period [6].
Remote sensing-based services have the potential to fulfil several of the requirements of the monitoring services as defined by the Intergovernmental Panel on Climate Change (IPCC) [7] and the Conference of the Parties [5]: adequate, robust, consistent, accurate, complete and transparent. Earth observation has been stated to be the only feasible means to get information from changes in the forest for large areas [8].
Medium resolution (10-60 m spatial resolution) satellite images are recommended as the main source for collecting activity data for MRV on forest area and changes [9]. Such data at 30 m spatial resolution have been available from the Landsat satellite program since 1982 [10]. Landsat images are available free of charge for anybody, but availability of applicable data varies in different regions mainly because of cloud cover and acquisition capability [11,12]. Since 2015, Sentinel-2 satellites have provided free imagery with 10 m, 20 m and 60 m spatial resolution depending on the band [13]. Several commercial satellite missions, such as SPOT and RapidEye offer optical images with similar resolutions to MultiSpectral Instrument (MSI) of Sentinel-2.
Radar data can provide a solution to the problems caused by clouds, since the L-band in particular, but also the C-band are much less affected by weather conditions than the optical data. SAR (synthetic aperture radar) data may be the only available information source of forest resources at a desired time point. Therefore, it is important to investigate its potential for forest monitoring in comparison to optical data. Currently operational Sentinel-1 and Radarsat-2 missions provide SAR data in the C-band (centre frequency 5.405 GHz) and the ALOS-2 mission in the L-band (centre frequency 1.27 GHz). SAR instruments onboard ERS-1, ERS-2 and Envisat acquired C-band data from 1991 to 2012, which provides a potential data source for historical monitoring of land cover. Historical L-band satellite imagery has been provided by JERS-1 (Japanese Earth Resource Satellite), operated from 1992 to 1998, and by ALOS-1 PALSAR, which operated between 2006 and 2011. The first freely available and systematically acquired SAR data are from Sentinel-1 [14], which has been available since 2014. Challenges in data availability and processing as well as information content that depends on meteorological conditions have restricted wider operational application of SAR imagery in forest cover monitoring. The dependence on meteorological conditions particularly affects shorter wavelength SAR systems in the C-and X-bands.
Variable land cover classification accuracies based on different approaches in image interpretation and accuracy assessment have been reported. Overall accuracy in forest and land cover classification using Landsat-type data has often been close to or above 90% [15][16][17][18] and several global forest cover maps have been produced (e.g., [19,20]). For detailed class division on national and global levels, also lower overall accuracies have been obtained. For the operational land cover system for Mexico [21] the highest overall accuracies for ten classes were 76%. In the global land cover map produced by [22] the highest overall classification accuracy for eight land cover classes was 71.5%. Application of time series of images and combination of land cover classification with change detection can improve land cover classification results for a selected date in a time series and provide accuracies close to or even above 90%, even for detailed class division [23][24][25].
Results with L-band SAR data have been close to the results of optical data particularly in forest and non-forest classification [17,[26][27][28] and L-band data have also been shown to be applicable for producing forest cover maps at global scales [29]. However, the results in land cover classification using SAR data are not consistent [17,18,30,31]. In [30] the overall accuracy in forest and non-forest classification was 92.1% for ALOS PALSAR L-band data and 81.2% for Radarsat-2 C-band data. In another study [31], the corresponding figures were 72.2% and 54.7% in the classification of six land cover classes. In forest and non-forest mapping in French Guiana [32], an accuracy of 90% was achieved with multitemporal ERS-1 and ERS-2 C-band data and an accuracy of 94% with Envisat ASAR data. The studies of synergistic use of optical and SAR data sets indicate that adding SAR features to optical data can improve the land cover classification results [18,25,26,33]. Also, L-band SAR has been more effective as additional data than C-band.
The variability in accuracy assessment methods and used reference data for the assessment makes it difficult to evaluate the real robustness and performance of the approaches and different data sources. Accuracy assessment approaches that are based on very high resolution (VHR) optical images with a spatial resolution less than one metre can be implemented at any global location. This kind of approach has been applied e.g., in [34][35][36]. If no VHR imagery exists from the mapping epoch, one applied solution has been to use the best available optical data with lower resolution [15,32].
The objective of this paper is to compare the accuracy of land cover classifications using optical and SAR satellite imagery in a tropical to sub-tropical area. The comparison will be done with common reference data from random sample plots that are selected from VHR satellite imagery. This ensures an objective and realistic evaluation of the potential of medium resolution optical and SAR data and significantly reduces the uncertainty that remains after the survey of the diverse results from the literature.
Image data from RapidEye, Landsat Thematic Mapper (TM), Envisat ASAR and ALOS PALSAR sensors were used. The applied land cover classes have been defined by the IPCC. Thus, this study aims at supporting methodology development to implement the MRV component of REDD+. In addition to the assessment of the potential of different data types, the study demonstrates how a plot sample from VHR imagery can be collected and applied in a manner that is also applicable in a future operational context. As a key result of the study, recommendations on the usage of different satellite data types in future operational land cover mapping are given. Early results of this study were published in a conference paper [37]. The main improvements since the conference paper include re-computation of the Envisat ASAR land cover classification results using a more complete satellite data set, computation of the combined Landsat TM and ALOS PALSAR classification and application of the common data for all the land cover maps in the accuracy assessment.

Study Site and Class Definition
The study was performed on a site of approximately 100 km by 100 km in the eastern part of the State of Chiapas in Southern Mexico (Figure 1), bordering Guatemala in the south and east and Montes Azules Biosphere Reserve in the west. The dominant vegetation is lowland tropical humid to sub-humid forest. The area was colonized during the 1980s and early 90s, due to governmental policies, and currently the population amounts to about 25,000 inhabitants. The main economic activities are related to land use, mainly subsistence farming, commercial agriculture and animal husbandry. Six land use classes that are compatible with the good practice guidance [6] of IPCC: 'forest land', 'cropland', 'grassland', 'wetlands', 'settlements' and 'other land' were used in the accuracy assessment of the classifications. Additionally, a class 'shrubland' was used in the classification but in the accuracy assessment it was included in the 'grassland' class (class definitions from [34], Table  1). For forest definition, the FAO (Food and Agriculture Organization of the United Nations) thresholds, forest cover larger than 10% and a minimum height of 5 m [38], were used. Six land use classes that are compatible with the good practice guidance [6] of IPCC: 'forest land', 'cropland', 'grassland', 'wetlands', 'settlements' and 'other land' were used in the accuracy assessment of the classifications. Additionally, a class 'shrubland' was used in the classification but in the accuracy assessment it was included in the 'grassland' class (class definitions from [34], Table 1). For forest definition, the FAO (Food and Agriculture Organization of the United Nations) thresholds, forest cover larger than 10% and a minimum height of 5 m [38], were used. Areas where an aquatic crop is purposely planted, cultivated and harvested and which is standing in water over extensive periods during its cultivation period. E.g., paddy rice fields.
Class did not exist in the reference data and was not used in the classification High grass Grassland with grasses above 1 m. Class did not exist in the reference data and was not used in the classification

Cleared forest land
Areas that are not (yet) cultivated but in which the forest has been cut or possibly burned. The crown closure is less than 10%. Can be almost or completely tree-less. Deviates from the shrub land because cleared forest land would be capable to grow forest.

Grassland Non-forest
Other natural or semi-natural vegetation Any naturally or semi-naturally vegetated terrestrial area not included in the previous classes.
Class did not exist in the reference data and was not used in the classification Natural and Semi-Natural Aquatic or Regularly Flooded Vegetation Areas that are transitional between terrestrial and aquatic systems and where the water table is usually at or near the surface or the land is covered by shallow water.
Class did not exist in the reference data and was not used in the classification

Data Procurement and Pre-Processing
The study compared the potential of RapidEye, Landsat TM, ALOS PALSAR and Envisat ASAR medium resolution satellite images by applying identical sample plot data for the accuracy assessment. To enable an objective comparison, a systematic grid in WGS84 datum and UTM projection (zone 15 north) was defined over the whole study site. A 50 m by 50 m cell size was defined for the grid units. The pixels of all the images and the plots for reference data were aligned with the grid. The size of a reference data plot in the reference data collection was also 50 m by 50 m corresponding to one grid unit.
The satellite data sets that were used in the study are summarized in Table 2. The processing chains for the medium and very high resolution satellite images are presented in Figure 2. Kompsat-2 images were used for collection of reference data. Land cover maps were computed using RapidEye, Landsat TM, ALOS PALSAR and Envisat ASAR data. Google Earth, Aster DEM (Digital Elevation Model) and MODIS reflectance mosaic were used as ancillary data. Satellite images in the Google Earth service (in the resolution range from 1 to 30 m) were used as an additional information source for land cover as training data. They were also the best available data for the geographic reference. Aster DEM [39] was downloaded from http://gdem.ersdac.jspacesystems.or.jp and used in the pre-processing of the SAR data. MODIS reflectance product was used for the reflectance calibration of the RapidEye tiles.

RapidEye Tiles
Thirteen level 3A RapidEye orthorectified tiles of 25 km by 25 km acquired between May and September 2009 were collected from the study area and processed according to the workflow in Figure 2. Pre-processing of RapidEye data included geometric and radiometric corrections, resampling, cloud masking and mosaic compilation.
Co-alignment of some image tiles required additional adjustment. The correction was done using block adjustment [40] and ground control points measured from the data and Google Earth. The data were received at a pixel size of 5 m, which is slightly finer then the nominal resolution of 6.5 m. They were averaged to 10 m pixel size for classification. Preliminary tests with the data showed that the pixel size of 10 m produced more homogeneous classification results than application of 5 m pixel size. The 10 m resolution also corresponded to the pixel size of three visible and one near-infrared band of the MSI of Sentinel-2.
Atmospheric correction was performed for all the RapidEye tiles using the Simplified Method for Atmospheric Correction (SMAC) [41] to convert the image pixel values to bottom of atmosphere reflectances. An additional offset calibration of the reflectance values was done using MODIS reflectance (MOD09GA) product of 500 m resolution [42]. An unsupervised k-means clustering was performed on the MODIS data and a few clusters with low red reflectance, i.e., representing forested areas, were selected for calibration. An average reflectance within RapidEye tiles from each date was computed for the selected MODIS clusters. The difference between the reflectance of the MODIS-based cluster and RapidEye reflectance was used to define an offset correction to the RapidEye tiles. The resulting mosaic is shown in Figure 3. Clouds and shadows were masked out manually using a GIS (Geographic Information System) software. Despite the offset calibration, the remaining differences between tiles from different dates prevented the classification of the RapidEye data set as one mosaic. The tiles from May and June were processed separately but the nine tiles from July and the two tiles from September were compiled as two mosaics.

Landsat TM Images
Two Landsat 5 TM level 1T images that had been acquired on 9 December 2009 (path: 20, row: 48-49) were downloaded from the USGS archives (http://earthexplorer.usgs.gov/). Based on the available images in the archives the selected images were the images with the smallest cloud cover for the selected path and row during the period 2007-2011. The cloud free area was approximately 51% in Landsat TM images. For the RapidEye tiles the cloud free area was approximately 87%.
For the Landsat TM images, no additional geometric correction was required but the data were resampled from 30 to 25 m pixel size. SMAC atmospheric correction was performed but additional calibration was not necessary since the images were from the same date and originally very similar. A mosaic was compiled from the two images ( Figure 3). Also for the Landsat TM images, the cloud mask was created manually.

ALOS PALSAR Images
ALOS PALSAR images were received through the ESA (European Space Agency) Category-1 project C1P.6213. The image set consisted of six dual-polarized PALSAR images in HH and HV polarizations with a nominal incidence angle of 39 degrees in August and September 2009. PALSAR images were received as single-look complex data (level 1.1) [43]. Pre-processing chain of ALOS PALSAR images consisted of pre-averaging, orthorectification, radiometric correction, resampling and mosaic compilation ( Figure 2).
ALOS PALSAR HH and HV bands were pre-averaged in the power domain over seven pixels along the track to reduce speckle. The resulting pixel size was 22.2 m by 9.4 m. The images were orthorectified and radiometrically corrected individually using Aster DEM [39]. To reduce speckle, the weighted average over four pixels was computed using bilinear interpolation to resample the data according to the defined 50 m by 50 m grid. As a result, the pixel spacing of the orthorectified PALSAR images was 25 m by 25 m and a pixel represented a unit of a 28-look image. Bilinear interpolation that forms an output pixel as a weighted sum of four input pixels, reduces speckle in a SAR image. This speckle reduction was considered to outweigh its possible negative effects, such as smoothing of the image detail.
Amplitude mosaics of HH and HV polarizations were compiled from the images ( Figure 3). The mosaics were seamless between image borders but a shift to the north was observed in the comparison with Google Earth. This was corrected by moving the mosaics 55.6 m southward in the final orthorectification.
In the preliminary trials, mosaics from 2007 and 2008 were also computed and the standard deviation, average amplitude, and temporal variability of the three mosaics were computed from the overlapping pixels [44]. The temporal variability was computed as the standard deviation of the 10-based logarithm of the amplitude of a single pixel over a set of images. The result of the test was that the original amplitude bands separated the land cover classes better. The likely reason was the stability of L-band backscatter over time.

Envisat ASAR Images
Twenty-eight Envisat ASAR image mode precision images (IMP) [45] in IS-2 imaging mode with mid-swath nominal incidence angle 23 degrees were received through the ESA Data Warehouse. The images were detected three-look images, had pixel spacing of 12.5 m by 12.5 m and they were in VV polarization. The IS2 imaging mode was selected because only a few images were available with a shallower view angle, which would have been better for land cover classification [46]. The steps in the pre-processing chain of Envisat ASAR images were pre-averaging, orthorectification, radiometric correction, resampling, feature computation and mosaic compilation (Figure 2). ASAR images from three to six different dates were available depending on the location within the study area. Images that had poor contrast between forest and fields were left out from further processing. Finally, three images were used per orbit: from March 2004, May 2004, and March 2006 in the western orbit and from April 2004, April 2005, and April 2010 in the eastern orbit. RapidEye, Landsat TM, and ALOS PALSAR data had been acquired in 2009. Thus, the Envisat ASAR data were from a very different time period compared to the other data sets, but no other C-band SAR data were available for the study.
The Envisat ASAR images were first pre-averaged using 2 by 2 pixel window. The resulting images were orthorectified [47] and radiometrically corrected using Aster DEM, tie points between scenes, and ground control points between the scenes and Google Earth. The pre-processed ASAR images were resampled using bilinear interpolation from four pixels according to the defined 50 m by 50 m grid to reduce speckle. As a result, the pixel spacing of the orthorectified Envisat ASAR images was 50 m by 50 m and a pixel represented a unit of 48 looks. Average and temporal variability of the backscatter amplitude were computed for each pixel and used as inputs for the classification (Figure 3).

Kompsat-2 Images
Eight Kompsat-2 images from 2008-2011 were received through the ESA Category-1 project C1P.1519 ( Figure 1). The locations of the images were selected randomly from the archived images from 2008-2011. The sample was not strictly random for the whole study site because candidate images were not available at all locations. However, the sample was considered representative to enable comparison of the potential of the medium resolution satellite image types for land cover mapping because the Kompsat-2 images represented the variability of the landscape of the study area. The pixel size of the Kompsat-2 images was 1 m on the panchromatic band and 4 m on the multispectral bands.
The alignment of the Kompsat-2 images and the pre-processed RapidEye tiles was analysed visually, and where necessary, geometric shifts in the easting and northing direction were computed using ground control points that were measured from RapidEye tiles and Kompsat-2 images. For visual interpretation, true (red, green and blue bands) and false colour composites (near infrared, red and green bands) images were produced using pan-sharpening. The pan-chromatic wide-spectrum channel of 500-900 nm as the fourth band was used for sharpening the multi-spectral bands to 1 m pixel size (https://directory.eoportal.org/web/eoportal/satellite-missions/k/kompsat-2).
After the pre-processing step, the satellite data sets were ready for further processing. The data sets for land cover classification were RapidEye and Landsat TM reflectance mosaics and Envisat ASAR and ALOS PALSAR feature mosaics. The eight Kompsat-2 true and false colour image composites were used for the collection of reference data.

Reference Data Collection from VHR Satellite Images
Reference data were collected by visual interpretation of the Kompsat-2 images. The main steps in the reference data collection from VHR images were: • Creation of the grid of plots for each VHR satellite image • Definition of the land cover class proportions for the plots with visual interpretation • Division of the reference data set to training and test sets.
For each VHR image, a systematic grid of 50 m by 50 m plots at 800 m intervals in northing and easting directions was created (Figure 4). A statistical analysis that was made in a previous study was behind the selection of those plot sizes and distances. These results were also considered applicable to this study. The procedure and the results are reported in detail in [34]. The 50 m by 50 m square was small enough to make objective visual interpretation of reference data possible. Furthermore, it was sufficiently large with respect to the poorest spatial resolution of Envisat ASAR. Large enough plot size was also needed to consider the residual geometric differences between image data sets in reference data collection for model training and for accuracy assessment. With 800 m distance, the spatial correlation between the plots was assumed to be low with a sufficient margin [34]. In comparison, the most common plot distance in the Finnish national forest inventory is 300 m [48]. The total number of plots in eight images was 1680. The proportion of the land cover classes (Table 1) in each plot was defined visually with the help of GIS software. It was obvious that application of systematic sampling produced a large proportion of heterogeneous plots with respect to their land cover classes in the fragmented landscape of state Chiapas. However, our approach to image interpretation, in which also the categorical land cover variables are first predicted as continuous variables or pixel class probabilities, also made it possible to use heterogeneous data as training data.
The plots were divided into two sets of equal size (840 plots), one for training and another for accuracy assessment. Each VHR image was first divided in two alternative ways: from north to south and from east to west. At every image, the similarity of land cover distribution of the east and west halves and north and south halves was tested. The division that showed a more similar land cover class distribution in the halves was applied. Within a VHR image, one half was selected randomly to be included in the training set while the other half was included in the test set.
The plot dataset was forest-dominated. The proportion of forest was 73% when 'cleared forest land' class was excluded. The proportion of the class 'cleared forest land' was 7% and the proportion of the class 'cultivated and managed terrestrial areas' was 16% of the total area of the plots. The proportions of classes 'shrub' and 'natural water bodies' were both about 1% of the total area. The proportion of the classes 'artificial surfaces and associated areas', 'bare areas' and 'artificial water bodies' was in total less than 1% of the area of the plots. The IPCC compatible classes 'settlements' from 'artificial surfaces and associated areas' and 'other land' from 'bare areas' were, however, included in the classifications and confusion matrices to make the test concept universally applicable.

Classification of the Medium Resolution Satellite Data
Land cover classifications with seven classes ('forest land', 'shrubland', 'cropland', 'grassland', 'wetlands', 'settlements' and 'other land') were computed for the medium resolution data sets. For RapidEye and Landsat TM mosaics the bottom of atmosphere reflectance values were used as input features of the classification. For RapidEye mosaics blue, green, red and near infrared bands and for Landsat TM mosaic the two shortwave infrared bands were also used. For ALOS PALSAR mosaic the input features were HH-and HV-polarized amplitudes and for Envisat ASAR, the average backscatter of VV-polarized amplitude and its temporal variability were used. For the training plots, the proportions of the classes that had been used in the visual interpretation were summed up to represent proportions of 'forest land', 'shrubland', 'cropland', 'grassland', 'wetlands', 'settlements' and 'other land' classes according to the Table 1.
Classifications were computed separately for each medium resolution data source in order to compare the potential of each satellite data set. Additionally, we computed a classification where the bands of Landsat TM and ALOS PALSAR mosaics were combined to examine the potential improvement in classification accuracy compared to using Landsat TM or ALOS PALSAR data alone.
The classifications were produced in two phases. In the first phase, the probability of each land cover class was predicted with the probability estimation method [34,49] for each pixel using its feature vector. The applied parametric approach assumes normal distribution for the features. One The proportion of the land cover classes (Table 1) in each plot was defined visually with the help of GIS software. It was obvious that application of systematic sampling produced a large proportion of heterogeneous plots with respect to their land cover classes in the fragmented landscape of state Chiapas. However, our approach to image interpretation, in which also the categorical land cover variables are first predicted as continuous variables or pixel class probabilities, also made it possible to use heterogeneous data as training data.
The plots were divided into two sets of equal size (840 plots), one for training and another for accuracy assessment. Each VHR image was first divided in two alternative ways: from north to south and from east to west. At every image, the similarity of land cover distribution of the east and west halves and north and south halves was tested. The division that showed a more similar land cover class distribution in the halves was applied. Within a VHR image, one half was selected randomly to be included in the training set while the other half was included in the test set.
The plot dataset was forest-dominated. The proportion of forest was 73% when 'cleared forest land' class was excluded. The proportion of the class 'cleared forest land' was 7% and the proportion of the class 'cultivated and managed terrestrial areas' was 16% of the total area of the plots. The proportions of classes 'shrub' and 'natural water bodies' were both about 1% of the total area. The proportion of the classes 'artificial surfaces and associated areas', 'bare areas' and 'artificial water bodies' was in total less than 1% of the area of the plots. The IPCC compatible classes 'settlements' from 'artificial surfaces and associated areas' and 'other land' from 'bare areas' were, however, included in the classifications and confusion matrices to make the test concept universally applicable.

Classification of the Medium Resolution Satellite Data
Land cover classifications with seven classes ('forest land', 'shrubland', 'cropland', 'grassland', 'wetlands', 'settlements' and 'other land') were computed for the medium resolution data sets. For RapidEye and Landsat TM mosaics the bottom of atmosphere reflectance values were used as input features of the classification. For RapidEye mosaics blue, green, red and near infrared bands and for Landsat TM mosaic the two shortwave infrared bands were also used. For ALOS PALSAR mosaic the input features were HH-and HV-polarized amplitudes and for Envisat ASAR, the average backscatter of VV-polarized amplitude and its temporal variability were used. For the training plots, the proportions of the classes that had been used in the visual interpretation were summed up to represent proportions of 'forest land', 'shrubland', 'cropland', 'grassland', 'wetlands', 'settlements' and 'other land' classes according to the Table 1. Classifications were computed separately for each medium resolution data source in order to compare the potential of each satellite data set. Additionally, we computed a classification where the bands of Landsat TM and ALOS PALSAR mosaics were combined to examine the potential improvement in classification accuracy compared to using Landsat TM or ALOS PALSAR data alone.
The classifications were produced in two phases. In the first phase, the probability of each land cover class was predicted with the probability estimation method [34,49] for each pixel using its feature vector. The applied parametric approach assumes normal distribution for the features. One consequence of the normal distribution assumption is a tendency to predict towards the average of the features. The approach is somewhat robust against non-normally distributed features. The parametric model is sufficiently transparent in the sense that it is possible to follow whether the connections between the input features and land cover class predictions are compatible with physical understanding. The non-parametric machine learning methods create models that may be hard to understand in a physical sense. Decreases in robustness may also occur with complex models, for example, artificial neural networks through overfitting [50]. In the second phase, the land cover classifications were compiled from the continuous predictions of the probabilities of the land cover classes using a hierarchical rule based category classification ( Figure 5).
In the first phase, feature vectors of each medium resolution satellite data set were classified in feature classes using unsupervised k-means clustering after normalizing them to the same mean and standard deviation. The class means and covariance matrices from the clustering results were computed from the non-normalized data.
The thematic contents of the feature classes were defined using the plot data reserved for training. Each training plot was classified to one feature class. Maximum likelihood classification was applied to the average of the feature vectors inside the plot. For each feature class, the training plots that were classified to that feature class were used for computing the average of proportion of each land cover class for that feature class. If no reference data were available for a feature class, land cover class proportions were defined with the help of other available information, e.g., visual interpretation of satellite images of Google Earth, the input images, and the location of the feature class in the feature space. The thematic content of a feature class was a vector that showed the proportion of each land cover class for the feature class.
Prediction of the probability of a land cover class for a pixel was computed as a weighted sum of the feature classes' ground data values: where f ( x ) is the probability of a land cover class for feature vector x, P(c|x) the probability for the feature vector x to belong to the feature class c, f c the proportion of a land cover class in feature class c and N the number of feature classes.
The output from the first phase was a multiband raster image, where each band represented one land cover class and the pixel values represented the probabilities of the pixels to belong in that class.
Water masks were created for the optical image mosaics. For the Landsat TM mosaic the pixels where the reflectance on near-infrared band was less than 10%, were classified as water. The threshold was selected using the reflectance value histogram of the near-infrared band and verified visually with the image mosaic. The water mask from Landsat TM was also used in the combined Landsat TM and ALOS PALSAR classification. For RapidEye, the same 10% threshold for the near-infrared band was not sufficient. The sediment content in the rivers appeared to be high and it was necessary to combine thresholds on visible and near-infrared bands to extract areas where near-infrared reflectance was low and reflectance values of visible bands were high. The water areas were included in the 'wetlands' class.
In the second phase, category classifications were computed from the continuous predictions. A pixel was classified as forest if the predicted forest probability was higher than the sum of the predicted non-forest class probabilities of that pixel. Otherwise, the pixel was classified as non-forest. After that, the non-forest pixels were classified to the non-forest class that had the highest predicted probability.
For ALOS PALSAR and Envisat ASAR mosaics alternative processing with 3 by 3 pixels majority filtering was performed in the category classification. Majority filtering was applied to the forest and non-forest classification and the resulting non-forest pixels were classified to the dominating non-forest class using the continuous predictions. The aim of the 3 by 3 majority filtering was to remove noise. The filter size was chosen as a trade-off between noise-artefact removal and overly generalized land cover maps. This class-selective use of a post-processing filter had the benefit that it did not smooth out small details in smaller non-forest classes. Stronger averaging or filtering before classification could have smoothed out these smaller classes.
The thematic information about the land cover was included iteratively in the classification process. The aim of the iterative approach was to make it possible to follow how the overall accuracy and the errors of commission and omission developed when more data were introduced in the classification. The training plots of each VHR image were divided into two subsets, A and B (Figure 5), which both had 420 plots. First, set A was used in the classification and a land cover map was compiled. In addition, other data could be included to complete the classification, but the quality of the classification was assessed using the training plot data only. The best possible classification, as measured with the classification accuracy and the difference between the omission and the commission errors with set A, was selected and set B was introduced. If accuracy assessment with set B indicated bias in the model, the classification was fine-tuned using set B. Because of the clouds and cloud shadows in the Landsat TM images and RapidEye tiles, not all training plots could be used for optical data sets. smooth out small details in smaller non-forest classes. Stronger averaging or filtering before classification could have smoothed out these smaller classes.
The thematic information about the land cover was included iteratively in the classification process. The aim of the iterative approach was to make it possible to follow how the overall accuracy and the errors of commission and omission developed when more data were introduced in the classification. The training plots of each VHR image were divided into two subsets, A and B ( Figure  5), which both had 420 plots. First, set A was used in the classification and a land cover map was compiled. In addition, other data could be included to complete the classification, but the quality of the classification was assessed using the training plot data only. The best possible classification, as measured with the classification accuracy and the difference between the omission and the commission errors with set A, was selected and set B was introduced. If accuracy assessment with set B indicated bias in the model, the classification was fine-tuned using set B. Because of the clouds and cloud shadows in the Landsat TM images and RapidEye tiles, not all training plots could be used for optical data sets.
The total number of training plots that were used in the classification was 331 for Landsat TM and 713 for the RapidEye. A slice of missing data in the southern border of the study area ( Figure 3) restricted the training set to 795 plots for the ALOS PALSAR mosaic. Clouds and missing SAR data allowed use of 286 training plots for the combined Landsat TM and ALOS PALSAR mosaics. For the Envisat ASAR mosaic, 840 training plots were used in total.

Accuracy Assessment
The main steps in the accuracy assessment were: • Definition of the IPCC compatible target class for each test plot using the land cover class proportions in the reference data • Definition of the IPCC compatible target class for each test plot using the land cover class proportions extracted from the satellite image classifications The total number of training plots that were used in the classification was 331 for Landsat TM and 713 for the RapidEye. A slice of missing data in the southern border of the study area (Figure 3) restricted the training set to 795 plots for the ALOS PALSAR mosaic. Clouds and missing SAR data allowed use of 286 training plots for the combined Landsat TM and ALOS PALSAR mosaics. For the Envisat ASAR mosaic, 840 training plots were used in total.

Accuracy Assessment
The main steps in the accuracy assessment were: • Definition of the IPCC compatible target class for each test plot using the land cover class proportions in the reference data • Definition of the IPCC compatible target class for each test plot using the land cover class proportions extracted from the satellite image classifications • Comparison of the reference class and the classification result at each plot using the IPCC compatible target classes.
The classification scheme in the visual interpretation of the 840 test plots represented more detailed classes than the IPCC compatible classes as shown in Table 1. The IPCC classes were generated for the plots by combining the lower level classes. The proportions of the classes that had been used in the visual interpretation were summed up to represent proportions of the IPCC compatible classes according to the Table 1. The proportions of 'cropland', 'grassland', 'wetlands', 'settlements' and 'other land' classes were then summed up to indicate the total proportion of non-forest class. If the proportion of forest class was higher than the total proportion of the non-forest classes, the plot was classified as forest. If the proportion of forest class was lower than the proportion of non-forest class, the non-forest class that had the largest individual proportion was set as the land cover class for the plot.
For each plot, the proportions of land cover classes inside the plot were extracted from all satellite image classifications using GIS software and the geographic location of the plot. The class of the plot was defined similarly as for the data from the visual interpretation. Again, the proportions of 'cropland', 'grassland', 'wetlands', 'settlements' and 'other land' classes were combined as the total proportion of non-forest class. Based on the forest and non-forest class proportions, either forest class or the dominating non-forest class was assigned as the final land cover class of the plot.
To use only plots that were free from artefacts, plots where more than 10% of the plot area was covered by unclassified pixels in any of the satellite image classifications (clouds, no data) were excluded from the accuracy assessment. Only plots with valid data in every classification were taken into account. The data set for accuracy assessment contained 320 plots of the original 840 plots. For each plot, the data set included eight variables: the IPCC compatible class from visual interpretation and the IPCC compatible class for each of the seven land cover classifications computed using the satellite data.
A confusion matrix was created for each classification. The overall accuracy for the six IPCC compatible classes, the overall accuracy for the forest and non-forest classification, including confidence intervals and user's and producer's accuracies and errors of commission and omission were computed. The accuracy statistics were computed according to [51].
The accuracy assessment was also experimentally done using plots that were available for each satellite image type, which meant that the satellite image types had different number of test plots. Otherwise the process was the same as described above.

Results and Discussion
The resulting land cover maps for different data sources are shown in Figure 6 and details of the maps and the source data in Figures 7-9. Tables 3 and 4 summarize the results of the accuracy assessment. The confusion matrices are shown in Appendix A.     Figure 6 with a black rectangle.  Figure 6 with a black rectangle.   Figure 6 with a black rectangle.

Forest and Non-Forest Classification Accuracy
The overall accuracy for forest and non-forest classification varied between 74% for the unfiltered Envisat ASAR classification and 95% for the RapidEye classification. The error of omission for the forest class was 5-6% for all other classifications with single data sets except for the unfiltered Envisat ASAR, for which it was 19%. For the combined Landsat TM and ALOS PALSAR classification, the omission error was 3%. The commission error varied from 2% for RapidEye classification to 15% for the unfiltered Envisat ASAR classification. RapidEye, Landsat TM, majority-filtered ALOS PALSAR and combined Landsat TM and ALOS PALSAR classification reached similar accuracies, when the 95% confidence intervals of the classification accuracies are taken into consideration. The confidence interval for unfiltered ALOS PALSAR overlapped with Landsat TM classification but not with the RapidEye classification. Both Envisat ASAR classifications were less accurate. The filtering was particularly effective for the Envisat ASAR classification since their 95% confidence intervals for unfiltered and filtered classifications did not overlap.
At the sampling unit size of 50 m by 50 m, improvement in the resolution from the 25 m pixel size of Landsat TM to the 10 m pixel size used for RapidEye did not provide any major benefit to the accuracy. The better spectral resolution of the Landsat TM sensor as compared to RapidEye also may have partially compensated for its poorer spatial resolution. A combination of Landsat TM and ALOS PALSAR features improved the forest and non-forest separation as compared to using only Landsat TM data but the improvement was only one percentage point.
The maps revealed fragmented variability between forest and non-forest, particularly in the Envisat ASAR classification but also in the ALOS PALSAR result. This variability could have been caused by the variable incidence angle over the image swath or by the meteorological conditions during the acquisition of the different images. Residual noise could also be observed in the unfiltered maps despite spatial averaging of the pixels in image pre-processing. The noise was highest with Envisat ASAR although the applied pixel size, 50 m by 50 m, was larger than for all other data sources. The variability was concentrated on regions with minor topographic variation. As this variability was not present in the available coarse elevation model, its effects on image radiometry could not be adequately corrected in the radiometric correction. SAR imagery with higher incidence angle [52] and use of images from ascending and descending orbits could probably alleviate the effects of this minor topographic variation but no such imagery was available from Envisat ASAR or ALOS PALSAR. Furthermore, earlier studies have shown that forest vs. non-forest mapping in tropical areas with C-band radar data requires images acquired during dry conditions [53], which may not have been true in our study.
The approach to accuracy assessment in which the majority class forest or non-forest within the test plot was defined first, favoured the ALOS PALSAR and Envisat ASAR classifications that also had residual non-forest patches inside uniform forest areas. The approach in the accuracy assessment corresponded to an additional majority filtering to the SAR classification results. On the other hand, the temporal difference between the acquisition date of the medium resolution satellite images and the Kompsat-2 images was larger for Envisat ASAR than for the other data sources, which may have affected the results. Because of the trend of deforestation in the study area [54] the temporal difference between Envisat ASAR data and the reference data could have contributed to the overestimation of forest area in the Envisat ASAR classification. Two of 12 Envisat ASAR images were from 2010 and the other images were from 2004, 2005 and 2006. All images were used to compute the temporal features for the Envisat ASAR mosaic. The other medium resolution satellite data sets were from 2009 whereas Kompsat-2 images from 2008, 2010 and 2011.

Bias in Forest and Non-Forest Classification
The biases were small in most classifications varying from practically zero for Landsat TM -based classification to a 7% overestimation of forest area in the filtered Envisat ASAR classification. With RapidEye the forest area was underestimated by 2.5% (numbers rounded to integers in Table 3).
The differences in the omission and commission errors show that with Envisat ASAR, the filtering increased the forest area by 11 percentage points and turned the classification from underestimation to overestimation. With ALOS PALSAR, the filtering somewhat reduced the bias with a slight overestimation of forest area remaining. When Landsat TM and ALOS PALSAR were used together, the bias increased 3 percentage points leading to overestimation of forest area compared to using Landsat TM only. The bias estimation only concerns the common area that was covered by the Kompsat-2 images because their location was not completely random.

Accuracy with Six IPCC Compatible Classes
The optical images outperformed the radar data in more detailed land cover classification. The overall accuracy of classifications with single satellite data source varied between 70% for unfiltered Envisat ASAR classification and 89% for the RapidEye classification (Tables A1-A5 in Appendix A). The result from the combined Landsat TM and ALOS PALSAR data was similar to the forest and non-forest classification, improving the overall accuracy by one percentage point.
'Grassland' class was mixed with 'forest land' and 'cropland' in all classifications. The border between 'forest land' and 'grassland' classes was defined by canopy height and crown cover percentage, which are continuous variables (Table 1). Wooded vegetation that did not fulfil the definition of forest was counted as 'grassland' in the reference data. Thus, mixing of 'grassland' and 'forest land' classes could be expected. Similarly, the border between grassland and cropland is not explicit. In the Envisat ASAR classification, the amount of 'grassland' was severely underestimated leading to a 100% omission error.
Classification errors for the 'wetlands' class were low in the optical classifications. For RapidEye, only the commission error and for Landsat, the omission error deviated from zero. One 'cropland' plot and one 'bare soil' plot was classified to the 'wetlands' class in the RapidEye classification and one 'wetlands' plot to the 'forest land class' in the Landsat classification. For SAR classifications, the proportion of the 'wetlands' class was underestimated. There were no commission errors but the omission errors were 30-40% and 80% for ALOS PALSAR and for Envisat ASAR, respectively. The combination of Landsat TM and ALOS PALSAR did not improve separation of the 'wetlands' class from other land cover classes.
The most serious error from the viewpoint of REDD MRV would be systematic classification of other land cover classes to 'forest land' class or vice versa since it could cause a large over or underestimation of carbon emissions [6]. This was not observed because the biases in forest area estimation were small particularly with optical data. The number of observations in the individual non-forest classes was low, which does not justify firm conclusions at class level. However, the number of non-forest observations in these classes in total was large enough to assess the performance of forest and non-forest classification.
The cloudiness in the optical data reduced the size of the common data set where there was adequate data in all classifications, from the assessed 840 plots to 320. The accuracy assessments using all the available plots for each classification gave similar accuracies to the test with the common plot set. The ranking of the performance of the image types also remained the same.

Suitability of the Approach for MRV
The small biases that were achieved in this study indicated that the applied estimation procedure is applicable for MRV type inventories. The bias should be minimized in the operational MRV by applying a statistical sampling framework throughout the whole process. Provision of accurate maps of the non-forest classes may be difficult in conditions similar to the study area but combining the classification with statistically representative reference data can produce reliable information on the areas of the classes [55].
The image analysis methods for the optical data are relatively mature and established and give similar results despite the applied algorithm. No complex image pre-processing is required, which means a robust and transparent estimation procedure of land cover classes. SAR analysis is still more experimental as compared to analysis of optical satellite data and it requires special expertise which decreases the transparency and robustness. The robustness is further decreased by the effect of terrain topography, surface roughness and moisture. Better elevation models will be available (https://www.intelligence-airbusds.com/worlddem/) to correct radiometry of SAR images and image pre-processing and analysis methods become simpler for regular users. This improves the potential of SAR-based land cover classification.
In our study area, the completeness of classification greatly suffered from cloud cover reducing the size of the reference data and area for which classification could be done using optical data. This would favour using L-band SAR as the preferred source. However, the availability of cloud-free medium resolution optical data has dramatically improved after the launch of the Sentinel-2 satellites that currently provide global revisits every five days (https://sentinels.copernicus.eu/documents/ 247904/685154/Sentinel_High_Level_Operations_Plan). The increased frequency of data acquisition improves the possibility of utilizing temporal information in land cover classification.

Conclusions
Our results match the findings of other studies that use optical data [15][16][17][18][19]. In the case of SAR data, the situation is less clear. In this study, classification with ALOS PALSAR gave nearly as high accuracy as the classifications with optical data whereas the ASAR based result was poor. In our own previous studies, good results with C-band ERS SAR were achieved on a flat terrain in forest and non-forest classification of tropical forest [32] and poorer accuracies than in this study were found with ALOS PALSAR in another tropical region [34]. Thus, our results reflect the observations of other SAR based studies in which variable accuracies have been observed [17,18,30,31]. A particular tendency with SAR, also observed in our study, is systematic underestimation of non-forest classes [26]. Synergistic use of optical and SAR data sources has been reported to increase the accuracy of the classification by one to two percentage points [18,25]. In our study, both overall accuracy of forest and non-forest classification and the overall accuracy of the more detailed classification increased by one percentage point, when Landsat TM and ALOS PALSAR data were combined. The benefit of such low or negligible marginal utility from inclusion of data from two different sensor types may be questionable since the classification models become more difficult to understand, image analysis is more complex and results are potentially less predictable.
Considering the pros and cons of the data types, we recommend optical data of 10 m spatial resolution as the primary data source for REDD MRV purposes. A poorer resolution, similar to Landsat TM, is approximately as powerful when the minimum mapping unit and population unit of the sampling is 50 m by 50 m, as in this study, or larger. The potential of using the optical data is supported by excellent availability of free and open data, well-known image processing and analysis methods, and relatively good insensitivity to different terrain conditions. Extensive use of L-band SAR for forest cover monitoring for operational purposes would require guaranteed long-term availability of the data, preferably under the same data policy as with the Sentinels. In addition, improved digital elevation models and image processing methods that require less expert knowledge should be available. Considering the present maturity of the imaging systems and image analysis methods, the L-band SAR is recommended as a secondary data source. The C-band SAR, although it has poorer potential than the L-band, is applicable in stratification for a statistical sampling when other image types are unavailable. The low-frequency P-band SAR that will be onboard the ESA Biomass mission is not expected to be a major operational tool in the near future for REDD MRV. The mission with a scheduled launch in 2022 will be experimental and the spatial resolution of 50 m to 100 m may not be adequate. Furthermore, several technical issues restrict data availability and usefulness [56]. In all cases, the data source used should be known for every pixel of the classification outputs for MRV. Our research will continue by combining Copernicus Sentinel and L-band SAR data and a VHR data sample in a statistical framework to support REDD MRV and forest cover monitoring. High level of method automation, while sustaining reliability is on the research agenda.  Table A3. Confusion matrix for the ALOS PALSAR classification with filtering for the six IPCC compatible classes (UA = user's accuracy, PA = producer's accuracy, OA = overall accuracy).  Table A4. Confusion matrix for the Envisat ASAR classification with filtering for the six IPCC compatible classes (UA = user's accuracy, PA = producer's accuracy, OA = overall accuracy).