Comparison of Support Vector Machine and Random Forest Algorithms for Invasive and Expansive Species Classification using Airborne Hyperspectral Data

Invasive and expansive plant species are considered a threat to natural biodiversity because of their high adaptability and low habitat requirements. Species investigated in this research, including Solidago spp., Calamagrostis epigejos, and Rubus spp., are successfully displacing native vegetation and claiming new areas, which in turn severely decreases natural ecosystem richness, as they rapidly encroach on protected areas (e.g., Natura 2000 habitats). Because of the damage caused, the European Union (EU) has committed all its member countries to monitor biodiversity. In this paper we compared two machine learning algorithms, Support Vector Machine (SVM) and Random Forest (RF), to identify Solidago spp., Calamagrostis epigejos, and Rubus spp. on HySpex hyperspectral aerial images. SVM and RF are reliable and well-known classifiers that achieve satisfactory results in the literature. Data sets containing 30, 50, 100, 200, and 300 pixels per class in the training data set were used to train SVM and RF classifiers. The classifications were performed on 430 spectral bands and on the most informative 30 bands extracted using the Minimum Noise Fraction (MNF) transformation. As a result, maps of the spatial distribution of the analyzed species were produced; high accuracies were observed for all data sets and classifiers (an average F1 score above 0.78). The highest accuracies were obtained using 30 MNF bands and 300 sample pixels per class in the training data set (average F1 score > 0.9). Lower training data set sample sizes resulted in decreased average F1 scores, by up to 13 percentage points in the case of 30-pixel samples per class.


Introduction
The spread of invasive and expansive species is one of the main threats to biodiversity and functioning of ecosystems [1]. This results in transformation of natural habitats, displacement of native species, and degrading environmental conditions (e.g., number of existing micro-and macrophytes). It also generates economic losses by degrading the quality of soil and destroying road and railway infrastructure [2]. In the European Union (EU), it is estimated that the cost of controlling and combating invasive species amounts to approximately 12 billion EUR per year [3]. Implementation of appropriate remedial strategies and effective limitation of the invasion's effects require constant monitoring, which is emphasized in the EU Regulation No. 1143/2014.
The species that pose a threat to natural habitats protected under the Natura 2000 program in Poland include, for example, native expansive plants such as blackberry shrubs (Rubus spp. L.), perennial wood small-reed (Calamagrostis epigejos (L.) Roth), and foreign invasive goldenrod species (light detection and ranging) products from the Riegl LMS-Q680i scanner were used in the study, obtaining the highest median Kappa of 0.85 (F1 = 0.89, which is the harmonic mean of the user's (UA) and producer's accuracies (PA)) for M. caerulea identification and 0.65 (F1 = 0.73) for C. epigejos [26].
The use of SVM and RF methods yielded good results during the classification of 20 types of grassy vegetation in the Hortobágy National Park in eastern Hungary on the basis of AISA Eagle II data [27]. The highest accuracy of classification was obtained on the first nine Minimum Noise Fraction (MNF) transformation bands of the hyperspectral image and by using 30 random training pixels (OA SVM = 82.06%, OA RF = 79.14%, OA ML = 80.78%). However, when the training set was reduced to 10 pixels, SVM and RF methods still maintained high levels of accuracy (79.57% and 76.55%, respectively), while the ML accuracy dropped significantly to 52.56%. The low level of sensitivity to the training sample size is a big advantage of these algorithms, especially SVM. On the other hand, the RF algorithm had a short image classification time (3 minutes) compared to the other methods used on the same data set (SVM = 16 min, ML = 8 min). Studies of Mediterranean vegetation (mainly shrubs varying in height from about 0.5 m to almost 5 m) that were carried out in Languedoc in southern France demonstrated that RF and SVM methods obtained better information from hyperspectral data than any traditional classifiers (e.g., classification tree (CT), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and k-nearest neighbor (k-nn)), especially when the spectral differences between classes were small [21]. When distinguishing 15 species of plants, the overall accuracies of the classification for modern methods, i.e., SVM and RF (OA SVM = 39.2-47.9%, OA RF = 39.3-49.5%), were higher than those recorded for traditional methods (OA CT = 28.6-44.4%, OA LDA = 37-45.1%, OA QDA = 37.5-39.3%, OA k-nn = 18-28.8%), depending on the set of input data. The artificial neural network (ANN) method was also used to identify plant species; however, this experiment did not lead to satisfactory results.
The aim of the current analysis was to verify whether the expansive/invasive Rubus spp., Calamagrostis epigejos, and Solidago spp. were characterized by a specific set of spectral characteristics that allowed them to be distinguished from the surrounding species, which altogether create a mix of fuzzy, covered patterns. Moreover, an analysis of the impact of the number of pixels in training data set on the classification accuracy was performed. Well-known reference classification algorithms were applied, SVM and RF methods, which are commonly used because of their effectiveness.
The proposed method could be applied in extensively used agricultural areas (considering traditionally used farming methods), and not limited to only selected test areas.

Study Area and Objects of the Study
The research area was located in southern Poland near the town of Malinowice (Silesian Province) and covered an area of approximately 10.6 km² of the Natura 2000 habitat (Figure 1). This is an upland area covering the Tarnogóra Hummock and the Katowice Upland and is in a transitional temperate climate. This area is dominated by grasslands, meadows, and forests. Blackberry (mainly Rubus caesius L., European dewberry), various species of goldenrod (Solidago spp.), and wood small-reed grass (Calamagrostis epigejos) occur very frequently in this area.
Rubus spp. L., a genus of plant in the Rosaceae family commonly called bramble, is one of the most important expansive species [28]. Blackberries are native to Asia, Europe, and North and South America [29], and they often pose a threat to young forest crops and habitats protected under the Natura 2000 program. They are typically shrubs (can be up to 3 meters high) with perennial roots, biennial prickly stems, and edible fruits which are aggregates of drupelets [29]. Blackberries can be found in all kinds of environments, including forests, shrubs, meadows, wastelands, and roadsides. Vegetative reproduction and production of a large number of seeds that are spread by birds and other animals allows them to quickly colonize new areas [30]. They bloom from May to September. According to the latest data, there are 105 Rubus species in Poland alone [31]. Rubus spp. L. is linked to negative economic and environmental consequences (e.g., changes in the dominant type of vegetation, soil depletion, or increased susceptibility to fires) [32]. The spectral characteristics of Rubus spp. are very similar, which is why they were identified collectively in the paper without division into individual species.
Remote Sens. 2020, 12, 516
Another widespread, expansive species that degrades grassland and meadow communities is Calamagrostis epigejos (L.) Roth, commonly referred to as wood small-reed [33]. It is a perennial grass in the Poaceae family, which is native to the Eurasian area [5], and has spread to North America [34]. The plant has thick and rigid blades that can be up to 2 meters high and has complex inflorescences in the form of a panicle. Wood small-reed propagates vegetatively, through numerous stolons, as well as generatively, through seeds (i.e., kernels) [35]. It blooms from July to September, often forming extensive single-species fields whose colors vary from green to brown to purple. Wood small-reed grows in meadows, forests, urban areas, along railways, and on the roadsides. A large amount of reed biomass is deposited in non-hay areas, and its lengthy decomposition time causes acidification of the substrate and hinders development of other plants [36].
Some of the most invasive plants that pose a huge threat to native species and biodiversity of entire ecosystems are representatives of the goldenrod genus (Solidago spp. L.). They are perennials from the Asteraceae family, imported from North America to Europe as decorative plants [37]. Goldenrod occurs in the form of three invasive species: Solidago canadensis (Canadian goldenrod), Solidago gigantea (tall goldenrod), and Solidago graminifolia (grass-leaved goldenrod) [38,39]. These plants have stiff sprouts that can be up to 2 meters tall, ending in pyramidal panicle clusters, which are formed by flowers clustered in heads [40]. They propagate vegetatively, thanks to underground rhizomes, and generatively with the help of light seeds (achenes with pappus) that can be spread over considerable distances [41]. They quickly begin to dominate and often form dense single-species patches. They bloom from July to October, forming characteristic yellow inflorescences. Goldenrods have a high tolerance for various soil types, but they require exposure to full sun [42]. They grow in open habitats such as meadows, wastelands, anthropogenic areas, and along roads and river banks [2].

Field Measurements
The field studies were conducted in the summer of 2017. Compact polygons (in the shape of circles with a radius of 3 meters) of Rubus spp., Solidago spp., Calamagrostis epigejos, and other background plants were located within the research area with the help of the Leica CS20 GNSS device (Figure 2). The number of polygons was proportional to the prevalence of species in the research area and amounted to 50 polygons for blackberry and wood small-reed, 60 polygons for goldenrod, and 100 polygons for background plants. Then, the polygons were transferred to ArcMap 10.3, where photo interpretation techniques were used to create an additional 30 reference polygons for other types of land cover (i.e., trees, buildings, bare soil, and shaded areas). These additional classes were meant to indicate for the algorithm the spectral properties of objects that occurred in the research area and constituted non-forest vegetation. Finally, reference polygons were created for eight classes: Calamagrostis epigejos, Solidago spp., Rubus spp., other plants, trees, bare soils, buildings, and shadows (Figure 1).

Airborne Hyperspectral HySpex Data
Aerial hyperspectral data were obtained by MGGP Aero Sp. z o.o. on 29 August 2017 using sensors located on a Cessna 402B aircraft. A hyperspectral image with a 16-bit radiometric resolution was recorded with two HySpex Visible and Near Infrared (VNIR-1800) scanners and a Shortwave Infrared (SWIR-384) scanner. The specification of both sensors is provided in Table 1. The aerial hyperspectral images were prepared for further processing following the workflow shown in Figure 3. The data obtained by the hyperspectral sensors were converted to radiance units with HySpex RAD software. Then, the hyperspectral image was subjected to geometric correction, which employed the digital surface model in PARGE (PARametric GEocoding) software (ReSe Applications LLC, Wil, Switzerland), and atmospheric correction was performed with the MODTRAN5 model in ATCOR4 (ATmospheric CORrection) software (ReSe Applications LLC, Wil, Switzerland). Nine flight lines were mosaicked and re-sampled to achieve a uniform spatial resolution of 1 m. Next, the last 21 bands in the SWIR range were removed because of the high level of noise caused by the sensor's lower signal-to-noise ratio (SNR) at the extreme ranges of the imaged spectrum, which ultimately resulted in a 430-band image in the spectral range of 416.18-2396.44 nm.
In order to reduce HySpex data dimensionality, Minimum Noise Fraction (MNF) transformation was applied. Based on MNF bands, eigenvalues, and visual assessment of transformed bands, the first 30 MNF bands were selected for further processing. Finally, two data sets were prepared for species classification: the first one contained 430 HySpex bands while the second one contained 30 MNF bands.
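For readers unfamiliar with the MNF transformation, its commonly used formulation can be summarized as below. This is a generic sketch that assumes signal and noise are uncorrelated; it is not a description of the specific software or settings used in this study, and the symbols $x$, $s$, $n$, $\Sigma$, $\Sigma_N$, $a_i$, and $\lambda_i$ are introduced here for illustration only.

```latex
% Each pixel spectrum x (430 bands) is modeled as signal plus noise,
% so the covariance matrices add when signal and noise are uncorrelated.
x = s + n, \qquad \Sigma = \Sigma_S + \Sigma_N

% The i-th MNF component is the linear combination y_i = a_i^{\top} x,
% and its noise fraction is the share of its variance that is noise.
\mathrm{NF}(a_i) = \frac{a_i^{\top} \Sigma_N \, a_i}{a_i^{\top} \Sigma \, a_i}

% The vectors a_i solve a generalized eigenproblem; the eigenvalue of
% each component equals its noise fraction.
\Sigma_N \, a_i = \lambda_i \, \Sigma \, a_i, \qquad
\lambda_1 \le \lambda_2 \le \dots \le \lambda_{430}

% Keeping the 30 components with the smallest noise fraction gives the
% reduced data set.
y = [a_1, \dots, a_{30}]^{\top} x
```

Under this formulation, components are ordered from the highest to the lowest signal-to-noise ratio, which is why a cut-off after the leading bands retains most of the usable information.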

Classification Process and Accuracy Assessment
One of our goals was to analyze the impact of the number of pixels in the training data sets on classification accuracy; hence, we created training data sets with a fixed number of pixels per class. Using stratified random sampling, 50% of all reference polygons were selected to create a training test data set, while the remaining polygons were used to create a validation data set.
The training test data set was used to create subsets (training data sets with a fixed number of pixels per class), and all remaining samples ended up in the test data set that was used for preliminary accuracy assessment. The validation data set was created to eliminate spatial autocorrelation with the training test data set (randomly selected pixels were used in the training and validation sets). This made it possible to create a spatially independent and stable validation data set, which was used to assess the final results.
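The polygon-level split described above can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the function name `split_polygons`, the dictionary input format, and the fixed random seed are assumptions made for this example, and only the polygon counts for the three plant classes are taken from the text.

```python
import random
from collections import defaultdict


def split_polygons(polygon_classes, train_fraction=0.5, seed=0):
    """Stratified random split of reference polygons (hypothetical helper).

    polygon_classes maps a polygon id to its class label. Within every
    class, train_fraction of the polygons go to the training test set and
    the rest go to the validation set, so no polygon is shared.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for polygon_id, label in sorted(polygon_classes.items()):
        by_class[label].append(polygon_id)

    train_test, validation = [], []
    for label in sorted(by_class):
        ids = by_class[label][:]
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train_test.extend(ids[:cut])
        validation.extend(ids[cut:])
    return sorted(train_test), sorted(validation)


# Toy input using the polygon counts quoted for the three plant classes.
polygons = {}
for label, count in [("Rubus", 50), ("Calamagrostis", 50), ("Solidago", 60)]:
    for i in range(count):
        polygons[f"{label}_{i}"] = label

train_test_ids, validation_ids = split_polygons(polygons)
```

Splitting whole polygons rather than individual pixels is what keeps the validation pixels spatially separate from the pixels used for training.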
To investigate the influence of training data set size on the achieved classification results, the training test data set was sub-sampled to create training data sets that contained exactly 30, 50, 100, 200, and 300 pixels per class. These sub-sampled data sets were used for classifier training. If a given class had fewer total available samples than required, random sampling with replacement was used; otherwise, random sampling without replacement was employed. If all available pixels for a given class were selected for training purposes, a copy of the training data for this class was used instead.
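The sub-sampling rule (with replacement only when a class has too few pixels) can be sketched as follows. The function names, the dictionary layout, and the toy pixel pools are hypothetical and serve only to illustrate the rule stated above.

```python
import random


def subsample_class(pixels, n, rng):
    """Draw exactly n training pixels for one class."""
    if len(pixels) < n:
        # Fewer pixels available than required: sample with replacement,
        # so some pixels appear more than once.
        return [rng.choice(pixels) for _ in range(n)]
    # Enough pixels available: sample without replacement.
    return rng.sample(pixels, n)


def subsample_training_set(pixels_by_class, n, seed=0):
    """Build a training set with the same number of pixels in every class."""
    rng = random.Random(seed)
    return {
        label: subsample_class(pixels, n, rng)
        for label, pixels in pixels_by_class.items()
    }


# Toy pools: one large class and one class with fewer than 30 pixels.
pool = {"Rubus": list(range(500)), "shadows": list(range(20))}
training = subsample_training_set(pool, 30)
```

In this toy case the 30 "Rubus" pixels are all distinct, while the 30 "shadows" pixels necessarily contain repeats because only 20 are available.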
An iterative accuracy assessment was used in order to objectively compare the achieved results. This procedure consisted of the following steps, repeated 100 times:
1. Sub-sample the training test data set in order to create a training data set with a set number of samples per class;
2. Train the classifier on the sub-sampled training data set;
3. Assess accuracy using the test and validation data sets; and
4. Save the trained classifier models and accuracy measures for further analysis.
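The iterative procedure can be sketched as a simple loop. The sketch below is an assumption-laden illustration rather than the study's R code: `run_iterations`, `train_centroids`, and `accuracy` are hypothetical names, a one-dimensional nearest-centroid rule stands in for SVM and RF, and only a validation score is computed (the study also scored a test set).

```python
import random
import statistics


def run_iterations(train_pool, validation, train_fn, score_fn,
                   n_per_class, n_iter=100, seed=0):
    """Repeat sub-sample, train, assess, and save for n_iter iterations."""
    rng = random.Random(seed)
    results = []
    for iteration in range(n_iter):
        # Sub-sample a training set with n_per_class samples per class.
        training = {}
        for label, samples in train_pool.items():
            if len(samples) < n_per_class:
                training[label] = [rng.choice(samples) for _ in range(n_per_class)]
            else:
                training[label] = rng.sample(samples, n_per_class)
        # Train the classifier on the sub-sampled data.
        model = train_fn(training)
        # Assess accuracy on the fixed validation data.
        score = score_fn(model, validation)
        # Save the model and its accuracy for later analysis.
        results.append({"iteration": iteration, "model": model, "score": score})
    return results


def train_centroids(training):
    """Stand-in classifier: one mean value per class."""
    return {label: statistics.fmean(values) for label, values in training.items()}


def accuracy(model, samples):
    """Share of samples whose nearest centroid matches the reference label."""
    correct = 0
    for value, label in samples:
        predicted = min(model, key=lambda c: abs(model[c] - value))
        correct += predicted == label
    return correct / len(samples)


pool = {"a": [0.0, 0.1, 0.2, 0.3], "b": [1.0, 1.1, 1.2, 1.3]}
validation = [(0.05, "a"), (0.25, "a"), (1.05, "b"), (1.25, "b")]
results = run_iterations(pool, validation, train_centroids, accuracy, n_per_class=3)
scores = [r["score"] for r in results]
```

Keeping every iteration's model and score, rather than only the best one, is what makes the later comparison of score distributions across scenarios possible.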
Pixel classification was carried out on the basis of the Support Vector Machine and Random Forest algorithms in R software. The first stage of the training process was to optimize the learning parameters of these algorithms in order to obtain the best possible settings. This task was completed on the training and test sets before the division. A radial basis function was chosen for the SVM algorithm because of its proven efficiency [43] and fewer computational difficulties [44]. The learning parameters of the compared classification algorithms were subjected to a tuning process. A gamma value of 0.1 and a cost of 1000 were obtained for the SVM algorithm. In the case of the Random Forest algorithm, on the basis of the out-of-bag (OOB) error analysis, the mtry parameter (the number of features randomly sampled at each split) was set at 140 for classification on 430 hyperspectral bands and at 13 for classification on the set of the first 30 MNF transformation bands. In both cases, the number of random trees (ntree) amounted to 500.
In this work, we compared two classification algorithms (SVM and RF), two different data sets (430 original hyperspectral bands, 430 HS, and 30 Minimum Noise Fraction bands, 30 MNF), and five different sample sizes per class in the training data set (30, 50, 100, 200, and 300 pixels). Due to the unavailability of larger continuous areas of invasive plants in our study area, we limited the analysis to 300 pixels. All combinations of the above parameters were tested, resulting in 20 different classification scenarios.
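The 20 scenarios follow directly from combining the three factors, as the short sketch below shows. The tuned values are the ones quoted in the text; the dictionary layout and the function `scenario_params` are illustrative, and applying the same SVM gamma and cost to both data sets is an assumption, since the text reports a single pair of values.

```python
from itertools import product

CLASSIFIERS = ("SVM", "RF")
DATA_SETS = ("430 HS", "30 MNF")
SAMPLE_SIZES = (30, 50, 100, 200, 300)

# Tuned values quoted in the text; the layout of these containers is
# only an illustration.
SVM_PARAMS = {"kernel": "radial", "gamma": 0.1, "cost": 1000}
RF_MTRY = {"430 HS": 140, "30 MNF": 13}
RF_NTREE = 500


def scenario_params(classifier, data_set):
    """Return the learning parameters for one classifier and data set."""
    if classifier == "SVM":
        # Assumption: the same gamma and cost are used for both data sets.
        return dict(SVM_PARAMS)
    return {"mtry": RF_MTRY[data_set], "ntree": RF_NTREE}


scenarios = [
    {
        "classifier": classifier,
        "data_set": data_set,
        "pixels_per_class": size,
        "params": scenario_params(classifier, data_set),
    }
    for classifier, data_set, size in product(CLASSIFIERS, DATA_SETS, SAMPLE_SIZES)
]
```

With 2 classifiers, 2 data sets, and 5 sample sizes, the list holds 2 × 2 × 5 = 20 entries.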
The accuracy of the classifier training was assessed with the test data set and with data spatially separated from the training and test sets (i.e., pixels of the validation data set), which was constant for all scenarios. The algorithms were compared, and the best combination of image data set and classifier was determined based on validation performance. The following accuracy parameters were calculated on the basis of the error matrix:
• Overall accuracy: the ratio of the total number of correctly classified pixels to the total number of reference pixels [45];
• Cohen's Kappa: the agreement of the analyzed classification with the reference data, corrected for the agreement expected from a random classification (a Kappa value of 0 means agreement no better than a random classification, while 1 means perfect agreement) [46];
• Producer's Accuracy (PA): the ratio of correctly classified pixels of a given class to all pixels in the validation data set for this class [45];
• User's Accuracy (UA): the ratio of pixels correctly classified in a given class to all pixels classified in this category [45]; and
• F1 score: the harmonic mean of precision (P, the positive predictive value) and recall (R, the sensitivity), as in Equation (1) [47,48]:
$$F1 = 2 \cdot \frac{P \cdot R}{P + R} \qquad (1)$$
Afterwards, the best models for each classifier and data set were selected on the basis of the mean F1 scores for all classes (based on the validation data), and the images were classified. The significance of statistical differences between the accuracy of the models was checked using the Mann-Whitney-Wilcoxon test [49] (significance level = 0.05). The Mann-Whitney-Wilcoxon test is well suited for testing differences between non-normally distributed populations [26,50]. Distributions of the achieved accuracy measures for all classification scenarios were visualized using box plots. A detailed explanation of the boxes used in the box plots is shown in Figure 4.
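To make the accuracy measures listed above concrete, the following sketch computes them from a small error matrix. The function name `accuracy_measures`, the matrix orientation (rows are reference classes, columns are classified classes), and the two-class toy matrix are assumptions for this example; classes that are never predicted or never present would cause a division by zero and are not handled.

```python
def accuracy_measures(matrix, labels):
    """Compute OA, Cohen's Kappa, and per-class PA, UA, and F1.

    matrix[i][j] is the number of pixels of reference class i that were
    classified as class j (an assumed orientation).
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(n))
    reference_totals = [sum(matrix[i]) for i in range(n)]
    classified_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall = diagonal / total
    # Agreement expected from a random classification with the same totals.
    expected = sum(r * c for r, c in zip(reference_totals, classified_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)

    per_class = {}
    for k, label in enumerate(labels):
        pa = matrix[k][k] / reference_totals[k]   # producer's accuracy (recall)
        ua = matrix[k][k] / classified_totals[k]  # user's accuracy (precision)
        f1 = 2 * ua * pa / (ua + pa)              # harmonic mean of UA and PA
        per_class[label] = {"PA": pa, "UA": ua, "F1": f1}
    return {"OA": overall, "Kappa": kappa, "per_class": per_class}


labels = ["Rubus", "Solidago"]
matrix = [[40, 10],
          [0, 50]]
measures = accuracy_measures(matrix, labels)
```

For this toy matrix the overall accuracy is 0.9 and Kappa is 0.8; the "Rubus" class has PA = 0.8 and UA = 1.0, which gives an F1 score of about 0.89.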
Moreover, classifier training was performed on nine classes, each with an identical number of training samples, to reduce any effect of unbalanced training data. After classifier training, the background classes were treated as one class in relation to the plant classes. Such steps allowed us to properly assess classification quality (which classes are confused with which) and helped us achieve the most accurate results. In our work we assumed that confusion between background classes was acceptable, while confusion between plant species and background classes or other plant species would be a concern that would need to be addressed and reported. When classifying plant species, it is important to deliver a suitable and representative sample of pixels that characterize objects other than the objects of the study. Such classes are often referred to as background classes.
Since our study aimed to investigate the influence of training data set size, it would be insufficient to perform classification of four classes, that is, three plant species and one class with background objects. This is mainly due to difficulties in randomly sampling background classes in such a way that, for example, 30 pixels will represent them all. In fact, such an approach would almost guarantee that pixels for background classes covering a relatively small area would not be included in the training data set with a sufficient number of samples, which in turn would undermine the credibility of such work. To address this issue when creating the training data set, each background class (shadows, trees, other plants, soils, and buildings) had the same number of training samples, equal to the number of samples used for each plant species class. This ensured that our background classes had similar representation to the plant species classes during classifier training.
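Treating the background classes as a single class after training can be illustrated with a small relabeling step. The helper `collapse_background`, the merged label name, and the four-pixel toy example are hypothetical; the background class names follow those listed in the text.

```python
BACKGROUND = {"other plants", "trees", "bare soils", "buildings", "shadows"}


def collapse_background(labels, background=BACKGROUND, merged_name="background"):
    """Replace every background class label with one merged label."""
    return [merged_name if label in background else label for label in labels]


reference = ["Rubus spp.", "trees", "shadows", "Solidago spp."]
predicted = ["Rubus spp.", "buildings", "trees", "other plants"]

reference_merged = collapse_background(reference)
predicted_merged = collapse_background(predicted)

# Confusion among background classes no longer counts as an error,
# while a plant pixel labeled as background still does.
agreement = sum(r == p for r, p in zip(reference_merged, predicted_merged)) / len(reference_merged)
```

In this toy example, the two background pixels that were assigned to a different background class count as correct after merging, while the Solidago pixel labeled as "other plants" remains an error, so the agreement is 0.75.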

Statistical Analysis of Investigated Classification Scenarios
The mean F1 score was calculated for all classes on two sets: the test set and the validation set. The test set was dependent on the training set: the pixels in these sets were drawn from the same polygons, so the number of pixels in the test set decreased with an increasing number of pixels in the training set (Table 2). The high accuracy level obtained for this set is, therefore, not surprising, nor can it be used to compare the classifiers.
In contrast, the validation set had a fixed number of observations (4835 pixels) and was spatially independent of the other data sets. Regardless of the classifier used, higher mean F1 scores for all classes based on the validation set were obtained for classifications performed on 30 MNF transformation bands (0.854-0.918) compared to that of the 430 hyperspectral data bands (0.760-0.853). The accuracy level for both classifiers increased with the number of training pixels used for classification (Figure 5). The distributions of the mean F1 score for all classes revealed that when the number of training pixels increased, the interquartile range of the obtained accuracies decreased, so the results obtained in 100 iterations were more stable. Moreover, the use of a smaller number of training pixels caused a greater decrease in the accuracy of classifications performed on the original hyperspectral bands than in the case of classifications performed on the MNF transformation bands. The most stable distributions and the highest F1 scores for all classes were obtained by the classifications performed on a set of 30 MNF transformation bands and 300 training pixels (the median F1 for RF was about 0.92, while the median F1 for SVM was about 0.88).
To check whether the F1 scores of the tested scenarios differed significantly, the Mann-Whitney-Wilcoxon test was carried out at a significance level of 0.05 (Figure 6). There were statistically significant differences between most of the considered scenarios, with two exceptions. The SVM classifications on MNF bands using 200 and 300 pixels for classifier training did not differ significantly from each other. There was also no statistically significant difference between the RF classification performed on 430 hyperspectral bands using 300 training pixels and the SVM classification on a very limited data set consisting of 30 MNF bands and 30 training pixels.
The Solidago spp. class was identified well with all classifiers and raster data sets (the F1 score was above 0.95). The accuracy levels increased with an increasing number of training pixels, although the differences in accuracy levels resulting from the change in the size of the training sets were small. Slightly higher mean F1 scores were recorded for the Random Forest classifier. Solidago are marked by their very characteristic yellow color and spectral properties, which distinguished them from other classes in the imaging, and additionally tend to form large, uniform fields, so the almost perfect identification of this species was not surprising.
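The pairwise comparison of scenarios with the Mann-Whitney-Wilcoxon test mentioned above can be illustrated with a short, standard-library-only Python sketch. It is an illustration under stated assumptions rather than the procedure used in the study: it uses the normal approximation with tie and continuity corrections (reasonable for samples of about 100 scores), the function name `mann_whitney_u` is hypothetical, and the two score lists are synthetic placeholders.

```python
import random
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U statistic of sample x, p-value). Tied values receive average
    ranks, the variance is tie-corrected, and a 0.5 continuity correction
    is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1  # positions i..j-1 share the same value
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        count_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += avg_rank * count_x
        t = j - i
        tie_term += t ** 3 - t
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = max(abs(u_x - mean_u) - 0.5, 0.0) / sqrt(var_u)
    p_value = 2.0 * (1.0 - NormalDist().cdf(z))
    return u_x, min(p_value, 1.0)


# Synthetic placeholders for the F1 scores of two scenarios (100 iterations each).
rng = random.Random(42)
scenario_a = [rng.gauss(0.90, 0.01) for _ in range(100)]
scenario_b = [rng.gauss(0.88, 0.02) for _ in range(100)]

u_stat, p_value = mann_whitney_u(scenario_a, scenario_b)
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

Because many scenario pairs are compared, a stricter threshold or a multiple-comparison correction could be considered; the text does not state whether one was applied.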
In the case of the Rubus spp. class, the best identification results were obtained for the SVM classification on 30 MNF bands using 300 training pixels (F1 = 0.97), but application of the same classifier with the number of training pixels reduced to 100 resulted in a similar accuracy level. Good results were also obtained for the RF classification on the same raster data set and 300 training pixels (F1 = 0.95). The F1 scores obtained on 430 hyperspectral data bands were lower (F1 RF from 0.7 to 0.76, and F1 SVM from 0.71 to 0.84).
Calamagrostis epigejos was a more difficult plant species to identify. However, high F1 scores of around 0.91 were obtained using the SVM algorithm, 30 MNF transformation bands, and sets of 200 and 300 training pixels. A similar accuracy level was also obtained for the SVM classification and 300 training pixels on 430 hyperspectral bands (F1 = 0.9). The Random Forest classification resulted in lower accuracy levels for this species, with F1 scores between 0.7 and 0.82 on the hyperspectral data set, and between 0.76 and 0.83 on the MNF transformation bands. The accuracy increased with the growth of the number of training pixels.
Considering the mean accuracy level for three species identified in the research area, it can be concluded that the best spatial distribution was obtained using the SVM algorithm and 200 or 300 training pixels (F1 = 0.95). For the other classes distinguished in the image (i.e., plant background, forests, buildings, bare soil, and shadows), the best F1 scores (from 0.93 to 0.96) were obtained with the RF algorithm. However, in terms of accuracy for all the classes together, the best accuracy (Kappa = 0.92, F1 for all classes = 0.92) was obtained for the RF classifier, 30 MNF bands, and sets of 200 and 300 training pixels.
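For readers less familiar with the measures quoted above, the following Python sketch shows how per-class F1 scores, their mean, and Cohen's Kappa can be derived from a confusion matrix. It assumes that rows hold reference classes and columns hold classified classes; the function names and the 3 x 3 matrix are hypothetical and do not reproduce Tables 3 and 4.

```python
def per_class_scores(cm):
    """Return (producer's accuracy, user's accuracy, F1) for each class.

    cm[i][j] is the number of reference pixels of class i labeled as class j.
    """
    k = len(cm)
    scores = []
    for c in range(k):
        correct = cm[c][c]
        reference_total = sum(cm[c])                        # row sum
        classified_total = sum(cm[r][c] for r in range(k))  # column sum
        pa = correct / reference_total if reference_total else 0.0
        ua = correct / classified_total if classified_total else 0.0
        f1 = 2 * pa * ua / (pa + ua) if (pa + ua) else 0.0
        scores.append((pa, ua, f1))
    return scores


def mean_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1 in per_class_scores(cm)) / len(cm)


def cohen_kappa(cm):
    """Agreement between reference and classification, corrected for chance."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / n
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(k)
    ) / n ** 2
    return (observed - expected) / (1.0 - expected)


# Made-up confusion matrix for three classes (illustration only).
toy_cm = [
    [45, 3, 2],
    [4, 40, 6],
    [1, 7, 42],
]
print(round(mean_f1(toy_cm), 3), round(cohen_kappa(toy_cm), 3))
```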
To sum up, the SVM algorithm and the data set consisting of 30 MNF bands and 300 training pixels proved to be the best for identifying the Calamagrostis and Rubus classes. In the case of Solidago and background classes, better results were obtained with the Random Forest classifier. However, goldenrod classified well (mean F1 > 0.95) on both sets of raster data and with different numbers of training pixels. On the other hand, in the case of background classes, the best results were obtained for 30 MNF bands and 200 training pixels. This may indicate that the Random Forest method works better for the classification of spectrally uniform, large forms of land use, which differ significantly from their surroundings, while the SVM method is better for identifying plant species that are spectrally more variable and more similar to the background classes.

Best Model Plant Species Identification Accuracy
A set of data consisting of 30 MNF bands and 300 training pixels was selected on the basis of the analysis of statistical accuracy to develop images showing spatial distributions of the analyzed species in the research area. Figure 8 presents distributions of the producer and user accuracies for 100 iterations of classifications performed on a selected set of data using both classifiers.
For the Rubus spp. class, the RF classifier yielded a lower median user's accuracy than that of SVM by three percentage points, while the differences in the producer's accuracy levels between the classifiers were small. Both the producer's and user's accuracies for Solidago spp. were very high (close to 100%); a slight underestimation was detected only in the case of the SVM classification (producer's accuracy about 93%). In contrast, the Calamagrostis epigejos class achieved the lowest median producer's and user's accuracies of all classes. The SVM classifier achieved higher producer's and user's accuracy levels for C. epigejos (PA = 96%, UA = 87%) than the RF classifier (PA = 88%, UA = 78%).
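As a rough consistency check (an illustration added here, not a computation taken from the study), the F1 score is the harmonic mean of the producer's accuracy (recall) and the user's accuracy (precision). Inserting the median values quoted above for C. epigejos gives the following, bearing in mind that a harmonic mean of two medians only approximates the median F1:

```latex
F_1 = \frac{2 \cdot \mathrm{PA} \cdot \mathrm{UA}}{\mathrm{PA} + \mathrm{UA}}, \qquad
F_1^{\mathrm{SVM}} \approx \frac{2 \cdot 0.96 \cdot 0.87}{0.96 + 0.87} \approx 0.91, \qquad
F_1^{\mathrm{RF}} \approx \frac{2 \cdot 0.88 \cdot 0.78}{0.88 + 0.78} \approx 0.83
```

Both values are close to the F1 scores reported for this species on the MNF bands (about 0.91 for SVM and up to 0.83 for RF).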
The resulting images for both classification methods, prepared from the iterations with the best mean F1 scores for all classes, are presented and compared below (Figure 9). The correctness of species identification was also assessed on the basis of the confusion matrices (Tables 3 and 4).
Rubus spp. was identified near forest borders and buildings, and its spatial distribution for the SVM method reflected reality more accurately than the result of using the RF method (Figure 9). There was a slight overestimation of this species in the case of the RF method, especially in places with trees and bushes near buildings (Table 4). The Calamagrostis epigejos and Solidago spp. classes can be found in the open spaces of non-agricultural meadows. The spatial distribution of Solidago in the image resulting from the use of the RF method reflected reality almost perfectly, and in the case of the SVM method, the underestimation of this species applied mainly to uncut meadows in the south of the area. On the other hand, the Calamagrostis epigejos class was slightly overestimated in the results of both classifications, especially in places with dry or mowed meadows. The SVM classification image presents the spatial distribution of this species in the research area with greater precision (Table 3), and its estimations were more accurate, especially in places with bare soils, which have a similar spectral response.

Discussion
The effects of the raster data set and the number of training pixels on the classification accuracy of three invasive or expansive plant species were tested in this paper using the Random Forest and Support Vector Machine methods. The method we used to divide the patterns into three sets (the training set, the test set, and the spatially independent validation set) allows for reliable assessment of the classification accuracy. Balanced training sets of 30, 50, 100, 200, and 300 pixels per class were tested. The test set was strongly spatially correlated with the training set, which led to inflated accuracy results; therefore, it was used only for the initial accuracy assessment. Surprisingly, the accuracy measures (PA, UA, F1) calculated on the test data set increased despite the decrease in the number of patterns in the test set. This highlights the importance of using a spatially separate data set for proper accuracy assessment. A constant set of validation pixels that remained unchanged between iterations and was spatially separate from the training data allowed us to reliably assess the accuracy of classification. Spatial separation of the data sets used to assess classification results and to train classifiers allowed us to avoid artificial inflation of accuracy due to spatial correlation between pixels belonging to the same reference polygon. Such a method allows for more objective comparisons of classification algorithms and data sets, while delivering more trustworthy accuracy metrics. The very act of creating training and test or validation data sets introduces human or random bias into any comparison. To decrease such bias in our method, training and validation data sets were created multiple times. Such approaches have been used repeatedly in the past [24,51,52] and have been shown to be more reliable when it comes to classifier comparison.
The accuracy of any machine learning procedure is directly related to the quality of samples used for training and validation of a given classifier. In order to decrease the impact of human or random bias in creating the data sets, training and validation data sets were created multiple times. Repeated sampling of pixels for the reference sets and assessing classification accuracy minimized the impact of pixel selection for training on the classification accuracy and allowed an objective assessment of the impact of the tested data sets on the effectiveness of species identification [26,52,53].
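The sampling design discussed above (a validation set separated at the level of reference polygons, combined with repeated balanced draws of training pixels) can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the data structure, the function names `split_polygons` and `draw_balanced_sample`, the 25% hold-out fraction, and the toy polygons are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_polygons(polygons, validation_fraction, rng):
    """Hold out whole polygons per class for validation.

    polygons maps polygon id -> (class label, list of pixel ids). Pixels of
    one polygon never end up on both sides of the split. Every class is
    assumed to have at least two polygons.
    """
    by_class = defaultdict(list)
    for poly_id, (label, _pixels) in polygons.items():
        by_class[label].append(poly_id)
    train_ids, valid_ids = set(), set()
    for label in sorted(by_class):
        ids = sorted(by_class[label])
        rng.shuffle(ids)
        n_valid = max(1, round(len(ids) * validation_fraction))
        valid_ids.update(ids[:n_valid])
        train_ids.update(ids[n_valid:])
    return train_ids, valid_ids


def draw_balanced_sample(polygons, train_ids, n_per_class, rng):
    """Draw n_per_class pixels for every class from the training polygons."""
    pool = defaultdict(list)
    for poly_id in sorted(train_ids):
        label, pixels = polygons[poly_id]
        pool[label].extend(pixels)
    return {label: rng.sample(pixels, n_per_class) for label, pixels in pool.items()}


# Toy reference data: 3 classes x 4 polygons x 100 pixels (placeholders).
toy_polygons = {
    f"{label}-{p}": (label, [(f"{label}-{p}", i) for i in range(100)])
    for label in ("A", "B", "C")
    for p in range(4)
}

rng = random.Random(0)
train_ids, valid_ids = split_polygons(toy_polygons, 0.25, rng)  # fixed once

# Repeated balanced draws: 100 iterations for each training-set size.
for n_per_class in (30, 50, 100, 200, 300):
    for _ in range(100):
        sample = draw_balanced_sample(toy_polygons, train_ids, n_per_class, rng)
        # A classifier would be trained on `sample` here and scored on the
        # pixels of the polygons listed in `valid_ids`.
```

Keeping `valid_ids` fixed while only the training draw changes means that differences between iterations reflect the choice of training pixels rather than a changing yardstick.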
The analyses showed that, regardless of the selected classifier, a higher F1 score for all classes was obtained for classifications performed on 30 MNF transformation bands (0.854-0.918) than for those performed on 430 hyperspectral data bands (0.760-0.853). The reduction in the number of input layers to several dozen of the most informative bands is recommended for the Random Forest and Support Vector Machine algorithms, as it allows one to obtain higher accuracy levels and significantly shortens the classification time [51,54-56]. During the classification of herbaceous vegetation in the Hortobágy National Park (Eastern Hungary), a higher overall accuracy level was obtained for nine MNF transformation bands (SVM = 82.06%, RF = 79.14%) than for 128 original bands of AISA Eagle (SVM = 72.85%, RF = 72.89%) [27]. Similarly, when identifying tree species based on AISA Eagle data using the SVM algorithm, classification of the MNF-transformed data resulted in an increase of about 30% in the classification agreement compared to the classification performed on the original bands [57]. The first 30 MNF transformation bands were used, for example, to identify four invasive or expansive species in central Poland, obtaining high F1 scores of identification: about 0.80 for Filipendula ulmaria and Molinia caerulea, about 0.79 for Phragmites australis, and about 0.73 for Solidago gigantea [58].
Increasing the number of training pixels raised the F1 scores for the three species analyzed in this article and simultaneously narrowed their distributions, which indicates stabilization of the results. Our observations indicate that the preferred number of training patterns is at least 300 pixels per class, regardless of the classifier used. In the case of 30 MNF bands and the SVM algorithm, 200 was the optimal value because there were no statistically significant differences between training data sets containing 200 and 300 pixels per class (Figure 6). Because larger continuous areas of invasive plants were unavailable in our research area, we limited the analysis to 300 pixels and were therefore unable to assess the impact of a larger number of pixels per class in the training data set on the classification results. A similar trend was noticed when different sets of training pixels (from 10 to 30 pixels) and raster data were tested for the classification of 20 herbaceous species in Eastern Hungary by means of the SVM and RF algorithms [27]. There, the highest overall accuracy (SVM: 82.06%; RF: 79.14%) was obtained using the largest of the tested sets of patterns (30 training pixels), and the overall classification accuracy decreased with a decreasing number of training pixels (lower by about 2 percentage points for the set of 10 training pixels).
After a detailed analysis, it can be concluded that the Support Vector Machine algorithm was more resistant to smaller numbers of training patterns and yielded a higher mean F1 score for the three plant species (F1 SVM = 0.95) than the Random Forest algorithm (F1 RF = 0.92) on the best data set (30 MNF, 300 training pixels). Lower mean F1 scores for background classes (F1 SVM = 0.82, F1 RF = 0.91) were noted in the SVM result image, but classification errors occurred mainly between different background classes and not between the background and plant species.
Visual interpretation of the result images and statistical accuracy analyses indicated that both classifiers detected the plant species of this study in the research area with a very high level of accuracy. Correct identification of species was also confirmed by additional field verifications carried out after the analyses. High classification accuracy levels obtained for the analyzed scenarios may also be due to the optimal time in which the imaging was obtained [26,59]. The analyzed species are in their flowering and fruiting phases at the turn of August and September, which makes them more distinctive thanks to their characteristic colors of inflorescences, fruits, and leaves (Table 5).
The classification accuracy of Solidago spp. was very high (F1 > 0.95) for both classifiers and both raster data sets. This is not surprising because this plant's yellow inflorescences form homogeneous fields, which are easy to distinguish from other objects in images, and it would probably even be possible to use photointerpretation for this task. Solidago gigantea was identified in central Poland using 30 MNF transformation bands (a mosaic of hyperspectral data from the same HySpex sensors) and the Random Forest method; a lower F1 score for the species, about 0.73, and a slightly higher F1 score for the background, about 0.94, were obtained [58]. Solidago spp. has also been classified with high accuracy (F1 about 0.83, UA = 0.71, PA = 1.0) on the Hungarian-Slovak cross-border site using 15 MNF bands (a mosaic of hyperspectral data from AISA Eagle II) and the maximum likelihood method [61]. High identification accuracy of one of the goldenrod species, Solidago altissima (F1 score of about 0.86, UA = 0.94, PA = 0.80), was also obtained during the research conducted in the Watarase wetlands in Japan with the help of only 3 MNF transformation bands (a mosaic of hyperspectral data from AISA Eagle) and generalized linear models [19].
Rubus spp. was classified in the research area with F1 scores ranging from 0.70 to 0.97, with the highest accuracy obtained for the Support Vector Machine method and 30 MNF transformation bands. High accuracy (OA = 87.8% and Kappa = 0.75) was also obtained during the detection of Rubus armeniacus in open areas in Surrey, BC, Canada, by means of a combination of CASI hyperspectral imagery with LiDAR data and the Random Forest algorithm [62]. Similarly, when identifying Rubus fruticosus sp. agg. in the Kosciuszko National Park in Australia, an F1 score of about 0.83 was obtained for blackberry using 23 bands of a mosaic of hyperspectral data from HyMap after MNF transformation and the Mixture-Tuned Matched Filter (MTMF) algorithm [32]. On the other hand, research on the identification of Rubus cuneifolius in the eastern parts of South Africa using the SVM algorithm and multispectral data led to results that were much lower in accuracy: the F1 scores for the Landsat data varied from 0.33 to 0.48, while the scores for the Sentinel-2 data were between 0.34 and 0.58, which suggests that hyperspectral data allow for much more accurate detection of blackberries [63].
Identification of Calamagrostis epigejos resulted in F1 scores between 0.70 and 0.91, depending on the algorithm and data set used. As before, the best combination for wood small-reed classification turned out to be the SVM algorithm with MNF-transformed bands (F1 scores from 0.86 to 0.91), while the RF method resulted in F1 scores between 0.76 and 0.83, depending on the number of pixels used for training. By carrying out C. epigejos classifications at various growth stages, it was confirmed that flowering time (around September) facilitated correct identification of wood small-reed [26]. In addition, the use of the Random Forest method and MNF transformation bands on the HySpex hyperspectral data led to an F1 score of 0.72, which is an accuracy level close to the one obtained for wood small-reed in our research. Lower accuracy (producer's accuracy of 68% and user's accuracy of 51%) was obtained in the classification of plant communities representing the Calamagrostis villosa species when the APEX data and the SVM method were used [60]. However, an average PA of about 82% and UA of about 75% were obtained for wood small-reed grasses during the classification of high-mountain vegetation communities using 40 MNF transformation bands of the DAIS 7915 data and neural networks [64]. This was similar to the results obtained in our work on 30 MNF bands with the RF algorithm (PA of about 88%, UA of about 78%) and was lower than the results for SVM (PA of about 96%, UA of about 87%).

Conclusions
The above-presented research concerning identification of three species of invasive or expansive plants using the Random Forest and Support Vector Machine classification methods, as well as various sets of input data, has led to the following conclusions:

• The accuracy assessment method presented in the paper allows us to confirm that all analyzed species can be identified in heterogeneous habitats (the achieved F1 scores oscillated around 0.90). The species have a unique set of spectral properties that are recognizable by the SVM and RF classifiers, and separating training and validation sets at the level of the reference polygons, and not at the level of individual pixels, is justified. This allows one to avoid overestimating the accuracy of the results due to spatial correlation of pixels from the same reference polygon. We have shown a clear need to divide classes into training and validation sets at the polygon level in order to minimize spatial correlation between samples and to achieve unbiased and accurate classification metrics. A spatially separate and unchanging validation set can be used to improve the quality of the obtained accuracy scores and compare the results more objectively. Unfortunately, this type of approach makes it more difficult to use iterative methods of assessing accuracy or significantly reduces the number of observations in a data set that can be used for classifier training. What is more, the principles of a constant and unchanging validation set are not optimal, which may negatively affect the quality of the resulting post-classification images.
• A set of 30 MNF bands allows for more accurate identification of the analyzed invasive and expansive plant species than the 430 original spectral bands of the HySpex image.
• An increase in the number of pixels per class in the training data set has a greater effect on the achieved accuracy measures in the case of the 430 spectral bands data set (difference in medians between the 30- and 300-pixel data sets of around 8 percentage points (p.p.) for RF and 9 p.p. for SVM) than in the case of the 30 MNF bands data set (median difference between the 30- and 300-pixel data sets: 5 p.p. for RF and 3 p.p. for SVM; Figure 7).
• Three hundred pixels per class is the preferred number of samples in the training set for classification of the analyzed plant species with the help of the SVM and RF methods. Fewer pixels result in a significant decrease in classification accuracy and less stable results. In our case, we managed to find the optimal number of pixels per class in the training data set only in the case of the SVM classifier applied to MNF data. Figure 6 shows that there was no statistically significant difference between tests performed on MNF bands with 200 and 300 pixel samples per class. Hence, 200 pixels per class in the training data set for 30 MNF bands and the SVM classifier is optimal.
• Both the Support Vector Machine method and the Random Forest method allowed us to obtain very accurate images of the distribution of the analyzed species in the research area. However, the SVM classifier worked better for the classification of blackberry and wood small-reed (i.e., for classes that are not uniform and do not differ spectrally from their surroundings). On the other hand, the Random Forest algorithm allows one to obtain a higher accuracy for homogeneous classes that stand out spectrally (i.e., goldenrod and background classes). Still, the SVM image was found to be more reliable, despite its relatively lower accuracy for the background classes. Most classification errors occurred between background classes rather than individual species.