Regional Mapping and Spatial Distribution Analysis of Canopy Palms in an Amazon Forest Using Deep Learning and VHR Images

Mapping plant species at the regional scale to provide information for ecologists and forest managers is a challenge for the remote sensing community. Here, we use a deep learning algorithm called U-net and very high-resolution multispectral images (0.5 m) from the GeoEye satellite to identify, segment and map canopy palms over ∼3000 km² of Amazonian forest. The map was used to analyse the spatial distribution of canopy palm trees and its relation to human disturbance and edaphic conditions. The overall accuracy of the map was 95.5% and the F1-score was 0.7. Canopy palm trees covered 6.4% of the forest canopy and were distributed in more than two million patches, each of which can represent one or more individuals. The density of canopy palms is affected by human disturbance. The post-disturbance density in secondary forests seems to be related to the type of disturbance, being higher in abandoned pasture areas and lower in forests that have been cut once and abandoned. Additionally, the analysis of the palm trees' distribution shows that their abundance is naturally controlled by local soil water content: they avoid both flooded and waterlogged areas near rivers and dry areas on the tops of the hills. They show two preferential habitats: low elevations just above the large rivers, and the slopes directly below the hilltops. Overall, their distribution over the region indicates a relatively pristine landscape, albeit within a forest that is critically endangered because of its location between two deforestation fronts and because of illegal logging. New tree species distribution data, such as the map of all adult canopy palms produced in this work, are urgently needed to support Amazon species inventories and to understand species distribution and diversity.


Introduction
Identifying the ecological mechanisms that govern natural plant distribution and range is a fundamental question in ecology [1]. Answers to this long-standing question are particularly urgent because most of the remaining wilderness areas have been shrinking and degrading at unprecedented rates in recent decades [2,3]. Tropical ecosystems are of primary importance in this context because they host a large part of global biodiversity; for example, sixteen of the 25 global biodiversity hotspots are located in the tropics [4,5]. In particular, tropical forests are the most critical ecosystems for studying tree distribution and diversity, as they host an overwhelming proportion of global diversity, with as many as 53,000 tree species or more, in contrast to only 124 across temperate Europe [6]. Among tropical forests, Amazonia hosts the largest wilderness areas [2,3] and the most diverse forests [7], while also being subject to intense deforestation and degradation [8].
However, obtaining data on Amazon forests from the ground is extremely difficult. Conventional field inventories are challenging in tropical environments in terms of time, effort and cost. Consequently, ecological field surveys of trees in tropical forests typically cover areas of ∼1 ha, with only a few 50 ha plots [9,10]. As a result, the total area of the field plots surveyed since the 1950s in the Amazon forests likely represents less than 0.0001% of their total area [11]. While forest plots, if sufficiently well distributed, are appropriate for evaluating the relationship between species distribution and climate, they are less well suited to studying how species distribution relates to landscape-scale edaphic conditions and gradients.
One solution for obtaining spatially resolved data on species distribution and abundance is to measure species occurrence from satellites. This was proposed among the 10 biodiversity metrics for monitoring progress towards the Aichi Biodiversity Targets [12] and was included as a recommended action to reach Millennium Development Goal 7: "Ensure environmental sustainability" [13]. Mapping species distribution at the landscape/regional scale remains a major challenge that is most likely to be met, at least for some species and genera, by combining remote sensing information with state-of-the-art deep learning techniques for object detection or image segmentation. On the one hand, satellite sensors such as WorldView and GeoEye can now acquire images with sub-metre spatial resolution, enabling the detection of individual plants and trees. On the other hand, deep learning algorithms have demonstrated the ability to identify features in large amounts of data, including very high-resolution imagery, with accuracy similar to that of human classification but in a consistent and fast way, enabling rapid application over very large areas and/or through time [14]. For example, individual tree species mapping in tropical forests, which was previously possible only at small scales with traditional machine learning methods [15,16], is now feasible at large scales (>5000 km²) with a deep learning method called U-net [17][18][19][20].
To be a good candidate for species mapping inside tropical forests using deep learning, the object (here the tree species or genus) needs to have three main characteristics. First, it needs a unique morphological and/or spectral signature of the leaves or crowns that makes it easily identifiable and prevents confusion with other species in very high-resolution images. Second, it needs to be abundant, so that enough manual samples can be collected in the image to train the algorithm and spatial patterns in its distribution can be detected. Third, it needs to be an indicator of some wider ecosystem property. While this last characteristic is not mandatory, a species with distinctive ecological traits or behaviour is better suited to help us understand forest characteristics from its absence, presence, or abundance. These characteristics are shared by the neotropical tree species mapped so far with U-net, Tibouchina pulchra and Cecropia hololeuca. Both are very common in the Brazilian Atlantic forest; they have, respectively, pink flowers that bloom synchronously in February and unmistakable large, bright gray leaves; and both are excellent indicators of past anthropogenic disturbance. For all three reasons, these species are especially useful for mapping the disturbance history of this unique and species-rich ecosystem [18].
Arborescent palms (family Arecaceae) present strong features for automatic mapping in very high-resolution images, given their distribution across all tropical and sub-tropical regions [21,22] and their characteristic star-shaped crowns seen from above, which are distinctive as a group but also vary greatly among genera. As one of the few monocot groups that reach the canopy in most tropical forests, palms may also have leaves that contrast particularly strongly with those of the rest of the tree flora.
Due to their abundance, palms are already used as a model system for studying tropical biodiversity and its geographic variation [23]. In addition to their characteristic crown shape, they also show distinctive ecological behaviour. For example, Attalea speciosa, which has a prominent star-shaped crown and is common and distributed throughout the Amazon forest, is a pioneer species known for invading pastures, and it can even cause land abandonment [24,25]. Some palm species are also indicators of past occupation in the Amazon forest, as has been demonstrated for the anthropogenic soils called 'terra preta' [26][27][28]. Finally, almost all tree palms provide culturally and economically important resources for communities in Amazonia, because they provide non-timber forest products including fruits, fabrics, fuel, and construction materials [29,30]. Because of these characteristics, the automatic mapping of palm populations has recently gained attention. For example, one of the earliest works using deep learning for vegetation applications provided a method based on the LeNet convolutional neural network (CNN) to detect and count individual palms in plantations [31]. At the same time, it was demonstrated that a ResNet-based CNN classifier provided better results than a GoogLeNet CNN and than state-of-the-art methods based on Object-Based Image Analysis for detecting and counting shrubs [32]. Since then, CNNs have been intensively used to detect and count palms and other plants in plantations [33][34][35][36][37], and this task has recently also been performed with the U-net model [20]. Regarding the automatic detection of palm species, it has been shown that canopy palm species can be identified with high overall accuracy (>85%) using traditional machine learning methods and very high-resolution images in managed forests of northeastern Peru [29].
By mapping the distribution of canopy palms in a forest, we can expect to derive two main types of information. The first is the large-scale detection and mapping of secondary forests based on palm density. In secondary forests, changes in canopy palm density can be expected, as some palm species, such as Attalea speciosa, are known to be strong competitors [24], and some are indicators of past occupation in the Amazon forest [26][27][28]. For this task, the type of succession of secondary forests since 1988 will be determined from the PRODES data [8]. The second is how forest edaphic conditions affect canopy palm distribution. Forest characteristics such as diversity and biomass have previously been shown to be strongly related to landscape-scale edaphic gradients such as soil parent material, landform and soil type, as verified in tropical forests of Costa Rica, Amazonian countries and Borneo [38]. More specifically for palms, the assembly of palm communities was shown to be driven primarily by species sorting according to hydrology and soil, as observed in more than 70 forest plots across the Amazon forest [39,40]. Consequently, the variation in canopy palm distribution at the landscape scale is expected to follow, and help characterize, patterns of edaphic conditions. Based on canopy palm distribution, we will try to understand how edaphic gradients shape species distribution at the landscape scale.
In this study, we aim to (i) produce a map of all canopy palm individuals in a region of ∼3000 km² of the Brazilian Amazon forest; (ii) analyse the association between the spatial distribution of canopy palms and the degradation history obtained from PRODES data; and (iii) determine which edaphic conditions shape palm species distribution at the landscape scale in the intact forest, based on the analysis of natural canopy palm density and spatial patterns.

Study Region
This study was undertaken in a region of the Amazon forest located between the Brazilian states of Rondônia and Mato Grosso and centered at 10°13′41.70″ S and 61°30′9.43″ W (Figure 1). This region was chosen to study the natural distribution of canopy palms because it contains a large forest patch between two deforestation fronts in a remote area that was relatively unimpacted in 2012, when the images were taken. In the western part of the study area, more than 100 km² have already been converted to cattle ranches since 2012. Based on forest structure and dynamics, the study area is considered to lie within the Southeast Amazon forest region [41]. The closest biological station with a field inventory of species is the 'Reserva Biológica do Jaru', which is part of the same forest patch but is not covered by the GeoEye-1 images. The list of species recorded in the reserve can be found at https://www.icmbio.gov.br/portal/unidadesdeconservacao/biomas-brasileiros/amazonia/unidades-de-conservacao-amazonia/1999-rebio-do-jaru.
Figure 1. (a) Remaining Brazilian Amazon forest cover in green [8] and geographical location of the study area in red; (b) region of interest with 2017 land use/cover classes from the MapBiomas project [42] and extents of the GeoEye-1 images used in this study.

GeoEye-1 Image
Two GeoEye-1 images (DigitalGlobe, Inc., Westminster, CO, USA) covering the region were acquired on 15 August 2012, at average off-nadir view angles of 18.7° and 11.0°, respectively. The images overlap by ∼50%. The DigitalGlobe catalog IDs of the images are A010010401B4E700 and A010010401E28600. The images were distributed in tiles of 16,384 × 16,384 pixels, 28 tiles per image, together covering a region of ∼2814 km² (Figure 1). The spatial resolution was 0.5 m for the panchromatic band (450-800 nm) and 2 m for the selected multispectral bands: Red (655-690 nm), Green (510-580 nm) and Blue (450-510 nm). All bands were rescaled from raw digital numbers (11 bits) to 0-254 (8 bits) using gdal_translate [43]. This transformation was made mainly for practical reasons: first, the images were lighter to store and process, and second, the deep learning algorithm required 8-bit images. The Red-Green-Blue (RGB) bands were pan-sharpened with the panchromatic band using the Simple RCS method of the Orfeo ToolBox application otbcli_BundleToPerfectSensor [44] to create a single high-resolution RGB image at 0.5 m spatial resolution. No atmospheric correction was performed.
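The two preprocessing steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the gdal_translate or Orfeo ToolBox implementation: the 11-bit input range is assumed fixed at 0-2047 (gdal_translate can also derive the range from the data), the 2 m multispectral bands are upsampled to the 0.5 m grid by simple pixel repetition, and the low-pass filter in the ratio component substitution (RCS) is a plain box filter whose window size is a hypothetical parameter.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rescale_to_8bit(band, in_max=2047, out_max=254):
    """Linearly map raw DNs (assumed range 0..in_max) to 0..out_max as uint8."""
    scaled = np.rint(band.astype(np.float64) / in_max * out_max)
    return np.clip(scaled, 0, out_max).astype(np.uint8)


def box_smooth(img, size=5):
    """Mean filter with an odd square window; borders handled by reflection."""
    pad = size // 2
    padded = np.pad(img.astype(np.float64), pad, mode="reflect")
    return sliding_window_view(padded, (size, size)).mean(axis=(-2, -1))


def simple_rcs(ms_low, pan, factor=4, size=5, eps=1e-6):
    """Ratio component substitution: MS_up * PAN / lowpass(PAN).

    ms_low: (bands, h, w) multispectral image at 2 m.
    pan:    (h * factor, w * factor) panchromatic image at 0.5 m.
    Pixel repetition is a simplifying assumption for the resampling step;
    eps avoids division by zero in dark areas.
    """
    ms_up = ms_low.repeat(factor, axis=1).repeat(factor, axis=2).astype(np.float64)
    ratio = pan.astype(np.float64) / (box_smooth(pan, size) + eps)
    return ms_up * ratio[np.newaxis, :, :]
```

Where the panchromatic band is locally uniform the ratio is close to 1 and the multispectral values pass through unchanged; spatial detail comes only from deviations of the panchromatic band around its local mean.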

Forest Cover Mask and Clear-Cut Deforestation History from PRODES
To remove non-forested areas from the analysis, the deforested areas were manually delineated in the GeoEye images using QGIS [45]. To describe the species distribution on land only, the rivers visible in the images were also masked. Additionally, to detect secondary forests in the sample, the deforestation mask from the high-spatial-resolution PRODES map of annual deforestation since 1988 was used to identify areas that appear as forest in the 2012 image but had been deforested previously [8]. The PRODES project monitors clear-cut deforestation in the Brazilian Legal Amazon from satellite images with ∼30 m spatial resolution (Landsat-8, CBERS-4 and similar imagery) and delivers official annual rates of forest loss for Brazil. The publicly available annual maps are all manually corrected/edited by experts and contain polygons with the following labels: forest, non-forest, deforestation of the year, previous deforestation, clouds, and water.
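A minimal sketch of how the masks described above can be combined, assuming the manually delineated deforestation and river polygons and the PRODES deforestation polygons have already been rasterized onto the 0.5 m image grid (not shown). The function and argument names are hypothetical, not taken from the authors' code.

```python
import numpy as np


def analysis_masks(forest_2012, river, prodes_deforested):
    """Combine co-registered boolean rasters into analysis masks (hypothetical helper).

    forest_2012: True where forest is visible in the 2012 GeoEye image
                 (outside the manually delineated deforested areas).
    river: True for manually masked river pixels.
    prodes_deforested: True where PRODES records clear-cut deforestation since 1988.
    """
    valid = forest_2012 & ~river                     # forest on land only
    secondary = valid & prodes_deforested            # forest regrown after a recorded clear-cut
    no_recorded_clearcut = valid & ~prodes_deforested
    return valid, secondary, no_recorded_clearcut
```

The third mask is deliberately not called "old-growth": absence from PRODES only means no clear-cut was recorded since 1988, so earlier disturbance cannot be ruled out.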

Wetland Mask
To analyse the distribution of canopy palms in wetlands, we used the LBA-ECO LC-07 dataset of wetland extent for the Lowland Amazon Basin at 3 arc-seconds spatial resolution (∼90 m) [46]. This dataset was derived from mosaics of Japanese Earth Resources Satellite (JERS-1) Synthetic Aperture Radar (SAR) imagery acquired in October-November 1995 and May-July 1996. The dataset was reclassified to binary values, with 0 for non-wetland and 1 for wetland (a full description of the classes and values is available at https://daac.ornl.gov/LBA/guides/LC07_Amazon_Wetlands.html).
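The binary reclassification, and the lookup of the wetland value under each palm or random point needed for the χ² test described under Statistical Analysis, can be sketched as follows. The wetland class codes must be taken from the LC-07 dataset guide; they are deliberately left as a parameter rather than guessed. The coordinate lookup assumes a north-up grid with a known top-left origin and square pixels; all names are illustrative.

```python
import numpy as np


def to_binary_wetland(classes, wetland_codes):
    """0 = non-wetland, 1 = wetland; `wetland_codes` must come from the LC-07 guide."""
    return np.isin(classes, list(wetland_codes)).astype(np.uint8)


def sample_at_points(raster, x, y, x0, y0, pixel_size):
    """Value of the pixel containing each map coordinate.

    Assumes a north-up grid whose top-left corner is (x0, y0) and whose
    square pixels measure `pixel_size` map units.
    """
    cols = np.floor((np.asarray(x, dtype=float) - x0) / pixel_size).astype(int)
    rows = np.floor((y0 - np.asarray(y, dtype=float)) / pixel_size).astype(int)
    return raster[rows, cols]
```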

Elevation Data
To test whether canopy palm spatial distribution was related to elevation and landscape-scale topography, high-resolution elevation data from ALOS-PALSAR at 12.5 m spatial resolution were used [47]. Note that ALOS-PALSAR elevation data are influenced by the signal of the forest cover, so elevation here should be considered a surface elevation model rather than a terrain elevation model.

Statistical Analysis
To describe the association of canopy palms with the edaphic variables elevation and wetland, we first reclassified elevation into 10 to 12 classes according to its quantiles. Then, a bootstrap procedure was applied. For an image containing N palm tree locations, N random point locations were sampled within the image, and the elevation value at each point was extracted and stored. This operation was repeated 100 times. This enables the number of palm trees in each quantile class to be compared with the mean and 95% confidence interval of the number of points obtained by random spatial sampling in that class. Specifically, the null hypothesis of no spatial association between canopy palm distribution and elevation is rejected at the 5% level (α = 0.05) if the number of palms in an elevation quantile class lies outside the (0.025, 0.975) quantiles of the empirical distribution of counts obtained by random location sampling in the same class. The association between wetland (only two classes) and palm location (observed versus random location points) was tested with a traditional Pearson's χ² test. All analyses were performed using the R project software [48].
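The bootstrap comparison and the wetland test can be sketched in NumPy (the authors used R). This is an illustration under stated assumptions: drawing random pixels uniformly from the valid (non-NaN) cells of the elevation raster stands in for sampling random point locations, the class breaks are quantiles of the raster values, and the χ² statistic is computed without a continuity correction, which the text does not mention. Function names are hypothetical.

```python
import math
import numpy as np


def count_per_class(values, breaks):
    """Count values per quantile class; `breaks` holds n_classes + 1 edges."""
    idx = np.digitize(values, breaks[1:-1])          # class index 0..n_classes-1
    return np.bincount(idx, minlength=len(breaks) - 1)


def bootstrap_test(elev_raster, palm_elev, n_classes=10, n_boot=100, seed=0):
    """Compare palm counts per elevation class with N random locations, n_boot times."""
    rng = np.random.default_rng(seed)
    cells = elev_raster[np.isfinite(elev_raster)]    # valid (unmasked) pixels
    breaks = np.quantile(cells, np.linspace(0.0, 1.0, n_classes + 1))
    observed = count_per_class(np.asarray(palm_elev), breaks)
    n = len(palm_elev)
    # each replicate draws N random pixel locations, i.e. N elevation values
    boot = np.array([count_per_class(rng.choice(cells, size=n, replace=True), breaks)
                     for _ in range(n_boot)])
    lo, hi = np.quantile(boot, [0.025, 0.975], axis=0)
    significant = (observed < lo) | (observed > hi)   # two-sided, 5% level
    return observed, boot.mean(axis=0), lo, hi, significant


def pearson_chi2_2x2(palm_wet, palm_dry, rand_wet, rand_dry):
    """Pearson chi-square for (palm vs random) x (wetland vs non-wetland),
    without continuity correction; p-value from the chi-square with 1 df."""
    table = np.array([[palm_wet, palm_dry], [rand_wet, rand_dry]], dtype=float)
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    stat = float(((table - expected) ** 2 / expected).sum())
    p_value = math.erfc(math.sqrt(stat / 2.0))        # P(chi2_1 > stat)
    return stat, p_value
```

A class is flagged when the observed palm count falls outside the 2.5th to 97.5th percentiles of the bootstrap counts, which mirrors the rejection rule stated above.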

Model Architecture
In this study, to produce the map of all canopy palm individuals, we used a convolutional network for image segmentation called U-net [17][18][19] (Figure 2). This network predicts the probability that each pixel belongs to a particular class (per-pixel classification) and is currently a standard in dense image labelling [49]. The original U-net architecture [19] was adapted to use half the number of filters, because of the limited size of our training set and because reducing the number of filters helps prevent overfitting. Furthermore, the network was adapted to take a three-band RGB image of 128 × 128 pixels as input (see [17]). As canopy palm segmentation is a per-pixel binary classification (palm/non-palm), a sigmoid activation function was used in the output layer.
Figure 2. U-net architecture used for canopy palm segmentation, adapted from [19]. The number of channels and the row × column size in pixels are indicated for each cuboid.
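The feature-map sizes implied by a 128 × 128 input can be worked out with a short calculation. This sketch assumes that the original U-net [19] starts with 64 filters (so halving gives 32) and uses four 2 × 2 max-pooling steps, and, unlike the unpadded convolutions of the original U-net, it assumes 'same' padding so that the output mask keeps the input size. These are assumptions, not details confirmed by the text. A final 1 × 1 convolution followed by a sigmoid maps each pixel to a palm probability.

```python
import math


def unet_shapes(input_size=128, base_filters=32, depth=4):
    """(spatial size, channels) per U-net level, assuming 'same' padding.

    Each encoder step halves the size (2 x 2 max pooling) and doubles the
    channels; the decoder mirrors the encoder back up to the input size.
    """
    encoder = [(input_size // 2 ** k, base_filters * 2 ** k) for k in range(depth + 1)]
    decoder = list(reversed(encoder[:-1]))
    return encoder, decoder


def sigmoid(z):
    """Numerically stable logistic function applied to the 1-channel output."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

Under these assumptions the bottleneck is 8 × 8 × 512, half of the 1024 channels at the bottom of the original network.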

Canopy Palms Dataset
To train the algorithm to detect and segment canopy palms, all canopy palms were manually delineated in both images within a subset area of 2.1 km² inside the overlap region between the two GeoEye-1 images, resulting in 2407 and 2419 polygons, respectively. Each delineated polygon can represent one palm or a cluster of canopy palms. This area was chosen for manual sampling because it presented all types of palm distribution, from sparse to clustered, and the main vegetation types, from dense humid forests to dry rocky outcrops. From the delineated polygons, a binary raster mask was produced for each subset image with the following values: background [0] and canopy palms [1] (Figure 3).
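Turning the delineated polygons into the 0/1 training mask is a rasterization step. In practice this is typically done with GIS tools; the underlying rule can be sketched with even-odd ray casting in NumPy. Assumptions: vertices are already expressed in pixel coordinates of the subset image, and a pixel is labelled palm when its centre falls inside a polygon. This is a swapped-in illustration, not the authors' procedure.

```python
import numpy as np


def rasterize_polygons(polygons, height, width):
    """Burn polygons (lists of (col, row) vertices in pixel units) into a 0/1 mask.

    A pixel is set to 1 (canopy palm) when its centre lies inside any polygon
    according to the even-odd rule; all other pixels stay 0 (background).
    """
    rows, cols = np.mgrid[0:height, 0:width]
    px, py = cols + 0.5, rows + 0.5                  # pixel centres
    mask = np.zeros((height, width), dtype=np.uint8)
    for poly in polygons:
        inside = np.zeros((height, width), dtype=bool)
        n = len(poly)
        for i in range(n):
            x1, y1 = poly[i]
            x2, y2 = poly[(i + 1) % n]
            crosses = (y1 > py) != (y2 > py)         # edge spans the horizontal ray
            with np.errstate(divide="ignore", invalid="ignore"):
                x_int = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            inside ^= crosses & (px < x_int)         # toggle for each crossing
        mask[inside] = 1
    return mask
```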

Model Training
Clipping the images and masks into 128 × 128 pixel tiles over the region where palms were manually delineated resulted in a sample of 1024 images and their associated labelled masks for training the model. Of these images, 980 contained both canopy palms and background, and 44 contained only background. A total of 819 images (80%) were used for training and 205 (20%) for independent validation. The training dataset contained 1,038,807 canopy palm pixels and 12,379,689 background pixels, while the validation dataset contained 265,309 canopy palm pixels and 3,093,411 background pixels. A tile size of 128 × 128 pixels was selected because the studied objects are smaller than 128 pixels across (128 pixels = 64 m at 0.5 m resolution) and their distribution does not depend on a larger context. The images were extracted from regular grids of 128 × 128 pixels without any overlap between neighbouring images. During network training, we used standard stochastic gradient descent optimization. The loss function was designed as the sum of two terms: binary cross-entropy and a Dice coefficient-related loss of the three predicted masks [50][51][52]. We used the RMSprop optimizer (an unpublished adaptive learning rate method proposed by Geoff Hinton; see http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf) with an initial learning rate of 0.0001. We trained the network for 300 epochs with 24 images per batch.
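The tiling, the 80/20 split and a combined cross-entropy plus Dice loss can be sketched in NumPy for illustration (the model itself was trained in Keras). The exact form and weighting of the Dice term are not specified in the text, so the "1 - soft Dice" form and the smoothing constant below are assumptions; the random permutation is likewise a hypothetical way of drawing the 80/20 partition.

```python
import numpy as np


def tile_pairs(image, mask, size=128):
    """Non-overlapping size x size tiles from an (H, W, 3) image and its (H, W) mask."""
    h, w = mask.shape
    tiles = []
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            tiles.append((image[r:r + size, c:c + size], mask[r:r + size, c:c + size]))
    return tiles


def split_80_20(n, seed=0):
    """Random train/validation indices with 80% of the tiles used for training."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(0.8 * n))
    return idx[:n_train], idx[n_train:]


def bce_dice_loss(y_true, y_pred, eps=1e-7, smooth=1.0):
    """Binary cross-entropy plus (1 - soft Dice) for one probability mask."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)         # avoid log(0)
    bce = -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    inter = np.sum(y_true * y_pred)
    dice = (2.0 * inter + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)
    return bce + (1.0 - dice)
```

With 1024 tiles, rounding 80% gives exactly the 819/205 partition reported above. The Dice term rewards overlap on the rare palm class, which matters because palm pixels make up under 8% of the training pixels.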

Data Augmentation
Data augmentation was applied randomly to the input images, including 0-360° rotations, horizontal and vertical flips, and changes in brightness, saturation and hue, obtained by modulating the current values between 95% and 110% for brightness, 95% and 105% for saturation, and 99% and 101% for hue (as changes in plant hue are not expected).
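One way to implement these augmentations in NumPy is sketched below, under stated assumptions: rotation uses nearest-neighbour inverse mapping and fills the uncovered corners with 0, brightness is modelled as scaling the HSV value channel, each flip is applied with probability 0.5, and the same geometric transform is applied to the image and its mask while colour changes affect only the image. The factor ranges follow the text; everything else, including the function names, is illustrative.

```python
import colorsys
import numpy as np

_rgb_to_hsv = np.vectorize(colorsys.rgb_to_hsv)
_hsv_to_rgb = np.vectorize(colorsys.hsv_to_rgb)


def rotate_nearest(arr, angle_deg):
    """Rotate an (H, W) or (H, W, C) array about its centre (nearest neighbour).

    Output pixels whose source falls outside the input are set to 0.
    """
    h, w = arr.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(angle_deg)
    rows, cols = np.mgrid[0:h, 0:w]
    y, x = rows - cy, cols - cx
    # inverse mapping: locate the source pixel of every output pixel
    src_c = np.rint(np.cos(t) * x + np.sin(t) * y + cx).astype(int)
    src_r = np.rint(-np.sin(t) * x + np.cos(t) * y + cy).astype(int)
    valid = (src_r >= 0) & (src_r < h) & (src_c >= 0) & (src_c < w)
    out = np.zeros_like(arr)
    out[valid] = arr[src_r[valid], src_c[valid]]
    return out


def jitter_colour(rgb_uint8, rng):
    """Scale HSV value (brightness), saturation and hue by random factors."""
    rgb = rgb_uint8.astype(np.float64) / 255.0
    hue, sat, val = _rgb_to_hsv(rgb[..., 0], rgb[..., 1], rgb[..., 2])
    hue = (hue * rng.uniform(0.99, 1.01)) % 1.0
    sat = np.clip(sat * rng.uniform(0.95, 1.05), 0.0, 1.0)
    val = np.clip(val * rng.uniform(0.95, 1.10), 0.0, 1.0)
    r, g, b = _hsv_to_rgb(hue, sat, val)
    return np.rint(np.stack([r, g, b], axis=-1) * 255.0).astype(np.uint8)


def augment(image, mask, rng):
    """Same flips and rotation for image and mask; colour jitter on the image only."""
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]
    angle = rng.uniform(0.0, 360.0)
    image, mask = rotate_nearest(image, angle), rotate_nearest(mask, angle)
    return jitter_colour(image, rng), mask
```

Nearest-neighbour sampling keeps the mask strictly binary after rotation, which bilinear interpolation would not.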

Segmentation Accuracy Assessment
Two performance metrics were computed. First, the overall accuracy was computed as the percentage of correctly classified pixels. Second, the F1 score was computed for each class i as the harmonic mean of precision and recall (Equation (1)), where precision is the ratio of the number of segments correctly classified as i to the number of all segments classified as i (true and false positives), and recall is the ratio of the number of segments correctly classified as i to the total number of segments belonging to class i (true positives and false negatives). The score varies between 0 (worst value) and 1 (best value).
$$F1_i = \frac{2\,\mathrm{precision}_i\,\mathrm{recall}_i}{\mathrm{precision}_i + \mathrm{recall}_i} \qquad (1)$$
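The two metrics can be computed as follows. This sketch takes true positive, false positive and false negative counts as given; how segments are matched to produce those counts is not shown. The function names are illustrative.

```python
import numpy as np


def overall_accuracy(y_true, y_pred):
    """Fraction of pixels whose predicted label equals the reference label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def f1_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall, as in Equation (1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2.0 * precision * recall / (precision + recall)
    return f1, precision, recall
```

Because background pixels dominate, overall accuracy alone is optimistic: with the training counts reported above (1,038,807 palm and 12,379,689 background pixels), a model predicting background everywhere would already reach about 92.3% overall accuracy, which is why the F1 score is reported alongside it.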

Prediction
For prediction, the GeoEye-1 tiles of 16,384 × 16,384 pixels were clipped with a regular grid of 512 × 512 pixel cells, and 64 neighbouring pixels were added on each side to create an overlap between patches. Any remaining blank portion (for example, at the tile border) was filled with the mirror image of the non-blank portion. Predictions were made on these 640 × 640 pixel images, and the resulting images were clipped back to 512 × 512 pixels and merged to reconstitute the original 16,384 × 16,384 pixel GeoEye-1 tile. This overlapping method was used to avoid border artefacts during prediction, a known problem for the U-net algorithm [19]. A pixel was assigned to the canopy palm class when its predicted value was greater than or equal to 0.5.
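The overlap-tile strategy can be sketched as below. Here `predict_fn` is a hypothetical placeholder for the trained network: any callable mapping a window to per-pixel probabilities with the same height and width. The sketch assumes the tile dimensions are multiples of the cell size, as for the 16,384-pixel tiles (32 × 32 cells of 512 pixels), and uses NumPy's "symmetric" padding for the mirrored border.

```python
import numpy as np


def predict_tile(tile, predict_fn, cell=512, margin=64, threshold=0.5):
    """Overlapping-window prediction over an (H, W, C) tile.

    Each cell x cell block is predicted from a (cell + 2*margin)-wide window
    and only its central cell x cell part is kept, so window borders never
    reach the merged map. H and W must be multiples of `cell`.
    """
    h, w = tile.shape[:2]
    # mirrored margin supplies context at the tile border
    padded = np.pad(tile, ((margin, margin), (margin, margin), (0, 0)), mode="symmetric")
    prob = np.zeros((h, w), dtype=np.float32)
    for r in range(0, h, cell):
        for c in range(0, w, cell):
            window = padded[r:r + cell + 2 * margin, c:c + cell + 2 * margin]
            p = predict_fn(window)                   # e.g. 640 x 640 probabilities
            prob[r:r + cell, c:c + cell] = p[margin:margin + cell, margin:margin + cell]
    return (prob >= threshold).astype(np.uint8)
```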

Algorithm
The model was coded in the R programming language [48] using the RStudio interface to Keras [51,52] with the TensorFlow backend [53]. Training the models took ∼2-20 hours on the GPU of an Nvidia RTX2080 graphics card with 8 GB of dedicated memory. Predicting a single 16,384 × 16,384 pixel tile (∼67.1 km²) on the GPU took approximately 6 min.

Results
For canopy palm segmentation, the overall accuracy was 95.5% and the F1-score was 0.70 (precision = 0.68 and recall = 0.70). An example of canopy palm segmentation with its F1-score and the manual segmentation is presented in Figure 4. The low F1-score may result from inaccurate manual delineation, the small size of the objects, and palm crowns missed during the production of the training sample. Palm crowns cover few pixels, as their diameter is between ∼10 and 20 m, and their manually delineated border approximates the convex envelope of the crown rather than following each individual palm leaf (Figure 4). Although the borders from manual and automatic delineation sometimes differ considerably, they delineate the same objects (Figure 4). This is reflected by the intersection between automatic and manual segments: in the validation dataset, 82% of the manual segments intersect with an automatic segment. There were 1233 objects in the validation dataset, and the model segmented 1195 objects. We found that canopy palms covered 6.4% of the forest canopy (∼14,497 ha) and that more than 2 million canopy palm patches were delineated by the segmentation model over the full area covered by the images (Table 1). Our results show that the distribution of canopy palms inside the forest is not uniform and is highly variable at the regional scale (Figure 5). The median and mean sizes of palm patches were 38.0 and 67.5 m², respectively, ranging from 0.25 m² (one pixel) to 4977.25 m² (∼0.5 ha) (Table 1). Although the images have different extents and overlap by only 50%, the total number of objects and the mean density were relatively similar between them, differing by only 10 to 15% (Table 1). Slightly more objects were delineated in the P002 image. At the landscape scale, spatial patterns are observed (see the regions highlighted with yellow boxes in Figure 5), which are detailed in the following paragraphs.
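Patch statistics such as those in Table 1 require grouping palm pixels into connected patches. The sketch below labels patches with a breadth-first search, converts pixel counts to areas (0.25 m² per 0.5 m pixel), and computes the share of manual segments intersected by the prediction. The 8-connectivity default is an assumption, as the text does not state which neighbourhood defined a patch; function names are illustrative, and at full-tile scale a library labelling routine such as scipy.ndimage.label would be faster than this pure-Python loop.

```python
from collections import deque

import numpy as np


def label_patches(mask, connectivity=8):
    """Label connected palm patches in a 0/1 mask; returns (labels, n_patches)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int64)
    if connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    n = 0
    for r0, c0 in zip(*np.nonzero(mask)):
        if labels[r0, c0]:
            continue                                 # already part of a patch
        n += 1
        labels[r0, c0] = n
        queue = deque([(r0, c0)])
        while queue:                                 # flood-fill the new patch
            r, c = queue.popleft()
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and not labels[rr, cc]:
                    labels[rr, cc] = n
                    queue.append((rr, cc))
    return labels, n


def patch_areas_m2(labels, n, pixel_area=0.25):
    """Area of each patch 1..n in m² (0.25 m² per 0.5 m pixel)."""
    return np.bincount(labels.ravel(), minlength=n + 1)[1:] * pixel_area


def fraction_detected(manual_labels, n_manual, predicted_mask):
    """Share of manual segments that intersect at least one predicted palm pixel."""
    hit = np.unique(manual_labels[(manual_labels > 0) & (predicted_mask > 0)])
    return len(hit) / n_manual
```

The median and mean patch sizes then follow from `np.median` and `.mean()` over the returned areas.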
Different canopy palm densities are observed after human disturbance (Figure 6). In region 1, a forest area was cut before 1988 and maintained as a pasture until its abandonment in 2000 (Figure 6a,d). As this area was maintained as a pasture, it has likely been burnt several times, since burning is a common practice in the region to 'clean' pastures. At this site, while the secondary forest consists of both trees and palms, the palms clearly dominate (Figure 6d). The border of the old pasture can easily be found just by looking at the palm distribution. Inside the nearby forest, palm density is variable but lower than in the old pasture and, surprisingly, palms are almost absent in the wetland (Figure 6d), which was not visible in the RGB image (Figure 6a). Region 2 was abandoned directly after clear-cutting in 2002-2003 (Figure 6b,e) and has likely burnt only once. At this site, the secondary forest also consists of trees and palms, but the palms do not dominate (Figure 6e). The nearby old-growth forest showed variability in palm distribution, with a density equal to or higher than that of the secondary forest (Figure 6e). Region 3 has no history of human degradation in the PRODES data [8], which means no deforestation between 1988 and 2012, when the image was acquired. However, human presence is clear: a house is visible in the GeoEye-1 image near the river (Figure 6c), and the distribution of palms shows strong canopy palm dominance (Figure 6f). The distribution of palms also shows geometric features that seem to indicate previous human disturbance. For example, a linear pattern appears in the palm distribution: a line ∼500 m long and ∼15 m wide without any palms (Figure 6f).

Natural Distribution of Canopy Palms near the River and Wetlands, Regions 4, 5, 6 and 7
Near rivers, water levels seem to be associated with the natural distribution of canopy palms (Figure 7). In region 4, on the margin of the Ji-Paraná River (popularly known as the Machado River), large areas where canopy palms are completely absent are observed. Judging by their shapes, these areas are old river channels (paleochannels) or secondary river channels. They were not mapped as wetlands in the wetland dataset, so it is unclear whether they become inundated. The absence of palms in these areas could be related to soil water content or soil chemical characteristics. In a wetland area extracted from the wetland dataset (region 5, Figure 7c,d), at the confluence of the rivers 'Igarapé Castanhal' and 'Igarapé Curral da Vara', canopy palms appear to avoid wetlands. Even though the wetlands cannot be seen directly in the RGB image, the distribution of canopy palms produced by U-net (Figure 7d) makes them visible. Here again, proximity to water does not seem to favour the establishment and growth of these palm species, while other vegetation types are adapted to these conditions. Another area in region 5 shows a very low density of palms, which could be related to an old or secondary river channel (Figure 7d) that is not classified as inundated in the LBA-ECO LC-07 wetland dataset. The particular canopy palm distribution near wetlands and large rivers is also visible in the elevation data for the two regions presented previously (regions 4 and 5 in Figure 8a,c) and for two other regions, one also along the Ji-Paraná River (region 6 in Figure 8b) and one in wetlands near the river 'Igarapé Curral da Vara' (region 7 in Figure 8d). Note again that ALOS elevation data are influenced by the signal of the forest cover, so elevation here should be considered a surface elevation model rather than a terrain elevation model.
Near the Ji-Paraná River (Figure 8b), there are significantly fewer palms at lower elevations than in a random distribution (the confidence interval does not include the value 50%), more palms than expected by chance at elevations between 20 and 40-45 m, and fewer palms than expected by chance at higher elevations. The same general pattern is observed in the wetland around the river 'Igarapé Curral da Vara': palm density is lower than in a random distribution below 15-20 m, higher than expected by chance between 15-20 and 30-40 m, and lower than or not significantly different from a random distribution at higher elevations. The general pattern is consistent and has been tested with the exact locations of a total of 69,381 canopy palm patches.

Figure 8. Comparison of the proportion of observed canopy palms (%) with the proportion obtained by random sampling over the same area, for the quantile classes of elevation (x-axis: elevation class, m; y-axis: percentage, %). Elevation is given relative to the river level (river level = 0 m). The analysis is given for four regions near rivers and wetlands (regions 4, 5, 6 and 7; see Figure 5 for their location): N obs = 18,728 for region 4 (a), N obs = 14,832 for region 5 (b), N obs = 19,934 for region 6 (c) and N obs = 15,887 for region 7 (d); the same number N of random locations is sampled in each region. The random locations are bootstrapped 100 times to create the 95% confidence interval. For a given elevation class, if the confidence interval intersects 50%, there is no significant difference between the palm distribution and a random distribution for this class at the 5% significance level.

Natural Distribution of Canopy Palms at Higher Elevation Far from the Water Table, Regions 8 and 9
Canopy palm distributions also showed some remarkable spatial patterns at higher elevations, near the hilltops (Figure 9). At the highest elevations, some trees are deciduous, and lower-stature vegetation and rocky outcrops are observed (Figure 9a,c). With some exceptions, canopy palms generally seem to avoid the highest elevations and the vicinity of rocky outcrops (Figure 9b,d). Canopy palms are found in great numbers and dense patches on the slopes directly below the hilltops (Figure 9b,d), while further down, canopy palm patches occur at lower density. This relationship with elevation is also observed quantitatively when compared with a random distribution of points over the same regions (8 and 9) (Figure 10). The number of canopy palm patches was significantly lower than expected by chance at higher elevations and in the valleys, and higher than expected on the slopes (Figure 10a,b).

Canopy Palm Mapping
In this study, we produced the first regional-scale distribution map of canopy palms in a tropical rainforest using very high-resolution multispectral satellite images and the U-net convolutional network. To our knowledge, this is only the second large-scale automatic mapping of any plant taxon in the Amazon forest, the first being the large-scale mapping (155,000 km²) of bamboo forests in the Brazilian state of Acre [54]. Here, we found that the two million automatically delineated canopy palm patches covered 6.4% of the forest canopy, equivalent to ∼145 km² of canopy (Table 1), highlighting the importance of canopy palms in this region. In Amazonian forests, the number of palm species is estimated to be at least 160-180 [22,55], including no fewer than six of the ten most abundant tree species of the Amazon basin [7,56]. However, extensive areas of the Amazon forest have not yet been explored, and maps such as the one developed here could help support field collection. Our map had a high overall accuracy of 95.5% but a relatively low F1-score for object segmentation (0.70). As observed for the tree species Cecropia hololeuca in the Atlantic forest [17], the F1-score is not higher (i) because of the difficulty of accurately delineating palm borders, and (ii) because, given the small size of each crown/patch, each pixel contributes substantially relative to the crown/patch size, so missing only a few pixels strongly affects the F1-score. The canopy palm map covers ∼0.24% of the Amazon forest area, which is estimated at 5.5 million km². The prediction running time was 4-5 h on a single notebook, which suggests that mapping all canopy palms in the Amazon forest should be feasible in the very near future.

Canopy Palm Distribution in Secondary Forests
Human disturbances can be reflected in the distribution of canopy palm trees, as shown by the changes in canopy palm density in disturbed areas (Figure 6). However, in the case of secondary forest regrowth after clear-cutting with a known history (regions 1 and 2), we found that post-disturbance densities could be either higher or lower than the natural density in neighbouring areas. This could be linked to the type of forest degradation. For example, while both regions 1 and 2 were clear-cut, region 1 has a long history of being maintained as pasture before being abandoned, whereas region 2 was abandoned directly after being cut. This could explain the strong dominance of palms in region 1, as some palm species, such as Attalea speciosa, are known to be aggressive competitors in pastures [57]. This species, which is common and widely distributed across the Amazon forest, is known to invade pastures, which can lead to land abandonment, and clearing pastures with fire favours its dominance [24]. For region 2, we assume that the soil characteristics (seed bank, nutrients and porosity) were less affected by human disturbance, as the site likely underwent only one fire event at the time of the clear-cut, allowing several pioneer species to compete with the palms. Previous human degradation, such as clear-cutting alone, use of fire and pasture conversion, is known to affect neotropical forest succession and species composition. For example, in the Amazon and the Atlantic forest, Cecropia sp. has been shown to dominate after clear-cutting without subsequent use of fire, while Vismia sp. (in the Amazon) or Tibouchina sp. (in the Atlantic forest) can dominate after fire and pasture conversion [18,58,59].
To investigate this further, it would be useful to gather more images of secondary forests at different successional stages, to obtain a more balanced sample and greater statistical power to draw firm conclusions on the effect of different human disturbances on canopy palm densities. In region 3, a very high density of canopy palms was observed in a forest without a known history of degradation. However, recent human disturbance was evident in this area from the presence of a nearby house, visible in the very high resolution images. Furthermore, the canopy palms showed a linear spatial pattern that was very unlikely to be natural. Since the PRODES data [8] record no clear-cut at this site, the disturbance probably occurred before 1988 and was not documented. Nevertheless, in the absence of a signal of human presence (here, the house), it might still be very difficult to attribute a disturbance to human causes based on the palm distribution alone.

Natural Spatial Distribution
While the Amazon forest canopy generally appears strikingly homogeneous, with no visible spatial patterns in RGB satellite images, here we show that a strong spatial pattern of plant distribution emerges when looking at a particular taxon (Figure 5). In the case of canopy palms, these large-scale spatial patterns appear to be primarily related to edaphic conditions.
One of the main findings is that these canopy palms tend to avoid wetlands and old river channels. This is consistent with field botanical studies showing that the diversity of palm species and genera in Amazonian forests reaches very high levels in unflooded (terra firme) forests, whereas palms are less diverse in flooded and waterlogged forests [55]. Along the Ji-Paraná river and the 'Igarapé Castanhal' and 'Igarapé Curral da Vara' streams, wetlands and old river channels could be mapped from the complete absence of these canopy palms (Figure 7). However, this pattern is visible only near the large rivers, likely because only there are flooded and waterlogged forests extensive enough for wetlands to stand out in the palm distribution.
A second important finding near rivers is that these canopy palm trees, while avoiding flooded or waterlogged soils, have a preferential habitat in the landscape, located 15 to 40 m above the water level of the large rivers. This effect was observed in a sample of almost 70,000 canopy palm patches in four sub-regions near rivers and wetlands (Figure 8). This pattern could be associated with elevation relative to the water table, where there is no excess water and conditions are optimal for the root systems of canopy palms, which are usually shorter than those of trees and naturally restricted to superficial soil layers [60]. Another, far more speculative, hypothesis is that this elevation band could correspond to the places in the landscape where humans were more likely to settle (see Figure 2 in [61]), and could thus reflect past disturbance. Some palm species have already been identified as indicators of past occupation in the Amazon forest [26][27][28]. Among them are common arborescent palm trees such as Murumuru (Astrocaryum murumuru), Urucuri (Attalea cf. phalerata), Caiaué (Elaeis oleifera) and Jarina (Phytelephas macrocarpa), which are 'terra preta' indicators [26,27] and could be visible in the forest canopy because of their size. Furthermore, the banks of the Ji-Paraná river are known for a high density of anthropogenic soils (see Figure 2 in [62]). While this pattern could have several other causes, such as nutrient availability, fertility, or other unmeasured edaphic variables, the simplest explanation is that the preferential habitat of the mapped palms is primarily determined by water availability.
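The 15-40 m preferential band can be examined in a similar way. As a minimal sketch, assume that each palm patch and each random point has a height above the water surface of the nearest large river (for example, DEM elevation minus the river water-level elevation; how heights were derived in this study is an assumption here). The share of palm patches inside the band is compared with the share of random points, using a one-sample proportion z-score in which the random share serves as the null proportion. The function name and data are hypothetical.

```python
import math


def band_enrichment(palm_height, random_height, low=15.0, high=40.0):
    """Compare the share of palm patches whose height above the river water level
    lies in [low, high] with the share of random points in the same band.

    Returns (palm_share, random_share, z), where z is a one-sample proportion
    z-score that uses the random share as the null proportion.
    """
    n = len(palm_height)
    p_hat = sum(low <= h <= high for h in palm_height) / n
    p0 = sum(low <= h <= high for h in random_height) / len(random_height)
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error of the share under the null
    z = (p_hat - p0) / se if se > 0 else float("nan")
    return p_hat, p0, z


# Hypothetical heights (m) above the water level of the nearest large river
palms = [20, 25, 30, 35, 10, 50]
random_points = [5, 10, 20, 45, 50, 60, 30, 70]
share, ref, z = band_enrichment(palms, random_points)
print(f"palms in band: {share:.2f}, random points: {ref:.2f}, z = {z:.2f}")
```

With only six toy patches the normal approximation behind the z-score is crude; it becomes reasonable for samples of the size analysed here (almost 70,000 patches).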
This relation with water is consistent with the results observed at the other end of the water availability gradient, on the tops and slopes of the hills (Figure 9). On the hill tops, we found that canopy palms avoid the vicinity of drier areas, which can be visually detected by leafless trees or rocky outcrops (Figure 9), and even where leafy trees are present, canopy palms are uncommon. Surprisingly, the large slopes directly below the hill tops also appear to be a preferential habitat for canopy palms (Figure 9c,f). On these slopes, palms are found more frequently than expected by chance, as observed in a sample of 31,629 canopy palm patches. This could be explained by a perched water table, which could create soil conditions similar to those at lower elevations close to the rivers.
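Whether palms occur on slopes "more frequently than expected by chance" can also be assessed by simulation rather than by an analytical test. The sketch below is an assumed alternative, not necessarily the procedure used in this study: it labels each cell of a hypothetical landscape grid with a topographic class, repeatedly places as many points as there are palm patches on randomly chosen cells, and reports a one-sided Monte Carlo p-value, the share of simulations with at least as many points on the slope class as observed. All names and numbers are illustrative.

```python
import random


def monte_carlo_p(cell_classes, n_patches, n_on_target, target="slope",
                  n_sim=2000, seed=0):
    """One-sided Monte Carlo p-value for observing at least `n_on_target` of
    `n_patches` palm patches on `target` cells if patches fell on random cells.

    cell_classes: topographic class of every grid cell in the study region.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    exceed = 0
    for _ in range(n_sim):
        draw = rng.choices(cell_classes, k=n_patches)  # random placement (with replacement)
        if sum(c == target for c in draw) >= n_on_target:
            exceed += 1
    # The +1 terms count the observed configuration as one of the possible outcomes
    return (exceed + 1) / (n_sim + 1)


# Hypothetical landscape: one third of the cells are slopes
landscape = ["top"] * 100 + ["slope"] * 100 + ["valley"] * 100
print(monte_carlo_p(landscape, n_patches=30, n_on_target=25))  # small p: 25 of 30 is far above chance
print(monte_carlo_p(landscape, n_patches=30, n_on_target=10))  # large p: 10 of 30 matches chance
```

For samples of tens of thousands of patches, a vectorised implementation (for example, with NumPy) would be preferable to this pure-Python loop.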
There is no clear consensus in the literature on how palm distribution and abundance respond to topography in neotropical forests [22,63]. For example, in forests close to La Selva Biological Station (Costa Rica), steep sites had twice as many large palms (>10 m tall) per hectare as sites on gentler slopes or at lower slope positions [64]. In these forests, the authors related the high density of large palms to the higher occurrence of gaps on steep slopes. In a central Amazonian forest, palm abundance was also found to be related to tree-fall occurrences [65]. While canopy openness may play a role in our study area, if openness alone were responsible, we would expect more palms near the hill tops, where the forest is more open and the soil is sometimes exposed. At Reserva Ducke, near Manaus (Brazil), slope had no significant effect on palm stand composition [40]. Across several sites in NE Peru, significant but opposing topographic preferences were found at different sites for the same species, making interpretation more difficult [66]. Looking at the palm distribution on the hills (Figure 9b,e), it is easy to understand why field plots, which are not designed to study the effect of topography, can yield such contrasting results. Although on average we found significantly more canopy palms on the slopes below the hill tops, this pattern is not systematic: not all slopes show it. Our results show that canopy palm abundance varies along an environmental gradient, that is, it appears related to topography, likely because topography influences water availability through drainage and drought susceptibility. Recently, it was shown in western Amazonian forests that patterns in palm species composition were best explained by soil extractable exchangeable bases (Ca, K, Mg) and phosphorus (P) concentrations, with the different palm species clearly separating along the soil cation concentration gradient [67].
Unfortunately, no field measurements are available for our study region; future field studies will be carried out to determine which edaphic conditions or soil types the canopy palms indicate. Maps such as the one produced here could help to set up field studies to better understand natural canopy palm diversity and abundance along environmental gradients.

Which Palm Species Are Observed?
Even though palm trees have a very distinctive crown shape, working at a spatial resolution of 0.5 m makes canopy palm identification difficult at the species level (especially without ground reference data), since some leaf patterns cannot be detected. However, considering that only large palm crowns can be distinguished in very high resolution images, and that the mapped canopy palms were intolerant of inundated conditions, some taxa can be considered potential candidates while others can be excluded. Fortunately, a list of palm species is available for the nearby national forest reserve, the 'Reserva Biológica do Jaru', which is part of the same forest patch (https://www.icmbio.gov.br/portal/unidadesdeconservacao/biomas-brasileiros/amazonia/unidades-de-conservacao-amazonia/1999-rebio-do-jaru). Because of their tolerance of or adaptation to inundated conditions, Mauritia flexuosa and Astrocaryum jauari are likely not covered by our map [55]. Other canopy palms present in the Jaru Reserve are Socratea species, such as Socratea exorrhiza. They can exceed 15 m in height; however, their crown diameter is ≤5 m, too small to be mapped by this method. Similarly, Euterpe species are not expected to be detected in the 0.5 m resolution images, since their crown diameter is ≤7 m. Astrocaryum aculeatum could be one of the mapped species, as it can reach ∼15 m in height; however, it is known to usually grow solitarily. Among the best potential candidates are species of the genera Oenocarpus and Attalea, as they can reach 25-30 m in height and produce very large leaves, with crown diameters of around 15 m [65]. The ecological behaviour of these palms is also compatible with the distribution found in secondary forests; for example, pasture invasion has been documented for Attalea sp.
Species of the genus Attalea are also known to occur in clumps, as seen in our maps, with high occurrence rates close to parent trees due to limited fruit dispersal [68]. Furthermore, in the nearby Jaru forest reserve, Attalea speciosa is known to occur at very high density in relatively open forest, making this species one of the best candidates for the mapped palms. The palm species mapped here will be identified in the field, and future studies will attempt to detect palm species directly from remote sensing, which should be possible with LiDAR data or with higher spatial resolution multispectral images, as has been shown in managed forests of northeastern Peru [29].
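The crown-size argument used above to exclude Socratea and Euterpe can be quantified with simple pixel arithmetic. Assuming an idealised circular crown viewed at the 0.5 m GeoEye resolution, the sketch below converts crown diameter into pixels across and total crown pixels. No detection threshold is stated in the text, so the example only reports sizes and does not decide detectability; the function name is hypothetical.

```python
import math


def crown_pixels(diameter_m, pixel_size_m=0.5):
    """Pixels across and approximate pixel count of an idealised circular crown."""
    across = diameter_m / pixel_size_m                              # crown width in pixels
    area_px = math.pi * (diameter_m / 2) ** 2 / pixel_size_m ** 2   # crown area / pixel area
    return across, area_px


# Crown diameters taken from the text: Socratea (<=5 m), Euterpe (<=7 m),
# Oenocarpus/Attalea (~15 m)
for name, d in [("Socratea", 5.0), ("Euterpe", 7.0), ("Oenocarpus/Attalea", 15.0)]:
    across, area = crown_pixels(d)
    print(f"{name:18s} {d:4.1f} m -> {across:4.0f} px across, ~{area:4.0f} px")
```

A 15 m crown thus covers nine times as many pixels as a 5 m crown (about 707 versus 79), which is consistent with the observation that only large palm crowns can be distinguished at this resolution.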

Conclusions
In this study, we produced a deep learning-based regional map of all canopy palm individuals in a Brazilian Amazon forest. The segmentation model performed well (F1-score = 0.70), enabling us to map more than 2 million palm patches covering a large proportion (6.4%) of the forest canopy, and demonstrating the strong potential of deep learning methods to support palm species mapping. Using the canopy palm map and the degradation history obtained from PRODES data, we found that canopy palm density was associated with recent human disturbance. However, the results were contrasting, with either higher or lower density in secondary forests compared with the natural density in neighbouring areas. This might be related to the type of previous human degradation, such as clear-cutting alone, use of fire and pasture conversion. More studies are needed to understand the effect of disturbance on canopy palm densities and to detect past disturbance based on palm distribution alone. The natural distribution of canopy palms was found to be mainly driven by water availability. Canopy palms avoid old (paleoriver) or secondary river channels and wetlands, likely due to excess water. At the other end of the water gradient, canopy palms avoid the vicinity of drier areas located on the hill tops. Furthermore, we found two habitats where palms are more frequent than expected by chance: 15 to 40 m above the water level of the large rivers, and the large slopes directly below the hill tops. While the species of the mapped canopy palms are unknown, based on their size, leaf shape and ecological behaviour, species of the genera Oenocarpus and Attalea are among the best potential candidates. Overall, the canopy palm distribution over the region seems to indicate a relatively pristine landscape; unfortunately, these forests are critically endangered, and more than 100 km² were cut in 2018.
New natural species distribution data, such as the map of all canopy palm trees produced in this work, could help to document species distributions and to understand their spatial patterns and diversity.
The dataset of the canopy palms produced in this study is available at https://doi.org/10.5281/zenodo.3822705.
The model used in this study is available at https://doi.org/10.5281/zenodo.3926822.