Determination of Phycocyanin from Space—A Bibliometric Analysis

Over the past few decades, the number of studies on estimating phycocyanin concentration with remote sensing techniques has increased. Since phycocyanin is a marker pigment of inland water cyanobacteria, quantifying its concentration from earth observation data is important for water quality monitoring, because some cyanobacterial species can produce toxins. Because of the growth of this field in the past decade, several reviews and studies comparing algorithms have been published. Thus, instead of focusing on algorithm comparison or description, the goal of the present study is to systematically analyze and visualize the evolution of publications. Using the Web of Science database, this study analyzed the existing publications on remote sensing of phycocyanin decade by decade for the period 1991–2020. The bibliometric analysis showed how research topics evolved from measuring pigments to quantifying optical properties, and from laboratory experiments to measuring entire temperate and tropical aquatic systems. This study describes the status quo and development trend of the field and points out possible directions for future research.


Introduction
Cyanobacterial harmful algal blooms (CHABs) in freshwater systems have been a major concern for environmental and public health authorities worldwide. In general, algal blooms degrade aesthetic conditions, cause taste and odor problems in water supply sources, form thick scums on the surface of lakes and reservoirs, and reduce water clarity [1]. Other environmental effects are also associated with CHABs. For example, the decomposition of dying blooms may deplete oxygen (hypoxia and anoxia) for aquatic life [2]. These are common environmental health concerns of algal blooms in inland waters. However, the main concern is the production of toxic secondary metabolites by some species of cyanobacteria, also known as blue-green (BG) algae. These metabolites can cause serious health issues in mammals (including humans) and wildlife (birds and fish), affecting systems such as the hepatopancreatic, digestive, endocrine, dermal, and nervous systems [3,4].
Cyanobacteria have been observed in lentic freshwater systems worldwide [2,5–8]. Their presence has been associated with favorable ecophysiological conditions, especially nutrient over-enrichment and hydrologic alterations to ecosystems [9–11]. Management of cyanobacteria has traditionally relied on in situ water sampling to monitor cell counts and chlorophyll-a (chl-a) concentration as biomass indicators. However, these methods are expensive and time-consuming [12], and they are usually limited in spatial and temporal extent [13].
Remote sensing has been used to monitor CHABs worldwide [5–8]. Its advantages include the following:
(1) The synoptic view of remote sensing imagery allows data acquisition over the entire aquatic system.
(2) Information can be acquired from remote and sometimes inaccessible regions.
(3) Archived imagery is available, from which historical information can be extracted [14].
Gons [13] likewise observed that remote sensing techniques are a time-saving, cost-effective, and scientifically rewarding alternative.
Initially, cyanobacterial biomass was estimated remotely using chl-a concentration as a proxy, since chl-a is the primary and dominant photosynthetic pigment in BG algae [15]. However, chl-a is not an accurate estimator of cyanobacterial biomass because it is common to almost all phytoplankton groups [16]. Therefore, remote sensing studies have targeted phycocyanin (PC) instead, estimating its concentration and, by extension, cyanobacterial biomass [5–8,17–19]. Figure 1 shows the number of published studies in two databases: Web of Science (webofknowledge.com, Figure 1A) and Science Direct (www.sciencedirect.com, Figure 1B). The search used the terms "remote" + "sensing" + "phycocyanin". Publications were divided into the total number of publications (including book chapters, research articles, and protocols) and research articles only. Although Science Direct returned more research articles, some of them were not on remote sensing of PC. Instead, they covered topics such as continuous electrophoresis of amino acids, cyanobacterial genomes, cell counting, and PC as a photosynthetic pigment unrelated to remote sensing. Nevertheless, both databases showed the same trend: most studies using remote sensing techniques to monitor PC were produced in the last three decades. As optical remote sensing of PC concentration (and consequently of cyanobacteria) has matured in recent years, the technique has been adopted for water quality monitoring. One example of the use of remote sensing imagery to assess harmful algal blooms (HABs) is the "Experimental Lake Erie Harmful Algal Bloom Bulletin", developed by the National Oceanic and Atmospheric Administration (NOAA). This program aims to monitor algal blooms and to forecast the spread of HABs in the western part of Lake Erie, USA [20].
Another project currently being implemented in the US is the "CYAN Project". It is led by the National Aeronautics and Space Administration (NASA), NOAA, the Environmental Protection Agency (EPA), and the United States Geological Survey [21]. Its main goal is to develop an early warning indicator system using historical and current satellite data for United States freshwater systems. On a global scale, the United Nations Educational, Scientific and Cultural Organization (UNESCO), through its International Initiative on Water Quality (IIWQ), launched the first comprehensive worldwide online portal of water quality for freshwater systems, lakes, and rivers retrieved from earth observation data. Through this portal, UNESCO/IIWQ gives the public access to water quality data derived from earth observation [22].
Because monitoring cyanobacteria with remote sensing techniques is increasingly relevant, several articles have compared bio-optical algorithms for estimating PC concentration [19,23–27]. However, most of these reviews evaluated a limited number of bio-optical algorithms, relied on expert judgment for algorithm comparison, and suggested directions for future research. Therefore, the goal of this study is to conduct a bibliometric analysis that highlights the gaps in and the evolution of this research field based on published work. To do so, this review takes a quantitative approach to analyze research articles collected from the Web of Science (WoS) Core Collection database over the past 30 years (1991 to 2020). Unlike existing reviews on remote sensing estimation of PC, the present study does not focus on describing and comparing remote sensing algorithms. Instead, the quantitative approach provides a comprehensive portrayal of the knowledge structure of this research field over the last three decades.

Previous Reviews on Remote Sensing of Phycocyanin
Several articles have evaluated bio-optical algorithms for estimating phycocyanin [19,23–27]. The first two reviews were published in 2008 by Ruiz-Verdu et al. [23] and in 2013 by Ogashawara et al. [19]. Ruiz-Verdu et al. [23] compared the performance of empirical, semi-empirical, and semi-analytical algorithms in Spanish and Dutch lakes. They concluded that two algorithms performed best: the semi-analytical algorithm proposed by Simis et al. [5] and the algorithm proposed by Schalles and Yacobi [28]. Ogashawara et al. [19] assessed semi-empirical algorithms for the Funil hydroelectric reservoir (Brazil) and catfish ponds in Mississippi (USA). They concluded that the algorithms proposed by Simis et al. [5] and Mishra et al. [18] performed best for estimating PC and were less sensitive to interference from chl-a. Both studies recommended that future research focus on removing the effect of chl-a on PC estimation.
Later, in 2017, Beck et al. [24] used satellite images simulated from airborne hyperspectral imagery to compare different PC bio-optical algorithms (see Table 1). For the Ocean and Land Colour Instrument (OLCI), the algorithms of Simis et al. [5] and Mishra et al. [18] performed best, which agrees with the findings of Ogashawara et al. [19]. More recently, Riddick et al. [26] published a comprehensive comparison of PC bio-optical algorithms applied to MEdium Resolution Imaging Spectrometer (MERIS) imagery of Lake Balaton, Hungary (see algorithms in Table 1). The lowest error metrics were obtained with the algorithms of Simis et al. [5], Mishra et al. [29], and Li et al. [30]. These results suggest that semi-analytical and quasi-analytical algorithms are more appropriate for monitoring PC concentrations, especially in lakes with high concentrations of suspended inorganic particles. Shi et al. [27] also provided a comprehensive descriptive review of bio-optical algorithms for estimating chl-a and PC concentrations (see Table 1). Like Yan et al. [25], that review showed that OLCI is the main sensor for monitoring PC because of its spectral band centered at 620 nm. According to Shi et al. [27], the main challenges for this research field are developing a universal algorithm for PC retrieval and estimating PC in oligotrophic waters.

Data Acquisition
In this research, publications from 1991 to 2020 were retrieved from the Web of Science database. Data acquisition was based on a search for the terms "remote" + "sensing" + "phycocyanin" in titles, keywords, and abstracts. The document type was restricted to articles and reviews. This produced a dataset of 115 publications published between January 1991 and January 2020 (see Supplementary Table S1 for the full list of publications). As shown in Figure 1
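The search and filtering steps above can be sketched in Python. This is a minimal illustrative sketch, not the procedure used in this study. It assumes the Web of Science advanced-search field tags TS (topic: title, abstract, keywords), DT (document type), and PY (publication year); the exact syntax should be checked against current WoS documentation. It also assumes that exported records are dictionaries keyed by the two-letter tags of WoS tab-delimited exports (TI, DT, PY). The sample records are hypothetical.

```python
# Hypothetical sketch: build a Web of Science advanced-search string and
# filter exported records by document type and publication year.

def build_query(terms, first_year, last_year):
    """Combine search terms with AND inside a topic (TS) search."""
    topic = " AND ".join(terms)
    return (f"TS=({topic}) AND DT=(Article OR Review) "
            f"AND PY=({first_year}-{last_year})")

def filter_records(records, doc_types=("Article", "Review"),
                   first_year=1991, last_year=2020):
    """Keep records whose document type and year match the search criteria.

    Each record is assumed to be a dict keyed by WoS export tags
    (TI = title, DT = document type, PY = publication year).
    """
    kept = []
    for rec in records:
        # DT may list several types separated by ";" (e.g. "Article; Proceedings Paper")
        types = {t.strip() for t in rec["DT"].split(";")}
        year = int(rec["PY"])
        if types & set(doc_types) and first_year <= year <= last_year:
            kept.append(rec)
    return kept

query = build_query(["remote", "sensing", "phycocyanin"], 1991, 2020)
print(query)

# Hypothetical records for illustration only (not from the study's dataset)
sample = [
    {"TI": "Paper A", "DT": "Article", "PY": "2005"},
    {"TI": "Paper B", "DT": "Editorial Material", "PY": "2010"},
    {"TI": "Paper C", "DT": "Review", "PY": "2017"},
    {"TI": "Paper D", "DT": "Article", "PY": "1988"},
]
print([r["TI"] for r in filter_records(sample)])  # ['Paper A', 'Paper C']
```

Filtering locally after export, rather than relying only on the query, makes the inclusion criteria explicit and easy to re-run when the date window changes.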

Methodology
For the bibliometric analysis of the dataset, the present study uses CiteSpace (v.5.6 R2) (http://cluster.cis.drexel.edu/~cchen/citespace/, accessed 8 January 2020). CiteSpace is a Java-based application for visualizing and analyzing trends and patterns in the scientific literature [47,48]. It is one of the most widely used tools for bibliometric analysis because of its capabilities for bibliographic coupling, cluster maps, and dual-map overlays (Pan et al. 2018).
This study uses CiteSpace (v.5.6 R2) to perform (1) co-citation analysis and (2) bibliometric mapping. The dataset presented in Section 2.2 was analyzed using document co-citation analysis. This method builds a network of co-cited references that reveals the underlying intellectual structure of a field from the way research articles are linked. Two publications are co-cited when a third publication cites both of them, and the strength of the connection between them is the number of times they have been cited together. If two publications are frequently co-cited, the ideas presented in them are considered strongly linked.
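The co-citation counting described above can be illustrated with a short Python sketch. It is not CiteSpace's implementation, which additionally applies node-selection thresholds and time slicing. The reference identifiers ("RefA", "RefB", ...) are hypothetical. A reference's citation frequency corresponds to node size in the network, and a pair's co-citation count corresponds to link weight.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited together.

    reference_lists: one list of cited references per citing paper.
    Returns (citation frequency per reference, co-citation count per pair).
    """
    citations = Counter()
    cocitations = Counter()
    for refs in reference_lists:
        unique_refs = sorted(set(refs))  # ignore duplicate entries within one paper
        citations.update(unique_refs)
        # every unordered pair cited by the same paper counts as one co-citation
        for pair in combinations(unique_refs, 2):
            cocitations[pair] += 1
    return citations, cocitations

# Hypothetical citing papers, each given as its list of cited references
papers = [
    ["RefA", "RefB", "RefC"],
    ["RefA", "RefB"],
    ["RefA", "RefD"],
]
freq, links = cocitation_counts(papers)
print(freq["RefA"])             # 3 -> node size: cited by three papers
print(links[("RefA", "RefB")])  # 2 -> link weight: co-cited twice
```

Sorting each reference list before forming pairs means each unordered pair gets a single key, so ("RefA", "RefB") and ("RefB", "RefA") are never counted separately.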
In this study, the bibliometric analysis was performed for the entire period of the publications (1991 to 2020) and for each decade. Each dataset was analyzed to evaluate how the relationships between key ideas in the field evolved in each period [49]. Citation clusters were computed for all periods with the clustering algorithms implemented in CiteSpace [48]. Cluster labels were extracted from article titles using both latent semantic indexing (LSI) and the log-likelihood ratio (LLR). Figure 2 shows the co-citation network structure over the 30-year period, created with CiteSpace (v.5.6 R2). The software generated a co-citation network of 301 nodes and 1062 links. In this network, each node represents a reference cited within the core dataset, and the size of the node represents its citation frequency. A link between two nodes represents a co-citation relationship, and colors represent the year of publication. The top 10 most cited publications are overlaid on the network. The font size of each reference indicates its citation frequency (see Appendix A for the list of publications), and its position corresponds to the position of its node. These positions show that most of the highly cited publications were published between 2005 and 2013. The co-citation network structure was also used to identify clusters in order to build a knowledge domain. Figure 3 shows the network with ten cluster labels extracted from publication titles using the LLR method, the default labeling algorithm in CiteSpace. LLR was chosen mainly because it allows the computation of a p-value for each cluster. Table 2 presents the ten clusters in this co-citation network with their labels extracted by the LSI algorithm, as well as the labels and p-values from the LLR algorithm.
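To show how an LLR label can carry a p-value, here is a minimal Python sketch of Dunning's log-likelihood ratio (G²). It assumes a 2 × 2 contingency table of term counts inside versus outside a cluster and a chi-square reference distribution with one degree of freedom. CiteSpace's exact term extraction and weighting may differ, and the counts in the example are hypothetical.

```python
import math

def llr_score(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: occurrences of the term in the cluster's titles
    k12: occurrences of the term outside the cluster
    k21: occurrences of all other terms in the cluster
    k22: occurrences of all other terms outside the cluster
    """
    table = [[k11, k12], [k21, k22]]
    n = k11 + k12 + k21 + k22
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # a zero cell contributes 0 (0 * log 0 taken as 0)
                expected = rows[i] * cols[j] / n
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2

def llr_p_value(g2):
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(g2 / 2.0))

# Hypothetical counts: a title term that is frequent in one cluster and rare elsewhere
g2 = llr_score(20, 5, 180, 1795)
print(f"G^2 = {g2:.1f}, p = {llr_p_value(g2):.2e}")
```

A term spread evenly across clusters yields G² close to 0 and a p-value close to 1. A term concentrated in one cluster yields a large G² and a small p-value, which makes it a good candidate label.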
Earlier studies clustered in research domains related to the monitoring of cyanobacteria or their pigments. In the second decade, studies focused on algorithm development for atmospheric correction and for retrieving cyanobacterial pigments. In the third decade, studies focused on quasi-analytical approaches for estimating PC concentration and on in situ measurements of remote sensing reflectance (Rrs).

30 Years of Remote Sensing of Phycocyanin
Table 2 shows the main labels extracted by the LSI and LLR methods for each cluster, ordered by mean cited year. Three of the cluster labels were the same for both methods; the labels differed for clusters 0, 1, 4, 5, 7, 9, and 11 (see Table 2). Table 2 also lists the mean publication year of each cluster, which reveals the same evolution of research topics. Early publications mainly covered biological and biochemical topics, whereas topics related to remote sensing and algorithm development became prominent in the second decade. The full list of labels is presented in Appendix B.
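Ordering clusters by the mean publication year of their members, as done for Table 2, can be sketched as follows. The cluster ids and member years are hypothetical, and the sketch assumes that each cluster's member publication years are already known.

```python
from statistics import mean

def clusters_by_mean_year(cluster_members):
    """Return (cluster id, mean publication year) pairs sorted from oldest to newest.

    cluster_members maps a cluster id to the publication years of its cited references.
    """
    summary = [(cid, mean(years)) for cid, years in cluster_members.items()]
    return sorted(summary, key=lambda item: item[1])

# Hypothetical clusters and the publication years of their members
clusters = {
    0: [2012, 2014, 2016],
    1: [1995, 1998, 2000],
    2: [2005, 2007, 2009],
}
result = clusters_by_mean_year(clusters)
for cid, year in result:
    print(f"cluster #{cid}: mean year {year:.1f}")
```

Reading the sorted list from top to bottom gives a rough chronology of the research fronts.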

Decade-by-Decade Analysis
A decade-by-decade analysis was used to compare how the remote sensing of PC research field evolved.
Table 3. Cluster labels of remotely sensed phycocyanin research over 10 years (1991–2000).
Figure 4 shows the co-citation network of remote sensing of PC publications from 1991 to 2000. There were only a few publications in this period, and they fell into just two co-citation clusters. These two clusters contain 41 nodes and 126 links. Their main labels were (1) dense outdoor algal cultures and (2) high resolution airborne remote sensing (see Table 3). The two clusters reflect the two research lines of the field at the time: (1) image acquisition and processing and (2) biochemical characterization. The network of the first decade is small because few publications appeared in this period. Nevertheless, studies in this period were based on the acquisition of radiance data and the determination of phytoplankton pigments. Figure 5 shows the co-citation network of remote sensing of PC publications from 2001 to 2010. In contrast to the previous decade, ten co-citation clusters were identified in this period. The numbers of nodes and links also increased, to 166 and 612, respectively. The main label for each cluster is overlaid on the network in Figure 5 and listed in Table 4. Surprisingly, some labels were related to location, such as "Lake Erie", "great lakes" and "eutrophic shallow lake". This indicates that location was an important driver of research during this period. The focus on larger lakes could be related to the spatial resolution of MERIS and the Moderate Resolution Imaging Spectroradiometer (MODIS), which were commonly used sensors in publications from this period.
Table 4 presents the labels for each cluster, arranged by the mean year of the citations, for remote sensing of PC publications from 2001 to 2010. As in the analysis of the entire period in Section 3.1, the labels in this decade were extracted with the LSI and LLR methods. For this period, the two methods extracted only one common label; the other nine labels differed. Interestingly, the LSI method also extracted a location as the main label for one co-citation cluster. Unlike the LLR locations, this one was China, which indicates the growth of Chinese studies in this field. Another notable LSI label is "Landsat TM data". Landsat TM is not the main sensor for retrieving PC, but some studies used empirical relationships to derive PC concentration in small to medium aquatic systems [17,50,51].
Figure 6 shows the co-citation network of remote sensing of PC publications from 2011 to 2020. This last decade produced the largest co-citation network, with twelve clusters, 338 nodes, and 1462 links. The main label for each cluster is overlaid on the network in Figure 6 and listed in Table 5. As in the previous period, locations were important labels for some co-citation clusters. However, instead of large lakes, this period shows a variety of locations, such as "tropical eutrophic reservoir", "deep reservoir", "near coastal transition waters" and "eastern Iberian Peninsula". This indicates a geographic expansion of research on remote sensing of PC, which is now conducted at different locations and latitudes. The focus moved from large temperate lakes in Period II to tropical and coastal waters in Period III. Table 5 presents the main label for each cluster, arranged by the mean year of the citations, for remote sensing of PC publications from 2011 to 2020.
In contrast to Period II, the labels extracted by the LSI and LLR methods in this period were completely different, and no cluster had a label shared by both methods. Some labels are still shared with Periods I and II, but new labels appeared, such as "new scheme", "theoretical basis" and "modern robust approach". This indicates the development of approaches different from those of Period II. Overall, this period can be characterized by two developments: the addition of new study sites (which is important for assessing the global coverage of PC) and the development of new remote sensing algorithms.
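The decade-by-decade comparison amounts to rebuilding the co-citation network separately for each time slice and comparing network sizes (nodes and links), as reported above for Periods I to III. Below is a minimal sketch with hypothetical citing papers. It deliberately omits the per-slice node-selection thresholds that CiteSpace applies, so its counts are not comparable to the values reported in this study.

```python
from collections import Counter
from itertools import combinations

DECADES = [(1991, 2000), (2001, 2010), (2011, 2020)]

def slice_network(citing_papers, first_year, last_year):
    """Build a co-citation network from papers published within one time slice.

    citing_papers: list of (publication year, list of cited references).
    Returns the number of nodes (cited references) and links (co-cited pairs).
    """
    nodes = set()
    links = Counter()
    for year, refs in citing_papers:
        if not first_year <= year <= last_year:
            continue  # the paper belongs to another decade
        unique_refs = sorted(set(refs))
        nodes.update(unique_refs)
        for pair in combinations(unique_refs, 2):
            links[pair] += 1
    return len(nodes), len(links)

# Hypothetical citing papers: (year, cited references)
corpus = [
    (1997, ["RefA", "RefB"]),
    (2006, ["RefA", "RefC", "RefD"]),
    (2009, ["RefC", "RefD"]),
    (2015, ["RefC", "RefE", "RefF", "RefG"]),
]
for first, last in DECADES:
    n_nodes, n_links = slice_network(corpus, first, last)
    print(f"{first}-{last}: {n_nodes} nodes, {n_links} links")
```

Because each slice is built independently, a reference cited in several decades (such as "RefC" here) appears as a node in each of those slices.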

Discussion
The increasing number of publications in the last decade may be related to the increasing number of HAB occurrences worldwide. HABs have been known to affect animal health since the end of the 19th century [52], but several studies indicate that they are currently increasing globally. A recent study in the United States projected that the number of HAB days will increase from about seven days per year per waterbody to 18–39 days by 2090 [53]. This increase in HAB events, together with the growing interest of policymakers in developing monitoring tools, could explain the increase in publications observed in the last decade (Figure 1). The development of orbital sensors with water quality monitoring capabilities also played an important role in the growth of publications in this field [54].
The results of Section 3 and previous algorithm reviews [19,23–27] point to several perspectives for the next decade of research on estimating PC concentrations from remotely sensed data. The co-citation cluster labels showed an evolution in study site locations. Research started locally, with airborne imaging of specific lakes and laboratory experiments (Period I, Figure 4). Earth observation satellites then allowed studies to shift from local sites and laboratory cultures to larger lakes, such as the Great Lakes in the United States and Lake Taihu in China (Period II, Figure 5). The use of larger aquatic systems as study sites is probably related to the spatial resolution of the orbital ocean color sensors used in this period: MERIS (300 m) and MODIS (1000 m). Additionally, only MERIS images allowed PC to be monitored through its absorption feature at 620 nm (OLCI images are now also available). Because of these spatial limitations of satellite sensors, PC could only be monitored in larger aquatic systems. Period III showed an expansion to lower latitudes, where high temperatures favor cyanobacterial growth [55]. Higher temperatures also increase internal phosphorus loading and the rates of soil phosphorus regeneration, which serve as nutrient sources for cyanobacteria blooms [56]. Thus, study sites in tropical aquatic systems and coastal areas were investigated. Interestingly, Landsat imagery was used during this period even though it lacks a spectral band centered at 620 nm, the PC absorption peak. This could be related to the need to monitor medium to small aquatic systems, which are commonly used for water supply.
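The resolution argument above can be illustrated with simple arithmetic. The Python sketch below counts whole pixels that fit inside a hypothetical square 1 km × 1 km reservoir after discarding a one-pixel shoreline buffer, a common precaution against land adjacency effects. The square shape, the one-pixel buffer, and the neglect of pixel-grid alignment are all simplifying assumptions. The MERIS and MODIS pixel sizes come from the text; the 20 m value for Sentinel-2 MSI is an assumed representative value, since MSI bands range from 10 to 60 m.

```python
def interior_pixels(lake_side_m, pixel_m, buffer_pixels=1):
    """Count whole pixels inside a square lake after dropping a shoreline buffer.

    Assumes a square lake aligned with the pixel grid. The buffer removes
    pixels next to the shore, which are the most affected by mixed land-water
    signals and adjacency effects.
    """
    pixels_across = lake_side_m // pixel_m  # whole pixels along one side
    usable = max(0, pixels_across - 2 * buffer_pixels)
    return usable ** 2

# Pixel sizes: MERIS and MODIS as given in the text; 20 m is an assumed MSI value
sensors = {"MODIS": 1000, "MERIS": 300, "MSI (20 m)": 20}
for name, pixel in sensors.items():
    # hypothetical 1 km x 1 km reservoir
    print(name, interior_pixels(1000, pixel))
# MODIS 0
# MERIS 1
# MSI (20 m) 2304
```

Under these assumptions, a 1 km² reservoir is invisible to MODIS, yields a single usable MERIS pixel, and is covered by thousands of 20 m pixels. This is consistent with the shift toward larger lakes in Period II and toward finer-resolution sensors for small systems.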
A similar trend has been observed for other water quality parameters. Newer satellite sensors with higher spatial resolution, such as the Sentinel-2 MultiSpectral Instrument (MSI), have been used to monitor chl-a and water transparency in several small to medium inland aquatic systems. Nevertheless, studies of environments in the Southern Hemisphere are still lacking, especially in the equatorial regions of Latin America and Africa. In the equatorial region, the small annual temperature range supports the development of HABs, which are mainly controlled by wind and rainfall patterns over the watershed [57]. Therefore, one perspective for this field is to continue this geographic expansion with more studies collecting data from equatorial regions. To do so, it is important to establish partnerships with institutions in these regions, not only for data acquisition but also for the development of the research field. A globally representative dataset would allow future researchers to develop a universal algorithm or algorithms for specific optical water types (especially oligotrophic waters).
The label analysis also showed a strong presence of satellite sensors such as Landsat TM, MODIS, and MERIS. However, as pointed out by Yan et al. [25], estimating PC in small inland waters from satellite images is still challenging. No sensor yet combines high spatial and spectral resolution, and no multispectral sensor has the spectral bands needed to monitor PC. From a cyanobacteria monitoring perspective, the revisit time of the sensor also matters, since blooms can form within a few hours. Beck et al. [24] suggested that future satellite sensors for water quality should have a spectral band centered at 620 nm and a spatial resolution between 20 and 90 m. However, since remote sensing of phycocyanin relies on passive optical methods, cloud cover is a major issue, especially in equatorial regions. For this reason, unmanned aerial vehicles (UAVs) have been tested as a tool to collect images with high spatial and spectral resolution over small water bodies [25]. Their use is still limited by battery capacity, which prevents the imaging of large areas. Nevertheless, UAVs can target specific regions known to be the starting points of blooms, and monitoring these small regions could then support predictions of cyanobacterial growth. Thus, one future perspective for remote sensing of PC is more UAV-based studies.
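The interplay between revisit time and cloud cover can be made concrete with a short calculation. The Python sketch below estimates the probability of at least one cloud-free overpass during a bloom window. It assumes one overpass every `revisit_days` and independent cloud cover between overpasses. Real cloud cover is temporally correlated, so the estimates are optimistic. The 10-day window, the 70% cloud probability, and the revisit times are hypothetical illustrative values.

```python
def prob_clear_view(window_days, revisit_days, cloud_prob):
    """Probability of at least one cloud-free overpass within a time window.

    Assumes one overpass every `revisit_days` and that cloud cover on
    different overpasses is independent with probability `cloud_prob`.
    """
    overpasses = window_days // revisit_days
    return 1.0 - cloud_prob ** overpasses

# Hypothetical 10-day bloom window over a site that is cloudy 70% of the time
for revisit in (1, 2, 5):
    print(revisit, round(prob_clear_view(10, revisit, 0.7), 3))
```

Under these assumptions, daily revisits give a clear view with a probability of about 0.97, while a 5-day revisit gives only about 0.51. This illustrates why both revisit time and cloud cover matter for bloom monitoring in cloudy regions.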
The results also showed an evolution in algorithm development, from empirical and semi-empirical to semi-analytical and quasi-analytical algorithms. Figure 3 and Table 2 show that recent studies have focused on quasi-analytical algorithms. Riddick et al. [26] showed that semi-analytical and quasi-analytical models perform best for estimating PC. This is mainly because these two types of bio-optical algorithms account for the effect of variations in phytoplankton inherent optical properties on the water-leaving signal. Similarly, Beck et al. [24] reported that semi-analytical algorithms had an advantage in estimating blue-green algae. Thus, as new instruments for measuring optical properties are developed, semi-analytical and quasi-analytical algorithms are expected to be improved and validated in the near future.
Finally, bio-optical algorithms (especially semi-analytical and quasi-analytical ones) need accurate atmospheric correction to perform well on satellite images. Over the 30 years evaluated, atmospheric correction emerged as an important topic in Period II, yet it remains one of the main challenges over inland waters. Shi et al. [27] highlighted that atmospheric correction is still difficult in turbid inland waters, especially during algal blooms, because blooms directly affect remote sensing reflectance in the near-infrared. Atmospheric correction over these aquatic systems is also influenced by several factors, such as the following [58]:
- proximity to terrestrial sources of atmospheric pollution;
- adjacency effects from neighboring land pixels;
- undulating topography around the water body;
- non-negligible near-infrared water reflectance due to high sediment concentrations in inland waters;
- variations in the altitude of the inland water surface above mean sea level.
Because this manuscript reviews PC publications, studies or comparisons focused on atmospheric correction are not included here.

Conclusions
This study represents a new type of review in which algorithms and methods were not explored in depth. Instead, it presents a bibliometric analysis of scientific production in the Web of Science database for the terms "remote" + "sensing" + "phycocyanin". This approach does not describe or compare algorithms or methods. Since these topics have been covered extensively by previous reviews and comparisons [19,23–27], another algorithm review was not needed. Because of that, this review relied on co-citation analysis and mapping to evaluate the evolution of the field over the last 30 years.
The results presented in this review are unique since, to my knowledge, this is the first bibliometric analysis of this field. The number of papers published on this topic (see Figure 1) shows that the last two decades were important for consolidating the topic within the remote sensing community. The decade-by-decade results showed an evolution in the geographic coverage of remote sensing of PC, in three stages:
- Period I (1991–2000): local studies using airborne imagery and laboratory cultures.
- Period II (2001–2010): large temperate lakes monitored via satellites.
- Period III (2011–2020): expansion to tropical and coastal environments.
In terms of techniques, another evolution took place, from semi-empirical algorithms in the early years of the field to semi-analytical and then quasi-analytical algorithms in recent years. The 30-year analysis (Figure 3 and Table 2) showed that physics-based algorithms (semi-analytical and quasi-analytical) have been developed more often in the last decade. This is mainly because new instruments for measuring optical properties allow the acquisition of data to validate bio-optical models.
Future research is likely to follow the topics highlighted in Period III:
(1) data acquisition from different locations to validate global studies, especially in oligotrophic to mesotrophic environments;
(2) the development of a universal approach for retrieving PC concentration from Earth observation;
(3) the development of orbital sensors that allow the assessment of PC from space.
Atmospheric correction algorithms, especially for inland waters, also need to be developed for PC algorithms to perform well. Therefore, although atmospheric correction is not strongly connected to this field, it is essential.