On the Prediction of Upwelling Events at the Colombian Caribbean Coasts from Modis-SST Imagery

The upwelling cores on the Caribbean Colombian coasts are mainly located at the Peninsula de la Guajira and Cabo de la Aguja. We used monthly averaged Moderate Resolution Imaging Spectroradiometer (MODIS) sea surface temperature as the only information to build up a prediction model for the upwelling events. This comprised two steps: (i) the reduction of the complexity by means of the Karhunen–Loève transform and (ii) a prediction model of time series. Two prediction models were considered: (a) a parametric autoregressive-moving average (ARMA) time series from the Box–Jenkins methodology and (b) a harmonic synthesis model. The harmonic synthesis also comprised of two steps: the maximum entropy spectral analysis and a least-squares harmonic analysis on the set of frequencies. The parametric ARMA time series model failed at the time of prediction with a very narrow range, and it was quite difficult to apply. The harmonic synthesis allowed prediction with a horizon of six months with a correlation of about 0.80. The results can be summarized using the time series of the weights of the different oscillation modes, their spatial structures with the nodal lines, and a high confidence model with a horizon of prediction of about four months.


Introduction
One of the nine upwelling areas in the Southern Caribbean [1] is located in Punta Gallinas and Cabo de la Aguja, on the Peninsula de la Guajira (Figure 1). Deep currents travel from the northward branch of the Brazilian current, reaching the Caribbean Sea as the Caribbean current after crossing the Lesser Antilles [2,3]. A description of the upwelling foci and an exhaustive study on the oceanography, nutrient and pigment distribution, and a discussion of the role of the upwelling filaments in the enrichment of open-sea areas in the Caribbean can be found in [4]. Remote sensing studies in the area started with the use of coastal zone color scanner (CZCS) imagery, which showed a low phytoplankton concentration in the open ocean and high concentrations at the coastal areas where upwelling occurs [5]. A study of the upwelling foci along the Southern Caribbean coast using advanced very high resolution radiometer (AVHRR) imagery for the year 1996 can be found in [6]. Upwelling events from AVHRR imagery were also studied in [7], where remote sensing information was combined with meteorological data to link the wind forcing with the sea surface temperature (SST) field. Upwelling events occur twice per year. The most important event occurs when the intertropical convergence zone (ITCZ) is at its southernmost position (December until April-May) with the prevailing trade winds acting in the Southern Caribbean. The second and weaker event occurs in about June-August when the ITCZ is at its northernmost position. This event is caused by intensification of the Caribbean low-level jet (CLLJ) [8][9][10][11].
Sensors 2019, 19, x FOR PEER REVIEW 2 of 13 convergence zone (ITCZ) is at its southernmost position (December until April-May) with the prevailing trade winds acting in the Southern Caribbean. The second and weaker event occurs in about June-August when the ITCZ is at its northernmost position. This event is caused by intensification of the Caribbean low-level jet (CLLJ) [8][9][10][11]. On the other hand, we studied the upwelling at the Peninsula de la Guajira from several points of view. The application of mathematical morphology [12,13] revealed many internal features of the SST signature. Computation of the fractal dimension of the skeletons of the SST signatures and joint analysis with climate indexes provided insights on the influence of meteorological forcing on the upwelling system. The influence of the upwelling at the Peninsula de la Guajira on some coastal water bodies was studied from a numerical point of view [14].
In this study, we consider the prediction of the upwelling events at the Peninsula de la Guajira from monthly averaged MODIS-SST imagery. Higher frequency composites show cloud cover, and the main objective of this work is the development of a prediction model. This comprises two steps. The first is the reduction of the complexity by means of the Karhunen-Loève transform (KLT). This allows the decomposition of a time series of digital images onto a double-linked basis set consisting of orthonormal spatial modes and their corresponding temporal modes. The second step is the prediction of the temporal modes. This is carried out by two different methods. The Box-Jenkins methodology for time series models (ARMA) produces low-quality predictions. However, the study of frequency composition to apply harmonic synthesis presents some difficulties because of the short length of the time series. The power spectral density (PSD) is computed by maximum entropy spectral analysis (MESA) to avoid spectral leakage of the Fourier transform [15]. The set of frequencies is used to build up a harmonic regression, leading to a high-confidence prediction model. This manuscript is structured as follows: the descriptions of the available imagery, its preprocessing, a brief introduction of the KLT, some guidelines about reading the information from it, On the other hand, we studied the upwelling at the Peninsula de la Guajira from several points of view. The application of mathematical morphology [12,13] revealed many internal features of the SST signature. Computation of the fractal dimension of the skeletons of the SST signatures and joint analysis with climate indexes provided insights on the influence of meteorological forcing on the upwelling system. The influence of the upwelling at the Peninsula de la Guajira on some coastal water bodies was studied from a numerical point of view [14].
In this study, we consider the prediction of the upwelling events at the Peninsula de la Guajira from monthly averaged MODIS-SST imagery. Higher frequency composites show cloud cover, and the main objective of this work is the development of a prediction model. This comprises two steps. The first is the reduction of the complexity by means of the Karhunen-Loève transform (KLT). This allows the decomposition of a time series of digital images onto a double-linked basis set consisting of orthonormal spatial modes and their corresponding temporal modes. The second step is the prediction of the temporal modes. This is carried out by two different methods. The Box-Jenkins methodology for time series models (ARMA) produces low-quality predictions. However, the study of frequency composition to apply harmonic synthesis presents some difficulties because of the short length of the time series. The power spectral density (PSD) is computed by maximum entropy spectral analysis (MESA) to avoid spectral leakage of the Fourier transform [15]. The set of frequencies is used to build up a harmonic regression, leading to a high-confidence prediction model. This manuscript is structured as follows: the descriptions of the available imagery, its pre-processing, a brief introduction of the KLT, some guidelines about reading the information from it, and a description of the time-series prediction model are presented in Section 2. The results and the discussion are presented in Section 3. Finally, the conclusions are drawn in Section 4.

Satellite SST Imagery
The available satellite imagery consisted of a set of 118 monthly composite MODIS-SST images with a 1 km 2 spatial resolution (December 2002 to September 2012). They were downloaded from [16], but they were originally generated by the Institute of Marine Remote Sensing at the College of Marine Science at University of South Florida. The digital images were cut to the study area (NW corner at [16 • N, 77 • W], SE corner at [10.5 • N, 71.5 • W]) to focus the analysis on the Peninsula de la Guajira. The digital images were transformed to an 8-bit grey scale for computation and were translated to degrees Celsius at the time of presenting the results.
The upwelling system at the Caribbean Colombian coasts is linked to the dynamics of the Atlantic ITCZ. This is defined as a narrow zonal strip where atmospheric convergence of both hemispheres occurs. The generation of the westward trade winds conforms to the Caribbean low-level jet (CLLJ) [8][9][10][11]. The latitude of the ITCZ spans from 4 • N to 13 • N. From December to April, the ITCZ is in its southernmost position at around 4 • N, causing intensification of the trade winds and moving them over the southern Caribbean Sea, generating the main upwelling episode. A full discussion can be found in [3,4,[7][8][9][10][11]17].
An example of the upwelling event from November 2007 to June 2008 is shown in Figure 2. discussion are presented in Section 3. Finally, the conclusions are drawn in Section 4.

Satellite SST Imagery
The available satellite imagery consisted of a set of 118 monthly composite MODIS-SST images with a 1 km 2 spatial resolution (December 2002 to September 2012). They were downloaded from [16], but they were originally generated by the Institute of Marine Remote Sensing at the College of Marine Science at University of South Florida. The digital images were cut to the study area (NW corner at [16°N, 77°W], SE corner at [10.5°N, 71.5°W]) to focus the analysis on the Peninsula de la Guajira. The digital images were transformed to an 8-bit grey scale for computation and were translated to degrees Celsius at the time of presenting the results.
The upwelling system at the Caribbean Colombian coasts is linked to the dynamics of the Atlantic ITCZ. This is defined as a narrow zonal strip where atmospheric convergence of both hemispheres occurs. The generation of the westward trade winds conforms to the Caribbean lowlevel jet (CLLJ) [8][9][10][11]. The latitude of the ITCZ spans from 4°N to 13°N. From December to April, the ITCZ is in its southernmost position at around 4°N, causing intensification of the trade winds and moving them over the southern Caribbean Sea, generating the main upwelling episode. A full discussion can be found in [3,4,[7][8][9][10][11]17]. An

The Karhunen-Loève Transform
The KLT was independently proposed by [18,19]. The method is widely known under a variety of names, but the main basis is the so-called Hotelling transform [20], also known as principal component analysis [21,22]. The KLT is a commonly used tool to project a time series of a two-dimensional scalar field onto a double empirical-linked basis conformed by self-oscillation spatial modes and their corresponding temporal weights. Decreasing the dimensions of the problem was one of its first applications [23], and it is commonly used for the denoising of time series of two-dimensional fields or for facial and pattern recognition [24][25][26].
In the frame of the KLT, each scene of a time series of equally sampled digital images is named snapshot, Φ n (x, t), where x represents the position vector and t stands for time. Their ensemble, , is named super-snapshot, and it must be decomposed of using the KLT. Let a super-snapshot of n be centered on two-dimensional digital images Φ(x, t) − Φ(x). The mean field is Φ(x). Consider the decomposition of the n-th snapshot, Φ n (x, t), in terms of the sum of an orthonormal basis of m spatial modes ,m and a factor to measure the importance (a singular value) of each i-th mode, M i , in such a way that The product of the i-th temporal and spatial terms is the i-th mode. However, it is convenient to compute them separately. The usual dual estimates are where <> stands for the projection operator in the Hilbert space [27]. This can be also written as an eigenproblem: where the i-th eigenvalue Λ i is related to the i-th singular value M i by Λ i = M 2 i and the symmetric, positively defined covariance matrix C tt = Φ(x, t)Φ(x, t ) . The latter is computed as the autocorrelation function of the super-snapshot and stored in matrix form.
The easiest approach to carry out the eigen-decomposition of Equation (3) is to consider the expression of one snapshot in terms of Equation (1): where a i (t) and ψ i (x) are the i-th temporal and spatial modes, respectively, and µ i is the associated singular value that gives the importance of the i-th mode. The above self-covariance matrix is then written by means of singular value decomposition: taking advantage of the ortho-normality of each two bases. The identity described by Fukunaga [22] for the temporal covariance matrix is easily reached: where λ k = µ 2 k is the k-th eigen-value. Finally, the eigen-functions are computed by writing the self-covariance matrix by using a combination of the temporal correlation between times t and t' and the spatial modes weighted by their variances: The following remarks could be useful for clarifying the above exposition and for computational details. As usual, in the methodology of the principal component analysis, the variables must be centered. The mean field must be computed pixel-by-pixel to prevent bias error through the averaging of undesired areas. So, the first result from the KLT is the mean field, Φ(x), the second result is the set of spatial modes, and the third is the set of temporal modes. In addition, the spectrum of the eigenproblem sheds light on how many modes must be considered.
The spatial modes give the spatial structure of the oscillation and the intensity (amplitude) of the spatial variation. These modes have time-invariant lines (zero amplitude) or nodal lines, and the field pivots around them. The temporal modes give the corresponding time evolution. They allow the correlation to any exogenous variable to identify external influences if required. The importance of each mode allows us to simplify the problem and filter out the important amount of noise that is always present in any geophysical signal.
Another question is how many eigenmodes must be computed. When a set of N images is submitted to a KLT, N modes are computed, as in the principal component analysis. However, this may be not the best option, because the first eigenvalue is sensitive to round-off errors, and its correction is very difficult. It is convenient to think that the covariance matrix holds the signal and noise subspaces and that the eigenmodes to be computed are assigned to the signal subspace. From a practical point of view, the number of eigenmodes can be one-fifth, and perhaps one-quarter, of the number of available images.

Parametric Time-Series Models
After Box-Jenkins, the most celebrated parametric time-series model is the auto-regressive integrated moving averaged (ARIMA) model [28,29]. The integration is introduced to take trends into account, and the AR and MA terms are devoted to the so-called deterministic and noise parts, respectively. The temporal modes from the KLT oscillate around a mean value with no trend, so the integration is not needed, and it leads to the ARMA(p,q) model: where p and q represent the order of the AR and MA parts, respectively; a n represents the reflection coefficients; b n represents the coefficients of the MA part; and θ t is the noise at time t. The coefficients are computed by the Yule-Walker equations, and they must verify the Levinson recursion to ensure that the roots of the polynomials in Equation (8) lie in the unit circle [28,29]. The parametric time-series models are barely constrained by the parsimony principle. This means that the order of the model must be the lowest possible [28,29]. Once the coefficients of Equation (8) are known, the prediction is easily computed.

Harmonic Prediction Model
The harmonic prediction model in based on the frequency composition of the time series. Because the time series of the modes are short, the Fourier transform is not efficient for computing the power spectral density due to the spectral leakage and low resolution [15]. The maximum entropy spectral analysis (MESA) was used instead. This consists of fitting the autoregressive parametric model x(t) = 1 + P n=1 a n ·x t−n .
By taking its z-transform and making a sine-cosine recursion, the result is a Fourier-like decomposition with very sharp peaks and very well located frequencies [15]. Once the spectral information has been computed and the frequency composition is known, the harmonic least-squares fit is which gives the mean value w 0 and the amplitudes of the circular functions [a,b] i . Now, it is possible to compute a prediction at a given time.
The key point is the determination of the order of the AR model. This is not constrained to the parsimony principle, and the order of the AR model can be chosen to be as high as needed from the considerations of Wold's Theorem [28]. The upper limit occurs when spontaneous line splitting is observed [15].

KLT
The first 112 available MODIS-SST images (December 2002 to March 2012) were processed using the KLT (Section 2.2). The cumulative explained variance was computed from the spectrum of the eigenproblem (Figure 3a). The first mode explained up to 93% of the whole variance, and the first five modes explained close to 99%. The temporal weights of the first four are presented in Figure 3b-d. The first and most important mode presents a very regular pattern with a period of about one year (Figure 3b). The second and third modes (Figure 3d) have many short-period components and a smooth evolution. The fourth and higher modes are minor modes. It is desirable for the second to fourth modes to increase their explained variance by about 5% and to take into account short-term oscillations for predictions.
The logarithmic MESA-PSD values of the first four temporal modes are shown in Figure 4a-d, and more relevant periods can be found in Table 1. The first and most important mode (Figure 4a) presents a well-defined temporal pattern (Figure 3b) with a period of one year and six months (Figure 4a). The second mode (Figure 4b) presents the same periods and some others, such as 9, 6.8, 5.2, and 4 months. The third and fourth modes (Figure 4c,4d) have the same periods in their compositions, and a new one of 54 months long is detected. The only clear periods for reading are the annual and semiannual periods; they are used in most of the geophysical signals. The other periods are intentionally left for further works, but some of them were previously noted in [8] and [10].

Mode
Period (Months) 1 11.9, 6.8 2 11.9, 9.0, 6.8, 6.0, 5.2, 4.0 3 54.3, 11.9, 9.0, 6.8, 6.0, 5.2, 4.0 4 54.3, 11.9, 9.0, 6.8, 6.0, 5.2, 4.0 The mean SST field and the first four spatial modes are presented in Figure 5. Because the variables submitted to a KLT must be centered, one of the results of the KLT is the average SST field (Figure 5a). It shows a permanent structure at Punta Gallinas and along the Colombian coast, but not at Cabo de la Aguja. The mean value at the upwelling cores is about 22-23 °C, and when there is no upwelling, it is about 28-29 °C. The first mode (Figure 5b) presents the structure of the developed upwelling. The maximum values follow the coast, and all values have the same sign, except at the mouth of Lake Maracaibo and around Santa Marta. The second mode (Figure 5c) presents two differentiated areas. The first is close to the coast, and the other takes the open sea into account, both

Mode
Period (Months) 1 11.9, 6.8 2 11.9, 9.0, 6.8, 6.0, 5.2, 4.0 3 54.3, 11.9, 9.0, 6.8, 6.0, 5.2, 4.0 4 54.3, 11.9, 9.0, 6.8, 6.0, 5.2, 4.0 The mean SST field and the first four spatial modes are presented in Figure 5. Because the variables submitted to a KLT must be centered, one of the results of the KLT is the average SST field (Figure 5a). It shows a permanent structure at Punta Gallinas and along the Colombian coast, but not at Cabo de la Aguja. The mean value at the upwelling cores is about 22-23 • C, and when there is no upwelling, it is about 28-29 • C. The first mode (Figure 5b) presents the structure of the developed upwelling. The maximum values follow the coast, and all values have the same sign, except at the mouth of Lake Maracaibo and around Santa Marta. The second mode (Figure 5c) presents two differentiated areas. The first is close to the coast, and the other takes the open sea into account, both with different signs in their values. The third and fourth modes (Figure 5d,e) show small scattered structures that can be thought of as noise. Although the fourth mode ( Figure 5d) reveals a nice spatial structure, it is necessary to keep in mind that it has a small contribution to the explained variance (Figure 3a). with different signs in their values. The third and fourth modes (Figure 5d,e) show small scattered structures that can be thought of as noise. Although the fourth mode ( Figure 5d) reveals a nice spatial structure, it is necessary to keep in mind that it has a small contribution to the explained variance (Figure 3a). The zero value lines, the nodal lines, are also represented in the spatial modes ( Figure 5). The time-invariant lines in a given mode are where each spatial mode pivots, and they can be considered as the statistical limits of the upwelling events in a given area. The most important mode, i.e., the first one in Figure 5b, has no nodal lines, and all fields will oscillate up and down with the corresponding temporal mode. The highest amplitudes cover the area affected by the upwelling. The second mode (Figure 5c) has a nodal line dividing the domain into two halves with different signs, which oscillate up and down depending on the temporal weight. The highest amplitudes correspond to the areas with the coldest waters, and the nodal line can be considered the limit of the most important part of the upwelling. The third mode (Figure 5d) presents some noise, and the fourth mode (Figure 5e) has a banded structure with three areas that are well-delimited by two nodal lines.

A Prediction Model for the Upwelling Events
The advantage of the KLT is the drastic reduction in complexity, because the modes are considered separable in time and space (see Section 2.2). So, it is possible to build up a simple prediction model from the temporal modes without getting into the complexities of a numerical model. Although the predicting method is customary, its effectiveness depends on the experience of the researcher in time-series analysis. The set of 119 digital images was divided in two. The first 112 images (December 2002 to March 2012) was the training set, and it was submitted to a KLT decomposition. The temporal modes were predicted by two different methods, and their compositions (Equation (4)) were compared to the last six images (April 2012 to September 2012).
The worst situation in which to test the methodology is when the upwelling is deactivated and the warmer water of the open Caribbean reaches the Colombian coasts. Two methods were used to test this unfavorable situation. The first one consists of prediction using the autoregressive-moving average (ARMA) parametric time-series model. An ARMA(2,1) (see Section 2.3) was estimated for most of the temporal modes after the analysis of the autocorrelation and partial autocorrelation functions [28,29] and six projected time steps. One of the temporal modes was fitted to an AR(2) because the MA part could not be computed. The predicted SST field was computed from Equation (4) considering the prediction of the temporal factor and adding the mean SST field. The second method used for prediction was the harmonic method. The spectral information of the temporal modes was computed using MESA ( Figure 4 and Table 1). Then, the least-squares method in Equation (8) led to the prediction model. Figure 6 presents the MODIS-SST images for April to September 2012 (Figure 6a, from left to right), and the predictions of the same scenes using the ARMA model are presented in Figure 6b. The same parameters with harmonic synthesis are presented in Figure 6c. The first inspection reveals that the parametric time-series ARMA mode failed in the prediction, because the temperature range was quite narrow and far from the observed imagery. The harmonic method performed better, and its solutions were smoother, with results close to the observed data.
The correlations between the original MODIS-SST images and the predictions are presented in Table 2. The land was kept out of the analysis to avoid biased results. The goodness of fit was extremely good for the ARMA models (>0.9) for the months of April and May, and the harmonic prediction the worst (about 0.8). This is because there was no upwelling and no organized structures. However, for the rest of the months, when a weak upwelling episode started, the ARMA model predictions were the worst, falling under 0.75, lower than the harmonic prediction (which was consistently about 0.8). It may be concluded that the harmonic method works better for longer time predicting horizons and when there are organized structures. The correlations between the original MODIS-SST images and the predictions are presented in Table 2. The land was kept out of the analysis to avoid biased results. The goodness of fit was extremely good for the ARMA models (>0.9) for the months of April and May, and the harmonic prediction the worst (about 0.8). This is because there was no upwelling and no organized structures.

Conclusions
The Colombian Caribbean upwelling system is almost permanent due to the combination of the morphology of the coast, the bathymetry, and the persistently blowing westward trade winds related to the dynamics of the ITCZ. The KLT is very useful for simplifying the analysis of a time series of digital images to obtain three types of information: the spectrum of the problem, the temporal modes, and the spatial modes. The spatial modes give information on the intensity of the spatial variations and the very important nodal lines. These show a banded spatial structure with an area near the coast, another at the open sea, and the third as a transitional area in the middle. The visual inspection of the temporal weights revealed periodic patterns in many modes.
Two time-series prediction models were tested. The ARMA model presented a very high correlation (>0.9) when there were no upwelling events; otherwise, it must be not considered. The harmonic synthesis performed much better under upwelling conditions (>0.8). The horizon of prediction depended on how developed the upwelling event was, but the harmonic prediction worked better than the parametric ARMA model. The prediction models could be improved by considering additional variables in the models with the contribution of exogenous variables as climate indexes (see [8,10] for two examples). Funding: This work has been partially supported by RNM160 of the local Andalusian government.