Optimal Kernel ELM and Variational Mode Decomposition for Probabilistic PV Power Prediction

Abstract: A probabilistic prediction interval (PI) model based on variational mode decomposition (VMD) and a kernel extreme learning machine optimized by the firefly algorithm (FA-KELM) is presented to tackle the problem of intra-day-ahead photovoltaic (PV) power prediction. Firstly, considering the non-stationary and nonlinear characteristics of a PV power output sequence, the original PV power output series is decomposed using VMD. Secondly, to further improve the prediction accuracy, a KELM is established for each decomposed component, and the firefly algorithm is introduced to optimize the penalty factor and kernel parameter. Finally, the point predicted value is obtained by summing the predicted results of each component, and kernel density estimation is then used to fit the prediction error. The cubic spline interpolation algorithm is applied to obtain the shortest confidence interval. Results from practical cases show that this probabilistic prediction interval achieves higher accuracy than other prediction models.


Introduction
Driven by the severe global situation of fossil fuel depletion and growing environmental pollution, photovoltaic (PV) generation, as an environmentally friendly renewable energy source, has become one of the most widely used power generation methods in the renewable energy industry. PV generation is susceptible to surface solar irradiance, and its output is strongly random, which challenges frequency regulation, peak load regulation, and system reserve. With the increase in grid integration capacity, the randomness of PV generation brings growing risks to power system scheduling and operation. More accurate prediction of PV power can provide a reliable basis for power grid dispatching decisions [1] and is of significant importance for ensuring system security, stability, and optimal operation.
There are four main techniques to predict PV power output, namely physical, artificial intelligence (AI), statistical, and hybrid approaches [2][3][4]. The physical method uses numerical weather prediction (NWP) data and measured data. The statistical method establishes a relationship between historical data and forecasted variables based on data-driven formulations such as regression models [5,6], time series [7], and cluster analysis of the clearness index [8,9]. AI methods include artificial neural networks, among others.

Table 1. Literature review on recent works.

Variational Mode Decomposition Principle
Signal decomposition is often used in hybrid prediction methods. The common signal decomposition techniques are wavelet packet decomposition (WPD) [30] and empirical mode decomposition (EMD) [31]. EMD can decompose signals into components with different frequency characteristics, and the envelope and instantaneous frequency can then be obtained via the Hilbert-Huang transform.
Although the Hilbert-Huang transform is an effective decomposition method, EMD has some drawbacks, such as the lack of a rigorous mathematical foundation, dependence on the interpolation scheme, and sensitivity to sampling and noise. The authors in [31] described that EMD exhibits a lack of "sparseness", whereas VMD can render a somewhat higher degree of sparseness than EMD. To overcome these limitations, in 2014, an alternative multiresolution technique named variational mode decomposition (VMD) was presented [32]. VMD is an entirely non-recursive variational model in which the modes are determined simultaneously. The VMD model looks for a set of modes and their center frequencies such that the band-limited modes reproduce the input signal exactly or in a least-squares sense. The instantaneous frequency of each analytic signal has practical physical significance in VMD, whereas this is not the case for EMD. In this paper, VMD is selected as the signal decomposition method. VMD aims to decompose the original series u into a series of band-limited modes u_k, where each mode is compact around a center frequency ω_k identified during the decomposition. The bandwidth of each mode u_k can be calculated with the procedure presented in [33].
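As an illustration of the decomposition just described, the following is a simplified VMD sketch: frequency-domain Wiener-filter mode updates and power-weighted center-frequency updates, without the boundary mirroring of the full algorithm in [32]. All parameter values and the test signal are illustrative.

```python
import numpy as np

def vmd(signal, K=2, alpha=2000.0, max_iter=500, tol=1e-7):
    """Simplified VMD: alternating mode and center-frequency updates."""
    T = len(signal)
    f_hat = np.fft.fft(signal)
    freqs = np.abs(np.fft.fftfreq(T))          # |f| keeps modes real-valued
    u_hat = np.zeros((K, T), dtype=complex)    # mode spectra
    omega = np.linspace(0.1, 0.4, K)           # initial center frequencies
    half = slice(1, T // 2)                    # positive-frequency half
    pos = np.fft.fftfreq(T)[half]
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Wiener-filter update: mode k takes the residual near omega_k
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            u_hat[k] = residual / (1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            # center frequency = power-weighted mean of the mode's spectrum
            power = np.abs(u_hat[k, half]) ** 2
            omega[k] = np.sum(pos * power) / (np.sum(power) + 1e-12)
        change = (np.sum(np.abs(u_hat - u_prev) ** 2)
                  / (np.sum(np.abs(u_prev) ** 2) + 1e-12))
        if change < tol:
            break
    modes = np.real(np.fft.ifft(u_hat, axis=1))
    return modes, np.sort(omega)

# Decompose a toy two-tone signal (normalized frequencies 0.02 and 0.2)
t = np.arange(1000)
x = np.sin(2 * np.pi * 0.02 * t) + 0.8 * np.sin(2 * np.pi * 0.2 * t)
modes, omega = vmd(x, K=2)
```

The recovered center frequencies should settle near the two tone frequencies, and the sum of the band-limited modes should reconstruct the input closely.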

Kernel Extreme Learning Machine
An extreme learning machine (ELM) is a single-hidden-layer feedforward neural network (SLFN) whose input weights and hidden layer biases are chosen randomly. Well-known neural network learning methods such as the back propagation (BP) algorithm require the user to set a significant number of training parameters manually, and such a procedure can leave the prediction output in a local optimum. In contrast, the ELM only requires the number of hidden layer nodes to be set, without any need to adjust the hidden layer biases or the network's input weights. The Moore-Penrose generalized inverse matrix theory is used to obtain the optimal output weights of the ELM; to minimize the training error, the output weights can be solved in a single step. The ELM thus has the characteristics of rapid learning and great generalization capability [34][35][36]. However, the number of its hidden nodes is difficult to determine, and its output exhibits random fluctuation. By introducing a kernel function into the ELM, drawing on SVM theory, the kernel extreme learning machine (KELM) algorithm was developed; KELM enhances the algorithm's learning accuracy [37]. The detailed proofs can be found in references [38,39].
For N arbitrary samples (x_i, t_i), where t_i is the target of training sample x_i, x_i ∈ R^n, t_i ∈ R^n, i = 1, 2, ..., N, the ELM's output function is [39]

f(x) = h(x)β = h(x)H^T (I/C + HH^T)^(−1) T,   (1)

where h(x) is the hidden layer output row vector for the input x, which maps the samples from the input space to the hidden layer feature space, L is the number of hidden layer nodes, H is the hidden layer output matrix, I is the identity matrix, C is the penalty coefficient, and T is the target output matrix of the SLFN. If the feature mapping h(x) is unknown, a kernel matrix for ELM can be defined instead, Ω_ELM(i, j) = h(x_i) · h(x_j) = k(x_i, x_j), where k(x_i, x_j) is a kernel function. The output of the KELM model can then be calculated by Equation (2):

f(x) = [k(x, x_1), ..., k(x, x_N)] (I/C + Ω_ELM)^(−1) T.   (2)

In Equation (2), the hidden layer feature mapping h(x) does not need to be defined explicitly, and the number of hidden nodes L does not appear; only the kernel k(u, v), which substitutes for h(x) and L, needs to be specified. Because a fixed kernel function replaces the ELM's random feature mapping, the output weights become robust, and KELM achieves better generalization capability than ELM.
Various kernel functions exist such as polynomial kernel, linear kernel, and radial basis function (RBF) kernel. RBF kernel shows great learning capability in practical challenges and the amount of unknown parameters is less compared to polynomial kernel. Therefore, the RBF kernel function is considered in this work.
The RBF kernel function and the output weight of the KELM model are written as

k(u, v) = exp(−||u − v||^2 / g),   (3)

β = (I/C + Ω_ELM)^(−1) T,   (4)

where g is the kernel parameter, and β is the output weight vector between the hidden and output layers. The ELM with the kernel function overcomes the shortcomings of the traditional ELM [38], but its performance is affected by the penalty coefficient C and the kernel parameter g. Therefore, the firefly algorithm (FA) is chosen to optimize these two parameters.
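A minimal sketch of KELM training and prediction with the RBF kernel, following the closed-form solution β = (I/C + Ω_ELM)^(−1) T; the class name, parameter defaults, and toy data are illustrative:

```python
import numpy as np

def rbf_kernel(A, B, g):
    # k(u, v) = exp(-||u - v||^2 / g)
    d2 = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / g)

class KELM:
    """Kernel ELM: beta = (I/C + Omega)^-1 T, f(x) = k(x, X) beta."""
    def __init__(self, C=100.0, g=1.0):
        self.C, self.g = C, g

    def fit(self, X, T):
        self.X = X
        omega = rbf_kernel(X, X, self.g)           # kernel matrix Omega_ELM
        A = np.eye(len(X)) / self.C + omega
        self.beta = np.linalg.solve(A, T)          # output weights
        return self

    def predict(self, Xq):
        return rbf_kernel(Xq, self.X, self.g) @ self.beta

# Fit a smooth 1-D function as a toy regression problem
X = np.linspace(0, 2 * np.pi, 60)[:, None]
T = np.sin(X).ravel()
model = KELM(C=1000.0, g=0.5).fit(X, T)
pred = model.predict(X)
```

Note that no hidden-layer size is chosen anywhere: the kernel matrix replaces the random feature mapping entirely, which is exactly the property the text attributes to KELM.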

Optimization of KELM's Parameters through FA
Since the kernel parameter and penalty coefficient of KELM are the two major factors affecting the prediction performance, this paper selects the firefly algorithm, which is characterized by few parameters, good stability, and good convergence, to optimize these two parameters and further enhance the prediction model's accuracy. FA is a swarm optimization technique developed by simulating the flashing behavior of fireflies in nature [40]. In the algorithm, each firefly represents a solution in the solution space and is randomly distributed in the search space. Each firefly has its own brightness and neighborhood, known as the radius of the decision domain, r_d^i, 0 < r_d^i < r_s, where r_s is the visual range of the firefly. To simplify the problem, the following three hypotheses are adopted to describe FA: (1) All fireflies are unisex, so a firefly can be attracted to any other firefly; (2) A firefly's brightness is determined by the analytical expression of the objective function. For a maximization problem, brightness is taken to be proportional to the objective function's value; alternative forms of brightness can be defined in a similar way to the fitness function in other optimization techniques; (3) Attractiveness is related to a firefly's brightness: of two flashing fireflies, the dimmer one moves toward the brighter one, and attractiveness decreases with increasing distance between fireflies. A firefly moves randomly in the space if there is no firefly brighter than itself.
The details of the FA are given in [41]. The solution of the KELM based on FA optimization is shown in Algorithm 1.

Algorithm 1. FA-based optimization of the KELM parameters.
  Initialize: population of fireflies x_i (i = 1, 2, ..., n), iteration counter t ← 1
  Set light absorption coefficient γ, MaxGeneration, problem dimension d, attractiveness β, randomization parameter α
  while (t < MaxGeneration)
      for i = 1:n            (all n fireflies)
          for j = 1:n        (all n fireflies)
              if firefly j is brighter than firefly i, move i toward j and update its brightness
          end for j
      end for i
      Rank the fireflies and determine the current best solution
      t ← t + 1
  end while
  Output the optimum C, g
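A generic firefly-algorithm sketch for minimization follows. In the model above, the fitness would be the KELM validation error over candidate (C, g) pairs; here a simple quadratic test function stands in, and all coefficients (population size, γ, α, the decay factor) are illustrative.

```python
import numpy as np

def firefly_optimize(fitness, bounds, n=15, max_gen=50,
                     beta0=1.0, gamma=0.01, alpha=0.2, seed=0):
    """Minimize `fitness` with the firefly algorithm (lower value = brighter)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, float).T
    d = len(lo)
    pop = lo + rng.random((n, d)) * (hi - lo)            # random initial swarm
    light = np.array([fitness(p) for p in pop])
    for _ in range(max_gen):
        for i in range(n):
            for j in range(n):
                if light[j] < light[i]:                  # firefly j is brighter
                    r2 = np.sum((pop[i] - pop[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)   # attractiveness decays with distance
                    pop[i] = (pop[i] + beta * (pop[j] - pop[i])
                              + alpha * (rng.random(d) - 0.5))
                    pop[i] = np.clip(pop[i], lo, hi)
                    light[i] = fitness(pop[i])
        alpha *= 0.97                                    # gradually reduce randomness
    best = int(np.argmin(light))
    return pop[best], light[best]

# Toy run: minimize the 2-D sphere function over [-5, 5]^2
best_x, best_val = firefly_optimize(lambda p: float(np.sum(p ** 2)),
                                    [(-5.0, 5.0), (-5.0, 5.0)])
```

For the KELM use case, each firefly position would encode a (C, g) pair and the returned optimum would be plugged into the prediction model.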

Influential Factors of PV Output
PV output forecasting is a complex nonlinear problem. Therefore, environmental factors and weather conditions should be considered. In this paper, different weather conditions including solar radiation, wind speed, and temperature are studied. The full extent of the impact of these weather conditions on PV power output is analyzed [42][43][44].

PV Output for Different Weather Conditions
The PV outputs under three main weather conditions in 2015, i.e., sunny, cloudy, and rainy days, selected from the Ashland station in Oregon, USA, are shown in Figure 1. From Figure 1, we can conclude that the PV output is stable on sunny days, with a roughly bell-shaped daily profile. On cloudy days, PV output forecasting becomes more challenging because of the greater randomness and volatility of solar radiation. On rainy days, the PV output changes constantly and the average output is very small. This situation affects the safety and stability of PV power plant operation [45]. Therefore, it is essential to classify the large amount of historical PV output data according to weather conditions to enhance the prediction accuracy.

The Influences of Different Meteorological Factors on PV Power Output
Different meteorological factors can change the PV power output; taking all of them into account would increase the complexity of prediction. Historical meteorological data encompassing solar radiation, temperature, and wind speed, together with PV power outputs for the whole year of 2015, are selected, and the Kendall correlation coefficient method is applied to analyze the factors affecting PV power [46,47]. The Kendall rank correlation coefficients are provided in Table 2. As shown in Table 2, the PV power output has its maximum correlation with solar radiation, at 0.977. The correlation between PV power output and temperature is 0.265, a weak correlation, and the correlation with wind speed is 0.094, which is negligible. All three factors are positively related to PV power output. As a result, solar radiation and temperature are chosen as the input variables of the FA-KELM model.
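The Kendall rank correlation used here can be computed directly; below is a minimal O(n^2) sketch of tau-a (no tie correction) on hypothetical radiation/power pairs:

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / (n(n-1)/2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    s = 0.0
    for i in range(n - 1):
        # +1 for each concordant pair (i, j), -1 for each discordant pair
        s += np.sum(np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i]))
    return 2.0 * s / (n * (n - 1))

# Hypothetical radiation (W/m^2) and PV power (kW) samples
radiation = np.array([100.0, 250.0, 400.0, 620.0, 810.0])
power = np.array([0.8, 2.1, 3.9, 6.0, 7.7])
tau = kendall_tau(radiation, power)   # -> 1.0 for a strictly monotone relation
```

Because tau depends only on the ordering of the samples, it captures monotone (not just linear) association, which is why it suits the radiation-power relationship studied here.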

Prediction Interval Evaluation Indices
The prediction interval results are composed of the upper and lower boundaries and correspond to certain expected confidence levels, which are different from point forecasts. The common predictive indices used here are given in Appendix A.

Kernel Density Estimation (KDE)
In order to make the coverage probability of the confidence interval as high as possible and the average interval bandwidth as small as possible, the kernel density distribution is estimated from the predicted error between the point predicted value and the actual value of PV power. For a specific predicted error e, its probability density function can be written as follows [30,48,49]:

f(e) = (1/(Nh)) Σ_{i=1}^{N} k((e − ê_i)/h),   (5)

where k(x) is the kernel function (the Gaussian kernel is used here), ê_i represents the point predicted error samples for PV power, N is the forecast sample size, and h is the bandwidth. In this paper, the bandwidth is calculated by the following equation [50]:

h = 0.9 min(σ, F_IQR/1.34) N^(−1/5),   (6)

where σ stands for the sample standard deviation and F_IQR is the sample interquartile range. If the kernel density estimation were performed around every data point, the computation would grow with the sample size. This work therefore allocates the power prediction errors into equal power sections. Denoting the length of a power section by ∆P and the range of power fluctuation by [P_1, P_h], the i-th section is

P_i = [P_1 + (i − 1)∆P, P_1 + i∆P],   (7)

where i = 1, 2, ..., l, and the number of sections l is calculated as l = (P_h − P_1)/∆P. According to the PV power point prediction value, the probability density curve of the power prediction error is computed by kernel density estimation, and the PV power prediction interval is then obtained under a certain confidence level. Suppose that the probability distribution function of a PV power prediction error e is F(ε), where ε is a random variable of the prediction error; then the probability prediction range of the actual value of PV power under a given confidence level 1 − α is

[P̂ + G(α_1), P̂ + G(α_2)],   (8)

where P̂ is the point prediction value, and G(α_1) and G(α_2) are values of the inverse function G = F^(−1) of the probability distribution function F(ε). Through sensitivity analysis, it is found that when α_1 = α/2 and α_2 = 1 − α_1, the confidence interval is the shortest.
The practical implementation of the prediction interval under a certain confidence level is summarized as follows:
Step 1: Determine the corresponding power section for a predicted value;
Step 2: Find the error probability density curve corresponding to that section, and solve for the corresponding probability distribution function by integration;
Step 3: Adopt the cubic spline interpolation method to fit the probability distribution curve of the prediction error, then solve for the α/2 and 1 − α/2 quantiles of the prediction error of the PV output;
Step 4: Calculate the prediction interval for the power output according to Equation (8).
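As a sketch of this construction, the following builds a Gaussian-kernel density from error samples, integrates it numerically, and inverts the distribution function. Linear interpolation (np.interp) stands in for the cubic spline fit, the bandwidth follows the rule h = 0.9 min(σ, F_IQR/1.34) N^(−1/5), and the synthetic error sample is illustrative.

```python
import numpy as np

def prediction_interval(p_hat, errors, alpha=0.10):
    """KDE of point-prediction errors -> [p_hat + G(a/2), p_hat + G(1 - a/2)]."""
    e = np.sort(np.asarray(errors, float))
    n = len(e)
    iqr = np.percentile(e, 75) - np.percentile(e, 25)
    h = 0.9 * min(e.std(ddof=1), iqr / 1.34) * n ** (-0.2)   # bandwidth rule
    grid = np.linspace(e[0] - 4 * h, e[-1] + 4 * h, 4000)
    # Gaussian-kernel density estimate evaluated on the grid
    u = (grid[:, None] - e[None, :]) / h
    pdf = np.exp(-0.5 * u ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    # numeric distribution function and its inverse G = F^-1
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]
    inv = lambda q: float(np.interp(q, cdf, grid))
    return p_hat + inv(alpha / 2), p_hat + inv(1 - alpha / 2)

# Toy example: errors drawn from a standard normal, point prediction 10.0
rng = np.random.default_rng(1)
errors = rng.normal(0.0, 1.0, 500)
lo, hi = prediction_interval(10.0, errors, alpha=0.10)
```

For standard-normal errors the 90% interval should be roughly 10.0 ± 1.6, i.e., a width near 3.3.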

Probabilistic Prediction Interval Model
A probabilistic prediction interval is formed under a certain confidence level, and the interval predicted value is calculated by determining the lower and upper bounds of the confidence interval, which will provide more comprehensive information for power system decision-makers. The flowchart is shown in Figure 2 below.
The procedure can be explained as follows:
Step 1: Preprocess the power data of the PV power plant with min-max normalization, u_i = (x_i − x_min)/(x_max − x_min), where x_i represents the original input data, x_max = max(x), x_min = min(x), u_i ∈ [0, 1] is the normalized value, and i = 1, 2, ..., m, with m the total number of inputs;
Step 2: Use the clustering algorithm to classify the original PV power data into sunny, cloudy, rainy, and other weather conditions;
Step 3: Use the VMD algorithm to decompose the data for each weather condition, then divide the subseries into a training set and a testing set;
Step 4: Add the corresponding meteorological data to the data set from Step 3, and adopt the FA-KELM algorithm to build the intra-day-ahead forecasting model. Finally, sum the subseries predictions to obtain the final point prediction value;
Step 5: Estimate the kernel density distribution from the error between the point prediction value and the actual value; the confidence interval under a given confidence level is then obtained by the cubic spline interpolation method, completing the prediction interval model;
Step 6: Analyze the predicted results according to the prediction interval evaluation indices.

Data Collection
The data from the Ashland PV power plant with the capacity of 15 kW per unit in Oregon, USA (latitude: 42.19; longitude: 122.70; altitude: 595 m) are selected for modeling and evaluating the prediction performance. The research duration time is between 6 a.m. and 6 p.m. from 1 January 2015 to 31 December 2015, in which some "incomparable" data of 11 days are found by comparing and analyzing the original data [51]. Then, these data are removed in the prediction. The database from the website contains data for every 5 min, and we derive the average data value for every 15 min. We collect data for 12 h per day, that is, there are 48 data points per day with a sampling rate of 15 min per sample. The corresponding weather information per day was obtained from the historical weather data through the website given in reference [52]. The data which fluctuate excessively and irregularly are eliminated. The data sets are normalized to the interval of [0, 1]. In the meantime, 75% of the data are treated as the training set, while the remaining 25% is regarded as the testing set. The original sample data of the PV power series in April 2015 are shown in Figure 3 below.
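The preprocessing described here (averaging 5-min readings into 15-min values, min-max normalization to [0, 1], and a chronological 75/25 split) can be sketched as follows; the function and array names are illustrative, and real plant data would replace the toy series:

```python
import numpy as np

def preprocess(p5min, train_frac=0.75):
    p5 = np.asarray(p5min, float)
    # average consecutive groups of three 5-min samples into 15-min values
    p15 = p5[: len(p5) // 3 * 3].reshape(-1, 3).mean(axis=1)
    # min-max normalization to [0, 1]
    u = (p15 - p15.min()) / (p15.max() - p15.min())
    # chronological 75/25 train/test split (no shuffling for time series)
    cut = int(len(u) * train_frac)
    return u[:cut], u[cut:]

# Toy series of 24 five-minute readings -> 8 fifteen-minute values
train, test = preprocess(np.arange(24.0))
```

Keeping the split chronological matters here: shuffling before splitting would leak future samples into the training set of a forecasting model.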

Cluster Analysis
The self-organizing map (SOM) neural network [53,54] consists of an input layer and a competition layer. The main idea is that neurons in the competition layer compete with each other for the opportunity to respond to the input variables; finally, only one neuron wins. It is mainly used for clustering.

Considering the randomness and intermittent nature of PV power, making predictions directly from the original data would generate significant errors. Therefore, according to the known weather conditions, cluster analysis of the original data samples for the year 2015 is conducted through the SOM neural network. The clustering distribution is shown in Figure 4, and the clustering results are shown in Table 3.

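A minimal numeric sketch of the SOM competitive-learning idea on toy two-cluster data; the grid size, decay schedules, and data are illustrative (the paper clusters daily PV profiles by weather type):

```python
import numpy as np

def train_som(X, grid=(1, 2), iters=600, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny SOM; returns the competition-layer weight vectors."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows * cols, X.shape[1]))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = int(np.argmin(((W - x) ** 2).sum(axis=1)))  # winning neuron
        lr = lr0 * np.exp(-t / iters)                     # decaying learning rate
        sigma = sigma0 * np.exp(-t / iters)               # shrinking neighborhood
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))              # neighborhood function
        W += lr * h[:, None] * (x - W)                    # pull winner (and neighbors) toward x
    return W

def assign(X, W):
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1), axis=1)

# Two well-separated toy clusters standing in for two weather types
rng = np.random.default_rng(1)
a = rng.normal(0.0, 0.3, (30, 2))
b = rng.normal(5.0, 0.3, (30, 2))
X = np.vstack([a, b])
W = train_som(X)
labels = assign(X, W)
```

After training, each competition-layer neuron should specialize in one cluster, so points from the same cluster share a label.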

VMD Decomposition
In this paper, VMD is used to decompose the original PV power data into a finite number of subseries, and a prediction model is then built for each subseries, which reduces the non-stationary characteristic of the prediction data. VMD transfers signal decomposition into the framework of variational theory: the optimal solution of the variational model is obtained by iterative calculation to determine the center frequency and bandwidth of each mode, and the sum of all modes reconstructs the source signal. Empirical mode decomposition (EMD) is another popular decomposition method. It decomposes signals into characteristic modes and has the advantage of not using any predefined basis functions, instead generating intrinsic mode functions adaptively from the analyzed signal.
On sunny days, the data collected from the first 125 days out of 160 days are treated as the training sets, and the data from the last 35 days are regarded as testing sets; these are obtained from the clustering algorithm of Section 5.2. The main parameters that need to be set for the VMD program are the penalty factor α, the discriminant precision e, and the mode number K. K influences the decomposition: the signal will be under-decomposed when K is too small and over-decomposed when K is too large. A simple and effective process is adopted to determine the mode number: a sensitivity analysis is conducted on K values ranging from 3 to 10 with the parameters of the VMD-FA-KELM algorithm presented in Table 4. The analysis shows that when K is greater than 6, the center frequencies of some modes become similar. Therefore, K = 6 is the optimal choice. VMD decomposes the PV power series, with the decomposition results for three weather conditions, i.e., sunny, cloudy, and rainy, presented in Figure 5. IMF1 to IMF6 represent the subseries; only six IMFs have been plotted, as determined by the algorithm. Table 3 depicts the number of days for the other weather conditions, including cloudy, rainy, and fog.

Simulation Results under Different Confidence Levels
To validate the proposed approach, the simulation results under different confidence levels are compared in this section. Sample data of 16 sunny days, 12 cloudy days, and 7 rainy days, obtained from the cluster analysis of Section 5.2, are selected to get the corresponding point prediction values. Based on these, the prediction errors are obtained, and the prediction interval model based on kernel density estimation is then produced. The relevant parameter settings of the proposed model are shown in Table 4 below.
Based on the VMD-FA-KELM algorithm and kernel density estimation, the prediction interval results of sunny, cloudy, and rainy weather for each two-day period at the 90% and 70% confidence levels are shown in Figures 6 and 7, respectively, where the daily time interval is from 6 a.m. to 6 p.m. The corresponding prediction interval indices are given in terms of the prediction interval coverage probability (PICP) and prediction interval normalized averaged width (PINAW).
As shown in Figures 6 and 7, the actual PV power almost always falls within the 90% confidence interval, while a small part falls outside the 70% confidence interval. When the confidence level is high, the average interval width is wider, so the probability of the real PV power falling into the prediction interval is greater; when the confidence level is low, the average width is narrower and that probability is smaller. As the confidence level decreases gradually, the corresponding prediction interval normalized averaged width decreases, and the coverage probability decreases as well. The width of the sunny prediction interval is relatively narrow, which indicates that the sunny data are more stable and the prediction interval accuracy is higher than in cloudy and rainy weather. Further, on the same day, the larger the point prediction error is, the greater the range of error fluctuation is; at noon, the error is the largest and the interval is the widest.

The comparisons in Table 5 show that whether it is sunny, cloudy, or rainy, the interval width at the 90% confidence level is wider than that at the 70% confidence level, and the corresponding prediction interval coverage probability is higher than at the 70% confidence level by more than 10%. The width of the prediction interval is smallest in sunny weather, and the prediction interval coverage probability is higher than on cloudy and rainy days.

Comparison of the Proposed Model with Other Models
The proposed VMD-FA-KELM model is compared with three other models, namely VMD-PSO-KELM, WPD-FA-KELM, and VMD-KELM. In the PSO algorithm, the number of iterations is 200, the group size is 60, and the values of the learning parameters c_1 and c_2 are both 2. Case studies of point prediction on a sunny day (11 August 2015), a cloudy day (10 May), and a rainy day (14 September) are shown in Figures 8-10, respectively. Figures 8-10 show that all models have high prediction accuracy in sunny weather, and the PV power forecasting output fits the actual output well. In cloudy and rainy weather, the prediction results fluctuate violently because the PV power output has more randomness and uncertainty.
As shown in Figure 9, the prediction value obtained by the FA-KELM algorithm is closer to the actual value, which indicates that FA optimizes the KELM parameters more effectively.
In order to further compare the proposed method with other methods, the normalized root mean square error index e_NRMSE and normalized mean absolute error index e_NMAE are used to measure the point prediction error in Table 7, while the constructed PI at the 90% confidence level is summarized in Table 8 in terms of the reliability index, prediction interval coverage probability (PICP), and the sharpness index, prediction interval normalized averaged width (PINAW). The indices in Table 7 show that the proposed model has the smallest point prediction error, with both e_NRMSE and e_NMAE below 10% in all weather conditions, indicating that the proposed model gives the best performance of all the models. From Table 8, the proposed prediction interval model at the 90% confidence level achieves a PICP value of 97.96% and a PINAW value of 9.98% in sunny weather. The PICP of the prediction interval for the proposed method meets the corresponding confidence level (i.e., it is higher than 90%) while the prediction interval is the narrowest, which indicates that the proposed model can construct the prediction interval effectively and practically.
Further, from Tables 7 and 8, the forecast errors on sunny days are smaller than those on cloudy and rainy days. Based on the same VMD decomposition technique, the prediction error of the single KELM model is larger than that of the PSO-KELM and FA-KELM models. This means that after optimizing the penalty parameter C and kernel parameter g, a better KELM model can be obtained, effectively avoiding an arbitrary selection of these two KELM parameters. The FA-KELM prediction is better than PSO-KELM, implying that the FA algorithm has stronger global searching and generalization abilities than the PSO algorithm. Based on the same FA-KELM algorithm, the prediction error with VMD decomposition is smaller than with WPD, showing that VMD effectively overcomes the disadvantages of selecting wavelet bases and the number of decomposition layers in WPD decomposition. Compared with the undecomposed KELM prediction method, the decomposed models perform better, indicating that VMD effectively reduces the non-stationarity of the sequence. At the same time, compared with a common BP neural network, the prediction error of KELM is smaller. In summary, the proposed FA algorithm produces an overall optimum. The above analysis coincides with the results reported in Figures 6-10, which can support more accurate decision-making and ensure better security of the power supply.

Potential Application to Power Systems
The statistical analysis and comparative study of a large amount of irradiance data under different weather types show that sunny, cloudy, and rainy days are typical weather types. The irradiance variation of these three types differs, each with its own distinct characteristics, and together these three weather types cover the weather states corresponding to most dates. Without loss of generality, this study explains the framework in a simplified way by reducing the calculation scale of the interval prediction while still demonstrating the benefit gained from this approach. With advances in computing, it can be foreseen that the number of clusters could be increased significantly to obtain a practical solution within allowable time constraints. It is also possible to compare a probability forecast based on a normal distribution with the proposed kernel density probability prediction to obtain a better prediction interval.
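The interval-construction step described earlier (kernel density estimation of the point-forecast error followed by a search for the shortest interval at a given confidence level) can be sketched as follows. This is an illustrative reconstruction rather than the paper's exact procedure: it uses SciPy's `gaussian_kde` and a grid scan in place of the cubic spline interpolation step.

```python
import numpy as np
from scipy.stats import gaussian_kde

def shortest_interval(errors, level=0.9, grid_size=2000):
    """Fit a Gaussian KDE to forecast errors and return the narrowest
    interval containing `level` of the estimated probability mass."""
    kde = gaussian_kde(errors)
    pad = 3.0 * errors.std()
    x = np.linspace(errors.min() - pad, errors.max() + pad, grid_size)
    cdf = np.cumsum(kde(x))
    cdf /= cdf[-1]                        # normalized discrete CDF

    best, width = (x[0], x[-1]), np.inf
    j = 0
    for i in range(grid_size):            # slide lower bound, advance upper
        while j < grid_size and cdf[j] - cdf[i] < level:
            j += 1
        if j == grid_size:
            break
        if x[j] - x[i] < width:
            best, width = (x[i], x[j]), x[j] - x[i]
    return best
```

Adding the resulting bounds to the point forecast yields the prediction interval; for a symmetric unimodal error distribution, the shortest interval coincides with the equal-tailed one, but for the skewed error distributions typical of PV forecasts it can be noticeably narrower.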
As reported in [55], microgrids consist of smart buildings with solar panels, and solar energy can be traded among them in a peer-to-peer manner. This potential application aims to optimize the consumption of PV energy combined with energy storage systems (ESSs), such as electric vehicles, for a microgrid community of multiple buildings.
Commonly, a community may be equipped with PV systems, heat pumps, and multiple sensors. Energy production prediction based on machine learning and short-term weather forecasts can help identify possible management strategies and the optimal usage of various systems (e.g., heating and cooling) to enhance system operation. For example, neighbors in the Brooklyn Microgrid project produce, consume, and buy power in their community through a transactive energy platform based on blockchain [56]. The platform facilitates distributed energy supply systems that are largely based on renewable sources such as solar generation, enabling a more resilient, low-carbon, and customer-driven economy for deploying smart cities [57]. Accurate prediction of solar irradiance will therefore be important for harvesting solar energy.

Conclusions
This paper proposed a novel hybrid model for day-ahead or intra-day-ahead PV power output prediction intervals based on the principles of VMD and FA-KELM. The proposed approach shows promising results compared with existing methods that do not decompose the PV power series. VMD has been used for the first time to decompose the PV power series under different weather conditions, which overcomes the disadvantages of selecting wavelet bases and the number of decomposition layers in WPD decomposition. The decomposition technique is useful in identifying the complexity of the IMF series. In addition, the hybrid use of VMD and FA-KELM has been shown to be an effective method for constructing optimal PIs, and to the best of the authors' knowledge, this is the first application of this integrated approach to solar irradiance prediction. Study results show that the presented hybrid method yields PIs of excellent quality, with significance for practical applications in system operation, planning, and risk assessment. Solar irradiance prediction is a nonlinear and non-deterministic problem for which an explicit mathematical model is not easily obtained; the authors therefore focus on the artificial intelligence approach. Based on extensive simulations, the authors are confident of obtaining a good solution compared with other mainstream methods. Further work will involve a sensitivity study to search for a near-globally optimal solution.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The prediction interval coverage probability (PICP) is calculated based on the number of occurrences in which the output value falls inside the PI for a given confidence level α [46]:

$$\mathrm{PICP} = \frac{1}{N}\sum_{i=1}^{N} a_i, \qquad a_i = \begin{cases} 1, & t_i \in [L_i, U_i] \\ 0, & \text{otherwise} \end{cases}$$

where a_i is a Boolean value, t_i represents the predicted target value, U_i and L_i are the upper and lower bounds of the prediction interval, respectively, and N is the number of prediction samples. Provided the confidence level α is satisfied, a larger PICP value means that more of the actual PV power values fall inside the prediction interval.

To account for the fact that the output value will more easily fall inside a wider PI, the PI normalized averaged width (PINAW) is defined as follows [58]:

$$\mathrm{PINAW} = \frac{1}{NR}\sum_{i=1}^{N} \left(U_i - L_i\right)$$

where R denotes the range of the predicted targets. The PINAW value measures the ability of the predicted results to describe uncertain information: when PICP is constant, the smaller the PINAW value, the better the predicted results.

The normalized mean absolute error (NMAE) and normalized root mean square error (NRMSE) are used to evaluate the deterministic prediction [59]:

$$e_{\mathrm{NMAE}} = \frac{1}{N}\sum_{i=1}^{N} \frac{\left|\hat{Y}_i - Y_i\right|}{P_{\mathrm{rated}}} \times 100\%$$

$$e_{\mathrm{NRMSE}} = \frac{1}{P_{\mathrm{rated}}}\sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(\hat{Y}_i - Y_i\right)^2} \times 100\%$$

where P_rated is the rated power of the PV unit, Ŷ_i is the point predicted value, and Y_i is the actual value at time i.
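The four indices above map directly to short numerical routines. The following minimal numpy sketch (the function names are ours, not from the paper) implements each definition as given:

```python
import numpy as np

def picp(y, lower, upper):
    """Fraction of actual values covered by the prediction interval."""
    return np.mean((y >= lower) & (y <= upper))

def pinaw(y, lower, upper):
    """Average interval width normalized by the range R of the targets."""
    R = y.max() - y.min()
    return np.mean(upper - lower) / R

def e_nmae(y_true, y_pred, p_rated):
    """Normalized mean absolute error, in percent of rated power."""
    return np.mean(np.abs(y_pred - y_true)) / p_rated * 100.0

def e_nrmse(y_true, y_pred, p_rated):
    """Normalized root mean square error, in percent of rated power."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2)) / p_rated * 100.0
```

For example, an interval that covers 3 of 4 actual values gives PICP = 0.75, regardless of how wide the intervals are; PINAW then distinguishes a tight interval from a trivially wide one.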