A Novel Hybrid Interval Prediction Approach Based on Modified Lower Upper Bound Estimation in Combination with Multi-Objective Salp Swarm Algorithm for Short-Term Load Forecasting

Abstract: Effective and reliable load forecasting is an important basis for power system planning and operation decisions. Its forecasting accuracy directly affects the safety and economy of the operation of the power system. However, attaining the desired point forecasting accuracy has been regarded as a challenge because of the intrinsic complexity and instability of the power load. Considering the difficulties of accurate point forecasting, interval prediction is able to tolerate increased uncertainty and provide more information for practical operation decisions. In this study, a novel hybrid system for short-term load forecasting (STLF) is proposed by integrating a data preprocessing module, a multi-objective optimization module, and an interval prediction module. In this system, the training process is performed by maximizing the coverage probability and minimizing the forecasting interval width at the same time. To verify the performance of the proposed hybrid system, half-hourly load data are used as illustrative cases, and two experiments are carried out on data from four Australian states across four quarters. The simulation results verified the superiority of the proposed technique, and the effects of the submodules were analyzed by comparing the outcomes with those of benchmark models. Furthermore, it is shown that the proposed hybrid system is valuable in improving power grid management.


Introduction
Load forecasting is of utmost significance and affects the construction and operation of power systems. In the power system planning stage, if the load forecasting result is lower than the real demand, the installed and distribution capacities of the planned power system will be insufficient. The power generated will not be able to meet the electricity demand of the community it serves, and the entire system will not be able to operate in a stable manner. Conversely, if the load forecast is too high, it will result in power generation, transmission, and distribution at a scale that cannot be fully used in the real power system. The investment efficiency and the efficiency of resource utilization will be reduced in this situation. Therefore, effective and reliable power load forecasting can promote a balanced development of the power system while improving the utilization of energy. There are various power load forecasting methods and, commonly, load forecasting is classified into short-term, medium-term, and long-term, based on the application field and forecasting horizon. Among data preprocessing techniques for load forecasting, empirical mode decomposition (EMD) [45] is extensively used; it is an adaptive method introduced to analyze non-linear and non-stationary signals. In order to alleviate reconstruction problems of EMD, such as "mode mixing", several other versions [46][47][48] have been proposed. In particular, the problem that different realizations of signal and noise yield different numbers of modes needs to be considered.
Summing up the above, in this study, a hybrid interval prediction system is proposed to solve the STLF problem based on the modified lower upper bound estimation (LUBE) technique, by incorporating a data preprocessing module, an optimization module, and a prediction module. In order to verify the performance of the proposed model, we choose as the experimental case the power loads of four states in Australia. The elicited results are compared with those from basic benchmark models. In summary, the primary contributions of this study are described below: (1) A modified LUBE technique is proposed based on a recurrent neural network, which is able to consider the information of former observations in STLF. The context layer of the ENN stores the outputs of the hidden layer from the previous period and feeds them back to the hidden layer, together with the input layer, in the current period.
Compared with traditional interval prediction models based on basic neural networks, this mechanism can improve the performance of time-series forecasting tasks such as STLF.
(2) A more convincing optimization technique based on multi-objective optimization is proposed for LUBE. In LUBE, besides CP, PIW should also be considered in the construction of the cost function. In this study, the novel multi-objective optimization method MOSSA is employed in the optimization module to balance the conflict between higher CP and lower PIW, and to train the parameters in the ENN. With this method, the structure of the neural network can provide a better performance in interval prediction. (3) A novel and efficient data preprocessing method is introduced to extract the valuable information from raw data. In order to improve the signal-to-noise ratio (SNR) of the input data, an efficient method is used to decompose the raw data into several intrinsic mode functions (IMFs). According to entropy theory, the IMFs with little valuable information are ignored. The performance of the proposed model trained with processed data improves significantly. (4) The proposed hybrid system for STLF can provide powerful theoretical and practical support for decision making and management in power grids. This hybrid system is simulated and tested on abundant samples involving different regions and different times, which indicates its practicability and applicability in the practical operations of power grids compared to some basic models.
The rest of this study is organized as follows: The relevant methodology, including data preprocessing, Elman neural network, LUBE, and multi-objective algorithms, are introduced in Section 2. Section 3 discusses our proposed model in detail. The specific simulation, comparisons and analyses of the model performances are shown in Section 4. In order to further understand the features of the proposed model, several points are discussed in Section 5. According to the results of our research, conclusions are outlined in Section 6.

Methodology
In this section, the theory of the hybrid interval prediction model is elaborated, and the methodology of the components in hybrid models, including complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), Elman neural networks, LUBE, and MOSSA, are explained in detail.

Data Preprocessing
The EMD technique [45] usually decomposes a signal into a number of IMFs. Each IMF has to fulfill two conditions: (i) the number of extrema (maxima and minima) and the number of zero-crossings must be equal or differ at most by one; and (ii) the local mean, defined as the mean of the upper and lower envelopes, must be zero. In order to alleviate mode mixing, the EEMD [46] defines the "true" modes as the average of the corresponding IMFs obtained from an ensemble of the original signal plus different realizations of finite-variance white noise. However, incompleteness of the decomposition still exists, and the number of modes differs depending on the noise added. Taking these shortcomings into account, CEEMDAN is proposed. The details are described as follows: let E_k(·) be the operator which produces the kth mode obtained by EMD, and let w^(i) be a realization of white noise with N(0, 1). The process of CEEMDAN can then be expressed in several stages:

1st step. For every i = 1, ..., I, decompose each x^(i) = x + β_0 w^(i) by EMD until the first mode is extracted, and compute the first CEEMDAN mode as

d̃_1 = (1/I) Σ_{i=1}^{I} E_1(x^(i)) (1)

2nd step. At the first stage (k = 1), calculate the first residue as r_1 = x − d̃_1.

3rd step. Obtain the first mode of r_1 + β_1 E_1(w^(i)), i = 1, ..., I, by EMD and define the second CEEMDAN mode as

d̃_2 = (1/I) Σ_{i=1}^{I} E_1(r_1 + β_1 E_1(w^(i))) (2)

4th step. For k = 2, ..., K, calculate the kth residue:

r_k = r_{k−1} − d̃_k (3)

5th step. Obtain the first mode of r_k + β_k E_k(w^(i)), i = 1, ..., I, by EMD and define the (k + 1)th CEEMDAN mode as

d̃_{k+1} = (1/I) Σ_{i=1}^{I} E_1(r_k + β_k E_k(w^(i))) (4)

6th step. Go to the 4th step for the next k.
Steps 4 to 6 are iterated until the obtained residue cannot be further decomposed by EMD, either because it satisfies the IMF conditions or because it has fewer than three local extrema. Observe that, by construction of CEEMDAN, the final residue satisfies

R = x − Σ_{k=1}^{K} d̃_k (5)

with K being the total number of modes. Therefore, the signal of interest x can be expressed as

x = Σ_{k=1}^{K} d̃_k + R (6)

which ensures the completeness property of the decomposition and thus provides an exact reconstruction of the original data. The final number of modes is determined only by the data and the stopping criterion. The coefficients β_k = ε_k std(r_k) allow the selection of the SNR at each stage. The CEEMDAN method adds a limited amount of adaptive white noise at each stage, which achieves an almost zero reconstruction error with fewer ensemble averages. Therefore, CEEMDAN can overcome the "mode-mixing" phenomenon of EMD, solve the incompleteness of the EEMD decomposition, and improve computational efficiency by reaching a small reconstruction error with fewer ensemble realizations.
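As an illustration of the recursion above, the following sketch mirrors the stage-wise averaging of CEEMDAN. The `first_mode` helper is a crude moving-average stand-in for the EMD operator E_1 (a real implementation would use sifting), and the stage-k noise mode E_k(w) is approximated here by repeated first modes, so this is a structural sketch rather than a faithful CEEMDAN implementation:

```python
import numpy as np

def first_mode(s):
    # crude stand-in for E1(.): the high-frequency residual of a moving average
    return s - np.convolve(s, np.ones(5) / 5, mode="same")

def ceemdan(x, K=3, I=10, eps=0.2, seed=0):
    """Structural sketch of the CEEMDAN stages described above."""
    rng = np.random.default_rng(seed)
    noise = [rng.standard_normal(len(x)) for _ in range(I)]
    # 1st step: average the first modes of I noise-perturbed copies of x
    d1 = np.mean([first_mode(x + eps * np.std(x) * w) for w in noise], axis=0)
    modes, r = [d1], x - d1          # 2nd step: first residue r1 = x - d1
    for k in range(1, K):
        beta = eps * np.std(r)       # beta_k = eps_k * std(r_k) sets the SNR
        dk = np.mean([first_mode(r + beta * first_mode(w)) for w in noise], axis=0)
        modes.append(dk)
        r = r - dk                   # k-th residue: r_k = r_{k-1} - d_k
    return modes, r

x = np.sin(np.linspace(0, 8 * np.pi, 200))
modes, residue = ceemdan(x)
```

Because every extracted mode is subtracted from the running residue, `sum(modes) + residue` recovers `x` exactly, which is precisely the completeness property of Equation (6).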

Elman Neural Network (ENN)
As an important branch of deep learning, recurrent neural networks have been widely used in academic and industrial fields. The common neural network mainly consists of three layers: input layer, hidden layer and output layer. For the hidden layer, the input information only comes from the input layer. For a recurrent neural network, the input information of the hidden layer will not only come from the input layer, but also from the hidden layer itself and the output layer.
Among the various structures of recurrent neural networks, the Elman neural network (ENN) [49] is a typical structure in which the lagged outputs of the hidden layer are delivered into the current hidden layer by a new layer called the context layer. This structure takes the former information of the hidden layer into account and commonly performs better in time-series forecasting tasks such as STLF, wind speed forecasting, and financial time-series forecasting. The structure is shown in Figure 1.
The context layer feeds back the hidden layer outputs from the previous time step, and the neurons contained in each layer transmit information from one layer to another. The dynamics of the hidden state neuron activations with the context layer are as follows:

h_i(t) = g( Σ_k V_{ik} S_k(t) + Σ_j W_{ij} I_j(t) ), S_k(t) = h_k(t − 1) (7)

where S_k(t) and I_j(t) denote the outputs of the context state and input neurons, respectively; V_{ik} and W_{ij} denote their corresponding weights; and g(·) is a sigmoid transfer function. The other related mechanisms, such as feed-forward and back propagation, are similar to those of the common back propagation neural network.
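The context-layer dynamics can be sketched as a single recurrent step. The 6-input/13-hidden sizing follows the network configuration stated later in the paper; the random initialization and unrolling length are purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elman_step(x_t, h_prev, W_in, V_ctx, b):
    """One forward step of an Elman hidden layer: the context layer
    feeds the previous hidden state h_prev back alongside the input x_t."""
    return sigmoid(W_in @ x_t + V_ctx @ h_prev + b)

rng = np.random.default_rng(0)
n_in, n_hid = 6, 13                          # 6 input / 13 hidden neurons
W_in = 0.1 * rng.standard_normal((n_hid, n_in))
V_ctx = 0.1 * rng.standard_normal((n_hid, n_hid))
b = np.zeros(n_hid)

h = np.zeros(n_hid)                          # context starts empty
for x_t in rng.standard_normal((4, n_in)):   # unroll four time steps
    h = elman_step(x_t, h, W_in, V_ctx, b)
```

Each call reuses the previous hidden state as the context input, which is exactly the S_k(t) = h_k(t − 1) feedback in Equation (7).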


Lower Bound and Upper Bound Estimation (LUBE)
In the literature, the traditional interval prediction commonly attempts to construct the PI based on the point prediction. The upper bound and the lower bound are calculated according to the forecasting value and the confidence level. The accuracy of the point forecasting has played a key role in the accuracy of the PI. In this paper, we introduce a novel method of interval prediction called lower bound and upper bound estimation (LUBE). This method directly outputs the lower bound and the upper bound of PI depending on the multi-output neural network. The structure we employed in this paper is shown in Figure 1.

The output of the normal LUBE structure [50] consists of only two neurons, which denote the upper bound and the lower bound, while the output in our LUBE structure consists of three neurons. The first output corresponds to the upper bound of the PI, the second output denotes the predicted value, and the third output approximates the lower bound of the PI. In the literature, PI construction techniques attempt to estimate the mean and variance of the targets for the construction of PIs. In contrast to existing techniques, the proposed method directly approximates the upper and lower bounds of PIs based on the set of inputs. Therefore, in the training process, the loss function of this neural-network-based LUBE method should be set according to the key criteria of PIs (CP and PIW).

Multi-Objective Optimization Algorithm
Multi-objective optimization algorithms have been widely used to solve multi-objective optimization problems. In this paper, a novel multi-objective optimization algorithm named the Multi-Objective Salp Swarm Algorithm (MOSSA) is introduced.

Multi-Objective Optimization Problem
In multi-objective optimization, all of the objectives are optimized simultaneously. The problem is formulated as follows:

Minimize: F(x) = { f_1(x), f_2(x), ..., f_o(x) }
Subject to: g_i(x) ≥ 0, i = 1, 2, ..., m
h_i(x) = 0, i = 1, 2, ..., p
lb_i ≤ x_i ≤ ub_i, i = 1, 2, ..., n

where o is the number of objectives, m is the number of inequality constraints, p is the number of equality constraints, lb_i is the lower bound of the ith variable, and ub_i is the upper bound of the ith variable. With one objective, we can confidently state that one solution is better than another by comparing the single criterion, while in a multi-objective problem, there is more than one criterion to compare solutions. The main theory for comparing two solutions considering multiple objectives is called Pareto optimal dominance, as explained in [51]. There are two main approaches for solving multi-objective problems: a priori and a posteriori [52]. In the a priori method, the multi-objective problem is transformed into a single-objective problem by aggregating the objectives with a set of weights determined by experts. The main defect of this method is that the Pareto optimal set and front need to be constructed by re-running the algorithm and changing the weights [53]. In contrast, the a posteriori method keeps the multi-objective formulation in the solving process, and the Pareto optimal set can be determined in a single run. Without any weights to be defined by experts, this approach can approximate any type of Pareto optimal front. Because of the advantages of a posteriori optimization over the a priori approach, the focus of our research is aimed at a posteriori multi-objective optimization.
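The Pareto dominance relation used to compare solutions can be written compactly as follows (minimization is assumed; the point values are illustrative):

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (minimization):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Filter a list of objective vectors down to the Pareto front."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

pts = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = non_dominated(pts)   # (3, 3) is dominated by (2, 2)
```

Note that (1, 4) and (4, 1) survive even though each is worse than (2, 2) in one objective: neither is dominated, which is why a multi-objective run returns a set of solutions rather than a single best one.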

Multi-Objective Salp Swarm Algorithm (MOSSA)
As an a posteriori multi-objective optimization algorithm, MOSSA [54] is similar to other swarm-based multi-objective optimization algorithms such as MOPSO [31], MOACO [33], and MOGO [35]. By simulating the biological behavior of ecological communities, the optimal solution set is approached.
Salps belong to the family Salpidae and have a transparent, barrel-shaped body. Their tissues are highly similar to those of jellyfish, and they also move like jellyfish: water is pumped through the body as propulsion to move forward. In deep oceans, salps often form a swarm called a salp chain. The main concern about salps in MOSSA is their swarming behavior.
To mathematically model the salp chains, the population is first divided into two groups: the leader and the followers. The leader is the salp at the front of the chain, whereas the rest of the salps are considered followers. As these names imply, the leader guides the swarm and the followers follow each other.
Similar to other swarm-based techniques, the position of salps is defined in an n-dimensional search space where n is the number of variables of a given problem. Therefore, the positions of all salps are stored in a two-dimensional matrix called x. It is also assumed that there is a food source called F in the search space as the swarm's target.

Definition 1.
To update the position of the leader, the following equation is proposed:

x^1_j = F_j + c_1((ub_j − lb_j)c_2 + lb_j), c_3 ≥ 0.5
x^1_j = F_j − c_1((ub_j − lb_j)c_2 + lb_j), c_3 < 0.5 (12)

where x^1_j shows the position of the first salp (leader) in the jth dimension, F_j is the position of the food source in the jth dimension, ub_j indicates the upper bound of the jth dimension, lb_j indicates the lower bound of the jth dimension, c_2 and c_3 are random numbers, and c_1 is a coefficient defined in Definition 2. Equation (12) shows that the leader only updates its position with respect to the food source.

Definition 2.
The coefficient c_1 is the most important parameter in the salp swarm algorithm (SSA) because it balances exploration and exploitation. It is defined as follows:

c_1 = 2e^{−(4l/L)^2} (13)

where l is the current iteration and L is the maximum number of iterations.
The parameters c_2 and c_3 are random numbers uniformly generated in the interval [0, 1]. In fact, they dictate whether the next position in the jth dimension should be towards positive or negative infinity, as well as the step size.

Definition 3.
To update the position of the followers, the following equation is utilized, based on Newton's law of motion:

x^i_j = (1/2)at^2 + v_0 t (14)

where i ≥ 2, x^i_j shows the position of the ith follower salp in the jth dimension, t is time, and v_0 is the initial speed. Because time in optimization corresponds to iterations, the discrepancy between iterations is equal to 1, and considering v_0 = 0, this equation can be expressed as follows:

x^i_j(t) = (1/2)( x^i_j(t − 1) + x^{i−1}_j(t − 1) ) (15)

where i ≥ 2 and x^i_j(t) shows the position of the ith follower salp in the jth dimension at the tth iteration. According to the mathematical formulation explained above, the swarm behavior of salp chains can be simulated vividly.
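Putting Equations (12), (13), and (15) together, one iteration of the salp position update might look like the following sketch (minimization setting, with a 0.5 threshold on c_3 and simple clipping as the bound-amendment step; all sizes are illustrative):

```python
import numpy as np

def update_salps(X, F, lb, ub, l, L, rng):
    """One SSA position update.
    X: (n_salps, dim) positions; F: food source (best solution found)."""
    c1 = 2 * np.exp(-(4 * l / L) ** 2)   # Eq. (13): shifts exploration -> exploitation
    X = X.copy()
    for j in range(X.shape[1]):          # Eq. (12): leader moves around the food source
        c2, c3 = rng.random(), rng.random()
        step = c1 * ((ub[j] - lb[j]) * c2 + lb[j])
        X[0, j] = F[j] + step if c3 >= 0.5 else F[j] - step
    for i in range(1, X.shape[0]):       # Eq. (15): follower averages with predecessor
        X[i] = 0.5 * (X[i] + X[i - 1])
    return np.clip(X, lb, ub)            # amend salps that left the search bounds

rng = np.random.default_rng(1)
lb, ub = np.zeros(4), np.ones(4)
X = rng.random((5, 4))
X_next = update_salps(X, X[0].copy(), lb, ub, l=1, L=100, rng=rng)
```

Early on (small l), c_1 is close to 2 and the leader takes large steps around F; as l approaches L, c_1 decays towards zero and the update becomes a fine local search.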
When dealing with multi-objective problems, two issues need to be addressed in SSA. First, MOSSA needs to store multiple solutions as the best solutions of a multi-objective problem. Second, in each iteration, SSA updates the food source with the best solution, but in a multi-objective problem, a single best solution does not exist.
In MOSSA, the first issue is settled by equipping the SSA algorithm with a repository of food sources. The repository can store a limited number of non-dominated solutions. In the process of optimization, each salp is compared with all the residents in the repository using the Pareto dominance operators. If a salp dominates only one solution in the repository, it is swapped in. If a salp dominates a set of solutions in the repository, they are all removed from the repository and the salp is added. If at least one of the repository residents dominates a salp in the new population, the salp is discarded straight away. If a salp is non-dominated in comparison with all repository residents, it is added to the repository. If the repository becomes full, one of the similar non-dominated solutions in the repository is removed. For the second issue, an appropriate way is to select the food source from the set of non-dominated solutions with the least crowded neighborhood. This can be done using the same ranking process and roulette wheel selection employed in the repository maintenance operator.
The pseudo code of MOSSA is shown in Algorithm 1:

Algorithm 1. Pseudo code of MOSSA

Set the hyper-parameters:
  Max_iter: maximum number of iterations
  ArchiveMaxSize: maximum capacity of the archive (repository)
  Dim: the number of parameters in each salp
  ub and lb: the upper and lower bounds of the salp population
  Obj_no: the number of objectives to be estimated
Initialize the salp population x_i (i = 1, 2, ..., n) depending on ub and lb
Define the objective function (loss function): @Obj_func
While (end criterion is not met)
  Calculate the fitness of each search agent (salp) with Obj_func
  Determine the non-dominated salps
  Update the repository considering the obtained non-dominated salps
  If (the repository becomes full)
    Call the repository maintenance procedure to remove one repository resident
    Add the non-dominated salp to the repository
  End If
  Choose a source of food from the repository: F = SelectFood(repository)
  Update c_1 by Equation (13)
  For each salp x_i
    If (x_i is the leader)
      Update the position of the leading salp by Equation (12)
    Else
      Update the position of the follower salp by Equation (15)
    End If
  End For
  Amend the salps based on the upper and lower bounds of the variables
End While
Return repository
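The repository-update rules described above can be sketched as follows. The crowding-based removal of MOSSA is replaced here by simply dropping the oldest entry, which is only a placeholder for the neighborhood-ranking procedure:

```python
def dominates(a, b):
    """Pareto dominance for minimization: a is no worse everywhere,
    strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_repository(repo, candidate, max_size):
    """Insert an objective vector into the non-dominated archive."""
    if any(dominates(r, candidate) for r in repo):
        return repo                                  # dominated by a resident: discard
    repo = [r for r in repo if not dominates(candidate, r)]  # evict dominated residents
    repo.append(candidate)
    if len(repo) > max_size:
        repo.pop(0)   # placeholder for the crowding-based maintenance operator
    return repo

repo = []
for cand in [(2.0, 2.0), (3.0, 3.0), (1.0, 1.0)]:
    repo = update_repository(repo, cand, max_size=5)
```

After the three insertions above, (3, 3) is rejected because (2, 2) dominates it, and (1, 1) then evicts (2, 2), leaving a single archive resident.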

Proposed Interval Prediction Model for Short-Term Load Forecasting (STLF)
In this paper, we propose a hybrid model for interval prediction based on data preprocessing, a multi-objective optimization algorithm, and LUBE to solve the STLF problem. This hybrid model consists of two stages: data de-noising and model prediction.
In the first stage, the main task is to refine the original data. The raw power load data are affected by many internal and external factors in the collection process. Therefore, a lot of unrelated information is mixed into the data. This irrelevant information further degrades the quality of the power load data and increases the difficulty of accurately forecasting the power load. In a neural network model, the performance of the model is directly affected by the quality of the data. As a type of machine learning algorithm, the neural network uses its multilayered structure to learn the relevant interdependencies of the data and determine the structural parameters of the prediction model, so as to achieve fitting and forecasting. However, if the input set of the model contains too much noise and "false information", the model will be seriously affected in the training process, and problems such as overfitting will emerge. Therefore, we introduced CEEMDAN to eliminate useless information in the raw data. As mentioned above, CEEMDAN can decompose the data series into several IMFs with different frequencies, as shown in Figure 2. Because the IMFs are extracted with envelope curves depending on the extrema, some of the IMFs have higher frequencies, such as the first few IMFs shown in Figure 2. The other IMFs have lower frequencies and represent the trend factors, thereby forming the vital basis for time-series prediction. In actual operation, we can remove the IMFs with higher frequencies, which effectively represent noise, to refine the original data. In order to determine which IMFs ought to be abandoned, we calculated the entropy of each IMF and removed the IMFs with lower valuable-information content. After the de-noising process, the refined data are transferred to the next stage as the input data for training the predictive model.
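A minimal sketch of this entropy-based screening is given below. The paper does not state its exact entropy formula, so the per-IMF energy-entropy contribution used here is one common choice, and the selection of which indices to keep is left to the caller:

```python
import numpy as np

def energy_entropy(imfs):
    """Per-IMF contribution -p_i * log(p_i), where p_i is the share of
    total signal energy carried by IMF i (one common definition)."""
    E = np.array([np.sum(np.asarray(imf) ** 2) for imf in imfs])
    p = E / E.sum()
    return -p * np.log(p + 1e-12)

def denoise(imfs, keep):
    """Reconstruct the refined series from the retained IMFs only."""
    return np.sum([np.asarray(imfs[i]) for i in keep], axis=0)

# toy decomposition: two constant "modes" with different energies
imfs = [np.ones(8), 2 * np.ones(8)]
h_vals = energy_entropy(imfs)
refined = denoise(imfs, keep=[1])
```

IMFs judged to carry little valuable information are simply omitted from `keep`, and the remaining modes are summed to rebuild the training input.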
In the second stage, the main interval prediction model is applied. In our hybrid interval prediction model, the PI is output by LUBE, which is based on the multi-output Elman neural network (E-LUBE). In the training process, the input set of E-LUBE is constructed as indicated in Formula (16), while the output set is constructed as indicated in Formula (17), where m and s denote the number of features and the number of samples, respectively, and α denotes the interval width coefficient. In the case of the STLF problem, m indicates the number of previous time points that are used to forecast the predictive value.
X_input = [x_1, x_2, ..., x_m; x_2, x_3, ..., x_{m+1}; ...; x_s, x_{s+1}, ..., x_{m+s−1}] (16)

Y_output = [(1 + α)x_{m+1}, x_{m+1}, (1 − α)x_{m+1}; ...; (1 + α)x_{m+s}, x_{m+s}, (1 − α)x_{m+s}] (17)

According to a trained model, when a new series X_i, i = 1, ..., m, is input into the model, X_{m+1} with an upper bound X^U_{m+1} and a lower bound X^L_{m+1} is output. This is the basic mechanism of interval prediction for STLF in this study. However, in traditional multi-output neural networks, the loss function is usually the mean-square-error (MSE), which is a key criterion for point forecasting. In this study, we introduced two new criteria (PIW and CP) to construct the loss function, considering the main purpose of our interval prediction. Traditional neural network parameters are determined using a gradient descent algorithm, but for the two set criteria, the calculation of the gradient is difficult. Therefore, we employed MOSSA to realize the multi-objective parameter optimization. The optimization problem can be expressed as

min_θ F(θ) = { −CP(θ), PIW(θ) } (18)

where θ is the set of parameters in E-LUBE, including the weights and biases. When the parameters are determined in the training process, the entire model can be applied to the test set to verify the performance of interval prediction.
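The sliding-window construction of the training sets can be sketched as follows. The use of α to widen and narrow the three training targets is an interpretation of Formulas (16) and (17) based on the surrounding text, not taken verbatim from the paper:

```python
import numpy as np

def build_sets(series, m, alpha):
    """Sliding-window input/output sets for a three-output E-LUBE model.
    Each input row holds m consecutive loads; each output row holds the
    widened upper target, the point target, and the narrowed lower target."""
    X, Y = [], []
    for t in range(len(series) - m):
        X.append(series[t:t + m])
        target = series[t + m]
        Y.append([(1 + alpha) * target, target, (1 - alpha) * target])
    return np.array(X), np.array(Y)

x = np.arange(10.0)                    # toy half-hourly load series
X, Y = build_sets(x, m=6, alpha=0.05)  # m = 6 matches the six input neurons
```

With ten observations and m = 6 lags, four training samples are produced, and the first output row brackets the target 6.0 between 6.3 and 5.7.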

Simulations and Analyses
In order to validate the performance of the proposed hybrid model in STLF, four electrical load datasets collected from four states in Australia are used in our research. The four states are New South Wales (NSW), Tasmania (TAS), Queensland (QLD), and Victoria (VIC), and their specific locations are shown in Figure 3. The experiments in this study consist of two parts: experiment I and experiment II. For experiment I, the load data of the four states are modeled with interval width coefficient α = 0.05, and for experiment II, the interval width coefficient α is set to 0.025 for further analysis. In order to verify the superiority of the proposed hybrid model, several benchmark models are compared, including basic LUBE (LUBE), LUBE with the Elman neural network (E-LUBE), E-LUBE with point optimization (PO-E-LUBE), E-LUBE with interval optimization (IO-E-LUBE), and models integrated with CEEMDAN. For persuasive comparability and fairness, the hyper-parameters in each model are kept consistent, as shown in Table 1. All experiments were carried out in MATLAB 2016a on a PC with the configuration of Windows 7 64-bit, Intel Core i5-4590 CPU @ 3.30 GHz, 8 GB RAM.


Data Descriptions
For each state, we considered the data at half-hour intervals across four quarters. The data used in this paper can be obtained from the website of the Australian Energy Market Operator (http://www.aemo.com.au/). We chose data covering the whole of 2017, from 1 January 2017 0:30 am to 31 December 2017 0:00 am, to construct the dataset. In each state, the total sample number is 17,520. For each quarter, the numbers of samples were 4320, 4358, 4416, and 4416, respectively. In order to control the comparability, we selected 1200 samples to test the trained model, and used the rest of each quarter to train the models. The proportion of the training sets versus the test sets was approximately 3:1. The description of the data characteristics is shown in Figure 4. Considering the structure of the neural network in this study, we set six input neurons, 13 hidden neurons, and three output neurons. Specifically, the output set was formulated in accordance with Formula (17).
During data preprocessing, the input data were decomposed into several IMFs by CEEMDAN, as displayed in Figure 2. According to the energy entropy of each IMF shown in Figure 3, we ignored the IMFs containing high frequencies and summed the remaining IMFs to reconstruct the input set, as shown in Figure 1.



Performance Metrics
In order to comprehensively assess the performance of the models, some metrics were employed. These metrics primarily focused on the coverage of the real value in the prediction interval and the width of the interval.

Coverage Probability
Coverage probability (CP) [50] is usually considered a basic feature of PIs, and it is calculated as the ratio of the number of target values covered by the PIs:

CP = (1/m) Σ_{i=1}^{m} θ_i × 100%,

where m denotes the number of samples and θ_i is a binary index that measures whether the ith target value is covered by the PIs: θ_i = 1 if y_i^t ∈ [L̂_i, Û_i], and θ_i = 0 otherwise, where y_i^t denotes the ith target value, and L̂_i and Û_i represent the ith lower and upper bounds, respectively.
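As an illustration, the CP computation can be sketched as follows (a minimal sketch; the function name is ours):

```python
def coverage_probability(y, lower, upper):
    """Fraction of targets y_i falling inside their interval [L_i, U_i]."""
    m = len(y)
    covered = sum(1 for t, lo, up in zip(y, lower, upper) if lo <= t <= up)
    return covered / m
```

Multiplying the returned fraction by 100 gives the CP percentage reported in the tables.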
A larger CP means more targets are covered by the constructed PIs, while a too-small CP indicates unsatisfactory coverage behavior. For the PIs to be valid, CP should be greater than or at least equal to the nominal confidence level of the PIs. Furthermore, in this paper, CP is also an important factor in the parameter optimization process of the multi-objective optimization algorithm.

Prediction Interval (PI) Normalized Average Width and PI Normalized Root-Mean-Square Width
In research on interval prediction, more attention is usually paid to CP. However, if the lower and upper bounds of the PIs are expanded on either side, any requirement for a larger CP can be satisfied, even up to 100%. In some cases, though, a narrower interval width is necessary to support the electric power supply more precisely. Therefore, the width between the lower and upper bounds should be controlled so that the PIs are more convincing. In this study, the prediction interval width (PIW) is the other factor in the parameter optimization process. Together, CP and PIW compose the two objectives of the solution space within which the Pareto solution set is estimated.
In order to eliminate the impact of dimension, relative indexes should be introduced to improve the comparability of the width indicators. Inspired by the mean absolute percentage error (MAPE) in point forecasting, we employed the PI normalized average width (PINAW) and the PI normalized root-mean-square width (PINRW) [50]:

PINAW = (1/(mR)) Σ_{i=1}^{m} (Û_i − L̂_i),

PINRW = (1/R) √((1/m) Σ_{i=1}^{m} (Û_i − L̂_i)²),

where R equals the maximum minus the minimum of the target values. Normalization by the range R improves the comparability of PIs constructed using different methods and for different types of datasets.
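The two width indices can be sketched directly from their standard definitions in the literature (function names are ours):

```python
import math

def pinaw(lower, upper, r):
    """PINAW = (1/(m*R)) * sum of interval widths (U_i - L_i)."""
    m = len(lower)
    return sum(u - l for l, u in zip(lower, upper)) / (m * r)

def pinrw(lower, upper, r):
    """PINRW = (1/R) * sqrt((1/m) * sum of squared widths)."""
    m = len(lower)
    return math.sqrt(sum((u - l) ** 2 for l, u in zip(lower, upper)) / m) / r
```

PINRW penalizes a few very wide intervals more heavily than PINAW, which is why the two indices are reported together.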

Accumulated Width Deviation (AWD)
Accumulated width deviation (AWD) is a criterion that measures the relative degree of deviation, and it is obtained as the cumulative sum of AWD_i [55]. The calculation formula of AWD is expressed as Equations (23) and (24), where α denotes the interval width coefficient and I_i represents the ith prediction interval.
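Equations (23) and (24) are not reproduced in this excerpt, so the sketch below implements one common formulation of AWD consistent with the description: samples inside their interval contribute 0, and samples outside contribute their distance to the violated bound, normalized by the interval width. The paper's exact equations may differ in detail.

```python
def awd(y, lower, upper):
    """Accumulated width deviation: cumulative sum of per-sample deviations.

    AWD_i = 0 when y_i lies inside [L_i, U_i]; otherwise it is the distance
    to the violated bound divided by the interval width (assumed formulation).
    """
    total = 0.0
    for t, lo, up in zip(y, lower, upper):
        width = up - lo
        if t < lo:
            total += (lo - t) / width
        elif t > up:
            total += (t - up) / width
    return total
```

A smaller AWD therefore means fewer and milder bound violations, matching the interpretation used in the experiments.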

Experiment I: Cases with Larger Width Coefficients
In this experiment, we set the interval width coefficient α = 0.05, which is equivalent to setting the output to [0.95 × X, X, 1.05 × X] for a single sample in the training process of the neural network. Based on this structure, the PIs can be output given an input test set. In order to guarantee the diversity of the samples, we studied data from four quarters for four different states.
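The construction of the three-neuron training targets described above can be illustrated as follows (a minimal sketch; the helper name is ours):

```python
def build_targets(loads, alpha=0.05):
    """For each target load X, form the training triple
    [(1 - alpha) * X, X, (1 + alpha) * X] for the three output neurons."""
    return [((1 - alpha) * x, x, (1 + alpha) * x) for x in loads]
```

With α = 0.05 a load of 100 yields the triple (95, 100, 105); shrinking α narrows the lower/upper targets around X, which is exactly the knob varied between Experiments I and II.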
The models involved in our research can be divided into three groups to better explain the impact of the different components. The first group included LUBE and E-LUBE, and the difference between them was the structure of the neural network. The structure of LUBE consisted of three layers, similar to the traditional BP neural network. In E-LUBE, an extra context layer was added to the structure, so that we could validate the impact of the context layer on prediction by comparing the performance of these two models. The second group included PO-E-LUBE and IO-E-LUBE, which differ in the cost function used by the optimization algorithm in the training process. PO-E-LUBE used the error and variance of point prediction to construct the cost function in MOSSA, whereby minimizing the cost function effectively demands better point prediction accuracy. IO-E-LUBE, in contrast, employed the CP and PIW of the interval prediction to construct the cost function in the multi-objective optimization, so that minimizing this cost function demands better interval coverage performance, which is more rational for our goal of interval prediction. The comparison between these models reflects the influence of different cost functions in the parameter optimization process. Furthermore, in the first group, the parameters of the neural network are determined by a conventional gradient descent algorithm, while in the second group, the parameters are determined by a heuristic optimization algorithm. Therefore, the impact of the different optimization algorithms can be shown by comparing models across the groups. Finally, in the third group, data preprocessing is introduced: based on the models in the first two groups, CEEMDAN was used to refine the input dataset. The results of the models in this group display the effect of data preprocessing in the hybrid model.
The simulation results are shown in Tables 2 and 3. Also shown in Figure 5 are the principal indices of interval prediction, namely, CP and PINAW. Based on the comparisons referred to earlier, several conclusions can be inferred: (1) By comparing the models in the first group, we can conclude that E-LUBE is superior to LUBE in most cases, such as the fourth quarter in NSW and the first quarter in TAS, as shown in Table 2 and Figure 5. The CP of E-LUBE reached 87.17%, while the CP of LUBE was 72.36% for the fourth quarter in NSW. The rate of improvement was more than 15%, while PINAW and PINRW were maintained. However, in some cases, such as the fourth quarter in QLD shown in Table 3 and Figure 5, the improvement is not remarkable, and the performances of the two models are almost the same. In general, the performance of E-LUBE is better than that of LUBE, which means that the extra context layer in E-LUBE can improve performance. In theory, the context layer is able to provide more information from the previous outputs of the hidden layer, and this superiority has been borne out in our experiments. However, owing to the instability of the parameters in the neural network, the improvement is not adequately remarkable in a few cases. (2) In terms of the optimization methods, and according to the results shown in Figure 5 and Tables 2 and 3, the CPs of the second group (PO-E-LUBE and IO-E-LUBE) are better than those of E-LUBE in most cases. E-LUBE uses the gradient descent algorithm, which is sensitive to initialization, to obtain the parameters of the NN. The models in the second group instead use a heuristic swarm optimization algorithm, which can synthesize the initialization results with an adequate population size. Thus, the models in the second group should perform better in theory, unless the random initializations of E-LUBE happen to be perfect.
Moreover, within the second group, IO-E-LUBE has a larger CP value than PO-E-LUBE, with low levels of PINAW and PINRW. It is precisely the influence of the cost function that makes this difference: the main objective of interval prediction is a larger CP value along with a narrow width, so the IO variant should have an advantage. (3) Incorporating CEEMDAN in the hybrid models improved the performance significantly because of the denoising preprocessing. In most cases, the CPs are larger than 80% or even 90%, which means more than 80% of the target load values are covered by the predicted intervals. Furthermore, in some cases, such as the second and third quarters in NSW and the second quarter in QLD, the CPs can reach 100%. Such accuracy can ensure that the power supply meets the demand. Compared with the original LUBE and E-LUBE, the hybrid model we proposed (CEEMDAN-IO-E-LUBE) elicited a significant improvement in the interval prediction results. (4) With a larger width coefficient, the CPs of our models were almost all satisfactory. The smallest CP was more than 70%, and the largest CP was able to reach 100%, which is perfect for interval prediction in STLF. However, the PINAW and PINRW were almost all larger than 10, and even reached 20 in the second quarter in QLD; the proposed model still outperformed the other models. (5) Considering the accumulated width deviation (AWD), for a larger width coefficient, the proposed model (CEEMDAN-IO-E-LUBE) has a smaller AWD than the other benchmark models in most cases. According to the definition of AWD, a smaller AWD means more target values fall into the predicted intervals, and where target values exceed the bounds, the deviations are relatively small. In this experiment, the AWDs of the proposed model are satisfactory in most cases; in some cases, the AWD is even close to 0, which means almost all target load values fall into the predicted intervals.
According to these predicted intervals, load dispatch will be more rational.
Energies 2018, 11, x 16 of 30

Experiment II: Cases with Smaller Width Coefficients
In this experiment, we set the interval width coefficient α = 0.025, which means we set the output to [0.975 × X, X, 1.025 × X] for a single sample in the training process of the neural network. With a narrower width coefficient, the lower and upper bounds were closer to the target value in the training process, which can provide more valuable information in practice. However, a narrower bound might lead to a decrease of CP. Thus, a smaller width coefficient requires the models to have better predictive properties. The results of this simulation are shown in Tables 4 and 5 and in Figure 6. Correspondingly, the following conclusions can be drawn: (1) As Table 4 and Figure 6 show, the distinction between the models is similar to experiment I. The CPs of the original LUBE and E-LUBE are the smallest among the models in our simulation, and our proposed model, CEEMDAN-IO-E-LUBE, elicits the best performance. (2) For some benchmark models in this experiment, with a narrow bound in the training process, the performance was not adequately satisfactory. As the cases of the third quarter in NSW and the second quarter in TAS show, the CPs of LUBE and E-LUBE are close to 50%, which is not conclusive in practice. However, based on the hybrid mechanism we proposed, the performances were improved significantly. The minimum CP value of CEEMDAN-IO-E-LUBE can reach 70%, and the maximum is close to 100%, such as in the third quarter in QLD. Such results show that the predicted intervals can better cover actual electricity demand data and economize the spinning reserve in the power grid. (3) With a smaller width coefficient, the CPs decreased while the PINAW and PINRW were also reduced.
For the benchmark models, the results mostly display smaller CPs and larger PINAW or PINRW. The proposed model, however, is able to demonstrate larger CPs with smaller PINAW and PINRW values, which amounts to a good performance in interval prediction. In some cases, the CP values were larger than 95% with PINAW and PINRW values less than 10; in such cases, the CPs are satisfactory and the widths of the PIs are appropriate. (4) In terms of AWD in this experiment, the proposed model still showed a relatively small AWD compared with the other benchmark models, which means the proposed model achieves better prediction accuracy. Compared with experiment I, the AWDs in this experiment are larger: with a smaller width coefficient, the predicted interval is narrower, so more target points fall outside the intervals. In some situations, a narrower predicted interval is necessary, and the proposed model is able to provide a better performance when a narrower predicted interval of the electric load is required.

Comparisons and Analyses
According to the comparison of the above two experimental results, the width coefficient has a significant influence on performance, as shown in Figure 7. On the one hand, for most models, a larger width coefficient may lead to a larger and more satisfactory CP value, but the PI width indices may not be as desired. On the other hand, for most models, a narrower width coefficient may elicit the desired PINAW and PINRW values, but the CP is not good enough. The proposed models alleviate this contradiction: even though the CP value of the proposed model declines when the width coefficient decreases, the comprehensive performance remains satisfactory. In some exceptional cases, owing to the complexity and instability of the datasets, the performance of the proposed models is not adequate, as the description in Figure 3 shows.

Discussion
In this section, we discuss some factors which may have an effect on the performances of the proposed models in order to improve the practicability of our hybrid model. The factors involved mainly include the features of the datasets and the setting of the hyperparameters in the algorithm.

Dataset Features
The features and quality of the datasets have a significant effect on the performance of the prediction models. In STLF, the data show periodicity and volatility. The periodicity is attributed to the regularity in the actual use of electricity, and the volatility is attributed to random and occasional electricity use. Therefore, the linear and non-linear components operate simultaneously during forecasting. In particular, outliers may have a negative effect on the prediction process.
As Figure 4 shows, the dataset features of the different samples vary. According to boxplot theory, data points larger than Q3 + 1.5IQR or smaller than Q1 − 1.5IQR are regarded as outliers. For the first and fourth quarters in NSW, and the first and fourth quarters in VIC, the distributions of the datasets displayed a number of outliers. Additionally, the results shown in Tables 2-5 demonstrate that model performance may be unremarkable for samples with undesirable distributions. These outliers are important factors leading to such results, even though the CEEMDAN model has been applied in data preprocessing.
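The boxplot outlier rule quoted above can be sketched with the standard library (quartile conventions vary slightly between tools; `statistics.quantiles` with its default "exclusive" method is used here):

```python
import statistics

def boxplot_outliers(data):
    """Flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (boxplot rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lo or x > hi]
```

Running this per quarter reproduces the kind of screening summarized by the boxplots in Figure 4.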
Another data feature that may cause an unsatisfactory result is the non-linear character of the dataset. It is well known in traditional research that predictions of regular, linear time series easily reach the desired accuracy, whereas unstable, non-linear time series are more difficult to forecast despite the application of novel models, such as machine learning algorithms. A method used to measure the instability of a data series is the recurrence plot (RP) [56]. A recurrence plot is an advanced technique of non-linear data analysis. It is the visualization (a graph) of a square matrix whose elements correspond to the times at which the state of the dynamical system recurs. Stationary systems deliver homogeneous recurrence plots, while unstable systems cause changes in the distribution of recurrence points in the plot, visible and identifiable as brightened areas. In this study, we selected VIC as an example to verify the influence of instability. Before drawing the recurrence plot, the time delay and the dimension of the embedding matrix were determined by the C-C method. Using the "CRP Toolbox" released by Norbert Marwan [57], the recurrence plots of the four quarterly datasets in VIC are shown in Figure 8.
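The recurrence matrix underlying such a plot can be sketched as below. This is a simplified scalar version (no delay embedding): the paper first embeds the series with a C-C-selected delay and dimension, which this sketch omits, and the threshold ε is an illustrative parameter.

```python
def recurrence_matrix(series, eps):
    """R[i][j] = 1 when |x_i - x_j| < eps, else 0 (scalar, non-embedded sketch)."""
    return [[1 if abs(a - b) < eps else 0 for b in series] for a in series]
```

Plotting this 0/1 matrix as an image gives the recurrence plot: homogeneous textures indicate stationarity, while isolated bright regions signal the instability discussed for VIC.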

As the figure shows, the second and third quarters in VIC display relatively homogeneous distributions, while the other quarters display isolated brightened areas. According to the theory of the recurrence plot, the instabilities of the former two samples are weaker, and the other two samples reveal stronger instabilities. Furthermore, we can conclude that the performances of the forecasting models shown in Table 5 are remarkable when the dataset is relatively stable, while an unstable dataset results in an unsatisfactory performance, which cannot be avoided.

Sensitivity Analysis
The hybrid model proposed in this study is based on the structure of the neural network shown in Figure 1. In the hybrid model, the hyperparameters are key factors that influence the model's performance. In most studies on machine learning, the setting of the hyperparameters depends on trials or empirical knowledge. This is why many experimental results cannot be reproduced and why a considerable amount of time and energy is spent on tuning parameters in industrial applications. At present, there is no definitive method to determine the values of all types of hyperparameters. In this study, we also mainly relied on experience and trials to set the hyperparameters, as shown in Table 1. Among the hyperparameters, several need to be highlighted.
The first is the salp population size in MOSSA. In swarm heuristic optimization algorithms, the swarm size is usually a vital factor to consider. A larger population might provide a higher probability of reaching the best individual, but an excessive population increases the complexity of the algorithm, which is related to the number of iterations. Considering the number of parameters in our proposed model, population sizes ranging from 10 to 100 with a step of 10 were evaluated. As a result, we selected 50 as the population size (as shown in Table 6) after comprehensively weighing the time complexity and model performance.
The second type of hyperparameter that needs to be emphasized is the upper and lower bounds of the individual parameters in MOSSA. In our simulation, the datasets were normalized to the range of −1 to 1 in order to avoid the influence of dimension and improve the training speed. Therefore, the absolute values of the weights and thresholds of the neural networks in the training process will not be too large. As Table 6 shows, we set the initial upper and lower bounds to 2 and −2 according to the experimental trials. Excessively wide range limits may increase the difficulty of searching for the best parameters within a limited number of iterations, while an algorithm operating on too small a range may not find the optimal solution.
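The normalization to [−1, 1] mentioned above is a simple linear (min-max) mapping, which can be sketched as:

```python
def normalize(series):
    """Linearly map a series to [-1, 1]: x' = 2*(x - min)/(max - min) - 1."""
    lo, hi = min(series), max(series)
    return [2 * (x - lo) / (hi - lo) - 1 for x in series]
```

Because the inputs are bounded in [−1, 1], trained weights and thresholds stay moderate, which is the rationale given for the ±2 search bounds in MOSSA.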



Consistency Analysis
In this section, in order to verify the consistency of our proposed model, new datasets involving the latest dates are introduced. In addition, several basic comparison models, including long short-term memory (LSTM) networks, function fitting neural networks (FITNET), and the least squares support vector machine (LSSVM), which have been proven to provide good results for STLF, are employed to verify the advantages of the proposed model.
As shown in Table 7, the proposed model also performs well on the new datasets. The CP is almost 90%, which means the predicted interval can cover 90% of the target load values. The consistency of the proposed model can thus be guaranteed, and changing the dates of the dataset will not risk altering the final conclusion.
Considering different basic models for STLF, we chose three widely used artificial intelligence models (LSTM, FITNET, and LSSVM) as comparators to verify the superiority of our approach. As shown in Table 7, the proposed model provides a larger CP and a smaller PINAW compared with the other three models. In particular, LSTM achieves desirably narrow PINAW and PINRW values, but its CPs are not satisfactory. Moreover, the proposed model outperformed the other basic models in AWD. Therefore, the proposed approach has a distinct advantage in short-term power load interval forecasting: it is able to provide a satisfactory CP and restrict the interval width at the same time, which is its most important point of superiority. On the other hand, in order to obtain better performance and accuracy, the proposed approach is more complex, and an algorithm with higher complexity often takes longer in practice. As Table 7 shows, compared with LSSVM and FITNET, the execution times of the proposed model are longer, which is its major disadvantage. However, with the development of hardware, the operational capability of computers can be improved and the execution time reduced. Furthermore, as with any artificial intelligence technique, the fine-tuning of hyperparameters in the proposed model takes time, which is a common situation in academic and industrial fields.

Further Research Prospect
This paper proposes a hybrid interval prediction model to predict power load intervals. Compared with other basic models, this model achieved good results in terms of the coverage, interval width, and deviation error of the prediction interval. The model can obtain relatively high coverage under the condition of a relatively narrow interval width, and the obtained interval can accurately reflect changes in future short-term power load and provide more accurate and reliable support for power dispatch. On the other hand, for datasets with more complex changes and non-linear features, although the performance of the proposed model is improved compared with traditional models, it is still not ideal in some cases. For the unfavorable results caused by the characteristics of the datasets, we may explore the following two aspects in the future: (a) finding and improving prediction methods that can better handle the non-linear characteristics of electrical loads, and improving the performance of predictive models in complex situations; (b) fully analyzing the relevant characteristics of power load data, selecting different models for different characteristics, and using ensemble learning to integrate and enhance the prediction results.

Conclusions
STLF is fundamental to power system planning and operation. However, the power load exhibits both regularity and a certain randomness, which increases the difficulty of accurate and reliable STLF. Moreover, compared with the prediction of specific points, interval prediction may provide more information for decision making in STLF. In this study, based on LUBE, we developed a novel hybrid model including data preprocessing, a multi-objective salp swarm algorithm, and E-LUBE. In theory, such a hybrid model can reduce the influence of noise in a dataset, and the parameter optimization process is more effective and efficient in E-LUBE.
In our proposed approach, we used a multi-objective optimization algorithm to search for the parameters of the neural network and reconstructed the cost function with two interval criteria instead of the point criteria (such as MSE) of the traditional method. As Tables 2-5 show, compared with traditional methods, the proposed approach provides a higher CP and a lower interval width at the same time, making up for the lower CP and higher interval width of the traditional methods.
In order to verify the performance of the proposed model and validate the impact of its constituent components, we collected 16 samples covering four quarters in four Australian states and set up several model comparisons in the experiments. According to the comparisons and analyses, the conclusions are summarized as follows: (a) An efficient data preprocessing method was applied herein. Through decomposition and reconstruction, this method can significantly improve model performance in STLF. (b) Compared with traditional prediction models based on neural networks, the newly developed E-LUBE method has an advantage in terms of comprehensive interval prediction performance. It was validated that the context layer, carrying information from the former hidden layer, can improve model performance. (c) The introduction of the novel multi-objective algorithm MOSSA optimized the parameter search process. The new cost function was based on a double-objective interval index that outperformed the traditional single-objective point error index (such as MSE) in interval prediction. (d) For STLF based on the E-LUBE mechanism, the width coefficient is an important factor. A larger width coefficient may lead to satisfactory CPs, and a smaller width coefficient may result in a satisfactory interval width. Therefore, in practice, the decision maker needs to adjust the width coefficient for specific demands; for example, one may choose the width coefficient with the minimum interval width while the minimum required CP is still guaranteed. (e) No matter how complex the dataset is, the proposed model always provides the best performance compared with the benchmark models. However, because of the complexity of the data itself, the performance is not remarkable in some cases. In general, the proposed model provided the desired result in most cases.
Furthermore, the proposed method has strong practical significance for power grid operators. A highly accurate forecasting method is one of the most important tools for improving power system management, especially in the power market [58]. In actual operation, for secure power grid dispatching, a control center has to predict the subsequent load. The dataset for the predictive model can be constructed from historical data, and the results of the predictive model provide the upper and lower bounds of the load at some point in the future. Based on these bounds, the control center can adjust the quantity of electricity on each charging line. Therefore, such a hybrid approach, which provides more accurate results, can ensure the safe operation of the power grid and improve the economic efficiency of its operation.
Author Contributions: J.W. carried on the validation and visualization of experiment results; Y.G. carried on programming and writing of the whole manuscript; X.C. provided the overall guide of conceptualization and methodology.
Funding: This research was funded by National Natural Science Foundation of China (Grant number: 71671029) and Gansu science and technology program "Study on the forecasting methods of very short-term wind speeds" (Grant number: 1506RJZA187).
Acknowledgments: This work was supported by the National Natural Science Foundation of China (No. 71671029) and the Gansu science and technology program "Study on the forecasting methods of very short-term wind speeds" (No. 1506RJZA187).

Conflicts of Interest:
The authors declare no conflicts of interest.