An Ensemble Learner-Based Bagging Model Using Past Output Data for Photovoltaic Forecasting

Abstract: The trend in energy generation has been shifting from conventional fossil fuels to sustainable sources. In order to reduce greenhouse gas emissions, the share of renewable energy sources must be increased, and solar and wind power are typically driving this energy transition. However, renewable energy sources depend heavily on weather conditions and have intermittent generation characteristics, which embed uncertainty and variability. As a result, they can cause variability and uncertainty in the power system, and accurate prediction of renewable energy output is essential to address this. Many studies have investigated prediction models for this purpose, and machine learning is one of the typical methods. In this paper, we used a bagging model to predict solar energy output. Bagging generally uses a decision tree as its base learner. However, to improve forecasting accuracy, we proposed a bagging model that uses an ensemble model as its base learner and adds past output data as new features. We set the base learners to ensemble models, namely random forest, XGBoost, and LightGBM, and used past output data as new features. The results showed that the ensemble learner-based bagging model using past data features performed more accurately than the bagging model using a single-model base learner with default features.


Introduction
The 196 countries that signed the Paris Agreement in 2015 agreed to make efforts to reduce their artificial greenhouse gas emissions to zero in the second half of the 21st century. This agreement highlighted the need to generate energy through renewable resources and motivated research on how to manage and integrate variable power generation systems, such as solar and wind power, into the grid [1]. Focusing on solar energy, the proportion of solar energy in the power system has been growing with the large drop in photovoltaic (PV) prices [2]. Photovoltaic power is now one of the fastest growing renewable energy technologies and is ready to play an important role in the future global electricity generation mix. According to the International Energy Agency's (IEA) Renewables 2018, solar power plants accounted for more than two-thirds of the world's net electricity capacity growth in 2017. The world's total renewable-based power capacity is expected to grow 50 percent between 2019 and 2024, with solar power accounting for 60 percent of the rise [3,4].
The high penetration of PV in the power system provides many economic benefits, but solar energy, with its characteristics of variability and intermittency, can pose challenges to the safe operation and reliability of the power system. Power system operators must ensure an accurate balance between electricity production and consumption at all times, and effective forecasting techniques have become important for preparing the grid integration of renewable energy sources despite this instability [5]. Solar power forecasting has the following advantages: (1) effective operation of the power grid [6], (2) optimal management of the energy fluxes occurring in the solar system, (3) estimating the reserves, (4) scheduling the power system, (5) congestion management, (6) optimal management of storage under stochastic production, (7) trading the produced power in the electricity market, and (8) reduction of the costs of solar power generation. Accurate predictions not only contribute to reducing the uncertainty of power generation forecasts but also support the stable operation of the system. In addition, they allow photovoltaic plant operators to avoid penalties that may arise from differences between forecasted and produced energy, and they benefit energy consumers through cost savings [7][8][9][10].
The forecast of solar power can be performed by several methods, and machine learning is the typical method that we focused on in this study. Many algorithms for photovoltaic power forecasting have been proposed, and remarkable results have been achieved by experts and scholars. The following summarizes the state of the art of photovoltaic prediction based on machine learning methods. Artificial Neural Networks (ANNs) are useful for data analysis and prediction and are increasingly used for nonlinear regression and classification problems [11]. Markov chains are stochastic processes having the Markov property: in a Markov process, the present state fully captures all the information that could affect the future evolution of the process [12]. K-Nearest Neighbor (k-NN) is one of the simplest machine learning algorithms, based on pattern recognition; it compares the current state with training sets in a feature space [13]. Support Vector Machine (SVM) stands out for its ability to deal with nonlinear problems. SVM has three main parameters that strongly affect the performance of the technique, and these regulate the kernel function used to transform the predictors; SVM has shown great potential in many studies [14]. Recently, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and deep neural networks have been used in many studies and have proved useful in predicting time series [15][16][17]. LightGBM and XGBoost have recently gained attention on the Kaggle platform for their good performance. However, both XGBoost and LightGBM tend to overfit at times, as they are both based on decision trees. To improve forecasting accuracy and reduce overfitting at the same time, we proposed an ensemble learner-based bagging model.

Machine Learning
Machine learning is a subfield of computer science and a branch of artificial intelligence. Its advantage is that a model can solve problems that cannot be represented by explicit algorithms [18]: machine learning models the relationships between inputs and outputs even when an explicit representation is impossible. There are three main learning methods: supervised learning, unsupervised learning, and ensemble learning. In supervised learning, the computer is given inputs and outputs, and the goal is to learn a general rule that maps inputs to outputs [19]. In contrast to supervised learning, an unsupervised learning model does not need outputs; it is able to find hidden structure in its inputs [20]. The basic concept of ensemble learning is to train multiple base learners as ensemble members and to combine their predictions into a single output. In general, the base learner of an ensemble model is a decision tree, and the ensemble model is known to give better results than using a single model. In this paper, we propose a bagging ensemble model and, as a base learner, we use an ensemble model rather than a decision tree; thus, we call it an ensemble of ensembles.

Decision Tree
A decision tree is a representative base learner used in most ensemble models in machine learning. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. At each node in the tree, the tree splits into branches based on a condition; depending on the outcome of the condition, either the left or the right sub-branch of the tree is selected. Eventually a leaf node, i.e., a branch that does not split any further, is reached, and this is the decision [21][22][23]. A decision tree has several advantages: it is easy to understand, requires little data cleaning, places no constraints on data types, and is a nonparametric method. However, it is prone to overfitting, which is one of the most practical difficulties for decision tree models.

Bagging
Bagging, used in statistical classification and regression, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms. It reduces the variance, which affects the performance of the forecasting model, and helps to prevent overfitting [24]. The bagging model is shown in Figure 1, and its algorithm is as follows:
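As an illustration, the bootstrap-and-average procedure can be sketched in Python as below. This is a minimal sketch using synthetic data, not the paper's PV data set; the function name and parameters are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagging_fit_predict(X, y, X_test, n_estimators=25, seed=0):
    """Classic bagging: fit each base learner on a bootstrap sample
    of the training data and average the individual predictions."""
    rng = np.random.default_rng(seed)
    n = len(X)
    preds = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)          # bootstrap: draw n rows with replacement
        tree = DecisionTreeRegressor(max_depth=5, random_state=0)
        tree.fit(X[idx], y[idx])
        preds.append(tree.predict(X_test))
    return np.mean(preds, axis=0)                 # aggregate by averaging

# toy usage with a synthetic "irradiation -> output" relationship
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(0.0, 0.1, size=200)
y_hat = bagging_fit_predict(X, y, X)
```

Averaging over the bootstrap replicates is what reduces the variance of the unstable base learner.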

As to why bagging can improve predictability, it can be explained by the fact that the expected loss of the average forecasting model is less than the expected loss of a single forecasting model. A model $\hat{f}(x)$, built using a given learning set $\mathcal{L}$, depends highly on $\mathcal{L}$. To highlight this, it is written as $\hat{f}(x) = \hat{f}(x, \mathcal{L})$, and for a given forecasting model, the mean forecasting model is defined as $f_A(x) = E_{\mathcal{L}}\,\hat{f}(x, \mathcal{L})$. Here, the expected value uses the distribution of the population from which the training data were obtained, and the theorem below shows that the expected loss of the average forecasting model is less than the expected loss of a single forecasting model.

Let us say that $(X, Y)$ is a future observation that is independent of $\mathcal{L}$. For the square loss function $L(y, a) = (y - a)^2$, the expected losses of $\hat{f}(x, \mathcal{L})$ and $f_A(x)$, denoted $R$ and $R_A$, are defined as follows:

$$ R = E_{\mathcal{L}}\,E_{X,Y}\big(Y - \hat{f}(X, \mathcal{L})\big)^2, \qquad R_A = E_{X,Y}\big(Y - f_A(X)\big)^2 \quad (1) $$

As the square function is a convex function, Equation (2) is established by the Jensen inequality:

$$ E_{\mathcal{L}}\,\hat{f}(X, \mathcal{L})^2 \;\ge\; \big(E_{\mathcal{L}}\,\hat{f}(X, \mathcal{L})\big)^2 = f_A(X)^2 \quad (2) $$

Then, $R$ is always greater than or equal to $R_A$ [25]. The bagging model can be applied not only in PV forecasting but also in various other fields [26][27][28][29].
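The inequality $R \ge R_A$ can be checked numerically. The sketch below is an illustrative simulation (not part of the original study): it draws many learning sets for a deliberately unstable polynomial learner and compares the average single-model loss with the loss of the averaged model.

```python
import numpy as np

x_test = np.linspace(0.0, 1.0, 50)
f_true = lambda x: np.sin(2 * np.pi * x)
y_test = f_true(x_test)

def fit_predict(seed):
    # a deliberately unstable learner: degree-5 polynomial fit to 20 noisy points
    r = np.random.default_rng(seed)
    x = r.uniform(0.0, 1.0, 20)
    y = f_true(x) + r.normal(0.0, 0.3, 20)
    coef = np.polyfit(x, y, 5)
    return np.polyval(coef, x_test)

# one row of predictions per learning set L
preds = np.array([fit_predict(s) for s in range(200)])

R = np.mean((preds - y_test) ** 2)                  # E_L of the single-model loss
R_A = np.mean((preds.mean(axis=0) - y_test) ** 2)   # loss of the averaged model f_A
```

Because $R - R_A$ equals the average variance of the predictions over $x$, the gap is guaranteed to be nonnegative for the square loss, which is exactly what the simulation shows.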

Random Forest
Random forests randomize not only the input data but also the input variables. By averaging the results from multiple trees, the variance is reduced and the overall performance of the model improves [30]. In particular, when a random forest has a large number of input variables, it often shows better performance than bagging and boosting. The random forest algorithm is as follows: (1) for a learning set $\mathcal{L} = \{(x_i, y_i)\}_{i=1}^{n}$, $y_i \in \mathbb{R}$, create a bootstrap sample $\mathcal{L}^* = \{(x_i^*, y_i^*)\}_{i=1}^{n}$ by drawing $n$ observations with replacement; (2) grow a tree on $\mathcal{L}^*$, selecting a random subset of the input variables as split candidates at each node; (3) repeat and average the resulting trees. While bagging is a method of reconstructing data to make the model diverse, random forest reconstructs variables as well as data, reducing the variance of the model and resulting in better performance than general bagging. The variance of the bagging model consists of the variance of the trees and their covariance, respectively (Equation (3)):

$$ \mathrm{Var}\Big(\frac{1}{B}\sum_{i=1}^{B} T_i\Big) = \frac{1}{B^2}\sum_{i=1}^{B}\mathrm{Var}(T_i) + \frac{1}{B^2}\sum_{i \neq j}\mathrm{Cov}(T_i, T_j) \quad (3) $$

where $B$ is the number of trees and $T_i$ is the prediction of the $i$-th tree.
Although the samples are selected randomly with replacement from the full data set, each tree shares a large amount of overlapping data, so the trees can hardly be called independent. In short, Cov(X, Y) is not zero, which means that as the number of trees increases, the variance of the overall model may increase. A way to reduce the covariance between the trees is therefore needed, and this is what random forests accomplish. Studies using random forests for PV forecasting can be found in references [31][32][33][34].
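The difference between resampling rows only (bagging) and additionally subsampling variables (random forest) can be sketched with scikit-learn. The data below are synthetic stand-ins and the parameter values are illustrative, not the study's settings.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                       # five synthetic input variables
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0.0, 0.1, 300)

# bagging resamples only the rows of the data
bag = BaggingRegressor(DecisionTreeRegressor(max_depth=5),
                       n_estimators=50, random_state=0).fit(X, y)

# a random forest additionally subsamples the variables considered at each
# split (here 2 of 5), which decorrelates the trees and lowers their covariance
rf = RandomForestRegressor(n_estimators=50, max_features=2,
                           max_depth=5, random_state=0).fit(X, y)
```

The `max_features` parameter is the knob that controls the variable subsampling described above.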

Boosting
Boosting is an ensemble model that uses decision trees as weak learners and builds the model in a stagewise manner by optimizing a loss function [35]; it is a method that converts weak learners into strong learners.
In this paper, we used XGBoost and LightGBM, which are types of gradient boosting. Gradient boosting is a generalization of boosting to arbitrary differentiable loss functions. Gradient boosting is an accurate and effective off-the-shelf procedure that can be used for both regression and classification problems.
Let us say that there is a model $h_0$ that takes an input $x$ and predicts the variable $y$ (Equation (4)):

$$ y = h_0(x) + \mathrm{error} \quad (4) $$

If the error is not unpredictable random noise, the most intuitive way to increase the forecasting performance is to eliminate it, and removing this error is the basic concept of gradient boosting. In short, rather than predicting $y$, gradient boosting at each next step predicts the remaining error and lowers it. The process is shown in Equation (5):

$$ \mathrm{error} = h_1(x) + \mathrm{error}_2, \quad \mathrm{error}_2 = h_2(x) + \mathrm{error}_3, \quad \ldots, \quad y = h_0(x) + h_1(x) + h_2(x) + \cdots + \mathrm{error}_n \quad (5) $$

Unlike plain gradient boosting, however, XGBoost adds a regularization term to the objective function, which prevents the model from overfitting; through the regularization term, XGBoost imposes penalties on complex models. The mathematical formula is given in Equation (6):

$$ \mathrm{obj} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{t} \Omega(f_k) \quad (6) $$

where $t$ is the number of trees, $f_k$ is the output value of the $k$-th tree, $\Omega$ is a regularization term that measures the complexity of the model and avoids overfitting, and $l$ is the distance between $y_i$ and $\hat{y}_i$, which is used to measure the training error [36].
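The stagewise residual-fitting idea of Equation (5) can be sketched for the square loss as below. This is a simplified illustration of plain gradient boosting, not XGBoost's regularized objective; function names and parameters are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_stages=50, lr=0.1):
    """Stagewise boosting for square loss: each new tree is fit to the
    residual (the error left by the current model), as in Equation (5)."""
    base = float(y.mean())                  # h_0: a constant model
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_stages):
        residual = y - pred                 # predict the error, not y itself
        tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residual)
        pred = pred + lr * tree.predict(X)  # one small step toward lower error
        trees.append(tree)
    return trees, base

def gradient_boost_predict(X, trees, base, lr=0.1):
    return base + lr * sum(t.predict(X) for t in trees)

# toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * X[:, 0]) + rng.normal(0.0, 0.05, 200)
trees, base = gradient_boost_fit(X, y)
pred = gradient_boost_predict(X, trees, base)
```

The learning rate `lr` corresponds to taking a damped step at each stage so that no single tree dominates the sum.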
LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages: faster training speed and higher efficiency, lower memory usage, better accuracy, support for parallel and GPU learning, and the capability to handle large-scale data. While XGBoost grows trees level-wise, LightGBM grows them leaf-wise to further reduce the loss (Figure 2). It can be more than twice as fast as XGBoost with the same parameters, but it requires a large amount of training data because it is sensitive to overfitting. Uses of XGBoost and LightGBM in forecasting can be found in references [37,38].

Ensemble Model
In this paper, we used an ensemble model as the base learner of the bagging model, instead of the decision tree that is widely used as the base learner of the bagging algorithm. In other words, the ensemble model itself is viewed as a base learner (Figure 3).
Basically, using an ensemble model provides better performance than using a single model. The algorithms of boosting models, such as the XGBoost and LightGBM used in this study, tend to be sensitive to hyperparameters: forecasting performance changes depending on how the hyperparameters are set. If we use many models, more hyperparameters are created accordingly, which makes the result less susceptible to any single hyperparameter. This means that good performance can be expected regardless of hyperparameter tuning, and it also reduces inefficient time spent on tuning. In short, by using an ensemble model as the base learner of the bagging algorithm, we can expect two outcomes: (1) better performance than a single model, by using an ensemble model as the base learner; and (2) good performance regardless of hyperparameter tuning.
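The proposed "ensemble of ensembles" can be sketched with scikit-learn by passing an ensemble as the estimator of `BaggingRegressor`. The sketch uses a random forest base learner and synthetic stand-in data; the study also used XGBoost and LightGBM base learners, which would slot in the same way.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
# synthetic stand-ins for hour, temperature, humidity, irradiation
X = rng.normal(size=(200, 4))
y = 1.5 * X[:, 3] + 0.3 * X[:, 1] + rng.normal(0.0, 0.1, 200)

# bagging with an ensemble (a random forest) as the base learner:
# an "ensemble of ensembles"
model = BaggingRegressor(
    RandomForestRegressor(n_estimators=20, max_depth=5, random_state=0),
    n_estimators=5,
    random_state=0,
).fit(X, y)
```

Each bagging member here is itself a full random forest fit on a bootstrap sample, which is exactly the structure described above.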


Datasets and Preprocessing
In this study, the prediction model was built using one year (2016) of data from a PV plant in South Korea, with the data consisting of hourly temperature, humidity, irradiation, and actual output. Solar power is an energy source greatly affected by weather conditions. In particular, irradiation is the most influential factor in the output, and many studies have recognized the importance of predicting solar irradiation effectively; some overviews can be found in references [39][40][41]. As the power system in the Republic of Korea is operated on a one-hour basis, the data used for forecasting solar power output were one-hour timeslot data. The ratio of training data to test data was set to 80:20, and the test data were set to the last three to five days of each month, rather than extracted randomly, to take the monthly and seasonal characteristics into account. The following lists the general characteristics of a PV plant in South Korea [42][43][44][45][46]:

Feature Engineering and Selection
The data features used were date, time, humidity, temperature, irradiation, and actual output. The value at time t is greatly affected by the value at time t-1; such past values are known as lags, and in this study two lagged outputs were added as new features.
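Constructing such lag features can be sketched with pandas as below. The series and column names are illustrative synthetic stand-ins, not the study's actual schema; with one-hour timeslot data, the previous-day and two-days-prior outputs are 24- and 48-step shifts.

```python
import numpy as np
import pandas as pd

# synthetic hourly output series covering 10 days (240 one-hour timeslots)
output = np.clip(np.sin(np.arange(240) * 2.0 * np.pi / 24.0), 0.0, None)
df = pd.DataFrame({"output": output})

# D-1 and D-2 outputs as 24- and 48-row lags of the output column
df["output_d1"] = df["output"].shift(24)
df["output_d2"] = df["output"].shift(48)
df = df.dropna()   # the first two days have no lagged values
```

`shift` leaves NaN in the rows that have no past value, which is why the first two days are dropped before training.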
First, the date, time, humidity, temperature, irradiation, and the one-day-before (D-1) and two-days-before (D-2) outputs were set as features, and the output was set as the label, i.e., the target. Second, the Pearson correlation coefficient was calculated for all pairs of variables. It takes a value between −1 and 1 and is mainly used to approximate the characteristics of variables by identifying one-to-one correspondence between continuous variables [47]. When one variable increases as the other increases, the coefficient is positive and closer to 1; in the opposite case, it is negative and closer to −1; if there is no relationship, it is close to zero. The formula for the Pearson correlation coefficient of two n-dimensional vectors X and Y is given by Equation (7):

$$ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \quad (7) $$

where $\bar{x}$ is the mean of X and $\bar{y}$ is the mean of Y.
The Pearson coefficient method was used to evaluate the correlation between the features and the label. The results are shown in Table 1, from which the following observations were drawn:
• The order of correlation (based on absolute value) is irradiation, lagged output, humidity, temperature, and time.
• There is a positive relationship with all variables except humidity.
• Lagged output has a stronger relationship than the weather conditions except for irradiation.
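Equation (7) can be implemented directly. The sketch below uses synthetic, illustrative data (a strongly, positively related irradiation–output pair) and checks the hand-written formula against NumPy's built-in `np.corrcoef`:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two n-dimensional vectors, Equation (7)."""
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm * ym).sum() / np.sqrt((xm ** 2).sum() * (ym ** 2).sum()))

# synthetic example: output strongly and positively tied to irradiation
rng = np.random.default_rng(0)
irradiation = rng.uniform(0.0, 1000.0, 500)
output = 0.8 * irradiation + rng.normal(0.0, 50.0, 500)
r = pearson(irradiation, output)
```

A coefficient near 1, as obtained here, matches the table's finding that irradiation is the most strongly correlated feature.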

Modeling and Results
The hyperparameters adjusted in this prediction model were of three types: the learning rate, a tuning parameter in an optimization algorithm that determines the step size at each iteration; the number of samples; and the maximum depth of the tree. They were set to 0.1, 100, and 5, respectively, for all models. To verify the effectiveness of the model and assess its performance, the mean absolute error (MAE) and root mean square error (RMSE) were used as indicators. The MAE shows the average distance between the measured values and the model predictions. The RMSE is more sensitive to big forecast errors and hence is suitable for applications where small errors are more tolerable and larger errors cause disproportionately high costs; it is probably the reliability factor that is most appreciated and used [23]. The MAE and RMSE are shown in Equations (8) and (9):

$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \quad (8) $$

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \quad (9) $$

where $n$ is the number of samples in the test set, $\hat{y}_i$ is the forecasted photovoltaic power output, and $y_i$ is the real photovoltaic power output.

In this study, the major contributions are as follows:
• We used lagged output data as new features to improve forecast accuracy.
• We used an ensemble model as the base learner in the bagging model, which gave better performance than using a single model (in this paper, a decision tree) as the base learner.

We named the two feature sets used in this study as follows:
• Default features: the given variables, including hour, temperature, humidity, and irradiation.
• New features: the default features plus the past output data, i.e., hour, temperature, humidity, irradiation, D-1 output, and D-2 output.

From Tables 2 and 3 and Figures 6-9, the MAE decreased in the ten months other than June and October. Similarly, the RMSE decreased in nine months, the exceptions being June, July, and October. In May and August especially, the error decreased remarkably. Tables 4 and 5 show how much the monthly error decreased when using the new features. Tables 6 and 7 show the average MAE and RMSE ranks of the default features and the new features; all bagging models using an ensemble base learner show lower error than when using a single model as the base learner.
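The error metrics of Equations (8) and (9) can be implemented directly; the small worked example below uses made-up numbers, not the study's results:

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error, Equation (8)."""
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    """Root mean square error, Equation (9)."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# worked example: absolute errors of 1, 0, 1, 2 give MAE = 1.0
# and RMSE = sqrt(1.5) ~ 1.22
y = np.array([0.0, 2.0, 4.0, 6.0])      # measured output
y_hat = np.array([1.0, 2.0, 3.0, 8.0])  # forecasted output
```

The example shows why RMSE exceeds MAE whenever the errors are unequal: the squaring weights the large error of 2 more heavily.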

Conclusions
The change in energy mix from conventional to sustainable energy will increase the penetration of solar energy to the power system. In order to cope with solar penetration, which is highly affected by weather conditions, and to form a benefit value chain ranging from grid operator to plant operator and energy consumer, improved forecasting technology is necessary.
In this study, we formed a bagging model for photovoltaic forecasting, and in order to improve forecasting performance, the following two approaches were proposed. First, we proposed an ensemble model as the base learner of the bagging algorithm, rather than the decision tree that is generally used as the base learner of the bagging model. Second, we proposed using past output data as new features. The proposed model showed better performance than the existing method, and the detailed conclusions of the simulations are as follows: (1) the overall performance of using an ensemble model as the base learner in the bagging predictor was better than that of a decision tree-based bagging predictor; (2) adding past output as new features, instead of using only weather conditions, provided better prediction performance and reduced the error by up to 50% or more; (3) the ensemble models used as base learners of the bagging model were random forest, XGBoost, and LightGBM, and they did not show much difference in performance.
The improvement achieved by the model was demonstrated in the reported data. However, the MAE is still quite high: the presented model showed good performance but needs further improvement to become an accurate model. As mentioned previously, in order to compare the performance of the single model learner-based bagging model and the ensemble model learner-based bagging model, we did not tune each base learner's hyperparameters for optimization; in other words, the same hyperparameters were used for all models. Using an ensemble model as the base learner reduced the impact of the hyperparameters, but the error could certainly be lower than it is now if optimal hyperparameters were set before making the prediction. To improve the model's accuracy, we plan to conduct additional studies, such as hyperparameter optimization and data cleaning.