Cooling Load Forecasting via Predictive Optimization of a Nonlinear Autoregressive Exogenous (NARX) Neural Network Model

Abstract: Accurate calculation and prediction of the heating and cooling loads in buildings play an important role in the development and implementation of building energy management plans. This study aims to improve the accuracy of cooling load forecasts using an optimized nonlinear autoregressive exogenous (NARX) neural network model. Preprocessing of the training data and tuning of the model parameters were investigated for model optimization. In predictive models of cooling loads, removing missing values and adjusting the structural parameters have been shown to improve the predictive performance of a neural network model. In this study, preprocessing the training data eliminated missing values for times when the heating, ventilation, and air-conditioning (HVAC) system was not running, and the structural and learning parameters were adjusted to optimize the model.


Introduction
Buildings consume about 35% of the world's total energy and are responsible for 75% of greenhouse gas emissions [1,2]. During a building's life cycle, considerable energy is consumed for operations, especially lighting, heating and cooling, ventilation, and transportation. Among these operations, heating and cooling the building accounts for the largest proportion (60%) of energy consumption [3]. As a result, management tools such as building energy management systems are deployed at the building operations stage, and demand management is important for the heating/cooling energy that dominates consumption. Accurate predictions of the heating and cooling loads of buildings are therefore critical to reducing energy consumption. Passive measures, such as insulation and windows, should be considered in the design phase; active measures can be addressed by deploying a heating, ventilation, and air-conditioning (HVAC) system of proper capacity. In short, a building's energy performance during its operational stage can be optimized by predicting the heating and cooling loads under changing climate conditions, adjusting operating schedules, and controlling the HVAC system properly.
Researchers have conducted numerous studies in recent years to improve load prediction accuracy and have actively pursued machine learning. Machine learning has been used to forecast demand in various fields; for example, power demand forecasting has been ongoing since the 1990s [4]. Peng et al. [5] proposed a combined model that uses two artificial neural network (ANN) models (Box and Jenkins) to predict cooling loads; their results show a mean absolute percentage error of less than 2.1%. Kwok et al. [6] developed a probabilistic entropy-based neural model to forecast cooling loads. They simulated a building using dynamic occupancy area and rate as input parameters and found that building occupancy data played an important role in load prediction and improved the prediction accuracy significantly. Ding et al. [7] also investigated the effect of changes in input variables on prediction accuracy. Their analysis applied an ANN model and a support vector machine to combinations of eight input variables; the optimized combinations of variables obtained by K-means and hierarchical clustering provided accurate results, and historical cooling capacity data were found to affect prediction accuracy. Jihad et al. [8] forecast heating and cooling loads in residential buildings with respect to orientation, relative compactness, glazing rate, wall surface area, height, and surface area; their prediction accuracy was 98.7% in the training period and 97.6% in the testing period. Koschwitz et al. [9] evaluated data-driven monthly thermal load predictions using two nonlinear autoregressive exogenous recurrent neural networks (NARX RNNs) of different depths and a linear epsilon-insensitive support vector machine (ε-SVM) regression model. Their results indicate that the NARX RNN method provides more accurate predictions than the ε-SVM regression model.
Powell et al. [10] used a neural network model, a linear ARX model, and a NARX model to estimate the heating and cooling electrical loads of a university campus and evaluated the predictive performance of each model. The NARX model showed the best results and was found to be suitable for time series prediction [10,11].
As the literature shows, predictions that use machine learning algorithms offer advantages in that they do not require complex modeling compared to simulation methods that are based on mathematical models. As shown, researchers have conducted various studies of heating and cooling load prediction methods that employ machine learning algorithms [12][13][14][15][16]. However, in order to improve the accuracy of such predictions, the researchers either combined several neural network models or utilized complex structures, such as deep layers. Estimating cooling loads requires considerable information, such as building envelopes, insulation performance, location, and complex calculations. Moreover, a skilled engineer is needed for accurate prediction. For this study, we propose a model optimization method that improves the cooling load prediction performance of a simple structure using a NARX feedforward neural network.

Generation of Reference Building
Cooling load data were generated according to the outdoor environment during the cooling period (i.e., warm/hot weather) using a standard energy consumption pattern (annual energy consumption per area) for the reference building used in this study. The generated data were used for training and testing in the neural network analysis. The reference building model is a large-scale office building from the ANSI/ASHRAE/IES Standard 90.1-2010 Prototype Building Models defined by the U.S. Department of Energy (DOE) [17]. The site corresponds to international climate zone 4, and weather data in the test reference year (TRY) format provided by the Korea Meteorological Administration were used. The input variables for the reference building were considered in terms of architectural energy performance and included core size, roof type, main structure type, construction year, heat transmission rate, window area ratio, and other factors such as the type of heat source and air conditioning. Table 1 shows the boundary conditions for the simulation, including weather and location data adjusted to reflect local conditions.
From the TRY weather data for Seoul, Korea, 26,280 hourly simulation data points were generated for the years 2016 to 2018. The data from May to September, the summer cooling period, were selected for the analysis in this study.

Predictive Model Using An Artificial Neural Network
Cooling load forecasting was performed using the NARX feedforward neural network model in the MATLAB Neural Network Toolbox (R2018b). According to the literature, the NARX neural network model is very accurate for time series forecasting [18][19][20][21]. The model consists of an input layer, hidden layer(s), and an output layer. During learning, the input layer receives the external conditions, seasonality data, and historical sensible cooling load data as input values. In the hidden layer, neural network operations are performed by the internal neurons after receiving the signal from the input layer; in this study, the number of hidden layers and neurons was varied for comparison. The output layer produces the predicted sensible cooling load.
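The study used MATLAB's toolbox; as an illustrative sketch only (not the authors' code), the defining feature of a NARX model, regressing each load value on both lagged load values and lagged exogenous inputs, can be shown by how its training pairs are built. The function name and flat-list layout here are hypothetical.

```python
def make_narx_pairs(load, exog, delays):
    """Build NARX training pairs: the target load[t] is regressed on the
    previous `delays` load values (the autoregressive part) and the previous
    `delays` exogenous values (e.g., outdoor temperature), which is what
    distinguishes NARX from a plain feedforward regression."""
    pairs = []
    for t in range(delays, len(load)):
        features = load[t - delays:t] + exog[t - delays:t]
        pairs.append((features, load[t]))
    return pairs
```

Each feature vector would then be fed to the input layer of the feedforward network, with the hidden and output layers producing the one-step-ahead sensible cooling load.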

Set-up Conditions
Predictions were performed under various conditions in order to improve forecast accuracy with the NARX feedforward neural network model. An energy simulation program was run to generate cooling data for the years 2016 to 2018; the 2016-17 data were used for training and the 2018 data for testing. Initially, the model learned the training data well and achieved high prediction accuracy on them, but it overfit and yielded poor prediction performance on the test data. Therefore, the conditions of the predictive model were adjusted to prevent overfitting of the neural network.
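The year-based split described above can be sketched as follows (a minimal illustration; the record layout with a "year" key is a hypothetical assumption, not the study's data format):

```python
def split_by_year(records):
    """Split hourly records into a training set (2016-2017) and a testing
    set (2018), mirroring the study's set-up conditions."""
    train = [r for r in records if r["year"] in (2016, 2017)]
    test = [r for r in records if r["year"] == 2018]
    return train, test
```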
(1) Determination of input values. Several input values are needed to forecast cooling loads, including outside conditions, seasonality data, and historical sensible cooling load data. The outside conditions include the dry-bulb temperature, wet-bulb temperature, dew-point temperature, and relative humidity.
(2) Preprocessing of data. The prediction accuracy of the ANN algorithm was evaluated with respect to preprocessing of the data. The training data used in this study were normalized so that the data sets had values between 0 and 1; Equation (1) was used for the normalization. The accuracy was then evaluated according to whether or not missing values in the data set were processed. Missing values arose on weekends, on holidays, and during off-hours of the HVAC system. Table 2 presents the number of data points used for each condition. Using EnergyPlus, 26,280 hourly data points were generated over three years. Among these, the data points from May to September (the hot season in Korea, when cooling is most needed) were selected for the study, and 3858 data points remained after the missing values were eliminated.
(3) Optimal structural parameters. The hidden layers that form the structure of an ANN contain neurons, and the numbers of hidden layers and neurons affect the accuracy and determine the amount of learning. These structural parameters, i.e., the number of hidden layers and the number of neurons, were the variables used in this study. Table 3 summarizes their ranges.

Table 3. Values for structural parameters used for optimization.
Number of hidden layers: 1~5
Number of neurons: 1~20

(4) Optimal learning parameters. The prediction accuracy was evaluated with respect to changes in the learning rate and the number of epochs, which served as the learning parameters in this study. The learning rate determines how strongly the weights are updated after each learning step; an epoch is one pass through the entire data set. Table 4 shows the ranges of these two parameters.

Table 4. Values for learning parameters used for optimization.
Epochs: 100~1000
Learning rate: 0.0001~1.0
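The preprocessing in step (2), normalization to [0, 1] and removal of HVAC-off records, can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and Equation (1) is assumed to be the standard min-max scaling implied by the 0-to-1 range.

```python
def min_max_normalize(values):
    """Assumed form of Equation (1): scale each value into [0, 1] via
    (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def drop_missing(loads):
    """Remove records with missing load values, i.e., hours when the HVAC
    system was off (weekends, holidays, off-hours)."""
    return [v for v in loads if v is not None]
```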

Performance Evaluation Index
The cooling load forecast results were evaluated according to the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) [22], the Federal Energy Management Program (FEMP) in the United States [23], and the International Performance Measurement and Verification Protocol (IPMVP) guidelines [24]. Table 5 shows the respective measurement and verification protocols for the energy management of buildings according to ASHRAE, FEMP, and IPMVP. Each specification provides accuracy criteria for a building energy model to predict building energy performance. The coefficient of variation of the root mean square error (CvRMSE) was used as a performance index.
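CvRMSE is the root mean square error of the prediction divided by the mean of the measured values, expressed in percent. A minimal implementation is sketched below; it divides by n rather than the n − p used in some ASHRAE formulations, a simplification noted here as an assumption.

```python
import math

def cv_rmse(measured, predicted):
    """CvRMSE (%) = 100 * RMSE / mean(measured)."""
    n = len(measured)
    mean_y = sum(measured) / n
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)
    return 100.0 * rmse / mean_y
```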

Determination of Missing Values
Missing values, which are induced by operational conditions, scheduling, holidays, etc., were eliminated from the data sets. The data set contained 11,016 points including the missing values and 3852 points after they were deleted. The preliminary prediction model used one hidden layer, five neurons, 200 epochs, and a learning rate of 1. Table 6 summarizes the cooling load prediction results with and without the missing values. During the training period, the average CvRMSE differed noticeably: 20.5% with the missing values and 5.9% without them. The testing period likewise confirms the improvement, with a CvRMSE of 27.6% before removal of the missing values and 11.1% after removal. Although ASHRAE Guideline 14 was satisfied regardless of whether the missing values were removed, the prediction accuracy increased significantly in both the training and testing periods when they were removed.

Optimization of Structural Parameters
Table 7 summarizes the optimized structural parameters in terms of the number of hidden layers (No. H) and the number of neurons (No. N). During the training period, the minimum CvRMSE was less than 10% regardless of the number of hidden layers as long as the number of neurons was ten or fewer. When the number of neurons was 15, however, the minimum CvRMSE exceeded 10% (13% with four hidden layers and 16.4% with five). With 20 neurons, the CvRMSE increased further as the number of hidden layers increased, with five hidden layers producing a maximum CvRMSE of 21.1%. During the testing period, the minimum CvRMSE was less than 7% regardless of the number of hidden layers as long as the number of neurons was three or fewer; as the number of neurons increased beyond this, the CvRMSE gradually increased.
When the number of neurons exceeded 15, the prediction accuracy degraded significantly: the CvRMSE reached 98.6% with five hidden layers and 128.5% with five hidden layers and 20 neurons. The amount of learning is determined by the number of hidden layers and the number of neurons, and the prediction accuracy declined due to overfitting as both increased. To determine the final numbers of hidden layers and neurons, predictions were performed ten times each with one to five hidden layers and one to three neurons. Figure 1 shows the prediction results for the testing period. With one neuron, the best results were obtained regardless of the number of hidden layers. As the number of neurons increased, the mean and standard deviation of the results increased; with three neurons in particular, the accuracy and reproducibility of the predictions declined significantly. The optimal configuration was determined to be two hidden layers with one neuron, which yielded the lowest standard deviation.
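The selection rule described above, repeating each configuration several times and preferring an acceptable mean error with the lowest run-to-run spread, can be sketched as follows. The 30% hourly criterion from ASHRAE Guideline 14 is used as the acceptance threshold, and the data layout and function name are hypothetical.

```python
import statistics

def select_structure(runs):
    """`runs` maps (n_hidden_layers, n_neurons) to a list of CvRMSE values
    from repeated trainings. Keep configurations whose mean CvRMSE meets
    the ASHRAE hourly criterion (< 30%), then pick the one with the lowest
    standard deviation, i.e., the most reproducible configuration."""
    ok = {cfg: v for cfg, v in runs.items() if statistics.mean(v) < 30.0}
    return min(ok, key=lambda cfg: statistics.pstdev(ok[cfg]))
```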

Optimization of Learning Parameters
The learning rate (LR in Table 8) and the number of epochs were adjusted to optimize the learning parameters based on the selected structural parameters. Table 8 summarizes the results of the learning parameter optimization with two hidden layers and one neuron. When the learning rate and epochs were varied, the results showed no significant difference from the average CvRMSE obtained before optimizing the learning parameters under any condition.

The CvRMSE values reported in Table 8 range from approximately 6.0% to 6.5% across all learning rates and epochs. Figure 2 shows the CvRMSE with respect to the various learning parameters. For all conditions, the average CvRMSE was 6.5%~6.6%, which is not significantly different from the minimum value, with a low standard deviation. Good predictions were obtained regardless of changes in the learning parameters once the structural parameters had been optimized as described in the previous section.
Table 9 summarizes the minimum CvRMSE with respect to the learning parameters when the structural parameters were not optimized, showing how far prediction accuracy can be determined by the learning parameters alone. The structural parameters were fixed at two hidden layers and ten neurons, which yielded CvRMSE values of 5.9% in the training period and 23.1% in the testing period. Adjusting the learning rate and epochs increased the CvRMSE slightly during the training period but otherwise mostly maintained the same prediction accuracy as before the adjustment.
Figure 3 presents the comparative results after ten predictions for each condition without optimizing the structural parameters. The minimum CvRMSE met the ASHRAE guideline criteria, but the results fell decidedly outside the criteria when the analyses were repeated. Once the structural parameters were optimized, adjusting the learning parameters had an insignificant effect on prediction accuracy; without optimizing the structural parameters, a large improvement in prediction accuracy cannot be expected from adjusting the learning parameters alone. As the amount of learning increased, the prediction accuracy decreased. These results indicate that the ANN prediction model should be optimized by adjusting the structural parameters first.

Forecasting the Cooling Load
The main parameters of the ANN model needed for cooling load predictions were optimized in this study. The cooling load was predicted by applying the optimized parameters and was verified against the data generated for the reference building. The structural parameters were set to two hidden layers and one neuron, the configuration with the best prediction accuracy. The learning parameters with the lowest standard deviations, i.e., 200 epochs and a learning rate of 0.01, were used for the analysis. Figure 4 shows the generated and predicted cooling loads for the training and testing periods; the predicted cooling loads match the generated cooling loads well.

Figure 5 provides a summary of the cooling loads for each month from May to September and the overall cooling load during the cooling period. The error rate was 0.4%~3.5% for the monthly cooling loads and 1.0% for the overall forecast period, indicating high accuracy.
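The error rate reported for Figure 5 compares aggregated predicted and generated loads over a period; a minimal version is sketched below (the helper name is hypothetical).

```python
def aggregate_error_rate(generated, predicted):
    """Error rate (%) of the total predicted cooling load relative to the
    total generated (reference) cooling load over a period (e.g., a month)."""
    total_gen = sum(generated)
    total_pred = sum(predicted)
    return 100.0 * abs(total_pred - total_gen) / total_gen
```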
Figure 5. (a) Monthly cooling load; (b) total cooling load.


Conclusions
This study evaluated the preprocessing of training data and the changes in prediction accuracy that result from adjusting the model parameters, using an ANN model for cooling load forecasts in buildings. For the data preprocessing, predictions were performed using 3852 data points, obtained by removing missing values from the 11,016 data points generated for the reference building over three years. The prediction accuracy improved after preprocessing deleted the missing values caused by operational schedules and holidays.
For the optimization of the structural parameters, when the number of neurons was below a certain level, the CvRMSE indicated good prediction accuracy of less than 10% regardless of the increase in the number of hidden layers. As the numbers of neurons and hidden layers increased, the prediction accuracy decreased; this was due to overfitting caused by an excessively large amount of learning in the neural network.
For the optimization of the learning parameters, which covered epochs from 100 to 1000 and learning rates from 0.0001 to 1.0, the prediction accuracy was investigated with and without optimization of the structural parameters. With optimized structural parameters, the predictions were good regardless of changes in epochs and learning rate. When the structural parameters were not optimized, the CvRMSE was reduced slightly by reducing the epochs and learning rate; this is believed to occur because reducing the amount of learning through the learning parameters compensates for the excessive learning capacity imposed by the structural parameters. However, repeated predictions under the same conditions deviated significantly, and consistent results could not be obtained by adjusting only the learning parameters.
The performance of a data-driven ANN prediction model is significantly influenced by the data and learning conditions. Missing values in the training data caused the model to misread the data; therefore, data preprocessing, including the elimination of missing values, is essential.
The structural and learning parameters yield good results when adjusted to maintain an appropriate learning capacity and amount of learning.
Further research is planned to apply the proposed algorithms to cooling load forecasting in existing buildings.