Monthly and Quarterly Sea Surface Temperature Prediction Based on Gated Recurrent Unit Neural Network

Zhen Zhang 1, Xinliang Pan 1, Tao Jiang 1,*, Baikai Sui 1, Chenxi Liu 1 and Weifu Sun 2

1 College of Geomatics, Shandong University of Science and Technology, Qingdao 266590, China; zhang_L08@lzu.edu.cn (Z.Z.); panxinliang@yujiangtech.cn (X.P.); suibaikai@yujiangtech.cn (B.S.); liuchenxi@yujiangtech.cn (C.L.)
2 First Institute of Oceanography, Ministry of Natural Resources of the People's Republic of China, Qingdao 266061, China; sunweifu@fio.org.cn
* Correspondence: jiangtao@sdust.edu.cn; Tel.: +86-0532-86057287


Introduction
The sea surface temperature (SST) is an important parameter of the energy balance on the Earth's surface, and it plays a fundamental role in energy, momentum, and water exchange between the ocean and atmosphere [1][2][3]. SST changes have an immeasurable effect on global climate and biological systems [4][5][6]. Therefore, the timely prediction of SST is crucial in many application fields, such as climate prediction, marine fishery production, and marine environmental protection [7][8][9]. However, the prediction accuracy is unsatisfactory due to the influence of many uncertain factors, such as heat flux, radiation, and diurnal wind near the sea surface [10].
At present, SST prediction methods are mainly divided into physics-based numerical prediction [11,12] and data-driven prediction. The former achieves SST prediction by modeling the physical environment, and its accuracy depends on the degree of refinement of the model parameters: the more refined the physical environment parameters, the higher the prediction accuracy. However, problems such as increased computational complexity and the difficulty of obtaining many parameters also emerge. Seasonal prediction based on a physics-based numerical model achieves satisfactory performance and accuracy, but it performs poorly for the prediction of long time-series data at smaller scales. Data-driven prediction methods have therefore gradually developed into an important alternative.

Study Area

Testing the stability and validity of the prediction algorithm in a region whose annual temperature fluctuates greatly is an important consideration. The Bohai Sea is located in northeast China, between latitudes 37°07'N-40°56'N and longitudes 117°33'E-122°08'E. The eastern side of the Bohai Sea adjoins the Yellow Sea along the boundary between the Laotie Mountain of the Liaodong Peninsula and the Penglai Mountain at the northern end of the Shandong Peninsula; the other three sides are surrounded by land. Of the four seas of China, the Bohai Sea has the largest temperature difference and the highest latitude. Therefore, this study uses the Bohai Sea, with its large temperature changes, as the study area, and six sites in this sea area are selected for prediction research. Figure 1 shows the study area.

Data Source
The U.S. National Oceanic and Atmospheric Administration 1/4° daily Optimum Interpolation SST (OISST) is used in this study. This resource contains a global gridded SST formed by combining observation data from different platforms, such as satellites, ships, and buoys. The OISST dataset consists of two types of data, namely from the Advanced Very-High-Resolution Radiometer (AVHRR) and the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E). Since AMSR-E stopped working in October 2011, the AVHRR-only dataset, which has a longer time series than AMSR-E, was adopted in this study. The dataset covers global daily SST values from 89.875°S to 89.875°N and 0.125°E to 359.875°E from September 1, 1981 to the present. This study selects the site data from January 1982 to December 2019 and reorganizes the data on the basis of monthly and seasonal averages. Therefore, each site from P1 to P6, as shown in Figure 1, generates 456 items of monthly data from January 1982 to December 2019 and 152 items of quarterly data from spring 1982 to winter 2019. Table 1 shows the basic statistical characteristics of the data used in the experiments.

The headings "Mean", "Coldest", and "Warmest" in Table 1 represent the mean, minimum, and maximum values of the monthly or quarterly data, respectively, and Std.dev represents the standard deviation. There is no "Mean" subitem for the quarterly data in Table 1 because the mean values of all quarterly data are the same as those of the monthly data, namely the average temperature of the region over the 38 years; thus, to avoid duplication, only the means of the monthly data are shown in the table.

Methods
A neural network model for the high-precision prediction of medium- and long-term SSTs, based on GRU and a fully connected layer, is constructed. The time series of SST data is predicted sequentially by observing and learning the regular characteristics of the historical data within a certain time window. The time window is then moved forward to make a prediction for the next time node until all training data have been covered. The number of training epochs can be set to repeat this learning process, enhancing the accuracy and generalization of the predictive model.
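The sliding-window procedure can be sketched as follows. The window length of 12 matches the monthly setting used later in the paper, while the sinusoidal toy series is an illustrative assumption rather than the paper's dataset:

```python
import numpy as np

def make_windows(series, window):
    """Build (sample, target) pairs by sliding a fixed-length window one
    step at a time over the series; each window of past values is paired
    with the value at the next time node."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X), np.array(y)

# Toy monthly-like series: 3 "years" of 12 values each.
sst = np.sin(2 * np.pi * np.arange(36) / 12) * 10 + 15
X, y = make_windows(sst, window=12)
print(X.shape, y.shape)  # (24, 12) (24,)
```

Repeating passes over these windows for several epochs is what the paper refers to as setting the number of trainings.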

GRU Neural Network Model
GRU is a variant of LSTM. LSTM adds three gate functions on the basis of the RNN network, namely the input, forgetting, and output gates, which control the input, memory, and output values, respectively. In contrast, only two gates are present in the GRU model: the update and reset gates. Figure 2 presents the specific structure, where σ is the gating function.

Figure 2. Structure diagram of the long short-term memory (LSTM) and gated recurrent unit (GRU) methods. The upper part is LSTM [25] and the lower part is GRU [27]. σ and tanh indicate the sigmoid and tanh neural network layers; ⊙ and ⊕ represent the pointwise multiplication and addition operations; 1− is the operation subtracted by 1; x_t depicts the input time series data, and h_t the output data.




GRU combines the forgetting and input gates of LSTM into a single update gate. This process effectively reduces the amount of computation and the probability of the gradient exploding or vanishing. The specific working mechanism is as follows [27]:

z_t = σ(W_z · [h_{t−1}, x_t] + b_z)
r_t = σ(W_r · [h_{t−1}, x_t] + b_r)
h̃_t = tanh(W_h · [r_t ⊙ h_{t−1}, x_t] + b_h)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t

where z_t and r_t represent the update and reset gates, respectively; W_z, W_r, and W_h are the weight parameters of the input data; h_{t−1} is the output of the previous time step; and x_t is the input of the current time step. b_z, b_r, and b_h are the biases, σ is the sigmoid function, and tanh is used to help adjust the values flowing through the network. The output values of the σ and tanh functions are bounded in (0, 1) and (−1, 1), respectively. After obtaining the final output, the loss value can be calculated by the loss function:

E_t = (1/2) (y_d − y_o^t)²
E = Σ_t E_t

where E_t is the loss of a single sample at a certain time, y_d is the real label data, y_o^t is the output value at time t, and E is the loss of a single sample over all times.

The error back-propagation algorithm is used to train the network, so the partial derivative of the loss function with respect to each parameter must be calculated. After the partial derivatives are obtained, the parameters can be updated and the loss can be driven to convergence iteratively.
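The update/reset mechanism can be exercised with a minimal NumPy sketch; the weight values below are random stand-ins, not the trained parameters of this study:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time step: the reset gate r scales the previous state
    inside the candidate, and the update gate z blends the old state
    with the candidate, replacing LSTM's separate forget/input gates."""
    Wz, Wr, Wh, bz, br, bh = params
    xh = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ xh + bz)                       # update gate
    r = sigmoid(Wr @ xh + br)                       # reset gate
    cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]) + bh)
    return (1 - z) * h_prev + z * cand              # new hidden state

rng = np.random.default_rng(0)
n_in, n_hid = 1, 4
params = tuple(rng.normal(0, 0.1, s) for s in
               [(n_hid, n_hid + n_in)] * 3 + [(n_hid,)] * 3)
h = np.zeros(n_hid)
for x in [0.2, 0.5, 0.9]:                           # short input sequence
    h = gru_step(np.array([x]), h, params)
print(h.shape)  # (4,)
```

Because the new state is a convex blend of the previous state and a tanh-bounded candidate, the hidden values stay bounded, which is one reason the gradient is less prone to exploding.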

Construction of the GRU Model for Medium- and Long-term SST Prediction
On the basis of the GRU network structure, we build a six-layer neural network model, which includes one input layer, three GRU layers, and two dense layers. Figure 3 displays the predictive model framework. The input data are the time series of the SST. In this experiment, time series at the monthly and quarterly scales are used to verify the model, and the amount of data fed into the network at one time is determined by the length of the learning sequence and the number of batch trainings. On the basis of the variation law of SST, the lengths of the learning sequences used in the monthly and quarterly trainings are set to 12 and 4, respectively. Each of the three GRU layers has 100 neurons, and the two dense layers have 10 and 1 neurons, respectively. The dense layers use the sigmoid and linear activation functions, respectively, to optimize the output. Dropout with a rate of 0.2 is applied after the first and third GRU layers.

Figure 3. Model structure used in this study for medium- and long-term SST prediction.

Data Preprocessing
The 38-year history of SST observation data is sorted on the basis of two different scales: monthly and quarterly. They are fed into the established GRU SST medium and long-term prediction models. The monthly average data refer to the average daily SST data for each month. From January 1982 to December 2019, 456 items of monthly time series data are used. The SST in spring is defined as the average of three months (March, April, and May); summer is the average of June, July, and August; autumn is the average of September, October, and November; and winter is the average of December and January and February of the following year. A total of 152 items of quarterly time series data are used. Normalizing the SST time series data of each site is beneficial for accelerating network convergence and preventing under-fitting during network training.
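The seasonal averaging rule can be sketched as follows; the input dictionary of monthly means is a toy stand-in, and winter is handled as December plus January and February of the following year, as defined above:

```python
def seasonal_means(monthly):
    """monthly: dict {(year, month): sst}. Returns {(year, season): mean},
    where the winter of a given year uses December of that year plus
    January and February of the following year."""
    seasons = {"spring": [3, 4, 5], "summer": [6, 7, 8], "autumn": [9, 10, 11]}
    out = {}
    years = sorted({y for (y, m) in monthly})
    for y in years:
        for name, months in seasons.items():
            vals = [monthly.get((y, m)) for m in months]
            if None not in vals:
                out[(y, name)] = sum(vals) / 3
        winter = [monthly.get((y, 12)), monthly.get((y + 1, 1)), monthly.get((y + 1, 2))]
        if None not in winter:
            out[(y, "winter")] = sum(winter) / 3
    return out

# Toy monthly means: value equals the month number, plus two months of 1983.
monthly = {(1982, m): float(m) for m in range(1, 13)}
monthly.update({(1983, 1): 1.0, (1983, 2): 2.0})
q = seasonal_means(monthly)
print(q[(1982, "spring")], q[(1982, "winter")])  # 4.0 5.0
```

Seasons whose three source months are incomplete are simply omitted, which is why 38 full years of monthly data yield exactly 152 quarterly values.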
Each site's SST series is normalized by min-max scaling:

x_norm = (x − x_min) / (x_max − x_min)

where x_norm is the result after normalization, x is the SST value before normalization, and x_max and x_min are the maximum and minimum SST values in the whole time series, respectively. The divisions into training, testing, and validation datasets differ according to the predicted length, but they generally comply with the following equation:

L_all = L_training + L_testing + L_validating = L_training + L_testing + L_prediction (9)

where L_all is the length of the whole series, and L_training, L_testing, and L_validating represent the lengths of the training, testing, and validation data, respectively. It should be noted that the length of the validation dataset is the prediction length; that is, L_validating = L_prediction. Meanwhile, the training and testing data account for 85% and 15%, respectively.
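The normalization and the dataset bookkeeping of Equation (9) can be sketched together; the linear toy series, and the assumption that the 85%/15% split applies to the portion remaining after the validation (prediction) segment, are illustrative:

```python
import numpy as np

def minmax_normalize(x):
    """x_norm = (x - x_min) / (x_max - x_min), over the whole series."""
    return (x - x.min()) / (x.max() - x.min())

def split_series(x, prediction_len, train_frac=0.85):
    """The last `prediction_len` points form the validation (prediction)
    set; the rest is split 85/15 into training and testing data
    (assumed here to apply to the remaining portion)."""
    rest, val = x[:-prediction_len], x[-prediction_len:]
    n_train = int(len(rest) * train_frac)
    return rest[:n_train], rest[n_train:], val

sst = np.linspace(10.0, 20.0, 456)        # stand-in for 456 monthly values
norm = minmax_normalize(sst)
train, test, val = split_series(norm, prediction_len=12)
print(len(train), len(test), len(val))  # 377 67 12
```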

Experiment Setup
The established neural network model is implemented with Keras on a TensorFlow 1.6 (GPU) backend. The GPU is an NVIDIA GTX 1080 graphics card with 8 GB of video memory.
Three evaluation indexes, namely the root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (r), are used for the SST prediction results in this study. They are defined as follows:

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_pre^i − y_true^i)² )
MAE = (1/n) Σ_{i=1}^{n} |y_pre^i − y_true^i|
r = Σ_{i=1}^{n} (y_pre^i − ȳ_pre)(y_true^i − ȳ_true) / sqrt( Σ_{i=1}^{n} (y_pre^i − ȳ_pre)² · Σ_{i=1}^{n} (y_true^i − ȳ_true)² )

where y_pre^i is the predicted SST value, y_true^i is the true SST value, ȳ_pre is the mean of the predicted SST values, and ȳ_true is the mean of the true SST values.
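The three indexes can be implemented directly from their definitions; the toy vectors are illustrative:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_pred - y_true)))

def corr(y_true, y_pred):
    """Pearson correlation coefficient between predictions and truth."""
    a = y_pred - y_pred.mean()
    b = y_true - y_true.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

y_true = np.array([10.0, 12.0, 15.0, 18.0])
y_pred = np.array([11.0, 12.0, 14.0, 19.0])
print(round(rmse(y_true, y_pred), 3),
      round(mae(y_true, y_pred), 3),
      round(corr(y_true, y_pred), 3))  # 0.866 0.75 0.963
```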
On the basis of the annual periodic change rule of SST, the time window length of monthly data is set as 12, and the time window length of quarterly data is set as 4. By using monthly data, we make different predictions for the last 4, 6, 12, 18, and 24 months. By using quarterly data, we make different predictions for the last 2, 4, 6, and 8 quarters.
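Predicting several steps ahead from a fixed time window can be sketched as a recursive loop in which each prediction is fed back into the window; the persistence "model" below is a deliberately trivial stand-in for the trained GRU network:

```python
import numpy as np

def predict_ahead(model, history, window, steps):
    """Recursively predict `steps` future values: each new prediction is
    appended to the buffer that supplies the next window."""
    buf = list(history[-window:])
    preds = []
    for _ in range(steps):
        y = model(np.array(buf[-window:]))
        preds.append(y)
        buf.append(y)
    return preds

# Stand-in model: repeats the value from 12 steps back (pure seasonality).
persistence = lambda w: float(w[0])
history = list(np.sin(2 * np.pi * np.arange(36) / 12))
out = predict_ahead(persistence, history, window=12, steps=24)
print(len(out))  # 24
```

Because errors compound through the feedback loop, accuracy naturally degrades as the prediction length grows, which matches the behavior reported below.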
The monthly and quarterly data used in this paper are shown in Figure 4.

Figure 4. Monthly and quarterly data used in this paper.

Results of Using Different Parameters
In the network model, steps_per_epoch (steps) refers to the number of steps contained in an epoch, where each step processes one batch of input data, and epochs refers to the number of complete passes over the training data. The proposed model is trained by reading the data in time series; when all data are covered, the iteration stops. Therefore, epochs and steps in the network model cannot both be set too large simultaneously. This study uses the monthly data of the P2 site to predict the SST for the next 12 months (January 2019 to December 2019) and explores the optimal parameters by setting different steps and epochs. Figure 5 shows the SST prediction results for 12 months under different epochs. The results show that the setting of 6 epochs cannot accurately predict the overall trend of SST, whereas the other four settings can. With 6 epochs, the prediction performance for August is the worst of the entire year, and unrealistic SSTs are also predicted for the winter months. Figure 6 presents the quantitative prediction accuracy. As the number of epochs increases, the prediction accuracy of the model initially increases and then stabilizes with a slight decrease. With 20 epochs, the prediction accuracy is the highest, with an RMSE of 1.273 °C, an MAE of 1.077 °C, and an r of 0.99.


Monthly Data Prediction
The last 4, 6, 12, 18, and 24 months are predicted using the optimal parameters of steps and epochs, which are set to 15 and 20, respectively. The P2 site is used as an example to draw the prediction curves, as shown in Figure 9. The horizontal axis is the predicted monthly SST data. For example, the last 4 months refer to the 4 months counted backward from December 2019; i.e., the predicted months are September, October, November, and December 2019.

In terms of the fitting degree, the proposed network model is superior to LSTM except for the prediction length of 4 months. As the prediction length increases, this advantage becomes gradually more evident. Figure 10 illustrates the prediction accuracy of the quantitative analysis. Figure 10 shows that the accuracy of both the proposed model and LSTM tends to decline as the prediction time increases. However, in comparison with LSTM, the SST prediction error of the proposed model fluctuates less and exhibits more stable accuracy. The r values of the proposed model for all prediction lengths are above 0.98; in contrast, the r of LSTM drops below 0.88 when the prediction runs for 24 months. The accuracy of LSTM is slightly higher than that of the proposed model when the prediction length is 4 months, with an RMSE of 0.325 °C and an MAE of 0.251 °C. However, as the prediction length increases, the accuracy of the proposed model becomes significantly higher than that of LSTM, especially for 18 and 24 months, with an RMSE of 1.129 °C and an MAE of 1.275 °C.
However, as shown in Figure 11, the quantitative prediction results for each month show that both LSTM and the proposed model have a large prediction error in May of each year. The MAEs of the proposed model and LSTM are 2.641 °C and 4.401 °C for the first May (fifth month) and 4.159 °C and 7.030 °C for the second May (17th month), respectively. A possible reason is that April to May is the period with the largest temperature change in the entire year, and the prediction model predicts the SST conservatively. In terms of the monthly prediction accuracy in Figure 11, the maximum error occurs in the 17th month, with an MAE of 4.2 °C for the proposed model and 7.0 °C for LSTM.

Figure 11. Twenty-four-month prediction accuracy.
To test the stability of the SST prediction neural network model based on the proposed model, we conducted a quantitative evaluation of the prediction results of six sites with different time lengths. Table 2 presents the results. A comparison of the prediction results of the six sites indicates that, except for P3, LSTM has a slight advantage over the proposed model in predicting the SST over a short time, for example, in the 4-month prediction. The maximum differences in RMSE and MAE between LSTM and the proposed model occur at P6, at 0.924 °C and 0.906 °C, respectively. On the whole, for the short prediction length (the 4-month prediction), the mean differences in RMSE and MAE between LSTM and the proposed model are 0.540 °C and 0.549 °C, respectively, and LSTM outperforms the proposed method. However, as the prediction time increases, the SST prediction results of the proposed model are significantly better than those of LSTM: the mean differences in RMSE and MAE between the proposed model and LSTM are 1.292 °C and 0.701 °C, respectively, and the proposed method outperforms LSTM. Moreover, the monthly SST data prediction based on the proposed method is more robust and accurate.

Quarterly Data Prediction
A total of 152 quarterly data items are used at every site. Different time lengths, namely the last 2, 4, 6, and 8 quarters, are predicted on the basis of the model trained with quarterly data. The P2 site data are used as an example to draw the prediction curves, as shown in Figure 12.
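The evaluation scheme reserves the last k quarters of each series as the prediction target. A minimal sketch of this holdout split, assuming a univariate array of 152 quarterly SST values (the `holdout_split` helper and the synthetic series are illustrative, not the study's data):

```python
import numpy as np

def holdout_split(series, k):
    """Split a univariate SST series so that the last k steps are
    reserved as the prediction target and everything before them
    is available for model fitting."""
    series = np.asarray(series, dtype=float)
    return series[:-k], series[-k:]

# 152 quarterly values, matching the series length per site
# (synthetic seasonal-looking data for illustration)
quarters = np.sin(np.linspace(0, 19 * np.pi, 152)) * 10.0 + 12.0

for k in (2, 4, 6, 8):
    train, target = holdout_split(quarters, k)
    assert len(train) == 152 - k and len(target) == k
```

Repeating the split for k = 2, 4, 6, and 8 yields the four prediction lengths compared in Figure 12 and Table 3.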

From the perspective of fitting degree, although both the proposed model and LSTM can effectively predict the trend of SST, the SST prediction model based on the proposed model is superior to LSTM regardless of the prediction length. When the prediction length is small, such as two quarters, the difference in prediction accuracy between the two models is small, at about 0.3 °C. Overall, the SST prediction error for quarterly data is within 2 °C with the proposed method, whereas the prediction error with LSTM can reach 3.5 °C. As the prediction length increases, the GRU model remains relatively stable, while LSTM fluctuates greatly, especially when the prediction length is 4. By comparing the curves under the four prediction lengths, the prediction results of both models are slightly lower than the true SST values; thus, both models predict conservatively. Figure 13 presents the prediction accuracy of the P2 site's quarterly data.
As shown in Figure 13, with the increase in prediction time, the error of the proposed prediction model does not show a significant increasing trend. The proposed model exhibits stable performance with high prediction accuracy: the RMSE and MAE are both within 2 °C, and r is above 0.98. Table 3 displays the quantitative prediction accuracy for different quarterly lengths at the six sites. In contrast to the monthly data, the quarterly SST prediction results based on the proposed model show better prediction accuracy than those based on LSTM, even at short prediction lengths. The maximum differences in RMSE and MAE between LSTM and the proposed model occur at P2, at 1.095 °C and 1.015 °C, respectively; the minimum occurs at P5, with differences of 0.182 °C and 0.122 °C, respectively. The mean MAEs of the six sites for prediction lengths of 2, 4, 6, and 8 quarters are 1.422 °C, 1.675 °C, 1.246 °C, and 1.536 °C, respectively, with the proposed method, and 1.801 °C, 1.924 °C, 1.537 °C, and 1.843 °C with LSTM. Thus, the quarterly SST data prediction based on the proposed method is more robust and accurate.
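The relative stability of the GRU over longer prediction horizons comes from its gating mechanism. A single GRU time step can be sketched in NumPy as follows; the weights here are random placeholders, the gate convention shown is one common variant, and the actual model in this study stacks GRU layers with fully connected output layers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU time step. W, U, b each hold the parameters for the
    update gate (z), reset gate (r), and candidate state (n)."""
    z = sigmoid(W["z"] @ x + U["z"] @ h + b["z"])        # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h + b["r"])        # reset gate
    n = np.tanh(W["n"] @ x + U["n"] @ (r * h) + b["n"])  # candidate state
    return (1.0 - z) * n + z * h                          # new hidden state

rng = np.random.default_rng(0)
dim_in, dim_h = 1, 8            # one SST value in, 8 hidden units
W = {g: rng.normal(size=(dim_h, dim_in)) for g in "zrn"}
U = {g: rng.normal(size=(dim_h, dim_h)) for g in "zrn"}
b = {g: np.zeros(dim_h) for g in "zrn"}

h = np.zeros(dim_h)
for sst in [8.0, 12.5, 18.3]:   # a toy SST input sequence (°C)
    h = gru_step(np.array([sst]), h, W, U, b)
```

Because the new hidden state is a convex combination of the previous state and a bounded candidate, gradients are less prone to the instability that makes LSTM fluctuate at longer prediction lengths.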


Conclusions
In this study, a neural network model based on GRU with fully connected layers is designed to predict short- and medium-term SST. Using the Bohai Sea, characterized by a large annual temperature difference, as the research area, monthly and quarterly SST data at two different time scales are used to verify the practicability and stability of the proposed model. The main conclusions are as follows: (1) the designed SST prediction model based on GRU can efficiently fit the trend of the real SST and has high reliability; (2) the proposed model predicts conservatively, that is, the predicted SST value tends to be smaller than the real value, and LSTM exhibits the same behavior. In the future, multi-source physical ocean parameters can be incorporated into the GRU network to construct a physical prediction model of SST. In addition, local outliers may exist at smaller spatial scales, so a GRU SST prediction method that considers spatial relationships may effectively improve the fault tolerance and prediction accuracy of SST.