Prediction of Weights during Growth Stages of Onion Using Agricultural Data Analysis Method

Abstract: In this study, we propose a new agricultural data analysis method that predicts the weight of field onions during their growth stages using a functional regression model. We used onion weight at each growth stage as the response variable and six environmental factors (average temperature, average ground temperature, rainfall, wind speed, sunshine, and humidity) as the explanatory variables in the functional regression model. We then define a least minimum integral squared residual (LMISE) measure to obtain an estimate of the functional regression coefficient. In addition, a principal component regression analysis was applied to derive the estimates that minimize the defined measure. Next, to evaluate the performance of the proposed model, data were collected, and the following results were identified through analyses of the collected data. First, through graphical and correlation analysis, the ground temperature, mean temperature, and humidity have a very significant effect on the onion weights, but environmental factors such as wind speed, sunshine, and rainfall have a small negative effect on onion weights. Second, through functional regression analysis, we can determine that the ground temperature, sunshine, and precipitation have a significant effect on onion growth and are essential in the goodness-of-fit test. On the other hand, wind speed, mean temperature, and humidity did not significantly affect onion growth. In conclusion, to promote onion growth, the appropriate ground temperature and amount of sunshine are essential, the rainfall and the humidity must be low, and the appropriate wind or mean temperature must be maintained.


Introduction
In general, onions produced in various parts of Korea are highly value-added vegetables, as they are used not only for various dishes but also as part of a healthy diet. Therefore, farmers who cultivate vegetables in the field are concerned with cultivation strategies that can improve the vegetable yields. In addition, the related agencies, such as the Rural Development Administration, devote much attention to developing onion cultivation techniques to address the farmers' needs [1].
However, among vegetables, onion growth is especially sensitive to changes in the weather. In general, onions grow well in cool, dry weather. Because of this characteristic, onion yields drop significantly if rainy weather prevails, whereas they increase considerably in predominantly sunny weather. If onion production increases excessively, onion prices drop, whereas prices become excessively high if production is low. For this reason, it is vital for government offices and farms to maintain an appropriate level of production.
Hence, we examine how the production of onions grown in the field is affected by various weather conditions and environmental factors, and we try to develop the best farming strategy based on them. To this end, we first examine both the mathematical theory and the application of the functional regression model, which is the most suitable statistical model for analyzing time series data collected directly from the agricultural field. Second, on the basis of onion data collected from various regions, we examine the effects of various environmental factors and climatic conditions on the onion yields using the functional regression model. Finally, we propose an optimal farming strategy that can maintain appropriate levels of the various environmental factors and weather.

Related Works
Yields of crops or vegetables are predicted using various statistical data analysis models, pattern recognition methods, and deep learning algorithms. First, Kamilaris and Prenafeta-Boldu [2] conducted a survey of more than 40 previous studies of artificial intelligence technologies applied to various agricultural and crop production problems. They looked at the specific agricultural problems under study, the particular models and systems used, and the data correction and preprocessing. They also reviewed the overall performance of each study. Furthermore, they compared the artificial intelligence method with other existing popular technologies, such as various machine learning methods and statistical prediction models. Second, Manikandan and Vethamoni [3], in their review paper, proposed how agricultural statistical techniques can be used to prioritize research and to understand the elementary interactions among plant, soil, and atmospheric elements. As a research tool, developing and applying models can contribute to more efficient and targeted research planning. Third, Bhange and colleagues [4] suggested ways to help farmers choose the optimal conditions for their farm, depending on various environmental and soil factors such as temperature, humidity, water level, soil type, soil pH, fertilizer, and cultivated interval or season factors. They carried out this prediction using a random forest classification machine learning algorithm. Fourth, Mythra and colleagues [5] presented a brief comparative study of several papers discussing various techniques used to determine crop yields. Their paper determined the appropriate data processing model to achieve high accuracy and prediction ability. Fifth, Sekkam and Poovammal [6] analyzed environmental factors that affect crop yields, such as cultivated area, annual rainfall, and food price index, and the relationships among these factors. In their study, multiple regression analysis was used to analyze environmental factors and their impact on crop yields.
Sixth, Abdelkhalik and colleagues [7] conducted field experiments on two farming stages in Spain to evaluate how effective different controlled deficit irrigation strategies are for onion growth and total yields. They found that deficit irrigation strategies reduced marketable yields. Hence, if enough water is available, full irrigation is recommended. Seventh, Maskey and colleagues [8] explored the correlation of various environmental factors affecting the yields of strawberries grown in an open field. In addition, they used the principal component regression method and two pattern recognition techniques, a one-layer neural network model and the random forest classification method, to evaluate yield prediction. The weather parameters used in their study are a combination of sensor data obtained in strawberry fields, weather data observed from a nearby weather station, and agricultural sources such as cooling time. Eighth, Rathod and Mishra [9] used both linear and nonlinear as well as parametric and nonparametric statistical models to predict the yields of Karnataka's mangoes and bananas. They proposed a hybrid model combining an autoregressive integrated moving average (ARIMA) model and support vector machine regression (SVMR), which performed better than other researchers' methods in terms of validity and verifiability of the model. Ninth, Villiers [10] wrote a master's thesis on predicting tomato yields from weather data using machine learning technology. He found that all crop and weather variables are beneficial for predicting tomato yield. Furthermore, for the most important predictions, those of the median, mean, and total yield density, he considered the daily average wind speed and the maximum daily relative humidity during the growing season of each crop.

Dataset
Here we consider how to cultivate onions, which are popular among Koreans and used in various dishes, so as to improve their growth. Generally, onions are sown from mid-August to mid-September, planted from early October to early November, and harvested between May and June of the following year. Figure 1 shows the onion growing step by step [11].
In this study, we used datasets of the various environmental factors and the onion weights during growth stages, collected from farmers in several regions of Korea from March 2019 to June 2019. We used six factors (mean wind speed, mean temperature, mean ground temperature, mean humidity, daily sunshine, and daily rainfall) as the explanatory variables of the model, and the onion weight at each growth stage as the response variable. A description of each variable is given in Table 1; daily rainfall, for example, is recorded in mm.
We collected a total of 178 observations from 28 farms for the experiment. During the observation period, the environmental factor data were taken from the meteorological office data observed by region, and the onion weights were measured by the staff of the Rural Development Administration (RDA). Of the data collected from the 28 farms, a total of 63 observations, repeated 9 times at 7 farms with relatively good measurements, were used in the experiment. Each environmental variable was measured for 7 days and repeated for 9 weeks, and onion weight was measured 9 times at each of the 7 farms during the same period. The environmental data therefore consist of six factors measured on 7 days in each of 9 weeks for 7 farms, and are given in the form of a 64 × 35 matrix.
In addition, data on onion weights were obtained from nine replicate measurements at 7 farms. Thus, the onion weight data are given as a 9 × 7 matrix. Tables 2 and 3 show part of the data on average wind speed, one of the six environmental factors, and the onion weights observed during the growth period.
Table 2. Example of input data of environmental variables of growth period.

Method
We considered a functional regression model with $p$ functional covariates to determine how $p$ environmental factors influence onion weights over time [12][13][14]. This model is defined as follows:

$$y_i(t) = \alpha(t) + \sum_{k=1}^{p} \int x_{ik}(s)\,\beta_k(s,t)\,ds + \varepsilon_i(t), \qquad i = 1, \dots, n, \tag{1}$$

where $n$ is the number of observations, $p$ is the number of functional covariates, $y_i(t)$ is the onion weight at time $t$, $x_{ik}(s)$ is the $k$-th functional covariate, $\alpha(t)$ is the mean function, $\beta_k(s,t)$ is the regression function for the $k$-th covariate, and $\varepsilon_i(t)$ is a random error function. To derive the estimators of the functional parameters $\beta_k(s,t)$ of the proposed model, we use the criterion of the least minimum integral squared residual (LMISE), defined as:

$$\mathrm{LMISE} = \sum_{i=1}^{n} \int \Big[\, y_i(t) - \alpha(t) - \sum_{k=1}^{p} \int x_{ik}(s)\,\beta_k(s,t)\,ds \Big]^2 dt. \tag{2}$$

Here, we have applied functional principal component analysis to obtain estimators of the functional regression parameters that minimize the LMISE measure defined above. To remove the intercept term $\alpha(t)$ from the model given in (1), we center the explanatory variables $x_{ik}$ and the response variable $y_i$ as follows:

$$x^*_{ik}(s) = x_{ik}(s) - \bar{x}_k(s), \qquad y^*_i(t) = y_i(t) - \bar{y}(t), \tag{3}$$

where $\bar{x}_k(s) = n^{-1}\sum_{i=1}^{n} x_{ik}(s)$ and $\bar{y}(t) = n^{-1}\sum_{i=1}^{n} y_i(t)$.
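First, we express $x^*_{ik}(s)$ and $y^*_i(t)$ as finite sums using the basis functions $\varphi_{jk}(s)$ and $\psi_l(t)$ as follows:

$$x^*_{ik}(s) = \sum_{j=1}^{J} c_{ijk}\,\varphi_{jk}(s) = c_{ik}^{T}\varphi_k(s) \tag{4}$$

and

$$y^*_i(t) = \sum_{l=1}^{L} d_{il}\,\psi_l(t) = d_i^{T}\psi(t), \tag{5}$$

where $\varphi_k$ and $\psi$ are vectors of basis functions of size $J$ and $L$, respectively. Furthermore, if we denote the coefficient matrices of these basis function vectors as $C_k$ (of size $n \times J$) and $D$ (of size $n \times L$), then we can write these expressions in the following matrix form:

$$x^*_k(s) = C_k\,\varphi_k(s), \qquad y^*(t) = D\,\psi(t). \tag{6}$$

Here, we can express the regression functional coefficient $\beta_k(s,t)$ as a double-sum expansion:

$$\beta_k(s,t) = \sum_{j=1}^{J}\sum_{l=1}^{L} b_{jlk}\,\varphi_{jk}(s)\,\psi_l(t), \tag{7}$$

where $B_k$ is a $(J \times L)$ matrix of coefficients $b_{jlk}$, or, more compactly, $\beta_k(s,t) = \varphi_k(s)^{T} B_k\,\psi(t)$. We define $J_{\varphi_k}$ and $J_{\psi}$ as the matrices of inner products between the elements of the $\varphi_k$ and $\psi$ bases, respectively. Then, we have the following expression:

$$J_{\varphi_k} = \int \varphi_k(s)\,\varphi_k(s)^{T}\,ds, \qquad J_{\psi} = \int \psi(t)\,\psi(t)^{T}\,dt. \tag{8}$$

In addition, if we substitute the sum expressions given in Equations (4) and (7) for $x^*_{ik}$ and $\beta_k$ in the centered form of Equation (1), we obtain:

$$y^*_i(t) = \sum_{k=1}^{p} c_{ik}^{T} J_{\varphi_k} B_k\,\psi(t) + \varepsilon_i(t). \tag{9}$$

If we denote by $\hat{D}$ the matrix of coefficients of the basis expansion of the vector of predictions $\hat{y}^*$ (corresponding to the matrix $D$ for the vector $y^*$), we obtain the following matrix form for the estimated model:

$$\hat{y}^*(t) = \sum_{k=1}^{p} C_k J_{\varphi_k} B_k\,\psi(t) = \hat{D}\,\psi(t). \tag{10}$$

Therefore, we get a matrix form for the integrated squared residual:

$$\int \big\| y^*(t) - \hat{y}^*(t) \big\|^2 dt = \mathrm{trace}\big[(D - \hat{D})\,J_{\psi}\,(D - \hat{D})^{T}\big], \tag{11}$$

and, finally, the criterion of the least minimum integral squared residual (LMISE) is given by a quadratic form in the unknown coefficient matrices $B_k$:

$$\mathrm{LMISE}(B_1, \dots, B_p) = \mathrm{trace}\Big[\Big(D - \sum_{k=1}^{p} C_k J_{\varphi_k} B_k\Big) J_{\psi} \Big(D - \sum_{k=1}^{p} C_k J_{\varphi_k} B_k\Big)^{T}\Big]. \tag{12}$$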
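As a numerical illustration of the matrix form of the integrated squared residual, the following sketch checks that integrating the squared residual curves over $t$ gives the same value as the trace expression built from the inner-product matrix $J_{\psi}$. It is only a sketch under stated assumptions: the monomial basis $1, t, t^2$ on $[0, 1]$, the Gauss-Legendre quadrature, and the random matrix `R` (standing in for the difference between observed and fitted response coefficients) are all illustrative choices, not taken from the study.

```python
import numpy as np

L = 3  # hypothetical number of basis functions for the response


def psi(t):
    """Monomial basis 1, t, t^2 evaluated at points t; returns shape (L, len(t))."""
    return np.vstack([t ** l for l in range(L)])


# Gauss-Legendre nodes and weights, mapped from [-1, 1] to [0, 1].
nodes, weights = np.polynomial.legendre.leggauss(8)
t = 0.5 * (nodes + 1.0)
w = 0.5 * weights

# Inner-product matrix of the basis: integral of psi(t) psi(t)^T over [0, 1].
J_psi = (psi(t) * w) @ psi(t).T

rng = np.random.default_rng(1)
R = rng.normal(size=(7, L))  # stands in for D - D_hat (7 hypothetical farms)

# Left-hand side: integrate the squared residual curves numerically.
resid = R @ psi(t)  # residual curves evaluated at the quadrature nodes
ise_quadrature = float(np.sum((resid ** 2) @ w))

# Right-hand side: trace form using the inner-product matrix.
ise_trace = float(np.trace(R @ J_psi @ R.T))
```

For this monomial basis, `J_psi` has entries 1/(a + b + 1), so it is far from the identity; the simplification to identity inner-product matrices only applies when the basis is orthonormal.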
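Furthermore, we consider the minimization of the LMISE criterion given in Formula (12). If $J_{\varphi_k}$ and $J_{\psi}$ are identity matrices and the $k$-th covariate is fitted on its own, the matrix $B_k$ minimizes Formula (12) if and only if

$$C_k^{T} C_k B_k = C_k^{T} D. \tag{13}$$

The matrix $B_k$ is easily found by using the singular value decomposition (SVD) of $C_k$. Here, we have $C_k = U \Delta_{C_k} V^{T}$, where $\Delta_{C_k}$ is a diagonal matrix with strictly positive diagonal elements and $U$ and $V$ have orthonormal columns. Then,

$$C_k^{T} C_k = V \Delta_{C_k}^{2} V^{T}, \tag{14}$$

and hence the Moore-Penrose g-inverse of $C_k^{T} C_k$ is $V \Delta_{C_k}^{-2} V^{T}$. Substituting it into (13) gives us the following equation:

$$\hat{B}_k = V \Delta_{C_k}^{-2} V^{T} C_k^{T} D = V \Delta_{C_k}^{-1} U^{T} D. \tag{15}$$

Therefore, if we substitute $\hat{B}_k$ into $\beta_k(s,t)$ in Equation (7), we get the following functional estimates for the functional regression coefficients:

$$\hat{\beta}_k(s,t) = \varphi_k(s)^{T}\, V \Delta_{C_k}^{-1} U^{T} D\, \psi(t). \tag{16}$$

Finally, we obtain the following prediction equation for the response values, where $\bar{y}(t)$ is the mean response function and $x^*_{ik}(s)$ is the centered covariate:

$$\hat{y}_i(t) = \bar{y}(t) + \sum_{k=1}^{p} \int x^*_{ik}(s)\,\hat{\beta}_k(s,t)\,ds. \tag{17}$$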
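The SVD step can be sketched numerically as follows. Under the identity inner-product assumption, the g-inverse solution of the normal equations works out to $V \Delta^{-1} U^{T} D$, which is what the helper below computes. The function name `estimate_B`, the matrix sizes, and the random coefficient matrices are hypothetical and serve only to illustrate the algebra; this is not the code used in the study.

```python
import numpy as np


def estimate_B(C, D, tol=1e-10):
    """Solve C^T C B = C^T D through the SVD of C (minimum-norm solution)."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    keep = s > tol * s.max()  # retain strictly positive singular values only
    U, s, Vt = U[:, keep], s[keep], Vt[keep, :]
    # V diag(1/s) U^T D
    return Vt.T @ ((U.T @ D) / s[:, None])


rng = np.random.default_rng(0)
n, J, L = 7, 4, 3  # hypothetical: 7 farms, 4 covariate bases, 3 response bases

C = rng.normal(size=(n, J))
C -= C.mean(axis=0)  # centred covariate coefficients
D = rng.normal(size=(n, L))
D -= D.mean(axis=0)  # centred response coefficients

B_hat = estimate_B(C, D)

# Cross-checks: the Moore-Penrose pseudoinverse gives the same matrix,
# and B_hat satisfies the normal equations up to rounding error.
B_ref = np.linalg.pinv(C) @ D
normal_eq_gap = float(np.abs(C.T @ C @ B_hat - C.T @ D).max())
```

Dropping near-zero singular values is a design choice made here for numerical stability; it corresponds to the requirement that the diagonal matrix contain only strictly positive elements.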

Graphical Analysis
First, we plotted a line graph of the onion weights collected from seven farmers over nine weeks from transplantation to harvest. From Figure 2, we can see that the onion weight increases over time.
Second, a two-dimensional scatter plot was drawn to examine graphically the relationship between the onion weights and the mean of each of the six environmental factors during the cultivated interval. Figure 3 shows the relationship between the onion weights and the six environmental factors. From Figure 3, we can see first that the average temperature and average ground temperature have a high positive correlation with the onion weights. Humidity has a small positive correlation with onion weight. Finally, three environmental variables, namely wind speed, sunshine, and rainfall, show no relation to onion weight.
Next, we calculated the correlation coefficient between each pair of variables to confirm statistically the results derived so far. Table 4 shows the correlation coefficients between these variables. The results in Table 4 confirm the relationships between the variables discussed above.
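The correlation analysis can be illustrated with a short sketch. It assumes that the coefficients in question are ordinary Pearson correlation coefficients, which the text does not state explicitly. The function name `pearson` and the nine weekly values below are invented for illustration and are not the measurements used in the study.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


# Invented weekly values for one farm (nine weeks), for illustration only.
weight = np.array([12.0, 20.0, 35.0, 60.0, 95.0, 140.0, 190.0, 240.0, 280.0])
ground_temp = np.array([6.1, 7.4, 9.0, 11.2, 13.5, 15.9, 18.0, 20.3, 22.1])

r = pearson(ground_temp, weight)
r_numpy = float(np.corrcoef(ground_temp, weight)[0, 1])  # same quantity via NumPy
```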

Functional Data Analysis
We applied a functional regression model with functional covariates and functional responses to see how the six environmental factors affect the onion weights over the growing time. To perform the functional data analysis on the onion data, we used the "fda" package in R.
First, we conducted a functional data analysis to determine how the six environmental variables influence the onion weights. Figure 4 shows the time series properties of the functional regression coefficients of each environmental factor during onion cultivation. In Figure 4, we first see that wind speed has a positive effect on the onion weights from the beginning to the middle, a negative effect beyond the midpoint, and a positive effect again after the third quarter. Second, both the ground and mean temperature have a similarly positive effect in the early stage of onion weight growth, a negative effect at the midpoint, and a positive effect in the later growth period. Therefore, the ground temperature and average temperature have a similar effect on onion growth, but the ground temperature has a relatively larger influence. Third, humidity has a negative effect on onion weight throughout the growing season, but its impact is not significant. Fourth, the amount of sunshine has a positive effect early in the growth period of onions, a negative effect from the beginning to the middle, and a positive effect after the middle. Finally, rainfall has alternating positive and negative effects during onion growth, but its impact is not significant.
Second, we calculated the coefficients of determination (R²), which represent the explanatory power for the onion weights, when the environmental variables were used individually and when they were used together. We also calculated the F-test statistic to test the goodness-of-fit of each environmental variable for onion weight. Table 5 shows both the coefficients of determination R², which represent the explanatory power of each environmental variable for the onion weight, and the F-statistics used in the goodness-of-fit test. From Table 5 we can see that our results are similar to those above. We can also observe that if all environmental factors are used, the value of R² is the highest and the value of the F-statistic is the largest.
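To make the two goodness-of-fit quantities concrete, the following sketch computes a coefficient of determination and an F-statistic from observed and predicted values. The text does not spell out how these are defined for functional fits, so the sketch assumes the classical scalar-regression definitions applied to pooled values; the function name `r2_and_f`, the parameter `q` (number of predictors), and the numbers are all hypothetical.

```python
import numpy as np


def r2_and_f(y, y_hat, q):
    """Classical R^2 and overall F-statistic for a fit with q predictors.

    Assumes a least-squares fit with an intercept, so that
    R^2 = 1 - SSE / SST and F = (R^2 / q) / ((1 - R^2) / (n - q - 1)).
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    sse = np.sum((y - y_hat) ** 2)     # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - sse / sst
    f = (r2 / q) / ((1.0 - r2) / (n - q - 1))
    return float(r2), float(f)


# Invented observed and predicted weights, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, f_stat = r2_and_f(observed, predicted, q=1)
```

A larger R² means that the predictor explains more of the variation in weight, and under these definitions a larger F-statistic accompanies it for a fixed sample size and number of predictors.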
Additionally, when the variables are used individually, the values of R² and the F-statistic are high for ground temperature, sunshine, and rainfall. On the other hand, the values of R² and the F-statistic for wind speed, mean temperature, and humidity are lower.
Third, we plotted the actual observations of onion weights against the values predicted from each environmental variable to determine the predictive power of the six environmental variables. Figure 5 shows the relationship between the actual observations and the predicted values.
We also calculated the Root Mean Square Error (RMSE) to measure numerically the predictive power of each environmental variable. Table 6 shows the RMSE values for the six environmental variables. From Figure 5 and Table 6, we obtain the following results. First, the model that includes all environmental variables has the most predictive power. Second, among the individual environmental variables, ground temperature, sunshine, and rainfall exhibited high predictive power. Third, wind speed, mean temperature, and humidity showed the lowest predictive power.
To summarize, we found the following results. First, both the ground and mean temperature have a high positive correlation with the onion weights, humidity has a small positive correlation with the onion weights, and three of the environmental variables, namely wind speed, sunshine, and rainfall, have a small negative correlation with onion weight. Second, ground temperature, sunshine, and precipitation have a significant effect on onion growth and are very significant in the goodness-of-fit test. On the other hand, mean temperature, wind speed, and humidity did not significantly affect onion growth.
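The RMSE used to compare predictive power can be sketched as follows. The function name `rmse` and the small weeks-by-farms arrays are invented for illustration; the actual data in the study form a 9 × 7 matrix of onion weights.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error over all entries of two equally shaped arrays."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


# Invented weights: rows are weeks, columns are farms (illustration only).
observed = np.array([[10.0, 12.0],
                     [20.0, 25.0],
                     [40.0, 43.0]])
errors = np.array([[1.0, -1.0],
                   [2.0, -2.0],
                   [-1.0, 1.0]])
predicted = observed + errors

score = rmse(observed, predicted)
```

A lower RMSE indicates that the predicted weights lie closer to the observed ones, so the variable or set of variables with the smallest RMSE has the highest predictive power in this comparison.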
In conclusion, to promote onion growth, the appropriate ground temperature and amount of sunshine are essential, rainfall and humidity must be low, and appropriate wind or mean temperature must be maintained.

Conclusions
In this study, we applied a statistical functional regression model to investigate the relationship between various environmental factors and onion weights during the growing season. To solve this problem, we performed the following two tasks. In the first, we identified the six most important environmental factors among those that affect onion weight during the growing season. In the second, we proposed an optimal cultivation strategy that suggests how to manage the six identified environmental factors to maximize the onion weights.
From the analysis results, we note the following facts. First, through the graphical and correlation analysis, we see that ground temperature, mean temperature, and humidity are positively correlated with onion weights, while wind speed, sunshine, and rainfall have a small negative correlation with the onion weights. Second, from the functional regression analysis of the six environmental variables and onion weight, we note that ground temperature, sunshine, and rainfall have a statistically significant effect on the onion weights, but other environmental factors such as wind speed, mean temperature, and humidity have little effect on the onion weights. In conclusion, to promote onion weight gain, the appropriate ground temperature, amount of sunshine, wind speed, and mean temperature must be maintained, and rainfall and humidity must be low.
Future works should utilize functional data analysis to investigate how these environmental factors affect onion yields. In addition, an overall study is needed to understand how environmental and growth factors and weights and yields of onions affect each other.