A Software Reliability Model Considering the Syntax Error in Uncertainty Environment, Optimal Release Time, and Sensitivity Analysis

Abstract: The goal set by software developers is to develop high quality and reliable software products. During the past decades, software has become complex, and thus, it is difficult to develop stable software products. Software failures often cause serious social or economic losses, and therefore, software reliability is considered important. Software reliability growth models (SRGMs) have been used to estimate software reliability. In this work, we introduce a new software reliability model and compare it with several non-homogeneous Poisson process (NHPP) models. In addition, we compare the goodness of fit for existing SRGMs using actual data sets based on eight criteria. The results allow us to determine which model is optimal.


Introduction
The basic goal set by software developers is to develop high quality software products that are stable and reliable. As technology has advanced, consumers demand more functions. For this reason, software structure has become more complex during the last few decades, which makes it difficult to produce software with high quality and stable reliability [1]. Software reliability growth models (SRGMs) have been used by researchers to estimate software reliability. The goodness of fit is decided by common criteria, which will be discussed in Section 3. Many SRGMs have been developed during the past decades. Most SRGMs are based on a non-homogeneous Poisson process (NHPP). These models assume that the total number of failures follows m(t), a mean value function based on an NHPP. Each SRGM is distinguished by its unique function m(t), which reflects various environments through the assumptions made on its parameters. After developing a software reliability model, we fit the model to actual data and estimate its goodness of fit. Once the model that best fits the actual data is determined, it is possible to estimate the optimal release time and the minimum expected development cost, thereby establishing a release policy. Thus, it is crucial to consider not only the development of a model that reflects diverse environmental factors, but also one that presents the best goodness of fit for actual data sets.
Goel-Okumoto [2] developed an exponential model that has been extended as a basic framework for developing SRGMs. Yamada [3,4] proposed the inflection S-shaped NHPP model and the delayed S-shaped NHPP model, which incorporates the testing effort. Quadri et al. [5] and Ahmad et al. [6] extended the SRGM with the exponentiated Weibull distribution to consider testing effort. Pham et al. [7] proposed the Pham-Zhang model, which uses the inflection S-shaped model as its fault detection function. Pham [8] developed a model based on the factors of the development environment. Teng et al. [9] dealt with software reliability models whose parameters represent random field environments. Pham [10] discussed a new model that incorporates the uncertainty of the system fault detection rate per unit of time subject to the operating environment. Inoue et al. [11] conducted software reliability modeling considering the uncertainty of the testing environment. Li et al. [12] performed new testing coverage modeling based on an NHPP that considers not only error generation but also fault removal efficiency. Song et al. [13-15] considered operating environments; they applied a random variable to the software reliability model to represent the uncertainty in the operating environment. Zhu et al. [16] defined two types of software faults in order to consider software fault dependency and imperfect fault removal. Zhu et al. [17] described a software reliability model that follows a gamma distribution and treats the fault detection process as a stochastic process, owing to the randomness caused by environmental factors. As shown above, many software reliability models have been proposed. In addition, several papers have approached the problem from a statistical perspective. Zeephongsekul et al. [18] applied maximum-likelihood estimation of parameters. Candini et al. [19] used a Bayesian Monte Carlo method to estimate small failure probabilities with uncertainties.
Meta learning and deep learning have been brought to attention recently. Caiuta et al. [20] applied meta-learning algorithms in software reliability models. Tamura et al. [21] performed simulations to select the optimal software reliability model based on deep learning. Tamura et al. [22] and Wang et al. [23] studied the prediction of the number of software failures based on a deep learning model. Kim et al. [24] explained application of the software reliability model to increase the software reliability, and introduced not only some analytical methods but also the prediction and estimation results.
In this paper, we propose a new NHPP SRGM based on the Weibull distribution, which considers testing time affected by syntax errors. We then discuss the optimal release time and a sensitivity analysis using the proposed model. In Section 2, the mean value function of the proposed model is presented alongside common NHPP SRGMs. We present eight criteria to estimate the goodness of fit in Section 3. Section 4 determines the best model under various criteria using actual data sets. Section 5 examines the software release policy and Section 6 deals with the sensitivity analysis of the parameters affecting the release policy. Finally, conclusions are presented in Section 7 and future research topics are suggested in Section 8.

Non Homogeneous Poisson Process Model
General software reliability models follow an NHPP; the probability that N(t), the number of failures detected by testing time t, equals n is

P\{N(t) = n\} = \frac{[m(t)]^n}{n!} e^{-m(t)}, \quad n = 0, 1, 2, \ldots, (1)

where m(t) is the mean value function, i.e., the expected number of failures detected by testing time t. It can be written as

m(t) = \int_0^t \lambda(s)\, ds, (2)

where \lambda(t) is the failure intensity function. Most NHPP SRGMs are expressed using the differential equation

\frac{dm(t)}{dt} = b(t)[a(t) - m(t)], (3)

where a(t) is the total fault content function and b(t) is the fault detection rate function. Solving Equation (3) with given a(t) and b(t) yields a unique m(t), and this process is assumed to describe software testing.
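As a concrete illustration, the sketch below evaluates these quantities for the classic Goel-Okumoto model, where a(t) = a (constant fault content) and b(t) = b, giving m(t) = a(1 - e^{-bt}); the parameter values are illustrative only.

```python
import math

def m_go(t, a, b):
    """Goel-Okumoto mean value function: m(t) = a*(1 - exp(-b*t))."""
    return a * (1.0 - math.exp(-b * t))

def intensity_go(t, a, b):
    """Failure intensity lambda(t) = dm(t)/dt = a*b*exp(-b*t)."""
    return a * b * math.exp(-b * t)

def nhpp_prob(n, mt):
    """P{N(t) = n} = m(t)**n * exp(-m(t)) / n! for an NHPP."""
    return mt ** n * math.exp(-mt) / math.factorial(n)

# Illustrative values: a = 100 expected total faults, detection rate b = 0.1
mt = m_go(10.0, 100.0, 0.1)   # expected failures by t = 10
p0 = nhpp_prob(0, mt)         # probability that no failure is seen by t = 10
```

Any NHPP SRGM can be plugged in by replacing `m_go` with the corresponding mean value function.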

Proposed Software Reliability Model
The proposed model is also based on the NHPP. The mean value functions of the proposed model and the testing coverage model are similar, but there are some differences.
The important point is that the proposed model considers an uncertain environment. The actual testing environment and the theoretical testing environment of software are very different. In an actual testing environment, the developer may encounter unexpected variables such as syntax errors, which are considered in the proposed model. Figure 1 shows the structure of the testing time. The generalized testing time of the SRGMs is 0 < t. However, if the software to be tested has a syntax error, it cannot be compiled and tested. To continue testing, the code must be modified to remove the syntax errors. Thus, the actual testing time differs from the theoretical testing time. This is the point that the proposed model considers.
The testing time of the proposed model accounts for syntax errors, which are errors in the syntax or tokens written in a programming language. Accordingly, the testing time of the proposed model is t_0 < t.

The mean value function m(t) of the proposed model is based on the Weibull distribution model [14]; it is the expected number of software failures by time t and is obtained from

\frac{dm(t)}{dt} = \eta b(t)[N - m(t)], (4)

where b(t) is the failure detection rate function and N is the expected number of failures existing in the software before the testing phase. \eta is a random variable that represents the uncertainty of the system fault detection rate in the operating environment, with probability density function

g(\eta) = \frac{\beta^{\alpha} \eta^{\alpha-1} e^{-\beta \eta}}{\Gamma(\alpha)}, \quad \alpha, \beta > 0, (5)

and t_0 is the time when debugging starts, after modifying the code causing the syntax errors. Integrating over \eta gives

m(t) = N\left[1 - \left(\frac{\beta}{\beta + \int_{t_0}^{t} b(s)\, ds}\right)^{\alpha}\right]. (6)

The b(t) of the proposed model follows a Weibull failure detection rate function, which can be written as

b(t) = a b t^{b-1}, \quad a, b > 0, (7)

where a is a scale parameter and b is a shape parameter, which are known. Then, we find the solution of m(t) for the proposed model with syntax error by substituting b(t) above into Equation (6). Finally, the mean value function is defined by

m(t) = N\left[1 - \left(\frac{\beta}{\beta + a\left(t^{b} - t_0^{b}\right)}\right)^{\alpha}\right]. (8)

Table 1 shows the mean value function m(t) of existing and well-known NHPP SRGMs.
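A minimal sketch of the proposed mean value function, assuming the gamma-distributed η and the Weibull detection rate integrated from t_0; the exact closed form and all parameter values here are illustrative assumptions rather than a verbatim transcription of the paper's formula.

```python
def m_proposed(t, N, a, b, alpha, beta, t0):
    """Assumed reconstruction of the proposed mean value function:
    m(t) = N * [1 - (beta / (beta + a*(t**b - t0**b)))**alpha],
    where the Weibull detection rate b(s) = a*b*s**(b-1) is integrated
    from t0, the time when debugging starts after syntax errors are fixed."""
    if t <= t0:
        return 0.0  # no failures can be detected before debugging starts
    return N * (1.0 - (beta / (beta + a * (t ** b - t0 ** b))) ** alpha)
```

As t grows, m(t) increases monotonically from 0 (at t_0) toward N, the expected total fault content.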
Among the models listed in Table 1 are the testing coverage model [30] and the new model.

Numerical Example
To assess whether the proposed model performs well, we compare it with other NHPP SRGMs by estimating criteria on actual data sets.

Criteria
In this section, we compare the NHPP SRGMs using the two data sets (in Section 3.2) and discuss the goodness of fit of the models. Eight criteria (MSE, PRR, PP, R^2, AIC, SAE, PRV, and RMSPE) are used for the comparison of goodness of fit.
The first criterion is the mean squared error (MSE), which measures the distance of the model estimates from the actual data, accounting for the number of observations n and the number N of parameters in the model. It is defined as follows:

\mathrm{MSE} = \frac{\sum_{i=1}^{n} \left(M(t_i) - y_i\right)^2}{n - N},

where y_i is the total number of failures observed at time t_i according to the real data and M(t_i) is the estimated cumulative number of failures at time t_i for i = 1, 2, \ldots, n. The second criterion, the predictive ratio risk (PRR) [31], measures the distance of the model estimates from the actual data against the model estimate. It is defined by:

\mathrm{PRR} = \sum_{i=1}^{n} \left(\frac{M(t_i) - y_i}{M(t_i)}\right)^2.

The third criterion, the predictive power (PP) [31], measures the distance of the actual data from the model estimates against the actual data. It is defined by:

\mathrm{PP} = \sum_{i=1}^{n} \left(\frac{M(t_i) - y_i}{y_i}\right)^2.

Appl. Sci. 2018, 8, 1483

The fourth criterion is R-square (R^2) [32], which is used to examine the fitting power of the SRGMs; it is the correlation index of the regression curve equation, expressed as follows:

R^2 = 1 - \frac{\sum_{i=1}^{n} \left(M(t_i) - y_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}.

The fifth criterion is Akaike's information criterion (AIC) [33], which compares models through maximization of the likelihood function. It can be considered an approximate distance from the true probability model:

\mathrm{AIC} = -2 \ln L + 2N,

where N is the number of parameters (degrees of freedom) and the NHPP log-likelihood \ln L is given (with t_0 = 0 and y_0 = 0) by:

\ln L = \sum_{i=1}^{n} \left[ (y_i - y_{i-1}) \ln\left(m(t_i) - m(t_{i-1})\right) - \left(m(t_i) - m(t_{i-1})\right) - \ln\left((y_i - y_{i-1})!\right) \right].

The sixth criterion is the sum of absolute errors (SAE) [14], which measures the distance between the predicted number of failures and the observed data. SAE is defined by:

\mathrm{SAE} = \sum_{i=1}^{n} \left| M(t_i) - y_i \right|.

The seventh criterion is the predicted relative variation (PRV), also called variance [34-36]. It is the standard deviation of the prediction bias and is defined as:

\mathrm{PRV} = \sqrt{\frac{\sum_{i=1}^{n} \left(\mathrm{bias}_i - \mathrm{Bias}\right)^2}{n - 1}},

where the bias terms are given as follows:

\mathrm{bias}_i = M(t_i) - y_i, \quad \mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{bias}_i.

The last criterion is the root mean square prediction error (RMSPE). It estimates the closeness with which the model predicts the observations [34-36]:

\mathrm{RMSPE} = \sqrt{\mathrm{Bias}^2 + \mathrm{PRV}^2}.

For all these criteria except R^2, a smaller value means a better goodness of fit of the model. On the contrary, the larger the value of R^2, the better the goodness of fit.
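The criteria above (except AIC, which requires the full NHPP likelihood) can be computed directly from the observed and fitted cumulative failure counts. A sketch, where `n_params` stands for the number N of model parameters:

```python
def fit_criteria(y, yhat, n_params):
    """Goodness-of-fit criteria from observed cumulative failures y
    and model estimates yhat; n_params is the number N of model parameters."""
    n = len(y)
    resid = [yh - yi for yh, yi in zip(yhat, y)]   # M(t_i) - y_i
    ybar = sum(y) / n
    bias = sum(resid) / n                          # mean prediction bias
    prv = (sum((r - bias) ** 2 for r in resid) / (n - 1)) ** 0.5
    return {
        "MSE":   sum(r * r for r in resid) / (n - n_params),
        "PRR":   sum((r / yh) ** 2 for r, yh in zip(resid, yhat)),
        "PP":    sum((r / yi) ** 2 for r, yi in zip(resid, y)),
        "R2":    1.0 - sum(r * r for r in resid)
                     / sum((yi - ybar) ** 2 for yi in y),
        "SAE":   sum(abs(r) for r in resid),
        "PRV":   prv,
        "RMSPE": (bias ** 2 + prv ** 2) ** 0.5,
    }
```

A perfect fit yields zero for every distance criterion and R^2 = 1.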

Data Sets Information
Tables 2 and 3 present the cumulative failures of each data set [37]. Data set 1, which was collected from software based on code from a product with enhancements provided with a new hardware platform, was observed during 13 months in the field. The total failures were collected over 58,633 system-days. In this work, t_i is used as cumulative real time, and thus t_1, t_2, . . . , t_13 = {1249, 4721, . . . , 58,633}. Data set 2 includes test data collected from a product with a high rate of wireless data service, during a combination of feature testing and load testing. With t_i = {1, 2, . . . , 19}, it comprises 19 observations of the field failure process and 22 total failures. Detailed information can be found in [37].

Comparison of Goodness of Fit
We estimate the parameters of all nine models at t_1, t_2, . . . , t_13 = {1249, 4721, . . . , 58,633} from data set 1. The parameters of all nine models were also estimated at t_1, t_2, . . . , t_19 = {1, 2, . . . , 19} from data set 2. We use the least squares estimation (LSE) method with Matlab and R. Table 4 summarizes the parameters estimated for all the SRGMs in Table 1. Tables 5 and 6 compare all the models using the criteria presented in Section 3.1, for both data sets. For the reasons mentioned above, smaller values are better for all criteria except R^2; on the contrary, the larger the value of R^2, the better. Table 4. Estimation of parameters for both data sets.
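A toy version of the LSE step: the paper uses Matlab and R optimizers, while this sketch uses a brute-force grid search over (a, b) for the Goel-Okumoto m(t) on synthetic, noise-free data, purely to illustrate the procedure.

```python
import math

def m_go(t, a, b):
    """Goel-Okumoto mean value function."""
    return a * (1.0 - math.exp(-b * t))

def lse_fit(ts, ys, a_grid, b_grid):
    """Brute-force least-squares search over a parameter grid
    (a toy stand-in for the Matlab/R optimizers used in the paper)."""
    best = None
    for a in a_grid:
        for b in b_grid:
            sse = sum((m_go(t, a, b) - y) ** 2 for t, y in zip(ts, ys))
            if best is None or sse < best[0]:
                best = (sse, a, b)
    return best[1], best[2]

# Synthetic data generated from a = 22, b = 0.15 (for illustration only)
ts = list(range(1, 20))
ys = [m_go(t, 22.0, 0.15) for t in ts]
a_hat, b_hat = lse_fit(ts, ys,
                       [20.0 + 0.5 * i for i in range(9)],    # 20.0 .. 24.0
                       [0.10 + 0.01 * i for i in range(11)])  # 0.10 .. 0.20
```

In practice a continuous optimizer (e.g. nonlinear least squares) replaces the grid, but the objective, the sum of squared deviations of M(t_i) from y_i, is the same.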

In Table 5, the values of MSE, PRR, PP, SAE, PRV, and RMSPE for the new model are the lowest in comparison with those of the other models. Furthermore, its R^2 value is 0.9937, which is the largest. Although its AIC value is not the lowest among all the SRGMs, we can safely say that the proposed model fits data set 1 better than the other SRGMs.
In Table 6, it can likewise be seen that the AIC value of the new model is not the smallest among all the SRGMs. However, the values of MSE, PRR, PP, SAE, PRV, and RMSPE for the proposed model are markedly better. In general, when all the values are considered, the new model proposed in this work is optimal among all the SRGMs for data set 2.

Confidence Interval
We also estimate the confidence interval [31] of the newly proposed model for the data sets in Table 7, which is defined as:

\hat{m}(t) \pm z_{\alpha/2} \sqrt{\hat{m}(t)},

where z_{\alpha/2} is the 100(1 - \alpha/2)th percentile of the standard normal distribution. Figures 2 and 3 show the confidence interval of the proposed model for both data sets. Figures 4 and 5 show the mean value functions of all the SRGMs for both data sets.
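The normal-approximation interval above can be sketched as:

```python
import math
from statistics import NormalDist

def confidence_interval(mt, alpha=0.05):
    """Normal-approximation interval m(t) +/- z_{alpha/2} * sqrt(m(t))
    for the cumulative failure count of an NHPP with mean m(t)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # e.g. 1.96 for alpha = 0.05
    half = z * math.sqrt(mt)
    return mt - half, mt + half

lo, hi = confidence_interval(25.0)  # 95% interval around m(t) = 25
```

The square root appears because the variance of an NHPP count equals its mean m(t).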


Optimal Release Time and Cost
In this section, we address the optimal release policy from the point of view of time and cost. It can be helpful to find the optimal software release time (T*) that has the minimum expected total software development cost. Although the optimal release policy has been studied for decades, it is still a sensitive problem. For example, if the testing period is long, the software can be reliable, but the software development cost increases. On the contrary, if the testing period is short, the product can be unreliable; moreover, risk costs, such as follow-up service costs, can increase. Thus, it is crucial to find a balanced point between release time and minimum cost. Figure 6 shows the system development lifecycle considered in the following cost model, which includes the testing phase before release time T, the testing environment period, the warranty period, and the operational life in the actual field environment (usually quite different from the testing environment) [33]. The expected total software development cost C(T) depends on various factors and can be expressed as:

C(T) = C_0 + C_1 T + C_2 m(T)\mu_y + C_3 \left(1 - R(x|T)\right) + C_4 \left[m(T + T_W) - m(T)\right]\mu_W,

where C_0 is the set-up cost for the test, C_1 T is the cost of testing, C_2 m(T)\mu_y is the expected cost of removing all failures detected by time T during the testing period, C_3 (1 - R(x|T)) is the penalty cost of removing failures that occur after the system release time T, and C_4 [m(T + T_W) - m(T)]\mu_W is the expected cost of removing all failures detected during the warranty period [T, T + T_W]. Furthermore, it is assumed that removing errors during the operating period costs more and takes much longer than during the testing period. Finally, the expected total software cost can be calculated using the m(t) function with the estimated parameters. The primary purpose of the equation is to find the optimal software release time (T*) minimizing the expected total cost.
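A sketch of the cost model and the search for T*. The Goel-Okumoto m(t) and all coefficient values below are illustrative stand-ins (not the paper's Table 8 baseline), and R(x|T) = exp(-[m(T + x) - m(T)]) is the usual NHPP software reliability function.

```python
import math

def m_go(t, a, b):
    """Illustrative stand-in mean value function (Goel-Okumoto)."""
    return a * (1.0 - math.exp(-b * t))

def total_cost(T, Tw, C0, C1, C2, C3, C4, mu_y, mu_w, x, a, b):
    """C(T) = C0 + C1*T + C2*m(T)*mu_y + C3*(1 - R(x|T))
            + C4*(m(T + Tw) - m(T))*mu_w,
    with software reliability R(x|T) = exp(-(m(T + x) - m(T)))."""
    R = math.exp(-(m_go(T + x, a, b) - m_go(T, a, b)))
    return (C0 + C1 * T + C2 * m_go(T, a, b) * mu_y
            + C3 * (1.0 - R)
            + C4 * (m_go(T + Tw, a, b) - m_go(T, a, b)) * mu_w)

def optimal_release(grid, **kw):
    """Grid search for the release time T* minimizing C(T)."""
    return min(grid, key=lambda T: total_cost(T, **kw))

# Hypothetical coefficients, chosen only to produce an interior minimum
kw = dict(Tw=10.0, C0=500.0, C1=10.0, C2=50.0, C3=5000.0, C4=500.0,
          mu_y=0.1, mu_w=0.5, x=1.0, a=100.0, b=0.1)
T_star = optimal_release([0.5 * i for i in range(1, 200)], **kw)
```

Releasing too early inflates the penalty and warranty terms; releasing too late inflates the testing term C_1 T, so C(T) attains its minimum at an interior T*.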

Results of Optimal Release Time and Cost
We apply the mean value function m(t) obtained in Section 4 to the defined cost model C(T) and choose coefficients of the cost model for the baseline case. The coefficients of the baseline case are listed in Table 8 (values: 500, 10, 50, 5000, 500, 10, 0.1, 0.1, and 10). Table 9 presents the values of the release time and the expected total cost under conditions derived from varying the cost coefficients and the warranty period T_W. Similarly, Tables 10-13 list the values obtained by changing the parameters of the C(T) function. Figures 7-11 illustrate Tables 9-13 when T_W is 10; in each figure, the baseline case is drawn as a red line.

Table 9. Optimal release time T* and total cost C(T) for T_W (Case 1).
Table 11. Optimal release time T* and total cost C(T) for C_2 (Case 3).
Table 13. Optimal release time T* and total cost C(T) for C_4 (Case 5).
In Table 9, T* and C(T) increase when the warranty period T_W increases. Further, C(T) increases when C_0 increases in Table 10. In Tables 11-13, T* and C(T) increase when C_1, C_3, and C_4 increase. As a result, C_0 does not affect T*, but it affects C(T); C_1 affects T* and has some effect on C(T); C_3 and C_4 have a significant effect on both T* and C(T).
Figure 7. T* and C(T) for T_W = 10 (Table 9).
Figure 8. T* and C(T) for T_W = 10 (Table 10).
Figure 9. T* and C(T) for T_W = 10 (Table 11).
Figure 10. T* and C(T) for T_W = 10 (Table 12).
Figure 11. T* and C(T) for T_W = 10 (Table 13).

Sensitivity Analysis of Parameters
We conduct a sensitivity analysis of the parameters for the optimal release time [38]. S_T is defined as the relative change of the release time when a parameter \theta of the mean value function m(t) is changed by 100p%. S_T can be expressed as:

S_T = \frac{T^*(\theta(1 + p)) - T^*(\theta)}{T^*(\theta)}.

Table 14 and Figure 12 reveal how much each parameter changes the release time T. From Table 14, b is the most sensitive parameter and T_0 is the most insensitive parameter. Moreover, the sensitivity values for parameters \alpha and \beta are almost equal because the software reliability function varies by the same amount when \alpha and \beta vary. In brief, the optimal release time (T*) increases with a decrease in a and b. Furthermore, T* decreases with an increase in \alpha, \beta, T_0, and N.
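The sensitivity measure can be sketched as follows; the toy release-time rule in the example is purely hypothetical, standing in for the T* obtained from the cost model.

```python
def sensitivity(release_time_fn, theta, p):
    """S_T = (T*(theta*(1+p)) - T*(theta)) / T*(theta): the relative change
    of the optimal release time when parameter theta is perturbed by 100p%."""
    base = release_time_fn(theta)
    return (release_time_fn(theta * (1.0 + p)) - base) / base

# Toy example: a hypothetical release-time rule T*(b) = 10 / b
s = sensitivity(lambda b: 10.0 / b, 2.0, 0.1)  # effect of a 10% increase in b
```

In the paper's setting, `release_time_fn` would re-run the cost minimization of Section 5 with the perturbed parameter and return the new T*.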

Results of Sensitivity Analysis
If the software is released too early, more resources will be required, such as risk cost, update cost, and human resources from the users and the development company. On the contrary, if the software is released too late, higher development costs must be assumed. Therefore, overestimation of \alpha, \beta, T_0, and N and underestimation of a and b, which can lead to misestimations such as underestimation of the optimal release time (T*), should be avoided.

Conclusions
We suggested a new SRGM considering the start of the actual debugging time for software affected by syntax errors. To compare it with several existing NHPP SRGMs, we applied two real data sets. Parameters for all models were estimated by the LSE method. In addition, common criteria were used to compare the goodness of fit in order to determine the optimal model (Tables 5 and 6). The values of the criteria of the proposed model are better than those of the other SRGMs listed in Table 1. The AIC values for both data sets were not the lowest compared to those of the other models, but, from the point of view of the other criteria, the proposed model performed best. We then applied a cost model to the newly proposed SRGM, and examined how changes in each coefficient affect the release time and cost.
In summary, the proposed model fits better than all the other models for both data sets. As shown in Section 5, variations of C_3 and C_4, the coefficients related to the field environment, have a greater effect than the other coefficients. In order to establish the optimal release policy, it is necessary to subdivide the coefficients related to the field environment. As discussed in Section 6, b is the most sensitive parameter and T_0 is the most insensitive parameter (Table 14). Overestimation of \alpha, \beta, T_0, and N and underestimation of a and b have to be avoided because they can lead to misestimation of the optimal release time.
Recently, many researchers have studied software reliability models that consider the software development environment. Likewise, we studied a software reliability model that considers uncertainty in the software development environment, such as syntax errors. We then provided optimal release policies minimizing the total development cost for various environments. Therefore, the proposed model is beneficial when other data sets and various environments are given.

Future Research
A further direction of this study is to find diverse and more recent data sets to demonstrate clearly the goodness of fit of the new model. In addition, we estimated the parameters using the LSE method; future work should apply MLE or Bayesian inference to estimate the parameters, and should also consider the change-point.