Prediction of Prehypertension and Hypertension Based on Anthropometry, Blood Parameters, and Spirometry

Hypertension and prehypertension are risk factors for cardiovascular diseases. However, the associations of both prehypertension and hypertension with anthropometry, blood parameters, and spirometry have not been investigated. The purpose of this study was to identify the risk factors for prehypertension and hypertension in middle-aged Korean adults and to study prediction models of prehypertension and hypertension combined with anthropometry, blood parameters, and spirometry. Binary logistic regression analysis was performed to assess the statistical significance of the associations with prehypertension and hypertension, and prediction models were developed using logistic regression, naïve Bayes, and decision trees. Among all risk factors for prehypertension, body mass index (BMI) was identified as the best indicator in both men [odds ratio (OR) = 1.429, 95% confidence interval (CI) = 1.304–1.462] and women (OR = 1.428, 95% CI = 1.204–1.453). In contrast, among all risk factors for hypertension, BMI (OR = 1.993, 95% CI = 1.818–2.186) was found to be the best indicator in men, whereas the waist-to-height ratio (OR = 2.071, 95% CI = 1.884–2.276) was the best indicator in women. In the prehypertension prediction model, men exhibited an area under the receiver operating characteristic curve (AUC) of 0.635, and women exhibited an AUC of 0.700. In the hypertension prediction model, men exhibited an AUC of 0.777, and women exhibited an AUC of 0.845. This study proposes various risk factors for prehypertension and hypertension, and our findings can be used as a large-scale screening tool for controlling and managing hypertension.

Obesity indices have been the measures most often used in studies of association with hypertension. For example, waist circumference (WC) is a risk factor for hypertension in African populations, Caribbean populations, Brazilian women, and Filipina women [23][24][25][26]. The body mass index (BMI) is a risk factor for hypertension in China, the Philippines, the United States (US), Australian women and India [27][28][29]. The waist-to-hip ratio (WHR) is associated with hypertension in Chinese women in Hong Kong and in Australian men [29,30], and the waist-to-height ratio (WHTR) is the best indicator of hypertension in Chinese men in Hong Kong [30], a Taiwanese population [31] and a Korean population [32]. WC, BMI, and WHR are also associated with prehypertension in a Taiwanese population [33].
Several published studies have investigated blood parameters and hypertension. For example, compared with normotensive individuals, hypertensive patients show higher levels of fasting plasma glucose, serum high-sensitivity C-reactive protein (hs-CRP), TG, TC, LDL-C, uric acid (UA), white blood cells (WBCs), red blood cells (RBCs), hemoglobin (HGB), hematocrit (HCT) and mean corpuscular hemoglobin and lower serum HDL-C, mean corpuscular volume and RBC distribution width [34,35]. A higher glycated hemoglobin (HbA1c) level is correlated with a higher prevalence of hypertension [36], and inadequate hypertension treatment elevates serum creatinine level [37]. Markers of inflammation (CRP, WBC, amyloid-a, and homocysteine) are present at high levels in men and women with prehypertension [38].
In previous studies of spirometry and hypertension, the forced vital capacity (FVC) was identified as a negative predictor of hypertension, and lower FVC values were found to be a risk factor for future hypertension [39,40]. In studies of Beijing and Guangzhou populations, the FVC and forced expiratory volume in 1 s (FEV1) were found to be inversely proportional to SBP and DBP in women in both populations and in men in Beijing. A follow-up study conducted 2 or 4 years later showed a low incidence of hypertension with low lung function, but this effect was found only in Guangzhou women [41]. In Swedish men aged 55-68 years, BP increased with decreasing FVC. A lower FEV1 was correlated with higher SBP and DBP [42].
Previous studies have reported the associations of anthropometry, blood parameters, and spirometry with hypertension or prehypertension, but no study has yet described the associations of both prehypertension and hypertension with anthropometric indices, blood parameters, and spirometric indices. The purposes of this study are to analyze risk factors for hypertension and prehypertension and to present a machine-learning-based prediction model that can help reduce and prevent the diseases caused by hypertension (CVD, CKD, and stroke). First, we present risk factors for hypertension and prehypertension with statistical significance using demographic indices, anthropometric indices, blood parameters, and spirometric indices. Second, we develop predictive models of hypertension and prehypertension based on machine learning using correlation-based feature selection (CFS) and wrapper-based feature selection (WFS) methods and logistic regression (LR), naïve Bayes (NB), and decision tree (DT) prediction algorithms. Last, we propose the best hypertension and prehypertension prediction models by comparing the performance of the developed prediction models. To the best of our knowledge, this study provides the first demonstration of the associations of prehypertension and hypertension with obesity indices, blood parameters, and spirometric indices in a Korean population. The findings of the present study provide basic information for the treatment and prevention of prehypertension and hypertension.

Subjects and Dataset
We obtained data from the sixth Korea National Health and Nutrition Examination Survey (KNHANES VI). The KNHANES database is publicly available at the KNHANES website (http://knhanes.cdc.go.kr/knhanes/eng). In this study, demographics, anthropometric indices, blood parameters and spirometric indices from the KNHANES VI were analyzed. KNHANES was approved by the Institutional Review Board of the Korea Centers for Disease Control & Prevention (KCDC) and the KCDC Bioethics Committee. Informed consent was obtained from all the participants prior to KNHANES data collection (Approval Numbers: 2013-07CON-03-4C, 2013-12EXP-03-5C  and 2015-01-02-6C).
The total number of subjects was 22,948. According to the exclusion criteria, we performed the following:
• We selected 12,838 individuals after excluding 10,110 participants under 40 years of age.
In total, 8212 subjects were included in the study. According to the definitions of hypertension, the subjects were classified as having normal blood pressure (n = 3035), prehypertension (n = 2002), or hypertension (n = 3175) (the details of the data preprocessing are provided in Figure 1).

Definitions of Prehypertension and Hypertension
We used the criteria proposed by the World Health Organization (WHO) [5], the Joint National Committee 7 (JNC 7) [43] and previous studies to define normotension, prehypertension and hypertension [8,10,11,33,44,45]. Normotension was defined as an SBP less than 120 mmHg and a DBP less than 80 mmHg. Prehypertension was defined as an SBP between 120 mmHg and 139 mmHg and a DBP between 80 mmHg and 89 mmHg. Hypertension was defined as an SBP of at least 140 mmHg and/or a DBP of at least 90 mmHg [43], a diagnosis of hypertension, or reported use of antihypertensive medications [11].
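The definitions above can be written as a small classification rule. The following Python sketch is illustrative only: the function name `classify_bp` and its parameters are hypothetical, and it assumes that a subject whose SBP and DBP fall into different categories is assigned the higher category, which the text does not state explicitly.

```python
def classify_bp(sbp, dbp, diagnosed=False, on_medication=False):
    """Return 'normotension', 'prehypertension', or 'hypertension'.

    sbp, dbp: systolic and diastolic blood pressure in mmHg.
    diagnosed, on_medication: hypothetical flags for a diagnosis of
    hypertension or reported use of antihypertensive medications.
    """
    # Hypertension: SBP >= 140 and/or DBP >= 90, a diagnosis, or medication.
    if sbp >= 140 or dbp >= 90 or diagnosed or on_medication:
        return "hypertension"
    # Normotension: SBP < 120 and DBP < 80.
    if sbp < 120 and dbp < 80:
        return "normotension"
    # Assumption: every remaining reading (SBP 120-139 or DBP 80-89)
    # is treated as prehypertension.
    return "prehypertension"


print(classify_bp(128, 84))
```

Under this assumption, a reading such as 128/76 mmHg is labeled as prehypertension; a stricter reading of the definition would leave such mixed cases unclassified.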

Statistical Analysis
Statistical analyses were performed using SPSS 20 for Windows (SPSS Inc., Chicago, IL, USA). Binary logistic regression (LR) was conducted to identify significant differences between normotension and prehypertension and between normotension and hypertension after standardized transformation was applied to the male and female datasets. Independent two-sample t-tests were used to examine the differences between men and women (basic characteristics are described in Table 1). We developed prehypertension and hypertension prediction models using the Waikato Environment for Knowledge Analysis (WEKA) data mining tool. The prediction models were developed using LR, naïve Bayes (NB), and decision tree (DT) classification algorithms, all of which are widely used classification models. The NB classifier uses Bayes' theorem and conditional probabilities to estimate the probability of each class given the attribute values and has the advantage of low computational cost [46]. The DT classifier places attributes with high information gain, computed from entropy, at the upper nodes of the tree. Specifically, this classifier recursively builds a tree structure by partitioning followed by pruning. DT has the advantages of being easy to understand, providing a visual tree structure and having low calculation cost [47]. LR classifiers are used extensively in medical statistical surveys because the results of analyses relating a categorical dependent variable to one or more independent variables are easily interpreted. Depending on the number of categories of the dependent variable, a binary or multinomial model may be used [46]. To select the features associated with prehypertension and hypertension, correlation-based feature selection (CFS) and wrapper-based feature selection (WFS) were applied. CFS addresses the multicollinearity problem by recommending variables that have low correlation with one another and high correlation with the class [48], and WFS selects variables through black-box testing using a classification algorithm [49]. Values are expressed as means and standard deviations. Independent t-tests were used to verify statistical differences between men and women, and statistical significance is denoted as follows: **: p < 0.01, ***: p < 0.0001.
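To make the CFS criterion concrete, the sketch below scores a feature subset with the merit heuristic commonly associated with CFS, merit = k·r_cf / sqrt(k + k(k − 1)·r_ff), where k is the subset size, r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation. It is a minimal illustration and not the WEKA implementation used in this study; the function name, the data layout, and all correlation values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cfs_merit(subset, class_corr, feature_corr):
    """Merit of a feature subset under the CFS heuristic.

    class_corr: dict mapping feature -> correlation with the class.
    feature_corr: dict mapping frozenset({f1, f2}) -> correlation
    between the two features.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(class_corr[f]) for f in subset) / k
    # Mean absolute feature-feature correlation (0 for a single feature).
    pairs = list(combinations(subset, 2))
    if pairs:
        r_ff = sum(abs(feature_corr[frozenset(p)]) for p in pairs) / len(pairs)
    else:
        r_ff = 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical correlations, for illustration only.
class_corr = {"BMI": 0.5, "WC": 0.5, "GLU": 0.5}
feature_corr = {
    frozenset({"BMI", "WC"}): 1.0,   # fully redundant pair
    frozenset({"BMI", "GLU"}): 0.0,  # uncorrelated pair
    frozenset({"WC", "GLU"}): 0.0,
}

redundant = cfs_merit(["BMI", "WC"], class_corr, feature_corr)
complementary = cfs_merit(["BMI", "GLU"], class_corr, feature_corr)
print(redundant, complementary)
```

With these toy values, adding a fully redundant feature leaves the merit unchanged relative to a single feature, whereas adding an uncorrelated feature with the same class correlation raises it, which is the behavior the CFS description above relies on.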

Performance Evaluation
The prehypertension and hypertension predictive models were evaluated using the area under the receiver operating characteristic curve (AUC), together with sensitivity and 1-specificity. Sensitivity is the proportion of positive cases that were correctly predicted as positive, and 1-specificity is the proportion of negative cases that were incorrectly predicted as positive (the false positive rate). The data were standardized so that numerical variables with different ranges could be analyzed on the same scale. Ten-fold cross-validation was performed to evaluate the predictive power of each model.
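The evaluation measures described above can be illustrated with a short Python sketch. It is a simplified illustration and not the WEKA procedure used in this study: the function names and the toy labels and scores are hypothetical, and the AUC is computed as the proportion of positive-negative pairs in which the positive case receives the higher score (ties count as one half).

```python
def sensitivity_and_fpr(y_true, y_pred):
    """Return (sensitivity, 1 - specificity) for binary labels (1 = positive).

    Assumes both classes are present in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and predicted probabilities.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cut-off is an assumption

sens, fpr = sensitivity_and_fpr(y_true, y_pred)
area = auc(y_true, scores)
print(sens, fpr, area)
```

Sensitivity and 1-specificity depend on the chosen cut-off, whereas the AUC summarizes performance over all cut-offs, which is why the models are compared primarily by AUC.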

Results
The normal BP group included 1068 (13%) men and 1967 (24%) women, whereas the prehypertension group included 983 (12%) men and 1019 (12.4%) women, and the hypertension group included 1586 (19.3%) men and 1589 (19.3%) women. In the statistical analyses, p-values, odds ratios (ORs), and 95% confidence intervals (CIs) for each feature were obtained using binary LR. Tables 2 and 3 show the significance of differences in the studied variables between normotension and prehypertension or hypertension after adjustment for age in men and women.
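For reference, an OR and its 95% CI are conventionally derived from a logistic regression coefficient β and its standard error SE as OR = exp(β) and CI = exp(β ± 1.96·SE). The Python sketch below illustrates this relation; the function name and the example coefficient and standard error are hypothetical and are not taken from Tables 2 and 3. If the standardized transformation is a z-score transformation, each OR can be read as the change in odds per one standard deviation increase in the feature.

```python
from math import exp

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a logistic regression coefficient."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


# Hypothetical coefficient and standard error, for illustration only.
or_value, lower, upper = odds_ratio_ci(beta=0.5, se=0.1)
print(f"OR = {or_value:.3f}, 95% CI = {lower:.3f}-{upper:.3f}")
```

Because the interval is symmetric on the log scale, the OR is the geometric mean of the two bounds; an interval that excludes 1 corresponds to a statistically significant association at the 5% level.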

Performance Evaluation of the Prehypertension Prediction Model Combined with Feature Selection
We developed prediction models for prehypertension and hypertension using feature selection methods and classification algorithms. Features were selected using the CFS and WFS methods, and the predictive models were developed by applying the LR, NB, and DT algorithms with the selected features. The AUC was used to evaluate the performance of each prediction model.
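The wrapper approach can be sketched as a search over feature subsets in which each candidate subset is scored by the classifier itself. The Python sketch below uses a greedy forward search, which is an assumption made for illustration; the search strategy actually used is not specified here. The names `forward_select`, `score_fn`, and `toy_score` and the toy scores are hypothetical; in practice `score_fn` would return, for example, a cross-validated AUC.

```python
def forward_select(features, score_fn):
    """Greedy forward wrapper selection.

    features: iterable of candidate feature names.
    score_fn: callable taking a list of features and returning a
    performance score (higher is better).
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining:
        # Score each candidate when added to the current subset.
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored)
        if top_score <= best_score:
            break  # no candidate improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Toy scoring function (hypothetical): two features carry signal,
# and every added feature costs a small penalty.
USEFUL = {"AGE": 0.10, "BMI": 0.05}


def toy_score(subset):
    return 0.5 + sum(USEFUL.get(f, 0.0) for f in subset) - 0.01 * len(subset)


selected, score = forward_select(["AGE", "BMI", "NOISE"], toy_score)
print(selected, round(score, 3))
```

The search stops as soon as no remaining feature improves the score, so an uninformative feature such as the toy `NOISE` variable is never added.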
The analysis of the prehypertension prediction model revealed that the WFS-LR model with AGE, BMI, GLU, TC, HDL-C, TG, aspartate aminotransferase (AST), HGB and blood urea nitrogen (BUN) showed the best predictive power (AUC = 0.635) for men, with a sensitivity of 0.52 and a 1-specificity of 0.338. In contrast, the CFS-DT model showed the lowest predictive power (AUC = 0.559). For women, the WFS-LR model with AGE, WHTR, BMI, GLU, TC, TG, WBC, RBC, FVCP and peak expiratory flow (PEF) showed the best predictive power (AUC = 0.700), with a sensitivity of 0.308 and a 1-specificity of 0.11, whereas the CFS-DT model exhibited the lowest predictive power (AUC = 0.622).
The predictive performance of the prehypertension prediction model is compared and shown in Figure 2. The analyses of the prehypertension prediction model for men showed that the AUCs of LR, NB, and DT based on CFS were 0.610, 0.602 and 0.559, respectively, and that the AUCs of LR, NB, and DT based on WFS were 0.635, 0.626, and 0.580, respectively. In contrast, the AUCs of the prehypertension prediction model for women generated using LR, NB, and DT based on CFS were 0.698, 0.691 and 0.622, respectively, and the AUCs of LR, NB, and DT based on WFS were 0.700, 0.699 and 0.646, respectively.

Performance Evaluation of the Hypertension Prediction Models Combined with Feature Selection
The predictive performance of the hypertension prediction model is compared and shown in Figure 3. The analysis of the hypertension prediction model for men showed that the AUCs of LR, NB, and DT based on CFS were 0.749, 0.732 and 0.666, respectively, and that the AUCs of LR, NB, and DT based on WFS were 0.777, 0.748 and 0.698, respectively. In contrast, the AUCs of the hypertension prediction model for women generated through LR, NB, and DT based on CFS were 0.843, 0.819 and 0.761, respectively, and the AUCs of LR, NB, and DT based on WFS were 0.845, 0.833 and 0.796, respectively.
The features and performance results of the prehypertension and hypertension prediction models are summarized in Table 4. In men, the WFS-LR showed satisfactory performance (AUC = 0.635) in the prehypertension prediction model and the best performance (AUC = 0.777) in the hypertension prediction model. In contrast, in women, the WFS-LR showed satisfactory performance (AUC = 0.700) in the prehypertension prediction model and the best performance (AUC = 0.845) in the hypertension prediction model. Among the classification methods, LR exhibited higher prediction performance than did NB and DT. The hypertension prediction model performed better than the prehypertension prediction model and showed better performance in women than in men.
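The comparison summarized above can be reproduced directly from the AUC values reported in the text. The Python sketch below is only a convenience for the reader: the dictionary layout and variable names are hypothetical, while the numbers are those reported for the CFS- and WFS-based LR, NB, and DT models.

```python
# AUCs reported in the text, keyed by (outcome, sex) and then by
# (feature selection method, classifier).
reported_auc = {
    ("prehypertension", "men"): {
        ("CFS", "LR"): 0.610, ("CFS", "NB"): 0.602, ("CFS", "DT"): 0.559,
        ("WFS", "LR"): 0.635, ("WFS", "NB"): 0.626, ("WFS", "DT"): 0.580,
    },
    ("prehypertension", "women"): {
        ("CFS", "LR"): 0.698, ("CFS", "NB"): 0.691, ("CFS", "DT"): 0.622,
        ("WFS", "LR"): 0.700, ("WFS", "NB"): 0.699, ("WFS", "DT"): 0.646,
    },
    ("hypertension", "men"): {
        ("CFS", "LR"): 0.749, ("CFS", "NB"): 0.732, ("CFS", "DT"): 0.666,
        ("WFS", "LR"): 0.777, ("WFS", "NB"): 0.748, ("WFS", "DT"): 0.698,
    },
    ("hypertension", "women"): {
        ("CFS", "LR"): 0.843, ("CFS", "NB"): 0.819, ("CFS", "DT"): 0.761,
        ("WFS", "LR"): 0.845, ("WFS", "NB"): 0.833, ("WFS", "DT"): 0.796,
    },
}

# Pick the (method, classifier) pair with the highest AUC in each group.
best_models = {
    group: max(aucs.items(), key=lambda item: item[1])
    for group, aucs in reported_auc.items()
}

for (outcome, sex), ((method, clf), value) in best_models.items():
    print(f"{outcome:16s} {sex:6s} {method}-{clf} AUC = {value:.3f}")
```

In all four groups the highest reported AUC belongs to the WFS-LR combination, and within each feature selection method LR is followed by NB and then DT.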

Discussion
In this study, anthropometric indices, blood parameters, and spirometric indices were examined to identify risk factors for prehypertension and hypertension. The features for the prehypertension and hypertension prediction models were selected using the CFS and WFS methods. Prediction models were then developed using the LR, NB, and DT classification algorithms.
In a previous study, Ko and colleagues analyzed the associations of BMI, WHR, WC, and WHTR with hypertension in a Chinese population in Hong Kong and found that WHTR was the strongest indicator in men (OR = 1.18, 95% CI = 1.14-1.23) whereas WHR was the strongest indicator in women (OR = 1.26, 95% CI = 1.18-1.35) [30]. Lee and colleagues demonstrated that among obesity factors, WC, WHR, and WHTR were more predictive of hypertension than was BMI and that WHTR was the best obesity-related predictor of hypertension, regardless of gender, ethnicity and age [hazard ratio (HR) = 1.49, 95% CI = 1.35-1.65 in men and HR = 1.48, 95% CI = 1.33-1.64 in women] in middle-aged Korean adults [32]. Chang and colleagues found that BMI in men (OR = 2.07, 95% CI = 1.44-2.99) and abdominal obesity in women (OR = 2.04, 95% CI = 1.54-2.71) were associated with an increased risk of prehypertension [45]. Grievink and colleagues evaluated BMI, WC, and WHR as predictors of hypertension in a Caribbean population and identified WC (OR = 1.7, 95% CI = 1.4-2.0) as the best independent predictor of hypertension [24]. Tsai and colleagues reported that WHR, BMI, and WC were associated with prehypertension, particularly high BMI in men (OR = 1.106, 95% CI = 1.051) and high WC in women (OR = 1.031, 95% CI = 1.012-1.051) [33]. In this study, BMI was identified as the best predictor of prehypertension in men (OR = 1.429, 95% CI = 1.303-1.567) and women (OR = 1.427, 95% CI = 1.321-1.542). The risk factors that best predicted hypertension were BMI in men (OR = 1.993, 95% CI = 1.817-2.185) and WHTR in women (OR = 2.071, 95% CI = 1.884-2.276). Our findings are consistent with those of previous studies [27][28][29] and indicate that BMI is the best indicator of hypertension in men and of prehypertension in men and women.
Several studies of blood parameters and hypertension have been conducted. Cirillo and colleagues reported that hematocrit level was positively correlated with SBP and DBP in men and women [34]. In addition, Emamian and colleagues performed a multivariate LR analysis of demographic, biochemical, and hematological parameters and found that hematocrit (OR = 1.02, 95% CI = 1.003-1.04) was an independent predictor of hypertension [35]. Daniel and colleagues demonstrated that high HbA1c levels were associated with an increased hypertension rate and that the rate of CVD (OR = 1.39, 95% CI = 1.06-1.83) increased by 1% with each increase in HbA1c level [36]. Christina and colleagues showed that men and women with prehypertension presented 31% higher CRP, 32% higher tumor necrosis factor-a, 9% higher amyloid-a, 6% higher homocysteine, and 10% higher WBC levels [38]. In this study, the best predictor of prehypertension was found to be HGB (OR = 1.322, 95% CI = 1.204-1.452) in men and GLU (OR = 1.289, 95% CI = 1.180-1.410) in women. HCT (OR = 1.262, 95% CI = 1.151-1.383) and TG (OR = 1.259, 95% CI = 1.162-1.365) were also highly associated with prehypertension in men and women, respectively. The best predictor of hypertension was TG (OR = 1.434, 95% CI = 1.304-1.576) in men and GLU (OR = 1.675, 95% CI = 1.508-1.861) in women. GLU (OR = 1.363, 95% CI = 1.247-1.489) and HbA1c (OR = 1.539, 95% CI = 1.393-1.700) were also highly associated with hypertension in men and women, respectively. Our findings are consistent with those of previous studies [36] and indicate that the HbA1c index is significantly associated with hypertension in women.
In a study of hypertension and spirometry, Sarah and colleagues demonstrated that FVC was significantly associated with hypertension and was a negative predictor [39]. Follow-up studies showed that hypertension could develop in the future, and an OR of approximately 0.7 was found in an LR analysis [39]. Jacobs and colleagues performed an HR analysis and found that the risk of hypertension (HR from 1 to 2.21) increased by more than 2-fold with decreasing FVC and that a low FVC might result in cardiovascular morbidity and mortality [40]. In this study, FVCP was identified as the best predictor of prehypertension and hypertension. Low FVCP indices were associated with prehypertension in women (OR = 0.814, 95% CI = 0.755-0.877), hypertension in men (OR = 0.791, 95% CI = 0.728-0.859) and hypertension in women (OR = 0.681, 95% CI = 0.629-0.739). FVC predictors were also significantly associated with hypertension, but the associations were slightly less significant than the association of FVCP with hypertension. Our findings are consistent with those of previous studies [39,40] and indicate that the FVC index is significantly associated with prehypertension in women and hypertension in men and women.
Prior to the present study, several researchers proposed hypertension prediction models based on data mining techniques [50][51][52][53]. For instance, Tayefi and colleagues proposed a hypertension prediction model based on DTs in the Iranian population. The hypertension DT model suggested that demographics and selected biochemical markers (such as age, BMI, fasting blood GLU, TG, UA, hs-CRP, TC and LDL-C) have higher predictive power than other biochemical markers [50]. Ture and colleagues compared the performance of DTs, statistical algorithms, and neural networks using features such as age, sex, family history, smoking habits, lipoprotein, TG and UA and found that the neural network algorithm had the best predictive power for hypertension [51]. The evaluation of the performance of DT, NB, and LR performed in this study identified LR as the best classification algorithm. The model combining the demographic index, blood parameters and spirometric indices showed the best predictive power. Among the prediction models of prehypertension and hypertension, the WFS-LR prediction models were identified as the best for both men and women. In this study, the hypertension prediction model was developed by combining the obesity index, blood parameters, and spirometric indices, whereas previous studies [50,51] used demographic characteristics, BMI, and blood parameters.
This study has several limitations. First, it is difficult to identify cause-and-effect relationships because we used data from a cross-sectional survey. Second, the most significant risk factor for hypertension was the obesity index, but hip circumference was not measured and could not be compared to WHR. Third, we did not have information on disease (diabetes, dyslipidemia, and hyperlipidemia) secondary to hypertension, so we did not consider it in this study. Finally, in this study, the predictive model was designed considering only anthropometric indices, blood parameters, and spirometric indices, and indicators such as smoking, drinking, and physical activity were excluded.

Conclusions
Hypertension is a risk factor that can lead to cardiovascular diseases and death, and treatment and management strategies for hypertension remain lacking. In this study, we examined the associations of prehypertension and hypertension with spirometric indices, obesity indices, and blood parameters and proposed prehypertension and hypertension prediction models to aid the effective management and prevention of hypertension. A statistical analysis of the three types of variables revealed that the obesity indices were the highest risk factors for prehypertension and hypertension in men and women. GLU, HbA1c, and TG as well as low spirometric values were also associated with hypertension in both men and women. Thus, we developed prediction models with the LR, NB and DT classifiers using two subset selection methods, namely, CFS and WFS. The predictive model with the highest prediction power was the WFS-LR prediction model that combined various factors (i.e., age, obesity indices, blood parameters, and spirometric indices). Our findings can be applied as a large-scale screening tool for the control and management of hypertension.