Data Analytics in Smart Healthcare: The Recent Developments and Beyond

Abstract: The concepts of the smart city and the Internet of Things (IoT) have facilitated the rollout of medical devices and systems that capture valuable health information. Many artificial intelligence techniques have proven effective in smart city applications such as energy, transportation, retail and control. In the recent decade, resistance to the adoption of data analytics algorithms and systems in healthcare has been decreasing, and data analytics research on healthcare data has grown tremendously. The results of analytics aim to improve people’s quality of life and to relieve the shortage of medical staff. In this special issue, “Data Analytics in Smart Healthcare”, thirteen (13) papers have been published as representative examples of recent developments. The Guest Editors also highlight some emergent topics and open challenges in healthcare analytics that reflect the direction in which healthcare analytics research is moving.


Introduction
In light of the promotion of the smart city [1], many smart applications have emerged, for instance, smart energy [2], smart education [3], smart transportation [4] and smart healthcare [5]. The Guest Editors proposed a special issue on the theme of "Data Analytics in Smart Healthcare", which aims to collect innovative applications of data analytic techniques in smart healthcare.
The tremendous growth of Internet-of-Things devices enables the (big) data collection of health-related parameters (e.g., body temperature, blood pressure, heartbeat, respiratory rate, oxygen saturation, blood glucose level, wrist pulse signal, magnetoencephalogram (MEG), galvanic skin response (GSR), electrooculography (EOG), mechanomyogram (MMG), electromyogram (EMG), electrocardiogram (ECG) and electroencephalogram (EEG)). Numerous data analytic techniques are applied to these data in order to realize smart healthcare applications.
The world has been seeking effective measures to relieve the issues of population ageing and an inadequate number of medical staff. The World Health Organization (WHO) reported that, in 2013, the world required about 60 million medical staff but had only about 43 million [6]. These figures are projected to increase to 82 million and 67 million, respectively, by 2030. The relative shortage will therefore decrease, from roughly 28% (17 of 60 million) to roughly 18% (15 of 82 million); unfortunately, a shortage of some 15 million medical staff remains a striking figure that calls for alternatives.
This editorial is organized as follows. Section 2 summarizes the applications, methodologies and key results of the published articles. In Section 3, the Guest Editors discuss emergent topics in smart healthcare. Finally, a conclusion is drawn.

Special Issue Articles
This section summarizes the special issue articles in a table (Table 1) and in written descriptions.

Table 1. Summary of the applications and methodologies of the special issue articles.

Work | Application | Methodology
[7] | Prediction of inpatient violence incidents | Recurrent neural network; convolutional neural network; neural network; naïve Bayes; support vector machine; decision tree
[8] | Prediction of type 2 diabetes and hypertension | Density-based spatial clustering; synthetic minority over-sampling
[9] | Prediction of biochemical recurrences in patients treated by stereotactic body radiation therapy | Prostate clinical outlook
[10] | Forecast of tuberculosis prevalence rate | Kruskal-Wallis test; regression model; cuckoo search optimization algorithm; radial basis function neural networks
[11] | Investigation of the association between policy factors and healthcare system efficiency | Tobit model
[12] | Improvement on software reuse in smart healthcare | Systematic analysis
[13] | Investigation of the relationship between continuity of care in the multidisciplinary treatment of patients with diabetes and their clinical results | Statistical analysis
[14] | Optic disk localization | Statistical edge detection; circular Hough transform
[15] | Classification of lung cancers | Deep convolutional neural network; support vector machine
[16] | Probability analysis of hypertension-related symptoms | XGBoost; clustering algorithm
[17] | Skin aging estimation | Scale-invariant feature transform; color histogram intersection; polynomial regression; support vector regression
[18] | Minimizing the number of physicians and nurses | Discrete event simulations
[19] | Classification of organ inflammation | Genetic algorithm; support vector machine

The first paper, entitled "Comparing deep learning and classical machine learning approaches for predicting inpatient violence incidents from clinical text" and coauthored by V. Menger, F. Scheepers and M. Spruit, focused on predicting inpatient violence incidents [7]. Deep learning and shallow learning approaches were applied to clinical text to build classification models and were then compared. The results showed that the improvement achieved by the deep learning approach, measured by the area under the receiver operating characteristic curve, is statistically insignificant. The paper offers guidance on choosing between the two: deep learning allows more experimentation in model setup, whereas shallow learning requires less training time.
M. Ijaz, G. Alfian, M. Syafrudin, and J. Rhee proposed a random forest-based algorithm for the prediction of type 2 diabetes and hypertension using individuals' risk factors [8] in their manuscript "Hybrid prediction model for Type 2 Diabetes and hypertension using DBSCAN-based outlier detection, synthetic minority over sampling technique (SMOTE)". Two techniques, density-based spatial clustering and synthetic minority over-sampling, were used for outlier removal and for handling imbalanced classes, respectively. On three benchmark datasets, the approach showed significant improvement in precision, sensitivity, specificity, F1 score and accuracy compared with the existing methods support vector machine (SVM), multilayer perceptron (MLP), logistic regression, naïve Bayes, and C4.5.
S. K. Mun et al. [9], in their manuscript entitled "The prostate clinical outlook (PCO) classifier application for predicting biochemical recurrences in patients treated by stereotactic body radiation therapy (SBRT)", implemented a prostate clinical outlook classifier to predict biochemical recurrence in patients undergoing stereotactic body radiation therapy. The study concluded that four parameters are effective in increasing the prediction accuracy: Gleason score, pretreatment prostate-specific antigen, clinical radiological staging and age. The resulting concordance index (c-index) is around 0.7, which supports the feasibility of the application.
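For readers unfamiliar with the c-index quoted for [9], the following is a minimal sketch of Harrell's concordance index, assuming each patient has a follow-up time, an event flag and a predicted risk score. The function name, the toy data and the decision to skip pairs with tied times are illustrative assumptions; the exact computation used in [9] is not described in this editorial.

```python
from itertools import combinations

def concordance_index(times, events, risks):
    """Harrell's c-index: share of comparable patient pairs ranked correctly.

    times  -- follow-up time per patient
    events -- 1 if recurrence was observed, 0 if the patient was censored
    risks  -- predicted risk score (higher means earlier expected recurrence)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier time is an observed event.
        # Pairs with tied times are skipped (a simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # the model ranked the pair correctly
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Made-up toy data, for illustration only.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a sense of scale for a reported value of around 0.7.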
The manuscript "Data analysis and forecasting of tuberculosis prevalence rates for smart healthcare based on a novel combination model", authored by J. Wang, C. Wang and W. Zhang, proposed a combination forecasting model to determine the tuberculosis prevalence rate [10]. The approach is divided into five major parts: the Kruskal-Wallis test (also known as one-way analysis of variance (ANOVA) on ranks), a regression model, the cuckoo search optimization algorithm, a combination forecasting method with weighted coefficients, and radial basis function neural networks. The goodness-of-fit, measured by adjusted R-squared, is more than 0.96.
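As an illustration of the first of these parts, the sketch below computes the Kruskal-Wallis H statistic from the rank sums of several groups. It is a minimal version under stated assumptions: tied values receive average ranks, no tie correction is applied, and the function name and sample data are hypothetical rather than taken from [10].

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    # Pool all observations, remembering which group each one came from.
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        # Find the run of tied values starting at position i.
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        i = j
    weighted = sum(r * r / len(group) for r, group in zip(rank_sums, groups))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)

# Made-up prevalence-like samples for three regions, for illustration only.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Under the null hypothesis that all groups share one distribution, H is approximately chi-squared distributed with one degree of freedom fewer than the number of groups, so a large H suggests that at least one group differs.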
S. Lee and C. Kim studied the association between policy factors and healthcare system efficiency in their manuscript entitled "Estimation of association between healthcare system efficiency and policy factors for public health" [11]. The efficiency score was corrected by data envelopment analysis with bootstrapping. Its association with policy factors was then investigated using the Tobit model. The results indicated that the efficiency was affected by user choice for basic insurance coverage and degree of decentralization to sub-national governments in a reverse positive way.
"A systematic review of open source clinical software on GitHub for improving software reuse in smart healthcare" authored by Z. Shen and M. Spruit [12] collected and analyzed around 13,000 GitHub repositories on open source clinical software between 2009 and 2018. Some representative findings were highlighted (i) popularity of software using number of stars; (ii) most productive countries contributing to the community; (iii) the causes of the popularity of the software; and (iv) top 10 groups of software.
In [13], C. Saint-Pierre, F. Prieto, V. Herskovic, and M. Sepúlveda wrote a manuscript entitled "Relationship between continuity of care in the multidisciplinary treatment of patients with diabetes and their clinical results". This paper studied the relationship between continuity of care in the multidisciplinary treatment of type 2 diabetes patients and their clinical results. Continuity was measured by four traditional parameters, namely sequential continuity, the Herfindahl index, the continuity of care index and usual provider continuity. The results revealed that continuity of care by dietitians, physicians and nurses has a positive effect on the clinical results of type 2 diabetes patients. In particular, the authors emphasized the importance of dietitians and nurses, as they may be treated as less important in some clinics and hospitals.
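The four continuity measures named above can be sketched as follows, assuming their commonly used textbook definitions over one patient's ordered sequence of provider visits. The function name and the sample sequence are hypothetical, and the exact variants used in [13] may differ.

```python
from collections import Counter

def continuity_indices(visits):
    """Return UPC, COC, HI and SECON for one patient's ordered provider sequence."""
    n = len(visits)
    if n < 2:
        raise ValueError("need at least two visits")
    counts = Counter(visits)
    sum_sq = sum(c * c for c in counts.values())
    # Usual provider continuity: share of visits to the most-seen provider.
    upc = max(counts.values()) / n
    # Continuity of care index (Bice-Boxerman form).
    coc = (sum_sq - n) / (n * (n - 1))
    # Herfindahl index: sum of squared visit shares.
    hi = sum_sq / (n * n)
    # Sequential continuity: share of consecutive visits with the same provider.
    secon = sum(a == b for a, b in zip(visits, visits[1:])) / (n - 1)
    return {"UPC": upc, "COC": coc, "HI": hi, "SECON": secon}

# Made-up visit sequence: provider A three times, provider B once.
print(continuity_indices(["A", "A", "B", "A"]))
```

All four values lie between 0 and 1, with higher values meaning that care is more concentrated on the same provider.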
H. M. Ünver, Y. Kökver, E. Duman, and O. A. Erdem presented a manuscript entitled "Statistical edge detection and circular hough transform for optic disk localization" [14]. An optic disk detection algorithm is proposed for use with retinal images. Two techniques were employed for the detection: a statistical edge detection algorithm and the circular Hough transform. The algorithm can be applied to various retinal diseases such as glaucoma, papilledema and diabetic retinopathy. The proposed algorithm was tested on about 260 samples and had an error rate of less than 3%.
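To illustrate the voting idea behind the circular Hough transform, here is a minimal sketch that locates the center of a circle of known radius from a list of edge points. It is a simplification under stated assumptions: the radius is given rather than searched, the edge points are a toy set rather than the output of the statistical edge detector in [14], and the function name is hypothetical.

```python
import math
from collections import Counter

def hough_circle_center(edge_points, radius, angle_step_deg=1):
    """Vote for the most likely center of a circle of known radius."""
    votes = Counter()
    for x, y in edge_points:
        # Each edge point votes once for every grid cell lying `radius` away from it.
        candidates = set()
        for deg in range(0, 360, angle_step_deg):
            theta = math.radians(deg)
            a = round(x - radius * math.cos(theta))
            b = round(y - radius * math.sin(theta))
            candidates.add((a, b))
        votes.update(candidates)
    # The cell where most vote circles intersect is taken as the center.
    center, count = votes.most_common(1)[0]
    return center, count

# Toy edge map: the twelve integer points on a circle of radius 5 around (20, 20).
offsets = [(5, 0), (-5, 0), (0, 5), (0, -5),
           (3, 4), (3, -4), (-3, 4), (-3, -4),
           (4, 3), (4, -3), (-4, 3), (-4, -3)]
edges = [(20 + dx, 20 + dy) for dx, dy in offsets]
print(hough_circle_center(edges, radius=5))
```

In practice the radius is usually unknown, so the accumulator gains a third dimension and the search covers a range of plausible optic disk radii.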
A combined deep convolutional neural network and support vector machine was adopted for the classification of lung cancers from pulmonary computed tomography (CT) images in the article "Classification of pulmonary CT images by using hybrid 3D-deep convolutional neural network architecture" [15], authored by H. Polat and H. Danaei Mehr. The accuracy was 92% over 2100 testing cases using the Data Science Bowl and Kaggle dataset. The article also recommended introducing more convolution layers in order to extract more representative features for better accuracy.
W. Chang et al., in their manuscript entitled "Probability analysis of hypertension-related symptoms based on XGBoost and clustering algorithm" [16], proposed a clustering-based XGBoost algorithm to classify type I and type II hypertension. Moreover, the study showed that the symptoms of ventricular hypertrophy, arteriosclerosis and microalbuminuria are more likely to occur in type II hypertension patients. The proposed method was correct with a probability of 98.5% on a testing set of 531 patients.
In [17], J. Rew, Y. H. Choi, H. Kim, and E. Hwang published an article entitled "Skin aging estimation scheme based on lifestyle and dermoscopy image analysis" on the topics of skin condition tracing and skin texture aging estimation based on lifestyle. Various techniques, including scale-invariant feature transform, color histogram intersection, polynomial regression and support vector regression, were adopted. The performance evaluation involved 365 volunteers, and the results indicated an accuracy of 93%.
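Of the techniques listed, color histogram intersection is the simplest to show in code. The sketch below assumes two histograms over the same color bins and normalizes them before comparison; the function name and the sample counts are hypothetical and do not reproduce the pipeline in [17].

```python
def histogram_intersection(hist_a, hist_b):
    """Similarity in [0, 1] between two color histograms with the same bins."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("histograms must not be empty")
    # Normalize so that images of different sizes are comparable,
    # then add up the overlap in each bin.
    return sum(min(a / total_a, b / total_b) for a, b in zip(hist_a, hist_b))

# Made-up pixel counts per color bin for two skin images.
print(histogram_intersection([120, 60, 20], [100, 80, 20]))
```

A value near 1 means the two images have nearly the same color distribution, which is one way a tracked skin region could be matched between sessions.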
L. Popova Zhuhadar and E. Thrasher simulated a crisis scenario in "Data analytics and its advantages for addressing the complexity of healthcare: A simulated Zika case study example" [18], in which the number of physicians and nurses was minimized using data analytics techniques. The simulation analyzed various cases based on the number of outpatients arriving at the clinic and the waiting time to consult a medical doctor or to be pre-examined by a nurse.
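The kind of question such a simulation answers can be sketched with a very small discrete-event model: given patient arrival times and consultation lengths, find the smallest staff count that keeps the mean waiting time within a target. Everything here is an illustrative assumption (a single first-come-first-served queue, identical staff, arrivals sorted by time, hypothetical function names); it does not reproduce the model in [18].

```python
import heapq

def waiting_times(arrivals, service_times, n_staff):
    """First-come-first-served clinic with n_staff identical staff members.

    arrivals must be sorted in increasing order.
    """
    free_at = [0.0] * n_staff  # min-heap of times at which each staff member is free
    heapq.heapify(free_at)
    waits = []
    for arrival, service in zip(arrivals, service_times):
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)  # wait if nobody is free yet
        waits.append(start - arrival)
        heapq.heappush(free_at, start + service)
    return waits

def minimum_staff(arrivals, service_times, max_mean_wait, staff_limit=50):
    """Smallest staff count whose mean waiting time stays within the target."""
    for n_staff in range(1, staff_limit + 1):
        waits = waiting_times(arrivals, service_times, n_staff)
        if sum(waits) / len(waits) <= max_mean_wait:
            return n_staff
    return None  # target not reachable within staff_limit

# Made-up scenario: one patient per minute, each consultation takes 3 minutes.
arrivals = [0, 1, 2, 3]
services = [3, 3, 3, 3]
print(minimum_staff(arrivals, services, max_mean_wait=0.5))
```

A fuller study would draw arrivals and service times from fitted distributions and repeat the run many times, but the staffing search would keep this shape.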
The last manuscript, entitled "A novel MOGA-SVM multinomial classification for organ inflammation detection", was authored by K. T. Chui and M. D. Lytras [19]. The wrist pulse signal contains crucial information on a person's health status. In this paper, a multi-objective optimization problem was formulated for the support vector machine classification of organ inflammations and solved by a genetic algorithm. Since typical kernels such as linear, radial basis function, polynomial and sigmoid possess different characteristics that suit different applications, a combination of kernels could yield better performance. The method achieved an accuracy of 92%, an improvement of between 9% and 60% over stand-alone traditional kernels.
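The idea of combining kernels can be sketched as a weighted sum of the four typical kernels mentioned above. This is a minimal illustration, not the formulation in [19]: the function names, hyperparameter defaults and weights are assumptions, and the genetic algorithm that would search over such weights is not shown.

```python
import math

def linear(x, y):
    return sum(a * b for a, b in zip(x, y))

def rbf(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def polynomial(x, y, degree=2, coef0=1.0):
    return (linear(x, y) + coef0) ** degree

def sigmoid(x, y, alpha=0.1, coef0=0.0):
    return math.tanh(alpha * linear(x, y) + coef0)

def combined_kernel(x, y, weights):
    """Weighted sum of the four base kernels; weights must be non-negative."""
    kernels = (linear, rbf, polynomial, sigmoid)
    if len(weights) != len(kernels) or any(w < 0 for w in weights):
        raise ValueError("need four non-negative weights")
    return sum(w * k(x, y) for w, k in zip(weights, kernels))

# Made-up feature vectors and weights, for illustration only.
x, y = [1.0, 0.0], [0.0, 1.0]
print(combined_kernel(x, y, weights=(0.2, 0.5, 0.2, 0.1)))
```

A non-negative weighted sum of positive semidefinite kernels is itself a valid kernel; the sigmoid kernel is not positive semidefinite for all parameter choices, so its weight and parameters need some care.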

Emergent Topics in Smart Healthcare
We would like to share some emergent topics in smart healthcare, in the fields of dementia, anxiety, medicalchain, genetics and genomics, virtual reality, social media and robotic surgery, which were not discussed in the aforementioned 13 papers. The related works are not discussed in detail here; they are highly recommended to readers who are interested in the topic(s). Table 2 shows the emergent topics in smart healthcare and recommended readings.
Table 2. Summary on the application and methodology of the special issue articles.

Conclusions
This special issue is composed of 13 papers covering various topics and methodologies in data analytics for smart healthcare. The Guest Editors have briefly summarized each work and highlighted six emergent topics in healthcare. Finally, we would like to thank all colleagues and reviewers for their contributions.
Author Contributions: M.D.L., K.T.C. and A.V. contributed equally to the design, implementation, and delivery of the special issue. All co-editors contributed equally in all the phases of this intellectual outcome.
Funding: The authors would like to thank Effat University in Jeddah, Saudi Arabia, for funding the research reported in this paper through the Research and Consultancy Institute.

Conflicts of Interest: The authors declare no conflict of interest.