Edinburgh Research Explorer

3D Technologies to Acquire and Visualize the Human Body for Improving Dietetic Treatment

Abstract: This research aims to improve adherence to dietetic-nutritional treatment using state-of-the-art RGB-D sensor and virtual reality (VR) technology. Recent studies show that adherence to treatment can be improved by using multimedia technologies which impact on the body awareness of patients. However, there are no studies published to date using 3D data and VR technologies for this purpose. This paper describes a system capable of obtaining the complete 3D model of a body with high accuracy and a realistic visualization for 2D and VR devices to be used for studying the effect of 3D technologies on adherence to obesity treatment.


Introduction
The prevalence of overweight and obesity has increased globally, tripling over the last three decades in the countries of the European Union. Overweight and obesity contribute to the emergence of chronic diseases (hypertension, type II diabetes, cancer, etc.) and the development of neurodegenerative pathologies (Alzheimer's or dementias) [1,2]. The treatment and follow-up care required by these patients have a high impact on the costs of health services [3][4][5]. Poor adherence to the treatment of obesity has been considered one of the factors causing the failure of intervention programs [6][7][8]. Given this evidence, improving adherence would contribute to the outcome of treatments and their maintenance over time, leading to lower health costs.
Some authors have suggested that nutritional interventions that reinforce the follow-up of therapies through the use of technologies achieve beneficial effects over time [8][9][10]. The results show how adherence to treatment can be increased by incorporating techniques based on the use of (2D) images of the patient's evolution in dietary treatment that enhance their cognitive experience [11]. However, to the best of our knowledge there are no studies that take advantage of and quantify the potential of using realistic 3D images and virtual reality techniques to reinforce adherence to treatment. This work provides the necessary framework for specialists to address these types of studies.
Nowadays, 3D modelling of the human body is transforming our ability to accurately measure and visualize it, showing great potential for health applications from epidemiology to diagnosis or patient monitoring [12,13]. Recent works [14,15] deal with the acquisition of images of the human body from RGB-D cameras and video sequences, providing 3D models with texture and avatars, but they do not provide the accuracy necessary in health applications. There are systems on the market for body model acquisition: Fit3D [16] uses a single camera but needs a complex and expensive device; Body Labs, acquired by Amazon, also developed technology to capture the human body in three dimensions, but it does not consider the time dimension; Naked Labs [17] provides body visualization and tracking functions for the fitness market.
There are different types of 3D sensors with different characteristics that can be used to capture the human body. Devices based mainly on laser, such as LiDAR, have high accuracy but they only provide depth information and do not provide color data. Stereo sensors use two color cameras to infer the depth, which usually means high cost and difficult portability since both cameras must be calibrated. More recently, RGB-D cameras (such as Microsoft Kinect or Intel RealSense) integrate color and depth in a single device, and they use different technologies to estimate the depth (structured light, time of flight (ToF), active stereo). The characteristics of these RGB-D devices, including accuracy, portability, capture frequency, etc., are driving their popularization and integration in mobile consumer devices [18]. For these reasons, this work makes use of these devices to capture 3D models. Moreover, the use of RGB-D device networks is proposed to meet the required quality levels for 3D representations.
Virtual reality has also experienced important growth in recent years. This technology simulates realistic 3D interactive environments using HMDs (head mounted displays), also known as virtual reality glasses. At first, these devices had high prices due to their complexity. However, recent developments such as "Google Cardboard" allow any smartphone to be converted into an HMD at very low cost. Currently, there is a lot of research exploiting the potential of virtual reality. The immersive experience provided by virtual reality stands out, improving concentration in the training process [19]. It is hoped that the use of these technologies can be effective in improving adherence to the treatment of overweight patients [10].
Classical treatments for obesity patients have shown limited effectiveness in resolving chronicity. The use of technologies for 3D reconstruction of the human body is sufficiently mature in different fields of application. The field of virtual reality is also evolving remarkably, with low-cost systems and successful application experiences in different fields. However, to the best of our knowledge, there are no works that combine RGB-D acquisition devices and virtual reality, analyzing shape changes over time (4D) to improve adherence to obesity treatments.
There are important scientific challenges to the development of computational methods for the study of changes in the shape of the human body using 3D/4D vision techniques to improve obesity treatment processes. In this paper, we provide a system capable of obtaining the 3D model of the body with color texture representation and a realistic visualization for 2D and virtual reality devices (see Figure 1) in order to be used, in the future, for studying the effect of 3D technologies on the adherence to obesity treatment.
For the development of this research the following specific objectives have been addressed:
• Obtaining the 3D/4D model: 3D acquisition of the human body from a low-cost RGB-D camera network, obtaining the 3D geometric model and the texture representation of sequences of bodies over time (4D).
• Visualization of the 3D body: From the 3D models captured over time, realistic visualizations of the body evolution are generated using virtual reality.
The rest of the paper is organized as follows: Section 2 describes the system for reconstructing the human body from RGB-D cameras, Section 3 details the system of visualization of the human body and, finally, Section 4 presents the conclusions and summarizes the nutritional intervention plans and their impact and expected results to be obtained using the system.
Figure 1. ... Shape evolution analysis (c) and textured representation with virtual reality (VR) devices (d).

3D Reconstruction of Human Body from Multiple RGB-D Views
In order to obtain the 3D model of the human body, a network of RGB-D cameras was used to meet the quality requirements of the system. The network is composed of eight Intel RealSense RGB-D cameras located on four aluminum masts of 2200 × 80 × 80 mm distributed around the capture area. Figure 2 shows the setup to capture a human body (left) and an example of eight depth and RGB color images obtained by the system (right). The pipeline used to obtain the 3D textured model from different RGB-D sensors has five stages (Figure 3) from (a) to (e): acquisition, pre-processing, registration, mesh generation and texture projection. Calibration of the individual sensors and estimation of the relative sensor positions is explained but not included in the pipeline as it is part of the set-up process.
Figure 3. Pipeline of 3D body reconstruction. The system is able to acquire several images from cameras (a) that are preprocessed in order to improve the quality of the acquisition (b). The set of points are registered in a unique origin of coordinates (c). Finally, in order to obtain the 3D model of the body, the 3D points are converted into a mesh (e) and the images are projected on it (d).

Calibration
To correct the distortions of the images caused by the lens, an intrinsic calibration was carried out using the provided Intel RealSense SDK. The calibration method requires a minimum of six samples of the provided chessboard marker from different points distributed in the capture area.
Since we were using a network of RGB-D cameras, it was necessary to carry out an extrinsic calibration to unify the views, locating the different point clouds in the same coordinate space. To obtain the transformation matrix of each point cloud, we carried out an extrinsic calibration based on 3D markers. Specifically, we used two types of 3D markers, spherical and cubic [20].

Acquisition
The network was composed of eight Intel RealSense D435 RGB-D cameras, which offer the appropriate characteristics (field of view, color and depth resolution, etc.). Intel's SDK for RealSense was used as the basis for the development of the acquisition software. Once the system has been calibrated, the acquisition process requires the synchronization of all the cameras in the network, so that the data captured by the eight cameras correspond to the same instant. Semaphore management was used to address the synchronization. The semaphores act in a similar way to a barrier: they ensure that all threads begin the acquisition at practically the same time and that no thread finishes the acquisition of a frame until the rest of the cameras have finished.
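The barrier-like behaviour described above can be illustrated with a minimal Python sketch. It is not the acquisition software itself: it uses the standard `threading.Barrier` in place of the semaphores mentioned in the text, and `fake_grab`, `capture_loop` and `run_capture` are hypothetical names that simulate a camera instead of calling the RealSense SDK.

```python
import threading
import time

NUM_CAMERAS = 8  # matches the eight-camera network described in the text
NUM_FRAMES = 3   # arbitrary number of captures for this demonstration


def fake_grab(camera_id, frame_idx):
    """Hypothetical stand-in for a blocking SDK call that returns one frame."""
    time.sleep(0.001 * (camera_id + 1))  # cameras take different times
    return (camera_id, frame_idx)


def capture_loop(camera_id, start_barrier, end_barrier, log, lock):
    for frame_idx in range(NUM_FRAMES):
        start_barrier.wait()   # no thread starts until all threads are ready
        fake_grab(camera_id, frame_idx)
        with lock:
            log.append((frame_idx, camera_id))
        end_barrier.wait()     # no thread moves on until all have finished


def run_capture():
    """Run one capture thread per camera and return the completion log."""
    start_barrier = threading.Barrier(NUM_CAMERAS)
    end_barrier = threading.Barrier(NUM_CAMERAS)
    log, lock = [], threading.Lock()
    threads = [
        threading.Thread(
            target=capture_loop,
            args=(cid, start_barrier, end_barrier, log, lock),
        )
        for cid in range(NUM_CAMERAS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because every thread waits at the second barrier, all eight entries for one frame index are logged before any entry for the next one, which is the property the synchronization is meant to guarantee.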

Preprocessing
At this stage, the point clouds obtained from the different RGB-D sensors were noisy, so it was necessary to apply different methods to improve their quality (see Figure 3b). First, the point cloud was truncated in the z-axis (depth) to remove the points that were beyond the center of the capture area. After that, three filters were applied: median, bilateral and statistical outlier removal (SOR). Median filters are able to reduce noise and are very efficient in terms of processing time, as they require a single pass over the cloud [21]. Bilateral filters smooth the data while preserving the sharpness of edges and areas with high curvature, using a non-linear combination of values from nearby areas [22]. SOR filters remove edge noise and outliers using neighborhood statistics [23]. Finally, the normal vector for each point in the cloud was calculated. The problem of determining the normal at a point on the surface was approached as the problem of estimating the normal of a plane tangent to the surface [24].
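Two of the steps above, depth truncation and statistical outlier removal, can be sketched in plain Python as follows. This is an illustrative brute-force version under stated assumptions, not the implementation used in the system: the function names, the neighbourhood size `k` and the threshold factor `std_ratio` are hypothetical choices, and the median and bilateral filters are not shown.

```python
import math
import statistics


def truncate_depth(points, z_max):
    """Keep only points whose depth (z) does not exceed z_max."""
    return [p for p in points if p[2] <= z_max]


def mean_knn_distance(points, index, k):
    """Mean distance from points[index] to its k nearest neighbours.

    Brute force (all pairwise distances), so only suitable for small clouds.
    """
    p = points[index]
    dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != index)
    return sum(dists[:k]) / k


def statistical_outlier_removal(points, k=4, std_ratio=1.0):
    """Drop points that lie unusually far from their neighbours.

    A point is removed when its mean neighbour distance exceeds the global
    mean of that quantity by more than std_ratio standard deviations.
    """
    mean_dists = [mean_knn_distance(points, i, k) for i in range(len(points))]
    mu = statistics.fmean(mean_dists)
    sigma = statistics.pstdev(mean_dists)
    threshold = mu + std_ratio * sigma
    return [p for p, d in zip(points, mean_dists) if d <= threshold]
```

For example, a small planar patch with one isolated point far from it and one background point behind the capture area would first lose the background point in `truncate_depth` and then the isolated point in `statistical_outlier_removal`.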

Registration
In order to align the different point clouds in a single 3D coordinate system, the transformation matrices T obtained from the extrinsic calibration were applied to the data extracted from each camera (see Figure 3c). Each camera has its own reference system, with the origin (0,0,0) located at the camera itself. Using the extrinsic parameters from the calibration, we took one camera as the reference and transformed the rest of the point clouds into its coordinate system to obtain a unified dataset [25].
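Applying the transformation matrices T can be sketched as below, assuming each T is a 4 × 4 homogeneous rigid transform stored as nested lists and each point cloud is a list of (x, y, z) tuples. The names `apply_transform`, `register_clouds` and `IDENTITY` are hypothetical and only serve the illustration.

```python
# Transform of the reference camera: its points are left unchanged.
IDENTITY = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def apply_transform(T, point):
    """Apply a 4x4 homogeneous transform T to a 3D point (x, y, z)."""
    x, y, z = point
    return tuple(
        T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3)
    )


def register_clouds(clouds, transforms):
    """Map every cloud into the reference frame and merge the results.

    clouds[i] is the cloud of camera i and transforms[i] maps that camera's
    coordinates into the reference camera's coordinates.
    """
    merged = []
    for cloud, T in zip(clouds, transforms):
        merged.extend(apply_transform(T, p) for p in cloud)
    return merged
```

With this convention, the reference camera is paired with `IDENTITY` and every other camera with the matrix obtained for it during extrinsic calibration.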

Mesh Generation
Different methods such as greedy projection or marching cubes were tested, obtaining the best result with the Poisson surface reconstruction algorithm [26]. This algorithm reconstructs a triangle mesh from a set of oriented 3D points by solving a Poisson system, that is, a 3D Laplacian system with positional value constraints. The method approaches the problem of surface reconstruction using an implicit function framework. It computes a 3D indicator function that takes values greater than 0 at points inside the surface and less than 0 at points outside it. This function can be found because there is a relationship between the orientation of the points and the function itself. Specifically, the gradient of the function to be found is a vector field with a value of 0 at all points except those close to the surface, where it takes the value of the surface normal oriented towards the interior (see Figure 3e). After that, the reconstructed surface is obtained by extracting an appropriate isosurface.
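The relationship between the oriented points and the indicator function can be written compactly as follows. The symbols are introduced here only for illustration: \chi denotes the indicator function, \vec{V} the vector field built from the point normals, and \gamma the isovalue, whose choice is left open in the text (a common option is the average of \chi at the input points).

```latex
% The gradient of the indicator function should match the normal field,
% which is posed as a least-squares problem and leads to a Poisson equation.
\[
  \min_{\chi} \left\lVert \nabla \chi - \vec{V} \right\rVert^{2}
  \quad\Longrightarrow\quad
  \Delta \chi = \nabla \cdot \vec{V}
\]
% The reconstructed surface is the level set of \chi at the isovalue \gamma.
\[
  S = \left\{\, p \in \mathbb{R}^{3} : \chi(p) = \gamma \,\right\}
\]
```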

Texture Projection
The method proposed by Callieri et al. [27] was used to carry out the raster projection and texture generation. This method parameterizes the mesh in relation to its vertices and generates the texture based on the projection of the different images, considering the position and orientation of the different cameras. Figure 4 shows the same body represented by the mesh with the projected texture seen from four different points of view. A demo video is available at [28].
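The idea of projecting camera images onto the mesh can be illustrated with a simple pinhole model. The sketch below is not the method of Callieri et al. [27]; it only shows, under the assumption that a vertex and its normal are already expressed in the coordinate frame of one camera (z pointing forward), how the vertex maps to a pixel and how well that camera sees it. The function names and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
import math


def project_to_image(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a point given in camera coordinates.

    Returns pixel coordinates (u, v), or None if the point is not in front
    of the camera.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def view_weight(vertex_cam, normal_cam):
    """Cosine of the angle between the normal and the direction to the camera.

    The camera centre is the origin of the camera frame. Back-facing vertices
    get weight 0. The vertex must not coincide with the camera centre.
    """
    to_camera = [-c for c in vertex_cam]
    length = math.sqrt(sum(c * c for c in to_camera))
    normal_length = math.sqrt(sum(c * c for c in normal_cam))
    dot = sum(n * t for n, t in zip(normal_cam, to_camera))
    return max(0.0, dot / (length * normal_length))
```

A weight of this kind could be used to decide which of the eight images contributes most to the color of each vertex, favouring cameras that face the surface directly.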

4D Visualization of the Human Body Using Virtual Reality for Obesity Treatment Improvement
The second objective of this work is to provide a visualization system of the generated 3D models. This system allows interaction with the acquisition subsystem, management of patient data and a realistic visualization of the human body models over time. This system would be used both by the medical specialist to assist research in the field of obesity treatment, and by the patient to improve their adherence to treatment. Thus, the system is composed of two subsystems: the specialist 4D visualization system for obesity treatment and the virtual reality visualization system.

Specialist 4D Image Visualization System for Obesity Treatment
The visualization system allows the medical specialist several options. From the system, they can perform a scan of the patient's body since the visualization system communicates with the acquisition system.
In addition, the system makes it possible to connect the virtual reality system to the specialist system, so that patients can see themselves through the VR glasses while the expert interacts with the virtual model. The system can also visualize all the body models generated, allowing the point of view to be changed, body models to be compared and information from different sessions to be navigated through (Figure 5). Finally, the system allows the specialist to register new patients or view the history of sessions.

Virtual Reality System
The second subsystem is composed of a virtual reality (VR) system contained in a mobile application that allows patients to see their progress in a more immersive way, with the goal of improving adherence to treatments. The VR system, as well as the visualization system, was developed using Unity [29].
Two main functionalities were developed for this subsystem. First, the VR system was synchronized in real time with the specialist visualization system running on the computer, allowing interaction between both systems. Second, the movement of the user's head was transmitted to the rotation of the 3D model being visualized (Figure 6).
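The second functionality can be illustrated with a small sketch. The actual system was built in Unity; the Python code below only shows one possible mapping, assuming that the head yaw (rotation about the vertical axis) is applied directly as a rotation of the model about its own vertical y-axis. The function names and this one-to-one mapping are assumptions made for the illustration.

```python
import math


def rotate_about_vertical(vertex, yaw_degrees):
    """Rotate a vertex (x, y, z) about the vertical y-axis by the given yaw."""
    yaw = math.radians(yaw_degrees)
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = vertex
    return (c * x + s * z, y, -s * x + c * z)


def model_for_head_pose(vertices, head_yaw_degrees):
    """Return the model vertices rotated according to the head yaw."""
    return [rotate_about_vertical(v, head_yaw_degrees) for v in vertices]
```

Turning the head by a given angle would then turn the body model by the same angle, so the patient can inspect it from all sides without using a controller.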

Conclusions
This work focuses its contribution on developing technologies for health. Specifically, 3D cameras (RGB-D) and virtual reality technologies are intended to improve obesity treatment. Future research will investigate the positive effect of realistic 3D representations on the body awareness and psychological well-being of the individual. The proposed system has been developed to give medical specialists a tool to study problems related to adherence to dietetic-nutritional treatment. The proposed methodology allows a 3D visual model of the human body to be obtained over time in order to analyze the morphological progress due to changes resulting from obesity treatments. 3D models were obtained from the acquisition of multiple views through a network of RGB-D cameras. These views were filtered and aligned to obtain a mesh model on which the texture was projected, generating realistic 3D models. Sequences of realistic 3D models allowed the generation of the 4D models used in the visualizations, which we hope will provide a powerful method to improve adherence to obesity treatments. Different scientific challenges in the area of computer vision were addressed, including the problem of calibrating RGB-D camera networks, improving the quality of point clouds, as well as realistic 3D representation. Furthermore, from a technological point of view, a system based on low-cost, adaptable and portable technologies and intuitive environments based on realistic 3D representations has been designed. The potential of these technologies, together with the purpose of improving the treatment of obesity, allows a high social and economic impact to be predicted. The system is planned to be installed in three Primary Health Care Centers in order to obtain and study the psychological results. Subsequently, it is planned to transfer the operation to health institutions in the area of Alicante (Spain).