A Comparative Study on Carbohydrate Estimation: GoCARB vs. Dietitians

GoCARB is a computer vision-based smartphone system designed for individuals with Type 1 Diabetes to estimate plated meals’ carbohydrate (CHO) content. We aimed to compare the accuracy of GoCARB in estimating CHO with the estimations of six experienced dietitians. GoCARB was used to estimate the CHO content of 54 Central European plated meals, each containing three different weighed food items. Ground truth was calculated using the USDA food composition database. Dietitians were asked to visually estimate the CHO content based on meal photographs. GoCARB and dietitians achieved comparable accuracies. The mean absolute error of the dietitians was 14.9 (SD 10.12) g of CHO versus 14.8 (SD 9.73) g of CHO for GoCARB (p = 0.93). No differences were found between the estimations of dietitians and GoCARB, regardless of meal size. The larger the meal, the greater the estimation errors made by both. Moreover, the higher the CHO content of a food category, the more challenging its accurate estimation. GoCARB had difficulty in estimating rice, pasta, potatoes, and mashed potatoes, while dietitians had problems with pasta, chips, rice, and polenta. GoCARB may offer diabetic patients the option of an easy, accurate, and almost real-time estimation of the CHO content of plated meals, and thus enhance diabetes self-management.


Introduction
The worldwide incidence of type 1 diabetes (T1D) is increasing by 3% a year [1]. In high income countries, more than half a million children and 7-12% of the general population are affected by the disease. In Europe, there are ca. 140,000 cases in all and ~21,600 new cases per year [1].
from two different viewing angles. Once the image acquisition process is completed, a number of computational steps permit the recognition of the various food items and the meal's volume estimation. By knowing the food type and the respective volume, and by using the USDA food composition database, the CHO content is estimated [17,20,22]. The procedure followed for GoCARB is described in detail elsewhere [20].
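The last step described above (food type plus volume, combined with a food composition table, gives CHO) can be illustrated with a minimal Python sketch. This is not the GoCARB implementation: the density-based conversion from volume to weight, the function name `estimate_cho_grams`, and every number in `HYPOTHETICAL_TABLE` are assumptions made for illustration only and are not taken from the USDA database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoodEntry:
    """Per-food lookup values (all numbers below are hypothetical)."""
    density_g_per_ml: float   # assumed conversion from volume to weight
    cho_g_per_100g: float     # assumed CHO content per 100 g of food


# Hypothetical stand-in for a food composition table; not real USDA values.
HYPOTHETICAL_TABLE = {
    "pasta": FoodEntry(density_g_per_ml=0.6, cho_g_per_100g=30.0),
    "meat": FoodEntry(density_g_per_ml=1.0, cho_g_per_100g=0.0),
    "zucchini": FoodEntry(density_g_per_ml=0.5, cho_g_per_100g=3.0),
}


def estimate_cho_grams(items):
    """Sum the CHO of recognised items given as (food_type, volume_ml) pairs."""
    total = 0.0
    for food_type, volume_ml in items:
        entry = HYPOTHETICAL_TABLE[food_type]
        weight_g = volume_ml * entry.density_g_per_ml
        total += weight_g * entry.cho_g_per_100g / 100.0
    return total


# A plate with three non-overlapping items, as in the study's meals.
plate = [("pasta", 200.0), ("meat", 150.0), ("zucchini", 100.0)]
plate_cho = estimate_cho_grams(plate)
```

In this sketch an error in either the recognised food type or the estimated volume propagates directly into the CHO estimate, which is why both steps matter for accuracy.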

Nutrients 2018, 10, x FOR PEER REVIEW

Study Design
Fifty-four dishes were ordered from the Bern University Hospital ("Inselspital") restaurant over a period of two months (July and August 2014). The dishes corresponded to standard Central European meals as served by the restaurant to employees and visitors; examples are shown in Figure 1a-d. They constitute a subset of the dishes used in a previous study [20]. For each of the 54 meals, three different sizes were shown (small, medium, and large), with each plate containing three different food items: one high in protein (e.g., meat), one high in CHO (e.g., pasta), and one vegetable component (e.g., zucchini). The items did not overlap. Fifteen food categories were considered: breaded meat (e.g., schnitzel), pasta, rice, potatoes, mashed potatoes, polenta, green beans, meat, carrots, salad (i.e., leafy vegetables), chips (French fries), noodles, zucchini, broccoli, and fish. GoCARB was used to estimate the meals' CHO content. GoCARB needs two photos acquired from different shooting angles to estimate the CHO content (1 GoCARB usage = 2 images). For each of the 54 dishes, GoCARB was used one to three times (1× for 13 meals; 2× for 25 meals; 3× for the remaining 16 meals). Thus, two to six photos were taken of each meal, resulting in a total of 111 GoCARB estimations and 222 meal images. Then, each food item was weighed using household scales (Kenwood, model AT850B) and ground truth (GTR) was estimated using the meals' exact food items, as they appear in the USDA National Nutrient Database for Standard Reference.
The same 222 images were sent to a pool of dietitians in November 2016. Each dietitian was asked to calculate the corresponding CHO content of each image. Their answers were received by June 2017. The dietitians were advised to use the Diabetic Exchange List that was designed by a committee of the American Diabetes Association and the American Dietetic Association [23], but the final decision to use the proposed list was made by the dietitians. They were blinded to the actual weight of the meals, their CHO content, and the corresponding food items. An overview of the methodological approach that was used for GoCARB and the dietitians is presented in Figure 2.
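The totals reported in the study design follow directly from the number of GoCARB usages per meal. The short Python check below reproduces that arithmetic; the variable names are illustrative, while the counts are those stated in the text.

```python
# Number of meals for each count of GoCARB usages, as reported in the text:
# 1 usage for 13 meals, 2 usages for 25 meals, 3 usages for 16 meals.
meals_by_usage_count = {1: 13, 2: 25, 3: 16}

IMAGES_PER_USAGE = 2  # one GoCARB usage requires two photos

total_meals = sum(meals_by_usage_count.values())
total_estimations = sum(
    usages * meals for usages, meals in meals_by_usage_count.items()
)
total_images = total_estimations * IMAGES_PER_USAGE

print(total_meals, total_estimations, total_images)  # prints: 54 111 222
```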

Participants
Twelve dietitians from German speaking countries (i.e., Germany, Switzerland, and Austria) were personally contacted and informed about the study's nature. Eligibility criteria required individuals to have received a BSc degree in Nutrition and Dietetics, be German speaking, and also to have had at least five years' experience with diabetic patients and in CHO counting as professional dietitians. Six of them (four from Switzerland, one from Austria and one from Germany) replied positively and agreed to participate in the study (response rate 50%). An overview of the participants' characteristics is provided in Table 1.

Data Analysis
Data were analysed using RStudio (Version 1.0.153-© 2009-2017 RStudio, Inc., Boston, MA, USA). Dietitians' CHO estimations, as well as differences between visual estimation and the weighing method, were described as mean ± standard deviation (SD). Differences between the CHO estimations of the two methods were calculated as actual (signed) values and as absolute errors using the equation: difference = visual estimation minus weight. Absolute values were calculated in order to evaluate the accuracy of the visual estimation, while actual values were used to calculate over- and under-estimations by the dietitians and GoCARB. Since our data were not normally distributed according to the Shapiro-Wilk test (p < 0.05), we performed non-parametric analyses. Spearman's correlation coefficient and Wilcoxon's signed-rank test were used for the comparison of CHO content calculated by the dietitians' visual estimation, GoCARB estimation, and weighing methods. The interrater reliability was assessed using intraclass correlation coefficients (ICCs) and their 95% CI, calculated in a two-way random effects model based on absolute agreement. The Mann-Whitney test was performed to assess the differences between GoCARB and dietitians' errors in each of the food categories. Significance was set at the 0.05 level.
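The error definitions above can be sketched in a few lines of Python. The study itself used RStudio; this block only illustrates the signed difference (estimate minus weighed ground truth), the absolute error, and the share of estimates within a tolerance. The function name `summarise_errors`, the ±10 g default, and the example values are assumptions for illustration, not study data.

```python
from statistics import mean, stdev


def summarise_errors(estimates_g, ground_truth_g, tolerance_g=10.0):
    """Summarise signed and absolute CHO errors for paired estimates."""
    if len(estimates_g) != len(ground_truth_g):
        raise ValueError("estimates and ground truth must be paired")
    # Signed difference: estimate minus weighed ground truth.
    signed = [e - g for e, g in zip(estimates_g, ground_truth_g)]
    absolute = [abs(d) for d in signed]
    return {
        "mean_abs_error": mean(absolute),
        "sd_abs_error": stdev(absolute),
        "share_within_tolerance": sum(a <= tolerance_g for a in absolute)
        / len(absolute),
        "n_overestimated": sum(d > 0 for d in signed),
        "n_underestimated": sum(d < 0 for d in signed),
    }


# Hypothetical example values in grams of CHO (not taken from the study).
example_estimates = [50.0, 32.0, 75.0, 20.0]
example_ground_truth = [45.0, 40.0, 60.0, 20.0]
summary = summarise_errors(example_estimates, example_ground_truth)
```

Keeping the signed and absolute errors separate mirrors the text: absolute values describe accuracy, whereas the sign shows whether a method tends to over- or under-estimate.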

Results
GoCARB and dietitians achieved comparable accuracies in CHO estimation. The mean absolute error of the dietitians in 54 meals was 14.9 (SD 10.12) grams of CHO versus 14.8 (SD 9.73) grams of CHO for the GoCARB system (p = 0.93).
With regard to the accuracy of the estimations, 35.2% of the dietitians' CHO estimation errors were within ±10 g, compared with 37.03% for GoCARB. The dietitians' visual estimations and GTR were significantly and highly correlated (r = 0.89, p < 0.001), as presented in Figure 3a. The correlation between GoCARB and GTR was also significant (r = 0.76, p < 0.001), as shown in Figure 3b.
In addition to the errors made by GoCARB and dietitians per meal, the effect of meal size was analysed. As shown in Table 2, no statistically significant differences were found between the estimations of dietitians and GoCARB for any meal size. However, the larger the meal, the greater the estimation errors of both methods.

| Meal Size | Dietitians, Absolute Error (grams), Mean ± SD | GoCARB, Absolute Error (grams), Mean ± SD | p-Value |
|---|---|---|---|
| Small (n = 16) | 5.9 ± 3.5 | 8.5 ± 5.6 | 0.18 |
| Medium (n = 16) | 7.6 ± 6.3 | 11.3 ± 8.9 | 0.27 |
| Large (n = 16) | 19.4 ± 15.2 | 20.7 ± 11.6 | 0.41 |

The boxplot of Figure 4 illustrates the errors made by the different methods (GoCARB and dietitians) for each food category. The results per food category are as follows:
High CHO food categories (breaded meat, chips, mashed potatoes, noodles, pasta, polenta, potatoes, and rice): Both dietitians and GoCARB had difficulty in estimating pasta, rice, and polenta, as demonstrated by the wide range of differences and both over- and under-estimations. GoCARB underestimated potatoes, chips, and mashed potatoes. For mashed potatoes, the same pattern was observed for the dietitians. Dietitians also had a tendency to underestimate the CHO content of breaded meat, while GoCARB overestimated noodles.
Low CHO food categories (beans, broccoli, carrots, salad, and zucchini): Estimation errors made by both methods were small for the zucchini category. While GoCARB overestimated salad, there was a high level of agreement with dietitians, as evidenced by the close to zero median difference and very short whiskers. Dietitians also tended to underestimate carrots. As regards beans, GoCARB gave good accuracy, but dietitians tended to overestimate. Both methods underestimated broccoli.
No CHO food categories (fish, meat): GoCARB was 100% accurate in estimating fish and meat. On the other hand, dietitians tended to overestimate the CHO content for both fish and meat.
Overall, these results indicate that the higher the CHO content of a food category, the more challenging it is to estimate that content accurately. Moreover, GoCARB had considerable difficulty in estimating rice, pasta, potatoes, and mashed potatoes, while dietitians had problems with pasta, chips, rice, and polenta.
Finally, it has to be noted that the result for ICC among dietitians was 0.86 with 95% CI (0.78, 0.91), which implies good to excellent reliability. However, there are differences in the replicability of their estimations, as shown in Table 3.
Table 3. Standard deviation (grams) of replicated results of dietitians' estimations.
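For readers unfamiliar with the interrater statistic reported above, the following Python sketch computes an ICC from a two-way random effects model based on absolute agreement. The source does not state whether the single-measure or average-measure form was used; this sketch assumes the single-measure form, commonly written ICC(2,1), and the ratings matrix is illustrative, not study data.

```python
def icc_two_way_random_absolute(ratings):
    """Single-measure ICC, two-way random effects, absolute agreement.

    `ratings` is a list of rows, one row per rated meal, each row holding
    one value per rater. Raises ZeroDivisionError if all values are equal.
    """
    n = len(ratings)             # number of rated meals
    k = len(ratings[0])          # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with n, k >= 2")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for a two-way layout without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings: 6 rated items (rows) by 4 raters (columns).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
example_icc = icc_two_way_random_absolute(example_ratings)
```

Because the absolute-agreement form penalises systematic offsets between raters, a dietitian who consistently estimates higher than colleagues lowers the ICC even when the ranking of meals is identical.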

Discussion
In this study, we examined the accuracy of the visual estimation method of experienced dietitians in German speaking countries and compared this to the estimations of the GoCARB system. When we compared GTR and visual estimation, we found that both dietitians' visual estimations and GoCARB's estimations correlated with the weighing method. Both of the methods had a mean error of less than 15 g. In addition, both dietitians and GoCARB found the CHO content estimation of large meals challenging, with the dietitians reporting higher errors in the cases of pasta, polenta, noodles, salad, and rice. These findings indicate that despite the good performance of both dietitians and GoCARB there is room for improvement, namely in training and more detailed guidelines for dietitians and optimised algorithms for the computational system.
Studies to date have focused on the identification of food items [24] and the estimation of portion size [24,25]. To our knowledge, there is no published study comparing CHO estimation by a computerised system running on a smartphone and by dietitians with the actual CHO content of food items. Previous studies that recruited nutrition students and aimed to estimate portion size gave inaccurate results. More specifically, a study conducted in 2010 [26] found that only 18.5% of the estimates were accurate (±10% of the actual weight). Moreover, another study conducted among nutrition and dietetics students found that only 38% of the estimations were within ±10% of the actual food weight [24]. Thus, it was concluded that nutrition students need more training in order to quantify portion size more accurately. Lastly, a study comparing portion size estimation by older adults, young adults, and nutritionists showed that the nutritionists achieved more accurate results than the other groups [27]. In the current study, we compared the visual estimation of dietitians with the GTR and showed that food items such as pasta, rice, and chips are difficult to estimate, which corroborates previous observations [28]. The results of our study indicate that, among vegetables, broccoli and carrots were mostly underestimated. These results are consistent with those of other studies [24,26] on other cruciferous vegetables, such as cauliflower. In contrast to earlier findings that mixed salads were difficult to estimate [28], the current study found that leafy vegetables were one of the most accurately estimated food categories. In another study [29], vegetables tended to be underestimated in direct observation by ten trained individuals. Furthermore, some studies point out that there were significant underestimations of leafy vegetables, especially lettuce [24,26], while another study conducted in 15 adolescents found a large overestimation of lettuce [30].
There are some possible explanations for the inaccurate estimations made by dietitians. They may be attributed to the fact that not all of them used the same method for carbohydrate estimation (four out of six dietitians used the Diabetic Exchange List). Moreover, the lack of specific guidelines for breaded food in the proposed CHO counting method may have contributed to the inaccurate estimation of this food category by dietitians (Figure 1a) [23]. Furthermore, the presence of sauce around the meat, as shown in the images, may have affected some dietitians' estimations of meat (Figure 1b), since they added possible CHO in the sauce on top of their initial estimation. In contrast, when GoCARB recognises "meat", a result of zero CHO is automatically recorded. Similarly, there is a thin crust on the fish in some images, and this may have caused dietitians to overestimate fish (Figure 1c). Additionally, a possible explanation for the underestimation of "broccoli" (Figure 1d) by both GoCARB and dietitians might be that "broccoli" belongs to the wide category of "vegetables" in CHO counting guidelines, and thus, if it is measured by eye in cups, the corresponding amount of CHO is only 5 g/cup [23]. However, the USDA database (Basic report: 11091, Broccoli, cooked, boiled, drained, without salt) gives 11.20 g of CHO per cup of broccoli [31]. The current version of GoCARB needs to be optimised, especially in the food categories of potatoes, chips, polenta, and mashed potatoes, since in these categories there is systematic underestimation, which may influence values for the overall amount of CHO consumed in a meal.
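The size of the broccoli discrepancy discussed above can be made concrete with a short Python calculation. The two per-cup values are those quoted in the text; the function name `broccoli_underestimation` and the assumption that the portion is counted in whole or fractional cups are illustrative.

```python
GUIDELINE_CHO_G_PER_CUP = 5.0   # "vegetables" value from the counting guidelines [23]
USDA_CHO_G_PER_CUP = 11.20      # USDA basic report 11091, as quoted in the text [31]


def broccoli_underestimation(cups):
    """Return (absolute gap in g, gap as a fraction of the USDA value)."""
    guideline_g = cups * GUIDELINE_CHO_G_PER_CUP
    usda_g = cups * USDA_CHO_G_PER_CUP
    gap_g = usda_g - guideline_g
    return gap_g, gap_g / usda_g


gap_one_cup_g, relative_gap = broccoli_underestimation(1.0)
```

For one cup, the guideline value falls about 6.2 g short of the USDA figure, that is, it covers less than half of the CHO listed in the database.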
Further research should focus on the accuracy of dietitians' visual estimation of all the macronutrients (CHO, protein, fat), in order to obtain a generalised conclusion on the efficacy of this approach. Moreover, more dietitians should be recruited. Additionally, the food types that are included in the current prototype should be expanded to cover a wider spectrum of eating habits and cultural differences. In this case, GoCARB could be an alternative tool for dietary assessment of daily eating patterns.
The strength of the study is that it is the first of its kind to compare visual estimation by expert dietitians with an app. Since the approaches used are standardised, the study can be replicated in different areas in order to give comparable findings. This study has some limitations. Firstly, a small sample of dietitians was included because of the strict inclusion criteria, especially the fact that the study included only German-speaking dietitians, since Central European meals were provided and dietitians from the aforementioned countries were more likely to be familiar with that cuisine. However, this is the primary population on whom we tested GoCARB. Further studies with dietitians from different countries should be carried out in order to draw a solid conclusion on visual estimation by dietitians around Europe. Moreover, the meals were not randomly selected, but were organised so that all of them included three different sizes and each of them three different items. As the trial was conducted in a German-speaking Swiss hospital and all the meals were prepared in the same kitchen, we cannot assume that all meals with the same ingredients would yield the same result. Furthermore, the estimation of the CHO content of the meals might have been more precise if the dietitians had actually viewed the plates in real time. However, this scenario was not realistic given the dietitians' places of residence; in addition, the daily practice of dietitians nowadays often includes remote diet assessment of individuals with diabetes based on photo food diaries. Lastly, the meal photos that were provided did not contain overlapping food items. The current version of GoCARB and/or dietitians may not perform well in estimating the CHO content of dishes containing mixed ingredients (e.g., lasagna), sauces, or soups. As mentioned before, the current version of GoCARB contains a limited number of food categories, and thus their number should be expanded. Moreover, complex meals are challenging, while other factors that influence blood glucose response, such as glycemic index or the presence of fat in the meal, are difficult to quantify with the technology used, and thus have not been taken into consideration.
In addition, since various platforms and databases receive feedback on meal portion size/nutrient content from different annotators/coders (e.g., healthcare professionals, such as nurses or dietitians, or people of unknown profession), there is uncertainty about their qualification, training, and expertise in the field. Since there is a lack of information about reliable apps and also insufficient evidence regarding the benefits of each app, it is important that there is an app that performs equivalently to professional dietitians who are experts in the field of carbohydrate estimation. Thus, GoCARB is a decision support system for CHO estimation, which may be beneficial for individuals who cannot accurately estimate the CHO content of meals and also for those who do not have adequate training in CHO counting. It can also be a useful tool for CHO estimation training of people who have prediabetes and can potentially assist diet assessment of the general population, who do not receive relevant training. In this way, users and health professionals can receive accurate information, and health professionals can recommend a validated mHealth app to patients.

Conclusions
This study has investigated possible differences between visual estimation by expert dietitians and the estimations of a computer vision-based app, GoCARB. The app may not only make it easier for individuals with diabetes to estimate CHO (by reducing the CHO estimation error), but also offer the option of an easy, accurate, and almost real-time estimation of the CHO content of meals on plates, and thus help to enhance and improve diabetes self-management. GoCARB may also be a useful training tool for those who have limited or no access to dietary education.