Facial Appearance and Psychosocial Features in Orthognathic Surgery: A FACE-Q- and 3D Facial Image-Based Comparative Study of Patient-, Clinician-, and Lay-Observer-Reported Outcomes

Outcome measures reported by patients, clinicians, and lay-observers can help to tailor treatment plans to meet patients’ needs. This study evaluated orthognathic surgery (OGS) outcomes using pre- and post-OGS patients’ (n = 84) FACE-Q reports, and a three-dimensional facial photograph-based panel assessment of facial appearance and psychosocial parameters, with 96 blinded layperson and orthodontic and surgical professional raters, and verified whether there were correlations between these outcome measurement tools. Post-OGS FACE-Q and panel assessment measurements showed significant (p < 0.001) differences from pre-OGS measurements. Pre-OGS patients’ FACE-Q scores were significantly (p < 0.01) lower than normal, age-, gender-, and ethnicity-matched individuals’ (n = 54) FACE-Q scores, with no differences in post-OGS comparisons. The FACE-Q overall facial appearance scale had a low, statistically significant (p < 0.001) correlation to the facial-aesthetic-based panel assessment, but no correlation to the FACE-Q lower face and lips scales. No significant correlation was observed between the FACE-Q and panel assessment psychosocial-related scales. This study demonstrates that OGS treatment positively influences the facial appearance and psychosocial-related perceptions of patients, clinicians and lay observers, but that there is only a low, or no, correlation between the FACE-Q and panel assessment tools. Future investigations may consider the inclusion of both tools as OGS treatment endpoints for the improvement of patient-centered care, and guiding the health-system-related decision-making processes of multidisciplinary teams, policymakers, and other stakeholders.


Introduction
Orthognathic surgery (OGS) has been demonstrated to correct a wide spectrum of dentofacial deformities [1][2][3]. There is an ever-growing body of OGS outcomes research, focused not only on functional occlusion, but also on the facial appearance and psychosocial domains [4,5].
The recent introduction of the FACE-Q tool, a cross-culturally developed and facial-procedure-specific patient-reported outcome (PRO) instrument, has revolutionized the field of facial surgery outcome research by enabling the detection of meaningful and interpretable facial features and treatment-related changes and benefits [6][7][8][9]. However, only a few OGS studies have adopted the FACE-Q tool [10,11]. In this setting, the impact of different surgical interventions, including OGS procedures, on facial aesthetic and social perceptions has been demonstrated using the panel assessment tool, a metric that is centered on clinician-reported outcome (ClinRO, professionals using medical or dental judgments) and observer-reported outcome (ObsRO, judgments from laypersons with no formal training) principles [12][13][14][15][16][17][18][19][20][21][22][23].
Interestingly, certain FACE-Q scales and panel assessment scales recently adopted in OGS studies have interconnected concepts of interest, including facial appearance (i.e., the FACE-Q facial appraisal scales and the panel assessment's beautiful, attractive, and pleasant facial aesthetic scales) and psychosocial (i.e., the FACE-Q psychosocial scales and the panel assessment's psychosocial perception of personality traits and emotional expressions scales) concepts [10,11][17][18][19][20][21][22][23]. So far, no OGS study has applied the PRO-based FACE-Q and the ClinRO- and ObsRO-based panel assessment measurement tools in the same cohort of patients with a dentofacial deformity. Moreover, we are not aware of any investigation focused on the possible correlations between these outcome measurement tools. Understanding the multidimensional impact of OGS treatment through PRO-, ClinRO-, and ObsRO-based tools may support professionals (psychologists, dentists, orthodontists, ear, nose, and throat surgeons, plastic surgeons, head and neck surgeons, oral surgeons, and maxillofacial surgeons) working in multidisciplinary teams to provide better counseling to patients and family members, to set the expectations of preoperative patients with respect to facial appearance and psychosocial aspects, and to anticipate potential postoperative care profiles for patients with the early establishment of psychosocial support.
The primary purpose of this study was to assess the pre- versus post-OGS treatment outcomes using FACE-Q facial appearance and psychosocial reports and ClinRO- and ObsRO-based panel assessments of facial aesthetic, personality trait, and emotional expression parameters. A secondary purpose was to verify whether or not there are correlations between these outcome measurement tools.

Patients and Methods
A comparative cross-sectional study was performed, as shown in Figure 1, on patients with a dentofacial deformity (skeletal Class II and III deformities) who were managed by the same multidisciplinary team following standard pre- and post-orthognathic surgery (OGS) treatment principles [24][25][26][27][28] between 2016 and 2017. Demographic, clinical, and outcome (FACE-Q and panel assessment tool) data were collected from the Chang Gung Craniofacial Research Center's database. Patients were excluded from this study if they had an abnormal mentality that would impair the instrument's application, a normal occlusion, any syndromic diagnosis, a previous facial surgery or facial aesthetic procedure, or an incomplete recording or postoperative follow-up (<12 months).
The study was approved by the Institutional Review Board (IRB no. 104-A253B) and conducted in compliance with the 1975 Declaration of Helsinki as amended in 1983. Patients provided written consent for the use of their images.

FACE-Q Tool
Taiwanese Chinese patients completed the validated Mandarin Chinese version of FACE-Q [29] during clinical appointments before or after (>12 months) an OGS procedure. Five scales were applied, namely three in the facial appraisal domain (the satisfaction with facial appearance overall, satisfaction with lower face and jawline, and satisfaction with lips scales), and two in the quality of life domain (the social function and psychological well-being scales) [6][7][8][9][10][11]: (a) Satisfaction with facial appearance overall: Measures patient satisfaction with the overall appearance of their face. (e) Psychological well-being: Measures psychological well-being in terms of a series of positively-worded statements. All FACE-Q scales ask patients to answer items with facial appearance in mind. The sum score for each scale was converted to an equivalent Rasch score, ranging from 0 to 100, with higher values indicating a greater satisfaction with appearance or superior quality of life [6][7][8][9][10][11].

Normal Taiwanese Chinese individuals' FACE-Q reports (no history of facial deformity, trauma, or surgery) were retrieved from the Chang Gung Craniofacial Research Center's database, matched for age and gender, and adopted for a comparative analysis.

Three-Dimensional (3D)-Image-Based Panel Assessment Tool
Three-dimensional frontal and profile photographic imaging data of preoperative and postoperative (>12 months) patients were acquired using the 3dMD system (3dMD LLC, Atlanta, GA, USA) under standard conditions (a permanent installation with a fixed ambient lighting system and individuals in a fixed position with a natural head position, a relaxed facial musculature, a closed mouth, and wearing a thin elastic nylon cap to keep the hair away from the face) [30]. Using the 3dMD Vultus software (version 2.2, 3dMD Inc., Atlanta, GA, USA), a standard positioning of the three-dimensional facial images was achieved through the use of soft tissue reference planes that are meaningfully correlated to craniofacial skeleton orientation [30,31]. The system was calibrated before the image capture process.
Presentations (colored slides with frontal and profile views of the right and left sides, respectively) were delivered using PowerPoint for Mac (Microsoft Corporation, Redmond, WA, USA) on a 15-inch MacBook Pro (Apple, Inc., Cupertino, CA, USA). All preoperative and postoperative image slides were randomly distributed and rated by a panel composed of 96 raters with no previous or current relationship to the patients, using previously published 7-point Likert scales [17][18][19][20][21][22][23]. We used three scales for facial aesthetic parameters (beautiful, attractive and pleasant), five social scales for personality trait parameters (intelligent, friendly, threatening, trustworthy and dominant), and six social scales for emotional expression parameters (angry, surprised, happy, sad, afraid and disgusted) (see Supplemental Materials, Table S1) [17][18][19][20][21][22][23]. For the ObsRO assessment, 72 laypersons (36 women, aged 18-27 years old) with no specialized professional training (i.e., no dental, medical, or psychology background) were randomly recruited based on incidental contacts from members of the general community. For the ClinRO assessment, 24 professionals (12 women) with dental or surgical training (12 orthodontists and 12 plastic surgeons) were randomly selected from the Taiwan Association of Orthodontists and the Taiwanese Society of Plastic Surgery. All raters received the same instruction and guidance before their appraisal of the 3D image set. Using one spreadsheet per slide, the rater wrote down (marking a circle corresponding to a choice from 1 to 7 on a 7-point Likert scale) his/her perceptions of the patient under appraisal with respect to the facial aesthetic, personality trait and emotional expression parameters. Raters were blinded to the purpose of the study, masked to the operative status of each image, and were not permitted to go back in the presentation. Ten percent of the images were randomly replicated for intra-rater reliability.
The scores (1-7) for each scale were averaged for all pre- and post-OGS photographs, and were then adopted in the analysis.
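The averaging step can be illustrated with a minimal sketch. It assumes that scores are averaged over raters for each photograph and scale, which the text does not spell out. The record layout, the identifiers (`pre_001`, `lay_01`, and so on), and the scores themselves are hypothetical and are not study data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (image_id, rater_id, scale, score on the 1-7 Likert scale).
# These values are illustrative only and are not taken from the study.
ratings = [
    ("pre_001", "lay_01", "attractive", 3),
    ("pre_001", "lay_02", "attractive", 4),
    ("pre_001", "clin_01", "attractive", 2),
    ("post_001", "lay_01", "attractive", 5),
    ("post_001", "lay_02", "attractive", 6),
    ("post_001", "clin_01", "attractive", 5),
]


def mean_scores(records):
    """Average the 1-7 scores over raters for every (image, scale) pair."""
    grouped = defaultdict(list)
    for image_id, _rater_id, scale, score in records:
        if not 1 <= score <= 7:
            raise ValueError(f"score outside the 7-point Likert range: {score}")
        grouped[(image_id, scale)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


averaged = mean_scores(ratings)
print(averaged[("pre_001", "attractive")])   # mean of 3, 4, 2
print(averaged[("post_001", "attractive")])  # mean of 5, 6, 5
```

Grouping by rater type (lay observer versus clinician) before averaging would follow the same pattern, with the rater group added to the dictionary key.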

Statistical Analysis
For the descriptive analysis, the mean was used for metric variables, and percentages were used for categorical variables. The data distribution was verified by the Kolmogorov-Smirnov test. The Wilcoxon signed-rank and Kruskal-Wallis tests, Spearman's correlation, Cronbach's alpha, and the intraclass correlation coefficient (ICC) were used for the analysis [32][33][34][35][36][37]. A Bonferroni correction was applied for multiple comparisons. The correlations among the FACE-Q scales were predicted to be moderate because these scales measure different but related features. For FACE-Q scales versus panel-assessment-related scales, the correlations were predicted to be low or non-significant, as these outcome measurement tools assess different constructs within the broader facial appearance and psychosocial domains. Spearman's rank correlation coefficients were interpreted as high (r > 0.70), moderate (r = 0.30-0.70), and low (r < 0.30). Two-sided values of p < 0.05 were considered statistically significant. All analyses were performed using SPSS version 20.0 (Chicago, IL, USA).
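As an illustration of the correlation step, the following is a minimal sketch of Spearman's rank correlation, the interpretation bands stated above, and a Bonferroni-adjusted significance threshold. The analyses in this study were run in SPSS; this standard-library Python version is only a stand-in. The function names are hypothetical, applying the cut-offs to the absolute value of the coefficient is an assumption, and the sample data are invented for the example.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / sqrt(var_x * var_y)


def interpret(rho):
    """Apply the bands from the text to |rho| (using |rho| is an assumption)."""
    size = abs(rho)
    if size > 0.70:
        return "high"
    if size >= 0.30:
        return "moderate"
    return "low"


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold under a Bonferroni correction."""
    return alpha / n_comparisons


# Invented example data: scale scores (0-100) and mean panel ratings (1-7).
scale_scores = [35, 50, 62, 70, 88]
panel_means = [3.1, 2.9, 4.0, 3.8, 4.4]
rho = spearman_rho(scale_scores, panel_means)
print(round(rho, 2), interpret(rho))
print(bonferroni_alpha(0.05, 5))
```

The p-value for each coefficient is not computed here; a statistics package would normally supply it, and it would then be compared against the adjusted threshold.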

Results
Eighty-four patients (22.4 ± 1.4 years of age at time of the FACE-Q report, 50% female, and 84% with a skeletal Class III deformity) and 54 normal, age-, gender-, and ethnicity-matched individuals were included in this study (Table 1).

FACE-Q Tool
Post-OGS FACE-Q facial appraisal (satisfaction with facial appearance overall, satisfaction with lower face and jawline, and satisfaction with lips scales) and quality of life (social function and psychological well-being scales) scores were significantly (p < 0.05) higher than pre-OGS scores. The pre-OGS scores were significantly (p < 0.05) lower than the normal individuals' scores. No significant difference was found in the comparisons between patients' post-OGS reports and normal individuals' reports (Table 1).

Correlation Evaluation
For the FACE-Q tool, significant (p < 0.05, moderate coefficients) correlations were observed between the facial appraisal (satisfaction with facial appearance overall, satisfaction with lower face and jawline, and satisfaction with lips scales) and quality of life (social function and psychological well-being scales) domains (Table 5). For the panel assessment tool, the facial aesthetic scales for the beautiful and attractive parameters demonstrated significant (p < 0.05, low-to-moderate coefficients) correlations in all three groups of raters (Table 6; see also Supplemental Materials, Tables S3 and S4). The "pleasant" parameter had significant (p < 0.05, low coefficients) correlations for pre-OGS clinicians' scores and non-significant correlations for post-OGS clinicians' scores and pre- and post-OGS observers' scores (Table 6; see also Supplemental Materials, Tables S3 and S4). No significant correlations were observed for the 11 social scales (personality trait and emotional expression parameters) (Tables 7 and 8; see also Supplemental Materials, Tables S5-S8). The panel assessment of facial aesthetics presented significant (p < 0.05, low correlation coefficients) correlations with the FACE-Q satisfaction with the overall facial appearance scale, but had no significant correlation with the satisfaction with the lower face and jawline and satisfaction with lips scales (Table 6; see also Supplemental Materials, Tables S3 and S4). There was no significant correlation between the panel assessment of personality traits and emotional expressions and the FACE-Q social function and psychological well-being scales (Tables 7 and 8; see also Supplemental Materials, Tables S5-S8).

Discussion
In previous cross-sectional studies, OGS treatment was found to improve FACE-Q reports with respect to the facial appraisal and quality of life scales [10,11]. It has also been demonstrated that the ObsRO-based panel assessment scores significantly change after OGS treatment with respect to the perceptions of facial aesthetics and personality trait and emotional expression parameters [17][18][19][20][21][22][23]. In this study, we adopted a validated and reliable PRO-based FACE-Q tool and a high-quality, high-precision 3D facial surface image-based panel assessment. Our current results reinforce these previous findings [10,11][17][18][19][20][21][22][23], as patients' and laypersons' post-OGS scores were significantly different from their pre-OGS measurements. We also contribute to the literature on OGS by demonstrating the significant impact of OGS treatment on facial aesthetics, personality traits, and emotional expressions as perceived by clinicians with dental and surgical backgrounds. Additionally, using normal individuals' FACE-Q data as a reference point, we revealed substantial modifications to before and after OGS treatment scores for all tested scales.
Previous non-OGS investigations have shown that facial personality traits can influence careers, financial success and political leadership [38][39][40][41][42]. Facial emotional expressions play a key role in guiding social judgments, including deciding whether or not to approach another person [37][38][39][40][41][42][43]. Importantly, abnormalities in these facial-aesthetics-based social judgments can result in different degrees of socially inappropriate and risky behavior [37,38]. Interestingly, a comparative analysis revealed a positive change in personality traits and emotional expressions in OGS-treated patients compared to their peers who had not undergone an OGS procedure. Additionally, our findings, along with previous results [10,11][17][18][19][20][21][22][23], suggest that the overall effect of OGS treatment has the potential to improve patients', clinicians' and observers' perceptions across many aspects of facial appearance and social interaction. Further investigation is necessary to verify if FACE-Q and panel assessment tools can predict these socially relevant parameters with important real-world consequences in the OGS population.
In this investigation, we also explored the question of whether the commonly used ClinRO- and ObsRO-based panel assessment tools have any correlation to the recently developed PRO-based FACE-Q tool. In the literature, the presence or absence of associations between two different outcome measurement tools has implications for the ongoing discussions on how to interpret and apply each existing measurement tool in clinical practice and research [44][45][46][47].
For our study, the tests for correlations were based on predefined propositions about expected correlations, as these may attenuate the risk of bias for the described results and enable us to avoid alternative explanations after data collection and analysis. The potential correlations between the FACE-Q and panel assessment tools were established with respect to the facial appearance and psychosocial domains. To support the testing for correlations, for example, in the psychosocial domain, we adopted propositions from previous studies: (1) a point that patients repeatedly made during the development of the FACE-Q psychosocial well-being scale was that they felt more confident at different levels of social interactions (including in group situations or with strangers) after facial treatment [6]; and (2) the panel assessment of social perceptions was justified by the assumption that the included groups of raters are representative of the persons with whom the patients may randomly interact on a daily basis [17][18][19][20][21][22][23].
As shown in Table 5, the correlations between the FACE-Q scales had moderate correlation coefficients. This was also demonstrated by the original developers of the FACE-Q tool [6][7][8][9]. It reinforces that each specific FACE-Q scale measures particular features that matter to patients [6][7][8][9][10][11]. With regard to the remaining correlations, only the FACE-Q satisfaction with the facial appearance overall scale had significant correlations with the panel assessment of facial aesthetics; however, it only had a low correlation coefficient. Furthermore, we found no significant correlation between the facial-aesthetic-related "pleasant" parameter and some groups of raters (Table 6; see also Supplemental Materials, Tables S3 and S4), suggesting that the tested tools were appraising the facial appearance domain in a different way. The FACE-Q satisfaction with the facial appearance overall scale is composed of multiple items that were carefully selected from a pool of potential items using advanced qualitative and quantitative research methods [6][7][8][9]. This instrument development process resulted in a particular FACE-Q scale that captures patients' satisfaction with facial appearance from a global perspective, with no focus on particular anatomical regions of the face [6][7][8][9]. The panel assessment of facial aesthetics also represents an appraisal of patients' face photographs from a global perspective. However, the unidimensional beautiful, attractive, and pleasant scales [17][18][19][20][21][22][23] are not as comprehensive as the FACE-Q satisfaction with the facial appearance overall scale [6][7][8][9].
We did not find a significant correlation between the FACE-Q satisfaction with the lower face and jawline and the satisfaction with the lips scales and the panel assessment of facial aesthetics. As these FACE-Q scales were specifically developed to capture facial-appearance-related details for each facial anatomical area [6][7][8][9], the patients provided a score for each particular scale mainly by considering specific regions of the face. In contrast, due to the characteristics of facial aesthetic scales (generic and unidimensional features) adopted for panel assessment [17][18][19][20][21][22][23], it is plausible to suppose that raters (including professionals who specialize in surgical and orthodontic fields) primarily rated the face as an overall unit, with the lower face and lips regions not necessarily being considered as targets of appraisal. Future studies should investigate this issue with the inclusion of a panel assessment of both full-face and cropped facial images (lower face and lip regions), by using facial anatomical region-specific scales (e.g., lip attractiveness).
Our results show that the panel assessment of personality traits and emotional expressions did not have a significant correlation with the FACE-Q social and psychological well-being scales. Similar explanations regarding the dimensionality and comprehensiveness of the tested scales as those detailed above may be applied to these findings. Furthermore, as the psychosocial domain should be interpreted as being represented by an integrated biopsychosocial model of health status that accounts for the complex interplay, not only of psychological factors, but also of sociodemographic and environmental components [48], several features not directly measured in our study may have influenced patients', clinicians' and observers' perceptions regarding pre- and post-OGS treatment status as well as the tested correlations. As the panel assessment was an indirect appraisal of patients' faces, raters' judgments may have been influenced by their prior experiences as well as other non-controlled-for factors (e.g., eyes and beard aspects) that can lead to scoring that is unconnected to OGS-treatment-related features. On the other hand, the FACE-Q reports are probably subject to a marked influence from the expectations and results (pre-OGS and post-OGS measurements, respectively) of the OGS treatment itself on the patients' scores.
Therefore, while the results of our panel assessment, as well as the results of previous studies [17][18][19][20][21][22][23], demonstrated that post-OGS patients' facial images were perceived to be more friendly, happy, trustworthy, intelligent and dominant and less threatening, angry, sad, afraid and disgusted, the relationship between these parameters and the patients' perspective about themselves in a "real-world environment" should be verified in future studies using study designs with alternate methodologies.
This study is not without limitations. As only patients matched for age, gender and type of skeletal deformity were included, a relatively small final sample was adopted in the analysis. The number of enrolled FACE-Q reports and facial photographs was, however, larger than in previous studies evaluating similar outcome measurement tools [10,11][17][18][19][20][21][22][23]. Moreover, our cohort was constituted by patients with no stratification per facial appearance as judged by the authors or treating professionals, reducing the bias related to analyses based only on the "best surgical results" [19,20]. Extrapolations from the current findings on the impact of OGS treatment on the tested parameters should be carefully made. This study was grounded on patients who were managed by senior professionals (orthodontists and surgeons) working in a multidisciplinary OGS team with specific technical strategies, such as 3D simulation, a digital occlusion setup, a surgery-first model, the single-splint, two-jaw surgery technique, and modified face bow principles [24][25][26][27][28]. The context of data collection should also be considered when interpreting these results, as facial aesthetics, personality traits and emotional expressions are perceived differently by individuals of different cultural backgrounds [49,50]. There are particular nuances for facial-appearance-related treatments and appraisals in Asians compared to Caucasians [51,52].
As in previous PRO-based cross-sectional studies [10,11,53], the relationships identified using correlation coefficients should be interpreted as associations and not causal relations [33]. The present study may act as a data reference to generate hypotheses that justify further investigations. Regarding the panel assessment tool, we did not include professionals with different training backgrounds. In contrast to previous OGS studies that used the panel assessment tool [17][18][19][20][21][22][23], we divided the raters into two groups: clinicians and observers. Each group of raters demonstrated good to excellent intra- and inter-rater reliability, indicating that the panel assessment data were consistently collected. However, only low correlations were observed between lay observers and clinicians, indicating that the presence of a background in surgery or orthodontics may have some influence on an appraisal of facial aesthetic, personality trait and emotional expression features. Based on our findings, we suggest that some of the results from panel assessments in previous studies on these same features should be cautiously interpreted, as no explicit criteria were adopted to separate raters with and without specialized training [17][18][19][20][21][22][23]. Further research may increase the number of observers and also include OGS-treated patients and other professionals (e.g., general dentists, psychologists, ear, nose and throat surgeons, head and neck surgeons, oral surgeons, and maxillofacial surgeons) as raters to help us better understand these OGS outcomes.
Other groups are also encouraged to assess their OGS cohorts to verify and expand our findings by enrolling a large sample of patients managed with a different orthodontic-surgical approach, as well as by performing further analyses, including an evaluation of the potential impact of independent variables (e.g., sociodemographic, clinical and surgical information) on FACE-Q and panel assessment tools. Further OGS outcome measurement tools, such as alternative scales for panel assessment (e.g., perceptions of symmetry, presence of lip cant, and harmony of smile), may also be tested.
Despite these shortcomings, the results of the present study enable us to provide practical suggestions for future OGS-outcome-based research and clinical practice. Institutions, clinicians, healthcare networks, and policymakers use results from clinical trials as the foundation for healthcare decision-making when managing individual patients or particular populations [44][45][46][47][61][62][63][64]. To design valid and meaningful clinical trials, the fundamental issue is "which form of therapeutic management presents the highest possibility of being more beneficial for the least cost and inconvenience (i.e., risk-benefit ratio) to the patient as well as to the provider?" [61]. The selection of a proper outcome measurement tool is, therefore, a key component that influences the value of outcome-based research [44][45][46][47][61][62][63][64]. As both the panel assessment tool and the FACE-Q tool were found to be capable of distinguishing patients before and after OGS treatment, consistent with previous findings [10,11][17][18][19][20][21][22][23], the lack of statistical significance for most of the tested correlations was probably not related to an inability of each tool to capture relevant factors connected with the patient (facial appearance and social factors), the disease (dentofacial deformity) and OGS treatment. In addition, considering the inherent bias in, and limitations to, each outcome measurement tool, recent literature has counseled that it can be advantageous to use different tools to complement one another [44][45][46][47]. We may, therefore, advocate for the adoption of panel assessment and FACE-Q tools, either in isolation (this is acceptable if the study is constructed over a well-defined hypothesis and the restrictions of each tool are accepted) or in combination (two or more outcome measurement tools), but not as interchangeable tools.
As such, capturing FACE-Q data would be a valuable addition to a panel assessment (and vice-versa) as, in fact, one outcome measurement tool may provide useful and complementary information beyond that provided by another one about the domains under consideration. For this, it is of paramount importance that each specific tool is appropriately selected using an a priori hypothesis regarding the clinical scenario or treatment outcome.
For clinical practice, the integration of ClinRO- and ObsRO-based metrics with appropriate PRO-based measures should allow multidisciplinary teams to move toward patient-centered care. It is reasonable to better educate future OGS patients on the differences among clinicians', observers' and actual patients' perspectives, as they reflect dissimilar but complementary contexts. FACE-Q data may help health-care professionals predict how future patients are likely to react to a range of concepts within the facial appearance and psychosocial domains, while a panel assessment may help these professionals understand how a patient's facial appearance is likely to generate perceptions among the general public and clinicians. Policymakers and other stakeholders may also apply the current findings in strategic, science-driven and health-system-related decision-making processes, and to guide investment decisions for the management of patients with facial deformities and malocclusion.

Conclusions
This study demonstrates that: (1) OGS treatment positively influences patients' facial appearance and psychosocial perceptions, as well as clinicians' and lay observers' perceptions of the facial aesthetics, personality traits, and emotional expressions of OGS patients; and (2) there is a low, or no, correlation between the FACE-Q and panel assessment tools.
Supplementary Materials: The following are available online at http://www.mdpi.com/2077-0383/8/6/909/s1. Table S1: Rating scales adopted in the panel assessment, Table S2: Reliability data for the panel assessment tool used in this study, Table S3