Customer Complaints Analysis Using Text Mining and Outcome-Driven Innovation Method for Market-Oriented Product Development

The rapid growth in the quantity of customer data has increased the need to analyse these data. Recent progress in text mining has enabled analysis of unstructured text data such as customer suggestions, customer complaints and customer feedback. Much research has attempted to use insights gained from text mining to identify customer needs and thereby guide development of market-oriented products. However, previous research has the drawback that it identifies only a limited set of customer needs based on product features. To overcome this limitation, this paper applies text mining to customer complaints to identify customers’ true needs by using the Outcome-Driven Innovation (ODI) method. This paper provides a method to analyse customer complaints by using the concept of job. The ODI-based analysis helps to identify latent customer needs during the pre-execution and post-execution steps of product use that previous methods cannot discover. To explain how the proposed method can identify customer requirements, we present a case study of stand-type air conditioners. The analysis identified two needs that experts had not identified but regarded as important. This research helps to identify requirements at all the points at which customers want to obtain help from the product.


Introduction
Considering customer needs during the product development process is a useful method to minimize the risk of developing inappropriate products [1]. Customer needs have helped to direct research and development (R&D) to launch new products and services [2]. Customers' suggestions and complaints can generate ideas to determine product concepts. Moreover, considering customer requirements during new product development (NPD) can increase the number of novel ideas and thereby improve the quality of innovation [3]. Empirical study has demonstrated that identifying customer needs affects development activities and new products [4]. Therefore, identifying customer requirements offers a starting point for effective and efficient planning of companies' overall development activities such as R&D and launch of new products or services.
Many studies [5][6][7][8][9][10][11][12][13][14] have analysed unstructured text data such as customer suggestions, customer complaints and customer feedback to identify customer needs for product development. However, this previous research, which used keyword extraction, regression analysis, clustering, association rule mining, classifiers, latent Dirichlet allocation and the Kano model, identified only a limited set of customer needs based on features of a product. Customer requirements that are identified in this way represent only a small part of the full set of requirements [15,16]. Previous studies focused only on customer needs that are directly associated with the execution step of a product, whereas several steps such as order, installation, monitoring and arrangement can be involved when customers use a product [17]. Customer needs in the pre-usage and the post-usage steps of a product should also be considered.
To solve the problem, we use the Outcome-Driven Innovation (ODI) method, which identifies real customer needs in various steps of product use by customers; the purpose is to develop innovative products or services [18]. The method considers a job map, which identifies latent customer needs based on the concept of a job; and a format of customer requirement, which captures unambiguous customer needs. Many firms (e.g., Bosch, Microsoft, Kroll Ontrack, Hussmann and Abbott Medical Optics) have used the ODI method to innovate products and services.
However, the ODI method to identify customer needs lacks objectivity because advance information is totally dependent on managers who know about customers [19]. The ODI method needs supporting tools to derive customer requirements, not from subjective opinions but from real customer feedback. In these circumstances, text mining can help managers to analyse customer feedback quickly [20]. Consequently, to help managers to obtain objective evidence for customer requirements, tools to mine textual data are essential.
This paper suggests a method to identify customer requirements from customer complaints by applying text mining to the ODI method. This research fills a gap in text mining literature for product development by providing a method to explore customer needs in various steps of product use by customers. This study also supports the ODI method by providing a method to analyse large amounts of customer complaints.
The proposed method first collects customer complaints from customer service centres, then feature candidates are extracted. The extracted feature candidates are selected in terms of the concept of job and clustering. Features are used to construct a similarity matrix, which is used in clustering analysis to identify customer requirements. Ultimately, customers' true requirements are identified and they guide subsequent product activities including product concept, technology development and project prioritization. A case study of stand-type air conditioners is given to explain how the proposed approach is used.
The remainder of this paper is organized as follows. Section 2 reviews the theoretical background of the proposed method. Section 3 explains the proposed method to capture customer requirements and Section 4 presents a case study of stand-type air conditioners. Section 5 presents conclusions and future work.

Text Mining for Market-Oriented Product Development
Development of smartphones and proliferation of electronic word-of-mouth channels have expanded opportunities for customers to express their opinions [21][22][23]. Web harvesting techniques enable companies to collect or save customer reviews or complaints on web pages by automatically scanning many of them [24,25]. Recent progress in text mining has enabled analysis of the unstructured text data of customers [26][27][28]. Therefore, companies can analyse customer needs from a large amount of the textual data in a short time by using text mining and this analysis can reduce the time to develop products to respond to customer needs [20].
Much research for product development has been performed to identify customer needs from customer reviews or complaints by using text mining. Customer concerns were identified by extracting salient topics, then ranking them based on automatic summarization of customer reviews of a target product. The identified customer concerns provided preferred features of the product and the reasons for these preferences [5]. Decker and Trusov [6] used text-mining techniques to identify the pros and cons of product attributes, then used negative binomial regression to estimate the influence of each attribute. Park and Lee [7] constructed a 'voice of customer' vector by extracting product features from customer complaints about a mobile phone, then used clustering analysis to segment customers according to product features that they complained about. Then the authors used co-word analysis to inspect each customer group by identifying useful keywords. The result was used to devise specifications of the new product. Aguwa et al. [8] used text mining and association rule mining to identify customer needs from qualitative and quantitative customer data by transforming customer input to engineering input. A probabilistic Naïve Bayes-based classifier has been proposed to automatically identify customer requirements by developing preferred features of a product's feature [9]. Aguwa et al. [10] used text mining to extract significant attributes of a product and to develop a real-time system to monitor customer feedback by learning association rules; the system includes a unique model that uses fuzzy logic to identify negative and positive feedback. Liang et al. [11] used the topic model of latent Dirichlet allocation (LDA) to identify product features that customers frequently mentioned, then identified product problems by exploring the relationship of product features using association rule mining. Qiao et al. [12] developed a new LDA model to identify critical product defects by defining keywords related to a specific product. Wang et al. [13] used sentiment analysis with a regression model to identify the impact of features of washing machines such as type, colour, display and energy-efficiency. Min et al. [14] developed a review-based Kano model that identifies customer needs by evaluating customer satisfaction with product attributes.
Prior studies [5][6][7][8][9][10][11][12][13][14] identified customer needs on the basis of product features by analysing unstructured text data provided by customers, but this approach is insufficient to represent customers' true requirements. When a customer uses a product, several steps are involved, such as ordering, installing, using, monitoring and arranging [29]. By focusing on the usage step alone, existing research has tended to overlook latent customer needs in the pre-usage and the post-usage steps. Because customer requirements are not derived across the various product usage steps, the combination of preferred product features in the usage step cannot be considered as the final product outcome that customers genuinely want. Therefore, existing studies result in imitative products, because customers tend to ask for the usual product features that existing competitors already offer [30].
To identify genuine customer needs, an alternative model must be developed to analyse unstructured text data of customers from a new perspective. This paper suggests a method to identify customers' true needs from customer complaints by using the concept of job in the ODI method.

Outcome-Driven Innovation Method
ODI identifies customer requirements by applying the concept of job, which is defined as the goal that the customers want to achieve, or problems that they try to solve in a given situation [18,30]. Christensen, et al. [17] argued that customer needs identified by considering only existing characteristics of the product were poor indicators of customer behaviour and that companies must identify customer needs by assessing how well customers get a job done while using products or services. Identifying customer needs based on the features of a product is a point-in-time solution that can change over time but job-based customer needs have a long-term focus that is stable over time. The concept of job also has no racial and religious features. Therefore, by using the concept of job, companies can easily comprehend what jobs customers are trying to complete, regardless of time, race, religion, or region. Furthermore, definite measurements such as speed and predictability can be used to evaluate how well customer requirements are actualized based on a specific job.
Initially, the job-based approach observed customers' use of products or services in detail [31]. Customers' reason for using products or services is to complete a job. Customers conduct several actions to achieve the job [29]. For example, "to lower indoor temperature," customers define goals on how much to reduce the temperature, investigate items that can perform the job, then install or use them. After executing these steps, customers monitor whether the product operates well and arrange the product. The concept of job helps companies to identify customer needs in various steps of product usage, so it is distinct from the existing research that concentrates on customer needs during one step of product execution [29,32,33,34].
The ODI method elicits customer needs by conducting in-depth interviews. To conduct such interviews effectively, ODI experts require advance information of the needs, which is produced by experienced managers who have interacted with many customers [19]. However, the experience of managers is subjective. Therefore, the ODI method needs supporting tools to derive customer requirements from empirical evidence by analysing large volumes of unstructured text data. Our study proposes a method that uses text mining to analyse unstructured text data from customers to support the ODI method.

Overall Research Framework
The overall process of customer complaints analysis for market-oriented product development (Figure 1) collects customer complaints regarding the target product, then extracts feature candidates from the unstructured contents of customer complaints. To apply the concept of job to analysis of the complaints, these candidates are first identified based on the diverse jobs of the target product. Then secondary candidates that have a great effect on clustering are selected. Collected customer complaints are grouped using a clustering method on the basis of significant features that are associated with various job steps. Analysis of the clustering result can identify customer requirements with a structure of customer needs.

Data Collection and Feature Candidate Extraction
Customer complaints about the target product are collected through the company's internal channels such as web sites or mobile applications of service centres, or through external channels such as third-party review sites (e.g., Epinions, Amazon customer reviews). The collected data should contain a sufficient number of complaints about the various steps of product usage. Whether the data are suitable for analysis can be roughly checked by searching for keywords related to the various steps of product usage.
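The rough suitability check described above can be sketched as a keyword count per usage step. This is only an illustration: the step names, the keyword lists and the function `step_coverage` are hypothetical and are not taken from the case study, and plain substring matching is assumed instead of a language parser.

```python
# Hypothetical keywords for a few usage steps; a real study would define
# these with domain experts and in the language of the complaints.
STEP_KEYWORDS = {
    "order": ["order", "delivery"],
    "install": ["install", "mount"],
    "use": ["cool", "temperature"],
    "monitor": ["display", "indicator"],
}


def step_coverage(complaints, step_keywords):
    """Count how many complaints mention at least one keyword of each step."""
    coverage = {step: 0 for step in step_keywords}
    for text in complaints:
        lowered = text.lower()
        for step, keywords in step_keywords.items():
            if any(keyword in lowered for keyword in keywords):
                coverage[step] += 1
    return coverage


complaints = [
    "Delivery was late",
    "The installer did not mount the unit properly",
    "Room does not cool",
]
coverage = step_coverage(complaints, STEP_KEYWORDS)
# Steps with no matching complaint suggest that more data may be needed.
missing_steps = [step for step, count in coverage.items() if count == 0]
```

Steps that end up in `missing_steps` indicate usage steps for which the collected data may be too thin to support the later analysis.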
From the collected complaints, we extract single keywords, multiple keywords and action-object (AO) or subject-action (SA) combinations, which all become candidates for the features that represent the free-form text of customer complaints. To extract feature candidates, a language parser such as Stanford parser or Korean NLP Package is first used to automatically tag parts of speech such as noun, verb, adjective, or adverb. Then standard pre-processing is conducted [35]: steps include transforming all text to lowercase; tokenizing customer complaints; eliminating meaningless stop words (e.g., 'she,' 'the,' 'but,' 'what'); lemmatizing words (e.g., to convert 'foreign substances' and 'generated' to the root forms 'foreign substance' and 'generate'); and removing words that occur either too frequently or very rarely. Single keywords are identified from nouns and multiple keywords are identified by n-gram modelling, which models sequences of natural language by using statistical properties. An AO or an SA is identified using a matrix that represents the co-occurrence of keywords and verbs.
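The pre-processing and n-gram steps can be illustrated with a minimal sketch. It assumes English text, a tiny hand-written stop-word list and a hypothetical lemma table (`LEMMAS`) in place of a real parser and lemmatizer; part-of-speech tagging and the keyword-verb co-occurrence matrix are not covered.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real analysis would use a fuller list.
STOP_WORDS = {"she", "the", "but", "what", "a", "in", "and", "of", "to"}

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"substances": "substance", "generated": "generate", "leaks": "leak"}


def preprocess(complaint):
    """Lowercase, tokenize, drop stop words and lemmatize one complaint."""
    tokens = re.findall(r"[a-z]+", complaint.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_counts(complaints, n=2, min_df=1, max_df_ratio=1.0):
    """Document frequency of single and n-gram keywords, filtered by frequency."""
    df = Counter()
    for text in complaints:
        tokens = preprocess(text)
        # Count each term once per complaint (document frequency).
        df.update(set(tokens) | set(ngrams(tokens, n)))
    limit = max_df_ratio * len(complaints)
    return {term: c for term, c in df.items() if min_df <= c <= limit}


complaints = [
    "Foreign substances generated in the water tank",
    "The water tank leaks and foreign substances generated",
]
counts = candidate_counts(complaints)
```

The parameters `min_df` and `max_df_ratio` are one simple way to drop terms that occur very rarely or too frequently; the thresholds themselves would need to be chosen for the data at hand.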

First Job-Based Feature Selection
We select first job-based features from the extracted feature candidates (e.g., single keyword, multiple keyword, AO, SA) by considering how closely they relate to a job map. The job map represents the process of product usage by customers, separated from technical solutions, and is composed of universal process steps [29]: defining what the job needs; gathering and locating required inputs; organizing and preparing the components in accordance with circumstance; confirming that the task is ready to be performed; executing the job; monitoring the result of execution; making modifications; ending the job. For example, the job map of "lowering indoor temperature" is expressed in the form of AOs that present specific jobs under the universal job steps [33] (Figure 2); an object is expressed by a single keyword or multiple keywords. Therefore, relatedness is assessed by checking whether each feature candidate semantically corresponds to AOs of the job map. To conduct semantic processing, we use WordNet, which is a large hierarchical generic database of English words; or a standard Korean dictionary, which is the Korean version of WordNet. This job-based feature selection helps to analyse customer complaints in various jobs from beginning to end.
Then the first job-based feature is determined by considering the dependency of innate features such as synonyms regardless of clustering algorithms, because feature dependency degrades clustering accuracy [36]. If the object in an AO overlaps single keywords or multiple keywords, the first features will remove duplicated keywords and retain AOs that represent customer complaints in detail.
After selecting the first job-based features, customer complaints that do not contain the first feature set are considered as the first outlier set, which is analysed. Many of these first outliers are customer complaints that are not related to various job steps but were collected as a result of the inaccuracy of a parser.
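A minimal sketch of the job-based selection and of the first outlier set follows. The job map entries, the synonym table (a stand-in for a WordNet-style lookup) and all function names are hypothetical; the sketch only handles AO candidates and uses exact matching after synonym normalization.

```python
# Hypothetical job map: job step -> action-object (AO) pairs.
JOB_MAP = {
    "prepare": [("install", "unit")],
    "execute": [("lower", "temperature")],
    "monitor": [("check", "temperature")],
    "conclude": [("clean", "filter")],
}

# Hypothetical synonym table standing in for a dictionary-based lookup.
SYNONYMS = {
    "mount": "install",
    "reduce": "lower",
    "wash": "clean",
    "air conditioner": "unit",
}


def normalize(word):
    """Map a word to its canonical form if a synonym is known."""
    return SYNONYMS.get(word, word)


def match_job_steps(candidate):
    """Return the job steps whose AOs match the candidate (action, object)."""
    action, obj = (normalize(w) for w in candidate)
    return [step for step, aos in JOB_MAP.items() if (action, obj) in aos]


def select_first_features(candidates):
    """Keep only the candidates that correspond to an AO of the job map."""
    selected = {}
    for candidate in candidates:
        steps = match_job_steps(candidate)
        if steps:
            selected[candidate] = steps
    return selected


def first_outliers(complaint_features, selected):
    """Ids of complaints that contain none of the selected features."""
    return [cid for cid, feats in complaint_features.items()
            if not any(f in selected for f in feats)]


candidates = [("mount", "air conditioner"), ("reduce", "temperature"),
              ("wash", "filter"), ("call", "centre")]
selected = select_first_features(candidates)
outliers = first_outliers(
    {1: [("mount", "air conditioner")], 2: [("call", "centre")]}, selected)
```

Complaint 2 ends up in the outlier set because its only candidate does not correspond to any AO in the assumed job map.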

Second Feature-Selection for Clustering
After the first feature set related to the job process is finalized, the second feature set is selected to increase the accuracy of clustering among the first feature set by applying a dimensionality-reduction technique to the text data. These techniques can be divided into feature extraction and feature selection [37,38]. Feature extraction generates a set of new features with reduced dimensionality from the original features; the goal is to identify the most important influences. Examples of the technique include principal component analysis (PCA) [39] and word clustering [40]. However, the new features created by feature extraction techniques may not have a clear physical meaning, so the clustering results may be difficult to interpret. In contrast, feature selection chooses a small subset of the original feature set by considering how the subset affects the clustering result; examples include document frequency (DF) [41], entropy-based ranking (En) [42] and term contribution (TC) [37]. Compared to feature extraction, these techniques provide better readability and interpretability of the clustering results, because their physical meanings are not lost.
Therefore, this paper uses TC to select second features from the first features (Figure 3). TC is suited to interpret the clustering results in text data and does not have high computational complexity. First, by calculating term frequency-inverse document frequency (TF-IDF) value between customer complaints and first features, a matrix is represented. Then the second features are selected using TC and a matrix that represents TF-IDF values between customer complaints and second features is constructed.
TC is calculated based on the similarity of customer complaints and therefore gives a high value to features that have a major influence on similarities among many customer complaints. The similarity of customer complaints is calculated by cosine similarity:
$$\mathrm{sim}(c_i, c_j) = \sum_{f} f(f, c_i) \times f(f, c_j)$$
where $c_i$, $c_j$ are two customer complaints and $f(f, c_i)$, $f(f, c_j)$ represent the TF-IDF [43] weight of feature $f$ in customer complaints $c_i$ and $c_j$.
TC is calculated as [37]:
$$TC(f) = \sum_{i,j \,\cap\, i \neq j} f(f, c_i) \times f(f, c_j)$$
The second feature set is determined by TC rankings. After the second feature selection for clustering, customer complaints that do not include the second features are regarded as second outliers; these are trivial complaints that are not as important as the main customer needs based on the job.
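As an illustration, TC can be computed without enumerating complaint pairs. TC of a feature sums, over all pairs of distinct complaints, the product of the feature's TF-IDF weights in the two complaints; this equals the squared sum of the weights minus the sum of the squared weights. The sketch below assumes raw term frequency, an IDF of log(N/df), unit-length normalization (so that the dot product is the cosine similarity) and a sum over ordered pairs; these choices and all function names are assumptions, and counting unordered pairs instead would only halve every score without changing the ranking.

```python
import math
from collections import Counter, defaultdict


def tfidf_matrix(docs):
    """docs: list of feature lists. Returns one {feature: weight} dict per doc,
    with TF-IDF weights normalized to unit length."""
    n = len(docs)
    df = Counter(f for doc in docs for f in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)
        weights = {f: tf[f] * math.log(n / df[f]) for f in tf}
        norm = math.sqrt(sum(v * v for v in weights.values()))
        rows.append({f: v / norm for f, v in weights.items()} if norm > 0 else {})
    return rows


def similarity(ci, cj):
    """Dot product of two normalized rows, i.e. their cosine similarity."""
    return sum(w * cj.get(f, 0.0) for f, w in ci.items())


def term_contribution(rows):
    """TC(f) over ordered pairs i != j, via (sum of w)^2 - sum of w^2."""
    total = defaultdict(float)
    squares = defaultdict(float)
    for row in rows:
        for f, w in row.items():
            total[f] += w
            squares[f] += w * w
    return {f: total[f] * total[f] - squares[f] for f in total}


docs = [
    ["leak", "water", "tank"],
    ["leak", "noise"],
    ["noise", "fan"],
    ["install", "delay"],
]
rows = tfidf_matrix(docs)
tc = term_contribution(rows)
ranked = sorted(tc, key=tc.get, reverse=True)  # highest TC first
```

A feature that occurs in only one complaint contributes to no pair and therefore has a TC of zero, which is why such features fall to the bottom of the ranking.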

Clustering Analysis
Semantic similarities of customer complaints are calculated using cosine similarity based on the second feature vector and its synonyms in diverse contexts [44], then the clustering algorithm is performed on the similarity matrix. Provided that the clustering result is statistically significant in representing the original data, various clustering methods such as spectral clustering, k-means clustering and hierarchical clustering can be applied according to the properties of the data. Spectral clustering is effective if a clustering result is expected to have a small number of clusters and k-means clustering is appropriate if experts know the number k of clusters. Hierarchical clustering is suitable if the clusters have different sizes. This paper uses hierarchical clustering because clusters in the case data are expected to have different sizes.
After clustering has been performed, the number of clusters is determined by considering whether the clusters as a whole cover customer complaints about diverse job steps and whether each cluster contains related customer complaints about a single job. In each cluster, these criteria can be checked easily by looking for representative features, which are features found in more than a certain share of the cluster's customer complaints.
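The clustering step and the search for representative features can be sketched as follows. The sketch uses average-linkage agglomerative clustering with a similarity threshold as the stopping rule; the linkage criterion, the threshold, the 50% share used for representative features and the toy similarity matrix are all assumptions made for illustration.

```python
from collections import Counter


def average_linkage(sim, threshold):
    """Agglomerative clustering on a symmetric similarity matrix.

    Repeatedly merges the two clusters with the highest average pairwise
    similarity until no pair of clusters exceeds `threshold`.
    """
    clusters = [[i] for i in range(len(sim))]

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = threshold, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if pair is None:  # no remaining pair is similar enough to merge
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def representative_features(cluster, complaint_features, min_share=0.5):
    """Features found in at least `min_share` of the cluster's complaints."""
    counts = Counter(f for i in cluster for f in set(complaint_features[i]))
    return sorted(f for f, c in counts.items() if c / len(cluster) >= min_share)


sim = [
    [1.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 1.0, 0.7, 0.0, 0.1],
    [0.8, 0.7, 1.0, 0.1, 0.0],
    [0.1, 0.0, 0.1, 1.0, 0.6],
    [0.0, 0.1, 0.0, 0.6, 1.0],
]
features = [["leak", "tank"], ["leak"], ["leak", "hose"],
            ["remote"], ["remote", "battery"]]
clusters = average_linkage(sim, threshold=0.5)
reps = [representative_features(c, features) for c in clusters]
```

Lowering or raising `threshold` changes the number of clusters, so in practice it would be tuned until each cluster reads as a single job and the clusters together cover the diverse job steps.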
Lastly, ODI experts analyse the clustering results (Figure 4) to elicit customer requirements based on the structure of customer needs [45]. When they analyse the clustering results, ODI experts follow rules for structuring customers' statements of need (Table 1). Well-formatted requirements include the type of improvement (minimize, increase) and a unit of measure (time, likelihood, number, frequency, amount, risk). ODI experts distinguish between requirements and solutions, clarify vague statements and eliminate duplicates. The structure of customer needs in ODI helps to capture concrete requirements of customers.

Table 1. Rules for structuring customer requirements in outcome-driven innovation (ODI) method (from [45]).

1. Needs statements must be free from solutions and specifications, and stable over time.
2. Needs statements must not include words that will cause ambiguity or confusion, for example, certain adjectives and adverbs, pronouns, process words, jargon, acronyms.
3. Needs statements must be specific without sacrificing brevity.
4. Needs statements must follow the rules of proper grammar.
5. Do not use different terms to describe the same item or activity from statement to statement; be consistent in language.
6. Needs statements must have a consistent structure, content and format.
7. Needs statements must relate to the primary job of interest and not to ancillary jobs.
8. Needs statements must be introduced with only one of two words: minimize (90%) or increase (10%).
9. Needs statements must contain a metric (time, likelihood, number) so performance can be measured.
10. Examples added to the end of a statement for purposes of clarification must be similarly and consistently formatted.
11. Needs statements must be usable in all downstream activities, for example, questionnaires, for deployment.
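Two of these rules lend themselves to a mechanical check: the opening word (rule 8) and the presence of a metric (rule 9). The sketch below covers only those two rules, uses the units of measure named in the text as its metric list, and relies on a hypothetical helper `check_need_statement`; the remaining rules still require expert judgement.

```python
import re

DIRECTIONS = ("minimize", "increase")
METRICS = ("time", "likelihood", "number", "frequency", "amount", "risk")


def check_need_statement(statement):
    """Return a list of rule violations; an empty list means both checks pass."""
    problems = []
    words = re.findall(r"[a-z]+", statement.lower())
    if not words or words[0] not in DIRECTIONS:
        problems.append("must start with 'minimize' or 'increase'")
    if not any(metric in words for metric in METRICS):
        problems.append("must contain a metric such as time or likelihood")
    return problems


ok = check_need_statement(
    "Minimize the time it takes to lower the indoor temperature")
bad = check_need_statement("Faster cooling")
```

The example statement is invented for illustration and is not one of the requirements derived in the case study.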

Collecting Data and Extracting Feature Candidates
The proposed method was used to analyse customer complaints in 2013 recorded from web sites or mobile applications of customer service centres in a large Korean company with annual sales of 56 billion South Korean won (~ US$ 47 million) that manufactures electronic appliances. In that interval, the electronics company wanted to identify customers' true needs for innovative product development by adopting a new text mining approach to analyse customer complaints. The company  Table 1. Rules for structuring customer requirements in outcome-driven innovation (ODI) method (from [45]).

1.
Needs statements must be free from solutions and specifications-and stable over time.

2.
Needs statements must not include words that will cause ambiguity or confusion, for example, certain adjectives and adverbs, pronouns, process words, jargon, acronyms.

3.
Needs statements must be specific without sacrificing brevity.

4.
Needs statements must follow the rules of proper grammar.

5.
Do not use different terms to describe the same item or activity from statement to statement; be consistent in language.

6.
Needs statements must have a consistent structure, content and format.

7.
Needs statements must relate to the primary job of interest and not to ancillary jobs.

8.
Needs statements must be introduced with only one of two words: minimize (90%) or increase (10%).

9.
Needs statements must contain a metric (time, likelihood, number) so performance can be measured.

10.
Examples added to the end of a statement for purposes of clarification must be similarly and consistently formatted.

11.
Needs statements must be usable in all downstream activities, for example, questionnaires, for deployment.
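
Two of the rules in Table 1 (rules 8 and 9) are mechanical enough to check automatically. The following is a minimal sketch of such a check, not part of the ODI method itself; the function name `check_needs_statement` and the example statements are hypothetical, and the metric list is taken literally from rule 9.

```python
import re

# Rule 8: a needs statement starts with one of these two words.
DIRECTIONS = ("minimize", "increase")
# Rule 9: a needs statement names a metric (list taken literally from Table 1).
METRICS = ("time", "likelihood", "number")


def check_needs_statement(statement):
    """Return a list of detected rule violations (empty if none were found)."""
    problems = []
    words = re.findall(r"[a-z]+", statement.lower())
    if not words or words[0] not in DIRECTIONS:
        problems.append("must start with 'minimize' or 'increase'")
    if not any(metric in words for metric in METRICS):
        problems.append("must contain a metric (time, likelihood, number)")
    return problems


# Hypothetical statements, for illustration only.
good = check_needs_statement("Minimize the time it takes to lower the indoor temperature")
bad = check_needs_statement("Make the air conditioner quieter")
```

The remaining rules (for example, freedom from solutions, or relevance to the primary job) require expert judgment and are not covered by this sketch.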

Collecting Data and Extracting Feature Candidates
The proposed method was used to analyse customer complaints recorded in 2013 through the web sites and mobile applications of the customer service centres of a large Korean company that manufactures electronic appliances and has annual sales of 56 billion South Korean won (~US$ 47 million). At that time, the company wanted to identify customers' true needs for innovative product development by adopting a new text mining approach to analyse customer complaints. The company provided complaints about several appliances. Stand-type air conditioners were selected for the case study for two reasons. First, the database system that records customer complaints about this appliance was well established, so it was suitable for extracting metadata with queries. Second, the air conditioner market is becoming increasingly competitive as a growing number of new products is introduced to satisfy unmet needs, so the company must develop new and competitive products.
To gain market share in these circumstances, innovative technology and products must be developed to attract customers. We collected 2362 customer complaints made in Korea in the period 2013/01-2013/12 from the customer database systems by using a search query. Most customers expressed complaints in voice calls, so the volume of textual complaints was modest. However, these complaints were expected to reveal customer needs in the various steps of product usage because they concerned the product preparation, installation, execution and monitoring steps.
Parts of speech such as nouns, verbs, adjectives and adverbs were tagged with the hannanum parser, a Korean NLP package and POS tagger that has been developed by the Semantic Web Research Centre at KAIST since 1999. After text pre-processing, rare words that occurred 10 times or fewer and very common words that occurred 500 times or more were excluded because of their low discriminatory power [46]. A total of 312 single keywords were extracted from the POS-tagged nouns, and 73 multiple keywords were extracted by using the frequencies of bigrams and trigrams; 62 AOs or SAs were extracted by building a matrix that records keyword-verb pairs that co-occurred more than 100 times. Together these gave 447 feature candidates.
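
The frequency filter and the bigram-based keyword extraction can be sketched as follows. This is a minimal illustration on already-tokenized toy data, not the implementation used in the study; the function names and the toy thresholds are assumptions (the study excluded words occurring 10 times or fewer, or 500 times or more), and POS tagging is assumed to have been done beforehand.

```python
from collections import Counter


def filter_by_frequency(docs, min_count, max_count):
    """Keep tokens whose corpus frequency is above min_count and below max_count."""
    counts = Counter(tok for doc in docs for tok in doc)
    keep = {tok for tok, c in counts.items() if min_count < c < max_count}
    return [[tok for tok in doc if tok in keep] for doc in docs], keep


def frequent_bigrams(docs, min_count):
    """Count adjacent token pairs and keep those seen at least min_count times."""
    counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    return {pair: c for pair, c in counts.items() if c >= min_count}


# Toy tokenized complaints (hypothetical data).
docs = [
    ["the", "cool", "air", "weak"],
    ["the", "cool", "air", "noise"],
    ["the", "noise", "loud"],
    ["the", "cool", "air", "weak"],
    ["remote", "noise"],
]

# With the toy thresholds, tokens seen once ("loud", "remote") or four times
# ("the") are dropped; in the study the bounds were 10 and 500.
filtered, kept = filter_by_frequency(docs, min_count=1, max_count=4)
bigrams = frequent_bigrams(filtered, min_count=2)
```

In this sketch bigrams are counted after filtering, so two tokens can become adjacent once an excluded word between them is removed; whether the study counted n-grams before or after filtering is not stated.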

Selecting First Job-Based Feature
In this step, job-based features were first selected by their relevance to the job map, and the first feature set was then determined by considering the dependency among features. We identified 63 features from the 447 feature candidates by semantically comparing them with the AOs of the job map of "lowering indoor temperature", consulting the standard Korean dictionary. These 63 features included 48 keywords such as 'installation (설치),' 'operation (작동),' 'power (전원),' 'noise (소음)' and 'voice (음성)' and 15 AOs or SAs such as 'cool-air weak (냉기가 약하다).' A first feature set of 54 features was obtained by removing features that had synonym relationships.
After the first feature set was selected, 83 of the 2362 customer complaints were not represented by it and were treated as the first outliers. The first outliers included complaints about functions such as the dry function and its smartphone control, which are not associated with the core jobs of an air conditioner. Some of the first outliers were caused by the hannanum parser, which sometimes mistook nouns for verbs.
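
The outlier rule can be illustrated with a short sketch. It assumes that a complaint is "represented" when it contains at least one feature of the feature set; the paper does not spell out the exact criterion, so this reading, the function name `split_outliers` and the toy data are assumptions.

```python
def split_outliers(complaints, feature_set):
    """Split tokenized complaints into those that contain at least one
    feature (represented) and those that contain none (outliers)."""
    represented, outliers = [], []
    for tokens in complaints:
        if feature_set.intersection(tokens):
            represented.append(tokens)
        else:
            outliers.append(tokens)
    return represented, outliers


# Toy data (hypothetical): English glosses stand in for the Korean features.
features = {"installation", "noise", "power"}
complaints = [
    ["noise", "loud"],
    ["dry", "smartphone"],  # no feature present, so it becomes an outlier
    ["power", "off"],
]
represented, outliers = split_outliers(complaints, features)
```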

Selecting Second Feature for Clustering
The second feature set for clustering was selected from the 54 features in the first feature set by TC ranking. After the first feature vectors that represent customer complaints were constructed based on TF-IDF, the TC value of each feature in the first feature set was calculated (Table A1, Appendix A). Features that strongly affect the similarity of customer complaints, and therefore the clustering result, had higher TC values than the others. The second feature set was determined by ranking the features of the first feature set by TC value. Although the TC values differed widely among the top-ranked features, the second feature set retained many crucial features, which help to identify customer requirements across the features that represent job steps. In the end, the 54 features in the first feature set were reduced to 33 in the second feature set, and the second feature vectors that represent customer complaints were built for clustering based on TF-IDF.
After the second feature set was selected, 12 customer complaints that it did not represent were treated as second outliers. Analysis showed that these consisted of unimportant customer questions, such as requests for gas-leak notification and function-modification alarms in the 'monitor' job.
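
The TC ranking can be sketched as follows. The sketch assumes the common definition of term contribution, in which the TC of a term is the sum, over all ordered pairs of distinct documents, of the product of the term's TF-IDF weights in the two documents, and it assumes a plain tf × log(N/df) weighting with L2 normalisation; the exact variants used in the study may differ. Function names and toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocabulary):
    """L2-normalised TF-IDF vectors as {term: weight}, restricted to vocabulary."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc) if t in vocabulary)
    vectors = []
    for doc in docs:
        tf = Counter(t for t in doc if t in vocabulary)
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm > 0 else {})
    return vectors


def term_contribution(vectors):
    """TC(t) = sum over ordered pairs i != j of w(t, d_i) * w(t, d_j),
    computed as (sum of weights)^2 - (sum of squared weights)."""
    total, squares = Counter(), Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
            squares[t] += w * w
    return {t: total[t] * total[t] - squares[t] for t in total}


# Toy tokenized complaints (hypothetical data).
docs = [["noise", "loud"], ["noise", "power"], ["noise", "loud"], ["install"]]
vocab = {"noise", "loud", "power", "install"}

tc = term_contribution(tfidf_vectors(docs, vocab))
ranked = sorted(tc, key=tc.get, reverse=True)
second_feature_set = ranked[:2]  # keep the top-k features by TC
```

A term that occurs in only one document contributes nothing to any pairwise similarity, so its TC is zero; this is why features with low TC can be dropped with little effect on the clustering.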

Analysing Clusters
After the first and second outliers had been eliminated, hierarchical clustering was performed on a 2267 × 2267 similarity matrix of customer complaints, composed of cosine similarities calculated from the second feature vectors. Various linkage methods such as single linkage, complete linkage, average linkage, weighted linkage and centroid linkage can be used for clustering. The linkage method was selected by cophenetic correlation, which measures how faithfully the dendrogram preserves the pairwise dissimilarities of the original data. The average linkage method had the highest cophenetic correlation, 0.86, which is high enough to demonstrate the reliability of the clustering process. This value indicates that the dendrogram preserves the pairwise dissimilarities of the complaints well (it is a correlation coefficient rather than a proportion of explained variation), so average linkage was used for hierarchical clustering.
We determined the number of clusters by identifying representative features that present diverse job steps and analogous customer complaints about a single job. A feature was considered 'representative' if it represented 70 percent of the customer complaints in a cluster. On this basis, 14 clusters were obtained that cover diverse job steps and group analogous complaints about a single job; they were analysed to elicit customer requirements.
The ODI experts then identified analogous customer complaints in each cluster and elicited 22 customer requirements by using the structure of customer needs in the ODI (Table A2, Appendix B). The most commonly mentioned job step was the 'Execute' job step, with 33.4% of the total. In clusters 1, 2 and 3, we found that most customers complained when the air conditioner's ability to emit cool air did not meet their expectations. The second most frequently reported job step was the 'Monitor' job step (32.3%). In clusters 9, 12 and 13, we found that most customers complained about the noisiness of the air conditioner. The third most commonly mentioned job step was the 'Modify' job step (11.6%); the 'Prepare' and 'Confirm' job steps (4%) were the final job steps identified. In cluster 5, we found that some customers complained about side effects and monitoring after installation.
Product managers who had worked in the customer service centre for ten years identified the 22 customer needs and agreed on the primary customer needs. The managers also regarded the customer needs in the 'Prepare' and 'Confirm' job steps as valuable unexpressed needs that the existing method cannot discover.
Previous studies that analysed customer reviews or complaints identified only part of customers' needs because they focused on visible problems, based on product attributes, in the step during which customers use the product [29]. To identify the full set of problems, we analysed customer complaints by using the concept of job in the ODI method. This concept helps to identify customer needs in various steps, including preparation, installation, execution and monitoring of the product. This job-based analysis can identify latent customer needs in the pre-execution and post-execution steps, which prior research cannot discover. Job-based analysis can also state these needs precisely by using the customer requirement format of the ODI method [45]. As a result, this research helps to identify requirements at all the points at which customers want to obtain help from the product.
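
The selection of a linkage method by cophenetic correlation can be sketched in plain Python on a toy example. This is an illustrative implementation of average linkage and of the cophenetic correlation (the Pearson correlation between the original pairwise distances and the dendrogram heights at which pairs are first merged), not the code used in the study; the function names and toy vectors are hypothetical, and for a matrix of 2267 complaints an optimized library routine would be preferable to this naive loop.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def average_linkage(dist):
    """Naive average-linkage agglomeration.
    Returns cophenetic distances {(i, j): merge height} for all i < j."""
    clusters = {i: [i] for i in range(len(dist))}
    coph = {}

    def avg(a, b):
        # Mean distance over all cross-cluster pairs of original points.
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(dist[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: avg(*p))
        height = avg(a, b)
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = height
        clusters[a] = clusters[a] + clusters.pop(b)
    return coph


def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances."""
    pairs = sorted(coph)
    xs = [dist[i][j] for i, j in pairs]
    ys = [coph[p] for p in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy feature vectors forming two tight groups (hypothetical data).
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
dist = [[cosine_distance(u, v) for v in vectors] for u in vectors]
coph = average_linkage(dist)
c = cophenetic_correlation(dist, coph)
```

Repeating the computation with other linkage rules and keeping the rule with the highest coefficient corresponds to the selection procedure described above.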
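
The representativeness check can be sketched as follows. The sketch reads "represented 70 percent" as "occurs in at least 70 percent of the complaints in the cluster", which is an assumption; the function name and toy cluster are hypothetical.

```python
def representative_features(cluster, features, threshold=0.7):
    """Features that occur in at least `threshold` of the complaints in a cluster.
    `cluster` is a list of token sets, one per complaint."""
    n = len(cluster)
    return {
        f for f in features
        if sum(f in tokens for tokens in cluster) / n >= threshold
    }


# Toy cluster of five complaints (hypothetical data).
cluster = [
    {"noise", "loud"},
    {"noise", "fan"},
    {"noise"},
    {"power", "noise"},
    {"fan"},
]
reps = representative_features(cluster, {"noise", "fan", "power"})
```

Under this reading, a cut of the dendrogram would be accepted when every resulting cluster has such a representative feature; how ties and clusters without a representative feature were handled is not described in the paper.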

Conclusions and Future Study
This research proposes an ODI-based method that uses text mining to identify customers' true needs from customer complaints. This study identified not only customer needs in the execution step of product use but also latent customer needs in the pre-execution and post-execution steps. The discovery of these latent needs distinguishes this study from existing research. The proposed method also provides ODI experts with supporting tools to analyse a large number of customer complaints.
These tools provide a clustering that can present complaints about various jobs, so experts can derive customer needs from the clustering result. These needs derived from customer data can effectively provide advance information that can help experts to conduct in-depth interviews; the data can also reduce the need to use experienced managers in the interview process. Companies can gain new insights by combining the knowledge from analysis of these complaints with the previous knowledge of managers [47].
The analysis of customer complaints about Korean air conditioners offers managerial insights to practitioners. Decision makers can obtain directions for innovative product development from the customer perspective and allocate resources according to customers' true requirements. Needs analysts can use the proposed method as a tool for job-based analysis of a large volume of unstructured complaint text. Developers can identify, from the customer's viewpoint, the specifications that the final product should satisfy.
However, this study has some limitations, which provide directions for further research. First, this paper analysed complaints from end users to identify customer needs; the analysis focused on five job stages (prepare, confirm, execute, monitor and modify) even though the job map consists of eight stages. Therefore, future analysis should capture all customer requirements, including those related to purchase motivation and delivery of the product.
Second, a customer database system that records customer complaints in diverse contexts through various channels must be constructed in advance so that the suggested framework can be applied. In the case study, the proposed method was applied to complaints recorded in an existing customer database, but analysis of such a small and possibly biased set of customer complaints may not clearly identify the latent needs, so the reliability of the result is not guaranteed. Finally, the proposed framework combines an automatic method with expert judgment and is not fully automated. Therefore, the output of the method is a static analysis that requires effort from a human expert. The method is expected to be improved so that it becomes fully automated and supports dynamic real-time analysis.

Conflicts of Interest:
The authors have no conflict of interest.