A Machine Learning Approach to Fitting Prescription for Hearing Aids

A successful Hearing-Aid Fitting (HAF) is more than just selecting an appropriate Hearing Aid (HA) device for a patient with Hearing Loss (HL). The initial fitting is given by a prescription based on the user's hearing loss; however, the audiologist often needs to readjust some parameters to satisfy the user's demands. Therefore, in this paper, we concentrate on a new application of a Neural Network (NN) combined with a Transfer Learning (TL) strategy to develop a fitting algorithm from a prescription database of hearing loss and readjusted gain, with the aim of closing the gap in fitting satisfaction. As prior information, we generated the data set from two popular hearing-aid fitting software packages, fed the training data to our proposed model, and verified the performance of the architecture. Considering real-life circumstances, where numerous fitting records may not always be accessible, we first investigated the minimum number of fitting records required for sufficient training. After that, we evaluated the performance of the proposed algorithm in two phases: (a) the NN with refined hyper-parameters showed enhanced performance compared to the state-of-the-art DNN approach, and (b) the TL approach broadly boosted the performance of the NN algorithm. Altogether, our model provides a pragmatic and promising tool for HAF.

NAL-NL1 [2,3], the first prescription fitting procedure for prescribing nonlinear gain, was developed by the National Acoustic Laboratories (NAL) in 1999. This purely theoretical fitting formula aims at maximizing speech intelligibility while ensuring the overall loudness does not exceed the loudness perceived by a normal-hearing person. Like its progenitor NAL-NL1, the second-generation prescription procedure NAL-NL2 [4,5] also aims to make speech intelligible and overall loudness comfortable. However, the theoretical derivation of NAL-NL2 differs from that of NAL-NL1 in two major points: (a) the intelligibility model and (b) the gain constraint. As a result, NAL-NL2 prescribes relatively more gain at high and low frequencies than at the mid frequencies. Moreover, NAL-NL2 takes the hearing-aid user's age, gender, experience, language, and compressor speed into consideration [4]. On the other hand, the DSL fitting [6] aims at comfortable listening levels to maximize speech recognition performance in each frequency region. The procedure uses the desired sensation levels to calculate its empirical real-ear aided gain.
Standard machine learning approaches to predicting hearing-aid fitting could be a potential solution for audiologists in clinical applications. Recently, deep learning has become a mainstream technology for solving long-standing problems in the artificial intelligence (AI) community.

• First, we extracted the fitting data set from two popular nonlinear hearing-aid fitting software packages (NAL-NL1 and NAL-NL2) and divided the randomly shuffled data set into two groups for training two different models.
• Second, we investigated the minimum number of fitting records required for sufficient training and observed that more than 1500 records are essential.
• Third, we trained the NN model with improved hyper-parameters, using random weight initialization and the exponential weight decay concept; this approach shows enhanced performance.
• Fourth, we applied the inductive parameter transfer learning approach in our second model under a smaller data set condition with the same hyper-parameters. We transferred the final weights from the NN model as the initial weights for the TL model, and it performed surprisingly well over the traditional NN model.
• Finally, we compared and analyzed the output of the two models for verification.
The model adopted in this research was selected by reviewing the top-performing machine learning algorithms applied to hearing-aid fitting prescription. As no other papers were found on the topic of transfer-learning-based fitting algorithms, our work on engineering features to estimate the expected gain for hearing loss patients is, to our knowledge, novel.

Background
In this section, we introduce the basic concepts of neural networks and transfer learning as the background of our proposed NN-TL-based algorithm.

Fully Connected Neural Networks
Deep Neural Networks (DNNs) are distinguishable from other types of neural networks by their depth, computational complexity, and performance. In a fully connected feed-forward DNN model [20], each node is connected to all the nodes in the previous layer. The computational procedure of a fully connected DNN, involving matrix-vector arithmetic and transformation by the activation function, can be expressed as follows:

Y = σ(WX + b),

where Y and X are the outputs of the current layer and previous layer, respectively; W is the weight matrix; b is the bias vector; and σ is the activation function. Each node in a neural network computes a weighted sum of all of its inputs, adds a constant called the bias, and then feeds the result through some nonlinear activation function (e.g., sigmoid, softmax, or ReLU). Nodes in the final layer produce the prediction, and a loss function measures the distance between the actual output and the predicted output. The connections in a neural network are shown in Figure 1.
Figure 1. Neural network connections.
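The single-layer computation described above can be sketched in a few lines of numpy; the layer sizes and values here are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    # classical sigmoid activation: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

# One fully connected layer: Y = sigma(W X + b)
X = np.array([0.2, 0.7, 0.1])   # outputs of the previous layer (3 nodes)
W = np.full((2, 3), 0.5)        # weight matrix (2 current nodes x 3 inputs)
b = np.zeros(2)                 # bias vector
Y = sigmoid(W @ X + b)          # outputs of the current layer
```

Each row of `W` holds the weights of one node in the current layer, so `W @ X` computes all the weighted sums at once.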

Transfer Learning
In most applications in the real world, it is expensive or impossible to collect the necessary training data and to rebuild different models. Here comes the knowledge transfer or transfer learning approach that could reduce the effort of data collection. The main aim is to extract the knowledge from one domain and to apply the knowledge in another domain.
In transfer learning, what to transfer, when to transfer, and how to transfer are the three main research issues. Inductive transfer learning, transductive transfer learning, and unsupervised transfer learning are three popular sub-settings of this approach. Based on "what to transfer", the approaches in the above sub-settings can be further classified into four classes [16]. The specific class used in this paper can be referred to as the parameter-transfer approach. Parameter (weight) sharing concepts have been widely used in different machine learning models. A very common form, hard weight sharing, uses previously trained weights as the initial weights of a neural network model. A general parameter-based transfer learning method is shown in Figure 2.

Figure 2. Parameter-based transfer learning: random initial weights in the source domain, transferred weights in the target domain.

Proposed Architecture
A simple block diagram of our proposed hearing-aid fitting model is shown in Figure 3. The data set contains hearing loss information at 6 different frequencies for each individual subject. Other than the hearing loss and the insertion gain for the corresponding loss, the data set does not contain any other features. Therefore, as pre-processing, we converted the hearing loss data to binary and then positioned the codes based on frequency hierarchy to create more features (6 frequencies × 7-bit binary conversion × 6 positions), as shown in Figure 4. We then applied the NN and TL concepts as two separate models. Finally, we compare and verify the results of these two models.
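The pre-processing step might be sketched as follows. This is a minimal reading of the text, assuming the maximum loss of 70 dB motivates the 7-bit width and that each frequency's code is tiled across the 6 hierarchy positions; the exact positioning scheme is shown only in the paper's Figure 4.

```python
import numpy as np

FREQS_HZ = [250, 500, 1000, 2000, 4000, 8000]  # the 6 audiometric bands

def to_bits(db_value, width=7):
    """7-bit binary encoding of a hearing loss value (70 dB max fits in 7 bits)."""
    return [(int(db_value) >> i) & 1 for i in reversed(range(width))]

def encode_hearing_loss(loss_db):
    """Binary-encode one subject's loss at the 6 frequencies, then tile the
    42-bit code over 6 hierarchy positions: 6 freqs * 7 bits * 6 positions
    = 252 input features (assumed layout, not taken from Figure 4)."""
    bits = np.array([to_bits(v) for v in loss_db]).ravel()  # 42 bits
    return np.tile(bits, 6)                                 # 252 features

features = encode_hearing_loss([40, 42, 41, 43, 40, 42])
```

The point of the expansion is simply to give the network a richer, sparse input representation than six raw dB values.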

Neural Network with Refined Hyper-Parameter
In the first part of our proposed design, we considered a neural network initialized with random weights. The architecture adopted here is similar to the DNN regression approach [21] based on a feed-forward neural network [20,22], which has many levels of nonlinearities, allowing it to compactly represent a highly varying and distinctly nonlinear function. In Reference [21], training involves unsupervised pre-training and supervised fine-tuning. However, in our approach, we considered only simple supervised training with refined hyper-parameters.
We chose random weight initialization, ensuring the weights varied in the range of −0.5 to 0.5 (±0.1) from the input layer to the first hidden layer, −1.0 to 1.0 (±0.1) from the first hidden layer to the second hidden layer, and −1.5 to 1.5 (±0.1) from the second hidden layer to the output layer. We also adopted the exponential weight decay concept in our approach. After initializing all the parameters, each training iteration proceeds as follows: feed-forward propagation is computed first, then back-propagation, then the parameters are updated, and finally the loss function is calculated.
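The initialization might look like the following sketch. The layer widths (252-64-32-3) and the decay constants are assumptions for illustration; only the per-layer uniform ranges come from the text, and "exponential weight decay" is read here as an exponentially decaying scale factor, since the paper does not give its constants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer-wise uniform ranges from the text (each quoted with a +-0.1 tolerance);
# the layer sizes themselves are hypothetical.
w_in_h1  = rng.uniform(-0.5, 0.5, size=(64, 252))  # input -> 1st hidden
w_h1_h2  = rng.uniform(-1.0, 1.0, size=(32, 64))   # 1st hidden -> 2nd hidden
w_h2_out = rng.uniform(-1.5, 1.5, size=(3, 32))    # 2nd hidden -> output

def exp_decay(step, base=0.1, rate=0.01):
    """Exponentially decaying factor, base * e^(-rate * step);
    base and rate are illustrative, not from the paper."""
    return base * np.exp(-rate * step)
```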

Feed Forward Propagation
A feed-forward neural network learns to map a fixed-size input to a fixed-size output. The weighted sums from the hidden layers to the output layer pass through a nonlinear activation function. We use the sigmoid (σ(x) = 1/(1 + e^(−x))), the most classical activation function, in all the layers. In the feed-forward propagation, the outputs of the two hidden layers and the output layer can be expressed as follows:

First hidden layer: H1_j = σ(∑_i w_ij I_i)

Second hidden layer: H2_k = σ(∑_j w_jk H1_j)

Output layer: O_l = σ(∑_k w_kl H2_k)

where I_i is the input; w_ij, w_jk, and w_kl are the corresponding weight matrices; H1_j, H2_k, and O_l are the outputs of the first hidden layer through the output layer, respectively; and σ is the activation function.
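The propagation described above can be sketched directly in numpy. Biases are omitted to mirror the layer equations as written; the weight shapes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(I, w_ij, w_jk, w_kl):
    """Feed-forward pass through two hidden layers, sigmoid in every layer."""
    H1 = sigmoid(w_ij @ I)    # first hidden layer outputs
    H2 = sigmoid(w_jk @ H1)   # second hidden layer outputs
    O  = sigmoid(w_kl @ H2)   # output layer (predicted gains)
    return H1, H2, O

# tiny illustrative network: 2 inputs -> 3 -> 3 -> 1 output
I = np.array([1.0, 0.0])
H1, H2, O = forward(I, np.full((3, 2), 0.1), np.full((3, 3), 0.1), np.full((1, 3), 0.1))
```

Because sigmoid squashes every weighted sum into (0, 1), the outputs of all three layers stay in that range.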

Error Calculation
The error is calculated with a simple subtraction as follows:

E_l = y_l − O_l,

where y_l is the real sample output and O_l is the predicted output.

Back Propagation and Weight Update
To properly adjust the weight vector, we considered the mini-batch Stochastic Gradient Descent (SGD) [23] optimizer. To lower the computation cost and to reduce variance, we used a 100-record subset of our data set each time we trained the network. First, we compute the gradients of each layer of the network for the current mini-batch, and then we use those gradients to update the weights of each layer. To keep the update operation simple, we add the gradient of a particular weight matrix to the existing weight matrix. In addition, to improve learning, we scale the gradients with a suitable learning rate. This uncomplicated procedure usually finds a good set of weights surprisingly well in comparison to far more elaborate optimization techniques [9].

Transfer Learning Approach
In the second part of our proposed design, we use the concept of a simple inductive parameter transfer learning approach [16]. We use the same NN model, with the same hyper-parameters, in the TL approach. However, instead of random weight initialization, we retained the final weights from the NN model and employed them as the initial weights of the transfer learning model, as shown in Figure 6.
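The parameter transfer itself reduces to copying the trained weight matrices, as in this sketch (the stand-in weight values are hypothetical):

```python
import numpy as np

def init_transfer_model(source_weights):
    """Inductive parameter transfer: copy the trained NN weights and use
    them as the starting point of the TL model, instead of random init."""
    return [w.copy() for w in source_weights]

trained = [np.full((4, 3), 0.2), np.full((2, 4), -0.1)]  # stand-in NN weights
tl_weights = init_transfer_model(trained)
tl_weights[0][0, 0] = 9.0  # further training the TL model leaves the source intact
```

Copying (rather than aliasing) matters: the TL model keeps fine-tuning on the target data without overwriting the source model's weights.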

Experimental Results and Evaluation
The performance of our proposed fitting algorithm has been evaluated in different phases. In the first subsection, complete information about the data set is provided. In the next subsection, we try to find the minimum number of fitting records required for sufficient training. After that, the performance of the NN algorithm with refined hyper-parameters and the performance of the TL approach are evaluated. An evaluation with a Mean Squared Error (MSE) analysis and some statistical parameters has also been considered.

Data Sets
Hearing loss vs. insertion gain data were extracted manually for 1100 subjects from two renowned nonlinear hearing-aid fitting software packages: National Acoustic Laboratories' nonlinear fitting procedure, version 1 (NAL-NL1) and version 2 (NAL-NL2). We considered a flat hearing loss type for each subject, with the hearing loss varying by at most 3 dB across the 6 frequency bands (250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, and 8 kHz). The minimum and maximum hearing losses considered were 10 dB and 70 dB, respectively. We manually entered the hearing loss information in the software and noted the corresponding insertion gain for the 50 dB, 65 dB, and 80 dB input levels. The 1000 training and 100 test records were selected randomly in such a way that no record appears more than once in the combined training and test sets. There was no correlation between the two data sets; hence, we considered them separately. The full data sets can be found in the Supplementary Materials.

Number of Fitting Records for Sufficient Training
The number of training records always plays an important role in the performance of a machine-learning-based algorithm. In general, the larger the data set, the better the training [9]. However, considering real-life situations, a large number of fitting records from audiologists may not always be available. Therefore, we conducted an experiment to find the minimum number of fitting records required for sufficient training by varying the amount of fitting data from 2800 to 1000. We noticed that the more training data used, the smaller the gain difference became. Furthermore, a monotonic difference in the gain with increasing hearing loss was observed. From this, we concluded that about 1500 records are necessary for sufficient training.

Efficiency of Predicting Fitting Records
The algorithm performance for 3 different input levels (50 dB, 65 dB, and 80 dB) is shown in Figures 7-9 for NAL-NL1 and in Figures 10-12 for NAL-NL2. Even though the proposed NN approach has followed the trend of the fitting formula quite well compared to the state-of-the-art approach [19], it is clearly visible that the TL approach outperformed it and that its predictions were very close to the original fitting records extracted by the formula. In most cases, the differences were between −2 dB and +2 dB. However, unexpected gain fluctuations can be noticed in some cases but can be neglected.

Mean Square Errors Comparison
Predictive mean square error results are shown in Table 1. The table lists the prediction error rate of our proposed TL method compared with the NN method for three different input levels; the results were taken after the 3rd epoch in each case. From the table, it is clear that the TL algorithm significantly outperforms the NN algorithm at every input level. Our proposed transfer-learning-based fitting algorithm obtains better predictions and thus smaller MSE in all cases. The average MSEs for TL and NN were 0.8662% and 1.1792%, respectively.
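The metric itself is straightforward; a hedged sketch is below. The paper reports MSE as a percentage; one plausible reading, assumed here, is that the gains are normalized to [0, 1] and the squared error is scaled by 100.

```python
import numpy as np

def mse_percent(pred_gain, true_gain):
    """Mean squared prediction error, scaled by 100 to express it in percent
    (assumed normalization; the paper does not state the scaling explicitly)."""
    pred = np.asarray(pred_gain, dtype=float)
    true = np.asarray(true_gain, dtype=float)
    return 100.0 * np.mean((pred - true) ** 2)
```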

Statistical Evaluation
Tables 2 and 3 use three statistical indicators (average, minimum, and maximum) to analyze the performance of the TL algorithm over the NAL-NL1 and NAL-NL2 data sets, respectively. In both tables, it is clearly noticeable that the differences were quite negligible except in some cases, which are highlighted in bold.

Discussion
All the advanced and modern fitting methodologies provide at least some corrective gain to the hearing-impaired ear.
Although not yet developed, the optimum fitting method would seek to restore all the dynamic acoustic properties lost through cochlear and conductive causes.
However, each patient functions as a unique experiment with variable and unpredictable outcomes. Therefore, the insertion gain calculated by fitting formulas might not be the preferred gain for a hearing loss patient, and there is no established benchmark for an acceptable prediction error. The preferred gain can vary widely from subject to subject, male to female, young to old, or one geographical region to another. In addition, we do not know how each fitting software package calculates the insertion gain for the corresponding hearing loss information; it remains a black box. Therefore, in this paper, we extracted the patient hearing loss and insertion gain data from fitting software and applied machine learning strategies to see whether they can follow the trend of the fitting formulas. By applying this approach with clinical data from audiologists, we can mitigate the gap in fitting satisfaction of a hearing loss patient. The collection of large hearing-aid fitting data sets from audiologists for further investigation will be an immense challenge due to regulations and privacy concerns. That is why we considered the parameter transfer learning approach, so that we can deal with the smaller data set problem. The TL prediction error for insertion gain is simply the calculated difference between the predicted gain and the original gain from the fitting formula. Therefore, very small or no difference is expected in the ideal case.

Conclusions and Future Work
The MLP-NN model with refined hyper-parameters empowers computational prototypes consisting of multiple processing layers to learn data representations at multiple levels of abstraction, and the TL approach speeds up training and enhances the performance of the model even under a smaller data set condition. For future work, we will use hearing-aid fitting data collected from health care professionals. We expect more features in the data set, including patients' personal preferences for the hearing-aid type. We will consider more challenging and smaller data set conditions to get a more realistic impression of the performance of the proposed algorithm. More attention will have to be given to optimizing and evaluating the proposed approach.
Supplementary Materials: The following are available online at http://www.mdpi.com/2079-9292/8/7/736/s1. A "Readme.txt" file has also been included mentioning all the details about the included files.