Using a Machine Learning Algorithm Integrated with Data De-Noising Techniques to Optimize the Multipoint Sensor Network

In this paper, we propose an effective strain sensing signal measurement method for an intensity wavelength division multiplexing (IWDM)-based multipoint fiber Bragg grating (FBG) sensor network: a long short-term memory (LSTM) machine learning algorithm integrated with data de-noising techniques. LSTM models are considered highly accurate for the prediction of very complex problems. Four ports of an optical coupler with distinct output power ratios of 70%, 60%, 40%, and 30% are used in the proposed distributed IWDM-based FBG sensor network to connect a number of FBG sensors for strain sensing. In an IWDM-based FBG sensor network, the distinct power ratios of the coupler ports give the FBGs distinct powers or intensities. However, unstable output power in the sensor system due to random noise, harsh environments, aging of the equipment, or other environmental factors can introduce fluctuations and noise into the spectra of the FBGs, which makes it hard to distinguish the sensing signals of the FBGs from the noise. As a result, noise reduction and signal processing methods play a significant role in enhancing the capability of strain sensing. Thus, to reduce the noise, to improve the signal-to-noise ratio, and to accurately measure the sensing signal of the FBGs, we propose an LSTM deep learning algorithm integrated with discrete wavelet transform (DWT) data smoothing (de-noising) techniques. DWT de-noising is an important technique for analyzing and de-noising sensor signals, and it further improves the strain sensing signal measurement accuracy of the LSTM model. After de-noising, the sensor data are fed into the LSTM model to measure the sensing signal of each FBG. The experimental results show that the integration of LSTM with the DWT data de-noising technique achieves better sensing signal measurement accuracy, even with noisy data or in noisy environments. Therefore, the proposed IWDM-based FBG sensor network can accurately sense the strain signal, even in harsh or noisy environments; can increase the number of FBG sensors multiplexed in the sensor system; and can enhance the capacity of the sensor system.


Introduction
Due to the primary appealing characteristics of high multiplexing capability, low price, small size, low noise interference, and remote sensing suitability, fiber Bragg grating (FBG) sensors are commonly used for strain, temperature, vibration, and other measurements [1][2][3]. The strain sensing principle of the FBG sensor depends on the shift of the peak wavelength of each FBG due to the change in physical or environmental factors, such as strain, stress, temperature, vibration, pressure, and others [4][5][6][7][8].
The long short-term memory (LSTM) algorithm is a type of recurrent neural network (RNN) that is particularly appropriate for long input sequences. We applied the LSTM algorithm to recognize and learn the features of the reflection spectrum of the FBGs at different strain values and to build the sensing signal measurement model for an IWDM-based FBG sensor network.
The LSTM model can extract wavelength features from the reflection spectra of the FBGs. A well-trained LSTM model can then measure the sensing signal of each FBG sensor from the overlapping spectra of the FBG sensors. Although the LSTM can measure the sensing signal of the FBGs, the measurement error may be high due to noise or interference caused by the instability of the broadband erbium-doped fiber (EDF) amplified spontaneous emission (ASE) source and by harsh measurement environments. Thus, noise reduction and signal processing play a significant role in strain sensing signal measurements using machine learning techniques.
In this paper, we propose a DWT data de-noising mechanism in conjunction with an LSTM model to reduce the noise of the sensor dataset and to improve the signal-to-noise ratio prior to training the LSTM model. The experimental results show that our proposed LSTM model, integrated with discrete wavelet transform (DWT) techniques, achieves a better sensing signal measurement of each FBG sensor with a smaller measurement error. Therefore, the contribution of this paper is a data de-noising technique using a DWT function in conjunction with an LSTM model, with the aim of providing accurate sensing signal measurements of each FBG and of improving the multiplexing capability and computational speed of IWDM-based FBG sensor systems.
The rest of this paper is structured as follows: Section 2 describes the operational principle of the proposed IWDM-based FBG sensor system and the proposed LSTM algorithm. Section 3 presents the experimental setup, data collection, data preprocessing, and design of the proposed LSTM model. Section 4 presents the test results and the discussion. Finally, Section 5 presents the conclusion.

Operational Principle of the IWDM-Based FBG Sensor Network
Figure 1 shows the experimental setup of the proposed IWDM-based FBG sensor network. The sensor network structure consists of an erbium-doped fiber amplifier (EDFA), an optical spectrum analyzer (OSA), optical couplers (C), a personal computer (PC), and FBGs. The EDFA serves as the light source and illuminates the FBG sensor array, which is arranged in a parallel structure. The light produced by the EDFA passes through a coupler (C1), is then split into two branches (i.e., C1 and C2), and is fed into the FBG sensors. The reflected signals of the FBGs are then transmitted to the central office (CO) through the C1 coupler. The OSA, located in the CO, detects and records the reflected spectra of the FBGs. Finally, the detected reflection spectra are passed from the OSA to the PC, which performs the data preparation and the simulation of the deep learning model for measuring the sensing signals of the FBGs.
For the proposed IWDM-based FBG sensor system, the total reflection spectrum $R(\lambda)$ of the FBGs is the sum of all FBG spectra in the sensor system. Assuming that $n$ identical FBGs with distinct peak wavelengths are positioned in a parallel structure and that their reflectivity is small enough, the measured spectrum of the FBG sensors is expressed as follows:

$$R(\lambda) = \sum_{i=1}^{n} R_i\, g_i(\lambda, \lambda_{Bi}) + \mathrm{noise}(\lambda) \qquad (1)$$

where $\lambda$ is the broadband source's wavelength, $\lambda_{Bi}$ is the central wavelength of the $i$th FBG, $R_i\, g_i(\lambda, \lambda_{Bi})$ is the reflected spectrum of the $i$th FBG, $n$ is the total number of FBGs, and $\mathrm{noise}(\lambda)$ is random noise.
Furthermore, the return power of the FBGs at the OSA depends on the broadband source spectrum, $Z(\lambda - \lambda_I)$, and on the reflection spectrum of the FBGs, $R(\lambda)$. Since the broadband source's spectral width is narrower than the FBG bandwidth and the spectrum of each FBG is Gaussian shaped, the total returned power can be calculated from $\lambda_I$, $R_m$, $\lambda_{Bi}$, and $\Delta\lambda_B$, which are the broadband source wavelength, the peak reflectivity of the FBGs, the central wavelength of the FBGs, and the FBG bandwidth, respectively. Here, $Z$ is a delta function, and $C$ is the power allocation coefficient, which depends on numerous influences, including transmission losses and power fluctuations. To demonstrate and verify the proposed system experimentally, we use four FBG sensors, as four FBGs are available in our lab. Thus, for a four-FBG sensor system, there are four Gaussian spectra, one returned by each FBG. In IWDM-based FBG sensor systems, the output power of each FBG is different. The reflection spectrum of each FBG sensor has a Gaussian shape with peak reflectivity $I_{peak}$. In the proposed IWDM-based FBG sensor system, once the reflection spectra of the FBGs enter the overlapping region, the reflection spectra of adjacent FBGs can overlap. There are three situations: nonoverlapping, partially overlapping, and fully overlapping FBG spectra. When the spectra are nonoverlapping, the spectra of the four FBGs are separate and the strain sensing signal of each FBG can be easily identified. If the reflection spectra are partly overlapped, the strain sensing signal of each FBG may still be identified. However, when the reflection spectra of two or three adjacent FBGs are completely overlapped, it is very challenging to measure or identify the exact sensing result of each FBG from the overlapped spectra using conventional peak detection methods or the OSA. Moreover, partially and fully overlapping spectra of FBG sensors also introduce peak wavelength cross talk.
Sensors 2020, 20, 1070
Thus, conventional peak detection (CPD) techniques cannot easily resolve the overlapped FBG spectra (cross talk) and cannot accurately measure the strain sensing signal of each FBG. In this paper, we propose deep learning techniques to overcome the cross talk of overlapping FBG spectra. Our objective is to measure the sensing signal values of FBG1, FBG2, FBG3, and FBG4. However, when FBG1 and FBG2, FBG1 and FBG3, or FBG1 and FBG4 are close and overlap, it is difficult to measure FBG1, FBG2, FBG3, and FBG4 directly from the measured reflection spectrum $R(\lambda, \lambda_{Bi})$. For this reason, we use the proposed LSTM algorithm to measure the sensing signal of each FBG. In the model training stage, the sequential reflection spectra features of the four FBGs at different strain steps are set as the input for training the LSTM model. The training data set consists of pairs $(X_k, Y_k)$, where $X_k \in \mathbb{R}^l$ is the reflection spectrum of the FBGs, $Y_k$ is the set of central wavelengths of the four FBG sensors at each strain step, and $l$ is the number of sampling points. After the training of the LSTM model is complete, the strain sensing signal of each FBG can be measured or recognized from the overlapping spectra of the FBG sensors. Thus, the LSTM model can measure the sensing signal of each FBG sensor even if the FBG spectra suffer from cross talk or overlap. The details of the LSTM algorithm are described in the section below.
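The construction of the training pairs described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' acquisition procedure: it assumes the common Gaussian profile parameterized by its FWHM, takes relative peak intensities that simply mirror the coupler ratios, and picks a wavelength window and an FBG1 sweep range for illustration only. The function names (`fbg_gaussian`, `composite_spectrum`, `build_dataset`) are hypothetical.

```python
import numpy as np

FWHM_NM = 0.2  # FWHM of each FBG, taken from the experimental setup


def fbg_gaussian(wl, center, peak, fwhm=FWHM_NM):
    """Gaussian reflection profile (assumed form) with the given peak and FWHM."""
    return peak * np.exp(-4.0 * np.log(2.0) * ((wl - center) / fwhm) ** 2)


def composite_spectrum(wl, centers, peaks, noise_std=0.0, rng=None):
    """Sum of the individual FBG spectra plus additive random noise, as in Equation (1)."""
    rng = np.random.default_rng() if rng is None else rng
    total = sum(fbg_gaussian(wl, c, p) for c, p in zip(centers, peaks))
    return total + noise_std * rng.standard_normal(wl.shape)


def build_dataset(wl, fbg1_centers, fixed_centers, peaks):
    """Return X (one spectrum per strain step) and Y (the four central wavelengths)."""
    X, Y = [], []
    for c1 in fbg1_centers:
        centers = [c1, *fixed_centers]
        X.append(composite_spectrum(wl, centers, peaks))
        Y.append(centers)
    return np.asarray(X), np.asarray(Y)


# Assumption: a 5 nm window sampled with 2001 points; the window start is illustrative.
wl = np.linspace(1542.0, 1547.0, 2001)
# Assumption: relative peak intensities that mirror the coupler power ratios.
peaks = [0.7, 0.6, 0.4, 0.3]
fixed = [1545.2039, 1545.5393, 1545.8403]  # FBG2 to FBG4 stay fixed
fbg1 = np.linspace(1542.34, 1546.5, 32)    # illustrative FBG1 sweep under strain
X, Y = build_dataset(wl, fbg1, fixed, peaks)
```

Each row of `X` plays the role of one $X_k$ with $l = 2001$ sampling points, and the matching row of `Y` holds the four target wavelengths $Y_k$. Sweeping FBG1 through the fixed FBGs produces the nonoverlapping, partially overlapping, and fully overlapping cases in a single dataset.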

Long Short-Term Memory (LSTM) Algorithm
The LSTM algorithm is a special kind of recurrent neural network (RNN) that manages sequential information by memorizing it for long periods [25,26]. Unlike traditional RNNs, LSTM adds a new structure called a "memory cell" with an internal state to store valuable information [27]. In LSTM, the memory block replaces the hidden unit structure of the RNN, as shown in Figure 2. The most important structures in the memory block (cell) of LSTM are the three gates and the cell state. The three gates of LSTM are the input gate, the output gate, and the forget gate. These three gates are applied in the LSTM memory cell of the hidden layer to solve the vanishing gradient problem and thus to make the network suitable for handling long-term dependencies [28,29]. The input gate decides which information in the cell state needs to be updated, and the output gate decides which part of the information in the cell state will be output. The forget gate decides which information should be dropped from the cell state to reset part of the memory [30]. In this way, LSTM has the option of removing or adding information to the cell state rather than fully overwriting the cell state, as is done by standard RNNs [28].
Unlike traditional feed-forward neural networks, LSTM is a sequential algorithm that has the capacity to connect prior information to the current task. Figure 2 shows the LSTM memory cell structure. In the figure, at time $t$, the input value to the memory cell is $x_t$. The input value helps to capture all sequences of the FBG reflection spectra.
The following equations describe how a memory cell layer is updated at each time $t$. First, we calculate the values of the input gate, $i_t$, and of the candidate state, $\tilde{c}_t$, for the memory cell states at time $t$:

$$i_t = \sigma(W_i x_t + H_i h_{t-1} + b_i)$$
$$\tilde{c}_t = \tanh(W_c x_t + H_c h_{t-1} + b_c)$$

Second, we calculate the activation value of the forget gate, $f_t$, at time $t$:

$$f_t = \sigma(W_f x_t + H_f h_{t-1} + b_f)$$

Third, given the activation value of the candidate state, $\tilde{c}_t$; the input gate, $i_t$; and the forget gate, $f_t$, we can calculate the new state of the memory cells, $c_t$, at time $t$, which combines $i_t$ with $\tilde{c}_t$ and $f_t$ with $c_{t-1}$ through element-wise multiplication (*):

$$c_t = i_t * \tilde{c}_t + f_t * c_{t-1}$$

The final step is to determine the output value. We calculate the value of the output gate and then update the hidden state for the next iteration:

$$o_t = \sigma(W_o x_t + H_o h_{t-1} + b_o)$$
$$h_t = o_t * \tanh(c_t)$$

where $W_i$, $W_f$, $W_o$, and $W_c$ are weight matrices that connect $x_t$ to the gates and the candidate value; $H_i$, $H_f$, $H_o$, and $H_c$ are weight matrices that connect $h_{t-1}$ to the gates and the candidate value; and $b_i$, $b_f$, $b_o$, and $b_c$ are the bias vectors of the three gates and the candidate value. $h_t$ and $c_t$ are transmitted as the input parameters to the next time step, $\sigma$ represents the sigmoid function with outputs between 0 and 1, and the tanh activation function sets the data value within the range −1 to 1.
Figure 3 illustrates the proposed architecture of the LSTM algorithm. The first layer of the architecture consists of a layer of LSTM cells. This layer collects the reflection information of our sensor data across different strain values. The LSTM output may still have remaining nonlinearities; hence, we used two dense hidden layers to handle the distortions or nonlinearities. Then, we implemented a dropout layer to mitigate the risk of overfitting by regularizing the output. Finally, in order to obtain the optimal prediction, the output layer structures the output of the model. Taking advantage of the ability of LSTM to process sequential data, the sensing signal detection problem is recast as a sequential regression problem. As shown in Figure 3, sensing signal detection is treated as a sequential learning problem. When the sequence of reflection spectra of the FBGs, $X = \{x_1, x_2, \ldots, x_m\}$, is given as an input, the LSTM-based sensing signal measurement model calculates the hidden value $h_y$ through the recurrent function of the hidden layer, $H(\cdot)$, and then the output value $Z = \{\lambda_{B,FBG1}, \lambda_{B,FBG2}, \lambda_{B,FBG3}, \lambda_{B,FBG4}\}$ is calculated by the following:

$$Z = W h_y + b \qquad (15)$$

where $b$ is the bias vector and $W$ is the weight matrix.
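To make the gate updates concrete, the sketch below implements one memory-cell step in NumPy and runs it over a short sequence. It is an illustration only: the dictionary layout of the weights, the layer sizes, the random initialization, and the names `lstm_step` and `init_params` are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def sigmoid(z):
    """Logistic function with outputs in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, H, b):
    """One memory-cell update; W, H, and b are dicts keyed by 'i', 'f', 'o', 'c'."""
    i_t = sigmoid(W["i"] @ x_t + H["i"] @ h_prev + b["i"])      # input gate
    c_tilde = np.tanh(W["c"] @ x_t + H["c"] @ h_prev + b["c"])  # candidate state
    f_t = sigmoid(W["f"] @ x_t + H["f"] @ h_prev + b["f"])      # forget gate
    c_t = i_t * c_tilde + f_t * c_prev                          # new cell state
    o_t = sigmoid(W["o"] @ x_t + H["o"] @ h_prev + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                                    # new hidden state
    return h_t, c_t


def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    W = {k: 0.1 * rng.standard_normal((n_hidden, n_in)) for k in "ifoc"}
    H = {k: 0.1 * rng.standard_normal((n_hidden, n_hidden)) for k in "ifoc"}
    b = {k: np.zeros(n_hidden) for k in "ifoc"}
    return W, H, b


rng = np.random.default_rng(0)
n_in, n_hidden = 1, 8
W, H, b = init_params(n_in, n_hidden, rng)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
spectrum = rng.random(20)  # stand-in for a sampled reflection spectrum
for sample in spectrum:
    h, c = lstm_step(np.array([sample]), h, c, W, H, b)
```

After the loop, `h` is the final hidden state that a dense output layer would map to the four central wavelengths. Because the output gate lies in (0, 1) and tanh lies in (−1, 1), every entry of `h` stays strictly inside (−1, 1).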

Discrete Wavelet Transform (DWT)
DWT is a technique that uses a mother wavelet function to analyze a signal simultaneously in the frequency domain and the time domain. Important data are retrieved from the sensor signal, and noisy data are eliminated from it. The wavelet transform decomposes the original signal into low-frequency and high-frequency components by scaling and translating a basic wavelet, which yields a set of wavelet coefficients. The high-frequency and low-frequency information of the signal are obtained through high-pass and low-pass filters, respectively [30,31]. Let $n$ be the sensor signal length; then the noisy signal, $Y$, is expressed as follows:

$$Y_i = x_i + z_i, \qquad i = 1, \ldots, n$$

where $x$ is the important signal and $z$ is the unimportant (noisy) signal. When the noise is random and discrete, its wavelet coefficients are relatively small after the DWT. The low-frequency and high-frequency wavelet coefficients are filtered with a preset threshold. The remaining part is then inversely transformed, and the real signal is reconstructed. The DWT noise reduction process is as follows:
i. The original data with noise are collected using Equation (1).
ii. Apply the wavelet transform to the data.
iii. Apply threshold processing.
iv. Reconstruct the signal.
v. Finally, the noise of the signal is reduced.
After the wavelet transform is applied to the data, to reduce the fluctuations in the peak power and shape of the FBG spectra, the threshold $\lambda$ is set from the control coefficient $\varepsilon$ and the mean square error $\sigma$. $\sigma$ is used as the threshold base for wavelet coefficient processing, and $\varepsilon$ is used as the control coefficient for $\sigma$. The control coefficient $\varepsilon$ is adjusted using the loss obtained once the training is completed, and the threshold $\lambda$ is adjusted globally. In this way, $\lambda$ can be used to enhance the original wavelet transform techniques. Moreover, since there are different threshold functions, such as hard-threshold and soft-threshold de-noising functions, the best threshold function can be chosen. The wavelet coefficient $y$ is a function of time describing oscillations that are localized in both time and frequency. In the soft-threshold de-noising method, when $|y| < \lambda$, the coefficient is set to zero, while when $|y| \ge \lambda$, $|y|$ is reduced by $\lambda$. The soft-threshold de-noising method is expressed as follows [30]:

$$\hat{y} = \begin{cases} \operatorname{sgn}(y)\,(|y| - \lambda), & |y| \ge \lambda \\ 0, & |y| < \lambda \end{cases}$$

where $y$ is the wavelet coefficient and $\lambda$ is the threshold. In the hard-threshold de-noising method, on the other hand, when $|y| < \lambda$, the coefficient is set to zero and, when $|y| \ge \lambda$, the wavelet coefficient keeps its value $y$ [30]. The hard-threshold de-noising method can be expressed as follows:

$$\hat{y} = \begin{cases} y, & |y| \ge \lambda \\ 0, & |y| < \lambda \end{cases}$$

In the final step, the signal is obtained by the inverse wavelet transform, which reconstructs the real signal and eliminates the noise from it.
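The transform, threshold, and reconstruct steps above can be sketched in a few lines of NumPy. This is a simplified illustration under several assumptions that do not come from the paper: it uses the Haar wavelet, three decomposition levels, a hand-picked threshold, and thresholds only the detail (high-frequency) coefficients. The names `haar_dwt`, `haar_idwt`, and `dwt_denoise` are hypothetical.

```python
import numpy as np


def soft_threshold(y, lam):
    """Zero coefficients below lam in magnitude and shrink the rest toward zero by lam."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)


def hard_threshold(y, lam):
    """Zero coefficients below lam in magnitude and keep the rest unchanged."""
    return np.where(np.abs(y) >= lam, y, 0.0)


def haar_dwt(x):
    """Single-level Haar transform of an even-length signal: (approximation, detail)."""
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def dwt_denoise(signal, lam, levels=3, mode="soft"):
    """Multi-level Haar de-noising; only detail coefficients are thresholded (assumption)."""
    thresh = soft_threshold if mode == "soft" else hard_threshold
    x = np.asarray(signal, dtype=float)
    n = x.size
    pad = (-n) % (2 ** levels)            # pad so every level has an even length
    approx = np.pad(x, (0, pad), mode="edge")
    details = []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(thresh(d, lam))
    for d in reversed(details):           # reconstruct from coarsest to finest level
        approx = haar_idwt(approx, d)
    return approx[:n]


rng = np.random.default_rng(0)
wl = np.linspace(1542.0, 1547.0, 2001)
clean = np.exp(-4.0 * np.log(2.0) * ((wl - 1544.5) / 0.2) ** 2)  # illustrative FBG peak
noisy = clean + 0.05 * rng.standard_normal(wl.size)
denoised = dwt_denoise(noisy, lam=0.15, levels=3, mode="soft")
```

Soft thresholding avoids the discontinuity that hard thresholding introduces at $|y| = \lambda$, at the cost of shrinking large coefficients. In practice, a wavelet library and a smoother mother wavelet would likely be preferred over this hand-written Haar transform.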

Experimental Setup
The experimental setup of the IWDM-based FBG sensor system is presented in Figure 1. The output power of the EDFA light source is approximately 16 dBm. The four ports of an optical coupler, with distinct output power ratios of 70%, 60%, 40%, and 30%, are used in the proposed distributed IWDM-based FBG sensor network to connect a number of FBG sensors for strain sensing. The IWDM technique is used to enhance the multiplexing capacity. The EDFA broadband source illuminates the four FBG sensors. The central wavelengths of FBG1, FBG2, FBG3, and FBG4 are 1542.34, 1545.2039, 1545.5393, and 1545.8403 nm, respectively. The full width at half maximum (FWHM) of each of the four FBG sensors is 0.2 nm, and the resolution of the OSA was set to 0.1 pm. The span of the OSA was set to 5 nm and sampled with 2001 points. In our proposed experimental setup, the number of FBG sensors is four times that of a traditional WDM system, as each output path of the coupler (i.e., four paths) can support a number of FBGs. The details of how the training data were collected are discussed below. Figure 4 shows the data processing and preparation process. In data preparation, the main objective is to learn the direct correlation between the strain and the reflection spectrum of the FBG. During the experiment, the first step is to record the spectra of the FBG sensors while applying distinct strains to the FBG1 sensor. The experiment is conducted using the setup shown in Figure 1.

Data Collection and Preprocessing
For training the LSTM model, the training dataset and testing dataset are recorded using Equation (1), with the parameters specified in the experimental setup, including the peak power of each FBG, the FWHM, the central wavelength of each FBG, and the number of sample points. The training and testing datasets are the reflection spectra of the four FBGs at distinct strain values of FBG1. Thus, we collect a number of training samples by applying different strain values to the FBG1 sensor (i.e., 0-1285 µε) until we reach the maximum strain value. When strain is applied to FBG1, the central wavelength of the FBG1 sensor shifts in the range from 1542.34 nm to 1547.34 nm and the measured spectra of the FBGs are sampled with 2001 points, whereas the central wavelengths of the other FBGs remain fixed. The strain applied to the FBG1 sensor at each strain step is approximately 41 µε.
Figures 5 and 6 illustrate the reflection spectra of the FBG sensors at FBG1 strain values of 335 µε and 595 µε, respectively. As shown in the figures, the spectra of two or three adjacent FBGs overlap fully or partially. The training and testing data samples are recorded using an optical spectrum analyzer by changing the strain applied to FBG1. The collected training data (i.e., the reflection spectra of the four FBGs at distinct strains) are denoted as z = [z1, z2, ..., zi], where i is the input dimensionality of the data. We have a total of 2001 features as the input dimensionality of the proposed system. However, in practical applications, the FBG sensing signals are subject to unstable reflection spectra and noise due to the instability of the output power, harsh environments, random noise, deviations in the shape of the reflection spectra, aging of the equipment, and other environmental factors that affect the shape of the reflection spectra, which makes it difficult to distinguish the sensing signals of the FBGs from the noise.
Reflection spectra features such as asymmetry, spectral broadening, and top fluctuation can reduce the sensing signal measurement accuracy. As a result, to validate our proposed system even in harsh environments or with noisy data, we train and test the proposed model using noisy data. Because white Gaussian noise is a good model of random noise when the cause of the noise is very complicated, we add white Gaussian noise to the original sensor data using Equation (1). Figures 5a and 6a show the spectra of the four FBGs after random noise is added to the original sensor data. The signal-to-noise ratio (SNR) of the noisy FBG signals is 20 dB. However, due to the presence of a high level of noise in the FBG sensor data, the LSTM deep learning model cannot accurately measure the sensing signal of each FBG. To solve this problem, in this paper, the DWT de-noising method is proposed to reduce the noise of the training data (noisy sensor data). The data de-noising process, which uses a wavelet transform with the hard or soft threshold technique, can eliminate the noise from the signal.
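Adding white Gaussian noise at a target SNR, as described above, can be sketched as follows. This is a minimal example that assumes the SNR is defined as the ratio of mean signal power to noise variance over the whole spectrum; the paper does not state which definition it uses, and the function names are hypothetical.

```python
import numpy as np


def add_awgn(signal, snr_db, rng=None):
    """Add zero-mean white Gaussian noise so that signal power / noise power matches snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    noise = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


rng = np.random.default_rng(0)
wl = np.linspace(1542.0, 1547.0, 2001)
clean = np.exp(-4.0 * np.log(2.0) * ((wl - 1544.5) / 0.2) ** 2)  # illustrative FBG peak
noisy = add_awgn(clean, snr_db=20.0, rng=rng)
```

Because most of a reflection spectrum is near zero, a 20 dB SNR computed over the full window still leaves visible ripple on the baseline, which is the kind of noise the de-noising step is meant to suppress.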
As shown in Figures 5b and 6b, the noise of the sensing signal is filtered out and the spectra of the FBGs are clear. Among the de-noised FBG spectra that are used as training data for our proposed algorithm, Figures 5b and 6b show the spectra of the four FBGs at strain values of 335 µε and 595 µε, respectively. The data after DWT de-noising are then used as the training data for the LSTM model. Thus, after the de-noising and preprocessing of the sensor data are completed, the sensor data are fed into the LSTM model to measure the sensing signal of each FBG. The LSTM can learn the features of the reflection spectra of the FBG sensors and can build the sensing signal measurement model for the FBG sensor system.
The preprocessed dataset is divided into training and testing datasets, as shown in Figure 4. Each has a similar number of features and target values. Before training, the training data are normalized to [0, 1] using min-max scaling. The sequence of the reflection spectra data of the four FBGs at different strain steps is used as the input data, and the central wavelengths of FBG1, FBG2, FBG3, and FBG4 at different strain steps are used as the target data for training the LSTM model. The training data are used to train the network and to adjust the parameters iteratively to minimize the loss function of the model. Then, the well-trained LSTM model estimates the sensing signal of the unknown test samples from the test data, and a test loss is measured. Therefore, the sensing signal measurement of the FBGs can be rapidly determined by sequentially feeding the reflection spectra of the FBGs into the well-trained LSTM model.
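The split and the min-max normalization can be illustrated with the short sketch below. Several details are assumptions made for the example and are not stated in the paper: the 80/20 split, the use of a single global minimum and maximum rather than per-feature values, the random placeholder data, and the function names.

```python
import numpy as np


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle the sample indices and split them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]


def fit_min_max(train):
    """Compute the scaling range from the training data only."""
    return float(train.min()), float(train.max())


def apply_min_max(data, lo, hi):
    """Map data so that the training range [lo, hi] becomes [0, 1]."""
    return (data - lo) / (hi - lo)


rng = np.random.default_rng(1)
X = rng.random((32, 2001)) * 5.0  # placeholder spectra, one row per strain step
Y = rng.random((32, 4))           # placeholder central wavelengths
train_idx, test_idx = train_test_split_indices(len(X))
lo, hi = fit_min_max(X[train_idx])
X_train = apply_min_max(X[train_idx], lo, hi)
X_test = apply_min_max(X[test_idx], lo, hi)
Y_train, Y_test = Y[train_idx], Y[test_idx]
```

Fitting the range on the training rows and reusing it for the test rows keeps information about the test data out of the preprocessing. The test values may therefore fall slightly outside [0, 1].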

Determining the Optimal LSTM Parameters
Our proposed LSTM model is implemented using the TensorFlow framework, in conjunction with the Keras and Sklearn libraries. The simulation part of this paper runs on a PC with an Intel Core i7-4790 3.60 GHz CPU and 20.0 GB of RAM. Figure 7 shows the flowchart and training process of the proposed LSTM with four hidden layers and two fully connected layers. The basic workflow of our designed LSTM network is as follows: First, the collected training dataset (strain sensing signals) is preprocessed, de-noised using data de-noising techniques, and structured according to the machine training format. Then, to train the LSTM algorithm, the preprocessed reflection spectra of the FBGs are used as inputs to the LSTM and the corresponding peak wavelengths of the FBGs are used as target values. Next, the well-trained LSTM model is tested using unseen test datasets. Finally, the prediction outputs are generated by the dense layer, and we use a loss function to compare the prediction outputs with the actual values. The prediction performance of our proposed model is evaluated through the root mean square error (RMSE). During the training of the LSTM, various parameters, such as the number of epochs, batch size, hidden layers, hidden units, optimizer, and activation function, are adjusted until optimal values are obtained. Tuning these parameters results in different training times, root mean square errors (RMSE), and mean square errors (MSE). To find the best optimizer, we train the model with distinct optimizers [32] and compare the performance based on the MSE (see Figure 8).
As illustrated in Figure 8, Adamax achieves a smaller MSE (i.e., 0.025 pm) than the other optimizers, as Adamax is computationally efficient and minimizes noise. Thus, the proposed algorithm is trained using the Adamax optimizer. We also use different activation functions [33] (sigmoid, ReLU, tanh, and softmax) to squash the output of the proposed algorithm and to compare their performance. Moreover, we apply dropout regularization within the LSTM layer.
Throughout the training period, a portion of the input units is dropped randomly at each update, both at the input gates and at the recurrent connections, which lowers the probability of overfitting and improves generalization performance. The network was trained several times until the best outputs were found, after which we compiled it using the Adamax optimization algorithm and the tanh activation function. Finally, the prediction outputs are generated by the dense layer, and a loss function is used to compare the prediction outputs with the actual values. The loss metric for evaluating the training and validation losses of the proposed model is the MSE, which can be calculated as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $n$, $y_i$, and $\hat{y}_i$ are the number of predicted values, the actual value, and the predicted value, respectively.

Moreover, to obtain the optimal model, we trained the LSTM model with various numbers of hidden units. The comparison of the LSTM models' RMSE for various numbers of hidden units is shown in Figure 9. As shown in the figure, the RMS error decreases as the number of hidden units increases. However, a larger number of hidden units also requires more testing time. The error saturates at 824 hidden units and increases beyond that number. The maximum test time for the LSTM model with 824 hidden units is 0.526 s. Therefore, for the proposed LSTM model, the optimal number of hidden units is 824, which achieves both acceptable accuracy and test time.
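The MSE loss defined above, and the RMSE used for evaluation (its square root), can be computed as in the following minimal sketch. The function names and the wavelength values are hypothetical and serve only to illustrate the arithmetic.

```python
import math
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean square error: (1/n) * sum((y_i - y_hat_i)^2)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, i.e. the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical central wavelengths (nm) of four FBGs and a model's predictions.
actual_nm = [1550.0, 1551.0, 1552.0, 1553.0]
predicted_nm = [1550.5, 1550.5, 1552.5, 1552.5]

loss = mse(actual_nm, predicted_nm)     # every error is 0.5 nm, so MSE = 0.25
error = rmse(actual_nm, predicted_nm)   # RMSE = 0.5
```

Because the RMSE has the same unit as the target, it is the more convenient of the two for reporting wavelength errors, while the MSE is the quantity minimized during training.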
Moreover, Figure 10 shows the MSE of the training and validation losses of our proposed LSTM model for various epoch numbers. Both the training and validation losses decrease significantly as the number of epochs/iterations increases. When the epoch number exceeds 250, the training loss decreases slowly and the validation loss varies in the range of 0 pm to 0.014 pm. The training loss and validation loss converge quickly after approximately 800 iterations, while the optimum value is obtained at the 1400th iteration. The training and validation losses of the LSTM are 0.003 pm and 0.0005 pm, respectively.
Figure 11 shows the training accuracy and validation accuracy of our proposed LSTM model for different epoch numbers. As shown in the figure, the accuracy of the model increases with the epoch number. At epoch 1400, the LSTM validation accuracy (blue) reaches 100% and the LSTM training accuracy (red) reaches 95%. Therefore, after several training runs, the optimal values are achieved with the following parameters: 824 hidden units, a batch size of 1500, 1400 epochs, and four hidden layers for the well-trained LSTM model, which achieves acceptable accuracy and testing time.

Model Testing
This section describes the strain sensing signal measurement performance of our proposed LSTM deep learning model. To test the well-trained LSTM, we took unseen test data (reflection spectra of the FBGs) from the OSA and used four different test cases. The test data focus on situations in which the spectra of two or three adjacent FBG sensors partially or fully overlap (see the blue spectra in Figures 12 and 13). As shown in Figures 12 and 13, when the spectra of two or three FBG sensors overlap, the output power (intensity) is the sum of the powers of the two or three sensors and the peak power is high. To evaluate the strain sensing signal measurement accuracy of our proposed model, we use the RMSE, defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $n$, $y_i$, and $\hat{y}_i$ are the number of predicted values, the actual value, and the predicted value, respectively. Therefore, the sensing signals of the FBGs can be rapidly determined by sequentially feeding the reflection spectra of the FBGs into the well-trained LSTM model. The strain sensing outputs of the well-trained LSTM model for the four distinct test cases are shown in Figures 12 and 13. Figure 12a,b shows the sensing signal measurement output of our proposed LSTM model without DWT data de-noising techniques when the spectra of two FBGs are overlapped (see Figure 12a) and when the spectra of three FBGs are overlapped (see Figure 12b). As shown in Figure 12a, the proposed LSTM model can measure the sensing signals of the FBGs without de-noising techniques when the spectra of FBG1 and FBG2 are completely overlapped. As shown in Figure 12b, the proposed LSTM model measures the sensing signal of each FBG without de-noising techniques when the spectra of the FBG1, FBG2, and FBG3 sensors are overlapped.
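The statement that overlapped spectra add in power, so that the combined peak is higher than either individual peak, can be illustrated with a small numerical sketch. The Gaussian line shape, the 0.2 nm bandwidth, the 1550 nm centre, and the relative peak powers of 0.7 and 0.6 are assumptions chosen for illustration only and are not taken from the measured spectra.

```python
import math
from typing import List, Sequence


def gaussian_spectrum(wavelengths_nm: Sequence[float], center_nm: float,
                      peak_power: float, fwhm_nm: float) -> List[float]:
    """Idealized FBG reflection spectrum modelled as a Gaussian (an assumption)."""
    sigma = fwhm_nm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [
        peak_power * math.exp(-((w - center_nm) ** 2) / (2.0 * sigma ** 2))
        for w in wavelengths_nm
    ]


def combine(*spectra: Sequence[float]) -> List[float]:
    """Point-wise sum of the reflected powers of several FBGs."""
    return [sum(values) for values in zip(*spectra)]


# Wavelength grid from 1549.00 nm to 1551.00 nm in 0.01 nm steps.
grid_nm = [1549.0 + 0.01 * i for i in range(201)]

# Two FBGs with identical centre wavelengths, i.e. completely overlapped spectra.
fbg1 = gaussian_spectrum(grid_nm, center_nm=1550.0, peak_power=0.7, fwhm_nm=0.2)
fbg2 = gaussian_spectrum(grid_nm, center_nm=1550.0, peak_power=0.6, fwhm_nm=0.2)
overlapped = combine(fbg1, fbg2)

peak_single = max(fbg1)
peak_overlapped = max(overlapped)   # approximately 0.7 + 0.6 at the shared centre
```

In this idealized case the summed spectrum has a single peak, so the two sensors cannot be separated by simple peak detection, which is the situation the test cases are designed to probe.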
The sensing signal measurement errors of the proposed LSTM model without a DWT data de-noising method are 0.092 pm and 0.098 pm when two FBG spectra (test case a, Figure 12a) and three FBG spectra (test case b, Figure 12b) are overlapped, respectively. On the other hand, Figure 13a,b shows the sensing signal measurement output of our proposed LSTM model in conjunction with DWT data de-noising techniques when the spectra of two FBGs and of three FBGs are overlapped, respectively.
As shown in Figure 13a, the proposed LSTM model with DWT data de-noising techniques accurately measured the sensing signals of the FBGs even when the spectra of FBG1 and FBG2 are completely overlapped. As shown in Figure 13b, the proposed LSTM model with DWT de-noising techniques accurately measured the sensing signal of each FBG even when the spectra of the FBG1, FBG2, and FBG3 sensors are overlapped. The sensing signal measurement errors of the proposed LSTM model in conjunction with a DWT data de-noising method are 0.024 pm and 0.067 pm when two FBG spectra (test case a, Figure 13a) and three FBG spectra (test case b, Figure 13b) are overlapped, respectively.
A smaller RMSE indicates that the value measured by our proposed model is closer to the actual value. The experimental results in Figure 12 demonstrate that the sensing signal measurement performance of our proposed LSTM model without a DWT is unsatisfactory. On the other hand, the LSTM model in conjunction with the DWT de-noising technique achieves better performance even with noisy data, as shown in Figure 13. Hence, the LSTM model in conjunction with a DWT de-noising method is capable of efficiently improving the sensing signal measurement accuracy of the FBG sensor system. Our proposed deep learning algorithm therefore shows that the sensing signals of the FBGs can be measured accurately even with overlapped FBG spectra and noisy sensor data.
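To make the role of the DWT de-noising step concrete, the following minimal sketch applies a single-level Haar wavelet transform with soft thresholding of the detail coefficients. The Haar wavelet, the single decomposition level, the threshold value, and the sample data are all assumptions for illustration, since the wavelet family and thresholding rule used in this work are not restated in this section.

```python
import math
from typing import List, Sequence, Tuple


def haar_dwt(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Inverse of haar_dwt; reconstructs the signal from its coefficients."""
    s = math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs: Sequence[float], threshold: float) -> List[float]:
    """Shrink each coefficient toward zero by `threshold` (soft thresholding)."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(signal: Sequence[float], threshold: float) -> List[float]:
    """Suppress small detail coefficients, which mostly carry noise."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, threshold))


# Hypothetical noisy power samples: two plateaus with small fluctuations.
noisy = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
smoothed = denoise(noisy, threshold=0.5)
```

Here every detail coefficient is smaller than the threshold, so each pair of samples is replaced by its mean while the step between the two plateaus is preserved. A practical pipeline would typically use several decomposition levels and a data-driven threshold.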

Performance Evaluation
Furthermore, to verify and validate the strain sensing measurement performance of our proposed deep learning model, we compare the performance of our proposed LSTM model with that of two other models: extreme learning machine (ELM) and multilayer perceptron (MLP). We ran the simulations using the same parameters, the same training and testing data, and the same PC environment. For the ELM model, the number of hidden units is set to 1200. Table 1 compares the strain sensing signal measurement performance of our proposed LSTM model with that of the other two models for four different test cases. As shown in the table, the errors of our proposed LSTM model without the DWT method in test case a and test case b are 0.092 pm and 0.098 pm, respectively. On the other hand, the errors of our proposed LSTM model in conjunction with a DWT data de-noising method in test case i and test case ii are 0.024 pm and 0.067 pm, respectively. Hence, the RMSE of our proposed LSTM model is smaller than that of the other two models in all test cases, and our proposed LSTM model achieves better, lower-error sensing signal measurement than the MLP and ELM models. The LSTM algorithm has the ability to learn complex representations from the sequential features of the spectra of the FBGs. The proposed LSTM model can also avoid the randomness and uncertainty of EAs, giving higher reliability. Thus, even when the spectra of adjacent FBG sensors are fully or partly overlapped, the proposed LSTM model accurately measures the strain sensing signal of the four FBGs.
Furthermore, Figure 14 shows the performance comparison between our proposed LSTM model and an MLP model for different numbers of epochs.
In our proposed LSTM model, a small RMSE is achieved when the epoch number is 1400. Thus, as shown in the figure, even when the spectra of three FBG sensors are overlapped, the RMS errors are 0.024 pm and 0.258 pm for LSTM and MLP, respectively. As a result, our proposed LSTM model achieves better sensing signal measurement performance than MLP at different epoch numbers.
Furthermore, the comparison of the RMS error of the proposed LSTM model with that of the other two models for varying numbers of hidden units is shown in Figure 15. As shown in the figure, the RMS error decreases as the number of hidden units increases, and the LSTM performs better than the other two models. Our proposed LSTM model achieves its best result when the number of hidden units is 824. Therefore, the LSTM-based strain sensing signal measurement method can improve the measurement accuracy and speed and can increase the number of FBG sensors that the system supports, even in harsh or noisy environments.

Conclusions
In this paper, we proposed an LSTM integrated with data de-noising techniques for an IWDM-based multipoint FBG sensor system to improve the sensing signal measurement accuracy. As the performance of our proposed LSTM model depends on the sensing signal measurement of the FBGs, we calculated the sensing signal measurement errors to test the effectiveness of the model. First, we used DWT data de-noising methods to reduce the noise in the noisy FBG signals. Then, we utilized our proposed LSTM model to measure the sensing signal of each FBG from the de-noised FBG signal. The well-trained LSTM model learned from the sequential features of the FBG sensor spectra and can identify the strain sensing signals of the FBGs for the entire FBG sensor system. A significant benefit of the proposed LSTM model is that, in a harsh environment or with noisy sensor data, it does not require retraining or the building of a new model. As a result, the well-trained LSTM model achieves better sensing signal measurement performance even when the spectra of the FBG sensors completely overlap and the sensor data are noisy. Compared with other traditional machine learning techniques, the LSTM model achieves high sensing signal measurement accuracy. Therefore, the proposed LSTM model can increase the number of FBG sensors in the sensor system and improve the sensing signal measurement accuracy of the IWDM FBG sensor network.