Noise Cancellation Algorithm Based on Air- and Bone-Conducted Speech Signals by Considering an Unscented Transformation Method

Noise control is essential when applying speech recognition in noisy environments such as factories. In this study, a signal processing method for noise cancellation is proposed that uses a noise-insensitive bone-conducted speech signal together with an air-conducted speech signal. Speech signals are generally expressed by nonlinear models. The extended Kalman filter is a well-known state estimation method for nonlinear systems; however, it requires a linearized approximation of the nonlinear system. By using sample points called sigma points, the unscented Kalman filter (UKF) can be applied to a nonlinear system model without linear approximation. In this study, a new method based on the UKF is proposed. Whereas the UKF assumes Gaussian noise, the proposed method extends it to non-Gaussian noise. A noise cancellation method is derived by using air- and bone-conducted speech signals. The validity of the method is investigated by using both types of conducted speech signal measured in a real noisy environment.


Introduction
Recently, speech recognition systems have come into use in car navigation systems, smart speakers, and similar devices. However, speech recognition cannot be performed effectively in environments with heavy noise, so countermeasures against surrounding noise are indispensable in such situations. The Kalman filter has been applied in many noisy environments as a noise cancellation method for speech signals [1], [2], [3]. This filter assumes a linear model subject to white Gaussian noise for both the system equation and the observation equation [4], [5]. Although the extended Kalman filter (EKF) [6] can be applied to nonlinear systems, it requires linear approximation models of those systems. Therefore, many improvements are necessary before such noise cancellation methods can be applied to actual speech signal processing. From this viewpoint, in our previously reported study, a noise cancellation method using air- and bone-conducted speech signals was proposed for situations contaminated by non-Gaussian and non-white noise [7].
However, since the calculation of the expansion coefficients in the previous algorithm was very complicated, a simplified method is required.
On the other hand, the unscented Kalman filter (UKF), which uses the unscented transformation (UT) method, can be applied to nonlinear systems [9]. The UT method is a technique for calculating the statistics of a random variable that has undergone a nonlinear transformation. A set of samples, the so-called sigma points (σ-points), is chosen so that it captures specific properties of the underlying distribution. Therefore, this method can be applied to arbitrary nonlinear systems. In our previous study, a noise cancellation method based on only the air-conducted speech signal was proposed by applying the UT method [8].
In this study, a new noise cancellation algorithm based on air- and bone-conducted speech signals is proposed by considering the UT method. The relationship between the air-conducted speech signal and the background noise is expressed as an additive model based on the additive property of sound pressure. However, the propagation mechanism of the bone-conducted speech signal is complicated and, in general, has to be treated as an unknown system. Therefore, a system model including unknown parameters is introduced in this study. More specifically, the sample points obtained by the UT method are introduced, and the noise cancellation algorithm is derived by use of an expansion expression of Bayes' theorem. This method can take into account the non-Gaussian properties of the noise and the nonlinear correlation information between the speech signal and the observations. Furthermore, the validity of the proposed method is experimentally confirmed by applying it to real speech signals with noise.

Modeling of Air- and Bone-Conducted Speech Signals
We consider the original speech signal $x_k$, the observed air-conducted speech signal $y_k$, and the bone-conducted speech signal $z_k$ at discrete time $k$. The observation $y_k$ is contaminated by a surrounding noise $v_k$. According to the additive property of sound pressure, the following relationship can be established:
$$y_k = x_k + v_k, \tag{1}$$
where the mean and variance of $v_k$ are known. In order to derive the propagation model of the bone-conducted speech signal, the correlation information between $x_k$ and $z_k$ is required. However, it is difficult to obtain prior information on the unknown speech signal $x_k$. In this study, a new adaptive algorithm for noise cancellation is proposed by introducing, as the bone-conducted speech signal model for $z_k$, a propagation model (2) with unknown parameters between $x_k$ and $z_k$, in which $w_k$ is a random noise with mean 0 and variance 1, and $a_k$ and $b_k$ are unknown parameters.
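To make the two observation models concrete, the following minimal sketch generates synthetic air- and bone-conducted observations from a clean test signal. The air-conducted path uses the additive model of Eq. (1). The bone-conducted path uses an assumed linear form $z_k = a x_k + b w_k$, chosen here only for illustration because it is consistent with the description of $w_k$, $a_k$ and $b_k$; the function name, the parameter values, and the uniform (non-Gaussian) noise are likewise hypothetical choices, not taken from the experiment.

```python
import numpy as np

def simulate_observations(x, noise_std, a=0.5, b=0.1, seed=0):
    """Generate air- and bone-conducted observations from a clean signal x.

    Air-conducted:  y_k = x_k + v_k  (additive model, Eq. (1)).
    Bone-conducted: z_k = a * x_k + b * w_k  (ASSUMED linear form; a and b
    are hypothetical constants, w_k has mean 0 and variance 1).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Uniform noise on [-sqrt(3), sqrt(3)] has unit variance and is
    # non-Gaussian; scale it to the requested standard deviation.
    v = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=x.shape) * noise_std
    w = rng.standard_normal(x.shape)
    y = x + v
    z = a * x + b * w
    return y, z

# Hypothetical test signal: one second of a 200 Hz tone sampled at 8 kHz.
k = np.arange(8000)
x = np.sin(2 * np.pi * 200 * k / 8000)
y, z = simulate_observations(x, noise_std=1.0)
```

With these illustrative settings the bone-conducted signal is attenuated but much less noisy than the air-conducted one, which is the property the estimation algorithm is designed to exploit.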

Estimation Method Combining Bayes' Theorem with the UT Method
The conditional joint probability distribution of the specific signal $x_k$ and the unknown parameters $a_k$ and $b_k$ is expressed by using an expansion expression of Bayes' theorem [10]. To simplify the derivation of the estimation algorithm, σ-points and weighting coefficients are introduced into the expansion coefficients. The conditional probability distribution of $x_k$, $a_k$ and $b_k$ is expressed by the expansion (3), with the expansion coefficients $A_{lmnst}$ defined by (4), where $Y_k$ $(= \{y_1, y_2, \ldots, y_k\})$ and $Z_k$ $(= \{z_1, z_2, \ldots, z_k\})$ are the sets of air- and bone-conducted speech signal data up to time $k$. The five functions $\phi^{(1)}_l(x_k)$, $\phi^{(2)}_m(a_k)$, $\phi^{(3)}_n(b_k)$, $\phi^{(4)}_s(y_k)$ and $\phi^{(5)}_t(z_k)$ are orthonormal polynomials of degrees $l$, $m$, $n$, $s$ and $t$ with respect to the weighting functions $P_0(\cdot)$, which can be chosen as the probability functions describing the dominant part of the fluctuation. As an example of a standard probability function, the Gaussian distribution is adopted for each of the five weighting distributions (5), with the parameters defined in (6). The orthonormal polynomials [11] associated with the five weighting probability distributions in Eq. (5) are then specified accordingly.

The estimates of the mean and variance (i.e., the conditional mean and variance) of $x_k$, $a_k$ and $b_k$, which are the first- and second-order statistics, can be expressed in terms of the expansion coefficients (the last of these expressions being Eq. (17)), where the coefficients appearing in these expressions are appropriate constants determined by an accompanying equality.

By using the UT method, the expansion coefficients defined by (4) can be realized for arbitrary nonlinear systems. When the UT method is applied to approximate the means and variances of $x_k$, $a_k$, $b_k$, $y_k$ and $z_k$, the σ-points of each variable are obtained as sample points. The σ-points are chosen so as to reproduce approximately the same mean and variance as the original variables, where $\lambda$ is a regulation parameter. The weighting coefficients $W^{(i)}$ assigned to the σ-points have to satisfy the normalization constraint.
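The σ-point construction described above can be illustrated with a minimal sketch for a single scalar variable. It uses the standard symmetric form of the unscented transformation (a centre point plus two points at $\pm\sqrt{(n+\lambda)\Gamma}$, with weights summing to one); this is an assumption, since the exact σ-point and weight expressions of the paper are not reproduced here. The function names and the value $\lambda = 2$ are hypothetical.

```python
import numpy as np

def sigma_points_1d(mean, var, lam=2.0):
    """Standard symmetric sigma points and weights for a scalar variable (n = 1)."""
    n = 1
    spread = np.sqrt((n + lam) * var)
    points = np.array([mean, mean + spread, mean - spread])
    weights = np.array([lam / (n + lam), 0.5 / (n + lam), 0.5 / (n + lam)])
    return points, weights

def unscented_moments(f, mean, var, lam=2.0):
    """Approximate the mean and variance of f(x) by weighting f at the sigma points."""
    points, weights = sigma_points_1d(mean, var, lam)
    fx = f(points)
    m = np.sum(weights * fx)
    v = np.sum(weights * (fx - m) ** 2)
    return m, v

pts, w = sigma_points_1d(1.0, 4.0)

# The identity map must return the original mean and variance.
m_id, v_id = unscented_moments(lambda x: x, 1.0, 4.0)

# A nonlinear map: moments of x**2 for a zero-mean, unit-variance variable.
m_sq, v_sq = unscented_moments(lambda x: x ** 2, 0.0, 1.0)
```

The weights satisfy the normalization constraint by construction, and the identity check shows that the sample points reproduce the mean and variance of the original variable. No linearization of the map `f` is needed, which is the point of using the UT method instead of the EKF.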
Each expansion coefficient $A_{lmnst}$ defined by (4) is obtained specifically by substituting the σ-points (including $z^{*(i)}_k$) into the conditional expectations involving $x_k$, $y_k$ and $z_k$, with $A_{00000} = 1$ and $A_{10000} = A_{00010} = A_{00001} = A_{20000} = A_{00020} = A_{00002} = 0$. Furthermore, the expansion coefficients of $a_k$, $y_k$ and $z_k$ are expressed in a similar form. The expansion coefficients of $b_k$, $y_k$ and $z_k$ ($A_{00110}$, $A_{00120}$, $A_{00210}$, $A_{00220}$, $A_{00101}$, $A_{00102}$, $A_{00201}$, $A_{00202}$, $A_{00111}$, $A_{00211}$, $A_{00121}$, $A_{00112}$, $A_{00212}$, $A_{00122}$, $A_{00221}$, $A_{00222}$) are calculated in the same manner.
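For a Gaussian weighting function, the orthonormal polynomials are the normalized probabilists' Hermite polynomials, $\phi_n(x) = He_n\big((x-\mu)/\sigma\big)/\sqrt{n!}$. The sketch below builds the first few of them with NumPy and checks their orthonormality numerically by Gauss-Hermite quadrature. If the expansion coefficients are taken to be expectations of products of such polynomials (an assumption about the form of Eq. (4)), this orthonormality is what makes the zeroth-order coefficient equal to one and the isolated first- and second-order coefficients vanish. The helper names and the values of the mean and variance are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

def phi(n, x, mean=0.0, var=1.0):
    """Degree-n polynomial orthonormal w.r.t. a Gaussian with the given mean and variance."""
    u = (np.asarray(x, dtype=float) - mean) / math.sqrt(var)
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0  # select He_n in the Hermite series
    return He.hermeval(u, coeffs) / math.sqrt(math.factorial(n))

def gaussian_expectation(f, mean, var, order=20):
    """E[f(x)] for Gaussian x, computed by Gauss-Hermite quadrature."""
    nodes, weights = He.hermegauss(order)  # weight function exp(-u**2 / 2)
    x = mean + math.sqrt(var) * nodes
    return float(np.sum(weights * f(x)) / math.sqrt(2.0 * math.pi))

mu, gamma = 1.0, 4.0  # illustrative mean and variance
gram = np.array([
    [gaussian_expectation(lambda x: phi(m, x, mu, gamma) * phi(n, x, mu, gamma), mu, gamma)
     for n in range(3)]
    for m in range(3)
])
```

The matrix `gram` is the identity up to rounding error, confirming that polynomials of degrees 0, 1 and 2 are mutually orthonormal under the Gaussian weighting.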
After substituting Eqs. (1) and (2) into the definitions of the four parameters $y^*_k$, $\Omega_k$, $z^*_k$ and $\Phi_k$ in (6), explicit expressions for these parameters can be derived.
In order to derive the predicted values of the speech signal $x_k$ and the unknown parameters $a_k$ and $b_k$, the time transition of the speech signal $x_k$ is expressed as follows:
$$x_{k+1} = F x_k + G u_k,$$
where $u_k$ is a random input with mean 0 and variance 1. The parameters $F$ and $G$ are calculated from the time correlation information of $x_k$ and $x_{k+1}$, from which the predictions $x^*_{k+1}$ and $\Gamma_{x_{k+1}}$ can be expressed. Since the parameters $a_k$ and $b_k$ are constants, time transition models are introduced for their recursive estimation.
By using these relationships, the predictions of $a_k$ and $b_k$ are obtained. This completes the state estimation algorithm, in which the expansion coefficients $A_{lmnst}$ reflect the linear and nonlinear correlation information among the variables and the statistics of the non-Gaussian noise.
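The prediction step for the speech signal can be sketched as follows, assuming a first-order linear transition $x_{k+1} = F x_k + G u_k$ with $u_k$ independent of $x_k$, and a zero-mean, stationary signal. Under these assumptions $F$ is the lag-one correlation ratio and $G^2$ is the residual variance; the exact expressions used in the paper are not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def fit_transition(x):
    """Estimate F and G of x[k+1] = F*x[k] + G*u[k] from the time correlation of x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                 # work with a zero-mean signal
    r0 = np.mean(x[:-1] ** 2)        # variance of x[k]
    r1 = np.mean(x[1:] * x[:-1])     # correlation between x[k+1] and x[k]
    F = r1 / r0
    # Var(x[k+1]) = F**2 * Var(x[k]) + G**2 when u[k] is independent of x[k].
    G = np.sqrt(max(np.mean(x[1:] ** 2) - F ** 2 * r0, 0.0))
    return F, G

def predict_state(x_hat, var_hat, F, G):
    """One-step prediction of the mean and variance from the current estimates."""
    return F * x_hat, F ** 2 * var_hat + G ** 2

# Synthetic check: generate a sequence with known F = 0.9, G = 0.5.
rng = np.random.default_rng(0)
x = np.zeros(20000)
for k in range(len(x) - 1):
    x[k + 1] = 0.9 * x[k] + 0.5 * rng.standard_normal()

F_est, G_est = fit_transition(x)
x_pred, var_pred = predict_state(x_hat=1.0, var_hat=0.2, F=F_est, G=G_est)
```

For the constant parameters, a natural assumption is that the transition is the identity, so that the current estimate and its variance are carried forward unchanged as the prediction for the next time step.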

Experiment
In order to confirm the validity of the proposed noise cancellation algorithm, we compared it with a method using only the air-conducted speech signal. The compared method was derived by considering the conditional probability distribution given in (35).
Based on (35), the estimates of the mean and variance of $x_k$ for the compared method are derived.

Male and female speech signals were used in the experiment. The speech signal data were measured in the anechoic chamber of the acoustic laboratory. The observed speech signals were contaminated with white noise, pink noise and machine noise, respectively. The spectra of these noises are shown in Figures 1-3. The observation data of the air-conducted speech signal were created by mixing the noises with the speech signal on a computer. Table 1 shows the specifications of the personal computer used for signal processing in the experiment. The processing time for a speech signal of about 3.5 seconds in length was 0.5 to 0.8 seconds. To evaluate the estimation results, the root mean square error (RMSE) and the performance evaluation index (PEI) are adopted.
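The two evaluation measures can be computed as in the sketch below. The RMSE follows its standard definition. The PEI is written here as the ratio of signal energy to estimation-error energy in decibels, which is an assumed definition consistent with "larger is better"; the definition used in the paper is not reproduced here and may differ. The function names and the toy data are hypothetical.

```python
import numpy as np

def rmse(x_true, x_est):
    """Root mean square error between the original and estimated signals."""
    x_true = np.asarray(x_true, dtype=float)
    x_est = np.asarray(x_est, dtype=float)
    return float(np.sqrt(np.mean((x_true - x_est) ** 2)))

def pei_db(x_true, x_est):
    """ASSUMED PEI: signal energy over error energy, in dB (larger is better)."""
    x_true = np.asarray(x_true, dtype=float)
    x_est = np.asarray(x_est, dtype=float)
    err_energy = np.sum((x_true - x_est) ** 2)
    return float(10.0 * np.log10(np.sum(x_true ** 2) / err_energy))

# Toy example: an estimate with a constant offset of 0.1 from the true signal.
x_true = np.array([1.0, -1.0, 1.0, -1.0])
x_est = x_true + 0.1
print(rmse(x_true, x_est), pei_db(x_true, x_est))
```

In this toy case the RMSE equals the offset (about 0.1), and the assumed PEI is about 20 dB because the error energy is one hundredth of the signal energy.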
The smaller the RMSE, the better the estimation result; conversely, the larger the PEI, the better the estimation. Table 2 and Table 3 show the results for the male and female speech signals, respectively. When the noise level is low, the proposed method and the compared method give almost the same estimation results. When the noise level is high, the proposed method gives better results than the compared method. Furthermore, the proposed method achieves almost the same estimation accuracy as our previous method [7], which relies on a more complicated algorithm. Therefore, the advantage of the proposed method with its simplified algorithm could be confirmed.

Some of the waveforms for the results summarized in Tables 2 and 3 are shown in Figures 4-23. Figures 4 and 14 show the original male and female speech signals, respectively. Figures 5, 8, 11, 15, 18 and 21 show the speech signals contaminated by noise with an amplitude three times that of the original signals. The results estimated by the proposed method are shown in Figures 6, 9, 12, 16, 19 and 22, and the results of the compared method are shown in Figures 7, 10, 13, 17, 20 and 23. For contamination by white noise and pink noise, the proposed method gives better results than the compared method.

Conclusion
In this paper, a new method for suppressing noise in speech signals has been proposed, which is applicable to actual environments with non-Gaussian and non-white noise. The aim of the proposed method is to improve the estimation accuracy by using air- and bone-conducted speech signals.
The proposed method considers the σ-points not only of $x_k$ but also of the unknown parameters $a_k$ and $b_k$ and the observations $y_k$ and $z_k$. Moreover, the method incorporates higher-order correlation information between the σ-points. The algorithm has been realized by using Bayes' theorem as the fundamental principle of estimation together with the UT method based on σ-points. The algorithm has been applied to real speech signals contaminated by noise, and the experiments have shown that it gives better estimation results than the method that does not use the bone-conducted speech signal. However, we have not yet applied the proposed algorithm to actual speech recognition with voice recognition software. Therefore, the effectiveness of the theory still has to be confirmed experimentally by applying the algorithm to a speech recognition system.
The proposed approach is quite different from traditional standard techniques. However, it is still at an early stage of development, and a number of practical problems remain to be investigated. These include: (i) introduction of a realistic nonlinear model expressing the actual propagation characteristics of the bone-conducted speech signal in place of the simple model in (2); (ii) consideration of higher-order expansion coefficients $A_{lmnst}$ ($l, m, n, s, t \geq 3$) in the estimation algorithm; and (iii) selection of an optimal position for the sensor that measures the bone-conducted speech signal.