Development of Wavelet-Based Tools for Event Related Potentials' N400 Detection: Application to Visual and Auditory Vowelling and Semantic Priming in Arabic Language



Introduction
The Event Related Potential (ERP) N400 wave is a negative deflection elicited by the brain in response to semantically unexpected words in sentence contexts [1]. The N400 component has been found in many languages, such as English, French and Mandarin Chinese, but to our knowledge only a few studies have examined the occurrence of an N400 in Arabic [2].
Whether a written language is shallow or deep (for Arabic, whether words are written vowelled or unvowelled, respectively) depends on how faithfully its orthography reflects its phonology. Written Arabic words are composed of consonants and long vowels, in addition to diacritics. The diacritics indicate the vowelization of the written word, which makes it possible to infer its exact pronunciation. Arabic is also characterized by a non-concatenative morphology whereby every surface form can be analyzed into a consonantal root, which conveys semantic meaning, and a word pattern (made up of vowels and a subset of consonants), which conveys morphosyntactic and phonological information.
In the present experiment, we recorded and analyzed ERPs, in particular the N400 component, while participants performed a semantic judgment task with Arabic words. Like many neural signals, ERPs are very weak and strongly corrupted by noise. Previous studies have therefore sought to improve the quality of ERP signals using statistical methods [3,4], linear and nonlinear adaptive filtering [5], neural-network-based techniques [6] and wavelet denoising techniques [7,8].
In the present study, we aimed to go one step further in improving the quality of ERP signals. We used the discrete wavelet transform (DWT) combined with principal component analysis (PCA) as a nonlinear filtering tool. This allowed us to enhance the signal-to-noise ratio and thereby highlight the N400 component. In addition, we used the Mexican Hat function to perform a time-scale analysis of the filtered ERPs in order to detect the N400 more accurately.

Experiment
Our experiment was approved by the Ethics Committee of Mohammed V University. A total of 20 MSc and PhD students (10 women), aged between 20 and 34 years, were tested after giving written consent to participate. All were right-handed, had no neurological disorders, and used Arabic daily.
Each participant was comfortably seated in a Faraday-shielded room and asked to silently read two words presented successively at the center of a computer screen. A total of 256 Arabic prime-target word pairs were used as stimuli: 128 pairs in the vowelled condition and 128 pairs in the unvowelled condition. In each condition, 64 pairs were semantically related and 64 were semantically unrelated. For both vowelled and unvowelled pairs, two lists were constructed so that, across lists, the same target word was paired once with a semantically related prime and once with a semantically unrelated prime [2]. The order of presentation of the two lists was counterbalanced across participants.
EEG data were continuously recorded using 24 electrodes (impedance < 5 kΩ) mounted on an elastic head cap according to the International 10/20 system [9]. The signals were amplified using SAI amplifiers (San Diego) and recorded at a sampling frequency of 250 Hz. The electro-oculogram (EOG) was recorded from an electrode placed under the right eye to detect eye blinks. Two reference electrodes were placed on the left and right mastoids. The experiment was conducted in a Faraday cage to reduce external interference [10].
The analyzed ERP signals correspond to electrodes F3, F4, C3, C4, P3, P4, Fz, Cz and Pz. Previous results have shown that the N400 component is larger over centro-parietal regions of the right hemisphere than over frontal regions.
At the end of the experiment, three participants were excluded because their EEG signals were contaminated by too many ocular and muscular artifacts.

Data analysis
Since the introduction of wavelets, scientific and technical applications based on this mathematical tool have continued to develop [11,12]. These applications exploit the power and efficiency of wavelets for multiresolution data analysis [13,14]. In the present work, we used a filtering technique developed by Aminghafari et al. [15] to denoise multivariate signals. This method first applies a univariate wavelet decomposition to each signal. It then performs a principal component analysis (PCA) of the resulting wavelet coefficients in order to take the correlation structure of the noise into account.
According to Aminghafari et al. [15], the algorithm performs the filtering in four main steps.

First, for a matrix X (n × p) of p observed signals, each signal is decomposed with the wavelet transform up to a chosen level K. This yields the detail matrices D_1, ..., D_K and the approximation matrix A_K. They contain, respectively, the detail coefficients at each level up to K and the approximation coefficients at level K of the p signals.

Second, the matrix D_1 of finest details is used to compute a robust estimate of the noise covariance matrix with the minimum covariance determinant estimator. This estimate is then diagonalized, and the resulting matrix of eigenvectors is used to change the basis of the detail matrices D_j at each level 1 ≤ j ≤ K.

Third, each detail matrix undergoes classical one-dimensional soft thresholding in this new basis.

Fourth, PCA is applied to the detail and approximation coefficient matrices in order to retain the appropriate number of useful principal components. This number is defined automatically using the Kaiser criterion, which retains the components whose eigenvalues exceed the mean of all eigenvalues.
Inverting the wavelet transform of the simplified matrices D and A yields a new matrix of filtered signals, which retain the main features of the original matrix X.
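The four steps above, followed by the inverse transform, can be sketched in Python with NumPy. This is a simplified illustration under stated assumptions, not the authors' MATLAB code. It makes the following substitutions:

- the Haar wavelet replaces db10, so that the transform fits in a few lines;
- the plain sample covariance of D_1 replaces the robust minimum covariance determinant estimate;
- a universal soft threshold σ√(2 ln n) is applied per decorrelated direction, with σ taken from the median absolute deviation;
- the Kaiser-rule PCA is applied to the approximation matrix only.

All function names are hypothetical. The number of samples n must be divisible by 2^K; for example, 2200 samples allow K = 3. The sketch also assumes p ≥ 2 signals.

```python
import numpy as np

def haar_dwt(x, level):
    """Multilevel Haar DWT applied column-wise to an (n x p) matrix.

    Returns the detail matrices [D_1, ..., D_K] (finest first) and A_K.
    """
    details, approx = [], np.asarray(x, dtype=float)
    for _ in range(level):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)
    return details, approx

def haar_idwt(details, approx):
    """Invert haar_dwt: rebuild the (n x p) matrix from D_1..D_K and A_K."""
    x = approx
    for d in reversed(details):
        out = np.empty((2 * x.shape[0],) + x.shape[1:])
        out[0::2] = (x + d) / np.sqrt(2)
        out[1::2] = (x - d) / np.sqrt(2)
        x = out
    return x

def soft_threshold(c, t):
    """Shrink coefficients toward zero by t (t may be one value per column)."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def kaiser_keep(eigvals):
    """Kaiser rule: keep components whose eigenvalue exceeds the mean eigenvalue."""
    return eigvals > eigvals.mean()

def multivariate_denoise(X, level=3):
    """Hypothetical wavelet + PCA denoiser for an (n x p) matrix, p >= 2."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Step 1: wavelet decomposition of each of the p signals.
    details, approx = haar_dwt(X, level)
    # Step 2: noise covariance from the finest details D_1, then diagonalization.
    # Simplification: sample covariance instead of the robust MCD estimate.
    _, V = np.linalg.eigh(np.cov(details[0], rowvar=False))
    # Noise level of each decorrelated direction via the median absolute deviation.
    sigma = np.median(np.abs(details[0] @ V), axis=0) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(n))
    # Step 3: change basis at every level, soft-threshold, change back (V is orthogonal).
    details = [soft_threshold(D @ V, thr) @ V.T for D in details]
    # Step 4: PCA of the approximation matrix, keeping components by the Kaiser rule.
    mean = approx.mean(axis=0)
    eigvals, W = np.linalg.eigh(np.cov(approx, rowvar=False))
    Wk = W[:, kaiser_keep(eigvals)]
    approx = (approx - mean) @ Wk @ Wk.T + mean
    # Inverse wavelet transform gives the filtered signals.
    return haar_idwt(details, approx)
```

For reference, MATLAB's Wavelet Toolbox provides this multivariate scheme as `wmulden`. A Python version closer to the paper would likely use PyWavelets (`pywt.wavedec`/`pywt.waverec` with `'db10'`) and scikit-learn's `MinCovDet` for the robust covariance.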
To evaluate the performance of the different wavelets, we computed the structural similarity index (SSIM). This index is commonly used to assess image quality on the basis of luminance, contrast and structure, by comparing an image I with a reference image [21,22]. The closer the SSIM value is to 1, the stronger the structural similarity between the evaluated image and the reference image. Conversely, an SSIM value close to 0 indicates that the two images share little structural similarity [21]. To apply this metric, we treated the matrix of output signals of the wavelet-based filter as an image (for each participant, 30 matrices of 9 analyzed electrodes × 2200 samples). We then compared it with the matrix containing the averaged trials for each electrode.
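The SSIM comparison of two electrode × sample matrices can be sketched as follows. This is a single-window (global) variant, assumed here for brevity. The usual SSIM, including MATLAB's `ssim`, averages local values computed over sliding Gaussian windows, so its numbers will differ. The constants K1 = 0.01 and K2 = 0.03 are the customary defaults. Taking the dynamic range from the reference matrix is an assumption.

```python
import numpy as np

def global_ssim(x, y, data_range=None, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally shaped matrices.

    Each matrix (e.g. 9 electrodes x n samples) is treated as one image and
    compared on luminance (means), contrast (variances) and structure (covariance).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if data_range is None:
        # Assumption: dynamic range taken from the reference matrix y.
        data_range = y.max() - y.min()
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))
```

An identical pair gives exactly 1. The more the filtered matrix departs from the averaged reference, the lower the value.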
In addition, we computed the signal-to-noise ratio (SNR) with a MATLAB® routine. This metric is commonly used to assess the performance of signal processing methods. It is usually expressed in decibels as [23,24]:

$$\mathrm{SNR} = 10\log_{10}\left(\frac{P_x}{\sigma^2}\right) \qquad (1)$$

where P_x and σ² denote the power of the original signal and that of the noise, respectively.
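The SNR in decibels, 10·log10(P_x/σ²), can be computed as in the sketch below. The noise is assumed here to be the residual between the original and filtered signals, and both powers are taken as mean squares. The source does not state how the MATLAB routine estimated σ², so this choice is an assumption.

```python
import numpy as np

def snr_db(original, filtered):
    """SNR in dB: 10*log10(P_x / sigma^2).

    Assumption: the noise is the residual original - filtered, so sigma^2 is the
    mean square of that residual and P_x the mean square of the original signal.
    """
    x = np.asarray(original, dtype=float)
    noise = x - np.asarray(filtered, dtype=float)
    p_x = np.mean(x ** 2)
    sigma2 = np.mean(noise ** 2)
    return 10.0 * np.log10(p_x / sigma2)

# A residual with 1/100 of the signal power gives approximately 20 dB.
print(snr_db(np.array([1.0, -1.0, 1.0, -1.0]), np.array([0.9, -0.9, 0.9, -0.9])))
```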
When SNR values were close, we also used the mean square error (MSE), given by equation (2), as a second metric to evaluate the accuracy of the chosen wavelet:

$$\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}\left(x(n)-\hat{x}(n)\right)^2 \qquad (2)$$

where x(n) and x̂(n) denote the original and filtered signals, respectively, and N is the length of x(n). The lower the MSE, the closer the filtered signal is to the original one, and thus the better the filtering method [6,10,21,23].
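The MSE, (1/N)·Σ(x(n) − x̂(n))², and its use as a tie-breaker between wavelets can be sketched as follows. The candidate names passed to `best_by_mse` are hypothetical placeholders for the outputs of different DWT filters.

```python
import numpy as np

def mse(original, filtered):
    """Mean square error (1/N) * sum((x(n) - x_hat(n))**2)."""
    x = np.asarray(original, dtype=float)
    x_hat = np.asarray(filtered, dtype=float)
    if x.shape != x_hat.shape:
        raise ValueError("signals must have the same length")
    return np.mean((x - x_hat) ** 2)

def best_by_mse(original, candidates):
    """Return the key of the filtered signal closest to the original (lowest MSE).

    `candidates` maps a (hypothetical) wavelet name to its filtered output.
    """
    return min(candidates, key=lambda name: mse(original, candidates[name]))
```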
In a second step, we used the continuous wavelet transform (CWT) as an alternative to the classical time-domain representation of the signal. This choice rests on the idea that wavelet analysis can provide an accurate and specific time-frequency decomposition of neurological signals. The CWT has already been applied to EEG denoising [18,24,25], ERP component separation [26], and spindle and spike detection [27,28,29], among other applications. It allows automatic processing of the signal and provides both qualitative and quantitative information.
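The continuous wavelet transform maps a signal onto a highly redundant function of two continuous variables: the scale and the translation. The result is easy to interpret and well suited to time-frequency or time-scale analysis [30]. In general, the CWT of a signal s(t) is defined as:

$$C(\alpha,\beta) = \frac{1}{\sqrt{\alpha}} \int_{-\infty}^{+\infty} s(t)\, \psi^{*}\!\left(\frac{t-\beta}{\alpha}\right) dt \qquad (3)$$

where α > 0 and β represent the scale (dilation) and translation factors, respectively, and ψ* is the complex conjugate of the mother wavelet ψ. The scale α is associated with the wavelet's central frequency F_c and corresponds to a pseudo-frequency (in Hz) given by:

$$F_\alpha = \frac{F_c}{\alpha\,\Delta T}$$

where ΔT is the sampling period.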
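A direct discretization of the CWT can be sketched as follows. The integral C(α, β) = α^(-1/2) ∫ s(t) ψ*((t − β)/α) dt becomes a sum over samples, with scales and translations expressed in samples. Pseudo-frequencies then follow from F = F_c/(α·ΔT). The sketch assumes a real-valued mother wavelet passed as a callable, so the conjugate has no effect.

F_c is left as a parameter. For the Mexican Hat, the peak of its Fourier spectrum lies at √2/(2π) ≈ 0.225 cycles per sample, and toolbox routines may report a slightly different value. The O(n²) per-scale sum is chosen for clarity. For long recordings, an FFT-based convolution would be the usual choice.

```python
import numpy as np

def cwt_real(signal, scales, wavelet):
    """Discretized CWT for a real mother wavelet, scales in samples.

    coefs[i, m] = scales[i]**-0.5 * sum_n s[n] * wavelet((n - m) / scales[i])
    """
    s = np.asarray(signal, dtype=float)
    n = np.arange(s.size)
    coefs = np.empty((len(scales), s.size))
    for i, a in enumerate(scales):
        # Row m of the kernel holds psi((n - m) / a) for every sample n.
        kernel = wavelet((n[None, :] - n[:, None]) / a)
        coefs[i] = kernel @ s / np.sqrt(a)
    return coefs

def scale_to_frequency(scales, fc, dt):
    """Pseudo-frequency in Hz of each scale: F = fc / (scale * dt)."""
    return fc / (np.asarray(scales, dtype=float) * dt)
```

With the 250 Hz sampling rate of this study, ΔT = 0.004 s is the value to pass as `dt`.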
For the time-scale analysis, we represented the modulus of the CWT coefficients, which reflects the energy density of the analyzed signal, as a function of time (abscissa) and log2(α) (ordinate) [12]. In this representation, called a scalogram, a color map quantifies the energy density of the transformed signal. The highest energy values are shown in white and the lowest in black [31,32,33].
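Turning a matrix of CWT coefficients (scales × time) into a black-to-white scalogram can be sketched as follows. The sketch takes the squared modulus as the energy density and rescales it to [0, 1], so that 0 maps to black and 1 to white. Plotting the plain modulus instead only changes the contrast. The function names are hypothetical.

```python
import numpy as np

def scalogram(coefs):
    """Energy density |C(a, b)|**2 rescaled to [0, 1] (0 = black, 1 = white)."""
    energy = np.abs(np.asarray(coefs)) ** 2
    peak = energy.max()
    return energy / peak if peak > 0 else energy

def log2_scale_axis(scales):
    """Ordinate values of the scalogram: log2 of each scale."""
    return np.log2(np.asarray(scales, dtype=float))
```

The result can be displayed, for instance, with `matplotlib.pyplot.imshow(img, cmap="gray", aspect="auto")`, labelling the rows with `log2_scale_axis(scales)`.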
In our application, ψ is the Mexican Hat wavelet (equation 4), which is, up to sign and normalization, the second derivative of the Gaussian function:

$$\psi(t) \propto -\frac{d^2}{dt^2}\, e^{-t^2/2} = (1-t^2)\,e^{-t^2/2} \qquad (4)$$

The Gaussian itself is not a wavelet, since its mean is not zero, but all its derivatives can be used as wavelets, in particular the first and second derivatives [34,35]. In practice, the Mexican Hat is expressed by the real, unit-energy function of equation (5), plotted in Figure 1:

$$\psi(t) = \frac{2}{\sqrt{3}\,\pi^{1/4}}\,(1-t^2)\,e^{-t^2/2} \qquad (5)$$

As illustrated in this figure, the Mexican Hat waveform resembles most of the waves that make up ERPs (Figure 2). This is why we chose it to perform the time-scale analysis of our data set. In addition, this function is easy to implement in MATLAB.
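The real Mexican Hat, ψ(t) = 2/(√3·π^(1/4))·(1 − t²)·e^(−t²/2), can be written and checked numerically in a few lines. The sketch verifies three properties on a fine grid: zero mean (the admissibility property the Gaussian lacks), unit energy, and agreement with the negative second derivative of the Gaussian up to the normalization constant. The grid choice is an assumption.

```python
import numpy as np

NORM = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)  # makes the wavelet unit-energy

def mexican_hat(t):
    """Real Mexican Hat wavelet: NORM * (1 - t^2) * exp(-t^2 / 2)."""
    t = np.asarray(t, dtype=float)
    return NORM * (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

# Numerical checks on a fine grid (step 0.001 over [-8, 8]).
t = np.linspace(-8.0, 8.0, 16001)
dt = t[1] - t[0]
psi = mexican_hat(t)
mean_integral = psi.sum() * dt   # close to 0: zero mean
energy = (psi ** 2).sum() * dt   # close to 1: unit energy
# Compare with -(second derivative of the Gaussian) scaled by NORM.
gauss = np.exp(-t ** 2 / 2.0)
d2 = np.gradient(np.gradient(gauss, dt), dt)
shape_error = np.max(np.abs(-NORM * d2 - psi))
print(mean_integral, energy, shape_error)
```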

Results and discussion
Table 1 presents the averaged values of the performance metrics (SNR and MSE) obtained by applying Aminghafari's algorithm with different DWT functions to our signals. As described above, we recorded 30 ERP signals per electrode from each participant (9 electrodes and 20 participants). We processed 5, 10 and 15 trials to evaluate the efficiency of the filtering algorithm and thus to identify the DWT function that best filtered the ERPs. The 10th-order Daubechies wavelet (db10) yielded the most accurate denoising of the ERP signals: it showed the highest SNR values and the lowest MSE values in all test conditions. As presented in Table 2, db10 also showed the highest SSIM values for all electrodes. In addition, as illustrated in Figure 3, filtering the ERPs with db10 clearly improved the visual quality of the waveform plots.
We visually compared these results with those obtained using the classical averaging method implemented in the EEGLAB toolbox. For each electrode, our method yielded plots of the same visual quality as EEGLAB averaging over all recorded signals, for SNR values of about 3.5. However, with our db10-PCA filtering method, only 5 to 10 ERP trials were sufficient to improve the SNR and to reveal the occurrence of an N400 component. This can be explained by the fact that our filtering method is a nonlinear technique that takes into account the statistics of the signal and the way the noise affects it. By contrast, averaging is a linear method that assumes the noise is additive, white and Gaussian.
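Under the additive white Gaussian noise model assumed by classical averaging, the noise power of an N-trial average falls as 1/N. The SNR therefore improves by only 10·log10(N) dB, about 10 dB per tenfold increase in the number of trials, which is why averaging needs many trials. The sketch below illustrates this with synthetic data: the 2 Hz waveform, unit noise variance and trial counts are hypothetical and are not the recorded EEG.

```python
import numpy as np

def average_trials(trials):
    """Classical ERP estimate: sample mean across trials (rows)."""
    return np.mean(np.asarray(trials, dtype=float), axis=0)

def snr_db(signal, estimate):
    """SNR of an estimate, with the estimation error treated as noise."""
    noise = estimate - signal
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(1)
t = np.arange(2000) / 250.0                 # 8 s at 250 Hz
erp = np.sin(2 * np.pi * 2 * t)             # hypothetical noise-free waveform
for n_trials in (1, 10, 100):
    trials = erp + rng.standard_normal((n_trials, t.size))  # additive white Gaussian noise
    # Expected: roughly -3, 7 and 17 dB (a 10 dB gain per tenfold trials).
    print(n_trials, round(snr_db(erp, average_trials(trials)), 1))
```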
To further improve the accuracy of N400 detection, we computed a time-scale representation of the ERP signals based on the Mexican Hat CWT. An example of the results obtained for the 9 recorded electrodes is shown in Figure 4. Visual reading of these scalograms revealed light-colored vertical bands at medium and small scales, which reflect finer changes throughout the ERP signal. In particular, high-energy concentrations are observed in the time range of the N400 component [360; 470 ms] and for scales between 3 and 8. These regions correspond to an energy maximum around the position of the N400 component in the temporal representation of the ERP signal.
These results, together with similar ones obtained by processing the entire data set available for this study, allowed us to define a qualitative criterion for detecting the occurrence and position of the N400. This criterion relies on the localization of the maximum energy in time and scale. Moreover, we showed in previous work that primed Arabic words elicit smaller N400 components than unprimed Arabic words [2]. The CWT analysis confirms this result. For unprimed Arabic words, a high-energy band is detected between about 380 and 410 ms, that is, within a narrow range of about 30 ms. For primed words, this energy region has very low intensity.
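The energy-maximum criterion can be sketched as a search for the scalogram peak inside a time-scale window. The default window uses the ranges reported above: 360 to 470 ms, and scales 3 to 8, which the sketch assumes refer to α rather than log2(α). The presence rule compares the peak energy with a threshold. That threshold is a hypothetical parameter, since the criterion described here is based on visual reading.

```python
import numpy as np

def n400_peak(energy, times_ms, scales, t_window=(360.0, 470.0), s_window=(3.0, 8.0)):
    """Locate the scalogram maximum inside a time-scale search window.

    energy: array (n_scales, n_times); times_ms and scales label its axes.
    Returns (peak_time_ms, peak_scale, peak_energy).
    """
    energy = np.asarray(energy, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    scales = np.asarray(scales, dtype=float)
    rows = (scales >= s_window[0]) & (scales <= s_window[1])
    cols = (times_ms >= t_window[0]) & (times_ms <= t_window[1])
    sub = energy[np.ix_(rows, cols)]
    i, j = np.unravel_index(np.argmax(sub), sub.shape)
    return times_ms[cols][j], scales[rows][i], sub[i, j]

def n400_present(energy, times_ms, scales, threshold, **windows):
    """Hypothetical decision rule: flag an N400 if the windowed peak exceeds threshold."""
    return n400_peak(energy, times_ms, scales, **windows)[2] > threshold
```

Comparing the peak energies returned for primed and unprimed conditions would give a quantitative counterpart to the visual comparison described above.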

Conclusion
In this study, we used wavelets as signal processing tools to improve the quality of ERP signals recorded during a semantic priming task in Arabic. We used the discrete wavelet transform to denoise the recorded signals. The filtering procedure combined the DWT and PCA into a nonlinear filter that improved the visual quality of ERP plots based on only a few trials. Using the SSIM, SNR and MSE metrics, we showed that the 10th-order Daubechies wavelet (db10) was the most efficient at improving the SNR and thus at revealing the occurrence of the N400 component. Finally, visual comparison with the results obtained using EEGLAB tools showed that our procedure clearly improved the quality of the ERP plots.
In a second part, we used the continuous wavelet transform based on the Mexican Hat function to perform the time-scale analysis of the filtered ERPs.The resulting scalograms allowed us to define qualitative and quantitative criteria to detect the presence of the N400 component in the auditory and visual evoked signals.
The qualitative criterion consists of visually reading the energy density representation. The quantitative criterion consists of precisely locating the maximum of the N400 energy in time and scale. With these criteria, we found that, for unprimed Arabic words, a high-energy band is detected between 3 and 8 on the scale axis and in the time range [380; 410 ms]. This range corresponds to the usual position of the N400 wave in the temporal representation of an ERP. For primed Arabic words, only a very low-intensity energy region is present in the time range [360; 470 ms].

Figure 1. Plot of the real Mexican Hat function.

Table 1: Averaged values of the SNR and MSE metrics obtained using 5, 10 and 15 EEG trials.

Table 2: Averaged values of the structural similarity index metric (SSIM).