Artificial Intelligence Approach for Target Classification: A State of the Art

Online: 09 August, 2020


Introduction
The desire to organize in order to simplify has progressively evolved towards the ambition of classifying to understand and, perhaps, to predict. This development has led to the release of several new techniques capable of satisfying this need. This is how artificial intelligence and the study of its different techniques have become a trend that attracts the interest of researchers in different fields.
Neural networks consist of artificial neurons or nodes that are analogous to biological neurons [1]-[3]. They are the result of an attempt to design a very simplified mathematical model of the human brain, based on the way we learn and correct our mistakes. Machine learning allows us to obtain computers capable of self-improvement through experience [4].
The latter is today one of the most developed technical fields, bringing together computer science and statistics and driving the development of artificial intelligence and data science [5]. Due to the explosion in the amount of information available online and at low cost, machine learning has experienced tremendous progress, resulting in the continuous development of new algorithms [6]. Data-intensive machine learning methods are now used in all fields of technology and science [1], [5], [6].
Many companies and researchers today claim to use artificial intelligence, when in fact the term does not apply to the technologies they use. In the same vein, there is some confusion between artificial intelligence and the concept of Machine Learning, not to mention Deep Learning. This paper sheds light on these different concepts by detailing each of them, and then focuses on the different algorithms employed to extract features from and classify detected objects.
This work complements the various state-of-the-art studies already carried out in the AI domain [1]-[3] and is mainly interested in its contribution to road safety through radar target classification (pedestrians, cyclists…). To this end, section 2 presents a study on the classification of radar signals and targets, including tracking. Section 3 then focuses on artificial intelligence, detailing its different concepts in order to eliminate any confusion. We also study the feature extractors and classification algorithms generally used in the overall classification process. A summary bringing together the results of the research work in the literature in this field is given before concluding, thus opening the way to many perspectives.

Radar Signal Classification
Classification can be defined as the search for a distribution of a set of elements into several categories [7]. Each category, called a class, groups together individuals who share similar characteristics. The objective is to obtain the most homogeneous and distinct classes possible. Identifying categories requires careful definition of the space in which the classification problem must be resolved [7,8]. Such a space is often represented by vectors of parameters, as shown in Figure 3, extracted from the elements to be classified, and the classification is carried out by adopting a probabilistic, discriminative, neuronal or even stochastic approach.
The method of classification of radar signals is presented in Figure 4, which contains the neural network procedures. Their continuous development and improvement have made it possible to clearly understand the potential and the limits of this technique in several fields, among them remote sensing, signal processing, and the identification and characterization of targets [1,9].
Within these modes of classification, we find the cluster method, which seeks to construct a partition of a data set so that data from the same group exhibit common properties or characteristics that distinguish them from the data contained in the other groups [10,11]. As such, clustering (or regrouping) is a research subject in learning stemming from a more general problem, namely classification. A distinction is made between supervised and unsupervised classification. In the first case, the task is to learn to classify a new individual among a set of predefined classes, from training data (pairs (individual, class)). Derived from statistics, and more specifically from Data Analysis (ADD), unsupervised classification, as its name suggests, consists of learning without a supervisor: from a population, the task is to extract classes or groups of individuals with common characteristics, the number and definition of the classes not being given a priori. Clustering methods are used in many application domains, ranging from biology (classification of proteins or genome sequences) to document analysis (texts, images, videos) or the analysis of usage traces. Many unsupervised classification methods have been published in the literature, and it is therefore hard to give an exhaustive list, despite the numerous articles attempting to structure this very rich field, which has been evolving constantly for more than 40 years [12][13][14][15][16][17][18].
Entropy of similarity is a method derived from Shannon's entropy. Shannon named "entropy" his measure of the amount of information; we therefore speak interchangeably of the quantity of information generated by a message source or of the entropy of that source. A classification method based on entropy then makes it possible to answer several questions encountered in sociological or scientific surveys, namely the measurement of the correlation between characters and their selectivity, the homogeneity of groups, and the formation of new homogeneous classes [19].
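For reference, Shannon's entropy of a discrete source emitting symbols with probabilities p_i is the quantity underlying this family of methods (the similarity-based variant used in [19] may define it differently):

H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i

It measures the average quantity of information, in bits, generated per symbol; homogeneous groups correspond to low-entropy distributions of characters.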
Entropy methods are dedicated to the analysis of irregular and complex signals [20]. The support vector machine [7,8], or large-margin separator, represents a set of supervised learning techniques intended to deal with problems of discrimination and regression. Support vector machines are an extension of linear classifiers. They are used because of their ability to work with large amounts of data, their low number of hyperparameters, their performance and their reliability. SVMs have been used in many fields [21][22][23].
Depending on the data, the performance of support vector machines is of the same order as, or even better than, that of a neural network or a Gaussian mixture model. Other approaches rely on time-scale characteristics [24], the modulation domain [25], basis function neural networks [26], the Rihaczek distribution and the Hough transform [27], which is a pattern recognition technique invented in 1959 by Paul Hough, subject to a patent, and used in digital image processing: its simplest application detects lines present in an image, but the technique can be modified to detect other geometric shapes, giving the generalized Hough transform developed by Richard Duda and Peter Hart in 1972 [28][29][30]. Further approaches use frequency estimation [28], pulse repetition intervals [31], the two-dimensional bispectrum [32], etc.
These classification methods are the subject of research in several disciplines [22,30,[33][34][35]. To allow proper operation in complex signal environments with many radar transmitters, signal classification should be able to handle undetermined, corrupted, and ambiguous measurements reliably.
For computationally efficient radar classification of vehicle type and determination of speed, Cho and Tseng [35] created an improved algorithm intended for real-time smart transport applications, containing eight mode-setting categories for the classification of radar signals.
In general, during a transmission the scattered waves depend on the distance to the target, so measuring these waves requires measuring this distance. In the literature, the received scattered waves are processed by algorithms to detect the presence, the distance and the type of the target [36][37][38][39][40]. Among the target classification methods are the AALF, AALP and ABP methods [24].
Tracking is a very important element in this process. There are different tracking methods summarized in Figure 5 and Figure 6.

Introduction to Artificial Intelligence
Artificial intelligence (AI) has come to the fore in recent years. It is used in several applications across various disciplines [41][42][43][44][45][46][47][48][49][50][51][52][53][54]. Artificial Intelligence as we know it is weak AI, as opposed to strong AI, which does not yet exist. Today, machines are capable of reproducing human behavior, but without consciousness. In the future, their capabilities could grow to the point of turning them into machines endowed with consciousness and sensitivity.
AI has evolved considerably thanks to the emergence of Cloud Computing and Big Data, which provide inexpensive computing power and access to large amounts of data. Thus, machines are no longer simply programmed; they learn instead [53,54].
The following subsections aim to highlight machine learning, deep learning and extreme learning machines respectively, in order to eliminate any confusion between these concepts.

Machine Learning
Machine Learning is a sub-branch of artificial intelligence which consists of creating algorithms capable of improving automatically with experience [45][46][47][48][49][50]. We also speak in this case of self-learning systems. Machine Learning, or automatic learning, is capable of reproducing a behavior thanks to algorithms that are themselves fed by large amounts of data. Faced with many situations, the algorithm learns which behavior to follow and which decision to take, creating a model; the machine then automates tasks depending on the situation [54][55][56][57][58][59][60][61][62]. There are three main types of Machine Learning [60]-[66], represented in Figure 7. In supervised learning, the algorithms are trained on already categorized datasets in order to understand the criteria used for classification and reproduce them [67]-[70]. In unsupervised learning, algorithms are trained on raw data, from which they try to extract patterns [71]-[76]. Finally, in reinforcement learning, the algorithm functions as an autonomous agent which observes its environment and learns as it interacts with it [73], [75].
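A minimal sketch of the first two types using scikit-learn follows; the synthetic data and the model choices (logistic regression, k-means) are illustrative assumptions, not the algorithms of the cited works.

```python
# Minimal sketch of supervised vs. unsupervised learning with scikit-learn.
# The synthetic data and model choices are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic "target classes" (e.g. two kinds of feature vectors).
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)                 # labels, known a priori

# Supervised learning: the classes are given, the model learns the mapping.
clf = LogisticRegression().fit(X, y)
print("supervised prediction:", clf.predict([[2.5, 2.5]]))

# Unsupervised learning: no labels, the algorithm discovers the grouping.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster assignments:", clusters[:5], clusters[-5:])

# Reinforcement learning (not shown) would instead train an agent from
# rewards obtained by interacting with an environment.
```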
Machine learning is a broad field which includes many algorithms. Among the most famous are regressions (linear, multivariate, polynomial, regularized, logistic, etc.), which are curves that approximate the data; the Naïve Bayes algorithm, which gives the probability of a prediction given knowledge of previous events [77][78][79][80][81][82]; and clustering, which groups the data into packets so that within each packet the data are as close as possible to each other [83][84][85]; it is used in particular to recommend films close to the films we have already seen. There are also decision trees, in which we arrive at a result (with a probability score) by answering a certain number of questions and following the branches of the tree carrying these answers [86,87].
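As an illustration of the probability scores mentioned for decision trees, the following sketch (synthetic, illustrative data) trains a small tree and reads out class probabilities.

```python
# Hypothetical example: a decision tree returning a class probability score.
from sklearn.tree import DecisionTreeClassifier

# Toy feature vectors (e.g. [speed, size]) and their classes (0 or 1).
X = [[1.2, 0.5], [1.0, 0.4], [6.0, 1.8], [5.5, 2.0], [1.1, 0.6], [6.2, 1.9]]
y = [0, 0, 1, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# predict_proba answers the chain of questions in the tree and returns
# the probability of each class at the reached leaf.
print(tree.predict_proba([[5.8, 1.7]]))   # e.g. [[0. 1.]]
```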
There are also more sophisticated algorithms based on several statistical techniques, such as the Random Forest (a forest of voting decision trees), Gradient Boosting and the Support Vector Machine [21], [23]. The learning techniques marked with (*) in Figure 7 have emerged recently, with their use mainly limited to object recognition, including the classification of radar targets in urban areas.
Machine learning is a very important approach for classification. For instance, the classification of a single target, e.g. a pedestrian or a cyclist, is relatively simple because the micro-Doppler signatures of the pedestrian and the cyclist are different; the problem arises when classifying overlapping targets, e.g. pedestrians and cyclists together. The classification here is much more difficult, which requires the intervention of deep learning techniques to deal with this issue.

Deep Learning
Deep Learning is itself a sub-domain of ML, in which we develop algorithms capable of recognizing abstract concepts, like a young child who is taught to distinguish a dog from a horse [88]. Deep Learning aims to understand concepts in a more precise way, which is achieved by refining the data through nonlinear processing; its functioning is similar to that of the human brain [89][90][91][92]. In a neural network, successive layers of data are combined to learn the concepts. The simplest networks have only two layers, an input and an output, knowing that each one can have several hundreds, thousands or even millions of neurons. The more layers are added, the more the network's capacity to learn abstract representations develops [91]. Among the most used deep learning algorithms, we have:
• Artificial neural networks (ANN): these are the simplest and are often used as a complement because they sort information well
• Convolutional neural networks (CNN): these apply filters to the information collected in order to obtain new data (for example, bringing out the contours in an image can help to find where the face is)
• Recurrent neural networks (RNN): the best known are LSTMs, which have the ability to retain information and reuse it shortly afterwards. They are used for text analysis (NLP), since each word depends on the previous few words (so that the grammar is correct)
There are also more advanced variants, such as auto-encoders [93], Boltzmann machines, self-organizing maps (SOM), etc. Figure 8 shows the key algorithms of deep learning and the research fields interested in them.
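As an illustration of these architectures, a minimal convolutional network sketch in PyTorch follows; the input size (64x64 single-channel spectrograms), layer widths and the two output classes are assumptions, not the networks of the cited works.

```python
# Minimal CNN sketch (PyTorch). Input size, channels and the two output
# classes (e.g. pedestrian vs. cyclist) are illustrative assumptions.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # filter the spectrogram
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 16 * 16, n_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# One fake batch of 64x64 single-channel spectrograms.
logits = SmallCNN()(torch.randn(4, 1, 64, 64))
print(logits.shape)   # torch.Size([4, 2])
```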

Extreme learning machine
Extreme Learning Machine (ELM) is usually used for pattern classification. It can be considered as a learning algorithm for single-hidden-layer feedforward neural networks. It overcomes the slow training speed and over-fitting problems of conventional neural network learning algorithms. ELM is based on the theory of empirical risk minimization, and its learning process requires only one iteration; the algorithm avoids multiple iterations and local minima. ELM is useful in multiple fields and applications thanks to its robustness, controllability, good generalization capacity and fast learning rate.
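A minimal sketch of the ELM idea, assuming synthetic data and an illustrative number of hidden nodes: the input weights and biases are drawn at random, and only the output weights are computed, in a single least-squares step.

```python
# Minimal ELM sketch in NumPy: random hidden layer, output weights solved
# in one step (pseudo-inverse). Sizes and data are illustrative.
import numpy as np

def elm_train(X, T, n_hidden=50, rng=np.random.default_rng(0)):
    """X: (n_samples, n_features), T: (n_samples, n_outputs) one-hot targets."""
    W = rng.normal(size=(X.shape[1], n_hidden))      # random input weights
    b = rng.normal(size=n_hidden)                     # random biases
    H = np.tanh(X @ W + b)                            # hidden-layer activations
    beta = np.linalg.pinv(H) @ T                      # output weights, one shot
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy two-class problem.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 4)), rng.normal(2, 1, (40, 4))])
T = np.vstack([np.tile([1, 0], (40, 1)), np.tile([0, 1], (40, 1))])
W, b, beta = elm_train(X, T)
print((elm_predict(X, W, b, beta).argmax(1) == T.argmax(1)).mean())
```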
Researchers have proposed modifications to the algorithm to improve ELM. The fully complex ELM (C-ELM) is proposed in [93][94][95][96][97][98]; it extends the ELM algorithm from the real domain to the complex domain. Given the significant time consumed by updating with the old data whenever new information is received, an online sequential ELM (OS-ELM) is proposed in [94][95][96][97][98][99], which can learn the training data one by one or block by block and discard the data for which training has already been carried out. A new adaptive ensemble model of ELM (Ada-ELM) is proposed in [95,100]; it achieves better prediction performance and can automatically adjust the ensemble weights. ELM performance is affected by the number of hidden-layer nodes, which is difficult to determine; the incremental ELM (I-ELM) [96,101], the pruned ELM (P-ELM) [97,102] and the self-adaptive ELM (SaELM) [98,103] have been proposed in other works to address this. ELM achieves good results and shortens training times that take several days with deep learning to a few minutes.
It is difficult to achieve such performance with conventional learning techniques. Examples of datasets are shown in Table 1.
Artificial intelligence is widely used for the classification of radar targets. The following section will focus on the classification procedure and the different feature extractors and classifiers used in machine learning. The most famous and most used algorithms in our field (road safety) will be mentioned as well.

Artificial Intelligence for Radar Target's Identification and Classification
Many characteristics for target identification using micro-Doppler signatures and ML algorithms have been studied in other research, and interesting results have been presented [47,48]. Many public datasets for target classification are introduced in [104][105][106][107][108][109][110][111].
Principal component analysis (PCA) lays the groundwork for disentangling data into independent components [94]. PCA ignores the less important components [98]. We can also obtain the PCA by applying the singular value decomposition (SVD) and truncating the less important basis vectors in the original SVD matrix.
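As a brief illustration, the following sketch (synthetic data, illustrative number of components) obtains the principal components by truncating an SVD of the centered data matrix.

```python
# Sketch: obtaining the principal components via a truncated SVD (NumPy).
# The data matrix and the number of retained components are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                 # 100 samples, 8 features
Xc = X - X.mean(axis=0)                       # center the data

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3                                         # keep the 3 strongest components
components = Vt[:k]                           # principal directions
scores = Xc @ components.T                    # projected (reduced) features
print(scores.shape)                           # (100, 3)
```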
Most of the research carried out concerns supervised learning; very little of it uses unsupervised machine learning. The latter turns out to be one of the latest trends and added values in recent work [8,49], based on sparse coding.

Feature extraction based on sparse coding
Sparse coding is a technique based on the study of algorithms aimed at learning a sparse, useful representation of the data [112][113][114][115][116][117][118][119]. The next step consists in encoding the data so that each item takes the form of a sparse code. The algorithm uses information from the input to learn the sparse representation, and this can be applied to any type of information; we call this unsupervised learning. It finds the representation without losing any part or aspect of the data [119,120]. To do this, sparse coding algorithms try to satisfy two main constraints, described in Figure 9: given a number of dimensions, sparse coding tries to learn an overcomplete basis in order to represent the data effectively, and to do this we must first provide enough dimensions to learn this overcomplete basis [121].
In practice, we give just more than the number of dimensions in which the original data are encoded, or sometimes the same amount. Figure 10 describes the target identification system based on sparse coding. Radar data in the time domain must be processed before sparse coding. This is done using the short-time Fourier transform (STFT), a time-frequency analysis method also used for micro-Doppler signatures by other researchers. We use a Hamming window to extract the micro-Doppler signatures of the targets. After the STFT comes the step of constructing the overcomplete dictionary; for this, each spectrogram must be reshaped into a matrix. The micro-Doppler signature is converted into a matrix by concatenating its real and imaginary parts, giving a dimension of 2U x M. We generate the training dataset using N given samples and then gather the data received from the targets over all angles. The dimension of the dataset for each angle, X^k, is 2U x MN.
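A minimal sketch of this preprocessing step, assuming a synthetic baseband radar return and illustrative STFT parameters (the window length, overlap and sampling rate are not those of the cited works):

```python
# Sketch of the STFT preprocessing: spectrogram with a Hamming window,
# then stacking real and imaginary parts into a 2U x M feature matrix.
# Signal, sampling rate and STFT parameters are illustrative assumptions.
import numpy as np
from scipy.signal import stft

fs = 1000.0                                    # sampling rate (Hz), assumed
t = np.arange(0, 1.0, 1.0 / fs)
# Synthetic complex return with a slowly varying Doppler component.
x = np.exp(1j * 2 * np.pi * (50 * t + 20 * np.sin(2 * np.pi * 1.5 * t)))

f, tt, Z = stft(x, fs=fs, window="hamming", nperseg=128, noverlap=96,
                return_onesided=False)
U, M = Z.shape                                 # U Doppler bins, M time bins
X = np.vstack([Z.real, Z.imag])                # 2U x M real-valued matrix
print(Z.shape, X.shape)
```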
The resulting training data set is made of signals from the different types of targets combined (pedestrians and cyclists, for example). D (the dictionary) and W (the coefficient matrix) are deduced from the training data through the optimization of equation (1). Reading equation (1) from left to right, the last term represents the reconstruction error between the original data and their representation based on the dictionary D. A better approximation of the original data can be obtained by minimizing this term. The work here is to adjust D and W jointly in order to solve the equation.
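A common dictionary-learning objective of this form, with a sparsity term first and the reconstruction error as the last term, minimized jointly over D and W (the exact expression of equation (1) in [8] may differ), is:

\min_{D,\,W} \; \sum_{i} \lVert w_i \rVert_{0} \; + \; \lambda \, \lVert X - D W \rVert_{F}^{2}

where the w_i are the columns of W and λ balances sparsity against reconstruction fidelity.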
Among the methods adopted for learning the overcomplete dictionary is the K-SVD method. We first search for W without touching D; then, in subsequent iterations, we search for both D and W while keeping the positions of the non-zero elements in W fixed. SVD ensures the normalization of the atoms of the dictionary with respect to each other. K-SVD alternates between coding the information with the existing dictionary and regularly updating the dictionary in order to obtain a better fit.
Figure 11: Reduction of sparse coding characteristics (for the pedestrian and cyclist case) [8]
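K-SVD itself is not sketched here; as an illustration of the alternating scheme described above, the following uses scikit-learn's DictionaryLearning, a related dictionary-learning method with OMP-based sparse coding, on synthetic data with an overcomplete dictionary.

```python
# Sketch of dictionary learning with sparse (OMP) coding, as an illustration
# of the alternating scheme described above. scikit-learn's DictionaryLearning
# is a related method, not K-SVD itself; sizes are illustrative.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))                 # 200 reshaped spectrogram rows

dl = DictionaryLearning(n_components=96,       # overcomplete: 96 > 64
                        transform_algorithm="omp",
                        transform_n_nonzero_coefs=5,
                        max_iter=20, random_state=0)
W = dl.fit_transform(X)                        # sparse coefficient matrix
D = dl.components_                             # learned dictionary atoms
print(W.shape, D.shape, (W != 0).mean())       # sparsity of the codes
```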
Feature extraction based on a sparse representation can be carried out once the dictionary D has been created. At this stage, the set of sparse matrices can be used directly as classification features. It should be noted that the feature dimensions are still very large and require reduction. For this, some numerical characteristics, such as the mean value and the standard deviation, can be calculated from the sparse matrix.
Figure 11 illustrates an example of the sparse coding features reduced in this way, for an angle of 30° (the case of a pedestrian and a cyclist).
Numerical characteristics can be used in the classification, even if some information is discarded by this computation. The sparse matrix and its reduced characteristics are used in [8] to carry out the classification. Five numerical characteristics of the sparse matrices are used for the classification, namely: mean values, standard deviations, maximum values, minimum values, and the difference between the maximum and minimum values.
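A minimal sketch of this reduction, computing the five characteristics from one (illustrative) sparse coefficient matrix:

```python
# Sketch: reducing a sparse coefficient matrix to the five numerical
# characteristics named above. The matrix W here is illustrative.
import numpy as np

def sparse_features(W: np.ndarray) -> np.ndarray:
    """Mean, std, max, min and max-min range of a sparse code matrix."""
    return np.array([W.mean(), W.std(), W.max(), W.min(), W.max() - W.min()])

rng = np.random.default_rng(0)
W = rng.normal(size=(96, 40)) * (rng.random((96, 40)) < 0.05)  # mostly zeros
print(sparse_features(W))          # 5-dimensional feature vector
```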

SVD-based feature extraction
SVD makes it possible to build an empirical model without an underlying theory, a model that becomes all the more precise as more terms are injected into it [95]. The effectiveness of the method depends in particular on the way in which the information is presented to it. The SVD decomposition is described by equation (2).

F = U S V^T    (2)
S is a diagonal matrix of singular values. The components of S are only scaling factors and therefore carry no information about the spectrogram. The matrices U and V contain the singular vectors of F in the two directions (left and right); they represent the information of the time and Doppler domains of a micro-Doppler signature. The singular vectors can be used as features for classification.
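For illustration, the following sketch extracts SVD-based features from a synthetic spectrogram matrix; the matrix and the number of retained singular vectors are assumptions.

```python
# Sketch: SVD-based features from a (real-valued) spectrogram matrix F.
# The spectrogram and the number of retained singular vectors are illustrative.
import numpy as np

rng = np.random.default_rng(0)
F = np.abs(rng.normal(size=(64, 40)))          # Doppler bins x time bins

Uvec, s, Vt = np.linalg.svd(F, full_matrices=False)
k = 2                                          # keep the two dominant vectors
features = np.concatenate([Uvec[:, :k].ravel(),   # Doppler-domain information
                           Vt[:k, :].ravel()])    # time-domain information
print(features.shape)
```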

Robust principal component analysis (RPCA)
An improvement of the MFP using PCA and a minimum covariance determinant estimator is discussed and presented in several works, and the method is described by equation (4).
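The specific formulation of the cited works is not reproduced here; as a generic illustration of robust PCA, the sketch below implements the standard principal component pursuit decomposition M = L + S (low-rank plus sparse) with an inexact augmented Lagrangian iteration, which may differ from the exact method of equation (4).

```python
# Sketch: generic RPCA via principal component pursuit (inexact ALM).
# Given for illustration only; not necessarily the method of the cited works.
import numpy as np

def soft_threshold(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca(M, max_iter=200, tol=1e-6):
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / (np.abs(M).sum() + 1e-12)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding.
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(soft_threshold(s, 1.0 / mu)) @ Vt
        # Sparse update: entry-wise soft thresholding.
        S = soft_threshold(M - L + Y / mu, lam / mu)
        R = M - L - S
        Y += mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S

rng = np.random.default_rng(0)
low_rank = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 40))
sparse = (rng.random((60, 40)) < 0.05) * rng.normal(5, 1, (60, 40))
L, S = rpca(low_rank + sparse)
print(np.linalg.matrix_rank(L, tol=1e-3), (np.abs(S) > 1e-3).mean())
```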
The comparison of the recognition rates of the FM frequencies using linear discriminant analysis (LDA) and the support vector machine (SVM) is presented in Figure 12, which suggests that the support vector machine approach is an efficient method for the classification of radar signals with a high recognition percentage. SVM has the highest recognition rate (up to 97%), which is lower for LDA (up to 94%) and tends to vary.
These classification procedures are applied to the processed data using either cross-validation or the leave-one-out method. The performance of each classifier is assessed using tests independent of the learning set, thus minimizing the generalization error.
In [7], the author used supervised learning to classify specific objects. This process comprises two main phases, known as the training phase and the test phase. Supervised regression methods are also generally used to approximate the mapping between the directions of movement and the micro-Doppler signatures of the targets; among them we find support vector regression (SVR) and the multilayer perceptron (MLP) used in [7]. The regression training data set is used to approximate the correspondence between the feature vectors and the direction of motion. The regression model is a function map described in Figure 13. Micro-Doppler signatures of complex targets with moving parts are used for the approximation of the direction of movement. Supervised regression algorithms are then applied as a solution to the problem of estimating the direction of movement [133]-[147].
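A minimal sketch of such supervised regression, mapping illustrative (synthetic) micro-Doppler-derived features to a direction-of-motion angle with SVR and an MLP; the data and dimensions are assumptions, not those of [7].

```python
# Sketch: supervised regression (SVR and MLP) mapping illustrative feature
# vectors to a direction-of-motion angle. Data and dimensions are synthetic.
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
angles = rng.uniform(0, 90, size=200)                  # directions in degrees
X = np.column_stack([np.cos(np.radians(angles)),       # fake micro-Doppler
                     np.sin(np.radians(angles))])      # derived features
X += rng.normal(0, 0.02, X.shape)

svr = SVR(kernel="rbf", C=10.0).fit(X, angles)
mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                   random_state=0).fit(X, angles)

x_test = [[np.cos(np.radians(30)), np.sin(np.radians(30))]]
print(svr.predict(x_test), mlp.predict(x_test))        # both close to 30
```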
The next section presents a comparison between the methods used and highlights the advantages of one method compared to the others, as well as the usefulness of the algorithms according to the desired application. The challenge here is to choose the method to be used according to the constraints presented by the studied system. The section also discusses the prospect of proposing a new procedure combining the advantages of the methods presented previously and adaptable to several uses.

Open Issues and Challenges
Learning approaches can be categorized into supervised learning and unsupervised learning. In the first, the classifier is designed by exploiting information already known, because the training data set is available beforehand. This is not the case in the second, because the training information, known as class labels, does not exist; we then take a group of characteristic vectors and divide it into subsets called clusters, so that data with similar features are grouped in the same subset. There is an increasing convergence towards the use of unsupervised ML on non-structured input information in several areas, such as traffic engineering, detection of network anomalies, categorization of objects, optimization of road traffic and many others. Table 2 and Table 3 show a comparison between the methods. On the one hand, the results in [7] [8] show that classification with characteristics based on sparse coding makes it possible to obtain the highest precision (> 96%), and that the SVD and RPCA methods are very efficient. On the other hand, the computation time for the procedure using the sparse coding feature extractor and the support vector machine classifier is too long, even though it offers the best classification performance. In addition, when the feature dimensions are small, the total time for SVM and ELM is similar; as the feature size increases, identification via ELM is faster than with SVM.
We can also add that a good motion direction estimate (with an error below 5°) can be obtained with the SVR-based method. The approximation performance improves for directions of movement towards the radar and degrades for directions of movement perpendicular to the radar line of sight (for a radar target detection example).
This allows us to deduce that certain methods can be effective for certain applications and not for others. The choice of which method to use will then depend heavily on the application itself. It also prompts us to question the possibility of having a high-performance method at all levels and for all possible applications, including road safety [148][149][150][151][152]. The current trend is converging towards the development of new methods combining the properties of old algorithms and the expectations of new applications and adaptable to different uses.

Conclusion
The main goal of a classification approach is to group information into the adequate category based on common features. Classification makes it possible to determine the class of data whose type or group is unknown. Different classification methods can generate different outcomes. Artificial intelligence and its various techniques and algorithms help to solve these issues; they are the subject of several research studies, including in the field of road safety. This paper is a state of the art of target classification and of the contribution of machine learning technologies to it. The study and comparison of the different extraction methods and classification algorithms allowed us to deduce that algorithms that are efficient for some applications may not be as efficient for others; performance strongly depends on the desired application and its functionalities. These deductions open up perspectives on the development of new approaches, adaptable to any kind of application, with algorithms that are optimal in terms of computation time and processing effectiveness.