Racial Categorization Methods: A Survey

Article history: Received: 30 January, 2020; Accepted: 25 May, 2020; Online: 11 June, 2020

The face provides a direct and quick way to evaluate human soft biometric information such as race, age and gender. Race is a group of human beings who differ from human beings of other races with respect to physical or social attributes. Race identification plays a significant role in applications such as criminal judgment and forensic art, human computer interface, and psychology science-based applications, as it provides crucial information about the person. However, categorizing a person into the respective race category is a challenging task because human faces comprise complex and uncertain facial features. Several racial categorization methods are available in the literature to identify race groups of humans. In this paper, we present a comprehensive and comparative review of these racial categorization methods. Our review covers a survey of the important concepts, a comparative analysis of single model as well as multi model racial categorization methods, applications, and challenges in racial categorization. Our review provides state-of-the-art technical information concerning racial categorization and hence will be useful to the research community for the development of efficient and robust racial categorization methods.


Introduction
The human face expresses social information that is highly useful in automated systems. It provides soft biometric information about a person such as race, gender, age, identity and emotions [1][2][3][4][5][6][7]. This information is significant in interdisciplinary research areas such as psychology science, computer vision science, neuroscience and anthropological science, as well as in social security and forensic art departments. Amongst the various types of biometric information, race information is crucial and is required for a wide range of applications. Race is a group of human beings differentiated based on physical or social attributes. Race conveys social and cultural traits of different communities. Facial features such as eyes, eyebrow, ear, nose, cheek, mouth, chin, forehead area and jaw differ from human to human and are highly dependent on racial category [8][9][10][11][12][13][14]. Figure 1 shows the difference in facial features of different racial groups.
Race analysis is essential in contemporary applications such as criminal judgment and forensic art [15][16][17][18][19][20], aesthetic surgery [21], healthcare [22][23][24][25][26], medico legal [27][28][29], video security surveillance and public safety [30], human computer interface [31][32][33] and face recognition [34]. In such applications, race analysis is required for identification of individuals. Several racial categorization methods have been proposed in the literature. They are either single model racial categorization methods or multi model racial categorization methods. A single model racial categorization method uses facial features to recognize race [35][36][37][38][39][40][41][42][43][44][45]. Conversely, a multi model racial categorization method considers fusion of physical characteristics such as gait pattern and audio cues in addition to facial features [5,7,46][47][48][49][50]. The majority of practical applications involve single model racial categorization because facial data is available in larger quantities than gait pattern and audio cue data. However, there are applications that consider gait pattern and audio cues in the absence of a facial image. The single model racial categorization methods mainly differ from each other with respect to the classification approach, such as Support Vector Machine (SVM) [51][52][53][54], Convolutional Neural Network (CNN) [38,44,55][56][57][58][59], Artificial Neural Network (ANN) [60], Local Binary Pattern (LBP) [61] and Local Circular Pattern (LCP) [62]. Participant based racial categorization methods such as the diffusion model [63] and implicit racial attitude [64] are also available in the literature. The multi model racial categorization methods involve classification approaches such as SVM [65][66][67], logistic regression [66], Adaboost [66], random forest [66], CNN [68] and Haar-LBP histogram [69]. In this paper, we present a thorough and extensive study of various racial categorization methods.
First, we present a taxonomy of available racial categorization methods. Then we describe several single model and multi model racial categorization methods. Based on our study, we identify several parameters to evaluate them. Subsequently, we present parametric evaluation of single model racial categorization methods and multi model racial categorization methods separately based on identified parameters. Next, we illustrate applications of racial categorization and future research direction in the field of racial categorization. Our comprehensive and comparative survey will serve as a catalogue to researchers in this area.
The rest of the paper is structured as follows: In section 2, we describe the taxonomy of racial categorization methods and the features considered by different racial categorization methods. In section 3, we illustrate various single model racial categorization methods and their parametric evaluation. In section 4, we illustrate various multi model racial categorization methods and their parametric evaluation. Section 5 describes the major applications of racial categorization. In section 6, we list key challenges in the field of racial categorization. Finally, section 7 presents the conclusion and future scope in the field of racial categorization.

Classification of Racial Categorization Methods
Racial categorization methods are broadly divided into two categories: single model and multi model. As shown in Figure 2, single model and multi model methods differ in the features they use to classify humans into race categories. Single model methods use facial features only, which may be local discriminative region based features, other facial features, or a combination of both. Amongst the discriminative region based features, iris texture, periocular region and/or holistic face are used for racial categorization [70][71][72][73][74][75][76][77]. Multi model racial categorization takes into consideration face features, gait pattern and audio cues to classify humans into various race categories. Gait pattern is also useful to recognize the biometric information of humans [6,78][79][80].
At this juncture, we clarify that, at the top level, humans are categorized along two dimensions, namely race and ethnicity, on the basis of physical appearance and social appearance respectively. However, some researchers use the words race and ethnicity interchangeably [24,35,81]. Figure 3 shows the facial features such as periocular region, anthropometry distance, silhouette, iris texture, skin tone and holistic face.

Figure 3: Facial features considered by single model racial categorization [30]

Skin tone differs mainly due to geographical location of humans. African-American, South-Asian, East-Asian, Caucasian, Indian and Arabian groups have different skin tones. Skin tone plays a minor role in identification of racial groups because skin color may also differ due to varying lighting conditions during the image capturing process [56,64].

Features Considered by Single Model Racial Categorization
Like fingerprints, iris texture is a significant biometric characteristic of humans because it is unique for every human being [82]. It is highly useful for racial categorization because different race groups such as American, Indian and so on have different iris texture [59,74][75][76][82][83][84]. The key limitation of this feature is that it cannot be relied on if race is to be identified from video, because the video may be of low quality and hence may not give precise iris texture information [85][86].
The periocular region is defined as the region surrounding the eye. It covers the eyebrows, eyelid, eyelashes and canthus [52]. It gives richer texture and biometric information than iris texture [30]. Some facial features are influenced by different facial poses and expressions. However, the periocular region is not affected by facial poses and expressions. Hence, it is considered the most reliable feature for racial categorization [52,62,77].
Holistic face provides the texture information of various facial features such as eyes, nose, mouth, cheek, chin, skin color and jaw line [43,51,63,87][88][89]. Extra frontal face features such as hairline and hair color, in combination with cropped aligned face features, ease the racial categorization process [54].

Features Considered by Multi Model Racial Categorization
Multi model racial categorization improves accuracy of racial categorization via fusion of facial features with other human features such as gait pattern and audio cues [90] (Figure 4). Below we discuss the features considered by multi model racial categorization.
Gait pattern, also known as the walking pattern, is a prominent biometric feature that varies from human to human and is used for identification of a person [91][92]. Advanced racial categorization methods use gait pattern fused with facial features for overall effective racial categorization [92][93][94][95][96][97]. For videos in which humans at near to moderate distance have been captured, facial features are sufficient to identify the race. However, for videos in which humans at far distance have been captured, gait pattern is highly useful to identify the race of human because facial features of humans at far distance are not clearly visible. Thus, fusion of gait pattern with facial features improves overall racial categorization accuracy [65]. Audio pattern differs from race to race [98]. It is useful to identify race in case a video sequence or image of a person is not available. For instance, it is useful to identify race from phone calls.

Single Model Racial Categorization Methods
Race depends on physical and social characteristics of humans. As geographic distance increases, the variation in facial features between races becomes more visible. As facial data is more easily available than gait pattern data, the majority of racial categorization applications use a single model racial categorization method. Moreover, it has been reported in the literature that facial features are more prominent for race categorization [99][100]. Below we discuss various single model racial categorization methods available in the literature.

Multi Ethnical Categorization using Manifold Learning
In [51], authors have proposed a method for intra-racial categorization based on facial landmarking. This method classifies eight intra-races residing in China based on facial landmarking concept. It includes Active Shape Model (ASM) to locate 77 facial landmarks. The landmarks are used to calculate three types of geometric facial features: distance, angle, and ratio. These features are provided to different classifiers such as Bayesian Net, Naive Bayesian, SMO, J4.8, RBF Network and LibSVM to identify the race category. The dimensionality reduction process carried out by manifold learning approach is useful to reduce the complexity. Though this method is efficient, it is not useful to identify race from a person's profile face images.
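The three geometric feature types named above (distance, angle and ratio) can be illustrated with a minimal sketch. This is not the implementation from [51]: the landmark names and coordinates are hypothetical, and only four points are used instead of the 77 landmarks located by ASM.

```python
import math

def distance(p, q):
    """Euclidean distance between two 2D landmarks."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def angle(a, vertex, b):
    """Angle in degrees at `vertex` formed by the landmarks a and b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))

def ratio(p, q, r, s):
    """Ratio of distance(p, q) to distance(r, s); invariant to image scale."""
    return distance(p, q) / distance(r, s)

# Hypothetical landmark coordinates in pixels (illustrative, not from [51]).
landmarks = {
    "left_eye": (30.0, 40.0),
    "right_eye": (70.0, 40.0),
    "nose_tip": (50.0, 70.0),
    "chin": (50.0, 110.0),
}

features = [
    distance(landmarks["left_eye"], landmarks["right_eye"]),
    angle(landmarks["left_eye"], landmarks["nose_tip"], landmarks["right_eye"]),
    ratio(landmarks["left_eye"], landmarks["right_eye"],
          landmarks["nose_tip"], landmarks["chin"]),
]
```

A feature vector built this way from many landmark pairs and triples is what would then be passed to dimensionality reduction and to a classifier.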

GWT and Retina Sampling based Ethnicity Categorization
Multiclass SVM based ethnicity categorization method is proposed in [52]. Figure 5 shows the key steps involved in this method.
As shown in the figure, the image is first normalized by applying a rotation operation and changing the resolution. The resolution of the image is changed in such a manner that it maintains a distance of 28 pixels between the two inner corners of the eyes. Subsequently, eye and mouth facial features are extracted by fusion of Gabor Wavelet Transform (GWT) and retina sampling for efficient categorization. GWT is used to extract accurate orientation and frequency of facial features. The retina sampling method is used to set facial feature points. The features are fed to a multiclass SVM classifier for ethnicity identification. Typically, the eye is considered the most prominent feature for racial categorization due to its pose invariant characteristic. On the other hand, uncertainty is introduced by the mouth region due to its pose variant characteristic. The disadvantage of this method is that GWT provides erroneous features in case of a hollow around the eyes. Moreover, Gabor features reflect error due to variation in frequency of eyelashes.
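The normalization step can be sketched as computing a scale factor and a rotation angle from the two inner eye corners. The 28 pixel target comes from the description above; the assumption that the rotation levels the line joining the eye corners, and all function names, are ours and are not confirmed by [52].

```python
import math

TARGET_INNER_EYE_DISTANCE = 28.0  # pixels, as stated in the text

def normalization_params(left_inner, right_inner):
    """Return (scale, roll_degrees) that bring the eye corners to the target.

    Assumption: the rotation is chosen so that the line joining the inner
    eye corners becomes horizontal.
    """
    dx = right_inner[0] - left_inner[0]
    dy = right_inner[1] - left_inner[1]
    current = math.hypot(dx, dy)
    if current == 0:
        raise ValueError("eye corners must be distinct points")
    scale = TARGET_INNER_EYE_DISTANCE / current
    roll_degrees = math.degrees(math.atan2(dy, dx))
    return scale, roll_degrees

def normalize_point(point, origin, scale, roll_degrees):
    """Rotate `point` by -roll about `origin`, then apply the scale."""
    theta = math.radians(-roll_degrees)
    x, y = point[0] - origin[0], point[1] - origin[1]
    xr = x * math.cos(theta) - y * math.sin(theta)
    yr = x * math.sin(theta) + y * math.cos(theta)
    return (scale * xr, scale * yr)

# Hypothetical eye corner coordinates: 56 pixels apart on a tilted line.
left, right = (100.0, 120.0), (100.0, 176.0)
scale, roll = normalization_params(left, right)
normalized_right = normalize_point(right, left, scale, roll)
```

After normalization the right inner corner lies 28 pixels from the left one on a horizontal line, regardless of the original resolution or in-plane tilt.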

Real Time Racial Categorization
A new method for racial categorization by fusion of Principal Component Analysis (PCA) and Independent Component Analysis (ICA) is introduced in [53]. It consists of two major steps: feature extraction and classification. During the feature extraction step, facial features are extracted using PCA. Subsequently, ICA is used to map the facial features generated by PCA into new facial features. The new facial features are more suitable for efficient racial categorization. During the classification step, an SVM classifier is applied in conjunction with the '321' algorithm to classify races. The '321' algorithm is inspired by the bootstrap approach for real time racial categorization from video streams. The categorization accuracy of this method can be further enhanced by including a pre-processing step for noise reduction and face alignment.

Binary Tree based SVM for Ethnicity Detection
In this method, a fusion of texture and shape facial features is considered for better ethnicity categorization [54]. Figure 6 shows the functioning of this method. The first step, preprocessing, involves operations such as image resizing, image enhancement and image conversion. Then texture features are extracted using a Gabor filter and shape features are extracted using Histogram of Oriented Gradients (HOG). Subsequently, the texture features and shape features are fused together. The fused feature vector is large and requires more computational time. Hence, the Kernel Principal Component Analysis (KPCA) algorithm is applied to reduce dimensionality and complexity. The fused facial features are given as input to a binary tree based SVM for ethnicity detection.
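One way to read "binary tree based SVM" is a tree in which each internal node holds a binary classifier that splits the remaining classes into two groups, so that K classes need K - 1 binary decisions in total. The sketch below illustrates only that routing idea. The linear decision functions stand in for trained SVMs, and the weights, class labels and tree shape are hypothetical rather than taken from [54].

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass
class Node:
    """Leaf if `label` is set; otherwise an internal node with a decider."""
    label: Optional[str] = None
    decide: Optional[Callable[[Sequence[float]], float]] = None
    negative: Optional["Node"] = None
    positive: Optional["Node"] = None

def linear(weights, bias):
    """Stand-in for a trained binary SVM: returns the signed score w.x + b."""
    def score(x):
        return sum(w * v for w, v in zip(weights, x)) + bias
    return score

def predict(node, x):
    """Walk from the root to a leaf, following the sign of each decision."""
    while node.label is None:
        node = node.positive if node.decide(x) >= 0 else node.negative
    return node.label

# Hypothetical three-class tree: the root separates class_A from the rest,
# and the second node separates class_B from class_C.
tree = Node(
    decide=linear([1.0, 0.0], -0.5),
    negative=Node(label="class_A"),
    positive=Node(
        decide=linear([0.0, 1.0], -0.5),
        negative=Node(label="class_B"),
        positive=Node(label="class_C"),
    ),
)
```

For example, `predict(tree, [0.8, 0.1])` follows the positive branch at the root and the negative branch at the second node, ending at the `class_B` leaf.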

Racial Categorization using CNN
In [55], a hybrid supervised deep learning based racial categorization method has been proposed. It uses the VGG-16 convolutional neural network for facial feature extraction and categorization. A 224 x 224 face image is given as input to the VGG-16 network for race prediction. Training a CNN from scratch typically requires a very large number of images, which is a critical limitation in the medical domain. Hence, to overcome the issue of a small dataset, the authors have used a hybrid approach that fuses VGG-16 with an image ranking engine to improve race prediction. It has been shown that image ranking engines work efficiently with CNN based classifiers even for small datasets. The fused feature information extracted by the CNN and the image ranking engine is used by an SVM to learn racial class labels. This hybrid method provides better categorization accuracy.

Neural Network based Racial Categorization
In [56], skin color, forehead area, Sobel edge and geometric features are fused for efficient race estimation. The authors have proposed two methods: 1) using an Artificial Neural Network (ANN) and 2) using a convolutional neural network. The steps involved in racial categorization using ANN are shown in Figure 7.
CNN based racial categorization method uses pre-trained VGGNet for racial categorization. It has been observed by authors that CNN based method gives more accurate racial prediction for the given image as compared to ANN based method.
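The Sobel edge feature mentioned in this section can be illustrated with the standard 3 x 3 Sobel kernels. The sketch below computes the gradient magnitude on a small grayscale grid; the test image is hypothetical, and how [56] combines the edge map with the other features is not shown here.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(image):
    """Gradient magnitude of a 2D grayscale image given as a list of rows.

    Border pixels are left at 0.0 because the kernel does not fit there.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * pixel
                    gy += SOBEL_Y[i][j] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out

# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(image)
```

The interior pixels next to the intensity step receive a large magnitude, while a flat image yields zeros everywhere.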

Local Circular Pattern for Race Identification
Local circular pattern for race identification method works on texture and shape features extracted from 2D face and 3D face respectively. A local circular pattern is an advanced version of a local binary pattern produced for feature extraction. LCP improves the widely utilized LBP and its variants by replacing binary quantization with clustering approach. As compared to LBP, LCP provides higher accuracy even for noisy data. Moreover, AdaBoost algorithm is used for selection of better features and thereby to improve the categorization accuracy. Experimental results have revealed that this method is time efficient and memory efficient.
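For reference, the basic LBP operator that LCP builds on can be sketched as follows. This shows only the binary quantization step of plain LBP on a 3 x 3 neighborhood; the clustering based quantization of LCP is not reproduced, and the bit ordering used here is one common convention chosen for illustration.

```python
# Neighbor offsets, clockwise starting from the top-left pixel.
# The ordering is a convention; any fixed ordering gives a valid LBP code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, r, c):
    """8-bit LBP code of pixel (r, c): bit k is 1 if neighbor k >= center."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist

# Hypothetical 3x3 patch: bright corners, dark edges, mid-gray center.
patch = [[9, 1, 9],
         [1, 5, 1],
         [9, 1, 9]]
code = lbp_code(patch, 1, 1)
```

The histogram of these codes over a face region is the texture descriptor typically passed on to feature selection or classification.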

Biometric based Machine Learning Method
In [62], the authors have proposed a method that focuses on eye region features for racial categorization. The method comprises five major steps, shown in Figure 8. First, facial coordinates are located using the DLib library. Subsequently, the region of interest (eye region) is extracted. Then features extracted by LBP and HOG are integrated for efficient racial categorization. LBP and HOG are each useful on their own for extracting features for categorization. However, fusion of LBP features with HOG features gives higher accuracy compared to other feature fusion approaches. The performance is tested using different classifiers such as SVM, Multi-Layer Perceptron (MLP) and Quadratic Discriminant Analysis (QDA).
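A simplified sketch of the fusion step is given below. It computes a single gradient orientation histogram over a patch (a reduced form of HOG without cells or block normalization) and concatenates it with an LBP histogram after L2 normalization. Concatenation as the fusion rule, the 9 bin setting and all names are assumptions for illustration, not details confirmed by [62].

```python
import math

def orientation_histogram(patch, bins=9):
    """Unsigned gradient orientation histogram (0 to 180 degrees).

    Each interior pixel votes into one bin, weighted by gradient magnitude.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]
            gy = patch[r + 1][c] - patch[r - 1][c]
            magnitude = math.hypot(gx, gy)
            orientation = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(orientation // bin_width), bins - 1)
            hist[index] += magnitude
    return hist

def l2_normalize(vector):
    """Scale a vector to unit L2 norm; an all-zero vector is returned as is."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

def fuse(lbp_hist, hog_hist):
    """Assumed fusion rule: normalize each descriptor, then concatenate."""
    return l2_normalize(lbp_hist) + l2_normalize(hog_hist)

# Hypothetical 4x4 eye-region patch with a purely horizontal intensity ramp.
patch = [[0, 1, 2, 3] for _ in range(4)]
hog_hist = orientation_histogram(patch)
fused = fuse([3, 4], hog_hist)  # [3, 4] is a toy stand-in for an LBP histogram
```

Normalizing each descriptor before concatenation keeps one feature type from dominating the fused vector purely because of its scale.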

Diffusion Model and Implicit Racial Attitude for Racial Group Identification
In [63][64][101], a manual racial categorization method is proposed. Race is identified by performing several tasks with participants. A diffusion model is used to identify response time boundaries of different participants. This method takes into consideration how participants perceive their own race and other races. Skin color is a less effective feature for automated racial categorization methods due to lighting conditions. However, it is a prominent feature for manual race prediction.

Performance Evaluation of Single Model Racial Categorization Methods
The race categorization methods discussed above are automated, except the last one, which is a manual race categorization method. With advancing technology, manual race categorization is less effective and less useful than automated racial categorization. Amongst the various automated racial categorization methods, CNN based methods produce more accurate results [55][56] as they consider deep facial features for racial categorization. We observed that intra-race categorization has received little attention from researchers.
Based on our study of the aforementioned single model racial categorization methods, we have identified the following parameters to compare them: dataset used, racial/ethnic class considered, region of interest, feature extraction operator/s and classifiers used. Dataset refers to the source of data. It is either available online in the form of a standard dataset or it is self-generated. Table 1 presents the assessment of the aforementioned single model racial categorization methods based on these identified parameters.

Multi Model Racial Categorization Methods
Multi model racial categorization is highly useful when facial image information is not available. It has been shown in the literature that gait pattern and audio cues are amongst the prominent features for biometric information identification. Hence, multi model racial categorization methods use gait pattern, audio cues, or fusion of facial features with gait pattern/audio cues to identify race. However, fewer multi model racial categorization methods are available in the literature because race data that includes gait pattern or audio cues is not easily available.

Multi-view Fused Gait based Ethnicity Classification
In [102], the authors have proposed a method that identifies ethnicity from seven gait patterns captured from seven different angles. Figure 9 shows the key steps involved in the ethnicity identification process. First, all seven gait patterns are converted into corresponding Gait Energy Images (GEI). Next, the seven GEIs are fused using three different fusion methods: score fusion, feature fusion and decision fusion. The goal of using three fusion methods is to accurately identify the ethnicity (race) of the person. Subsequently, features are extracted from the fused image using Multi-linear Principal Component Analysis (MPCA) and fed to a classifier for ethnicity classification.
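A Gait Energy Image is commonly defined as the pixel-wise average of the aligned binary silhouettes of one gait cycle. The sketch below follows that common definition; the toy silhouettes are hypothetical, and the silhouette extraction, alignment and multi-view fusion steps of [102] are not shown.

```python
def gait_energy_image(silhouettes):
    """Pixel-wise mean of equally sized binary silhouette frames.

    `silhouettes` is a list of frames; each frame is a list of rows with
    values 0 (background) or 1 (body). The result has values in [0, 1].
    """
    count = len(silhouettes)
    if count == 0:
        raise ValueError("at least one silhouette frame is required")
    height, width = len(silhouettes[0]), len(silhouettes[0][0])
    total = [[0.0] * width for _ in range(height)]
    for frame in silhouettes:
        for r in range(height):
            for c in range(width):
                total[r][c] += frame[r][c]
    return [[value / count for value in row] for row in total]

# Two hypothetical 2x2 silhouette frames from one gait cycle.
frames = [
    [[1, 0],
     [1, 1]],
    [[1, 1],
     [0, 1]],
]
gei = gait_energy_image(frames)
```

Pixels that belong to the body in every frame approach 1.0 (static parts such as the torso), while pixels covered only part of the time take intermediate values that reflect limb motion.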

Hierarchical Fusion for Ethnicity Identification
In this method [65], gait pattern and facial features are fused for better ethnicity categorization. Figure 10 shows the two level processing involved in this method. First level involves gait pattern evolution. It takes gait video as an input. It includes the intermediate steps such as gait cycle estimation and GEI generation. First level also includes SVM for classification. Second level takes face video as an input. It comprises three major steps: frame extraction from video, face detection and feature extraction using Gabor filter. Features extracted in first and second level are fused together to get accurate classification. The fused features are given to SVM and then to Adaboost to identify ethnicity.

Figure 11: Block diagram of ethnicity classification system [67]

Dialogue based Biometric Information Classification
In [66], a method for biometric information classification and deception detection has been proposed. It identifies gender, personality and ethnicity from the audio (dialogue). Lexical and acoustic-prosodic features are extracted from the dialogue. Lexical features are extracted using Linguistic Inquiry and Word Count (LIWC). Acoustic-prosodic features are extracted using Praat. Both types of features are given to different machine learning classifiers such as SVM, logistic regression, Adaboost and random forest for classification.

Cross-Model Biometric Matching
A new method for cross biometric matching by fusion of voice and facial image is introduced in [68]. The cross model is used for inferring the two types of information: 1) voice from human face and 2) human face from voice. This method involves two key steps: feature extraction and classification. The features are extracted from image as well as voice. The extracted features are fused and given as input to CNN for biometric matching and classification.

Gait and Face Fusion for Ethnicity Classification
Ethnicity classification system is proposed in [67]. It considers fusion of facial features and gait pattern for better classification. As shown in Figure 11, inputs to this system are gait video and facial video. Both videos are processed in parallel. During gait video processing, first background is subtracted from the video and subsequently each gait cycle pattern is estimated. Next all gait cycle patterns are represented using spatio-temporal representation for gait pattern characterization. During face video processing, frames are extracted from the face video and subsequently the facial part is cropped from the face image. Facial features are extracted from each frame using LBP. Features extracted from gait and face videos are fused together using Canonical Correlation Analysis (CCA). Fused features are given as input to SVM for ethnicity identification.

Performance Evaluation of Multi Model Racial Categorization Methods
As illustrated in Table 2, we identified the same set of parameters for comparison of multi model racial categorization methods as we identified for single model racial categorization methods. As defined and discussed previously in section 3, they are dataset used, biometric information considered for classification, region of interest, feature extraction operator/s and classifiers used in the method.

Table 2 (excerpt): Dataset: Self-Generated; Biometric information: Ethnicity; Region of interest: Gait Pattern and Face; Feature extraction operator: LBP; Classifier: SVM

Figure 12: Application areas of racial categorization

Application of Racial Categorization
Racial categorization has a high impact on our social life. Race defines common physical characteristics shared by a group of humans. Physical characteristics of humans of different races differ from each other. Racial categorization is significant for several applications. As shown in Figure 12, the major application areas are video security surveillance and public safety, criminal judgment and forensic art, medico legal, healthcare, aesthetic surgery, face recognition, and human computer interface.

Video Security Surveillance and Public Safety
Race identification from the subject's face plays a crucial role in video security surveillance. A video security surveillance system assists in identifying criminals by comparing the detected subject's image with the existing criminal database. An automated race identification system fused with a video security surveillance system provides quick information about the subject [53]. Such a fused system is already in use at several airports and public places. Moreover, it has proven useful in settings such as maritime, aviation, mass transportation, government office buildings, recreational centers, stadiums and large retail malls.

Criminal Judgment and Forensic Art
Crime related investigation requires crucial information related to criminals including cross-country evidence (if any) [103][104][105][106][107][108]. Race/ethnicity of criminals provides such crucial information. Face is typically considered for criminal investigation because face conveys important information. Particularly, it conveys age, race and gender that are needed for criminal investigation. This information makes it easier for the government to find the right criminal during the investigation process [109][110][111]. Moreover, such information is useful to protect innocent people and to provide justice to minority community groups.
Normally the forensic department has the subject's image captured using a public camera. However, it is difficult for the forensic department to manually extract the crucial information from the image. Conversely, racial categorization method can be used to identify the race from the image which assists in further investigation targeting a particular race community [106].

Medico Legal
Medico legal case is defined as a case of suffering or injury in which examination by the police is essential to determine the cause of suffering or injury. Suffering or injury may be due to several unnatural conditions such as accidents, burning and death. Race provides patient's information that is useful to law enforcing agency for further investigation [112]. By evaluating the race information of the medico legal case, law enforcing agency can obtain history of medico legal cases in that particular racial group [113][114][115]. Such information eases the investigation process and assists medico legal department in decision making.

Healthcare
Disease and healthcare issues differ across geographical areas due to their weather conditions, living conditions and food. Healthcare treatment differs for different racial groups [116]. Thus, racial categorization is useful to solve healthcare issues and to provide quick treatment [117][118][119][120][121]. Moreover, ethnic information is useful to provide appropriate services and special benefits to minority ethnic groups, as defined by the government for minority and economically disadvantaged groups [122].
The Center to Eliminate Health Disparities (CEHD) of the University of Texas Medical Branch (UTMB) has implemented the Information System for the health of people of UTMB to reduce disparities in health. Their information system is also known as REAL (race, ethnicity and language) [122]. Figure 13 illustrates the role of CEHD in the health system of UTMB and Galveston County as a whole. UTMB is a university health center that welcomes patients from diverse backgrounds. It provides services to different racial groups whose income level is below the poverty line. The main objectives of the REAL project are (1) to improve UTMB's health information system for better diagnostics and stratified quality measures by race, ethnicity, language and status, and (2) to develop and disseminate contingency plans to address disparities through effective partnerships with relevant stakeholders.

Aesthetic Surgery
Aesthetic surgery is described as facial plastic surgery performed either to beautify the face or to create an attractive face [123]. An anthropometric measurement is the distance between two facial points. It has been reported in the literature that anthropometric measurements such as ratio, geometric distance and Euclidean distance are different for different racial groups [124][125][126]. Geometric and Euclidean distances are the distances between primary or secondary facial landmarks on the frontal face/profile face. Depending on the race of the patient, anthropometric measurements are derived and used in aesthetic surgery [127][128][129]. For example, aesthetic surgery for Chinese people and Indian people is different, as both groups have different facial features and thereby different anthropometric measurements.

Face Recognition
Racial difference in humans is useful for biometric illustration and human identification [130]. Race cues and race wise anthropological measurements make it easy for face recognition systems to recognize the person [131][132][133][134]. Moreover, integration of race information with face recognition makes the face recognition system more intelligent and quick for accurate face recognition [135][136].

Figure 13: Role of the health center system to eliminate health disparities for different ethnicity [122]

Human Computer Interface
Nowadays, several systems are automated using robots [137][138][139][140]. Consumers of such robotic systems need to interact with robots frequently. In human-robot communication, racial cues play an important role [141][142][143][144][145]. Specifically, by recognizing the race of a human from his face, behavior and expression, robots can deliver the relevant services to humans. Such robotic systems are useful for an open service atmosphere where robots work as humans [146]. In particular, they are useful in hospitals, malls, stadiums, hotels, gaming zones and intelligent HCI organizations for easy communication with humans.

Challenges in Racial Categorization
Several challenges are faced to get correct and accurate racial categorization. Below we mention the major challenges faced in racial categorization. These challenges create new opportunities for researchers in this field to carry out further research. An efficient racial categorization method can be developed by overcoming these challenges and higher classification accuracy can be achieved.

Intra-race Categorization
To the best of our knowledge, intra-race categorization has not received much attention in the literature. 85% of the worldwide population is divided into seven major racial groups, namely African-American, South-Asian, East-Asian, Caucasian, Indian, Arabian and Latino [30]. Intra-race categorization for the aforementioned racial groups is challenging due to strong similarity in the facial features and physical appearance of humans belonging to a particular group [147][148]. It is difficult to infer distinguishing cues for intra-race categorization.

Anthropometry Measurements
Facial landmarking is used to obtain anthropometric measurements. It has been shown in the literature that the accuracy of anthropometric measurements, and thereby the accuracy of a racial categorization method, varies with the number of facial landmarks [21,51,56,62,147][148]. Thus, existing landmarking methods may be further improved by increasing the number of landmarks. Moreover, landmarks on the forehead area, hairline and earlobe can additionally be considered to increase accuracy further [56]. In addition, anatomists have reported that ear pinna and Iannarelli's measures differ for different racial groups. Like a fingerprint, the ear pinna is unique for each individual.

Real-Time Data
It is required to process real-time video streams at public places such as airports, hospitals, health care centers, malls and stadiums for public safety and security systems. However, existing racial categorization methods are not applicable and reliable for processing real-time video stream [53]. They are applicable to only off-line image dataset.

Manual Racial Categorization
The aforementioned issues are related to automated racial categorization methods. However, the issues faced by manual racial categorization methods are different. The major issue related to manual racial categorization which involves participants is that the number of stimulus levels for race prediction is limited. Stimulus level is defined as the number of tasks performed by participants for race prediction [56,61].

Conclusion
In this paper, we have presented in-depth review on various single model and multi model racial categorization methods. Moreover, parametric evaluation of racial categorization methods based on identified set of parameters is presented. It has been observed that fusion of facial features and physical appearance provides accurate race categorization. Moreover, it has also been observed that CNN based racial categorization model gives substantially higher accuracy because it extracts deep features. Our rigorous review on racial categorization methods will provide researchers state-of-the-art advancements related to racial categorization methods. Furthermore, the applications and challenges of racial categorization discussed herein will help researchers to develop an efficient and competent racial categorization method.