Classification of Wing Chun Basic Hand Movement using Virtual Reality for Wing Chun Training Simulation System

ARTICLE INFO
Article history: Received: 02 October, 2020; Accepted: 28 December, 2020; Online: 15 January, 2021

ABSTRACT
To create a Virtual Reality (VR) system for Wing Chun's basic hand movement training, capturing and classifying movement data is an important step. The main goal of this paper is to find the best possible method of classifying hand movements, particularly Wing Chun's basic hand movements, to be used in the VR training system. This paper uses Oculus Quest VR gear and Unreal Engine 4 to capture features of the movement such as location, rotation, angular acceleration, linear acceleration, angular velocity, and linear velocity. RapidMiner Studio is used to pre-process the captured data, apply algorithms, and optimize the generated model. Algorithms such as Support Vector Machine (SVM), Decision Tree, and k-Nearest Neighbor (kNN) are applied, optimized, and compared. In classifying 10 movements, the optimized kNN algorithm obtained the highest averaged performance indicators: accuracy of 99.94%, precision of 99.70%, recall of 99.70%, and specificity of 99.97%. The overall accuracy of the optimized kNN is 99.71%.


Introduction
Martial arts training is one of the preferred methods of physical exercise. It provides health and psychological benefits, such as preventing osteoporosis [1], reducing stress and depression, and increasing mindfulness [2], and it also has the potential to reduce aggressive behavior [3]. Basic movements of martial arts training, such as stances and punches, are usually practiced under the supervision of an instructor. However, an instructor cannot always attend to one student, since other students need attention as well. This can be a problem for new students who have just learned stances and punches. Repeating movements without supervision could cause students to lose motivation and stop practicing.
Having an instructor watch over them helps students keep repeating the movement without stopping. A training method is needed that keeps students engaged during basic training when the instructor is not able to watch them. Watching video tutorials and reading books are some of the ways to improve the students' training experience; however, they lack the interaction and presence needed in martial arts training. This is where Virtual Reality (VR) technology comes into play. As mentioned in [4], VR is typically defined in technical terms, meaning that it is associated with technical hardware such as computer systems, head-mounted displays (HMD), and motion trackers. VR itself can be simply defined as a computer-generated environment that simulates the real world and with which users can interact by using motion trackers [5]. The trackers are used to collect data from the real environment and translate them to the virtual environment [6].
Due to its immersive capabilities, VR technology has been used in areas such as architecture and landscape planning [7], dancing [8], military training [9], and medicine, including stroke rehabilitation [10] and psychological treatment [11]. It has also been used for research in martial arts, although such studies are still few. VR is used in these areas because higher immersion and presence result in higher user performance [12].
A VR-based training simulation system could help users practice the movements and keep them motivated, due to its immersive capabilities [13]. To develop the system, capturing and classifying the hand movements is crucial, as being able to classify basic movements in Wing Chun is one of the basic training outcomes. This study uses Oculus Quest, a VR gear with a head-mounted display (HMD) and a pair of controllers. These controllers are used to capture hand movements. Variables such as time, location, rotation, angular acceleration, linear acceleration, angular velocity, and linear velocity were recorded and used as features for classification.
ASTESJ ISSN: 2415-6698
The purpose of this paper is to find the best possible method of classifying Wing Chun's basic hand movements in order to develop a VR-based training system. Since this paper uses Oculus Quest with a pair of hand controllers, hand movement data can be captured in the form of location in world space. Other properties, such as rotation, angular acceleration, linear acceleration, angular velocity, and linear velocity, can be captured as well. RapidMiner Studio [14], [15] is used to process the data, optimize the models, and analyze the results. This paper is organized as follows: Section 1 is the introduction. Section 2 outlines the studies related to motion and gesture recognition or classification. Section 3 outlines the method used in this paper. Section 4 presents the results and analysis of the study. Section 5 provides the conclusion of this paper based on the results and analysis. Section 6 provides a discussion and possible further work to improve on this paper.

Related Works
The study by [16] presents three approaches to segmenting, or classifying, movement into distinct behaviors. The first approach uses Principal Component Analysis (PCA), based on the observation that simple motions exhibit lower dimensionality than complex motions. The motion is broken into frames, and each frame is represented as a point given by the joint locations. A motion sequence with a single behavior should therefore have a smaller dimensionality than one with multiple behaviors. The second approach uses Probabilistic PCA. The Probabilistic PCA ignores the noise of the motion, unlike PCA. The third approach is the Gaussian Mixture Model (GMM), in which the entire sequence of motion is segmented whenever two consecutive sets of frames belong to different Gaussian distributions. The Probabilistic PCA obtained the best result of the three approaches: 90% precision for 95% recall.
In [17], the authors conducted a survey on sequence classification, in which it is stated that multivariate time series classification has been used for gesture and motion classification. The research by [18] captured motion data using a CyberGlove with 22 sensors at different locations on the glove and 1 angular sensor. The data are split into 3 datasets and evaluated using K-fold validation with K = 3. The multi-attribute motion data is reduced to feature vectors with Singular Value Decomposition (SVD) and then classified using Support Vector Machine (SVM). The study obtained an accuracy of 96% for dataset 1, and 100% for datasets 2 and 3.
The research by [19] used wearable accelerometers to capture movement data. Several movements, namely sitting, sitting down, standing, standing up, and walking, are the subject of classification. The features extracted from 4 accelerometers are further pre-processed into acceleration along the x, y, and z axes for each accelerometer, resulting in 12 features. The features of the movement data are then selected with Mark Hall's selection algorithm [20] and classified with a C4.5 decision tree with the AdaBoost ensemble method [21]. The study used 10 iterations of AdaBoost with a C4.5 tree confidence factor of 0.25 and 10-fold cross-validation. The overall recognition performance was 99.4%.
In [22], the authors compared the classification accuracy of several algorithms, namely C4.5 decision tree, multilayer perceptron, Naive Bayes, logistic regression, and k-Nearest Neighbor (kNN). A smartphone capable of recording accelerometer and gyroscope data was used as the motion sensor. The Waikato Environment for Knowledge Analysis (WEKA) machine learning tool was used to apply the algorithms and compare them. The result of the study shows that the kNN algorithm obtained the highest averaged accuracy across all motion classifications, at 84.6%.
The study by [23] used a multi-modal approach, which combines audio, video, and skeletal joints. The audio segmentation used the Hidden Markov Model (HMM) to determine the "silence" and "event" of motion. The segmentation based on skeletal joints uses the y-coordinate of the hand joint location. The first step is to identify the start and end frames of gestures of either the left or right hand, and the second step is to identify the same by using both hands. For the RGB and video modalities, SVM is used as the gesture classifier. The audio classifier and the SVM classifier are then fused with a fusion algorithm. This study, which used 275 test samples with over 2,000 unlabeled gestures, resulted in an average edit distance of 0.2074.
The study by [24] used a signal captured from movement instead of the joint location. The study used 12 Trigno Wireless electrodes. The signal is divided into several segments, and the spectrogram of each segment is calculated and then normalized. PCA is then applied to reduce the dimensionality of the data while maintaining important information. The last step is to apply SVM to the data to perform the classification. Compared with the Root Mean Square (RMS) method, the method of this study results in an improvement in accuracy of 9.75% and a reduction in error rate of 12%.
Lower-limb motion classification was done by using piezoelectret sensors that apply force myography (FMG), which reads the force distribution generated by muscle contractions [25]. The study compared kNN, Linear Discriminant Analysis (LDA), and Artificial Neural Network (ANN) to classify 4 lower-limb motions, namely leg raising, leg dropping, knee extension, and knee flexion. The highest accuracy, 92.90%, is obtained by kNN.
The study by [26] is more specific, classifying motions into karate stances, movements, and forms. The captured data come from a 3-axis accelerometer, gyroscope, and magnetometer. The classification is done by averaging the dataset with the Dynamic Time Warping (DTW) Barycenter Averaging (DBA) algorithm, an averaging method that iteratively refines an initially selected sequence to minimize its squared DTW distance [27] to the sequences being averaged. In this study, movement templates were generated from two Karate masters of different styles and then compared. This results in a recognition rate of 94.20%.
In [28], the authors studied human motion recognition using VR. A combination of LDA, Genetic Algorithm (GA), and SVM is proposed to classify human motion. After the motion data are collected, LDA is used to extract features, and GA is then used to search for optimal parameters. Once the optimal parameters are found, SVM is used for classification. There are 10 motions to be classified, and several averaged results are obtained: precision of 95.65%, accuracy of 97.05%, specificity of 92.78%, and sensitivity (recall) of 94.01%.
It can be inferred from the related works that human motion recognition and classification have been researched extensively for years. SVM appears to be favored among studies of motion classification. When several studies compared algorithms for movement classification, kNN tended to provide the highest accuracy. This paper applies several of the algorithms mentioned in the related works and optimizes their parameters to improve the results.

Proposed Method
This study utilizes the features from Oculus Quest VR gear, along with Unreal Engine 4 (UE4), which helps in extracting features of movements. RapidMiner Studio is used to pre-process the data, apply algorithms, and analyze the results.

Method of extraction
To capture the hand movements, a simple game based on UE4 was developed. When the game starts, the user can press a button on the Oculus controllers to start capturing movement data. The recorded movements are basic Wing Chun hand movements: Straight Punch, Tan Sau, Pak Sau, Gan Sau, and Bong Sau. Each recording runs from the beginning stance to the ending stance of a movement. The same movement is performed 10 times within around 40 seconds, with an interval of around 1 to 2 seconds between repetitions.

Extracted Data
The movement data is generated as one CSV file for each movement and each hand, resulting in 10 files. The extracted features are time, location, rotation, angular acceleration, linear acceleration, angular velocity, and linear velocity. Except for time, all the features are extracted as vectors and then split into x, y, and z components, resulting in a total of 19 features. RapidMiner Studio is used to process the data and assign labels to it. The data is then processed again by merging the files of every hand and movement into 1 file. The dataset consists of 8,725 rows with a total of 10 labels. For the left hand, the labels are Punch, Tan, Pak, Gan, and Bong. For the right hand, the labels are R_Punch, R_Tan, R_Pak, R_Gan, and R_Bong.
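The feature and label layout described above can be sketched in a few lines of Python. This is only an illustration, not the capture code used in the study: the column names follow the pattern of the names reported later in this paper (loc_x, rot_y, ang_acc_x, ang_vel_y, lin_vel_z), while the lin_acc_* names and the helper functions are assumptions made for the example.

```python
# Assumed column naming; lin_acc_* is a guess that follows the pattern of the
# names reported in the paper (loc_x, rot_x, ang_acc_x, ang_vel_y, lin_vel_z).
VECTOR_FEATURES = ["loc", "rot", "ang_acc", "lin_acc", "ang_vel", "lin_vel"]
AXES = ["x", "y", "z"]
MOVEMENTS = ["Punch", "Tan", "Pak", "Gan", "Bong"]


def feature_names():
    """Return 'time' plus one column per vector feature and axis."""
    return ["time"] + [f"{feat}_{axis}" for feat in VECTOR_FEATURES for axis in AXES]


def label_for(movement, hand):
    """Left-hand labels use the bare movement name; right-hand labels get an R_ prefix."""
    return movement if hand == "left" else f"R_{movement}"


def flatten_sample(time, vectors, movement, hand):
    """Turn one captured sample (time plus six 3-D vectors) into a labelled row."""
    row = {"time": time}
    for feat in VECTOR_FEATURES:
        for axis, value in zip(AXES, vectors[feat]):
            row[f"{feat}_{axis}"] = value
    row["label"] = label_for(movement, hand)
    return row


names = feature_names()
labels = [label_for(m, h) for h in ("left", "right") for m in MOVEMENTS]
print(len(names), len(labels))  # 19 10
```

The count works out as 1 time column plus 6 vector features times 3 axes, which gives the 19 features, and 5 movements times 2 hands, which gives the 10 labels.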

Method of optimizing algorithm
Several algorithms are applied to the dataset using RapidMiner Studio to check their accuracy. From there, parameter optimization is used to make the algorithms perform better. The algorithms selected for comparison are SVM, Decision Tree, and kNN. Figure 1 shows the flow of the algorithm optimization method. First, an algorithm is applied to the dataset with split validation to check the initial performance. The dataset is first normalized and then split into a training set (80%) and a testing set (20%) by using the Split Data module. An algorithm is then applied to the dataset, and its performance is measured to obtain the accuracy. This split validation gives the initial performance of each algorithm.
Then, parameter optimization is performed to find the best possible values of the algorithm's parameters. The algorithm is applied to the normalized dataset with 10-fold cross-validation and then put into the Optimize Parameter module. This module executes the cross-validation using all the selected combinations of parameters that are available in the algorithm. This process yields the combination of parameters with the best accuracy. After the optimized parameter values are found, feature selection is optimized for the algorithm. The process is identical to that of the Optimize Parameter module, but this time the algorithm is put into the Optimize Selection module. This module executes the cross-validation of the algorithm and weights the features of the dataset to keep those that further improve the accuracy. This process can be repeated until no more features are removed. After this process, a model from the algorithm with optimized parameters and selected features is obtained. This model is then applied to the dataset with the split validation process.
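The normalize-then-split step can be illustrated with the following minimal Python sketch. It is not the RapidMiner process itself. The paper does not state which normalization method was used, so z-score scaling is an assumption here, and the shuffling seed, the function names, and the toy data are hypothetical.

```python
import random
import statistics


def z_normalize(rows):
    """Scale every column to zero mean and unit standard deviation (assumed method)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # pstdev is 0 for a constant column; fall back to 1.0 to avoid dividing by zero
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def split_data(rows, labels, train_ratio=0.8, seed=0):
    """Shuffle the row indices and cut them into a training part and a testing part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(len(indices) * train_ratio)

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(indices[:cut]), pick(indices[cut:])


# Toy data standing in for the captured rows (two features, two labels).
rows = [[float(i), float(i * 2)] for i in range(10)]
labels = ["Punch" if i % 2 == 0 else "Tan" for i in range(10)]

normalized = z_normalize(rows)
(train_rows, train_labels), (test_rows, test_labels) = split_data(normalized, labels)
print(len(train_rows), len(test_rows))  # 8 2
```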
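The parameter search described above amounts to a grid search scored by cross-validation. The sketch below illustrates the idea in plain Python, with kNN as the example learner. It is a simplified stand-in for the Optimize Parameter module, not a reproduction of it: the grid values, the function names, the fold assignment, and the toy data are assumptions, and the feature selection step is not covered.

```python
import itertools


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(train_rows, train_labels, query, k, distance):
    """Majority vote among the k training rows closest to the query."""
    nearest = sorted(zip(train_rows, train_labels),
                     key=lambda pair: distance(pair[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def cross_val_accuracy(rows, labels, folds, k, distance):
    """Mean accuracy over the folds; fold f holds out rows f, f + folds, ..."""
    scores = []
    for f in range(folds):
        test_idx = set(range(f, len(rows), folds))
        train_rows = [rows[i] for i in range(len(rows)) if i not in test_idx]
        train_labels = [labels[i] for i in range(len(rows)) if i not in test_idx]
        hits = sum(
            knn_predict(train_rows, train_labels, rows[i], k, distance) == labels[i]
            for i in test_idx
        )
        scores.append(hits / len(test_idx))
    return sum(scores) / folds


def grid_search(rows, labels, grid, folds=10):
    """Try every parameter combination and keep the first one with the best accuracy."""
    best_acc, best_params = -1.0, None
    for k, (name, distance) in itertools.product(grid["k"], grid["distance"].items()):
        acc = cross_val_accuracy(rows, labels, folds, k, distance)
        if acc > best_acc:
            best_acc, best_params = acc, {"k": k, "distance": name}
    return best_acc, best_params


# Toy data: two well-separated clusters standing in for two movements.
rows = [(i * 0.1, i * 0.05) for i in range(20)] + \
       [(10 + i * 0.1, 10 - i * 0.05) for i in range(20)]
labels = ["Punch"] * 20 + ["Tan"] * 20
grid = {"k": [1, 3, 5], "distance": {"manhattan": manhattan, "euclidean": euclidean}}

best_acc, best_params = grid_search(rows, labels, grid)
print(best_acc, best_params)
```

On real data the candidate combinations would score differently, and the combination with the highest cross-validated accuracy would be carried forward to the final split validation.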

Result and Analysis
The performance indicators were based on the study by [28], in which the 4 most used indicators for action classifications are used: accuracy, precision, recall (sensitivity), and specificity.
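For reference, the four indicators can be computed per movement in a one-vs-rest fashion from the true and predicted labels. The sketch below assumes the standard confusion-matrix definitions, which this paper does not spell out. The function names and the toy labels are made up for illustration.

```python
def one_vs_rest_metrics(actual, predicted, positive):
    """Compute the four indicators for one label, treating all other labels as negative."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def overall_accuracy(actual, predicted):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Toy example: one Tan sample is misclassified as Bong.
actual = ["Punch", "Punch", "Tan", "Tan", "Bong", "Bong"]
predicted = ["Punch", "Punch", "Tan", "Bong", "Bong", "Bong"]

metrics = {label: one_vs_rest_metrics(actual, predicted, label)
           for label in ("Punch", "Tan", "Bong")}
averaged_accuracy = sum(m["accuracy"] for m in metrics.values()) / len(metrics)
overall = overall_accuracy(actual, predicted)
print(metrics["Bong"], averaged_accuracy, overall)
```

In this toy case the per-label accuracy averaged over the labels (8/9) is higher than the overall accuracy (5/6), because every one-vs-rest accuracy also counts the true negatives. Under these assumed definitions, this is one plausible reading of why an averaged accuracy and an overall accuracy can differ for the same model.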

Support Vector Machine
The parameters used to obtain the final result for the SVM algorithm are SVM type nu-SVC [29] with a nu value of 0.1, where nu is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors, and a linear kernel, which is suitable when the dataset has many features [30]. The relevant features obtained from the Optimize Selection process are time, loc_x, loc_y, loc_z, rot_x, rot_y, rot_z, ang_acc_x, ang_vel_y, lin_vel_x, and lin_vel_z. The result of the optimized SVM model for each movement is shown in Table 1.

Decision Tree
The parameters used to obtain the final result for the Decision Tree algorithm are Criterion: information gain, Confidence: 0.140, and Minimal Gain: 0.019. Information gain is useful to minimize randomness in the dataset [31]. The relevant features obtained from the Optimize Selection process are loc_x, loc_y, loc_z, rot_x, rot_y, rot_z, lin_vel_y, and lin_vel_z. The result of the optimized Decision Tree model for each movement is shown in Table 2.
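To make the information gain criterion concrete, the sketch below computes the gain of a single threshold split on one numeric feature, using the textbook entropy-based definition. It is an illustration only. The feature values and the thresholds are invented, and the comparison against the minimal gain of 0.019 reflects an assumed reading of that parameter as the smallest gain for which a node is still split.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the rows at value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    weighted = sum(len(part) / len(labels) * entropy(part)
                   for part in (left, right) if part)
    return entropy(labels) - weighted


# Invented loc_x values for four samples of two movements.
loc_x = [0.1, 0.2, 0.8, 0.9]
labels = ["Punch", "Punch", "Tan", "Tan"]

good_split = information_gain(loc_x, labels, threshold=0.5)
poor_split = information_gain(loc_x, labels, threshold=0.15)
MINIMAL_GAIN = 0.019  # value reported above; its use as a split cut-off is assumed
print(good_split, poor_split, good_split > MINIMAL_GAIN)
```

A threshold that separates the two labels completely removes all the entropy, while a threshold that leaves one side mixed removes only part of it.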

k-Nearest Neighbor
The parameters used to obtain the result for the kNN algorithm are Measure Type: Numerical Measures, Numerical Measure: Manhattan Distance, and k = 5. Manhattan distance is preferable when the dataset has high dimensionality [32]. The relevant features obtained from the Optimize Selection process are time, loc_x, rot_x, rot_y, rot_z, and ang_vel_x. The result of the optimized kNN model for each movement is shown in Table 3. All movements except for R_Bong and Bong obtained the highest result (100%) in all 4 indicators. The lowest accuracy, precision, recall, and specificity are obtained by R_Bong (99.71%), R_Bong (98.16%), Bong (98.19%), and Bong (99.81%), respectively. The overall accuracy obtained from the optimized kNN model is 99.71%.

Summary of the result
By using RapidMiner Studio, 10 basic Wing Chun hand movements were classified. The results of the classification by the 3 optimized algorithms are shown in Tables 1-3. Figure 2 shows the comparison of the averaged results between the optimized SVM, Decision Tree, and kNN algorithms.
The averaged results of the optimized SVM and Decision Tree show that accuracy and specificity are higher than precision and recall, and the precision and recall of the optimized Decision Tree are lower than those of the optimized SVM. Of the 3 algorithms, kNN obtained the highest results in all 4 indicators, each exceeding 99%.
The features selected for the optimized SVM, Decision Tree, and kNN vary. The time feature is not selected for the Decision Tree. The selected angular acceleration, linear acceleration, angular velocity, and linear velocity features also vary across the 3 algorithms, although lin_vel_z is selected for both SVM and Decision Tree. The rotation features (rot_x, rot_y, and rot_z) are selected for all 3 algorithms. The location features (loc_x, loc_y, and loc_z) are selected for SVM and Decision Tree, while only loc_x is selected for kNN. This result suggests that Wing Chun hand movement depends more on positioning than on velocity and acceleration. This is consistent with Master Ip Chun's statement that if the hand position and movement are not correct, the technique will not be useful and may be dangerous [33].
In the per-movement curves for each indicator, kNN has the highest results, while Decision Tree has the lowest. Although the curves of the algorithms intersect at some movements, the difference between them is clear enough. It can also be seen that R_Tan reached 100% in every indicator for all applied algorithms. For every indicator, the shape of the graph for kNN is identical, except at R_Bong and Bong, which do not reach 100%. This is understandable because, even though Bong Sau is a basic hand movement, its shape and movement are more complex than those of the other basic movements.
The initial accuracy for SVM, Decision Tree, and kNN is 52.57%, 55.14%, and 98.06%, respectively. After the optimization process, split validation is performed again to obtain the final accuracy, which is 98.11%, 96.06%, and 99.71% for SVM, Decision Tree, and kNN, respectively.
The comparison in Figure 7 shows that the optimization improves all 3 algorithms, especially SVM and Decision Tree. Both the default and the optimized results show that kNN obtains the highest accuracy.
To check whether the results show a statistically significant difference (p ≤ 0.05 and p ≤ 0.01), the Wilcoxon signed-rank test is performed on all 4 indicators (accuracy, precision, recall, and specificity) of the default and optimized versions of each algorithm. The test shows that all 4 indicators for SVM have a statistically significant difference. Both Decision Tree and kNN have a statistically significant difference in all indicators except for precision. Table 4 shows the result of the test.
The Wilcoxon signed-rank test is also applied to the comparison between algorithms, in both their default and optimized versions. The comparison of the default algorithms shows that SVM and Decision Tree do not have a statistically significant difference. The comparison of the optimized algorithms shows that all the algorithms have a statistically significant difference from each other. Table 5 shows the result of the test.
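For readers unfamiliar with the test, the following Python sketch computes the Wilcoxon signed-rank statistic and an exact two-sided p-value for paired samples by enumerating all sign assignments. It assumes that each pair consists of one indicator value for one movement before and after optimization, which the paper does not state explicitly. The numbers in the example are placeholders invented for illustration, not values from this study.

```python
from itertools import product


def signed_ranks(before, after):
    """Return the non-zero paired differences and the average ranks of their absolute values."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    return diffs, ranks


def wilcoxon_exact(before, after):
    """Return (W, p), where W = min(W+, W-) and p is the exact two-sided p-value."""
    diffs, ranks = signed_ranks(before, after)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    observed = min(w_plus, w_minus)
    total = sum(ranks)
    extreme = 0
    # Under the null hypothesis every sign assignment is equally likely.
    for signs in product((0, 1), repeat=len(ranks)):
        plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(plus, total - plus) <= observed:
            extreme += 1
    return observed, extreme / 2 ** len(ranks)


# Placeholder per-movement values (percent), invented for this example only.
default_scores = [50, 52, 48, 55, 60, 47, 53, 49, 58, 51]
optimized_scores = [97, 98, 99, 96, 100, 95, 98, 97, 99, 96]

w, p = wilcoxon_exact(default_scores, optimized_scores)
print(w, p, p <= 0.01)
```

With 10 pairs that all change in the same direction, only 2 of the 1,024 sign assignments are as extreme as the observed one, so the difference is significant at both thresholds used in this paper.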

Conclusion
The purpose of this study is to find the best possible method of classifying Wing Chun basic hand movements in order to create a Wing Chun training simulation system in VR. The results show that movement data features such as location, rotation, and linear velocity are significant enough to classify Wing Chun basic hand movements. Right-hand Tan Sau has the highest value in all 4 indicators. Meanwhile, based on the result of the optimized kNN algorithm, Bong Sau of both hands is the only movement that does not reach 100% performance, which is understandable because Bong Sau is a more complex basic movement. Before the optimization process, kNN was the only one of the 3 algorithms with a high-performance result. However, after the optimization of parameters and selection of features, the performance of SVM and Decision Tree improved significantly. The comparison of the results shows that the optimized kNN algorithm obtained the highest results.

Discussion
Five basic movements were captured for each of the left and right hands, making 10 movements in total. The movements are captured with the Oculus Quest VR gear, which comes with a pair of handheld Oculus controllers. A simple game is developed with Unreal Engine 4 to capture motion data that consists of several features of the movements, such as location and rotation. RapidMiner Studio is used to pre-process the dataset, apply algorithms, and optimize the learning model. This study compares the SVM, Decision Tree, and kNN algorithms, since all three were commonly used in earlier studies of motion classification and recognition. Performance indicators, namely accuracy, precision, recall, and specificity, are used to compare the overall results of the 3 algorithms.
While algorithms such as SVM, Decision Tree, and kNN are compared in this study, combinations of algorithms, for example SVM combined with kNN, are not included. RapidMiner Studio has various modules that help in processing data with algorithms. The AdaBoost module has been applied to the optimized kNN, but the result shows no improvement. Modules such as Ensemble Stacking have yet to be explored further; an attempt to stack all 3 algorithms has been made, but the result was not as good as the result obtained from each optimized algorithm. Machine learning algorithms such as Deep Learning, Neural Networks, and spatio-temporal algorithms can also be applied and compared in future studies.
The simple game developed to capture the movements also has a limitation: the user can only stand still. The game can be developed further so that the motion capture process allows the user to move around while performing hand movements. While this study shows that the algorithms can accurately classify basic hand movements, there are more basic hand movements that are very similar to each other. These movements can only be distinguished by looking at the area of contact of the hand. This can also be a field of further study for better movement classification. Classification of sequences could be explored as well, since Wing Chun also has more complex sets of movements that require more than one type of movement.