Real-time Target Human Tracking using Camshift and Lucas-Kanade Optical Flow Algorithm



Introduction
With the continuous development of high-precision, cost-effective imaging hardware and the corresponding detection algorithms, real-time target human tracking [1]-[8] has become a research hotspot in fields such as interactive video games, the military, and robotics. Human tracking is the process of locating moving targets over sequential video frames by running a tracking algorithm and analyzing their trajectories [9]-[12]. Depending on the application, the target may be a single person or multiple people. Human tracking is nevertheless difficult. The system must be fast enough for real-time applications while remaining accurate in different environments, and external factors such as moving obstacles, illumination changes, and similar-looking targets reduce the accuracy of the algorithm. Hence, research on more accurate computer vision detection and tracking algorithms is still ongoing.
With the development of technology, various methods have been proposed for real-time human tracking. The Kalman filter [13] can only be applied to linear systems, which limits how well it can model real scenarios. In [14]-[16], a laser range finder is used to detect human legs and analyze the motion of the tracked person. However, the measurements of this method degrade when the distance between the legs and the sensor is large. Multi-camera systems [17]-[21] have also been designed to track humans, but these approaches only cover the small area in which the cameras are installed. In [22], a hand-held camera system is used with sports players to synthesize a stroboscopic image of a moving target. Nevertheless, its computational speed and accuracy are not optimal in complex environments. In [23]-[25], particle-filter-based vehicle tracking via HOG features is very robust for fast-moving targets, but the algorithm is hard to integrate into a mobile system. In [26] and [27], the Camshift algorithm is used as a popular human tracking method. Its advantages include low computational cost and ease of implementation. Camshift follows the target using histogram back projection, so its stability is affected by color, illumination, and noise. In particular, it is prone to errors when the background has a color similar to that of the target.
Motivated by the importance of the problem and the limitations discussed above, this paper proposes a real-time human tracking system for ordinary cameras operating in complex environments. The Lucas-Kanade Optical Flow Algorithm (LK-OFA) has been a well-known method for tracking humans for a decade [28]-[30]. The proposed method combines the Camshift algorithm with LK-OFA [31], [32] and Oriented FAST and Rotated BRIEF (ORB) features [33]. Compared with existing works, the proposed approach makes the following contributions:
(1) The system tracks reliable keypoints with LK-OFA, and these keypoints are less affected by illumination changes or displacement of the human target. Optical flow is very good at locating points in the next frame from their locations in the previous frame. The Camshift algorithm is then used to determine, more precisely, the area in each frame whose appearance matches the human target.
(2) The initial search window is placed using the tracked keypoints. This reduces the number of iterations needed for the center of the search window to converge to the centroid of the human target, so the method is easy to implement on mobile devices.
(3) The system generates only a small amount of data to process. Processing can therefore be done in real time without a powerful computer.
The rest of the paper is organized as follows. The next section describes the structure of the proposed approach in detail. Section 3 presents the experimental platform and results. Section 4 gives brief conclusions and an outlook.

Proposed Methodology
In this section, the flowchart in Fig. 1 gives an overview of the human tracking algorithm. First, a region of interest (ROI) is created manually to select the target person. Next, the HSV histogram of the ROI is computed for later use by Camshift. The system then finds ORB features in both the ROI and the initial frame. From the features matched with a Brute-Force matcher, the 50 best keypoints are selected. These keypoints are tracked using Lucas-Kanade optical flow. Points that fail to track are then removed using the sum of squared differences (SSD) and a Backward tracking method. If fewer than three keypoints remain, the process returns to the ORB feature-finding step. Otherwise, the center of the remaining good keypoints is determined using the zeroth- and first-order moments, and this center is used to create the initial search window of Camshift. Applying the Camshift algorithm then yields the area of the human target. Finally, a Kalman filter is used to improve the accuracy of the tracked human's center.

Create ROI and compute H histogram
Before tracking, a region (ROI) covering the target person is selected (Figure 2 (b)). This ROI is used as a template of the target. The HSV histogram of the ROI is then computed. Only the H (hue) channel, quantized into 16 bins, is used because this setting gave the best result (Figure 2 (c)).
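The following plain-Python sketch shows the H-channel histogram and its back projection. It is a minimal illustration under several assumptions. Hue values follow OpenCV's 8-bit range of 0 to 179, and the histogram is scaled so that its peak is 255. The helper names `hue_bin`, `hue_histogram` and `back_project` are hypothetical. In an OpenCV implementation, `cv2.calcHist` with 16 bins over the range [0, 180] and `cv2.calcBackProject` would typically fill these roles.

```python
# Assumption: hue uses OpenCV's 8-bit convention (0..179).
H_MAX = 180
BINS = 16


def hue_bin(h, bins=BINS, h_max=H_MAX):
    """Map a hue value to one of `bins` equal-width bins."""
    return min(int(h * bins / h_max), bins - 1)


def hue_histogram(hues, bins=BINS):
    """16-bin H-channel histogram of the ROI pixels, scaled so the peak is 255."""
    hist = [0] * bins
    for h in hues:
        hist[hue_bin(h, bins)] += 1
    peak = max(hist) or 1  # avoid division by zero for an empty ROI
    return [255.0 * c / peak for c in hist]


def back_project(hues, hist):
    """Replace each frame hue by its bin value: the probability image used by Camshift."""
    bins = len(hist)
    return [hist[hue_bin(h, bins)] for h in hues]
```

For example, for ROI hues `[0, 5, 11, 12, 179]`, three values fall in bin 0 and one each in bins 1 and 15. Bin 0 becomes 255 and the other two occupied bins become 85.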

ORB feature and matching
To track the target and create the initial window for Camshift, this paper uses the ORB (Oriented FAST and Rotated BRIEF) feature detector to find the best points. ORB builds on the FAST [34] keypoint detector and the BRIEF [35] descriptor and modifies both. ORB first detects FAST points in the image and then applies the Harris corner measure to select the top N points among them. To compute the orientation of each point, the algorithm finds the intensity-weighted centroid of a patch centered at the corner. The orientation is the direction of the vector from the corner point to this centroid. ORB then uses a modified, rotation-aware BRIEF descriptor (rBRIEF) to describe each keypoint. In our system, ORB keypoints are extracted from grayscale versions of both the original frame and the ROI template. A Brute-Force matcher computes the distance between each pair of keypoints in the two grayscale images. After the pairs are sorted by distance, the best-matched keypoints are kept for later use, as shown in Figure 3. Selecting more keypoints lets the algorithm operate with higher accuracy. However, if the number of keypoints becomes too large, the computation takes a long time, especially on mobile robots. By trial and error, we found that 50 keypoints give a good balance between computation time and accuracy.
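Brute-Force matching of binary descriptors compares every ROI descriptor with every frame descriptor using the Hamming distance. The sketch below makes two simplifying assumptions. Descriptors are stored as Python integers (real ORB descriptors are 256-bit strings), and the hypothetical helper `brute_force_match` keeps the best `keep` pairs, 50 by default as in the text. With OpenCV, the corresponding calls would typically be `cv2.ORB_create()` and `detectAndCompute`, followed by `cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)` with the matches sorted by their `distance` attribute.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def brute_force_match(desc_roi, desc_frame, keep=50):
    """Exhaustively match each ROI descriptor to its nearest frame descriptor.

    Returns up to `keep` tuples (distance, roi_index, frame_index),
    sorted from best (smallest Hamming distance) to worst.
    """
    if not desc_frame:
        return []
    matches = []
    for i, d in enumerate(desc_roi):
        # Compare against every frame descriptor (the "brute force" part).
        j, dist = min(
            ((j, hamming(d, f)) for j, f in enumerate(desc_frame)),
            key=lambda pair: pair[1],
        )
        matches.append((dist, i, j))
    matches.sort()
    return matches[:keep]
```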

Track keypoints and remove failure points
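The positions of the previous keypoints are tracked using Lucas-Kanade optical flow. The method assumes that all neighboring pixels in a 3x3 patch around each point have similar motion. Under this assumption, the new location of each keypoint in the current frame is estimated from its previous position by equation (1):

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} \sum_i f_{x_i}^2 & \sum_i f_{x_i} f_{y_i} \\ \sum_i f_{x_i} f_{y_i} & \sum_i f_{y_i}^2 \end{bmatrix}^{-1} \begin{bmatrix} -\sum_i f_{x_i} f_{t_i} \\ -\sum_i f_{y_i} f_{t_i} \end{bmatrix} \qquad (1)$$

where $(u, v)$ is the displacement of the keypoint between the two frames, $f_x$ and $f_y$ are the image gradients at the keypoint and its surrounding points along the x and y axes, and $f_t$ is the temporal gradient.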
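Equation (1) can be solved directly for a single keypoint. The sketch below is a minimal single-scale illustration under the following assumptions:

- Grayscale images are stored as lists of rows indexed `img[y][x]`.
- Spatial gradients are computed with central differences.
- The window is 3x3 (`half=1`).

`lk_flow_at` is a hypothetical helper. It returns `None` when the 2x2 matrix is singular, which is the aperture problem, for example on a straight edge or a linear ramp. OpenCV's `cv2.calcOpticalFlowPyrLK` provides a pyramidal, iterative version of the same idea.

```python
def lk_flow_at(prev, nxt, x0, y0, half=1):
    """Solve Eq. (1) for the displacement (u, v) of the point (x0, y0).

    prev, nxt: grayscale frames t-1 and t as lists of rows, indexed [y][x].
    The window spans (2*half + 1) x (2*half + 1) pixels around the point.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            fx = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # gradient along x
            fy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # gradient along y
            ft = nxt[y][x] - prev[y][x]                   # temporal gradient
            sxx += fx * fx
            sxy += fx * fy
            syy += fy * fy
            sxt += fx * ft
            syt += fy * ft
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None  # aperture problem: motion is not recoverable here
    # Closed-form inverse of the 2x2 matrix applied to [-sxt, -syt].
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v
```

On a quadratic test image shifted by a small amount, the estimate is close to the true displacement. This holds because the first-order brightness-constancy approximation is nearly exact for small shifts.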
After updating, some keypoints in the frame may fail to track correctly. To remove these failed points, this article combines two methods: SSD and the Backward tracking technique. First, the SSD of each RGB channel $c$ between consecutive frames is expressed as:

$$SSD_c(k_t) = \sum_{(i,j) \in W_{m \times n}} \left[ I_t^c\big(k_t + (i,j)\big) - I_{t-1}^c\big(k_{t-1} + (i,j)\big) \right]^2, \qquad c \in \{R, G, B\}$$

where $k_t$ is the position of the keypoint in frame $t$, $W_{m \times n}$ is the compared window of width $m$ and height $n$ centered on the keypoint, and $I_t^c$ is channel $c$ of frame $t$. SSD computes the total error between a patch around the keypoint in the current frame and the patch at its location in the previous frame. Figure 4 and Figure 5 illustrate the errors between keypoint locations in consecutive frames for two cases: small error and big error. After the SSD values in the three channels are computed, the total error is expressed as:

$$E(k_t) = \alpha \, SSD_R(k_t) + \beta \, SSD_G(k_t) + \gamma \, SSD_B(k_t)$$

where $\alpha$, $\beta$, $\gamma$ are three positive coefficients whose sum equals one:

$$\alpha + \beta + \gamma = 1$$

In this paper, the coefficients are chosen as follows:
The error values of all keypoints are illustrated in Figure 6. Second, the Backward tracking technique tracks the keypoints found in the current frame back to the previous frame. It then uses the Euclidean distance to measure the deviation between each original keypoint and its back-tracked position. All keypoints whose distance is smaller than a threshold are kept.
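The SSD check and the weighted total error can be sketched as follows. The sketch makes these assumptions:

- RGB images are stored as rows of `(R, G, B)` tuples.
- Window sizes `m` and `n` are odd, so the patch is centered on the keypoint.
- Keypoints are integer `(x, y)` pixel coordinates.

The weights passed to `total_error` are placeholders supplied by the caller and are not the coefficients chosen in this paper. Both helper names are hypothetical.

```python
def channel_ssd(prev, curr, p_prev, p_curr, m=3, n=3):
    """Per-channel SSD between the m x n patch around p_prev in `prev`
    and the patch around p_curr in `curr` (images indexed [y][x])."""
    ssd = [0.0, 0.0, 0.0]  # R, G, B
    for dy in range(-(n // 2), n // 2 + 1):
        for dx in range(-(m // 2), m // 2 + 1):
            a = prev[p_prev[1] + dy][p_prev[0] + dx]
            b = curr[p_curr[1] + dy][p_curr[0] + dx]
            for c in range(3):
                ssd[c] += (b[c] - a[c]) ** 2
    return ssd


def total_error(ssd, alpha, beta, gamma):
    """Weighted sum of the R, G, B SSD values; weights must be positive and sum to 1."""
    if min(alpha, beta, gamma) <= 0 or abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha, beta, gamma must be positive and sum to 1")
    return alpha * ssd[0] + beta * ssd[1] + gamma * ssd[2]
```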
Combining these two methods removes the failed keypoints. The proposed method also handles the case where all keypoints fail, which is an improvement over current methods. In this case, a new set of keypoints is created inside the Camshift bounding box of the target.
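The two rejection tests can then be combined. In the sketch below, a point survives only if it passes both checks:

- its forward-backward Euclidean error is below `fb_thresh`;
- its total SSD error is below `err_thresh`.

Both thresholds and the helper names are hypothetical. The backward positions are assumed to come from running the same tracker from frame t back to frame t-1. `MIN_KEYPOINTS = 3` mirrors the rule that fewer than three surviving points trigger re-detection.

```python
import math

MIN_KEYPOINTS = 3  # fewer survivors than this triggers ORB re-detection


def filter_keypoints(p_prev, p_next, p_back, errors, fb_thresh, err_thresh):
    """Keep tracked points that pass both the backward-tracking and SSD tests.

    p_prev: keypoints in frame t-1
    p_next: the same points tracked forward into frame t
    p_back: p_next tracked backward into frame t-1
    errors: total SSD error of each point
    Returns the surviving points in frame t.
    """
    kept = []
    for a, b, c, e in zip(p_prev, p_next, p_back, errors):
        fb_error = math.dist(a, c)  # Euclidean distance: original vs back-tracked
        if fb_error < fb_thresh and e < err_thresh:
            kept.append(b)
    return kept


def needs_redetection(kept):
    """True when too few keypoints survive to place the Camshift window."""
    return len(kept) < MIN_KEYPOINTS
```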

Initial search Window for Camshift algorithm
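Applying the zeroth- and first-order moments to all remaining keypoints gives their mean position:

$$M_{00} = \sum_x \sum_y I(x,y), \qquad M_{10} = \sum_x \sum_y x \, I(x,y), \qquad M_{01} = \sum_x \sum_y y \, I(x,y)$$

$$x_c = \frac{M_{10}}{M_{00}}, \qquad y_c = \frac{M_{01}}{M_{00}}$$

where $I(x,y)$ is the intensity at keypoint $(x, y)$. All keypoints are projected into a bitmask, so $I = 1$ at each keypoint and $(x_c, y_c)$ is simply the mean of the keypoint coordinates. The initial Camshift search window is placed at this center point.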
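With $I = 1$ at every keypoint, the moment formulas reduce to averaging the coordinates. The sketch assumes keypoints are `(x, y)` pairs. The width and height passed to the hypothetical `initial_window` helper are also an assumption (for example, the size of the previous Camshift box), because the text only fixes the window's location.

```python
def keypoint_centroid(points):
    """Centroid of keypoints projected into a bitmask (I = 1 at each keypoint)."""
    m00 = len(points)                # zeroth moment: number of keypoints
    if m00 == 0:
        raise ValueError("no keypoints")
    m10 = sum(x for x, _ in points)  # first moment along x
    m01 = sum(y for _, y in points)  # first moment along y
    return m10 / m00, m01 / m00


def initial_window(center, width, height):
    """Search window (x, y, w, h) centered on `center`; the size is supplied by the caller."""
    cx, cy = center
    return int(round(cx - width / 2)), int(round(cy - height / 2)), width, height
```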

Human tracking with Camshift
The Camshift algorithm is an effective method for finding the color feature of a target in a frame. It is based on the Meanshift algorithm, modified to update the size of the tracking window in the next frame and to find the orientation of the target. More specifically, Camshift uses the H histogram of the ROI to replace each pixel value in the frame with the probability of that color appearing in the target. The result is a color probability distribution image, which represents the probability of each pixel belonging to the target. Camshift moves a search window iteratively until the center of the search window converges to the centroid of the high-probability pixels. Starting from an initial search window created from the tracked keypoints reduces the number of iterations needed to find the area of the target. The size of the tracking window is updated from the zeroth moment as follows:

$$s = 2\sqrt{\frac{M_{00}}{256}} \qquad (12)$$

Then, the orientation of the target is obtained from the second central moments:

$$\theta = \frac{1}{2}\arctan\left(\frac{2\mu_{11}}{\mu_{20} - \mu_{02}}\right) \qquad (13)$$

The aspect ratio of the target, denoted Ratio, is then computed (14). The width $w$ and height $h$ of the window in the frame are then estimated from $M_{00}$ and Ratio (15), (16).
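The core Camshift loop can be sketched on a probability image with values from 0 to 255, as produced by back projection. This is a simplified illustration, not OpenCV's `cv2.CamShift`. The two steps are:

- `mean_shift` moves a fixed-size window to the centroid of the probabilities inside it until the shift drops below `eps`.
- `size_and_orientation` evaluates Eq. (12) and Eq. (13).

The sketch uses `atan2` instead of `arctan` so that the angle stays defined when $\mu_{20} = \mu_{02}$. All helper names are hypothetical.

```python
import math


def window_moments(prob, x0, y0, w, h):
    """Raw moments of the probability image inside the window (clipped to the image)."""
    m00 = m10 = m01 = m20 = m02 = m11 = 0.0
    for y in range(max(0, y0), min(len(prob), y0 + h)):
        for x in range(max(0, x0), min(len(prob[0]), x0 + w)):
            p = prob[y][x]
            m00 += p
            m10 += x * p
            m01 += y * p
            m20 += x * x * p
            m02 += y * y * p
            m11 += x * y * p
    return m00, m10, m01, m20, m02, m11


def mean_shift(prob, x0, y0, w, h, max_iter=20, eps=1.0):
    """Shift a fixed-size window toward the centroid until it moves less than eps.
    Returns (x0, y0, iterations)."""
    for it in range(1, max_iter + 1):
        m00, m10, m01, *_ = window_moments(prob, x0, y0, w, h)
        if m00 == 0:
            return x0, y0, it  # nothing to follow inside the window
        dx = m10 / m00 - (x0 + (w - 1) / 2)  # centroid minus window center
        dy = m01 / m00 - (y0 + (h - 1) / 2)
        if abs(dx) < eps and abs(dy) < eps:
            return x0, y0, it  # window center has converged to the centroid
        x0, y0 = x0 + round(dx), y0 + round(dy)
    return x0, y0, max_iter


def size_and_orientation(prob, x0, y0, w, h):
    """Window size s (Eq. 12) and orientation theta (Eq. 13) from the moments.
    The central moments here are divided by M00, which leaves Eq. 13's ratio unchanged."""
    m00, m10, m01, m20, m02, m11 = window_moments(prob, x0, y0, w, h)
    xc, yc = m10 / m00, m01 / m00
    mu20 = m20 / m00 - xc * xc
    mu02 = m02 / m00 - yc * yc
    mu11 = m11 / m00 - xc * yc
    s = 2 * math.sqrt(m00 / 256)
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return s, theta
```

In the check below, a window that starts on the target converges in one iteration, while a window that starts partly off the target needs several. Seeding the window with the keypoint centroid exploits this effect.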

Kalman filter for the position of the human target
The center of the human target found by the Camshift algorithm has some jitter, which must be removed to make the tracking smoother. This paper uses a Kalman filter to suppress this jitter. The Kalman filter estimates the location of the center from a predicted value and the newly measured location of the center. It consists of two main stages: prediction and correction.
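Let $X_t$ denote the state of the center at frame $t$. The prediction step is:

$$\hat{X}_t = F X_{t-1} \qquad (19)$$

where $\hat{X}_t$ is the predicted value of $X_t$ and $F$ is the state transition matrix. The predicted covariance is expressed as:

$$\hat{P}_t = F P_{t-1} F^T + Q$$

where $\hat{P}_t$ is the predicted value of the covariance $P_t$ and $Q$ is the process noise covariance. The correction step consists of three equations: the Kalman gain equation, the state update equation, and the covariance update equation:

$$K_t = \hat{P}_t H^T \left( H \hat{P}_t H^T + R \right)^{-1}$$

$$X_t = \hat{X}_t + K_t \left( Z_t - H \hat{X}_t \right)$$

$$P_t = \left( I - K_t H \right) \hat{P}_t$$

In these equations, $Z_t$ is the center measured by Camshift at frame $t$, $R$ is the measurement noise covariance matrix, $H$ is the measurement matrix, and $I$ is the identity matrix. The values of $R$, $H$, $F$ and $Q$ are set as follows: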
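The prediction and correction equations can be sketched for a constant-velocity model. The sketch assumes F = [[1, 1], [0, 1]] and H = [1, 0], and it filters x and y independently. This is equivalent to a joint filter only when F, H, Q, R and the initial covariance have no cross-axis terms. The noise values `q` and `r` are placeholders, not the values of R, H, F and Q used in this paper, and the class names are hypothetical.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate of the center.

    State X = [position, velocity]; F = [[1, 1], [0, 1]]; H = [1, 0];
    Q = q * I and R = r are placeholder noise values.
    """

    def __init__(self, x0, q=1e-2, r=4.0):
        self.x = [float(x0), 0.0]          # position, velocity
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q, self.r = q, r

    def step(self, z):
        (p00, p01), (p10, p11) = self.P
        # Prediction (Eq. 19): X^ = F X_{t-1}
        px, pv = self.x[0] + self.x[1], self.x[1]
        # Predicted covariance: P^ = F P F^T + Q, written out for the 2x2 case
        a = p00 + p01 + p10 + p11 + self.q
        b = p01 + p11
        c = p10 + p11
        d = p11 + self.q
        # Kalman gain: K = P^ H^T (H P^ H^T + R)^-1 with H = [1, 0]
        s = a + self.r
        k0, k1 = a / s, c / s
        # State update: X = X^ + K (z - H X^)
        innov = z - px
        self.x = [px + k0 * innov, pv + k1 * innov]
        # Covariance update: P = (I - K H) P^
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
        return self.x[0]


class CenterSmoother:
    """Smooth a 2-D center by filtering x and y with independent AxisKalman filters."""

    def __init__(self, center, q=1e-2, r=4.0):
        self.fx = AxisKalman(center[0], q, r)
        self.fy = AxisKalman(center[1], q, r)

    def update(self, center):
        return self.fx.step(center[0]), self.fy.step(center[1])
```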

Summary of the methodology
Our algorithm is carried out in the following steps:
Step 1: Create the ROI for the human target and compute the HSV histogram of the ROI.
Step 2: Compute ORB features in the ROI.
Step 3: Compute ORB features in the first frame and match them with the ORB features in the ROI using Brute-Force matching. Keep the 50 best-matching points for tracking.
Step 4: Track the keypoints from the previous step using the Lucas-Kanade Optical Flow algorithm.
Step 5: Remove failed keypoints using the SSD and Backward tracking techniques.
Step 6: Find the center k of the remaining keypoints in the frame using the zeroth- and first-order moments.
Step 7: Use the center k to create the initial window for the Camshift algorithm.
Step 8: Find the bounding box and update the centroid of the tracked human by applying the Camshift algorithm.
Step 9: Use the Kalman filter to improve the smoothness of the human tracking process.
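The control flow of Steps 3 to 9 for one frame can be sketched as follows. Every stage is passed in as a hypothetical hook (`detect`, `track`, `filter_pts`, `camshift`, `smooth`). The sketch therefore only shows the ordering and the rule that fewer than three keypoints trigger a return to ORB detection. It is not the authors' code.

```python
MIN_KEYPOINTS = 3  # re-detect ORB keypoints when fewer than three remain


def centroid(points):
    """Mean position of the keypoints (Step 6 with I = 1)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def process_frame(frame, state, hooks):
    """Run Steps 3-9 for one frame and return the smoothed center (or the last one)."""
    points = state.get("points", [])
    if len(points) >= MIN_KEYPOINTS:
        # Steps 4-5: LK tracking, then SSD + backward-tracking rejection.
        points = hooks["filter_pts"](points, hooks["track"](points, frame))
    if len(points) < MIN_KEYPOINTS:
        # Back to Step 3: re-detect ORB keypoints, inside the last box if any.
        points = hooks["detect"](frame, state.get("box"))
        if len(points) < MIN_KEYPOINTS:
            state["points"] = points
            return state.get("center")  # cannot localize; keep the last center
    center = centroid(points)                        # Step 6
    x, y, w, h = hooks["camshift"](frame, center)    # Steps 7-8
    smoothed = hooks["smooth"]((x + w / 2, y + h / 2))  # Step 9
    state.update(points=points, box=(x, y, w, h), center=smoothed)
    return smoothed
```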

Experiment results
The experiments were run on a computer with 4 GB RAM and an Intel(R) Celeron(R) CPU N3350 @ 1.1 GHz. Two cameras were used: a 0.3 Mpx (640x480) webcam and a 2 Mpx webcam. The algorithm is implemented in Python with the OpenCV and NumPy libraries.
To show the improvement in performance, this paper compares the proposed method with the original Camshift [36]. The comparisons are carried out in different scenes, as shown in Figure 8 and Figure 9. In these figures, the blue bounding box is produced by Camshift and the green one by the proposed method. First, when people wear different-colored shirts (Figure 8), there is almost no difference between the two approaches. In the second case, we evaluate the proposed algorithm when other people wearing the same shirt appear against random backgrounds. Figure 9 shows that the original Camshift recognizes and tracks the wrong target, whereas the proposed method tracks the correct person clearly and precisely. Table 1 reports the average frames per second (FPS) of the proposed algorithm with both the 0.3 Mpx webcam and the 2 Mpx webcam. On average, the system is fast enough for real-time use, reaching up to 25 FPS. The effect of the Kalman filter is evaluated over 400 frames, and the results are shown in Figure 10. The proposed method significantly reduces the chattering of the bounding-box center, especially along the x-axis when the human target moves. Table 2 compares the experimental results of the original Camshift and the proposed approach under different environmental conditions. In each scene, we perform 60 tests for three cases: one person, a group of people wearing the same shirts, and a group of people wearing different shirts. Across the three environmental conditions, the original Camshift has the most false frames in the poorly lit room, at about 420. By contrast, the proposed method has only 49.
Across the different cases, the precision of the proposed method is considerably higher than that of the original Camshift. For instance, in the well-lit room the accuracy of the original Camshift is just 27.66%, whereas that of the proposed method is 96.35%. When tracking people wearing the same shirts, the tracking rates of the original Camshift are noticeably low, at about 27% on average, while those of the proposed method always remain above 90%. Furthermore, the results show that the proposed system works effectively in both indoor and outdoor environments, with accuracy of up to 98%.

Conclusions
This article presents and analyzes a new human tracking method. The proposed method combines the Camshift algorithm with the Lucas-Kanade Optical Flow Algorithm and Oriented FAST and Rotated BRIEF features. The experimental results indicate that the proposed system is well suited to real-time applications. Compared with the original Camshift algorithm, the proposed algorithm achieves higher accuracy, and its computation is relatively fast. Hence, the proposed method adapts better to practical applications. In conclusion, the present results provide a practical reference for human tracking algorithms in three areas: the development of equipment with high anti-interference performance, the design of test plans, and the establishment of future international standards.