A Method for Detecting Human Presence and Movement Using Impulse Radar

ASTESJ ISSN: 2415-6698
Article history: Received: 01 July, 2020; Accepted: 24 August, 2020; Online: 28 August, 2020

Using non-invasive and non-contact sensors to measure a person's presence or movement helps improve the quality of life for both healthy people and patients. In this paper, a method of measuring the presence and motion of a person is proposed that uses a UWB impulse radar, which has low power consumption and is safe to radiate toward the human body. The experimental procedure consists of three stages: extracting features from radar signals by signal processing, generating datasets with 3 to 6 kinds of labels, and converting the data into images to train and verify machine learning models. Only a small number of images were used, because only good-quality signals obtained by radiating radar signals toward the human body were selected. The experimental results show high accuracy when using neural networks such as GoogLeNet and SqueezeNet. The experiments confirm that, with the proposed method, radar signals can be used to detect human presence and motion.


Introduction
This paper is an extension of the work originally presented in the 2019 International Conference on Information and Communication Technology Convergence (ICTC) [1].
Non-contact measurements are useful for health care, human activity monitoring, security, surveillance, etc., and the spread of COVID-19 in 2020 has steadily increased the demand for non-invasive, contactless measurements of physiological functions in modern society. In particular, non-contact sensors such as radar can be used to identify health conditions and movements without limiting human activities. In commercial applications, radar sensors are widely used for LED control and alarm monitoring in indoor and outdoor environments, as well as in smart homes and cities. A radar sensor can be applied in areas where there are restrictions on the use of sound, infrared, vibration, and camera sensors. For example, acoustic and vibration sensors are very vulnerable to acoustic noise, and infrared sensors frequently generate false alarms in outdoor environments. Moreover, camera sensors are relatively expensive, require heavy signal processing, perform poorly at night, and suffer from lens contamination. Radar sensors, on the other hand, are robust against weather conditions, and their performance does not decrease at night. In addition, since the signal processing needed to detect a target is relatively easy and effective, radar sensors can be used effectively in various indoor and outdoor environments [2].
Compared with continuous-wave radar systems, ultra-wideband (UWB) radars have localization capability, consume less power, and can monitor multiple subjects [3]. Radar technology was originally aimed at detecting targets in aviation and military areas. Recently, studies have been conducted to detect human bodies at close range and to detect heart rate and respiration. UWB impulse radars are used to obtain biological information because of their low risk of exposure to electromagnetic waves and their low power consumption. They have also become an emerging technology for indoor localization and tracking. UWB radar has many advantages, including high spatial resolution, the ability to mitigate interference, through-wall visibility, a simple transceiver, and low cost [4].
In this study, radar signal processing and machine learning are applied in a system to detect human presence and movement. Most machine learning datasets built from contactless sensors are generated by data processing techniques based on various statistical calculations. This study does not use a statistical calculation method. Instead, it converts the signal-processed result into images to generate a dataset, applies machine learning, and reports the results. As a method of research, data is received from the radar in chronological order, and features are extracted through signal processing. The extracted feature data is divided into six labels, and machine learning is used to determine the presence and movement of a person through experiments with different label compositions. A total of three experiments were conducted, varying the epoch sizes and the composition of the labels. Even though the datasets contain relatively few images, the results are satisfactory for converting radar signals into images and detecting the presence or absence of a subject and the subject's movement.
In Section 3, the machine learning process is discussed by extracting features with a dataset configured through the preprocessing of radar signals. Section 4 presents the experimental results using several test sets. Finally, the paper is concluded in Section 5.

Related Works
The most common method for detecting the presence or movement of a subject is to use image sensors. However, there are scenarios in which this method cannot be used, such as conditions involving personal privacy infringement. To overcome these problems, a UWB impulse radar signal was used in this study. Machine learning algorithms are used to determine and classify subject presence and movement. The features for the machine learning model are extracted from the signals of the UWB impulse radar module, as shown in Figure 1. The impulse radar module has one transmitter (Tx) and one receiver (Rx). The Tx sends very narrow pulses, and the Rx receives the reflected pulses. The received signal passes through several signal processing steps to extract the target signal. However, this target signal is generally perturbed by clutter, noise, and attenuation. Therefore, the removal of unwanted signals and signal compensation are crucial tasks for improving the detectability of a target [5]. The experiment uses a radar transceiver [6] that emits short pulses through antennas and digitizes the pulses returning from the target using sampling methods. The pulse is a sine wave with a Gaussian envelope, less than 0.4 ns in width, with a frequency of 6.8 GHz and a bandwidth of 2.3 GHz. The collection of samplers read from the radar chip is called a frame, and the delay between individual samplers yields the equivalent sampling rate of the frame [7]. In this study, frames are received sequentially from the impulse radar, and the basic signal processing is the same as in the previous work [1]. In that work, a feature set was created from the signal distribution by calculating the standard deviation, root mean square (RMS), etc. In this study, however, the signal is processed in the frequency domain, and the dataset used as input for machine learning is converted into images.
Furthermore, a support vector machine (SVM) was used in the previous experiment to classify patterns. The major strengths of SVM are that training is relatively easy and that, unlike neural networks, it has no local optima. It scales relatively well to high-dimensional data, and the tradeoff between classifier complexity and error can be explicitly controlled. Its weaknesses can be mitigated by a good kernel function [8][9][10][11][12][13]. However, in this study, deep learning (GoogLeNet) was used for image classification rather than SVM. In 2014, Google submitted its network, GoogLeNet, to the ImageNet Large Scale Visual Recognition Challenge. Its top-5 error rate (6.7%) is slightly better than that of VGGNet (7.3%). The main attractive feature of GoogLeNet is that it runs very fast thanks to a new concept called the inception module, which reduces the number of parameters to only 5 million, 12 times fewer than AlexNet. It also uses less memory and power [14]. Figure 2 shows the overall flowchart of the proposed algorithm. First, the raw signal from the radar is collected into the frame set. A feature set is then made by extracting the signal characteristics through signal processing. Next, a dataset is created by extracting the characteristics for each action label (no one present, a person in front of the radar, a person moving in front of the radar, and so on) and then converting them into images to be used in machine learning. The created images are stored by label and divided into train and test sets for machine learning and verification, respectively. Both sets are required when constructing models in machine learning. The training set is used to construct a model that gives a result close to the expected actual value, and the test set is used to check whether the constructed model is reasonable.
In other words, if a suitable prediction coefficient for the machine learning algorithm is found using the training set, the performance of the model can be verified with the test set. The results of the experiment suggest that the model can classify subject presence and movement using images created with radar signals.

Preprocessing
In this study, one frame with 512 samplers is received every 50 ms; its appearance is shown in Figure 3(a). The frame set is a collection of frames accumulated in chronological order, and the frame set used in this study, with a total of 48,000 frames (40 min.), is shown in Figure 3(b). These frames can also be used for breathing and heart rate extraction and in various other ways, such as identifying a person's path of travel. Figure 4 shows the feature extraction process. The raw signal is a frame received directly from the radar. The background and similar components are removed from the signal, and the result is stored in chronological order in the frame set. In this experiment, the frame set is used to extract characteristic information from the signals by means of signal processing, such as frequency analysis and digital filtering. Because the subject's position is fixed, not all areas of the frame set are used. The original radar signal contains noise and unstable components, so accurate machine learning results cannot be obtained from it directly. A usable feature set is therefore obtained only after signal processing. As shown in Figure 5, the feature set uses only 256 samplers per frame, half of the total, to remove unnecessary data. Figure 5(b) presents the strength of the overall signal as seen from a bird's-eye view.
The signal strength appears in six forms, which are mapped to the six labels used in the experiment. In Figure 5, the x-axis represents the samplers, the y-axis represents the frames collected in the frame set, and the z-axis represents the amplitude of the signal.
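The step from frame set to feature set described above can be sketched as follows. This is a minimal illustration, not the processing chain used in the experiment: the background removal is assumed here to be a subtraction of the per-sampler mean over time, and the retained half is assumed to be the first 256 samplers. The function name `build_feature_set` and the synthetic input are hypothetical.

```python
import numpy as np

N_SAMPLERS = 512   # samplers per frame (from the text)
N_KEPT = 256       # samplers kept in the feature set (from the text)


def build_feature_set(frames: np.ndarray) -> np.ndarray:
    """Turn raw frames of shape (n_frames, 512) into a feature set (n_frames, 256)."""
    if frames.ndim != 2 or frames.shape[1] != N_SAMPLERS:
        raise ValueError("expected an array of shape (n_frames, 512)")
    # Assumed background removal: subtract the mean of each sampler over time,
    # which suppresses reflections that do not change from frame to frame.
    background = frames.mean(axis=0, keepdims=True)
    frameset = frames - background
    # Assumption: the useful half is the first 256 samplers (closest to the radar).
    return frameset[:, :N_KEPT]


# Synthetic stand-in for radar data: noise on top of a constant offset.
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, N_SAMPLERS)) + 5.0
features = build_feature_set(raw)
print(features.shape)  # (200, 256)
```

Mean subtraction is only one of several clutter-removal options; a running average or a high-pass filter along the frame axis would fit the same interface.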

Machine Learning
The total size of the frame set is 24,576,000 values of type double (48,000 frames × 512 samplers). In this experiment, the subject is always in a fixed position (within one meter) and is located at approximately 70 to 120. A region of interest (ROI) was therefore set, as shown in Figure 6, so that only the data in this specific area is used and the rest is discarded. Because the dataset for creating images from the feature set uses only the ROI, the amount of data to be processed is reduced. In addition, the labels used to predict the condition of the subjects were divided into six, as shown in Table 1. The dataset was created to generate images for machine learning by gathering the data with the same label from the scattered data within the feature set. Because of the characteristics of radar signals, the various behaviors (labels) are mixed, which made it difficult to generate images for machine learning directly. Hence, a block consisting of data with the same label was created for each label, and the sizes of the blocks were compared, as shown in Figure 7. The x-axis represents the six labels, and the y-axis represents the number of data items within each label. In Figure 7, the 48,000-frame dataset is fully partitioned by label, and six blocks are created, one for each label. This is necessary because the data need to be grouped under the same label to generate images for machine learning. Figure 8 shows the shape of the data for each label. The x-axis represents the samplers of the specific area within the dataset with the corresponding label, and the y-axis represents the amplitude. This section describes the process of creating a dataset to generate the images required for machine learning. In particular, the quality of the images depends on signal processing. This is the most important aspect, because machine learning results vary depending on the quality of the images.
The training set was used to implement the machine learning model, and the test set was used to test the model accuracy. If only the front part of the image, which represents the position of the actual subject, is converted into an image, slightly faster performance is possible, but noticeably better results are difficult to achieve. However, when experimenting with more images or with embedded equipment, it is necessary to reduce the size of the images, even if some information is lost. The experiments show that machine learning can determine the presence and movement of a person using images generated from radar signals. The results in the current experimental setting are better than expected at the beginning of the experiment. However, if the situation changes, for example the distance and location of the radar and the person, or the physical conditions of different subjects, the results can be significantly different. Therefore, it is necessary to collect more experimental data and to improve the preprocessing algorithms to obtain cleaner images.
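The figures quoted in this section can be checked with a few lines of arithmetic, and the grouping of same-label data into blocks can be sketched in the same place. The example below is illustrative only: the helper names `label_blocks` and `frames_per_label` and the short label sequence are hypothetical, and the memory estimate assumes 8-byte doubles.

```python
from itertools import groupby

FRAME_PERIOD_MS = 50   # one frame every 50 ms (from the text)
N_FRAMES = 48_000      # frames in the frame set (from the text)
N_SAMPLERS = 512       # samplers per frame (from the text)

total_values = N_FRAMES * N_SAMPLERS                 # number of stored values
duration_min = N_FRAMES * FRAME_PERIOD_MS / 60_000   # recording time in minutes
bytes_as_double = total_values * 8                   # assuming 8-byte doubles


def label_blocks(labels):
    """Return (label, start, stop) for every contiguous run of equal labels."""
    blocks, start = [], 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        blocks.append((label, start, start + length))
        start += length
    return blocks


def frames_per_label(labels):
    """Total number of frames per label, summed over all of its runs."""
    counts = {}
    for label, start, stop in label_blocks(labels):
        counts[label] = counts.get(label, 0) + (stop - start)
    return counts


example = [0, 0, 1, 1, 1, 2, 0, 0]   # hypothetical per-frame labels
print(total_values)                  # 24576000
print(duration_min)                  # 40.0
print(label_blocks(example))         # [(0, 0, 2), (1, 2, 5), (2, 5, 6), (0, 6, 8)]
print(frames_per_label(example))     # {0: 4, 1: 3, 2: 1}
```

Concatenating the runs returned by `label_blocks` for one label gives the single per-label block that the text compares in Figure 7.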

Experiment
In the preprocessing stage, the radar signals were processed to produce a basic feature set. This is the most important process because machine learning results may vary depending on the feature set produced in the preprocessing stage. Next, we created a dataset from the feature set using classification labels and then created images for machine learning. The generated images were divided into a training set for model construction and a test set for verification and were used as input data for machine learning. The experiments conducted in this study showed a high level of classification accuracy. However, datasets preprocessed only by signal classification seemed to be quite sensitive to the quality of the data when used as features for machine learning. The algorithm was implemented in the hardware and software environments summarized in Table 2. The proposed algorithm was used to generate images from the dataset created in the previous step, and the images were stored separately by label. During the machine learning process, the images were loaded and learned, and the results were displayed. The experiment uses a total of 48,000 frames, received every 50 milliseconds over 40 minutes. The total number of images created using the proposed algorithm is 240, and the train set and test set are used in machine learning at a ratio of 8:2. The images for machine learning were generated as spectrograms, which combine waveform and spectrum features to visualize the signal, using the Hamming window method. Each spectrogram was stored by label as an RGB (Red, Green, Blue) image with an actual size of 224×224×3. The created images have the form shown in Figure 12. In these spectrograms, the features produced by the radar signal appear at the front of the image. These features, which correspond to the area in front of the radar, are the ones required.
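The conversion of a signal into a 224×224×3 spectrogram image with a Hamming window can be sketched as follows. This is a minimal NumPy-only illustration under several assumptions that the text does not specify: the input is taken to be a one-dimensional signal from the dataset, and the window length (`n_fft`), hop size (`hop`), nearest-neighbour resizing, and blue-to-red colour mapping are all hypothetical choices, as is the function name `spectrogram_image`.

```python
import numpy as np


def spectrogram_image(signal, n_fft=64, hop=16, size=224):
    """Return a (size, size, 3) uint8 RGB spectrogram of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    if signal.ndim != 1 or signal.size < n_fft:
        raise ValueError("signal must be 1-D and at least n_fft samples long")
    window = np.hamming(n_fft)
    # Short-time Fourier transform: windowed segments, then a real FFT of each.
    starts = range(0, signal.size - n_fft + 1, hop)
    segments = np.stack([signal[s:s + n_fft] * window for s in starts])
    power = np.abs(np.fft.rfft(segments, axis=1)) ** 2
    db = 10.0 * np.log10(power + 1e-12).T          # rows: frequency, columns: time
    # Normalise to [0, 1] so the colour mapping is independent of signal level.
    span = db.max() - db.min()
    norm = (db - db.min()) / span if span > 0 else np.zeros_like(db)
    # Nearest-neighbour resize to size x size (assumed; any resampler would do).
    rows = np.arange(size) * norm.shape[0] // size
    cols = np.arange(size) * norm.shape[1] // size
    resized = norm[np.ix_(rows, cols)]
    # Simple blue (low) to red (high) colour map, chosen only for illustration.
    rgb = np.stack(
        [resized, 1.0 - np.abs(2.0 * resized - 1.0), 1.0 - resized], axis=-1
    )
    return (rgb * 255).round().astype(np.uint8)


# Synthetic test signal: a sinusoid plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(1000)
demo = np.sin(2 * np.pi * 0.1 * t) + 0.1 * rng.normal(size=t.size)
image = spectrogram_image(demo)
print(image.shape, image.dtype)  # (224, 224, 3) uint8
```

The 224×224×3 output matches the input size expected by GoogLeNet, which is why no further cropping or resizing would be needed before training.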
However, the images were not randomly cut or resized for the experiment. If more than one person is present or multiple radars are used in future studies, new targets will appear in the area of the image that is currently blank. Therefore, the original images were used, because the accuracy could vary depending on whether signal processing was performed or not. Table 3 summarizes the experimental environment and accuracy. Experiment-1 measures how well the six labels measured using the radar can be distinguished. In experiment-2, label-4 and label-5 cover subjects in fixed positions, as does label-1. Since only the angle of measurement is different, label-4 and label-5 can be integrated with label-1. In other words, this experiment combines datasets with similar characteristics into a single family of labels. Experiment-3 is the simplest experiment, and although the actual result is 100%, the accuracy is expected to decrease as the number of images increases. The results of the three experiments thus vary depending on the composition of the labels. As mentioned above, label-1, label-4, and label-5 in experiment-2 are datasets for subjects who are measured from different angles but are not moving. Therefore, in experiment-2, the images of these three labels were all assigned to label-1. Figure 9, Figure 10, and Figure 11 show graphs that monitor training and validation during machine learning; the graph at the top represents accuracy, and the graph at the bottom represents loss. The x-axis represents the iteration of machine learning, and the y-axis represents the accuracy (%) and the loss. The black dotted line shows the validation result. Figure 9 shows the result of the first experiment with GoogLeNet using six labels. Figure 10 shows the result of the second experiment, performed in a similar manner to experiment-1.
In the experiment, classification by label was slightly simplified. For label-0 and label-2, no change was made. For label-4 and label-5, the data from radar pulses radiated toward the left and right sides of the subject were integrated into the data from radar pulses radiated toward the front of the subject (label-1). Label-3, for which radar signals were radiated toward the back of the subject and which gave poor results in the previous experiment, was excluded before machine learning was conducted. Figure 11 shows the machine learning result using only label-0, label-1, and label-2. The results show that label-3 is difficult to recognize. Furthermore, label-4 and label-5 had high prediction rates when integrated into label-1, indicating that these three labels may have similar features. Figure 12 shows the results for some validated images for the machine learning model, with high prediction rates for label-0, when no one is present, and for label-1 and label-2, when a person is located in front of the radar. For label-3, the radar pulse was emitted toward the back of the subject, but this was insufficient to judge the presence of a subject. Label-4 and label-5 contain data that differ from label-1 only in the direction of measurement (left and right) under otherwise identical conditions, and high prediction rates can be seen. Information about each label can be found in Table 1. The images used in this study, which represent signal strength within frequencies, cannot by themselves determine whether the subject is human. Therefore, future research will need to process biometric signals such as respiration at the same time to determine whether the subject is human. In addition, experiments using other neural networks, such as the ResNet series and SqueezeNet, were conducted. SqueezeNet has the advantage of learning speed, similar to GoogLeNet, and its results were similar.
However, experiments using the ResNet series (ResNet18, ResNet50 and ResNet101), neural networks with more layers intended to improve accuracy, gave lower results of around 80 percent. This is thought to result from poor learning across the many layers because of the small number of images used in the experiment. In this experiment, a total of 240 images (train set: 192, test set: 48) are used. Although much more data was collected than was used in this experiment, much of it was discarded because of errors and inaccuracies. This shows that it is not easy to select good-quality data when analyzing radar signals obtained by emitting them toward living subjects. To achieve higher accuracy, it is necessary to ensure the integrity of the radar signal and to obtain at least five times as many images as were used in this experiment.
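The label compositions of the three experiments and the 8:2 division of the images can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the mapping tables reflect how the text describes each experiment, the file names are invented, and the assumption of 40 images per label is hypothetical (the text gives only the total of 240).

```python
import random

# Label compositions as described in the text: experiment-2 merges label-4 and
# label-5 into label-1 and drops label-3; experiment-3 keeps labels 0, 1 and 2.
EXPERIMENT_LABEL_MAPS = {
    "experiment-1": {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5},
    "experiment-2": {0: 0, 1: 1, 2: 2, 4: 1, 5: 1},
    "experiment-3": {0: 0, 1: 1, 2: 2},
}


def remap(samples, label_map):
    """Drop samples whose label is not in label_map and rename the others."""
    return [(name, label_map[label]) for name, label in samples if label in label_map]


def split_8_2(samples, seed=0):
    """Shuffle reproducibly and split into train (80%) and test (20%)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10   # integer arithmetic avoids rounding surprises
    return shuffled[:cut], shuffled[cut:]


# Hypothetical image list: 6 labels x 40 images = 240 images.
samples = [(f"label{label}_{i:03d}.png", label) for label in range(6) for i in range(40)]

train, test = split_8_2(samples)
print(len(train), len(test))                  # 192 48

exp2 = remap(samples, EXPERIMENT_LABEL_MAPS["experiment-2"])
print(sorted({label for _, label in exp2}))   # [0, 1, 2]
```

A plain shuffle does not guarantee that every label is equally represented in the 48 test images; with so few images, splitting each label separately at 8:2 may give a more stable accuracy estimate.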

Conclusion
Research and development using machine learning is ongoing worldwide, particularly in the prediction of subject activities. In this study, subject presence and movement were determined via machine learning, using an ultra-wideband impulse radar signal. Using a dataset of images generated from electromagnetic signals, experiments with several neural networks achieved high classification accuracy. An analysis of human-to-electromagnetic interactions shows that UWB impulse radars have a low risk associated with human exposure. Therefore, in this paper, the UWB impulse radar signal was radiated toward the human body, and the returned signal was processed and converted into images. In addition, a new method was proposed to detect subject presence and motion by using the created images as datasets for machine learning. The experiments have shown that subject presence and motion can be detected in this way. Based on the results, it is judged that if the proposed method is further developed, it can be used in a variety of applications, including medical care and crime prevention.

Conflict of Interest
The authors declare no conflict of interest.