Drone Detection Using Dynamic-DBSCAN and Deep Learning in an Indoor Environment
Abstract
Drones have found extensive utility in both public and personal spaces. Consequently, the accurate detection and tracking of drones have emerged as pivotal tasks for ensuring their optimal performance. This paper introduces a novel application for discerning the movements of humans and drones from point clouds obtained using frequency-modulated continuous wave radar. The dynamic density-based spatial clustering of applications with noise (Dynamic-DBSCAN) algorithm was employed to cluster the point cloud into separate groups corresponding to the number of objects within the tracking area. Compared to the original DBSCAN algorithm, this method increased accuracy by about 16.8%, achieving an accuracy of up to 93.99%. Subsequently, three deep learning algorithms—long short-term memory, deep neural network, and residual network (ResNet)—were harnessed to classify each group as either human or drone. According to the results, ResNet achieved the best accuracy rate of 97.72%. Overall, this study underscores the efficacy of the proposed method in accurately and efficiently distinguishing between human and drone entities for effective monitoring and management.
I. Introduction
In recent years, drones have witnessed extensive proliferation across diverse industries and businesses, manifesting their versatility in a variety of applications. In the field of agriculture, drones equipped with multispectral cameras can assess crop health, detect diseases, and optimize irrigation and fertilization [1]. In the warehousing sector, drones can be used to locate and retrieve items from high shelves or hard-to-reach areas and to transport items in large warehouses, thus reducing the need for manual material handling and forklifts [2, 3]. Additionally, in emergencies, they can identify potential hazards, locate survivors, and streamline rescue efforts, bolstering the efficacy of response operations [4–6]. However, most drones rely on global positioning system (GPS) signals for navigation and stabilization; because these signals are usually weak or unavailable in indoor environments, such drones may struggle to maintain stable flight and accurate positioning.
Currently, the detection and management of drones are prominent issues that have garnered the attention of numerous researchers. In this context, some typical methods have already been studied, such as acoustic detection [7, 8], radio frequency (RF) detection [9], radar detection [10], and visual detection [11]. RF detection systems [12, 13] analyze the communication signals between a drone and its remote controller to extract information about the drone’s presence, location, and potential type. However, this implies that autonomous drones (drones that do not communicate with any controller) cannot be detected using this method [14]. Detecting drones using an acoustic method involves capturing the sound generated by their propellers or motors. Acoustic sensors, such as microphones or specialized listening devices, are strategically placed in the environment to detect and analyze these sound signals.
By comparing and processing the acoustic data, it is possible to identify the presence, location, and the potential type of drone in the vicinity based on its unique sound signature. However, this method necessitates an extensive signature database and is sensitive to background environmental noise, particularly in densely populated locations [15, 16]. Visual drone detection [17] relies on cameras or optical sensors to capture and analyze images or video feeds of objects resembling drones. When a potential drone is identified, the system classifies it and generates alerts, making it an effective method for detecting drones that are visible to the naked eye. However, this method is dependent on good visibility and may require a clear line of sight.
In this paper, we focus on developing an application that can detect and track both humans and drones with the aim of addressing security, privacy, and safety issues in indoor environments. Our application detects humans and drones based on radar techniques, specifically frequency-modulated continuous wave (FMCW) radar. Compared to visual detection, the FMCW radar works more effectively in diverse conditions, including fog and low visibility. The FMCW radar also offers high accuracy in measuring distances, enabling precise tracking at long ranges. Moreover, it can operate continuously, providing constant surveillance without gaps, which makes it suitable for security and monitoring applications.
According to the literature [14], research on radar-based detection typically focuses on identifying the appearance of a single drone in outdoor environments [18–20]. In contrast, the current study adopts a novel approach to accurately detect and localize drones and humans in an indoor environment, even when there are more than two drones or humans in the tracking area. Instead of focusing on the configuration to improve FMCW radar performance, such as the development of a radar designed using USRP B210 [21] or the design and development of a K-band FMCW radar prototype for nano-drone detection [22], we focus on handling the data obtained from a commercial FMCW radar. Effectively, this study addresses two primary concerns: the clustering of point clouds and the prediction of an object’s cluster.
DBSCAN was our first choice to solve the clustering problem. This algorithm is designed to find clusters of data points based on their density. In [23], the authors used it to cluster LiDAR point clouds—a crucial part of a wide range of applications, including object detection, object recognition, and localization. This algorithm first estimates the density of a certain data object’s local area using two input parameters: ɛ (epsilon) and minPts. ɛ is the radius of the nearby neighborhood of a selected data point, and minPts is the minimum number of nearby data points in the specified area. This indicates that the algorithm’s outcomes are significantly influenced by the user-defined ɛ parameter. Thus, selecting an appropriate ɛ value is a critical aspect of density-based clustering algorithms, such as DBSCAN. Notably, in our collected dataset, the clusters had significantly different densities. Traditional approaches often rely on manual selection or heuristics, which can be subjective and sensitive to variations in data characteristics. In contrast, we propose an innovative approach—the Dynamic-DBSCAN algorithm. The fundamental concept of this approach is to dynamically compute the ɛ parameter for each point cloud and automatically update it to fulfill the desired requirements. Furthermore, to forecast an object’s cluster, three deep learning models—residual network (ResNet), deep neural network (DNN), and long short-term memory (LSTM)—are used, and the accuracy of their results is compared.
This paper is organized as follows: the theory and methodology related to the algorithms employed in this study are explained in Section II. Section III describes the experimental setup and the results obtained using the proposed application. Finally, the conclusion and prospects for future work related to the topic are presented in Section IV.
II. Theory and Methodology
1. Methodology
Fig. 1 outlines our research methodology, which comprises two primary phases: a model training and testing phase and an actual tracking phase. The model training and testing phase commenced with data acquisition using an FMCW radar, which provides comprehensive point cloud information. Subsequently, we employed the Dynamic-DBSCAN technique to filter out noise and cluster the point cloud data. After this preprocessing, the clusters were categorized into one of three labels: 0 denoting drones, 1 indicating humans, and −1 representing noise. The data were then standardized to create suitable features to serve as the input for the subsequent deep learning model training and testing. We deployed three distinct deep learning models to identify humans and drones within the clusters, and compared their accuracy rates to select the most optimal model for deployment in the actual tracking phase. In the actual tracking phase, similar to the initial step of the previous phase, we continued to extract more radar data. The selected optimal model was then employed for testing. Subsequently, a two-dimensional (2D) tracking viewer was created to visualize the movements of the tracked objects within the tracking area.
2. Clustering Algorithm
2.1 DBSCAN algorithm
The density-based spatial clustering of applications with noise (DBSCAN) algorithm was proposed by Ester et al. [24] in 1996. This algorithm is well known for its ability to discover clusters automatically without requiring the number of clusters to be pre-specified.
The DBSCAN algorithm first selects an arbitrary data point, denoted as Pi, from the dataset. It proceeds to count the neighboring data points located within a specified distance ɛ, collectively referred to as Nɛ(Pi). If the number of data points Pj within the ɛ-neighborhood (neighboring data points within radius ɛ of a given point in the dataset) reaches a predefined threshold, known as minPts, the selected data point Pi is labeled a core point (Pi is a core point if |Nɛ(Pi)| ≥ minPts), indicating that it belongs to a potential cluster. The algorithm then expands the cluster by recursively visiting and adding all core points and their connected neighbors to the cluster. This recursive process continues until no more core points can be added (for instance, Pj is added to the cluster if it is a core point and ||Pi − Pj|| ≤ ɛ, where ||Pi − Pj|| is the distance between Pi and Pj). Notably, data points within the radius ɛ of a core point that do not themselves satisfy the minPts condition, labeled border points, are also included in the cluster (Pk is a border point if ||Pi − Pk|| ≤ ɛ and |Nɛ(Pk)| < minPts). Subsequently, the algorithm selects the next arbitrary data point and repeats the process, identifying additional clusters and labeling noise points (data points that do not belong to any cluster). DBSCAN’s adaptive approach to cluster formation, its ability to handle clusters of arbitrary shapes, and its robustness to noisy data make it a valuable tool for various data analysis and pattern recognition applications.
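As a concrete illustration of how ɛ and minPts determine core, border, and noise labels, the following minimal sketch (assuming scikit-learn is available; the point values are synthetic rather than radar data) clusters a small 3D point cloud:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic 3D point cloud: two dense groups plus one isolated point.
points = np.array([
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [0.0, 0.1, 0.1],   # group 1
    [2.0, 2.0, 1.0], [2.1, 2.0, 1.1], [2.0, 2.1, 1.0],   # group 2
    [5.0, 5.0, 5.0],                                      # isolated point
])

# eps corresponds to ɛ (the neighborhood radius) and min_samples to minPts.
labels = DBSCAN(eps=0.3, min_samples=3).fit_predict(points)
print(labels)   # [0 0 0 1 1 1 -1]; the label -1 marks points treated as noise
```

Changing eps here plays the same role as the ɛ selection discussed below: too small a radius fragments an object into several clusters, while too large a radius merges distinct objects.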
The value of the ɛ plays a crucial role in the process of clustering data using the DBSCAN algorithm. Different ɛ values yield different cluster results, making it essential to choose a suitable ɛ value to ensure the accuracy of the results when employing the DBSCAN algorithm. Notably, for datasets featuring varying point densities within different objects, using a single ɛ value as a global density parameter to separate the objects within a point cloud can hinder the algorithm’s ability to form clusters accurately, as depicted in Fig. 2. Fig. 2 illustrates the execution of the DBSCAN algorithm, considering a scenario in which a drone is flying over a person’s head.
In Fig. 2(a), with ɛ = 0.226, DBSCAN identifies three objects within the frame with IDs 0, 1, and 2, and noise points with a cluster ID of −1. However, in reality, there are only two objects in the frame—the human and the drone. While DBSCAN correctly separated the drone’s points into a cluster with ID = 1, it incorrectly divided the human object into two clusters with IDs 0 and 2. Logically, increasing the value of ɛ could merge clusters IDs 2 and 0 into a single cluster, which would align more closely with reality. However, a visual inspection might reveal that the cluster with ID = 1 (associated with the drone) and the cluster with ID = 0 (associated with the person) are located closer in terms of distance than the distance between the clusters with IDs 0 and 2, both associated with the person.
Therefore, when the value of ɛ is increased to 0.526, the outcome in Fig. 2(b) reveals that although the number of identified objects is correct at 2, with cluster IDs 0 and 1, the allocation of points to the corresponding objects is inaccurate. This demonstrates that merely adjusting ɛ to a single appropriate value is insufficient when applying the DBSCAN algorithm to our dataset.
Recognizing these limitations, we introduce a new clustering algorithm called Dynamic-DBSCAN in the following section. The proposed algorithm aims to adaptively select an appropriate ɛ value and merge clusters automatically to ensure that points are accurately assigned to their respective objects.
2.2 Dynamic-DBSCAN algorithm
The Dynamic-DBSCAN algorithm proposed in this study is described in Algorithm 1. This approach comprises four steps. Step 1 involves estimating an initial ɛ value and using it with the DBSCAN algorithm to obtain clusters. Notably, throughout the testing process, we encountered situations in which this initial estimate failed to yield correct results. Consequently, Steps 2 and 3 review and fine-tune the parameters to attain the best outcomes. Finally, Step 4 is responsible for merging clusters into their respective corresponding objects.
In the first step, we computed the first estimated ɛ. Notably, our ɛ estimation approach involves analyzing distances between pairs of points in both 2D and three-dimensional (3D) coordinate spaces. This ensures that both 2D and 3D insights contribute to the ɛ estimate, thus facilitating adaptability to diverse dataset characteristics and enhancing the accuracy of density-based clustering. First, we computed the distances between pairs of points in the 2D (Oxy) and 3D (Oxyz) coordinate spaces. Following this, we identified the peaks in the distance distributions of each dimension separately. To estimate ɛ, we adopted a peak-based approach, selecting the peak values from both the 2D (ɛ2D) and 3D (ɛ3D) analyses. To obtain the final ɛ estimate, a weighted average formula was utilized to combine the peak values, as follows:

ɛest = (α · ɛ2D + β · ɛ3D) / (α + β),     (1)

where α represents the parameter weight for ɛ2D and β refers to the parameter weight for ɛ3D.
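A minimal sketch of this peak-based estimate is shown below. The paper does not specify the exact peak detector, so taking the most populated bin of a distance histogram is an assumption, as are the default weights and bin count:

```python
import numpy as np
from scipy.spatial.distance import pdist

def estimate_eps(points_xyz, alpha=1.0, beta=2.0, bins=50):
    """Estimate ɛ from the peaks of the 2D and 3D pairwise-distance distributions."""
    def distance_peak(pts):
        d = pdist(pts)                          # all pairwise Euclidean distances
        hist, edges = np.histogram(d, bins=bins)
        k = np.argmax(hist)                     # most populated distance bin
        return 0.5 * (edges[k] + edges[k + 1])  # bin center approximates the peak

    eps_2d = distance_peak(points_xyz[:, :2])   # distances in the Oxy plane
    eps_3d = distance_peak(points_xyz)          # distances in Oxyz space
    return (alpha * eps_2d + beta * eps_3d) / (alpha + beta)   # weighted average, Eq. (1)
```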
In the second step of the algorithm, we examined the first exception condition, which involved assessing whether the number of points within the object groups (NPobject) was greater than the number of points within the noise group (NPnoise). If this condition holds, the algorithm proceeds to the subsequent step; if it does not, the ɛ value is adjusted until the condition is satisfied.
The third step handles a specific situation: when a drone is positioned directly above a human, the point cloud should always contain at least two unique object groupings. In this step, we first calculated the point cloud’s height (Dz). If this measurement exceeded the specified threshold for a person’s height (zh), a drone was assumed to be located just above the human inside the point cloud. The function f_count was then used to count the number of objects (N_object). The basic idea behind the f_count function is to count the number of unique cluster IDs in the list C_temp. Next, N_object was examined to determine whether it was equal to or greater than 2. If this condition was fulfilled, the program moved on to the next step; if it was not, the ɛ parameter was adjusted and the cluster count recalculated. This adjustment, however, has little effect when a drone is located too close to a person, because the points of the two objects readily overlap. Comparing the new ɛ value to the lower threshold (ɛlow) therefore helps avoid an infinite loop.
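A sketch of the Step 3 logic under stated assumptions is given below: the minPts value, the treatment of the noise label in f_count, and the use of Δe as the decrement applied to ɛ are assumptions; zh = 1.95, Δe = 0.03, and ɛlow = 0.05 follow the values reported in Section III.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def f_count(labels):
    """Number of objects = number of unique cluster IDs, ignoring the noise label -1."""
    return len(set(labels) - {-1})

def separate_overhead_drone(points, eps, min_pts=4, z_h=1.95, delta_e=0.03, eps_low=0.05):
    """Shrink eps until a drone hovering above a person is split into its own cluster."""
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(points)
    d_z = points[:, 2].max() - points[:, 2].min()        # height (Dz) of the point cloud
    if d_z > z_h:                                        # taller than a person: drone above human
        while f_count(labels) < 2 and eps - delta_e > eps_low:
            eps -= delta_e                               # tighten the neighborhood radius
            labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(points)
    return labels, eps
```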
Since the density of points within a cluster did not maintain a consistent proportion in every frame, there were situations in which the clustering algorithm divided the point cloud of a single object into multiple smaller clusters, resulting in incorrect outcomes, especially in the case of human point clouds. Step 4 focused on overcoming this problem. We first calculated the coordinates of the central points of each cluster. Subsequently, we determined the distance between the central points (||centroidi − centroidj||) of any two clusters (clusteri, clusterj) and compared this distance to a predefined threshold value (ω). If the calculated distance was less than or equal to the threshold, we merged the two clusters into one.
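The merging in Step 4 can be sketched as follows; ω = 0.47 follows the value reported in Section III, and performing a single pass over the cluster pairs (rather than re-merging until no change occurs) is a simplification:

```python
import numpy as np

def merge_close_clusters(points, labels, omega=0.47):
    """Merge clusters whose centroids lie within a distance of omega of each other."""
    labels = np.asarray(labels).copy()
    ids = sorted(set(labels.tolist()) - {-1})              # ignore the noise cluster
    centroids = {i: points[labels == i].mean(axis=0) for i in ids}
    for a in ids:
        for b in ids:
            if a < b and np.linalg.norm(centroids[a] - centroids[b]) <= omega:
                labels[labels == b] = a                     # fold cluster b into cluster a
    return labels
```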
3. Deep Learning Approaches for Object Detection
3.1 DNN algorithm
A DNN is a type of neural network architecture used to solve complex machine learning tasks, such as image recognition and natural language processing. It comprises interconnected layers of neurons, including the input, hidden, and output layers. These neurons process information sequentially by calculating the weighted sums of the inputs and applying activation functions to capture intricate data patterns. DNNs learn through back-propagation—adjusting connection weights to minimize errors during training using optimization algorithms such as stochastic gradient descent. The depth of a DNN, determined by its hidden layers, allows it to extract hierarchical features, distinguishing it from shallow neural networks.
3.2 ResNet algorithm
ResNet, short for residual network [25], is a groundbreaking DNN algorithm designed to tackle the hurdles of training very deep models. It introduces skip connections, addressing the vanishing gradient issue by learning residual functions instead of the entire transformation. By adding shortcut connections between layers, ResNet ensures effective backpropagation and prevents performance decline with increased depth. Its marked success in computer vision tasks, such as image classification, object detection, and segmentation, has made it a vital tool for deep learning experts, marking a significant milestone in the field’s progress.
3.3 LSTM algorithm
The LSTM algorithm [26]—a specialized type of recurrent neural network (RNN)—addresses the vanishing gradient problem by employing three gates: the input, forget, and output gates. These gates, implemented as sigmoid activation layers, control the flow of information in the network. The input gate (it) manages new information input into the cell state (ct), the forget gate (ft) controls relevant information retention from the previous cell state, and the output gate (ot) regulates information transmission to the next hidden state and cell output. By utilizing these gates, as well as their respective weight matrices and bias terms, LSTMs are able to effectively manage information flow, allowing for improved learning and information retention in sequential data analysis tasks.
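For reference, the standard LSTM update equations corresponding to these gates are reproduced below (this is the textbook formulation rather than notation specific to this paper; W and b denote the weight matrices and bias terms, σ is the sigmoid function, and ⊙ denotes element-wise multiplication):

```latex
\begin{aligned}
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i),\\
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f),\\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o),\\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,\\
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
```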
III. Experiment and Result
1. Experiment Setup
In this study, we used an FMCW radar named Retina-4SN [27] to obtain data for the experiment. This radar comprises four radar chips with cascaded connections, as shown in Fig. 3.
The system comprises 12 transmitters (TX) and 16 receivers (RX) operating in the 71–81 GHz frequency range, forming a virtual array of 192 elements. Moreover, it employs a single patch antenna offering a wide azimuth field-of-view (FOV) of 90°, with a resolution of 3.8° for both the azimuth and elevation angles. The capability to acquire 4D point cloud data (x, y, z, v) via a Wi-Fi interface is a notable feature of the system, which enables it to gather comprehensive spatial (in 3D) and velocity information (4th dimension).
To gather drone data, we employed three drones of different sizes, as indicated in Fig. 4 and Table 1. During the data collection process, the drones were flown at a speed of 5–10 km/hr.
For human data acquisition, we engaged three individuals, each featuring varying heights and body compositions, as indicated in Table 2. During the data collection process, the humans walked at speeds ranging from 4.5 to 6 km/hr.
The experiment was performed in a multi-sports indoor stadium. Due to the limited tracking area of the radar, the dataset was obtained in an area of 5 m (Ox) × 7 m (Oy).
The experiment we designed involved collecting data under three main scenarios in the tracking area: first, only one human in the tracking region; second, only one drone in the tracking area; and third, both human and drone in the tracking area.
For the first two scenarios, in which the tracking area contains only one object (either a person or a drone), the data collection was based on four types of motion trajectories:
As shown in Fig. 5(a), the individual will move from the starting position at coordinates (−2.5, 2) along the red straight line parallel to the Ox-axis to arrive at the position at coordinates (2.5, 2), and then return to the initial position. In subsequent cases, the individual will move in a similar way along the blue lines, as depicted in Fig. 5(a).
As depicted in Fig. 5(b), the individual will start from the initial position at coordinates (1, 0) and move along the red straight line parallel to the Oy-axis to reach the position at (1, 7), and then return to the starting position. In the following cases, the movement will follow the blue lines in Fig. 5(b).
Fig. 5(c) indicates that the person will initiate movement from the starting position at coordinates (−2.5, 0) and move along the red trajectory to reach the point at coordinates (2.5, 7), and then return to the starting position. In subsequent cases, the individual will move along the blue-colored trajectories shown in Fig. 5(c).
Fig. 5(d) shows that the person starts from any position on the left half of the tracking area and moves in a circular path back to the initial position. In the next case, the individual will start from any position on the right half of the tracking area and move in a circular path back to the initial position.
In the third scenario, which involved the simultaneous presence of a human and a drone within the tracking area, data collection entailed comprehending the mutual positions of the two objects and their paths. This investigation involved five relative positions between the human and the drone, with scenarios in which the drone was positioned in front of, behind, to the left of, to the right of, or directly above the human. Both the human and the drone followed movement patterns similar to those in scenarios featuring a solitary object within the tracking area, maintaining a distance of 1–3 m between them. Consequently, a total of 210 scenarios were conducted, as shown in Table 3.
After recording each case, the radar provided two types of data: a video in SRSREC format and a JSON file. Notably, the video comprised several static frames. Fig. 6 depicts a static frame with the human and drone signals represented using a variety of 3D points. The left cluster is a drone, whereas the right cluster is a human.
2. Preprocessing Data
The JSON data contained numerous records of static frames, with each line containing point cloud information, as shown in Table 4.
After the clustering phase, the clusters within each static frame were identified, and the relevant data points were listed. Given the varying number of points within each cluster, we proceeded to standardize the original data outlined in Table 4 into a set of ten distinct features, as presented in Table 5. First, the list of the 3D coordinates of all points ((x1, y1, z1), …, (xn, yn, zn)) was transformed into the center point x (centerx), computed as Eq. (2); the center point y (centery), computed as Eq. (3); the center point z (centerz), computed as Eq. (4); and the extents Dx, Dy, and Dz, computed as Eqs. (5)–(7):

Dx = xmax − xmin, Dy = ymax − ymin, Dz = zmax − zmin,

where xmax and xmin, ymax and ymin, and zmax and zmin are the maximum and minimum values of the Ox, Oy, and Oz coordinates, respectively, of all points in one cluster.
In addition, the power values of all points (p1, p2, …, pn) were transformed into the average power (avgp), which can be computed using Eq. (8), as noted below:

avgp = (p1 + p2 + ⋯ + pn) / n.
Similarly, the average velocity (avgv) was calculated from the velocity values of all points (v1, v2, …, vn) using Eq. (9), as follows:

avgv = (v1 + v2 + ⋯ + vn) / n.
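A minimal sketch of this standardization is given below. Taking the cluster centers as coordinate means is an assumption (the exact forms of Eqs. (2)–(4) are not reproduced here), and only the eight features explicit in the text are computed; the remaining entries of Table 5 are omitted:

```python
import numpy as np

def cluster_features(points, powers, velocities):
    """Standardize one cluster: points is (n, 3); powers and velocities are (n,) arrays."""
    center_x, center_y, center_z = points.mean(axis=0)              # assumed centroid
    d_x, d_y, d_z = points.max(axis=0) - points.min(axis=0)         # Eqs. (5)-(7)
    avg_p = powers.mean()                                           # Eq. (8)
    avg_v = velocities.mean()                                       # Eq. (9)
    return np.array([center_x, center_y, center_z, d_x, d_y, d_z, avg_p, avg_v])
```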
Finally, we labeled the clusters using the following numbers: −1 for noise, 0 for drone, and 1 for human.
3. Result and Discussion
3.1 Clustering algorithm
Fig. 7 depicts an example of a drone flying directly above a person. Fig. 7(a) illustrates the point cloud of the objects within the tracking area, as displayed in the Retina application, and Fig. 7(b) presents the outcome obtained with an ɛ value of 0.421, where DBSCAN produces only one cluster, thus incorrectly identifying a single object within the tracking area.
In Fig. 7(c), it is evident that the Dynamic-DBSCAN algorithm adjusted the ɛ value appropriately based on Step 3 of Algorithm 1. The result shows a modified ɛ value of 0.281, accurately producing two clusters corresponding to two objects. Furthermore, the points are correctly assigned to their respective clusters. In summary, this example demonstrates the effectiveness of the Dynamic-DBSCAN algorithm in dynamically adjusting and selecting an appropriate ɛ value.
Fig. 8 revisits the scenario depicted in Fig. 2, in which a drone flies above a person. In all three images, cluster ID −1 denotes noise, and only cluster IDs greater than −1 are recognized as objects. In Fig. 8(a), which uses a smaller ɛ value of 0.226, the DBSCAN result shows an incorrect count of three objects. Therefore, in Fig. 8(b), the ɛ value is increased to 0.526 in an attempt to reduce the cluster count. DBSCAN then yields the correct count of two clusters, aligning with reality. However, visual inspection readily reveals that the points are misallocated; in particular, the points that formed cluster ID = 0 in Fig. 8(a), which belong to the person, are erroneously grouped with the drone cluster.
Fig. 8(c) demonstrates the effectiveness of cluster merging in Step 4 of the Dynamic-DBSCAN algorithm outlined in Algorithm 1. Dynamic-DBSCAN provides an accurate result, showing the correct count of two objects, while also assigning points to their corresponding clusters accurately. This image serves as proof that Dynamic-DBSCAN can address the limitations of DBSCAN in terms of clustering datasets characterized by varying point densities within different objects.
Considering a total of 74,548 static frames, we used both the DBSCAN and Dynamic-DBSCAN methods to find which of the two produced the best clustering results. Accuracy (Accuracyc) was calculated using the following equation:

Accuracyc = TF / (TF + NF) × 100%,

where TF corresponds to the number of frames that are correctly clustered, and NF corresponds to the number of frames that are incorrectly clustered.
In the Dynamic-DBSCAN approach, the pairs of weight values (α, β) presented in Fig. 9 were used in Step 1 to calculate ɛest (with Δe = 0.03), while ω = 0.47, ɛlow = 0.05, and zh = 1.95 were used in the subsequent steps. For the DBSCAN algorithm, we utilized an ɛ value equal to the ɛest obtained from the initial step of the Dynamic-DBSCAN method.
As illustrated in Fig. 9, the Dynamic-DBSCAN method consistently surpasses DBSCAN for each ɛ value with regard to accuracy. For every (α, β) pair, the DBSCAN algorithm achieves an accuracy below 80%, whereas the Dynamic-DBSCAN method consistently maintains an accuracy exceeding 90%, with the accuracy rate of both algorithms increasing from left to right in Fig. 9. It is evident that the pair of values (α, β) = (1, 2) produces the most favorable results, showing an accuracy of 77.12% for the DBSCAN algorithm and an impressive 93.99% for the Dynamic-DBSCAN algorithm. This indicates that the Dynamic-DBSCAN method improves accuracy by 16.87% when compared to DBSCAN.
An algorithm’s processing time per data sample is also a crucial factor for real-time tracking. Thus, in addition to comparing the accuracy of the two algorithms on the dataset, we measured the processing time in seconds per data sample. Both algorithms were tested on an Intel Core i5-7200U CPU (2.50 GHz) with 8 GB of RAM, yielding the results depicted in Fig. 10. We compared the results obtained for three test scenarios: when the data sample contained only one object (human), with 13,193 frames; when it contained only one object (drone), with 21,368 frames; and when it contained two objects (human and drone combined), with 39,987 frames. Notably, the processing time per data sample in the various scenarios using Dynamic-DBSCAN was consistently higher than that of DBSCAN. However, the difference, measured in milliseconds, was not significant.
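Per-sample processing time can be measured with a simple wall-clock harness such as the sketch below (cluster_fn stands for whichever clustering routine is being profiled; this illustrates the measurement rather than the authors' benchmarking code):

```python
import time

def seconds_per_sample(cluster_fn, frames):
    """Average wall-clock processing time per frame for a clustering routine."""
    start = time.perf_counter()
    for frame in frames:
        cluster_fn(frame)
    return (time.perf_counter() - start) / len(frames)
```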
3.2 DL-based detection models
The DNN model employed in this study is outlined in Fig. 11(a). The architecture contains a batch normalization layer that normalizes the input data in the first layer, as well as drop-out layers to prevent overfitting. Fig. 11(b) depicts the architecture of the ResNet model utilized in this work, which is similar to that of the DNN model. The structure features two hidden layers, each preceded by a batch normalization layer to normalize its input and followed by a rectified linear unit (ReLU) activation layer. Notably, to address the vanishing gradient issue, which is a common difficulty in deep networks, the second block in Fig. 11(b) includes a shortcut connection that adds the block’s input to its output before the result passes through the ReLU layer. Furthermore, to avoid overfitting, a drop-out layer is applied after the second layer.
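A sketch of a ResNet-style block over the standardized cluster features, in the spirit of Fig. 11(b), is shown below (assuming TensorFlow/Keras; the layer width, dropout rate, and three-class softmax output are assumptions, and only the overall structure of batch normalization, two hidden blocks, a shortcut added before the second ReLU, and dropout follows the description above):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_resnet_classifier(num_features=10, num_classes=3, width=64, dropout=0.2):
    """ResNet-style classifier: BN, two Dense blocks, shortcut before the second ReLU."""
    inputs = layers.Input(shape=(num_features,))

    # First hidden block: BN -> Dense -> ReLU.
    x = layers.ReLU()(layers.Dense(width)(layers.BatchNormalization()(inputs)))

    # Second hidden block with a shortcut added before the ReLU.
    y = layers.Dense(width)(layers.BatchNormalization()(x))
    y = layers.ReLU()(layers.Add()([x, y]))            # residual (shortcut) connection
    y = layers.Dropout(dropout)(y)                     # drop-out to guard against overfitting

    outputs = layers.Dense(num_classes, activation="softmax")(y)
    return tf.keras.Model(inputs, outputs)
```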
Meanwhile, the LSTM network model employed in this study, as shown in Fig. 11(c), consists of two layers. Again, a batch normalization layer is placed in front of the two consecutive LSTM layers to preprocess the input data. Furthermore, a drop-out layer is added to mitigate overfitting.
The dataset at our disposal comprised 121,095 samples. For this study, we partitioned the dataset into a training set encompassing 81,134 samples (70% of the total) and a testing set of 39,961 samples (30% of the total). To assess the accuracy (Accuracyd) of the three predictive models, we evaluated their performance using the following equation:

Accuracyd = TS / (TS + NS) × 100%,

where TS refers to the number of positive samples correctly predicted by the classifier, and NS corresponds to the number of negative samples wrongly predicted as positive by the classifier.
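The 70/30 evaluation protocol can be sketched as follows, assuming a feature matrix X, a label vector y, and a classifier exposing a scikit-learn-style fit/predict interface (the random seed and the specific split routine are assumptions):

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def evaluate(model, X, y, seed=0):
    """Train on 70% of the samples and report accuracy on the held-out 30%."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))
```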
We applied the three deep learning models to our dataset over a number of training sessions and epochs, as shown in Table 6. For each model, we calculated the average accuracy and the highest accuracy after 20 training sessions to identify the most effective model for our dataset. Notably, the gap between a model's highest accuracy and its average accuracy may indicate its stability throughout the testing period. Table 6 shows that the LSTM model has a substantial difference of 3.33%, while the DNN and ResNet models exhibit very consistent outcomes in each test; the difference is less than 1% for both, specifically 0.1% for DNN and 0.18% for ResNet. In terms of accuracy, the LSTM model yields the lowest results, achieving a highest accuracy of 77.69% and an average accuracy of 74.26%. The DNN model performed considerably better, achieving a highest accuracy of 96.65% and an average accuracy of 96.55%. Finally, the ResNet model stands out, with a highest accuracy of 97.72% and an average accuracy of 97.54%.
Fig. 12 depicts a static frame comprising two drones and two humans in the tracking region. The image on the right in Fig. 12(a) presents the raw data from the Retina-4SN program on a 3D map. Notably, this program specifically focuses on person recognition and posture detection, as observed in the image on the left in Fig. 12(a). When two drones fly in close proximity, the Retina-4SN program predicts an extra person. In contrast, as demonstrated in Fig. 12(b), the proposed deep learning model is able to recognize and distinguish between humans and drones correctly.
3.3 Object tracking
In the actual tracking phase, information about each static frame obtained from the FMCW radar was provided as input data to the chosen application. After the clustering and standardizing steps were completed, the ResNet model adopted in this study predicted whether a cluster was human, drone, or noise. Finally, in the tracking step, only the human and drone objects appeared on the 2D map. To determine the objects’ locations, we used the centerx and centery features. To verify the accuracy of the tracked object in our tracking view, we computed the discrepancy between the coordinates of the object in each data frame of our tracking view and the coordinate parameters of the object in the corresponding frames of the Retina tracking view. Notably, since the Retina application detects and tracks only humans, we compared only those cases in which a frame contained only one human object. The formula used to compute the error (err) for each frame is as follows:

err = √((xa − centerx)² + (ya − centery)²),

where xa and ya are the coordinates of the human on the x-axis and y-axis in the Retina application, respectively, while centerx and centery are the coordinates of the human on the x-axis and y-axis with regard to the proposed method.
Table 7 provides the errors, in meters, computed across 13,193 frames. It is evident that the margin of error is not significant, showing an average error of 11 cm.
Fig. 12(b) displays the tracking view of a static frame with two drones and two people in the tracking region, along with the locations of each item.
Fig. 13(a) depicts the raw data acquired when a drone flies above a person’s head, recreating the situation depicted in Fig. 7(a). Fig. 13(b) depicts the result obtained using the DBSCAN algorithm, showing that the tracking view incorrectly displays only one person in the tracking area. In contrast, the tracking view of the proposed program displays the correct number of objects: one person and one drone. Moreover, since the person and the drone have the same coordinates on the Oxy plane, the human and drone icons overlap, as illustrated in Fig. 13(c).
IV. Conclusion and Future Work
In this research, indoor tracking methods were investigated using FMCW radar to gather data. To circumvent the limitations of using the DBSCAN method on our dataset, we devised a novel clustering approach called Dynamic-DBSCAN. This approach was implemented along with three deep learning models to detect the presence of drones and people in the dataset. Furthermore, a tracking technique was utilized to follow motion trajectories through a tracking view.
When applied to our dataset, Dynamic-DBSCAN outperformed the original DBSCAN method in terms of clustering outcomes. Moreover, Dynamic-DBSCAN delivered an accuracy of up to 93.99%, which was roughly 16.87% better than DBSCAN’s best result (77.12%). Furthermore, the ResNet model outperformed the other two deep learning models in terms of object cluster prediction, exhibiting a maximum prediction accuracy of 97.72%. It also produced favorable results when distinguishing between people and drones.
Despite the superior performance of Dynamic-DBSCAN compared to DBSCAN, its highest achievable accuracy was 93.99%. Two primary factors contributed to this limitation. First, challenges arise when objects are positioned in close proximity or at the edges. The proposed algorithm functioned optimally when the distance between objects exceeded approximately 28 cm (±2 cm). The second factor pertains to the noise resulting from reflection. During data collection using the FMCW radar, reflections may introduce artifacts into the point cloud, posing a significant challenge for denoising. Additionally, since Dynamic-DBSCAN requires more processing time per frame than DBSCAN, it might not be suitable for scenarios demanding extremely low latency. Instead, it aligns well with problems that necessitate high precision.
In our future work, we plan to improve upon the proposed clustering algorithm by integrating machine learning techniques to address the challenges mentioned above. Additionally, we intend to continue collecting more data to expand our dataset. An expanded dataset will play a pivotal role in both the clustering and detection phases, thus contributing to the overall improvement of our system.
Acknowledgments
This work was supported in part by the Basic Science Research Program (Grant No. 2020R1I1A3057142) and in part by the Priority Research Center Program through the National Research Foundation (Grant No. 2019R1A6A1A03032988).
References
Biography
Ha Thi Tran, https://orcid.org/0009-0006-5826-4529 received her B.S. degree in information and technology from the University of Engineering and Technology, Vietnam National University. She is currently pursuing her M.S. degree in information and communication engineering at Kongju National University.
The-Hien Pham, https://orcid.org/0000-0003-3800-0703 received his B.E. degree in communication and networking from the Ho Chi Minh City University of Transport, Vietnam, in 2012, and his M.E. degree in information and communication engineering from Kongju National University, South Korea, in 2022, where he is currently pursuing a Ph.D. in information and communication engineering. His major research interests include non-destructive testing, communication and networking, wireless communication systems, synthetic aperture radar imaging, machine learning, and optimization methods.
Yun-Seok Mun, https://orcid.org/0009-0003-3524-2384 received his B.E. degree in smart information and technology engineering from Kongju National University, Cheonan, South Korea, in 2024. He is currently pursuing his M.S. degree in information and communication engineering at Kongju National University.
Ic-Pyo Hong, https://orcid.org/0000-0003-1875-5420 received his B.S., M.S., and Ph.D. degrees in electronics engineering from Yonsei University, Seoul, South Korea, in 1994, 1996, and 2000, respectively. From 2000 to 2003, he was a senior engineer in the CDMA Mobile Research Team in the Information and Communication Division of Samsung Electronics Company, Suwon, South Korea. He was a visiting scholar at Texas A&M University, College Station, TX, USA, in 2006, and at Syracuse University, Syracuse, NY, USA, in 2012. Since 2003, he has been with the Department of Smart Information and Technology Engineering, Kongju National University, Cheonan, South Korea, where he is currently a professor. His research interests include numerical techniques in electromagnetics and periodic electromagnetic structures and their applications in wireless communications.