Drone Detection Using Dynamic-DBSCAN and Deep Learning in an Indoor Environment

Article information

J. Electromagn. Eng. Sci. 2024;24(5):510-523
Publication date (electronic): 2024 September 30
doi: https://doi.org/10.26866/jees.2024.5.r.253
Department of Smart Information and Technology Engineering, Kongju National University, Cheonan, Korea
*Corresponding Author: Ic-Pyo Hong (e-mail: iphong@kongju.ac.kr)
Received 2023 November 3; Revised 2023 December 22; Accepted 2024 February 2.

Abstract

Drones have found extensive utility in both public and private spaces. Consequently, the accurate detection and tracking of drones have emerged as pivotal tasks. This paper introduces a novel application for discerning the movements of humans and drones from point clouds obtained using frequency-modulated continuous wave radar. The dynamic density-based spatial clustering of applications with noise (Dynamic-DBSCAN) algorithm was employed to cluster the point clouds into separate groups corresponding to the number of objects within the tracking area. Compared to the original DBSCAN algorithm, this method increased accuracy by about 16.8%, achieving an accuracy of up to 93.99%. Subsequently, a trio of deep learning algorithms—long short-term memory, deep neural network, and residual network (ResNet)—were harnessed to classify each group as either human or drone. According to the results, ResNet achieved the best accuracy rate of 97.72%. Overall, this study underscores the efficacy of the proposed method in accurately and efficiently distinguishing between human and drone entities for effective monitoring and management.

I. Introduction

In recent years, drones have proliferated across diverse industries and businesses, demonstrating their versatility in a variety of applications. In the field of agriculture, drones equipped with multispectral cameras can assess crop health, detect diseases, and optimize irrigation and fertilization [1]. In the warehousing sector, drones can locate and retrieve items from high shelves or hard-to-reach areas and transport items in large warehouses, thus reducing the need for manual material handling and forklifts [2, 3]. Additionally, in emergencies, they can identify potential hazards, locate survivors, and streamline rescue efforts, bolstering the efficacy of response operations [4–6]. However, since most drones rely on global positioning system (GPS) signals for navigation and stabilization, and these signals are usually weak or unavailable in indoor environments, they may struggle to maintain stable flight and accurate positioning indoors.

Currently, the detection and management of drones are prominent issues that have garnered the attention of numerous researchers. In this context, several typical methods have already been studied, such as acoustic detection [7, 8], radio frequency (RF) detection [9], radar detection [10], and visual detection [11]. RF detection systems [12, 13] analyze the communication signals between a drone and its remote controller to extract information about the drone’s presence, location, and potential type. However, this implies that autonomous drones (drones that do not communicate with any controller) cannot be detected using this method [14]. Detecting drones acoustically involves capturing the sound generated by their propellers or motors. Acoustic sensors, such as microphones or specialized listening devices, are strategically placed in the environment to detect and analyze these sound signals.

By comparing and processing the acoustic data, it is possible to identify the presence, location, and the potential type of drone in the vicinity based on its unique sound signature. However, this method necessitates an extensive signature database and is sensitive to background environmental noise, particularly in densely populated locations [15, 16]. Visual drone detection [17] relies on cameras or optical sensors to capture and analyze images or video feeds of objects resembling drones. When a potential drone is identified, the system classifies it and generates alerts, making it an effective method for detecting drones that are visible to the naked eye. However, this method is dependent on good visibility and may require a clear line of sight.

In this paper, we focus on developing an application that can detect and track both humans and drones with the aim of addressing security, privacy, and safety issues in indoor environments. Our application detects humans and drones based on radar techniques, specifically frequency-modulated continuous wave (FMCW) radar. Compared to visual detection, the FMCW radar works more effectively in diverse conditions, including fog and low visibility. The FMCW radar also offers high accuracy in measuring distances, enabling precise tracking at long ranges. Moreover, it can operate continuously, providing constant surveillance without gaps, which makes it suitable for security and monitoring applications.

According to the literature [14], research on radar-based detection typically focuses on identifying a single drone in outdoor environments [18–20]. In contrast, the current study adopts a novel approach to accurately detect and localize drones and humans in an indoor environment, even when two or more drones or humans are present in the tracking area. Instead of focusing on hardware configurations to improve FMCW radar performance, such as a radar designed using the USRP B210 [21] or a K-band FMCW radar prototype for nano-drone detection [22], we focus on handling the data obtained from a commercial FMCW radar. Effectively, this study addresses two primary concerns: the clustering of point clouds and the prediction of each cluster’s object class.

DBSCAN was our first choice for solving the clustering problem. This algorithm is designed to find clusters of data points based on their density. In [23], the authors used it to cluster LiDAR point clouds—a crucial part of a wide range of applications, including object detection, object recognition, and localization. The algorithm first estimates the density of a data point’s local area using two input parameters: ɛ (epsilon) and minPts. ɛ is the radius of the neighborhood around a selected data point, and minPts is the minimum number of data points required within that radius. This means that the algorithm’s outcomes are significantly influenced by the user-defined ɛ parameter, making the selection of an appropriate ɛ value a critical aspect of density-based clustering algorithms such as DBSCAN. Notably, in our collected dataset, the clusters had significantly different densities. Traditional approaches often rely on manual selection or heuristics, which can be subjective and sensitive to variations in data characteristics. In contrast, we propose an innovative approach—the Dynamic-DBSCAN algorithm. The fundamental concept of this approach is to dynamically compute the ɛ parameter for each point cloud frame and automatically update it until the desired requirements are fulfilled. Furthermore, to classify each cluster, three deep learning models—residual network (ResNet), deep neural network (DNN), and long short-term memory (LSTM)—are used, and the accuracy of their results is compared.

This paper is organized as follows: the theory and methodology related to the algorithms employed in this study are explained in Section II. Section III describes the experimental setup and the results obtained using the proposed application. Finally, the conclusion and prospects for future work related to the topic are presented in Section IV.

II. Theory and Methodology

1. Methodology

Fig. 1 outlines our research methodology, which comprises two primary phases: the model training and testing phase and the actual tracking phase. The model training and testing phase commenced with data acquisition using an FMCW radar, which provides comprehensive point cloud information. Subsequently, we employed the Dynamic-DBSCAN technique to filter out noise and cluster the point cloud data. After this preprocessing, the clusters were assigned one of three labels: 0 denoting drones, 1 indicating humans, and −1 representing noise. The data were then standardized to create suitable features that would serve as the input for the subsequent deep learning model training and testing. We deployed three distinct deep learning models to identify humans and drones within the clusters and compared their accuracy rates to select the most suitable model for deployment in the actual tracking phase. In the actual tracking phase, as in the initial step of the previous phase, we continued to acquire radar data, and the selected optimal model was then employed to classify the incoming data. Subsequently, a two-dimensional (2D) tracking viewer was created to visualize the movements of the tracked objects within the tracking area.

Fig. 1

Diagram flow of methodology.

2. Clustering Algorithm

2.1 DBSCAN algorithm

The density-based spatial clustering of applications with noise (DBSCAN) algorithm was proposed by Ester et al. [24] in 1996. This algorithm is well known for its ability to discover clusters automatically without requiring the number of clusters to be pre-specified.

DBSCAN first selects an arbitrary data point, denoted as Pi, from the dataset. It then counts the neighboring data points located within a specified distance ɛ, collectively referred to as Nɛ(Pi). If the number of data points Pj within the ɛ-neighborhood (the set of data points within radius ɛ of a given point) reaches a predefined threshold, known as minPts, the selected data point Pi is labeled a core point (Pi is a core point if |Nɛ(Pi)| ≥ minPts), indicating that it belongs to a potential cluster. The algorithm then expands the cluster by recursively visiting and adding all core points and their connected neighbors to the cluster. This recursive process continues until no more core points can be added (for instance, Pj is added to the cluster if it is a core point and ||Pi − Pj|| ≤ ɛ, where ||Pi − Pj|| is the distance between Pi and Pj). Notably, data points that lie within the radius ɛ of a core point but do not themselves satisfy the minPts condition, labeled border points, are also included in the cluster (Pk is a border point if ||Pi − Pk|| ≤ ɛ and |Nɛ(Pk)| < minPts). Subsequently, the algorithm selects the next arbitrary unvisited data point and repeats the process, identifying additional clusters and labeling noise points (data points that do not belong to any cluster). DBSCAN’s adaptive approach to cluster formation, its ability to handle clusters of arbitrary shapes, and its robustness to noisy data make it a valuable tool for various data analysis and pattern recognition applications.
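To make these parameter roles concrete, the short Python sketch below (our own illustration using scikit-learn and NumPy, not the authors’ code) clusters a small synthetic 3D point cloud; noise points come back with the label −1, and core points are exposed through core_sample_indices_.

# Minimal DBSCAN usage sketch on synthetic data (not the paper's radar dataset).
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense blobs plus a few scattered outliers in 3D.
blob_a = rng.normal(loc=[0.0, 0.0, 1.7], scale=0.10, size=(40, 3))  # denser, "human-like" blob
blob_b = rng.normal(loc=[1.5, 1.5, 2.5], scale=0.05, size=(15, 3))  # smaller, "drone-like" blob
outliers = rng.uniform(low=-3.0, high=3.0, size=(5, 3))
points = np.vstack([blob_a, blob_b, outliers])

db = DBSCAN(eps=0.3, min_samples=5).fit(points)
labels = db.labels_                       # -1 marks noise, 0..k-1 are cluster IDs
core_mask = np.zeros(len(points), dtype=bool)
core_mask[db.core_sample_indices_] = True  # True for core points

print("cluster IDs:", sorted(set(labels)))
print("number of core points:", int(core_mask.sum()))
print("number of noise points:", int((labels == -1).sum()))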

The value of the ɛ plays a crucial role in the process of clustering data using the DBSCAN algorithm. Different ɛ values yield different cluster results, making it essential to choose a suitable ɛ value to ensure the accuracy of the results when employing the DBSCAN algorithm. Notably, for datasets featuring varying point densities within different objects, using a single ɛ value as a global density parameter to separate the objects within a point cloud can hinder the algorithm’s ability to form clusters accurately, as depicted in Fig. 2. Fig. 2 illustrates the execution of the DBSCAN algorithm, considering a scenario in which a drone is flying over a person’s head.

Fig. 2

DBSCAN’s result with different ɛ values: (a) result with ɛ= 0.226 and (b) result with ɛ= 0.526.

In Fig. 2(a), with ɛ = 0.226, DBSCAN identifies three objects within the frame, with IDs 0, 1, and 2, along with noise points assigned a cluster ID of −1. However, in reality, there are only two objects in the frame—the human and the drone. While DBSCAN correctly separated the drone’s points into the cluster with ID = 1, it incorrectly divided the human object into two clusters with IDs 0 and 2. Logically, increasing the value of ɛ could merge clusters 0 and 2 into a single cluster, which would align more closely with reality. However, visual inspection reveals that the cluster with ID = 1 (associated with the drone) and the cluster with ID = 0 (associated with the person) are closer to each other than the clusters with IDs 0 and 2, both of which are associated with the person.

Therefore, when the value of ɛ is increased to 0.526, the outcome in Fig. 2(b) reveals that although the number of identified objects is correct (two clusters, with IDs 0 and 1), the allocation of points to the corresponding objects is inaccurate. This demonstrates that merely adjusting the ɛ value would be insufficient when applying the DBSCAN algorithm to our dataset.

Recognizing these limitations, we introduce a new clustering algorithm called Dynamic-DBSCAN in the following section. The proposed algorithm aims to adaptively select an appropriate ɛ value and merge clusters automatically to ensure that points are accurately assigned to their respective objects.

2.2 Dynamic-DBSCAN algorithm

The Dynamic-DBSCAN algorithm proposed in this study is described in Algorithm 1. This approach comprises four steps. Step 1 involves estimating an initial ɛ value and using it with the DBSCAN algorithm to obtain clusters. Notably, throughout the testing process, we encountered situations in which this initial estimate failed to yield correct results. Consequently, Steps 2 and 3 review and fine-tune the parameters to attain the best outcomes. Finally, Step 4 is responsible for merging clusters that belong to the same object.

Dynamic-DBSCAN algorithm

In the first step, we computed the initial ɛ estimate. Our estimation approach analyzes the distances between pairs of points in both 2D and three-dimensional (3D) coordinate spaces, ensuring that both 2D and 3D insights contribute to the ɛ estimate, thus facilitating adaptability to diverse dataset characteristics and enhancing the accuracy of density-based clustering. First, we computed the distances between pairs of points in the 2D (Oxy) and 3D (Oxyz) coordinate spaces. We then identified the peaks in the distance distributions of each space separately and, following a peak-based approach, selected the peak values from the 2D (ɛ2D) and 3D (ɛ3D) analyses. To obtain the final ɛ estimate, a weighted average was used to combine the peak values, as follows:

(1) $\varepsilon_{est} = \dfrac{\alpha \cdot \varepsilon_{2D} + \beta \cdot \varepsilon_{3D}}{\alpha + \beta}$,

where α represents the parameter weight for ɛ2D and β refers to the parameter weight for ɛ3D.
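As a rough illustration of Step 1, the following Python sketch (our own reading of the description above, not the authors’ implementation) computes pairwise distances in the Oxy plane and in Oxyz, takes the most populated histogram bin in each distance distribution as the peak values ɛ2D and ɛ3D, and combines them with the weights of Eq. (1). The bin width is an assumed parameter, and the default weights correspond to the best-performing pair (α, β) = (1, 2) reported later in Section III.

import numpy as np
from scipy.spatial.distance import pdist

def estimate_epsilon(points, alpha=1.0, beta=2.0, bin_width=0.05):
    """Step 1 sketch: peak-based epsilon estimate from 2D and 3D pairwise distances.

    points: (n, 3) array of x, y, z coordinates for one frame.
    alpha, beta: weights for the 2D and 3D peaks, as in Eq. (1).
    bin_width: histogram bin width (an assumed value, not given in the paper).
    """
    d2 = pdist(points[:, :2])   # pairwise distances in the Oxy plane
    d3 = pdist(points)          # pairwise distances in Oxyz

    def histogram_peak(distances):
        bins = np.arange(0.0, distances.max() + bin_width, bin_width)
        counts, edges = np.histogram(distances, bins=bins)
        peak_bin = np.argmax(counts)
        return 0.5 * (edges[peak_bin] + edges[peak_bin + 1])  # centre of the tallest bin

    eps_2d = histogram_peak(d2)
    eps_3d = histogram_peak(d3)
    return (alpha * eps_2d + beta * eps_3d) / (alpha + beta)   # Eq. (1)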

In the second step of the algorithm, we examined the first exception condition, which assesses whether the number of points within the object groups (NPobject) is greater than the number of points within the noise group (NPnoise). If this condition holds, the algorithm proceeds to the next step. If it does not, the ɛ value is increased by Δe and the data are re-clustered until the condition is satisfied.

The third step verifies an apparent situation: when a drone is positioned directly above a human, the point cloud should contain at least two distinct object groups. In this step, we first calculated the point cloud’s height (Dz). If this measurement exceeded the specified threshold for a person’s height (zh), a drone was assumed to be located directly above the human within the point cloud. The function f_count was then used to count the number of objects (N_object); its basic idea is simply to count the number of unique cluster IDs in the list C_temp. Next, N_object was examined to determine whether it was equal to or greater than 2. If this condition was fulfilled, the algorithm moved on to the next step. If not, the ɛ parameter was decreased and the cluster count was recalculated. This adjustment, however, has little effect when a drone is located too close to a person, because the points of the two objects readily overlap. For this reason, comparing the new ɛ value to a lower threshold (ɛlow) helps avoid an infinite loop.
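The two condition checks can be sketched in Python as follows (our own rendering of Steps 2 and 3 of Algorithm 1, not the authors’ code). The default values of Δe, ɛlow, and zh are those reported later in Section III, minPts is an assumed value, and f_count is realized as a count of unique non-noise cluster IDs.

import numpy as np
from sklearn.cluster import DBSCAN

def f_count(labels):
    """Number of object clusters: unique cluster IDs excluding the noise label -1."""
    return len(set(labels) - {-1})

def refine_epsilon(points, eps_est, min_pts=5, delta_e=0.03, eps_low=0.05, z_h=1.95):
    """Sketch of Steps 2-3: grow eps until object points outnumber noise points,
    then, if the cloud is taller than a person (z_h), shrink a temporary eps until
    at least two objects appear (or eps falls below eps_low)."""
    labels = DBSCAN(eps=eps_est, min_samples=min_pts).fit_predict(points)

    # Step 2: object points must outnumber noise points.
    while (labels != -1).sum() < (labels == -1).sum():
        eps_est += delta_e
        labels = DBSCAN(eps=eps_est, min_samples=min_pts).fit_predict(points)

    # Step 3: a point cloud taller than z_h should contain at least two objects.
    d_z = points[:, 2].max() - points[:, 2].min()
    if d_z > z_h:
        eps_tmp, labels_tmp = eps_est, labels
        while f_count(labels_tmp) < 2:
            eps_tmp -= delta_e
            if eps_tmp < eps_low:            # lower bound avoids an infinite loop
                break
            labels_tmp = DBSCAN(eps=eps_tmp, min_samples=min_pts).fit_predict(points)
        if f_count(labels_tmp) >= 2:
            eps_est, labels = eps_tmp, labels_tmp

    return eps_est, labels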

Since the density of points within a cluster did not maintain a consistent proportion in every frame, there were situations in which the clustering algorithm divided the point cloud of a single object into multiple smaller clusters, resulting in incorrect outcomes, especially for human point clouds. Step 4 addresses this problem. We first calculated the coordinates of the central point of each cluster. Subsequently, we determined the distance between the central points (||centroidi − centroidj||) of any two clusters (clusteri, clusterj) and compared this distance to a predefined threshold value (ω). If the calculated distance was less than or equal to the threshold, the two clusters were merged into one.
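A minimal sketch of the Step 4 merge is given below, assuming per-point labels from the previous step and the threshold ω reported later in Section III; the repeat-until-stable loop is our own reading of how the pairwise merge would be applied.

import numpy as np

def merge_close_clusters(points, labels, omega=0.47):
    """Step 4 sketch: merge any two clusters whose centroids are within omega metres.

    points: (n, 3) array; labels: per-point cluster IDs with -1 for noise.
    Returns a copy of labels in which merged clusters share one ID.
    """
    labels = labels.copy()
    merged = True
    while merged:                       # repeat until no pair of clusters is close enough
        merged = False
        ids = sorted(set(labels) - {-1})
        centroids = {i: points[labels == i].mean(axis=0) for i in ids}
        for a_idx in range(len(ids)):
            for b_idx in range(a_idx + 1, len(ids)):
                a, b = ids[a_idx], ids[b_idx]
                if np.linalg.norm(centroids[a] - centroids[b]) <= omega:
                    labels[labels == b] = a   # absorb cluster b into cluster a
                    merged = True
                    break
            if merged:
                break
    return labels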

3. Deep Learning Approaches for Object Detection

3.1 DNN algorithm

A DNN is a type of neural network architecture used to solve complex machine learning tasks, such as image recognition and natural language processing. It comprises interconnected layers of neurons, including the input, hidden, and output layers. These neurons process information sequentially by calculating the weighted sums of the inputs and applying activation functions to capture intricate data patterns. DNNs learn through back-propagation—adjusting connection weights to minimize errors during training using optimization algorithms such as stochastic gradient descent. The depth of a DNN, determined by its hidden layers, allows it to extract hierarchical features, distinguishing it from shallow neural networks.

3.2 ResNet algorithm

ResNet, short for residual network [25], is a groundbreaking DNN algorithm designed to tackle the hurdles of training very deep models. It introduces skip connections, addressing the vanishing gradient issue by learning residual functions instead of the entire transformation. By adding shortcut connections between layers, ResNet ensures effective backpropagation and prevents performance decline with increased depth. Its marked success in computer vision tasks, such as image classification, object detection, and segmentation, has made it a vital tool for deep learning experts, marking a significant milestone in the field’s progress.

3.3 LSTM algorithm

The LSTM algorithm [26]—a specialized type of recurrent neural network (RNN)—addresses the vanishing gradient problem by employing three gates: the input, forget, and output gates. These gates, implemented as sigmoid activation layers, control the flow of information in the network. The input gate (i_t) manages the entry of new information into the cell state (c_t), the forget gate (f_t) controls the retention of relevant information from the previous cell state, and the output gate (o_t) regulates the transmission of information to the next hidden state and the cell output. By utilizing these gates, along with their respective weight matrices and bias terms, LSTMs effectively manage information flow, allowing for improved learning and information retention in sequential data analysis tasks.
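For reference, the standard LSTM update [26] that these gates implement can be written as follows, where σ denotes the sigmoid function, ⊙ denotes element-wise multiplication, and W, U, and b are the weight matrices and bias terms mentioned above:

$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$,
$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$,
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$,
$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$,
$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$,
$h_t = o_t \odot \tanh(c_t)$.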

III. Experiment and Result

1. Experiment Setup

In this study, we used an FMCW radar named Retina-4SN [27] to obtain data for the experiment. This radar comprises four radar chips with cascaded connections, as shown in Fig. 3.

Fig. 3

Structure of the Retina-4SN radar.

The system comprises 12 transmitters (TX) and 16 receivers (RX) operating in the 71–81 GHz frequency range, forming a virtual array of 192 elements. Moreover, it employs a single patch antenna offering a wide azimuth field-of-view (FOV) of 90°, with a resolution of 3.8° for both the azimuth and elevation angles. A notable feature of the system is its ability to deliver 4D point cloud data (x, y, z, v) via a Wi-Fi interface, providing comprehensive spatial (3D) and velocity (fourth dimension) information.

To gather drone data, we employed three drones of different sizes, as indicated in Fig. 4 and Table 1. During the data collection process, the drones were flown at a speed of 5–10 km/hr.

Fig. 4

Drones.

Size of drones

For human data acquisition, we engaged three individuals of varying heights and body compositions, as indicated in Table 2. During the data collection process, the humans walked at speeds ranging from 4.5 to 6 km/hr.

Size of humans

The experiment was performed in a multi-sports indoor stadium. Due to the limited tracking area of the radar, the dataset was obtained in an area of 5 m (Ox) × 7 m (Oy).

The experiment we designed involved collecting data under three main scenarios in the tracking area: first, only one human in the tracking area; second, only one drone in the tracking area; and third, both a human and a drone in the tracking area.

For the first two scenarios, in which the tracking area contained only one object (either a person or a drone), the data collection scenarios were constructed based on four types of motion trajectories:

  • As shown in Fig. 5(a), the individual will move from the starting position at coordinates (−2.5, 2) along the red straight line parallel to the Ox-axis to arrive at the position at coordinates (2.5, 2), and then return to the initial position. In subsequent cases, the individual will move in a similar way along the blue lines, as depicted in Fig. 5(a).

  • As depicted in Fig. 5(b), the individual will start from the initial position at coordinates (1, 0) and move along the red straight line parallel to the Oy-axis to reach the position at (1, 7), and then return to the starting position. In the following cases, the movement will follow the blue lines in Fig. 5(b).

  • Fig. 5(c) indicates that the person will initiate movement from the starting position at coordinates (−2.5, 0) and move along the red trajectory to reach the point at coordinates (2.5, 7), and then return to the starting position. In subsequent cases, the individual will move along the blue-colored trajectories shown in Fig. 5(c).

  • Fig. 5(d) shows that the person starts from any position on the left half of the tracking area and moves in a circular path back to the initial position. In the next case, the individual will start from any position on the right half of the tracking area and move in a circular path back to the initial position.

Fig. 5

Trajectory scenarios of the target: (a) moving horizontally, (b) moving vertically, (c) moving diagonally, and (d) moving in a circle.

In the third scenario, which involved the simultaneous presence of a human and a drone within the tracking area, data collection accounted for the relative positions of the two objects and their paths. This investigation involved five relative positions between the human and the drone, with scenarios in which the drone was positioned in front of, behind, to the left of, to the right of, or directly above the human. Both the human and the drone followed movement patterns similar to those in the scenarios featuring a solitary object within the tracking area, maintaining a distance of 1–3 m between them. Consequently, a total of 210 scenarios were conducted, as shown in Table 3.

Number of scenarios collected as data for the three situations

After recording each case, the radar provided two types of data: a video in SRSREC format and a JSON file. Notably, the video comprised a series of static frames. Fig. 6 depicts a static frame in which the human and drone signals are represented as groups of 3D points: the left cluster is the drone, whereas the right cluster is the human.

Fig. 6

A static frame from Retina-4SN.

2. Preprocessing Data

The JSON data contained numerous records of static frames, with each line containing point cloud information, as shown in Table 4.

Original data obtained from the radar

After the clustering phase, the clusters within each static frame were identified, and the relevant data points were listed. Given the varying number of points within each cluster, we standardized the original data outlined in Table 4 into a set of ten distinct features, as presented in Table 5. First, the list of 3D coordinates of all points ((x1, y1, z1), …, (xn, yn, zn)) was transformed into the center point x (centerx), computed as in Eq. (2); the center point y (centery), computed as in Eq. (3); the center point z (centerz), computed as in Eq. (4); Dx, computed as in Eq. (5); Dy, computed as in Eq. (6); and Dz, computed as in Eq. (7). These equations are noted below:

Input features for the deep learning model

(2) $center_x = \frac{1}{n}\sum_{i=1}^{n} x_i$,
(3) $center_y = \frac{1}{n}\sum_{i=1}^{n} y_i$,
(4) $center_z = \frac{1}{n}\sum_{i=1}^{n} z_i$,
(5) $D_x = x_{max} - x_{min}$,

where xmax and xmin are the maximum and minimum values of the Ox coordinates of all points in one cluster.

(6) $D_y = y_{max} - y_{min}$,

where ymax and ymin are the maximum and minimum values of the Oy coordinates of all points in one cluster.

(7) $D_z = z_{max} - z_{min}$,

where zmax and zmin are the maximum and minimum values of the Oz coordinates of all points in one cluster.

In addition, the list of power values of all points (p1, p2, …, pn) was transformed into the average power (avgp), computed using Eq. (8), as noted below:

(8) $avg_p = \frac{1}{n}\sum_{i=1}^{n} p_i$.

Similarly, the average velocity was calculated from the list of velocity values of all points (v1, v2, …, vn) using Eq. (9), as follows:

(9) $avg_v = \frac{1}{n}\sum_{i=1}^{n} v_i$.

Finally, we labeled the clusters using the following numbers: −1 for noise, 0 for drone, and 1 for human.
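For clarity, the standardization described by Eqs. (2)–(9) can be summarized in the short Python sketch below (our own rendering, not the authors’ code); the input is one cluster’s points together with their per-point power and velocity values.

import numpy as np

def cluster_features(xyz, power, velocity):
    """Build the per-cluster feature vector of Table 5 from one cluster.

    xyz: (n, 3) array of point coordinates; power, velocity: length-n arrays.
    """
    center = xyz.mean(axis=0)                    # Eqs. (2)-(4): center_x, center_y, center_z
    extent = xyz.max(axis=0) - xyz.min(axis=0)   # Eqs. (5)-(7): Dx, Dy, Dz
    return {
        "num_points": len(xyz),
        "center_x": center[0], "center_y": center[1], "center_z": center[2],
        "Dx": extent[0], "Dy": extent[1], "Dz": extent[2],
        "avg_power": float(np.mean(power)),      # Eq. (8)
        "avg_velocity": float(np.mean(velocity)),# Eq. (9)
    }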

3. Result and Discussion

3.1 Clustering algorithm

Fig. 7 depicts an example of a drone flying directly above a person. Fig. 7(a) illustrates the point cloud of the objects within the tracking area, as displayed in the Retina application, and Fig. 7(b) presents the outcome when using an ɛ value of 0.421, in which DBSCAN yields only one cluster, thereby incorrectly indicating the presence of a single object within the tracking area.

Fig. 7

Comparison of clustering results between DBSCAN and the proposed method in choosing a suitable ɛ value: (a) point cloud data from radar, (b) clustering result obtained using DBSCAN on (a) with ɛ = 0.421, and (c) clustering result obtained using the proposed method on (a) with ɛ = 0.281.

In Fig. 7(c), it is evident that the Dynamic-DBSCAN algorithm adjusted the ɛ value appropriately based on Step 3 of Algorithm 1. The result shows a modified ɛ value of 0.281, accurately producing two clusters corresponding to two objects. Furthermore, the points are correctly assigned to their respective clusters. In summary, this example demonstrates the effectiveness of the Dynamic-DBSCAN algorithm in dynamically adjusting and selecting an appropriate ɛ value.

Fig. 8 revisits the scenario depicted in Fig. 2, in which a drone flies above a person. In all three images, cluster ID −1 denotes noise, and only cluster IDs greater than −1 are recognized as objects. In Fig. 8(a), which uses the smaller ɛ value of 0.226, DBSCAN reports an incorrect count of three objects. Therefore, in Fig. 8(b), the ɛ value is increased to 0.526 in an attempt to reduce the cluster count. DBSCAN then yields the correct count of two clusters, aligning with reality. However, visual inspection readily reveals that the points are misallocated: the points of cluster ID = 0 in Fig. 8(a), which belong to the person, are erroneously grouped with the drone cluster.

Fig. 8

Comparison of clustering results between DBSCAN and Dynamic-DBSCAN: (a) clustering results obtained using DBSCAN with ɛ = 0.226, (b) clustering results obtained using DBSCAN with ɛ = 0.526, and (c) clustering results obtained using Dynamic-DBSCAN.

Fig. 8(c) demonstrates the effectiveness of cluster merging in Step 4 of the Dynamic-DBSCAN algorithm outlined in Algorithm 1. Dynamic-DBSCAN provides an accurate result, showing the correct count of two objects, while also assigning points to their corresponding clusters accurately. This image serves as proof that Dynamic-DBSCAN can address the limitations of DBSCAN in terms of clustering datasets characterized by varying point densities within different objects.

Considering a total of 74,548 static frames, we applied both the DBSCAN and Dynamic-DBSCAN methods to determine which of the two produced better clustering results. Accuracy (Accuracyc) was calculated using the following equation:

(10) $Accuracy_c = \dfrac{TF}{TF + NF}$,

where TF corresponds to the number of frames that are correctly clustered, and NF corresponds to the number of frames that are incorrectly clustered.

In the Dynamic-DBSCAN approach, the pairs of weight values (α, β) presented in Fig. 9 were used in Step 1 to calculate ɛest, while Δe = 0.03, ω = 0.47, ɛlow = 0.05, and zh = 1.95 were used in the subsequent steps. For the DBSCAN algorithm, we used an ɛ value equal to the ɛest obtained from the first step of the Dynamic-DBSCAN method.

Fig. 9

Comparison of accuracy between DBSCAN and Dynamic-DBSCAN.

As illustrated in Fig. 9, the Dynamic-DBSCAN method consistently surpasses DBSCAN for each ɛ value with regard to accuracy. For every (α, β) pair, the DBSCAN algorithm achieves an accuracy below 80%, whereas the Dynamic-DBSCAN method consistently maintains an accuracy exceeding 90%, with the accuracy rate of both algorithms increasing from left to right in Fig. 9. It is evident that the pair of values (α, β) = (1, 2) produces the most favorable results, showing an accuracy of 77.12% for the DBSCAN algorithm and an impressive 93.99% for the Dynamic-DBSCAN algorithm. This indicates that the Dynamic-DBSCAN method improves accuracy by 16.87% when compared to DBSCAN.

An algorithm’s processing time per data sample is also a crucial factor for real-time tracking. Thus, in addition to comparing the accuracy of the two algorithms on the dataset, we measured the processing time in seconds per data sample. Both algorithms were tested on an Intel Core i5-7200U CPU (2.50 GHz) with 8.0 GB of RAM, yielding the results depicted in Fig. 10. We compared the results obtained for three test scenarios: when the data sample contained only one object (human), with 13,193 frames; when it contained only one object (drone), with 21,368 frames; and when it contained two objects (combined), a human and a drone, with 39,987 frames. Notably, the processing time per data sample using Dynamic-DBSCAN was consistently higher than that of DBSCAN in all scenarios. However, the difference, on the order of milliseconds, was not significant.

Fig. 10

Comparison of processing time between DBSCAN and Dynamic-DBSCAN.

3.2 DL-based detection models

The DNN model employed in this study is outlined in Fig. 11(a). The architecture contains a batch normalization layer that normalizes the input data before the first layer, as well as drop-out layers to prevent overfitting. Fig. 11(b) depicts the architecture of the ResNet model utilized in this work, which is similar to that of the DNN model. The structure features two hidden layers, each preceded by a batch normalization layer that normalizes its input and followed by a rectified linear unit (ReLU) activation layer. Notably, to address the vanishing gradient issue, a common difficulty in deep networks, the second block in Fig. 11(b) includes a shortcut connection that adds the block’s input to its output before the result passes through the ReLU layer. Furthermore, to avoid overfitting, a drop-out layer is applied after the second layer.
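Since the paper does not report layer widths, drop-out rates, or training hyperparameters, the Keras sketch below is only a rough, assumption-laden rendering of Fig. 11(a) and 11(b); the feature dimension follows Section III-2, and the three output classes correspond to the drone, human, and noise labels of Section II-1.

from tensorflow.keras import layers, models

NUM_FEATURES = 10   # standardized cluster features of Section III-2 (Table 5)
NUM_CLASSES = 3     # drone, human, noise

def build_dnn(hidden=64, dropout=0.3):
    """Rough sketch of the DNN in Fig. 11(a): input BN, dense layers, drop-out."""
    return models.Sequential([
        layers.Input(shape=(NUM_FEATURES,)),
        layers.BatchNormalization(),
        layers.Dense(hidden, activation="relu"),
        layers.Dropout(dropout),
        layers.Dense(hidden, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def build_resnet(hidden=64, dropout=0.3):
    """Rough sketch of the ResNet in Fig. 11(b): two BN+Dense blocks, a shortcut
    added before the second ReLU, and drop-out after the second block."""
    inputs = layers.Input(shape=(NUM_FEATURES,))
    x = layers.BatchNormalization()(inputs)
    x = layers.Dense(hidden, activation="relu")(x)
    shortcut = x
    y = layers.BatchNormalization()(x)
    y = layers.Dense(hidden)(y)
    y = layers.Add()([y, shortcut])        # skip connection applied before the ReLU
    y = layers.Activation("relu")(y)
    y = layers.Dropout(dropout)(y)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(y)
    return models.Model(inputs, outputs)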

Fig. 11

(a) DNN structure, (b) ResNet structure, and (c) LSTM network structure.

Meanwhile, the LSTM network model employed in this study, shown in Fig. 11(c), consists of two LSTM layers. Again, a batch normalization layer is placed in front of the two consecutive LSTM layers to preprocess the input data, and a drop-out layer is added to mitigate overfitting.
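A similarly rough Keras sketch of Fig. 11(c) is given below; since the paper does not state how the per-cluster features are arranged into sequences, the sequence length TIMESTEPS, the layer width, and the drop-out rate are all our own assumptions.

from tensorflow.keras import layers, models

TIMESTEPS = 1       # assumed sequence length (frames per sample); not specified in the paper
NUM_FEATURES = 10   # standardized cluster features of Section III-2
NUM_CLASSES = 3     # drone, human, noise

def build_lstm(units=64, dropout=0.3):
    """Rough sketch of Fig. 11(c): input BN, two stacked LSTM layers, drop-out."""
    return models.Sequential([
        layers.Input(shape=(TIMESTEPS, NUM_FEATURES)),
        layers.BatchNormalization(),
        layers.LSTM(units, return_sequences=True),   # first LSTM layer feeds the second
        layers.LSTM(units),
        layers.Dropout(dropout),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])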

The dataset at our disposal comprised 121,095 samples. For this study, we partitioned the dataset into a training set encompassing 81,134 samples (70% of the total) and a testing set of 39,961 samples (30% of the total). To assess the accuracy (Accuracyd) of our predictive models, we employed three distinct models to evaluate their performance using the following equation:

(11) $Accuracy_d = \dfrac{TS}{TS + NS}$,

where TS refers to the number of positive samples correctly predicted by the classifier, and NS corresponds to the number of negative samples wrongly predicted as positive by the classifier.
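A minimal sketch of this evaluation protocol is shown below; the file names, the label remapping (0 = drone, 1 = human, 2 = noise), the batch size, and the optimizer are hypothetical choices of ours, and any of the three sketched models can replace the small DNN used here for brevity.

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models

# Hypothetical arrays produced by the preprocessing stage: one row of standardized
# features per cluster, and integer labels remapped to 0 = drone, 1 = human, 2 = noise.
X = np.load("features.npy")
y = np.load("labels.npy")

# 70/30 split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Small stand-in classifier; epochs set to 100 as listed for the DNN in Table 6.
model = models.Sequential([
    layers.Input(shape=(X.shape[1],)),
    layers.BatchNormalization(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=100, batch_size=128, verbose=0)

# Test accuracy: the fraction of test samples whose class is predicted correctly.
y_pred = model.predict(X_test).argmax(axis=1)
print("test accuracy:", float((y_pred == y_test).mean()))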

We applied the three deep learning models to our dataset by conducting a number of training sessions and epochs, as shown in Table 6. For each model, we calculated the average and highest accuracy over 20 training sessions to identify the most effective model for our dataset. Notably, the gap between a model’s highest accuracy and its average accuracy indicates its stability throughout the testing period. Table 6 shows that the LSTM model has a substantial gap of 3.33%, while the DNN and ResNet models exhibit very consistent outcomes in each test, with gaps of less than 1%: 0.1% for DNN and 0.18% for ResNet. In terms of accuracy, the LSTM model yields the lowest results, with a highest accuracy of 77.69% and an average accuracy of 74.36%. The DNN model performed considerably better, achieving a highest accuracy of 96.65% and an average accuracy of 96.55%. Finally, the ResNet model stands out, with a highest accuracy of 97.72% and an average accuracy of 97.54%.

Comparison of accuracy between the LSTM, DNN, and ResNet models

Fig. 12 depicts a static frame comprising two drones and two humans in the tracking region. The image on the right in Fig. 12(a) presents the raw data from the Retina-4SN program on a 3D map. Notably, this program focuses specifically on person recognition and posture detection, as observed in the image on the left in Fig. 12(a); when two drones fly in close proximity, it predicts an extra person. In contrast, as demonstrated in Fig. 12(b), the proposed deep learning model correctly recognizes and distinguishes between humans and drones.

Fig. 12

(a) Tracking view results and raw point cloud data in a static frame from Retina-4SN’s application. (b) Our tracking view results for the same frame as (a).

3.3 Object tracking

In the actual tracking phase, information about each static frame obtained from the FMCW radar was provided as input to the proposed application. After the clustering and standardization steps, the adopted ResNet model predicted whether each cluster was a human, a drone, or noise. Finally, in the tracking step, only the human and drone objects were displayed on the 2D map, with the centerx and centery features used to determine the objects’ locations. To verify the accuracy of the tracked objects in our tracking view, we computed the discrepancy between the coordinates of an object in each data frame of our tracking view and the coordinates of the same object in the corresponding frame of the Retina tracking view. Notably, since the Retina application detects and tracks only humans, we compared only those cases in which a frame contained a single human object. The error (err) for each frame was computed as follows:

(12) $err = \sqrt{(x_a - center_x)^2 + (y_a - center_y)^2}$,

where xa and ya are the coordinates of the human on the x-axis and y-axis in the Retina application, respectively, while centerx and centery are the coordinates of the human on the x-axis and y-axis with regard to the proposed method.

Table 7 provides the errors, in meters, calculated across 13,193 frames. It is evident that the margin of error is not significant, with an average error of 11 cm.

Comparison of the error value for the coordination of the human object between the proposed application and the Retina application

Fig. 12(b) displays the tracking view of a static frame with two drones and two people in the tracking region, along with the locations of each item.

Fig. 13(a) depicts the raw data acquired when a drone flies above a person’s head, reenacting the situation depicted in Fig. 7(a). Fig. 13(b) shows the result obtained using the DBSCAN algorithm, in which the tracking view incorrectly displays only one person in the tracking area. In contrast, the tracking view of the proposed program displays the correct number of objects—one person and one drone. Moreover, since the person and the drone have the same coordinates in the Oxy plane, the human and drone icons overlap, as illustrated in Fig. 13(c).

Fig. 13

(a) Raw point cloud in a static frame from Retina-4SN’s application; (b) tracking view results obtained using DBSCAN for the cluster in frame (a), and (c) tracking view results by using Dynamic-DBSCAN for the cluster in frame (a).

IV. Conclusion and Future Work

In this research, indoor tracking methods were investigated using FMCW radar to gather data. To circumvent the limitations of applying the DBSCAN method to our dataset, we devised a novel clustering approach called Dynamic-DBSCAN. This approach was implemented along with three deep learning models to detect the presence of drones and people in the dataset. Furthermore, a tracking technique was utilized to follow the objects’ motion trajectories in a tracking view.

When applied to our dataset, Dynamic-DBSCAN outperformed the original DBSCAN method in terms of clustering outcomes. Moreover, Dynamic-DBSCAN delivered an accuracy of up to 93.99%, which was roughly 16.87% better than DBSCAN’s best result (77.12%). Furthermore, the ResNet model outperformed the other two deep learning models in terms of object cluster prediction, exhibiting a maximum prediction accuracy of 97.72%. It also produced favorable results when distinguishing between people and drones.

Despite its superior performance compared to DBSCAN, the highest accuracy achieved by Dynamic-DBSCAN was 93.99%. Two primary factors contributed to this limitation. First, challenges arise when objects are positioned in close proximity to each other or at the edges of the tracking area. The proposed algorithm functioned optimally when the distance between objects exceeded approximately 28 cm (±2 cm). The second factor pertains to noise resulting from reflections. During data collection using the FMCW radar, reflections may introduce artifacts into the point cloud, posing a significant challenge for denoising. Additionally, since Dynamic-DBSCAN requires more processing time per frame than DBSCAN, it might not be suitable for scenarios demanding extremely low latency; instead, it aligns well with problems that require high precision.

In our future work, we plan to improve upon the proposed clustering algorithm by integrating machine learning techniques to address the three challenges mentioned above. Additionally, we intend to persistently collect more data to expand our dataset. An expanded dataset will play a pivotal role in both the clustering and detection phases, thus contributing to the overall improvement of our system.

Acknowledgments

This work was supported in part by the Basic Science Research Program (Grant No. 2020R1I1A3057142) and in part by the Priority Research Center Program through the National Research Foundation (Grant No. 2019R1A6A1A03032988).

References

1. Ahirwar S., Swarnkar R., Bhukya S., Namwade G.. Application of drone in agriculture. International Journal of Current Microbiology and Applied Sciences 8(1):2500–2505. 2019; https://doi.org/10.20546/ijcmas.2019.801.264.
2. Companik E., Gravier M. J., Farris M. T. II. Feasibility of warehouse drone adoption and implementation. Journal of Transportation Management 28(2):31–48. 2018; https://doi.org/10.22237/jotm/1541030640.
3. Wawrla L., Maghazei O., Netland T.. Applications of drones in warehouse operations 2019. [Online]. Available: https://pom.ethz.ch/news-and-events/news/2019/08/read-a-new-white-paper-from-the-chair-of-pom-applications-of-drones-in-warehouse-operations.html.
4. Daud S. M. S. M., Yusof M. Y. P. M., Heo C. C., Khoo L. S., Singh M. K. C., Mahmood M. S., et al. Applications of drone in disaster management: a scoping review. Science & Justice 62(1):30–42. 2022; https://doi.org/10.1016/j.scijus.2021.11.002.
5. Sanjana P., Prathilothamai M.. Drone design for first aid kit delivery in emergency situation. In : Proceedings of 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS). Coimbatore, India; 2020; p. 215–220. https://doi.org/10.1109/ICACCS48705.2020.9074487.
6. Sanz-Martos S., Lopez-Franco M. D., Alvarez-Garcia C., Granero-Moya N., Lopez-Hens J. M., Camara-Anguita S., et al. Drone applications for emergency and urgent care: a systematic review. Prehospital and Disaster Medicine 37(4):502–508. 2022; https://doi.org/10.1017/S1049023X22000887.
7. Sibanyoni S. V., Ramotsoela D. T., Silva B. J., Hancke G. P.. A 2-D acoustic source localization system for drones in search and rescue missions. IEEE Sensors Journal 19(1):332–341. 2019; https://doi.org/10.1109/JSEN.2018.2875864.
8. Shi Z., Chang X., Yang C., Wu Z., Wu J.. An acoustic-based surveillance system for amateur drones detection and localization. IEEE Transactions on Vehicular Technology 69(3):2731–2739. 2020; https://doi.org/10.1109/TVT.2020.2964110.
9. Kaplan B., Kahraman I., Ekti A. R., Yarkan S., Gorçın A., Ozdemır M. K., et al. Detection, identification, and direction of arrival estimation of drone FHSS signals with uniform linear antenna array. IEEE Access 9:152057–152069. 2021; https://doi.org/10.1109/ACCESS.2021.3127199.
10. Kaya E., Kaplan G. B.. Neural network based drone recognition techniques with non-coherent S-band radar. In : Proceedings of 2021 IEEE Radar Conference (Radar-Conf21). Atlanta, GA, USA; 2021; p. 1–6. https://doi.org/10.1109/RadarConf2147009.2021.9455167.
11. Liu Y., Sun P., Namiki A.. Target tracking of moving and rotating object by high-speed monocular active vision. IEEE Sensors Journal 20(12):6727–6744. 2020; https://doi.org/10.1109/JSEN.2020.2976202.
12. Flak P.. Drone detection sensor with continuous 2.4 GHz ISM band coverage based on cost-effective SDR platform. IEEE Access 9:114574–114586. 2021; https://doi.org/10.1109/ACCESS.2021.3104738.
13. Ezuma M., Erden F., Anjinappa C. K., Ozdemir O., Guvenc I.. Micro-UAV detection and classification from RF fingerprints using machine learning techniques. In : Proceedings of 2019 IEEE Aerospace Conference. Big Sky, MT, USA; 2019; p. 1–13. https://doi.org/10.1109/AERO.2019.8741970.
14. Khan M. A., Menouar H., Eldeeb A., Abu-Dayya A., Salim F. D.. On the detection of unauthorized drones—Techniques and future perspectives: a review. IEEE Sensors Journal 22(12):11439–11455. 2022; https://doi.org/10.1109/JSEN.2022.3171293.
15. Shi Z., Chang X., Yang C., Wu Z., Wu J.. An acoustic-based surveillance system for amateur drones detection and localization. IEEE Transactions on Vehicular Technology 69(3):2731–2739. 2020; https://doi.org/10.1109/TVT.2020.2964110.
16. Yousaf J., Zia H., Alhalabi M., Yaghi M., Basmaji T., Shehhi E. A., et al. Drone and controller detection and localization: trends and challenges. Applied Sciences 12(24)article no. 12612. 2022; https://doi.org/10.3390/app122412612.
17. Pavliv M., Schiano F., Reardon C., Floreano D., Loianno G.. Tracking and relative localization of drone swarms with a vision-based headset. IEEE Robotics and Automation Letters 6(2):1455–1462. 2021; https://doi.org/10.1109/LRA.2021.3051565.
18. Bjorklund S.. Target detection and classification of small drones by boosting on radar micro-Doppler. In : Proceedings of 2018 15th European Radar Conference (EuRAD). Madrid, Spain; 2018; p. 182–185. https://doi.org/10.23919/EuRAD.2018.8546569.
19. Jian M., Lu Z., Chen V. C.. Drone detection and tracking based on phase-interferometric Doppler radar. In : Proceedings of 2018 IEEE Radar Conference (RadarConf18). Oklahoma City, OK, USA; 2018; p. 1146–1149. https://doi.org/10.1109/RADAR.2018.8378723.
20. Park J., Jung D. H., Bae K. B., Park S. O.. Range-Doppler map improvement in FMCW radar for small moving drone detection using the stationary point concentration technique. IEEE Transactions on Microwave Theory and Techniques 68(5):1858–1871. 2020; https://doi.org/10.1109/TMTT.2019.2961911.
21. Deshmukh S., Vinoy K. J.. Design and development of RADAR for detection of drones and UAVs. In : Proceedings of 2022 IEEE Microwaves, Antennas, and Propagation Conference (MAPCON). Bangalore, India; 2022; p. 1714–1719. https://doi.org/10.1109/MAPCON56011.2022.10047163.
22. Zulkifli S., Balleri A.. Design and development of K-band FMCW radar for nano-drone detection. In : Proceedings of 2020 IEEE Radar Conference (RadarConf20). Florence, Italy; 2020; p. 1–5. https://doi.org/10.1109/RadarConf2043947.2020.9266538.
23. El Yabroudi M., Awedat K., Chabaan R. C., Abudayyeh O., Abdel-Qader I.. Adaptive DBSCAN LiDAR point cloud clustering for autonomous driving applications. In : Proceedings of 2022 IEEE International Conference on Electro Information Technology (eIT). Mankato, MN, USA; 2022; p. 221–224. https://doi.org/10.1109/eIT53891.2022.9814025.
24. Ester M., Kriegel H. P., Sander J., Xu X.. A density-based algorithm for discovering clusters in large spatial databases with noise. In : Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD). Portland, OR, USA; 1996; p. 226–231.
25. He K., Zhang X., Ren S., Sun J.. Deep residual learning for image recognition. In : Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA; 2016; p. 770–778. https://doi.org/10.1109/CVPR.2016.90.
26. Hochreiter S., Schmidhuber J.. Long short-term memory. Neural Computation 9(8):1735–1780. 1997; https://doi.org/10.1162/neco.1997.9.8.1735.
27. Smart Radar System Inc. RETINA-4SN: AI 4D imaging radar with machine learning 2023. [Online]. Available: https://www.smartradarsystem.com/en/products/retina_4s.html.

Biography

Ha Thi Tran, https://orcid.org/0009-0006-5826-4529, received her B.S. degree in information and technology from the University of Engineering and Technology, Vietnam National University. She is currently pursuing her M.S. degree in information and communication engineering at Kongju National University.

The-Hien Pham, https://orcid.org/0000-0003-3800-0703, received his B.E. degree in communication and networking from the Ho Chi Minh City University of Transport, Vietnam, in 2012, and his M.E. degree in information and communication engineering from Kongju National University, South Korea, in 2022, where he is currently pursuing a Ph.D. in information and communication engineering. His major research interests include non-destructive testing, communication and networking, wireless communication systems, synthetic aperture radar imaging, machine learning, and optimization methods.

Yun-Seok Mun, https://orcid.org/0009-0003-3524-2384 received his B.E. degree in smart information and technology engineering from Kongju National University, Cheonan, South Korea, in 2024. He is currently pursuing his M.S. degree in information and communication engineering at Kongju National University.

Ic-Pyo Hong, https://orcid.org/0000-0003-1875-5420, received his B.S., M.S., and Ph.D. degrees in electronics engineering from Yonsei University, Seoul, South Korea, in 1994, 1996, and 2000, respectively. From 2000 to 2003, he was a senior engineer with the CDMA Mobile Research Team in the Information and Communication Division of Samsung Electronics Company, Suwon, South Korea. He was a visiting scholar at Texas A&M University, College Station, TX, USA, in 2006, and at Syracuse University, Syracuse, NY, USA, in 2012. Since 2003, he has been with the Department of Smart Information and Technology Engineering, Kongju National University, Cheonan, South Korea, where he is currently a professor. His research interests include numerical techniques in electromagnetics and periodic electromagnetic structures and their applications in wireless communications.


Table 1

Size of drones

Drone 1 Drone 2 Drone 3
Width (cm) 51 40 18
Length (cm) 51 40 16
Height (cm) 21 8 3

Table 2

Size of humans

Human 1 Human 2 Human 3
Height (m) 1.68 1.63 1.78
Weight (kg) 67 60 90

Table 3

Number of scenarios collected as data for the three situations

Number of scenarios
Only human 38
Only drone 70
Combined 102

Table 4

Original data obtained from the radar

Name Type
Time stamp Datetime
3D coordinates of all points List (float)
Powers of all points List (float)
Velocity List (float)

Table 5

Input features for the deep learning model

Name Type
Number of points Integer
Center point x Float
Center point y Float
Center point z Float
Dx Float
Dy Float
Dz Float
Average power Float
Average velocity Float

Table 6

Comparison of accuracy between the LSTM, DNN, and ResNet models

Model LSTM DNN ResNet
Number of training sessions 20 20 20
Number of epochs 40 100 75
Accuracy (%)
 Average 74.36 96.55 97.54
 Highest 77.69 96.65 97.72

Table 7

Comparison of the error value for the coordination of the human object between the proposed application and the Retina application

Error value (m)
Minimum 0.00046
Maximum 0.64
Average 0.111

Algorithm 1

Dynamic-DBSCAN algorithm

Input: list of 3D coordinates of all points in the point cloud (N)
Output: list of cluster IDs of all points in the point cloud (C)

 #step 1: Get the first list of cluster IDs
 C = DBSCAN(eps = ɛest, N)
 #step 2: Check the first condition
 while NPobject < NPnoise:
  ɛest = ɛest + Δe
  C = DBSCAN(eps = ɛest, N)
 #step 3: Check the second condition
 ɛtemp = ɛest
 Ctemp = C
 Nobject = f_count(Ctemp)
 if Dz > zh:
  while Nobject < 2:
   ɛtemp = ɛtemp − Δe
   if ɛtemp < ɛlow:
    break
   Ctemp = DBSCAN(eps = ɛtemp, N)
   Nobject = f_count(Ctemp)
  if Nobject ≥ 2:
   ɛest = ɛtemp
   C = Ctemp
 #step 4: Merge close clusters
 for each pair of clusters (clusteri, clusterj):
  if ||centroidi − centroidj|| ≤ ω:
   merge clusteri and clusterj