Drone-Person Tracking in Uniform Appearance Crowd: A New Dataset (2024)

Background & Summary

The advancement of Unmanned Aerial Vehicles (UAVs), commonly known as drones, has significantly improved security surveillance capabilities. Drones excel at tracking and pinpointing individuals of interest, rendering person-following and tracking systems1 invaluable in domains like surveillance2, search and rescue missions3, and healthcare4,5. These systems leverage Visual Object Tracking (VOT) techniques, which involve locating and estimating the trajectory of specific objects within a sequence of consecutive frames6,7. VOT finds applications in various fields, including autonomous vehicles8, robotics9,10, and robot-assisted person following10. In VOT, a fundamental challenge is to learn an appearance model from the initial state of a target object, which is essential for locating the target object in subsequent frames6,7. This challenge becomes particularly pronounced with the presence of similar appearance distractors6.

A reliable visual object tracker is vital for the effectiveness of a vision-based drone-person following system in the face of numerous challenges. To train a robust tracker capable of excelling in diverse scenarios, it must be exposed to a variety of challenging tracking scenarios. In recent years, several large-scale object tracking datasets have been released, such as UAV12311, OTB10012, VOT201813, TrackingNet14, LaSOT15, GOT-10k16, and LaTOT17, which cover diverse real-world tracking challenges. However, none of these datasets specifically covers person tracking in uniform appearance environments. Such settings are common in regions like the Gulf and many parts of Asia and pose unique tracking challenges due to similar clothing. Introducing a dataset for uniform appearance tracking is therefore a valuable addition for the computer vision community. A recent study18 introduced a dataset named PTUA, which focuses on ground robot-person tracking in a uniform dressing environment. However, this dataset has several limitations in the settings chosen while recording it. Firstly, it only considers a maximum person density of four, which does not reflect a truly crowded scenario. Secondly, the dataset was captured in a controlled environment, which does not accurately simulate the challenges of real-world tracking scenarios. Thirdly, the dataset does not simulate scenarios where a person imitates intruder behavior, such as moving quickly to evade a robot, which is an essential factor in assessing the robustness of a tracking algorithm in a person tracking scenario.

In scenarios that involve tracking individuals with drones in a uniform appearance crowd, maintaining the trajectory of a designated target becomes notably challenging. This challenge arises from the complexities introduced by drone-based capture and the substantial presence of uniform appearance distractors. The combined impact of these factors makes tracking the intended target considerably more difficult than in other datasets, as visually illustrated in Fig.1 and comprehensively discussed in the “Technical Validation” section.

Fig. 1

A visual performance comparison of six state-of-the-art trackers, specifically STARK-ST5025, ToMP5022, KeepTrack46, MixFormer-CvT37, OSTrack384-NeighborTrack47, and OSTrack38441, evaluated across seven distinct datasets and directly compared with our proposed D-PTUAC dataset. The results show a noticeable decline in the performance of these state-of-the-art trackers on D-PTUAC, in sharp contrast to their performance on the other seven datasets.

To address the above gaps, we introduce the Drone-Person Tracking in Uniform Appearance Crowd (D-PTUAC) dataset19 for uniform-clothed crowds. The dataset also stands out in featuring a target person who behaves as an intruder and must be singled out from the crowd. The sequences were collected by controlling a camera-equipped drone, specifically a DJI Mavic 3 Pro (https://www.dji.com/ae/mavic-3-pro), with a wireless manual controller using the DJI GO 4 Android application (https://www.dji.com/ae/downloads/djiapp/dji-go-4) to follow a target person among a crowd wearing the same attire. To enhance the diversity of the dataset, we performed the collection under different challenging conditions such as Uniformity (UF), Abrupt Appearance Change (AAC), Background Clutter (BC), Aspect Ratio Change (ARC), Scale Variation (SV), Low Resolution (LR), Rotation (ROT), Pose Variation (PV), Occlusion (OCC), Out-of-View (OV), Short-Term (ST), Long-Term (LT), Motion Blur (MB), Fast Motion (FM), Illumination Variation (IV), Weather Conditions (WC), Crowd Density (CD), Deformation (DEF), and Surveillance Settings (SS). Figure2 illustrates sample images taken from the proposed D-PTUAC dataset. The evaluation of RGB trackers revealed a substantial performance decline, particularly in the presence of LR and BC, as illustrated in Fig.3. This decline can be attributed to the nature of the drone-captured footage, in which subjects tend to be LR and wear clothing with a uniform appearance, intensifying the UF and BC challenges. Additionally, to highlight the challenges of drone-based uniform appearance crowd tracking, we show that previous frameworks that rely on estimated depth fusion and segmentation fail on our dataset. For this purpose, we employed two frameworks: MiDaS for monocular depth estimation20 to generate RGB-D data, and ViT-B SAM21 to generate segmentation masks for tracking. The dataset19 is available for access on Figshare at (https://doi.org/10.6084/m9.figshare.24590568.v2).

Fig. 2

Selected samples from the proposed dataset highlighting challenging attributes (IV, BC, UF, OCC, PV, MB, FM, OV, AAC, ARC, DEF, ROT, CD, SV, ST, LT) through a structured layout: Row 1 showcases RGB sample images, Row 2 presents depth sample images, and Row 3 displays segmentation mask sample images. The columns within the figure showcase samples that encompass multiple attributes: Column (a) features LR, IV, BC, and UF; Column (b) includes OCC, BC, IV, and UF; Column (c) portrays IV, BC, and UF; Column (d) demonstrates OCC, BC, IV, UF, and PV; Column (e) encompasses DEF, LR, BC, IV, and UF; Column (f) showcases MB and FM; and Column (g) encompasses OV. These images emphasize the importance of developing robust drone-person tracking methods.

Fig. 3

Evaluation results of 44 state-of-the-art pretrained trackers on D-PTUAC videos with LR and BC attributes using (a) Success Rate, (b) Precision Rate, and (c) Normalized Precision Rate. Please zoom for better clarity.

Methods

Human subjects

Our study involving human subjects was approved by the Research Ethics Committee of Khalifa University of Science and Technology (IRB protocol number H23-029), ensuring adherence to ethical standards in research. Following this approval, we specifically targeted the Khalifa University community for participant recruitment, encompassing students, staff, faculty, and local residents in Abu Dhabi, United Arab Emirates. This effort successfully engaged approximately 40–50 subjects; all of whom voluntarily participated in the construction of the dataset. The Ethics office, in collaboration with the investigators, played a key role in disseminating detailed information about the study and obtaining informed written consent from all participants. The inclusion criteria for our participant cohort were clearly defined: individuals aged 18 years and above, of any sex, either UAE nationals or residents, who were capable of understanding and providing consent. Exclusion criteria included individuals below 18 years of age and those unwilling or unable to give consent. Importantly, participants were explicitly informed that their likenesses captured in videos and images would be shared as part of an open-access dataset, thus ensuring their full awareness and understanding of the extent of their involvement and how their data would be utilized in the research community.

Proposed D-PTUAC dataset

To construct a benchmark dataset tailored for drone-person tracking scenarios, we conducted RGB video recordings using a DJI Mavic 3 Pro drone. During these recordings, the drone was manually operated to track and follow a designated individual within a group. This approach allowed us to capture a range of typical drone navigation characteristics and challenges, including ego-motion, MB, and occurrences of OCC. Subsequent sections will provide detailed insights into the dataset construction process.

Surveillance settings

The D-PTUAC dataset comprises videos for dynamic and static SS to simulate real-world scenarios. Sample frames extracted from these videos are visually depicted in Fig.2. Below, we provide a comprehensive overview of the specific applications and dataset particulars pertaining to these two distinct SS.

  • Dynamic surveillance involves actively monitoring a particular subject or group of subjects using a moving drone. The D-PTUAC dataset features 88 videos specifically recorded for dynamic surveillance. Participants were instructed to walk from point A to point B while the drone captured their movement from both the front and back views, although not simultaneously. The drone closely monitors the subject by flying a few meters ahead of them, and user cooperation is not necessary, as the drone is designed to follow the subject’s movements via manual control using the DJI GO application.

  • Static surveillance involves using a static drone to monitor an area or event without a specific focus on any particular subject or object. The D-PTUAC dataset includes 50 videos captured for static surveillance, each featuring participants instructed to move around, engage in discussions, or walk within a designated area while the drone captured the entire scene. Unlike dynamic surveillance, the drone’s movement is kept static, and its objective is to record the events of the specified region rather than monitor specific individuals. This approach simulates real-world surveillance scenarios, making the D-PTUAC dataset a challenging and valuable resource for drone-based tracking and understanding of human activities.

Dataset setup

The D-PTUAC dataset consists of twenty-four distinct settings, offering variation in terms of SS, angle of view, CD, and times of capture. The dataset was meticulously collected on an outdoor tennis court located at Khalifa University in Abu Dhabi, United Arab Emirates. The tennis court has dimensions of 20×10×4 meters.

The data collection spanned two seasons, Fall and Spring, with different crowds to enhance dataset diversity. In the Fall season, Crowd 1 videos were recorded during both morning and evening sessions, whereas in the Spring season, Crowd 2 videos were captured exclusively in the morning, except for one session recorded in rainy weather. The dataset introduces three CD categories, namely sparse, medium, and compact. For each CD category, participants were recorded from both front and back views, resulting in a total of 87 videos recorded in the morning across the CD categories and SS, along with an additional 51 videos captured in the evening. It is worth noting that the morning videos possess high illumination, while the evening videos exhibit relatively lower illumination levels. To ensure sufficient lighting in the evening videos, floodlights were employed.

Figure4 visually illustrates scenarios involving a crowd with an intruder among them. These scenarios are constructed under the assumption that the individual to be tracked is an intruder, and the objective is for the drone video tracker to effectively follow their movements as they navigate through and within the crowd, occasionally attempting to evade the tracker’s surveillance.

  1. Scenario 1 (S1): The uniform appearance crowd moves in a straight line from point A to B, while the intruder, present within the crowd, attempts to confuse the tracker by moving in an overlapping and zigzag path.

  2. Scenario 2 (S2): The uniform appearance crowd moves in a straight line from point A to B, and the intruder joins the crowd at a later stage. The intruder then moves in an overlapping and zigzag path to confuse the tracker.

  3. Scenario 3 (S3): The uniform appearance crowd moves in a straight line from point A to B, and the intruder initially moves in a circular path towards the drone. Upon noticing the drone, the intruder quickly joins the crowd and continues to move in an overlapping and zigzag path to confuse the tracker.

  4. Scenario 4 (S4): The uniform appearance crowd moves randomly within the tennis court area, while the drone tries to follow the intruder. The intruder attempts to overlap with the crowd to confuse the tracker and hide. This scenario represents a dynamic SS, as the drone follows the intruder throughout the video.

  5. Scenario 5 (S5): The uniform appearance crowd is instructed to move randomly within a confined area of the tennis court. Meanwhile, the intruder moves in a circular path towards the drone. Once the intruder notices the drone, they immediately join the crowd and attempt to overlap with them to confuse the drone tracker and evade detection. The drone tries to follow the intruder throughout the video, while the intruder tries to hide within the crowd.

Fig. 4

Visual illustration of the scenarios employed for D-PTUAC dataset collection. (a) Scenario 1 (S1), (b) Scenario 2 (S2), (c) Scenario 3 (S3), (d) Scenario 4 (S4), and (e) Scenario 5 (S5).

The D-PTUAC dataset contains repeated scenario recordings; this is intentional, as each crowd was divided into groups of intruders. Each group participated in one of the five scenarios described above, yielding multiple videos of the same scenario with different intruders. The collection process of the D-PTUAC dataset is described in Algorithm 1.

Algorithm 1

Algorithmic Overview of D-PTUAC Dataset Collection Process.

The D-PTUAC dataset covers a range of visual factors, including PV, IV, OCC, and LR, as depicted in Fig.2. The dataset involves approximately 40–50 subjects across the two crowds, with 15–30 subjects appearing in each combination of CD, angle of view, drone SS (static or dynamic), and time of capture.

In terms of subject demographics, all individuals captured in the videos fall within the age range of 20 to 35 years. Additionally, a subset of videos includes two participants who exceed the age of 35 years. For each specific combination of settings, a single group of 2–5 intruders appears in only one video, resulting in a total of 138 videos across the twenty-four combinations. Detailed statistics of the D-PTUAC dataset can be found in Table1, which includes over 76 K frames for dynamic SS and over 44 K frames for static SS.

The dataset also includes high-resolution gallery images of each subject, captured in constrained settings using a 12-megapixel smartphone under optimal lighting conditions and covering four distinct poses. The video scenarios themselves were captured at a frame rate of 30 Frames Per Second (FPS) and a resolution of 3840×2160 pixels using the DJI Mavic 3 Pro drone. The gallery images serve multiple purposes, including the development of a facial identification system capable of recognizing intruders in aerial images, even when the facial area spans only a few pixels. Furthermore, these images are integral to a comprehensive tracking framework: a face detector first identifies the initial bounding box of a specific person, which is then confirmed by a face recognizer, and this information is passed to the tracker to initiate the tracking process, as sketched below.
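
As a rough illustration of this handoff (not the authors' implementation), the sketch below uses hypothetical detect_faces and match_gallery helpers standing in for any concrete face detector and gallery-based face recognizer, together with a generic tracker object exposing an initialize method.

```python
def initialize_tracking(frames, detect_faces, match_gallery, tracker, min_score=0.6):
    """Scan frames until the face recognizer confirms the intruder against the
    gallery images, then hand the confirmed bounding box to the VOT tracker.
    detect_faces and match_gallery are hypothetical placeholders."""
    for t, frame in enumerate(frames):
        for box in detect_faces(frame):                  # candidate face boxes [x, y, w, h]
            if match_gallery(frame, box) >= min_score:   # identity confirmed from gallery images
                tracker.initialize(frame, box)           # start tracking from this frame
                return t, box
    return None, None                                    # intruder never confirmed
```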

Video annotations

In the annotation process for the D-PTUAC dataset, the heads of the subjects were selected as the most suitable body part for annotation. This choice was made to address the challenges presented by subject overlap and OCC. Precisely defined bounding boxes were employed to encompass the visible region of the heads, taking into consideration the perspective from the drone.

To assess the quality of head annotations in the D-PTUAC dataset, a comparison was made against full-body annotations using twenty video sequences and a pretrained ToMP50 tracker22. The head-tracking success rate achieved 55.29%, significantly outperforming the full-body tracking success rate of 11.09%. This outcome underscores the superiority of head annotations for the specific tracking tasks in this dataset.

The annotation process for the dataset was meticulous, involving a team of experienced individuals with expertise in VOT. Manual annotation was performed using the Computer Vision Annotation Tool (CVAT) (https://www.cvat.ai/) with precise attribute labels assigned. The process underwent three stages of scrutiny and refinement to ensure the annotations’ high quality. Several challenges were encountered during the annotation process, including small head sizes in the video frames, OCC, MB, LR, UF, and BC. Addressing these challenges necessitated re-annotation for approximately 80% of the dataset.

Annotating only the head of the target person in the D-PTUAC dataset results in small bounding boxes, typically smaller than 64×64 pixels, as depicted in Fig.5, leading to challenging LR samples. These small targets lack sufficient appearance information and pose difficulties for deep networks, which produce weak features when directly processing LR regions. Enlarging the regions introduces blur and sawtooth artifacts, compromising LR image representation and increasing computational costs. Additionally, when LR objects occupy a small portion of the image, they are vulnerable to interference from background objects and noise. These combined challenges impede the localization and discrimination capabilities of general visual tracking networks when dealing with LR objects17.

Fig. 5

The histogram represents the distribution of ground truth annotated heads based on their size. It can be observed that a significant portion of the ground truth images have less than 4,000 pixels, which is equivalent to images smaller than 64×64 pixels.
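
The distribution in Fig.5 can be reproduced from the ground-truth files with a short script. The sketch below assumes the train/test folder layout and comma-separated [xmin, ymin, width, height] rows described in the Data Records section.

```python
import glob
import os

import matplotlib.pyplot as plt
import numpy as np

# Gather annotated head areas (width * height, in pixels) over all sequences.
areas = []
for gt_file in glob.glob(os.path.join("D-PTUAC", "*", "*", "groundtruth.txt")):
    boxes = np.loadtxt(gt_file, delimiter=",").reshape(-1, 4)  # [xmin, ymin, width, height]
    areas.append(boxes[:, 2] * boxes[:, 3])
areas = np.concatenate(areas)

plt.hist(areas, bins=50)
plt.axvline(64 * 64, linestyle="--", label="64x64 pixels")  # threshold discussed above
plt.xlabel("ground-truth head area (pixels)")
plt.ylabel("number of annotations")
plt.legend()
plt.show()
```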

Dataset tracking attributes

Drone-person tracking scenarios often present various challenging factors, many of which have been intentionally incorporated into the scenarios described in the “Dataset Setup” section. The tracking attributes in the D-PTUAC dataset, as shown in Fig.6, can be classified into two groups: controlled and implicitly inherited. This study focuses on four crucial video-level controlled attributes that are relevant to aerial environments and significantly impact tracking algorithm performance in aerially captured video sequences. These attributes are summarized as follows:

  1. Abrupt Appearance Change: It describes sudden and significant changes in the appearance of the tracked object. In scenarios such as S3 (Fig.4c) and S5 (Fig.4e), the intruder's zigzag movement and blending into the crowd can cause AAC. This results in instances where the initially tracked region corresponds to the front of the head but subsequently shifts to the back of the head.

  2. Crowd Density: It represents the proximity and density of individuals in a crowd. The dataset categorizes CD into three levels: sparse (more than 1-meter distance), shown in Fig.2a, medium (1-meter distance), shown in Fig.2c, and compact (shoulder-to-shoulder proximity), shown in Fig.2e.

  3. Surveillance Settings: It refers to dynamic and static SS. Dynamic surveillance, with 88 videos, involves capturing subjects' movements while the drone follows them, whereas static surveillance, with 50 videos, documents events in a designated area without actively tracking individuals.

  4. Illumination Variation: It denotes significant changes in lighting conditions within a scene. In the drone-person following scenario, IV can arise from various sources such as the sun's position, artificial lights, shadows, and reflections. The dataset captures videos in different lighting conditions, including morning, evening, and rainy weather. Examples are shown in Fig.2a,b.

Fig. 6

Distribution of Sequences Across Each Attribute within the D-PTUAC Dataset.

Additionally, 14 implicit attributes in the dataset were inherited from nuisance and distraction factors. These attributes arise from factors that were not explicitly controlled or introduced by humans but were inherent to the dataset recordings. They are briefly described below:

  1. Uniformity: The D-PTUAC dataset exhibits a unique characteristic wherein all individuals, including the target and the distractors, wear a white dress and headscarf throughout the recording, as depicted in Fig.2a–e. This feature distinguishes our proposed dataset from others, such as the recent robot-person tracking dataset18, which includes a similar attribute but with fewer people. In our dataset, scenes consist of 15–30 people, making individuals appear as moving blobs without visible leg movements. The scenes also feature two to five intruders who subsequently join the crowd, augmenting the dynamic nature and appearance of the scene.

  2. Fast Motion: It occurs when the tracked object or the drone moves quickly, challenging the tracker to keep up.

  3. Motion Blur: It arises from the drone's or the target's movement, causing blurred frames, as illustrated in Fig.2f.

  4. Pose Variation: It captures the variability of human poses, including actions like running or hugging, resulting in significant pose changes between consecutive frames, as shown in Fig.2a,d.

  5. Scale Variation: It occurs when the ratio of the object's size in the current frame to its size in the first frame falls outside the range [0.5, 2], particularly in static scenarios where the target moves closer to or farther from the drone, as shown in Fig.2a–d.

  6. Background Clutter: It occurs when the target's appearance resembles the background, leading to challenges in accurate differentiation, as shown in Fig.2a–g.

  7. Low Resolution: It describes the characteristics of the target object in the video frames, specifically the tracked person's head in our D-PTUAC dataset. To determine LR, we follow the method proposed in17, which calculates the object's relative size by dividing its bounding box area by the image area in each frame. The average relative size is then computed across all frames within each video sequence. We set the average relative size threshold at 1%, as suggested in17. However, to prevent misclassification of larger object sequences as LR sequences, we also incorporate the concept of average absolute size, with a threshold set at 22×22 pixels. For a video sequence to be classified as LR, both the average absolute and relative sizes must be below these thresholds. This dual-criteria approach enhances the accuracy of LR sequence identification in our study (a sketch of this dual-criteria test is given after this list).

  8. Rotation: It happens when the target rotates relative to the camera, for example when deliberately concealing themselves within the crowd after being detected by the drone.

  9. Aspect Ratio Change: The bounding box aspect ratio falls outside the range of [0.5, 2].

  10. Deformation: The target undergoes deformations and changes in shape during the tracking process, as depicted in Fig.2e.

  11. Occlusion - Partial Occlusion (POC)/Full Occlusion (FOC): It arises when part or all of the target is obstructed by objects or people in the scene, as depicted in Fig.2b–d.

  12. Out of View: It refers to a situation where the target fully leaves the camera's field of view, as shown in Fig.2g.

  13. Short-Term Videos: It refers to sequences shorter than 1,000 frames. Our dataset contains 95 ST videos.

  14. Long-Term Videos: It refers to sequences longer than 1,000 frames. Our dataset contains 43 LT videos.
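
The LR, SV, and ARC attributes above reduce to simple threshold tests on the ground-truth boxes. The sketch below (referenced from the Low Resolution item) is not the authors' script; it applies the 1% relative-size and 22×22 absolute-size criteria for LR, the [0.5, 2] first-frame size-ratio test for SV, and the literal ARC definition given above (the box aspect ratio leaving [0.5, 2]), assuming [xmin, ymin, width, height] annotations.

```python
import numpy as np

def sequence_attributes(boxes, img_w, img_h):
    """Flag LR, SV, and ARC for one sequence from its N x 4 ground-truth boxes
    ([xmin, ymin, width, height]), using the thresholds stated in the text."""
    w, h = boxes[:, 2], boxes[:, 3]
    area = w * h

    # Low Resolution: average relative size below 1% of the image AND
    # average absolute size below 22 x 22 pixels (dual criteria).
    rel_size = area / float(img_w * img_h)
    low_res = bool(rel_size.mean() < 0.01 and area.mean() < 22 * 22)

    # Scale Variation: the size ratio w.r.t. the first frame leaves [0.5, 2].
    size_ratio = area / area[0]
    scale_var = bool(((size_ratio < 0.5) | (size_ratio > 2.0)).any())

    # Aspect Ratio Change, as stated above: the box aspect ratio (w / h)
    # falls outside [0.5, 2] in at least one frame.
    aspect = w / h
    arc = bool(((aspect < 0.5) | (aspect > 2.0)).any())

    return {"LR": low_res, "SV": scale_var, "ARC": arc}
```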

Frameworks for using RGB-D and segmentation based trackers

RGB-D framework

To comprehensively evaluate different categories of trackers, it is important to consider RGB-D trackers that rely on both RGB and depth data for fusion algorithms. However, since the D-PTUAC dataset is captured solely with an RGB camera, we have developed a framework to generate depth information from RGB data.

The RGB-D framework utilizes monocular depth estimation, leveraging the MiDaS network20. Specifically, we employed the DPT-Swin2-Tiny-256 network, which balances accurate depth estimation with real-time inference on embedded devices, achieving a framerate of 90 FPS20. This choice is particularly important for deploying the framework on resource-constrained systems such as drones.
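
As a minimal sketch of this depth-generation step (not the authors' exact pipeline), RGB frames can be converted to pseudo-depth maps with MiDaS via torch.hub. The entry-point name "DPT_SwinV2_T_256" and the "swin256_transform" attribute are assumptions based on the public MiDaS 3.1 hub listing and may need adjusting to the installed release.

```python
import cv2
import torch

# Load DPT-SwinV2-Tiny-256 and its matching input transform via torch.hub.
# Entry-point and transform names are assumptions based on the MiDaS 3.1 release.
device = "cuda" if torch.cuda.is_available() else "cpu"
midas = torch.hub.load("intel-isl/MiDaS", "DPT_SwinV2_T_256").to(device).eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.swin256_transform

img = cv2.cvtColor(cv2.imread("00000001.jpg"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = midas(transform(img).to(device))
    # Resize the relative inverse-depth map back to the frame resolution.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().cpu().numpy()

# Normalize to 8 bits so the depth channel can be stored next to the RGB frame.
depth_u8 = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("00000001_depth.png", depth_u8)
```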

Segmentation mask generation framework

Given the uniqueness of our dataset, a segmentation model capable of generating masks for various objects, with a focus on the head in our case, is required. To address this, we have utilized the SAM model21, which has been trained on a large dataset of over 1 billion masks derived from 11 million licensed and privacy-respecting images. The extensive training enables the SAM model to generalize well and accurately segment specific targets, such as the head of the person of interest. Among the available options in the SAM model, we have chosen the ViT-base model due to its lightweight nature in terms of parameters and floating point operations (FLOPs), as well as its fast inference speed compared to other models discussed in the paper21.
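
A minimal sketch of this mask-generation step is shown below; it prompts the ViT-B SAM predictor with a head bounding box. The checkpoint filename and the example box values are assumptions; in practice the prompt box would come from the ground-truth head annotation of the frame.

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# ViT-B SAM weights; the checkpoint filename is an assumption and should point
# to the official ViT-B checkpoint shipped with the segment-anything repository.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

frame = cv2.cvtColor(cv2.imread("00000001.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(frame)

# Prompt SAM with the annotated head box, converted from [x, y, w, h] to [x0, y0, x1, y1].
x, y, w, h = 1024, 512, 40, 48                      # illustrative ground-truth head box
box = np.array([x, y, x + w, y + h])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)

mask = masks[0].astype(np.uint8) * 255              # binary mask of the tracked head
cv2.imwrite("00000001_mask.png", mask)
```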

Data Records

The D-PTUAC dataset19 has been made available for public download through Figshare at (https://doi.org/10.6084/m9.figshare.24590568.v2). Access to the data does not require any registration. The dataset occupies a combined storage of 15.01 gigabytes. The folder structure of the dataset and its associated files is described below.

Folder structure

An overview of the D-PTUAC dataset is given in Fig.7. The dataset's root directory, labeled “D-PTUAC”, comprises two subdirectories: “train” and “test”. The “train” subdirectory contains a file named “list.txt”, which lists the video directories, together with 90 directories that each hold the video frames and four annotation files defined as follows: “groundtruth.txt” is an N×4 matrix giving the object location in each frame as [xmin, ymin, width, height]; “cover.label” is an N×1 array representing the object's visible ratio, categorized from 0 to 8; “absence.label” is a binary N×1 array indicating whether the object is present; and “cut_by_image.label” indicates whether the object is cut by the frame boundaries in each video frame. The same structure is followed in the “test” subdirectory, where “list.txt” lists the video directories along with 48 directories containing the video frames and the same four annotation files.

Fig. 7

The folder structure of the D-PTUAC dataset.

In the dataset, each video frame contained within the “Vid” directories is named with an eight-digit number that increments as the video progresses, followed by the “.jpg” extension indicating the image data type. The annotations and essential information concerning the object of interest within each video are located in the corresponding video directory. For the structuring of annotations, the D-PTUAC dataset adheres to the annotation style established by the GOT-10K16 dataset.
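
Under this layout, a sequence and its annotation files can be loaded with a few lines of NumPy. The sketch below assumes comma-separated groundtruth.txt rows, as in GOT-10k; adjust the delimiter if the files are whitespace-separated.

```python
import glob
import os

import numpy as np

def load_sequence(seq_dir):
    """Load frame paths and GOT-10k-style annotation files for one sequence."""
    frames = sorted(glob.glob(os.path.join(seq_dir, "*.jpg")))       # 00000001.jpg, ...
    boxes = np.loadtxt(os.path.join(seq_dir, "groundtruth.txt"),
                       delimiter=",").reshape(-1, 4)                 # [xmin, ymin, width, height]
    cover = np.loadtxt(os.path.join(seq_dir, "cover.label"))         # visible ratio, 0-8
    absence = np.loadtxt(os.path.join(seq_dir, "absence.label"))     # 1 where the target is absent
    cut = np.loadtxt(os.path.join(seq_dir, "cut_by_image.label"))    # 1 where the box is cut by the frame
    return frames, boxes, cover, absence, cut

# Example: iterate over the training split listed in list.txt.
train_root = os.path.join("D-PTUAC", "train")
with open(os.path.join(train_root, "list.txt")) as f:
    for name in f.read().split():
        frames, boxes, cover, absence, cut = load_sequence(os.path.join(train_root, name))
```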

Technical Validation

We conducted a comprehensive performance evaluation of existing state-of-the-art (SOTA) trackers on our proposed D-PTUAC dataset, which included attribute-wise analysis to test the trackers’ robustness against specific challenges. To further enhance the tracking performance, we finetuned 10 high-quality SOTA trackers on a training split of the D-PTUAC dataset. All experiments were conducted on a workstation equipped with one Nvidia GeForce RTX 3080 GPU, 11th Gen Intel 2.3 GHz CPU, 32GB RAM, and 8GB VRAM. We used the official source codes provided by the respective authors to implement all trackers.

Evaluation metrics

To evaluate the trackers, we used the popular One-Pass Evaluation (OPE) protocols proposed by OTB12 and LaSOT15 to measure Success Rate (SR), Precision Rate (PR), and Normalized Precision Rate (NPR).

The SR is calculated as the Intersection over Union (IoU) between the tracker's predicted bounding box, box^P, and the ground truth bounding box, box^G:

$$SR=\frac{\left|box^{G}\cap box^{P}\right|}{\left|box^{G}\cup box^{P}\right|}$$

(1)

Tracking algorithms are ranked based on their SR, which is determined by the Area Under the Curve (AUC) ranging from 0 to 1. A higher AUC indicates a better success rate for the tracker. The ranking is done from the worst to the best-performing tracker.

The PR is computed as the distance between the center of the ground truth bounding box and the center of the predicted bounding box generated by the tracker. The PR is defined as:

$$PR=\left\Vert box^{G}-box^{P}\right\Vert _{2}$$

(2)

The ranking of trackers is determined by varying threshold values from 0 to 20 pixels in this measure, and those with a higher PR are considered to have better performance.

To address the sensitivity of the PR measurement to image resolution and bounding box sizes, we also report the NPR. The NPR is calculated as:

$$\begin{array}{l}W=\mathrm{diag}\left(box_{x}^{G},\;box_{y}^{G}\right)\\ NPR=\left\Vert W\left(box^{G}-box^{P}\right)\right\Vert _{2}\end{array}$$

(3)

This metric, denoted by NPR, normalizes the PR using the ground truth annotations, as described in14. Trackers are ranked based on the AUC for NPR values ranging from 0 to 0.5. A higher NPR score indicates better performance of the tracker.
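
For concreteness, the three measures can be computed per frame as in the sketch below, with boxes given as [xmin, ymin, width, height]. Note that the NPR normalization here divides the center error by the ground-truth box dimensions, reflecting the intent of Eq. (3); the per-frame values are then swept over thresholds to obtain the AUC-based rankings described above.

```python
import numpy as np

def success_iou(box_g, box_p):
    """SR per Eq. (1): IoU between ground-truth and predicted boxes [x, y, w, h]."""
    xg, yg, wg, hg = box_g
    xp, yp, wp, hp = box_p
    iw = max(0.0, min(xg + wg, xp + wp) - max(xg, xp))
    ih = max(0.0, min(yg + hg, yp + hp) - max(yg, yp))
    inter = iw * ih
    union = wg * hg + wp * hp - inter
    return inter / union

def center_error(box_g, box_p):
    """PR per Eq. (2): Euclidean distance between the two box centers, in pixels."""
    c_g = np.array([box_g[0] + box_g[2] / 2.0, box_g[1] + box_g[3] / 2.0])
    c_p = np.array([box_p[0] + box_p[2] / 2.0, box_p[1] + box_p[3] / 2.0])
    return float(np.linalg.norm(c_g - c_p))

def normalized_center_error(box_g, box_p):
    """NPR per Eq. (3): center error normalized by the ground-truth box size."""
    c_g = np.array([box_g[0] + box_g[2] / 2.0, box_g[1] + box_g[3] / 2.0])
    c_p = np.array([box_p[0] + box_p[2] / 2.0, box_p[1] + box_p[3] / 2.0])
    return float(np.linalg.norm((c_g - c_p) / np.array([box_g[2], box_g[3]])))

# Example: one frame with a ground-truth and a predicted head box.
gt = [100.0, 80.0, 30.0, 30.0]
pred = [104.0, 82.0, 28.0, 32.0]
print(success_iou(gt, pred), center_error(gt, pred), normalized_center_error(gt, pred))
```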

Baseline trackers

The evaluation of the dataset involved a careful selection of representative SOTA baseline trackers to highlight the challenges posed by the proposed dataset. A total of 44 prominent trackers were included to ensure a comprehensive evaluation. These trackers were chosen from different categories, including Discriminative Correlation Filter (DCF)-based trackers such as ATOM23 and DiMP24; hybrid transformer-based trackers such as STARK25 and TATrack26, which combine Siamese networks and transformer networks for improved feature discrimination; the DCF-based RGB-D tracker DeT27, which extends the DiMP tracker24 to incorporate depth information; and segmentation-based trackers such as RTS28.

Evaluation protocols

The evaluation protocol is designed to assess the following aspects:

  (a) Overall Performance on the Testing Set: Comparing the performance of 44 SOTA trackers on the D-PTUAC testing set before and after finetuning.

  (b) Drone Surveillance Settings (Multi-scale) Performance: Assessing trackers in dynamic and static drone scenarios to understand their capabilities and limitations under different operational conditions, including object tracking during drone movement and challenges such as MB, FM, OCC, and changing trajectories. Evaluating trackers in static drone settings provides insights into their performance in the presence of multiple uniform appearance distractors.

  (c) Scenario Performance: Conducting individual evaluations for each scenario depicted in Fig.4 to analyze tracker performance under different intruder behaviors, such as circular paths (Fig.4c,e) or attempts to blend in with the crowd (Fig.4a).

  (d) Crowd Density Performance: Performing separate evaluations for sparse, medium, and compact CD levels to understand the impact of uniform appearance distractors on tracker performance, as they act as dynamic obstacles with similar appearances.

  (e) Different Daytime Performance: Evaluating the performance of trackers in morning and evening scenarios to assess their adaptability and robustness under varying lighting conditions and environmental changes.

  (f) Attribute Evaluation: Using trackers to assess distinct attributes exhibited in the videos, enabling in-depth analysis of tracker performance related to specific attributes.

For sections “Drone Surveillance Settings (Multi-scale) Performance”, “Scenario Performance”, “Crowd Density Performance”, “Different Daytime Performance”, and “Attribute Evaluation”, we specifically selected 10 SOTA trackers that underwent finetuning based on the results presented in Tables3, 4. A comparison is then made between these chosen trackers for each evaluation protocol. The objective behind these evaluations is to conduct a comprehensive analysis of the trackers’ performance while also gaining insights into the impact of various scenarios and attributes on their effectiveness.

Training/Testing split

The D-PTUAC dataset was split into training and testing sets. The training set consists of 90 videos, while the testing set consists of 48 videos. The training set contains approximately 78 K frames, while the testing set contains around 42 K frames. A comprehensive comparison of the training and testing sets of D-PTUAC is presented in Table2. The analysis shows that the minimum frames, mean frames, median frames, and maximum frames exhibit similarity between these two subsets. Additionally, Fig.9 demonstrates that the ratios of sequences across all attributes and settings are also similar. These findings, derived from both Table2 and Fig.9, provide evidence of the consistency and coherence of our training/testing split.

Overall Performance on the testing set

To conduct a comprehensive analysis, we evaluate the performance of 44 pretrained trackers on the testing set of D-PTUAC, as depicted in Fig.8. Among these trackers, 24 are listed in Table3, which showcases their performance in the pretrained state. Additionally, we present 20 trackers in Table4, which demonstrates their performance before and after finetuning on the training set of D-PTUAC, allowing us to assess the impact of our training set on tracker performance. It is important to note that no changes were made to the hyperparameters of these 20 trackers.

Fig. 8

Evaluation results of 44 pretrained models on the D-PTUAC testing set using (a) SR (%), (b) PR (%), and (c) NPR (%). Please zoom for better clarity.

Fig. 9

Comparison of sequence distribution in each attribute between training and testing sets.

The analysis of the results reveals that algorithms that combine the Siamese network and Transformers, such as TATrack26, SeqTrack29, and ToMP22, demonstrate a higher level of robustness in performance. These algorithms effectively leverage the advantages of capturing contextual information and addressing LT dependencies, which are crucial for accurate tracking. However, it is worth noting that certain algorithms that combine the Siamese network and Transformers, including STARK25, TrTr30, and TransT31, fail to achieve satisfactory results on our dataset. A similar pattern is observed in DCF trackers, such as DiMP1824, SuperDiMP32, PrDiMP1833, and ATOM23. While SuperDiMP32 demonstrates robust performance by employing effective scale regression techniques and online learning strategies, ATOM23 falls short in achieving desirable results.

Specifically, the results indicate that TATrack-Base38426 achieved the highest performance with values of 59.39% for SR and 71.54% for NPR. Regarding PR, SeqTrack-B25629 outperformed other trackers, achieving a PR of 65.60%. Upon finetuning, a similar trend was observed where TATrack-Base38426 continued to demonstrate its effectiveness, yielding the best results with values of 64.74% for SR and 79.59% for NPR. Notably, AiATrack34 achieved the highest PR of 73.72%.

RGB-D and segmentation-based trackers, such as DeT27 and RTS28, face significant challenges and exhibit notable failures on our testing set. The homogeneous distribution of the crowd at the same distance from the camera leads to identical depth outputs for all individuals. Consequently, the depth data presents a similar appearance for the entire crowd, making it difficult for trackers to differentiate and accurately track the target person. This lack of depth variation hampers the tracker's ability to distinguish between individuals, leading to the loss of track on the target and compromising performance in such scenarios. The presence of OCC and multi-scale targets further exacerbates the challenges faced by segmentation-based trackers. These trackers rely on pixel-level segmentation masks, which can be unreliable when multiple individuals in the crowd have similar appearances. This results in inaccurate tracking and compromised performance on our testing set.

Upon analyzing the data presented in Table4, it is clear that each of the 20 trackers studied shows a consistent enhancement in performance following the finetuning process using our training set. This notable improvement not only validates the effectiveness of our training set but also emphasizes its critical importance in the context of drone-person following in scenarios involving a crowd with uniform appearance.

Additionally, Fig.10 provides a visual representation of the comparatively limited performance exhibited by the 20 finetuned SOTA trackers when evaluated against the testing set. This diminished performance is indicative of the increased complexity and challenge inherent in the testing set, thereby underscoring the need for further advancements in tracker technology to effectively address such demanding scenarios.

Fig. 10

Evaluation results of 20 finetuned models on the D-PTUAC testing set using (a) SR (%), (b) PR (%), and (c) NPR (%). Please zoom for better clarity.

Surveillance settings (Multi-scale) performance

As detailed in “Surveillance Settings” section, the D-PTUAC dataset comprises videos categorized into dynamic and static SS. In dynamic drone settings, the scale of the target’s bounding box remains relatively consistent due to the drone’s efforts to maintain a constant distance. However, in static drone settings, the target’s bounding box exhibits significant SV as the object moves closer to or farther away from the drone, resulting in multi-scale bounding boxes. This introduces challenges such as SV and necessitates tracking algorithms to effectively handle these changes and maintain accurate localization.

Based on the comparison provided in Table5, the AiATrack tracker34 demonstrates the highest performance on videos with multi-scale variations, particularly in static drone settings, achieving a performance improvement of 2.94% over the baseline and second-best tracker, TrDiMP35. Furthermore, TATrack-Base38426 is the top-performing tracker in dynamic drone settings, characterized by challenges such as MB and FM, with a performance enhancement of 1.08% over the baseline and second-best tracker, SuperDiMP.

Scenario performance

As outlined in Section “Dataset Setup”, the D-PTUAC dataset consists of five distinct scenarios that aim to simulate various intruder behaviors within a uniform appearance crowd. These scenarios are designed to replicate real-life situations where a law enforcement drone is deployed to track an intruder amidst a crowd with similar attire. Evaluating tracker performance in these scenarios is crucial for assessing their effectiveness in real-world applications. By evaluating the performance of the trackers on the D-PTUAC dataset, our objective is to gain insights into their capabilities and limitations when dealing with intruder tracking in uniform appearance crowd scenarios.

The benchmarking results presented in Table6 reveal notable variations in tracker performance across the scenarios. Specifically, in scenarios S3 and S5, where appearance ambiguity challenges the trackers, a significant decline in performance is observed. The random behavior of the crowd in S5 further amplifies the performance degradation due to increased OCC. In S2, S3, and S5, AiATrack34 outperforms the other trackers, achieving performance improvements of 0.62%, 19.89%, and 35.18%, respectively, compared to the second-best performing trackers. In scenario S1, characterized by high OCC levels as the intruder attempts to blend in with the crowd, SeqTrack-B25629 achieved the highest performance of 80.45%. On the other hand, in scenario S4, where the drone closely follows the intruder, trackers exhibit a significant performance boost as the target remains within the drone's field of view for most of the time. TATrack-Base38426 demonstrates the best performance in S4, surpassing the second best-performing tracker, SuperDiMP26, by 3.13%.

Crowd density performance

The evaluation of trackers on different CDs, including sparse, medium, and compact, provides valuable insights into their capabilities and limitations in real-world surveillance scenarios. It allows for a comprehensive assessment of their performance in handling diverse crowd configurations, distinguishing targets from the background, and coping with OCC. Understanding the specific challenges and limitations associated with each density enables researchers and developers to enhance the trackers’ capabilities and address density-specific obstacles. Furthermore, benchmarking and comparing tracker performance across different densities facilitate informed decision-making for selecting suitable trackers based on specific CD requirements.

As shown in Table7, there is a noticeable decline in performance among various trackers, such as SLT-TrDiMP36, SeqTrack-B25629, and AiATrack34, when transitioning from sparse to medium to compact CD. These variations align with an increase in OCC and the emergence of more complex BC. In contrast, at the sparse and medium CD levels, ToMP5022 achieves impressive AUC values of 72.15% and 68.94%, respectively. In the compact CD scenario, SuperDiMP achieved a slight performance improvement of 0.82% compared to the second-best performer, TrDiMP35.

Daytime performance

The evaluation of trackers on morning and evening scenarios in SS offers multiple benefits. It provides valuable insights into their performance under varying lighting conditions, ensuring their adaptability to different times of the day. This evaluation also allows for the assessment of trackers’ robustness in handling challenges such as shadows, IV, and low light conditions specific to morning and evening environments. Additionally, it enables the analysis of potential temporal variations in tracking accuracy, aiding in the selection of trackers that exhibit consistent performance throughout the day.

As indicated in Table8, all trackers performed better in the evening compared with the morning. A possible reason for a tracker performing worse in videos captured in the morning compared to the evening could be variations in lighting conditions. In the morning, the lighting may be softer, with lower contrast and potentially more shadows, making it challenging for the tracker to accurately detect and track objects. Additionally, the angle and intensity of sunlight can change throughout the day, leading to different levels of illumination and potential glare in morning videos. These variations in lighting conditions can affect the quality and reliability of visual features used by the tracker, resulting in decreased performance.

TrDiMP24 demonstrates the best performance in evening videos, surpassing the second best-performing tracker SeqTrack-B25629 by 4.45%. Moreover, the best-performing tracker in videos captured in the morning is TATrack-Base38426 with a performance improvement of 2.32% compared to the second best-performing tracker AiATrack34.

Attribute-wise performance

In order to comprehensively evaluate the performance of various tracking algorithms, we evaluated ten finetuned trackers on 17 attributes using the D-PTUAC testing set. The results of this evaluation are presented in Table9. For better visualisation, we plot the top nine unique attributes in our D-PTUAC dataset against the performance of the chosen trackers as depicted in Fig.11.

Fig. 11

Visual comparison of attribute-wise results of the finetuned trackers for nine unique attributes on the D-PTUAC testing set using (a) SR (%), (b) PR (%), and (c) NPR (%). Please zoom for better clarity.

TATrack-Base38426 and AiATrack34 demonstrate effective mitigation of challenges such as AAC, BC, FM, LT, ST, POC, FOC, PV, ROT, and SV when compared to other algorithms. It is worth highlighting that these challenges, particularly BC, UF, and OCC, significantly impact the quality of object features. The aforementioned trackers successfully address these challenges, demonstrating significant improvements in the representation, discrimination, and localization abilities required for tracking multi-scale uniform appearance objects.

However, upon further analysis of the performance on attributes related to UF, BC, IV, and OV challenges, we observe that there remains a significant gap in effectively addressing challenges associated with tracking multi-scale uniform appearance objects. Consequently, further research and development efforts are necessary to overcome the challenges associated with tracking multi-scale uniform appearance objects in real-world applications.

Qualitative evaluation

For a qualitative assessment of the different trackers and to provide insights for future research, we present in Fig.12 the qualitative evaluation results of six representative trackers: RTS28, SeqTrackL-38429, TATrack-Base38426, DiMP5024, ToMP5022, and MixFormer-CvT37. Fig.12 shows five tracking scenarios with attributes such as AAC, ARC, SV, OCC, BC, LR, FM, MB, ROT, different CD levels, and different SS. Furthermore, to facilitate observation, we have enlarged the regions containing the target objects and presented them on the right side of the original images. In the D-PTUAC dataset, sequences often exhibit multiple challenge attributes, posing significant difficulties for tracking multi-scale uniform appearance objects and leading to frequent failures of current SOTA trackers. For instance, the S1, S2, and S3 sequences in Fig.12a–c present challenges such as ARC, FM, MB, OCC, SV, BC, LR, and UF, which pose considerable difficulties for existing trackers.

Fig. 12

Qualitative evaluation of six representative trackers. To enhance visibility, we have magnified the object regions and presented them on the right side of the original images. The enlarged regions are shown for the following examples: (a) S1, (b) S2, (c) S3, (d) S4, and (e) S5.

Additionally, we present some examples of failed cases for SOTA trackers on the D-PTUAC dataset in Fig.12d,e. These failed cases involve various challenge attributes, including AAC, ARC, SV, BC, MB, FM, OCC, LR, and UF. The challenging attribute of FM can cause the target to move beyond the trackers' search area. Although some re-detection tracking algorithms are capable of addressing such problems, trackers often struggle to track the multi-scale uniform appearance object due to the lack of sufficient appearance information and interference from the BC. MB, often accompanied by FM and camera motion, further degrades the quality of the feature representation. Moreover, as shown in Fig.12d,e, the challenge attribute of OCC frequently results in model drift and targets moving beyond the search area. In summary, the main reasons for the failure of trackers on the D-PTUAC dataset can be attributed to (1) the LR, UF, and limited informative content of multi-scale uniform appearance objects, which hinder effective feature extraction and precise target localization, and (2) the presence of multiple challenge attributes within the same video sequence, posing substantial challenges for tracking methods.

Usage Notes

The D-PTUAC dataset19 is publicly accessible on Figshare at (https://doi.org/10.6084/m9.figshare.24590568.v2). The dataset is offered for unrestricted use, permitting users to freely copy, share, and distribute the data in any format or medium; users may also adapt, remix, transform, and build upon the material. To foster reproducibility, the predicted bounding boxes and finetuned weights of the visual object trackers are also available on Figshare at (https://doi.org/10.6084/m9.figshare.24590268.v2)38. Both the dataset and the evaluation scripts are licensed under the Creative Commons “Attribution 4.0 International” license, which can be reviewed at (https://creativecommons.org/licenses/by/4.0/).

Code availability

The transformation of RGB frames into estimated monocular depth frames was conducted using models from the works cited in20,39. These models are available at (https://github.com/isl-org/MiDaS). In a similar manner, the generation of segmentation masks from RGB frames was accomplished using the SAM model21, accessible from (https://github.com/facebookresearch/segment-anything). Additionally, the VOT algorithms employed in this study were sourced from the official codes provided by the respective authors, as detailed in Table10.

For ease of access and utilization, all relevant codes, finetuned models, and predicted bounding boxes of the visual object trackers have been collated on our project’s Figshare page. These resources can be accessed via the following links: (https://doi.org/10.6084/m9.figshare.24590268.v2)38, and (https://github.com/HamadYA/D-PTUAC).

References

  1. Wu, X. et al. Deep learning for unmanned aerial vehicle-based object detection and tracking: A survey. IEEE Geoscience and RS Magazine 10, 91–124 (2021).

  2. Portmann, J. et al. People detection and tracking from aerial thermal views. In 2014 IEEE ICRA, 1794–1800 (IEEE, 2014).

  3. Mishra, B. et al. Drone-surveillance for search and rescue in natural disaster. Computer Communications 156, 1–10 (2020).

  4. Kyrarini, M. et al. A survey of robots in healthcare. Technologies 9, 8 (2021).

  5. Kim, S. J. et al. Drone-aided healthcare services for patients with chronic diseases in rural areas. Journal of Intelligent & Robotic Systems 88, 163–180 (2017).

  6. Chen, F. et al. Visual object tracking: A survey. Computer Vision and Image Understanding 222, 103508 (2022).

  7. Javed, S. et al. Visual object tracking with discriminative filters and siamese networks: A survey and outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 6552–6574, https://doi.org/10.1109/TPAMI.2022.3212594 (2023).

  8. Zhu, P. et al. Multi-drone-based single object tracking with agent sharing network. IEEE Trans. on CSVT 31, 4058–4070, https://doi.org/10.1109/TCSVT.2020.3045747 (2021).

  9. Mengistu, A. D. & Alemayehu, D. M. Robot for visual object tracking based on artificial neural network. International Journal of Robotics Research and Development (IJRRD) 6, 1–6 (2016).

  10. Islam, M. J. et al. Person-following by autonomous robots: A categorical overview. The International Journal of Robotics Research 38, 1581–1618 (2019).

  11. Mueller, M. et al. A benchmark and simulator for uav tracking. In ECCV, 445–461 (Springer, 2016).

  12. Wu, Y. et al. Object tracking benchmark. IEEE Trans. on PAMI 37, 1834–1848, https://doi.org/10.1109/TPAMI.2014.2388226 (2015).

  13. Kristan, M. et al. The sixth visual object tracking vot2018 challenge results. In ECCV, 0–0 (2018).

  14. Muller, M. et al. Trackingnet: A large-scale dataset and benchmark for object tracking in the wild. In ECCV, 300–317 (2018).

  15. Fan, H. et al. Lasot: A high-quality benchmark for large-scale single object tracking. In CVPR, 5374–5383 (2019).

  16. Huang, L. et al. Got-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Trans. on PAMI 43, 1562–1577 (2019).

  17. Zhu, Y. et al. Tiny object tracking: A large-scale dataset and a baseline. IEEE Trans. on NNLS 1–15, https://doi.org/10.1109/TNNLS.2023.3239529 (2023).

  18. Zhang, X. et al. Robot-person tracking in uniform appearance scenarios: A new dataset and challenges. IEEE Trans. on Human-Machine Systems 1–11, https://doi.org/10.1109/THMS.2023.3247000 (2023).

  19. Alansari, M. et al. Drone-Person Tracking in Uniform Appearance Crowd (D-PTUAC), Figshare, https://doi.org/10.6084/m9.figshare.24590568.v2 (2023).

  20. Ranftl, R. et al. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. on PAMI 44, 1623–1637 (2020).

  21. Kirillov, A. et al. Segment anything. arXiv preprint arXiv:2304.02643 (2023).

  22. Mayer, C. et al. Transforming model prediction for tracking. In CVPR, 8731–8740 (2022).

  23. Danelljan, M. et al. Atom: Accurate tracking by overlap maximization. In CVPR, 4660–4669 (2019).

  24. Bhat, G. et al. Learning discriminative model prediction for tracking. In ICCVW, 6182–6191 (2019).

  25. Yan, B. et al. Learning spatio-temporal transformer for visual tracking. In ICCVW, 10448–10457 (2021).

  26. He, K. et al. Target-aware tracking with long-term context attention. arXiv preprint arXiv:2302.13840 (2023).

  27. Yan, S. et al. Depthtrack: Unveiling the power of rgbd tracking. In ICCVW, 10725–10733 (2021).

  28. Paul, M. et al. Robust visual tracking by segmentation. In ECCV, 571–588 (Springer, 2022).

  29. Chen, X. et al. Seqtrack: Sequence to sequence learning for visual object tracking. In CVPR, 14572–14581 (2023).

  30. Zhao, M. et al. Trtr: Visual tracking with transformer. arXiv preprint arXiv:2105.03817 (2021).

  31. Chen, X. et al. Transformer tracking. In CVPR, 8126–8135 (2021).

  32. Kristan, M. et al. Pytracking. https://github.com/visionml/pytracking (2021).

  33. Danelljan, M. et al. Probabilistic regression for visual tracking. In CVPR, 7181–7190, https://doi.org/10.1109/CVPR42600.2020.00721 (2020).

  34. Gao, S. et al. Aiatrack: Attention in attention for transformer visual tracking. In ECCV, 146–164 (Springer, 2022).

  35. Wang, N. et al. Transformer meets tracker: Exploiting temporal context for robust visual tracking. In CVPR, 1571–1580, https://doi.org/10.1109/CVPR46437.2021.00162 (2021).

  36. Kim, M. et al. Towards sequence-level training for visual tracking. In ECCV, 534–551 (Springer, 2022).

  37. Cui, Y. et al. Mixformer: End-to-end tracking with iterative mixed attention. In CVPR, 13608–13618 (2022).

  38. Alansari, M. D-PTUAC Evaluation Scripts, Figshare, https://doi.org/10.6084/m9.figshare.24590268.v2 (2023).

  39. Ranftl, R. et al. Vision transformers for dense prediction. ArXiv preprint (2021).

  40. Bhat, G. et al. Know your surroundings: Exploiting scene information for object tracking. In ECCV, 205–221 (Springer, 2020).

  41. Ye, B. et al. Joint feature learning and relation modeling for tracking: A one-stream framework. In ECCV, 341–357 (Springer, 2022).

  42. Danelljan, M. et al. Eco: Efficient convolution operators for tracking. In CVPR, 6931–6939, https://doi.org/10.1109/CVPR.2017.733 (2017).

  43. Blatter, P. et al. Efficient visual tracking with exemplar transformers. In WACV, 1571–1581 (2023).

  44. Bhat, G. et al. Learning what to learn for video object segmentation. In ECCV, 777–794 (Springer, 2020).

  45. Wu, Q. et al. Dropmae: Masked autoencoders with spatial-attention dropout for tracking tasks. In CVPR, 14561–14571 (2023).

  46. Mayer, C. et al. Learning target candidate association to keep track of what not to track. In ICCV, 13424–13434, https://doi.org/10.1109/ICCV48922.2021.01319 (2021).

  47. Chen, Y.-H. et al. Neighbortrack: Improving single object tracking by bipartite matching with neighbor tracklets. arXiv preprint arXiv:2211.06663 (2022).

Acknowledgements

This work was supported by the Khalifa University of Science and Technology under Award RC1-2018-KUCARS.

Author information

Authors and Affiliations

  1. Department of Electrical Engineering and Computer Science, Abu Dhabi, 00000, UAE

    Mohamad Alansari, Sara Alansari, Sajid Javed, Abdulhadi Shoufan & Naoufel Werghi

  2. Department of Aerospace Engineering, Abu Dhabi, 00000, UAE

    Oussama Abdul Hay & Yahya Zweiri

  3. Advanced Research and Innovation Center (ARIC), Abu Dhabi, 00000, UAE

    Oussama Abdul Hay & Yahya Zweiri

  4. Center for Autonomous Robotic Systems, Abu Dhabi, 00000, UAE

    Sajid Javed & Naoufel Werghi

  5. Center for Cyber-Physical Systems, Abu Dhabi, 00000, UAE

    Abdulhadi Shoufan & Naoufel Werghi

Authors

  1. Mohamad Alansari

  2. Oussama Abdul Hay

  3. Sara Alansari

  4. Sajid Javed

  5. Abdulhadi Shoufan

  6. Yahya Zweiri

  7. Naoufel Werghi

Contributions

Conceptualization, M.A., O.A.H.; Data Curation, M.A.; Experiments Conception, M.A., O.A.H., S.A.; Experiments Conduction, M.A., O.A.H., S.A.; Formal Analysis, M.A.; Investigation, M.A.; Methodology, M.A., O.A.H.; Results Analysis, M.A.; Software, M.A., O.A.H., S.A.; Supervision, S.J., A.S., Y.Z., and N.W.; Visualization, M.A., O.A.H., S.A.; Website, S.A.; Writing, M.A., O.A.H., S.A.; All authors reviewed the manuscript.

Corresponding author

Correspondence to Mohamad Alansari.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Alansari, M., Abdul Hay, O., Alansari, S. et al. Drone-Person Tracking in Uniform Appearance Crowd: A New Dataset. Sci Data 11, 15 (2024). https://doi.org/10.1038/s41597-023-02810-y

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1038/s41597-023-02810-y
