Satellite video shows great potential in fields such as target monitoring, emergency response, and intelligent sensing. Target detection and tracking in high-definition remote sensing video is crucial for improving traffic monitoring efficiency and smart city management. However, ground targets in remote sensing video are small, the signal-to-noise ratio is low, and backgrounds are complex, so discriminative features are not robust, which makes detection and tracking difficult. The competition therefore establishes a Weak Small Target Detection and Tracking track to select advanced algorithms (high detection and tracking accuracy, low computational complexity) that can accurately identify and continuously track moving objects (vehicles, aircraft, ships) in high-definition optical satellite video. The goal is to expand practical applications of remote sensing video, enhance traffic monitoring, and improve road network management efficiency.
The small object recognition track requires participants to detect and track small objects on the road, specifically vehicles, and to output bounding boxes for the detected and tracked targets. Participants are expected to train their models on the training dataset provided by the organizers and submit them. The organizers will test each algorithm's detection and tracking performance, as well as its computational efficiency, on a separate test dataset. Results will be evaluated against the scoring criteria, and the first, second, and third place winners, as well as the highest efficiency award, will be selected according to the evaluation results.
a. Preliminary round: Assesses the algorithm's weak object detection and tracking performance on the test dataset.
b. Semi-final: Assesses the algorithm's weak object detection and tracking capabilities on expanded test datasets, and evaluates its adaptability under conditions such as jitter, platform vibration, and cloud/fog interference.
c. Finals: Assesses the algorithm's overall performance on an expanded test dataset, further evaluating its adaptability under conditions such as slow, high-density traffic, platform jitter, and cloud/fog interference, as well as its operational efficiency.
Scale: A total of 50–100 remote sensing video datasets containing road traffic vehicles, together with their annotation data.
Purpose: Dynamic traffic monitoring, smart cities.
a. Raw imagery: The imagery consists of a series of consecutive satellite images, comprising three bands: R, G, and B.
b. Label data: Labels are formatted as bounding box annotations, containing frame ID, target ID, bounding box coordinates, bounding box width, and other details.
c. Participants shall not publish research findings based on this dataset without prior permission from the organizers.
Submit a .zip archive that extracts to a results folder. The results folder contains the test results for the val files.
The contents of results.zip are:
results
——1-1.txt
——1-2.txt
......
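The archive layout above could be produced with a short script such as the following. This is a minimal sketch using Python's standard zipfile module; the function name pack_results and the local folder of per-sequence TXT files are assumptions for illustration, not part of the organizers' tooling.

```python
import zipfile
from pathlib import Path


def pack_results(results_dir, zip_path):
    """Zip every .txt file in results_dir so the archive extracts to results/<name>.txt.

    pack_results is a hypothetical helper name, not an official tool.
    """
    results_dir = Path(results_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for txt in sorted(results_dir.glob("*.txt")):
            # Store each file under a top-level "results/" folder,
            # regardless of what the local directory is called.
            zf.write(txt, arcname=f"results/{txt.name}")
    return zip_path
```

Calling pack_results("results", "results.zip") on a folder holding 1-1.txt and 1-2.txt yields an archive whose entries are results/1-1.txt and results/1-2.txt.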
The content of the TXT file is as follows:
Each row contains the following fields in sequence: frame number, target ID, X-coordinate of the top-left corner of the bounding box, Y-coordinate of the top-left corner of the bounding box, width of the bounding box, height of the bounding box, target category (fixed as 1 for vehicles), -1, -1, -1.
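As an illustration of the field order, the sketch below writes one line per tracked box. The field delimiter is not stated here; a comma is assumed, as in MOTChallenge-style result files, and should be checked against the official sample files. The names format_row and write_results are hypothetical.

```python
def format_row(frame, target_id, x, y, w, h, sep=","):
    """Build one result line in the 10-field order described above.

    The comma separator is an assumption; the delimiter is not specified.
    """
    # Category is fixed at 1 (vehicle); the last three fields are fixed at -1.
    fields = [frame, target_id, x, y, w, h, 1, -1, -1, -1]
    return sep.join(str(v) for v in fields)


def write_results(path, tracks, sep=","):
    """Write an iterable of (frame, target_id, x, y, w, h) tuples to a TXT file."""
    with open(path, "w") as f:
        for frame, target_id, x, y, w, h in tracks:
            f.write(format_row(frame, target_id, x, y, w, h, sep=sep) + "\n")
```

For example, format_row(1, 3, 100, 200, 8, 6) returns the string 1,3,100,200,8,6,1,-1,-1,-1.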
score = (MOTA + IDF1) / 2
The MOTA and IDF1 metrics are used to evaluate the recognition results, each with a weight of 50%. The assessment focuses on whether weak targets can be detected and tracked in real time, and whether a large number of false alarms is generated.
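For reference, the score can be sketched from the standard definitions of the two metrics: MOTA = 1 - (FN + FP + IDSW) / GT from the CLEAR MOT metrics, and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN) from the identity measures. It is an assumption that the organizers use exactly these definitions; how counts are matched and aggregated across video sequences is also not specified here. The function names are illustrative.

```python
def mota(fn, fp, idsw, num_gt):
    """Standard MOTA: 1 - (misses + false alarms + identity switches) / ground-truth boxes."""
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp, idfp, idfn):
    """Standard IDF1: F1 score over identity-level true/false positives and false negatives."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


def final_score(mota_value, idf1_value):
    """Competition score as stated above: equal-weight average of MOTA and IDF1."""
    return (mota_value + idf1_value) / 2
```

With 100 ground-truth boxes, 10 misses, 5 false alarms, and 5 identity switches, MOTA is 0.8; with IDTP = 60, IDFP = 20, and IDFN = 20, IDF1 is 0.75, so the score is 0.775. Because false alarms enter MOTA directly, a tracker that produces many false positives can drive MOTA down sharply, even below zero.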