Automatic Image Annotation for Mapped Features Detection
Detecting road features is a key enabler for autonomous driving and localization. For instance, reliable detection of poles, which are widespread in road environments, can improve localization. Modern deep learning-based perception systems need a significant amount of annotated data. Automatic annotation avoids time-consuming and costly manual annotation. Because automatic methods are prone to errors, managing annotation uncertainty is crucial to ensure a proper learning process. Fusing multiple annotation sources on the same dataset can be an efficient way to reduce errors. This not only improves the quality of annotations, but also improves the learning of perception models. In this paper, we consider the fusion of three automatic annotation methods in images: feature projection from a high-accuracy vector map combined with a lidar, image segmentation, and lidar segmentation. Our experimental results demonstrate the significant benefits of multi-modal automatic annotation for pole detection through a comparative evaluation on manually annotated images. Finally, the resulting multi-modal fusion is used to fine-tune an object detection model for pole base detection using unlabeled data, showing overall improvements achieved by enhancing network specialization. The dataset is publicly available.
This paper is available on arXiv.
The results are demonstrated in a video available on YouTube.
The video showcases a segment of a driving sequence from the dataset used in this study and provides a detailed presentation of the results.
Initially, the video highlights the automatic annotations generated by the three methods proposed in the paper. It then demonstrates the process of merging these annotation sources, using differently colored crosses to indicate the level of consensus among the methods. Specifically, annotations validated by all methods are distinguished from ambiguous ones.
Next, the video illustrates how black patches were added to address uncertainties in the annotations. Finally, it presents the results of pole base detection using a YOLOv7 neural network. This network was trained on high-consensus automatic annotations, with the input images modified to mask ambiguous objects.
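The fusion and masking steps described above can be sketched in a few lines of Python. This is an illustrative approximation, not the paper's exact algorithm: it assumes each annotation source yields pole-base points as (x, y) pixel coordinates, and the matching radius and patch size are hypothetical parameters chosen here for demonstration.

```python
import numpy as np

def fuse_annotations(sources, radius=10.0):
    """Greedily match pole-base points across annotation sources.

    sources: one list of (x, y) points per annotation method.
    Returns (consensus, ambiguous): cluster centres confirmed by every
    source vs. those missed by at least one source.
    """
    pts = [(x, y, s) for s, plist in enumerate(sources) for (x, y) in plist]
    used = [False] * len(pts)
    consensus, ambiguous = [], []
    for i, (x, y, s) in enumerate(pts):
        if used[i]:
            continue
        cluster, seen = [i], {s}
        for j in range(i + 1, len(pts)):
            xj, yj, sj = pts[j]
            # Match at most one point per source, within the pixel radius.
            if not used[j] and sj not in seen and np.hypot(x - xj, y - yj) <= radius:
                cluster.append(j)
                seen.add(sj)
        for j in cluster:
            used[j] = True
        centre = tuple(np.mean([pts[j][:2] for j in cluster], axis=0))
        (consensus if len(seen) == len(sources) else ambiguous).append(centre)
    return consensus, ambiguous

def mask_ambiguous(image, ambiguous, half=16):
    """Black out a square patch around each ambiguous annotation."""
    out = image.copy()
    h, w = out.shape[:2]
    for (x, y) in ambiguous:
        x, y = int(round(x)), int(round(y))
        out[max(0, y - half):min(h, y + half),
            max(0, x - half):min(w, x + half)] = 0
    return out
```

Under this sketch, only high-consensus points would be kept as training labels, while the masked patches prevent the detector from being penalized (or wrongly supervised) on objects whose annotation status is uncertain.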
Citation
@inproceedings{noizet2024,
author = {Noizet, Maxime and Xu, Philippe and Bonnifait, Philippe},
title = {Automatic {Image} {Annotation} for {Mapped} {Features}
{Detection}},
booktitle = {2024 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS)},
  pages = {9367--9373},
date = {2024-10-16},
url = {https://ieeexplore.ieee.org/document/10801773},
doi = {10.1109/IROS58592.2024.10801773},
langid = {en},
abstract = {Detecting road features is a key enabler for autonomous
driving and localization. For instance, a reliable detection of
poles which are widespread in road environments can improve
localization. Modern deep learning-based perception systems need a
significant amount of annotated data. Automatic annotation avoids
time-consuming and costly manual annotation. Because automatic
methods are prone to errors, managing annotation uncertainty is
crucial to ensure a proper learning process. Fusing multiple
annotation sources on the same dataset can be an efficient way to
reduce the errors. This not only improves the quality of
annotations, but also improves the learning of perception models. In
this paper, we consider the fusion of three automatic annotation
methods in images: feature projection from a high accuracy vector
map combined with a lidar, image segmentation and lidar
segmentation. Our experimental results demonstrate the significant
benefits of multi-modal automatic annotation for pole detection
through a comparative evaluation on manually annotated images.
Finally, the resulting multi-modal fusion is used to fine-tune an
object detection model for pole base detection using unlabeled data,
showing overall improvements achieved by enhancing network
specialization. The dataset is publicly available.}
}