Preprint / Working paper. Year: 2023

Read, look and detect: Bounding box annotation from image-caption pairs

Abstract

Various methods have been proposed to detect objects while reducing the cost of data annotation. For instance, weakly supervised object detection (WSOD) methods rely only on image-level annotations during training. Data annotation nevertheless remains expensive, since annotators must list the categories present in each image and labeling is restricted to a fixed set of categories. In this paper, we propose a method to locate and label objects in an image using an even weaker form of supervision: image-caption pairs. By leveraging recent advances in vision-language (VL) models and self-supervised vision transformers (ViTs), our method performs phrase grounding and object detection in a weakly supervised manner. Our experiments demonstrate the effectiveness of the approach: it achieves a 47.51% recall@1 score for phrase grounding on Flickr30k Entities and establishes a new state of the art in object detection with 21.1 mAP@50 and 10.5 mAP@50:95 on MS COCO when relying exclusively on image-caption pairs.
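For readers unfamiliar with phrase grounding, the Python sketch below illustrates the general idea of scoring candidate boxes against a caption phrase with an off-the-shelf vision-language model (CLIP via Hugging Face transformers). It is an illustrative simplification, not the paper's method: the checkpoint name, the hard-coded box proposals, and the image path are placeholder assumptions, and the paper additionally exploits self-supervised ViT features rather than pre-supplied proposals.

# Minimal CLIP-based phrase-grounding sketch (illustrative only, not the
# paper's pipeline): crop each candidate box, embed the crops and the
# phrase with CLIP, and return the box whose crop best matches the phrase.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint; any CLIP variant available on the Hub would do.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def ground_phrase(image, phrase, boxes):
    """Return (best_box, score) for the box whose crop best matches the phrase."""
    crops = [image.crop(box) for box in boxes]  # boxes as (left, top, right, bottom)
    inputs = processor(text=[phrase], images=crops, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_text has shape (1, num_crops): phrase-to-crop similarities.
    scores = outputs.logits_per_text.softmax(dim=-1).squeeze(0)
    best = scores.argmax().item()
    return boxes[best], scores[best].item()

# Hypothetical usage: the image path and the proposals are placeholders.
image = Image.open("example.jpg").convert("RGB")
proposals = [(10, 20, 200, 220), (150, 40, 400, 300)]
box, score = ground_phrase(image, "a dog on the grass", proposals)
print(box, score)

In a weakly supervised setting like the one the abstract describes, such phrase-box matches, aggregated over many image-caption pairs, can serve as pseudo bounding-box annotations for training a detector; how the paper actually obtains and scores boxes is detailed in the PDF below.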
Main file
RLD.pdf (4.84 MB)
Origin: files produced by the author(s)

Dates and versions

hal-04121503, version 1 (07-06-2023)

Identifiers

Cite

Eduardo Hugo Sanchez. Read, look and detect: Bounding box annotation from image-caption pairs. 2023. ⟨hal-04121503⟩

Collections

IRT_SAINT-EXUPERY