How to (Virtually) Train Your Speaker Localizer

Prerak Srivastava; Antoine Deleforge; Archontis Politis; Emmanuel Vincent

Conference paper, Year: 2023

Prerak Srivastava

Role: Author
PersonId: 1106538

Speech Modeling for Facilitating Oral-Based Communication

Antoine Deleforge

Role: Author
PersonId: 10056
IdHAL: antoine-deleforge
ORCID: 0000-0003-0339-7472
IdRef: 184451205

Speech Modeling for Facilitating Oral-Based Communication

Archontis Politis

Role: Author
PersonId: 1031986

University of Tampere [Finland]

Emmanuel Vincent

Role: Author
PersonId: 1256
IdHAL: emmanuelv
ORCID: 0000-0002-0183-7289
IdRef: 089360176

Speech Modeling for Facilitating Oral-Based Communication

Abstract

Learning-based methods have become ubiquitous in speaker localization. Existing systems rely on simulated training sets because sufficiently large, diverse, and annotated real datasets are lacking. Most room acoustics simulators used for this purpose rely on the image source method (ISM) because of its computational efficiency. This paper argues that carefully extending the ISM to incorporate more realistic surface, source and microphone responses into training sets can significantly boost the real-world performance of speaker localization systems. It is shown that increasing the training-set realism of a state-of-the-art direction-of-arrival estimator yields consistent improvements across three different real test sets featuring human speakers in a variety of rooms and various microphone arrays. An ablation study further reveals that every added layer of realism contributes positively to these improvements.
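
For readers unfamiliar with the image source method mentioned in the abstract, the following is a minimal sketch of its simplest form, not the extended simulator the paper proposes. It assumes a shoebox room with one corner at the origin, a single frequency-independent reflection coefficient shared by all walls, and omnidirectional source and microphone, which are exactly the simplifications the paper argues should be relaxed. The function names, the speed of sound, and the default coefficient of 0.8 are illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def first_order_images(source, room_dims):
    """Mirror `source` across each of the 6 walls of a shoebox room.

    The room is assumed to span [0, Lx] x [0, Ly] x [0, Lz].
    Returns an array of shape (6, 3).
    """
    source = np.asarray(source, dtype=float)
    room_dims = np.asarray(room_dims, dtype=float)
    images = []
    for axis in range(3):
        for wall in (0.0, room_dims[axis]):
            image = source.copy()
            # Reflection across the plane x_axis = wall
            image[axis] = 2.0 * wall - source[axis]
            images.append(image)
    return np.array(images)


def early_echoes(source, mic, room_dims, reflection_coeff=0.8):
    """Delays (s) and amplitudes of the direct path and first-order echoes.

    Index 0 is the direct path; indices 1..6 are the wall reflections.
    Amplitudes use spherical spreading 1 / (4 * pi * distance) and one
    hypothetical reflection coefficient for every wall.
    """
    source = np.asarray(source, dtype=float)
    mic = np.asarray(mic, dtype=float)
    positions = np.vstack([source, first_order_images(source, room_dims)])
    distances = np.linalg.norm(positions - mic, axis=1)
    delays = distances / SPEED_OF_SOUND
    gains = np.ones(len(positions))
    gains[1:] = reflection_coeff  # one wall bounce for each image
    amplitudes = gains / (4.0 * np.pi * distances)
    return delays, amplitudes
```

Calling `early_echoes([1.0, 2.0, 1.5], [3.0, 1.0, 1.2], [4.0, 5.0, 3.0])` returns seven delay and amplitude pairs. A full simulator would recurse to higher reflection orders and, following the paper's argument, replace the scalar coefficient and omnidirectional assumption with measured surface, source, and microphone responses.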

Keywords

localization; direction-of-arrival; image source; directivity; room acoustic simulation

Domains

Signal and Image Processing [eess.SP]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]; Sound [cs.SD]

Main file

INTERSPEECH_2023-CR.pdf (193.19 KB)

Origin	Files produced by the author(s)
License	Attribution

Contributor: Prerak Srivastava

https://hal.science/hal-03855912

Submitted on: Thursday, May 25, 2023, 16:40:44

Last modified on: Thursday, December 12, 2024, 11:27:40

Dates and versions

hal-03855912 , version 1 (21-11-2022)

hal-03855912 , version 2 (30-11-2022)

hal-03855912 , version 3 (25-05-2023)

License

Attribution

Identifiers

HAL Id: hal-03855912, version 3

Cite

Prerak Srivastava, Antoine Deleforge, Archontis Politis, Emmanuel Vincent. How to (Virtually) Train Your Speaker Localizer. INTERSPEECH 2023, Aug 2023, Dublin, Ireland. ⟨hal-03855912v3⟩

Collections

CNRS INRIA GRID5000 UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD SILECS ANR
