Conference paper

WEIR-P: An Information Extraction Pipeline for the Wastewater Domain

Abstract: We present the MeDO project, aimed at developing resources for text mining and information extraction in the wastewater domain. We developed a specific Natural Language Processing (NLP) pipeline named WEIR-P (WastewatEr InfoRmation extraction Platform) which identifies the entities and relations to be extracted from texts, pertaining to network information, wastewater treatment, accidents and works, organizations, spatio-temporal information, measures and water quality. We present and evaluate the first version of the NLP system which was developed to automate the extraction of the aforementioned annotation from texts and its integration with existing domain knowledge. The preliminary results obtained on the Montpellier corpus are encouraging and show how a mix of supervised and rule-based techniques can be used to extract useful information and reconstruct the various phases of the extension of a given wastewater network. While the NLP and Information Extraction (IE) methods used are state of the art, the novelty of our work lies in their adaptation to the domain, and in particular in the wastewater management conceptual model, which defines the relations between entities. French resources are less developed in the NLP community than English ones. The datasets obtained in this project are another original aspect of this work.
Document type:
Conference paper

https://hal.archives-ouvertes.fr/hal-03211461
Contributor: Nanée Chahinian
Submitted on: Wednesday, 28 April 2021 - 17:10:59
Last modified on: Wednesday, 15 September 2021 - 10:48:03
Long-term archiving on: Thursday, 29 July 2021 - 19:07:57

File

RCIS_MeDO.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-03211461, version 1

Citation

Nanée Chahinian, Thierry Bonnabaud La Bruyère, Francesca Frontini, Carole Delenne, Marin Julien, et al. WEIR-P: An Information Extraction Pipeline for the Wastewater Domain. Research Challenges in Information Science, May 2021, Online, Cyprus. ⟨hal-03211461⟩
