Conference paper, Year: 2024

Complex question generation using discourse-based data augmentation


Question Generation (QG), the task of generating meaningful questions from a given context, has proven useful for several tasks, such as question answering and FAQ generation. While most existing QG techniques generate simple, fact-based questions, this research aims to generate questions that can have complex answers (e.g., "why" questions). We propose a data augmentation method that uses discourse relations to create such questions, and we experiment on existing English data. Our approach generates questions based solely on the context, without answer supervision, in order to enhance question diversity and complexity. We use an encoder-decoder model trained on the augmented dataset to generate either one question or multiple questions at a time, and show that the latter improves over the baseline model in a human quality evaluation, without degrading performance on standard automated metrics.
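The abstract does not describe the augmentation procedure in detail, so the following is only a toy sketch of the general idea, not the authors' method. It assumes that an explicit causal connective ("because") marks a discourse relation between an effect clause and a cause clause, and that a "why" question can be templated from the effect clause. The connective list, the question template, and the function names `split_on_causal_connective` and `make_why_pair` are all hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical connective list: a single explicit causal marker, for illustration only.
CAUSAL_CONNECTIVES = ("because",)


def split_on_causal_connective(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (effect, cause) if a causal connective occurs mid-sentence, else None."""
    text = sentence.strip().rstrip(".")
    for conn in CAUSAL_CONNECTIVES:
        match = re.search(rf"\s+{re.escape(conn)}\s+", text, flags=re.IGNORECASE)
        if match:
            effect = text[: match.start()].strip(" ,")
            cause = text[match.end():].strip(" ,")
            if effect and cause:
                return effect, cause
    return None


def make_why_pair(sentence: str) -> Optional[Dict[str, str]]:
    """Build a (context, question) training pair from one causally marked sentence.

    No answer is stored, mirroring the abstract's "without answer supervision".
    """
    parts = split_on_causal_connective(sentence)
    if parts is None:
        return None
    effect, _cause = parts
    # Naive lowercasing of the first character; this would mangle proper nouns.
    effect = effect[0].lower() + effect[1:]
    return {"context": sentence, "question": f"Why is it the case that {effect}?"}


if __name__ == "__main__":
    print(make_why_pair("The flight was delayed because a storm closed the runway."))
```

Pairs produced this way could be added to an existing QG training set as extra (context, question) examples. A real system would presumably rely on a discourse parser or annotated discourse relations rather than a single surface connective.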
Main file
2024.codi-1.10.pdf (250.36 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-04598235, version 1 (04-06-2024)




Kushnur Binte, Philippe Muller, Chloé Braud. Complex question generation using discourse-based data augmentation. Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024), ACL, Mar 2024, Malta, Malta. ⟨hal-04598235⟩