SASASQ: An Automatic Supervised Learning System for Question Classification
Abstract
Most question answering systems are built around three main stages: question classification and analysis, document retrieval, and answer extraction. The performance of each stage affects the final result. Question classification is an important task because it determines the type of the expected answer. In this paper, we present a method for improving classifier performance that combines linguistic analysis (semantic, syntactic, and morphological) with statistical approaches, guided by a layered semantic hierarchy of fine-grained question types. Specifically, we propose two methods of question expansion. The first adds, for each word, the synonyms that match its contextual sense; the second adds a higher-level representation (a "generalisation") for nouns. Various document representation features, term frequency weightings, and machine learning algorithms are studied. Experiments conducted on real data show an improvement in the precision of question classification.
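To make the two expansion methods concrete, the following is a minimal sketch and not the system's actual implementation. It assumes two hypothetical toy lexicons, `SYNONYMS` and `GENERALISATIONS`, in place of whatever lexical resources the system uses, and it skips the contextual sense selection step entirely: every listed synonym is added regardless of context. Nouns are identified only by their presence in `GENERALISATIONS`, which is also an assumption made for illustration.

```python
from typing import Dict, List

# Hypothetical toy lexicons, for illustration only.
SYNONYMS: Dict[str, List[str]] = {
    "city": ["town"],
    "largest": ["biggest"],
}
# Maps a noun to a higher-level category (its "generalisation").
GENERALISATIONS: Dict[str, str] = {
    "city": "location",
    "river": "location",
    "president": "person",
}

PUNCTUATION = "?.,!"


def tokenize(question: str) -> List[str]:
    """Lowercase the question and split on whitespace, dropping punctuation."""
    tokens = [tok.strip(PUNCTUATION).lower() for tok in question.split()]
    return [tok for tok in tokens if tok]


def expand_question(question: str) -> List[str]:
    """Return the original tokens followed by the expansion terms.

    Expansion 1: append the listed synonyms of each word
                 (no sense disambiguation in this sketch).
    Expansion 2: append the generalisation of each known noun.
    """
    tokens = tokenize(question)
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(SYNONYMS.get(tok, []))
        if tok in GENERALISATIONS:
            expanded.append(GENERALISATIONS[tok])
    return expanded


print(expand_question("What is the largest city?"))
# ['what', 'is', 'the', 'largest', 'city', 'biggest', 'town', 'location']
```

The expanded token list could then be fed to any bag-of-words representation before training a classifier; which representation and learner work best is exactly what the experiments compare.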