S. Abiteboul, P. Buneman, and D. Suciu, Data on the Web: From Relations to Semistructured Data and XML, 2001.

E. Agichtein and V. Ganti, Mining reference tables for automatic text segmentation, Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pp.20-29, 2004.
DOI : 10.1145/1014052.1014058

H. Ao and T. Takagi, ALICE: An Algorithm to Extract Abbreviations from MEDLINE, Journal of the American Medical Informatics Association, vol.12, issue.5, pp.576-586, 2005.
DOI : 10.1197/jamia.M1757

H. Déjean and J. Meunier, A System for Converting PDF Documents into Structured XML Format, Document Analysis Systems VII, pp.129-140, 2006.
DOI : 10.1007/11669487_12

N. Fuhr and J. Kamps, Focused Access to XML Documents: 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX, 2007.

J. Han and M. Kamber, Data Mining, 2006.
DOI : 10.1007/978-1-4899-7993-3_104-2

T. Herawan and M. M. Deris, A soft set approach for association rules mining, Knowledge-Based Systems, pp.186-195, 2011.

M. Juganaru-Mathieu and S. G. Brambila, Projet d'exploration et extraction de connaissances sur la pollution de l'air depuis une collection de documents publiques [Exploration and knowledge extraction project on air pollution from a collection of public documents], STIC & Environnement 2011, pp.327-332, 2011.
URL : https://hal.archives-ouvertes.fr/emse-00675307

S.-T. Li and L.-Y. Shue, Data mining to aid policy making in air pollution management, Expert Systems with Applications, vol.27, pp.331-340, 2004.

Y. Ma, M. Richards, M. Ghanem, Y. Guo, and J. Hassard, Air Pollution Monitoring and Mining Based on Sensor Grid in London, Sensors, vol.8, issue.6, pp.3601-3623, 2008.
DOI : 10.3390/s80603601

M. McCandless, E. Hatcher, and O. Gospodnetic, Lucene in Action, 2010.

C. Medina-Ramírez, La web semántica en el medio ambiente: necesidad de una ontología de dominio [The Semantic Web in the environment: the need for a domain ontology], 2009.

Y. Park and R. J. Byrd, Hybrid text mining for finding abbreviations and their definitions, Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, pp.126-133, 2001.

M. Richards, M. Ghanem, M. Osmond, Y. Guo, and J. Hassard, Grid-based analysis of air pollution data, Ecological Modelling, vol.194, issue.1-3, pp.274-286, 2006.
DOI : 10.1016/j.ecolmodel.2005.10.042

S. K. Sahu and K. S. Bakar, A comparison of Bayesian models for daily ozone concentration levels, Statistical Methodology, vol.9, issue.1-2, pp.144-157, 2012.
DOI : 10.1016/j.stamet.2011.04.009

H. Schmid, Probabilistic part-of-speech tagging using decision trees, Proceedings of International Conference on New Methods in Language Processing, pp.44-49, 1994.

C. Temiyasathit, S. B. Kim, and S. Park, Spatial prediction of ozone concentration profiles, Computational Statistics & Data Analysis, vol.53, issue.11, pp.3892-3906, 2009.
DOI : 10.1016/j.csda.2009.03.027

Y. Wang, D. J. DeWitt, and J. Cai, X-Diff: an effective change detection algorithm for XML documents, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405), pp.519-530, 2003.
DOI : 10.1109/ICDE.2003.1260818