Jean-Yves ANTOINE


Evaluation of NLP systems

Presentation


Evaluation is nowadays a central question of best practice in Natural Language Engineering. Evaluation campaigns are regularly conducted in a large variety of application domains and provide useful information on the behaviour of NLP systems in real situations of use. One may however regret that most of these evaluations favour immediate results (tuning on development and test data) and do not really explain the successes and failures of the assessed systems. This is why I conduct methodological work on how to assess NLP systems.
  • Man-machine dialogue and speech understanding - Definition of several test methodologies (DCR, DEFI) for the predictive evaluation of speech understanding. Based on linguistically motivated test suites, these paradigms combine objective metrics with a detailed analysis of the behaviour of speech understanding systems. They inspired the evaluation paradigm used in the MEDIA/EVALDA (2002-2005) French-speaking evaluation campaign of speech understanding systems.
  • Augmentative and alternative communication - Evaluation of AAC systems from the user's point of view, beyond standard but rather artificial metrics such as the keystroke saving rate (KSR).
  • Emotion detection and annotation - Experimental studies of the inter-coder agreement metrics currently used in NLP for the emotion annotation of speech corpora, as well as analysis of the influence of this agreement on the results of evaluation campaigns in emotion detection. In particular, this work has highlighted some limitations of the kappa statistic for NLP tasks.
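The keystroke saving rate mentioned above is conventionally the fraction of keystrokes spared by the prediction system relative to typing every character by hand. A minimal sketch (the function name is my own, not from any cited system):

```python
def keystroke_saving_rate(keys_with_prediction: int, keys_full_text: int) -> float:
    """Keystroke saving rate (KSR): proportion of keystrokes saved by an
    AAC word-prediction system, relative to typing the full text by hand."""
    if keys_full_text <= 0:
        raise ValueError("full-text keystroke count must be positive")
    return 1.0 - keys_with_prediction / keys_full_text

# A 100-character message entered with only 40 keystrokes thanks to
# word prediction yields a KSR of 0.6, i.e. 60% of keystrokes saved.
print(keystroke_saving_rate(40, 100))
```

As the page notes, such a rate says nothing about cognitive load or perceived usability, which is why user-centred evaluation is pursued alongside it.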
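The kappa statistics discussed above correct raw inter-coder agreement for agreement expected by chance. As an illustration of the family of measures involved, here is a minimal sketch of Cohen's kappa for two coders (a standard formulation, not code from the cited studies):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items:
    (Po - Pe) / (1 - Pe), where Po is observed agreement and Pe is
    the agreement expected if the coders labelled independently."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    po = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each coder's label distribution (marginals).
    ca, cb = Counter(labels_a), Counter(labels_b)
    pe = sum(ca[lab] * cb[lab] for lab in set(ca) | set(cb)) / (n * n)
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)
```

For instance, two coders agreeing on 3 of 4 items with balanced vs. skewed label distributions can end up with a kappa of only 0.5, which hints at the sensitivity to label prevalence that makes kappa delicate for skewed phenomena such as emotion annotation.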

Works and projects



Spoken language processing

  • ARC ILOR-B2 of the AUF (1996-2000) - DCR (Demand - Control - Request) methodology for the evaluation of speech understanding, in collaboration with Jérôme ZEILIGER (ICP Grenoble), Jean CAELEN (CLIPS-IMAG, now LIG, Grenoble) and Jacques SIROUX (IRISA, Lannion).
  • 5.5 Workgroup on Speech Understanding (GDR-I3 of the CNRS; 1998-2005) - DEFI ("Contest") methodology for the evaluation of speech understanding.
  • MEDIA/EVALDA (2002-2005) evaluation campaign of French speech understanding systems.
  • Ester 2 (2009) and ETAPE (2012) evaluation campaigns: named entity recognition in broadcast speech.
Augmentative and Alternative Communication (AAC)

  • ESAC_IMC (Fondation Motrice, 2006-2007) - Survey of the behaviour of AAC systems with users suffering from additional language disabilities (e.g. dyslexia).
  • VOLTAIRE project (AFM, 2008-2009) - Integration of Sibylle word prediction into the CVK / CiViKey freeware virtual keyboard - Long-term evaluation of the keyboard with real users (PhD of Samuel Pouplin).

Some publications


  • Jean-Yves ANTOINE, Marc LE TALLEC, Jeanne VILLANEAU (2011) Evaluation de la détection des émotions, des opinions ou des sentiments : dictature de la majorité ou respect de la diversité d'opinions ? Actes TALN'2011, Montpellier, France, July 2011. [HAL-00625727]
  • Damien NOUVEL, Jean-Yves ANTOINE, Nathalie FRIBURGER, Denis MAUREL (2010) An analysis of the performances of the CasEN named entities detection system in the Ester 2 evaluation campaign. Proc. 7th International Conference on Language Resources and Evaluation, LREC'2010, Valletta, Malta, May 2010. [HAL-00502370]
  • Philippe BOISSIERE, Igor SCHADLE, Jean-Yves ANTOINE (2006) A methodological framework for writing assistance systems: applications to Sibylle and VITIPI systems. AMSE Journal on Modelling, Measurement & Control, Series C, Barcelona, Spain. Vol. 67, pp. 167-176.
  • Laurence DEVILLERS, H. MAYNARD, P. PAROUBEK, S. ROSSET, J-Y. ANTOINE, F. BECHET, C. BOUSQUET, O. BONTRON, L. CHARNAY, K. CHOUKRI, K. McTAIT, L. ROMARY, M. VERGNES, N. VIGOUROUX (2004) The French MEDIA/EVALDA project: the evaluation of the understanding capability of Spoken Language Dialogue Systems. Proc. 4th International Conference on Language Resources and Evaluation, LREC'2004, Lisbon, Portugal.
  • Jean-Yves ANTOINE, Caroline BOUSQUET-VERNHETTES, Jerome GOULIAN, Mohamed Zakaria KURDI, Sophie ROSSET, Nadine VIGOUROUX, Jeanne VILLANEAU (2002) Predictive and objective evaluation of speech understanding: the “challenge” evaluation campaign of the I3 speech workgroup of the French CNRS. Proc. 3rd International Conference on Language Resources and Evaluation, LREC'2002, Las Palmas de Gran Canaria, Spain, pp. 529-535.
  • Jean-Yves ANTOINE, Jacques SIROUX, Jean CAELEN, Jeanne VILLANEAU, Jerome GOULIAN, Mohamed AHAFHAF (2000) Obtaining predictive results with an objective evaluation of spoken dialogue systems: experiments with the DCR assessment paradigm. Proc. 2nd International Conference on Language Resources and Evaluation, LREC'2000, Athens, Greece.

Jean-Yves ANTOINE - Last update: March 17th, 2012