Evaluation of NLP systems
Presentation
Evaluation is nowadays a central question in terms of best practices for Natural Language Engineering. Evaluation campaigns are regularly conducted on a large variety of application domains and provide useful information on the behaviour of NLP systems in real situations of use. It is regrettable, however, that most of these evaluations favour immediate results (tuning on development and test data) and do not really explain the successes and failures of the assessed systems. This is why I pursue methodological work on how to assess NLP systems.
- Man-machine dialogue and speech understanding - Definition of several test methodologies (DCR, DEFI) for the predictive evaluation of speech understanding. Based on linguistically motivated test suites, these paradigms combine objective metrics with a detailed analysis of the behaviour of speech understanding systems. They inspired the evaluation paradigm used in the MEDIA/EVALDA (2002-2005) French-speaking evaluation campaign of speech understanding systems.
- Augmentative and alternative communication - Evaluation of AAC systems from the user's point of view, beyond standard but rather artificial metrics such as the keystroke saving rate (KSR); see the first sketch after this list.
- Emotion detection and annotation - Experimental studies of the inter-coder agreement metrics currently used in NLP for the emotion annotation of speech corpora, together with an analysis of the influence of this agreement on the results of evaluation campaigns in emotion detection. In particular, this work has highlighted some limitations of the Kappa statistic for NLP tasks; see the second sketch after this list.
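
The keystroke saving rate mentioned above is a simple ratio: the proportion of keystrokes that the word predictor spares compared with typing every character by hand. Below is a minimal Python sketch of the usual formula; the function name and the counts are illustrative assumptions, not taken from any of the systems cited on this page.

    def keystroke_saving_rate(keys_without_prediction, keys_with_prediction):
        # keys_without_prediction: keystrokes needed to type the text letter by letter.
        # keys_with_prediction: keystrokes actually used with the predictor,
        # including the keys needed to select the proposed words.
        if keys_without_prediction <= 0:
            raise ValueError("keys_without_prediction must be positive")
        saved = keys_without_prediction - keys_with_prediction
        return 100.0 * saved / keys_without_prediction

    # Hypothetical session: 1000 keystrokes by hand, 550 with prediction.
    print(keystroke_saving_rate(1000, 550))  # 45.0 (percent)

A high KSR does not by itself guarantee a faster or more comfortable interaction, which is precisely why a user-centred evaluation is needed alongside the metric.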
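
One limitation usually put forward for Cohen's kappa is its sensitivity to class prevalence: on a skewed corpus where one label dominates, two coders can agree on almost every item and still obtain a kappa close to zero. The following sketch computes Cohen's kappa from its standard definition, (p_o - p_e) / (1 - p_e); the two coders and their labels are invented for illustration.

    from collections import Counter

    def cohen_kappa(labels_a, labels_b):
        # Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
        # agreement and p_e the agreement expected by chance from the
        # coders' marginal label distributions.
        n = len(labels_a)
        p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
        return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1 (single label used)

    # Invented skewed corpus: both coders agree on 96 'neutral' items and
    # disagree on the 4 items involving the rare 'emotion' label.
    a = ["neutral"] * 96 + ["emotion", "emotion", "neutral", "neutral"]
    b = ["neutral"] * 96 + ["neutral", "neutral", "emotion", "emotion"]
    print(cohen_kappa(a, b))  # raw agreement is 0.96, kappa is about -0.02

This prevalence effect is directly relevant to emotion annotation, where neutral segments typically dominate the corpus.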
Works and projects
Spoken language processing
- ESAC_IMC (Fondation Motrice, 2006-2007) - Survey of the behaviour of AAC systems with users suffering from additional language disabilities (dyslexia...).
- VOLTAIRE project (AFM, 2008-2009) - Integration of the Sibylle word predictor into the CVK / CiViKey freeware virtual keyboard; long-term evaluation of the keyboard with real users (PhD of Samuel Pouplin).
Some publications
- Jean-Yves ANTOINE, Marc LE TALLEC, Jeanne VILLANEAU (2011) Evaluation de la détection des émotions, des opinions ou des sentiments : dictature de la majorité ou respect de la diversité d'opinions ? (Evaluation of the detection of emotions, opinions or sentiments: dictatorship of the majority or respect for the diversity of opinions?) Proc. TALN'2011, Montpellier, France, July 2011. [HAL-00625727]

- Damien NOUVEL, Jean-Yves ANTOINE, Nathalie FRIBURGER, Denis MAUREL (2010) An analysis of the performances of the CasEN named entities detection system in the Ester2 evaluation campaign. Proc. 9th International Conference on Language Resources and Evaluation, LREC'2010, Valletta, Malta, May 2010. [HAL-00502370]
- Philippe BOISSIERE, Igor SCHADLE, Jean-Yves ANTOINE (2006) A methodological framework for writing assistance systems: applications to Sibylle and VITIPI systems. AMSE Journal on Modelling, Measurement & Control, Série C, Barcelona, Spain, Vol. 67, pp. 167-176.
- Laurence DEVILLERS, H. MAYNARD, P. PAROUBEK, S. ROSSET, J-Y. ANTOINE, F. BECHET, C. BOUSQUET, O. BONTRON, L. CHARNAY, K. CHOUKRI, K. McTAIT, L. ROMARY, M. VERGNES, N. VIGOUROUX (2004) The French MEDIA/EVALDA project: the evaluation of the understanding capability of Spoken Language Dialogue Systems. Proc. 4th International Conference on Language Resources and Evaluation, LREC'2004, Lisbon, Portugal.
- Jean-Yves ANTOINE, Caroline BOUSQUET-VERNHETTES, Jerome GOULIAN, Mohamed Zakaria KURDI, Sophie ROSSET, Nadine VIGOUROUX, Jeanne VILLANEAU (2002) Predictive and objective evaluation of speech understanding: the “challenge” evaluation campaign of the I3 speech workgroup of the French CNRS. Proc. 3rd International Conference on Language Resources and Evaluation, LREC'2002, Las Palmas de Gran Canaria, Spain, pp. 529-535.

- Jean-Yves ANTOINE, Jacques SIROUX, Jean CAELEN, Jeanne VILLANEAU, Jerome GOULIAN, Mohamed AHAFHAF (2000) Obtaining predictive results with an objective evaluation of spoken dialogue systems: experiments with the DCR assessment paradigm. Proc. 2nd International Conference on Language Resources and Evaluation, LREC'2000, Athens, Greece.