Agata Savary - LANGUAGE RESOURCES
This page lists the language resources in whose creation I have been involved.
- PARSEME corpus - annotated corpora and tools of the PARSEME shared task on automatic identification of verbal multiword expressions (edition 1.0) (various flavors of the CC BY license), 18 languages
- Prolexbase - multilingual database and ontology of proper names (CC-BY SA license), mostly French, Polish and English
- MweLitRead - a dataset of Polish MWEs and their literal readings (CC-BY SA license)
- SEJF - Grammatical Lexicon of Polish Phraseology (CC-BY SA license)
- SEJFEK - Grammatical Lexicon of Polish Economic Phraseology and its lexicalized shallow grammar version SEJFEK4Spejd (CC-BY SA license)
- Składnica-MWEs - Polish constituency treebank Składnica enriched with MWE annotations; the annotations result from an automatic mapping of 3 MWE resources, followed by manual validation; over 2,000 MWEs are annotated in about 9,000 trees (GPL v3 license); see the reference paper for more details.
- PNEG - Gazetteer for Polish Named Entities (2-clause BSD license)
- PNET - Triggers for Polish Named Entities (2-clause BSD license)
- SAWA - Grammatical Lexicon of Warsaw Urban Proper Names (CC-BY SA license)
- NKJP - named entity annotation layer of the National Corpus of Polish (GNU GPL v.3 license)
- PCC - Polish Coreference Corpus (CC BY v.3 license)
This resource is the result of a pilot study on the lexical description of multi-word expressions (MWEs) in four Spanish dialects from Latin America (Colombia, Costa Rica, Peru and Mexico). It is an outcome of the Business Intelligence Seminar student project carried out within the IT4BI Erasmus Mundus master program. It is available under the 2-clause BSD license.
- XML database containing the lexical description of 100 Spanish MWEs in four dialects, together with their generic and dialect-specific properties (meaning, dialect, language register, passivization, partial inflection, etc.)
- XML schema for the database
- 255 examples of MWEs in four dialects, with word-by-word translations, idiomatic readings and some examples of usage [.xlsx.zip]
- Arauco, A., Bogantes, D., Rodríguez, A., Rodríguez, E. (2015) Representation and Identification of Multiword Expressions in Different Spanish Dialects, Technical Report 314, Laboratoire d'informatique, François Rabelais University of Tours, France [bibtex].
- Bogantes, D., Rodríguez, E., Arauco, A., Rodríguez, A., Savary, A. (2015) Towards Lexical Encoding of Multiword Expressions in Spanish Dialects, in the PARSEME 5th general meeting, 23-24 September 2015, Iași, Romania (poster) [bibtex].
- Bogantes, D., Rodríguez, E., Arauco, A., Rodríguez, A., Savary, A. (2016) Towards Lexical Encoding of Multiword Expressions in Spanish Dialects, in the Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC'16), 23-28 May 2016, Portorož, Slovenia.