Analyzing and producing natural language requires understanding what a sentence means. To meet this need, this corpus of more than 20,000 richly annotated French sentences provides a reference lexical and syntactic resource for linguists and computer scientists, particularly for use in natural language processing (NLP).



  • Natural language processing
  • Semantic web, search engines
  • Human-machine dialogue (chatbots)
  • Spellchecking
  • Machine translation
  • Language teaching


Competitive advantages

  • Quality of the corpus: annotations produced by automatic tools, then corrected by hand in several successive passes over each annotation layer
  • Available in four formats: XML (original format), Tiger-XML (the most complete format, including compound components), PTB (constituent annotations), CoNLL (dependency annotations)
  • Rich annotation: domain, author, date; compound words (and their components), 218 morpho-syntactic labels, grammatical functions and syntactic constituent trees
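As an illustration of how the CoNLL dependency format mentioned above might be consumed, here is a minimal Python sketch. It assumes the common CoNLL-X layout (ten tab-separated columns: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL, with a blank line between sentences); the corpus's actual column layout may differ. The `Token` type, the `parse_conll` function and the sample sentence with its labels are hypothetical and are not taken from the corpus.

```python
from typing import NamedTuple


class Token(NamedTuple):
    index: int   # 1-based position in the sentence
    form: str    # surface word form
    pos: str     # coarse part-of-speech tag
    head: int    # index of the syntactic head (0 = root)
    deprel: str  # dependency relation to the head


def parse_conll(text: str) -> list[list[Token]]:
    """Parse CoNLL-X style text into a list of sentences (assumed layout)."""
    sentences: list[list[Token]] = []
    current: list[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # skip comment lines, if any
        cols = line.split("\t")
        current.append(
            Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7])
        )
    if current:
        sentences.append(current)
    return sentences


# Invented example sentence ("Le chat dort"); tags and labels are illustrative.
SAMPLE = (
    "1\tLe\tle\tD\tDET\t_\t2\tdet\t_\t_\n"
    "2\tchat\tchat\tN\tNC\t_\t3\tsuj\t_\t_\n"
    "3\tdort\tdormir\tV\tV\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    for sentence in parse_conll(SAMPLE):
        for tok in sentence:
            print(f"{tok.form} --{tok.deprel}--> {tok.head}")
```

Each token keeps only the fields needed to rebuild the dependency tree; the remaining columns could be added to `Token` in the same way if they are needed.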


Intellectual property

Corpus filed with the APP (Agence pour la Protection des Programmes) on 01/28/2018, IDDN no. FR 001 050008 000 D C 2008 000 10300



Lexical resources - Syntactic resources - NLP
