DFKI (German Research Centre for Artificial Intelligence)
ILSP (Institute for Language and Speech Processing, Greece)
ELDA (Evaluations and Language Resources Distribution Agency, France)
Tilde (Latvia)
IDC (United Kingdom)
Project Description
The development of language technologies requires substantial amounts of language data. European, national and regional public services in all EU Member States continuously handle large volumes of multilingual text, both in original and in translated form. The ELRC (European Language Resource Coordination) action aims to minimise language barriers across the EU by making such public-sector data available for building language technologies.
The action consists of a number of service contracts carried out by the consortium partners for the Multilingualism sector of DG CNECT, the European Commission's Directorate-General for Communications Networks, Content and Technology. It is governed by the Language Resource Board, which brings together leading representatives of technology and public services from the EU Member States.
Activities under the service contracts include collecting data for training systems, setting up a helpdesk for data providers, raising awareness of language technologies, assessing tools and techniques, and conducting a market study.
The ELRC action resulted in a variety of resources and events:
- Language data sharing facility (ELRC-SHARE)
- Helpdesk offering legal and technical advice to data providers
- Large parallel resources (sentences and their translations) in all EU official languages
- Workshops on various language technology topics: machine translation (MT), terminology, summarisation, speech recognition, named-entity recognition, chatbots, large language models, language simplification, …
- Studies on tools and techniques: subtopics of MT, computer-aided translation tools, document classification, readability assessment, website translation, anonymisation of documents, multilingual fake news detection
- Market study on MT, speech technology, and search technologies in Europe and on other continents
- Meetings with EC teams working in various domains, to advise them on the potential of language technology
