The new European language data space is now available in its operational beta version.
News date: 12-05-2025

With 24 official languages and more than 60 regional and minority languages, the European Union is proud of its cultural and linguistic diversity. However, this richness also represents a significant challenge in the digital and technological sphere. Advances in artificial intelligence (AI) and natural language processing have been dominated by English, creating a noticeable imbalance in the availability of language resources for most European languages.
This imbalance has direct consequences, for example:
- Asymmetric technology development: Companies and researchers have difficulty creating AI solutions adapted to specific languages because resources are limited.
- Technological dependence: Europe risks becoming dependent on language solutions developed outside its cultural and normative context.
Addressing this gap is not only a matter of inclusion, but also represents a large-scale economic opportunity, capable of generating huge gains in both trade and technological innovation. To address these challenges, the European Commission has launched the European Language Data Space (LDS), a decentralised infrastructure that promotes the secure and controlled exchange of language data among multiple actors in the European ecosystem.
Unlike a simple centralised repository, the LDS functions as a language data marketplace that allows participants to share, sell or license their data under clearly defined conditions and with full control over the use of the data.
The European Language Data Space (LDS), whose beta version is now operational, represents a decisive step towards democratising language technologies across all languages of the European Union. Below we outline the key features of the project and its next steps.
How does this platform work?
The LDS is based on a decentralised peer-to-peer (P2P) architecture in which users interact directly with each other, without a central server or single authority, and each participant keeps control of its own data. The key elements of how the LDS operates are:
1. Decentralised and sovereign architecture
Each participant (whether data provider or data consumer) can locally install the LDS Connector, a piece of software that lets it interact directly with other participants without the need for a central server. This approach ensures:
- Data sovereignty: owners retain full control over who can access their data and under what conditions of use.
- Trust and security: only eligible and authorised participants, namely legal entities registered in the EU, can be part of the ecosystem.
- Interoperability: the LDS is compatible with other European data spaces, following common standards.
2. Data exchange flow
The exchange process follows a structured flow between two main actors:
- Providers describe their linguistic datasets, establish access policies (licences, prices) and publish these offers in the catalogue.
- Consumers explore the catalogue, identify resources of interest and, through their connectors, initiate negotiations on the terms of use.
If both parties reach an agreement, a contract is established and the data transfer takes place securely between the connectors.
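The flow above can be sketched in a few lines of Python. This is only an illustrative model, not the LDS Connector's actual interface: the class names (`Offer`, `Contract`, `Catalogue`, `Connector`), their fields, and the price-based acceptance rule are all assumptions made for the example. It shows the catalogue holding offer descriptions only, while the data stays with the provider's connector until a contract has been agreed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Offer:
    """A dataset offer as a provider might publish it (hypothetical fields)."""
    dataset_id: str
    provider: str
    language: str
    licence: str
    price_eur: float


@dataclass(frozen=True)
class Contract:
    """Agreement reached between a provider and a consumer (hypothetical)."""
    dataset_id: str
    provider: str
    consumer: str
    licence: str
    price_eur: float


@dataclass
class Catalogue:
    """Holds offer descriptions only; the data itself stays with the provider."""
    offers: dict[str, Offer] = field(default_factory=dict)

    def publish(self, offer: Offer) -> None:
        self.offers[offer.dataset_id] = offer

    def search(self, language: str) -> list[Offer]:
        return [o for o in self.offers.values() if o.language == language]


@dataclass
class Connector:
    """Stand-in for a locally installed connector (not the real LDS Connector API)."""
    owner: str
    datasets: dict[str, list[str]] = field(default_factory=dict)
    contracts: list[Contract] = field(default_factory=list)

    def negotiate(self, offer: Offer, consumer: str,
                  max_price_eur: float) -> Optional[Contract]:
        # Assumed rule: the provider accepts only if the consumer's budget
        # covers the asking price of one of its own offers.
        if offer.provider != self.owner or max_price_eur < offer.price_eur:
            return None
        contract = Contract(offer.dataset_id, self.owner, consumer,
                            offer.licence, offer.price_eur)
        self.contracts.append(contract)
        return contract

    def transfer(self, contract: Contract) -> list[str]:
        # Data is released only for a contract this connector has recorded.
        if contract not in self.contracts:
            raise PermissionError("no agreed contract for this transfer")
        return list(self.datasets[contract.dataset_id])


# Provider side: describe the dataset and publish the offer.
catalogue = Catalogue()
provider = Connector("provider-a", datasets={"ga-news": ["Dia duit", "Slán"]})
catalogue.publish(Offer("ga-news", "provider-a", "ga", "CC-BY-4.0", 100.0))

# Consumer side: discover the offer, negotiate, then receive the data.
found = catalogue.search("ga")
rejected = provider.negotiate(found[0], "consumer-b", max_price_eur=50.0)
contract = provider.negotiate(found[0], "consumer-b", max_price_eur=150.0)
records = provider.transfer(contract)
```

In this sketch the first negotiation fails because the budget is below the asking price, and the transfer succeeds only after the second negotiation produces a contract. A real negotiation would involve richer policies than a single price comparison.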
3. Supporting infrastructure
Although the exchange is decentralised, the LDS includes supporting elements such as:
- Participant registration: ensures that only verified entities participate in the ecosystem.
- Optional catalogue: facilitates the publication and discovery of available resources.
- Hub of vocabularies: a service that centralises controlled vocabularies and allows participants to maintain lists of values, definitions, relationships between terms, mappings between lists, etc.
- Monitoring service: tracks the overall operation of the system.
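To make the idea of a hub of vocabularies more concrete, here is a minimal in-memory sketch. The `VocabularyHub` class and its methods are hypothetical and do not reflect the actual LDS service. The example only illustrates keeping controlled lists of values with definitions, plus a mapper between two lists, here ISO 639-1 and ISO 639-3 language codes.

```python
from dataclasses import dataclass, field


@dataclass
class VocabularyHub:
    """Toy hub: controlled value lists plus mappers between lists (hypothetical)."""
    # list name -> {value: definition}
    lists: dict[str, dict[str, str]] = field(default_factory=dict)
    # (source list, target list) -> {source value: target value}
    mappers: dict[tuple[str, str], dict[str, str]] = field(default_factory=dict)

    def add_list(self, name: str, values: dict[str, str]) -> None:
        self.lists[name] = dict(values)

    def add_mapper(self, source: str, target: str, mapping: dict[str, str]) -> None:
        # Reject mappings that reference values outside either controlled list.
        for src, dst in mapping.items():
            if src not in self.lists[source] or dst not in self.lists[target]:
                raise ValueError(f"unknown value in mapping: {src!r} -> {dst!r}")
        self.mappers[(source, target)] = dict(mapping)

    def translate(self, source: str, target: str, value: str) -> str:
        return self.mappers[(source, target)][value]


hub = VocabularyHub()
hub.add_list("iso639-1", {"ga": "Irish", "mt": "Maltese", "es": "Spanish"})
hub.add_list("iso639-3", {"gle": "Irish", "mlt": "Maltese", "spa": "Spanish"})
hub.add_mapper("iso639-1", "iso639-3", {"ga": "gle", "mt": "mlt", "es": "spa"})

code = hub.translate("iso639-1", "iso639-3", "ga")
```

Validating mappers against the controlled lists is one way to keep dataset descriptions from different providers comparable; relationships between terms could be modelled in the same style.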
Added value for the European data ecosystem
The LDS brings significant benefits to the European digital landscape:
- Boosting multilingual AI: by facilitating access to quality linguistic data in all European languages, the LDS contributes directly to the development of more inclusive AI models adapted to Europe's multilingual reality. This is especially relevant at a time when large language models (LLMs) are transforming human-machine interaction.
- Strengthening the data economy: it is estimated that true digital language integration could generate enormous economic benefits in both trade and technological innovation. The LDS creates a marketplace that gives language data value, incentivising its collection, processing and availability under fair and transparent conditions.
- Preservation of linguistic diversity: by promoting technological development in all European languages, the LDS contributes to preserving and revitalising the continent's linguistic heritage, ensuring that no language is left behind in the digital revolution.
The crucial role of industry and public administrations
The success of the LDS depends crucially on the active participation of various actors:
- Fresh, quality data: the platform especially seeks to attract "fresh" data from industry (media, publishing, customer services) and the public sector, which is needed to train and improve current language models. The following are particularly valued:
  - Multimodal data (text, audio, video).
  - Specific content from various professional domains.
  - Up-to-date and relevant language resources.
- Participation open to all ecosystem actors: the LDS is designed to be inclusive, allowing both private organisations and public entities to participate, as long as they are legal entities registered in the EU. Both types of organisation can act as providers and/or consumers of data. Participation is formalised through a validation process by the governance board, ensuring that all eligible organisations can benefit from this common language data marketplace.
How can you take part?
The beta version of the LDS is now operational and open to new participants. Organisations interested in participating in this initiative can:
- Join the test and focus groups: contributing to the development and improvement of the platform.
- Test the LDS connector: experimenting with the technology in controlled environments.
- Provide technical feedback: helping to define key aspects such as metadata, licensing or exchange mechanisms.
- Identify relevant data: assessing which language resources could be shared through the platform.
The future of the LDS
While the LDS currently focuses on data exchange, its medium-term vision envisages integrating language services and AI model hosting within the same ecosystem, thus reinforcing Europe's role in the development of language technologies. A pre-final version of the LDS is expected to be available in July 2025, and the finalised version is expected in January 2026.
All these aspects were discussed at a free online seminar held by the European open data portal, "Data spaces: experience from the European Language Data Space". The webinar is available to watch again.
In a global context where technological sovereignty has become a strategic priority, the European Language Data Space represents a decisive step towards ensuring that the AI revolution does not leave Europe's linguistic richness behind.