Open Data and Natural Language Processing Technologies


News date: 27-09-2016

Nowadays, more than six thousand tweets are sent per second worldwide, there are more than nine billion web pages, and the European Union translates a volume of text comparable to 1,500 copies of Don Quixote every year.

This volume of electronic text can no longer be handled by people alone, yet it must be put to use. The automatic exploitation of this valuable source of information is therefore urgent.

To that end, Language Technologies, which include both natural language processing (NLP) and machine translation, seek to smooth the way for the automatic understanding of human language. This presents great challenges. Computer systems process data easily when the information has a structure and a single, explicit meaning (structured information); they can easily handle tables with millions of numerical values, for example. But human language is much more complex and subtle: it is full of nuances and peculiarities, and its meaning may vary with context, refer to information that is not explicit, or convey irony. The discipline can be applied to a diverse range of areas, from computer-assisted translation to the retrieval of relevant information and opinion mining, making it the seed of an emerging and innovative industry with great growth opportunities.

Organizations accumulate large amounts of textual information in electronic format. Transforming it into reusable formats and publishing it under open licenses can turn it into fuel for the language technologies industry.

The value of these texts is twofold:

On the one hand, they have direct value as an informative raw material from which language technologies can generate relevant information.
On the other, and no less important, they are very useful for creating and training language technology itself (a good example is the translation memories of the Directorate-General for Translation of the European Commission, which are the most downloaded datasets on the open data portal of the EU).

To highlight the potential benefits of the confluence of Open Data and Language Technologies, and to address the social, economic, legal and technical challenges involved, two events will take place in the context of the International Open Data Conference 2016, to be held in Madrid in October this year.

The first is a workshop on October 5 that will analyse the challenges and potential benefits arising from the confluence of open data and natural language processing technologies.

This workshop, part of the pre-IODC activities, brings together a group of experts in different aspects of this multifaceted field. They will have time to share and discuss with the audience their varied and revealing visions and experiences, in a collective effort to enrich our knowledge of the confluence of Open Data and Language Technologies.

The workshop is divided into three sections dedicated to Challenges, Experiences and Public Policies, and ends with the Next Steps session, where the experts will recap their recommendations for the future.

In the section on related public policies, initiatives such as the Plan to Promote Language Technologies in Spain and CEF.AT in the European Union will be described. The latter is a clear example of the positive impact of combining natural language processing technologies with the re-use of open data: it facilitates the creation of a machine translation platform that will allow digital public services in the EU to be multilingual and will support the exchange of information between public administrations in different countries.

In addition to this workshop, a session on the same topic will be held on October 6, open to all members of the open data community, to forge a network of stakeholders interested in the confluence of Open Data and Language Technologies.

For more information, details of these two events are available on the official website of the International Open Data Conference or on the online portal of the Digital Agenda for Spain.