Emerging Technologies and Open Data: Natural Language Processing
Document date: 30-04-2020

Natural language processing is the task of making machines (computers) understand human language, whether spoken or written. More formally, natural language processing is a hybrid field between computer science and linguistics that uses a range of techniques, some of them based on artificial intelligence, to interpret human language.
In this report, prepared by the digital transformation expert Alejandro Alija, we will see that natural language processing is much closer to our day-to-day life than we may initially think. Applications such as the automatic translation of texts, sentiment analysis on social networks, the searches we carry out on the internet, the generation of weather summaries, and the simple requests we make to our smart speakers all rely heavily on natural language processing.
The weight that natural language processing has (and will have) in industry and the economy is increasing, since most of the data produced in the world (mainly through the Internet) is unstructured data in the form of text and voice. Open data plays a crucial role for this technology. The artificial intelligence algorithms used to analyze and understand natural language require huge amounts of quality data for training. Many of these data come from the open data repositories of both public and private institutions.
This report reviews the history of natural language processing, from its inception to the present day. Additionally, the Inspire section describes some of the most representative use cases that harness the potential of natural language processing. Text prediction when writing a new email, the classification of texts into categories, and the generation of fake news are just some of the cases reviewed in this report.
Finally, for the more enthusiastic readers, the Action section develops a complete use case (using programming tools) on sentiment analysis of conversations in public citizen debates.
The report ends with a list of resources and readings for those users who wish to continue expanding their knowledge of Artificial Intelligence.
You can download the full report and other additional materials at the following links:
Note: The published code is intended to be a guide for the reader, but may require external dependencies or specific configurations for each user who wants to run it.
Documentation
- Presentation: Tecnologías emergentes y datos abiertos: Procesamiento del Lenguaje Natural (only available in Spanish), pptx, 3.29 MB
- Executive summary: Tecnologías emergentes y datos abiertos: Procesamiento del Lenguaje Natural (only available in Spanish), docx, 51.88 KB