16 posts found
Using Pandas for quality error reduction in data repositories
There is no doubt that data has become the strategic asset for organisations. Today, it is essential to ensure that decisions are based on quality data, regardless of the alignment they follow: data analytics, artificial intelligence or reporting. However, ensuring data repositories with high levels…
Understanding Word Embeddings: how machines learn the meaning of words
Natural language processing (NLP) is a branch of artificial intelligence that allows machines to understand and manipulate human language. At the core of many modern applications, such as virtual assistants, machine translation and chatbots, are word embeddings. But what exactly are they and why are…
Open science and information systems for research
The European Open Science Cloud (EOSC) is a European Union initiative that aims to promote open science through the creation of an open, collaborative and sustainabledigital research infrastructure. EOSC's main objective is to provide European researchers with easier access to the data, tools and re…
GeoParquet 1.0.0: new format for more efficient access to spatial data
Cloud data storage is currently one of the fastest growing segments of enterprise software, which is facilitating the incorporation of a large number of new users into the field of analytics.
As we introduced in a previous post, a new format, Parquet, has among its…
The relevance of open data for medical research: The Case of FISABIO Foundation
In the digital age, technological advancements have transformed the field of medical research. One of the factors contributing to technological development in this area is data, particularly open data. The openness and availability of information obtained from health research provide multiple benefi…
When to use each programming language in data science?
Python, R, SQL, JavaScript, C++, HTML... Nowadays we can find a multitude of programming languages that allow us to develop software programmes, applications, web pages, etc. Each one has unique characteristics that differentiate it from the rest and make it more appropriate for certain tasks. But h…
Chatbots or virtual assistants in Public Administrations to democratize the use of open data
According to the latest analysis conducted by Gartner in September 2021, on Artificial Intelligence trends, Chatbots are one of the technologies that are closest to deliver effective productivity in less than 2 years. Figure 1, extracted from this report, shows that there are 4 technologies that are…
How are we addressing the shortage in professional competencies for working with data?
Over the last decade we have seen how national and international institutions, as well as national governments and business associations themselves have been warning about the shortage of technological profiles and the threat this poses to innovation and growth. This is not an exclusively European p…
Why should you use Parquet files if you process a lot of data?
It's been a long time since we first heard about the Apache Hadoop ecosystem for distributed data processing. Things have changed a lot since then, and we now use higher-level tools to build solutions based on big data payloads. However, it is important to highlight some best practices related to ou…
Kaggle and other alternative platforms for learning data science
The profession of the data scientist is booming. According to him 2020 LinkedIn Emerging Jobs Report, the demand for data science specialists grew 46.8% compared to the previous year, being especially demanded in sectors such as banking, telecommunications or research. The report also indicates…