8 posts found
Importance of data profiling, types and tools
What is data profiling?
Data profiling is the set of activities and processes aimed at determining the metadata of a particular dataset. This process, considered an indispensable technique during exploratory data analysis, includes the application of different statistics with the main objectiv…
GeoParquet 1.0.0: new format for more efficient access to spatial data
Cloud data storage is currently one of the fastest growing segments of enterprise software, which is facilitating the incorporation of a large number of new users into the field of analytics.
As we introduced in a previous post, a new format, Parquet, has among its…
Why should you use Parquet files if you process a lot of data?
It's been a long time since we first heard about the Apache Hadoop ecosystem for distributed data processing. Things have changed a lot since then, and we now use higher-level tools to build solutions based on big data payloads. However, it is important to highlight some best practices related to ou…
The most popular network analysis tools
Social networks did not exist two decades ago, but in recent years they have become an important part of our lives. According to data compiled by EPData, 29 million Spaniards actively use these services. On average, users spend almost 2 hours a day connected to them.
This data makes social networks…
The most popular data visualization tools
Data is a fundamental pillar of business decision making. Before making any decision, it is necessary to analyze the situation to understand the context and envision possible alternatives. To do this, the data must be presented in a clear and understandable way.
Data analysis is useless if…
The most popular data analysis tools
Data analysis is a process that allows us to obtain knowledge of the underlying information of the data, with the purpose of drawing conclusions that allow us to make informed decisions. Without data analytics, companies and organizations are limited in examining their results and determining the di…
The most popular data conversion and data cleaning tools
When carrying out a data analysis project, it is usual to work with different sources, which in many cases include datasets with heterogeneous formats and structures that do not always share the same quality. Therefore, one of the first phases in any data analysis process is known as data cleaning.…
The Juan March Foundation, a pioneer in the use of data science applied to culture
Data science is reaching all sectors, from agriculture to health, tourism or transport. Culture is no exception, as illustrated by the DataLab of the Juan March Foundation, a knowledge laboratory that not only seeks to manage and disseminate the contents of the Library, but also extracts valuable information that…