The most popular data analysis tools
News date: 21-04-2021

Data analysis is the process of examining data to extract the information it contains, with the aim of drawing conclusions that support informed decisions. Without data analytics, companies and organizations have limited ability to examine their results and to decide which direction gives them the best chance of success.
Types of analytics
Within the field of analytics there are different processes that address the past, present and future of our activities:
- Exploratory analysis, which applies statistical techniques to the data to determine why a certain event occurred.
- Descriptive analysis, which explores the data from different perspectives to find out what happened.
- Predictive analytics, which estimates future values of the variables of interest to anticipate what will happen.
- Prescriptive analysis, which offers recommendations by testing the environment variables and suggesting those with the highest probability of producing a positive result.
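To make the difference between some of these types more concrete, the following minimal Python sketch applies three of them to a small, hypothetical series of monthly sales figures. The data, the variable names, the option "uplift" factors and the choice of a least-squares straight line as the predictive model are all assumptions for illustration only: descriptive statistics summarize what happened, a fitted trend predicts the next value, and a simple rule recommends the option with the best predicted result.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative data, not from a real source).
sales = [120.0, 135.0, 150.0, 160.0, 178.0, 190.0]

# Descriptive analysis: summarize what happened.
summary = {"mean": mean(sales), "min": min(sales), "max": max(sales)}

# Predictive analysis: fit a least-squares line y = a + b*x and extrapolate.
def fit_line(ys):
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b

a, b = fit_line(sales)
next_month = a + b * len(sales)  # predicted value for the next period

# Prescriptive analysis: among hypothetical options, recommend the one whose
# predicted outcome is highest (the multiplier per option is an assumption).
options = {"keep price": 1.00, "discount 5%": 1.04, "raise price 5%": 0.97}
recommended = max(options, key=lambda name: next_month * options[name])

print(summary)
print(round(next_month, 1))
print(recommended)
```

In practice the predictive step would use a properly validated model, and the prescriptive step would rely on simulations or optimization rather than fixed multipliers; the straight line simply keeps the example readable.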
This article contains a selection of popular data analysis tools that will allow you to perform these tasks, divided into two groups according to their target audience:
- Tools that perform simple analyses and do not involve programming, aimed at users without advanced technical knowledge.
- Tools that offer greater versatility but require the use of programming languages, aimed at users with mathematical and computer skills.
It is worth remembering that before carrying out any analysis of this type, the data must be transformed so that it shares the same structure and format and is free of errors, something we already covered in the article on data conversion and cleaning tools.
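As a rough illustration of that preparation step, the sketch below merges records from two hypothetical sources that use different field names, date formats and decimal separators into a single uniform structure, and discards rows that cannot be parsed. The field names, formats and sample rows are assumptions; real cleaning rules depend entirely on the dataset at hand.

```python
from datetime import datetime

# Two hypothetical sources describing the same kind of record differently.
source_a = [
    {"date": "2021-04-21", "city": "Madrid", "value": "12.5"},
    {"date": "2021-04-22", "city": "Sevilla", "value": "n/a"},  # invalid value
]
source_b = [
    {"fecha": "23/04/2021", "ciudad": " valencia ", "valor": "7,25"},  # decimal comma
]

def normalize(date_text, date_format, city, value_text):
    """Return a record in the common schema, or None if it cannot be parsed."""
    try:
        date = datetime.strptime(date_text, date_format).date().isoformat()
        value = float(value_text.replace(",", "."))
    except ValueError:
        return None
    # Trim stray whitespace and unify capitalization of city names.
    return {"date": date, "city": city.strip().title(), "value": value}

records = [normalize(r["date"], "%Y-%m-%d", r["city"], r["value"]) for r in source_a]
records += [normalize(r["fecha"], "%d/%m/%Y", r["ciudad"], r["valor"]) for r in source_b]

# Keep only the rows that survived validation.
clean = [r for r in records if r is not None]
for row in clean:
    print(row)
```

Dropping invalid rows is only one possible policy; depending on the analysis, it may be preferable to flag them or impute missing values instead.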
Data analysis tools for non-programmers
WEKA
Functionality:
WEKA is a cross-platform machine learning and data mining software. Its functionalities can be accessed through a graphical interface, a command line or a Java API.
Main advantages:
One of its main advantages is that it contains a large number of built-in tools for standard machine learning tasks and allows access to other tools such as scikit-learn, R and Deeplearning4j.
Do you want to know more?
- Support materials: As an appendix to the book Data Mining: Practical Machine Learning Tools and Techniques, there is this WEKA manual, which introduces its panels and functionalities. It covers methods for the main data mining problems: regression, classification, clustering, association rules and attribute selection. Also available online are this manual and these tutorials prepared by the University of Waikato, which has also launched a blog on the subject.
- Repository: The official WEKA source code is available at this URL. You can also access it, along with various packages and tools, from this GitHub repository.
- User community: You can find user groups on Stack Overflow.
KNIME
Functionality:
KNIME is a data mining software that allows data analysis and visualizations through a graphical interface.
Main advantages:
The graphical interface on which data analysis flows are modeled uses nodes, which represent the different algorithms, and arrows, which show the flow of data through the processing pipeline. In addition, it allows incorporating code written in R and Python, as well as interaction with WEKA.
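The node-and-arrow idea can be illustrated outside any particular tool. The following Python sketch is not KNIME's API or workflow format; it is a hypothetical, minimal model in which each node is a function that receives a table and returns a new one, and the "arrows" simply pass one node's output to the next. The node names and sample data are invented for illustration.

```python
from statistics import mean

# Each "node" is a plain function that takes a table (list of dicts) and returns one.
def read_node(_):
    # Hypothetical input data standing in for a file reader node.
    return [
        {"region": "North", "sales": 10},
        {"region": "South", "sales": 4},
        {"region": "North", "sales": 6},
        {"region": "South", "sales": None},
    ]

def filter_missing_node(rows):
    # Drop rows with missing values.
    return [r for r in rows if r["sales"] is not None]

def group_mean_node(rows):
    # Aggregate: mean sales per region, sorted by region name.
    groups = {}
    for r in rows:
        groups.setdefault(r["region"], []).append(r["sales"])
    return [{"region": k, "mean_sales": mean(v)} for k, v in sorted(groups.items())]

# The "arrows": an ordered list of nodes; data flows from each node to the next.
workflow = [read_node, filter_missing_node, group_mean_node]

def run(nodes, data=None):
    for node in nodes:
        data = node(data)
    return data

result = run(workflow)
print(result)
```

Visual workflow tools generalize this to branching graphs, where a node can have several inputs and outputs; the linear chain shown here is the simplest case.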
Do you want to know more?
- Support materials: KNIME's own website offers various help documents that guide you through its installation, the creation of workflows and the use of nodes. In addition, its YouTube channel has many videos, including playlists covering the basics for users who are approaching this tool for the first time.
- Repository: GitHub provides tools to configure the KNIME SDK (Software Development Kit), so that you can work with the source code of the extensions or develop your own.
- User community: KNIME users have groups on Gitter and Stack Overflow where they can ask questions, as well as a discussion forum on the KNIME website.
- Social media: You can follow the Twitter account @knime and its LinkedIn profile to keep up to date with KNIME news and related events or talks.
ORANGE
Functionality:
Orange is open-source machine learning and data mining software, similar to KNIME.
Main advantages:
Orange builds analyses and data visualizations using a drag-and-drop paradigm, from a catalog of widgets representing different tasks. It can also be installed as a Python library.
Do you want to know more?
- Support materials: In this case we highlight two books. First, Introduction to data mining with Orange, which collects the workflows and visualizations from the Introduction to Data Mining course given by the Orange team itself. Second, Orange Data Mining Library Documentation, a brief introduction to scripting in Orange. You can also find video tutorials on the YouTube channel Orange Data Mining.
- Repository: From this GitHub repository you can download the resources needed for its installation.
- User community: On Gitter, Stack Exchange and Stack Overflow, users have created spaces where they can ask questions and share experiences.
- Social media: The Twitter profile @OrangeDataMiner and its LinkedIn account collect reports, events, use cases and news related to this tool.
Data analysis tools for programmers
R (The R Project for statistical computing)
Functionality:
R is an interpreted object-oriented programming language, initially created for statistical computing and the creation of graphical representations.
Main advantages:
R is one of the most used languages in scientific research and this is due to its many advantages:
- It has a dedicated development environment, RStudio.
- It consists of a set of functions that can be easily extended by installing libraries or defining custom functions.
- It is constantly updated thanks to its extensive community of users and programmers, who since its inception have contributed new functions, libraries and updates, available to all users freely and at no cost.
Do you want to know more?
- Support materials: Due to its popularity, there is a large amount of helpful material. Examples include the books R for Data Science and R manual. You can also find guides on the website The R Manuals, as well as the webinars organized by RStudio itself.
- User community: There is a discussion space on Stack Overflow. In addition, in Spain there are two groups that carry out various activities (hackathons, conferences, courses, etc.) to promote the use of R: R-Hispanic community and R-Ladies. You can learn more about them in this article.
- Social media: R has a LinkedIn group with almost 150,000 members.
Python
Functionality:
Python is a dynamic, cross-platform, multi-paradigm interpreted programming language that supports, to varying degrees, object-oriented, structured, imperative and functional programming.
Main advantages:
It is a simple programming language. Its philosophy emphasizes providing human-readable, easy-to-use, and easy-to-learn code syntax. In addition, it allows the integration of libraries such as Matplotlib, Bokeh, Numpy, Pandas or spaCy, to implement functions that enable complex interactive graphing and statistical analysis.
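As a small taste of statistical analysis in Python, the sketch below uses only the standard library (the csv and statistics modules) so that it runs on a plain Python installation. The dataset is hypothetical and embedded as text; libraries such as Pandas and Numpy provide more concise and much faster equivalents for large tables.

```python
import csv
import io
from statistics import mean, median, stdev

# Hypothetical CSV data embedded as a string so the example is self-contained.
CSV_TEXT = """station,temperature
A,21.0
A,23.0
B,18.0
B,19.0
B,20.0
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
temps = [float(r["temperature"]) for r in rows]

# Overall summary statistics.
print("mean:", mean(temps), "median:", median(temps), "stdev:", round(stdev(temps), 3))

# Per-station mean: the kind of grouping Pandas would express with groupby().
by_station = {}
for r in rows:
    by_station.setdefault(r["station"], []).append(float(r["temperature"]))
station_means = {s: mean(v) for s, v in by_station.items()}
print(station_means)
```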
Do you want to know more?
- Support materials: As with R, since it is a very popular language, there are many materials and help resources online, such as the tutorials The Python Tutorial and LearnPython.org, or the video portal PyVideo, where you can find various webinars.
- Repository: On GitHub you can find various repositories related to the Python programming language.
- User community: Users with questions can seek help from others in the same situation on Stack Overflow or Gitter. Python's own website also lists a large number of communities worldwide.
- Social media: The official Twitter profile of the Python Software Foundation is @ThePSF. There is also a group on LinkedIn.
GNU Octave
Functionality:
GNU Octave is a programming language designed primarily for numerical computation, particularly linear algebra. It is the best-known free and open-source alternative to the commercial MATLAB. It is mainly used from the command line, although recent versions also include a graphical interface.
Main advantages:
GNU Octave has powerful built-in mathematical functions (differential equations, linear algebra, matrix calculus) and can be extended with libraries such as the GNU Scientific Library, Dionysus or Bc. It also has a package index with numerous extensions that enrich the functionality of the tool.
Do you want to know more?
- Support materials: At this link you will find the notes from the GNU Octave course of the Complutense University of Madrid. The GNU Octave website also offers manuals, and its YouTube profile has video tutorials.
- Repository: The GNU Octave developer community has a variety of repositories on GitHub with materials of interest.
- User community: On Stack Overflow and on the GNU Octave website there is a space for users to share opinions and experiences.
- Social media: You can follow news related to this tool on the Twitter account @GnuOctave and in this LinkedIn group.
The following table shows a summary of the tools mentioned above:
This is just a selection of data analysis tools, but there are many more. We invite you to share your experience with these or other solutions in the comments.
For those who want to know more about these tools and others that can help us during the different phases of data processing, at datos.gob.es we offer you the recently updated report "Data processing and visualization tools". You can see the full report here.
You can see more tools related to this field in the following monographs:
- The most popular data conversion and data cleaning tools
- The most popular data visualisation tools
- The most popular data visualisation libraries and APIs
- The most popular geospatial visualisation tools
- The most popular network analysis tools
Content prepared by the datos.gob.es team.