Kaggle and other alternative platforms for learning data science
News date: 16-09-2021

The data scientist profession is booming. According to the 2020 LinkedIn Emerging Jobs Report, demand for data science specialists grew 46.8% compared to the previous year, with particularly strong demand in sectors such as banking, telecommunications and research. The report also indicates that the skills companies look for include "Machine Learning, R, Apache Spark, Python, Data Science, Big Data, SQL, Data Mining, Statistics and Hadoop." Training in these tools and skills is therefore a notable competitive advantage in the job market.
In this context, it is not surprising that universities keep expanding their courses in these subjects. At the same time, there are alternatives that allow us to expand our knowledge in a playful way.
Gamification to learn data science
One of the best ways to learn new skills is through play. Solving challenges and real cases allows us to test our knowledge and exercise new skills in an entertaining and motivating way. This is known as gamification, a learning technique that applies elements of game design to non-game contexts. In this case we are talking about learning, but it can also be applied to marketing or even to sectors such as health and wellness, among others.
Gamification is well suited to acquiring data-related skills, as shown by competitions such as hackathons or application and idea contests, like our Aporta Challenge. In recent years, online platforms that set open competitions for users in the form of challenges have also grown.
Kaggle, a space for open competitions
Of all these platforms, the best known is Kaggle, which brings together more than 7 million registered users from around the world. It is a free platform that provides users with problems to solve using data science, predictive analytics or machine learning techniques, among others.
There are problems for beginners, such as predicting survival on the Titanic (a binary classification problem) or predicting house prices, which requires advanced regression techniques. Some competitions come directly from companies that want to solve a challenge they have not been able to crack and choose to open it up to the platform's users, as Banco Santander did. Occasionally, there are large cash prizes for the user who finds the best solution. One example is the American football league (NFL), which sought to detect impacts to players' helmets and offered $100,000 in prizes. There are also companies that create contests specifically so that the winners can interview with their data science team, as Facebook did a few years ago. Kaggle is therefore a good way to improve your chances of finding a good job. Many recruiters keep an eye on the platform when looking for new talent, paying particular attention to competition winners.
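To make the idea of a binary classification problem concrete, here is a minimal sketch in Python. It is not Kaggle's code or an official solution: the records are invented toy data (not the real Titanic dataset), the function names are hypothetical, and the rule "predict survival for women" is only a simple baseline. It assumes accuracy as the score, that is, the fraction of passengers whose outcome is predicted correctly.

```python
# Hypothetical toy records, invented for illustration (not real Titanic data).
# Each record has one feature ("sex") and the true label ("survived": 1 or 0).
PASSENGERS = [
    {"sex": "female", "survived": 1},
    {"sex": "male", "survived": 0},
    {"sex": "female", "survived": 1},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 0},
    {"sex": "female", "survived": 1},
    {"sex": "male", "survived": 1},
    {"sex": "female", "survived": 0},
]


def predict_survived(passenger: dict) -> int:
    """Rule-based baseline: predict 1 (survived) for women, 0 otherwise."""
    return 1 if passenger["sex"] == "female" else 0


def accuracy(records: list) -> float:
    """Fraction of records whose predicted label matches the true label."""
    correct = sum(1 for p in records if predict_survived(p) == p["survived"])
    return correct / len(records)


baseline_accuracy = accuracy(PASSENGERS)
print(f"Baseline accuracy: {baseline_accuracy:.2f}")
```

In a real competition, the baseline rule would be replaced by a model trained on the competition's training data, and the score would be computed by the platform on a hidden test set.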
In addition to competitions, Kaggle offers other functionalities:
- A section for sharing datasets. There are currently more than 50,000 public datasets, which can be freely used to practice, solve competitions or train algorithms.
- Free courses, which cover topics such as Python, introduction to machine learning, geospatial analysis or natural language processing. They are designed to quickly introduce the user to essential topics and guide them through the Kaggle platform. Once you have the basic knowledge, it is time to participate in competitions.
- Notebooks, shared by Kaggle users. These contain the code, along with tutorials, that competition participants have used to solve different problems. There are currently more than 500,000. To run and practice with them, Kaggle provides a computational environment designed to make data science work easy to reproduce.
- A discussion forum, where users can ask questions and share feedback. By signing up for Kaggle, you not only gain access to numerous resources, but also become part of a community of experts. Being active in the forum is key to expanding your knowledge, meeting other users, forming teams and learning from the experience of those who have mastered the subject in question.
Kaggle uses a progression system that classifies users according to their performance in each area. On the one hand, there are five performance levels: Novice, Contributor, Expert, Master and Grandmaster. On the other, there are four categories of data science expertise on Kaggle: Competitions, Notebooks, Datasets and Discussion, which reflect the user's participation in each area. Progress through the performance levels is tracked independently within each category, so the same user can be a Master in Competitions but a Novice in Discussion.
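The independence between categories can be pictured with a small data structure. The following Python sketch is only an illustration of the description above, not Kaggle's actual implementation: the names `Tier`, `CATEGORIES` and `new_profile` are hypothetical, and starting every category at Novice is a simplifying assumption.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The five performance levels named in the text, in ascending order."""
    NOVICE = 1
    CONTRIBUTOR = 2
    EXPERT = 3
    MASTER = 4
    GRANDMASTER = 5


# The four categories of expertise named in the text.
CATEGORIES = ("Competitions", "Notebooks", "Datasets", "Discussion")


def new_profile() -> dict:
    """Hypothetical profile: one independent tier per category (assumed to start at Novice)."""
    return {category: Tier.NOVICE for category in CATEGORIES}


profile = new_profile()
profile["Competitions"] = Tier.MASTER  # progress in one category only

# The other categories are unaffected by progress in Competitions.
highest = max(profile.values())
print(profile["Competitions"].name, profile["Discussion"].name, highest.name)
```

Keeping one tier per category, rather than a single global level, is what allows a profile to be a Master in one area and a Novice in another.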
Kaggle has been so successful that Google acquired it in 2017.
If you are thinking of participating in a competition, you can find some tips in this post, video and presentation.
Other platforms similar to Kaggle
In addition to Kaggle, there are other similar platforms on the web that host data-related competitions and challenges.
- DrivenData. It organizes online challenges, which usually last between 2 and 3 months, some of them with cash prizes. One example is a competition to build machine learning algorithms capable of mapping floods using Sentinel-1 satellite imagery. It also has a data lab that offers companies services for building data-related solutions.
- Devpost. It offers a repository of hackathons that users can sign up for, most of them online. It includes competitions from companies such as Amazon or Microsoft. Some competitions offer up to $5 million in prizes.
- Innocentive. It gathers challenges from various organizations, some of them also with large prizes. Although it has technical competitions, it also includes theoretical or strategic challenges in which only a theoretical proposal is required.
- CrowdAnalytix. With more than 25,000 users, CrowdAnalytix is a community where data experts collaborate and compete to customize and optimize algorithms. An example is this competition, in which crop evolution had to be predicted using public satellite images.
A good profile on Kaggle, or on any of the other platforms we have seen, will help you gain experience and build a good portfolio of work. It will also make you more attractive to recruiters, increasing your chances of landing a good job. A strong performance on Kaggle demonstrates problem-solving and teamwork skills, which are among the qualities needed to become a good data scientist.
Content prepared by the datos.gob.es team.