Data products with GraphQL
News date: 25-05-2021

When it comes to infrastructure, the foundation of our modern civilisation is based on concrete and steel. Wherever we look, large structures such as roads, buildings, cars, trains and planes are made of concrete and metal. In the digital world, everything is made of data and APIs. From the moment we wake up in the morning and look at our mobile phones, we interact with data and APIs.
Introduction
When we check our email, look up the weather forecast or plan a car route, we do so through applications (mobile, web or desktop) that use APIs to return the data we request. We have already covered APIs at length in previous content, explaining, for example, how to publish open data through this mechanism and how to make sure APIs comply with certain specifications.
APIs are the interfaces through which computer programs talk to each other and exchange information. Today, REST is the de facto standard for APIs: it has long since replaced SOAP as the preferred way for applications to communicate over the Internet. REST uses the HTTP protocol (usually its secure variant, HTTPS) as the transport for requests and responses. For example, the weather app of AEMET (Spain's State Meteorological Agency) sends a REST query over HTTPS to AEMET's servers. When those servers (exposed as the AEMET API) receive the query, they return the requested information to our mobile phone. All of this usually happens in less than a second, so we barely notice it.
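As a rough illustration, a REST call is an HTTP request to a URL that identifies a resource, answered with a representation of that resource (commonly JSON). The sketch below uses only Python's standard library. The URL `https://api.example.org/forecast/madrid` and the response fields are hypothetical, not AEMET's real API, and the request is built but not sent so that the example runs offline.

```python
import json
import urllib.request

# Hypothetical REST resource: one URL identifies one resource (a city's forecast).
# This is NOT the real AEMET API; it only shows the shape of a REST call.
url = "https://api.example.org/forecast/madrid"
request = urllib.request.Request(url, method="GET", headers={"Accept": "application/json"})

# A real client would send it with urllib.request.urlopen(request).
# Here we parse a canned (made-up) response body instead, so the sketch runs offline.
canned_body = b'{"city": "Madrid", "max_temp_c": 27, "sky": "clear"}'
forecast = json.loads(canned_body)

print(request.get_method(), request.full_url)
print(forecast["city"], forecast["max_temp_c"])
```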
Undoubtedly, most digital products rely on REST APIs, and data products are no exception: the vast majority adopt this standard. However, despite its wide acceptance and flexibility, REST is not perfect, and some of its limitations hinder the development of highly data-oriented products. The most obvious ones are:
- REST is all or nothing. A query returns the full result the endpoint was programmed to return. For example, to look up a user in a database you normally make one call (to the users endpoint) and get the full list of users, with all their fields, even if you only need one of them.
- REST often requires several calls to obtain the desired data. Continuing with the previous example, to query the balance of a user's bank account you usually have to call the list of users, then the list of bank accounts, cross-reference the two results, and finally call the balance query with the user and their bank account as parameters.
- REST is not designed to manage relationships easily. In the previous example, we had to chain several sequential partial queries just to obtain the result of one related query.
To understand the difference between a service-oriented and a data-oriented IT application, consider an example. A modern application that combines different functionalities is often said to follow a service-oriented design pattern. Under this pattern, a service that lets users log into the system is a typical case of integration between services, and it normally uses REST APIs as its integration mechanism; a concrete example is an application that lets us log in with an email account or a social network profile. A data-oriented service, by contrast, is one in which the application queries the system mainly to obtain data, for example when we request the average time a user has spent browsing a particular part of our application or website.
GraphQL
As an alternative to REST, and to overcome these limitations, Facebook created GraphQL in 2012. At first, Facebook used it internally for its own queries on the social network, but in 2015 the company published the source code, turning it into open-source software.
The main advantage of GraphQL is that clients can request exactly the data they need, regardless of how that data is organised at the source. The source data can be organised (and stored) in a relational database, behind a REST API, in a NoSQL database or in a purpose-built object, e.g. the result of an algorithm.
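To make this concrete, here is a minimal sketch (not the GraphQL reference implementation) of how a GraphQL-style server can map each requested field to a resolver function, so the client never sees where the data lives. One hypothetical resolver reads from a relational table (Python's built-in `sqlite3`, in memory), another from a dictionary playing the role of a NoSQL document store, and a third computes its value on the fly. Field names and data are invented for illustration.

```python
import sqlite3

# Source 1: a relational database (in-memory SQLite).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE station (id INTEGER PRIMARY KEY, street TEXT)")
db.executemany("INSERT INTO station VALUES (?, ?)",
               [(1, "Carrer de Mallorca"), (2, "Via Laietana")])

# Source 2: a document store, simulated with a dict.
DOCUMENTS = {1: {"free_bikes": 4}, 2: {"free_bikes": 0}}

# Each (hypothetical) field name maps to a resolver; the client only sees field names.
RESOLVERS = {
    "street": lambda sid: db.execute(
        "SELECT street FROM station WHERE id = ?", (sid,)).fetchone()[0],
    "freeBikes": lambda sid: DOCUMENTS[sid]["free_bikes"],
    # Source 3: a derived value computed when requested, not stored anywhere.
    "hasBikes": lambda sid: DOCUMENTS[sid]["free_bikes"] > 0,
}


def resolve_station(station_id, fields):
    """Return only the requested fields, each fetched from its own source."""
    return {field: RESOLVERS[field](station_id) for field in fields}


print(resolve_station(1, ["street", "hasBikes"]))
```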
A query in the GraphQL language looks like this:
```graphql
{
  Bicicletas_Barcelona(district: 1, type: "electric") {
    Bike
    Street
  }
  Barcelona(district: 1) {
    Bus
    Stop
    Lat
    Long
  }
}
```
In the example above, starting from open data available on datos.gob.es, GraphQL would let us combine, in a single query, the locations of electric bicycle parking stations in Barcelona with the positions of nearby bus stops. The goal would be to build a data product able to plan urban trips using sustainable means of transport[1]. In the same way, we could combine multiple data sources, for example open data on buses and mobility from other Spanish cities. We would only need to incorporate these new data sources, since the underlying model answering the queries would be very similar.
As the GraphQL query shows, GraphQL is declarative: the client states what it wants, not how to obtain it. Its JSON-like syntax declares very simply and clearly which fields are being requested, and the server's response is typically returned as JSON with the same shape as the query.
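The sketch below mimics how a server could answer that query in a single round trip: each top-level field has a resolver that filters its own dataset by the arguments and keeps only the selected subfields. The query is represented as a Python dict rather than parsed from GraphQL text, and the resolvers and records (bike IDs, stop names, coordinates) are hypothetical stand-ins, not the real contents of the open datasets.

```python
# Hypothetical records standing in for the two open datasets.
BIKES = [
    {"Bike": "B-17", "Street": "Carrer de Mallorca", "district": 1, "type": "electric"},
    {"Bike": "B-18", "Street": "Via Laietana", "district": 1, "type": "mechanical"},
    {"Bike": "B-40", "Street": "Carrer de Sants", "district": 3, "type": "electric"},
]
BUS_STOPS = [
    {"Bus": "L-01", "Stop": "Stop 101", "Lat": 41.387, "Long": 2.170, "district": 1},
    {"Bus": "L-02", "Stop": "Stop 305", "Lat": 41.379, "Long": 2.140, "district": 3},
]


def matches(record, args):
    """True if the record satisfies every argument (e.g. district=1)."""
    return all(record.get(key) == value for key, value in args.items())


# One resolver per top-level field of the query.
RESOLVERS = {
    "Bicicletas_Barcelona": lambda args: [r for r in BIKES if matches(r, args)],
    "Barcelona": lambda args: [r for r in BUS_STOPS if matches(r, args)],
}


def execute(query):
    """Resolve every top-level field and keep only the selected subfields."""
    data = {}
    for field, (args, selection) in query.items():
        rows = RESOLVERS[field](args)
        data[field] = [{name: row[name] for name in selection} for row in rows]
    return {"data": data}


# The query from the text, written as {field: (arguments, selected subfields)}.
query = {
    "Bicicletas_Barcelona": ({"district": 1, "type": "electric"}, ["Bike", "Street"]),
    "Barcelona": ({"district": 1}, ["Bus", "Stop", "Lat", "Long"]),
}
print(execute(query))
```

Adding another city would mean adding another resolver; the client-facing query shape stays the same.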
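On the wire, a GraphQL query is commonly sent as text inside a JSON body, usually in an HTTP POST to a single endpoint, and the answer comes back as JSON under a top-level `data` key. The sketch below builds such a request with Python's standard library without sending it. The endpoint URL is hypothetical, the `$district` variable and its `Int` type are an illustrative rewrite of the query from the text, and the canned response values are made up.

```python
import json
import urllib.request

# The bicycle part of the query, with the district passed as a variable.
QUERY = """
query ($district: Int) {
  Bicicletas_Barcelona(district: $district, type: "electric") {
    Bike
    Street
  }
}
"""

# Hypothetical endpoint: GraphQL APIs usually expose a single URL for all queries.
ENDPOINT = "https://api.example.org/graphql"

payload = json.dumps({"query": QUERY, "variables": {"district": 1}}).encode("utf-8")
request = urllib.request.Request(
    ENDPOINT,
    data=payload,
    method="POST",
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(request) would send it; here we only inspect it.

# The response mirrors the query's shape, wrapped in "data" (values made up).
canned_response = '{"data": {"Bicicletas_Barcelona": [{"Bike": "B-17", "Street": "Carrer de Mallorca"}]}}'
bikes = json.loads(canned_response)["data"]["Bicicletas_Barcelona"]
print(request.get_method(), request.full_url, bikes[0]["Bike"])
```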
An example of GraphQL
Suppose we have a database to manage a training course. It is easy to imagine that it contains several tables recording the students, the subjects, the grades, and so on. When we expose this database through a REST API, we create several endpoints to query the students, the subjects and the grades. To find the students who have passed a specific subject, we have to make several consecutive calls, and our application has to apply the corresponding filters. The logical flow would be something like this:
- We request the list of students.
- We request the list of subjects.
- We cross-reference both lists and filter for the subject we want to check.
- We request the list of grades for that subject.
- We filter for students above a certain grade.
Each of these calls goes to a different endpoint (one for students, one for subjects and one for grades).
However, in GraphQL the same query can be performed in a single call, because GraphQL lets a query traverse related data (students, subjects and grades) in one request.
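The flow described above can be sketched as client code. The endpoint functions (`get_students`, `get_subjects`, `get_grades`), their URL paths, the pass mark of 5 and the records are all hypothetical; each function stands in for one HTTP request, and all the joining and filtering happens on the client.

```python
# Hypothetical data behind three separate REST endpoints.
STUDENTS = [{"id": 1, "name": "Marta"}, {"id": 2, "name": "Jordi"}, {"id": 3, "name": "Lucía"}]
SUBJECTS = [
    {"id": "MAT", "name": "Mathematics", "student_ids": [1, 2, 3]},
    {"id": "HIS", "name": "History", "student_ids": [1, 3]},
]
GRADES = {"MAT": {1: 7.5, 2: 4.0, 3: 9.0}, "HIS": {1: 6.0, 3: 3.5}}

requests_made = []  # log of simulated HTTP requests


def get_students():
    requests_made.append("GET /students")
    return STUDENTS


def get_subjects():
    requests_made.append("GET /subjects")
    return SUBJECTS


def get_grades(subject_id):
    requests_made.append(f"GET /subjects/{subject_id}/grades")
    return GRADES[subject_id]


def passed(subject_name, pass_mark=5.0):
    students = get_students()                                          # step 1
    subjects = get_subjects()                                          # step 2
    subject = next(s for s in subjects if s["name"] == subject_name)   # step 3: cross-reference
    enrolled = [st for st in students if st["id"] in subject["student_ids"]]
    grades = get_grades(subject["id"])                                 # step 4
    # step 5: keep students at or above the pass mark
    return [st["name"] for st in enrolled if grades[st["id"]] >= pass_mark]


print(passed("Mathematics"), requests_made)
```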
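For comparison, a GraphQL client could ask for the same information in one request, with a query along the lines of the string below. The schema (`subject`, `students`, `grade`, and the `name` and `minGrade` arguments) is hypothetical, the query string is shown for reference and is not parsed, and the Python functions are a hand-written stand-in for what a GraphQL server's resolvers would do: the server, not the client, follows the relationships between subjects, students and grades.

```python
# What the client would send (hypothetical schema): one request, related data nested.
QUERY = """
{
  subject(name: "Mathematics") {
    students(minGrade: 5) {
      name
      grade
    }
  }
}
"""

# Same hypothetical data, as the server would hold it.
STUDENTS = {1: "Marta", 2: "Jordi", 3: "Lucía"}
SUBJECTS = {"Mathematics": "MAT", "History": "HIS"}
GRADES = {"MAT": {1: 7.5, 2: 4.0, 3: 9.0}, "HIS": {1: 6.0, 3: 3.5}}


def resolve_subject(name):
    """Resolver for the top-level `subject` field."""
    return {"id": SUBJECTS[name], "name": name}


def resolve_students(subject, min_grade):
    """Resolver for `subject.students`: the server follows the relationship."""
    return [
        {"name": STUDENTS[sid], "grade": grade}
        for sid, grade in GRADES[subject["id"]].items()
        if grade >= min_grade
    ]


def answer(subject_name, min_grade):
    # One incoming request; the nested resolvers run on the server side.
    subject = resolve_subject(subject_name)
    return {"data": {"subject": {"students": resolve_students(subject, min_grade)}}}


print(answer("Mathematics", 5))
```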
In conclusion, GraphQL is a technology worth considering today as a tool for integrating with information systems. It is generally accepted that REST APIs are better suited to integration between Internet services (e.g. a user authentication service), while GraphQL is better suited to integration between data products (e.g. an online price comparison tool). In this interesting debate between IT product integration and data product integration, it seems likely that both technologies will coexist for the foreseeable future.
[1] This query has been adapted from its original source. Thanks to Ángel Garrido Román for his detailed explanation in his 2018 TFG (final degree project).
Content elaborated by Alejandro Alija, expert in Digital Transformation and Innovation.
Contents and points of view expressed in this publication are the exclusive responsibility of its author.