In an increasingly digitised world, the creation, use and distribution of software and data have become essential activities for individuals, businesses and government organisations. However, behind these everyday practices lies a crucial aspect: the licensing of both software and data.
Understanding what licences are, their types and their importance is essential to ensure legal and ethical use of digital resources. In this article, we will explore these concepts in a simple and accessible way, as well as discuss a valuable tool called Joinup Licensing Assistant, developed by the European Union.
What are licences and why are they important?
A licence is a legal agreement that grants specific permissions on the use of a digital product, be it software, data, multimedia content or other resources. This agreement sets out the conditions under which the product may be used, modified, distributed or marketed. Licences are essential because they protect the rights of creators, ensure that users understand their rights and obligations, and foster a safe and collaborative digital environment.
The following are some examples of the most popular ones, both for data and software.
Common types of licences
Copyright
Copyright is an automatic protection which arises at the moment of the creation of an original work, be it literary, artistic or scientific. It is not necessary to formally register the work in order for it to be protected by copyright. This right grants the creator exclusive rights over the reproduction, distribution, public communication and transformation of their work.
Example: When a company creates a dataset on, for example, construction trends, it automatically owns the copyright on that data. This means that others may not use, modify or distribute such data without the explicit permission of the creator.
Public domain
When a work is not protected by copyright, it is considered to be in the public domain. This may occur because the rights have expired, the author has waived them or because the work does not meet the legal requirements for protection. For example, a work that lacks sufficient originality - such as a telephone list or a standard form - does not qualify for protection. Works in the public domain may be used freely by anyone, without the need to obtain permission.
Example: Many classic works of literature, such as those of William Shakespeare, are in the public domain and can be freely reproduced and adapted.
Creative Commons
The Creative Commons licences offer a flexible way to grant permissions for the use of copyrighted works. These licences allow creators to specify which uses they do and do not allow, facilitating the dissemination and re-use of their works under clear conditions. The most common CC licences include:
- CC BY (Attribution): permits the use, distribution and creation of derivative works, provided credit is given to the original author.
- CC BY-SA (Attribution-Share Alike): in addition to attribution, requires that derivative works be distributed under the same licence.
- CC BY-ND (Attribution-No Derivative Works): permits redistribution, commercial and non-commercial, provided the work remains intact and credit is given to the author.
- CC0 (Public Domain): allows creators to waive all rights to their works, allowing them to be used freely without attribution.
These licences are especially useful for creators who wish to share their works while retaining certain rights over their use.
GNU General Public License (GPL)
The GNU General Public License (GPL), created by the Free Software Foundation, guarantees that software licensed under its terms will always remain free and accessible to everyone. This licence is specifically designed for software, not data. It aims to ensure that the software remains free, accessible and modifiable by any user, protecting the freedoms related to its use and distribution.
This licence not only allows users to use, modify and distribute the software, but also requires that any derivative works retain the same terms of freedom. In other words, any software that is distributed or modified under the GPL must remain free for all its users. The GPL is designed to protect four essential freedoms:
- The freedom to use the software for any purpose.
- The freedom to study how the software works and adapt it to specific needs.
- The freedom to distribute copies of the software to help others.
- The freedom to improve the software and release the improvements for the benefit of the community.
One of the key features of the GPL is its "copyleft" clause, which requires that any derivative works be licensed under the same terms as the original software. This prevents free software from becoming proprietary and ensures that the original freedoms remain intact.
Example: Suppose a company develops a programme under the GPL and distributes it to its customers. If any of these customers decide to modify the source code to suit their needs, it is their right to do so. In addition, if the company or customer wishes to redistribute modified versions of the software, they must do so under the same GPL licence, ensuring that any new user also enjoys the original freedoms.
European Union Public Licence (EUPL)
The European Union Public Licence (EUPL) is a free and open source software licence developed by the European Commission. Designed to facilitate interoperability and cooperation between European software, the EUPL allows the free use, modification and distribution of software, ensuring that derivative works are also kept open. In addition to covering software, the EUPL can be applied to ancillary documents such as specifications, user manuals and technical documentation.
Although the EUPL is used for software, in some cases it may be applicable to datasets or content (such as text, graphics, images, documentation or any other material not considered software or structured data), but its use in open data is less common than other specific licences such as Creative Commons or Open Data Commons.
Open Data Commons (ODC-BY)
The Open Data Commons Attribution License (ODC-BY) is a licence designed specifically for databases and datasets, developed by the Open Knowledge Foundation. It aims to allow free use of data, while requiring appropriate acknowledgement of the original creator. This licence is not designed for software, but for structured data, such as statistics, open catalogues or geospatial maps.
ODC-BY allows users to:
- Copy, distribute and use the database.
- Create derivative works, such as visualisations, analyses or derivative products.
- Adapt data to new needs or combine them with other sources.
The only main condition is attribution: users must credit the original creator appropriately, including clear references to the source.
A notable feature of the ODC-BY is that it does not impose a copyleft clause, meaning that derived data can be licensed under other terms, as long as attribution is maintained.
Example: Imagine that a city publishes its bicycle station database under ODC-BY. A company can download this data, create an app that recommends cycling routes and add new layers of information. As long as the company clearly indicates that the original data comes from the municipality, it can offer its app under any licence it wishes, even on a commercial basis.
A comparison of these most commonly used licences allows us to better understand their differences:
Licence | Allows commercial use | Allows modification | Requires attribution | Allows derivative works | Applicable to data | Specialisations |
Copyright | Yes, with permission of the author | No, except by agreement with the creator | No | No | It can be applied to databases, but only if they meet certain requirements of creativity and originality in their structure or selection of content. It does not protect the data itself, but the way it is organised or presented. | Original works such as texts, music, films, software and, in some cases, databases whose structure or selection is creative. It does not protect the data itself. |
Public domain | Yes | Yes | No | Yes | Yes | Original works such as texts, music, films and software without copyright protection (by expiration, waiver, or legal exclusion) |
Creative Commons BY (Attribution) | Yes | Yes, with attribution | Yes | Yes | Yes | Reusable text, images, videos, infographics, web content and datasets, provided that authorship is acknowledged |
Creative Commons BY-SA (Attribution-ShareAlike) | Yes | Yes, you must keep the same licence | Yes | Yes, with the same licence | Yes | Collaborative content such as articles, maps, datasets or open educational resources; ideal for community projects |
Creative Commons BY-ND (Attribution-NoDerivatives) | Yes | No | Yes | No | Yes, but it is forbidden to modify or combine the data. | Content to be preserved unaltered: official documents, closed infographics, unalterable data sets, etc. |
Creative Commons CC0 (Public domain) | Yes | Yes | No | Yes | Yes | All kinds of works: texts, images, music, data, software, etc., which are voluntarily released into the public domain. |
GNU General Public License (GPL) | Yes | Yes, it should be kept under the GPL | Yes | Yes | No | Executable software or source code. Not suitable for documentation, multimedia content or databases. |
European Union Public Licence (EUPL) | Yes | Yes, derivative works should remain open | Yes | Yes | Partially: could be used for technical data, but is not its main purpose | Software developed by public administrations and its associated technical documentation (manuals, specifications, etc.). |
Open Data Commons (ODC-BY) | Yes | Yes | Yes | Yes | Yes (specifically designed for open data) | Structured databases such as public statistics, geospatial arrays, open catalogues or administrative registers |
Figure 1. Comparative table. Source: own elaboration
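The comparison above can also be expressed as data, which makes it easier to shortlist licences for a given need. The following is only an illustrative sketch: the `LicenceProfile` class and the `shortlist` function are hypothetical names, the table's answers are reduced to yes/no (the EUPL's "partially" applicable to data is treated as "no"), and the commercial-use column is omitted because every listed licence allows it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LicenceProfile:
    name: str
    requires_attribution: bool
    allows_derivatives: bool
    share_alike: bool      # derivatives must stay under the same or equally open terms
    suited_to_data: bool   # simplified: "partially" in the table is recorded as False


# Simplified transcription of the comparison table (assumption: yes/no only).
LICENCES = [
    LicenceProfile("CC BY", True, True, False, True),
    LicenceProfile("CC BY-SA", True, True, True, True),
    LicenceProfile("CC BY-ND", True, False, False, True),
    LicenceProfile("CC0", False, True, False, True),
    LicenceProfile("GPL", True, True, True, False),
    LicenceProfile("EUPL", True, True, True, False),
    LicenceProfile("ODC-BY", True, True, False, True),
]


def shortlist(*, for_data: bool, need_derivatives: bool,
              allow_share_alike: bool = True) -> list[str]:
    """Return the names of licences compatible with the stated needs."""
    return [
        lic.name
        for lic in LICENCES
        if (not for_data or lic.suited_to_data)
        and (not need_derivatives or lic.allows_derivatives)
        and (allow_share_alike or not lic.share_alike)
    ]


# A data publisher who wants derivatives but no share-alike obligation:
print(shortlist(for_data=True, need_derivatives=True, allow_share_alike=False))
```

A shortlist like this is a starting point for discussion, not legal advice; the full licence texts remain the reference.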
Why is it necessary to use licences in the field of open data?
In the field of open data, these licences are essential to ensure that data is available for public use, promoting transparency, innovation and the development of data-driven solutions. In general, the advantages of using clear licences are:
- Transparency and open access: clear licences allow citizens, researchers and developers to access and use public data without undue restrictions, fostering government transparency and accountability.
- Fostering innovation: by enabling the free use of data, open data licences facilitate the creation of applications, services and analytics that can generate economic and social value.
- Collaboration and reuse: licences that allow for the reuse and modification of data encourage collaboration between different entities and disciplines, fostering the development of more robust and complete solutions.
- Improved data quality: the availability of open data encourages greater community participation and review, which can lead to an improvement in the quality and accuracy of the data available.
- Legal certainty for the re-user: clear licences provide confidence and certainty to those who re-use data, as they know they can do so legally and without fear of future conflicts.
Introduction to the Joinup Licensing Assistant
In this complex licensing landscape, choosing the right one can be a daunting task, especially for those with no previous experience in licence management. This is where the Joinup Licensing Assistant, a tool developed by the European Union and available at Joinup.europa.eu, comes in. This collaborative platform is designed to promote the exchange of solutions and best practices between public administrations, companies and citizens, and the Licensing Assistant is one of its star tools.
For those working specifically with data, you may also find useful the report published by data.europa.eu, which provides more detailed recommendations on the selection of licences for open datasets in the European context.
The Joinup Licensing Assistant offers several features and benefits that simplify licence selection and management:
Functionality | Benefits |
Customised advice: recommends suitable licences according to the type of project and your needs. | Simplifying the selection process: breaks down the choice of licence into clear steps, reducing complexity and time. |
Licence database: access to software licences, content and data, with clear descriptions. | Legal risk reduction: avoids legal problems by providing recommendations that are compatible with project requirements. |
Comparison of licences: allows you to easily see the differences between various licences. | Fostering collaboration and knowledge sharing: facilitates the exchange of experiences between users and public administrations. |
Legal update: provides information that is always up to date with current legislation. | Accessibility and usability: intuitive interface, useful even for those with no legal knowledge. |
Open data support: includes specific options to promote reuse and transparency. | Supporting the sustainability of free software and open data: promotes licences that drive innovation, openness and continuity of projects. |
Figure 2. Table of functionality and benefits. Source: own elaboration
Various sectors can benefit from the use of the Joinup Licensing Assistant:
- Public administrations: to apply correct licences on software, content and open data, complying with European standards and encouraging re-use.
- Software developers: to align licences with their business models and facilitate distribution and collaboration.
- Content creators: to protect their rights and decide how their work can be used and shared.
- Researchers and scientists: to publish reusable data to drive collaboration and scientific advances.
Conclusion
In an increasingly interconnected and regulated digital environment, using appropriate licences for software, content and especially open data is essential to ensure the legality, sustainability and impact of digital projects. Proper licence management facilitates collaboration, reuse and secure dissemination of resources, while reducing legal risks and promoting interoperability.
In this context, tools such as the Joinup Licensing Assistant offer valuable support for public administrations, companies and citizens, simplifying the choice of licences and adapting it to each case. Their use contributes to creating a more open, secure and efficient digital ecosystem.
Particularly in the field of open data, clear licences make data truly accessible and reusable, fostering institutional transparency, technological innovation and the creation of social value.
Content prepared by Mayte Toscano, Senior Consultant in Data Economy Technologies. The contents and points of view reflected in this publication are the sole responsibility of the author.
In the more traditional conception of the right of access and transparency of public sector entities, obtaining information requires, in advance, the processing of an administrative procedure that ends with the corresponding resolution by which the requested information is granted or denied. However, in the model based on open data, a substantial change occurs: on the one hand, the request and corresponding resolution are only considered as a residual measure; and, on the other hand, access to the data will take place without the need for a formalised administrative act.
In this regard, Law 37/2007 contemplates both possibilities, expressly enabling public administrations and bodies to provide standard licenses in digital format that can be processed automatically. It also states a preference for those types of licenses that establish the minimum restrictions and establishes the minimum content that must be incorporated:
- information concerning the specific purpose for which the re-use is granted
- if reuse for commercial purposes is allowed
- the duration of the license
- the obligations of each party, as well as the responsibilities for use
- the free nature of the re-use or, where appropriate, the applicable fee
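To illustrate what a licence "in digital format that can be processed automatically" might look like, the sketch below models the minimum content listed above as a small record and checks that nothing is missing. The class name, field names and sample values are assumptions made for this example; they are not taken from Law 37/2007 or from any official schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ReuseLicence:
    """Hypothetical record holding the minimum content of a re-use licence."""
    purpose: str                         # specific purpose of the re-use
    commercial_use_allowed: bool         # whether commercial re-use is allowed
    duration: str                        # e.g. "indefinite" or an end date
    obligations: dict[str, list[str]]    # party -> obligations
    use_responsibilities: str            # responsibilities for use
    fee: Optional[str] = None            # None means the re-use is free of charge

    def missing_fields(self) -> list[str]:
        """Names of mandatory text fields that were left empty."""
        required = ("purpose", "duration", "obligations", "use_responsibilities")
        return [name for name in required if not getattr(self, name)]


licence = ReuseLicence(
    purpose="Any lawful purpose",
    commercial_use_allowed=True,
    duration="indefinite",
    obligations={"re-user": ["cite the source"], "publisher": ["keep data available"]},
    use_responsibilities="The re-user is solely responsible for the use made of the data",
)

print(licence.missing_fields())             # an empty list means the record is complete
print(json.dumps(asdict(licence), indent=2))  # machine-readable form of the licence
```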
In the case of the General State Administration, the general rule is the availability of the data without any specific conditions, simply by complying with a series of general requirements:
- cite the source of the data
- indicate the date of the last update, where appropriate, via metadata
- do not distort the meaning of the information
- keep the metadata on the applicable conditions for reuse
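Two of these requirements, citing the source and preserving the metadata on reuse conditions, lend themselves to automation. The following sketch shows one possible way to do so; the function names and the `reuse_conditions` metadata key are hypothetical and do not correspond to any official format.

```python
from __future__ import annotations

from datetime import date
from typing import Optional


def attribution_notice(source: str, last_updated: Optional[date] = None) -> str:
    """Build a notice citing the source and, when known, the last update date."""
    notice = f"Source of the data: {source}."
    if last_updated is not None:
        notice += f" Last updated: {last_updated.isoformat()}."
    return notice


def derive_metadata(original: dict, **changes) -> dict:
    """Create metadata for a derived dataset without losing the reuse conditions.

    'reuse_conditions' is an assumed key name used only for this example.
    """
    derived = {**original, **changes}
    # The conditions set by the publisher always travel with the data.
    derived["reuse_conditions"] = original["reuse_conditions"]
    return derived


original_meta = {
    "title": "Example dataset",
    "reuse_conditions": "General conditions of re-use apply",
}
derived_meta = derive_metadata(original_meta, title="Example dataset (aggregated)",
                               reuse_conditions="overwritten by mistake")

print(attribution_notice("Example Ministry", date(2024, 1, 15)))
print(derived_meta["reuse_conditions"])
```

The requirement not to distort the meaning of the information is a matter of judgment and cannot be checked by code like this.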
Consequently, except in the exceptional cases in which it is necessary to make a request or there is a specific regime with certain additional requirements, the general conditions of re-use for the state public sector area will be applicable to those who intend to re-use data provided by entities in this area, including processing such as copying, dissemination, modification, adaptation, extraction, reordering and combination of information.
Despite the advantages of using licenses, there is no tradition in the Spanish public sector of standardizing the conditions for reusing information through this instrument. Rather, it is a figure typical of the Anglo-Saxon legal context which, through European Union law, has been incorporated into the regulation on the re-use of public sector information and open data. However, licences can be a very useful tool in facilitating the integration of data from different sources. On the one hand, they promote interoperability in legal terms, since they simplify the analysis and comparison of the conditions to which re-users are subject. On the other hand, they make the automated processing of the conditions under which reuse can take place more dynamic and less formal, thereby reducing the need for manual checks on whether the data can be used in each specific case according to what each entity has established under its own criteria.
The preferential option of setting general conditions for reuse means that, except in the publishing field and in particular for journals, the use of licences by the public sector is not very widespread in Spain; perhaps because it is a legal figure that is alien to our cultural tradition based on formal institutions such as the act and the administrative procedure, that is, on the unilateral decision of the Administration, which must take into account the circumstances of the specific case. In fact, the term license is normally used to refer to an administrative act by which a private activity is permitted or, in the case of public goods, their use under certain conditions.
Given the low level of implementation of licences in Spain in the field of re-use of public sector information - except for some regional and municipal initiatives - it could be asked to what extent the aforementioned general conditions are compatible with the most widespread licences, so that this analysis serves as a reference when assessing their approximate equivalence. The case of the Creative Commons licences is of particular interest, as these are the licences adopted by the European Commission following the comparative study carried out previously.
By way of example, the conditions established at the state level -given their greater projection- could be compared with the aforementioned licenses which, moreover, from version 4.0 onwards include not only content but also data. In this regard, as shown graphically in the table below, the possibilities granted by the Creative Commons licenses - summarized in the left column - must be contrasted with the conditions set by law - briefly explained in the right column - both in the articles of Law 37/2007 (LRISP) and in Royal Decree 1495/2011 (RDRISP):
Source: Clabo, N.; Ramos-Vielba, I. (2015). Reuse of open data in the public administration in Spain and use of model licenses. Revista Española de Documentación Científica, 38 (3): e097, doi: http://dx.doi.org/10.3989/redc.2015.3.1206
Therefore, even if the conditions established by the Spanish legislation for the re-use of public sector information have a wider scope in terms of content, it has been considered that there is a substantial equivalence between those conditions and the ones contemplated by these licenses, in particular version CC BY 4.0. In any case, the complete wizard of the European Data Portal is a very useful tool when it comes to making an exhaustive comparison of the conditions for reusing public sector information in Spain with each of the specific types of license that exist beyond the aforementioned.
Despite the doubts raised by some of the provisions of Directive 2019/1024 in this area, its forthcoming transposition and its clear commitment to the use of licences suggest that the time has come to open the legal debate in Spain, once and for all, on their use in the field of re-use of public sector information and open data.
Content prepared by Julián Valero, professor at the University of Murcia and Coordinator of the Research Group "Innovation, Law and Technology" (iDerTec).
Contents and points of view expressed in this publication are the exclusive responsibility of its author.
One of the main difficulties in promoting the reuse of public sector information is the diversity of licenses. Given the absence of a general obligation, each public entity can decide the legal conditions of access and subsequent reuse, taking into account the legal preference for open licenses that establish the minimum possible restrictions.
In any case, there are no clear guidelines on how to use licenses, so that each entity could establish the conditions without having to resort to said instrument. Nor is there an unequivocal legal criterion that allows public entities to choose a certain type of license over another; which ultimately implies that the decision is normally based on criteria of opportunity or, where appropriate, based on the prior conditioning of the document management carried out internally by the entity (support in which it is carried out, formats used, respect for the interoperability rules ...).
Based on these premises, it is of great importance to establish specific criteria for the adoption of such decisions, in particular through the approval of a legal norm in the strict sense. This is the case, for example, in the field of the General State Administration, where some general rules for the provision of data have been established by regulation and, likewise, the preference for openness without conditions has been established unless there is an adequate motivation to justify the option for a reuse regime subject to them. Outside the strictly normative sphere, it is also possible to establish mere guidelines that, although not binding, can certainly be useful for making general preferences known in each administrative area.
This problem is reproduced and even multiplied at the European level. Indeed, although the new Directive 2019/1024 has established specific rules on the use of licenses, it leaves wide discretion to the Member States when they approve their own rules since, in addition to simply urging them to encourage the use of standard licenses, it only establishes an obligation to ensure (Article 9.2) that the standard licenses for the reuse of public sector documents, which may be adapted to respond to specific applications of the license, are available in digital format and can be processed electronically.
Consequently, Member States are free to establish conditions without using licenses; to configure their own licenses adapted to specific measures; or, where appropriate, to contemplate the use of standard licenses. However, in the absence of the aforementioned guidelines and standards, in principle there will be no objective and predetermined criteria for setting the conditions for access and reuse of information. This affects not only the public entities themselves but, in particular, those who aim to promote a specific project based on reuse, whether for commercial purposes or with a political or social objective, all the more so because multiple alternatives are possible depending on how the various criteria are combined. Specifically, beyond the necessary attribution of authorship (recognition), it would be necessary to assess, among other circumstances, whether or not commercialization is allowed; whether the result may only be disseminated under the same legal conditions in which the data is provided; or, without being exhaustive, whether modifications, adaptations or even translations are admitted as a result of data processing.
To cope with this difficulty, multiple studies and explanations have been prepared that, both from an academic perspective and, also, from a decided practical approach, are intended to help understand the scope of each type of license, which is especially interesting when the analysis attempts to systematize the existing practice in each of the States of the European Union. However, normally such instruments suffer from an excessively rigid approach, which hinders their use and limits their usefulness, hence the importance of promoting dynamic initiatives that effectively facilitate the understanding of the scope of each of the various types of licenses.
This is precisely the added value of the licensing assistant that the European Data Portal has launched. It is a tool that allows advanced searches combining the choice of specifications and conditions of use and, in addition, offers systematized information in very intuitive formats, such as colour coding and clear, easy-to-understand summary sheets.
Specifically, the assistant allows you to make multiple advanced assignment settings by combining three criteria:
- The obligations that the license entails, a criterion that is in turn broken down into several sub-criteria (lesser copyleft, attribution, sharealike, notice, copyleft and status changes).
- The permissions granted, a criterion that is also broken down into five more precise options (derivative works, distribution, reproduction, sublicensing and use of patent claims).
- The prohibition or authorization of commercial uses of the data.
All these criteria can be activated in the configuration chosen when using the assistant, so you can search for licenses that meet only one of them or several. Each search can combine elements from one, two or all three of the main criteria, depending on the degree of precision required. If several search criteria are legally incompatible with each other, no result is returned. This allows an intuitive and efficient exploration of the licenses available for a specific case through searches that would otherwise be complex.
In addition, the assistant links each type of license to a summary sheet where its limitations and possibilities are presented in a simple and easy-to-understand way. Besides indicating the possibilities, obligations and prohibitions by assigning intuitive colours to each alternative, the sheet sometimes includes a succinct explanation to make each of them easier to understand. Finally, each sheet links to the other comparable licenses, so that the comparison can be carried out in a simple way.
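The search logic described above can be sketched as a filter over a licence catalogue: a licence matches when it carries every selected obligation, grants every selected permission and has the requested commercial-use setting. This is only an approximation of how such an assistant might work, not its actual implementation; the catalogue entries are invented placeholders rather than real licences.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LicenceCard:
    name: str
    obligations: frozenset[str]
    permissions: frozenset[str]
    commercial_use: bool


# Invented placeholder entries, for illustration only.
CATALOGUE = [
    LicenceCard("Example-Attribution",
                frozenset({"attribution", "notice"}),
                frozenset({"derivative works", "distribution", "reproduction"}),
                True),
    LicenceCard("Example-ShareAlike",
                frozenset({"attribution", "sharealike"}),
                frozenset({"derivative works", "distribution", "reproduction"}),
                True),
    LicenceCard("Example-NonCommercial",
                frozenset({"attribution"}),
                frozenset({"distribution", "reproduction"}),
                False),
]


def search(catalogue: Iterable[LicenceCard],
           obligations: Iterable[str] = (),
           permissions: Iterable[str] = (),
           commercial_use: Optional[bool] = None) -> list[str]:
    """Return licences satisfying every selected criterion; unset criteria are ignored."""
    wanted_obligations = set(obligations)
    wanted_permissions = set(permissions)
    return [
        card.name
        for card in catalogue
        if wanted_obligations <= card.obligations
        and wanted_permissions <= card.permissions
        and (commercial_use is None or card.commercial_use == commercial_use)
    ]


# Combining all three criteria narrows the result.
print(search(CATALOGUE, obligations=["attribution"],
             permissions=["derivative works"], commercial_use=True))
# A combination that no licence in the catalogue satisfies yields no result.
print(search(CATALOGUE, obligations=["sharealike"], commercial_use=False))
```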
Some advanced functions could still be added, such as linking licenses to specific initiatives and projects that use them, to make them easier to understand, or a deeper analysis highlighting the main advantages and disadvantages of each type of license. Even so, it is an instrument of unquestionable added value for public authorities exploring potential configurations of the conditions of use through licenses: they can use the assistant to run multiple simulations when deciding, on the basis of diverse criteria, which specific type of license to choose in each case.
With regard to reusers, the tool facilitates the effective understanding of each type of license, helping to determine which obligations are assumed and which limitations apply to the processing of information.
In short, the assistant is a remarkable effort to facilitate the interoperability of licenses in legal terms and can serve as a basis for more complete future initiatives, such as the one being promoted within the framework of the Joinup platform, where a suggestive initial working document has already been generated.
Content prepared by Julián Valero, professor at the University of Murcia and Coordinator of the Research Group "Innovation, Law and Technology" (iDerTec).
Contents and points of view expressed in this publication are the exclusive responsibility of its author.
In the regulation of the right of access to public sector information, making a request is essential to obtain the data. This request gives rise to the corresponding administrative procedure, so that the user could obtain the information only after the appropriate resolution. Likewise, the legislation on transparency has also established important obligations of active publicity, that is, cases in which the information must be made available electronically without the need for an application to be submitted.
Consequently, in these cases it is possible to affirm the existence of an authentic right of citizens to obtain the information, unless there is another legal good that must prevail. However, it should be taken into account that access does not in itself imply the right to use the information obtained for any purpose and in any case. As an example, there may be restrictions from the perspective of the protection of personal data, in particular as regards their use for purposes incompatible with those that initially justified their dissemination, which would be unlawful under Article 5 of the General Data Protection Regulation.
However, this approach is not applicable, in theory, in the area of the reuse of public sector information. In fact, despite the progress made in 2013, the truth is that there is no real legally established right in this area and, on the other hand, each public entity can decide under what conditions it is possible to proceed with the reuse of the information. Specifically, notwithstanding that the general requirements of article 8 of Law 37/2007 are also applicable, they may resort to one of the following options:
- Facilitate reuse without establishing additional conditions.
- Demand the submission of an application that, therefore, will lead to a formalized procedure that will end with the corresponding resolution.
- Proceed with the subscription of an exclusive agreement, although it is a possibility that is subject to significant restrictions.
- Opt for an early predetermination of the reuse conditions, in which case the public entity will proceed to publish a license electronically. This may have the status of a standard license, so that it allows any subject and for any purpose to access the data, use it, modify it and share it for free.
Thus, licenses are an instrument of high importance to facilitate the reuse of public sector information. In fact, the new 2019 Directive is firmly committed to standard licenses and requires that the established conditions are “objective, proportionate, non-discriminatory and justified by a public interest objective”; so that in theory it would only be possible to contemplate restrictions for justified reasons and provided that they do not entail a restriction of competition. In the case of Spanish regulation, the imposition of conditions that limit reuse is also contemplated only exceptionally, so that the restrictions must be the minimum possible and in no case can they be discriminatory for comparable categories of reuse.
Although there is a wide typological diversity in terms of licenses and numerous initiatives in the international arena, the European Union offers a very useful assistant when choosing one modality or another. In any case, when any public entity in Spain opts for one of them, it must take into account some essential legal requirements. In general, it is necessary to opt for open licenses, that is to say, that contain the minimum restrictions, although there is a minimum content for licenses that cannot be ignored, since it must include:
- the specific purpose for which reuse is permitted, indicating where appropriate whether commercial activities are allowed
- the duration of the license
- the obligations assumed by both the beneficiary and the publishing body
- the responsibilities arising from use
- whether reuse is free of charge or, where appropriate, the applicable rate.
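The minimum content listed above can be pictured as a simple checklist. The following is only an illustrative sketch in Python: the class name, the field names and the helper function are all hypothetical and are not taken from Law 37/2007 or from any official tool. It merely shows how a draft license could be checked for missing elements.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ReuseLicense:
    """Hypothetical record holding the minimum content of a reuse license."""
    purpose: str = ""                               # specific purpose for which reuse is permitted
    commercial_use_allowed: Optional[bool] = None   # whether commercial activities are allowed
    duration: str = ""                              # duration of the license
    beneficiary_obligations: str = ""               # obligations assumed by the beneficiary
    publisher_obligations: str = ""                 # obligations assumed by the publishing body
    usage_responsibilities: str = ""                # responsibilities arising from use
    fee: Optional[str] = None                       # "free" or the applicable rate


def missing_elements(license_: ReuseLicense) -> List[str]:
    """Return the names of the fields that have been left empty."""
    return [
        f.name
        for f in fields(license_)
        if getattr(license_, f.name) in ("", None)
    ]


# A draft that still lacks the obligations and responsibilities.
draft = ReuseLicense(
    purpose="Any lawful purpose",
    commercial_use_allowed=True,
    duration="Indefinite",
    fee="free",
)
print(missing_elements(draft))
```

A check of this kind only verifies that each element has been filled in; whether the wording is legally adequate remains a matter for the publishing entity.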
Admittedly, open licenses and public licenses cannot be put on the same level because, as noted, certain restrictions may be justified, and such restrictions prevent a public license from being considered open. Moreover, different entities may apply different criteria when opening the same data set, which poses an additional problem when it comes to extending the value-added services that the reuser can offer. In that case, the reuser would be forced to adapt to different legal environments.
Consequently, it is essential to promote effective policies at European level that address interoperability not only from the technical and organizational perspectives but also from the point of view of legal requirements. To this end, the particular features of licenses in the field of public sector information must be taken into account. At the same time, in line with the recent European regulation, the open licensing model must be promoted as a priority in order to overcome the existing legal difficulties that can ultimately hinder digital transformation initiatives based on big data and artificial intelligence. In this regard, the conditions of use are usually set out in a text notice on the entity's website. However, the current scenario of automated interconnections between devices and applications requires a more dynamic model that, beyond the aforementioned interoperability premises, also takes into account the need for reuse conditions, and therefore licenses, to be available in machine-readable formats.
Thus, this is a process that will require work over the next few years, especially as national regulation is adapted to the European reform.
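To illustrate what machine-readable reuse conditions could look like in practice, here is a minimal Python sketch. It assumes that a dataset's metadata is published as JSON with a "license" field holding a license URI. The metadata record, the field name and the small lookup table are assumptions made for this example, not a description of any particular portal.

```python
import json

# Hypothetical table of licenses that the reusing application knows how to handle.
KNOWN_LICENSES = {
    "https://creativecommons.org/licenses/by/4.0/": {
        "open": True,
        "attribution_required": True,
    },
    "https://creativecommons.org/publicdomain/zero/1.0/": {
        "open": True,
        "attribution_required": False,
    },
}


def reuse_conditions(metadata_json: str):
    """Return the known conditions for the dataset's license, or None.

    None means the license is missing or unknown, so a person would
    have to read the conditions of use before reusing the data.
    """
    record = json.loads(metadata_json)
    return KNOWN_LICENSES.get(record.get("license"))


# Made-up metadata record for illustration.
SAMPLE = (
    '{"title": "Example dataset", '
    '"license": "https://creativecommons.org/licenses/by/4.0/"}'
)

print(reuse_conditions(SAMPLE))
```

When the conditions are only available as a text notice on a web page, the function has nothing to look up and automated reuse stops at that point, which is the difficulty described above.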
Content prepared by Julián Valero, professor at the University of Murcia and Coordinator of the Research Group "Innovation, Law and Technology" (iDerTec).
Contents and points of view expressed in this publication are the exclusive responsibility of its author.
In the digital world, data has become a fundamental asset for companies. Thanks to data, they can better understand their environment, business and competition, and make appropriate decisions at the right time.
In this context, it is not surprising that an increasing number of companies are looking for professional profiles with advanced digital capabilities: workers who are able to search for, find and process data, and to communicate compelling stories based on it.
The report "How to generate value from data: formats, techniques and tools to analyse open data" aims to guide those professionals who wish to improve the digital skills highlighted above. It explores different techniques for the extraction and descriptive analysis of the data contained in the open data repositories.
The document is structured as follows:
- Data formats. Explanation of the most common data formats that can be found in an open data repository, paying special attention to CSV and JSON.
- Mechanisms for data sharing through the Web. Collection of practical examples that illustrate how to extract data of interest from some of the most popular Internet repositories.
- Main licenses. The factors to be considered when working with different types of licenses are explained, guiding the reader towards their identification and recognition.
- Tools and technologies for data analysis. This section becomes slightly more technical. It shows different examples of extracting useful information from open data repositories, making use of some short code fragments in different programming languages.
- Conclusions. A technological vision of the future is offered, with an eye on the youngest professionals, who will be the workforce of the future.
The report is aimed at a general, non-specialist public, although readers familiar with data processing and sharing on the web will find much of it familiar.
The full text, the executive summary and a presentation are available for download.
Note: The published code is intended as a guide for the reader, but may require external dependencies or specific settings for each user who wishes to run it.
News about intellectual property, copyright and licenses for works published on the Internet appears regularly.
The Internet has long been seen as a large open and public space where everything is shared by all. But this is not the case: content on the web is also subject to the law and to intellectual property.
First, when you create a work (a painting, a piece of writing...), it is covered by intellectual property, which can be defined as the set of rights that authors have over their creations.
Authors' rights are divided into:
- Moral rights: They serve to protect the authorship of the work. These rights cannot be assigned, sold or transferred, nor do they expire over time.
- Patrimonial (economic) rights: They serve to regulate the exploitation of the work (remuneration for use, reproduction, modification...). These rights can be assigned, sold or transferred. The set of exploitation rights is made up of the rights of reproduction, distribution, public communication and transformation.
Therefore, anyone who wants to use a work that I have created would have to ask me for permission. Licenses provide a mechanism for stating explicitly the permissions I give others to use my works, so that they do not need to ask for permission each time they want to use them.
In more detail, a license is an express declaration made by the owner of the economic rights of a work to indicate the limits and scope of the use that other people can make with respect to the copying, reproduction, transformation, distribution of their work, without having to be consulted each time.
In Spain, it is important to take two aspects into account:
- When you create a piece of work, it is not mandatory to register it since the author's rights are linked to it with the simple creation of the work.
- If no license is indicated, by default, all exploitation rights of a work are reserved (copyright).
In contrast to "all rights reserved", there is a set of licenses called "open licenses", created with the goal of promoting the free use and distribution of works. They may require licensees to preserve the same freedoms when distributing copies and derivatives.
Choosing a license is not trivial and can take time. For this reason, there are websites that help us choose a license for our works and data through assistants, such as Licentia, created by the Institut National de Recherche en Informatique et en Automatique (INRIA), a French research center specialized in computer science, control theory and applied mathematics [link http://licentia.inria.fr/].
In the specific case of Linked Open Data, it is advisable to link the data to their licenses through URIs. On this topic, a project called RDFLicense [http://rdflicense.appspot.com/] has created a data set of the most common licenses expressed in RDF. As a result, not only does each license have its own URI, but, because the licenses are described with the Open Digital Rights Language (ODRL), it is also possible to run inferences and checks on them.
In conclusion, open data needs two things: data and openness. And for data to be open, it is essential that it is explicitly placed under an open license. If the data is not under an open license, it is not open data.
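As a rough illustration of the kind of check that a machine-readable license makes possible, the Python sketch below works on a small, hand-written policy that borrows ODRL terms (permission, prohibition, duty, and the actions reproduce, distribute, sell and attribute). The policy, its example.com URIs and the two helper functions are assumptions made for this example. It is not taken from the RDFLicense data set, and it ignores targets, constraints and the full ODRL evaluation rules.

```python
# Simplified, hand-written policy using ODRL-style terms (illustrative only).
POLICY = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "uid": "http://example.com/policy/1",
    "permission": [
        {"target": "http://example.com/dataset/1", "action": "reproduce"},
        {
            "target": "http://example.com/dataset/1",
            "action": "distribute",
            "duty": [{"action": "attribute"}],
        },
    ],
    "prohibition": [
        {"target": "http://example.com/dataset/1", "action": "sell"},
    ],
}


def is_permitted(policy: dict, action: str) -> bool:
    """True if the action is explicitly permitted and not prohibited."""
    permitted = {rule["action"] for rule in policy.get("permission", [])}
    prohibited = {rule["action"] for rule in policy.get("prohibition", [])}
    return action in permitted and action not in prohibited


def duties_for(policy: dict, action: str) -> list:
    """List the duties attached to the permission for the given action."""
    return [
        duty["action"]
        for rule in policy.get("permission", [])
        if rule["action"] == action
        for duty in rule.get("duty", [])
    ]


print(is_permitted(POLICY, "distribute"))  # permitted, with a duty attached
print(is_permitted(POLICY, "sell"))        # explicitly prohibited
print(duties_for(POLICY, "distribute"))
```

In this sketch any action that is not listed is treated as not permitted, which mirrors the "all rights reserved" default described earlier. A real application would more likely load the RDF description of the license with an RDF library than use a hand-written dictionary.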