10 public data repositories related to natural sciences and the environment

News date: 23-12-2021

10 public data repositories related to natural sciences and the environment: NASA Open Data Portal, Copernicus, Climate Data Online, AlphaFold Protein Structure Database, Free GIS DATA, GBIF (Global Biodiversity Information Facility), EDI Data Portal, PANGAEA, re3data, IRIS

Open data is fundamental in the field of science. Open data facilitates scientific collaboration and enriches research by giving it greater depth. Thanks to this type of data, we can learn more about our environment and carry out more accurate analyses to support decisions.

In addition to the resources included in general-purpose data portals, there is a growing number of open data repositories focused on specific areas of the natural sciences and the environment. In this article we present 10 of them.

NASA Open Data Portal

  • Publisher: NASA

The data.nasa.gov portal centralizes NASA's open geospatial data, generated over its long history of planetary, lunar and terrestrial missions. It has nearly 20,000 unique users per month and more than 40,000 datasets. Only a small share of these datasets are hosted directly on data.nasa.gov; in most cases the portal provides metadata and links to other NASA projects that hold the data.
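
Because most entries are metadata plus links, it can be useful to check which datasets actually have a file on the portal. US federal portals usually publish their catalog as a `data.json` file following the Project Open Data (DCAT-US) metadata schema; whether data.nasa.gov still exposes one, and at which URL, is an assumption to verify. The minimal sketch below works on a small, hypothetical catalog excerpt (all titles and URLs are placeholders) and sorts datasets by where their distributions live.

```python
import json
from urllib.parse import urlparse

# Hypothetical catalog excerpt in the Project Open Data (DCAT-US) style; titles and
# URLs are placeholders, and real entries carry many more fields.
SAMPLE_CATALOG = """
{
  "dataset": [
    {"title": "Rocket engine test data",
     "distribution": [{"downloadURL": "https://data.nasa.gov/files/rocket-tests.csv"}]},
    {"title": "Geologic map of Mars",
     "distribution": [{"accessURL": "https://maps.example.org/mars-geology"}]},
    {"title": "Metadata-only record", "distribution": []}
  ]
}
"""


def classify_datasets(catalog_text, host="data.nasa.gov"):
    """Split dataset titles into those with a distribution on `host` and the rest."""
    catalog = json.loads(catalog_text)
    hosted, external = [], []
    for ds in catalog.get("dataset", []):
        # Each distribution points to a file (downloadURL) or a landing page (accessURL).
        urls = [d.get("downloadURL") or d.get("accessURL") or ""
                for d in ds.get("distribution", [])]
        hosts = {urlparse(u).hostname for u in urls if u}
        (hosted if host in hosts else external).append(ds.get("title", ""))
    return hosted, external


hosted, external = classify_datasets(SAMPLE_CATALOG)
print(len(hosted), "hosted;", len(external), "linked elsewhere or metadata only")
```

Records with no distribution at all are counted with the externally linked ones, since neither offers a file on the portal itself.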

data.nasa.gov includes a wide range of subject matter, from data related to rocket testing to geologic maps of Mars. The data are offered in multiple formats, depending on each publisher.

The site is part of the Open Innovation Sites project, alongside api.nasa.gov, a space for sharing information about NASA APIs, and code.nasa.gov, which collects NASA's open source projects.

Copernicus

  • Publisher: Copernicus

Copernicus is the Earth observation program of the European Union. Led by the European Commission, with the collaboration of the member states and various European agencies and organizations, it collects, stores, combines and analyzes data obtained through satellite observation and in situ sensor systems on land, in the air and at sea.

It provides data through six services: emergency, security, marine monitoring, land monitoring, climate change and atmosphere monitoring. The two main access points to Copernicus satellite data are managed by ESA: the Copernicus Open Access Platform, which offers an API, and CSCDA (Copernicus Space Component Data Access). Other access points to Copernicus satellite data are managed by EUMETSAT.
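
As an illustration of what a query to the Open Access Platform (also known as the Copernicus Open Access Hub) looked like, the sketch below builds a search URL using the Solr-style `q` syntax the hub documented (`platformname`, `beginposition`, `footprint`). The endpoint URL is an assumption based on the hub's historical documentation and may no longer be in service; check ESA's current documentation before relying on it. The area of interest is a hypothetical polygon.

```python
from urllib.parse import urlencode

# Search endpoint of the Copernicus Open Access Hub as historically documented;
# treat it as an assumption and check current ESA documentation before use.
HUB_SEARCH_URL = "https://scihub.copernicus.eu/dhus/search"


def build_hub_query(platform, start, end, wkt_polygon, rows=100):
    """Build a search URL for `platform` products sensed between `start` and `end`
    (ISO 8601 timestamps) whose footprint intersects a WKT polygon."""
    q = (f"platformname:{platform} "
         f"AND beginposition:[{start} TO {end}] "
         f'AND footprint:"Intersects({wkt_polygon})"')
    params = {"q": q, "rows": rows, "start": 0, "format": "json"}
    return f"{HUB_SEARCH_URL}?{urlencode(params)}"


# Hypothetical area of interest (lon lat pairs, closed ring) around Madrid.
aoi = "POLYGON((-3.8 40.3,-3.5 40.3,-3.5 40.5,-3.8 40.5,-3.8 40.3))"
url = build_hub_query("Sentinel-2", "2021-06-01T00:00:00.000Z",
                      "2021-06-30T23:59:59.999Z", aoi)
print(url)
```

The hub required a free registered account, so a real request would also need HTTP basic authentication; the function only builds the URL.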

Climate Data Online

  • Publisher: NOAA (National Centers for Environmental Information)

Climate Data Online (CDO) from the US government agency NOAA provides free access to historical weather and climate data worldwide. Specifically, 26,000 datasets are offered, including daily, monthly, seasonal and annual measurements of parameters such as temperature, precipitation or wind, among others. Most of the data can be downloaded in CSV format.
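
Once a CSV file has been downloaded, a few lines of standard-library Python are enough to aggregate daily measurements into monthly values. The sketch below assumes a daily-summaries export with `DATE`, `TMAX` and `PRCP` columns; these column names and the sample values are hypothetical, and real exports may include more columns or use different names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a daily-summaries CSV; column names and values are
# assumptions for illustration, not an actual NOAA export.
SAMPLE_CSV = """STATION,DATE,TMAX,PRCP
ST001,2021-01-01,10.0,0.0
ST001,2021-01-02,12.0,3.5
ST001,2021-02-01,15.0,
ST001,2021-02-02,17.0,1.0
"""


def monthly_mean(csv_text, column):
    """Average `column` per YYYY-MM month, skipping empty (missing) values."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if not value:
            continue  # missing observation: do not count it as zero
        month = row["DATE"][:7]
        sums[month] += float(value)
        counts[month] += 1
    return {m: sums[m] / counts[m] for m in sorted(sums)}


print(monthly_mean(SAMPLE_CSV, "TMAX"))  # {'2021-01': 11.0, '2021-02': 16.0}
```

Skipping blank cells rather than treating them as zero matters for precipitation, where an empty value means "not reported", not "no rain".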

Users can access the information through several tools, including a search tool, an API and a map viewer that displays a wide variety of data in a single visualization environment, making it possible to relate variables to specific locations.
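
A request to the CDO API can be prepared as below. This is a sketch assuming the CDO Web Services v2 conventions as we understand them: a `data` endpoint with `datasetid`, `stationid`, `startdate` and `enddate` parameters, and a personal token (requested from NOAA) sent in a `token` header. Verify the base URL and parameter names against NOAA's documentation; the station ID is only an example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL and parameter names are assumptions based on the CDO Web Services v2
# documentation; verify them before use.
CDO_API = "https://www.ncei.noaa.gov/cdo-web/api/v2/data"


def cdo_request(token, dataset_id, station_id, start, end, limit=1000):
    """Build (but do not send) a request for one station's observations."""
    params = {
        "datasetid": dataset_id,   # e.g. "GHCND" for daily summaries
        "stationid": station_id,
        "startdate": start,        # YYYY-MM-DD
        "enddate": end,
        "units": "metric",
        "limit": limit,
    }
    # The token travels in an HTTP header, not in the query string.
    return Request(f"{CDO_API}?{urlencode(params)}", headers={"token": token})


req = cdo_request("YOUR_TOKEN", "GHCND", "GHCND:USW00094728",
                  "2021-01-01", "2021-01-31")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the JSON response is expected to contain the records in a `results` list.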

AlphaFold Protein Structure Database

  • Publisher: DeepMind and EMBL-EBI

AlphaFold is an artificial intelligence system developed by the company DeepMind that predicts the 3D structure of a protein from its amino acid sequence. In collaboration with the EMBL European Bioinformatics Institute (EMBL-EBI), DeepMind has created this database that provides the scientific community with free access to these predictions.

The first version covers the human proteome and the proteomes of other key organisms, but the plan is to keep expanding the database to cover a large proportion of all cataloged proteins (over 100 million). The data can be downloaded in mmCIF or PDB format, both widely supported by 3D structure visualization programs such as PyMOL and Chimera.
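
PDB files are plain text with fixed-width columns, so individual atoms can be read without a dedicated library. AlphaFold DB files store each residue's predicted confidence score (pLDDT) in the B-factor column. The sketch below extracts the C-alpha atoms from a few hypothetical ATOM records; for production work a structural biology library is a safer choice.

```python
# Three fixed-width ATOM records (hypothetical coordinates and scores).
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   MET A   1     -10.500   5.000   2.000  1.00 50.00",
    "ATOM      2  CA  MET A   1      -9.000   4.000   3.000  1.00 52.10",
    "ATOM      3  CA  ALA A   2      -7.500   3.250   2.125  1.00 91.30",
    "END",
])


def parse_ca_atoms(pdb_text):
    """Return (residue name, residue number, (x, y, z), B-factor) for each C-alpha
    atom, reading the fixed PDB columns via 0-based slices."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            atoms.append((
                line[17:20].strip(),                  # residue name, columns 18-20
                int(line[22:26]),                     # residue number, columns 23-26
                (float(line[30:38]),                  # x, columns 31-38
                 float(line[38:46]),                  # y, columns 39-46
                 float(line[46:54])),                 # z, columns 47-54
                float(line[60:66]),                   # B-factor (pLDDT in AlphaFold files)
            ))
    return atoms


for res, num, xyz, score in parse_ca_atoms(SAMPLE_PDB):
    print(res, num, xyz, score)
```

Slicing by column rather than splitting on whitespace keeps the parser correct when adjacent numeric fields run together, which the PDB format allows.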

Free GIS DATA

  • Publisher: Robin Wilson, GIS expert

Free GIS Data is the effort of Robin Wilson, a freelance expert in remote sensing, GIS, data science and Python. Here users can find a categorized list of links to over 500 websites offering freely available geographic datasets, all ready to be loaded into a Geographic Information System. You can find data on climate, hydrology, ecology, natural disasters, mineral resources, oil and gas, transportation and communications, or land use, among many other categories.

Users can contribute new datasets by sending them by email to robin@rtwilson.com.

GBIF (Global Biodiversity Information Facility)

  • Publisher: GBIF

GBIF is an intergovernmental initiative formed by countries and international organizations that collaborate to advance free and open access to biodiversity data. Through its nodes, participating countries provide data on species records based on common standards and open source tools. In Spain, the national node is GBIF-ES, sponsored by the Spanish Ministry of Science and Innovation and managed by the Spanish National Research Council (CSIC).

The data come from many sources, ranging from museum specimens collected in the 18th and 19th centuries to geotagged photos taken with smartphones and shared by amateur naturalists. GBIF currently holds more than 1.8 billion records and 63,000 datasets, of great value both to researchers studying biodiversity and to the general public. The data can also be accessed through the GBIF API.
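
The GBIF API exposes an occurrence search endpoint that returns JSON. The sketch below builds a query for georeferenced records of one species and extracts coordinates from a response. The response used here is a hypothetical, heavily trimmed example; real responses include paging fields and many more attributes per record.

```python
import json
from urllib.parse import urlencode

GBIF_OCCURRENCES = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(scientific_name, country=None, limit=20):
    """URL for georeferenced occurrence records of a species, optionally in one country."""
    params = {"scientificName": scientific_name, "hasCoordinate": "true", "limit": limit}
    if country:
        params["country"] = country  # ISO 3166-1 alpha-2 code, e.g. "ES"
    return f"{GBIF_OCCURRENCES}?{urlencode(params)}"


def coordinates(response_text):
    """Extract (latitude, longitude) pairs from an occurrence-search JSON response."""
    data = json.loads(response_text)
    return [(r["decimalLatitude"], r["decimalLongitude"])
            for r in data.get("results", [])
            if "decimalLatitude" in r and "decimalLongitude" in r]


# Hypothetical trimmed response: the second record lacks coordinates and is skipped.
sample = ('{"count": 2, "results": [{"decimalLatitude": 37.0, "decimalLongitude": -6.4},'
          ' {"country": "ES"}]}')
print(occurrence_search_url("Lynx pardinus", country="ES"))
print(coordinates(sample))  # [(37.0, -6.4)]
```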

EDI Data Portal

  • Publisher: Environmental Data Initiative (EDI)

The Environmental Data Initiative (EDI) promotes the preservation and reuse of environmental data, helping researchers archive and publish publicly funded research data. It does so following the FAIR principles and using the Ecological Metadata Language (EML) standard.
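
EML documents are XML, so their key metadata can be read with Python's standard `xml.etree.ElementTree`. In the EML documents we have seen, only the root `eml:eml` element is namespace-qualified, while children such as `dataset` and `title` are not; the sketch relies on that. The package ID, title and creator in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Minimal hypothetical EML 2.2.0 document; real packages contain far more metadata.
SAMPLE_EML = """<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
    packageId="edi.0.1" system="https://example.org">
  <dataset>
    <title>Example lake temperature profiles</title>
    <creator>
      <individualName><givenName>Ada</givenName><surName>Example</surName></individualName>
    </creator>
    <keywordSet><keyword>limnology</keyword><keyword>temperature</keyword></keywordSet>
  </dataset>
</eml:eml>"""


def summarize_eml(xml_text):
    """Return the package id, title, creator surnames and keywords of an EML document."""
    root = ET.fromstring(xml_text)
    dataset = root.find("dataset")  # child elements are not namespace-qualified
    return {
        "packageId": root.get("packageId"),
        "title": dataset.findtext("title", default="").strip(),
        "creators": [s.text for s in dataset.findall("creator/individualName/surName")],
        "keywords": [k.text for k in dataset.iter("keyword")],
    }


print(summarize_eml(SAMPLE_EML))
```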

The EDI data portal contains the contributed environmental and ecological data packages, which can be accessed through a search engine or API. Users should contact the data provider before using the data in any research. These data should be properly cited when used in a publication. A Digital Object Identifier (DOI) is provided for this purpose. 

PANGAEA

  • Publisher: World Data Center PANGAEA

The PANGAEA information system functions as an open access library for archiving, publishing and distributing georeferenced Earth system research data.

Any user can upload data related to the natural sciences. PANGAEA has a team of editors who check the integrity and consistency of the data and metadata. It currently includes more than 400,000 datasets from more than 650 projects, available in a variety of formats: from text/ASCII or tab-delimited files to binary objects (e.g. seismic data and models) and other formats that follow ISO standards (such as images or movies).
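
PANGAEA's tab-delimited text downloads, as we understand the format, start with a metadata header enclosed in `/* ... */`, followed by a header row and tab-separated data. The sketch below assumes that layout: it skips the comment block and parses the table with the standard `csv` module. The citation and column names in the sample are hypothetical.

```python
import csv
import io

# Hypothetical download: a /* ... */ metadata block, then a tab-separated table.
SAMPLE_TAB = (
    "/* DATA DESCRIPTION:\n"
    "Citation:\tExample, A (2021): Hypothetical sediment core data\n"
    "*/\n"
    "Depth sed [m]\tTOC [%]\n"
    "0.05\t1.2\n"
    "0.10\t0.9\n"
)


def read_pangaea_tab(text):
    """Skip a leading /* ... */ metadata header, then parse the tab-delimited table."""
    lines = text.splitlines()
    if lines and lines[0].startswith("/*"):
        end = next((i for i, line in enumerate(lines) if line.strip().endswith("*/")), None)
        if end is None:
            raise ValueError("unterminated metadata header")
        lines = lines[end + 1:]
    return list(csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t"))


for row in read_pangaea_tab(SAMPLE_TAB):
    print(row)
```

Values are returned as strings, leaving unit handling (given in brackets in the column names) to the caller.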

re3data

  • Publisher: DataCite

re3data is a free, worldwide registry of research data repositories covering different academic disciplines. It lists repositories related to the natural sciences, medicine and engineering, as well as the humanities.

It currently offers detailed descriptions of more than 2,600 repositories. These descriptions are based on the re3data metadata schema and can be accessed through the re3data API. A GitHub repository provides examples of how to use the API, implemented in R using Jupyter Notebooks.
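
The official examples use R; for readers working in Python, here is a comparable sketch. It assumes the API's repository list is returned as XML with `repository` elements containing `id`, `name` and a `link` to the detailed record; the endpoint URL in the comment and all IDs and names in the sample are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed list endpoint (verify against the re3data API documentation):
# https://www.re3data.org/api/v1/repositories
SAMPLE_LIST = """<list>
  <repository><id>r3d100000001</id><name>Example Ocean Archive</name>
    <link href="https://www.re3data.org/api/v1/repository/r3d100000001" rel="self"/></repository>
  <repository><id>r3d100000002</id><name>Example Seismic Center</name>
    <link href="https://www.re3data.org/api/v1/repository/r3d100000002" rel="self"/></repository>
</list>"""


def list_repositories(xml_text, name_contains=""):
    """Return (id, name, detail URL) for repositories whose name contains the text."""
    root = ET.fromstring(xml_text)
    out = []
    for repo in root.findall("repository"):
        name = repo.findtext("name", default="")
        if name_contains.lower() in name.lower():  # case-insensitive filter
            link = repo.find("link")
            out.append((repo.findtext("id"), name,
                        link.get("href") if link is not None else None))
    return out


print(list_repositories(SAMPLE_LIST, "seismic"))
```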

IRIS

  • Publisher: Incorporated Research Institutions for Seismology (IRIS)

IRIS is a consortium of more than 100 U.S. universities dedicated to operating scientific facilities for the acquisition, management and distribution of seismological data. Through its website, anyone can access a variety of resources and data on earthquakes occurring around the world.

It collects time series data, including sensor recordings of a variety of measurements. The available metadata include the location and instrumentation of the station that recorded the data. IRIS also provides access to historical seismic data, including scanned seismograms and other information from pre-digital sources.
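
Station metadata such as location can be retrieved through the FDSN station web service that IRIS operates, which can return a pipe-separated text table. The sketch below builds such a query and parses a response; the sample row describes a hypothetical station, and the column set shown is the one we expect at `level=station`.

```python
from urllib.parse import urlencode

STATION_SERVICE = "https://service.iris.edu/fdsnws/station/1/query"


def station_query_url(network, station, level="station"):
    """URL asking the FDSN station service for plain-text station metadata."""
    params = {"net": network, "sta": station, "level": level, "format": "text"}
    return f"{STATION_SERVICE}?{urlencode(params)}"


def parse_station_text(text):
    """Parse pipe-separated station text output whose header line starts with '#'."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].lstrip("#").split("|")]
    stations = []
    for line in lines[1:]:
        row = dict(zip(header, (v.strip() for v in line.split("|"))))
        row["Latitude"] = float(row["Latitude"])
        row["Longitude"] = float(row["Longitude"])
        stations.append(row)
    return stations


# Hypothetical response for an illustrative station.
SAMPLE = """#Network | Station | Latitude | Longitude | Elevation | SiteName | StartTime | EndTime
XX|DEMO|40.0|-105.0|1600.0|Hypothetical Site|2000-01-01T00:00:00|
"""
print(station_query_url("IU", "ANMO"))
print(parse_station_text(SAMPLE))
```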

Data are available in SEED (the international standard for the exchange of digital seismic data), ASCII or SAC (Seismic Analysis Code) format.
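
Waveforms themselves can be requested from the FDSN dataselect service, which returns miniSEED, the data-only variant of SEED. The sketch below only builds the request URL for a time window; the station and channel codes are an example, and the window is arbitrary.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

DATASELECT = "https://service.iris.edu/fdsnws/dataselect/1/query"


def dataselect_url(net, sta, loc, cha, start, duration_s):
    """URL for a miniSEED window starting at `start` (a UTC datetime) lasting `duration_s` s."""
    end = start + timedelta(seconds=duration_s)
    fmt = "%Y-%m-%dT%H:%M:%S"
    params = {"net": net, "sta": sta, "loc": loc, "cha": cha,
              "starttime": start.strftime(fmt), "endtime": end.strftime(fmt)}
    return f"{DATASELECT}?{urlencode(params)}"


# Ten minutes of vertical broadband data from an example station.
url = dataselect_url("IU", "ANMO", "00", "BHZ", datetime(2021, 8, 14, 12, 29, 0), 600)
print(url)
```

A downloaded miniSEED (or SAC) file can then be read with a seismology library such as ObsPy, whose `obspy.read` function handles both formats.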

Do you know of other international repositories with data related to the natural sciences and the environment? Leave us a comment or email us at dinamizacion@datos.gob.es.