Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish version 3 (CT-EBM-SP v3)

Name: Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish version 3 (CT-EBM-SP v3)
Creator: Agencia Estatal Consejo Superior de Investigaciones Científicas
License: https://creativecommons.org/licenses/by/4.0/
Keywords: None

Publisher: Agencia Estatal Consejo Superior de Investigaciones Científicas
Administration level: State Administration
Entity: Public

Description

This is version 3 of the CT-EBM-SP corpus, which comprises 1200 clinical trials (292173 tokens) annotated with 23 entity types and 18 relation types. The annotation covers Unified Medical Language System (UMLS) semantic groups, drug-related information, temporal data, and negation/speculation. It also encodes 11 attributes (e.g., event temporality and experiencer status) and normalizes entities to UMLS Concept Unique Identifiers (CUIs). The corpus contains 87037 entities (including nested and discontinuous entities), 16597 attributes, and 68206 relations. Inter-annotator agreement (IAA) reached average F1 values of 0.861 for entities, 0.810 for attributes, and 0.791 for relations. In total, 81.75% of entities were normalized (IAA: F1 = 0.966). The repository includes code to benchmark the dataset by fine-tuning Transformer models for relation extraction and medical concept normalization. In the relation extraction task, the average F1 ranged from 0.858 to 0.879; in the medical concept normalization task, accuracy at rank 1 was 0.896.
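
To put the headline figures in perspective, the short sketch below derives per-trial averages and an approximate count of normalized entities from the numbers quoted in the description. The derived values are simple arithmetic on the reported totals, not figures published with the corpus, and the variable and function names are illustrative only.

```python
# Figures copied from the description of the corpus.
TRIALS = 1200
TOKENS = 292_173
ENTITIES = 87_037
ATTRIBUTES = 16_597
RELATIONS = 68_206
NORMALIZED_SHARE = 0.8175  # 81.75% of entities normalized to UMLS CUIs


def per_trial(total: int, trials: int = TRIALS) -> float:
    """Average count per clinical trial, rounded to one decimal place."""
    return round(total / trials, 1)


# Approximate number of normalized entities (derived, not reported in the source).
normalized_entities = round(ENTITIES * NORMALIZED_SHARE)

summary = {
    "tokens_per_trial": per_trial(TOKENS),
    "entities_per_trial": per_trial(ENTITIES),
    "attributes_per_trial": per_trial(ATTRIBUTES),
    "relations_per_trial": per_trial(RELATIONS),
    "normalized_entities_approx": normalized_entities,
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

These averages assume the totals are spread over all 1200 trials; the actual distribution per document may vary considerably.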

Data

Distributions (2)

Access point URL	https://digital.csic.es/bitstream/10261/416915/1/CT-EBM-SP-v3.zip
Format	ZIP

Access point URL	https://digital.csic.es/bitstream/10261/416915/3/README.txt
Format	plain
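
As a minimal sketch of how the two distributions might be retrieved, the snippet below uses only the Python standard library. The access point URLs are the ones listed above; the helper names (`local_name`, `download_all`) and the target folder `ct-ebm-sp-v3` are assumptions made for illustration, and the internal layout of the ZIP archive is not described here.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# Access point URLs as listed in the Distributions section.
ACCESS_POINTS = [
    "https://digital.csic.es/bitstream/10261/416915/1/CT-EBM-SP-v3.zip",
    "https://digital.csic.es/bitstream/10261/416915/3/README.txt",
]


def local_name(url: str) -> str:
    """Return the file name at the end of an access point URL."""
    return Path(urlparse(url).path).name


def download_all(target_dir: str = "ct-ebm-sp-v3") -> list[Path]:
    """Fetch every distribution into target_dir (hypothetical folder name)."""
    out = Path(target_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in ACCESS_POINTS:
        destination = out / local_name(url)
        urlretrieve(url, destination)  # requires network access
        saved.append(destination)
    return saved
```

Calling `download_all()` would save `CT-EBM-SP-v3.zip` and `README.txt` in the target folder, provided the repository is reachable. The corpus is released under CC BY 4.0, so reuse requires attribution.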

Keywords
Tags	Clinical trials Evidence-Based Medi... Inter-Annotator Agr... Natural Language Pr... Semantic Annotation
Categories
Categories	Science and technology Healthcare
Language
Languages	English

Identification
Identifier	http://hdl.handle.net/10261/416915
Last updated	4/02/2026 07:44 (UTC)
Creation date	2/02/2026 23:00 (UTC)
References
Other resources	http://hdl.handle.net/10261/285045 http://hdl.handle.net/10261/400983 https://scielo.org/es/ https://www.clinicaltrialsregister.eu/ctr-search/search http://doi.org/10.1038/s41597-026-06608-6 https://github.com/lcampillos/ct-ebm-sp-v3 https://doi.org/10.5281/zenodo.18048413
