CLARA-MeD simplified sentences

Description

This dataset contains 1,200 manually simplified sentences (144,019 tokens) taken from Spanish-language clinical trial announcements. A total of 1,040 announcements from the European Clinical Trials Register (EudraCT) were analyzed to select sentences that contained ambiguities or exceeded 25 words. The simplification criteria are defined in an annotation guideline, which is released publicly alongside the dataset. The resource was collected in the CLARA-MeD project, whose goal is to simplify medical texts in Spanish and reduce the language barrier to patients' informed decision-making. In particular, the project aims to develop linguistic resources for the automatic simplification of medical terms in Spanish and to conduct experiments in automatic text simplification.

Distributions

  • Dataset https://digital.csic.es/bitstream/10261/346579/1/claramed_synt_simp_aligned.tsv text/tab-separated-values
    TSV
    1004656 Bytes
  • Guideline CLARA-MeD_simplif_guideline.pdf application/pdf
    PDF
    776110 Bytes
  • README_CLARAMED_sentences.txt text/plain
    plain
    7495 Bytes
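The dataset is distributed as a tab-separated file. The following is a minimal loading sketch that uses only the Python standard library. It assumes, without confirmation from this record, that the first line of the file is a header row and that fields are tab-delimited without quoting. Because the column names are not documented here, the function reads them from the header instead of hard-coding them. Check README_CLARAMED_sentences.txt for the actual layout. The function name `load_aligned_pairs` is hypothetical.

```python
import csv
import sys
from pathlib import Path


def load_aligned_pairs(path):
    """Read a TSV file into a list of dicts keyed by its header row.

    Assumption (not stated in the dataset record): line 1 is a header and
    fields are separated by tabs with no quoting, so QUOTE_NONE is used to
    keep any literal quote characters inside sentences intact.
    """
    with Path(path).open(encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


if __name__ == "__main__":
    # Usage: python load_claramed.py claramed_synt_simp_aligned.tsv
    if len(sys.argv) > 1:
        rows = load_aligned_pairs(sys.argv[1])
        print(len(rows), "rows")
        if rows:
            # Print the header so the real column names can be inspected.
            print("columns:", list(rows[0].keys()))
```

Reading the header dynamically keeps the sketch usable even if the file's columns differ from what one might guess from its name (`_aligned` suggests original/simplified sentence pairs, but this record does not say so).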

Additional information

Creation date 8/02/2024 23:00 (UTC)
Temporal coverage
  • From 8/02/2024 23:00 (UTC) to 8/02/2024 23:00 (UTC)
Geographic coverage Spain
Languages
  • Spanish
  • English
Other resources