2022-05-19T00:00:00+02:00
Biomedical natural language processing
https://digital.csic.es/bitstream/10261/269887/1/CLARA-MeD-corpus.zip
CLARA-MeD-corpus.zip
CLARA-MeD-corpus.zip
205657210
https://digital.csic.es/bitstream/10261/269887/1/CLARA-MeD-corpus.zip
A collection of 24 298 pairs of professional and simplified texts (>96 million tokens): 1) Drug leaflets and summaries of product characteristics (10 211 pairs of texts, >82M words); 2) Cancer-related information summaries (201 pairs of texts, >3M tokens); and 3) Clinical trial announcements (5748 pairs of texts, 451 690 tokens). The dataset also contains a parallel corpus with a subset of 3800 sentence pairs of professional and lay variants (149 862 tokens). This is a benchmark for medical text simplification. The files were last downloaded in February 2022.
https://digital.csic.es/bitstream/10261/269887/4/README.txt
8294
https://digital.csic.es/bitstream/10261/269887/4/README.txt
README.txt
README.txt
http://hdl.handle.net/10261/269887
A collection of 24 298 pairs of professional and simplified texts (>96 million tokens): 1) Drug leaflets and summaries of product characteristics (10 211 pairs of texts, >82M words); 2) Cancer-related information summaries (201 pairs of texts, >3M tokens); and 3) Clinical trial announcements (5748 pairs of texts, 451 690 tokens). The dataset also contains a parallel corpus with a subset of 3800 sentence pairs of professional and lay variants (149 862 tokens). This is a benchmark for medical text simplification. The files were last downloaded in February 2022.
en
Comparable corpus
CLARA-MeD corpus
2022-05-19T00:00:00+02:00
Comparación de corpus
Medical text simplification
CLARA-MeD corpus
Agencia Estatal Consejo Superior de Investigaciones Científicas
EA0020951
2022-05-15T00:00:00+02:00
2022-05-15T00:00:00+02:00
Frases paralelas
Parallel sentences
Procesamiento del Lenguaje Natural en biomedicina
Simplificación de textos médicos
es
application/x-zip-compressed
ZIP
plain
text/plain