Procesamiento del Lenguaje Natural en biomedicina
CLARA-MeD corpus
en
Biomedical natural language processing
https://digital.csic.es/bitstream/10261/269887/1/CLARA-MeD-corpus.zip
205657210
CLARA-MeD-corpus.zip
https://digital.csic.es/bitstream/10261/269887/1/CLARA-MeD-corpus.zip
CLARA-MeD-corpus.zip
8294
https://digital.csic.es/bitstream/10261/269887/4/README.txt
README.txt
https://digital.csic.es/bitstream/10261/269887/4/README.txt
README.txt
es
http://hdl.handle.net/10261/269887
Parallel sentences
CLARA-MeD corpus
A collection of 24 298 pairs of professional and simplified texts (>96 million tokens): 1) drug leaflets and summaries of product characteristics (10 211 pairs of texts, >82M words); 2) cancer-related information summaries (201 pairs of texts, >3M tokens); and 3) clinical trial announcements (5748 pairs of texts, 451 690 tokens). The dataset also contains a parallel corpus with a subset of 3800 sentence pairs of professional and lay variants (149 862 tokens). The dataset is a benchmark for medical text simplification. The source files were last downloaded in February 2022.
Frases paralelas
Comparación de corpus
EA0020951
Agencia Estatal Consejo Superior de Investigaciones Científicas
Comparable corpus
Medical text simplification
2022-05-19T00:00:00+02:00
2022-05-19T00:00:00+02:00
2022-05-15T00:00:00+02:00
2022-05-15T00:00:00+02:00
Simplificación de textos médicos
A collection of 24 298 pairs of professional and simplified texts (>96 million tokens): 1) drug leaflets and summaries of product characteristics (10 211 pairs of texts, >82M words); 2) cancer-related information summaries (201 pairs of texts, >3M tokens); and 3) clinical trial announcements (5748 pairs of texts, 451 690 tokens). The dataset also contains a parallel corpus with a subset of 3800 sentence pairs of professional and lay variants (149 862 tokens). The dataset is a benchmark for medical text simplification. The source files were last downloaded in February 2022.
plain
text/plain
application/x-zip-compressed
ZIP