The Dutch Parallel Corpus 2.0 (DPC2) is a bidirectional parallel corpus of expert translations for the Dutch-English and Dutch-French language pairs. The corpus is sentence-aligned, lemmatized, and POS-tagged using the state-of-the-art natural language processing toolkit Stanza.
Compared to the first release of the Dutch Parallel Corpus in 2010, the new DPC2 contains:
The corpus currently contains nearly 2.8 million words, but it is dynamic in nature.
Translation direction | Word count | Number of source texts | Number of text providers | Number of translators |
---|---|---|---|---|
English > Dutch | 398,774 | 110 | 10 | 13 (10) |
Dutch > English | 430,094 | 105 | 20 | 22 (16) |
French > Dutch | 1,029,739 | 153 | 15 | 20 (14) |
Dutch > French | 925,002 | 176 | 20 | 22 (19) |
Total | 2,783,609 | 544 | 65 | 77 (59) |
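The totals in the table can be cross-checked with a few lines of Python. This is only an illustrative sketch: the figures are copied from the table above, while the names `ROWS`, `totals`, and `share_by_language_pair` are hypothetical and not part of any DPC2 tooling. The translator column is left out.

```python
# Figures copied from the table above:
# (translation direction, word count, source texts, text providers)
ROWS = [
    ("English > Dutch", 398_774, 110, 10),
    ("Dutch > English", 430_094, 105, 20),
    ("French > Dutch", 1_029_739, 153, 15),
    ("Dutch > French", 925_002, 176, 20),
]


def totals(rows):
    """Sum word counts, source texts, and text providers over all directions."""
    words = sum(r[1] for r in rows)
    texts = sum(r[2] for r in rows)
    providers = sum(r[3] for r in rows)
    return words, texts, providers


def share_by_language_pair(rows):
    """Return each language pair's share of the total word count."""
    total = sum(r[1] for r in rows)
    pairs = {}
    for direction, words, _, _ in rows:
        src, tgt = direction.split(" > ")
        # Dutch is always one side; the other language names the pair.
        other = tgt if src == "Dutch" else src
        key = f"Dutch-{other}"
        pairs[key] = pairs.get(key, 0) + words
    return {pair: words / total for pair, words in pairs.items()}


if __name__ == "__main__":
    print(totals(ROWS))
    for pair, share in share_by_language_pair(ROWS).items():
        print(f"{pair}: {share:.1%}")
```

On these figures, the Dutch-French pair accounts for roughly 70% of the words and the Dutch-English pair for roughly 30%.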
The availability of an extensive set of metadata is considered the main asset of this corpus, together with a more principled and flexible register classification. The classification covers manuals for a general audience, manuals for specialists, (popular) science, journalistic texts, commercial communication, public service communication, political speeches, literature, and touristic texts. The corpus is relevant for scholars in Translation Studies, (Contrastive) Linguistics, and register studies.
Get access to DPC2
DPC2 can be accessed via SketchEngine. Alternatively, we can send you a zip file containing the individual XML files. Researchers can use DPC2 free of charge. However, the corpus, or any part of it, may not be used to develop commercial or non-commercial software. DPC2 texts, or parts of them, may also not be reproduced in any form without the prior permission of the project team.
Refer to DPC2 in your scholarly work
When presenting or publishing research using data from DPC2, please refer to:
Reynaert, Ryan, Lieve Macken, Arda Tezcan & Gert De Sutter (2021). Building a New-Generation Corpus for Empirical Translation Studies: The Dutch Parallel Corpus 2.0. In: Vincent X. Wang, Lily Lim & Defeng Li (Eds.), New Perspectives on Corpus Translation Studies. [Series title: New Frontiers in Translation Studies]. Singapore: Springer.
Project team
The project is a collaborative effort of the research units Empirical and Quantitative Translation and Interpreting Studies (EQTIS) and Language and Translation Technology Team (LT3), and was financially supported by the Department of Translation, Interpreting and Communication. The project team consists of the following members:
Contact