Dutch Parallel Corpus 2.0 (DPC2)


The Dutch Parallel Corpus 2.0 is a bidirectional parallel corpus of expert translations for the Dutch-English and Dutch-French language pairs. The corpus is sentence-aligned, and it is lemmatized and POS-tagged with Stanza, a state-of-the-art natural language processing toolkit.


Compared to the first release of the Dutch Parallel Corpus in 2010, the new DPC2 contains:

  • new source and target texts.
  • metadata about the translators (e.g., gender, education, experience).
  • metadata about the translation projects (e.g., L1/L2 translation, software used, degree and type of revision).
  • metadata about the texts themselves (e.g., source and target language, intended audience, intended goal, register).


The corpus currently contains close to 2.8 million words (see the table below), but is dynamic in nature.

Translation direction  Word count  Number of source texts  Number of text providers  Number of translators
English > Dutch        398,774     110                     10                        13 (10)
Dutch > English        430,094     105                     20                        22 (16)
French > Dutch         1,029,739   153                     15                        20 (14)
Dutch > French         925,002     176                     20                        22 (19)
Total                  2,783,609   544                     65                        77 (59)

 

The extensive set of metadata is considered the main asset of this corpus, together with a more principled and flexible register classification. This classification covers manuals for a general audience, manuals for specialists, (popular) science, journalistic texts, commercial communication, public service communication, political speeches, literature, and touristic texts. The corpus is relevant for scholars in Translation Studies, (Contrastive) Linguistics and register studies.

 

Get access to DPC2

DPC2 can be accessed via SketchEngine. Alternatively, we can send you a zip file containing the individual XML files. Researchers can use DPC2 free of charge. However, the corpus, or any part of it, may not be used to develop commercial or non-commercial software. DPC2 texts, or parts of them, may also not be reproduced in any form without the prior permission of the project team.

  1. Read the academic license, sign it (last page of the document), and send it to gert.desutter@ugent.be.
  2. Shortly afterwards, you will receive an email that gives you access to DPC2 on SketchEngine or to a zip file containing all TMX files.
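
For researchers who receive the zip file, the sketch below shows one way to read sentence-aligned pairs from a TMX file using only Python's standard library. It assumes the generic TMX 1.4 layout (`tu` units containing `tuv` variants with an `xml:lang` attribute and a `seg` element). The actual structure and language codes of the DPC2 files are not described here, so the sample document, the codes `nl` and `en`, and the helper name `read_aligned_pairs` are illustrative assumptions only.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample in the generic TMX 1.4 layout; real DPC2 files may differ.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="nl" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="nl"><seg>Dit is een voorbeeld.</seg></tuv>
      <tuv xml:lang="en"><seg>This is an example.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="nl"><seg>Het corpus is gealigneerd op zinsniveau.</seg></tuv>
      <tuv xml:lang="en"><seg>The corpus is aligned at sentence level.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_aligned_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) sentence pairs from a TMX document string.

    Hypothetical helper: language codes are compared in lower case, and
    translation units missing either language are skipped.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG, "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup.
                segments[lang] = "".join(seg.itertext()).strip()
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


if __name__ == "__main__":
    for source, target in read_aligned_pairs(SAMPLE_TMX, "nl", "en"):
        print(source, "|||", target)
```

To read a file from disk instead of the inline sample, the same function can be given the file's text content; if the files use different language codes (for example region-qualified ones), the `source_lang` and `target_lang` arguments would need to match them.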

 

Refer to DPC2 in your scholarly work

When presenting or publishing research using data from DPC2, please refer to:

Reynaert, Ryan, Lieve Macken, Arda Tezcan & Gert De Sutter (2021). Building a New-Generation Corpus for Empirical Translation Studies: The Dutch Parallel Corpus 2.0. In: Vincent X. Wang, Lily Lim and Defeng Li (Eds.), New Perspectives on Corpus Translation Studies. [Series title: New Frontiers in Translation Studies]. Singapore: Springer.

 

Project team

The project is a collaborative effort of the research units Empirical and Quantitative Translation and Interpreting Studies (EQTIS) and Language and Translation Technology Team (LT3), and was financially supported by the Department of Translation, Interpreting and Communication. The project team consists of the following members:

 

Contact

gert.desutter@ugent.be