Legal considerations about corpora used in the contextual dictionary
- Corpus name: opensubtitles and its versions. License: not specified. References: https://www.opensubtitles.org/, Jörg Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012).
- Corpus name: UN. License: not specified. References: http://www.euromatrixplus.net/multi-un/, MultiUN: A Multilingual corpus from United Nation Documents, Andreas Eisele and Yu Chen, LREC 2010; https://conferences.unite.un.org/uncorpus/, Ziemski, M., Junczys-Dowmunt, M., and Pouliquen, B. (2016), The United Nations Parallel Corpus, Language Resources and Evaluation (LREC’16), Portorož, Slovenia, May 2016.
- Corpus name: oj4. License: not specified. References: http://apertium.eu/data
- Corpus name: Europarl. License: EUPL. References: Europarl: A Parallel Corpus for Statistical Machine Translation, Philipp Koehn, MT Summit 2005; https://ec.europa.eu/jrc/en/language-technologies/dcep, DCEP: Digital Corpus of the European Parliament.
- Corpus name: kde4. License: not specified. References: http://opus.lingfil.uu.se/KDE4.php, http://stp.lingfil.uu.se/~joerg/published/ranlp-V.pdf
- Corpus name: news-commentary and its versions. License: not specified. References: for questions or comments, please email Philipp Koehn: pkoehn@inf.ed.ac.uk.
- Corpus name: openoffice3. License: not specified. References: http://opus.lingfil.uu.se/OpenOffice3.php, http://stp.lingfil.uu.se/~joerg/published/ranlp-V.pdf
- Corpus name: eu. License: not specified. References: http://opus.lingfil.uu.se/EUconst.php, http://stp.lingfil.uu.se/~joerg/published/ranlp-V.pdf
- Corpus name: kdedoc. License: not specified. References: http://opus.lingfil.uu.se/KDEdoc.php, http://stp.lingfil.uu.se/~joerg/published/ranlp-V.pdf
- Corpus name: jrc3. License: public domain. References: https://ec.europa.eu/jrc/en/language-technologies
- Corpus name: ecdc2012. License: Copyright (c) EU/ECDC, 2012. No warranties. References: http://optima.jrc.it/Resources/ECDC-TM/2012_10_Terms-of-Use_ECDC-TM.pdf, https://ec.europa.eu/jrc/en/language-technologies
- Corpus name: ecb-v01. License: not specified. References: http://opus.lingfil.uu.se/ECB.php, http://stp.lingfil.uu.se/~joerg/published/ranlp-V.pdf
- Corpus name: emea-v03. License: not specified. References: http://opus.lingfil.uu.se/EMEA.php, http://stp.lingfil.uu.se/~joerg/published/ranlp-V.pdf
- Corpus name: TED, WIT and versions. License: Creative Commons BY-NC-ND. Considerations: Reverso Technologies Inc. acknowledges the authorship of TED talks (BY condition) and does not redistribute transcripts for commercial purposes (NC). As regards the integrity of the work (ND), Reverso Technologies Inc. only reproduces small parts of the contents, while preserving the original contents. References: https://wit3.fbk.eu/
- Corpus name: DGT. License: EUPL. References: https://ec.europa.eu/jrc/en/language-technologies
- Corpus name: Tatoeba. License: Creative Commons CC-BY-2.0. References: http://tatoeba.org
- Corpus name: hansard 2001. License: see copyright terms at http://www.parl.gc.ca/ImportantNotices.aspx?Language=E&view=S. References: http://www.parl.gc.ca/
- Corpus name: 10^9. License: not specified. References: http://www.statmt.org/wmt10/training-giga-fren.tar, http://www.statmt.org/wmt09/pdf/WMT-0901.pdf
- Corpus name: Coppa. License: see copyright terms at http://www.wipo.int/patentscope/en/data/terms.html. References: http://www.wipo.int
- Corpus name: books. License: not specified. References: http://opus.nlpl.eu/Books.php, Jörg Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012).
- Corpus name: Tanzil. License: not specified. References: http://opus.nlpl.eu/Tanzil.php, Jörg Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012).
- Corpus name: WikiMatrix. License: Creative Commons BY-SA. Considerations: Reverso Technologies Inc. acknowledges the authorship of Wikipedia (BY condition) and only reproduces small parts of the contents, while preserving the original contents. References: https://github.com/facebookresearch/LASER/tree/master/tasks/WikiMatrix, Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 11 2019.
- Corpus name: Paracrawl. License: public domain. References: https://www.paracrawl.eu/
- Corpus name: EUBookshop. License: not specified. References: http://opus.nlpl.eu/EUbookshop.php, Jörg Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012).
- Corpus name: GNOME. License: not specified. References: http://opus.nlpl.eu/GNOME.php, Jörg Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012).
- Corpus name: PATTR. License: Creative Commons BY-NC-SA. Considerations: Reverso Technologies Inc. acknowledges the authorship of the European Patent Office (EPO) and the World Intellectual Property Organization (WIPO) (BY condition) and does not redistribute transcripts for commercial purposes (NC). Reverso Technologies Inc. only reproduces small parts of the contents, while preserving the original contents. References: http://www.cl.uni-heidelberg.de/statnlpgroup/pattr/, Wäschle, Katharina; Riezler, Stefan, 2014, "PatTR: Patent Translation Resource", https://doi.org/10.11588/data/10002, heiDATA, V3
- Corpus name: Ubuntu. License: not specified. References: http://opus.nlpl.eu/Ubuntu.php, Jörg Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012).
Acknowledgements
The first versions of Reverso Context were developed with the help of Prompsit Language Engineering.