Abstract

Philipp Roelli, Jan Ctibor, A new version of Corpus Corporum, the Latin full-text database and tool

The article provides background information on the freely accessible online project Corpus Corporum, the largest structured Latin text meta-corpus in existence. A completely reworked version of its software and presentation was launched in 2021. The main purpose of the project is to give readers access to Latin texts and to help them peruse these texts and find passages in them. The new version is more stable, more easily extendible, and provides a number of new features, some of which are still under construction. The project uses exclusively free and open software, most importantly BaseX, TreeTagger, and Sphinx. TEI XML files are used as input; they are automatically PoS-tagged and lemmatised. Users can then read the texts and, by clicking on a word, view its lemma entries in some of the most important Latin dictionaries. The major novelties are searches that can ignore orthographic and medieval spelling variation, the automatic identification of possible text reuse, and metrical analyses.
