Paolo Monella Post-doc scholarship in Digital Humanities Accademia dei Lincei, Rome 2012

Vespa Project

The Iudicium coci et pistoris by Vespa, an experimental scholarly digital edition

Status of the project

Short version: The project is now discontinued (and the edition incomplete). I am now (2016) continuing my work on these methodological principles with the Ursus Project.

Longer version: I developed this project in the last months of a 12-month post-doc scholarship at the Accademia dei Lincei in 2012. The goal was to provide a proof-of-concept prototype of a scholarly digital edition, but I could not complete the edition by the end of the scholarship (December 2012). After that date the project progressed very slowly, though I presented its methodological principles at a number of venues (see, for example, this article in the proceedings of the first AIUCD conference). From 2015/16 on, I definitively stopped working on the Vespa Project and started applying the same methodological principles to a new edition: see the Ursus Project within the ALIM research framework.


Musical score digital edition model

I am working on a digital scholarly edition of the Iudicium coci et pistoris iudice Vulcano by Vespa (Anth. Lat. 199 Riese). It is a Latin verse text from Late Antiquity, and its main manuscript is the Codex Salmasianus.

The full rationale of this edition, which tests a number of experimental features, is discussed in detail in the talk Many witnesses, many layers. The abstract and the slides of the talk, as well as the pre-print full text of the published article, are on this page. The files (CSV, XML and Python) that make up the edition are in the paolomonella/vespa GitHub repository. A concise description of its rationale is on this webpage.

This edition assumes that the text can be described at three distinct layers (among many possible others), which I call "graphic" (graphemes, paragraphematical signs and other graphic signs), "alphabetic" (alphabetic letters, or "alphabemes") and "linguistic" (inflected words). Accordingly, I add two "Tables of signs" to my digital edition as a functional component: one for graphic signs and one for alphabetic letters. Each table lists the individual signs, and each sign corresponds to one symbolic unit of the encoding.
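The following Python sketch is a rough illustration of how a table of signs can serve as a formal definition. It models each table as a mapping from a sign ID to a description and reports any sign in a transcription that its table does not declare. All IDs and descriptions are hypothetical. In the edition itself the tables are presumably kept in the CSV files mentioned below.

```python
# Hypothetical tables of signs: sign ID -> short description.
GRAPHIC_SIGNS = {
    "s_long": "long s",
    "s_round": "round s",
    "a": "a (graphic form)",
    "punctus": "punctus (paragraphematical sign)",
}
ALPHABETIC_SIGNS = {
    "s": "alphabetic letter s",
    "a": "alphabetic letter a",
}


def undeclared(transcription, table):
    """Return the signs in `transcription` (a list of sign IDs) missing from `table`."""
    return [sign for sign in transcription if sign not in table]


graphic = ["s_long", "a", "punctus"]  # graphic layer: three graphic signs
alphabetic = ["s", "a"]               # alphabetic layer: two letters, no punctuation
print(undeclared(graphic, GRAPHIC_SIGNS))        # []
print(undeclared(alphabetic, ALPHABETIC_SIGNS))  # []
print(undeclared(graphic, ALPHABETIC_SIGNS))     # ['s_long', 'punctus']
```

The last call shows why the two layers need separate tables. A graphic sign such as the long s or the punctus is simply not a member of the alphabetic inventory.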

The files I am working with are in the paolomonella/vespa GitHub repository (beware: work in progress!).

I am currently exploring different ways to linearise my "musical score" edition model (the relevant working files are linked and described below):

At the bottom of this page, I am also publishing:

Linearisation A (many XML files)

Linearisation A: a different XML/TEI file for each transcription layer. Alignment is done through <link> elements included in external XML/TEI files.
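Below is a minimal Python sketch of what such an external alignment file could contain. It builds a TEI <linkGrp> in which each <link> points to one grapheme in the graphic transcription and one alphabeme in the alphabetic transcription. The file names graph.xml and alph.xml, the IDs, the type="alignment" value and the helper alignment_linkgrp are hypothetical. The real files in the repository may be organised differently.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI)  # serialise TEI as the default namespace


def alignment_linkgrp(pairs, graph_file, alph_file):
    """Build a TEI <linkGrp> with one <link> per (grapheme ID, alphabeme ID) pair.

    Each @target holds two pointers into the separate transcription files.
    """
    grp = ET.Element(f"{{{TEI}}}linkGrp", {"type": "alignment"})
    for graph_id, alph_id in pairs:
        target = f"{graph_file}#{graph_id} {alph_file}#{alph_id}"
        ET.SubElement(grp, f"{{{TEI}}}link", {"target": target})
    return grp


# Hypothetical file names and IDs.
grp = alignment_linkgrp([("g1", "a1"), ("g2", "a2")], "graph.xml", "alph.xml")
print(ET.tostring(grp, encoding="unicode"))
```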

The files listed below are the ones I am currently experimenting with. They are available in the paolomonella/vespa GitHub repository. Some of them are described in my talk Many witnesses, many layers. Some methodological issues are discussed in more depth in my talk In the Tower of Babel, and at greater length in the article I derived from that talk:

My current workflow in a nutshell: I edit the two CSV files by hand. A Python script processes them and generates the XML files, and the XML files whose names start with align_ are the alignment files. The script also transforms the "tables of signs" into a complex <charDecl> element and prepends it to the relevant XML files.
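The CSV-to-<charDecl> step of this workflow might look roughly like the Python sketch below. Several details are assumptions made for illustration: the CSV columns (id, description, unicode), the sample rows, the helper char_decl, and the choice of <glyph>, <desc> and <mapping> children. None of these reflect the layout of the actual files. The element choice should also be checked against the TEI Guidelines (characters and glyphs module).

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialised as xml:id

# Hypothetical table of signs in CSV form (the real files may use other columns).
SIGNS_CSV = """id,description,unicode
s_long,long s,ſ
s_round,round s,s
punctus,punctus,.
"""


def char_decl(csv_text):
    """Turn a table of signs (CSV text) into a TEI <charDecl> with one <glyph> per row."""
    decl = ET.Element(f"{{{TEI}}}charDecl")
    for row in csv.DictReader(io.StringIO(csv_text)):
        glyph = ET.SubElement(decl, f"{{{TEI}}}glyph", {XML_ID: row["id"]})
        ET.SubElement(glyph, f"{{{TEI}}}desc").text = row["description"]
        ET.SubElement(glyph, f"{{{TEI}}}mapping", {"type": "standard"}).text = row["unicode"]
    return decl


decl = char_decl(SIGNS_CSV)
print(ET.tostring(decl, encoding="unicode"))
```

With this design each row of the table becomes one explicitly identified sign. The transcription can then refer to the sign by its xml:id instead of relying on a Unicode code point alone.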

This is what the source code of the three transcription files looks like (the arrows represent the alignment):


In the yellow rectangle below you can see a snippet of the source code of one of the alignment files (namely the file align_alph_graph.xml):


Linearisation B (Menota)

Chapter 3 of The Menota Handbook v 2.0 also makes it possible to encode a text at three layers, which Menota calls "facsimile", "diplomatic" and "normalised". These roughly correspond to my "graphic", "alphabetic" and "linguistic" layers. To do so, Menota adds three elements to XML/TEI: <me:facs>, <me:dipl> and <me:norm>.

The resulting code looks like this:

<w>
  <choice>
    <me:facs>&drot;<am>&osup;</am>ttin<am>&bar;</am></me:facs>
    <me:dipl>d<ex>ro</ex>ttin<ex>n</ex></me:dipl>
    <me:norm>Dróttinn</me:norm>
  </choice>
</w>
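To make the structure concrete, the Python sketch below parses this snippet and extracts the text of each layer. It rests on three assumptions. First, the Menota namespace URI http://www.menota.org/ns/1.0 is declared on <w>. Second, the TEI namespace is omitted for brevity. Third, because the MUFI entity definitions are not loaded, an internal DTD maps &drot;, &osup; and &bar; to bracketed placeholders. The helper word_layers is hypothetical.

```python
import xml.etree.ElementTree as ET

ME = "http://www.menota.org/ns/1.0"  # assumed Menota namespace URI

# The snippet above; entities are mapped to placeholders via an internal DTD.
SNIPPET = """<!DOCTYPE w [
  <!ENTITY drot "{drot}">
  <!ENTITY osup "{osup}">
  <!ENTITY bar "{bar}">
]>
<w xmlns:me="http://www.menota.org/ns/1.0">
  <choice>
    <me:facs>&drot;<am>&osup;</am>ttin<am>&bar;</am></me:facs>
    <me:dipl>d<ex>ro</ex>ttin<ex>n</ex></me:dipl>
    <me:norm>Dróttinn</me:norm>
  </choice>
</w>"""


def word_layers(w):
    """Return the plain text of the facs, dipl and norm layers of one <w> element."""
    choice = w.find("choice")
    return {
        layer: "".join(choice.find(f"{{{ME}}}{layer}").itertext())
        for layer in ("facs", "dipl", "norm")
    }


w = ET.fromstring(SNIPPET)
print(word_layers(w))
```

The three layers are tied together only by their common <w> parent. Nothing in the markup says which <am> in the facsimile layer corresponds to which <ex> in the diplomatic layer, so the finest alignment is the word.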

The main differences between the current Menota encoding practice (as far as I know it) and the goals of the present edition are the following:

  1. The finest granularity allowed by the Menota markup scheme is the word, while I need alignment at the level of the individual grapheme.
  2. All three Menota transcription layers share the same set of 'characters'. By contrast, my "graphic" and "alphabetic" transcriptions each have a different set of elements, each completely described in a specific 'table of signs'. Moreover, my "linguistic" transcription does not encode inflected words as sequences of letters but through unique IDs.
  3. I want formal and explicit definitions of each element (grapheme and alphabetic letter) used in the transcription, while Menota relies on Unicode to define each encoded sign.

So I am trying to tweak the Menota markup to fit my own goals. The first result of this ongoing experiment (as of 29/12/2012) is the following file:

Each <w> element includes the Menota <me:facs> and <me:dipl> elements, which ensures the alignment between words and graphemes and between words and alphabetic letters. The only alignment that still requires <link> elements is therefore the one between graphemes and alphabetic letters. These <link> elements are included in the menota.xml file. The XML file now also includes the <charDecl> element, but only for the Table of signs/graphemes. I do not know how to include the Table of signs/alphabemes in the header of the same XML file as well.
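In this scheme the <link> elements carry the grapheme-to-alphabeme alignment on their own. A minimal Python sketch of how such links could be resolved and checked is shown below. It assumes that each link has an @target of the form "#g1 #a1" (one grapheme ID, then one alphabeme ID). The function resolve_links and all IDs are hypothetical. The sketch lets one grapheme be linked to several alphabemes, as might happen with an abbreviation sign.

```python
def resolve_links(targets, grapheme_ids, alphabeme_ids):
    """Map each grapheme ID to the alphabeme IDs it is linked to.

    `targets` are @target values of hypothetical <link> elements such as "#g1 #a1";
    links whose ends are not one grapheme and one alphabeme are reported as errors.
    """
    alignment, errors = {}, []
    for target in targets:
        ends = [ref.lstrip("#") for ref in target.split()]
        if len(ends) == 2 and ends[0] in grapheme_ids and ends[1] in alphabeme_ids:
            alignment.setdefault(ends[0], []).append(ends[1])
        else:
            errors.append(target)
    return alignment, errors


alignment, errors = resolve_links(
    ["#g1 #a1", "#g2 #a2", "#g2 #a3", "#w1 #a1"],
    grapheme_ids={"g1", "g2"},
    alphabeme_ids={"a1", "a2", "a3"},
)
print(alignment)  # {'g1': ['a1'], 'g2': ['a2', 'a3']}
print(errors)     # ['#w1 #a1']
```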

The major tweaks I'm introducing or am about to introduce to face the three issues listed above are the following:

Also note that Linearisation A above used three different XML files and no Menota markup. In that setting there was no need to differentiate between grapheme_m and alphabeme_m, simply because each transcription (graphic, alphabetic) went into a different XML file. Each XML transcription file was also meant to include its own 'table of signs' in its TEI Header.
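In a single XML file, by contrast, xml:id values must be unique across the whole document. A grapheme and an alphabeme that would otherwise share the ID m therefore need distinct IDs. The Python sketch below shows one way to achieve this, by prefixing each ID with its layer name as in grapheme_m and alphabeme_m. The tables and the function merge_tables are hypothetical.

```python
def merge_tables(graphemes, alphabemes):
    """Put two tables of signs into one ID space by prefixing each ID with its layer."""
    merged = {}
    for prefix, table in (("grapheme_", graphemes), ("alphabeme_", alphabemes)):
        for sign_id, description in table.items():
            merged[prefix + sign_id] = description
    return merged


# Hypothetical tables: both layers contain a sign with the bare ID "m".
graphemes = {"m": "m (graphic form)", "punctus": "punctus"}
alphabemes = {"m": "alphabetic letter m"}
print(sorted(set(graphemes) & set(alphabemes)))   # ['m'] would clash as an xml:id
print(sorted(merge_tables(graphemes, alphabemes)))
# ['alphabeme_m', 'grapheme_m', 'grapheme_punctus']
```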


To-do list