Paolo Monella - Fabio Cusimano Linking Text and image: TEI XML and IIIF


ReIReS 2019

1. Details


On Wednesday, July 3, 2019, Paolo Monella and Fabio Cusimano led a workshop entitled "Linking Text and image: TEI XML and IIIF" as part of the Summer School "ReIResources: Sharing Resources in a Networked Digital Ecosystem" (Bologna, Italy, July 3-5), organized by ReIReS (Research Infrastructure on Religious Studies) and Fscire (Fondazione per le scienze religiose Giovanni XXIII) in partnership with AIUCD (Associazione per l'Informatica Umanistica e la Cultura Digitale) and the Veneranda Biblioteca Ambrosiana.

Paolo Monella led part 1 of the workshop (on TEI XML); Fabio Cusimano led part 2 (on IIIF).


2. Abstract

Part 1 (TEI XML)


In the first part of the workshop, led by Paolo Monella and centered on digital textual modelling and TEI XML, students will create a digital (formal, machine-actionable) model of a portion of a text from a medieval manuscript, both gaining hands-on experience and reflecting on the methodological and theoretical foundations and issues of textual modelling.

They will follow an inductive path, moving from the elementary structures of the computer (a sequence of binary states, "on/off", "yes/no", often represented by "0" and "1") to binary and decimal numbers and then to character sets (ASCII and Unicode).
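The path from binary states to character sets can be tried out in a few lines. The following is only a minimal sketch, not part of the workshop materials: the function name describe and the two sample characters are chosen here for illustration. It shows one character as a decimal code point, as a binary number, and as the bits of its UTF-8 bytes.

```python
def describe(ch: str) -> dict:
    """Return several numeric views of a single character."""
    code_point = ord(ch)            # decimal Unicode code point
    utf8 = ch.encode("utf-8")       # the bytes a UTF-8 file would store
    return {
        "char": ch,
        "decimal": code_point,
        "binary": format(code_point, "b"),
        "utf8_bits": " ".join(format(byte, "08b") for byte in utf8),
        "is_ascii": code_point < 128,   # ASCII covers code points 0-127
    }


# "a" fits in ASCII; "æ" needs Unicode and takes two bytes in UTF-8.
for ch in "aæ":
    print(describe(ch))
```

The second character shows why the step from ASCII to Unicode matters for medieval texts: a letter outside the 128 ASCII positions still has a single code point, but it is stored as more than one byte.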

At this point, the hands-on experience will start: students will create their own textual markup language based on symbols of their choice and will be asked to reflect on the theoretical and methodological issues arising from inline markup.
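A homemade markup language of this kind might look as follows. This is a hypothetical sketch: the two conventions (a vertical bar for a line break, round brackets for letters supplied when expanding an abbreviation), the sample string and the function name to_tags are all invented here, and the output tag names are borrowed from TEI only to make the result readable.

```python
import re

# Hypothetical homemade conventions (not part of any standard):
#   "|"      marks a line break in the manuscript
#   "(...)"  marks letters supplied when expanding an abbreviation


def to_tags(text: str) -> str:
    """Rewrite the homemade symbols as explicit tags."""
    text = re.sub(r"\(([^()]*)\)", r"<ex>\1</ex>", text)
    return text.replace("|", "<lb/>")


# Invented sample, not taken from the manuscript.
print(to_tags("d(omi)n(u)s|noster"))
```

The sketch also exposes a typical issue of inline markup: once round brackets and vertical bars carry a meaning, they can no longer appear as ordinary characters of the text without some escaping rule.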

They will then be introduced to the SGML/XML syntax and to the TEI XML vocabulary and will encode a brief textual portion taken from a medieval manuscript, based on its digital images and using the TEI module for the transcription of primary sources.
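To give an idea of what "machine-actionable" means for such an encoding, the sketch below parses a very small TEI fragment with Python's standard library and lists its page breaks, line breaks and abbreviation expansions. The fragment is a placeholder written for this example: it has no teiHeader, so it is not a complete TEI document, and its text is not a transcription of the manuscript.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"   # the TEI P5 namespace
NS = {"tei": TEI_NS}

# Placeholder fragment: no teiHeader, invented text.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <text>
    <body>
      <pb n="13r" facs="13r.jpg"/>
      <ab>
        <lb n="1"/>placeholder first line with
        <choice><abbr>dns</abbr><expan>dominus</expan></choice>
        <lb n="2"/>placeholder second line
      </ab>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# Page breaks with their link to the facsimile image
pages = [(pb.get("n"), pb.get("facs")) for pb in root.iterfind(".//tei:pb", NS)]

# Line numbers recorded by the encoder
line_numbers = [lb.get("n") for lb in root.iterfind(".//tei:lb", NS)]

# Abbreviations paired with their expansions
expansions = [
    (c.findtext("tei:abbr", namespaces=NS), c.findtext("tei:expan", namespaces=NS))
    for c in root.iterfind(".//tei:choice", NS)
]

print(pages, line_numbers, expansions)
```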


Students will then be introduced to, and will practice, two alternative strategies for combining TEI XML, the current standard for scholarly text encoding, with IIIF, the rising standard for online image metadata and annotation:

  1. the first approach consists in linking to the digital images of the manuscript from within the TEI XML source, for example with the TEI attribute @facs;
  2. with the second approach, the whole TEI XML transcription is included in the IIIF metadata as an "Annotation".

This will constitute a bridge with the second part of the workshop, led by Fabio Cusimano, focussed on IIIF.


Part 2 (IIIF)



The second part of the workshop will focus on good practices in digitization, digital library design and IIIF (International Image Interoperability Framework).

Fabio Cusimano will introduce these topics as tiles of a complex mosaic, starting from a real-life case study: the on-going digitization experience at the Veneranda Biblioteca Ambrosiana in Milan.

Students will then be introduced to the IIIF Web-based approach as a way to unlock digital collections through Linked Data (LD): a path from the capsa librarum, or bibliotheca (as the etymology of the word itself suggests), to the open and freely accessible library in the digital dimension.

3. Workshop plan


Trainer | Module | From | To | Topic/activity
Monella | Digital textual modelling | 11.00 | 11.20 | Concepts of model, formal model and digital model
Monella | Digital textual modelling | 11.20 | 11.40 | Let's build a digital textual model: binary numbers, decimal numbers, charsets (ASCII and Unicode), textual markup
Monella | TEI XML | 11.40 | 11.50 | TEI (Text Encoding Initiative) XML
Monella | TEI XML | 11.50 | 13.00 | Encoding a portion of a manuscript in TEI XML based on the manuscript images
 | Lunch break | 13.00 | 15.00 |
Cusimano | The Veneranda Biblioteca Ambrosiana | 15.00 | 15.15 | The Veneranda Biblioteca Ambrosiana and its new digital infrastructure
Cusimano | Designing a new digital library devoted to manuscripts | 15.15 | 15.30 | Facing the preservation risks
Cusimano | Designing a new digital library devoted to manuscripts | 15.30 | 16.00 | Some good practices in digitization
Cusimano | A new approach: IIIF (International Image Interoperability Framework) | 16.00 | 16.20 | IIIF Core APIs: Image API & Presentation API
Cusimano | A new approach: IIIF (International Image Interoperability Framework) | 16.20 | 16.40 | IIIF Canvas and the .json Manifest
Cusimano | A new approach: IIIF (International Image Interoperability Framework) | 16.40 | 17.00 | IIIF & the image viewer Mirador: playing with images
Cusimano | A new approach: IIIF (International Image Interoperability Framework) | 17.00 | 17.20 | The image viewer Mirador and the UI as a research tool: annotating images
Monella | Linking TEI and IIIF: two strategies | 17.20 | 17.50 | TEI 2 IIIF: Linking from within the TEI XML transcription source code (attribute @facs) to IIIF
Monella | Linking TEI and IIIF: two strategies | 17.50 | 18.20 | IIIF 2 TEI: Including the TEI XML transcription in the IIIF JSON metadata as "Annotation"
Cusimano - Monella | Post-workshop dissemination | 18.20 | 18.30 | Brainstorming: how would you train your colleagues on what you have learned during this workshop?

4. Materials


4.1 Framasoft shared pad


  1. Framasoft pad

4.2 From digital textual modelling to TEI XML


Folio 13r of MS Ambr. D 23 sup.
  1. Example TEI XML encoding of lines 1-5 of folio 13v of manuscript Ambr. D 23 sup.:
    1. Digital facsimile
    2. TEI XML transcription
  2. Let's encode lines 3-14 of folio 13r of manuscript Ambr. D 23 sup. in TEI XML:
    1. Template file to start from
    2. Digital facsimile of the page
    3. Plain text transcription
    4. Complete TEI XML transcription

4.3 Linking TEI and IIIF: TEI 2 IIIF


  1. Linking (from within TEI) to a static image:
    <pb n="13r" facs="13r.jpg"/>
  2. OxGarage: converting TEI XML to HTML for visualization. Instructions:
    1. Left: Convert from → Documents → TEI P5 XML Document
    2. Right: Convert to → xHTML
    3. Left: Select file to convert → Button Browse/Sfoglia → select and upload your TEI XML file (mytranscription.xml)
    4. Right: Upload images → Button Browse/Sfoglia → select and upload the image with the manuscript page facsimile
    5. Bottom, center: click on button Convert
    6. In a few seconds, a download dialog window appears → click Save/Salva to save the HTML file mytranscription.html
    7. Open the downloaded HTML file (double click; your default browser will open it)
  3. Linking (from within TEI) to a whole IIIF JSON manifest:
    <pb n="13r" facs="http://213.21.172.53/manifests/public/0b002711800e7d6d.json"/>
  4. Linking (from within TEI) to a specific canvas (folio) within the IIIF JSON manifest:
    <pb n="13r" facs="http://213.21.172.53/manifests/public/0b002711800e7d6d.json#/sequences/0/canvases/35"/>
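A @facs value of the last kind has two parts: the URL of the manifest and, after the "#", a path into its JSON structure (the notation resembles a JSON Pointer). The sketch below shows how a program might split and follow such a value. It is only an illustration: the helper names split_facs and resolve, the example.org URL and the tiny stand-in manifest are assumptions made here, and nothing guarantees that a given viewer interprets the fragment in this way.

```python
from urllib.parse import urldefrag


def split_facs(facs: str):
    """Split a @facs value into (manifest URL, list of path steps)."""
    url, fragment = urldefrag(facs)
    # Numeric steps are list indexes, the others are JSON keys.
    steps = [int(s) if s.isdigit() else s for s in fragment.split("/") if s]
    return url, steps


def resolve(document, steps):
    """Follow the path steps through parsed JSON data."""
    for step in steps:
        document = document[step]
    return document


# Tiny stand-in for a IIIF manifest (hypothetical content).
manifest = {"sequences": [{"canvases": [{"label": "c. 1r"}, {"label": "c. 1v"}]}]}

url, steps = split_facs("http://example.org/manifest.json#/sequences/0/canvases/1")
print(url)                                  # the manifest to download
print(steps)                                # the path inside it
print(resolve(manifest, steps)["label"])    # the canvas the link points to
```

Note that the indexes in the path are counted from 0, so "canvases/1" is the second canvas of the sequence.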

4.4 Linking TEI and IIIF: IIIF 2 TEI


4.4.1 Workshop activities

  1. Mirador:
    • Visualize an annotation
    • Create an annotation
  2. Hacking the IIIF JSON code

4.4.2 Hacking the IIIF JSON code: visualize (read) an annotation

  1. IIIF link to be visualized with Mirador (MS Ambr. D 23 sup)
    • Folio 13v (IIIF JSON Annotation already published): scroll right to 036_D23sup_c.13v
    • Folio 13r (the page we transcribed): scroll right to 035_D23sup_c.13r
    Mirador
  2. IIIF manifest (JSON file).
    • path to annotationlist on page 13v: [Collapse all] sequences / 0 / canvases / 35 / otherContent / 0 / @id (or find / control-F "otherContent")
    IIIF JSON manifest code
  3. IIIF annotationList on page 13v (JSON file).
    • Path to actual transcription: [Collapse all] resources / 0 / resource
    JSON annotationList code
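The two paths above can also be walked by a program. The following sketch uses heavily trimmed stand-ins for the manifest and the annotationList: the example.org addresses and the placeholder text are invented, while the canvas labels are the ones shown in Mirador. It lists which canvases carry an annotationList and then reads the transcription from one.

```python
# Hypothetical, heavily trimmed stand-ins for the two JSON files.
manifest = {
    "sequences": [{
        "canvases": [
            {"label": "035_D23sup_c.13r"},   # no annotation yet
            {
                "label": "036_D23sup_c.13v",
                "otherContent": [{
                    "@id": "http://example.org/list/13v.json",
                    "@type": "sc:AnnotationList",
                }],
            },
        ],
    }],
}

annotation_list = {
    "@type": "sc:AnnotationList",
    "resources": [{
        "@type": "oa:Annotation",
        "resource": {
            "@type": "cnt:ContentAsText",
            "format": "text/plain",
            "chars": "placeholder transcription",
        },
    }],
}


def annotation_lists(manifest: dict) -> dict:
    """Map each canvas label to the @id values of its annotation lists."""
    found = {}
    for canvas in manifest["sequences"][0]["canvases"]:
        lists = canvas.get("otherContent", [])   # the key may be absent
        found[canvas["label"]] = [entry["@id"] for entry in lists]
    return found


def transcription(annotation_list: dict) -> str:
    """Follow resources / 0 / resource and return the text it embeds."""
    return annotation_list["resources"][0]["resource"]["chars"]


print(annotation_lists(manifest))
print(transcription(annotation_list))
```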

4.4.3 Annotation types

Annotation including plain text (current annotation in our annotationList):

"resource": {
    "@id": "http://example.org/iiif/book1/res/comment1.html",
    "@type": "cnt:ContentAsText",
    "format": "text/plain",
    "chars": "Cuius describtio per prouincias et gentes haec est: Lybia, Cyrinaica et<br/><br/>Pentapolis post Aegyptum in parte Affricae prima est. Haec incipit a<br/><br/>ciuitate Parethonio et montibus Catabathmon, inde secundo mari usque ad<br/><br/>aras Philinorum extenditur."
  }

Annotation linking to an HTML page:

"resource":{
    "@id": "http://www1.unipa.it/paolo.monella/reires2019/code/d23sup13r/transcription.html",
    "@type": "dctypes:Text",
    "format": "text/html"
  }

Annotation linking to a (TEI) XML file:

"resource": {
    "@id": "http://www1.unipa.it/paolo.monella/reires2019/code/d23sup13r/transcription.xml",
    "@type": "dctypes:Text",
    "format": "application/tei+xml"
  }
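Each of these "resource" objects lives inside an annotation that also names the canvas it refers to. The sketch below builds such an annotation around the TEI XML link, following the field names of the IIIF Presentation API 2.x. The canvas address is a made-up example, the function name make_annotation is chosen here, and the choice of the "sc:painting" motivation is an assumption that may need adjusting to the viewer in use.

```python
import json


def make_annotation(canvas_id: str, resource_url: str, media_type: str,
                    motivation: str = "sc:painting") -> dict:
    """Wrap a link to an external text file in a IIIF 2.x style annotation."""
    return {
        "@type": "oa:Annotation",
        "motivation": motivation,
        "resource": {
            "@id": resource_url,
            "@type": "dctypes:Text",
            "format": media_type,
        },
        "on": canvas_id,   # the canvas (page) the annotation is attached to
    }


annotation = make_annotation(
    canvas_id="http://example.org/canvas/13r",   # hypothetical canvas address
    resource_url="http://www1.unipa.it/paolo.monella/reires2019/code/d23sup13r/transcription.xml",
    media_type="application/tei+xml",
)

# Serialising with json.dumps guarantees the commas and quotes are in place.
print(json.dumps(annotation, indent=2))
```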

4.4.4 Hacking the IIIF JSON code: create (edit) an annotation

  1. Open the annotationList
    JSON annotationList code
  2. If necessary, control-U or right click / View source code
    JSON annotationList code
  3. Select all the JSON annotationList source code (control-A or right click / Select all) and copy it
    JSON annotationList code
  4. Open the online JSON-editor
    JSON-editor
  5. Paste the JSON annotationList source code into the left window of the JSON-editor
  6. Click on Format JSON
  7. At the top of the right window, select View
  8. Edit the code in the left window. After each edit, click on Format JSON to get a clearer view of the code in the right window
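The same edit can be made, and checked, with a short script instead of the online editor. This is a sketch under assumptions: the function names check and replace_resource are invented here, and the sample annotationList is a trimmed placeholder. The first function plays the role of the Format JSON button by reporting where a hand edit broke the syntax; the second swaps the "resource" of the first annotation.

```python
import json


def check(source: str):
    """Return (data, None) if source is valid JSON, else (None, message)."""
    try:
        return json.loads(source), None
    except json.JSONDecodeError as err:
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"


def replace_resource(source: str, new_resource: dict) -> str:
    """Replace the resource of the first annotation and re-serialise."""
    data = json.loads(source)
    data["resources"][0]["resource"] = new_resource
    return json.dumps(data, indent=2, ensure_ascii=False)


# Trimmed placeholder for an annotationList.
original = '{"resources": [{"resource": {"format": "text/plain", "chars": "..."}}]}'

edited = replace_resource(original, {
    "@id": "http://www1.unipa.it/paolo.monella/reires2019/code/d23sup13r/transcription.xml",
    "@type": "dctypes:Text",
    "format": "application/tei+xml",
})
print(edited)

# A missing comma, a frequent slip when editing by hand, is reported:
print(check('{"a": 1 "b": 2}')[1])
```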

5. Final project


  1. Create a new folder project
  2. Put your TEI XML transcription of folio 13r in that folder. If the OxGarage conversion to HTML (see below) does not work, download the complete TEI XML transcription and use it instead
  3. Download the Digital facsimile of folio 13r to the same folder
  4. Convert your TEI XML file to HTML with OxGarage. Instructions:
    1. Left: Convert from → Documents → TEI P5 XML Document
    2. Right: Convert to → xHTML
    3. Left: Select file to convert → Button Browse/Sfoglia → select and upload your TEI XML file (mytranscription.xml)
    4. Right: Upload images → Button Browse/Sfoglia → select and upload the image with the manuscript page facsimile
    5. Bottom, center: click on button Convert
    6. In a few seconds, a download dialog window appears → click Save/Salva to save the HTML file mytranscription.html
    7. Open the downloaded HTML file (double click; your default browser will open it)
  5. Save the downloaded HTML file to the project folder
  6. Open the HTML file with Sublime and edit it as you want (examples: add a new <div> with the translation or any note; change the title; add the names of the curators or a link to the ReIReS website)
  7. Create a foglio.css file in the same folder
  8. Connect the HTML file to foglio.css by inserting <link rel="stylesheet" href="foglio.css" type="text/css" /> in the <head> of the HTML file (the href is relative to the HTML file, so a stylesheet in the same folder needs only its file name)
  9. Edit the foglio.css as you want to change the HTML page style
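Step 8 can also be scripted. The sketch below is only an illustration of the idea: the function name link_stylesheet is invented here, it assumes the stylesheet sits in the same folder as the HTML file (as in step 7), and it works on the page as a plain string rather than with a full HTML parser.

```python
def link_stylesheet(html: str, href: str = "foglio.css") -> str:
    """Insert a stylesheet link just before </head>, once only."""
    tag = f'<link rel="stylesheet" href="{href}" type="text/css" />'
    if tag in html:
        return html                      # already linked, nothing to do
    head_end = html.lower().find("</head>")
    if head_end == -1:
        raise ValueError("no </head> found in the HTML source")
    return html[:head_end] + tag + "\n" + html[head_end:]


# Minimal placeholder page standing in for the OxGarage output.
page = "<html><head><title>Folio 13r</title></head><body></body></html>"
print(link_stylesheet(page))
```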

7. Suggested readings


7.1 DH and TEI XML


7.2 IIIF