Paolo Monella, Scritture dimenticate, scritture colonizzate: sistemi grafici e codifiche digitali [Forgotten scripts, colonized scripts: graphic systems and digital encodings]

English version, extended abstract

Introduction

Societies that design and produce technologies can model them on their own cultures; all the others must remodel their cultures to fit those technologies. In my talk I will examine examples of how current digital encoding technologies represent and manipulate non-Western scripts, namely the graphic systems of India (Devánāgarī) and of the Middle East (Arabic).

Three Gutenberg principles

In modern Europe, print technology remodelled the Latin, Greek and Cyrillic graphic systems on a rigidly "alphabetic" model, based on three principles:

1. every grapheme is an autonomous unit, equal in status to all others: no grapheme is a mere modifier of another (1 = 1);
2. graphemes are aligned side by side on the horizontal writing line;
3. graphemes follow one another in a strictly unidirectional sequence (1, 2, 3…).

No universal principles

These principles did not apply to the handwritten graphic systems of medieval Europe, and do not apply today to non-Western ones, not even in their printed versions. In many scripts, diacritics orbit around (above, below, before or after) base graphemes. Examples include the iota subscript and the rough breathing in Greek, the vowel signs of Devánāgarī and Hebrew, the Arabic ḥarakāt, not to mention East Asian scripts.
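In Unicode terms, such orbiting signs survive as "combining characters". The following minimal Python sketch, an illustration in today's digital terms rather than part of the print-era picture, decomposes a single Greek letter into its base and its three satellites:

    import unicodedata

    # Greek omega with rough breathing, circumflex and iota subscript:
    # one base letter with three signs 'orbiting' it, each encoded
    # (in decomposed form) as a combining character after the base.
    s = unicodedata.normalize("NFD", "\u1FA7")
    for ch in s:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
    # U+03C9  GREEK SMALL LETTER OMEGA
    # U+0314  COMBINING REVERSED COMMA ABOVE   (rough breathing)
    # U+0342  COMBINING GREEK PERISPOMENI      (circumflex)
    # U+0345  COMBINING GREEK YPOGEGRAMMENI    (iota subscript)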

Digital challenges

Print did not have a critically standardizing (i.e. Westernizing) impact on the Devánāgarī and Arabic scripts, but digital text encoding technologies pose a subtler challenge in this respect. These technologies have mostly been developed in the USA and in the West, and are based on the principles of Western print.

Devánāgarī

In the Indic Devánāgarī script, vocalic graphemes connect with consonantal ones to form a single syllabic glyph. A vowel sign can be positioned below, to the right or to the left of its consonant. In the latter case, since the script flows left to right like the Latin one, the vowel is written "before", not "after", the consonant that it modifies. This contradicts the Gutenberg principle of unidirectionality (1, 2, 3…).

However, this is no arbitrary inversion of the script's direction: it is possible because vowel signs are modifiers of consonants (thus contradicting the 1 = 1 Gutenberg principle).

The ASCII and Unicode encodings ignore this distinction of status. Today, a Devánāgarī word is simply converted into a unidirectional sequence of numbers ("code points"), all equivalent.
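A minimal Python sketch makes this concrete: the syllable कि (ki) is stored as two equivalent code points in "logical order", consonant first, and the task of displaying the vowel sign to the left is delegated entirely to the rendering engine:

    import unicodedata

    # The syllable कि (ki): the vowel sign i is DISPLAYED to the left
    # of the consonant ka, but STORED after it, as one more code point
    # in a flat, unidirectional sequence.
    word = "\u0915\u093F"  # KA + VOWEL SIGN I

    for ch in word:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
    # U+0915  DEVANAGARI LETTER KA
    # U+093F  DEVANAGARI VOWEL SIGN I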

Arabic

The Arabic script has a wealth of diacritics, but I will only discuss the ḥarakāt here. They are signs added above or below a consonant to specify the short vowel with which it should be pronounced.

From an Arabic viewpoint, the ḥarakāt contradict all three Gutenberg principles: they are not autonomous graphemes but modifiers of the consonants they accompany; they sit above or below the writing line, not on it; and they are read together with their consonant, not as the next step in a unidirectional sequence.

Again, ASCII and Unicode ignore all this: they attribute equivalent code points to all graphemes (consonants, long vowels and short vowels) and align those numbers in a horizontal, ordered sequence.
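A minimal Python sketch: in a fully vocalized kataba ("he wrote"), consonants and short-vowel signs alike each become one more equivalent number in a single linear sequence:

    import unicodedata

    # kataba ('he wrote'), fully vocalized: three consonants, each
    # followed by a fatha short-vowel sign.
    word = "\u0643\u064E\u062A\u064E\u0628\u064E"  # كَتَبَ

    for ch in word:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
    # U+0643  ARABIC LETTER KAF
    # U+064E  ARABIC FATHA
    # U+062A  ARABIC LETTER TEH
    # U+064E  ARABIC FATHA
    # U+0628  ARABIC LETTER BEH
    # U+064E  ARABIC FATHA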

Frictions

In both the Arabic and the Devánāgarī scripts, these issues are not apparent in the input and visualization phases, thanks to software that manages typing on a physical or virtual keyboard and the rearrangement of glyphs on the screen, including ligatures and diacritics.

In the text processing phase, however, issues emerge. This includes the simplest operations, such as string matching on a web page or in a database: if an Arabic user types a word with the ḥarakāt, she will not find its instances encoded without the ḥarakāt, and vice versa, unless a specific algorithm helps sidestep the problem.
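The following Python sketch shows both the friction and one common family of workarounds, stripping combining marks before comparison. It is an illustration, not the algorithm any particular search engine uses; real systems apply more elaborate normalization:

    import unicodedata

    vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"  # كَتَبَ with harakat
    bare = "\u0643\u062A\u0628"                         # كتب without them

    print(vocalized == bare)  # False: naive string matching fails

    def strip_marks(s: str) -> str:
        """Drop combining marks (here: the harakat) before comparing."""
        return "".join(c for c in unicodedata.normalize("NFD", s)
                       if not unicodedata.combining(c))

    print(strip_marks(vocalized) == strip_marks(bare))  # True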

Arabīzī

Physical or virtual Arabic keyboards make it easy to type a text in fuṣḥā (the high, unifying variant of the language) in the Arabic script.

But when young users text or chat on a mobile device in a vernacular variant of Arabic, they increasingly tend to write the Arabic language in Latin characters, which are simpler to key on those devices, engineered as they are in the West.

The resulting script is called "Arabīzī" (Arabī + Inglīzī, "English") or "Franco-Arabic", and mixes Latin letters and numbers. Short vowels may or may not be written, and no diacritics are used, for example to distinguish long from short vowels. In practice, these users confine themselves to the simplest and most "standard" encoding: ASCII.
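A small illustration: in a common Arabīzī spelling such as "7abibi" (ḥabībī, "my dear"), the digit 7 stands in for the letter ḥāʾ (ح), and the whole word fits in 7-bit ASCII, while its Arabic-script counterpart does not:

    # '7abibi': every character of the Arabizi spelling is plain ASCII.
    arabizi = "7abibi"
    print(all(ord(c) < 128 for c in arabizi))  # True

    arabic = "\u062D\u0628\u064A\u0628\u064A"  # حبيبي in Arabic script
    print(all(ord(c) < 128 for c in arabic))   # False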

This aspect is often overlooked: technology can compensate for the glitches in the encoding of non-Western scripts; the extraordinarily complex algorithms behind Google's "Did you mean?" search feature are an example. But not all technology is available everywhere. On a cellphone, typing in ASCII is simply much easier than typing in Arabic.

The political impact

Though arguably unplanned, this "technological colonization" of writing is political, because scripts, and the conception of language that they imply, are identifying aspects of many cultures. The case of Arabic makes this evident.

In the early 20th century, the Turkish and Arab populations split, tearing the Ottoman Empire apart. The Turks, in search of a national identity on the Western model, replaced the Arabic script with the Latin one. On their side, the Arabs and Maghrebis rising up against the Ottoman Empire identified with the Arabic language and script.

As surprising as it may seem to Westerners, the substantially syllabic structure of the Arabic script, and its distinction of status between consonants, long vowels and short vowels, play a role in the perceived unity and identity of the Middle Eastern world. A unity that spans time, from the Quran to today, and space, from Morocco to Iraq.

In the perception of Arabic speakers and in the Arabic grammatical tradition, consonants and long vowels are sufficient to identify the root of a word.

But there is more at stake: the graphemes representing consonants and long vowels, the written part of a word, remain largely identical across time and space. On the contrary, the actual pronunciation of some consonants, and above all of the short vowels, precisely the parts of a word that the script leaves out, varies greatly over time and, even more, across the regional variants of the language.

A word in fuṣḥā is the same word in the Quranic past and today, in the East and in the West, as long as one writes only its consonants and long vowels, that is, as long as one remains within the Arabic script. Writing the short vowels, or more precisely making them semiotically pertinent, means undoing the cultural, social and potentially political unity of the Arabic-Islamic world.

This helps explain the reactions in the Arab world against "Arabīzī".

Probably, if computers had been invented in Saudi Arabia or Israel, digital text encoding models would have been moulded around the specific structures of Semitic languages, and in some scripts certain graphemes would have been encoded as structural (consonants and long vowels in Arabic, consonants in Hebrew or Devánāgarī) and others as their modifiers.

Such a model is definitely possible from a technological viewpoint. In my own scholarly digital edition of the De nomine of Ursus Beneventanus (9th century, http://www.unipa.it/paolo.monella/ursus), I used TEI XML markup to model the difference in status between base graphemes and abbreviation signs in the medieval Latin handwritten graphic system. The technology is there, if a strategic interest calls for it. The issue is not technological, but political.
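By way of illustration only (the names below are invented for this sketch and are not the TEI model of the Ursus edition), a status-aware encoding could make the structural skeleton of an Arabic word directly available to software:

    from dataclasses import dataclass, field

    # Toy sketch of a status-aware model: structural graphemes carry
    # the word's identity, while modifiers hang off their base instead
    # of occupying a slot of their own in the sequence.
    @dataclass
    class Grapheme:
        base: str                 # structural: consonant or long vowel
        modifiers: list = field(default_factory=list)  # e.g. harakat

    word = [
        Grapheme("\u0643", ["\u064E"]),  # kaf + fatha
        Grapheme("\u062A", ["\u064E"]),  # teh + fatha
        Grapheme("\u0628", ["\u064E"]),  # beh + fatha
    ]

    # Matching on the structural skeleton ignores the modifiers, so the
    # vocalized and unvocalized forms of kataba coincide:
    skeleton = "".join(g.base for g in word)
    print(skeleton == "\u0643\u062A\u0628")  # True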

Conclusion: some hope

In the Middle East, reactions to social and political discomfort sometimes turn to religious fundamentalism and violence. While I firmly condemn those phenomena, I think that the defense of Arabic cultural unity and identity is an understandable reaction to a perceived Western exploitation.

In an ideal scenario, a prosperous and free Middle East would be neither exploited in the globalization arena nor oppressed by corrupt regimes. Such a Middle East would safeguard its cultural identity, including the specific characteristics of its language and script and their perceived unity in time and space, without needing to use that identity as a defensive or offensive weapon.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.