Anneli Meurman-Solin, Structured Text Corpora in the Study of Language Variation and Change

For compilers of electronic corpora, a major challenge in improving corpus quality is to reconsider how the language-external variables used to structure corpora can be defined and conceptualized precisely enough to justify treating them as factors that condition language variation and change. How these variables relate to one another should also be specified. In examining criteria for assessing the representativeness of corpora, the concept of range is discussed to stress the evident differences between texts categorized as representatives of a specific genre. Good practices of philological computing are highlighted by illustrating what kind of information can be lost if scholarly rigour is not applied in the process of editing and/or digitizing texts.

