Bruce Robertson, Optical Character Recognition of 19th Century Polytonic Greek Texts. Results of A Preliminary Survey

This is a quantitative overview of a strategy for performing optical character recognition on text images comprising ancient Greek. We produced 22 different classifiers to conduct OCR on 19th-century ancient Greek texts from around the world. For each classifier, we processed 10 page images from 158 books. The output was scored for its 'Greekness' on phonetic and lexical grounds, and summarized in a table. In the majority of cases, the output of each text's highest-scoring classifier is of sufficient quality to be useful in further research and image-fronted search engines. There is a good correlation between the best classifier or group of classifiers and the publisher and publication date. This confirms the usefulness of our approach, and will simplify OCR of occasional Greek words in other texts by the same publishers. Better line-segmentation strategies will provide the greatest single improvement in this process.

Source: http://www.perseus.tufts.edu/publications/dve/RobertsonGreekOCR/ (abstract of the digital publication)
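
The abstract does not spell out how 'Greekness' was computed, so the following is only a minimal sketch of the general idea: score each classifier's output for one text, then keep the highest-scoring classifier. Everything here is an assumption made for illustration, including the function names (`greekness`, `best_classifier`), the tiny `SAMPLE_LEXICON`, the vowel test standing in for a phonetic check, and the 1.0 / 0.5 weights. It is not the scoring method used in the survey.

```python
import unicodedata

# Hypothetical mini-lexicon of diacritic-free Greek word forms (assumption;
# a real lexical check would use a full word list).
SAMPLE_LEXICON = {"και", "τον", "λογος", "εν"}
GREEK_VOWELS = set("αεηιουω")
PUNCTUATION = ".,;:·!?()[]\"'"


def strip_diacritics(word: str) -> str:
    """Lower-case a word and drop accents and breathings via NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def looks_greek(base: str) -> bool:
    """Crude plausibility test: only Greek characters and at least one vowel."""
    if not base:
        return False
    all_greek = all("GREEK" in unicodedata.name(ch, "") for ch in base)
    return all_greek and any(ch in GREEK_VOWELS for ch in base)


def greekness(text: str, lexicon=SAMPLE_LEXICON) -> float:
    """Average per-token score: 1.0 for a lexicon hit, 0.5 if merely plausible."""
    tokens = text.split()
    if not tokens:
        return 0.0
    total = 0.0
    for token in tokens:
        base = strip_diacritics(token.strip(PUNCTUATION))
        if base in lexicon:
            total += 1.0
        elif looks_greek(base):
            total += 0.5
    return total / len(tokens)


def best_classifier(outputs: dict) -> str:
    """Return the name of the classifier whose output scores highest."""
    return max(outputs, key=lambda name: greekness(outputs[name]))


# Hypothetical outputs of two classifiers for the same line of text.
outputs = {
    "classifier_a": "καὶ ὁ λόγος",
    "classifier_b": "xal 6 Aoyos",
}
print(best_classifier(outputs))
```

Because the score needs no ground-truth transcription, the same comparison could in principle be repeated per book across all classifiers and the results tabulated, which is the kind of summary the abstract describes.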

