Sudeshna Sarkar, Regular Expression Matching for Multi-script Databases

Modern database systems generally support the representation and retrieval of data in different scripts and languages. However, database functions are mostly designed or optimized for the Roman script and English. Most database query languages include support for regular expression matching, but the matching units are designed for the Roman script and do not satisfy the natural requirements of other scripts. In this paper, we discuss the different scripts and languages in use in the world, and recommend the type of regular expression support that would suit the needs of all these scripts. We also discuss crosslingual match operators and matching with respect to linguistic units.
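To illustrate the mismatch in matching units, the following minimal Python sketch uses a Bengali syllable chosen purely for this example (it is not taken from the paper). The helper `split_units` is hypothetical and only approximates a script-level unit by attaching combining marks to the preceding base character. It is not full Unicode grapheme segmentation and does not handle conjuncts formed with a virama.

```python
import re
import unicodedata


def split_units(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper for illustration only: a rough approximation of a
    script-level matching unit, not a complete segmentation algorithm.
    """
    units = []
    for ch in text:
        # Unicode general categories Mn, Mc and Me are combining marks.
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units


# Bengali syllable "ki": consonant KA (U+0995) followed by vowel sign I (U+09BF).
word = "\u0995\u09BF"

# "." consumes a single code point, so it cannot match the whole syllable.
code_point_match = re.fullmatch(r".", word)

# Grouping marks with their base character treats the syllable as one unit.
units = split_units(word)
```

Here `code_point_match` is `None`, while `units` holds a single element. A reader of the script sees one written unit, but the regular expression engine sees two.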

