Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence
2. ENGINEERING AND TECHNOLOGY, Communication engineering and systems
This paper presents a machine translation system (Hutchins 2003) called UniArab (Salem, Hensman and Nolan 2008). It is a proof-of-concept system supporting the fundamental aspects of Arabic, such as the parts of speech, agreement and tenses. UniArab is based on the linking algorithm of Role and Reference Grammar (RRG), which maps syntax to semantics and vice versa. UniArab takes Modern Standard Arabic (MSA) as input in the native orthography, parses the sentence(s) into a logical meta-representation based on the fully expanded RRG logical structures and, using this, generates grammatical English output with full agreement and morphological resolution. UniArab utilizes an XML-based implementation of elements of RRG in software. In order to analyse Arabic by computer, we first extract the lexical properties of the Arabic words (Al-Sughaiyer and Al-Kharashi 2004). From the parse, UniArab then creates a computer-based representation of the logical structure of the Arabic sentence(s). We use RRG to motivate the computational implementation of the architecture of the lexicon in software. We also implement in software the RRG bidirectional linking system to build the parse and generate functions across the syntax-semantics interface. Through seven input phases, including the morphological and syntactic unpacking, UniArab extracts the logical structure of an Arabic sentence. Using the XML-based metadata representing the RRG logical structure, UniArab then generates an equivalent grammatical sentence in the target language through four output phases. We discuss the technologies used to support its development and also the user interface that allows lexical items to be added directly to the lexicon in real time. The UniArab system has been tested and evaluated on MSA input, generating equivalent grammatical sentences in English via the logical structure of the Arabic sentences, with accurate results (Izwaini 2006).
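As a rough illustration of the parse side described above, the following Python sketch looks up the words of a verb-subject-object sentence in a small XML lexicon and fills an RRG-style logical structure. It is not the UniArab implementation: the element names, attributes, the `ls` template format, and the functions `load_lexicon` and `logical_structure` are all assumptions made for this example, and the seven input phases are collapsed into a single lookup step.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicon fragment. The tag and attribute names are invented
# for illustration and are not the schema used by UniArab.
LEXICON_XML = """
<lexicon>
  <verb arabic="قرأ" english="read" tense="PAST"
        ls="do'({x}, [read'({x}, {y})])"/>
  <noun arabic="خالد" english="Khalid" proper="true"/>
  <noun arabic="الكتاب" english="book" definite="true"/>
</lexicon>
"""


def load_lexicon(xml_text):
    """Index lexicon entries by their Arabic surface form."""
    root = ET.fromstring(xml_text)
    return {entry.get("arabic"): entry for entry in root}


def logical_structure(tokens, lexicon):
    """Build a logical structure string for a verb-subject-object sentence.

    Assumes exactly three tokens in VSO order, each present in the lexicon.
    """
    verb, actor, undergoer = (lexicon[token] for token in tokens)
    # Fill the verb's logical-structure template with its two arguments.
    core = verb.get("ls").format(x=actor.get("english"),
                                 y=undergoer.get("english"))
    # Wrap the core in a tense operator taken from the verb entry.
    return "<TNS {} [{}]>".format(verb.get("tense"), core)


lexicon = load_lexicon(LEXICON_XML)
print(logical_structure(["قرأ", "خالد", "الكتاب"], lexicon))
```

Keeping the logical-structure template in the lexicon entry means that, in this sketch, adding a new verb only requires a new XML element rather than a change to the code.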
At present we are working to greatly extend the coverage by adding more verbs to the lexicon. We have demonstrated in this research that RRG is a viable linguistic model for building accurate, rule-based, semantically oriented machine translation software. Role and Reference Grammar (RRG) is a functional theory of grammar that posits a direct mapping between the semantic representation of a sentence and its syntactic representation. The theory allows a sentence in a specific language to be described in terms of its logical structure and grammatical procedures. RRG creates a linking relationship between syntax and semantics, and can account for how semantic representations are mapped into syntactic representations. We claim that RRG is very suitable for machine translation of Arabic, notwithstanding well-documented difficulties found within Arabic MT (Izwaini 2006), and that RRG can be implemented in software as the rule-based kernel of an interlingua bridge MT engine.
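The semantics-to-syntax direction of the linking can be sketched in the same spirit. The Python example below takes a simplified logical structure held as a dictionary and produces an English sentence with subject-verb agreement. It is a minimal sketch under stated assumptions, not the four output phases of UniArab: the dictionary layout, the `VERB_FORMS` table, and the functions `noun_phrase`, `verb_form` and `generate` are hypothetical, plurals are formed by adding "s" only, and the "a"/"an" distinction is ignored.

```python
# Hypothetical table of English verb forms; a real system would take
# these from its lexicon.
VERB_FORMS = {
    "read": {"PAST": "read", "PRES": "read", "PRES_3SG": "reads"},
}


def noun_phrase(arg):
    """Realise an argument as an English noun phrase."""
    if arg.get("proper"):
        return arg["lemma"]
    plural = arg.get("number") == "pl"
    noun = arg["lemma"] + ("s" if plural else "")  # regular plurals only
    if arg.get("definite"):
        return "the " + noun
    return noun if plural else "a " + noun


def verb_form(pred, tense, subject):
    """Pick the verb form that agrees with the subject."""
    forms = VERB_FORMS[pred]
    if tense == "PRES":
        third_singular = subject.get("number", "sg") == "sg"
        return forms["PRES_3SG"] if third_singular else forms["PRES"]
    return forms[tense]


def generate(ls):
    """Map actor to subject and undergoer to object, in English SVO order."""
    subject = ls["actor"]
    words = [
        noun_phrase(subject),
        verb_form(ls["pred"], ls["tense"], subject),
        noun_phrase(ls["undergoer"]),
    ]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


book = {"lemma": "book", "definite": True}
khalid = {"lemma": "Khalid", "proper": True}
print(generate({"tense": "PAST", "pred": "read",
                "actor": khalid, "undergoer": book}))
```

Because generation starts from the logical structure rather than from the Arabic word order, the verb-initial order of the source sentence has no influence on the English output in this sketch.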
Y. Salem and B. Nolan. (2009) UNIARAB: An Universal Machine Translator System For Arabic Based On Role And Reference Grammar, in Proceedings of the 31st Annual Meeting of the Linguistics Association of Germany (DGfS 2009), University of Osnabruck, Germany, March 2009.