Querying Recorded Information with Natural Language

Erqiang Zhou, Dublin Institute of Technology

Document Type Theses, Ph.D

Successfully submitted for the award of Doctor of Philosophy (Ph.D) to the Dublin Institute of Technology, September, 2011.

Abstract

Information systems that can retain the changing histories of designated information units, and can thereafter reproduce those histories, have not yet been proposed or investigated. Methods for querying and retrieving the retained information are also needed for this kind of information system. Although natural language interfaces have been studied for years, comprehensive support for complex time phrases combined with nested relative clauses has not been found. This research devotes its efforts to a generic framework which not only supports the mechanism of recording and replaying the interactive information between users and information systems, but also provides the ability to query the recorded information in natural language. Within the framework, a modelling approach which aims at COmplex INformation Specification (COINS) is proposed to facilitate the process of providing records of information that can be queried, played and replayed. The COINS model consists of three components: 1) a structure model employing the Entity Relationship (ER) approach; 2) a rule mechanism which deals with logic constraints among attributes, entities and relationships; and 3) a recorder component which supports a mechanism for recording and replaying the history of evolving information. The framework also provides a specification language named COINS Language, which not only incorporates the COINS modelling approach, but also aims to separate the semantics of domain knowledge from programming languages. While the COINS modelling approach focuses upon the input of an information system, the querying part addresses the output of the information system. This thesis provides two approaches to retrieving information units: natural language query and visual query. In detail, this thesis presents a knowledge representation model, the Master Dictionary (MD), and an approach for translating natural language queries into computerised query statements.
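To make the idea of a recorder component concrete, the following is a minimal sketch of an information unit that keeps an append-only history and can replay it. It is an illustration only, not the COINS recorder itself: the class name `RecordedUnit`, its methods, and the use of integer timestamps are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Tuple


@dataclass
class RecordedUnit:
    """Hypothetical information unit that retains every value it has held."""
    name: str
    # Each entry is (timestamp, value); integer timestamps are an assumption.
    history: List[Tuple[int, Any]] = field(default_factory=list)

    def record(self, timestamp: int, value: Any) -> None:
        # Append-only: earlier states are never overwritten.
        if self.history and timestamp < self.history[-1][0]:
            raise ValueError("timestamps must be non-decreasing")
        self.history.append((timestamp, value))

    def value_at(self, timestamp: int) -> Any:
        # Return the most recent value recorded at or before `timestamp`,
        # or None if nothing had been recorded by then.
        result = None
        for ts, value in self.history:
            if ts > timestamp:
                break
            result = value
        return result

    def replay(self) -> Iterator[Tuple[int, Any]]:
        # Yield the recorded states in the order they occurred.
        yield from self.history


salary = RecordedUnit("salary")
salary.record(1, 30000)
salary.record(5, 32000)
salary.record(9, 35000)
```

In this sketch, `salary.value_at(6)` looks up the state that was current at time 6, and `salary.replay()` walks through the full history in order, which mirrors the recording and replaying behaviour described above in a very reduced form.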
The MD captures the semantics of natural language words in a domain. An analysis of how to build such an MD is provided, with examples of building sample MDs. A prototype system based on the MD is developed to demonstrate how English queries can be translated into SQL statements. Experimental results show that complex queries, such as queries having more than 30 words with two relative clauses, can be correctly translated. In addition to natural language query, a visual query method has also been designed and implemented within the prototype system. The main contributions of this thesis are: a novel modelling approach for specifying complex information; a novel knowledge representation model and an approach for processing natural language based on the model; and a proof-of-concept prototype system with a visual query method for replaying recorded information units.
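As a rough illustration of dictionary-driven translation from English to SQL, the sketch below maps domain words to schema elements and assembles a simple SELECT statement. It is a toy under stated assumptions, not the thesis's Master Dictionary or its translation approach: the dictionary contents, the `employee` schema, the `translate` function, and the single supported "whose ... is over/under/exactly N" pattern are all hypothetical, and it handles neither time phrases nor nested relative clauses.

```python
import re

# Hypothetical dictionary: domain words mapped to (kind, schema element).
MASTER_DICTIONARY = {
    "employee": ("table", "employee"),
    "employees": ("table", "employee"),
    "name": ("column", "name"),
    "names": ("column", "name"),
    "salary": ("column", "salary"),
    "salaries": ("column", "salary"),
}

# Hypothetical comparator words and their SQL operators.
COMPARATORS = {"over": ">", "under": "<", "exactly": "="}


def translate(question: str) -> str:
    """Translate a very restricted English question into a SQL string."""
    # Split off an optional single relative clause introduced by "whose".
    head, _, tail = question.lower().partition(" whose ")

    columns, table = [], None
    for token in re.findall(r"[a-z]+", head):
        kind, target = MASTER_DICTIONARY.get(token, (None, None))
        if kind == "column" and target not in columns:
            columns.append(target)
        elif kind == "table":
            table = target
    if table is None:
        raise ValueError("no known table word in the question")

    sql = f"SELECT {', '.join(columns) or '*'} FROM {table}"

    if tail:
        match = re.fullmatch(r"(\w+) is (over|under|exactly) (\d+)\??", tail.strip())
        if match is None:
            raise ValueError("unsupported condition")
        word, comparator, number = match.groups()
        kind, target = MASTER_DICTIONARY.get(word, (None, None))
        if kind != "column":
            raise ValueError("condition does not name a known column")
        sql += f" WHERE {target} {COMPARATORS[comparator]} {number}"
    return sql


query = translate("List the names and salaries of employees whose salary is over 30000")
```

Only words found in the dictionary and digits matched by the pattern reach the generated statement, so unknown words are ignored rather than copied into the SQL.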