Document Type

Conference Paper


Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Publication Details

In Proceedings of the AAAI Symposium on Dialog with Robots, Arlington, Virginia, USA. 11th - 13th Nov 2010.


People often refer to objects by describing the object's spatial location relative to another object. Because such 'locative expressions' are ubiquitous in situated discourse, the ability to use them is fundamental to human-robot dialogue systems. Computational models of spatial term semantics are a key component of this ability. These models bridge the grounding gap between spatial language and sensor data. Within the Artificial Intelligence and Robotics communities, spatial template-based accounts, such as the Attention Vector Sum model (Regier and Carlson, 2001), have found considerable application in mediating situated human-machine communication (Gorniak, 2004; Brenner et al., 2007; Kelleher and Costello, 2009). Through empirical validation and computational application, these template-based models have proven their usefulness. We argue, however, that these models ignore important contextual features, which causes them to over-generalize and to fail to account for actual usage in situated contexts. Such over-simplifications are a natural consequence of the experimental designs used to acquire these models: the underlying data, and hence the resulting template-based accounts, were derived from simplified scenes and reduced 2-dimensional, survey-based object configurations. While this is understandable given the original aims of these studies, we nevertheless believe that it is not sufficient justification for the direct application of idealized spatial templates to situated communication. This critique of template-based models is similar in spirit to critiques already put forward by a number of researchers: Coventry and Garrod (2004) have stressed the need to account for functional effects, and Kelleher and Costello (2009) have highlighted the need to account for the effects introduced by distractors. Here, we argue that the models must also be extended to incorporate perspective effects.