Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence
Being able to respond appropriately to users' overlaps should be seen as one of the core competencies of incremental dialogue systems. At the same time, identifying whether an interlocutor wants to support the current speaker or grab the turn is a task that comes naturally to humans but has not yet been implemented in such systems. Motivated by this, we first investigate whether the prosodic characteristics of speech in the vicinity of overlaps differ significantly from those in the vicinity of non-overlapping speech. We then test the suitability of different context sizes for the automatic classification of collaborative and competitive overlaps. These contexts both precede and follow the overlap but exclude features of the overlap itself. We also test whether fusing the preceding and succeeding contexts improves classification. Preliminary results indicate that the optimal context for overlap classification lies at 0.2 seconds preceding the overlap and up to 0.3 seconds following it. We demonstrate that we are able to classify collaborative and competitive overlaps with a median accuracy of 63%.
Oertel, C. et al. (2012) Context Cues for Classification of Competitive and Collaborative Overlaps. Speech Prosody, 6th International Conference, Shanghai, China, 22-25 May 2012. doi:10.21427/D7C89S