Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence
In many scenes with human characters, interacting groups are an important factor in maintaining a sense of realism. However, little is known about what makes these characters appear realistic. In this paper, we investigate human sensitivity to audio mismatches (i.e., when individuals’ voices are not matched to their gestures) and visual desynchronization (i.e., when the body motions of the individuals in a group are misaligned in time) in virtual human conversers. Using motion capture data from a range of both polite conversations and arguments, we conduct a series of perceptual experiments and determine some factors that contribute to the plausibility of virtual conversing groups. We find that participants are more sensitive to visual desynchronization of body motions than to mismatches between the characters’ gestures and their voices. Furthermore, synthetic conversations can appear sufficiently realistic once there is an appropriate balance between talker and listener roles, regardless of body motion desynchronization or mismatched audio.
Ennis, C., McDonnell, R. and O'Sullivan, C. (2010) Seeing is Believing: Body Motion Dominates in Multisensory Conversations. ACM Transactions on Graphics, Volume 29, No. 4. doi:10.1145/1778765.1778828