Multimodal Discourse: Gesture, Speech and Gaze

Francis Quek

Vision Interfaces and System Laboratory

Computer Science & Engineering Department

Wright State University

 

Human discourse is a dynamic process of converting thoughts into speech, gesture, and gaze activity.  Grounded in the psycholinguistic foundations of the production of such multimodal ‘conversational-acts’, we address the interpretation of gesture, speech, and gaze in the context of discourse management.  We investigate the cues afforded by each mode of interaction and the algorithms necessary to detect and extract them; study the spatial and temporal relationships among these cues and associate them with topical units in discourse; study the interactions of gesture, speech, and gaze in discourse segmentation; and develop a multimedia database system that integrates these elements into a coherent whole.  Our approach involves experiments designed to discover and quantify cues in the various modalities and their relation to discourse management; the development of computational algorithms to detect and recognize such cues; and the integration of these cues into a cogent discourse management system.

We present psycholinguistic phenomena that are detected by our analysis.  Understanding how such phenomena can be detected from video and audio signals, and determining the kinds of computable cues that support such analysis, are the first steps toward bridging the signal-sense gap in multimodal interaction.  Among these are cues for semantic segmentation and organization, cross-modal temporal integration, and the significance of ‘hold tension release’.

We have assembled a strong interdisciplinary team comprising psycholinguistics, machine vision, and signal processing researchers to address the holistic nature of discourse and language itself.  This permits us to base our research squarely on the realities of human communication in spontaneous discourse across a wide range of pragmatic conditions.  Technology is being developed that has significant impact on natural language discourse analysis, human-computer interaction systems, neuropathological studies (Parkinson’s Disease and Left/Right Hemisphere Damage), and discourse and video databases.  Another significant outcome of this research is the introduction of computational and quantitative rigor to the psycholinguistic study of discourse production.  This represents a model of collaborative research between the fields of engineering and cognitive science.