Graduation Year

2010

Document Type

Thesis

Degree

M.S.Cp.E.

Degree Granting Department

Computer Science and Engineering

Major Professor

Sudeep Sarkar, Ph.D.

Committee Member

Rangachar Kasturi, Ph.D.

Committee Member

Dmitry Goldgof, Ph.D.

Keywords

Conversation change, Temporal scales, Turn pattern, Multimedia analysis, Taxonomy

Abstract

Automatic analysis of conversations is important for extracting high-level descriptions of meetings. In this work, as an alternative to linguistic approaches, we develop a novel, purely bottom-up representation, constructed from both audio and video signals, that helps us characterize and build a rich description of the content at multiple temporal scales. Nonverbal communication plays an important role in conveying information about the communication and the nature of the conversation. We use simple audio and video features to extract changes in conversation. To detect these changes, we track the evolution of the change detected using the Bayesian Information Criterion (BIC) at multiple temporal scales to build an audio-visual change scale-space. Peaks detected in this representation yield group-turn-based conversational changes at different temporal scales. We use the NIST Meeting Room corpus to test our approach. Four clips of eight minutes each are extracted from this corpus at random, and the other ten are extracted starting 90 seconds after the start of the entire video in the corpus. A single microphone and a single camera are used from the dataset. The group turns detected in this test gave an overall detection result, when compared with different thresholds with fixed group turn scale range, of 82%, and a best result of 91% for a single video. Conversation overlaps, changes and their inferred models offer an intermediate-level description of meeting videos that is useful in summarization and indexing of meetings. Since the proposed solutions are computationally efficient, require no training and use little domain knowledge, they can be easily added as a feature to other multimedia analysis techniques.
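The idea of a BIC-based change scale-space can be illustrated with a minimal sketch. The abstract does not give the exact features, BIC formulation, or peak detector, so everything below is an assumption: each side of a candidate change point is modeled as a single full-covariance Gaussian, the score is the standard Gaussian delta-BIC, and a "scale" is taken to be the half-window length in frames. The names `delta_bic`, `change_scale_space`, `half_windows`, and `penalty` are hypothetical and do not come from the thesis.

```python
import numpy as np


def _logdet_cov(x, eps=1e-6):
    """Log-determinant of the sample covariance of x (rows are frames)."""
    d = x.shape[1]
    # A small ridge keeps the covariance invertible for short windows.
    cov = np.atleast_2d(np.cov(x, rowvar=False)) + eps * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(left, right, penalty=1.0):
    """Gaussian delta-BIC for splitting [left; right] into two segments.

    Positive values favor two separate Gaussians, i.e. a change point.
    This is an assumed formulation, not necessarily the one in the thesis.
    """
    both = np.vstack([left, right])
    n, d = both.shape
    # Extra parameters in the two-Gaussian model: one mean and one covariance.
    n_params = d + d * (d + 1) / 2
    return (0.5 * n * _logdet_cov(both)
            - 0.5 * len(left) * _logdet_cov(left)
            - 0.5 * len(right) * _logdet_cov(right)
            - 0.5 * penalty * n_params * np.log(n))


def change_scale_space(features, half_windows, penalty=1.0):
    """Delta-BIC at every frame for every half-window length.

    Returns an array of shape (len(half_windows), n_frames). Frames too
    close to either end for a given window are left at -inf.
    """
    n = len(features)
    space = np.full((len(half_windows), n), -np.inf)
    for i, w in enumerate(half_windows):
        for t in range(w, n - w):
            space[i, t] = delta_bic(features[t - w:t], features[t:t + w], penalty)
    return space
```

Under these assumptions, `features` would be a frames-by-dimensions array of combined audio and video measurements, and local maxima of each row of the returned array would be candidate conversational changes at that temporal scale.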
