Graduation Year

2010

Document Type

Thesis

Degree

M.S.Cp.E.

Degree Granting Department

Computer Science and Engineering

Major Professor

Sudeep Sarkar, Ph.D.

Co-Major Professor

Rangachar Kasturi, Ph.D.

Committee Member

Dmitry Goldgof, Ph.D.

Keywords

Conversation change, Temporal scales, Turn pattern, Multimedia analysis, Taxonomy

Abstract

Automatic analysis of conversations is important for extracting high-level descriptions of meetings. In this work, as an alternative to linguistic approaches, we develop a novel, purely bottom-up representation, constructed from both audio and video signals, that helps us characterize and build a rich description of the content at multiple temporal scales. Nonverbal communication plays an important role in conveying information about the communication and the nature of the conversation. We consider simple audio and video features to extract changes in conversation. To detect these changes, we consider the evolution of the detected change, using the Bayesian Information Criterion (BIC) at multiple temporal scales, to build an audio-visual change scale-space. Peaks detected in this representation yield group-turn-based conversational changes at different temporal scales.
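
The BIC-based change measure can be illustrated with a minimal sketch. It assumes each segment of feature frames is modelled by a single full-covariance Gaussian, which is the usual form of the BIC change test and not necessarily the exact formulation used in the thesis. The function names, the penalty weight, and the window half-widths are hypothetical and chosen only for illustration.

```python
import numpy as np


def delta_bic(window, split, penalty_weight=1.0):
    """Delta-BIC for modelling `window` (frames x dims) as two Gaussians
    split at index `split` versus a single Gaussian.
    Positive values favour the two-segment model, i.e. a change."""
    n, d = window.shape

    def logdet(x):
        # Small ridge keeps the covariance well conditioned (an assumption).
        cov = np.atleast_2d(np.cov(x, rowvar=False)) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    left, right = window[:split], window[split:]
    gain = 0.5 * (n * logdet(window)
                  - len(left) * logdet(left)
                  - len(right) * logdet(right))
    # Extra parameters of the second Gaussian: mean plus covariance.
    n_params = d + d * (d + 1) / 2
    return gain - 0.5 * penalty_weight * n_params * np.log(n)


def change_scale_space(features, half_widths):
    """Evaluate delta-BIC at every frame for several window half-widths.
    Returns an array of shape (len(half_widths), n_frames); frames too
    close to the borders for a given scale are left as NaN."""
    n_frames = features.shape[0]
    space = np.full((len(half_widths), n_frames), np.nan)
    for i, w in enumerate(half_widths):
        for t in range(w, n_frames - w):
            space[i, t] = delta_bic(features[t - w:t + w], w)
    return space
```

Each row of the returned array is the change measure at one temporal scale. Local maxima with positive values in such an array would be candidate conversational changes at that scale.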

We use the NIST Meeting Room corpus to test our approach. Four eight-minute clips are extracted from this corpus at random, and the other ten are extracted starting 90 seconds after the start of the entire video in the corpus. A single microphone and a single camera are used from the dataset. Compared across different thresholds with a fixed group-turn scale range, the detected group turns gave an overall detection result of 82%, with a best result of 91% for a single video.

Conversation overlaps, changes, and their inferred models offer an intermediate-level description of meeting videos that is useful in summarization and indexing of meetings. Since the proposed solutions are computationally efficient, require no training, and use little domain knowledge, they can be easily added as a feature to other multimedia analysis techniques.
