Graduation Year

2008

Document Type

Dissertation

Degree

Ph.D.

Degree Granting Department

Computer Science and Engineering

Major Professor

Sudeep Sarkar, Ph.D.

Keywords

Sign language recognition, Movement epenthesis, Hand segmentation, Hidden Markov models, Dynamic time warping, Level building

Abstract

Dynamic programming has been widely used to solve various kinds of optimization problems. In this work, we show that two crucial problems in video-based sign language and gesture recognition systems can be attacked by dynamic programming with additional multiple observations. The first problem occurs at the higher (sentence) level. Movement epenthesis [1] (me), i.e., the necessary but meaningless movement between signs, can result in difficulties in modeling and scalability as the number of signs increases. The second problem occurs at the lower (feature) level. Ambiguity in hand detection, together with occlusion, propagates errors to the higher level. We construct a novel framework, based on a dynamic programming approach, that can handle both of these problems. In the past, me has only been modeled explicitly. Our proposed method handles me in a dynamic programming framework where we model it implicitly. We call this the enhanced Level Building (eLB) algorithm.
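To make the idea of implicit me modeling concrete, the following is a minimal sketch of a level-building recursion, not the eLB algorithm itself. It assumes 1-D features per frame, sign models given as plain templates matched by dynamic time warping, and a me label that has no model and is charged a constant per-frame penalty `me_cost`. The names `dtw`, `level_building`, and `me_cost`, and the form of the penalty, are illustrative assumptions that do not come from the abstract.

```python
import math

ME = "me"  # label for movement epenthesis; it has no model of its own


def dtw(template, segment):
    """Classic DTW distance between two 1-D sequences (absolute difference cost)."""
    n, m = len(template), len(segment)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(template[i - 1] - segment[j - 1])
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def level_building(frames, signs, me_cost, max_levels):
    """Label a frame sequence with signs, allowing unmodeled me segments.

    cost[l][t] is the best cost of explaining frames[:t] with exactly l labels.
    Assumption: a me segment costs me_cost per frame (hypothetical choice).
    """
    T = len(frames)
    cost = [[math.inf] * (T + 1) for _ in range(max_levels + 1)]
    back = [[None] * (T + 1) for _ in range(max_levels + 1)]
    cost[0][0] = 0.0
    for level in range(1, max_levels + 1):
        for t in range(1, T + 1):
            for j in range(t):
                prev = cost[level - 1][j]
                if prev == math.inf:
                    continue
                segment = frames[j:t]
                # Candidates: every sign model, plus the model-free me label.
                candidates = [(dtw(tpl, segment), name) for name, tpl in signs.items()]
                candidates.append((me_cost * (t - j), ME))
                seg_cost, name = min(candidates)
                if prev + seg_cost < cost[level][t]:
                    cost[level][t] = prev + seg_cost
                    back[level][t] = (j, name)
    # Pick the number of levels with the lowest total cost, then backtrack.
    best_level = min(range(1, max_levels + 1), key=lambda l: cost[l][T])
    labels, level, t = [], best_level, T
    while level > 0:
        j, name = back[level][t]
        labels.append(name)
        level, t = level - 1, j
    labels.reverse()
    return [name for name in labels if name != ME], cost[best_level][T]


signs = {"A": [1.0, 1.0, 1.0], "B": [5.0, 5.0, 5.0]}
frames = [1.0, 1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 5.0]
labels, total = level_building(frames, signs, me_cost=0.5, max_levels=5)
```

In this toy input the two transitional frames between the signs are absorbed by the me label, so no explicit transition model is needed for each pair of signs. The single penalty value here plays the role of the me score mentioned in the abstract.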

This formulation also allows the incorporation of statistical grammar models such as bigrams and trigrams. Another dynamic programming process that handles the problem of selecting among multiple hand candidates is also included in the feature level. This is different from most of the previous approaches, where a single observation is used. We also propose a grouping process that can generate multiple, overlapping hand candidates. We demonstrate our ideas on three continuous American Sign Language (ASL) data sets and one hand gesture data set. The ASL data sets include one with a simple background, one with a simple background but with the signer wearing short-sleeved clothes, and one with a complex and changing background. The gesture data set contains color-gloved gestures with a complex background. With an automatically chosen me score, performance is within 5% of that achieved with a manually chosen me score.
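The selection among multiple hand candidates can be pictured as a shortest-path problem over frames. The sketch below is one possible formulation under stated assumptions, not the procedure used in the dissertation: each frame offers several 1-D candidate features, a hypothetical `template` gives the expected feature per frame, and a `smooth` weight penalizes jumps between consecutive choices.

```python
def select_candidates(candidates, template, smooth=1.0):
    """Pick one candidate per frame by dynamic programming.

    candidates[t] is a list of 1-D candidate features for frame t.
    The cost combines a match term |c - template[t]| and an assumed
    smoothness term smooth * |c - previous choice|.
    """
    T = len(candidates)
    cost = [[abs(c - template[0]) for c in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for t in range(1, T):
        row, back_row = [], []
        for c in candidates[t]:
            prev_feats = candidates[t - 1]
            # Cost of arriving at c from each candidate of the previous frame.
            arrive = [cost[t - 1][i] + smooth * abs(c - prev_feats[i])
                      for i in range(len(prev_feats))]
            k = min(range(len(arrive)), key=arrive.__getitem__)
            row.append(arrive[k] + abs(c - template[t]))
            back_row.append(k)
        cost.append(row)
        back.append(back_row)
    # Backtrack from the cheapest candidate in the last frame.
    k = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total = cost[-1][k]
    path = [k]
    for t in range(T - 1, 0, -1):
        k = back[t][k]
        path.append(k)
    path.reverse()
    return path, total


candidates = [[0.0, 9.0], [1.0, 8.0], [2.0, 7.0]]
template = [0.0, 1.0, 2.0]
path, total = select_candidates(candidates, template, smooth=1.0)
```

Keeping every candidate alive until the end of the sequence is what distinguishes this from committing to a single observation per frame: an ambiguous detection in one frame can be resolved by evidence from its neighbors.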

At the low level, we first over-segment each frame to obtain a list of segments. We then use a greedy method to group the segments based on different grouping cues. We also show that the performance loss is within 5% when we compare this method with manually selected feature vectors.
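A greedy grouping step of this kind might look like the following sketch. It assumes two grouping cues only, spatial adjacency and similarity of a scalar color value, and the names `greedy_groups`, `max_diff`, and `max_size` are hypothetical. The abstract does not say which cues or stopping rules are actually used.

```python
def greedy_groups(colors, adjacency, max_diff, max_size):
    """Grow groups of segments greedily from every seed segment.

    colors maps segment id -> scalar color value (assumed cue).
    adjacency maps segment id -> list of neighboring segment ids.
    Every intermediate group is kept, so candidates may overlap.
    """
    candidates = set()
    for seed in colors:
        group = {seed}
        candidates.add(frozenset(group))
        while len(group) < max_size:
            mean = sum(colors[s] for s in group) / len(group)
            frontier = {n for s in group for n in adjacency[s]} - group
            if not frontier:
                break
            # Greedy step: take the neighbor closest in color to the group mean.
            best = min(frontier, key=lambda n: (abs(colors[n] - mean), n))
            if abs(colors[best] - mean) > max_diff:
                break
            group.add(best)
            candidates.add(frozenset(group))
    return candidates


colors = {0: 0.10, 1: 0.12, 2: 0.90}
adjacency = {0: [1], 1: [0, 2], 2: [1]}
groups = greedy_groups(colors, adjacency, max_diff=0.1, max_size=3)
```

Here segments 0 and 1 are merged into one candidate while each also survives on its own, which gives the later selection stage several overlapping hand hypotheses to choose from.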
