Stage I - Viseme Detection Using Grids

Both types of viseme classifier begin by skin-segmenting the frame, as shown in Figure 2. The face is located with the Viola-Jones face detector [3], and the detected face region is used to learn a Gaussian skin colour model. The hands and face are then segmented for further processing by the classifiers.
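The skin model step can be sketched as follows. The sketch assumes a face detector (for example, a Viola-Jones cascade such as OpenCV's `CascadeClassifier`) has already returned a bounding box `(x, y, w, h)`. The function names `fit_skin_model` and `segment_skin`, the Mahalanobis-distance threshold `max_dist`, and the colour space are illustrative assumptions, not details taken from the original method.

```python
import numpy as np

def fit_skin_model(frame, face_box):
    """Fit a single Gaussian colour model to the pixels inside the face box.

    frame: H x W x 3 array in any 3-channel colour space (the space is an assumption).
    face_box: (x, y, w, h), as a face detector would return it.
    Returns the mean colour and the inverse covariance matrix.
    """
    x, y, w, h = face_box
    pixels = frame[y:y + h, x:x + w].reshape(-1, 3).astype(np.float64)
    mean = pixels.mean(axis=0)
    # A small ridge term keeps the covariance invertible for near-uniform patches.
    cov = np.cov(pixels, rowvar=False) + 1e-6 * np.eye(3)
    return mean, np.linalg.inv(cov)

def segment_skin(frame, mean, inv_cov, max_dist=3.0):
    """Boolean mask of pixels within max_dist Mahalanobis units of the skin model."""
    diff = frame.reshape(-1, 3).astype(np.float64) - mean
    # Squared Mahalanobis distance for every pixel: d^T S^-1 d.
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return (d2 <= max_dist ** 2).reshape(frame.shape[:2])
```

A plausible reason for learning the colour model from the detected face, rather than using fixed colour thresholds, is that it adapts to each signer's skin tone and to the lighting of the sequence; the resulting mask picks out the hands as well as the face.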

Classifiers that describe the position of the motion (TAB visemes) are learnt by localising the hands relative to the signer. A grid whose position and scale depend on the face detection is placed over the image, as shown in Figure 3(a), and the skin-segmented frame is quantised into this grid. For each TAB viseme, a classifier is then built via boosting that shows which grid rectangles fire for that viseme; examples of these classifiers are shown in Figure 3(b).
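Quantising the skin mask into a face-relative grid might look like the sketch below. The grid geometry (square cells whose side is `cell_scale` times the face width, `n_cols` columns centred horizontally on the face, starting `rows_above` cells above its top edge) and all parameter values are hypothetical; the text only states that the grid depends on the position and scale of the face detection.

```python
import numpy as np

def face_grid(face_box, n_cols=7, n_rows=6, rows_above=1, cell_scale=1.0):
    """Hypothetical grid layout anchored to the face detection.

    Returns (x0, y0, cell): the pixel position of the grid's top-left corner
    and the side length of each square cell.
    """
    x, y, w, h = face_box
    cell = max(1, int(round(cell_scale * w)))
    cx = x + w / 2.0  # horizontal centre of the face
    x0 = int(round(cx - n_cols * cell / 2.0))
    y0 = y - rows_above * cell
    return x0, y0, cell

def quantise_mask(mask, face_box, n_cols=7, n_rows=6, rows_above=1, cell_scale=1.0):
    """Fraction of skin pixels in each grid cell.

    Cells, or parts of cells, that fall outside the image count as empty.
    """
    x0, y0, cell = face_grid(face_box, n_cols, n_rows, rows_above, cell_scale)
    H, W = mask.shape
    occ = np.zeros((n_rows, n_cols))
    for r in range(n_rows):
        for c in range(n_cols):
            top, left = y0 + r * cell, x0 + c * cell
            # Clip the cell to the image before counting skin pixels.
            t, b = max(top, 0), min(top + cell, H)
            l, rgt = max(left, 0), min(left + cell, W)
            if b > t and rgt > l:
                occ[r, c] = mask[t:b, l:rgt].sum() / float(cell * cell)
    return occ
```

Because the cell size scales with the face, the same hand position relative to the signer yields the same occupancy grid however large the signer appears in the frame.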

Figure 2: Example of a skin-segmented frame.

Figure 3: (a) The grid around a signer's face and (b) the learnt classifiers for two TAB visemes.
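The boosting step for a TAB viseme can be illustrated with discrete AdaBoost over single-cell weak learners, where each weak learner tests whether one grid rectangle fires, i.e. whether its skin occupancy exceeds `thresh`. This is an assumed formulation for illustration only: the function names, the threshold, and the use of decision stumps are not taken from the source.

```python
import numpy as np

def adaboost_cells(X, y, n_rounds=10, thresh=0.1):
    """Discrete AdaBoost over binary 'cell fires' features (illustrative sketch).

    X: (n_samples, n_cells) occupancy values, e.g. flattened grids.
    y: labels in {+1, -1} (target viseme vs. everything else).
    Weak learner (assumed): h(x) = s if x[j] > thresh else -s, with s in {+1, -1}.
    Returns a list of (cell_index, polarity, alpha).
    """
    fires = X > thresh
    n, d = fires.shape
    w = np.full(n, 1.0 / n)  # start with uniform sample weights
    model = []
    for _ in range(n_rounds):
        best = None
        # Exhaustively pick the cell and polarity with the lowest weighted error.
        for j in range(d):
            pred = np.where(fires[:, j], 1, -1)
            for s in (1, -1):
                err = w[(s * pred) != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, s)
        err, j, s = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for perfect stumps
        alpha = 0.5 * np.log((1 - err) / err)
        pred = s * np.where(fires[:, j], 1, -1)
        # Up-weight misclassified samples, down-weight correct ones.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        model.append((j, s, alpha))
    return model

def predict(model, X, thresh=0.1):
    """Sign of the alpha-weighted vote of the selected cells."""
    fires = X > thresh
    score = np.zeros(len(X))
    for j, s, alpha in model:
        score += alpha * s * np.where(fires[:, j], 1, -1)
    return np.where(score >= 0, 1, -1)
```

Each row of `X` is a flattened occupancy grid. The cell indices stored in the returned model can be reshaped back onto the grid to visualise which rectangles the classifier relies on for a viseme.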
