SeeStorm Phoneme Recognition and LipSync Technologies
SeeStorm Phoneme Recognition technology analyzes a voice signal and identifies the phonemes of human speech in it. Recognition reliability reaches up to 95% in automatic text-dependent and text-independent modes, and the input speech does not have to be tied to a specific language. Phoneme Recognition can run in real time or process pre-recorded voice messages.
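A common way to run the same recognizer on both a live stream and a recorded message is to cut the signal into short, overlapping analysis frames as samples arrive. The sketch below assumes that approach. The function name `frames` and its `frame_len` and `hop` parameters are illustrative only and are not part of any documented SeeStorm interface.

```python
from typing import Iterable, Iterator, List


def frames(samples: Iterable[float], frame_len: int, hop: int) -> Iterator[List[float]]:
    """Yield overlapping frames of frame_len samples, advancing by hop samples.

    Works the same whether `samples` is a finished list (pre-recorded message)
    or a generator fed by a live audio source (real-time mode): each frame is
    emitted as soon as enough samples have arrived.
    """
    if frame_len <= 0 or hop <= 0 or hop > frame_len:
        raise ValueError("require 0 < hop <= frame_len")
    buf: List[float] = []
    for s in samples:
        buf.append(s)
        if len(buf) == frame_len:
            yield list(buf)   # hand a copy of the full frame to the analyzer
            del buf[:hop]     # keep the overlapping tail for the next frame
```

For example, at a 16 kHz sampling rate, `frame_len=400` and `hop=160` give 25 ms frames every 10 ms. This is a common choice in speech processing. A trailing partial frame is dropped.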
Phoneme Recognition is based on Artificial Neural Network (ANN) technology, which classifies the acoustic characteristics of the signal (coefficients obtained from its analysis) in order to recognize phonemes. For visualization, phonemes are divided into several groups that share a similar basic articulation type. In addition, variation in signal energy is used to generate satisfactory co-articulation in the speech-based animation.
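The source does not describe the network's architecture, its features, or its phoneme set. The following therefore only illustrates the three ideas in the paragraph above, and it rests on several assumptions. A single-layer softmax classifier stands in for the ANN, and its weights would come from training. `PHONEMES` and `VISEME_OF` are a hypothetical phoneme inventory and articulation grouping. Exponential smoothing of frame energy is a simple substitute for SeeStorm's co-articulation method.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical phoneme inventory and articulation groups (visemes);
# SeeStorm's actual phoneme set and grouping are not documented here.
PHONEMES: List[str] = ["p", "b", "m", "f", "a", "sil"]
VISEME_OF: Dict[str, str] = {
    "p": "closed_lips", "b": "closed_lips", "m": "closed_lips",
    "f": "lip_teeth", "a": "open_wide", "sil": "rest",
}


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: Sequence[float],
             weights: Sequence[Sequence[float]],
             biases: Sequence[float]) -> Tuple[str, float]:
    """Single-layer softmax network: one weight row and one bias per phoneme.
    Returns the most probable phoneme and its probability."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return PHONEMES[best], probs[best]


def mouth_opening(energies: Sequence[float], smoothing: float = 0.5) -> List[float]:
    """Map per-frame energy to a 0..1 mouth-opening weight.

    Exponential smoothing lets each frame blend into the next, a crude
    stand-in for co-articulation (assumption, not SeeStorm's method)."""
    peak = max(energies, default=0.0) or 1.0  # avoid dividing by zero on silence
    out: List[float] = []
    prev = 0.0
    for e in energies:
        prev = smoothing * prev + (1.0 - smoothing) * (e / peak)
        out.append(prev)
    return out
```

With identity weights, zero biases, and a feature vector peaked at index 4, `classify` returns `"a"`, which `VISEME_OF` places in the `open_wide` group. Grouping several phonemes under one viseme, as with `p`, `b`, and `m`, means that confusions within a group do not change the rendered lip shape.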
Phoneme visualization is based on Face Mimic Modeling technology, which uses 3D model frame sequences to display the lip motions corresponding to the speech data, together with accompanying facial expressions (movements of the head, eyes, and eyebrows). All facial motions are synchronized with the fluctuations, timing, and nuances of the voice.
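Keeping rendered 3D model frames in step with the timing of the voice means the recognizer's timed output has to be resampled at the animation frame rate. This sketch assumes that recognition produces `(start_ms, end_ms, viseme)` segments and that the renderer plays frames at a fixed integer `fps`. Both the segment format and the function `viseme_track` are hypothetical.

```python
from typing import List, Sequence, Tuple

# (start_ms, end_ms, viseme): hypothetical timed output of phoneme recognition.
Segment = Tuple[int, int, str]


def viseme_track(segments: Sequence[Segment], fps: int, rest: str = "rest") -> List[str]:
    """Sample timed viseme segments at the animation frame rate.

    Frame i is displayed at i * 1000 / fps milliseconds and takes the viseme
    of the segment covering that instant, or `rest` if no segment covers it.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    end_ms = max((e for _, e, _ in segments), default=0)
    n_frames = -(-end_ms * fps // 1000)  # integer ceil(end_ms * fps / 1000)
    track: List[str] = []
    for i in range(n_frames):
        t_ms = i * 1000 / fps  # display time of frame i
        track.append(next((v for s, e, v in segments if s <= t_ms < e), rest))
    return track
```

Integer milliseconds avoid floating-point rounding when counting frames. For example, segments covering 0 to 200 ms at 25 fps yield exactly five frames.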
An avatar can also be animated from text input if Text-To-Speech software is first used to convert the text into voice. Smileys (emoticons) in the text are recognized as emotions, and the avatar expresses them.
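Before text is handed to a Text-To-Speech engine, emoticons have to be removed from what will be spoken and turned into emotion cues for the avatar. The following is a minimal sketch. It assumes a hypothetical `EMOTICONS` table, because the source does not list which emoticons SeeStorm recognizes. It also assumes that each emotion is triggered at a character offset in the spoken text. The TTS call and the animation hook are not shown.

```python
import re
from typing import List, Tuple

# Hypothetical emoticon-to-emotion table (assumption, not SeeStorm's list).
EMOTICONS = {":-)": "smile", ":)": "smile", ":-(": "sad", ":(": "sad",
             ";)": "wink", ":D": "laugh"}

# Try longer emoticons first so ":-)" is matched before ":)".
_PATTERN = re.compile("|".join(
    re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True)))


def split_emotions(text: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Strip emoticons from text.

    Returns (speech_text, [(char_offset_in_speech_text, emotion), ...]).
    """
    emotions: List[Tuple[int, str]] = []
    parts: List[str] = []
    last = 0
    for m in _PATTERN.finditer(text):
        parts.append(text[last:m.start()])
        offset = sum(len(p) for p in parts)  # position in the cleaned text
        emotions.append((offset, EMOTICONS[m.group()]))
        last = m.end()
    parts.append(text[last:])
    return "".join(parts), emotions
```

For example, `split_emotions("Hi :) bye :(")` returns `("Hi  bye ", [(3, "smile"), (8, "sad")])`. The cleaned string goes to the TTS engine, and the offsets tell the animation when to show each emotion.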
SeeStorm Lips Synchronization (LipSync) technology combines Phoneme Recognition with phoneme visualization. The LipSync speech-to-motion engine animates 3D avatars from human speech automatically and in real time: the avatars' lips move in sync with the speech, and other facial expressions accompany the lip movements. As a result, avatars behave naturally and convey voice communication vividly.