Recognition of syllables in tone languages | |
Author | Tanee Demeechai |
Call Number | AIT Diss. no.TC-00-01 |
Subject(s) | Tone (Phonetics) Speech perception Speech processing systems |
Note | A dissertation submitted in partial fulfilment of the requirements for the Degree of Doctor of Engineering, School of Engineering and Technology |
Publisher | Asian Institute of Technology |
Series Statement | Dissertation ; no. TC-00-01 |
Abstract | Speech recognition of tone languages requires detection of the tone in addition to detection of the consonants and vowels of a syllable. Two approaches for recognition of tonal syllables have been proposed in the literature: joint detection and sequential detection. In joint detection, recognition is done by employing a hidden Markov model (HMM) of connected tonal syllables, in which the pitch and its time derivative are included in the feature vector in addition to the phonetic features. In sequential detection, base syllables (syllables ignoring their tones) are recognized by using an HMM of connected base syllables only; the estimated syllable boundaries are then used for subsequent tone recognition in a separate HMM of tones. Joint detection performs better than sequential detection, but its computational complexity is higher. In this thesis, a new approach called linked detection is proposed to achieve performance close to that of joint detection with computational complexity close to that of sequential detection. In linked detection, the recognition in the HMM of connected base syllables is modified to periodically take into account the tonal likelihood computed from an HMM of tones. Linked detection can provide performance that is comparable to that of joint detection and superior to that of sequential detection. For a large-vocabulary task, the computational complexity of linked detection is much lower than that of joint detection, while it is only slightly higher than that of sequential detection. In our experiment on recognition of 173 Thai-language syllables, the worst recognition rate obtained from sequential detection is 72%. This result is comparable to results of the IBM Mandarin Call Home system (Liu et al., 1996), where a syllable recognition rate of 50% is reported for an experiment in which the speech data are conversational telephone speech and a word-pair language model is applied so that the perplexity is 15. |
Year | 2000 |
Corresponding Series Added Entry | Asian Institute of Technology. Dissertation ; no. TC-00-01 |
Type | Dissertation |
School | School of Engineering and Technology |
Department | Department of Information and Communications Technologies (DICT) |
Academic Program/FoS | Telecommunications (TC) |
Chairperson(s) | Makelainen, Kimmo; |
Examination Committee(s) | Sadanada, Ramakoti;Ahmed, Kazi M.;Rajatheva, R. M. A. P.;Lee, Chin-Hui; |
Scholarship Donor(s) | Royal Thai Government (RTG); |
Degree | Thesis (Ph.D.) - Asian Institute of Technology |
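
The contrast the abstract draws between sequential and linked detection can be illustrated with a toy scoring example. This is only a rough sketch under stated assumptions, not the method of the thesis: there is no HMM or Viterbi search here, just two competing base-syllable hypotheses with made-up log-likelihoods. The names `BASE_LOGLIK`, `TONE_LOGLIK`, `best_tone`, `sequential_detection`, and `linked_detection`, the syllables, and all numbers are hypothetical.

```python
# Toy data: every name and number here is hypothetical, not taken from the thesis.
# Each key is (base syllable, index of the segment boundary that hypothesis implies).
BASE_LOGLIK = {
    ("maa", 0): -10.0,
    ("naa", 1): -10.5,
}

# Tone log-likelihoods depend on the boundary, because pitch is scored
# over the segment that the boundary delimits.
TONE_LOGLIK = {
    0: {"mid": -4.0, "low": -4.2, "falling": -4.1},
    1: {"mid": -6.0, "low": -1.0, "falling": -5.0},
}


def best_tone(boundary):
    """Return the most likely tone and its log-likelihood for one segment."""
    tones = TONE_LOGLIK[boundary]
    tone = max(tones, key=tones.get)
    return tone, tones[tone]


def sequential_detection():
    """Commit to the best base syllable first, then pick a tone for its segment."""
    base, boundary = max(BASE_LOGLIK, key=BASE_LOGLIK.get)
    tone, tone_ll = best_tone(boundary)
    return base, tone, BASE_LOGLIK[(base, boundary)] + tone_ll


def linked_detection():
    """Add the tonal log-likelihood to every base hypothesis before choosing."""
    best = None
    for (base, boundary), base_ll in BASE_LOGLIK.items():
        tone, tone_ll = best_tone(boundary)
        total = base_ll + tone_ll
        if best is None or total > best[2]:
            best = (base, tone, total)
    return best


print(sequential_detection())
print(linked_detection())
```

In this toy setting the sequential decoder commits to the base syllable with the highest phonetic score, while the linked variant lets the tonal evidence overturn that choice. In the approach the abstract describes, the tonal likelihood is taken into account periodically during recognition in the base-syllable HMM rather than in a single rescoring pass like this one.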