AIT Asian Institute of Technology

Recognition of syllables in tone languages

Author: Tanee Demeechai
Call Number: AIT Diss. no. TC-00-01
Subject(s): Tone (Phonetics); Speech perception; Speech processing systems

Note: A dissertation submitted in partial fulfilment of the requirements for the Degree of Doctor of Engineering, School of Engineering and Technology
Publisher: Asian Institute of Technology
Series Statement: Dissertation ; no. TC-00-01
Abstract: Speech recognition of tone languages requires detection of the tone in addition to detection of the consonants and vowels of a syllable. Two approaches for recognition of tonal syllables have been proposed in the literature: joint detection and sequential detection. In joint detection, recognition is done by employing a hidden Markov model (HMM) of connected tonal syllables, in which the pitch and its time derivative are included in the feature vector in addition to the phonetic features. In sequential detection, base syllables (syllables ignoring their tones) are recognized by using an HMM of connected base syllables only; the estimated syllable boundaries are then used for subsequent tone recognition in a separate HMM of tones. Joint detection performs better than sequential detection, but its computational complexity is higher. In this thesis, a new approach called linked detection is proposed to achieve performance close to that of joint detection with computational complexity close to that of sequential detection. In linked detection, the recognition in the HMM of connected base syllables is modified to periodically take into account the tonal likelihood computed from an HMM of tones. Linked detection can provide performance that is comparable to that of joint detection and superior to that of sequential detection. For a large vocabulary task, the computational complexity of linked detection is much lower than that of joint detection, while it is only slightly higher than that of sequential detection. In our experiment on recognition of 173 Thai-language syllables, the worst recognition rate obtained from sequential detection is 72%. This result is comparable to results of the IBM Mandarin Call Home system (Liu et al., 1996), where a syllable recognition rate of 50% is reported for an experiment in which the speech data are conversational telephone speech and a word-pair language model is applied so that the perplexity is 15.
Year: 2000
Corresponding Series Added Entry: Asian Institute of Technology. Dissertation ; no. TC-00-01
Type: Dissertation
School: School of Engineering and Technology
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Telecommunications (TC)
Chairperson(s): Makelainen, Kimmo
Examination Committee(s): Sadanada, Ramakoti; Ahmed, Kazi M.; Rajatheva, R. M. A. P.; Lee, Chin-Hui
Scholarship Donor(s): Royal Thai Government (RTG)
Degree: Thesis (Ph.D.) - Asian Institute of Technology

