Tone recognition of speech using hidden Markov models | |
Author | Hasan, Md. Khairul |
Call Number | AIT Thesis no.TC-97-15 |
Subject(s) | Voice frequency; Automatic speech recognition; Markov processes |
Note | A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering. |
Publisher | Asian Institute of Technology |
Series Statement | Thesis ; no. TC-97-15 |
Abstract | The variation of fundamental frequency (F0) over the duration of a syllable is commonly referred to as tone. In tone languages, a complete speech recognition system requires a tone recognition subsystem to properly recognize the meaning of a syllable. In this thesis, a tone recognition system based on hidden Markov model technology is implemented successfully. Using the subharmonic summation algorithm, the pitch frequency and the corresponding peak of the subharmonic sum are calculated from the speech every 10 ms. The peak of the subharmonic sum in the voiced portion of a syllable is considerably higher than in the unvoiced portion. Therefore, the pitch contour corresponding to the voiced portion is separated from the unvoiced part by utilizing the peak of the subharmonic sum. A Z-score normalization technique is applied to the resulting pitch contour to remove inter- and intra-speaker variation in the pitch values. A sequence of observation vectors is generated from the pitch contour and fed to the trained HMMs for final tone identification. As a case study for the Thai language, a total of five hidden Markov models are trained, one for each of the five Thai tones. To compromise between accuracy and computational complexity, semi-continuous HMMs are used. Six isolated syllables, each with five tones and spoken by two male and two female speakers, are used to train the reference models. The recognition performance of these reference models is evaluated on twelve different syllables, each with five tones and spoken by four speakers different from those who trained the models. A comparison is made between 3-state, 4-state and 5-state HMMs. The initial test in the MATLAB environment shows recognition accuracies of 97.3%, 99% and 97.8% for the 3-, 4- and 5-state hidden Markov models, respectively. Finally, the complete tone recognition system is implemented in a real-time environment using the TMS320C30 digital signal processor. |
Year | 1997 |
Corresponding Series Added Entry | Asian Institute of Technology. Thesis ; no. TC-97-15 |
Type | Thesis |
School | School of Engineering and Technology (SET) |
Department | Department of Information and Communications Technologies (DICT) |
Academic Program/FoS | Telecommunications (TC) |
Chairperson(s) | Makelainen, Kimmo |
Examination Committee(s) | Ahmed, Kazi Mohiuddin;Rajatheva, RM.AP. |
Scholarship Donor(s) | Government of Japan. |
Degree | Thesis (M.Eng.) - Asian Institute of Technology, 1997 |
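
Illustrative note: the abstract describes separating the voiced part of the pitch contour by the peak of the subharmonic sum and then applying Z-score normalization. The Python sketch below shows one way those two steps might look. It is not the thesis's implementation (which the abstract says was done in MATLAB and on a TMS320C30). The function names, the fixed threshold, the sample values, and the choice to normalize over a single contour are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def voiced_pitch_contour(pitch_hz, shs_peak, threshold):
    """Keep pitch values from frames whose subharmonic-sum peak exceeds
    `threshold`. A fixed threshold is an assumption of this sketch."""
    return [f0 for f0, peak in zip(pitch_hz, shs_peak) if peak > threshold]


def z_score_normalize(contour):
    """Subtract the mean and divide by the standard deviation.
    Normalizing over one contour is an assumption of this sketch."""
    mu = mean(contour)
    sigma = pstdev(contour)
    if sigma == 0:
        # A flat contour carries no variation to scale.
        return [0.0 for _ in contour]
    return [(f0 - mu) / sigma for f0 in contour]


# Hypothetical frames, one every 10 ms (values invented for illustration).
pitch_hz = [0.0, 118.0, 120.0, 124.0, 130.0, 0.0]
shs_peak = [0.1, 0.9, 1.1, 1.0, 0.8, 0.05]

voiced = voiced_pitch_contour(pitch_hz, shs_peak, threshold=0.5)
normalized = z_score_normalize(voiced)
```

After normalization the contour has zero mean and unit variance, so contours from speakers with different pitch ranges become comparable before observation vectors are built from them.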