AIT Asian Institute of Technology


Tone recognition of speech using hidden Markov models
Author	Hasan, Md. Khairul
Call Number	AIT Thesis no.TC-97-15
Subject(s)	Voice frequency;Automatic speech recognition;Markov processes
Note	A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering.
Publisher	Asian Institute of Technology
Series Statement	Thesis ; no. TC-97-15
Abstract	The variation of fundamental frequency (F0) over the duration of a syllable is commonly referred to as tone. In tone languages, a complete speech recognition system requires a tone recognition subsystem to properly recognize the meaning of a syllable. In this thesis, a tone recognition system based on hidden Markov models (HMMs) is implemented successfully. Using the subharmonic summation algorithm, the pitch frequency and the corresponding peak of the subharmonic sum are calculated from the speech every 10 ms. The peak of the subharmonic sum is considerably higher in the voiced portion of a syllable than in the unvoiced portion, so the pitch contour of the voiced portion is separated from the unvoiced part by using this peak. Z-score normalization is applied to the resulting pitch contour to remove inter- and intra-speaker variation in the pitch values. A sequence of observation vectors is generated from the pitch contour and fed to the trained HMMs for final tone identification. As a case study, five hidden Markov models are trained, one for each of the five tones of the Thai language. Semi-continuous HMMs are used as a compromise between accuracy and computational complexity. Six isolated syllables, each with five tones and spoken by two male and two female speakers, are used to train the reference models. The recognition performance of these reference models is evaluated on twelve different syllables, each with five tones and spoken by four speakers other than those who trained the models. A comparison is made between 3-state, 4-state and 5-state HMMs. Initial tests in the MATLAB environment show recognition accuracies of 97.3%, 99% and 97.8% for the 3-, 4- and 5-state hidden Markov models, respectively. Finally, the complete tone recognition system is implemented in a real-time environment using a TMS320C30 digital signal processor.
Year	1997
Corresponding Series Added Entry	Asian Institute of Technology. Thesis ; no. TC-97-15
Type	Thesis
School	School of Engineering and Technology (SET)
Department	Department of Information and Communications Technologies (DICT)
Academic Program/FoS	Telecommunications (TC)
Chairperson(s)	Makelainen, Kimmo
Examination Committee(s)	Ahmed, Kazi Mohiuddin;Rajatheva, RM.AP.
Scholarship Donor(s)	Government of Japan.
Degree	Thesis (M.Eng.) - Asian Institute of Technology, 1997
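
The abstract describes two preprocessing steps: selecting the voiced frames by the peak of the subharmonic sum, then z-score normalizing the resulting pitch contour. The following is a minimal Python sketch of those two steps only, not the thesis implementation. The function names, the threshold value, and the sample frame data are hypothetical. The record does not say how the threshold was chosen or over which span (syllable or speaker) the mean and standard deviation were computed, so the sketch normalizes over a single contour.

```python
from statistics import mean, pstdev


def voiced_pitch_contour(pitch_hz, shs_peak, threshold):
    """Keep the pitch of frames whose subharmonic-sum peak exceeds the threshold.

    The threshold is a hypothetical tuning parameter, not a value from the thesis.
    """
    return [f0 for f0, peak in zip(pitch_hz, shs_peak) if peak > threshold]


def zscore(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A flat contour carries no variation to scale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up frames, one per 10 ms step: pitch estimate and subharmonic-sum peak.
pitch_hz = [0.0, 118.0, 120.0, 122.0, 124.0, 126.0, 0.0]
shs_peak = [0.1, 0.9, 1.1, 1.2, 1.0, 0.8, 0.05]

contour = voiced_pitch_contour(pitch_hz, shs_peak, threshold=0.5)
normalized = zscore(contour)
```

After normalization the contour has zero mean and unit variance, so speakers with different pitch ranges produce comparable observation sequences.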