AIT Asian Institute of Technology

Named entity recognition in semi-structured texts

Author: Nguyen Cao Hong Ngoc
Call Number: AIT Thesis no. CS-09-10
Subject(s): Name tags

Note: A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Computer Science, School of Engineering and Technology
Publisher: Asian Institute of Technology
Series Statement: Thesis ; no. CS-09-10
Abstract: In universities, institutes, and other organizations, managing a large number of documents and extracting useful information from them is always a major challenge. There is a need to automatically extract the names of participants in different kinds of contracts, agreements, and resolutions so that these documents can be stored and retrieved systematically. Contracts, agreements, and resolutions are semi-structured documents and are often written in a formal style. Although named entity recognition has been studied for nearly twenty years, this domain has received limited attention. In our research, we exploited and combined the features of semi-structured texts and formal writing style when recognizing and classifying named entities. We proposed a semi-supervised learning approach that combines the CRF method, dictionaries, and heuristic rules. We applied our method to the MoU/MoA corpus and obtained acceptable results (the F-measure can reach 88.01%). In addition, the method is flexible: it can be applied not only to the MoU/MoA corpus but also to other semi-structured texts by providing corresponding dictionaries. We hope that, building on this study's results, prospective AIT students will continue and complete the system so that it can be used by the External Relations and Communications Office, contributing significantly to the development of AIT.
Year: 2009
Corresponding Series Added Entry: Asian Institute of Technology. Thesis ; no. CS-09-10
Type: Thesis
School: School of Engineering and Technology (SET)
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Computer Science (CS)
Chairperson(s): Janecek, Paul
Examination Committee(s): Haddawy, Peter; Guha, Sumanta
Scholarship Donor(s): Ministry of Education and Training, Vietnam
Degree: Thesis (M.Eng.) - Asian Institute of Technology, 2009

