Named entity recognition in semi-structured texts | |
Author | Nguyen Cao Hong Ngoc |
Call Number | AIT Thesis no.CS-09-10 |
Subject(s) | Name tags |
Note | A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Computer Science, School of Engineering and Technology |
Publisher | Asian Institute of Technology |
Series Statement | Thesis ; no. CS-09-10 |
Abstract | In universities, institutes, and other organizations, managing large numbers of documents and extracting useful information from them is a persistent challenge. One requirement is to automatically extract the names of participants in different kinds of contracts, agreements, and resolutions so that these documents can be stored and retrieved systematically. Contracts, agreements, and resolutions are semi-structured documents and are often written in a formal style. Although named entity recognition has been studied for nearly twenty years, this domain has received limited attention. In our research, we exploited and combined the features of semi-structured texts and of formal writing style to recognize and classify named entities. We proposed a semi-supervised learning approach that combines the CRF (conditional random fields) method, dictionaries, and heuristic rules. We applied our method to the MoU/MoA corpus and obtained acceptable results (the F-measure reaches up to 88.01%). The method is also flexible: by providing the corresponding dictionaries, it can be applied not only to the MoU/MoA corpus but also to other semi-structured texts. We hope that, building on this study's results, prospective AIT students will continue and complete a system that can be used by the External Relations and Communications Office, contributing significantly to the development of AIT. |
Year | 2009 |
Corresponding Series Added Entry | Asian Institute of Technology. Thesis ; no. CS-09-10 |
Type | Thesis |
School | School of Engineering and Technology (SET) |
Department | Department of Information and Communications Technologies (DICT) |
Academic Program/FoS | Computer Science (CS) |
Chairperson(s) | Janecek, Paul |
Examination Committee(s) | Haddawy, Peter; Guha, Sumanta |
Scholarship Donor(s) | Ministry of Education and Training, Vietnam |
Degree | Thesis (M.Eng.) - Asian Institute of Technology, 2009 |
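The abstract describes a hybrid approach in which a CRF tagger is combined with dictionaries and heuristic rules, and results are reported as an F-measure. The sketch below illustrates that general idea only; it is not the thesis's implementation. Several parts are assumptions for illustration: the CRF output is stubbed as a fixed list of BIO labels rather than produced by a trained model; the helper names (`apply_dictionary`, `apply_title_rule`, `bio_to_spans`, `f_measure`), the title-based rule, the gazetteer, and the toy sentence are hypothetical; and the score is an exact-match, entity-level F-measure, which may differ from the evaluation scheme the thesis used.

```python
from typing import Dict, List, Tuple

# An entity span: (start token index, end token index exclusive, entity type).
Span = Tuple[int, int, str]

# Hypothetical honorific titles used by the illustrative heuristic rule.
TITLES = {"dr.", "prof.", "mr.", "mrs.", "ms."}


def apply_dictionary(tokens: List[str], labels: List[str],
                     gazetteer: Dict[Tuple[str, ...], str],
                     max_len: int = 4) -> List[str]:
    """Longest-match dictionary lookup; only fills tokens the tagger left as 'O'."""
    out = list(labels)
    i = 0
    while i < len(tokens):
        step = 1
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            etype = gazetteer.get(tuple(t.lower() for t in tokens[i:i + n]))
            if etype and all(lab == "O" for lab in out[i:i + n]):
                out[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    out[j] = "I-" + etype
                step = n
                break
        i += step
    return out


def apply_title_rule(tokens: List[str], labels: List[str]) -> List[str]:
    """Hypothetical rule: capitalized, untagged tokens right after a title form a person name."""
    out = list(labels)
    for i, tok in enumerate(tokens[:-1]):
        if tok.lower() in TITLES:
            j = i + 1
            while j < len(tokens) and out[j] == "O" and tokens[j][:1].isupper():
                out[j] = ("B-" if j == i + 1 else "I-") + "PER"
                j += 1
    return out


def bio_to_spans(labels: List[str]) -> List[Span]:
    """Convert BIO labels ('O', 'B-X', 'I-X') into entity spans."""
    spans: List[Span] = []
    start, etype = None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel 'O' closes a trailing entity
        continues = lab.startswith("I-") and lab[2:] == etype
        if not continues:
            if start is not None:
                spans.append((start, i, etype))
            start, etype = (i, lab[2:]) if lab != "O" else (None, None)
    return spans


def f_measure(gold: List[Span], pred: List[Span]) -> float:
    """Exact-match entity-level F-measure: harmonic mean of precision and recall."""
    tp = len(set(gold) & set(pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical data, not from the MoU/MoA corpus).
tokens = ["Agreement", "between", "Asian", "Institute", "of", "Technology",
          "and", "Prof.", "Jane", "Doe"]
# Stubbed CRF output: mislabels "Agreement" and misses both real entities.
crf_labels = ["B-ORG"] + ["O"] * 9
gazetteer = {("asian", "institute", "of", "technology"): "ORG"}

labels = apply_title_rule(tokens, apply_dictionary(tokens, crf_labels, gazetteer))
pred = bio_to_spans(labels)
gold = [(2, 6, "ORG"), (8, 10, "PER")]
print(pred)                             # [(0, 1, 'ORG'), (2, 6, 'ORG'), (8, 10, 'PER')]
print(round(f_measure(gold, pred), 2))  # 0.8
```

In this sketch the dictionary and rule only fill positions the tagger left as `O`, so they add recall without overriding model decisions; the CRF's false positive on "Agreement" remains, which is why precision drops to 2/3 and the F-measure is 0.8. Supplying a different gazetteer is what would adapt such a pipeline to other semi-structured texts.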