Giving Users What They Ask For: Improving Question Answering by Question Classification, Question Disambiguation and Answer Filtering | |
Author | Maheen Bakhtyar |
Call Number | AIT Diss. no.CS-13-08 |
Subject(s) | Natural language processing (Computer science) |
Note | A dissertation submitted in fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, School of Engineering and Technology |
Publisher | Asian Institute of Technology |
Series Statement | Dissertation ; no. CS-13-08 |
Abstract | The Internet is one of our main sources of information, attracting billions of users who want to find answers to their questions. Finding an accurate answer to a question among an enormous set of documents is a challenging task. Users are often most interested in an exact answer to their question and do not want to search for it in a long list of documents, which is tedious and time consuming. Getting an exact answer immediately is more convenient and more attractive. Question answering (QA) systems aim to provide accurate answers to questions using various question processing techniques, after which the answer can be extracted from a set of documents. Open-domain question answering is not the same as traditional document retrieval in a search engine, where a set of relevant documents is returned in response to a query. Instead, in a question answering system, the response to the query is a concise and exact answer to the question. Typically, question classification (QC) is the first step in a question answering system. The QC phase is responsible for determining the type of the expected answer, allowing the QA system to prune out extraneous information that is not relevant to extracting the answer. In this thesis, I first present research on hierarchical question classification. I represent the Li and Roth (2002) hierarchy in such a way that new classes can be added dynamically to the hierarchy. I present and evaluate a methodology to replace the LOC:Other, ENTY:Other, and NUM:Other classes in the existing hierarchy with a new set of dynamically generated classes. Next, I present an analysis of using knowledge resources to disambiguate symptom information in medical question answering. A single knowledge resource is not sufficient to assist the physician in diagnosing a disease based on the patient’s symptoms. 
I discuss the Disease Ontology, an ontology providing the biomedical community with consistent, reusable, and sustainable descriptions of human disease terms, phenotype characteristics, and related medical vocabulary for disease concepts. I present an analysis that helps predict the chance of misdiagnosis arising from ambiguity in the knowledge resource, in particular from overlapping symptoms of related diseases, especially in biomedical question answering systems. Third, I discuss the need to filter out unrelated answers, especially in cooperative query answering systems. A database system will not return results for all queries. Queries that return no results are called failing queries. Under normal circumstances, an empty answer would be returned in response to such queries. Cooperative query answering systems produce generalized and hopefully relevant answers when an exact answer does not exist, by broadening the query scope to include a wider range of information. Such systems may apply various generalization techniques, also referred to as generalization operators, to relax certain conditions and obtain related answers. These answers will not necessarily be exact, but they may be informative enough to contain some of the information that the user needs. I discuss the generalization operators and propose a mechanism to filter out unrelated answers from the set of generalized and expanded results, so that cooperative query answering systems return only related answers to the user. Unrelated answers are pruned out, and only the related and informative answers are returned to the user. This research addresses several important aspects of question answering, including question classification and question disambiguation, along with the resulting problem of misdiagnosis in medical QA systems caused by overlapping symptoms across different diseases. 
It also addresses some of the issues faced in cooperative query answering in database systems, namely filtering out unrelated answers so that only related answers are returned to the user. |
Year | 2013 |
Corresponding Series Added Entry | Asian Institute of Technology. Dissertation ; no. CS-13-08 |
Type | Dissertation |
School | School of Engineering and Technology (SET) |
Department | Department of Information and Communications Technologies (DICT) |
Academic Program/FoS | Computer Science (CS) |
Chairperson(s) | Matthew N. Dailey; |
Examination Committee(s) | Paul Janecek;Asanee Kawtrakul;Patrick Saint-Dizier; |
Scholarship Donor(s) | University of Balochistan (UOB), Quetta, Pakistan;Asian Institute of Technology Fellowship; |
Degree | Thesis (Ph.D.) - Asian Institute of Technology, 2013 |