AIT Asian Institute of Technology

Multi-medical document summarization using XX masking approach

Author: Koirala, Ayush
Call Number: AIT Thesis no. DSAI-23-12
Subject(s): Natural language processing (Computer science); Information retrieval
Note: A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Data Science and Artificial Intelligence
Publisher: Asian Institute of Technology
Abstract: Multi-document summarization is essential for capturing key information from vast medical literature. Navigating the medical domain is challenging because medical literature datasets are vast and complex, and the specific meanings of medical keywords and their clinical importance amplify this difficulty. However, existing summarization methods that employ token probability marginalization encounter critical challenges. This technique, which averages token probabilities to determine their relevance, may fail to fully capture the intricate details of medical texts, potentially leading to inaccuracies or misrepresentations, particularly of less common medical terms. To address these limitations, we propose a masking approach specifically designed to select candidate sentences effectively from the masked background information. The approach first selects candidate documents using Dense Passage Retrieval (DPR) and then selects candidate sentences based on the background. We conducted a comprehensive analysis of five distinct masking techniques, applied at three masking ratios, and assessed their effectiveness across four different BART model sizes. Our experiments demonstrated that a TF-IDF (Term Frequency-Inverse Document Frequency) based background masking strategy at a 15% masking ratio, when fine-tuned on the BART-LARGE CNN model, yielded the highest ROUGE scores. This performance surpasses previous benchmarks established on the MS2 dataset, underscoring the efficacy of the proposed approach in enhancing the quality of multi-document summarization in the medical domain.
Year: 2023
Type: Thesis
School: School of Engineering and Technology
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Data Science and Artificial Intelligence (DSAI)
Chairperson(s): Chaklam Silpasuwanchai
Examination Committee(s): Chantri Polprasert; Attaphongse Taparugssanagorn
Scholarship Donor(s): AIT Fellowship
Degree: Thesis (M. Eng.) - Asian Institute of Technology, 2023
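
The abstract mentions a TF-IDF based background masking strategy at a 15% masking ratio but does not spell out the procedure. The following is only a minimal sketch of what such a step might look like, not the thesis's implementation. Everything in it is an assumption made for illustration: the function name `tfidf_mask`, whitespace tokenization, the smoothed IDF formula, the choice to mask the highest-scoring tokens, and the literal `<mask>` string (which follows the convention of BART tokenizers).

```python
import math
from collections import Counter

MASK = "<mask>"  # assumed mask token, following the BART tokenizer convention


def tfidf_mask(background, corpus, ratio=0.15):
    """Replace roughly `ratio` of the tokens in `background` with MASK.

    Hypothetical helper: tokens are ranked by TF-IDF against `corpus`
    and the highest-scoring ones are masked. Whether the thesis masks
    high- or low-scoring tokens is not stated in the abstract.
    """
    tokens = background.split()  # assumption: plain whitespace tokenization
    docs = [set(d.lower().split()) for d in corpus]
    n_docs = len(docs)
    tf = Counter(t.lower() for t in tokens)

    def score(tok):
        t = tok.lower()
        df = sum(1 for d in docs if t in d)  # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
        return (tf[t] / len(tokens)) * idf

    # Mask at least one token so that very short inputs are still altered.
    n_mask = max(1, round(ratio * len(tokens)))
    # Rank positions by descending score; ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-score(tokens[i]), i))
    to_mask = set(ranked[:n_mask])
    return " ".join(MASK if i in to_mask else t for i, t in enumerate(tokens))


background = (
    "the trial compared metformin with placebo in adults with type two "
    "diabetes over twelve months and measured glycated hemoglobin levels"
)
corpus = [
    background,
    "the study enrolled adults with hypertension",
    "the trial measured blood pressure in adults",
]
masked = tfidf_mask(background, corpus, ratio=0.15)
print(masked)
```

With a 20-token background and a ratio of 0.15, three tokens are replaced. In a fine-tuning setup, the masked background would then be paired with the target summary as model input, though how the thesis combines the two is not described in this record.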

