Title | Evaluating the effectiveness of truncation and extractive approaches in text summarization |
Author | Pranisaa Charnparttaravanit |
Call Number | AIT Thesis no.DSAI-22-06 |
Subject(s) | Natural language processing (Computer science); Computational linguistics; Semantics--Data processing |
Note | A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Data Science and Artificial Intelligence |
Publisher | Asian Institute of Technology |
Abstract | Transformer-based models still struggle to accommodate long inputs due to the high memory requirement of the full self-attention mechanism. Such long inputs are normally truncated, assuming that important information is located at a particular location in the document, an assumption that does not always hold. One promising approach is document extraction, i.e., selecting important sentences based on lexical or semantic overlap. This study evaluates different extraction approaches, such as Luhn’s algorithm, latent semantic analysis, TextRank, and k-means clustering on sentence embeddings, and compares the results to truncation approaches. In addition, we investigated whether these approaches were robust when the order of sentences in the document is randomly shuffled. The results showed that extraction approaches outperformed truncation approaches in the shuffled condition. Among the extraction approaches, TextRank and Luhn achieved the best performance. In contrast, truncation approaches generally outperformed extraction approaches in the unshuffled condition, suggesting that truncation approaches might be suitable when the location of important information is known a priori. Further discussion and implications are provided. This study comprehensively evaluates and compares extraction approaches that can be applied to existing summarization systems. |
Year | 2022 |
Type | Thesis |
School | School of Engineering and Technology |
Department | Department of Information and Communications Technologies (DICT) |
Academic Program/FoS | Data Science and Artificial Intelligence (DSAI) |
Chairperson(s) | Chaklam Silpasuwanchai; |
Examination Committee(s) | Dailey, Matthew N.;Mongkol Ekpanyapong; |
Scholarship Donor(s) | Royal Thai Government Fellowship; |
Degree | Thesis (M. Eng.) - Asian Institute of Technology, 2022 |