AIT Asian Institute of Technology

Context augmented text generation for AI based research writing application

Author: Shrestha, Amanda Raj
Call Number: AIT Thesis no. DSAI-24-06
Subject(s): Artificial intelligence--Educational applications; Natural language generation (Computer science)

Note: A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Data Science and Artificial Intelligence
Publisher: Asian Institute of Technology
Abstract: The increasing popularity of AI-powered writing tools such as Quillbot, Trinka, Grammarly, Jenni AI, and PaperPal has revolutionized how students and academics approach scientific writing. While these tools offer features like "Suggest Text", "Copilot", and "Auto Writing," which utilize AI-powered text generation, they often lack the ability to generate context-specific and research-oriented content. The generic nature of the generated text and the absence of automatic citation capabilities limit the potential of these AI writing assistants in academic settings. To address these limitations, we propose a Corrective Retrieval-Augmented Generation (CRAG) based framework that enhances the context-specificity and control of AI-powered writing assistance. Our framework enables users to upload topic-specific files, allowing the AI model to extract relevant context and generate text that aligns with the user's research intent. Furthermore, we introduce a synthetic dataset that facilitates the incorporation of citation abilities into the text generation process. This research aims to investigate several key questions: (1) Does providing context to the LLM help generate consistent and specific text for academic writing? (2) Which RAG method is best suited for our application (RAG, Self-RAG, or CRAG)? (3) Does LoRA-tuning the models on our dataset yield better generations than simply prompting the baseline model with the context? (4) Is the proposed framework applicable as a real application for research writing? To achieve these objectives, we created a context-augmented synthetic text generation dataset derived from the SCIXGEN dataset. We trained our LLM using LoRA adapters on this synthetic dataset and implemented the RAG frameworks using the trained LLM. Additionally, we incorporated context extraction from academic APIs (e.g., arXiv) when the similarity score fell below a threshold, to mitigate the hallucination problem.
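The retrieval gate described above (use the uploaded files when retrieval confidence is high, otherwise fall back to an academic API such as arXiv) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names are hypothetical, a toy token-overlap score stands in for real embedding similarity, and the external lookup is a stub where a real system would query the arXiv API.

```python
def token_overlap_similarity(a: str, b: str) -> float:
    """Toy stand-in for embedding similarity: Jaccard overlap of token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def fetch_external_context(query: str) -> str:
    """Placeholder for the academic-API fallback (e.g., an arXiv search)."""
    return f"[external passage retrieved for: {query}]"

def retrieve_context(query: str, uploaded_chunks: list[str],
                     threshold: float = 0.3) -> str:
    """Return the best chunk from the user's files, or fall back to an
    external source when the best similarity score is below the threshold."""
    scored = [(token_overlap_similarity(query, c), c) for c in uploaded_chunks]
    best_score, best_chunk = max(scored)
    if best_score >= threshold:
        return best_chunk                      # confident: use uploaded file
    return fetch_external_context(query)       # low confidence: external API

chunks = ["retrieval augmented generation improves factuality",
          "gradient descent minimizes a loss function"]
print(retrieve_context("retrieval augmented generation", chunks))
# high overlap with the first chunk, so it is returned directly
```

The design point the abstract makes is exactly this branch: a corrective step that detects low-confidence retrieval and swaps in external evidence instead of letting the LLM generate from a weak context.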
The evaluation of our framework was conducted using the BLEU score, BERTScore, and human evaluation to assess the quality and applicability of the generated text. Our results indicate that LoRA-tuning is not the most effective way to teach the LLM to cite a given context, and that large language models are capable enough to integrate contextual information supplied in the form of prompts. Our human evaluation results show that providing contextual information to the LLM does increase the specificity of the generated text. We also find that the Corrective RAG method is the most suitable RAG variant for our application. This research holds significant implications for enhancing the capabilities of AI-powered writing assistants in academic settings, ultimately benefiting students and researchers by providing more accurate, context-specific, and citation-rich writing support.
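As a rough illustration of the n-gram overlap scoring behind the BLEU metric mentioned above, the sketch below computes clipped unigram precision with a brevity penalty. This is a deliberate simplification: the thesis evaluation used full BLEU and BERTScore (which additionally uses contextual embeddings), and the function name here is illustrative only.

```python
import math
from collections import Counter

def unigram_bleu(candidate: str, reference: str) -> float:
    """Simplified BLEU: clipped unigram precision times the brevity penalty.
    Full BLEU also averages higher-order n-gram precisions."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    clipped = Counter(cand) & Counter(ref)     # counts clipped by reference
    precision = sum(clipped.values()) / len(cand)
    # brevity penalty discourages overly short candidates
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

# every candidate token appears in the reference, but the candidate is
# shorter, so the brevity penalty lowers the score below 1.0
score = unigram_bleu("the model cites sources", "the model cites its sources")
```

Overlap metrics like this reward surface similarity to a reference text, which is why the thesis pairs them with BERTScore and human judgments to capture semantic adequacy and citation quality.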
Year: 2024
Type: Thesis
School: School of Engineering and Technology
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Data Science and Artificial Intelligence (DSAI)
Chairperson(s): Chaklam Silpasuwanchai
Examination Committee(s): Chantri Polprasert; Mongkol Ekpanyapong
Scholarship Donor(s): AIT Scholarship
Degree: Thesis (M.Sc.) - Asian Institute of Technology, 2024

