AIT Asian Institute of Technology

A multi-modal framework for context-aware plant disease classification and segmentation integrating visual and textual features

Author: Doula, Md Shafi Ud
Call Number: AIT Thesis no. DSAI-25-05
Subject(s): Plant pathology--Data processing
Note: A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Data Science and Artificial Intelligence, School of Engineering and Technology
Publisher: Asian Institute of Technology
Abstract: Plant diseases substantially challenge agricultural productivity and global food security. Hence, more intelligent and interpretable diagnostic frameworks are needed. An automated disease identification system can reduce the human effort required to inspect large farms, and early detection and identification minimize crop losses, which ultimately benefits the economy. Traditional image-based deep learning models, particularly Convolutional Neural Networks (CNNs), often struggle to distinguish visually similar diseases due to the absence of contextual information. To address these limitations, we present an innovative multi-modal deep learning framework that effectively combines visual and textual data to improve plant disease classification and segmentation. Initially, the framework incorporates a linguistically enriched Text Encoder, where disease-related descriptions are preprocessed using natural language processing (NLP) techniques to extract salient noun, numerical, adjective, and adverbial features. These refined textual representations are then encoded using a fine-tuned transformer-based language model, capturing domain-specific semantics crucial for disease differentiation. Concurrently, a CNN-based Vision Encoder extracts discriminative hierarchical features, which are dynamically fused with the textual representations via a multi-head attention mechanism, ensuring adaptive cross-modal feature alignment. Unlike conventional fusion techniques, our approach learns complex inter-dependencies between textual cues and visual patterns, enhancing classification accuracy and segmentation precision. Finally, we demonstrate our proposed framework’s effectiveness by evaluating it on the Plant Disease Diagnosis Multimodal Dataset (PDDM) and achieving state-of-the-art (SOTA) segmentation and classification performance.
Year: 2025
Type: Thesis
School: School of Engineering and Technology
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Data Science and Artificial Intelligence (DSAI)
Chairperson(s): Chutiporn Anutariya
Examination Committee(s): Mongkol Ekpanyapong; Cherdsak Kingkan
Scholarship Donor(s): AIT Scholarship
Degree: Thesis (M. Eng.) - Asian Institute of Technology, 2025
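
Illustrative note: the abstract describes fusing visual and textual features through a multi-head attention mechanism. The sketch below shows one common way such cross-modal fusion can be arranged, and it is not the thesis's implementation. Everything specific in it is an assumption made for illustration: visual features act as queries and text features as keys and values, the projection matrices are random stand-ins for learned weights, a residual connection is added, and the function names, dimensions, and head count are hypothetical. It uses only the Python standard library.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Stand-in for a learned projection; a real model would train these weights."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def cross_attention_fuse(visual, text, num_heads, rng):
    """Fuse visual queries with text keys/values via multi-head attention.

    visual: list of n_v vectors (dim d); text: list of n_t vectors (dim d).
    Returns (fused, weights): fused is n_v x d, weights[h] is n_v x n_t.
    """
    d = len(visual[0])
    assert d % num_heads == 0, "feature dim must be divisible by num_heads"
    d_head = d // num_heads
    w_q, w_k, w_v, w_o = (random_matrix(d, d, rng) for _ in range(4))
    q, k, v = matmul(visual, w_q), matmul(text, w_k), matmul(text, w_v)

    head_outputs, weights = [], []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        # Scaled dot-product scores: each visual position against each text token.
        k_t = [list(col) for col in zip(*k_h)]
        scores = matmul(q_h, k_t)
        attn = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        weights.append(attn)
        head_outputs.append(matmul(attn, v_h))

    # Concatenate the heads per visual position, then apply the output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(visual))]
    attended = matmul(concat, w_o)
    # Assumed residual connection: keep the visual features and add text context.
    fused = [[a + b for a, b in zip(v_row, t_row)]
             for v_row, t_row in zip(visual, attended)]
    return fused, weights

rng = random.Random(0)
visual_feats = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]  # 4 image regions
text_feats = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(3)]    # 3 text tokens
fused, weights = cross_attention_fuse(visual_feats, text_feats, num_heads=2, rng=rng)
```

Each row of `weights[h]` is a probability distribution over the text tokens for one image region, which is the sense in which attention "aligns" the two modalities. In a trained model the projections would be learned and the fused features would feed classification and segmentation heads.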

