AIT Asian Institute of Technology


A multi-modal framework for context-aware plant disease classification and segmentation integrating visual and textual features
Author	Doula, Md Shafi Ud
Call Number	AIT Thesis no.DSAI-25-05
Subject(s)	Plant pathology--Data processing
Note	A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Data Science and Artificial Intelligence, School of Engineering and Technology
Publisher	Asian Institute of Technology
Abstract	Plant diseases substantially challenge agricultural productivity and global food security. Hence, more intelligent and interpretable diagnostic frameworks are needed. An automated disease identification system can reduce the human effort in checking large farms, and early detection and identification will minimize the loss, which ultimately positively affects the economy. Traditional image-based deep learning models, particularly Convolutional Neural Networks (CNNs), often struggle to distinguish visually similar diseases due to the absence of contextual information. To address these limitations, we present an innovative multi-modal deep learning framework that effectively combines visual and textual data to improve plant disease classification and segmentation. Initially, the framework incorporates a linguistically enriched Text Encoder, where disease-related descriptions are preprocessed using natural language processing (NLP) techniques to extract salient noun, numerical, adjective, and adverbial features. These refined textual representations are then encoded using a fine-tuned transformer-based language model, capturing domain-specific semantics crucial for disease differentiation. Concurrently, a CNN-based Vision Encoder extracts discriminative hierarchical features, which are dynamically fused with textual representations via a multi-head attention mechanism, ensuring adaptive cross-modal feature alignment. Unlike conventional fusion techniques, our approach learns complex inter-dependencies between textual cues and visual patterns, enhancing classification accuracy and segmentation precision. Finally, we demonstrate our proposed framework’s effectiveness by evaluating it on the Plant Disease Diagnosis Multimodal Dataset (PDDM) and achieving state-of-the-art (SOTA) segmentation and classification performance.
Year	2025
Type	Thesis
School	School of Engineering and Technology
Department	Department of Information and Communications Technologies (DICT)
Academic Program/FoS	Data Science and Artificial Intelligence (DSAI)
Chairperson(s)	Chutiporn Anutariya
Examination Committee(s)	Mongkol Ekpanyapong;Cherdsak Kingkan
Scholarship Donor(s)	AIT Scholarship
Degree	Thesis (M. Eng.) - Asian Institute of Technology, 2025
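
The abstract describes fusing CNN visual features with encoded text through a multi-head attention mechanism. The following is a minimal illustrative sketch of that idea in plain Python, not the thesis implementation. Several details are assumptions the record does not state: visual features act as queries and text features as keys and values, the projection matrices are random stand-ins for learned weights, and the names `cross_attention` and `random_matrix` and all dimensions are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def random_matrix(rows, cols, rng):
    """Random matrix used as a stand-in for learned weights or features."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def cross_attention(visual, text, num_heads, rng):
    """Each visual token (query) attends over all text tokens (keys/values).

    Assumption: this query/key direction is chosen for illustration only.
    Returns the fused features and the per-head attention weights.
    """
    d_model = len(visual[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    # Random projections; a trained model would learn these.
    w_q = random_matrix(d_model, d_model, rng)
    w_k = random_matrix(d_model, d_model, rng)
    w_v = random_matrix(d_model, d_model, rng)
    w_o = random_matrix(d_model, d_model, rng)

    q = matmul(visual, w_q)
    k = matmul(text, w_k)
    v = matmul(text, w_v)

    concat = [[] for _ in visual]  # per-query concatenation of head outputs
    weights = []                   # weights[head][query][text_token]
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        head_weights = []
        for i, q_row in enumerate(q):
            # Scaled dot-product scores against every text token.
            scores = [
                sum(a * b for a, b in zip(q_row[lo:hi], k_row[lo:hi]))
                / math.sqrt(d_head)
                for k_row in k
            ]
            w = softmax(scores)
            head_weights.append(w)
            # Weighted sum of this head's slice of the value vectors.
            concat[i].extend(
                sum(w_j * v_row[c] for w_j, v_row in zip(w, v))
                for c in range(lo, hi)
            )
        weights.append(head_weights)

    fused = matmul(concat, w_o)  # output projection mixes the heads
    return fused, weights


rng = random.Random(0)
visual = random_matrix(4, 8, rng)  # 4 hypothetical image-region features
text = random_matrix(3, 8, rng)    # 3 hypothetical text-token features
fused, weights = cross_attention(visual, text, num_heads=2, rng=rng)
print(len(fused), len(fused[0]))
```

Each row of `weights[h]` sums to 1 and indicates how strongly one image region draws on each text token in head `h`. In a trained model these weights would be what lets textual cues help separate visually similar diseases.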