A multi-modal framework for context-aware plant disease classification and segmentation integrating visual and textual features
| Author | Doula, Md Shafi Ud |
| Call Number | AIT Thesis no.DSAI-25-05 |
| Subject(s) | Plant pathology--Data processing |
| Note | A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Data Science and Artificial Intelligence, School of Engineering and Technology |
| Publisher | Asian Institute of Technology |
| Abstract | Plant diseases substantially challenge agricultural productivity and global food security, so more intelligent and interpretable diagnostic frameworks are needed. An automated disease identification system can reduce the human effort required to inspect large farms, and early detection and identification minimize crop losses, which ultimately benefits the economy. Traditional image-based deep learning models, particularly Convolutional Neural Networks (CNNs), often struggle to distinguish visually similar diseases due to the absence of contextual information. To address these limitations, we present an innovative multi-modal deep learning framework that combines visual and textual data to improve plant disease classification and segmentation. First, the framework incorporates a linguistically enriched Text Encoder, in which disease-related descriptions are preprocessed using natural language processing (NLP) techniques to extract salient noun, numerical, adjective, and adverbial features. These refined textual representations are then encoded using a fine-tuned transformer-based language model, capturing domain-specific semantics crucial for disease differentiation. Concurrently, a CNN-based Vision Encoder extracts discriminative hierarchical features, which are dynamically fused with the textual representations via a multi-head attention mechanism, ensuring adaptive cross-modal feature alignment. Unlike conventional fusion techniques, our approach learns complex inter-dependencies between textual cues and visual patterns, enhancing classification accuracy and segmentation precision. Finally, we demonstrate the effectiveness of the proposed framework by evaluating it on the Plant Disease Diagnosis Multimodal Dataset (PDDM), achieving state-of-the-art (SOTA) segmentation and classification performance. |
| Year | 2025 |
| Type | Thesis |
| School | School of Engineering and Technology |
| Department | Department of Information and Communications Technologies (DICT) |
| Academic Program/FoS | Data Science and Artificial Intelligence (DSAI) |
| Chairperson(s) | Chutiporn Anutariya |
| Examination Committee(s) | Mongkol Ekpanyapong; Cherdsak Kingkan |
| Scholarship Donor(s) | AIT Scholarship |
| Degree | Thesis (M. Eng.) - Asian Institute of Technology, 2025 |
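
The abstract describes fusing CNN visual features with encoded text via multi-head attention. The following is only a minimal illustrative sketch of that general idea, not the thesis's implementation: it assumes visual region features act as queries and text token features act as keys and values, omits the learned projection matrices a real model would have, and adds a residual connection purely for illustration. The function names `cross_attention` and `multi_head_fusion` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


def multi_head_fusion(visual, textual, num_heads):
    """Fuse visual region vectors with text token vectors.

    Assumption (not from the thesis): visual features are the queries,
    text features are the keys and values, and heads are formed by
    slicing the feature dimension instead of using learned projections.
    """
    dim = len(visual[0])
    if dim % num_heads != 0 or len(textual[0]) != dim:
        raise ValueError("feature dims must match and be divisible by num_heads")
    head_dim = dim // num_heads
    attended = [[] for _ in visual]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = [v[lo:hi] for v in visual]
        kv = [t[lo:hi] for t in textual]
        for row, head_out in zip(attended, cross_attention(q, kv, kv)):
            row.extend(head_out)  # concatenate head outputs
    # Illustrative residual connection: keep the original visual signal.
    return [[a + b for a, b in zip(v, f)] for v, f in zip(visual, attended)]
```

For example, `multi_head_fusion([[1.0, 0.0, 0.0, 1.0]], [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]], 2)` returns one fused vector of length 4, in which each visual feature has been shifted toward the text tokens it aligns with most strongly.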