Progressive Layer Activation CLIP for Few-Shot and Generalizable Cassava Disease Recognition
Abstract
Cassava diseases such as Cassava Mosaic Disease (CMD), Cassava Brown Streak Disease (CBSD), and Cassava Bacterial Blight (CBB) pose serious threats to global food security, particularly in resource-limited regions where expert diagnosis is scarce. Although large vision–language models enable automated plant disease recognition, existing fine-tuning approaches struggle under extreme data scarcity. This paper proposes Progressive Layer Activation CLIP (PLA-CLIP), a curriculum-inspired fine-tuning framework for efficient few-shot classification of cassava diseases. PLA-CLIP progressively unfreezes transformer layers during training, which stabilizes optimization while preserving the pretrained vision–language alignment. Using only 43 images per class, PLA-CLIP achieves 78.25% accuracy and a weighted F1 score of 78.00% on CD1, outperforming zero-shot CLIP by 15.98% and standard fine-tuning by 3.94%. Cross-dataset evaluations on CD2 and CD3 demonstrate robust generalization across varying conditions. Attention map visualizations confirm that the model focuses on disease-relevant regions, supporting interpretability. With an inference time of 2.65 ms and a moderate model size, PLA-CLIP offers an effective balance between efficiency and performance for practical plant health monitoring. The implementation and experimental code are publicly available at https://github.com/mshafay5/PLA-CLIP.
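To illustrate what progressively unfreezing transformer layers can look like in practice, the following is a minimal sketch of an unfreezing schedule. It is not taken from the PLA-CLIP repository: the function name `active_layers`, the 12-layer depth, the top-down unfreezing order, and the stage lengths are all assumptions chosen for illustration, since the abstract does not specify them.

```python
def active_layers(epoch, num_layers=12, epochs_per_stage=2, layers_per_stage=2):
    """Return the indices of transformer layers that are trainable at `epoch`.

    Hypothetical schedule (assumed, not from the paper): every
    `epochs_per_stage` epochs, `layers_per_stage` more layers are unfrozen,
    starting from the layer closest to the output and moving toward the input.
    """
    stage = epoch // epochs_per_stage + 1
    n_active = min(num_layers, stage * layers_per_stage)
    return list(range(num_layers - n_active, num_layers))


if __name__ == "__main__":
    # Print which layers would be trainable at the start of each stage.
    for epoch in range(0, 12, 2):
        print(epoch, active_layers(epoch))
    # With a PyTorch encoder, the schedule could be applied per epoch as
    # (assuming `blocks` is the list of transformer layers):
    #   for i, block in enumerate(blocks):
    #       block.requires_grad_(i in active_layers(epoch))
```

Under these assumed settings, only the top two layers are trained at first, and all twelve become trainable from epoch 10 onward. Keeping the lower layers frozen early on is one way to limit drift from the pretrained representation while the upper layers adapt to the few-shot data.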