Comparative Analysis of Convolutional and Vision Transformer Models for Automated Leukocyte Classification Enhanced by Generative Color Augmentation
Abstract
Manual differential leukocyte counting is a critical but time-consuming and subjective process. This study provides a rigorous comparative analysis of You Only Look Once v11 (YOLOv11) and Vision Transformer (ViT) architectures for classifying 14 types of leukocytes and artifacts from a private clinical dataset. We also evaluated the impact of HistAuGAN, a data augmentation technique that simulates real-world staining variability, by training and evaluating models from both families with and without its application. The results demonstrated the consistent superiority of the ViT architecture over YOLOv11 in all experimental settings. Furthermore, the use of HistAuGAN yielded a consistent and significant performance improvement across all tested models. The top-performing configuration, the ViT-Base model trained with HistAuGAN, achieved a macro F1-score of 98.36% and an overall accuracy of 99.75% on the test set. We conclude that the synergy between an architecture capable of learning global features (ViT) and a domain-specific data augmentation technique that addresses practical challenges represents a state-of-the-art strategy, establishing a new performance benchmark for high-granularity leukocyte classification and reinforcing the potential of artificial intelligence to transform diagnostic hematology.
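The abstract reports both a macro F1-score and an overall accuracy, and the two can differ noticeably when classes are imbalanced. The sketch below is a minimal, illustrative computation of both metrics in plain Python; the function names and the toy labels are assumptions made for this example and are not taken from the study's code or data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: one common class and one rare class.
y_true = ["neutrophil"] * 8 + ["basophil"] * 2
y_pred = ["neutrophil"] * 8 + ["neutrophil", "basophil"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

In this toy case a single error on the rare class lowers the macro F1 far more than the accuracy, because every class contributes equally to the macro average regardless of how many samples it has. This is one reason a macro-averaged score is informative for a 14-class problem in which some cell types are rare.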