Direct Feature Identification from Raman Spectra and Precise Data-driven Classification of Phytopathogens at Single Conidium-Species Level
Abstract
Conidia can cause outbreaks of fungal plant diseases, resulting in significant crop yield and economic losses. Traditional diagnosis of phytopathogenic conidia based on morphology and molecular biology is time-consuming, labor-intensive, and often fails to differentiate fungal conidia. To overcome these challenges, a new classification approach was developed in this study by integrating Raman spectroscopy with data-driven modeling. Eight fungal species were selected and characterized using Raman spectra. Three characteristic Raman bands at 1003 to 1005 cm⁻¹, 1153 to 1157 cm⁻¹, and 1515 to 1522 cm⁻¹ shared a consistent pattern across species and could be attributed to carotenoids. Clustering of the Raman spectra using principal component analysis (PCA) showed substantial overlap between species, indicating that PCA alone could not classify the conidia accurately. Three data-driven models, support vector machines (SVMs), decision trees (DTs), and eXtreme Gradient Boosting Forest (XGBoost), were trained with three categories of features (number of peaks, maximum peak, and curve roughness) identified within eight characteristic wavenumber ranges. The optimal SVM, DT, and XGBoost models determined by hyperparameter tuning achieved prediction precision of 0.88, 0.88, and 0.96, respectively. PCA-XGBoost, trained by feeding the principal components from PCA to XGBoost, achieved a prediction precision of 0.94, suggesting that features extracted directly from the raw datasets outperformed those extracted with PCA for data-driven classification. This study demonstrates the great potential of Raman spectroscopy combined with data-driven modeling for classification of phytopathogenic conidia.
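The three feature categories named above (number of peaks, maximum peak, and curve roughness per wavenumber range) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not define roughness or list all eight ranges, so the roughness measure (mean absolute second difference), the ±15 cm⁻¹ windows around the three carotenoid bands, the function names, and the synthetic spectrum are all assumptions made for illustration.

```python
import numpy as np


def count_peaks(y):
    """Count strict local maxima in the interior of a 1-D intensity array."""
    if y.size < 3:
        return 0
    interior = y[1:-1]
    return int(np.sum((interior > y[:-2]) & (interior > y[2:])))


def roughness(y):
    """Assumed roughness measure: mean absolute second difference."""
    if y.size < 3:
        return 0.0
    return float(np.mean(np.abs(np.diff(y, n=2))))


def band_features(wavenumbers, intensities, lo, hi):
    """Return [number of peaks, maximum peak, roughness] for one range."""
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    y = intensities[mask]
    if y.size == 0:
        raise ValueError(f"no data points between {lo} and {hi} cm^-1")
    return [count_peaks(y), float(y.max()), roughness(y)]


def spectrum_features(wavenumbers, intensities, bands):
    """Concatenate the three features over every wavenumber range."""
    feats = []
    for lo, hi in bands:
        feats.extend(band_features(wavenumbers, intensities, lo, hi))
    return np.array(feats, dtype=float)


# Synthetic, noise-free spectrum with Gaussian bands near the three
# carotenoid positions reported in the abstract (illustrative data only).
wn = np.arange(900.0, 1700.0, 1.0)
centers = [1004.0, 1155.0, 1518.0]
spectrum = sum(np.exp(-0.5 * ((wn - c) / 4.0) ** 2) for c in centers)

# Hypothetical windows: +/- 15 cm^-1 around each band centre.
bands = [(c - 15.0, c + 15.0) for c in centers]
features = spectrum_features(wn, spectrum, bands)
```

Each spectrum yields one fixed-length vector (three features per range), which could then be passed to a classifier such as an SVM, a decision tree, or XGBoost. Real spectra would likely need baseline correction and smoothing first, since noise inflates both the peak count and the roughness value.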