Learning the syntax of plant assemblages
Abstract
Addressing the urgent biodiversity crisis requires understanding the nature of plant assemblages. The distribution of plant species is shaped not only by their broad environmental requirements but also by micro-environmental conditions, dispersal limitations, and direct and indirect species interactions. Predicting species composition and habitat identity is essential for conservation and restoration, yet for these reasons it remains challenging. In this study, we propose a novel approach, inspired by advances in large language models, that learns the "syntax" of abundance-ordered plant species sequences in communities. The method captures latent associations between species across diverse ecosystems and can be fine-tuned for a range of tasks. In particular, we show that it outperforms other approaches at (i) predicting species that are missing from an assemblage's species list but likely to occur there, given the species that are listed (+16.53% compared to co-occurrence matrices and +6.56% compared to neural networks), and (ii) classifying habitat types from species assemblages (+5.54% compared to expert systems and +1.14% compared to deep learning). The resulting application has a vocabulary covering more than ten thousand plant species from Europe and adjacent countries and offers a powerful methodology for improving biodiversity mapping, restoration, and conservation biology.
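As an illustration of the input representation and of the co-occurrence baseline mentioned in the abstract, the following is a minimal Python sketch. It is not the authors' model or code. The function names, the `[MASK]` placeholder, the tie-breaking rule, and the toy cover values in `TOY_PLOTS` are all assumptions made for this example. It shows how a vegetation plot could be turned into an abundance-ordered species sequence, how one species could be hidden to form a fill-in-the-blank example, and how a simple co-occurrence count could rank candidate missing species.

```python
from collections import Counter
from itertools import combinations

MASK = "[MASK]"  # hypothetical placeholder token, assumed for this sketch

# Invented toy plots (species -> percent cover); illustrative only, not real data.
TOY_PLOTS = [
    {"Fagus sylvatica": 60, "Anemone nemorosa": 15, "Quercus robur": 10},
    {"Fagus sylvatica": 40, "Anemone nemorosa": 25},
    {"Calluna vulgaris": 50, "Erica tetralix": 20},
]

def to_sequence(plot):
    """Order species by decreasing abundance; ties are broken alphabetically."""
    ordered = sorted(plot.items(), key=lambda item: (-item[1], item[0]))
    return [species for species, _ in ordered]

def mask_position(sequence, index):
    """Hide the species at `index`, returning the masked sequence and the target."""
    masked = list(sequence)
    target = masked[index]
    masked[index] = MASK
    return masked, target

def cooccurrence_counts(plots):
    """Count, for each ordered species pair, the number of plots containing both."""
    counts = Counter()
    for plot in plots:
        for a, b in combinations(sorted(plot), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def rank_missing(observed, plots):
    """Rank unobserved species by summed co-occurrence with the observed ones."""
    counts = cooccurrence_counts(plots)
    candidates = {species for plot in plots for species in plot} - set(observed)
    scores = {c: sum(counts[(c, o)] for o in observed) for c in candidates}
    return sorted(scores, key=lambda c: (-scores[c], c))
```

With the toy data, `to_sequence(TOY_PLOTS[0])` lists Fagus sylvatica first because it has the highest cover, and `rank_missing(["Fagus sylvatica"], TOY_PLOTS)` places Anemone nemorosa first because it shares the most plots with the observed species. A language-model approach would replace the counting step with a model trained to fill in the masked position.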