Systematic Survey of Public Datasets for Behavioral Research in Invertebrate Models: Toward FAIR and Standardized Data Sharing
Abstract
Behavioural datasets for invertebrate model organisms are expanding rapidly alongside automated imaging, tracking, and artificial intelligence (AI)-based phenotyping, yet their technical structure and compliance with the Findable, Accessible, Interoperable and Reusable (FAIR) principles remain heterogeneous. We present a two-stage survey of openly available behavioural datasets for major invertebrate models: Caenorhabditis elegans (C. elegans), Drosophila melanogaster (D. melanogaster), Galleria mellonella (G. mellonella), and the planarian Schmidtea mediterranea (S. mediterranea), with larval zebrafish (Danio rerio) included as a vertebrate comparator. Stage 1 comprised a PRISMA-guided literature review (2015 to 2025) across indexed databases and complementary non-indexed sources, yielding 12 eligible publications describing 12 open behavioural datasets. Stage 2 independently screened and technically evaluated repository deposits (June 2022 to July 2025), producing a final corpus of 20 datasets scored on a four-dimension ordinal rubric capturing usability, annotation richness, technical quality, and AI-readiness. All extracted descriptors, repository search logs, and scoring sheets are released as public data records, enabling full regeneration of figures and summary statistics. Across Stage 2 deposits, multimodality and open file formats were common, whereas interoperability and AI-readiness were most constrained by limited machine-readable metadata, weak raw-to-derived provenance, and sparse adoption of formal standards or ontologies. This Data Descriptor provides a reproducible, dataset-centred overview of behavioural resources for invertebrate models, together with practical guidance for FAIR-aligned publication, secondary biological analyses, and AI benchmarking.