Creation, Evaluation and Self-Validation of Simulation Models with Large Language Models


Abstract

Engineering tasks are significantly underrepresented in current large language model (LLM) datasets and research, despite their complexity and practical importance. These tasks often demand a deep mathematical understanding and involve a combination of textual descriptions, visual representations, and numerical data. Moreover, engineering frequently relies on accepted approximations and models rather than exact values. The present paper therefore advances the integration of LLMs into mechanical engineering by introducing a comprehensive framework for automated simulation model generation and validation. The framework is designed as a benchmark and focuses on mechanical engineering problems in dynamics, in particular on multibody dynamics simulation models in Python. Its use of parametrized models with ground-truth solutions allows a large number of test cases to be created and evaluated for executability and correctness. Lastly, LLM agents are employed to generate simulation models and perform self-evaluation through a predefined set of validation methods, assessing the models for parametrization errors. Evaluation results using classical F-score metrics demonstrate that most tested LLMs identify a majority of incorrect models, while the best-performing model achieves high accuracy in distinguishing between correct and incorrect simulation models.
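To illustrate how the F-score evaluation mentioned above could be computed, the following minimal sketch scores hypothetical self-validation verdicts against ground-truth labels. It is not the paper's actual pipeline; the labels, verdicts, and the choice to treat "parametrization error detected" as the positive class are assumptions made for illustration.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical evaluation records: each entry pairs the ground-truth label of a
# generated simulation model (True = correctly parametrized) with the verdict
# returned by the LLM agent's self-validation step.
ground_truth = [True, True, False, False, False, True, False, True]
llm_verdicts = [True, False, False, False, True, True, False, True]

# Treat "incorrect model detected" as the positive class, since the benchmark
# measures how reliably self-validation flags parametrization errors.
y_true = [not label for label in ground_truth]
y_pred = [not verdict for verdict in llm_verdicts]

print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")
print(f"F1 score:  {f1_score(y_true, y_pred):.2f}")
```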
