Expert-Grounded Automatic Prompt Engineering for Extracting Lattice Constants of High-Entropy Alloys from Scientific Publications using Large Language Models


Abstract

Large language models (LLMs) have shown promise for scientific data extraction from publications, but their use typically relies on manual prompt refinement. We present an expert-grounded automatic prompt optimization framework that enhances the reliability of LLM entity extraction. Using lattice constant extraction for high-entropy alloys as a testbed, we optimized prompts for Claude 3.5 Sonnet through feedback cycles on seven expert-annotated publications. Despite a modest optimization budget, recall improved from 0.27 to above 0.9, demonstrating that a small, expert-curated dataset can yield substantial improvements. The approach was applied to extract lattice constants from 2,267 publications, yielding data for 1,861 compositions. The optimized prompt transferred effectively to newer models: Claude 4.5 Sonnet, GPT-5, and Gemini 2.5 Flash. Analysis revealed three categories of LLM mistakes: contextual hallucination, semantic misinterpretation, and unit conversion errors, underscoring the need for validation protocols. These results establish feedback-guided prompt optimization as a low-cost, transferable methodology for reliable scientific data extraction, providing a scalable pathway for complex LLM-assisted research tasks.
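
The feedback-guided optimization described above can be pictured as a loop in which a candidate prompt is scored against the expert-annotated publications and revised until extraction recall reaches a target. The sketch below is a minimal, hypothetical illustration of that loop, not the authors' implementation: the function names (call_llm, refine_prompt), the recall-based stopping rule, and the record format are assumptions for illustration only.

```python
# Minimal sketch of a feedback-guided prompt optimization loop.
# All names and details here are illustrative placeholders, not the paper's code.

from dataclasses import dataclass


@dataclass
class AnnotatedPaper:
    text: str                 # full text of the publication
    expert_records: set       # expert-annotated (composition, lattice_constant_angstrom) pairs


def call_llm(prompt: str, paper_text: str) -> set:
    """Placeholder for an LLM extraction call (e.g. to Claude 3.5 Sonnet) that
    parses the model response into (composition, lattice_constant_angstrom) tuples."""
    raise NotImplementedError("wire this to the LLM API of your choice")


def recall(predicted: set, expected: set) -> float:
    """Fraction of expert-annotated records recovered by the model."""
    return len(predicted & expected) / len(expected) if expected else 1.0


def refine_prompt(prompt: str, error_report: str) -> str:
    """Placeholder for the refinement step: expert feedback on missed or
    incorrect records is folded back into the prompt instructions."""
    return prompt + "\n\n# Revise instructions to address these errors:\n" + error_report


def optimize(prompt: str, papers: list, target_recall: float = 0.9, max_cycles: int = 10) -> str:
    """Iterate extract -> score -> refine until average recall meets the target."""
    for _ in range(max_cycles):
        recalls, errors = [], []
        for paper in papers:
            predicted = call_llm(prompt, paper.text)
            recalls.append(recall(predicted, paper.expert_records))
            missed = paper.expert_records - predicted
            if missed:
                errors.append(f"Missed records: {sorted(missed)}")
        if sum(recalls) / len(recalls) >= target_recall:
            break
        prompt = refine_prompt(prompt, "\n".join(errors))
    return prompt
```

In this framing, the seven expert-annotated publications play the role of the small curated development set, and the error report returned to the refinement step is where expert grounding enters the loop.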
