Evaluating Personality Traits of Large Language Models Through Scenario-based Interpretive Benchmarking


Abstract

The assessment of Large Language Models (LLMs) has traditionally focused on performance metrics tied directly to their task-solving capabilities. This paper introduces a novel benchmark explicitly designed to measure personality traits in LLMs through scenario-based interpretive prompts. We detail the methodology behind this benchmark: LLMs are presented with structured prompts inspired by psychological scenarios, and their responses are scored by a judge LLM on traits such as emotional stability, creativity, adaptability, and anxiety levels, among others. The consistency of these scores across different judge models is assessed through consensus analysis. Anecdotal observations on score validity and on the orthogonality of personality scores to conventional performance metrics are also discussed. Results, implementation scripts, and updated leaderboards are publicly available at https://github.com/fit-alessandro-berti/llm-dreams-benchmark
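
To make the evaluation loop described above concrete, the following is a minimal sketch of how scenario prompts, a judged model, and multiple judge models might be combined into per-trait scores. The scenario texts, trait list, judge prompt wording, and the call_llm() helper are hypothetical placeholders and do not reflect the actual implementation; the real scripts are available in the repository linked above.

```python
# Hypothetical sketch of a scenario-based personality scoring loop with judge LLMs.
# Prompts, trait names, and call_llm() are illustrative placeholders only.

import statistics

TRAITS = ["emotional stability", "creativity", "adaptability", "anxiety level"]

SCENARIOS = [
    "You wake up inside a dream where every door leads to a different memory. What do you do?",
    "A stranger hands you a blank map and asks you to find a place that does not exist yet.",
]


def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a call to the given model's completion API."""
    raise NotImplementedError("Wire this to your LLM provider of choice.")


def judge_response(judge_model: str, scenario: str, response: str, trait: str) -> int:
    """Ask a judge LLM to rate one trait of the evaluated model's response on a 1-10 scale."""
    judge_prompt = (
        f"Scenario: {scenario}\n"
        f"Model response: {response}\n"
        f"On a scale from 1 to 10, rate the {trait} expressed in the response. "
        "Answer with a single integer."
    )
    return int(call_llm(judge_model, judge_prompt).strip())


def evaluate(evaluated_model: str, judge_models: list[str]) -> dict[str, float]:
    """Score a model on each trait, averaging over scenarios and judge models (consensus)."""
    scores: dict[str, list[int]] = {trait: [] for trait in TRAITS}
    for scenario in SCENARIOS:
        response = call_llm(evaluated_model, scenario)
        for judge in judge_models:  # consistency across judges via averaging
            for trait in TRAITS:
                scores[trait].append(judge_response(judge, scenario, response, trait))
    return {trait: statistics.mean(values) for trait, values in scores.items()}
```

Averaging each trait over both scenarios and judge models corresponds to the consensus analysis mentioned in the abstract: a trait score is considered more reliable when different judge models assign similar ratings to the same response.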
