AgentBench

Benchmark framework for evaluating LLM-based autonomous agents

AgentBench is an open-source benchmark framework developed by Tsinghua University to evaluate the performance of LLM-based autonomous agents across diverse interactive environments. It tests agents in operating system tasks, database interactions, web browsing, coding challenges, and game environments to produce standardized performance comparisons. Researchers and AI companies use AgentBench to assess how well their models perform as agents rather than just text generators.
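What "multi-environment evaluation" amounts to in practice is a standardized agent-environment interaction loop: each environment presents an observation, the agent (backed by an LLM) replies with an action, and the harness scores the outcome so results are comparable across very different tasks. The sketch below is a minimal, hypothetical Python illustration of that pattern; the names (SimpleEnv, EchoAgent, run_episode) are illustrative and do not reflect AgentBench's actual API.

# Hypothetical sketch of the agent-environment loop that benchmark
# harnesses like AgentBench standardize. All names are illustrative;
# this is not AgentBench's actual API.
from dataclasses import dataclass

@dataclass
class SimpleEnv:
    """Toy environment: the task is to output the target string."""
    target: str
    max_turns: int = 3
    turns: int = 0

    def observe(self) -> str:
        return f"Turn {self.turns}: produce the string '{self.target}'."

    def step(self, action: str) -> tuple[bool, float]:
        """Apply the agent's action; return (done, reward)."""
        self.turns += 1
        solved = action == self.target
        return solved or self.turns >= self.max_turns, 1.0 if solved else 0.0

class EchoAgent:
    """Stand-in for an LLM-backed agent."""
    def act(self, observation: str) -> str:
        # A real agent would prompt an LLM here; we just parse the task.
        return observation.split("'")[1]

def run_episode(agent, env) -> float:
    """Drive one observe -> act -> step episode and return the reward."""
    done, reward = False, 0.0
    while not done:
        done, reward = env.step(agent.act(env.observe()))
    return reward

if __name__ == "__main__":
    # Averaging run_episode over many tasks and environments yields the
    # kind of standardized score a benchmark like this reports.
    print(f"reward: {run_episode(EchoAgent(), SimpleEnv(target='hello')):.1f}")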

Key Features

  • Multi-environment evaluation
  • OS task testing
  • Web agent benchmarks
  • Standardized metrics
  • Open-source
  • LLM comparison
#ai-agents #benchmarking #research #evaluation #open-source

Quick Info

Category: Research & Science
Pricing: Free (completely free to use)
