AgentBench

Benchmark framework for evaluating LLM-based autonomous agents

AgentBench is an open-source benchmark framework developed by Tsinghua University to evaluate the performance of LLM-based autonomous agents across diverse interactive environments. It tests agents in operating system tasks, database interactions, web browsing, coding challenges, and game environments to produce standardized performance comparisons. Researchers and AI companies use AgentBench to assess how well their models perform as agents rather than just text generators.
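What "multi-environment evaluation" amounts to in practice is a standardized agent-environment interaction loop: each environment presents an observation, the agent (backed by an LLM) replies with an action, and the harness scores the outcome so results are comparable across very different tasks. The sketch below is a minimal, hypothetical Python illustration of that pattern; the names (SimpleEnv, EchoAgent, run_episode) are illustrative and do not reflect AgentBench's actual API.

# Hypothetical sketch of the agent-environment loop that benchmark
# harnesses like AgentBench standardize. All names are illustrative;
# this is not AgentBench's actual API.
from dataclasses import dataclass

@dataclass
class SimpleEnv:
    """Toy environment: the task is to output the target string."""
    target: str
    max_turns: int = 3
    turns: int = 0

    def observe(self) -> str:
        return f"Turn {self.turns}: produce the string '{self.target}'."

    def step(self, action: str) -> tuple[bool, float]:
        """Apply the agent's action; return (done, reward)."""
        self.turns += 1
        solved = action == self.target
        return solved or self.turns >= self.max_turns, 1.0 if solved else 0.0

class EchoAgent:
    """Stand-in for an LLM-backed agent."""
    def act(self, observation: str) -> str:
        # A real agent would prompt an LLM here; we just parse the task.
        return observation.split("'")[1]

def run_episode(agent, env) -> float:
    """Drive one observe -> act -> step episode and return the reward."""
    done, reward = False, 0.0
    while not done:
        done, reward = env.step(agent.act(env.observe()))
    return reward

if __name__ == "__main__":
    # Averaging run_episode over many tasks and environments yields the
    # kind of standardized score a benchmark like this reports.
    print(f"reward: {run_episode(EchoAgent(), SimpleEnv(target='hello')):.1f}")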

Key Features

  • Multi-environment evaluation
  • OS task testing
  • Web agent benchmarks
  • Standardized metrics
  • Open-source
  • LLM comparison
#ai-agents #benchmarking #research #evaluation #open-source

Quick Info

Category: Research & Science
Pricing: Free (completely free to use)
