AgentBench
Benchmark framework for evaluating LLM-based autonomous agents
AgentBench is an open-source benchmark framework developed by Tsinghua University to evaluate the performance of LLM-based autonomous agents across diverse interactive environments. It tests agents in operating system tasks, database interactions, web browsing, coding challenges, and game environments to produce standardized performance comparisons. Researchers and AI companies use AgentBench to assess how well their models perform as agents rather than just text generators.
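The core pattern AgentBench standardizes is an agent interacting with an environment over multiple turns, with a score recorded per task. The sketch below illustrates that observe-act-step loop in Python; all names here (Environment, run_episode, scripted_agent) are illustrative and are not AgentBench's actual API.

```python
# Hypothetical sketch of the agent-evaluation pattern AgentBench
# standardizes: an agent interacts with an environment turn by turn,
# and each episode yields a scalar score for standardized comparison.
# These class and function names are illustrative, not AgentBench's API.
from dataclasses import dataclass


@dataclass
class Environment:
    """Toy interactive environment: the agent must output 'done' within max_turns."""
    max_turns: int = 5
    turns_used: int = 0

    def observe(self) -> str:
        return f"turn {self.turns_used} of {self.max_turns}"

    def step(self, action: str) -> tuple[bool, float]:
        self.turns_used += 1
        if action == "done":
            return True, 1.0   # task solved
        if self.turns_used >= self.max_turns:
            return True, 0.0   # out of turns, task failed
        return False, 0.0      # episode continues


def run_episode(env: Environment, agent) -> float:
    """Drive one observe -> act -> step loop and return the final score."""
    while True:
        action = agent(env.observe())
        finished, score = env.step(action)
        if finished:
            return score


# A trivial scripted "agent" standing in for an LLM call.
def scripted_agent(observation: str) -> str:
    return "done" if observation.startswith("turn 2") else "explore"


score = run_episode(Environment(), scripted_agent)
print(score)  # 1.0 — the scripted agent finishes on its third turn
```

In a real benchmark run, `scripted_agent` would be replaced by a call to an LLM, and the per-task scores would be averaged into the standardized metrics the framework reports.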
Key Features
- Multi-environment evaluation
- OS task testing
- Web agent benchmarks
- Standardized metrics
- Open-source
- LLM comparison
Quick Info
- Category: Research & Science
- Pricing: Free