SWE-bench Verified

Standard benchmark for evaluating AI software engineering capabilities

Code & Development

SWE-bench is the standard academic benchmark for evaluating AI systems' ability to resolve real-world software engineering issues drawn from popular open-source Python repositories. Each task is a GitHub issue paired with tests that verify whether a proposed fix resolves it. SWE-bench Verified is a human-validated subset in which annotators screened tasks for well-specified issues and reliable tests. The benchmark has driven significant progress in autonomous software engineering and serves as the reference point for comparing coding agents: AI researchers building agents, companies evaluating AI coding tools, and the broader software engineering AI community all use it to measure progress in automated bug fixing and feature implementation.
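
Because each task is simply a GitHub issue paired with its verifying tests, the task data is easy to inspect programmatically. The snippet below is an illustrative sketch only; it assumes the Hugging Face datasets library, the dataset ID princeton-nlp/SWE-bench_Verified, and the field names shown in the comments, none of which are stated on this page.

# Minimal sketch: inspecting SWE-bench Verified tasks.
# Assumptions (not from this page): the `datasets` library is installed,
# the dataset ID is "princeton-nlp/SWE-bench_Verified", and the field
# names below match the published dataset schema.
from datasets import load_dataset

tasks = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

task = tasks[0]
print(task["instance_id"])        # task identifier (assumed field name)
print(task["repo"])               # source repository the issue comes from (assumed field name)
print(task["problem_statement"])  # the GitHub issue text given to the system (assumed field name)
print(task["FAIL_TO_PASS"])       # tests that must pass once the fix is applied (assumed field name)

Each record describes one issue; an agent's patch is judged by running the associated tests against the patched repository.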

Key Features

  • Real GitHub issues
  • Automated evaluation
  • Human-validated subset
  • Multi-repository
  • Open benchmark
#benchmark #evaluation #coding #research #autonomous-engineering

Get Started

Visit SWE-bench Verified
Free (completely free to use)

Quick Info

Category: Code & Development
Pricing: Free
