SWE-bench Verified
Standard benchmark for evaluating AI software engineering capabilities
SWE-bench is the standard academic benchmark for evaluating AI systems' ability to resolve real-world software engineering issues drawn from popular open-source Python repositories. Each task pairs a GitHub issue with tests that verify whether a candidate fix actually resolves it. SWE-bench Verified is a human-validated subset in which annotators confirmed that each task is well-specified and its tests are reliable, making it the preferred split for evaluation. The benchmark has driven significant progress in autonomous software engineering and serves as the common reference point for comparing coding agents: AI researchers building agents, companies assessing AI coding tools, and the broader software engineering AI community use it to measure progress in automated bug fixing and feature implementation.
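To make the task structure concrete, here is a minimal sketch of loading and inspecting SWE-bench Verified tasks. It assumes the dataset is published on the Hugging Face Hub under the ID "princeton-nlp/SWE-bench_Verified" and that records carry fields such as instance_id, repo, and problem_statement; the example instance_id shown in the comments is illustrative only.

```python
# Sketch: inspect SWE-bench Verified tasks via the Hugging Face datasets library.
# Dataset ID and field names are assumptions based on common SWE-bench usage.
from datasets import load_dataset

tasks = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

example = tasks[0]
print(example["instance_id"])              # e.g. "astropy__astropy-12907" (illustrative)
print(example["repo"])                     # source repository the issue comes from
print(example["problem_statement"][:300])  # the GitHub issue text given to the agent
```

Each record also references the tests used to judge a fix, so an agent's job is to read the problem statement, edit the repository, and produce a patch that makes the failing tests pass without breaking the existing ones.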
Key Features
- ✓ Real GitHub issues
- ✓ Automated evaluation (see the sketch after this list)
- ✓ Human-validated subset
- ✓ Multi-repository
- ✓ Open benchmark
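The automated evaluation works off a simple prediction file: one record per task containing the patch a system proposes. Below is a hedged sketch of that format; the field names and the CLI invocation in the trailing comment reflect the swebench package's documented usage but may differ between versions, and the instance_id shown is hypothetical.

```python
# Sketch: write a predictions file for the SWE-bench evaluation harness.
# Field names and the harness CLI below are assumptions about current swebench usage.
import json

predictions = [
    {
        "instance_id": "astropy__astropy-12907",    # hypothetical task id for illustration
        "model_name_or_path": "my-coding-agent",     # label for the system being evaluated
        "model_patch": "diff --git a/astropy/...",   # unified diff the agent proposes
    }
]

with open("predictions.jsonl", "w") as f:
    for record in predictions:
        f.write(json.dumps(record) + "\n")

# The harness then applies each patch in a clean checkout of the repo and reruns
# the issue's tests, e.g. (flags are an assumption about the current CLI):
#   python -m swebench.harness.run_evaluation \
#       --dataset_name princeton-nlp/SWE-bench_Verified \
#       --predictions_path predictions.jsonl \
#       --run_id demo
```

A task counts as resolved only if the previously failing tests pass and the rest of the test suite still passes, which is what makes the scoring fully automatic.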
Quick Info
- Category: Code & Development
- Pricing: Free
More Code & Development Tools
- GitHub Copilot: The AI pair programmer trusted by millions of developers
- Cursor: The code editor built around AI from the ground up
- Tabnine: Privacy-first AI code completion
- Codeium: Free AI coding assistant with no usage limits