SWE-bench Verified

Standard benchmark for evaluating AI software engineering capabilities

Code & Development

SWE-bench is the standard academic benchmark for evaluating AI systems' ability to resolve real-world software engineering issues drawn from popular open-source Python repositories. Each task is a GitHub issue paired with tests that verify whether a proposed fix resolves it. SWE-bench Verified is a human-validated subset in which annotators screened tasks for well-specified issues and reliable tests. The benchmark has driven significant progress in autonomous software engineering and serves as the reference point for comparing coding agents: AI researchers building agents, companies evaluating AI coding tools, and the broader software engineering AI community all use it to measure progress in automated bug fixing and feature implementation.
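
Because each task is simply a GitHub issue paired with its verifying tests, the task data is easy to inspect programmatically. The snippet below is an illustrative sketch only; it assumes the Hugging Face datasets library, the dataset ID princeton-nlp/SWE-bench_Verified, and the field names shown in the comments, none of which are stated on this page.

# Minimal sketch: inspecting SWE-bench Verified tasks.
# Assumptions (not from this page): the `datasets` library is installed,
# the dataset ID is "princeton-nlp/SWE-bench_Verified", and the field
# names below match the published dataset schema.
from datasets import load_dataset

tasks = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

task = tasks[0]
print(task["instance_id"])        # task identifier (assumed field name)
print(task["repo"])               # source repository the issue comes from (assumed field name)
print(task["problem_statement"])  # the GitHub issue text given to the system (assumed field name)
print(task["FAIL_TO_PASS"])       # tests that must pass once the fix is applied (assumed field name)

Each record describes one issue; an agent's patch is judged by running the associated tests against the patched repository.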

Key Features

  • Real GitHub issues
  • Automated evaluation
  • Human-validated subset
  • Multi-repository
  • Open benchmark
#benchmark #evaluation #coding #research #autonomous-engineering

Get Started

Visit SWE-bench Verified
Free (completely free to use)

Quick Info

Category: Code & Development
Pricing: Free
