
Achieving an 83.4% Fix Rate on SWE-bench Verified with Runtime Facts
In our latest SWE-bench Verified tests, we validated a new AI debugging paradigm: systematic debugging based on Runtime Facts. We added a dynamic tracing mechanism to the Live-SWE-agent architecture so that the model receives runtime context. With this setup, we achieved a theoretical combined fix rate of 83.4% using the Google Gemini 3 Pro model, the highest known performance on SWE-bench Verified to date.
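
The tracer itself is not described here. As a rough illustration of what "runtime facts" could look like for a Python repository, the following minimal sketch uses the standard library's `sys.settrace` to record function calls with their arguments, return values, and exceptions while a piece of code runs. This is not the Live-SWE-agent implementation. `collect_runtime_facts`, `parse_price`, and `total` are hypothetical names chosen for the example.

```python
import sys
from types import FrameType
from typing import Any, Callable


def collect_runtime_facts(func: Callable[..., Any], *args: Any, **kwargs: Any):
    """Run func under a tracer and record call/return/exception events.

    Hypothetical helper for illustration only; not the Live-SWE-agent API.
    Returns (result_or_exception, facts).
    """
    facts: list[dict[str, Any]] = []

    def tracer(frame: FrameType, event: str, arg: Any):
        name = frame.f_code.co_name
        if event == "call":
            # At call time, f_locals holds only the bound arguments.
            facts.append({"event": "call", "func": name,
                          "args": {k: repr(v) for k, v in frame.f_locals.items()}})
        elif event == "return":
            facts.append({"event": "return", "func": name, "value": repr(arg)})
        elif event == "exception":
            exc_type, exc_value, _ = arg
            facts.append({"event": "exception", "func": name,
                          "error": f"{exc_type.__name__}: {exc_value}"})
        # Returning the tracer keeps local tracing on, so return/exception events fire.
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    except Exception as exc:
        # Keep the exception as the result so the collected facts are not lost.
        result = exc
    finally:
        sys.settrace(previous)
    return result, facts


# Hypothetical buggy code under test: a comma decimal separator breaks float().
def parse_price(text: str) -> float:
    return float(text.strip("$"))


def total(prices: list[str]) -> float:
    acc = 0.0
    for p in prices:
        acc += parse_price(p)
    return acc


if __name__ == "__main__":
    result, facts = collect_runtime_facts(total, ["$1.50", "2,00"])
    for fact in facts:
        print(fact)
```

Running this shows the second `parse_price` call receiving `'2,00'` and raising a `ValueError`, a concrete runtime fact a model could use instead of guessing from the source alone. A real agent would need to limit tracing to project code and cap the volume of recorded events; this sketch does neither.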









