Agent Evals: Task completion rate, trajectory evaluation, GAIA, SWE-bench
Channel: Engineering Insider
20 views • 3 weeks ago