// 3 challenges · Evals & Testing · hard
Evals & Testing
hardWrite evals for a summarisation model
Design and implement a rigorous evaluation suite for an AI summarisation system. Your evals should catch common failure modes.
~8–16 h·Any language
View →Evals & Testing
hardWrite evals for prompt injection defense
Design evals that detect prompt injection attempts in retrieved context and verify safe behavior from a RAG or tool-using assistant.