// Challenge library

Challenges

Complete a challenge to add a verified artifact to your public profile. Build in any language — evaluated on behaviour, not syntax.

Categories: RAG Pipelines · AI Agents · MCP Servers · Coding Agents · Evals & Testing · AI Tool Proficiency

Difficulty: easy · medium · hard

// 3 challenges · Evals & Testing · hard

Evals & Testing

Write evals for a summarisation model

Design and implement a rigorous evaluation suite for an AI summarisation system. Your evals should catch common failure modes.

~8–16 h·Any language

Evals & Testing

Write evals for prompt injection defense

Design evals that detect prompt injection attempts in retrieved context and verify safe behavior from a RAG or tool-using assistant.

~8–16 h·Any language

Evals & Testing

Write evals for RAG answer faithfulness

Design evals that detect unsupported, contradicted, and partially supported answers from a RAG system.

~8–16 h·Any language

© 2026 TryCrucible