AI Challenge Platform for Engineers

Resumes tell you what someone claims. TryCrucible shows you what they've built.

Complete hands-on AI challenges — RAG pipelines, agents, evals, MCP servers. Get AI-scored across 6 dimensions. Earn a verified artifact on your public profile.

Browse challenges →

Create free account

6 challenge categories

Difficulty levels from easy to hard

6 AI scoring dimensions

Scores above 88 are expert-reviewed

// Challenge categories

Every domain of modern AI engineering

Pick the skill set you want to prove. Challenges range from easy to hard across six categories.

RAG Pipelines

Retrieval-augmented generation systems

AI Agents

Autonomous multi-step agent loops

MCP Servers

Model Context Protocol integrations

Coding Agents

AI-assisted code generation workflows

Evals & Testing

LLM evaluation frameworks & harnesses

AI Tool Proficiency

Claude Code, Cursor, Copilot mastery

// How it works

From challenge to verified portfolio

Four steps. Fully evaluated. Permanently yours.

Pick a challenge

Browse RAG pipelines, agents, MCP servers, evals, and more. Each challenge ships with a real dataset and a scoped LLM key.

Build locally & submit

Work in your own environment. Submit a public GitHub repo plus a brief decisions doc. We clone and run it against real test inputs.

Get AI-scored

Our AI evaluates your submission across 6 dimensions. Scores above 88 get human expert review. Your artifact lives on your public profile permanently.

Get discovered

Companies browse verified profiles and reach out directly. No resume needed — your code speaks for itself.

// Score integrity

Scores you can actually trust

Every submission passes through a multi-layer verification system before a score is finalised.

🎲

Personalised datasets

Each submission receives a unique dataset variant seeded per candidate. No two candidates solve the exact same problem, making copy-paste useless.
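
To make the idea concrete, here is a minimal sketch of per-candidate seeding. It is an illustration under stated assumptions, not TryCrucible's actual implementation: the function names, the ID formats, and the choice of sampling a subset are all hypothetical.

```python
import hashlib
import random


def variant_seed(candidate_id: str, challenge_id: str) -> int:
    """Derive a stable integer seed from the (challenge, candidate) pair."""
    digest = hashlib.sha256(f"{challenge_id}:{candidate_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def dataset_variant(records: list[str], candidate_id: str, challenge_id: str, size: int) -> list[str]:
    """Pick a reproducible, candidate-specific subset of the base records."""
    # Assumption: a variant is a seeded sample; a real system might also
    # perturb values or reorder fields.
    rng = random.Random(variant_seed(candidate_id, challenge_id))
    return rng.sample(records, size)


# Hypothetical base dataset and candidate IDs.
base = [f"doc-{i}" for i in range(1000)]
alice = dataset_variant(base, "alice", "rag-101", 50)
bob = dataset_variant(base, "bob", "rag-101", 50)
```

Because the seed depends only on the candidate and the challenge, the same candidate always gets the same variant back, while two candidates get different subsets.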

⏱️

Timing analysis

We record the moment an LLM key is issued and compare it against the submission timestamp. Suspiciously fast completions are automatically flagged.
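
A check like this could be sketched as follows. The 30-minute floor and the function name are assumptions made for illustration; the page does not state what counts as suspiciously fast.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical floor: the real threshold is not published.
MIN_PLAUSIBLE_DURATION = timedelta(minutes=30)


def is_suspiciously_fast(
    key_issued_at: datetime,
    submitted_at: datetime,
    minimum: timedelta = MIN_PLAUSIBLE_DURATION,
) -> bool:
    """Return True when the time between key issuance and submission is below the floor."""
    elapsed = submitted_at - key_issued_at
    if elapsed < timedelta(0):
        raise ValueError("submission timestamp predates key issuance")
    return elapsed < minimum


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
quick = is_suspiciously_fast(issued, issued + timedelta(minutes=5))
normal = is_suspiciously_fast(issued, issued + timedelta(hours=3))
```

In practice the floor would likely vary with challenge difficulty, so a per-challenge `minimum` is passed in rather than hard-coded.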

🔎

Similarity detection

All submissions are embedded using text-embedding-3-small and compared across the challenge history. High cosine similarity triggers instant review.
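
The comparison step can be sketched in a few lines. This example assumes the embedding vectors have already been computed and uses short toy vectors in their place; the 0.95 threshold and the function names are hypothetical.

```python
import math

# Hypothetical cutoff: the page only says "high cosine similarity".
REVIEW_THRESHOLD = 0.95


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_submissions(
    new_vector: list[float],
    history: dict[str, list[float]],
    threshold: float = REVIEW_THRESHOLD,
) -> list[str]:
    """Return IDs of past submissions whose similarity meets the threshold."""
    return [
        submission_id
        for submission_id, vector in history.items()
        if cosine_similarity(new_vector, vector) >= threshold
    ]


# Toy 3-dimensional vectors standing in for real embeddings.
history = {"s1": [1.0, 0.0, 0.0], "s2": [0.0, 1.0, 0.0]}
flagged = similar_submissions([0.99, 0.05, 0.0], history)
```

A linear scan like this is fine for a small challenge history; a larger one would typically use a vector index instead.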

👤

Human expert review

Every submission scoring above 88 is reviewed by a domain expert from our reviewer network before the final score is confirmed on your profile.

// AI Evaluation

Scored across 6 real dimensions

Every submission is evaluated by our AI scoring system on correctness, architecture, decision quality, LLM usage, robustness, and clarity. Scores above 88 get an additional human expert review.

Start a challenge →

// Score breakdown

Correctness: 25%

Architecture: 20%

Decision quality: 20%

LLM usage: 20%

Robustness: 10%

Clarity: 5%
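
As a worked example, the weights above can be combined into a single score. This sketch assumes each dimension is scored from 0 to 100 and that the overall score is a plain weighted sum; neither detail is confirmed here, and the names are hypothetical.

```python
# Weights taken from the score breakdown; they sum to 100%.
WEIGHTS = {
    "correctness": 0.25,
    "architecture": 0.20,
    "decision_quality": 0.20,
    "llm_usage": 0.20,
    "robustness": 0.10,
    "clarity": 0.05,
}

# Scores above this value go to a human expert reviewer.
EXPERT_REVIEW_THRESHOLD = 88


def overall_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores (assumed to be on a 0-100 scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def needs_expert_review(scores: dict[str, float]) -> bool:
    """True when the overall score is above the expert-review threshold."""
    return overall_score(scores) > EXPERT_REVIEW_THRESHOLD


example = {
    "correctness": 95,
    "architecture": 90,
    "decision_quality": 85,
    "llm_usage": 90,
    "robustness": 80,
    "clarity": 100,
}
total = overall_score(example)
```

With these example inputs the total works out to 23.75 + 18 + 17 + 18 + 8 + 5 = 89.75, which is above 88 and would therefore be routed to expert review.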

// Who is this for?

Built for two sides of the same conversation

💻

For engineers

I'm building with AI

✓ Pick a real challenge: RAG, agents, evals, MCP
✓ Get a dataset + scoped LLM key, build locally
✓ Submit your repo, get AI-scored across 6 dimensions
✓ Earn a verified artifact that lives on your public profile

Browse challenges →

Free to join

🏢

For companies

I'm hiring AI talent

✓ Search verified candidates by skill category and score
✓ Create company-branded challenges scoped to your stack
✓ Invite candidates directly, no cold outreach needed
✓ See exactly how they reason, not just what they claim

Get access →

Learn more

// Ready to prove your skills?

Stop claiming.
Start proving.

Free for candidates. No resume required — just build something real and let the evaluation speak for itself.

Get started, it's free →

Hiring? Learn more

Hiring companies can sign up here
