AI Challenge Platform for Engineers

Resumes tell you what
someone claims. TryCrucible
shows you what they've built.

Complete hands-on AI challenges — RAG pipelines, agents, evals, MCP servers. Get AI-scored across 6 dimensions. Earn a verified artifact on your public profile.

6 Challenge categories
3 Difficulty levels
6 AI scoring dimensions
88+ Expert-reviewed scores
// How it works

From challenge to verified portfolio

Four steps. Fully evaluated. Permanently yours.

01

Pick a challenge

Browse RAG pipelines, agents, MCP servers, evals, and more. Each challenge ships with a real dataset and a scoped LLM key.

02

Build locally & submit

Work in your own environment. Submit a public GitHub repo plus a brief decisions doc. We clone and run it against real test inputs.

03

Get AI-scored

Our AI evaluates across 6 dimensions. Scores above 88 get human expert review. Your artifact lives on your public profile permanently.

04

Get discovered

Companies browse verified profiles and reach out directly. No resume needed — your code speaks for itself.

// Score integrity

Scores you can actually trust

Every submission passes through a multi-layer verification system before a score is finalised.

🎲

Personalised datasets

Each candidate receives a unique dataset variant generated from a per-candidate seed. No two candidates solve exactly the same problem, so copying another submission does not help.
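
As an illustration only, per-candidate seeding could look something like the sketch below. The function names, the identifier format, and the sampling strategy are assumptions made for this example, not a description of how TryCrucible generates its variants.

```python
import hashlib
import random


def dataset_seed(candidate_id: str, challenge_id: str) -> int:
    """Derive a stable integer seed from two (hypothetical) identifiers."""
    key = f"{challenge_id}:{candidate_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as a 64-bit seed.
    return int.from_bytes(digest[:8], "big")


def dataset_variant(records: list[str], candidate_id: str,
                    challenge_id: str, size: int) -> list[str]:
    """Pick a reproducible subset of records for one candidate."""
    # A dedicated Random instance keeps the global RNG state untouched.
    rng = random.Random(dataset_seed(candidate_id, challenge_id))
    return rng.sample(records, size)


if __name__ == "__main__":
    pool = [f"doc-{i}" for i in range(100)]
    print(dataset_variant(pool, "candidate-a", "rag-basics", 5))
```

The same candidate and challenge always map to the same variant, so a submission can be re-run later against the exact data it was built for.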

⏱️

Timing analysis

We record the moment an LLM key is issued and compare it against the submission timestamp. Suspiciously fast completions are automatically flagged.
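
A minimal sketch of such a check, assuming both timestamps are stored as timezone-aware datetimes. The 30-minute minimum is a made-up placeholder; the page does not state what counts as "suspiciously fast".

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold, chosen only for this example.
MIN_EXPECTED_DURATION = timedelta(minutes=30)


def is_suspiciously_fast(key_issued_at: datetime,
                         submitted_at: datetime,
                         min_duration: timedelta = MIN_EXPECTED_DURATION) -> bool:
    """Return True when the submission arrived sooner than min_duration
    after the LLM key was issued."""
    elapsed = submitted_at - key_issued_at
    return elapsed < min_duration


if __name__ == "__main__":
    issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    submitted = issued + timedelta(minutes=10)
    print(is_suspiciously_fast(issued, submitted))
```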

🔎

Similarity detection

All submissions are embedded using text-embedding-3-small and compared against earlier submissions in the challenge's history. High cosine similarity triggers an immediate review.
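
For readers unfamiliar with the comparison step, here is a rough sketch of cosine similarity over embedding vectors. It assumes the vectors have already been produced by the embedding model; the 0.95 threshold and the function names are illustrative assumptions, since the page does not say what counts as "high".

```python
import math

# Hypothetical cutoff, used only for this example.
REVIEW_THRESHOLD = 0.95


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def needs_review(new_vec: list[float], history: list[list[float]],
                 threshold: float = REVIEW_THRESHOLD) -> bool:
    """Flag a submission whose embedding is close to any earlier one."""
    return any(cosine_similarity(new_vec, old) >= threshold for old in history)


if __name__ == "__main__":
    earlier = [[0.1, 0.9, 0.2], [0.8, 0.1, 0.1]]
    print(needs_review([0.1, 0.9, 0.25], earlier))
```

A linear scan like this is fine for a small history; a larger one would more likely use a vector index.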

👤

Human expert review

Every submission scoring above 88 is reviewed by a domain expert from our reviewer network before the final score is confirmed on your profile.

// AI Evaluation

Scored across 6 real dimensions

Every submission is evaluated by our AI scoring system on correctness, architecture, decision quality, LLM usage, robustness, and clarity. Scores above 88 get an additional human expert review.

Start a challenge →

// Score breakdown

Correctness: 25%
Architecture: 20%
Decision quality: 20%
LLM usage: 20%
Robustness: 10%
Clarity: 5%
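
To make the weighting concrete, the sketch below combines six dimension scores into one overall score using the percentages above. It assumes each dimension is scored from 0 to 100 and that the overall score is a plain weighted average; the page states neither, so treat both as assumptions. The review rule follows the "above 88" wording used elsewhere on this page.

```python
# Weights in percent, taken from the score breakdown above.
WEIGHTS = {
    "correctness": 25,
    "architecture": 20,
    "decision_quality": 20,
    "llm_usage": 20,
    "robustness": 10,
    "clarity": 5,
}

EXPERT_REVIEW_THRESHOLD = 88


def overall_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (assumed to be 0-100)."""
    if set(dimension_scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the six dimensions")
    total = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return total / 100


def needs_expert_review(score: float) -> bool:
    """Scores strictly above the threshold go to a human reviewer."""
    return score > EXPERT_REVIEW_THRESHOLD


if __name__ == "__main__":
    scores = {
        "correctness": 95,
        "architecture": 90,
        "decision_quality": 85,
        "llm_usage": 90,
        "robustness": 80,
        "clarity": 70,
    }
    total = overall_score(scores)
    print(total, needs_expert_review(total))
```

Because correctness carries a quarter of the weight, a ten-point gain there moves the overall score five times as much as the same gain in clarity.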
// Public leaderboard

See who's top-ranked

The public leaderboard shows top scores per category. Earn your place and get noticed by hiring teams browsing verified talent.

View leaderboard →
// Who is this for?

Built for two sides of the same conversation

💻

For engineers

I'm building with AI

  • Pick a real challenge — RAG, agents, evals, MCP
  • Get a dataset + scoped LLM key, build locally
  • Submit your repo, get AI-scored across 6 dimensions
  • Earn a verified artifact that lives on your public profile
🏢

For companies

I'm hiring AI talent

  • Search verified candidates by skill category and score
  • Create company-branded challenges scoped to your stack
  • Invite candidates directly — no cold outreach needed
  • See exactly how they reason, not just what they claim
// Ready to prove your skills?

Stop claiming.
Start proving.

Free for candidates. No resume required — just build something real and let the evaluation speak for itself.

Hiring companies can sign up here