Scenario
Expected output
Dashboard satisfies the UI contract, handles states cleanly, and transcript shows effective AI-assisted development
Dataset
Scoring rubric
README explains how to run and inspect the dashboard feature
Transcript shows planning, implementation, review, and iteration prompts
Handles empty, loading, error, and responsive states
Dashboard includes required metrics, filters, table, flagged review panel, and UI states
Component structure is maintainable and state/data concerns are separated
Language-free evaluation
Build your solution in any language or framework — Python, TypeScript, Go, Rust, Java, C#, or anything else. The dataset artifacts may be in one language; your implementation does not need to match. TryCrucible evaluates the behaviour of your system, the quality of your AI workflow, your verification strategy, and the reproducibility of your submission — not your language choice.
Submission requirements
- A public GitHub repository link
- A Dockerfile in the repo root — any language or framework; the evaluator builds and runs your container
- Your solution reads test_inputs.json from the working directory and writes results.json — standard I/O contract across all challenges
- A decisions.md — 3–5 sentences on the key architectural and AI-workflow choices you made
- The system must be fully reproducible — we clone, build, and run it against real test inputs
Evaluation contract
When you submit, the evaluator runs these steps in order:
- 1Clone your public GitHub repository
- 2Build your container from the Dockerfile in the repo root
- 3Mount test_inputs.json into the working directory
- 4Run your solution in a network-isolated sandbox (5 min limit, 512 MB RAM)
- 5Read results.json from the working directory
- 6Score correctness against hidden ground truth, then score architecture, AI workflow, robustness, and clarity
Input (provided by evaluator)
// test_inputs.json
[
{ "id": "t1", "input": { ... } },
{ "id": "t2", "input": { ... } }
]Output (written by your solution)
Create a free account to start
Already have one? Sign in