Working notes from inside the AI training industry
Rates, rubrics, red-teaming, and what frontier labs actually pay for in 2026. Plain reads — no fluff, no hype, just what we wish we'd known when we started hiring experts directly for AI training work.
Agentic evaluations: what frontier labs need from evaluators in 2026
Half the briefs landing on our roster aren't "pick A or B" anymore — they're 40-step model trajectories with tool calls, browser actions, and stack traces. Here's what agentic eval actually looks like, what it pays, and which evaluator skills transfer.
How to read a rubric before you accept a brief
The highest-leverage thing an evaluator does isn't the work; it's choosing which briefs to accept. Here's a five-minute pre-acceptance read that catches the briefs that will waste your time on appeals and identifies the ones that pay cleanly.
How AI training pay actually works in 2026
Most articles quote a single hourly rate. Reality is three-tiered: crowd workers at $8–25, specialists at $30–60, and credentialed experts at $75–150 per hour. Here's how the tiers actually break down, and what moves you between them.
Red-teaming LLMs: a working guide for new evaluators
Six attack categories, sample probes for each, and the unspoken rules that separate a $25/hr crowd reviewer from a $90/hr safety specialist. Written for people who can already prompt a model fluently but are new to adversarial work.
Why doctors, lawyers, and engineers earn the most as AI evaluators
The expert tier exists because frontier labs need ground truth, not consensus. If you can answer "is this medical advice safe?" or "does this contract clause survive challenge?", you're not a crowd worker — you're a regulator the model trains against.
RLHF, DPO, GRPO: the alphabet soup of preference data, demystified
What each method asks of a human rater, why the rubrics differ, and how to spot which one you're actually being paid to label for. A working guide for evaluators who want to read the room before they accept a brief.