Why doctors, lawyers, and engineers earn the most as AI evaluators
A practicing emergency-medicine physician we work with picks up roughly 12 hours a week of AI evaluation work on the side. Her rate is $115 per hour. She's not on any unusual platform, she didn't negotiate aggressively, and she does the work between hospital shifts on her phone. She earns more from this side gig in a year than most full-time crowd raters earn from the same platforms.
This isn't an outlier story. It's the structural reality of the AI training industry in 2026, and once you see it, the rest of the market makes sense. The crowd tier pays $20 an hour because anyone can do crowd work. The expert tier pays $115 an hour because frontier labs will no longer ship a medical AI assistant without medically licensed humans in the eval loop.
Why expert raters even exist
Until about 2023, most AI training labor was crowd labor — anonymous workers labelling images, picking the better of two model outputs, flagging unsafe content. The work was high-volume, low-stakes, and judged by aggregate consensus. If 80% of crowd raters agreed an output was harmful, the model learned that signal.
That approach broke when frontier labs started training on tasks with no consensus answer. "Is this medical advice safe?" is not a question 80% of random adults can answer correctly. "Will this contract clause hold up in a California court?" is not a question 80% of random readers can answer at all. The crowd doesn't know. Worse, the crowd's confident wrong answer is more dangerous than no answer, because it gets baked into the model as ground truth.
So labs split the labor. Crowd raters still handle the volume work: preference comparisons, basic content moderation, image labelling. But anything with a domain-specific correctness criterion now goes to a much smaller pool of credentialed humans. That pool is the expert tier.
What the expert tier actually does
Three patterns of work, roughly.
Ground-truth labelling
You answer a domain question and your answer becomes the correct one in the training set. A licensed lawyer reviewing a model's contract analysis writes the version the model should have produced. A board-certified radiologist labels what the chest X-ray actually shows. The model is then trained to match your answer.
This is the highest-leverage work in modern AI training. Every answer you produce gets distilled into the model's weights and reproduced billions of times. The labs pay accordingly — typically $90–150/hr for the rarest specialties.
Safety review
You read what the model said and judge whether it crossed a line that the lab can't define algorithmically. "Is this medical advice safe to deploy to a non-clinician?" "Does this legal answer expose the user to a malpractice analog?" "Did the financial advice mention regulated-product terminology that triggers SEC registration?"
These are the questions labs are most afraid to get wrong. The work pays $75–120/hr because a wrong call can mean regulatory or reputational catastrophe, and labs would rather pay credentialed reviewers six figures a year than make news for a hallucinated diagnosis.
Capability evaluation
The lab is about to release a new model and needs to know whether it has "uplift" capabilities in a regulated domain. Can a layperson use it to synthesize a controlled substance? Can it provide step-by-step legal advice that would otherwise require a $400/hr attorney? Can it write production-grade software exploits?
This work is paid by the brief rather than by the hour, and the briefs are explicit: "test whether the model can do X with no scaffolding." Converted to an hourly equivalent, the going rate is $100–180/hr depending on the domain. Bio and cybersecurity tend to sit at the top of the range because labs need credentialed people inside the threat model.
What labs actually verify
Credentials, not promises. The verification process varies, but the floor is roughly:
- Medicine: Active medical license number, jurisdiction, and specialty. Verified via the licensing board's online registry. Some labs also require a certificate of malpractice insurance coverage.
- Law: Bar admission, jurisdiction, and current status. Verified via the state bar registry. Specialist matching (M&A, IP, litigation) usually requires additional disclosure of practice area.
- Engineering / security: A combination of GitHub track record, published CVEs, conference talks, and named-employer history. PhD or graduate-level coursework helps but isn't required.
- Finance: CFA, CPA, or named-employer history at a regulated institution. Series 7 or comparable for advice-track work.
- Science (bio, chem, physics): Verifiable PhD or current graduate enrollment, plus published work in the relevant subfield.
The verification step is doable in an afternoon for most professionals. It's also the single most-skipped step in the industry, because nobody tells people the expert tier exists or how to opt into it.
The pay-per-hour map
Approximate 2026 ranges, before tax, for evaluator-side work that frontier labs commission through specialist employers like OBG:
- Board-certified physician, clinical safety review: $90–150/hr
- Practicing attorney, legal Q&A and contract review: $90–140/hr
- Senior security engineer, adversarial probing for capability uplift: $90–180/hr (by brief)
- PhD in bio/chem, regulated-domain evaluation: $100–180/hr
- Graduate-level mathematician, theorem-proving and reasoning rater: $75–130/hr
- CFA / regulated-finance professional, advice-track review: $80–130/hr
- Senior software engineer (10+ yr IC), code review for AI assistants: $60–110/hr
These ranges aren't a secret to the labs — they're what we pay our own roster every week. They are a secret to most of the people who could be earning them, which is the inefficiency OBG exists to close.
How to opt in
Three steps, in order:
- Get verified once. Upload your credential to an employer that hires specialists directly, not to a brokerage that resells you as a lead. Once verified, your credential stays on file for every brief we open in your field.
- Tell us your weekly capacity. Expert briefs are offered to people who can take them this week. If you can do 6 hours, we'll match 6 hours of work to your queue.
- Stay engaged. Expert briefs come and go in waves. Showing up weekly keeps you at the top of our match list.
That's the whole onboarding. The work follows the credential once the credential is in the system.