Fields

The work we actually publish

Fields are the categories of paid AI training briefs that OBG publishes, and pays for, each week. They tell you what kind of work to expect from us. A field is different from expertise, which is the set of skills you bring.

LLM evaluation

Score model outputs against rubrics covering accuracy, helpfulness, safety, and hallucination. Often an A/B comparison between two responses; occasionally a written critique.

Typical: 2–10h briefs · $32–$58 / hr · Conversational English required

Red-teaming & safety

Probe for harmful, deceptive, or policy-breaking model outputs. Document repro steps. Senior briefs include adversarial prompting and jailbreak research.

Typical: 4–20h briefs · $48–$95 / hr · Security or policy background helps

Code review & preference

Rate which of two code completions is better. Author reference solutions in 30+ languages. Some briefs require security-focused review or polyglot debugging.

Typical: 3–15h briefs · $40–$78 / hr · Production experience in at least one language

Multimodal annotation

Caption images, describe video clips, label document-AI extractions, evaluate OCR. Sometimes paired with audio for cross-modal grounding.

Typical: 4–18h briefs · $22–$42 / hr · Sharp visual attention required

RLHF preference rating

Compare model outputs head-to-head and explain which is preferable, why, and where the loser fails. Used to fine-tune reward models.

Typical: 6–25h briefs · $35–$62 / hr · Clear written reasoning required

Reasoning evaluation

Verify multi-step solutions in math, physics, logic, or strategy. Author step-by-step traces. Olympiad-tier briefs pay top-of-band.

Typical: 4–22h briefs · $45–$82 / hr · STEM degree or equivalent

Specialised translation & localisation

Native-fluency translation, MT post-editing, terminology alignment. Premium rates for legal, medical, and low-resource language pairs.

Typical: 2–12h briefs · $0.10–$0.22 / word · Native or near-native fluency in the target language

Domain reasoning (regulated)

Medical, legal, and finance briefs requiring credential verification (MD, JD, CFA, etc.). Clinical accuracy review, statute interpretation, model-output critique against industry standards.

Typical: 4–30h briefs · $55–$110 / hr · Verifiable credential required

Don't see the field that fits you?

We open new field rubrics every month — voice, robotics, agentic evaluation, and more. Apply with your expertise and we'll match you to the closest fit while new briefs come online.
