The work we actually publish
Fields are the categories of paid AI training briefs that OBG publishes, and pays for, each week. They tell you what kind of work to expect from us. Fields are different from expertise, which is the set of skills you bring.
LLM evaluation
Score model outputs against rubrics covering accuracy, helpfulness, safety, and hallucination. Often an A/B comparison between two responses; occasionally a written critique.
Red-teaming & safety
Probe for harmful, deceptive, or policy-breaking model outputs. Document repro steps. Senior briefs include adversarial prompting and jailbreak research.
Code review & preference
Rate which of two code completions is better. Author reference solutions in 30+ languages. Some briefs require security-focused review or polyglot debugging.
Multimodal annotation
Caption images, describe video clips, label document-AI extractions, evaluate OCR. Sometimes paired with audio for cross-modal grounding.
RLHF preference rating
Compare model outputs head-to-head and explain which is preferable, why, and where the loser fails. Your ratings are used to train reward models.
Reasoning evaluation
Verify multi-step solutions in math, physics, logic, or strategy. Author step-by-step traces. Olympiad-tier briefs pay at the top of the band.
Specialised translation & localisation
Native-fluency translation, MT post-editing, terminology alignment. Premium rates for legal, medical, and low-resource language pairs.
Domain reasoning (regulated)
Medical, legal, and finance briefs requiring credential verification (MD, JD, CFA, etc.). Clinical accuracy review, statute interpretation, model-output critique against industry standards.