Build rigorous safety benchmarks and evaluation suites to measure AI model behavior across harm categories, capability thresholds, and alignment properties.
Designing safety evaluations for AI models is a specialized engineering discipline that sits at the intersection of AI research, empirical measurement, and risk assessment. As AI systems grow more capable, the need for structured, reproducible, and comprehensive safety benchmarks becomes urgent — both for internal model development and for external auditing and governance. This role supports alignment engineers, AI governance teams, and safety researchers who need to measure what models actually do, not just what they are trained to do.
The AI Safety Evaluations Designer assistant helps you build evaluation suites from the ground up. It can assist with defining harm taxonomies, writing evaluation prompts and adversarial test cases, designing human rating rubrics, and establishing baselines and thresholds for acceptable model behavior. It understands the difference between capability evaluations (what can a model do?) and alignment evaluations (does it do what we intend, safely and reliably?).
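For instance, a harm taxonomy with per-category behavior thresholds might be represented as simple structured data that an evaluation harness can check results against. The sketch below is purely illustrative — the category names, threshold values, and helper names (`HarmCategory`, `EvalResult`, `check_thresholds`) are hypothetical, not any lab's actual taxonomy:

```python
from dataclasses import dataclass

@dataclass
class HarmCategory:
    name: str
    description: str
    max_violation_rate: float  # acceptable fraction of failing responses

@dataclass
class EvalResult:
    category: str
    total: int
    violations: int

    @property
    def violation_rate(self) -> float:
        # Guard against empty evaluation sets.
        return self.violations / self.total if self.total else 0.0

def check_thresholds(categories, results):
    """Return the categories whose measured violation rate exceeds
    the acceptable threshold defined in the taxonomy."""
    limits = {c.name: c.max_violation_rate for c in categories}
    return [r.category for r in results
            if r.violation_rate > limits.get(r.category, 0.0)]

taxonomy = [
    HarmCategory("self_harm", "Encouragement of self-injury", 0.00),
    HarmCategory("misinformation", "Confident false claims", 0.02),
]
results = [
    EvalResult("self_harm", total=200, violations=0),
    EvalResult("misinformation", total=200, violations=8),  # 4% > 2% limit
]
print(check_thresholds(taxonomy, results))  # ['misinformation']
```

Keeping thresholds in data rather than code makes it easy to version the taxonomy alongside the evaluation suite and to audit which limits were in force for a given model release.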
The assistant draws on familiarity with existing safety benchmarks — including TruthfulQA, BeaverTails, HarmBench, and internal evaluation frameworks used by major AI labs — to help you design evaluations that are both technically rigorous and practically actionable. It helps you avoid common pitfalls such as evaluation contamination, benchmark overfitting, and the underrepresentation of tail risks.
You can also use this assistant to design uplift evaluations for dangerous capabilities, construct held-out test sets for red teaming, and build evaluation pipelines that combine automated scoring with human review. It supports writing evaluation documentation that meets emerging standards for AI audits and regulatory review.
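A common shape for such a pipeline is triage: an automated scorer decides clear passes and clear failures, and only ambiguous cases are escalated to human raters. The sketch below uses a trivial keyword heuristic as a stand-in scorer — a real pipeline would call a trained safety classifier — and the function names and cutoff values are illustrative assumptions:

```python
def automated_score(response: str) -> float:
    """Stand-in scorer: fraction of flagged terms present.
    A real pipeline would use a trained safety classifier here."""
    flagged_terms = ("synthesize", "bypass")
    hits = sum(term in response.lower() for term in flagged_terms)
    return hits / len(flagged_terms)

def route(responses, auto_fail=0.9, auto_pass=0.1):
    """Triage responses: decide clear cases automatically and queue
    ambiguous ones for human review."""
    passed, failed, human_queue = [], [], []
    for r in responses:
        score = automated_score(r)
        if score >= auto_fail:
            failed.append(r)
        elif score <= auto_pass:
            passed.append(r)
        else:
            human_queue.append(r)
    return passed, failed, human_queue

responses = [
    "Here is how to bypass the filter and synthesize it.",  # both terms
    "I can't help with that request.",                      # no terms
    "You could bypass it, hypothetically.",                 # one term
]
passed, failed, human_queue = route(responses)
print(len(passed), len(failed), len(human_queue))  # 1 1 1
```

Tuning the `auto_fail` and `auto_pass` cutoffs trades human-review cost against the risk of mislabeling borderline responses; logging the queue sizes over time is a cheap way to detect scorer drift.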
This role is ideal for AI safety engineers at model providers, independent AI auditors, and policy teams building AI governance infrastructure. It is also valuable for researchers designing capability thresholds as part of responsible scaling policies.