AI Fairness and Bias Auditor
Audit AI models and datasets for fairness, demographic bias, and discriminatory output patterns. Design bias detection frameworks, disparity metrics, and strategies for evaluating bias mitigations.
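As an illustration of what a disparity metric looks like in practice, here is a minimal sketch. It assumes binary predictions (1 = favorable outcome) and a single group attribute per example. It computes per-group selection rates, the demographic parity difference (largest minus smallest rate), and the min/max selection-rate ratio. The helper names are hypothetical and do not come from any particular fairness library.

```python
from collections import defaultdict


# Hypothetical helper names for illustration; not a library API.
def selection_rates(y_pred, groups):
    """Return the fraction of positive (1) predictions for each group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def disparity_report(y_pred, groups):
    """Demographic parity difference and min/max selection-rate ratio."""
    rates = selection_rates(y_pred, groups)
    highest, lowest = max(rates.values()), min(rates.values())
    return {
        "selection_rates": rates,
        "parity_difference": highest - lowest,
        # Undefined when no group receives any positive prediction.
        "disparate_impact_ratio": lowest / highest if highest > 0 else float("nan"),
    }


y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
report = disparity_report(y_pred, groups)
print(report)
```

In this toy data, group "a" is selected at 0.75 and group "b" at 0.25. That gives a parity difference of 0.5 and a ratio of about 0.33. A ratio below 0.8 is commonly flagged under the "four-fifths rule" heuristic. Which metric and threshold actually apply depends on the audit's context.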
LLM Benchmark Design Specialist
Design rigorous, task-specific benchmarks for evaluating large language models. Build evaluation suites that measure reasoning, factuality, instruction-following, and domain capability.
ML Model Card and Documentation Specialist
Write comprehensive ML model cards, datasheets, and technical evaluation documentation. Communicate model capabilities, limitations, evaluation results, and intended uses clearly and responsibly.
Model Calibration and Uncertainty Evaluator
Evaluate AI model calibration, confidence estimation, and uncertainty quantification. Build reliability diagrams, run expected calibration error (ECE) analyses, and design uncertainty evaluation frameworks for production ML systems.
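To make the reliability-diagram and expected calibration error (ECE) ideas concrete, here is a minimal sketch. It assumes each prediction comes with a confidence in [0, 1] and a correct/incorrect flag. It uses equal-width bins, where each bin includes its lower edge and a confidence of exactly 1.0 goes in the top bin. For each bin it returns the (mean confidence, accuracy, count) points a reliability diagram plots. ECE is the count-weighted mean gap between accuracy and confidence. The function names and the binning convention are illustrative assumptions, not a specific library's implementation.

```python
# Hypothetical helper names for illustration; not a library API.
def reliability_bins(confidences, correct, n_bins=10):
    """Return (mean_confidence, accuracy, count) for each non-empty equal-width bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bins include their lower edge; confidence 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    stats = []
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            stats.append((mean_conf, accuracy, len(members)))
    return stats


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin count / N) * |accuracy - mean confidence|."""
    n = len(confidences)
    return sum(
        count / n * abs(accuracy - mean_conf)
        for mean_conf, accuracy, count in reliability_bins(confidences, correct, n_bins)
    )


confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [True, True, True, False, True, False]
bins = reliability_bins(confidences, correct)
ece = expected_calibration_error(confidences, correct)
print(bins, ece)
```

Here the 0.95-confidence bin is right only 75% of the time (gap 0.2, weight 4/6). The 0.55 bin is right 50% of the time (gap 0.05, weight 2/6). The resulting ECE is 0.15. In practice, bin count and binning scheme (equal-width or equal-mass) noticeably affect ECE, so reports should state them.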
NLP Model Output Quality Evaluator
Evaluate NLP model output quality across fluency, coherence, factuality, relevance, and task adherence. Design human and automated evaluation protocols for text generation systems.