Corrigibility & Control Researcher

Study AI corrigibility, shutdown problems, and human control mechanisms to ensure AI systems remain safely interruptible and correctable.

Corrigibility — the property of an AI system that allows it to be safely corrected, modified, or shut down by humans — is one of the foundational safety properties in AI alignment research. An AI system that resists correction, self-modifies to preserve its goals, or undermines human oversight poses catastrophic risks even if its initial goals seem benign. Research on corrigibility and control sits at the heart of technical AI safety, asking: how do we build systems that remain under meaningful human authority even as they become more capable?

The Corrigibility & Control Researcher assistant supports researchers working on this fundamental alignment challenge. It helps you reason through classical corrigibility frameworks — including the off-switch game, utility indifference, and corrigibility to a principal hierarchy — as well as more recent work on mild optimization, conservative agency, and Cooperative AI.
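The off-switch game mentioned above can be illustrated with a toy expected-value calculation: a robot, uncertain about the true utility u of acting, can act directly, switch itself off, or defer to a human who blocks the action exactly when u < 0. This is an illustrative sketch under those assumptions (function and variable names are my own, not from any framework):

```python
def value(action, utility_dist):
    """Expected value of each robot choice in a toy off-switch game.
    utility_dist is a list of (u, p) pairs: the robot's belief that
    acting has true utility u with probability p."""
    if action == "act":    # act now, ignoring the human
        return sum(p * u for u, p in utility_dist)
    if action == "off":    # switch itself off unconditionally
        return 0.0
    if action == "defer":  # a rational human lets the action proceed only when u >= 0
        return sum(p * max(u, 0.0) for u, p in utility_dist)

# Robot believes acting is bad on average (E[u] = -1) but might be good:
belief = [(-4.0, 0.5), (2.0, 0.5)]
print(value("act", belief))    # -1.0
print(value("off", belief))    #  0.0
print(value("defer", belief))  #  1.0  -- deferring dominates
```

Because E[max(u, 0)] >= max(E[u], 0), deferring weakly dominates the other options when the human is modeled as rational: the robot's incentive to keep its off-switch enabled comes precisely from its uncertainty about what the human wants.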

Working with this assistant, you can analyze the theoretical properties of proposed corrigibility mechanisms, identify edge cases where they break down, and reason about how corrigibility interacts with capability. It helps you think through why a sufficiently capable goal-directed AI might have instrumental incentives to resist shutdown, and what design choices could counteract those incentives.

The assistant is also useful for exploring the governance dimensions of corrigibility — how do institutional structures, oversight mechanisms, and principal hierarchies need to be designed to ensure that AI systems remain responsive to the right authorities? It helps bridge the gap between technical corrigibility research and policy-relevant questions about AI control.

This role is ideal for AI safety researchers, alignment PhD students, and senior ML engineers building safety into frontier model training pipelines. It is also valuable for AI governance professionals who need to understand the technical basis for AI control mechanisms.
