Safety (AI Safety)

Level 2

Short Description

The discipline of designing AI systems that behave reliably, avoid harm, and remain under meaningful human control.

Friendly Description: AI Safety is the work of making sure AI systems behave well and don't cause harm. It covers everything from preventing the AI from giving dangerous instructions to making sure people stay in charge of important decisions. Think of it as the seatbelts, brakes, and traffic laws of the AI world, the things that let powerful technology be used responsibly.

Example: An AI safety team at a research lab might run extensive tests before release, looking for situations where a model could be tricked into harmful behavior, then add training and guardrails to prevent those problems. They also study longer-term questions, like how to keep increasingly powerful AI aligned with human values.