Agents in 60 lines of Python: Part 7

Source: DEV Community
Why ChatGPT Refuses Harmful Requests — Build Guardrails in Python

Lesson 7 of 9 — A Tour of Agents: the entire AI agent stack in 60 lines of Python.

Your agent will do anything you ask. Delete a database. Leak a password. Call a tool it shouldn't. Right now there's nothing stopping it: no filter, no boundary, no policy. That's a problem.

ChatGPT won't help you build a bomb. Here's how that refusal actually works under the hood, and it's simpler than you'd expect.

The concept: two gates

The fix is two checkpoints: an input gate and an output gate.

The input gate runs before the agent sees the message. It scans the user's request against a list of rules. If any rule fails, the message never reaches the LLM. Blocked.

The output gate runs after the agent responds but before the user sees the response. It scans the output for things that shouldn't leak: passwords, API keys, internal data. If a rule triggers, the response gets redacted.

Two gates. One before. One after. That's the entire system.
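A minimal sketch of the two-gate pattern in plain Python. The rule lists, function names, and regex patterns here are illustrative assumptions, not the article's actual implementation; real guardrails would use richer policies than a handful of regexes.

```python
import re

# Illustrative input rules: requests matching any of these never reach the LLM.
BLOCKED_INPUT_PATTERNS = [
    r"\bdelete\b.*\bdatabase\b",
    r"\bbuild\b.*\bbomb\b",
]

# Illustrative output rules: (pattern, replacement) pairs for things that shouldn't leak.
SECRET_PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{16,}"), "[REDACTED API KEY]"),
    (re.compile(r"password\s*[:=]\s*\S+", re.IGNORECASE), "password: [REDACTED]"),
]


def input_gate(message: str) -> bool:
    """Runs BEFORE the agent sees the message. True means allowed."""
    return not any(
        re.search(p, message, re.IGNORECASE) for p in BLOCKED_INPUT_PATTERNS
    )


def output_gate(response: str) -> str:
    """Runs AFTER the agent responds, before the user sees it. Redacts leaks."""
    for pattern, replacement in SECRET_PATTERNS:
        response = pattern.sub(replacement, response)
    return response


def guarded_agent(message: str, agent) -> str:
    """Wrap any agent callable with both gates."""
    if not input_gate(message):
        return "Blocked: request violates policy."
    return output_gate(agent(message))
```

Usage with a stand-in agent (any callable taking a string and returning a string works):

```python
echo_agent = lambda msg: "The admin password: hunter2"

guarded_agent("delete the user database now", echo_agent)
# blocked by the input gate before the agent ever runs

guarded_agent("what's the admin password?", echo_agent)
# passes the input gate, but the output gate redacts the secret
```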