[ INDUSTRY ]

ยท 8 min read

Safety theater: when AI companies control what you can say

AI safety has become a branding exercise. Models refuse benign requests while passing genuinely harmful ones, and the filters encode cultural biases nobody voted for.

Safety theater: when AI companies control what you can say

Refusal rates that defy common sense

Independent testing has revealed that some AI models refuse up to 70% of requests in certain benign categories, including creative writing, historical discussion, and medical questions. A user asking for a fictional villain's dialogue may be refused while a carefully worded prompt designed to extract genuinely dangerous information passes through. The refusal systems are not calibrated for actual harm. They are calibrated for legal liability, brand risk, and the avoidance of negative press coverage. The result is a system that inconveniences honest users while doing little to stop determined bad actors.

When the AI flags itself

In documented cases, ChatGPT has generated content and then flagged its own output as a policy violation when a user attempted to continue the conversation. The model produces text, determines retroactively that the text violated its guidelines, and then refuses to engage further. This creates a paradox where the safety system cannot consistently distinguish between harmful and harmless content, yet has absolute authority to terminate interactions. Users are left confused and frustrated by a system that contradicts itself within a single conversation.

Bias encoded in the filters

Content filters reflect the cultural assumptions of the teams that build them, which are overwhelmingly based in a small number of American technology companies. Discussions of sexuality, religion, politics, and cultural practices are filtered through a narrow lens that does not represent the diversity of the global user base. Topics that are routine in one culture trigger refusals for users in another. Gender-related content is filtered inconsistently, with some models applying stricter restrictions to content involving women's health than to equivalent content involving men's health. These are editorial decisions made without democratic input or transparency.

Corporate liability dressed as user safety

The primary function of most AI content filters is not to protect users. It is to protect the company from lawsuits, regulatory action, and negative media coverage. When a model refuses to discuss a sensitive topic, it is not making a safety judgment. It is making a business judgment about which refusals minimize corporate risk. Genuine user safety would involve transparent harm assessments, proportionate responses, and the ability for users to understand and appeal decisions. What we have instead is a black box that says no without explanation.

The difference between safety and control

Safety means protecting people from genuine harm. Control means deciding what people are allowed to think about, ask about, and discuss. The current generation of AI content filters conflates the two, treating every sensitive topic as a potential liability rather than distinguishing between harmful intent and legitimate inquiry. A medical professional asking about drug interactions, a novelist writing about violence, and a student researching historical atrocities all deserve thoughtful responses, not blanket refusals driven by corporate risk aversion.

Privacy that removes the need for surveillance

SecureGPT does not perform content surveillance on your conversations. Every message is encrypted before it leaves your device, which means nobody can read your conversations, not the server, not the company, not any third party. There is no filter logging your queries, no system categorizing your topics, and no record of what you asked. Privacy and safety are not in conflict. When your conversations are genuinely private, there is nothing for anyone to surveil, judge, or control.