BackAI usage and billing
AI usage and billing

Prompt Safety, Guardrails, and Refusal Handling in Blanca's Builder

Learn about Blanca's Builder's robust prompt safety, built-in AI guardrails, and how the system handles refusals for appropriate and secure AI interactions.

Blanca's Builder is designed with a strong commitment to safety and responsible AI. This article explores the built-in guardrails, how our AI handles inappropriate requests through refusals, and the measures in place to prevent prompt injection attacks.

Last updated: 2026-06-28

Understanding AI Safety Guardrails

At the core of Blanca's Builder are sophisticated AI safety guardrails. These are pre-defined rules and filters that guide the AI's behavior, ensuring that its outputs are helpful, harmless, and unbiased. Our guardrails are continuously updated and refined to address emerging risks and improve the overall user experience.

These guardrails apply across all interactions within Blanca's Builder, from generating content for your website to assisting with code. Their primary purpose is to prevent the AI from producing content that is illegal, harmful, hateful, sexually explicit, or otherwise inappropriate, thereby fostering a secure and trustworthy environment for all users.

How AI Refusals Work

When a user's prompt or request falls outside the defined safety parameters or violates our content policies, the AI will issue a 'refusal.' Instead of generating potentially harmful or inappropriate content, the AI will explicitly state that it cannot fulfill the request. This mechanism is crucial for maintaining a safe and responsible AI system.

A refusal is not a bug; it's a feature. It indicates that the AI has identified a potential safety concern with the prompt and has opted to prioritize safety over completing the request. Common reasons for refusal include requests for illegal activities, hate speech, explicit content, or instructions that could lead to harm.

Defending Against Prompt Injection

Prompt injection is a type of attack where malicious users attempt to bypass the AI's safety guardrails or manipulate its behavior through carefully crafted inputs. Blanca's Builder employs advanced techniques to detect and mitigate such attacks. Our system continually analyzes prompts for suspicious patterns and keywords that might indicate an attempt at injection.

These defense mechanisms work by robustly parsing and interpreting user input, distinguishing between legitimate instructions and attempts to override system policies. While no system is entirely impervious to sophisticated attacks, our multi-layered approach significantly reduces the risk of successful prompt injection, safeguarding the integrity and reliability of the AI.

What to Do When the AI Declines a Request

If the AI in Blanca's Builder declines your request, it means your prompt likely triggered one of our safety guardrails. The best course of action is to rephrase your request, making it clearer and ensuring it adheres to ethical guidelines and our content policies. Consider if your prompt could be misinterpreted as unsafe or inappropriate.

For example, if you asked the AI to 'create a guide to illegal hacking,' it would refuse. Rephrasing it to 'explain common cybersecurity vulnerabilities' would likely be accepted. Always aim for prompts that are constructive, ethical, and aligned with the intended helpful and harmless nature of Blanca's Builder. If you believe a refusal was made in error, you can provide feedback, which helps us to further refine our AI models.

Canonical: https://blancasbuilder.com/knowledge/ai-usage-and-billing/prompt-safety-and-guardrails · Blanca's Builder