AI Scenario Seed — Content Safety Layer

Feature Owner: Joshua Uriel Tribiana
Module: Creator / AI Scenario Seed
Priority: P0
Week 4 Sprint: Fully Implemented
Date: 06-30-2026


EXECUTIVE SUMMARY

What is this feature?
This subfeature adds a server-side content safety gate for AI scenario generation. It blocks unsafe prompts before generation, uses a Gemini-based classifier for nuanced review, and scans generated output before it reaches the creator. The design is built around a three-layer safety model: hard-block rules, sensitive-term allowlists, and Gemini-powered classification with local fallback.

Why does it matter?
Unsafe or low-quality prompts can lead to policy violations, reputational risk, bad learner experience, and wasted AI tokens. This layer protects the platform while keeping the creator workflow fast and intuitive. It also ensures that scenario generation remains usable for legitimate workplace training topics like ethics, conflict resolution, cybersecurity awareness, and compliance.

What’s the MVP scope?

  • A pre-flight safety analysis before calling Gemini for scenario generation

  • A three-layer safety pipeline with hard blocks, sensitive-term detection, and Gemini analysis

  • Local fallback behavior when Gemini is unavailable or the circuit breaker is open

  • Output scanning for generated scenarios before returning them to the client

  • Clear policy-violation responses with structured metadata for the UI


1. USER PAIN POINT & SOLUTION

Current State (Without Feature)

Creators can request AI-generated scenarios, but without a strong safety gate, unsafe or ambiguous prompts can slip through and produce problematic content. That creates risk for the business, poor user trust, and inconsistent experience for legitimate training use cases.

Pain Point

Emotional: Anxiety about generating “the wrong thing” or accidentally producing harmful content
Functional: Manual review is slow and unreliable, especially when prompts are ambiguous or use obfuscated language
Business Impact: Compliance risk, wasted token spend, poor creator confidence, and possible platform policy violations

Future State (With Feature)

Creators can confidently generate workplace training scenarios knowing that unsafe content is blocked early, legitimate educational topics are allowed, and ambiguous prompts are handled safely. The feature protects the platform without making the flow feel heavy or punitive.

Marketing Hook

“Safe by default: generate workplace training scenarios with a built-in content guardrail that blocks harmful prompts and keeps learning content on-policy.”


3. 4D FRAMEWORK MAPPING

Diagnose

The safety layer helps diagnose whether a prompt is appropriate for the platform before it becomes a scenario. It catches harmful intent early and prevents bad content from entering the generation workflow.

Design

The feature helps content designers stay within safe educational boundaries while still allowing legitimate topics such as ethics, compliance, conflict prevention, anti-harassment, and cybersecurity awareness.

Develop

Creators can continue to build scenarios quickly because the safety layer runs in the background and only interrupts when the request clearly violates policy or is too ambiguous to be safely approved.

Deliver

The output that reaches the creator is pre-screened and post-screened, which improves trust and reduces manual cleanup before publishing or delivery.


4. USER FLOWS

Entry Point

A creator opens the scenario generation experience from the quest editor and submits a topic, complexity, and optional context.

Success Criteria

  • The request passes the safety gate

  • Scenario content is generated and returned to the creator

  • Unsafe requests fail fast with a clear explanation and do not consume the generation pipeline unnecessarily

Main Flow (Happy Path)

  1. Creator submits a prompt for scenario generation

  2. The backend validates the request shape

  3. The request is passed through the safety pipeline

  4. If it passes, Gemini generates the scenarios

  5. The output is scanned for safety violations

  6. The route returns the scenarios to the client

Edge Cases

  • Obfuscated or evasive input: The normalization layer detects attempts to bypass filters and raises the scrutiny level

  • Gemini unavailable: The system falls back to local safety logic, and the circuit breaker stops further calls to the failing provider

  • Ambiguous prompt: The Gemini classifier blocks it when unsure rather than allowing it through

  • Output violation: The request is rejected even if the input was allowed
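
The obfuscated-input edge case could be handled by a normalization step along the following lines. This is a minimal sketch, not the actual implementation: the function name `normalizePrompt`, the technique labels, and the substitution map are all hypothetical, and the rules are deliberately coarse (for example, a version string like "3.1.4" would also be collapsed). The normalized text is assumed to be used only for matching, while the original prompt is kept for generation.

```typescript
// Hypothetical sketch of a normalization layer; names and rules are assumptions.
interface NormalizationResult {
  normalized: string;
  evasionDetected: boolean;
  evasionTechniques: string[];
}

// Illustrative leetspeak substitutions only; a real map would likely be broader.
const LEET_MAP: Record<string, string> = {
  "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s",
};

function normalizePrompt(input: string): NormalizationResult {
  const techniques: string[] = [];

  // Zero-width characters are sometimes inserted to split flagged words.
  const stripped = input.replace(/[\u200B-\u200D\uFEFF]/g, "");
  if (stripped !== input) techniques.push("zero_width_chars");

  // Fold compatibility forms (e.g. full-width letters) and case.
  let text = stripped.normalize("NFKC").toLowerCase();

  // Collapse single characters joined by separators, e.g. "b.o.m.b".
  const collapsed = text.replace(/\b(?:[a-z0-9][.\-*]){2,}[a-z0-9]\b/g, (match) =>
    match.replace(/[.\-*]/g, ""),
  );
  if (collapsed !== text) techniques.push("separator_splitting");
  text = collapsed;

  // Undo leetspeak only in tokens that mix letters with substitute characters.
  const deLeeted = text.replace(/\S+/g, (token) => {
    if (!/[a-z]/.test(token) || !/[0-9@$]/.test(token)) return token;
    return Array.from(token, (ch) => LEET_MAP[ch] ?? ch).join("");
  });
  if (deLeeted !== text) techniques.push("leetspeak");
  text = deLeeted;

  return {
    normalized: text,
    evasionDetected: techniques.length > 0,
    evasionTechniques: techniques,
  };
}
```

In this sketch a detected technique does not block the request by itself; it only populates evasionDetected and evasionTechniques so that later layers can apply stricter scrutiny.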

Decision Points

  • IF the prompt is clearly unsafe → block immediately with a policy violation response

  • IF the prompt is ambiguous or obfuscated → escalate to the Gemini classifier and apply stricter scrutiny

  • IF the classifier or provider fails → use local fallback and return a safe default outcome

  • ELSE → proceed with generation
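
The decision points above can be read as a small decision table. The sketch below expresses them as a pure function over signals that are already computed; the type names (`GateSignals`, `GateDecision`) and the `kind` labels are hypothetical and are not taken from the codebase. It assumes the "safe default outcome" on provider failure is whatever the local fallback logic returns.

```typescript
// Hypothetical types for illustration only.
interface Verdict {
  passed: boolean;
  category: string;
}

type GateSignals =
  | { kind: "hard_block"; category: string }            // clearly unsafe prompt
  | { kind: "clearly_safe" }                             // no review needed
  | { kind: "classified"; verdict: Verdict }             // Gemini classifier answered
  | { kind: "classifier_failed"; localVerdict: Verdict }; // provider error or open breaker

interface GateDecision extends Verdict {
  isHardBlock: boolean;
  isLocalFallback: boolean;
}

function decide(signals: GateSignals): GateDecision {
  switch (signals.kind) {
    case "hard_block":
      // Block immediately; no classifier call is made.
      return { passed: false, category: signals.category, isHardBlock: true, isLocalFallback: false };
    case "clearly_safe":
      return { passed: true, category: "CLEAN", isHardBlock: false, isLocalFallback: false };
    case "classified":
      // Ambiguous or obfuscated prompts end up here after escalation.
      return { ...signals.verdict, isHardBlock: false, isLocalFallback: false };
    case "classifier_failed":
      // The local fallback decides, and the decision is marked as such.
      return { ...signals.localVerdict, isHardBlock: false, isLocalFallback: true };
  }
}
```

Keeping this step free of I/O makes each branch easy to unit test without calling Gemini.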


5. INFORMATION ARCHITECTURE

Primary Information (Always visible in the request lifecycle)

  • Topic

  • Complexity

  • Context

  • Safety decision result

  • User-facing reason/message

Secondary Information

  • Safety category

  • Flagged fields

  • Evasion detection details

  • Whether the decision used the local fallback

  • Whether the decision was a hard block

Tertiary Information (Hidden until needed)

  • Circuit-breaker state

  • Internal reasoning from the Gemini classifier

  • Layer-by-layer diagnostic info for debugging

Actions

Primary CTA:

  • Generate scenario

Secondary Actions:

  • Revise topic/context

  • Retry after correction

  • Contact support if the block appears incorrect


6. WIREFRAMES

No dedicated new screen is required for this subfeature. The safety experience is currently surfaced through the existing generation flow and the route response handling.

Key Screens:

  1. Existing generation modal / creator workflow

  2. Inline error state for blocked requests

  3. Loading state while the safety check and generation call are in progress

  4. Error state for provider or quota failures

Annotations:

  • The safety gate runs before the generation request is sent to Gemini

  • The UI should surface a user-friendly message based on the returned category, reasoning, and flaggedFields

  • No additional form field is required for this flow


7. WIREFLOWS

The current flow is:

  1. Creator enters topic/complexity/context

  2. Route validates and enforces quota

  3. Safety service runs the three-layer checks

  4. If safe, Gemini generates the scenarios

  5. Generated output is scanned for violations

  6. Response is returned or a policy violation is returned

A simple representation is:

Creator Input → Request Validation → Hard Blocks → Sensitive Term Review → Gemini Classifier → Output Scan → Response


8. PROTOTYPE

Figma Prototype Link: Not currently available for this subfeature

How to test:

  1. Open the scenario generation experience

  2. Submit a safe workplace training topic

  3. Submit a clearly unsafe topic and observe the blocked response

  4. Submit an ambiguous or obfuscated topic and confirm the stricter handling


9. BACKEND SCHEMA

Data Model

No new database table is required for this subfeature. The safety layer operates through server-side service logic and existing AI usage tracking infrastructure.

Relevant Runtime Types

import { z } from "zod";

// Shape of the Gemini classifier's reply.
export const scenarioSeedSafetyAIResultSchema = z.object({
  passed: z.boolean(),
  category: z.enum([
    "CLEAN",
    "EXPLICIT_SEXUAL",
    "ILLEGAL_ACTIVITY",
    "HATE_SPEECH",
    "VIOLENCE_GLORIFICATION",
    "NON_EDUCATIONAL",
    "COMPANY_POLICY_VIOLATION",
  ]),
  reasoning: z.string().default("No reasoning provided"),
  flaggedFields: z.array(z.enum(["topic", "context"])).optional(),
});

// Full safety decision, including evasion and fallback metadata.
export interface ScenarioSeedSafetyFullResult {
  passed: boolean;
  category: string;
  reasoning: string;
  flaggedFields: Array<"topic" | "context">;
  evasionDetected: boolean;
  evasionTechniques: string[];
  isLocalFallback: boolean;
  isHardBlock: boolean;
}

Constraints

  • Safety decisions must remain deterministic where possible

  • The Gemini classifier must return valid JSON only

  • Unsafe output must be blocked even when the input passes the pre-check
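
The "valid JSON only" constraint implies that any classifier reply that is not parseable, well-formed JSON should be treated as a failure rather than guessed at. The sketch below hand-rolls that check so it stays dependency-free; in a Zod-based codebase, validating the parsed value against scenarioSeedSafetyAIResultSchema would be the more natural route. The function name `parseClassifierReply` and the choice to return null on any problem are assumptions.

```typescript
const CATEGORIES = [
  "CLEAN",
  "EXPLICIT_SEXUAL",
  "ILLEGAL_ACTIVITY",
  "HATE_SPEECH",
  "VIOLENCE_GLORIFICATION",
  "NON_EDUCATIONAL",
  "COMPANY_POLICY_VIOLATION",
] as const;

type SafetyCategory = (typeof CATEGORIES)[number];
type FlaggedField = "topic" | "context";

interface SafetyAIResult {
  passed: boolean;
  category: SafetyCategory;
  reasoning: string;
  flaggedFields?: FlaggedField[];
}

// Hypothetical helper: returns null for anything that is not strictly valid,
// so the caller can treat the reply as a classifier failure.
function parseClassifierReply(raw: string): SafetyAIResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // prose, markdown fences, or truncated output
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const obj = data as Record<string, unknown>;

  const passed = obj["passed"];
  if (typeof passed !== "boolean") return null;

  const rawCategory = obj["category"];
  const category = CATEGORIES.find((c) => c === rawCategory);
  if (category === undefined) return null;

  // Mirrors the schema default for a missing reasoning field.
  const rawReasoning = obj["reasoning"];
  const reasoning = rawReasoning === undefined ? "No reasoning provided" : rawReasoning;
  if (typeof reasoning !== "string") return null;

  const result: SafetyAIResult = { passed, category, reasoning };

  const rawFields = obj["flaggedFields"];
  if (rawFields !== undefined) {
    if (!Array.isArray(rawFields)) return null;
    const fields: FlaggedField[] = [];
    for (const field of rawFields) {
      if (field !== "topic" && field !== "context") return null;
      fields.push(field);
    }
    result.flaggedFields = fields;
  }
  return result;
}
```

A null return here would feed the same path as a provider failure, which keeps the gate from approving a prompt on the strength of a malformed reply.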


10. API ENDPOINTS

Primary Endpoint: POST /api/creator/scenarios/generate

Purpose: Generates scenario content after passing the safety gate
Auth: Required (Clerk user session)
Request Body:

{
  "topic": "Handling difficult customers",
  "count": 3,
  "complexity": "intermediate",
  "context": "Retail environment"
}

Response 200:

{
  "success": true,
  "data": {
    "scenarios": [],
    "tokensUsed": 1800,
    "modelUsed": "gemini-2.5-flash"
  }
}

Response 403 (Policy Violation):

{
  "success": false,
  "error": "This topic is not permitted on the platform (ILLEGAL_ACTIVITY). Please choose a different topic.",
  "details": {
    "category": "ILLEGAL_ACTIVITY",
    "reasoning": "The request was blocked by the safety layer.",
    "flaggedFields": ["topic"],
    "evasionDetected": false,
    "isHardBlock": true
  }
}

Response 503 / 500:
Used when Gemini is unavailable, config is missing, or generation fails unexpectedly.
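
One way the 403 body shown above could be built from a full safety result is sketched here. The helper name `toPolicyViolationResponse` is hypothetical, and the error string simply follows the example response. Note that the sketch copies only the fields that appear in the documented details object; evasionTechniques and isLocalFallback are assumed to stay server-side.

```typescript
interface SafetyFullResult {
  passed: boolean;
  category: string;
  reasoning: string;
  flaggedFields: Array<"topic" | "context">;
  evasionDetected: boolean;
  evasionTechniques: string[];
  isLocalFallback: boolean;
  isHardBlock: boolean;
}

interface PolicyViolationResponse {
  status: 403;
  body: {
    success: false;
    error: string;
    details: {
      category: string;
      reasoning: string;
      flaggedFields: Array<"topic" | "context">;
      evasionDetected: boolean;
      isHardBlock: boolean;
    };
  };
}

// Hypothetical helper: maps a failed safety result to the documented 403 shape.
function toPolicyViolationResponse(result: SafetyFullResult): PolicyViolationResponse {
  return {
    status: 403,
    body: {
      success: false,
      error: `This topic is not permitted on the platform (${result.category}). Please choose a different topic.`,
      details: {
        category: result.category,
        reasoning: result.reasoning,
        flaggedFields: result.flaggedFields,
        evasionDetected: result.evasionDetected,
        isHardBlock: result.isHardBlock,
      },
    },
  };
}
```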


11. DATA REQUIREMENTS

Frontend Needs

The frontend should be able to:

  • Display a friendly user message when the request is blocked

  • Show a generic non-technical error when the provider is unavailable

  • Preserve the existing workflow without adding extra form fields

API Calls Frontend Will Make

  • POST /api/creator/scenarios/generate as the primary entry point

Caching Strategy

No client-side caching is required for this safety layer. The safety decision should be treated as a dynamic runtime check for every request.


12. PERFORMANCE CONSIDERATIONS

Runtime Optimization

  • Hard-block patterns run first and reject obvious cases quickly

  • Sensitive-term checks are lightweight

  • The Gemini classifier is only used for ambiguous cases rather than every request

  • The normalization layer prevents wasted provider calls when input is clearly evasive

API Response Time

Target: the safety layer should add minimal overhead and remain under the overall generation latency budget. The current flow is designed so that simple blocks fail fast before Gemini processing begins.

Reliability

The circuit breaker prevents the system from repeatedly relying on a provider that is failing or misconfigured.
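
For readers unfamiliar with the pattern, a circuit breaker of this kind can be sketched as below. This is a generic illustration, not the project's implementation: the class name, the failure threshold, and the cooldown are assumptions, and the clock is passed in explicitly so the behavior is easy to test.

```typescript
type BreakerState = "closed" | "open" | "half_open";

// Generic circuit-breaker sketch; thresholds are illustrative defaults.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold = 3, cooldownMs = 30000) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  state(now: number): BreakerState {
    if (this.openedAt === null) return "closed";
    // After the cooldown, one trial call is allowed (half-open).
    return now - this.openedAt >= this.cooldownMs ? "half_open" : "open";
  }

  // While open, callers should skip the provider and use the local fallback.
  canCall(now: number): boolean {
    return this.state(now) !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    this.failures += 1;
    // Open on reaching the threshold, or re-open if a half-open trial fails.
    if (this.failures >= this.failureThreshold || this.openedAt !== null) {
      this.openedAt = now;
    }
  }
}
```

A caller would check canCall before each classifier request, route to the local fallback when it returns false, and report the outcome of every real call through recordSuccess or recordFailure.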


13. SECURITY & AUTHORIZATION

Who can access this feature?

  • Creator:

  • Reviewer:

  • Learner:

Authorization Logic

  • The request must be authenticated through Clerk

  • The route enforces all checks on the server and does not trust any safety state supplied by the client

  • The safety layer is enforced on the server before generation begins

Data Validation

  • Request shape is validated with Zod

  • Prompt content is checked for policy and quality issues before generation

  • Generated output is scanned before it is returned to the creator
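
The output scan in the last bullet could look roughly like the sketch below. The scenario shape (`title`, `description`), the function name `scanOutput`, and the idea of reusing a list of blocked patterns are all assumptions; the real generated structure and scanning rules are not shown in this document.

```typescript
// Hypothetical scenario shape for illustration.
interface GeneratedScenario {
  title: string;
  description: string;
}

interface OutputViolation {
  scenarioIndex: number;
  field: keyof GeneratedScenario;
}

// Returns every scenario field that matches a blocked pattern.
// An empty array means the output may be returned to the creator.
function scanOutput(
  scenarios: GeneratedScenario[],
  blockedPatterns: RegExp[],
): OutputViolation[] {
  const fields: Array<keyof GeneratedScenario> = ["title", "description"];
  const violations: OutputViolation[] = [];

  scenarios.forEach((scenario, scenarioIndex) => {
    for (const field of fields) {
      const text = scenario[field];
      // Patterns should not use the `g` flag: RegExp.test is stateful with it.
      if (blockedPatterns.some((pattern) => pattern.test(text))) {
        violations.push({ scenarioIndex, field });
      }
    }
  });

  return violations;
}
```

Under this sketch, any non-empty result would cause the route to reject the whole response, matching the rule that unsafe output is blocked even when the input passed.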


14. ERROR HANDLING

Common Errors

  • 403 Forbidden: The prompt is blocked by the safety layer

  • 503 Service Unavailable: Gemini or config is unavailable, and the fallback path is engaged

  • 500 Internal Error: Unexpected parsing or runtime failure

  • 429 Too Many Requests: Token quota is exhausted

Handling Guidance

  • Show a concise, actionable message to the user

  • Avoid revealing sensitive internals or over-explaining policy details

  • Log the full decision for debugging and review
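
The split between what the user sees and what gets logged can be sketched as a small lookup. The status codes follow the Common Errors list above; the failure-kind labels, the helper name `toClientError`, and the message wording are placeholders rather than the actual copy.

```typescript
type FailureKind =
  | "policy_violation"
  | "quota_exhausted"
  | "provider_unavailable"
  | "internal_error";

interface ClientError {
  status: number;
  message: string;
}

// Placeholder messages; real copy would come from the product team.
const CLIENT_ERRORS: Record<FailureKind, ClientError> = {
  policy_violation: { status: 403, message: "This topic is not permitted. Please choose a different topic." },
  quota_exhausted: { status: 429, message: "You have reached your generation limit. Please try again later." },
  provider_unavailable: { status: 503, message: "Scenario generation is temporarily unavailable. Please try again shortly." },
  internal_error: { status: 500, message: "Something went wrong. Please try again." },
};

// Logs the full detail server-side and returns only the concise client view.
function toClientError(
  kind: FailureKind,
  detail: unknown,
  log: (entry: string) => void,
): ClientError {
  log(JSON.stringify({ kind, detail }));
  return CLIENT_ERRORS[kind];
}
```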


15. TESTING CHECKLIST

Happy Path

  • A safe workplace ethics or compliance topic passes the safety gate

  • A safe topic reaches Gemini and returns scenarios successfully

  • A generated response passes the output safety scan

Safety Layer Cases

  • Hard-block patterns reject clearly unsafe prompts

  • Sensitive-term groups allow educational contexts when the prompt is clearly professional

  • Gemini classifier blocks ambiguous or unsafe prompts

  • Evasion patterns such as obfuscation are detected and escalate scrutiny

  • Local fallback works when Gemini is unavailable

  • Circuit-breaker behavior prevents repeated provider failures

Regression Cases

  • The route returns a friendly 403 response for blocked prompts

  • The route does not return generated content when the output scan detects a violation

  • Token quota enforcement still blocks over-limit requests


16. OPEN QUESTIONS

For Frontend

  • Should the UI show the specific category (for example, illegal activity vs. non-educational) or only a generic policy message?

  • Should we add a small “Why was this blocked?” affordance for ambiguous cases?

For Backend

  • Should safety decisions be logged in a dedicated audit table for later review?

  • Should we add a lightweight admin override for testing and support scenarios?


17. OUT OF SCOPE (v1.1+)

  • A dedicated admin dashboard for safety decisions

  • User-level override controls

  • Multi-language safety evaluation

  • Fine-grained per-category tuning UI

Why: The current goal is to ship a reliable, server-side safety gate that protects the workflow without adding unnecessary complexity.


18. SUCCESS METRICS

How will we know this feature is working well?

  • Unsafe prompts are blocked before generation begins

  • Legitimate educational prompts continue to succeed

  • False-positive rate remains low enough for creators to trust the system

  • Output safety violations are caught before the content is returned


19. DEPENDENCIES

This feature depends on:

  • Clerk authentication

  • A configured Gemini API key and model selection

  • Existing token quota enforcement

  • Existing AI usage tracking infrastructure

These features depend on this:

  • The main AI Scenario Seed generation feature

  • Any future AI content generation experience that wants a safe-by-default gate


20. TIMELINE & OWNERSHIP

Implementation Ownership

  • Owner: Joshua Uriel Tribiana

  • QA: Sean Patrick (scorevi)

