AI Scenario Seed — Content Safety Layer

Feature Owner: Joshua Uriel Tribiana
Module: Creator / AI Scenario Seed
Priority: P0
Week 4 Sprint: Fully Implemented
Date: 06-30-2026

EXECUTIVE SUMMARY

What is this feature?
This subfeature adds a server-side content safety gate for AI scenario generation. It blocks unsafe prompts before generation, uses a Gemini-based classifier for nuanced review, and scans generated output before it reaches the creator. The design is built around a three-layer safety model: hard-block rules, sensitive-term allowlists, and Gemini-powered classification with local fallback.

Why does it matter?
Unsafe or low-quality prompts can lead to policy violations, reputational risk, bad learner experience, and wasted AI tokens. This layer protects the platform while keeping the creator workflow fast and intuitive. It also ensures that scenario generation remains usable for legitimate workplace training topics like ethics, conflict resolution, cybersecurity awareness, and compliance.

What’s the MVP scope?

A pre-flight safety analysis before calling Gemini for scenario generation
A three-layer safety pipeline with hard blocks, sensitive-term detection, and Gemini analysis
Local fallback behavior when Gemini is unavailable or the circuit breaker is open
Output scanning for generated scenarios before returning them to the client
Clear policy-violation responses with structured metadata for the UI

1. USER PAIN POINT & SOLUTION

Current State (Without Feature)

Creators can request AI-generated scenarios, but without a strong safety gate, unsafe or ambiguous prompts can slip through and produce problematic content. That creates business risk, erodes user trust, and makes the experience inconsistent for legitimate training use cases.

Pain Point

Emotional: Anxiety about generating “the wrong thing” or accidentally producing harmful content
Functional: Manual review is slow and unreliable, especially when prompts are ambiguous or use obfuscated language
Business Impact: Compliance risk, wasted token spend, poor creator confidence, and possible platform policy violations

Future State (With Feature)

Creators can confidently generate workplace training scenarios knowing that unsafe content is blocked early, legitimate educational topics are allowed, and ambiguous prompts are handled safely. The feature protects the platform without making the flow feel heavy or punitive.

Marketing Hook

“Safe by default: generate workplace training scenarios with a built-in content guardrail that blocks harmful prompts and keeps learning content on-policy.”

3. 4D FRAMEWORK MAPPING

Diagnose

The safety layer helps diagnose whether a prompt is appropriate for the platform before it becomes a scenario. It catches harmful intent early and prevents bad content from entering the generation workflow.

Design

The feature helps content designers stay within safe educational boundaries while still allowing legitimate topics such as ethics, compliance, conflict prevention, anti-harassment, and cybersecurity awareness.

Develop

Creators can continue to build scenarios quickly because the safety layer runs in the background and only interrupts when the request clearly violates policy or is too ambiguous to be safely approved.

Deliver

The output that reaches the creator has been screened twice, once on the input prompt before generation and once on the generated content, which improves trust and reduces manual cleanup before publishing or delivery.

4. USER FLOWS

Entry Point

A creator opens the scenario generation experience from the quest editor and submits a topic, complexity, and optional context.

Success Criteria

The request passes the safety gate
Scenario content is generated and returned to the creator
Unsafe requests fail fast with a clear explanation and do not consume the generation pipeline unnecessarily

Main Flow (Happy Path)

Creator submits a prompt for scenario generation
The backend validates the request shape
The request is passed through the safety pipeline
If it passes, Gemini generates the scenarios
The output is scanned for safety violations
The route returns the scenarios to the client

Edge Cases

Obfuscated or evasive input: The normalization layer detects attempts to bypass filters and raises the scrutiny level
Gemini unavailable: The system falls back to local safety logic, and the circuit breaker stops repeated calls to the failing provider
Ambiguous prompt: The Gemini classifier blocks it when unsure rather than allowing it through
Output violation: The request is rejected even if the input was allowed
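
The obfuscation edge case could be handled by a normalization step along the following lines. This is a minimal sketch, not the production normalizer: the function name normalizePrompt, the substitution table, and the technique labels are all assumptions made for illustration. Simple rules like these can also produce false positives (for example on email addresses), which is one reason a detection here raises scrutiny rather than blocking outright.

```typescript
// Hypothetical sketch: names, rules, and labels are illustrative assumptions.
interface NormalizationResult {
  normalized: string;
  evasionDetected: boolean;
  evasionTechniques: string[];
}

// Character substitutions commonly used to dodge keyword filters.
const SUBSTITUTIONS: Record<string, string> = {
  "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s",
};

function normalizePrompt(input: string): NormalizationResult {
  const techniques: string[] = [];
  let text = input.toLowerCase();

  // Zero-width characters are invisible but split keywords apart.
  const noZeroWidth = text.replace(/[\u200B-\u200D\uFEFF]/g, "");
  if (noZeroWidth !== text) techniques.push("zero_width_characters");
  text = noZeroWidth;

  // Four or more single letters joined by punctuation, e.g. "w.o.r.d".
  const joined = text.replace(/\b(?:[a-z][.\-_*]){3,}[a-z]\b/g, (match) =>
    match.replace(/[.\-_*]/g, ""),
  );
  if (joined !== text) techniques.push("separated_letters");
  text = joined;

  // A digit or symbol sitting between two letters, e.g. "h4ck".
  const substituted = text.replace(
    /([a-z])([013457@$])(?=[a-z])/g,
    (_match, before: string, symbol: string) =>
      before + (SUBSTITUTIONS[symbol] ?? symbol),
  );
  if (substituted !== text) techniques.push("character_substitution");
  text = substituted;

  // Collapse whitespace so later pattern checks see a canonical form.
  text = text.replace(/\s+/g, " ").trim();

  return {
    normalized: text,
    evasionDetected: techniques.length > 0,
    evasionTechniques: techniques,
  };
}
```

The evasionDetected and evasionTechniques values map onto the fields of the same name in ScenarioSeedSafetyFullResult, and the later layers would run against the normalized text rather than the raw input.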

Decision Points

IF the prompt is clearly unsafe → block immediately with a policy violation response
IF the prompt is ambiguous or obfuscated → escalate to the Gemini classifier and apply stricter scrutiny
IF the classifier or provider fails → use local fallback and return a safe default outcome
ELSE → proceed with generation
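
The decision points above can be read as a small routing function. The sketch below is one possible encoding under stated assumptions: the PreflightSignals shape, the route names, and the function name routeRequest are hypothetical and do not come from the implementation.

```typescript
// Hypothetical sketch of the decision points; all names are assumptions.
type SafetyRoute = "BLOCK" | "CLASSIFY_STRICT" | "LOCAL_FALLBACK" | "GENERATE";

interface PreflightSignals {
  hardBlockMatched: boolean; // a hard-block rule matched the prompt
  ambiguous: boolean; // sensitive term without clear professional framing
  evasionDetected: boolean; // normalization found obfuscation
  classifierAvailable: boolean; // provider reachable and circuit breaker not open
}

function routeRequest(signals: PreflightSignals): SafetyRoute {
  // Clearly unsafe prompts are rejected before any provider call.
  if (signals.hardBlockMatched) return "BLOCK";

  // Ambiguous or obfuscated prompts need a closer look.
  if (signals.ambiguous || signals.evasionDetected) {
    // If the classifier cannot be used, local logic decides instead.
    return signals.classifierAvailable ? "CLASSIFY_STRICT" : "LOCAL_FALLBACK";
  }

  // Nothing suspicious: proceed with generation.
  return "GENERATE";
}
```

Keeping the routing pure (no I/O) makes each branch straightforward to cover in unit tests.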

5. INFORMATION ARCHITECTURE

Primary Information (Always visible in the request lifecycle)

Topic
Complexity
Context
Safety decision result
User-facing reason/message

Secondary Information

Safety category
Flagged fields
Evasion detection details
Whether the decision used the local fallback
Whether the decision was a hard block

Tertiary Information (Hidden until needed)

Circuit-breaker state
Internal reasoning from the Gemini classifier
Layer-by-layer diagnostic info for debugging

Actions

Primary CTA:

Generate scenario

Secondary Actions:

Revise topic/context
Retry after correction
Contact support if the block appears incorrect

6. WIREFRAMES

No dedicated new screen is required for this subfeature. The safety experience is currently surfaced through the existing generation flow and the route response handling.

Key Screens:

Existing generation modal / creator workflow
Inline error state for blocked requests
Loading state while the safety check and generation call are in progress
Error state for provider or quota failures

Annotations:

The safety gate runs before the generation request is sent to Gemini
The UI should surface a user-friendly message based on the returned category, reasoning, and flaggedFields
No additional form field is required for this flow

7. WIREFLOWS

The current flow is:

Creator enters topic/complexity/context
Route validates and enforces quota
Safety service runs the three-layer checks
If safe, Gemini generates the scenarios
Generated output is scanned for violations
The scenarios are returned, or a policy-violation response is returned instead

A simple representation is:

Creator Input → Request Validation → Hard Blocks → Sensitive Term Review → Gemini Classifier → Scenario Generation → Output Scan → Response
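
One way to picture the pre-generation part of this flow is as an ordered list of layers, where each layer either decides or defers to the next one. The sketch below is illustrative only: the Layer interface, runSafetyLayers, the toy patterns, and the fail-closed default are assumptions, and the real classifier layer would be asynchronous rather than a synchronous stub.

```typescript
// Hypothetical sketch: layer shapes, names, and patterns are assumptions.
interface SeedInput {
  topic: string;
  context: string;
}

interface LayerDecision {
  passed: boolean;
  reasoning: string;
}

interface Layer {
  name: string;
  // Return a decision to stop the pipeline, or null to defer to the next layer.
  check: (input: SeedInput) => LayerDecision | null;
}

interface PipelineResult extends LayerDecision {
  decidedBy: string;
}

function runSafetyLayers(input: SeedInput, layers: Layer[]): PipelineResult {
  for (const layer of layers) {
    const decision = layer.check(input);
    if (decision !== null) return { ...decision, decidedBy: layer.name };
  }
  // Assumption: if no layer decides, fail closed rather than allow.
  return { passed: false, reasoning: "No layer reached a decision", decidedBy: "default" };
}

// Toy layers for illustration; real rules would be far more extensive.
const hardBlockLayer: Layer = {
  name: "hard_block",
  check: ({ topic }) =>
    /\bexample-banned-topic\b/i.test(topic)
      ? { passed: false, reasoning: "Matched a hard-block pattern" }
      : null,
};

const sensitiveTermLayer: Layer = {
  name: "sensitive_terms",
  check: ({ topic, context }) => {
    const text = `${topic} ${context}`.toLowerCase();
    if (!text.includes("harassment")) {
      return { passed: true, reasoning: "No sensitive terms found" };
    }
    // A sensitive term with clear training framing is allowed; otherwise defer.
    return /\b(training|prevention|policy)\b/.test(text)
      ? { passed: true, reasoning: "Sensitive term in an educational context" }
      : null;
  },
};

// Stand-in for the Gemini classifier, which blocks when unsure.
const classifierStub: Layer = {
  name: "classifier",
  check: () => ({ passed: false, reasoning: "Ambiguous prompt blocked" }),
};

const layers: Layer[] = [hardBlockLayer, sensitiveTermLayer, classifierStub];
```

With this ordering, a plain topic such as "Handling difficult customers" is decided by the sensitive-term layer and never reaches the classifier stub, which mirrors the intent that the classifier handles only ambiguous cases.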

8. PROTOTYPE

Figma Prototype Link: Not currently available for this subfeature

How to test:

Open the scenario generation experience
Submit a safe workplace training topic
Submit a clearly unsafe topic and observe the blocked response
Submit an ambiguous or obfuscated topic and confirm the stricter handling

9. BACKEND SCHEMA

Data Model

No new database table is required for this subfeature. The safety layer operates through server-side service logic and existing AI usage tracking infrastructure.

Relevant Runtime Types

import { z } from "zod";

export const scenarioSeedSafetyAIResultSchema = z.object({
  passed: z.boolean(),
  category: z.enum([
    "CLEAN",
    "EXPLICIT_SEXUAL",
    "ILLEGAL_ACTIVITY",
    "HATE_SPEECH",
    "VIOLENCE_GLORIFICATION",
    "NON_EDUCATIONAL",
    "COMPANY_POLICY_VIOLATION",
  ]),
  reasoning: z.string().default("No reasoning provided"),
  flaggedFields: z.array(z.enum(["topic", "context"])).optional(),
});

export interface ScenarioSeedSafetyFullResult {
  passed: boolean;
  category: string;
  reasoning: string;
  flaggedFields: Array<"topic" | "context">;
  evasionDetected: boolean;
  evasionTechniques: string[];
  isLocalFallback: boolean;
  isHardBlock: boolean;
}

Constraints

Safety decisions must remain deterministic where possible
The Gemini classifier must return valid JSON only
Unsafe output must be blocked even when the input passes the pre-check
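
The "valid JSON only" constraint implies that a malformed classifier reply must never be treated as a pass. The sketch below shows one dependency-free way to enforce that. It hand-rolls the checks instead of using the Zod schema above so that it runs on its own, and the function name parseClassifierVerdict and the choice to return null (so the caller can switch to the local fallback) are assumptions.

```typescript
// Hypothetical sketch: mirrors the Zod schema's rules with plain checks.
const CATEGORIES = [
  "CLEAN",
  "EXPLICIT_SEXUAL",
  "ILLEGAL_ACTIVITY",
  "HATE_SPEECH",
  "VIOLENCE_GLORIFICATION",
  "NON_EDUCATIONAL",
  "COMPANY_POLICY_VIOLATION",
] as const;

type SafetyCategory = (typeof CATEGORIES)[number];
type FlaggedField = "topic" | "context";

interface ClassifierVerdict {
  passed: boolean;
  category: SafetyCategory;
  reasoning: string;
  flaggedFields: FlaggedField[];
}

function isFlaggedField(value: unknown): value is FlaggedField {
  return value === "topic" || value === "context";
}

// Returns null for anything that is not a well-formed verdict, so the caller
// can treat the classifier as failed instead of trusting a partial reply.
function parseClassifierVerdict(raw: string): ClassifierVerdict | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all (e.g. prose or a markdown-wrapped reply)
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;

  const passed = obj["passed"];
  if (typeof passed !== "boolean") return null;

  const category = CATEGORIES.find((c) => c === obj["category"]);
  if (category === undefined) return null;

  // Same default as the schema's reasoning field.
  const rawReasoning = obj["reasoning"];
  const reasoning =
    typeof rawReasoning === "string" ? rawReasoning : "No reasoning provided";

  // flaggedFields is optional, but if present every entry must be valid.
  let flaggedFields: FlaggedField[] = [];
  const rawFields = obj["flaggedFields"];
  if (rawFields !== undefined) {
    if (!Array.isArray(rawFields)) return null;
    const fields: unknown[] = rawFields;
    const valid = fields.filter(isFlaggedField);
    if (valid.length !== fields.length) return null;
    flaggedFields = valid;
  }

  return { passed, category, reasoning, flaggedFields };
}
```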

10. API ENDPOINTS

Primary Endpoint: POST /api/creator/scenarios/generate

Purpose: Generates scenario content after passing the safety gate
Auth: Required (Clerk user session)
Request Body:

{
  "topic": "Handling difficult customers",
  "count": 3,
  "complexity": "intermediate",
  "context": "Retail environment"
}

Response 200:

{
  "success": true,
  "data": {
    "scenarios": [],
    "tokensUsed": 1800,
    "modelUsed": "gemini-2.5-flash"
  }
}

Response 403 (Policy Violation):

{
  "success": false,
  "error": "This topic is not permitted on the platform (ILLEGAL_ACTIVITY). Please choose a different topic.",
  "details": {
    "category": "ILLEGAL_ACTIVITY",
    "reasoning": "The request was blocked by the safety layer.",
    "flaggedFields": ["topic"],
    "evasionDetected": false,
    "isHardBlock": true
  }
}
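
As an illustration of how the 403 body above could be derived from a ScenarioSeedSafetyFullResult, here is a small sketch. The function name toPolicyViolationBody is hypothetical, and the choice to leave evasionTechniques and isLocalFallback out of the response is inferred from the sample payload rather than confirmed by the implementation.

```typescript
// Same shape as ScenarioSeedSafetyFullResult in the Backend Schema section.
interface SafetyFullResult {
  passed: boolean;
  category: string;
  reasoning: string;
  flaggedFields: Array<"topic" | "context">;
  evasionDetected: boolean;
  evasionTechniques: string[];
  isLocalFallback: boolean;
  isHardBlock: boolean;
}

// Hypothetical helper: builds the client-facing 403 body from a failed result.
function toPolicyViolationBody(result: SafetyFullResult) {
  return {
    success: false as const,
    error: `This topic is not permitted on the platform (${result.category}). Please choose a different topic.`,
    details: {
      category: result.category,
      reasoning: result.reasoning,
      flaggedFields: result.flaggedFields,
      evasionDetected: result.evasionDetected,
      isHardBlock: result.isHardBlock,
      // evasionTechniques and isLocalFallback are deliberately not copied,
      // matching the sample response (assumed to be intentional).
    },
  };
}
```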

Response 503 / 500:
Used when Gemini is unavailable, config is missing, or generation fails unexpectedly.

11. DATA REQUIREMENTS

Frontend Needs

The frontend should be able to:

Display a friendly user message when the request is blocked
Show a generic non-technical error when the provider is unavailable
Preserve the existing workflow without adding extra form fields

API Calls Frontend Will Make

POST /api/creator/scenarios/generate as the primary entry point

Caching Strategy

No client-side caching is required for this safety layer. The safety decision should be treated as a dynamic runtime check for every request.

12. PERFORMANCE CONSIDERATIONS

Runtime Optimization

Hard-block patterns run first and reject obvious cases quickly
Sensitive-term checks are lightweight
The Gemini classifier is only used for ambiguous cases rather than every request
The normalization layer prevents wasted provider calls when input is clearly evasive

API Response Time

Target: the safety layer should add minimal overhead and remain under the overall generation latency budget. The current flow is designed so that simple blocks fail fast before Gemini processing begins.

Reliability

The circuit breaker prevents the system from repeatedly relying on a provider that is failing or misconfigured.
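
For readers unfamiliar with the pattern, a circuit breaker can be sketched as below. This is a generic illustration, not the project's breaker: the class name, the failure threshold, the cooldown length, and the injectable clock are all assumptions.

```typescript
// Generic circuit-breaker sketch; thresholds and names are assumptions.
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold = 3,
    cooldownMs = 30_000,
    now: () => number = () => Date.now(), // injectable clock for tests
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  state(): BreakerState {
    if (this.openedAt === null) return "CLOSED";
    // After the cooldown, allow a trial call to probe the provider.
    return this.now() - this.openedAt >= this.cooldownMs ? "HALF_OPEN" : "OPEN";
  }

  // While OPEN, callers should skip the provider and use the local fallback.
  canCall(): boolean {
    return this.state() !== "OPEN";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial call, or too many failures in a row, (re)opens the breaker.
    if (this.state() === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}
```

A caller would check canCall() before invoking the classifier, report the outcome with recordSuccess() or recordFailure(), and route to the local fallback whenever canCall() returns false.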

13. SECURITY & AUTHORIZATION

Who can access this feature?

Creator: ✓
Reviewer: ✗
Learner: ✗

Authorization Logic

The request must be authenticated through Clerk
The route enforces every check on the server; nothing supplied by the client is trusted to have been validated or screened
The safety layer is enforced on the server before generation begins

Data Validation

Request shape is validated with Zod
Prompt content is checked for policy and quality issues before generation
Generated output is scanned before it is returned to the creator
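
The output scan could be as simple as running the blocked patterns over each generated scenario before the response is built. The sketch below assumes a scenario shape of { title, body } and a caller-supplied pattern list; both are placeholders, since the real generated type and rule set are not shown in this document.

```typescript
// Assumed scenario shape; the real generated type may differ.
interface GeneratedScenario {
  title: string;
  body: string;
}

interface OutputViolation {
  scenarioIndex: number;
  field: "title" | "body";
  pattern: string;
}

// Hypothetical helper: returns every pattern hit found in the generated text.
function scanGeneratedOutput(
  scenarios: GeneratedScenario[],
  blockedPatterns: RegExp[],
): OutputViolation[] {
  const violations: OutputViolation[] = [];
  scenarios.forEach((scenario, scenarioIndex) => {
    for (const field of ["title", "body"] as const) {
      for (const pattern of blockedPatterns) {
        // Reset lastIndex so patterns with the "g" flag test from the start.
        pattern.lastIndex = 0;
        if (pattern.test(scenario[field])) {
          violations.push({ scenarioIndex, field, pattern: pattern.source });
        }
      }
    }
  });
  return violations;
}
```

A non-empty result would cause the route to withhold the scenarios and return a policy-violation response, even though the input passed the pre-check.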

14. ERROR HANDLING

Common Errors

403 Forbidden: The prompt is blocked by the safety layer
503 Service Unavailable: Gemini or config is unavailable, and the fallback path is engaged
500 Internal Error: Unexpected parsing or runtime failure
429 Too Many Requests: Token quota is exhausted

Handling Guidance

Show a concise, actionable message to the user
Avoid revealing sensitive internals or over-explaining policy details
Log the full decision for debugging and review
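
The guidance above (short user message, no internals, full detail in logs) might be applied with a mapping like the one below. The status codes come from the Common Errors list; the failure-kind names, the message wording, and the toClientError helper are assumptions for illustration.

```typescript
// Hypothetical sketch: kinds, messages, and helper name are assumptions.
type FailureKind =
  | "POLICY_VIOLATION"
  | "QUOTA_EXHAUSTED"
  | "PROVIDER_UNAVAILABLE"
  | "UNEXPECTED";

interface ClientError {
  status: number;
  message: string;
}

const CLIENT_ERRORS: Record<FailureKind, ClientError> = {
  POLICY_VIOLATION: {
    status: 403,
    message: "This topic is not permitted on the platform. Please choose a different topic.",
  },
  QUOTA_EXHAUSTED: {
    status: 429,
    message: "You have reached your AI usage limit. Please try again later.",
  },
  PROVIDER_UNAVAILABLE: {
    status: 503,
    message: "Scenario generation is temporarily unavailable. Please try again shortly.",
  },
  UNEXPECTED: {
    status: 500,
    message: "Something went wrong while generating scenarios. Please try again.",
  },
};

function toClientError(
  kind: FailureKind,
  internalDetail: string,
  log: (entry: string) => void,
): ClientError {
  // The full detail is logged server-side only and never sent to the client.
  log(`[scenario-seed-safety] ${kind}: ${internalDetail}`);
  return CLIENT_ERRORS[kind];
}
```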

15. TESTING CHECKLIST

Happy Path

A safe workplace ethics or compliance topic passes the safety gate
A safe topic reaches Gemini and returns scenarios successfully
A generated response passes the output safety scan

Safety Layer Cases

Hard-block patterns reject clearly unsafe prompts
Sensitive-term groups allow educational contexts when the prompt is clearly professional
Gemini classifier blocks ambiguous or unsafe prompts
Evasion patterns such as obfuscation are detected and escalate scrutiny
Local fallback works when Gemini is unavailable
Circuit-breaker behavior prevents repeated provider failures

Regression Cases

The route returns a friendly 403 response for blocked prompts
The route does not return generated content when the output scan detects a violation
Token quota enforcement still blocks over-limit requests

16. OPEN QUESTIONS

For Frontend

Should the UI show the specific category (for example, illegal activity vs. non-educational) or only a generic policy message?
Should we add a small “Why was this blocked?” affordance for ambiguous cases?

For Backend

Should safety decisions be logged in a dedicated audit table for later review?
Should we add a lightweight admin override for testing and support scenarios?

17. OUT OF SCOPE (v1.1+)

A dedicated admin dashboard for safety decisions
User-level override controls
Multi-language safety evaluation
Fine-grained per-category tuning UI

Why: The current goal is to ship a reliable, server-side safety gate that protects the workflow without adding unnecessary complexity.

18. SUCCESS METRICS

How will we know this feature is working well?

Unsafe prompts are blocked before generation begins
Legitimate educational prompts continue to succeed
False-positive rate remains low enough for creators to trust the system
Output safety violations are caught before the content is returned

19. DEPENDENCIES

This feature depends on:

Clerk authentication
A configured Gemini API key and model selection
Existing token quota enforcement
Existing AI usage tracking infrastructure

These features depend on this:

The main AI Scenario Seed generation feature
Any future AI content generation experience that wants a safe-by-default gate

20. TIMELINE & OWNERSHIP

Implementation Ownership

Owner: Joshua Uriel Tribiana
QA: Sean Patrick (scorevi)