Feature Owner: Joshua Uriel Tribiana
Module: Creator / AI Scenario Seed
Priority: P0
Week 4 Sprint: Fully Implemented
Date: 06-30-2026
EXECUTIVE SUMMARY
What is this feature?
This subfeature adds a server-side content safety gate for AI scenario generation. It blocks unsafe prompts before generation, uses a Gemini-based classifier for nuanced review, and scans generated output before it reaches the creator. The design is built around a three-layer safety model: hard-block rules, sensitive-term allowlists, and Gemini-powered classification with local fallback.
Why does it matter?
Unsafe or low-quality prompts can lead to policy violations, reputational risk, bad learner experience, and wasted AI tokens. This layer protects the platform while keeping the creator workflow fast and intuitive. It also ensures that scenario generation remains usable for legitimate workplace training topics like ethics, conflict resolution, cybersecurity awareness, and compliance.
What’s the MVP scope?
A pre-flight safety analysis before calling Gemini for scenario generation
A three-layer safety pipeline with hard blocks, sensitive-term detection, and Gemini analysis
Local fallback behavior when Gemini is unavailable or the circuit breaker is open
Output scanning for generated scenarios before returning them to the client
Clear policy-violation responses with structured metadata for the UI
1. USER PAIN POINT & SOLUTION
Current State (Without Feature)
Creators can request AI-generated scenarios, but without a strong safety gate, unsafe or ambiguous prompts can slip through and produce problematic content. That creates risk for the business, poor user trust, and inconsistent experience for legitimate training use cases.
Pain Point
Emotional: Anxiety about generating “the wrong thing” or accidentally producing harmful content
Functional: Manual review is slow and unreliable, especially when prompts are ambiguous or use obfuscated language
Business Impact: Compliance risk, wasted token spend, poor creator confidence, and possible platform policy violations
Future State (With Feature)
Creators can confidently generate workplace training scenarios knowing that unsafe content is blocked early, legitimate educational topics are allowed, and ambiguous prompts are handled safely. The feature protects the platform without making the flow feel heavy or punitive.
Marketing Hook
“Safe by default: generate workplace training scenarios with a built-in content guardrail that blocks harmful prompts and keeps learning content on-policy.”
3. 4D FRAMEWORK MAPPING
Diagnose
The safety layer helps diagnose whether a prompt is appropriate for the platform before it becomes a scenario. It catches harmful intent early and prevents bad content from entering the generation workflow.
Design
The feature helps content designers stay within safe educational boundaries while still allowing legitimate topics such as ethics, compliance, conflict prevention, anti-harassment, and cybersecurity awareness.
Develop
Creators can continue to build scenarios quickly because the safety layer runs in the background and only interrupts when the request clearly violates policy or is too ambiguous to be safely approved.
Deliver
The output that reaches the creator is screened twice, once on the prompt before generation and once on the generated content, which improves trust and reduces manual cleanup before publishing or delivery.
4. USER FLOWS
Entry Point
A creator opens the scenario generation experience from the quest editor and submits a topic, complexity, and optional context.
Success Criteria
The request passes the safety gate
Scenario content is generated and returned to the creator
Unsafe requests fail fast with a clear explanation and do not consume the generation pipeline unnecessarily
Main Flow (Happy Path)
Creator submits a prompt for scenario generation
The backend validates the request shape
The request is passed through the safety pipeline
If it passes, Gemini generates the scenarios
The output is scanned for safety violations
The route returns the scenarios to the client
Edge Cases
Obfuscated or evasive input: The normalization layer detects attempts to bypass filters and raises the scrutiny level
Gemini unavailable: The system falls back to local safety logic, and the circuit breaker stops repeated calls to the failing provider
Ambiguous prompt: The Gemini classifier blocks it when unsure rather than allowing it through
Output violation: The request is rejected even if the input was allowed
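The obfuscation edge case can be illustrated with a minimal sketch. It assumes a hypothetical normalizePrompt helper that strips zero-width characters, reverses common digit/symbol substitutions, and collapses letters separated by punctuation. The helper name, the substitution table, and the technique labels are assumptions made for illustration, not the service's actual normalization layer.

```typescript
// Illustrative sketch only: helper name, substitution table, and technique
// labels are assumptions, not the real normalization layer.
interface NormalizedPrompt {
  normalized: string;
  evasionDetected: boolean;
  evasionTechniques: string[];
}

const LEET_MAP: Record<string, string> = {
  "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s",
};

function normalizePrompt(input: string): NormalizedPrompt {
  const techniques: string[] = [];

  // 1. Strip zero-width characters that can be hidden inside words.
  const lowered = input.toLowerCase();
  const visible = lowered.replace(/[\u200B-\u200D\uFEFF]/g, "");
  if (visible !== lowered) techniques.push("zero_width_characters");

  // 2. Undo digit/symbol substitutions, but only in tokens that also contain
  //    letters, so plain numbers such as "10" are left alone.
  const folded = visible.normalize("NFKC");
  const deLeeted = folded
    .split(/(\s+)/)
    .map((token) =>
      /[a-z]/.test(token) && /[013457@$]/.test(token)
        ? token.replace(/[013457@$]/g, (ch) => LEET_MAP[ch] ?? ch)
        : token,
    )
    .join("");
  if (deLeeted !== folded) techniques.push("character_substitution");

  // 3. Collapse single letters separated by spaces or punctuation.
  const collapsed = deLeeted.replace(/\b(?:[a-z][\s.\-_*]){2,}[a-z]\b/g, (run) =>
    run.replace(/[\s.\-_*]/g, ""),
  );
  if (collapsed !== deLeeted) techniques.push("separated_letters");

  return {
    normalized: collapsed,
    evasionDetected: techniques.length > 0,
    evasionTechniques: techniques,
  };
}
```

A caller could run the later safety layers against normalized and raise scrutiny whenever evasionDetected is true. The rules here are deliberately simple and will produce some false positives on short tokens, so they are a starting point rather than a complete filter.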
Decision Points
IF the prompt is clearly unsafe → block immediately with a policy violation response
IF the prompt is ambiguous or obfuscated → escalate to the Gemini classifier and apply stricter scrutiny
IF the classifier or provider fails → use local fallback and return a safe default outcome
ELSE → proceed with generation
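The decision points above can be sketched as a single function. This is a sketch under stated assumptions: the Triage labels, the Classifier signature, and the Decision shape are hypothetical, and the fallback is assumed to fail closed (block) because the spec only says it returns "a safe default outcome".

```typescript
// Hypothetical types; the real service may model these differently.
type Triage = "unsafe" | "ambiguous" | "clean";

type Decision =
  | { action: "block"; isHardBlock: boolean; isLocalFallback: boolean }
  | { action: "proceed"; isLocalFallback: boolean };

// Assumed classifier contract: resolves to true when the prompt is safe,
// rejects when the provider is unavailable or the reply is unusable.
type Classifier = (prompt: string) => Promise<boolean>;

async function decide(
  prompt: string,
  triage: Triage,
  classify: Classifier,
): Promise<Decision> {
  // Clearly unsafe: block immediately, no provider call.
  if (triage === "unsafe") {
    return { action: "block", isHardBlock: true, isLocalFallback: false };
  }
  // Clearly fine: proceed to generation, no provider call.
  if (triage === "clean") {
    return { action: "proceed", isLocalFallback: false };
  }
  // Ambiguous or obfuscated: escalate to the classifier.
  try {
    const safe = await classify(prompt);
    return safe
      ? { action: "proceed", isLocalFallback: false }
      : { action: "block", isHardBlock: false, isLocalFallback: false };
  } catch {
    // Assumption: the local fallback fails closed for ambiguous prompts.
    return { action: "block", isHardBlock: false, isLocalFallback: true };
  }
}
```

Only the ambiguous branch reaches the classifier, so clear cases cost no provider call.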
5. INFORMATION ARCHITECTURE
Primary Information (Always visible in the request lifecycle)
Topic
Complexity
Context
Safety decision result
User-facing reason/message
Secondary Information
Safety category
Flagged fields
Evasion detection details
Whether the decision used the local fallback
Whether the decision was a hard block
Circuit-breaker state
Internal reasoning from the Gemini classifier
Layer-by-layer diagnostic info for debugging
Actions
Primary CTA:
Generate scenario
Secondary Actions:
Revise topic/context
Retry after correction
Contact support if the block appears incorrect
6. WIREFRAMES
No dedicated new screen is required for this subfeature. The safety experience is currently surfaced through the existing generation flow and the route response handling.
Key Screens:
Existing generation modal / creator workflow
Inline error state for blocked requests
Loading state while the safety check and generation call are in progress
Error state for provider or quota failures
Annotations:
The safety gate runs before the generation request is sent to Gemini
The UI should surface a user-friendly message based on the returned category, reasoning, and flaggedFields
No additional form field is required for this flow
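One way the UI could turn the returned category, reasoning, and flaggedFields into a friendly message is sketched below. The buildBlockedMessage helper, the BlockedMessage shape, and all user-facing copy are assumptions for illustration; only the category names and field names come from this spec.

```typescript
type BlockedCategory =
  | "EXPLICIT_SEXUAL"
  | "ILLEGAL_ACTIVITY"
  | "HATE_SPEECH"
  | "VIOLENCE_GLORIFICATION"
  | "NON_EDUCATIONAL"
  | "COMPANY_POLICY_VIOLATION";

type FlaggedField = "topic" | "context";

interface PolicyViolationDetails {
  category: BlockedCategory;
  reasoning: string;
  flaggedFields: FlaggedField[];
}

// Assumed user-facing copy; the real wording is a product decision.
const CATEGORY_COPY: Record<BlockedCategory, string> = {
  EXPLICIT_SEXUAL: "sexually explicit content",
  ILLEGAL_ACTIVITY: "illegal activity",
  HATE_SPEECH: "hate speech",
  VIOLENCE_GLORIFICATION: "glorification of violence",
  NON_EDUCATIONAL: "content without a clear training purpose",
  COMPANY_POLICY_VIOLATION: "content that conflicts with company policy",
};

interface BlockedMessage {
  headline: string;          // short line shown in the inline error state
  detail: string;            // the reasoning string returned by the route
  highlight: FlaggedField[]; // existing form fields to mark as needing revision
}

function buildBlockedMessage(details: PolicyViolationDetails): BlockedMessage {
  const where =
    details.flaggedFields.length > 0 ? details.flaggedFields.join(" and ") : "request";
  return {
    headline: `Your ${where} could not be used because it appears to involve ${CATEGORY_COPY[details.category]}.`,
    detail: details.reasoning,
    highlight: details.flaggedFields,
  };
}
```

Because highlight reuses the existing topic and context fields, no extra form field is needed to show the creator what to revise.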
7. WIREFLOWS
The current flow is:
Creator enters topic/complexity/context
Route validates and enforces quota
Safety service runs the three-layer checks
If safe, Gemini generates the scenarios
Generated output is scanned for violations
The route returns either the generated scenarios or a policy-violation response
A simple representation is:
Creator Input → Request Validation → Hard Blocks → Sensitive Term Review → Gemini Classifier → Output Scan → Response
8. PROTOTYPE
Figma Prototype Link: Not currently available for this subfeature
How to test:
Open the scenario generation experience
Submit a safe workplace training topic
Submit a clearly unsafe topic and observe the blocked response
Submit an ambiguous or obfuscated topic and confirm the stricter handling
9. BACKEND SCHEMA
Data Model
No new database table is required for this subfeature. The safety layer operates through server-side service logic and existing AI usage tracking infrastructure.
Relevant Runtime Types
import { z } from "zod";

// Shape that the Gemini classifier's JSON reply must satisfy.
export const scenarioSeedSafetyAIResultSchema = z.object({
  passed: z.boolean(),
  category: z.enum([
    "CLEAN",
    "EXPLICIT_SEXUAL",
    "ILLEGAL_ACTIVITY",
    "HATE_SPEECH",
    "VIOLENCE_GLORIFICATION",
    "NON_EDUCATIONAL",
    "COMPANY_POLICY_VIOLATION",
  ]),
  reasoning: z.string().default("No reasoning provided"),
  flaggedFields: z.array(z.enum(["topic", "context"])).optional(),
});

// Combined result of the safety layers, including evasion and fallback flags.
export interface ScenarioSeedSafetyFullResult {
  passed: boolean;
  category: string;
  reasoning: string;
  flaggedFields: Array<"topic" | "context">;
  evasionDetected: boolean;
  evasionTechniques: string[];
  isLocalFallback: boolean;
  isHardBlock: boolean;
}
Constraints
Safety decisions must remain deterministic where possible
The Gemini classifier must return valid JSON only
Unsafe output must be blocked even when the input passes the pre-check
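The "valid JSON only" constraint suggests parsing the classifier reply strictly and treating anything unparseable as a classifier failure. The sketch below hand-validates the reply so that it runs without dependencies; in the service, scenarioSeedSafetyAIResultSchema would presumably do this job. The parseClassifierResponse name, the null-on-failure convention, and the consistency rule (passed implies CLEAN) are assumptions.

```typescript
const CATEGORIES = [
  "CLEAN",
  "EXPLICIT_SEXUAL",
  "ILLEGAL_ACTIVITY",
  "HATE_SPEECH",
  "VIOLENCE_GLORIFICATION",
  "NON_EDUCATIONAL",
  "COMPANY_POLICY_VIOLATION",
] as const;

type Category = (typeof CATEGORIES)[number];

interface AIResult {
  passed: boolean;
  category: Category;
  reasoning: string;
  flaggedFields: Array<"topic" | "context">;
}

// Returns null when the reply is not usable, so the caller can fall back.
function parseClassifierResponse(raw: string): AIResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw); // Markdown fences or extra prose make this throw.
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const obj = data as Record<string, unknown>;

  const passed = obj.passed;
  if (typeof passed !== "boolean") return null;

  const category = CATEGORIES.find((c) => c === obj.category);
  if (category === undefined) return null;

  // Assumed consistency rule: a passing verdict must carry the CLEAN category.
  if (passed && category !== "CLEAN") return null;

  const reasoning =
    typeof obj.reasoning === "string" ? obj.reasoning : "No reasoning provided";
  const rawFields: unknown[] = Array.isArray(obj.flaggedFields) ? obj.flaggedFields : [];
  const flaggedFields = rawFields.filter(
    (f): f is "topic" | "context" => f === "topic" || f === "context",
  );

  return { passed, category, reasoning, flaggedFields };
}
```

Returning null instead of a guessed verdict keeps the decision with the caller, which can then apply whatever fallback policy the service uses.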
10. API ENDPOINTS
Primary Endpoint: POST /api/creator/scenarios/generate
Purpose: Generates scenario content after passing the safety gate
Auth: Required (Clerk user session)
Request Body:
{
  "topic": "Handling difficult customers",
  "count": 3,
  "complexity": "intermediate",
  "context": "Retail environment"
}
Response 200:
{
  "success": true,
  "data": {
    "scenarios": [],
    "tokensUsed": 1800,
    "modelUsed": "gemini-2.5-flash"
  }
}
Response 403 (Policy Violation):
{
  "success": false,
  "error": "This topic is not permitted on the platform (ILLEGAL_ACTIVITY). Please choose a different topic.",
  "details": {
    "category": "ILLEGAL_ACTIVITY",
    "reasoning": "The request was blocked by the safety layer.",
    "flaggedFields": ["topic"],
    "evasionDetected": false,
    "isHardBlock": true
  }
}
Response 503 / 500:
Used when Gemini is unavailable, config is missing, or generation fails unexpectedly.
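As an illustration of how the 403 body could be derived from a ScenarioSeedSafetyFullResult, the sketch below copies only the fields that appear in the sample response and leaves evasionTechniques and isLocalFallback out. The toPolicyViolationBody helper and the PolicyViolationBody type are hypothetical names; the error string follows the sample response.

```typescript
interface ScenarioSeedSafetyFullResult {
  passed: boolean;
  category: string;
  reasoning: string;
  flaggedFields: Array<"topic" | "context">;
  evasionDetected: boolean;
  evasionTechniques: string[];
  isLocalFallback: boolean;
  isHardBlock: boolean;
}

// Hypothetical response type mirroring the sample 403 payload.
interface PolicyViolationBody {
  success: false;
  error: string;
  details: {
    category: string;
    reasoning: string;
    flaggedFields: Array<"topic" | "context">;
    evasionDetected: boolean;
    isHardBlock: boolean;
  };
}

function toPolicyViolationBody(result: ScenarioSeedSafetyFullResult): PolicyViolationBody {
  return {
    success: false,
    error: `This topic is not permitted on the platform (${result.category}). Please choose a different topic.`,
    // Only fields shown in the sample response are exposed to the client.
    details: {
      category: result.category,
      reasoning: result.reasoning,
      flaggedFields: result.flaggedFields,
      evasionDetected: result.evasionDetected,
      isHardBlock: result.isHardBlock,
    },
  };
}
```

Building the body from an explicit field list, rather than spreading the full result, keeps internal diagnostics out of the client response by default.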
11. DATA REQUIREMENTS
Frontend Needs
The frontend should be able to:
Display a friendly user message when the request is blocked
Show a generic non-technical error when the provider is unavailable
Preserve the existing workflow without adding extra form fields
API Calls Frontend Will Make
POST /api/creator/scenarios/generate as the primary entry point
Caching Strategy
No client-side caching is required for this safety layer. The safety decision should be treated as a dynamic runtime check for every request.
12. PERFORMANCE CONSIDERATIONS
Runtime Optimization
Hard-block patterns run first and reject obvious cases quickly
Sensitive-term checks are lightweight
The Gemini classifier is only used for ambiguous cases rather than every request
The normalization layer prevents wasted provider calls when input is clearly evasive
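The ordering described above, cheap checks first and the classifier only for ambiguous cases, could look roughly like the following triage step. The pattern list, the term lists, and the triagePrompt name are illustrative assumptions; the real hard-block rules and sensitive-term groups are not specified in this document.

```typescript
type Triage = "unsafe" | "ambiguous" | "clean";

// Illustrative rules only; the real lists are not part of this spec.
const HARD_BLOCK_PATTERNS: RegExp[] = [
  /\b(?:build|make)\s+(?:a\s+)?(?:bomb|explosive)\b/i,
];
const SENSITIVE_TERMS = ["harassment", "phishing", "violence", "discrimination"];
const EDUCATIONAL_MARKERS = ["training", "awareness", "prevention", "compliance", "policy"];

function triagePrompt(topic: string, context: string = ""): Triage {
  const text = `${topic} ${context}`.toLowerCase();

  // Layer 1: hard blocks reject obvious cases without any provider call.
  if (HARD_BLOCK_PATTERNS.some((pattern) => pattern.test(text))) return "unsafe";

  // Layer 2: sensitive terms are allowed when the framing is clearly educational.
  const hasSensitiveTerm = SENSITIVE_TERMS.some((term) => text.includes(term));
  if (!hasSensitiveTerm) return "clean";
  const isEducational = EDUCATIONAL_MARKERS.some((marker) => text.includes(marker));

  // Anything else is left for the classifier (layer 3).
  return isEducational ? "clean" : "ambiguous";
}
```

Both local layers are simple string and regex checks, so their cost is negligible next to a classifier round trip.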
API Response Time
Target: the safety layer should add minimal overhead and remain under the overall generation latency budget. The current flow is designed so that simple blocks fail fast before Gemini processing begins.
Reliability
The circuit breaker prevents the system from repeatedly relying on a provider that is failing or misconfigured.
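A minimal circuit breaker sketch is shown below to make the reliability behavior concrete. The class name, the failure threshold, the cooldown, and the half-open probing rule are assumptions; the spec does not state the actual values or state model.

```typescript
type BreakerState = "closed" | "open" | "half_open";

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // Defaults are placeholder values, not the service's real configuration.
  constructor(failureThreshold = 3, cooldownMs = 30000, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  get state(): BreakerState {
    if (this.openedAt === null) return "closed";
    // After the cooldown, allow a trial request to probe the provider.
    return this.now() - this.openedAt >= this.cooldownMs ? "half_open" : "open";
  }

  // When false, the caller should skip the provider and use the local fallback.
  canRequest(): boolean {
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed probe re-opens immediately; otherwise open at the threshold.
    if (this.state === "half_open" || this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}
```

The injected now function exists only so the cooldown can be exercised in tests without waiting on a real clock.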
Who can access this feature?
Creator: ✓
Reviewer: ✗
Learner: ✗
The request must be authenticated through Clerk
All enforcement happens on the server; the route does not trust any safety state supplied by the client
The safety layer is enforced on the server before generation begins
Data Validation
Request shape is validated with Zod
Prompt content is checked for policy and quality issues before generation
Generated output is scanned before it is returned to the creator
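Since the scenario shape is not defined in this document, the output scan can be sketched in a shape-agnostic way: walk every string in the generated payload and report the path of each one that a checker flags. The scanOutput and collectStrings names and the injected isViolation checker are hypothetical.

```typescript
interface OutputScanResult {
  passed: boolean;
  violations: string[]; // paths of the strings that were flagged
}

interface FoundString {
  path: string;
  text: string;
}

// Recursively gather every string in the payload together with its path.
function collectStrings(value: unknown, path: string, out: FoundString[]): void {
  if (typeof value === "string") {
    out.push({ path, text: value });
  } else if (Array.isArray(value)) {
    value.forEach((item, i) => collectStrings(item, `${path}[${i}]`, out));
  } else if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      collectStrings(record[key], `${path}.${key}`, out);
    }
  }
}

// isViolation is injected so the same local rules used for input can be reused.
function scanOutput(
  scenarios: unknown,
  isViolation: (text: string) => boolean,
): OutputScanResult {
  const strings: FoundString[] = [];
  collectStrings(scenarios, "scenarios", strings);
  const violations = strings.filter((s) => isViolation(s.text)).map((s) => s.path);
  return { passed: violations.length === 0, violations };
}
```

If passed is false, the route would return the policy-violation response instead of the scenarios, and the recorded paths can go into the debug log.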
14. ERROR HANDLING
Common Errors
403 Forbidden: The prompt is blocked by the safety layer
503 Service Unavailable: Gemini or config is unavailable, and the fallback path is engaged
500 Internal Error: Unexpected parsing or runtime failure
429 Too Many Requests: Token quota is exhausted
Handling Guidance
Show a concise, actionable message to the user
Avoid revealing sensitive internals or over-explaining policy details
Log the full decision for debugging and review
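The status codes listed above can be tied to failure kinds with a small mapping. This is a sketch: the GenerationFailure union, the toHttpError helper, and the message wording are assumptions, while the status codes follow the list in this section.

```typescript
// Hypothetical failure kinds; the service may classify errors differently.
type GenerationFailure =
  | { kind: "policy_violation" }
  | { kind: "quota_exhausted" }
  | { kind: "provider_unavailable" }
  | { kind: "unexpected" };

interface HttpError {
  status: 403 | 429 | 500 | 503;
  message: string; // concise, actionable, no internals
}

function toHttpError(failure: GenerationFailure): HttpError {
  switch (failure.kind) {
    case "policy_violation":
      return { status: 403, message: "This topic is not permitted. Please choose a different topic." };
    case "quota_exhausted":
      return { status: 429, message: "You have reached your generation limit. Please try again later." };
    case "provider_unavailable":
      return { status: 503, message: "Scenario generation is temporarily unavailable. Please try again shortly." };
    case "unexpected":
      return { status: 500, message: "Something went wrong while generating scenarios. Please try again." };
  }
}
```

The full decision and any underlying error would be logged separately on the server, so the client-facing message can stay generic.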
15. TESTING CHECKLIST
Happy Path
A safe workplace ethics or compliance topic passes the safety gate
A safe topic reaches Gemini and returns scenarios successfully
A generated response passes the output safety scan
Safety Layer Cases
Hard-block patterns reject clearly unsafe prompts
Sensitive-term groups allow educational contexts when the prompt is clearly professional
Gemini classifier blocks ambiguous or unsafe prompts
Evasion patterns such as obfuscation are detected and escalate scrutiny
Local fallback works when Gemini is unavailable
Circuit-breaker behavior stops repeated calls to a failing provider
Regression Cases
The route returns a friendly 403 response for blocked prompts
The route does not return generated content when the output scan detects a violation
Token quota enforcement still blocks over-limit requests
16. OPEN QUESTIONS
For Frontend
Should the UI show the specific category (for example, illegal activity vs. non-educational) or only a generic policy message?
Should we add a small “Why was this blocked?” affordance for ambiguous cases?
For Backend
Should safety decisions be logged in a dedicated audit table for later review?
Should we add a lightweight admin override for testing and support scenarios?
17. OUT OF SCOPE (v1.1+)
A dedicated admin dashboard for safety decisions
User-level override controls
Multi-language safety evaluation
Fine-grained per-category tuning UI
Why: The current goal is to ship a reliable, server-side safety gate that protects the workflow without adding unnecessary complexity.
18. SUCCESS METRICS
How will we know this feature is working well?
Unsafe prompts are blocked before generation begins
Legitimate educational prompts continue to succeed
False-positive rate remains low enough for creators to trust the system
Output safety violations are caught before the content is returned
19. DEPENDENCIES
This feature depends on:
Clerk authentication
A configured Gemini API key and model selection
Existing token quota enforcement
Existing AI usage tracking infrastructure
These features depend on this:
The main AI Scenario Seed generation feature
Any future AI content generation experience that wants a safe-by-default gate
20. TIMELINE & OWNERSHIP
Implementation Ownership
Owner: Joshua Uriel Tribiana
QA: Sean Patrick (scorevi)