Agent Role in the System

1. What Is an Agent in This System?

In this architecture, an Agent is not just a chatbot that replies to messages. It is a structured workflow engine built on top of LangGraph. When a user sends a message, the agent does not simply generate text: it evaluates intent, decides what kind of operation is required, executes a controlled sequence of steps, calls external tools when necessary, and streams results back to the client.

An agent always works through defined stages. It receives input, classifies the request, determines what branch of logic to follow, and updates its internal state as it moves forward. When it needs to read or modify project data, it does not access databases directly. Instead, it delegates those operations to MCP (Model Context Protocol) servers, which provide validated and structured tool access.

All agents in this system share several core characteristics. They are stateful, meaning each conversation maintains structured state through LangGraph objects. They are thread-aware, meaning every conversation thread gets its own managed agent instance. They are integrated with PostgreSQL for agent registration and MCP server discovery. They stream responses using Server-Sent Events (SSE) or newline-delimited JSON (NDJSON). Finally, they rely on a configurable LLM provider, such as OpenAI, Anthropic, or Bedrock, chosen per deployment.

This architectural pattern remains consistent across the Project Structure, StoryCraft, Image Generation, and Video Generation systems.

2. The High-Level Architecture

At a high level, the architecture is built around routing and orchestration. A user message enters the system through an API endpoint. From there, it is handed to a router component that determines what kind of task is being requested. Based on that classification, execution moves into a specialized branch, which may involve conversation handling, querying project data, or performing structured generation workflows.

LangGraph orchestrates this process. Each step in the workflow is modeled as a node in a graph. Some nodes call an LLM. Others perform technical validation. Some invoke MCP tools. Others simply decide what node should execute next based on current state. The result is not a linear script, but a dynamic execution graph that can branch, loop, and terminate depending on context.
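
The node-and-edge model described above can be sketched in plain Python. This is only an illustration of the idea and deliberately does not use the LangGraph API: the node names, the dictionary-shaped state, the `END` sentinel, and the `run_graph` helper are all assumptions made for the example. Each node takes the current state and returns an updated copy, and a routing function picks the next node from that state.

```python
from typing import Callable, Dict

State = Dict[str, object]
Node = Callable[[State], State]

END = "__end__"  # hypothetical sentinel that marks termination


def classify(state: State) -> State:
    # Stand-in for an LLM call: a trivial keyword rule.
    text = str(state["message"]).lower()
    intent = "query" if "how many" in text else "chat"
    return {**state, "intent": intent}


def chat(state: State) -> State:
    return {**state, "reply": "Hello! How can I help?"}


def query(state: State) -> State:
    # A real node would call an MCP tool here instead of a canned answer.
    return {**state, "reply": "You have 3 cards."}


NODES: Dict[str, Node] = {"classify": classify, "chat": chat, "query": query}


def next_node(current: str, state: State) -> str:
    # Conditional edge: the next node depends on the current state.
    if current == "classify":
        return str(state["intent"])
    return END


def run_graph(state: State, start: str = "classify", max_steps: int = 10) -> State:
    current = start
    for _ in range(max_steps):  # guard against accidental infinite loops
        state = NODES[current](state)
        current = next_node(current, state)
        if current == END:
            return state
    raise RuntimeError("graph did not terminate")
```

Calling `run_graph({"message": "hi"})` takes the chat branch, while a message containing "how many" is routed to the query node. The step limit is a design choice for the sketch: a graph that can loop needs some bound so that a routing mistake cannot run forever.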

PostgreSQL supports the system behind the scenes. It stores agent registration metadata and allows discovery of MCP servers. MCP servers then provide structured interfaces to external systems, ensuring that agents never operate outside validated boundaries.

3. Types of Agents

Although the architecture is shared, different agents specialize in different responsibilities.

Router Agent

The RouterAgent is responsible for understanding what the user wants. It reads the incoming message and classifies it into a structured intent category. That intent might be simple conversation, a project query, a content generation request, or a domain-specific action like creation or update.

This classification is performed using structured LLM output. Once intent is identified, the graph routes execution to the appropriate branch. The router does not perform the task itself; it only decides where the workflow should go next.
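
As a rough sketch of what "structured LLM output" can mean here, the example below assumes the model replies with a small JSON object and validates it against a fixed set of intents. The `Intent` values, the `RouterDecision` shape, the branch names, and the fallback to the chat branch on malformed output are all assumptions for illustration; the source does not specify them.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Intent(str, Enum):
    # Hypothetical intent categories, loosely following the text above.
    CHAT = "chat"
    QUERY = "query"
    GENERATE = "generate"
    UPDATE = "update"


@dataclass(frozen=True)
class RouterDecision:
    intent: Intent
    reason: str


def parse_router_output(raw: str) -> RouterDecision:
    """Validate the model's JSON reply; fall back to CHAT if it is malformed."""
    try:
        data = json.loads(raw)
        return RouterDecision(Intent(data["intent"]), str(data.get("reason", "")))
    except (KeyError, ValueError, TypeError, AttributeError):
        # json.JSONDecodeError is a subclass of ValueError, so it is covered.
        return RouterDecision(Intent.CHAT, "fallback: unparseable router output")


# Hypothetical mapping from intent to the graph branch that handles it.
BRANCHES = {
    Intent.CHAT: "chat_node",
    Intent.QUERY: "query_node",
    Intent.GENERATE: "generate_node",
    Intent.UPDATE: "update_node",
}


def route(raw: str) -> str:
    # The router only chooses a branch; it never performs the task itself.
    return BRANCHES[parse_router_output(raw).intent]
```

Restricting the result to an enum means an unexpected label from the model cannot send execution to a branch that does not exist.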

Chat Agent

The ChatAgent handles general conversation. If the user greets the system, asks for clarification, or requests help, the chat branch produces a context-aware response without launching structured workflows.

This agent is used in multiple systems, including image generation, video generation, and StoryCraft environments. It keeps interaction fluid and conversational, while heavier workflows are triggered only when necessary.

Query Agent

The QueryAgent answers structured project questions. Instead of relying purely on generative reasoning, it calls MCP tools to retrieve real project data such as cards, shots, categories, or metadata.

Its role is read-only. It does not modify state in external systems. It gathers information and explains it clearly to the user. This ensures that responses remain grounded in actual project structure rather than inferred assumptions.

Project Structure Agent

The Project Structure Agent manages production workflows. It operates in both pre-production and production modes.

In pre-production, it works in a Kanban-style environment with categories like Characters, Environments, and Props. Cards can contain generation tasks for images, video, text, or 3D assets. Tags and caching mechanisms support organization and memory.

In production mode, the system shifts to a timeline-based structure with sequences and shots. Each shot contains metadata such as camera details, lighting, dialogue, and assigned characters. All read and write operations are performed through the project structure MCP server, maintaining strict separation between agent logic and data storage.

StoryCraft Action Agent

The StoryCraft Action Agent follows a domain-driven action model. Instead of hardcoding workflow sequences, it operates through a system of discrete actions. Each action runs in its own node and is governed by action policies.

The workflow begins by determining which actions are allowed given the current state. An LLM may then select the next appropriate action. Some actions are technical and automatically continue execution. Others are terminal and wait for user input.

This structure allows StoryCraft to manage story Bible creation, updates, and ontology progression in a flexible but controlled manner. The agent continuously evaluates what is permitted, what is required, and what should happen next.
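
The action model can be illustrated with a small sketch. The action names, the policy functions, and the `select` callback that stands in for the LLM are hypothetical; the point is only the loop itself: filter actions by policy, let a selector choose among the permitted ones, continue after technical actions, and stop at a terminal one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass(frozen=True)
class Action:
    name: str
    allowed: Callable[[State], bool]  # policy: may this action run in this state?
    effect: Callable[[State], State]  # what the action does to the state
    terminal: bool                    # True: stop and wait for user input


# Hypothetical actions; a real system would define many more.
ACTIONS: Dict[str, Action] = {
    a.name: a
    for a in (
        Action("create_bible", lambda s: not s.get("bible_exists"),
               lambda s: {**s, "bible_exists": True}, terminal=False),
        Action("update_bible", lambda s: bool(s.get("bible_exists")),
               lambda s: {**s, "updated": True}, terminal=False),
        Action("ask_user", lambda s: True, lambda s: s, terminal=True),
    )
}


def allowed_actions(state: State) -> List[str]:
    return [name for name, action in ACTIONS.items() if action.allowed(state)]


def run(state: State, select: Callable[[State, List[str]], str],
        max_steps: int = 5) -> List[str]:
    """Run actions until a terminal one executes; return the names in order."""
    trace: List[str] = []
    for _ in range(max_steps):
        allowed = allowed_actions(state)
        choice = select(state, allowed)  # stands in for the LLM's decision
        if choice not in allowed:
            raise ValueError(f"action {choice!r} is not permitted in this state")
        trace.append(choice)
        action = ACTIONS[choice]
        state = action.effect(state)
        if action.terminal:
            break
    return trace


def pick(state: State, allowed: List[str]) -> str:
    # Toy selector: create the bible if permitted, otherwise hand back to the user.
    return "create_bible" if "create_bible" in allowed else "ask_user"
```

With an empty state, `run({}, pick)` executes `create_bible` as a technical step and then stops at `ask_user`. Checking the selector's choice against the allowed list keeps the policy authoritative even when the selector is a model that may answer incorrectly.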

Image Generation Agent

The ImageGenerationAgent v2.0 manages structured image generation through a carefully staged pipeline.

The workflow begins with intent classification. If generation is requested, the agent gathers context such as card data, shot information, and category details. It validates parameters to ensure they fall within allowed ranges. Then it builds a prompt tailored to the context.
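
The validation step might look like the following sketch. The parameter names and the numeric limits are invented for illustration, since the source does not list the actual ranges; the pattern shown is to collect every violation into a list rather than stopping at the first one, so that all problems can be stored in state or reported together.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ImageParams:
    # Hypothetical generation parameters.
    width: int
    height: int
    num_images: int


# Hypothetical inclusive (min, max) limits per parameter.
LIMITS: Dict[str, Tuple[int, int]] = {
    "width": (256, 2048),
    "height": (256, 2048),
    "num_images": (1, 4),
}


def validate(params: ImageParams) -> List[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors: List[str] = []
    for name, (low, high) in LIMITS.items():
        value = getattr(params, name)
        if not low <= value <= high:
            errors.append(f"{name}={value} is outside [{low}, {high}]")
    return errors
```

A workflow built this way would proceed to prompt construction only when `validate` returns an empty list.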

Before calling the generation tool, the agent presents a confirmation summary to the user. Only after receiving explicit approval does it invoke the MCP tool responsible for creating the generation task. Once the tool call completes, the agent streams a final summary and persists the conversation state.

Video Generation Agent

The VideoGenerationAgent v2.0 follows a similar structure, adapted for video tasks. It uses a router, gathers context, validates parameters, builds prompts, formats confirmation messages, analyzes user confirmation responses, and finally calls a video generation MCP tool.

Tool access is tightly controlled via YAML configuration, ensuring that only approved tools are available to the agent. This maintains operational safety and predictable behavior.

4. The Common Workflow Pattern

Although each agent serves a different purpose, most follow the same broad lifecycle.

A request enters through a /chat endpoint. Authentication data may be decoded for logging. The system retrieves or creates a thread-specific agent instance and initializes state.
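
The "retrieve or create a thread-specific agent instance" step can be sketched as a small registry keyed by thread ID. The `Agent` and `AgentRegistry` classes below are hypothetical stand-ins, and the in-memory dictionary is an assumption; the source does not say how instances are actually stored or evicted.

```python
import threading
from typing import Callable, Dict


class Agent:
    """Hypothetical stand-in for a managed agent instance."""

    def __init__(self, thread_id: str) -> None:
        self.thread_id = thread_id
        self.state: Dict[str, object] = {"messages": []}


class AgentRegistry:
    def __init__(self, factory: Callable[[str], Agent] = Agent) -> None:
        self._factory = factory
        self._agents: Dict[str, Agent] = {}
        self._lock = threading.Lock()

    def get_or_create(self, thread_id: str) -> Agent:
        # The lock ensures two concurrent requests for the same thread
        # cannot each create their own instance.
        with self._lock:
            agent = self._agents.get(thread_id)
            if agent is None:
                agent = self._factory(thread_id)
                self._agents[thread_id] = agent
            return agent
```

Repeated calls with the same thread ID return the same object, so state accumulated in one request is visible to the next request on that thread.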

The router classifies the message into an intent category. If additional information is required, the agent loads project data, ontology information, or template structures using MCP tools.

Next comes action selection. Depending on the domain, allowed actions are filtered based on state. An LLM may decide which action to execute. The graph transitions to the corresponding node.

Execution may involve tool calls, prompt construction, validation steps, confirmation dialogs, entity creation, deletion, or generation task creation. Any errors encountered are stored in state.

Throughout this process, responses are streamed incrementally. The client receives structured events such as tool invocation markers, reply tokens, and completion signals. Messages are persisted progressively so that conversation history remains consistent.
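
To make the streaming description concrete, here is a minimal sketch of an event stream serialized as NDJSON, with a helper that wraps a line as an SSE `data:` frame. The event type names (`tool_call`, `token`, `done`) and field names are assumptions for the example, not the system's actual wire format.

```python
import json
from typing import Iterable, Iterator, Optional


def stream_events(tokens: Iterable[str],
                  tool_name: Optional[str] = None) -> Iterator[str]:
    """Yield one JSON object per line (NDJSON), ending with a completion event."""
    if tool_name is not None:
        yield json.dumps({"type": "tool_call", "name": tool_name}) + "\n"
    for token in tokens:
        yield json.dumps({"type": "token", "text": token}) + "\n"
    yield json.dumps({"type": "done"}) + "\n"


def to_sse(ndjson_line: str) -> str:
    # An SSE message is a "data:" field terminated by a blank line.
    return f"data: {ndjson_line.strip()}\n\n"
```

Because the function is a generator, a web framework can send each line as soon as it is produced, and the client can parse each line independently without waiting for the full response.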

5. State Management

Each agent relies on a typed state object that evolves throughout the workflow. Examples include VideoGenerationState, ImageGenerationState, and generic AgentState objects.

State typically contains conversation history, the current intent, generation configuration, contextual project data, confirmation decisions, and any errors that arise.

To prevent token overflow, message history is limited through custom reducers that retain only the most recent interactions. This keeps prompt size bounded while preserving the recent context that conversational continuity depends on.
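
A history-capping reducer can be sketched as a function that merges the existing messages with the new ones and keeps only the tail. The limit of 20 and the names below are assumptions. The `Annotated` field shows the general way a reducer is attached to a state field in LangGraph-style typed state; here it is plain `typing` metadata, and the reducer is called directly.

```python
from typing import Annotated, Dict, List, TypedDict

Message = Dict[str, str]

MAX_MESSAGES = 20  # hypothetical cap; the real limit is not given in the text


def keep_recent(existing: List[Message], new: List[Message]) -> List[Message]:
    """Reducer: append the new messages, then retain only the most recent ones."""
    merged = existing + new
    return merged[-MAX_MESSAGES:]


class AgentState(TypedDict):
    # The reducer travels with the field as type metadata.
    messages: Annotated[List[Message], keep_recent]
    intent: str
```

A production reducer would likely need more care than this, for example to keep a system message pinned or to avoid separating a tool call from its result; those concerns are outside this sketch.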

6. MCP Integration

A defining principle of this architecture is that agents never directly manipulate external systems. Instead, they discover MCP servers through PostgreSQL, connect using the AdvancedMCPAdapter, and dynamically load available tools.

When a tool must be executed, the agent calls it through the MCP interface. Tool access can be restricted to specific subsets depending on configuration. Some systems allow full access, others limit agents to read-only operations, and generation agents may whitelist only specific creation tools.

Image and video agents use YAML-based configuration to control which tools are exposed, reinforcing separation of concerns and operational safety.
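
The whitelisting idea can be sketched as a filter applied to the tools discovered from an MCP server. In the example below, `TOOL_CONFIG` is a plain dictionary standing in for a parsed YAML file, and every agent key and tool name is hypothetical. Treating an unknown agent as having no tools is a fail-closed choice made for this sketch, not something the source states.

```python
from typing import Dict, Iterable, List, Optional

# Stand-in for parsed YAML configuration; all names here are hypothetical.
# None means "no restriction"; a list is an explicit whitelist.
TOOL_CONFIG: Dict[str, Optional[List[str]]] = {
    "project_structure_agent": None,
    "query_agent": ["get_card", "list_shots"],
    "image_generation_agent": ["create_image_generation_task"],
}


def exposed_tools(agent: str, discovered: Iterable[str]) -> List[str]:
    """Return the subset of discovered tools this agent is allowed to call."""
    if agent not in TOOL_CONFIG:
        return []  # unknown agents get nothing (fail closed)
    tools = list(discovered)
    allowed = TOOL_CONFIG[agent]
    if allowed is None:
        return tools
    allow_set = set(allowed)
    return [tool for tool in tools if tool in allow_set]
```

Filtering at load time means a restricted agent never sees the other tools at all, which is stricter than checking permissions at call time.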

7. The Confirmation Pattern

Generation workflows use a deliberate confirmation pattern to prevent unintended actions.

After building a generation configuration and prompt, the agent presents a summary to the user. It pauses and waits for a clear yes or no response. That response is parsed structurally. If confirmed, the MCP generation tool is called. If the answer is unclear, the agent requests clarification.

This confirmation gate introduces a human-in-the-loop safeguard within an otherwise automated pipeline.
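
The three-way outcome of the confirmation gate (confirmed, declined, unclear) can be sketched as follows. The source says the reply is parsed structurally but does not say how; the keyword matching below is only a stand-in for that step, and the word lists and node names are assumptions.

```python
import re
from enum import Enum


class Confirmation(Enum):
    YES = "yes"
    NO = "no"
    UNCLEAR = "unclear"


# Hypothetical vocabularies; a real system might use an LLM with structured output.
YES_WORDS = {"yes", "y", "ok", "okay", "confirm", "sure", "proceed"}
NO_WORDS = {"no", "n", "cancel", "stop", "nope"}


def parse_confirmation(reply: str) -> Confirmation:
    words = set(re.findall(r"[a-z]+", reply.lower()))
    says_yes = bool(words & YES_WORDS)
    says_no = bool(words & NO_WORDS)
    if says_yes and not says_no:
        return Confirmation.YES
    if says_no and not says_yes:
        return Confirmation.NO
    # Neither signal, or both at once: do not guess.
    return Confirmation.UNCLEAR


# Hypothetical names for the nodes that follow the gate.
NEXT_NODE = {
    Confirmation.YES: "call_generation_tool",
    Confirmation.NO: "cancel",
    Confirmation.UNCLEAR: "ask_clarification",
}


def after_confirmation(reply: str) -> str:
    return NEXT_NODE[parse_confirmation(reply)]
```

Mixed replies such as "yes, no wait" resolve to the clarification branch, which reflects the intent of the gate: the generation tool runs only on an unambiguous approval.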

8. Monitoring and Observability

Both the image and video generation agents integrate with Langfuse for monitoring. LLM calls, tool invocations, generation metrics, and errors are traced and logged. User feedback can also be recorded.

This observability layer ensures that workflows are not opaque. Every major operation can be traced, analyzed, and audited if necessary.

Closing Perspective

Taken together, these agents form a coherent orchestration framework rather than a collection of isolated bots. Each agent operates as a controlled state machine capable of reasoning, validating, routing, invoking external tools, and streaming structured results.

The architecture balances flexibility with constraint. LLMs provide intelligence and dynamic decision-making, while LangGraph enforces structured execution and MCP servers guarantee safe interaction with external systems.

The result is a system where conversation and production workflows coexist, not as improvisation but as carefully managed orchestration.