
Multi-Agent System Overview

This platform is a production-oriented multi-agent system designed for story development, shot planning, and media generation. It presents a single conversational interface to the user, while internally coordinating multiple specialized agents. Each agent owns a clear domain—story development, shot composition, image generation, or video production—and can delegate tasks to others when workflows cross domain boundaries.

The architecture separates reasoning from execution. Agents interpret intent and decide what to do next. All persistent operations and integrations are performed through MCP (Model Context Protocol) servers, which expose structured, validated tools. Agents never access databases or services directly; they invoke tools with explicit schemas, and MCP servers enforce rules, validation, and side effects.


System Architecture

At a high level, the system consists of three layers:

1. Orchestrator Agents (Reasoning Layer)

Agents receive user input, classify intent, select the appropriate tool, and manage the workflow loop. They are responsible for decision-making but do not perform complex business operations directly. At this level, agents follow the ReAct (Reason + Act) pattern: a feedback loop in which the agent reasons about the problem, takes an action, observes the result, and refines its reasoning for the next step.

2. Tool Layer (Execution Flows)

Tools represent well-defined operations such as “create character,” “write screenplay scenes,” or “run generation pipeline.” Internally, tools may use sub-agents for structured multi-step logic, but externally they appear as single callable capabilities.

3. MCP Servers (Integration Layer)

MCP servers own data access, validation, permissions, and integration with storage or external services. All state mutations and reads go through MCP tools. This centralizes control and makes the system predictable, secure, and auditable.

This separation makes the agent layer lightweight and horizontally scalable, while MCP servers provide durable operational guarantees.


Agents and Domains

The system includes multiple specialized agents:

  • StoryCraft Agent — manages story bible, scripts, characters, and narrative structure.
  • Shots & Assets Agent — manages shot composition and asset relationships.
  • Image Generation Agent — handles image creation and editing workflows.
  • Video Generation Agent — manages video generation pipelines.

Agents do not work in isolation. A single request such as “write a scene and generate a keyframe” can flow across the story, shot, and media domains while the user experiences one continuous conversation.
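Cross-domain delegation can be pictured as a registry that maps each domain to the agent that owns it, with a shared context passed from step to step so later domains can build on earlier results. The sketch below is a minimal illustration under assumptions: the `AgentRegistry` class, the domain keys, and the handler functions are hypothetical, and the request is assumed to be already split into per-domain steps (in the platform itself, that split is decided by LLM reasoning).

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical handler signature: (task, shared_context) -> new context entries.
Handler = Callable[[str, dict], dict]


@dataclass
class AgentRegistry:
    """Hypothetical registry mapping a domain name to the agent that owns it."""
    handlers: dict[str, Handler] = field(default_factory=dict)

    def register(self, domain: str, handler: Handler) -> None:
        self.handlers[domain] = handler

    def run(self, steps: list[tuple[str, str]]) -> dict:
        """Run (domain, task) steps in order, sharing one context between them."""
        context: dict = {}
        for domain, task in steps:
            handler = self.handlers[domain]  # KeyError if no agent owns the domain
            context.update(handler(task, context))
        return context


def story_agent(task: str, ctx: dict) -> dict:
    # Placeholder for the StoryCraft agent writing a scene.
    return {"scene": f"SCENE: {task}"}


def image_agent(task: str, ctx: dict) -> dict:
    # Placeholder for the image agent; it consumes the scene from the story step.
    return {"keyframe": f"keyframe for <{ctx['scene']}>"}


registry = AgentRegistry()
registry.register("story", story_agent)
registry.register("image", image_agent)

# "write a scene and generate a keyframe", already split into two domain steps
result = registry.run([("story", "hero enters the lab"), ("image", "keyframe")])
print(result["keyframe"])  # keyframe for <SCENE: hero enters the lab>
```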


LLM Strategy and Configuration

The system uses multiple LLMs with task-specific configurations to balance accuracy, creativity, and cost efficiency.

Orchestrator Agents

The top-level orchestrator agents are configured for high decision reliability. They typically run:

  • Claude 3.5 Sonnet (via AWS Bedrock)
  • Temperature: 0.0–0.2

Low temperature reduces sampling randomness, so action selection and tool routing stay consistent from run to run.

Sub-Agents and Creative Tasks

Sub-agents responsible for creative writing, content generation, or structured extraction may use different models and temperatures depending on the task:

  • Claude 3.5 Haiku (via AWS Bedrock) for structured extraction and lightweight reasoning

    • Temperature: ~0.0 for structured outputs
  • Claude 3.5 Sonnet for higher-quality creative writing

    • Temperature: 0.3–0.7 depending on creativity needs

Model selection and temperature are configured externally, allowing different environments (development, evaluation, production) to tune cost and performance without modifying core logic.

This multi-model approach keeps decision logic stable and predictable, while creative generation remains expressive where required.
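A minimal sketch of external model configuration, assuming a JSON document keyed by environment and agent role. The layout, the `APP_ENV` variable, and the `sonnet`/`haiku` labels are hypothetical placeholders rather than real Bedrock model IDs; the temperatures follow the ranges listed above, and the cheaper development profile is an illustrative assumption.

```python
import json
import os
from dataclasses import dataclass
from typing import Optional

# Hypothetical config: environment -> role -> settings. Model names are
# placeholders; substitute the identifiers your provider actually uses.
DEFAULT_CONFIG = """
{
  "production": {
    "orchestrator": {"model": "sonnet", "temperature": 0.1},
    "extraction":   {"model": "haiku",  "temperature": 0.0},
    "creative":     {"model": "sonnet", "temperature": 0.7}
  },
  "development": {
    "orchestrator": {"model": "haiku", "temperature": 0.1},
    "extraction":   {"model": "haiku", "temperature": 0.0},
    "creative":     {"model": "haiku", "temperature": 0.5}
  }
}
"""


@dataclass(frozen=True)
class ModelSettings:
    model: str
    temperature: float


def load_settings(role: str, env: Optional[str] = None,
                  raw: str = DEFAULT_CONFIG) -> ModelSettings:
    """Resolve settings for a role; APP_ENV is a hypothetical env variable."""
    env = env or os.environ.get("APP_ENV", "development")
    entry = json.loads(raw)[env][role]
    return ModelSettings(model=entry["model"], temperature=float(entry["temperature"]))


print(load_settings("creative", env="production"))
```

Because agent code only asks for a role ("orchestrator", "extraction", "creative"), swapping models or temperatures per environment is a config change, not a code change.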


Workflow Model

A typical interaction follows a structured reasoning loop:

  1. The user sends a request.
  2. The agent interprets intent.
  3. The agent selects an appropriate tool.
  4. The tool executes (possibly via sub-agents).
  5. MCP servers perform validated operations.
  6. Results are returned and transformed into a user-facing response.

This loop may repeat several times within a single interaction. Technical steps (such as loading a template or saving results) continue automatically; conversational steps pause and wait for user input.

The system supports both conversational invocation (LLM-driven tool selection) and direct invocation (UI-triggered tool flows), enabling hybrid human-in-the-loop workflows.
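The loop above can be sketched as a reason-act-observe cycle that stops when there is nothing more to do or when a step needs the user. This is a minimal sketch: the `decide` callable stands in for the LLM's tool selection, and the two tools in `TOOLS` are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    output: str
    needs_user: bool  # True for conversational steps that must wait for the user


# Hypothetical tools; real ones would call sub-agents and MCP servers.
TOOLS: dict[str, Callable[[str], StepResult]] = {
    "load_template": lambda arg: StepResult(f"template:{arg}", needs_user=False),
    "ask_user": lambda arg: StepResult(f"question:{arg}", needs_user=True),
}

# Given observations so far, return (tool_name, argument), or None to stop.
Decide = Callable[[list[str]], Optional[tuple[str, str]]]


def run_turn(decide: Decide, max_steps: int = 10) -> tuple[list[str], bool]:
    """Reason-act-observe loop. Returns (observations, paused_for_user)."""
    observations: list[str] = []
    for _ in range(max_steps):
        choice = decide(observations)       # reason: pick the next tool
        if choice is None:                  # nothing left to do
            break
        name, arg = choice
        result = TOOLS[name](arg)           # act
        observations.append(result.output)  # observe; visible to the next decision
        if result.needs_user:               # conversational step: pause
            return observations, True
    return observations, False              # technical steps ran to completion


def scripted(steps: list[tuple[str, str]]) -> Decide:
    """Test double for the LLM: replays a fixed sequence of tool choices."""
    it = iter(steps)
    return lambda observations: next(it, None)


obs, paused = run_turn(scripted([("load_template", "hero"),
                                 ("ask_user", "genre?"),
                                 ("load_template", "villain")]))
print(obs, paused)
```

The third step never runs: the conversational `ask_user` step pauses the loop, and the next user message would start a new turn.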


MCP: Controlled Execution

MCP (Model Context Protocol) is the execution backbone. It exposes capabilities as structured tools using JSON-RPC 2.0. Agents discover available tools during initialization and can only call what is explicitly exposed.

Key characteristics of the MCP layer:

  • Strong schema validation
  • Centralized access control
  • Clear separation of read and write capabilities
  • Explicit side-effect boundaries
  • Transport support via STDIO or HTTP

This ensures that LLM reasoning is always mediated by deterministic execution rules.
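As a concrete illustration, the sketch below builds the JSON-RPC 2.0 message an MCP client sends to invoke a tool (`"method": "tools/call"`, with `name` and `arguments` in `params`) after checking the arguments against the tool's `inputSchema`. The `update_field` tool and its schema are hypothetical, and the validator covers only the `required` and basic `type` keywords, a small subset of JSON Schema. In the architecture described here the server remains the authority on validation; a client-side check like this only fails fast.

```python
import json
from itertools import count

# Hypothetical tool definition, shaped like an entry in an MCP tools/list result.
UPDATE_FIELD_TOOL = {
    "name": "update_field",
    "inputSchema": {
        "type": "object",
        "properties": {
            "item_id": {"type": "string"},
            "field": {"type": "string"},
            "value": {"type": "string"},
        },
        "required": ["item_id", "field", "value"],
    },
}

_TYPES = {"string": str, "integer": int, "number": (int, float),
          "boolean": bool, "object": dict, "array": list}


def validate(arguments: dict, schema: dict) -> list[str]:
    """Check 'required' and primitive 'type' only (a tiny JSON Schema subset)."""
    errors = [f"missing: {k}" for k in schema.get("required", []) if k not in arguments]
    for key, value in arguments.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            # Stricter than JSON Schema's default, which allows extra properties.
            errors.append(f"unexpected: {key}")
        elif not isinstance(value, _TYPES[spec["type"]]):
            errors.append(f"wrong type: {key}")
    return errors


_ids = count(1)


def tools_call_request(tool: dict, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 'tools/call' message, refusing invalid arguments."""
    errors = validate(arguments, tool["inputSchema"])
    if errors:
        raise ValueError("; ".join(errors))
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    })


print(tools_call_request(UPDATE_FIELD_TOOL,
                         {"item_id": "c1", "field": "name", "value": "Denis"}))
```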


StoryCraft Agent: A Detailed Example

The StoryCraft Agent provides a concrete illustration of the three-level architecture in action. It manages story bible creation, updates, ontology workflows, and script generation.

StoryCraft Agent
├── Main Agent (orchestrator)
│   Claude Sonnet 4.5, temperature 0.1
│   Talks to the user, decides what to do
│   │
│   ├── Tool: character_create
│   │   ├── Sub-agent: field generation (Haiku 4.5, temp 0.7)
│   │   ├── Sub-agent: image prompt (Sonnet 4, temp 0.7)
│   │   └── MCP calls: save to database
│   │
│   ├── Tool: script_scenes_create
│   │   ├── Sub-agent: screenplay generation (Haiku 4.5, temp 0.7)
│   │   └── MCP calls: save scenes
│   │
│   ├── Tool: full_generate
│   │   ├── Delegates to script_summary → outline → scenes
│   │   ├── Delegates to characters_create (parallel)
│   │   ├── Delegates to environments_create (parallel)
│   │   └── Each step has its own sub-agents with their own models
│   │
│   ├── ... 16 more tools
│   └── Tool: send_suggestions (UI buttons)
└── MCP Servers (external data services)
    ├── mcp_storycraft_server (story bible, characters, environments)
    └── mcp_project_structure_server (projects, cards, messages)
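The `full_generate` branch of the tree mixes sequential and parallel delegation. A minimal asyncio sketch, assuming each step is an async function (the bodies here are placeholders for sub-agent and MCP calls) and assuming characters and environments need only the premise, so they run concurrently with the summary → outline → scenes chain. If in practice they depend on the summary, they would start after `script_summary` instead.

```python
import asyncio

# Hypothetical stand-ins for the tools named in the tree; each would really
# run its own sub-agents and MCP calls.


async def script_summary(premise: str) -> str:
    await asyncio.sleep(0)
    return f"summary({premise})"


async def outline(summary: str) -> str:
    await asyncio.sleep(0)
    return f"outline({summary})"


async def scenes(outline_text: str) -> str:
    await asyncio.sleep(0)
    return f"scenes({outline_text})"


async def characters_create(premise: str) -> list[str]:
    await asyncio.sleep(0)
    return ["hero", "mentor"]


async def environments_create(premise: str) -> list[str]:
    await asyncio.sleep(0)
    return ["lab", "city"]


async def script_chain(premise: str) -> str:
    # Sequential: each step consumes the previous step's output.
    return await scenes(await outline(await script_summary(premise)))


async def full_generate(premise: str) -> dict:
    # The script chain, characters, and environments run concurrently.
    script, characters, environments = await asyncio.gather(
        script_chain(premise),
        characters_create(premise),
        environments_create(premise),
    )
    return {"script": script, "characters": characters, "environments": environments}


result = asyncio.run(full_generate("a scientist discovers time travel"))
print(result["script"])
```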

Level One: Orchestrator

At the top sits the StoryCraft Orchestrator. Its job is to understand what the user wants, determine which actions are allowed, and select the appropriate tool. It does not generate entire scripts or manipulate storage directly.

The orchestrator runs at low temperature using Claude 3.5 Sonnet to ensure reliable decision-making. It operates in a controlled loop: interpret intent, determine allowed actions, choose one, execute it, and then either continue or pause depending on the result. Because the steps and their permitted transitions form an explicit graph, the workflow stays predictable and traceable.
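One way to implement "determine allowed actions" is an explicit graph: each workflow state lists the tools permitted there, and successful tools may move the workflow to a new state. The orchestrator's LLM would then only be offered `allowed_actions(state)`. The states, transitions, and most tool names below are hypothetical (`character_create` comes from the tree above).

```python
# Hypothetical workflow states and the tools each state permits.
ALLOWED: dict[str, set[str]] = {
    "no_bible": {"bible_create"},
    "bible_draft": {"bible_update", "character_create", "bible_finalize"},
    "bible_final": {"bible_update", "script_summary_create"},
}

# Hypothetical transitions after a tool succeeds; unlisted tools keep the state.
TRANSITIONS: dict[tuple[str, str], str] = {
    ("no_bible", "bible_create"): "bible_draft",
    ("bible_draft", "bible_finalize"): "bible_final",
}


def allowed_actions(state: str) -> set[str]:
    """The tool set the orchestrator may choose from in this state."""
    return ALLOWED.get(state, set())


def step(state: str, tool: str) -> str:
    """Apply one orchestrator action, rejecting tools the graph does not allow."""
    if tool not in allowed_actions(state):
        raise PermissionError(f"{tool!r} not allowed in state {state!r}")
    return TRANSITIONS.get((state, tool), state)


state = step("no_bible", "bible_create")      # -> "bible_draft"
state = step(state, "character_create")       # stays in "bible_draft"
print(state, sorted(allowed_actions(state)))
```

Even if the model proposes a tool outside the current state's set, `step` refuses it, so the graph rather than the prompt bounds what can happen.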


Level Two: Tools and Sub-Agents

Below the orchestrator are tools that encapsulate structured operations. For example, a “Create Story Bible Item” tool may internally call a structured extraction sub-agent to gather fields from conversation context, then call a persistence step to save results.

These sub-agents are tuned according to task. A field extraction step may use Claude 3.5 Haiku at temperature 0.0 to ensure accurate key-value mapping. A script-writing step may use Claude 3.5 Sonnet at temperature 0.5 to produce creative narrative content.

By isolating responsibilities in sub-agents, the system avoids overloading a single prompt with conflicting goals. Each component remains focused and testable.
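A sketch of such a tool, assuming the extraction sub-agent returns JSON text and persistence is a single MCP call. The `SubAgent` and `Persist` callables, the required field names, and the fake implementations in the usage lines are hypothetical; in the real system the extractor would be an LLM call (for example Haiku at temperature 0.0) and `persist` an MCP tool.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces: a sub-agent maps a prompt to raw model text;
# persist stands in for an MCP call returning the new item's id.
SubAgent = Callable[[str], str]
Persist = Callable[[str, dict], str]

REQUIRED_FIELDS = ("name", "role", "description")  # hypothetical field set


@dataclass
class CreateStoryBibleItemTool:
    extractor: SubAgent
    persist: Persist

    def __call__(self, item_type: str, conversation: str) -> str:
        prompt = (f"Extract {', '.join(REQUIRED_FIELDS)} for a {item_type} "
                  f"as a JSON object from:\n{conversation}")
        fields = json.loads(self.extractor(prompt))   # structured extraction step
        missing = [f for f in REQUIRED_FIELDS if not fields.get(f)]
        if missing:
            raise ValueError(f"extraction incomplete: {missing}")
        return self.persist(item_type, fields)        # persistence step (MCP)


# Usage with fakes in place of the LLM and the MCP server.
saved: dict[str, dict] = {}


def fake_extractor(prompt: str) -> str:
    return '{"name": "Ada", "role": "hero", "description": "An inventor."}'


def fake_persist(item_type: str, fields: dict) -> str:
    saved[item_type] = fields
    return "item-1"


tool = CreateStoryBibleItemTool(fake_extractor, fake_persist)
item_id = tool("character", "User: my hero is Ada, an inventor.")
print(item_id, saved["character"]["name"])
```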


Level Three: MCP Execution Layer

All StoryCraft data operations occur through MCP tools. When the system needs to load a template, update a field, or create a new story item, it calls a corresponding MCP capability.

The MCP server validates the request, enforces permissions, executes the operation, and returns structured results. The agent then transforms that structured result into a conversational response.

This design guarantees that creative reasoning never bypasses operational safeguards. Validation and side effects live in the server layer, not in prompt logic.
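The server-side sequence (validate, check permissions, execute, return a structured result) might look like the following. This is a minimal in-memory sketch: `StoryServer`, its editable fields, and the per-item editor lists are hypothetical, and a real MCP server would expose `update_field` as a tool over JSON-RPC rather than as a Python method.

```python
from dataclasses import dataclass, field


@dataclass
class ToolResult:
    ok: bool
    data: dict = field(default_factory=dict)
    error: str = ""


@dataclass
class StoryServer:
    """Hypothetical in-memory server handling one update tool."""
    items: dict[str, dict] = field(default_factory=dict)
    editors: dict[str, set[str]] = field(default_factory=dict)  # item_id -> user ids
    editable_fields: frozenset = frozenset({"name", "description", "backstory"})

    def update_field(self, user: str, item_id: str, field_name: str, value: str) -> ToolResult:
        # 1. Validate the request.
        if item_id not in self.items:
            return ToolResult(False, error=f"unknown item {item_id}")
        if field_name not in self.editable_fields:
            return ToolResult(False, error=f"field {field_name} is not editable")
        # 2. Enforce permissions.
        if user not in self.editors.get(item_id, set()):
            return ToolResult(False, error="permission denied")
        # 3. Execute the side effect.
        self.items[item_id][field_name] = value
        # 4. Return a structured result for the agent to turn into prose.
        return ToolResult(True, data={"item_id": item_id, field_name: value})


server = StoryServer(items={"c1": {"name": "Max"}}, editors={"c1": {"alice"}})
print(server.update_field("alice", "c1", "name", "Denis"))
print(server.update_field("bob", "c1", "name", "Eve"))
```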


Example Interaction Flow

Consider a user request: “Change the hero’s name to Denis.”

The orchestrator interprets the request as an update action. It selects the appropriate update tool. A structured sub-agent identifies all fields that contain the old name and need modification. The tool then calls MCP to update each field. Once persistence succeeds, the orchestrator generates a friendly summary for the user.

In this flow, the orchestrator decides, the tool reasons about structure, and MCP performs the validated update. The responsibilities remain cleanly separated.
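The rename flow can be sketched as: find every field that mentions the old name, issue one update call per field, then summarize. Assumptions: `find_mentions` is a deterministic regex stand-in for the structured sub-agent, `update` stands in for the MCP update tool, and the old name `Max` and the story items are invented for illustration.

```python
import re
from typing import Callable

# Hypothetical MCP update tool: (item_id, field, new_value) -> success flag.
UpdateCall = Callable[[str, str, str], bool]


def find_mentions(items: dict[str, dict[str, str]], old: str) -> list[tuple[str, str]]:
    """Stand-in for the structured sub-agent: (item_id, field) pairs mentioning `old`."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")  # whole words only
    return [(item_id, name) for item_id, fields in items.items()
            for name, text in fields.items() if pattern.search(text)]


def rename_character(items: dict[str, dict[str, str]], old: str, new: str,
                     update: UpdateCall) -> str:
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    changed = []
    for item_id, name in find_mentions(items, old):
        new_text = pattern.sub(lambda _m: new, items[item_id][name])
        if update(item_id, name, new_text):   # one validated MCP call per field
            changed.append(f"{item_id}.{name}")
    # The orchestrator would turn this into a friendly reply.
    return f"Renamed {old} to {new} in {len(changed)} field(s): {', '.join(changed)}"


story = {
    "hero": {"name": "Max", "bio": "Max grew up in Berlin."},
    "scene_1": {"text": "Maxine waves to Max."},
}


def fake_update(item_id: str, name: str, value: str) -> bool:
    story[item_id][name] = value  # stands in for the MCP server write
    return True


summary = rename_character(story, "Max", "Denis", fake_update)
print(summary)
```

The word-boundary match leaves "Maxine" untouched; a real sub-agent would make that judgment from context rather than a regex.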

Scalability and Governance

The system is based on stateless agents and stateful servers.

Agents

  • Are ephemeral and restartable
  • Hold short-lived conversational context
  • Scale horizontally with minimal overhead

MCP Servers

  • Persist data and memory
  • Guarantee durability and consistency
  • Isolate failures and enforce rules
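To illustrate the stateless/stateful split, the sketch below keeps no conversation state inside the agent object: each request reloads history from a store and writes the new turn back, so a freshly created instance (as after a restart) continues where the previous one left off. `MessageStore` is a hypothetical in-memory stand-in for an MCP server, and the reply string is a placeholder for the LLM call.

```python
from dataclasses import dataclass, field


@dataclass
class MessageStore:
    """Hypothetical stand-in for an MCP server that persists messages."""
    messages: dict[str, list[str]] = field(default_factory=dict)

    def append(self, conversation_id: str, text: str) -> None:
        self.messages.setdefault(conversation_id, []).append(text)

    def load(self, conversation_id: str) -> list[str]:
        return list(self.messages.get(conversation_id, []))


class StatelessAgent:
    """Holds no conversation state between requests; any instance can serve any turn."""

    def __init__(self, store: MessageStore):
        self.store = store

    def handle(self, conversation_id: str, user_text: str) -> str:
        history = self.store.load(conversation_id)   # rebuild short-lived context
        reply = f"turn {len(history) // 2 + 1}: ack '{user_text}'"  # placeholder for LLM
        self.store.append(conversation_id, user_text)
        self.store.append(conversation_id, reply)
        return reply  # `history` is discarded when the call returns


store = MessageStore()
first = StatelessAgent(store).handle("c1", "hello")
second = StatelessAgent(store).handle("c1", "again")  # new instance, as after a restart
print(first)
print(second)
```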

This design lets the system tolerate agent restarts and partial failures. In summary:

  • Agents are stateless and horizontally scalable.
  • Business logic and permissions live in MCP servers.
  • Tool-based execution reduces prompt complexity.
  • Specialized sub-agents reduce token usage and model cost.
  • Tool access can be restricted or audited centrally.

By concentrating integration logic in MCP servers, the system becomes easier to secure, monitor, and extend.
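Centralized restriction and auditing can be sketched as a gateway through which every tool call passes: it checks a per-agent allowlist and records each attempt, allowed or not. The `AuditedToolGateway` class, the agent names, and the `get_scene` tool are hypothetical; in the described system this logic would sit in, or directly in front of, the MCP servers.

```python
import json
import time
from typing import Any, Callable


class AuditedToolGateway:
    """Hypothetical gateway: per-agent allowlist plus an append-only audit log."""

    def __init__(self, tools: dict[str, Callable[..., Any]],
                 allowlist: dict[str, set[str]]):
        self.tools = tools
        self.allowlist = allowlist
        self.audit_log: list[str] = []

    def call(self, agent: str, tool: str, **arguments: Any) -> Any:
        allowed = tool in self.allowlist.get(agent, set())
        entry = {"ts": time.time(), "agent": agent, "tool": tool,
                 "arguments": arguments, "allowed": allowed}
        self.audit_log.append(json.dumps(entry, sort_keys=True))  # record every attempt
        if not allowed:
            raise PermissionError(f"{agent} may not call {tool}")
        return self.tools[tool](**arguments)


gateway = AuditedToolGateway(
    tools={"get_scene": lambda scene_id: f"scene {scene_id}"},
    allowlist={"storycraft": {"get_scene"}},
)
print(gateway.call("storycraft", "get_scene", scene_id="s1"))  # scene s1
try:
    gateway.call("video", "get_scene", scene_id="s1")
except PermissionError as exc:
    print(exc)  # video may not call get_scene
print(len(gateway.audit_log))  # 2
```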