Advanced · Free Guide · April 2026

Building
Multi-Agent
Systems

A practical guide to sub-agents, skills, tools, and MCP integration. Written for OpenClaw agents, but relevant to any agentic system.

Keira Nesdale · Miss AI

By Keira Nesdale · Advanced · 35 min read · April 2026
01

What is a
multi-agent system?

A multi-agent system (MAS) is an architecture where multiple AI agents work together to complete tasks that would be too complex, too slow, or too large for a single agent operating alone. Each agent has its own context window, its own tools, and its own scope of responsibility.

Think of it like a team of specialists versus one generalist. The generalist knows a bit of everything. The specialist team has a coordinator who knows how to delegate, and workers who go deep on their assigned lane.

90.2%
Performance improvement on complex research tasks when using Claude Opus 4 as lead agent + Claude Sonnet 4 as sub-agents, versus a single-agent Claude Opus 4 setup. Token usage alone accounted for 80% of performance variance. Anthropic internal research.
The core architecture

Every multi-agent system has the same structural layers, regardless of framework:

| Layer | What it does |
|---|---|
| Lead Agent | The orchestrator. Receives the task, creates a strategy, delegates to sub-agents, synthesises results. Uses the most capable (and expensive) model. |
| Sub-Agents | Specialist workers. Each assigned a narrow scope. Run in parallel. Have their own tools, prompts, and context windows. |
| Tools | Built-in capabilities: web search, file read/write, browser, shell commands, API calls. |
| Skills | Playbooks that teach an agent how to use tools. A skill is a SKILL.md file: YAML metadata + markdown instructions. |
| Memory | How agents persist information across sessions. Can be static files (MEMORY.md), semantic retrieval, or in-context state. |
| MCP Servers | Standardised connectors to external systems. One server can connect to GitHub, Slack, Google Drive. Any MCP-compatible agent can use them. |
When to use multi-agent vs single agent

Multi-agent systems burn through tokens fast — roughly 15x more than a standard chat interaction. Use them only when the task justifies it.

| Scenario | Recommendation |
|---|---|
| Simple tasks, quick lookups, one clear answer, tight budgets | Use a single agent |
| Broad research, parallel workstreams, tasks too large for one context window, clearly separable subtasks | Use multi-agent |
| 4–5 sub-agents, 5–8 tasks each | Sweet spot. Beyond 5 specialists, coordination overhead cancels out the parallelism benefit. |
Anthropic's scaling rules

Simple fact check: 1 agent, 3–10 tool calls.
Direct comparison: 2–4 sub-agents, 10–15 calls each.
Complex research problem: 10+ sub-agents with clearly divided responsibilities.

Embed these in your lead agent prompt.
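These scaling rules are mechanical enough to encode directly. A minimal sketch in JavaScript (the tier names and object shape are invented for illustration, not an OpenClaw API):

```javascript
// Map task complexity to a delegation plan, per the scaling rules above.
// Tier names and field names are illustrative, not part of any OpenClaw API.
const SCALING_RULES = {
  simple:     { subAgents: 1,  maxToolCallsEach: 10 }, // fact check: 3-10 calls
  comparison: { subAgents: 4,  maxToolCallsEach: 15 }, // 2-4 agents, 10-15 each
  complex:    { subAgents: 10, maxToolCallsEach: 15 }, // 10+ agents
};

function delegationPlan(complexity) {
  const plan = SCALING_RULES[complexity];
  if (!plan) throw new Error(`Unknown complexity tier: ${complexity}`);
  return plan;
}
```

The point is that the lead agent should never improvise team size: it classifies the task first, then reads the plan off a table.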

02

How to build
sub-agents

In OpenClaw (and Claude Code), a sub-agent is not a background process or separate service. It is a clearly defined role that the agent temporarily adopts to perform one specific job under strict rules. You invoke sub-agents by assigning tasks through your lead agent's prompt.

The three building blocks
Block 1: The lead agent prompt

Your lead agent needs explicit instructions on how to decompose tasks and delegate. Without this, sub-agents either duplicate each other's work or leave gaps.

Required elements: a decomposition step, per-agent delegation (objective, boundaries, output format, tool budget), a synthesis step, and explicit scaling rules.

Example lead agent prompt — research task
You are a lead research agent. When given a query:

1. DECOMPOSE: Break the query into 3-5 independent research angles.
2. DELEGATE: Assign one angle to each sub-agent. Give each:
   - A concrete objective (one sentence)
   - Boundaries (what NOT to cover)
   - Required output format (JSON: findings, sources, confidence)
   - Tool budget (max 10 tool calls per sub-agent)
3. SYNTHESISE: Once all sub-agents return, compile results.
   Remove duplicates. Rank by confidence. Write final answer.

Scaling rules:
  Simple fact:    1 sub-agent, max 5 tool calls
  Comparison:     2-3 sub-agents, max 10 tool calls each
  Complex:        up to 8 sub-agents, max 15 tool calls each
Block 2: Sub-agent prompts

Each sub-agent needs four things to work without confusion:

| Element | Description |
|---|---|
| Concrete objective | One sentence. Specific. Measurable. "Find the top 5 real estate agencies in Tauranga by number of listings, as of April 2026." |
| Task boundaries | What this sub-agent should NOT do. Prevents overlap with other sub-agents. |
| Output format | Exact structure the lead agent expects. JSON, markdown table, bullet list. Specify it. |
| Tool guidance | Which tools to use and in what order. "Start with web search, then use browser to verify top results." |
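Because the four elements are always the same, prompt assembly can be mechanical. A minimal sketch (the function and field names are invented for illustration, not an OpenClaw API):

```javascript
// Assemble a sub-agent prompt from the four required elements.
// Function and field names are illustrative, not an OpenClaw API.
function buildSubAgentPrompt({ objective, boundaries, outputFormat, toolGuidance }) {
  const elements = { objective, boundaries, outputFormat, toolGuidance };
  for (const [key, value] of Object.entries(elements)) {
    if (!value) throw new Error(`Missing required element: ${key}`);
  }
  return [
    `OBJECTIVE: ${objective}`,
    `DO NOT: ${boundaries}`,
    `OUTPUT FORMAT: ${outputFormat}`,
    `TOOLS: ${toolGuidance}`,
  ].join('\n');
}
```

Failing loudly on a missing element is deliberate: a sub-agent launched without boundaries or an output format is the main source of duplicated work.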
Block 3: The OODA research loop

Anthropic found the best-performing sub-agents run an explicit OODA loop. Embed this in any sub-agent prompt:

Research loop — repeat until task complete:

OBSERVE:  What information do I have? What gaps remain? What tools are available?
ORIENT:   Which tools and queries would best fill the gaps?
DECIDE:   Choose ONE specific tool call. Start broad, then narrow.
ACT:      Execute the tool call. Record the result. Loop back to OBSERVE.

Stop when: task objective is met OR tool budget is reached.
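The loop above can be sketched as a driver function with a hard budget. The `observe`, `orient`, `act`, and `isDone` callbacks stand in for model reasoning and tool execution; none of this is a real OpenClaw API:

```javascript
// Skeleton of the OODA research loop with a hard tool budget.
// observe / orient / act / isDone are caller-supplied stand-ins for
// model reasoning and tool execution (illustrative only).
function oodaLoop({ observe, orient, act, isDone, toolBudget }) {
  let callsUsed = 0;
  while (callsUsed < toolBudget) {
    const state = observe();              // OBSERVE: what do I have, what gaps remain?
    if (isDone(state)) return { state, callsUsed };
    const toolCall = orient(state);       // ORIENT + DECIDE: choose ONE tool call
    act(toolCall);                        // ACT: execute and record the result
    callsUsed += 1;                       // then loop back to OBSERVE
  }
  return { state: observe(), callsUsed }; // stop: tool budget reached
}
```

Note the two stop conditions from the prompt map directly to the `isDone` check and the budget bound on the loop.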
OpenClaw-specific implementation

In OpenClaw, sub-agents are implemented through the multi-agent routing system. The workspace can route inbound tasks to isolated agents with separate sessions and tool profiles.

Option A: Workspace-level sub-agents

Create a separate OpenClaw workspace for each specialist agent. Each workspace has its own SOUL.md, AGENTS.md, and MEMORY files. The lead agent routes tasks to them via the Gateway's multi-agent routing layer.

~/.openclaw/workspace/
  theo/                    ← Lead agent
    SOUL.md
    AGENTS.md
    MEMORY.md
  content-agent/           ← Sub-agent: content creation only
    SOUL.md
    AGENTS.md
  leads-agent/             ← Sub-agent: lead research and scoring
    SOUL.md
    AGENTS.md
  outreach-agent/          ← Sub-agent: message drafting
    SOUL.md
    AGENTS.md
Option B: Skill-based sub-agent invocation

Define a skill that teaches the lead agent how to invoke sub-agent behaviour within a single session. Lighter weight, better for task-specific delegation within one context window.

# sub-agent-research.skill/SKILL.md
---
name: sub_agent_research
description: Spawn a focused research sub-agent for a specific topic.
---
## When to use
When asked for research that requires multiple independent angles.

## How to execute
1. Decompose the query into 3-5 independent research angles.
2. For each angle, run a focused OODA research loop (max 8 tool calls).
3. Record each angle's findings separately before synthesising.
4. Return: findings per angle, key sources, confidence, final synthesis.

## Output format
Return JSON:
{ angles: [{topic, findings, sources, confidence}], synthesis: string }
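Since the lead agent synthesises from this JSON, it is worth checking the shape before trusting it. A minimal sketch of a validator, matching the output format declared above (the validator itself is illustrative, not part of OpenClaw):

```javascript
// Validate a sub-agent's output against the skill's declared shape:
// { angles: [{topic, findings, sources, confidence}], synthesis: string }
function isValidResearchOutput(raw) {
  let parsed;
  try { parsed = JSON.parse(raw); } catch (e) { return false; }
  if (!parsed || typeof parsed.synthesis !== 'string' || !Array.isArray(parsed.angles)) {
    return false;
  }
  return parsed.angles.every((angle) =>
    typeof angle.topic === 'string' &&
    typeof angle.findings === 'string' &&
    Array.isArray(angle.sources) &&
    typeof angle.confidence === 'number'
  );
}
```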
Common sub-agent mistakes
| Mistake | Fix |
|---|---|
| Vague objectives | Sub-agents with unclear scope duplicate work or leave gaps. Every sub-agent needs a one-sentence, specific objective. |
| No output format | If you don't specify output format, the lead agent can't synthesise cleanly. Always define the expected structure. |
| Too many specialists | Teams larger than 5 specialists hit coordination overhead that cancels out parallelism. Start with 3. |
| Wrong tool for context | An agent searching the web for context that only exists in Slack is doomed. Match tools to where the data actually lives. |
| No scaling rules | Without scaling rules, agents over-invest in simple tasks or under-invest in complex ones. |
03

Skills: teaching agents
how to use tools

In OpenClaw and Claude Code, a skill is not code. It is a folder containing a SKILL.md file — YAML frontmatter for metadata and markdown for instructions. Skills teach the agent how to use tools in a disciplined, repeatable way.

Tools
Capabilities: read a file, run a shell command, call an API, use a browser. Tools have no instructions — they are just hands.
Skills
Playbooks: step-by-step instructions that tell the agent which tools to use, in what order, with what constraints. Skills give the hands a brain.

A tool without a skill is raw capability. A skill without a tool is instructions with no hands.

Anatomy of a SKILL.md file
---
name: lead_scraper
description: Find NZ businesses by profession and location. Returns phone numbers.
requires:
  env: [GOOGLE_PLACES_API_KEY]
  tools: [exec, web]
---
## When to use this skill
When asked to find leads in a specific trade or location.

## How to execute
1. Ask for: profession (e.g. real estate agent), location (e.g. Tauranga).
2. Run the scraper:
   `node scripts/lead-scraper.js --profession '{profession}' --location '{location}'`
3. Output returns businesses with: name, phone, address, website status.
4. Filter: businesses WITHOUT a website are highest priority leads.
5. Format results as a table: Name | Phone | Address | Priority.

## Stop conditions
Stop if Google Places API returns an error. Report the error immediately.

## Output format
Markdown table. Max 20 results. Sorted by priority (no website = HIGH).
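The file format is simple to work with programmatically: YAML frontmatter between `---` fences, markdown body after. A minimal sketch of a parser (a real loader would use a YAML library; this handles only flat `key: value` lines):

```javascript
// Split a SKILL.md file into frontmatter metadata and markdown body.
// Sketch only: handles flat `key: value` frontmatter, not full YAML.
function parseSkill(source) {
  const match = source.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
  if (!match) throw new Error('SKILL.md must start with YAML frontmatter');
  const meta = {};
  for (const line of match[1].split('\n')) {
    const i = line.indexOf(':');
    if (i > 0) meta[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return { meta, body: match[2] };
}
```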
Skill loading priority

OpenClaw loads skills in this order — higher priority overrides lower:

| Priority | Location | Use |
|---|---|---|
| 1 | `<workspace>/.openclaw/skills/` | Per-agent. Highest priority. Use for agent-specific customisations. |
| 2 | `~/.openclaw/skills/` | Shared across all agents on the host. |
| 3 | Bundled skills | Built-in skills provided with OpenClaw. Cannot be modified but can be overridden by workspace or managed skills. |

If a skill with the same name exists in multiple locations, the higher-priority source wins. You can override any bundled skill by creating a workspace skill with the same name.
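Resolution is first match wins down the priority list. A sketch (the lookup maps stand in for the three skill directories; this is not OpenClaw's actual loader):

```javascript
// First-match-wins skill resolution across the three locations.
// The maps stand in for the directories above (illustrative only).
function resolveSkill(name, { workspace = {}, shared = {}, bundled = {} }) {
  // Priority order: workspace skills, then host-shared, then bundled.
  for (const source of [workspace, shared, bundled]) {
    if (name in source) return source[name];
  }
  return null; // no skill with this name anywhere
}
```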

What makes a skill good

Most weak skills fail because the body reads like marketing copy. The agent needs a runbook with deterministic steps, stop conditions, and a clear output format.

| Element | What good looks like |
|---|---|
| Good skill body | Reads like a checklist you would hand to a tired engineer at 3am. Step-by-step. Stop conditions. Exact output format. |
| Bad skill body | Generic description of what the skill does. No steps. No output spec. Agent improvises = inconsistent results. |
| Good description | Short and specific. If your description overlaps with another skill's, the agent will pick the wrong one. |
| Bad description | "Helps with research tasks." Overlaps with everything. |
| Required fields | name, description, tools. Everything else is optional but recommended. |
The recommended skill build order

For AI consultancy and AgentNZ operations, build skills in this order based on revenue impact:

04

MCP: connecting agents
to everything

The Model Context Protocol (MCP) is an open standard, originally built by Anthropic in November 2024 and now maintained by the Linux Foundation, that solves the N×M integration problem.

Before MCP: every AI agent needed a custom connector to every external tool. GitHub, Slack, Google Drive, Postgres, Airtable — all different implementations. As the number of agents and tools grew, the complexity was unsustainable.

With MCP: every tool builds one server. Every agent connects to any server using the same protocol. N agents + M tools = N+M integrations instead of N×M.
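The arithmetic is worth making concrete. With 5 agents and 12 tools, point-to-point integration needs 60 connectors; MCP needs 17:

```javascript
// Integration count before and after MCP, for an example fleet.
const agents = 5;
const tools = 12;
const withoutMcp = agents * tools; // every agent needs a connector per tool
const withMcp = agents + tools;    // each agent and each tool implements MCP once
console.log(withoutMcp, withMcp);  // 60 vs 17
```

The gap widens as either side grows, which is why the protocol pays off most for teams running many agents against many systems.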

MCP in plain terms

MCP is USB-C for AI agents. Instead of every agent having its own proprietary connector for every tool, MCP gives you one universal plug. Any MCP-compatible agent can connect to any MCP server — GitHub, Slack, Google Drive, HubSpot, Postgres, Xero. Build the server once, use it everywhere.

MCP architecture
| Component | Role |
|---|---|
| MCP Host | The AI application (Claude, OpenClaw, Claude Code, ChatGPT). Initiates connections to MCP servers. |
| MCP Client | Lives inside the host. Maintains a 1:1 connection to each server. Handles JSON-RPC message passing. |
| MCP Server | A small service that exposes tools, resources, and prompts to any connected client. Built once, usable by any host. |

Communication uses JSON-RPC 2.0. The session stays open as long as needed. The agent lists available tools, calls them, and the server returns results. Parallel tool calls are supported in the November 2025 spec.
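On the wire, a tool invocation is a plain JSON-RPC 2.0 request using the spec's `tools/call` method. The tool name and arguments below are invented for illustration:

```javascript
// A JSON-RPC 2.0 request invoking an MCP tool via the tools/call method.
// The tool name and arguments are made up for this example.
const request = {
  jsonrpc: '2.0',
  id: 7,
  method: 'tools/call',
  params: {
    name: 'get_leads',
    arguments: { location: 'Tauranga', limit: 20 },
  },
};
const wire = JSON.stringify(request); // what actually crosses the transport
```

The server replies with a JSON-RPC response carrying the same `id`, so the client can match results to calls even when several are in flight.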

What MCP servers can expose
| Primitive | What it does |
|---|---|
| Tools | Functions the agent can call to take actions. Send an email. Create a task. Query a database. Run a search. |
| Resources | Data the agent can read. Files, database records, API responses, document content. |
| Prompts | Reusable prompt templates the agent can retrieve and use. Useful for standardising workflows. |
Pre-built MCP servers worth installing
| Server | Capability |
|---|---|
| GitHub | Read repos, create issues, manage PRs, search code. Essential for any dev-adjacent work. |
| Google Drive | Read, search, and organise files. Useful for client document management. |
| Slack | Read channels, post messages, search history. Useful for team comms automation. |
| Gmail | Read, send, search emails. Foundation for outreach automation. |
| Google Calendar | Create, read, update events. Useful for booking and scheduling flows. |
| Postgres | Query and write to a Postgres database directly. Useful for lead tracking and CRM. |
| HubSpot | List and create contacts, log engagements, manage pipeline. Sales automation foundation. |
| Puppeteer / Browser | Full browser automation. Screenshot, click, fill forms, scrape. Powerful — use with caution. |
| Airtable | Read and write Airtable bases. Good lightweight CRM alternative. |
How to connect an MCP server to OpenClaw
# In your OpenClaw config (config.yaml or via openclaw settings):
mcp_servers:
  - name: github
    type: stdio
    command: npx
    args: ['-y', '@modelcontextprotocol/server-github']
    env:
      GITHUB_TOKEN: your_github_token

  - name: gmail
    type: url
    url: https://gmail.mcp.claude.com/mcp

  - name: google-calendar
    type: url
    url: https://gcal.mcp.claude.com/mcp

Once connected, your lead agent can list available tools from each server and use them in any skill or sub-agent workflow. No additional code required.

Building a custom MCP server

If you need to connect to a system without an existing server (like a custom Supabase endpoint or Xero), you build one. The MCP SDK makes this straightforward:

// Install: npm install @modelcontextprotocol/sdk
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({ name: 'agentnz-crm', version: '1.0.0' });

// Define a tool (getLeadsFromSupabase is your own data-access helper, not shown)
server.tool(
  'get_leads',
  { location: z.string(), limit: z.number().default(20) },
  async ({ location, limit }) => {
    const leads = await getLeadsFromSupabase(location, limit);
    return { content: [{ type: 'text', text: JSON.stringify(leads) }] };
  }
);

// Start the server
const transport = new StdioServerTransport();
await server.connect(transport);

Claude Sonnet is particularly good at generating MCP server implementations quickly. Give it the API docs for the service you want to connect, and ask it to build the server.

MCP security — what you must know
Security warning

Research in 2025 found thousands of MCP servers exposed to the internet with no authentication. Over-permissioning is the most common failure mode. When Replit's AI agent deleted a production database of 1,200+ records, the root cause was MCP permissions that were too broad. Scope your permissions tightly.

| Rule | Why it matters |
|---|---|
| Use OAuth scopes | Request only the permissions each tool actually needs. Read-only where possible. |
| Never expose servers publicly | Run MCP servers locally or on private infrastructure. Not on the public internet without auth. |
| Audit tool descriptions | Agents trust tool descriptions. Malicious descriptions can cause agents to take unintended actions (prompt injection). |
| Human in the loop | For irreversible actions (delete, send, purchase), require explicit user confirmation before the MCP tool fires. |
| Rotate credentials | Treat MCP server credentials like API keys. Rotate regularly. Never hardcode in prompts. |
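The human-in-the-loop rule can be enforced with a thin wrapper around tool dispatch. A sketch (the irreversible-action list and the `confirm` callback are illustrative, not an OpenClaw feature):

```javascript
// Gate irreversible MCP tool calls behind explicit human confirmation.
// The action list and confirm callback are illustrative, not an OpenClaw API.
const IRREVERSIBLE = new Set(['delete_record', 'send_email', 'make_purchase']);

async function guardedCall(toolName, args, { execute, confirm }) {
  if (IRREVERSIBLE.has(toolName)) {
    const approved = await confirm(`Allow ${toolName} with ${JSON.stringify(args)}?`);
    if (!approved) return { skipped: true, reason: 'user declined' };
  }
  return execute(toolName, args); // reversible calls pass straight through
}
```

Read-only calls never hit the prompt, so the guard adds friction only where a mistake cannot be undone.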
05

Putting it all together:
the full stack

Here is how all four layers — agents, sub-agents, skills, MCP — combine into a working system:

                     KEIRA (approver)
                            |
           THEO (lead agent — Sonnet 4.6 via OpenClaw)
                    SOUL.md + AGENTS.md
                            |
         ___________________|___________________
         |                  |                  |
   Content Agent       Leads Agent       Outreach Agent
    (sub-agent)        (sub-agent)        (sub-agent)
         |                  |                  |
      SKILLS             SKILLS             SKILLS
  content-factory     lead-scraper      outreach-drafter
         |                  |                  |
  ──────────────────── MCP SERVERS ────────────────────
  Gmail · Google Places · LinkedIn · Instagram · Supabase · Xero
Example workflow: real estate lead to meeting

The full workflow from lead discovery to booked meeting, using all four layers:

The milestone ladder

Autonomy is earned through milestones, not assumed upfront. Each milestone unlocks more autonomous operation:

| Milestone | Autonomy level |
|---|---|
| $100 — prove the concept | Manual approval on every outreach and booking. |
| $1,000 — prove repeatability | Can send pre-approved message templates without per-message approval. |
| $10,000 — prove scale | Can initiate lead research and draft outreach autonomously. Approval only at send stage. |
| $100,000 — prove the business | Can run complete lead-to-demo workflows. Review weekly summaries. |
| $1,000,000 — category dominance | Operates marketplace and sub-agent network with minimal intervention. |
Sharp end

The fastest path to first revenue is not building new tools. It is using what already exists (lead scraper, Telegram, research skills) to book one real estate demo this week. Every other build decision should be measured against that single benchmark.

06

Quick reference

Key URLs
| Resource | URL |
|---|---|
| OpenClaw GitHub | github.com/openclaw/openclaw |
| ClawHub (skill registry) | clawhub.ai |
| MCP Documentation | modelcontextprotocol.io |
| MCP Spec (Nov 2025) | modelcontextprotocol.io/specification/2025-11-25 |
| Anthropic multi-agent blog | anthropic.com/engineering/multi-agent-research-system |
Decision tree: what to build next
Does it generate revenue in the next 7 days?
Build it. Book the demo first, automate second.
Is it a new tool before first revenue?
Defer. Validate the business model with manual outreach first.
Is it a skill for an existing tool?
Build it. Skills cost almost nothing to create and have high leverage.
Is it a new MCP server?
Only if you need to connect a specific system that is currently blocking you.
Is it a new sub-agent?
Only if a task is genuinely too large for one context window.
Daily checklist
Keira Nesdale · Miss AI

Building AI-powered businesses and teaching others how to do the same.
realmissai.com · @RealMissAI
