Advanced · Free Guide · April 2026

Building
Multi-Agent
Systems

A practical guide to sub-agents, skills, tools, and MCP integration. Written for OpenClaw agents, but relevant to any agentic system.

Keira Nesdale · Miss AI

By Keira Nesdale · Advanced · 35 min read · April 2026
01

What is a
multi-agent system?

A multi-agent system (MAS) is an architecture where multiple AI agents work together to complete tasks that would be too complex, too slow, or too large for a single agent operating alone. Each agent has its own context window, its own tools, and its own scope of responsibility.

Think of it like a team of specialists versus one generalist. The generalist knows a bit of everything. The specialist team has a coordinator who knows how to delegate, and workers who go deep on their assigned lane.

90.2%
Performance improvement on complex research tasks when using Claude Opus 4 as lead agent + Claude Sonnet 4 as sub-agents, versus a single-agent Claude Opus 4 setup. Token usage alone accounted for 80% of performance variance. Anthropic internal research.
The core architecture

Every multi-agent system has the same structural layers, regardless of framework:

| Layer | What it does |
|---|---|
| Lead Agent | The orchestrator. Receives the task, creates a strategy, delegates to sub-agents, synthesises results. Uses the most capable (and expensive) model. |
| Sub-Agents | Specialist workers. Each assigned a narrow scope. Run in parallel. Have their own tools, prompts, and context windows. |
| Tools | Built-in capabilities: web search, file read/write, browser, shell commands, API calls. |
| Skills | Playbooks that teach an agent how to use tools. A skill is a SKILL.md file: YAML metadata + markdown instructions. |
| Memory | How agents persist information across sessions. Can be static files (MEMORY.md), semantic retrieval, or in-context state. |
| MCP Servers | Standardised connectors to external systems. One server can connect to GitHub, Slack, Google Drive. Any MCP-compatible agent can use them. |
When to use multi-agent vs single agent

Multi-agent systems burn through tokens fast — roughly 15x more than a standard chat interaction. Use them only when the task justifies it.

| Scenario | Recommendation |
|---|---|
| Simple tasks, quick lookups, one clear answer, tight budgets | Use a single agent |
| Broad research, parallel workstreams, tasks too large for one context window, clearly separable subtasks | Use multi-agent |
| 4–5 sub-agents, 5–8 tasks each | Sweet spot. Beyond 5 specialists, coordination overhead cancels out the parallelism benefit. |
Anthropic's scaling rules

Simple fact check: 1 agent, 3–10 tool calls.
Direct comparison: 2–4 sub-agents, 10–15 calls each.
Complex research problem: 10+ sub-agents with clearly divided responsibilities.

Embed these in your lead agent prompt.
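These scaling rules are mechanical enough to encode directly. A minimal sketch in JavaScript (the tier names and object shape are invented for illustration, not an OpenClaw API):

```javascript
// Map task complexity to a delegation plan, per the scaling rules above.
// Tier names and field names are illustrative, not part of any OpenClaw API.
const SCALING_RULES = {
  simple:     { subAgents: 1,  maxToolCallsEach: 10 }, // fact check: 3-10 calls
  comparison: { subAgents: 4,  maxToolCallsEach: 15 }, // 2-4 agents, 10-15 each
  complex:    { subAgents: 10, maxToolCallsEach: 15 }, // 10+ agents
};

function delegationPlan(complexity) {
  const plan = SCALING_RULES[complexity];
  if (!plan) throw new Error(`Unknown complexity tier: ${complexity}`);
  return plan;
}
```

The point is that the lead agent should never improvise team size: it classifies the task first, then reads the plan off a table.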

02

How to build
sub-agents

In OpenClaw (and Claude Code), a sub-agent is not a background process or separate service. It is a clearly defined role that the agent temporarily adopts to perform one specific job under strict rules. You invoke sub-agents by assigning tasks through your lead agent's prompt.

The three building blocks
Block 1: The lead agent prompt

Your lead agent needs explicit instructions on how to decompose tasks and delegate. Without this, sub-agents either duplicate each other's work or leave gaps.

Required elements: a decomposition step, per-agent delegation (objective, boundaries, output format, tool budget), a synthesis step, and explicit scaling rules.

Example lead agent prompt — research task
You are a lead research agent. When given a query:

1. DECOMPOSE: Break the query into 3-5 independent research angles.
2. DELEGATE: Assign one angle to each sub-agent. Give each:
   - A concrete objective (one sentence)
   - Boundaries (what NOT to cover)
   - Required output format (JSON: findings, sources, confidence)
   - Tool budget (max 10 tool calls per sub-agent)
3. SYNTHESISE: Once all sub-agents return, compile results.
   Remove duplicates. Rank by confidence. Write final answer.

Scaling rules:
  Simple fact:    1 sub-agent, max 5 tool calls
  Comparison:     2-3 sub-agents, max 10 tool calls each
  Complex:        up to 8 sub-agents, max 15 tool calls each
Block 2: Sub-agent prompts

Each sub-agent needs four things to work without confusion:

| Element | Description |
|---|---|
| Concrete objective | One sentence. Specific. Measurable. "Find the top 5 real estate agencies in Tauranga by number of listings, as of April 2026." |
| Task boundaries | What this sub-agent should NOT do. Prevents overlap with other sub-agents. |
| Output format | Exact structure the lead agent expects. JSON, markdown table, bullet list. Specify it. |
| Tool guidance | Which tools to use and in what order. "Start with web search, then use browser to verify top results." |
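Because the four elements are always the same, prompt assembly can be mechanical. A minimal sketch (the function and field names are invented for illustration, not an OpenClaw API):

```javascript
// Assemble a sub-agent prompt from the four required elements.
// Function and field names are illustrative, not an OpenClaw API.
function buildSubAgentPrompt({ objective, boundaries, outputFormat, toolGuidance }) {
  const elements = { objective, boundaries, outputFormat, toolGuidance };
  for (const [key, value] of Object.entries(elements)) {
    if (!value) throw new Error(`Missing required element: ${key}`);
  }
  return [
    `OBJECTIVE: ${objective}`,
    `DO NOT: ${boundaries}`,
    `OUTPUT FORMAT: ${outputFormat}`,
    `TOOLS: ${toolGuidance}`,
  ].join('\n');
}
```

Failing loudly on a missing element is deliberate: a sub-agent launched without boundaries or an output format is the main source of duplicated work.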
Block 3: The OODA research loop

Anthropic found the best-performing sub-agents run an explicit OODA loop. Embed this in any sub-agent prompt:

Research loop — repeat until task complete:

OBSERVE:  What information do I have? What gaps remain? What tools are available?
ORIENT:   Which tools and queries would best fill the gaps?
DECIDE:   Choose ONE specific tool call. Start broad, then narrow.
ACT:      Execute the tool call. Record the result. Loop back to OBSERVE.

Stop when: task objective is met OR tool budget is reached.
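The loop above can be sketched as a driver function with a hard budget. The `observe`, `orient`, `act`, and `isDone` callbacks stand in for model reasoning and tool execution; none of this is a real OpenClaw API:

```javascript
// Skeleton of the OODA research loop with a hard tool budget.
// observe / orient / act / isDone are caller-supplied stand-ins for
// model reasoning and tool execution (illustrative only).
function oodaLoop({ observe, orient, act, isDone, toolBudget }) {
  let callsUsed = 0;
  while (callsUsed < toolBudget) {
    const state = observe();              // OBSERVE: what do I have, what gaps remain?
    if (isDone(state)) return { state, callsUsed };
    const toolCall = orient(state);       // ORIENT + DECIDE: choose ONE tool call
    act(toolCall);                        // ACT: execute and record the result
    callsUsed += 1;                       // then loop back to OBSERVE
  }
  return { state: observe(), callsUsed }; // stop: tool budget reached
}
```

Note the two stop conditions from the prompt map directly to the `isDone` check and the budget bound on the loop.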
OpenClaw-specific implementation

In OpenClaw, sub-agents are implemented through the multi-agent routing system. The workspace can route inbound tasks to isolated agents with separate sessions and tool profiles.

Option A: Workspace-level sub-agents

Create a separate OpenClaw workspace for each specialist agent. Each workspace has its own SOUL.md, AGENTS.md, and MEMORY files. The lead agent routes tasks to them via the Gateway's multi-agent routing layer.

~/.openclaw/workspace/
  theo/                    ← Lead agent
    SOUL.md
    AGENTS.md
    MEMORY.md
  content-agent/           ← Sub-agent: content creation only
    SOUL.md
    AGENTS.md
  leads-agent/             ← Sub-agent: lead research and scoring
    SOUL.md
    AGENTS.md
  outreach-agent/          ← Sub-agent: message drafting
    SOUL.md
    AGENTS.md
Option B: Skill-based sub-agent invocation

Define a skill that teaches the lead agent how to invoke sub-agent behaviour within a single session. Lighter weight, better for task-specific delegation within one context window.

# sub-agent-research.skill/SKILL.md
---
name: sub_agent_research
description: Spawn a focused research sub-agent for a specific topic.
---
## When to use
When asked for research that requires multiple independent angles.

## How to execute
1. Decompose the query into 3-5 independent research angles.
2. For each angle, run a focused OODA research loop (max 8 tool calls).
3. Record each angle's findings separately before synthesising.
4. Return: findings per angle, key sources, confidence, final synthesis.

## Output format
Return JSON:
{ angles: [{topic, findings, sources, confidence}], synthesis: string }
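Since the lead agent synthesises from this JSON, it is worth checking the shape before trusting it. A minimal sketch of a validator, matching the output format declared above (the validator itself is illustrative, not part of OpenClaw):

```javascript
// Validate a sub-agent's output against the skill's declared shape:
// { angles: [{topic, findings, sources, confidence}], synthesis: string }
function isValidResearchOutput(raw) {
  let parsed;
  try { parsed = JSON.parse(raw); } catch (e) { return false; }
  if (!parsed || typeof parsed.synthesis !== 'string' || !Array.isArray(parsed.angles)) {
    return false;
  }
  return parsed.angles.every((angle) =>
    typeof angle.topic === 'string' &&
    typeof angle.findings === 'string' &&
    Array.isArray(angle.sources) &&
    typeof angle.confidence === 'number'
  );
}
```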
Common sub-agent mistakes
| Mistake | Fix |
|---|---|
| Vague objectives | Sub-agents with unclear scope duplicate work or leave gaps. Every sub-agent needs a one-sentence, specific objective. |
| No output format | If you don't specify output format, the lead agent can't synthesise cleanly. Always define the expected structure. |
| Too many specialists | Teams larger than 5 specialists hit coordination overhead that cancels out parallelism. Start with 3. |
| Wrong tool for context | An agent searching the web for context that only exists in Slack is doomed. Match tools to where the data actually lives. |
| No scaling rules | Without scaling rules, agents over-invest in simple tasks or under-invest in complex ones. |
03

Skills: teaching agents
how to use tools

In OpenClaw and Claude Code, a skill is not code. It is a folder containing a SKILL.md file — YAML frontmatter for metadata and markdown for instructions. Skills teach the agent how to use tools in a disciplined, repeatable way.

Tools
Capabilities: read a file, run a shell command, call an API, use a browser. Tools have no instructions — they are just hands.
Skills
Playbooks: step-by-step instructions that tell the agent which tools to use, in what order, with what constraints. Skills give the hands a brain.

A tool without a skill is raw capability. A skill without a tool is instructions with no hands.

Anatomy of a SKILL.md file
---
name: lead_scraper
description: Find NZ businesses by profession and location. Returns phone numbers.
requires:
  env: [GOOGLE_PLACES_API_KEY]
  tools: [exec, web]
---
## When to use this skill
When asked to find leads in a specific trade or location.

## How to execute
1. Ask for: profession (e.g. real estate agent), location (e.g. Tauranga).
2. Run the scraper:
   `node scripts/lead-scraper.js --profession '{profession}' --location '{location}'`
3. Output returns businesses with: name, phone, address, website status.
4. Filter: businesses WITHOUT a website are highest priority leads.
5. Format results as a table: Name | Phone | Address | Priority.

## Stop conditions
Stop if Google Places API returns an error. Report the error immediately.

## Output format
Markdown table. Max 20 results. Sorted by priority (no website = HIGH).
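The file format is simple to work with programmatically: YAML frontmatter between `---` fences, markdown body after. A minimal sketch of a parser (a real loader would use a YAML library; this handles only flat `key: value` lines):

```javascript
// Split a SKILL.md file into frontmatter metadata and markdown body.
// Sketch only: handles flat `key: value` frontmatter, not full YAML.
function parseSkill(source) {
  const match = source.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
  if (!match) throw new Error('SKILL.md must start with YAML frontmatter');
  const meta = {};
  for (const line of match[1].split('\n')) {
    const i = line.indexOf(':');
    if (i > 0) meta[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return { meta, body: match[2] };
}
```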
Skill loading priority

OpenClaw loads skills in this order — higher priority overrides lower:

| Priority | Location | Use |
|---|---|---|
| 1 | `<workspace>/.openclaw/skills/` | Per-agent. Highest priority. Use for agent-specific customisations. |
| 2 | `~/.openclaw/skills/` | Shared across all agents on the host. |
| 3 | Bundled skills | Built-in skills provided with OpenClaw. Cannot be modified but can be overridden by workspace or managed skills. |

If a skill with the same name exists in multiple locations, the higher-priority source wins. You can override any bundled skill by creating a workspace skill with the same name.
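Resolution is first match wins down the priority list. A sketch (the lookup maps stand in for the three skill directories; this is not OpenClaw's actual loader):

```javascript
// First-match-wins skill resolution across the three locations.
// The maps stand in for the directories above (illustrative only).
function resolveSkill(name, { workspace = {}, shared = {}, bundled = {} }) {
  // Priority order: workspace skills, then host-shared, then bundled.
  for (const source of [workspace, shared, bundled]) {
    if (name in source) return source[name];
  }
  return null; // no skill with this name anywhere
}
```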

What makes a skill good

Most weak skills fail because the body reads like marketing copy. The agent needs a runbook with deterministic steps, stop conditions, and a clear output format.

| Element | What good looks like |
|---|---|
| Good skill body | Reads like a checklist you would hand to a tired engineer at 3am. Step-by-step. Stop conditions. Exact output format. |
| Bad skill body | Generic description of what the skill does. No steps. No output spec. Agent improvises = inconsistent results. |
| Good description | Short and specific. If your description overlaps with another skill's, the agent will pick the wrong one. |
| Bad description | "Helps with research tasks." Overlaps with everything. |
| Required fields | name, description, tools. Everything else is optional but recommended. |
The recommended skill build order

For AI consultancy and AgentNZ operations, build skills in this order based on revenue impact:

04

MCP: connecting agents
to everything

The Model Context Protocol (MCP) is an open standard, originally built by Anthropic in November 2024 and now maintained by the Linux Foundation, that solves the N×M integration problem.

Before MCP: every AI agent needed a custom connector to every external tool. GitHub, Slack, Google Drive, Postgres, Airtable — all different implementations. As the number of agents and tools grew, the complexity was unsustainable.

With MCP: every tool builds one server. Every agent connects to any server using the same protocol. N agents + M tools = N+M integrations instead of N×M.
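The arithmetic is worth making concrete. With 5 agents and 12 tools, point-to-point integration needs 60 connectors; MCP needs 17:

```javascript
// Integration count before and after MCP, for an example fleet.
const agents = 5;
const tools = 12;
const withoutMcp = agents * tools; // every agent needs a connector per tool
const withMcp = agents + tools;    // each agent and each tool implements MCP once
console.log(withoutMcp, withMcp);  // 60 vs 17
```

The gap widens as either side grows, which is why the protocol pays off most for teams running many agents against many systems.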

MCP in plain terms

MCP is USB-C for AI agents. Instead of every agent having its own proprietary connector for every tool, MCP gives you one universal plug. Any MCP-compatible agent can connect to any MCP server — GitHub, Slack, Google Drive, HubSpot, Postgres, Xero. Build the server once, use it everywhere.

MCP architecture
| Component | Role |
|---|---|
| MCP Host | The AI application (Claude, OpenClaw, Claude Code, ChatGPT). Initiates connections to MCP servers. |
| MCP Client | Lives inside the host. Maintains a 1:1 connection to each server. Handles JSON-RPC message passing. |
| MCP Server | A small service that exposes tools, resources, and prompts to any connected client. Built once, usable by any host. |

Communication uses JSON-RPC 2.0. The session stays open as long as needed. The agent lists available tools, calls them, and the server returns results. Parallel tool calls are supported in the November 2025 spec.
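On the wire, a tool invocation is a plain JSON-RPC 2.0 request using the spec's `tools/call` method. The tool name and arguments below are invented for illustration:

```javascript
// A JSON-RPC 2.0 request invoking an MCP tool via the tools/call method.
// The tool name and arguments are made up for this example.
const request = {
  jsonrpc: '2.0',
  id: 7,
  method: 'tools/call',
  params: {
    name: 'get_leads',
    arguments: { location: 'Tauranga', limit: 20 },
  },
};
const wire = JSON.stringify(request); // what actually crosses the transport
```

The server replies with a JSON-RPC response carrying the same `id`, so the client can match results to calls even when several are in flight.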

What MCP servers can expose
| Primitive | What it does |
|---|---|
| Tools | Functions the agent can call to take actions. Send an email. Create a task. Query a database. Run a search. |
| Resources | Data the agent can read. Files, database records, API responses, document content. |
| Prompts | Reusable prompt templates the agent can retrieve and use. Useful for standardising workflows. |
Pre-built MCP servers worth installing
| Server | Capability |
|---|---|
| GitHub | Read repos, create issues, manage PRs, search code. Essential for any dev-adjacent work. |
| Google Drive | Read, search, and organise files. Useful for client document management. |
| Slack | Read channels, post messages, search history. Useful for team comms automation. |
| Gmail | Read, send, search emails. Foundation for outreach automation. |
| Google Calendar | Create, read, update events. Useful for booking and scheduling flows. |
| Postgres | Query and write to a Postgres database directly. Useful for lead tracking and CRM. |
| HubSpot | List and create contacts, log engagements, manage pipeline. Sales automation foundation. |
| Puppeteer / Browser | Full browser automation. Screenshot, click, fill forms, scrape. Powerful — use with caution. |
| Airtable | Read and write Airtable bases. Good lightweight CRM alternative. |
How to connect an MCP server to OpenClaw
# In your OpenClaw config (config.yaml or via openclaw settings):
mcp_servers:
  - name: github
    type: stdio
    command: npx
    args: ['-y', '@modelcontextprotocol/server-github']
    env:
      GITHUB_TOKEN: your_github_token

  - name: gmail
    type: url
    url: https://gmail.mcp.claude.com/mcp

  - name: google-calendar
    type: url
    url: https://gcal.mcp.claude.com/mcp

Once connected, your lead agent can list available tools from each server and use them in any skill or sub-agent workflow. No additional code required.

Building a custom MCP server

If you need to connect to a system without an existing server (like a custom Supabase endpoint or Xero), you build one. The MCP SDK makes this straightforward:

// Install: npm install @modelcontextprotocol/sdk
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({ name: 'agentnz-crm', version: '1.0.0' });

// Define a tool (getLeadsFromSupabase is your own data-access helper, not shown)
server.tool(
  'get_leads',
  { location: z.string(), limit: z.number().default(20) },
  async ({ location, limit }) => {
    const leads = await getLeadsFromSupabase(location, limit);
    return { content: [{ type: 'text', text: JSON.stringify(leads) }] };
  }
);

// Start the server
const transport = new StdioServerTransport();
await server.connect(transport);

Claude Sonnet is particularly good at generating MCP server implementations quickly. Give it the API docs for the service you want to connect, and ask it to build the server.

MCP security — what you must know
Security warning

Research in 2025 found thousands of MCP servers exposed to the internet with no authentication. Over-permissioning is the most common failure mode. When Replit's AI agent deleted a production database of 1,200+ records, the root cause was MCP permissions that were too broad. Scope your permissions tightly.

| Rule | Why it matters |
|---|---|
| Use OAuth scopes | Request only the permissions each tool actually needs. Read-only where possible. |
| Never expose servers publicly | Run MCP servers locally or on private infrastructure. Not on the public internet without auth. |
| Audit tool descriptions | Agents trust tool descriptions. Malicious descriptions can cause agents to take unintended actions (prompt injection). |
| Human in the loop | For irreversible actions (delete, send, purchase), require explicit user confirmation before the MCP tool fires. |
| Rotate credentials | Treat MCP server credentials like API keys. Rotate regularly. Never hardcode in prompts. |
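The human-in-the-loop rule can be enforced with a thin wrapper around tool dispatch. A sketch (the irreversible-action list and the `confirm` callback are illustrative, not an OpenClaw feature):

```javascript
// Gate irreversible MCP tool calls behind explicit human confirmation.
// The action list and confirm callback are illustrative, not an OpenClaw API.
const IRREVERSIBLE = new Set(['delete_record', 'send_email', 'make_purchase']);

async function guardedCall(toolName, args, { execute, confirm }) {
  if (IRREVERSIBLE.has(toolName)) {
    const approved = await confirm(`Allow ${toolName} with ${JSON.stringify(args)}?`);
    if (!approved) return { skipped: true, reason: 'user declined' };
  }
  return execute(toolName, args); // reversible calls pass straight through
}
```

Read-only calls never hit the prompt, so the guard adds friction only where a mistake cannot be undone.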
05

Putting it all together:
the full stack

Here is how all four layers — agents, sub-agents, skills, MCP — combine into a working system:

                     KEIRA (approver)
                            |
           THEO (lead agent — Sonnet 4.6 via OpenClaw)
                    SOUL.md + AGENTS.md
                            |
         ___________________|___________________
         |                  |                  |
   Content Agent       Leads Agent       Outreach Agent
    (sub-agent)        (sub-agent)        (sub-agent)
         |                  |                  |
      SKILLS             SKILLS             SKILLS
  content-factory     lead-scraper      outreach-drafter
         |                  |                  |
  ──────────────────── MCP SERVERS ────────────────────
  Gmail · Google Places · LinkedIn · Instagram · Supabase · Xero
Example workflow: real estate lead to meeting

The full workflow from lead discovery to booked meeting, using all four layers:

The milestone ladder

Autonomy is earned through milestones, not assumed upfront. Each milestone unlocks more autonomous operation:

| Milestone | Autonomy level |
|---|---|
| $100 — prove the concept | Manual approval on every outreach and booking. |
| $1,000 — prove repeatability | Can send pre-approved message templates without per-message approval. |
| $10,000 — prove scale | Can initiate lead research and draft outreach autonomously. Approval only at send stage. |
| $100,000 — prove the business | Can run complete lead-to-demo workflows. Review weekly summaries. |
| $1,000,000 — category dominance | Operates marketplace and sub-agent network with minimal intervention. |
Sharp end

The fastest path to first revenue is not building new tools. It is using what already exists (lead scraper, Telegram, research skills) to book one real estate demo this week. Every other build decision should be measured against that single benchmark.

06

Quick reference

Key URLs
| Resource | URL |
|---|---|
| OpenClaw GitHub | github.com/openclaw/openclaw |
| ClawHub (skill registry) | clawhub.ai |
| MCP Documentation | modelcontextprotocol.io |
| MCP Spec (Nov 2025) | modelcontextprotocol.io/specification/2025-11-25 |
| Anthropic multi-agent blog | anthropic.com/engineering/multi-agent-research-system |
Decision tree: what to build next
Does it generate revenue in the next 7 days?
Build it. Book the demo first, automate second.
Is it a new tool before first revenue?
Defer. Validate the business model with manual outreach first.
Is it a skill for an existing tool?
Build it. Skills cost almost nothing to create and have high leverage.
Is it a new MCP server?
Only if you need to connect a specific system that is currently blocking you.
Is it a new sub-agent?
Only if a task is genuinely too large for one context window.
Daily checklist
Keira Nesdale · Miss AI

Building AI-powered businesses and teaching others how to do the same.
realmissai.com · @RealMissAI
