Architecture Overview
OpenClaw's architecture is built around a single central concept: one Gateway, many surfaces.
Big picture
```mermaid
flowchart LR
    subgraph Channels["📨 Channels"]
        WA[WhatsApp<br/>via Baileys]
        TG[Telegram<br/>via grammY]
        DC[Discord<br/>via Carbon]
        SL[Slack<br/>via Bolt]
        IM[iMessage<br/>via BlueBubbles]
        WC[WebChat]
    end
    subgraph GW["🏠 Gateway (your machine)"]
        direction TB
        CHAN[Channel Manager]
        SESS[Session Store]
        AGENT[Agent Runtime<br/>pi-mono]
        TOOLS[Tool Engine]
        CONFIG[Config / Auth]
    end
    subgraph Providers["☁️ LLM Providers"]
        ANT[Anthropic]
        OAI[OpenAI]
        OLL[Ollama<br/>local]
        OR[OpenRouter]
    end
    subgraph Clients["💻 Control Clients"]
        MAC[macOS App]
        CLI[CLI]
        WEBUI[Web UI]
    end
    subgraph Nodes["📱 Nodes"]
        IOS[iOS]
        AND[Android]
    end
    Channels <-->|inbound/outbound| CHAN
    CHAN --> SESS
    SESS --> AGENT
    AGENT <-->|tool calls| TOOLS
    AGENT -->|API calls| Providers
    Providers -->|responses| AGENT
    AGENT --> SESS
    SESS --> CHAN
    Clients <-->|WebSocket| GW
    Nodes <-->|WebSocket| GW
```
The Gateway is the hub. Everything else connects to it.
The Gateway
The Gateway is a long-lived Node.js process. Its responsibilities:
- Owns persistent connections to WhatsApp, Telegram bots, Discord servers, etc.
- Routes inbound messages to the right session and agent
- Manages sessions, storing conversation context per-user, per-channel, per-group
- Runs the agent: invokes the LLM, processes tool calls, streams responses
- Exposes a WebSocket API for control clients (macOS app, CLI, Web UI) and nodes (iOS, Android)
- Serves the Canvas, an agent-driven visual workspace
The Gateway listens on:

```
ws://127.0.0.1:18789                          ← WebSocket (control plane)
http://127.0.0.1:18789/__openclaw__/canvas/   ← Canvas UI
```
Component deep dive
Channel Manager
Each channel has a plugin that handles the platform-specific protocol:
| Channel | Library | Notes |
|---|---|---|
| WhatsApp | Baileys | Maintains a WhatsApp Web session (QR scan) |
| Telegram | grammY | Bot API polling or webhook |
| Discord | @buape/carbon | Slash commands + DMs + channels |
| Slack | @slack/bolt | Event API |
| Signal | signal-cli | Requires separate signal-cli daemon |
When a message arrives on any channel, the Channel Manager:
- Validates the sender (allowlist / pairing check)
- Determines the session key
- Hands off to the Session Manager
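The hand-off described above can be sketched as a small contract. This is illustrative only: the interface name, fields, and the allowlist-key format are assumptions, not OpenClaw's actual API.

```typescript
// Hypothetical shape of a normalized inbound message, plus the
// pairing/allowlist gate every channel plugin applies before hand-off.
interface InboundMessage {
  channel: string;                        // "telegram", "discord", ...
  senderId: string;                       // platform-specific sender ID
  chatType: "dm" | "group" | "channel";
  chatId: string;                         // sender ID or group ID
  text: string;
}

// Only approved senders ever reach the Session Manager.
// Allowlist entries are keyed per channel (an assumption made here).
function isApprovedSender(msg: InboundMessage, allowlist: Set<string>): boolean {
  return allowlist.has(`${msg.channel}:${msg.senderId}`);
}
```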
Session Manager
Sessions are the unit of conversation continuity. A session is:
- A unique key (e.g., `agent:main:telegram:dm:821071206`)
- A JSONL transcript file (`~/.openclaw/agents/<id>/sessions/<SessionId>.jsonl`)
- Metadata: token counts, last updated, channel origin
Session keys follow this pattern:
```
agent:<agentId>:<channel>:<type>:<id>
                    │        │     │
                    │        │     └── sender ID or group ID
                    │        └── dm, group, channel
                    └── telegram, discord, whatsapp, etc.
```
When `dmScope` is set to `main` (the default), all direct messages collapse into a single shared session regardless of channel:

```
agent:main:main
```
This is the key you'll see most often in single-user setups.
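The key pattern and the DM collapse can be sketched in a few lines. These helper names are hypothetical, not OpenClaw's actual API; the collapsed `agent:main:main` form follows the example above.

```typescript
// Illustrative session-key helpers for the pattern above.
type SessionKey = {
  agentId: string;
  channel: string;                        // telegram, discord, whatsapp, ...
  type: "dm" | "group" | "channel";
  id: string;                             // sender ID or group ID
};

function buildSessionKey(k: SessionKey, dmScope: "main" | "per-channel" = "main"): string {
  // With dmScope "main" (the default), all DMs share one session.
  if (dmScope === "main" && k.type === "dm") return `agent:${k.agentId}:main`;
  return `agent:${k.agentId}:${k.channel}:${k.type}:${k.id}`;
}

function parseSessionKey(key: string): SessionKey | null {
  const parts = key.split(":");
  if (parts.length !== 5 || parts[0] !== "agent") return null;
  const [, agentId, channel, type, id] = parts;
  return { agentId, channel, type: type as SessionKey["type"], id };
}
```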
Agent Runtime
The agent is built on pi-mono, a coding agent framework. OpenClaw wraps it with:
- Custom tool wiring (browser, canvas, nodes, cron, session tools)
- Workspace bootstrap injection (AGENTS.md, SOUL.md, etc.)
- Skill loading (bundled + managed + workspace skills)
- Compaction (context window management)
- Memory flush (pre-compaction note-writing)
The agent runs one turn at a time per session. If a new message arrives while a turn is running, it queues according to the queue mode (steer/followup/collect).
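The three queue modes can be sketched as a single dispatch function. The exact semantics are assumed from the mode names, not taken from OpenClaw's implementation.

```typescript
// Sketch of per-session queueing while a turn is running.
// Semantics here are assumptions inferred from the mode names.
type QueueMode = "steer" | "followup" | "collect";

function enqueue(pending: string[], incoming: string, mode: QueueMode): string[] {
  switch (mode) {
    case "steer":    // redirect the in-flight turn: keep only the newest message
      return [incoming];
    case "followup": // run each queued message as its own turn, in order
      return [...pending, incoming];
    case "collect":  // merge everything into one combined follow-up turn
      return [[...pending, incoming].join("\n")];
  }
}
```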
Tool Engine
Tools are what give the agent its power. Core tools (always available):
| Tool | What it does |
|---|---|
| `read` | Read file contents |
| `write` | Write/create files |
| `edit` | Precise text replacement in files |
| `exec` | Run shell commands (with sandboxing) |
| `browser` | Control a browser (Playwright) |
| `canvas` | Control the visual Canvas |
| `nodes` | Control paired iOS/Android/headless nodes |
| `message` | Send messages to channels |
| `web_search` | Web search |
| `web_fetch` | Fetch URL content |
| `image` | Analyze images |
| `tts` | Text-to-speech |
Skills can register additional tools.
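A tool engine of this shape reduces to a registry mapping names to handlers, with skills adding entries at load time. This class is a minimal sketch, not OpenClaw's actual API.

```typescript
// Minimal tool registry: core tools and skill-registered extras
// share one namespace; the agent dispatches tool calls by name.
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

class ToolRegistry {
  private tools = new Map<string, ToolHandler>();

  register(name: string, handler: ToolHandler): void {
    this.tools.set(name, handler);
  }

  has(name: string): boolean {
    return this.tools.has(name);
  }

  async execute(name: string, args: Record<string, unknown>): Promise<string> {
    const handler = this.tools.get(name);
    if (!handler) throw new Error(`unknown tool: ${name}`);
    return handler(args);
  }
}
```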
WebSocket control plane
Clients (macOS app, CLI, Web UI) connect to the Gateway via WebSocket. The protocol is typed and schema-validated:
```mermaid
sequenceDiagram
    CLI->>Gateway: {type:"req", method:"connect", params:{...}}
    Gateway-->>CLI: {type:"res", ok:true, payload:{snapshot:...}}
    CLI->>Gateway: {type:"req", method:"agent", params:{message:"Hello"}}
    Gateway-->>CLI: {type:"res", ok:true, payload:{runId:"...", status:"accepted"}}
    Gateway-->>CLI: {type:"event", event:"agent", payload:{delta:"Hi! I'm..."}}
    Gateway-->>CLI: {type:"event", event:"agent", payload:{done:true}}
```
The protocol uses:
- Requests paired with Responses (matched by request ID)
- Events (server push, not tied to a request)
- Idempotency keys on mutating requests to safely retry
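Request/response pairing boils down to a table of pending promises keyed by request ID. The frame shapes below are inferred from the example exchange; the `idempotencyKey` field is shown but retry deduplication is omitted.

```typescript
// Sketch of client-side request/response matching over the WebSocket.
type Frame =
  | { type: "req"; id: string; method: string; params: unknown; idempotencyKey?: string }
  | { type: "res"; id: string; ok: boolean; payload?: unknown }
  | { type: "event"; event: string; payload: unknown };

class RequestTable {
  private pending = new Map<string, (payload: unknown) => void>();
  private nextId = 0;

  // Send a req frame and return a promise resolved by the matching res.
  request(method: string, params: unknown, send: (f: Frame) => void): Promise<unknown> {
    const id = String(this.nextId++);
    send({ type: "req", id, method, params });
    return new Promise((resolve) => this.pending.set(id, resolve));
  }

  onFrame(frame: Frame): void {
    if (frame.type === "res") {
      this.pending.get(frame.id)?.(frame.payload);
      this.pending.delete(frame.id);
    }
    // "event" frames are server push, not tied to any pending request.
  }
}
```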
Nodes: the mobile extension
Nodes (iOS, Android, macOS) connect to the same WebSocket as clients, but declare role: "node". They provide:
- Camera feeds (photos/video)
- Screen recording
- Location
- Canvas rendering (the visual workspace)
- Voice input/output (Talk Mode)
Nodes are paired through a one-time approval flow. Once paired, they get a device token for subsequent connections.
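The pairing flow reduces to minting a token on approval and checking it on reconnect. This store is a sketch under assumed semantics; the token format and persistence are not OpenClaw's actual implementation.

```typescript
// One-time approval mints a device token; later connections present it.
import { randomBytes } from "crypto";

class PairingStore {
  private tokens = new Map<string, string>(); // token -> deviceId

  approve(deviceId: string): string {
    const token = randomBytes(16).toString("hex");
    this.tokens.set(token, deviceId);
    return token;
  }

  // Returns the paired device ID, or null for an unknown token.
  authenticate(token: string): string | null {
    return this.tokens.get(token) ?? null;
  }
}
```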
State storage
All state lives in `~/.openclaw/`:

```
~/.openclaw/
├── openclaw.json                    ← Main config
├── agents/
│   └── main/
│       ├── sessions/
│       │   ├── sessions.json        ← Session registry
│       │   └── <SessionId>.jsonl    ← Conversation transcript
│       └── pairing/
│           └── store.json           ← Paired devices
├── skills/                          ← Managed skills
└── auth/                            ← OAuth tokens, API keys cache
```
No external database. No Redis. No PostgreSQL. Just files.
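JSONL transcripts are trivial to read back: one JSON object per line. A minimal reader, assuming one entry per line (the actual entry schema is not shown here):

```typescript
// Parse a JSONL transcript string into an array of entries.
function parseJsonl(text: string): unknown[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")   // skip blank lines, trailing newline
    .map((line) => JSON.parse(line));
}
```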
Security boundaries
```mermaid
flowchart LR
    INTERNET["Internet<br/>(Untrusted)"]
    CHANNEL["Channel<br/>(e.g. Telegram)"]
    GW["Gateway<br/>(Trusted)"]
    LLM["LLM Provider<br/>(Semi-trusted)"]
    TOOLS["Tools<br/>(Elevated trust)"]
    INTERNET -->|"message from anyone"| CHANNEL
    CHANNEL -->|"pairing check +<br/>allowlist filter"| GW
    GW -->|"sanitized context"| LLM
    LLM -->|"tool call requests"| GW
    GW -->|"sandboxed exec"| TOOLS
```
The security model in brief:
- Inbound messages are untrusted by default, wrapped in `EXTERNAL_UNTRUSTED_CONTENT` markers
- Pairing and allowlists gate who can talk to your agent
- Tool policy controls what the agent can execute (sandbox vs elevated)
- The LLM provider is trusted for tool call structure, but not for prompt content
Security gets its own module (Module 6).
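Marker-wrapping inbound text can be sketched as below. The exact marker syntax is an assumption here; only the `EXTERNAL_UNTRUSTED_CONTENT` name comes from the list above.

```typescript
// Wrap untrusted inbound text so the model can distinguish it from
// trusted instructions. Marker delimiters are assumed, not verbatim.
const OPEN = "<<EXTERNAL_UNTRUSTED_CONTENT>>";
const CLOSE = "<</EXTERNAL_UNTRUSTED_CONTENT>>";

function wrapUntrusted(text: string): string {
  return `${OPEN}\n${text}\n${CLOSE}`;
}
```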
How it all fits together
When you send a message on Telegram:
1. Telegram delivers it to the Gateway's grammY bot
2. The Gateway checks: is this sender approved? (pairing/allowlist)
3. The Channel Manager normalizes it to an internal `InboundMessage`
4. The Session Manager finds (or creates) the session for this conversation
5. The Agent adds the message to the session transcript, calls the LLM
6. The LLM responds with text or tool call requests
7. The Tool Engine executes requested tool calls (sandboxed)
8. The Agent feeds tool results back to the LLM, continues until done
9. The Channel Manager delivers the final response back to Telegram
10. The transcript is updated with the full turn
This all happens in memory — no round-trips to a database. Fast and local.
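The steps above compose into a single in-process pipeline. Every function here is a stub standing in for a real component (the real ones are async and stateful); it only shows the flow's shape.

```typescript
// Stub pipeline mirroring the numbered steps: check, normalize,
// run the agent loop, deliver. All placeholders, for illustration.
const checkApproved = (msg: string) => msg;            // pairing/allowlist gate
const normalize = (msg: string) => `inbound:${msg}`;   // build the InboundMessage
const runAgent = (msg: string) => `reply to ${msg}`;   // LLM + tool loop until done
const deliver = (msg: string) => msg;                  // back out through the channel

const handleTelegramMessage = (text: string) =>
  deliver(runAgent(normalize(checkApproved(text))));
```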
Summary
| Layer | Technology | Role |
|---|---|---|
| Channels | Platform-specific libs | Inbound/outbound messaging |
| Gateway | Node.js, HTTP, WS | Control plane, session mgmt |
| Agent | pi-mono runtime | LLM calls, tool execution |
| Storage | JSONL files | Session transcripts, state |
| Control | WebSocket API | CLI, macOS app, Web UI |
| Nodes | WS + device pairing | Mobile/companion devices |
The next lesson traces a single message through the entire system, step by step.