Voice channels
Call a phone number and talk to your own Claude Code or openclaw agent — in its own context, with its tools and memory — over Saperly's manual mode. No audio ever reaches your machine.
Call a phone number and talk to YOUR coding agent. A voice channel makes the Claude Code or openclaw session you already have open phone-reachable, in its own context — with its tools, its memory, its whole session. The caller speaks; the transcribed turn is injected into your running agent as input; the agent replies in-session; the reply is spoken back over the phone. Tool calls, DTMF, transfers, and "say a final line then hang up" all work.
No audio ever reaches your machine — speech-to-text and text-to-speech run in-network, and only text crosses to your agent. A small connector (a Claude Code channel / an openclaw extension) holds a Saperly manual-mode websocket and bridges it to your agent. The agent connects out, so it needs no public URL.
How it works
Caller to agent: ☎ caller → speech → speech-to-text (in-network) → text → Saperly → manual-mode websocket → connector (channel/plugin) → turn injected as input → YOUR agent (its tools + memory)
Agent to caller: YOUR agent → reply tool → connector → directive → Saperly → text-to-speech (in-network) → speech → ☎ caller
Audio never touches your machine · speech-to-text + text-to-speech stay in-network.
The connector holds one websocket per Saperly connection and multiplexes
every live call on it. Each caller turn carries a request_id; your reply must
echo it so the right call gets the right directive. The wire contract is the
manual-mode protocol — the connectors just adapt it
to your agent's native event + tool surface.
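To make the multiplexing concrete, here is a minimal sketch of how a connector might track open turns on its single websocket, so that an echoed request_id routes a reply back to the right call. The TurnRouter class and its PendingTurn fields are hypothetical illustrations, not the connectors' actual code; the real event and directive formats are defined by the manual-mode protocol.

```typescript
// One pending entry per caller turn that still awaits a reply.
// Hypothetical shape: field names are illustrative, not the wire format.
interface PendingTurn {
  requestId: string;
  callId: string;
  receivedAt: number; // ms timestamp when the turn arrived
}

// Tracks the open turns of every live call sharing one websocket.
class TurnRouter {
  private pending = new Map<string, PendingTurn>();

  // Record a caller turn as it arrives on the shared socket.
  onTurn(requestId: string, callId: string, now: number = Date.now()): void {
    this.pending.set(requestId, { requestId, callId, receivedAt: now });
  }

  // The echoed request_id selects the call a reply belongs to. Returns
  // undefined for an unknown or already-answered request_id.
  resolve(requestId: string): PendingTurn | undefined {
    const turn = this.pending.get(requestId);
    if (turn !== undefined) this.pending.delete(requestId);
    return turn;
  }

  get openTurns(): number {
    return this.pending.size;
  }
}

// Two calls interleaved on one connection.
const router = new TurnRouter();
router.onTurn("req_a", "call_1");
router.onTurn("req_b", "call_2");
console.log(router.resolve("req_b")?.callId); // call_2
console.log(router.resolve("req_b"));         // undefined (already answered)
console.log(router.openTurns);                // 1
```

In this sketch, deleting the entry on resolve means a duplicated or stale request_id is caught locally instead of being applied to a call that has already moved on.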
Prerequisites
- Bun — the connectors are Bun packages.
- A Saperly manual connection and its manualSecret (from the dashboard: Connections → your manual connection). See Connections.
- A phone number pointed at that connection (Numbers page → set the line's handler to your manual connection). See Numbers.
The base URL accepts https://… (derives wss), http://localhost:8787
(derives ws), or a bare host (defaults to wss).
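The scheme rule can be sketched as a small helper, assuming it only rewrites the scheme and keeps the host and port as given. toWebSocketBase is a hypothetical name, and whatever websocket path the connectors append is not shown here.

```typescript
// Derive the websocket base from a Saperly base URL, per the rule above:
//   https://… → wss://…, http://… → ws://…, bare host → wss://…
// Hypothetical helper; the connector's real websocket path is not included.
function toWebSocketBase(baseUrl: string): string {
  const trimmed = baseUrl.trim().replace(/\/+$/, ""); // drop trailing slashes
  if (trimmed.startsWith("https://")) return "wss://" + trimmed.slice("https://".length);
  if (trimmed.startsWith("http://")) return "ws://" + trimmed.slice("http://".length);
  return "wss://" + trimmed; // a bare host defaults to wss
}

console.log(toWebSocketBase("https://api.saperly.com")); // wss://api.saperly.com
console.log(toWebSocketBase("http://localhost:8787"));   // ws://localhost:8787
console.log(toWebSocketBase("api.saperly.com"));         // wss://api.saperly.com
```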
Call your agent — the demo
Provision the line. Create a manual connection, copy its manualSecret, and
point a phone number at it.
Configure and launch the connector for your agent (Claude Code or openclaw —
see below). On open it connects the websocket and sends hello; you'll see a log
line confirming your agent is now phone-reachable.
Call the number. Your agent receives an inbound_call event and greets the caller
by replying with a speak directive.
Talk. Each thing you say arrives as a caller-said turn. The agent answers in-session — running tools, reading memory — and replies. Answer within ~18s or the caller hears a brief hold line (Saperly falls back at ~20s).
Hang up. A call_ended event closes out the call.
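The demo's call lifecycle maps onto a small event handler. In this sketch only the event names inbound_call and call_ended and the speak reply come from the steps above; the event shapes, the caller_said type name, and the handle function are hypothetical stand-ins for the manual-mode protocol, and the answer callback stands in for your agent replying in-session.

```typescript
// Hypothetical event shapes for illustration; the real manual-mode
// wire format is defined by the protocol, not by these types.
type CallEvent =
  | { type: "inbound_call"; callId: string; requestId: string }
  | { type: "caller_said"; callId: string; requestId: string; text: string }
  | { type: "call_ended"; callId: string };

// Mirrors the reply tool's speak arguments.
interface SpeakReply {
  request_id: string; // echoed from the turn
  kind: "speak";
  text: string;
  end_call?: boolean;
}

// Calls currently in progress on this connection.
const liveCalls = new Set<string>();

// Returns the reply for an event, or null when no reply is needed.
function handle(event: CallEvent, answer: (said: string) => string): SpeakReply | null {
  if (event.type === "inbound_call") {
    liveCalls.add(event.callId);
    // Greet the caller with a speak directive.
    return { request_id: event.requestId, kind: "speak", text: "Hi! How can I help?" };
  }
  if (event.type === "caller_said") {
    return { request_id: event.requestId, kind: "speak", text: answer(event.text) };
  }
  // call_ended: nothing to reply to, just forget the call.
  liveCalls.delete(event.callId);
  return null;
}

// Usage: one short call.
handle({ type: "inbound_call", callId: "call_1", requestId: "req_1" }, (s) => s);
const reply = handle(
  { type: "caller_said", callId: "call_1", requestId: "req_2", text: "ping" },
  (s) => `You said ${s}.`,
);
console.log(reply?.text);     // You said ping.
handle({ type: "call_ended", callId: "call_1" }, (s) => s);
console.log(liveCalls.size);  // 0
```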
Claude Code
The saperly-voice connector is a Claude Code channel (an MCP server, per the
channels reference) that also holds the Saperly websocket.
Configure. In a Claude Code session, run the configure command — it writes
~/.claude/channels/saperly-voice/.env:
/saperly-voice:configure baseUrl=https://api.saperly.com connectionId=conn_123 secret=mc_abc123
Or set the env vars directly before launch: SAPERLY_BASE_URL,
SAPERLY_CONNECTION_ID, SAPERLY_MANUAL_SECRET, and optional SAPERLY_CLIENT
(a label for the event trail).
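The env-var route can be expressed as a small fail-fast loader. This is a sketch: loadSaperlyConfig and the SaperlyConfig field names are hypothetical, and only the four variable names come from the text above. It takes the environment as a plain object so it works with process.env or Bun.env.

```typescript
// Shape of the settings the connector needs (field names are illustrative).
interface SaperlyConfig {
  baseUrl: string;
  connectionId: string;
  manualSecret: string;
  client?: string; // optional label for the event trail
}

type Env = Record<string, string | undefined>;

const REQUIRED = ["SAPERLY_BASE_URL", "SAPERLY_CONNECTION_ID", "SAPERLY_MANUAL_SECRET"] as const;

// Reads the documented variables and fails fast, naming every missing one.
function loadSaperlyConfig(env: Env): SaperlyConfig {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`missing env vars: ${missing.join(", ")}`);
  }
  const config: SaperlyConfig = {
    baseUrl: env["SAPERLY_BASE_URL"] as string,
    connectionId: env["SAPERLY_CONNECTION_ID"] as string,
    manualSecret: env["SAPERLY_MANUAL_SECRET"] as string,
  };
  const client = env["SAPERLY_CLIENT"];
  if (client) config.client = client;
  return config;
}

// Usage with an explicit object; in Bun you could pass Bun.env instead.
const cfg = loadSaperlyConfig({
  SAPERLY_BASE_URL: "https://api.saperly.com",
  SAPERLY_CONNECTION_ID: "conn_123",
  SAPERLY_MANUAL_SECRET: "mc_abc123",
});
console.log(cfg.connectionId); // conn_123
```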
Launch. During the channels research preview, a custom channel needs the development flag:
claude --dangerously-load-development-channels plugin:saperly-voice
Answer calls. Each call shows up in your session as a channel event:
<channel source="saperly-voice" kind="inbound_call" request_id="…" …>
Each caller utterance arrives as 📞 Caller said: …, and you answer with the
reply tool, echoing the request_id:
{
"request_id": "req_…", // from the <channel> tag — echo it verbatim
"kind": "speak", // speak | wait | hangup | transfer | send_dtmf
"text": "…", // required for speak
"end_call": false, // speak: hang up after the line plays
"to": "+1…", // required for transfer (E.164 or SIP URI)
"digits": "123#", // required for send_dtmf
"timeout_ms": 5000, // wait: optional gather timeout
"reason": "…" // hangup: optional
}
For unattended use (so a call's tool calls don't block on a permission prompt
while you're away), pair the launch with Claude Code's
--dangerously-skip-permissions, but only in environments you trust.
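The reply arguments are easier to get right when each kind carries only its own fields. Below is a sketch of them as a TypeScript discriminated union; the ReplyArgs type and the sayFinalLine and transferTo helpers are hypothetical, while the field names and per-kind requirements follow the argument list above.

```typescript
// The reply tool's arguments as a discriminated union: each kind carries
// only the fields documented for it.
type ReplyArgs =
  | { request_id: string; kind: "speak"; text: string; end_call?: boolean }
  | { request_id: string; kind: "wait"; timeout_ms?: number }
  | { request_id: string; kind: "hangup"; reason?: string }
  | { request_id: string; kind: "transfer"; to: string } // E.164 or SIP URI
  | { request_id: string; kind: "send_dtmf"; digits: string };

// "Say a final line, then hang up" is a speak with end_call: true.
function sayFinalLine(requestId: string, text: string): ReplyArgs {
  return { request_id: requestId, kind: "speak", text, end_call: true };
}

// A transfer needs only its destination.
function transferTo(requestId: string, to: string): ReplyArgs {
  return { request_id: requestId, kind: "transfer", to };
}

// A speak without text would not type-check:
// const bad: ReplyArgs = { request_id: "req_1", kind: "speak" };

console.log(JSON.stringify(sayFinalLine("req_1", "Thanks for calling. Goodbye!")));
// {"request_id":"req_1","kind":"speak","text":"Thanks for calling. Goodbye!","end_call":true}
```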
openclaw
The saperly-voice-openclaw connector is an openclaw extension loaded by the
Gateway. It registers the saperly_voice_reply agent tool and holds the Saperly
websocket; each call routes to a stable per-call session
saperly-voice:<call_id>, so a multi-turn call stays in one conversation.
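The per-call routing can be pictured as grouping turns by session key. sessionKeyFor and the sample turns are hypothetical; only the saperly-voice:<call_id> key format comes from the text. Because every turn of a call maps to the same key, interleaved calls never share a conversation.

```typescript
// Per-call session key, following the saperly-voice:<call_id> format above.
function sessionKeyFor(callId: string): string {
  return `saperly-voice:${callId}`;
}

// Illustrative turns from two overlapping calls on one connection.
const turns = [
  { callId: "call_1", text: "hi" },
  { callId: "call_2", text: "hello" },
  { callId: "call_1", text: "what's next on my list?" },
];

// Group each turn under its call's session: one conversation per call.
const sessions = new Map<string, string[]>();
for (const turn of turns) {
  const key = sessionKeyFor(turn.callId);
  const history = sessions.get(key) ?? [];
  history.push(turn.text);
  sessions.set(key, history);
}

console.log(sessions.size);                                // 2
console.log(sessions.get("saperly-voice:call_1")?.length); // 2
```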
Configure under plugins.entries.saperly-voice.config in openclaw.json5
(the plugin id is saperly-voice; the npm package is
saperly-voice-openclaw):
{
plugins: {
enabled: true,
allow: ["saperly-voice"],
entries: {
"saperly-voice": {
enabled: true,
config: {
baseUrl: "https://api.saperly.com",
connectionId: "conn_123",
// Prefer the env var for the secret:
// manualSecret: "mc_…",
}
}
}
}
}
Or via env (env wins, so the secret can stay out of the file): SAPERLY_BASE_URL,
SAPERLY_CONNECTION_ID, SAPERLY_MANUAL_SECRET, optional SAPERLY_CLIENT.
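"Env wins" amounts to a per-key precedence merge. A sketch, assuming the file config uses the camelCase keys shown above and the env uses the documented variable names; resolveConfig is hypothetical, and treating an empty variable as unset is an assumption of this sketch.

```typescript
// Settings as they might appear under plugins.entries.saperly-voice.config.
interface PluginConfig {
  baseUrl?: string;
  connectionId?: string;
  manualSecret?: string;
  client?: string;
}

type Env = Record<string, string | undefined>;

// Env wins: a set (non-empty) variable overrides the file value for that key.
function resolveConfig(file: PluginConfig, env: Env): PluginConfig {
  const pick = (fromEnv: string | undefined, fromFile: string | undefined) =>
    fromEnv ? fromEnv : fromFile;
  return {
    baseUrl: pick(env["SAPERLY_BASE_URL"], file.baseUrl),
    connectionId: pick(env["SAPERLY_CONNECTION_ID"], file.connectionId),
    manualSecret: pick(env["SAPERLY_MANUAL_SECRET"], file.manualSecret),
    client: pick(env["SAPERLY_CLIENT"], file.client),
  };
}

// The file omits the secret; the env supplies it.
const resolved = resolveConfig(
  { baseUrl: "https://api.saperly.com", connectionId: "conn_123" },
  { SAPERLY_MANUAL_SECRET: "mc_abc123" },
);
console.log(resolved.connectionId, resolved.manualSecret); // conn_123 mc_abc123
```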
Answer calls. Each caller turn is injected as a next-turn input; the agent
replies with the saperly_voice_reply tool, echoing the request_id. The
arguments are identical to the Claude Code reply tool above (kind is
speak | wait | hangup | transfer | send_dtmf).
This is NOT the openclaw voice-call plugin
openclaw also ships a voice-call plugin that streams raw call audio to a
realtime provider and holds a media socket for the call's duration — which Saperly
deliberately avoids. saperly-voice-openclaw binds the manual-mode websocket
instead: signaling, speech-to-text, and text-to-speech stay in-network, and only
text turns reach your process.
Reply / directive reference
Every call to reply (Claude Code) or saperly_voice_reply (openclaw) maps to exactly one
manual-mode directive. Invalid args (e.g. a speak with no text, or an unknown
kind) are rejected by the tool with a correctable message and never sent to the
live call.
| kind | Effect | Fields |
|---|---|---|
| speak | Say a line, then keep listening | text (required); end_call: true to say a final line then hang up |
| wait | Listen without speaking | timeout_ms? |
| hangup | End the call | reason? |
| transfer | Transfer the call | to (E.164 or SIP URI) |
| send_dtmf | Send touch-tones | digits |
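A sketch of the kind of check described above: reject bad args with a message the agent can act on, before anything reaches the call. validateReply, isKind, and the message wording are hypothetical; the required field per kind follows the table.

```typescript
const KINDS = ["speak", "wait", "hangup", "transfer", "send_dtmf"] as const;
type Kind = (typeof KINDS)[number];

// Fields the table marks as required for each kind.
const REQUIRED_FIELD: Partial<Record<Kind, string>> = {
  speak: "text",
  transfer: "to",
  send_dtmf: "digits",
};

function isKind(value: unknown): value is Kind {
  return typeof value === "string" && (KINDS as readonly string[]).includes(value);
}

// Returns null if the args can become a directive, else a correctable message.
// Nothing is sent to the call when a message is returned.
function validateReply(args: Record<string, unknown>): string | null {
  const requestId = args["request_id"];
  if (typeof requestId !== "string" || requestId === "") {
    return "request_id is required: echo it verbatim from the incoming turn";
  }
  const kind = args["kind"];
  if (!isKind(kind)) {
    return `unknown kind ${JSON.stringify(kind)}: use one of ${KINDS.join(" | ")}`;
  }
  const field = REQUIRED_FIELD[kind];
  if (field !== undefined) {
    const value = args[field];
    if (typeof value !== "string" || value.trim() === "") {
      return `${kind} requires a non-empty ${field}`;
    }
  }
  return null;
}

console.log(validateReply({ request_id: "req_1", kind: "speak" }));
// speak requires a non-empty text
console.log(validateReply({ request_id: "req_1", kind: "hangup" })); // null
```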
Always echo the request_id from the incoming turn, and answer within ~18s
or the caller hears a short hold line.
Text in, directives out
A voice channel only ever moves text in and directives out. Speech-to-text
and text-to-speech run in-network; no call audio ever reaches your machine. That
is the difference from a media-path bridge (such as openclaw's voice-call
plugin), which streams raw call audio to a realtime provider. See
Core concepts.
Next steps
- Manual mode — the underlying event/directive protocol these connectors speak.
- Connections — creating the manual connection and finding its manualSecret.
- Voice — calls, lifecycle, and outbound calling.
- Core concepts — the Saperly model and how a call flows.