Voice channels

Call a phone number and talk to your own Claude Code or openclaw agent — in its own context, with its tools and memory — over Saperly's manual mode. No audio ever reaches your machine.

A voice channel makes the Claude Code or openclaw session you already have open reachable by phone, in its own context: its tools, its memory, its whole session. The caller speaks; the transcribed turn is injected into your running agent as input; the agent replies in-session; the reply is spoken back over the phone. Tool calls, DTMF, transfers, and "say a final line then hang up" all work.

No audio ever reaches your machine: speech-to-text and text-to-speech run in-network, and only text crosses to your agent. A small connector (a Claude Code channel or an openclaw extension) holds a Saperly manual-mode websocket and bridges it to your agent. The connector dials out to Saperly, so your machine needs no public URL.

How it works

  ☎ caller                                                     ☎ caller
     │  speech                                            speech  ▲
     ▼                                                            │
  speech-to-text ──text──▶  Saperly  ──────▶ text-to-speech
   (in-network)                 │   manual-mode websocket    ▲  (in-network)
                                ▼                            │ directive
                       ┌──────────────────┐                 │
                       │   connector      │                 │
                       │ (channel/plugin) │─────────────────┘
                       └──────────────────┘   reply tool → directive
                                │  turn injected as input

                    YOUR agent (its tools + memory)

  audio never touches your machine  ·  speech-to-text + text-to-speech stay in-network

The connector holds one websocket per Saperly connection and multiplexes every live call on it. Each caller turn carries a request_id; your reply must echo it so the right call gets the right directive. The wire contract is the manual-mode protocol — the connectors just adapt it to your agent's native event + tool surface.
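The request_id correlation described above can be sketched as a small lookup table. This is an illustration only, not the connectors' actual code: the CallerTurn and Reply shapes, the call_id field name, and the PendingTurns class are all hypothetical. Only request_id and the directive kinds come from this guide.

```typescript
// Hypothetical shapes. Only request_id and the kind values appear in this guide.
interface CallerTurn {
  request_id: string;
  call_id: string; // assumed field name for the call a turn belongs to
  text: string;
}

interface Reply {
  request_id: string;
  kind: "speak" | "wait" | "hangup" | "transfer" | "send_dtmf";
  text?: string;
}

// Tracks which call each outstanding turn belongs to, so that many live
// calls can share one websocket without replies crossing over.
class PendingTurns {
  private pending = new Map<string, string>(); // request_id -> call_id

  // Record a turn when it arrives on the shared websocket.
  track(turn: CallerTurn): void {
    this.pending.set(turn.request_id, turn.call_id);
  }

  // Match a reply to its call. Returns the call_id, or undefined when the
  // request_id is unknown or has already been answered.
  settle(reply: Reply): string | undefined {
    const callId = this.pending.get(reply.request_id);
    if (callId !== undefined) this.pending.delete(reply.request_id);
    return callId;
  }
}
```

A reply whose request_id does not match an outstanding turn resolves to undefined here, which is why the id has to be echoed verbatim.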

Prerequisites

  • Bun — the connectors are Bun packages.
  • A Saperly manual connection and its manualSecret (from the dashboard: Connections → your manual connection). See Connections.
  • A phone number pointed at that connection (Numbers page → set the line's handler to your manual connection). See Numbers.

The base URL accepts https://… (derives wss), http://localhost:8787 (derives ws), or a bare host (defaults to wss).
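As a rough sketch of that scheme derivation: the function name deriveWebSocketBase is hypothetical, and two behaviours are assumptions not stated in this guide, namely that any http:// URL (not only localhost) maps to ws, and that a value already starting with ws:// or wss:// passes through unchanged. The websocket path that the connector appends is not shown here.

```typescript
// Hypothetical helper mirroring the rule above:
//   https://…   -> wss://…
//   http://…    -> ws://…   (the guide only names http://localhost:8787)
//   bare host   -> wss://host
function deriveWebSocketBase(baseUrl: string): string {
  // Drop surrounding whitespace and any trailing slashes.
  const trimmed = baseUrl.trim().replace(/\/+$/, "");

  if (trimmed.startsWith("https://")) {
    return "wss://" + trimmed.slice("https://".length);
  }
  if (trimmed.startsWith("http://")) {
    return "ws://" + trimmed.slice("http://".length);
  }
  // Assumption: an explicit websocket scheme is left as-is.
  if (trimmed.startsWith("wss://") || trimmed.startsWith("ws://")) {
    return trimmed;
  }
  // Bare host: default to the secure scheme.
  return "wss://" + trimmed;
}
```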

Call your agent — the demo

Provision the line. Create a manual connection, copy its manualSecret, and point a phone number at it.

Configure and launch the connector for your agent (Claude Code or openclaw — see below). On open it connects the websocket and sends hello; you'll see a log line confirming your agent is now phone-reachable.

Call the number. Your agent receives an inbound_call event and greets the caller by replying with a speak directive.

Talk. Each thing you say arrives as a caller-said turn. The agent answers in-session — running tools, reading memory — and replies. Answer within ~18s or the caller hears a brief hold line (Saperly falls back at ~20s).

Hang up. A call_ended event closes the turn out.

Claude Code

The saperly-voice connector is a Claude Code channel (an MCP server, per the channels reference) that also holds the Saperly websocket.

Configure. In a Claude Code session, run the configure command — it writes ~/.claude/channels/saperly-voice/.env:

/saperly-voice:configure baseUrl=https://api.saperly.com connectionId=conn_123 secret=mc_abc123

Or set the env vars directly before launch: SAPERLY_BASE_URL, SAPERLY_CONNECTION_ID, SAPERLY_MANUAL_SECRET, and optional SAPERLY_CLIENT (a label for the event trail).

Launch. During the channels research preview, a custom channel needs the development flag:

claude --dangerously-load-development-channels plugin:saperly-voice

Answer calls. Each call shows up in your session as a channel event:

<channel source="saperly-voice" kind="inbound_call" request_id="…" …>

Each caller utterance arrives as 📞 Caller said: …. You answer with the reply tool, echoing the request_id:

{
  "request_id": "req_…",   // from the <channel> tag — echo it verbatim
  "kind": "speak",          // speak | wait | hangup | transfer | send_dtmf
  "text": "…",              // required for speak
  "end_call": false,         // speak: hang up after the line plays
  "to": "+1…",              // required for transfer (E.164 or SIP URI)
  "digits": "123#",         // required for send_dtmf
  "timeout_ms": 5000,        // wait: optional gather timeout
  "reason": "…"             // hangup: optional
}

For unattended use (so a call's tool calls don't block on a permission prompt while you're away), pair the launch with Claude Code's --dangerously-skip-permissions — only in environments you trust.

openclaw

The saperly-voice-openclaw connector is an openclaw extension loaded by the Gateway. It registers the saperly_voice_reply agent tool and holds the Saperly websocket; each call routes to a stable per-call session saperly-voice:<call_id>, so a multi-turn call stays in one conversation.

Configure under plugins.entries.saperly-voice.config in openclaw.json5 (the plugin id is saperly-voice; the npm package is saperly-voice-openclaw):

{
  plugins: {
    enabled: true,
    allow: ["saperly-voice"],
    entries: {
      "saperly-voice": {
        enabled: true,
        config: {
          baseUrl: "https://api.saperly.com",
          connectionId: "conn_123",
          // Prefer the env var for the secret:
          // manualSecret: "mc_…",
        }
      }
    }
  }
}

Or via env (env wins, so the secret can stay out of the file): SAPERLY_BASE_URL, SAPERLY_CONNECTION_ID, SAPERLY_MANUAL_SECRET, optional SAPERLY_CLIENT.
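The "env wins" precedence could look roughly like the sketch below. It is not the extension's real loader: resolveConfig, missingKeys, and the VoiceConfig type are hypothetical names, and the client config key is an assumption (this guide only names the SAPERLY_CLIENT env var). The environment is passed in as a plain object so the example stays self-contained.

```typescript
// Hypothetical config shape. baseUrl, connectionId and manualSecret match
// the keys shown above; `client` is an assumed key for SAPERLY_CLIENT.
interface VoiceConfig {
  baseUrl?: string | undefined;
  connectionId?: string | undefined;
  manualSecret?: string | undefined;
  client?: string | undefined;
}

type Env = Record<string, string | undefined>;

// Env vars take precedence; the file config only fills the gaps.
function resolveConfig(fileConfig: VoiceConfig, env: Env): VoiceConfig {
  return {
    baseUrl: env["SAPERLY_BASE_URL"] ?? fileConfig.baseUrl,
    connectionId: env["SAPERLY_CONNECTION_ID"] ?? fileConfig.connectionId,
    manualSecret: env["SAPERLY_MANUAL_SECRET"] ?? fileConfig.manualSecret,
    client: env["SAPERLY_CLIENT"] ?? fileConfig.client,
  };
}

// Lists the required settings that are still unset or empty.
function missingKeys(config: VoiceConfig): string[] {
  const required: (keyof VoiceConfig)[] = ["baseUrl", "connectionId", "manualSecret"];
  return required.filter((key) => !config[key]);
}
```

With this ordering the secret can live only in the environment while baseUrl and connectionId stay in openclaw.json5.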

Answer calls. Each caller turn is injected as a next-turn input; the agent replies with the saperly_voice_reply tool, echoing the request_id. The arguments are identical to the Claude Code reply tool above (kind is speak | wait | hangup | transfer | send_dtmf).

This is NOT the openclaw voice-call plugin

openclaw also ships a voice-call plugin that streams raw call audio to a realtime provider and holds a media socket for the call's duration — which Saperly deliberately avoids. saperly-voice-openclaw binds the manual-mode websocket instead: signaling, speech-to-text, and text-to-speech stay in-network, and only text turns reach your process.

Reply / directive reference

Every reply (Claude Code) or saperly_voice_reply (openclaw) call maps to exactly one manual-mode directive. Invalid args (e.g. a speak with no text, or an unknown kind) are rejected by the tool with a correctable message and are never sent to the live call.

  kind        Effect                             Fields
  speak       Say a line, then keep listening    text (required); end_call: true to say a final line then hang up
  wait        Listen without speaking            timeout_ms (optional)
  hangup      End the call                       reason (optional)
  transfer    Transfer the call                  to (required; E.164 or SIP URI)
  send_dtmf   Send touch-tones                   digits (required)

Always echo the request_id from the incoming turn, and answer within ~18s or the caller hears a short hold line.
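The argument checks in this section can be approximated as follows. This is a sketch under assumptions, not the tools' actual validation: validateReply and ReplyArgs are hypothetical names, the error strings are invented for illustration, and only the required-field rules from the table are enforced.

```typescript
// Hypothetical argument shape, following the fields listed in this section.
interface ReplyArgs {
  request_id?: string;
  kind?: string;
  text?: string;
  end_call?: boolean;
  to?: string;
  digits?: string;
  timeout_ms?: number;
  reason?: string;
}

const KINDS: readonly string[] = ["speak", "wait", "hangup", "transfer", "send_dtmf"];

// Returns a list of problems. An empty list means the reply is well formed
// according to the required-field rules in the table above.
function validateReply(args: ReplyArgs): string[] {
  const problems: string[] = [];

  if (!args.request_id) {
    problems.push("request_id is required: echo it from the incoming turn");
  }
  if (!args.kind || KINDS.indexOf(args.kind) === -1) {
    problems.push("kind must be one of: " + KINDS.join(", "));
    return problems; // field checks below depend on a known kind
  }
  if (args.kind === "speak" && !args.text) problems.push("speak requires text");
  if (args.kind === "transfer" && !args.to) problems.push("transfer requires to");
  if (args.kind === "send_dtmf" && !args.digits) problems.push("send_dtmf requires digits");

  return problems;
}
```

Returning the problems as text, rather than throwing, matches the idea of a correctable message: the agent can read the list, fix its arguments, and call the tool again.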

Text in, directives out

A voice channel only ever moves text in and directives out. Speech-to-text and text-to-speech run in-network; no call audio ever reaches your machine. That is the difference from a media-path bridge (such as openclaw's voice-call plugin), which streams raw call audio to a realtime provider. See Core concepts.

Next steps

  • Manual mode — the underlying event/directive protocol these connectors speak.
  • Connections — creating the manual connection and finding its manualSecret.
  • Voice — calls, lifecycle, and outbound calling.
  • Core concepts — the Saperly model and how a call flows.
