Running Claude Code Sessions Remotely via Mac Agent + WebSocket
How a persistent LaunchAgent on macOS bridges a headless orchestrator to interactive Claude Code sessions using WebSocket delegation and session_id resume.
The problem is narrow but sharp: a headless FastAPI orchestrator running in Docker on a Linux box cannot spawn claude CLI sessions. The Claude Code binary wants an interactive terminal, a real user home directory, the user's MCP servers installed, and the user's ~/.claude/ state. It is fundamentally a desktop tool pretending to be a CLI. So if your backend wants to delegate agentic work (writing a 1500-word article, running a multi-turn planning session, editing files in a local repo) to Claude Code, you need a bridge that lives on the user's actual Mac.
That bridge is tooler-mac-agent: a small Rust binary installed as a LaunchAgent, holding a persistent WebSocket back to tooler-core, and shelling out to claude on demand. This post walks through why WebSocket beat every other option, how session_id resume lets a single ticket span multiple turns without losing context, and the exact shape of the runner module at toolers/tooler-mac-agent/src/runner.rs.
Why WebSocket, not HTTP polling
The first instinct was an HTTP poll loop: Mac Agent hits GET /mac-agent/next-job every few seconds, does the work, POSTs the result. That works for short jobs. It falls apart the moment you want to stream partial output — and Claude Code writes output continuously during a turn, not in a single response blob.
The second instinct was Server-Sent Events from core to agent. SSE is unidirectional (server to client). Fine for streaming job assignments, useless for streaming tool results back while the job is running.
WebSocket is bidirectional by default. The Mac Agent dials out (so it works behind NAT without port-forwarding) and keeps a long-lived connection. tooler-core can push a claude_run_request message, the Mac Agent can push back claude_stream_chunk messages as stdout arrives, and everyone is happy. The protocol is documented in docs/tooler-core/WS_PROTOCOL.md and the wire contract is enforced by a single WsMessage enum shared between Rust and Python via serde-tagged JSON.
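To make the shared-enum idea concrete, here is a std-only sketch of the internally tagged wire shape. The real WsMessage in packages/rust-shared derives Serialize/Deserialize with a serde tag attribute; the hand-rolled encoder below (with illustrative field names, and no string escaping) only demonstrates the JSON contract both sides agree on:

```rust
// Illustrative subset of the wire messages. The real enum lives in
// packages/rust-shared/src/ws_protocol.rs and uses serde's tagged
// representation; this sketch hand-rolls the same shape with std only.
#[derive(Debug, PartialEq)]
enum WsMessage {
    Ping,
    ClaudeStreamChunk { ticket_id: String, chunk: String },
}

impl WsMessage {
    fn to_json(&self) -> String {
        match self {
            WsMessage::Ping => r#"{"kind":"ping"}"#.to_string(),
            // NOTE: no escaping of embedded quotes -- illustrative only.
            WsMessage::ClaudeStreamChunk { ticket_id, chunk } => format!(
                r#"{{"kind":"claude_stream_chunk","ticket_id":"{}","chunk":"{}"}}"#,
                ticket_id, chunk
            ),
        }
    }
}

fn main() {
    let msg = WsMessage::ClaudeStreamChunk {
        ticket_id: "t-1".to_string(),
        chunk: "hi".to_string(),
    };
    // The snake_case "kind" tag is what keeps Rust and Python agreeing.
    println!("{}", msg.to_json());
}
```

The point of the single tagged "kind" field is that Python can dispatch on one string and Rust can match on one enum, with no second source of truth.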
The relevant protocol primer from the WebSocket project at websockets.spec.whatwg.org explains why WebSocket survives hostile middleboxes better than long-polling: a single upgraded TCP connection looks like HTTP to proxies, and ping/pong frames keep idle NATs awake. For production deployments behind nginx, the nginx WebSocket proxying guide is the authoritative reference for the Upgrade / Connection header dance.
The LaunchAgent envelope
A LaunchAgent is the right install target because it runs in the user's graphical session (so claude sees the user's $HOME, the user's ~/.claude/, the user's PATH-installed MCP servers) and restarts automatically if the Rust binary crashes. The launchd plist lives at toolers/tooler-mac-agent/deploy/com.longlanh.tooler-mac-agent.plist and the installer is at toolers/tooler-mac-agent/install.sh.
Key plist keys:
- RunAtLoad = true so the agent is alive the moment the user logs in.
- KeepAlive = { Crashed = true, SuccessfulExit = false } so launchd resurrects the process on panic but respects an intentional exit(0).
- StandardOutPath and StandardErrorPath pointing at ~/Library/Logs/tooler-mac-agent/. This is the single most useful operational decision: when the user says "it stopped working," you ask for that log file.
Apple's launchd reference is dated but still the best source of truth for which keys do what. Ignore StackOverflow answers from 2015; the correct answer for a user-session daemon is almost always a LaunchAgent in ~/Library/LaunchAgents/, not a LaunchDaemon in /Library/LaunchDaemons/.
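Put together, a minimal plist with those keys looks roughly like this. This is a sketch, not the shipped file: the binary path is an assumption, and since launchd does not expand ~, install.sh would have to template the absolute home directory into the log paths:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.longlanh.tooler-mac-agent</string>
    <key>ProgramArguments</key>
    <array>
        <!-- assumed install location -->
        <string>/usr/local/bin/tooler-mac-agent</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <dict>
        <key>Crashed</key>
        <true/>
        <key>SuccessfulExit</key>
        <false/>
    </dict>
    <!-- launchd does not expand ~; the installer must write absolute paths -->
    <key>StandardOutPath</key>
    <string>/Users/me/Library/Logs/tooler-mac-agent/stdout.log</string>
    <key>StandardErrorPath</key>
    <string>/Users/me/Library/Logs/tooler-mac-agent/stderr.log</string>
</dict>
</plist>
```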
session_id resume: the part that matters
Here is the constraint that drives the whole design. A Claude Code "session" is a conversation thread with ~/.claude/ state: MCP tool history, file edits, compaction state. If tooler-core sends a claude_run_request and gets back a final message, and then ten minutes later sends a follow-up, you do NOT want a fresh session. You want to resume the previous one so Claude remembers what it just edited.
The claude CLI supports this via --resume <session_id>. With --output-format stream-json, the very first event of a run carries the session_id; subsequent invocations pass --resume <that-uuid> and pick up exactly where the previous turn stopped. The Rust runner captures the session_id on the first turn, persists it into the Postgres mac_agent_jobs row via a WebSocket message back to core, and core uses it for all follow-ups on the same ticket.
The runner looks approximately like this:
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::process::Command;
use uuid::Uuid;

pub async fn run_claude_turn(
    ticket_id: Uuid,
    prompt: &str,
    resume_session_id: Option<&str>,
    tx: &WsSender,
) -> anyhow::Result<ClaudeTurnResult> {
    let mut cmd = Command::new("claude");
    cmd.arg("-p").arg(prompt);
    cmd.arg("--output-format").arg("stream-json");
    if let Some(sid) = resume_session_id {
        cmd.arg("--resume").arg(sid);
    }
    cmd.stdout(std::process::Stdio::piped());
    cmd.stderr(std::process::Stdio::piped());

    let mut child = cmd.spawn()?;
    let stdout = child.stdout.take().expect("stdout piped");
    let stderr = child.stderr.take().expect("stderr piped");

    // Drain stderr in the background so a chatty child can never fill
    // the pipe buffer and deadlock the turn. eprintln! lands in the
    // StandardErrorPath log that launchd already captures.
    tokio::spawn(async move {
        let mut err_lines = BufReader::new(stderr).lines();
        while let Ok(Some(line)) = err_lines.next_line().await {
            eprintln!("claude stderr: {line}");
        }
    });

    let mut lines = BufReader::new(stdout).lines();
    let mut captured_session_id: Option<String> = None;
    let mut final_text = String::new();

    while let Some(line) = lines.next_line().await? {
        let event: ClaudeEvent = serde_json::from_str(&line)?;

        // The first stream-json event carries the session_id; push it
        // back to core immediately so the ticket gets a resume handle.
        if captured_session_id.is_none() {
            if let Some(sid) = event.session_id() {
                captured_session_id = Some(sid.to_string());
                tx.send(WsMessage::ClaudeSessionCaptured {
                    ticket_id,
                    session_id: sid.to_string(),
                })
                .await?;
            }
        }

        // Forward every text chunk live over the WebSocket.
        if let ClaudeEvent::AssistantText { text } = &event {
            final_text.push_str(text);
            tx.send(WsMessage::ClaudeStreamChunk {
                ticket_id,
                chunk: text.clone(),
            })
            .await?;
        }
    }

    let status = child.wait().await?;
    Ok(ClaudeTurnResult {
        session_id: captured_session_id,
        final_text,
        exit_code: status.code().unwrap_or(-1),
    })
}
Three things worth calling out:
- --output-format stream-json emits one JSON event per line. This is dramatically more reliable than parsing the default terminal rendering, which mixes ANSI escape codes with content and reflows on window resize. Use stream-json for any programmatic consumer.
- The session_id arrives in the very first event. Capturing it and pushing it back to core immediately (before the turn even produces text) means the ticket row in Postgres has a resume handle within ~200ms of spawn, so a dashboard polling core sees "session active" instead of "pending."
- Every stream chunk is forwarded as a separate WsMessage::ClaudeStreamChunk. This keeps the core-side ticket log updating live, which is the whole reason you picked WebSocket over HTTP.
The ticket lifecycle
On the core side, a ticket goes through five states in the mac_agent_jobs table: queued, dispatched, running, awaiting_followup, complete. dispatched is the brief window between "core pushed claude_run_request over WS" and "agent confirmed with claude_session_captured." awaiting_followup means the first turn finished, a session_id is stored, and core is computing or waiting for the next prompt in the same conversation.
Why a separate awaiting_followup state? Because article generation is inherently multi-turn: turn 1 drafts, turn 2 critiques against quality gates (cadence, uniqueness, citations), turn 3 revises. Core runs the gate logic in Python (fast, stateless), then sends turn 2 to the same session_id. Without the explicit state, you cannot tell "is the agent busy or are we waiting on core's gate check?" The distinction matters during operational triage.
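The real encoding lives in ticket.py on the core side; as a std-only Rust sketch, the five states and their legal transitions (my reading of the lifecycle above, not the shipped table) look like this:

```rust
/// The five states from the mac_agent_jobs table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TicketState {
    Queued,
    Dispatched,
    Running,
    AwaitingFollowup,
    Complete,
}

impl TicketState {
    /// Forward transitions as described in the article:
    /// queued -> dispatched -> running -> awaiting_followup,
    /// then either another turn (running) or complete.
    fn can_transition_to(self, next: TicketState) -> bool {
        use TicketState::*;
        matches!(
            (self, next),
            (Queued, Dispatched)
                | (Dispatched, Running)
                | (Running, AwaitingFollowup)
                | (Running, Complete)
                | (AwaitingFollowup, Running) // next turn, same session_id
                | (AwaitingFollowup, Complete) // gates passed, no more turns
        )
    }
}

fn main() {
    use TicketState::*;
    assert!(Queued.can_transition_to(Dispatched));
    assert!(AwaitingFollowup.can_transition_to(Running));
    // A ticket can never skip the dispatch handshake.
    assert!(!Queued.can_transition_to(Running));
    println!("transition table ok");
}
```

Encoding the table as an exhaustive match means an illegal transition is a bug you catch in one place, not five.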
The state transitions are encoded as enum variants in toolers/tooler-core/src/mac_agent/ticket.py, and the two relevant WebSocket messages (ClaudeRunRequest, ClaudeRunResult) are defined once in packages/rust-shared/src/ws_protocol.rs and imported by both sides. Keeping the enum in one place is how you avoid the classic "Python thinks the field is session_id but Rust serializes sessionId" bug at 2am.
Reconnect, backoff, and the heartbeat
The WebSocket connection will die. A Mac goes to sleep, a home router reboots, a flaky cafe wifi drops frames, macOS decides to migrate your network interface. The Rust client uses tokio-tungstenite with an exponential backoff reconnect loop starting at 1 second and capping at 60 seconds, plus a 30-second application-level ping that core responds to with a pong.
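The backoff schedule itself fits in one function. A minimal sketch of the 1-second-to-60-second policy described above (the real loop presumably adds jitter; the function name is mine):

```rust
use std::time::Duration;

/// Reconnect delay for attempt n (0-based): 1s doubling, capped at 60s.
/// Jitter omitted for clarity.
fn reconnect_delay(attempt: u32) -> Duration {
    let secs = 1u64
        .checked_shl(attempt) // 1, 2, 4, 8, ... (None once the shift overflows)
        .unwrap_or(u64::MAX)
        .min(60); // cap at the 60-second ceiling
    Duration::from_secs(secs)
}

fn main() {
    assert_eq!(reconnect_delay(0), Duration::from_secs(1));
    assert_eq!(reconnect_delay(3), Duration::from_secs(8));
    assert_eq!(reconnect_delay(6), Duration::from_secs(60)); // 64 hits the cap
    assert_eq!(reconnect_delay(40), Duration::from_secs(60));
    println!("backoff schedule ok");
}
```

Using checked_shl instead of a naive `1 << attempt` means a connection that has been down for hours cannot overflow the shift and panic.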
The heartbeat is application-level, not WebSocket-level: ping/pong control frames work in principle, but some middleboxes strip or swallow them. An application-level JSON {"kind":"ping"} message is guaranteed to round-trip through any proxy that passes the main protocol. Cloudflare in front of tooler-core was the concrete motivator: its default idle timeout is 100 seconds, and a 30-second ping keeps well under that.
Any ticket in dispatched or running state when the WS drops is marked needs_recovery in Postgres. On reconnect, the Mac Agent sends a resume_tickets message listing any session_ids it is still tracking locally; core reconciles against its own state and re-sends the latest prompt for each. If the Mac died mid-turn, that turn is lost and the ticket restarts from the last checkpoint — idempotency of article revision is designed around this.
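The core-side reconciliation step can be sketched in a few lines. This is illustrative only (the function name and shapes are mine, not the dispatcher's real API): intersect the session_ids the agent still tracks with core's needs_recovery map and re-send the last prompt for each survivor.

```rust
use std::collections::HashMap;

/// Given the session_ids the agent reports in resume_tickets and core's
/// map of session_id -> last prompt for needs_recovery tickets, return
/// the (session_id, prompt) pairs to re-dispatch. Sessions core knows
/// about but the agent lost restart from their last checkpoint instead.
fn prompts_to_resend(
    agent_sessions: &[String],
    needs_recovery: &HashMap<String, String>,
) -> Vec<(String, String)> {
    agent_sessions
        .iter()
        .filter_map(|sid| {
            needs_recovery
                .get(sid)
                .map(|prompt| (sid.clone(), prompt.clone()))
        })
        .collect()
}

fn main() {
    let mut core: HashMap<String, String> = HashMap::new();
    core.insert("sess-a".to_string(), "revise section 2".to_string());
    core.insert("sess-b".to_string(), "draft intro".to_string());

    // The agent only remembers sess-a; sess-b's turn was lost mid-flight
    // and its ticket restarts from the last checkpoint.
    let agent = vec!["sess-a".to_string()];
    let resend = prompts_to_resend(&agent, &core);
    assert_eq!(resend.len(), 1);
    assert_eq!(resend[0].0, "sess-a");
}
```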
Security
The WebSocket endpoint is authenticated with a pre-shared bearer token stored in tooler-core's secrets gateway and read by the Mac Agent from ~/.config/tooler-mac-agent/config.toml. There is no path from a compromised public API route to a Mac Agent command, because claude_run_request can only be constructed by a service with the internal mac_agent.dispatch capability. The authorization model is enforced in toolers/tooler-core/src/mac_agent/dispatcher.py and the capability set is deliberately tiny: dispatch, inspect_status, cancel.
Full transport spec, ticket schema, and replay semantics live in the monorepo — see github.com/hglong16/tinktink for the current source of truth. The design is deliberately boring, which is the right posture for a component that executes arbitrary shell commands on a user's personal machine.