Claude Cowork Deployment

Platform-agnostic reference for deploying Claude Cowork against a self-hosted /v1/messages gateway. Covers the five-move architecture, three auth tiers (PAT, session, mTLS), the signed manifest, the audit schema, and routes to platform-specific install guides for macOS and Windows.

TL;DR - Claude Cowork reads five managed-preference keys on launch that let an organisation replace Anthropic's hosted inference, auth, and plugin catalogue with its own. This page is the platform-agnostic reference: architecture, auth tiers, manifest shape, audit schema. For the platform mechanics see macOS and Windows. For the strategic case a CTO/CISO makes for this deployment, see the Cowork on your own infrastructure guide.

Anthropic's three extension points

Anthropic documents Cowork running on customer infrastructure in three support articles. Three extension points come out of those articles:

  • /v1/messages gateway. When inferenceProvider = "gateway" is set in managed preferences, Cowork routes every chat turn, tool call, and extended-thinking block through inferenceGatewayBaseUrl. The URL must speak the Anthropic Messages wire format.
  • Managed MCP allowlist. Cowork reads a JSON file under the org-plugins mount to decide which MCP servers a user may invoke. Servers not in the list are hidden from the UI and refused at runtime.
  • org-plugins mount. A directory Cowork scans on launch. Plugins, skills, and agents inside it are visible in the slash menu and the agent runtime. End users have no write access.

The five-move deployment

Five moves take a fleet from zero to a Cowork deployment that runs entirely on infrastructure you operate. The first three run once per organisation; the last two run per device.

  1. Publish the gateway. Stand up the /v1/messages endpoint on a host the fleet can reach. See the Gateway Service reference for endpoint and routing details.
  2. Mint the credential. Pick the auth tier that matches your compliance posture - PAT, session cookie, or mTLS. The gateway's /v1/auth/cowork/capabilities endpoint advertises which tiers it accepts.
  3. Ship the signed manifest. Publish the plugins, skills, and managed MCP servers the fleet should receive, signed with an Ed25519 private key held by the gateway.
  4. Distribute the MDM profile. Push the five managed-preference keys to every device. Platform specifics: macOS, Windows.
  5. Verify in the audit trail. Run one SQL query against audit_events that ties the prompt, tool calls, and MCP execs together by trace_id. An empty result set means the deployment is not finished.

The five managed-preference keys

Cowork reads the same five keys on every platform. The meaning is identical; the transport differs.

| Key | Values | Purpose |
| --- | --- | --- |
| inferenceProvider | "gateway" | Switches inference from api.anthropic.com to the configured gateway |
| inferenceGatewayBaseUrl | "https://cowork-gateway.example.com" | Where /v1/messages calls go. HTTPS required unless host is 127.0.0.1 |
| inferenceCredentialHelper | absolute path | Path to the sp-cowork-auth / systemprompt-cowork binary Cowork invokes on every call |
| inferenceCredentialHelperTtlSec | integer seconds (e.g. 3600) | How long a minted JWT is cached on disk before the helper re-authenticates |
| inferenceGatewayAuthScheme | "bearer" | Auth scheme on outbound calls. bearer is the only value today |

Platform-specific syntax lives in the platform docs. Linux developer workstations read the same values from CLAUDE_* environment variables.
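
The constraints in the table can be checked before a profile ships. The following is a minimal sketch, not part of Cowork or the helper: the function name validate_prefs, the example helper path, and the requirement that the TTL be positive are assumptions made for illustration.

```python
import ntpath
import posixpath
from urllib.parse import urlparse

REQUIRED_KEYS = (
    "inferenceProvider",
    "inferenceGatewayBaseUrl",
    "inferenceCredentialHelper",
    "inferenceCredentialHelperTtlSec",
    "inferenceGatewayAuthScheme",
)


def validate_prefs(prefs: dict) -> list:
    """Return a list of problems; an empty list means the profile looks sane."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in prefs]
    if problems:
        return problems

    if prefs["inferenceProvider"] != "gateway":
        problems.append('inferenceProvider must be "gateway"')

    # Plain HTTP is tolerated only for the loopback address 127.0.0.1.
    url = urlparse(str(prefs["inferenceGatewayBaseUrl"]))
    loopback_http = url.scheme == "http" and url.hostname == "127.0.0.1"
    if url.scheme != "https" and not loopback_http:
        problems.append("inferenceGatewayBaseUrl must use HTTPS unless the host is 127.0.0.1")

    # Accept absolute paths in either POSIX or Windows form.
    helper = str(prefs["inferenceCredentialHelper"])
    if not (posixpath.isabs(helper) or ntpath.isabs(helper)):
        problems.append("inferenceCredentialHelper must be an absolute path")

    # Assumption: a zero or negative TTL is treated as a configuration error.
    ttl = prefs["inferenceCredentialHelperTtlSec"]
    if isinstance(ttl, bool) or not isinstance(ttl, int) or ttl <= 0:
        problems.append("inferenceCredentialHelperTtlSec must be a positive integer")

    if prefs["inferenceGatewayAuthScheme"] != "bearer":
        problems.append('inferenceGatewayAuthScheme must be "bearer"')
    return problems


example = {
    "inferenceProvider": "gateway",
    "inferenceGatewayBaseUrl": "https://cowork-gateway.example.com",
    "inferenceCredentialHelper": "/usr/local/bin/sp-cowork-auth",  # hypothetical path
    "inferenceCredentialHelperTtlSec": 3600,
    "inferenceGatewayAuthScheme": "bearer",
}
print(validate_prefs(example))
```

A check like this fits naturally in the pipeline that builds the MDM profile, so a malformed key is caught before it reaches a device.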

The three auth tiers

sp-cowork-auth exposes three authentication providers. On every call they run in a fixed order - mTLS first, session cookie second, PAT third. The first provider that produces a valid token wins.

+------------------------------------------------+
|  sp-cowork-auth on each inference call         |
+------------------------------------------------+
                     |
         cache hit?  | yes -> return cached line
                     |
                     v  no
+------------------------------------------------+
|  Tier 1: mTLS device certificate               |
|  POST /v1/auth/cowork/mtls                     |
+------------------------------------------------+
                     | no cert / disabled
                     v
+------------------------------------------------+
|  Tier 2: Session cookie (browser OAuth)        |
|  POST /v1/auth/cowork/session                  |
+------------------------------------------------+
                     | no session / 401
                     v
+------------------------------------------------+
|  Tier 3: Personal Access Token                 |
|  POST /v1/auth/cowork/pat                      |
+------------------------------------------------+
                     |
                     v
    Write {token, ttl, headers} JSON to stdout

PAT is the simplest tier. An administrator mints a token at the gateway, the user runs sp-cowork-auth login sp_pat_... once, and the token lives in the OS keystore (macOS Keychain, Windows Credential Manager, Linux Secret Service). Revoke a PAT at the gateway and the next helper call that re-authenticates returns 401; a JWT already in the cache stays usable until its TTL expires.

Session cookie is right when you already have a browser-based SSO flow and want Cowork enrolment to ride on it. The helper opens {gateway}/cowork/device-link?redirect=http://127.0.0.1:<ephemeral>, the user authenticates against the IdP, the browser returns the code to the loopback server, and a session cookie lands in the OS keystore.

mTLS is the right tier when the device itself is the identity. The helper loads a client certificate from an OS keystore reference, presents it to /v1/auth/cowork/mtls, and receives a JWT carrying the certificate's SHA-256 fingerprint as a claim. The cert is typically provisioned by MDM and backed by TPM (Windows), Secure Enclave (macOS), or a PKCS#11 token (Linux).
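
The fixed ordering described above (cache, then mTLS, then session cookie, then PAT) amounts to a first-match fallback chain. The sketch below illustrates that control flow only; mint_token and the provider callables are hypothetical stand-ins, and the real helper's internals are not documented here.

```python
from typing import Callable, List, Optional, Tuple

# A provider returns a token string, or None to fall through to the next tier.
Provider = Callable[[], Optional[str]]


def mint_token(cached: Optional[str], providers: List[Tuple[str, Provider]]) -> Tuple[str, str]:
    """Return (source, token) from the cache or the first provider that yields a token."""
    if cached is not None:
        return ("cache", cached)
    for name, provider in providers:
        token = provider()
        if token:
            return (name, token)
    raise RuntimeError("no auth tier produced a token")


# Stand-in providers: no device certificate, no browser session, but a PAT is present.
tiers = [
    ("mtls", lambda: None),
    ("session", lambda: None),
    ("pat", lambda: "jwt-from-pat"),
]
print(mint_token(None, tiers))
```

Because the order is fixed, a device that holds both a certificate and a PAT always authenticates with the certificate; the PAT is only consulted when the earlier tiers decline.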

The seven canonical headers

Every /v1/messages call that leaves Cowork carries seven headers the gateway validates against the JWT and stamps into audit_events:

| Header | Typed ID | Meaning |
| --- | --- | --- |
| x-user-id | UserId | The human on the other side of the keyboard |
| x-session-id | SessionId | Cowork chat session - stable across tool calls within one conversation |
| x-trace-id | TraceId | This one inference call. Every tool call and MCP exec inherits it |
| x-client-id | ClientId | sp_cowork for Cowork; sp_cli / sp_desktop / sp_web elsewhere |
| x-tenant-id | TenantId | Tenant the user belongs to |
| x-policy-version | (plain) | Hash of the policy bundle in force at JWT mint time |
| x-call-source | SessionSource | Module that issued the call (cowork, subagent, job) |

The header constants live in systemprompt_identifiers::headers. The provider never sees them; the gateway strips them at the outbound boundary.
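
As an illustration of the two gateway-side steps described here, the sketch below flags canonical headers that are missing or disagree with the JWT, and strips them before the request is forwarded upstream. It assumes the verified claims are available as a dict keyed by the same header names; that mapping and both function names are hypothetical, not the gateway's actual API.

```python
from typing import Dict, List

CANONICAL_HEADERS = (
    "x-user-id",
    "x-session-id",
    "x-trace-id",
    "x-client-id",
    "x-tenant-id",
    "x-policy-version",
    "x-call-source",
)


def header_mismatches(headers: Dict[str, str], claims: Dict[str, str]) -> List[str]:
    """Return canonical headers that are missing or disagree with the JWT claims."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return [
        name
        for name in CANONICAL_HEADERS
        if name not in lowered or lowered[name] != claims.get(name)
    ]


def strip_canonical(headers: Dict[str, str]) -> Dict[str, str]:
    """Drop the canonical headers so the upstream provider never sees them."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in CANONICAL_HEADERS
    }


claims = {name: f"value-for-{name}" for name in CANONICAL_HEADERS}  # stand-in claims
request_headers = dict(claims, **{"content-type": "application/json"})
print(header_mismatches(request_headers, claims))
print(strip_canonical(request_headers))
```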

The stdout contract

inferenceCredentialHelper is strict: on every run the binary prints exactly one JSON line to stdout. Anything else - a banner, a log preamble, two lines - breaks Cowork's parser and surfaces as "credential helper failed" in the UI. Diagnostics go to stderr. The example is pretty-printed here for readability; the helper emits it on a single line.

{
  "token": "eyJhbGciOi...",
  "ttl": 3600,
  "headers": {
    "x-user-id": "u_29f8a3",
    "x-session-id": "s_01hy...",
    "x-trace-id": "t_a8bf...",
    "x-client-id": "sp_cowork",
    "x-tenant-id": "org_acme",
    "x-policy-version": "2026-04-22",
    "x-call-source": "cowork"
  }
}

The helper caches this output at $XDG_CACHE_HOME/systemprompt/systemprompt-cowork.json (mode 0600). To invalidate the cache, delete the file.
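
A pre-flight check for the stdout contract might look like the sketch below. It is not Cowork's parser; parse_helper_output is a hypothetical name, and the field checks simply mirror the token, ttl, and headers fields in the example.

```python
import json


def parse_helper_output(stdout: str) -> dict:
    """Accept exactly one JSON line carrying token, ttl, and headers; raise ValueError otherwise."""
    lines = stdout.splitlines()
    if len(lines) != 1:
        raise ValueError(f"expected exactly one line on stdout, got {len(lines)}")

    doc = json.loads(lines[0])  # JSONDecodeError is a subclass of ValueError
    if not isinstance(doc, dict):
        raise ValueError("helper output must be a JSON object")
    if not isinstance(doc.get("token"), str) or not doc["token"]:
        raise ValueError("token must be a non-empty string")
    ttl = doc.get("ttl")
    if isinstance(ttl, bool) or not isinstance(ttl, int):
        raise ValueError("ttl must be an integer number of seconds")
    if not isinstance(doc.get("headers"), dict):
        raise ValueError("headers must be a JSON object")
    return doc


good = json.dumps({"token": "example-token", "ttl": 3600, "headers": {"x-client-id": "sp_cowork"}})
print(parse_helper_output(good + "\n")["ttl"])
```

Running the helper in CI and piping its stdout through a check like this catches a stray banner or log line before it reaches a user's machine.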

The signed manifest

The manifest is a JSON document served by the gateway at /v1/cowork/manifest and signed with an Ed25519 private key per RFC 8032. Every client verifies the signature against a pubkey pinned at install time.

{
  "version": "2026-04-22T09:30:00Z",
  "user": {
    "id": "u_29f8a3",
    "roles": ["engineering", "senior"]
  },
  "plugins": [
    {
      "id": "devops-plugin",
      "version": "1.4.2",
      "files": [
        {"path": "plugin.toml", "sha256": "..."},
        {"path": "handlers/deploy.sh", "sha256": "..."}
      ]
    }
  ],
  "skills": [
    {
      "name": "review-terraform-plan",
      "description": "Audit a Terraform plan for destructive changes",
      "file_path": ".systemprompt-cowork/skills/review-terraform-plan.md",
      "instructions": "..."
    }
  ],
  "agents": [
    {
      "id": "pr-reviewer",
      "endpoint": "/v1/messages",
      "model": "claude-sonnet-4-6",
      "skills": ["review-terraform-plan"],
      "mcp_servers": ["gh-readonly"],
      "enabled": true
    }
  ],
  "managed_mcp_servers": [
    {
      "name": "gh-readonly",
      "url": "https://mcp-internal.example.com/gh-readonly",
      "tool_policy": {"allow": ["search_code", "read_file"]}
    }
  ],
  "revocations": [
    {"kind": "skill", "name": "leaked-api-probe"}
  ],
  "signature": {
    "alg": "ed25519",
    "sig": "base64(ed25519(canonical_json(manifest_body)))"
  }
}

Five details carry the governance story:

  • Per-user. The user block is resolved server-side from the JWT; two engineers in the same tenant can receive different manifests.
  • managed_mcp_servers is the allowlist. Servers not in the list are refused at runtime.
  • revocations is the kill switch. The binary removes revoked files atomically on the next sync.
  • Skills and plugins are separate. Plugins execute; skills are text context. Agents tie them together.
  • Signature covers canonical JSON. Pubkey is pinned at install --gateway time; mismatched signature aborts the sync before touching the filesystem.
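
To make the allowlist and kill-switch bullets concrete, here is a small sketch that answers two questions against a parsed manifest: may this tool on this MCP server be invoked, and has this item been revoked? The function names are hypothetical, and treating a server without a tool_policy.allow list as denying every tool is an assumption, not documented behaviour.

```python
def tool_allowed(manifest: dict, server: str, tool: str) -> bool:
    """True only if the server is in managed_mcp_servers and the tool is in its allow list."""
    for entry in manifest.get("managed_mcp_servers", []):
        if entry.get("name") == server:
            # Assumption: no allow list means no tools are permitted.
            return tool in entry.get("tool_policy", {}).get("allow", [])
    return False  # servers not in the list are refused


def is_revoked(manifest: dict, kind: str, name: str) -> bool:
    """True if the manifest's revocations list names this (kind, name) pair."""
    return any(
        item.get("kind") == kind and item.get("name") == name
        for item in manifest.get("revocations", [])
    )


manifest = {
    "managed_mcp_servers": [
        {
            "name": "gh-readonly",
            "url": "https://mcp-internal.example.com/gh-readonly",
            "tool_policy": {"allow": ["search_code", "read_file"]},
        }
    ],
    "revocations": [{"kind": "skill", "name": "leaked-api-probe"}],
}
print(tool_allowed(manifest, "gh-readonly", "read_file"))
print(is_revoked(manifest, "skill", "leaked-api-probe"))
```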

The sync flow

  1. Fetch /v1/cowork/manifest with the cached JWT. A 401/403/404 short-circuits the sync; the existing mount is untouched.
  2. Verify the Ed25519 signature against the pinned pubkey. Verification failure aborts before any filesystem write.
  3. Stage every file referenced in plugins[].files[] under org-plugins/.staging/. Each file is SHA-256-hashed and compared to the manifest. Mismatch aborts.
  4. Rename atomically into place (MoveFileEx(MOVEFILE_REPLACE_EXISTING | MOVEFILE_WRITE_THROUGH) on Windows; rename(2) on Unix).
  5. Write .systemprompt-cowork/managed-mcp.json and .systemprompt-cowork/last-sync.json.
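
Steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the sync agent's code: stage_and_promote is a hypothetical name, file contents are passed in as bytes because the download mechanism is not described here, and os.replace stands in for the platform rename call (it replaces an existing destination but does not request write-through).

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Dict


def stage_and_promote(mount: Path, files: Dict[str, bytes], expected: Dict[str, str]) -> None:
    """Verify every file against its manifest hash, stage it, then rename it into place."""
    # Verify everything first so a mismatch aborts before any filesystem write.
    for rel, data in files.items():
        if hashlib.sha256(data).hexdigest() != expected.get(rel):
            raise ValueError(f"sha256 mismatch for {rel}")

    staging = mount / ".staging"
    for rel, data in files.items():
        staged = staging / rel
        staged.parent.mkdir(parents=True, exist_ok=True)
        staged.write_bytes(data)

    # Promote file by file; each rename is atomic on the same filesystem.
    for rel in files:
        dest = mount / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        os.replace(staging / rel, dest)


with tempfile.TemporaryDirectory() as tmp:
    mount = Path(tmp) / "org-plugins"
    payload = b"name = 'devops-plugin'\n"
    digest = hashlib.sha256(payload).hexdigest()
    stage_and_promote(mount, {"devops-plugin/plugin.toml": payload}, {"devops-plugin/plugin.toml": digest})
    print((mount / "devops-plugin/plugin.toml").read_bytes() == payload)
```

Note that the rename is atomic per file, not across the whole tree; a reader can briefly observe a mix of old and new files during promotion.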

Sign the manifest from a build pipeline that retains every version, and keep the Ed25519 signing key in an HSM, YubiHSM, or cloud KMS - never on an admin laptop.

The audit schema

Every pass through the governance pipeline writes one row to audit_events before the response returns to the caller, so there is no path where a successful call is not audited.

CREATE TABLE audit_events (
  id           BIGSERIAL PRIMARY KEY,
  occurred_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  kind         TEXT NOT NULL,      -- 'inference' | 'tool_call' | 'mcp_exec' | 'cost'
  user_id      TEXT NOT NULL,      -- from x-user-id
  session_id   TEXT NOT NULL,      -- from x-session-id
  trace_id     TEXT NOT NULL,      -- from x-trace-id
  client_id    TEXT NOT NULL,      -- from x-client-id
  tenant_id    TEXT NOT NULL,      -- from x-tenant-id
  policy_ver   TEXT NOT NULL,      -- from x-policy-version
  call_source  TEXT NOT NULL,      -- from x-call-source
  model        TEXT,
  provider     TEXT,               -- 'anthropic', 'openai', 'bedrock', ...
  tokens_in    INTEGER,
  tokens_out   INTEGER,
  cost_micro   BIGINT,             -- microdollars
  latency_ms   INTEGER,
  outcome      TEXT NOT NULL,      -- 'allowed' | 'denied' | 'error'
  payload      JSONB NOT NULL
);

CREATE INDEX ON audit_events (tenant_id, occurred_at DESC);
CREATE INDEX ON audit_events (trace_id);
CREATE INDEX ON audit_events (user_id, occurred_at DESC);

The one-shot lineage query

Every event triggered by one user prompt shares a trace_id. A single query filtered on trace_id returns the full lineage:

SELECT
  e.occurred_at, e.kind, e.outcome, e.model, e.provider,
  e.tokens_in, e.tokens_out, e.cost_micro,
  e.payload ->> 'tool'   AS tool_name,
  e.payload ->> 'server' AS mcp_server
FROM audit_events e
WHERE e.tenant_id = 'org_acme'
  AND e.user_id   = 'u_29f8a3'
  AND e.trace_id  = 't_a8bf...'
ORDER BY e.occurred_at ASC;

Returns a chronological list: one inference row, zero or more tool_call rows, zero or more mcp_exec rows, and a final cost row. Every row is tied back to the JWT-verified user, tenant, and policy version.
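
The shape of that result can be tried locally. The sketch below uses an in-memory SQLite table as a stand-in for the PostgreSQL schema, with a reduced column set, payload stored as JSON text, and synthetic rows; the trace IDs t_example_1 and t_example_2 are made up for the demonstration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE audit_events (
        id          INTEGER PRIMARY KEY,
        occurred_at TEXT NOT NULL,
        kind        TEXT NOT NULL,
        user_id     TEXT NOT NULL,
        tenant_id   TEXT NOT NULL,
        trace_id    TEXT NOT NULL,
        outcome     TEXT NOT NULL,
        payload     TEXT NOT NULL
    )
    """
)
# Synthetic rows: one prompt (t_example_1) plus an unrelated call (t_example_2).
rows = [
    ("2026-04-22T09:30:03Z", "mcp_exec", "t_example_1", {"tool": "read_file", "server": "gh-readonly"}),
    ("2026-04-22T09:30:01Z", "inference", "t_example_1", {}),
    ("2026-04-22T09:30:04Z", "cost", "t_example_1", {}),
    ("2026-04-22T09:30:02Z", "tool_call", "t_example_1", {"tool": "read_file"}),
    ("2026-04-22T09:31:00Z", "inference", "t_example_2", {}),
]
conn.executemany(
    "INSERT INTO audit_events (occurred_at, kind, user_id, tenant_id, trace_id, outcome, payload) "
    "VALUES (?, ?, 'u_29f8a3', 'org_acme', ?, 'allowed', ?)",
    [(ts, kind, trace, json.dumps(payload)) for ts, kind, trace, payload in rows],
)


def lineage(conn, tenant_id, user_id, trace_id):
    """Return (kind, outcome, tool, server) tuples for one trace in chronological order."""
    cursor = conn.execute(
        "SELECT kind, outcome, payload FROM audit_events "
        "WHERE tenant_id = ? AND user_id = ? AND trace_id = ? "
        "ORDER BY occurred_at ASC",
        (tenant_id, user_id, trace_id),
    )
    result = []
    for kind, outcome, payload in cursor:
        fields = json.loads(payload)  # parsed in Python instead of PostgreSQL's ->>
        result.append((kind, outcome, fields.get("tool"), fields.get("server")))
    return result


for event in lineage(conn, "org_acme", "u_29f8a3", "t_example_1"):
    print(event)
```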

SIEM export

Every row is also emitted as a JSON event on a separate topic for Splunk / ELK / Datadog / Sumo Logic ingestion. The shape mirrors the table row.

Cost attribution

cost_micro is microdollars. SUM(cost_micro) GROUP BY user_id is per-user cost. GROUP BY (tenant_id, provider) compares spend across upstreams - "what did we spend on Bedrock vs direct Anthropic in April 2026" is a one-line query.
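
Both aggregations are shown below against an in-memory SQLite stand-in with synthetic rows. The column subset, the user IDs u_a and u_b, and the amounts are invented for the demonstration; against PostgreSQL the same GROUP BY clauses apply to the full audit_events table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE audit_events ("
    "occurred_at TEXT NOT NULL, tenant_id TEXT NOT NULL, user_id TEXT NOT NULL, "
    "provider TEXT, cost_micro INTEGER)"
)
db.executemany(
    "INSERT INTO audit_events VALUES (?, 'org_acme', ?, ?, ?)",
    [
        ("2026-04-03T10:00:00Z", "u_a", "anthropic", 1_500_000),
        ("2026-04-10T10:00:00Z", "u_a", "bedrock", 250_000),
        ("2026-04-12T10:00:00Z", "u_b", "anthropic", 500_000),
        ("2026-03-30T10:00:00Z", "u_b", "bedrock", 900_000),  # outside April
    ],
)

# Per-user cost across all rows, in microdollars.
per_user = db.execute(
    "SELECT user_id, SUM(cost_micro) FROM audit_events GROUP BY user_id ORDER BY user_id"
).fetchall()

# Spend per upstream for April 2026 (half-open date range on ISO-8601 text).
april_by_provider = db.execute(
    "SELECT tenant_id, provider, SUM(cost_micro) FROM audit_events "
    "WHERE occurred_at >= '2026-04-01' AND occurred_at < '2026-05-01' "
    "GROUP BY tenant_id, provider ORDER BY provider"
).fetchall()

for user_id, micro in per_user:
    print(user_id, f"${micro / 1_000_000:.2f}")
for tenant_id, provider, micro in april_by_provider:
    print(tenant_id, provider, f"${micro / 1_000_000:.2f}")
```

Keeping the sums in integer microdollars and dividing only for display avoids floating-point drift in the totals.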

Provider routing

The gateway's /v1/messages router forwards to one of the built-in provider tags (anthropic, openai, moonshot, qwen, minimax, gemini, bedrock, vertex, azure, groq) or a custom tag registered via the GatewayUpstream trait and the inventory crate. Routes match on model_pattern, agent, user/tenant, cost, or failover. See the AI Services reference for the full rule schema and the Gateway Service reference for the route config block.
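
Matching on model_pattern alone can be pictured as a first-match lookup. The sketch below is purely illustrative: the rule dict shape, glob-style patterns, first-match-wins ordering, and the default fallback are all assumptions, and the actual rule schema is the one in the AI Services reference.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def pick_provider(model: str, routes: List[Dict[str, str]], default: str) -> str:
    """Return the provider tag of the first route whose model_pattern matches the model."""
    for route in routes:
        if fnmatchcase(model, route["model_pattern"]):
            return route["provider"]
    return default


# Hypothetical rules; the provider tags are from the built-in list above.
routes = [
    {"model_pattern": "claude-*", "provider": "anthropic"},
    {"model_pattern": "gpt-*", "provider": "openai"},
]
print(pick_provider("claude-sonnet-4-6", routes, default="bedrock"))
```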

Air-gapped deployment

Nothing in the runtime flow requires outbound traffic to anthropic.com, a telemetry endpoint, or a licence server. The only network calls the binary makes are the ones explicitly pointed at your gateway.

An air-gapped deployment shifts three responsibilities inward:

  • The gateway runs inside the egress boundary.
  • The upstream provider sits on the same side - self-hosted vLLM / Ollama / sglang, a private Bedrock VPC endpoint, or Azure OpenAI over private link.
  • The Ed25519 signing key and its HSM never leave.

For pure-offline deployments where clients have no network path to any gateway, the sync agent supports pre-seeded mode: the admin produces a signed manifest and plugin payload on a build machine, copies both to a read-only share, and points the binary at a file:// URL. Signature verification still runs; there is no trust shortcut for local files.

Migration from direct API usage

Teams running Cowork against api.anthropic.com with individual OAuth need a soft cutover in four evidence-gated phases:

  1. Shadow mode. The gateway accepts requests and writes to audit_events; upstream of record stays Anthropic. Advance when shadow audit matches production call shape.
  2. Pilot cohort. Push the MDM profile to a small volunteer group. Advance when the cohort reports no regressions and the distribution of outcome = 'allowed' rows looks right.
  3. Ring rollout. Push department by department. A spike in outcome = 'error' rows in any ring blocks the next ring until it is triaged.
  4. Retire direct access. Revoke individual OAuth tokens on the Anthropic side. The gateway is then the only path.

The gate between phases is evidence, not calendar.

CLI

# Probe the gateway - what auth tiers does it accept?
systemprompt gateway capabilities

# Mint a PAT for the current user
systemprompt gateway pat create --name "cowork laptop"

# List PATs
systemprompt gateway pat list

# Revoke a PAT (takes effect on next helper call, i.e. within one TTL window)
systemprompt gateway pat revoke <id>

# Install the helper locally
systemprompt gateway cowork-auth install

Where to go next