Claude Cowork no longer has to call api.anthropic.com. Anthropic's April 2026 update to the Claude Cowork third-party platforms article defines three extension points that let an organisation put its own AI behind the Cowork client: a /v1/messages inference gateway, a managed MCP allowlist, and a local org-plugins mount. Every Cowork chat turn, every tool call, every extended thinking block can be pointed at an inference endpoint you operate: a Bedrock account, an Azure OpenAI deployment, a Vertex AI region, or a self-hosted Llama, Qwen, or vLLM cluster inside your own datacenter. The Cowork client does not know the difference. The auditor does, and so does your finance team.

This guide is the deployment story for the other side of that interface. It walks through publishing the systemprompt-cowork credential helper binary to a fleet of Windows and Mac machines, distributing managed preferences through Intune, Jamf, or Group Policy, minting the PAT, session, or mTLS credential that each client presents, fetching and verifying a signed manifest of plugins, skills, and MCP servers, and pointing /v1/messages traffic at the inference cluster you chose. Every key, path, header, and endpoint is cited to the systemprompt-core v0.3.0 CHANGELOG that shipped the binary on 22 April 2026 or to the Anthropic support article that defines the contract.

Run Claude Cowork on your own AI: the deployment story in one minute

Five moves take a fleet from zero to a Claude Cowork deployment that runs entirely against an inference stack you operate. The first three run once per organisation. The last two run per device.

  1. Publish the gateway. Stand up the /v1/messages endpoint on a host your employees can reach. The gateway speaks the Anthropic Messages wire format, proxies to the upstream provider you pick, and stamps every response with audit metadata.
  2. Mint the credential. Pick the auth tier that matches your compliance posture. A Personal Access Token (PAT) for teams that trust the OS keystore; an OAuth session cookie for browser-based enrolment; an mTLS device certificate for regulated environments. The gateway's /v1/cowork/capabilities advertises which tiers it accepts.
  3. Ship the signed manifest. Publish the plugins, skills, and managed MCP servers you want the fleet to receive. Sign the manifest with an Ed25519 private key held by the gateway. Every client verifies the signature before writing a file.
  4. Distribute the MDM profile. Push five managed-preference keys to every Mac and Windows device. The keys tell Cowork to use a gateway for inference, where the gateway lives, which binary acts as the credential helper, how long a minted JWT stays cached, and which auth scheme to present on each call.
  5. Verify in the audit trail. Run a single SQL query against the audit_events table joining a prompt, a tool call, and an MCP execution by trace_id. Every row carries the JWT-verified user identity, the tenant, the policy version, and the call source. If the row set is empty, the deployment is not finished.

The artefacts involved — the /v1/messages gateway, the minted credential, the signed manifest, the five-key MDM profile, and the audit query — are each sourced from the systemprompt-core v0.3.0 CHANGELOG.

Read on for the full walkthrough. Each section ends with a concrete pass criterion so that an endpoint engineer armed with this guide and a fresh Windows VM can execute the rollout top to bottom without calling vendor support.

How does Claude Cowork work when you bring your own inference?

Claude Cowork is an Electron client that talks HTTP and spawns plugin subprocesses on the local machine. It has never had a server of its own. In the default configuration it sends /v1/messages requests directly to api.anthropic.com with the user's own OAuth token; plugins live under the user's home directory; MCP servers are configured per-user through the Cowork settings UI. That model is fine for an individual developer and a dead end for any organisation that needs to choose its own inference provider, keep prompts inside its own network, or answer an auditor about who ran what.

The April 2026 Anthropic support article formalises what the product has quietly supported for some time: three extension points that let a third party replace the Anthropic-hosted pieces with something the enterprise owns.

  • The /v1/messages gateway. When inferenceProvider is set to gateway in managed preferences, Cowork sends every chat turn, every tool call, and every extended thinking block to the URL in inferenceGatewayBaseUrl. That URL must speak the exact Anthropic Messages wire format that the Cowork client already knows. Anthropic's public API reference documents the request and response shapes; the gateway implements them server-side.
  • The managed MCP allowlist. Cowork reads a JSON file at a fixed path inside the org-plugins mount to decide which MCP servers a user may invoke. Servers not in the allowlist are hidden from the UI and refused at runtime. The file is signed and replaced atomically on every sync, so an employee cannot add or whitelist an MCP server without central approval.
  • The org-plugins mount. A directory at a system path that Cowork scans on launch. Any plugin inside that directory is visible in the slash menu, the sidebar, and the agent runtime. The deployment binary owns the directory; end users have no write access.
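No public schema for the allowlist file is cited in this guide, so the shape below is purely illustrative — every field name is an assumption, not the real contract:

```json
{
  "version": 1,
  "servers": [
    {
      "name": "github",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  ]
}
```

Whatever the real shape turns out to be, the operative properties are the ones the article names: the file lives at a fixed path inside the org-plugins mount, it is signed, and it is replaced atomically on every sync.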

The extension points exist on every platform. What makes a deployment real is a binary that fills all three at once and a protocol for getting that binary onto every machine.

The binary that fills the three slots

The systemprompt-core v0.3.0 release introduced systemprompt-cowork, a 2.3 MB standalone Rust binary that acts as:

  1. An inferenceCredentialHelper that prints a single JSON line to stdout containing a fresh JWT, a TTL, and the seven canonical headers every /v1/messages call needs to carry. Anthropic's Cowork client invokes the helper on every inference call; anything other than one JSON line on stdout breaks the contract and Cowork refuses to route.
  2. A signed-manifest sync agent that fetches a plugin, skill, and MCP manifest from /v1/cowork/manifest, verifies the Ed25519 signature against a pubkey pinned at install time, downloads and hash-checks every file, and writes the results atomically into the org-plugins mount.
  3. An installer and MDM emitter that creates the org-plugins directory with the right ownership on each OS, pins the gateway's signing pubkey, and prints a ready-to-apply MDM snippet for macOS, Windows, or Linux.

The binary has no daemon. It has no network ports. Anthropic invokes it as a subprocess; an administrator invokes it as a CLI; a scheduled task invokes it on a timer. The only long-lived state is a config TOML and a cached JWT, both stored with mode 0600 on Unix and with the equivalent ACLs on Windows.

Who speaks to whom

The call graph for a single Cowork chat turn under the gateway model is:

  1. User types a message in the Cowork UI.
  2. Cowork spawns systemprompt-cowork as a subprocess (configured via inferenceCredentialHelper).
  3. The helper reads the cached JWT. If the JWT is valid, it prints the JSON line to stdout. If not, it walks the capability ladder (mTLS first, then session cookie, then PAT) to mint a fresh one.
  4. Cowork parses the JSON, attaches the seven headers, and sends the Messages-format request to inferenceGatewayBaseUrl + /v1/messages.
  5. The gateway validates the JWT, applies scope checks, secret scanning, blocklist rules, and rate limits, dispatches to the upstream provider (Anthropic, OpenAI, Bedrock, Vertex, a self-hosted vLLM), and streams the response back.
  6. On the way back the gateway writes one row to audit_events for the inference call. Every subsequent tool call the model triggers produces additional rows with the same trace_id.

Nothing in that graph depends on the user's local Anthropic account. The model, the key, the audit trail, and the plugin catalogue all come from infrastructure the organisation controls.
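For orientation, a minimal Messages-format request body looks like the following. The model id here is illustrative and the gateway URL in the comment is the example host from this guide; Anthropic's public API reference is authoritative for the shape.

```shell
cat > request.json <<'EOF'
{
  "model": "claude-sonnet-4-5",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "Summarise this repo"}]
}
EOF

# The client would POST this to the gateway with the seven headers attached, e.g.:
#   curl https://cowork-gateway.example.com/v1/messages \
#     -H "authorization: Bearer $JWT" -H "x-client-id: sp_cowork" \
#     -d @request.json
python3 -m json.tool request.json
```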

Pass criterion

You are ready for the next section when you can draw the call graph above on a whiteboard without looking and you can name the three extension points (/v1/messages gateway, managed MCP allowlist, org-plugins mount) and the five policy keys (inferenceProvider, inferenceGatewayBaseUrl, inferenceCredentialHelper, inferenceCredentialHelperTtlSec, inferenceGatewayAuthScheme) from memory.

The value: a managed plugin marketplace, an MDM governance plane, and an audit trail you own

Before the commands, the thesis they prove. Today, a Cowork rollout is per-user by default. Every engineer installs their own plugins, connects their own MCP servers, signs in with their own OAuth token, and sends prompts to api.anthropic.com with no visibility for the organisation that pays the bill. That model works for a single developer and collapses at enterprise scale: nobody knows what runs, nobody can revoke anything, the subscription is per-seat, and the data-flow diagram on the compliance form has a single box marked "third party" that the auditor is not going to accept.

The three extension points Anthropic formalised in April 2026 let an organisation replace that per-user model with three centrally-operated systems: a plugin and skill marketplace it curates, a managed-preference governance plane that sits outside the app, and an inference gateway that terminates every prompt on infrastructure the organisation runs. What follows are the three pillars, and how the rest of this guide hands you the artefacts to build each one.

1. An enterprise plugin marketplace that replaces per-user installs

When the rollout below is finished every laptop in the organisation opens Cowork to the same catalogue of plugins, skills, and MCP servers, pulled from a signed manifest an administrator publishes centrally. The catalogue is role-scoped: a finance engineer and a platform engineer see different plugins from the same deployment, because the gateway resolves the manifest server-side against the JWT claims. Every file is SHA-256 pinned and Ed25519 signed, so a tampered payload fails verification before it lands. New plugin today, new skill tomorrow, revoked plugin the day after — all three roll out and roll back atomically on the next sync, fleet-wide, with no end-user action. Plugins and skills stop being a tangle of per-user installs and become a shared asset the enterprise curates once and everyone inherits, with a versioned publish pipeline backing every change. This is how outcome three of this guide — preinstalled signed plugins, skills, and a managed MCP allowlist across an organisation, with no end-user setup — works in practice.

2. MDM as the governance plane that sits outside the app

Mobile device management (MDM) is the lever that makes the marketplace binding. Cowork reads five managed-preference keys (inferenceProvider, inferenceGatewayBaseUrl, inferenceCredentialHelper, inferenceCredentialHelperTtlSec, inferenceGatewayAuthScheme) from the OS-level policy store on every launch. Those keys decide where inference goes, which binary mints credentials, how long a token is cached, and what auth scheme the client presents. Intune, Jamf, and Group Policy push them: a .mobileconfig payload on macOS, a registry hive on Windows, environment variables on Linux. An end user cannot override the keys, cannot disable the credential helper, cannot swap the gateway URL — the governance decisions sit in the OS, not in the Cowork UI. That matters because it decouples policy from the application: when a new gateway host comes up, one MDM push updates every device; when an employee leaves, revoking their IdP account immediately breaks their JWT minting chain; when compliance asks "can a user bypass this," the answer is "not without administrative rights on the device." This is what outcome two — distribute a working MDM profile through Intune, Jamf, or Group Policy that points Cowork at your own inference gateway — actually buys the organisation: a governance plane that lives where your endpoint team already operates, not inside an app the user controls.

3. An end-to-end audit trail from agent to inference

The third pillar is what closes the compliance case. Every agent call — a plugin subprocess, a skill invocation, a direct prompt from the Cowork UI — leaves the client as an HTTPS request that carries a per-user JWT (outcome four) and a gateway-assigned trace ID. The gateway logs that ID before it routes the call, after it resolves the provider, and once more when the response comes back with token counts attached. Because the agent and the governance plane are decoupled — the agent runs on the laptop, the policy and audit sit on the gateway — you get complete end-to-end auditability and traceability without asking the client to be honest about what it ran. One query against the audit_events table (outcome five) joins user, prompt hash, provider, model, token spend, and latency for any request that crossed the fleet in the last N days. That single query is what closes a SOC 2 CC7.2 control, a GDPR Article 30 data-flow register, an internal cost-allocation spreadsheet, or a board-level ask about which models the company actually runs.
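A toy version of that join, with the column names taken from this guide and everything else (the schema, the sample rows) invented for illustration:

```shell
python3 - <<'EOF'
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE audit_events"
           " (trace_id TEXT, user_id TEXT, tenant_id TEXT, call_source TEXT, event TEXT)")
# One inference call fans out into a tool call and an MCP execution,
# all sharing the same trace_id -- that shared key is what makes the join work
db.executemany("INSERT INTO audit_events VALUES (?,?,?,?,?)", [
    ("t_a8bf", "u_29f8a3", "org_acme", "cowork",   "inference"),
    ("t_a8bf", "u_29f8a3", "org_acme", "subagent", "tool_call"),
    ("t_a8bf", "u_29f8a3", "org_acme", "subagent", "mcp_execution"),
])
for row in db.execute(
        "SELECT trace_id, user_id, COUNT(*) FROM audit_events"
        " GROUP BY trace_id, user_id"):
    print(row)
EOF
```

The real table carries prompt hashes, provider, model, token counts, and latency as well, but the shape of the query is the same.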

The money picture follows from the same split. /v1/messages traffic leaves the client and terminates on a gateway you run. The gateway fans out to the inference you chose: an air-gapped on-prem Llama cluster, a Bedrock or Azure deployment inside your VPC, a self-hosted Claude-compatible model, or a mix routed per team. That means one Anthropic Cowork subscription for the client UI and zero per-seat AI cost. The inference bill moves to capacity you already own or buy wholesale; prompts and completions never cross your perimeter unless you let them; cost and inference ownership stop being "whatever Anthropic billed us" and become a per-user, per-project line item you can chart, chargeback, or cap.

How the rest of this guide builds each pillar

The sections that follow are the artefacts. Installing the signed binary (outcome one) hands you a tamper-verifiable credential helper every device runs. Rolling it out via MDM (outcome two) puts the five policy keys on every laptop. The manifest and plugin marketplace (outcome three) defines the signed catalogue. The credential helper and seven headers (outcome four) thread per-user identity through every call. The audit trail (outcome five) is what you hand the auditor. Read the commands for how; come back to this section for why.

How to install Claude Cowork on Windows and Mac

Two artefacts, not one. The Cowork client ships through the App Store, the Microsoft Store, or Anthropic's own installers, and none of that changes when you move to a self-hosted gateway. The piece that does change is a second binary: systemprompt-cowork. It acts as the credential helper Anthropic invokes on every inference call and as the sync agent that keeps plugins and skills in lockstep with the gateway. The next three subsections cover that binary on Mac, on Windows, and as a scripted MDM payload.

Which binary do I want?

Pick the build that matches the OS and CPU architecture of the target machines. The wrong architecture means a confusing "can't execute" error and an hour of debugging.

Every release ships a SHA256SUMS file alongside the binaries. Verify the hash before you do anything else. The deployment helper runs on every machine on every inference call, which makes it a high-value target for supply-chain spoofing; the one-line shasum or certutil check below closes that door.

The v0.3.0 release ships Developer ID-signed and notarised builds for macOS (both aarch64 and x86_64) and an Authenticode-signed build for Windows. Gatekeeper and SmartScreen accept the binaries on first run without a prompt, so the rollout is a straight copy-verify-install with no manual trust steps.

macOS install

The Mac path has exactly three moves: verify, install, print the MDM payload. The notarisation ticket travels with the binary, so you do not need to strip quarantine or whitelist the subprocess in Gatekeeper.

# 1. Verify the hash against SHA256SUMS
shasum -a 256 systemprompt-cowork

# 2. Install into a stable path with the right mode
sudo install -m 0755 systemprompt-cowork /usr/local/bin/systemprompt-cowork

# 3. First-run bootstrap: pin the gateway, print the MDM snippet
systemprompt-cowork install \
  --gateway https://cowork-gateway.example.com \
  --print-mdm macos

That last command is doing the real work. Five things happen in one call:

  1. It creates /Library/Application Support/Claude/org-plugins/ with system ownership (or falls back to ~/Library/Application Support/Claude/org-plugins/ if you did not elevate).
  2. It fetches /v1/cowork/pubkey from the gateway.
  3. It pins the returned Ed25519 public key into ~/Library/Application Support/systemprompt/systemprompt-cowork.toml.
  4. It persists the gateway URL into the same TOML.
  5. It prints the .mobileconfig payload you will push through Jamf, Intune for Mac, or profiles install in the next section.

Confirm the result:

systemprompt-cowork validate

Every line should be [ok]: binary presence, config presence, gateway reachability, org-plugins mount. Expect one [warn] on the cached JWT before the first login; that clears on the next inference call.

Windows install

Same three moves, PowerShell syntax. The Authenticode-signed binary clears SmartScreen on first run, so scripted rollout works without a manual "Run anyway" click.

# 1. Verify the hash against SHA256SUMS
certutil -hashfile systemprompt-cowork.exe SHA256

# 2. Install into a stable path
mkdir "C:\Program Files\systemprompt"
Move-Item systemprompt-cowork.exe "C:\Program Files\systemprompt\systemprompt-cowork.exe"

# 3. First-run bootstrap from an elevated PowerShell
& "C:\Program Files\systemprompt\systemprompt-cowork.exe" install `
    --gateway https://cowork-gateway.example.com `
    --print-mdm windows

One watch-out. install prefers system scope and creates C:\ProgramData\Claude\org-plugins\ with machine-wide ACLs only when run from an elevated PowerShell. Running it as a normal user silently falls back to %LOCALAPPDATA%\Claude\org-plugins\, which is fine for a developer trying the binary on their own box but wrong for production: Cowork always reads from the system path first, so plugins land somewhere the client ignores and the UI looks empty.

Confirm the result:

& "C:\Program Files\systemprompt\systemprompt-cowork.exe" validate

Same [ok] lines as on Mac. The first line explicitly reports which org-plugins path resolved and at what scope, which is your chance to catch the user-scope mistake before it becomes a support ticket.

Scripted rollout

Both installers are well-formed CLIs with a sensible --apply mode, so an MDM product can script the whole flow without a human in the loop. The binary is idempotent on purpose: re-running install on a machine that is already configured refreshes the pinned pubkey and the gateway URL without disturbing the cached JWT, which matters because MDM checks typically re-run the install script on every policy evaluation.

On macOS, a Jamf policy needs two lines of bash: drop the signed binary at /usr/local/bin/, then call systemprompt-cowork install --apply with the gateway URL pinned from a Jamf script parameter. The policy runs with root, which is what system-scope ownership needs.

On Windows, an Intune Win32 app package wraps the same idea. Set Install-Command to call systemprompt-cowork.exe install --apply, set Install-Context to System so the ACLs land on C:\ProgramData\, and point the Install-Behavior at a detection script that runs systemprompt-cowork.exe validate and greps for [ok] lines. Intune re-evaluates detection on every check-in; idempotency keeps that safe.
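A sketch of that detection logic follows. The real Intune script would be PowerShell and would shell out to the installed binary; here the validate output is stubbed so the logic stands alone.

```shell
# Stub standing in for `systemprompt-cowork validate` output
validate_output='[ok] binary present
[ok] config present
[warn] cached JWT missing'

# Detected (app considered installed) only if at least one [ok] line appears
if printf '%s\n' "$validate_output" | grep -q '^\[ok\]'; then
  echo "detected"
else
  echo "not detected"
fi
```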

Pass criterion

systemprompt-cowork validate prints every line as [ok] on at least one Mac and one Windows machine, with the org-plugins path resolved to system scope on both. Every deployed binary's SHA-256 matches the published SHA256SUMS. If either of those fails, stop and fix it here before moving on to MDM; the MDM section assumes you have a working credential helper on disk.

Rolling Claude Cowork out via MDM: Intune, Jamf, and Group Policy

Managed preferences are how an administrator tells Cowork to use a gateway instead of calling Anthropic directly. Cowork reads the same five keys on every platform. The keys have the same meaning on every platform. The transport differs: a .mobileconfig payload on macOS, a registry hive on Windows, environment variables on Linux. This section gives a working example of each.

The five keys, side by side

  • inferenceProvider — set to gateway to route every inference call through your gateway.
  • inferenceGatewayBaseUrl — the HTTPS base URL of the gateway.
  • inferenceCredentialHelper — the absolute path to the systemprompt-cowork binary.
  • inferenceCredentialHelperTtlSec — how long a minted JWT stays cached, in seconds.
  • inferenceGatewayAuthScheme — the auth scheme the client presents on each call (bearer today).

A few things worth knowing. The gateway URL on Windows will be rejected if it is http:// unless the host resolves to 127.0.0.1; anything else must be HTTPS. The inferenceCredentialHelperTtlSec value is advisory; the binary will also respect a shorter TTL coming back from /v1/cowork/auth/pat in the response body, so a gateway that wants five-minute tokens can override the MDM value downward. The inferenceGatewayAuthScheme value is always bearer today; the key exists to allow a future custom scheme without another MDM rollout.

macOS: the .mobileconfig payload

A .mobileconfig file is an Apple-signed property list that an MDM product installs into the managed-preferences store. The payload below is the exact shape Cowork reads. Replace the five UUIDs with fresh ones from uuidgen, sign the plist through your MDM signing certificate, and push it through Jamf, Intune for Mac, or (for a single device) profiles install.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>PayloadType</key>
  <string>Configuration</string>
  <key>PayloadIdentifier</key>
  <string>com.example.cowork-gateway.profile</string>
  <key>PayloadUUID</key>
  <string>A1111111-1111-1111-1111-111111111111</string>
  <key>PayloadDisplayName</key>
  <string>Claude Cowork gateway policy</string>
  <key>PayloadVersion</key>
  <integer>1</integer>
  <key>PayloadScope</key>
  <string>System</string>
  <key>PayloadContent</key>
  <array>
    <dict>
      <key>PayloadType</key>
      <string>com.anthropic.claudefordesktop</string>
      <key>PayloadIdentifier</key>
      <string>com.example.cowork-gateway.content</string>
      <key>PayloadUUID</key>
      <string>B2222222-2222-2222-2222-222222222222</string>
      <key>PayloadDisplayName</key>
      <string>Cowork managed preferences</string>
      <key>inferenceProvider</key>
      <string>gateway</string>
      <key>inferenceGatewayBaseUrl</key>
      <string>https://cowork-gateway.example.com</string>
      <key>inferenceCredentialHelper</key>
      <string>/usr/local/bin/systemprompt-cowork</string>
      <key>inferenceCredentialHelperTtlSec</key>
      <integer>3600</integer>
      <key>inferenceGatewayAuthScheme</key>
      <string>bearer</string>
    </dict>
  </array>
</dict>
</plist>

The payload's PayloadType of com.anthropic.claudefordesktop is the preference domain Cowork reads on launch. The Apple Configuration Profile Reference documents the outer Configuration structure. Jamf users will want to wrap this with a Custom Schema entry from the Jamf Custom Schema docs so that the preferences are editable in the Jamf Pro UI rather than by pasting XML.

Install locally for testing with:

sudo profiles install -path=./cowork.mobileconfig
defaults read /Library/Managed\ Preferences/com.anthropic.claudefordesktop | head

The defaults read output should echo the five keys. If it does not, the profile scope is wrong or the PayloadType does not match.

Windows: the registry policy

On Windows the five keys live at HKEY_CURRENT_USER\SOFTWARE\Policies\Claude. Machine-wide policy under HKEY_LOCAL_MACHINE is supported by Cowork as a fallback but the canonical scope is user. The example below is the exact output of systemprompt-cowork.exe install --print-mdm windows.

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\SOFTWARE\Policies\Claude]
"inferenceProvider"="gateway"
"inferenceGatewayBaseUrl"="https://cowork-gateway.example.com"
"inferenceCredentialHelper"="C:\\Program Files\\systemprompt\\systemprompt-cowork.exe"
"inferenceCredentialHelperTtlSec"=dword:00000e10
"inferenceGatewayAuthScheme"="bearer"

The hex 00000e10 decodes to 3,600 decimal. For Intune, convert each key into a Custom OMA-URI row under ./User/Vendor/MSFT/Policy/Config/Claude/; the Microsoft Intune Custom OMA-URI docs describe the transport. For Group Policy, the systemprompt-core repository ships an ADMX stub alongside the release artefacts; import it into your Central Store and the five settings appear under User Configuration → Policies → Administrative Templates → Claude.
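The dword decode is easy to sanity-check in any shell:

```shell
# The registry dword from the snippet above, decoded to decimal seconds
printf '%d\n' 0xe10   # prints 3600
```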

Apply a .reg file locally with:

reg import cowork.reg
reg query "HKCU\SOFTWARE\Policies\Claude"

Fully quit the Cowork client (right-click tray icon, Quit, confirm no Claude.exe in Task Manager) before relaunching. Cowork reads managed preferences once on process start.

Windows scheduled task for periodic sync

The credential helper contract runs the binary on every inference call, which keeps the cached JWT fresh. The manifest sync, which pulls new plugins and skills, is a separate concern. A Windows Scheduled Task is the right shape: it runs at logon and then on an interval the template defines.

systemprompt-cowork.exe install --emit-schedule-template windows
# writes: systemprompt-cowork-sync.xml to the current directory

schtasks /Create /TN "SystempromptCoworkSync" /XML systemprompt-cowork-sync.xml
schtasks /Query  /TN "SystempromptCoworkSync" /V /FO LIST

The XML template conforms to the Microsoft Task Scheduler schema and can be edited before import to change the interval, scope, or run account. A parallel macOS launchd variant ships with --emit-schedule-template macos and drops a plist into ~/Library/LaunchAgents/.

Linux reference rollout

Linux does not have a managed-preferences framework that matches the five-key shape, so the binary reads the same values from environment variables. An Ansible role, a Puppet manifest, or a Nix module that writes the five CLAUDE_* variables to /etc/environment (or a user profile) is a valid rollout for developer workstations.

Pass criterion

Applying the MDM profile on a Mac and the registry snippet on a Windows box causes Cowork to skip the Anthropic sign-in screen on launch. A reg query "HKCU\SOFTWARE\Policies\Claude" on Windows or a defaults read /Library/Managed\ Preferences/com.anthropic.claudefordesktop on macOS returns all five keys. Cowork opens and the UI immediately shows a chat window with no authentication prompt.

How Claude Cowork authenticates users: PAT, session, mTLS, and the seven canonical headers

Cowork has no identity of its own. When the client sends a /v1/messages request through the gateway, every identity claim on the call comes from the JWT that systemprompt-cowork prints to stdout. The JWT is what the gateway validates, logs, and uses to scope the response. Every downstream audit event keys off the claims inside it.

Getting that right is the difference between a deployment that satisfies an auditor and a deployment that is a pile of shared service accounts in a trench coat. This section walks through the three auth tiers, the seven headers, the caching model, and the revocation story.

The three auth tiers as a capability ladder

The binary exposes three authentication providers. On every call they run in a fixed order: mTLS first, session cookie second, PAT third. The first provider that can produce a valid token wins; the others never run.

The right tier depends on the compliance posture and the enrolment story.

PAT is the simplest tier. An administrator mints a token at the gateway, the user runs systemprompt-cowork login sp-live-xxxxxxxx once, and the token lives in the OS keystore (macOS Keychain, Windows Credential Manager, Linux Secret Service). The gateway exchanges the PAT for a JWT each time the cache expires. Revoke a PAT at the gateway and the next credential-helper call returns a 401, which Cowork surfaces as a sign-in prompt.

Session cookie is the right tier when you already have a browser-based SSO flow and want Cowork enrolment to ride on it. The helper opens {gateway}/cowork/device-link?redirect=http://127.0.0.1:<ephemeral> in the user's default browser, the user authenticates against your IdP, the browser sends the auth code back to the loopback server, the helper exchanges the code at /v1/cowork/auth/session, and a session cookie lands in the OS keystore. The cookie is long-lived; the JWT minted from it respects the TTL in inferenceCredentialHelperTtlSec.

mTLS is the right tier when the device itself is the identity. The helper loads a client certificate from an OS keystore reference (SP_COWORK_DEVICE_CERT_LABEL on macOS/Windows, a file path on Linux), presents it to /v1/cowork/auth/mtls, and receives a JWT that carries the certificate's SHA-256 fingerprint as a claim. This is how regulated environments get per-device traceability without a shared PAT. The certificate itself can be issued by the corporate PKI, stored in a TPM or Secure Enclave, and rotated on the existing device-certificate lifecycle.
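The fixed order reads naturally as a short-circuit chain. In this sketch each function is a stub standing in for a real tier; the stub behaviour (no device cert, no session cookie) is invented for illustration.

```shell
# Stubs: this machine has no device cert and no session cookie,
# so the first two tiers report failure
try_mtls()    { return 1; }
try_session() { return 1; }
try_pat()     { echo "jwt-minted-from-pat"; }

# The first tier that succeeds wins; later tiers never run
try_mtls || try_session || try_pat
```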

The seven canonical headers

Every /v1/messages call that leaves Cowork carries seven headers the gateway stamps into audit_events. The set was formalised in R030 alongside the new PolicyVersion typed identifier. Every header name is a constant in systemprompt_identifiers::headers; there is no magic string lookup.

  • x-user-id identifies the human on the other side of the keyboard. For PAT and session tiers it is the user who minted the credential; for mTLS it is the user claim encoded in the certificate's Subject Alternative Name.
  • x-session-id identifies the Cowork chat session. It rolls over when the user opens a new conversation and stays stable across tool calls within one conversation.
  • x-trace-id identifies this one inference call. Every tool call the model triggers and every MCP execution that results inherits the same x-trace-id, which is what makes the audit join possible.
  • x-client-id is always sp_cowork for requests that originate in the Cowork client. Custom integrations that call the gateway directly use their own sp_* identifier. The constant lives in ClientId::cowork().
  • x-tenant-id identifies the organisation. For single-tenant deployments it is set once in the gateway config; for multi-tenant deployments it is resolved from the JWT claim on each call.
  • x-policy-version identifies which version of the governance policy was in effect when the request was evaluated. A PolicyVersion::unversioned() constant exists for deployments that do not version their policies yet.
  • x-call-source identifies the module that issued the call. Cowork's own inference calls set it to cowork; tool subagents set it to subagent; scheduled jobs set it to job.

The seven headers are the primary key of the audit trail. Joining audit_events on any subset of x-user-id, x-trace-id, and x-tenant-id yields the full lineage of a request. The full list appears in the credential-helper stdout line.

The stdout contract

The Anthropic inferenceCredentialHelper contract is uncompromising. On every run the binary must print exactly one JSON line to stdout. One. Not two, not zero, not a log preamble. Every byte of diagnostic output goes to stderr. If the binary prints anything else, Cowork fails to parse the output and surfaces a "credential helper failed" error in the UI.

The shape of the JSON line is:

{
  "token": "eyJhbGciOi...",
  "ttl": 3600,
  "headers": {
    "x-user-id": "u_29f8a3",
    "x-session-id": "s_01hy...",
    "x-trace-id": "t_a8bf...",
    "x-client-id": "sp_cowork",
    "x-tenant-id": "org_acme",
    "x-policy-version": "2026-04-22",
    "x-call-source": "cowork"
  }
}

token is the JWT. ttl is the seconds remaining before the JWT expires. headers is the seven-header set. Cowork attaches each header verbatim to the /v1/messages call.
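A quick way to check a helper line against the contract. The sample line here is abbreviated; in practice you would pipe the output of running the binary itself.

```shell
# Stand-in for one credential-helper invocation
line='{"token":"eyJ...","ttl":3600,"headers":{"x-user-id":"u_29f8a3","x-session-id":"s_01","x-trace-id":"t_a8","x-client-id":"sp_cowork","x-tenant-id":"org_acme","x-policy-version":"2026-04-22","x-call-source":"cowork"}}'

printf '%s\n' "$line" | python3 -c '
import json, sys
lines = sys.stdin.read().splitlines()
assert len(lines) == 1, "contract: exactly one line on stdout"
doc = json.loads(lines[0])
assert set(doc) == {"token", "ttl", "headers"}, "contract: three top-level keys"
assert len(doc["headers"]) == 7, "contract: seven canonical headers"
print("contract ok")
'
```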

Caching, rotation, revocation

The JWT and its TTL are cached on disk at $XDG_CACHE_HOME/systemprompt/systemprompt-cowork.json with mode 0600 on Unix. A run that finds a cached JWT within its TTL window returns the cached JSON without making any network call. When the TTL expires, the capability ladder runs and mints a fresh one. Cache invalidation is a matter of deleting the file; the next helper run will re-authenticate.

Rotation happens naturally because the TTL expires. A 3,600-second TTL means a user is re-authenticated against the gateway at least hourly. Revocation at the gateway therefore takes effect within one TTL window. For faster revocation drop the TTL to 300 seconds in the MDM profile; the binary handles short TTLs without complaint and the keystore hit per inference call is negligible.

Certificate rotation for the mTLS tier follows your PKI's lifecycle. Rotate the pinned manifest-signing pubkey by running systemprompt-cowork install --gateway <url> again; the binary writes the new pubkey atomically and the next sync verifies against it.

Pass criterion

Running systemprompt-cowork with no arguments on a logged-in machine prints exactly one JSON line to stdout, the line parses as JSON, and the seven headers match the ones above. Revoking the PAT at the gateway causes the next credential-helper call to fail with a clean error and Cowork falls back to a sign-in prompt rather than failing silently.

Preinstalling Claude Cowork plugins and skills across an organisation

The admin's most-searched question about Claude Cowork is some version of "how do we ship plugins and skills to the whole team without every engineer following a setup doc". The answer is the signed manifest. It is the piece of the R030 release that most directly changes the deployment story.

What the manifest contains

A manifest is a JSON document served by the gateway at /v1/cowork/manifest and signed with an Ed25519 private key. Every client verifies the signature against a pubkey pinned at install time. The shape of the manifest, as defined in systemprompt-core/bin/cowork/src/manifest.rs, is:

{
  "version": "2026-04-22T09:30:00Z",
  "user": {
    "id": "u_29f8a3",
    "name": "Jane Example",
    "email": "jane@example.com",
    "roles": ["engineering", "senior"]
  },
  "plugins": [
    {
      "id": "devops-plugin",
      "version": "1.4.2",
      "files": [
        {"path": "plugin.toml", "sha256": "..."},
        {"path": "handlers/deploy.sh", "sha256": "..."}
      ]
    }
  ],
  "skills": [
    {
      "name": "review-terraform-plan",
      "description": "Audit a Terraform plan for destructive changes",
      "file_path": ".systemprompt-cowork/skills/review-terraform-plan.md",
      "instructions": "..."
    }
  ],
  "agents": [
    {
      "id": "pr-reviewer",
      "endpoint": "/v1/messages",
      "model": "claude-sonnet-4-6",
      "skills": ["review-terraform-plan"],
      "mcp_servers": ["gh-readonly"],
      "enabled": true,
      "default": false
    }
  ],
  "managed_mcp_servers": [
    {
      "name": "gh-readonly",
      "url": "https://mcp-internal.example.com/gh-readonly",
      "oauth": false,
      "headers": {"x-team": "platform"},
      "tool_policy": {"allow": ["search_code", "read_file"]}
    }
  ],
  "revocations": [
    {"kind": "skill", "name": "leaked-api-probe"}
  ],
  "signature": {
    "alg": "ed25519",
    "sig": "base64(ed25519(canonical_json(manifest_body)))"
  }
}

That JSON looks straightforward. Five details carry the whole governance story:

  • The manifest is per-user. The user block is resolved server-side from the JWT on the request, so two engineers in the same tenant can receive different manifests when role-scoping puts them in different buckets. Finance sees the finance skill catalogue. Platform sees the platform plugin set. Nothing crosses the streams.
  • managed_mcp_servers is the allowlist. Any MCP server Cowork tries to invoke that is not in this list is refused at runtime. Central, per-principal, revocable. Remove a server from the manifest and the next sync drops it from the allowlist file before any tool call can reach it.
  • revocations is the kill switch. If a shipped plugin turns out to contain a bug, a malicious pattern, or a leaked credential, the next manifest carries a revocation entry and the binary removes the offending file atomically on the next sync. No end-user action required.
  • skills and plugins are separate on purpose. A plugin is an executable unit (a subprocess, an MCP server, an extension). A skill is a text instruction the model consults as context. Both can ship in the same manifest and reference each other; agents are what tie them together.
  • The signature is Ed25519 per RFC 8032. It covers a canonical JSON encoding of the manifest body. The binary pinned the gateway's pubkey at install --gateway time, so a mismatched signature fails loudly and the sync does not proceed. If an attacker ever swaps the gateway, sync stops on the spot rather than silently pulling tampered plugins.
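Signature verification is only deterministic if both sides agree on the bytes being signed. A sketch of the canonical encoding, assuming "canonical JSON" means sorted keys and compact separators; the crate's exact canonical form may differ:

```python
import json

def canonical_json(body: dict) -> bytes:
    """Deterministic encoding of the manifest body: sorted keys, no
    insignificant whitespace, UTF-8 bytes. This is the byte string the
    Ed25519 signature covers, under the assumed canonical form."""
    return json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
```

Verification is then a single Ed25519 verify of `canonical_json(manifest_body)` against the pinned pubkey and the decoded `sig`; Python's standard library has no Ed25519 primitive, so that step belongs to a crypto library.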

The sync flow

A sync invocation does five things, in order, with atomicity guarantees at every step.

  1. Fetch /v1/cowork/manifest with the cached JWT as a Bearer token. Failure at this step (401, 403, 404) short-circuits the sync with a clear error and leaves the existing org-plugins mount untouched.
  2. Verify the Ed25519 signature against the pinned pubkey. If verification fails, the binary aborts before touching the filesystem. This is the line of defence against a compromised gateway or a man-in-the-middle.
  3. Stage every file referenced in plugins[].files[] under org-plugins/.staging/. Each file is downloaded from /plugins/{id}/{relative_path}, hashed with SHA-256, and compared to the expected hash from the manifest. A mismatch aborts the sync before anything lands in the live mount.
  4. Rename the staged files into place atomically. On Windows this uses MoveFileEx(MOVEFILE_REPLACE_EXISTING | MOVEFILE_WRITE_THROUGH). On Unix it uses rename(2). Either way there is no moment where Cowork sees a half-written plugin.
  5. Write the managed MCP allowlist to .systemprompt-cowork/managed-mcp.json and a sync-status file to .systemprompt-cowork/last-sync.json. Both files include the manifest version for drift detection.
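Steps 3 and 4 carry the atomicity story. A Python miniature of the stage-hash-rename dance — illustrative only; the binary's real staging layout is the org-plugins/.staging/ directory described above:

```python
import hashlib
import os
import tempfile

def stage_and_commit(dest_path: str, payload: bytes, expected_sha256: str) -> None:
    """Write to a staging file, verify the SHA-256 against the manifest,
    then rename into place atomically so no reader ever observes a
    half-written plugin file."""
    digest = hashlib.sha256(payload).hexdigest()
    if digest != expected_sha256:
        # Abort before anything lands in the live mount.
        raise ValueError(f"hash mismatch for {dest_path}: got {digest}")
    fd, staging = tempfile.mkstemp(dir=os.path.dirname(dest_path) or ".")
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    os.replace(staging, dest_path)  # rename(2) / MoveFileEx semantics
```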

When a sync succeeds the binary prints a single line of the form sync ok: N installed, 0 updated, 0 removed (M MCP servers, manifest 2026-04-22T09:30:00Z) to stdout. On failure the stdout is empty and a structured error lands on stderr.

The filesystem layout

After a successful sync the org-plugins mount looks like this on Windows:

C:\ProgramData\Claude\org-plugins\
├── .systemprompt-cowork\
│   ├── version.json
│   ├── last-sync.json
│   └── managed-mcp.json
├── devops-plugin\
│   ├── plugin.toml
│   └── handlers\
│       └── deploy.sh
└── pr-review-plugin\
    ├── plugin.toml
    └── skills\
        └── review-terraform-plan.md

On macOS the equivalent mount lives at /Library/Application Support/Claude/org-plugins/. On Linux it lives at ${XDG_DATA_HOME:-$HOME/.local/share}/Claude/org-plugins/. The layout is identical across platforms; only the root path changes.

Cowork scans the mount on launch and on Cmd+R (macOS) or F5 (Windows). New plugins appear in the slash menu and the sidebar without relaunching.

Skill versioning and rollback

Every skill carries a hash of its instructions. The manifest version is a timestamp. When a skill changes, the manifest bumps to a new version and the old skill file is overwritten in place. Rolling back is a matter of publishing the previous version of the manifest; the binary applies it on the next sync and the skill content reverts.

For higher safety, publish the manifest from a build pipeline that retains every signed version, and make rollback a one-line publish --manifest v2026-04-21 operation rather than a hand edit. The Ed25519 signing key should live in a hardware module (HSM, YubiHSM, or cloud KMS) and never on an administrator's laptop.

Pass criterion

Publishing a new plugin at the gateway and running systemprompt-cowork sync on a Mac and a Windows box causes the plugin files to appear under org-plugins/<plugin>/ on both machines. The Cowork slash menu shows the plugin without relaunching. Revoking the plugin in the next manifest causes it to disappear on the following sync. The last-sync.json file reflects the current manifest version.

Pointing Claude Cowork at Bedrock, Vertex, Azure, or a self-hosted model

The /v1/messages gateway is a proxy with a router. Incoming requests match against a list of routes; each route names a provider, an endpoint, and an API key resolved by name from the secrets file. This is where "our own inference cluster" stops being a slogan and becomes a YAML block.

The gateway config

The v0.3.0 release introduced an optional gateway block in the profile YAML. It is off by default; when gateway.enabled: true the gateway endpoints mount and the router activates.

gateway:
  enabled: true
  routes:
    - model_pattern: "claude-opus-*"
      provider: anthropic
      endpoint: "https://api.anthropic.com/v1/messages"
      api_key_secret: "anthropic_prod"

    - model_pattern: "claude-sonnet-*"
      provider: anthropic
      endpoint: "https://bedrock-runtime.us-east-1.amazonaws.com"
      api_key_secret: "bedrock_sonnet_signing"
      upstream_model: "anthropic.claude-sonnet-4-6-v1:0"

    - model_pattern: "gpt-4.*"
      provider: openai
      endpoint: "https://api.openai.com/v1/chat/completions"
      api_key_secret: "openai_prod"
      upstream_model: "gpt-4.1"

    - model_pattern: "kimi-*"
      provider: moonshot
      endpoint: "https://api.moonshot.cn/v1/chat/completions"
      api_key_secret: "moonshot_prod"

    - model_pattern: "qwen-*"
      provider: qwen
      endpoint: "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"
      api_key_secret: "qwen_prod"

    - model_pattern: "internal-llama-*"
      provider: openai
      endpoint: "https://vllm-internal.example.com/v1/chat/completions"
      api_key_secret: "internal_vllm_token"
      upstream_model: "llama-3.3-70b-instruct"

The router walks the routes in order and the first model_pattern that matches wins. Each field in the matched route plays a specific role:

  • provider picks the dispatch strategy (byte proxy vs format-converting proxy vs custom).
  • endpoint is the URL the upstream will see.
  • api_key_secret names an entry in the existing secrets file. The gateway never holds a raw API key in its own store; it resolves the name at dispatch time, so key rotation is a secrets-file edit, not a code change.
  • upstream_model rewrites the model name if the upstream expects a different identifier (for example, Bedrock's anthropic.claude-sonnet-4-6-v1:0 instead of Cowork's claude-sonnet-4-6).
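The first-match walk is easy to model. A sketch using glob semantics for model_pattern — an assumption, though the `*` wildcards in the config above imply glob-style matching — over a trimmed route list:

```python
from fnmatch import fnmatch

ROUTES = [
    {"model_pattern": "claude-opus-*", "provider": "anthropic"},
    {"model_pattern": "claude-sonnet-*", "provider": "anthropic",
     "upstream_model": "anthropic.claude-sonnet-4-6-v1:0"},
    {"model_pattern": "internal-llama-*", "provider": "openai",
     "upstream_model": "llama-3.3-70b-instruct"},
]

def resolve_route(model: str, routes=ROUTES) -> dict:
    """Walk the routes in order; the first matching model_pattern wins."""
    for route in routes:
        if fnmatch(model, route["model_pattern"]):
            return route
    raise LookupError(f"no route for model {model!r}")
```

`resolve_route("claude-sonnet-4-6")` returns the Bedrock route, including the rewritten upstream_model, without ever considering the routes below it.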

Providers and what they do differently

The v0.3.0 release shipped five built-in provider tags. Each has its own understanding of the request shape.

  • anthropic is a transparent byte proxy. Extended thinking blocks, cache-control headers, prompt-caching markers, and Anthropic-specific SSE events all pass through unchanged. This is the right provider for Anthropic's own endpoint and for Amazon Bedrock's Claude models (which speak the same wire format under the covers).
  • openai converts Anthropic Messages to OpenAI Chat Completions on the way in and converts the response back on the way out. Streaming is mapped event by event. Tool use survives the round-trip. This is what you use for Azure OpenAI, for self-hosted vLLM running Llama or Qwen, and for any OpenAI-compatible inference server.
  • moonshot routes to the Kimi API and handles Moonshot's native error shape. Extended thinking is mapped to the provider's reasoning field.
  • qwen routes to Alibaba's DashScope, which exposes an OpenAI-compatible surface but with different rate-limit headers and a distinct error schema.
  • minimax routes to MiniMax's Anthropic-compatible endpoint, which preserves streaming, tool use, and thinking blocks verbatim.

A gemini stub exists for Google's Vertex AI Gemini surface and is scheduled to move out of stub status in the next release.
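To make the openai provider's conversion concrete, here is a deliberately simplified sketch of the request-side mapping: text-only content, no streaming, no tool use. Field names on the Anthropic side follow the public Messages API; the converter shipped in the gateway handles far more than this.

```python
def messages_to_chat_completions(req: dict) -> dict:
    """Map an Anthropic Messages request body onto an OpenAI Chat
    Completions body. Simplified: flattens text content blocks only."""
    out_messages = []
    if req.get("system"):
        # Anthropic carries the system prompt as a top-level field;
        # OpenAI expects it as the first message.
        out_messages.append({"role": "system", "content": req["system"]})
    for m in req["messages"]:
        content = m["content"]
        if isinstance(content, list):  # content blocks -> flat text
            content = "".join(
                b.get("text", "") for b in content if b.get("type") == "text"
            )
        out_messages.append({"role": m["role"], "content": content})
    return {
        "model": req["model"],
        "max_tokens": req.get("max_tokens"),
        "messages": out_messages,
    }
```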

Registering a custom provider

Providers are no longer a closed enum. The GatewayProvider field is a free-form string tag resolved at dispatch time, and extension crates register new providers via the inventory crate. The trait you implement is GatewayUpstream:

use async_trait::async_trait;
use systemprompt_gateway::{GatewayUpstream, UpstreamCtx};

pub struct InternalAgentRunner;

#[async_trait]
impl GatewayUpstream for InternalAgentRunner {
    async fn proxy(&self, ctx: UpstreamCtx<'_>) -> Result<(), anyhow::Error> {
        // Forward the Messages-format request to the internal agent
        // runner, stream the response back through ctx.response_sink().
        // ...
        Ok(())
    }
}

inventory::submit! {
    systemprompt_gateway::ProviderRegistration {
        tag: "internal-agent-runner",
        factory: || Box::new(InternalAgentRunner),
    }
}

The tag internal-agent-runner is now usable as a provider value in any route. No changes to the core crate; no core release needed to expose a new upstream.

Preserving extended thinking and cache-control

Claude's extended thinking blocks and prompt-caching headers are the two places where a format-converting provider can quietly break behaviour. The anthropic provider is a byte proxy precisely to avoid this trap. For the openai provider the conversion is lossy: extended thinking is dropped, and cache-control is ignored because the OpenAI surface has no equivalent. Teams that rely on prompt caching for cost control should route Claude traffic through the anthropic provider even when the upstream is Bedrock; the Bedrock Claude surface preserves the caching semantics.

Secret handling

The gateway resolves api_key_secret values from the same secrets file that every other systemprompt component uses. The names in that file are the only thing that appears in gateway.routes[]. The actual keys never land in a process environment variable, a log line, or a stack trace. Rotation is a matter of updating the secrets file and sending a SIGHUP; the gateway re-reads on signal and switches keys without dropping connections.

Pass criterion

A /v1/messages call with a claude-sonnet-* model name routed through the gateway arrives at Bedrock with the expected AWS Signature V4 and returns an Anthropic-formatted response. The same call with a gpt-4.* model name arrives at Azure OpenAI with the expected chat-completions shape and returns content that Cowork renders as an assistant message. An audit_events row exists for each call, and the x-tenant-id and x-user-id on the row match the calling user.

What the audit trail looks like when Claude Cowork is governed end to end

Audit is where the deployment earns its keep. A CISO who can answer "what did user jane@example.com do with Cowork between 14:00 and 16:00 last Tuesday, and which tools did the model decide to invoke on her behalf" in one SQL query is in a different posture from a CISO who cannot.

The audit_events schema

Every pass through the governance pipeline writes one row to audit_events. The row is written before the response returns to the caller, which means there is no path where a successful call is not audited. The schema, as of R030:

CREATE TABLE audit_events (
  id           BIGSERIAL PRIMARY KEY,
  occurred_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  kind         TEXT NOT NULL,      -- 'inference' | 'tool_call' | 'mcp_exec' | 'cost'
  user_id      TEXT NOT NULL,      -- from x-user-id header
  session_id   TEXT NOT NULL,      -- from x-session-id header
  trace_id     TEXT NOT NULL,      -- from x-trace-id header
  client_id    TEXT NOT NULL,      -- from x-client-id header
  tenant_id    TEXT NOT NULL,      -- from x-tenant-id header
  policy_ver   TEXT NOT NULL,      -- from x-policy-version header
  call_source  TEXT NOT NULL,      -- from x-call-source header
  model        TEXT,               -- 'claude-sonnet-4-6', 'gpt-4.1', ...
  provider     TEXT,               -- 'anthropic', 'openai', 'moonshot', ...
  tokens_in    INTEGER,
  tokens_out   INTEGER,
  cost_micro   BIGINT,             -- microdollars
  latency_ms   INTEGER,
  outcome      TEXT NOT NULL,      -- 'allowed' | 'denied' | 'error'
  payload      JSONB NOT NULL      -- redacted request/response or tool args
);

CREATE INDEX ON audit_events (tenant_id, occurred_at DESC);
CREATE INDEX ON audit_events (trace_id);
CREATE INDEX ON audit_events (user_id, occurred_at DESC);

The seven canonical headers are columns in the table. Every audit query starts by filtering on two or three of them.

The one-shot lineage query

The point of the trace_id column is that every event triggered by one user prompt shares it. A single JOIN on trace_id gives you the full lineage.

-- Lineage for a single user prompt: inference call + every tool call + every MCP exec
SELECT
  e.occurred_at,
  e.kind,
  e.outcome,
  e.model,
  e.provider,
  e.tokens_in,
  e.tokens_out,
  e.cost_micro,
  e.payload ->> 'tool'    AS tool_name,
  e.payload ->> 'server'  AS mcp_server
FROM audit_events e
WHERE e.tenant_id = 'org_acme'
  AND e.user_id   = 'u_29f8a3'
  AND e.trace_id  = 't_a8bf...'         -- from the user's session
ORDER BY e.occurred_at ASC;

The query returns a chronological list: one inference row for the initial prompt, zero or more tool_call rows for each tool the model decided to invoke, zero or more mcp_exec rows for each MCP server call, and a final cost row summing the microdollars. Every row is tied back to the JWT-verified user, the tenant, and the policy version that was in force.

SIEM export

The audit_events table is the primary store; the structured JSON export is the secondary. Every row is also emitted as a JSON event on a separate topic for ingestion by Splunk, ELK, Datadog, or Sumo Logic. The shape of the event mirrors the table row. Teams that already run a SIEM usually prefer the JSON path; teams that want ad-hoc querying prefer the SQL path. Both are live in parallel.

Cost attribution

The cost_micro column is microdollars, which handles fractional-cent costs without floating point. Summing cost_micro grouped by user_id gives per-user cost. Grouping by (tenant_id, provider) gives per-provider cost for a tenant. Grouping by (tenant_id, payload ->> 'department') gives per-department cost, assuming the gateway stamps the department into the payload from the JWT.

Because the provider column captures which upstream served the call, cost comparisons across providers are a single aggregation. "What did we spend on Bedrock vs direct Anthropic in April 2026" is a one-line query.
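Because cost_micro is an integer, attribution works in any language without rounding drift. A sketch of the same aggregation in Python, for teams that pull rows out of Postgres before slicing:

```python
from collections import defaultdict

def cost_by(rows, *keys):
    """Sum cost_micro (microdollars) over arbitrary grouping keys.
    1 dollar = 1_000_000 microdollars, so integer sums stay exact;
    the division to dollars happens only at presentation time."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["cost_micro"]
    return {group: micro / 1_000_000 for group, micro in totals.items()}
```

`cost_by(rows, "tenant_id", "provider")` is the "Bedrock vs direct Anthropic" comparison; swap the keys for per-user or per-department views.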

Common failure modes and what they mean

  • Cowork still shows an Anthropic sign-in screen. The MDM keys did not apply. On Windows, reg query "HKCU\SOFTWARE\Policies\Claude" must show all five values. On macOS, defaults read /Library/Managed\ Preferences/com.anthropic.claudefordesktop must show them. If they are present and Cowork still prompts, the client was not fully quit before relaunch.
  • validate reports org-plugins path: ... (scope: user). The install ran without elevation. On Windows re-run from an elevated PowerShell; on macOS from sudo. Until the system-scope path exists, Cowork and the sync agent disagree on where plugins live.
  • sync fails "manifest signature verification failed". The gateway's signing key rotated since the last install. Open the config TOML (%APPDATA%\systemprompt\systemprompt-cowork.toml on Windows, ~/Library/Application Support/systemprompt/systemprompt-cowork.toml on macOS), delete the [sync] block, and re-run install --gateway <url> to pin the new key.
  • manifest fetch failed: 401. The PAT expired or was revoked. systemprompt-cowork logout && systemprompt-cowork login <new-pat> re-authenticates.
  • Chat works but no plugins are visible. The sync ran against the user-scope path while Cowork is reading from the system path. Reconcile with validate and re-run install from elevation.
  • Cowork shows "credential helper failed". The binary printed something other than a JSON line to stdout. Run systemprompt-cowork directly from a terminal; any stderr output is expected, any extra stdout output breaks the contract and needs to be fixed.
  • The audit row carries no x-tenant-id. The JWT was minted without a tenant claim. Gateway config needs to resolve tenant from the authenticated identity; until it does, multi-tenant analysis is not possible.
  • Latency on the first call after idle is high. The credential-helper cache expired and the capability ladder ran end to end. Increasing the MDM TTL makes the full re-authentication rarer at the cost of a wider revocation window; decreasing it does the reverse. The right value is a function of the revocation SLA.

Pass criterion

For a single Cowork chat turn the audit_events table contains one inference row and one or more tool_call rows with a common trace_id, user_id, and tenant_id. Cost attribution grouped by user returns a non-empty result. The SIEM export pipeline picks up the same rows on its normal polling cadence.

Air-gapped and offline-first Claude Cowork deployments

The binary was designed with air-gap compatibility in mind. Nothing about the runtime flow requires outbound traffic to anthropic.com, a phone-home telemetry endpoint, or a vendor licence server. The only network calls the binary makes are the ones explicitly pointed at your gateway. This makes the architecture suitable for environments where outbound egress is restricted to a single allowlisted host.

An air-gapped deployment shifts three responsibilities inward:

  • The gateway has to run inside the egress boundary. It needs to be reachable from every client without those clients crossing the perimeter.
  • The upstream provider has to sit on the same side of the boundary. In practice this means a self-hosted inference cluster (vLLM, Ollama-on-server, sglang), a private Bedrock VPC endpoint, or an Azure OpenAI deployment inside a private link.
  • The signing keys have to be minted and distributed inside the security boundary. The Ed25519 key pair that signs manifests never leaves; nor does the HSM or KMS that holds it.

For true offline deployments where clients have no network path to any gateway, the sync agent supports a pre-seeded mode. An administrator produces a signed manifest and the full plugin payload on a build machine, copies the artefacts to a read-only share the clients can mount, and points the binary at a file:// URL. Sync still verifies the signature; there is no trust shortcut for local files. Cache TTL can be extended for this shape to match the operating window; the JWT, in a pure-offline deployment, is minted once per device at enrolment and rotated on a schedule that matches the security posture.

A half-online deployment (clients online, upstream offline) is the most common middle ground. The gateway sits on the client network, talks to an internal inference cluster on a private link, and the whole system never reaches the public internet for an inference call. The audit table lives on the same side of the boundary as the gateway. SIEM export lands in the same network. For large shops this is the posture auditors are happiest with: every byte of prompt content stays inside the perimeter, and the audit evidence is available locally without waiting on a vendor-hosted dashboard.

Storing the mTLS device certificate

For the mTLS tier in a regulated environment the device certificate belongs in hardware-backed storage, not on the filesystem. On Windows that means the Microsoft Platform Crypto Provider, which backs keys to the Trusted Platform Module (TPM). On macOS it means the Secure Enclave on Apple Silicon; T2-equipped Intel Macs provide an equivalent. On Linux it means a PKCS#11 token (YubiKey, Nitrokey, or a server-side HSM).

The binary's mTLS provider reads an OS keystore reference, not a raw private key. Configuring the reference through MDM means the cert never touches a PEM file on disk. The SP_COWORK_DEVICE_CERT_LABEL variable points at a keychain item on macOS, a certificate store entry on Windows, or a PKCS#11 URI on Linux. The private key stays inside the hardware; the binary signs challenges through the OS API, and only the certificate's public portion is ever exported.

Certificate issuance rides on the existing corporate PKI. A device joining the fleet requests a cert from the internal CA at enrolment (via Jamf, Intune, or a user-driven systemprompt-cowork enrol flow), the CA signs it to the validity your PKI normally issues for device certs, and renewal happens through the same channel that rotates every other device credential. Revocation is immediate: flag the cert in the CRL and the next /v1/cowork/auth/mtls call returns a 403.

Migrating from direct API usage to the gateway

Teams that have been running Claude Cowork against api.anthropic.com with individual user OAuth need a cutover plan. The risk is not technical, it is change management: an engineer whose workflow breaks mid-task is a visible incident, and the right migration never produces one.

The recommended cutover is a soft launch in four phases, each governed by a measurable gate rather than a time-boxed schedule.

  1. Stand up the gateway in shadow mode. The gateway accepts requests and writes to audit_events but does not alter routing; the upstream of record remains Anthropic's hosted API. Do not advance until the audit trail for the shadow period matches what the production calls look like.
  2. Cut over a pilot cohort. Push the MDM profile to a small group of engineers comfortable with being the first through. Advance when the pilot reports no workflow regressions and audit_events for the cohort shows the expected outcome = 'allowed' distribution.
  3. Cut over a department at a time. Roll the MDM profile out in rings that align with organisational structure. Advance ring by ring; any user with a spike in outcome = 'error' rows indicates a deployment problem that belongs on a triage list before the next ring goes.
  4. Retire direct access. Once the last ring is on the gateway, revoke the individual OAuth tokens at the Anthropic side. The gateway is now the only path.

The gate between each phase is evidence, not calendar. Running two auth paths in parallel is worse than either one alone, so advance as soon as the evidence supports it and not before.
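The ring gate can be a few lines of code over audit_events rows. A sketch with an illustrative 1% error-rate threshold — the threshold is ours, not a product constant:

```python
def ring_gate(rows, max_error_rate=0.01):
    """Evidence gate for the next rollout ring: per-user
    outcome == 'error' rate computed from audit_events rows.
    An empty result means the ring may advance."""
    per_user = {}
    for row in rows:
        total, errors = per_user.get(row["user_id"], (0, 0))
        per_user[row["user_id"]] = (total + 1, errors + (row["outcome"] == "error"))
    return {
        u: e / t for u, (t, e) in per_user.items() if e / t > max_error_rate
    }
```

Any user returned by the gate goes on the triage list before the next ring ships, exactly as phase 3 above prescribes.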

Pass criterion

The audit table shows every inference call for the migrated cohort flowing through the gateway. Outbound traffic to api.anthropic.com is zero from client machines post-migration. Pilot users report no workflow regressions. The CRL-based revocation path has been tested end to end on at least one issued device certificate.

What to do next

Start with a pilot cohort small enough that you know everyone on it by name. The goal of the pilot is to validate the binary install, the MDM profile, the manifest signing key, the three auth tiers, and the audit trail end to end. Expand to departments once the pilot returns clean signals across all of those. Revisit the routing config whenever the model landscape shifts or a new upstream becomes available. Read the sibling guides on self-hosted AI governance, enterprise Claude Code managed settings, and MCP gateway security for the adjacent chapters in the same story.