
ONE BINARY. COMPLETE STACK. FIFTY MEGABYTES, POSTGRES ONLY.

A 50 MB Rust binary runs RBAC, audit_events, OAuth2, and MCP subprocess supervision on your network, with PostgreSQL as the only runtime dependency and no outbound callback before the response returns.

Data Stays On Your Network

A compliance officer asks where every prompt, every tool call, and every model response physically resides. In most AI deployments the honest answer is "I do not know, because the vendor's sub-processor list is twelve names long." When the binary runs on your hardware and connects only to your Postgres, the answer is one address. There is no compiled-in callback to a systemprompt.io endpoint, no anonymous telemetry ping, no authentication round-trip to our service. A CISO can confirm this with a packet capture. The binary's outbound connections are the Postgres socket and the AI providers configured in the profile, nothing else.

The audit trail lives next to the data. Every tool call writes a row into mcp_tool_executions with the tool name, server, input, output, status, user_id, session_id, and trace_id, so one SQL query links a model response back to the identity that authorised it. Per-request cost in microdollars, token counts, model, and provider land in the ai_requests schema. The trace module stitches identity, permission decision, tool call, and model request into one trace_id per request. An auditor runs a SQL query against your Postgres. No API key to a SaaS audit service sits in the loop.

Retention and export stay under customer control. Skills, configuration, and profile data export as Markdown with YAML frontmatter through a file-based sync pipeline, so a compliance team can diff them in their own git. Anonymous session data is purged on a schedule by a job that runs in the same binary. Session retention is governed by the session repository. The binary, the extension crates, and the database schema are the complete artifact. No vendor sits in the loop after handover.

  • No Outbound Telemetry — The binary connects only to the Postgres configured in the profile and to the AI providers configured in the profile. A CISO confirms the data boundary with tcpdump rather than a vendor statement.
  • audit_events In Your Postgres — The mcp_tool_executions table stores tool call input, output, status, user_id, session_id, and trace_id in the database your DBA already owns. A compliance query is a SELECT, not a support ticket.
  • Markdown Export, Git-Diffable — Skills export as Markdown with YAML frontmatter so a compliance team can diff configuration in git. Anonymous-user cleanup runs as a scheduled job inside the same binary. Session retention lives in the session repository. No third-party processor appears in the export path.
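The audit row described above can be pictured as a plain struct. This is an illustrative sketch only: the field names follow the columns named in the prose, but the types and the exact schema are assumptions, not the product's actual definitions.

```rust
// Hypothetical mirror of the mcp_tool_executions columns described above.
// Field names and types are assumptions for illustration.
struct McpToolExecution {
    tool_name: String,
    server: String,
    input: String,
    output: String,
    status: String,
    user_id: String,
    session_id: String,
    trace_id: String,
}

// The join the prose describes: one trace_id links a tool call back to
// the identity that authorised it.
fn audit_line(row: &McpToolExecution) -> String {
    format!("{} {} by {} ({})", row.trace_id, row.tool_name, row.user_id, row.status)
}

fn main() {
    let row = McpToolExecution {
        tool_name: "query_orders".into(),
        server: "postgres-mcp".into(),
        input: "{}".into(),
        output: "{}".into(),
        status: "success".into(),
        user_id: "u_123".into(),
        session_id: "s_456".into(),
        trace_id: "t_789".into(),
    };
    println!("{}", audit_line(&row));
}
```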

Six Services Collapsed Into One Binary

A platform engineer rolling out Claude across teams usually wires six things together by hand. An OAuth server, a permission layer, a host that supervises agent processes, a secret store that keeps API keys out of model prompts, an audit pipeline a SOC 2 auditor will accept, and a way to ship reusable skills without asking each user to clone a repo. Six vendors, six failure modes, six upgrade cycles, six processor contracts the security team has to review. systemprompt.io compiles all six into one Rust binary, so the control plane is one artifact to monitor, one process to patch, one contract to review.

The binary bundles identity, permission enforcement, agent lifecycle supervision, server-side secret injection, audit persistence, and skill distribution. Identity is an OAuth2 server with PKCE and WebAuthn passkeys. Permission enforcement is a single check that runs before every governed handler. Agent lifecycle supervision is a reconciler that drives running processes to the configured state. Server-side secret injection sets credentials on the subprocess environment before the tool process spawns, so they never appear in a prompt. Audit persistence writes every tool call to Postgres with its identity and trace_id. Skill distribution emits versioned Markdown with YAML frontmatter. Each subsystem is named inline in the bullets below.
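The reconciler pattern described above can be sketched in a few lines. This is a minimal illustration, not the actual supervision code: the state and action names are assumptions.

```rust
// Minimal sketch of one reconcile step: compare the configured state to
// the actual process state and pick a corrective action. Names are
// illustrative, not the product's real types.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ProcState { Stopped, Running }

#[derive(Debug, PartialEq)]
enum Action { Spawn, Kill, None }

fn reconcile(configured: ProcState, actual: ProcState) -> Action {
    match (configured, actual) {
        (ProcState::Running, ProcState::Stopped) => Action::Spawn,
        (ProcState::Stopped, ProcState::Running) => Action::Kill,
        _ => Action::None,
    }
}

fn main() {
    // A crashed MCP server (configured Running, actually Stopped) is
    // respawned by the same binary, not by an external restart controller.
    assert_eq!(reconcile(ProcState::Running, ProcState::Stopped), Action::Spawn);
}
```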

One release build against a customer fork produces one artifact. The same artifact runs on a laptop for a developer smoke test, in a Kubernetes pod for staging, and inside an air-gapped subnet for production. The CTO's "what am I actually self-hosting" question resolves to one answer, the whole stack. No sidecar, no outbound call to a vendor service, no second daemon to upgrade in lockstep.

  • Permission Middleware — A single check validates the JWT, reads user claims, compares OAuth2 scopes against the required permission, and returns 403 before any handler runs. Without this surface, every team wires its own permission layer and the security review has to audit six of them.
  • In-Process Agent Supervision — The agent orchestration module runs process supervision in-process. A reconciler drives actual process state to the configured state, a port manager assigns isolated ports, a monitor health-checks each subprocess, and an event bus emits lifecycle events. A crashed MCP server is picked up by the same binary, not by a separate pod-restart controller.
  • Subprocess Secret Injection — Server-side secret injection merges per-profile credentials onto the MCP subprocess environment before spawn, so the model receives the tool result, never the credential. Without this path, a key leaks into a prompt or a completion and appears in a conversation history row.
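The secret-injection path above reduces to a standard-library pattern: set the credential on the child's environment before spawn. A minimal sketch, assuming a POSIX shell and an illustrative variable name `API_KEY`; the real injection code and variable names are not shown here.

```rust
use std::process::Command;

// Sketch of server-side secret injection: the credential is set on the
// subprocess environment before spawn, so it never passes through a
// prompt or a completion. API_KEY is an illustrative name.
fn spawn_with_secret(secret: &str) -> String {
    let out = Command::new("sh")
        .arg("-c")
        .arg("printf %s \"$API_KEY\"")
        .env("API_KEY", secret) // injected server-side, invisible to the model
        .output()
        .expect("spawn failed");
    String::from_utf8_lossy(&out.stdout).into_owned()
}

fn main() {
    // The subprocess sees the credential; the conversation history never does.
    assert_eq!(spawn_with_secret("sk-test"), "sk-test");
}
```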

Federate Or Run The Built-Ins

The security team has already bought identity, secrets, and a SIEM. They will not approve a deployment that asks them to rip any of it out. systemprompt.io does not require that. The Extension trait exposes override points for AI providers, background jobs, database schemas, HTTP routes, and tool providers, so a team keeps the vendors the security review has already cleared and drops systemprompt.io in as the governance layer. What an extension does not override stays populated by the built-ins.

Integrate. Identity is not an extension override. It is OAuth2 federation, configured at runtime. Point the binary at Okta or Auth0 as the issuer and the external JWT flows into the permission check unchanged. Forward audit events out to Datadog or Splunk with a scheduled job that reads the audit schema and pushes, or with a lifecycle hook handler. Pull credentials from Vault the same way, through a job implementation that writes into the server-side secret store. Identity, secrets, and telemetry stay in the systems the security team already audits. The permission check, the rate limits, and the lifecycle hooks still run on every request inside the same process, so the governance plane does not move.

Replace. For air-gapped or greenfield deployments, the binary ships the built-ins and runs with PostgreSQL as the only dependency. OAuth2 with PKCE, JWT generation, WebAuthn passkeys, profile-defined rate limits, and per-request cost attribution in microdollars all compile in. One artifact, one database, zero outbound calls. That resolves the staff engineer's "how do I run this in an air-gapped VM" question without adding a second process.

  • Federated Or Built-In Identity — Federate to Okta or Auth0 over OAuth2, or run the built-in server with PKCE and WebAuthn passkeys. Either way the JWT lands in the same permission check before the handler runs, so the governance plane is identical across both paths.
  • Forward Or Self-Host Secrets And Telemetry — Pull credentials from Vault or AWS Secrets Manager via a job implementation, or use server-side secret injection. Forward audit events through a scheduled job or a lifecycle hook handler, or query the audit schema directly in your Postgres. The existing SIEM contract stays intact.
  • No Bypass Path — Every governed handler passes through the same permission check and the same rate-limit multipliers regardless of identity or telemetry path. No opt-out flag lets a team skip enforcement to meet a deadline, so the audit surface is uniform across deployments.

Named Boundary Surfaces

A staff engineer reading the architecture wants to know exactly where the binary ends and the rest of the world begins, because that boundary is where the security review ends. The boundary is concrete. AI model calls, identity providers, SIEM sinks, MCP-speaking agents, and operator browsers sit outside. Everything inside the binary terminates at one of four named surfaces, each described in turn below.

AI providers. The provider factory routes to any upstream you operate through a custom endpoint, including a self-hosted inference cluster or vLLM gateway. For commodity providers (Anthropic, OpenAI, Gemini, Bedrock), the factory matches a provider name string to a concrete implementation. Cost is attributed per request in microdollars, so a finance review of AI spend is a SQL query on your own database rather than an invoice reconciliation.
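The name-to-implementation routing can be sketched as a match, with the custom endpoint taking precedence. Variant names and strings here are assumptions for illustration, not the factory's actual code.

```rust
// Illustrative sketch of provider routing: a custom endpoint wins, then
// commodity provider names map to concrete implementations.
#[derive(Debug, PartialEq)]
enum Provider {
    Anthropic,
    OpenAi,
    Gemini,
    Custom(String), // self-hosted vLLM, Ollama, private inference cluster
}

fn route(name: &str, custom_endpoint: Option<&str>) -> Option<Provider> {
    if let Some(url) = custom_endpoint {
        return Some(Provider::Custom(url.to_string()));
    }
    match name {
        "anthropic" => Some(Provider::Anthropic),
        "openai" => Some(Provider::OpenAi),
        "gemini" => Some(Provider::Gemini),
        _ => None,
    }
}

fn main() {
    assert_eq!(route("anthropic", None), Some(Provider::Anthropic));
}
```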

Identity. The OAuth2 discovery document is served from /.well-known/oauth-authorization-server, the path OIDC clients already probe. Federated tokens are validated by the same permission check the built-in OAuth server uses, so a security team reviews one code path whether identity is local or federated.

Telemetry. Analytics events implement a stable JSON shape through a server-sent-event serialisation trait, which Splunk, Datadog, or ELK ingest without a custom parser. A SIEM search by user_id or session_id is one query, not a regex over free-form strings, so the audit workload your team already runs extends to AI agent activity without a glue service.
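A "stable JSON shape" means flat, fixed field names a SIEM can index directly. A hand-rolled sketch under that assumption; the field set and the product's serialisation trait are not shown here.

```rust
// Illustrative flat event shape: fixed keys mean a SIEM search by
// user_id or session_id is one indexed query, not a regex. Field
// names are assumptions for illustration.
fn to_siem_json(user_id: &str, session_id: &str, event: &str) -> String {
    format!(
        "{{\"user_id\":\"{}\",\"session_id\":\"{}\",\"event\":\"{}\"}}",
        user_id, session_id, event
    )
}

fn main() {
    println!("{}", to_siem_json("u_1", "s_1", "tool_call"));
}
```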

Agents. Claude Desktop, Claude Code, and any MCP client connect to the binary as tool consumers. Per-server health checks run under the MCP monitoring module and agent-to-agent traffic terminates in the A2A server. Every tool call, whichever client originated it, hits the same permission middleware, so a leaked admin token in a CLI is blocked at the same line as a leaked token in a browser.

  • Self-Hosted Upstream, Then Commodity Providers — The provider factory accepts a custom endpoint for any upstream you operate (self-hosted vLLM, Ollama, or a private inference cluster), and also maps names to Anthropic, OpenAI, and Gemini. Cost lands in microdollars per request in your Postgres, not in a vendor invoice.
  • One Identity Path, Local Or Federated — The OAuth2 discovery document is served from the standard well-known endpoint. Federated tokens flow into the same permission check the built-in OAuth server uses, so identity integration does not fork the governance code path.
  • Stable JSON For SIEM — Analytics events serialise to a stable JSON shape that Splunk, Datadog, and ELK ingest without custom glue. MCP server health checks and every tool call pass through the same permission middleware, so telemetry and enforcement share one schema.

In-Process Permission Middleware

A security lead does not ask "is there governance". They ask "what code, in what process, on what line, decides whether a write tool against a customer database runs". In this binary the answer is a single function in the MCP permission middleware. It validates the JWT, extracts the user claims, compares OAuth2 scopes against the required permission, and returns an error before any handler runs. Nothing reaches the tool runtime around it, so a CISO can cite one line in an audit instead of stitching logs across a proxy and a sidecar.
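The shape of that single function can be sketched in a few lines. A minimal sketch, assuming the JWT has already been validated and its claims decoded; the scope strings, error type, and names are illustrative, not the middleware's real API.

```rust
// Hypothetical reduction of the permission check: the required scope
// must appear in the caller's claims or the request stops here.
#[derive(Debug, PartialEq)]
enum Denied { MissingScope }

struct Claims { scopes: Vec<String> }

// Returns before any handler runs; a mismatch surfaces as 403.
fn check_permission(claims: &Claims, required: &str) -> Result<(), Denied> {
    if claims.scopes.iter().any(|s| s == required) {
        Ok(())
    } else {
        Err(Denied::MissingScope)
    }
}

fn main() {
    let claims = Claims { scopes: vec!["mcp:read".to_string()] };
    assert!(check_permission(&claims, "mcp:read").is_ok());
}
```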

Two companion surfaces run on the same call path. Rate limits live in a profile YAML file with per-tier multipliers so a tightening response to a live incident is a config push, not a release. The lifecycle hook surface is an enum variant per moment an external action might need to fire. Before a tool runs, after it runs, after it fails, on session start, session end, prompt submit, notification, stop, subagent start, and subagent stop. The variant list is the contract.

Policy values live in YAML profiles the binary reads at startup, so a security team ships a tightened rate limit or a new permission tier without touching Rust and without waiting for a product release. The enforcement code lives in the binary on the same call path as the tool. No sidecar can be misconfigured into a no-op, which resolves the CTO's "does this replace what I'm building" question. The enforcement layer is a single function call away from the tool dispatcher, and that is what a sidecar has to emulate.

  • Permission Check Before Dispatch — The permission middleware validates the JWT, extracts claims, checks audience, and compares OAuth2 scopes to the required permission. A mismatch returns a typed error before any handler runs, so a denied call never reaches the backend. Without this path, a leaked token reaches the tool runtime before anything catches it.
  • Tier Multipliers In YAML — Per-tier rate multipliers live in the profile YAML with a burst multiplier applied on top. A tightening response to a live incident is a config push, not a release. Anonymous callers cannot saturate your backend, and admins get headroom for scripting.
  • Fixed Lifecycle Hook Enum — The hook enum defines the moments an external action can usefully fire. Before and after a tool call, on failure, on session start and end, on prompt submit, on notification, on stop, and on subagent start and stop. A fixed list means matchers compile, not interpret, and the audit path has a finite surface to review.
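A profile fragment for the tier multipliers described above might look like the following. The key names and values are hypothetical, not the product's actual configuration schema.

```yaml
# Hypothetical profile fragment; key names are illustrative.
rate_limits:
  base_requests_per_minute: 60
  burst_multiplier: 2.0      # applied on top of the tier multiplier
  tiers:
    anonymous: 0.25          # cannot saturate the backend
    member: 1.0
    admin: 4.0               # headroom for scripting
```

Tightening any of these values is a config push read at startup, not a Rust change or a release.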

One Extension Trait

An architect drawing the box diagram wants one place where every AI request from every agent crosses a controlled line. That line is the binary, and the contract is a single Rust trait. An extension implements the trait to contribute HTTP routes, background jobs, database schemas, migrations, AI providers, tool providers, page prerenderers, roles, required assets, and configuration sections. What an extension does not override falls through to the in-binary defaults, so no path lets a component sneak in without the trait declaring it.
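The predicate-plus-default shape of that contract can be sketched as follows. Method and type names here are assumptions for illustration, not the crate's actual trait.

```rust
// Illustrative one-trait contract: every surface has a predicate the
// registry can check and a default the extension can leave alone.
trait Extension {
    fn name(&self) -> &'static str;

    // Predicates let the registry skip extensions that contribute
    // nothing in a given domain.
    fn provides_routes(&self) -> bool { false }
    fn routes(&self) -> Vec<String> { Vec::new() }

    fn provides_jobs(&self) -> bool { false }
    fn jobs(&self) -> Vec<String> { Vec::new() }
}

// A hypothetical extension that contributes one background job and
// falls through to the defaults for everything else.
struct AuditForwarder;

impl Extension for AuditForwarder {
    fn name(&self) -> &'static str { "audit-forwarder" }
    fn provides_jobs(&self) -> bool { true }
    fn jobs(&self) -> Vec<String> { vec!["push_audit_events".to_string()] }
}

fn main() {
    let ext = AuditForwarder;
    assert!(!ext.provides_routes()); // default: nothing contributed
    assert!(ext.provides_jobs());
}
```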

Two deployment shapes use the same trait. Run the binary with every default on, and one artifact provides identity, permission enforcement, MCP supervision, secrets, audit, and skill distribution against one Postgres. That is the air-gapped shape a staff engineer cares about, a single binary in a subnet with no internet egress. Override the AI provider routing, the HTTP router, and an analytics job, and the same artifact becomes a thin policy layer that hands identity to Okta and telemetry to Datadog while still running the permission check and the lifecycle hooks on every call. The security review is the same trait in both cases, which is what makes the build-vs-buy answer defensible.

The codebase is exercised by integration tests, load tests, fuzz tests, and benchmarks, so the "is it production-ready" question has a test suite to read rather than a sales claim to trust. The Rust runtime has no garbage collector to schedule around, a real point for latency-sensitive agent calls that belongs on a dedicated benchmark page, not here.

  • Built-In Or Federated, One Trait — Use every built-in (OAuth2, analytics, rate limits), or override AI provider routing, the router, and analytics jobs to forward to Okta and Datadog. Both deployments share the same extension contract, so a security review covers one trait, not two architectures.
  • Tested Under Load, Fuzz, Benchmark — Integration, load, fuzz, and benchmark suites live under the tests crate. A staff engineer reads the failing cases to see what the binary does at the edges, rather than taking a vendor benchmark at face value.
  • Air-Gap Or Thin Policy, Same Artifact — Greenfield air-gapped deployments use the built-ins for identity, audit, and secrets, so one artifact covers the whole stack. Existing stacks override only the surfaces they want to keep, so the binary becomes a policy layer above Okta and Vault. The audit row shape is identical either way.

Compile-In Extensions

The team adopting this does not want a SaaS dashboard with limited webhooks. They want to add a route, a job, a schema, and a custom AI provider, and ship it as part of the same binary the security team already approved. systemprompt.io is a source-level dependency. The integrator adds the crate to Cargo.toml, implements the extension trait, runs a release build, and ships one artifact. Proprietary logic compiles into your binary, not into a third-party service, so source-available licensing covers both the library and the extensions together.

The extension trait exposes methods across every domain. HTTP routes, database schemas, migrations, background jobs, AI model routing, MCP tool providers, static page prerenderers, YAML config namespaces, permission roles, and required static assets. Each method has a predicate so the registry can skip extensions that do not contribute in a given domain, and a default so extensions only write the methods they override. The registry that loads them lives in a typed registry module a reviewer can read end to end in one sitting.

Version pinning is the integrator's call. The artifact compiled today runs the same way next year unless someone bumps the dependency and recompiles, so no forced upgrade can land on a production deployment from a vendor. A staff engineer answering "can I freeze this for a regulated audit cycle" has a one-line answer. Pin the crate version in Cargo.toml.
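In Cargo.toml terms, the freeze is one line. The crate name and version below are placeholders, not the published coordinates.

```toml
# Placeholder crate name and version: pin exactly so the artifact
# compiled today behaves identically across a regulated audit cycle.
[dependencies]
systemprompt = "=1.4.2"
```

The `=` prefix forbids even patch-level drift until someone deliberately bumps the pin and recompiles.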

  • Cargo Dependency, Not SaaS Subscription — systemprompt.io is added as a crate dependency and compiled into your artifact. No upstream service runs the workload, so regulated audit cycles freeze cleanly. Pin the version and the behaviour does not drift.
  • One Trait, Every Override Surface — Routes, schemas, migrations, jobs, AI providers, tool providers, prerenderers, config namespaces, roles, and required assets. Each with a predicate and a default, so an extension only writes what it overrides and the registry skips the rest.
  • Reference Extensions As The Contract — The systemprompt-template repository ships extension crates that exercise the trait against real HTTP routes, jobs, and schemas. A staff engineer reads them to see the contract in use, rather than inferring it from documentation alone.

Founder-led. Self-service first.

No sales team. No demo theatre. The template is free to evaluate — if it solves your problem, we talk.

Who we are

One founder, one binary, full IP ownership. Every line of Rust, every governance rule, every MCP integration — written in-house. Two years of building AI governance infrastructure from first principles. No venture capital dictating roadmap. No advisory board approving features.

How to engage

Compile it. Run it. Read the audit table.

Clone the template, build a release binary against your own fork, and put a real permission check in front of a real Claude agent. The artifact is yours, the database is yours, and the trace_id lands in your Postgres. No key to a SaaS tenant is handed over at any point.