
ANALYTICS & OBSERVABILITY. ONE TRACE ID, ONE AUDIT QUERY.

Every AI request writes a typed row to PostgreSQL with trace_id, user_id, session_id, task_id, tokens, cost in microdollars, latency, model, and agent. One trace_id lookup returns the full request chain. Structured JSON streams to Splunk, Datadog, and Elastic.

Trace ID Lookup

An auditor asks what an agent did for one user last Tuesday. In a stock Claude deployment that question becomes a grep across application logs, an LLM provider dashboard, and an MCP server's stdout, and a week later the answer is still a guess. systemprompt.io collapses it into a single trace_id lookup, so the CISO answering "can I prove this in an audit?" has the query in one line and the row in one table.

A structured log row reaches the database only after identity is bound to it. Every event carries user_id, session_id, trace_id, and where applicable task_id, context_id, and client_id as typed columns. Builder methods attach those ids at construction, so a row that reaches the database without a trace_id is a programming error, not a configuration choice. The struct that enforces this is named in the reference below.
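A minimal sketch of identity-bound construction, assuming a builder whose constructor takes the required ids. The names `Identity`, `LogRow`, and `LogRowBuilder` are hypothetical and are not the platform's actual types; the point is only that a row without a trace_id has no way to be built.

```rust
/// Identity every row must carry. All three fields are required.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub user_id: String,
    pub session_id: String,
    pub trace_id: String,
}

/// A log row as it would be persisted (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct LogRow {
    pub identity: Identity,
    pub task_id: Option<String>,
    pub context_id: Option<String>,
    pub client_id: Option<String>,
    pub message: String,
}

pub struct LogRowBuilder {
    row: LogRow,
}

impl LogRowBuilder {
    /// Required ids are constructor arguments, so omitting one is a compile error.
    pub fn new(identity: Identity, message: &str) -> Self {
        Self {
            row: LogRow {
                identity,
                task_id: None,
                context_id: None,
                client_id: None,
                message: message.to_string(),
            },
        }
    }

    /// Optional ids are attached only where they apply.
    pub fn with_task_id(mut self, id: &str) -> Self {
        self.row.task_id = Some(id.to_string());
        self
    }

    pub fn with_context_id(mut self, id: &str) -> Self {
        self.row.context_id = Some(id.to_string());
        self
    }

    pub fn with_client_id(mut self, id: &str) -> Self {
        self.row.client_id = Some(id.to_string());
        self
    }

    pub fn build(self) -> LogRow {
        self.row
    }
}
```

Making the required ids part of the type, rather than validating them at write time, is what turns a missing trace_id into a programming error instead of a runtime surprise.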

One trace_id resolves the full request lineage in a single call. The trace query service runs parallel lookups across log events, AI request events, MCP execution events, execution step events, and the linked summaries and task id, returning an ordered timeline. Nothing is sampled. Every row for the trace is fetched. An auditor runs one SQL lookup and gets the chain from login to model output, which is the artefact finance, legal, or an external auditor actually asks for.
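The merge step can be sketched as follows, assuming each event source has already returned its rows. The types and the `build_timeline` function are hypothetical, and the parallel database lookups themselves are not modelled here; only the "fetch everything for the trace, then order it" part is.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Log,
    AiRequest,
    McpExecution,
    ExecutionStep,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceEvent {
    pub trace_id: String,
    pub timestamp_ms: u64,
    pub source: Source,
    pub summary: String,
}

/// Collect every event for one trace from several per-source result sets
/// and return them ordered by time. Nothing is dropped or sampled.
pub fn build_timeline(trace_id: &str, sources: &[Vec<TraceEvent>]) -> Vec<TraceEvent> {
    let mut timeline: Vec<TraceEvent> = sources
        .iter()
        .flatten()
        .filter(|e| e.trace_id == trace_id)
        .cloned()
        .collect();
    // Stable sort: events with equal timestamps keep their per-source order.
    timeline.sort_by_key(|e| e.timestamp_ms);
    timeline
}
```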

  • Identity Bound Before Write — An unattributed log row cannot be persisted. Identity (user_id, session_id, trace_id, task_id, context_id, client_id) is set through builder methods at construction, so 'who did this' is never null when an auditor looks.
  • One Lookup, Full Chain — An incident responder has a request id, a task id, or a partial trace prefix and nothing else. The audit-lookup helper resolves from any of those and returns the whole conversation plus every tool invocation, so the 3am question is one query, not a join across five log stores.
  • Finance and Engineering Read One Row — Per-trace cost is aggregated as integer microdollars alongside total tokens, request count, and latency. An exec asking 'what did this agent cost us' and an engineer asking 'how slow was this trace' read the same row, so cost conversations stop turning into reconciliation tickets.

Pre-Tool-Use Hooks

An agent issues a tool call nobody anticipated. By the time it lands in a log, it has already executed against a customer database. The fix is a programmable checkpoint in front of the tool call, not behind it, so the block happens before the model runtime ever sees a result. The staff engineer answering "where do I plug my policy in?" gets one enum with named lifecycle moments, not a post-hoc alerting rule.

Hooks cover the lifecycle moments of the agent runtime: before a tool runs, after it succeeds, on tool failure, at session start and end, on user prompt submit, on notification, on stop, and when a subagent starts or stops. Each moment routes to a list of matchers (the default glob is "*"), and each matcher runs one of three handlers: a shell command, an LLM evaluation, or a delegation to another agent. A pre-tool-use handler that exits non-zero aborts the tool call before dispatch, so a destructive call written into an agent prompt does not reach the backend.
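A rough sketch of the routing described above, under stated assumptions: the variant names in `HookEvent` are illustrative, the glob supports only `*`, and a plain function returning an exit code stands in for all three handler types. None of these names come from the platform itself.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum HookEvent {
    PreToolUse,
    PostToolUse,
    PostToolUseFailure,
    SessionStart,
    SessionEnd,
    UserPromptSubmit,
    Notification,
    Stop,
    SubagentStart,
    SubagentStop,
}

pub struct Matcher {
    pub pattern: String,
    /// Stand-in for a shell command, LLM evaluation, or delegation; returns an exit code.
    pub handler: fn(&str) -> i32,
}

/// Minimal glob: `*` matches any run of characters, everything else is literal.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Any trailing `*` matches the empty string.
    p[pi..].iter().all(|&c| c == '*')
}

/// Run every matcher registered for `event`; the first non-zero exit code denies.
pub fn run_hooks(
    routes: &HashMap<HookEvent, Vec<Matcher>>,
    event: HookEvent,
    subject: &str,
) -> Result<(), i32> {
    if let Some(matchers) = routes.get(&event) {
        for m in matchers {
            if glob_match(&m.pattern, subject) {
                let code = (m.handler)(subject);
                if code != 0 {
                    return Err(code);
                }
            }
        }
    }
    Ok(())
}

/// Example policy: refuse any tool whose name mentions "drop".
pub fn deny_drops(tool: &str) -> i32 {
    if tool.contains("drop") { 2 } else { 0 }
}
```

The caller dispatches the tool only when `run_hooks` returns `Ok(())`, which is what places the check in front of the call rather than behind it.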

Delegation closes the loop. When one agent spawns a child, the parent-child relationship is recorded against the same trace_id, and the trace lookup from the previous section walks the full chain back to the originating user prompt. A CISO asking "which child did what for which user" reads the same single table, not a join across a parent log and a subagent log.
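To illustrate the parent-child walk, here is a small sketch that assumes each agent run records an optional parent run id. The `AgentRun` struct and its `run_id` and `parent_run_id` fields are hypothetical; the source only states that the relationship is recorded against the same trace_id.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AgentRun {
    pub run_id: String,
    pub trace_id: String,
    pub agent: String,
    pub parent_run_id: Option<String>,
}

/// Walk from one run back to the originating run, returning agent names root-first.
pub fn chain_to_root(runs: &[AgentRun], start_run_id: &str) -> Vec<String> {
    let by_id: HashMap<&str, &AgentRun> =
        runs.iter().map(|r| (r.run_id.as_str(), r)).collect();
    let mut chain = Vec::new();
    let mut current = by_id.get(start_run_id).copied();
    // Bound the walk so a malformed parent cycle cannot loop forever.
    let mut remaining = runs.len();
    while let Some(run) = current {
        if remaining == 0 {
            break;
        }
        remaining -= 1;
        chain.push(run.agent.clone());
        current = run
            .parent_run_id
            .as_deref()
            .and_then(|id| by_id.get(id).copied());
    }
    chain.reverse();
    chain
}
```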

  • Named Lifecycle Moments — Session boundaries, tool-call before/after/failure, user prompt submit, notification, stop, subagent start and stop. The named moment is a column in the audit row, so 'why did this fire' is the first column, not a log archaeology dig.
  • Pre-Execution Deny — A pre-tool-use handler that exits non-zero aborts the call before the model runtime sees a result. Three handler types (shell command, LLM evaluation, delegation to another agent) cover block, evaluate, and escalate without recompiling the binary.
  • Subagent Chains Under One Trace — When an agent spawns a child, the child's events carry the parent's trace_id. The audit walks the full chain back to the originating user prompt, so a delegation does not become an unattributed gap in the log.

SIEM-Ready JSON

The SIEM team runs one test before accepting a new audit source. Does the event schema import into Splunk, ELK, Datadog, or Sumo Logic without a custom parser? systemprompt.io emits structured JSON for every event domain, with stable field names like trace_id, user_id, session_id, context_id, client_id, and task_id that a SIEM indexes without a regex extraction pass. A search by user or session is one query, not an ILIKE over free-form messages.

Distribution runs over server-sent events, with one channel per user per connection. The broadcaster holds the connection state in memory, and an automatic cleanup deregisters the subscriber when the client disconnects, so there is no background reaper for stale connections. A short keep-alive keeps idle dashboards open and detects a dropped link on the next tick rather than waiting for a TCP timeout.
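As a sketch of what goes over the wire, the following assumes a flat event with a few of the identity fields and hand-rolled JSON so the example needs no dependencies. `AuditEvent`, `to_json`, `sse_frame`, and `sse_keep_alive` are hypothetical names; a real implementation would more likely use a serialisation library. The framing itself follows the server-sent events format: a `data:` line terminated by a blank line, and comment lines that start with `:`.

```rust
/// Escape the characters JSON requires inside a string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

pub struct AuditEvent<'a> {
    pub trace_id: &'a str,
    pub user_id: &'a str,
    pub session_id: &'a str,
    pub message: &'a str,
}

/// Serialise with fixed field names and a fixed field order.
pub fn to_json(e: &AuditEvent<'_>) -> String {
    format!(
        "{{\"trace_id\":\"{}\",\"user_id\":\"{}\",\"session_id\":\"{}\",\"message\":\"{}\"}}",
        json_escape(e.trace_id),
        json_escape(e.user_id),
        json_escape(e.session_id),
        json_escape(e.message)
    )
}

/// One SSE frame: a `data:` line followed by a blank line.
pub fn sse_frame(json: &str) -> String {
    format!("data: {}\n\n", json)
}

/// SSE comment line used as a keep-alive; clients ignore lines starting with `:`.
pub fn sse_keep_alive() -> String {
    ": keep-alive\n\n".to_string()
}
```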

Anomaly checks sit on the same audit stream. The default metrics are request rate per user, unique sessions per device fingerprint, and error rate, each trended against a rolling average. A value above baseline raises a warning signal; a larger multiple raises a critical one. Warning catches the "something changed" moment, critical is the "page someone now" moment, and both trigger off the same audit rows the CISO already reads. An unauthorised account hammering the inference API at 3am shows up as a row in the anomaly table, not a ticket filed next Tuesday.
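The baseline comparison might look like the sketch below. The multiples are assumptions chosen for illustration (the source does not state the actual thresholds), and `Thresholds`, `rolling_average`, and `classify` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Normal,
    Warning,
    Critical,
}

/// Multiples of the rolling baseline at which each signal fires (assumed values go here).
pub struct Thresholds {
    pub warning_multiple: f64,
    pub critical_multiple: f64,
}

/// Mean of the most recent `window` samples, or of all samples if fewer exist.
pub fn rolling_average(samples: &[f64], window: usize) -> Option<f64> {
    if samples.is_empty() || window == 0 {
        return None;
    }
    let start = samples.len().saturating_sub(window);
    let recent = &samples[start..];
    Some(recent.iter().sum::<f64>() / recent.len() as f64)
}

/// Compare the current value to the baseline and pick a severity.
pub fn classify(current: f64, baseline: f64, t: &Thresholds) -> Severity {
    // With no usable baseline there is nothing to compare against.
    if baseline <= 0.0 {
        return Severity::Normal;
    }
    let ratio = current / baseline;
    if ratio >= t.critical_multiple {
        Severity::Critical
    } else if ratio >= t.warning_multiple {
        Severity::Warning
    } else {
        Severity::Normal
    }
}
```

For example, with an assumed warning multiple of 2.0 and critical multiple of 5.0, a user whose baseline is 10 requests per minute raises a warning at 25 and a critical signal at 60.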

  • Stable JSON, No Parser — Event domains serialise to SSE-compatible JSON with fixed identity fields (trace_id, user_id, session_id, context_id, client_id, task_id). Splunk, ELK, Datadog, and Sumo Logic ingest the stream directly, so the SIEM search a CISO runs works the day the log pipeline is connected.
  • Live Stream, No Reaper — Per-user SSE connections with automatic deregistration on disconnect and a short keep-alive, sized to catch a dropped dashboard on the next heartbeat without chatting over the wire every second. Broadcasters route agent-UI, agent-to-agent, context, and analytics events to the consumers that need them.
  • Thresholds Sized Like a Runbook — Warning and critical thresholds on request rate, sessions per fingerprint, and error rate. The alert is a row in the audit table, not a separate telemetry pipeline.

Microdollar Cost Ledger

Finance asks why this month's Claude spend doubled. Without typed attribution, the answer is a ticket and a week of CSV merging. The CTO answering "can I show the board spend broken by team, tool, and model without a spreadsheet?" should run one SQL query against one table. systemprompt.io's cost repository reads exactly that table.

Every cost method reads from the same source. A summary method returns total request count, total cost, and total tokens for a window. Per-model and per-provider breakdowns group by the dimension an exec actually asks about. A per-agent breakdown joins the request log to the task log on task_id, so every dollar resolves to a named agent, not a category bucket. A time-series method returns the raw points a chart renders. Costs are stored as integer microdollars to avoid floating-point drift across aggregation and presentation boundaries.
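A small sketch of integer-microdollar aggregation, assuming an in-memory slice of request rows in place of the SQL table. `AiRequest`, `CostSummary`, `summarize`, `by_model`, and `format_dollars` are hypothetical names. One microdollar is one millionth of a dollar, so sums stay exact and conversion to a decimal string happens only at presentation.

```rust
use std::collections::BTreeMap;

pub struct AiRequest {
    pub model: String,
    pub cost_microdollars: i64,
    pub tokens: i64,
}

#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct CostSummary {
    pub request_count: i64,
    pub total_cost_microdollars: i64,
    pub total_tokens: i64,
}

/// Totals for a set of requests, using integer arithmetic only.
pub fn summarize(requests: &[AiRequest]) -> CostSummary {
    let mut s = CostSummary::default();
    for r in requests {
        s.request_count += 1;
        s.total_cost_microdollars += r.cost_microdollars;
        s.total_tokens += r.tokens;
    }
    s
}

/// The same totals grouped by model name.
pub fn by_model(requests: &[AiRequest]) -> BTreeMap<String, CostSummary> {
    let mut out: BTreeMap<String, CostSummary> = BTreeMap::new();
    for r in requests {
        let s = out.entry(r.model.clone()).or_default();
        s.request_count += 1;
        s.total_cost_microdollars += r.cost_microdollars;
        s.total_tokens += r.tokens;
    }
    out
}

/// Render microdollars as a dollar string without passing through a float.
pub fn format_dollars(microdollars: i64) -> String {
    let sign = if microdollars < 0 { "-" } else { "" };
    let abs = microdollars.unsigned_abs();
    format!("{}${}.{:06}", sign, abs / 1_000_000, abs % 1_000_000)
}
```

The drift this avoids is easy to reproduce: in 64-bit floating point, 0.1 + 0.2 does not compare equal to 0.3, whereas 100_000 + 200_000 microdollars is exactly 300_000.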

The wider dashboard reads the same numbers. One query returns platform metrics for the overview ribbon (users, sessions, contexts, tasks, and request counts), and rolling windows across 24h, 7d, and 30d sit beside total cost and average cost per request. A tool ranking is a second read against the tool executions table, ordered by execution count, success rate, latency, or last-used. The reliability question and the spend question hit the same rows, so an engineer and a CFO stop arguing about definitions.

  • Model, Provider, and Agent in One Table — Per-model and per-provider breakdowns name cost, request count, and token totals. The per-agent breakdown joins the request log to the task log on task_id, so 'which agent ran up this spend' resolves to a named agent, not a category.
  • Rolling Windows — Rolling 24-hour, 7-day, and 30-day cost windows sit beside total cost and average cost per request. The board-meeting question ('what did we spend last month on AI, and how is it trending?') is one read, not a finance reconciliation project.
  • Tool Rankings, Reliability and Spend Together — Tools rank by execution count, success rate, average time, and last used. The same ranking answers 'which tool is cheapest to run' and 'which tool is failing most', so a reliability fix and a spend fix stop living in two different dashboards.

Admin Dashboard

An operator opening the dashboard at the start of a shift wants three answers: who is using AI, how the system is behaving, and what changed since yesterday. In most AI stacks those answers live in three tools owned by three teams. systemprompt.io collapses them into one query set against one database, so the 9am conversation stops being a reconciliation meeting.

The overview ribbon reads one method that returns total users, active users in the last 24 hours and 7 days, total and active sessions, total contexts, total tasks, and total AI requests. An activity-trend method returns a daily time series across sessions, contexts, tasks, AI requests, and tool executions over a configurable window. The operator answering "what changed since yesterday?" reads one chart backed by one query, and the number on the chart is the same number the CISO queries in the audit table.

User-side behaviour sits on the same schema. Named event categories cover page view, page exit, link click, scroll, engagement, and conversion, plus an escape hatch for in-house events, mapped into navigation, interaction, engagement, and conversion buckets. An engagement row persists identity and timestamps alongside behavioural fields a product team actually uses: time on page, scroll depth, scroll velocity, focus time, rage-click and dead-click flags, and a coarse reading-pattern label. The dashboard updates over SSE with the same heartbeat described in the SIEM section, and leaderboards rank top users, agents, and tools so "who is most active today" is one read, not a stitched report.
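The category-to-bucket mapping could be modelled as below. The enum names are hypothetical, and the assignment of each category to a bucket is an assumption made for illustration (in particular, where custom in-house events land); the source names the categories and the buckets but not the exact mapping.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EventCategory {
    PageView,
    PageExit,
    LinkClick,
    Scroll,
    Engagement,
    Conversion,
    /// Escape hatch for in-house events.
    Custom(String),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Bucket {
    Navigation,
    Interaction,
    Engagement,
    Conversion,
}

/// Map a named category to its reporting bucket (assumed mapping).
pub fn bucket(category: &EventCategory) -> Bucket {
    match category {
        EventCategory::PageView | EventCategory::PageExit => Bucket::Navigation,
        EventCategory::LinkClick | EventCategory::Scroll => Bucket::Interaction,
        EventCategory::Engagement => Bucket::Engagement,
        EventCategory::Conversion => Bucket::Conversion,
        // Assumption: unrecognised in-house events count as interactions.
        EventCategory::Custom(_) => Bucket::Interaction,
    }
}
```

Because the match is exhaustive, adding a new category forces a decision about its bucket at compile time.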

  • One Overview, Not Three Dashboards — The overview ribbon and the daily trend share one read per dashboard open, so the operator's 'who is using AI and how is it behaving' question is answered before the page finishes loading, not after a background job.
  • Behaviour in Typed Columns — Engagement events persist time on page, scroll depth and velocity, focus time, rage-click and dead-click flags, plus a reading-pattern label. The product team answering 'where are users bouncing' reads typed columns, not an event blob it has to parse.
  • Leaderboards and Trend Arrows — Top users, agents, and tools rank by activity. A trend helper computes current-versus-previous counts across 24h, 7d, and 30d so the dashboard shows up and down arrows without a second query. 'Who spiked today' is one read, not a cross-dashboard comparison.
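A trend helper of the kind described in the last bullet can be sketched as a single pass over event timestamps, which is what lets the arrow come from the same read as the count. `Trend`, `Direction`, and `trend` are hypothetical names, and timestamps are assumed to be epoch seconds.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Up,
    Down,
    Flat,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Trend {
    pub current: u64,
    pub previous: u64,
    pub direction: Direction,
}

/// Count events in [now - window, now) and in the window immediately before it.
pub fn trend(timestamps: &[u64], now: u64, window_secs: u64) -> Trend {
    let current_start = now.saturating_sub(window_secs);
    let previous_start = current_start.saturating_sub(window_secs);
    let mut current = 0u64;
    let mut previous = 0u64;
    for &ts in timestamps {
        if ts >= current_start && ts < now {
            current += 1;
        } else if ts >= previous_start && ts < current_start {
            previous += 1;
        }
    }
    let direction = if current > previous {
        Direction::Up
    } else if current < previous {
        Direction::Down
    } else {
        Direction::Flat
    };
    Trend { current, previous, direction }
}
```

Calling it with 86_400, 604_800, and 2_592_000 seconds gives the 24h, 7d, and 30d comparisons.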

CLI Analytics

The 3am incident question is rarely "what does the dashboard show." It is "give me the raw rows for trace abc123 right now, and pipe it through jq." A platform engineer wants a CLI that hits the same database the dashboard uses, not a second query path that could disagree. The CLI here runs the same SQL methods the dashboard does, so an incident responder and an operator never read different numbers.

Analytics subcommands cover the daily surface: overview, conversations, agents, tools, requests, sessions, content, traffic, and costs. Each subcommand offers both a remote HTTP mode and a local database-context mode, so the same binary runs against staging, production, or a local Postgres dump on a laptop. The underlying trace service exposes list and search helpers with typed filters for agent, status, tool, server, level, and since, so a pattern search or a time-bounded scan is one flag, not a recompile.
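Typed filters of this kind might be modelled as a struct of optional fields, where an unset field matches everything. The sketch below assumes in-memory rows and hypothetical names (`TraceRow`, `TraceFilter`, `search`); it does not reproduce the actual CLI flags or the SQL the service runs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Debug,
    Info,
    Warn,
    Error,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceRow {
    pub agent: String,
    pub status: String,
    pub tool: Option<String>,
    pub server: Option<String>,
    pub level: Level,
    pub timestamp: u64,
}

/// Every field is optional; `None` means "do not filter on this".
#[derive(Debug, Default, Clone)]
pub struct TraceFilter {
    pub agent: Option<String>,
    pub status: Option<String>,
    pub tool: Option<String>,
    pub server: Option<String>,
    pub level: Option<Level>,
    /// Keep rows at or after this epoch-seconds timestamp.
    pub since: Option<u64>,
}

impl TraceFilter {
    pub fn matches(&self, row: &TraceRow) -> bool {
        self.agent.as_ref().map_or(true, |a| *a == row.agent)
            && self.status.as_ref().map_or(true, |s| *s == row.status)
            && self.tool.as_ref().map_or(true, |t| row.tool.as_ref() == Some(t))
            && self.server.as_ref().map_or(true, |s| row.server.as_ref() == Some(s))
            && self.level.map_or(true, |l| row.level == l)
            && self.since.map_or(true, |s| row.timestamp >= s)
    }
}

/// Return references to the rows that satisfy every set filter.
pub fn search<'a>(rows: &'a [TraceRow], filter: &TraceFilter) -> Vec<&'a TraceRow> {
    rows.iter().filter(|r| filter.matches(r)).collect()
}
```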

Audit work uses the same lookups. One call resolves a request from a request id, task id, or trace prefix. Another returns the full conversation. A third returns every tool invocation. A fourth joins tool invocations to the MCP executions they ran against. A retention runner handles cleanup with tiered policies so hot logs age out faster than error logs, and the debug, info, warn, and error windows are configurable per deployment. A scheduled vacuum keeps the index small enough that the 3am query still returns in seconds.
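The tiered retention rule can be sketched as a per-level window, assuming timestamps in epoch seconds. `RetentionPolicy` and its fields are hypothetical, and any day counts passed in are deployment choices rather than documented defaults.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Debug,
    Info,
    Warn,
    Error,
}

/// Retention window per log level, in days.
pub struct RetentionPolicy {
    pub debug_days: u64,
    pub info_days: u64,
    pub warn_days: u64,
    pub error_days: u64,
}

impl RetentionPolicy {
    pub fn days_for(&self, level: Level) -> u64 {
        match level {
            Level::Debug => self.debug_days,
            Level::Info => self.info_days,
            Level::Warn => self.warn_days,
            Level::Error => self.error_days,
        }
    }

    /// A row is eligible for cleanup once it is older than its level's window.
    pub fn is_expired(&self, level: Level, created_at_secs: u64, now_secs: u64) -> bool {
        let age_secs = now_secs.saturating_sub(created_at_secs);
        age_secs > self.days_for(level) * 86_400
    }
}
```

With an example policy of 1, 7, 30, and 90 days, a three-day-old debug row is eligible for cleanup while a three-day-old error row is kept.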

  • Same Binary, Local or Remote — Subcommands cover overview, conversations, agents, tools, requests, sessions, content, traffic, costs. Each runs over the network or directly against a local database context, so a Postgres dump from prod is debuggable on a laptop without a second tool.
  • Typed Filters — List and search helpers take typed filters for agent, status, tool, server, level, and since. An incident responder types one flag instead of chaining pipes, and the CLI hits the same query the SIEM does.
  • Audit Lookup From the Shell — One helper resolves from request_id, task_id, or trace prefix. Three more return the full conversation, every tool call, and the linked MCP executions. The auditor answering 'what did this trace do' runs a handful of commands, not a browser session.
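A resolver that accepts any of the three identifiers could look like the sketch below. It assumes in-memory records and a hypothetical `resolve_trace_id` function; the choice to reject a prefix that matches more than one trace is a design assumption, not documented behaviour.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RequestRecord {
    pub request_id: String,
    pub task_id: Option<String>,
    pub trace_id: String,
}

/// Resolve a trace id from a request id, a task id, or an unambiguous trace prefix.
pub fn resolve_trace_id<'a>(records: &'a [RequestRecord], key: &str) -> Option<&'a str> {
    if key.is_empty() {
        return None;
    }
    // Exact matches on request id or task id win first.
    if let Some(r) = records
        .iter()
        .find(|r| r.request_id == key || r.task_id.as_deref() == Some(key))
    {
        return Some(r.trace_id.as_str());
    }
    // Otherwise treat the key as a trace prefix; it must identify exactly one trace.
    let mut matched: Option<&'a str> = None;
    for r in records {
        if r.trace_id.starts_with(key) {
            match matched {
                None => matched = Some(r.trace_id.as_str()),
                Some(t) if t == r.trace_id => {}
                Some(_) => return None, // ambiguous prefix
            }
        }
    }
    matched
}
```

Once the trace id is known, the remaining lookups (conversation, tool invocations, MCP executions) all key off it.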

Founder-led. Self-service first.

No sales team. No demo theatre. The template is free to evaluate — if it solves your problem, we talk.

Who we are

One founder, one binary, full IP ownership. Every line of Rust, every governance rule, every MCP integration — written in-house. Two years of building AI governance infrastructure from first principles. No venture capital dictating roadmap. No advisory board approving features.
