Disclosure: I built systemprompt.io. This guide covers SIEM integration for AI agent governance generally, and I will be specific about how systemprompt.io handles it. Where competitors do things differently or better, I will say so.
Why AI Agents Are a SIEM Blind Spot
Your security operations centre monitors everything. Network traffic. Authentication events. Database queries. File access. API calls. Application logs. Every meaningful action in your infrastructure generates a structured event that flows into your SIEM, gets correlated, and triggers alerts when something looks wrong.
Except for AI agents. AI agents are the newest actors in your infrastructure, and for most enterprises they are completely invisible to the SOC.
An AI agent that can read your database, write to your CRM, send emails on behalf of users, and execute code against production systems is not a chatbot. It is an autonomous actor with real permissions and real consequences. When it calls a tool, that tool call has an identity context, a permission boundary, a set of parameters, and a result. Every one of those data points belongs in your SIEM.
The problem is not that teams do not want to log AI agent activity. It is that the logging infrastructure does not exist yet. Traditional application logging captures HTTP requests and database queries. It does not capture "agent X, operating on behalf of user Y, evaluated policy Z, called tool W with parameters P, and got result R at cost C." That is a fundamentally different kind of event and it requires a fundamentally different schema.
The OWASP Top 10 for Agentic Applications 2026 makes this explicit. Risk ASI09 — Insufficient Logging and Monitoring — calls out the gap directly. If you cannot reconstruct what an AI agent did, who authorised it, what policies were evaluated, and what the outcome was, you do not have governance. You have a chatbot with API keys.
Most enterprises discovered this gap the hard way. An agent made an unexpected API call. Someone asked "what happened?" and the answer was silence. The agent's activity existed in the AI provider's context window, which is ephemeral, and maybe in the provider's usage logs, which tell you token counts but not what tools were called or why.
That is the gap this guide addresses. Not whether to log AI agent activity — that question is settled — but how to get structured, queryable, SIEM-compatible audit events from your AI agent infrastructure into the tools your security team already uses.
What a Complete AI Audit Trail Looks Like
Before you can integrate with a SIEM, you need to define what you are capturing. A complete AI agent audit trail is a five-point trace that follows every action from initiation to outcome.
Point 1: Identity
Who initiated this action? Not the agent — the human. Every AI agent action ultimately traces back to a user. That user has an identity in your IdP, a set of roles, a department, and a risk profile. The audit event must capture the user identifier in a format that matches your identity provider. If your SIEM correlates events by email address, the AI audit event needs that email address. If it correlates by employee ID, you need that.
This sounds obvious, but most AI agent systems do not propagate user identity through tool calls. The agent authenticates with a service account, and tool calls happen under that service account's identity. From the SIEM's perspective, every AI action looks like the same user. You lose the ability to detect "user A is making an unusual number of database queries through the AI agent" because all queries look like they come from ai-service-account.
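A minimal sketch of what identity propagation can look like in a Python service layer (every name here is illustrative, not a real API): the human's identity is captured once at session start and attached to every audit event, so tool calls never collapse into a shared service-account identity.

```python
import contextvars
import json

# Hypothetical sketch: carry the initiating user's identity from the
# session layer down to every tool-call audit event, instead of letting
# all calls appear to come from a single service account.
current_user = contextvars.ContextVar("current_user")

def start_session(user_id: str, email: str) -> None:
    # Called once when the human's session begins.
    current_user.set({"user_id": user_id, "email": email})

def emit_tool_call_event(tool_name: str) -> dict:
    # Every audit event carries the human identity, not the agent's
    # service-account credentials.
    event = {
        "event_type": "tool_call.completed",
        "identity": current_user.get(),
        "tool_call": {"tool_name": tool_name},
    }
    print(json.dumps(event))
    return event

start_session("usr_8f3a2b1c", "j.chen@acme.com")
evt = emit_tool_call_event("database_query")
```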
Point 2: Agent Context
Which AI agent is executing? In a multi-agent system, you may have a sales assistant, a code reviewer, a data analyst, and a support agent all running simultaneously. The audit event needs the agent identifier, its configured role, and the AI model powering it.
Agent context matters for policy evaluation. A sales assistant querying the CRM is expected behaviour. A code reviewer querying the CRM is anomalous. Without agent context in your audit events, you cannot write correlation rules that distinguish between the two.
Point 3: Permissions Evaluated
What policies were checked before the tool executed? This is the governance layer. When an agent requests a tool call, the governance pipeline evaluates the request against configured policies. The audit event should capture which policies were evaluated, whether they passed or failed, and the specific rule that triggered any denial.
This is the data point most governance solutions miss entirely. They log the tool call. They do not log the policy evaluation that preceded it. Without policy evaluation in your audit trail, you cannot answer "was this action authorised?" after the fact. You can only answer "did this action happen?"
Point 4: Tool Execution
What tool was called, with what parameters, and what was the response? This is the core of the audit event. Tool name, input parameters (with sensitive values redacted or hashed), execution duration, response status, and response size.
Input parameter handling requires care. You want enough detail to reconstruct what happened, but you do not want raw database credentials or API keys in your SIEM. Hash sensitive parameters. Redact known secret patterns. Log parameter shapes (which fields were provided) even when you cannot log parameter values.
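The hash-redact-shape approach can be sketched in a few lines of Python. The field names and secret patterns below are illustrative; a real deployment would use a maintained secret-scanning ruleset.

```python
import hashlib
import re

# Illustrative sketch: prepare tool-call parameters for logging.
# Known secret patterns are redacted, designated sensitive fields are
# hashed, and everything else contributes only its field name.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key ID shape
    re.compile(r"ghp_[A-Za-z0-9]{36}"),   # GitHub personal access token shape
]
SENSITIVE_FIELDS = {"password", "api_key", "connection_string"}

def sanitise_parameters(params: dict) -> dict:
    out = {"parameter_fields": sorted(params)}
    for name, value in params.items():
        text = str(value)
        if name in SENSITIVE_FIELDS:
            out[name] = "sha256:" + hashlib.sha256(text.encode()).hexdigest()[:16]
        elif any(p.search(text) for p in SECRET_PATTERNS):
            out[name] = "[REDACTED]"
        # Other values are omitted entirely; only their names are logged.
    return out

safe = sanitise_parameters({
    "query": "SELECT * FROM orders LIMIT 10",
    "password": "hunter2",
    "note": "key is AKIAABCDEFGHIJKLMNOP",
})
```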
Point 5: Result and Cost
What was the outcome, and what did it cost? Success or failure. Error messages if applicable. Token consumption for the AI interaction. Monetary cost if trackable. Duration from request to response.
Cost data in audit events enables a category of alerts that most teams do not think about until they need them. An agent that suddenly costs ten times its daily average is either malfunctioning or being abused. You cannot detect that without cost data in your events.
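A cost-spike check of that kind is simple once the cost field exists in the events. This sketch compares today's spend per agent against a trailing daily average; the ten-times multiplier and agent names are illustrative.

```python
# Minimal sketch of a cost-spike check over per-agent daily spend.
def cost_alerts(daily_costs: dict[str, list[float]],
                today: dict[str, float],
                multiplier: float = 10.0) -> list[str]:
    alerts = []
    for agent, history in daily_costs.items():
        baseline = sum(history) / len(history)   # trailing daily average
        if today.get(agent, 0.0) > multiplier * baseline:
            alerts.append(agent)
    return alerts

history = {"agt_code_reviewer": [1.10, 0.95, 1.05],
           "agt_sales": [4.0, 3.8, 4.2]}
spiking = cost_alerts(history, {"agt_code_reviewer": 14.50, "agt_sales": 4.1})
```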
Event Schema Design
Abstract descriptions of audit trails are useful for planning. Concrete schemas are useful for implementation. Here is a real structured event that captures all five points of the trace.
{
  "event_id": "evt_01JR9X5K7M3N2P4Q6R8S0T",
  "event_type": "tool_call.completed",
  "timestamp": "2026-04-07T14:32:18.847Z",
  "schema_version": "1.2.0",
  "identity": {
    "user_id": "usr_8f3a2b1c",
    "email": "j.chen@acme.com",
    "department": "engineering",
    "idp_subject": "auth0|8f3a2b1c4d5e6f7g"
  },
  "agent": {
    "agent_id": "agt_code_reviewer",
    "agent_name": "Code Reviewer",
    "model": "claude-sonnet-4-6",
    "session_id": "ses_7k2m9n4p"
  },
  "governance": {
    "policies_evaluated": ["allow_read_only_tools", "department_scope_check"],
    "result": "allowed",
    "matched_rule": null,
    "evaluation_ms": 2
  },
  "tool_call": {
    "tool_name": "database_query",
    "tool_server": "postgres-readonly",
    "parameters_hash": "sha256:a1b2c3d4e5f6...",
    "parameter_fields": ["query", "schema", "limit"],
    "duration_ms": 145,
    "response_status": "success",
    "response_size_bytes": 2340
  },
  "cost": {
    "input_tokens": 1250,
    "output_tokens": 340,
    "estimated_cost_usd": 0.0089
  }
}
Let me walk through the design decisions.
event_id uses a ULID-style identifier, sortable by time. This matters for SIEM correlation: version 4 UUIDs are random and unsortable, while ULIDs give you time-ordering without a separate index.
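The time-sortable property is easy to demonstrate. This is a minimal stdlib sketch of a ULID-style identifier — a 48-bit millisecond timestamp followed by 80 random bits, Crockford base32 encoded — not a full ULID implementation, but it shows the property that matters: lexicographic order follows creation time.

```python
import os
import time

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford base32

def encode(value: int, length: int) -> str:
    # Fixed-width base32 encoding; larger values sort later.
    chars = []
    for _ in range(length):
        chars.append(ALPHABET[value & 31])
        value >>= 5
    return "".join(reversed(chars))

def new_event_id() -> str:
    ts = int(time.time() * 1000)                  # millisecond timestamp -> 10 chars
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits       -> 16 chars
    return "evt_" + encode(ts, 10) + encode(rand, 16)

a = new_event_id()
time.sleep(0.005)
b = new_event_id()   # created later, so it sorts later
```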
event_type follows a dot-separated convention: tool_call.completed, tool_call.blocked, policy.violation, secret.detected, session.started. This lets you filter by category in your SIEM without parsing the event body.
schema_version is non-negotiable. Your SIEM dashboards, correlation rules, and alerts all depend on field names and types. When the schema changes, your integrations break. Versioning the schema means your SIEM team can test against new versions before they hit production.
identity maps to your IdP. The idp_subject field is the raw subject claim from your identity provider. This is the join key between AI audit events and everything else in your SIEM.
governance captures the policy evaluation, not just the outcome. policies_evaluated tells you which rules were checked. matched_rule tells you which specific rule caused a block, if any. evaluation_ms tells you whether your governance pipeline is adding latency.
tool_call.parameters_hash is a SHA-256 hash of the parameters. This lets you detect identical tool calls (potential replay attacks or loops) without storing sensitive parameter values. parameter_fields logs which fields were provided without logging their values.
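For the hash to be useful as a dedup key, the parameters must be serialised canonically — otherwise the same call logged with keys in a different order produces a different hash. A sketch of one way to do this:

```python
import hashlib
import json

# Illustrative sketch: hash the canonical JSON form of the parameters so
# identical calls produce identical hashes regardless of key order.
def parameters_hash(params: dict) -> str:
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

h1 = parameters_hash({"query": "SELECT 1", "limit": 10})
h2 = parameters_hash({"limit": 10, "query": "SELECT 1"})  # same call, reordered
h3 = parameters_hash({"query": "SELECT 2", "limit": 10})  # different call
```

With canonical hashing in place, replay and loop detection reduces to counting repeated hash values within a time window in your SIEM.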
cost enables budget-based alerting. This is operational cost tracking, not model billing.
Why Typed Fields Matter
Every field in this schema has a defined type. timestamp is ISO 8601. duration_ms is an integer. estimated_cost_usd is a float. policies_evaluated is an array of strings.
This matters because SIEM platforms index fields by type. If duration_ms arrives as a string in one event and an integer in another, your Splunk field extraction breaks. Your Kibana visualisations show errors. Your Datadog monitors stop alerting.
Typed fields also enable meaningful aggregation. You can compute average evaluation_ms across all events to monitor governance pipeline performance. You can sum estimated_cost_usd by agent to track operational budgets. You can count events by event_type to detect volume anomalies. None of this works with untyped or inconsistently typed fields.
Define your schema once. Validate it at emission time. Never emit an event that violates the schema. This is the single most important implementation decision for SIEM integration.
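Emission-time validation does not need to be elaborate. The sketch below checks a flat map of required field paths against expected types before an event is emitted; a production system would more likely use a JSON Schema validator, but the principle is identical.

```python
# Required field paths and their expected types (subset of the full schema).
REQUIRED = {
    ("event_id",): str,
    ("event_type",): str,
    ("schema_version",): str,
    ("identity", "email"): str,
    ("governance", "evaluation_ms"): int,
    ("tool_call", "duration_ms"): int,
    ("cost", "estimated_cost_usd"): float,
}

def validate(event: dict) -> list[str]:
    # Returns a list of violations; an empty list means the event may be emitted.
    errors = []
    for path, expected in REQUIRED.items():
        node = event
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
        if not isinstance(node, expected):
            errors.append(f"{'.'.join(path)}: expected {expected.__name__}")
    return errors

good = {"event_id": "evt_01", "event_type": "tool_call.completed",
        "schema_version": "1.2.0", "identity": {"email": "j.chen@acme.com"},
        "governance": {"evaluation_ms": 2}, "tool_call": {"duration_ms": 145},
        "cost": {"estimated_cost_usd": 0.0089}}
bad = dict(good, tool_call={"duration_ms": "145"})  # string where an integer belongs
```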
Three Integration Paths
Getting structured events from your AI governance system to your SIEM requires a transport mechanism. There are three practical approaches, each with different tradeoffs.
Path 1: Log Forwarding
The most common approach. Your AI governance system writes structured JSON events to log files or stdout. A log forwarder (Fluentd, Filebeat, Logstash, or the SIEM vendor's agent) picks them up and ships them to the SIEM.
This is the path of least resistance. Every SIEM supports log ingestion. Every ops team knows how to configure a log forwarder. The governance system does not need to know anything about the SIEM — it just writes structured JSON, and the forwarder handles transport, buffering, and retry.
The tradeoff is latency. Log forwarding introduces seconds to minutes of delay depending on your forwarder configuration. For audit and compliance purposes, this is usually acceptable. For real-time security monitoring, it may not be.
Configuration example for Filebeat shipping AI governance events to Elasticsearch:
filebeat.inputs:
  - type: log
    paths:
      - /var/log/ai-governance/events.jsonl
    json.keys_under_root: true
    json.add_error_key: true
    fields:
      source: ai-governance
      environment: production

setup.template.name: "ai-governance"
setup.template.pattern: "ai-governance-*"

output.elasticsearch:
  hosts: ["https://elk.internal:9200"]
  index: "ai-governance-%{+yyyy.MM.dd}"
Path 2: Real-Time Streaming
For organisations that need sub-second visibility into AI agent activity, streaming provides real-time event delivery. The governance system exposes an event stream (Server-Sent Events, WebSocket, or a message queue), and a collector on the SIEM side consumes events as they are emitted.
Streaming is the right choice when you need to detect and respond to threats in real time. A secret detected in a tool call parameter needs immediate attention — not a notification that arrives two minutes later after the log forwarder's next flush.
The tradeoff is complexity. You need a persistent connection between the governance system and the collector. You need to handle reconnection, buffering during disconnects, and deduplication. This is not difficult, but it is more operational surface area than log forwarding.
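The deduplication side is worth a sketch, because it is where a sortable, unique event_id pays off: after a reconnect the stream may replay recent events, and the collector must drop what it has already seen. The class name and window size below are illustrative.

```python
from collections import OrderedDict

# Illustrative collector-side deduplication with bounded memory:
# recently seen event_ids are kept in an LRU window; replayed events
# whose ids are still in the window are silently dropped.
class Deduplicator:
    def __init__(self, window: int = 10_000):
        self.window = window
        self.seen = OrderedDict()

    def accept(self, event_id: str) -> bool:
        if event_id in self.seen:
            return False                   # duplicate from a replayed stream
        self.seen[event_id] = None
        if len(self.seen) > self.window:
            self.seen.popitem(last=False)  # evict the oldest id
        return True

dedup = Deduplicator(window=3)
results = [dedup.accept(e) for e in ["e1", "e2", "e1", "e3", "e4", "e1"]]
```

The final "e1" is accepted because its id has aged out of the three-entry window — which is why the window must comfortably exceed the number of events the stream can replay after a reconnect.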
Path 3: CLI and API Queries
For batch workflows, scheduled reports, or organisations that prefer pull-based ingestion, a CLI or API query exports events for a given time range. Run the query on a schedule, pipe the output to your SIEM's batch ingestion endpoint.
This is the simplest approach for teams that do not need real-time alerting on AI agent activity and are primarily using the audit trail for compliance reporting and post-incident investigation.
The tradeoff is obvious: there is no real-time component. You are always looking at historical data. For compliance audits, this is fine. For security operations, it is insufficient on its own.
Most production deployments use a combination. Log forwarding for the bulk of events, streaming for high-priority event types (policy violations, secret detections), and CLI queries for compliance reporting and ad-hoc investigation.
SIEM-Specific Implementation
Splunk
Splunk ingests JSON events through the HTTP Event Collector (HEC). Point your log forwarder at the HEC endpoint with a dedicated token, and events appear in your Splunk index within seconds.
The key configuration decisions for Splunk:
Index. Create a dedicated index for AI governance events. Do not dump them into your main application index. A dedicated index gives you separate retention policies, access controls, and search performance.
Source type. Define a custom source type (e.g., ai:governance:v1) with field extractions mapped to your event schema. This ensures Splunk parses the JSON correctly and indexes fields with the right types.
Field aliases. Map your event schema fields to Splunk's Common Information Model (CIM) where possible. identity.email maps to CIM's user. tool_call.tool_name maps to action. governance.result maps to action_result. CIM mapping means your existing Splunk dashboards, reports, and correlation rules can consume AI governance events without custom field references.
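As a sketch, the custom source type and CIM aliases might look like this in props.conf. Treat this as illustrative only, and verify the alias syntax for dotted JSON field names against your Splunk version before deploying.

```ini
[ai:governance:v1]
KV_MODE = json
FIELDALIAS-cim_user   = "identity.email" AS user
FIELDALIAS-cim_action = "tool_call.tool_name" AS action
FIELDALIAS-cim_result = "governance.result" AS action_result
```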
A useful Splunk dashboard for AI governance includes four panels: tool call volume by agent over time (line chart), policy violations by type (bar chart), top 10 users by tool call volume (table), and governance pipeline latency percentiles (line chart). These four panels give your SOC a real-time view of AI agent activity without requiring them to learn a new data model.
Example SPL for detecting unusual tool call volume:
index=ai_governance event_type="tool_call.completed"
| bin _time span=1h
| stats count AS calls BY _time, agent.agent_id
| eventstats avg(calls) AS avg_calls, stdev(calls) AS stdev_calls BY agent.agent_id
| where calls > avg_calls + 3 * stdev_calls
ELK (Elasticsearch, Logstash, Kibana)
For ELK deployments, Filebeat is the natural collector. Configure Filebeat to read your governance event log, parse the JSON, and ship to Elasticsearch.
Index template. Define an index template that maps your event schema to Elasticsearch field types. timestamp as date. duration_ms as integer. policies_evaluated as keyword array. Getting the mapping right upfront prevents the painful reindexing that happens when Elasticsearch auto-detects a field as text when you needed keyword.
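A composable index template along these lines captures the typed mapping (the index pattern and field subset here are illustrative, not exhaustive); it would be installed with a PUT to _index_template/ai-governance:

```json
{
  "index_patterns": ["ai-governance-*"],
  "template": {
    "mappings": {
      "properties": {
        "timestamp": { "type": "date" },
        "event_type": { "type": "keyword" },
        "identity": {
          "properties": {
            "email": { "type": "keyword" },
            "idp_subject": { "type": "keyword" }
          }
        },
        "governance": {
          "properties": {
            "policies_evaluated": { "type": "keyword" },
            "result": { "type": "keyword" },
            "evaluation_ms": { "type": "integer" }
          }
        },
        "tool_call": {
          "properties": {
            "tool_name": { "type": "keyword" },
            "duration_ms": { "type": "integer" }
          }
        },
        "cost": {
          "properties": { "estimated_cost_usd": { "type": "float" } }
        }
      }
    }
  }
}
```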
Index lifecycle management. AI governance events accumulate quickly. A busy deployment generating 10,000 tool calls per day produces roughly 15MB of events. Set up ILM to roll indices daily, move to warm storage after 7 days, and cold storage after 30. Keep hot data small for fast searching.
Kibana dashboards. Build a dedicated Kibana space for AI governance. The same four panels recommended for Splunk apply here. Add a fifth: a data table of recent policy violations with drill-down to the full event, so analysts can investigate blocked actions without writing queries.
An example Kibana Lens visualisation for policy violations over time uses the event_type field filtered to tool_call.blocked and policy.violation, broken down by governance.matched_rule. This shows which policies are triggering most frequently — useful for tuning policies that are too aggressive and creating noise.
Datadog
Datadog ingests structured logs through its agent or the Logs API. Configure the Datadog agent to tail your governance event log, or ship events directly to the Logs API from your log forwarder.
Log pipeline. Create a Datadog log pipeline that extracts key fields from the JSON and maps them to Datadog's standard attributes. identity.email to usr.email. tool_call.tool_name to evt.name. governance.result to evt.outcome. Standard attributes enable Datadog's built-in views and monitors to work with your AI governance data.
Monitors. Datadog monitors are the alerting mechanism. Create monitors for the same patterns described in the correlation rules section below. Datadog's anomaly detection monitors are particularly useful for tool call volume — they learn the baseline pattern and alert on deviations without you having to define static thresholds.
Dashboards and notebooks. Build a Datadog dashboard for real-time operational monitoring and a notebook for investigation workflows. The dashboard shows live agent activity. The notebook provides step-by-step investigation templates for common scenarios: "an agent was blocked — was the policy correct?" and "a secret was detected — where did it come from?"
Correlation Rules and Alerts
Having AI governance events in your SIEM is step one. Making them actionable requires correlation rules that detect meaningful patterns and alerts that notify the right people.
Priority 1: Governance Policy Violations
Every blocked tool call should generate an alert. Not because every block is a security incident, but because blocks indicate either a misconfigured agent (it is trying to do something it should not) or a misconfigured policy (it is blocking something it should not). Either way, someone needs to look at it.
The alert should include the agent identity, the user who initiated the action, the tool that was blocked, the policy that triggered the block, and the parameters (hashed) of the tool call. Group alerts by agent and user to avoid alert fatigue from a single misconfigured agent generating hundreds of blocks.
Priority 2: Secret Detection Events
A secret detected in a tool call parameter is always urgent. It means a credential, API key, or token is present in the AI agent's context and was about to be sent to an external tool. Even if the governance pipeline blocked the call, the secret's presence in the context window is a problem.
Secret detection alerts should page the security team, not just send an email. Include the secret type (AWS key, GitHub token, database credential), the tool that would have received it, and the user whose session contained the secret. The response workflow should include rotating the detected credential immediately.
Priority 3: Volume Anomalies
A sudden spike in tool call volume from a single agent or user is either a legitimate burst of activity or something wrong. An agent stuck in a loop, a compromised user account, or an abuse scenario all manifest as volume spikes.
Use your SIEM's anomaly detection rather than static thresholds. AI agent usage patterns are bursty by nature — a developer debugging a production issue will generate significantly more tool calls than a developer writing documentation. Static thresholds produce false positives. Anomaly detection learns the pattern and alerts on genuine deviations.
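The underlying idea is a z-score against a learned baseline rather than a fixed cutoff. A minimal sketch, with an illustrative baseline and a three-sigma threshold:

```python
import statistics

# Flag the current hour's tool-call count when it exceeds the trailing
# baseline by more than z standard deviations.
def is_volume_anomaly(hourly_counts: list[int], current: int,
                      z: float = 3.0) -> bool:
    mean = statistics.mean(hourly_counts)
    stdev = statistics.stdev(hourly_counts)
    return current > mean + z * stdev

baseline = [40, 55, 38, 60, 45, 52, 48, 50]   # bursty but bounded
quiet = is_volume_anomaly(baseline, 65)       # within normal variation
spike = is_volume_anomaly(baseline, 400)      # e.g. an agent stuck in a loop
```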
Priority 4: Scope Violations
An agent accessing resources outside its configured scope is a potential privilege escalation. A sales assistant querying the engineering database. A code reviewer accessing the HR system. A support agent writing to the finance CRM.
Scope violation detection requires correlation between the agent's configured scope (department, data classification, tool allowlist) and the actual tools called. This is where agent context in your audit events pays off. Without it, you cannot build this rule.
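The correlation itself is a straightforward lookup once both sides are in the event. A sketch, with illustrative agent identifiers and tool allowlists:

```python
# Illustrative scope check: correlate the agent's configured tool
# allowlist with the tool actually called.
AGENT_SCOPES = {
    "agt_sales_assistant": {"crm_query", "email_send"},
    "agt_code_reviewer": {"repo_read", "database_query"},
}

def check_scope(agent_id: str, tool_name: str) -> dict:
    in_scope = tool_name in AGENT_SCOPES.get(agent_id, set())
    return {
        "event_type": "tool_call.completed" if in_scope else "policy.violation",
        "agent_id": agent_id,
        "tool_name": tool_name,
        "in_scope": in_scope,
    }

ok = check_scope("agt_sales_assistant", "crm_query")
violation = check_scope("agt_sales_assistant", "database_query")
```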
Priority 5: Authentication and Session Anomalies
Failed authentication attempts, sessions from unusual locations, and concurrent sessions from the same user are standard SIEM alerts. Apply the same rules to AI agent sessions. If a user's AI agent session originates from an IP address that does not match their normal location, that is worth investigating.
Compliance Implications
Structured AI audit events are not just a security tool. They are a compliance requirement for several frameworks that regulated organisations must satisfy.
SOC 2
SOC 2 Type II requires continuous monitoring and logging of system activity. AI agents that interact with customer data, internal systems, or external APIs fall squarely within the SOC 2 audit scope. Your auditor will ask: "How do you monitor AI agent activity?" Having structured events in your SIEM with retention policies, access controls, and alerting is a concrete, demonstrable answer.
The specific SOC 2 criteria that AI audit trails satisfy include CC6.1 (logical access security), CC7.2 (system monitoring), and CC7.3 (detection and response). If your AI agents can access systems containing customer data, you need audit trails that prove you are monitoring that access.
ISO 27001
ISO 27001 Annex A controls A.8.15 (logging) and A.8.16 (monitoring activities) require organisations to produce, retain, and review event logs for information security. AI agent activity is information processing. It requires the same logging treatment as any other system that accesses controlled information.
The 2022 revision of ISO 27001 added explicit controls for technology monitoring. AI agents are technology. The auditor will expect logs.
HIPAA
For healthcare organisations, HIPAA's Security Rule requires audit controls (45 CFR 164.312(b)) that record and examine activity in systems containing electronic protected health information. If your AI agent can access patient data, lab results, or clinical notes through tool calls, every one of those tool calls must be logged with the user identity, the data accessed, and the purpose.
HIPAA audit trails must be retained for a minimum of six years. Configure your SIEM retention policies accordingly for AI governance events that involve PHI-adjacent systems.
The Common Thread
Across all three frameworks, the requirement is the same: demonstrate that you can answer "who accessed what, when, why, and what happened" for any system interaction involving sensitive data. Structured AI audit events answer that question for AI agent interactions specifically.
The critical point is that compliance frameworks do not have an AI agent exception. They do not say "you need audit trails for human users but not for AI agents." If the agent acts on behalf of a user and interacts with controlled systems, it is in scope. Full stop.
How systemprompt.io Addresses This
I will be specific about what systemprompt.io provides today, not what is planned.
16 event hooks. systemprompt.io's governance pipeline emits structured JSON events at 16 points in the agent lifecycle: session start, session end, tool call request, policy evaluation, tool call approved, tool call blocked, tool call completed, tool call failed, secret detected, secret blocked, cost threshold warning, cost threshold exceeded, authentication success, authentication failure, agent configuration change, and policy configuration change. Every event follows the schema described in this guide with typed fields and versioned schemas.
Three output paths. Events can be consumed via structured log files (for log forwarder pickup), real-time SSE stream (for streaming collectors), or CLI query (for batch export). All three paths emit the same events in the same format. Choose based on your SIEM integration preference.
Queryable audit trail. Beyond SIEM forwarding, the full audit trail is queryable through the admin API and CLI. Run systemprompt analytics requests list --since 24h --status blocked to see all blocked tool calls in the last day. Pipe the output to your SIEM's batch ingestion endpoint for ad-hoc backfills.
Schema versioning. The event schema is versioned following semantic versioning. Breaking changes increment the major version. New fields increment the minor version. Your SIEM integrations get advance notice of schema changes through the changelog, and the schema_version field in every event lets you handle multiple versions during migration.
For the full SIEM integration documentation and sample configurations for Splunk, ELK, and Datadog, see the systemprompt.io audit trail and SIEM feature page.
Where to Start
If you are building AI agent audit trails from scratch, here is the practical sequence.
Week 1: Schema. Define your event schema. Use the schema in this guide as a starting point and adapt it to your organisation's identity model and SIEM field conventions. Get sign-off from your security team on the schema before writing any code.
Week 2: Emission. Implement event emission in your governance layer. Every tool call should produce a structured event. Start with the log file path — it is the simplest to implement and debug.
Week 3: Ingestion. Configure your SIEM to ingest the events. Set up the index, source type, and field extractions. Verify that events appear correctly and fields are typed as expected.
Week 4: Alerts. Build the five priority alerts described in this guide. Start with policy violations and secret detections, then add volume anomalies, scope violations, and authentication anomalies.
Ongoing: Tuning. Every alert will need tuning. False positives erode trust. False negatives miss incidents. Review alert firing rates weekly for the first month, then monthly. Adjust thresholds, add exceptions for known-good patterns, and tighten rules as you learn your environment's baseline.
The organisations that get AI agent governance right are the ones that treat AI agents as first-class actors in their security monitoring. Not as a separate category. Not as an afterthought. As peers to every other system in the infrastructure that accesses sensitive data and makes consequential decisions.
Your SIEM already knows how to monitor everything else. AI agents should not be the exception.