Architecture & Why Rust

50MB single binary with zero runtime dependencies. Memory-safe multi-tenancy on Tokio. Learn why SystemPrompt is built in Rust and how the agentic loop works.

SystemPrompt compiles to a single ~50MB binary that deploys anywhere with no language runtime or package dependencies to install. No Python virtual environments. No Node modules. No Docker-in-Docker. One file runs your entire AI infrastructure.

Source: 33 crates organized into five dependency layers

Why Rust for AI Infrastructure

AI infrastructure has unique requirements that Rust handles exceptionally well:

Memory Safety for Multi-Tenancy

Multi-tenant AI systems process requests from hundreds of users simultaneously. Memory bugs in this context are catastrophic:

  • Buffer overflows could leak User A's data to User B
  • Use-after-free bugs could crash the entire system
  • Data races could corrupt shared state

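Safe Rust rules out these classes of bugs. Ownership and borrowing are checked at compile time, and an out-of-bounds access panics instead of reading another tenant's memory. If your code compiles without `unsafe`, these vulnerabilities don't exist in it.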

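// The borrow checker forbids mutating `data` while `reference` is still in use;
// the same aliasing rule is what prevents data races across threads.
// This code won't compile - Rust catches the error
let mut data = vec![1, 2, 3];
let reference = &data[0];
data.push(4);  // Compile error: cannot borrow `data` as mutable because it is also borrowed as immutable
println!("{}", reference);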
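The snippet above shows the borrow rule within a single thread. As a minimal sketch of the same guarantee across threads, assuming only the standard library, the example below shares one counter between several workers. The `count_requests` function and its parameters are hypothetical and not part of SystemPrompt.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: `workers` threads each record `per_worker` requests
/// against one shared counter, and the final total is returned.
pub fn count_requests(workers: usize, per_worker: usize) -> usize {
    // Arc provides shared ownership across threads; Mutex serializes mutation.
    // Handing the spawned threads a plain `&mut usize` would not compile.
    let total = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    *total.lock().expect("counter mutex poisoned") += 1;
                }
            })
        })
        .collect();

    // Wait for every worker before reading the result.
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    let result = *total.lock().expect("counter mutex poisoned");
    result
}
```

Calling `count_requests(8, 1000)` always yields 8000, because the type system only allows the shared counter to be mutated through the lock.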

Async-First on Tokio

AI workloads involve lots of waiting:

  • Waiting for LLM API responses (seconds)
  • Waiting for database queries (milliseconds)
  • Waiting for file I/O (milliseconds)

Rust's async/await with Tokio handles thousands of concurrent connections efficiently:

// Handle thousands of concurrent AI requests
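// `Error` is a placeholder for the application's error type; the `?` operator
// requires the function to return a `Result`.
async fn handle_request(req: Request) -> Result<Response, Error> {
    let user = authenticate(&req).await?;
    let response = call_llm(&req.prompt).await?;
    log_request(&user, &response).await?;
    Ok(Response::ok(response))
}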

Because Tokio schedules tasks across all CPU cores, one SystemPrompt process can handle a load that would typically require multiple Node.js processes or Python workers.

Zero-Cost Abstractions

High-level code with low-level performance. Extension traits, generics, and iterators are resolved at compile time and, in optimized builds, typically compile to the same machine code as hand-written loops:

// This high-level code...
let active_users: Vec<_> = users
    .iter()
    .filter(|u| u.is_active())
    .map(|u| u.id)
    .collect();

// ...typically compiles to the same assembly as a manual loop
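To make the comparison concrete, here is a small sketch that writes the same filter both ways. The `User` struct and the two function names are assumptions made for this example; only the iterator chain itself comes from the snippet above.

```rust
/// Hypothetical user record for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub id: u32,
    pub active: bool,
}

impl User {
    pub fn is_active(&self) -> bool {
        self.active
    }
}

/// Iterator-chain version, mirroring the snippet above.
pub fn active_ids_iter(users: &[User]) -> Vec<u32> {
    users
        .iter()
        .filter(|u| u.is_active())
        .map(|u| u.id)
        .collect()
}

/// Hand-written loop with the same observable behavior.
pub fn active_ids_loop(users: &[User]) -> Vec<u32> {
    let mut ids = Vec::new();
    for user in users {
        if user.is_active() {
            ids.push(user.id);
        }
    }
    ids
}
```

Both functions return identical results for any input. Whether they also produce identical machine code depends on the optimization level, which can be checked by comparing the release-mode assembly of the two.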

Compile-Time Guarantees

If it compiles, whole categories of bugs are already gone. Rust's type system catches these errors before they reach production:

Error Type            Python/Node        Rust
Type mismatches       Runtime crash      Compile error
Null pointer access   Runtime crash      Compile error
Unhandled errors      Silent failure     Compile error
Thread safety bugs    Race conditions    Compile error
The 50MB Binary

Everything you need in one file:

# That's it. One file. Run anywhere.
./systemprompt infra services start --all

What's included:

  • HTTP/HTTPS server (Axum)
  • OAuth2/OIDC authorization server
  • WebAuthn authentication
  • MCP server hosting
  • Agent runtime (A2A protocol)
  • Job scheduler
  • Database migrations
  • Static file server
  • All extensions

What's NOT required:

  • Runtime interpreters (Python, Node)
  • External web servers (nginx, Apache)
  • Separate auth services (Keycloak, Auth0)
  • Message queues (Redis, RabbitMQ)

Five-Layer Architecture

Dependencies flow downward only. Each layer can only import from layers below it:

┌───────────────────────────────────────────────────────────────┐
│  ENTRY: api, cli                                              │
│  HTTP endpoints, CLI commands, request handling               │
├───────────────────────────────────────────────────────────────┤
│  APP: runtime, scheduler, generator, sync                     │
│  Application orchestration, job scheduling, content gen       │
├───────────────────────────────────────────────────────────────┤
│  DOMAIN: users, oauth, ai, agent, mcp, files, content         │
│  Business logic, domain models, core functionality            │
├───────────────────────────────────────────────────────────────┤
│  INFRA: database, events, security, config, logging           │
│  Infrastructure concerns, persistence, cross-cutting          │
├───────────────────────────────────────────────────────────────┤
│  SHARED: models, traits, identifiers, extension               │
│  Common types, traits, identifiers used everywhere            │
└───────────────────────────────────────────────────────────────┘

This layering ensures:

  • Testability: Domain logic has no infrastructure dependencies
  • Maintainability: Changes are isolated to appropriate layers
  • Clarity: Easy to understand where code belongs
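One way to picture the downward-only rule is a single-file sketch in which each layer is a module that imports only from modules below it. The module contents here (`UserId`, `UserStore`, `InMemoryStore`, `greeting`) are hypothetical and are not the actual SystemPrompt crates; only three of the five layers are shown.

```rust
// SHARED layer: common types and traits; imports nothing from other layers.
pub mod shared {
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
    pub struct UserId(pub u32);

    pub trait UserStore {
        fn display_name(&self, id: UserId) -> Option<String>;
    }
}

// INFRA layer: one concrete store; imports only from `shared`.
pub mod infra {
    use super::shared::{UserId, UserStore};
    use std::collections::HashMap;

    #[derive(Default)]
    pub struct InMemoryStore {
        names: HashMap<UserId, String>,
    }

    impl InMemoryStore {
        pub fn insert(&mut self, id: UserId, name: &str) {
            self.names.insert(id, name.to_string());
        }
    }

    impl UserStore for InMemoryStore {
        fn display_name(&self, id: UserId) -> Option<String> {
            self.names.get(&id).cloned()
        }
    }
}

// DOMAIN layer: business logic written against the shared trait, so it
// never names a concrete backend and can be tested with any store.
pub mod domain {
    use super::shared::{UserId, UserStore};

    pub fn greeting(store: &dyn UserStore, id: UserId) -> String {
        match store.display_name(id) {
            Some(name) => format!("Hello, {name}"),
            None => "Hello, guest".to_string(),
        }
    }
}
```

In a real workspace the same rule is enforced by each crate's dependency list rather than by convention: a lower-layer crate cannot import an upper-layer one without creating a dependency cycle, which Cargo rejects.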

The Agentic Loop

SystemPrompt implements a complete agentic loop with memory, retention, and self-learning:

┌─────────────────────────────────────────────────────────────┐
│                     AGENTIC LOOP                             │
│                                                              │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐              │
│  │  INPUT   │───▶│ PROCESS  │───▶│  OUTPUT  │              │
│  │ Request  │    │  + LLM   │    │ Response │              │
│  └──────────┘    └──────────┘    └──────────┘              │
│       ▲                │               │                    │
│       │                ▼               ▼                    │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐              │
│  │ CONTEXT  │◀───│ MEMORY   │◀───│ANALYTICS │              │
│  │ Retrieval│    │ Storage  │    │ Tracking │              │
│  └──────────┘    └──────────┘    └──────────┘              │
│                        │                                    │
│                        ▼                                    │
│                 ┌──────────┐                                │
│                 │  LEARN   │                                │
│                 │ Optimize │                                │
│                 └──────────┘                                │
└─────────────────────────────────────────────────────────────┘

Memory

Every interaction is stored:

  • Session context and history
  • User preferences and patterns
  • Agent performance data
  • Tool call results

Retention

Context persists across sessions:

  • Shared contexts between agents
  • User-specific memory
  • Project-level knowledge bases

Self-Learning

Continuous optimization through:

  • Usage analytics and patterns
  • Cost tracking and optimization
  • Performance monitoring
  • Feedback-driven improvements
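The first pass around the loop (context retrieval, processing, memory storage, analytics) can be sketched as follows. Everything here is an assumption made for illustration: `AgentLoop`, `Interaction`, the context window of 3, and the byte counter are hypothetical, the `model` closure stands in for an LLM call, and the LEARN step is not modeled.

```rust
/// One stored interaction (the MEMORY box in the diagram).
#[derive(Debug, Clone, PartialEq)]
pub struct Interaction {
    pub input: String,
    pub output: String,
}

/// Hypothetical loop state: stored history plus a simple usage counter.
#[derive(Debug, Default)]
pub struct AgentLoop {
    memory: Vec<Interaction>,
    total_output_bytes: usize, // stand-in for cost and usage analytics
}

impl AgentLoop {
    /// CONTEXT: the most recent `limit` interactions, oldest first.
    pub fn context(&self, limit: usize) -> &[Interaction] {
        let start = self.memory.len().saturating_sub(limit);
        &self.memory[start..]
    }

    /// INPUT -> PROCESS -> OUTPUT, followed by ANALYTICS and MEMORY.
    /// `model` receives the retrieved context and the new input, and
    /// returns the response text.
    pub fn step<F>(&mut self, input: &str, model: F) -> String
    where
        F: FnOnce(&[Interaction], &str) -> String,
    {
        let output = model(self.context(3), input);
        self.total_output_bytes += output.len();
        self.memory.push(Interaction {
            input: input.to_string(),
            output: output.clone(),
        });
        output
    }

    pub fn total_output_bytes(&self) -> usize {
        self.total_output_bytes
    }
}
```

Each call to `step` sees the interactions stored by earlier calls, which is the retention behavior described above in its simplest in-memory form.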

Extension System

SystemPrompt is a library, not a platform. You compile it into YOUR binary:

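// Axum imports, in case the prelude does not already re-export them
use axum::{routing::get, Router};
use systemprompt::prelude::*;

struct MyExtension;

impl Extension for MyExtension {
    fn id(&self) -> &'static str { "my-extension" }
    fn name(&self) -> &'static str { "My Extension" }
}

impl ApiExtension for MyExtension {
    fn router(&self, ctx: &ExtensionContext) -> Option<Router> {
        Some(Router::new()
            .route("/my-api", get(my_handler))
            .with_state(ctx.clone()))
    }
}

// Placeholder handler so the route above has a target
async fn my_handler() -> &'static str {
    "ok"
}

register_extension!(MyExtension);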

Extensions are registered through the inventory crate, which gathers every registration linked into the binary. No runtime reflection. No configuration files for extension discovery. If you import it, it's included.
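For readers unfamiliar with registration-based discovery, the sketch below shows what the host side does once it has the list of extensions. It deliberately swaps the inventory crate for a hand-built `Vec`, so it illustrates only the enumeration step, not the automatic collection. The `Extension` trait here is a simplified stand-in for the one shown above, and `Registry` is hypothetical.

```rust
/// Simplified stand-in for the `Extension` trait shown above.
pub trait Extension {
    fn id(&self) -> &'static str;
    fn name(&self) -> &'static str;
}

pub struct MyExtension;

impl Extension for MyExtension {
    fn id(&self) -> &'static str { "my-extension" }
    fn name(&self) -> &'static str { "My Extension" }
}

/// Manual registry: trait objects are added explicitly. With `inventory`,
/// this list would be assembled from the registrations in linked crates.
#[derive(Default)]
pub struct Registry {
    extensions: Vec<Box<dyn Extension>>,
}

impl Registry {
    /// Returns false and ignores the extension if its id is already taken.
    pub fn register(&mut self, ext: Box<dyn Extension>) -> bool {
        if self.extensions.iter().any(|e| e.id() == ext.id()) {
            return false;
        }
        self.extensions.push(ext);
        true
    }

    /// Ids of all registered extensions, in registration order.
    pub fn ids(&self) -> Vec<&'static str> {
        self.extensions.iter().map(|e| e.id()).collect()
    }
}
```

The duplicate-id check is a design choice for this sketch; the source does not say how SystemPrompt treats two extensions with the same id.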

Deployment Options

The same binary runs everywhere:

Environment        Command
Local dev          ./systemprompt infra services start --all
Docker             docker run systemprompt/systemprompt
Kubernetes         Standard deployment manifest
Bare metal         Copy binary, run
Cloud (managed)    One-click deploy

No language runtime or special host setup is required. PostgreSQL is the only external service the binary depends on.

Performance Characteristics

Metric                    Value
Binary size               ~50MB
Startup time              <1 second
Memory baseline           ~30MB
Concurrent connections    10,000+
Request latency (p99)     <5ms (excluding LLM)
