Scheduled Jobs

Run background jobs on cron schedules for content publishing, analytics aggregation, session cleanup, database maintenance, and custom automation tasks.

TL;DR: The scheduler runs background jobs on cron schedules or at application startup. Jobs are defined in services/scheduler/config.yaml and implemented by extensions using the Job trait. The publish pipeline, analytics aggregation, traffic reports, session cleanup, and database maintenance all run as scheduled jobs. You can also trigger any job manually from the CLI.

Why Scheduled Jobs Matter

systemprompt.io applications have ongoing work that cannot happen inside a request-response cycle. Content needs to be ingested, prerendered, and published. Analytics data needs aggregation. Sessions expire and need cleanup. Databases accumulate stale rows.

Without a built-in scheduler, you would need an external cron daemon, a separate task queue, or manual intervention. The scheduler service removes that dependency by embedding job scheduling directly into the application. Define jobs in YAML configuration, implement them in Rust, and the scheduler handles the rest.

How the Scheduler Works

The scheduler starts with the application. On startup it reads job definitions from services/scheduler/config.yaml, builds a cron trigger for each enabled job, and begins watching the clock. When a trigger fires, the scheduler calls the job's execute() method with a JobContext that provides access to the database pool and application configuration.

Jobs run asynchronously and never block the main application or each other. Each job receives its own context and reports back a JobResult containing success/failure status, item counts, and duration. Failed jobs are logged but do not prevent other jobs from running.
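The exact shape of JobResult is not spelled out in this document; as a rough mental model, a result type along these lines would support the success()/with_stats() calls shown in the custom-job example later on. This is an illustrative sketch, not the framework's actual API:

```rust
use std::time::Duration;

// Hypothetical sketch of a job-result type matching the description above.
// Field and method names are illustrative, not the framework's real API.
#[derive(Debug)]
pub struct JobResultSketch {
    pub success: bool,
    pub items_processed: u64,
    pub items_failed: u64,
    pub duration: Duration,
}

impl JobResultSketch {
    /// Start a successful result with empty stats.
    pub fn success() -> Self {
        Self {
            success: true,
            items_processed: 0,
            items_failed: 0,
            duration: Duration::ZERO,
        }
    }

    /// Attach processed/failed item counts, builder-style.
    pub fn with_stats(mut self, processed: u64, failed: u64) -> Self {
        self.items_processed = processed;
        self.items_failed = failed;
        self
    }
}
```

The builder style here mirrors the `JobResult::success().with_stats(...)` call used in the Writing Custom Jobs section.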

Job Types

Cron Jobs

The default type. A cron expression determines when the job runs. The scheduler evaluates the expression and fires the job at matching times.

- name: cleanup_empty_contexts
  extension: core
  job: cleanup_empty_contexts
  schedule: "0 0 * * * *"   # Every hour
  enabled: true

Startup Jobs

Some jobs need to run immediately when the application starts, before any cron trigger fires. A job opts into startup execution by returning true from run_on_startup() in its Rust implementation. Two conditions must both be met:

  1. The job's Rust implementation returns true from run_on_startup()
  2. The job is listed in services/scheduler/config.yaml with enabled: true

This two-layer design lets developers set sensible defaults in code while operations teams can enable or disable jobs per environment without code changes.
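The gate is effectively a boolean AND of the two layers. A hypothetical sketch (runs_on_startup here is an illustrative helper, not a framework function):

```rust
// Hypothetical illustration of the two-layer startup gate.
// `code_default` stands in for the job's run_on_startup() return value;
// `config_enabled` for the `enabled` flag in services/scheduler/config.yaml.
fn runs_on_startup(code_default: bool, config_enabled: bool) -> bool {
    code_default && config_enabled
}
```

Flipping either switch off disables startup execution without touching the other layer.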

The publish_pipeline job is the primary example: it runs on startup to generate all HTML, sitemaps, and feeds, then continues running every 15 minutes.

One-Off (Manual) Jobs

Any registered job can be triggered manually through the CLI. Manual execution does not affect the cron schedule -- the next scheduled run still fires at its configured time.

systemprompt infra jobs run publish_pipeline

This is useful for testing, deployments, or situations where you need content published immediately rather than waiting for the next scheduled window.

Configuration

All jobs are configured in services/scheduler/config.yaml. Here is a complete real-world configuration:

# services/scheduler/config.yaml
scheduler:
  enabled: true
  jobs:
    # --- Core maintenance jobs ---
    - name: cleanup_anonymous_users
      extension: core
      job: cleanup_anonymous_users
      schedule: "0 0 3 * * *"      # Daily at 3:00 AM
      enabled: true

    - name: cleanup_empty_contexts
      extension: core
      job: cleanup_empty_contexts
      schedule: "0 0 * * * *"      # Every hour
      enabled: true

    - name: cleanup_inactive_sessions
      extension: core
      job: cleanup_inactive_sessions
      schedule: "0 0 * * * *"      # Every hour
      enabled: true

    - name: database_cleanup
      extension: core
      job: database_cleanup
      schedule: "0 0 4 * * *"      # Daily at 4:00 AM
      enabled: true

    # --- Web extension jobs ---
    - name: publish_pipeline
      extension: web
      job: publish_pipeline
      schedule: "0 */15 * * * *"   # Every 15 minutes (also runs on startup)
      enabled: true

    - name: content_analytics_aggregation
      extension: web
      job: content_analytics_aggregation
      schedule: "0 */15 * * * *"   # Every 15 minutes
      enabled: true

    - name: daily_traffic_report
      extension: web
      job: daily_traffic_report
      schedule: "0 0 7,19 * * *"   # Twice daily at 7 AM and 7 PM
      enabled: true

    - name: daily_activity_summary
      extension: web
      job: daily_activity_summary
      schedule: "0 0 18 * * *"     # Daily at 6 PM
      enabled: true

Job Definition Fields

name (string, required) -- Unique job identifier used by the CLI and logs
extension (string, required) -- Extension that provides the job implementation (core, web, marketplace, etc.)
job (string, required) -- Job function name matching the name() method in the Job trait implementation
schedule (string, required) -- Six-field cron expression (see below)
enabled (boolean, required) -- Whether the scheduler activates this job; set to false to disable without removing it

Cron Schedule Format

The scheduler uses six-field cron expressions that include a seconds field:

┌──────────── second (0-59)
│ ┌────────── minute (0-59)
│ │ ┌──────── hour (0-23)
│ │ │ ┌────── day of month (1-31)
│ │ │ │ ┌──── month (1-12)
│ │ │ │ │ ┌── day of week (0-6, Sunday = 0)
│ │ │ │ │ │
* * * * * *

Common patterns:

0 0 * * * *      -- Every hour at :00
0 */15 * * * *   -- Every 15 minutes
0 */30 * * * *   -- Every 30 minutes
0 0 3 * * *      -- Daily at 3:00 AM
0 0 7,19 * * *   -- Twice daily at 7 AM and 7 PM
0 0 18 * * *     -- Daily at 6:00 PM
0 0 0 * * 0      -- Weekly on Sunday at midnight
0 0 0 1 * *      -- Monthly on the 1st at midnight

Note the leading 0 in the seconds field. Most standard cron documentation shows five fields; the systemprompt.io scheduler adds seconds as the first field.
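A quick sanity check for this pitfall is to count fields before deploying a schedule. A minimal sketch (has_six_fields is a hypothetical helper, not part of the framework, and not a full cron parser):

```rust
// Hypothetical sanity check: the scheduler expects six whitespace-separated
// cron fields (seconds first). This only counts fields; it does not
// validate ranges or step syntax.
fn has_six_fields(expr: &str) -> bool {
    expr.split_whitespace().count() == 6
}
```

The six-field hourly expression "0 0 * * * *" passes, while the five-field "0 * * * *" from standard crontab documentation does not.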

Built-In Jobs

Core Maintenance Jobs

These ship with systemprompt-core and keep the application healthy:

cleanup_anonymous_users (daily at 3 AM) -- Removes old anonymous user accounts that were never upgraded
cleanup_empty_contexts (hourly) -- Deletes conversation contexts that contain no messages
cleanup_inactive_sessions (hourly) -- Expires and removes sessions that have been idle too long
database_cleanup (daily at 4 AM) -- General database maintenance: vacuuming, index cleanup

Content Publishing Jobs

These are provided by the web extension and handle the full content lifecycle:

publish_pipeline (every 15 min, also runs on startup) -- Orchestrates the entire publishing workflow (see below)
content_analytics_aggregation (every 15 min) -- Aggregates page views, unique visitors, and time-on-page into per-content statistics
daily_traffic_report (7 AM and 7 PM) -- Compiles traffic metrics and posts a summary to Discord
daily_activity_summary (6 PM) -- Summarizes daily activity across all services and sends a Discord report

The Publish Pipeline

The publish_pipeline job is the most important scheduled job. It orchestrates nine sub-steps in sequence:

  1. Content ingestion -- Reads markdown files from services/content/ and loads them into the database
  2. Asset copy -- Copies extension assets (CSS, JS, images) to the output directory
  3. Content prerender -- Renders markdown content into HTML fragments
  4. Page prerender -- Generates full HTML pages from templates and prerendered content
  5. Sitemap generation -- Builds sitemap.xml from all published pages
  6. llms.txt generation -- Creates the llms.txt file for AI crawler guidance
  7. robots.txt generation -- Generates robots.txt with current sitemap references
  8. RSS feed generation -- Builds the RSS/Atom feed from published blog content
  9. Asset organization -- Moves CSS and JS files into their final distribution paths

Each step runs independently. If one step fails, the pipeline logs the error and continues with the next step. The final JobResult reports how many steps succeeded and how many failed.
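The continue-on-error pattern described above can be sketched in plain Rust. The names PipelineStatsSketch and run_steps are illustrative, and the real pipeline logs through tracing rather than eprintln!:

```rust
// Hypothetical sketch of the pipeline's continue-on-error behavior.
// Each step returns a Result; a failure is logged and counted, but
// does not stop the remaining steps from running.
#[derive(Debug, Default)]
struct PipelineStatsSketch {
    succeeded: u32,
    failed: u32,
}

fn run_steps(steps: Vec<(&str, Box<dyn Fn() -> Result<(), String>>)>) -> PipelineStatsSketch {
    let mut stats = PipelineStatsSketch::default();
    for (name, step) in steps {
        match step() {
            Ok(()) => stats.succeeded += 1,
            Err(e) => {
                eprintln!("step {name} failed: {e}"); // real code uses tracing::error!
                stats.failed += 1;
            }
        }
    }
    stats
}
```

A failing sitemap step would increment `failed` and the loop would move straight on to RSS feed generation, matching the behavior described above.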

Because this job sets run_on_startup() to true, it runs immediately when the application boots, ensuring all HTML is generated before the first request arrives.

Error Handling

The scheduler handles errors at two levels:

Job-level errors. If a job's execute() method returns an error, the scheduler logs it with tracing::error! and moves on. Other jobs continue to run on schedule. The failed execution appears in job history with its error message.

Step-level errors (pipeline jobs). The publish pipeline catches errors from each sub-step individually. A failure in sitemap generation does not prevent RSS feed generation. The pipeline tracks succeeded and failed step counts in its PipelineStats and reports both in the final JobResult.

There is no automatic retry for failed cron executions. If a job fails, it will run again at the next scheduled time. For immediate recovery, trigger the job manually:

systemprompt infra jobs run publish_pipeline

Writing Custom Jobs

Extensions register jobs by implementing the Job trait and calling the submit_job! macro.

The Job Trait

use anyhow::Result;
use systemprompt::traits::{Job, JobContext, JobResult};

#[derive(Debug, Clone, Copy, Default)]
pub struct MyCustomJob;

#[async_trait::async_trait]
impl Job for MyCustomJob {
    fn name(&self) -> &'static str {
        "my_custom_job"
    }

    fn description(&self) -> &'static str {
        "Brief description of what this job does"
    }

    fn schedule(&self) -> &'static str {
        "0 0 * * * *"  // Default schedule (can be overridden in config)
    }

    fn run_on_startup(&self) -> bool {
        false  // Set to true if this job should run at application startup
    }

    async fn execute(&self, ctx: &JobContext) -> Result<JobResult> {
        // `DbPool` is your extension's database pool type
        let db_pool = ctx.db_pool::<DbPool>()
            .ok_or_else(|| anyhow::anyhow!("Database not available"))?;

        // Do work here...
        let items_processed = 42;
        let items_failed = 0;

        Ok(JobResult::success()
            .with_stats(items_processed, items_failed))
    }
}

// Register the job with the scheduler
systemprompt::traits::submit_job!(&MyCustomJob);

Registering in Configuration

After writing the job, add it to services/scheduler/config.yaml:

- name: my_custom_job
  extension: my_extension
  job: my_custom_job
  schedule: "0 0 * * * *"
  enabled: true

The name field in the YAML must match the string returned by the job's name() method. The schedule in the YAML overrides the default schedule defined in code.

CLI Reference

systemprompt infra jobs list -- List all registered jobs and their schedules
systemprompt infra jobs show <name> -- Show details for a specific job
systemprompt infra jobs run <name> -- Execute a job immediately
systemprompt infra jobs history -- View recent job execution history
systemprompt infra jobs history --job <name> -- View history for a specific job
systemprompt infra jobs enable <name> -- Enable a disabled job
systemprompt infra jobs disable <name> -- Disable a job without removing its configuration
systemprompt infra jobs cleanup-sessions -- Shortcut to run session cleanup
systemprompt infra jobs log-cleanup -- Shortcut to run log cleanup

Run systemprompt infra jobs <command> --help for detailed options.

Integration with Other Services

The scheduler connects to several other systemprompt.io services:

  • Content service -- The publish pipeline reads markdown from services/content/ and writes generated HTML to web/dist/
  • Web service -- Content prerendering and page generation use templates registered by the web extension
  • Analytics service -- The content_analytics_aggregation job reads engagement events and writes aggregated metrics
  • Config service -- Job definitions are loaded through the standard config aggregation pattern from services/scheduler/config.yaml
  • Database service -- Most jobs access the database through ctx.db_pool() for reads and writes
  • Discord (external) -- The traffic report and activity summary jobs post formatted reports to Discord channels

The scheduler operates independently of agents and MCP servers. It handles infrastructure-level maintenance and content automation, not user-facing AI interactions.

Troubleshooting

Job not running. Verify the job is enabled in services/scheduler/config.yaml. Check that the extension is loaded and the job function name matches exactly. Use systemprompt infra jobs list to see which jobs the scheduler knows about.

Job failing silently. Check application logs for the job name:

systemprompt infra logs view --level error --since 1h

Jobs log their execution start, completion, and any errors through tracing.

Cron schedule not working as expected. Remember that systemprompt.io uses six-field cron expressions with seconds as the first field. A schedule of 0 * * * * (five fields) is invalid. Use 0 0 * * * * for hourly.

Startup job not running. Both conditions must be true: the job's Rust code returns true from run_on_startup(), and the job has enabled: true in the scheduler config. Check both.

Publish pipeline partially failing. The pipeline continues past individual step failures. Check logs for which step failed. You can often fix the issue and run the pipeline again manually without waiting for the next scheduled execution.