Scheduled Jobs
Run background jobs on cron schedules for content publishing, analytics aggregation, session cleanup, database maintenance, and custom automation tasks.
TL;DR: The scheduler runs background jobs on cron schedules or at application startup. Jobs are defined in services/scheduler/config.yaml and implemented by extensions using the Job trait. The publish pipeline, analytics aggregation, traffic reports, session cleanup, and database maintenance all run as scheduled jobs. You can also trigger any job manually from the CLI.
Why Scheduled Jobs Matter
systemprompt.io applications have ongoing work that cannot happen inside a request-response cycle. Content needs to be ingested, prerendered, and published. Analytics data needs aggregation. Sessions expire and need cleanup. Databases accumulate stale rows.
Without a built-in scheduler, you would need an external cron daemon, a separate task queue, or manual intervention. The scheduler service removes that dependency by embedding job scheduling directly into the application. Define jobs in YAML configuration, implement them in Rust, and the scheduler handles the rest.
How the Scheduler Works
The scheduler starts with the application. On startup it reads job definitions from services/scheduler/config.yaml, builds a cron trigger for each enabled job, and begins watching the clock. When a trigger fires, the scheduler calls the job's execute() method with a JobContext that provides access to the database pool and application configuration.
Jobs run asynchronously and never block the main application or each other. Each job receives its own context and reports back a JobResult containing success/failure status, item counts, and duration. Failed jobs are logged but do not prevent other jobs from running.
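The reporting flow can be pictured with a minimal stand-in type. The field and method names here (`success`, `items_processed`, `items_failed`, `duration`, `timed_since`) are assumptions for illustration, not the actual `JobResult` definition:

```rust
use std::time::{Duration, Instant};

// Simplified stand-in for the JobResult described above; names are
// illustrative assumptions, not the actual systemprompt API.
#[derive(Debug)]
pub struct JobResult {
    pub success: bool,
    pub items_processed: u64,
    pub items_failed: u64,
    pub duration: Duration,
}

impl JobResult {
    pub fn success() -> Self {
        Self { success: true, items_processed: 0, items_failed: 0, duration: Duration::ZERO }
    }

    pub fn with_stats(mut self, processed: u64, failed: u64) -> Self {
        self.items_processed = processed;
        self.items_failed = failed;
        self
    }

    pub fn timed_since(mut self, started: Instant) -> Self {
        self.duration = started.elapsed();
        self
    }
}

fn main() {
    let started = Instant::now();
    // ...job work would happen here...
    let result = JobResult::success().with_stats(42, 0).timed_since(started);
    assert!(result.success);
    println!("processed {} items in {:?}", result.items_processed, result.duration);
}
```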
Job Types
Cron Jobs
The default type. A cron expression determines when the job runs. The scheduler evaluates the expression and fires the job at matching times.
```yaml
- name: cleanup_empty_contexts
  extension: core
  job: cleanup_empty_contexts
  schedule: "0 0 * * * *"  # Every hour
  enabled: true
```
Startup Jobs
Some jobs need to run immediately when the application starts, before any cron trigger fires. A job opts into startup execution by implementing run_on_startup() returning true in its Rust code. Two conditions must both be met:
- The job's Rust implementation returns `true` from `run_on_startup()`
- The job is listed in `services/scheduler/config.yaml` with `enabled: true`
This two-layer design lets developers set sensible defaults in code while operations teams can enable or disable jobs per environment without code changes.
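The decision itself is just a conjunction of the two layers. A sketch, using a hypothetical `JobConfig` type rather than the real config model:

```rust
// Illustrative sketch of the two-layer startup rule, not the actual
// scheduler source: a job runs at startup only when its code opts in
// AND its config entry is enabled.
struct JobConfig {
    enabled: bool,
}

fn runs_on_startup(code_opts_in: bool, config: &JobConfig) -> bool {
    code_opts_in && config.enabled
}

fn main() {
    let cfg = JobConfig { enabled: true };
    assert!(runs_on_startup(true, &cfg));   // both layers agree: runs at boot
    assert!(!runs_on_startup(false, &cfg)); // code says no: never runs at boot
    // Ops disabled it in YAML: skipped regardless of the code default.
    assert!(!runs_on_startup(true, &JobConfig { enabled: false }));
    println!("startup rule ok");
}
```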
The publish_pipeline job is the primary example: it runs on startup to generate all HTML, sitemaps, and feeds, then continues running every 15 minutes.
One-Off (Manual) Jobs
Any registered job can be triggered manually through the CLI. Manual execution does not affect the cron schedule -- the next scheduled run still fires at its configured time.
```bash
systemprompt infra jobs run publish_pipeline
```
This is useful for testing, deployments, or situations where you need content published immediately rather than waiting for the next scheduled window.
Configuration
All jobs are configured in services/scheduler/config.yaml. Here is a complete real-world configuration:
```yaml
# services/scheduler/config.yaml
scheduler:
  enabled: true
  jobs:
    # --- Core maintenance jobs ---
    - name: cleanup_anonymous_users
      extension: core
      job: cleanup_anonymous_users
      schedule: "0 0 3 * * *"  # Daily at 3:00 AM
      enabled: true

    - name: cleanup_empty_contexts
      extension: core
      job: cleanup_empty_contexts
      schedule: "0 0 * * * *"  # Every hour
      enabled: true

    - name: cleanup_inactive_sessions
      extension: core
      job: cleanup_inactive_sessions
      schedule: "0 0 * * * *"  # Every hour
      enabled: true

    - name: database_cleanup
      extension: core
      job: database_cleanup
      schedule: "0 0 4 * * *"  # Daily at 4:00 AM
      enabled: true

    # --- Web extension jobs ---
    - name: publish_pipeline
      extension: web
      job: publish_pipeline
      schedule: "0 */15 * * * *"  # Every 15 minutes (also runs on startup)
      enabled: true

    - name: content_analytics_aggregation
      extension: web
      job: content_analytics_aggregation
      schedule: "0 */15 * * * *"  # Every 15 minutes
      enabled: true

    - name: daily_traffic_report
      extension: web
      job: daily_traffic_report
      schedule: "0 0 7,19 * * *"  # Twice daily at 7 AM and 7 PM
      enabled: true

    - name: daily_activity_summary
      extension: web
      job: daily_activity_summary
      schedule: "0 0 18 * * *"  # Daily at 6 PM
      enabled: true
```
Job Definition Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | yes | Unique job identifier used by the CLI and logs |
| `extension` | string | yes | Extension that provides the job implementation (`core`, `web`, `marketplace`, etc.) |
| `job` | string | yes | Job function name matching the `name()` method in the `Job` trait implementation |
| `schedule` | string | yes | Six-field cron expression (see below) |
| `enabled` | boolean | yes | Whether the scheduler activates this job; set to `false` to disable without removing |
Cron Schedule Format
The scheduler uses six-field cron expressions that include a seconds field:
```text
┌──────────── second (0-59)
│ ┌────────── minute (0-59)
│ │ ┌──────── hour (0-23)
│ │ │ ┌────── day of month (1-31)
│ │ │ │ ┌──── month (1-12)
│ │ │ │ │ ┌── day of week (0-6, Sunday = 0)
│ │ │ │ │ │
* * * * * *
```
Common patterns:
| Expression | Meaning |
|---|---|
| `0 0 * * * *` | Every hour at :00 |
| `0 */15 * * * *` | Every 15 minutes |
| `0 */30 * * * *` | Every 30 minutes |
| `0 0 3 * * *` | Daily at 3:00 AM |
| `0 0 7,19 * * *` | Twice daily at 7 AM and 7 PM |
| `0 0 18 * * *` | Daily at 6:00 PM |
| `0 0 0 * * 0` | Weekly on Sunday at midnight |
| `0 0 0 1 * *` | Monthly on the 1st at midnight |
Note the leading 0 in the seconds field. Most standard cron documentation shows five fields; the systemprompt.io scheduler adds seconds as the first field.
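The semantics of the patterns above can be sketched with a minimal matcher that supports only `*`, `*/n`, comma lists, and plain numbers. The real scheduler presumably uses a full cron parser; this is illustration only:

```rust
// Minimal sketch of six-field cron matching for the common patterns
// shown above. Supports only `*`, `*/n`, comma lists, and plain
// numbers -- not ranges or combined step/range syntax.
fn field_matches(spec: &str, value: u32) -> bool {
    spec.split(',').any(|part| match part {
        "*" => true,
        p if p.starts_with("*/") => p[2..]
            .parse::<u32>()
            .map(|step| step != 0 && value % step == 0)
            .unwrap_or(false),
        p => p.parse::<u32>().map(|n| n == value).unwrap_or(false),
    })
}

/// `time` is [second, minute, hour, day-of-month, month, day-of-week].
fn cron_matches(expr: &str, time: [u32; 6]) -> bool {
    let fields: Vec<&str> = expr.split_whitespace().collect();
    // Exactly six fields are required: seconds come first.
    fields.len() == 6
        && fields.iter().zip(time.iter()).all(|(spec, &v)| field_matches(spec, v))
}

fn main() {
    // "0 */15 * * * *" fires at second 0 of every 15th minute.
    assert!(cron_matches("0 */15 * * * *", [0, 30, 9, 1, 1, 1]));
    assert!(!cron_matches("0 */15 * * * *", [0, 7, 9, 1, 1, 1]));
    // "0 0 7,19 * * *" fires at 7 AM and 7 PM.
    assert!(cron_matches("0 0 7,19 * * *", [0, 0, 19, 1, 1, 1]));
    // Five-field expressions are rejected: the seconds field is mandatory.
    assert!(!cron_matches("0 * * * *", [0, 0, 0, 1, 1, 1]));
    println!("cron sketch ok");
}
```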
Built-In Jobs
Core Maintenance Jobs
These ship with systemprompt-core and keep the application healthy:
| Job | Schedule | Description |
|---|---|---|
| `cleanup_anonymous_users` | Daily 3 AM | Removes old anonymous user accounts that were never upgraded |
| `cleanup_empty_contexts` | Hourly | Deletes conversation contexts that contain no messages |
| `cleanup_inactive_sessions` | Hourly | Expires and removes sessions that have been idle too long |
| `database_cleanup` | Daily 4 AM | General database maintenance: vacuuming, index cleanup |
Content Publishing Jobs
These are provided by the web extension and handle the full content lifecycle:
| Job | Schedule | Startup | Description |
|---|---|---|---|
| `publish_pipeline` | Every 15 min | yes | Orchestrates the entire publishing workflow (see below) |
| `content_analytics_aggregation` | Every 15 min | no | Aggregates page views, unique visitors, and time-on-page into per-content statistics |
| `daily_traffic_report` | 7 AM, 7 PM | no | Compiles traffic metrics and posts a summary to Discord |
| `daily_activity_summary` | 6 PM | no | Summarizes daily activity across all services and sends a Discord report |
The Publish Pipeline
The publish_pipeline job is the most important scheduled job. It orchestrates nine sub-steps in sequence:
1. **Content ingestion** -- Reads markdown files from `services/content/` and loads them into the database
2. **Asset copy** -- Copies extension assets (CSS, JS, images) to the output directory
3. **Content prerender** -- Renders markdown content into HTML fragments
4. **Page prerender** -- Generates full HTML pages from templates and prerendered content
5. **Sitemap generation** -- Builds `sitemap.xml` from all published pages
6. **llms.txt generation** -- Creates the `llms.txt` file for AI crawler guidance
7. **robots.txt generation** -- Generates `robots.txt` with current sitemap references
8. **RSS feed generation** -- Builds the RSS/Atom feed from published blog content
9. **Asset organization** -- Moves CSS and JS files into their final distribution paths
Each step runs independently. If one step fails, the pipeline logs the error and continues with the next step. The final JobResult reports how many steps succeeded and how many failed.
Because this job sets run_on_startup() to true, it runs immediately when the application boots, ensuring all HTML is generated before the first request arrives.
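The continue-past-failures pattern can be sketched as follows, with illustrative step names and a simplified `PipelineStats`; this is not the actual pipeline code:

```rust
// Sketch of a pipeline that logs step failures and keeps going,
// tallying results into a simplified PipelineStats.
struct PipelineStats {
    succeeded: u32,
    failed: u32,
}

fn run_pipeline(steps: &[(&str, fn() -> Result<(), String>)]) -> PipelineStats {
    let mut stats = PipelineStats { succeeded: 0, failed: 0 };
    for (name, step) in steps {
        match step() {
            Ok(()) => stats.succeeded += 1,
            Err(e) => {
                // A failed step is logged, but the pipeline keeps going.
                eprintln!("step {name} failed: {e}");
                stats.failed += 1;
            }
        }
    }
    stats
}

fn main() {
    let steps: &[(&str, fn() -> Result<(), String>)] = &[
        ("content_ingestion", || Ok(())),
        ("sitemap_generation", || Err("disk full".into())),
        ("rss_feed_generation", || Ok(())), // still runs after the failure
    ];
    let stats = run_pipeline(steps);
    assert_eq!(stats.succeeded, 2);
    assert_eq!(stats.failed, 1);
    println!("{} succeeded, {} failed", stats.succeeded, stats.failed);
}
```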
Error Handling
The scheduler handles errors at two levels:
Job-level errors. If a job's execute() method returns an error, the scheduler logs it with tracing::error! and moves on. Other jobs continue to run on schedule. The failed execution appears in job history with its error message.
Step-level errors (pipeline jobs). The publish pipeline catches errors from each sub-step individually. A failure in sitemap generation does not prevent RSS feed generation. The pipeline tracks succeeded and failed step counts in its PipelineStats and reports both in the final JobResult.
There is no automatic retry for failed cron executions. If a job fails, it will run again at the next scheduled time. For immediate recovery, trigger the job manually:
```bash
systemprompt infra jobs run publish_pipeline
```
Writing Custom Jobs
Extensions register jobs by implementing the Job trait and calling the submit_job! macro.
The Job Trait
```rust
use anyhow::Result;
use systemprompt::traits::{Job, JobContext, JobResult};

#[derive(Debug, Clone, Copy, Default)]
pub struct MyCustomJob;

#[async_trait::async_trait]
impl Job for MyCustomJob {
    fn name(&self) -> &'static str {
        "my_custom_job"
    }

    fn description(&self) -> &'static str {
        "Brief description of what this job does"
    }

    fn schedule(&self) -> &'static str {
        "0 0 * * * *" // Default schedule (can be overridden in config)
    }

    fn run_on_startup(&self) -> bool {
        false // Set to true if this job should run at application startup
    }

    async fn execute(&self, ctx: &JobContext) -> Result<JobResult> {
        let db_pool = ctx.db_pool::<DbPool>()
            .ok_or_else(|| anyhow::anyhow!("Database not available"))?;

        // Do work here...
        let items_processed = 42;
        let items_failed = 0;

        Ok(JobResult::success()
            .with_stats(items_processed, items_failed))
    }
}

// Register the job with the scheduler
systemprompt::traits::submit_job!(&MyCustomJob);
```
Registering in Configuration
After writing the job, add it to services/scheduler/config.yaml:
```yaml
- name: my_custom_job
  extension: my_extension
  job: my_custom_job
  schedule: "0 0 * * * *"
  enabled: true
```
The name field in the YAML must match the string returned by the job's name() method. The schedule in the YAML overrides the default schedule defined in code.
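The lookup rule can be illustrated with a hypothetical helper (`effective_schedule` is not a real scheduler API): the YAML name must resolve to a registered job, and a YAML schedule, when present, wins over the in-code default.

```rust
use std::collections::HashMap;

// Hypothetical illustration of the matching rule described above:
// `registered` maps a job's name() to its in-code default schedule.
fn effective_schedule(
    registered: &HashMap<String, String>,
    yaml_name: &str,
    yaml_schedule: Option<&str>,
) -> Option<String> {
    // An unknown name means the YAML entry matches no registered job.
    let default = registered.get(yaml_name)?;
    // A schedule in the YAML overrides the in-code default.
    Some(yaml_schedule.unwrap_or(default.as_str()).to_string())
}

fn main() {
    let mut registered = HashMap::new();
    registered.insert("my_custom_job".to_string(), "0 0 * * * *".to_string());

    // YAML schedule overrides the in-code default.
    assert_eq!(
        effective_schedule(&registered, "my_custom_job", Some("0 */30 * * * *")).as_deref(),
        Some("0 */30 * * * *")
    );
    // A typo in the YAML name fails to resolve.
    assert_eq!(effective_schedule(&registered, "my_custom_jb", None), None);
    println!("name matching ok");
}
```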
CLI Reference
| Command | Description |
|---|---|
| `systemprompt infra jobs list` | List all registered jobs and their schedules |
| `systemprompt infra jobs show <name>` | Show details for a specific job |
| `systemprompt infra jobs run <name>` | Execute a job immediately |
| `systemprompt infra jobs history` | View recent job execution history |
| `systemprompt infra jobs history --job <name>` | View history for a specific job |
| `systemprompt infra jobs enable <name>` | Enable a disabled job |
| `systemprompt infra jobs disable <name>` | Disable a job without removing its configuration |
| `systemprompt infra jobs cleanup-sessions` | Shortcut to run session cleanup |
| `systemprompt infra jobs log-cleanup` | Shortcut to run log cleanup |
Run systemprompt infra jobs <command> --help for detailed options.
Integration with Other Services
The scheduler connects to several other systemprompt.io services:
- **Content service** -- The publish pipeline reads markdown from `services/content/` and writes generated HTML to `web/dist/`
- **Web service** -- Content prerendering and page generation use templates registered by the web extension
- **Analytics service** -- The `content_analytics_aggregation` job reads engagement events and writes aggregated metrics
- **Config service** -- Job definitions are loaded through the standard config aggregation pattern from `services/scheduler/config.yaml`
- **Database service** -- Most jobs access the database through `ctx.db_pool()` for reads and writes
- **Discord (external)** -- The traffic report and activity summary jobs post formatted reports to Discord channels
The scheduler operates independently of agents and MCP servers. It handles infrastructure-level maintenance and content automation, not user-facing AI interactions.
Troubleshooting
Job not running. Verify the job is enabled in services/scheduler/config.yaml. Check that the extension is loaded and the job function name matches exactly. Use systemprompt infra jobs list to see which jobs the scheduler knows about.
Job failing silently. Check application logs for the job name:
```bash
systemprompt infra logs view --level error --since 1h
```
Jobs log their execution start, completion, and any errors through tracing.
Cron schedule not working as expected. Remember that systemprompt.io uses six-field cron expressions with seconds as the first field. A schedule of `0 * * * *` (five fields) is invalid. Use `0 0 * * * *` for hourly.
Startup job not running. Both conditions must be true: the job's Rust code returns true from run_on_startup(), and the job has enabled: true in the scheduler config. Check both.
Publish pipeline partially failing. The pipeline continues past individual step failures. Check logs for which step failed. You can often fix the issue and run the pipeline again manually without waiting for the next scheduled execution.
Related Documentation
- Content Service -- How content is stored and managed before publishing
- Web Service -- Templates, prerendering, and the generated site
- Analytics Service -- Engagement tracking that feeds into analytics jobs
- Config Service -- How YAML configuration files are loaded and aggregated