Content Service

Manage blog posts, documentation, and legal pages with markdown authoring, YAML frontmatter, automatic indexing, and built-in SEO. The content service ingests files, stores them in PostgreSQL, and generates sitemaps, RSS feeds, and llms.txt.

TL;DR: The content service turns markdown files into a database-backed content library. You author posts and pages as markdown with YAML frontmatter, the ingestion system parses and stores them in PostgreSQL, and the publish pipeline generates static HTML, sitemaps, RSS feeds, robots.txt, and llms.txt -- all automatically.

What It Does and Why It Matters

Every systemprompt.io application needs content: blog posts for announcements, documentation for guides, legal pages for compliance. Each type has different URL patterns, templates, branding, and SEO requirements.

The content service solves this by providing a single, declarative system for all content types. You define content sources in YAML -- each source points to a directory of markdown files and specifies how those files should be categorized, indexed, and surfaced. The service handles:

  • Parsing -- Reads markdown files and extracts YAML frontmatter metadata
  • Validation -- Checks titles, slugs, descriptions, and content kinds before storing
  • Indexing -- Stores content in PostgreSQL with version hashing for change detection
  • Publishing -- Prerenders static HTML, generates sitemaps, RSS feeds, and llms.txt
  • Orphan cleanup -- Removes database records when source files are deleted

This keeps your content workflow file-based (write markdown, commit, publish) while giving you database-backed querying, search, and analytics.

How Content Flows

Content moves through a multi-stage pipeline:

Markdown files  -->  Ingestion  -->  PostgreSQL  -->  Prerender  -->  Static HTML
                                         |
                                         +--> Sitemap XML
                                         +--> RSS Feed
                                         +--> llms.txt
                                         +--> robots.txt

  1. Authoring -- Create .md files with YAML frontmatter in the source directory
  2. Ingestion -- The ContentIngestionJob walks each source directory, parses frontmatter, computes a SHA-256 version hash, and upserts records into the markdown_content table
  3. Validation -- The ValidationService checks metadata (title length, slug format, content kind, SEO description length) and body content before storage
  4. Storage -- Content records include slug, title, description, body, author, keywords, kind, image, category, tags, and related links -- all queryable via SQL
  5. Prerendering -- The ContentPrerenderJob renders database content into static HTML pages using templates
  6. Sitemap -- The SitemapGenerationJob builds sitemap.xml from all published content, then appends static pages (homepage, feature pages)
  7. RSS and llms.txt -- The pipeline generates an RSS feed and an llms.txt file for AI crawlers

The full PublishPipelineJob runs all these steps in sequence. It executes on startup and every 15 minutes by default.

Content Sources

Each content source defines a collection of related content. Configure them in services/content/config.yaml:

Source          Path                    Content Types                               URL Pattern             Priority
blog            content/blog            blog                                        /blog/{slug}            0.8
documentation   content/documentation   guide, reference, tutorial, docs, feature   /documentation/{slug}   0.7
platform        content/platform        guide, reference, tutorial, docs, feature   /platform/{slug}        0.7
legal           content/legal           legal                                       /legal/{slug}           0.3
skills          skills                  skill                                       N/A (not in sitemap)    --
playbooks       content/playbooks       playbook                                    /playbooks/{slug}       0.7 (disabled)

Add new sources to handle custom content types. Each source gets its own branding, indexing behavior, and sitemap settings.

Markdown and Frontmatter

Every markdown file must begin with a YAML frontmatter block delimited by ---. The ingestion service parses this block into a ContentMetadata struct.

Required Fields

---
title: "Getting Started with systemprompt.io"
slug: "getting-started"
kind: "docs"
published_at: "2026-02-24"
---

Full Frontmatter Example

---
title: "Building AI Agents with systemprompt.io"
description: "Learn how to create, configure, and deploy AI agents using the systemprompt.io agent service."
author: "systemprompt.io Team"
slug: "building-ai-agents"
keywords: "agents, AI, automation, systemprompt"
image: "/files/images/docs/agents-guide.svg"
kind: "docs"
public: true
tags: ["agents", "AI", "guide"]
published_at: "2026-02-20"
updated_at: "2026-02-24"
category: "documentation"
after_reading_this:
  - "Create and configure AI agents"
  - "Deploy agents to production"
related_skills:
  - title: "Agent Creator"
    url: "/skills/agent-creator"
related_code:
  - title: "Agent Extension Source"
    url: "https://github.com/systempromptio/systemprompt-template/blob/main/extensions/agents/"
related_docs:
  - title: "Agent Service Reference"
    url: "/documentation/services/agents"
links:
  - title: "Agent API Documentation"
    url: "/documentation/api/agents"
---
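For intuition, separating the frontmatter block from the body can be sketched like this (a simplified Python sketch; the actual service deserializes the YAML into a ContentMetadata struct, and a real implementation should use a YAML library rather than the naive flat-field scan shown here):

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    # The file must open with a '---' line and close the block with another.
    if not text.startswith("---\n"):
        raise ValueError("file must begin with a '---' frontmatter block")
    end = text.index("\n---", 4)
    return text[4:end], text[end + 4:].lstrip("\n")

def parse_flat_fields(front: str) -> dict[str, str]:
    # Naive parser for flat key: value pairs -- enough for a sketch,
    # not a substitute for a real YAML parser.
    fields: dict[str, str] = {}
    for line in front.splitlines():
        if ":" in line and not line.startswith((" ", "-")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return fields

doc = '---\ntitle: "Getting Started"\nslug: "getting-started"\nkind: "docs"\n---\n\n# Getting Started\n'
front, body = split_frontmatter(doc)
fields = parse_flat_fields(front)
```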

Content Kinds

The kind field determines how content is categorized and rendered. It must be one of:

Kind         Description
blog         Blog posts and announcements
guide        Step-by-step guides
tutorial     Hands-on tutorials
reference    API and configuration reference
docs         General documentation
docs-index   Documentation index pages
docs-list    Documentation listing pages
feature      Feature description pages
legal        Legal and compliance pages

The kind must match one of the allowed_content_types for the source it belongs to.
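This two-level check (a globally known kind that is also permitted by the owning source) can be sketched as follows (Python sketch; the per-source sets are illustrative, taken from the content sources table above):

```python
ALLOWED_KINDS = {
    "blog", "guide", "tutorial", "reference", "docs",
    "docs-index", "docs-list", "feature", "legal",
}

# Illustrative per-source allow-lists, mirroring the content sources table.
SOURCE_ALLOWED_TYPES = {
    "blog": {"blog"},
    "documentation": {"guide", "reference", "tutorial", "docs", "feature"},
    "legal": {"legal"},
}

def validate_kind(kind: str, source: str) -> bool:
    # A kind must be a known value AND listed in the source's
    # allowed_content_types for the file to be accepted.
    return kind in ALLOWED_KINDS and kind in SOURCE_ALLOWED_TYPES.get(source, set())
```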

Configuration Reference

Full Source Configuration

# services/content/config.yaml
content_sources:
  blog:
    path: "content/blog"           # Directory relative to services/content/
    source_id: "blog"              # Unique identifier for this source
    category_id: "blog"            # Links to a category definition
    enabled: true                  # Set false to skip during ingestion
    description: "Blog articles and announcements"
    allowed_content_types: ["blog"]

    branding:
      name: "Blog"
      description: "Articles and updates"
      image: "/images/blog/og-default.png"
      keywords: "blog, articles, announcements, guides"

    indexing:
      clear_before: false          # Clear all records before re-indexing
      recursive: true              # Walk subdirectories

    sitemap:
      enabled: true
      url_pattern: "/blog/{slug}"
      priority: 0.8
      changefreq: "weekly"
      fetch_from: "database"       # Fetch content list from database
      parent_route:
        enabled: true
        url: "/blog"
        priority: 0.9
        changefreq: "daily"

Categories

Categories group content sources for filtering and organization:

categories:
  blog:
    name: "Blog"
    slug: "blog"
  documentation:
    name: "Documentation"
    slug: "documentation"
  platform:
    name: "Platform"
    slug: "platform"
    description: "Platform documentation for the Claude Cowork platform"
  skills:
    name: "Skills"
    slug: "skills"
    description: "Agent skills for AI-powered automation"

Structured Data and Metadata

The content configuration includes structured data for SEO and social sharing:

metadata:
  default_author: "Author"
  language: "en"

  structured_data:
    organization:
      name: "systemprompt.io"
      url: "https://systemprompt.io"
      logo: "https://systemprompt.io/files/images/logo.png"
      description: "The production harness for AI superagents"

    blog:
      type: "Blog"
      name: "Blog"
      url: "https://systemprompt.io/blog"
      description: "Articles and updates"
      article_section: "Technology"
      language: "en-US"

    article:
      type: "BlogPosting"
      language: "en-US"
      article_section: "Technology"

This structured data is injected into page templates as JSON-LD, giving search engines rich information about your content.
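As a rough illustration, injecting that configuration amounts to serializing it into a JSON-LD script tag (Python sketch; jsonld_script is a hypothetical helper, not the service's actual template API):

```python
import json

def jsonld_script(structured: dict) -> str:
    # Prepend the schema.org context and emit a script tag for the page <head>.
    data = {"@context": "https://schema.org", **structured}
    return '<script type="application/ld+json">' + json.dumps(data) + "</script>"

tag = jsonld_script({"@type": "BlogPosting", "headline": "Building AI Agents"})
```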

SEO Features

The content service includes several built-in SEO mechanisms:

  • Validation -- Enforces optimal description length (120-160 characters), rejects descriptions over 500 characters, warns on empty keywords, and requires well-formed slugs (lowercase alphanumeric with hyphens, no double hyphens)
  • Sitemap -- Generates standards-compliant sitemap.xml with configurable priority and change frequency per source; static pages are appended automatically
  • RSS feed -- Creates an RSS feed from blog content for feed readers and syndication
  • llms.txt -- Produces a structured text file for AI crawlers and language models
  • robots.txt -- Generates appropriate crawler directives
  • Structured data -- JSON-LD for organization, blog, and article schema types embedded in templates
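The slug and description rules above can be expressed as a small check (Python sketch of the documented thresholds; the function names are illustrative):

```python
import re

# Lowercase alphanumeric segments joined by single hyphens -- no double hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

def valid_slug(slug: str) -> bool:
    return bool(SLUG_RE.match(slug))

def check_description(desc: str) -> str:
    # Over 500 characters is rejected; outside the 120-160 range only warns.
    if len(desc) > 500:
        return "error"
    if not 120 <= len(desc) <= 160:
        return "warning"
    return "ok"
```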

Automatic Indexing and Database Storage

The ingestion system is built around change detection. Each file's content is hashed with SHA-256 and stored as a version_hash. On subsequent ingestion runs, only changed files are updated.

Key behaviors:

  • Recursive walking -- The ingestion service uses walkdir to traverse source directories, following symlinks and filtering for .md files
  • Frontmatter parsing -- YAML between --- delimiters is deserialized into ContentMetadata
  • Date parsing -- Supports both RFC 3339 (2026-02-24T00:00:00Z) and simple date (2026-02-24) formats
  • Orphan cleanup -- When CONTENT_INGESTION_DELETE_ORPHANS=true, records in the database whose slugs no longer correspond to source files are deleted
  • Version hashing -- Content is only re-written when the SHA-256 hash changes, keeping database writes minimal

Content is stored in the markdown_content PostgreSQL table with columns for all metadata fields, the rendered body, and JSONB columns for structured data (links, related skills, related code, related docs).
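Orphan cleanup is essentially a set difference guarded by the environment flag; roughly (Python sketch; the function is illustrative, not the service API):

```python
import os

def orphaned_slugs(db_slugs, file_slugs, env=os.environ):
    # Only delete when the operator has opted in via the environment flag.
    if env.get("CONTENT_INGESTION_DELETE_ORPHANS") != "true":
        return set()
    # Records whose slugs no longer correspond to any source file.
    return set(db_slugs) - set(file_slugs)
```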

The Publish Pipeline

The PublishPipelineJob orchestrates the entire content lifecycle. It runs these steps in order:

  1. Content ingestion -- Parse and store all markdown files
  2. Asset copy -- Copy extension assets to the dist directory
  3. Content prerender -- Render database content into static HTML
  4. Page prerender -- Render dynamic pages (product pages, feature pages)
  5. Sitemap generation -- Build sitemap.xml from published content
  6. llms.txt generation -- Build llms.txt for AI crawlers
  7. robots.txt generation -- Build robots.txt for search engines
  8. RSS feed generation -- Build RSS feed from blog content
  9. Asset organization -- Organize CSS and JS files in the dist directory

The pipeline runs on startup and every 15 minutes. You can also trigger it manually:

# Run the full publish pipeline
systemprompt core content publish

# Publish a specific source only
systemprompt core content publish --source blog

Content-File Associations

Link files to content with specific roles. This is essential for featured images, attachments, and inline media.

Important: For featured images to display on pages, you must do both:

  1. Link the file to content (content files link)
  2. Set the image field on the content record (content edit --set image=...)

# Link a file as featured image
systemprompt core content files link <file_id> --content <content_id> --role featured

# Set the image field for template display
systemprompt core content edit <content_id> --set image="<public_url>"

# Link as attachment
systemprompt core content files link <file_id> --content <content_id> --role attachment

Available roles: featured, og-image, thumbnail, inline, attachment.

CLI Reference

Command                                               Description
systemprompt core content publish                     Run the full publish pipeline
systemprompt core content publish --source <source>   Publish a specific source
systemprompt core content list                        List content with pagination
systemprompt core content list --source <source>      List content from a specific source
systemprompt core content show <id>                   Show content details
systemprompt core content search <query>              Search content
systemprompt core content edit <id>                   Edit content fields
systemprompt core content delete <id>                 Delete content by ID
systemprompt core content delete-source <source>      Delete all content from a source
systemprompt core content popular                     Get popular content
systemprompt core content verify <id>                 Verify content is published and accessible
systemprompt core content status <source>             Show content health status for a source
systemprompt core content files                       Content-file operations (link, unlink, featured)
systemprompt core content analytics                   Content analytics

Use systemprompt core content <command> --help for detailed options.

Troubleshooting

Content not appearing after publish -- Verify public: true is set in frontmatter, the source is enabled: true in config, and the kind value matches one of the source's allowed_content_types. Run systemprompt core content status <source> to check health.

Wrong URL in sitemap -- Check the url_pattern in the source's sitemap configuration. The {slug} placeholder is replaced with the content slug from frontmatter.

Frontmatter parse errors -- Ensure the file starts with --- on the first line, has valid YAML, and closes with ---. Required fields are title, slug, kind, and published_at.

Orphaned records in database -- Set the environment variable CONTENT_INGESTION_DELETE_ORPHANS=true before running ingestion. This removes records whose slugs no longer exist in the source directories.

SEO warnings during ingestion -- The validation service warns when descriptions are outside the 120-160 character range, when keywords are empty, or when authors are missing. These are warnings, not errors -- content will still be indexed.

Stale content after file changes -- The ingestion service compares SHA-256 hashes. If a file's hash matches the stored version, it is skipped. Force re-ingestion by modifying the file or using clear_before: true in the source's indexing config (use with caution in production).