Content Service

Manage blog posts, documentation, and legal pages with markdown authoring, YAML frontmatter, automatic indexing, and built-in SEO. The content service ingests files, stores them in PostgreSQL, and generates sitemaps, RSS feeds, and llms.txt.

TL;DR: The content service turns markdown files into a database-backed content library. You author posts and pages as markdown with YAML frontmatter, the ingestion system parses and stores them in PostgreSQL, and the publish pipeline generates static HTML, sitemaps, RSS feeds, robots.txt, and llms.txt -- all automatically.

What It Does and Why It Matters

Every systemprompt.io application needs content: blog posts for announcements, documentation for guides, legal pages for compliance. Each type has different URL patterns, templates, branding, and SEO requirements.

The content service solves this by providing a single, declarative system for all content types. You define content sources in YAML -- each source points to a directory of markdown files and specifies how those files should be categorized, indexed, and surfaced. The service handles:

  • Parsing -- Reads markdown files and extracts YAML frontmatter metadata
  • Validation -- Checks titles, slugs, descriptions, and content kinds before storing
  • Indexing -- Stores content in PostgreSQL with version hashing for change detection
  • Publishing -- Prerenders static HTML, generates sitemaps, RSS feeds, and llms.txt
  • Orphan cleanup -- Removes database records when source files are deleted

This keeps your content workflow file-based (write markdown, commit, publish) while giving you database-backed querying, search, and analytics.

How Content Flows

Content moves through a multi-stage pipeline:

Markdown files  -->  Ingestion  -->  PostgreSQL  -->  Prerender  -->  Static HTML
                                         |
                                         +--> Sitemap XML
                                         +--> RSS Feed
                                         +--> llms.txt
                                         +--> robots.txt

  1. Authoring -- Create .md files with YAML frontmatter in the source directory
  2. Ingestion -- The ContentIngestionJob walks each source directory, parses frontmatter, computes a SHA-256 version hash, and upserts records into the markdown_content table
  3. Validation -- The ValidationService checks metadata (title length, slug format, content kind, SEO description length) and body content before storage
  4. Storage -- Content records include slug, title, description, body, author, keywords, kind, image, category, tags, and related links -- all queryable via SQL
  5. Prerendering -- The ContentPrerenderJob renders database content into static HTML pages using templates
  6. Sitemap -- The SitemapGenerationJob builds sitemap.xml from all published content, then appends static pages (homepage, feature pages)
  7. RSS and llms.txt -- The pipeline generates an RSS feed and an llms.txt file for AI crawlers

The full PublishPipelineJob runs all these steps in sequence. It executes on startup and every 15 minutes by default.

Content Sources

Each content source defines a collection of related content. Configure them in services/content/config.yaml:

Source          Path                    Content Types                               URL Pattern             Priority
blog            content/blog            blog                                        /blog/{slug}            0.8
documentation   content/documentation   guide, reference, tutorial, docs, feature   /documentation/{slug}   0.7
platform        content/platform        guide, reference, tutorial, docs, feature   /platform/{slug}        0.7
legal           content/legal           legal                                       /legal/{slug}           0.3
skills          skills                  skill                                       N/A (not in sitemap)    --
playbooks       content/playbooks       playbook                                    /playbooks/{slug}       0.7 (disabled)

Add new sources to handle custom content types. Each source gets its own branding, indexing behavior, and sitemap settings.

Markdown and Frontmatter

Every markdown file must begin with a YAML frontmatter block delimited by ---. The ingestion service parses this block into a ContentMetadata struct.

Required Fields

---
title: "Getting Started with systemprompt.io"
slug: "getting-started"
kind: "docs"
published_at: "2026-02-24"
---

Full Frontmatter Example

---
title: "Building AI Agents with systemprompt.io"
description: "Learn how to create, configure, and deploy AI agents using the systemprompt.io agent service."
author: "systemprompt.io Team"
slug: "building-ai-agents"
keywords: "agents, AI, automation, systemprompt"
image: "/files/images/docs/agents-guide.svg"
kind: "docs"
public: true
tags: ["agents", "AI", "guide"]
published_at: "2026-02-20"
updated_at: "2026-02-24"
category: "documentation"
after_reading_this:
  - "Create and configure AI agents"
  - "Deploy agents to production"
related_skills:
  - title: "Agent Creator"
    url: "/skills/agent-creator"
related_code:
  - title: "Agent Extension Source"
    url: "https://github.com/systempromptio/systemprompt-template/blob/main/extensions/agents/"
related_docs:
  - title: "Agent Service Reference"
    url: "/documentation/services/agents"
links:
  - title: "Agent API Documentation"
    url: "/documentation/api/agents"
---
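For intuition, separating the frontmatter block from the body can be sketched like this (a simplified Python sketch; the actual service deserializes the YAML into a ContentMetadata struct, and a real implementation should use a YAML library rather than the naive flat-field scan shown here):

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    # The file must open with a '---' line and close the block with another.
    if not text.startswith("---\n"):
        raise ValueError("file must begin with a '---' frontmatter block")
    end = text.index("\n---", 4)
    return text[4:end], text[end + 4:].lstrip("\n")

def parse_flat_fields(front: str) -> dict[str, str]:
    # Naive parser for flat key: value pairs -- enough for a sketch,
    # not a substitute for a real YAML parser.
    fields: dict[str, str] = {}
    for line in front.splitlines():
        if ":" in line and not line.startswith((" ", "-")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return fields

doc = '---\ntitle: "Getting Started"\nslug: "getting-started"\nkind: "docs"\n---\n\n# Getting Started\n'
front, body = split_frontmatter(doc)
fields = parse_flat_fields(front)
```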

Content Kinds

The kind field determines how content is categorized and rendered. It must be one of:

Kind         Description
blog         Blog posts and announcements
guide        Step-by-step guides
tutorial     Hands-on tutorials
reference    API and configuration reference
docs         General documentation
docs-index   Documentation index pages
docs-list    Documentation listing pages
feature      Feature description pages
legal        Legal and compliance pages

The kind must match one of the allowed_content_types for the source it belongs to.
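This two-level check (a globally known kind that is also permitted by the owning source) can be sketched as follows (Python sketch; the per-source sets are illustrative, taken from the content sources table above):

```python
ALLOWED_KINDS = {
    "blog", "guide", "tutorial", "reference", "docs",
    "docs-index", "docs-list", "feature", "legal",
}

# Illustrative per-source allow-lists, mirroring the content sources table.
SOURCE_ALLOWED_TYPES = {
    "blog": {"blog"},
    "documentation": {"guide", "reference", "tutorial", "docs", "feature"},
    "legal": {"legal"},
}

def validate_kind(kind: str, source: str) -> bool:
    # A kind must be a known value AND listed in the source's
    # allowed_content_types for the file to be accepted.
    return kind in ALLOWED_KINDS and kind in SOURCE_ALLOWED_TYPES.get(source, set())
```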

Configuration Reference

Full Source Configuration

# services/content/config.yaml
content_sources:
  blog:
    path: "content/blog"           # Directory relative to services/content/
    source_id: "blog"              # Unique identifier for this source
    category_id: "blog"            # Links to a category definition
    enabled: true                  # Set false to skip during ingestion
    description: "Blog articles and announcements"
    allowed_content_types: ["blog"]

    branding:
      name: "Blog"
      description: "Articles and updates"
      image: "/images/blog/og-default.png"
      keywords: "blog, articles, announcements, guides"

    indexing:
      clear_before: false          # Clear all records before re-indexing
      recursive: true              # Walk subdirectories

    sitemap:
      enabled: true
      url_pattern: "/blog/{slug}"
      priority: 0.8
      changefreq: "weekly"
      fetch_from: "database"       # Fetch content list from database
      parent_route:
        enabled: true
        url: "/blog"
        priority: 0.9
        changefreq: "daily"

Categories

Categories group content sources for filtering and organization:

categories:
  blog:
    name: "Blog"
    slug: "blog"
  documentation:
    name: "Documentation"
    slug: "documentation"
  platform:
    name: "Platform"
    slug: "platform"
    description: "Platform documentation for the Claude Cowork platform"
  skills:
    name: "Skills"
    slug: "skills"
    description: "Agent skills for AI-powered automation"

Structured Data and Metadata

The content configuration includes structured data for SEO and social sharing:

metadata:
  default_author: "Author"
  language: "en"

  structured_data:
    organization:
      name: "systemprompt.io"
      url: "https://systemprompt.io"
      logo: "https://systemprompt.io/files/images/logo.png"
      description: "The production harness for AI superagents"

    blog:
      type: "Blog"
      name: "Blog"
      url: "https://systemprompt.io/blog"
      description: "Articles and updates"
      article_section: "Technology"
      language: "en-US"

    article:
      type: "BlogPosting"
      language: "en-US"
      article_section: "Technology"

This structured data is injected into page templates as JSON-LD, giving search engines rich information about your content.
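As a rough illustration, injecting that configuration amounts to serializing it into a JSON-LD script tag (Python sketch; jsonld_script is a hypothetical helper, not the service's actual template API):

```python
import json

def jsonld_script(structured: dict) -> str:
    # Prepend the schema.org context and emit a script tag for the page <head>.
    data = {"@context": "https://schema.org", **structured}
    return '<script type="application/ld+json">' + json.dumps(data) + "</script>"

tag = jsonld_script({"@type": "BlogPosting", "headline": "Building AI Agents"})
```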

SEO Features

The content service includes several built-in SEO mechanisms:

  • Validation -- Enforces optimal description length (120-160 characters), rejects descriptions over 500 characters, warns on empty keywords, and requires well-formed slugs (lowercase alphanumeric with hyphens, no double hyphens)
  • Sitemap -- Generates standards-compliant sitemap.xml with configurable priority and change frequency per source; static pages are appended automatically
  • RSS feed -- Creates an RSS feed from blog content for feed readers and syndication
  • llms.txt -- Produces a structured text file for AI crawlers and language models
  • robots.txt -- Generates appropriate crawler directives
  • Structured data -- JSON-LD for organization, blog, and article schema types embedded in templates
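The slug and description rules above can be expressed as a small check (Python sketch of the documented thresholds; the function names are illustrative):

```python
import re

# Lowercase alphanumeric segments joined by single hyphens -- no double hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

def valid_slug(slug: str) -> bool:
    return bool(SLUG_RE.match(slug))

def check_description(desc: str) -> str:
    # Over 500 characters is rejected; outside the 120-160 range only warns.
    if len(desc) > 500:
        return "error"
    if not 120 <= len(desc) <= 160:
        return "warning"
    return "ok"
```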

Automatic Indexing and Database Storage

The ingestion system is built around change detection. Each file's content is hashed with SHA-256 and stored as a version_hash. On subsequent ingestion runs, only changed files are updated.

Key behaviors:

  • Recursive walking -- The ingestion service uses walkdir to traverse source directories, following symlinks and filtering for .md files
  • Frontmatter parsing -- YAML between --- delimiters is deserialized into ContentMetadata
  • Date parsing -- Supports both RFC 3339 (2026-02-24T00:00:00Z) and simple date (2026-02-24) formats
  • Orphan cleanup -- When CONTENT_INGESTION_DELETE_ORPHANS=true, records in the database whose slugs no longer correspond to source files are deleted
  • Version hashing -- Content is only re-written when the SHA-256 hash changes, keeping database writes minimal

Content is stored in the markdown_content PostgreSQL table with columns for all metadata fields, the rendered body, and JSONB columns for structured data (links, related skills, related code, related docs).
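Orphan cleanup is essentially a set difference guarded by the environment flag; roughly (Python sketch; the function is illustrative, not the service API):

```python
import os

def orphaned_slugs(db_slugs, file_slugs, env=os.environ):
    # Only delete when the operator has opted in via the environment flag.
    if env.get("CONTENT_INGESTION_DELETE_ORPHANS") != "true":
        return set()
    # Records whose slugs no longer correspond to any source file.
    return set(db_slugs) - set(file_slugs)
```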

The Publish Pipeline

The PublishPipelineJob orchestrates the entire content lifecycle. It runs these steps in order:

  1. Content ingestion -- Parse and store all markdown files
  2. Asset copy -- Copy extension assets to the dist directory
  3. Content prerender -- Render database content into static HTML
  4. Page prerender -- Render dynamic pages (product pages, feature pages)
  5. Sitemap generation -- Build sitemap.xml from published content
  6. llms.txt generation -- Build llms.txt for AI crawlers
  7. robots.txt generation -- Build robots.txt for search engines
  8. RSS feed generation -- Build RSS feed from blog content
  9. Asset organization -- Organize CSS and JS files in the dist directory

The pipeline runs on startup and every 15 minutes. You can also trigger it manually:

# Run the full publish pipeline
systemprompt core content publish

# Publish a specific source only
systemprompt core content publish --source blog

Content-File Associations

Link files to content with specific roles. This is essential for featured images, attachments, and inline media.

Important: For featured images to display on pages, you must do both:

  1. Link the file to content (content files link)
  2. Set the image field on the content record (content edit --set image=...)

# Link a file as featured image
systemprompt core content files link <file_id> --content <content_id> --role featured

# Set the image field for template display
systemprompt core content edit <content_id> --set image="<public_url>"

# Link as attachment
systemprompt core content files link <file_id> --content <content_id> --role attachment

Available roles: featured, og-image, thumbnail, inline, attachment.

CLI Reference

Command                                               Description
systemprompt core content publish                     Run the full publish pipeline
systemprompt core content publish --source <source>   Publish a specific source
systemprompt core content list                        List content with pagination
systemprompt core content list --source <source>      List content from a specific source
systemprompt core content show <id>                   Show content details
systemprompt core content search <query>              Search content
systemprompt core content edit <id>                   Edit content fields
systemprompt core content delete <id>                 Delete content by ID
systemprompt core content delete-source <source>      Delete all content from a source
systemprompt core content popular                     Get popular content
systemprompt core content verify <id>                 Verify content is published and accessible
systemprompt core content status <source>             Show content health status for a source
systemprompt core content files                       Content-file operations (link, unlink, featured)
systemprompt core content analytics                   Content analytics

Use systemprompt core content <command> --help for detailed options.

Troubleshooting

Content not appearing after publish -- Verify public: true is set in frontmatter, the source is enabled: true in config, and the kind value matches one of the source's allowed_content_types. Run systemprompt core content status <source> to check health.

Wrong URL in sitemap -- Check the url_pattern in the source's sitemap configuration. The {slug} placeholder is replaced with the content slug from frontmatter.

Frontmatter parse errors -- Ensure the file starts with --- on the first line, has valid YAML, and closes with ---. Required fields are title, slug, kind, and published_at.

Orphaned records in database -- Set the environment variable CONTENT_INGESTION_DELETE_ORPHANS=true before running ingestion. This removes records whose slugs no longer exist in the source directories.

SEO warnings during ingestion -- The validation service warns when descriptions are outside the 120-160 character range, when keywords are empty, or when authors are missing. These are warnings, not errors -- content will still be indexed.

Stale content after file changes -- The ingestion service compares SHA-256 hashes. If a file's hash matches the stored version, it is skipped. Force re-ingestion by modifying the file or using clear_before: true in the source's indexing config (use with caution in production).