Content Service
Manage blog posts, documentation, and legal pages with markdown authoring, YAML frontmatter, automatic indexing, and built-in SEO. The content service ingests files, stores them in PostgreSQL, and generates sitemaps, RSS feeds, and llms.txt.
TL;DR: The content service turns markdown files into a database-backed content library. You author posts and pages as markdown with YAML frontmatter, the ingestion system parses and stores them in PostgreSQL, and the publish pipeline generates static HTML, sitemaps, RSS feeds, robots.txt, and llms.txt -- all automatically.
What It Does and Why It Matters
Every systemprompt.io application needs content: blog posts for announcements, documentation for guides, legal pages for compliance. Each type has different URL patterns, templates, branding, and SEO requirements.
The content service solves this by providing a single, declarative system for all content types. You define content sources in YAML -- each source points to a directory of markdown files and specifies how those files should be categorized, indexed, and surfaced. The service handles:
- Parsing -- Reads markdown files and extracts YAML frontmatter metadata
- Validation -- Checks titles, slugs, descriptions, and content kinds before storing
- Indexing -- Stores content in PostgreSQL with version hashing for change detection
- Publishing -- Prerenders static HTML, generates sitemaps, RSS feeds, and llms.txt
- Orphan cleanup -- Removes database records when source files are deleted
This keeps your content workflow file-based (write markdown, commit, publish) while giving you database-backed querying, search, and analytics.
How Content Flows
Content moves through a multi-stage pipeline:
```
Markdown files --> Ingestion --> PostgreSQL --> Prerender --> Static HTML
                                     |
                                     +--> Sitemap XML
                                     +--> RSS Feed
                                     +--> llms.txt
                                     +--> robots.txt
```
- Authoring -- Create `.md` files with YAML frontmatter in the source directory
- Ingestion -- The `ContentIngestionJob` walks each source directory, parses frontmatter, computes a SHA-256 version hash, and upserts records into the `markdown_content` table
- Validation -- The `ValidationService` checks metadata (title length, slug format, content kind, SEO description length) and body content before storage
- Storage -- Content records include slug, title, description, body, author, keywords, kind, image, category, tags, and related links -- all queryable via SQL
- Prerendering -- The `ContentPrerenderJob` renders database content into static HTML pages using templates
- Sitemap -- The `SitemapGenerationJob` builds `sitemap.xml` from all published content, then appends static pages (homepage, feature pages)
- RSS and llms.txt -- The pipeline generates an RSS feed and an `llms.txt` file for AI crawlers
The full `PublishPipelineJob` runs all these steps in sequence. It executes on startup and every 15 minutes by default.
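The ingestion step can be sketched in miniature. The following is an illustrative Python sketch, not the service's actual code (the helper names `split_frontmatter` and `version_hash` are assumptions); it shows the two core ideas: splitting the `---` delimited frontmatter from the body, and hashing the raw file for change detection.

```python
import hashlib

def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a markdown document into (metadata, body).

    Expects the file to start with a ``---`` delimited YAML block.
    """
    if not text.startswith("---\n"):
        raise ValueError("file must start with a YAML frontmatter block")
    _, frontmatter, body = text.split("---\n", 2)
    # A real implementation would use a YAML parser; this handles
    # simple `key: "value"` lines for illustration only.
    meta = {}
    for line in frontmatter.strip().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    return meta, body

def version_hash(text: str) -> str:
    """SHA-256 of the raw file contents, used for change detection."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

doc = '---\ntitle: "Getting Started"\nslug: "getting-started"\nkind: "docs"\npublished_at: "2026-02-24"\n---\n# Hello\n'
meta, body = split_frontmatter(doc)
```

On a real run, the computed hash would be compared against the stored `version_hash` column and the upsert skipped when they match.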
Content Sources
Each content source defines a collection of related content. Configure them in `services/content/config.yaml`:

| Source | Path | Content Types | URL Pattern | Priority |
|---|---|---|---|---|
| `blog` | `content/blog` | blog | `/blog/{slug}` | 0.8 |
| `documentation` | `content/documentation` | guide, reference, tutorial, docs, feature | `/documentation/{slug}` | 0.7 |
| `platform` | `content/platform` | guide, reference, tutorial, docs, feature | `/platform/{slug}` | 0.7 |
| `legal` | `content/legal` | legal | `/legal/{slug}` | 0.3 |
| `skills` | `skills` | skill | N/A (not in sitemap) | -- |
| `playbooks` | `content/playbooks` | playbook | `/playbooks/{slug}` | 0.7 (disabled) |
Add new sources to handle custom content types. Each source gets its own branding, indexing behavior, and sitemap settings.
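To make the URL pattern and priority columns concrete, here is a hedged sketch of how a sitemap `<url>` entry could be assembled from a source's settings. The function name and signature are illustrative assumptions, not the service's actual API.

```python
def sitemap_entry(url_pattern: str, slug: str, priority: float, changefreq: str,
                  base_url: str = "https://systemprompt.io") -> str:
    """Build one sitemap <url> element from a source's sitemap settings."""
    # The {slug} placeholder in the pattern is replaced with the content slug.
    loc = base_url + url_pattern.replace("{slug}", slug)
    return (
        "<url>"
        f"<loc>{loc}</loc>"
        f"<changefreq>{changefreq}</changefreq>"
        f"<priority>{priority}</priority>"
        "</url>"
    )

# Using the blog source's settings from the table above:
entry = sitemap_entry("/blog/{slug}", "getting-started", 0.8, "weekly")
```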
Markdown and Frontmatter
Every markdown file must begin with a YAML frontmatter block delimited by `---`. The ingestion service parses this block into a `ContentMetadata` struct.
Required Fields
```yaml
---
title: "Getting Started with systemprompt.io"
slug: "getting-started"
kind: "docs"
published_at: "2026-02-24"
---
```
Full Frontmatter Example
```yaml
---
title: "Building AI Agents with systemprompt.io"
description: "Learn how to create, configure, and deploy AI agents using the systemprompt.io agent service."
author: "systemprompt.io Team"
slug: "building-ai-agents"
keywords: "agents, AI, automation, systemprompt"
image: "/files/images/docs/agents-guide.svg"
kind: "docs"
public: true
tags: ["agents", "AI", "guide"]
published_at: "2026-02-20"
updated_at: "2026-02-24"
category: "documentation"
after_reading_this:
  - "Create and configure AI agents"
  - "Deploy agents to production"
related_skills:
  - title: "Agent Creator"
    url: "/skills/agent-creator"
related_code:
  - title: "Agent Extension Source"
    url: "https://github.com/systempromptio/systemprompt-template/blob/main/extensions/agents/"
related_docs:
  - title: "Agent Service Reference"
    url: "/documentation/services/agents"
links:
  - title: "Agent API Documentation"
    url: "/documentation/api/agents"
---
```
Content Kinds
The `kind` field determines how content is categorized and rendered. It must be one of:

| Kind | Description |
|---|---|
| `blog` | Blog posts and announcements |
| `guide` | Step-by-step guides |
| `tutorial` | Hands-on tutorials |
| `reference` | API and configuration reference |
| `docs` | General documentation |
| `docs-index` | Documentation index pages |
| `docs-list` | Documentation listing pages |
| `feature` | Feature description pages |
| `legal` | Legal and compliance pages |

The `kind` must match one of the `allowed_content_types` for the source it belongs to.
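The relationship between a source's allowed types and a file's `kind` is a simple lookup. The table data below is taken from the content sources table above; the helper itself is an illustrative sketch, not the service's code.

```python
# Allowed kinds per source, per the content sources table.
ALLOWED_CONTENT_TYPES = {
    "blog": {"blog"},
    "documentation": {"guide", "reference", "tutorial", "docs", "feature"},
    "platform": {"guide", "reference", "tutorial", "docs", "feature"},
    "legal": {"legal"},
    "skills": {"skill"},
    "playbooks": {"playbook"},
}

def kind_allowed(source_id: str, kind: str) -> bool:
    """True if `kind` is valid for the given content source."""
    return kind in ALLOWED_CONTENT_TYPES.get(source_id, set())
```

A file with `kind: "guide"` would therefore be rejected from the `blog` source but accepted by `documentation` or `platform`.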
Configuration Reference
Full Source Configuration
```yaml
# services/content/config.yaml
content_sources:
  blog:
    path: "content/blog"       # Directory relative to services/content/
    source_id: "blog"          # Unique identifier for this source
    category_id: "blog"        # Links to a category definition
    enabled: true              # Set false to skip during ingestion
    description: "Blog articles and announcements"
    allowed_content_types: ["blog"]
    branding:
      name: "Blog"
      description: "Articles and updates"
      image: "/images/blog/og-default.png"
      keywords: "blog, articles, announcements, guides"
    indexing:
      clear_before: false      # Clear all records before re-indexing
      recursive: true          # Walk subdirectories
    sitemap:
      enabled: true
      url_pattern: "/blog/{slug}"
      priority: 0.8
      changefreq: "weekly"
      fetch_from: "database"   # Fetch content list from database
      parent_route:
        enabled: true
        url: "/blog"
        priority: 0.9
        changefreq: "daily"
```
Categories
Categories group content sources for filtering and organization:
```yaml
categories:
  blog:
    name: "Blog"
    slug: "blog"
  documentation:
    name: "Documentation"
    slug: "documentation"
  platform:
    name: "Platform"
    slug: "platform"
    description: "Platform documentation for the Claude Cowork platform"
  skills:
    name: "Skills"
    slug: "skills"
    description: "Agent skills for AI-powered automation"
```
Structured Data and Metadata
The content configuration includes structured data for SEO and social sharing:
```yaml
metadata:
  default_author: "Author"
  language: "en"
  structured_data:
    organization:
      name: "systemprompt.io"
      url: "https://systemprompt.io"
      logo: "https://systemprompt.io/files/images/logo.png"
      description: "The production harness for AI superagents"
    blog:
      type: "Blog"
      name: "Blog"
      url: "https://systemprompt.io/blog"
      description: "Articles and updates"
      article_section: "Technology"
      language: "en-US"
    article:
      type: "BlogPosting"
      language: "en-US"
      article_section: "Technology"
```
This structured data is injected into page templates as JSON-LD, giving search engines rich information about your content.
SEO Features
The content service includes several built-in SEO mechanisms:
- Validation -- Enforces optimal description length (120-160 characters), rejects descriptions over 500 characters, warns on empty keywords, and requires well-formed slugs (lowercase alphanumeric with hyphens, no double hyphens)
- Sitemap -- Generates standards-compliant `sitemap.xml` with configurable priority and change frequency per source; static pages are appended automatically
- RSS feed -- Creates an RSS feed from blog content for feed readers and syndication
- llms.txt -- Produces a structured text file for AI crawlers and language models
- robots.txt -- Generates appropriate crawler directives
- Structured data -- JSON-LD for organization, blog, and article schema types embedded in templates
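The documented validation rules can be sketched roughly as follows; the function name and message strings are assumptions for illustration, not the service's actual output.

```python
import re

# Lowercase alphanumeric segments joined by single hyphens (no double hyphens).
SLUG_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def validate_seo(description: str, slug: str) -> list[str]:
    """Return problems per the documented rules; an empty list means clean."""
    problems = []
    if len(description) > 500:
        problems.append("error: description over 500 characters")
    elif not 120 <= len(description) <= 160:
        problems.append("warning: description outside optimal 120-160 range")
    if not SLUG_RE.match(slug):
        problems.append("error: slug must be lowercase alphanumeric with single hyphens")
    return problems
```

Note the distinction the docs draw: length in the 120-160 band is merely optimal (outside it is a warning), while exceeding 500 characters or a malformed slug is rejected outright.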
Automatic Indexing and Database Storage
The ingestion system is built around change detection. Each file's content is hashed with SHA-256 and stored as a `version_hash`. On subsequent ingestion runs, only changed files are updated.
Key behaviors:
- Recursive walking -- The ingestion service uses `walkdir` to traverse source directories, following symlinks and filtering for `.md` files
- Frontmatter parsing -- YAML between `---` delimiters is deserialized into `ContentMetadata`
- Date parsing -- Supports both RFC 3339 (`2026-02-24T00:00:00Z`) and simple date (`2026-02-24`) formats
- Orphan cleanup -- When `CONTENT_INGESTION_DELETE_ORPHANS=true`, records in the database whose slugs no longer correspond to source files are deleted
- Version hashing -- Content is only re-written when the SHA-256 hash changes, keeping database writes minimal
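The two accepted date formats can be normalized to a single timezone-aware value along these lines; `parse_published_at` is an illustrative helper name, not the service's API.

```python
from datetime import datetime, date, timezone

def parse_published_at(value: str) -> datetime:
    """Accept an RFC 3339 timestamp or a simple YYYY-MM-DD date."""
    if "T" in value:
        # RFC 3339 timestamp, e.g. "2026-02-24T00:00:00Z".
        # fromisoformat() accepts "+00:00" but (pre-3.11) not "Z".
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Simple date, e.g. "2026-02-24" -> midnight UTC.
    d = date.fromisoformat(value)
    return datetime(d.year, d.month, d.day, tzinfo=timezone.utc)

rfc = parse_published_at("2026-02-24T00:00:00Z")
simple = parse_published_at("2026-02-24")
```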
Content is stored in the `markdown_content` PostgreSQL table with columns for all metadata fields, the rendered body, and JSONB columns for structured data (links, related skills, related code, related docs).
The Publish Pipeline
The `PublishPipelineJob` orchestrates the entire content lifecycle. It runs these steps in order:
- Content ingestion -- Parse and store all markdown files
- Asset copy -- Copy extension assets to the dist directory
- Content prerender -- Render database content into static HTML
- Page prerender -- Render dynamic pages (product pages, feature pages)
- Sitemap generation -- Build `sitemap.xml` from published content
- llms.txt generation -- Build `llms.txt` for AI crawlers
- robots.txt generation -- Build `robots.txt` for search engines
- RSS feed generation -- Build RSS feed from blog content
- Asset organization -- Organize CSS and JS files in the dist directory
The pipeline runs on startup and every 15 minutes. You can also trigger it manually:
```bash
# Run the full publish pipeline
systemprompt core content publish

# Publish a specific source only
systemprompt core content publish --source blog
```
Content-File Associations
Link files to content with specific roles. This is essential for featured images, attachments, and inline media.
Important: For featured images to display on pages, you must do both:
1. Link the file to content (`content files link`)
2. Set the `image` field on the content record (`content edit --set image=...`)
```bash
# Link a file as featured image
systemprompt core content files link <file_id> --content <content_id> --role featured

# Set the image field for template display
systemprompt core content edit <content_id> --set image="<public_url>"

# Link as attachment
systemprompt core content files link <file_id> --content <content_id> --role attachment
```
Available roles: `featured`, `og-image`, `thumbnail`, `inline`, `attachment`.
CLI Reference
| Command | Description |
|---|---|
| `systemprompt core content publish` | Run the full publish pipeline |
| `systemprompt core content publish --source <source>` | Publish a specific source |
| `systemprompt core content list` | List content with pagination |
| `systemprompt core content list --source <source>` | List content from a specific source |
| `systemprompt core content show <id>` | Show content details |
| `systemprompt core content search <query>` | Search content |
| `systemprompt core content edit <id>` | Edit content fields |
| `systemprompt core content delete <id>` | Delete content by ID |
| `systemprompt core content delete-source <source>` | Delete all content from a source |
| `systemprompt core content popular` | Get popular content |
| `systemprompt core content verify <id>` | Verify content is published and accessible |
| `systemprompt core content status <source>` | Show content health status for a source |
| `systemprompt core content files` | Content-file operations (link, unlink, featured) |
| `systemprompt core content analytics` | Content analytics |
Use `systemprompt core content <command> --help` for detailed options.
Troubleshooting
Content not appearing after publish -- Verify `public: true` is set in frontmatter, the source is `enabled: true` in config, and the `kind` value matches one of the source's `allowed_content_types`. Run `systemprompt core content status <source>` to check health.
Wrong URL in sitemap -- Check the `url_pattern` in the source's sitemap configuration. The `{slug}` placeholder is replaced with the content slug from frontmatter.
Frontmatter parse errors -- Ensure the file starts with `---` on the first line, has valid YAML, and closes with `---`. Required fields are `title`, `slug`, `kind`, and `published_at`.
Orphaned records in database -- Set the environment variable `CONTENT_INGESTION_DELETE_ORPHANS=true` before running ingestion. This removes records whose slugs no longer exist in the source directories.
SEO warnings during ingestion -- The validation service warns when descriptions are outside the 120-160 character range, when keywords are empty, or when authors are missing. These are warnings, not errors -- content will still be indexed.
Stale content after file changes -- The ingestion service compares SHA-256 hashes. If a file's hash matches the stored version, it is skipped. Force re-ingestion by modifying the file or using `clear_before: true` in the source's indexing config (use with caution in production).
Related Documentation
- Web Service -- Renders content using templates
- Scheduler Service -- Runs the publish pipeline on a schedule
- Files Service -- Manages file storage and uploads
- Config Service -- Aggregates service configuration