Automating Dynamic Route Generation for Headless Blogs

This page resolves one specific problem: keeping your headless blog’s URL space in perfect sync with your CMS without manual intervention, so bots always find valid, pre-rendered HTML at every published path.

When to use this approach

  • Your CMS publishes or unpublishes content at a rate that makes hand-maintained route lists error-prone (more than a few posts per week).
  • You have experienced dynamic route generation failures — 404 spikes, orphaned index entries, or crawl errors caused by mismatched slug data between the CMS and the build output.
  • Your framework (Next.js, SvelteKit, or Nuxt) uses a static-params or pre-rendering hook that needs a deterministic list of paths at build time.

Figure: Headless blog route generation pipeline. Five stages run left to right: ① audit (CMS API: published posts) → ② generate (Manifest Script: fetch · normalise · checksum) → ③ artifact (routes.json: versioned artifact) → ④ inject (Framework Build: generateStaticParams / getStaticPaths) → ⑤ serve (CDN / Edge: pre-rendered HTML). A webhook trigger connects the CMS API to the Manifest Script. Failure path: checksum mismatch → halt CI → revert Git tag → CDN purge.

Implementation steps

Step 1 — Baseline audit before touching routing logic

Establish crawl metrics first. Changing routing without a baseline makes it impossible to detect regressions.

# Export current sitemap URL count
curl -s https://example.com/sitemap.xml | grep -c '<loc>'

# Count published CMS entries (example: Contentful)
curl -s "https://cdn.contentful.com/spaces/$SPACE_ID/entries?content_type=post&fields.status=published&limit=1" \
  -H "Authorization: Bearer $CDA_TOKEN" | jq '.total'

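# Check every /blog/ URL listed in the sitemap and record non-200 responses
curl -s https://example.com/sitemap.xml \
  | grep -oE '<loc>[^<]+' | sed 's/<loc>//' | grep '/blog/' \
  | xargs -I{} curl -s -o /dev/null -w '%{http_code} {}\n' {} \
  | grep -v '^200 ' > audit.txt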

Validation: Published CMS count and live 200 URL count must agree within 1–2 entries (allow for drafts in transit). Any gap larger than 5% signals slug normalization drift or unpublished entries leaking into the routing layer.
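
The tolerance rule above can be written down as a small check. This is a minimal sketch, assuming both counts have already been collected with the commands in this step; `classifyCountGap` and the middle "investigate" verdict are illustrative and not part of any tool.

```javascript
// Hypothetical helper: classify the gap between the CMS's published count
// and the number of live URLs that return 200.
function classifyCountGap(cmsCount, liveCount) {
  const gap = Math.abs(cmsCount - liveCount);
  // Guard against division by zero when the CMS reports no entries.
  const ratio = cmsCount === 0 ? (gap === 0 ? 0 : 1) : gap / cmsCount;
  if (gap <= 2) return { gap, ratio, verdict: 'ok' };        // drafts in transit
  if (ratio > 0.05) return { gap, ratio, verdict: 'drift' }; // slug drift or leaked drafts
  return { gap, ratio, verdict: 'investigate' };             // assumed middle band
}

console.log(classifyCountGap(100, 99).verdict);
console.log(classifyCountGap(100, 90).verdict);
```

Recording the verdict alongside the raw counts gives later steps a baseline to compare against.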


Step 2 — Build the route manifest generation script

The script fetches only published entries, normalises every slug, writes a checksummed artifact, and aborts on any error rather than silently truncating output.

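// scripts/generate-route-manifest.mjs
import fs from 'node:fs/promises';
import crypto from 'node:crypto';

const CMS_URL = process.env.CMS_URL;
const OUT_PATH = './routes.json';
const PAGE_SIZE = 100;

if (!CMS_URL) throw new Error('CMS_URL is not set');

// Assumed normalisation rule: trim, lowercase, strip leading/trailing slashes.
// Adjust it to match the slug format your CMS enforces.
function normaliseSlug(slug) {
  const clean = String(slug ?? '').trim().toLowerCase().replace(/^\/+|\/+$/g, '');
  if (!clean) throw new Error(`Empty slug after normalisation: ${JSON.stringify(slug)}`);
  return clean;
}

// Walk every page of published posts; any failed request aborts the run.
async function fetchAllRoutes() {
  const routes = [];
  let page = 1;
  while (true) {
    const res = await fetch(
      `${CMS_URL}/posts?status=published&page=${page}&pageSize=${PAGE_SIZE}`
    );
    if (!res.ok) throw new Error(`CMS fetch failed: ${res.status} on page ${page}`);
    const { data, hasNextPage } = await res.json();
    routes.push(
      ...data.map((p) => ({ path: `/blog/${normaliseSlug(p.slug)}`, lastmod: p.updatedAt }))
    );
    if (!hasNextPage) break;
    page++;
  }
  return routes;
}

const routes = await fetchAllRoutes();
const json = JSON.stringify(routes, null, 2);
// Hash the exact bytes written to disk so `sha256sum routes.json` reproduces it.
const hash = crypto.createHash('sha256').update(json).digest('hex');

await fs.writeFile(OUT_PATH, json);
await fs.writeFile(`${OUT_PATH}.sha256`, hash);
console.log(`Wrote ${routes.length} routes. SHA-256: ${hash}`);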

Validation command:

node scripts/generate-route-manifest.mjs
# Confirm count matches CMS total
node -e "const r=JSON.parse(require('fs').readFileSync('./routes.json')); console.log(r.length)"

Key differences from a naive implementation: paginating through every page prevents silent truncation at the API's page-size limit, the SHA-256 checksum lets CI detect stale or partial artifacts, and the script throws on any non-2xx CMS response instead of writing an empty file.
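
Normalisation can collapse two distinct CMS slugs onto one path (for example, if the rule lowercases, `My-Post` and `my-post`). A minimal sketch of a collision check that could run before the manifest is written follows; `findDuplicatePaths` is a hypothetical helper, and it assumes manifest entries have the `{ path, lastmod }` shape produced above.

```javascript
// Hypothetical helper: return every path that appears more than once.
function findDuplicatePaths(routes) {
  const seen = new Set();
  const duplicates = new Set();
  for (const { path } of routes) {
    if (seen.has(path)) duplicates.add(path);
    seen.add(path);
  }
  return [...duplicates];
}

const sample = [
  { path: '/blog/my-post', lastmod: '2024-01-01' },
  { path: '/blog/other', lastmod: '2024-01-02' },
  { path: '/blog/my-post', lastmod: '2024-01-03' },
];
console.log(findDuplicatePaths(sample));
```

Treating a non-empty result as a hard failure keeps the "abort rather than truncate" behaviour consistent: a duplicate means one post would silently shadow another.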


Step 3 — Inject the manifest into the framework routing layer

Next.js App Router

// app/blog/[slug]/page.js
// Next.js bundles JSON imports directly; the deprecated `assert` syntax is not needed
import routes from '../../../routes.json';

export async function generateStaticParams() {
  return routes.map((r) => ({
    slug: r.path.replace('/blog/', ''),
  }));
}

// Match CMS publish cadence; 3600 s is safe for blogs updating hourly
export const revalidate = 3600;

SEO impact: Bots receive pre-rendered HTML at every /blog/[slug] path. ISR revalidation keeps content fresh without full rebuilds, reducing the crawl budget cost of discovery.

Validation command:

# Confirm build output contains one pre-rendered HTML file per slug
# (the directory also holds .rsc and .meta files, so count only *.html)
ls .next/server/app/blog/*.html | wc -l
# Must equal routes.json length
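
By default the App Router still renders slugs that are missing from `generateStaticParams` on demand. If the manifest should be the only source of valid paths, the `dynamicParams` route segment option can be set in the same file. This is a configuration sketch, and whether it suits you depends on whether newly published posts may appear before the next manifest build.

```javascript
// app/blog/[slug]/page.js (same file as generateStaticParams)
// With dynamicParams = false, any slug not returned by generateStaticParams
// responds with 404 instead of being rendered on demand.
export const dynamicParams = false;
```

With this set, a path dropped from routes.json stops resolving after the next build rather than lingering as an on-demand page.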

SvelteKit

// src/routes/blog/[slug]/+page.server.js
import routes from '$lib/routes.json';

export function entries() {
  return routes.map((r) => ({ slug: r.path.replace('/blog/', '') }));
}

export const prerender = true;

SEO impact: SvelteKit pre-renders every entry at build time. Unlike Next.js ISR, there is no background revalidation — re-deploy triggers via webhook keep the static output current.

Validation command:

ls .svelte-kit/output/prerendered/pages/blog/ | wc -l

Nuxt

// nuxt.config.ts
import routes from './routes.json';

export default defineNuxtConfig({
  nitro: {
    prerender: {
      routes: routes.map((r) => r.path),
    },
  },
});

SEO impact: Nitro pre-renders each path to static HTML at nuxi generate time. Pair with canonical URL enforcement to prevent the pre-rendered and SSR versions of a path from appearing as separate URLs.

Validation command:

ls .output/public/blog/ | wc -l

Step 4 — Post-deployment diagnostic validation

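# Spot-check 10 random routes for a 200 status and a canonical tag
node -e "
  const r = JSON.parse(require('fs').readFileSync('./routes.json'));
  const sample = r.sort(() => 0.5 - Math.random()).slice(0, 10);
  sample.forEach(({ path }) => console.log('https://example.com' + path));
" | while read -r url; do
  echo "--- $url"
  # Status line comes from the response headers
  curl -sI "$url" | head -n 1
  # The canonical tag lives in the HTML body, so fetch the page itself
  curl -s "$url" | grep -io '<link[^>]*rel="canonical"[^>]*>'
done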

# Diff manifest count vs sitemap count
MANIFEST=$(node -e "console.log(JSON.parse(require('fs').readFileSync('./routes.json')).length)")
SITEMAP=$(curl -s https://example.com/sitemap.xml | grep -c '<loc>')
echo "Manifest: $MANIFEST  Sitemap: $SITEMAP"

What to look for:

  • Every sampled route returns HTTP/2 200 and a <link rel="canonical"> pointing to the canonical https:// URL (not a staging domain or preview URL).
  • Manifest count and sitemap URL count match within ±2 (the difference accounts for the homepage and non-blog pages in the sitemap).
  • Google Search Console's Page indexing report (formerly Coverage) shows no increase in “Crawled – currently not indexed” or “Submitted URL not found (404)” after deployment.
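
The canonical check in the first bullet can also be automated. Below is a minimal sketch, assuming the page HTML has already been fetched as a string; `checkCanonical` is a hypothetical helper, and the regex-based extraction is a simplification that a real HTML parser would handle more robustly.

```javascript
// Hypothetical helper: verify that a page's canonical tag points at the
// expected production URL (ignoring a trailing slash).
function checkCanonical(html, expectedUrl) {
  const tag = html.match(/<link\b[^>]*\brel=["']canonical["'][^>]*>/i);
  if (!tag) return { ok: false, reason: 'missing canonical' };
  const href = tag[0].match(/\bhref=["']([^"']+)["']/i);
  if (!href) return { ok: false, reason: 'canonical has no href' };
  const canonical = href[1];
  if (!canonical.startsWith('https://')) {
    return { ok: false, reason: 'not https', canonical };
  }
  const strip = (u) => u.replace(/\/+$/, '');
  if (strip(canonical) !== strip(expectedUrl)) {
    return { ok: false, reason: 'points elsewhere', canonical };
  }
  return { ok: true, canonical };
}

const page = '<head><link rel="canonical" href="https://staging.example.com/blog/a"></head>';
console.log(checkCanonical(page, 'https://example.com/blog/a'));
```

The "points elsewhere" result is the one that catches staging or preview domains leaking into production output.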

Step 5 — Manifest versioning and rollback automation

# In CI/CD — validate checksum before deploying
STORED=$(cat routes.json.sha256)
COMPUTED=$(sha256sum routes.json | awk '{print $1}')
if [ "$STORED" != "$COMPUTED" ]; then
  echo "Checksum mismatch — aborting deployment" && exit 1
fi

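# After a successful deploy: tag the commit that holds the known-good manifest
# (assumes routes.json and routes.json.sha256 are committed; git tag takes a
# commit, not file paths)
git tag "routes-stable-$(date +%Y%m%d)"
# On failure: restore the manifest and its checksum from the last stable tag
# git checkout routes-stable-<date> -- routes.json routes.json.sha256
# Then purge the CDN cache for /blog/*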
curl -X POST "https://api.cloudflare.com/client/v4/zones/$CF_ZONE/purge_cache" \
  -H "Authorization: Bearer $CF_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"prefixes":["example.com/blog/"]}'

Validation command:

# Confirm previous manifest is retrievable
git show routes-stable-<date>:routes.json | jq 'length'

SEO impact summary

Signal | Correctly configured | Misconfigured
--- | --- | ---
Crawl efficiency | Bots find only published paths; no 404 waste | Draft or deleted paths return 404, eroding crawl budget
Index coverage | 1:1 mapping between CMS entries and indexed URLs | Orphaned entries stay indexed; removed content returns 410 too slowly
Core Web Vitals | Pre-rendered HTML keeps LCP under 2.5 s for bots and users | CSR-only pages force bots to execute JavaScript; LCP degrades
Canonical consistency | Every path carries a self-referencing canonical pointing to https:// | ISR and on-demand paths can generate mismatched canonicals causing duplicate-content signals

Monitor these signals in GSC after the first automated build: Coverage → “Valid” count should trend up (or hold steady) and “Excluded” count should not increase.


Edge cases and gotchas

Preview environments leaking into manifests. If CMS_URL resolves to a preview or staging endpoint, draft posts enter the manifest and get pre-rendered on production. Enforce status=published in the API query and pin CMS_URL to the production delivery endpoint via CI/CD secrets — never the management API.

Multi-locale slug collisions. When a CMS serves content in multiple locales, two posts with the same English slug but different locale prefixes (/en/blog/my-post and /fr/blog/my-post) can both map to the same [slug] param if locale is not included in the manifest path. Add the locale prefix to every path field and adjust generateStaticParams to return { locale, slug }.
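
A minimal sketch of the locale-aware mapping described above, assuming manifest paths take the form `/<locale>/blog/<slug>` with two-letter or `xx-YY` locale codes; `pathToParams` and `toStaticParams` are hypothetical names.

```javascript
// Hypothetical helper: split a locale-prefixed manifest path into route params.
function pathToParams(path) {
  const match = /^\/([a-z]{2}(?:-[a-z]{2})?)\/blog\/([^/]+)\/?$/i.exec(path);
  if (!match) throw new Error(`Unexpected manifest path: ${path}`);
  return { locale: match[1], slug: match[2] };
}

// Shape suitable for returning from generateStaticParams in a
// [locale]/blog/[slug] route (the route layout is an assumption).
function toStaticParams(routes) {
  return routes.map((r) => pathToParams(r.path));
}

console.log(toStaticParams([
  { path: '/en/blog/my-post' },
  { path: '/fr/blog/my-post' },
]));
```

Throwing on an unexpected path keeps a manifest that lacks locale prefixes from silently producing colliding params.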

Incremental builds caching stale manifests. Vercel, Netlify, and Cloudflare Pages cache build dependencies between runs. If routes.json is generated inside the build step (not committed to the repository), a cache hit can skip manifest regeneration and serve a stale artifact. Either commit the manifest to the repository (so its content change triggers a new build hash) or disable dependency caching for the manifest generation step in your CI config.

Webhook race conditions on bulk CMS operations. Bulk-publishing 50 posts fires 50 rapid webhooks. If each webhook triggers a full rebuild, builds queue up and the last one may use a manifest fetched before all posts were fully written to the CMS delivery API. Add a debounce delay (30–60 seconds) to webhook handling, or use a scheduled manifest refresh every 15 minutes in place of per-post webhooks.
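
The debounce can be sketched as a trailing-edge timer: each webhook resets the countdown, and only the last one in a burst starts a build. This assumes a long-lived process receives the webhooks (a stateless serverless function would need external state instead); `createDebouncedTrigger`, `triggerBuild`, and the 45-second default are illustrative.

```javascript
// Hypothetical helper: collapse a burst of webhooks into one build.
// `timers` is injectable so the behaviour can be exercised without waiting.
function createDebouncedTrigger(
  triggerBuild,
  delayMs = 45_000,
  timers = {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (id) => clearTimeout(id),
  }
) {
  let pending = null;
  return function onWebhook() {
    // A newer webhook cancels the countdown started by the previous one.
    if (pending !== null) timers.clear(pending);
    pending = timers.set(() => {
      pending = null;
      triggerBuild();
    }, delayMs);
  };
}

// Usage: call onWebhook() from the webhook route handler.
const onWebhook = createDebouncedTrigger(() => console.log('build triggered'), 50);
onWebhook();
onWebhook();
```

Because the build starts only after the CMS has been quiet for the full delay, the manifest fetch is more likely to see every post from the bulk operation.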

Slug changes without redirect mapping. If a CMS author changes a post’s slug, the old URL returns 404 and any external links or indexed copies of the old URL lose equity. Enforce immutable slugs at the CMS schema level (read-only after first publish), or detect slug changes in the manifest diff and write corresponding 301 entries to your redirect chain management config before deploying.
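
Detecting a slug change requires a stable identifier, which the manifest shown earlier does not carry. The sketch below assumes each entry is extended with an `id` field from the CMS; `detectSlugChanges` and the `{ source, destination, statusCode }` output shape are hypothetical and would need mapping to your redirect config format.

```javascript
// Hypothetical helper: compare two manifests by entry id and emit a 301
// for every entry whose path changed. Entries that disappeared are skipped.
function detectSlugChanges(previous, current) {
  const currentById = new Map(current.map((r) => [r.id, r.path]));
  const redirects = [];
  for (const { id, path } of previous) {
    const newPath = currentById.get(id);
    if (newPath !== undefined && newPath !== path) {
      redirects.push({ source: path, destination: newPath, statusCode: 301 });
    }
  }
  return redirects;
}

const before = [{ id: '1', path: '/blog/old-title' }, { id: '2', path: '/blog/unchanged' }];
const after = [{ id: '1', path: '/blog/new-title' }, { id: '2', path: '/blog/unchanged' }];
console.log(detectSlugChanges(before, after));
```

Running this against the last stable manifest before each deploy turns a silent 404 into an explicit redirect entry to review.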


FAQ

How do I validate that all CMS entries generated valid routes post-deployment? Diff the generated route manifest against the CMS content count using the script above, then run a headless crawler over a random sample to verify 200 status codes and canonical tags.

What is the safest rollback strategy if automated routing breaks indexation? Maintain a versioned route manifest in Git (tagged before each deploy). On failure: restore the previous manifest tag, redeploy, and purge the CDN cache for /blog/*. The pre-rendered HTML from the previous build remains cached at the edge while the rollback completes.

How do I handle pagination for headless blog archives without duplicate content? Use self-referencing canonicals on paginated archive pages (/blog/page/2/ points its canonical at itself, not at page 1). Apply noindex, follow on pages 2 and beyond so bots traverse links but do not index the paginated views. Full guidance is in pagination handling for headless APIs.
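
The pagination rule can be sketched as a function that returns the head metadata for an archive page. It assumes page 1 lives at `/blog/` and later pages at `/blog/page/<n>/`; `archivePageMeta` is a hypothetical helper.

```javascript
// Hypothetical helper: canonical and robots values for a paginated archive page.
function archivePageMeta(page, origin) {
  if (!Number.isInteger(page) || page < 1) {
    throw new Error(`Invalid page number: ${page}`);
  }
  // Every page canonicalises to itself, never to page 1.
  const canonical = page === 1 ? `${origin}/blog/` : `${origin}/blog/page/${page}/`;
  // Pages 2+ stay crawlable for link discovery but out of the index.
  const robots = page === 1 ? 'index, follow' : 'noindex, follow';
  return { canonical, robots };
}

console.log(archivePageMeta(2, 'https://example.com'));
```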

