Automating Dynamic Route Generation for Headless Blogs
This page resolves one specific problem: keeping your headless blog’s URL space in perfect sync with your CMS without manual intervention, so bots always find valid, pre-rendered HTML at every published path.
When to use this approach
- Your CMS publishes or unpublishes content at a rate that makes hand-maintained route lists error-prone (more than a few posts per week).
- You have experienced dynamic route generation failures — 404 spikes, orphaned index entries, or crawl errors caused by mismatched slug data between the CMS and the build output.
- Your framework (Next.js, SvelteKit, or Nuxt) uses a static-params or pre-rendering hook that needs a deterministic list of paths at build time.
Implementation steps
Step 1 — Baseline audit before touching routing logic
Establish crawl metrics first. Changing routing without a baseline makes it impossible to detect regressions.
# Export current sitemap URL count
curl -s https://example.com/sitemap.xml | grep -c '<loc>'
# Count published CMS entries (example: Contentful; the Delivery API only returns published entries)
curl -s "https://cdn.contentful.com/spaces/$SPACE_ID/entries?content_type=post&limit=1" \
  -H "Authorization: Bearer $CDA_TOKEN" | jq '.total'
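# Crawl /blog/ for 404s (wget spider mode follows links without saving pages)
wget --spider -r -l 3 -nv -o audit.log https://example.com/blog/
# Review audit.log for 404 responses and broken links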
Validation: Published CMS count and live 200 URL count must agree within 1–2 entries (allow for drafts in transit). Any gap larger than 5% signals slug normalization drift or unpublished entries leaking into the routing layer.
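The tolerance rule above can be written as a small check. This is a minimal sketch: the name `auditGap` and its option names are hypothetical, and the defaults simply reuse the 2-entry slack and 5% alarm level from the validation note.

```javascript
// Hypothetical helper: compare the CMS published count with the live 200 URL count.
// Up to `maxEntries` of difference is treated as drafts in transit; beyond that,
// a relative gap above `maxRatio` is flagged as drift.
function auditGap(cmsCount, liveCount, { maxEntries = 2, maxRatio = 0.05 } = {}) {
  const diff = Math.abs(cmsCount - liveCount);
  const ratio = cmsCount === 0 ? (liveCount === 0 ? 0 : 1) : diff / cmsCount;
  let status = 'investigate';
  if (diff <= maxEntries) status = 'ok';
  else if (ratio > maxRatio) status = 'drift';
  return { diff, ratio, status };
}

console.log(auditGap(200, 199).status);
console.log(auditGap(200, 180).status);
```

Recording the returned object alongside the audit output gives later steps a baseline to compare against.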
Step 2 — Build the route manifest generation script
The script fetches only published entries, normalises every slug, writes a checksummed artifact, and aborts on any error rather than silently truncating output.
// scripts/generate-route-manifest.mjs
import fs from 'node:fs/promises';
import crypto from 'node:crypto';

const CMS_URL = process.env.CMS_URL;
const OUT_PATH = './routes.json';
const PAGE_SIZE = 100;

if (!CMS_URL) throw new Error('CMS_URL is not set');

// Trim, lowercase, and strip surrounding slashes so each slug maps to one path.
function normalizeSlug(slug) {
  const clean = String(slug ?? '').trim().toLowerCase().replace(/^\/+|\/+$/g, '');
  if (!clean) throw new Error(`Empty or missing slug: ${JSON.stringify(slug)}`);
  return clean;
}

// Walk every page of published posts; any failed request aborts the run.
async function fetchAllRoutes() {
  const routes = [];
  let page = 1;
  while (true) {
    const res = await fetch(
      `${CMS_URL}/posts?status=published&page=${page}&pageSize=${PAGE_SIZE}`
    );
    if (!res.ok) throw new Error(`CMS fetch failed: ${res.status} on page ${page}`);
    const { data, hasNextPage } = await res.json();
    routes.push(
      ...data.map((p) => ({ path: `/blog/${normalizeSlug(p.slug)}`, lastmod: p.updatedAt }))
    );
    if (!hasNextPage) break;
    page++;
  }
  return routes;
}

const routes = await fetchAllRoutes();
const json = JSON.stringify(routes, null, 2);
const hash = crypto.createHash('sha256').update(json).digest('hex');
// Write the manifest and its checksum side by side so CI can verify them together.
await fs.writeFile(OUT_PATH, json);
await fs.writeFile(`${OUT_PATH}.sha256`, hash);
console.log(`Wrote ${routes.length} routes. SHA-256: ${hash}`);
Validation command:
node scripts/generate-route-manifest.mjs
# Confirm count matches CMS total
node -e "const r=JSON.parse(require('fs').readFileSync('./routes.json')); console.log(r.length)"
Key differences from a naive implementation: pagination prevents silent truncation at the API's page-size limit, the SHA-256 checksum lets CI detect stale or partial artifacts, and the script throws hard on any non-200 CMS response instead of writing an empty file.
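A further guard can run on the route list before it is written. The sketch below is illustrative only: `checkManifest`, `previousCount`, and the 20% `maxShrink` default are assumptions, not part of the script above.

```javascript
// Hypothetical pre-write check for the routes array.
// `previousCount` would come from the last known-good manifest, if one exists.
function checkManifest(routes, previousCount = 0, maxShrink = 0.2) {
  const problems = [];
  if (routes.length === 0) problems.push('manifest is empty');

  const seen = new Set();
  for (const { path } of routes) {
    const valid =
      typeof path === 'string' && path.startsWith('/blog/') && path.length > '/blog/'.length;
    if (!valid) {
      problems.push(`invalid path: ${String(path)}`);
    } else if (seen.has(path)) {
      problems.push(`duplicate path: ${path}`);
    }
    seen.add(path);
  }

  // A sharp drop in route count usually means a partial fetch, not mass unpublishing.
  if (previousCount > 0 && routes.length < previousCount * (1 - maxShrink)) {
    problems.push(`route count fell from ${previousCount} to ${routes.length}`);
  }
  return problems;
}
```

One way to use it is to throw when the returned array is non-empty, so a suspicious manifest never reaches the build.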
Step 3 — Inject the manifest into the framework routing layer
Next.js App Router
// app/blog/[slug]/page.js
// The bundler resolves JSON imports directly; no import assertion is needed.
import routes from '../../../routes.json';

export async function generateStaticParams() {
  return routes.map((r) => ({
    slug: r.path.replace('/blog/', ''),
  }));
}

// Match CMS publish cadence; 3600 s is safe for blogs updating hourly
export const revalidate = 3600;
SEO impact: Bots receive pre-rendered HTML at every /blog/[slug] path. ISR revalidation keeps content fresh without full rebuilds, reducing the crawl budget cost of discovery.
Validation command:
# Confirm build output contains one pre-rendered HTML file per slug
ls .next/server/app/blog/*.html | wc -l
# Must equal routes.json length
SvelteKit
// src/routes/blog/[slug]/+page.server.js
// $lib resolves to src/lib, so write or copy routes.json there before building.
import routes from '$lib/routes.json';

export function entries() {
  return routes.map((r) => ({ slug: r.path.replace('/blog/', '') }));
}

export const prerender = true;
SEO impact: SvelteKit pre-renders every entry at build time. Unlike Next.js ISR, there is no background revalidation — re-deploy triggers via webhook keep the static output current.
Validation command:
ls .svelte-kit/output/prerendered/pages/blog/ | wc -l
Nuxt
// nuxt.config.ts
import routes from './routes.json';

export default defineNuxtConfig({
  nitro: {
    prerender: {
      routes: routes.map((r) => r.path),
    },
  },
});
SEO impact: Nitro pre-renders each path to static HTML at nuxi generate time. Pair with canonical URL enforcement to prevent the pre-rendered and SSR versions of a path from appearing as separate URLs.
Validation command:
ls .output/public/blog/ | wc -l
Step 4 — Post-deployment diagnostic validation
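# Spot-check 10 random routes for 200 + canonical
# HEAD gives the status line; the canonical tag lives in the HTML body, so fetch that with GET
node -e "
  const r = JSON.parse(require('fs').readFileSync('./routes.json'));
  const sample = r.sort(() => 0.5 - Math.random()).slice(0, 10);
  sample.forEach(({ path }) => console.log('https://example.com' + path));
" | xargs -I{} sh -c 'echo "--- {}"; curl -sI "{}" | grep -E "^HTTP/"; curl -s "{}" | grep -io "<link[^>]*rel=.canonical.[^>]*>" || echo "no canonical found"'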
# Diff manifest count vs sitemap count
MANIFEST=$(node -e "console.log(JSON.parse(require('fs').readFileSync('./routes.json')).length)")
SITEMAP=$(curl -s https://example.com/sitemap.xml | grep -c '<loc>')
echo "Manifest: $MANIFEST Sitemap: $SITEMAP"
What to look for:
- Every sampled route returns HTTP/2 200 and a <link rel="canonical"> pointing to the canonical https:// URL (not a staging domain or preview URL).
- Manifest count and sitemap URL count match within ±2 (the difference accounts for the homepage and non-blog pages in the sitemap).
- Google Search Console's Page indexing report (formerly Coverage) shows no increase in “Crawled – currently not indexed” or “Submitted URL not found (404)” after deployment.
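The canonical check in the first bullet can also be scripted. This is a rough sketch: `extractCanonical` and `checkCanonical` are hypothetical helpers, and the regex parsing assumes the canonical is a single <link> tag with quoted rel and href attributes.

```javascript
// Hypothetical helpers: pull the canonical href out of a page's HTML and compare
// it with the URL the manifest says should be live.
function extractCanonical(html) {
  const tag = html.match(/<link\b[^>]*\brel=["']canonical["'][^>]*>/i);
  if (!tag) return null;
  const href = tag[0].match(/\bhref=["']([^"']+)["']/i);
  return href ? href[1] : null;
}

function checkCanonical(html, expectedUrl) {
  const canonical = extractCanonical(html);
  if (!canonical) return { ok: false, reason: 'missing canonical' };
  if (!canonical.startsWith('https://')) return { ok: false, reason: 'not https' };
  // Ignore a trailing slash so /blog/a and /blog/a/ compare as equal.
  const strip = (u) => u.replace(/\/$/, '');
  if (strip(canonical) !== strip(expectedUrl)) {
    return { ok: false, reason: `points to ${canonical}` };
  }
  return { ok: true, reason: null };
}
```

Feeding it the body of each sampled route, with the manifest path joined to the production origin as `expectedUrl`, surfaces staging or preview canonicals by name.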
Step 5 — Manifest versioning and rollback automation
# In CI/CD — validate checksum before deploying
STORED=$(cat routes.json.sha256)
COMPUTED=$(sha256sum routes.json | awk '{print $1}')
if [ "$STORED" != "$COMPUTED" ]; then
echo "Checksum mismatch — aborting deployment" && exit 1
fi
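# After a successful deploy: tag the commit that holds the known-good manifest
# (routes.json and routes.json.sha256 must be committed for the tag to capture them)
git tag routes-stable-$(date +%Y%m%d)
# On failure, restore both files from the last known-good tag:
# git checkout routes-stable-<date> -- routes.json routes.json.sha256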
# Purge CDN cache for /blog/*
curl -X POST "https://api.cloudflare.com/client/v4/zones/$CF_ZONE/purge_cache" \
-H "Authorization: Bearer $CF_TOKEN" \
-H "Content-Type: application/json" \
--data '{"prefixes":["example.com/blog/"]}'
Validation command:
# Confirm previous manifest is retrievable
git show routes-stable-<date>:routes.json | jq 'length'
SEO impact summary
| Signal | Correctly configured | Misconfigured |
|---|---|---|
| Crawl efficiency | Bots find only published paths; no 404 waste | Draft or deleted paths return 404, eroding crawl budget |
| Index coverage | 1:1 mapping between CMS entries and indexed URLs | Orphaned entries stay indexed; removed content returns 410 too slowly |
| Core Web Vitals | Pre-rendered HTML keeps LCP under 2.5 s for bots and users | CSR-only pages force bots to execute JavaScript; LCP degrades |
| Canonical consistency | Every path carries a self-referencing canonical pointing to https:// | ISR and on-demand paths can generate mismatched canonicals causing duplicate-content signals |
Monitor these signals in GSC after the first automated build: in the Page indexing report (formerly Coverage), the “Indexed” count (previously “Valid”) should trend up or hold steady, and the “Not indexed” count (previously “Excluded”) should not increase.
Edge cases and gotchas
Preview environments leaking into manifests. If CMS_URL resolves to a preview or staging endpoint, draft posts enter the manifest and get pre-rendered on production. Enforce status=published in the API query and pin CMS_URL to the production delivery endpoint via CI/CD secrets — never the management API.
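Pinning the endpoint can be backed by a runtime check at the top of the manifest script. The sketch below is an assumption-laden example: `assertProductionEndpoint` is a hypothetical name, and the allowed host list is a placeholder for whatever delivery host your CMS uses.

```javascript
// Hypothetical guard: refuse to run against anything but an approved delivery host.
function assertProductionEndpoint(cmsUrl, allowedHosts) {
  if (!cmsUrl) throw new Error('CMS_URL is not set');
  const { protocol, hostname } = new URL(cmsUrl);
  if (protocol !== 'https:') throw new Error(`CMS_URL must use https, got ${protocol}`);
  if (!allowedHosts.includes(hostname)) {
    throw new Error(`CMS_URL host "${hostname}" is not an approved delivery host`);
  }
  return hostname;
}

// Example with placeholder hosts: passes for the delivery host, throws for preview.
console.log(assertProductionEndpoint('https://cdn.example-cms.com/v1', ['cdn.example-cms.com']));
```

Failing fast here costs one build, whereas a leaked draft costs a published URL that then has to be removed and de-indexed.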
Multi-locale slug collisions. When a CMS serves content in multiple locales, two posts with the same English slug but different locale prefixes (/en/blog/my-post and /fr/blog/my-post) can both map to the same [slug] param if locale is not included in the manifest path. Add the locale prefix to every path field and adjust generateStaticParams to return { locale, slug }.
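As a sketch of that adjustment, the function below derives { locale, slug } pairs from locale-prefixed manifest paths. The /<locale>/blog/<slug> path shape and the name `toLocaleParams` are assumptions; in Next.js its result would be returned from generateStaticParams in a route such as app/[locale]/blog/[slug]/page.js.

```javascript
// Hypothetical mapper: /fr/blog/my-post -> { locale: 'fr', slug: 'my-post' }.
// Throws on paths without a locale prefix and on repeated locale/slug pairs.
function toLocaleParams(routes) {
  const seen = new Set();
  return routes.map(({ path }) => {
    const match = path.match(/^\/([a-z]{2}(?:-[A-Za-z]{2})?)\/blog\/([^/]+)\/?$/);
    if (!match) throw new Error(`Path lacks a locale prefix: ${path}`);
    const [, locale, slug] = match;
    const key = `${locale}/${slug}`;
    if (seen.has(key)) throw new Error(`Duplicate locale/slug pair: ${key}`);
    seen.add(key);
    return { locale, slug };
  });
}

console.log(toLocaleParams([{ path: '/en/blog/my-post' }, { path: '/fr/blog/my-post' }]));
```

Because the locale is part of the key, the same slug in two locales yields two distinct params instead of one colliding entry.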
Incremental builds caching stale manifests. Vercel, Netlify, and Cloudflare Pages cache build dependencies between runs. If routes.json is generated inside the build step (not committed to the repository), a cache hit can skip manifest regeneration and serve a stale artifact. Either commit the manifest to the repository (so its content change triggers a new build hash) or disable dependency caching for the manifest generation step in your CI config.
Webhook race conditions on bulk CMS operations. Bulk-publishing 50 posts fires 50 rapid webhooks. If each webhook triggers a full rebuild, builds queue up and the last one may use a manifest fetched before all posts were fully written to the CMS delivery API. Add a debounce delay (30–60 seconds) to webhook handling, or use a scheduled manifest refresh every 15 minutes in place of per-post webhooks.
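The debounce option might look like the sketch below. It assumes a long-running webhook receiver (in-memory timers do not survive across serverless invocations); `createRebuildDebouncer` and `triggerBuild` are hypothetical names, and the `timers` parameter exists only so the logic can be exercised without waiting.

```javascript
// Hypothetical trailing-edge debounce: each webhook resets the timer, and one
// build fires after `delayMs` of quiet with the number of webhooks it absorbed.
function createRebuildDebouncer(triggerBuild, delayMs = 45000, timers = globalThis) {
  let handle = null;
  let pending = 0;
  return function onWebhook() {
    pending += 1;
    if (handle !== null) timers.clearTimeout(handle);
    handle = timers.setTimeout(() => {
      const batched = pending;
      handle = null;
      pending = 0;
      triggerBuild(batched);
    }, delayMs);
  };
}
```

With this in place, a bulk publish of 50 posts resolves to a single build that starts only after the CMS has gone quiet for the full delay.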
Slug changes without redirect mapping. If a CMS author changes a post’s slug, the old URL returns 404, and external links and indexed copies of the old URL lose their equity. Enforce immutable slugs at the CMS schema level (read-only after first publish), or detect slug changes in the manifest diff and write the corresponding 301 entries to your redirect configuration before deploying.
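Detecting slug changes from a manifest diff could be sketched as follows. It assumes each manifest entry also carries a stable CMS `id`, which the Step 2 script does not write; `buildRedirects` and the output field names are illustrative rather than any framework's required format.

```javascript
// Hypothetical diff: same id, different path => emit a 301 from old to new.
function buildRedirects(previous, current) {
  const currentById = new Map(current.map((r) => [r.id, r.path]));
  const livePaths = new Set(current.map((r) => r.path));
  const redirects = [];
  for (const { id, path } of previous) {
    const newPath = currentById.get(id);
    // Skip old paths that another live post now occupies; redirecting them would hide it.
    if (newPath && newPath !== path && !livePaths.has(path)) {
      redirects.push({ source: path, destination: newPath, statusCode: 301 });
    }
  }
  return redirects;
}

const before = [{ id: '1', path: '/blog/old' }, { id: '2', path: '/blog/keep' }];
const after = [{ id: '1', path: '/blog/new' }, { id: '2', path: '/blog/keep' }];
console.log(buildRedirects(before, after));
```

Entries present in the previous manifest but missing from the current one are unpublished posts rather than renames, so they produce no redirect here.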
FAQ
How do I validate that all CMS entries generated valid routes post-deployment?
Diff the generated route manifest against the CMS content count using the script above, then run a headless crawler over a random sample to verify 200 status codes and canonical tags.
What is the safest rollback strategy if automated routing breaks indexation?
Maintain a versioned route manifest in Git (tagged before each deploy). On failure: restore the previous manifest tag, redeploy, and purge the CDN cache for /blog/*. The pre-rendered HTML from the previous build remains cached at the edge while the rollback completes.
How do I handle pagination for headless blog archives without duplicate content?
Use self-referencing canonicals on paginated archive pages (/blog/page/2/ canonicals to itself, not to page 1). Apply noindex, follow on pages 2 and beyond so bots traverse links but do not index the paginated views. Full guidance is in pagination handling for headless APIs.
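The rule above can be captured in a small helper. This is a sketch under assumptions: `archiveHead` is a hypothetical name, `origin` is a placeholder, and page 1 is assumed to live at /blog/ rather than /blog/page/1/.

```javascript
// Hypothetical helper: canonical and robots values for archive page N.
// Every page canonicalises to itself; pages 2+ are noindex, follow.
function archiveHead(pageNumber, origin = 'https://example.com') {
  if (!Number.isInteger(pageNumber) || pageNumber < 1) {
    throw new RangeError('pageNumber must be a positive integer');
  }
  const path = pageNumber === 1 ? '/blog/' : `/blog/page/${pageNumber}/`;
  return {
    canonical: `${origin}${path}`,
    robots: pageNumber === 1 ? 'index, follow' : 'noindex, follow',
  };
}

console.log(archiveHead(2));
```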
Related
- Fixing 404s in Headless Dynamic Routes — diagnose and repair broken route resolution in CI/CD pipelines
- Implementing SEO-Friendly Slug Normalization — prevent slug drift that causes the manifest mismatches this guide guards against
- Configuring Next.js ISR for Optimal Crawl Budget — tune revalidation intervals so ISR-generated routes do not waste crawl allocation
- XML Sitemap Generation for Headless — keep the sitemap in sync with the same manifest this pipeline produces