Fixing 404s in Headless Dynamic Routes

Headless routing pipelines introduce multiple failure points between a CMS slug and an HTTP 200 response — and each one can hand a crawler a 404 that wastes crawl budget and shrinks index coverage.

When to apply this fix

Apply this guide when at least one of the following is true:

Google Search Console coverage reports show a non-trivial volume of “Not found (404)” URLs that were previously indexed or are listed in your sitemap.
Your server logs show crawler requests hitting 404s on routes that exist in the CMS but were not included in the last static build.
A deployment or content deletion event has caused a spike in 404 responses visible in CDN access logs or uptime monitoring.

Step 1 — Establish a 404 baseline before touching any config

Pull 30 days of server access logs and compare them against the URL coverage report in Google Search Console. Record the following numbers before you make any changes; you will need them to prove the fix worked:

404 response rate — target: below 0.5% of total crawler requests
CDN cache-hit ratio — target: above 85%
Route generation latency — target: below 200 ms per route at build time

# Count 404s from nginx/CDN combined log (adjust field index to match your format)
awk '$9 == "404" {count++} END {print count+0 " 404 responses"}' access.log

# Inspect one URL's index status via the GSC URL Inspection API
# (requires an OAuth access token in $GSC_TOKEN; the API inspects one URL per call
# and does not export the full coverage report)
curl -s -X POST "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect" \
  -H "Authorization: Bearer $GSC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inspectionUrl": "https://example.com/blog/my-post", "siteUrl": "https://example.com/"}'

Validation: Run both commands before and after each fix in the steps below. A passing baseline is a 404 rate below 0.5% with no trending increase over the last 7 days in GSC.
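
The awk one-liner counts every 404, while the 0.5% target is defined against crawler requests only. A minimal sketch of that calculation follows, assuming combined log format and assuming the crawler can be recognised by a user-agent substring (user agents can be spoofed, so treat the result as an estimate). The function name crawler404Rate is illustrative, not part of any tool.

```javascript
// Sketch: 404 rate among crawler requests in combined-format access log lines.
// Assumption: the crawler is identified by a user-agent pattern (default: Googlebot).
function crawler404Rate(lines, crawlerPattern = /Googlebot/i) {
  // Combined format: ... "METHOD path PROTO" STATUS BYTES "referer" "user-agent"
  const linePattern = /"[A-Z]+ \S+ [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/;
  let total = 0;
  let notFound = 0;
  for (const line of lines) {
    const match = linePattern.exec(line);
    if (!match) continue; // skip lines that do not parse
    const [, status, userAgent] = match;
    if (!crawlerPattern.test(userAgent)) continue; // ignore non-crawler traffic
    total++;
    if (status === '404') notFound++;
  }
  return { total, notFound, rate: total === 0 ? 0 : notFound / total };
}
```

Feed it the lines of access.log (for example, the file contents split on newlines) and compare rate against the 0.005 target before and after each fix.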

Step 2 — Audit the route manifest against CMS slug outputs

Dynamic route generation relies on a contract between what the CMS outputs as a slug and what the frontend’s routing table expects. Any case-sensitivity difference, URL-encoding gap, or trailing-slash inconsistency breaks that contract.

Extract the raw slugs from your CMS API or from captured webhook payloads.
Dump the current route manifest (Next.js: .next/prerender-manifest.json for concrete prerendered paths, .next/server/pages-manifest.json or the App Router’s route tree for route patterns; SvelteKit: build/server/manifest.js).
Diff the two lists for discrepancies.

// scripts/diff-slugs.mjs
// Run: node scripts/diff-slugs.mjs
import { readFileSync } from 'node:fs';

// Replace with your CMS API endpoint or a local export of slug payloads
const res = await fetch(`${process.env.CMS_URL}/api/slugs`);
if (!res.ok) throw new Error(`CMS request failed: ${res.status}`);
const cmsSlugs = await res.json(); // string[]

// prerender-manifest.json lists concrete prerendered paths under `routes`;
// routes-manifest.json only holds patterns such as /blog/[slug]
const manifest = JSON.parse(readFileSync('.next/prerender-manifest.json', 'utf-8'));
const builtRoutes = new Set(Object.keys(manifest.routes));

const mismatches = cmsSlugs.filter(
  (slug) => !builtRoutes.has(`/blog/${slug.toLowerCase()}`)
);
console.log(`Slug mismatches: ${mismatches.length}`);
mismatches.forEach((s) => console.log(' -', s));

Validation: The script should output Slug mismatches: 0. Any slug that appears in the mismatch list is a live 404 risk. Fix the upstream normalisation first — do not patch the manifest manually.

The root cause is almost always a missing lowercase transform. Apply it at the CMS API level so it is enforced at publish time, not only in the frontend resolver. Implementing SEO-friendly slug normalisation covers the full normalisation pipeline.
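
As a rough illustration of the transform, the sketch below covers the three contract breakers named earlier in this step: case, URL-encoding, and stray slashes. The function name normaliseSlug and the exact rule set are assumptions; your CMS may need additional rules (for example, whitespace or diacritics handling).

```javascript
// Sketch of a slug normaliser. Assumption: slugs arrive as plain strings that
// may be percent-encoded, mixed-case, or wrapped in slashes.
function normaliseSlug(raw) {
  let slug = raw.trim();
  try {
    slug = decodeURIComponent(slug); // "Caf%C3%A9" becomes "Café"
  } catch {
    // Malformed percent-encoding: keep the raw value rather than failing the publish
  }
  return slug
    .toLowerCase()
    .replace(/^\/+|\/+$/g, ''); // drop leading and trailing slashes
}
```

Running the same function in the CMS publish hook and in the frontend resolver keeps both sides of the contract identical.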

Step 3 — Fix ISR fallback and on-demand revalidation in Next.js

Stale Incremental Static Regeneration cache is the most common source of 404s after a content publish. If the CDN returns a cached response before ISR has had a chance to regenerate the route, the crawler receives a stale 404.

The fix has two parts: ensure dynamicParams = true so unknown slugs are generated on first request, and call notFound() when the slug does not resolve — never return a soft 404 (HTTP 200 with an empty template).

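// app/blog/[slug]/page.tsx (Next.js 15+, App Router)
import { notFound } from 'next/navigation';
// Assumption: fetchPost is your own CMS client helper (hypothetical path) that
// resolves to null for unknown slugs
import { fetchPost } from '@/lib/cms';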

// Pre-generate known slugs at build time from the CMS
export async function generateStaticParams(): Promise<Array<{ slug: string }>> {
  const slugs: string[] = await fetch(`${process.env.CMS_URL}/api/slugs`).then(
    (r) => r.json()
  );
  // Normalise here so the manifest matches what the CMS will later emit
  return slugs.map((slug) => ({ slug: slug.toLowerCase() }));
}

// Allow on-demand generation for slugs not in generateStaticParams
export const dynamicParams = true;
// Revalidate the cached page every 5 minutes
export const revalidate = 300;

export default async function Page({
  params,
}: {
  params: Promise<{ slug: string }>;
}) {
  const { slug } = await params; // params is a Promise in Next.js 15+
  const post = await fetchPost(slug.toLowerCase());

  // Must call notFound() — returning null silently renders a soft 404
  if (!post) notFound();

  return <article>{post.content}</article>;
}

Validation:

# 1. Confirm a slug that was NOT in the last build returns 200 after first hit
curl -I https://example.com/blog/new-post-slug

# 2. Confirm a genuinely deleted slug returns 404 (not 200)
curl -I https://example.com/blog/deleted-post-slug

# 3. Verify the custom not-found page itself returns the correct status
curl -I https://example.com/blog/slug-that-does-not-exist
# Expected: HTTP/2 404

Pair this with an on-demand revalidation webhook so newly published content is pre-warmed before the first crawler hit arrives. The ISR vs SSG vs CSR routing page covers the tradeoffs in choosing a revalidation interval.
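
One way such a webhook could be structured is sketched below. Everything about the webhook contract is an assumption: the payload shape ({ slug }), the x-webhook-secret header, and the idea of also revalidating a /blog listing page are illustrative. The revalidatePath function is passed in as a parameter here; in a Next.js App Router project it would be the function exported by next/cache.

```javascript
// Decide what to revalidate for one webhook call (pure, so it is easy to test).
// Assumptions: payload is { slug: string }; the secret arrives in a request header.
function planRevalidation(headerSecret, expectedSecret, payload) {
  if (!expectedSecret || headerSecret !== expectedSecret) {
    return { status: 401, paths: [] };
  }
  if (!payload || typeof payload.slug !== 'string' || payload.slug === '') {
    return { status: 400, paths: [] };
  }
  const slug = payload.slug.toLowerCase(); // same normalisation as the page route
  // Revalidate the post and (assumed) the listing page that links to it.
  return { status: 200, paths: [`/blog/${slug}`, '/blog'] };
}

// Build a POST handler, e.g. for app/api/revalidate/route.js (hypothetical path):
//   export const POST = createWebhookHandler(process.env.REVALIDATE_SECRET, revalidatePath);
function createWebhookHandler(expectedSecret, revalidatePath) {
  return async function POST(request) {
    const payload = await request.json().catch(() => null); // tolerate invalid JSON
    const plan = planRevalidation(
      request.headers.get('x-webhook-secret'),
      expectedSecret,
      payload
    );
    plan.paths.forEach((path) => revalidatePath(path));
    return new Response(JSON.stringify(plan), {
      status: plan.status,
      headers: { 'content-type': 'application/json' },
    });
  };
}
```

Keeping the decision logic separate from the handler means the secret check and path mapping can be unit-tested without a running Next.js server.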

Step 4 — Intercept trailing-slash 404s at the CDN edge

A trailing-slash discrepancy between CMS permalink settings and framework routing config is responsible for a large fraction of apparent 404s — the content exists, but the URL /blog/my-post/ and /blog/my-post resolve to different cache keys. Fix this once at the edge rather than in every framework config.

// cloudflare-worker.js — deploy via wrangler or Cloudflare Dashboard
export default {
  async fetch(request) {
    const url = new URL(request.url);

    // 301-redirect trailing-slash variants to the canonical non-slash form
    // Exception: root path ("/") must not be redirected
    if (url.pathname !== '/' && url.pathname.endsWith('/')) {
      url.pathname = url.pathname.slice(0, -1);
      return Response.redirect(url.toString(), 301);
    }

    return fetch(request);
  },
};

Validation:

# Should return 301 Location: https://example.com/blog/my-post
curl -I https://example.com/blog/my-post/

# Should return 200 at the canonical URL
curl -IL https://example.com/blog/my-post/

This edge fix consolidates canonical URL enforcement to one place and prevents duplicate 404 entries accumulating in GSC for both slash and non-slash variants of the same path.
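
The worker above removes one slash per request, so a URL such as /blog/my-post// would need two redirects. A possible variant, sketched here as a standalone helper with the hypothetical name canonicalRedirectTarget, strips all trailing slashes in one pass and returns null when no redirect is needed.

```javascript
// Returns the canonical URL to redirect to, or null when the URL is already canonical.
function canonicalRedirectTarget(requestUrl) {
  const url = new URL(requestUrl);
  // Remove every trailing slash at once so repeated slashes need a single 301.
  const stripped = url.pathname.replace(/\/+$/, '');
  const canonicalPath = stripped === '' ? '/' : stripped; // keep the root path
  if (canonicalPath === url.pathname) return null;
  url.pathname = canonicalPath;
  return url.toString(); // the query string is preserved
}
```

Inside the worker's fetch handler, a non-null result would be passed to Response.redirect with status 301, and a null result would fall through to fetch(request).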

Step 5 — Validate the full route manifest and set a rollback threshold

Run a lightweight route-validation script against your deployed manifest before every production release. Integrating it into CI/CD catches mass 404s before they reach crawlers.

// scripts/validate-routes.mjs
// Usage: node scripts/validate-routes.mjs
import { readFileSync } from 'fs';

const BASE_URL = process.env.SITE_URL ?? 'https://example.com';
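const urls = JSON.parse(readFileSync('./route-manifest.json', 'utf-8')); // [{ path: '/blog/slug' }]
if (urls.length === 0) {
  // Without this guard, 0 / 0 is NaN and the threshold check below would pass silently
  console.error('ERROR: route manifest is empty, nothing to validate');
  process.exit(1);
}

let failures = 0;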
for (const { path } of urls) {
  const res = await fetch(`${BASE_URL}${path}`);
  if (res.status === 404) {
    console.error(`FAIL [404]: ${path}`);
    failures++;
  } else if (res.status !== 200) {
    console.warn(`WARN [${res.status}]: ${path}`);
  }
}

if (failures / urls.length > 0.02) {
  console.error(`ERROR: 404 rate ${((failures / urls.length) * 100).toFixed(1)}% exceeds 2% threshold — blocking deploy`);
  process.exit(1);
}
console.log(`Validation complete. ${failures} failures out of ${urls.length} routes.`);

Validation: The script exits with code 1 when the 404 rate exceeds 2%. Run it as a required CI step before deployment so the non-zero exit code blocks the release automatically.

Rollback protocol:

Revert to the previous stable deployment via your CI/CD platform.
Trigger a full CDN cache purge across all edge nodes.
Restore the last verified route manifest snapshot.
Re-run the validation script against staging before promoting to production.

SEO impact summary

Signal	Correctly configured	Misconfigured
GSC index coverage	404 URLs drop out of coverage report within 1–2 crawl cycles	Indexed 404s accumulate, wasting crawl budget allocation
Crawl budget	Crawler skips dead URLs quickly, allocating budget to live pages	Repeated 404s train Googlebot to reduce crawl frequency
Link equity	Internal links point to 200 pages; equity flows correctly	Links pointing to 404s lose equity and create broken UX
Rich results eligibility	Clean status codes keep structured data eligible for rich results	Soft 404s (200 status, no content) cause structured data to be ignored

The measurable signals to watch over the 30 days after deploying fixes: GSC coverage errors trending towards zero, the 404 rate in CDN logs dropping below 0.5% of crawler requests (the Step 1 target), and crawl rate in server logs stabilising or increasing.

Edge cases and gotchas

Preview environments. CMS preview URLs often use unpublished slugs that do not exist in the route manifest. If your preview domain is discoverable (no X-Robots-Tag: noindex), Googlebot may crawl preview slugs and log them as 404s once the draft is deleted. Block preview domains at the CDN layer or add noindex headers to all preview responses.
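
A small sketch of the header approach follows. The hostname patterns are assumptions (a preview. subdomain and *.vercel.app deployments are used purely as examples), and withPreviewNoindex is a hypothetical helper name; swap in whatever identifies your preview domains.

```javascript
// Assumption: preview deployments are recognisable by hostname. Adjust these patterns.
const PREVIEW_HOST_PATTERNS = [/^preview\./, /\.vercel\.app$/];

function isPreviewHost(hostname) {
  return PREVIEW_HOST_PATTERNS.some((pattern) => pattern.test(hostname));
}

// Returns a copy of the response headers, adding X-Robots-Tag on preview hosts only.
function withPreviewNoindex(hostname, headersInit) {
  const headers = new Headers(headersInit);
  if (isPreviewHost(hostname)) {
    headers.set('X-Robots-Tag', 'noindex');
  }
  return headers;
}
```

In an edge worker or middleware, the returned headers would be attached to the outgoing response so production hosts stay indexable while preview hosts do not.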

Multi-locale routing. In Next.js i18n config, the locale prefix is prepended before the slug: /en/blog/my-post. A slug normalisation script that strips the locale prefix before checking the manifest will produce false-positive mismatches. Ensure the diff script accounts for locale prefixes.
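
One way to make the diff locale-aware is to expand each CMS slug into the full set of paths the build should contain, then compare those against the manifest. The sketch below assumes the default locale is served without a prefix unless prefixDefault is set; the function name and parameters are illustrative.

```javascript
// Expand CMS slugs into every locale-prefixed path the build is expected to contain.
// Assumptions: `locales` mirrors your i18n config; posts live under /blog.
function expectedPaths(slugs, locales, defaultLocale, prefixDefault = false) {
  const paths = [];
  for (const locale of locales) {
    // The default locale is unprefixed unless prefixDefault is true
    const prefix = locale === defaultLocale && !prefixDefault ? '' : `/${locale}`;
    for (const slug of slugs) {
      paths.push(`${prefix}/blog/${slug.toLowerCase()}`);
    }
  }
  return paths;
}
```

Any expected path missing from the built routes is then a genuine mismatch rather than a locale-prefix artefact.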

Incremental builds in SSG frameworks. Gatsby and older Next.js Pages Router builds do not regenerate routes that were not touched in the current build. A content deletion in the CMS will not automatically remove the stale static file from the output directory. Use a post-build cleanup step that deletes any static HTML file whose slug is no longer present in the CMS manifest.
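
The selection half of such a cleanup step might look like the sketch below. It assumes the static output stores each post at blog/<slug>/index.html (relative to the output directory) and only decides which files are stale; deleting them is left to the caller. The name findStaleFiles is hypothetical.

```javascript
// Sketch: pick static HTML files whose slug no longer exists in the CMS.
// Assumption: posts are emitted as blog/<slug>/index.html relative to the output dir.
function findStaleFiles(htmlFiles, cmsSlugs) {
  const live = new Set(cmsSlugs.map((slug) => slug.toLowerCase()));
  return htmlFiles.filter((file) => {
    const match = /^blog\/([^/]+)\/index\.html$/.exec(file);
    if (!match) return false; // not a post page, so never a deletion candidate
    return !live.has(match[1].toLowerCase());
  });
}
```

Log the returned list and review it on the first few runs before wiring it to an actual delete, since a failed CMS request that returns an empty slug list would mark every post as stale.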

Soft 404 detection. A custom 404 page that returns HTTP 200 never shows up as a 404 in server logs, yet it wastes crawl budget in the same way as a true 404 once Google detects the content mismatch. Audit with curl -I and the GSC URL Inspection tool. In Next.js App Router, the not-found.tsx file in the app/ directory handles this automatically; in Pages Router, verify pages/404.js is present and the server returns status 404.
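
For bulk audits, a rough heuristic can flag likely soft 404s from a status code and response body. The sketch below is only that: the "not found" phrases and the 200-character minimum are illustrative assumptions, not how Google classifies pages, and classifyResponse is a hypothetical name.

```javascript
// Heuristic classifier for one fetched page. Thresholds and phrases are assumptions.
function classifyResponse(status, html) {
  if (status === 404) return 'hard-404';
  if (status !== 200) return 'other';
  // Reduce the document to visible text: drop scripts, then tags, then extra whitespace
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  const saysNotFound = /page not found|could not be found/i.test(text);
  if (saysNotFound || text.length < 200) return 'soft-404';
  return 'ok';
}
```

Treat a soft-404 result as a prompt to inspect the URL manually, since short but legitimate pages will also trip the length check.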

Frequently asked questions

How do I tell the difference between a true 404 and a soft 404? True 404s return an HTTP 404 status code. Soft 404s return 200 with a “not found” UI — Google treats them as valid pages and may index the empty shell. Check with curl -I <url> and cross-reference the GSC URL Inspection tool: if the declared status is 200 but the page has no indexable content, you have a soft 404.

What ISR revalidation interval prevents stale 404s without hammering the origin? Use a revalidate value of 60 to 300 seconds for high-frequency editorial content and pair it with on-demand revalidation via CMS webhooks. ISR serves the cached page while it regenerates in the background, so crawlers receive content that is at most about one revalidation window old, while the origin is hit roughly once per interval for each route rather than on every request.

Can a 404 spike hurt domain authority? Indexed 404s waste crawl budget and signal poor site health. Google’s documentation notes that URLs returning 404 for extended periods are dropped from the index. A short-lived spike during a deployment is unlikely to cause lasting damage, but sustained 404 rates above 1–2% of indexed URLs will reduce crawl frequency and index coverage over time.

Related

Automating Dynamic Route Generation for Headless Blogs — pre-build all known slugs from the CMS before the first crawler hit
Implementing SEO-Friendly Slug Normalisation — enforce lowercase and character rules at the source
Canonical URL Enforcement — consolidate trailing-slash and case variants to a single canonical path
Crawl Budget Impact in Headless — understand how 404 accumulation reduces Googlebot’s crawl allocation