Edge Caching Behavior for SEO

CDN edge nodes sit between search crawlers and your origin server, so every cache misconfiguration becomes an indexation problem. This page covers how to set Cache-Control directives correctly for headless deployments, eliminate the Vary fragmentation that wastes crawl budget, and wire invalidation so bots always receive fresh HTML without hammering your origin.

Prerequisites

Before adjusting edge caching, confirm the following are in place:

Framework version: Next.js 13+, SvelteKit 2+, or Nuxt 3+ (earlier versions lack granular route-level cache header APIs)
CDN access: admin access to Cloudflare, Fastly, or Vercel Edge Network dashboard to create cache rules and purge policies
curl and jq installed locally for header inspection
CMS webhook endpoint: your headless CMS (Contentful, Sanity, Hygraph, etc.) must support publish webhooks for cache invalidation
Environment variables: CDN_PURGE_API_KEY and CDN_ZONE_ID available as secrets in your build pipeline

How Edge Caching Interacts with Search Bots

The diagram below shows the request path Googlebot follows when hitting a headless site with CDN edge nodes in front of the origin.

The key SEO insight: once a page enters the edge cache, Googlebot receives the same HTML snapshot on every recrawl until s-maxage expires or a purge fires. Misconfigured Vary headers or absent s-maxage values break that consistency.
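
To make the snapshot behaviour concrete, here is a minimal sketch of a single edge node as an in-memory map. It is a toy model, not a CDN API: the EdgeCache class, its method names, and the origin function are all assumptions made for illustration.

```javascript
// Toy model of one edge node (illustrative only, not a real CDN API).
class EdgeCache {
  constructor(sMaxageSeconds) {
    this.sMaxageSeconds = sMaxageSeconds;
    this.entries = new Map(); // url -> { body, storedAt }
  }

  // `renderAtOrigin` stands in for the origin server.
  fetch(url, nowSeconds, renderAtOrigin) {
    const entry = this.entries.get(url);
    if (entry && nowSeconds - entry.storedAt < this.sMaxageSeconds) {
      return { status: 'HIT', body: entry.body };
    }
    const body = renderAtOrigin(url);
    this.entries.set(url, { body, storedAt: nowSeconds });
    return { status: 'MISS', body };
  }

  purge(url) {
    this.entries.delete(url);
  }
}

// The origin content changes after the first crawl.
let originVersion = 1;
const renderOrigin = () => `<h1>v${originVersion}</h1>`;
const edge = new EdgeCache(300);

const firstCrawl = edge.fetch('/blog/my-article', 0, renderOrigin);
originVersion = 2;
const recrawl = edge.fetch('/blog/my-article', 10, renderOrigin);
edge.purge('/blog/my-article');
const afterPurge = edge.fetch('/blog/my-article', 11, renderOrigin);
```

In this model the recrawl is a HIT that still carries the old body, and only the purge (or the 300-second expiry) lets the new version through.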

Step-by-Step Implementation Workflow

Step 1 — Audit your current cache posture

curl -sI https://yourdomain.com/ | grep -iE "cache-control|vary|x-cache|cf-cache|age"

Expected output for a correctly cached static route:

cache-control: public, s-maxage=300, stale-while-revalidate=86400
vary: Accept-Encoding
cf-cache-status: HIT
age: 47

Any Vary: User-Agent, Vary: Cookie, or missing s-maxage is a defect to fix before moving on.
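
The same audit can be scripted once the headers are in hand. The sketch below assumes the headers have already been collected into a plain object with lower-case names (a hypothetical shape); parseCacheControl and auditCacheHeaders are illustrative helper names, not part of any library.

```javascript
// Parse a Cache-Control value into a { directive: value | true } map.
function parseCacheControl(value) {
  const directives = {};
  for (const part of (value || '').split(',')) {
    const [name, arg] = part.trim().split('=');
    if (name) directives[name.toLowerCase()] = arg === undefined ? true : arg;
  }
  return directives;
}

// Return the list of defects named in Step 1 for one response.
function auditCacheHeaders(headers) {
  const defects = [];
  const directives = parseCacheControl(headers['cache-control']);
  if (!('s-maxage' in directives)) defects.push('missing s-maxage');

  const varyFields = (headers['vary'] || '')
    .split(',')
    .map((field) => field.trim().toLowerCase())
    .filter(Boolean);
  for (const field of varyFields) {
    if (field === 'user-agent' || field === 'cookie') defects.push(`Vary: ${field}`);
  }
  return defects;
}

const healthy = auditCacheHeaders({
  'cache-control': 'public, s-maxage=300, stale-while-revalidate=86400',
  vary: 'Accept-Encoding',
});
const broken = auditCacheHeaders({
  'cache-control': 'public, max-age=60',
  vary: 'Accept-Encoding, User-Agent',
});
```

Here healthy is an empty list, while broken reports both the missing s-maxage and the User-Agent entry in Vary.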

Step 2 — Map route patterns to TTL tiers

Classify every route in your app into one of three tiers:

Tier	Route pattern	Recommended `s-maxage`	`stale-while-revalidate`
Static	`/`, `/about`, build-time blog posts	3600 s (1 hr)	86400 s (24 hr)
Semi-dynamic	ISR-eligible routes, product pages	300 s (5 min)	86400 s (24 hr)
Dynamic	User sessions, previews, cart	0 / `no-store`	—

The rendering strategy chosen in ISR vs SSG vs CSR Routing maps directly onto these tiers: SSG routes take the Static tier, ISR routes take the Semi-dynamic tier, and CSR routes with personalised data take the Dynamic tier.
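
One way to keep the tier table enforceable is to encode it as data. The sketch below is an assumption about how that could look: the route patterns are examples only, and tierForPath is a hypothetical helper. Unclassified paths fall back to the Dynamic tier so that nothing is cached by accident.

```javascript
// Ordered rules, first match wins. The route patterns are illustrative only.
const TIER_RULES = [
  {
    tier: 'Static',
    pattern: /^\/($|about$|blog\/)/,
    cacheControl: 'public, s-maxage=3600, stale-while-revalidate=86400',
  },
  {
    tier: 'Semi-dynamic',
    pattern: /^\/products\//,
    cacheControl: 'public, s-maxage=300, stale-while-revalidate=86400',
  },
];

// Anything unclassified falls back to the Dynamic tier, the safe default.
const DYNAMIC_TIER = { tier: 'Dynamic', cacheControl: 'no-store' };

function tierForPath(path) {
  return TIER_RULES.find((rule) => rule.pattern.test(path)) || DYNAMIC_TIER;
}
```

For example, tierForPath('/products/red-shoe') resolves to the Semi-dynamic tier and tierForPath('/cart') to the Dynamic tier.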

Step 3 — Inject headers at the framework layer

Set headers in your framework’s routing layer rather than the CDN dashboard so they travel with the code and are visible in version control. Framework-specific examples appear under Framework-Specific Cache Header Implementation below.

Step 4 — Create CDN cache rules to enforce `s-maxage`

In Cloudflare: Caching > Cache Rules > Create rule. Match the route pattern and set Edge Cache TTL = Respect existing headers so your framework’s s-maxage is authoritative. Add a secondary rule for error pages:

If: http.response.code in {400 404 500 503}
Then: Cache-Control: no-store

Caching 4xx/5xx responses is one of the fastest ways to poison Googlebot’s view of a site — pages indexed as errors rather than content.
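
The same rule can be mirrored at the origin as a second line of defence. This is a minimal sketch, assuming response headers are held in a plain object with lower-case names; withErrorCachePolicy is a hypothetical helper. It also drops any CDN-Cache-Control value, because a targeted header takes precedence over Cache-Control on CDNs that honor it.

```javascript
// Force error responses to be uncacheable. `headers` is a plain object with
// lower-case header names (a hypothetical shape for this sketch).
function withErrorCachePolicy(status, headers) {
  if (status < 400) return headers;
  // Drop the targeted CDN header too, since it would otherwise take
  // precedence over Cache-Control on CDNs that honor it.
  const { 'cdn-cache-control': _ignored, ...rest } = headers;
  return { ...rest, 'cache-control': 'no-store' };
}

const notFoundHeaders = withErrorCachePolicy(404, {
  'cache-control': 'public, s-maxage=300',
  'cdn-cache-control': 'public, s-maxage=300',
  'content-type': 'text/html',
});
```

Successful responses pass through unchanged, so the helper can wrap every response without touching the TTL tiers.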

Step 5 — Wire CMS webhooks to the purge API

Every time an editor publishes in your headless CMS, trigger a targeted purge. Using Cloudflare’s tag-based purge API:

curl -X POST "https://api.cloudflare.com/client/v4/zones/$CDN_ZONE_ID/purge_cache" \
  -H "Authorization: Bearer $CDN_PURGE_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{"tags":["post-slug-my-article"]}'

Tag pages at render time by returning a Cache-Tag: post-slug-<slug> response header, then purge that tag on publish. This is narrower than a full-zone purge and preserves the cache-hit ratio for unchanged routes.
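
Inside a webhook handler, the purge call can be assembled from the published slug. The sketch below only builds the request; buildTagPurgeRequest is a hypothetical helper, and how the slug is extracted from the webhook payload is left out because the payload shape differs per CMS.

```javascript
// Build the tag purge request for one published entry. The slug would come
// from the CMS webhook payload, whose shape varies by CMS (not shown here).
function buildTagPurgeRequest(slug, zoneId, apiToken) {
  return {
    url: `https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      // Must match the Cache-Tag value emitted at render time.
      body: JSON.stringify({ tags: [`post-slug-${slug}`] }),
    },
  };
}

const purgeRequest = buildTagPurgeRequest('my-article', 'zone-id-placeholder', 'token-placeholder');
// In the handler: await fetch(purgeRequest.url, purgeRequest.init);
```

Keeping the request builder separate from the network call makes it easy to unit-test that the purged tag matches the tag set on the page.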

Step 6 — Validate the full loop

# Confirm initial MISS
curl -sI -A "Googlebot/2.1" https://yourdomain.com/blog/my-article | grep -iE "cf-cache|age|cache-control"

# Confirm subsequent HIT
curl -sI -A "Googlebot/2.1" https://yourdomain.com/blog/my-article | grep -iE "cf-cache|age"

# Trigger purge, then confirm re-MISS followed by HIT

Framework-Specific Cache Header Implementation

Next.js App Router

// next.config.js
module.exports = {
  async headers() {
    return [
      // Catch-all first: when two rules set the same key, the later rule wins.
      {
        source: '/((?!api|_next).*)',
        headers: [
          {
            key: 'Cache-Control',
            value: 'public, s-maxage=3600, stale-while-revalidate=86400',
          },
        ],
      },
      // Blog rule last so its shorter TTL overrides the catch-all for /blog.
      {
        source: '/blog/:slug*',
        headers: [
          {
            key: 'Cache-Control',
            value: 'public, s-maxage=300, stale-while-revalidate=86400',
          },
        ],
      },
    ];
  },
};

SEO impact: The App Router’s fetch cache and CDN edge cache are independent layers. s-maxage controls the CDN; revalidate controls the server-side fetch. Aligning them prevents the situation where the CDN serves an edge-cached page whose server-rendered HTML fetched stale data from the CMS.

Validation: Check x-nextjs-cache (MISS / HIT / STALE) alongside CF-Cache-Status. Both should converge on HIT within two requests.

SvelteKit

// src/hooks.server.ts
import type { Handle } from '@sveltejs/kit';

export const handle: Handle = async ({ event, resolve }) => {
  const response = await resolve(event);

  // Do not cache authenticated or preview routes
  if (!event.locals.user && !event.url.searchParams.has('preview')) {
    response.headers.set(
      'CDN-Cache-Control',
      'public, s-maxage=300, stale-while-revalidate=86400'
    );
  }

  return response;
};

SEO impact: CDN-Cache-Control is the targeted cache header defined in RFC 9213, not a SvelteKit feature. CDNs that implement it, Cloudflare among them, give it precedence over Cache-Control, including any Cache-Control: private set on cookie-bearing responses; confirm support with your provider before relying on it. This ensures bot traffic receives cacheable responses even when the request carries cookies.

Validation: Fetch a route twice from a clean client (no cookies). Confirm the second response carries X-Cache: HIT.

Nuxt (Nitro)

// nuxt.config.ts
export default defineNuxtConfig({
  routeRules: {
    '/': { swr: 3600 },
    '/blog/**': { swr: 300 },
    '/products/**': { cache: { maxAge: 600 } },
    '/api/cart/**': { cache: false },
  },
});

SEO impact: Nitro’s routeRules emit Cache-Control headers server-side, including stale-while-revalidate for swr values. Googlebot therefore gets a fast cached response on every crawl: inside the SWR window the cache serves the slightly stale copy and refreshes it in the background instead of forcing a bot-visible MISS.

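Validation: Fetch a /blog/ route twice; confirm the second response’s cache-control header carries a 300-second TTL and a stale-while-revalidate directive. The exact directive form depends on the Nitro version and deployment preset.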
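
The three states a cached response moves through under stale-while-revalidate can be written as a small classifier. This is a sketch of the general HTTP semantics rather than Nitro's internals; classifyFreshness is a hypothetical helper.

```javascript
// Classify a cached response by its Age against the two windows.
function classifyFreshness(ageSeconds, sMaxage, staleWhileRevalidate) {
  if (ageSeconds < sMaxage) return 'fresh';
  // Past the TTL but inside the SWR window: serve stale, refresh in background.
  if (ageSeconds < sMaxage + staleWhileRevalidate) return 'stale-revalidating';
  // Past both windows: the next request must wait for the origin.
  return 'expired';
}

const freshState = classifyFreshness(47, 300, 86400);
const staleState = classifyFreshness(300, 300, 86400);
const expiredState = classifyFreshness(86700, 300, 86400);
```

Only the expired state produces a response that waits on the origin; the other two are served straight from cache.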

Remix

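// app/routes/posts.$slug.tsx
import { json } from '@remix-run/node';
import type { HeadersFunction, LoaderFunctionArgs } from '@remix-run/node';
// Hypothetical data helper; replace with your own CMS client.
import { getPost } from '~/models/post.server';

export async function loader({ params }: LoaderFunctionArgs) {
  const post = await getPost(params.slug);
  return json(post, {
    headers: {
      'Cache-Control': 'public, max-age=60, s-maxage=300, stale-while-revalidate=86400',
    },
  });
}

// Loader headers are not applied to the HTML document automatically;
// forward them through the route's headers export.
export const headers: HeadersFunction = ({ loaderHeaders }) => ({
  'Cache-Control': loaderHeaders.get('Cache-Control') ?? 'no-store',
});

SEO impact: Remix applies loader headers to the document response only when the route exports a headers function that forwards them, as in the route above. Setting max-age (browser) and s-maxage (CDN) separately lets you serve instant browser navigations without preventing CDN caching for bots.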

Validation: Confirm s-maxage=300 appears in curl -I output, then verify CF-Cache-Status: HIT on the second request.

HTTP Headers and CDN Directives Reference

Header	Required value	Rationale
`Cache-Control`	`public, s-maxage=N, stale-while-revalidate=M`	`s-maxage` controls shared/CDN cache TTL; `stale-while-revalidate` enables background refresh without bot-visible latency
`CDN-Cache-Control`	Same pattern as above	Targeted cache header (RFC 9213) read only by CDNs that implement it; at those edge nodes it takes precedence over `Cache-Control`, including `Cache-Control: private`
`Vary`	`Accept-Encoding` only	Any additional `Vary` field (e.g. `User-Agent`, `Cookie`) multiplies cache entries and fragments bot delivery
`Cache-Tag`	`page-<slug>, section-<slug>`	Enables tag-based purge on CMS publish without a full-zone flush
`Surrogate-Control`	`max-age=N`	Fastly-specific TTL directive, stripped before the browser sees the response
`Cache-Control` on errors	`no-store`	Prevents `4xx`/`5xx` responses from being cached and served to subsequent bot requests
`Age`	(response, read-only)	Number of seconds the response has been in cache; used to verify TTL and detect stale edge nodes

Vary Header and Cache Fragmentation

The Vary header tells CDNs which request headers differentiate responses. Every unique combination of Vary field values creates a separate cache entry.

Problematic pattern:

Vary: Accept-Encoding, User-Agent, Cookie

This forces the CDN to store a separate HTML copy for every browser user-agent string and cookie fingerprint. Googlebot’s user-agent string alone produces dozens of variants across different crawl versions (Googlebot/2.1, Googlebot-Image, etc.), effectively fragmenting the cache and causing constant origin misses.

Correct pattern:

Vary: Accept-Encoding

To strip unneeded Vary values in Cloudflare Workers:

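// worker.js: reduce Vary to Accept-Encoding only
export default {
  async fetch(request, env, ctx) {
    const response = await fetch(request);
    const newHeaders = new Headers(response.headers);
    newHeaders.set('Vary', 'Accept-Encoding');
    // Copy the status explicitly: spreading a Response drops it,
    // which would turn a 404 from the origin into a 200.
    return new Response(response.body, {
      status: response.status,
      statusText: response.statusText,
      headers: newHeaders,
    });
  },
};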

Vary fragmentation wastes crawl budget in headless setups because every bot hit looks like a unique, uncached request to the CDN.
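
The fragmentation effect can be counted directly with a simplified cache key: the URL plus the value of every request header named in Vary. Real CDNs build their keys differently, so treat cacheKey and countVariants as hypothetical helpers for illustration.

```javascript
// Simplified cache key: URL plus the value of each request header named in Vary.
function cacheKey(url, varyHeader, requestHeaders) {
  const fields = varyHeader
    .split(',')
    .map((field) => field.trim().toLowerCase())
    .filter(Boolean)
    .sort();
  const parts = fields.map((field) => `${field}=${requestHeaders[field] || ''}`);
  return [url, ...parts].join('|');
}

// Count how many distinct cache entries a set of requests would create.
function countVariants(url, varyHeader, requests) {
  return new Set(requests.map((headers) => cacheKey(url, varyHeader, headers))).size;
}

const crawlerRequests = [
  { 'user-agent': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', 'accept-encoding': 'gzip' },
  { 'user-agent': 'Googlebot-Image/1.0', 'accept-encoding': 'gzip' },
  { 'user-agent': 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)', 'accept-encoding': 'gzip' },
];

const fragmented = countVariants('/blog/my-article', 'Accept-Encoding, User-Agent', crawlerRequests);
const consolidated = countVariants('/blog/my-article', 'Accept-Encoding', crawlerRequests);
```

With User-Agent in Vary, the three crawler requests produce three separate entries; with Accept-Encoding alone they share one.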

Validation Protocol

Run the following sequence after any change to cache configuration:

# 1. Initial MISS check
curl -sI https://yourdomain.com/target-page \
  | grep -iE "cache-control|vary|cf-cache-status|x-cache|age"

# 2. Subsequent HIT check (run immediately after step 1)
curl -sI https://yourdomain.com/target-page \
  | grep -iE "cf-cache-status|x-cache|age"

# 3. Bot user-agent check
curl -sI -A "Googlebot/2.1" https://yourdomain.com/target-page \
  | grep -iE "cf-cache-status|vary|cache-control"

# 4. Error response check (should never cache)
curl -sI https://yourdomain.com/non-existent-path \
  | grep -iE "cache-control|cf-cache-status"

Expected results:

Check	Expected value
`CF-Cache-Status` after first request	`MISS`
`CF-Cache-Status` after second request	`HIT`
`Vary` on all cacheable routes	`Accept-Encoding` only
`Cache-Control` on 404 pages	`no-store`
`Age` on HIT responses	Integer from 0 up to the `s-maxage` value (higher only while `stale-while-revalidate` is serving a stale copy)
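
The first two rows of the table can be checked automatically against captured curl output. The sketch below assumes the raw output of two consecutive curl -sI calls is available as strings; parseHeaderBlock and checkMissThenHit are hypothetical helpers.

```javascript
// Turn raw `curl -sI` output into a map of lower-cased header names.
function parseHeaderBlock(raw) {
  const headers = {};
  for (const line of raw.split(/\r?\n/)) {
    const separator = line.indexOf(':');
    // Status lines such as "HTTP/2 200" contain no colon and are skipped.
    if (separator > 0) {
      headers[line.slice(0, separator).trim().toLowerCase()] = line.slice(separator + 1).trim();
    }
  }
  return headers;
}

// Compare two consecutive responses against the expected MISS-then-HIT pattern.
function checkMissThenHit(firstRaw, secondRaw) {
  const first = parseHeaderBlock(firstRaw);
  const second = parseHeaderBlock(secondRaw);
  const failures = [];
  if (first['cf-cache-status'] !== 'MISS') failures.push('first request was not a MISS');
  if (second['cf-cache-status'] !== 'HIT') failures.push('second request was not a HIT');
  if (!/^\d+$/.test(second['age'] || '')) failures.push('HIT response has no numeric Age');
  return failures;
}

const sampleFirst = 'HTTP/2 200\r\ncache-control: public, s-maxage=300\r\ncf-cache-status: MISS\r\n';
const sampleSecond = 'HTTP/2 200\r\ncache-control: public, s-maxage=300\r\ncf-cache-status: HIT\r\nage: 47\r\n';
const loopFailures = checkMissThenHit(sampleFirst, sampleSecond);
```

An empty failure list means the pair matches the table; a page that was already cached before the first request will legitimately fail the MISS check.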

Google Search Console signal: After correcting caching configuration, monitor the “Crawled — currently not indexed” count in GSC’s Pages report. A sustained drop over 2–4 weeks indicates bots are now receiving consistent, cacheable HTML.

Lighthouse CI: Add a server-response-time assertion (for example, maxNumericValue: 200) in lighthouserc.js to alert when TTFB exceeds 200 ms, a symptom of cache bypass at scale.

Troubleshooting

Symptom	Root cause	Fix
`CF-Cache-Status: BYPASS` on all routes	Framework emitting `Set-Cookie` on responses, or requests carrying an `Authorization` header	Strip or scope cookies to subpaths; use `CDN-Cache-Control` to override
`CF-Cache-Status: DYNAMIC`	Route matched a Cloudflare cache rule with “bypass” or no cache-control rule exists	Create a Cache Rule matching the route with TTL = Respect existing headers
`Vary: User-Agent` appearing in responses	Framework middleware reading `User-Agent` for bot detection and reflecting it in `Vary`	Move bot detection to edge worker; never reflect `User-Agent` in `Vary`
Stale SERP snippets after CMS publish	CDN cache not purged on content update	Connect CMS publish webhook to CDN purge API; use tag-based invalidation
`Cache-Control: no-cache` overriding `s-maxage`	`no-cache` forces CDN revalidation on every request, defeating edge caching	Replace `no-cache` with `stale-while-revalidate` for routes that tolerate brief staleness
`404` pages served from cache	Error responses cached before `no-store` rule was applied	Purge error routes explicitly; add CDN rule: status in {4xx, 5xx} → `Cache-Control: no-store`
Hydration mismatch in Next.js App Router	Cached HTML differs from client-rendered tree due to time-sensitive data	Move time-sensitive data to client components with `'use client'`; keep cached RSC output stable
Googlebot receiving `MISS` on every crawl	`s-maxage` set to 0 or missing on key routes	Audit via `curl -I` and add `s-maxage=300` minimum for all public routes

FAQ

How does edge caching affect Googlebot’s rendering pipeline?

Googlebot fetches the cached HTML snapshot directly from the nearest edge node. Misconfigured TTLs force the crawler to either receive stale content or trigger origin rate limits — both delay fresh content discovery and degrade ranking velocity for newly published pages.

Should headless API responses be cached at the edge?

Cache public, non-personalised API responses at the edge using s-maxage. Isolate user-specific endpoints with private or no-store directives to prevent cache poisoning and data leakage to other users.

How do I confirm whether the CDN or origin served a page to a crawler?

Inspect X-Cache, CF-Cache-Status, or x-nextjs-cache headers via curl -I or a synthetic crawl tool. A HIT confirms edge delivery. A MISS indicates the request reached the origin server.

What happens when a CDN caches a 404 or 500 response?

Error responses cached at the edge poison SERP indexation: Googlebot receives the error page on every subsequent crawl until the cache entry expires. Set Cache-Control: no-store on all error templates and create a CDN bypass rule for 4xx and 5xx status codes.

Crawl Budget Impact in Headless — how CDN cache-hit ratios and origin response times directly affect the number of pages Googlebot indexes per day
ISR vs SSG vs CSR Routing — choosing the rendering strategy that determines which TTL tier each route belongs to
Framework-Specific Rendering Tradeoffs — per-framework analysis of how Next.js, Nuxt, SvelteKit, and Astro emit cache headers differently
XML Sitemap Generation for Headless — ensuring freshly purged and re-cached routes are discoverable through an up-to-date sitemap
Canonical URL Enforcement — preventing duplicate cache entries caused by trailing-slash variants or protocol mismatches reaching the CDN
