Edge Caching Behavior for SEO
CDN edge nodes sit between search crawlers and your origin server, so every cache misconfiguration becomes an indexation problem. This page covers how to set Cache-Control directives correctly for headless deployments, eliminate the Vary fragmentation that wastes crawl budget, and wire invalidation so bots always receive fresh HTML without hammering your origin.
Prerequisites
Before adjusting edge caching, confirm the following are in place:
- Framework version: Next.js 13+, SvelteKit 2+, or Nuxt 3+ (earlier versions lack granular route-level cache header APIs)
- CDN access: admin access to Cloudflare, Fastly, or Vercel Edge Network dashboard to create cache rules and purge policies
- curl and jq installed locally for header inspection
- CMS webhook endpoint: your headless CMS (Contentful, Sanity, Hygraph, etc.) must support publish webhooks for cache invalidation
- Environment variables: CDN_PURGE_API_KEY and CDN_ZONE_ID available as secrets in your build pipeline
How Edge Caching Interacts with Search Bots
The diagram below shows the request path Googlebot follows when hitting a headless site with CDN edge nodes in front of the origin.
The key SEO insight: once a page enters the edge cache, Googlebot receives the same HTML snapshot on every recrawl until s-maxage expires or a purge fires. Misconfigured Vary headers or absent s-maxage values break that consistency.
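The snapshot behaviour can be illustrated with a toy in-memory model. This is only a sketch and not how any CDN is implemented: `ToyEdgeCache`, the `render` callback standing in for the origin, and the tag names are all hypothetical, and the clock is passed in explicitly (in seconds) so the example stays deterministic.

```typescript
// Toy model of one edge node: a response stays cached until its
// s-maxage elapses or a purge for one of its tags fires.
type CacheEntry = { body: string; storedAt: number; sMaxAge: number; tags: string[] };
type EdgeResult = { status: "HIT" | "MISS"; body: string };

class ToyEdgeCache {
  private entries = new Map<string, CacheEntry>();

  // `render` stands in for the origin; `now` is a clock in seconds.
  get(url: string, now: number, sMaxAge: number, tags: string[], render: () => string): EdgeResult {
    const entry = this.entries.get(url);
    if (entry !== undefined && now - entry.storedAt < entry.sMaxAge) {
      return { status: "HIT", body: entry.body }; // same snapshot as before
    }
    const body = render(); // absent or expired: go back to the origin
    this.entries.set(url, { body, storedAt: now, sMaxAge, tags });
    return { status: "MISS", body };
  }

  // Drops every entry carrying the given tag; returns how many were removed.
  purgeTag(tag: string): number {
    let removed = 0;
    for (const [url, entry] of Array.from(this.entries)) {
      if (entry.tags.indexOf(tag) !== -1) {
        this.entries.delete(url);
        removed++;
      }
    }
    return removed;
  }
}
```

In this model a crawler that requests the same URL twice inside the TTL gets the first render both times, even if the CMS content changed in between; only expiry or a purge lets the new version through.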
Step-by-Step Implementation Workflow
Step 1 — Audit your current cache posture
curl -sI https://yourdomain.com/ | grep -iE "cache-control|vary|x-cache|cf-cache|age"
Expected output for a correctly cached static route:
cache-control: public, s-maxage=300, stale-while-revalidate=86400
vary: Accept-Encoding
cf-cache-status: HIT
age: 47
Any Vary: User-Agent, Vary: Cookie, or missing s-maxage is a defect to fix before moving on.
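When auditing many routes, the same check can be scripted. The sketch below is a minimal example rather than a complete linter: `auditCacheHeaders` is a hypothetical helper, and it assumes header names have already been lower-cased and collected into a plain object.

```typescript
// Flags the defects named above in one set of response headers.
// Header names are expected in lower case.
function auditCacheHeaders(headers: Record<string, string>): string[] {
  const defects: string[] = [];
  const cacheControl = (headers["cache-control"] ?? "").toLowerCase();
  const varyFields = (headers["vary"] ?? "")
    .toLowerCase()
    .split(",")
    .map((field) => field.trim())
    .filter((field) => field.length > 0);

  // s-maxage must appear as its own directive with a numeric value.
  if (!/(^|[,\s])s-maxage=\d+/.test(cacheControl)) {
    defects.push("missing s-maxage");
  }
  for (const field of varyFields) {
    if (field === "user-agent" || field === "cookie") {
      defects.push(`Vary includes ${field}`);
    }
  }
  return defects;
}
```

An empty array means the route passes this step; any entry names a defect to fix before continuing.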
Step 2 — Map route patterns to TTL tiers
Classify every route in your app into one of three tiers:
| Tier | Route pattern | Recommended s-maxage | stale-while-revalidate |
|---|---|---|---|
| Static | /, /about, build-time blog posts | 3600 s (1 hr) | 86400 s (24 hr) |
| Semi-dynamic | ISR-eligible routes, product pages | 300 s (5 min) | 86400 s (24 hr) |
| Dynamic | User sessions, previews, cart | 0 / no-store | n/a |
The rendering strategy chosen in ISR vs SSG vs CSR Routing maps directly onto these tiers: SSG routes take the Static tier, ISR routes take the Semi-dynamic tier, and CSR routes with personalised data take the Dynamic tier.
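One way to keep the tier classification in code is a small lookup that maps a request path to its tier and the matching Cache-Control value. The sketch below is illustrative only: the route prefixes, `tierForPath`, and `cacheControlForTier` are hypothetical, and the TTLs are copied from the tier table.

```typescript
type Tier = "static" | "semi-dynamic" | "dynamic";

// Hypothetical route prefixes; replace them with your app's real patterns.
const DYNAMIC_PREFIXES = ["/account", "/cart", "/preview"];
const SEMI_DYNAMIC_PREFIXES = ["/products"];

function tierForPath(path: string): Tier {
  // Match the prefix itself or anything nested under it, but not /cartoon for /cart.
  const under = (prefix: string) => path === prefix || path.startsWith(prefix + "/");
  if (DYNAMIC_PREFIXES.some(under)) return "dynamic";
  if (SEMI_DYNAMIC_PREFIXES.some(under)) return "semi-dynamic";
  return "static";
}

function cacheControlForTier(tier: Tier): string {
  switch (tier) {
    case "static":
      return "public, s-maxage=3600, stale-while-revalidate=86400";
    case "semi-dynamic":
      return "public, s-maxage=300, stale-while-revalidate=86400";
    case "dynamic":
      return "no-store";
  }
}
```

Dynamic prefixes are checked first so a personalised route is never cached by accident. Falling back to the Static tier for unknown paths is a design choice of this sketch; a stricter setup might default to no-store instead.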
Step 3 — Inject headers at the framework layer
Set headers in your framework’s routing layer rather than the CDN dashboard so they travel with the code and are visible in version control. Framework-specific examples follow in the next section.
Step 4 — Create CDN cache rules to enforce s-maxage
In Cloudflare: Caching > Cache Rules > Create rule. Match the route pattern and set Edge Cache TTL = Respect existing headers so your framework’s s-maxage is authoritative. Add a secondary rule for error pages:
If: http.response.code in {400 404 500 503}
Then: Cache-Control: no-store
Caching 4xx/5xx responses is one of the fastest ways to poison Googlebot’s view of a site: the crawler keeps receiving the cached error until the entry expires, so URLs are indexed as errors rather than content.
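The same policy can also be enforced at the origin, which helps when a CDN rule is not available. A minimal sketch, assuming a hypothetical helper `cacheControlForStatus` that you call wherever your framework finalises response headers:

```typescript
// Returns the Cache-Control value to send for a given response status.
// Any 4xx or 5xx gets no-store; other statuses keep the route's normal value.
function cacheControlForStatus(status: number, routeValue: string): string {
  const isError = status >= 400 && status <= 599;
  return isError ? "no-store" : routeValue;
}
```

For example, a 404 rendered by a blog route that normally sends `public, s-maxage=300, stale-while-revalidate=86400` would go out with `no-store` instead.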
Step 5 — Wire CMS webhooks to the purge API
Every time an editor publishes in your headless CMS, trigger a targeted purge. Using Cloudflare’s tag-based purge API:
curl -X POST "https://api.cloudflare.com/client/v4/zones/$CDN_ZONE_ID/purge_cache" \
  -H "Authorization: Bearer $CDN_PURGE_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{"tags":["post-slug-my-article"]}'
Tag pages at render time by returning a Cache-Tag: post-slug-<slug> response header, then purge that tag on publish. This is narrower than a full-zone purge and preserves the cache-hit ratio for unchanged routes.
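Inside a webhook handler, the purge call can be assembled from the slugs the CMS reports as published. The sketch below only builds the request and does not send it; `cacheTagForSlug` and `buildPurgeRequest` are hypothetical helpers, and how slugs are extracted from the webhook payload depends on your CMS and is not shown. The zone ID and token are assumed to come from the secrets listed in the prerequisites.

```typescript
type PurgeRequest = {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

// Tag naming follows the Cache-Tag convention above: post-slug-<slug>.
function cacheTagForSlug(slug: string): string {
  return `post-slug-${slug}`;
}

// Builds (but does not send) the tag purge call for the published slugs.
function buildPurgeRequest(zoneId: string, apiToken: string, slugs: string[]): PurgeRequest {
  // De-duplicate so a slug reported twice is purged once.
  const tags = Array.from(new Set(slugs.map(cacheTagForSlug)));
  return {
    url: `https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`,
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ tags }),
  };
}
```

The returned object can be passed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Keeping construction separate from sending makes the tag logic easy to unit-test without touching the CDN.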
Step 6 — Validate the full loop
# Confirm initial MISS
curl -sI -A "Googlebot/2.1" https://yourdomain.com/blog/my-article | grep -iE "cf-cache|age|cache-control"
# Confirm subsequent HIT
curl -sI -A "Googlebot/2.1" https://yourdomain.com/blog/my-article | grep -iE "cf-cache|age"
# Trigger purge, then confirm re-MISS followed by HIT
Framework-Specific Cache Header Implementation
Next.js App Router
// next.config.js
module.exports = {
  async headers() {
    // When two entries match the same path and set the same key,
    // the later entry wins, so the catch-all comes first.
    return [
      {
        source: '/((?!api|_next).*)',
        headers: [
          {
            key: 'Cache-Control',
            value: 'public, s-maxage=3600, stale-while-revalidate=86400',
          },
        ],
      },
      {
        source: '/blog/:slug*',
        headers: [
          {
            key: 'Cache-Control',
            value: 'public, s-maxage=300, stale-while-revalidate=86400',
          },
        ],
      },
    ];
  },
};
SEO impact: The App Router’s fetch cache and CDN edge cache are independent layers. s-maxage controls the CDN; revalidate controls the server-side fetch. Aligning them prevents the situation where the CDN serves an edge-cached page whose server-rendered HTML fetched stale data from the CMS.
Validation: Check x-nextjs-cache (MISS / HIT / STALE) alongside CF-Cache-Status. Both should converge on HIT within two requests.
SvelteKit
// src/hooks.server.ts
import type { Handle } from '@sveltejs/kit';

export const handle: Handle = async ({ event, resolve }) => {
  const response = await resolve(event);
  // Do not cache authenticated or preview routes
  if (!event.locals.user && !event.url.searchParams.has('preview')) {
    response.headers.set(
      'CDN-Cache-Control',
      'public, s-maxage=300, stale-while-revalidate=86400'
    );
  }
  return response;
};
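SEO impact: CDN-Cache-Control is a CDN-targeted cache header (RFC 9213) rather than a SvelteKit feature. CDNs that honour it (this setup assumes Cloudflare or Fastly; confirm support in your provider's documentation) apply it in preference to Cache-Control, so a Cache-Control: private emitted for cookie-bearing requests no longer blocks edge caching. Bot traffic therefore receives cacheable responses even when the edge worker sees a cookie jar.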
Validation: Fetch a route twice from a clean client (no cookies). Confirm the second response carries X-Cache: HIT.
Nuxt (Nitro)
// nuxt.config.ts
export default defineNuxtConfig({
  routeRules: {
    '/': { swr: 3600 },
    '/blog/**': { swr: 300 },
    '/products/**': { cache: { maxAge: 600 } },
    '/api/cart/**': { cache: false },
  },
});
SEO impact: Nitro’s routeRules emit Cache-Control headers server-side, including stale-while-revalidate for swr values. Googlebot therefore gets a fast cached response even when an entry has just expired: the SWR window serves the previous copy while Nitro refreshes it in the background, so the bot does not wait on a MISS but may briefly receive slightly stale HTML.
Validation: Fetch /blog/ twice; confirm the second response carries a cache-control header with a 300-second TTL and a stale-while-revalidate directive. The exact directive string may differ between Nitro versions.
Remix
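// app/routes/posts.$slug.tsx
import { json } from '@remix-run/node';
import type { HeadersFunction, LoaderFunctionArgs } from '@remix-run/node';
// getPost is the app's own CMS helper; this import path is an assumption.
import { getPost } from '~/models/post.server';

const CACHE_CONTROL =
  'public, max-age=60, s-maxage=300, stale-while-revalidate=86400';

export async function loader({ params }: LoaderFunctionArgs) {
  const post = await getPost(params.slug);
  return json(post, { headers: { 'Cache-Control': CACHE_CONTROL } });
}

// Loader headers are not applied to the HTML document automatically;
// the route's headers export forwards them.
export const headers: HeadersFunction = ({ loaderHeaders }) => ({
  'Cache-Control': loaderHeaders.get('Cache-Control') ?? CACHE_CONTROL,
});
SEO impact: Remix applies loader headers to data requests, but the document response only carries them when the route exports a headers function, as above. Setting max-age (browser) and s-maxage (CDN) separately lets you serve instant browser navigations without preventing CDN caching for bots.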
Validation: Confirm s-maxage=300 appears in curl -I output, then verify CF-Cache-Status: HIT on the second request.
HTTP Headers and CDN Directives Reference
| Header | Required value | Rationale |
|---|---|---|
| Cache-Control | public, s-maxage=N, stale-while-revalidate=M | s-maxage controls shared/CDN cache TTL; stale-while-revalidate enables background refresh without bot-visible latency |
| CDN-Cache-Control | Same pattern as above | CDN-targeted override that, on CDNs that support it, takes precedence over Cache-Control: private for edge nodes |
| Vary | Accept-Encoding only | Any additional Vary field (e.g. User-Agent, Cookie) multiplies cache entries and fragments bot delivery |
| Cache-Tag | page-<slug>, section-<slug> | Enables tag-based purge on CMS publish without a full-zone flush |
| Surrogate-Control | max-age=N | Fastly-specific TTL directive, stripped before the browser sees the response |
| Cache-Control on errors | no-store | Prevents 4xx/5xx responses from being cached and served to subsequent bot requests |
| Age | (response, read-only) | Number of seconds the response has been in cache; used to verify TTL and detect stale edge nodes |
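To reason about which TTL a shared cache will apply to a given Cache-Control value, a small parser helps. The sketch below is a simplification under stated assumptions: `parseCacheControl` and `sharedCacheTtl` are hypothetical helpers, commas inside quoted values are not handled, no-cache is treated as a TTL of 0, and CDN-specific overrides such as CDN-Cache-Control or Surrogate-Control are ignored.

```typescript
// Parses a Cache-Control value into directive -> argument ("" for flags).
function parseCacheControl(value: string): Map<string, string> {
  const directives = new Map<string, string>();
  for (const part of value.split(",")) {
    const trimmed = part.trim();
    if (trimmed.length === 0) continue;
    const eq = trimmed.indexOf("=");
    const name = (eq === -1 ? trimmed : trimmed.slice(0, eq)).toLowerCase();
    const arg = eq === -1 ? "" : trimmed.slice(eq + 1).replace(/^"|"$/g, "");
    directives.set(name, arg);
  }
  return directives;
}

// Explicit TTL in seconds for a shared cache, or null when the response
// is not storable there or carries no explicit lifetime.
function sharedCacheTtl(value: string): number | null {
  const d = parseCacheControl(value);
  if (d.has("no-store") || d.has("private")) return null;
  if (d.has("no-cache")) return 0; // stored, but revalidated on every request
  // s-maxage takes precedence over max-age for shared caches.
  const raw = d.get("s-maxage") ?? d.get("max-age");
  if (raw === undefined) return null;
  const seconds = Number.parseInt(raw, 10);
  return Number.isNaN(seconds) ? null : seconds;
}
```

Running it over the values your routes emit makes it easy to spot a route whose browser max-age is silently acting as the CDN TTL because s-maxage is missing.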
Vary Header and Cache Fragmentation
The Vary header tells CDNs which request headers differentiate responses. Every unique combination of Vary field values creates a separate cache entry.
Problematic pattern:
Vary: Accept-Encoding, User-Agent, Cookie
This forces the CDN to store a separate HTML copy for every browser user-agent string and cookie fingerprint. Googlebot’s user-agent string alone produces dozens of variants across different crawl versions (Googlebot/2.1, Googlebot-Image, etc.), effectively fragmenting the cache and causing constant origin misses.
Correct pattern:
Vary: Accept-Encoding
To strip unneeded Vary values in Cloudflare Workers:
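// worker.js: strip Vary to only Accept-Encoding
export default {
  async fetch(request, env, ctx) {
    const response = await fetch(request);
    const newHeaders = new Headers(response.headers);
    newHeaders.set('Vary', 'Accept-Encoding');
    // Spreading a Response does not copy status or statusText,
    // so pass them explicitly to keep 404s and redirects intact.
    return new Response(response.body, {
      status: response.status,
      statusText: response.statusText,
      headers: newHeaders,
    });
  },
};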
Vary fragmentation wastes crawl budget in headless setups because the CDN treats every bot hit with a new header combination as a unique, uncached request.
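The effect of each extra Vary field can be made concrete by counting cache entries. The sketch below uses a simplified cache key (URL plus the value of each request header named in Vary); real CDNs build keys differently and often normalise Accept-Encoding. `SimRequest`, `cacheKey`, and `countEntries` are hypothetical names, and request header names are assumed to be lower case.

```typescript
type SimRequest = { url: string; headers: Record<string, string> };

// Simplified cache key: URL plus the value of every request header named in Vary.
function cacheKey(req: SimRequest, vary: string): string {
  const fields = vary
    .split(",")
    .map((field) => field.trim().toLowerCase())
    .filter((field) => field.length > 0);
  const parts = fields.map((field) => `${field}=${req.headers[field] ?? ""}`);
  return [req.url, ...parts].join("|");
}

// Number of distinct cache entries a set of requests would create.
function countEntries(requests: SimRequest[], vary: string): number {
  return new Set(requests.map((req) => cacheKey(req, vary))).size;
}

// Three clients requesting the same page with the same encoding.
const sample: SimRequest[] = [
  { url: "/blog/my-article", headers: { "accept-encoding": "gzip", "user-agent": "Googlebot/2.1" } },
  { url: "/blog/my-article", headers: { "accept-encoding": "gzip", "user-agent": "Googlebot-Image/1.0" } },
  { url: "/blog/my-article", headers: { "accept-encoding": "gzip", "user-agent": "Mozilla/5.0" } },
];
```

With `Vary: Accept-Encoding`, `countEntries(sample, "Accept-Encoding")` yields one entry that all three clients share; adding User-Agent yields three, so each client's first request misses.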
Validation Protocol
Run the following sequence after any change to cache configuration:
# 1. Initial MISS check
curl -sI https://yourdomain.com/target-page \
| grep -iE "cache-control|vary|cf-cache-status|x-cache|age"
# 2. Subsequent HIT check (run immediately after step 1)
curl -sI https://yourdomain.com/target-page \
| grep -iE "cf-cache-status|x-cache|age"
# 3. Bot user-agent check
curl -sI -A "Googlebot/2.1" https://yourdomain.com/target-page \
| grep -iE "cf-cache-status|vary|cache-control"
# 4. Error response check (should never cache)
curl -sI https://yourdomain.com/non-existent-path \
| grep -iE "cache-control|cf-cache-status"
Expected results:
| Check | Expected value |
|---|---|
| CF-Cache-Status after first request | MISS |
| CF-Cache-Status after second request | HIT |
| Vary on all cacheable routes | Accept-Encoding only |
| Cache-Control on 404 pages | no-store |
| Age on HIT responses | Integer between 1 and s-maxage value |
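The MISS-then-HIT expectation can be checked automatically once the headers from two consecutive requests have been captured. A minimal sketch, assuming a hypothetical `Snapshot` shape filled in by your own tooling; it accepts an Age of 0 as valid, which is an assumption made here because a hit served within the first second may report 0.

```typescript
type Snapshot = { cacheStatus: string; age: number | null; sMaxAge: number };

// Compares two consecutive header snapshots against the expected results.
function checkMissThenHit(first: Snapshot, second: Snapshot): string[] {
  const failures: string[] = [];
  if (first.cacheStatus.toUpperCase() !== "MISS") {
    failures.push(`first request was ${first.cacheStatus}, expected MISS`);
  }
  if (second.cacheStatus.toUpperCase() !== "HIT") {
    failures.push(`second request was ${second.cacheStatus}, expected HIT`);
  }
  // Age must be present on a HIT and must not exceed the configured TTL.
  if (second.age === null || second.age < 0 || second.age > second.sMaxAge) {
    failures.push("Age on second request is missing or outside the s-maxage window");
  }
  return failures;
}
```

An empty result means the route behaves as expected; a non-empty one lists each failed expectation, which suits a CI job that fails on any entry.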
Google Search Console signal: After correcting caching configuration, monitor the “Crawled - currently not indexed” count in GSC’s Pages report. A sustained drop over 2–4 weeks indicates bots are now receiving consistent, cacheable HTML.
Lighthouse CI: Add a server-response-time assertion in lighthouserc.js to alert when TTFB exceeds 200 ms, a symptom of cache bypass at scale.
Troubleshooting
| Symptom | Root cause | Fix |
|---|---|---|
| CF-Cache-Status: BYPASS on all routes | Framework emitting Set-Cookie or Authorization header in response | Strip or scope cookies to subpaths; use CDN-Cache-Control to override |
| CF-Cache-Status: DYNAMIC | Route matched a Cloudflare cache rule with “bypass” or no cache-control rule exists | Create a Cache Rule matching the route with TTL = Respect existing headers |
| Vary: User-Agent appearing in responses | Framework middleware reading User-Agent for bot detection and reflecting it in Vary | Move bot detection to edge worker; never reflect User-Agent in Vary |
| Stale SERP snippets after CMS publish | CDN cache not purged on content update | Connect CMS publish webhook to CDN purge API; use tag-based invalidation |
| Cache-Control: no-cache overriding s-maxage | no-cache forces CDN revalidation on every request, defeating edge caching | Replace no-cache with stale-while-revalidate for routes that tolerate brief staleness |
| 404 pages served from cache | Error responses cached before no-store rule was applied | Purge error routes explicitly; add CDN rule: status in {4xx, 5xx} → Cache-Control: no-store |
| Hydration mismatch in Next.js App Router | Cached HTML differs from client-rendered tree due to time-sensitive data | Move time-sensitive data to client components with 'use client'; keep cached RSC output stable |
| Googlebot receiving MISS on every crawl | s-maxage set to 0 or missing on key routes | Audit via curl -I and add s-maxage=300 minimum for all public routes |
FAQ
How does edge caching affect Googlebot’s rendering pipeline?
Googlebot fetches the cached HTML snapshot directly from the nearest edge node. Misconfigured TTLs force the crawler to either receive stale content or trigger origin rate limits — both delay fresh content discovery and degrade ranking velocity for newly published pages.
Should headless API responses be cached at the edge?
Cache public, non-personalised API responses at the edge using s-maxage. Isolate user-specific endpoints with private or no-store directives to prevent cache poisoning and data leakage to other users.
How do I confirm whether the CDN or origin served a page to a crawler?
Inspect X-Cache, CF-Cache-Status, or x-nextjs-cache headers via curl -I or a synthetic crawl tool. A HIT confirms edge delivery. A MISS indicates the request reached the origin server.
What happens when a CDN caches a 404 or 500 response?
Error responses cached at the edge poison SERP indexation: Googlebot receives the error page on every subsequent crawl until the cache entry expires. Set Cache-Control: no-store on all error templates and create a CDN bypass rule for 4xx and 5xx status codes.
Related Pages
- Crawl Budget Impact in Headless — how CDN cache-hit ratios and origin response times directly affect the number of pages Googlebot indexes per day
- ISR vs SSG vs CSR Routing — choosing the rendering strategy that determines which TTL tier each route belongs to
- Framework-Specific Rendering Tradeoffs — per-framework analysis of how Next.js, Nuxt, SvelteKit, and Astro emit cache headers differently
- XML Sitemap Generation for Headless — ensuring freshly purged and re-cached routes are discoverable through an up-to-date sitemap
- Canonical URL Enforcement — preventing duplicate cache entries caused by trailing-slash variants or protocol mismatches reaching the CDN