SEO & AI · 7 min read

How AI search engines crawl Next.js sites (RSC, ISR, SSG)

ChatGPT, Perplexity, and Gemini's crawlers behave very differently from Googlebot. Understanding RSC rendering, ISR freshness, and SSG caching helps you optimize for AI retrieval specifically.

AI search crawlers like PerplexityBot, GPTBot (OpenAI), and Google-Extended read the HTML your server returns for a Next.js page — they do not execute client-side JavaScript. This means React Server Components and SSG pages are indexed accurately, while content that renders only after a useEffect fires or a client-side fetch resolves is invisible to them.
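One way to reason about this is to emulate the crawler's view: a single GET request, then a plain string check against the raw HTML. A minimal sketch — `visibleToCrawler` and `htmlContainsPhrase` are illustrative helper names, not a real API, and the user-agent value is just GPTBot's token for demonstration:

```typescript
// Pure check: is a phrase present in the raw server HTML? That HTML is the
// only thing a non-JS-executing crawler ever sees, so content injected later
// by useEffect or a client fetch will (correctly) not match.
function htmlContainsPhrase(html: string, phrase: string): boolean {
  return html.includes(phrase);
}

// Fetch a page the way a crawler would: one request, no script execution.
// The URL you pass and the UA header are placeholders for illustration.
async function visibleToCrawler(url: string, phrase: string): Promise<boolean> {
  const res = await fetch(url, { headers: { "User-Agent": "GPTBot" } });
  return htmlContainsPhrase(await res.text(), phrase);
}
```

If a sentence from your article fails this check against `curl`-style raw HTML, no amount of client-side rendering will make it visible to these crawlers.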

How RSC affects AI crawler visibility

React Server Components are the best possible rendering strategy for AI crawl visibility. Because RSC renders entirely on the server, the HTML payload that lands in the crawler's response buffer contains all your content — no JavaScript execution required. When you use Server Components in Next.js App Router (the default), every word in your article is in the initial HTML response. VeloCMS uses Server Components for all help articles and blog posts precisely because of this.

Check for 'use client' directives in your content components. A component marked 'use client' still server-renders its initial markup, but anything set after hydration — state updated in a useEffect callback, data from a client-side fetch — is NOT in the SSR payload and won't be crawled.
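The contrast looks like this in practice — a hedged sketch, where `getArticle` and the component names are hypothetical, not from any real codebase:

```tsx
// app/blog/[slug]/page.tsx -- Server Component (the App Router default).
// Everything returned here lands in the initial HTML a crawler receives.
export default async function ArticlePage({ params }: { params: { slug: string } }) {
  const article = await getArticle(params.slug); // hypothetical data helper, runs on the server
  return <article>{article.body}</article>;      // full text is in the SSR payload
}

// By contrast, a client component like this ships an empty shell:
//
//   'use client';
//   function Comments() {
//     const [comments, setComments] = useState<Comment[]>([]);
//     useEffect(() => {
//       fetch('/api/comments').then((r) => r.json()).then(setComments);
//     }, []);
//     return <ul>{comments.map((c) => <li key={c.id}>{c.text}</li>)}</ul>;
//   }
//
// The SSR payload contains only the empty <ul>; the comment text never
// reaches a crawler that doesn't execute JavaScript.
```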

ISR freshness and AI retrieval recency signals

Incremental Static Regeneration (ISR) serves pages from a static cache with periodic regeneration. For AI crawlers, the key concern is cache freshness — PerplexityBot requests pages live and caches the response on its side. If your ISR revalidation window is 24 hours and Perplexity crawls during a stale window, it reads outdated content. For fast-changing content like news or product pages, use shorter revalidation intervals. For evergreen help articles, longer windows are fine.
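In the App Router, the revalidation window is a route segment config export. A sketch with illustrative values — the file paths and the specific numbers are assumptions, not recommendations from measurement:

```tsx
// app/news/[slug]/page.tsx -- fast-changing content: a short window means a
// crawler that hits a stale cache is at most minutes behind.
export const revalidate = 300; // seconds; illustrative value

// app/help/[slug]/page.tsx -- evergreen help article: a long window is fine
// because the content rarely changes between edits anyway.
// export const revalidate = 86400; // 24 hours
```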

SSG pages are the most reliably indexed

Fully static pages — generated at build time with no revalidation — are the simplest for crawlers because there's no server-side timing or cache-state variability. VeloCMS help articles use force-static rendering with dynamicParams=false, which means every article is prerendered at build time and served identically to every crawler. The tradeoff is that new articles don't appear until the next build, but for a help center with batch article releases, that's entirely acceptable.
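The setup described above maps to a small amount of route segment config. A sketch based on the article's description — `getAllHelpArticles` is a hypothetical data helper:

```tsx
// app/help/[slug]/page.tsx -- fully static: every article is prerendered at
// build time and the same HTML is served to every crawler.
export const dynamic = 'force-static';
export const dynamicParams = false; // unknown slugs 404 instead of rendering on demand

export async function generateStaticParams() {
  const articles = await getAllHelpArticles(); // hypothetical data helper
  return articles.map((a) => ({ slug: a.slug }));
}
```

With `dynamicParams = false`, a new article genuinely does not exist until the next build — which is the tradeoff the article describes.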

What the crawlers actually do with your page

Once a crawler has your HTML, it extracts text nodes and structured data. JSON-LD is parsed directly without HTML rendering — which is why your Article and FAQPage schema matters. The canonical URL (set via rel=canonical or the Next.js alternates metadata) tells the crawler which URL to attribute the content to if there are multiple entry points. GPTBot respects robots.txt, so make sure your /help/* paths are not accidentally disallowed.
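Since JSON-LD is parsed directly, it pays to generate it as a plain object and serialize it into the page. A minimal sketch — the `@context`, `@type`, and property names follow schema.org's Article type, but the input shape (`ArticleMeta`) and function name are our own assumptions:

```typescript
// Build Article JSON-LD as a plain object, ready to serialize into a
// <script type="application/ld+json"> tag in the page head or body.
interface ArticleMeta {
  title: string;
  canonicalUrl: string;
  datePublished: string; // ISO 8601
  authorName: string;
}

function articleJsonLd(meta: ArticleMeta) {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: meta.title,
    url: meta.canonicalUrl, // should match the page's rel=canonical
    datePublished: meta.datePublished,
    author: { "@type": "Person", name: meta.authorName },
  };
}
```

Keeping the `url` property and the canonical URL (set via the Next.js `alternates` metadata) in sync from a single source of truth avoids the attribution ambiguity the paragraph above describes.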

For a complementary view of the content side — how to write prose that gets extracted as a direct citation — see 'How to format your blog post for Perplexity AI citations'.