Website Spec
Required

Meta robots and X-Robots-Tag

Every page must have a deliberate, correct indexing policy: either the implicit default (index, follow) on public pages, or an explicit noindex, set through the robots meta tag or the X-Robots-Tag header, on staging, admin, thin, or private content. Get this wrong and you either disappear from search or expose what you didn't mean to.

What it is

The robots meta tag and the equivalent X-Robots-Tag HTTP header tell search engines and other compliant crawlers whether they may index a page, follow its links, and how to render it in results. Together they are the per-page complement to robots.txt — robots.txt controls crawling, robots meta controls indexing.

<meta name="robots" content="index, follow">
<meta name="robots" content="noindex, nofollow">
<meta name="robots" content="noindex, max-image-preview:large">
X-Robots-Tag: noindex, nofollow
X-Robots-Tag: googlebot: noindex

The HTTP header form is the only way to control indexing for non-HTML resources — PDFs, images, JSON endpoints, downloads.

Why it matters

Every URL a search engine fetches has a policy, whether you set one or not. The default is index, follow: every fetched page becomes a candidate for the index, and every link is a discovery hop. That is right for your homepage and your articles; it is wrong for staging, admin, thin, or private content.

Public pages are required to be indexable; non-public pages are required to be non-indexable. Either way, the policy must be deliberate and correct.

How to implement

Add the meta tag in <head> for every HTML page that needs a non-default policy:

<meta name="robots" content="noindex, follow">

follow keeps link equity flowing to the destinations even when the host page is not indexed. Pair it with noindex for staging, internal search, and faceted listings.

Use directives precisely. The most useful ones, with the values search engines respect:

| Directive | Effect |
|---|---|
| index / noindex | Allow / disallow indexing of this page |
| follow / nofollow | Allow / disallow following links on this page |
| noarchive | Don't show a cached copy in results |
| nosnippet | Don't show a text snippet in results |
| max-snippet:[n] | Cap the snippet at n characters; -1 means no limit |
| max-image-preview:[none\|standard\|large] | Control image-preview size |
| max-video-preview:[n] | Cap video preview at n seconds |
| noimageindex | Don't index images on this page |
| unavailable_after:[date] | Drop from the index after this RFC 822 / ISO 8601 date |
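The directive table lends itself to a small lint check. The following is a minimal sketch, not a complete validator: `invalid_directives` is a hypothetical helper, it only knows the directives listed in the table, it does not verify the date format of unavailable_after, and it assumes the date contains no commas (as in an ISO 8601 date).

```python
import re

# Directives taken from the table above; anything else is reported as invalid.
SIMPLE = {"index", "noindex", "follow", "nofollow",
          "noarchive", "nosnippet", "noimageindex"}
VALUED = {
    "max-snippet": re.compile(r"-?\d+"),
    "max-video-preview": re.compile(r"-?\d+"),
    "max-image-preview": re.compile(r"none|standard|large"),
    "unavailable_after": re.compile(r".+"),  # date format is not checked here
}

def invalid_directives(content: str) -> list[str]:
    """Return the tokens of a robots `content` value that the table does not allow."""
    bad = []
    for raw in content.split(","):
        token = raw.strip().lower()
        if not token:
            continue
        name, sep, value = token.partition(":")
        name, value = name.strip(), value.strip()
        if not sep and name in SIMPLE:
            continue  # plain directive such as "noindex"
        if sep and name in VALUED and VALUED[name].fullmatch(value):
            continue  # valued directive such as "max-snippet:50"
        bad.append(token)
    return bad
```

Running it over templates in CI would flag typos such as `no-follow` before they ship.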

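Target a specific crawler when you need to. Replace robots with googlebot, bingbot, applebot, or another user-agent token. A robots directive applies to all crawlers; a named-bot directive adds rules for that bot only. Google documents that it combines the two and, where they conflict, applies the more restrictive rule, so a named-bot tag cannot loosen a generic noindex.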

<meta name="robots" content="noindex">
<meta name="googlebot-news" content="noindex, nofollow">
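To see which directives a given crawler ends up with, the tags can be collected and merged. This is a rough sketch that assumes the crawler combines the generic robots tag with its own named tag, which is the behavior Google documents; other crawlers may resolve them differently. `RobotsMetaCollector` and `effective_directives` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser

class RobotsMetaCollector(HTMLParser):
    """Collect the directives of every <meta name=... content=...> tag, keyed by name."""

    def __init__(self):
        super().__init__()
        self.by_name: dict[str, set[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        content = attributes.get("content") or ""
        if name:
            directives = {d.strip().lower() for d in content.split(",") if d.strip()}
            self.by_name.setdefault(name, set()).update(directives)

def effective_directives(html: str, bot: str) -> set[str]:
    """Union of the generic `robots` directives and those addressed to `bot`."""
    collector = RobotsMetaCollector()
    collector.feed(html)
    generic = collector.by_name.get("robots", set())
    named = collector.by_name.get(bot.lower(), set())
    return generic | named
```

For the two tags shown above, `effective_directives(page, "googlebot-news")` yields both noindex and nofollow, while any other bot gets only noindex.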

Use X-Robots-Tag for non-HTML. Cloudflare, Nginx, Apache, and CDN edge config can all set it per-path. Required for PDFs you don't want indexed.

X-Robots-Tag: noindex, nofollow
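Where the header is set in application code rather than in server or CDN config, the per-path rule can be as small as the sketch below. The extension list and the `robots_headers` helper are assumptions for illustration; the real list depends on which files you want kept out of the index.

```python
from pathlib import PurePosixPath

# Hypothetical policy: file types that should stay out of the index.
NOINDEX_EXTENSIONS = {".pdf", ".json", ".zip"}

def robots_headers(url_path: str) -> dict[str, str]:
    """Return extra response headers for a URL path (empty when the default applies)."""
    suffix = PurePosixPath(url_path).suffix.lower()
    if suffix in NOINDEX_EXTENSIONS:
        return {"X-Robots-Tag": "noindex, nofollow"}
    return {}
```

The returned dictionary can be merged into whatever response object the framework in use provides.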

Don't combine robots.txt: Disallow with noindex. If a URL is disallowed in robots.txt, crawlers never fetch it — and therefore never see the noindex. The page can still be indexed from external links, with no snippet. To truly de-index, allow crawl and serve noindex.
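This conflict is easy to check mechanically. The sketch below uses Python's standard `urllib.robotparser` to ask whether a crawler is allowed to fetch a URL that is meant to carry noindex; `noindex_is_reachable` is a hypothetical helper name. The standard-library parser does not implement every matching extension that individual search engines support, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

def noindex_is_reachable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """True if robots.txt lets `user_agent` fetch `url`, so a noindex on it can be seen."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A URL that serves noindex but returns False here is disallowed from crawling, so the noindex will never be read.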

Default policy is implicit. A public page with no robots meta tag is index, follow. You do not need to add <meta name="robots" content="index, follow"> — it's the default, and shipping it on every page is noise. Add the tag only when the policy differs.
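In a template layer this rule reduces to emitting nothing for public pages. A minimal sketch, assuming hypothetical page categories (`staging`, `internal-search`, `faceted-listing`) and a hypothetical helper `robots_meta_tag`:

```python
from typing import Optional

# Hypothetical page categories that should be kept out of the index.
NON_PUBLIC = {"staging", "internal-search", "faceted-listing"}

def robots_meta_tag(page_kind: str) -> Optional[str]:
    """Return the robots meta tag for a page, or None when the default applies."""
    if page_kind in NON_PUBLIC:
        return '<meta name="robots" content="noindex, follow">'
    return None  # public page: index, follow is implied, so emit no tag
```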

Common mistakes

Verification

Sources