Website Spec
Recommended

robots.txt

A plain-text file at the site root that tells crawlers which paths they may or may not fetch. Standardised in RFC 9309 and supported by every major search engine.

What it is

robots.txt is a plain-text file served at the root of a host that tells automated crawlers which URL paths they are allowed to fetch. The format was standardised in 2022 as RFC 9309.

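It must be reachable at exactly /robots.txt (all lowercase) at the root of each host, encoded as UTF-8, served as text/plain, and returned with a 200 OK. Rules on example.com do not cover shop.example.com. Crawlers should follow at least five consecutive redirects to reach the file. Per RFC 9309, a 4xx response such as 404 means the file is unavailable, and crawlers may fetch anything. A 5xx response or a network error means the file is unreachable, and crawlers must assume the whole site is disallowed.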
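The status-code handling above amounts to a small lookup on the crawler's side. This is a minimal Python sketch following RFC 9309. The function name robots_policy and its return labels are made up for illustration. A network error has no status code, but it should be treated the same as a 5xx.

```python
def robots_policy(status: int) -> str:
    """Map the HTTP status of a /robots.txt fetch to the action RFC 9309 prescribes.

    The returned labels are illustrative names, not part of any library.
    """
    if 200 <= status < 300:
        return "parse"         # success: obey the rules in the body
    if 300 <= status < 400:
        return "follow"        # redirect: follow it; after more than five hops
                               # a crawler may treat the file as unavailable
    if 400 <= status < 500:
        return "allow_all"     # "unavailable": the crawler may fetch anything
    if 500 <= status < 600:
        return "disallow_all"  # "unreachable": assume complete disallow
    raise ValueError(f"unexpected HTTP status: {status}")


# Usage: print the action for a few typical responses.
for code in (200, 301, 404, 503):
    print(code, robots_policy(code))
```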

User-agent: *
Disallow: /admin/
Disallow: /cart
Allow: /admin/public-policy

Sitemap: https://example.com/sitemap.xml
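The Allow line in this example works because of RFC 9309's matching rule. Within a group, the rule with the longest matching path wins, and Allow wins a tie. That keeps /admin/public-policy crawlable even though /admin/ is disallowed. The Python sketch below implements that rule, including the * and $ special characters the RFC defines. It makes two assumptions. First, the applicable group's rules have already been extracted as (kind, path) pairs. Second, url_path is the URL's path plus any query string, already normalised. is_allowed and rule_regex are hypothetical names.

```python
import re


def rule_regex(path: str) -> re.Pattern:
    """Compile an Allow/Disallow path into a regex anchored at the start of the URL path.

    '*' matches any run of characters and a trailing '$' anchors the end,
    as in RFC 9309; every other character is matched literally.
    """
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(piece) for piece in path.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], url_path: str) -> bool:
    """Decide whether url_path may be fetched under one group's rules.

    The longest matching rule wins; on a tie, Allow beats Disallow.
    If no rule matches, the path is allowed.
    """
    if url_path == "/robots.txt":
        return True  # the file itself is always fetchable
    best_length, allowed = -1, True
    for kind, path in rules:
        if not path:
            continue  # an empty value ("Disallow:") matches nothing
        if rule_regex(path).match(url_path):
            length = len(path.encode("utf-8"))  # specificity = octets in the rule
            is_allow = kind == "allow"
            if length > best_length or (length == best_length and is_allow):
                best_length, allowed = length, is_allow
    return allowed


# The * group from the example above, as (kind, path) pairs.
rules = [
    ("disallow", "/admin/"),
    ("disallow", "/cart"),
    ("allow", "/admin/public-policy"),
]
for p in ("/admin/public-policy", "/admin/users", "/cart", "/cartoons", "/blog/"):
    print(p, is_allowed(rules, p))
```

Note the /cartoons result. Rules are prefix matches, so Disallow: /cart also blocks any path that merely starts with /cart. Use a trailing slash (/cart/) or an end anchor (/cart$) when only that directory or that exact path is meant.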

Why it matters

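It is not a security mechanism. Compliance is voluntary, and the file itself is public, so listing a path in it advertises that the path exists. Anything you do not want public must be behind authentication. Disallowing a URL also does not keep it out of search results. It can still be indexed, without its content, if other pages link to it. To keep a page out of the index, let crawlers fetch it and serve a noindex directive (a robots meta tag or an X-Robots-Tag header) instead.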

How to implement

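Group rules by user-agent. Per RFC 9309, a crawler obeys only the group whose User-agent line matches its product token, compared case-insensitively. If several groups match, their rules are combined. The crawler falls back to the * group only when no group names it. Groups are not merged with *. In the example below, Googlebot follows only its own group, so it is still allowed to crawl /admin/.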

User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap_index.xml
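Here is a minimal Python sketch of group selection. It assumes the crawler identifies itself by a bare product token such as Googlebot, not a full User-Agent header. It splits the file into groups, combines every group that names the token, and falls back to the * group otherwise. parse_groups and rules_for are hypothetical names, and the parser is deliberately simple and lenient.

```python
def parse_groups(text: str) -> list[tuple[list[str], list[tuple[str, str]]]]:
    """Split robots.txt text into (user_agents, rules) groups.

    Consecutive User-agent lines share one group; a User-agent line that
    follows a rule starts a new group. Other records such as Sitemap are skipped.
    """
    groups = []
    agents: list[str] = []
    rules: list[tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if rules:  # the previous group is complete
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())  # user-agent matching is case-insensitive
        elif key in ("allow", "disallow") and agents:
            rules.append((key, value))
    if agents:
        groups.append((agents, rules))
    return groups


def rules_for(text: str, product_token: str) -> list[tuple[str, str]]:
    """Return the rules a crawler with this product token must obey.

    Every group naming the token is combined; the * group is used only
    when no group names the token.
    """
    groups = parse_groups(text)
    token = product_token.lower()
    chosen = [rules for agents, rules in groups if token in agents]
    if not chosen:
        chosen = [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in chosen for rule in rules]


robots = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap_index.xml
"""
print(rules_for(robots, "Googlebot"))  # [('disallow', '/private/')]
print(rules_for(robots, "Bingbot"))    # [('disallow', '/admin/'), ('allow', '/')]
```

Run on the example above, Googlebot gets only Disallow: /private/, so nothing stops it from crawling /admin/. To block /admin/ for Googlebot too, repeat that rule in its own group.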

Rules to follow:

Common mistakes

Verification
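The requirements listed under What it is (200 OK, text/plain, UTF-8) can be checked with a short script. This is a sketch using only Python's standard library. check_robots_response and fetch_and_check are hypothetical names. urlopen follows redirects on its own, so the status checked is that of the final response.

```python
import urllib.error
import urllib.request


def check_robots_response(status: int, content_type: str, body: bytes) -> list[str]:
    """Return a list of problems with a /robots.txt response (empty if none)."""
    problems = []
    if status != 200:
        problems.append(f"expected 200 OK, got {status}")
    media_type = content_type.split(";", 1)[0].strip().lower()  # drop "; charset=..."
    if media_type != "text/plain":
        problems.append(f"expected text/plain, got {content_type!r}")
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("body is not valid UTF-8")
    return problems


def fetch_and_check(origin: str) -> list[str]:
    """Fetch <origin>/robots.txt (e.g. origin="https://example.com") and check it."""
    url = origin.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return check_robots_response(
                resp.status, resp.headers.get("Content-Type", ""), resp.read()
            )
    except urllib.error.HTTPError as err:  # 4xx/5xx responses arrive as exceptions
        headers = err.headers or {}
        return check_robots_response(err.code, headers.get("Content-Type", ""), err.read())
    except urllib.error.URLError as err:  # DNS failure, refused connection, etc.
        return [f"could not fetch {url}: {err.reason}"]
```

Calling fetch_and_check("https://example.com") against your own host returns an empty list when all three requirements are met.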

Sources