SEO
Search visibility — robots.txt, sitemaps, canonicals, structured data.
- Recommended robots.txt — A plain-text file at the site root that tells crawlers which paths they may or may not fetch. Standardised in RFC 9309 and supported by every major search engine.
- Recommended XML sitemaps — An XML file listing the canonical URLs of a site, with optional metadata about when each was last changed. The fastest way to tell a search engine what exists.
- Recommended Sitemap index files — A sitemap of sitemaps. Needed when a site exceeds the per-file limit of 50,000 URLs (or 50 MB uncompressed), and useful for splitting sitemaps by content type for cleaner reporting.
- Optional Image and video sitemap extensions — Optional XML extensions that add image and video metadata to sitemap entries. Useful when media is loaded by JavaScript or hosted on a CDN that crawlers cannot reach by following links.
- Recommended URL structure — URLs are the most stable identifier on the web. Keep them lowercase, hyphenated, descriptive, and shallow. Treat them as a public API for your content.
- Required Redirects (301/302/307/308) — HTTP redirects send a client from one URL to another. Use 301 or 308 for permanent moves and 302 or 307 for temporary ones, and keep chains short: ideally each old URL reaches its final destination in a single hop.
- Avoid Soft 404s — A page that looks like a 'not found' message to a user but returns 200 OK to a crawler. Search engines treat soft 404s as a quality problem and often refuse to index them.
- Required Meta robots and X-Robots-Tag — Every page must have a deliberate, correct indexing policy: the implicit default (index, follow) on public pages, or an explicit noindex, set through the robots meta tag or the X-Robots-Tag header, on staging, admin, thin, or private content. Get this wrong and you either disappear from search or expose what you didn't mean to.
- Required Heading hierarchy — Headings describe the sections of a page. They must form a nested outline, never be used for visual styling alone, and never skip levels.
- Recommended Internal linking — Links from one page on a site to another. The strongest signal you control for telling crawlers and AI agents what a page is about and how important it is.
- Recommended Structured data (JSON-LD) — Machine-readable annotations that describe the content of a page using the schema.org vocabulary. JSON-LD is the format search engines and AI agents expect.
- Recommended Breadcrumbs — A short trail showing the page's position in the site hierarchy. Visible in the UI for users, marked up as BreadcrumbList JSON-LD for search engines.
- Optional IndexNow — An open protocol for telling participating search engines that a URL has changed. One HTTP request notifies Bing, Yandex, Naver, and Seznam that the URL should be recrawled. Google does not participate.
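
For the robots.txt item above, a quick way to sanity-check a rule set before deploying it is to run it through Python's standard-library parser. This is a minimal sketch: the file body, the `ExampleBot` user agent, and the example.com URLs are all hypothetical, and real crawlers may resolve overlapping Allow/Disallow rules differently from `urllib.robotparser`, which applies the first matching rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; paths and the sitemap URL are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body held in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Check whether a (hypothetical) crawler may fetch specific URLs.
admin_allowed = parser.can_fetch("ExampleBot", "https://example.com/admin/users")
blog_allowed = parser.can_fetch("ExampleBot", "https://example.com/blog/post")

# Sitemap lines declared in the file (site_maps() needs Python 3.8+).
sitemaps = parser.site_maps()

print(admin_allowed, blog_allowed, sitemaps)
```

Keep in mind that robots.txt controls fetching, not indexing; the meta robots / X-Robots-Tag item covers the latter.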
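
The XML sitemap and sitemap index items can be sketched with the standard library alone. The function names (`build_sitemap`, `split_into_files`) and the example.com URLs below are assumptions made for illustration; the namespace and the 50,000-URL per-file limit come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemap protocol


def build_sitemap(entries):
    """Return sitemap XML (bytes) for (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # W3C date format, e.g. 2024-05-01
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def split_into_files(entries, size=MAX_URLS_PER_FILE):
    """Chunk entries so each chunk fits in one sitemap file.

    With more than one chunk, each would be written to its own file and
    listed in a sitemap index.
    """
    return [entries[i:i + size] for i in range(0, len(entries), size)]


# Hypothetical canonical URLs.
pages = [
    ("https://example.com/", "2024-05-01"),
    ("https://example.com/blog/hello-world", None),
]
xml_bytes = build_sitemap(pages)
print(xml_bytes.decode("utf-8"))
```

Only canonical URLs belong in the output; redirecting or noindexed URLs would send crawlers mixed signals.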
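
The heading hierarchy rule (a nested outline, no skipped levels) lends itself to an automated check. The sketch below is one possible approach rather than a complete validator: `HeadingCollector` and `skipped_levels` are hypothetical names, and it only flags jumps of more than one level downward, treating a page that does not start at h1 as a skip from level 0.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every h1..h6 start tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; match h1..h6 but not e.g. <hr>.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def skipped_levels(html: str):
    """Return (previous_level, level) pairs where the outline skips a level."""
    collector = HeadingCollector()
    collector.feed(html)
    problems = []
    previous = 0  # 0 means "no heading seen yet", so the first must be h1
    for level in collector.levels:
        if level > previous + 1:
            problems.append((previous, level))
        previous = level
    return problems


good = "<h1>Guide</h1><h2>Setup</h2><h3>Linux</h3><h2>Usage</h2>"
bad = "<h1>Guide</h1><h2>Setup</h2><h4>Linux</h4>"
print(skipped_levels(good))
print(skipped_levels(bad))
```

Moving back up the outline (h3 followed by h2) is legitimate, which is why only downward jumps are reported.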
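
For the structured data and breadcrumbs items, the following sketch builds a BreadcrumbList in JSON-LD from a trail of (name, URL) pairs. The helper names and the example.com trail are assumptions; the `@context`, `@type`, `itemListElement`, `position`, `name`, and `item` keys are the schema.org terms for this type.

```python
import json


def breadcrumb_jsonld(trail):
    """Build a schema.org BreadcrumbList from (name, url) pairs, root first."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)
        ],
    }


def as_script_tag(data) -> str:
    """Serialise JSON-LD for embedding in an HTML page."""
    # Escape "</" so a string value can never close the script element early;
    # "<\/" is a valid JSON escape for the same characters.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical trail for a blog post.
trail = [
    ("Home", "https://example.com/"),
    ("Blog", "https://example.com/blog"),
    ("Hello world", "https://example.com/blog/hello-world"),
]
breadcrumbs = breadcrumb_jsonld(trail)
print(as_script_tag(breadcrumbs))
```

The markup should mirror the breadcrumb trail that users actually see on the page, in the same order.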