Website Spec
Optional

Content Signals in robots.txt

Add Content-Signal directives to robots.txt to declare whether AI crawlers may search, ingest, or train on your content. This is an emerging IETF AI Preferences / IAB Tech Lab proposal that some validators already check.

What it is

Content Signals is a proposed extension to robots.txt that adds directives expressing how a site wants its content treated downstream, specifically by AI systems. The directives live in normal robots.txt groups and declare boolean preferences such as "you may use this for search" or "you may not train on this".

User-agent: *
Allow: /
Content-Signal: search=yes, ai-input=yes, ai-train=no

The three values that are emerging as canonical are search, ai-input, and ai-train.

Each takes yes or no. Multiple values are comma-separated on a single Content-Signal: line.
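The syntax described above is small enough to parse by hand. The following is a minimal Python sketch, not a reference implementation: the function name parse_content_signal is hypothetical, and both the case-insensitive matching and the strict rejection of anything other than yes or no are assumptions, since the proposal is not final.

```python
def parse_content_signal(line: str) -> dict[str, bool]:
    """Parse one 'Content-Signal: name=yes, name=no' line into {name: bool}."""
    field, sep, value = line.partition(":")
    if not sep or field.strip().lower() != "content-signal":
        raise ValueError(f"not a Content-Signal line: {line!r}")

    signals: dict[str, bool] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        name, eq, setting = item.partition("=")
        setting = setting.strip().lower()
        # Assumption: only 'yes' and 'no' are valid; anything else is an error.
        if not eq or setting not in ("yes", "no"):
            raise ValueError(f"expected name=yes or name=no, got {item!r}")
        signals[name.strip().lower()] = setting == "yes"
    return signals


print(parse_content_signal("Content-Signal: search=yes, ai-input=yes, ai-train=no"))
# {'search': True, 'ai-input': True, 'ai-train': False}
```

A more lenient parser might skip unknown values instead of raising; which behaviour is appropriate depends on how the final vocabulary treats extensions.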

Status of the proposal

This is not yet a settled standard. The work is split across two ongoing efforts: the IETF AI Preferences work and the IAB Tech Lab proposal.

Drafts have been circulating since 2024 and the vocabulary is converging. Treat Content Signals as worth experimenting with, not as a finalised standard. Every crawler that does not yet parse the directive will ignore it, and today that is most of them.
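To illustrate what "ignored" looks like in practice, here is a small sketch using Python's standard urllib.robotparser, which does not know about Content-Signal. The bot name ExampleBot, the example.com URLs, and the /private/ path are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Content-Signal: search=yes, ai-input=yes, ai-train=no
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The unrecognised Content-Signal line is skipped silently;
# the Disallow rule in the same group still applies as usual.
public_ok = parser.can_fetch("ExampleBot", "https://example.com/docs/")
private_ok = parser.can_fetch("ExampleBot", "https://example.com/private/report")
print(public_ok, private_ok)
```

The parser raises no error and exposes no trace of the signal, so a crawler built on a parser like this has no way to honour the preference until it is updated.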

Why it matters

How to implement

Per-group, in robots.txt. Place the Content-Signal: line inside the same group as User-agent: and Allow: / Disallow:.

User-agent: *
Allow: /
Content-Signal: search=yes, ai-input=yes, ai-train=yes

The example above says: "any crawler may use this content for search, for AI input, and for AI training." That is the right declaration for a public spec that wants to be readable.

Different signals per crawler if your policy varies. Use a targeted group:

User-agent: GPTBot
Allow: /
Content-Signal: search=yes, ai-input=yes, ai-train=no

User-agent: *
Allow: /
Content-Signal: search=yes, ai-input=yes, ai-train=yes
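A crawler that supports the directive has to decide which group's signals apply to it. The sketch below shows one way that lookup could work. It is an assumption-laden illustration: the function name signals_for is hypothetical, agent names are matched exactly (case-insensitively) rather than by the looser matching real crawlers may use, and a matching named group is assumed to replace the * group entirely, mirroring how robots.txt groups are normally selected.

```python
def signals_for(robots_txt: str, user_agent: str) -> dict[str, str]:
    """Return the Content-Signal values from the group that applies to user_agent."""
    groups: list[tuple[list[str], dict[str, str]]] = []
    agents: list[str] = []
    signals: dict[str, str] = {}
    in_rules = False  # True once the current group has a non-User-agent line

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                groups.append((agents, signals))
                agents, signals, in_rules = [], {}, False
            agents.append(value.lower())
        else:
            in_rules = True
            if field == "content-signal":
                for item in value.split(","):
                    name, eq, setting = item.partition("=")
                    if eq:
                        signals[name.strip().lower()] = setting.strip().lower()
    groups.append((agents, signals))

    # Assumption: an exact agent match wins outright; otherwise fall back to '*'.
    wanted = user_agent.lower()
    fallback: dict[str, str] = {}
    for group_agents, group_signals in groups:
        if wanted in group_agents:
            return group_signals
        if "*" in group_agents:
            fallback = group_signals
    return fallback


ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /
Content-Signal: search=yes, ai-input=yes, ai-train=no

User-agent: *
Allow: /
Content-Signal: search=yes, ai-input=yes, ai-train=yes
"""

print(signals_for(ROBOTS_TXT, "GPTBot")["ai-train"])    # no
print(signals_for(ROBOTS_TXT, "OtherBot")["ai-train"])  # yes
```

Under these assumptions GPTBot gets the stricter policy from its own group and every other crawler falls back to the * group.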

Pair with crawler-specific blocks where the answer is "no". Content Signals is a hint; many crawlers still only obey Disallow:. A User-agent: GPTBot group that contains both Disallow: / and Content-Signal: ai-train=no is stronger than either line alone.
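If robots.txt is generated rather than hand-written, the pairing can be produced in one place so the two lines never drift apart. This is a hypothetical helper (render_group is not part of any library), shown only as a sketch of the idea.

```python
def render_group(user_agent: str, signals: dict[str, bool], disallow: list[str]) -> str:
    """Build one robots.txt group that states Disallow rules and a Content-Signal."""
    signal_text = ", ".join(
        f"{name}={'yes' if allowed else 'no'}" for name, allowed in signals.items()
    )
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow]
    lines.append(f"Content-Signal: {signal_text}")
    return "\n".join(lines) + "\n"


print(render_group("GPTBot", {"ai-train": False}, ["/"]), end="")
# User-agent: GPTBot
# Disallow: /
# Content-Signal: ai-train=no
```

Crawlers that only understand Disallow: are kept out by the second line, while crawlers that parse Content-Signal: also see the stated reason.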

Don't treat it as having legal force. It is a declaration. Compliance is voluntary, and the legal status of "you used my content for training despite my Content-Signal" is still developing.

Common mistakes

Verification

Sources