Understanding the four AI preference categories defined by the IETF AIPREF standard
The AIPREF standard defines four preference categories that allow content owners to express granular control over how automated systems and AI models use their content. Each category targets a specific type of usage, enabling precise permission management.
The bots preference applies to general automated processing and analysis of content by non-human agents. This is the broadest category and encompasses any automated system that accesses and processes your content.
Set bots=n to prevent general automated crawling; search indexing and AI training can still be allowed through the more specific preferences. Set bots=y to permit general automated access.
The train-ai preference controls whether your content can be used to train machine learning and AI models. This applies to all types of AI training, including both generative and non-generative models.
Set train-ai=n to opt out of all AI training. This is the most common preference for protecting proprietary content or creative works. Note that train-genai provides more specific control for generative AI.
The train-genai preference is a more specific control that applies exclusively to training models that generate synthetic content. This preference takes precedence over the broader train-ai category for generative AI use cases.
The train-genai preference is more specific than train-ai. According to AIPREF conflict resolution rules, more specific preferences override general ones:
train-ai=y, train-genai=n means non-generative AI training is allowed, but generative AI training is not.
train-ai=n, train-genai=y means generative AI training is allowed, but other AI training is not (an uncommon scenario).
Many content creators set train-genai=n to specifically prevent their creative works from being used in generative AI systems while potentially allowing other forms of AI training (analytics, recommendations, etc.).
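The "more specific overrides general" rule can be sketched in a few lines of Python. This is an illustrative sketch only, not the normative AIPREF algorithm: the PARENT table and the resolve function are hypothetical names, and the parent relationships are inferred from the category descriptions above (train-genai under train-ai, with train-ai and search under bots).

```python
from typing import Dict, Optional

# Assumed hierarchy, inferred from the category descriptions:
# each category maps to the broader category it falls back to.
PARENT: Dict[str, str] = {
    "train-genai": "train-ai",
    "train-ai": "bots",
    "search": "bots",
}


def resolve(prefs: Dict[str, str], category: str) -> Optional[str]:
    """Return 'y', 'n', or None (unstated) for the given category.

    The most specific stated preference wins; if the category itself
    is unstated, the lookup falls back to its broader parent.
    """
    current: Optional[str] = category
    while current is not None:
        value = prefs.get(current)
        if value in ("y", "n"):
            return value
        current = PARENT.get(current)
    return None


prefs = {"train-ai": "y", "train-genai": "n"}
print(resolve(prefs, "train-genai"))  # n
print(resolve(prefs, "train-ai"))     # y
```

With only train-ai=n stated, resolve returns n for train-genai as well, because the unstated specific category falls back to the broader one in this sketch.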
The search preference controls whether search engines can index your content for search results that direct users back to your original content. This is distinct from AI training because search engines return users to the source.
Search indexing is categorized separately because it creates a fundamentally different value exchange. Search engines drive traffic to your content rather than replacing it. Most content owners want search indexing (search=y) even when blocking AI training.
Most public websites set search=y to allow search engine indexing. Set search=n for private content, paywalled sections, or internal documentation that should not appear in search results.
Each preference category supports three possible states:
y (Allow): Explicit permission granted. The content owner allows this type of usage.
n (Disallow): Explicit denial. The content owner does not permit this type of usage.
omitted (Unstated): No explicit preference. The content owner has not stated a preference for this category. Systems should apply their default behavior or defer to other policies.
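The three states map naturally onto a tri-state value. The sketch below assumes the comma-separated key=value form used in the examples above (such as train-ai=y, train-genai=n); the parse_preferences function and CATEGORIES tuple are hypothetical names, and silently ignoring unknown keys or values is an assumption of this sketch rather than documented behavior.

```python
from typing import Dict, Optional

# The four categories described in this section.
CATEGORIES = ("bots", "train-ai", "train-genai", "search")


def parse_preferences(expression: str) -> Dict[str, Optional[bool]]:
    """Parse 'key=value' pairs into True (y), False (n), or None (unstated)."""
    # Every category starts out unstated.
    states: Dict[str, Optional[bool]] = {name: None for name in CATEGORIES}
    for item in expression.split(","):
        key, sep, value = item.strip().partition("=")
        # Assumption: unknown keys and malformed values are skipped.
        if sep and key in states and value in ("y", "n"):
            states[key] = value == "y"
    return states


print(parse_preferences("train-ai=n, search=y"))
# {'bots': None, 'train-ai': False, 'train-genai': None, 'search': True}
```

Keeping None distinct from False matters here: an unstated category leaves the decision to the consuming system's defaults, whereas False records an explicit denial.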