AIPREF Generator
Control AI usage of your content

AIPREF Vocabulary

Understanding the four AI preference categories defined by the IETF AIPREF standard

Preference Categories

The AIPREF standard defines four preference categories that allow content owners to express granular control over how automated systems and AI models use their content. Each category targets a specific type of usage, enabling precise permission management.

bots

Automated Processing

The bots preference applies to general automated processing and analysis of content by non-human agents. This is the broadest category and encompasses any automated system that accesses and processes your content.

What It Covers

  • Web crawlers and spiders that scan and index content
  • Automated analysis tools that extract data or metadata
  • Bots that monitor for changes or updates
  • Automated content aggregation systems
  • Any non-search, non-training automated processing

Common Use Cases

Set bots=n to disallow general automated processing; search indexing and AI training can still be allowed through their own, more specific preferences. Set bots=y to permit general automated access.

train-ai

AI Model Training

The train-ai preference controls whether your content can be used to train machine learning and AI models. This applies to all types of AI training, including both generative and non-generative models.

What It Covers

  • Training data collection for machine learning models
  • Fine-tuning existing AI models with your content
  • Building training datasets that include your content
  • Both generative AI (like LLMs) and non-generative AI (like classifiers)
  • Transfer learning and model adaptation using your content

Common Use Cases

Set train-ai=n to opt out of all AI training. This is a common choice for protecting proprietary content or creative works. For finer control over generative AI specifically, use train-genai.

train-genai

Generative AI Training

The train-genai preference is a more specific control that applies exclusively to training models that generate synthetic content. This preference takes precedence over the broader train-ai category for generative AI use cases.

What It Covers

  • Large Language Models (LLMs) like GPT, Claude, LLaMA
  • Image generation models like DALL-E, Stable Diffusion, Midjourney
  • Code generation models like GitHub Copilot, Amazon CodeWhisperer
  • Audio and video synthesis models
  • Any AI system designed to create new synthetic content

Relationship with train-ai

The train-genai preference is more specific than train-ai. According to AIPREF conflict resolution rules, more specific preferences override general ones:

  • train-ai=y, train-genai=n means non-generative AI training is allowed, but generative AI training is not
  • train-ai=n, train-genai=y means generative AI training is allowed, but other AI training is not (uncommon scenario)
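
The override rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name effective_preference, the FALLBACK table, and the use of a plain dict of "y"/"n" strings are assumptions made for this example. The fallback from train-genai to train-ai when train-genai is unstated follows from train-ai covering all types of AI training.

```python
from typing import Optional

# Assumed fallback chain from a specific category to its broader parent.
# Only the train-genai -> train-ai relationship described above is modeled.
FALLBACK = {"train-genai": "train-ai"}


def effective_preference(prefs: dict[str, str], category: str) -> Optional[str]:
    """Return 'y', 'n', or None (unstated) for the given category.

    The most specific stated preference wins. If the category itself is
    unstated, the broader parent category (if any) is consulted instead.
    """
    current: Optional[str] = category
    while current is not None:
        value = prefs.get(current)
        if value in ("y", "n"):
            return value
        current = FALLBACK.get(current)
    return None


prefs = {"train-ai": "y", "train-genai": "n"}
print(effective_preference(prefs, "train-genai"))  # generative training
print(effective_preference(prefs, "train-ai"))     # other AI training
```

With train-ai=y and train-genai=n, the lookup for train-genai stops at its own explicit value, so the broader train-ai=y never applies to generative training.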

Common Use Cases

Many content creators set train-genai=n to specifically prevent their creative works from being used in generative AI systems while potentially allowing other forms of AI training (analytics, recommendations, etc.).

search

Search Indexing

The search preference controls whether search engines can index your content for search results that direct users back to your original content. This is distinct from AI training because search engines return users to the source.

What It Covers

  • Traditional web search engines (Google, Bing, DuckDuckGo)
  • Site-specific search functionality
  • Search engines that provide links back to your content
  • Discovery systems that help users find your content
  • Any system that indexes for the purpose of directing traffic to you

Why It Is Separate

Search indexing is categorized separately because it creates a fundamentally different value exchange. Search engines drive traffic to your content rather than replacing it. Most content owners want search indexing (search=y) even when blocking AI training.

Common Use Cases

Most public websites set search=y to allow search engine indexing. Set search=n for private content, paywalled sections, or internal documentation that should not appear in search results.

Preference States

Each preference category supports three possible states:

y - Allow

Explicit permission granted. The content owner allows this type of usage.

n - Disallow

Explicit denial. The content owner does not permit this type of usage.

(omitted) - Unstated

No explicit preference. The content owner has not stated a preference for this category. Systems should apply their default behavior or defer to other policies.
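
One way to model the three states in code is a mapping whose values are True (allow), False (disallow), or None (unstated). The sketch below assumes the comma-separated key=value notation used informally on this page (for example "train-ai=y, train-genai=n"); the actual wire format is not specified here, and the name parse_preferences is hypothetical.

```python
from typing import Optional

# The four categories described on this page.
CATEGORIES = ("bots", "train-ai", "train-genai", "search")


def parse_preferences(text: str) -> dict[str, Optional[bool]]:
    """Parse a string such as 'train-ai=n, search=y' into a tri-state mapping.

    True means allow (y), False means disallow (n), and None means the
    category was omitted and is therefore unstated.
    """
    result: dict[str, Optional[bool]] = {name: None for name in CATEGORIES}
    for item in text.split(","):
        key, sep, value = item.strip().partition("=")
        key, value = key.strip(), value.strip()
        # Assumption: unknown categories and malformed values are ignored
        # rather than guessed at, which leaves the category unstated.
        if sep and key in CATEGORIES and value in ("y", "n"):
            result[key] = value == "y"
    return result


parsed = parse_preferences("train-ai=n, search=y")
print(parsed["search"])       # True: explicitly allowed
print(parsed["train-ai"])     # False: explicitly disallowed
print(parsed["bots"])         # None: unstated
```

Keeping None distinct from False matters: an unstated category is not a denial, so a consumer can apply its default behavior or defer to other policies.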

Quick Reference

bots - Automated processing
train-ai - AI model training (all types)
train-genai - Generative AI training (specific)
search - Search engine indexing
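
As a rough sketch of how the four categories might be combined into a single preference string, the Python helper below validates category names and values and emits them in a fixed order. The function name format_preferences and the comma-separated key=value output are assumptions for illustration, mirroring the notation used on this page rather than any specified serialization.

```python
# The four categories from the quick reference, in a fixed output order.
CATEGORIES = ("bots", "train-ai", "train-genai", "search")


def format_preferences(prefs: dict[str, str]) -> str:
    """Build a string such as 'train-ai=n, search=y' from stated preferences.

    Categories missing from `prefs` are omitted from the output, which
    leaves them unstated.
    """
    for name, value in prefs.items():
        if name not in CATEGORIES:
            raise ValueError(f"unknown category: {name!r}")
        if value not in ("y", "n"):
            raise ValueError(f"invalid value for {name}: {value!r}")
    return ", ".join(f"{name}={prefs[name]}" for name in CATEGORIES if name in prefs)


# A public site that wants search traffic but opts out of AI training.
print(format_preferences({"search": "y", "train-ai": "n"}))
```

Rejecting unknown names early is a design choice for this sketch: a typo such as "train_ai" would otherwise silently produce a preference no consumer recognizes.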
