Real-world scenarios and practical examples for implementing AI preferences
Open source projects and public documentation often welcome AI training that improves developer tools and code assistants. This configuration maximizes discoverability and utility. It does not express attribution requirements, which remain governed by the project's license.
Content-Usage: bots=y, train-ai=y, train-genai=y, search=y
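The header value is a comma-separated list of `key=value` members whose values are `y` or `n`. Here is a minimal TypeScript sketch of generating it from a preferences object. The names `UsagePrefs` and `serializeContentUsage` are hypothetical and introduced only for illustration. Any key left undefined is omitted, so that preference stays unstated.

```typescript
// Hypothetical types covering the four vocabulary keys used in this guide.
type UsageKey = "bots" | "train-ai" | "train-genai" | "search";
type UsagePrefs = Partial<Record<UsageKey, boolean>>;

// Fixed output order so generated headers are stable and easy to diff.
const KEY_ORDER: UsageKey[] = ["bots", "train-ai", "train-genai", "search"];

// Serialize preferences into a Content-Usage value such as "train-ai=n, search=y".
// Undefined keys are skipped, which leaves that preference unstated.
function serializeContentUsage(prefs: UsagePrefs): string {
  return KEY_ORDER
    .filter((key) => prefs[key] !== undefined)
    .map((key) => `${key}=${prefs[key] ? "y" : "n"}`)
    .join(", ");
}

const openSource = serializeContentUsage({
  bots: true,
  "train-ai": true,
  "train-genai": true,
  search: true,
});
console.log(`Content-Usage: ${openSource}`);
// Content-Usage: bots=y, train-ai=y, train-genai=y, search=y
```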
Subscription-based content, paywalled articles, and premium resources need protection from AI training while remaining discoverable through search engines to attract subscribers.
Content-Usage: train-ai=n, train-genai=n, search=y
This blocks AI training while allowing search indexing. The bots preference is left unstated, so no preference is expressed about general crawling for non-AI purposes.
// middleware.ts
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export function middleware(request: NextRequest) {
  const response = NextResponse.next();

  // Premium articles: opt out of AI training, keep search indexing
  if (request.nextUrl.pathname.startsWith('/premium')) {
    response.headers.set(
      'Content-Usage',
      'train-ai=n, train-genai=n, search=y'
    );
  }

  return response;
}
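On the consuming side, "unstated" is distinct from both `y` and `n`. A minimal sketch of reading the premium value, assuming a hypothetical helper `parseContentUsage` that is deliberately simpler than a full HTTP Structured Fields parser, shows that `bots` is simply absent rather than allowed:

```typescript
type Pref = "y" | "n";

// Minimal, hypothetical Content-Usage parser (not a full Structured Fields
// implementation). Members that are not key=y or key=n are skipped, and a
// later duplicate key overwrites an earlier one.
function parseContentUsage(value: string): Map<string, Pref> {
  const prefs = new Map<string, Pref>();
  for (const member of value.split(",")) {
    const [rawKey, rawValue] = member.split("=");
    if (rawValue === undefined || rawKey.trim() === "") continue;
    const key = rawKey.trim();
    const val = rawValue.trim();
    if (val === "y" || val === "n") prefs.set(key, val);
  }
  return prefs;
}

// Report one key as allowed, disallowed, or unstated.
function describePref(prefs: Map<string, Pref>, key: string): string {
  const pref = prefs.get(key);
  if (pref === undefined) return `${key}: unstated`;
  return `${key}: ${pref === "y" ? "allowed" : "disallowed"}`;
}

const premium = parseContentUsage("train-ai=n, train-genai=n, search=y");
for (const key of ["bots", "train-ai", "train-genai", "search"]) {
  console.log(describePref(premium, key));
}
// bots: unstated
// train-ai: disallowed
// train-genai: disallowed
// search: allowed
```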
Artists, photographers, writers, and creators want their work discoverable but protected from generative AI that could create derivative works. This configuration targets generative AI training specifically and leaves other uses unstated.
Content-Usage: train-genai=n, search=y
This specifically blocks generative AI training (image generators, text synthesis) while leaving other AI uses unstated. Search indexing remains enabled for discoverability.
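How an unstated category is interpreted depends on the vocabulary's category hierarchy. The sketch below assumes `train-genai` sits under `train-ai`, and that both `train-ai` and `search` sit under `bots`. Under that assumption, an unstated category falls back to its nearest stated ancestor. `PARENT` and `effectivePref` are hypothetical names, and the hierarchy is an illustration to check against the vocabulary specification, not a quotation of it.

```typescript
type Pref = "y" | "n";

// Assumed category hierarchy (child -> parent); illustrative, not normative.
const PARENT: Record<string, string | undefined> = {
  bots: undefined,
  "train-ai": "bots",
  "train-genai": "train-ai",
  search: "bots",
};

// Walk from the requested category toward the root and return the first
// explicitly stated preference, or undefined if none is stated on that path.
function effectivePref(prefs: Map<string, Pref>, category: string): Pref | undefined {
  let current: string | undefined = category;
  while (current !== undefined) {
    const pref = prefs.get(current);
    if (pref !== undefined) return pref;
    current = PARENT[current];
  }
  return undefined;
}

// Content-Usage: train-genai=n, search=y
const creator = new Map<string, Pref>([
  ["train-genai", "n"],
  ["search", "y"],
]);

console.log(effectivePref(creator, "train-genai")); // n
console.log(effectivePref(creator, "train-ai")); // undefined (unstated)
console.log(effectivePref(creator, "search")); // y
```

Under the same assumption, adding `train-ai=y` to this value would resolve `train-ai` to `y` while the more specific `train-genai=n` still applies to generative training.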
Academic institutions and researchers often have nuanced needs depending on publication status, licensing, and institutional policies. Different sections may require different preferences.
Content-Usage: bots=y, train-ai=y, train-genai=y, search=y
Open access publications can allow all AI training to advance scientific discovery and research tools.
Content-Usage: train-ai=n, train-genai=n, search=y
Protect unpublished work while maintaining discoverability through academic search engines.
Content-Usage: bots=n, train-ai=n, train-genai=n, search=n
Block all automated access to proprietary research datasets.
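When different sections of a site need different preferences, one option is to keep a table of path prefixes and pick the longest matching prefix for each request. The prefixes `/open-access/`, `/preprints/` and `/research-data/` below are hypothetical, and so is the function name `contentUsageFor`. Longest-prefix matching keeps the result independent of table order if sections are later nested.

```typescript
// Hypothetical section prefixes for an academic site; adjust to your URLs.
const SECTION_PREFS: Array<[prefix: string, value: string]> = [
  ["/open-access/", "bots=y, train-ai=y, train-genai=y, search=y"],
  ["/preprints/", "train-ai=n, train-genai=n, search=y"],
  ["/research-data/", "bots=n, train-ai=n, train-genai=n, search=n"],
];

// Return the Content-Usage value for the longest matching prefix, or
// undefined when no section matches (so no header needs to be sent).
function contentUsageFor(pathname: string): string | undefined {
  let best: [string, string] | undefined;
  for (const entry of SECTION_PREFS) {
    const [prefix] = entry;
    if (pathname.startsWith(prefix) && (!best || prefix.length > best[0].length)) {
      best = entry;
    }
  }
  return best?.[1];
}

console.log(contentUsageFor("/preprints/2024/draft.pdf")); // train-ai=n, train-genai=n, search=y
console.log(contentUsageFor("/about")); // undefined
```

In the Next.js middleware shown earlier, the result of `contentUsageFor(request.nextUrl.pathname)` could be set as the `Content-Usage` header whenever it is defined.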
Online stores need product pages discoverable through search while protecting proprietary product descriptions, pricing strategies, and customer reviews from AI scraping.
Content-Usage: train-ai=n, train-genai=n, search=y
Allow search indexing for product discovery while protecting unique descriptions and reviews.
Content-Usage: bots=n, train-ai=n, train-genai=n, search=n
Completely block automated access to sensitive business data like real-time pricing.
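User-Agent: *
Allow: /products/
Disallow: /api/
Disallow: /checkout/
# Groups naming the same user agent are merged (RFC 9309), so each
# preference is scoped with a path prefix instead of a separate group.
Content-Usage: /products/ train-ai=n, train-genai=n, search=y
Content-Usage: /api/ bots=n, train-ai=n, train-genai=n, search=n
Content-Usage: /checkout/ bots=n, train-ai=n, train-genai=n, search=n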
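Robots.txt parsers merge every group that names the same user agent (RFC 9309), so separate `User-Agent: *` groups cannot scope preferences to different paths. The sketch below generates a single group from one rule list, which keeps crawl rules and usage preferences from drifting apart. It assumes the path-prefixed `Content-Usage: <path> <prefs>` form proposed in the IETF AI preferences attachment draft; `PathRule`, `STORE_RULES` and `buildRobotsTxt` are hypothetical names.

```typescript
// Hypothetical per-path rules for an online store.
interface PathRule {
  path: string;
  crawl: "allow" | "disallow";
  usage: string; // Content-Usage preferences for this path
}

const STORE_RULES: PathRule[] = [
  { path: "/products/", crawl: "allow", usage: "train-ai=n, train-genai=n, search=y" },
  { path: "/api/", crawl: "disallow", usage: "bots=n, train-ai=n, train-genai=n, search=n" },
  { path: "/checkout/", crawl: "disallow", usage: "bots=n, train-ai=n, train-genai=n, search=n" },
];

// Emit one User-Agent group: crawl rules first, then path-scoped
// Content-Usage lines (assumed syntax: "Content-Usage: <path> <prefs>").
function buildRobotsTxt(rules: PathRule[]): string {
  const lines = ["User-Agent: *"];
  for (const rule of rules) {
    lines.push(`${rule.crawl === "allow" ? "Allow" : "Disallow"}: ${rule.path}`);
  }
  for (const rule of rules) {
    lines.push(`Content-Usage: ${rule.path} ${rule.usage}`);
  }
  return lines.join("\n") + "\n";
}

console.log(buildRobotsTxt(STORE_RULES));
```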
News organizations want articles discoverable through search and news aggregators while protecting original reporting from AI summarization that could reduce direct readership.
Content-Usage: train-genai=n, search=y
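This opts out of generative AI training, the category covering models that could summarize or reproduce original reporting, while allowing search indexing. The train-ai preference is left unstated, so no preference is expressed about other AI training, such as training fact-checking models. Adding train-ai=y would allow that explicitly, while train-genai=n continues to exclude generative training.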
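Internal tools, admin panels, and private documentation should opt out of all automated use, including search indexing. These preferences are advisory, so such areas still need authentication to actually restrict access.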
Content-Usage: bots=n, train-ai=n, train-genai=n, search=n
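Quick reference for the patterns above:
Allow all uses: bots=y, train-ai=y, train-genai=y, search=y
No AI training, search allowed: train-ai=n, train-genai=n, search=y
No generative AI training, search allowed: train-genai=n, search=y
Block all automated use: bots=n, train-ai=n, train-genai=n, search=n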