How does AI Crawler Access (llms.txt) work?

Ensure AI platforms can discover, crawl, and understand your store content through proper crawler access configuration.

Written By Tom van den Heuvel

Last updated 6 months ago

What is llms.txt?

llms.txt is a file, based on a proposed convention rather than a formal standard, that gives AI crawlers a directory of your most important content. It serves as a roadmap for AI platforms, highlighting:

  • Your most valuable products and collections

  • Key business pages (shipping, returns, FAQ)

  • Content priorities and categories

  • Store information and contact details

Why AI Crawler Access Matters

Discoverability: Help AI platforms find your best content
Prioritization: Guide crawlers to your most important pages
Efficiency: Reduce crawler resources while maximizing coverage
Control: Specify what content you want AI platforms to focus on

llms.txt Configuration

Accessing the Settings

  1. Go to Settings > AI Crawler Access

  2. Review the llms.txt Directory Configuration section

  3. Note the auto-sync status and last update timestamp

Existing Settings Integration

The system automatically pulls information from:

  • Schema settings (business info, contact details)

  • Brand profile (category, voice, language)

  • Store configuration (currency, policies)

Priority Content Selection

Priority Products (Max 5): Choose your best-performing or most representative products:

  • Best sellers with strong reviews

  • Flagship or signature products

  • Products with comprehensive descriptions

  • Items with competitive advantages

Priority Collections (Max 3): Select collections that represent your brand:

  • Main product categories

  • Seasonal or featured collections

  • Best-selling product groups

Key Pages: Add important informational pages:

  • Shipping and delivery information

  • Returns and exchange policies

  • Size guides and fitting information

  • FAQ and customer support

  • About us and brand story
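The selection limits above (5 products, 3 collections) can be checked before saving. The following is a minimal sketch, not the app's own code: the `PrioritySelection` class, its field names, and the example paths are hypothetical and only illustrate the limits described in this section.

```python
from dataclasses import dataclass, field

# Limits taken from this article: at most 5 priority products and 3 priority collections.
MAX_PRIORITY_PRODUCTS = 5
MAX_PRIORITY_COLLECTIONS = 3


@dataclass
class PrioritySelection:
    """Hypothetical container for the paths chosen in the settings screen."""
    products: list[str] = field(default_factory=list)
    collections: list[str] = field(default_factory=list)
    key_pages: list[str] = field(default_factory=list)  # no limit is stated for key pages

    def problems(self) -> list[str]:
        """Return readable issues; an empty list means the selection fits the limits."""
        issues = []
        if len(self.products) > MAX_PRIORITY_PRODUCTS:
            issues.append(
                f"{len(self.products)} products selected, maximum is {MAX_PRIORITY_PRODUCTS}"
            )
        if len(self.collections) > MAX_PRIORITY_COLLECTIONS:
            issues.append(
                f"{len(self.collections)} collections selected, maximum is {MAX_PRIORITY_COLLECTIONS}"
            )
        return issues


# Example with placeholder paths: six products is one more than the limit allows.
selection = PrioritySelection(
    products=[f"/products/example-{n}" for n in range(1, 7)],
    collections=["/collections/example"],
    key_pages=["/pages/shipping", "/pages/returns"],
)
print(selection.problems())
```

Returning a list of issues rather than raising an error lets a settings form show every problem at once.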

Auto-Generation Features

Automatic Updates:

  • Regenerates when products or collections change

  • Updates when business information is modified

  • Refreshes schema settings integration

  • Maintains current timestamp
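One common way to decide when a regeneration is due is to fingerprint the inputs and compare against the last saved value. The sketch below shows that general technique only. How the app actually detects changes is not documented here, and the function names and the shape of the `settings` dictionary are assumptions.

```python
import hashlib
import json
from typing import Optional


def settings_fingerprint(settings: dict) -> str:
    """Hash the inputs that feed llms.txt so that any change yields a new value."""
    # sort_keys makes the hash independent of dictionary insertion order.
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_regeneration(settings: dict, last_fingerprint: Optional[str]) -> bool:
    """True when the settings differ from those used for the last generated file."""
    return settings_fingerprint(settings) != last_fingerprint


# Hypothetical settings: adding a priority product changes the fingerprint.
current = {"store": "Example Store", "priority_products": ["/products/example-1"]}
saved = settings_fingerprint(current)
current["priority_products"].append("/products/example-2")
print(needs_regeneration(current, saved))
```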

Content Validation:

  • Ensures all links are accessible

  • Validates product and collection availability

  • Checks page existence and accessibility

  • Removes broken or outdated links
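The validation checks above amount to requesting each link and keeping only those that respond successfully. A minimal sketch follows, assuming a 2xx response means "accessible". The helper names are hypothetical, and the app's real checks may differ.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx and 5xx responses arrive as exceptions
    except (OSError, ValueError):
        return 0  # network failure, timeout, or malformed URL


def split_links(
    urls: Iterable[str], get_status: Callable[[str], int] = http_status
) -> tuple[list[str], list[str]]:
    """Separate links into (live, broken) based on their status code."""
    live, broken = [], []
    for url in urls:
        (live if 200 <= get_status(url) < 300 else broken).append(url)
    return live, broken


# Offline example: a fake status lookup stands in for real requests.
fake_statuses = {
    "https://example.com/pages/shipping": 200,
    "https://example.com/products/discontinued": 404,
}
live, broken = split_links(fake_statuses, get_status=fake_statuses.get)
print(live, broken)
```

Passing the status lookup as a parameter keeps the filtering logic testable without network access.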

Crawler Permission Management

Supported AI Crawlers

ChatGPT:

  • User-agent: GPTBot

  • Crawling behavior: Comprehensive content analysis

  • Update frequency: Regular crawling cycles

Claude:

  • User-agent: anthropic-ai

  • Crawling patterns: Focused content extraction

  • Processing: Text-heavy content analysis

Perplexity:

  • User-agent: PerplexityBot

  • Methodology: Real-time content access

  • Focus: Current information and availability

Google Gemini:

  • User-agent: GoogleOther

  • Integration: Google ecosystem alignment

  • Scope: Comprehensive site understanding

Meta AI:

  • User-agent: FacebookBot

  • Social integration: Profile and product linking

  • Platform: Instagram and Facebook integration

Bing Copilot:

  • User-agent: BingBot

  • Microsoft integration: Office and browser compatibility

  • Scope: Productivity-focused content access

All other AI Crawlers:

  • Allowed by default

Access Status Monitoring

The crawler access section shows individual crawler permissions:

Allowed Status: Crawler has full access to your content
Blocked Status: Crawler is restricted from accessing content
Limited Status: Partial access with specific restrictions
Unknown Status: Crawler permission unclear or not configured
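The four statuses can be approximated from a site's robots.txt using Python's standard `urllib.robotparser`. This is a sketch under stated assumptions: that permissions are expressed through robots.txt, that "Limited" means only some sample paths are allowed, and that "Unknown" means no robots.txt is available. The sample paths and the `crawler_status` function are hypothetical.

```python
from typing import Optional, Sequence
from urllib.robotparser import RobotFileParser

# Placeholder paths used to probe access; a real check would use actual store URLs.
SAMPLE_PATHS = ["/products/example", "/collections/example", "/pages/shipping"]

EXAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /products/

User-agent: *
Allow: /
"""


def crawler_status(
    robots_txt: Optional[str], user_agent: str, paths: Sequence[str] = SAMPLE_PATHS
) -> str:
    """Classify a crawler as Allowed, Blocked, Limited, or Unknown."""
    if robots_txt is None:
        return "Unknown"  # nothing to evaluate
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = [parser.can_fetch(user_agent, path) for path in paths]
    if all(allowed):
        return "Allowed"
    if not any(allowed):
        return "Blocked"
    return "Limited"


for agent in ("GPTBot", "PerplexityBot", "anthropic-ai"):
    print(agent, crawler_status(EXAMPLE_ROBOTS, agent))
```

With the example rules, GPTBot is blocked everywhere, PerplexityBot is blocked only under /products/, and any crawler without its own entry falls back to the wildcard rule.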

llms.txt Content Structure

The generated file includes:

Header Information

    # LLM Crawler Guidelines
    # Store: [Your Store Name]
    # Category: [Primary Business Category]
    # Language: [Content Language]
    # Updated: [Current Date]

Priority Content Sections

    Allow: /products/*
    Allow: /collections/*
    Allow: /pages/*

    Priority-Pages:
    - /products/[priority-product-1]
    - /products/[priority-product-2]
    - /collections/[priority-collection-1]
    - /pages/shipping
    - /pages/returns

Business Context

    Store-Description: [Business description from schema]
    Contact: [Customer service email]
    Brand-Voice: [Configured brand tone]
    Primary-Category: [Product category]
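The three parts described above (header, priority content, business context) can be assembled from a handful of settings. The sketch below mirrors the field names shown in this section. The `build_llms_txt` function, its parameters, the ISO date format, and the blank lines between sections are assumptions, not the app's actual generator.

```python
from datetime import date
from typing import Sequence


def build_llms_txt(
    store_name: str,
    category: str,
    language: str,
    updated: date,
    priority_paths: Sequence[str],
    description: str,
    contact_email: str,
    brand_voice: str,
) -> str:
    """Assemble the file text in the order the article lists its sections."""
    header = [
        "# LLM Crawler Guidelines",
        f"# Store: {store_name}",
        f"# Category: {category}",
        f"# Language: {language}",
        f"# Updated: {updated.isoformat()}",  # ISO date format is an assumption
    ]
    access = ["Allow: /products/*", "Allow: /collections/*", "Allow: /pages/*"]
    priority = ["Priority-Pages:"] + [f"- {path}" for path in priority_paths]
    context = [
        f"Store-Description: {description}",
        f"Contact: {contact_email}",
        f"Brand-Voice: {brand_voice}",
        f"Primary-Category: {category}",
    ]
    sections = [header, access, priority, context]
    # Sections are separated by one blank line; the file ends with a newline.
    return "\n\n".join("\n".join(section) for section in sections) + "\n"


# Example with placeholder values.
text = build_llms_txt(
    store_name="Example Store",
    category="Apparel",
    language="en",
    updated=date(2024, 1, 15),
    priority_paths=["/products/example-1", "/collections/example", "/pages/shipping"],
    description="Placeholder store description",
    contact_email="support@example.com",
    brand_voice="Friendly",
)
print(text)
```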

Best Practices

Content Selection:

  1. Choose products with detailed, accurate descriptions

  2. Include items with competitive advantages

  3. Select collections representing your brand range

  4. Add pages that build trust and authority

Update Frequency:

  1. Review priority selections monthly

  2. Update when launching new key products

  3. Refresh seasonal or promotional priorities

  4. Remove discontinued or outdated items

Monitoring and Optimization:

  1. Check crawler access status weekly

  2. Monitor which content gets crawled most

  3. Adjust priorities based on AI mention performance

  4. Validate that priority content drives results