Diagnostic

Reading a robots.txt for AI crawler exposure: the six user agents to check before any other AI search work

A clinic owner can determine in two minutes whether their current site is silently blocking AI crawlers by opening /robots.txt and looking for six specific user agent declarations. Most Wix sites built before 2024 fail this check by default.


The single highest-leverage two-minute diagnostic a specialty business owner can run is to load their own /robots.txt file in a browser: type the domain followed by /robots.txt (for example, https://yourclinic.com/robots.txt). The file should load as plain text. Skim it for six specific user agent names.
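That skim can also be scripted. A minimal sketch in Python, using only the standard library; the domain is a placeholder, and the check is a naive case-insensitive substring scan, so it confirms only that each agent is mentioned at all, not what it is permitted to do:

    # Fetch robots.txt and scan for the six AI user agent names.
    from urllib.request import urlopen

    DOMAIN = "https://yourclinic.com"  # placeholder: use your own domain
    AGENTS = ["GPTBot", "ClaudeBot", "anthropic-ai",
              "PerplexityBot", "Google-Extended", "Bingbot"]

    with urlopen(f"{DOMAIN}/robots.txt") as resp:
        body = resp.read().decode("utf-8", errors="replace").lower()

    for agent in AGENTS:
        status = "mentioned" if agent.lower() in body else "MISSING"
        print(f"{agent:16} {status}")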

GPTBot is the OpenAI crawler that ChatGPT browses through. Without an explicit User-agent: GPTBot block followed by Allow: /, ChatGPT may still crawl the site, but the absence of a deliberate invitation reads as ambiguous consent. Worse, Wix defaults from 2023 silently added Disallow directives for GPTBot, which actively block ChatGPT citation.
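The blocking pattern is easy to spot. Any group like the following in a robots.txt turns the crawler away entirely:

    User-agent: GPTBot
    Disallow: /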

ClaudeBot and anthropic-ai are the two Anthropic user agent strings in historical and current use; both should be allow-listed. PerplexityBot is the Perplexity crawler responsible for the live retrieval that powers most Perplexity answers. Google-Extended controls training and retrieval permissions for Gemini and Google AI Overviews, distinct from the standard Googlebot. Bingbot is required for Bing Copilot and powers ChatGPT's browsing backend through the Bing index.

If any of the six are missing or Disallowed, the corresponding AI engine either skips the site entirely or down-weights the source in retrieval. A common pattern observed during audits is a clinic site that ranks reasonably well on Google for branded queries yet never surfaces in ChatGPT or Perplexity answers because its robots.txt explicitly blocks the AI agents.
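To tell a missing declaration apart from an explicit Disallow, Python's standard-library robots.txt parser can evaluate what each agent is actually permitted to fetch, applying the same group-matching rules crawlers use (an agent with no group of its own falls through to the wildcard rules). A minimal sketch, again with a placeholder domain:

    # Evaluate effective permission per agent, not just presence.
    from urllib.robotparser import RobotFileParser

    DOMAIN = "https://yourclinic.com"  # placeholder: use your own domain
    AGENTS = ["GPTBot", "ClaudeBot", "anthropic-ai",
              "PerplexityBot", "Google-Extended", "Bingbot"]

    rp = RobotFileParser()
    rp.set_url(f"{DOMAIN}/robots.txt")
    rp.read()

    for agent in AGENTS:
        verdict = "allowed" if rp.can_fetch(agent, f"{DOMAIN}/") else "BLOCKED"
        print(f"{agent:16} {verdict}")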

The fix takes a competent developer about ten minutes. The robots.txt file is plain text, and adding explicit Allow directives for the six user agents (plus Applebot, GoogleOther, OAI-SearchBot, CCBot, and a handful of secondary agents) is mechanical work. The harder parts of AI search optimization are content and schema, but those efforts are wasted if robots.txt is blocking the crawler upstream.
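What that mechanical work looks like in the file itself, sketched as a starting point to merge into an existing file rather than a wholesale replacement (any site-specific Disallow rules for private paths should be preserved):

    # Allow-list for AI crawlers; merge into the existing file.
    User-agent: GPTBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: anthropic-ai
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: Google-Extended
    Allow: /

    User-agent: Bingbot
    Allow: /

    User-agent: Applebot
    Allow: /

    User-agent: GoogleOther
    Allow: /

    User-agent: OAI-SearchBot
    Allow: /

    User-agent: CCBot
    Allow: /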


Tags: #diagnostic #robots-txt #crawler-access