How to get cited in ChatGPT
The engineering protocol for getting a specialty business named by ChatGPT when prospects ask buying-intent questions. Six structural changes, ranked by impact, with the technical specifications each one requires.
What ChatGPT actually does at retrieval time
When a user types a query into ChatGPT, the model does not crawl the web in real time. It runs a retrieval step against an existing index, ranks candidate sources by extractability and relevance, and synthesizes an answer paragraph. The names that appear in that answer paragraph are the businesses ChatGPT decided it could quote with confidence.
If your business is not in the index, or is in the index but cannot be extracted cleanly, ChatGPT will not name you. The six requirements below are what move a website from "not in the index" to "extracted, ranked, and named in the answer paragraph."
1. Server rendered HTML the crawler can read without JavaScript
This is the single biggest unlock. The OpenAI crawlers (GPTBot for training, OAI-SearchBot for the retrieval index) fetch the HTML at a URL and parse what comes back. Neither executes JavaScript at retrieval time. If your site is built on Wix, Squarespace, or a client-side React framework, the crawler often receives an empty container and moves on without indexing the content.
Technical specification: the page must render its complete content in the initial HTML response. The acceptable architectures are static site generators (Astro, Eleventy, Next.js with SSG), server rendered frameworks (Next.js with SSR, Astro with SSR, Remix), or plain HTML. The unacceptable architectures are client-side React, Wix, Squarespace without its server-side rendering option, and any single-page application that fetches content via JavaScript after page load.
To verify your site: run curl -A "Mozilla/5.0 (compatible; GPTBot/1.0)" https://your-site.com from a terminal. The response should contain your pricing, services, providers, and treatments as readable text. If you only see container divs waiting to be populated, the site is invisible to ChatGPT.
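The curl check can also be automated. The sketch below fetches a page with a GPTBot-style user agent and applies a crude heuristic: after stripping scripts, styles, and tags, your key facts should survive as plain text. The marker words, URL, and thresholds are illustrative assumptions, not part of any OpenAI specification.

```python
import re
import urllib.request

GPTBOT_UA = "Mozilla/5.0 (compatible; GPTBot/1.0)"

def fetch_as_gptbot(url: str) -> str:
    """Fetch a URL with a GPTBot-style user agent. No JavaScript is executed."""
    req = urllib.request.Request(url, headers={"User-Agent": GPTBOT_UA})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def looks_server_rendered(html: str, markers: list[str]) -> bool:
    """Heuristic: the raw HTML should contain your key facts as visible text.

    Strips script/style blocks and remaining tags, then checks that every
    marker (a price, a treatment name, a city) appears in what is left.
    """
    text = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text).lower()
    return all(m.lower() in text for m in markers)

# Usage (hypothetical site and markers):
# html = fetch_as_gptbot("https://your-site.com")
# print(looks_server_rendered(html, ["$299", "semaglutide", "Austin"]))
```

If the function returns False for facts you know are on the rendered page, the content is arriving via JavaScript and the crawler never sees it.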
2. Complete Schema.org @graph with linked entities
Schema.org structured data tells the model what entities live on the page. Without it, the model has to infer from prose. A complete @graph names every relevant entity, threads them together through @id references, and gives the model a clean entity map to walk.
For a specialty practice, the minimum graph includes:
- The primary business entity (MedicalClinic for clinics, LegalService for law firms, HomeAndConstructionBusiness for contractors) with @id, name, address, telephone, hours.
- Each Physician, Attorney, or named provider as a Person entity with @id, hasCredential references, and worksFor pointing back to the business.
- Each Service the business offers with @id, name, description, and provider linked to the business.
- For medical: Drug or MedicalProcedure entities for each named treatment, with mechanismOfAction and indication.
- Offer entities for pricing with @id, price, priceCurrency, and itemOffered linked to the relevant Service.
- FAQPage with Question and Answer entities for the top buying-intent questions.
- BreadcrumbList for site navigation.
- Speakable specifications pointing to the CSS selectors of the answer capsule and TL;DR.
Ship this as one JSON-LD block in the page head, not as multiple separate scripts. The model parses the @graph holistically.
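A trimmed sketch of what that single block can look like. The names, @id values, and prices are hypothetical placeholders; a real graph would carry every entity in the list above.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "MedicalClinic",
      "@id": "https://your-site.com/#clinic",
      "name": "LeanCare Wellness",
      "address": { "@type": "PostalAddress", "addressLocality": "Austin", "addressRegion": "TX" },
      "telephone": "+1-512-555-0100"
    },
    {
      "@type": "Physician",
      "@id": "https://your-site.com/#dr-example",
      "name": "Dr. Jane Example",
      "worksFor": { "@id": "https://your-site.com/#clinic" }
    },
    {
      "@type": "Service",
      "@id": "https://your-site.com/#glp1-program",
      "name": "Medically supervised GLP-1 program",
      "provider": { "@id": "https://your-site.com/#clinic" }
    },
    {
      "@type": "Offer",
      "@id": "https://your-site.com/#glp1-offer",
      "price": "299",
      "priceCurrency": "USD",
      "itemOffered": { "@id": "https://your-site.com/#glp1-program" }
    }
  ]
}
</script>
```

Note how every cross-reference is an @id pointer rather than repeated prose; that is the entity map the model walks.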
3. Answer paragraphs at the top of every page
The model quotes the first substantive paragraph on a page more often than any other paragraph. If your page opens with marketing flourish ("Welcome to our state-of-the-art clinic..."), the model has nothing to quote. If your page opens with a fact-dense direct answer ("LeanCare Wellness is a GLP-1 clinic in Austin, Texas. Medically supervised semaglutide is $299 per month with monthly check-ins and quarterly labs..."), the model can quote it verbatim.
The format is: 40 to 60 words, plain declarative sentences, named entities, specific facts (price, location, treatment name, eligibility criteria), no qualifiers or marketing language. This is what KailxLabs calls the Answer Capsule.
Place the answer capsule immediately after the H1 of every page that targets a query intent. The model treats top-of-page content as the canonical summary of what the page is about.
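In page terms, the placement looks like this. The copy and CSS class are hypothetical; the class name is what your speakable specification from section 2 would point at.

```html
<h1>GLP-1 Weight Loss Clinic in Austin</h1>
<p class="answer-capsule">
  LeanCare Wellness is a GLP-1 clinic in Austin, Texas. Medically supervised
  semaglutide is $299 per month, including monthly provider check-ins and
  quarterly labs. New patients qualify after an online intake and a
  telehealth consult with a licensed physician.
</p>
```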
4. llms.txt at the domain root
llms.txt is an emerging convention for describing your site to language models in a format they parse easily. The file lives at https://your-site.com/llms.txt, contains a markdown-formatted summary under 3,000 tokens, and lists the high-priority pages the model should index.
A complete llms.txt for a specialty practice includes:
- Business name, location, and primary specialty in the first paragraph.
- List of services with brief descriptions.
- List of providers with credentials.
- Pricing summary.
- Eligibility criteria for each service.
- Links to the most important pages on the site (treatment pages, pricing, about, contact).
- List of external corroborating sources (industry association memberships, trade press, Reddit threads naming the business).
The file is read primarily by OpenAI's training and retrieval systems, Anthropic's Claude, and emerging crawlers from Mistral, Perplexity, and others. It supplements the website itself; it does not replace it.
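A minimal llms.txt along those lines, with hypothetical business details throughout:

```markdown
# LeanCare Wellness

GLP-1 weight loss clinic in Austin, Texas. Medically supervised semaglutide
and tirzepatide programs from $299 per month.

## Services
- Semaglutide program: $299/month, monthly check-ins, quarterly labs
- Tirzepatide program: $449/month

## Providers
- Dr. Jane Example, MD, board-certified in obesity medicine

## Eligibility
- Adults 18+ after online intake and telehealth consult

## Key pages
- [Pricing](https://your-site.com/pricing)
- [Semaglutide in Austin](https://your-site.com/semaglutide-austin)
- [About](https://your-site.com/about)
```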
5. AI crawler permissions in robots.txt
Your robots.txt must explicitly allow the AI crawlers you want indexed by. The defaults of most CMS platforms do not include AI crawler entries, and some plugins actively block them. Audit your robots.txt and verify the following are explicitly allowed:
- OpenAI: GPTBot, OAI-SearchBot, ChatGPT-User
- Anthropic: ClaudeBot, anthropic-ai, Claude-Web
- Perplexity: PerplexityBot, Perplexity-User
- Google AI: Google-Extended (separate from the regular Googlebot; controls whether your content is used in Gemini and Google AI Overviews)
- Microsoft: Bingbot (used by Bing AI and ChatGPT browse mode)
- Apple: Applebot, Applebot-Extended (the Apple Intelligence training crawler)
- Common Crawl: CCBot (the training data corpus most foundation models reference)
- Meta: Meta-ExternalAgent
Each crawler should have an explicit Allow directive. A blanket wildcard Allow without per-crawler entries sometimes fails, because some crawlers look for their own user-agent string in the file.
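A robots.txt fragment covering the highest-priority crawlers above; repeat the two-line pattern for each remaining user agent you want to admit.

```text
# AI crawlers -- explicit per-crawler allows
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /
```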
6. Third party corroboration across the web
The model's retrieval ranker rewards businesses that other reputable sites confirm exist. A site that says "we are the best GLP-1 clinic in Austin" but has zero external corroboration is treated as a weaker candidate than a site with the same content but supported by Reddit threads, directory listings, and trade press.
The minimum corroboration set for a specialty practice in 2026:
- At least 3 Reddit threads on subreddits relevant to the vertical where the business is named (not by the business itself) with substantive context.
- At least 5 directory listings on industry-specific directories with consistent NAP (name, address, phone) data matching the website.
- At least 1 trade press mention in a recognizable publication for the vertical (Medical Economics, ABA Journal, Houzz Pro, etc.).
- At least 1 industry association membership listed on the association's website with a backlink.
- Active and consistent Google Business Profile, Apple Maps, and Yelp listings.
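NAP consistency across listings can be checked mechanically before you submit to directories. A small sketch; the normalization rules (whitespace collapsing, last-10-digit phone comparison) are illustrative assumptions, not a standard.

```python
import re

def normalize_nap(name: str, address: str, phone: str) -> tuple[str, str, str]:
    """Canonicalize name/address/phone so cosmetic differences don't count as mismatches."""
    def norm(s: str) -> str:
        return re.sub(r"\s+", " ", s).strip().lower()
    digits = re.sub(r"\D", "", phone)[-10:]  # drop punctuation and country code
    return norm(name), norm(address), digits

def nap_consistent(listings: list[dict]) -> bool:
    """True when every listing normalizes to the same (name, address, phone)."""
    canon = {normalize_nap(l["name"], l["address"], l["phone"]) for l in listings}
    return len(canon) == 1

# Usage (hypothetical listings):
# site = {"name": "LeanCare Wellness", "address": "100 Main St, Austin, TX", "phone": "(512) 555-0100"}
# yelp = {"name": "leancare wellness", "address": "100 Main St, Austin, TX", "phone": "+1 512-555-0100"}
# nap_consistent([site, yelp])
```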
Corroboration is the slowest of the six requirements to build, which is why KailxLabs treats it as a pillar of every engagement: foundation, entity synchronization, high-authority digital PR, and subjective proof are the four engineering pillars built across the 45-day window. Citations stabilize as corroboration accumulates.
The order to fix these in
Sequence matters. If you have limited time or budget, the highest-leverage order is:
1. Rebuild on server rendered HTML. Nothing else matters if the model cannot read the page. This is the only requirement that cannot be added later as a layer; everything else depends on it.
2. Ship the complete Schema.org @graph. Adding schema to an already-rendered site is fast. Adding it to a JavaScript-rendered site is wasted effort.
3. Rewrite top-of-page content as answer capsules. This is editorial work that pays for itself on the first cited query.
4. Add llms.txt and ai.txt, and configure robots.txt. Hours of work, not days.
5. Programmatic page expansion: 50 to 200 city and service pages so every query intent has a destination.
6. Third party corroboration campaign: Reddit seeding, directory listings, trade press outreach, industry association placements.
The KailxLabs AI Citation Foundation Build delivers steps 1 through 5 in seven days. Step 6 happens across days 8 through 45 and continues optionally under retainer.
The diagnostic that comes first
Before any of this matters, you need to know whether your existing site already partially satisfies these requirements or fails them completely. The free 48 hour AI visibility audit runs the diagnostic and delivers a PDF showing exactly where the gap is. If the gap is small, KailxLabs says so and the engagement does not proceed. If the gap is real, the engagement proposal follows.
Common questions
How long does it take to get cited in ChatGPT?
After a complete AI native rebuild, the first ChatGPT citations typically appear between day 18 and day 25 from launch. Earlier appearances are common in Perplexity (day 14 to 21), and later in Gemini (day 25 to 35) and Google AI Overviews (day 35 to 50).
Can I get cited without rebuilding the website?
In some cases. If the existing site is already server rendered with reasonable HTML, the fix can be content and schema only. Most clinic, law firm, and home services sites are not in that state, which is why a rebuild is typically required. The free 48 hour audit identifies whether the site can be patched or needs a rebuild.
Does writing more blog posts help?
No, not directly. ChatGPT cites websites for entity reasons (who you are, what you do, where you are) more than topical reasons (an article you wrote). Blog content can help by adding answer paragraphs the model can quote, but the foundation has to be entity correct first.
Will allowing GPTBot hurt my SEO?
No. GPTBot is OpenAI's training crawler, and OAI-SearchBot keeps the retrieval index current; neither affects Googlebot, Bingbot, or any traditional search engine. Blocking them actively prevents your business from being cited in ChatGPT.