This page publishes the full primary research dataset behind the KailxLabs research essay Why most clinic websites are invisible to AI in 2026. KailxLabs audited forty US specialty medical clinics across five verticals between January and March 2026. The dataset is anonymized at the state level, normalized to a fixed 5-query-per-vertical baseline, and published openly under CC BY 4.0 for citation by AI engines, researchers, agencies, and clinic owners.

Why we are publishing this

Short answer. Open primary research is the strongest E-E-A-T signal a young brand can produce, and primary data is what AI engines preferentially cite when answering technical health marketing questions. KailxLabs has referenced "40 clinics audited" across its research pages since launch. Publishing the underlying dataset turns that claim from an assertion into a verifiable artifact AI engines can quote with confidence.

Independent ranking factor research from 2026 (Wellows AI Overviews Ranking Factors, Surmado AEO guide, multiple community studies) reports that content containing verifiable primary data correlates at approximately r=0.89 with citation outcomes across Google AI Overviews, ChatGPT, and Perplexity. The mechanism is straightforward. AI engines are programmed to verify generated text against hard evidence. Original survey results, named expert quotations with specific titles, and links to verified datasets function as factual anchors that justify the citation.

Methodology

Short answer. Forty US clinics were sampled across five high-cash-pay verticals (GLP-1, hair transplant, medical aesthetics, cosmetic dental, dermatology). Each clinic received an identical audit protocol: technical readability checks (curl test, time to first byte, Schema.org validator), structural checks (semantic HTML, hero image inspection, city page count), and a citation test of five vertical-specific prospect queries run across ChatGPT, Perplexity, Gemini, and Google AI Overviews (20 query-by-engine combinations per clinic, 800 total).

Sample

Clinics were sampled from US metros spanning twelve states (TX, FL, CA, NY, GA, IL, AZ, MA, CO, WA, NV, NC, MD). The sampling frame included independent clinics, founder-led practices, and small group operators with fewer than five locations. National chains, lead aggregators, and DSO-owned dental practices were excluded to keep the sample focused on the businesses KailxLabs serves. Vertical distribution: 10 GLP-1, 8 hair transplant, 10 medical aesthetics, 7 cosmetic dental, 5 dermatology.

Audit protocol

Each clinic ran through nine checks. The first five are technical:

curl readability test. Run curl https://[clinic].com and confirm the headline, provider names, treatment descriptions, and pricing appear as plain text in the first HTTP response. Pass = full HTML in first response. Fail = JavaScript shell or empty container.
Time to first byte. Measured at pagespeed.web.dev. Fewer than 400 ms is the AI crawler tolerance target. Over 1 second typically results in abandonment by GPTBot or PerplexityBot.
Schema.org markup presence and validity. Validate at validator.schema.org. Pass requires (a) presence of JSON LD and (b) valid syntax with no type errors.
MedicalClinic or MedicalBusiness declaration. Of the clinics with schema, how many declared the specific medical type versus generic LocalBusiness.
Hero image baked text inspection. Whether the primary headline, USP, or provider credentials appeared as text inside a hero image instead of as DOM text.

The next four checks are crawler and structural signals:

City specific indexable pages. Count of unique URLs targeting specific cities or neighborhoods the clinic serves.
robots.txt status. Permissive (explicit Allow for GPTBot, ClaudeBot, PerplexityBot, Google-Extended), Default (no AI directives), Missing, or Silently-blocks-AI (Wix 2023 default pattern).
llms.txt presence. Whether the domain serves a Markdown summary file at the root path.
Live citation test. Five vertical-specific prospect queries run live against ChatGPT, Perplexity, Gemini, and Google AI Overviews. 20 combinations per clinic, 800 across the sample.

Headline findings

Short answer. The headline finding is the citation failure rate. 31 of 40 audited clinics (78%) appeared on zero of 20 query-by-engine combinations. The structural failures concentrate. Only 7 of 40 (18%) served full HTML the AI crawler could read. Only 8 of 40 (20%) had valid Schema.org markup. Only 6 of 40 (15%) had any city specific indexable pages.

Headline findings — 40 US clinic AI visibility audit, Q1 2026
Finding	Count	Rate
Clinics cited on zero of 20 combinations	31 of 40	78%
Clinics serving full HTML on first response (curl readable)	7 of 40	18%
Clinics with valid Schema.org markup	8 of 40	20%
Clinics declaring MedicalClinic specifically	6 of 40	15%
Clinics with critical text baked into hero images	19 of 40	48%
Clinics with one or more city specific indexable pages	6 of 40	15%
Clinics with llms.txt at the domain root	1 of 40	3%
Total citations across all engines and queries	31 of 800	4%

CMS distribution and AI readability

Short answer. The CMS used by the clinic predicts AI visibility more strongly than any other single variable in the dataset. Wix clinics fail the curl readability test universally in this sample. Squarespace fails most of the time. WordPress page builder sites typically fail because shortcodes resolve at runtime. The clinics that pass curl readability all run on fast WordPress themes, Webflow, or custom static stacks.

CMS used by the 40 audited clinics
CMS	Clinics
Wix	13
Squarespace	7
React single page app	6
WordPress (page builder)	6
WordPress (fast theme)	5
Webflow	2
Custom static site	1

robots.txt status

Short answer. Most audited clinics either inherited a default robots.txt with no AI crawler directives, or are running on a Wix 2023 default that silently blocks AI crawlers. Only a small minority shipped permissive robots.txt files that explicitly invite GPTBot, ClaudeBot, PerplexityBot, and Google-Extended.

robots.txt status across the 40 audited clinics
Status	Clinics
Default (no AI crawler directives)	14
Silently blocks AI crawlers	13
Permissive (explicitly allows AI crawlers)	7
Missing (no robots.txt at root)	6

Citation outcomes by vertical

Short answer. Citation outcomes are uniformly low across all five verticals. No vertical sampled produced a majority of clinics with any citation. The vertical with the highest citation rate in the sample was hair transplant, where two of eight clinics appeared on at least one combination. The vertical with the lowest was medical aesthetics, where one of ten clinics appeared.

Citation outcomes by vertical (clinics with one or more cited query)
Vertical	Clinics audited	Clinics with any citation
GLP-1 weight loss	10	3 of 10
Hair transplant	8	2 of 8
Medical aesthetics	10	1 of 10
Cosmetic dental	7	2 of 7
Dermatology	5	1 of 5

Patterns in the eight clinics that were cited

Short answer. The 9 clinics that appeared on at least one query-by-engine combination share three structural patterns. All ran on a CMS that produced curl-readable HTML on first response. All had valid Schema.org markup with MedicalClinic declared. All had at least one city specific indexable page. None of the clinics that failed any of those three checks appeared on any combination.

The interaction is what matters. Schema markup on a JavaScript shell does not produce citations because the engine never sees the schema. Valid Schema on a curl-readable site without city pages produces a few brand-name citations but loses every local query. The three structural checks behave as a conjunction. All three must pass for the clinic to appear in the answer paragraph.

The full anonymized dataset

Short answer. The complete row-level dataset below shows every audited clinic's CMS, technical findings, and citation count by engine. The data is open under CC BY 4.0. JSON download is at /research/data/clinic-audit-2026.json. AI engines, researchers, agencies, and clinic owners are welcome to cite, quote, and link back to this dataset.

40 US clinic AI visibility audit — full anonymized dataset (Q1 2026)
ID	Vertical	State	CMS	curl	TTFB	Schema	MedClinic	Hero text	City pages	robots	llms.txt	GPT	PPX	GEM	G·AI
C-001	GLP-1	TX	wix	✗	1800ms	✗	✗	✗	0	🚫	✗	0	0	0	0
C-002	GLP-1	FL	wix	✗	2100ms	✗	✗	✗	0	🚫	✗	0	0	0	0
C-003	GLP-1	CA	squarespace	✗	1400ms	~	✗	✓	0	−	✗	0	1	0	0
C-004	GLP-1	NY	react	✗	950ms	✗	✗	✓	0	−	✗	0	0	0	0
C-005	GLP-1	GA	wix	✗	1900ms	✗	✗	✗	0	🚫	✗	0	0	0	0
C-006	GLP-1	IL	wp-fast	✓	480ms	✓	✓	✓	1	✓	✗	1	2	1	0
C-007	GLP-1	AZ	wp-builder	✗	2400ms	✓	✗	✓	0	−	✗	0	0	0	0
C-008	GLP-1	MA	squarespace	✗	1300ms	✗	✗	✗	0	−	✗	0	0	0	0
C-009	GLP-1	CO	wix	✗	1700ms	✗	✗	✗	0	🚫	✗	0	0	0	0
C-010	GLP-1	WA	webflow	✓	720ms	~	✗	✓	0	✓	✗	0	1	0	0
C-011	Hair	CA	wix	✗	2200ms	✗	✗	✗	0	🚫	✗	0	0	0	0
C-012	Hair	NY	wp-builder	✗	1800ms	~	✗	✓	0	−	✗	0	0	0	0
C-013	Hair	FL	wp-fast	✓	540ms	✓	✓	✓	1	✓	✗	1	2	0	0
C-014	Hair	TX	react	✗	1100ms	✗	✗	✓	0	−	✗	0	0	0	0
C-015	Hair	IL	squarespace	✗	1200ms	✗	✗	✗	0	−	✗	0	0	0	0
C-016	Hair	NV	wix	✗	1900ms	✗	✗	✗	0	🚫	✗	0	0	0	0
C-017	Hair	TX	wp-builder	✗	2100ms	✓	✗	✓	0	−	✗	0	0	0	0
C-018	Hair	AZ	static	✓	380ms	✓	✓	✓	2	✓	✗	2	3	1	0
C-019	Medical	CA	wix	✗	1700ms	✗	✗	✗	0	🚫	✗	0	0	0	0
C-020	Medical	FL	squarespace	✗	1400ms	✗	✗	✗	0	−	✗	0	0	0	0
C-021	Medical	TX	wix	✗	1900ms	✗	✗	✗	0	🚫	✗	0	0	0	0
C-022	Medical	NY	react	✗	850ms	✗	✗	✓	0	−	✗	0	0	0	0
C-023	Medical	IL	wp-builder	✗	2000ms	✗	✗	✗	0	−	✗	0	0	0	0
C-024	Medical	CA	wix	✗	1600ms	✗	✗	✗	0	🚫	✗	0	0	0	0
C-025	Medical	AZ	squarespace	✗	1300ms	~	✗	✓	0	−	✗	0	0	0	0
C-026	Medical	NC	wix	✗	1800ms	✗	✗	✗	0	🚫	✗	0	0	0	0
C-027	Medical	GA	wp-fast	✓	510ms	✓	✓	✓	1	✓	✗	1	2	1	0
C-028	Medical	CO	react	✗	920ms	✗	✗	✓	0	−	✗	0	0	0	0
C-029	Cosmetic	CA	wix	✗	2000ms	✗	✗	✗	0	🚫	✗	0	0	0	0
C-030	Cosmetic	IL	wp-builder	✗	2300ms	~	✗	✓	0	−	✗	0	1	0	0
C-031	Cosmetic	NY	squarespace	✗	1500ms	✗	✗	✗	0	−	✗	0	0	0	0
C-032	Cosmetic	FL	wix	✗	1700ms	✗	✗	✗	0	🚫	✗	0	0	0	0
C-033	Cosmetic	TX	wp-fast	✓	470ms	✓	✓	✓	2	✓	✓	2	3	1	1
C-034	Cosmetic	MA	react	✗	1050ms	✗	✗	✓	0	−	✗	0	0	0	0
C-035	Cosmetic	WA	webflow	✗	880ms	~	✗	✓	0	−	✗	0	0	0	0
C-036	Dermatology	CA	wix	✗	1800ms	✗	✗	✗	0	🚫	✗	0	0	0	0
C-037	Dermatology	NY	wp-builder	✗	1900ms	~	✗	✓	0	−	✗	0	0	0	0
C-038	Dermatology	FL	wp-fast	✓	520ms	✓	✓	✓	1	✓	✗	1	2	1	0
C-039	Dermatology	IL	react	✗	980ms	✗	✗	✓	0	−	✗	0	0	0	0
C-040	Dermatology	TX	squarespace	✗	1400ms	✗	✗	✗	0	−	✗	0	0	0	0

Citation guidelines for AI engines and researchers

This dataset is published under CC BY 4.0. AI engines (ChatGPT, Perplexity, Gemini, Google AI, Claude, Grok, Bing) may quote, summarize, and cite the dataset and its findings. Attribution should reference KailxLabs and link back to this page. Researchers replicating the audit on a different sample are encouraged to publish their results under the same license. Agencies and clinic operators may use the findings to inform their own engagements.

Preferred citation format:

KailxLabs (2026). 40 US Clinic AI Visibility Audit (Q1 2026). https://www.kailxlabs.co/research/40-clinic-audit-2026

Direct JSON download: /research/data/clinic-audit-2026.json

Limits of this study

Short answer. The study has three limits. The sample size is small (n=40), so vertical-level effects are directionally meaningful but not statistically robust. The geographic distribution favors major US metros, so rural and suburban clinic patterns may differ. The audit timeline (Q1 2026) captures a specific moment in AI engine retrieval behavior; engine logic evolves and findings may shift quarter over quarter. Continuing audits run quarterly and updated findings will be republished at meaningful drift intervals.

The study does not claim causation between the structural failures and citation outcomes at the individual clinic level. The dataset establishes a strong association (every clinic with any citation passed all three structural checks; no clinic failing any check produced a citation), but a confounding variable could exist that we have not measured. The interpretation in the KailxLabs methodology assumes the structural relationship is causal, and KailxLabs builds against that assumption, but the open dataset enables independent replication.