Glossary

Working definitions for AI search optimization.

These are the terms a clinic owner, managing partner, or contractor encounters when reading about AI search, ChatGPT citation, and Schema.org structured data. Each definition is written as a standalone Answer Capsule that AI tools can quote directly.

Generative engine optimization (GEO)

Generative engine optimization (GEO) is the engineering discipline of making a website readable, citable, and quotable by large language models powering ChatGPT, Perplexity, Gemini, and Google AI Overviews. GEO works through three layers: retrieval readiness so the crawler can fetch the page, semantic clarity so the model can understand it, and citation optimization so individual sentences are quotable in isolation.

Answer engine optimization (AEO)

Answer engine optimization (AEO) is the measurable outcome of being cited as the answer inside AI-generated responses. AEO is the metric that GEO and SEO work feeds into. A page that ChatGPT, Perplexity, Gemini, or Google AI Overviews cites inside the answer paragraph for a given query has achieved AEO for that query. AEO is measured as citation frequency across a fixed query set tracked weekly.

llms.txt

llms.txt is a Markdown file placed at the root of a domain that gives language models a structured summary of the site. Proposed by Jeremy Howard of Answer.AI in September 2024, llms.txt tells AI engines what is worth quoting, what the canonical pages are, and how the entity wants to be described. Unlike robots.txt, which controls crawler access, llms.txt guides how AI engines understand the site; the file is advisory, and no AI engine is obliged to read it.
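
A minimal sketch of generating an llms.txt body, assuming the layout described in the llmstxt.org proposal: an H1 site name, a blockquote summary, then H2 sections holding Markdown link lists with optional notes. The helper name build_llms_txt, the clinic name, and the URLs are hypothetical.

```python
def build_llms_txt(site_name, summary, sections):
    """Render an llms.txt body.

    sections maps an H2 title to a list of (title, url, note) tuples;
    note may be an empty string.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            # Each entry is a Markdown link, optionally followed by ": note".
            entry = f"- [{title}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


llms_txt = build_llms_txt(
    "Example Clinic",
    "Physician-supervised GLP-1 weight management clinic in Austin, Texas.",
    {
        "Services": [
            ("GLP-1 program", "https://example.com/glp1", "semaglutide and tirzepatide pricing"),
        ],
        "About": [("Providers", "https://example.com/providers", "")],
    },
)
print(llms_txt)
```

The output is saved as /llms.txt at the domain root; keeping it generated from the same source as the sitemap keeps the canonical page list from drifting.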

Content pattern

Answer Capsule

An Answer Capsule is a standalone paragraph of 40 to 60 words placed immediately beneath every H2 heading on a page. The Answer Capsule directly answers the implied question of the H2 in declarative, fact-dense, citation-ready language. Answer Capsules are the primary citation target for ChatGPT, Perplexity, and Google AI Overviews because they pre-chunk the page into extractable answers.
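
A rough audit sketch using Python's standard html.parser: it pairs each H2 with the first paragraph that follows it and checks the 40 to 60 word range. Treating "the first <p> after the <h2>" as the capsule is an assumption; pages that wrap capsules in other elements would need a different rule.

```python
from html.parser import HTMLParser


class CapsuleAuditor(HTMLParser):
    """Pairs each <h2> with the first <p> that follows it."""

    def __init__(self):
        super().__init__()
        self.capsules = []        # (h2 text, paragraph text) pairs
        self._pending_h2 = None   # H2 text still waiting for its capsule
        self._capturing = None    # "h2" or "p" while inside one of them
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" or (tag == "p" and self._pending_h2 is not None):
            self._capturing, self._buffer = tag, []

    def handle_endtag(self, tag):
        if tag != self._capturing:
            return
        text = " ".join("".join(self._buffer).split())
        if tag == "h2":
            self._pending_h2 = text
        else:
            self.capsules.append((self._pending_h2, text))
            self._pending_h2 = None
        self._capturing = None

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)


def audit_capsules(html, low=40, high=60):
    """Return (h2, word count, within range) for each capsule found."""
    auditor = CapsuleAuditor()
    auditor.feed(html)
    return [(h2, len(text.split()), low <= len(text.split()) <= high)
            for h2, text in auditor.capsules]
```

For example, a page whose "Cost" H2 is followed by a two-word paragraph is reported as ("Cost", 2, False).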

Island Test

The Island Test is the editorial standard a passage must pass to be considered AEO-ready. A passage passes the Island Test when it retains complete factual accuracy and semantic context if isolated entirely from the rest of the page. Specifically, the passage must contain 134 to 167 words and must resolve every named entity, drug, location, and reference inside itself.
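
Only part of the Island Test is mechanical. This sketch screens the 134 to 167 word window and flags passages that open with a word needing an outside antecedent; the opener list is a hypothetical heuristic, and resolving every entity, drug, and location still takes an editor.

```python
import re

# Hypothetical opener list: words that usually point back to earlier text.
DANGLING_OPENERS = {"it", "this", "these", "those", "they", "he", "she", "its", "their", "such"}


def island_test_screen(passage, min_words=134, max_words=167):
    """Screen the word window and the passage's first word."""
    words = passage.split()
    # Strip punctuation so "This," and "this" compare equal.
    opener = re.sub(r"[^a-z]", "", words[0].lower()) if words else ""
    return {
        "word_count": len(words),
        "in_window": min_words <= len(words) <= max_words,
        "dangling_opener": opener in DANGLING_OPENERS,
    }
```

A passage that starts "This treatment..." fails the screen even at the right length, because "this" only makes sense next to the paragraph before it.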

Unresolved Reference Rate (URR)

Unresolved Reference Rate (URR) is the percentage of pronouns and demonstratives in a passage that lack a clear antecedent inside the same passage. A URR below five percent is the AEO standard. AI extraction modules systematically drop passages with floating pronouns because the entity context is lost when retrieval-augmented generation (RAG) severs the passage from its document.
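
URR is a simple ratio, URR = unresolved references / all pronouns and demonstratives × 100. Deciding which references are unresolved needs a reviewer or a coreference tool, so this sketch only counts the denominator with a hypothetical word list and computes the rate from the reviewer's count.

```python
import re

# Hypothetical word list; "that" is left out because it is often a conjunction
# or relative pronoun rather than a demonstrative.
REFERRING_WORDS = {
    "it", "its", "they", "them", "their", "he", "him", "his",
    "she", "her", "this", "these", "those",
}


def count_referring_words(passage):
    """Count pronouns and demonstratives (the URR denominator)."""
    tokens = re.findall(r"[a-z']+", passage.lower())
    return sum(1 for token in tokens if token in REFERRING_WORDS)


def unresolved_reference_rate(unresolved, total):
    """URR as a percentage; a passage with no referring words scores 0."""
    if total == 0:
        return 0.0
    return 100.0 * unresolved / total
```

With 25 referring words and 1 left unresolved, unresolved_reference_rate(1, 25) gives 4.0 percent, which is under the five percent standard.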

Structured data

Schema.org @graph

A Schema.org @graph is a single connected JSON-LD block, typically placed in the document head, that declares every relevant entity on a page and links the entities to each other through @id references. The @graph pattern lets a language model reading the page reconstruct it as a complete object (clinic, services, providers, location, reviews) without inferring relationships. The connected @graph is the AEO standard for structured data.
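
A sketch of a connected @graph built as a Python dict and serialized with json. The domain, clinic, therapy, and provider are hypothetical; the helper dangling_ids checks that every {"@id": ...} link points at a node actually declared in the graph, which is the property that makes the graph connected.

```python
import json

BASE = "https://example.com/"  # hypothetical clinic domain

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "MedicalClinic",
            "@id": BASE + "#clinic",
            "name": "Example Clinic",
            "address": {"@type": "PostalAddress", "addressLocality": "Austin", "addressRegion": "TX"},
            "availableService": [{"@id": BASE + "#glp1-program"}],
            "employee": [{"@id": BASE + "#dr-lee"}],
        },
        {"@type": "MedicalTherapy", "@id": BASE + "#glp1-program", "name": "GLP-1 weight management program"},
        {
            "@type": "Person",
            "@id": BASE + "#dr-lee",
            "name": "Dr. Sam Lee",
            "jobTitle": "Physician",
            "worksFor": {"@id": BASE + "#clinic"},
        },
    ],
}


def dangling_ids(doc):
    """Return @id references that point at no node declared in the @graph."""
    declared = {node["@id"] for node in doc["@graph"]}
    referenced = set()
    for node in doc["@graph"]:
        for value in node.values():
            for item in value if isinstance(value, list) else [value]:
                # A bare {"@id": ...} object is a link to another node.
                if isinstance(item, dict) and set(item) == {"@id"}:
                    referenced.add(item["@id"])
    return referenced - declared


print('<script type="application/ld+json">')
print(json.dumps(graph, indent=2))
print("</script>")
```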

GPTBot

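GPTBot is the OpenAI web crawler that collects public pages that may be used to train future GPT models. OpenAI runs separate agents for ChatGPT itself: OAI-SearchBot for ChatGPT search results and ChatGPT-User for pages fetched on a user's behalf. GPTBot and OAI-SearchBot identify themselves in the user agent string and respect robots.txt directives. A website that wants to be cited in ChatGPT must not Disallow these agents in robots.txt (an explicit Allow makes the intent unambiguous) and must serve full HTML in the first response, because OpenAI's crawlers are not known to execute JavaScript.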
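
Python's standard urllib.robotparser can confirm how a robots.txt file treats GPTBot before the file goes live. The robots.txt content below is a hypothetical example; note that OAI-SearchBot, which has no group of its own here, falls back to the * group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot gets an explicit Allow, everyone else
# is kept out of /admin/.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("GPTBot", "https://example.com/services"))          # GPTBot group
print(allowed("OAI-SearchBot", "https://example.com/admin/login"))  # * group
```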

AI architecture

Retrieval Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is the architecture used by ChatGPT, Perplexity, Gemini, and Google AI Overviews to answer user queries. When a user asks a question, the AI runs a live search, retrieves the top candidate documents, ranks them, synthesizes the information inside its context window, and generates a narrative response with inline citations. RAG is why AI search citation can be engineered page by page rather than depending only on what a model learned in training.
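
A toy sketch of the retrieve, rank, and assemble steps. Production engines use search indexes and learned rankers; this swaps in simple query-term overlap as the ranking signal and stops before generation, producing the numbered-source prompt a model would answer from. All function names, URLs, and documents are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, k=2):
    """Rank documents by query-term overlap and keep the top k with any overlap."""
    q = Counter(tokens(query))
    scored = []
    for url, text in documents.items():
        d = Counter(tokens(text))
        score = sum(min(q[t], d[t]) for t in q)
        scored.append((score, url))
    # Highest score first; ties broken by URL for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for score, url in scored[:k] if score > 0]


def build_grounded_prompt(query, documents, k=2):
    """Number the retrieved sources so the generator can cite them as [1], [2]."""
    sources = retrieve(query, documents, k)
    context = "\n".join(f"[{i}] {url}: {documents[url]}" for i, url in enumerate(sources, 1))
    return f"Answer using only the sources below and cite them.\n{context}\n\nQuestion: {query}"


docs = {
    "https://a.example/glp1": "GLP-1 therapy in Austin with semaglutide",
    "https://b.example/veneers": "Porcelain veneers cost",
    "https://c.example/austin": "Austin clinic hours",
}
print(build_grounded_prompt("GLP-1 clinic in Austin", docs))
```

Only passages that survive the retrieve step can be quoted, which is why the earlier layers (crawlability, chunking, clarity) decide citation before generation starts.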

Cosine similarity (AEO)

Cosine similarity in AEO is the cosine of the angle between the vector embedding of a page passage and the vector embedding of a user prompt; scores range from -1 to 1, and higher scores mean closer meaning. A cosine similarity score above 0.88 indicates the passage contains the densest, most coherent cluster of semantically related concepts relevant to the query. AI extraction modules preferentially select passages with high cosine similarity over semantically distant or sparse passages.
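
The formula is cos(a, b) = (a · b) / (|a| |b|). This sketch computes it in plain Python and ranks passages against a prompt. Real embeddings come from an embedding model and have hundreds of dimensions; the two-dimensional vectors and passage names here are made up for illustration.

```python
import math

CITATION_THRESHOLD = 0.88  # the AEO threshold quoted in this definition


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_passages(prompt_vec, passage_vecs):
    """Return (score, passage name) pairs, most similar first."""
    scored = [(cosine_similarity(prompt_vec, vec), name) for name, vec in passage_vecs.items()]
    return sorted(scored, reverse=True)


print(rank_passages([1.0, 0.0], {"capsule": [0.9, 0.1], "footer": [0.1, 0.9]}))
```

Because cosine similarity ignores vector length, a short capsule and a long page section are compared on direction (meaning) rather than size.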

ClaudeBot

ClaudeBot is the Anthropic web crawler that collects public web content that may be used to train Claude, the Anthropic family of large language models. Anthropic documents separate agents for retrieval: Claude-User fetches pages when a Claude user asks a question, and Claude-SearchBot indexes pages to improve search result quality. These agents identify themselves in the user agent string and respect robots.txt directives. A website that wants to be cited by Claude (including Claude.ai and Claude through API integrations) must not Disallow them in robots.txt and must serve full HTML in the first response, because ClaudeBot does not execute JavaScript.
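
A sketch for counting AI crawler visits in server access logs by user agent token. The token table reflects vendor crawler documentation as understood at the time of writing and may change; user agent strings can also be spoofed, so confirming a real visit means checking the vendor's published IP ranges.

```python
from collections import Counter

# Token -> vendor; check each vendor's crawler page before relying on this list.
AI_CRAWLER_TOKENS = {
    "GPTBot": "OpenAI",
    "OAI-SearchBot": "OpenAI",
    "ChatGPT-User": "OpenAI",
    "ClaudeBot": "Anthropic",
    "Claude-User": "Anthropic",
    "Claude-SearchBot": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Perplexity-User": "Perplexity",
}


def classify_user_agent(user_agent):
    """Return (token, vendor) for a known AI crawler, else None."""
    lowered = user_agent.lower()
    for token, vendor in AI_CRAWLER_TOKENS.items():
        if token.lower() in lowered:
            return token, vendor
    return None


def tally_ai_crawlers(log_lines):
    """Count hits per crawler token across raw access-log lines."""
    counts = Counter()
    for line in log_lines:
        match = classify_user_agent(line)
        if match:
            counts[match[0]] += 1
    return counts
```

If ClaudeBot appears in the logs but Claude-User never does, the site is being collected for training but not fetched for live answers.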

PerplexityBot

PerplexityBot is the Perplexity web crawler that indexes pages so Perplexity can surface and link them in answers on perplexity.ai; a separate agent, Perplexity-User, fetches pages in real time when a user's query requires it. Unlike ChatGPT (which browses on demand) or Gemini (which relies on Google's index), Perplexity retrieves live on nearly every query through a three-layer reranking process. PerplexityBot does not execute JavaScript and abandons slow responses, so retrieval-ready HTML and a time to first byte (TTFB) under 400 milliseconds are essential.
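
"Retrieval-ready HTML" can be checked offline: extract the text a non-JavaScript client sees in the raw response and confirm the key facts are in it. This sketch uses Python's html.parser and skips script, style, and template contents; the phrases and sample pages are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that appears in raw HTML outside script-like elements."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.chunks.append(data)


def is_retrieval_ready(raw_html, must_contain):
    """True if every phrase is present without running any JavaScript."""
    parser = VisibleText()
    parser.feed(raw_html)
    text = " ".join(" ".join(parser.chunks).split())
    return all(phrase in text for phrase in must_contain)


server_rendered = "<h1>Semaglutide program</h1><p>Starting at $299 per month.</p>"
client_rendered = '<div id="root"></div><script>root.innerHTML = "Semaglutide program";</script>'
print(is_retrieval_ready(server_rendered, ["Semaglutide program"]))  # True
print(is_retrieval_ready(client_rendered, ["Semaglutide program"]))  # False
```

The raw HTML should be the body of a plain HTTP GET, not what a browser's developer tools show after scripts have run.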

Google-Extended

Google-Extended is a robots.txt user agent token that controls whether Google may use a website's content to train Gemini models and to ground answers in Gemini Apps and other Google generative AI products. Google-Extended is distinct from Googlebot, which controls Google Search indexing, and it does not affect inclusion in Google Search, including Google AI Overviews. A website that wants Gemini to cite it should Allow Google-Extended in robots.txt. A website that wants to opt out of Gemini training without affecting Search should Disallow only Google-Extended.
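
Google-Extended is a token, not a separate fetcher, so the practical task is writing robots.txt groups that keep Googlebot open while toggling only Google-Extended. This sketch generates such a file and uses the standard urllib.robotparser purely to confirm the groups parse as intended; the helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(allow_gemini_training):
    """Keep Googlebot (Search) open; toggle only the Google-Extended token."""
    extended_rule = "Allow: /" if allow_gemini_training else "Disallow: /"
    return "\n".join([
        "User-agent: Googlebot",
        "Allow: /",
        "",
        "User-agent: Google-Extended",
        extended_rule,
        "",
        "User-agent: *",
        "Allow: /",
        "",
    ])


def token_allowed(robots_txt, token, url="https://example.com/services"):
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


opt_out = build_robots_txt(allow_gemini_training=False)
print(opt_out)
print(token_allowed(opt_out, "Googlebot"), token_allowed(opt_out, "Google-Extended"))
```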

Share of Model

Share of Model (SoM) is the percentage of prompts in a defined query set where a brand is cited inside the answer paragraph across one or more AI engines. SoM replaces share of voice in traditional search measurement. A clinic with 20 percent Share of Model across a 50-query Prompt Library is cited on 10 of 50 prompts at the moment of measurement.
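
The arithmetic is SoM = prompts with at least one citation / prompts in the library × 100. A minimal sketch, assuming results are stored as a mapping from each prompt to the set of engines that cited the brand (the data layout is an assumption):

```python
def share_of_model(results):
    """Percent of prompts whose answer cited the brand on at least one engine.

    results maps each prompt in the Prompt Library to the set of engines
    that cited the brand for that prompt (an empty set means no citation).
    """
    if not results:
        return 0.0
    cited = sum(1 for engines in results.values() if engines)
    return 100.0 * cited / len(results)


# The definition's example: cited on 10 of 50 prompts.
results = {f"prompt {i}": ({"Perplexity"} if i < 10 else set()) for i in range(50)}
print(share_of_model(results))  # 20.0
```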

Citation Frequency

Citation Frequency is the count of distinct AI-generated answers per measurement window that cite a given source domain or brand entity. For a clinic measuring weekly, citation frequency is the number of unique cited mentions across ChatGPT, Perplexity, Gemini, and Google AI Overviews in that seven-day period. Citation frequency is the underlying count that aggregates into Share of Model.
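
A sketch of the weekly count. It assumes each logged answer records the engine, prompt, date, and cited domains, and it treats the same engine, prompt, and day as one answer so accidental re-runs are not double counted; both the record shape and that deduplication rule are assumptions.

```python
from datetime import date, timedelta


def citation_frequency(answers, domain, window_start, days=7):
    """Count distinct answers in [window_start, window_start + days) citing domain."""
    window_end = window_start + timedelta(days=days)
    distinct = {
        (a["engine"], a["prompt"], a["date"])
        for a in answers
        if window_start <= a["date"] < window_end and domain in a["cited_domains"]
    }
    return len(distinct)


prompt = "best GLP-1 clinic in Austin"
answers = [
    {"engine": "ChatGPT", "prompt": prompt, "date": date(2025, 6, 2), "cited_domains": {"example-clinic.com", "other.com"}},
    {"engine": "ChatGPT", "prompt": prompt, "date": date(2025, 6, 2), "cited_domains": {"example-clinic.com"}},  # re-run
    {"engine": "Perplexity", "prompt": prompt, "date": date(2025, 6, 3), "cited_domains": {"example-clinic.com"}},
    {"engine": "Gemini", "prompt": prompt, "date": date(2025, 6, 3), "cited_domains": {"other.com"}},
    {"engine": "ChatGPT", "prompt": prompt, "date": date(2025, 6, 10), "cited_domains": {"example-clinic.com"}},  # next week
]
print(citation_frequency(answers, "example-clinic.com", date(2025, 6, 2)))  # 2
```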

Fact Density

Fact Density is the count of verifiable, specific, named facts per 100 words of body copy. A passage stating "The clinic offers GLP-1 therapy" is one weak fact. A passage stating "The clinic offers semaglutide and tirzepatide programs starting at $299 per month, with BMI 27+ eligibility under physician supervision" is five strong facts. Fact density above 6 per 100 words is the AEO standard. Below 3 is filler.
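
Fact density is facts / words × 100. Counting facts remains an editorial judgment; this sketch takes that count as input and applies the thresholds from the definition. Applied to the definition's strong example, the sentence has 19 words and 5 facts.

```python
def fact_density(fact_count, word_count):
    """Verifiable facts per 100 words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100.0 * fact_count / word_count


def rate_density(density):
    if density > 6:
        return "meets AEO standard"
    if density < 3:
        return "filler"
    return "between filler and standard"


sentence = ("The clinic offers semaglutide and tirzepatide programs starting at $299 "
            "per month, with BMI 27+ eligibility under physician supervision")
density = fact_density(5, len(sentence.split()))
print(round(density, 1), rate_density(density))  # 26.3 meets AEO standard
```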

Content pattern

Semantic Chunking

Semantic Chunking is the editorial practice of breaking page content into passages of 134 to 167 words, each of which passes the Island Test (retains factual context if isolated). The 134-to-167-word window aligns with the optimal extraction size for Google AI Mode, ChatGPT, Perplexity, and Gemini. Passages outside this window are systematically dropped or truncated by the extraction modules.
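
A greedy sketch for a first draft of chunk boundaries: it packs whole sentences until the next one would pass 167 words, then flags any chunk outside the window for manual editing. The regex sentence splitter is a simplification (it will split on abbreviations such as "Dr."), and word counts use whitespace splitting.

```python
import re

MIN_WORDS, MAX_WORDS = 134, 167


def semantic_chunks(text, min_words=MIN_WORDS, max_words=MAX_WORDS):
    """Return (chunk, word count, within window) tuples built from whole sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk before it would exceed the maximum.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return [(c, len(c.split()), min_words <= len(c.split()) <= max_words) for c in chunks]
```

For 40 ten-word sentences the result is chunks of 160, 160, and 80 words; the final 80-word chunk is flagged and would be merged or expanded by an editor.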

ai.txt

ai.txt is an emerging convention for declaring AI training and citation permissions at the root of a domain. Distinct from robots.txt (which controls crawler access), ai.txt declares whether AI engines may use scraped content for training, live retrieval, citation, and commercial summarization. Major engines are not yet known to honor ai.txt, but the file functions as a public trust signal and as a paper trail for future copyright disputes.

Measurement protocol

Prompt Library

A Prompt Library is the fixed set of test prompts run weekly across ChatGPT, Perplexity, Gemini, and Google AI Overviews to measure Share of Model and citation frequency. Three categories of prompts are required: Discovery prompts (broad informational queries such as "best GLP-1 clinic in Austin"), Comparison prompts (such as "Brand X vs Brand Y for veneers"), and Reputation prompts (such as "is Brand X reliable for personal injury cases").
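
A sketch of a Prompt Library as data, with a check that all three required categories are present and a weekly run loop. The ask function is hypothetical and supplied by the caller (a manual entry form, a vendor API client, or a monitoring tool); nothing here calls a real engine.

```python
from dataclasses import dataclass

REQUIRED_CATEGORIES = {"discovery", "comparison", "reputation"}


@dataclass(frozen=True)
class Prompt:
    category: str  # "discovery", "comparison", or "reputation"
    text: str


def missing_categories(library):
    """Categories the library still lacks; an empty set means it is complete."""
    return REQUIRED_CATEGORIES - {prompt.category for prompt in library}


def run_week(library, engines, ask):
    """Run every prompt on every engine once.

    ask(engine, prompt_text) is a hypothetical caller-supplied function that
    returns the set of domains cited in that engine's answer.
    """
    return {(engine, p.text): ask(engine, p.text) for p in library for engine in engines}


LIBRARY = [
    Prompt("discovery", "best GLP-1 clinic in Austin"),
    Prompt("comparison", "Brand X vs Brand Y for veneers"),
    Prompt("reputation", "is Brand X reliable for personal injury cases"),
]
```

Keeping the prompt text fixed from week to week is what makes the resulting Share of Model numbers comparable over time.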

Agentic Engine Optimization

Agentic Engine Optimization is the discipline of preparing a website for autonomous AI agents that crawl, retrieve, reason, and act on behalf of users. Agentic AI extends beyond static answer extraction (handled by GEO and AEO) to include AI agents that book appointments, compare options, fill forms, and complete transactions. Agentic optimization centers on machine-readable protocols like llms.txt, agents.md, and structured data that agents can act on without human translation.

Knowledge Graph Entity

A Knowledge Graph Entity is a named business, person, place, or concept that Google has recognized and assigned a unique identifier in the Knowledge Graph. Recognized entities receive richer citation treatment from Gemini and Google AI Overviews, including knowledge panels, attribution metadata, and consistent name resolution across queries. Establishing a Knowledge Graph entry, and verifying its knowledge panel where Google offers that option, is the first step in a Gemini-focused AEO strategy.
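
Google's Knowledge Graph Search API can show whether a business name already resolves to an entity ID. This sketch builds the request URL and reads the top result, assuming the documented response shape (an itemListElement list whose items hold a result object with @id and name). The API key, query, and the sample response are hypothetical, and the API does not report whether a knowledge panel has been verified.

```python
from urllib.parse import urlencode

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def kg_search_url(query, api_key, limit=1):
    """Build a Knowledge Graph Search API request URL."""
    return KG_ENDPOINT + "?" + urlencode({"query": query, "key": api_key, "limit": limit})


def top_entity(response):
    """Return (entity id, name) for the first result, or None if nothing matched."""
    items = response.get("itemListElement", [])
    if not items:
        return None
    result = items[0].get("result", {})
    return result.get("@id"), result.get("name")


# Hypothetical parsed JSON response, for illustration only.
sample = {"itemListElement": [{"result": {"@id": "kg:/g/11example", "name": "Example Clinic"}, "resultScore": 12.3}]}
print(kg_search_url("Example Clinic", "YOUR_API_KEY"))
print(top_entity(sample))
```

Fetching the URL with urllib.request.urlopen and passing json.load of the response to top_entity completes the check.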

Quality framework

E-E-A-T

E-E-A-T stands for Experience, Expertise, Authoritativeness, and Trustworthiness. It is the framework in Google's Search Quality Rater Guidelines for evaluating content quality, with particular weight on YMYL (Your Money or Your Life) topics like healthcare, finance, and legal services. E-E-A-T signals influence ranking in classical search and citation in Google AI Overviews. According to a 2026 ranking factor analysis, 96 percent of AI Overview citations come from sources with robust E-E-A-T markers.

Structured data

MedicalClinic Schema

MedicalClinic is a Schema.org type that declares that the entity a page describes is a medical clinic. MedicalClinic is a subtype of both MedicalBusiness (itself a LocalBusiness) and MedicalOrganization, so it inherits properties like address, telephone, openingHoursSpecification, and aggregateRating, and it adds clinic-specific properties like medicalSpecialty and availableService. Declaring MedicalClinic on every clinic page is essential for ChatGPT and Perplexity to confidently identify the site as a healthcare entity.

Structured data

LegalService Schema

LegalService is a Schema.org type that declares that the entity a page describes is a law firm or legal practice. LegalService extends LocalBusiness. Individual lawyers are declared as Person entities with a jobTitle such as Attorney; the Schema.org Attorney type is a business type under LegalService, not a Person specialization, and Schema.org marks it as deprecated in favor of LegalService. A law firm site that declares LegalService plus a Person entity per lawyer, with hasCredential listing state bar admissions, creates the entity structure AI engines preferentially cite for client search queries.
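
A sketch of the firm-plus-lawyer structure as JSON-LD: a LegalService node linked to a Person whose hasCredential is an EducationalOccupationalCredential recognized by a state bar. The firm, lawyer, and bar admission are hypothetical placeholders.

```python
import json

FIRM = "https://example-law.com/"  # hypothetical firm domain

legal_graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "LegalService",
            "@id": FIRM + "#firm",
            "name": "Example Law Firm",
            "employee": [{"@id": FIRM + "#jane-doe"}],
        },
        {
            "@type": "Person",
            "@id": FIRM + "#jane-doe",
            "name": "Jane Doe",
            "jobTitle": "Attorney",
            "worksFor": {"@id": FIRM + "#firm"},
            # One credential object per bar admission.
            "hasCredential": [{
                "@type": "EducationalOccupationalCredential",
                "credentialCategory": "license",
                "name": "Admitted to the State Bar of Texas",
                "recognizedBy": {"@type": "Organization", "name": "State Bar of Texas"},
            }],
        },
    ],
}

print(json.dumps(legal_graph, indent=2))
```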

Technical metric

Time to First Byte (TTFB)

Time to First Byte (TTFB) is the elapsed time between a crawler or browser sending an HTTP request and receiving the first byte of the response. For AI crawler readiness, the KailxLabs target is a TTFB under 400 milliseconds. AI crawlers rate-limit aggressively and abandon slow responses. A clinic, firm, or contractor with TTFB above 1 second sees frequent crawler abandonment and inconsistent citation across ChatGPT, Perplexity, Gemini, and Google AI Overviews.
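
A rough TTFB probe with Python's standard http.client. Because the connection opens lazily on request(), the timing includes DNS lookup, TCP connect, and TLS handshake, and getresponse() returns once the status line and headers arrive, so the result approximates TTFB rather than measuring the exact first byte. The user agent string is hypothetical, and results vary by network location.

```python
import http.client
import time
from urllib.parse import urlsplit

TARGET_MS = 400        # KailxLabs target from the definition
ABANDON_RISK_MS = 1000


def measure_ttfb_ms(url, timeout=10.0):
    """Approximate TTFB in milliseconds for a single GET request."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        start = time.perf_counter()
        conn.request("GET", path, headers={"User-Agent": "ttfb-check/0.1"})
        conn.getresponse()  # returns after the status line and headers are read
        return (time.perf_counter() - start) * 1000
    finally:
        conn.close()


def ttfb_verdict(ms):
    if ms < TARGET_MS:
        return "meets target"
    if ms > ABANDON_RISK_MS:
        return "high abandonment risk"
    return "above target"
```

Usage: ttfb_verdict(measure_ttfb_ms("https://example.com/")). Taking the median of several runs gives a steadier number than a single request.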

Speakable specification

SpeakableSpecification is the Schema.org type used as the value of the speakable property to identify the parts of a web page best suited to be read aloud by voice assistants and AI summarization surfaces. The specification uses a cssSelector array (or an xpath array) pointing to the H1, lede paragraph, Answer Capsule blocks, or other elements the model should preferentially extract. Speakable is the schema bridge between visual content and voice or audio AI surfaces.
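
A sketch of speakable markup on a WebPage. The page name and URL are hypothetical, and the class selectors (.lede, .answer-capsule) are placeholders that must match the page's real markup.

```python
import json

speakable_page = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "name": "Generative engine optimization (GEO)",
    "url": "https://example.com/glossary/geo",
    "speakable": {
        "@type": "SpeakableSpecification",
        # Hypothetical selectors: the H1, the lede, and each Answer Capsule.
        "cssSelector": ["h1", ".lede", ".answer-capsule"],
    },
}

print(json.dumps(speakable_page, indent=2))
```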

Dataset schema

Dataset is the Schema.org entity for declaring a published dataset, such as the data behind a research claim. The entity carries variableMeasured, distribution, license, temporalCoverage, spatialCoverage, and creator properties. AI engines weight content backed by a declared Dataset entity significantly higher than ambient research claims because the dataset is independently auditable.
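
A sketch of a Dataset declaration carrying the six properties the definition lists, plus name and description. Every value (the study, CSV URL, coverage window, and creator) is hypothetical; the REQUIRED tuple simply mirrors the definition's property list.

```python
import json

REQUIRED = ("variableMeasured", "distribution", "license",
            "temporalCoverage", "spatialCoverage", "creator")

dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "AI citation tracking for Austin clinics",
    "description": "Weekly citation counts across four AI engines for a fixed prompt set.",
    "creator": {"@type": "Organization", "name": "Example Research Team"},
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "temporalCoverage": "2025-01-01/2025-06-30",  # ISO 8601 interval
    "spatialCoverage": {"@type": "Place", "name": "Austin, Texas"},
    "variableMeasured": ["share_of_model", "citation_frequency"],
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.com/data/citations.csv",
    },
}

missing = [prop for prop in REQUIRED if prop not in dataset]
print(missing or json.dumps(dataset, indent=2))
```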

reviewedBy attestation

reviewedBy is a Schema.org property of WebPage and its subtypes (such as MedicalWebPage) that names the credentialed professional who reviewed the content for accuracy before publication. The reviewer is declared as a Person entity with hasCredential pointing to the relevant license or board certification. For medical and legal YMYL content, reviewedBy is the schema-level E-E-A-T signal that satisfies AI engines without requiring the byline to be the named expert.
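
A sketch of reviewedBy on a MedicalWebPage, with lastReviewed recording when the review happened. The page, reviewer, and review date are hypothetical; the credential uses EducationalOccupationalCredential with recognizedBy naming the certifying board.

```python
import json

reviewed_page = {
    "@context": "https://schema.org",
    "@type": "MedicalWebPage",
    "name": "Semaglutide vs tirzepatide: what to expect",
    "lastReviewed": "2025-05-01",
    "reviewedBy": {
        "@type": "Person",
        "name": "Dr. Alex Rivera",  # hypothetical reviewer
        "jobTitle": "Endocrinologist",
        "hasCredential": {
            "@type": "EducationalOccupationalCredential",
            "credentialCategory": "Board certification",
            "recognizedBy": {"@type": "Organization", "name": "American Board of Internal Medicine"},
        },
    },
}

print(json.dumps(reviewed_page, indent=2))
```

The byline author can stay a staff writer; the reviewer object carries the credential.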

ImageObject schema

ImageObject is the Schema.org entity for declaring image content with structured metadata. The entity carries contentUrl, caption, creditText, dateCreated, license, and subjectOf properties. For visual verticals (plastic surgery, cosmetic dentistry, luxury home services), ImageObject is the schema bridge between image galleries and citable content because AI engines cannot rank images directly but can rank pages with structured image metadata.
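
A sketch of ImageObject metadata for a single gallery image, using the properties the definition lists. Here subjectOf points to the gallery page (a CreativeWork about the image), and about links the image to the service it depicts; every URL, date, and credit is hypothetical.

```python
import json

SITE = "https://example.com/"  # hypothetical dental studio domain

image = {
    "@context": "https://schema.org",
    "@type": "ImageObject",
    "contentUrl": SITE + "gallery/veneers-case-12.jpg",
    "caption": "Porcelain veneers on eight upper teeth, before and after, case 12",
    "creditText": "Example Dental Studio",
    "dateCreated": "2025-03-14",
    "license": SITE + "image-license",
    "subjectOf": {"@type": "WebPage", "@id": SITE + "veneers-gallery"},
    "about": {"@id": SITE + "veneers#service"},
}

print(json.dumps(image, indent=2))
```

The caption does the work a model cannot do from pixels alone: it names the procedure, the scope, and the case, all in text.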
