AI architecture

Retrieval-Augmented Generation (RAG)

Also: RAG, retrieval augmented generation

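Retrieval-augmented generation (RAG) is an architecture in which an AI engine fetches documents at query time and grounds its answer in them, instead of relying only on what the model absorbed during training. The difference between live retrieval and training recall matters in practice. A clinic that optimizes only for training recall is betting that OpenAI will include its website before the next training cutoff, which happens once every twelve to eighteen months. A clinic that optimizes for RAG can begin appearing in answers within fourteen to twenty-one days.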

Different AI engines use different RAG variants. Perplexity uses a three-layer (L3) reranking process: broad retrieval, then quality reranking, then final synthesis. ChatGPT browses through Bing on demand. Gemini blends model knowledge with the Google Search index. Each variant rewards slightly different signals.
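
The three stages named above (broad retrieval, quality reranking, final synthesis) can be sketched in a few lines of Python. This is a toy illustration only, not Perplexity's or any other engine's actual implementation: the `Doc` class, the `quality` score, the keyword-overlap scoring, and the example URLs are all hypothetical, and the synthesis step just builds the prompt a language model would receive.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Doc:
    url: str
    text: str
    quality: float  # hypothetical 0..1 quality signal, e.g. source trust


def tokenize(s: str) -> set[str]:
    """Lowercase, strip basic punctuation, and return the set of words."""
    return {w.strip(".,?!").lower() for w in s.split()} - {""}


def broad_retrieval(query: str, corpus: list[Doc], k: int = 3) -> list[Doc]:
    """Stage 1: cheap recall. Keep the k docs sharing the most query terms."""
    q = tokenize(query)
    scored = [(len(q & tokenize(d.text)), d) for d in corpus]
    hits = [(s, d) for s, d in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in hits[:k]]


def quality_rerank(query: str, candidates: list[Doc], k: int = 2) -> list[Doc]:
    """Stage 2: reorder candidates by term overlap weighted by quality."""
    q = tokenize(query)
    ranked = sorted(
        candidates,
        key=lambda d: len(q & tokenize(d.text)) * d.quality,
        reverse=True,
    )
    return ranked[:k]


def synthesize(query: str, sources: list[Doc]) -> str:
    """Stage 3: stand-in for the model call. Builds a prompt with cited sources."""
    lines = [f"[{i}] {d.text} ({d.url})" for i, d in enumerate(sources, 1)]
    return (
        "Answer using only these sources:\n"
        + "\n".join(lines)
        + f"\nQuestion: {query}"
    )


# Hypothetical corpus: two clinic pages, a low-quality forum page, an unrelated page.
CORPUS = [
    Doc("https://example.com/clinic-a", "Clinic A offers same day dental implants in Austin", 0.9),
    Doc("https://example.com/forum", "dental implants Austin cheap cheap dental implants", 0.2),
    Doc("https://example.com/clinic-b", "Clinic B provides dental cleaning in Austin", 0.7),
    Doc("https://example.com/recipes", "A simple recipe for sourdough bread", 0.8),
]

QUERY = "dental implants in Austin"
candidates = broad_retrieval(QUERY, CORPUS)
top = quality_rerank(QUERY, candidates)
prompt = synthesize(QUERY, top)
print(prompt)
```

In this sketch the keyword-stuffed forum page survives broad retrieval but is dropped at the reranking stage because of its low quality score, which is the kind of signal difference between stages that the paragraph above describes.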

Related