AEO · 11 min read

How to get cited by ChatGPT

A concrete playbook for earning citations inside ChatGPT answers. Passage structure, schema, GPTBot access, entity work, monitoring.

Format: Article
Updated: Apr 16, 2026

TL;DR

Getting cited by ChatGPT in 2026 comes down to five things: allow GPTBot in robots.txt and serve full server-rendered HTML, structure every page with self-contained 40 to 80 word citable passages under question H2s, ship a clean schema graph with stable Organization, Person, and Service entities, build entity authority through consistent sameAs links and cross-domain mentions, and monitor citations monthly through a query panel and the DataForSEO ChatGPT scraper. Done well, first citations land in four to eight weeks.

01

How ChatGPT picks sources

ChatGPT, when answering a query that triggers web search, retrieves a small set of candidate URLs through a Bing-backed retrieval layer, evaluates each for relevance and source authority, and composes an answer that may quote or paraphrase the candidates with attribution. The signals that influence selection include passage structure, schema clarity, classical authority (links and brand mentions), and freshness. Sites that block OpenAI's crawlers drop out: a page OAI-SearchBot cannot fetch cannot be cited in search answers, and a site that disallows GPTBot is excluded from the training corpus.

ChatGPT uses two paths to source content. The first is the model's training data, which is a snapshot of the open web through the model's training cutoff. The second is real-time web retrieval, which OpenAI added in late 2023 and has expanded since. Citations almost always come from the retrieval path, because OpenAI does not surface training-data sources for compliance reasons.

The retrieval layer for ChatGPT runs on a Bing-backed index plus OpenAI's own crawl through OAI-SearchBot. When a user asks a question that requires fresh information, the model fetches a set of candidate URLs, reads them, and composes an answer with citations. Which URLs get fetched is driven by classical retrieval signals; which passages get cited from those URLs is driven by passage structure.

02

Step 1: let GPTBot in

The first step costs nothing and is missed by roughly a third of the sites we audit. Open robots.txt and confirm GPTBot is not disallowed.

```
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
```

OAI-SearchBot is the dedicated retrieval crawler for ChatGPT search. ChatGPT-User is the user-agent used for direct browse actions inside chat sessions. GPTBot is OpenAI's general-purpose crawler. Allow all three. Some teams want to block training but allow retrieval; the convention is to disallow GPTBot but allow OAI-SearchBot and ChatGPT-User. We recommend allowing all three unless there is a specific legal reason not to.

Verify with curl that the file returns 200 and the directives are correct. Verify in your server logs that the bots are actually fetching pages within a week of going live.
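The directive check can also be scripted. The following is a minimal sketch, assuming you have already fetched the robots.txt body as a string; it uses Python's standard `urllib.robotparser`, and the `crawler_access` helper, the `SPLIT_POLICY` sample, and the example.com URL are illustrative names, not part of any OpenAI tooling. The sample encodes the "block training, allow retrieval" convention described above.

```python
from urllib.robotparser import RobotFileParser

# The three OpenAI user-agents named in this article.
OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def crawler_access(robots_txt, url, agents=OPENAI_AGENTS):
    """Return {agent: allowed} for `url` under the given robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Sample policy: block training (GPTBot) but allow retrieval and browsing.
SPLIT_POLICY = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""

# example.com is a placeholder; substitute a real page URL from your site.
access = crawler_access(SPLIT_POLICY, "https://example.com/blog/aeo-guide")
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running the same function against your live robots.txt body, for a handful of important URLs, shows quickly whether an older blanket rule is still blocking one of the three agents.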

03

Step 2: serve real HTML

OpenAI's crawlers read the HTML your server returns, not JavaScript-hydrated React. If your site is client-rendered, most of your content is invisible to them. The fix is server-side rendering or static generation. For Next.js sites, default to Server Components. For SPAs, ship a server-rendered fallback or move to a framework that supports SSR.

Verify by fetching your page with curl and inspecting the response. If the body contains your headings and paragraphs as plain HTML, you are good. If you see a near-empty <div id="root"></div>, you have work to do.
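The same inspection can be automated. This is a rough sketch using only Python's standard `html.parser`; it assumes you pass in the raw response body (for example, saved from curl), and the `looks_server_rendered` helper and its 50-word threshold are assumptions chosen for illustration, not a known crawler rule.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect headings and count words a non-JavaScript crawler could read."""

    SKIPPED = {"script", "style", "title"}
    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.word_count = 0
        self.headings = []
        self._skip_depth = 0
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.HEADINGS:
            self._in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script, style, and title text
        self.word_count += len(data.split())
        if self._in_heading:
            self.headings[-1] += data


def looks_server_rendered(html, min_words=50):
    """True if the raw HTML already carries headings and body text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.word_count >= min_words and bool(parser.headings)


# A typical client-rendered shell: no readable content without JavaScript.
SPA_SHELL = (
    '<html><head><title>App</title></head><body>'
    '<div id="root"></div><script src="/app.js"></script></body></html>'
)
print(looks_server_rendered(SPA_SHELL))
```

A page that fails this check in its raw response is a candidate for SSR or static generation, whatever it looks like in a browser.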

04

Step 3: write passages the model can lift

Citation behavior favors self-contained passages. ChatGPT prefers to quote a single coherent block over stitching fragments together. The unit of citation is roughly 40 to 80 words: long enough to be substantive, short enough to be quotable.

Structure each major section with an H2 in the form of a question or a clear topic statement, immediately followed by a citable answer block. The block should open with the entity (not 'It is...' but 'AEO is...'), define the concept, add one piece of context, and stop. Follow with the deeper exposition for human readers.

Three patterns work well. The TL;DR block at the top of the article, engineered as the primary citable passage. Per-section answer blocks under each H2. FAQ blocks at the bottom, marked up with FAQPage schema. All three should be in plain HTML, server-rendered, with no JavaScript dependency.

  • Open every citable block with the entity, not a pronoun
  • Keep blocks between 40 and 80 words
  • Define the concept in the first sentence
  • Add context in the second or third sentence
  • Avoid hedging language and marketing puffery
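
The first two rules in the list above are mechanical enough to lint. Here is a small sketch, assuming each citable block is available as plain text; `check_passage` and the pronoun list are hypothetical helpers for illustration, and the 40 and 80 defaults simply mirror the range given in this section.

```python
import re

# Pronoun openers that leave a lifted passage without a subject (illustrative list).
PRONOUN_OPENERS = {"it", "this", "that", "these", "those", "they", "we"}


def check_passage(text, min_words=40, max_words=80):
    """Return a list of problems; an empty list means the block passes."""
    problems = []
    word_count = len(text.split())
    if not min_words <= word_count <= max_words:
        problems.append(f"{word_count} words, expected {min_words} to {max_words}")
    first_word = re.match(r"\W*(\w+)", text)
    if first_word and first_word.group(1).lower() in PRONOUN_OPENERS:
        problems.append(f"opens with a pronoun: {first_word.group(1)!r}")
    return problems


print(check_passage("It is a way to get cited."))
```

The remaining rules (definition first, context second, no puffery) still need an editor's eye; the lint only catches length and opener problems.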

05

Step 4: ship a real schema graph

Schema does not directly cause citations, but it gives the model stable references. Treating each major entity (Organization, Person, Service, Article) as a node with a stable @id URI lets the model cross-reference them across pages. This matters because ChatGPT's citation behavior weights entity-level authority, not just page-level relevance.

Minimum viable schema for an AEO program: Organization with @id, sameAs links to LinkedIn, GitHub, X, Crunchbase, Wikidata where applicable. Person schema for every byline, with sameAs to the author's professional profiles. Service schema for every service page, linked back to the Organization @id. Article schema on every long-form piece, with author linked to Person @id. FAQPage schema on FAQ blocks.

Validate with Google's Rich Results test and Schema.org's validator. Render server-side as JSON-LD in the head. Do not rely on client-side schema injection because the bots will not see it.
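To make the graph idea concrete, here is a minimal sketch that builds the four node types with stable @id URIs and checks that every @id reference resolves to a node in the same graph. All names and URLs (example.com, "Example Co", "Jane Doe", the profile links) are placeholders, and `build_graph` and `dangling_ids` are hypothetical helpers; the schema.org properties used (`sameAs`, `worksFor`, `provider`, `author`, `publisher`) are standard.

```python
import json

BASE = "https://example.com"  # placeholder domain


def build_graph():
    """Build a small JSON-LD graph whose nodes reference each other by @id."""
    org_id = f"{BASE}/#organization"
    person_id = f"{BASE}/team/jane-doe#person"
    graph = [
        {"@type": "Organization", "@id": org_id, "name": "Example Co", "url": BASE,
         "sameAs": ["https://www.linkedin.com/company/example-co",
                    "https://github.com/example-co"]},
        {"@type": "Person", "@id": person_id, "name": "Jane Doe",
         "worksFor": {"@id": org_id},
         "sameAs": ["https://www.linkedin.com/in/jane-doe-example"]},
        {"@type": "Service", "@id": f"{BASE}/services/aeo#service",
         "name": "AEO audit", "provider": {"@id": org_id}},
        {"@type": "Article", "@id": f"{BASE}/blog/aeo-guide#article",
         "headline": "How to get cited by ChatGPT",
         "author": {"@id": person_id}, "publisher": {"@id": org_id}},
    ]
    return {"@context": "https://schema.org", "@graph": graph}


def dangling_ids(doc):
    """Return @id values that are referenced but never defined in the graph."""
    defined = {node["@id"] for node in doc["@graph"] if "@id" in node}
    referenced = set()

    def walk(value):
        if isinstance(value, dict):
            if set(value) == {"@id"}:  # a bare reference such as {"@id": "..."}
                referenced.add(value["@id"])
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    for node in doc["@graph"]:
        walk(node)
    return referenced - defined


doc = build_graph()
json_ld = json.dumps(doc, indent=2)  # body for a server-rendered JSON-LD script tag
print(dangling_ids(doc))
```

The serialized string is meant to be emitted by the server inside a `<script type="application/ld+json">` element in the head. Running `dangling_ids` in a build step is one way to catch broken references before they ship.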

06

Step 5: build entity authority

ChatGPT's source weighting reflects what the model learned during training plus what its retrieval layer fetches at query time. Entity authority shows up in both. The training data favored sources with strong cross-domain mentions, Wikipedia presence, news coverage, and consistent professional profiles. The retrieval layer favors sources with current authority signals.

The work: get your business named on third-party sites that the model trusts. This includes industry publications, podcast appearances, conference talks, guest posts on relevant blogs, and Wikipedia entries where notability is genuinely defensible. Each mention reinforces the entity. Each consistent sameAs across LinkedIn, GitHub, Crunchbase, and X reinforces the entity.

This is slow work. It is also the work that compounds the most. A brand that has been consistently named across the open web for the past two years gets cited more often than a brand that started last month, even when the page-level structure is identical.

07

Step 6: monitor monthly

There is no Search Console for ChatGPT, so you build the measurement yourself. The minimum cadence is monthly. The minimum panel is 50 target queries. The minimum data points per citation are platform, query, position within the answer, and sentiment.

Two methods, used together. The DataForSEO ChatGPT scraper runs your panel programmatically and returns the cited URLs. A manual prompt panel, run by a human in a fresh session, captures behavior the scraper might miss. We log both, diff month over month, and report citations gained, citations lost, and competitive deltas.
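The month-over-month diff can be sketched as follows. It assumes the citations from both methods have already been collected into rows with the fields listed above plus the cited URL; the `Citation` record and `diff_citations` function are hypothetical, and nothing here calls the DataForSEO API.

```python
from typing import NamedTuple


class Citation(NamedTuple):
    platform: str   # e.g. "chatgpt"
    query: str      # the panel query that produced the answer
    url: str        # the cited URL
    position: int   # order of the citation within the answer
    sentiment: str  # e.g. "positive", "neutral", "negative"


def diff_citations(previous, current):
    """Compare two monthly runs and report gained, lost, and moved citations."""
    def key(c):
        return (c.platform, c.query, c.url)

    prev = {key(c): c for c in previous}
    curr = {key(c): c for c in current}
    return {
        "gained": [curr[k] for k in curr if k not in prev],
        "lost": [prev[k] for k in prev if k not in curr],
        "moved": [(prev[k], curr[k]) for k in curr
                  if k in prev and prev[k].position != curr[k].position],
    }


# Illustrative data only; example.com URLs are placeholders.
march = [
    Citation("chatgpt", "what is aeo", "https://example.com/aeo", 2, "neutral"),
    Citation("chatgpt", "aeo pricing", "https://example.com/pricing", 1, "positive"),
]
april = [
    Citation("chatgpt", "what is aeo", "https://example.com/aeo", 1, "neutral"),
    Citation("chatgpt", "aeo tools", "https://example.com/tools", 3, "neutral"),
]
report = diff_citations(march, april)
print({name: len(rows) for name, rows in report.items()})
```

Keying on platform, query, and URL means a change in position or sentiment counts as movement rather than as a loss followed by a gain. Running the same diff over competitor domains gives the competitive deltas.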

Track AI-referred traffic in analytics in two ways: UTM hygiene on any URL that ChatGPT might surface (clicks are rare; most citations are never clicked), and direct-detection heuristics (a sudden spike in direct traffic to a long-tail URL often correlates with a citation).
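One way to turn the spike heuristic into something repeatable is a trailing-baseline check. This is a sketch under stated assumptions: the input is a list of daily direct-visit counts for a single long-tail URL, and the 14-day window and three-standard-deviation threshold are arbitrary starting points, not values from any analytics product.

```python
from statistics import mean, pstdev


def direct_traffic_spikes(daily_visits, window=14, threshold=3.0):
    """Return indexes of days whose visits exceed the trailing baseline.

    A day is flagged when it is more than `threshold` standard deviations
    above the mean of the previous `window` days. The deviation is floored
    at 1.0 so that a perfectly flat baseline does not flag tiny changes.
    """
    spikes = []
    for day in range(window, len(daily_visits)):
        baseline = daily_visits[day - window:day]
        cutoff = mean(baseline) + threshold * max(pstdev(baseline), 1.0)
        if daily_visits[day] > cutoff:
            spikes.append(day)
    return spikes


# Illustrative series: a quiet long-tail URL, then one unusual day.
visits = [2, 3, 2, 3] * 4 + [40]
print(direct_traffic_spikes(visits))
```

A flagged day is a prompt to rerun the relevant panel queries by hand, not proof of a citation.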

08

What does not work

Three patterns fail. Keyword-stuffed pages with no citable structure fail because the model cannot find a coherent passage to quote. Pages behind a login or paywall fail because OpenAI's crawlers cannot read them. Pages that disallow those crawlers in robots.txt fail because OAI-SearchBot cannot fetch them for retrieval.

Also failing: fake bylines, unattributable author claims, schema with broken @id references, and content that contradicts the entity description in llms.txt or the Organization schema. The model is large enough to detect these inconsistencies and downweight the source.

09

Realistic timelines

First citations on a properly structured page typically appear within four to eight weeks of shipping the changes, assuming GPTBot has crawled the page (verify in server logs) and the page targets a query with non-trivial volume. Compounding citation growth, the phase where the model reaches for your domain reflexively, takes three to six months of consistent publishing and monitoring.
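The crawl precondition can be checked with a short log scan. The sketch below assumes access logs in the common "combined" format, where the request line and user-agent string appear on each line; `bot_hits` and the sample lines are illustrative, and the user-agent strings are abbreviated stand-ins. User-agent strings can be spoofed, so this counts claimed bot traffic only.

```python
import re
from collections import Counter

BOT_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*"')


def bot_hits(log_lines, bots=BOT_TOKENS):
    """Count requests per (bot, path) for lines whose user-agent names a bot."""
    hits = Counter()
    for line in log_lines:
        request = REQUEST_RE.search(line)
        if not request:
            continue
        for bot in bots:
            if bot in line:
                hits[(bot, request.group("path"))] += 1
                break
    return hits


# Illustrative log lines; IPs, paths, and user-agent strings are made up.
SAMPLE_LOG = [
    '203.0.113.7 - - [16/Apr/2026:10:00:00 +0000] "GET /blog/aeo HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.9 - - [16/Apr/2026:10:05:00 +0000] "GET /blog/aeo HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)"',
    '198.51.100.4 - - [16/Apr/2026:10:06:00 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
for (bot, path), count in bot_hits(SAMPLE_LOG).items():
    print(bot, path, count)
```

If a target page shows no hits from any of the three agents after a few weeks, the four-to-eight-week clock has effectively not started.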

The compounding is real because ChatGPT's retrieval layer learns from session behavior. Domains that get cited and produce useful answers get cited again. Domains that stay invisible stay invisible. The flywheel rewards starting now.

Questions

Answered below.

  • Possibly, for retrieval-time queries. OAI-SearchBot is the crawler that fetches pages when ChatGPT performs web search. Allowing it lets ChatGPT cite you in real-time queries even if you blocked GPTBot for training. We recommend allowing all three (GPTBot, OAI-SearchBot, ChatGPT-User) unless there is a specific legal or contractual reason not to.
