A lot of SEO discussion still treats AI visibility as a simple extension of rank. That model breaks quickly in production. Pages can rank well and get few citations. Other pages can rank lower and get cited frequently for narrow prompts.
Ranking, citation, and grounding are different systems.
If you separate those systems and measure each one directly, AI behavior gets much easier to diagnose.
- Rank. Position in traditional SERPs.
- Citation. Whether your domain is shown in the AI answer's source list.
- Grounding. Whether your content was used as evidence in answer construction, even if not explicitly cited.
The common failure is to optimize one layer and expect movement in all three.
Fan-out theory is partly right: answer engines do expand prompts into related intents and sub-queries. But fan-out explains only candidate discovery. It does not explain which evidence is chosen or attributed.
A better model is four stages:
- Retrieval eligibility (can your page enter the candidate set?)
- Evidence fitness (does the page contain extractable, query-matching evidence blocks?)
- Attribution likelihood (is the source likely to be shown in citations?)
- Answer assembly constraints (slot limits, deduping, diversity, UI truncation)
Fan-out is recall plumbing. Citation is evidence selection plus attribution policy.
If you already have a citation checker, keep it as the primary instrumentation layer and add a small evidence-quality rubric.
Citation Rate = cited answers / tested prompts
Self Citation Share = your-domain citations / all citations
Unique Source Share = unique cited domains / all citations
Source Drift = prompts where cited domain changes across runs
Grounding Quality (manual)
2 = directly supported by snippet
1 = partially supported
0 = unsupported or inferred
Add one three-level label that drives most interpretation:
- Evidence-fit: High / Medium / Low per page for that exact prompt.
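The four automated metrics above can be computed from a simple log of runs. The following is a minimal sketch, not a reference implementation. It makes several assumptions that are not in the original definitions: the `Run` record and its field names are hypothetical, a "cited answer" is taken to mean an answer that cites your domain, each run counts as one tested prompt in the denominator, and the "winning" domain used for drift is assumed to be the first one cited.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    """One model answer. Field names are hypothetical, for illustration only."""
    prompt_id: str
    cited_domains: list[str]  # one entry per citation shown, in display order


def _all_citations(runs: list[Run]) -> list[str]:
    return [domain for run in runs for domain in run.cited_domains]


def citation_rate(runs: list[Run], own_domain: str) -> float:
    """Answers citing own_domain / all runs (assumed reading of 'cited answers')."""
    if not runs:
        return 0.0
    return sum(own_domain in r.cited_domains for r in runs) / len(runs)


def self_citation_share(runs: list[Run], own_domain: str) -> float:
    """Your-domain citations / all citations."""
    citations = _all_citations(runs)
    return citations.count(own_domain) / len(citations) if citations else 0.0


def unique_source_share(runs: list[Run]) -> float:
    """Unique cited domains / all citations."""
    citations = _all_citations(runs)
    return len(set(citations)) / len(citations) if citations else 0.0


def source_drift(runs: list[Run]) -> int:
    """Number of prompts whose first-cited domain differs across repeated runs."""
    top_by_prompt: dict[str, set] = defaultdict(set)
    for r in runs:
        top = r.cited_domains[0] if r.cited_domains else None
        top_by_prompt[r.prompt_id].add(top)
    return sum(len(tops) > 1 for tops in top_by_prompt.values())


# Made-up sample data, not measured results.
runs = [
    Run("p1", ["example.com", "zillow.com"]),
    Run("p1", ["zillow.com"]),
    Run("p2", ["example.com"]),
    Run("p2", ["example.com", "homes.com"]),
]
print(citation_rate(runs, "example.com"))        # 3 of 4 runs cite it -> 0.75
print(self_citation_share(runs, "example.com"))  # 3 of 6 citations -> 0.5
print(unique_source_share(runs))                 # 3 unique of 6 -> 0.5
print(source_drift(runs))                        # only p1 changes -> 1
```

If your citation checker defines "cited answer" or "winning domain" differently, only the two corresponding functions need to change.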
A page can be retrieval-eligible and still fail citation for the target prompt. A common case is directory or aggregator pages that rank for address queries but do not expose directly citable resident-name mappings in accessible content.
That produces a predictable output pattern:
- Model retrieves page candidate
- Model finds low evidence fit for requested claim
- Model cites alternative domains or returns sparse answer
This is not model inconsistency. It is evidence quality doing what it should do.
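The pattern above can be turned into a small triage rule that maps one observed run onto the stage of the four-stage model where it most plausibly failed. This is a sketch under stated assumptions: the function name, its inputs, and the rule that a low evidence-fit label takes priority over attribution explanations are all illustrative choices, not something the answer engines expose.

```python
from typing import Literal

Fit = Literal["high", "medium", "low"]


def diagnose(retrieved: bool, evidence_fit: Fit, cited: bool) -> str:
    """Return the most likely failing stage for one prompt-page pair.

    Hypothetical triage rule: stages are checked in pipeline order, so the
    earliest stage that could explain the miss is the one reported.
    """
    if cited:
        return "no failure: source was cited"
    if not retrieved:
        return "retrieval eligibility"
    if evidence_fit == "low":
        return "evidence fitness"
    # Retrieved with usable evidence but still not shown as a source.
    return "attribution likelihood or answer assembly constraints"


# The aggregator case described above: retrieved, low evidence fit, not cited.
print(diagnose(retrieved=True, evidence_fit="low", cited=False))
```

Whether a page was retrieved is rarely observable directly, so in practice that input is itself an inference, for example from a source-constrained prompt that names the page.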
Prompt Set: 20-30 prompts
Models: 2 (e.g., Gemini + ChatGPT)
Modes: neutral prompt + source-constrained prompt
Runs: 3 per prompt/mode/model
Output per run:
answer text
citations
exact supporting snippets
grounding quality label
This is enough to detect model-level behavior differences and source-level evidence weaknesses without overbuilding tooling.
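The test plan above is a small cross product, and enumerating it up front makes it easy to see how many runs you are committing to. A minimal sketch follows, assuming 20 placeholder prompts (the low end of the stated range). The function name, the dictionary keys, and the prompt strings are hypothetical, and the empty output fields are meant to be filled in by hand or by your existing citation checker.

```python
from itertools import product

# Placeholder prompt set; replace with your own 20-30 prompts.
prompts = [f"prompt-{i:02d}" for i in range(1, 21)]
models = ["gemini", "chatgpt"]
modes = ["neutral", "source-constrained"]
RUNS_PER_CELL = 3


def build_run_plan(prompts, models, modes, runs_per_cell):
    """Enumerate every prompt/model/mode/repeat combination as an empty record."""
    plan = []
    combos = product(prompts, models, modes, range(1, runs_per_cell + 1))
    for run_id, (prompt, model, mode, repeat) in enumerate(combos, start=1):
        plan.append({
            "run_id": run_id,
            "model": model,
            "prompt_mode": mode,
            "query": prompt,
            "repeat": repeat,
            # Outputs to capture per run, left empty until the run is scored.
            "answer_text": None,
            "citations": None,
            "supporting_snippets": None,
            "grounding_quality": None,
        })
    return plan


plan = build_run_plan(prompts, models, modes, RUNS_PER_CELL)
print(len(plan))  # 20 prompts * 2 models * 2 modes * 3 runs = 240
```

At 30 prompts the same plan grows to 360 runs, which is worth knowing before you commit to manual grounding labels for each one.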
- Findability. Crawl/index health, entity clarity, prompt-aligned titles.
- Extractability. One claim per paragraph, clear headings, concise answer blocks.
- Attributability. Verifiable facts, explicit sourcing, stable canonical source URL.
Most teams over-index on findability and under-invest in extractability. Citation performance is usually lost in that gap.
Use this exactly as your first test run. Keep prompt wording fixed across models so your citation differences are comparable.
Prompt A (neutral)
Who lives on McCullough Ln, Hilliard, OH?
List the names (if available), the source URL for each claim, and quote the exact snippet used.
If names are not available on the source page, say "not available on page" explicitly.
Do not use agent names, listing agents, or inferred occupants as residents.
Prompt B (source-constrained)
Evaluate this page first:
https://www.spokeo.com/Mccullough+Ln+Hilliard+OH+addresses
1) Determine whether it contains directly citable resident names.
2) Quote exact snippets from that page only.
3) If it fails, explain exactly why it fails evidence quality for the query.
4) Then provide best-available alternatives with exact snippets + URLs.
Rules: no inference; use "not available on page" when needed.
Score each run with this minimal sheet:
run_id,model,prompt_mode,query,cited_domains,self_cited,source_drift,grounding_quality,evidence_fit,notes
1,chatgpt,neutral,"who lives on mccullough ln hilliard oh","homes.com|zillow.com",0,0,2,medium,"no names for most addresses"
2,gemini,neutral,"who lives on mccullough ln hilliard oh","spokeo.com|directory",1,1,1,low,"retrieved spokeo but not citable"
- grounding_quality: 2 = directly supported, 1 = partial, 0 = unsupported/inferred.
- evidence_fit: high / medium / low for that prompt-page pair.
- source_drift: 1 when the winning cited domain changes across repeated runs.
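Once a few runs are scored, the sheet can be loaded and checked with the standard library alone. The sketch below parses the two sample rows shown above, splits the pipe-delimited `cited_domains` column, and rejects labels outside the documented scales. The function names and the per-model average are illustrative assumptions, not part of the original scoring sheet.

```python
import csv
import io
from collections import defaultdict

SHEET = """
run_id,model,prompt_mode,query,cited_domains,self_cited,source_drift,grounding_quality,evidence_fit,notes
1,chatgpt,neutral,"who lives on mccullough ln hilliard oh","homes.com|zillow.com",0,0,2,medium,"no names for most addresses"
2,gemini,neutral,"who lives on mccullough ln hilliard oh","spokeo.com|directory",1,1,1,low,"retrieved spokeo but not citable"
""".strip()

VALID_FIT = {"high", "medium", "low"}
VALID_QUALITY = {0, 1, 2}


def load_runs(text: str) -> list[dict]:
    """Parse the scoring sheet and validate the two manual labels."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        quality = int(raw["grounding_quality"])
        fit = raw["evidence_fit"].strip().lower()
        if quality not in VALID_QUALITY:
            raise ValueError(f"run {raw['run_id']}: bad grounding_quality {quality}")
        if fit not in VALID_FIT:
            raise ValueError(f"run {raw['run_id']}: bad evidence_fit {fit!r}")
        rows.append({
            **raw,
            "cited_domains": [d for d in raw["cited_domains"].split("|") if d],
            "self_cited": int(raw["self_cited"]),
            "source_drift": int(raw["source_drift"]),
            "grounding_quality": quality,
            "evidence_fit": fit,
        })
    return rows


def mean_grounding_by_model(rows: list[dict]) -> dict[str, float]:
    """Average grounding_quality per model (an illustrative summary)."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["model"]].append(row["grounding_quality"])
    return {model: sum(vals) / len(vals) for model, vals in scores.items()}


rows = load_runs(SHEET)
print(rows[0]["cited_domains"])
print(mean_grounding_by_model(rows))
```

Validating labels at load time catches typos such as `med` or a grounding score of 3 before they skew any aggregate.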
If your goal is AI visibility, rank is a leading indicator but not the target metric. The target is usable evidence. Measure that directly, and the optimization path becomes obvious.
Ranking gets you discovered. Evidence gets you cited.
A framework that separates rank, citation, and grounding is the fastest way to move from SEO folklore to repeatable GEO engineering.