Why Measurement Matters
You can't improve what you can't measure. Without a consistent, multi-factor scoring system, brands are left guessing whether their efforts to appear in AI recommendations are actually working.
The challenge with measuring AI visibility is that AI responses are inherently variable. The same prompt can produce different results across models, across runs, and across time. A single observation tells you almost nothing about your actual visibility. You need aggregate data, weighted scoring, and statistical awareness of variance to draw meaningful conclusions.
Clarify's visibility scoring system combines multiple signals into a single 0-100 score that represents how visible your brand is across AI recommendation systems, while accounting for the natural variability of AI-generated responses.
Visibility Score Components
The visibility score is a composite metric calculated as: VisibilityScore = BaseVisibility × (1 + StrengthBonus + CitationBonus) × ModelWeight
- BaseVisibility (0–100): Presence × Rank — whether you appear and where you rank.
- StrengthBonus (0–0.3): How strongly AI recommends you (explicit vs. passing mention).
- CitationBonus (0–0.2): Whether AI cites sources or provides links alongside the recommendation.
- ModelWeight (0.3–0.4): Relative importance of the AI model based on market share.
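The composite formula above can be sketched directly. Function and parameter names here are illustrative, not Clarify's actual API:

```python
def visibility_score(base: float, strength_bonus: float,
                     citation_bonus: float, model_weight: float) -> float:
    """Composite score: BaseVisibility x (1 + StrengthBonus + CitationBonus)
    x ModelWeight, per the formula above."""
    return base * (1 + strength_bonus + citation_bonus) * model_weight

# A brand with strong base visibility (80), a strong cited mention,
# measured on the most heavily weighted model (0.4):
score = visibility_score(base=80, strength_bonus=0.3,
                         citation_bonus=0.2, model_weight=0.4)
print(score)  # 48.0
```

Note how the bonuses multiply the base rather than add to it, so a strong, cited mention lifts the score proportionally to how visible the brand already is.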
Mention Rate
Mention Rate measures the percentage of relevant prompts where an AI model includes your brand. If you track 20 prompts and the AI mentions your brand in 12 of them, your Mention Rate is 60%. Each mention carries a stability indicator: High (80%+ of runs), Medium (50–79%), or Low (under 50%). A High-stability mention is worth more in the overall score.
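The calculation and the stability thresholds above can be expressed as a small sketch (the function names are assumptions for illustration):

```python
def mention_rate(mentioned: int, tracked: int) -> float:
    """Percentage of tracked prompts in which the brand appears."""
    return 100 * mentioned / tracked

def stability(hit_runs: int, total_runs: int) -> str:
    """Classify run-to-run stability using the thresholds above:
    High (>= 80% of runs), Medium (50-79%), Low (< 50%)."""
    rate = hit_runs / total_runs
    if rate >= 0.8:
        return "High"
    if rate >= 0.5:
        return "Medium"
    return "Low"

print(mention_rate(12, 20))  # 60.0
print(stability(4, 5))       # High
```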
Rank & Top-3 Rate
When AI generates recommendation lists, position matters. Rank measures your average position across prompts where you appear. Top-3 Rate measures how often you appear in the first three positions. Rank is normalized across list lengths to ensure fair comparison.
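One plausible way to normalize rank across list lengths is a linear map to [0, 1], where first place scores 1.0 regardless of how long the list is. The source does not specify Clarify's exact normalization, so this is a sketch of the idea:

```python
def normalized_rank(position: int, list_length: int) -> float:
    """Map a 1-based list position to [0, 1]; 1.0 means first place.
    Assumed linear normalization -- the actual formula is not specified."""
    if list_length <= 1:
        return 1.0
    return 1 - (position - 1) / (list_length - 1)

def top3_rate(positions: list[int]) -> float:
    """Share of appearances landing in the first three positions."""
    return 100 * sum(p <= 3 for p in positions) / len(positions)

print(normalized_rank(1, 5))      # 1.0  (first of five)
print(normalized_rank(5, 5))      # 0.0  (last of five)
print(top3_rate([1, 4, 2, 6]))    # 50.0
```

Under this scheme, second place in a ten-item list outscores second place in a three-item list, which matches the intent of comparing positions fairly across lists of different lengths.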
Recommendation Strength
Not all mentions are equal. Clarify classifies recommendation strength into four levels:
- Strong (1.3×): Explicit recommendation — AI specifically suggests or endorses the brand.
- Listed (1.0×): Mentioned in a recommendation list without special emphasis.
- Weak (0.6×): Passing mention — referenced but not as a recommendation.
- None (0.0×): Not mentioned.
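The four strength levels map naturally to a lookup table of multipliers (the dictionary keys are illustrative labels):

```python
# Multipliers from the strength levels above.
STRENGTH_MULTIPLIER = {
    "strong": 1.3,  # explicit recommendation or endorsement
    "listed": 1.0,  # mentioned in a list without emphasis
    "weak":   0.6,  # passing mention, not a recommendation
    "none":   0.0,  # not mentioned at all
}

def weighted_mention(base_value: float, strength: str) -> float:
    """Scale a mention's value by its recommendation strength."""
    return base_value * STRENGTH_MULTIPLIER[strength]

print(weighted_mention(10.0, "strong"))  # 13.0
print(weighted_mention(10.0, "weak"))    # 6.0
```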
AI Confidence
AI Confidence measures how reliable the visibility data itself is. It is a weighted combination of four factors:
- Stability (40%): Run-to-run consistency for the same prompt on the same model.
- Agreement (25%): Cross-model consistency — whether ChatGPT, Claude, and Gemini agree.
- Evidence Strength (25%): How well the raw response matches brand detection criteria.
- Parse Reliability (10%): Whether the AI response was successfully parsed.
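The weighted combination above is a straightforward dot product of the four factors and their weights. Assuming each factor is expressed on a 0–1 scale (an assumption, since the source does not state the units):

```python
# Weights from the four factors above; they sum to 1.0.
CONFIDENCE_WEIGHTS = {
    "stability": 0.40,
    "agreement": 0.25,
    "evidence":  0.25,
    "parse":     0.10,
}

def ai_confidence(factors: dict[str, float]) -> float:
    """Weighted combination of the four factors, each assumed in [0, 1]."""
    return sum(CONFIDENCE_WEIGHTS[k] * factors[k] for k in CONFIDENCE_WEIGHTS)

conf = ai_confidence({"stability": 0.9, "agreement": 0.8,
                      "evidence": 0.6, "parse": 1.0})
print(conf)  # 0.81
```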
Model Weights
- ChatGPT (GPT-4): 0.4 — Largest user base, most common entry point for AI-driven discovery.
- Claude: 0.3 — Growing market share, strong among professional and technical users.
- Gemini: 0.3 — Integrated into Google ecosystem, significant reach.
A brand that scores well on ChatGPT but poorly on Claude and Gemini will have a moderate overall score. Cross-model visibility is more durable and reflective of genuine authority.
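This effect can be made concrete with the model weights listed above (the aggregation function itself is an illustrative sketch):

```python
# Weights from the Model Weights list above; they sum to 1.0.
MODEL_WEIGHTS = {"chatgpt": 0.4, "claude": 0.3, "gemini": 0.3}

def overall_score(per_model: dict[str, float]) -> float:
    """Combine per-model scores using market-share weights."""
    return sum(MODEL_WEIGHTS[m] * per_model.get(m, 0.0) for m in MODEL_WEIGHTS)

# Strong on ChatGPT only vs. moderately visible everywhere:
lopsided = overall_score({"chatgpt": 90, "claude": 20, "gemini": 20})
balanced = overall_score({"chatgpt": 60, "claude": 60, "gemini": 60})
print(lopsided)  # 48.0
print(balanced)  # 60.0
```

The balanced brand scores higher overall despite never matching the lopsided brand's peak, which is the point: cross-model visibility compounds.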
Limitations & Variance
AI responses are inherently variable. Clarify runs multiple queries per prompt per scan cycle and aggregates across runs. Scores may fluctuate between scans — a difference of a few points may be within normal variance. Clarify flags statistically significant changes and distinguishes them from noise.
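A simple way to separate signal from noise is a z-score check against recent scan history. The source does not specify Clarify's actual statistical test, so this is a minimal sketch of the idea:

```python
import statistics

def is_significant_change(history: list[float], new_score: float,
                          z_threshold: float = 2.0) -> bool:
    """Flag a new scan score only if it falls outside roughly two
    standard deviations of recent history. A simple z-score heuristic,
    not Clarify's actual test."""
    if len(history) < 2:
        return False  # not enough history to estimate variance
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    if sd == 0:
        return new_score != mean
    return abs(new_score - mean) / sd > z_threshold

history = [50, 52, 51, 49, 50]
print(is_significant_change(history, 51))  # False -- within normal variance
print(is_significant_change(history, 70))  # True  -- a real shift
```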
Factors Clarify cannot measure directly include personalization, real-time web access variability, and model update timing. The scoring is designed to be robust in the face of these limitations.