Shipping AI answers without checking them against your own docs is how products lose trust in a week. The good news: you can catch most issues before writing integration code. Oprag’s dashboard chat uses the same retrieval and citation pipeline as the REST API, so what you verify there is what your users will see later.
Why validate in the dashboard first
Founders and PMs often jump straight to engineering tickets. That hides two problems:
- Doc quality issues — missing pages, outdated policies, contradictory sections
- Question phrasing gaps — users ask differently than you expect
Fixing docs and test questions in the dashboard is cheaper than redeploying an embed or rolling back an API integration.
Build a golden question set
Create 15–30 questions your users actually ask. Pull them from:
- Support tickets (last 90 days)
- Sales call notes
- Onboarding drop-off surveys
- Search logs in your help center
Group them by theme: billing, security, setup, troubleshooting. Tag each with the document you expect to answer it.
Run the checklist on every answer
For each golden question, score the reply against these criteria:
| Check | Pass criteria |
|---|---|
| Correctness | Matches the source doc, no invented details |
| Citations | At least one relevant source; page numbers look right |
| Scope | Stays on your docs; does not answer unrelated topics |
| Tone | Appropriate for customer-facing use |
| Refusal | Says “not in your docs” when coverage is missing |
Export failures as doc tickets, not model tickets. Most failures are fixable by editing or uploading content.
Test edge cases deliberately
Do not only ask easy FAQs. Include:
- Ambiguous questions — “How much does it cost?” when you have multiple plans
- Stale content — questions about deprecated features still in old PDFs
- Cross-doc answers — policies split across Terms and Help Center
- Adversarial prompts — attempts to get answers not in your corpus
Note where citations help users self-serve vs where a human should still intervene.
Set a launch bar
Define what “good enough” means for v1. Example bar for a B2B SaaS help widget:
- 90% of golden questions pass correctness
- 100% of billing/security answers have citations
- Zero hallucinated policy statements in the test set
If you miss the bar, fix docs or narrow scope (e.g., only billing FAQs in v1) before calling the API done.
Loop in support before go-live
Have one support lead run the same golden set in the dashboard. They will catch phrasing mismatches and missing articles faster than engineering will.
When citations look right, hand off to REST integration or widget embed.
After launch — keep the set alive
Add new tickets to golden questions monthly. Re-run the set when you upload major doc changes. Treat it like a smoke test suite for your knowledge base.
Starter includes 500 queries per month with full API access — enough to iterate without a credit card. See free RAG API, no credit card for plan details.