CondoAsk RAG Dashboard

All Cases

RAG System Status

Retrieval — Latest Run

Generation — Latest Run

Behavioral Tests — Latest Run

Audio Parity — Latest Run

Golden Dataset

⬇ Download JSONL

Add Test Case

1 · Query


2 · Chunks


3 · Answer


4 · Save

What is a test case?
A test case is a question a resident might ask the condo bot. The system automatically verifies that the bot answers it correctly, and the case is evaluated on every future run.

Property

Question

What happens when you save?
The case is added to the golden dataset. From then on, the generation evaluator automatically includes it in every run. You do not need to edit any files.
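As a rough illustration of what one saved case could look like in the downloadable JSONL file, here is a minimal sketch. The field names (property, question, chunks, answer) are assumptions that mirror the wizard steps above. The real schema and the sample values are not shown in this dashboard and are hypothetical.

```javascript
// Hypothetical record shape: these field names are NOT taken from the real dataset.
const exampleCase = {
  property: "example-property",        // assumed identifier for the "Property" field
  question: "What are the pool hours?", // illustrative resident question
  chunks: ["chunk-1"],                  // assumed IDs of the chunks that should be retrieved
  answer: "The pool is open from 8 am to 10 pm.", // illustrative expected answer
};

// JSONL stores one JSON object per line, so serialize without indentation.
function toJsonlLine(testCase) {
  return JSON.stringify(testCase) + "\n";
}

// Parse a JSONL string back into an array of objects, skipping blank lines.
function parseJsonl(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

const dataset = toJsonlLine(exampleCase);
const cases = parseJsonl(dataset);
```

Appending a line per case keeps the file easy to diff and stream, which is one common reason to choose JSONL over a single JSON array.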

Review the case before saving:

Run Evals

Measures how well the system retrieves the correct chunks. No LLM is involved, so it is fast and cheap. Requires only Voyage and Supabase.
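The dashboard does not say which retrieval metric is used. One common choice for "did we find the correct chunks" is recall@k, sketched below as an assumption rather than the evaluator's actual implementation. The function name and chunk IDs are hypothetical.

```javascript
// Recall@k: the fraction of expected chunks that appear among the top k retrieved chunks.
// This is an assumed metric; the real evaluator may compute something different.
function recallAtK(retrievedIds, expectedIds, k) {
  if (expectedIds.length === 0) return 0; // nothing to find, so report 0 by convention here
  const topK = new Set(retrievedIds.slice(0, k));
  const hits = expectedIds.filter((id) => topK.has(id)).length;
  return hits / expectedIds.length;
}

// Illustrative data: ranked retrieval results and the chunks a golden case expects.
const retrieved = ["c7", "c2", "c9", "c4"];
const expected = ["c2", "c4"];

const recallAt3 = recallAtK(retrieved, expected, 3); // 0.5: only "c2" is in the top 3
const recallAt4 = recallAtK(retrieved, expected, 4); // 1: both expected chunks are in the top 4
```

Because this only compares IDs, it needs no LLM call, which is consistent with the retrieval run being described as fast and cheap.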

Save as baseline

idle

Calls the real bot and evaluates its responses with an AI judge. Uses Claude and n8n, so it is slower and incurs API costs.

Smoke only (fast)

idle

Runs isolation, multi-turn, error-handling, and format tests. Uses Claude and n8n but runs only the behavioral cases, so it is faster than a full generation run.

idle

Checks that audio and text give semantically equivalent answers. Uses Claude, n8n, and Groq. Requires audio fixtures in tests/audio/. Generate them once with: GROQ_API_KEY=xxx node tests/scripts/generate-audio-fixtures.js

idle