What I Learned Building a Multi-Domain Triage Agent with Google ADK and Local RAG
I built a Multi-Domain Support Triage Agent for the HackerRank Orchestrate challenge — a system that routes and responds to real support tickets across three isolated knowledge domains: HackerRank, Claude, and Visa. The constraint was strict: responses had to be grounded entirely in retrieved source documents. No hallucination. No guessing.
This post covers three things I didn't find good documentation for while building it — embedding model tradeoffs, RAG preprocessing sensitivity, and why I abandoned single-step ADK pipelines for a sequential multi-agent pattern. If you're building anything with Google ADK, local RAG, or Gemini Flash — this is for you.
The pipeline
Tech stack: Python, Google ADK, Gemini Flash, ChromaDB, Sentence Transformers, Pydantic. The system is a two-stage agent pipeline:
- Retrieval Agent: identifies the correct knowledge domain and retrieves supporting evidence from a local ChromaDB RAG index built over 700+ support documents.
- Format Agent: takes the retrieved evidence and emits output that must validate against a strict Pydantic schema with five fields: response, product area, status, request type, and citation justification.
If structured output is missing or fails validation, the system automatically falls back to a safe escalation response with status "escalated" instead of guessing.
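A minimal sketch of the schema-plus-fallback rail, assuming Pydantic v2. The class name, the snake_case field names, the "replied" status value, and the escalation text are hypothetical; only the five fields and the "escalated" fallback come from the design above.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class TriageResponse(BaseModel):
    """Output contract for the Format Agent; field names are illustrative."""

    response: str
    product_area: str
    # Only "escalated" comes from the pipeline described above; "replied" is an assumed label.
    status: Literal["replied", "escalated"]
    request_type: str
    justification: str  # which retrieved document(s) support the response


# Hypothetical safe default used whenever structured output can't be trusted.
ESCALATION = TriageResponse(
    response="This ticket has been escalated to a human support agent.",
    product_area="unknown",
    status="escalated",
    request_type="unknown",
    justification="Structured output was missing or failed schema validation.",
)


def parse_or_escalate(raw: Optional[str]) -> TriageResponse:
    """Validate the Format Agent's raw JSON; escalate instead of guessing on failure."""
    if not raw:
        # Missing output: nothing to validate.
        return ESCALATION.model_copy()
    try:
        return TriageResponse.model_validate_json(raw)
    except ValidationError:
        # Malformed JSON, missing fields, and unknown status values all land here.
        return ESCALATION.model_copy()
```

Because the fallback is a valid instance of the same schema, downstream code never has to special-case a failed generation.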
Lesson 1 — Embedding model tradeoff: SBERT vs EmbeddingGemma
Before committing to a model, I ran a manual benchmark on a 10-ticket sample with citation checks. SBERT (all-MiniLM-L6-v2) offered fast inference and low memory use but struggled on ambiguous and multi-hop tickets. EmbeddingGemma was slower but noticeably stronger on citation alignment in that sample.
I chose EmbeddingGemma, prioritizing citation quality over speed: in a triage system, a wrong citation means a wrong response, so accuracy wasn't negotiable.
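A small benchmark like this is easier to rerun with a harness. The sketch below (all names hypothetical) measures how often each ticket's expected source document lands in the top-k results for a pluggable embedding function. A toy bag-of-words embedder stands in for the real models so it runs offline; it is not a substitute for either SBERT or EmbeddingGemma.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# An embedding function maps a batch of texts to one vector per text.
EmbedFn = Callable[[List[str]], Sequence[Sequence[float]]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def citation_hit_rate(
    embed: EmbedFn,
    docs: Dict[str, str],
    tickets: List[Tuple[str, str]],
    k: int = 3,
) -> float:
    """Fraction of (ticket_text, expected_doc_id) pairs whose expected doc ranks in the top k."""
    doc_ids = list(docs)
    doc_vecs = embed([docs[d] for d in doc_ids])
    query_vecs = embed([text for text, _ in tickets])
    hits = 0
    for (_, expected_id), q_vec in zip(tickets, query_vecs):
        scores = [cosine(q_vec, d_vec) for d_vec in doc_vecs]
        ranked = sorted(range(len(doc_ids)), key=lambda i: scores[i], reverse=True)
        top_k = {doc_ids[i] for i in ranked[:k]}
        hits += expected_id in top_k
    return hits / len(tickets)


def toy_embed(texts: List[str]) -> List[List[float]]:
    """Stand-in embedder (bag of words over a tiny vocabulary) so the harness runs offline."""
    vocab = ["refund", "card", "candidate", "test", "api", "key"]
    return [[float(t.lower().split().count(w)) for w in vocab] for t in texts]
```

To compare real models, pass an encoder's batch method as `embed`, for example `SentenceTransformer("all-MiniLM-L6-v2").encode`. The EmbeddingGemma checkpoint ID (`google/embeddinggemma-300m`) is an assumption worth checking against its model card, which may also recommend separate query and document prompts.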
Lesson 2 — RAG quality depends heavily on preprocessing
The corpus was ~774 articles across three domains, with high variance in document length. Chunks that were too large buried the relevant sentences; chunks that were too small split concepts in half. I tuned chunk size and overlap iteratively, validating each setting by checking whether the expected evidence appeared in the top-k retrieval results.
With 700+ documents, chunking strategy is a first-class engineering decision — not a config detail.
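A minimal sketch of the size/overlap trade-off, assuming word-based windows (the units and values used in the actual chunker aren't specified here). `evidence_intact` is a hypothetical helper that checks whether a known evidence sentence survives whole inside at least one chunk, a crude offline proxy for the top-k check described above.

```python
from typing import List


def chunk_text(text: str, size: int, overlap: int) -> List[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the document
    return chunks


def evidence_intact(chunks: List[str], evidence: str) -> bool:
    """True if the whole evidence sentence appears inside a single chunk."""
    return any(evidence in chunk for chunk in chunks)


doc = "Refunds take five business days. Contact support to dispute a charge."
evidence = "Contact support to dispute a charge."
print(evidence_intact(chunk_text(doc, size=4, overlap=0), evidence))  # False: split mid-sentence
print(evidence_intact(chunk_text(doc, size=8, overlap=4), evidence))  # True
```

Overlap is what rescues the second case: the sentence that straddles a boundary in the first window appears whole in the next one.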
Lesson 3 — ADK architecture tradeoff: why sequential beats single-step
Combining retrieval and structured output in a single agent step didn't work reliably: when tool use is combined with response_schema enforcement, ADK's behavior depends on the model. I split the pipeline into two discrete stages: the Retrieval Agent (queries ChromaDB and returns raw evidence), followed by the Format Agent (enforces the Pydantic schema).
Failures became isolated and debuggable; schema output became stable. If you're hitting inconsistent structured output with Google ADK and Gemini — separate retrieval from generation.
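Framework aside, the benefit of the split is that each stage fails on its own. Here is a plain-Python sketch of that shape (not ADK code), with hypothetical `retrieve` and `format_answer` callables standing in for the two agents.

```python
from typing import Callable, List, Optional, Tuple

Retriever = Callable[[str], List[str]]
Formatter = Callable[[str, List[str]], dict]


def run_pipeline(
    ticket: str, retrieve: Retriever, format_answer: Formatter
) -> Tuple[Optional[dict], List[str]]:
    """Run retrieval and formatting as separate stages and log which one failed.

    Returns (answer, log); answer is None when either stage fails, so the caller
    can apply its escalation fallback.
    """
    log: List[str] = []
    try:
        evidence = retrieve(ticket)
    except Exception as exc:  # stage 1 failure: no evidence to format
        log.append(f"retrieval failed: {exc}")
        return None, log
    log.append(f"retrieval ok: {len(evidence)} passage(s)")
    try:
        answer = format_answer(ticket, evidence)
    except Exception as exc:  # stage 2 failure: evidence exists, schema output doesn't
        log.append(f"format failed: {exc}")
        return None, log
    log.append("format ok")
    return answer, log
```

In ADK itself, the analogous wiring is a SequentialAgent whose sub_agents are two LlmAgent instances, with the first writing its evidence to session state via output_key so the second can read it.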
Results
- 29 consecutive pilot tickets processed end-to-end at 96.6% citation grounding accuracy (28 of 29).
- Zero hallucinations in the pilot: the fallback safety rail caught every case where the model would otherwise have guessed.
- Fully local retrieval with no cloud dependency on the vector store.
What I'd do differently
- Add a domain classifier stage before retrieval.
- Benchmark more embedding models, including text-embedding-004 from Vertex AI.
- Build a proper evaluation harness with ground truth labels instead of manual citation checks on 10 tickets.
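For that last item, a ground-truth harness can start as small as per-field exact-match accuracy. A hedged sketch: the label format (one dict per ticket) and the field names are assumptions, not the pipeline's actual schema.

```python
from typing import Dict, List, Sequence

# Assumed label fields; a citation field (e.g. an expected doc ID) could be added the same way.
FIELDS = ("status", "product_area", "request_type")


def field_accuracy(
    preds: List[Dict[str, str]],
    gold: List[Dict[str, str]],
    fields: Sequence[str] = FIELDS,
) -> Dict[str, float]:
    """Per-field exact-match accuracy of predictions against ground-truth labels."""
    if not gold or len(preds) != len(gold):
        raise ValueError("need a non-empty label set aligned one-to-one with predictions")
    return {
        f: sum(p.get(f) == g[f] for p, g in zip(preds, gold)) / len(gold)
        for f in fields
    }
```

Reporting each field separately shows whether errors come from routing (product area) or from the escalation decision (status), which a single pass/fail citation check hides.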