What I Learned Building a Multi-Domain Triage Agent with Google ADK and Local RAG

May 31, 2025

Python, Google ADK, RAG, ChromaDB, Gemini, AI, Multi-Agent

[ WHY THIS MATTERS ]

Support teams automating triage need answers grounded in the right product domain — wrong citations erode trust faster than slow responses.

This post documents the engineering tradeoffs behind a HackerRank Orchestrate eval that hit 96.6% citation grounding across 29 tickets.

I built a Multi-Domain Support Triage Agent for the HackerRank Orchestrate challenge — a system that routes and responds to real support tickets across three isolated knowledge domains: HackerRank, Claude, and Visa. The constraint was strict: responses had to be grounded entirely in retrieved source documents. No hallucination. No guessing.

This post covers three things I didn't find good documentation for while building it — embedding model tradeoffs, RAG preprocessing sensitivity, and why I abandoned single-step ADK pipelines for a sequential multi-agent pattern. If you're building anything with Google ADK, local RAG, or Gemini Flash — this is for you.

The pipeline

Tech stack: Python, Google ADK, Gemini Flash, ChromaDB, Sentence Transformers, Pydantic. The system is a two-stage agent pipeline:

Retrieval Agent — identifies the correct knowledge domain and retrieves supporting evidence from a local ChromaDB RAG index built over 700+ support documents.
Format Agent — takes the retrieved evidence and outputs a strict Pydantic-validated schema: response, product area, status, request type, and citation justification.

If the structured output is missing or invalid, the system automatically falls back to a safe escalation response with status: "escalated" instead of guessing.
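The validate-or-escalate behavior can be sketched roughly as follows. This is a minimal illustration, not the project's code: it uses only the standard library as a stand-in for the Pydantic model, and the field names, the allowed status values, and the helper names (TriageResult, escalation_fallback, parse_or_escalate) are assumptions made for the example.

```python
import json
from dataclasses import dataclass

# Hypothetical field names and status vocabulary; the post lists the fields
# but not their exact spelling or allowed values.
REQUIRED_FIELDS = ("response", "product_area", "status", "request_type", "justification")
ALLOWED_STATUS = {"replied", "escalated"}


@dataclass(frozen=True)
class TriageResult:
    response: str
    product_area: str
    status: str
    request_type: str
    justification: str


def escalation_fallback(reason: str) -> TriageResult:
    """Safe default returned whenever the model output cannot be trusted."""
    return TriageResult(
        response="This ticket has been escalated to a human agent.",
        product_area="unknown",
        status="escalated",
        request_type="unknown",
        justification=f"Fallback triggered: {reason}",
    )


def parse_or_escalate(raw_output: str) -> TriageResult:
    """Validate raw model output; escalate instead of guessing on any failure."""
    try:
        data = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        return escalation_fallback("output was not valid JSON")
    if not isinstance(data, dict):
        return escalation_fallback("output was not a JSON object")
    for name in REQUIRED_FIELDS:
        value = data.get(name)
        if not isinstance(value, str) or not value.strip():
            return escalation_fallback(f"missing or empty field '{name}'")
    if data["status"] not in ALLOWED_STATUS:
        return escalation_fallback(f"unexpected status '{data['status']}'")
    return TriageResult(**{name: data[name] for name in REQUIRED_FIELDS})
```

The point of the design is that every failure path returns the same safe object, so downstream code never has to handle a half-valid response.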

Lesson 1 — Embedding model tradeoff: SBERT vs EmbeddingGemma

Before committing to a model, I ran a manual benchmark on a 10-ticket sample with citation checks. SBERT (all-MiniLM-L6-v2) offered fast inference and low memory but struggled on ambiguous and multi-hop tickets. EmbeddingGemma was slower but significantly stronger on citation alignment.

I chose EmbeddingGemma and prioritized citation quality over speed — for a triage system where wrong citations mean wrong responses, accuracy wasn't negotiable.
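A comparison like this can be made repeatable with a small harness. The sketch below is an assumption about how such a benchmark could look, not the manual process I actually ran: the benchmark function and the (ticket, expected document ID) case format are hypothetical, and each embedding model would be wrapped in a callable that returns ranked document IDs.

```python
import time
from typing import Callable, Sequence, Tuple

# A retriever maps (query, k) to a ranked list of document IDs.
# In practice each embedding model would sit behind one of these callables.
Retriever = Callable[[str, int], Sequence[str]]


def benchmark(retriever: Retriever, cases: Sequence[Tuple[str, str]], k: int = 5) -> dict:
    """Return citation hit rate and mean latency for one retriever.

    Each case is (ticket_text, expected_doc_id). A hit means the expected
    document appears anywhere in the top-k results.
    """
    hits = 0
    elapsed = 0.0
    for ticket, expected_id in cases:
        start = time.perf_counter()
        results = retriever(ticket, k)
        elapsed += time.perf_counter() - start
        if expected_id in list(results)[:k]:
            hits += 1
    n = len(cases)
    return {
        "hit_rate": hits / n if n else 0.0,
        "mean_latency_s": elapsed / n if n else 0.0,
    }
```

Running the same cases through one callable per model gives two numbers per model, which makes the speed versus citation-quality tradeoff explicit instead of anecdotal.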

Lesson 2 — RAG quality depends heavily on preprocessing

The corpus was ~774 articles across three domains with high variance in document length. Chunks that were too large buried relevant sentences; chunks that were too small split concepts in half. I tuned chunk size and overlap iteratively, validating each setting by checking whether the expected evidence appeared in the top-k retrieval results.

With 700+ documents, chunking strategy is a first-class engineering decision — not a config detail.
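To make the chunk size and overlap tradeoff concrete, here is a minimal sketch of word-window chunking. The function names and the word-based splitting are assumptions for illustration; the real pipeline may split on tokens or sentences. The check shown is a simpler precondition than the top-k retrieval check: it only asks whether an expected evidence phrase survives intact inside at least one chunk.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into word windows of chunk_size that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def evidence_survives(chunks: List[str], evidence: str) -> bool:
    """True if the evidence phrase appears whole, on word boundaries, in one chunk."""
    needle = " " + " ".join(evidence.split()) + " "
    return any(needle in " " + chunk + " " for chunk in chunks)
```

With the text "a b c d e f g h i j" and a chunk size of 4, zero overlap cuts the phrase "d e" across two chunks, while an overlap of 2 keeps it whole in the chunk "c d e f". That is the failure mode overlap exists to prevent.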

Lesson 3 — ADK architecture tradeoff: why sequential beats single-step

Combining retrieval and structured output in a single agent step didn't work reliably — ADK's behavior is model-dependent when combining tool use with response_schema enforcement. I split the pipeline into two discrete stages: Retrieval Agent (ChromaDB query, raw evidence) then Format Agent (Pydantic schema enforcement).

Failures became isolated and debuggable; schema output became stable. If you're hitting inconsistent structured output with Google ADK and Gemini — separate retrieval from generation.
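The shape of that separation can be sketched in plain Python. This is not ADK code: retrieve and format_response are hypothetical stand-ins for the two agents, and StageResult is an illustrative wrapper. The sketch only shows why a two-stage split makes failures attributable to one stage.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StageResult:
    ok: bool
    stage: str              # stage that produced this result
    output: Optional[dict]  # formatted response on success
    error: Optional[str]    # failure reason, tagged by stage


def run_pipeline(
    ticket: str,
    retrieve: Callable[[str], List[str]],
    format_response: Callable[[str, List[str]], dict],
) -> StageResult:
    """Run retrieval, then formatting, and report which stage failed."""
    # Stage 1: retrieval only. No schema enforcement happens here.
    try:
        evidence = retrieve(ticket)
    except Exception as exc:
        return StageResult(False, "retrieval", None, str(exc))
    if not evidence:
        return StageResult(False, "retrieval", None, "no evidence found")

    # Stage 2: formatting only. No tool calls happen here.
    try:
        output = format_response(ticket, evidence)
    except Exception as exc:
        return StageResult(False, "format", None, str(exc))
    return StageResult(True, "format", output, None)
```

Because each stage has a single job, a bad result points at either the evidence or the formatting, never an ambiguous mix of both.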

Results

29 consecutive Orchestrate eval tickets processed end-to-end at 96.6% citation grounding accuracy.
Zero hallucinations — the fallback safety rail caught every case where the model would have otherwise guessed.
Fully local retrieval with no cloud dependency on the vector store.

What I'd do differently

Add a domain classifier stage before retrieval.
Benchmark more embedding models including text-embedding-004 from Vertex AI.
Build a proper evaluation harness with ground truth labels instead of manual citation checks on 10 tickets.