Kofeejan

What I Learned Building a Multi-Domain Triage Agent with Google ADK and Local RAG

Python, Google ADK, RAG, ChromaDB, Gemini, AI, Multi-Agent

I built a Multi-Domain Support Triage Agent for the HackerRank Orchestrate challenge — a system that routes and responds to real support tickets across three isolated knowledge domains: HackerRank, Claude, and Visa. The constraint was strict: responses had to be grounded entirely in retrieved source documents. No hallucination. No guessing.

This post covers three things I didn't find good documentation for while building it — embedding model tradeoffs, RAG preprocessing sensitivity, and why I abandoned single-step ADK pipelines for a sequential multi-agent pattern. If you're building anything with Google ADK, local RAG, or Gemini Flash — this is for you.

The pipeline

Tech stack: Python, Google ADK, Gemini Flash, ChromaDB, Sentence Transformers, Pydantic. The system is a two-stage agent pipeline:

  • Retrieval Agent — identifies the correct knowledge domain and retrieves supporting evidence from a local ChromaDB RAG index built over 700+ support documents.
  • Format Agent — takes the retrieved evidence and outputs a strict Pydantic-validated schema: response, product area, status, request type, and citation justification.

If the structured output is missing or invalid, the system automatically falls back to a safe escalation response with status: "escalated" instead of guessing.
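
The validate-or-escalate behavior described above can be sketched in a few lines. This is a dependency-free stand-in, not the project's code: the dataclass and manual checks play the role that a Pydantic model plays in the real pipeline, and the field spellings, the "replied" status value, and the wording of the escalation message are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical names: the post lists these fields but not their exact spelling,
# and only confirms the "escalated" status value.
REQUIRED_FIELDS = ("response", "product_area", "status", "request_type", "justification")
ALLOWED_STATUS = {"replied", "escalated"}


@dataclass(frozen=True)
class TriageOutput:
    response: str
    product_area: str
    status: str
    request_type: str
    justification: str


# Safe default returned whenever the model output cannot be trusted.
ESCALATION = TriageOutput(
    response="This ticket has been escalated to a human agent.",
    product_area="unknown",
    status="escalated",
    request_type="unknown",
    justification="Structured output was missing or invalid.",
)


def parse_or_escalate(raw):
    """Return a validated TriageOutput, or the escalation default on any problem."""
    try:
        data = json.loads(raw)
        fields = {name: data[name] for name in REQUIRED_FIELDS}
    except (TypeError, ValueError, KeyError):
        # Covers None input, malformed JSON, non-object JSON, and missing fields.
        return ESCALATION
    if not all(isinstance(value, str) and value.strip() for value in fields.values()):
        return ESCALATION
    if fields["status"] not in ALLOWED_STATUS:
        return ESCALATION
    return TriageOutput(**fields)
```

The point of the design is that every failure path returns the same escalation object, so a caller never has to decide what to do with a half-valid answer.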

Lesson 1 — Embedding model tradeoff: SBERT vs EmbeddingGemma

Before committing to a model, I ran a manual benchmark on a 10-ticket sample with citation checks. SBERT (all-MiniLM-L6-v2) offered fast inference and low memory use but struggled on ambiguous and multi-hop tickets. EmbeddingGemma was slower but noticeably stronger on citation alignment in that sample.

I chose EmbeddingGemma and prioritized citation quality over speed — for a triage system where wrong citations mean wrong responses, accuracy wasn't negotiable.
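
A comparison like this can be made repeatable with a small hit-rate check: for each labeled ticket, does the expected document show up in the top-k results? The sketch below is an illustration under stated assumptions, not the benchmark used for this project. The three-document corpus, the tickets, and the word-overlap retrievers are toy stand-ins; in a real run each `retrieve` callable would wrap a ChromaDB query built with one of the embedding models.

```python
def hit_rate_at_k(retrieve, labeled_tickets, k=3):
    """Fraction of tickets whose expected doc id appears in the top-k results."""
    if not labeled_tickets:
        return 0.0
    hits = 0
    for ticket_text, expected_doc_id in labeled_tickets:
        if expected_doc_id in retrieve(ticket_text)[:k]:
            hits += 1
    return hits / len(labeled_tickets)


# Toy corpus and labels (hypothetical, for illustration only).
DOCS = {
    "doc-a": "reset a test invite link",
    "doc-b": "dispute a card transaction",
    "doc-c": "rotate an api key",
}
TICKETS = [
    ("reset invite link", "doc-a"),
    ("Card Transaction Dispute", "doc-b"),
    ("API key rotation", "doc-c"),
]


def overlap_retriever(normalize):
    """Build a retriever that ranks docs by shared words after `normalize`."""
    def retrieve(query):
        query_words = set(normalize(query).split())
        return sorted(
            DOCS,
            key=lambda doc_id: len(query_words & set(normalize(DOCS[doc_id]).split())),
            reverse=True,
        )
    return retrieve


# Two stand-in "models": one case-insensitive, one case-sensitive.
retriever_a = overlap_retriever(str.lower)
retriever_b = overlap_retriever(lambda text: text)

score_a = hit_rate_at_k(retriever_a, TICKETS, k=1)
score_b = hit_rate_at_k(retriever_b, TICKETS, k=1)
```

Because the metric only needs a callable that returns ranked document ids, swapping in another embedding model means writing one more `retrieve` function and leaving the harness alone.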

Lesson 2 — RAG quality depends heavily on preprocessing

The corpus was ~774 articles across three domains with high variance in document length. Chunks that were too large buried the relevant sentences; chunks that were too small split mid-concept. I tuned chunk size and overlap iteratively, validating each setting by checking whether the expected evidence appeared in the top-k retrieval results.

With 700+ documents, chunking strategy is a first-class engineering decision — not a config detail.
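
To make the chunk size and overlap tradeoff concrete, here is a minimal sliding-window chunker plus a cheap pre-check that a known evidence phrase survives intact in at least one chunk. It is a sketch under assumptions: the post does not say whether chunks were measured in words, characters, or tokens, so the word-based windows here are illustrative, and the pre-check is a simpler proxy for the top-k retrieval validation described above.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def evidence_survives(chunks, evidence):
    """True if the evidence phrase appears unbroken inside at least one chunk."""
    needle = " ".join(evidence.split())
    return any(needle in chunk for chunk in chunks)


# Placeholder document: ten numbered words make the window boundaries easy to see.
sample = " ".join(f"w{i}" for i in range(10))
with_overlap = chunk_words(sample, chunk_size=4, overlap=1)
without_overlap = chunk_words(sample, chunk_size=4, overlap=0)
```

With no overlap, the phrase "w3 w4" straddles a chunk boundary and is lost; with an overlap of one word it lands inside the second chunk. Sweeping `chunk_size` and `overlap` over a set of known evidence phrases gives a quick signal before running full retrieval checks.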

Lesson 3 — ADK architecture tradeoff: why sequential beats single-step

Combining retrieval and structured output in a single agent step didn't work reliably: how ADK behaves when tool use is combined with response_schema enforcement depends on the model. I split the pipeline into two discrete stages: a Retrieval Agent (ChromaDB query, raw evidence) followed by a Format Agent (Pydantic schema enforcement).

Failures became isolated and debuggable; schema output became stable. If you're hitting inconsistent structured output with Google ADK and Gemini — separate retrieval from generation.
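
The isolation benefit can be shown without any framework. In the sketch below, plain functions stand in for the two agents and no ADK API is used; `StageError`, `run_stage`, and the toy stage functions are hypothetical names introduced for illustration. Each stage has one narrow job, and any failure is tagged with the stage it came from.

```python
class StageError(Exception):
    """Wraps a failure together with the name of the stage that raised it."""

    def __init__(self, stage, cause):
        super().__init__(f"{stage} stage failed: {cause!r}")
        self.stage = stage


def run_stage(name, fn, payload):
    """Run one stage and re-raise any failure tagged with the stage name."""
    try:
        return fn(payload)
    except Exception as exc:
        raise StageError(name, exc) from exc


def run_pipeline(ticket, retrieve, format_answer):
    """Retrieval first, formatting second; neither stage does the other's job."""
    evidence = run_stage("retrieval", retrieve, ticket)
    return run_stage("format", format_answer, evidence)


# Toy stand-ins for the Retrieval Agent and the Format Agent.
def toy_retrieve(ticket):
    return [{"doc_id": "doc-a", "text": "Evidence relevant to: " + ticket}]


def toy_format(evidence):
    if not evidence:
        raise ValueError("no evidence to cite")
    return {
        "response": evidence[0]["text"],
        "citations": [item["doc_id"] for item in evidence],
    }


result = run_pipeline("reset invite link", toy_retrieve, toy_format)
```

When something breaks, the `stage` attribute on the error says whether to inspect the retrieval query or the formatting prompt, which is the debugging property the two-stage split is meant to provide.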

Results

  • 29 consecutive pilot tickets processed end-to-end at 96.6% citation grounding accuracy.
  • Zero hallucinations — the fallback safety rail caught every case where the model would have otherwise guessed.
  • Fully local retrieval with no cloud dependency on the vector store.

What I'd do differently

  • Add a domain classifier stage before retrieval.
  • Benchmark more embedding models including text-embedding-004 from Vertex AI.
  • Build a proper evaluation harness with ground truth labels instead of manual citation checks on 10 tickets.