[Google Engineering Blog] Production-Ready AI Agents: 5 Lessons from Refactoring a Monolith

Summary

Google’s AI Agent Clinic analyzed and re-architected “Titanium”, a sales research agent that worked in a dev environment but could not meet production requirements. The original monolith ran a linear for loop: when a sub-task failed (API timeout or hallucination), the whole process stopped without reporting an error. The solution was to split it into a pipeline built on the Google Agent Development Kit (ADK) with specialized sub-agents: Company Researcher, Search Planner, Case Study Researcher, Selector, and Email Drafter.
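
The difference between a loop that stops silently and a pipeline that reports which stage failed can be sketched in plain Python. This is only an illustration and does not use ADK: `run_pipeline`, `StageResult`, and the three stage functions are hypothetical names, and the simulated timeout stands in for a real API failure.

```python
from dataclasses import dataclass
from typing import Any, Callable

State = dict[str, Any]
Stage = tuple[str, Callable[[State], State]]


@dataclass
class StageResult:
    name: str
    ok: bool
    error: str = ""


def run_pipeline(stages: list[Stage], state: State) -> tuple[State, list[StageResult]]:
    """Run stages in order; record an explicit result for every stage attempted."""
    report: list[StageResult] = []
    for name, stage in stages:
        try:
            # Each stage gets a copy, so a failing stage cannot corrupt earlier output.
            state = stage(dict(state))
        except Exception as exc:
            # Surface the failure instead of stopping silently.
            report.append(StageResult(name, False, f"{type(exc).__name__}: {exc}"))
            break
        report.append(StageResult(name, True))
    return state, report


# Hypothetical stand-ins for specialized sub-agents.
def company_researcher(state: State) -> State:
    state["company_profile"] = f"profile of {state['company']}"
    return state


def search_planner(state: State) -> State:
    raise TimeoutError("search API timed out")  # simulated failure


def email_drafter(state: State) -> State:
    state["email"] = "draft"
    return state


final_state, report = run_pipeline(
    [
        ("company_researcher", company_researcher),
        ("search_planner", search_planner),
        ("email_drafter", email_drafter),
    ],
    {"company": "Acme"},
)
for result in report:
    print(result)
```

The caller gets both the last good state and a per-stage report, so a failed run can be logged, retried from the failing stage, or escalated.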

The second lesson concerns structured outputs: instead of describing a JSON schema in a prompt string, which leads to fragile parsing and wasted tokens, native Pydantic objects are passed to ADK directly as the schema definition. ADK then uses the Structured Outputs API automatically so that the output is always valid. The third lesson concerns RAG: hardcoding 12 case studies in a Python file does not scale. The solution is an async crawler (Playwright) that automatically scrapes Google Cloud customer success pages, batch-indexes them into Google Cloud Vector Search, and retrieves with Hybrid Search, which combines semantic and keyword matching.
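
One common way to combine a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF). The summary does not say how Hybrid Search merges the two result lists, so the sketch below is an assumption chosen for illustration, not a description of Vector Search internals. The document IDs and the constant `k=60` are likewise illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by ID to keep the order deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a semantic (embedding) search and a keyword search.
semantic_hits = ["case_retail", "case_bank", "case_games"]
keyword_hits = ["case_bank", "case_health", "case_retail"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Documents that appear in both lists ("case_bank", "case_retail") outrank those found by only one retriever, which is the behavior hybrid retrieval is meant to provide.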

On observability, ADK integrates OpenTelemetry out of the box: a single configure_telemetry() call emits distributed traces for the entire execution flow, capturing model requests, token counts, and tool executions. Finally, cost control is handled by ADK’s native orchestration, which automatically applies exponential backoff, timeout boundaries, and retry loops, with no need for custom try-catch logic. The overall lesson: circuit breakers and a structured orchestration framework are prerequisites for taking an AI agent to production.
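
To make the retry behavior concrete, here is a minimal sketch of exponential backoff with a bounded number of attempts. It shows the general pattern that an orchestration framework applies on the caller's behalf. It is not ADK code: `call_with_backoff` and its parameters are hypothetical, and the injectable `sleep` exists only so the example runs without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call fn, retrying on TimeoutError with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # retry budget exhausted: fail loudly, never loop forever
            # The delay doubles each attempt and is capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise ValueError("max_attempts must be at least 1")


# Simulated flaky dependency: times out twice, then succeeds.
calls = {"count": 0}


def flaky_tool() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("upstream timed out")
    return "ok"


delays: list[float] = []
result = call_with_backoff(flaky_tool, sleep=delays.append)
print(result, delays)  # prints: ok [0.5, 1.0]
```

Capping both the attempt count and the delay bounds the worst-case cost of a failing call, which is the point of treating retries as a cost control.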

👉 Read the original post
