When we set out to automate our daily on-call reporting workflow, I thought it would be straightforward: connect an LLM to Datadog, ask it to fetch some metrics, and generate a nice Markdown report. Simple, right?
Well, not quite. What started as a weekend experiment turned into a deep exploration of agent frameworks, the Model Context Protocol (MCP), and the surprising challenges of making LLMs behave deterministically. Along the way, we discovered Fast-Agent — a framework that made building multi-agent workflows feel natural rather than painful.
This is the story of building that system, the patterns we discovered, and the hard-won lessons about making AI agents work reliably in production.
Why We Needed This
Our platform engineering team monitors multiple production Kubernetes clusters and serverless functions. Every morning, someone on-call would spend 30-60 minutes cobbling together a status report:
- Check Datadog for active incidents and monitor alerts
- Query pod error states across clusters (CrashLoopBackOff, OOMKilled, ImagePullBackOff)
- Review resource utilization trends
- Scan error logs for new patterns
- Check service mesh health and serverless function metrics
- Compile everything into a readable report
The data was there. The queries were repeatable. But the manual compilation was tedious and error-prone. We needed automation, but not the brittle shell-script-and-jq kind. We needed something that could understand the data contextually—when a pod restart spike matters versus when it’s routine churn.
Enter Fast-Agent and MCP
Why Fast-Agent?
I’d looked at several agent frameworks (LangChain, AutoGen, CrewAI), but most felt either too heavyweight or too opinionated. Fast-Agent stood out for three reasons:
1. Decorator-based agent definition: Agents are just async functions with a decorator. No class hierarchies, no complex configuration schemas.
```python
from fast_agent import FastAgent
from fast_agent.core.prompt import Prompt

fast = FastAgent("Platform On-Call Report")

@fast.agent(
    name="k8s_pod_overview",
    servers=["datadog"],
    instruction="Report pod error states across clusters",
    max_tokens=24000
)
async def k8s_pod_overview(task: str) -> str:
    return "Kubernetes pod overview generated."
```
That’s it. You define what the agent does via the instruction parameter (more on that later), specify which MCP servers it can access, and you’re done.
2. First-class MCP support: Fast-Agent was built with MCP in mind. You configure servers in fastagent.config.yaml, and they’re available to agents automatically:
```yaml
mcp:
  servers:
    datadog:
      transport: http
      url: https://mcp.datadoghq.com/api/unstable/mcp-server/mcp
      headers:
        DD_API_KEY: ${DD_API_KEY}
        DD_APPLICATION_KEY: ${DD_APPLICATION_KEY}
    filesystem:
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-filesystem", ".", "/tmp"]
```
No manual tool registration, no adapter layers. Agents just work with MCP servers.
3. Built-in parallel execution: The @fast.parallel decorator handles fan-out/fan-in workflows natively. This turned out to be critical for performance—more on that below.
What is MCP, Really?
If you’re not familiar with the Model Context Protocol, think of it as an API standard that lets LLMs interact with external systems in a structured way. Instead of manually implementing function calling for every tool, MCP servers expose resources (like Datadog dashboards) and tools (like querying metrics) in a standard format.
Datadog’s MCP server gives you tools like:
- `mcp_datadog_get_datadog_metric` - Fetch time-series metrics
- `mcp_datadog_search_datadog_logs` - Query logs with filters
- `mcp_datadog_search_datadog_monitors` - List monitor alerts
- `mcp_datadog_search_datadog_spans` - Retrieve APM traces
The beauty is that the LLM decides which tools to call based on your instruction prompt. You don’t write explicit code to fetch metrics—you tell the agent what you want, and it figures out the API calls.
Architecture: The Prompt Server Pattern
We settled on what we call the “prompt server” pattern. Each section of our report (incidents, pod errors, resource consumption, service mesh metrics, etc.) is:
- A separate agent with its own prompt file
- Executed in parallel alongside other sections
- Strictly output-only—no conversational fluff, just structured data
Here’s how it works:
The Decorator Factory
We created a decorator factory to reduce boilerplate:
```python
def shared_agent(
    name: str,
    prompt_filename: str,
    servers: list[str] | None = None,
    max_tokens: int | None = None,
):
    """Decorator factory for shared fast-agents."""
    servers = servers or ["datadog"]
    request_params = None
    if max_tokens:
        request_params = RequestParams(maxTokens=max_tokens)

    def _decorator(fn):
        agent_kwargs = {
            "name": name,
            "servers": servers,
            "instruction": _build_instruction(prompt_filename),
        }
        if request_params:
            agent_kwargs["request_params"] = request_params
        return fast.agent(**agent_kwargs)(fn)

    return _decorator
```
This lets us define agents cleanly:
@shared_agent("k8s_pod_overview", "03-k8s-pod-overview.md",
servers=["datadog", "filesystem"], max_tokens=24000)
async def k8s_pod_overview(task: str) -> str:
return "Kubernetes pod overview generated."
@shared_agent("serverless_health", "10-serverless-health.md", max_tokens=20000)
async def serverless_health(task: str) -> str:
return "Serverless health report generated."
Notice something interesting: the function body is trivial. The real work happens via the instruction parameter, which loads from a Markdown file.
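For completeness, `_build_instruction` is just a file loader. A minimal sketch, assuming the prompt specs sit in a `prompts/` directory next to the agent module (our real version also prepends the shared parameters file mentioned later, which is omitted here):

```python
from pathlib import Path

# Assumed layout: prompt specs live in a prompts/ directory next to this module.
PROMPTS_DIR = Path(__file__).parent / "prompts"


def _build_instruction(prompt_filename: str) -> str:
    """Read a Markdown prompt spec and return it as the agent's instruction string."""
    # The real version also prepends shared parameters (cluster names, Datadog
    # query quirks); that part is left out of this sketch.
    return (PROMPTS_DIR / prompt_filename).read_text(encoding="utf-8")
```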
Prompt Files as Configuration
This was a key insight: prompts are specifications, not conversations. Each prompt file is a detailed spec for what data to fetch and how to format it.
Here’s a simplified example from 05-active-monitor-alerts.md:
```markdown
# Active Monitor Alerts — Platform

## 🚨 STOP - READ THIS FIRST - MANDATORY RULES 🚨

YOU ARE FORBIDDEN FROM OUTPUTTING ANY COMMENTARY, EXPLANATIONS, OR ANALYSIS.

YOUR FIRST LINE MUST BE EXACTLY:
**24-Hour Alert Activity (team:platform):**

IF YOU WRITE EVEN ONE WORD BEFORE THIS LINE, YOU HAVE FAILED THIS TASK.

---

## Objective
Produce a Datadog monitor report that:
- Highlights currently active alerts (alert and warn)
- Counts alerts that triggered in the last 24 hours
- Separately counts "no data" alerts

## Tools
- `mcp_datadog_search_datadog_monitors`
- `mcp_datadog_search_datadog_events`

## Queries & Processing
...detailed query specifications...

## Output Format
Return ONLY this content, in this exact order:
1) First line: **24-Hour Alert Activity (team:platform):**
2) Activity summary:
   - ⚡ Triggered (last 24h): {count}
   - ✅ Resolved (last 24h): {count}
...
```
The prompt is aggressive about suppressing LLM commentary. More on why this was necessary below.
Parallel Execution
With nine sections to generate (incidents, infrastructure, pod status, monitors, logs, metrics, service mesh, serverless, etc.), sequential execution would take several minutes. Fast-Agent’s built-in async handling makes parallelization trivial:
```python
async def _collect_section_outputs(agent) -> Dict[str, str]:
    results: Dict[str, str] = {}

    async def _invoke(section_name: str):
        print(f"\n🔎 Gathering data for section: {section_name}")
        response = await getattr(agent, section_name).send(Prompt.user("run"))
        results[section_name] = response.strip()

    tasks = [asyncio.create_task(_invoke(name)) for name in ACTIVE_SECTION_NAMES]
    await asyncio.gather(*tasks)
    return results
```
All sections run concurrently, each making independent Datadog MCP calls. What took 3-4 minutes sequentially now takes ~45 seconds.
Template Assembly
Once we have all section outputs, we use simple string replacement to assemble the final report:
```python
template_str = REPORT_TEMPLATE.read_text(encoding="utf-8")
rendered_report = _render_template(template_str, {
    "TIMESTAMP": ts,
    "INCIDENTS_OUTPUT": sections.get("incidents", "No data"),
    "POD_OVERVIEW_OUTPUT": sections.get("k8s_pod_overview", "No data"),
    "SERVICE_MESH_OUTPUT": sections.get("service_mesh_metrics", "No data"),
    # ... etc
})
```
No LLM involvement here—just deterministic text substitution. The LLM’s job is to generate well-formatted section content, not to understand the overall report structure.
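`_render_template` is nothing more than placeholder substitution. A minimal sketch, assuming `{{KEY}}`-style placeholders in the template (the exact marker syntax doesn't matter):

```python
def _render_template(template: str, values: dict[str, str]) -> str:
    """Replace {{KEY}} placeholders with section outputs; no LLM involved."""
    rendered = template
    for key, value in values.items():
        rendered = rendered.replace("{{" + key + "}}", value or "No data")
    return rendered
```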
The Hard Problems
Building this system exposed several non-obvious challenges. Here’s what we learned.
Challenge 1: LLMs Are Conversational By Default
This was the biggest surprise. LLMs are trained to be helpful conversational assistants. When you ask for a report, they naturally want to explain what they’re doing:
```
Great! Now I have all the data I need. Let me analyze the results...

Based on the metrics collected, I can see that cluster-a has...

### Pod Error States
| Cluster | Pod | Error | Count |
```
For a chat interface, this is great. For programmatic report generation, it’s terrible. The commentary pollutes the structured output and breaks template substitution.
Our solution: aggressively explicit prompt engineering.
We added “MANDATORY OUTPUT RULES” sections at the top of every prompt:
```markdown
## 🚨 MANDATORY OUTPUT RULES 🚨

**YOU ARE FORBIDDEN FROM OUTPUTTING ANY COMMENTARY, EXPLANATIONS, OR THOUGHTS.**

**EXAMPLES OF FORBIDDEN OUTPUT:**
❌ "Let me query..." / "Now I'll check..."
❌ "I can see..." / "Based on the data..."
❌ Any explanation of what you're about to do

**YOUR FIRST LINE MUST BE:** `### Cluster Status Table:`

**IF YOU WRITE EVEN ONE WORD BEFORE THAT HEADING, YOU HAVE COMPLETELY FAILED.**
```
The strong language (“COMPLETELY FAILED”) and specific negative examples were necessary. Subtle hints didn’t work. We had to be blunt.
We also discovered that showing what not to do was more effective than just stating rules. Listing actual phrases to avoid (“Let me query…”, “Based on the data…”) dramatically reduced unwanted commentary.
Challenge 2: Datadog Metrics Don’t Support OR Syntax
This caught us off-guard. We wanted to query metrics for specific clusters:
sum:kubernetes.cpu.usage.total{cluster:(cluster-a OR cluster-b OR cluster-c)}
But Datadog’s metrics API returns a 400 error: “Error parsing query”. Turns out, OR syntax works for logs but not metrics.
The fix:
sum:kubernetes.cpu.usage.total{*} by {kube_cluster_name}
Query all clusters, then filter in post-processing. Not elegant, but it works. We documented this quirk explicitly in our shared parameters file so every agent prompt knows about it:
**CRITICAL**: Datadog metric queries DO NOT support OR syntax in tags.
**❌ WRONG** (will cause 400 error):
sum:metric{cluster:(A OR B OR C)}
**✅ CORRECT**:
sum:metric{*} by {cluster}
Challenge 3: Token Limits and Data Volume
Querying metrics for all production clusters over 24 hours generates a lot of data. We hit token limits constantly in early iterations.
Our multi-layered solution:
1. Dynamic token scaling in prompts:
```markdown
## Query Strategy
1. Start with max_tokens: 15000
2. If response truncated or URL-only:
   - Retry with max_tokens: 30000
   - If still truncated: 50000
3. If still failing: reduce time window or increase rollup
```
The “URL-only” case is interesting—when Datadog’s response would be huge, it sometimes returns a link to the metrics explorer instead of actual data. We trained our agents to detect this and retry with higher token limits.
2. Rollup intervals:
sum:kubernetes.cpu.usage.total{*} by {cluster}.rollup(sum, 60).as_rate()
The .rollup(sum, 60) aggregates data into 60-second buckets, dramatically reducing the response size while preserving trends.
3. Top-N filtering:
Every prompt includes instructions to return only the top 10 items by severity. No need to report every single pod across all production clusters, just the worst offenders.
Challenge 4: Output Format Consistency
LLMs occasionally wrapped tables in code blocks:
```
| Cluster | Status |
|---------|--------|
| cluster-a | Active |
```
This breaks Markdown rendering in our template. We added explicit format constraints:
**CRITICAL: Output tables as actual markdown tables, NOT inside code blocks (```).
Do NOT wrap your output in triple backticks.**
We also had to enforce:
- No line numbers (`1 |`, `2 |` prefixes)
- No extra indentation
- Use `-` for empty fields (not `N/A` or `null`)
- Exact emoji usage (🔴 🟡 🟢 only)
Turns out, making LLMs produce deterministic structured output requires treating the prompt like a strict API specification.
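Because these rules are mechanical, they can also be checked in plain code before template assembly. We don't do this yet (see "Add Evaluation Loop" below), but a hypothetical check might look like:

```python
import re

ALLOWED_EMOJI = {"🔴", "🟡", "🟢"}
EMOJI_PATTERN = re.compile(r"[\U0001F300-\U0001FAFF]")
FORBIDDEN_OPENERS = ("Let me", "Now I'll", "I can see", "Based on the data")


def check_section_output(text: str) -> list[str]:
    """Return a list of format violations for one section's output (illustrative only)."""
    problems: list[str] = []
    stripped = text.lstrip()
    if stripped.startswith("```"):
        problems.append("output wrapped in a code block")
    if re.search(r"^\s*\d+ \|", text, flags=re.MULTILINE):
        problems.append("line-number prefixes present")
    first_line = stripped.splitlines()[0] if stripped else ""
    if first_line.startswith(FORBIDDEN_OPENERS):
        problems.append("conversational preamble before the required heading")
    stray = set(EMOJI_PATTERN.findall(text)) - ALLOWED_EMOJI
    if stray:
        problems.append(f"unexpected emoji: {sorted(stray)}")
    return problems
```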
What Fast-Agent Got Right
Looking back, Fast-Agent’s design choices aligned perfectly with our needs:
1. Lightweight Agents
Agents are just functions. No boilerplate, no mandatory base classes. You can start simple and add complexity only when needed:
```python
@fast.agent(name="simple")
async def simple_agent(task: str) -> str:
    return "Done"
```
This is a valid agent. As requirements grow, you add servers, instruction, max_tokens, etc., but the core pattern stays clean.
2. Configuration Over Code
The fastagent.config.yaml approach means we can switch MCP servers or LLM providers without touching application code. Want to test with GPT-4 instead of Claude? Change one line:
default_model: "gpt-4o"
Want to add a new MCP server for Slack notifications? Add it to the config:
```yaml
mcp:
  servers:
    datadog: {...}
    filesystem: {...}
    slack:
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-slack"]
```
Agents that specify servers=["slack"] now have access.
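An agent that uses it looks just like the others. A hypothetical notifier, for illustration only (not part of our current pipeline):

```python
# Illustrative only: name, instruction, and behavior are placeholders.
@fast.agent(
    name="slack_notifier",
    servers=["slack"],
    instruction="Post the provided Markdown report to the on-call Slack channel.",
)
async def slack_notifier(task: str) -> str:
    return "Report posted."
```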
3. Context Handling
Fast-Agent automatically manages conversation history and MCP server connections. In our parallel execution model, each agent gets its own isolated context. We never worry about cross-contamination between sections.
4. Built-in Async
Everything is async by default. Parallel execution, MCP calls, LLM requests—it all just works. The framework handles connection pooling, retries, and timeouts transparently.
Patterns That Emerged
Building this system surfaced some useful patterns:
Section Metadata as Control Center
Instead of hardcoding which sections to run, we use a dictionary:
```python
SECTION_METADATA = {
    "incidents": incidents,
    "infrastructure_overview": infrastructure_overview,
    "k8s_pod_overview": k8s_pod_overview,
    "k8s_resource_consumption": k8s_resource_consumption,
    "active_monitor_alerts": active_monitor_alerts,
    "recent_error_logs": recent_error_logs,
    "performance_metrics": performance_metrics,
    "service_mesh_metrics": service_mesh_metrics,
    "serverless_health": serverless_health,
}

ACTIVE_SECTION_NAMES = list(SECTION_METADATA.keys())
```
Comment out a line to disable a section—no other code changes needed. This made iteration fast during development.
Executive Summary as Post-Processing
Rather than trying to generate the executive summary in parallel with data collection, we do it as a second pass:
```python
# Collect all sections in parallel
sections = await _collect_section_outputs(agent)

# Generate executive summary using section outputs
summary_payload = {
    "k8s_pod_overview_full": sections.get("k8s_pod_overview"),
    "service_mesh_metrics_full": sections.get("service_mesh_metrics"),
    "section_snippets": {
        name: _truncate_text(sections.get(name, ""), max_chars=1200)
        for name in ["active_monitor_alerts", "recent_error_logs", "serverless_health"]
    }
}

summary_response = await agent.executive_summary.send(
    Prompt.user(json.dumps(summary_payload))
)
```
The executive summary agent gets truncated versions of section outputs (to stay within token limits) and synthesizes key insights. This two-stage approach keeps prompts focused and improves reliability.
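For reference, `_truncate_text` is deliberately simple. A minimal sketch (the truncation marker is arbitrary):

```python
def _truncate_text(text: str, max_chars: int = 1200) -> str:
    """Clip a section's output so the summary prompt stays within its token budget."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rstrip() + "\n...[truncated]"
```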
What We’d Do Differently
Use Fast-Agent’s Built-in Parallel Pattern
We currently use asyncio.gather() manually. Fast-Agent actually has a @fast.parallel decorator for fan-out/fan-in workflows:
```python
@fast.parallel(
    fan_out=["incidents", "k8s_pods", "monitors", "service_mesh", "serverless"],
    fan_in="executive_summary",
    name="generate_report"
)
async def generate_report(task: str) -> str:
    pass
```
This would give us automatic progress tracking, better error handling, and cleaner code. We plan to migrate once we validate output consistency.
Add Evaluation Loop
Right now, if a section generates malformed output, we don’t catch it until template assembly fails. An evaluator agent could validate each section’s output before proceeding:
```python
@fast.evaluator_optimizer(
    generator="k8s_pod_overview",
    evaluator="format_validator",
    min_rating="GOOD"
)
```
Fast-Agent supports this pattern natively—we just haven’t implemented it yet.
Explore Router Pattern
We generate the same report structure every day. It would be more efficient to route based on urgency:
```python
@fast.router(
    name="smart_report",
    agents=["full_report", "critical_only", "summary_only"]
)
```
On quiet days, generate a summary. When there’s an active incident, generate the full detailed report. The router agent decides based on context.
Practical Takeaways
If you’re building something similar, here’s what we’d recommend:
1. Start With Clear Constraints
LLMs need boundaries. Don’t assume they’ll “figure it out.” Write prompts like API specs:
- Exact output format (with examples)
- Forbidden patterns (with specific phrases to avoid)
- Fallback behaviors (if query fails, do X)
- Token budgets and limits
The more explicit, the better.
2. Separate Data Fetching from Synthesis
Don’t ask one agent to “fetch metrics and generate insights.” Split it:
- Agent 1: Fetch and format metrics (deterministic)
- Agent 2: Synthesize insights from formatted data (creative)
This makes debugging easier and improves reliability.
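In Fast-Agent terms, the split is just two agents. A sketch using only the decorator pattern shown earlier (names and instructions are illustrative):

```python
# Agent 1: deterministic data fetching and formatting.
@fast.agent(
    name="latency_fetcher",
    servers=["datadog"],
    instruction="Fetch p95 latency per service and output a Markdown table. No commentary.",
)
async def latency_fetcher(task: str) -> str:
    return "Latency table generated."


# Agent 2: synthesis over already-formatted data; no MCP servers needed.
@fast.agent(
    name="latency_analyst",
    instruction="Given a latency table, list the three most concerning services and why.",
)
async def latency_analyst(task: str) -> str:
    return "Latency analysis generated."
```

At runtime, the fetcher's output becomes the analyst's user prompt, the same way the executive summary consumes section outputs.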
3. Embrace Parallelization
MCP calls are I/O-bound. Running agents in parallel is nearly free and dramatically improves latency. Fast-Agent makes this trivial.
4. Test Prompts in Isolation
We built a prompt_test.py script to test individual prompts without running the full pipeline:
python prompt_test.py --prompt prompts/10-serverless-health.md --max-tokens 30000
This was invaluable for iteration. Prompts are loaded and cached at startup, so testing a change through the full system required a restart every time; isolated testing gave us instant feedback.
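The harness itself is small. A rough sketch of what it could look like, reusing the patterns above; `fast.run()` as the entry point and the environment-variable configuration are assumptions made to keep the example compact:

```python
# prompt_test.py (sketch): run a single prompt spec in isolation.
# The real script exposes --prompt / --max-tokens flags; environment variables
# are used here only to keep the example short.
import asyncio
import os
from pathlib import Path

from fast_agent import FastAgent
from fast_agent.core.prompt import Prompt

PROMPT_FILE = os.environ.get("PROMPT_FILE", "prompts/10-serverless-health.md")
MAX_TOKENS = int(os.environ.get("MAX_TOKENS", "15000"))

fast = FastAgent("Prompt Test Harness")


# Mirrors the first agent example in this post (max_tokens passed directly);
# our production code routes it through RequestParams instead.
@fast.agent(
    name="prompt_under_test",
    servers=["datadog", "filesystem"],
    instruction=Path(PROMPT_FILE).read_text(encoding="utf-8"),
    max_tokens=MAX_TOKENS,
)
async def prompt_under_test(task: str) -> str:
    return "done"


async def main() -> None:
    # Assumption: fast.run() exposes registered agents as attributes,
    # the same way the report pipeline accesses them.
    async with fast.run() as agent:
        print(await agent.prompt_under_test.send(Prompt.user("run")))


if __name__ == "__main__":
    asyncio.run(main())
```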
5. Document API Quirks in Prompts
Every API has gotchas (like Datadog’s OR syntax limitation). Don’t document them separately—put them directly in the prompts:
**CRITICAL**: Datadog metrics DO NOT support OR syntax.
**WRONG**: {cluster:(A OR B)}
**CORRECT**: {*} by {cluster}
This ensures agents always have the right context.
6. Version Your Prompts
Prompts are code. We track them in Git, and changes go through code review. This prevented subtle regressions and made debugging easier (“which version of the prompt generated this output?”).
The Results
Our agent now generates a comprehensive daily report covering:
- Active incidents and recent alert activity
- Infrastructure status across all monitored clusters
- Pod error states (CrashLoopBackOff, OOMKilled, etc.)
- Resource utilization trends
- Recent error log patterns
- Service mesh health metrics
- Serverless function health
- Executive summary with critical action items
The report runs unattended every morning and takes ~45 seconds. It’s consistent, comprehensive, and actionable—the on-call engineer gets a clear picture without manual data gathering.
More importantly, it’s maintainable. Adding a new section is just:
- Write a prompt file
- Add an agent definition with the `@shared_agent` decorator
- Register it in `SECTION_METADATA`
No plumbing changes, no framework wrestling.
Why Fast-Agent Worked For Us
Reflecting on this project, Fast-Agent succeeded because it got out of our way. We didn’t fight the framework—it aligned with how we naturally thought about the problem:
- Agents as functions: Simple mental model, easy to reason about
- MCP as the integration layer: Standard protocol, broad ecosystem
- Configuration over code: Swap servers/models without refactoring
- Async by default: Parallel execution “just works”
The framework had opinions (decorator-based agents, YAML config, Prompt objects), but they were lightweight opinions. We could still structure our code however made sense for our use case.
What’s Next
We’re exploring several extensions:
1. Slack Integration: Post reports automatically via an MCP Slack server
2. Anomaly Detection: Add an agent that compares today’s metrics to historical trends and flags unusual patterns
3. Interactive Mode: Use Fast-Agent’s built-in interactive prompt for ad-hoc queries during incidents
4. Multi-Report Support: Extend beyond daily reports—weekly summaries, incident retrospectives, capacity planning reports
The architecture we built is flexible enough to support all of these without major changes. That’s the beauty of the modular agent approach.
Closing Thoughts
Building agents that work reliably in production is harder than it looks. The demo-to-production gap is real. LLMs are powerful but nondeterministic, APIs have quirks, and integrating multiple systems introduces failure modes you don’t see in tutorials.
But when you pair the right abstractions (Fast-Agent) with the right protocol (MCP) and invest in good prompt engineering, you end up with something that actually works. Not a proof-of-concept, not a demo—a tool your team uses every day.
If you’re building something similar, I hope our experience helps you avoid some of the pitfalls we encountered. And if you’re evaluating agent frameworks, give Fast-Agent a look. It might not be the flashiest option, but for production systems that need to work, it’s been solid.
The code for our on-call report agent is internal, but the patterns and techniques are broadly applicable. Fast-Agent is open source and well-documented at fast-agent.ai, and Datadog’s MCP server is publicly available. If you’re working on similar problems, feel free to reach out—I’m always curious to hear about other approaches.