Coach Guide · Extra — MAF + Hosted Long-Running Agents
Coach-only. The most prereq-heavy Extra: it needs both Deploy-as-a-Hosted-Agent and Extra C (Magentic). Don’t let a team start it cold — they’ll be assembling deploy plumbing and MAF at once. Best run as a capstone for a strong team.
What this challenge is really teaching
Async, durable agent work. Everything so far has been request/response in a live process. This Extra introduces background=True: submit → get a handle → work continues → poll later. The keeper insight is that a long-running agent decouples the caller’s session from the work, and observability (App Insights) is what makes async work trustworthy — you can prove what happened after the fact.
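To make the submit → handle → poll contract concrete, here is a minimal in-process model of it. It is not the Foundry background-run API. That surface is in preview, so check microsoft-docs / foundry-mcp for the real calls. The `BackgroundRunner` class, its `submit`/`poll` methods and the status strings are all hypothetical, and a thread stands in for the hosted worker.

```python
import threading
import time
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class RunState:
    status: str = "in_progress"   # hypothetical status names
    result: Any = None
    error: Optional[str] = None


class BackgroundRunner:
    """Hypothetical stand-in for a background-run service (not a real SDK)."""

    def __init__(self) -> None:
        self._runs: Dict[str, RunState] = {}
        self._lock = threading.Lock()

    def submit(self, work: Callable[[], Any]) -> str:
        """Register a run, start the work on a worker thread, and return the run id at once."""
        run_id = str(uuid.uuid4())
        with self._lock:
            self._runs[run_id] = RunState()

        def _worker() -> None:
            try:
                value = work()
                with self._lock:
                    self._runs[run_id] = RunState("completed", result=value)
            except Exception as exc:  # record the failure instead of losing it with the thread
                with self._lock:
                    self._runs[run_id] = RunState("failed", error=str(exc))

        threading.Thread(target=_worker, daemon=True).start()
        return run_id

    def poll(self, run_id: str) -> RunState:
        """Return the current state for a run id without waiting on the work."""
        with self._lock:
            return self._runs[run_id]


if __name__ == "__main__":
    def slow_job() -> str:
        time.sleep(0.5)  # stands in for a long batch
        return "done"

    runner = BackgroundRunner()
    run_id = runner.submit(slow_job)
    print("status right after submit:", runner.poll(run_id).status)
    while runner.poll(run_id).status == "in_progress":
        time.sleep(0.1)
    print("result:", runner.poll(run_id).result)
```

Unlike the hosted service, this model keeps state in process memory, so it does not survive a restart. That durability is exactly what the real background path adds.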
Infra to pre-provision
All already created by azd up (Foundations/Deploy), but verify:
- ACR exists and the team can push (or use ACR cloud build — no local Docker needed).
- Hosted-agent endpoints enabled on the project (azd ai agent works, same check as the Deploy challenge).
- App Insights wired: APPLICATIONINSIGHTS_CONNECTION_STRING present so background runs are traced.
- The Action Tools backend reachable from the hosted environment. Inside a container, localhost is the container itself, not the team’s machine. They must point ACTION_MCP_URL at a tunneled/Container-Apps URL, not localhost.
Flag for the coordinator: the localhost→container gap is the #1 deploy-time surprise. If the Action tool is local, the hosted workflow can’t reach it — plan a dev tunnel or deploy the backend too.
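A quick preflight, run in the environment the container will see, catches the two most common misconfigurations before a deploy. This is a minimal sketch. The env var names come from this guide, and the `preflight` and `is_loopback_url` helpers are hypothetical.

```python
import ipaddress
import os
from typing import List, Mapping, Optional
from urllib.parse import urlparse

LOOPBACK_NAMES = {"localhost", "localhost.localdomain"}


def is_loopback_url(url: str) -> bool:
    """True if the URL's host is localhost or a loopback IP, i.e. unreachable from a container."""
    host = (urlparse(url).hostname or "").lower()
    if host in LOOPBACK_NAMES or host.endswith(".localhost"):
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # an ordinary DNS name


def preflight(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems: List[str] = []
    if not env.get("APPLICATIONINSIGHTS_CONNECTION_STRING"):
        problems.append("APPLICATIONINSIGHTS_CONNECTION_STRING is unset: background runs will not trace")
    mcp_url = env.get("ACTION_MCP_URL", "")
    if not mcp_url:
        problems.append("ACTION_MCP_URL is unset")
    elif is_loopback_url(mcp_url):
        problems.append(f"ACTION_MCP_URL points at loopback ({mcp_url}): use a tunnel or Container Apps URL")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("PREFLIGHT:", problem)
```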
Search-Before-Implement
Two preview surfaces here: azd ai agent / agent.yaml (via foundry-hosted-agents) and the background-run API (background=True, submit/poll). Both are still changing. Send teams to microsoft-docs / foundry-mcp for current signatures rather than guessing.
Per-step facilitation
Step 1 — containerize the workflow
- This is the Deploy challenge applied to the workflow instead of the single agent. If they did Deploy, it’s mostly reuse. Pitfall: forgetting to include agent-framework in the container requirements.txt → the workflow won’t start in the image.
- Pitfall: ACTION_MCP_URL still pointing at localhost → the Action specialist fails remotely. Fix the URL.
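The first pitfall is cheap to catch before building the image. This minimal sketch assumes the image installs from a plain requirements.txt and that the MAF package is named agent-framework, as above. The `missing_requirements` and `declared_packages` helpers are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Packages the workflow image must install; "agent-framework" is the name used in this guide.
REQUIRED = ["agent-framework"]


def normalize(name: str) -> str:
    """PEP 503-style normalization: lowercase, and runs of -, _ and . become a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def declared_packages(requirements_text: str) -> Set[str]:
    """Extract normalized package names from the text of a requirements.txt file."""
    names: Set[str] = set()
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop comments
        if not line or line.startswith("-"):   # skip blanks and pip options such as -r or --pre
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def missing_requirements(requirements_text: str, required: Iterable[str] = REQUIRED) -> List[str]:
    """Return the required packages that the requirements text does not declare."""
    declared = declared_packages(requirements_text)
    return [pkg for pkg in required if normalize(pkg) not in declared]
```

Usage: call missing_requirements(Path("requirements.txt").read_text()) on the container’s requirements file before the image build, and fail the build if the list is non-empty.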
Step 2 — background run
- The teaching beat: the submit call must return immediately. If they’re blocking on completion, they haven’t actually used the background path; they’ve just deployed a slow synchronous agent.
- A good batch task: loop the Action sub-agent over a list of enrollment requests. Keep the list small for the demo (3–5) so it completes within the session but is visibly “a batch.”
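Both beats can be sketched in a few lines. The batch is a plain loop that records each item’s outcome, so one bad enrollment does not sink the run. A timing wrapper around the submit call flags a team that is still on the synchronous path.

Everything here is hypothetical:

- `enroll` stands in for a call to the Action sub-agent.
- The request fields are placeholders.
- The 2-second threshold is an arbitrary demo value, not a service limit.

In the real workflow, `run_enrollment_batch` is the work the hosted agent performs after submit; the caller never runs it inline.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EnrollmentRequest:
    # Hypothetical fields: use whatever your Action tools actually expect.
    student_id: str
    course_id: str


def check_submit_is_nonblocking(submit: Callable[[], object], max_seconds: float = 2.0) -> object:
    """Time a submit call and fail if it took long enough to suggest it waited for the work."""
    start = time.monotonic()
    handle = submit()
    elapsed = time.monotonic() - start
    if elapsed > max_seconds:
        raise AssertionError(f"submit took {elapsed:.1f}s: it is blocking on completion")
    return handle


def run_enrollment_batch(
    requests: List[EnrollmentRequest],
    enroll: Callable[[EnrollmentRequest], str],
    max_items: int = 5,
) -> Dict[str, List[dict]]:
    """Loop a (hypothetical) Action call over a small batch, recording each success or failure."""
    if len(requests) > max_items:
        raise ValueError(f"demo batch is capped at {max_items} items, got {len(requests)}")
    report: Dict[str, List[dict]] = {"succeeded": [], "failed": []}
    for req in requests:
        try:
            confirmation = enroll(req)
            report["succeeded"].append(
                {"student": req.student_id, "course": req.course_id, "confirmation": confirmation}
            )
        except Exception as exc:  # keep going; one failure should not end the batch
            report["failed"].append(
                {"student": req.student_id, "course": req.course_id, "error": str(exc)}
            )
    return report
```

Wrap whatever the team’s real submit call is in check_submit_is_nonblocking. If it trips, they have deployed a slow synchronous agent rather than used the background path.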
Step 3 — poll + trace
- The “close the tab, come back” demo: have them submit in one terminal, kill it, then poll from a fresh process with only the run id. Retrieving the result proves durability.
- App Insights closes the loop: the background run’s span tree (manager → specialists → actions) is the evidence. Reuse the KQL skills from the Tracing challenge: list spans by duration.
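For the fresh-process half of the demo, the second terminal needs only the run id and a way to fetch status. This minimal polling sketch uses capped exponential backoff. `fetch_status` is a hypothetical callable that you would wire to the real background-run retrieval call; check the current docs for its name. The terminal status names are also assumptions.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal statuses; confirm the real names in the background-run API docs.
TERMINAL = {"completed", "failed", "cancelled"}


def poll_until_terminal(
    run_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],
    initial_delay: float = 1.0,
    max_delay: float = 15.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a run by id until it reaches a terminal status, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        run = fetch_status(run_id)             # hypothetical shape: {"status": ..., "output": ...}
        if run.get("status") in TERMINAL:
            return run
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"run {run_id} still {run.get('status')!r} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)      # back off so long runs are not hammered
```

The sleep function is injectable so the loop can be exercised without waiting. If this returns the result in a process that never saw the submit, the work did not depend on the submitting process.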
Why no validate.py
The deliverables are a deployed endpoint, an async run that outlives a process, and App Insights traces — all portal/live state, not statically checkable. Verify via: hosted agent in the project with run history; an immediate-return submit; a fresh-process poll retrieving the result; spans in App Insights.
Common failure modes
| Symptom | Cause | Fix |
|---|---|---|
| Workflow won’t start in container | agent-framework missing from image | add to container requirements.txt |
| Action specialist fails when hosted | ACTION_MCP_URL = localhost | point at tunneled/Container-Apps URL |
| Submit blocks until done | not using background path | use background=True submit + poll |
| No spans for background run | App Insights conn string unset in container | pass APPLICATIONINSIGHTS_CONNECTION_STRING |
| Can’t retrieve result later | relying on in-memory state | persist the run id and poll by it, not via local vars |
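The last row’s fix in practice: write the run id somewhere durable at submit time and read it back later, instead of holding the handle in a variable. This is a minimal sketch using a local JSON file. The file name and the `save_run_id` / `load_run_id` helpers are hypothetical, and any durable store works.

```python
import json
from pathlib import Path
from typing import Dict

RUNS_FILE = Path("background_runs.json")  # hypothetical location


def save_run_id(label: str, run_id: str, path: Path = RUNS_FILE) -> None:
    """Record a run id under a human-friendly label so a later process can poll it."""
    runs: Dict[str, str] = json.loads(path.read_text()) if path.exists() else {}
    runs[label] = run_id
    path.write_text(json.dumps(runs, indent=2))


def load_run_id(label: str, path: Path = RUNS_FILE) -> str:
    """Read back a saved run id; raises KeyError if the label was never saved."""
    return json.loads(path.read_text())[label]
```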