Advanced: Tracing & Observability
Tier 2 · Advanced (modular). You can attempt this in any order with the other Advanced challenges. Prerequisite: the Foundations end-state (a deployed, grounded Northfield IQ Assistant). Complete Foundations, or run the bootstrap skip-path:
azd up && ./scripts/setup-foundations.sh && python scripts/validate-foundations.py
Why this challenge
Your Northfield IQ Assistant answers grounded questions today, but when a student gets a slow, wrong, or uncited answer, can you explain why? Right now the agent is a black box: a model call, a knowledge-base retrieval, and (if you did Action Tools) a tool call all happen inside one request, and you can see none of it.
In this challenge you make every answer observable end to end. You enable OpenTelemetry (OTel) GenAI tracing, export the spans to Application Insights, and then read the same data two ways: the Foundry portal Tracing tab and a KQL query in App Insights. By the end you can take a single student question and reconstruct its entire journey (model → retrieval → tool) with token counts, latency per span, and the inputs/outputs at each hop.
This is the observability layer that the Evaluation and Deploy challenges both build on: evals become trustworthy when you can trace the row that failed, and a hosted agent is only production-ready when you can watch it run.
student question
      |
      v
one request (operation_Id)
      |
      +--> agent span
             +--> model span      (gen_ai.usage.* tokens)
             +--> retrieval span  (knowledge-base query)
             +--> tool span       (if Action Tools attached)

Foundry portal (Tracing tab) <-- App Insights (OTel exporter) --> KQL
                                 (dependencies / traces / requests)
What you will need
- The Foundations .env (or bootstrap .env) with at least:
  - AZURE_AI_PROJECT_ENDPOINT: your Foundry project endpoint
  - AZURE_AI_MODEL_DEPLOYMENT_NAME: the chat model deployment
  - AZURE_FOUNDRY_AGENT_NAME: the Northfield IQ Assistant agent name (e.g. northfield-iq-assistant)
- An Application Insights resource linked to your Foundry project. Foundations provisions one; its connection string lands in .env as APPLICATIONINSIGHTS_CONNECTION_STRING. If that variable is missing, see Step 1: you will fetch it from the project.
- Packages from requirements.txt (already installed in the devcontainer): azure-ai-projects, azure-monitor-opentelemetry, azure-core-tracing-opentelemetry.
💡 Tracing data takes 1 to 3 minutes to land in App Insights after a run. Budget for that lag before you decide something is broken.
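One way to budget for that ingestion lag is to poll rather than check once. The following is a minimal sketch, not part of the challenge repo: `wait_for_spans`, its parameters, and the default timings are illustrative assumptions, and `check` is a hypothetical callable you supply (for example, a wrapper around a KQL query) that returns True once your span is visible.

```python
import time
from typing import Callable


def wait_for_spans(
    check: Callable[[], bool],
    timeout_s: float = 240.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or roughly `timeout_s` has been waited.

    `check` is a hypothetical callable: it should return True once the
    expected span is visible in App Insights. `sleep` is injectable so the
    helper can be exercised without real delays.
    """
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout_s:
            return False  # gave up: the lag budget is exhausted
        sleep(interval_s)
        waited += interval_s
```

The default timeout is deliberately a little longer than the 3-minute upper end of the lag, so a slow but healthy export is not reported as a failure.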
Step 1: Enable GenAI instrumentation
Goal: Your agent process emits OpenTelemetry GenAI spans and ships them to Application Insights.
Tasks:
- Confirm your project has an App Insights connection. If APPLICATIONINSIGHTS_CONNECTION_STRING is not already in .env, fetch it from the project and add it (the portal shows it under Monitoring → Application analytics, or read it via the SDK as shown below).
- Create challenges/advanced-tracing-observability/trace_setup.py that wires tracing in the exact order below. The two os.environ[...] lines MUST run before any azure.ai.* import; this is the single most common mistake in this challenge (see the gotcha box).
- Run the file. It should print that instrumentation is enabled and the App Insights connection was resolved, without emitting spans yet.
# trace_setup.py
import os

# 1. Tracing flags MUST be set BEFORE importing the Azure AI SDK.
#    The instrumentation reads these at import time. Set them first or message
#    content (prompts/answers) will silently never appear on your spans.
os.environ["AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING"] = "true"
os.environ["OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"] = "true"

# 2. NOW import the SDK.
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
from azure.monitor.opentelemetry import configure_azure_monitor


def enable_tracing() -> AIProjectClient:
    project = AIProjectClient(
        endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"],
        credential=DefaultAzureCredential(),
    )

    # Resolve the App Insights connection string (env first, then ask the project).
    conn = os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING")
    if not conn:
        conn = project.telemetry.get_application_insights_connection_string()
        os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"] = conn

    # Route all OpenTelemetry spans to Application Insights.
    configure_azure_monitor(connection_string=conn)

    # Turn on GenAI semantic-convention spans for model / retrieval / tool calls.
    from azure.ai.projects.telemetry import AIProjectInstrumentor
    AIProjectInstrumentor().instrument()

    print("✅ GenAI tracing enabled; spans will export to Application Insights.")
    return project


if __name__ == "__main__":
    enable_tracing()
⚠️ The "set env before import" gotcha. If you put the two os.environ[...] assignments after from azure.ai... import ..., the instrumentation has already initialized and will ignore them. Symptom: spans appear but every prompt/response field is empty, or no GenAI spans show up at all. Keep the env lines at the very top of the file, above all Azure imports.
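The ordering rule can be checked mechanically before you run anything. The sketch below is a hedged illustration using only the standard library `ast` module; `flags_set_before_azure_imports` and its helpers are hypothetical names, not part of validate.py. It assumes Python 3.9+ and inspects top-level statements only, so an import placed inside a function body is ignored.

```python
import ast
from typing import Optional

FLAGS = {
    "AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING",
    "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT",
}


def _flag_name(node: ast.stmt) -> Optional[str]:
    """Return the flag name if `node` is `os.environ["FLAG"] = ...`, else None."""
    if not isinstance(node, ast.Assign) or len(node.targets) != 1:
        return None
    target = node.targets[0]
    if not isinstance(target, ast.Subscript):
        return None
    if ast.unparse(target.value) != "os.environ":
        return None
    key = target.slice  # a plain Constant node on Python 3.9+
    if isinstance(key, ast.Constant) and key.value in FLAGS:
        return key.value
    return None


def _imports_azure_ai(node: ast.stmt) -> bool:
    """True for `import azure.ai...` or `from azure.ai... import ...`."""
    if isinstance(node, ast.ImportFrom):
        names = [node.module or ""]
    elif isinstance(node, ast.Import):
        names = [alias.name for alias in node.names]
    else:
        return False
    return any(n == "azure.ai" or n.startswith("azure.ai.") for n in names)


def flags_set_before_azure_imports(source: str) -> bool:
    """Both flags must be assigned before the first top-level azure.ai import."""
    seen = set()
    for node in ast.parse(source).body:
        name = _flag_name(node)
        if name:
            seen.add(name)
        elif _imports_azure_ai(node) and seen != FLAGS:
            return False  # an azure.ai import ran before both flags were set
    return seen == FLAGS
```

To use it, read trace_setup.py into a string and pass it in; the source is only parsed, never executed, so the check works even without the Azure packages installed.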
Success Criteria:
- APPLICATIONINSIGHTS_CONNECTION_STRING is present in .env (or resolved at runtime).
- Running trace_setup.py prints ✅ GenAI tracing enabled with no import error.
- The two tracing env flags are set above every azure.ai.* import in the file.
Checkpoint:
python validate.py --step 1
# expected: "✅ Step 1 PASS" (instrumentation wired, App Insights connection resolved)
Coach note: see solution.md.
Step 2: Run the agent and emit spans
Goal: A real student question flows through the agent and produces a trace (model + retrieval, plus a tool span if Action Tools is attached).
Tasks:
- Create challenges/advanced-tracing-observability/traced_run.py. Import and call enable_tracing() from Step 1 first, then ask the Northfield IQ Assistant a grounded question that forces a knowledge-base lookup, e.g. "What documents do I need for financial aid?"
- Drive the agent through the Responses API against your AZURE_FOUNDRY_AGENT_NAME. Print the answer and the trace/operation id so you can find the run later.
- Run it. Then wait 1 to 3 minutes for the spans to land in App Insights.
# traced_run.py
import os

from trace_setup import enable_tracing  # importing this runs the env-first setup

project = enable_tracing()
client = project.get_openai_client()

QUESTION = "What documents do I need to apply for financial aid at Northfield?"

response = client.responses.create(
    input=QUESTION,
    extra_body={
        "agent_reference": {
            "name": os.environ["AZURE_FOUNDRY_AGENT_NAME"],
            "type": "agent_reference",
        }
    },
)

print("Q:", QUESTION)
print("A:", response.output_text)
print("response id:", response.id)  # use this to locate the trace
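The response id is not the same value as the operation_Id you filter on in Step 4. With the Azure Monitor OpenTelemetry exporter, operation_Id is the OpenTelemetry trace id rendered as 32 lowercase hex characters. A small sketch of that conversion follows; `to_operation_id` is a hypothetical helper, and the commented usage assumes the opentelemetry-api package plus a wrapper span named "student-question", which is an illustrative name.

```python
def to_operation_id(trace_id: int) -> str:
    """Render a 128-bit OpenTelemetry trace id as it appears in operation_Id."""
    if not 0 < trace_id < 2**128:
        raise ValueError("trace_id must be a non-zero 128-bit integer")
    return format(trace_id, "032x")  # zero-padded, lowercase hex


# Usage sketch (assumption: run after enable_tracing(), with opentelemetry-api
# installed; wrap the agent call so the id belongs to this request):
#
#   from opentelemetry import trace
#   tracer = trace.get_tracer(__name__)
#   with tracer.start_as_current_span("student-question") as span:
#       response = client.responses.create(...)
#       print("operation_Id:", to_operation_id(span.get_span_context().trace_id))
```

Printing this value alongside the response id gives you something to paste directly into the KQL placeholder later.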
Success Criteria:
- The script prints a grounded answer (the financial-aid answer cites the FAQ corpus).
- The run completes without a tracing/auth error.
- Within ~3 minutes, at least one new GenAI span is visible in App Insights (you confirm this in Step 3).
Checkpoint:
python challenges/advanced-tracing-observability/traced_run.py
python validate.py --step 2
# expected: "✅ Step 2 PASS" (agent run emitted >=1 GenAI span to App Insights)
Coach note: see solution.md.
Step 3: Inspect the spans (portal Tracing tab)
Goal: You can read a single run as a span tree in the Foundry Tracing tab and identify the model, retrieval, and (if present) tool spans.
Tasks:
- In the Foundry portal, open your project → Tracing (under Observability / Monitoring). Find the trace for the run you just made (sort by most recent; match the timestamp).
- Expand the trace into its span tree. Identify and note:
  - the model span: look for the gen_ai.usage.* token attributes,
  - the retrieval span: the knowledge-base / AI Search query and the documents returned,
  - the tool span: only present if you completed the Action Tools challenge.
- Open the model span and record: input tokens, output tokens, total tokens, and span duration (latency). Note where the prompt and the generated answer appear as attributes (this only works because you set OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true in Step 1).
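The span tree the portal draws can also be rebuilt from exported rows, which helps when you want to compare a run against what the Tracing tab shows. This is a rough sketch under stated assumptions: the rows are dicts carrying the App Insights columns id, operation_ParentId, name, duration, and timestamp; `render_span_tree` is a hypothetical helper; and SAMPLE_ROWS is made-up data, not output from a real run.

```python
from collections import defaultdict

# Made-up rows for illustration only; real rows come from an App Insights export.
SAMPLE_ROWS = [
    {"id": "c3", "operation_ParentId": "a1", "name": "retrieval",
     "duration": 420, "timestamp": "2025-01-01T10:00:01.300Z"},
    {"id": "a1", "operation_ParentId": "root0", "name": "agent",
     "duration": 1840, "timestamp": "2025-01-01T10:00:00.000Z"},
    {"id": "b2", "operation_ParentId": "a1", "name": "chat",
     "duration": 1210, "timestamp": "2025-01-01T10:00:00.050Z"},
]


def render_span_tree(rows):
    """Return indented lines, one per span, siblings ordered by timestamp."""
    ids = {r["id"] for r in rows}
    children = defaultdict(list)
    for r in rows:
        parent = r.get("operation_ParentId")
        # A span whose parent is not in the result set is treated as a root.
        children[parent if parent in ids else None].append(r)

    lines = []

    def walk(parent_id, depth):
        for r in sorted(children[parent_id], key=lambda row: row["timestamp"]):
            lines.append(f'{"  " * depth}{r["name"]} ({r["duration"]} ms)')
            walk(r["id"], depth + 1)

    walk(None, 0)
    return lines


for line in render_span_tree(SAMPLE_ROWS):
    print(line)
```

With the sample rows, the agent span prints first and the chat and retrieval spans are indented beneath it, mirroring the parent/child layout described above.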
Success Criteria:
- The Tracing tab shows your run as a parent span with at least one child span.
- You can name which child is the model call and which is the retrieval call.
- You can read token counts and duration off the model span.
Checkpoint: Portal state: the Tracing tab shows the run's span tree with an expandable model span exposing gen_ai.usage.total_tokens. Capture the trace's operation id for Step 4.
python validate.py --step 3
# expected: "✅ Step 3 PASS" (span tree present with model + retrieval spans)
Coach note: see solution.md.
Step 4: Correlate one question end-to-end with KQL
Goal: Read the same run as data. Write a KQL query that pulls every span for one student question and surfaces token, latency, and cost signals.
Tasks:
- In App Insights → Logs, begin with this starter query to list recent GenAI spans. Run it, then copy the operation_Id of your financial-aid run:

    // Starter: recent GenAI spans across model / retrieval / tool tiers
    dependencies
    | where timestamp > ago(1h)
    | where customDimensions has "gen_ai"
        or name has_any ("chat", "responses", "retrieval", "tool", "agent")
    | project timestamp, operation_Id, name,
        duration_ms = duration,
        total_tokens = toint(customDimensions["gen_ai.usage.total_tokens"])
    | order by timestamp desc

- Pivot to a single end-to-end correlation: replace the placeholder with your operation_Id to reconstruct the whole request as an ordered timeline (model + retrieval + tool):

    // End-to-end: one student question, every span, in order
    let opId = "<paste-your-operation_Id>";
    union dependencies, requests, traces
    | where operation_Id == opId
    | project timestamp, itemType, span = name,
        duration_ms = duration,
        input_tokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
        output_tokens = toint(customDimensions["gen_ai.usage.output_tokens"]),
        total_tokens = toint(customDimensions["gen_ai.usage.total_tokens"])
    | order by timestamp asc

- Extend the correlation query to surface token, latency, and cost in one summary row. Compute cost from total tokens using your model's per-1K-token rate (pick the rate for your deployed model; the exact number is not what's graded, the calculation is). Note that total_latency_ms sums span durations, so nested spans are counted more than once and the figure can exceed wall-clock time:

    let opId = "<paste-your-operation_Id>";
    let price_per_1k = 0.005; // <-- your model's $ / 1K tokens
    dependencies
    | where operation_Id == opId
    | summarize
        total_tokens = sum(toint(customDimensions["gen_ai.usage.total_tokens"])),
        total_latency_ms = sum(duration),
        span_count = count()
    | extend est_cost_usd = round(total_tokens / 1000.0 * price_per_1k, 6)
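The cost arithmetic in the summary query can be checked by hand. The sketch below mirrors the same formula in Python; the function names are hypothetical, the 0.005 rate is the placeholder from the query rather than a real price, and the split-rate variant assumes your model bills input and output tokens at different rates, which you should confirm against your own pricing.

```python
def estimate_cost_usd(total_tokens: int, price_per_1k: float) -> float:
    """Same formula as the KQL: tokens / 1000 * rate, rounded to 6 decimals."""
    return round(total_tokens / 1000.0 * price_per_1k, 6)


def estimate_split_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Variant for models billed at separate input and output rates (assumed)."""
    cost = (
        input_tokens / 1000.0 * input_price_per_1k
        + output_tokens / 1000.0 * output_price_per_1k
    )
    return round(cost, 6)


# Worked example with made-up numbers: 1,840 tokens at $0.005 per 1K tokens.
# 1840 / 1000 = 1.84, and 1.84 * 0.005 = 0.0092 USD.
print(estimate_cost_usd(1840, 0.005))  # → 0.0092
```

If the figure from KQL and the figure from this helper disagree for the same token count, the usual culprit is a rate expressed per 1M tokens rather than per 1K.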
Success Criteria:
- The starter query returns ≥1 row for your run.
- The correlation query returns all spans for one operation_Id in timestamp order (you can see model and retrieval, plus tool if attached).
- The summary query outputs total_tokens, total_latency_ms, and an est_cost_usd for the run.
- You save your final correlation query to challenges/advanced-tracing-observability/correlate.kql.
Checkpoint:
python validate.py --step 4
# expected: "✅ Step 4 PASS" (correlate.kql present and returns end-to-end spans for one run)
Coach note: see solution.md.
Done: what you can now do
- Every Northfield IQ answer is observable end to end, two ways: portal Tracing tab and KQL.
- You can take one student question and account for its model, retrieval, and tool spans, with tokens, latency, and an estimated cost.
This unlocks: Evaluation & Red Teaming (trace the exact row that failed an eval) and Deploy as a Hosted Agent (the same tracing follows the agent to its live endpoint).
Stretch goals
- Pin your three queries into an Azure Workbook so the team has a live observability dashboard.
- Add a gen_ai.response.model breakdown to compare token/latency profiles across models.
- Wire an App Insights alert that fires when p95 latency or total tokens per run crosses a threshold.
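Before wiring the p95 alert, it can help to sanity-check a threshold offline against a handful of exported durations. The following is a minimal nearest-rank percentile sketch; `percentile_nearest_rank` and `breaches` are hypothetical helpers, the latencies are made-up values, and KQL's own percentile functions may use a different estimation method, so small differences from this result are expected.

```python
import math


def percentile_nearest_rank(values, p: int):
    """Nearest-rank percentile: the value at rank ceil(p/100 * n), 1-indexed."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def breaches(values, p: int, threshold_ms: float) -> bool:
    """True when the p-th percentile latency is above the alert threshold."""
    return percentile_nearest_rank(values, p) > threshold_ms


# Made-up per-run latencies in milliseconds, for illustration only.
latencies_ms = [120, 340, 410, 980, 1500, 2200, 310, 450, 505, 760]
print(percentile_nearest_rank(latencies_ms, 95))  # → 2200
print(breaches(latencies_ms, 95, threshold_ms=2000))  # → True
```

With only ten samples the 95th percentile is simply the slowest run, which is a useful reminder to collect enough runs before trusting an alert threshold.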