Monitoring Flagger Canary Deployments on Kubernetes: Grafana Dashboard + AlertManager Slack Notification Guide
At 2 AM, the canary deployment was automatically rolled back. No one on the team knew. It wasn't until we arrived at work at 9 AM that we dug through Flagger event logs to trace the cause of the rollback — during those seven hours, some of the traffic had been returning incorrect responses. Flagger automates the canary analysis loop, but without a pipeline to deliver the results immediately to the team, it is only a half-baked tool.
This article covers how to build a structure that enables on-call personnel to detect and respond to canary analysis failures within 30 seconds by directly connecting the observation stack from Flagger → Prometheus → Grafana/AlertManager → Slack. While the target audience is teams already operating Flagger, prerequisites are specified first so that teams configuring the entire stack for the first time can also follow along.
Prerequisites: a Kubernetes cluster, Helm 3, Prometheus, Grafana, AlertManager, Flagger, and Istio (or an NGINX/Traefik ingress) must be installed. To install Prometheus, Grafana, and AlertManager in one step, see the kube-prometheus-stack Helm chart; for Istio, see the official installation guide.
Key Concepts
Overall Data Flow — Start with the Big Picture
Flagger ──(records metrics)──▶ Prometheus
                                  │
                 ┌────────────────┴────────────────┐
                 ▼                                 ▼
             Grafana                         AlertManager
     (real-time dashboard)           (alert grouping & routing)
                                                   │
                                                   ▼
                                   Slack (#deployments-alert)
                              (whole team aware within 30 seconds)

Keep this flow in mind before diving into the detailed settings: Grafana owns visualization, while AlertManager owns notification policy — the two roles are cleanly separated.
Flagger's Canary Analysis Loop
Flagger splits the Deployment into two services: Primary (stable version) and Canary (new version). It evaluates Prometheus metrics at configured intervals to determine whether to increment traffic weights or roll back. Istio's VirtualService converts these weights into actual traffic splitting — the structure is such that when Flagger updates the weight field of VirtualService at intervals, the Istio sidecar reflects this to change the actual packet routing.
Primary (90%) ──▶ real user traffic
Canary  (10%) ──▶ gradually increased (stepWeight: 10)
        └▶ rolled back to 0% immediately if analysis fails

Flagger exposes two key metrics to Prometheus.
| Metric | Meaning |
|---|---|
| `flagger_canary_status` | Canary status code: 0 = Reset, 1 = In Progress, 2 = Success (promoted), 3 = Failure (rolled back) |
| `flagger_canary_weight` | Current share of traffic routed to the canary (%) |
Important: flagger_canary_status values may differ between Flagger versions. The mapping above is based on the official source code; it is recommended to verify the actual metric values directly against the Flagger metrics endpoint (e.g. via kubectl port-forward) before writing alert rules.
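That verification step can be sketched as follows. The port-forward target (`deploy/flagger` on port 8080) and the sample metric lines are assumptions to adjust for your install; the parsing runs against a canned snippet so the logic is reproducible without a cluster:

```shell
# In a real cluster you would first expose Flagger's metrics endpoint:
#   kubectl -n flagger-system port-forward deploy/flagger 8080:8080 &
#   curl -s localhost:8080/metrics > /tmp/flagger-metrics.txt
# Hypothetical sample output so the parsing step below is reproducible:
cat > /tmp/flagger-metrics.txt <<'EOF'
flagger_canary_status{name="my-app",namespace="default"} 3
flagger_canary_weight{name="my-app",namespace="default"} 0
EOF

# Extract the numeric status code for my-app (3 = failure/rollback in this sample)
status=$(grep 'flagger_canary_status{name="my-app"' /tmp/flagger-metrics.txt | awk '{print $2}')
echo "canary status: $status"
```

If the value you see for a rolled-back canary is not 3, use the observed value in your alert rule instead.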
For Prometheus to scrape this metric, a ServiceMonitor for the Flagger Pod is required. It is created automatically when the Flagger Helm chart is installed with --set serviceMonitor.enabled=true, or it can be applied manually as shown below.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: flagger
  namespace: flagger-system
  labels:
    release: kube-prometheus-stack  # adjust to match the Prometheus Operator's selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: flagger
  namespaceSelector:
    matchNames:
      - flagger-system
  endpoints:
    - port: http
      path: /metrics
      interval: 15s

Without this ServiceMonitor, Flagger never appears in the Prometheus target list, so the flagger_canary_status metric cannot be queried at all. If the Grafana dashboard shows no data, this is the first thing to check.
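That first check can be done against the Prometheus targets API. A sketch under assumed names — the service `prometheus-operated` in the `monitoring` namespace and the job label `flagger` are assumptions; the sample response below is canned so the check itself runs anywhere:

```shell
# In a real cluster:
#   kubectl -n monitoring port-forward svc/prometheus-operated 9090:9090 &
#   curl -s 'localhost:9090/api/v1/targets?state=active' > /tmp/targets.json
# Hypothetical sample response (trimmed) so the check is reproducible:
cat > /tmp/targets.json <<'EOF'
{"status":"success","data":{"activeTargets":[{"labels":{"job":"flagger","namespace":"flagger-system"},"health":"up"}]}}
EOF

# If this prints nothing, the ServiceMonitor was not picked up
grep -q '"job":"flagger"' /tmp/targets.json && echo "flagger target registered"
```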
Canary Deployment: a deployment strategy that exposes a new version to a small share of users first and increases traffic gradually only while metrics such as error rate and response time stay within thresholds. The name derives from the "canary in a coal mine" — an early warning for danger.
AlertManager: the notification-routing hub of the Prometheus ecosystem. It groups recurring alerts, silences alerts during chosen time windows, and routes alerts to different channels based on conditions.
Practical Application
Example 1: Visualizing Canary Status with Grafana Dashboard
Step 1 — Install Grafana for Flagger
helm upgrade -i flagger-grafana flagger/grafana \
--create-namespace \
--namespace=flagger-system \
  --set url=http://prometheus:9090

Step 2 — Import the Official Dashboard (ID: 15158)
Grafana UI → Dashboards → Import → Enter ID 15158 → Select Prometheus data source.
If there is no data: check, in order, that the Prometheus data source URL is correct (use the in-namespace service name, e.g. http://prometheus-operated:9090) and that the ServiceMonitor has been applied.
Step 3 — Core PromQL Queries
# Current canary status (3 = failed / rolled back)
flagger_canary_status{name="my-app", namespace="default"}

# Current traffic weight (%)
flagger_canary_weight{name="my-app", namespace="default"}

# Request success rate (excluding 5xx)
sum(rate(http_requests_total{status!~"5.."}[5m]))
  / sum(rate(http_requests_total[5m])) * 100

# Elapsed canary analysis time (seconds)
flagger_canary_duration_seconds{name="my-app"}

Example 2: Prometheus Alert Rule — Rollback Detection
Apply the YAML below as the PrometheusRule CRD, or add it to additionalPrometheusRulesMap in Helm values.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: flagger-alerts
  namespace: flagger-system
  labels:
    release: kube-prometheus-stack  # adjust to match the Prometheus Operator's selector
spec:
  groups:
    - name: flagger.rules
      rules:
        - alert: CanaryRollback
          expr: flagger_canary_status == 3
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: "Canary rollback occurred: {{ $labels.name }}"
            description: >
              Canary analysis for {{ $labels.name }} in namespace
              {{ $labels.namespace }} failed and triggered a rollback
        - alert: CanaryProgressing
          expr: flagger_canary_status == 1
          for: 1m
          labels:
            severity: info
          annotations:
            summary: "Canary deployment in progress: {{ $labels.name }}"
        - alert: CanaryWeightHigh
          expr: flagger_canary_weight > 50
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: "Canary traffic above 50%: {{ $labels.name }} ({{ $value }}%)"

Apply with kubectl apply -f flagger-alerts.yaml. If you use kube-prometheus-stack, the Prometheus Operator picks up the PrometheusRule automatically.
Example 3: AlertManager → Slack Routing (Recommended)
Issuing Slack Incoming Webhook URL: Slack Workspace → Apps → Incoming WebHooks → Add → Select Channel → Copy Webhook URL.
There are two ways to apply AlertManager settings to Kubernetes.
Method A — Apply with kube-prometheus-stack Helm values (Recommended)
# values.yaml
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'namespace']
      group_wait: 10s
      group_interval: 10m
      repeat_interval: 1h
      receiver: 'slack-default'
      routes:
        - match:
            alertname: CanaryRollback
            severity: critical
          receiver: 'slack-canary-rollback'
          continue: false
    receivers:
      - name: 'slack-canary-rollback'
        slack_configs:
          - api_url: 'https://hooks.slack.com/services/T.../B.../XXXX'
            channel: '#deployments-alert'
            send_resolved: true
            title: ':rotating_light: Canary rollback occurred'
            text: |
              *App:* {{ .CommonLabels.name }}
              *Namespace:* {{ .GroupLabels.namespace }}
              *Status:* {{ .CommonAnnotations.description }}
            color: 'danger'
      - name: 'slack-default'
        slack_configs:
          - api_url: 'https://hooks.slack.com/services/T.../B.../XXXX'
            channel: '#deployments'
            send_resolved: true
            title: '{{ .GroupLabels.alertname }}'
            text: '{{ .CommonAnnotations.summary }}'

helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  -f values.yaml

Method B — Apply the AlertManager Secret Directly
kubectl create secret generic alertmanager-kube-prometheus-stack-alertmanager \
--from-file=alertmanager.yaml=./alertmanager.yaml \
--namespace monitoring \
  --dry-run=client -o yaml | kubectl apply -f -

For security, it is better to specify api_url separately per receiver. Because global.slack_api_url shares one webhook URL across every receiver, it becomes hard to manage once each channel uses its own webhook.
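Before relying on either method, it helps to confirm the webhook itself works. A minimal sketch — the URL below is a placeholder to replace with the one copied from Slack, and the actual POST is left commented out so nothing is sent accidentally:

```shell
# Placeholder webhook URL — substitute your real Incoming Webhook URL
WEBHOOK_URL='https://hooks.slack.com/services/T.../B.../XXXX'

# Slack Incoming Webhooks accept a JSON payload with a "text" field
PAYLOAD='{"text":":white_check_mark: AlertManager webhook connectivity test"}'

# Uncomment to actually post; a working webhook responds with the body "ok"
# curl -s -X POST -H 'Content-type: application/json' --data "$PAYLOAD" "$WEBHOOK_URL"
echo "$PAYLOAD"
```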
Example 4: Directly Integrating Flagger with Slack Without AlertManager (Simple Alternative)
This example is an alternative to Example 3. It is used for small teams that do not operate AlertManager or when only simple notifications are needed. If you set up Example 3 and Example 4 simultaneously, two Slack messages will be sent at once, so you must choose only one.
Step 1 — Create AlertProvider CRD
Save the Slack Webhook URL as a Secret, then create AlertProvider.
kubectl create secret generic slack-webhook-secret \
  --from-literal=address='https://hooks.slack.com/services/T.../B.../XXXX' \
  --namespace flagger-system

apiVersion: flagger.app/v1beta1
kind: AlertProvider
metadata:
  name: slack-provider
  namespace: flagger-system
spec:
  type: slack
  channel: deployments
  secretRef:
    name: slack-webhook-secret

Step 2 — Connect Notifications to the Canary CRD
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 30s
    alerts:
      - name: "rollback-alert"
        severity: error
        providerRef:
          name: slack-provider
          namespace: flagger-system

With severity: error, notifications fire only on failure events. Change it to info to receive all events — deployment start, completion, and rollback.
Pros and Cons Analysis
Advantages
| Item | Content |
|---|---|
| Automatic Rollback | Minimize downtime with unattended rollback when metric thresholds are exceeded |
| Gradual Risk Exposure | Validate with a small number of users first using stepWeight, then scale |
| Multiple Metric Providers | Integrate Prometheus, Datadog, and CloudWatch via MetricTemplate CRD |
| GitOps Friendly | Native Integration with Flux/Argo CD |
| Multi-channel Notifications | Supports Slack, Teams, Discord, and Rocket.Chat |
Disadvantages and Precautions
| Item | Content | Response Plan |
|---|---|---|
| Prometheus Dependency | The analysis loop cannot operate without Prometheus | Install kube-prometheus-stack via Helm in one step |
| Minimum Traffic Requirement | Metrics are unreliable when traffic is too low | Generate artificial traffic during the analysis window with flagger-loadtester |
| Short-term Data Retention | The Flagger-bundled Prometheus keeps data for only 2 hours by default | Extend long-term retention with Thanos or Cortex |
| Risk of Duplicate Notifications | Configuring Example 3 and Example 4 together sends duplicate notifications | Use only one of them, or suppress with inhibit_rules |
| Service Mesh Required | Traffic splitting requires Istio, Linkerd, or an ingress controller | Can be configured without a mesh using NGINX or Traefik |
| RBAC Permissions | ServiceAccount permission error occurs depending on the cluster | Check rbac.create: true in Flagger Helm chart |
The Most Common Mistakes in Practice
- Writing alert rules without verifying the `flagger_canary_status` values — status codes can vary by Flagger version. After deployment, access Flagger's metrics endpoint directly via `kubectl port-forward` and confirm the actual values before putting them in an alert rule. With a wrong value, no notification arrives even when a rollback occurs.
- Configuring Example 3 and Example 4 together — on a rollback, two Slack messages arrive at once. Suppress one with `inhibit_rules` or use only one of the two.
- Passing canary analysis with no traffic — if there is no traffic during an off-hours deployment, the success rate reads 100% and the analysis passes. Generate artificial traffic during the analysis window with `flagger-loadtester`, or wire a load test into the analysis loop via the `webhooks` configuration.
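The no-traffic pitfall is commonly addressed with Flagger's webhooks mechanism, which triggers a load generator during each analysis interval. A sketch under assumed names — flagger-loadtester installed in the `test` namespace and a canary service `my-app-canary` in `production` on port 80; adjust both to your environment:

```yaml
# Added under spec.analysis of the Canary resource
webhooks:
  - name: load-test
    url: http://flagger-loadtester.test/
    timeout: 5s
    metadata:
      # hey sends ~10 req/s for 1 minute at the canary service,
      # giving the success-rate and duration metrics real samples
      cmd: "hey -z 1m -q 10 -c 2 http://my-app-canary.production:80/"
```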
In Conclusion
Flagger Canary Analysis is truly complete when an on-call manager receives a 4 AM deployment failure via Slack within 30 seconds and can track the cause by viewing traffic weights and success rate graphs on a Grafana dashboard.
3 Steps to Start Right Now:
- Install Grafana with `helm upgrade -i flagger-grafana flagger/grafana --create-namespace --namespace=flagger-system --set url=http://prometheus:9090` and import dashboard ID `15158`. If there is no data, first check the Prometheus URL and whether the `ServiceMonitor` has been applied.
- Apply a `PrometheusRule` with a `CanaryRollback` alert on the `flagger_canary_status == 3` condition, set to `for: 0m` so a notification fires immediately on rollback. Verify the actual metric values before applying.
- Add the `slack-canary-rollback` receiver to the AlertManager values, route it to the `#deployments-alert` channel, and apply with `helm upgrade`.
Next Post: Integrating external APM metrics such as Datadog and New Relic into canary analysis using the Flagger MetricTemplate CRD