Simplifying Canary Deployment with a Single Flagger CRD: From KEDA ScaledObject Separation Issues to Argo CD ApplicationSet Multicluster MCP Server Automation
Target Audience for this Article: Backend developers or SREs who have run Kubernetes in a production environment. This article assumes experience deploying with Helm and familiarity with CRD concepts.
If you have ever configured a canary deployment in Kubernetes, you have likely experienced a situation like this at least once: you attached autoscaling with a KEDA ScaledObject and added metric analysis with an Argo Rollouts AnalysisTemplate, but when the deployment fails, you cannot immediately tell which CRD is at fault. That ambiguous window, when the ScaledObject has incorrectly scaled up the canary pods but the AnalysisTemplate has not yet triggered a rollback, is exactly where the downtime comes from.
This article explains how Flagger's single Canary CRD resolves the root cause of the problem: the autoscaler and the canary analysis controller live in separate CRDs and manage state independently. Going a step further, it covers a practical setup for canary-deploying a Model Context Protocol (MCP) server, the bridge between AI assistants and Kubernetes clusters, across multiple clusters with Argo CD ApplicationSet.
The key is simple: having a single object own the state allows you to escape distributed debugging hell.
Key Concepts
The Real Pain of the Split Between KEDA and AnalysisTemplate
First, let me clear up a misunderstanding. KEDA and Flagger are not interchangeable. KEDA is an event-based autoscaler, while Flagger is a progressive delivery controller. The two tools have different roles. So why do problems arise when they are used together?
Imagine the following objects being alive simultaneously during a canary deployment:
primary-deployment    ← Deployment (main)
canary-deployment     ← Deployment (canary)
primary-scaledobject  ← KEDA ScaledObject (for primary)
canary-scaledobject   ← KEDA ScaledObject (for canary)
canary-analysis       ← AnalysisTemplate (Argo Rollouts)
canary-analysisrun    ← AnalysisRun (analysis in progress)

These objects are unaware of each other's existence. While KEDA sees the queue depth and scales the canary up to 10 pods, the AnalysisRun may decide to roll back because the error-rate threshold has been exceeded. During the ten seconds in between, traffic keeps flowing into canary pods that are spitting out errors. After the rollback, the canary-scaledobject is left behind, and if it is not deleted it will collide with the next deployment.
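To make the conflict concrete, here is a minimal sketch of the orphaned object from the list above. The trigger query and threshold are assumed values for illustration, not from the source.

```yaml
# Illustrative sketch of the leftover canary ScaledObject after a rollback.
# The trigger query and threshold here are assumptions for illustration.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: canary-scaledobject
spec:
  scaleTargetRef:
    name: canary-deployment   # still points at the (rolled-back) canary
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(request_queue_depth)   # assumed metric
        threshold: "10"
```

Unless someone deletes this by hand, the next deployment's controller and this ScaledObject both claim canary-deployment, which is exactly the conflict described above.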
Flagger Canary CRD: Single Ownership Model
Flagger's approach is simple: concentrate ownership of all child resources in a single Canary CRD.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: mcp-server
  namespace: ai-platform
spec:
  # Target Deployment (the original is never modified)
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  # Delegate management of the KEDA ScaledObject to Flagger
  autoscalerRef:
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    name: mcp-server
    primaryScalerQueries:
      # primary: queue-depth-based scaling
      queueLength: "sum(keda_scaler_metrics_value{scaledObject='mcp-server-primary'})"
    canaryScalerQueries:
      # canary: conservative scaling with a lower threshold
      queueLength: "sum(keda_scaler_metrics_value{scaledObject='mcp-server-canary'})"
  # Service mesh traffic control
  service:
    port: 8080
    targetPort: 8080
    gateways:
      - public-gateway.istio-system.svc.cluster.local
  # Canary analysis (no AnalysisTemplate needed)
  analysis:
    interval: 1m
    threshold: 5 # roll back after 5 failed checks
    maxWeight: 50 # send at most 50% of traffic to the canary
    stepWeight: 10 # increase in steps of 10%
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500 # within 500 ms
        interval: 1m
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://mcp-server-canary.ai-platform/"

Key operating principle: Flagger watches this Canary object and automatically creates the mcp-server-primary Deployment, the mcp-server-canary Deployment, two ClusterIP Services, and a clone of the KEDA ScaledObject. The original Deployment is never touched.
When scaling of a KEDA ScaledObject needs to be paused while a canary analysis is in progress, Flagger automatically adds the following annotation:
# Managed automatically by Flagger; do not edit by hand
metadata:
  annotations:
    autoscaling.keda.sh/paused-replicas: "0"

This one-line annotation stops KEDA from scaling up; once the canary analysis completes, Flagger removes it and scaling resumes. Because Flagger holds the initiative over state changes, the two controllers never conflict.
Argo CD ApplicationSet: Declarative Multicluster Deployment
ApplicationSet is an Argo CD resource that stamps out one templated Application object per target cluster. The key is the generator concept.
| Generator | Usage Scenario | When to Choose |
|---|---|---|
| clusters | All (or label-filtered) clusters registered in Argo CD | When the number of clusters changes dynamically |
| git | Driven by a Git directory structure or JSON files | When cluster-specific settings live as files |
| list | A statically defined list of clusters | When the targets are fixed and few |
| matrix | The combination of two generators (e.g., cluster × environment) | When a complex deployment matrix is required |
The clusters generator suits a workload like the MCP server, where the same configuration must go to every team or region cluster and the rollout should stay automated as the fleet grows.
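For contrast with the clusters generator used below, here is a minimal git directory generator sketch. The repository layout (one clusters/&lt;name&gt;/ directory per cluster) and the repository URL are hypothetical.

```yaml
# Hypothetical repo layout: clusters/prod-kr/, clusters/prod-jp/, ...
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: mcp-server-git
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/my-org/mcp-server-helm
        revision: HEAD
        directories:
          - path: clusters/*   # one Application per matching directory
  template:
    metadata:
      name: "mcp-server-{{path.basename}}"   # e.g. mcp-server-prod-kr
    spec:
      project: ai-platform
      source:
        repoURL: https://github.com/my-org/mcp-server-helm
        targetRevision: HEAD
        path: "{{path}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: ai-platform
```

Each matching directory produces one Application, so cluster-specific settings can live as plain files in Git.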
Practical Application
Example 1: Flagger + KEDA ScaledObject Integrated Canary Deployment
First, assume you have an existing KEDA ScaledObject:
# Existing ScaledObject (keep this file as-is)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: mcp-server
  namespace: ai-platform
spec:
  scaleTargetRef:
    name: mcp-server
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        metricName: mcp_request_queue_depth
        query: sum(mcp_request_queue_depth{service="mcp-server"})
        threshold: "10"

Do not delete this ScaledObject; instead, reference it from the Canary CRD's .spec.autoscalerRef. Flagger automatically clones it and manages the primary/canary copies.
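For concreteness, the primary-side clone Flagger derives from the objects above looks roughly like this. The -primary names follow Flagger's naming convention, but the exact generated spec varies by Flagger and KEDA version, so treat it as a sketch rather than the literal output.

```yaml
# Sketch: approximately what Flagger generates for the primary workload
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: mcp-server-primary       # Flagger's "-primary" naming convention
  namespace: ai-platform
spec:
  scaleTargetRef:
    name: mcp-server-primary     # targets the primary Deployment clone
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        # the query is swapped in from primaryScalerQueries in the Canary spec
        query: sum(keda_scaler_metrics_value{scaledObject='mcp-server-primary'})
        threshold: "10"
```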
The deployment event flow is as follows:
1. The image tag of the mcp-server Deployment changes (e.g., v1 → v2)
   ↓
2. Flagger detects the change → canary analysis starts
   ↓
3. Flagger adds the paused-replicas: 0 annotation to the KEDA ScaledObject
   (scaling paused, pod count frozen)
   ↓
4. Traffic shifts 10% → 20% → ... → 50% (per stepWeight),
   with Prometheus metric analysis at every step
   ↓
5-A. Analysis succeeds → promotion: the canary is promoted to primary,
     the KEDA annotation is removed, and scaling resumes
5-B. Analysis fails (5-failure threshold exceeded) → rollback:
     traffic returns 100% to primary and the canary pods scale to 0

Example 2: Deploying a Multicluster MCP Server with Argo CD ApplicationSet
The Model Context Protocol (MCP) server is the bridge that lets AI assistants such as Claude and Cursor IDE talk to a Kubernetes cluster. When multiple teams run MCP servers in their own clusters, upgrades call for a canary deployment, because the blast radius is large if a bad version breaks the AI assistants' access to the cluster.
Step 1: Add Labels to Clusters
# Assign labels when registering clusters with Argo CD
argocd cluster add prod-cluster-kr \
  --label env=production \
  --label region=kr \
  --label team=platform

argocd cluster add prod-cluster-jp \
  --label env=production \
  --label region=jp \
  --label team=platform

Step 2: Create the ApplicationSet
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: mcp-server-canary
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
            team: platform # target only the platform team's clusters
        values:
          # defaults that can be overridden per cluster
          minReplicas: "2"
          maxReplicas: "20"
  template:
    metadata:
      name: "mcp-server-{{name}}" # the cluster name is injected automatically
      annotations:
        # Rollout ordering: sync-wave must be an integer, so this assumes a
        # numeric "wave" label on each cluster (e.g., --label wave=0)
        argocd.argoproj.io/sync-wave: "{{metadata.labels.wave}}"
    spec:
      project: ai-platform
      source:
        repoURL: https://github.com/my-org/mcp-server-helm
        targetRevision: HEAD
        path: charts/mcp-server
        helm:
          valueFiles:
            - values.yaml
            - "values-{{metadata.labels.region}}.yaml" # per-region settings
          parameters:
            - name: autoscaling.minReplicas
              value: "{{values.minReplicas}}"
            - name: autoscaling.maxReplicas
              value: "{{values.maxReplicas}}"
            - name: flagger.enabled
              value: "true"
      destination:
        server: "{{server}}" # the cluster API server URL is injected automatically
        namespace: ai-platform
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

Step 3: Flagger Canary Resources Inside the Helm Chart
# charts/mcp-server/templates/canary.yaml
{{- if .Values.flagger.enabled }}
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: mcp-server
  namespace: {{ .Release.Namespace }}
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  autoscalerRef:
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    name: mcp-server
  service:
    port: {{ .Values.service.port }}
    gateways:
      - {{ .Values.flagger.gateway }}
  analysis:
    interval: 1m
    threshold: 3
    maxWeight: 30 # stay conservative for the MCP server: at most 30%
    stepWeight: 5
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99.5
        interval: 1m
      - name: mcp-tool-call-latency
        templateRef:
          name: mcp-latency-template # custom metric template
          namespace: flagger-system
        thresholdRange:
          max: 2000 # LLM tool calls must complete within 2 seconds
        interval: 1m
{{- end }}

The moment a new cluster is added to Argo CD with the labels env: production and team: platform, ApplicationSet automatically creates an Application for it. The MCP server is deployed without any manual configuration, and Flagger canary analysis is enabled automatically.
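The mcp-latency-template referenced above must exist as a Flagger MetricTemplate in flagger-system. The source does not show it, so the Prometheus query and metric name below are assumptions; only the {{ namespace }} and {{ interval }} variables are part of Flagger's MetricTemplate templating.

```yaml
# Hypothetical MetricTemplate for MCP tool-call latency (p99, in ms).
# The metric name mcp_tool_call_duration_seconds_bucket is an assumption.
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: mcp-latency-template
  namespace: flagger-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring:9090
  query: |
    histogram_quantile(0.99,
      sum(rate(
        mcp_tool_call_duration_seconds_bucket{namespace="{{ namespace }}"}[{{ interval }}]
      )) by (le)
    ) * 1000
```

The * 1000 converts seconds to milliseconds so the result lines up with the thresholdRange max of 2000 in the Canary spec.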
Pros and Cons Analysis
Advantages
| Item | Description |
|---|---|
| Single state ownership | A single Canary CRD owns the ScaledObject, Service, and VirtualService, so there is no distributed debugging |
| Source resources preserved | Flagger can be introduced without modifying existing Deployments or ScaledObjects |
| Automatic rollback | Immediate rollback without human intervention when a metric threshold is violated |
| GitOps affinity | With the ApplicationSet + Flagger combination, adding a cluster is a single Git commit |
| CNCF pedigree | Flagger is part of Flux, a CNCF Graduated project, which supports community stability and long-term maintenance |
Disadvantages and Precautions
| Item | Description | Mitigation |
|---|---|---|
| Argo CD sync conflicts | Flagger modifies resources outside Argo CD, raising OutOfSync warnings | Exclude Flagger-managed fields with an ignoreDifferences setting |
| KEDA replica conflicts | When the ScaledObject is cloned, primary settings can override the canary | Specify primaryScalerQueries explicitly |
| Limited UI | Flagger ships no dashboard of its own | Use the Grafana Flagger dashboard or an Argo CD UI plugin |
| Flux-first design | Designed with Flux integration in mind, so Argo CD needs extra configuration | Combine selfHeal: false with ignoreDifferences |
The Most Common Mistakes in Practice
- Canary-analysis deadlock after enabling scale-to-zero with `minReplicaCount: 0`: when KEDA scales the canary pods to zero because the queue is empty, Flagger waits indefinitely for the "pod ready" signal. The fix is a separate ScaledObject override in `autoscalerRef.canaryScalerQueries` that guarantees `minReplicaCount: 1` for the duration of the canary analysis.
- Continuous resynchronization because `ignoreDifferences` is missing from the ApplicationSet: Flagger modifies the Deployment's `spec.replicas` in real time, Argo CD sees this as drift from Git, and with `selfHeal` enabled it keeps re-synchronizing. Be sure to add the following to the ApplicationSet template:

  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
    - group: keda.sh
      kind: ScaledObject
      jsonPointers:
        - /metadata/annotations

- Two ScaledObjects left over after canary promotion: if the `mcp-server-primary` ScaledObject created by Flagger and the original `mcp-server` ScaledObject coexist, both controllers scale the same Deployment at once. When the Helm value `flagger.enabled: true` is set, make sure the original ScaledObject's `scaleTargetRef` points to the Deployment Flagger manages, and remove any manually maintained ScaledObject settings once scaling has been delegated via `.spec.autoscalerRef`.
In Conclusion
Flagger's single Canary CRD resolves state ownership disputes between KEDA ScaledObjects and Canary Analysis Controllers. It can be deployed without modifying the source resources, and when combined with Argo CD ApplicationSet, it allows for the declarative configuration of MCP Server Canary deployment automation using a single cluster label.
To start right now:
- Install Flagger with Helm on your existing Kubernetes cluster (`helm install flagger flagger/flagger`), then write and apply a `Canary` CRD for the Deployment that currently gives you the most trouble.
- Register a second cluster with Argo CD, label it `env: production`, and apply the ApplicationSet YAML above to verify that an Application is created automatically for the new cluster.
- Import the official Flagger dashboard into Grafana to watch canary analysis progress in real time.
This configuration still has limitations. Flagger's Gateway API support is still maturing, and automating per-PR preview environments with ApplicationSet's Pull Request generator still requires complex configuration. Both features are expected to stabilize over the course of 2025.
Next Post: Automating Canary Analysis Based on Custom Metrics with Flagger + Prometheus Operator — A Practical Guide to Writing MetricTemplate: Setting LLM Tool Invocation Latency as a Rollback Trigger
Reference Materials
- Flagger KEDA ScaledObject Integration Official Tutorial | docs.flagger.app
- Details on how Flagger works | docs.flagger.app
- Gateway API Progressive Delivery | docs.flagger.app
- Argo CD ApplicationSet Cluster Generator | argo-cd.readthedocs.io
- Argo CD Sync Waves | argo-cd.readthedocs.io
- ApplicationSet Multi-cluster Deployment Guide | codefresh.io
- KEDA Conceptual Documentation | keda.sh
- Practical Example of Flagger Canary Deployment (Expedia Group) | medium.com
- Kubernetes MCP Server AI-based Cluster Management | developers.redhat.com
- AWS EKS MCP Server Introduction | docs.aws.amazon.com
- Multi-cluster Deployment Automation (Argo CD) | developers.redhat.com
- Argo CD ApplicationSet Generators in Depth | piotrminkowski.com
- Canary Deployment: Kubernetes Gateway API + Flagger + Google Cloud Deploy | cloud.google.com