Implementing AI Agent (MCP) Server Canary Deployment and Automatic Rollback with KEDA + Argo Rollouts
For a Model Context Protocol (MCP) server, where an LLM directly calls external tools, a single bad version can bring the entire AI workflow to a halt. Standard rolling updates do not adequately control this risk: during a version transition, old and new pods coexist, and when some requests are routed to a defective version there is no immediate rollback, so error detection is left to humans. The MCP server provides the schemas (function signatures, parameter types) of the tools the LLM agent uses, and keeping the transition window short is critical because a schema mismatch between versions causes the LLM tool calls themselves to fail.
This article examines step-by-step how to automate MCP server canary deployments by combining the Kubernetes event-driven autoscaler KEDA with the incremental deployment controller Argo Rollouts. The goal is to create an autonomous system that reduces operating costs by up to 90% by achieving scale-to-zero during off-peak hours when there are no requests, and automatically executes rollbacks based on Prometheus metrics the moment the success rate drops below 95%.
By combining KEDA's event-driven autoscaling with Argo Rollouts' incremental traffic switching, you can achieve both stability and cost-effectiveness in MCP server deployments.
Before Reading This Article: It is assumed that you are familiar with Kubernetes Pod, Service, and HPA concepts, Prometheus metric basics, and Istio VirtualService concepts. This article is intended for backend and DevOps developers with experience in Kubernetes operations.
Key Concepts
KEDA: Event-based Autoscaling
KEDA (Kubernetes Event-Driven Autoscaling) is a CNCF graduated project that extends the native Kubernetes HPA (Horizontal Pod Autoscaler). The key difference lies in the breadth of trigger sources: in addition to CPU and memory, it can use over 65 external event sources as scaling criteria, including Kafka queue depth, Prometheus query results, and the number of AWS SQS messages.
Two features particularly useful for MCP servers:
scale-to-zero: Reduces the number of pods to zero when there are no requests, eliminating idle costs entirely.
Argo Rollout Direct Target Support: You can directly specify a Rollout CRD for scaleTargetRef within ScaledObject to auto-scale resources currently in canary deployment simultaneously.
```yaml
# Note: KEDA automatically creates an HPA internally.
# An existing HPA will conflict with it, so remove any HPA first and use only the ScaledObject.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
spec:
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1  # target the Rollout CRD, not a Deployment
    kind: Rollout
    name: mcp-server
```
When multiple triggers are configured, KEDA applies the largest of the target replica counts calculated by each trigger. If the connection-count trigger requires 5 pods and the queue-depth trigger requires 8 pods, it scales up to 8. If you are not aware of this behavior, you may be puzzled when more pods appear than expected.
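As a concrete illustration of this rule, consider a two-trigger fragment like the sketch below. The metric values in the comments are hypothetical, chosen to match the numbers above:

```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      metricName: mcp_active_connections
      query: sum(mcp_active_connections{namespace="ai-services"})
      threshold: "50"   # e.g. 250 active connections -> ceil(250 / 50) = 5 pods
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      metricName: mcp_request_queue_depth
      query: sum(mcp_request_queue_depth{namespace="ai-services"})
      threshold: "100"  # e.g. 800 queued requests -> ceil(800 / 100) = 8 pods
# KEDA (via the HPA it generates) applies the larger target: 8 pods
```

Each Prometheus trigger divides its query result by its threshold and rounds up; the generated HPA then takes the maximum across all triggers.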
scale-to-zero: KEDA polls the external event sources and, when minReplicaCount is 0 and no events are found, scales the workload down to zero pods. A plain HPA maintains at least 1 replica, but KEDA can shut the workload down entirely.
Argo Rollouts: Incremental Deployment Controller
Argo Rollouts is a Progressive Delivery controller that replaces the standard Kubernetes Deployment. If you declare canary steps in the Rollout CRD, it works with service meshes such as Istio or Linkerd to split traffic by weight.
The key is AnalysisTemplate. It executes a Prometheus query during the deployment phase to evaluate the success rate, and if the conditions are not met, it automatically reverts to the stable version. There is no need for a human to press the approval button.
AnalysisTemplate: Argo Rollouts' CRD that automatically evaluates deployment quality by integrating with metric providers such as Prometheus, Datadog, and CloudWatch. It declaratively defines evaluation intervals, allowed number of failures, and success conditions.
Considerations for Deploying MCP Server and Kubernetes
MCP (Model Context Protocol) is an LLM tool integration standard designed by Anthropic and currently managed by the Linux Foundation. Thanks to the Streamable HTTP transport method introduced in the 2025 specification update, MCP servers can now be horizontally scaled as remote services on Kubernetes.
There are two reasons why the canary strategy is particularly important for MCP servers. First, the MCP server provides the schemas for the tools called by the LLM; if schemas diverge between versions, the LLM tool calls themselves fail, and even a brief window in which old and new versions are mixed can halt the entire AI workflow or produce inconsistent LLM inference results. Second, if the MCP server maintains session state, connections may be dropped during the canary transition.
The MCP 2025 specification recommends a stateless design, and KEDA's scale-to-zero and canary traffic switching operate safely only if there is no session state. If state is required, it is recommended to separate it into an external storage such as Redis.
Prerequisites: kubectl, Helm, and the Argo Rollouts kubectl plugin (kubectl argo rollouts) must be installed. Without the plugin, commands such as kubectl argo rollouts get rollout cannot be executed.
```shell
# Add the Helm chart repositories (standard upstream repos)
helm repo add argo https://argoproj.github.io/argo-helm
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

# Install Argo Rollouts
kubectl create namespace argo-rollouts
helm install argo-rollouts argo/argo-rollouts -n argo-rollouts

# Install KEDA
helm install keda kedacore/keda -n keda --create-namespace
```
The examples below are applied in the following order:
Example 1: Istio DestinationRule + VirtualService — Traffic Splitting-based Configuration
For Argo Rollouts to perform weighted traffic splitting, subsets of stable and canary must first be defined in Istio's DestinationRule. Since referencing subset in VirtualService without DestinationRule will immediately result in a routing error, these must be applied first.
```yaml
# destination-rule.yaml
# Prerequisite resource for the subset (stable/canary) references in the VirtualService
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: mcp-server-destrule
  namespace: ai-services
spec:
  host: mcp-server  # must match the Kubernetes Service name
  subsets:
    - name: stable
      labels:
        app: mcp-server  # Argo Rollouts adds the rollouts-pod-template-hash label to stable pods automatically
    - name: canary
      labels:
        app: mcp-server
```
```yaml
# virtual-service.yaml
# Argo Rollouts rewrites the weights automatically at each deployment step.
# Do not edit them manually, or they will drift out of sync with the Rollout state.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: mcp-server-vsvc
  namespace: ai-services
spec:
  hosts:
    - mcp-server
  http:
    - name: primary
      route:
        - destination:
            host: mcp-server
            subset: stable
          weight: 100  # initial value; Argo Rollouts adjusts it automatically once a rollout starts
        - destination:
            host: mcp-server
            subset: canary
          weight: 0
```
Example 2: Argo Rollout — MCP Server Canary Deployment Strategy
Replace the existing Deployment with a Rollout. Transition traffic in stages from 10% → 30% → 60% → 100%, and run AnalysisTemplate at the 30% stage to automatically verify the success rate. The key is to pass the hash value of the current Canary Pod to AnalysisTemplate via analysis.args. Without this value, it is impossible to evaluate metrics filtered solely for Canary Pods.
| Field | Description |
| --- | --- |
| setWeight: 10 | Routes 10% of traffic to canary pods via the Istio VirtualService |
| pause: {duration: 2m} | Waits 2 minutes, then automatically proceeds to the next step |
| analysis.args.canary-hash | If you specify podTemplateHashValue: Latest, the Rollout automatically injects the current canary's rollouts-pod-template-hash label value into the AnalysisRun |
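Putting the steps above together, the Rollout manifest might look like the following sketch. The container image and port are assumptions; the VirtualService, DestinationRule, and AnalysisTemplate names match Examples 1 and 3:

```yaml
# rollout.yaml (sketch)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: mcp-server
  namespace: ai-services
spec:
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
        - name: mcp-server
          image: registry.example.com/mcp-server:v2  # assumed image
          ports:
            - containerPort: 8080                    # assumed port
  strategy:
    canary:
      trafficRouting:
        istio:
          virtualService:
            name: mcp-server-vsvc      # VirtualService from Example 1
            routes:
              - primary
          destinationRule:
            name: mcp-server-destrule  # DestinationRule from Example 1
            stableSubsetName: stable
            canarySubsetName: canary
      steps:
        - setWeight: 10
        - pause: {duration: 2m}
        - setWeight: 30
        - analysis:                    # quality gate at the 30% stage
            templates:
              - templateName: mcp-success-rate      # AnalysisTemplate from Example 3
            args:
              - name: canary-hash
                valueFrom:
                  podTemplateHashValue: Latest      # injects the current canary pod hash
        - setWeight: 60
        - pause: {duration: 2m}
        - setWeight: 100
```

Note that spec.replicas is intentionally omitted: the KEDA ScaledObject in Example 4 manages the replica count.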
Example 3: AnalysisTemplate — Prometheus-based Automatic Rollback Conditions
AnalysisTemplate is the "automatic quality gate" for canary deployments. It queries Prometheus every 60 seconds, records a failure if the MCP request success rate is less than 95%, and triggers an automatic rollback after 3 consecutive failures.
```yaml
# analysis-template.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: mcp-success-rate
  namespace: ai-services
spec:
  args:
    - name: canary-hash  # variable injected from the Rollout's analysis.args
  metrics:
    - name: success-rate
      interval: 60s    # evaluate every 60s (at least 4x the 15s Prometheus scrape interval recommended)
      failureLimit: 3  # roll back after 3 consecutive failures (avoids false positives from transient metric glitches)
      # result[0]: how to access the value when the Prometheus query returns a single scalar
      # vector (time-series set) queries have a different result structure, so be careful
      successCondition: result[0] >= 0.95
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            # rate(metric[2m]): per-second average rate of increase over the last 2 minutes
            # sum(): aggregates values across pods
            # {version="{{args.canary-hash}}"}: filters to canary pods only
            # without this label, canary errors are diluted into the overall totals and go undetected
            sum(rate(mcp_requests_total{status="success", version="{{args.canary-hash}}"}[2m]))
            /
            sum(rate(mcp_requests_total{version="{{args.canary-hash}}"}[2m]))
    - name: error-rate
      interval: 60s
      failureLimit: 2
      successCondition: result[0] <= 0.05  # error rate at or below 5%
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(mcp_requests_total{status="error", version="{{args.canary-hash}}"}[2m]))
            /
            sum(rate(mcp_requests_total{version="{{args.canary-hash}}"}[2m]))
```
How to read the PromQL: rate(metric[2m]) calculates the per-second rate of increase over the past 2 minutes. sum() aggregates the values across pods, and dividing the rate of successful requests by the rate of all requests gives the success rate. Without the {version="..."} label filter, stable and canary pods are mixed together, and canary errors are diluted in the overall success rate.
Example 4: KEDA ScaledObject — Event-based Autoscaling
Once Rollout is defined, attach KEDA ScaledObject to add event-based scaling. The key is to directly specify Rollout in scaleTargetRef.
```yaml
# scaled-object.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: mcp-server-scaler
  namespace: ai-services
spec:
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout        # target the Rollout, not a Deployment
    name: mcp-server
  minReplicaCount: 2     # weight calculation is only correct with at least 2 replicas during the canary phase
  maxReplicaCount: 20
  cooldownPeriod: 60     # unit: seconds; wait 60s before scaling down
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        metricName: mcp_active_connections
        query: sum(mcp_active_connections{namespace="ai-services"})
        threshold: "50"  # one pod per 50 active connections
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        metricName: mcp_request_queue_depth
        query: sum(mcp_request_queue_depth{namespace="ai-services"})
        threshold: "100" # one pod per 100 queued requests
```
| Field | Description |
| --- | --- |
| minReplicaCount: 2 | If set to 0, the replica count can drop to 0 during a canary deployment, which breaks the Rollout's weight calculation |
| cooldownPeriod: 60 | In seconds; prevents premature scale-down while MCP sessions are still being processed |
| Multiple trigger priority | Between connection count and queue depth, the larger of the replica counts calculated by each trigger is applied |
Check the deployment status in real time with the following commands:

```shell
# Check rollout steps and AnalysisRun status
kubectl argo rollouts get rollout mcp-server --watch -n ai-services

# Check KEDA ScaledObject behavior
kubectl get scaledobject mcp-server-scaler -n ai-services
```
Pros and Cons Analysis
Advantages
| Item | Description |
| --- | --- |
| Zero-downtime incremental deployment | The 10% → 30% → 60% → 100% phase transitions minimize the blast radius and limit the impact of an issue to canary traffic |
| Automatic rollback | When Prometheus metrics fall below threshold, the stable version is restored immediately without human intervention, shortening MTTR |
| Event-driven cost optimization | Scale-to-zero during periods with no AI agent requests reduces MCP server operating costs by up to 90% |
| Full GitOps integration | The Argo CD + Argo Rollouts combination manages all deployment state declaratively in Git |
| Multiple metric conditions | Combining conditions such as success rate, error rate, and latency P99 with AND enables sophisticated quality gates |
Disadvantages and Precautions
| Item | Problem | Mitigation |
| --- | --- | --- |
| KEDA ↔ Rollout state conflict | The canary weight ratio is distorted if KEDA changes the replica count mid-rollout | Set minReplicaCount above the minimum canary replica count and tune cooldownPeriod |
| KEDA and HPA conflict | KEDA automatically generates an HPA internally, so it conflicts with any existing HPA | Remove existing HPAs and use only the ScaledObject |
| Metric detection delay | Error detection lags because of the gap between the Prometheus scrape interval (15s) and the AnalysisTemplate interval | Set the interval to at least 4x the scrape interval (60s recommended) |
| MCP session drops | Connections may be lost during the canary switch if Streamable HTTP session state is kept in the pod | Design the MCP server as stateless and offload session state to external storage such as Redis |
| Complex debugging | Failure causes are hard to trace because three components (KEDA, Argo Rollouts, Istio) are all involved | Check status step by step with kubectl argo rollouts get rollout mcp-server --watch and collect logs per component |
The Most Common Mistakes in Practice
Setting failureLimit: 1 in AnalysisTemplate: Prometheus scrape timing can produce transient NaN results. Set failureLimit: 3 or higher to prevent false-positive rollbacks.
Combining KEDA minReplicaCount: 0 with a canary deployment: if replicas scale down to 0 during the canary phase, the Rollout controller's weight calculation breaks. Keep a minimum of 2 replicas during deployments.
Omitting version labels from Prometheus queries: aggregating the whole metric without distinguishing stable from canary pods dilutes canary errors into the stable success rate, hiding the problem. Always filter by the version or rollouts-pod-template-hash label.
In Conclusion
Rolling updates struggle with the schema inconsistencies and the lack of immediate rollback that occur during MCP server version transitions. The MCP server deployment pipeline combining KEDA and Argo Rollouts relies on "automatic verification during deployment" rather than "post-deployment observation," achieving two benefits simultaneously without human intervention: immediate rollback when the success rate falls below 95%, and up to 90% cost reduction during idle periods.
3 Steps to Start Right Now:
Migrate to Rollout after installing Argo Rollouts + KEDA: Install the two components using Helm, and replace the existing MCP Server Deployment with the YAML from Example 2 above. Check the progress of the steps in real-time using kubectl argo rollouts get rollout mcp-server --watch.
Connect MCP-specific metrics to AnalysisTemplate: If the mcp_requests_total metric does not exist, first add the Prometheus exporter to the server. Deploy the AnalysisTemplate with conditions failureLimit: 3 and successCondition: result[0] >= 0.95, and connect the canary-hash injection (podTemplateHashValue: Latest) to the analysis.args of the Rollout.
Configure scale-to-zero with KEDA ScaledObject: minReplicaCount: 2, maxReplicaCount: 20, set the sum(mcp_active_connections) query to threshold 50 using a Prometheus trigger, and verify normal operation with kubectl get scaledobject.
Next Article: We will examine how Flagger resolves the limitation of this configuration—that "state management is complex because KEDA ScaledObject and AnalysisTemplate are separated into different CRDs"—using a single CRD, and how to automate multi-cluster MCP server canary deployment with Argo CD ApplicationSet.