Flagger + Istio A/B Routing: Integrating New Relic NRQL with Conversion Rate as Deployment Gating Criteria

Deployment is no longer a mere act of "uploading and watching." In a modern production environment, a release is a moment of hypothesis testing, requiring data to prove that the new version actually improves business metrics. However, many teams still judge the success or failure of a deployment solely based on error rates and latency. Whether users purchased more products or maintained longer sessions is not visible on the Prometheus dashboard.

After reading this, you can configure a production pipeline that rolls back automatically, without human intervention, when the conversion rate stays below 2.5%. By connecting a New Relic NRQL query to Flagger's MetricTemplate, you can use business KPIs such as conversion rate and session length directly as gating conditions for deployment automation. The change goes beyond the technical side: SREs and PMs communicate through the Canary CR's thresholdRange, and Git commits show that the criterion the PM agreed to ("conversion rate above 2.5%") is reflected directly in the automation pipeline. Infrastructure engineering and product analysis meet on the same declarative configuration file.

This article covers the entire pipeline step by step, from writing the Flagger Canary CR and understanding the Istio VirtualService structure to NRQL pre-validation and configuring New Relic MetricTemplates. Before starting, make sure the following prerequisites are met.

Prerequisites

Flagger installed on the Kubernetes cluster
Istio service mesh running (sidecar injection enabled)
A New Relic account and an issued Insights Query API key
New Relic APM agent and Browser agent integrated with the application (the MetricTemplates work only if PageAction and PageView events are being collected in New Relic)

Key Concepts

Flagger's Progressive Delivery Pipeline

Flagger is a Kubernetes operator that declaratively manages the entire A/B test pipeline with a single Canary CR (Custom Resource). When a developer changes the Deployment image, Flagger detects the change and automatically runs the following flow.

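Canary CR change detected
    → Istio VirtualService created/updated (header-match routing)
    → NRQL query run through the MetricTemplate at every analysis interval
    → Promotion approved if thresholds pass / automatic rollback if they are violated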

MetricTemplate: A CRD for Flagger to send queries to external metric providers (New Relic, Datadog, Prometheus, etc.). The query result must be a single float64 value and is compared to a threshold in the metrics field of Canary.
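
As a rough illustration of that comparison, the sketch below checks a single query result against a min/max range. The function name `within_threshold` is hypothetical and this is not Flagger's own code; it only assumes that a value below `min` or above `max` counts as a failure.

```python
from typing import Optional


def within_threshold(value: float,
                     minimum: Optional[float] = None,
                     maximum: Optional[float] = None) -> bool:
    """Return True when value satisfies the configured min/max bounds."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


# Conversion rate gate used in this article: thresholdRange.min = 2.5
print(within_threshold(2.4, minimum=2.5))  # below the minimum, so the check fails
print(within_threshold(3.1, minimum=2.5))  # passes
# Error rate gate used in this article: thresholdRange.max = 5
print(within_threshold(5.0, maximum=5))    # only values above the maximum fail
```

Keeping the comparison this simple is what makes the threshold easy to review in Git: the whole business rule is one number in the Canary CR.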

HTTP Header-based A/B Routing vs. Weighted Canary

The two strategies have different purposes.

Classification	Weighted Canary	HTTP Header A/B Routing
Routing Criteria	Traffic Percentage (e.g., 10%)	Request Header/Cookie Value
User Consistency	Difficult to Guarantee	Same User → Same Version (Session Affinity)
Suitable Services	Backend API	Frontend, Payment Flow
Analysis Purpose	Stability Verification	Business KPI Comparison

The header-based method routes only users with the x-user-group: beta header to the new version. If this header is injected into specific user segments in API Gateway or BFF, beta user groups can consistently experience the new version, enabling accurate comparison testing.

Integration Structure of NRQL and Flagger MetricTemplate

NRQL (New Relic Query Language) queries New Relic's MELT data (Metrics, Events, Logs, Traces) using syntax similar to SQL. At each analysis interval, Flagger sends the NRQL defined in MetricTemplate to the New Relic Insights Query API and compares the returned single numeric value with thresholdRange.

MELT data: New Relic's four telemetry data types (Metrics, Events, Logs, Traces). The queries in this article use event data: PageAction (user behavior events), PageView (page views), and Transaction (server transactions). PageAction and PageView are the primary sources for business metric analysis.

Flagger template variables such as {{ target }} and {{ interval }} can be used in NRQL queries. {{ target }} is replaced with the name of the Canary's target, and {{ interval }} is replaced with the metric's interval exactly as written in the Canary (for example 5m, not 300). The time window needs no variable at all: Flagger's New Relic provider converts the metric-level interval to seconds and appends SINCE <n> seconds ago to the query before sending it, so a MetricTemplate query is written without its own SINCE clause. If you run an older or customized Flagger build, confirm this against the provider source of that version.

Relationship between metric-level interval and analysis cycle: In the Canary CR, analysis.interval: 1m is the cycle on which Flagger evaluates the metrics. For an individual metric in the metrics array, interval: 5m sets the query window (300 seconds). In other words, Flagger still executes the query every minute, but each query aggregates the last 5 minutes. This pattern yields stable values through a wider aggregation window when short-term sample volatility is high, as with business metrics.
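
The relationship between the two intervals can be sketched as a timeline. The helper names `duration_seconds` and `query_windows` are hypothetical, and the parser only handles simple durations such as 30s, 1m, or 5m; it is meant to show how the windows overlap, not to reproduce Flagger's scheduler.

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600}


def duration_seconds(text: str) -> int:
    """Convert a simple duration such as '30s', '1m' or '5m' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def query_windows(analysis_interval: str, metric_interval: str, checks: int):
    """List the (start, end) offsets in seconds covered by each successive check."""
    step = duration_seconds(analysis_interval)
    window = duration_seconds(metric_interval)
    windows = []
    for i in range(1, checks + 1):
        end = i * step
        # A negative start means the window reaches back before the analysis began.
        windows.append((end - window, end))
    return windows


for start, end in query_windows("1m", "5m", 3):
    print(f"check at t={end}s aggregates events from t={start}s to t={end}s")
```

Two things follow from the overlap: consecutive checks share most of their data, so one bad minute influences several checks, and the first few windows reach back before the canary received traffic, which is one reason to filter the query to canary users.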

Preparation Checklist

Before applying this in practice, let's check the following items in order.

Item	How to check
Flagger installed	`kubectl get pods -n flagger-system`
Istio sidecar injection enabled	`kubectl get ns prod --show-labels` → confirm `istio-injection=enabled`
New Relic APM integration	New Relic UI → APM → check the app name list
New Relic Browser agent	New Relic UI → Browser → verify that `PageView` / `PageAction` events are collected
Insights Query API key	New Relic UI → API keys → create an Insights query key
Canary identification custom attribute injected	Check that `newrelic.setCustomAttribute('userGroup', 'canary')` is called in the Browser agent

Important: If the Browser agent does not collect PageAction and PageView events, the NRQL queries in the conversion rate and session length MetricTemplates always return null and the analysis fails. Proceed to the next step only after every item on the checklist passes.

Practical Application

Example 1: Flagger Canary CR — Full HTTP Header A/B Routing Configuration

This is the full configuration in which requests with the x-user-group: beta header are routed to the canary (new version) and all other requests to the stable (primary) version. Promotion proceeds only if all three metrics (error rate, conversion rate, and session length) pass; they are combined as an AND condition.

yaml

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
  namespace: prod
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  # Set well above canary pod startup time + the first analysis interval
  # A value as short as 120s can time out before the first analysis completes and trigger an immediate rollback
  progressDeadlineSeconds: 300
  service:
    port: 80
    targetPort: 8080
    gateways:
      - istio-system/public-gateway
    hosts:
      - app.example.com
    trafficPolicy:
      tls:
        mode: ISTIO_MUTUAL
  analysis:
    interval: 1m
    threshold: 5          # number of failed checks allowed before rollback
    iterations: 10        # promotion completes after 10 passed analyses
    match:
      - headers:
          x-user-group:
            exact: "beta"
      - headers:
          cookie:
            regex: ".*canary=true.*"   # ORed with the header condition (either match routes to the canary)
    metrics:
      - name: error-rate
        templateRef:
          name: newrelic-error-rate
          namespace: prod
        thresholdRange:
          max: 5          # roll back when the error rate exceeds 5%
        interval: 1m
      - name: conversion-rate
        templateRef:
          name: newrelic-conversion-rate
          namespace: prod
        thresholdRange:
          min: 2.5        # roll back when the conversion rate falls below 2.5%
        interval: 5m      # 5-minute (300 s) query window for a more stable aggregate
      - name: session-duration
        templateRef:
          name: newrelic-session-duration
          namespace: prod
        thresholdRange:
          min: 120        # roll back when the average session is shorter than 120 s
        interval: 5m
    webhooks:
      # Replace with the actual Service endpoint of your flagger-loadtester
      # To find it: kubectl get svc -n <loadtester-namespace> | grep flagger-loadtester
      - name: load-test
        url: http://flagger-loadtester.prod/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://my-app.prod/"

Field	Description
`progressDeadlineSeconds`	Maximum time in seconds for the canary deployment to make progress before it is rolled back. Set it well above the pod startup time.
`analysis.match`	Multiple items in an array operate on OR conditions. If at least one matches, route to the canary.
`analysis.interval`	Frequency at which Flagger evaluates metrics
Metric-level `interval`	Sets the `SINCE` window of the NRQL query (Flagger's New Relic provider appends it in seconds), independent of the evaluation cycle
`thresholdRange.min`	If the return value is less than this value, it is judged as a failure
`thresholdRange.max`	If the return value exceeds this value, it is judged as a failure
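
How `threshold` and `iterations` interact can be sketched with a simplified model. The function `run_analysis` is hypothetical and assumes that failed checks accumulate across the run and that an interval passes only when every metric passes; Flagger's real controller also handles webhooks, timeouts, and status updates that are left out here.

```python
from typing import Iterable, List


def interval_passes(metric_results: List[bool]) -> bool:
    """An analysis interval passes only if every metric passed (AND condition)."""
    return all(metric_results)


def run_analysis(check_results: Iterable[bool], threshold: int, iterations: int) -> str:
    """Walk through per-interval results and report the outcome of the release."""
    failed = 0
    passed = 0
    for ok in check_results:
        if ok:
            passed += 1
            if passed >= iterations:
                return "promoted"
        else:
            failed += 1
            if failed >= threshold:
                return "rolled back"
    return "in progress"


# error rate OK, conversion rate below 2.5%, session duration OK -> interval fails
print(interval_passes([True, False, True]))

# Values from the Canary CR above: threshold 5, iterations 10
healthy = [True] * 4 + [False] * 2 + [True] * 6
print(run_analysis(healthy, threshold=5, iterations=10))
print(run_analysis([False] * 5, threshold=5, iterations=10))
```

With `interval: 1m`, this model implies that a fully healthy release needs at least ten minutes of analysis before promotion, which is worth stating explicitly when agreeing on release windows with PMs.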

analysis.match OR vs AND Caution: Multiple items in the match array (header condition A, cookie condition B) operate as OR, meaning routing to the canary occurs if at least one of them matches. On the other hand, listing multiple header conditions within a single match block of an Istio VirtualService evaluates as AND. Be careful not to confuse the two structures.
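
The OR/AND distinction can be made concrete with a small model of the matching rules. This is a sketch under stated assumptions: `routes_to_canary` is a hypothetical name, header names are assumed to be lower-cased already, and Python's `re` module stands in for the RE2 engine that Istio uses, so regex behavior may differ for advanced patterns.

```python
import re
from typing import Dict, List

# One match block maps a header name to {"exact": ...} or {"regex": ...}.
MatchBlock = Dict[str, Dict[str, str]]


def header_matches(value: str, rule: Dict[str, str]) -> bool:
    """Evaluate a single exact or regex rule against a header value."""
    if "exact" in rule:
        return value == rule["exact"]
    if "regex" in rule:
        return re.fullmatch(rule["regex"], value) is not None
    return False


def routes_to_canary(headers: Dict[str, str], blocks: List[MatchBlock]) -> bool:
    """Blocks are ORed together; conditions inside one block are ANDed."""
    return any(
        all(name in headers and header_matches(headers[name], rule)
            for name, rule in block.items())
        for block in blocks
    )


# Two separate blocks, as in the Canary CR above: header OR cookie
or_blocks = [
    {"x-user-group": {"exact": "beta"}},
    {"cookie": {"regex": ".*canary=true.*"}},
]
print(routes_to_canary({"x-user-group": "beta"}, or_blocks))
print(routes_to_canary({"cookie": "sid=1; canary=true"}, or_blocks))

# Both conditions in one block: header AND cookie are required
and_block = [{
    "x-user-group": {"exact": "beta"},
    "cookie": {"regex": ".*canary=true.*"},
}]
print(routes_to_canary({"x-user-group": "beta"}, and_block))
```

The last call returns False because the cookie condition is missing, which is the behavior to expect when several headers are listed inside a single match block.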

Example 2: Istio VirtualService — Routing structure automatically generated by Flagger

Flagger automatically generates and manages the following VirtualService from the Canary CR. You do not need to write this file by hand, but you need to understand its structure to troubleshoot routing and to read Kiali visualizations correctly.

yaml

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
  namespace: prod
spec:
  gateways:
    - istio-system/public-gateway
  hosts:
    - app.example.com
  http:
    # Priority 1: header match → canary (new version) service
    - match:
        - headers:
            x-user-group:
              exact: "beta"
      route:
        - destination:
            host: my-app-canary
            port:
              number: 80
          weight: 100
    # Priority 2: default traffic → stable (primary) service
    - route:
        - destination:
            host: my-app-primary
            port:
              number: 80
          weight: 100

my-app-canary vs my-app-primary: Flagger replicates the original Deployment (my-app) to my-app-primary and exposes the new version pod as my-app-canary. Users are routed to one of these two Services.

Example 3: NRQL Query Pre-validation — Essential Step Before Applying MetricTemplate

Before applying NRQL to MetricTemplate, be sure to check the data types and value ranges by running the query below in New Relic Query Builder. If no data is retrieved, do not proceed to the next step; instead, check the New Relic agent integration status first.

sql

-- [Validation] Conversion rate: per-group comparison based on the userGroup custom attribute
-- Includes FACET/TIMESERIES: for dashboard visualization only, not usable in a MetricTemplate
SELECT
  filter(count(*), WHERE action = 'purchase_complete') /
  filter(count(*), WHERE action = 'product_view') * 100 AS 'ConversionRate'
FROM PageAction
WHERE appName = '<your-app-name-in-newrelic>'
FACET userAttributes.userGroup
TIMESERIES 5 minutes
SINCE 1 hour ago
 
-- [Validation] Session length: with percentiles (A/B group comparison)
SELECT average(session.duration) AS 'AvgSessionSec',
       percentile(session.duration, 50, 90, 99) AS 'P50/P90/P99'
FROM PageView
WHERE appName = '<your-app-name-in-newrelic>'
FACET userAttributes.userGroup
SINCE 30 minutes ago
 
-- [Validation] Page views per session
SELECT count(*) / uniqueCount(session) AS 'PageviewsPerSession'
FROM PageView
WHERE appName = '<your-app-name-in-newrelic>'
FACET userAttributes.userGroup
SINCE 1 hour ago
 
-- [Validation] Cart abandonment rate
SELECT
  filter(uniqueCount(session), WHERE action = 'cart_abandon') /
  uniqueCount(session) * 100 AS 'CartAbandonRate'
FROM PageAction
WHERE appName = '<your-app-name-in-newrelic>'
FACET userAttributes.userGroup
TIMESERIES 10 minutes
SINCE 2 hours ago

MetricTemplate conversion rule: Remove the FACET and TIMESERIES clauses from the dashboard query so that it returns a single number, and drop the SINCE clause as well, because Flagger's New Relic provider appends SINCE <n> seconds ago from the metric-level interval. If there is no data in userAttributes.userGroup, check that the Browser agent is calling newrelic.setCustomAttribute('userGroup', 'canary').
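
The single-number requirement can be illustrated with a small response check. This is a sketch, not Flagger's parser: the names `single_value` and `NoValuesFound` are hypothetical, and the response shape (a `results` list whose only entry carries a `result` number) is an assumption about what the query API returns for a single-aggregate query. Verify it against a real response from your account before relying on it.

```python
from typing import Any, Dict


class NoValuesFound(Exception):
    """Raised when the response does not hold exactly one numeric result."""


def single_value(response: Dict[str, Any]) -> float:
    """Extract the one numeric result, or fail the way a gating check should."""
    results = response.get("results", [])
    if len(results) != 1:
        raise NoValuesFound(f"expected 1 result, got {len(results)}")
    value = results[0].get("result")
    if value is None:
        raise NoValuesFound("result is null")
    return float(value)


# Assumed shape of a response to a query that returns one number
print(single_value({"results": [{"result": 3.2}]}))

# A query that yields several values cannot be used for gating
try:
    single_value({"results": [{"result": 3.2}, {"result": 2.1}]})
except NoValuesFound as err:
    print(f"rejected: {err}")
```

Running a check like this against the Query Builder output of each candidate query catches FACET and TIMESERIES leftovers before they reach the cluster.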

Example 4: New Relic MetricTemplate — Conversion Rate · Session Length · Error Rate

The authentication secret and the three MetricTemplates must all be deployed in the same namespace. This is because secretRef can only reference secrets in the same namespace.

Secret Generation (kubectl method recommended)

kubectl create secret generic newrelic-credentials \
  -n prod \
  --from-literal=newrelic_account_id=<your-account-id> \
  --from-literal=newrelic_query_key=<your-insights-query-key>

If you manage the Secret as YAML, use stringData: you write the values in plain text and Kubernetes base64-encodes them when it stores the Secret:

yaml

apiVersion: v1
kind: Secret
metadata:
  name: newrelic-credentials
  namespace: prod
type: Opaque
stringData:
  newrelic_account_id: "your-account-id-here"
  newrelic_query_key: "your-insights-query-key-here"
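
To see what ends up in the stored Secret, the sketch below reproduces the base64 step in plain Python. The helper name `to_secret_data` and the account ID 1234567 are made up for illustration; the encoding itself is standard base64 over the UTF-8 bytes.

```python
import base64
from typing import Dict


def to_secret_data(string_data: Dict[str, str]) -> Dict[str, str]:
    """Encode plain-text values the way they appear under a Secret's data field."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in string_data.items()
    }


encoded = to_secret_data({"newrelic_account_id": "1234567"})
print(encoded)  # {'newrelic_account_id': 'MTIzNDU2Nw=='}

# Decoding restores the original value, so base64 is an encoding, not encryption.
print(base64.b64decode(encoded["newrelic_account_id"]).decode("utf-8"))
```

Because the value is trivially reversible, a Secret manifest with real keys should not be committed to Git in either form.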

MetricTemplate Deployment

yaml

---
# Conversion rate MetricTemplate
# Prerequisite: the Browser agent must call newrelic.setCustomAttribute('userGroup', 'canary')
#               for users routed to the canary, otherwise this query returns no data
# No SINCE clause: Flagger's New Relic provider appends "SINCE <n> seconds ago",
#                  with n taken from the metric-level interval in the Canary CR
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: newrelic-conversion-rate
  namespace: prod
spec:
  provider:
    type: newrelic
    secretRef:
      name: newrelic-credentials
  query: |
    SELECT
      IF(
        filter(count(*), WHERE action = 'product_view') > 0,
        filter(count(*), WHERE action = 'purchase_complete') /
        filter(count(*), WHERE action = 'product_view') * 100,
        null
      )
    FROM PageAction
    WHERE appName = '{{ target }}'
    AND userAttributes.userGroup = 'canary'
---
# Session length MetricTemplate
# Prerequisite: the Browser agent must call newrelic.setCustomAttribute('userGroup', 'canary')
#               for users routed to the canary, otherwise this query returns no data
# No SINCE clause: Flagger's New Relic provider appends "SINCE <n> seconds ago",
#                  with n taken from the metric-level interval in the Canary CR
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: newrelic-session-duration
  namespace: prod
spec:
  provider:
    type: newrelic
    secretRef:
      name: newrelic-credentials
  query: |
    SELECT average(session.duration)
    FROM PageView
    WHERE appName = '{{ target }}'
    AND userAttributes.userGroup = 'canary'
---
# Error rate MetricTemplate
# httpResponseCode >= 500: counts server-side 5xx errors
# To include 4xx as well, change to >= 400 and adjust to your team's criteria
# No SINCE clause: Flagger's New Relic provider appends "SINCE <n> seconds ago",
#                  with n taken from the metric-level interval in the Canary CR
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: newrelic-error-rate
  namespace: prod
spec:
  provider:
    type: newrelic
    secretRef:
      name: newrelic-credentials
  query: |
    SELECT
      filter(count(*), WHERE httpResponseCode >= 500) /
      count(*) * 100
    FROM Transaction
    WHERE appName = '{{ target }}'

Item	Description
`userAttributes.userGroup = 'canary'`	Restricts the query to canary users through a custom attribute
`IF(... > 0, ..., null)`	Prevents a division error when the denominator (the `product_view` count) is 0. When the query returns `null`, Flagger cannot compare it with the threshold and reports "no values found" for that interval
`httpResponseCode >= 500`	Only server errors (5xx) are counted. Whether to include 4xx is determined on a team basis depending on service characteristics
`secretRef`	Can only reference Secrets in the same namespace as the MetricTemplate
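
The zero-denominator guard in the conversion rate query follows the same logic as this small sketch. The function name `conversion_rate` is hypothetical; returning `None` plays the role of the `null` branch in the NRQL `IF`.

```python
from typing import Optional


def conversion_rate(purchases: int, product_views: int) -> Optional[float]:
    """Return the conversion rate in percent, or None when there were no views."""
    if product_views <= 0:
        # No product views in the window: there is nothing to measure yet.
        return None
    return purchases / product_views * 100


print(conversion_rate(3, 120))  # three purchases out of 120 product views
print(conversion_rate(0, 0))    # None: no data, not a 0% conversion rate
```

Separating "no data" from "0%" matters for gating: a quiet window should not be read as a conversion collapse.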

Method for Injecting Custom Canary Identification Attributes: Call newrelic.setCustomAttribute('userGroup', 'canary') in the New Relic Browser agent initialization code. Server-side transactions identify deployment groups using the APM agent's addCustomAttribute('deploymentGroup', 'canary') API or the environment variable NEW_RELIC_METADATA_KUBERNETES_DEPLOYMENT_NAME.

Pros and Cons Analysis

Advantages

Item	Content
Automated Deployment Gating	Automatic rollback without human intervention when business metrics violate their thresholds
Declarative GitOps Management	Version control of the entire A/B pipeline in Git with a single Canary CR YAML
Business KPI Integration	Utilize conversion rate and session length directly as deployment criteria, going beyond error rate and latency
Session Affinity Guaranteed	Same users experience the same version throughout the experiment period via header/cookie routing
Multi-metric AND Validation	Multi-layered validation possible by combining Prometheus, New Relic, and Datadog results with an AND condition

Disadvantages and Precautions

Item	Content	Response Plan
NRQL Single Value Constraint	MetricTemplate queries must return exactly one float64. `TIMESERIES`, `FACET`, or multiple columns cause parsing errors	Run the Query Builder pre-validation step in Example 3 and convert the query to a single-value form
Data Collection Delay	New Relic event ingestion lags by tens of seconds to minutes	Keep `interval` at `1m` or longer; use `5m` or longer for business metrics
Traffic representativeness bias	Beta header users may not be representative of the whole user base	Validate first with QA and internal staff, then expand gradually to the actual beta user group
Istio Sidecar Overhead	Envoy Proxy Latency and Memory Burden	Consider Istio Ambient Mesh Migration for High-Traffic Services
Secret Namespace Constraints	`secretRef` can only reference Secrets within the same namespace	Centralized management via External Secrets Operator or Vault Agent integration when operating with multiple namespaces

Istio Ambient Mesh: A way of building a service mesh without sidecars, using node-level L4 proxies (ztunnel) and optional L7 waypoint proxies. Ambient mode reached GA in Istio 1.24 (November 2024) and removes the per-pod sidecar overhead.

The Most Common Mistakes in Practice

Using TIMESERIES or FACET as-is in a MetricTemplate query: New Relic returns an array or multiple rows, so Flagger fails to parse the result and the analysis fails. Convert the query into an aggregate form that returns a single number. Diagnosis: check the events in kubectl describe canary my-app -n prod for errors such as "unexpected type" or "no values found".
Setting progressDeadlineSeconds too short (e.g., 120 seconds): If pod startup takes 60 seconds, the deadline expires before the first analysis interval (1m) completes and an immediate rollback follows. Set it well above pod startup time + first analysis interval (at least 300 seconds recommended). Diagnosis: look for a "deadline exceeded" message at the Progressing stage in kubectl describe canary my-app -n prod.
Setting analysis.interval too short (30 seconds or less): New Relic has not yet ingested the events, so the query returns 0 or null; because that value falls outside the threshold range, unnecessary rollbacks occur repeatedly. Diagnosis: run the query with SINCE 30 seconds ago in Query Builder and confirm that it returns no data.
Setting a fixed threshold without reviewing time-of-day or seasonal patterns: Conversion rates naturally vary significantly between weekday mornings and weekend evenings. A narrow thresholdRange causes false-positive rollbacks of healthy deployments. First identify the normal range from at least 2 to 4 weeks of historical data and leave sufficient margin below the lower limit. Diagnosis: check the distribution over time with SINCE 4 weeks ago TIMESERIES 1 day.
Aggregating all traffic without the custom attribute for canary identification: If you filter only on appName without the userAttributes.userGroup = 'canary' condition, primary and canary traffic are aggregated together, which dilutes the comparison. Diagnosis: first check in Query Builder that FACET userAttributes.userGroup separates the groups.
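
For the fixed-threshold mistake, one way to derive a lower bound from history is sketched below. Everything here is an assumption for illustration: the function name `suggest_min_threshold`, the choice of the 5th percentile, the 10% margin, and the sample values are not from any real system. In practice the daily values would come from a query such as the 4-week TIMESERIES check mentioned above.

```python
import statistics
from typing import Sequence


def suggest_min_threshold(daily_rates: Sequence[float], margin: float = 0.10) -> float:
    """Suggest thresholdRange.min as the 5th percentile of history minus a margin."""
    if len(daily_rates) < 14:
        raise ValueError("need at least 14 daily samples (2 weeks)")
    # quantiles(n=20) returns 19 cut points; the first one is the 5th percentile.
    p5 = statistics.quantiles(daily_rates, n=20, method="inclusive")[0]
    return round(p5 * (1 - margin), 2)


# Hypothetical daily conversion rates (%) for two weeks
history = [3.1, 2.9, 3.4, 3.0, 2.8, 3.6, 3.8, 3.2, 3.0, 3.3, 2.9, 3.1, 3.7, 3.9]
print(suggest_min_threshold(history))
```

A percentile-based bound is only a starting point for the conversation with the PM; the agreed value still belongs in the Canary CR so that it is reviewed like any other change.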

In Conclusion

The Flagger + Istio + New Relic NRQL stack is a practical path to elevating "deployment" to "product experiment automation." A pipeline that rolls back when the conversion rate drops, even if the error rate is normal, lets SREs and PMs agree on KPIs through the same YAML thresholdRange.

3 Steps to Start Right Now:

NRQL Validation in New Relic Query Builder: Check the app's PageAction, PageView event schemas and verify that the conversion rate query returns a single float64 (SELECT filter(count(*), WHERE action='purchase_complete') / filter(count(*), WHERE action='product_view') * 100 FROM PageAction WHERE appName='<your-app-name-in-newrelic>' SINCE 5 minutes ago)
Deploy MetricTemplate and Secret: Apply the verified NRQL to the MetricTemplate CRD and register the New Relic Insights Query Key as kubectl create secret generic newrelic-credentials -n prod --from-literal=newrelic_account_id=<account-id> --from-literal=newrelic_query_key=<query-key>.
Connect analysis.match and metrics to Canary CR: Apply the Canary CR to the existing Deployment and monitor the analysis progress status in real-time using kubectl describe canary my-app -n prod.

Next Post: How to Build an A/B Test Gating Layer Robust to Natural Variation by Connecting a Statistical Significance Service to Flagger webhooks

Reference Materials

Flagger + Istio A/B Routing: Integrating New Relic NRQL with Conversion Rate as Distribution Gating Criteria | DEV BAK - 기술블로그

Flagger + Istio A/B Routing: Integrating New Relic NRQL with Conversion Rate as Distribution Gating Criteria

Prerequisites

Flagger installation completed on Kubernetes cluster
Istio Service Mesh in operation (Sidecar injection enabled)
Own a New Relic Account and Issue Insights Query API Key
Integration of New Relic APM Agent and Browser Agent with the application is complete MetricTemplate operates only if PageAction, PageView events are being collected in New Relic)

Key Concepts

Flagger's Progressive Delivery Pipeline

Canary CR 변경 감지
    → Istio VirtualService 생성/수정 (헤더 매칭 라우팅)
    → 분석 인터벌마다 MetricTemplate으로 NRQL 쿼리 실행
    → 임계값 통과 시 배포 승인 / 초과 시 자동 롤백

HTTP Header-based A/B Routing vs. Weighted Canary

The two strategies have different purposes.

Classification	Weighted Canary	HTTP Header A/B Routing
Routing Criteria	Traffic Percentage (e.g., 10%)	Request Header/Cookie Value
User Consistency	Difficult to Guarantee	Same User → Same Version (Session Affinity)
Suitable Services	Backend API	Frontend, Payment Flow
Analysis Purpose	Stability Verification	Business KPI Comparison

Integration Structure of NRQL and Flagger MetricTemplate

Pre-preparation Checklist

Before applying this in practice, let's check the following items in order.

Item	How to check
Install Flagger	`kubectl get pods -n flagger-system`
Enable Istio Sidecar Injection	`kubectl get ns prod --show-labels` → `istio-injection=enabled` Confirm
New Relic APM Integration	New Relic UI → APM → Check App Name List
New Relic Browser Agent	New Relic UI → Browser → `PageView` / `PageAction` Verify Event Collection
Insights Query API Key	New Relic UI → API keys → Create Ingest/Query Key
Inject Canary Identification Custom Attribute	Check if `newrelic.setCustomAttribute('userGroup', 'canary')` is called in Browser Agent

Practical Application

Example 1: Flagger Canary CR — Full HTTP Header A/B Routing Configuration

yaml

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
  namespace: prod
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  # 카나리 파드 시작 시간 + 첫 분석 인터벌 합산보다 충분히 크게 설정
  # 120초처럼 너무 짧으면 첫 분석이 완료되기 전에 타임아웃되어 즉시 롤백됨
  progressDeadlineSeconds: 300
  service:
    port: 80
    targetPort: 8080
    gateways:
      - istio-system/public-gateway
    hosts:
      - app.example.com
    trafficPolicy:
      tls:
        mode: ISTIO_MUTUAL
  analysis:
    interval: 1m
    threshold: 5          # 연속 실패 허용 횟수 초과 시 롤백
    iterations: 10        # 총 10회 분석 통과 시 배포 완료
    match:
      - headers:
          x-user-group:
            exact: "beta"
      - headers:
          cookie:
            regex: ".*canary=true.*"   # 헤더 조건과 OR로 동작 (둘 중 하나만 일치해도 카나리로 라우팅)
    metrics:
      - name: error-rate
        templateRef:
          name: newrelic-error-rate
          namespace: prod
        thresholdRange:
          max: 5          # 에러율 5% 초과 시 롤백
        interval: 1m
      - name: conversion-rate
        templateRef:
          name: newrelic-conversion-rate
          namespace: prod
        thresholdRange:
          min: 2.5        # 전환율 2.5% 미만 시 롤백
        interval: 5m      # NRQL {{ interval }} 변수를 300초로 설정 (넓은 집계 윈도우)
      - name: session-duration
        templateRef:
          name: newrelic-session-duration
          namespace: prod
        thresholdRange:
          min: 120        # 평균 세션 120초 미만 시 롤백
        interval: 5m
    webhooks:
      # flagger-loadtester의 실제 Service 엔드포인트로 교체하세요
      # 기본값: kubectl get svc -n <loadtester-namespace> | grep flagger-loadtester
      - name: load-test
        url: http://flagger-loadtester.prod/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://my-app.prod/"

Field	Description
`progressDeadlineSeconds`	The maximum time a canary pod must be in the Ready state. Must be set sufficiently longer than the pod start time.
`analysis.match`	Multiple items in an array operate on OR conditions. If at least one matches, route to the canary.
`analysis.interval`	Frequency at which Flagger evaluates metrics
Metric Level `interval`	NRQL `{{ interval }}` Variable Substitution Value. Controls only the `SINCE` aggregation scope of the query, independent of the evaluation cycle
`thresholdRange.min`	If the return value is less than this value, it is judged as a failure
`thresholdRange.max`	If the return value exceeds this value, it is judged as a failure

Example 2: Istio VirtualService — Routing structure automatically generated by Flagger

yaml

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
  namespace: prod
spec:
  gateways:
    - istio-system/public-gateway
  hosts:
    - app.example.com
  http:
    # 1순위: 헤더 매칭 → 카나리(신버전) 서비스
    - match:
        - headers:
            x-user-group:
              exact: "beta"
      route:
        - destination:
            host: my-app-canary
            port:
              number: 80
          weight: 100
    # 2순위: 기본 트래픽 → 안정(primary) 서비스
    - route:
        - destination:
            host: my-app-primary
            port:
              number: 80
          weight: 100

Example 3: NRQL Query Pre-validation — Essential Step Before Applying MetricTemplate

sql

-- [검증용] 전환율: userGroup 커스텀 속성 기반 그룹별 비교
-- FACET/TIMESERIES 포함 — 대시보드 시각화용이며 MetricTemplate에는 사용 불가
SELECT
  filter(count(*), WHERE action = 'purchase_complete') /
  filter(count(*), WHERE action = 'product_view') * 100 AS 'ConversionRate'
FROM PageAction
WHERE appName = '<your-app-name-in-newrelic>'
FACET userAttributes.userGroup
TIMESERIES 5 minutes
SINCE 1 hour ago
 
-- [검증용] 세션 길이: 백분위수 포함 (A/B 그룹 비교)
SELECT average(session.duration) AS 'AvgSessionSec',
       percentile(session.duration, 50, 90, 99) AS 'P50/P90/P99'
FROM PageView
WHERE appName = '<your-app-name-in-newrelic>'
FACET userAttributes.userGroup
SINCE 30 minutes ago
 
-- [검증용] 세션당 페이지뷰 수
SELECT count(*) / uniqueCount(session) AS 'PageviewsPerSession'
FROM PageView
WHERE appName = '<your-app-name-in-newrelic>'
FACET userAttributes.userGroup
SINCE 1 hour ago
 
-- [검증용] 장바구니 이탈율
SELECT
  filter(uniqueCount(session), WHERE action = 'cart_abandon') /
  uniqueCount(session) * 100 AS 'CartAbandonRate'
FROM PageAction
WHERE appName = '<your-app-name-in-newrelic>'
FACET userAttributes.userGroup
TIMESERIES 10 minutes
SINCE 2 hours ago

Example 4: New Relic MetricTemplate — Conversion Rate · Session Length · Error Rate

The authentication secret and the three MetricTemplates must all be deployed in the same namespace. This is because secretRef can only reference secrets in the same namespace.

Secret Generation (kubectl method recommended)

kubectl create secret generic newrelic-credentials \
  -n prod \
  --from-literal=newrelic_account_id=<your-account-id> \
  --from-literal=newrelic_query_key=<your-insights-query-key>

If you manage with YAML, using stringData allows Kubernetes to automatically base64 encode even if written in plain text:

yaml

apiVersion: v1
kind: Secret
metadata:
  name: newrelic-credentials
  namespace: prod
type: Opaque
stringData:
  newrelic_account_id: "your-account-id-here"
  newrelic_query_key: "your-insights-query-key-here"

MetricTemplate Deployment

yaml

---
# 전환율 MetricTemplate
# 전제: Browser 에이전트에서 newrelic.setCustomAttribute('userGroup', 'canary')를
#       카나리로 라우팅된 사용자에게 주입해야 이 쿼리가 동작한다
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: newrelic-conversion-rate
  namespace: prod
spec:
  provider:
    type: newrelic
    secretRef:
      name: newrelic-credentials
  query: |
    SELECT
      IF(
        filter(count(*), WHERE action = 'product_view') > 0,
        filter(count(*), WHERE action = 'purchase_complete') /
        filter(count(*), WHERE action = 'product_view') * 100,
        null
      )
    FROM PageAction
    WHERE appName = '{{ target }}'
    AND userAttributes.userGroup = 'canary'
    SINCE {{ interval }} seconds ago
---
# 세션 길이 MetricTemplate
# 전제: Browser 에이전트에서 newrelic.setCustomAttribute('userGroup', 'canary')를
#       카나리로 라우팅된 사용자에게 주입해야 이 쿼리가 동작한다
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: newrelic-session-duration
  namespace: prod
spec:
  provider:
    type: newrelic
    secretRef:
      name: newrelic-credentials
  query: |
    SELECT average(session.duration)
    FROM PageView
    WHERE appName = '{{ target }}'
    AND userAttributes.userGroup = 'canary'
    SINCE {{ interval }} seconds ago
---
# 에러율 MetricTemplate
# httpResponseCode >= 500: 서버 5xx 에러 기준
# 4xx도 포함하려면 >= 400으로 변경하고 팀 기준에 맞게 조정
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: newrelic-error-rate
  namespace: prod
spec:
  provider:
    type: newrelic
    secretRef:
      name: newrelic-credentials
  query: |
    SELECT
      filter(count(*), WHERE httpResponseCode >= 500) /
      count(*) * 100
    FROM Transaction
    WHERE appName = '{{ target }}'
    SINCE {{ interval }} seconds ago

Item	Description
Use custom attributes instead of	`userAttributes.userGroup = 'canary'`
`IF(...) > 0, ..., null`	Prevent division errors when denominator (ProductView) is 0. When `null` returns, Flagger skips the evaluation of that interval
`httpResponseCode >= 500`	Only server errors (5xx) are counted. Whether to include 4xx is determined on a team basis depending on service characteristics
`secretRef`	Can only reference Secrets in the same namespace as the MetricTemplate

Pros and Cons Analysis

Advantages

Item	Content
Automated Deployment Gating	Immediate automatic rollback without human intervention if business metrics exceed thresholds
Declarative GitOps Management	Version control of the entire A/B pipeline in Git with a single Canary CR YAML
Business KPI Integration	Utilize conversion rate and session length directly as deployment criteria, going beyond error rate and latency
Session Affinity Guaranteed	Same users experience the same version throughout the experiment period via header/cookie routing
Multi-metric AND Validation	Multi-layered validation possible by combining Prometheus, New Relic, and Datadog results with an AND condition

Disadvantages and Precautions

Item	Content	Response Plan
NRQL Single Value Constraint	MetricTemplate queries must return only one float64. `TIMESERIES` · Parsing errors when returning multiple columns	Must perform the Query Builder pre-validation step in Example 3 and convert to a single value format
Data Collection Delay	Delay of tens of seconds to minutes exists in New Relic event collection	`interval` minimum `1m`, business metrics set to `5m` or higher
Traffic representativeness bias	Beta header users may not be representative of the whole	Validated first with QA and internal staff, then gradually expanded to the actual beta user group
Istio Sidecar Overhead	Envoy Proxy Latency and Memory Burden	Consider Istio Ambient Mesh Migration for High-Traffic Services
Secret Namespace Constraints	`secretRef` can only reference Secrets within the same namespace	Centralized management via External Secrets Operator or Vault Agent integration when operating with multiple namespaces

The Most Common Mistakes in Practice

Using TIMESERIES or FACET as is in a MetricTemplate query: New Relic returns an array or multiple rows, so Flagger fails to parse the result and aborts the analysis. The query must be converted into an aggregate form that returns a single number. Diagnosis: Check the output of kubectl describe canary my-app -n prod for errors such as "unexpected type" or "no values found".
Setting progressDeadlineSeconds too short (e.g., 120 seconds): If Pod startup takes 60 seconds, the deadline expires and a rollback is triggered before the first analysis interval (1m) completes. Set it well above the sum of Pod startup time and the first analysis interval (at least 300 seconds is recommended). Diagnosis: Look for a "deadline exceeded" message in the Progressing status shown by kubectl describe canary my-app -n prod.
Setting analysis.interval too short (30 seconds or less): New Relic events have not been collected yet, so the query returns 0 or null; because that value falls outside the thresholdRange, unnecessary rollbacks occur repeatedly. Diagnosis: Run the query with SINCE 30 seconds ago in Query Builder and confirm that it returns no data.
Setting a fixed threshold without reviewing time-of-day patterns or seasonality: Conversion rates naturally vary significantly between weekday mornings and weekend evenings. A narrow thresholdRange causes false-positive rollbacks of healthy deployments. First identify the normal range using at least 2 to 4 weeks of historical data, and leave sufficient margin below the lower limit. Diagnosis: Check the distribution over time using SINCE 4 weeks ago TIMESERIES 1 day.
Aggregating all traffic without the custom attribute for canary identification: If you filter only by appName without the userAttributes.userGroup = 'canary' condition, primary and canary traffic are aggregated together, which dilutes the canary's signal in the comparison. Diagnosis: First check whether the groups are separated by using FACET userAttributes.userGroup in the Query Builder.
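
Two of the rules of thumb above, sizing progressDeadlineSeconds and deriving a lower threshold from historical data, reduce to small calculations. The following Python sketch illustrates them under stated assumptions: the function names, the 2x headroom factor, the 5th percentile, and the 10% margin are hypothetical choices for illustration, not values prescribed by Flagger or New Relic. Only the 300-second floor comes from the recommendation above.

```python
import math
from typing import Sequence


def min_progress_deadline(pod_startup_s: int, analysis_interval_s: int,
                          headroom: float = 2.0, floor_s: int = 300) -> int:
    """Suggest a progressDeadlineSeconds value.

    Takes Pod startup time plus the first analysis interval, multiplies by an
    assumed headroom factor, and never goes below the recommended floor.
    """
    needed = math.ceil((pod_startup_s + analysis_interval_s) * headroom)
    return max(needed, floor_s)


def lower_threshold(daily_rates: Sequence[float], percentile: float = 5.0,
                    margin: float = 0.1) -> float:
    """Derive a candidate thresholdRange.min from historical conversion rates.

    Uses a nearest-rank low percentile as the baseline and subtracts a
    relative margin so normal variation does not trigger a rollback.
    """
    if not daily_rates:
        raise ValueError("need at least one historical conversion rate")
    ordered = sorted(daily_rates)
    # Nearest-rank percentile: the smallest observation with at least
    # `percentile` percent of the data at or below it.
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    baseline = ordered[rank - 1]
    return baseline * (1 - margin)


if __name__ == "__main__":
    # Hypothetical daily conversion rates (percent), for illustration only.
    history = [3.0, 3.2, 2.9, 3.5, 3.1, 2.8, 3.4, 3.3, 3.0, 3.6]
    print("progressDeadlineSeconds:", min_progress_deadline(60, 60))
    print("thresholdRange.min candidate:", round(lower_threshold(history), 2))
```

In practice the input would be 2 to 4 weeks of rates exported from New Relic, and the resulting number is a starting point to agree on with the PM rather than a final threshold.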

In Conclusion

3 Steps to Start Right Now:

NRQL Validation in New Relic Query Builder: Check the app's PageAction and PageView event schemas and verify that the conversion rate query returns a single float64 (SELECT filter(count(*), WHERE action='purchase_complete') / filter(count(*), WHERE action='product_view') * 100 FROM PageAction WHERE appName='<your-app-name-in-newrelic>' SINCE 5 minutes ago)
Deploy MetricTemplate and Secret: Apply the verified NRQL to the MetricTemplate CRD and register the New Relic Insights Query Key with kubectl create secret generic newrelic-credentials -n prod --from-literal=newrelic_account_id=<account-id> --from-literal=newrelic_query_key=<query-key>.
Connect analysis.match and metrics to Canary CR: Apply the Canary CR to the existing Deployment and monitor the analysis progress status in real-time using kubectl describe canary my-app -n prod.

Next Post: How to Build an A/B Test Gating Layer Robust to Natural Variation by Connecting a Statistical Significance Service to Flagger webhooks
