Flagger + Istio A/B Routing: Using New Relic NRQL Conversion Rate as a Deployment Gating Criterion
Deployment is no longer a mere act of "uploading and watching." In a modern production environment, a release is a moment of hypothesis testing, requiring data to prove that the new version actually improves business metrics. However, many teams still judge the success or failure of a deployment solely based on error rates and latency. Whether users purchased more products or maintained longer sessions is not visible on the Prometheus dashboard.
After reading this, you can configure a production pipeline that rolls back automatically, without human intervention, when the canary's conversion rate falls below 2.5%. By connecting a New Relic NRQL query to Flagger's MetricTemplate, you can use business KPIs such as conversion rate and session length directly as gating conditions for deployment automation. The change is not only technical. SREs and PMs can discuss the Canary CR's thresholdRange directly, and a Git commit shows that the criterion the PM agreed to ("conversion rate above 2.5%") is exactly what the automation enforces. Infrastructure engineering and product analysis meet on the same declarative configuration file.
This article covers the entire pipeline step by step, from writing the Flagger Canary CR and understanding the Istio VirtualService structure to NRQL pre-validation and configuring a New Relic MetricTemplate. Before starting, make sure the following prerequisites are met.
Prerequisites
- Flagger installed on the Kubernetes cluster
- Istio service mesh running (sidecar injection enabled)
- A New Relic account with an Insights Query API key issued
- New Relic APM agent and Browser agent integrated with the application (the MetricTemplates work only if PageAction and PageView events are being collected in New Relic)
Key Concepts
Flagger's Progressive Delivery Pipeline
Flagger is a Kubernetes operator that declaratively manages the entire A/B test pipeline with a single Canary CR (Custom Resource). When a developer changes the Deployment image, Flagger detects the change and automatically runs the following flow.
Detect a change to the Canary CR target
→ Create/update the Istio VirtualService (header-match routing)
→ Run the NRQL query through the MetricTemplate at every analysis interval
→ Promote the release if the thresholds pass / roll back automatically if they are violated
MetricTemplate: A CRD that lets Flagger send queries to external metric providers (New Relic, Datadog, Prometheus, etc.). The query result must be a single float64 value, which is compared to the thresholdRange in the metrics field of the Canary.
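The flow above can be modeled in a few lines. The following Python sketch is a simplified illustration, not Flagger's implementation: the function names (within_range, simulate_analysis) are hypothetical, each sample stands for the single float64 returned in one analysis interval, and the rule that a rollback triggers once failed checks reach the threshold is an assumption of this model.

```python
from typing import Optional, Sequence


def within_range(value: float,
                 min_value: Optional[float] = None,
                 max_value: Optional[float] = None) -> bool:
    """Return True when value satisfies the optional min/max bounds."""
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    return True


def simulate_analysis(samples: Sequence[float],
                      min_value: Optional[float] = None,
                      max_value: Optional[float] = None,
                      threshold: int = 5,
                      iterations: int = 10) -> str:
    """Consume one metric sample per analysis interval.

    Returns "promoted" after `iterations` passing checks, "rolled_back" once
    the number of failed checks reaches `threshold`, and "in_progress" if the
    samples run out before either happens.
    """
    passed = 0
    failed = 0
    for value in samples:
        if within_range(value, min_value, max_value):
            passed += 1
            if passed >= iterations:
                return "promoted"
        else:
            failed += 1
            if failed >= threshold:
                return "rolled_back"
    return "in_progress"
```

With min_value=2.5, ten samples of 3.0 end in promoted, while five samples of 2.0 end in rolled_back.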
HTTP Header-based A/B Routing vs. Weighted Canary
The two strategies have different purposes.
| Classification | Weighted Canary | HTTP Header A/B Routing |
|---|---|---|
| Routing Criteria | Traffic Percentage (e.g., 10%) | Request Header/Cookie Value |
| User Consistency | Difficult to Guarantee | Same User → Same Version (Session Affinity) |
| Suitable Services | Backend API | Frontend, Payment Flow |
| Analysis Purpose | Stability Verification | Business KPI Comparison |
The header-based method routes only users with the x-user-group: beta header to the new version. If this header is injected into specific user segments in API Gateway or BFF, beta user groups can consistently experience the new version, enabling accurate comparison testing.
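How the header gets injected is left to the API Gateway or BFF. As a minimal sketch, assuming a stable user ID is available at that layer, a deterministic hash keeps each user in the same group across requests. The function names, the salt value, and the 10% default are hypothetical choices for illustration only.

```python
import hashlib
from typing import Dict


def assign_user_group(user_id: str, beta_percent: int = 10,
                      salt: str = "checkout-experiment") -> str:
    """Map a user ID to "beta" or "stable" deterministically.

    The same (salt, user_id) pair always lands in the same bucket, so a user
    keeps seeing the same version for the whole experiment.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # bucket in the range 0..99
    return "beta" if bucket < beta_percent else "stable"


def routing_headers(user_id: str, beta_percent: int = 10) -> Dict[str, str]:
    """Return the headers a gateway/BFF would add before forwarding."""
    if assign_user_group(user_id, beta_percent) == "beta":
        return {"x-user-group": "beta"}
    return {}
```

Changing the salt reshuffles the buckets, which is one way to avoid reusing the same beta population for every experiment.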
Integration Structure of NRQL and Flagger MetricTemplate
NRQL (New Relic Query Language) queries New Relic's MELT data (Metrics, Events, Logs, Traces) using syntax similar to SQL. At each analysis interval, Flagger sends the NRQL defined in MetricTemplate to the New Relic Insights Query API and compares the returned single numeric value with thresholdRange.
MELT data: New Relic's four telemetry data types (Metrics, Events, Logs, Traces). The data used in this article are the event types PageAction (user behavior events), PageView (page views), and Transaction (server transactions), plus Metric (numeric measurements). PageAction and PageView are the main sources for business metric analysis.
Flagger template variables such as {{ target }} and {{ interval }} can be used in NRQL queries. {{ target }} is replaced with the app name in Canary, and {{ interval }} is replaced with the interval setting value (in seconds) of the corresponding metric.
Relationship between the metric-level interval and the analysis cycle: In the Canary CR, analysis.interval: 1m is the cycle on which Flagger evaluates the metrics. For an individual entry in the metrics array, interval: 5m is the value (300 seconds) that Flagger substitutes for the NRQL {{ interval }} variable. In other words, Flagger still runs the query every minute, but each query's SINCE range covers 5 minutes (300 seconds). This pattern gives more stable values through a wider aggregation window when short-term samples are volatile, as business metrics often are.
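The arithmetic behind this relationship can be sketched as follows. This Python snippet follows the substitution behavior as this article describes it and is not Flagger's template engine; interval_seconds, render_query, and lookback_windows are hypothetical helper names, and only the s/m/h units are handled.

```python
import re
from typing import List, Tuple

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def interval_seconds(interval: str) -> int:
    """Convert an interval such as "1m" or "5m" to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", interval)
    if match is None:
        raise ValueError(f"unsupported interval: {interval!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def render_query(template: str, target: str, metric_interval: str) -> str:
    """Fill in {{ target }} and {{ interval }} as described in the text."""
    seconds = str(interval_seconds(metric_interval))
    return (template
            .replace("{{ target }}", target)
            .replace("{{ interval }}", seconds))


def lookback_windows(analysis_interval: str, metric_interval: str,
                     checks: int) -> List[Tuple[int, int]]:
    """Return (start, end) offsets in seconds, relative to analysis start,
    for the first `checks` evaluations."""
    step = interval_seconds(analysis_interval)
    span = interval_seconds(metric_interval)
    return [(i * step - span, i * step) for i in range(1, checks + 1)]
```

For analysis.interval 1m and a metric interval of 5m, consecutive windows overlap by 240 seconds, which is why one bad minute influences several consecutive checks.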
Preparation Checklist
Before applying this in practice, let's check the following items in order.
| Item | How to check |
|---|---|
| Install Flagger | kubectl get pods -n flagger-system |
| Enable Istio Sidecar Injection | kubectl get ns prod --show-labels → confirm istio-injection=enabled |
| New Relic APM Integration | New Relic UI → APM → check that the app name appears in the list |
| New Relic Browser Agent | New Relic UI → Browser → verify that PageView / PageAction events are collected |
| Insights Query API Key | New Relic UI → API keys → create a query key |
| Inject Canary Identification Custom Attribute | Check that newrelic.setCustomAttribute('userGroup', 'canary') is called in the Browser agent |
Important: If the Browser agent does not collect PageAction and PageView events, the NRQL queries in the conversion-rate and session-duration MetricTemplates will always return null, and the analysis will fail. Proceed to the next step only after every item on the checklist passes.
Practical Application
Example 1: Flagger Canary CR — Full HTTP Header A/B Routing Configuration
This is the full configuration in which requests carrying the x-user-group: beta header (or the canary=true cookie) are routed to the canary (new version) and all other requests go to the stable (primary) version. The deployment proceeds only if all three metrics (error rate, conversion rate, and session length) pass, because they are combined as an AND condition.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
  namespace: prod
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  # Set this comfortably larger than canary pod start time + the first analysis interval
  # If it is too short (e.g., 120s), it times out before the first analysis completes and rolls back immediately
  progressDeadlineSeconds: 300
  service:
    port: 80
    targetPort: 8080
    gateways:
      - istio-system/public-gateway
    hosts:
      - app.example.com
    trafficPolicy:
      tls:
        mode: ISTIO_MUTUAL
  analysis:
    interval: 1m
    threshold: 5    # maximum number of failed checks before rollback
    iterations: 10  # deployment completes after 10 passing analysis runs
    match:
      - headers:
          x-user-group:
            exact: "beta"
      - headers:
          cookie:
            regex: ".*canary=true.*"  # ORed with the header condition (either match routes to the canary)
    metrics:
      - name: error-rate
        templateRef:
          name: newrelic-error-rate
          namespace: prod
        thresholdRange:
          max: 5  # roll back when the error rate exceeds 5%
        interval: 1m
      - name: conversion-rate
        templateRef:
          name: newrelic-conversion-rate
          namespace: prod
        thresholdRange:
          min: 2.5  # roll back when the conversion rate is below 2.5%
        interval: 5m  # sets the NRQL {{ interval }} variable to 300 seconds (wider aggregation window)
      - name: session-duration
        templateRef:
          name: newrelic-session-duration
          namespace: prod
        thresholdRange:
          min: 120  # roll back when the average session is shorter than 120 seconds
        interval: 5m
    webhooks:
      # Replace with the actual Service endpoint of your flagger-loadtester
      # To find it: kubectl get svc -n <loadtester-namespace> | grep flagger-loadtester
      - name: load-test
        url: http://flagger-loadtester.prod/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://my-app.prod/"

| Field | Description |
|---|---|
| progressDeadlineSeconds | Maximum time in seconds for the canary deployment to make progress (pods becoming Ready) before it is rolled back. Set it comfortably longer than the pod start time. |
| analysis.match | Multiple entries in the array are combined with OR. If at least one matches, the request is routed to the canary. |
| analysis.interval | How often Flagger evaluates the metrics |
| Metric-level interval | Value substituted for the NRQL {{ interval }} variable. It controls only the SINCE aggregation window of the query, independently of the evaluation cycle |
| thresholdRange.min | A returned value below this is judged a failure |
| thresholdRange.max | A returned value above this is judged a failure |
analysis.match OR vs. AND caution: Multiple entries in the match array (header condition A, cookie condition B) are combined with OR, so a request is routed to the canary if at least one of them matches. By contrast, multiple header conditions listed inside a single match block of an Istio VirtualService are evaluated with AND. Be careful not to confuse the two structures.
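The OR/AND distinction is easier to see in executable form. The following Python sketch models the semantics described above rather than Istio's matcher: routes_to_canary and header_matches are hypothetical names, header names are assumed to be lowercase already, and regex rules are assumed to match the whole header value.

```python
import re
from typing import Dict, List, Mapping, Optional

HeaderRule = Dict[str, str]         # {"exact": "beta"} or {"regex": ".*canary=true.*"}
MatchBlock = Dict[str, HeaderRule]  # header name -> rule


def header_matches(rule: HeaderRule, value: Optional[str]) -> bool:
    """Evaluate one header rule against a header value (None = header absent)."""
    if value is None:
        return False
    if "exact" in rule:
        return value == rule["exact"]
    if "regex" in rule:
        return re.fullmatch(rule["regex"], value) is not None
    return False


def routes_to_canary(match_blocks: List[MatchBlock],
                     headers: Mapping[str, str]) -> bool:
    """Blocks are ORed together; header rules inside one block are ANDed."""
    return any(
        all(header_matches(rule, headers.get(name)) for name, rule in block.items())
        for block in match_blocks
    )


# Two separate blocks (OR), mirroring the Canary CR in Example 1.
OR_BLOCKS = [
    {"x-user-group": {"exact": "beta"}},
    {"cookie": {"regex": ".*canary=true.*"}},
]

# One block with two header rules (AND).
AND_BLOCK = [
    {"x-user-group": {"exact": "beta"}, "cookie": {"regex": ".*canary=true.*"}},
]
```

A request carrying only x-user-group: beta satisfies OR_BLOCKS but not AND_BLOCK, which is exactly the confusion the caution warns about.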
Example 2: Istio VirtualService — Routing structure automatically generated by Flagger
Flagger automatically generates and manages the following VirtualService based on the Canary CR. You do not need to write this file by hand, but you need to understand its structure to troubleshoot routing and to read Kiali visualizations correctly.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
  namespace: prod
spec:
  gateways:
    - istio-system/public-gateway
  hosts:
    - app.example.com
  http:
    # Priority 1: header match → canary (new version) service
    - match:
        - headers:
            x-user-group:
              exact: "beta"
      route:
        - destination:
            host: my-app-canary
            port:
              number: 80
          weight: 100
    # Priority 2: default traffic → stable (primary) service
    - route:
        - destination:
            host: my-app-primary
            port:
              number: 80
          weight: 100

my-app-canary vs. my-app-primary: Flagger replicates the original Deployment (my-app) to my-app-primary and exposes the new version's pods as my-app-canary. Users are routed to one of these two Services.
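The order of the two http rules matters because the first rule that matches a request is the one that is used. A minimal Python sketch of that first-match behavior, using a hypothetical pick_destination helper and exact header matches only:

```python
from typing import Dict, List, Mapping, Optional, Tuple

# (required headers or None for a catch-all, destination host)
Rule = Tuple[Optional[Dict[str, str]], str]

RULES: List[Rule] = [
    ({"x-user-group": "beta"}, "my-app-canary"),  # priority 1: header match
    (None, "my-app-primary"),                      # priority 2: default route
]


def pick_destination(rules: List[Rule],
                     headers: Mapping[str, str]) -> Optional[str]:
    """Return the destination of the first rule that matches the request."""
    for required, destination in rules:
        if required is None or all(headers.get(k) == v for k, v in required.items()):
            return destination
    return None  # no rule matched
```

If the catch-all rule were listed first, it would shadow the header rule and no request would ever reach my-app-canary.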
Example 3: NRQL Query Pre-validation — Essential Step Before Applying MetricTemplate
Before applying NRQL to MetricTemplate, be sure to check the data types and value ranges by running the query below in New Relic Query Builder. If no data is retrieved, do not proceed to the next step; instead, check the New Relic agent integration status first.
-- [Validation] Conversion rate: per-group comparison based on the userGroup custom attribute
-- Includes FACET/TIMESERIES: for dashboard visualization only, not usable in a MetricTemplate
SELECT
filter(count(*), WHERE action = 'purchase_complete') /
filter(count(*), WHERE action = 'product_view') * 100 AS 'ConversionRate'
FROM PageAction
WHERE appName = '<your-app-name-in-newrelic>'
FACET userAttributes.userGroup
TIMESERIES 5 minutes
SINCE 1 hour ago

-- [Validation] Session length: includes percentiles (A/B group comparison)
SELECT average(session.duration) AS 'AvgSessionSec',
percentile(session.duration, 50, 90, 99) AS 'P50/P90/P99'
FROM PageView
WHERE appName = '<your-app-name-in-newrelic>'
FACET userAttributes.userGroup
SINCE 30 minutes ago

-- [Validation] Page views per session
SELECT count(*) / uniqueCount(session) AS 'PageviewsPerSession'
FROM PageView
WHERE appName = '<your-app-name-in-newrelic>'
FACET userAttributes.userGroup
SINCE 1 hour ago

-- [Validation] Cart abandonment rate
SELECT
filter(uniqueCount(session), WHERE action = 'cart_abandon') /
uniqueCount(session) * 100 AS 'CartAbandonRate'
FROM PageAction
WHERE appName = '<your-app-name-in-newrelic>'
FACET userAttributes.userGroup
TIMESERIES 10 minutes
SINCE 2 hours ago

MetricTemplate conversion rule: Remove the FACET and TIMESERIES clauses from the dashboard query so that it returns a single number, then apply it. Tie the window to the Flagger interval with SINCE {{ interval }} seconds ago. If userAttributes.userGroup contains no data, check that the Browser agent is calling newrelic.setCustomAttribute('userGroup', 'canary') correctly.
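The conversion rule can be applied mechanically when a query is laid out one clause per line, as the validation queries are. The sketch below is a hypothetical helper (to_single_value_query), not a New Relic or Flagger tool. It assumes WHERE is the last clause left after FACET, TIMESERIES, and SINCE are dropped, so that an extra AND filter can simply be appended.

```python
from typing import Optional

# Line prefixes that belong to the dashboard form only (plus comment lines).
DROPPED_PREFIXES = ("FACET", "TIMESERIES", "SINCE", "--")


def to_single_value_query(
        dashboard_query: str,
        canary_filter: Optional[str] = None,
        since_clause: str = "SINCE {{ interval }} seconds ago") -> str:
    """Drop dashboard-only clauses and append the MetricTemplate window."""
    kept = []
    for raw in dashboard_query.splitlines():
        line = raw.strip()
        if not line or line.upper().startswith(DROPPED_PREFIXES):
            continue
        kept.append(line)
    if canary_filter:
        kept.append(f"AND {canary_filter}")  # assumes a WHERE clause precedes it
    kept.append(since_clause)
    return "\n".join(kept)
```

Dropping clauses is not sufficient on its own: a SELECT that returns several columns, such as the session-length query with both an average and percentiles, still has to be reduced to one expression by hand.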
Example 4: New Relic MetricTemplate — Conversion Rate · Session Length · Error Rate
The authentication secret and the three MetricTemplates must all be deployed in the same namespace. This is because secretRef can only reference secrets in the same namespace.
Secret Generation (kubectl method recommended)
kubectl create secret generic newrelic-credentials \
  -n prod \
  --from-literal=newrelic_account_id=<your-account-id> \
  --from-literal=newrelic_query_key=<your-insights-query-key>

If you manage the Secret as YAML, stringData lets you write the values in plain text and Kubernetes base64-encodes them automatically:
apiVersion: v1
kind: Secret
metadata:
  name: newrelic-credentials
  namespace: prod
type: Opaque
stringData:
  newrelic_account_id: "your-account-id-here"
  newrelic_query_key: "your-insights-query-key-here"

MetricTemplate Deployment
---
# Conversion-rate MetricTemplate
# Prerequisite: the Browser agent must call newrelic.setCustomAttribute('userGroup', 'canary')
# for users routed to the canary; otherwise this query does not work
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: newrelic-conversion-rate
  namespace: prod
spec:
  provider:
    type: newrelic
    secretRef:
      name: newrelic-credentials
  query: |
    SELECT
      IF(
        filter(count(*), WHERE action = 'product_view') > 0,
        filter(count(*), WHERE action = 'purchase_complete') /
          filter(count(*), WHERE action = 'product_view') * 100,
        null
      )
    FROM PageAction
    WHERE appName = '{{ target }}'
      AND userAttributes.userGroup = 'canary'
    SINCE {{ interval }} seconds ago
---
# Session-duration MetricTemplate
# Prerequisite: the Browser agent must call newrelic.setCustomAttribute('userGroup', 'canary')
# for users routed to the canary; otherwise this query does not work
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: newrelic-session-duration
  namespace: prod
spec:
  provider:
    type: newrelic
    secretRef:
      name: newrelic-credentials
  query: |
    SELECT average(session.duration)
    FROM PageView
    WHERE appName = '{{ target }}'
      AND userAttributes.userGroup = 'canary'
    SINCE {{ interval }} seconds ago
---
# Error-rate MetricTemplate
# httpResponseCode >= 500: counts server-side 5xx errors
# To include 4xx as well, change to >= 400 and adjust to your team's criteria
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: newrelic-error-rate
  namespace: prod
spec:
  provider:
    type: newrelic
    secretRef:
      name: newrelic-credentials
  query: |
    SELECT
      filter(count(*), WHERE httpResponseCode >= 500) /
        count(*) * 100
    FROM Transaction
    WHERE appName = '{{ target }}'
    SINCE {{ interval }} seconds ago

| Item | Description |
|---|---|
| userAttributes.userGroup = 'canary' | Uses a custom attribute to restrict the query to canary traffic |
| IF(... > 0, ..., null) | Prevents a division error when the denominator (product_view count) is 0. When the query returns null, Flagger receives no value for that interval rather than a misleading 0% |
| httpResponseCode >= 500 | Only server errors (5xx) are counted. Whether to include 4xx is a team decision that depends on the service's characteristics |
| secretRef | Can only reference Secrets in the same namespace as the MetricTemplate |
How to inject the custom canary identification attribute: Call newrelic.setCustomAttribute('userGroup', 'canary') in the New Relic Browser agent initialization code for users routed to the canary. For server-side transactions, identify the deployment group with the APM agent's addCustomAttribute('deploymentGroup', 'canary') API or with the NEW_RELIC_METADATA_KUBERNETES_DEPLOYMENT_NAME environment variable.
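The zero-denominator guard in the conversion-rate query is worth spelling out, because "no data" and "0% conversion" are different situations. The Python sketch below mirrors the IF(...) expression with hypothetical helpers (conversion_rate, evaluate). How Flagger itself treats a missing value is not modeled here.

```python
from typing import Optional


def conversion_rate(purchases: int, product_views: int) -> Optional[float]:
    """Return the conversion rate in percent, or None when there were no views."""
    if product_views <= 0:
        return None  # same role as the null branch of the NRQL IF(...)
    return purchases * 100 / product_views


def evaluate(rate: Optional[float], min_rate: float = 2.5) -> str:
    """Classify one interval as "no_data", "pass", or "fail"."""
    if rate is None:
        return "no_data"
    return "pass" if rate >= min_rate else "fail"
```

For example, 5 purchases out of 200 product views gives 2.5 and passes, 0 purchases out of 200 views gives 0.0 and fails, and 0 views yields no_data instead of a false failure.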
Pros and Cons Analysis
Advantages
| Item | Content |
|---|---|
| Automated Deployment Gating | Automatic rollback without human intervention when business metrics violate their thresholds |
| Declarative GitOps Management | Version control of the entire A/B pipeline in Git with a single Canary CR YAML |
| Business KPI Integration | Utilize conversion rate and session length directly as deployment criteria, going beyond error rate and latency |
| Session Affinity Guaranteed | Same users experience the same version throughout the experiment period via header/cookie routing |
| Multi-metric AND Validation | Multi-layered validation possible by combining Prometheus, New Relic, and Datadog results with an AND condition |
Disadvantages and Precautions
| Item | Content | Response Plan |
|---|---|---|
| NRQL Single Value Constraint | A MetricTemplate query must return exactly one float64. Queries that use TIMESERIES or return multiple columns cause parsing errors | Run the Query Builder pre-validation step in Example 3 and convert the query to a single-value form |
| Data Collection Delay | New Relic event collection lags by tens of seconds to minutes | Keep interval at 1m or more, and use 5m or more for business metrics |
| Traffic Representativeness Bias | Beta-header users may not be representative of the whole user base | Validate first with QA and internal staff, then expand gradually to the actual beta user group |
| Istio Sidecar Overhead | Envoy proxy adds latency and memory cost | Consider migrating high-traffic services to Istio Ambient Mesh |
| Secret Namespace Constraint | secretRef can only reference Secrets in the same namespace | When operating multiple namespaces, manage Secrets centrally with External Secrets Operator or Vault Agent |
Istio Ambient Mesh: A way of building a service mesh without sidecars, using node-level L4 proxies (ztunnel) and namespace-level L7 proxies (waypoints). Ambient mode reached GA in Istio 1.24 (November 2024) and is expected to significantly reduce sidecar overhead.
The Most Common Mistakes in Practice
- Using TIMESERIES or FACET as is in a MetricTemplate query: New Relic returns an array or multiple rows, so Flagger fails to parse the result and aborts the analysis. The query must be converted into an aggregate form that returns a single number. Diagnosis: Check the output of kubectl describe canary my-app -n prod for errors such as "unexpected type" or "no values found".
- Setting progressDeadlineSeconds too short (e.g., 120 seconds): If pod startup takes 60 seconds, the deadline expires and an immediate rollback happens before the first analysis interval (1m) completes. Set it comfortably larger than pod startup time + the first analysis interval (at least 300 seconds is recommended). Diagnosis: Run kubectl describe canary my-app -n prod and look for a "deadline exceeded" message during the Progressing phase.
- Setting analysis.interval too short (30 seconds or less): New Relic events have not been collected yet, so the query returns 0 or null. Because that value falls outside the threshold, unnecessary rollbacks occur repeatedly. Diagnosis: Run the query with SINCE 30 seconds ago in Query Builder and confirm that it returns no data.
- Setting a fixed threshold without reviewing time of day or seasonality: Conversion rates naturally differ a lot between weekday mornings and weekend evenings. A narrow thresholdRange causes false-positive rollbacks of healthy deployments. First identify the normal range from at least 2 to 4 weeks of historical data and leave sufficient margin below the lower limit. Diagnosis: Check the distribution over time with SINCE 4 weeks ago TIMESERIES 1 day.
- Aggregating all traffic without the custom attribute for canary identification: If you filter only on appName without the userAttributes.userGroup = 'canary' condition, primary and canary traffic are aggregated together, which dilutes the comparison. Diagnosis: First check in Query Builder with FACET userAttributes.userGroup that the groups are separated.
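For the fixed-threshold pitfall above, one way to turn historical data into a lower bound is to take a low percentile of the daily conversion rates and subtract a safety margin. The Python sketch below is only one possible heuristic: the 5th percentile, the 10% margin, the nearest-rank method, and the function names are all assumptions made for illustration, and the input is expected to be daily values exported from a query such as SINCE 4 weeks ago TIMESERIES 1 day.

```python
import math
from typing import Sequence


def nearest_rank_percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suggest_min_threshold(daily_rates: Sequence[float],
                          pct: float = 5.0,
                          margin: float = 0.1) -> float:
    """Suggest thresholdRange.min from at least two weeks of daily rates."""
    if len(daily_rates) < 14:
        raise ValueError("need at least 14 daily samples (two weeks)")
    floor = nearest_rank_percentile(daily_rates, pct)
    return round(floor * (1 - margin), 2)
```

The result is a starting point for the conversation with the PM, not a replacement for it. A value derived this way should still be reviewed against known seasonal events.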
In Conclusion
The Flagger + Istio + New Relic NRQL stack is a practical path to elevating "deployment" into "product experiment automation." A pipeline that rolls back when the conversion rate drops, even while the error rate looks normal, lets SREs and PMs agree on KPIs through the same YAML thresholdRange. Infrastructure engineering and product analysis meet on the same declarative configuration file.
3 Steps to Start Right Now:
- NRQL validation in New Relic Query Builder: Check the app's PageAction and PageView event schemas and verify that the conversion rate query returns a single float64 (SELECT filter(count(*), WHERE action='purchase_complete') / filter(count(*), WHERE action='product_view') * 100 FROM PageAction WHERE appName='<your-app-name-in-newrelic>' SINCE 5 minutes ago).
- Deploy the MetricTemplate and Secret: Apply the verified NRQL to the MetricTemplate CRD and register the New Relic Insights query key with kubectl create secret generic newrelic-credentials -n prod --from-literal=newrelic_account_id=<account-id> --from-literal=newrelic_query_key=<query-key>.
- Connect analysis.match and metrics in the Canary CR: Apply the Canary CR to the existing Deployment and monitor the analysis progress in real time with kubectl describe canary my-app -n prod.
Next Post: How to Build an A/B Test Gating Layer Robust to Natural Variation by Connecting a Statistical Significance Service to Flagger webhooks