OpenTelemetry Collector Horizontal Scaling: Maintaining Tail Sampling Accuracy with Agent-Gateway Architecture and loadbalancingexporter
If you're already using the tail_sampling processor and considering scaling your Collector instances to two or more, this article addresses exactly your situation. When processing thousands of spans per second, there will inevitably come a point where a single Collector can no longer keep up. However, simply adding instances introduces a problem that can't be solved by scaling alone: degraded Tail Sampling accuracy. When spans belonging to the same trace are scattered across different Collector instances, even a request that produced an error may not be preserved by any instance.
This article targets backend and DevOps engineers who are running Kubernetes and have just started adopting OpenTelemetry. By combining the Agent-Gateway two-tier architecture with the loadbalancingexporter, you can maintain Tail Sampling aggregation accuracy even in a horizontally scaled environment.
Core Concepts
Why Tail Sampling Breaks Under Horizontal Scaling
Head Sampling decides immediately at the start of a request whether to record it. Tail Sampling, on the other hand, makes its decision after all spans in a trace have been collected, by looking at the complete picture. This allows far more sophisticated policy configuration, since decisions can be based on actual outcomes — whether an error occurred, total latency, specific attribute values, and so on.
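To make the contrast concrete, here is a minimal, illustrative sketch of a tail-sampling decision in Python. This is not the tailsamplingprocessor's actual implementation; the span fields (`status`, `duration_ms`) and the thresholds mirror the policies used later in this article but are assumptions for illustration.

```python
# Illustrative tail-sampling decision: runs only AFTER the whole trace
# has been collected, so it can look at actual outcomes.
import random

def tail_sample(spans, latency_threshold_ms=1000, keep_percent=10):
    """Return True if the complete trace should be kept."""
    # errors-policy: any failing span keeps the whole trace
    if any(s["status"] == "ERROR" for s in spans):
        return True
    # slow-traces-policy: longest span duration as a proxy for total latency
    if max(s["duration_ms"] for s in spans) >= latency_threshold_ms:
        return True
    # probabilistic-policy: keep a small sample of ordinary traces
    return random.uniform(0, 100) < keep_percent

trace = [
    {"status": "OK", "duration_ms": 40},
    {"status": "ERROR", "duration_ms": 12},  # one failing span
]
print(tail_sample(trace))  # an errored trace is always kept: True
```

Head Sampling cannot express the first two branches at all, because at the start of a request neither the error status nor the total latency is known yet.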
The problem is that this approach requires the premise that "all spans must reside on the same instance." When you scale out to three Collectors and a load balancer distributes spans randomly, the spans that make up a single trace end up split across different instances.
```
# The problem that arises with naive horizontal scaling
Trace ABC's Spans 1,4,7      → Gateway-1 (no errors → drop decision)
Trace ABC's Spans 2,3,5      → Gateway-2 (incomplete trace → drop decision)
Trace ABC's Span 6 (ERROR)   → Gateway-3 (only 1 span → drop decision)

Result: Trace ABC, which contained an error, is not preserved by any instance
```

Agent-Gateway Two-Tier Architecture
The standard architecture for solving this problem is the Agent-Gateway two-tier architecture.
```
Application
    │ OTLP
    ▼
[Agent Collector] ─── loadbalancingexporter (routing_key: traceID) ───▶ [Gateway-1] ── tailsamplingprocessor
   (DaemonSet)                                                     ├──▶ [Gateway-2] ── tailsamplingprocessor
                                                                   └──▶ [Gateway-3] ── tailsamplingprocessor
```

| Tier | Deployment | Role |
|---|---|---|
| Tier 1 (Agent) | DaemonSet / Sidecar | Resource detection, basic filtering, batch processing, then forwarding to Gateway |
| Tier 2 (Gateway) | Deployment (horizontally scaled) | Tail Sampling, metrics aggregation, sensitive data removal, multi-backend export |
Agents are deployed close to applications and handle only lightweight processing. All heavy processing is delegated to the Gateway tier, and the loadbalancingexporter determines which Gateway instance to send to.
Consistent Hashing in loadbalancingexporter
The loadbalancingexporter takes a trace ID as input and routes it to the same Gateway instance every time via consistent hashing. Because the same trace ID always goes to the same Gateway regardless of which Agent processes it, Tail Sampling can see the complete trace.
Consistent Hashing: A hashing method that minimizes the number of keys redistributed when nodes are added or removed. With ordinary hashing, most keys move to different nodes when the node count changes, but with consistent hashing only the keys associated with the changed node are redistributed. When a Gateway pod is added, only a fraction of all traces move to the new instance.
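The redistribution property can be demonstrated with a toy hash ring. This is an illustrative sketch, not the loadbalancingexporter's actual ring implementation; the virtual-node count and hash function are arbitrary choices.

```python
# Toy consistent-hash ring: each gateway contributes many virtual nodes,
# and a trace ID is routed to the first virtual node at or after its hash.
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=100):
        self._points = sorted(
            (_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self._keys = [p for p, _ in self._points]

    def route(self, trace_id: str) -> str:
        i = bisect.bisect(self._keys, _hash(trace_id)) % len(self._keys)
        return self._points[i][1]

ring3 = Ring(["gateway-1", "gateway-2", "gateway-3"])
ring4 = Ring(["gateway-1", "gateway-2", "gateway-3", "gateway-4"])
traces = [f"trace-{i:04x}" for i in range(1000)]
moved = sum(ring3.route(t) != ring4.route(t) for t in traces)
print(f"{moved}/1000 traces moved after adding gateway-4")
```

With ordinary modulo hashing (`hash % node_count`), adding a fourth node would move roughly three quarters of the traces; here only the traces captured by gateway-4's virtual nodes move, roughly one quarter.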
The resolver determines how to obtain the list of Gateway instances.
| Resolver | Use Case | Notes |
|---|---|---|
| `static` | Small fixed environments, testing | No additional permissions required |
| `dns` | Automatic pod discovery via Kubernetes Headless Service | No additional RBAC required |
| `k8s` | Direct Kubernetes API queries, faster pod change detection | Requires Endpoints list/watch RBAC |
The examples in this article use the dns resolver. It can be applied immediately in most Kubernetes environments without additional RBAC configuration, lowering the barrier to initial adoption. The k8s resolver detects pod changes more quickly but requires separate ClusterRole and ServiceAccount configuration.
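For reference, a switch to the k8s resolver looks roughly like the sketch below. The resolver block follows the loadbalancingexporter README's `service`/`ports` form; the ClusterRole name is illustrative, and a ServiceAccount plus ClusterRoleBinding (omitted here) are also required.

```yaml
# Sketch: loadbalancing exporter with the k8s resolver (names are illustrative)
exporters:
  loadbalancing:
    routing_key: "traceID"
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      k8s:
        service: gateway-collector-headless.observability  # <name>.<namespace>
        ports: [4317]
---
# Minimal RBAC the k8s resolver needs to watch Gateway endpoints
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-lb-resolver  # illustrative name
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch"]
```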
Practical Application
Agent DaemonSet Configuration
One Agent is deployed per node, collecting spans from applications on that node and forwarding them to the Gateway. Trace ID-based routing is performed via the loadbalancingexporter.
```yaml
# agent-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  batch:
    timeout: 5s
    send_batch_size: 1000
exporters:
  loadbalancing:
    routing_key: "traceID"
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: "gateway-collector-headless.observability.svc.cluster.local"
        port: 4317
        interval: 5s  # Delay for reflecting Gateway Pod additions/removals
        timeout: 1s   # Allowed time for DNS lookup failure
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loadbalancing]
```

| Configuration | Description |
|---|---|
| `routing_key: traceID` | Trace ID-based routing (service-name routing is also available) |
| `resolver.dns.hostname` | Kubernetes Headless Service domain |
| `interval: 5s` | DNS re-query interval. Shorter values reflect pod changes faster but increase DNS load |
| `timeout: 1s` | DNS query timeout. If no response within this time, the previous list is retained |
| `processors: [memory_limiter, batch]` | Agent handles only lightweight processing; no `tail_sampling` |
The key point is not placing the tail_sampling processor on the Agent side. All sampling decisions are made at the Gateway.
Gateway Deployment and Headless Service Configuration
The Gateway handles heavy processing including Tail Sampling. The Kubernetes Headless Service exposes each pod's IP directly, allowing the loadbalancingexporter to build its consistent hashing ring.
```yaml
# gateway-headless-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: gateway-collector-headless
  namespace: observability
spec:
  clusterIP: None  # Headless: DNS queries return individual Pod IP list
  selector:
    app: gateway-collector
  ports:
    - port: 4317
      name: otlp-grpc
```

```yaml
# gateway-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway-collector
  namespace: observability
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gateway-collector
  template:
    metadata:
      labels:
        app: gateway-collector
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values: [gateway-collector]
              topologyKey: kubernetes.io/hostname
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:0.115.0
          resources:
            requests:
              memory: "2Gi"
              cpu: "500m"
            limits:
              memory: "8Gi"
              cpu: "2000m"
```

```yaml
# gateway-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 6144  # ~75% of Deployment limits.memory
    spike_limit_mib: 1024
  tail_sampling:
    decision_wait: 30s
    num_traces: 50000
    expected_new_traces_per_sec: 1000
    decision_cache:  # cache sizes are nested under decision_cache in recent contrib versions
      sampled_cache_size: 100000
      non_sampled_cache_size: 100000
    policies:
      - name: errors-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-traces-policy
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
  batch:
    timeout: 5s
    send_batch_size: 1000
exporters:
  otlp/backend:
    endpoint: "backend:4317"
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp/backend]
```

| Configuration | Description |
|---|---|
| `decision_wait: 30s` | Maximum time to wait for trace completion. In service mesh environments (Envoy/Linkerd sidecars), proxy latency accumulates; setting this to the service P99 response time plus 10–15 seconds of buffer is recommended |
| `num_traces: 50000` | Number of concurrently processed traces. Assuming an average span size of ~1KB and an average of 20 spans per trace, 50,000 traces ≈ 1GB as a baseline; plan for 2–3x that when including indexing overhead |
| `decision_cache.sampled_cache_size` | An LRU cache that applies prior decisions to spans that arrive late, after a trace has been evicted from memory |
| `policies` order | Evaluated top to bottom; if any policy results in sampled, that trace is preserved |
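The 1GB baseline above can be reproduced with simple arithmetic; the per-span size and spans-per-trace figures are the assumptions stated in the table, not measured values.

```python
# Back-of-envelope memory estimate for the tail_sampling buffer,
# using the assumptions from the table above (~1KB/span, 20 spans/trace).
num_traces = 50_000
spans_per_trace = 20
span_bytes = 1024  # ~1KB per span (assumption)

baseline = num_traces * spans_per_trace * span_bytes
print(f"baseline: {baseline / 2**30:.2f} GiB")  # ~0.95 GiB
print(f"with 2-3x overhead: {2 * baseline / 2**30:.1f}-{3 * baseline / 2**30:.1f} GiB")
```

Run the same arithmetic against your own average span size and trace depth before settling on `num_traces` and the container memory limit.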
PodDisruptionBudget (PDB): A Kubernetes policy that guarantees a minimum number of available pods when pods are forcibly terminated. Applying a PDB to the Gateway tier prevents trace loss during node maintenance or upgrades. Starting with `minAvailable: 2` is recommended.
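A minimal PDB manifest for the Gateway tier could look like the following sketch; the object name is illustrative, and the selector assumes the `app: gateway-collector` label used in this article's Deployment.

```yaml
# Sketch: PDB keeping at least 2 Gateway pods available during voluntary disruptions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: gateway-collector-pdb  # illustrative name
  namespace: observability
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: gateway-collector
```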
Span Metrics Pipeline Configuration
When deriving RED metrics (Request rate, Error rate, Duration) from trace data, care must be taken to ensure that spans dropped by Tail Sampling are not omitted from the metrics. The spanmetricsconnector is declared in the connectors section, not processors, and the full span count must be calculated first in a separate pipeline that is completely independent from the Tail Sampling pipeline.
```yaml
# Gateway config: correct spanmetrics configuration
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100ms, 250ms, 500ms, 1s, 5s]
    dimensions:
      - name: http.method
      - name: http.status_code
service:
  pipelines:
    # Pipeline 1: Calculate metrics from all traces (before sampling)
    traces/metrics:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [spanmetrics]  # connector used as exporter
    # Pipeline 2: Tail Sampling (runs independently, shares the same receiver)
    traces/sampling:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp/backend]
    # Pipeline 3: Metrics output
    metrics:
      receivers: [spanmetrics]  # connector used as receiver
      exporters: [prometheus]
```

The otlp receiver is shared between the two pipelines (traces/metrics and traces/sampling), so the same span is delivered to both. The traces/metrics pipeline aggregates all spans regardless of drop decisions, while the traces/sampling pipeline performs Tail Sampling independently.
spanmetricsconnector: An OpenTelemetry Collector component that automatically generates RED metrics, including latency histograms, call counts, and error rates, from trace span data. It is declared in the `connectors` section rather than `processors`, and is used simultaneously as both an exporter (output) and a receiver (input) in the pipeline.
Trade-off Analysis
Advantages
| Item | Detail |
|---|---|
| Tail Sampling accuracy | Trace ID-based routing guarantees complete trace aggregation |
| Horizontal scalability | Only the Gateway tier needs to be scaled out independently |
| Fault isolation | Agent failures do not propagate to the Gateway, and vice versa |
| Flexible policy management | Sampling policies can be centralized at the Gateway for consistent management |
| Automatic service discovery | Gateway pod additions/removals are automatically reflected via the DNS/k8s resolver |
Disadvantages and Caveats
| Item | Detail | Mitigation |
|---|---|---|
| Memory pressure | All spans must be kept in memory for the duration of `decision_wait`. At ~1KB per span and 20 spans per trace, 50,000 traces ≈ 2–3GB including overhead | Set `memory_limiter`'s `limit_mib` to no more than 75% of the container memory limit, and tune `num_traces` to match actual traffic |
| Rehashing problem | When the number of Gateway pods changes, some traces may be redistributed to different instances and become incomplete | Blue/green switchover is recommended over rolling updates; changes should be made in minimal increments |
| Single point of failure risk | A full Gateway tier outage results in trace loss | Apply PodDisruptionBudget (`minAvailable: 2`) and anti-affinity rules |
| Late span arrival | Spans that arrive after `decision_wait` are dropped | Tune `decision_wait` to service P99 response time plus buffer (10–15 seconds) |
| Operational complexity | Both tiers must be managed with separate monitoring and alerting systems | Configure per-tier alerts based on the metrics below |
```
# Key monitoring metrics for the Gateway tier
otelcol_exporter_queue_size              # Export queue utilization (scale-out signal when exceeding 60–70%)
otelcol_processor_dropped_spans_total    # Dropped span count (requires immediate attention if it spikes)
otelcol_processor_tail_sampling_*        # Tail Sampling decision statistics
otelcol_receiver_accepted_spans_total    # Inbound throughput
```

If otelcol_receiver_accepted_spans_total values are unevenly distributed across Gateway instances, consider reducing the DNS TTL or lowering the interval value. If the imbalance persists, switching from the dns resolver to the k8s resolver will improve pod change detection speed (note: Endpoints list/watch RBAC configuration is required).
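As one way to wire the first two metrics into alerts, here is a hedged Prometheus rule sketch. The thresholds and group/alert names are illustrative, and it assumes `otelcol_exporter_queue_capacity` is exported alongside the queue size, which holds for recent contrib builds.

```yaml
# Sketch: Prometheus alerting rules for the Gateway metrics above (tune thresholds)
groups:
  - name: otel-gateway
    rules:
      - alert: GatewayExporterQueueFilling
        expr: otelcol_exporter_queue_size / otelcol_exporter_queue_capacity > 0.7
        for: 5m
        annotations:
          summary: "Export queue above 70%: consider scaling out the Gateway tier"
      - alert: GatewayDroppingSpans
        expr: rate(otelcol_processor_dropped_spans_total[5m]) > 0
        for: 2m
        annotations:
          summary: "Gateway is dropping spans; investigate immediately"
```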
The Most Common Mistakes in Practice
- Adding `tail_sampling` to the Agent as well: Agents see only a portion of the full trace, making accurate sampling decisions impossible. `tail_sampling` must be placed exclusively on the Gateway tier.
- Using a regular ClusterIP Service instead of a Headless Service: A regular Service returns a single virtual IP on DNS queries. Since the `loadbalancingexporter` needs to receive a list of individual pod IPs to build its consistent hashing ring, a Headless Service with `clusterIP: None` is strictly required.
- Declaring `spanmetricsconnector` in the `processors` section: The `spanmetricsconnector` must be declared in the `connectors` section and branched into a separate pipeline. Placing it in `processors` causes the Collector to refuse to start, and running it before Tail Sampling within the same pipeline is also impossible. The pipelines must always be split into `traces/metrics` and `traces/sampling`.
Closing Thoughts
The combination of the Agent-Gateway two-tier architecture and the loadbalancingexporter is the scaling pattern recommended in the official OpenTelemetry documentation, and it is proven in production environments processing tens of thousands of spans per second.
Three steps you can take right now:
- Review your current architecture: Run `kubectl get pods -n observability` to check your current Collector configuration, and determine whether Tail Sampling is running on a single instance or already in a distributed environment.
- Deploy the Headless Service first, then apply the Gateway Deployment: Deploying in the order `kubectl apply -f gateway-headless-svc.yaml && kubectl apply -f gateway-deployment.yaml` is recommended. The Service must come up first so that the `loadbalancingexporter` can successfully resolve DNS at startup.
- Apply the `loadbalancingexporter` to the Agent and validate: After adding the `loadbalancingexporter` to the Agent configuration, compare the `otelcol_receiver_accepted_spans_total` metric per instance to verify that each Gateway is receiving traces evenly.
Next article: We will explore how to integrate HPA (Horizontal Pod Autoscaler) with the Gateway tier using the OpenTelemetry Operator's `TargetAllocator`, enabling automatic response to traffic spikes while maintaining Tail Sampling accuracy.
References
- Scaling the Collector | OpenTelemetry Official Docs
- Gateway Deployment Pattern | OpenTelemetry Official Docs
- loadbalancingexporter README | GitHub opentelemetry-collector-contrib
- tailsamplingprocessor README | GitHub opentelemetry-collector-contrib
- Tail Sampling with OpenTelemetry: Why it's useful | OpenTelemetry Blog
- Sampling Concepts | OpenTelemetry Official Docs
- Scale Alloy Tail Sampling | Grafana Official Docs
- otelcol.exporter.loadbalancing | Grafana Alloy Official Docs
- Composing OpenTelemetry Reference Architectures | Elastic Observability Labs
- Patterns for Deploying OpenTelemetry Collector at Scale | SigNoz