The Reality of Sidecar-Less Service Mesh: How eBPF Replaces Istio Sidecars
When operating microservices, you eventually encounter a strange sight. You have 100 services, but kubectl get pods shows more than 200 running containers. Each Pod has an Envoy sidecar attached to it. When I first introduced Istio and saw this screen, I was momentarily stunned. "I adopted this to secure inter-service communication, and now half the cluster is filled with proxy containers?" I thought.
Each of those proxy containers consumes 50–100MB of memory and adds 1–3ms of latency every time traffic passes through. With 100 services, that's 100 sidecars — and every time you upgrade the Istio version, you have to roll the entire fleet. There comes a moment when you wonder whether the service mesh is solving problems or creating new ones.
So is there a way to approach this problem with a fundamentally different structure?
Core Concepts
Why Do We Need a Service Mesh?
As the number of microservices grows, inter-service communication becomes spaghetti. Which service talks to which, how to prevent cascading failures when a particular service slows down, where to handle authentication between services — implementing all of this directly inside each service's code quickly hits its limits.
A service mesh extracts these concerns out of application code. Traffic management, mTLS-based inter-service authentication, circuit breaking, and distributed tracing are handled at the infrastructure layer.
mTLS (Mutual TLS): A method where client and server exchange certificates with each other to verify identity in both directions. Regular TLS only authenticates the server, but mTLS requires both sides to "prove yourself" in inter-service communication. It is the foundation of zero-trust security between services.
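To see the "both sides prove themselves" difference concretely, here is a minimal sketch using Python's standard ssl module (nothing Istio- or Cilium-specific). The only server-side change between plain TLS and mTLS is demanding and verifying a client certificate; the certificate paths in the comments are placeholders, not files from this article.

```python
import ssl

# Plain TLS: only the server proves its identity; clients stay anonymous.
plain = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
# Server contexts default to not requesting a client certificate.
# plain.verify_mode == ssl.CERT_NONE

# mTLS: the server additionally demands a certificate from the client
# and verifies it against a trusted CA before the handshake completes.
mtls = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
mtls.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid cert
# mtls.load_cert_chain("server.crt", "server.key")  # server identity (placeholder paths)
# mtls.load_verify_locations("service-ca.crt")      # CA that signs client certs
```

In a mesh, the sidecar (or, in the eBPF approaches below, the node-level data plane) performs this handshake on behalf of the application, which is why no application code changes.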
The Limitations of the Traditional Sidecar Approach
The conventional Istio + Envoy model looks like this:
[Pod A]                     [Pod B]
 ├── App Container           ├── App Container
 └── Envoy Sidecar           └── Envoy Sidecar
         ↕                           ↕
 iptables Interception       iptables Interception
         ↕                           ↕
     [L7 Processing → Forward to Destination]

L4/L7: Layer numbers in the OSI network model. L4 (Transport Layer) handles traffic at the TCP/UDP port level, while L7 (Application Layer) understands application protocols such as HTTP headers and gRPC methods. Having L7 processing capability in a service mesh means fine-grained control like "only route when a specific header is present."
All inbound/outbound traffic is redirected through iptables rules to Envoy before being forwarded to the destination. It's powerful, but it comes at a cost. The Envoy sidecar per Pod consumes 50–100MB of memory, and the additional network hops add 1–3ms of latency.
How eBPF Replaces This Role
eBPF (Extended Berkeley Packet Filter) is a technology that safely runs user-defined programs inside the Linux kernel — without modifying kernel source or loading kernel modules.
eBPF Sandbox: Before an eBPF program is loaded into the kernel, a verifier checks its safety. Infinite loops and invalid memory accesses are blocked in advance, so programs can be dynamically loaded without worrying about kernel crashes. Note that since Linux 5.3, bounded loops with a provably finite iteration count are permitted.
The reason eBPF is especially powerful in service meshes is socket redirection. Instead of packets traversing the entire network stack, they are connected directly to the destination at the socket level — bypassing iptables and handling traffic without proxy containers.
eBPF programs share state through BPF Maps. Service endpoint lists and policies are stored in BPF Maps, so routing decisions are made directly in the kernel without going through user space every time traffic arrives. The XDP (eXpress Data Path) hook processes at the driver level for maximum speed, while the TC (Traffic Control) hook operates deeper in the network stack with access to richer metadata. Service mesh implementations primarily use TC hooks together with socket-level hooks.
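As a mental model only (plain Python, not real eBPF), the division of labor around a BPF Map can be sketched like this: a userspace agent writes endpoints into the shared map occasionally, while the per-packet path is nothing but a map lookup. The service names and addresses below are made up for illustration.

```python
# Stand-in for a BPF hash map: service -> list of backend endpoints.
endpoint_map = {}

def control_plane_update(service, backends):
    """Userspace agent (e.g. a CNI daemon) populating the map. Runs rarely."""
    endpoint_map[service] = backends

def kernel_redirect(service, flow_hash):
    """Per-packet decision: a pure map lookup, no round trip to userspace."""
    backends = endpoint_map.get(service)
    if not backends:
        return None  # no endpoints known: drop / fall through
    return backends[flow_hash % len(backends)]  # pick a backend per flow

control_plane_update("payment-service", ["10.0.1.5:8080", "10.0.2.7:8080"])
print(kernel_redirect("payment-service", flow_hash=7))  # → 10.0.2.7:8080
```

The real mechanism differs in every detail (identities, verifier constraints, per-CPU maps), but the shape is the point: updates are rare and cross the user/kernel boundary; lookups are constant work entirely inside the kernel.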
The architecture looks like this:
[Node]
┌──────────────────────────────────────────────┐
│ Linux Kernel                                 │
│  ┌────────────────────────────────────────┐  │
│  │ eBPF Programs (TC hook / Socket hook)  │  │
│  │ BPF Maps: Endpoints, Policies, Metrics │  │
│  └────────────────────────────────────────┘  │
│ ↕ Direct Traffic Handling (iptables bypass)  │
│  ┌──────────────┐      ┌──────────────┐      │
│  │ Pod A        │      │ Pod B        │      │
│  │ App Container│      │ App Container│      │
│  └──────────────┘      └──────────────┘      │
└──────────────────────────────────────────────┘

The sidecars don't disappear — rather, the kernel absorbs their role.
Practical Application
There are three scenarios to consider. If you're building a new cluster, Cilium is the cleanest choice. If you want to improve performance in an existing Istio environment without a full replacement, Merbridge can be applied with minimal changes. If you want to first look inside how your current cluster behaves, starting with observability via Pixie is a good entry point.
Example 1: Building a Sidecar-Less Service Mesh with Cilium (New Cluster)
Cilium implements the service mesh from the ground up using only eBPF, with no sidecars. Major clouds have already adopted it as a default data plane, including GKE Dataplane V2 and AKS's Azure CNI Powered by Cilium.
CNI (Container Network Interface): The plugin interface responsible for Pod networking in Kubernetes. flannel, Calico, and Cilium are CNI implementations. The range of network capabilities available depends on which CNI you choose at cluster creation time.
Replacing the CNI on an existing cluster is a significant undertaking, so if you're provisioning a new cluster, you can choose Cilium from the start.
# Based on Cilium v1.15 — Install with Service Mesh + Hubble observability via Helm
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set serviceMesh.enabled=true \
  --set authentication.mutual.spire.enabled=true   # v1.14+ feature
# Check installation status
cilium status --wait

Upon successful installation, you should see output like this:

    /¯¯\
 /¯¯\__/¯¯\    Cilium:          OK
 \__/¯¯\__/    Operator:        OK
 /¯¯\__/¯¯\    Envoy DaemonSet: disabled (using embedded mode)
 \__/¯¯\__/    Hubble Relay:    OK
    \__/       ClusterMesh:     disabled

DaemonSet         cilium             Desired: 3, Ready: 3/3, Available: 3/3
Deployment        cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
Deployment        hubble-relay      Desired: 1, Ready: 1/1, Available: 1/1

From there, policies for applying mTLS between services are declared as Kubernetes CRDs.
# Enforce mTLS between services without any code changes
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: require-mtls
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: payment-service
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: order-service
    authentication:
      mode: required

SPIFFE/SPIRE: SPIFFE (Secure Production Identity Framework for Everyone) is the standard specification for proving service identity, and SPIRE is its implementation. Enabling `authentication.mutual.spire.enabled=true` causes SPIRE to issue a unique cryptographic identity (SVID) to each service, upon which mTLS authentication is based. The key benefit is managing service identity without any changes to application code.
| Item | Sidecar Approach | Cilium eBPF Approach |
|---|---|---|
| Traffic Path | App → iptables → Envoy → Destination | App → eBPF (kernel) → Destination |
| Additional Containers | One Envoy per Pod | None |
| Policy Enforcement Location | Inside sidecar (user space) | Kernel level (BPF Maps) |
| mTLS Support | Handled by Envoy | Integrated with SPIFFE/SPIRE |
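To make the policy's semantics concrete, here is a toy evaluation of the CRD above in plain Python: an ingress connection is allowed only if the destination matches endpointSelector and the source matches one of the fromEndpoints entries. This is a simplification for intuition; Cilium actually evaluates compact numeric identities in-kernel via BPF Maps, and the helper names here are mine, not Cilium's.

```python
# The policy from the YAML above, as data.
policy = {
    "endpointSelector": {"app": "payment-service"},
    "fromEndpoints": [{"app": "order-service"}],
}

def labels_match(selector, labels):
    """True when every key/value in the selector is present in the labels."""
    return all(labels.get(k) == v for k, v in selector.items())

def allowed(policy, dst_labels, src_labels):
    """Toy ingress decision for a single policy."""
    if not labels_match(policy["endpointSelector"], dst_labels):
        return True  # policy doesn't select this destination; not its concern
    return any(labels_match(s, src_labels) for s in policy["fromEndpoints"])

print(allowed(policy, {"app": "payment-service"}, {"app": "order-service"}))  # True
print(allowed(policy, {"app": "payment-service"}, {"app": "batch-job"}))      # False
```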
Example 2: Applying eBPF Acceleration to an Existing Istio Environment with Merbridge
If you're already using Istio, you have the option to reduce latency by replacing iptables with eBPF — without a full replacement. That's what Merbridge does. The practical appeal is that it requires no Istio configuration changes and no code modifications.
# Apply Merbridge to an existing Istio cluster (also supports Linkerd, Kuma)
# ⚠️ The command below is intended for lab environments.
# In production, download the manifest first, review its contents, then apply.
kubectl apply -f https://raw.githubusercontent.com/merbridge/merbridge/main/deploy/all-in-one.yaml
# Verify — check eBPF program load status
kubectl -n merbridge get pods
kubectl -n merbridge logs -l app=merbridge --tail=50

Merbridge replaces existing iptables rules with eBPF socket redirection. Because packets connect directly to the destination at the socket level rather than traversing the entire network stack, round-trip latency decreases. The biggest advantage is that the Istio control plane configuration doesn't need to be touched at all.
Example 3: Gaining Observability with Pixie Without Changing a Single Line of Code
If you've put off setting up distributed tracing because it's a hassle, only to regret it after an incident — Pixie can eliminate that nagging feeling. Without changing a single line of code and without any sidecars, it automatically captures service traffic.
One prerequisite worth knowing: px deploy requires a Pixie Cloud account and the cluster must allow outbound connections to the external cloud. The phrase "zero instrumentation" does not mean "fully self-contained." For air-gapped environments or environments with restricted cloud connectivity, consider the Cilium + Hubble combination as an alternative.
# ⚠️ The curl | bash pattern does not let you inspect script contents beforehand.
# In untrusted environments, download the script first, review it, then execute.
# Install Pixie CLI (requires a Pixie Cloud account)
curl -fsSL https://withpixie.ai/install.sh | bash
# Deploy Pixie to the cluster
px deploy
# View HTTP requests/responses in real time (last 5 minutes, production namespace)
px run px/http_data -- \
  -start_time="-5m" \
  -namespace="production"

Immediately after deployment, it auto-parses protocols such as HTTP, gRPC, MySQL, and Redis, displaying requests/responses, latency, and error rates. This is possible because eBPF intercepts socket data at the kernel level.
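To give a feel for what "auto-parsing" means, here is a deliberately tiny Python sketch of protocol inference on raw bytes read off a socket. Pixie's real parsers are far more complete and run on data captured by kernel-level eBPF probes; this only illustrates the idea of recognizing a protocol from captured socket data.

```python
def parse_if_http(raw: bytes):
    """Return HTTP request-line fields if the bytes look like HTTP, else None."""
    try:
        head = raw.split(b"\r\n", 1)[0].decode("ascii")
    except UnicodeDecodeError:
        return None  # binary junk: definitely not an HTTP request line
    parts = head.split(" ")
    if len(parts) == 3 and parts[2].startswith("HTTP/"):
        method, path, version = parts
        return {"method": method, "path": path, "version": version}
    return None  # not HTTP: a real tracer would try gRPC, MySQL, Redis, ...

captured = b"GET /api/orders HTTP/1.1\r\nHost: shop.internal\r\n\r\n"
print(parse_if_http(captured))  # → {'method': 'GET', 'path': '/api/orders', 'version': 'HTTP/1.1'}
```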
Pros and Cons Analysis
Advantages
When switching to the eBPF approach, the most tangible benefit is resource efficiency. With 100 services, eliminating sidecars can free up to 10GB of memory. I was quite surprised when I first calculated that number — it's enough to potentially reduce the node count by one or two. Performance genuinely differs as well. In high-TPS environments, a single sidecar hop adding 1–3ms clearly accumulates at the p99 latency level.
| Item | Details |
|---|---|
| Performance | Eliminates 1–3ms of added latency per sidecar hop. Noticeable difference in high-TPS environments |
| Resource Efficiency | Saves 50–100MB of sidecar memory per Pod. Up to 10GB savings across 100 services |
| Visibility Scope | Kernel-level collection enables observation of syscalls, network events, and CPU cycles |
| Transparency | Network policies and monitoring applied without changes to application code or configuration |
| Operational Simplicity | No sidecar version management or rolling upgrades required |
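A quick sanity check of the headline numbers, using the per-sidecar ranges quoted in this article. The 5-hop call chain is an illustrative assumption, not a measurement:

```python
services = 100
sidecar_mem_mb = (50, 100)   # memory per Envoy sidecar, per the article
sidecar_latency_ms = (1, 3)  # added latency per proxy hop, per the article

# Fleet-wide memory tied up in sidecars, in GB.
mem_saved_gb = tuple(services * m / 1024 for m in sidecar_mem_mb)

# A request crossing 5 meshed services pays the hop tax 5 times.
extra_latency_ms = tuple(5 * l for l in sidecar_latency_ms)

print(f"memory freed: {mem_saved_gb[0]:.1f}-{mem_saved_gb[1]:.1f} GB")
print(f"added latency on a 5-hop chain: {extra_latency_ms[0]}-{extra_latency_ms[1]} ms")
```

The upper bound works out to roughly 9.8 GB, matching the "up to 10GB" figure above, and a 5-hop chain at the pessimistic end adds 15ms, which is why the effect shows up at p99 in high-TPS environments.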
Disadvantages and Caveats
Having many advantages doesn't mean migration is always the right answer. The L7 processing area in particular still has real-world limitations, and in multi-tenant environments you must carefully evaluate the level of security isolation.
| Item | Details | Mitigation |
|---|---|---|
| L7 Processing Limitations | Size limits and loop count restrictions on eBPF programs make complex HTTP/gRPC routing difficult | Delegate L7 to a separate proxy, such as Istio Ambient's waypoint proxy |
| Security Isolation Level | Node-shared proxy approach provides weaker isolation than sidecars in multi-tenant environments | Consider running sidecars in parallel where sensitive tenant separation is required |
| Maturity | Cilium Service Mesh and Istio Ambient Mesh still have limited enterprise production references | Recommended to adopt gradually, starting from new clusters |
| Debugging Difficulty | Tracing kernel-level issues is more complex than with sidecars | Recommended to become familiar with eBPF-specific debugging tools such as Hubble and bpftool |
| Kernel Version Dependency | Linux 5.10+ recommended for advanced eBPF features; some require 5.15+ | Mandatory to verify node OS version in advance |
waypoint proxy: The L7 processing component in Istio Ambient Mesh. L4 is handled by ztunnel per node, and the per-namespace waypoint proxy only intervenes when L7 policies like HTTP/gRPC are needed. Because proxies are placed only where needed, this is far more efficient than maintaining sidecars everywhere.
The Most Common Mistakes in Practice
- Attaching eBPF features without checking the kernel version — Run `uname -r` to check the node kernel version first. If you're running Ubuntu 20.04 LTS (kernel 5.4) in an on-premises environment, some advanced features will be unavailable. This doesn't resolve itself automatically just because you're on a managed cluster: GKE and AKS have eBPF-ready environments by default, but EKS uses AWS VPC CNI as its default CNI, so separate configuration is required to use Cilium.
- Mistaking Cilium adoption for a full Istio replacement — Cilium handles L4 network policies and basic mTLS well, but if you need complex L7 traffic control such as header-based routing or canary deployments, Istio or a waypoint proxy is still necessary.
- Migrating without reviewing isolation requirements in multi-tenant environments — In environments where different customers' workloads share the same nodes, a node-level proxy approach may raise issues in security audits. It's better to discuss this with the security team beforehand — much more comfortable than having it surface during an audit.
Closing Thoughts
The reason sidecar-less service meshes are attracting attention is not simply because they're trendy. eBPF has become the most realistic direction for pushing service mesh infrastructure costs down into the kernel — securing both observability and security outside application code. If the sidecar approach embodied the mindset of "inject a proxy into every Pod," eBPF represents a shift in thinking: "the kernel already sees all traffic — let's process it there."
Here is a suggested order for getting started right away.
Before You Start: Run kubectl get nodes -o wide to check the node OS, and verify with uname -r that the kernel version is 5.10 or higher. If you're in an EKS environment, it's also worth finding out in advance whether a separate Cilium configuration is needed.
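If you want to automate that check across a fleet, a small version-gate helper might look like this. The 5.10 threshold is the recommendation above; the helper name is mine, and you would feed it the output of `uname -r` (the sample release strings below are illustrative):

```python
def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Parse a release string like '5.15.0-91-generic' and compare versions."""
    parts = release.split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)

# Ubuntu 20.04 LTS ships 5.4: below the recommended floor.
print(kernel_at_least("5.4.0-150-generic", 5, 10))   # → False
# Ubuntu 22.04 LTS ships 5.15: fine, including most 5.15+ features.
print(kernel_at_least("5.15.0-91-generic", 5, 10))   # → True
```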
- Start with observation — For a new cluster, visualize current service traffic patterns without code changes using the Cilium + Hubble combination. For an existing Istio environment, apply Merbridge and then use Hubble or Pixie (if cloud connectivity is available).
- Start new clusters with Cilium — Replacing the CNI in an existing environment is a major undertaking, but for new clusters you can choose Cilium from the start and secure core features without a separate service mesh.
- Accelerate existing Istio environments with Merbridge first — Simply replacing iptables with eBPF, without a full replacement, is enough to observe latency improvements. It's not too late to consider migrating to Ambient Mesh after seeing those results.