The Reality of Sidecar-Less Service Mesh: How eBPF Replaces Istio Sidecars
When operating microservices, you eventually encounter a strange sight. You have 100 services, but kubectl get pods shows more than 200 running containers. Each Pod has an Envoy sidecar attached to it. When I first introduced Istio and saw this screen, I was momentarily stunned. "I adopted this to secure inter-service communication, and now half the cluster is filled with proxy containers?" I thought.
Each of those proxy containers consumes 50–100MB of memory and adds 1–3ms of latency every time traffic passes through. With 100 services, that's 100 sidecars — and every time you upgrade the Istio version, you have to roll the entire fleet. There comes a moment when you wonder whether the service mesh is solving problems or creating new ones.
So is there a way to approach this problem with a fundamentally different structure?
Core Concepts
Why Do We Need a Service Mesh?
As the number of microservices grows, inter-service communication becomes spaghetti. Which service talks to which, how to prevent cascading failures when a particular service slows down, where to handle authentication between services — implementing all of this directly inside each service's code quickly hits its limits.
A service mesh extracts these concerns out of application code. Traffic management, mTLS-based inter-service authentication, circuit breaking, and distributed tracing are handled at the infrastructure layer.
mTLS (Mutual TLS): A method where client and server exchange certificates with each other to verify identity in both directions. Regular TLS only authenticates the server, but mTLS requires both sides to "prove yourself" in inter-service communication. It is the foundation of zero-trust security between services.
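To see the "both sides prove themselves" difference concretely, here is a minimal sketch using Python's standard ssl module (nothing Istio- or Cilium-specific). The only server-side change between plain TLS and mTLS is demanding and verifying a client certificate; the certificate paths in the comments are placeholders, not files from this article.

```python
import ssl

# Plain TLS: only the server proves its identity; clients stay anonymous.
plain = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
# Server contexts default to not requesting a client certificate.
# plain.verify_mode == ssl.CERT_NONE

# mTLS: the server additionally demands a certificate from the client
# and verifies it against a trusted CA before the handshake completes.
mtls = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
mtls.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid cert
# mtls.load_cert_chain("server.crt", "server.key")  # server identity (placeholder paths)
# mtls.load_verify_locations("service-ca.crt")      # CA that signs client certs
```

In a mesh, the sidecar (or, in the eBPF approaches below, the node-level data plane) performs this handshake on behalf of the application, which is why no application code changes.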
The Limitations of the Traditional Sidecar Approach
The conventional Istio + Envoy model looks like this:
[Pod A]                     [Pod B]
 ├── App Container           ├── App Container
 └── Envoy Sidecar           └── Envoy Sidecar
         ↕                           ↕
 iptables Interception       iptables Interception
         ↕                           ↕
     [L7 Processing → Forward to Destination]

L4/L7: Layer numbers in the OSI network model. L4 (Transport Layer) handles traffic at the TCP/UDP port level, while L7 (Application Layer) understands application protocols such as HTTP headers and gRPC methods. Having L7 processing capability in a service mesh means fine-grained control like "only route when a specific header is present."
All inbound/outbound traffic is redirected through iptables rules to Envoy before being forwarded to the destination. It's powerful, but it comes at a cost. The Envoy sidecar per Pod consumes 50–100MB of memory, and the additional network hops add 1–3ms of latency.
How eBPF Replaces This Role
eBPF (Extended Berkeley Packet Filter) is a technology that safely runs user-defined programs inside the Linux kernel — without modifying kernel source or loading kernel modules.
eBPF Sandbox: Before an eBPF program is loaded into the kernel, a verifier checks its safety. Infinite loops and invalid memory accesses are blocked in advance, so programs can be dynamically loaded without worrying about kernel crashes. Note that since Linux 5.3, bounded loops with a provably finite iteration count are permitted.
The reason eBPF is especially powerful in service meshes is socket redirection. Instead of packets traversing the entire network stack, they are connected directly to the destination at the socket level — bypassing iptables and handling traffic without proxy containers.
eBPF programs share state through BPF Maps. Service endpoint lists and policies are stored in BPF Maps, so routing decisions are made directly in the kernel without going through user space every time traffic arrives. The XDP (eXpress Data Path) hook processes at the driver level for maximum speed, while the TC (Traffic Control) hook operates deeper in the network stack with access to richer metadata. Service mesh implementations primarily use TC hooks together with socket-level hooks.
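As a mental model only (plain Python, not real eBPF), the division of labor around a BPF Map can be sketched like this: a userspace agent writes endpoints into the shared map occasionally, while the per-packet path is nothing but a map lookup. The service names and addresses below are made up for illustration.

```python
# Stand-in for a BPF hash map: service -> list of backend endpoints.
endpoint_map = {}

def control_plane_update(service, backends):
    """Userspace agent (e.g. a CNI daemon) populating the map. Runs rarely."""
    endpoint_map[service] = backends

def kernel_redirect(service, flow_hash):
    """Per-packet decision: a pure map lookup, no round trip to userspace."""
    backends = endpoint_map.get(service)
    if not backends:
        return None  # no endpoints known: drop / fall through
    return backends[flow_hash % len(backends)]  # pick a backend per flow

control_plane_update("payment-service", ["10.0.1.5:8080", "10.0.2.7:8080"])
print(kernel_redirect("payment-service", flow_hash=7))  # → 10.0.2.7:8080
```

The real mechanism differs in every detail (identities, verifier constraints, per-CPU maps), but the shape is the point: updates are rare and cross the user/kernel boundary; lookups are constant work entirely inside the kernel.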
The architecture looks like this:
[Node]
┌──────────────────────────────────────────────┐
│ Linux Kernel                                 │
│  ┌────────────────────────────────────────┐  │
│  │ eBPF Programs (TC hook / Socket hook)  │  │
│  │ BPF Maps: Endpoints, Policies, Metrics │  │
│  └────────────────────────────────────────┘  │
│ ↕ Direct Traffic Handling (iptables bypass)  │
│  ┌──────────────┐      ┌──────────────┐      │
│  │ Pod A        │      │ Pod B        │      │
│  │ App Container│      │ App Container│      │
│  └──────────────┘      └──────────────┘      │
└──────────────────────────────────────────────┘

The sidecars don't disappear — rather, the kernel absorbs their role.
Practical Application
There are three scenarios to consider. If you're building a new cluster, Cilium is the cleanest choice. If you want to improve performance in an existing Istio environment without a full replacement, Merbridge can be applied with minimal changes. If you want to first look inside how your current cluster behaves, starting with observability via Pixie is a good entry point.
Example 1: Building a Sidecar-Less Service Mesh with Cilium (New Cluster)
Cilium implements the service mesh from the ground up using only eBPF, with no sidecars. Major clouds have already adopted it as a default data plane, including GKE Dataplane V2 and AKS's Azure CNI Powered by Cilium.
CNI (Container Network Interface): The plugin interface responsible for Pod networking in Kubernetes. flannel, Calico, and Cilium are CNI implementations. The range of network capabilities available depends on which CNI you choose at cluster creation time.
Replacing the CNI on an existing cluster is a significant undertaking, so if you're provisioning a new cluster, you can choose Cilium from the start.
# Based on Cilium v1.15 — Install with Service Mesh + Hubble observability via Helm
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set serviceMesh.enabled=true \
  --set authentication.mutual.spire.enabled=true   # v1.14+ feature
# Check installation status
cilium status --wait

Upon successful installation, you should see output like this:

    /¯¯\
 /¯¯\__/¯¯\    Cilium:          OK
 \__/¯¯\__/    Operator:        OK
 /¯¯\__/¯¯\    Envoy DaemonSet: disabled (using embedded mode)
 \__/¯¯\__/    Hubble Relay:    OK
    \__/       ClusterMesh:     disabled

DaemonSet         cilium             Desired: 3, Ready: 3/3, Available: 3/3
Deployment        cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
Deployment        hubble-relay      Desired: 1, Ready: 1/1, Available: 1/1

From there, policies for applying mTLS between services are declared as Kubernetes CRDs.
# Enforce mTLS between services without any code changes
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: require-mtls
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: payment-service
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: order-service
    authentication:
      mode: required

SPIFFE/SPIRE: SPIFFE (Secure Production Identity Framework for Everyone) is the standard specification for proving service identity, and SPIRE is its implementation. Enabling `authentication.mutual.spire.enabled=true` causes SPIRE to issue a unique cryptographic identity (SVID) to each service, upon which mTLS authentication is based. The key benefit is managing service identity without any changes to application code.
| Item | Sidecar Approach | Cilium eBPF Approach |
|---|---|---|
| Traffic Path | App → iptables → Envoy → Destination | App → eBPF (kernel) → Destination |
| Additional Containers | One Envoy per Pod | None |
| Policy Enforcement Location | Inside sidecar (user space) | Kernel level (BPF Maps) |
| mTLS Support | Handled by Envoy | Integrated with SPIFFE/SPIRE |
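To make the policy's semantics concrete, here is a toy evaluation of the CRD above in plain Python: an ingress connection is allowed only if the destination matches endpointSelector and the source matches one of the fromEndpoints entries. This is a simplification for intuition; Cilium actually evaluates compact numeric identities in-kernel via BPF Maps, and the helper names here are mine, not Cilium's.

```python
# The policy from the YAML above, as data.
policy = {
    "endpointSelector": {"app": "payment-service"},
    "fromEndpoints": [{"app": "order-service"}],
}

def labels_match(selector, labels):
    """True when every key/value in the selector is present in the labels."""
    return all(labels.get(k) == v for k, v in selector.items())

def allowed(policy, dst_labels, src_labels):
    """Toy ingress decision for a single policy."""
    if not labels_match(policy["endpointSelector"], dst_labels):
        return True  # policy doesn't select this destination; not its concern
    return any(labels_match(s, src_labels) for s in policy["fromEndpoints"])

print(allowed(policy, {"app": "payment-service"}, {"app": "order-service"}))  # True
print(allowed(policy, {"app": "payment-service"}, {"app": "batch-job"}))      # False
```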
Example 2: Applying eBPF Acceleration to an Existing Istio Environment with Merbridge
If you're already using Istio, you have the option to reduce latency by replacing iptables with eBPF — without a full replacement. That's what Merbridge does. The practical appeal is that it requires no Istio configuration changes and no code modifications.
# Apply Merbridge to an existing Istio cluster (also supports Linkerd, Kuma)
# ⚠️ The command below is intended for lab environments.
# In production, download the manifest first, review its contents, then apply.
kubectl apply -f https://raw.githubusercontent.com/merbridge/merbridge/main/deploy/all-in-one.yaml
# Verify — check eBPF program load status
kubectl -n merbridge get pods
kubectl -n merbridge logs -l app=merbridge --tail=50

Merbridge replaces existing iptables rules with eBPF socket redirection. Because packets connect directly to the destination at the socket level rather than traversing the entire network stack, round-trip latency decreases. The biggest advantage is that the Istio control plane configuration doesn't need to be touched at all.
Example 3: Gaining Observability with Pixie Without Changing a Single Line of Code
If you've put off setting up distributed tracing because it's a hassle, only to regret it after an incident — Pixie can eliminate that nagging feeling. Without changing a single line of code and without any sidecars, it automatically captures service traffic.
One prerequisite worth knowing: px deploy requires a Pixie Cloud account and the cluster must allow outbound connections to the external cloud. The phrase "zero instrumentation" does not mean "fully self-contained." For air-gapped environments or environments with restricted cloud connectivity, consider the Cilium + Hubble combination as an alternative.
# ⚠️ The curl | bash pattern does not let you inspect script contents beforehand.
# In untrusted environments, download the script first, review it, then execute.
# Install Pixie CLI (requires a Pixie Cloud account)
curl -fsSL https://withpixie.ai/install.sh | bash
# Deploy Pixie to the cluster
px deploy
# View HTTP requests/responses in real time (last 5 minutes, production namespace)
px run px/http_data -- \
  -start_time="-5m" \
  -namespace="production"

Immediately after deployment, it auto-parses protocols such as HTTP, gRPC, MySQL, and Redis, displaying requests/responses, latency, and error rates. This is possible because eBPF intercepts socket data at the kernel level.
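To give a feel for what "auto-parsing" means, here is a deliberately tiny Python sketch of protocol inference on raw bytes read off a socket. Pixie's real parsers are far more complete and run on data captured by kernel-level eBPF probes; this only illustrates the idea of recognizing a protocol from captured socket data.

```python
def parse_if_http(raw: bytes):
    """Return HTTP request-line fields if the bytes look like HTTP, else None."""
    try:
        head = raw.split(b"\r\n", 1)[0].decode("ascii")
    except UnicodeDecodeError:
        return None  # binary junk: definitely not an HTTP request line
    parts = head.split(" ")
    if len(parts) == 3 and parts[2].startswith("HTTP/"):
        method, path, version = parts
        return {"method": method, "path": path, "version": version}
    return None  # not HTTP: a real tracer would try gRPC, MySQL, Redis, ...

captured = b"GET /api/orders HTTP/1.1\r\nHost: shop.internal\r\n\r\n"
print(parse_if_http(captured))  # → {'method': 'GET', 'path': '/api/orders', 'version': 'HTTP/1.1'}
```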
Pros and Cons Analysis
Advantages
When switching to the eBPF approach, the most tangible benefit is resource efficiency. With 100 services, eliminating sidecars can free up to 10GB of memory. I was quite surprised when I first calculated that number — it's enough to potentially reduce the node count by one or two. Performance genuinely differs as well. In high-TPS environments, a single sidecar hop adding 1–3ms clearly accumulates at the p99 latency level.
| Item | Details |
|---|---|
| Performance | Eliminates 1–3ms of added latency per sidecar hop. Noticeable difference in high-TPS environments |
| Resource Efficiency | Saves 50–100MB of sidecar memory per Pod. Up to 10GB savings across 100 services |
| Visibility Scope | Kernel-level collection enables observation of syscalls, network events, and CPU cycles |
| Transparency | Network policies and monitoring applied without changes to application code or configuration |
| Operational Simplicity | No sidecar version management or rolling upgrades required |
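A quick sanity check of the headline numbers, using the per-sidecar ranges quoted in this article. The 5-hop call chain is an illustrative assumption, not a measurement:

```python
services = 100
sidecar_mem_mb = (50, 100)   # memory per Envoy sidecar, per the article
sidecar_latency_ms = (1, 3)  # added latency per proxy hop, per the article

# Fleet-wide memory tied up in sidecars, in GB.
mem_saved_gb = tuple(services * m / 1024 for m in sidecar_mem_mb)

# A request crossing 5 meshed services pays the hop tax 5 times.
extra_latency_ms = tuple(5 * l for l in sidecar_latency_ms)

print(f"memory freed: {mem_saved_gb[0]:.1f}-{mem_saved_gb[1]:.1f} GB")
print(f"added latency on a 5-hop chain: {extra_latency_ms[0]}-{extra_latency_ms[1]} ms")
```

The upper bound works out to roughly 9.8 GB, matching the "up to 10GB" figure above, and a 5-hop chain at the pessimistic end adds 15ms, which is why the effect shows up at p99 in high-TPS environments.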
Disadvantages and Caveats
Having many advantages doesn't mean migration is always the right answer. The L7 processing area in particular still has real-world limitations, and in multi-tenant environments you must carefully evaluate the level of security isolation.
| Item | Details | Mitigation |
|---|---|---|
| L7 Processing Limitations | Size limits and loop count restrictions on eBPF programs make complex HTTP/gRPC routing difficult | Delegate L7 to a separate proxy, such as Istio Ambient's waypoint proxy |
| Security Isolation Level | Node-shared proxy approach provides weaker isolation than sidecars in multi-tenant environments | Consider running sidecars in parallel where sensitive tenant separation is required |
| Maturity | Cilium Service Mesh and Istio Ambient Mesh still have limited enterprise production references | Recommended to adopt gradually, starting from new clusters |
| Debugging Difficulty | Tracing kernel-level issues is more complex than with sidecars | Recommended to become familiar with eBPF-specific debugging tools such as Hubble and bpftool |
| Kernel Version Dependency | Linux 5.10+ recommended for advanced eBPF features; some require 5.15+ | Mandatory to verify node OS version in advance |
waypoint proxy: The L7 processing component in Istio Ambient Mesh. L4 is handled by ztunnel per node, and the per-namespace waypoint proxy only intervenes when L7 policies like HTTP/gRPC are needed. Because proxies are placed only where needed, this is far more efficient than maintaining sidecars everywhere.
The Most Common Mistakes in Practice
- Attaching eBPF features without checking the kernel version — Run `uname -r` to check the node kernel version first. If you're running Ubuntu 20.04 LTS (kernel 5.4) in an on-premises environment, some advanced features will be unavailable. This doesn't resolve itself automatically just because you're on a managed cluster: GKE and AKS have eBPF-ready environments by default, but EKS uses AWS VPC CNI as its default CNI, so separate configuration is required to use Cilium.
- Mistaking Cilium adoption for a full Istio replacement — Cilium handles L4 network policies and basic mTLS well, but if you need complex L7 traffic control such as header-based routing or canary deployments, Istio or a waypoint proxy is still necessary.
- Migrating without reviewing isolation requirements in multi-tenant environments — In environments where different customers' workloads share the same nodes, a node-level proxy approach may raise issues in security audits. It's better to discuss this with the security team beforehand — much more comfortable than having it surface during an audit.
Closing Thoughts
The reason sidecar-less service meshes are attracting attention is not simply because they're trendy. eBPF has become the most realistic direction for pushing service mesh infrastructure costs down into the kernel — securing both observability and security outside application code. If the sidecar approach embodied the mindset of "inject a proxy into every Pod," eBPF represents a shift in thinking: "the kernel already sees all traffic — let's process it there."
Here is a suggested order for getting started right away.
Before You Start: Run kubectl get nodes -o wide to check the node OS, and verify with uname -r that the kernel version is 5.10 or higher. If you're in an EKS environment, it's also worth finding out in advance whether a separate Cilium configuration is needed.
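If you want to automate that check across a fleet, a small version-gate helper might look like this. The 5.10 threshold is the recommendation above; the helper name is mine, and you would feed it the output of `uname -r` (the sample release strings below are illustrative):

```python
def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Parse a release string like '5.15.0-91-generic' and compare versions."""
    parts = release.split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)

# Ubuntu 20.04 LTS ships 5.4: below the recommended floor.
print(kernel_at_least("5.4.0-150-generic", 5, 10))   # → False
# Ubuntu 22.04 LTS ships 5.15: fine, including most 5.15+ features.
print(kernel_at_least("5.15.0-91-generic", 5, 10))   # → True
```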
- Start with observation — For a new cluster, visualize current service traffic patterns without code changes using the Cilium + Hubble combination. For an existing Istio environment, apply Merbridge and then use Hubble or Pixie (if cloud connectivity is available).
- Start new clusters with Cilium — Replacing the CNI in an existing environment is a major undertaking, but for new clusters you can choose Cilium from the start and secure core features without a separate service mesh.
- Accelerate existing Istio environments with Merbridge first — Simply replacing iptables with eBPF, without a full replacement, is enough to observe latency improvements. It's not too late to consider migrating to Ambient Mesh after seeing those results.