Seeing Into the Kernel Without Changing a Single Line of Code with eBPF — A Practical Guide to Kubernetes Observability
Sometimes in production you get a strange latency spike: there's nothing in the logs, and your APM registers little more than "slow." I dealt with that frustration for quite a while myself. The problem was happening at the kernel level, and our observability tools simply couldn't reach that far.
eBPF is the technology that resolves that frustration. Without touching kernel source code, loading a new kernel module, or modifying a single line of application code, you can inspect CPU, memory, network, and system calls all at once. As of 2025, AWS EKS Anywhere ships eBPF-based Cilium as its default networking plugin, and according to CNCF reports, production adoption has grown 300% year-over-year — it has already become mainstream.
After reading this article, you'll be able to run bpftrace in production or directly experience what it means to attach an APM to a Kubernetes cluster without any code changes. That said, to be honest, the hands-on examples will resonate most immediately with backend and infrastructure engineers who have experience working with Kubernetes or Linux servers.
Core Concepts
What eBPF Does in the Kernel
eBPF (Extended Berkeley Packet Filter) is a massive extension of BPF, which was originally designed for network packet filtering. The name contains "packet filter," making it sound like a networking-only technology, but today it is used across the full spectrum of observability, security, and networking.
Here's how it works:
- A user writes an eBPF program in C or a high-level DSL
- The kernel's verifier statically analyzes the program to preemptively block infinite loops or abnormal memory accesses
- Once verification passes, the JIT compiler transforms it into native machine code
- It attaches as a hook to kernel events (system calls, network packets, function entry/return) and executes
```
// Example of measuring openat() system call latency with bpftrace
// It looks like touching kernel internals, but it runs in a safely sandboxed environment
tracepoint:syscalls:sys_enter_openat
{
    @start[tid] = nsecs;
}

tracepoint:syscalls:sys_exit_openat
/@start[tid]/
{
    @latency_us = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}
```

> **What is the verifier?** It's a safety checker that the kernel itself runs before an eBPF program is loaded. It uses static analysis to confirm that "running this code won't crash the kernel" before permitting execution. It's not the same as full formal verification, but thanks to it, eBPF can be used much more safely than kernel modules.
I initially found the verifier intimidating and thought eBPF was too risky, but once you actually use it, you develop a confidence that "with safety measures like this, it's fine in production." The code the verifier rejects is genuinely code that could endanger the kernel, and within those constraints, there's still plenty you can do for observability purposes.
Why Zero-Instrumentation Is Powerful
Traditional observability approaches fell into two categories: embedding an SDK in application code, or attaching a sidecar container. Both require code changes or infrastructure modifications, and sidecars consume additional resources per pod.
eBPF directly intercepts events at the kernel level, so neither of those burdens applies. When an HTTP request comes in, it passes through the kernel network stack, and that's where it's captured directly — regardless of whether the application is Python, Go, or JVM.
| Observability Approach | Code Changes | Overhead | Visibility Scope |
|---|---|---|---|
| SDK Injection | Required | Medium | Application layer |
| Sidecar (Envoy, etc.) | Not required | High | L7 traffic |
| eBPF | Not required | Very low (1–3% CPU) | Full kernel through L7 |
Choosing a Tool — bpftrace vs Cilium vs Pixie
Before looking at hands-on examples, it's worth clarifying "which one should I use?" The three tools have overlapping areas but their primary use cases are entirely different.
| Tool | When to Use |
|---|---|
| bpftrace | When you need one-off debugging on a specific server or process. Ad-hoc profiling such as flame graphs and system call latency |
| Cilium + Hubble | When you want continuous monitoring of network traffic across an entire Kubernetes cluster. When you have permission to replace the CNI |
| Pixie | When you want to attach HTTP/gRPC/DB APM immediately without code changes. Only requires kubectl access to the cluster |
Practical Applications
Example 1: Real-Time Production Server Profiling with bpftrace
When: When you want to investigate on the spot why a specific process is slow, or when you want to try it immediately on a single server without any cluster installation.
Honestly, at first I wondered "is it really okay to run something like this in production?" But because it runs as JIT-compiled native code, the CPU overhead is generally around 1–3%. Due to how bpftrace works — tracepoint hooks only execute when they're triggered — on a quiet server it's practically imperceptible.
You might be curious why the code examples use tracepoint. The kprobe approach attaches directly to internal kernel functions, and function names can change between kernel versions. In contrast, tracepoint uses hook points that the kernel officially provides as a stable interface, making it far more portable across versions.
```shell
# Check the latency distribution of read() system calls for a specific process
# The PID is passed as a positional parameter and is available as $1 inside the program
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_read
/pid == $1/
{ @start[tid] = nsecs; }

tracepoint:syscalls:sys_exit_read
/@start[tid]/
{
    @read_latency_us = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}
' $(pgrep nginx)
```

```shell
# Collect a CPU flame graph — see at a glance where CPU is being consumed most
# ustack captures the user-space call stack; switch to kstack to see kernel stacks too
sudo bpftrace -e '
profile:hz:99
/pid == $1/
{ @[ustack] = count(); }
' $(pgrep nginx)
```

| Command Element | Description |
|---|---|
| `tracepoint:syscalls:sys_enter_read` | Attach a hook at the entry point of the `read()` system call |
| `@start[tid]` | Store the start timestamp keyed by thread ID (a BPF Map — shared storage for exchanging data between eBPF programs and user space) |
| `profile:hz:99` | Sample 99 times per second (CPU profiling) |
| `ustack` | Collect the user-space call stack (switch to `kstack` for kernel stacks) |
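The ustack output is not a flame graph by itself: bpftrace prints each distinct stack as a map key with a count, while flame graph tooling such as Brendan Gregg's flamegraph.pl consumes one "folded" line per stack. Here is a minimal Python sketch of that folding step, assuming the usual bpftrace map print format shown in the sample string (the FlameGraph repository ships stackcollapse scripts for real use):

```python
def collapse(bpftrace_output: str) -> list:
    """Fold bpftrace's '@[ustack] = count()' map output into
    'root;child;leaf count' lines for flame graph tools.
    Assumes the common print format: '@[', one frame per line
    (leafmost first), then ']: <count>'."""
    folded, frames = [], []
    for line in bpftrace_output.splitlines():
        line = line.strip()
        if line.startswith("@["):
            frames = []                      # new stack begins
        elif line.startswith("]:"):
            count = int(line[2:].strip())    # sample count for this stack
            # drop '+offset' suffixes and reverse so the root frame comes first
            names = [f.split("+")[0] for f in reversed(frames)]
            folded.append(";".join(names) + f" {count}")
        elif line:
            frames.append(line)
    return folded

# Made-up sample in the shape bpftrace prints
sample = """
@[
    read_loop+12
    worker_main+88
    main+31
]: 42
"""
print(collapse(sample))  # ['main;worker_main;read_loop 42']
```

Piping the folded lines into flamegraph.pl then renders the interactive SVG.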
> **What is a BPF Map?** It's the conduit for bringing data collected by an eBPF program inside the kernel out to user space. `@start[tid]`, `@read_latency_us`, and every variable with the `@` prefix in the examples above are BPF Maps. bpftrace handles this internally so you don't need to declare them explicitly, but understanding the flow (collect in the kernel → store in a Map → read from user space) will be very helpful when writing more complex programs.
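That flow is easy to trace end-to-end in plain Python. The sketch below simulates both halves with dicts standing in for the `@start` and `@read_latency_us` maps, and reproduces the power-of-two bucketing that bpftrace's `hist()` applies (the event stream and bucket representation are made up for illustration):

```python
def hist_bucket(value):
    """Return the power-of-two bucket a value falls into,
    mimicking bpftrace's hist() log2 histogram."""
    if value <= 0:
        return (0, 0)
    lo = 1 << (value.bit_length() - 1)
    return (lo, lo * 2 - 1)

start = {}           # stands in for @start[tid], written on syscall entry
latency_hist = {}    # stands in for @read_latency_us, read by user space

# Simulated syscall events: (kind, thread id, timestamp in ns)
events = [
    ("enter", 101, 1_000),
    ("enter", 102, 2_000),
    ("exit",  101, 9_000),     # 8 us latency
    ("exit",  102, 130_000),   # 128 us latency
]

for kind, tid, t_ns in events:
    if kind == "enter":
        start[tid] = t_ns
    elif tid in start:
        us = (t_ns - start.pop(tid)) // 1000
        bucket = hist_bucket(us)
        latency_hist[bucket] = latency_hist.get(bucket, 0) + 1

print(latency_hist)  # {(8, 15): 1, (128, 255): 1}
```

In the real system the entry/exit halves run inside the kernel and only the aggregated histogram crosses the kernel/user boundary, which is why the overhead stays so low.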
Example 2: Visualizing Kubernetes Traffic with Cilium + Hubble
When: When you want continuous monitoring of service-to-service communication across an entire Kubernetes cluster. When you have permission to replace the CNI and want to gain both network observability and security policy simultaneously.
In a Kubernetes environment, Cilium is the fastest path to introducing eBPF observability. Once installed, opening the Hubble UI lets you see inter-pod traffic flows, latency, and dropped packets in real time.
```shell
# Install Cilium + Hubble with Helm
# It's recommended to check the latest version before pinning one:
# helm search repo cilium/cilium -o json | jq -r '.[0].version'
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --version 1.16.0 \
  --namespace kube-system \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true
```

```shell
# View real-time traffic flows with the Hubble CLI
# Filter dropped HTTP traffic in the production namespace
hubble observe --namespace production \
  --protocol http \
  --verdict DROPPED
```

```shell
# Check inter-service latency statistics
# Example output appears in the form: {"flow":{"l7":{"latency_ns":1523400}}}
hubble observe --namespace production \
  --type l7 \
  --output json | jq '.flow.l7.latency_ns'
```

Example 3: Building APM Without Code Changes Using Pixie
When: When you want to attach APM as quickly as possible. Only requires kubectl access, and you want to automatically trace all HTTP/gRPC/DB requests without replacing the CNI or modifying code.
Pixie is one of the most impressive tools I've personally used. Deploy it with a single CLI command and it automatically captures the full bodies of HTTP, gRPC, and MySQL requests and even generates flame graphs.
The official installer uses the curl | bash pattern, which executes a script you never get the chance to review. If you're in a security-sensitive environment, download the script from the official GitHub repository first, read through its contents, and then run it.
```shell
# Install the Pixie CLI (recommended to review the script before installing)
# For security-sensitive environments: curl -fsSL https://withpixie.ai/install.sh -o install.sh && cat install.sh
bash -c "$(curl -fsSL https://withpixie.ai/install.sh)"
px deploy
```

```shell
# Query P99 latency per HTTP endpoint
px run px/http_data_filtered -- \
  -start_time '-5m' \
  -namespace 'production'

# Automatically detect slow MySQL queries
px run px/mysql_stats -- \
  -start_time '-10m'
```

> **What is PxL?** It's the query language used by Pixie, which lets you query kernel-level data collected by eBPF using pandas-like syntax. Queries run directly against local storage within the cluster, so the data never leaves it, which is another advantage.
Pros and Cons Analysis
Advantages
| Item | Details |
|---|---|
| Zero-instrumentation | Collect telemetry without modifying application code or container images |
| Low overhead | Runs as JIT-compiled native code, typically around 1–3% CPU overhead |
| Wide visibility | Can simultaneously observe from kernel to user space, L3 through L7 network |
| Safety guarantee | The verifier preemptively blocks code that could cause kernel crashes via static analysis |
| Dynamic attachment | Add/remove probes at runtime without reboots or service interruptions |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Learning curve | Requires understanding of kernel internals, C language, and verifier constraints | Recommended to start with high-level tools like Pixie and Coroot |
| Kernel version dependency | Features vary by kernel version; difficult to support legacy environments like older RHEL | Verify BTF/CO-RE support before adoption |
| Program constraints | No infinite loops; limitations in implementing complex logic | Bring data up to user space for complex analysis |
| No analysis layer | eBPF is a collection tool; visualization and analysis require a separate layer | Integrate with Prometheus + Grafana or Hubble UI |
| Linux only | Not supported on Windows/macOS (Windows eBPF project is in early stages) | Use Linux VM via Lima/Multipass for development environments |
| Security risks | Potential for kernel-level attacks via malicious eBPF programs; requires elevated privileges | Apply principle of least privilege with CAP_BPF capability; verify program signatures |
To share how these drawbacks bite in practice: kernel version dependency comes up far more often than expected. I remember being caught off guard when things worked fine in the development environment but didn't work at all on an on-premises legacy RHEL system. I also assumed that attaching eBPF meant observability was done, only to realize too late that I needed to design a separate visualization pipeline — it's worth planning Prometheus + Grafana integration from the very beginning.
> **What is BTF (BPF Type Format)?** It's a format that embeds type information so eBPF programs can be ported regardless of kernel version. BTF itself was introduced in kernel 4.18; to fully leverage CO-RE (Compile Once – Run Everywhere), you need kernel 5.2 or higher built with the `CONFIG_DEBUG_INFO_BTF` option, and kernel 5.4 or higher for full CO-RE functionality. It's good practice to check with `uname -r` and `bpftool feature` first.
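If you want to script that check rather than eyeball it, a small Python sketch works too. The `/sys/kernel/btf/vmlinux` path is the standard location on kernels built with `CONFIG_DEBUG_INFO_BTF=y`; the version threshold below follows the 5.4 rule of thumb mentioned above:

```python
import os
import platform

def btf_ready() -> dict:
    """Report kernel version, whether the running kernel exposes BTF,
    and whether the version clears the usual CO-RE threshold (5.4+)."""
    release = platform.release()                        # e.g. '5.15.0-91-generic'
    major, minor = (int(x) for x in release.split(".")[:2])
    return {
        "kernel": release,
        "btf_exposed": os.path.exists("/sys/kernel/btf/vmlinux"),
        "co_re_ok": (major, minor) >= (5, 4),
    }

print(btf_ready())
```

Running this on each node class before rollout catches the legacy-RHEL surprise described above early, while it's still cheap to fix.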
The Most Common Mistakes in Practice
- **Skipping the kernel version check** — eBPF features vary significantly by kernel version, so checking `uname -r` and `bpftool feature` is the first step. Keep in mind that CO-RE requires kernel 5.4 or higher, and BTF (`CONFIG_DEBUG_INFO_BTF`) requires 5.2 or higher.
- **Unconditionally granting root privileges** — Loading eBPF programs requires privileges, but on modern kernels (5.8+), the `CAP_BPF` capability alone is sufficient. It's best to avoid unconditionally setting `privileged: true` on containers in production.
- **Expecting eBPF to handle both collection and analysis** — eBPF's role ends at collecting data and storing it in BPF Maps. Keeping in mind from the start of your design that visualization, alerting, and aggregation require a separate pipeline (Prometheus, Grafana, Jaeger, etc.) will save you the trouble of overhauling your architecture later.
Closing Thoughts
eBPF is not just a debugging tool — it's a paradigm shift that moves the observability architecture itself down to the kernel layer. The ecosystem for peering all the way into the kernel without code changes is already well-established, and now is a good time to start adopting it.
There's no need to start with complex kernel programming. I strongly recommend approaching it by trying high-level tools first and gradually going deeper.
Three steps you can take right now:
1. **Install bpftrace on a local or development server and explore the basic tools** — Install with `sudo apt install bpftrace` or `brew install bpftrace` (requires a Linux VM), then run `bpftrace -l 'tracepoint:syscalls:*'` to browse the list of available hooks. You'll get a direct feel for which points in the kernel eBPF can attach to.
2. **If you have a Kubernetes cluster, deploy Pixie** — A single `px deploy` command installs it; you can immediately see HTTP/gRPC/DB request tracing working without any code changes.
3. **If you're considering upgrading production networking, evaluate a Cilium migration** — The significant throughput improvement of the eBPF datapath over traditional iptables can be verified in Cilium's official benchmark documentation, and you get Hubble-based L7 observability at the same time.
Next article: A practical guide to configuring a Kubernetes service mesh without sidecars using Cilium + Hubble — walking through the process of migrating from iptables to an eBPF datapath step by step.
References
- What is eBPF? | ebpf.io
- eBPF Applications Landscape | ebpf.io
- Introduction to eBPF for Observability | Better Stack
- eBPF in 2026: The Kernel Revolution | DEV Community
- eBPF Ecosystem Progress in 2024–2025 | eunomia
- 8 Best Open-Source eBPF Tracing Tools | Better Stack
- eBPF Tools: Falco, Inspektor Gadget, Hubble, Cilium | The New Stack
- Advantages and Disadvantages of eBPF | Alibaba Cloud
- When (And When Not) to Use eBPF | Cloud Native Now
- What is eBPF? | Datadog Knowledge Center
- eBPF for Enhanced Observability | New Relic