Grafana Loki + Tempo: Implementing Bidirectional Log-Trace Drill-Down with a Single Trace ID
Who this is for: A guide for backend developers who have experience operating Prometheus/Grafana but are new to Loki and Tempo. Basic Kubernetes operation experience and Docker Compose familiarity are sufficient to follow along.
When an incident occurs in a production service, the typical debugging routine goes like this: you spot a spike in error rates on the Grafana dashboard, open a Kibana tab to search through logs, then open Jaeger to explore distributed traces. The more this context switching repeats, the more critical clues you lose and the longer your Mean Time To Identify (MTTI) grows. In a microservices environment where a single request passes through dozens of services, finding the root cause with fragmented tools is inherently difficult.
After reading this article, you will be able to set up a bidirectional drill-down environment where you can click a single log line to view the full distributed trace, and conversely navigate back from a slow span to its associated logs. The core idea is simpler than you might expect: embed a single Trace ID in your logs, and Grafana Loki and Tempo will take care of the rest.
This article covers OpenTelemetry Collector pipeline configuration (Step 1), Grafana datasource bidirectional link setup (Step 2), and Node.js application instrumentation (Step 3), in that order. All example code is written based on configurations that actually work.
Core Concepts
The LGTM Stack and Signal Flow
The LGTM stack proposed by Grafana Labs consists of components each dedicated to one of the three core observability signals.
| Component | Role | Query Language |
|---|---|---|
| Loki | Label-based log aggregation. Cost-efficient due to no indexing of log content | LogQL |
| Grafana | Unified UI for visualizing metrics, logs, and traces on a single screen | — |
| Tempo | High-volume, low-cost distributed tracing backend. Leverages object storage (S3, etc.) | TraceQL |
| Mimir | Large-scale long-term metrics store (fully Prometheus-compatible) | PromQL |
Term definition LGTM is an acronym for Loki, Grafana, Tempo, and Mimir. Recently, Pyroscope (continuous profiling) has been added, expanding the stack into what is sometimes called the LGTMP stack.
Understanding the signal flow is the starting point for configuration. All telemetry follows a one-way pipeline: Application → Collector → each backend → Grafana.
```
Application (OTel SDK)
        │
        │ OTLP (gRPC 4317 / HTTP 4318)
        ▼
OpenTelemetry Collector
        ├──── Logs ───────▶ Loki  :3100
        ├──── Traces ─────▶ Tempo :4317
        └──── Metrics ────▶ Mimir :9009
                  │
                  ▼
            Grafana :3000
      (unified exploration UI)
```

Because the Collector routes signals in the middle, you can later swap the backend from Loki to another logging solution without modifying your application code.
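As an illustration of that swap, only the Collector's exporter wiring changes while the application keeps emitting OTLP untouched. A hedged sketch using the generic `otlphttp` exporter (the backend name and endpoint are assumptions, not part of the stack above):

```yaml
# Only the Collector config changes; the application is unaffected.
exporters:
  otlphttp/newlogs:
    endpoint: http://other-log-backend:4318   # illustrative endpoint

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/newlogs]           # was: [loki]
```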
The Key to Bidirectional Linking: Trace ID
The connecting link of Full-Stack Observability is the Trace ID / Span ID. When an application writes a log and includes the current request's Trace ID, the following bidirectional drill-down becomes possible:
```
Metrics (Mimir)
   │ Detect error rate spike → explore that time window
   ▼
Logs (Loki)
   │ Click traceId in a log line → Derived Fields activates
   ▼
Traces (Tempo)
   │ Identify slow spans → pinpoint root cause
   ▼
Profiling (Pyroscope) — code-line level analysis
```

Key insight If a log contains just one Trace ID, bidirectional navigation between Loki → Tempo and Tempo → Loki is completed with a single click. This is the decisive difference between simple log collection and Full-Stack Observability.
OpenTelemetry: The Vendor-Neutral Instrumentation Standard
Since 2024, OpenTelemetry has established itself as the de facto instrumentation standard. Applications are instrumented with the OTel SDK and send metrics, logs, and traces to a single OpenTelemetry Collector endpoint. The Collector then routes each signal to the appropriate backend.
For log collection agents, Grafana Alloy, which integrates the legacy Promtail and Grafana Agent, is emerging as the next-generation standard. Alloy supports native OpenTelemetry pipelines and can collect all signals with a single agent, so for new projects it is worth considering Alloy over Promtail.
Practical Implementation
Three files work together in this structure:
```
project/
├── otel-collector-config.yaml   # Step 1: Collector pipeline
├── datasources.yaml             # Step 2: Grafana datasource provisioning
└── src/
    ├── tracing.ts               # Step 3: SDK initialization (loaded before entry point)
    └── logger.ts                # Step 3: Logger with automatic Trace ID injection
```

Step 1: Configuring the OpenTelemetry Collector Pipeline
Configure the Collector first, as it serves as the entry point for the LGTM stack. The configuration below is a minimal pipeline that routes the three signals received via OTLP to their respective backends.
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}

processors:
  batch: {}  # Batches backend transmissions. Omitting this causes performance issues under high load

exporters:
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
    labels:
      attributes:
        service.name: "service_name"
        severity: "level"
  otlp/tempo:
    endpoint: tempo:4317  # gRPC: use host:port format without a scheme
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
```

| Config Item | Description |
|---|---|
| `processors.batch` | Batches requests for bulk delivery. Must be included in production environments |
| `receivers.otlp` | Receives both gRPC (4317) and HTTP (4318) protocols simultaneously |
| `exporters.loki.labels.attributes` | Maps OTel attributes to Loki labels. Keeping the number of labels minimal is important |
| `exporters.otlp/tempo.endpoint` | gRPC exporters must be specified in `host:port` format without a scheme |
| `service.pipelines` | Independent pipeline configuration per signal type |
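Before wiring up a full SDK, you can smoke-test the logs pipeline by posting an OTLP/JSON payload straight to the `otlp` receiver's HTTP port. The sketch below builds such a payload; the function name, scope name, and endpoint are illustrative, while the field names follow the OTLP/JSON encoding:

```typescript
// Build a minimal OTLP/JSON log payload for the Collector's HTTP receiver.
// POST it to http://<collector>:4318/v1/logs with Content-Type: application/json.
function buildOtlpLogPayload(serviceName: string, message: string, traceId?: string) {
  return {
    resourceLogs: [
      {
        resource: {
          attributes: [{ key: 'service.name', value: { stringValue: serviceName } }],
        },
        scopeLogs: [
          {
            scope: { name: 'pipeline-smoke-test' },
            logRecords: [
              {
                timeUnixNano: String(Date.now() * 1_000_000), // OTLP/JSON encodes uint64 as a string
                severityText: 'ERROR',
                body: { stringValue: message },
                // A 32-hex-char trace ID makes the record linkable in Tempo
                ...(traceId ? { traceId } : {}),
              },
            ],
          },
        ],
      },
    ],
  };
}

const payload = buildOtlpLogPayload(
  'my-api-service',
  'payment failed',
  '4bf92f3577b34da6a3ce929d0e0e4736'
);
// e.g. fetch('http://localhost:4318/v1/logs', { method: 'POST',
//   headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(payload) })
```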
Step 2: Configuring Grafana Datasource Bidirectional Links
After configuring the Collector pipeline, you need to set up Grafana to link the two datasources to each other. Managing this with a YAML provisioning file lets the entire team share the same configuration as code.
```yaml
# datasources.yaml (Grafana provisioning)
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    uid: tempo
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki        # Must match the uid of the Loki datasource below
        spanStartTimeShift: "-2s"  # Timestamp offset correction (absorbs clock sync drift)
        spanEndTimeShift: "+2s"
        filterByTraceID: true
        filterBySpanID: false
        customQuery: false

  - name: Loki
    type: loki
    uid: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: '"traceId":\s*"(\w+)"'  # Extracts the traceId field from JSON logs
          url: "$${__value.raw}"
          datasourceUid: tempo
          urlDisplayLabel: "View Trace in Tempo"
```

| Item | Description |
|---|---|
| `tracesToLogsV2` | Tempo → Loki direction link. Automatically queries logs for the relevant time window when a span is clicked |
| `spanStartTimeShift` / `spanEndTimeShift` | NTP sync drift between servers can cause span times and log times to differ by a few seconds. Without this value, associated logs may not be retrieved when clicking from Tempo |
| `derivedFields.matcherRegex` | Loki → Tempo direction link. A regex that extracts the Trace ID from a log line. Must be adjusted to match your log format |
| `datasourceUid` | The `uid` values of Tempo and Loki must be correctly cross-referenced in each other's configuration for the link to work |
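Since the `matcherRegex` syntax is compatible with JavaScript regular expressions, you can sanity-check it against a real log line locally before provisioning it. A quick check (the sample log line is illustrative):

```typescript
// Sanity-check the Derived Fields regex against a sample JSON log line.
const matcherRegex = /"traceId":\s*"(\w+)"/;

const sampleLogLine =
  '{"level":"error","message":"payment failed","traceId":"4bf92f3577b34da6a3ce929d0e0e4736","spanId":"00f067aa0ba902b7"}';

// The first capture group is what Grafana substitutes into ${__value.raw}
const match = sampleLogLine.match(matcherRegex);
const traceId = match ? match[1] : null;

console.log(traceId);
```

If your logger emits a different field name (e.g. `trace_id`), the regex must be adjusted to match, or no link will appear in Explore.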
Step 3: Application Instrumentation — Node.js OpenTelemetry SDK
OTel SDK instrumentation is required for the application to automatically include the Trace ID in logs.
src/tracing.ts — The SDK initialization file. Must be loaded before the application entry point.
```typescript
// src/tracing.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPLogExporter } from '@opentelemetry/exporter-logs-otlp-http';
import { BatchLogRecordProcessor } from '@opentelemetry/sdk-logs';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { resourceFromAttributes } from '@opentelemetry/resources';
import { ATTR_SERVICE_NAME } from '@opentelemetry/semantic-conventions';

const sdk = new NodeSDK({
  resource: resourceFromAttributes({
    [ATTR_SERVICE_NAME]: 'my-api-service',
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4318/v1/traces',
  }),
  logRecordProcessor: new BatchLogRecordProcessor(
    // In development, swap to SimpleLogRecordProcessor to verify immediate delivery
    new OTLPLogExporter({ url: 'http://otel-collector:4318/v1/logs' })
  ),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```

Note The HTTP exporter (`exporter-trace-otlp-http`) requires the URL to include `http://` and a path (`/v1/traces`). When using the gRPC exporter (`exporter-trace-otlp-grpc`), you must use `host:port` format without a scheme to avoid connection failures. `BatchLogRecordProcessor` is recommended for production, while `SimpleLogRecordProcessor` is for development and debugging.
To load `tracing.ts` first when starting the application, run it as follows:

```bash
# With ts-node
ts-node -r ./src/tracing.ts src/main.ts

# After compiling
node -r ./dist/tracing.js dist/main.js
```

src/logger.ts — Automatically injects the Trace ID from the currently active span into logs via Winston.
```typescript
// src/logger.ts
import winston from 'winston';
import { trace } from '@opentelemetry/api';

const logger = winston.createLogger({
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format((info) => {
      const span = trace.getActiveSpan();
      if (span) {
        const { traceId, spanId } = span.spanContext();
        info.traceId = traceId;
        info.spanId = spanId;
      }
      return info;
    })(),
    winston.format.json()
  ),
  transports: [new winston.transports.Console()],
});

export default logger;
```

With this setup, every log automatically includes a `"traceId": "abc123..."` field. The Loki Derived Fields regex detects this field and generates a "View Trace in Tempo" link in the Grafana Explore view.
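The transform passed to `winston.format` above is just a plain function over the log record. A dependency-free sketch of the same injection step (the types and stubbed span context are illustrative, so it runs without Winston or a live SDK) shows exactly what ends up in each emitted line:

```typescript
// Minimal sketch of the traceId-injection step from logger.ts,
// with the span context stubbed out instead of read from the OTel SDK.
type LogInfo = { level: string; message: string; traceId?: string; spanId?: string };
type SpanContext = { traceId: string; spanId: string } | undefined;

// Mirrors the winston.format((info) => { ... }) transform above
function injectTraceContext(info: LogInfo, ctx: SpanContext): LogInfo {
  if (ctx) {
    info.traceId = ctx.traceId;
    info.spanId = ctx.spanId;
  }
  return info;
}

// Stand-in for span.spanContext() on the active span
const active = { traceId: '4bf92f3577b34da6a3ce929d0e0e4736', spanId: '00f067aa0ba902b7' };

const line = injectTraceContext({ level: 'error', message: 'payment failed' }, active);
console.log(JSON.stringify(line));
```

The serialized output is precisely the shape the Derived Fields regex from Step 2 matches; when no span is active (e.g. a startup log), the record is emitted unchanged, without `traceId`.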
Pros and Cons Analysis
Advantages
| Item | Details |
|---|---|
| Cost efficiency | Loki's no-content-indexing approach and Tempo's use of object storage like S3 significantly reduce operational costs compared to ELK/Jaeger |
| Single UI | Metrics, logs, traces, and profiling can all be explored in a single Grafana screen |
| Cross-signal correlation | Full drill-down in all directions — Metrics → Logs → Traces → Profiling — is supported |
| Vendor neutrality | Adopting the OpenTelemetry standard means no lock-in to a specific cloud or vendor |
| Kubernetes-friendly | Official Helm Charts and the Alloy agent simplify cloud-native deployments |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Multiple query languages | PromQL, LogQL, and TraceQL each need to be learned separately | It is recommended to save frequently used queries as Grafana dashboards for reuse |
| High cardinality issues | Using high-cardinality fields like user_id or URLs as Loki labels causes severe performance degradation | Use only low-cardinality values such as service, env, and level as labels; include the rest in the log message body |
| Operational complexity | Each component requires individual management and upgrades | For small teams (3 or fewer DevOps staff), Grafana Cloud managed service may have a lower total cost than self-hosting |
| Initial setup difficulty | Bidirectional Trace-to-Logs setup requires adjusting timestamp offsets and Derived Fields regexes | Managing the provisioning YAML in this article as code ensures configuration reproducibility |
| UI maturity | Grafana Explore UI's log analysis UX is considered somewhat lacking compared to Kibana | Continuously improving since Grafana 12; mixing Loki with Kibana is also possible if needed |
Term supplement Cardinality refers to the number of unique values a given label can hold. The `env` label has low cardinality with values like production/staging/dev, but `user_id` can have millions of unique values, making it high cardinality. Because Loki creates a separate stream for each label combination, high-cardinality labels have a serious impact on memory and performance.
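In query terms, the mitigation is to select streams by low-cardinality labels and filter high-cardinality values in the message body at query time. A LogQL sketch (label and value names are illustrative):

```
Bad  — user_id as a stream label explodes the stream count:
  {service="my-api-service", user_id="8231"}

Good — select by low-cardinality labels, then line-filter the body:
  {service="my-api-service", env="production", level="error"} |= "user_id=8231"
```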
The Most Common Mistakes in Practice
- Using high-cardinality values as Loki labels: Setting `request_id` or `user_email` as labels causes the stream count to explode. The correct approach is to include these values in the log message body, not as labels.

- Not including Trace ID in logs: Without a Trace ID in the log, the Derived Fields regex won't match and no Logs-to-Trace link will be generated. It is recommended to use the OTel SDK as shown in `logger.ts` to ensure every log automatically includes a Trace ID.

- Not correcting timestamp offsets: NTP sync drift between servers can cause span times and log times to differ by a few seconds. Without the `spanStartTimeShift: "-2s"` setting, clicking from Tempo may result in no associated logs being returned.
Closing Thoughts
Simply embedding a single Trace ID in your logs completes the core connection of Full-Stack Observability linking metrics, logs, and traces. Even the seemingly complex LGTM stack can be adopted incrementally, centered around the OpenTelemetry Collector.
Three steps you can start right now:
- You can experience the LGTM stack locally first. Clone `grafana/intro-to-mlt`, the official Grafana Labs example repository, and a single `docker compose up` will give you a fully running environment with Loki, Tempo, Mimir, and Grafana. This repository is the first resource to look at.

- You can add the OpenTelemetry SDK to an existing service and verify that the `traceId` field is included in your logs. For Node.js, the single `@opentelemetry/auto-instrumentations-node` package is enough to start automatic tracing for HTTP, DB, and external requests.

- You can try managing your configuration as code in your team's repository, based on the `datasources.yaml` provisioning file in this article. This is far more reproducible than configuring through the UI and much easier to share with teammates.
Next article: TraceQL deep dive — filtering error spans, analyzing P99 latency per service, and writing cross-signal queries with Mimir metrics in Grafana Tempo 2.x
References
Recommended reading first:
- Configure trace to logs correlation | Grafana Docs — Official reference for the Step 2 configuration in this article
- Grafana intro-to-mlt | GitHub — Official example repository for local hands-on practice
Additional references:
- Grafana Tempo Official Documentation
- Trace correlations | Grafana Docs
- How to Build a Complete LGTM Stack with OpenTelemetry (2026)
- How to Correlate Logs and Traces with Loki and Tempo (2026)
- How to Set Up Trace-to-Logs Linking Between Grafana Tempo and Loki (2026)
- Full Stack Observability with Grafana, Prometheus, Loki, Tempo, and OpenTelemetry | Medium
- Kubernetes Observability with Grafana Stack (Prometheus, Loki, Tempo, Alloy) | Medium
- Monitoring Applications with OpenTelemetry, Alloy, Loki, Tempo & Mimir
- End-to-End Observability with Prometheus, Grafana, Loki, OpenTelemetry and Tempo | Improving
- Grafana Loki Pros and Cons 2025 | PeerSpot