LogQL Pipeline Parser Practical Guide — Extracting Fields from Unstructured Logs in Grafana Loki 3.x
When analyzing logs from a running service, you often encounter situations like these: Nginx access logs piling up as single-line text and you want to aggregate only specific status codes; you want to extract response time distributions from API server JSON logs; or you want to filter only error levels from custom text logs left by a legacy app. They all share one thing in common: nothing was defined at collection time.
Loki solves this problem by not indexing log content when storing it, but instead having parsers dynamically extract fields at query time. This means you don't need to design a schema or build an index in advance — you can pull the fields you want from the log body right now and use them for filtering and aggregation.
This article targets backend and infrastructure developers with basic Loki query experience. We'll walk through practical examples of when and how to combine the | json, | pattern, and | regexp parsers, and explain why Structured Metadata — introduced experimentally in Loki 2.9 and made official in 3.0 — must be placed before parsers in the pipeline.
Core Concepts
When first encountering Loki queries, stream selectors, parsers, and structured metadata may seem like independent concepts. But the three form a single hierarchy. The stream selector decides which group of log files to read, structured metadata filters pre-attached attributes on each log line, and the parser extracts fields from the log body at runtime. Query performance also varies significantly depending on this order.
The Two Axes of a LogQL Query: Stream Selector and Log Pipeline
```
{app="nginx", env="prod"}   -- Stream selector: decides which logs to read
| json                      -- Pipeline start: parse log body
| status_code >= 500
| line_format "{{.method}} {{.path}} -> {{.status_code}}"
```

- **Stream selector**: Written as `{label="value"}`, it determines which log streams to read. Since Loki builds its index from these labels, the narrower the selector, the faster the query.
- **Log pipeline**: Processing stages connected by the `|` symbol. They execute sequentially left to right, and the more lines the earlier stages discard, the lower the processing cost of the later stages.

**Pipeline execution principle**: Placing simple string filters (`|=`, `!=`) before the parser is best practice. If the parser runs first, it parses every unnecessary log line before discarding it.
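To make the principle concrete, here is a minimal before/after sketch (the `app` label and `level` field are illustrative, not from the examples above):

```
-- Inefficient: parses every line, then filters on the extracted label
{app="api-server"} | json | level="error"

-- Efficient: a cheap string filter discards most lines before parsing
{app="api-server"} |= "error" | json | level="error"
```

Keeping the `| level="error"` stage after the parser still matters: it excludes lines that merely contain the substring "error" somewhere else in the body.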
Parser Types and Selection Criteria
Parsers analyze log lines to generate temporary key-value labels. The generated labels are then used in subsequent stages for filtering (| status_code >= 500) or formatting (line_format).
| Parser | Suitable Log Format | Relative Speed |
|---|---|---|
| `json` | `{"key":"value"}` JSON logs | Fast |
| `logfmt` | `key=value key2=value2` format | Fast |
| `pattern "<pat>"` | Whitespace/delimiter-based unstructured logs | Very fast |
| `regexp "(?P<name>re)"` | Complex custom formats | Slow |
| `unpack` | Restoring logs serialized by the Promtail `pack` stage | Fast |
unpack is a parser that, when the Promtail (or Grafana Alloy) pack stage has serialized labels into the log line for storage, separates them back out at query time. It is not used for general log parsing.
**Selection criteria**: If logs are JSON, try `| json` first; if fields are separated by whitespace or delimiters, try `| pattern`; for other complex formats, fall back to `| regexp`. The `pattern` parser delivers up to 10x better performance than regular expressions (official benchmark).
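The table lists `logfmt`, but the examples below focus on the other parsers, so here is a minimal `logfmt` sketch (the `app` label and field names are illustrative):

```
-- Sample line: level=error msg="connection timeout" duration=1.2s
{app="worker"}
| logfmt
| level = "error"
| line_format "{{.msg}} ({{.duration}})"
```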
Structured Metadata — The Third Label Tier
Structured metadata is a third label tier first introduced experimentally in Loki 2.9 and promoted to an official feature in Loki 3.0. It sits between index labels and log lines, allowing unique-value fields to be attached to logs without increasing index cardinality.
```
┌────────────────────────────────────────────────────┐
│ Index labels         app="api", env="prod"         │ ← Stream definition, low cardinality required
│ Structured metadata  trace_id="7f3a92b1"           │ ← Stores unique values, no index cost
│ Log line             {"level":"error","msg":"..."} │ ← Actual log content
└────────────────────────────────────────────────────┘
```

When sending logs to Loki via the OpenTelemetry Collector, OTel's Resource Attributes and Log Attributes are automatically stored as structured metadata. At query time, they can be accessed using the same syntax as stream selectors.
```
-- Structured metadata filter: usable directly without a parser
{namespace="production"} | trace_id="7f3a92b1"
```

**Bloom Filter acceleration (Loki 3.3+)**: Structured metadata filters like `trace_id` and `span_id` must be placed before the parser to benefit from Bloom Filter index acceleration. Placing them after the parser, as in `| json | trace_id="abc"`, causes a full log scan.
Practical Application
The examples are divided into two groups. Basic parsing patterns (Examples 1–3) cover the fundamental usage of each parser, while advanced usage (Examples 4–6) covers structured metadata, metric conversion, and label normalization.
Basic Parsing Patterns
Example 1: Aggregating HTTP 500 Errors from JSON API Logs
Sample input log:

```
{"level":"error","method":"POST","path":"/api/orders","status_code":500,"duration_ms":342}
```

Query:

```
{app="api-server"}
| json
| status_code >= 500
| line_format "{{.method}} {{.path}} -> {{.status_code}}"
```

| Stage | Role |
|---|---|
| `json` | Parses the log line and extracts `status_code`, `method`, `path`, etc. as temporary labels |
| `status_code >= 500` | Numeric comparison filter using the extracted `status_code` label |
| `line_format "..."` | Reformats the output using Go template syntax (`{{.fieldName}}`) |
Handling nested JSON: If the log contains a nested structure like {"request":{"method":"POST","path":"/api"}}, you can specify particular fields for extraction.
```
{app="api-server"}
| json method="request.method", path="request.path"
```

Using the `| json labelName="nested.path"` form to specify the path directly lets you extract inner fields under a label name of your choice.
Example 2: Parsing Nginx Access Logs with the pattern Parser
Sample input log:

```
127.0.0.1 - frank [10/Oct/2025:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
```

Query:

```
{job="nginx"}
| pattern `<ip> - <user> [<ts>] "<method> <path> <proto>" <status> <bytes>`
| status = "500"
| line_format "{{.ip}} | {{.path}}"
```

Marking positions in the `<fieldName>` format extracts the value at each position as a label. Placing the input log and the pattern template side by side lets you immediately verify the correspondence.
**Caution with fields containing spaces**: When multiple space-separated fields appear inside quotes, as in `"<method> <path> <proto>"`, the quotes and space delimiters in the pattern must match the actual log exactly. This is the most common mistake when writing patterns for the first time, so it's recommended to first verify extraction results with `| line_format "{{.method}} {{.path}}"`.
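One related convenience: positions you don't need can be skipped with the unnamed placeholder `<_>`, which the pattern parser discards without creating a label. Applied to the same Nginx line, a sketch that keeps only method, path, and status:

```
{job="nginx"}
| pattern `<_> - <_> [<_>] "<method> <path> <_>" <status> <_>`
| line_format "{{.method}} {{.path}} -> {{.status}}"
```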
Example 3: Parsing Custom Legacy Logs with the regexp Parser
Sample input log:

```
ERROR 2025-10-10 Null pointer exception in PaymentService.process()
```

Query:

```
{app="legacy-app"}
| regexp `(?P<level>\w+)\s+(?P<ts>\d{4}-\d{2}-\d{2})\s+(?P<msg>.+)`
| level = "ERROR"
```

Naming a capture group with the `(?P<name>pattern)` syntax creates a label with that name. Validating the pattern at regex101.com (select the Go flavor) before applying it reduces trial and error.
Advanced Usage
Example 4: Distributed Trace Correlation with OTel Structured Metadata
A pattern for extracting only the logs matching a specific trace_id from logs collected via OpenTelemetry. Since trace_id is stored as structured metadata, it can be filtered directly without a parser.
```
{namespace="production"}
| trace_id="7f3a92b1"
| json
| line_format "{{.message}}"
```

The `trace_id` filter must be placed before the `| json` parser for Loki 3.3+'s Bloom Filter acceleration to apply. Reversing the order causes a full log scan.
Example 5: Converting Numeric Data in Logs to Metrics with unwrap
If a log line contains a numeric value such as response time, you can use unwrap to extract it for use in range aggregations.
```
sum by (endpoint) (
  avg_over_time(
    {app="api"} | json | unwrap duration_ms | __error__="" [5m]
  )
)
```
`rate()` is a function for counter increments (events per second). When computing the average of measured values like response times, use `avg_over_time()` to get correct results.
The `| __error__=""` filter is there for a reason: `unwrap` generates an `__error__` label whenever the target field cannot be converted to a number, and without this filter to exclude conversion-failed lines, aggregation results may be skewed.
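The same unwrap pipeline plugs into other range aggregations as well; for example, a 99th-percentile latency per endpoint, as a sketch using the same illustrative labels:

```
quantile_over_time(0.99,
  {app="api"} | json | unwrap duration_ms | __error__="" [5m]
) by (endpoint)
```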
Example 6: Normalizing Label Names with label_format
When multiple services log fields with the same meaning under different names, label_format can unify the naming.
```
{app="payments"}
| json
| label_format svc=app, req_id=request_id
```

**Difference between rename and copy**: The `label_format dst=src` form in LogQL moves (renames) the value of the `src` label to the name `dst`; the original `src` label is removed from the result. If you want to keep the original while also adding a new name, use the template form `| label_format new_name="{{.old_name}}"` instead.
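The two forms side by side, as a sketch (labels illustrative):

```
{app="payments"}
| json
| label_format svc=app                -- rename: app is removed, svc takes its value
| label_format app_copy="{{.svc}}"    -- copy: svc is kept, app_copy is added
```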
Pros and Cons
Advantages
| Item | Description |
|---|---|
| No upfront schema required | Parsers dynamically generate labels at query time, so there is no need to define a schema at the collection stage |
| Cardinality problem resolved | Storing unique-value fields like trace_id and request_id as structured metadata enables filtering without index explosion |
| Native OTel integration | Attributes are automatically mapped to structured metadata in an OTel Collector → Loki direct pipeline |
| Flexible pipeline composition | Parser → filter → format → aggregation can be chained freely |
| Deriving metrics from logs | unwrap enables numeric aggregation from logs without a separate metrics collection setup |
Disadvantages and Caveats
| Item | Description | Mitigation |
|---|---|---|
| Cardinality explosion | Setting unique values like `user_id` or IP as index labels creates thousands of chunk files, causing severe performance degradation | Handle unique values as structured metadata or pipeline filters |
| Bloom acceleration condition | Structured metadata filters placed after the parser do not benefit from Bloom Filter acceleration | Always place metadata filters before the parser stage |
| regexp performance | The regular expression parser is powerful but slower than other parsers | Consider the `pattern` parser first whenever possible |
| Full-text search limitations | Loki does not index log content, making it unsuitable for full-text search | Consider Elasticsearch/OpenSearch if full-text search is required |
| unwrap conversion errors | If the unwrap target field is not numeric, an `__error__` label is generated and aggregation becomes skewed | Always add the `| __error__=""` filter |
**Cardinality**: The number of unique values a label can hold. A label with few distinct values, like `env="prod/dev/staging"`, has low cardinality; one whose values grow unboundedly, like `user_id="1001/1002/..."`, has high cardinality. Since Loki creates a separate stream (with its own chunks) for every unique label combination, higher cardinality causes storage and memory usage to spike.
The Most Common Mistakes in Practice
- **Setting unique values as index labels**: Adding values like `request_id`, `trace_id`, or `user_id` as index labels in Promtail or Alloy configuration causes cardinality explosion. Include these values in structured metadata or the log body instead, and handle them in the pipeline.
- **Placing the parser at the very front of the pipeline**: If a simple string filter (`|= "ERROR"`) is not placed before `| json`, every log line unrelated to errors gets parsed before being discarded. Applying simple filters first to shrink the set of lines to be parsed is far more efficient.
- **Placing structured metadata filters after the parser**: Writing `| json | trace_id="abc"`, with the metadata filter after the parser, means Bloom Filter acceleration does not apply and query speed degrades. Keep the order `| trace_id="abc" | json`.
Closing Thoughts
A single change in pipeline order can alter query performance by orders of magnitude. Keeping just two principles — placing string filters before the parser, and placing structured metadata filters before the parser — resolves the vast majority of performance issues.
Three steps you can take right now:
- In Grafana's Explore view, enter `{app="your-service"} | json` and see what fields are extracted.
- Copy a single line from your Nginx or application logs, then fill in `<fieldName>` at each field position to complete a `| pattern` template.
- If you're using the OTel Collector, use the query `{namespace="production"} | trace_id="request-id"` to trace the full log flow for a specific request.
**Next article**: How to build a log-based SLO (Service Level Objective) dashboard in Grafana Loki using `avg_over_time(... | unwrap ...)` and `quantile_over_time()`.
References
- Log queries | Grafana Loki official docs
- What is structured metadata | Grafana Loki official docs
- Loki 3.0 release notes — Bloom filters, native OTel support
- Grafana Loki 3.3 — Bloom filter query acceleration for structured metadata
- New in Loki 2.3: LogQL pattern parser | Grafana Labs blog
- Query acceleration | Grafana Loki official docs
- Ingesting logs to Loki using OpenTelemetry Collector | Grafana Loki official docs
- LogQL template functions | Grafana Loki official docs