LogQL Pipeline Parser Practical Guide — Extracting Fields from Unstructured Logs in Grafana Loki 3.x
When analyzing logs from a running service, you often encounter situations like these: Nginx access logs piling up as single-line text and you want to aggregate only specific status codes; you want to extract response time distributions from API server JSON logs; or you want to filter only error levels from custom text logs left by a legacy app. They all share one thing in common: nothing was defined at collection time.
Loki solves this problem by not indexing log content when storing it, but instead having parsers dynamically extract fields at query time. This means you don't need to design a schema or build an index in advance — you can pull the fields you want from the log body right now and use them for filtering and aggregation.
This article targets backend and infrastructure developers with basic Loki query experience. We'll walk through practical examples of when and how to combine the | json, | pattern, and | regexp parsers, and explain why Structured Metadata — introduced experimentally in Loki 2.9 and made official in 3.0 — must be placed before parsers in the pipeline.
Core Concepts
When first encountering Loki queries, stream selectors, parsers, and structured metadata may seem like independent concepts. But the three form a single hierarchy. The stream selector decides which group of log files to read, structured metadata filters pre-attached attributes on each log line, and the parser extracts fields from the log body at runtime. Query performance also varies significantly depending on this order.
The Two Axes of a LogQL Query: Stream Selector and Log Pipeline
```
{app="nginx", env="prod"}   -- Stream selector: decides which logs to read
| json                      -- Pipeline start: parse log body
| status_code >= 500
| line_format "{{.method}} {{.path}} -> {{.status_code}}"
```

- **Stream selector**: Written as `{label="value"}`, it determines which log streams to read. Since Loki builds its index from these labels, the narrower the selector, the faster the query.
- **Log pipeline**: Processing stages connected by the `|` symbol. They execute sequentially left to right, and the more lines the earlier stages discard, the lower the processing cost of the later stages.

**Pipeline execution principle**: Placing simple string filters (`|=`, `!=`) before the parser is best practice. If the parser runs first, it parses every unnecessary log line before discarding it.
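To make the principle concrete, here is a minimal before/after sketch (the `app` label and `level` field are illustrative, not from the examples above):

```
-- Inefficient: parses every line, then filters on the extracted label
{app="api-server"} | json | level="error"

-- Efficient: a cheap string filter discards most lines before parsing
{app="api-server"} |= "error" | json | level="error"
```

Keeping the `| level="error"` stage after the parser still matters: it excludes lines that merely contain the substring "error" somewhere else in the body.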
Parser Types and Selection Criteria
Parsers analyze log lines to generate temporary key-value labels. The generated labels are then used in subsequent stages for filtering (| status_code >= 500) or formatting (line_format).
| Parser | Suitable Log Format | Relative Speed |
|---|---|---|
| `json` | `{"key":"value"}` JSON logs | Fast |
| `logfmt` | `key=value key2=value2` format | Fast |
| `pattern "<pat>"` | Whitespace/delimiter-based unstructured logs | Very fast |
| `regexp "(?P<name>re)"` | Complex custom formats | Slow |
| `unpack` | Restoring logs serialized by the Promtail `pack` stage | Fast |
unpack is a parser that, when the Promtail (or Grafana Alloy) pack stage has serialized labels into the log line for storage, separates them back out at query time. It is not used for general log parsing.
**Selection criteria**: If logs are JSON, try `| json` first; if fields are separated by whitespace or delimiters, try `| pattern`; for other complex formats, fall back to `| regexp`. The `pattern` parser delivers up to 10x better performance than regular expressions (official benchmark).
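The table lists `logfmt`, but the examples below focus on the other parsers, so here is a minimal `logfmt` sketch (the `app` label and field names are illustrative):

```
-- Sample line: level=error msg="connection timeout" duration=1.2s
{app="worker"}
| logfmt
| level = "error"
| line_format "{{.msg}} ({{.duration}})"
```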
Structured Metadata — The Third Label Tier
Structured metadata is a third label tier first introduced experimentally in Loki 2.9 and promoted to an official feature in Loki 3.0. It sits between index labels and log lines, allowing unique-value fields to be attached to logs without increasing index cardinality.
```
┌────────────────────────────────────────────────────┐
│ Index labels         app="api", env="prod"         │ ← Stream definition, low cardinality required
│ Structured metadata  trace_id="7f3a92b1"           │ ← Stores unique values, no index cost
│ Log line             {"level":"error","msg":"..."} │ ← Actual log content
└────────────────────────────────────────────────────┘
```

When sending logs to Loki via the OpenTelemetry Collector, OTel's Resource Attributes and Log Attributes are automatically stored as structured metadata. At query time, they can be accessed using the same syntax as stream selectors.
```
-- Structured metadata filter: usable directly without a parser
{namespace="production"} | trace_id="7f3a92b1"
```

**Bloom Filter acceleration (Loki 3.3+)**: Structured metadata filters like `trace_id` and `span_id` must be placed before the parser to benefit from Bloom Filter index acceleration. Placing them after the parser, as in `| json | trace_id="abc"`, causes a full log scan.
Practical Application
The examples are divided into two groups. Basic parsing patterns (Examples 1–3) cover the fundamental usage of each parser, while advanced usage (Examples 4–6) covers structured metadata, metric conversion, and label normalization.
Basic Parsing Patterns
Example 1: Aggregating HTTP 500 Errors from JSON API Logs
Sample input log:

```
{"level":"error","method":"POST","path":"/api/orders","status_code":500,"duration_ms":342}
```

Query:

```
{app="api-server"}
| json
| status_code >= 500
| line_format "{{.method}} {{.path}} -> {{.status_code}}"
```

| Stage | Role |
|---|---|
| `json` | Parses the log line and extracts `status_code`, `method`, `path`, etc. as temporary labels |
| `status_code >= 500` | Numeric comparison filter using the extracted `status_code` label |
| `line_format "..."` | Reformats the output using Go template syntax (`{{.fieldName}}`) |
Handling nested JSON: If the log contains a nested structure like {"request":{"method":"POST","path":"/api"}}, you can specify particular fields for extraction.
```
{app="api-server"}
| json method="request.method", path="request.path"
```

Using the `| json labelName="nested.path"` form to specify the path directly lets you extract inner fields under a label name of your choice.
Example 2: Parsing Nginx Access Logs with the pattern Parser
Sample input log:

```
127.0.0.1 - frank [10/Oct/2025:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
```

Query:

```
{job="nginx"}
| pattern `<ip> - <user> [<ts>] "<method> <path> <proto>" <status> <bytes>`
| status = "500"
| line_format "{{.ip}} | {{.path}}"
```

Marking positions in the `<fieldName>` format extracts the value at each position as a label. Placing the input log and the pattern template side by side lets you immediately verify the correspondence.
**Caution with fields containing spaces**: When multiple space-separated fields appear inside quotes, as in `"<method> <path> <proto>"`, the quotes and space delimiters in the pattern must match the actual log exactly. This is the most common mistake when writing patterns for the first time, so it's recommended to first verify extraction results with `| line_format "{{.method}} {{.path}}"`.
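One related convenience: positions you don't need can be skipped with the unnamed placeholder `<_>`, which the pattern parser discards without creating a label. Applied to the same Nginx line, a sketch that keeps only method, path, and status:

```
{job="nginx"}
| pattern `<_> - <_> [<_>] "<method> <path> <_>" <status> <_>`
| line_format "{{.method}} {{.path}} -> {{.status}}"
```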
Example 3: Parsing Custom Legacy Logs with the regexp Parser
Sample input log:

```
ERROR 2025-10-10 Null pointer exception in PaymentService.process()
```

Query:

```
{app="legacy-app"}
| regexp `(?P<level>\w+)\s+(?P<ts>\d{4}-\d{2}-\d{2})\s+(?P<msg>.+)`
| level = "ERROR"
```

Naming a capture group with the `(?P<name>pattern)` syntax creates a label with that name. Validating the pattern at regex101.com (select the Go flavor) before applying it reduces trial and error.
Advanced Usage
Example 4: Distributed Trace Correlation with OTel Structured Metadata
A pattern for extracting only the logs matching a specific trace_id from logs collected via OpenTelemetry. Since trace_id is stored as structured metadata, it can be filtered directly without a parser.
```
{namespace="production"}
| trace_id="7f3a92b1"
| json
| line_format "{{.message}}"
```

The `trace_id` filter must be placed before the `| json` parser for Loki 3.3+'s Bloom Filter acceleration to apply. Reversing the order causes a full log scan.
Example 5: Converting Numeric Data in Logs to Metrics with unwrap
If a log line contains a numeric value such as response time, you can use unwrap to extract it for use in range aggregations.
```
sum by (endpoint) (
  avg_over_time(
    {app="api"} | json | unwrap duration_ms | __error__="" [5m]
  )
)
```
`rate()` is a function for counter increments (events per second). When computing the average of measured values like response times, use `avg_over_time()` to get correct results.
The `| __error__=""` filter is there for a reason: `unwrap` generates an `__error__` label whenever the target field cannot be converted to a number, and without this filter to exclude conversion-failed lines, aggregation results may be skewed.
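The same unwrap pipeline plugs into other range aggregations as well; for example, a 99th-percentile latency per endpoint, as a sketch using the same illustrative labels:

```
quantile_over_time(0.99,
  {app="api"} | json | unwrap duration_ms | __error__="" [5m]
) by (endpoint)
```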
Example 6: Normalizing Label Names with label_format
When multiple services log fields with the same meaning under different names, label_format can unify the naming.
```
{app="payments"}
| json
| label_format svc=app, req_id=request_id
```

**Difference between rename and copy**: The `label_format dst=src` form in LogQL moves (renames) the value of the `src` label to the name `dst`; the original `src` label is removed from the result. If you want to keep the original while also adding a new name, use the template form `| label_format new_name="{{.old_name}}"` instead.
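The two forms side by side, as a sketch (labels illustrative):

```
{app="payments"}
| json
| label_format svc=app                -- rename: app is removed, svc takes its value
| label_format app_copy="{{.svc}}"    -- copy: svc is kept, app_copy is added
```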
Pros and Cons
Advantages
| Item | Description |
|---|---|
| No upfront schema required | Parsers dynamically generate labels at query time, so there is no need to define a schema at the collection stage |
| Cardinality problem resolved | Storing unique-value fields like trace_id and request_id as structured metadata enables filtering without index explosion |
| Native OTel integration | Attributes are automatically mapped to structured metadata in an OTel Collector → Loki direct pipeline |
| Flexible pipeline composition | Parser → filter → format → aggregation can be chained freely |
| Deriving metrics from logs | unwrap enables numeric aggregation from logs without a separate metrics collection setup |
Disadvantages and Caveats
| Item | Description | Mitigation |
|---|---|---|
| Cardinality explosion | Setting unique values like `user_id` or IP as index labels creates thousands of chunk files, causing severe performance degradation | Handle unique values as structured metadata or pipeline filters |
| Bloom acceleration condition | Structured metadata filters placed after the parser do not benefit from Bloom Filter acceleration | Always place metadata filters before the parser stage |
| regexp performance | The regular expression parser is powerful but slower than other parsers | Consider the `pattern` parser first whenever possible |
| Full-text search limitations | Loki does not index log content, making it unsuitable for full-text search | Consider Elasticsearch/OpenSearch if full-text search is required |
| unwrap conversion errors | If the unwrap target field is not numeric, an `__error__` label is generated and aggregation becomes skewed | Always add the `| __error__=""` filter |
**Cardinality**: The number of unique values a label can hold. A label with few distinct values, like `env="prod/dev/staging"`, has low cardinality; one whose values grow unboundedly, like `user_id="1001/1002/..."`, has high cardinality. Since Loki creates a separate stream (with its own chunks) for every unique label combination, higher cardinality causes storage and memory usage to spike.
The Most Common Mistakes in Practice
- **Setting unique values as index labels**: Adding values like `request_id`, `trace_id`, or `user_id` as index labels in Promtail or Alloy configuration causes cardinality explosion. Include these values in structured metadata or the log body instead, and handle them in the pipeline.
- **Placing the parser at the very front of the pipeline**: If a simple string filter (`|= "ERROR"`) is not placed before `| json`, every log line unrelated to errors gets parsed before being discarded. Applying simple filters first to shrink the set of lines to be parsed is far more efficient.
- **Placing structured metadata filters after the parser**: Writing `| json | trace_id="abc"`, with the metadata filter after the parser, means Bloom Filter acceleration does not apply and query speed degrades. Keep the order `| trace_id="abc" | json`.
Closing Thoughts
A single change in pipeline order can alter query performance by orders of magnitude. Keeping just two principles — placing string filters before the parser, and placing structured metadata filters before the parser — resolves the vast majority of performance issues.
Three steps you can take right now:
- In Grafana's Explore view, enter `{app="your-service"} | json` and see what fields are extracted.
- Copy a single line from your Nginx or application logs, then fill in `<fieldName>` at each field position to complete a `| pattern` template.
- If you're using the OTel Collector, use the query `{namespace="production"} | trace_id="request-id"` to trace the full log flow for a specific request.
**Next article**: How to build a log-based SLO (Service Level Objective) dashboard in Grafana Loki using `avg_over_time(... | unwrap ...)` and `quantile_over_time()`.
References
- Log queries | Grafana Loki official docs
- What is structured metadata | Grafana Loki official docs
- Loki 3.0 release notes — Bloom filters, native OTel support
- Grafana Loki 3.3 — Bloom filter query acceleration for structured metadata
- New in Loki 2.3: LogQL pattern parser | Grafana Labs blog
- Query acceleration | Grafana Loki official docs
- Ingesting logs to Loki using OpenTelemetry Collector | Grafana Loki official docs
- LogQL template functions | Grafana Loki official docs