Switching to UUIDv7 Cut PostgreSQL Index I/O by 312x — How It Reduces B-tree Page Splits by 500x

Honestly, I used to think UUID was just a matter of calling uuid4() once and calling it done. It's a 128-bit unique value either way, and it works fine in the DB. But one day I ran into a situation in production where the index size had grown larger than the table itself. Running VACUUM made no difference, and as I dug into the cause, I discovered the UUID insertion pattern was the problem. That naturally led me to UUIDv7, and when I reproduced the scenario myself, the results were striking.

In May 2024, the IETF officially standardized UUIDv7 through RFC 9562, and in 2025, PostgreSQL 18 shipped the uuidv7() function natively. There's a reason the ecosystem is moving this fast. Measured against 10 million rows on PostgreSQL 18 with NVMe SSD, switching to UUIDv7 reduces B-tree index page splits by up to 500x and cuts index I/O by 312x. Let's unpack why those numbers happen — and how you can apply this today.

Core Concepts

Two UUID versions, one number makes all the difference

UUIDv4 and UUIDv7 look similar on the surface, but their internal structures are completely different.

UUIDv4 (fully random):
550e8400-e29b-41d4-a716-446655440000
^^^^^^^^ ^^^^ ^^^^ ^^^^ ^^^^^^^^^^^^
all random (122 bits)
 
UUIDv7 (time-ordered):
018fbe61-9f6a-7000-a47b-5c7962ab7f22
^^^^^^^^ ^^^^
Unix timestamp (48 bits, milliseconds)  →  rest is random (74 bits)

The upper 48 bits of UUIDv7 are a Unix timestamp in milliseconds. This means values sort naturally in the order they were generated. Why does that matter? Because the B-tree — the default index structure in databases — determines performance based on where a new value is inserted.

B-trees hate random inserts

A B-tree index keeps values in sorted order. When a new record arrives, it finds the node where the value belongs and inserts it there. The problem occurs when that node is already full.

Page Split: When a B-tree leaf node is full, it splits into two nodes, each ending up half full.

UUIDv4 is fully random, so there's no predicting where in the index a newly inserted ID will land. The index constantly has to wedge new values into the middle of already-closed (full) nodes. UUIDv7, on the other hand, always appends new values to the rightmost end of the index. When a node fills up, you just create a new one to the right.

Item	UUIDv4	UUIDv7
Insert position	Random scatter across the entire index	Sequential append to the rightmost leaf
Page splits (per 1M rows)	5,000–10,000+	10–20
Leaf page fill rate	~79%	~97%
Index fragmentation	Severe	Minimal

The 79% leaf page fill rate is an average measured on PostgreSQL B-tree indexes under random insertion patterns, consistently observed in the Ardent Performance Computing UUID benchmark and multiple real-world measurements.

Fragmentation accumulates, then I/O explodes

Imagine millions of index leaf pages sitting at only 79% full. Reading the same data now requires reading more pages.

Buffer Pool: The area where PostgreSQL caches pages read from disk in memory. The more fragmented the index, the more pages must be loaded into the buffer pool, which also degrades memory efficiency.

As randomly scattered index pages multiply, the number of pages that must be loaded into the buffer pool for a single query grows explosively. That's exactly where the 312x figure comes from.

Practical Application

Example 1: 10-million-row bulk insert benchmark — seeing the numbers directly

Sometimes numbers are more convincing than explanations. When I first saw these results, I double-checked my math thinking I'd made an error.

sql

-- Test environment: PostgreSQL 18.0, Apple M2 MacBook Pro, NVMe SSD
 
-- Test setup
CREATE TABLE test_v4 (id UUID PRIMARY KEY, data TEXT);
CREATE TABLE test_v7 (id UUID PRIMARY KEY, data TEXT);
 
-- UUIDv4 bulk insert (takes ~7+ minutes)
INSERT INTO test_v4 (id, data)
SELECT gen_random_uuid(), md5(random()::text)
FROM generate_series(1, 10000000);
 
-- UUIDv7 bulk insert (takes ~3 minutes — ~2.3x faster)
INSERT INTO test_v7 (id, data)
SELECT uuidv7(), md5(random()::text)
FROM generate_series(1, 10000000);
 
-- Index I/O comparison (buffer hit count)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_v4 WHERE id = '550e8400-e29b-41d4-a716-446655440000';
-- Buffers: 8,562,960 hit (UUIDv4)
 
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_v7 WHERE id = '018fbe61-9f6a-7000-a47b-5c7962ab7f22';
-- Buffers: 27,332 hit (UUIDv7) — comparable to integer PK

8,562,960 ÷ 27,332 ≈ 313. That's the source of the "312x" figure. Even more impressive is that the UUIDv7 buffer hit count is virtually identical to an integer PK.

Example 2: Setting a UUIDv7 primary key in PostgreSQL 18

Starting with PostgreSQL 18, you can use the DB function directly without generating UUIDs at the application layer.

sql

-- PostgreSQL 18 and above
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT uuidv7(),
    user_id UUID NOT NULL,
    total_amount NUMERIC(10, 2),
    status VARCHAR(20)
);
 
-- No need to specify an ID on insert
INSERT INTO orders (user_id, total_amount, status)
VALUES ('018fbe61-9f6a-7000-a47b-5c7962ab7f22', 99.99, 'pending');
 
-- Extract the creation timestamp directly from the ID
SELECT id, uuid_extract_timestamp(id) AS created_at
FROM orders
ORDER BY id DESC
LIMIT 10;

uuid_extract_timestamp() is a new built-in function added in PostgreSQL 18. It lets you replace the pattern of maintaining a separate created_at column with just the ID. Of course, some teams prefer an explicit timestamp column for clarity — it's a matter of team convention.

Deep dive: To inspect index health in production, the pg_statio_user_indexes view is useful. It shows real-time buffer hit and miss ratios per index, making it easy to measure the effect of a UUID type switch in production with concrete numbers.

Example 3: Generating UUIDv7 at the application layer (PostgreSQL 17 and below, or MySQL)

If you're on a version before PostgreSQL 18, or using MySQL/MariaDB, you can generate UUIDs directly in the application.

typescript

// TypeScript — uuid package v9.0+
import { v7 as uuidv7 } from 'uuid';
 
// Usage with TypeORM
// Class property initialization is the simplest approach with TypeORM v0.3+.
// @BeforeInsert() and @Column({ default: () => ... }) also work,
// but this pattern explicitly assigns the ID at entity creation time.
@Entity()
export class Order {
  @PrimaryColumn('uuid')
  id: string = uuidv7();
 
  @Column()
  userId: string;
}

For Python, usage varies by version.

python

# Python 3.14+ — uuid7() included in the standard library (no external package needed)
import uuid
order_id = uuid.uuid7()

python

# Python 3.13 and below — use a third-party library
# pip install uuid7
from uuid7 import uuid7
order_id = uuid7()

// Go — github.com/google/uuid
import "github.com/google/uuid"
 
orderID, err := uuid.NewV7()
if err != nil {
    return err
}

Pros and Cons

Advantages

Item	Detail
Index insert performance	Sequential inserts minimize page splits — up to 500x fewer splits than UUIDv4
Storage space	Leaf page fill rate 97% (vs UUIDv4's 79%) → ~20% reduction in index size
Read I/O	Buffer hit count at 1/312 of UUIDv4 for 10 million rows
Time ordering	Generation order is derivable from the ID alone; `ORDER BY id` equals chronological order
Reduced WAL overhead	Sequential inserts shrink WAL size, easing replication and backup load
Debugging convenience	Approximate creation time can be inferred from the ID
Distributed uniqueness	Effectively zero collision probability anywhere in the world without central coordination

WAL (Write-Ahead Log): The log PostgreSQL writes changes to before flushing them to data files. More random page updates mean a larger WAL, which increases the burden on replication and backup.

Disadvantages and Caveats

Item	Detail	Mitigation
Timing exposure	Creation time is embedded in the ID, potentially leaking information	Keep UUIDv4 for security-sensitive domains like user IDs and session tokens
Distributed node ordering	Global ordering across nodes is not guaranteed within the same millisecond	Within a single node, the rand_a counter guarantees monotonic increase
Legacy migration	Switching an existing UUIDv4 system requires index reconstruction	`REINDEX CONCURRENTLY` allows zero-downtime reconstruction
.NET caveats	Some implementations have been reported to not fully guarantee RFC 9562 ordering	Verify RFC 9562 compliance directly, or use `UUIDNext`

The distributed node ordering issue is something practitioners often overlook, so let me elaborate. The rand_a field (12 bits) in UUIDv7 acts as a counter to ensure UUIDs generated within the same millisecond on a single node increase monotonically. Within a single node, this guarantees ordering even at sub-millisecond granularity. However, if server A and server B both generate a UUID at the exact same millisecond, their rand_a counters operate independently, so the relative ordering between the two values is left to chance. For most web services this difference is negligible, but it's worth considering in distributed systems where strict event ordering matters — audit logs, financial transactions, and the like.

The most common mistakes in practice

Applying UUIDv7 to user IDs as well — The creation time embedded in the ID can expose account registration timing to the outside world. It's worth first categorizing your identifiers: domain entities like order IDs and product IDs where timing exposure is harmless, versus security-sensitive identifiers like user IDs, session tokens, and auth codes.
Changing the column type without migrating existing data — If you leave existing UUIDv4 values in place and only generate new records as UUIDv7, the index becomes a mixed pattern and the performance gains are cut in half. Migrating requires rebuilding the index via REINDEX CONCURRENTLY. REINDEX CONCURRENTLY is a PostgreSQL command that rebuilds an index without taking the service offline.
Not verifying a library's RFC compliance — I used to just pick the top search result on npm without checking, until a teammate filed an issue about unexpected ordering in a .NET environment, and I developed the habit of verifying RFC 9562 compliance. In particular, some UUIDv7 implementations in .NET that are based on Guid.NewGuid() have been reported to not guarantee sort order as specified — worth checking when choosing a library.

Closing Thoughts

For domain entity identifiers specifically, UUIDv7 is the most practical choice available today — one that preserves distributed uniqueness while bringing B-tree index performance up to the level of integer PKs.

Here are three steps you can take right now.

Start by categorizing the UUID usage in your current project — separate domain entities like order IDs and product IDs where timing exposure is acceptable from security-sensitive identifiers like session tokens and user IDs.
Apply UUIDv7 to new tables first — On PostgreSQL 18, DEFAULT uuidv7() works out of the box. On earlier versions, you can start by generating at the application layer using the uuid npm package v9.0+, the Python 3.14+ standard library (uuid7 package for earlier versions), or Go's github.com/google/uuid.
For existing tables, measure current index I/O first with EXPLAIN (ANALYZE, BUFFERS) — Comparing buffer hit counts before and after the switch lets you verify the performance improvement with concrete numbers.

References

#UUIDv7#PostgreSQL#B-tree인덱스#데이터베이스최적화#UUID#인덱스단편화#TypeScript#TypeORM#분산시스템#RFC9562

Switching to UUIDv7 Cut PostgreSQL Index I/O by 312x — How It Reduces B-tree Page Splits by 500x | DEV BAK - 기술블로그

Database

Switching to UUIDv7 Cut PostgreSQL Index I/O by 312x — How It Reduces B-tree Page Splits by 500x

Core Concepts

Two UUID versions, one number makes all the difference

UUIDv4 and UUIDv7 look similar on the surface, but their internal structures are completely different.

UUIDv4 (fully random):
550e8400-e29b-41d4-a716-446655440000
^^^^^^^^ ^^^^ ^^^^ ^^^^ ^^^^^^^^^^^^
all random (122 bits)
 
UUIDv7 (time-ordered):
018fbe61-9f6a-7000-a47b-5c7962ab7f22
^^^^^^^^ ^^^^
Unix timestamp (48 bits, milliseconds)  →  rest is random (74 bits)

B-trees hate random inserts

A B-tree index keeps values in sorted order. When a new record arrives, it finds the node where the value belongs and inserts it there. The problem occurs when that node is already full.

Page Split: When a B-tree leaf node is full, it splits into two nodes, each ending up half full.

Item	UUIDv4	UUIDv7
Insert position	Random scatter across the entire index	Sequential append to the rightmost leaf
Page splits (per 1M rows)	5,000–10,000+	10–20
Leaf page fill rate	~79%	~97%
Index fragmentation	Severe	Minimal

Fragmentation accumulates, then I/O explodes

Imagine millions of index leaf pages sitting at only 79% full. Reading the same data now requires reading more pages.

Buffer Pool: The area where PostgreSQL caches pages read from disk in memory. The more fragmented the index, the more pages must be loaded into the buffer pool, which also degrades memory efficiency.

As randomly scattered index pages multiply, the number of pages that must be loaded into the buffer pool for a single query grows explosively. That's exactly where the 312x figure comes from.

Practical Application

Example 1: 10-million-row bulk insert benchmark — seeing the numbers directly

Sometimes numbers are more convincing than explanations. When I first saw these results, I double-checked my math thinking I'd made an error.

sql

-- Test environment: PostgreSQL 18.0, Apple M2 MacBook Pro, NVMe SSD
 
-- Test setup
CREATE TABLE test_v4 (id UUID PRIMARY KEY, data TEXT);
CREATE TABLE test_v7 (id UUID PRIMARY KEY, data TEXT);
 
-- UUIDv4 bulk insert (takes ~7+ minutes)
INSERT INTO test_v4 (id, data)
SELECT gen_random_uuid(), md5(random()::text)
FROM generate_series(1, 10000000);
 
-- UUIDv7 bulk insert (takes ~3 minutes — ~2.3x faster)
INSERT INTO test_v7 (id, data)
SELECT uuidv7(), md5(random()::text)
FROM generate_series(1, 10000000);
 
-- Index I/O comparison (buffer hit count)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_v4 WHERE id = '550e8400-e29b-41d4-a716-446655440000';
-- Buffers: 8,562,960 hit (UUIDv4)
 
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_v7 WHERE id = '018fbe61-9f6a-7000-a47b-5c7962ab7f22';
-- Buffers: 27,332 hit (UUIDv7) — comparable to integer PK

8,562,960 ÷ 27,332 ≈ 313. That's the source of the "312x" figure. Even more impressive is that the UUIDv7 buffer hit count is virtually identical to an integer PK.

Example 2: Setting a UUIDv7 primary key in PostgreSQL 18

Starting with PostgreSQL 18, you can use the DB function directly without generating UUIDs at the application layer.

sql

-- PostgreSQL 18 and above
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT uuidv7(),
    user_id UUID NOT NULL,
    total_amount NUMERIC(10, 2),
    status VARCHAR(20)
);
 
-- No need to specify an ID on insert
INSERT INTO orders (user_id, total_amount, status)
VALUES ('018fbe61-9f6a-7000-a47b-5c7962ab7f22', 99.99, 'pending');
 
-- Extract the creation timestamp directly from the ID
SELECT id, uuid_extract_timestamp(id) AS created_at
FROM orders
ORDER BY id DESC
LIMIT 10;

Deep dive: To inspect index health in production, the pg_statio_user_indexes view is useful. It shows real-time buffer hit and miss ratios per index, making it easy to measure the effect of a UUID type switch in production with concrete numbers.

Example 3: Generating UUIDv7 at the application layer (PostgreSQL 17 and below, or MySQL)

If you're on a version before PostgreSQL 18, or using MySQL/MariaDB, you can generate UUIDs directly in the application.

typescript

// TypeScript — uuid package v9.0+
import { v7 as uuidv7 } from 'uuid';
 
// Usage with TypeORM
// Class property initialization is the simplest approach with TypeORM v0.3+.
// @BeforeInsert() and @Column({ default: () => ... }) also work,
// but this pattern explicitly assigns the ID at entity creation time.
@Entity()
export class Order {
  @PrimaryColumn('uuid')
  id: string = uuidv7();
 
  @Column()
  userId: string;
}

For Python, usage varies by version.

python

# Python 3.14+ — uuid7() included in the standard library (no external package needed)
import uuid
order_id = uuid.uuid7()

python

# Python 3.13 and below — use a third-party library
# pip install uuid7
from uuid7 import uuid7
order_id = uuid7()

// Go — github.com/google/uuid
import "github.com/google/uuid"
 
orderID, err := uuid.NewV7()
if err != nil {
    return err
}

Pros and Cons

Advantages

Item	Detail
Index insert performance	Sequential inserts minimize page splits — up to 500x fewer splits than UUIDv4
Storage space	Leaf page fill rate 97% (vs UUIDv4's 79%) → ~20% reduction in index size
Read I/O	Buffer hit count at 1/312 of UUIDv4 for 10 million rows
Time ordering	Generation order is derivable from the ID alone; `ORDER BY id` equals chronological order
Reduced WAL overhead	Sequential inserts shrink WAL size, easing replication and backup load
Debugging convenience	Approximate creation time can be inferred from the ID
Distributed uniqueness	Effectively zero collision probability anywhere in the world without central coordination

WAL (Write-Ahead Log): The log PostgreSQL writes changes to before flushing them to data files. More random page updates mean a larger WAL, which increases the burden on replication and backup.

Disadvantages and Caveats

Item	Detail	Mitigation
Timing exposure	Creation time is embedded in the ID, potentially leaking information	Keep UUIDv4 for security-sensitive domains like user IDs and session tokens
Distributed node ordering	Global ordering across nodes is not guaranteed within the same millisecond	Within a single node, the rand_a counter guarantees monotonic increase
Legacy migration	Switching an existing UUIDv4 system requires index reconstruction	`REINDEX CONCURRENTLY` allows zero-downtime reconstruction
.NET caveats	Some implementations have been reported to not fully guarantee RFC 9562 ordering	Verify RFC 9562 compliance directly, or use `UUIDNext`

The most common mistakes in practice

Applying UUIDv7 to user IDs as well — The creation time embedded in the ID can expose account registration timing to the outside world. It's worth first categorizing your identifiers: domain entities like order IDs and product IDs where timing exposure is harmless, versus security-sensitive identifiers like user IDs, session tokens, and auth codes.
Changing the column type without migrating existing data — If you leave existing UUIDv4 values in place and only generate new records as UUIDv7, the index becomes a mixed pattern and the performance gains are cut in half. Migrating requires rebuilding the index via REINDEX CONCURRENTLY. REINDEX CONCURRENTLY is a PostgreSQL command that rebuilds an index without taking the service offline.
Not verifying a library's RFC compliance — I used to just pick the top search result on npm without checking, until a teammate filed an issue about unexpected ordering in a .NET environment, and I developed the habit of verifying RFC 9562 compliance. In particular, some UUIDv7 implementations in .NET that are based on Guid.NewGuid() have been reported to not guarantee sort order as specified — worth checking when choosing a library.

Closing Thoughts

Here are three steps you can take right now.

Start by categorizing the UUID usage in your current project — separate domain entities like order IDs and product IDs where timing exposure is acceptable from security-sensitive identifiers like session tokens and user IDs.
Apply UUIDv7 to new tables first — On PostgreSQL 18, DEFAULT uuidv7() works out of the box. On earlier versions, you can start by generating at the application layer using the uuid npm package v9.0+, the Python 3.14+ standard library (uuid7 package for earlier versions), or Go's github.com/google/uuid.
For existing tables, measure current index I/O first with EXPLAIN (ANALYZE, BUFFERS) — Comparing buffer hit counts before and after the switch lets you verify the performance improvement with concrete numbers.

References

#UUIDv7#PostgreSQL#B-tree인덱스#데이터베이스최적화#UUID#인덱스단편화#TypeScript#TypeORM#분산시스템#RFC9562

Core Concepts

Two UUID versions, one number makes all the difference

B-trees hate random inserts

Fragmentation accumulates, then I/O explodes

Practical Application

Example 1: 10-million-row bulk insert benchmark — seeing the numbers directly

Example 2: Setting a UUIDv7 primary key in PostgreSQL 18

Example 3: Generating UUIDv7 at the application layer (PostgreSQL 17 and below, or MySQL)

Pros and Cons

Advantages

Disadvantages and Caveats

The most common mistakes in practice

Closing Thoughts

References

Core Concepts

Two UUID versions, one number makes all the difference

B-trees hate random inserts

Fragmentation accumulates, then I/O explodes

Practical Application

Example 1: 10-million-row bulk insert benchmark — seeing the numbers directly

Example 2: Setting a UUIDv7 primary key in PostgreSQL 18

Example 3: Generating UUIDv7 at the application layer (PostgreSQL 17 and below, or MySQL)

Pros and Cons

Advantages

Disadvantages and Caveats

The most common mistakes in practice

Closing Thoughts

References

Recommended Posts

ULID vs UUIDv7: The Standard, Encoding, and DB Compatibility Differences That Determine Your Primary Key Choice

How PostgreSQL 18's `uuidv7()` and `uuid_extract_timestamp()` Speed Up Time-Series Queries by 44% Without `created_at`

When Does pgvector Hit Its Performance Limits — pgvector vs VectorChord Benchmark Comparison

PostgreSQL 18 VACUUM and Asynchronous I/O (AIO): Reducing Bloat in Large Tables with Autovacuum Tuning

PostgreSQL 18 Asynchronous I/O (AIO): Why Cold Cache Sequential Scans Are 3× Faster and Benchmark Results