Switching to UUIDv7 Cut PostgreSQL Index I/O by 312x — How It Reduces B-tree Page Splits by 500x
Honestly, I used to think UUID was just a matter of calling uuid4() once and calling it done. It's a 128-bit unique value either way, and it works fine in the DB. But one day I ran into a situation in production where the index size had grown larger than the table itself. Running VACUUM made no difference, and as I dug into the cause, I discovered the UUID insertion pattern was the problem. That naturally led me to UUIDv7, and when I reproduced the scenario myself, the results were striking.
In May 2024, the IETF officially standardized UUIDv7 through RFC 9562, and in 2025, PostgreSQL 18 shipped the uuidv7() function natively. There's a reason the ecosystem is moving this fast. Measured against 10 million rows on PostgreSQL 18 with NVMe SSD, switching to UUIDv7 reduces B-tree index page splits by up to 500x and cuts index I/O by 312x. Let's unpack why those numbers happen — and how you can apply this today.
Core Concepts
Two UUID versions, one number makes all the difference
UUIDv4 and UUIDv7 look similar on the surface, but their internal structures are completely different.
UUIDv4 (fully random):
550e8400-e29b-41d4-a716-446655440000
^^^^^^^^ ^^^^ ^^^^ ^^^^ ^^^^^^^^^^^^
all random (122 bits)
UUIDv7 (time-ordered):
018fbe61-9f6a-7000-a47b-5c7962ab7f22
^^^^^^^^ ^^^^
Unix timestamp (48 bits, milliseconds) → rest is random (74 bits)The upper 48 bits of UUIDv7 are a Unix timestamp in milliseconds. This means values sort naturally in the order they were generated. Why does that matter? Because the B-tree — the default index structure in databases — determines performance based on where a new value is inserted.
B-trees hate random inserts
A B-tree index keeps values in sorted order. When a new record arrives, it finds the node where the value belongs and inserts it there. The problem occurs when that node is already full.
Page Split: When a B-tree leaf node is full, it splits into two nodes, each ending up half full.
UUIDv4 is fully random, so there's no predicting where in the index a newly inserted ID will land. The index constantly has to wedge new values into the middle of already-closed (full) nodes. UUIDv7, on the other hand, always appends new values to the rightmost end of the index. When a node fills up, you just create a new one to the right.
| Item | UUIDv4 | UUIDv7 |
|---|---|---|
| Insert position | Random scatter across the entire index | Sequential append to the rightmost leaf |
| Page splits (per 1M rows) | 5,000–10,000+ | 10–20 |
| Leaf page fill rate | ~79% | ~97% |
| Index fragmentation | Severe | Minimal |
The 79% leaf page fill rate is an average measured on PostgreSQL B-tree indexes under random insertion patterns, consistently observed in the Ardent Performance Computing UUID benchmark and multiple real-world measurements.
Fragmentation accumulates, then I/O explodes
Imagine millions of index leaf pages sitting at only 79% full. Reading the same data now requires reading more pages.
Buffer Pool: The area where PostgreSQL caches pages read from disk in memory. The more fragmented the index, the more pages must be loaded into the buffer pool, which also degrades memory efficiency.
As randomly scattered index pages multiply, the number of pages that must be loaded into the buffer pool for a single query grows explosively. That's exactly where the 312x figure comes from.
Practical Application
Example 1: 10-million-row bulk insert benchmark — seeing the numbers directly
Sometimes numbers are more convincing than explanations. When I first saw these results, I double-checked my math thinking I'd made an error.
-- Test environment: PostgreSQL 18.0, Apple M2 MacBook Pro, NVMe SSD
-- Test setup
CREATE TABLE test_v4 (id UUID PRIMARY KEY, data TEXT);
CREATE TABLE test_v7 (id UUID PRIMARY KEY, data TEXT);
-- UUIDv4 bulk insert (takes ~7+ minutes)
INSERT INTO test_v4 (id, data)
SELECT gen_random_uuid(), md5(random()::text)
FROM generate_series(1, 10000000);
-- UUIDv7 bulk insert (takes ~3 minutes — ~2.3x faster)
INSERT INTO test_v7 (id, data)
SELECT uuidv7(), md5(random()::text)
FROM generate_series(1, 10000000);
-- Index I/O comparison (buffer hit count)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_v4 WHERE id = '550e8400-e29b-41d4-a716-446655440000';
-- Buffers: 8,562,960 hit (UUIDv4)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_v7 WHERE id = '018fbe61-9f6a-7000-a47b-5c7962ab7f22';
-- Buffers: 27,332 hit (UUIDv7) — comparable to integer PK8,562,960 ÷ 27,332 ≈ 313. That's the source of the "312x" figure. Even more impressive is that the UUIDv7 buffer hit count is virtually identical to an integer PK.
Example 2: Setting a UUIDv7 primary key in PostgreSQL 18
Starting with PostgreSQL 18, you can use the DB function directly without generating UUIDs at the application layer.
-- PostgreSQL 18 and above
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT uuidv7(),
user_id UUID NOT NULL,
total_amount NUMERIC(10, 2),
status VARCHAR(20)
);
-- No need to specify an ID on insert
INSERT INTO orders (user_id, total_amount, status)
VALUES ('018fbe61-9f6a-7000-a47b-5c7962ab7f22', 99.99, 'pending');
-- Extract the creation timestamp directly from the ID
SELECT id, uuid_extract_timestamp(id) AS created_at
FROM orders
ORDER BY id DESC
LIMIT 10;uuid_extract_timestamp() is a new built-in function added in PostgreSQL 18. It lets you replace the pattern of maintaining a separate created_at column with just the ID. Of course, some teams prefer an explicit timestamp column for clarity — it's a matter of team convention.
Deep dive: To inspect index health in production, the
pg_statio_user_indexesview is useful. It shows real-time buffer hit and miss ratios per index, making it easy to measure the effect of a UUID type switch in production with concrete numbers.
Example 3: Generating UUIDv7 at the application layer (PostgreSQL 17 and below, or MySQL)
If you're on a version before PostgreSQL 18, or using MySQL/MariaDB, you can generate UUIDs directly in the application.
// TypeScript — uuid package v9.0+
import { v7 as uuidv7 } from 'uuid';
// Usage with TypeORM
// Class property initialization is the simplest approach with TypeORM v0.3+.
// @BeforeInsert() and @Column({ default: () => ... }) also work,
// but this pattern explicitly assigns the ID at entity creation time.
@Entity()
export class Order {
@PrimaryColumn('uuid')
id: string = uuidv7();
@Column()
userId: string;
}For Python, usage varies by version.
# Python 3.14+ — uuid7() included in the standard library (no external package needed)
import uuid
order_id = uuid.uuid7()# Python 3.13 and below — use a third-party library
# pip install uuid7
from uuid7 import uuid7
order_id = uuid7()// Go — github.com/google/uuid
import "github.com/google/uuid"
orderID, err := uuid.NewV7()
if err != nil {
return err
}Pros and Cons
Advantages
| Item | Detail |
|---|---|
| Index insert performance | Sequential inserts minimize page splits — up to 500x fewer splits than UUIDv4 |
| Storage space | Leaf page fill rate 97% (vs UUIDv4's 79%) → ~20% reduction in index size |
| Read I/O | Buffer hit count at 1/312 of UUIDv4 for 10 million rows |
| Time ordering | Generation order is derivable from the ID alone; ORDER BY id equals chronological order |
| Reduced WAL overhead | Sequential inserts shrink WAL size, easing replication and backup load |
| Debugging convenience | Approximate creation time can be inferred from the ID |
| Distributed uniqueness | Effectively zero collision probability anywhere in the world without central coordination |
WAL (Write-Ahead Log): The log PostgreSQL writes changes to before flushing them to data files. More random page updates mean a larger WAL, which increases the burden on replication and backup.
Disadvantages and Caveats
| Item | Detail | Mitigation |
|---|---|---|
| Timing exposure | Creation time is embedded in the ID, potentially leaking information | Keep UUIDv4 for security-sensitive domains like user IDs and session tokens |
| Distributed node ordering | Global ordering across nodes is not guaranteed within the same millisecond | Within a single node, the rand_a counter guarantees monotonic increase |
| Legacy migration | Switching an existing UUIDv4 system requires index reconstruction | REINDEX CONCURRENTLY allows zero-downtime reconstruction |
| .NET caveats | Some implementations have been reported to not fully guarantee RFC 9562 ordering | Verify RFC 9562 compliance directly, or use UUIDNext |
The distributed node ordering issue is something practitioners often overlook, so let me elaborate. The rand_a field (12 bits) in UUIDv7 acts as a counter to ensure UUIDs generated within the same millisecond on a single node increase monotonically. Within a single node, this guarantees ordering even at sub-millisecond granularity. However, if server A and server B both generate a UUID at the exact same millisecond, their rand_a counters operate independently, so the relative ordering between the two values is left to chance. For most web services this difference is negligible, but it's worth considering in distributed systems where strict event ordering matters — audit logs, financial transactions, and the like.
The most common mistakes in practice
-
Applying UUIDv7 to user IDs as well — The creation time embedded in the ID can expose account registration timing to the outside world. It's worth first categorizing your identifiers: domain entities like order IDs and product IDs where timing exposure is harmless, versus security-sensitive identifiers like user IDs, session tokens, and auth codes.
-
Changing the column type without migrating existing data — If you leave existing UUIDv4 values in place and only generate new records as UUIDv7, the index becomes a mixed pattern and the performance gains are cut in half. Migrating requires rebuilding the index via
REINDEX CONCURRENTLY.REINDEX CONCURRENTLYis a PostgreSQL command that rebuilds an index without taking the service offline. -
Not verifying a library's RFC compliance — I used to just pick the top search result on npm without checking, until a teammate filed an issue about unexpected ordering in a .NET environment, and I developed the habit of verifying RFC 9562 compliance. In particular, some UUIDv7 implementations in .NET that are based on
Guid.NewGuid()have been reported to not guarantee sort order as specified — worth checking when choosing a library.
Closing Thoughts
For domain entity identifiers specifically, UUIDv7 is the most practical choice available today — one that preserves distributed uniqueness while bringing B-tree index performance up to the level of integer PKs.
Here are three steps you can take right now.
-
Start by categorizing the UUID usage in your current project — separate domain entities like order IDs and product IDs where timing exposure is acceptable from security-sensitive identifiers like session tokens and user IDs.
-
Apply UUIDv7 to new tables first — On PostgreSQL 18,
DEFAULT uuidv7()works out of the box. On earlier versions, you can start by generating at the application layer using theuuidnpm package v9.0+, the Python 3.14+ standard library (uuid7package for earlier versions), or Go'sgithub.com/google/uuid. -
For existing tables, measure current index I/O first with
EXPLAIN (ANALYZE, BUFFERS)— Comparing buffer hit counts before and after the switch lets you verify the performance improvement with concrete numbers.
References
- RFC 9562: Universally Unique IDentifiers (UUIDs) — Official IETF standard document
- UUID v7 in PostgreSQL 18 | Better Stack Community
- PostgreSQL UUID: Bulk insert with UUIDv7 vs UUIDv4 - DEV Community
- PostgreSQL UUID Performance: Benchmarking Random (v4) and Time-based (v7) UUIDs - DEV Community
- Why Random UUIDs (v4) Kill Database Performance: A Deep Dive into B-tree Index Issues
- UUID Benchmark War | Ardent Performance Computing
- UUIDv7 Comes to PostgreSQL 18 | Nile
- A Comparative Analysis of Identifier Schemes: UUIDv4, UUIDv7, and ULID for Distributed Systems (arXiv)
- Goodbye to sequential integers, hello UUIDv7! | Buildkite
- UUIDv7 in 33 languages | antonz.org
- UUIDv7 vs UUIDv8 (RFC 9562): production notes — .NET pitfalls