PostgreSQL 18 VACUUM and Asynchronous I/O (AIO): Reducing Bloat in Large Tables with Autovacuum Tuning
If you're running a production PostgreSQL database and suddenly queries are slowing down, disk usage keeps climbing, but the actual data hasn't grown nearly as much — it's almost certainly a bloat problem. I spent a long time puzzling over "why is this table so huge?" myself, but it's a natural consequence of PostgreSQL's MVCC architecture, and it can be kept under control with the right configuration.
PostgreSQL 18, officially released in September 2025, introduces a new asynchronous I/O (AIO) subsystem that delivers 2–3× throughput improvements on cold-cache sequential scans in cloud block storage environments. Since VACUUM benefits directly from AIO, dead tuples can be reclaimed far more quickly before bloat has a chance to accumulate. This article covers everything from how PG 18 AIO accelerates VACUUM, to autovacuum tuning settings and bloat removal strategies you can apply in production today — in order: diagnose → tune → monitor → remove.
Core Concepts
Why PostgreSQL Creates Bloat — The Structural Nature of MVCC
PostgreSQL does not physically delete existing rows immediately when an UPDATE or DELETE occurs. To guarantee transaction isolation, it simply marks the row as "no longer valid" (a dead tuple) while the space remains occupied. When these dead tuples accumulate, the table file grows larger than necessary — this is called bloat.
MVCC (Multi-Version Concurrency Control): A concurrency control mechanism that maintains multiple versions of data simultaneously so that reads and writes do not block each other. Adopted by most modern RDBMSes including PostgreSQL, MySQL InnoDB, and Oracle.
VACUUM reclaims these dead tuples so that subsequent INSERTs can reuse the freed space. However, a regular VACUUM does not return space to the OS — it only marks it as reusable internally — so the file size itself remains unchanged. Tools like VACUUM FULL or pg_repack are needed to return space to the OS.
The practical impact of bloat is more serious than you might expect:
- Sequential scans must read all the useless pages too
- Indexes also grow inefficiently by retaining pointers to dead tuples
shared_bufferscache efficiency degrades
PostgreSQL 18 Asynchronous I/O (AIO) Architecture
Prior to PostgreSQL 18 (version 17 and below), I/O was handled synchronously: issue a read request → wait for completion → issue the next request. The higher the storage latency, the more inefficient this became — a problem that was especially pronounced with cloud network storage (EBS, Azure Managed Disk, GCP PD).
PG 18 supports three I/O modes via the io_method parameter:
| io_method | Behavior | Notes |
|---|---|---|
sync |
Legacy synchronous mode (same as PG 17 and below) | For backward compatibility |
worker |
Background I/O worker process queue-based | Default |
io_uring |
Direct use of Linux kernel ring buffer | Linux 5.1+, best performance |
worker mode works by having backend processes register read requests in a shared memory queue, which separate I/O worker processes handle asynchronously, returning results to a completion queue. Request submission and completion waiting overlap, increasing overall throughput.
io_uring mode goes a step further — there are no worker processes at all; PostgreSQL and the Linux kernel share a ring buffer (a memory region shared between kernel and user space) directly. This architecture significantly reduces the number of kernel-mode ↔ user-mode transitions (context switches), yielding better performance than worker mode on NVMe SSDs and cloud storage.
io_uring: A high-performance asynchronous I/O interface introduced in Linux 5.1. By sharing a ring buffer between the kernel and user space, it reduces the number of system calls themselves. However, seccomp (security filter based on an allowed list of system calls) or AppArmor (access control framework) policies in container environments may block io_uring syscalls, so prior verification is required in containerized environments.
In PG 18, the default value of effective_io_concurrency has also been raised from 1 to 16. This value controls the number of parallel read requests that can be queued simultaneously in worker/io_uring mode, and affects all asynchronous I/O operations including VACUUM.
How AIO Works Within VACUUM
With the legacy synchronous approach, VACUUM scanned an entire table by reading one heap page → waiting for completion → reading the next page → waiting for completion.
In PG 18, based on the maintenance_io_concurrency setting (the number of concurrent I/O operations for maintenance tasks, default 16), multiple page read requests are queued simultaneously, and the next requests proceed concurrently while already-fetched pages are being processed. This "pipelining" effect is the key driver of VACUUM's speed improvement. io_combine_limit is the parameter that caps the maximum size when merging multiple I/O requests into a single larger request during this process (default 128kB).
Practical Application
Example 1: Assessing Current Bloat — Which Tables to Address First
Honestly, my first instinct was "can't I just crank up the global autovacuum settings?" — but since update patterns vary so much per table, per-table configuration turned out to be far more effective. Before tuning, the first step is identifying which tables are the problem.
-- List tables with a high dead tuple ratio
SELECT
schemaname,
tablename,
n_dead_tup,
n_live_tup,
ROUND(
n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100,
2
) AS dead_ratio,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY dead_ratio DESC;If you see tables with a dead_ratio above 20%, it's time to tune autovacuum. Above 30–40%, it's worth considering pg_repack.
For a more precise analysis, you can use the pgstattuple extension:
-- Install pgstattuple (contrib extension)
CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- Detailed bloat analysis for the orders table
SELECT * FROM pgstattuple('orders');Looking at dead_tuple_percent and tuple_percent (the live tuple ratio) together in the output lets you quantify the severity of bloat.
Caution:
pgstattupleperforms a full sequential scan of the table. Running it on a table with hundreds of millions of rows during peak hours can cause load spikes — it is recommended to run it once a week during off-peak hours.
Example 2: Autovacuum Tuning for Large Tables
The PG default (autovacuum_vacuum_scale_factor = 0.2) means that on a table with 1 million rows, as many as 200,000 dead tuples must accumulate before vacuum is triggered. When autovacuum fires is determined by the following formula:
trigger condition = autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × n_live_tupWith the default threshold (50) + 0.2 × 1,000,000 = 200,050 dead tuples required to trigger vacuum, bloat accumulates significantly. This is excessively conservative for modern NVMe SSD environments.
Global settings can be changed in postgresql.conf:
# postgresql.conf — recommended settings for SSD environments
autovacuum_vacuum_cost_delay = 2 # default 2ms; 0–2ms on SSD
autovacuum_vacuum_cost_limit = 3000 # default 200 → 3000 recommended for SSD
autovacuum_max_workers = 5 # default 3 → increase for more parallelism
autovacuum_vacuum_scale_factor = 0.05 # default 0.2 → lower to 5%However, a far more effective approach than global settings is applying per-table configuration to frequently updated tables. These take effect immediately without a server restart:
-- Apply per-table settings to a frequently updated table (e.g., orders)
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01, -- trigger vacuum immediately at 1% dead tuples
autovacuum_vacuum_threshold = 1000, -- only when at least 1000 (added to scale_factor)
autovacuum_vacuum_cost_delay = 0, -- critical table: no I/O throttling delay
autovacuum_analyze_scale_factor = 0.005 -- also refresh statistics more frequently
);
-- Verify the settings
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'orders';autovacuum_vacuum_threshold = 1000 fills the threshold slot in the formula above. The structure is "run when 1000 + 0.01 × live_tup or more dead tuples exist," so vacuum will not trigger unless there are at least 1,000 dead tuples regardless of row count. This prevents excessive vacuuming on small tables while making large tables react much faster.
Example 3: PG 18 AIO Configuration and VACUUM I/O Monitoring
You can start by checking the current io_method in your environment:
-- Check current io_method
SHOW io_method;
-- Check effective_io_concurrency (default 16)
SHOW effective_io_concurrency;The configuration to enable io_uring in postgresql.conf looks like this:
# postgresql.conf
# Enable io_uring on Linux 5.1+ environments
io_method = io_uring
# Number of concurrent I/O operations for maintenance tasks like VACUUM (controls AIO pipelining)
maintenance_io_concurrency = 16
# Number of prefetch read requests (only meaningful in worker/io_uring mode)
effective_io_concurrency = 16
# Maximum size limit for combined I/O requests
io_combine_limit = 128kBThe new pg_aios view added in PG 18 lets you query the asynchronous I/O pipeline status in real time:
-- Real-time AIO pipeline query (new view in PG 18)
SELECT * FROM pg_aios;pg_aios shows a list of asynchronous I/O requests currently being processed, including I/O worker ID, request type (read/write), processing mode, and target file information. If you query this from another session while VACUUM is running on a table with hundreds of millions of rows, you'll see output like this:
worker_id | handle_type | mode | pending | ...
-----------+-------------+------+---------+----
1 | read | aio | t | ...
2 | read | aio | t | ...
3 | read | aio | t | ...On small tables, VACUUM may finish so quickly that pg_aios always appears empty. This simply means AIO is processing that efficiently — you need a truly large table with hundreds of millions of rows to observe the pipelining state in action.
| Parameter | Default | Role |
|---|---|---|
io_method |
worker |
I/O mode selection (sync/worker/io_uring) |
effective_io_concurrency |
16 |
Number of concurrent parallel read requests |
maintenance_io_concurrency |
16 |
Number of concurrent I/O operations for maintenance tasks like VACUUM |
io_combine_limit |
128kB |
Maximum size limit for combined I/O requests |
Example 4: Online Bloat Removal with pg_repack
Even after autovacuum reclaims dead tuples, the file size does not shrink. Returning disk space to the OS requires rewriting the table from scratch, but VACUUM FULL requires an exclusive lock, making it impractical for 24/7 production environments. This is where pg_repack comes to the rescue.
# Install pg_repack (Ubuntu/Debian example)
# Install the version matching your running PostgreSQL major version
apt-get install postgresql-18-repack
# Online table reorganization without an exclusive lock
pg_repack -h localhost -U postgres -d mydb -t orders
# Reorganize only indexes (leave the table as-is)
pg_repack -h localhost -U postgres -d mydb --index orders_pkey
# Reorganize the entire database (may take a long time)
pg_repack -h localhost -U postgres -d mydbUnderstanding how pg_repack works makes it easier to use with confidence:
- Start building a new copy while keeping the original table in service
- Record any changes that occur during the copy into a separate log table
- Apply the log to the new table once copying is complete
- Swap old and new tables with a brief lock of under one second
- Rebuild existing indexes concurrently as well
The Most Common Mistakes in Practice
-
Lowering only the global scale_factor without per-table settings: Even if you lower the global value to 0.05, small tables may get vacuumed far too often, saturating workers. Applying per-table settings via
ALTER TABLEto your critical tables is far more precise. -
Raising cost_limit too aggressively in one shot: I once jumped cost_limit from 200 to 3000 in a single change and watched overnight batch jobs slow down by 30 seconds each. It's recommended to increase it gradually — 500 → 1000 → 2000 → 3000 — while monitoring query latency at each step.
-
Tuning without monitoring bloat: Changing autovacuum settings is meaningless if you never verify whether they're having an effect. Periodically checking the
last_autovacuumtimestamp and the trend inn_dead_tupwill make the before-and-after difference immediately visible.
Pros and Cons
Advantages
The most direct effect of introducing AIO is faster VACUUM. According to PostgreSQL HTX benchmarks, 2–3× throughput improvements on cold-cache sequential scans have been reported in cloud block storage environments. Even on local NVMe, improvements of 20–35% have been observed. The reason the effect is more pronounced on network storage like EBS, Azure Managed Disk, and GCP PD is that higher latency means greater gains from asynchronous pipelining.
Faster VACUUM naturally suppresses dead tuple accumulation, and the pg_aios view enables real-time visibility into the I/O pipeline — improving observability into VACUUM internals that were previously a black box. Combined with pg_repack, disk space can be returned to the OS with under one second of locking, and per-table fine-grained control via ALTER TABLE ... SET (autovacuum_*) takes effect immediately without a server restart.
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| io_uring environment requirements | Linux 5.1+ required; Windows/macOS not supported | Use worker mode on those platforms |
| Container security policies | io_uring syscalls may be blocked by seccomp/AppArmor | Fall back to worker mode or modify policy |
| Limited gains on local NVMe | ~20–35% improvement vs. cloud (lower latency means smaller relative gain) | Still an improvement; application is recommended |
| Increasing vacuum cost_limit | Higher I/O usage → possible resource contention with production queries | Raise gradually with concurrent monitoring |
| Increasing max_workers | Proportional increase in shared_buffers and memory usage | Verify available server memory before applying |
| pgstattuple load | Full sequential table scan → unsuitable for real-time monitoring on large tables | Run once per week during off-peak hours |
autovacuum_vacuum_cost_limit: The cap on "cost tokens" that autovacuum is allowed to spend on I/O. The default of 200 is a legacy of the HDD era; in SSD/NVMe environments, raising it to the 1000–3000 range has little impact on production query performance.
Comparison of Bloat Removal Methods
| Method | Lock Level | Space Returned | Service Interruption | Recommended When |
|---|---|---|---|---|
| VACUUM | None | No (internal reuse only) | None | Routine maintenance |
| VACUUM FULL | Exclusive lock | Yes (returned to OS) | Yes | When a maintenance window is available |
| pg_repack | ~1 second lock | Yes (returned to OS) | Minimal | 24/7 production environments |
| pg_squeeze | ~1 second lock | Yes (returned to OS) | Minimal | Alternative to pg_repack |
Closing Thoughts
The real payoff of applying this combination in practice is the moment you see last_autovacuum timestamps appearing far more frequently than before. The combination of PostgreSQL 18 AIO and autovacuum tuning is a proven approach that structurally suppresses bloat in large tables through nothing more than configuration changes.
Three steps you can start right now:
-
Start by assessing your current bloat situation — run the
pg_stat_user_tablesquery introduced earlier to pull a list of tables with highdead_ratio. This can be run immediately regardless of PG version. -
Apply per-table autovacuum settings to your most frequently updated tables — something like
ALTER TABLE [tablename] SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_cost_delay = 0);can be applied instantly without downtime. After applying, compare thelast_autovacuumtimestamps and then_dead_tuptrend after 1–2 days. Users on PG 17 and below can see immediate results from steps 1 and 2 alone. -
If you're planning to upgrade to PostgreSQL 18, start planning your
io_methodconfiguration —io_uringfor Linux 5.1+ environments, and the defaultworkermode is sufficient for everything else. Right after upgrading, use thepg_aiosview to verify that the AIO pipeline is actually running.
References
- PostgreSQL 18 Official Release Notes | postgresql.org
- PostgreSQL 18 Asynchronous I/O: A Complete Guide | Better Stack
- Exploring why PostgreSQL 18 put asynchronous I/O in your database | Aiven
- Waiting for Postgres 18: Accelerating Disk Reads with Async I/O | pganalyze
- PostgreSQL 18 Async I/O: How it works under the hood | Medium
- PostgreSQL 18 Async I/O in Production: Real-World Benchmarks | PostgreSQL HTX
- PostgreSQL Vacuum Optimization for Large Tables: Deep Dive | Andrew Baker
- How to Prevent Table Bloat with Autovacuum Tuning in PostgreSQL | OneUptime
- Autovacuum Tuning Basics | EnterpriseDB
- PostgreSQL Table Bloat Management: VACUUM FULL, pg_repack | PostgreSQL HTX
- Understanding pg_repack: Eliminate Bloat Without Downtime | alexandrubagu.github.io
- How to Reduce Bloat in Large PostgreSQL Tables | Tiger Data
- PostgreSQL Async-IO using io_uring | Medium