Mastering PostgreSQL 9.6
上QQ阅读APP看书,第一时间看更新

Configuring VACUUM and autovacuum

Back in the early days of PostgreSQL projects, people had to run VACUUM manually. Fortunately this is long gone. Nowadays administrators can rely on a tool called autovacuum, which is part of the PostgreSQL Server infrastructure. It automatically takes care of cleanup and works in the background. It wakes up once per minute (see autovacuum_naptime = 1 in postgresql.conf) and checks if there is work to do. If there is work, autovacuum will fork up to three worker processes (see autovacuum_max_workers in postgresql.conf).

The main question is: When does autovacuum trigger the creation of a worker process?

Actually the autovacuum process does not fork processes itself. Instead it tells the main process to do so. This is done to avoid zombie processes in the case of failure and to improve robustness.

The answer to this question can again be found in postgresql.conf:

autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1

The autovacuum_vacuum_scale_factor tells PostgreSQL that a table is worth vacuuming if 20% of data has been changed. The trouble is that, if a table consists of one row, one change is already 100%. It makes absolutely no sense to fork a complete process to clean up just one row. Therefore autovacuum_vacuuum_threshold says that we need 20% and that 20% must be at least 50 rows. Otherwise, VACUUM won't kick in. The same mechanism is used when it comes to optimizer stats creation. 10% and at least 50 rows are needed to justify new optimizer stats. Ideally, autovacuum creates new statistics during a normal VACUUM to avoid unnecessary trips to the table.