Configuring VACUUM and autovacuum
Back in the early days of PostgreSQL projects, people had to run VACUUM manually. Fortunately this is long gone. Nowadays administrators can rely on a tool called autovacuum, which is part of the PostgreSQL Server infrastructure. It automatically takes care of cleanup and works in the background. It wakes up once per minute (see autovacuum_naptime = 1 in postgresql.conf) and checks if there is work to do. If there is work, autovacuum will fork up to three worker processes (see autovacuum_max_workers in postgresql.conf).
The main question is: When does autovacuum trigger the creation of a worker process?
The answer to this question can again be found in postgresql.conf:
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
The autovacuum_vacuum_scale_factor tells PostgreSQL that a table is worth vacuuming if 20% of data has been changed. The trouble is that, if a table consists of one row, one change is already 100%. It makes absolutely no sense to fork a complete process to clean up just one row. Therefore autovacuum_vacuuum_threshold says that we need 20% and that 20% must be at least 50 rows. Otherwise, VACUUM won't kick in. The same mechanism is used when it comes to optimizer stats creation. 10% and at least 50 rows are needed to justify new optimizer stats. Ideally, autovacuum creates new statistics during a normal VACUUM to avoid unnecessary trips to the table.