Making use of EXPLAIN
In this example, reading 4 million rows has taken more than 100 milliseconds. From a performance point of view, it is a total disaster. To figure out what is going wrong, PostgreSQL offers the EXPLAIN command:
test=# \h EXPLAIN
Command: EXPLAIN
Description: show the execution plan of a statement
Syntax:
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
where option can be one of:
ANALYZE [ boolean ]
VERBOSE [ boolean ]
COSTS [ boolean ]
BUFFERS [ boolean ]
TIMING [ boolean ]
FORMAT { TEXT | XML | JSON | YAML }
When you have the feeling that a query is not performing well, EXPLAIN will help you to reveal the real performance problem.
Here is how it works:
test=# EXPLAIN SELECT * FROM t_test WHERE id = 432332;
QUERY PLAN
---------------------------------------------------------------
Gather (cost=1000.00..43463.92 rows=1 width=9)
Workers Planned: 2
-> Parallel Seq Scan on t_test
(cost=0.00..42463.82 rows=1 width=9)
Filter: (id = 432332)
(4 rows)
What you see in this listing is a so called execution plan. In PostgreSQL, an SQL statement will be executed in four stages. The following components are at work:
- The parser will check for syntax errors and for obvious problems
- The rewrite system take care of rules (views and other things)
- The optimizer will figure out how to execute a query in the most efficient way and work out a plan
- The plan provided by the optimizer will be used by the executor to finally create the result
The purpose of EXPLAIN is to see what the planner has come up with to run the query efficiently. In my example, PostgreSQL will use a parallel sequential scan. This means that two workers will cooperate and work on the filter condition together. The partial results are then united through a thing called gather node, which has been introduced in PostgreSQL 9.6 (it is part of the parallel query infrastructure). If you look at the plan more precisely, you will see how many rows PostgreSQL expects at each stage of the plan (in this example, rows = 1 that is, one row will be returned).
Parallelism is not a must. It is always possible to reduce the number of parallel workers in pre-PostgreSQL 9.6 behavior by setting the following variable to 0:
test=# SET max_parallel_workers_per_gather TO 0;
SET
Note that this change has no side effect as it is only inside your session. Of course you can also decide the change in the postgresql.conf file, but I would not advise you to do that as you might lose quite a lot of performance provided by parallel queries.