This page provides information about the blocks in a table, on a per-slice basis.
The vacuum process works on all the columns in a slice at once, so
watching this table gives a clear idea of how vacuum is proceeding,
without having to face the much larger number of rows from the
Table (slices, columns, blocks
page.
The page auto-refreshes. It is intended to allow a table to be directly observed, in real-time.
Name | Type |
---|---|
schema_id | int8 |
schema | varchar |
table_id | int8 |
table | varchar |
node | int2 |
slice | int2 |
store:blocks:sorted | int8 |
store:blocks:unsorted | int8 |
store:blocks:total | int8 |
store:rows:sorted | int8 |
store:rows:unsorted | int8 |
store:rows:total | int8 |
sorting % | float4 |
The schema ID. This column is emitted in CSV exports only.
The schema name.
The table ID. This column is emitted in CSV exports only.
Surprisingly, table IDs turn out to be unique across all databases.
The table name.
The node ID. Nodes count from 0 and are contiguous.
The slice ID. Slices count from 0 and are contiguous and unique across all nodes.
There is a single special case “slice”, which used to be number 6411 but now seems to be a value which varies by query and which is slightly above 12800, and this is in fact the leader node.
The number of sorted blocks.
The number of unsorted blocks.
The total number of blocks.
The number of sorted rows.
The number of unsorted rows.
The total number of rows.
Sorting is not a per-table concept, but in fact a per-table, per-slice concept - in fact, it is even actually per-table, per-column, per-slice, but there isn’t much gain by adding in the per-column level, as will become clear.
What I mean by this is that sorting in the system tables and the docs is presented as a single number for a table.
This is wrong, and is misleading.
The key issue is that VACUUM
does not operate on an
entire table at once, but in fact on a subset of slices.
If you watch a table, in real-time, while VACUUM
runs,
you see one group of slices goes through the process of being sorted -
this is quite an involved process, and the slices being vacuumed go
through various contortions with how many sorted and unsorted blocks
they possess (both go up and down), until eventually it all sorts itself
out and the slice is sorted.
This takes quite a while, and the user can interrupt the process at any time.
This leaves the slice in an partially vacuumed state - and, critically, it means then that slices can vary in how vacuumed they are.
A table is as fast as its least sorted slice.
It is then that what we care about is the least sorted slice.
This column gives that percentage, which it is computed by on a per-slice basis dividing the number of sorted blocks on the slice by the total number of blocks on the slice.
This leads us to a the per-column nature of sorting.
It can be that row distribution between slices is not approximately equal. One slice may have significantly more rows than other slices - we can imagine, for ease of reasoning, an fairly extreme scenario where all slices have say three sorted blocks, except for one slice, which has ten sorted blocks.
If we then update one row, the three-block slices become four block slices, with three sorted blocks and one unsorted block. They are 3/4 = 75% sorted.
The ten-block slice becomes an eleven block slice, with one unsorted block, which is therefore 10/11 = 90.9% sorted.
This however is I think normally not much of a consideration - usually tables are well balanced between slices.
A final note, the automatic VACUUM
, which Redshift has
running in the background, is described as backing off, and stopping
working, when the user load on the cluster is too high, or the user
issues a VACUUM
. Obviously, if this is true, then this is a
source of partially vacuumed slices.
(As it is, auto-vacuum seems to be negligible in practise. It runs
too infrequently, and does too little work, to have a detectable impact
upon clusters. Also, in my testing, I only ever saw it run delete-only
VACUUM
.)