This page provides information about the blocks in a table, on a per-slice, per-column basis.
This is the most detailed information about the blocks held by a table; you really get to see the full and exact state of the table. The problem is that as the slice and column counts rise, it becomes a lot of rows.
This page does not auto-refresh, because there are too many rows for it to be easily taken in, and because vacuum operates on a per-slice basis, but on any given slice being vacuumed, all columns are vacuumed together, so there’s no need for per-column information.
Name | Type |
---|---|
schema_id | int8 |
schema | varchar |
table_id | int8 |
table | varchar |
column_ordinal | int2 |
column | varchar |
node | int2 |
slice | int2 |
store:blocks:sorted | int8 |
store:blocks:unsorted | int8 |
store:blocks:total | int8 |
store:rows:sorted | int8 |
store:rows:unsorted | int8 |
store:rows:total | int8 |
sorting % | float4 |
The schema ID. This column is emitted in CSV exports only.
The schema name.
The table ID. This column is emitted in CSV exports only.
Surprisingly, table IDs turn out to be unique across all databases.
The table name.
The column ordinal. This column is emitted in CSV exports only.
Column ordinals are unique to a single table.
In pg_attribute
, ordinals for user columns count from 1,
system columns are negative.
In stv_blocklist
, ordinals for user columns count from
0, and system columns have a rather extraordinary numbering, in that
they are the numbers after the final user column, which is to
say, the system column numbers vary by table.
(I believe stv_blocklist
internally is using unsigned
variables, in part from this numbering issue, but also because I’ve been
able to make num_values
overflow, and when it overflows, it
looks like an unsigned value which has wrapped and is then having its
bit pattern interpreted as a signed value, where Redshift data types are
all signed.)
In pg_attribte
, you see quite a number of system
columns, but in fact most vestigial, left-overs from Postgres, and do
not exist on disk. Only three system columns are actually in use, which
have the ordinals -2, -8 and -9, being the row ID, MVCC insert
transaction ID and MVCC delete transaction ID, although offhand I cannot
remember which is which (although I do recall it’s not the order you’d
expect - I have a feeling -8 is the row ID).
The column name.
The node ID. Nodes count from 0 and are contiguous.
The slice ID. Slices count from 0 and are contiguous and unique across all nodes.
There is a single special case “slice”, which used to be number 6411 but now seems to be a value which varies by query and which is slightly above 12800, and this is in fact the leader node.
The number of sorted blocks.
The number of unsorted blocks.
The total number of blocks.
The number of sorted rows.
The number of unsorted rows.
The total number of rows.
Sorting is not a per-table concept, but in fact a per-table, per-slice concept - in fact, it is even actually per-table, per-column, per-slice, but there isn’t much gain by adding in the per-column level, as will become clear.
What I mean by this is that sorting in the system tables and the docs is presented as a single number for a table.
This is wrong, and is misleading.
The key issue is that VACUUM
does not operate on an
entire table at once, but in fact on a subset of slices.
If you watch a table, in real-time, while VACUUM
runs,
you see one group of slices goes through the process of being sorted -
this is quite an involved process, and the slices being vacuumed go
through various contortions with how many sorted and unsorted blocks
they possess (both go up and down), until eventually it all sorts itself
out and the slice is sorted.
This takes quite a while, and the user can interrupt the process at any time.
This leaves the slice in an partially vacuumed state - and, critically, it means then that slices can vary in how vacuumed they are.
A table is as fast as its least sorted slice.
It is then that what we care about is the least sorted slice.
This column gives that percentage, which it is computed by on a per-slice basis dividing the number of sorted blocks on the slice by the total number of blocks on the slice.
This leads us to a the per-column nature of sorting.
It can be that row distribution between slices is not approximately equal. One slice may have significantly more rows than other slices - we can imagine, for ease of reasoning, an fairly extreme scenario where all slices have say three sorted blocks, except for one slice, which has ten sorted blocks.
If we then update one row, the three-block slices become four block slices, with three sorted blocks and one unsorted block. They are 3/4 = 75% sorted.
The ten-block slice becomes an eleven block slice, with one unsorted block, which is therefore 10/11 = 90.9% sorted.
This however is I think normally not much of a consideration - usually tables are well balanced between slices.
A final note, the automatic VACUUM
, which Redshift has
running in the background, is described as backing off, and stopping
working, when the user load on the cluster is too high, or the user
issues a VACUUM
. Obviously, if this is true, then this is a
source of partially vacuumed slices.
(As it is, auto-vacuum seems to be negligible in practise. It runs
too infrequently, and does too little work, to have a detectable impact
upon clusters. Also, in my testing, I only ever saw it run delete-only
VACUUM
.)