Table (slices, columns, blocks) v1

Description

This page provides information about the blocks in a table, on a per-slice, per-column basis.

This is the most detailed information about the blocks held by a table; you really get to see the full and exact state of the table. The problem is that as the slice and column counts rise, it becomes a lot of rows.

This page does not auto-refresh, because there are too many rows for it to be easily taken in, and because vacuum operates on a per-slice basis, but on any given slice being vacuumed, all columns are vacuumed together, so there’s no need for per-column information.

Columns

Name Type
schema_id int8
schema varchar
table_id int8
table varchar
column_ordinal int2
column varchar
node int2
slice int2
store:blocks:sorted int8
store:blocks:unsorted int8
store:blocks:total int8
store:rows:sorted int8
store:rows:unsorted int8
store:rows:total int8
sorting % float4

Column Descriptions

schema_id

The schema ID. This column is emitted in CSV exports only.

schema

The schema name.

table_id

The table ID. This column is emitted in CSV exports only.

Surprisingly, table IDs turn out to be unique across all databases.

table

The table name.

column_ordinal

The column ordinal. This column is emitted in CSV exports only.

Column ordinals are unique to a single table.

In pg_attribute, ordinals for user columns count from 1, system columns are negative.

In stv_blocklist, ordinals for user columns count from 0, and system columns have a rather extraordinary numbering, in that they are the numbers after the final user column, which is to say, the system column numbers vary by table.

(I believe stv_blocklist internally is using unsigned variables, in part from this numbering issue, but also because I’ve been able to make num_values overflow, and when it overflows, it looks like an unsigned value which has wrapped and is then having its bit pattern interpreted as a signed value, where Redshift data types are all signed.)

In pg_attribte, you see quite a number of system columns, but in fact most vestigial, left-overs from Postgres, and do not exist on disk. Only three system columns are actually in use, which have the ordinals -2, -8 and -9, being the row ID, MVCC insert transaction ID and MVCC delete transaction ID, although offhand I cannot remember which is which (although I do recall it’s not the order you’d expect - I have a feeling -8 is the row ID).

column

The column name.

node

The node ID. Nodes count from 0 and are contiguous.

slice

The slice ID. Slices count from 0 and are contiguous and unique across all nodes.

There is a single special case “slice”, which used to be number 6411 but now seems to be a value which varies by query and which is slightly above 12800, and this is in fact the leader node.

store:blocks:sorted

The number of sorted blocks.

store:blocks:unsorted

The number of unsorted blocks.

store:blocks:total

The total number of blocks.

store:rows:sorted

The number of sorted rows.

store:rows:unsorted

The number of unsorted rows.

store:rows:total

The total number of rows.

sorting %

Sorting is not a per-table concept, but in fact a per-table, per-slice concept - in fact, it is even actually per-table, per-column, per-slice, but there isn’t much gain by adding in the per-column level, as will become clear.

What I mean by this is that sorting in the system tables and the docs is presented as a single number for a table.

This is wrong, and is misleading.

The key issue is that VACUUM does not operate on an entire table at once, but in fact on a subset of slices.

If you watch a table, in real-time, while VACUUM runs, you see one group of slices goes through the process of being sorted - this is quite an involved process, and the slices being vacuumed go through various contortions with how many sorted and unsorted blocks they possess (both go up and down), until eventually it all sorts itself out and the slice is sorted.

This takes quite a while, and the user can interrupt the process at any time.

This leaves the slice in an partially vacuumed state - and, critically, it means then that slices can vary in how vacuumed they are.

A table is as fast as its least sorted slice.

It is then that what we care about is the least sorted slice.

This column gives that percentage, which it is computed by on a per-slice basis dividing the number of sorted blocks on the slice by the total number of blocks on the slice.

This leads us to a the per-column nature of sorting.

It can be that row distribution between slices is not approximately equal. One slice may have significantly more rows than other slices - we can imagine, for ease of reasoning, an fairly extreme scenario where all slices have say three sorted blocks, except for one slice, which has ten sorted blocks.

If we then update one row, the three-block slices become four block slices, with three sorted blocks and one unsorted block. They are 3/4 = 75% sorted.

The ten-block slice becomes an eleven block slice, with one unsorted block, which is therefore 10/11 = 90.9% sorted.

This however is I think normally not much of a consideration - usually tables are well balanced between slices.

A final note, the automatic VACUUM, which Redshift has running in the background, is described as backing off, and stopping working, when the user load on the cluster is too high, or the user issues a VACUUM. Obviously, if this is true, then this is a source of partially vacuumed slices.

(As it is, auto-vacuum seems to be negligible in practise. It runs too infrequently, and does too little work, to have a detectable impact upon clusters. Also, in my testing, I only ever saw it run delete-only VACUUM.)