Tables v1

Description

This page provides information about the tables in the database.

  1. Limitations of the queries, store and I/O columns
  2. Volume of records

Columns

Name Type
schema_id int8
schema varchar
table_id int8
table varchar
owner_user_id int4
owner varchar
creation timestamp
age interval
idle interval
types:sort varchar
types:dist varchar
counts:columns int2
counts:constraints int8
counts:sortkeys int2
counts:slices int2
store:blocks:sorted int8
store:blocks:unsorted int8
store:blocks:total int8
store:rows:sorted int8
store:rows:unsorted int8
store:rows:total int8
store::sorting %% float4
i/o:bytes read:disk int8
i/o:bytes read:network int8
i/o:rows:inserted int8
i/o:rows:deleted int8
i/o:bytes processed:in memory int8
i/o:bytes processed:on disk int8
related pages varchar

Column Descriptions

schema_id

The schema ID. This column is emitted in CSV exports only.

schema

The schema name.

table_id

The table ID. This column is emitted in CSV exports only.

Surprisingly, table IDs turn out to be unique across all databases.

table

The table name.

owner_user_id

The owner user ID. This column is emitted in CSV exports only.

owner

The owner user name.

creation

The creation timestamp of the table.

age

The age of the table (i.e. now() minus the creation timestamp).

idle

Duration since the start of most recent table access in stl_delete, stl_insert and stl_scan, which is to say, the time since the table last had rows deleted, inserted, or was scanned. If empty, there are no records in any of these system tables within the time period being examined by the page.

types:sort

Value Sorting Type
compound Compound
interleaved Interleaved
unsorted Unsorted

Note that it is not possible for me to write a view to tell if a table has auto sorting. This is because the system table which contains this information is owned by rdsdb, the root used controlled by AWS, and so is not accessible.

The only way to I can find out is to use svv_table_info, which is a view provided by AWS, which provides information about tables.

However, firstly, the view is flawed. It only displays information about tables which have at least one row; empty tables are not listed.

Secondly, the view is huge. It’s a lot of SQL and work, and indeed is repeating many of the table reads I am performing in my own view. It also looks complex and I think then probably buggy.

As such, I’ve opted not to use it, and so this page does not tell you if auto sorting is active.

As it is, I would advise you never to use AUTO, anyway. You cannot have a knowingly correctly designed system when the system contains undocumented black boxes provided by third parties, and where those black boxes can and do silently change.

Finally, note that auto is the default setting when creating a table - if you do not specify sorting, that’s what you get - and so to get an unsorted table you now need, after making the table, to then issue an ALTER TABLE command, to deactivate AUTO.

types:dist

Value Sorting Type
compound Compound
interleaved Interleaved
unsorted Unsorted

Note that it is not possible for me to write a view to tell if a table has auto sorting. This is because the system table which contains this information is owned by rdsdb, the root used controlled by AWS, and so is not accessible.

The only way to I can find out is to use svv_table_info, which is a view provided by AWS, which provides information about tables.

However, firstly, the view is flawed. It only displays information about tables which have at least one row; empty tables are not listed.

Secondly, the view is huge. It’s a lot of SQL and work, and indeed is repeating many of the table reads I am performing in my own view. It also looks complex and I think then probably buggy.

As such, I’ve opted not to use it, and so this page does not tell you if auto sorting is active.

As it is, I would advise you never to use AUTO, anyway. You cannot have a knowingly correctly designed system when the system contains undocumented black boxes provided by third parties, and where those black boxes can and do silently change.

Finally, note that auto is the default setting when creating a table - if you do not specify sorting, that’s what you get - and so to get an unsorted table you now need, after making the table, to then issue an ALTER TABLE command, to deactivate AUTO.

counts:columns

The number of columns. Tables have system columns (there are three), and these are not included in the count.

counts:constraints

The number of constraints; NOT NULL, unique, primary key, foreign keys.

counts:sortkeys

The number of sortkeys.

counts:slices

The number of slices which hold blocks from this table.

If the number of slices holding blocks is less than the number of slices in the cluster, then obviously, not all slices are participating in queries which read this table. This is improper, and you should look to fix it. However, for tables with less records than there are slices in the cluster, they will naturally use less slices than are present in the cluster. Finally, note that tables with all distribution have a full copy of themselves existing on a single slice on each node. Their rows, when they are read, are broadcast to all slices. Remember only use all for small tables; if you use it with large tables, you end up producing a system which does not scale, because only one slice per node can write to that that table.

store:blocks:sorted

The number of sorted blocks.

store:blocks:unsorted

The number of unsorted blocks.

store:blocks:total

The total number of blocks.

store:rows:sorted

The number of sorted rows.

store:rows:unsorted

The number of unsorted rows.

store:rows:total

The total number of rows.

store::sorting %%

Sorting is not a per-table concept, but in fact a per-table, per-slice concept - in fact, it is even actually per-table, per-column, per-slice, but there isn’t much gain by adding in the per-column level, as will become clear.

What I mean by this is that sorting in the system tables and the docs is presented as a single number for a table.

This is wrong, and is misleading.

The key issue is that VACUUM does not operate on an entire table at once, but in fact on a subset of slices.

If you watch a table, in real-time, while VACUUM runs, you see one group of slices goes through the process of being sorted - this is quite an involved process, and the slices being vacuumed go through various contortions with how many sorted and unsorted blocks they possess (both go up and down), until eventually it all sorts itself out and the slice is sorted.

This takes quite a while, and the user can interrupt the process at any time.

This leaves the slice in an partially vacuumed state - and, critically, it means then that slices can vary in how vacuumed they are.

A table is as fast as its least sorted slice.

It is then that what we care about is the least sorted slice.

This column gives that percentage, which it is computed by on a per-slice basis dividing the number of sorted blocks on the slice by the total number of blocks on the slice.

This leads us to a the per-column nature of sorting.

It can be that row distribution between slices is not approximately equal. One slice may have significantly more rows than other slices - we can imagine, for ease of reasoning, an fairly extreme scenario where all slices have say three sorted blocks, except for one slice, which has ten sorted blocks.

If we then update one row, the three-block slices become four block slices, with three sorted blocks and one unsorted block. They are 3/4 = 75% sorted.

The ten-block slice becomes an eleven block slice, with one unsorted block, which is therefore 10/11 = 90.9% sorted.

This however is I think normally not much of a consideration - usually tables are well balanced between slices.

A final note, the automatic VACUUM, which Redshift has running in the background, is described as backing off, and stopping working, when the user load on the cluster is too high, or the user issues a VACUUM. Obviously, if this is true, then this is a source of partially vacuumed slices.

(As it is, auto-vacuum seems to be negligible in practise. It runs too infrequently, and does too little work, to have a detectable impact upon clusters. Also, in my testing, I only ever saw it run delete-only VACUUM.)

i/o:bytes read:disk

This column then shows the total number of bytes read from disk, as best I can judge the types indicated in stl_scan.

i/o:bytes read:network

This column then shows the total number of bytes read from network, as best I can judge the types indicated in stl_scan.

Importantly, it only counts the receive side of network activity - the step is scan, after all, not broadcast, so we’re not counting bytes twice.

i/o:rows:inserted

The number of rows inserted into the table.

For tables with all distribution, this is the physical number of rows (i.e. one per node), not the logical number of rows.

i/o:rows:deleted

The number of rows deleted from the table.

For tables with all distribution, this is the physical number of rows (i.e. one per node), not the logical number of rows.

i/o:bytes processed:in memory

This column then shows the total number of bytes processed by the stl_aggr, stl_hash, stl_save, stl_sort and stl_unique steps, when running in memory rather than on disk.

i/o:bytes processed:on disk

This column then shows the total number of bytes processed by the stl_aggr, stl_hash, stl_save, stl_sort and stl_unique steps, when running on disk rather than in memory.