Redshift Research Project



Weather Report

For the last two weeks or so, a few nodes types in a few regions are not starting.

I’ve listed non-starters in the table.

Region Node Notes
ap-east-1 ra3.xlplus Very slow disk-read-write and network, benchmarks about 3x longer than usual.
ap-northeast-2 dc2.large Network benchmark back to normal (~4 seconds), from 2x this.
ap-northeast-3 dc2.large Very slow disk-read-write, 13s vs normal 3 to 4. Processor also slow (4s vs 3s) which is very unusual.
ap-northeast-3 ra3.xlplus Very slow disk-read-write, 9.5s vs normal 3s.
ap-southeast-1 dc2.large Network slow, 6s vs normal 4.5s.
ap-southeast-3 ra3.xlplus Disk-read-write slow, 7s vs 3.5s
eu-north-1 dc2.large Failed to start node type.
eu-north-1 ra3.xlplus Failed to start node type.
eu-west-1 ra3.xlplus Failed to start node type.
us-east-1 ra3.xlplus Failed to start node type.
us-east-1 ra3.4xlarge Failed to start node type.


Differentiating Leader Nodes and Worker Nodes

Redshift is composed of a single leader node and then two or more worker nodes (don’t use single node clusters).

The leader node is Postgres 8, plus a ton of new functionality to control worker nodes.

The worker nodes are completely new, nothing to do with Postgres.

By default a query runs and only runs on the leader node. Such a query does not go into WLM, and so you are not limited by the number of slots in WLM.

If a query requires resources on worker nodes (which is usually to say, tables created by users), then the query goes into WLM and runs on the worker nodes.

It can sometimes be useful to be able to identify if a query is running on the leader node, and this can be done by certain queries, because the output of those queries are different on the leader node to worker nodes.

The method I’ve always used is to print a BC date. The leader node prints in a different format to worker nodes (and indeed, worker nodes cannot ingest their own output format!)

A new way recently came to light, which is to do with support for the interval keyword.

Leader nodes, being Postgres, support intervals up to and including millennium.

Worker nodes, being much more minimal in implementation, support intervals up to and include week.

If you issue a query using an interval of more than a week, it will run successfully on the leader node, but fail on worker nodes - it’s a second easy method to tell the two apart.


Compute Slices PDF

Working on a new PDF, first in a while.

Looking at the performance impact of compute slices.

Taking a two node cluster, benchmarking it, then elastic resize to four nodes, benchmark again. Then make a four node cluster, and benchmark that.

Then see what we get - a measure of performance loss from elastic resize.

I’m benchmark disk-read-write only right now. A comprehensive test would need to look at which steps compute slices participate in, which isn’t published, and then look at the performance loss for those steps.


RUSI Article

This is superb.

Europe Must Urgently Prepare to Deter Russia Without Large-Scale US Support


ra3 Bring-Up Problems

Trying to write a new PDF, running into a lot of problems getting an ra3 cluster up. Multiple regions unable to bring up a cluster.

dc2 Bring-Up Problems

I just ran a cross-region version check.

Failed to bring up a dc2.large in eu-south-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, me-south-1, and eu-west-2.


New PDF OTW : Late-Binding Views


Just nailed how the function which provides late-binding view info slows down to the point where it’s effectively frozen.

PDF tomorrow (Monday).


Late-Binding Views (PDF)

There are in Redshift two types of view, normal and late-binding. When the leader node creates a normal view, it checks at the time of creation pg_class for the tables and views being used by the view, and will not create the view if it refers to non-existent tables or views. Accordingly, external tables cannot be used with normal views because external tables are not present in pg_class and so are perceived as being non-existent. To work around this, late-binding views were created, which perform no dependency checking - they do not inspect pg_class. This approach has two major and three minor issues; the lack of dependency checking means blunders when creating or modifying views lead to faults which can become apparent only at a very different time and in a very different place to a very different person who has no idea what caused the fault, there are as the number and complexity of late-binding views increase critical performance problems when enumerating the columns in late-binding views such that it is possible for it to become impossible to enumerate late-binding view columns, a system table function is used to enumerates late-binding view columns and access privileges for this function are incorrectly implemented such that syslog unrestricted and access system table do not work, the lack of dependency information means no records in pg_depend so it is impossible to perform data lineage or inspect dependencies, and finally, late-binding views pollute pg_class as they look like normal views (except for a single undocumented difference). Late-binding views should due to their flaws be used for and only for their one essential purpose, when an external table must be in a view.


1.0.61395 is out

New Redshift is out.

28 new system table tables, 13 new system table views, 84 new system table functions.

With Redshift releases, normally there’s two or three releases with no system table changes, and then a release with a bunch of changes - this is such a release, and the first such release since I improved the system table diff page, which finally comes into its own;

It’s now easy to examine the changes.

We can also look at the official changes, here, so let’s begin with those and then look directly at what’s changed in the system tables.

  1. Changes CURRENT_USER to no longer truncate the returned username to 64 characters.

    Oooof. I didn’t know this problem existed! glad that’s been fixed.

  2. Adds OBJECT_TRANSFORM SQL function. For more information, see OBJECT_TRANSFORM function in the Amazon Redshift Database Developer Guide.

    Ahhhhh - as we will see when we look at the system tables, this explains the mass of new functions with names like try_boolen() and try_float(). BTW, the docs don’t say it, but this is in preview. It’s not GA.

  3. Adds the INTERVAL data type.

    Oooooooooh. Nice. Good. This immediately raises the question of how intervals are sorted. They are probably 8 bytes in length, and count microseconds, so hopefully the sorting value is a direct copy of the interval value.

    Hmm. IIRC, worker nodes only support intervals up to days or weeks, not longer time periods. That restriction may well still be in play, which would reduce the usefulness of this change.

  4. Adds CONTINUE_HANDLER, which is a type of exception handler that controls the flow of a stored procedure. Using it, you can catch and handle exceptions without ending the existing statement block.

    Ah, that’s good. This was a major piece of missing functionality, both in and of itself because you need it, and because Postgres et al can do this, so moving code over would require rewrites (which might be fundamental rewrites).

  5. Adds the ability to define permissions on a scope (schema or database) in addition to individual objects. This allows users and roles to be granted a permission on all current and future objects within the scope.

    Hmm. You can do this already, with default privileges. This is a second mechanism to do the same thing. I’m normally not impressed with what the devs produce for RS, so I fear the worst.

  6. Changes external functions to now implicitly cast numbers with or without fractional parts to the numeric data type of the column. For int2, int4, and int8 columns, numbers with fractional digits are accepted by truncating unless the number is out of range. For float4 and float8 columns, numbers are accepted without fractional digits.

    I may be wrong, but I think this is bad. This is reducing the strength of typing, which is not what you want.

Okay, now on to the system table changes, here;

First, functions.

  1. There’s a whole ton of functions with names like try_boolean or try_float4, or try_int2. I guess these are for OBJECT_TRANSFORM.

  2. New function, pg_get_cluster_isolation_level. That’s curious, because that information is already in a system table (unless I’m about to find it’s been removed).

  3. Couple of new functions for intervals, intervaly2m and intervald2s. This hints that the limitation of worker nodes to days or weeks (whichever it was) has been lifted.

  4. Two very curious functions, _pg_interval_precision() and _pg_interval_type(). This implies more than one type of interval. (The official docs have not been updated, so there’s no info on this in Data Types).

Now, new tables.

  1. New table, pg_global_object.

    I’ve had code of my own producing this table for years. I wonder if this new table actually has all objects in, or just some? what I do see is that FINALLY the devs have included OIDs. Historically, this has not been done - you’d just get the object names - and then need to look up the OIDs yourself (which was not always possible, you might have a table name but no schemaname, or the names might be truncated to shorter than their normal maximum length, so you’d potentially have a partial name only).

  2. Matching new table, pg_global_object_attributes.

    In Postgres, an attribute is a column. This table lists all columns of column-possessing objects, with the object defined by its OID. There’s a table which does this already, pg_attribute - but this table has normal tables and normal views only. I wonder if this new table has external tables and late-binding views and so on? in which case, why not just enhance the existing table? why have two tables?

  3. Some new tables which contain information specifically about Sage (presumably Sagemaker), which seems a bit odd. A database shouldn’t know specifically about external data source entities?

  4. Curious. Some new tables to do with “undo”, like stv_inflight_undos and stcs_auto_undo_events. What’s that all about?

Next, modified tables;

  1. The *_aggr tables have a new column dist_by_distinct. Intriguing. This will relate to distribution of rows, for the aggregation, and I would guess whether or not distribution gave every distinct value to the same slice - but I could be wrong, because the data type is int4, not boolean. Is this new functionality, an improvement, or now simply publishing some state about behaviour which already existed?

  2. Curious little change to stcs_arcadia_billing_xact_log.

    Arcadia billing is about Serverless billing. The new column is force_billable_state. Sounds like something has changed a bit with Serverless billing.

  3. cluster_arn in stcs_burst_connection (and I think probably globally) has changed from char(100) to char(128). Officially, maximum possible length of an ARN is 2048 characters, so shrug. Maybe RS ARNs are always shorter.

  4. cluster_arn in stcs_burst_manager_personalization (and I think probably globally) has changed from char(120) (one hundred and twenty, not one hundred) to char(128). Ah, looks like someone noticed ARN field lengths were not consistent and they’ve all (hopefully all) been changed to 128.

  5. stv_exec_state has a new column step_attribute.

    Not present in the official docs;

    It’s showing up in one or two other modified tables, too.

Now, new views.

  1. stl_auto_undo_events, which provides the public face to the various “undo” objects seen previously.

  2. Cute little view, stl_create_table_ddl_summary. Presumably showing CREATE TABLE commands.

  3. Huh. information_schema.columns has changed - interval_precision is no longer a cardinal_number but is now character_data. That’s going to mess up people’s SQL, I think. Same for

  4. stl_vacuum_detail now includes columns which - get this - actually tell you WHICH table was vacuumed =-) WORK OF GENIUS I SAY! :-) :-)

Home 3D Друк Blog Bring-Up Times Consultancy Cross-Region Benchmarks Email Forums IRC Mailing Lists Reddit Redshift Price Tracker Redshift Version Tracker Redshift Workbench System Table Tracker The Known Universe Twitter White Papers