Redshift Research Project

Blog

2023-04-01

System Table Tracker Function Diffs

First pass at getting function diffs up.

Works, but the links to old/new current broken.

Functions can’t be diffed like tables or views, because they are not unique by their name, so I can’t tell if a function has been modified - imagine I have ten functions with the same name but different arguments, and say three of them change arguments, but all keep the same name. Which old function has changed into which new function? so for functions, there’s no concept of modified - only old and new.

Finally, functions diff on everything - which is to say, schema name, language name, volatility, return base data type, function name, argument base data types and their order.

Here’s an example of a release with a lot of action.

https://www.redshiftresearchproject.org/system_table_tracker/1.0.46534_to_1.0.46607/index.html

2023-04-02

Stuff

I wanted for a very long time to publish a PDF which describes all the major considerations involved in using Redshift - the cluster, nodes, tables, queries, data distribution, everything.

The problem with this document is that although by ordering the content appropriately, and explaining concepts as it goes along, it can be made viable for weakly technical readers, it is too long for such readers; they do not want to know all the details, but the key, central issues.

As such I’ve today written a much shorter, punchier document, which directly cover the main considerations - in fact, it’s pretty much the written version of the lecture I have found myself giving again and again and again, to clients.

I need to write the conclusion, and then the document needs a certain time to mature, to be polished. It shouldn’t be too long, certainly this week.

2023-04-05

Four New Regions

So I finally added the four new regions to the back-end I have for tracking Redshift stuff.

The regions are;

ap-south-2
ap-southeast-4
eu-central-2
eu-south-2

Now, I Googled beforehand to see which regions support Redshift, and didn’t find anything which obviously listed this information, but I did find a page listing Redshift API support and so on, and all these regions were listed.

Well, having just tried now to fire up dc2.large clusters, I can say none of these regions support Redshift, at least, not for me, doing that.

Pricing info exists in all these regions for RS, though, and that’s now up.

2023-04-06

Mystery of The Missing System Tables

Too late now for a write up, but two new version of RS have come out - and it looks like in the first (which is still latest in some regions) some system tables were dropped accidentally, as they have been reinstated in the second version. That or my code is buggy, which is possible.

https://www.redshiftresearchproject.org/system_table_tracker/1.0.48004_to_1.0.48714/index.html

https://www.redshiftresearchproject.org/system_table_tracker/1.0.48714_to_1.0.48805/index.html

Example of this is sys_copy_job, but most of the tables dropped in 48714 have come back in 48805.

I ran the benchmarks, too, but again too late for write-up. Tomorrow.

I need to spin up a 48714, manually, to make sure some of the tables really are missing.

(Addendum. I’ve manually brought up a 48714 and 48805, and confirmed sys_copy_job is missing in the former and has returned in the latter. Looks like my code is okay :-)

2023-04-07

New White Paper

So, I’ve been working on-and-off for some time on a white paper which covers the fundamentals of Redshift.

This is not that white paper.

That white paper is completely thorough, and although written for a weakly technical audience (managers and better), by being thorough, for there are a lot of concepts to cover, it is long, and by being long, detracts from its approachability. Only IT professionals will read it, I think.

So I’ve produced this white paper, now, which is written not to be completely thorough, but to be short and punchy, and to cover the critical concepts as succinctly as possible.

If you’re using, or thinking of using, Redshift, this is the white paper now to read, or to have someone read for you, so you can learn about everything you currently have no idea about whatsoever, and which you would otherwise run into, and quite possibly have a miserable and then failed project before moving onto a different database.

This white paper is basically the hour long conversation I have had innumerable times with I don’t know how many people, who have been having problems with Redshift, or were thinking to use Redshift.

Thank God for writing, as from now on I can just send them a URL :-)

https://www.redshiftresearchproject.org/white_papers/downloads/introduction_to_the_fundamentals_of_amazon_redshift.pdf

2023-04-11

Weather Report

(A bit delayed - the benchmarks were taken on the 6th April, which is about a week late, and published only today, which is two weeks late.)

ap-northeast-2 for both dc2.large and ra3.xlplus both are showing very high standard deviation for the disk-read-write benchmark. Deviation should be about 0.05s, but is about 2s. This happened six weeks ago as well.
eu-west-1 dc2.large still very slow for disk-read. Normally, 0.07s to 0.18s, two weeks ago, 0.64s, this week 0.52s. Seems to be an ongoing issue.
eu-west-2 dc2.large disk-read-write seems to have been upgraded. Currently 3.2s, usually 4.5s to 6s.
us-east-1 dc2.large has the same disk-read problem as eu-west-1. Currently 0.53s, should be 0.07s.

http://www.redshiftresearchproject.org/cross_region_benchmarks/index.html

2023-04-18

Weather Report

ap-northeast-2 dc2.large network performance fell back again, to ~8s - normally 4s.
ap-southeast-1 ra3.xlplus network performance fell back from ~2.3s to 3s. This is unusual - ra3 nodes are usually very consistent.
eu-central-1 dc2.large disk-read-write benchmark ran at 6.3s, the norm for the last six months has been ~4s (before that, before the general upgrades, ~7s).
eu-central-1 ra3.xlplus looking back over the benchmarks is flipping between about 2.5s and 3.5s. This is not seen in other regions.
eu-west-1 dc2.large (like eu-central-1) ran slowly on the disk-read-write benchmark, 5.5s rather than the usual ~4s, but the last month of ~10x slowdown on the read benchmark has gone, with read performance returning to 0.06s.
us-west-1 dc2.large disk-read-write benchmark continues to show very high standard deviation, with the benchmark at about 4.6s but the deviation being about 2.5s (in other regions, usually about 0.2s).

https://www.redshiftresearchproject.org/cross_region_benchmarks/index.html

2023-04-23

New PDF : MERGE

New PDF, MERGE, which oddly enough is about the new MERGE command.

https://www.redshiftresearchproject.org/white_papers/downloads/MERGE.pdf

First Guesses about Amazon Redshift Serverless

Serverless has been in GA for about a year, and I’ve not investigated, in part because it’s expensive (sixty-second minimum billing for a single query, including system table queries), and in part because the system tables have been very nearly almost completely removed, which will make investigation time consuming.

However, over the year, I’ve been picking up hints and clues, and forming up something of a feeling for what might be going on, and now I’m writing out my current thoughts.

Introduction

Historically, Redshift has been a clustered database, and so there is a cluster, composed of nodes, the database being a server itself (a database server), which runs on server hardware (the cluster).

As of July 2022, AWS introduced a new Redshift variant, Amazon Redshift Serverless.

What “Serverless” here means is not defined.

From a user’s point of view, what’s changed is the general “computing power” of the cluster is now defined in and only in terms of “Redshift Processing Units”.

Exactly what these are is not defined, but the minimum number is 32 and the maximum is 512, and billing is now no longer per second of cluster up-time, but is per-second of Redshift Processing Units usage (with a sixty-second minimum charge). Exactly what usage is, is also not defined.

The most obvious issue is that it is not clear in what way(s) Serverless scales.

Does it scale in terms of accelerating individual queries? does it scale in the number of concurrent queries?

Is a single Redshift Processing Unit a single full speed query, and more Redshift Processing Units will make no difference to performance? or if I have a single query, and I add Redshift Processing Units, will that query go faster?

Time to examine such clues and hints as I’ve found.

Queuing Queries

There’s a fascinating reply, from an AWS staffer on re:Post, here;

https://repost.aws/questions/QUacMGrtjYSAaVzCmStUidoA/concurrency-scaling-with-redshift-serverless

If you are at maximum RPU’s then Redshift will queue up user queries for execution.

Now it might be old-style queues and slots have gone, and this is a new type of queue which is for when serverless capacity is all in use, but I doubt it - my guess is that we still have queues and slots, and that we have is AutoWLM deciding how to allocate resource between queries.

Persistent Up-scaling

https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-billing.html

Amazon Redshift Serverless scales up from its base RPU capacity to handle periods of higher load. It some cases, RPU capacity can remain at a higher setting for a period after query load falls.

That does not sound very serverless to me. That sounds like servers being added and removed by some background process.

Serverless Resizing

There’s an article from FiveTran, which is Serverless cheerleader-squad (and compares Serverless vs a single node dc2.large, which shows you where that article and author are at), but it has a fascinating nugget of information : changing the maximum number of Redshift Processing Units takes some minutes.

https://www.fivetran.com/blog/i-took-redshift-serverless-for-a-spin-heres-what-i-found

64 RPUs → 128 RPUs = 5 min 27 sec
128 RPUs → 64 RPUs = 5 min 25 sec
128 RPUs → 256 RPUs = 7 min 15 sec
32 RPUs → 512 RPUs = 4 min 49 sec
512 RPUs → 256 RPUs = 3 min 58 sec

That does not look serverless. Serverless is like lambda, where you have a large array of instances and when you fire off a lambda function, it goes to one of those servers; you don’t have to wait while you change the maximum number of lambdas available to you. That sounds like a cluster - you are adjusting how many nodes, or perhaps their type, are provisioned in your cluster.

IP Addresses per Serverless Node?

Next we have this, which is a page with the filename “serverless known issues” in the URL, but the page title is now the more diplomatic “Considerations when using Amazon Redshift Serverless” :-)

https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-known-issues.html

What we see here is that we need in our VPC about one IP address per 3 Redshift Processing Units. That has to be a node.

Cursor Size Limits

However, mitigating against the idea of a leader node is this;

https://docs.aws.amazon.com/redshift/latest/dg/declare.html#declare-constraints

The maximum cursor size for Serverless is small, at 150,000mb - unless it’s a typo, which is might well be, because the docs are not well checked.

We see though clearly here that the cursor is being materialized in full on some kind of resource constrained hardware.

(We can also see that single node ra3 get a very small cursor maximum, just as single node cluster of other types do - we can reasonably think that single node ra3, like dc2, is squeezing a leader node and worker node into a single node of hardware, and so should be avoided.)

RMS

The amount of store allowed to serverless is large enough it must be cloud based and that must mean RMS.

https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-capacity.html#serverless-rpu-capacity

The 8, 16, and 24 RPU base RPU capacities are targeted towards workloads that require less than 128TB of data. If your data requirements are greater than 128 TB, you must use a minimum of 32 RPU.

and

Configurations of 8 or 16 RPU support Redshift managed storage capacity of up to 128 TB. If you’re using more than 128 TB of managed storage, you can’t downgrade to less than 32 RPU.

That much store can’t be local disk.

Summary

My current guess is that Serverless is a single normal Redshift cluster, with Concurrency Scaling Clusters providing extra capacity (on ra3 nodes CSC can perform write operations to RMS, but with lots of limitations), AutoWLM handling queues and slots - so it’s all existing functionality - what has actually changed is and only is billing.

Billing is now per second of total cluster size - i.e. the cluster itself, plus any CSC clusters currently running.

When you change maximum Redshift Processing Units, you’re actually getting a new cluster spun up, which is appropriately provisioned to the maximum you’ve chosen.

The system tables have been almost completely removed, in part I would guess because if they were retained, it would be obvious serverless was in fact still a single cluster; which means you now have no clue what state your tables are in, or your users, privileges, queues, queries, you name it. This is not a serious product; a database with no meaningful system tables is a toy, not something for use in business critical systems.

The one caveat to all this is that it does not explain the low cursor size limit, but that limit needs to be verified.

The next steps for me are to get a cluster running, check to see if single queries scale as more Redshift Processing Units are added, and check the cursor size maximum.

2023-04-27

Second version of MERGE published

Version 2 of the MERGE document has been published, with the explanation from mauro (from Slack DBT) as to the purpose of the temp table.

The abstract, below, is the new version 2 abstract.

The MERGE command is implemented as a wrapper around existing SQL commands, calling CREATE TEMP TABLE, UPDATE and INSERT or DELETE. The SQL standard specifies the MERGE command will either match (and so update) or not match (and so insert) each source row. When manually issuing an UPDATE followed by an INSERT, it is possible for a row to match and not match, when the change performed by the update causes a row which did match to no longer match. This problem is solved by using a temp table, to store all rows which are to be inserted, prior to running the update. However, in the case when the data is such this problem does not occur, the temp table is not necessary, and is pure overhead. As such, MERGE is a kind of worst-case implementation; it has to always work correctly, so it always uses a temp table, even when it is not necessary. Finally, MERGE mandates the use of and only of a table for merge source rows, and I can see no reason for this, as none of the wrapped commands require it, and all accept a sub-query or view.

https://www.redshiftresearchproject.org/white_papers/downloads/MERGE.pdf

2023-04-30

Something Odd Is Happening

In a couple of regions - us-east-1 and us-east-2, and now also sa-east-1, at least with dc2.large (the only type I’ve tested with so far) something odd is happening.

Clusters start up, and I can see they are available and I can get an IP, but the clusters continue to claim to have an active operation (even though they are in the available state), and this means firstly, my scripting is getting confused when starting clusters and consider those clusters failed (but in fact they have been started, and I now must manually delete them) and also that I can in fact not delete them manually until the seemingly-spurious ongoing operation finally completes.

I’ve not looked more closely yet (it’s 2am).

I’ve been noticing this for a week or so, I thought maybe it was transient, but it seems to be ongoing.

Two New Redshift Versions

Two new RS versions out - 1.0.49676, which is now in two regions only, and 1.0.49780, which is in all the other regions.

Quite a lot of change (new function user_is_member_of_role() in 49676 looks particularly interesting), but also 49676 has introduced quite a bit of new stuff which 49780 has removed (including that new function).

https://www.redshiftresearchproject.org/system_table_tracker/1.0.49087_to_1.0.49676/index.html

https://www.redshiftresearchproject.org/system_table_tracker/1.0.49676_to_1.0.49780/index.html

Home 3D Друк Blog Bring-Up Times Consultancy Cross-Region Benchmarks Email Forums IRC Mailing Lists Reddit Redshift Price Tracker Redshift Version Tracker Redshift Workbench System Table Tracker The Known Universe Twitter White Papers