Redshift Research Project



Server-less Redshift

I’m guessing this is RA3 only, and now to describe how I guess it’s working I’ve got to use a confusing mis-mash of old terminology being applied to new Redshift architecture.

Historically, RS has nodes and slices. One node made of a given number of slices (varies by node type). Data is distributed over slices, and only the slice holding a block can access that block.

These days, you have RA3. RA3 has data slices and compute slices. The “data slice” I think is a group of one or more network attached SSDs, and each compute slice is an EC2 instance. Data is still distributed over slices, but I empirically see now that any slice on a node can access any block which is on that node - that’s definitely not how it used to be.

So I think what happens now is SSD is on the network, which is now a notional “data slice”, but there’s only one data slice per “node” (you don’t really have nodes any more - it’s more of a legacy billing concept), and you have a cloud of EC2 instances, but they’re each tied to one of the network attached SSD groups.

So, server-less - now we have to think about how data is being handled by RA3.

The actual data is stored in S3.

S3 lets you read at any offset in any given file (file as in byte stream in S3 - you know the deal), but S3 does not allow you to modify existing files. You can write en entire new file, but that’s it.

Since RS does have to support reasonably small writes, my guess is each column is broken up into many pretty small files in S3.

Redshift historically thought of distribution as you pick a column and all the rows with the same values go to the “same place”, which was the SSDs in a node (they were physically in the node), and in particular a given slice (since slices maintained their own files).

The concept of distribution now though has a whole new meaning because primary store is S3, which is just a bunch of files in “one place” (the S3 key space) and each compute slices can access S3 concurrently (I would guess one connection at a time per slice, that seems to be how it’s done).

Now, remember though for a join to be performed, the rows involved must end up on the same compute slice.

What could be happening now (and I think - I’ll need to re-read the white paper) this was something like how Snowflake works - you keep track of how many values are in the table, and when a query comes along, you assign the same value range to each compute slice - and it’s S3, so they just go ahead and access the files with those values in.

Changing node counts historically has been a huge cost because redistribution is required.

You can avoid redistribution with S3, because all data is stored in the same place - instead of redistributing the data, you redistribute which compute slices process which values.

That allows you to change the number of compute nodes easily and rapidly.

My initial take on this is that cluster resize may have finally properly been solved - although at a cost, since everything is now in S3 and I suspect that’s going to make write performance rather variable.

Far as I see it, that’s it. That’s the entire consequence of server-less. You still need to know completely and fully how to operate Redshift correctly, and almost no one does, because AWS publish no information on how to do this.

The idea of “not managing infrastructure” is to my mind a complete white elephant. This is not repeat NOT the issue. Knowing how to operate sorted column-store correctly is the issue.

You can introduce functionality to automate this (a primary selling point for Snowflake), but that functionality isn’t great - it’s much better than if you don’t know what you’re doing, but worse than if you do (but with Snowflake, if you do know what you’re doing, it doesn’t help, because there are no user-operated controls - it’s automation and only automation, for better or for worse!)

If you don’t know what you’re doing, you shouldn’t be using Redshift in the first place - probably you just don’t know it - every client I’ve worked for has imagined Redshift as a “big Postgres”.


New White Paper

So, I’ve been out of action for about a week, picked up a fever.

Pretty much more or less back to normal as of this morning and - fiiiiiiiiiiiiiiiiiiiiiiiiinally - a new white paper will be published tomorrow, “Multi-Version Concurrency Control and Serialization Isolation Failure”.

Thank God for that :-)

The main section, the Discussion, is currently about 27 pages long, so bring you brain and a cup of hot chocolate ;-)

(If it ends up over 30, make it mint hot chocolate.)



I contracted coronavirus at the very end of November, the evening of the 30th in fact.

I’ve had no problems of any kind breathing, so I’ve basically had the “lite” version.

The first three or four days I was pretty much wiped out - wrapped up in just about all the clothes I own and in bed. No energy, no focus, just busy fighting it off and by Friday I was coming back to normal and Saturday I was back to normal - except for one thing.

At that point (and that’s the usual time at which it happens), my sense of taste become pretty strongly distorted. Sense of smell seems unchanged.

Main problem far as I can tell is any sweet tastes are now perceived as an incredibly strong taste of artificial sweetener, and it’s repellent. Mangos, kiwis, strawberries, fruits, all off the menu. Chocolate too - the sweet flavours in the chocolate becoming unbearable.

What’s more of a surprise though is that this distortion of taste seems pretty much to have flattened my sense of appetite. I’ve basically not eaten now for about ten days.

I began experimenting a bit (remember I’m in quarantine, so it’s not so easy) and tried some Indonesian foods, and they were viable, which was much appreciated, but at the same time that’s a fairly oily cuisine and so really not something I’m accustomed to on an ongoing basis.

So one of the things I’m going to work on now is finding food I can eat, and trying to find some sources for the vitamins and minerals I’m missing.

I’ve been reading up on duration of disruption of taste, consensus seems to be about two weeks.

People who experience disruption of taste or smell normally have milder coronavirus symptoms, which has been the case for me.

The one other problem has been sleep. I don’t know why, but overnight I wake up about every 30 minutes, and that’s become pretty old pretty fast, believe me.



I’m well on the mend.

Sense of taste about 75%. Cough almost gone. Catching up on exercise. Discovered a new found love for breakfast cereals, since I can eat them and they taste okay :-)

On the weekend, I started the final write up of the AZ64 white paper. That’s been a looot of work.

I need to publish it along with a second white paper, which I have yet to write, which covers default encoding choices by Redshift.


Time Broken in Interleaved Tables

Started to investigate default encodings.

As a part of checking the default encoding selection rules work, I’ve discovered you cannot currently use the time data type in interleaved tables…!

create table table_1
  column_00  time  not null encode raw
interleaved sortkey( column_00 );

  error:  Assert
  code:      1000
  context:   false - 
  query:     0
  location:  z_compressor.cpp:396
  process:   padbmaster [pid=715]



I’ve just been working on some code which requires that I compare a date with the same date a year ago.

The natural operator to use is interval, like so;

where target_date = previous_year.target_date - interval '1 year'

Problem is, interval does not support month or year.

Most you get is week - and the bloody docs as ever are a danger to readers, because they contain an example using interval '52 week', as if that is the same as 1 year.

Problem is, dates are a nasty, tricky business, and not all years have 52 weeks. You can’t get a year by specifying 52 weeks. You need to actually specify a year, and let the underlying date library, which knows all about how tricky and nasty dates are, get it right.

So I think I’m going to have to do a whole bunch of messy code to physically construct a date which is one year earlier than the current date, maybe using date_trunc and working from 1900-00-00 and whatever…

Where the whole bloody point of interval is to avoid such God-awful mess - but interval hasn’t been implemented enough to actually do its job (and instead has an dangerous example, probably written off-the-cuff by the poor docs guy who doesn’t know about computing, and never has anyone technical check his work).



I’ve only just seen it, but AWS sent me an email on the 16th December, about Redshift.

Here’s the opening paragraph;

You are receiving this message because you have used one or more of the following Amazon Redshift spatial functions below. These spatial functions are changing and will not preserve backward compatibility with current function specification.

It looks like AWS track usage on a per-cluster, per-function call basis…


AWS have added a new data type to Redshift, VARBYTE.

One of the limitations;

You can’t use a VARBYTE column as a sort key or a distribution key for an Amazon Redshift table.

Well, if you can’t sort a data type, you can’t use it for Big Data. Correctly operated sorting is necessary for timely on SQL on Big Data; no sorting, no Big Data.

You could indirectly sort a VARBYTE column by having a second colmun, which you can sort, and arranging its order so that as it is sorted the VARBYTE column is sorted - but this is not built-in, native support for VARBYTE sorting; this is jury-rigged workaround managed by the user.

Anything without sorting isn’t Big Data, and so it’s… well, potentially useful, you could have aspects of your data set which are small data and which are queried and the query connects to Big Data tables; but it’s not a first class citizen of a Big Data database.

The docs of course are beyond silent on this matter; the notion that some RS functionality (in fact, quite a lot of what’s been added in the last few years) is not Big Data capable is utterly verboten and simply not spoken of.


Startup Failure

Wanted to get some Redshift work done.

Spun up a cluster - took 15 minutes to fail to start.

Shut down the cluster - took another 15 minutes.

30 minutes later, no work done at all.


Loading Varbyte

Something just occurred to me…

INSERT has a 16mb limit on the total length of the SQL.

COPY has a 4mb limit on the length of any one line.

A varbyte can be up to 1,024,000 bytes.

So if you’re using COPY and you have 5 max length varbyte in a column and you want to load the full 1,024,000 bytes into each - you can’t.

So there’s a hidden limit here with varbyte, which you need to take into account when making table DDL.


User Experience.

The function avg works with intervals; stddev_pop does not.

Going to have to figure out how to convert timestamps to UNIX seconds (the worker nodes have no function for this, but I may be able to do what I need as leader-node only query).

Recently had much the same with interval; doesn’t support months or years. Had to figure out how to do it myself.

Redshift is not a smooth, consistent experience, and I see no improvement in this over time - I only see large new features being bolted onto the side.

stddev_pop() With Intervals

So, I’m trying to get stddev_pop() to work with intervals.

The SQL I’m starting with is this (system table column names, don’t look at me!);

stddev_pop( endtime - starttime )

I need to convert both timestamps into integers or floats, and then subtract.

My thought is to convert into UNIX timestamps, but normally UNIX timestamps are second resolution and I need microseconds (because the interval can be very small).

So if we have a timestamp like this '2021-12-31 03:28:30.123456'::timestamp, I can find a way to get the UNIX timestamp (although I most absolutely and definitely could NOT find this from the official docs) but to get the fractional part, I need to issue two function calls, like this;

dev=# select extract( 'milliseconds' from '2021-12-31 03:28:30.123456'::timestamp );
(1 row)

dev=# select extract( 'microseconds' from '2021-12-31 03:28:30.123456'::timestamp );
(1 row)

So the final code to get UNIX time with fractional part is;

extract(epoch from endtime)::float8 + extract('milliseconds' from endtime)::float8 / 100 + extract('microseconds' from endtime)::float8 / 100000

The final line of code becomes;

stddev_pop( (extract(epoch from endtime)::float8 + extract('milliseconds' from endtime)::float8 / 100 + extract('microseconds' from endtime)::float8 / 100000) - (extract(epoch from starttime)::float8 + extract('milliseconds' from starttime)::float8 / 100 + extract('microseconds' from starttime)::float8 / 100000) )

Now, it might be me; maybe there is a better way; but I’m not seeing it, and I know RS pretty well. I’ve said it before, and I’ll say it again - Redshift is a piece of junk.

I don’t want bloody AQUA or Postgres table access - I want basic APIs for basic tasks which are sane, work, and are fully implemented.

As an aside, I’ve found something quite strange while experimenting with time/date functions.

If you call extract(), the result is titled date_part, which to me implies extract is an alias (with a different API) for date_part;

dev=# select extract( 'milliseconds' from '2021-12-31 03:28:30.123456'::timestamp );
(1 row)

But if you actually call date_part, you get a result titled pgdate_part (because date_part is an alias for pgdate_part, it’s - gasp - documented!);

dev=# select date_part( 'microseconds', '2021-12-31 03:28:30.123456'::timestamp );
(1 row)

But you get different results from the different calls!

Home 3D Друк Blog Bring-Up Times Consultancy Cross-Region Benchmarks Email Forums IRC Mailing Lists Reddit Redshift Price Tracker Redshift Version Tracker Redshift Workbench System Table Tracker The Known Universe Twitter White Papers