Redshift Research Project

Blog

2023-08-01

Weather Report

Region	Node Type	Notes
`ap-south-1`	`dc2.large`	Benchmark read-disk has returned to normal (0.06s vs 0.27s).
`eu-west-2`	`dc2.large`	Benchmark read-disk has returned to normal (0.06s vs 0.68s).
`eu-west-3`	`dc2.large`	Fastest ever disk-read-write, at 3.21s (normally 4s to 6s).
`me-south-1`	`ra3.xlplus`	Network benchmark performance remaining extraordinarily slow (8.56s, vs the usual 2.1s).
`us-west-1`	`dc2.large`	Benchmark read-disk has returned to normal (0.06s vs 0.34s).

https://www.redshiftresearchproject.org/cross_region_benchmarks/index.html

2023-08-02

XFCE

Thunar (the XFCE window manager) has a bug, where rarely, when you close a terminal window with “exit”, every single Thunar window will disappear.

It just happened, and I had a metric ton of open state.

2023-08-05

Redshift Bulletin

A new Redshift version has been released, 1.0.54239, superseding 1.0.54052.

Full system table dump here;

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/index.html

Diffs from previous version here;

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54052_to_1.0.54239/index.html

Official Patch Notes and Comments

The AWS official docs indicate this is “patch 177”, where patch 177 contains version 1.0.54239 (no, I don’t really know how that works either).

https://docs.aws.amazon.com/redshift/latest/mgmt/cluster-versions.html#cluster-version-177

The list of documented changes, with my observations, are;

Adds functionality that Using the GROUPING function without a GROUP BY clause or using grouping operations in a WHERE clause results in an error.
Adds support to catch a divide by zero error and handled inside the exception-handling block in stored procedures
Fixes the case-sensitive identifier documented at enable_case_sensitive_identifier to now work with a MERGE statement.

MERGE is a macro, which emits other SQL commands - SELECT, UPDATE, etc. Sounds like the SQL being emitted was flawed for case sensitivity, and this has been fixed.
Fixes the bug that a query on function pg_get_late_binding_view_cols() might get ignored occasionally. Such queries can always be canceled now.

Wat? what does “ignore” mean? the query just doesn’t happen? and how does “ignore” turn into “can be cancelled”?
Adds the SYS_MV_STATE view, which contains a row for every state transition of a materialized view. SYS_MV_STATE can be used for MV refresh monitoring for Amazon Redshift Serverless and Amazon Redshift provisioned instances.

The only use case for MVs is when you’re using Redshift incorrectly, and performance is terrible; MVs will be better than that. Using Redshift correctly is the correct solution, and it a zillion light years better than MVs.

This view can be found here;

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/pg_catalog.sys_mv_state.html
Adds the SYS_USERLOG view, which records details for the changes to a database user for Create user, Drop user, Alter user (rename), Alter user (alter properties).

There are as we might expect more and more SYS tables/views. Presumably they are intended to take over from the existing views. I do not get the feeling the SYS views are being organized or designed any more skilfully than the existing views; I see a second big ad hoc mess forming up, along with the first big ad hoc mess. All the SYS views however refuse to show their underlying SQL, so this is a growing problem for users.

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/pg_catalog.sys_userlog.html
Adds the SYS_COPY_REPLACEMENTS view, which displays a log that records when invalid UTF-8 characters were replaced by the COPY command with the ACCEPTINVCHARS option.

I don’t get it. Why bother? this exists already at STL_REPLACEMENTS. The only reason is that Serverless is restricted to SYS. Why not just make STL visible to Serverless users?

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/pg_catalog.sys_copy_replacements.html
Adds the SYS_SPATIAL_SIMPLIFY view, which contains information about simplified spatial geometry objects using the COPY command.

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/pg_catalog.sys_spatial_simplify.html
Adds the SYS_VACUUM_HISTORY view, which you can use to see the details and results of VACUUM operations.

Again, exists already. Seems to be part of an long term effort by AWS to hide all system table SQL from users, by replacing the existing system tables with SYS, and then I would guess eventually ditching the existing system tables. In any event, you should not be using Serverless; PDF out soon.

Also, remember, the way this view presents vacuum is misleading. It presents vacuum as a table level operation - so you see number of rows sorted, unsorted, etc, for the table. This is wrong. A table is as fast as its slowest slice - vacuum is a slice level operation. You can have a table 99% sorted and it can be slow as a dog because you one totally unsorted slice. Vacuum itself is not a table level operation, but operates on groups on slices at a time.

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/pg_catalog.sys_vacuum_history.html
Adds the SYS_SCHEMA_QUOTA_VIOLATIONS view to record the occurrence, timestamp, XID, and other useful information when a schema quota is exceeded.

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/pg_catalog.sys_schema_quota_violations.html
Adds the SYS_RESTORE_STAT view, which you can use monitor the redistribution progress of each table in the cluster during asynchronous classic resize.

It’s SYS_RESTORE_STATE. The docs lost the trailing ‘E’.

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/pg_catalog.sys_restore_state.html
Adds the SYS_EXTERNAL_QUERY_ERROR view to return information about Redshift Spectrum scan errors.

All the stings in here are char, which are ASCII only. If you have UTF-8 filenames, you’ll end up with UTF-8 (quite possibly multi-byte chars) in a char column. I don’t know why the devs keep doing this; it’s a mistake.

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/pg_catalog.sys_external_query_error.html
Adds the tag parameter to the CREATE MODEL command, so you can now track training costs with autopilot training jobs.
Fixes a bug that prevented queries from using concurrency scaling to write data to tables when the source table is a data sharing table.

I recently found a view which lists a metric ton of cases and situations where CSC can’t be used.
Adds support for using user roles with parameter groups in workload management (WLM).

Good. Previously, WLM rules could act upon user groups or query groups.
Adds support for Amazon Redshift custom domain names (CNAME) for Amazon Redshift clusters.
Improves performance for data sharing or concurrency scaling queries, especially with concurrent data changes on the producer or when offloading to a concurrency scaling instance attached to the consumer.

Interesting. I’ve always wanted to investigate how long it takes a table which has been updated to become usable for CSC clusters; AWS say NOTHING about this, but obviously, since the CSC clusters are using a copy of the tables, when the main cluster makes changes, those changes need to propagate outwards. How long does it take? what happens if you keep making changes? does it mean in effect that table cannot be used for CSC?
Improves performance for data sharing queries running on consumers when vacuum jobs are running on the producer.

Sounds a bit like maybe they’ve done something to improve locking - knowing RS and the docs, I wonder “improvement” means before it was actually blocked, and now it’s not? but I have no idea.
Adds preview support for Apache Iceberg, enabling customers to run analytics queries on Apache Iceberg tables within Amazon Redshift.

Good.
Adds support for automatic mounting of AWS Glue Data Catalog, making it easier for customers to run queries in their data lakes.

I’m not sure what this actually means, in practice or really in principle.

Observations

There’s a number of new tables, I think all being the tables which the new views are using.

None of the tables are accessible, except for a new STV table, stv_xrestore_state, which is accessible to an admin user, and presumably is the table (table-like object - it’s an STV) which is used by the new SYS_RESTORE_STATE, which is a view.

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/pg_catalog.stcs_segment_ce_stats.html

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/pg_catalog.stcs_user_vacuum_history.html

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/pg_catalog.stll_segment_ce_stats.html

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/pg_catalog.stll_user_vacuum_history.html

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239/pg_catalog.stv_xrestore_state.html

There are additionally a number of tables (and their matching views) which have had an extra column or two added - two many to process manually; I’m going to need to write some code which produces an automated, column-level diff which is easy to read.

That’ll come after the Serverless PDF, and Combobulator (Redshift PII-safe management web-app AMI) going on the market, both of which should happen this week.

2023-08-07

New Forum (and Mailing List)

I am very, very pleased to say that the new forum is up.

I rewrote the back-end, moving away from inn2 (with a front-end) to my own Postgres database.

It took a day to write the new back-end, and then a week to get into production, as improvements were made to my Python libraries, and also I changed my approach to CSS, which meant a lot of collateral work, and quite a bit of change in the mailing list back-end code.

Going to take a well-deserved break for a bit :-)

Next - Combobulator goes live, and then the Serverless PDF!

2023-08-16

Weather Report

Region	Node	Notes
`eu-south-1`	`ra3.xlplus`	Disk-read-write performance improved significantly, from the usual ~3.3s to 2.40s. This is the fatest drw benchmark seen in this region.
`us-east-2`	`ra3.xlplus`	Disk-read-write performance was slow, with huge standard deviation (3.54s/2.55s), compared to the usual ~2.5s/0.02s. Disk-read was also slow, at 0.12 rather than 0.05.
`us-west-1`	`dc2.large`	Disk-read-write slow, with very large standard deviation (5.10s/3.05s - last benchmark was 3.27s/0.10s), but this region is usually like this - however, in the previous benchmark, it was not and looked like a normal fast region; and every now and then, it is not.
`us-west-1`	`ra3.xlplus`	Disk-read-write slow, with very large standard deviation (3.46s/2.43s - last benchmark was 2.45/0.03s), but this region is usually like this - however, in the previous benchmark, it was not and looked like a normal fast region; and every now and then, it is not.

https://www.redshiftresearchproject.org/cross_region_benchmarks/index.html

New Redshift release, version 1.0.54899

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54899/index.html

https://www.redshiftresearchproject.org/system_table_tracker/1.0.54239_to_1.0.54899/index.html

A small set of new tables and their veneer views, all relating to “integration”, which is new to me.

AWS Redshift release page;

https://docs.aws.amazon.com/redshift/latest/mgmt/cluster-versions.html

Says nothing about it - the long list of changes is for the previous version, not this version.

I still don’t get this patch/version dichotomy. Anyone?

Mailing List

On the 11th August, someone tried to subscription to the advisory mailing list using the email address “******@lbnt.nt” (i’ve removed the left part for privacy), which presumably has a typo in it, as it’s an invalid email address.

If that was you, please try again!

2023-08-22

Combobulator

I have one bit of dev work to do, on the timestamp-range picker, and then I need to take some screenshots, and then make a demo system and populate it with data and point a public Combobulator at it - and then Combobulator goes GA (General Availability).

The money from Combobulator, minus business costs, goes to the Ukrainian military.

2023-08-27

Max Ganz’s Redshift Combobulator is now GA

I have for some time been working on a browser-based app which is a management console for Redshift.
It is not cloud based - you run it locally, no network access, no superuser, and no privs to read your actual tables; the design is intended to make it viable to use with the most sensitive of data.
While the Russo-Ukraine war continues, all income barring business costs goes to the Ukrainian military.

The home page for Combobulator is here;

https://www.redshiftresearchproject.org/combobulator/index.html

The live demonstration system is here;

https://34.200.189.75

Combobulator uses HTTPS, but where there is no IP name, a self-signed SSL certificate has to be used; you’ll need to click through the warnings from the browser.

2023-08-31

Combobulator Trial for Publicly Accessible Clusters

So, I assumed everyone who would use Combob would have their Redshift cluster locked down, no external access.

It occurred to me today actually there may be some users who have publicly accessible clusters.

I’ve put up a second index page, which is open use; follow the instructions, set up the Combob user, grant the necessary couple of privs, and you’ll then be pointing it at your own system.

https://34.200.189.75/index_open.py

Home 3D Друк Blog Bring-Up Times Consultancy Cross-Region Benchmarks Email Forums IRC Mailing Lists Reddit Redshift Price Tracker Redshift Version Tracker Redshift Workbench System Table Tracker The Known Universe Twitter White Papers