I’ve made a list of some improvements I’d like to see in Redshift.
- float8 data types’ sorting values computed correctly, so their full bit pattern is used, rather than only their integer value, clamped to a limited range.
- sort auto made the default.
I have almost never used CTEs. I’ve never had a need to.
I’m now on a gig (contract) where I have learned something; it looks to me like one of the reasons CTEs are used is in fact because the underlying data is badly disorganized and non-relational.
You end up needing CTEs to get from the starting data to something which works with the relational data model.
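A hypothetical sketch of the pattern (not from the post; SQLite stands in for Redshift, and the table and column names are mine): a CTE first beats disorganized, non-relational source data into relational shape, and only then can the outer query treat it as an ordinary relation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table raw_events (payload text)")
# Each row crams an id and a status into one delimited string - the kind
# of badly-organized source data the post describes.
con.executemany("insert into raw_events values (?)",
                [("17|open",), ("18|closed",), ("19|open",)])

# The CTE splits the payload into proper columns; the outer query can
# then work with the relational data model.
rows = con.execute("""
    with events as (
        select cast(substr(payload, 1, instr(payload, '|') - 1) as integer) as id,
               substr(payload, instr(payload, '|') + 1)                     as status
        from raw_events
    )
    select id, status from events where status = 'open' order by id
""").fetchall()

print(rows)  # [(17, 'open'), (19, 'open')]
```

The CTE here is doing repair work, not expressing anything relationally interesting - which is the point being made.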
Just found a bug, either in octet_length() or in the docs, since they differ.
Docs are here.
Length calculations do not count trailing spaces for fixed-length character strings but do count them for variable-length strings.
My cluster sayeth;
dev=# select column_01 from table_1;
 column_01
------------------
 oink
(1 row)

dev=# select octet_length(column_01) from table_1;
 octet_length
--------------
 16
(1 row)
column_01 is a fixed-length char(16) column, so according to the docs, octet_length() should have returned 4, not 16.
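A sketch of the discrepancy in Python, assuming column_01 is a char(16) (which the returned length of 16 suggests): fixed-length char pads with trailing spaces, the docs say those spaces are not counted, but the observed result corresponds to the padded length.

```python
value = "oink".ljust(16)  # char(16) pads the stored value with trailing spaces

docs_say = len(value.rstrip(" ").encode())  # trailing spaces not counted -> 4
observed = len(value.encode())              # the cluster counts them -> 16

print(docs_say, observed)  # 4 16
```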
I’m afraid the white paper for this week is late; I hope to publish on Monday. I’ve been investigating random numbers, and there’s been much more development of investigative method than usual. This is partly because of unrelated discoveries made during the investigation, one of which is central to Redshift architecture and is described (otherwise the white paper can’t make any sense) but not investigated (as this white paper is about random numbers), and partly because some of the tests are time-consuming, as large numbers of single queries must be issued.
I very rarely use
I’m using it now for a bit of client work, and I’ve just noticed that when you output to CSV, it forces all column titles to lower case. That’s a problem, because I’m trying to exactly match an existing CSV.
It’s been another full evening and early morning trying to figure out why two of the PRNG tests are not producing the same results on each run - which they should, since the seed is set to 0.
I ended up going down into the lowest level of debugging and I’ve finally figured it out. I believe the conclusion is that due to design flaws in the PRNG, these tests cannot produce the same results, even with the same seed.
Exactly why will be explained in the white paper.
I have finished the investigative work.
I now need to generate the bitmap images of the various PRNG outputs, which I think will take about four or five hours, and write up the results.
Last night, overnight, the final run of the evidence-generating script ran, taking just over 19,000 seconds.
I’m now fully into write-up mode; unless I discover a flaw or bug or something in the data which is strange and had been overlooked, dev work has finished.
Well, I’m still working on the PRNG white paper.
Every time I think I’m done, there’s some loose thread, which I have to pull on to check, and then it turns everything I thought I understood upside-down. Again =-)
Fortunately, the weekend now, so I have much more time to work on this.
Doing my very best to publish tomorrow (Sunday 17th Oct).
You know, I thought it was done and I would be publishing Saturday morning, for once, but again there was yet another thread which didn’t quite make sense, and pulling on it led to another round of everything I thought I understood being turned upside-down :-)
Finally, though, all the pieces fit and there are no more threads to pull on.
Three weekends and most evenings during the two weeks. Problem is I’m contracting at the moment, so I’m working full-time, so I don’t have much time left over to work on research. With most of the white papers, I’m basically tidying up and making a script for material I’ve investigated over the last few years, but pinning down exactly what was wrong with the PRNG was new, and so it took as long as it took.
The latest release of Redshift has moved from GCC 3 (from about 2008) to GCC 7.3.0.
This is a scary change to make on any large (and I suspect unmaintainable) code base which needs to be reliable and which has large numbers of existing customers, but it’s absolutely necessary; the ancient compiler is something I’ve been unhappy about for a very long time, and I’m very, very, very glad to see the move has been made.
Unfortunately, in doing so, the output from version() has changed, to this;
dev=# select version();
                          version
-----------------------------------------------------------
 Redshift 1.0.32574 on Amazon Linux, compiled by gcc-7.3.0
(1 row)
It used to look like this;
PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.28422
This is going to break a lot of existing code, which is going to be processing this string using fixed offsets of various kinds. I’ve seen a confirmed report that it’s breaking SQL Alchemy.
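A sketch of the failure mode (the helper functions are mine, and real consumers such as SQLAlchemy differ in detail): code which assumes the old string layout gets nothing back from the new format, while a more defensive match handles both.

```python
import re

old = ("PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) "
       "3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.28422")
new = "Redshift 1.0.32574 on Amazon Linux, compiled by gcc-7.3.0"

# Brittle: assumes the string always ends ", Redshift x.y.z".
def fixed_format(version_string):
    marker = ", Redshift "
    pos = version_string.rfind(marker)
    return None if pos == -1 else version_string[pos + len(marker):]

# Defensive: find "Redshift" followed by a version, wherever it appears.
def defensive(version_string):
    m = re.search(r"Redshift (\d+\.\d+\.\d+)", version_string)
    return m.group(1) if m else None

print(fixed_format(old), fixed_format(new))  # 1.0.28422 None
print(defensive(old), defensive(new))        # 1.0.28422 1.0.32574
```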
I’m a bit alarmed that the team which moved from GCC 3 to 7 is also the team which did not realise changing the version string would cause problems. But it may be that they did know this, and the change was accidental.
So, new clusters are now returning the old style of version string. The Redshift version has risen, but the other version numbers are unchanged (and so are now in fact incorrect, but given the fall-out and the need for speed, changing them now would be imprudent).
Existing clusters which produce the problem version string can be fixed by Support, by them making a parameter change (which you cannot make) followed by a reboot.
It’s that or shut down your cluster and start it up again with the fixed version which is now in use.
Support are now aware of the issue, and if you contact them, they’ll know what you’re talking about.
You can either contact Support, or if you don’t have support (I don’t :-), tack your name on to the end of this thread, which Support are watching;
In Redshift, there are one or two ways to get hold of the SQL for a view. There are functions which get the SQL, given a view oid, and the system table pg_views lists the SQL (almost certainly using one of the functions).
In all cases, until today, these functions returned either SQL or NULL.
Now the RS devs have it returning arbitrary error text in that column - which SQL parsers barf on, oddly enough.
The same thing happens in the system tables showing query texts.
This is absolutely and categorically wrong.
What you do here is add a column; you don’t start using an existing column in a wholly new way which breaks existing code. This is bloody obvious.
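A hypothetical defensive wrapper (the function and checks are my own, not from the post): since the definition column can now contain arbitrary error text instead of SQL-or-NULL, a consumer has to sniff the value before handing it to a SQL parser.

```python
def view_definition_or_none(definition):
    """Return the definition only if it plausibly is SQL, else None."""
    if definition is None:
        return None
    head = definition.lstrip().lower()
    # View definitions are SELECT statements (possibly starting with a CTE).
    if head.startswith(("select", "with")):
        return definition
    return None  # error text, not SQL

print(view_definition_or_none("SELECT a FROM b"))          # 'SELECT a FROM b'
print(view_definition_or_none("ERROR: relation dropped"))  # None
```

Which is exactly the kind of guesswork an extra column would have made unnecessary.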
This problem is of course exactly the same in its nature as the problem we just saw earlier today where the version string format was changed.
This is a database - absolutely central to large, commercial systems. It’s practically an OS kernel in terms of its significance and criticality at the center of things. Users should never find any code broken because of a version upgrade.
I am of the view, from what I’ve seen over time, that the devs do not have the sense and experience to know that changes like this are wrong, cannot be done, and will cause problems if they are done.
As you will have noticed from the recent flurry of posts, there’s a new RS version out.
I’ve published the system table dump over in the System Table Explorer.
It looks like the recent version string change issue breaks the AWS RS JDBC driver - which is to say, their own driver.
I’ve read one direct report, citing driver version 188.8.131.52 with RS version 1.0.32574, and that the previous version works fine, and I have a link to the relevant source in the driver;
It’s obviously not great that the AWS RS team were unaware of the behaviour of their own driver, but it might be that a different team makes the driver.
However, this leads to another question - does this not imply that the RS test suite does not contain a check that JDBC connectivity is working?
So, this week’s white paper turned out to be about and only about
I have investigated about half the encodings.
I wanted to produce a white paper covering all encodings, but this was utterly unrealistic over the course of a weekend :-)
I completed the introduction (programmatic enumeration of data types and encodings) and the investigation of raw by the end of Friday evening, and then got started on the next encoding. That took until about 5pm Sunday, and that was one of the encodings I’d already investigated! I learned quite a bit more in the process of making the white paper.
So I’m thinking to publish one white paper per encoding and once they’re all done, bind them together into a single white paper, the original idea, which covers all encodings.
I’ve now finally read the docs for SUPER. I think it’s a stupendously bad idea. The whole point of SQL is strict typing to reduce data bugs. If you make a type which has relaxed typing, you are now relying on software engineers to write code with no bugs, as opposed to relying on the SQL engine with strict typing to tell them when they have made bugs.
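A sketch of the trade-off in Python (hypothetical code, not Redshift’s implementation): with strict typing the bad value is rejected at write time by the engine; with a relaxed, SUPER-style document column the bad value goes in silently, and downstream code may even compute nonsense without any error at all.

```python
import json

SCHEMA = {"price": int}

def strict_insert(row):
    # Strict typing: the engine rejects the bad value at write time.
    for column, expected_type in SCHEMA.items():
        if not isinstance(row[column], expected_type):
            raise TypeError(f"{column}: expected {expected_type.__name__}")
    return row

def relaxed_insert(document):
    # Relaxed, SUPER-style typing: anything goes in.
    return json.dumps(document)

try:
    strict_insert({"price": "not a number"})
    caught_at_write = False
except TypeError:
    caught_at_write = True  # the engine told us immediately

blob = relaxed_insert({"price": "not a number"})  # silently accepted
total = json.loads(blob)["price"] * 2             # no error: str * 2 repeats the string

print(caught_at_write, repr(total))
```

The second path is the one that now depends on software engineers writing bug-free code.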
So I’ve been spending a bit of time now getting to know this data type, geometry (I’ve done GIS before, so it’s more getting to know what Redshift is doing with it).
I’m now trying to make some dummy data to populate a test table; one of the columns is an hllsketch.
So, there’s only four HLL functions. One takes an argument indicating what data to examine and returns an int8 - it returns the cardinality. Then there’s a function which takes a hllsketch and gives you the cardinality from it. That leaves two more - one takes two hllsketch types and combines them, returning a hllsketch (so you need two of them in the first place) and, finally, we have hll_create_sketch(), which takes an argument of what data to examine and returns a hllsketch.
Problem is, that’s an aggregate function. I can’t use it in VALUES when there’s more than one row.
It’d be like doing this;
insert into test_table( column_1 ) values ( max(10) ), ( max(5) );
ERROR: cannot use aggregate function in multi-row VALUES.
Problem is, there is no other way to generate values of this data type.
So I think what I have to do now is an INSERT INTO ... SELECT, but where the SELECT produces one row per sketch, UNIONing them all together, and then I think it’ll work. Not nice though.
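A sketch of the workaround (SQLite standing in for Redshift, with max() standing in for hll_create_sketch(), so the table and values are mine): aggregates are rejected in multi-row VALUES, but an INSERT ... SELECT with one SELECT per row, UNION ALLed together, goes through.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table test_table (column_1 integer)")

# One SELECT per desired row, each free to call an aggregate function.
con.execute("""
    insert into test_table (column_1)
    select max(10)
    union all
    select max(5)
""")

rows = con.execute("select column_1 from test_table order by column_1").fetchall()
print(rows)  # [(5,), (10,)]
```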