Redshift added support for materialized views near the end of 2019.
Materialized views are a method for pre-computing the results of a query, so that when the results come to be used, the time and work to compute them has already been expended.
A Redshift materialized view is defined in the same way as a normal view, as an SQL statement, but unlike a normal view - where the name of the view is replaced by its SQL in the text of an SQL query issued against the view - a materialized view actually produces the rows of the SQL defining the materialized view and stores them on disk, in a table, and a query issued against the materialized view actually uses that table.
The concept of pre-computing results is useful and widespread, but I aver there are in Redshift’s implementation numerous design and implementation flaws which ensure that materialized views are the exact same amount of development work and complexity, but with much less performance, than manually creating and maintaining your own pre-computed results, and as such, there are no circumstances where it is correct to use them.
This document then examines the internal implementation of materialized views and assesses and critiques their behaviour.
There are quite a few tests.
An empty single column table is created, and then a materialized view, with auto refresh on, is created, which uses that table.
The table then has ten exactly full blocks inserted, and then the materialized view is monitored, once per second, to time how long auto-refresh takes.
The test is then repeated, but now immediately before each check of the materialized view, a single row is inserted into the underlying table, on every check.
This is repeated five times, to give a range of auto-refresh times. Since this is not a performance test in the usual sense, but more of a discovery, where extremes matter, all five times form the results.
A normal, empty table is created, and then a materialized view is created using that table. Auto refresh is off, and a window function is used in the materialized view definition to ensure a full refresh is in use. The SQL command sequence induced by creating the materialized view is captured and examined.
Then a single row is inserted into the underlying table, and the materialized view is refreshed. The SQL command sequence induced by refreshing the materialized view is captured and examined, to obtain insight into the internal implementation of materialized views.
The test is then repeated, but without a window function, to allow incremental refresh, and captures the SQL command sequence issued in this case.
A table is created with one column per data type. A materialized view is created, using this table. The encodings selected by Redshift are then taken from the system tables.
Two tables each with a single column is created. A full and an incremental materialized view are created using one table in their SQL, and a second full and incremental materialized view are created using both tables in their SQL. In all cases, the full list of columns in the materialized view is taken from the system tables.
A table with a single column is created. An incremental refresh materialized view is created using this table. A sequence of INSERT
and REFRESH MATERIALIZED VIEW
occur, where it is noted if the refreshes are full or incremental. The table ends up with some sorted and some unsorted rows and a VACUUM
is issued on the table, and then one more REFRESH
, again noting the refresh type of the final refresh.
A table with a single column is created. An incremental refresh materialized view is created using this table. A sequence of INSERT
and REFRESH MATERIALIZED VIEW
occur, where it is noted if the refreshes are full or incremental. The table ends up with some sorted and some unsorted rows and a VACUUM
is issued on the materialized view, noting the number of sorted and unsorted rows before and after. Next, a VACUUM
is issued on the table underlying the materialized view, noting the number of sorted and unsorted rows before and after, and also then issuing a REFRESH
, to see if this VACUUM
induced a full refresh.
Test duration was 2,948 seconds.
All times are in seconds.
Often the first and second iterations vary considerably, and then times settle to about 55 seconds. Note though the test fully tears down the test environment between each test, so it’s a little strange we seem to see state persisting over test runs.
Iteration | Delay |
---|---|
0 | 11.06 |
1 | 54.31 |
2 | 53.53 |
3 | 55.13 |
4 | 54.03 |
These results are typical. Making the table busy, by inserting to it, seems to throw off the auto-refresh algorithm. On the other hand, I also saw exactly one run which looked exactly like the “Count Only” test, above; first run about 10 seconds, rest about 55 seconds.
Iteration | Delay |
---|---|
0 | 240.62 |
1 | 54.23 |
2 | 1294.94 |
3 | 542.90 |
4 | 176.52 |
These are the encoding choices made by Redshift for a materialized view, given empty tables being used by the SQL for the materialized view.
Ordinal | Name | Data Type | Encoding |
---|---|---|---|
-9 | deletexid | int8 | raw |
-8 | insertxid | int8 | raw |
-7 | tableoid | oid | raw |
-6 | cmax | cid | raw |
-5 | xmax | xid | raw |
-4 | cmin | cid | raw |
-3 | xmin | xid | raw |
-2 | oid | oid | raw |
-1 | ctid | tid | raw |
1 | column_01 | bool | raw |
2 | column_02 | char(256) | lzo |
3 | column_03 | char(64) | lzo |
4 | column_04 | date | az64 |
5 | column_05 | float4 | raw |
6 | column_06 | float8 | raw |
7 | column_07 | geometry | raw |
8 | column_08 | hllsketch | raw |
9 | column_09 | int2 | az64 |
10 | column_10 | int4 | az64 |
11 | column_11 | int8 | az64 |
12 | column_12 | numeric(19,0) | az64 |
13 | column_13 | numeric(38,0) | az64 |
14 | column_14 | varchar(256) | lzo |
15 | column_15 | time | az64 |
16 | column_16 | timestamp | az64 |
17 | column_17 | timestamptz | az64 |
18 | column_18 | timetz | az64 |
19 | column_19 | varchar(64) | lzo |
20 | table_1_oid | int8 | az64 |
21 | num_rec | int4 | az64 |
“Materialized view mv_1 was incrementally updated successfully” is given for an incremental refresh.
“Materialized view mv_1 was recomputed successfully” is given for a full refresh.
We see here a VACUUM
on the source table forces a full refresh.
1. create table table_1
(
column_1 bigint not null encode raw
)
diststyle even
compound sortkey( column_1 );
2. create materialized view mv_1
diststyle even
compound sortkey( column_1 )
auto refresh no
as
select
*
from
table_1;
3. insert into table_1( column_1 ) values ('1'), ('2'), ('3'), ('4'), ('5'), ('6');
4. refresh materialized view mv_1;
5. INFO: Materialized view mv_1 was incrementally updated successfully.
6. insert into table_1( column_1 ) values ('1'), ('2'), ('3'), ('4'), ('5'), ('6');
7. refresh materialized view mv_1;
8. INFO: Materialized view mv_1 was incrementally updated successfully.
9. insert into table_1( column_1 ) values ('1'), ('2'), ('3'), ('4'), ('5'), ('6');
10. vacuum full table_1 to 100 percent;
11. refresh materialized view mv_1;
12. INFO: Materialized view mv_1 was recomputed successfully.
The key rows here are the counts of sorted and unsorted rows.
We can see on lines 9, 10 and 11 a VACUUM
of the materialized view did nothing.
We can see on lines 12 and 13, VACUUM
of the underlying table sorted the underlying table, and we then see on lines 14 and 15, this did not then force a full refresh.
1. create table table_1
(
column_1 bigint not null encode raw
)
diststyle even
compound sortkey( column_1 );
2. create materialized view mv_1
diststyle even
compound sortkey( column_1 )
auto refresh no
as
select
*
from
table_1;
3. insert into table_1( column_1 ) values ('1'), ('2'), ('3'), ('4'), ('5'), ('6');
4. refresh materialized view mv_1;
5. INFO: Materialized view mv_1 was incrementally updated successfully.
6. insert into table_1( column_1 ) values ('1'), ('2'), ('3'), ('4'), ('5'), ('6');
7. refresh materialized view mv_1;
8. INFO: Materialized view mv_1 was incrementally updated successfully.
9. 6 sorted rows, 6 unsorted rows
10. vacuum full mv_1 to 100 percent;
11. 6 sorted rows, 6 unsorted rows
12. vacuum full mv_tbl__mv_1__0 to 100 percent;
13. 12 sorted rows, 0 unsorted rows
14. insert into table_1( column_1 ) values ('1'), ('2'), ('3'), ('4'), ('5'), ('6');
15. refresh materialized view mv_1;
16. INFO: Materialized view mv_1 was incrementally updated successfully.
The row_number
column is part of the source table, it’s used to force a full refresh.
We see here there are no additional columns.
Ordinal | Name | Data Type | Encoding |
---|---|---|---|
-9 | deletexid | int8 | raw |
-8 | insertxid | int8 | raw |
-7 | tableoid | oid | raw |
-6 | cmax | cid | raw |
-5 | xmax | xid | raw |
-4 | cmin | cid | raw |
-3 | xmin | xid | raw |
-2 | oid | oid | raw |
-1 | ctid | tid | raw |
1 | row_number | int8 | az64 |
2 | column_1 | int8 | raw |
The row_number
column is part of the source table, it’s used to force a full refresh.
We see here there are no additional columns.
Ordinal | Name | Data Type | Encoding |
---|---|---|---|
-9 | deletexid | int8 | raw |
-8 | insertxid | int8 | raw |
-7 | tableoid | oid | raw |
-6 | cmax | cid | raw |
-5 | xmax | xid | raw |
-4 | cmin | cid | raw |
-3 | xmin | xid | raw |
-2 | oid | oid | raw |
-1 | ctid | tid | raw |
1 | row_number | int8 | az64 |
2 | column_1 | int8 | raw |
The numbers of tables, views and procedures before and after a CREATE MATERIALIZED VIEW
, for a full refresh materialized view.
Before | After | |
---|---|---|
Tables | 947 | 948 |
Views | 547 | 548 |
Procs | 0 | 1 |
pg_views
(Full Refresh)The SQL source in pg_views
for a full refresh materialized view. A normal view contains only the SQL statement which forms the view.
create materialized view mv_1 diststyle even compound sortkey( column_1 ) auto refresh no as select row_number() over ( partition by column_1 order by column_1 ), column_1 from table_1;
The SQL commands induced by a CREATE MATERIALIZED VIEW
with a full refresh materialized view.
Event Time | System Table | SQL |
---|---|---|
2021-09-05 21:28:33.306758 | stl_querytext | create materialized view mv_1 diststyle even compound sortkey( column_1 ) auto refresh no as select row_number() over ( partition by column_1 order by column_1 ), column_1 from table_1; |
2021-09-05 21:28:33.323018 | stl_ddltext | create materialized view mv_1 diststyle even compound sortkey( column_1 ) auto refresh no as select row_number() over ( partition by column_1 order by column_1 ), column_1 from table_1; |
2021-09-05 21:28:33.324324 | stl_ddltext | create materialized view mv_1 diststyle even compound sortkey( column_1 ) auto refresh no as select row_number() over ( partition by column_1 order by column_1 ), column_1 from table_1; |
2021-09-05 21:28:33.326914 | stl_querytext | padb_fetch_sample: select count(*) from mv_tbl__mv_1__0 |
The SQL commands induced by a REFRESH MATERIALIZED VIEW
with a full refresh materialized view.
Event Time | SQL |
---|---|
2021-09-05 21:28:35.975696 | REFRESH MATERIALIZED VIEW mv_1; |
2021-09-05 21:28:35.977569 | CALL public.mv_sp__mv_1__1_0(7434, 7439, 1, ‘(0)’); |
2021-09-05 21:28:35.989098 | CREATE TABLE public.mv_tbl__mv_1__0__tmp BACKUP YES DISTSTYLE EVEN COMPOUND SORTKEY(2)AS ( SELECT ROW_NUMBER() OVER (PARTITION BY “table_1”.“column_1” ORDER BY “table_1”.“column_1” ASC NULLS LAST) AS “row_number”, “table_1”.“column_1” AS “column_1” FROM “public”.“table_1” AS “table_1” ) |
2021-09-05 21:28:36.006137 | Analyze mv_tbl__mv_1__0__tmp |
2021-09-05 21:28:36.006278 | padb_fetch_sample: select * from mv_tbl__mv_1__0__tmp |
2021-09-05 21:28:36.082477 | CREATE OR REPLACE VIEW public.mv_1 AS SELECT * FROM public.mv_tbl__mv_1__0__tmp |
2021-09-05 21:28:36.084287 | DROP TABLE public.mv_tbl__mv_1__0 |
2021-09-05 21:28:36.085588 | ALTER TABLE public.mv_tbl__mv_1__0__tmp RENAME TO mv_tbl__mv_1__0 |
2021-09-05 21:28:36.087398 | CREATE OR REPLACE VIEW public.mv_1 AS SELECT * FROM public.mv_tbl__mv_1__0 |
2021-09-05 21:28:36.090246 | COMMIT |
mv_sp__mv_1__1_0( recompute bool, end_xid int8, start_xid int8, finished_xid_list varchar )
begin
if recompute then
create table public.mv_tbl__mv_1__0__tmp backup yes diststyle even compound sortkey(2)as ( select row_number() over (partition by "table_1"."column_1" order by "table_1"."column_1" asc nulls last) as "row_number", "table_1"."column_1" as "column_1" from "public"."table_1" as "table_1" )
create or replace view public.mv_1 as select * from public.mv_tbl__mv_1__0__tmp;
drop table public.mv_tbl__mv_1__0;
alter table public.mv_tbl__mv_1__0__tmp rename to mv_tbl__mv_1__0;
create or replace view public.mv_1 as select * from public.mv_tbl__mv_1__0;
else
delete from public.mv_tbl__mv_1__0;
insert into public.mv_tbl__mv_1__0( select row_number() over (partition by "table_1"."column_1" order by "table_1"."column_1" asc nulls last) as "row_number", "table_1"."column_1" as "column_1" from "public"."table_1" as "table_1" );
end if;
end;
Incremental materialized views have additional columns. Here we see the additional num_rec
column, and then one further column, table_1_oid
.
Ordinal | Name | Data Type | Encoding |
---|---|---|---|
-9 | deletexid | int8 | raw |
-8 | insertxid | int8 | raw |
-7 | tableoid | oid | raw |
-6 | cmax | cid | raw |
-5 | xmax | xid | raw |
-4 | cmin | cid | raw |
-3 | xmin | xid | raw |
-2 | oid | oid | raw |
-1 | ctid | tid | raw |
1 | column_1 | int8 | raw |
2 | table_1_oid | int8 | az64 |
3 | num_rec | int4 | az64 |
Incremental materialized views have additional columns. Here we see the additional num_rec
column, and then two further columns, table_1_oid
and table_2_oid
; in general there is one additional column per table in the SQL forming the materialized view.
Ordinal | Name | Data Type | Encoding |
---|---|---|---|
-9 | deletexid | int8 | raw |
-8 | insertxid | int8 | raw |
-7 | tableoid | oid | raw |
-6 | cmax | cid | raw |
-5 | xmax | xid | raw |
-4 | cmin | cid | raw |
-3 | xmin | xid | raw |
-2 | oid | oid | raw |
-1 | ctid | tid | raw |
1 | column_1 | int8 | raw |
2 | table_1_oid | int8 | az64 |
3 | table_2_oid | int8 | az64 |
4 | num_rec | int4 | az64 |
The numbers of tables, views and procedures before and after a CREATE MATERIALIZED VIEW
, for an incremental refresh materialized view.
Before | After | |
---|---|---|
Tables | 947 | 948 |
Views | 547 | 548 |
Procs | 0 | 1 |
pg_views
(Incremental Refresh)The SQL source in pg_views
for an incremental refresh materialized view. A normal view contains only the SQL statement which forms the view.
create materialized view mv_1 diststyle even compound sortkey( column_1 ) auto refresh no as select column_1 from table_1;
Event Time | System Table | SQL |
---|---|---|
2021-09-05 21:28:40.468523 | stl_querytext | create materialized view mv_1 diststyle even compound sortkey( column_1 ) auto refresh no as select column_1 from table_1; |
2021-09-05 21:28:40.484168 | stl_ddltext | create materialized view mv_1 diststyle even compound sortkey( column_1 ) auto refresh no as select column_1 from table_1; |
2021-09-05 21:28:40.485893 | stl_ddltext | create materialized view mv_1 diststyle even compound sortkey( column_1 ) auto refresh no as select column_1 from table_1; |
2021-09-05 21:28:40.491842 | stl_querytext | padb_fetch_sample: select count(*) from mv_tbl__mv_1__0 |
Event Time | SQL |
---|---|
2021-09-05 21:28:41.664228 | REFRESH MATERIALIZED VIEW mv_1; |
2021-09-05 21:28:41.665943 | CALL public.mv_sp__mv_1__1_0(7475, 7480, 0, ‘(0)’); |
2021-09-05 21:28:41.678324 | INSERT INTO “public”."mv_tbl__mv_1__0" (SELECT “table_1”.“column_1” AS “column_1”, CAST(“table_1”.“oid” AS INT8) AS “table_1_oid”, CAST(1 AS INT4) AS “num_rec” FROM “public”.“table_1” AS “table_1” WHERE ((CAST(“table_1”.“insertxid” AS INT8) > 7475) OR CAST(“table_1”.“insertxid” AS INT8) IN (0)) AND ((CAST(“table_1”.“insertxid” AS INT8) <= 7480) AND (CAST(“table_1”.“deletexid” AS INT8) > 7480))) |
2021-09-05 21:28:41.772182 | DELETE FROM “public”."mv_tbl__mv_1__0" USING (SELECT “table_1”.“column_1” AS “column_1”, CAST(“table_1”.“oid” AS INT8) AS “table_1_oid”, CAST(-1 AS INT4) AS “num_rec” FROM “public”.“table_1” AS “table_1” WHERE (CAST(“table_1”.“insertxid” AS INT8) <= 7475) AND (NOT CAST(“table_1”.“insertxid” AS INT8) IN (0) AND (((CAST(“table_1”.“deletexid” AS INT8) > 7475) OR CAST(“table_1”.“deletexid” AS INT8) IN (0)) AND (CAST(“table_1”.“deletexid” AS INT8) <= 7480)))) AS "mv_tbl__mv_1__0__deletes" WHERE "mv_tbl__mv_1__0“.”table_1_oid" = "mv_tbl__mv_1__0__deletes“.”table_1_oid" |
2021-09-05 21:28:41.845141 | COMMIT |
mv_sp__mv_1__1_0( recompute_mv bool, end_xid int8, start_xid int8, finished_xid_list varchar )
begin
if recompute_mv then
execute $_7275328207056490759_$ create table "public"."mv_tbl__mv_1__0_recomputed" backup yes diststyle even compound sortkey(1) as (select "table_1"."column_1" as "column_1", cast("table_1"."oid" as int8) as "table_1_oid", cast(1 as int4) as "num_rec" from "public"."table_1" as "table_1" where (cast("table_1"."insertxid" as int8) <= $_7275328207056490759_$ || end_xid || $_7275328207056490759_$) and (cast("table_1"."deletexid" as int8) > $_7275328207056490759_$ || end_xid || $_7275328207056490759_$)) $_7275328207056490759_$
create or replace view "public"."mv_1" as (select "derived_table1"."column_1" as "column_1" from "public"."mv_tbl__mv_1__0_recomputed" as "derived_table1");
drop table "public"."mv_tbl__mv_1__0";
alter table "public"."mv_tbl__mv_1__0_recomputed" rename to "mv_tbl__mv_1__0";
create or replace view "public"."mv_1" as (select "derived_table1"."column_1" as "column_1" from "public"."mv_tbl__mv_1__0" as "derived_table1");
else
execute $_7275328207056490759_$ insert into "public"."mv_tbl__mv_1__0" (select "table_1"."column_1" as "column_1", cast("table_1"."oid" as int8) as "table_1_oid", cast(1 as int4) as "num_rec" from "public"."table_1" as "table_1" where ((cast("table_1"."insertxid" as int8) > $_7275328207056490759_$ || start_xid || $_7275328207056490759_$) or cast("table_1"."insertxid" as int8) in $_7275328207056490759_$ || finished_xid_list || $_7275328207056490759_$) and ((cast("table_1"."insertxid" as int8) <= $_7275328207056490759_$ || end_xid || $_7275328207056490759_$) and (cast("table_1"."deletexid" as int8) > $_7275328207056490759_$ || end_xid || $_7275328207056490759_$))) $_7275328207056490759_$;
execute $_7275328207056490759_$ delete from "public"."mv_tbl__mv_1__0" using (select "table_1"."column_1" as "column_1", cast("table_1"."oid" as int8) as "table_1_oid", cast(-1 as int4) as "num_rec" from "public"."table_1" as "table_1" where (cast("table_1"."insertxid" as int8) <= $_7275328207056490759_$ || start_xid || $_7275328207056490759_$) and (not cast("table_1"."insertxid" as int8) in $_7275328207056490759_$ || finished_xid_list || $_7275328207056490759_$ and (((cast("table_1"."deletexid" as int8) > $_7275328207056490759_$ || start_xid || $_7275328207056490759_$) or cast("table_1"."deletexid" as int8) in $_7275328207056490759_$ || finished_xid_list || $_7275328207056490759_$) and (cast("table_1"."deletexid" as int8) <= $_7275328207056490759_$ || end_xid || $_7275328207056490759_$)))) as "mv_tbl__mv_1__0__deletes" where "mv_tbl__mv_1__0"."table_1_oid" = "mv_tbl__mv_1__0__deletes"."table_1_oid" $_7275328207056490759_$;
end if;
end;
When the CREATE MATERIALIZED VIEW
command is issued we see in the transaction four SQL statements are issued.
On the face of it, what we see is strange. We see the create command three times, once in STL_QUERYTEXT
and twice in STL_DDLTEXT
, and then we see in STL_QUERYTEXT
a final command of SELECT
, which is being used for a padb_fetch_sample
, which is a query typically issued by ANALYZE
.
Now, I have in fact observed in the past that which is logged in these system tables is not strictly the text of the commands issued. This can be seen here in the final command, which is a status message followed by SQL, or in the messages logged by VACUUM
, which behave in the same way.
If you approach these tables expecting them to behave as they are described in the docs and as their names indicate, and to contain only DDL/SQL commands, you will be disappointed. You will need to parse the output, it can be arbitrary, and you have no idea what it can be until you happen to see it.
What I think is actually happening here is that the text in STL_DDLTEXT
is a bug, and the wrong SQL text is being logged.
If we examine the counts of tables, views and procedures before and after CREATE MATERIALIZED VIEW
, we see the command creates one table, one view and one procedure.
I think what’s happening is that the first command is the CREATE MATERIALIZED VIEW
we actually issued, which must be creating the view and since this it the first command and so the table which will be created does not yet exist, it must be this view is created with late binding; and the second and third commands are actually a CREATE PROCEDURE
followed by a CREATE TABLE AS
, and the final command is the automatic sampling of data performed by the inherent ANALYZE
issued by CREATE TABLE AS
.
A materialized view, then, is a normal view (with the name passed by the caller to CREATE MATERIALIZED VIEW
) which points at a normal table created by Redshift, which is used to store the materialized rows, but we also have a mysterious (but to be explained) procedure.
This explains why materialized views are listed in pg_class
with the relkind
‘v’; what we’re seeing there is the view part of the materialized view.
This is though a colossal blunder, as the only way now I can get a list of normal views from pg_class
is to list all views, and then EXCEPT
from that list the list of materialized view names from STV_MV_INFO
. Whomever had this happen was asleep at the wheel.
Moreover, this is in fact also a breaking change, because I have replacement system tables which show information about normal views, and they abruptly began also showing information about materialized views; and the SQL recorded in the system tables for a normal view is not the entire CREATE VIEW
command, but only the SQL statement given to CREATE VIEW
to define the view, but the SQL recorded in the system tables for a materialized view is the entire CREATE MATERIALIZED VIEW
command, so it’s two breaking changes.
The same problem is also seen in pg_views
, which is now showing materialized views as well as normal views.
When it comes to databases, where they are so central and critical as core components of larger systems - like operating system kernels - breaking changes are almost verboten. If you are going to make them, you telegraph them in advance to the user base and you announce them when they happen and you document them heavily.
Silent breaking changes, with no announcement and no documentation, are so profoundly and completely off-the-map they are utterly inconceivable. No dev team would do this, from awareness of their users’ needs and because of the catastrophic loss of trust and reputation - and every now and then, I see such a change being made in Redshift. It’s not a one-off thing.
I think the devs are unaware of the impact their changes are having on end-users.
It is now the mysterious procedure comes into play.
The REFRESH MATERIALIZED VIEW
command in fact calls the procedure, which in turn contains a bit of logic and a number of SQL commands and implements refresh, be it full or incremental.
We can see the procedure text in the Results, and see that there are two different procedures, one for materialized views with full refresh, the other for materialized views with incremental refresh.
Here’s how they work, remembering that being inside a procedure, every set of SQL commands is executed inside a transaction;
Note when recompute
is true
the avoidance of truncate table
and instead the creation and rename of a new table. This is because truncate
commits the current transaction. If issued in a procedure, it commits the current transaction and a new transaction automatically and immediately begins - which means then that the table holding the rows of materialized view is for a certain period of time empty and seen to be empty by users of the materialized view, which isn’t going to fly.
In general truncate table
when needed inside a transaction can be replaced by the method used here.
Now, I would say in most ETL systems, it’s often possible to truncate and then insert (which also means a vacuum is not required), or simply to insert.
The implementation here though of materialized views has no access to information about the system within which it is being used, and so it has to play it safe; it must use a method - delete and insert inside a transaction (which then mandates a vacuum, because this makes a mess of the underlying table) - which is always going to be correct, even though that means being awfully expensive in the situations where it is in fact possible to use much more efficient methods.
The behaviour then of the procedures is all pretty clear, except there is something remarkable; the path for the Incremental Refresh where recompute
is false
uses the system columns found in every table, deletexid
and insertxid
, to figure out which rows to insert and which to delete.
This is a real surprise. I tried in the past to access these columns and access was forbidden (“column does not exist”), and indeed trying this now it still doesn’t work, neither when issuing SQL directly and also not in a procedure. Note Postgres does allow access to these columns.
I suspect it may be it works in these procedures because the owner of the materialized view procedure is rdsdb
, the über-user, the user owned by Amazon, which is more powerful than the mere system admin user allowed to whomever created the cluster. Remember - Redshift is like Android : you are not root.
In the documentation, there is a peculiar and entirely unexplanatory paragraph which warns about an interaction between automatic background vacuum with the auto-refresh of materialized views;
Background vacuum operations might be blocked if materialized views aren’t refreshed. After an internally defined threshold period, a vacuum operation is allowed to run. When this vacuum operation happens, any dependent materialized views are marked for recomputation upon the next refresh (even if they are incremental). For information about VACUUM, see VACUUM. For more information about events and state changes, see STL_MV_STATE. 1
When I first read that documentation paragraph, I had absolutely no idea what was going on - why on earth would auto-vacuum care about whether or not materialized views were updated or not?
However, I’ve investigated the materialized view implementation enough that I think I can now actually penetrate the murk and explain what’s going on, and this in fact reveals a stupendous and truly staggering omission from the documentation.
When a materialized view is using a full refresh, there is no problem at all. There is no interaction between vacuum, or auto-vacuum, and refresh, whether it is automatic or manual.
The quoted paragraph in fact only applies to incremental refresh materialized views.
When a materialized view is using incremental refresh, it is in fact relying upon the use of the system columns deletexid
and insertxid
, which are in every table, to figure out what row changes have occurred, as that information is required to perform an incremental refresh.
The problem is that when a VACUUM
- of any kind, manual or automatic - runs, it resets the values in the insertxid
and deletexid
columns. This of course completely messes up the record-keeping information being used by a materialized view to perform incremental refresh - and so, perforce, a full refresh has to occur.
What the documentation so saliently fails to mention is that this need for a full refresh is induced not only by auto-vacuum, but by normal, manually issued vacuum.
So, to be clear : every time you VACUUM
any table used by an incremental-refresh materialized view, the next refresh will be a full refresh.
This is also true for any auto-vacuum on any table used by an incremental-refresh materialized view, but auto-vacuum seems to run so infrequently I suspect this doesn’t play much of a part.
You cannot in fact, despite appearances, directly VACUUM
a materialized view.
The VACUUM
command will run, it will not throw an error, and it will return the status message VACUUM
and so it will look exactly like it ran, but it does not run. This is not documented.
You can VACUUM
the underlying table which holds the rows of the materialized view. This does work, and it does what you expect; the problem is that this is not documented, and the name of this table is quite well hidden; you need to investigate the SQL commands issued by CREATE MATERIALIZED VIEW
to find it.
As a consequence, and this is an enormous problem, the only VACUUM
operating on materialized views is whatever is afforded by background auto-vacuum, which is regarded by myself and a number of fellow Redshift admin as running so infrequently as to have negligible effect.
Vacuuming the underlying table does not induce a full refresh (it’s deletexid
and insertxid
columns are not used in refresh).
Finally, note that when an incremental refresh materialized view actually has an incremental refresh, an insert
has been issued, followed by a delete
and so at this point, a VACUUM
is needed. For both full refresh, and an incremental refresh which runs in full refresh mode, where a new underlying table has been created from scratch and repopulated, a VACUUM
is not needed.
When you create a materialized view, you can specify the sorting type and keys, and the distribution type (and distribution column). What you cannot specify are the column encodings. Redshift automatically selects encodings. The problem is, in my view, Redshift makes extremely poor encoding choices and this by itself is a fatal shortcoming.
A discussion of this assertion is long and takes us a long way from materialized views, so I have moved it to Appendix B.
Materialized views are described in the documentation as offering an option to automatically bring themselves up to date when the underlying data changes, but, critically, there are no guarantees of when this occurs, and the algorithm which when updates occur is an undocumented, likely complex, black box which is going to be undergoing ongoing silent changes.
A quote from the docs;
To complete refresh of the most important materialized views with minimal impact to active workloads in your cluster, Amazon Redshift considers multiple factors. These factors include current system load, the resources needed for refresh, available cluster resources, and how often the materialized views are used.
There are tens of thousands of Redshift systems out there. Real life is infinitely more complicated than we can imagine and so design for. The idea, for example, that a materialized view which is used more often is more important is simply not true; it may be a good rule of thumb, but that’s small consolation if someone applies that rule of thumb to your system when it’s not true.
Moreover, any complex system has design and implementation flaws. The implementation for example of AutoWLM has to my knowledge gone through at least three major rewrites, the earlier versions being complete failures and even the current version I think remaining inherently flawed, just less blatantly and up-front problematically so (such that if you tried AutoWLM, you had to disable it, since it was killing your system, as happened in the earlier versions).
In any event, I fully and completely hold the view that it is not possible to knowingly design a correct system when that system contains black boxes controlled by a third party who silently and on an ongoing basis changes their function, let alone when those black boxes are complex and likely flawed.
(This is a major problem with Redshift these days. It’s happening a lot.)
In fact, another feature of Redshift, the automatic background vacuum, has exactly the same automatic update behaviour and is described in the same way in its documentation. It’s been found that the automatic background vacuum runs too infrequently to be useful.
I didn’t want to get into an open-ended exploration exploration of the factors involved in auto-refresh, because there’s so many that could be involved, and they vary so much (which leads to the question of how you test this functionality in the first place), so I made two simple tests, just to get the beginning of a feel for behaviour;
In both cases, the cluster (two node dc2.large
) is completely idle, and we can have a reasonable expectation that refresh times will be extended the busier a cluster becomes. Both tests were repeated five times.
For the first test case, in the test run for this document, the first refresh time was ten seconds, then went to 40 seconds, then became stable at about 55 seconds. (On earlier runs, I saw refresh times as low as 3 seconds and as high as 140 seconds). In general, the first two runs vary (maybe by as much as a minute) but the later test runs settle at 55 seconds - which is still a bit odd, given the materialized view and table are both being dropped and re-created on every test run; it looks like Redshift is maintaining state of some kind relating to refresh decisions which is not tied to the given materialized view and its table(s).
For the second test case, the delay was much more variable and often much longer, ranging from 54 seconds to 1295 seconds (twenty minutes).
The obvious question is : how do you build an ETL system when one of the stages of data propagation through that system is outside of your control, you have only a rough idea of how long it might take, where that idea could be up-ended by factors you are unaware of, and where the behaviour of that stage could also change at any time, without notice?
The answer obviously is that you cannot; you need to manually issue the refresh command at the appropriate point in the ETL process. I assert auto-refresh has no value.
A materialized view which uses full refresh has no extra columns added into the underlying table.
A materialized view which uses incremental refresh has a number of extra columns, the number being 1 (num_rec
) plus 1 for every table used by the SQL of the materialized view.
When a materialized view is created, Redshift examines the SQL and determines if the materialized view when refreshed will experience full refresh or incremental refresh, with incremental refresh being used if possible (being preferred).
If you want a materialized view with incremental refresh, there’s a wide range of restrictions on the SQL which can be used to form a materialized view - none of the following are permitted;
In addition, as we’ve seen, when a refresh is issued on an incremental refresh materialized view after a vacuum on any of the tables it uses, a full refresh occurs anyway, and furthermore, unless you are manually issuing VACUUM
on the table underlying the materialized view, the only vacuum work being performed on that table is that from automatic background vacuum, which I think running so infrequently it is negligible, and finally, there’s an overhead of one column plus one column per table used in the materialized view SQL.
All in all, I think incremental refresh is a complete non-starter and in fact should specifically be avoided. It is in the first place very limited in the SQL it permits, and then we also find the implementation is even more than problematic - it’s harmful, because of the lack of vacuum.
Tucked away in the documentation for one of the system tables which holds information about materialized views, STV_MV_INFO
, we discover two columns of particular interest, the first of which is is_stale
, an char(1)
, which sayeth the docs;
A
t
indicates that the materialized view is stale. A stale materialized view is one where the base tables have been updated but the materialized view hasn’t been refreshed. This information might not be accurate if a refresh hasn’t been run since the last restart.
In other words, this column has only ambiguous information about whether or not a materialized view is up to date; you have to perform an update to initialize the information in this column.
I wanted to write a view which shows users information about their materialized views, and of course one very useful piece of information is whether or not the materialized view is up to date. I can’t show that information, because the only source is this column, and I can’t know if this column is accurate or not, and I do not want to require and depend upon the users reading the docs to avoid being misled (something STV_MV_INFO
is it seems entirely happy with).
Then, next, we have the column state
, an integer
, which the docs have holding the following values;
- 0 – The materialized view is fully recomputed when refreshed.
- 1 – The materialized view is incremental.
- 101 – The materialized view can’t be refreshed due to a dropped column. This constraint applies even if the column isn’t used in the materialized view.
- 102 – The materialized view can’t be refreshed due to a changed column type. This constraint applies even if the column isn’t used in the materialized view.
- 103 – The materialized view can’t be refreshed due to a renamed table.
- 104 – The materialized view can’t be refreshed due to a renamed column. This constraint applies even if the column isn’t used in the materialized view.
- 105 – The materialized view can’t be refreshed due to a renamed schema. |
We see that the information about the refresh type (full or incremental) is conflated in this single column with information about failures (underlying tables or columns being altered) and so when there is a failure, it is no longer possible to know the refresh type of the materialized view.
How am I supposed to make a view on top of this, showing the refresh type?
(Moreover, as we saw earlier, the design of materialized views is such that incremental refresh views sometimes anyway will issue full refreshes, so this can never be a reliable indicator anyway.)
Materialized views are implemented as a table, a normal view and a procedure.
When CREATE MATERIALIZED VIEW
is issued, Redshift actually issues a number of SQL commands, creating the table, the view, and the procedure. The view is presented to the user as the materialized view; it simply points at the table. The procedure is used to implement refresh, with REFRESH MATERIALIZED VIEW
calling the procedure.
The table underlying the materialized view is created by the CREATE TABLE AS
command. This is why it is not possible with materialized views to specify column encodings. I am of the view the encoding choices made by Redshift are extremely poor (indeed, in some cases, nonsensical) and as such many columns are experiencing no compression at all.
For any given materialized view, Redshift by examining at creation time the SQL of the materialized view determines whether full refresh or incremental refresh will be used. Redshift prefers incremental and will choose it where possible, but to be possible, a very extensive list of SQL functionality cannot be used in the materialized view SQL.
When an incremental refresh occurs, the procedure uses the table system columns deletexid
and insertxid
, which are in Redshift not available to normal users (but they are in Postgres), to figure out which rows to delete and which rows to insert, with an additional column per table used by the materialized view being created in the underlying table to store this information. Materialized views using full refresh have no extra columns.
As such, when a VACUUM
(manual or automatic) of any of the tables used by the materialized view occurs, it forces a full refresh, as VACUUM
resets the values in the deletexid
and insertxid
columns of the vacuumed table and so messes up the book-keeping information held by the materialized view.
When a full refresh occurs, either by the refresh occurring on a materialized view using full refresh, or an incremental refresh running as a full refresh due to VACUUM
, the procedure, which implements refresh, and here remembering that all SQL inside a procedure is inside a transaction, creates a new table, populates it, renames the old table out of the way, renames the new table to the name of the old table, re-points the view at the new table and drops the old table. A VACUUM
is not required, since the table is brand new and then populated by a single insert.
When an incremental refresh occurs, the table underlying the materialized view experiences an insert
and then a delete
, and as such after refresh requires VACUUM
.
Materialized views cannot directly be vacuumed. The vacuum command will run, it will report no errors, and it will return the normal VACUUM
info message, but it will do no work.
The table underlying the materialized view can be vacuumed, but the name of this table is not normally available. It can only be found by inspecting the SQL commands issued by Redshift to implement CREATE MATERIALIZED VIEW
.
The only other source of vacuum for the underlying table is whatever comes from auto-vacuum. I have not yet produced a white paper on auto-vacuum, so this is has not been investigated, but I and other admin think it runs so infrequently that it is ineffectual.
Note though that when an incremental refresh materialized view is forced to perform a full refresh due to one or more of the tables it uses being vacuumed, the full refresh will produce a new, freshly populated table, which is inherently fully ordered, and so “reset” the underlying table.
Materialized views can be refreshed manually, by issuing the REFRESH MATERIALIZED VIEW
command, or they can be created such that Redshift takes responsible for automatically issuing refresh, as and when it sees fit.
On a completely idle two node dc2.large
cluster, a materialized view with full refresh pointing at a one column table with exactly ten full blocks of data typically takes 55 seconds before the first auto-refresh occurs.
When the same setup has a single new row being inserted once per second, the time before the first auto-refresh varied from a low of 54 seconds to a high of 1295 seconds (twenty-one minutes).
The algorithm for refresh is not published, so the factors involved are not known, and it is likely to be undergoing ongoing undocumented, unannounced change.
With regard to all of the above, I hold the following points to be true;
Auto-refresh cannot be used, in part because the refresh times are so variable, but mainly because the algorithm is not defined, is likely to be complex, and where real-life is always far more complex than such an algorithm, quite possibly flawed, and, most critically, is sure to be undergoing ongoing undocumented, unannounced change; you cannot knowingly build a correct system when it contains black boxes controlled by third parties.
Incremental refresh, because of its weaknesses regarding vacuum, should specifically be avoided.
All refresh then must be full refresh, which has to fully regenerate all rows.
The impact upon performance of the poor encoding choices made by Redshift is very large.
This leaves then materialized views as full refresh and manual refresh only, which is identical to manually maintaining pre-computing results, except that with materialized views, column encodings cannot be chosen, leading to a very large loss of performance, and, most critically of all, materialized views only offer on refresh full regeneration of all pre-computed results, where-as manually pre-computing results allows you to perform an incremental insert only, which is often all that is required.
The ETL work for both is the same, an initial statement to create the materialized view or the table holding manually pre-computed results, and then a statement to perform refresh, which is issued by the ETL system at the appropriate moment.
All in all, then, I can actually see no use case for materialized views at all. As things stand, they are always inferior to manually pre-computing results.
When you investigate Redshift, there are always unexpected findings.
I ran into so many intricate and perplexing problems with STL_QUERYTEXT
that in the end, once I got to the third round of debugging, I abandoned the effort to perform the necessary pre-processing to the query text being searched for so that it would match the rows recorded in STL_QUERYTEXT
.
For now, I am for queries I need to find pre-processing the SQL before issuing it, to reduce all contiguous white space to one white space, remove all newlines, etc, so that I avoid most of the bugs in STL_QUERYTEXT
and can find the queries by their text.
This doesn’t work as a proper solution - what happens if you have strings with whitespace in - but it works for the queries I need to search for in this script. I intend and need now to produce a white paper on STL_QUERYTEXT
to figure out all the problems and solutions to them, although I have a nasty suspicion right now it may in fact be impossible to find all queries in that table (I suspect you end up with different query texts which are indistinguishable from each other once they’re been placed into STL_QUERYTEXT
).
A VACUUM
, manual or automatic, on any table used by an incremental refresh materialized view, caused the next refresh on that materialized view to be a full refresh.
The implementation of materialized views uses the per-table system columns insertxid
and deletexid
, access to which is - but by no means at all should or needs to be (Postgres allows access) - forbidden to users.
Normal views in PG_VIEWS
for their SQL store the SQL of the statement which forms the view. This statement must be mated with a CREATE VIEW
command to create a view.
Materialized views, however, which are also shown in PG_VIEWS
, and which have on the face of it cannot be distinguished from normal views, for their SQL store the entire ’CREATE MATERIALIZED VIEW` command.
Views and materialized views then must be handled differently, are however conflated into the same view, and require painful SQL to distinguish between (using STV_MV_INFO
, which only lists materialized views, to figure out which rows in PG_VIEWS
are views and which are materialized views).
If an object name is too long (such as a table or view name), I think Redshift used to throw an error. Now Redshift, with an info message, truncates the overly-long name to the maximum supported length.
I think this is staggeringly vast mistake. If something is wrong, it’s infinitely better to alert the user and stop. If you carry on, the user is likely not to be expecting this (since they had no idea there were making a mistake in the first place) and is likely to get caught out later on - which is the more expensive route to discovering error.
Note these results are completely unprocessed; they are a raw dump of the results, so the original, wholly unprocessed data, is available.
{'proofs': {'dc2.large': {2: {'encodings': [[-9, 'deletexid', 'int8', 'raw'],
[-8, 'insertxid', 'int8', 'raw'],
[-7, 'tableoid', 'oid', 'raw'],
[-6, 'cmax', 'cid', 'raw'],
[-5, 'xmax', 'xid', 'raw'],
[-4, 'cmin', 'cid', 'raw'],
[-3, 'xmin', 'xid', 'raw'],
[-2, 'oid', 'oid', 'raw'],
[-1, 'ctid', 'tid', 'raw'],
[1, 'column_01', 'bool', 'raw'],
[2,
'column_02',
'char(256)',
'lzo'],
[3, 'column_03', 'char(64)', 'lzo'],
[4, 'column_04', 'date', 'az64'],
[5, 'column_05', 'float4', 'raw'],
[6, 'column_06', 'float8', 'raw'],
[7, 'column_07', 'geometry', 'raw'],
[8,
'column_08',
'hllsketch',
'raw'],
[9, 'column_09', 'int2', 'az64'],
[10, 'column_10', 'int4', 'az64'],
[11, 'column_11', 'int8', 'az64'],
[12,
'column_12',
'numeric(19,0)',
'az64'],
[13,
'column_13',
'numeric(38,0)',
'az64'],
[14,
'column_14',
'varchar(256)',
'lzo'],
[15, 'column_15', 'time', 'az64'],
[16,
'column_16',
'timestamp',
'az64'],
[17,
'column_17',
'timestamptz',
'az64'],
[18, 'column_18', 'timetz', 'az64'],
[19,
'column_19',
'varchar(64)',
'lzo'],
[20, 'table_1_oid', 'int8', 'az64'],
[21, 'num_rec', 'int4', 'az64']],
'full': {'create': [['stl_querytext',
datetime.datetime(2021, 9, 5, 21, 28, 33, 306758),
'create materialized view '
'mv_1 diststyle even '
'compound sortkey( column_1 '
') auto refresh no as '
'select row_number() over ( '
'partition by column_1 '
'order by column_1 ), '
'column_1 from table_1;'],
['stl_ddltext',
datetime.datetime(2021, 9, 5, 21, 28, 33, 323018),
'create materialized view '
'mv_1 diststyle even '
'compound sortkey( column_1 '
') auto refresh no as '
'select row_number() over ( '
'partition by column_1 '
'order by column_1 ), '
'column_1 from table_1;'],
['stl_ddltext',
datetime.datetime(2021, 9, 5, 21, 28, 33, 324324),
'create materialized view '
'mv_1 diststyle even '
'compound sortkey( column_1 '
') auto refresh no as '
'select row_number() over ( '
'partition by column_1 '
'order by column_1 ), '
'column_1 from table_1;'],
['stl_querytext',
datetime.datetime(2021, 9, 5, 21, 28, 33, 326914),
'padb_fetch_sample: select '
'count(*) from '
'mv_tbl__mv_1__0']],
'full_column_list_one_table': [[-9,
'deletexid',
'int8',
'raw'],
[-8,
'insertxid',
'int8',
'raw'],
[-7,
'tableoid',
'oid',
'raw'],
[-6,
'cmax',
'cid',
'raw'],
[-5,
'xmax',
'xid',
'raw'],
[-4,
'cmin',
'cid',
'raw'],
[-3,
'xmin',
'xid',
'raw'],
[-2,
'oid',
'oid',
'raw'],
[-1,
'ctid',
'tid',
'raw'],
[1,
'row_number',
'int8',
'az64'],
[2,
'column_1',
'int8',
'raw']],
'full_column_list_two_tables': [[-9,
'deletexid',
'int8',
'raw'],
[-8,
'insertxid',
'int8',
'raw'],
[-7,
'tableoid',
'oid',
'raw'],
[-6,
'cmax',
'cid',
'raw'],
[-5,
'xmax',
'xid',
'raw'],
[-4,
'cmin',
'cid',
'raw'],
[-3,
'xmin',
'xid',
'raw'],
[-2,
'oid',
'oid',
'raw'],
[-1,
'ctid',
'tid',
'raw'],
[1,
'row_number',
'int8',
'az64'],
[2,
'column_1',
'int8',
'raw']],
'proc_args': [[2, 'recompute', 'bool'],
[1, 'end_xid', 'int8'],
[0, 'start_xid', 'int8'],
[3,
'finished_xid_list',
'varchar']],
'proc_name': 'mv_sp__mv_1__1_0',
'proc_text': ' BEGIN IF RECOMPUTE THEN '
'CREATE TABLE '
'public.mv_tbl__mv_1__0__tmp '
'BACKUP YES DISTSTYLE EVEN '
'COMPOUND SORTKEY(2)AS (\n'
'SELECT ROW_NUMBER() OVER '
'(PARTITION BY '
'"table_1"."column_1" '
'ORDER BY '
'"table_1"."column_1" ASC '
'NULLS LAST) AS '
'"row_number", '
'"table_1"."column_1" AS '
'"column_1" FROM '
'"public"."table_1" AS '
'"table_1"\n'
');CREATE OR REPLACE VIEW '
'public.mv_1 AS SELECT * '
'FROM '
'public.mv_tbl__mv_1__0__tmp;DROP '
'TABLE '
'public.mv_tbl__mv_1__0;ALTER '
'TABLE '
'public.mv_tbl__mv_1__0__tmp '
'RENAME TO '
'mv_tbl__mv_1__0;CREATE OR '
'REPLACE VIEW public.mv_1 '
'AS SELECT * FROM '
'public.mv_tbl__mv_1__0;ELSE '
'DELETE FROM '
'public.mv_tbl__mv_1__0;INSERT '
'INTO '
'public.mv_tbl__mv_1__0(\n'
'SELECT ROW_NUMBER() OVER '
'(PARTITION BY '
'"table_1"."column_1" '
'ORDER BY '
'"table_1"."column_1" ASC '
'NULLS LAST) AS '
'"row_number", '
'"table_1"."column_1" AS '
'"column_1" FROM '
'"public"."table_1" AS '
'"table_1"\n'
');END IF; END; ',
'procs': (0, 1),
'refresh': [['stl_utilitytext',
datetime.datetime(2021, 9, 5, 21, 28, 35, 975696),
'REFRESH MATERIALIZED VIEW '
'mv_1;'],
['stl_utilitytext',
datetime.datetime(2021, 9, 5, 21, 28, 35, 977569),
'CALL '
'public.mv_sp__mv_1__1_0(7434, '
"7439, 1, '(0)');"],
['stl_querytext',
datetime.datetime(2021, 9, 5, 21, 28, 35, 989098),
'CREATE TABLE '
'public.mv_tbl__mv_1__0__tmp '
'BACKUP YES DISTSTYLE EVEN '
'COMPOUND SORTKEY(2)AS ( '
'SELECT ROW_NUMBER() OVER '
'(PARTITION BY '
'"table_1"."column_1" '
'ORDER BY '
'"table_1"."column_1" ASC '
'NULLS LAST) AS '
'"row_number", '
'"table_1"."column_1" AS '
'"column_1" FROM '
'"public"."table_1" AS '
'"table_1" )'],
['stl_utilitytext',
datetime.datetime(2021, 9, 5, 21, 28, 36, 6137),
'Analyze '
'mv_tbl__mv_1__0__tmp'],
['stl_querytext',
datetime.datetime(2021, 9, 5, 21, 28, 36, 6278),
'padb_fetch_sample: select '
'* from '
'mv_tbl__mv_1__0__tmp'],
['stl_ddltext',
datetime.datetime(2021, 9, 5, 21, 28, 36, 82477),
'CREATE OR REPLACE VIEW '
'public.mv_1 AS SELECT * '
'FROM '
'public.mv_tbl__mv_1__0__tmp'],
['stl_ddltext',
datetime.datetime(2021, 9, 5, 21, 28, 36, 84287),
'DROP TABLE '
'public.mv_tbl__mv_1__0'],
['stl_ddltext',
datetime.datetime(2021, 9, 5, 21, 28, 36, 85588),
'ALTER TABLE '
'public.mv_tbl__mv_1__0__tmp '
'RENAME TO '
'mv_tbl__mv_1__0'],
['stl_ddltext',
datetime.datetime(2021, 9, 5, 21, 28, 36, 87398),
'CREATE OR REPLACE VIEW '
'public.mv_1 AS SELECT * '
'FROM '
'public.mv_tbl__mv_1__0'],
['stl_utilitytext',
datetime.datetime(2021, 9, 5, 21, 28, 36, 90246),
'COMMIT']],
'tables': (947, 948),
'view_text': 'create materialized view '
'mv_1 diststyle even '
'compound sortkey( '
'column_1 ) auto refresh '
'no as select row_number() '
'over ( partition by '
'column_1 order by '
'column_1 ), column_1 from '
'table_1;',
'views': (547, 548)},
'incremental': {'create': [['stl_querytext',
datetime.datetime(2021, 9, 5, 21, 28, 40, 468523),
'create materialized '
'view mv_1 diststyle '
'even compound '
'sortkey( column_1 ) '
'auto refresh no as '
'select column_1 '
'from table_1;'],
['stl_ddltext',
datetime.datetime(2021, 9, 5, 21, 28, 40, 484168),
'create materialized '
'view mv_1 diststyle '
'even compound '
'sortkey( column_1 ) '
'auto refresh no as '
'select column_1 '
'from table_1;'],
['stl_ddltext',
datetime.datetime(2021, 9, 5, 21, 28, 40, 485893),
'create materialized '
'view mv_1 diststyle '
'even compound '
'sortkey( column_1 ) '
'auto refresh no as '
'select column_1 '
'from table_1;'],
['stl_querytext',
datetime.datetime(2021, 9, 5, 21, 28, 40, 491842),
'padb_fetch_sample: '
'select count(*) '
'from '
'mv_tbl__mv_1__0']],
'full_column_list_one_table': [[-9,
'deletexid',
'int8',
'raw'],
[-8,
'insertxid',
'int8',
'raw'],
[-7,
'tableoid',
'oid',
'raw'],
[-6,
'cmax',
'cid',
'raw'],
[-5,
'xmax',
'xid',
'raw'],
[-4,
'cmin',
'cid',
'raw'],
[-3,
'xmin',
'xid',
'raw'],
[-2,
'oid',
'oid',
'raw'],
[-1,
'ctid',
'tid',
'raw'],
[1,
'column_1',
'int8',
'raw'],
[2,
'table_1_oid',
'int8',
'az64'],
[3,
'num_rec',
'int4',
'az64']],
'full_column_list_two_tables': [[-9,
'deletexid',
'int8',
'raw'],
[-8,
'insertxid',
'int8',
'raw'],
[-7,
'tableoid',
'oid',
'raw'],
[-6,
'cmax',
'cid',
'raw'],
[-5,
'xmax',
'xid',
'raw'],
[-4,
'cmin',
'cid',
'raw'],
[-3,
'xmin',
'xid',
'raw'],
[-2,
'oid',
'oid',
'raw'],
[-1,
'ctid',
'tid',
'raw'],
[1,
'column_1',
'int8',
'raw'],
[2,
'table_1_oid',
'int8',
'az64'],
[3,
'table_2_oid',
'int8',
'az64'],
[4,
'num_rec',
'int4',
'az64']],
'proc_args': [[2,
'recompute_mv',
'bool'],
[1,
'end_xid',
'int8'],
[0,
'start_xid',
'int8'],
[3,
'finished_xid_list',
'varchar']],
'proc_name': 'mv_sp__mv_1__1_0',
'proc_text': ' BEGIN IF '
'Recompute_MV THEN '
'EXECUTE '
'$_7275328207056490759_$ '
'CREATE TABLE '
'"public"."mv_tbl__mv_1__0_recomputed" '
'BACKUP YES '
'DISTSTYLE EVEN '
'COMPOUND '
'SORTKEY(1) AS '
'(SELECT '
'"table_1"."column_1" '
'AS "column_1", '
'CAST("table_1"."oid" '
'AS INT8) AS '
'"table_1_oid", '
'CAST(1 AS INT4) AS '
'"num_rec" FROM '
'"public"."table_1" '
'AS "table_1" WHERE '
'(CAST("table_1"."insertxid" '
'AS INT8) <= '
'$_7275328207056490759_$ '
'|| end_xid || '
'$_7275328207056490759_$) '
'AND '
'(CAST("table_1"."deletexid" '
'AS INT8) > '
'$_7275328207056490759_$ '
'|| end_xid || '
'$_7275328207056490759_$)) '
'$_7275328207056490759_$ '
'; CREATE OR '
'REPLACE VIEW '
'"public"."mv_1" AS '
'(SELECT '
'"derived_table1"."column_1" '
'AS "column_1" FROM '
'"public"."mv_tbl__mv_1__0_recomputed" '
'AS '
'"derived_table1"); '
'DROP TABLE '
'"public"."mv_tbl__mv_1__0"; '
'ALTER TABLE '
'"public"."mv_tbl__mv_1__0_recomputed" '
'RENAME TO '
'"mv_tbl__mv_1__0"; '
'CREATE OR REPLACE '
'VIEW '
'"public"."mv_1" AS '
'(SELECT '
'"derived_table1"."column_1" '
'AS "column_1" FROM '
'"public"."mv_tbl__mv_1__0" '
'AS '
'"derived_table1"); '
'ELSE EXECUTE '
'$_7275328207056490759_$ '
'INSERT INTO '
'"public"."mv_tbl__mv_1__0" '
'(SELECT '
'"table_1"."column_1" '
'AS "column_1", '
'CAST("table_1"."oid" '
'AS INT8) AS '
'"table_1_oid", '
'CAST(1 AS INT4) AS '
'"num_rec" FROM '
'"public"."table_1" '
'AS "table_1" WHERE '
'((CAST("table_1"."insertxid" '
'AS INT8) > '
'$_7275328207056490759_$ '
'|| start_xid || '
'$_7275328207056490759_$) '
'OR '
'CAST("table_1"."insertxid" '
'AS INT8) IN '
'$_7275328207056490759_$ '
'|| '
'finished_xid_list '
'|| '
'$_7275328207056490759_$) '
'AND '
'((CAST("table_1"."insertxid" '
'AS INT8) <= '
'$_7275328207056490759_$ '
'|| end_xid || '
'$_7275328207056490759_$) '
'AND '
'(CAST("table_1"."deletexid" '
'AS INT8) > '
'$_7275328207056490759_$ '
'|| end_xid || '
'$_7275328207056490759_$))) '
'$_7275328207056490759_$; '
'EXECUTE '
'$_7275328207056490759_$ '
'DELETE FROM '
'"public"."mv_tbl__mv_1__0" '
'USING (SELECT '
'"table_1"."column_1" '
'AS "column_1", '
'CAST("table_1"."oid" '
'AS INT8) AS '
'"table_1_oid", '
'CAST(-1 AS INT4) '
'AS "num_rec" FROM '
'"public"."table_1" '
'AS "table_1" WHERE '
'(CAST("table_1"."insertxid" '
'AS INT8) <= '
'$_7275328207056490759_$ '
'|| start_xid || '
'$_7275328207056490759_$) '
'AND (NOT '
'CAST("table_1"."insertxid" '
'AS INT8) IN '
'$_7275328207056490759_$ '
'|| '
'finished_xid_list '
'|| '
'$_7275328207056490759_$ '
'AND '
'(((CAST("table_1"."deletexid" '
'AS INT8) > '
'$_7275328207056490759_$ '
'|| start_xid || '
'$_7275328207056490759_$) '
'OR '
'CAST("table_1"."deletexid" '
'AS INT8) IN '
'$_7275328207056490759_$ '
'|| '
'finished_xid_list '
'|| '
'$_7275328207056490759_$) '
'AND '
'(CAST("table_1"."deletexid" '
'AS INT8) <= '
'$_7275328207056490759_$ '
'|| end_xid || '
'$_7275328207056490759_$)))) '
'AS '
'"mv_tbl__mv_1__0__deletes" '
'WHERE '
'"mv_tbl__mv_1__0"."table_1_oid" '
'= '
'"mv_tbl__mv_1__0__deletes"."table_1_oid" '
'$_7275328207056490759_$; '
'END IF; END; ',
'procs': (0, 1),
'refresh': [['stl_utilitytext',
datetime.datetime(2021, 9, 5, 21, 28, 41, 664228),
'REFRESH '
'MATERIALIZED VIEW '
'mv_1;'],
['stl_utilitytext',
datetime.datetime(2021, 9, 5, 21, 28, 41, 665943),
'CALL '
'public.mv_sp__mv_1__1_0(7475, '
"7480, 0, '(0)');"],
['stl_querytext',
datetime.datetime(2021, 9, 5, 21, 28, 41, 678324),
' INSERT INTO '
'"public"."mv_tbl__mv_1__0" '
'(SELECT '
'"table_1"."column_1" '
'AS "column_1", '
'CAST("table_1"."oid" '
'AS INT8) AS '
'"table_1_oid", '
'CAST(1 AS INT4) AS '
'"num_rec" FROM '
'"public"."table_1" '
'AS "table_1" WHERE '
'((CAST("table_1"."insertxid" '
'AS INT8) > 7475) '
'OR '
'CAST("table_1"."insertxid" '
'AS INT8) IN (0)) '
'AND '
'((CAST("table_1"."insertxid" '
'AS INT8) <= 7480) '
'AND '
'(CAST("table_1"."deletexid" '
'AS INT8) > '
'7480)))'],
['stl_querytext',
datetime.datetime(2021, 9, 5, 21, 28, 41, 772182),
' DELETE FROM '
'"public"."mv_tbl__mv_1__0" '
'USING (SELECT '
'"table_1"."column_1" '
'AS "column_1", '
'CAST("table_1"."oid" '
'AS INT8) AS '
'"table_1_oid", '
'CAST(-1 AS INT4) '
'AS "num_rec" FROM '
'"public"."table_1" '
'AS "table_1" WHERE '
'(CAST("table_1"."insertxid" '
'AS INT8) <= 7475) '
'AND (NOT '
'CAST("table_1"."insertxid" '
'AS INT8) IN (0) '
'AND '
'(((CAST("table_1"."deletexid" '
'AS INT8) > 7475) '
'OR '
'CAST("table_1"."deletexid" '
'AS INT8) IN (0)) '
'AND '
'(CAST("table_1"."deletexid" '
'AS INT8) <= '
'7480)))) AS '
'"mv_tbl__mv_1__0__deletes" '
'WHERE '
'"mv_tbl__mv_1__0"."table_1_oid" '
'= '
'"mv_tbl__mv_1__0__deletes"."table_1_oid"'],
['stl_utilitytext',
datetime.datetime(2021, 9, 5, 21, 28, 41, 845141),
'COMMIT']],
'tables': (947, 948),
'view_text': 'create '
'materialized view '
'mv_1 diststyle '
'even compound '
'sortkey( column_1 '
') auto refresh no '
'as select column_1 '
'from table_1;',
'views': (547, 548)},
'vacuum_causes_full_refresh': ['create table '
'table_1\n'
' (\n'
' '
'column_1 bigint '
'not null encode '
'raw\n'
' )\n'
' '
'diststyle even\n'
' '
'compound '
'sortkey( '
'column_1 );',
'create '
'materialized '
'view mv_1\n'
' '
'diststyle even\n'
' '
'compound '
'sortkey( '
'column_1 )\n'
' '
'auto refresh no\n'
' as\n'
' '
'select\n'
' '
'*\n'
' '
'from\n'
' '
'table_1;',
'insert into '
'table_1( '
'column_1 ) '
"values ('1'), "
"('2'), ('3'), "
"('4'), ('5'), "
"('6');",
'refresh '
'materialized '
'view mv_1;',
'INFO: '
'Materialized '
'view mv_1 was '
'incrementally '
'updated '
'successfully.',
'insert into '
'table_1( '
'column_1 ) '
"values ('1'), "
"('2'), ('3'), "
"('4'), ('5'), "
"('6');",
'refresh '
'materialized '
'view mv_1;',
'INFO: '
'Materialized '
'view mv_1 was '
'incrementally '
'updated '
'successfully.',
'insert into '
'table_1( '
'column_1 ) '
"values ('1'), "
"('2'), ('3'), "
"('4'), ('5'), "
"('6');",
'vacuum full '
'table_1 to 100 '
'percent;',
'refresh '
'materialized '
'view mv_1;',
'INFO: '
'Materialized '
'view mv_1 was '
'recomputed '
'successfully.'],
'vacuum_underlying_table': ['create table '
'table_1\n'
' (\n'
' '
'column_1 bigint '
'not null encode '
'raw\n'
' )\n'
' '
'diststyle even\n'
' compound '
'sortkey( column_1 '
');',
'create materialized '
'view mv_1\n'
' '
'diststyle even\n'
' '
'compound sortkey( '
'column_1 )\n'
' auto '
'refresh no\n'
' as\n'
' '
'select\n'
' *\n'
' from\n'
' '
'table_1;',
'insert into '
'table_1( column_1 ) '
"values ('1'), "
"('2'), ('3'), "
"('4'), ('5'), "
"('6');",
'refresh '
'materialized view '
'mv_1;',
'INFO: Materialized '
'view mv_1 was '
'incrementally '
'updated '
'successfully.',
'insert into '
'table_1( column_1 ) '
"values ('1'), "
"('2'), ('3'), "
"('4'), ('5'), "
"('6');",
'refresh '
'materialized view '
'mv_1;',
'INFO: Materialized '
'view mv_1 was '
'incrementally '
'updated '
'successfully.',
'6 sorted rows, 6 '
'unsorted rows',
'vacuum full mv_1 to '
'100 percent;',
'6 sorted rows, 6 '
'unsorted rows',
'vacuum full '
'mv_tbl__mv_1__0 to '
'100 percent;',
'12 sorted rows, 0 '
'unsorted rows',
'insert into '
'table_1( column_1 ) '
"values ('1'), "
"('2'), ('3'), "
"('4'), ('5'), "
"('6');",
'refresh '
'materialized view '
'mv_1;',
'INFO: Materialized '
'view mv_1 was '
'incrementally '
'updated '
'successfully.']}}},
'tests': {'dc2.large': {2: {'count_and_insert': [240.61839151382446,
54.22652196884155,
1294.9436657428741,
542.8987336158752,
176.51596808433533],
'count_mv_only': [11.05626916885376,
54.30705976486206,
53.532580614089966,
55.125099420547485,
54.03463935852051]}}},
'versions': {'dc2.large': {2: 'PostgreSQL 8.0.2 on i686-pc-linux-gnu, '
'compiled by GCC gcc (GCC) 3.4.2 20041017 (Red '
'Hat 3.4.2-6.fc3), Redshift 1.0.29551'}}}
I’ve not yet produced a white paper investigating the encoding choices Redshift makes (there are three or so different times and places where Redshift makes such choices, but to provide some evidence here, the Results contain a table which has one column for every data type, and a materialized view which is based on this table, and what we see is;
Data Types | Encoding |
---|---|
bool, float4, float8 | raw |
char, varchar | lzo |
everything else | az64 |
(Note geometry
and hllsketch
are both raw
because that’s the only encoding they support.)
There’s no reason for any data type to be raw
. It’s simply a lost opportunity to reduce disk space use.
I’ve made, but not yet published, a white paper which benchmarks encoding and decoding, and the information I provide now comes from that research.
Regarding char
and varchar
, there are two general purpose encoders, lzo
and zstd
, and a general purpose encoder is indeed the correct choice for arbitrary strings.
Of these two however, lzo
is fast to encode and slow to decode, which is not what you want, where-as zstd
is slow to encode but fast to decode, which is what you want, and compresses considerably more than lzo
.
In short, lzo
should only be used when you have data which is written more often than it is read, which is almost never going to be the case with a sorted column-store database; it is the wrong choice as the default, and in fact it is basically obsolete. It has been superseded by zstd
, which is not too surprising since the basis of lzo
was developed in 1977 and zstd
was developed in 2015, and zstd
should be the encoder used - but it is not.
Finally, we come to AZ64
, about which I am really quite angry.
Each encoder implements a different compression method and each method works well with and only with data which possesses certain characteristics, and works badly and often fabulously badly with data which does not possess those characteristics.
For example, run length encoding works well on data where there are many repeating values, row after row, and works very badly when this is not the case.
In short, to pick an encoder, you must know how it works.
Amazon have never published how az64
works and as such, quite simply, you can never use it, because you have no idea if it is going to work well or not given the data you have. It is utter lunacy users are placed in this position.
The only information Amazon ever published was a blog post, written by someone who was either incompetent or deliberately misleading readers.
The post compares az64
to raw
, lzo
and zstd
and makes various claims about how az64
is much faster and compresses much better.
I quote;
- Compared to
RAW
encoding,AZ64
consumed 60–70% less storage, and was 25–30% faster.- Compared to
LZO
encoding,AZ64
consumed 35% less storage, and was 40% faster.- Compared to
ZSTD
encoding,AZ64
consumed 5–10% less storage, and was 70% faster.
These figures are true for and only for extremely carefully selected data, and so are absolutely misleading to be given as generally true; the problem is that az64
looks to me in my research to be a runlength-type encoder 2, while the other two are general purpose encoders, and as such, these encoders cannot be compared - it’s an apples and oranges situation - and to do so, without explaining that az64
is not a general purpose encoder, is either flatly incompetent or culpable.
Indeed, by making this comparison, readers are led to believe az64
is a general purpose encoder, because no one in their right mind would directly compare different types of encoders without informing the reader.
To put it explicitly, consider a runlength type encoder. It works staggeringly well when data consists of long runs of the same value, and staggeringly badly - making the data larger than the original, in fact - when the data does not consist of long runs of the same value.
Now consider a general purpose encoder; it will normally produce about 30% to 40% compression, always, no matter what the data.
You cannot make - as has been done - a blanket statement that the runlength encoder compresses better and faster than the general purpose encoders, because it is simply not true. The performance of each encoder depends profoundly on the data being compressed, and with some data one encoder is better or much better, and with other data, the other encoder is better or much better.
Coming back then to the encoding choices made by Redshift, the use of az64
as the default for all non-string columns, whether it is runlength or delta, is in my view absolutely and categorically wrong, because the data being compressed is an unknown, except that the majority of columns are not going to be sorted, and so will not possess the characteristics needed for either of these two methods to work well (and indeed, by not being sorted, will in fact possess the characteristics needed for these two methods to perform badly).
Just to make it clear how flawed the encoding selection algorithm is, note both runlength and delta encoders should never be used on random data - it’s the worst kind of data for them both - but Redshift uses az64
as described above, for all non-string columns, on interleaved sorted tables.
Interleaved sorting makes the values in columns essentially random, and az64
I see from my own testing as expected in this situation produces zero compression. This isn’t just the wrong choice, it’s dribbling madness. No one is thinking about this. No one is checking the outcome of these choices. These are not well informed, carefully selected choices made by an expert.
I may be wrong, but what I suspect has happened is that where az64
is proprietary, an in-house AWS encoder, internal politics have pushed for it to be used, and so the wrong technical choice has been taken for political reasons.
We then come to see a later blog post, where we see that;
Five months after launch, the
AZ64
encoding has become the fourth most popular encoding option in Amazon Redshift with millions of columns.
I suspect this is true, but it’s because az64
has been made the by far the most common default encoding choice made by Redshift, and because people have been misled by the blog post comparing az64
to general purpose encoders, and so all of the people using it, either by default or by their own choice, are very likely using the wrong encoder for their data, quite likely are getting zero compression, and this is contributing to making performance poor and erratic, and this is contributing to people leaving for Snowflake.
Coming back then from this journey into encoding choices, I argue then that the encoding choices made by Redshift are no good, and these choices are used for materialized views, and this alone is enough to mandate that materialized views are never used, because encoding choices are critical for performance.
Now, I have to say I have over the years of investigating Redshift come fully to the view to documentation is not reviewed by technical staff. I think what happens is someone explains to the author, who is not a software engineer, who then writes down his grasp of what has been said, and I think everyone, all along the line, is trying to obfuscate everything which is not a strength.↩︎
A Redshift dev informed me az64
is in fact delta based. I’ve not yet conducted the testing to prove this, so all I can say is from the testing I have done so far on encodings it looks runlength-like, but that testing did not exclude the possibility of being delta-like.↩︎