This page provides the full text for SQL
queries.
The Redshift system tables store all query texts in system tables
which store 200 bytes of text perr row, plus it stores the first 4000
bytes in SQL_QUERY
(and I mean bytes, not characters, and
it’s with all formatting removed - it’s just one long line).
For DDL
and MDL
, and when you use
SQL_QUERY
to get the first 4k, this seems to work out okay,
when you look for a particular query - you get a fairly immediately
response.
To get the full text of SQL
queries (different table -
STL_QUERYTXT
) it is mind-bendingly slow. On a reasonably
used cluster you’re looking at something like ten minutes.
Consequently, we end up with two pages for query texts;
DDL
, MDL
, first 4k of SQL
(fast)SQL
(mind-numbingly slow)Take your pick.
Finally, note that I have seen a number of what appear to be bugs in the storage of query text in the system tables. I think the query text is still valid, but I think it is often, even normally, no longer identical to the original query text - spacing being modified.
I’ve not hunted down the full list of issues, but to give you an idea, one bug I did figure out; the system tables involved have one row for every 200 bytes of text. If the final character of one of these 200 byte texts is a space, the next row has a leading space added.
Ah, finally, note that the system tables use char
to
store text is char
, but in fact contains
UTF-8
. I’m not sure what happens if a multi-byte character
is truncated at the end of one of the 200 byte texts (I’ve seen command
line based database clients break hard on invalid
UTF-8
).
Name | Type |
---|---|
pid | int4 |
xid | int8 |
qid | int4 |
start | timestamp |
text | varchar |
The process ID.
When a user connects to the cluster, a process is spawned, which accepts that connection. That’s the process the ID is for. That process is also the process which issues queries, and when you need to cancel a query, you in fact cancel the process, which by being cancelled, inherently also cancels the query it is running.
The transaction ID.
The query ID. This is really the WLM query ID; leader node queries have no query ID, at least, not that you see in the system tables. They do really have query IDs, of course, but they’re on the leader node only and they never show up in the system tables. All we see from the leader node are transaction IDs and process IDs.
The start time.
The query text.