Query Text (slow) v1

Description

This page provides the full text for SQL queries.

The Redshift system tables store all query texts in system tables which store 200 bytes of text perr row, plus it stores the first 4000 bytes in SQL_QUERY (and I mean bytes, not characters, and it’s with all formatting removed - it’s just one long line).

For DDL and MDL, and when you use SQL_QUERY to get the first 4k, this seems to work out okay, when you look for a particular query - you get a fairly immediately response.

To get the full text of SQL queries (different table - STL_QUERYTXT) it is mind-bendingly slow. On a reasonably used cluster you’re looking at something like ten minutes.

Consequently, we end up with two pages for query texts;

  1. DDL, MDL, first 4k of SQL (fast)
  2. Full SQL (mind-numbingly slow)

Take your pick.

Query Text Modifications

Finally, note that I have seen a number of what appear to be bugs in the storage of query text in the system tables. I think the query text is still valid, but I think it is often, even normally, no longer identical to the original query text - spacing being modified.

I’ve not hunted down the full list of issues, but to give you an idea, one bug I did figure out; the system tables involved have one row for every 200 bytes of text. If the final character of one of these 200 byte texts is a space, the next row has a leading space added.

Ah, finally, note that the system tables use char to store text is char, but in fact contains UTF-8. I’m not sure what happens if a multi-byte character is truncated at the end of one of the 200 byte texts (I’ve seen command line based database clients break hard on invalid UTF-8).

Columns

Name Type
pid int4
xid int8
qid int4
start timestamp
text varchar

Column Descriptions

pid

The process ID.

When a user connects to the cluster, a process is spawned, which accepts that connection. That’s the process the ID is for. That process is also the process which issues queries, and when you need to cancel a query, you in fact cancel the process, which by being cancelled, inherently also cancels the query it is running.

xid

The transaction ID.

qid

The query ID. This is really the WLM query ID; leader node queries have no query ID, at least, not that you see in the system tables. They do really have query IDs, of course, but they’re on the leader node only and they never show up in the system tables. All we see from the leader node are transaction IDs and process IDs.

start

The start time.

text

The query text.