Queries (running) v1

Description

This page lists running queries - and only running queries.

Critically, this table has two columns which are used to identify “killer queries” - improperly written queries which are bringing a cluster to a halt; see the killer queries doc page.

The page auto-refreshes: as soon as it loads, it kicks off a request to reload the table. Only the table refreshes, rather than the page (full page reloads are very annoying). To pause/unpause refreshing, double-click on the table. You will need to do this if you want to look more closely at the rows, because each refresh, which happens every few seconds, almost always changes the width of each column, making the table problematic to read.

The table has one row per slot, ordered by queue, with empty slots having an empty row. The idea here is to allow an intuitive view of what’s going on with the queues.

Note at the time of writing (2022-12-27) there appears to be a regression in stv_recents, such that queries running on the leader node only are not listed, which means these queries are currently not shown in this page.

  1. Inconsistent or partial data
  2. Killer queries

Columns

Name                 Type
queue_id             int4
queue                varchar
csc                  varchar
start exec           timestamp
durations:queue      interval
durations:compile    interval
durations:execute    interval
owner_user_id        int4
owner                varchar
pid                  int4
xid                  int8
tid                  int4
qid                  int4
processor:time       interval
memory:avail         int8
memory:used          float8
disk:transient       float8
disk:intermediate    int8
data processed       float8
text                 varchar
related pages        varchar

Column Descriptions

queue_id

The queue ID. This column is emitted in CSV exports only.

queue

The queue name.

The default queue is played by Clint Eastwood, for it has no name. I’ve called it “default”.

csc

CSC if the query is running on a concurrency scaling cluster, otherwise NULL. All leader node queries, naturally enough, are set to NULL.

start exec

The time query execution started.

durations:queue

The time spent queuing. I take this from stv_wlm_query_state. I know the same query can be queued multiple times; I do not yet know what happens to the queue time, whether it represents the cumulative queue time, or only the time of the most recent queuing.

durations:compile

It used to be that compiling occurred in full, prior to execution. These days, compiling occurs in-line with execution, with each stream, if it needs it, performing compilation before it begins.

Additionally, it is possible for a query to be executed more than once - for example, it might enter SQA, be executed for a while, then be stopped, and be queued again, and execute again, in a normal queue.

In this situation, it could be some or even all compilation occurs during the first execution in SQA, and so occurs potentially long before the second execution of the query.

Given all this, I can’t see any reliable way now to untangle compilation time from execution time.

So this column is simply the total time of every compile that has been performed, regardless of when or how it happened.
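As a sketch of that totalling (the per-segment durations here are made up; in Redshift per-segment compile times would come from the compile system table, e.g. svl_compile):

```python
from datetime import timedelta

# Hypothetical per-segment compile durations; each entry stands for one
# compiled segment, whenever and however that compile happened.
compile_segments = [
    timedelta(milliseconds=350),
    timedelta(milliseconds=120),
    timedelta(seconds=2, milliseconds=40),
]

# durations:compile is simply the total of every compile performed,
# regardless of when or during which execution it happened.
total_compile = sum(compile_segments, timedelta())
print(total_compile)  # 0:00:02.510000
```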

durations:execute

It used to be that compiling occurred in full, prior to execution. These days, compiling occurs in-line with execution, with each stream, if it needs it, performing compilation before it begins.

What this means is that the times recorded in the system tables, which are the start of execution and the end of execution, no longer mean what they used to mean.

When compilation occurred prior to execution, the start and end execution times showed how long the query took to execute.

These days, compilation time is mixed into the execution time, so you would need to subtract the compilation time.

Unfortunately, it is also now possible for a query to be executed more than once - for example, it might enter SQA, be executed for a while, then be stopped, and be queued again, and execute again, in a normal queue.

In this situation, it could be some or even all compilation occurs during the first execution in SQA, and so occurs potentially long before the second execution of the query.

I think, although I am not yet certain, that the recorded execution start time is immediately prior to the beginning of the first compilation.

As such, in the usual case, where a query executes only the once, subtracting the compile time from the execution time should give the actual execution time.

In the cases with multiple execution, of course, this will be wrong.

Right now, I simply subtract the execution start from the execution end, and that’s the execution time.

You need to remember it includes compile time.
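The arithmetic described above can be sketched as follows (the timestamps and compile total are made up; in Redshift they would come from the system tables):

```python
from datetime import datetime, timedelta

# Hypothetical start/end execution timestamps and total compile time.
exec_start = datetime(2022, 12, 27, 10, 0, 0)
exec_end = datetime(2022, 12, 27, 10, 0, 30)
total_compile = timedelta(seconds=8)

# What this column shows: simply end minus start,
# which includes compile time.
execution = exec_end - exec_start
print(execution)  # 0:00:30

# In the usual case, where the query executed only once, subtracting the
# compile time gives the actual execution time; with multiple executions
# (e.g. SQA then a normal queue) this adjustment will be wrong.
print(execution - total_compile)  # 0:00:22
```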

owner_user_id

The owner user ID. This column is emitted in CSV exports only.

owner

The owner user name.

pid

The process ID.

When a user connects to the cluster, a process is spawned, which accepts that connection. That’s the process the ID is for. That process is also the process which issues queries, and when you need to cancel a query, you in fact cancel the process, which, by being cancelled, inherently also cancels the query it is running.

xid

The transaction ID.

tid

The task ID.

I’m not clear what tasks are really about. I think it’s to do with queries being bounced around queues by routing rules, but I’ve not looked into it. I did think it would mean that a query ID was used for one and only one query, but this is not the case; to my considerable surprise, you can find the same query ID showing up more than once in stl_query, for example.

qid

The query ID. This is really the WLM query ID; leader node queries have no query ID, at least, not that you see in the system tables. They do really have query IDs, of course, but they’re on the leader node only and they never show up in the system tables. All we see from the leader node are transaction IDs and process IDs.

processor:time

Total CPU time, over all slices.

The Redshift docs imply this is the sum of the time consumed on every processor across the cluster which has done work, but I’ve not verified this.

Note this value comes from stv_query_metrics, which normally takes some seconds to begin reporting information on queries (and for short queries never has any information).

memory:avail

Memory allocated to the slot(s) the query is using, in megabytes, across the whole cluster.

memory:used

Of the memory allocated to the slot(s) the query is using, the amount in use, in megabytes, across the whole cluster.

It’s important to realise each slice is running independently, and although I think they all follow the same pattern - consuming slot memory until none remains, then dumping out to a transient table, and repeating - because the slices are independent, at any given instant they will each be using a different amount of memory, and this column shows the total over all slices, which in effect becomes the mean value.

If we were watching just one slice, we would see it go from 0 up to the memory available (for that slice), dump to a transient table (which would increase by the amount of memory used), and then go back to 0.

As it is, what we see is the mean value, which doesn’t change much, but we do see the size of the transient tables increasing over time.
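A sketch of why the total smooths out (the per-slice figures are made up; they stand for one instant's memory-in-use samples, in megabytes, with each slice at a different point of its own saw-tooth):

```python
# Hypothetical per-slice memory in use, in megabytes, at one instant.
slice_memory_used = [0.0, 180.5, 310.0, 95.2, 250.0, 40.8, 120.0, 305.5]

# The column shows the total over all slices; relative to any one slice's
# saw-tooth, this total behaves like a fairly steady mean.
total_used = sum(slice_memory_used)
print(total_used)  # 1302.0
print(total_used / len(slice_memory_used))  # per-slice mean: 162.75
```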

disk:transient

Size of the row-store transient table(s), on the leader node, used by this query, in megabytes.

disk:intermediate

Size of intermediate results, on the worker nodes, used by this query, in megabytes. The underlying data here is in bytes, I’ve divided to get to megabytes to make the unit consistent across columns.

data processed

The total data processed, over all slices, in megabytes. The underlying data here is in bytes, I’ve divided to get to megabytes to make the unit consistent across columns.

text

The query text, truncated to 48 characters.