Killer Queries

Two critically important columns are disk:transient and disk:intermediate.

Redshift, like most databases, has a problem with “killer queries”: queries which are improperly written and generate vast numbers of rows. Killer queries bring the cluster to a halt, and they do this by hammering shared resources, disk in particular, as disk is always the bottleneck in a database system.

From observing queries as they run, what seems to happen is that a query consumes the memory available in its slot until none is left, and then dumps all of this data from memory to disk, freeing up all memory - and this process repeats until the query is complete. This data is known as transient data, and the disk:transient column indicates the size, in bytes, of the transient data in use by the query. I think that when the transient data is smaller than the memory available in the slot, none of it goes to disk; but as a query comes to generate more and more output, and exceeds the memory available in the slot, more and more is written to disk. In any event, the more rows the query generates, the more transient data. Killer queries run for a long time and generate a lot of rows, so the value for transient data becomes very large.

Next, there is the not-quite-defined concept of “spill”. In general, this means that the work being performed by a query needs more memory than is available, and so the query necessarily begins to use disk. Exactly what this means in practice is not clear. I think this is not the amount of transient data which is on disk, but I am not certain of that. I see there are some steps (hash and unique, for example) which in their system tables indicate whether they ran in memory or on disk; it might be that this is, either fully or in part, a measure of that disk use. Alternatively, or additionally, it may be that for any step which cannot allocate sufficient memory, disk ends up being used.
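For completed queries, this in-memory/on-disk indication can be inspected directly in the step-level system tables. A minimal sketch, assuming (as I read the documentation) the is_diskbased and workmem columns in stl_hash, and using a purely hypothetical query ID:

    -- per-slice hash steps for one query: did each run in memory or on disk?
    select query, segment, step, slice, rows, bytes, is_diskbased, workmem
    from   stl_hash
    where  query = 123456  -- hypothetical query ID
    order by segment, step, slice;

The same pair of columns appears in a number of the other step tables (stl_unique, stl_sort, and so on), so the same check can be made for those step types.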

In any event, empirically, I see that killer queries can end up with large values for spill. The amount of spill is recorded in the column disk:intermediate, and it is measured in blocks (as opposed to transient, which is in bytes); a block in Redshift is 1 MB.

Note that the data for spill comes from stv_query_metrics. This is a system table I normally avoid like the plague, but it is the only source of this information, and for the purpose of detecting killer queries it is, I think, adequate; as long as you’re aware of the issues with this system table, which I will now describe, and so can interpret what you see, it’s fine.

The table is actually pretty overloaded. It contains rows about queries, rows about segments, and rows about steps.

However, when it comes to killer queries, which are long-running, I believe we will be able to obtain meaningful and useful information from this table.
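As a sketch of what that might look like while a killer query is running, and assuming (as I read the documentation) that the query-level rows in stv_query_metrics are those with segment = -1 and step_type = -1, and that blocks_to_disk is the column behind disk:intermediate:

    -- query-level metrics for currently running queries, worst spillers first
    select query, service_class, rows, blocks_to_disk, run_time
    from   stv_query_metrics
    where  segment = -1
    and    step_type = -1
    order by blocks_to_disk desc, rows desc;

A long-running query sitting at the top of this list, with rows and blocks_to_disk both climbing each time you look, is a likely killer query.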

In the case of a killer query, there is a column for the pid, so log in as the admin user and cancel the process which owns the query, as this will abort the query.
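A minimal sketch of doing so, assuming stv_inflight to find the pid which owns the query, and pg_terminate_backend() to kill that process (the query ID and pid here are hypothetical):

    -- find the pid which owns the offending query
    select query, pid, starttime
    from   stv_inflight
    where  query = 123456;  -- hypothetical query ID

    -- terminate the owning process, which aborts the query
    select pg_terminate_backend(54321);  -- hypothetical pid

The CANCEL command, given the pid, can also be used to cancel the running query without terminating the session.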