Security Design

  1. The AMI uses Amazon Linux 2023. I understand from a DevOps friend that the Amazon Linux distributions are among the best with regard to keeping up to date with security patches. There is a single volume, the root volume, which is encrypted with a key you provide at launch (see point 4). The AMI stores no data at all, except for that which may be generated incidentally (shell history, swap, temporary files) by the OS or by the packages (such as psycopg2) which I use.

  2. You can SSH into the instance in the normal way - when the instance is brought up, your SSH key is placed into it by AWS - and so you can connect and keep the OS up to date.

  3. All other SSH keys, and all authorized_keys, have been deleted. No one else, including me, can SSH into your instance. SSH login as root is disabled (although of course ec2-user is a sudoer).

  4. The Redshift Workbench AMI has a single disk, the root volume, which in the AMI as I release it is not encrypted. When you launch the AMI, you specify your encryption key, and the root volume of your instance is encrypted with that key.
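
    To illustrate, here is a minimal sketch of such a launch using boto3; the AMI ID, KMS key ARN, instance type and device name are all placeholders, and the device name must match the root device of the AMI.

        import boto3

        ec2 = boto3.client("ec2")

        # Launch the AMI with the root volume encrypted under your own KMS key.
        # All identifiers here are placeholders.
        ec2.run_instances(
            ImageId="ami-0123456789abcdef0",
            InstanceType="t3.micro",
            MinCount=1,
            MaxCount=1,
            KeyName="my-ssh-key",
            BlockDeviceMappings=[
                {
                    "DeviceName": "/dev/xvda",  # the root device of the AMI
                    "Ebs": {
                        "Encrypted": True,
                        "KmsKeyId": "arn:aws:kms:eu-west-1:111122223333:key/example",
                    },
                }
            ],
        )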

  5. The Redshift Workbench requires no external access. It needs to be able to access the Redshift cluster, and you need to be able to access the Redshift Workbench (HTTPS only, port 443, IPv4 and IPv6); that’s it. As such, the Redshift Workbench can and should be firewalled completely.
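
    By way of illustration, a security group along these lines could be set up with boto3 as sketched below; the group ID and the address ranges are placeholders, and you would use your own addresses.

        import boto3

        ec2 = boto3.client("ec2")

        # Allow inbound HTTPS only, over IPv4 and IPv6, from your own address
        # ranges (placeholders here); all other inbound traffic is denied by
        # default.
        ec2.authorize_security_group_ingress(
            GroupId="sg-0123456789abcdef0",
            IpPermissions=[
                {
                    "IpProtocol": "tcp",
                    "FromPort": 443,
                    "ToPort": 443,
                    "IpRanges": [{"CidrIp": "203.0.113.0/24"}],
                    "Ipv6Ranges": [{"CidrIpv6": "2001:db8::/32"}],
                }
            ],
        )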

  6. The Redshift Workbench runs Apache, and serves pages over HTTPS only. The only permitted protocol is TLSv1.2 (Apache on AL2 seems not yet to support TLSv1.3). As there is no domain name, the certificate must be, and so is, self-signed, and your browser will warn you about the certificate being self-signed when you first connect.

    The certificate is 2048-bit RSA, and is created when the AMI is built, and so is the same for all users of a given build of the AMI.
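
    A certificate of this kind can be generated at build time along the lines of the sketch below, which calls openssl from Python; the file paths and subject are illustrative, not the actual build script.

        import subprocess

        # Generate a self-signed 2048-bit RSA certificate and private key.
        # Paths and subject are illustrative only.
        subprocess.run(
            [
                "openssl", "req", "-x509",
                "-newkey", "rsa:2048",
                "-nodes",                # no passphrase on the private key
                "-days", "3650",
                "-subj", "/CN=redshift-workbench",
                "-keyout", "/etc/pki/tls/private/workbench.key",
                "-out", "/etc/pki/tls/certs/workbench.crt",
            ],
            check=True,
        )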

  7. The Redshift Workbench requires a Redshift user with syslog unrestricted, with access system table (which can only be granted via a role), and with temporary on any database being examined (temporary tables are needed to work around the problem of accessing leader-node-only functionality or data at the same time as data stored on the worker nodes, which is done by copying the leader node data into temporary tables); no other privileges are required, and superuser access is not required. As such, the Redshift Workbench cannot read your tables at all.

    There is one exception to this, which is the functionality to analyze tables.

    Analysis naturally enough involves reading the data in the table, and one of the tasks performed is validation of constraints. For primary keys and unique constraints, this requires making a copy of the columns being validated into a temporary table; for foreign keys, it requires making a copy of the foreign key column (to check that it is a primary key, and to check that all values in the table being analyzed are present in the foreign key).

    These operations require certain privileges, and so when a user first tries to use the analysis functionality on a table, it will fail, because the necessary privileges will not have been granted to the Redshift Workbench Redshift user; the failure page lists the necessary privileges (usage on the schema of the table, select on the table, and the same again for any foreign key tables). Once the privileges are granted, the page can be reloaded and the analysis will run.
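
    By way of example, granting the privileges for a hypothetical table public.orders, with a foreign key into a hypothetical table public.customers, might look like the sketch below, run as an administrator ("workbench" standing in for the Redshift Workbench Redshift user).

        import psycopg2

        # Connection details and all object names are illustrative.
        conn = psycopg2.connect(host="cluster-ip", port=5439, dbname="dev",
                                user="admin", password="...", sslmode="require")
        conn.autocommit = True
        cursor = conn.cursor()

        cursor.execute("grant usage on schema public to workbench;")
        cursor.execute("grant select on public.orders to workbench;")

        # The same again for any foreign key tables.
        cursor.execute("grant select on public.customers to workbench;")

        conn.close()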

    Once the analysis is complete, the Redshift Workbench Redshift user automatically drops the privileges it was granted. Currently (2022-01-13), there is no code to ensure that any privileges which had already been granted beforehand are retained; rather, the Redshift Workbench knows what privileges are required to perform the analysis on the table, and at the end of the analysis always drops all of them. I am working on the basis that the Redshift Workbench Redshift user is used for, and only for, the Redshift Workbench.

    If a temporary table is created, the analysis issues against it only queries which perform aggregation work - the count of the number of rows, the count of the number of distinct rows, the minimum value, the maximum value, the count of the number of values which are in a column but not in its foreign key column, and so on. Once this work is complete, the temporary table is dropped; note also that, being a temporary table, it would be dropped automatically anyway when the session disconnects (and the Redshift Workbench connects and disconnects every time it produces a page, so the session disconnects once the analysis is complete).
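
    A sketch of the kind of queries involved is given below; the table and column names are hypothetical, and these are not the actual queries issued by the Redshift Workbench.

        import psycopg2

        # Connection details and all object names are illustrative.
        conn = psycopg2.connect(host="cluster-ip", port=5439, dbname="dev",
                                user="workbench", password="...",
                                sslmode="require")
        cursor = conn.cursor()

        # Copy the column being validated into a temporary table.
        cursor.execute("create temporary table t_copy as "
                       "select customer_id from public.orders;")

        # Unique/primary key validation: the column is unique when these match.
        cursor.execute("select count(*), count(distinct customer_id) from t_copy;")
        row_count, distinct_count = cursor.fetchone()

        # Foreign key validation: values present in the table being analyzed
        # but absent from the foreign key column; zero means the constraint
        # holds.
        cursor.execute("select count(*) from t_copy where customer_id not in "
                       "(select customer_id from public.customers);")
        (missing_count,) = cursor.fetchone()

        # The temporary table is dropped explicitly, and would in any event be
        # dropped automatically when the session disconnects.
        cursor.execute("drop table t_copy;")
        conn.close()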

    Finally, note the Redshift Workbench can and does read query texts, which can contain PII. There’s no avoiding that, as part of the functionality offered is to show query texts to the user.

  8. The Redshift Workbench, being web-based, operates on the basis of URLs. There are six pieces of information in play: the Redshift Workbench page being requested, the cluster IP, the cluster port, the database to connect to, and the Redshift user name and password. Of these, the URLs contain the page being requested, the cluster IP, the cluster port and the database to connect to. The Redshift user name and password are stored in cookies, and so are local to the browser of a user who has been given the Redshift Workbench Redshift user name and password.
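
    By way of illustration only - the page name and parameter names here are hypothetical, not the actual URL format - a request URL might look like this, with the user name and password notably absent:

        https://workbench-ip/queries?cluster_ip=10.0.0.5&cluster_port=5439&database=dev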

    The cookies are configured thus (an illustrative header follows the list):

    1. With HttpOnly, which is to say they cannot be read or modified via Javascript.
    2. With Secure, so they are only ever sent over HTTPS.
    3. With SameSite, so they are only sent to the originating site.
    4. Path is set to /, and domain is not set.
    5. The cookie names are prefixed with __Host-, which causes the browser to accept the cookies when and only when the Secure, path and domain conditions above are met.
    6. The cookies expire after four weeks.
    7. Both the user name and password are 256-bit AES (CTR mode) encrypted, with a per-AMI key (as of 2023-01-22, no salt is added - this is on the to-do list).

    This last point, number seven, requires clarification.

    When the user first enters the Redshift user name and password into the connect form, they are sent en clair, but over HTTPS, to the Redshift Workbench; no cookies yet exist. The connect form is received by the Redshift Workbench, which encrypts the Redshift user name and password using AES with a key which is generated when the AMI is built (and so is unique to each build of the AMI), and it is the encrypted Redshift user name and password which are sent back to the user, to be set into cookies.

    The key is stored en clair, on disk, outside of the Apache WWW directory (and so cannot be served), and is owned by the Apache user.

    From this point on, when the Redshift Workbench user requests pages, the cookies sent by their browser contain the AES-encrypted Redshift user name and password, which the Redshift Workbench knows how to decrypt. Naturally, this also means the cookie values stored by the browser on its local disk are encrypted, as the browser knows only the encrypted user name and password.

    (I think at least some browsers encrypt all cookies when at rest on disk, but support seems to vary by browser, and was hard to find out about, so this is not a viable solution. Also, the user can normally view all cookies in the browser, so they are easy enough to abscond with, and the cookies would then be sent en clair over the network, as there is nothing at the other end which would know how to decrypt an encrypted cookie. You would be sending those cookies over HTTPS of course, but then you are trusting the HTTPS implementation, which is a complex and not quite completely reliable mechanism. All in all, it is necessary to run our own encryption.)
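
    A minimal sketch of the encrypt/decrypt round trip, using the pycryptodome library (an assumption - not necessarily what the Redshift Workbench itself uses), is given below. Note the sketch includes a per-encryption nonce, where the text above notes no salt is currently added, so it illustrates the mechanism rather than the exact implementation; the key file path is also illustrative.

        from Crypto.Cipher import AES

        # The per-AMI key: 256 bits (32 bytes), generated when the AMI is
        # built, and stored on disk outside the Apache WWW directory. The
        # path here is illustrative.
        with open("/etc/workbench/cookie.key", "rb") as key_file:
            key = key_file.read()

        def encrypt(plaintext):
            # AES-256 in CTR mode; pycryptodome chooses a random nonce, which
            # must be kept alongside the ciphertext for decryption.
            cipher = AES.new(key, AES.MODE_CTR)
            return cipher.nonce, cipher.encrypt(plaintext.encode())

        def decrypt(nonce, ciphertext):
            cipher = AES.new(key, AES.MODE_CTR, nonce=nonce)
            return cipher.decrypt(ciphertext).decode()

        nonce, ciphertext = encrypt("some_redshift_user")
        assert decrypt(nonce, ciphertext) == "some_redshift_user"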

  9. Note that a number of other cookies are set, which contain user-selectable configuration choices on a per-cluster, per-page basis. These contain the cluster IP, the name of the configuration option, and its value, such as “true” or “false”, or a time limit, like “60 minutes”. The names and values are never PII, and provide no information about user data. I do not encrypt these cookies, and they are not HttpOnly (as they are set by Javascript, when the user changes the setting of an option in a Redshift Workbench page), and they are not prefixed with __Host-. Aside from that, they are configured in the same way as the Redshift user name and password cookies.

  10. The Redshift Workbench is written in Python and uses the psycopg2 module to connect to Redshift. The connection requires the use of SSL, and will fail otherwise, with the minimum protocol version being TLSv1.2. I have tried using certificate verification as well, but the first attempt failed. It might be that I can get this working by installing certificates on the AMI, but I’ve not looked into this yet.
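
    A sketch of such a connection is given below; the connection details are placeholders, and ssl_min_protocol_version is a libpq connection parameter (available in libpq 12 and later) which psycopg2 passes through.

        import psycopg2

        # sslmode="require" causes the connection to fail if SSL is not
        # available; ssl_min_protocol_version pins the minimum TLS version.
        conn = psycopg2.connect(
            host="cluster-ip",
            port=5439,
            dbname="dev",
            user="workbench",
            password="...",
            sslmode="require",
            ssl_min_protocol_version="TLSv1.2",
        )

        # Certificate verification, when working, would use
        # sslmode="verify-full" with sslrootcert pointing at the appropriate
        # CA bundle.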