Category: RedShift

Redshift Blog Archives, by Peter Whyte (Database Administrator). Includes Redshift Administration blog posts and SQL Scripts.

  • Introducing DBASco: SQL Server Database Admin Blog

    Introducing DBASco: SQL Server Database Admin Blog

    Greetings Tech People!

    If you’ve been following my technical journey on peter-whyte.com, I’m excited to share a new platform that dives into the world of Database Reliability Engineering and beyond – welcome to DBASco.com!

    At DBASco.com, I share my latest technical writings, focusing on a spectrum of topics catering to IT enthusiasts, database/system administrators, and those passionate about the software behind reliability engineering. The primary categories align with the ones you’re familiar with at peter-whyte.com, ensuring a seamless transition for any dedicated readers.

    This site is now my secondary tech blog. If you’ve visited before you will have seen a lot of display ads, which are disabled for the time being. I was using Ezoic and earning around $50-60 per month, until I neglected things for too long during 2023 and earnings dropped. The site certificate also had issues during 2023 Q4, but we are back fully operational as of this week.

    What I enjoy is the technical writing, the refining, and seeing where tests lead me, exploring my passion for the things I do as a career and on the side. I hope you enjoy what I write, and that it serves a purpose if you’re visiting via online search to understand a concept or resolve an issue.


    DBASco Blog Categories

    DBASco: SQL Server: Explore Microsoft SQL Server with in-depth insights into queries, optimization techniques, and best practices. Uncover the holistic experience, including the realm of Windows Administration that intertwines seamlessly with the life of a SQL Server DBA.

    DBASco: PowerShell: Dive into the enchanting world of PowerShell scripting with hands-on guides, expert tips, and powerful tricks. Unearth the magic of automation as you streamline your workflow with carefully crafted scripts and comprehensive guides.

    DBASco: WSL and Linux: Journey into the world of Windows Subsystem for Linux (WSL) and Linux, discovering the synergy between these platforms and unraveling the potential for seamless integration. Benefit from real-world experiences and extensive testing, providing you with valuable insights into cross-OS platform development.

    DBASco: Other: From RDBMSs like MySQL, Postgres, and Redshift, to semi-related technologies like AWS, and even technical guides on WordPress and website management – expect a wealth of knowledge that keeps you at the forefront of the ever-evolving tech landscape.


    DBASco Latest Articles

    RedShift: Script to Show Running Queries
    Published on January 24, 2024
    Uncover the power of Amazon Redshift, a cloud-based data warehousing giant. Learn how to efficiently manage and analyze extensive datasets with a detailed script to reveal running queries.

    RedShift: Creating a User and Promoting to Superuser
    Published on January 24, 2024
    Unlock the secrets of Amazon RedShift as we guide you through creating a user and elevating their privileges to superuser status. Essential insights for effective cluster management.

    PowerShell Script: List Files with Sizes and Dates
    Published on January 23, 2024
    Master a practical PowerShell script that simplifies listing files in a directory. Dive into efficient file management with sizes and dates at your fingertips.

    Backing Up a Database in SQL Server Using Command
    Published on January 22, 2024
    Ensure the safety and integrity of your data with a comprehensive guide on backing up a database in SQL Server using command-line tools. A crucial aspect of database management unveiled.

    SQL Server Script: Get Last Backup Dates
    Published on January 22, 2024
    Stay informed about the last backup times, a critical element in maintaining data integrity. Enhance your SQL Server skills with a script that simplifies tracking backup dates.

    PowerShell: Counting Files, Words and Rows
    Published on January 22, 2024
    Embark on a PowerShell journey with a script that counts files, words, and rows. Discover the power of PowerShell for efficient data processing and analysis.

    Adding a Filegroup in SQL Server
    Published on January 21, 2024
    Learn the process of adding a new filegroup to a database in SQL Server. Elevate your database architecture skills with a step-by-step guide.

  • Why Use WHERE 1=2 in SQL

    Why Use WHERE 1=2 in SQL

    Some time ago I wrote a blog post on why use WHERE 1=1 in SQL. This time it’s why use WHERE 1=2, though it could just as well be WHERE 1=9 or anything that doesn’t equal 1; we just want a false condition.

    WHERE 1=1 is TRUE; it makes no difference to execution times and can be used for ease of adding/removing comments & commas on queries that you are building.

    WHERE 1=2 is FALSE; it is usually used for building the structure of a table by copying it from another, without copying any of the rows. It does not copy keys & constraints.

    The following example SQL should help explain this; the syntax used below can be run on any of the SQL systems tagged in this post (MySQL, MS SQL, Redshift, Postgres & more).

    The syntax will work on other SQL systems, however, the table names used in the SQL below will not be available on all platforms. Please amend SQL table names as necessary.

    -- Create table copy from another table
    CREATE TABLE infoschema_tables AS (SELECT * FROM information_schema.tables WHERE 1=1);
    
    -- Count rows of created table 
    -- (not required in this psql terminal example as it shows row counts)
    SELECT COUNT(*) FROM infoschema_tables;
    
    -- Drop table
    DROP TABLE infoschema_tables;
    
    -- Create table copy from another table but do not copy rows
    CREATE TABLE infoschema_tables AS (SELECT * FROM information_schema.tables WHERE 1=2);
    
    -- Count rows of created table
    SELECT COUNT(*) FROM infoschema_tables;
    
    -- Clean-up drop table
    DROP TABLE infoschema_tables;
    

  • Redshift JDBC Driver Download

    Redshift JDBC Driver Download

    A handy post to note down the JDBC client driver download. I’m frequently in need of this myself.

    You can download the JDBC driver from the AWS Docs and then grab your cluster endpoint address from the main Redshift dashboard window.

    Instead of doing the above, you can navigate to the Configure tab in the Redshift console and download the JDBC driver & grab the endpoint address below it. Click Step 2 in this area for the driver download.

    Once downloaded, you can update your query app’s JDBC driver. I’m using DBeaver and have deleted & replaced the .jar file.

    All I’ve changed is the Redshift host address, database name and login details. The test connection works!
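    For reference, the Redshift JDBC connection URL follows the pattern jdbc:redshift://<endpoint>:<port>/<database>. The endpoint below is a made-up example; swap in your own cluster’s endpoint, port and database name.

    jdbc:redshift://my-test-cluster.abc123xyz789.eu-west-1.redshift.amazonaws.com:5439/dev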

    And I’ll just create a test database…


  • List all Schemas, Tables & Columns in Redshift or Postgres

    List all Schemas, Tables & Columns in Redshift or Postgres

    This post contains a SQL script that will return all schemas, tables and columns in Redshift or Postgres.

    I’m combining the Redshift and Postgres database blog categories here. Redshift was forked from Postgres and is similar in a lot of ways.

    One of the feature differences between Redshift and Postgres is that Redshift has column-level access control. The script in this post is useful if you want to review columns that contain sensitive data (PII) before making changes to user and role permissions.

    If you are looking to apply column-level access control in Postgres, I believe the only way of achieving this would be by creating, and giving SELECT access to, views that do not contain the columns you wish to hide.
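    As a rough illustration of both approaches, here’s a minimal sketch using hypothetical object names (my_schema.customers with a sensitive email column, and a reporting_user):

    -- Redshift: column-level access control via a column list on GRANT
    GRANT SELECT (customer_id, customer_name) ON my_schema.customers TO reporting_user;

    -- Postgres workaround: expose a view without the sensitive column
    CREATE VIEW my_schema.customers_safe AS
        SELECT customer_id, customer_name
        FROM my_schema.customers;
    GRANT SELECT ON my_schema.customers_safe TO reporting_user;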

    Script to Show all Schemas, Tables & Columns

    The SQL below will return all schemas, tables, & columns within RedShift or Postgres.
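    A minimal version of such a script can be written against information_schema.columns, which exists in both engines (the commented-out line is the system-schema filter mentioned below):

    SELECT table_schema,
           table_name,
           column_name,
           data_type
    FROM information_schema.columns
    -- WHERE table_schema NOT IN ('pg_catalog', 'information_schema')  -- filter out system schemas
    ORDER BY table_schema, table_name, ordinal_position;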

    We can add a filter to the above statement (as per the comment line) to filter out system schemas. Uncomment this line if you only want to return objects that have been created by users.

  • Create a Superuser in RedShift

    Create a Superuser in RedShift

    Superusers in RedShift have permission to perform any action in the cluster, the same access as ‘master’ or whatever you named the superuser during cluster creation.

    Giving Superuser power should only really happen in local test environments, or if it’s a throw-away infrastructure as code (IaC) environment. You need to have control over the database servers you care about. If the data or infrastructure is business-critical, the preferred approach is for permissions to follow the principle of least privilege (PoLP).

    In this post I create a new user in RedShift and then promote that user to Superuser.

    # Create a User
    # Promote User to Superuser


    Create a User

    Only Superusers can execute CREATE USER in RedShift. It’s a simple command to run:

    CREATE USER sysadmin_guy PASSWORD 'wdoajdajde3123EAK';

    I have a general rule to use underscores (_) instead of hyphens (-), to avoid having to qualify the username with quotes ("), which can lead to annoyances in SQL/Python code.
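    For example (hypothetical usernames), a hyphenated name has to be quoted here and everywhere it’s referenced later, whereas the underscored version does not:

    -- Hyphenated username must be quoted in every statement that references it
    CREATE USER "sysadmin-guy" PASSWORD 'wdoajdajde3123EAK';

    -- Underscored username needs no quoting
    CREATE USER sysadmin_guy2 PASSWORD 'wdoajdajde3123EAK';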

    When the new user has been created, query the pg_user system table to show the attributes of the new user. One of them (usesuper) indicates whether the user is a Superuser or not.

    SELECT * 
    FROM PG_USER;

    The 4th column in the query results above shows a tick if the user is a superuser. So far it’s just my own user and the default rdsdb user who are superusers.


    Promote User to Superuser

    Granting a user Superuser access is done with the ALTER USER command.

    The CREATEUSER parameter is an odd name I think, with the Postgres equivalent being ‘WITH SUPERUSER’.

    ALTER USER sysadmin_guy CREATEUSER;

    Now check the pg_user system table again and we can see the ‘usesuper‘ column is true for the sysadmin_guy user we created above.
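    If you only want to check that one user rather than scanning the full result set, a quick filtered query against the same table does the job:

    SELECT usename, usesuper
    FROM pg_user
    WHERE usename = 'sysadmin_guy';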


  • Drop Users in RedShift

    Dropping users in Amazon Redshift is a simple task, but sometimes you may encounter an error message if the user you are trying to drop is an owner of an object in the database, such as a schema, table, or view.

    When this happens, you will see one of the following error messages:

    ERROR: user "user1" can't be dropped because the user owns some object [SQL State=55006]
    ERROR: user "user1" can't be dropped because the user has a privilege on some object [SQL State=55006]
    

    These error messages are mentioned in the AWS documentation for the DROP USER command. In this blog post, I will show you how to handle these errors by checking a user’s permissions, changing the owner of the relevant objects, and then dropping the user.

    Check database object owners

    First, you need to determine which objects the user owns or has privileges on. You can do this by running the following query:

    SELECT *
    FROM SVV_USER_PERMISSIONS
    WHERE grantee = 'user1';
    

    This will return a list of all the objects that the user has permissions on. You can then use this information to identify the objects that you need to change the owner for.
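    Alongside the privileges check above, object ownership can be reviewed directly in the pg_tables and pg_views catalog views; a small sketch using the same example user:

    -- Tables owned by the user
    SELECT schemaname, tablename, tableowner
    FROM pg_tables
    WHERE tableowner = 'user1';

    -- Views owned by the user
    SELECT schemaname, viewname, viewowner
    FROM pg_views
    WHERE viewowner = 'user1';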

    Change object owner

    To change the owner of an object, you can use the ALTER command. For example, to change the owner of a schema called my_schema to a user called user2, you can use the following command:

    ALTER SCHEMA my_schema OWNER TO user2;
    

    You will need to repeat this step for each object that the user owns or has privileges on.
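    The same ALTER pattern applies to other object types the user may own, for example a table (hypothetical names again):

    ALTER TABLE my_schema.my_table OWNER TO user2;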

    Drop user

    Once you have changed the owner of all the relevant objects, you can drop the user using the DROP USER command. For example:

    DROP USER user1;
    

    That’s it! By following these steps, you should be able to successfully drop a user without encountering any errors.

    As always, it’s important to be careful when modifying your database. Make sure to create backups and test your changes before applying them to your production environment. Also, have a look at the DROP USER AWS Docs.

  • Create a Test AWS RedShift Cluster

    Create a Test AWS RedShift Cluster

    Here’s a simple guide on creating a new AWS RedShift Cluster within a personal AWS test environment.

    Warning: if you’re testing this AWS service, you have to be rigorous about managing the cluster for billing reasons. I delete my cluster right after I’m done with it, as you’ll still be charged for storage if you only pause a cluster.

    # Create RedShift Cluster
    # Connect to RedShift via DBeaver


    Create RedShift Cluster

    1. Open the AWS RedShift Service.

    2. Click to create a cluster.

    3. Give the new cluster a name and we’re selecting the free trial option.

    If your ‘organization’ has never created an Amazon Redshift cluster, you’re eligible for a two-month free trial of the dc2.large node.

    This trial allows 750 hours per month for free to continuously run one dc2.large node with 160GB of compressed SSD storage. You can also build clusters with multiple nodes to test larger data sets, which will consume your free hours more quickly. Once your two-month free trial expires or your usage exceeds 750 hours per month, you’ll be charged at the standard on-demand rate.

    If you’re no longer eligible for the free trial, the cheapest production option at the time of writing is the same as above, a 1 node dc2.large cluster (below).

    Warning! As mentioned at the top of this post, leaving this minimum-spec single-node cluster online every hour of the month will cost around $230, at $0.32 per hour (roughly 730 hours × $0.32 ≈ $234). Storage and snapshots will increase costs, and pausing a cluster only stops compute costs, not storage. I recommend deleting the cluster after use – ideally, you’d do this on an employer’s AWS account though.

    More on billing! Remember to have billing alerts in place when working with cloud labs. My environment has $10/$15/$25 alerts and my expenses are around $10 per month, mostly for storage (EBS/S3). When testing RedShift my $25 alert triggered for the first time in about a year!

    4. The default database name is dev, which is suitable here. Enter a username/password, and there’s no need to deviate from the default database port here.

    5. For this simple cluster creation guide, there’s no need for an IAM role. I expect I’ll update this with a new post soon.

    6. Below, select your lab VPC and Security Group, ensuring your local machine has access over port 5439 as configured above.

    Enabling Enhanced VPC Routing won’t increase cost, but it might result in additional complexity in network configuration.

    I’m making my cluster publicly accessible as my VPC is set up for external addresses.

    7. Default parameter group and no encryption needed for now.

    8. The maintenance window is only 30 minutes; you can fit that somewhere within your week easily.

    Note: In production environments you may want to configure the Trailing maintenance track, just to take a cautious approach. AWS does roll out new changes thick and fast, but that said, I haven’t seen or heard of any issues with new features. It’s pretty solid.

    9. No need for monitoring at the moment.

    10. Select minimum snapshot retention, which is 1 day.

    Note: You can set the snapshot retention period to a maximum of 35 days although manual snapshots can have infinite retention.

    11. Click to create the new cluster. It’ll take a few minutes.

    12. Click into the cluster when it’s ready and copy the endpoint address for the next section; connecting to the cluster with a SQL client app.


    Connect to RedShift via DBeaver

    1. Open DBeaver and click to create a new database connection as shown.

    2. Search and select Redshift.

    3. Enter details as defined during the cluster create (username/password/database name & host address). The host is the cluster endpoint address, gained from step 12 above in this post.

    Nothing should need to be changed in here if it’s a relatively up-to-date version of DBeaver.

    4. Click to test connectivity before saving this as a new connection.

    5. Start querying the cluster!
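    If you want a quick sanity check that you’re connected to the right cluster and database, something like this works (standard Redshift/Postgres functions):

    SELECT current_database(), current_user, version();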

    Remember to either pause the cluster or delete it when you’re finished.


  • Why use WHERE 1=1 in SQL Queries

    Why use WHERE 1=1 in SQL Queries

    A common SQL question is, why would anyone use WHERE 1=1 in their queries? And what does it do?

    The WHERE 1=1 condition means WHERE TRUE. It returns the same query result as it would without the WHERE clause, and there is no impact on query execution time.

    This is something you can add to a SQL query for convenience when adding and removing conditions on the statement. If you have WHERE 1=1 on a SQL query you are writing, every condition thereafter starts with AND, so it’s easier to comment conditions in and out on exploratory SQL queries.

    Example: WHERE 1 Equals 1

    /* where 1=1 example */
    SELECT *
    FROM TABLE
    WHERE 1=1
    --  and column1='blah'
        and column2='more_blah'
    --  and column3 is not null

    You can see this makes commenting out WHERE conditions in the SQL above easier.

    Example (Extra): Commenting Columns

    This is similar to another SQL querying technique where you have commas before column names, rather than after the column name.

    Again, this might work out well for you when commenting out columns on a work-in-progress SQL query.

    SELECT
         Column1
    --  ,Column2
        ,Column3
        ,Column4
    --  ,Column5
    FROM TABLE

    Personally, I often make use of both of the above techniques when I write SQL queries.

    Convenience is everything when working behind a computer. We have to keep exploring new shortcut keys and general ways of working to optimise our efficiency. Some things you might not like and won’t adopt, and sometimes a GUI is better than the CLI. It’s all personal preference in the end; choose whatever you are most efficient with.

    If you wonder what would happen if we change the WHERE 1=1 to WHERE 1=2, have a look at my other blog post – Why Use WHERE 1=2 in SQL