AWS Redshift Advanced
Advanced topics in AWS Redshift include table distribution styles, sort keys, constraints, workload management, and others.
Distribution styles
The table distribution style determines how a table's rows are distributed across the compute nodes and their slices. Choosing it well minimizes the impact of redistribution by locating data where it is needed before the query is executed.
Redshift supports four distribution styles: AUTO, EVEN, KEY and ALL.
KEY distribution
A single column acts as the distribution key (DISTKEY), so that rows with matching values are placed on the same slice.
As a rule of thumb, select a column that:
Is evenly distributed. Otherwise, skewed data can unbalance the volume of data stored on each compute node. Some slices then process more data than others and become bottlenecks.
Acts as a JOIN column. For fact tables that are related to dimension tables (star schema), choose as the DISTKEY the field used to JOIN with the largest dimension table. This ensures that matching values from the joined columns are physically stored together and reduces the amount of data that must be broadcast over the network.
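The effect of key choice on skew can be illustrated with a small simulation. This is only a sketch: the slice count, the key values and the use of Python's built-in `hash` on integers are assumptions made for illustration, and Redshift's internal hash function is different.

```python
from collections import Counter

NUM_SLICES = 4  # hypothetical number of slices


def slice_for(key, num_slices=NUM_SLICES):
    """Stand-in for hashing a distribution-key value to a slice."""
    return hash(key) % num_slices


def rows_per_slice(keys, num_slices=NUM_SLICES):
    """Count how many rows each slice receives for the given key values."""
    counts = Counter(slice_for(k, num_slices) for k in keys)
    return [counts.get(s, 0) for s in range(num_slices)]


# High-cardinality key: 10,000 distinct (hypothetical) customer ids.
even_counts = rows_per_slice(range(10_000))

# Skewed key: 90% of the rows share the single value 1.
skewed_counts = rows_per_slice([1] * 9_000 + [2, 3, 4, 5] * 250)

print(even_counts)    # [2500, 2500, 2500, 2500]
print(skewed_counts)  # [250, 9250, 250, 250]

# Equal key values from two different tables map to the same slice,
# which is what lets a join on the DISTKEY avoid redistribution.
same_slice = slice_for(42) == slice_for(42)
```

With the skewed key, one slice holds most of the rows and would do most of the work in a scan or join.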
EVEN distribution
Rows are distributed across the slices in round-robin fashion, regardless of the column values.
Choose EVEN distribution if the table does not participate in joins, or if there is no clear choice between KEY and ALL distribution.
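Round-robin placement can be sketched in a few lines. The row contents and slice count below are hypothetical; the point is only that the slice sizes stay balanced no matter how skewed the column values are.

```python
from itertools import cycle


def distribute_even(rows, num_slices):
    """Assign rows to slices in round-robin order, ignoring their values."""
    slices = [[] for _ in range(num_slices)]
    for row, target in zip(rows, cycle(range(num_slices))):
        slices[target].append(row)
    return slices


# Heavily skewed values: 9,000 rows for one country, 1,000 for another.
rows = [("US",)] * 9_000 + [("DE",)] * 1_000
sizes = [len(s) for s in distribute_even(rows, 4)]
print(sizes)  # [2500, 2500, 2500, 2500]
```

The trade-off is that rows with equal values are no longer guaranteed to share a slice, so joins on this table may require redistribution.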
ALL distribution
A copy of the entire table is stored on every compute node.
This ensures that every row is collocated for every join the table participates in.
Ideal for slow-moving tables, that is, tables that are not updated frequently or extensively.
Because the cost of redistribution is low, small dimension tables do not benefit significantly from ALL distribution.
AUTO distribution
Redshift determines the best distribution style based on the size of the table data. For example, it can apply ALL distribution to a small table and change it to EVEN distribution as the table grows.
If you do not specify a distribution style, Amazon Redshift applies AUTO distribution.
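The four styles map onto the `DISTSTYLE` and `DISTKEY` clauses of `CREATE TABLE`. The helper below is a hypothetical string builder, not part of any Redshift client library, and the table and column names are invented for illustration. It only shows roughly what the resulting DDL looks like.

```python
VALID_STYLES = {"AUTO", "EVEN", "KEY", "ALL"}


def create_table_ddl(table, columns, diststyle="AUTO", distkey=None):
    """Build a CREATE TABLE statement with a distribution clause.

    This sketch requires a distkey exactly when diststyle is KEY.
    """
    diststyle = diststyle.upper()
    if diststyle not in VALID_STYLES:
        raise ValueError(f"unknown distribution style: {diststyle}")
    if (diststyle == "KEY") != (distkey is not None):
        raise ValueError("distkey must be given with, and only with, KEY")
    cols = ",\n".join(f"    {name} {sqltype}" for name, sqltype in columns)
    clause = f"DISTSTYLE {diststyle}"
    if distkey is not None:
        clause += f" DISTKEY ({distkey})"
    return f"CREATE TABLE {table} (\n{cols}\n)\n{clause};"


# Hypothetical fact table distributed on its join column.
sales_ddl = create_table_ddl(
    "sales",
    [("sale_id", "BIGINT"), ("customer_id", "BIGINT"), ("amount", "DECIMAL(12,2)")],
    diststyle="KEY",
    distkey="customer_id",
)

# Hypothetical small, rarely updated dimension table copied to every node.
country_ddl = create_table_ddl(
    "country", [("country_id", "INT"), ("name", "VARCHAR(64)")], diststyle="ALL"
)

print(sales_ddl)
print(country_ddl)
```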
Sort keys
The sort key determines the order in which a table's data is stored.
Sorting allows for efficient handling of range-restricted predicates.
Only one sort key per table is allowed, but it can consist of multiple columns.
Redshift stores columnar data in 1 MB disk blocks. The minimum and maximum values for each block are stored as part of the metadata. If a query uses a range-restricted predicate, the query processor can use these values to quickly skip large numbers of blocks during table scans.
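Block skipping based on per-block minimum and maximum values can be sketched as follows. The block size of 100 values and the sample data are assumptions chosen to keep the example small; real blocks are 1 MB and hold far more values.

```python
def build_zone_map(values, block_size):
    """Split values into fixed-size blocks and record (min, max) per block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(b), max(b)) for b in blocks]


def blocks_to_scan(zone_map, low, high):
    """Return indexes of blocks whose [min, max] range overlaps [low, high]."""
    return [
        i for i, (bmin, bmax) in enumerate(zone_map)
        if bmax >= low and bmin <= high
    ]


# Sorted column: 1..1000 stored in 10 blocks of 100 values each.
sorted_values = list(range(1, 1001))
sorted_hits = blocks_to_scan(build_zone_map(sorted_values, 100), 250, 320)
print(sorted_hits)  # [2, 3]

# The same values in a scattered order: every block spans a wide range,
# so no block can be ruled out for the same predicate.
scattered_values = [(i * 37) % 1000 + 1 for i in range(1000)]
scattered_hits = blocks_to_scan(build_zone_map(scattered_values, 100), 250, 320)
print(len(scattered_hits))  # 10
```

On the sorted data the predicate `BETWEEN 250 AND 320` touches two of ten blocks. On the scattered data it touches all ten.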
Redshift has two types of sort keys: compound and interleaved.
Compound sort keys
A compound sort key is made up of all the columns listed in the sort key definition, in the order they are listed.
A compound sort key is most useful when a query's conditions, such as filters and joins, use a prefix of the sort key, that is, a leading subset of the sort key columns in order.
Compound sort keys can speed up joins, GROUP BY and ORDER BY operations, and window functions that use PARTITION BY and ORDER BY.
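Why the prefix matters can be seen in a small simulation. The two-column layout `(region, day)`, the block size and the assumption that both columns share the same block boundaries are simplifications for illustration. Redshift stores each column in its own blocks.

```python
from itertools import product

BLOCK_SIZE = 100  # rows per block in this toy model


def zone_maps(rows, block_size=BLOCK_SIZE):
    """For each block of rows, record (min, max) for every column."""
    maps = []
    for start in range(0, len(rows), block_size):
        columns = list(zip(*rows[start:start + block_size]))
        maps.append([(min(c), max(c)) for c in columns])
    return maps


def blocks_scanned(maps, column, value):
    """Count blocks that cannot be skipped for the predicate column = value."""
    return sum(1 for m in maps if m[column][0] <= value <= m[column][1])


# 10 regions x 100 days, stored in compound sort order (region, day).
rows = sorted(product(range(10), range(100)))
maps = zone_maps(rows)

by_region = blocks_scanned(maps, 0, 3)  # filter on the leading column
by_day = blocks_scanned(maps, 1, 50)    # filter on the second column only

print(by_region)  # 1
print(by_day)     # 10
```

Filtering on the leading column skips nine of ten blocks, while filtering only on the second column skips none, because every block covers the full range of days.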
Interleaved sort keys
An interleaved sort key gives equal weight to each column in the key. This means query predicates can use any subset of the sort key columns, in any order.
An interleaved sort key works better when multiple queries filter on different columns.
Do not use an interleaved sort key on columns with monotonically increasing attributes, such as identity columns, dates or timestamps.
Use cases include ad-hoc multi-dimensional analytics, which often requires pivoting, filtering and grouping data using different columns as query dimensions.
Constraints
Redshift does not support indexes.
Redshift supports PRIMARY KEY, FOREIGN KEY and UNIQUE constraints. However, they are for informational purposes only.
Redshift does not enforce these constraints; the query planner uses them as hints in order to optimize ex
