This blog post covers some of the fundamentals of statistics in SQL Server, the cardinality estimation models used across different versions of SQL Server, how cardinality estimates are derived in different scenarios, and how those estimates vary under the different cardinality estimation models.
Data skew is the uneven distribution of data within a column: some values occur far more often than others.
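To make skew concrete, here is a minimal Python sketch. The column contents and variable names are invented for illustration and nothing here is SQL Server code. It compares the actual row count for each value with the guess you get from the average number of rows per distinct value, which is all an estimator has to go on when it knows only the row count and the number of distinct values.

```python
from collections import Counter

# Hypothetical column: one very common value and a long tail of rare ones.
status_column = (
    ["shipped"] * 9_000 + ["pending"] * 900 + ["returned"] * 90 + ["lost"] * 10
)

total_rows = len(status_column)
actual_counts = Counter(status_column)
distinct_values = len(actual_counts)

# Average rows per distinct value: the guess produced when every value
# is assumed to be equally common.
average_estimate = total_rows / distinct_values

for value, actual in actual_counts.most_common():
    # How many times too high or too low the average-based guess is.
    error_factor = max(actual, average_estimate) / min(actual, average_estimate)
    print(
        f"{value:<9} actual={actual:>5} "
        f"estimate={average_estimate:>7.1f} off by {error_factor:.1f}x"
    )
```

With evenly distributed data the average would be a reasonable guess for every value. With skewed data it is too low for the common values and far too high for the rare ones.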
The query optimiser needs to perform well across several workload types, such as OLTP, relational data warehouses (OLAP) and hybrid workloads.
To strike a balance across typical SQL Server customer usage scenarios and the vast potential for variation in data distribution, volume and query patterns, the query optimiser has to make certain assumptions, which may or may not reflect the actual state of any given database design and data distribution.
The core assumption models are:
Good cardinality estimation (row count expectations at each node of the logical tree) is vital; if these numbers are wrong, all later decisions are affected.
Cardinality estimates are a major factor in deciding which physical operator algorithms are used and the overall plan shape (join order, for example), so ultimately they determine the final query plan that executes.
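The effect of a bad estimate on operator choice can be sketched with a toy cost model in Python. The cost formulas, constants and function names below are all made up for illustration and are not SQL Server's actual costing. The only point is that the cheapest-looking operator changes as the estimated row count changes.

```python
# Toy cost model: every constant here is an assumption for illustration only.

def nested_loops_cost(outer_rows, seek_cost=3.0):
    # One index seek into the inner table per outer row.
    return outer_rows * seek_cost


def hash_join_cost(outer_rows, inner_rows, per_row_cost=0.01, startup_cost=50.0):
    # Fixed startup cost, then a single pass over both inputs.
    return startup_cost + (outer_rows + inner_rows) * per_row_cost


def choose_join(estimated_outer_rows, inner_rows):
    # Pick whichever operator looks cheapest for the ESTIMATED row count.
    costs = {
        "nested loops": nested_loops_cost(estimated_outer_rows),
        "hash join": hash_join_cost(estimated_outer_rows, inner_rows),
    }
    return min(costs, key=costs.get)


inner_rows = 1_000_000
estimated_rows = 10       # what the estimator guessed
actual_rows = 500_000     # what the query really produced

chosen = choose_join(estimated_rows, inner_rows)
ideal = choose_join(actual_rows, inner_rows)
print(f"chosen from estimate: {chosen}, best for actual rows: {ideal}")
```

In this sketch the plan is chosen from the estimate of 10 rows, so nested loops wins, but at the actual 500,000 rows the same toy model would have preferred the hash join by a wide margin.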