SQL Server Execution Plans: Statistics - Part 2
Part 2 of the series on Execution Plans dives into the data that SQL Server uses to generate its query plans: statistics.
Subscribe and turn on notifications to never miss a weekly video: https://www.youtube.com/c/DataWithBert?sub_confirmation=1
Check out the full post for example queries and more details:
https://bertwagner.com/2019/07/30/execution-plans-statistics/
Follow me on Twitter:
https://twitter.com/bertwagner
Video Summary & Chapters
No chapters for this video generated yet.
Video Transcript
What's up everyone and thanks for joining me again this week.
Today we're continuing our series on execution plans and we're going to learn about the metadata
that's inside SQL Server that the query optimizer uses to generate query plans.
If you missed last week's episode, be sure to check that out first so you have an understanding
of what an execution plan is and how you can view one because today we're diving deeper
into the details.
So as I mentioned in last week's video, SQL Server uses data about your indexes and your
columns of data in your database in order to help estimate
the different costs associated with retrieving data
in different ways.
And the primary data that SQL Server uses
to estimate these costs is what's known as.
as statistics.
So statistics are high level information
about what the data looks like in your indexes and columns.
SQL Server pre-computes these statistics
so that at runtime, instead of having to calculate
how many rows or estimated data size of rows
for a billion row table,
which could take a little bit of time,
it already knows that information up front
and it can use it in order to generate a query plan
that will return your data efficiently.
So by default, SQL Server automatically creates
and update statistics on your indexes
whenever a significant amount of data changes.
You can confirm that this is happening on your database
by going to the database properties
and looking at the auto create
and auto update statistics properties.
So if those auto statistics options are turned on,
anytime a significant amount of data
is either inserted or updated in a table,
SQL Server will automatically update the statistics on it.
And the thresholds for automatically updating
those statistics vary based on which version of SQL Server you're on.
So in order to see these statistics that I keep talking about, all we need to do is type
in dbcc show statistics and then the name of our table and the name of the index that
we want to see statistics for.
If we go ahead and run that, we'll get the following output.
And I don't want to turn this video into a deep dive on SQL Server statistics, but the
things to be aware of are things like rows, right, which indicate the number of rows in
a table, the numbers of rows sampled.
So how many rows were sampled to calculate these statistics or maybe it was the whole index of rows that was
used. Things like all density which measure how many distinct values are in your data and of course the histogram which shows the
frequencies for up to 200
ranges of values in your index. And while I'll cover statistics in more detail in another video,
I want to at least make you aware that this is the information that SQL
server uses to generate your query plans.
And so if it's inaccurate or out of date, right,