
Aqua Data Studio

Our speaker today is Brian Kelly, who’s a SQL Server author, columnist, and presenter, and today he will show us how we can identify and fix SQL Server performance issues. We encourage you to ask questions: enter your question in the question area in your presentation controls, and we will try to answer as many questions as possible during today’s webcast. In addition to today’s event, we invite you to check out all the free resources on MSSQLTips.com. If you’re not already a member of the community, we invite you to join.

SQL Server Statistics

Today we’re going to talk about outdated statistics. A lot of times when I’m looking at performance issues, this is the first place I’ll start, because statistics play a key role in how SQL Server builds its query plan. We’ll look at what to check to see whether we might have outdated statistics, and then we’ll move on to some other things.

Performance Issues

One of the other areas that I often see when I’m doing performance troubleshooting is that you’ll have a query that runs well, it runs great, and then all of a sudden it stops running well. A process that usually took less than a second now has users complaining because they’re seeing a 15, 20, or 25 second delay, and supposedly nothing changed with regard to that application and that process. The developers will come back and tell you, well, we didn’t modify those screens, we didn’t modify that functionality of the code. What we’ll see is that something else got modified or something else got added. It’s causing a problem because it’s stepping on the process that you are getting complaints about, and those problems usually manifest as blocking issues. We’ll look at that, and we’ll also look at just understanding what’s going on, right? We’ll also talk about how to find bad query plans.

SQL Server does a great job of trying to reuse query plans, and we can take advantage of that: if we don’t happen to catch a problem in flight, we can go look inside SQL Server, see what was run, and hopefully pull out some plans that weren’t so good, which might give us some clues as to where to go from there.

Indexes

We’ll also look at indexes, because a lot of times query plans are bad and we’ll be sitting there going, well, I should be seeing it find something very quickly, but instead I’m seeing an entire table scan. Why am I seeing that? In some cases, it’s because we simply didn’t have the right indexes in place. In other cases, maybe the query needs some modification. We’ll look at both of those cases. With regard to all of this, though, we do need to make a couple of assumptions, and the first assumption we’re making today is that it’s not a hardware issue, right?

Hardware

Normally when I go to troubleshoot, the first thing I always do is look at the hardware: what’s the CPU doing, what’s the memory doing, am I having a storage issue? We’re picking up from the point where we’ve ruled out hardware, because obviously if there are hardware issues, there’s no point in going any further; the hardware is going to continue to be an issue for you. We’re also making the assumption that the server isn’t misconfigured in any way, and I mean at the OS level. No one has done anything to cause performance issues, no one’s added an additional application that’s suddenly sucking up half of your CPU or monopolizing the memory, right? We’re also assuming that SQL Server hasn’t been misconfigured.

Database Performance

We’re also assuming we’ve ruled out the app layer and determined that the performance issue is definitely in the database layer. Now, I know there are various statistics out there that say that 50% or 60% or 70% of all performance issues are at the app layer. That may be the case, but that also means there are times when the problem is within the database itself. We’re making the assumption that, yes, indeed, we’ve concluded that it’s at the database level. Now, obviously, when you go through troubleshooting, you may have a team looking at hardware, you may have a team looking at the app layer, and then you may have DBAs or database professionals looking at the database layer as well. This is focused on the folks who are focused on the database layer, right? Those are the assumptions we’ll make today. We’ll assume that the problem is in the database, or in a query going into the database, and that’s where we need to find the issue and solve it.

Statistics in SQL Server

So let’s talk about statistics. Now, statistics are something that some people are very familiar with, other people only know that they exist in SQL Server without being very familiar with them, and some people know them very well.

SQL Server Query Plan

Basically, when SQL Server is trying to determine a query plan, it tries to determine the best plan within a reasonable amount of time. It’s not going to try every single possible permutation to figure out the best way to run the query, because technically that could be an enormous number of combinations, especially with large joins and large numbers of tables and so on. It tries to make a reasonable estimation of the best plan forward. In order to do that, SQL Server samples the data in your tables.

It basically has an idea of how the data is distributed, what percentage of uniqueness you have, that kind of thing. From that it can get some idea of whether it’s good to use an index here, or to use this index over that index, or to use this type of operator, and so on. Within SQL Server, those are called statistics. If your statistics are not up to date, then you have a problem, because SQL Server is effectively working with bad data, if you want to think of it along those lines, right? It’s trying to make an educated guess as to the best path forward, but it’s working with old data. If your statistics are out of date, that’s exactly what that means. It’s no different from trying to make a decision with old data when there’s newer data that could potentially change our decision, right?

Update Statistics in SQL Server

It’s important to make sure that statistics are up to date, because you could be looking at a database and going, I’ve checked everything out, but it just seems like it’s taking forever and I’m getting very bad performance. Maybe you’ve pulled out a query plan or two, and you just don’t understand why it’s picking that particular plan. Well, the first place to look is always the statistics. The good news is that SQL Server gives us information so we can determine how often statistics are being updated. That applies whether the statistic belongs to an index that has been created or is an auto-generated statistic, if you’ve turned that on, where SQL Server says, I don’t have an index there, but I do need to do some sampling because I see something. It auto-creates its own statistics, does its own sampling, and so on.

SQL Server lets you see when statistics were last updated for each individual item, and it also gives you an idea of how many times the data has been modified since the last update. If you see that a statistic hasn’t been updated for even two years, but the data hasn’t changed very much, if at all, that’s not a big deal. But if you see a statistic that is, say, maybe only two days old, yet the modification count is a huge number, and you have performance issues related to the object that statistic works against, then that may be a candidate for updating the statistics. Here’s an example of an output of statistics.

Filter Schema

I want to filter out any objects belonging to the sys schema, which are internal to SQL Server. The statistics on those don’t get updated, so they’re just noise. By joining to sys.schemas, I filter those out, and I’m going to order by when the statistic was last updated. Let’s run this first one and take a look. If I’m looking at when statistics were last updated, they were last updated a few years ago, right, because this is a sample database. Statistics don’t get updated very often, but notice that for all of these that I’m seeing initially, the modification counter is zero. That means from the time the statistics were created, there have been no changes, and that makes perfect sense. This is an AdventureWorks2014 database. I don’t tend to play with those tables; I tend to restore from backup after every demo, every time I do something, so okay, not a problem, right?

If I don’t see modification counters that are very high, then the statistics probably aren’t out of date and I don’t have anything to worry about. If I’m just looking at it from this perspective, though, I’m probably not getting the information I want. Since the modification counter is what’s important, what we probably want to do is sort by that. This query here is basically the same thing; notice that the ORDER BY changes to the modification counter, and of course I’m using descending order so I can see the largest numbers first. Now this too may not tell me a whole lot, because as was the case with the System Center 2012 snapshot that I took a picture of and put in the slides, if I’m not seeing a performance issue, then I don’t need to worry about the statistics; that’s not the issue, right? If it’s on an object that’s not causing a performance issue, then I don’t need to go any further if we’re trying to troubleshoot in real time and figure out what’s going on.
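A minimal sketch of the kind of query being described here, assuming SQL Server 2008 R2 SP2 or later for sys.dm_db_stats_properties; the column aliases are illustrative, and the ORDER BY is the only difference between the two variants the speaker runs:

```sql
-- Statistics age and modification counts, filtering out objects in the sys schema.
SELECT  s.name  AS schema_name,
        o.name  AS table_name,
        st.name AS statistic_name,
        sp.last_updated,
        sp.modification_counter,
        sp.rows,
        sp.rows_sampled
FROM    sys.stats AS st
JOIN    sys.objects AS o ON o.object_id = st.object_id
JOIN    sys.schemas AS s ON s.schema_id = o.schema_id
CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) AS sp
WHERE   s.name <> N'sys'
ORDER BY sp.modification_counter DESC;  -- first variant: ORDER BY sp.last_updated
```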

When we look at this, what we notice is that we do have an object, DatabaseLog, and the statistic on it belongs to an index, the primary key in this case, which we can guess from the name. Notice that the modification counter is high: 1488. So within this database, here’s an example of being able to see that we have a statistic that potentially needs updating. Again, from a troubleshooting perspective, the question is: am I really seeing an issue, or am I not?

Auto Update Statistics in SQL Server

We tend to only worry about updating statistics when we need to. Now, SQL Server allows you to turn automatic statistics updates on and off; it’s up to you, it’s set at the database level, and you can always run a statistics update manually. There are some applications where the recommendation is to turn auto-update statistics off; SharePoint is that way, because SharePoint would rather control when that happens. The reality is that if we are not seeing a performance issue, even if a statistic looks like it might be out of date or we see a high modification counter, then don’t worry about it; the problem is likely something else. However, if we’re seeing a performance issue and we see a last-updated date that’s a good way back, especially with a high modification counter, then let’s go ahead and update that particular statistic, or all the statistics for that table, and move forward from there.
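For reference, the database-level switch and a way to check it look something like this; this is a sketch using AdventureWorks2014 as the target, not a statement from the demo:

```sql
-- Turn automatic statistics updates off (or ON) for one database.
ALTER DATABASE AdventureWorks2014 SET AUTO_UPDATE_STATISTICS OFF;

-- Verify the current setting.
SELECT name, is_auto_update_stats_on
FROM   sys.databases
WHERE  name = N'AdventureWorks2014';
```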

If we wanted to do something like that, we could. Even though I know that I won’t ever use it again, if I wanted to update it, I could run something like this. Of course, it fires very fast because there are only a few rows, 200 rows or so. Now when I go back and rerun the earlier query, I notice it completely drops off the list altogether. The reason for that, of course, is that I just updated the statistics, right? Larger tables with a lot of data to sample will obviously take longer. That’s always a consideration with regard to updating statistics. Hopefully that makes sense to everyone. Statistics are key because, like I said, that’s what SQL Server uses to determine the execution plan. If you’ve ruled out statistics, or you’ve looked at the statistics and they don’t seem like a likely cause, then the next question to ask is: what’s running right now?
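The manual update runs something like the following; the table and statistic names here are stand-ins based on the DatabaseLog example above, not necessarily the exact statement from the demo:

```sql
-- Refresh every statistic on the table, sampling all rows.
UPDATE STATISTICS dbo.DatabaseLog WITH FULLSCAN;

-- Or refresh just one named statistic (statistic name assumed here).
UPDATE STATISTICS dbo.DatabaseLog PK_DatabaseLog_DatabaseLogID;
```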

Database Performance Troubleshooting

Especially if you have a performance issue going on right now as you speak, as you’re trying to troubleshoot, not something that ran long in the nightly cycle, but you’re getting calls and your manager is standing over your desk going, what’s running long? Or you’re getting the IM from your manager if you work remotely or with remote teams. What we want to do is look at what’s currently running, especially against the database in question. We may even check to see if anything has a very long estimated time to complete. For most queries, the estimated time to complete is going to be zero; usually we only see a value there for big operations, like someone running a DBCC CHECKDB or something along those lines. I include that because if I do see a number greater than zero, then I want to know why.

Because if someone is running, say, a CHECKDB operation in the middle of a time when they shouldn’t be, I want to know that. If they’re doing something else they shouldn’t be, I need to know that too, because it could be causing performance issues. We definitely want to see that, and just realize that with live queries, we can certainly capture execution plans as long as the query is currently running. If the execution plan has been generated, we should be able to find it in the plan cache.
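A sketch of the kind of “what’s running right now” query being described; the column list is one reasonable choice, not necessarily the speaker’s exact script:

```sql
-- Currently executing requests, with estimated completion time where SQL Server reports one.
SELECT  r.session_id,
        r.status,
        r.command,
        r.wait_type,
        r.blocking_session_id,
        r.percent_complete,
        r.estimated_completion_time,       -- milliseconds; nonzero mostly for big operations like DBCC CHECKDB
        DB_NAME(r.database_id) AS database_name,
        t.text                 AS sql_text
FROM    sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE   r.session_id <> @@SPID;            -- ignore this query itself
```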

Clustered vs Nonclustered Index

This is a bookmark lookup, and what that means is that in order to execute this query, SQL Server is actually hitting the data for that table twice: first to find the information in a nonclustered index and pull the pointer that tells it where the data is within the clustered index, and then a second time to retrieve everything else. There are ways to solve this, and it probably has something to do with the query itself, which we’ll come back and troubleshoot later. One little thing that stands out is that I’m doing a SELECT *, right? We’ll focus on that when we go to look at how to troubleshoot. This is one of those reasons we recommend not using SELECT * when we can help it. SELECT * says give me back all the columns, and unless all the columns are stored in the index it’s using, it’s going to go back and hit the clustered index.

Nonclustered Index

It may be using your nonclustered index because of, say, filtering in the predicate, like the WHERE clause where p.LastName equals Smith. It knows it has an index it can use to pare down how much data is retrieved based on that last name of Smith. But because there are other columns that are not contained in that index, it’s got to go back and get the rest of the data from somewhere else, so we have two read operations happening, right? This is a query we could also tune by modifying it, rewriting it to only include the columns we need. Like I said, we’ll look at an example of how to do that later on. All right, going back: what about blocking? I gave the example of a part of the application that is running just fine, and it’s always been running fine.
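Two hedged ways to remove that bookmark (key) lookup, sketched against the AdventureWorks2014 Person.Person table; the index name and the included columns are assumptions for illustration, not the demo’s exact objects:

```sql
-- Option 1: stop using SELECT * and ask only for columns the nonclustered index covers.
SELECT p.BusinessEntityID, p.LastName
FROM   Person.Person AS p
WHERE  p.LastName = N'Smith';

-- Option 2: make the index covering for the columns the query actually needs.
CREATE NONCLUSTERED INDEX IX_Person_LastName
    ON Person.Person (LastName)
    INCLUDE (FirstName, MiddleName)   -- include whatever the query selects
    WITH (DROP_EXISTING = ON);        -- assumes an index of this name already exists
```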

Query Blocking

One day you start getting a bunch of phone calls, and nothing has changed in regards to that process. One of the things to always look at, if it’s not statistics, is whether something else has come along that is stepping on everything. I have seen this a lot of times. They made a modification to one part of the app, and maybe they didn’t put it under a full load test; it was run with a limited number of users and a limited data set in nonprod, or even on a developer workstation, and they never saw the issue. It goes directly to prod, and all of a sudden everything blows up, right? Chances are it’s because under load, with more data and more users, you’re getting users stepping on each other. In SQL Server, we call that blocking, because one query will block another, right?

Database Execution Plans

This is something we especially look at when we know that the execution plan is good. For instance, you just did that quick capture, you look at the execution plan, you see a bunch of seeks, and it doesn’t look any different from the last time you looked at it when everything was running fine. You check the statistics, and they’re good. The next question to ask is: is there something blocking? Actually, because blocking happens often enough, this is sometimes the first thing people look at before anything else, because it’s very easy to spot whether processes are stepping on each other, right? Blocking is where I need a resource, let’s say a table, but something else is currently accessing that table and I can’t get to it until it’s done, right? That’s what blocking is. That’s what’s meant by stepping on each other.

How to Check Blocking in SQL Server

Let’s look at how we can see that. If we go back, we don’t need this anymore; obviously we don’t need the execution plan anymore, and we don’t need that query anymore. Let’s pull up how to check blocking. Now, with regards to blocking, the information is in sys.dm_exec_requests. There is a blocking_session_id column; if a particular session is being blocked, then sys.dm_exec_requests will tell you, hey, this is the session that’s blocking it. If I were to run this right now, we would expect not to see anything, because nothing is being blocked, right? So let’s create a situation where we do in fact see a block. Let me set this up.
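A minimal version of the blocking check described here; anything with a nonzero blocking_session_id is currently being blocked:

```sql
-- Sessions that are currently blocked, and who is blocking them.
SELECT  r.session_id,
        r.blocking_session_id,
        r.wait_type,
        r.wait_time,
        t.text AS sql_text
FROM    sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE   r.blocking_session_id <> 0;
```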

TempDB

Now I’m going to use tempdb. I’m going to create a simple table, begin a transaction, and insert into that table, and notice that I’ve commented out the commit, right?

That means this is going to hold an exclusive lock on that table until I commit the transaction or roll it back. Either a COMMIT TRANSACTION or a ROLLBACK TRANSACTION has to happen; until then, I’ve got a lock. If I try to query the table after this, I’m going to have a blocking issue. Let’s go ahead and run this query. Now anything else that touches that table will be blocked, so let’s run something that will get blocked, simply a SELECT, and notice that it’s executing the query but I’m not getting anything back, even though I only inserted one row, right? Why is this a problem? It’s because we’re getting blocking. If we go and rerun the check, we do in fact see that session 58 is being blocked by session 57, right? Now, I want more information than just this. I mean, I can get this much: 58 being blocked by 57.
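The demo setup looks roughly like this; the table name is a stand-in, and the session numbers (57 and 58) are simply whatever the two windows happen to get:

```sql
-- Session 1: take and hold an exclusive lock by leaving the transaction open.
USE tempdb;
CREATE TABLE dbo.BlockingDemo (ID int NOT NULL);

BEGIN TRANSACTION;
INSERT INTO dbo.BlockingDemo (ID) VALUES (1);
-- COMMIT TRANSACTION;   -- intentionally commented out; run it (or ROLLBACK) to release the lock

-- Session 2 (a separate window): this sits blocked until session 1 commits or rolls back.
SELECT * FROM dbo.BlockingDemo;
```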

Now, if I go back and run this big long query when nothing is being blocked, I’m going to get nothing back, obviously, just as with the shorter query, because nothing is being blocked. By checking for blocking situations, you may find where processes are stepping on each other. You may have something that is perfectly well written, and if it were running by itself with no interference it would run super fast, but there are other things going on in the database that are causing issues. By the way, this is also one of the reasons we recommend testing with production-like data and a production-like amount of it. If you’ve got 100,000 rows in a table in production, then you want to do a load test in your testing environment with 100,000 rows, not 100 rows. If you’re expecting 100 or 150 concurrent users and you see that regularly in production, then you want to run your testing not with two or three users, but with 150 users, in order to get some assessment of whether you’re going to have these issues.

It would be better to run these types of queries in test and find the problems there than to have them migrate to production, where you find out for the first time that you’re having a problem. Unfortunately, a lot of times we do see these types of things in production, and the thing to remember is that you can get this information very easily: sys.dm_exec_requests is a go-to for anything running live, and in this particular case the transaction locks view, sys.dm_tran_locks, tells us what types of locks are being taken. All right, so what happens if we can’t capture the information in real time? We’re notified about it, but the problem has passed, right? It could be that the problem was just 30 minutes ago, or maybe it ran during the nightly cycle. If that’s the case, we’ll try to use the plan cache that SQL Server keeps.
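A sketch of the “big long query” idea: join the waiting locks in sys.dm_tran_locks back to sys.dm_exec_requests to see what kind of lock each blocked session is waiting on. This is one reasonable way to write it, not necessarily the demo script:

```sql
-- Blocked requests plus the lock resource and mode they are waiting for.
SELECT  tl.request_session_id,
        r.blocking_session_id,
        tl.resource_type,
        DB_NAME(tl.resource_database_id) AS database_name,
        tl.request_mode,
        tl.request_status,
        t.text AS sql_text
FROM    sys.dm_tran_locks AS tl
JOIN    sys.dm_exec_requests AS r
        ON r.session_id = tl.request_session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE   tl.request_status = N'WAIT';   -- only lock requests that are actually waiting
```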

SQL Server Plan Cache

Now, the problem with the plan cache is that if you have a lot going on, with a lot of plans being inserted into the cache, SQL Server has processes that get rid of plans that aren’t used very often, or that are particularly old, when it has to make room for new ones. The catch here is to make sure the plan hasn’t aged out; if it has aged out, then you’re not going to be able to find it. A lot of times, especially if we know very shortly after the fact that there’s a performance issue, we can look at the cache that SQL Server keeps for its own use. The reason it keeps the cache is that if it sees a similar query, it can just pull the execution plan again and rerun it, right? It parameterizes queries and does those other things to help with reuse.

Cache Plans Query

SQL Server looks and goes, oh, they want every row from the table; okay, fine, here you go. You may not see an entry for something as simple as this, but we’ll go ahead and run it and see if we do, right? The other one is the more complex query we’ve run before; it has the join, so it should have an execution plan cached. We’ll go ahead and run it. We’ve now run both, so we have the potential for cached execution plans. Now let’s get the information out. All right, so the problem is that I can’t use sys.dm_exec_requests this time, because the query isn’t running in real time; I can’t grab it from something that’s currently running.

What do I use? Well, I can get a look at all the cached execution plans by using sys.dm_exec_cached_plans. It’s basically the same query as before, but this time, instead of going against sys.dm_exec_requests, I’m going against sys.dm_exec_cached_plans.
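A sketch of the cached-plan query; sys.dm_exec_sql_text and sys.dm_exec_query_plan both accept the plan_handle, so the text and the XML plan come back alongside each cache entry:

```sql
-- Cached plans with their SQL text and XML execution plan.
SELECT  cp.objtype,          -- 'Adhoc', 'Prepared', 'Proc', ...
        cp.usecounts,
        cp.size_in_bytes,
        st.text  AS sql_text,
        qp.query_plan        -- click the XML in SSMS to open the graphical plan
FROM    sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
ORDER BY cp.usecounts DESC;
```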

AD HOC Query

Let’s go ahead and run it and see what we get. All right, so we get several things. One is the query that I’m running right now, and notice it says it’s ad hoc. Of course, because I’m running a plain query: I’m not running a stored procedure, I’m not doing any kind of prepared statement using .NET or anything like that to tell SQL Server, hey, this is a SQL statement coming, go ahead and prepare it, precompile it, whatever. I can see enough of the text, and I can see all of it if I expand it out. This is the one I said you might not see cached because it’s fairly trivial; in this particular case the cache isn’t busy, there’s not a lot going on, so, yeah, it’s there. The other one is the more complex one. Let’s go ahead and look at what the execution plans look like.

Clustered vs Nonclustered Index

All right, so this is a really simple one where I’m telling it to return every row, and because I’m telling it to return every row from the table, it makes sense that SQL Server gives me a clustered index scan. Now, if I were only returning a single column and that column happened to be in a nonclustered index, I might see a nonclustered index scan instead of a clustered one. Oftentimes you’ll get people who don’t think through what a query is doing and say, oh, I see a scan, that’s bad. Well, ask yourself what the query is doing. In this particular case, am I returning the majority of the rows? Well, yeah, I am, actually; I’m returning all the rows. And if I’m returning all the rows, there’s no point in doing anything other than a scan. A seek implies that I’m only returning a handful of rows compared to the total size of the table.

Clustered Index

In this case, I’m returning all the rows, so I expect to see a scan. Just because you see a scan doesn’t mean something is bad; it may be exactly the intent based on the query, and that’s where we look at the text of the query and see what’s going on. Right, so we’ve seen that one. What about the more complex one? Do we still see the same thing? Yes, we do. We do see that bookmark lookup, right? We see the index seek on the nonclustered index; it finds the rows based on last name. We see IX_Person_LastName, and I don’t even need to hover over it to see what object the query is on. That index name is descriptive enough that I know exactly what it’s doing, and that’s one reason for having good index names. Notice, though, that it then goes against the primary key, because that’s the clustered index, to get the rest of the information it needs.

We know it’s doing that, of course, because we did a SELECT *; it has to get every single column, and not every column is in that index. Fairly straightforward. All right, so what if I’m looking at a plan and I don’t identify something like that bookmark lookup? Another reason I may be getting a poor-performing query, or seeing a scan when I’m expecting a seek, is that I have missing indexes. Now, obviously, if the query plans are cached, I can grab those as well, right? There are cases where I can’t get that information, because maybe the plan has aged out due to other activity going on in the database.
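If missing indexes are the suspicion, the optimizer records its suggestions in the missing-index DMVs. A sketch of how to read them; the impact-based ordering is one common heuristic, not an official formula:

```sql
-- Missing-index suggestions recorded by the optimizer since the last restart.
SELECT  mid.statement        AS table_name,
        mid.equality_columns,
        mid.inequality_columns,
        mid.included_columns,
        migs.user_seeks,
        migs.avg_user_impact -- estimated percentage improvement
FROM    sys.dm_db_missing_index_details     AS mid
JOIN    sys.dm_db_missing_index_groups      AS mig
        ON mig.index_handle = mid.index_handle
JOIN    sys.dm_db_missing_index_group_stats AS migs
        ON migs.group_handle = mig.index_group_handle
ORDER BY migs.user_seeks * migs.avg_user_impact DESC;
```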

Index Fragmentation

You get the idea. One last thing is index fragmentation. I don’t have any queries on this in the demo, but it’s just something to think about. You could have indexes, but they could be badly fragmented, and that could be causing a performance issue as well. For instance, a lot of times when you call Microsoft for something like System Center, the first thing they’re going to ask you is, are all the indexes defragmented? Fragmentation doesn’t necessarily lead to poor performance, but it can. If you can’t figure out anything else, that’s one area to look at. Now, as far as regular maintenance, Microsoft’s recommendations are right here; you can also find them in Books Online. If a particular index shows less than or equal to 5% fragmentation, don’t touch it; it’s fine, or it should be fine. Again, these are general recommendations, and your mileage may vary based on the application, the database, and the amount of data, but it’s something to take into account, right?
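A sketch of the fragmentation check the skipped demo runs, plus the maintenance commands that go with the thresholds described here and in the next section; the index name in the ALTER INDEX examples is a stand-in:

```sql
-- Fragmentation for every index in the current database (LIMITED mode is the cheapest scan).
SELECT  OBJECT_NAME(ips.object_id)     AS table_name,
        i.name                         AS index_name,
        ips.index_type_desc,
        ips.avg_fragmentation_in_percent,
        ips.page_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN    sys.indexes AS i
        ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE   ips.avg_fragmentation_in_percent > 5
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Roughly 5-30% fragmented: reorganize (always online, lighter weight).
ALTER INDEX IX_Person_LastName ON Person.Person REORGANIZE;

-- Above roughly 30%: rebuild (ONLINE = ON only on editions that support it).
ALTER INDEX IX_Person_LastName ON Person.Person REBUILD WITH (ONLINE = ON);
```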

Defrag Index

If you’re between 5 and 30%, you can basically do a defrag, which just reorganizes the index. Above 30%, you basically rebuild the entire index. There are other options you can specify, like whether to do it online or not, or what have you. Again, this is something else to look at when you’re looking at performance. Fragmentation is something that happens over time with databases. It’s not a reflection of poor skills; it’s just a reflection of the fact that as I do inserts, updates, and deletes, the data changes. Just like a file system, it will fragment over time. We’ll skip the demo simply because it’s just running a query that reports index fragmentation; if you grab the scripts, you’ll see it’s included. So here’s what we covered today: outdated statistics, why statistics are important, and why they can be a query performance problem, since SQL Server relies on them to build execution plans.

Aqua Data Studio

Great. Thanks so much. Great presentation, Brian. As a complement to SSMS, if you are looking for support for a lot of these same capabilities on additional platforms, you might consider Aqua Data Studio.

Database IDE

It is a universal database IDE that helps all kinds of IT professionals working with their data and databases, including developing and managing databases, accessing a variety of data sources, importing and exporting information, as well as visually analyzing your data.

Data Platforms

We can connect to a range of platforms: relational, NoSQL, and cloud databases. It also runs on all major operating systems: Windows, macOS, and Linux. Aqua Data Studio is used by a wide range of personas, from the more technical database developers and administrators to business and data analysts, as well as the data modeling side: data modelers, data architects, administrators, and managers. All kinds of folks can take advantage of the different features in Aqua Data Studio, which are very comprehensive.

We can register and connect to servers and databases across over 28 platforms, and if we don’t have native support for a platform, you can access it through JDBC and ODBC. We have database object navigation, a Query Analyzer as an editor, a Query Builder, table data editing, schema and data comparisons, importing and exporting of data and DDL, visual analytics, ER modeling, and database administration. We have a lot of those features specifically for SQL Server, as well as for this amazing number of other platforms: relational platforms such as SQL Server, Oracle, and DB2, cloud relational platforms like Azure SQL Database, and big data platforms like Teradata, MongoDB, Hive, and Spark. There’s a lot of support in this tool, and you have one nice user interface that works with all of these different platforms.

SQL Query Builder

Specifically, with the Query Builder you can easily construct complex queries without having to know the details of how to write the SQL statements. You just drag and drop to choose the tables and fields you’re interested in and select the relationships, so that the query is created automatically and you can run it from within this interface.

Query Analyzer

If you do need to be more specific, you can use our Query Analyzer, which has a built-in text editor as well. You can create those SELECT statements, the SQL statements such as what Brian was showing you, and you can cut and paste from SSMS into Aqua Data Studio if that’s beneficial to you.

Aqua Data Studio also helps you with autocomplete suggestions as well as formatting options. When you look at the results of the query, they’re in a nice grid layout, and then you can summarize those and analyze the data in useful charts right within the Query Analyzer tool. 

Visual Analytics

If you want to take the next step, you can incorporate those query results into our Visual Analytics tool, which can create more detailed graphs and charts for use by your data analysts. Explain plans are something Brian mentioned quite a bit, and Aqua Data Studio also creates explain plans, so you can view the diagrams and the statistics associated with a query execution plan and then use that information to tune your queries and improve your server performance. You can save these diagrams as JPG or PNG images if you want to share them with your team. And then there are the DBA tools.

Aqua Data Studio

As I mentioned earlier, we have those as well, and they include instance management, sessions, agents, storage, and security management, so that you can easily see what’s going on in your databases and know who has access to them. That will wrap up my summary of Aqua Data Studio. It has a whole lot of great capabilities that will help you improve productivity no matter what data professional job you are doing, and you can learn more about Aqua Data Studio and download a free trial.