In this post, Pashupati Shrestha, Lead Software Engineer at Miles Technologies, discusses some common SQL development pitfalls to avoid and gives his 10 best tips for writing faster SQL queries.
SQL Server is one of the most widely used and cross industry standard platforms for relational database management system (RDBMS) from Microsoft. Like any other RDBMS, SQL Server has its own Query Language called T-SQL. SQL Server has to do a certain amount of work internally to return the requested data to the end user. It uses SQL Optimizer to understand the user’s needs and optimize the process of data retrieval.
However, we shouldn’t depend completely on the intelligence of the SQL optimizer. SQL developers should proactively be cautious while writing their code to help SQL optimizer so that the process of data retrieval gets faster and smarter. This can change the course of how well your application performs. Let’s take a look at some of the best practices for speeding up our SQL Queries.
1. Ensure that you return minimum amount of data
The amount of data returned form the database is very crucial. You might be processing a lot more data than you think. In most of the list, only a limited number of rows and columns are displayed. SQL developers should consider implementing paging methodologies to limit number of rows returned from SQL Server. There are many built in methods in SQL which can be used to achieve this goal such as Ranking Functions, TOP keyword and OFFSET FETCH. Also, sometimes developers make the mistake of writing their queries with SELECT * instead of listing the columns individually. The problem again is that it pulls more data than we need. We might end up selecting all columns from a table with hundreds of columns and end up using only five of them.
2. Write SARGable Queries
A query is SARGable (having correct search arguments) when SQL optimizer can optimize the execution plan that the query uses. It is very important that SQL optimizer has the most optimized execution plan for your query which takes the benefit of existing Indexes. It is very common that you might end up including a field inside a function in the WHERE clause for your convenience. As a result, your query becomes non SARGable and hence SQL optimizer can’t use an index to optimize the query even if it exists.
3. Use Inline table valued user-defined functions (TV-UDF) instead of Multi statement TV-UDFs
SQL Optimizer treats an Inline TV-UDF the same way it treats a view, meaning it performs an index seek against the clustered index of the base table. This is very important and crucial in terms of performance that the query doesn’t need to scan the whole table in order to retrieve the result set. So, use inline UDFs instead of multi statement UDFs whenever possible.
4. Correlated Sub query is not expensive anymore
A correlated sub query is one that depends on a value in the outer query. It has always been thought of as a bad practice and been avoided. There are other alternatives to achieve the same result such as using JOINs. The good news is that SQL 2012 Query Optimizer is completely in favor of Correlated Sub Queries and treats them as INNER JOIN. Yes, you can avoid having LEFT JOINS in many cases using correlated sub queries.
5. Union ALL vs. Union
Even though UNION ALL and UNION look the same, they work completely different. UNION ALL simply unions result sets whereas UNION makes sure that the resulting result is Unique (No duplicates). That means you might end up imposing a huge overhead on SQL when you use UNION even though you know the statements you are trying to union are mutually exclusive.
6. Use Query Hint WITH RECOMPILE when needed in a stored procedure
When SQL Server executes procedures, any parameter values that are used by the procedure when it compiles are included as part of generating the query plan. If these values represent the typical ones with which the procedure is subsequently called, then the procedure benefits from the query plan every time that it compiles and executes. If parameter values on the procedure are frequently atypical, forcing a recompile of the procedure and a new plan based on different parameter values can improve performance. This can be achieved by using query hint WITH RECOMPILE.
7. Pass Parameters as Null
In most of cases, a procedure which filters the dataset to return a result set takes multiple parameters. However, when an end user uses the application they pass very few parameters based on their need. So, a SQL developer should pass a null parameter rather than passing a default value so that SQL optimizer skips that clause in the where statement.
8. Consider pre-calculating some values
Queries related to reports are often expensive for obvious reasons. They include many complex calculations. Most of the times developers tend to repeat the same queries for calculating the same values multiple times within a single procedure. The query will perform better if those calculations are pre-calculated and stored in a variable. Ex: A sum of the values in an order only needs to be done when the order is made or adjusted, rather than when you are summarizing the results of 10,000,000 million orders in a report.
9. Avoid using SQL Cursors
If a SQL developer doesn’t know about SQL Cursors, good for them. The way cursors work in SQL is not optimized and often causes problems such as locking database tables. I have witnessed that developers tend to use cursors even though there are better and faster alternatives. One of the most common reasons the cursor is used is when we need to loop through a result set a row at a time and perform a certain action for each row. Cursors may be the easiest method to achieve this goal by nature, but developers should think about other alternatives. How about using a SQL WHILE loop or using a user defined function (UDF) instead?
10. Avoid using DISTINCT
Another keyword I love to know that SQL developers don’t know is DISTINCT. When we use this Keyword to get distinct result sets, SQL has to make sure that each row and column is unique. The SQL engine has to go through a one to one check. Often, developers use the DISTINCT keyword in select statement because they do not want to Group BY all selected columns. This is the reason that I love to know that developers don’t know the DISTINCT keyword. How about stepping one step above and consider writing a derived table or CTE or something to make sure the result set is unique instead of using such expensive keywords and avoid SQL engine from suffering?
Do you agree with our best practices? Let us know in the comments section below!
Want to learn more about how custom software solutions can help improve the performance for your business? Contact us today to speak to an expert.