## 1. Why Optimize SQL
When a project first launches, business data volumes are relatively small, so the execution efficiency of individual SQL statements has no noticeable impact on how the program performs, and developers and operations staff cannot tell how much the SQL actually affects it; as a result, SQL rarely gets any dedicated optimization. As the business data accumulates over time, however, SQL execution efficiency increasingly drags on the program's performance, and SQL optimization becomes necessary.
## 2. Some Methods for SQL Optimization
- Optimize queries by avoiding full table scans whenever possible. Start by considering indexes on the columns involved in the `where` and `order by` clauses.
- Avoid null checks on fields in the `where` clause, as they can cause the engine to abandon the index and perform a full table scan. For example:

  ```sql
  select id from t where num is null
  ```

  You can instead give `num` a default value of 0, ensure the column contains no null values, and query like this:

  ```sql
  select id from t where num = 0
  ```
- Avoid using the `!=` or `<>` operators in the `where` clause, as they can cause the engine to abandon the index and perform a full table scan.
- Avoid using `or` to connect conditions in the `where` clause, as it can lead the engine to abandon the index and perform a full table scan. For example:

  ```sql
  select id from t where num = 10 or num = 20
  ```

  You can rewrite the query like this:

  ```sql
  select id from t where num = 10
  union all
  select id from t where num = 20
  ```
- Use `in` and `not in` cautiously, as they can also lead to full table scans. For example:

  ```sql
  select id from t where num in (1, 2, 3)
  ```

  For consecutive values, use `between` instead of `in`:

  ```sql
  select id from t where num between 1 and 3
  ```

- The following query, with its leading wildcard, will also lead to a full table scan:

  ```sql
  select id from t where name like '%abc%'
  ```
- Avoid performing expression operations on fields in the `where` clause, as they can cause the engine to abandon the index and perform a full table scan. For example:

  ```sql
  select id from t where num / 2 = 100
  ```

  Should be changed to:

  ```sql
  select id from t where num = 100 * 2
  ```

- Avoid applying functions to fields in the `where` clause, as this can likewise cause the engine to abandon the index and perform a full table scan. For example:

  ```sql
  select id from t where substring(name, 1, 3) = 'abc'  -- ids whose name starts with abc
  ```

  Should be changed to:

  ```sql
  select id from t where name like 'abc%'
  ```
- Do not apply functions, arithmetic, or other expression operations to the left side of the `=` in the `where` clause, as this may prevent the system from using indexes correctly.
- When filtering on indexed fields, if the index is a composite index you must include its first (leftmost) field in the condition, or the index will not be used. Additionally, try to keep the order of the condition fields consistent with the index order.
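The rule above, that a composite index is only usable when its leading field appears in the condition, can be observed directly. A minimal sketch using SQLite's `EXPLAIN QUERY PLAN` (the article's advice targets server databases, but SQLite behaves the same way here; the table `t` and index `idx_ab` are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (a integer, b integer, c integer)")
conn.execute("create index idx_ab on t (a, b)")  # composite index: a first, then b
conn.executemany("insert into t values (?, ?, ?)",
                 [(i, i % 10, i) for i in range(1000)])

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the access path.
    return " ".join(row[-1] for row in conn.execute("explain query plan " + sql))

# Condition includes the leading column: the composite index is used.
print(plan("select c from t where a = 5"))    # ... USING INDEX idx_ab ...
# Condition on the second column only: the index cannot help; full scan.
print(plan("select c from t where b = 5"))    # full table scan (SCAN)
```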
- Avoid writing meaningless queries, such as generating an empty table structure:

  ```sql
  select col1, col2 into #t from t where 1 = 0
  ```

  This kind of code returns no result set but still consumes system resources. It should be changed to:

  ```sql
  create table #t(…)
  ```
- Using `exists` instead of `in` is often a good choice:

  ```sql
  select num from a where num in (select num from b)
  ```

  Replace with:

  ```sql
  select num from a where exists (select 1 from b where num = a.num)
  ```
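A quick way to check that the two forms return the same rows is to run both against a small dataset. A sketch using SQLite, with throwaway tables named `a` and `b` as in the snippets above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table a (num integer);
    create table b (num integer);
    insert into a values (1), (2), (3), (4);
    insert into b values (2), (4), (4);
""")

in_rows = conn.execute(
    "select num from a where num in (select num from b)").fetchall()
exists_rows = conn.execute(
    "select num from a where exists (select 1 from b where b.num = a.num)").fetchall()

# Both forms return the same rows; duplicates in b do not duplicate the output.
print(sorted(in_rows), sorted(exists_rows))  # [(2,), (4,)] [(2,), (4,)]
```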
- Not every index helps a query. The optimizer plans queries based on the data in the table, so when an indexed column contains a large proportion of duplicate values, the query may not use the index at all. For example, if a table has a `sex` field that is roughly half male and half female, an index on `sex` will not improve query efficiency.
- More indexes are not always better. Indexes speed up the corresponding `select`s, but they slow down `insert`s and `update`s, since those operations may also have to update the indexes. How to create indexes therefore needs careful, case-by-case consideration. Ideally a table should have no more than 6 indexes; if it has more, consider whether indexes on infrequently used columns are really necessary.
- Use numeric fields where possible. Fields that hold only numeric information should not be designed as character types; doing so reduces query and join performance and increases storage overhead, because the engine compares strings character by character, whereas a numeric comparison is a single operation.
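The cost is not only performance: because strings compare character by character, numbers stored as text also sort lexicographically. A small SQLite sketch with hypothetical tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t_text (num text)")     # numbers stored as strings
conn.execute("create table t_int  (num integer)")  # numbers stored as numbers
for n in (9, 10, 2):
    conn.execute("insert into t_text values (?)", (str(n),))
    conn.execute("insert into t_int  values (?)", (n,))

# Character-by-character comparison puts '10' before '2' and '9':
print([r[0] for r in conn.execute("select num from t_text order by num")])
# → ['10', '2', '9']
# Numeric comparison sorts as expected:
print([r[0] for r in conn.execute("select num from t_int order by num")])
# → [2, 9, 10]
```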
- Use `varchar` instead of `char` whenever possible: variable-length fields save storage space, and within a relatively small field, searches are more efficient.
- Never write `select * from t`; replace `*` with an explicit field list, and do not return any field you do not need.
- Avoid frequently creating and dropping temporary tables, to reduce the consumption of system table resources.
- Temporary tables are not forbidden. Used appropriately they can make certain routines more efficient, particularly when a large or commonly used dataset must be referenced repeatedly. For one-off operations, however, an export table is the better choice.
- When creating a temporary table, if a large amount of data is inserted at once, use `select into` rather than `create table` followed by inserts, to avoid generating a large log and to improve speed. If the data volume is small, then to ease the consumption of system table resources, create the table first and insert afterwards.
- If temporary tables are used, be sure to explicitly delete them all at the end of the stored procedure: first `truncate table`, then `drop table`. This avoids locking system tables for long periods.
- Avoid cursors where possible, as they are inefficient; if a cursor operates on more than 10,000 rows, consider rewriting it.
- Before reaching for a cursor-based or temporary-table method, look for a set-based solution to the problem; set-based methods are usually more efficient.
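The contrast can be sketched by emulating a cursor as a client-side row-by-row loop and comparing it with a single set-based statement. SQLite and the table names here are illustrative only:

```python
import sqlite3

def make_table(conn, name):
    conn.execute(f"create table {name} (id integer primary key, amount real)")
    conn.executemany(f"insert into {name} (amount) values (?)",
                     [(float(x),) for x in range(1, 6)])

conn = sqlite3.connect(":memory:")
make_table(conn, "cursor_style")
make_table(conn, "set_based")

# Cursor-style: fetch every row, then issue one update statement per row.
for row_id, amount in conn.execute("select id, amount from cursor_style").fetchall():
    conn.execute("update cursor_style set amount = ? where id = ?",
                 (amount * 2, row_id))

# Set-based: one statement does the same work in a single pass.
conn.execute("update set_based set amount = amount * 2")

print(conn.execute("select amount from set_based order by id").fetchall())
# → [(2.0,), (4.0,), (6.0,), (8.0,), (10.0,)]
```

Both versions produce identical data, but the set-based form sends one statement instead of N+1 and lets the engine optimize the whole operation.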
- Like temporary tables, cursors are not forbidden. Using a `FAST_FORWARD` cursor on a small dataset is often better than other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that compute "totals" in the result set are usually faster than cursor-based versions. If development time allows, try both the cursor-based and the set-based approach and keep whichever performs better.
- Avoid large transaction operations, to improve system concurrency.
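One common way to keep transactions small is to commit a bulk operation in batches instead of wrapping it all in one transaction. A sketch using SQLite; the batch size of 1,000 is an arbitrary illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table log (id integer primary key, msg text)")

rows = [("event %d" % i,) for i in range(10_000)]
BATCH = 1_000  # illustrative batch size; tune for the real workload

# Commit between batches so no single transaction holds its locks (and
# accumulates log space) for the entire load.
for start in range(0, len(rows), BATCH):
    conn.executemany("insert into log (msg) values (?)", rows[start:start + BATCH])
    conn.commit()

print(conn.execute("select count(*) from log").fetchone()[0])  # 10000
```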
- Avoid returning large amounts of data to the client. If the volume is that large, reconsider whether the requirement itself is reasonable.