TecnoFiccion: Pros y Contras de uso de Read-Uncommitted y NoLock en SQL Server

Pros y Contras de uso de Read-Uncommitted y NoLock en SQL Server

- Este artículo fué enviado por Gerárdo Arevalo, Analista Desarrollador Bases de Datos en IQ Outsourcing (iqoutsourcing.co). A partir de la siguiente línea se copia textualmente su artículo:

Buen día,

Deseo compartir con el equipo de trabajo un interesante artículo que me encontré sobre el uso y mal uso de las banderas (flags/hints) en SQL Server, en específico sobre el uso de Read-Uncommitted y NoLock, ya que he podido identificar en estos son usados sin discriminación entre el grupo en sus desarrollos. Por favor, leanlo y saquen sus propias conclusiones.

http://sqlblog.com/blogs/tamarick_hill/archive/2013/05/06/pros-cons-of-using-read-uncommitted-and-nolock.aspx

Pros and Cons of Using Read-Uncommitted and NoLock

Definitions

Read-Uncommitted Isolation Level

The read-uncommitted isolation level is the least restrictive isolation level within SQL Server, which is also what makes it popular for developers when looking to reduce blocking. Blocking is “typically” not an issue when using this isolation level because the query will not request shared (S) locks on the tables that it is reading. In addition to this, other processes are still allowed to read and modify data within any table that your query may be accessing under this isolation level. I say this is “typically” not an issue because there are some locks that are still generated and can cause certain types of blocking. A schema-stability (Sch-S) lock will be placed on the table(s) being accessed and a shared (S) lock will be placed on the database. The (Sch-S) lock on the table will only prevent DDL actions from occurring on the accessed tables such as dropping columns within the table or dropping the actual table itself. The (S) lock on the database is put in place to prevent the database from being dropped while it is being accessed. When an isolation level is used, the isolation level is in effect for everything running under that current connection.
The syntax for using this isolation level is below:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

Nolock Table Hint

The nolock table hint behind the scenes performs the exact same action as running under the read-uncommitted isolation level. The only difference between the two is that the read-uncommitted isolation level determines the locking mechanism for the entire connection and the nolock table hint determines the locking mechanism for the table that you give the hint to. For example, if you have a stored procedure that will read data from 20 tables, the read-uncommitted isolation level would cause every table to be accessed without requesting shared (S) locks. If you only need a subset of those 20 tables to not acquire shared (S) locks, but have other tables that you need to only read committed data, then you would want to use the nolock hint instead.
The syntax for using a nolock hint is below:
SELECT * FROM DatabaseName.dbo.TableName WITH (NOLOCK)

Benefits

Reduced Blocking

As stated in the above sections, the primary benefit of using the read-uncommitted isolation level or the nolock table hint is that this reduces blocking. Users are able to use these two options and not run the risk of being blocked by readers and writers within the database. In addition to this, users do not have to worry about running the risk of blocking other readers and writers within the database tables.

Risks

Dirty Reads

There are many risks that you run when using the read-uncommitted isolation level or nolock table hints. The first risk that could occur is “Dirty Reads”. A dirty read occurs when your query reads a data page that is different in memory than the page on disk. Any data that has been changed, but not yet committed, is considered to be “dirty”. For some applications this may be ok, but for many applications, this can cause a major problem for the users that rely on the system.
A typical example that is used throughout the SQL Server community to illustrate this issue is: Suppose you need to get some repairs done on your wife’s car and you pull up to an ATM machine to withdraw the money needed. Your account currently has $300 in it. You put your debit card into the machine, enter your pin, and tell the machine you want to withdraw $200. Your wife then calls you and tells you that the car is working fine now and you don’t have to worry about getting repairs. You immediately cancel your transaction and drive off. Little did you know, the weekly balance report that you requested from your bank began processing after you entered your withdrawal amount but before you cancelled your transaction. The report is then emailed to you saying you have a balance of $100. Immediately you are now concerned and wondering what happened to your other $200. The issue is that the bank report read ‘dirty’ data that had not yet been committed and then passed along that ‘dirty’ information to you. This is a primary example of when reading ‘dirty’ data is not a good idea.

Non Repeatable Reads

Non repeatable reads becomes an issue when your application has code that executes which requires for data to be read multiple times and return the same consistent data upon each read. When using the read-uncommitted isolation level or nolock table hints shared (S) locks are not requested on the tables you are reading. As a result writers are able to access those tables and modify the data within them during the middle of your read operations. This can drastically skew your expected results.
To help illustrate this issue: Suppose I go to the bank to deposit some money into my account. I currently have a Checking account with a $100 balance and a Savings account with a $100 balance. I plan to deposit $300 into my checking account. Meanwhile, my wife is at home on her mobile application checking our account balance. The mobile application runs a stored procedure which reads the AccountBalance table one time for each account that you own, and then reads the table one final time to give you the sum of all your accounts. The application shows my wife we have $100 in the Checking account, $100 in the Savings account, and our two accounts have a total value of $500. Based on this information my wife received, she is completely confused by this inaccurate data.
What happened is the stored procedure read the table multiple times expecting the data would be the same in each read. However, an outside transaction (the deposit) was able to modify the data in between the multiple reads that the stored procedure issued. The developer of the stored procedure never anticipated this but he used a nolock hint on the table so that the application would not run into any blocking and results would return quicker. The developer did indeed prevent blocking, but at the expense of creating the risk of the application returning inaccurate data.

Same Data is Read Twice

There are rare occasions when the same data can be read twice when using the read-uncommitted isolation level or nolock hint. To illustrate this issue we have to give a little background first. Clustered Indexes are created on SQL Server tables to physically order the data within the table based on the Cluster Key. The leaf pages of the index contain the data pages which contains the actual data for the table. Data pages can hold 8K worth of data.
Scenario: You have an ETL process that will Extract all records from a table, perform some type of transformation, and then load that data into another table. There are two types of scans that occur in SQL Server to read data: allocation scans and range scans. Range scans occur when you have a specific filter (where clause) for the data you are reading, and an index can be used to help seek out those specific records. When you do not have a filter, an allocation scan is used to scan all of the data pages that have been allocated to that table. Pending you are not doing any type of sort operations, your data will read the data pages in the order as it finds them on the disk. For simplicity, let’s assume there is no fragmentation so your data pages are in order 1-10. So far your process has read pages 1-6. Remember your process is not requesting shared (S) locks so you are not blocking other users. Meanwhile, another process begins which inserts records into your table. This process attempts to insert records onto Page 3, but the page is full and the record will not fit. As a result the page has to be split and half of the records will remain on Page 3 and the other records will be moved to a new page which will be page 11. Your process has already read the data that was on Page 3, but now half of that data has been moved to page 11. As a result, as your process continues it will read Page 11 which contains data that has already been read. If there is no type of checks on the destination table, you will end up with bad duplicate data.
This is an example of how changing the locking mechanism can cause you to read duplicate data into your process and reduce the integrity of the data.

Phantom Reads

A phantom read occurs when you read data that is there one minute, and gone the next. A phantom read can become problematic when you have a process that performs some type of operation based on the data that it has read. To illustrate this issue, suppose you have stock in CompanyA and CompanyA decides to pay all of its stock holders a dividend. There is a stored procedure that first reads a table with all of the current stock holders and builds a temp table based on that list. Then the stored procedure initiates a dividend payout process for all of the accounts in the temp table. In the middle of this procedure running, a stock-holder sales all of his shares in the company. The sale of the stock did not complete until after the temp table was built, but before the dividend process completed. This will cause an issue with the dividend process because it will not be able to pay the dividend since the person no longer owns the stock.
Using the read uncommitted isolation level or no lock hint allowed the user to sale his stock at the same time the dividend process was running. If this isolation level or hint was not used, the user would have been blocked and unable to sale his stock until after the dividend payout completed.

Misconceptions

Using nolock table hints for update or delete statements

Sometimes I see developers create code which performs Update or Delete statements and use the nolock table hint. It is important to note that this hint only works for read-only operations. Anytime you are modifying data, such as in an Update or Delete statement, this hint is completely ignored by SQL Server. When performing these types of operations, locks have to be generated to ensure that transactions are atomic, consistent, isolated, and durable.

Conclusion

SQL Server is a very complex enterprise database solution with many options and flags that can be changed to alter the behavior of SQL Server. Although many of these options have a justified use, it is important to understand the risks that are associated with changing these options. The read-uncommitted isolation level and nolock table hint are no exception to this rule. Generally it is best practice to stick with the default isolation level and refrain from using table/query hints unless it is absolutely necessary and the solution has been thoroughly tested. Using read-uncommitted and nolock should be the EXCEPTION and not the RULE.

Replicado en este blog por:

Edwin Guzmán
@Tecnoficcion

TecnoFiccion