When SQL servers run into deadlocks, it's always a pain to identify the offending code and fix it. But more immediate is the need to break the deadlock so that the database and the application get back to working condition. In potential deadlock situations, the usual go-to solution is to restart the database server. I would suggest against it because there could be thousands of queries running against the database at any given time and there is no easy way to determine the impact of an unplanned and forced restart.
Is there an alternative approach? Well, let's start from the beginning. Usually, in deadlock situations, only certain functionalities or pages of the application will be unresponsive. We could immediately check the database for long-running queries, using the following query
-- Get currently running queries
SELECT sqltext.TEXT, req.session_id, req.status, req.start_time, req.command,req.cpu_time,
req.total_elapsed_time/1000 AS elapsed_time_in_sec, req.total_elapsed_time/1000/60 AS elapsed_time_in_min
FROM sys.dm_exec_requests req
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext
ORDER BY req.total_elapsed_time DESC
If there is a deadlock, you'll see a number of queries that have been running for minutes. There are no such queries in the screenshot below, but I'm sure you'll get the idea.
Now, the next step is to identify the session that has caused the lock-up. There are 2 possibilities here. First is that a SQL admin has logged into the SQL server, started a transaction but forgot to commit/close it. The following query will identify such sessions.
-- Get more details on the waiting tasks in SQL server
SELECT wt.session_id, ot.task_state, wt.wait_type, wt.wait_duration_ms, wt.blocking_session_id, wt.resource_description, es.[host_name], es.[program_name]
FROM sys.dm_os_waiting_tasks wt
INNER JOIN sys.dm_os_tasks ot ON ot.task_address = wt.waiting_task_address
INNER JOIN sys.dm_exec_sessions es ON es.session_id = wt.session_id
WHERE es.is_user_process = 1

As highlighted in the screenshot, the session 53 is blocking the rest of the sessions. Use the following query to identify the admin who started this session and ask him/her to close the transaction.
SELECT * FROM sys.dm_exec_sessions WHERE session_id = 53
Or, the second, more likely case, is that the application code has created the deadlock. In which case, we'll have no alternative but to identify offending session and kill it. Simply remove the WHERE condition from the query that's above the screenshot and use it to identify the offending session. Then, kill the session using
KILL <session_id>
Before killing a session, ensure that you understand what you are doing. Check this blog post to get the basic idea.
Also, check the 'Worker Processes' in IIS to ensure that there are no stuck requests.
Ideally, the timeouts set at various levels (in IIS & SQL server) should help prevent requests from being stuck for too long. When the timeout is reached, the app should gracefully through an error to the user. But if the timeout configuration has been altered manually, then, these checks need to be done manually.
Once the app is back to running condition, you can start to debug the code. If you had saved the results of the queries that we ran earlier (especially the first query), it could help with the debugging process as well.
Is there an alternative approach? Well, let's start from the beginning. Usually, in deadlock situations, only certain functionalities or pages of the application will be unresponsive. We could immediately check the database for long-running queries, using the following query
-- Get currently running queries
SELECT sqltext.TEXT, req.session_id, req.status, req.start_time, req.command,req.cpu_time,
req.total_elapsed_time/1000 AS elapsed_time_in_sec, req.total_elapsed_time/1000/60 AS elapsed_time_in_min
FROM sys.dm_exec_requests req
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext
ORDER BY req.total_elapsed_time DESC
If there is a deadlock, you'll see a number of queries that have been running for minutes. There are no such queries in the screenshot below, but I'm sure you'll get the idea.
Now, the next step is to identify the session that has caused the lock-up. There are 2 possibilities here. First is that a SQL admin has logged into the SQL server, started a transaction but forgot to commit/close it. The following query will identify such sessions.
-- Get more details on the waiting tasks in SQL server
SELECT wt.session_id, ot.task_state, wt.wait_type, wt.wait_duration_ms, wt.blocking_session_id, wt.resource_description, es.[host_name], es.[program_name]
FROM sys.dm_os_waiting_tasks wt
INNER JOIN sys.dm_os_tasks ot ON ot.task_address = wt.waiting_task_address
INNER JOIN sys.dm_exec_sessions es ON es.session_id = wt.session_id
WHERE es.is_user_process = 1

As highlighted in the screenshot, the session 53 is blocking the rest of the sessions. Use the following query to identify the admin who started this session and ask him/her to close the transaction.
SELECT * FROM sys.dm_exec_sessions WHERE session_id = 53
Or, the second, more likely case, is that the application code has created the deadlock. In which case, we'll have no alternative but to identify offending session and kill it. Simply remove the WHERE condition from the query that's above the screenshot and use it to identify the offending session. Then, kill the session using
KILL <session_id>
Before killing a session, ensure that you understand what you are doing. Check this blog post to get the basic idea.
Also, check the 'Worker Processes' in IIS to ensure that there are no stuck requests.
Once the app is back to running condition, you can start to debug the code. If you had saved the results of the queries that we ran earlier (especially the first query), it could help with the debugging process as well.

Comments
Post a Comment