Job Activity Monitor showing that a job/step is still running even though the job has completed (without running all steps)

This is one of those freaky scenarios where you just pull your hair out…this morning it happened again (second time).
My ETL did ran, but the Job Activity Monitor is showing that it’s still running somehow:

sql-agent-monitor-is-showing-running-while-job-has-finished-stopped-first-step


Turns out that it just completed Step 1 and after that stopped…
The thing is that when you dive deep into the logging, you notice that ‘Step 0 (Job Outcome)‘ is missing and for some reason SQL Agent decided it could stop after Step 1….WTF
(and yes, it suppossed to go to the next step)

sql-agent-monitor-is-showing-running-while-job-has-finished-stopped-first-step-job-history-query-step-0


The Query:


SELECT
j.name AS JobName
,j.job_id AS JobId
,CAST(DATEADD(S,(run_time/10000)*60*60 /* hours */
+((run_time - (run_time/10000) * 10000)/100) * 60 /* mins */
+ (run_time - (run_time/100) * 100) /* secs */
,CONVERT(DATETIME,RTRIM(run_date),113))
AS DATETIME) AS ExecutionTime
,h.step_id AS StepId
,h.step_name AS StepName
,h.run_status AS StatusId
,StatusDesc = CASE
WHEN h.run_status=0 THEN 'Failed'
WHEN h.run_status=1 THEN 'Succeeded'
WHEN h.run_status=2 THEN 'Retry'
WHEN h.run_status=3 THEN 'Cancelled'
ELSE 'Unknown'
END
,s.on_success_action
,ActionDesc = CASE
WHEN s.on_success_action = 1 THEN 'QuitWithSuccess'
WHEN s.on_success_action = 3 THEN 'GotoNextStep'
ELSE 'http://technet.microsoft.com/en-us/library/ms142620(v=sql.105).aspx'
END
,h.*
FROM dbo.sysjobs j
INNER JOIN dbo.sysjobhistory h ON j.job_id = h.job_id
LEFT JOIN dbo.sysjobsteps s ON j.job_id = s.job_id AND h.step_id = s.step_id
WHERE DATEADD(S,
(run_time/10000)*60*60 /* hours */
+((run_time - (run_time/10000) * 10000)/100) * 60 /* mins */
+ (run_time - (run_time/100) * 100) /* secs */,
CONVERT(DATETIME,RTRIM(run_date),113)) >= DATEADD(d,-2,GetDate())
AND j.name IN ('DWH_TEST (daily): *Step 2')
ORDER BY 1,ExecutionTime DESC



And this time I’ve decided to at least build some kind of mechanism to detect of prevent it.
After a couple of hours of brainstorming and testing things, I found that the shitty thing is is that there’s no ‘BatchId’ of some sort that combines the job_id with step_id and run_date/run_time in dbo.sysjobhistory.

So I figured that during the previous maintenance/ETL window, a SQL Agent Job should have a Step 0 when it has run (skipping Step 0 would indicate that freaky scenario may have happened).
Here’s what I’ve came up with:

sql-agent-monitor-is-showing-running-while-job-has-finished-stopped-monitoring-query-results

LastStepExecutionTime should have taken place in the same (last) maintenance/ETL window as Step0LastExecutionTime.


The Query:

WITH SourceData
AS
(
SELECT
j.name
,h0.Step0LastExecutionTime
,h.LastStepExecutionTime
FROM dbo.sysjobs j
INNER JOIN
(
SELECT
h.job_id
,MAX(
CAST(DATEADD(S,(run_time/10000)*60*60 /* hours */
+((run_time - (run_time/10000) * 10000)/100) * 60 /* mins */
+ (run_time - (run_time/100) * 100) /* secs */
,CONVERT(DATETIME,RTRIM(run_date),113))
AS DATETIME)
) AS LastStepExecutionTime
FROM dbo.sysjobhistory h
WHERE run_status = 1 --Succeeded
GROUP BY
h.job_id
) AS h ON j.job_id = h.job_id
LEFT JOIN
(
SELECT
h.job_id
,MAX(
CAST(DATEADD(S,(run_time/10000)*60*60 /* hours */
+((run_time - (run_time/10000) * 10000)/100) * 60 /* mins */
+ (run_time - (run_time/100) * 100) /* secs */
,CONVERT(DATETIME,RTRIM(run_date),113))
AS DATETIME)
) AS Step0LastExecutionTime
FROM dbo.sysjobhistory h
WHERE h.step_id = 0 --0 = (Job outcome)
GROUP BY
h.job_id
) AS h0 ON j.job_id = h0.job_id
--Last/final step should have been executed during the last maintenance window.
WHERE h.LastStepExecutionTime BETWEEN DATEADD(HOUR,1,CAST(CAST(GETDATE() AS DATE) AS DATETIME)) AND DATEADD(HOUR,6,CAST(CAST(GETDATE() AS DATE) AS DATETIME))
--While step 0 (Job outcome) should also have been executed/registered during the last maintenance window.
AND NOT h0.Step0LastExecutionTime BETWEEN DATEADD(HOUR,1,CAST(CAST(GETDATE() AS DATE) AS DATETIME)) AND DATEADD(HOUR,6,CAST(CAST(GETDATE() AS DATE) AS DATETIME))

)

SELECT
s.*
FROM SourceData s --Retrieve more info
ORDER BY 1,2,3



Next:
Build something that will alert you, like sending a database mail or integrate this into one of your maintenance reports.








Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s