Jobs stuck in queue

PSU: 4.2.13
Db: sql

I had a process that created a bunch of jobs, and found that a good number of them got stuck in “queued” status.

At this point I'm not sure what to do, or why these jobs show as queued in PSU but not in Hangfire. Could it be that the job never got to the Hangfire queue? Could we add a button in the UI to requeue these?

Edit: or even show a queued date/time

Hi Mike - we had the same issue for a while on 4.2.x versions. We're currently on 4.2.7 and the issue isn't happening much any more.

It seemed to get worse as the Job table filled up well beyond 50k rows. Pointing the server at a fresh, empty database “fixed” the problem (at the cost of losing job history, the saved licence, secrets, etc.), so I think it's related to the time the query takes to execute. It might improve if you reduce the data in the backend (by adjusting the history retention, for example).
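
To see how big the table has become and how the rows break down by status, a query along these lines could help. It is only a sketch: it assumes the same [dbo].[Job] table and the Status and CreatedTime columns used in the cleanup SQL in this thread, and it has not been checked against any particular PSU schema version.

```sql
-- Sketch: row count per status, plus the oldest and newest job in each status.
-- Assumes [dbo].[Job] has Status and CreatedTime columns (as used in this thread).
SELECT [Status],
       COUNT(*)         AS JobCount,
       MIN(CreatedTime) AS OldestJob,
       MAX(CreatedTime) AS NewestJob
FROM [dbo].[Job]
GROUP BY [Status]
ORDER BY [Status];
```

A large count with an old OldestJob on the status 0 row would point to jobs that have been sitting in queued for a long time.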

The status value for queued is 0, so you can tidy things up a bit with this SQL:

-- Preview: jobs still queued (status 0) that were created more than 50 hours ago
SELECT * FROM [dbo].[Job]
WHERE status = 0 AND CreatedTime < DATEADD(hh, -50, GETDATE());
-- Set those stale queued jobs to failed (status 3)
UPDATE [dbo].[Job] SET status = 3 WHERE status = 0 AND CreatedTime < DATEADD(hh, -50, GETDATE());
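
If you would rather not run the update blind, a more cautious variant is to wrap it in a transaction and check the affected row count before committing. This is a sketch under the same assumptions as above (status 0 is queued, status 3 is failed, CreatedTime is comparable to GETDATE()). If CreatedTime turns out to be stored in UTC, GETUTCDATE() would be the matching function.

```sql
-- Sketch: same cleanup, but inside a transaction so it can be undone.
BEGIN TRANSACTION;

UPDATE [dbo].[Job]
SET [Status] = 3                                    -- assumed: 3 = failed
WHERE [Status] = 0                                  -- assumed: 0 = queued
  AND CreatedTime < DATEADD(hour, -50, GETDATE());  -- older than 50 hours

-- How many rows the update touched
SELECT @@ROWCOUNT AS RowsUpdated;

-- Run exactly one of these once the count looks right (or wrong):
-- COMMIT TRANSACTION;
-- ROLLBACK TRANSACTION;
```

Don't leave the transaction open for long, since the locks it holds can block other writes to the Job table until it is committed or rolled back.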

Not fixed in 4.3.0 sadly


@adam

We are still seeing this issue as well, in version 4.5.1.

When you look in hangfire it shows that the job was deleted?

@adam we need you :frowning:

Are you running these against any specific computers or computer groups, or just the default queue?

Default queue.

One thing I’ve noticed is if I have a node in maintenance, then the jobs go to queued and show up under deleted in hangfire.

Below is a picture of Hangfire. It shows the queued job in the deleted state, trying to process on the node that is in “maintenance” mode.

Ok. We need to fix this. The problem is that the job shouldn’t be sent to a node that is in maintenance mode at all. And really, if the job is sent to that node, it should be marked failed rather than queued indefinitely. We have a check in place, right before starting a job, that should be doing that.

If Hangfire deletes the job without PSU realizing it, the job will stay queued indefinitely: it never runs, so it never transitions out of that state.

Do you see this same behavior when you don’t have machines in maintenance mode?

Yes

I suspect it has something to do with schedules as well. Did you want to open a support case and do a screen share on our setup or do you have enough information to replicate this?

Here is a sample from schedules.ps1


$Parameters = @{
    Cron       = "0 8 31 JUN,DEC *"
    Script     = "report\myscript.ps1"
    TimeZone   = "America/Chicago"
    Credential = "Default"
    Name       = "Run some script"
    Condition  = {
        $Environment -eq 'production'
    }
    Computer   = "ProdPSUNode"
}
New-PSUSchedule @Parameters

Please open a ticket. I likely won’t be able to replicate this easily.