CPU Usage on long running dashboard

We are having some performance issues with Universal Dashboard and are hoping to get some help to track down the issue.

Our dashboard has 12 UDCounter objects, 2 UDTable, and 1 UDChart. There are 5 Scheduled endpoints that cache the data used by the different display objects.

We do a garbage collection at the end of all of the endpoints.

On version 1.7.0 we don’t have any issues.

When we try to upgrade to any 2.x version, we find that CPU and memory usage gradually increase over time and eventually (sometimes within a few hours, but normally it takes up to 24 hours) the dashboard hits 100% CPU and is unresponsive until it is restarted. I think that if I close all of the web browsers that have the dashboard open, the CPU usage drops to zero without a restart, but that is difficult to test in production since many clients have the dashboard open.

Hosting the dashboard in IIS shows the same issue.

Any tips or suggestions on how we can track down the issue?

Hello @tonyb, hoping this first post will not be your last post… OK, so I have noticed this same behaviour, but only when I added a load of monitoring on servers, and it certainly had the 100% CPU issue. Have you tried taking out the endpoints one by one to see if one of those is causing the leak? Since I restructured my dashboard to select the server of interest and then display the results, I have got rid of these issues.
I don’t know how you design the folder structure of your dashboards, but I break all mine into individual pages to make debugging easier. As soon as I stopped the particular server page, my problem went away.
Out of curiosity, what are the endpoints connecting to? SQL? Exchange? WMI? I know scripts are sensitive, especially when doing stuff for work, but if the hints I have suggested do not work, then post as much code as you can, and I’m sure someone else will have a better idea.

I didn’t use them myself, but you could go all out and use some of the Sysinternals tools to do some deep debugging under the hood. UD does have debugging in it as well, but I am still meaning to learn this… Hope something I typed will help you out; keep us posted. Peace

I have several different data sources; most are SQL but some are HTTP REST requests.

I have a test dashboard that just uses “Get-Random” to generate fake data, and if I let it run long enough it will also show the same behavior.

The exact same endpoints run fine in 1.7.0 with no CPU issue; if I run them under any newer version, the dashboard will hit 100% CPU in less than a day.

I’ve tried to use Process Explorer to do some debugging but haven’t had much luck finding the issue.

I have this issue too, but I also have about 1000 rows of data kept in cache and anywhere between 20-40 users a day.

I kinda cheated by just running iisreset at midnight every night.

I work in a school so I can afford that though.

It’s not a solution, but it might help to mitigate some of the issues until a real solution is identified.
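
A minimal sketch of that nightly restart, for anyone who wants to automate the same stopgap. It uses the built-in ScheduledTasks cmdlets and assumes an elevated session on the IIS host; the task name is made up for illustration, and it only masks the problem rather than fixing it.

```powershell
# Stopgap only: restart IIS every night at midnight (run from an elevated session).
# 'Nightly-UD-IISReset' is a hypothetical task name chosen for this example.
$action  = New-ScheduledTaskAction -Execute "$env:SystemRoot\System32\iisreset.exe"
$trigger = New-ScheduledTaskTrigger -Daily -At '00:00'

Register-ScheduledTask -TaskName 'Nightly-UD-IISReset' `
    -Action $action -Trigger $trigger `
    -User 'SYSTEM' -RunLevel Highest
```

Note that iisreset restarts every site on the server, not just the dashboard, so this only suits hosts where a brief nightly outage is acceptable.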

Sounds like a bug. Can someone enter an issue on GitHub so we can take a look?

@tonyb are you cool to file the bug on the Universal Dashboard GitHub page, as I resolved my issue?

I do the same thing - I have a runbook that runs daily to restart the web app running our UD as the web app becomes terribly slow.

I notice this more when Cache: Variables are used… for testing I removed the cache variables and found the memory bloat to be minimal.

Memory leak maybe?

Seems like it. There is an issue with endpoints not being cleaned up correctly, but just sitting on a page shouldn’t result in that. I’ll do some investigation into it. As @psDevUK knows, poshud.com even has a problem, albeit it usually takes a week or so, where it stops responding. Usually, it’s just a bunch of HTTP 500 errors though. Probably different symptoms of the same problem.

With PoshUD, it looks like it’s a memory issue. Looking into it now. It’s fine for weeks and then all of a sudden it spikes and no longer services requests.

Just on a side note: I thought I would redesign that FTP site I did ages ago. It doesn’t use any endpoints, functions or modules. Using 2.6.1, displaying something like the count of 30,000+ files took far too long to wait and watch. So I downgraded to 2.5.1 and it’s running just as quickly as PowerShell itself takes to give you the count of files… something has gone really slow in 2.6.1…?

I’m not sure what would cause such a major issue with performance in 2.6. The only major change to how execution works is the module auto-load. Aside from that, there really hasn’t been anything major. I think we will need a log to see what’s up there.

In terms of the original issue mentioned here, I have isolated a problem where endpoints were not being cleaned up properly. Tonight’s nightly release will introduce a fix for that problem, so if anyone experiencing performance issues after running a dashboard for some time could check out that build, that would be fantastic in helping figure this out.

You can install nightly releases with this script.

The nightly version seems better. It feels like it takes longer to jump to 100% CPU, but it is still happening.

I get this in the log file when the CPU starts to climb:

12:58:41 [Error] Microsoft.AspNetCore.Server.Kestrel Connection id "0HLQI0QR6139R", Request id "0HLQI0QR6139R:0000042C": An unhandled exception was thrown by the application.
13:06:59 [Warn] ComponentController RunScript() Collection was modified; enumeration operation may not execute.
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
   at System.Management.Automation.Runspaces.InitialSessionState.Clone()
   at System.Management.Automation.Runspaces.RunspaceBase..ctor(PSHost host, InitialSessionState initialSessionState)
   at System.Management.Automation.Runspaces.RunspaceFactory.CreateRunspace(PSHost host, InitialSessionState initialSessionState)
   at UniversalDashboard.Services.UDRunspaceFactory.CreateRunspace() in D:\a\1\s\src\UniversalDashboard\Services\UDRunspaceFactory.cs:line 102
   at UniversalDashboard.Services.ObjectPool`1.CreateInstance() in D:\a\1\s\src\UniversalDashboard\Services\ObjectPool.cs:line 70
   at UniversalDashboard.Services.ObjectPool`1.AllocateSlow() in D:\a\1\s\src\UniversalDashboard\Services\ObjectPool.cs:line 100
   at UniversalDashboard.Services.ObjectPool`1.Allocate() in D:\a\1\s\src\UniversalDashboard\Services\ObjectPool.cs:line 88
   at UniversalDashboard.Services.UDRunspaceFactory.GetRunspace() in D:\a\1\s\src\UniversalDashboard\Services\UDRunspaceFactory.cs:line 59
   at UniversalDashboard.Execution.ExecutionService.ExecuteEndpoint(ExecutionContext context, Endpoint endpoint) in D:\a\1\s\src\UniversalDashboard\Execution\ExecutionService.cs:line 120
   at UniversalDashboard.Controllers.ComponentController.<>c__DisplayClass8_0.b__0() in D:\a\1\s\src\UniversalDashboard\Controllers\ComponentController.cs:line 96
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at UniversalDashboard.Controllers.ComponentController.d__8.MoveNext() in D:\a\1\s\src\UniversalDashboard\Controllers\ComponentController.cs:line 94

Oh, that’s not good. That’s coming out of the PowerShell SDK. Are you running PS Core or Windows PS? I’ll have to see how we can guard around this. What’s happening is that the runspace pool has run out of available runspaces, so it’s trying to create another one, but the PS SDK is mad.
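
For context, the "Collection was modified" text in the log is the generic .NET error for a collection that changes while something is enumerating it. As a rough illustration only (this is plain PowerShell, not UD's runspace factory code, and it says nothing about how any fix is implemented), the same kind of error can be produced like this:

```powershell
# Plain PowerShell, no UD involved: change a List while enumerating it.
$items = [System.Collections.Generic.List[int]]::new()
1..3 | ForEach-Object { $items.Add($_) }

$message = $null
try {
    foreach ($item in $items) {
        # Mutating the list invalidates the enumerator that foreach is using.
        $items.Add($item * 10)
    }
}
catch {
    $message = $_.Exception.Message
}
$message   # the text includes "Collection was modified"

# Enumerating a snapshot avoids the error in this single-threaded case.
foreach ($item in $items.ToArray()) {
    $items.Add($item * 10)
}
```

When the change comes from another thread, as the stack trace suggests may be the case here, a snapshot alone is not enough because the copy itself can race with the writer. Some form of synchronization around both the read and the write would be needed.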

Out of curiosity, do you have PowerShell scheduled tasks running on this same machine? I’ve been doing a lot of out-of-hours work, and at about 8pm my CPU maxed out… I could only put it down to the scheduled tasks I had set up with PowerShell running at the same time as my dashboard and VSCode… It totally killed my remote connection one night, and the other night I had to use tasklist and taskkill to even remote back into my PC… The CPU was at 100% and nothing else, not even PowerShell, would let me remote in or kill tasks… I could ping my PC and nothing else… So even if you don’t have scheduled tasks running, you just read a great sob story :cry: :slight_smile:

I have been doing some testing over the last few weeks and found that when I remove the Cache variables and scheduled endpoints, the dashboard runs quite smoothly. I haven’t had to restart my Azure Web App in days.

In the last version of my dashboard I had 7 Cache variables in 4 scheduled endpoints and needed to reboot it every day on a schedule. Even then, on some days with heavy traffic the dashboard elements would load very slowly and I would need to reboot multiple times.

So I have modified my dashboard to use minimal session variables and no cache variables, with the rest loading on demand within UDGrids and Tables. I changed the inputs from UDInput to UDSelect, UDText etc… and then call Get-UDElement -Id ‘udselect’ within an onchange or onclick. This works so much faster but obviously does not fix the issue at hand.

All my endpoints pull data from Azure Logic Apps and Functions. I also moved my Azure Functions from a dedicated App Service plan to a Consumption plan, and I have noticed a huge improvement when pulling data from UD.

I’m running under Windows PS. I’ll try to test with PS Core but I’m not sure my dashboard will be compatible.

It is a Windows Server 2016 machine with 4 CPUs.

I have to use the Cache variable. I’ve got some endpoints that update very frequently (every 5 seconds) and I don’t want every browser session to be hitting the back-end data source that frequently.
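
A minimal sketch of that pattern, assuming the UD 2.x Community module: one scheduled endpoint refreshes a Cache variable every 5 seconds, and the counters only read the cached value. Get-Random stands in for the real SQL or REST query, and the variable name, title and port are placeholders, not taken from the actual dashboard.

```powershell
Import-Module UniversalDashboard.Community   # assumes the v2 Community module is installed

# One scheduled endpoint refreshes the shared cache every 5 seconds.
$schedule = New-UDEndpointSchedule -Every 5 -Second
$refresh  = New-UDEndpoint -Schedule $schedule -Endpoint {
    # Get-Random is a stand-in for the real back-end query.
    $Cache:QueueDepth = Get-Random -Minimum 0 -Maximum 100
}

$dashboard = New-UDDashboard -Title 'Cache sketch' -Content {
    # Each browser re-reads the cached value; none of them touch the back end.
    New-UDCounter -Title 'Queue depth' -AutoRefresh -RefreshInterval 5 -Endpoint {
        $Cache:QueueDepth
    }
}

Start-UDDashboard -Dashboard $dashboard -Port 10000 -Endpoint $refresh
```

With this layout the back end sees one query per interval regardless of how many browsers are connected, which is the reason for keeping the Cache variable.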

I don’t use session variables.

Right now I’m using UDCounters on the page; I might try switching to UDElement. That would move the updates from a bunch of separate HTTP requests into the websocket, which is likely more efficient. I’m not sure that would impact the CPU usage though.

Don’t worry about switching over. I was just curious. The cache variable should be fine to use. Probably just need to figure out how to get into this condition where the PS runtime gets unhappy. Thanks for the info. I’ll keep plugging along trying to nail this one down but this gives me more to go on.

@tonyb - So I found an issue that was causing that exception. It doesn’t necessarily explain the high CPU but it might be part of the problem. It’ll be in tonight’s build. I’m also going to put together a diagnostics tool that people can use to help figure out what’s happening in a dashboard.
