PSU 4.2.17 - Memory Leak

Is there a known memory leak bug in 4.2.17? Since upgrading from 4.2.16, the Universal.Server process eventually consumes virtually all of the available memory on the server it’s running on, leading to failed health checks, Zabbix alerts about memory utilization, and a largely unresponsive PSU web interface. The only [temporary] fix is bouncing the entire PowerShell Universal service, which only takes care of things for a while (yesterday I had to restart the service four times within a five-hour window).

Product: PowerShell Universal
Version: 4.2.17

Doing it now for the second time today. I’m creating a dump file to see if I can figure out what’s causing the problem, but I may end up rolling back to 4.2.16, since I don’t recall having this problem prior to upgrading to 4.2.17.
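For anyone following along, this is roughly how I’m grabbing the dump. It’s just a sketch that assumes Sysinternals ProcDump is on the PATH and that the process is named Universal.Server; the dotnet-dump global tool would work as well.

# Find the PSU server process and write a full memory dump (ProcDump assumed to be installed).
$psuProcess = Get-Process -Name 'Universal.Server'
procdump.exe -accepteula -ma $psuProcess.Id "C:\Temp\Universal.Server.DMP"

# Alternative using the dotnet-dump global tool:
# dotnet-dump collect -p $psuProcess.Id --type Full -o "C:\Temp\Universal.Server.DMP"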

According to the memory dump, it appears that coreclr.dll is causing the issue, unless I’m reading the dump incorrectly.


************* Preparing the environment for Debugger Extensions Gallery repositories **************
   ExtensionRepository : Implicit
   UseExperimentalFeatureForNugetShare : true
   AllowNugetExeUpdate : true
   NonInteractiveNuget : true
   AllowNugetMSCredentialProviderInstall : true
   AllowParallelInitializationOfLocalRepositories : true

   EnableRedirectToV8JsProvider : false

   -- Configuring repositories
      ----> Repository : LocalInstalled, Enabled: true
      ----> Repository : UserExtensions, Enabled: true

>>>>>>>>>>>>> Preparing the environment for Debugger Extensions Gallery repositories completed, duration 0.000 seconds

************* Waiting for Debugger Extensions Gallery to Initialize **************

>>>>>>>>>>>>> Waiting for Debugger Extensions Gallery to Initialize completed, duration 0.031 seconds
   ----> Repository : UserExtensions, Enabled: true, Packages count: 0
   ----> Repository : LocalInstalled, Enabled: true, Packages count: 41

Microsoft (R) Windows Debugger Version 10.0.27553.1004 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Users\Administrator\AppData\Local\Temp\2\Universal.Server.DMP]
User Mini Dump File with Full Memory: Only application data is available


************* Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       srv*
Symbol search path is: srv*
Executable search path is: 
Windows 10 Version 20348 MP (4 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Edition build lab: 20348.1.amd64fre.fe_release.210507-1500
Debug session time: Wed Apr 10 11:12:16.000 2024 (UTC - 4:00)
System Uptime: 0 days 1:57:16.861
Process Uptime: 0 days 1:57:09.000
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
......................................
Loading unloaded module list
................................................................
For analysis of this file, run !analyze -v
ntdll!NtWaitForMultipleObjects+0x14:
00007ffb`a543ffa4 c3              ret
0:000> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

*** WARNING: Unable to verify checksum for Hangfire.Core.dll
*** WARNING: Unable to verify checksum for Hangfire.MemoryStorage.dll
*** WARNING: Unable to verify checksum for host.dll

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 10093

    Key  : Analysis.Elapsed.mSec
    Value: 11814

    Key  : Analysis.IO.Other.Mb
    Value: 0

    Key  : Analysis.IO.Read.Mb
    Value: 1

    Key  : Analysis.IO.Write.Mb
    Value: 0

    Key  : Analysis.Init.CPU.mSec
    Value: 796

    Key  : Analysis.Init.Elapsed.mSec
    Value: 2495

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 265

    Key  : CLR.Engine
    Value: CORECLR

    Key  : CLR.Version
    Value: 7.0.222.60605

    Key  : Failure.Bucket
    Value: BREAKPOINT_80000003_coreclr.dll!Thread::DoAppropriateWaitWorker

    Key  : Failure.Hash
    Value: {2dbd2c7d-1f72-d894-343d-110a7c82a3a2}

    Key  : Failure.Source.FileLine
    Value: 3514

    Key  : Failure.Source.FilePath
    Value: D:\a\_work\1\s\src\coreclr\vm\threads.cpp

    Key  : Failure.Source.SourceServerCommand
    Value: raw.githubusercontent.com/dotnet/runtime/d037e070ebe5c83838443f869d5800752b0fcb13/src/coreclr/vm/threads.cpp

    Key  : Timeline.OS.Boot.DeltaSec
    Value: 7036

    Key  : Timeline.Process.Start.DeltaSec
    Value: 7029

    Key  : WER.OS.Branch
    Value: fe_release

    Key  : WER.OS.Version
    Value: 10.0.20348.1

    Key  : WER.Process.Version
    Value: 1.0.0.0


FILE_IN_CAB:  Universal.Server.DMP

NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 0000000000000000
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 0

FAULTING_THREAD:  00000acc

PROCESS_NAME:  Universal.Server.dll

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_CODE_STR:  80000003

STACK_TEXT:  
00000021`c297db88 00007ffb`a2b9436c     : 00007ffb`3bc35970 00007ffb`9a182461 00000213`47000030 00000021`c297dc60 : ntdll!NtWaitForMultipleObjects+0x14
00000021`c297db90 00007ffb`9a1a5b3c     : 00000000`00000000 00000000`00000000 00000000`00000130 00000000`00000000 : KERNELBASE!WaitForMultipleObjectsEx+0xec
00000021`c297de80 00007ffb`9a1a5979     : 00007ffb`00000001 00007ffb`00000001 00000000`00000001 00007ffb`00000000 : coreclr!Thread::DoAppropriateWaitWorker+0x184
00000021`c297df60 00007ffb`9a1a39d8     : 00000000`00000000 00000000`00000001 00000000`00000000 00000213`42879500 : coreclr!Thread::DoAppropriateWait+0x89
00000021`c297dfe0 00007ffb`9a1a37ad     : 00000000`00000000 00007ffb`ffffffff 00000213`4c9b6990 00000213`428793d0 : coreclr!SyncBlock::Wait+0x1c8
00000021`c297e100 00007ffb`9767e9f9     : 00000000`ffffffff 00000213`570b69f8 00000000`00000000 00000213`570b68e8 : coreclr!ObjectNative::WaitTimeout+0xcd
00000021`c297e280 00007ffb`9769512d     : 00000213`4c9b6880 00000000`ffffffff 00000000`00000000 00000000`00000006 : System_Private_CoreLib!System.Threading.ManualResetEventSlim.Wait+0x189
00000021`c297e320 00007ffb`97694f22     : 00000213`4c9b67a8 00000000`ffffffff 00000000`00000000 00000000`00000006 : System_Private_CoreLib!System.Threading.Tasks.Task.SpinThenBlockingWait+0x9d
00000021`c297e3a0 00007ffb`976e8de2     : 00000213`570b6810 00000000`ffffffff 00000000`00000000 00000000`00000006 : System_Private_CoreLib!System.Threading.Tasks.Task.InternalWaitCore+0x72
00000021`c297e420 00007ffb`94b838b2     : 00000213`570b6810 00000000`00000000 00000000`00000000 00000000`00000006 : System_Private_CoreLib!System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification+0x22
00000021`c297e450 00007ffb`3a783b7b     : 00000213`570b5ba8 00000000`00000000 00000000`00000000 00000000`00000001 : Microsoft_Extensions_Hosting_Abstractions!Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.Run+0x32
00000021`c297e480 00000213`570b5ba8     : 00000000`00000000 00000000`00000000 00000000`00000001 00000021`c297e480 : Universal_Server!Universal.Server.Program.<>c__DisplayClass3_0.<Main>b__0+0x18b
00000021`c297e488 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00000213`570b5ba8


STACK_COMMAND:  ~0s; .ecxr ; kb

FAULTING_SOURCE_LINE:  D:\a\_work\1\s\src\coreclr\vm\threads.cpp

FAULTING_SOURCE_FILE:  D:\a\_work\1\s\src\coreclr\vm\threads.cpp

FAULTING_SOURCE_LINE_NUMBER:  3514

FAULTING_SOURCE_SRV_COMMAND:  https://raw.githubusercontent.com/dotnet/runtime/d037e070ebe5c83838443f869d5800752b0fcb13/src/coreclr/vm/threads.cpp

FAULTING_SOURCE_CODE:  
No source found for 'D:\a\_work\1\s\src\coreclr\vm\threads.cpp'


SYMBOL_NAME:  coreclr!Thread::DoAppropriateWaitWorker+184

MODULE_NAME: coreclr

IMAGE_NAME:  coreclr.dll

FAILURE_BUCKET_ID:  BREAKPOINT_80000003_coreclr.dll!Thread::DoAppropriateWaitWorker

OS_VERSION:  10.0.20348.1

BUILDLAB_STR:  fe_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

IMAGE_VERSION:  7.0.222.60605

FAILURE_ID_HASH:  {2dbd2c7d-1f72-d894-343d-110a7c82a3a2}

Followup:     MachineOwner
---------


We changed very little between those versions, so I would imagine it could be happening in v4.2.16 as well. I actually did a video about how to diagnose this kind of issue. You can get a listing of the object types that are consuming the memory.
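For example, with the SOS extension loaded in WinDbg, the general idea is something like this (the address in the last command is a placeholder you would copy from the !dumpheap output):

0:000> !dumpheap -stat
0:000> !dumpheap -type System.Management.Automation.PSObject
0:000> !gcroot <object address>

!dumpheap -stat groups every object on the managed heap by type so the biggest consumers show up at the bottom of the list, and !gcroot shows what is keeping a specific instance alive.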

If you have access to Visual Studio Enterprise, there is also the Debug Managed Memory feature. Open the dump file in Visual Studio, click Debug Managed Memory, and it will do the same thing WinDbg is doing in the video.

If you want me to take a look at it, feel free to send me a link of where to download it.


@adam Thanks. I’ll watch that in a bit. I’m trying to compress the DMP file right now, as it’s currently 9.5 GB, but then I’ll send you a link to grab it.

@adam Here is a link to the DMP file, compressed with 7-Zip: 628.45 MB file on MEGA

The link will expire on 04/17/2024.


There are a lot of PSObjects pinned from job executions. I’ll see if I can reproduce this locally and find out what’s happening.


Sounds good. Thank you. I added -RunspaceRecycling to each of the environments to see if it helps anything, since I saw you mention enabling it for memory leak issues in a previous release.
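In case it helps anyone else, this is roughly what that looks like in environments.ps1. The names and paths here are illustrative, not my actual config, and I’m assuming -RunspaceRecycling is the switch on New-PSUEnvironment:

# environments.ps1 - enable runspace recycling on each defined environment (example names/paths)
New-PSUEnvironment -Name 'PowerShell 7' -Path 'C:\Program Files\PowerShell\7\pwsh.exe' -RunspaceRecycling
New-PSUEnvironment -Name 'Windows PowerShell 5.1' -Path 'C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe' -RunspaceRecycling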

Can you let me know which environment you are using for jobs (integrated, PS7, WinPS)?

@adam They’re all set to use the Default environment, and that’s set to Integrated. At least, the Scheduled jobs are. There are some scripts set to use PowerShell 7, but those are not being actively used yet.

I’ve been running jobs for the last few hours and see no noticeable change in memory usage. Let me dig around in the dump some more to see if I can see why all this is pinned.

Did it again overnight/this morning. :rage:

Can you try running in the non-integrated environment for these jobs? It will start a new PS process per job, and any memory will be cleaned up when that process exits.


So, use Agent, for example?

Yep. That uses the same PS version as Integrated.


Okay. I changed the default environment to Agent so I didn’t have to touch each script individually (since we aren’t really using anything other than “Default”). I’ll keep an eye on it throughout the day to see if it fixed/helped anything.
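For reference, I made the change in settings.ps1 rather than per script; something along these lines, assuming I have the parameter name right:

# settings.ps1 - run everything out-of-process by default so job memory is released when the process exits
Set-PSUSetting -DefaultEnvironment 'Agent'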

@adam Well, I was hopeful that changing the environment had fixed it, but I got into the office and saw that the memory issue had returned.

I think I found the culprit. There’s a scheduled job that runs a script at 2AM every day. The script is fairly large and takes about an hour to complete if you run it manually. When I killed the agent child process for jobs, I could then see a failed job entry in PSU for that script.

Since I paused the schedule that runs this script, the server has not had any abnormal memory usage.
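For completeness, the schedule in question is defined roughly like this in schedules.ps1 (the script name and cron expression are illustrative, not the real ones):

# schedules.ps1 - nightly job at 2 AM; this is the one I have paused for now
New-PSUSchedule -Script 'Nightly-Maintenance.ps1' -Cron '0 2 * * *' -Environment 'Agent'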