Script Performance

Is there something I can look at to investigate why scripts are taking so long to run?

This guy took over 10 minutes to run, but only takes a few seconds on my local PC. All it does is fetch some JSON from a REST endpoint and reformat it into PSCustomObjects to output to the pipeline.

Product: PowerShell Universal
Version: 2.7.1

It could be resource contention. Is the App Service maxed out in CPU or memory?

Oh wow, yeah, it's maxed out.

What a bummer. We’re on a “B1: 1” app service plan and don’t want to pay more than that. Guess I’m just gonna have to live with it. For scheduled scripts obviously it’s not an issue. Just a bummer when I’m trying to create/debug a script.

It seems like the more info you return to the pipeline from a script, the slower it executes.

I have a script that fetches a list of all our active employees from a web service in our HR system, and then massages the data and returns a subset of them to the pipeline. If I only return a small subset, the script runs in under 10 seconds. If I return all ~700 employees, it takes upwards of two minutes.

Does that sound right? Is script performance impacted by PSU capturing the pipeline output?

Edit: I just ran the script but captured the output to a $foo variable rather than returning it to the pipeline. Was taking over two minutes - just ran in 15 seconds. Obviously that’s no good for my use case (I need the pipeline output for other scripts) but it’s evidence that that’s the bottleneck.
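
(In other words, the only change was roughly this - $employees and the property names are placeholders for the real HR fields:)

# Before: every object streams to the pipeline, so PSU has to serialize all ~700 of them
$employees | ForEach-Object {
    [PSCustomObject]@{ EmployeeId = $_.EmployeeId; Name = $_.Name }
}

# Test: identical work, but captured in a variable so nothing hits the pipeline
$foo = $employees | ForEach-Object {
    [PSCustomObject]@{ EmployeeId = $_.EmployeeId; Name = $_.Name }
}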

We use the PSSerializer class to serialize pipeline output and write it to LiteDB. If the CPU is strapped, that makes me think the serializer is the culprit. It also depends on the depth of the objects. Are they flat, or are they nested with a lot of properties?

You could also try storing the files yourself using JSON serialization to see if that makes a difference. Then you could just output the file name from your script so other scripts could read the file. I realize that's not ideal, but it might be a good debugging step.
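
Something along these lines ($path, $results and the file name are placeholders):

# Producing script: serialize the results to JSON yourself and emit only the path
$path = Join-Path $env:TEMP 'employees.json'
$results | ConvertTo-Json -Depth 5 | Out-File $path
$path   # the only thing written to the pipeline

# Consuming script: read the file back into objects
$results = Get-Content $path -Raw | ConvertFrom-Json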

They’re flat. They look like this:
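
(Roughly this shape - the property names stand in for the real HR fields:)

# $_ is the current record inside the ForEach-Object loop
[PSCustomObject]@{
    EmployeeId = $_.EmployeeId
    Name       = $_.Name
    Email      = $_.Email
    Department = $_.Department
}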

If I add $foo = to the front of that (so the object isn’t returned to the pipeline) the script runs about ten times faster.

Interesting. I wonder why that is so slow in Azure.

Maybe try to just write to your own file and see what the difference is.

Just add | ConvertTo-Json | Out-File .\test.txt to the end of that pipeline.
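
For example, with the placeholder object from above:

$employees | ForEach-Object {
    [PSCustomObject]@{ EmployeeId = $_.EmployeeId; Name = $_.Name }
} | ConvertTo-Json | Out-File .\test.txt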

17 seconds with that added.

Can you do one last test?

$Foo = [PSCustomObject]@{ } 
[System.Management.Automation.PSSerializer]::Serialize($Foo) | Out-File .\test.txt

If that is still fast then we have something going on that we need to address in terms of performance. It’s actually been a pretty big focus recently so I’d like to ensure that we can resolve this as well. 2.8 is much better at memory management but we haven’t focused on CPU usage quite as much.

OK so with the code looking like this:
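
(Roughly the following - the property names are still placeholders:)

# inside the loop, serialize each record ourselves instead of returning it to the pipeline
[System.Management.Automation.PSSerializer]::Serialize(
    [PSCustomObject]@{
        EmployeeId = $_.EmployeeId
        Name       = $_.Name
        Email      = $_.Email
    }
) | Out-File .\test.txt -Append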

(which is obviously inside a ForEach-Object loop, as it processes each employee record)

28 seconds. Still like five times faster than just returning the PSCustomObjects to the pipeline.

Ok. Thanks. I will open an issue for this. We should be able to match the performance.

I also just tried initialising a $results array variable to @(), using $results += … inside the loop, and then returning $results to the pipeline at the end. Back to 2:16.
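
(i.e. this pattern, with placeholder properties:)

$results = @()
$employees | ForEach-Object {
    $results += [PSCustomObject]@{ EmployeeId = $_.EmployeeId; Name = $_.Name }
}
$results   # everything still ends up on the pipeline, so PSU still serializes it all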

I just wanted to provide some more information. It looks like the performance of the PSSerializer is the bottleneck. This is the serializer used by PowerShell remoting and is part of the PowerShell SDK.

I’m testing with Get-Process because it’s been notoriously slow for me in PSU. I’m running on a very beefy desktop machine (16 Core, 32 GB RAM, M.2 disk).

Obviously, it runs very quickly without serialization.

PS C:\Users\adamr> Measure-Command { Get-Process }

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 5
Ticks             : 51896
TotalDays         : 6.00648148148148E-08
TotalHours        : 1.44155555555556E-06
TotalMinutes      : 8.64933333333333E-05
TotalSeconds      : 0.0051896
TotalMilliseconds : 5.1896

Running it directly in PowerShell with serialization takes 6.5 seconds. Surprisingly slow. By default, the serializer uses a depth of 1, which means sub-objects are not serialized.
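
(The depth can be raised explicitly via a second argument if sub-objects matter - $object here is just whatever you're serializing:)

# the second parameter controls how deep nested properties are serialized
[System.Management.Automation.PSSerializer]::Serialize($object, 2)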

PS C:\Users\adamr> Measure-Command { [System.Management.Automation.PSSerializer]::Serialize((Get-Process)) }

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 6
Milliseconds      : 504
Ticks             : 65046999
TotalDays         : 7.52858784722222E-05
TotalHours        : 0.00180686108333333
TotalMinutes      : 0.108411665
TotalSeconds      : 6.5046999
TotalMilliseconds : 6504.6999 

Running within PSU takes 3 seconds. I don’t really understand why it’s faster because it’s using the same class for serialization.

I also tried ConvertTo-Json but it’s even worse at 7.1 seconds.

PS C:\Users\adamr> Measure-Command { (Get-Process | ConvertTo-Json -Depth 1) }
WARNING: Resulting JSON is truncated as serialization has exceeded the set depth of 1.

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 7
Milliseconds      : 150
Ticks             : 71505679
TotalDays         : 8.27612025462963E-05
TotalHours        : 0.00198626886111111
TotalMinutes      : 0.119176131666667
TotalSeconds      : 7.1505679
TotalMilliseconds : 7150.5679

One other method I tried was using Newtonsoft.Json to serialize the PSObjects. This results in some wonky output, but I was curious how it compared.

$Items = Get-Process
[Newtonsoft.Json.JsonConvert]::SerializeObject($Items) | Out-File C:\Users\adamr\OneDrive\Desktop\test.json

In PSU, it still took about 3 seconds and resulted in a 1.5 MB file.

All in all, it seems like serializing large amounts of pipeline data is going to use a decent amount of CPU. I know you said you are using the output of this script later and I’m wondering if you can use the $Cache (although it won’t survive a restart) to avoid the serialization.
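
Something like this (the variable name is just illustrative):

# Producing script: stash the objects in the server-side cache instead of the pipeline
$Cache:Staff = $staff

# Consuming script: read them straight back (note: gone after a service restart)
$staff = $Cache:Staff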

I'll continue to do some research since I'd love to improve this. I think one of the issues is that serializing PSObjects requires a live PowerShell pipeline/runspace, because it needs to do evaluation while the serialization is happening. It's not quite the same as serializing a plain old CLR object.

I mean, for these scripts I don’t need the pipeline output captured to the database for posterity - they’re purely building blocks for other scripts. So I guess I could build them into a module, or even try calling them directly. I’ll experiment with both options. Do you see any issues with simply calling the script directly, assuming I’m in the right working directory? (Or is there a PSU built-in variable that points to the script root folder I can use?)

The repository folder is available as the $Repository built-in variable: https://docs.powershelluniversal.com/platform/variables#built-in-variables

You should also be able to use $PSScriptRoot to get the folder of the currently executing script.

We have an open issue to set the current directory to the repository folder, but that hasn't been implemented yet. I think the current directory will just be the assembly base path, which isn't too helpful.

OK, success!

I changed from:

$staff = Invoke-PSUScript -Script 'Get-FlexiPurchaseStaff.ps1' -Integrated -Wait 

… to:

Set-Location $Repository
$staff = .\Get-FlexiPurchaseStaff.ps1 

… and the script now runs in under 10 seconds rather than 2:30.

That’s a lot tidier code, too. I guess I lose a little bit of functionality in terms of being able to inspect the pipeline after the fact, but I don’t have that right now with my scripts running in vanilla PowerShell using Windows Task Scheduler, so I’m happy as long as this is a “supported” technique.
