UD Load Testing

I just want to understand people's thoughts/opinions on how UD should or shouldn't be used in larger environments.

I work for a very large global firm - 19k+ employees under the UK branch alone, and we have reach into many other countries.
Currently I host two dashboards in UD. One is used by my internal team and other IT teams, reporting on the automation my team provides to the business; it's primarily a presentation layer on top of our scripts - mostly graphs, grids, etc. Nothing too complex.

I've just started hosting a second dashboard, which will be used for end-user interaction.
We're expecting a higher level of traffic on this one, and we're currently not sure how many concurrent connections we'll end up with at peak times.
The dashboard itself is related to a project around O365 Teams.
We're planning to ask people to visit the UD page, which lists their personal Teams with owners and members (info pulled from SQL, with further details pulled from AD within the endpoint), alongside buttons to 'confirm' they have checked; clicking a button writes the username and a datetime back to a SQL database.
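To make the confirm flow concrete, here's a minimal sketch of the button endpoint. The server, database, table and column names are placeholders, not our real schema:

```powershell
# Sketch only - connection string and table names are made up.
New-UDButton -Text 'Confirm' -OnClick {
    # $User is the authenticated username UD exposes inside endpoints.
    $conn = New-Object System.Data.SqlClient.SqlConnection `
        'Server=sql01;Database=TeamsAudit;Integrated Security=True'
    $cmd = $conn.CreateCommand()
    $cmd.CommandText = 'INSERT INTO dbo.Confirmations (Username, ConfirmedAt) VALUES (@user, @ts)'
    [void]$cmd.Parameters.AddWithValue('@user', $User)
    [void]$cmd.Parameters.AddWithValue('@ts', (Get-Date))
    $conn.Open()
    [void]$cmd.ExecuteNonQuery()
    $conn.Close()
    Show-UDToast -Message 'Thanks - your confirmation has been recorded.'
}
```

Parameterised inserts rather than string-built SQL keep the write path safe even though the username comes from the session, not free-text input.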

While I'd guess we'll probably have no more than 30 people on the site at any one time, I'm really not sure how this project will go, so I want to understand the capacity and what UD is capable of.

I've just got our load testing team involved to run tests and see what sort of capacity we can hit before we start having issues. We've asked them to run 1-, 25-, 50-, 100- and 500-concurrent-user tests (they're using LoadRunner).
As a baseline with 1 active user, we have minimal CPU and around 200-250 MB memory usage.
So far, 25 users was fine; at 50 users the CPU started to max out and I noticed 'transient' errors coming sporadically from Get-ADUser in the endpoints.
I've since switched out any references to the Get-ADUser cmdlet in favour of ADSI methods, which are much faster - all errors have since stopped and the memory footprint also looks reduced.
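For anyone curious, the swap is roughly this - a raw `[adsisearcher]` query in place of Get-ADUser. The filter value and property list here are just examples:

```powershell
# Get-ADUser equivalent via ADSI - no ActiveDirectory module load,
# which is noticeably faster under concurrent endpoint load.
$searcher = [adsisearcher]'(&(objectCategory=person)(sAMAccountName=jsmith))'
[void]$searcher.PropertiesToLoad.AddRange(@('displayname', 'mail'))
$result = $searcher.FindOne()
if ($result) {
    # ADSI returns property names lower-cased, each as a value collection.
    $displayName = $result.Properties['displayname'][0]
    $mail        = $result.Properties['mail'][0]
}
```

Restricting `PropertiesToLoad` to just the attributes you need also cuts the payload per lookup, which adds up across hundreds of concurrent requests.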
We re-ran the 50-user test and it completed without issue; response times, CPU and memory were still within acceptable levels (although memory jumped to 2 GB).
I'm currently running the 100-user test and it seems to be performing okay. Midway through I visited the diagnostics page, at which point we saw a spike and some slow response times. The diagnostics page obviously pulls a lot of info, so under that load it's understandable.

I'll update this post as we complete our load testing to let you know how I get on, but I just wondered what people's opinions are on using UD under higher loads - whether this is a good idea or not, whether there are any actions we can take to ease the site under load, good and bad practices, etc.

Just to give a bit more detail.
Our VM has 2 cores at 2.3 GHz and 8 GB memory (Hyper-V, dynamic).

The 100-user test saw some brief spikes, with degraded response times during them, but for the most part it was okay. At 500 users we saw the site overloaded: 100% CPU and definite page load issues.

I'm going to double the server's CPU and memory and run the tests again to see if this improves things.

So after doubling to 16 GB memory and 4 cores, it now seems to handle 100 concurrent users OK, but runs into difficulty at 250, with slow response times for some requests.
CPU is the bottleneck; memory seems okay. I'm upping it again to 8 cores and I'll run a final test to see how far we can push it.
I also wonder how it would function behind a load balancer with sticky sessions?
I know this platform isn't really built for scale like a typical web application, and my intention isn't to go down that route either; I just want to fully understand the limitations before I start using it for things it may not be best suited to, and to make sure I don't run into problems further down the line.

Hi, I am performing the same tests, if that helps! Our user base is around 1,000, where I would expect 200 concurrent sessions (I have already seen ~130). This is currently in a dev environment and I am planning production now. All of this is running in Azure on a D16s_v3.

I would be interested to know what you are using for load testing. I have not found tools that accurately mimic load, but that is more to do with the makeup of my dashboard.

Overall, for the last 6 weeks the performance has been good; however, I still get random spikes of CPU to 100% (unresponsive dashboard, 2.9.0) when some process is either performing a DoS or stuck in a lookup loop. I have not got to the bottom of this yet (probably endpoint related). Another thing that has helped dramatically is using @psDevUK's timeout component - this stops a lot of errors on the dashboard.

I am also now looking into sticky sessions behind an Azure load balancer with 2 dashboards running HA. I have this working and it's going into testing tomorrow.


Yeah, I saw that - I definitely intend to tweak the idle time and use the idle timeout component psDevUK made; he did a quick job of putting that together! It will help for sure.

I've one more test to go, with 8 cores. I'm also seeing SQL deadlocks at the highest loads; I'll ask our DBAs whether they know of any config I can adjust to ease this. Another option is to look into locally caching the requests in a sort of queue, to be picked up by a job that limits how often writes back to the DB are made.
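The caching idea could look something like this in UD - a synchronized queue in the $Cache scope, drained by a scheduled endpoint. The interval and names are illustrative, not a finished implementation:

```powershell
# Thread-safe queue shared across endpoints via the $Cache scope.
$Cache:WriteQueue = [System.Collections.Queue]::Synchronized(
    (New-Object System.Collections.Queue))

# In the button endpoint: enqueue instead of writing to SQL directly.
$Cache:WriteQueue.Enqueue(@{ Username = $User; ConfirmedAt = Get-Date })

# A scheduled endpoint drains the queue every 30 seconds and performs
# one batched write, limiting how often the DB is hit.
$Schedule = New-UDEndpointSchedule -Every 30 -Second
New-UDEndpoint -Schedule $Schedule -Endpoint {
    $batch = @()
    while ($Cache:WriteQueue.Count -gt 0) {
        $batch += $Cache:WriteQueue.Dequeue()
    }
    if ($batch.Count -gt 0) {
        # Single multi-row INSERT for the whole batch goes here.
    }
}
```

The trade-off is that a confirmation can sit in memory for up to one interval before it lands in the database, so a crash or recycle in that window would lose it.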

We're going to drive traffic to the page via email links, so the main solution to any of these load issues for us is simply to stagger the emails so that we don't get everyone visiting at the same time. We're expecting around 1,000-2,000 users over a 90-day period, so it should be manageable - probably no more than 30 concurrent connections at peak times.

Also, I have a nightly app pool recycle scheduled at 3am to clear any memory build-up. We're not likely to see any traffic then, but I've also coded a broadcast message warning users 5 minutes prior. This looks up any schedules on the app pool recycle too, so there's no need to update the code if you change the schedule in IIS.
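Roughly, the schedule lookup works by reading the recycle times straight out of IIS. A sketch, assuming the WebAdministration module and an app pool named 'UniversalDashboard' (use your own pool name):

```powershell
# Read the app pool's scheduled recycle times from IIS so the warning
# follows whatever schedule is configured there. Pool name is an example.
Import-Module WebAdministration
$schedule = Get-ItemProperty 'IIS:\AppPools\UniversalDashboard' `
    -Name recycling.periodicRestart.schedule
$times = $schedule.Collection | ForEach-Object { [TimeSpan]$_.value }

# Run this check itself on a short schedule; broadcast a toast to all
# connected sessions when a recycle is 5 minutes or less away.
foreach ($t in $times) {
    $untilRecycle = ($t - (Get-Date).TimeOfDay).TotalMinutes
    if ($untilRecycle -gt 0 -and $untilRecycle -le 5) {
        Show-UDToast -Message 'Maintenance in 5 minutes - please finish up.' `
            -Broadcast -Duration 30000
    }
}
```

Because the times come from IIS at runtime, changing the recycle schedule in IIS Manager automatically moves the warning too.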

It would be interesting to know how you get on with load balancing requests!

Oh, and I'm not personally running the load tests - we have a testing team doing them for me, but I've been told they are using LoadRunner. They initially had difficulty with clicking the buttons and were getting 404 responses, but they worked around it somehow.

Sounds like we are performing very similar tasks here, only I use Azure tables rather than SQL. I have also seen bottlenecks, so I cache all the data, and then a write to the table is done in two passes: like a queue, it updates a placeholder and then validates the update. This seems to have stopped most errors and has allowed me to load balance thanks to the validation steps.

I got load balancing working a treat yesterday, but right at the last minute I tested timeouts using UDTimeout and they fail when UD is behind a LB. Not sure why at this point, but I will troubleshoot - it may be related to probes…

Just an update. After speccing the server up to 8 cores and running the tests again, it runs fine with 250 concurrent connections - all transactions are under 3 seconds, so we could probably go higher. Some coding changes were also made to prevent the deadlocks we saw (there was an UPDATE query using a wildcard, which has now been replaced with a clustered index and an array of Ids).
At this point the only bottleneck for us is CPU, which maxes out when we run the 500-concurrent test; response times then rocket up to around 50 seconds. But I'm pretty happy with the improvement for now - it will get us to where we need to be. :slight_smile:
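For illustration, the deadlock fix was along these lines - the table/column names here are invented, not our real schema:

```powershell
# Before: a wildcard UPDATE scans the table and takes broad locks,
# which deadlocked under concurrent load (illustrative query only).
$before = "UPDATE dbo.Confirmations SET Checked = 1 WHERE TeamName LIKE 'Proj%'"

# After: resolve the target rows first, then update by clustered-index Id
# so locks stay narrow. The Ids come from our own prior query, not user
# input, which is the only reason plain string interpolation is tolerable here.
$ids   = @(101, 102, 103) -join ','
$after = "UPDATE dbo.Confirmations SET Checked = 1 WHERE Id IN ($ids)"
```

Keying the update on the clustered index means SQL Server can take row-level locks on exactly the rows being changed instead of range or table locks from the scan.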
