After force-rebooting the PSU service, the job now shows as failed with this error:
@{error=Status(StatusCode="Unavailable", Detail="failed to connect to all addresses", DebugException="Grpc.Core.Internal.CoreErrorDetailException: {"created":"@1664369610.470000000","description":"Failed to pick subchannel","file":"..\..\..\src\core\ext\filters\client_channel\client_channel.cc","file_line":3218,"referenced_errors":[{"created":"@1664369610.470000000","description":"failed to connect to all addresses","file":"..\..\..\src\core\lib\transport\error_utils.cc","file_line":165,"grpc_status":14}]}"); line=}
We are running into issues where our ticketing system drops calls if the endpoint/API doesn't respond within 30 seconds. This is something we cannot change. We sometimes have 10+ endpoint calls that fire off other PSU scripts, and this crushes the server's CPU with external pwsh processes. When the CPU is pegged, endpoint calls tend to hang.
We are just looking for better ways of integrating these jobs with our ticketing system.