speedtaya.blogg.se - Postgresql for loop intermittent spike cpu hanging

#Postgresql for loop intermittent spike cpu hanging code

Our connection string has the settings MaxPoolSize=100 Timeout=15. I of course expect that a new pod instance will initially need some spin-up time during which it performs below its normal level, but I do not expect a pod to suffocate so often and throw errors on 90% of requests. I have 2 different DbContexts sharing the same connection string (and thus the same connection pool). Each DbContext accesses a different schema in the DB. This was done to keep the architecture modular. The DbContexts never communicate with each other and are never used together in the same request. My current guess is that when a pod is freshly created and immediately receives a huge load, it tries to open all 100 connections at once (this is visible in the DB open sessions chart), which is too much at startup.

We added Minimum Pool Size=20 to the connection string, to always keep at least 20 connections open. This also helped, but pods still sometimes struggle. Our pods start properly after the readiness probe returns success. The readiness probe checks the connection to the DB using standard.

Ensure some connections are always open and ready.

We had long-running transactions, like this: saveDataAsync // under the hood, a short-lived EF Core transaction. This helped a lot, but did not solve the problem completely.


There is a certain scenario in which we receive a huge load spike (from 4k to 20k requests per minute). New pods are created correctly, and then they start to receive load. The problem is that in about 10-20% of cases a newly created pod struggles to handle the initial load: DB requests take over 5000 ms and pile up, finally resulting in an exception saying that the connection pool was exhausted: The connection pool has been exhausted, either raise MaxPoolSize (currently 200) or Timeout (currently 15 seconds). I can see that the other pods are doing well, and also that after the initial struggle all pods handle the load without any issue. Here is what I did when attempting to fix it: I had a few lines of blocking code inside async methods.
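To see why slow DB requests exhaust the pool so quickly, here is a rough back-of-the-envelope sketch using Little's law (requests in flight = arrival rate x time spent per request). It is only an estimate, and it rests on assumptions that are not stated in the post: the spike is spread evenly over 4 pods, every request holds exactly one pooled connection for the full 5 seconds, and the pool size is the 100 from the connection string. All names in the code are made up for illustration.

```python
def in_flight(arrival_rate_per_s: float, latency_s: float) -> float:
    """Little's law: average number of requests in flight at once."""
    return arrival_rate_per_s * latency_s


def max_latency_s(pool_size: int, arrival_rate_per_s: float) -> float:
    """Longest average connection hold time the pool can absorb at this rate."""
    return pool_size / arrival_rate_per_s


# Figures taken from the post; the even split over 4 pods is an assumption.
SPIKE_PER_MINUTE = 20_000
PODS = 4
LATENCY_S = 5.0   # "DB requests take over 5000 ms"
POOL_SIZE = 100   # MaxPoolSize from the connection string

rate_per_pod = SPIKE_PER_MINUTE / 60 / PODS
needed = in_flight(rate_per_pod, LATENCY_S)
budget = max_latency_s(POOL_SIZE, rate_per_pod)

if __name__ == "__main__":
    print(f"arrival rate per pod: {rate_per_pod:.1f} req/s")
    print(f"connections needed at {LATENCY_S:.0f} s per request: {needed:.0f}")
    print(f"latency budget for a pool of {POOL_SIZE}: {budget:.2f} s")
```

Under these assumptions a pod would need roughly 417 connections at once, well above both the configured 100 and the 200 reported in the exception, so requests queue for a connection until the 15 second timeout fires. Read the other way, a pool of 100 only keeps up while the average hold time stays near 1.2 seconds or less.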

I have an application that runs stably under heavy load, but only when the load increases gradually. I run 3 or 4 pods at the same time, and it scales to 8 or 10 pods when necessary. The standard load is about 4000 requests per minute (about 66 requests per second per node, or about 16 requests per second per pod).
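The load figures quoted in this post can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are made up for illustration, and the even split of traffic across pods is an assumption, since the post does not say how requests are balanced.

```python
def per_second(requests_per_minute: float) -> float:
    """Convert a per-minute request rate to requests per second."""
    return requests_per_minute / 60.0


def per_pod(requests_per_minute: float, pods: int) -> float:
    """Requests per second handled by each pod, assuming an even split."""
    return per_second(requests_per_minute) / pods


if __name__ == "__main__":
    # 4000/min is the normal load and 20000/min the spike, both from the post.
    for label, rpm in (("normal", 4000), ("spike", 20000)):
        for pods in (4, 10):
            rate = per_pod(rpm, pods)
            print(f"{label}: {rpm}/min over {pods} pods = {rate:.1f} req/s per pod")
```

With 4 pods the normal load works out to about 16 requests per second per pod, matching the figure above. During the spike the same 4 pods would each see about 83 requests per second until scaling out to 10 pods brings that down to about 33.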