After watching the whole section of lectures on HPA and CA, a question came to mind that may have a simple answer. Imagine a k8s cluster configured with HPA; let’s focus on pods. When you run out of capacity and breach a pre-defined threshold, new pods are spun up, and the load drops back below that threshold.
That got me thinking: when new pods are created, new containers become available behind the LB, so the load can be distributed better (more containers, more capacity to answer requests). But can the existing open connections be moved from one container to another?
Imagine you have three pods, each running one container of a specific app. They have a max capacity of 100 connections, and the pre-defined CPU threshold is 85% before things start getting weird. My application is now using 86% of CPU, and the pressure starts to show. New pods would be spun up at this point, but do the connections that are already open need to finish their requests where they are, or can they be moved over to the new pods so the load is rebalanced?
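To put numbers on the scenario above, here is a rough sketch of the replica calculation that the Kubernetes docs describe for HPA: desired replicas = ceil(current replicas × current metric / target metric), with no scaling when the ratio is within a tolerance of 1.0 (0.1 by default). The function name `desired_replicas` is made up for illustration, and treating the 85% threshold as the HPA target average CPU utilization is my assumption. It also ignores stabilization windows, pod readiness, and min/max replica bounds.

```python
import math


def desired_replicas(current_replicas, current_utilization,
                     target_utilization, tolerance=0.1):
    """Simplified HPA replica calculation (illustrative helper, not a real API).

    Utilization values are average CPU percentages across the pods.
    """
    ratio = current_utilization / target_utilization
    # HPA skips scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 86% against an 85% target is inside the default 10% tolerance.
print(desired_replicas(3, 86, 85))   # 3
# 100% against an 85% target is outside the tolerance.
print(desired_replicas(3, 100, 85))  # 4
```

Under these assumptions, 86% against an 85% target would not add a pod yet; the average would have to climb past roughly 93.5% before a scale-up happens.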
At that point, wouldn’t you want a queue or something similar that checks whether requests completed and, if one failed, re-sends it to be processed by the newly spun-up containers? Wouldn’t trying to transfer open connections to a new container be a huge bear to tackle?
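The queue idea could look something like the sketch below. It is only an in-memory illustration: `process_with_retries`, `overloaded_pod`, and `new_pod` are hypothetical names, the "pods" are plain Python functions, and the round-robin choice stands in for whatever the load balancer actually does. A failed request goes back on the queue and may land on a different pod on its next attempt. Note that re-sending is only safe when the requests are idempotent.

```python
from collections import deque


def process_with_retries(requests, workers, max_attempts=3):
    """Send each request to a worker; re-queue it on failure.

    Returns (done, failed): results keyed by request, and the requests
    that still failed after max_attempts tries.
    """
    queue = deque((req, 1) for req in requests)
    done, failed = {}, []
    turn = 0
    while queue:
        req, attempt = queue.popleft()
        worker = workers[turn % len(workers)]  # naive round-robin
        turn += 1
        try:
            done[req] = worker(req)
        except RuntimeError:
            if attempt < max_attempts:
                queue.append((req, attempt + 1))  # retry later
            else:
                failed.append(req)
    return done, failed


def overloaded_pod(req):
    # Stand-in for a pod that is under too much pressure to answer.
    raise RuntimeError("pod overloaded")


def new_pod(req):
    # Stand-in for a freshly spun-up pod with spare capacity.
    return "ok:" + req


done, failed = process_with_retries(["a", "b", "c"], [overloaded_pod, new_pod])
print(done, failed)
```

In this run every request eventually completes on `new_pod`, so nothing has to be moved between containers; the work is simply retried.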
I agree. Moving a session from one container to another also implies tracking the session state and the work it is doing. Phew, that’s really a huge thing.