2 Answers
Hi Simon.
First up, I’d say that the EP (Endpoints) object updates itself by watching the API server (rather than the SVC updating the EP). All comms, events etc. go via the API server.
Re noticing a failed Pod… Without looking at the code, I’d say this. Obviously there are different circumstances. However, usually the kubelet on the node running the Pod will notice it has failed and inform the API server (it might perform a retry of its own first – I honestly can’t remember off the top of my head). It is then the responsibility of the control plane (coordinated through the API server) to find it a new home. The EP object will be watching the API server, will notice the change, and will update itself accordingly. The SVC object will obviously leverage the updated EP object as normal.
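You can actually see this watch-driven behaviour from the command line. A minimal sketch, assuming a Deployment and Service both called web (a made-up name) already exist in the cluster:

```shell
# Watch the Endpoints object for the (hypothetical) "web" Service.
# Every time a backing Pod dies or is replaced, the API server
# streams an update and the Pod IP list shown here changes.
kubectl get endpoints web --watch

# In a second terminal, kill one of the backing Pods to trigger it.
# The controller notices via the API server, starts a replacement,
# and the Endpoints list updates with the new Pod's IP.
kubectl delete pod -l app=web --wait=false
```

The Service itself never talks to the Pods directly – it just keeps selecting on the labels, and the Endpoints object it fronts is what tracks the live Pod IPs.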
HTH
There is a Master tunable, terminated-pod-gc-threshold, that plays a part in this, along with the other abstractions, e.g. Deployments, ReplicaSets, and the scheduler. If the Pod is part of a ReplicationController, ReplicaSet, or Deployment, its restart policy is Always. By contrast, if it were part of a Job, its restart policy would be Never (or OnFailure). So if a Pod dies, the responsible controller will notice and start a new one, and if a node fails the same applies. All of this happens via the API server coordinating with the other controllers and the scheduler.
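To make the restart-policy point concrete, here is a sketch of the relevant manifest fields (the names web and pi are made up for illustration; terminated-pod-gc-threshold itself is a kube-controller-manager flag that caps how many terminated Pods are kept around before garbage collection):

```yaml
# Pods templated by a Deployment/ReplicaSet use restartPolicy: Always
# (it's the default, and the only value a Deployment permits).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                   # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      restartPolicy: Always
      containers:
      - name: web
        image: nginx
---
# A Job's Pod template must use Never or OnFailure instead,
# since the Pod is meant to run to completion.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi                    # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: pi
        image: perl
        command: ["perl", "-e", "print 22/7"]
```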
Thanks Nigel,
this way it also makes sense why EP and SVC are different entities… the EP does the bean counting with the API and the SVC just refers to it, as it should know how many beans (I mean Pods) are out there… smarties!