I’m having some difficulty understanding why one would expose a Service at the node level, on the node network, thus bypassing K8s’ own load-balancing solution. Can someone provide some concrete situations? Thanks.
I’m quite new to Kubernetes and I’m happy to be corrected but here’s my understanding so far:
The context needs amending: Kubernetes does not have its own load-balancing solution out of the box. When you set a Service’s type to LoadBalancer, Kubernetes delegates to an external controller to provision the actual load balancer (in the cloud this is the provider’s integration; on bare metal it’s something like MetalLB). What you will find in the real world is that cloud providers have this all set up and it will be triggered via APIs when you provision a Kubernetes Service; but if someone has not done the hard work of setting up the load balancer integration, then it can be quite involved to do so (and most of the options have trade-offs).
Now, there are at least a couple of reasons you’d choose NodePort over LoadBalancer:
1. As per above, having the LoadBalancer infrastructure setup can be a massive pain if it’s not already done for you.
2. Following on from 1, if you are setting this up yourself and you have a hardware load balancer, or some other kind that Kubernetes cannot drive via an API, then it’s easy to simply tell that load balancer to use port x on nodes 1–20.
3. Public IPs can be a finite resource.
For every LoadBalancer Service created you consume a ‘public IP’ (from the perspective of the cluster network).
The whole reason to have a cluster network overlay in the first place is to have a huge addressable space of IPs that is difficult to deplete. Conversely, you are typically drawing your public IPs from a far more finite pool than the cluster network. Which leads us on to…
4. Most cloud providers charge for public IPs.
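To make the NodePort option concrete, here’s a minimal Service manifest — the app name and all the port numbers are just illustrative, not anything from the course:

```yaml
# Illustrative NodePort Service; name, labels and ports are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport
spec:
  type: NodePort
  selector:
    app: web           # matches pods labelled app=web
  ports:
    - port: 80         # port on the Service's cluster-internal ClusterIP
      targetPort: 8080 # port the pod's container actually listens on
      nodePort: 30080  # opened on every node; must fall in 30000-32767 by default
```

A hardware or otherwise externally managed load balancer can then simply be pointed at port 30080 on every node, as per reason 2 above.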
On the other hand, using NodePorts means you will likely need some NAT in front of the cluster if your clients expect to reach the service on a well-known port such as 80 or 443 (NodePorts are allocated from a high range, 30000–32767 by default), which is very often the case.
And as described in the following lesson on "The Service Network", the Service network behind a NodePort Service isn’t really a load balancer; it’s more like a layer-4 forwarder: each node has rules to forward traffic arriving on a given port to a node that actually hosts one of the backing pods, bearing in mind that the pods the Service selects may exist on only some nodes in the cluster and not others.
Moreover, as mentioned in the course, this is not a very scalable option. Even if we ignore the overhead of kernel packet forwarding/switching, there is an inherent cost in forwarding packets from one host to another (the worst-case scenario). You also now have the problem of how to distribute your incoming client traffic across the nodes, and if you get that wrong by doing something naive like DNS round-robin, you can end up with even worse performance problems.
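For a rough idea of what that layer-4 forwarding looks like under the hood, kube-proxy (in its iptables mode) programs NAT rules roughly along these lines — this is a heavily simplified sketch, and the chain names, pod IPs and port numbers are all made up for illustration:

```shell
# Simplified sketch of kube-proxy's iptables-mode rules (not real output).
# Traffic arriving on ANY node at the NodePort is matched...
iptables -t nat -A KUBE-NODEPORTS -p tcp --dport 30080 -j KUBE-SVC-WEB

# ...and DNATed to one backend pod chosen at random. The chosen pod may
# live on a different node, in which case the packet gets forwarded there.
iptables -t nat -A KUBE-SVC-WEB -m statistic --mode random --probability 0.5 \
  -j DNAT --to-destination 10.244.1.5:8080
iptables -t nat -A KUBE-SVC-WEB \
  -j DNAT --to-destination 10.244.2.7:8080
```

The second rule is the fall-through for the other 50% of connections; with more backends the probabilities are staggered so each pod gets an equal share.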
As an aside, I’d highly recommend not learning on a public cloud but instead deploying Kubernetes on physical hardware.
You’ll find that the network setup is a real PITA with a steep learning curve, but you will get an understanding you’ll never get on the cloud, where it’s already done for you.
Services that use NodePort are, in fact, load balancers. Indeed, ALL Service types act as a load balancer; that’s one of the things a Service does. So you are not bypassing the K8s load balancer.

If your cluster is public and the nodes are reachable, there are a number of reasons why you might want to use a NodePort. The main one that comes to my mind is cost: provisioning a cloud provider load balancer costs additional money. (By the way, a Service of type LoadBalancer also creates a NodePort. 😉)

Another reason is to apply firewall rules at the OS level (on the node), giving you finer-grained control. Yes, I am aware you can do that with your cloud provider as well. I am sure people have other reasons, but it’s good to be aware that the option is available.

That said, I agree with the premise of the question: the best thing is to make your cluster private and use a cloud provider load balancer.
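On that parenthetical point — that a LoadBalancer Service also creates a NodePort — the manifest itself gives no hint of it (names and ports below are illustrative):

```yaml
# Illustrative LoadBalancer Service; name, labels and ports are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: web-lb
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 443        # port the external load balancer exposes
      targetPort: 8443 # port the pod's container listens on
```

Once applied, `kubectl get svc web-lb` will show a port pair like `443:3xxxx/TCP` — the second number is an auto-allocated NodePort, which is what the cloud load balancer actually forwards to on the nodes. On recent Kubernetes versions you can opt out with `spec.allocateLoadBalancerNodePorts: false`, for providers whose load balancers route straight to pods.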