My Kubernetes Installation Notes Using Antrea

As you may know, I have been heavily focused on Kubernetes in the Enterprise for the past 2 years, mainly with VMware PKS and NSX-T as the CNI.

This is a great combination for your enterprise deployments of Kubernetes as it allows you to address all three main stakeholders in large and medium-sized companies:

  1. Developers/DevOps – these are the users. They need a Kubernetes cluster, which really means they need a Kubernetes API to interact with, along with its LCM capabilities. All the underlying infrastructure has to be handled automatically, without a single ticket being opened for network, storage, security, or load balancing. These folks also need the freedom to do whatever they want in their Kubernetes cluster, even though it is managed as a service, and they expect it to behave the same as upstream Kubernetes. (OpenShift is its own thing that some like and others less so, as they prefer straight-up upstream.)
  2. Network team – they care about being able to provide the required level of network services in a consistent operational model. Integration with the physical network using BGP, ECMP, path prepending, etc., is crucial.
  3. Security team – conformance, control, and operational consistency. This team needs to make sure that no one is doing what they are not supposed to, or (god forbid) pushing unauthorized security policies to production.

The user experience with PKS, once it is up and running, is that the folks managing the platform can push a new cluster with a single command (pks create-cluster), while controlling elaborate networking configs using network profiles.
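For a sense of what that looks like, here is roughly the shape of the command; the cluster name, hostname, plan, and profile name below are placeholders, so check pks --help for the exact flags in your version:

pks create-cluster my-cluster --external-hostname my-cluster.corp.local --plan small --network-profile my-net-profile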

While PKS and NSX-T do achieve those objectives, there are cases where developers just want to have Kubernetes on their laptops for testing, or in the datacenter with just upstream Kubernetes. Sometimes, all you want is a minimally viable solution, especially from a CNI point of view. For these fast and easy deployments, NSX-T may be overkill, which is why VMware created a new open-source CNI project called “Antrea”.

Antrea is a nifty CNI, based on OVS instead of iptables, and is easily deployed using a single “kubectl apply” command.
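That single command looks something like this; the manifest URL moves around between releases, so grab the current one from the Antrea repo:

kubectl apply -f https://raw.githubusercontent.com/vmware-tanzu/antrea/master/build/yamls/antrea.yml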

Today it supports basic network functions and network policies. In the future, VMware is planning to provide a way to manage many “Antreas” using the NSX control plane, so that if you have a ton of developers using it, you can control the security policies and network configuration they deploy. As part of the “NSX Service Mesh” testing that I do, I decided I needed to test it out. But then I found out that I have never deployed Kubernetes that is not PKS, and I don’t know what I am doing 🤨

After trying to deploy a cluster the really manual way for a day with no success, I called my go-to person for anything K8s, Oren Penso (@openso). Oren gave me his raw notes on deploying k8s with kubeadm, which I refined a bit, adding Antrea as the CNI and MetalLB as the load balancer. This post is about sharing those notes. So, without further ado, here are the steps:

  • I use Ubuntu as my leader and follower OSs.
    (Leader and follower is what I call what were previously called master and node, following the lead of
    Joe Beda @jbeda, no pun intended 😊)
  • The first thing you want to do is disable swap in the OS, by commenting out the swap entry in /etc/fstab and running:
sudo swapoff -a
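If you prefer a one-liner for the /etc/fstab edit, something like this should work, assuming the standard fstab layout (review the file afterwards to be safe):

sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab   # comment out swap entries so the change survives reboots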
  • Next, we want to install kubeadm, starting on the leader. Now, there are other ways to deploy Kubernetes, such as using kind, or even Cluster API (which I will switch to later this year), but for now, kubeadm is the best way for me. One thing you need to make sure of is that the kubeadm version matches the version of Kubernetes you want to deploy. For example, if I want to deploy the latest, I run the following commands:
# add the Kubernetes apt repo and install the latest kubelet, kubeadm and kubectl
apt-get update && apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
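Optionally, you can also hold the packages so a routine apt-get upgrade does not bump your cluster tooling underneath you:

apt-mark hold kubelet kubeadm kubectl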

But since I wanted to deploy Kubernetes 1.15, I had to install kubeadm and kubelet with a matching version, like this:

apt-get install -y kubelet=1.15.5-00 kubeadm=1.15.5-00 kubectl
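If you are not sure which package versions are available in the repo, apt can list them for you:

apt-cache madison kubeadm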
  • The next step is to deploy Kubernetes itself on the leader. We will run the “kubeadm init” command to do so, with a couple of notes. We want to specify a pod CIDR for our pods’ internal IPs. This relates to the way Kubernetes lets pods communicate internally; these IPs are internal to the cluster, so they don’t conflict with other clusters, but you don’t want the range to overlap with the actual CIDR of the nodes themselves. In my lab the nodes are on 192.168.x.x, so I had to specify a different CIDR. Also, unless you want the latest Kubernetes, you need to specify the version as well; in my case I specified version 1.15, like this:

kubeadm init --kubernetes-version 1.15.5 --pod-network-cidr 172.12.0.0/16 | tee kubeadm-init.out
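When init completes, it prints the next steps, which boil down to pointing kubectl at the new cluster (the join command for the followers is in the same output, which is why I tee it into kubeadm-init.out):

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config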

Istio, mTLS and the OSI layer

I have been playing a lot with Istio and recently tested mTLS encryption. The test, which I describe in this post, really materialized the OSI model in front of my eyes. It is always interesting how new stuff can dust off your old basic knowledge.

The entire concept of service mesh and Istio is exciting and revolutionary in my view… but just like any new groundbreaking tech, it takes a few cycles to realize how it manifests beyond the papers, blogs, and theory, at least for me. So, as I usually do, I share my experiences on this blog and in my sessions with others, in the thought that if I can help even one person understand it better, I have achieved my goal.

NSX-T manager fails to load? It might be that the Corfu DB got corrupted

If you’re like me, and you are spinning up new nested labs left and right, you are also probably over-committing your VMFS datastore regularly.

The issue that happened to me was that I ran out of datastore space, which crashed my NSX-T manager. Perhaps this issue can also happen for other reasons. In any case, the issue manifests itself in not being able to log in to the NSX-T manager, which keeps saying that the service is not ready.

When running the command “get management-cluster status” on the NSX-T manager you may get:

Number of nodes in management cluster: UNKNOWN

Management cluster status: INITIALIZING

Number of nodes in control cluster: UNKNOWN

This problem can happen because the Corfu DB in NSX-T has failed to load. In the case of running out of datastore space, it is almost certainly a corrupted record in the database.

So how do we identify and resolve this issue?

Follow these steps:

  • SSH into the NSX-T manager as the admin user
  • cd to the /config/corfu/log/ directory. Here you should see the log files, serially named (for example, 280.log, 281.log, …)
  • It is recommended to take a backup of the folder first, using cp -R /config/corfu/log/ /config/corfu/log.backup
  • The appliance ships with a log reader tool. Use it to read the latest log: corfu_logReader display <log file name> (for example, 281.log)
  • If the DB is corrupt, the reader (which might take a while to roll through the whole log) will exit with an error; see the loop sketch after this list for a way to find the bad segment without guessing.
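Here is a minimal sketch of that loop, using only the corfu_logReader invocation shown above; it stops at the first log file the reader fails on:

for f in /config/corfu/log/*.log; do
  echo "checking $f"
  corfu_logReader display "$f" > /dev/null || { echo "possible corruption in $f"; break; }
done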