My Kubernetes installation notes using Antrea

As you may know, I have been heavily focused on Kubernetes in the Enterprise for the past 2 years, mainly with VMware PKS and NSX-T as the CNI.

This is a great combination for your enterprise deployments of Kubernetes as it allows you to address all three main stakeholders in large and medium-sized companies:

  1. Developers/DevOps – these are the users. They need a Kubernetes cluster to use, which really only means they need a Kubernetes API to interact with, plus its lifecycle management (LCM) capabilities. All the underlying infrastructure has to be handled automatically, without a single ticket being opened for networking, storage, security, or load balancing. These folks also need the freedom to do whatever they want in their Kubernetes cluster even though it is managed as a service, and they need it to behave the same as upstream Kubernetes. (OpenShift is its own thing, which some like and others don’t, preferring straight-up upstream.)
  2. Network team – cares about being able to provide the required level of network services in a consistent operational model. Integration with the physical network using BGP, ECMP, path prepending, etc. is crucial.
  3. Security team – Conformance, control, and operational consistency. This team needs to make sure that no one is doing what they are not supposed to, or (god forbid) pushing security policies that are unauthorized to production.

The user experience with PKS, once it is up and running, is that the folks who manage the platform can push a new cluster with a single command (pks create-cluster) while still being able to control elaborate networking configs using network profiles.

While PKS and NSX-T do achieve those objectives, there are cases where developers just want Kubernetes on their laptops for testing, or in the datacenter with just upstream Kubernetes. Sometimes all you want is a minimally viable solution, especially from a CNI point of view. For these fast and easy deployments NSX-T may be overkill, which is why VMware has created a new open-source CNI project called “Antrea”.

Antrea is a nifty CNI based on OVS instead of iptables, and it is easily deployed using a single “kubectl apply” command.

Today it supports basic network functions and network policies. In the future, VMware is planning to provide a way to manage many “Antreas” using the NSX control plane, so that if you have a ton of developers using it, you can control the security policies and network configuration they deploy. As part of my testing with NSX Service Mesh, I decided I needed to try it out. But then I realized that I had never deployed a Kubernetes cluster that is not PKS, and that I didn’t know what I was doing 🤨

After trying for a day to deploy a cluster the really manual way, with no success, I called my go-to person for anything K8s, Oren Penso (@openso). Oren gave me his raw notes on deploying k8s with kubeadm, which I refined a bit, adding Antrea as the CNI and MetalLB as the load balancer. This post is about sharing those notes. So, with no further ado, here are the steps:

  • I use Ubuntu as the OS for both my leaders and followers.
    (“Leaders” and “followers” is what I call what used to be called masters and nodes; this follows the lead of
    Joe Beda @jbeda, no pun intended 😊)
  • The first thing you want to do is make sure to disable the swap in the OS by commenting out the swap entry in /etc/fstab and running:
sudo swapoff -a
  • Next, we want to install kubeadm, starting on the Leader. There are other ways to deploy Kubernetes, such as kind or even Cluster API (which I will switch to later this year), but for now kubeadm is the best way for me. One thing you need to make sure of is that the kubeadm version matches the version of Kubernetes you want to deploy. For example, to deploy the latest version, I would run the following commands:
apt-get update && apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl

But if I want to deploy Kubernetes 1.15, which is what I wanted to do, I have to install kubeadm and kubelet with a matching version, like this:

apt-get install -y kubelet=1.15.5-00 kubeadm=1.15.5-00 kubectl
  • The next step is to deploy Kubernetes itself on the Leader. We will run the “kubeadm” command to do so, with a couple of notes. We want to specify a pod CIDR for our pods’ internal IPs. These IPs are only used for pod-to-pod communication inside the cluster, so they do not conflict with other clusters, but you don’t want the range to overlap with the actual CIDR of the nodes themselves. In my lab the nodes are on 192.168.x.x, so I had to specify a different CIDR. Also, unless you want the latest Kubernetes, you need to specify the version as well; in my case I specified version 1.15, like this:

kubeadm init --kubernetes-version 1.15.5 --pod-network-cidr 172.12.0.0/16 | tee kubeadm-init.out
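
As a quick sanity check on the point about overlapping ranges, here is a minimal sketch of how the pod CIDR could be compared against the node network before running kubeadm init. The helper names ip_to_int and cidrs_overlap are made up for this example (they are not part of kubeadm or Antrea), and 192.168.0.0/16 is only an assumption standing in for the 192.168.x.x node network mentioned above. It handles plain IPv4 only.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of kubeadm or Antrea.

# Convert a dotted-quad IPv4 address (e.g. 192.168.0.1) to an integer.
ip_to_int() {
  local IFS=.
  set -- $1            # split the address into its four octets
  echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
}

# Succeed (exit status 0) when two IPv4 CIDR blocks share at least one address.
# Two blocks overlap exactly when they agree on the shorter of the two prefixes.
cidrs_overlap() {
  local ip1 len1 ip2 len2 len mask n1 n2
  ip1=${1%/*}; len1=${1#*/}
  ip2=${2%/*}; len2=${2#*/}
  if [ "$len1" -lt "$len2" ]; then len=$len1; else len=$len2; fi
  if [ "$len" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  fi
  n1=$(ip_to_int "$ip1")
  n2=$(ip_to_int "$ip2")
  [ $(( n1 & mask )) -eq $(( n2 & mask )) ]
}

# Assumed values: the pod CIDR from the kubeadm init command, the node network as a /16.
if cidrs_overlap 172.12.0.0/16 192.168.0.0/16; then
  echo "pod CIDR overlaps the node network, pick another range" >&2
fi
```

If the check prints nothing, the two ranges are disjoint and the pod CIDR can be passed to --pod-network-cidr as shown.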

Istio, mTLS and the OSI layer

I have been playing a lot with Istio and recently tested mTLS encryption. The test, which I describe in this post, really materialized the OSI layers in front of my eyes. It is always interesting how new stuff can dust off your old basic knowledge.

The entire concept of service mesh and Istio is exciting and revolutionary in my view… but just like any new groundbreaking tech, it takes a few cycles to realize how it manifests beyond the papers, blogs and theory, at least for me. So, as I usually do, I share my experiences on this blog and in my sessions with others, in the hope that if I can help even one person understand it better, I have achieved my goal.

Only I have the solution! and it is…

We live in a truly hyped era. Kubernetes, Docker, Istio, Serverless, PaaS, CaaS, FaaS: you name the buzzword. These words draw all the attention of the Dev/IT worlds. Interestingly enough, only a small percentage of organizations actually employ these technologies today, in production or even at all.

Like any new tech, there are barriers of knowledge and investment to get in, and the cost of moving to these platforms versus the pain they solve is hard to quantify. For each one of these trends, and more that I may have forgotten, there is a group of followers who see these solutions as the be-all and end-all for every problem conceivable.

Service mesh is just another form of virtualization

When I started working with VMware ESX in the early 2000s, I knew it was a very cool tech; and not only me, everyone knew there was something special about it.

However, I didn’t grasp the full value of this technology right out of the gate. At that point, I only saw “server consolidation” in front of me.

When vMotion came out, we realized that physics had changed for our servers: we were no longer tied to the hardware the server was running on. That hardware abstraction allowed us to do things we couldn’t do before, like fixing hardware issues or patching with no downtime, scaling much better and faster by deploying VMs when we need them, and monitoring the health of the infrastructure much better, even self-healing. A new, exciting world of agility we had never seen before opened up.

 

Due to the above combined with automation, the effort of managing servers has been lowered, and fewer people are needed to manage fleets of servers.

What does that have to do with Service mesh, you ask?

Recently I started focusing on Service mesh, mainly Istio: testing it in the lab, learning the technology and feeling that magic again. While the technology is cool, I was trying to understand the business value beyond buzzwords like distributed, load balancing, observability, etc. At some point, I realized that I was looking at it all wrong. I was looking for the value from a network operations point of view; it was only when I looked at it from a developer’s point of view that it clicked.

Service mesh is a form of virtualization

When I get excited, I let the world know; that’s why I love Twitter.

I see a strong equivalence between Service mesh and virtualization.

In the monolithic app world, many of the different pieces of code that make up the application or service run on a small set of servers, so decisions about how each component interacts with other parts of the application are written in the code.

That means that every piece of meaningful code that differentiates the business the application is serving needs to have a lot of non-differentiating code along with it.

Things like server- and client-side communication, service lookups, error detection and response, telemetry, and security are taken care of in the code or in middleware software.

With the rise of micro-services (and the use of containers for that purpose), each container now runs a piece of differentiating code and is a single-purpose server that communicates with other services over the network. The distributed architecture and the proliferation of micro-services bring new challenges in managing, monitoring and troubleshooting problems.

 

What service mesh and Istio do is outsource the non-differentiating work to sidecars running Envoy, where each k8s pod now has a proxy that is responsible for communicating with other proxies and with anything outside the mesh. (Envoy can work with more than k8s pods; it can even work with VMs or Pivotal PAS AIs!)

Now we’ve abstracted the non-differentiating code. Similarly to the value we gained by virtualizing the hardware with hypervisors and adding a control plane, we gain value in the operation of the proxies by adding a control plane in the form of Istio. (I will not go into the deeper architecture in this post; there are literally hundreds of posts about it out there.)

Here is a diagram to illustrate the abstraction layers in one picture

We can apply our desired state as policies to anything that is not the core function of our software, change those policies on the fly without changing our code (which saves developers a lot of effort), apply security and authentication to transactions, and get better visibility into application health. Self-healing becomes a real thing now.

But just as virtualization brought its own set of challenges, Service mesh is no different, which I will cover in my next post.

You can read more about the details of Istio features in this blog post: https://blogs.vmware.com/opensource/2018/10/16/service-mesh-architectures-inevitable

I think this analogy explains the subject, and the proliferation of abstraction layers brings a new set of challenges from a management point of view.

Have any thoughts on this? Tweet your reply

@niranec

Niran

NSX-T manager fails to load? It might be that the Corfu DB got corrupted

If you’re like me, and you are spinning new nested labs left and right, you are also probably over-committing on your VMFS datastore regularly.

The issue that happened to me was that I ran out of datastore space and it crashed my NSX-T manager. Perhaps this issue can also happen for other reasons. In any case, the issue manifests itself as not being able to log in to the NSX-T manager, which keeps saying that the service is not ready.

When running the command “get management-cluster status” on the NSX-T manager, you may get:

Number of nodes in management cluster: UNKNOWN

Management cluster status: INITIALIZING

Number of nodes in control cluster: UNKNOWN

This problem can happen because the Corfu DB in NSX-T has failed to load. In the case of running out of datastore space, it is almost certainly a corrupted record in the database.

So how do we identify and resolve this issue?

Follow these steps:

  • SSH in to the NSX manager as the user admin.
  • cd to the /config/corfu/log/ directory. Here you should see the log files, named serially (for example 280.log, 281.log, …).
  • It is recommended to take a backup of the folder using cp -R /config/corfu/log/ /config/corfu/log.backup
  • The appliance includes a log reader tool. Use it to read the latest log, e.g. corfu_logReader display <log file name> (for example 281.log).
  • If the DB is corrupt, the log reader (which might take a while to roll through the log) will exit with an error.
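
The steps above can be strung together as a small shell sketch. Only the /config/corfu/log path, the cp -R backup and the corfu_logReader display command come from the steps themselves; the helper latest_numbered_log is a made-up name, and the sketch assumes every log file is named <number>.log and that the log reader is run from inside the log directory. Treat it as a starting point, not a supported procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the highest-numbered <N>.log file name in a directory.
latest_numbered_log() {
  ls "$1" | grep -E '^[0-9]+\.log$' | sort -n | tail -n 1
}

log_dir=/config/corfu/log

# Only act when this really looks like an NSX-T manager with the log reader installed.
if [ -d "$log_dir" ] && command -v corfu_logReader >/dev/null 2>&1; then
  cp -R "$log_dir/" /config/corfu/log.backup   # take the backup first
  cd "$log_dir" || exit 1
  corfu_logReader display "$(latest_numbered_log .)"
fi
```

The numeric sort matters once the counter gains a digit: a plain alphabetical sort would place 1000.log before 281.log.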

    What are these Spectre and Meltdown vulnerabilities all about

    For any of my friends who are not computer savvy, or who usually don’t care: in this post I’ve digested the info for you about the security bug in CPUs, which is a BIG DEAL. You will soon start to hear the words #Meltdown and #Spectre a lot regarding your computer security. Allow me to explain at a very high level; hopefully this helps some of you better understand the biggest security bug in history:
    Meltdown is the name of a vulnerability found mainly in Intel CPUs, where security is compromised to gain more speed. Basically, Intel engineers designed their CPUs to be more performant but neglected to make sure they were secure enough, and the result is that one piece of code running on an Intel CPU can read the “kernel memory” of the operating system (OS). Think of the kernel memory as your brain’s secret thoughts: what would happen if I gained access there? In the computer world, that’s where all your passwords are, for example.
    The patches that are coming out for this one are on the OS side (Windows, Linux, etc.) and they are expected to slow down all Intel chipsets by 30%-50%. Yes, your computer will be slower.
    Do not underestimate this problem; code and guides on how to exploit this vulnerability are already surfacing (see the link below).
    The second name you might hear is “Spectre”. This is a vulnerability that affects ALL CPU vendors. And the worst thing is that this cannot be patched: it’s a basic design flaw, and it will stay with us for at least a decade until the current HW cycle gets refreshed worldwide. Fortunately, this one is much harder to exploit. We will have to see how this plays out.
    The most worrisome use case, besides someone getting the password to your grandma’s bank accounts, is shared HW, especially in the cloud. Think of one customer who rents compute resources from the cloud and is able to read the passwords and data of other customers running on the same HW. Maybe your bank is the victim? And this affects everyone!
    That’s it, hope this helps, let me know your thoughts.
    Those who want to read more, see this link: https://meltdownattack.com/

    My VMworld sessions recordings from 2017

    This year’s VMworld was the busiest and most unbelievably awesome I have been to. This year I was also extra busy myself with 3 breakout sessions, including one with a Microsoft PM on stage talking about our joint work; that one unfortunately was not recorded.
    Check out the session recordings here:

    VIRT2211BU – Automating NSX for Virtual Machines and Containerized Applications

    VIRT1930PU – SQL Server on vSphere: A Panel with Some of the World’s Most Renowned Experts

    EU recording

    VIRT2211BU – Automating NSX for Virtual Machines and Containerized Applications

    My sessions recordings from VMworld US 2016

    It is so nice that VMworld has released the session recordings from VMworld US publicly for everyone. Thanks to William Lam for publishing all the direct URLs here: https://github.com/lamw/vmworld2016-session-urls

    As for the sessions themselves, we had a nice turnout of about 220 folks in each session and the reviews were great.

    Here are the recordings:

    VIRT7575 – Architecting NSX with Business Critical Applications for Security, Automation and Business Continuity

    VIRT7654 – SQL Server on vSphere: A Panel with Some of the World’s Most Renowned Experts

     

    Have fun,

    Niran

    My VMworld in 2016

    This is it: this year I am finally taking a very active role at VMworld after a few years of only being an attendee (except for one session at VMworld Europe in 2009).

    For this year’s VMworld I am going to take on the role of booth captain for the Virtualizing Apps track booth (YES!). I will be working with a staff of 4: Sudhir Balasubramanian, Vas Mitra, Agustin Malanco the man (Twitter – @agmalanco) and Ryan DaWaele. Such a great crew!

    We are planning 2 stations this year. Station #1 is going to run the traditional demos for business critical applications with vSphere, with features like DRS, vMotion, etc., and new this year, vVols and vRA.

    Station #2 is new this year: a second station solely focused on business critical apps with NSX demos. We are already working really hard on developing these demos, so I don’t want to spoil it, but it is going to be epic! Really cool stuff around Oracle RAC, SQL, SAP, etc. with really cool NSX demos. Expect to be wowed.

    That’s not all, I have 2 sessions this year:

  • Architecting NSX with Business Critical Applications for Security, Automation and Business Continuity [VIRT7575] – A session where my colleague Sudhir Balasubramanian and I are going to cover the business critical apps use cases with NSX, for app owners and networking folks who are interested in applying NSX goodness to their apps.
  • SQL Server on vSphere: A Panel with Some of the World’s Most Renowned Experts [VIRT7654] – I will be facilitating a panel of world-renowned SQL Server experts about anything SQL on VMware. The panelists are:
    Denny Cherry, Twitter – @mrdenny, Principal Consultant, Denny Cherry & Associates Consulting
    Allan Hirt, Twitter – @SQLHA, Managing Partner, SQLHA LLC
    David Klee, Twitter – @kleegeek, Founder, Heraflux Technologies
    Thomas Larock, Twitter – @SQLRockstar, Head Geek, SolarWinds

    VMware NSX Question – Can You Figure it Out?

    I wrote a blog post on the official VMware blog about a demo I recorded called “Dynamically enforcing Security On a Hot Cloned SQL Server With VMware NSX”.

    A bit long of a title but captures the essence of the demo perfectly. You can see the demo as well here:

    I got a question from a colleague of mine who has a very keen eye:

    “I just saw the great video you made. At 0:50 of the demo we can see the rules for the prod app.

    What is the meaning of rule 6? If the source is the datacenter and is broader than the App Server in rule 5, and the rule allows for ANY service, doesn’t it make rule 5 irrelevant?”

    This is a great observation by Manuel, with a very simple explanation that perfectly demonstrates the power of VMware NSX. Can you figure out the answer?

     

    Rule 6 makes sense, only if you know your NSX 🙂