Kubernetes – Offline Installation Guide (Part 2 – Master, Workers and GPUs)


Architecture

To re-iterate, the setup is as follows (CentOS 7 based):

[Architecture diagram]

Master

We will use Weave Net as the pod network, so get the Weave Net startup YAML file:

export kubever=$(kubectl version | base64 | tr -d '\n')
wget "https://cloud.weave.works/k8s/net?k8s-version=$kubever" -O weave.yaml

Alternatively, use this file that I pre-downloaded on an internet-connected computer, and save it as weave.yaml. Link

At the node designated as your K8s master, type:

kubeadm init --kubernetes-version=v1.8.1 --apiserver-advertise-address=10.100.100.1

Take note of the join command; save it to a text file or something, because you will need it to join your other nodes later.

As your admin user (user requires “wheel” access), run the following. Don’t run it as root.

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f weave.yaml

This procedure sets up your current user with the cluster's admin kubeconfig (making them a Kubernetes admin user) and applies the Weave Net pod network.
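A quick way to confirm that kubectl is now talking to the cluster as this user is to list the system pods; once the Weave YAML has been applied you should see the weave-net pods coming up alongside the control-plane pods:

kubectl get pods --all-namespaces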

Allow pods to be scheduled on GPU1, the master node. Our master has 512GB of RAM and 8 GPUs; no way we're gonna waste that!

kubectl taint nodes --all node-role.kubernetes.io/master-

At GPU1 master,

kubectl get svc
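If you want just the cluster IP by itself (handy when scripting the route commands in the worker section below), jsonpath can pull it straight out of the kubernetes service:

kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}'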

​Take note of the cluster IP. You will need it for the next step.

Worker

route add <cluster-ip> gw 10.100.100.1

Add the above line (with <cluster-ip> replaced by the cluster IP you noted earlier) to /etc/rc.local on the worker nodes only, so that the route is re-created on every boot. The worker nodes sometimes reference the cluster IP, which they cannot reach directly; this routes those requests through the master node, which is 10.100.100.1.

In our setup, we have an external yum repository server located at 168.102.103.7. Hence, for all our containers to have access to this server, we also add the following line to the rc.local file:

route add 168.102.103.7 gw 10.100.100.1

Remember to do 

chmod +x /etc/rc.d/rc.local

 so that this file is executable during the boot sequence process. 
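For reference, /etc/rc.d/rc.local on each worker then ends up looking something like this (a sketch; substitute the cluster IP you noted on the master for <cluster-ip>, and leave the stock CentOS 7 contents in place):

#!/bin/bash
# ... stock CentOS 7 rc.local contents (e.g. touch /var/lock/subsys/local) stay as-is ...
route add <cluster-ip> gw 10.100.100.1
route add 168.102.103.7 gw 10.100.100.1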

Reboot, then check the routes using

route -n

Joining the Worker(s) to the Master Node

Remember the join command that was printed out when you created the master node? We need it now for joining the node to the master.

kubeadm reset
kubeadm join --token=… 10.100.100.1:6443 --discovery-token-ca-cert-hash sha256:…

NOTE: If the token has expired (unlikely if you're doing Part 1 and Part 2 all in one shot), at the master node run:

kubeadm token create --ttl=0

to create a token that never expires. Use

kubeadm token list

to see the tokens.

Enabling Your NVIDIA GPUs

sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

To the ExecStart line in that file, append --feature-gates="Accelerators=true":

ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS […] --feature-gates="Accelerators=true"

If the above doesn't work for some reason, try adding --feature-gates=Accelerators=true instead (without the quotes).

Then reload systemd, and enable and restart the kubelet service so the new flag takes effect.

systemctl daemon-reload
systemctl enable kubelet && systemctl restart kubelet

Follow the official K8s instructions here on how to make use of your GPUs in POD files.
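For a rough idea of what that looks like, here is a minimal sketch of a GPU-requesting pod under the old Accelerators feature gate. The alpha.kubernetes.io/nvidia-gpu resource name is what the 1.8-era gate used; the image tag, the nvidia-smi path and the hostPath for the driver files are placeholders you will need to adapt to your own systems.

cat <<'EOF' > gpu-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:8.0-runtime-centos7      # example image; match your driver/CUDA version
    command: ["/usr/local/nvidia/bin/nvidia-smi"]
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1       # request one GPU via the alpha resource
    volumeMounts:
    - name: nvidia-driver
      mountPath: /usr/local/nvidia
      readOnly: true
  volumes:
  - name: nvidia-driver
    hostPath:
      path: /usr/lib/nvidia-384                 # placeholder: point at your host's NVIDIA driver files
EOF
kubectl create -f gpu-pod.yaml

If the pod runs, kubectl logs gpu-test should show the nvidia-smi output.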

Verification

Go to the master node (10.100.100.1), and type

kubectl get nodes

You should see a list of all the nodes that have joined your master. You can start running your pod yaml files now.

Try running GPU-enabled POD files following the instructions here.

Crap!

Didn't work? I compiled a list of issues I encountered and successfully troubleshot; to be continued!

Kubernetes – Offline Installation Guide (Part 1 – Setting Up)


A while back, I had the chance to set up a Kubernetes cluster on a group of GPU-enabled servers at my workplace. Each server had 8 NVIDIA GTX 1080 Ti GPUs, 512GB of RAM, and 72 CPU cores.

The criteria were as follows:

  1. Resource management: The user should not need to manage the resources of the cluster. Containers should be scheduled onto whichever server can satisfy their requirements. Easily resolved by Kubernetes.
  2. Easy horizontal scaling: One should be able to easily add new servers to the cluster.
  3. Offline install: There is, and will be, a security air gap between the servers and the external internet, meaning we would have to install Kubernetes in an offline way.
  4. GPU acceleration: Our users who run TensorFlow algorithms required GPU acceleration within their containers.

I jumped at the chance to experiment with containers and K8s and volunteered. The setup looks something like this:

[Architecture diagram]

Kubeadm

Kubeadm is an installer for Kubernetes and is well supported by the K8s community. For our offline installation, we mirrored the kubeadm setup steps closely whenever we could.

https://kubernetes.io/docs/setup/independent/install-kubeadm/

However, kubeadm is still largely online-based; an internet connection is assumed.

You will need a PC with internet access to download the RPMs and Docker images for this installation. I am doing this on a CentOS 7 system, so the repository handling will be yum-based.

Download the Required Kubernetes RPMs

You can use yum --downloadonly to download all the required RPMs; see the link below for a guide on how to use it.

 https://www.ostechnix.com/download-rpm-package-dependencies-centos/

The required files are

  • kubeadm 1.8.1
  • kubectl 1.8.1
  • kubelet 1.8.1
  • kubernetes-cni 0.5.1-1
  • ebtables 2.0.10-15
  • ethtool 4.8-1

It’s all here in a zip archive for you lazy ones. These are the ones that I tested against. Of course, you may want to download the latest and greatest ones.  Link
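If you would rather fetch them yourself, a download run on the internet-connected machine might look like the sketch below. This assumes you have already added the upstream Kubernetes yum repo on that machine (as described on the kubeadm install page) and that the version strings still match the list above.

mkdir -p ~/k8s-rpms
sudo yum install --downloadonly --downloaddir=$HOME/k8s-rpms \
    kubeadm-1.8.1 kubectl-1.8.1 kubelet-1.8.1 kubernetes-cni-0.5.1 ebtables ethtool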

Install Kubeadm and Friends

Install kubeadm, kubectl, kubelet and kubernetes-cni, and start the kubelet service. In the directory containing the downloaded RPMs, run

yum install *.rpm

to install all the rpm files at once.
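The kubeadm install guide then enables and starts the kubelet service; the same command works here once the RPMs are in place:

systemctl enable kubelet && systemctl start kubelet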

Install Docker

Follow the steps to install Docker here: https://kubernetes.io/docs/setup/independent/install-kubeadm/

It should be available in your Linux distribution's repository; configure your distribution to use a local repository mirror. Alternatively, download the Docker RPMs on an internet-connected machine as well (same steps as above, using yum --downloadonly).

Typically,

yum install docker

Getting the System Ready

Enable and Start Docker Service

systemctl enable docker && systemctl start docker

Turn off the system swap.

sudo swapoff -a

Remove/comment swap entries in /etc/fstab.

sudo vim /etc/fstab

It looks something like this.

/dev/VolGroup00/LogVol02   swap     swap    defaults     0 0
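If you prefer to do this non-interactively, a one-liner along these lines comments out any swap entries (a sketch; eyeball your fstab before and after):

sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab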

Disable SELinux

setenforce 0
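Note that setenforce 0 only lasts until the next reboot. To keep SELinux permissive across reboots, also update /etc/selinux/config, for example:

sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config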


Edit k8s.conf

vim /etc/sysctl.d/k8s.conf

In k8s.conf, add the lines

net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1

Then apply the settings with:

sysctl --system
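Equivalently, if you prefer a non-interactive one-shot, something like this writes the same two lines and reloads sysctl:

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system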


Download the Docker Images

Kubeadm runs most of the required Kubernetes components from container images, which is a great design: the underlying operating system is kept relatively “clean” compared to something like OpenStack.

After installing kubeadm on an online machine, you can inspect that installation to see the list of container images it uses.

sudo docker images

Those that start with gcr.io/google_containers and weaveworks are what you need.

You can pull and save the container images one by one using the following commands. The example here pulls kube-apiserver-amd64, v1.8.1.

docker pull gcr.io/google_containers/kube-apiserver-amd64:v1.8.1

docker save gcr.io/google_containers/kube-apiserver-amd64:v1.8.1 > kube-apiserver.tar

<repeat for all required containers>
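To script the whole thing, a loop along these lines works. The version tags below are my best guess for a 1.8.1 install; verify them against sudo docker images on your online machine before relying on them, and add the weaveworks/weave-kube and weaveworks/weave-npc images at whatever version your weave.yaml references.

for img in \
    gcr.io/google_containers/kube-apiserver-amd64:v1.8.1 \
    gcr.io/google_containers/kube-controller-manager-amd64:v1.8.1 \
    gcr.io/google_containers/kube-scheduler-amd64:v1.8.1 \
    gcr.io/google_containers/kube-proxy-amd64:v1.8.1 \
    gcr.io/google_containers/etcd-amd64:3.0.17 \
    gcr.io/google_containers/pause-amd64:3.0 \
    gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.5 \
    gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.5 \
    gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.5
do
    docker pull "$img"
    docker save "$img" > "$(basename "${img%%:*}").tar"   # e.g. kube-apiserver-amd64.tar
done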

Alternatively, the easy way is to download all the tar files I used for my working install here. Link


Load Docker Images into the Kubernetes Computer

CentOS should come with Python, which is needed to run the loader script below.

If you have downloaded my containers zip bundle v1.8.1_k8s_containers.zip, you can run:

python load_k8s_containers.py

to load all of the container images in the directory into your Kubernetes computer's local Docker image store. It is just a script that runs the docker load command for every image archive in the directory.
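If you would rather not use the Python script, an equivalent shell loop is:

for f in *.tar; do sudo docker load < "$f"; done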

At this juncture, your system is ready to run kubeadm. The next part will focus on setting up the master K8s node and getting the worker nodes to join it. It should have been fairly straightforward, but there were some nitty-gritty things that I encountered along the way. I will also cover the additional setup required for GPU acceleration on each of the 8-GPU servers.

Part 2 – Master, Workers and GPUs