Kubernetes – Offline Installation Guide (Part 2 – Master, Workers and GPUs)


Architecture

To re-iterate, the setup is as follows (CentOS 7 based):

[Architecture diagram]

Master

We will use Weave Net for our pod network, so get the Weave Net startup YAML file.

export kubever=$(kubectl version | base64 | tr -d '\n')
wget -O weave.yaml "https://cloud.weave.works/k8s/net?k8s-version=$kubever"

Alternatively, use this file I pre-downloaded on an internet-connected computer. Link

At the node designated as your K8s master, type:

kubeadm init --kubernetes-version=v1.8.1 --apiserver-advertise-address=10.100.100.1

Take note of the join command and save it to a text file, because you will need it to join your other nodes later.

As your admin user (user requires “wheel” access), run the following. Don’t run it as root.

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f weave.yaml

This procedure gives your current user admin access to the Kubernetes cluster through kubectl.
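At this point a quick sanity check doesn't hurt; something like the following should eventually show all kube-system pods, including the Weave Net ones, in the Running state (pod names and counts will differ on your cluster):

kubectl get pods -n kube-system
# all pods, including weave-net-xxxxx, should reach Running after a minute or two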

Allow pods to be scheduled on GPU1 master. Our master has 512 GB of RAM and 8 GPUs, so no way we're gonna waste that!

kubectl taint nodes --all node-role.kubernetes.io/master-
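To confirm the taint is gone, you can describe the master node; gpu1 below is just a stand-in for whatever your master node is actually named:

kubectl describe node gpu1 | grep -i taint
# should report: Taints: <none>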

At GPU1 master,

kubectl get svc
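On a vanilla kubeadm install this prints something roughly like the following; the CLUSTER-IP of the kubernetes service (10.96.0.1 is the kubeadm default, yours may differ) is the one you want:

# NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
# kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   1h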

Take note of the cluster IP. You will need it for the next step.

Worker

route add <cluster-ip> gw 10.100.100.1

Add the above line (with the cluster IP you noted earlier) to /etc/rc.local (on the worker node only) so the route is always created on boot-up. The nodes sometimes reference the cluster IP, which is not directly reachable from them; this routes those requests through the master node, which is 10.100.100.1.

In our setup, we have an external yum repository server located at 168.102.103.7. Hence, in order for our containers to all have access to this server, we do this in the rc.local file.

route add 168.102.103.7 gw 10.100.100.1
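Putting both routes together, the lines appended to /etc/rc.local on the worker would look something like this (10.96.0.1 again stands in for whatever cluster IP kubectl get svc reported):

# appended to /etc/rc.local on the worker node
route add 10.96.0.1 gw 10.100.100.1          # cluster IP, routed via the master
route add 168.102.103.7 gw 10.100.100.1      # external yum repo, routed via the master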

Remember to do 

chmod +x /etc/rc.d/rc.local

so that this file is executable during the boot sequence.

Reboot, then check the routes using

route -n
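With the example addresses above, you would expect entries roughly like these (flags, metrics and interface names will vary):

# Destination     Gateway        Genmask          Flags  Iface
# 10.96.0.1       10.100.100.1   255.255.255.255  UGH    eth0
# 168.102.103.7   10.100.100.1   255.255.255.255  UGH    eth0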

Joining the Worker(s) to the Master Node

Remember the join command that was printed out when you created the master node? We need it now for joining the node to the master.

kubeadm reset
kubeadm join --token=… 10.100.100.1:6443 --discovery-token-ca-cert-hash sha256:…

NOTE: If the token has expired (unlikely if you're doing Part 1 and Part 2 all in one shot), at the master node run:

kubeadm token create --ttl=0

to create a token that never expires, and

kubeadm token list

to see the tokens.

Enabling Your NVIDIA GPUs

sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

Add --feature-gates="Accelerators=true" to the ExecStart line:

ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS […] --feature-gates="Accelerators=true"

If the above doesn't work for some reason, try adding --feature-gates=Accelerators=true (without the quotes)

Then reload systemd, enable and restart the kubelet service.

systemctl daemon-reload
systemctl enable kubelet && systemctl restart kubelet
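A quick way to check that the flag actually took effect is to look at the running kubelet command line:

ps -ef | grep [k]ubelet
# the kubelet command line should now include --feature-gates=Accelerators=true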

Follow the official K8s instructions here on how to make use of your GPUs in POD files.

Verification

Go to the master node (10.100.100.1), and type

kubectl get nodes

You should see a list of all the nodes that have joined your master. You can start running your pod yaml files now.
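If the GPU feature gate and the NVIDIA drivers are in place on a node, its capacity should also advertise the alpha GPU resource. As a rough check (gpu2 below is just a stand-in for one of your node names):

kubectl describe node gpu2 | grep -i nvidia-gpu
#  alpha.kubernetes.io/nvidia-gpu:  8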

Try running GPU-enabled POD files following the instructions here.
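As a minimal sketch of what such a pod file can look like under the Accelerators feature gate, the pod below requests one GPU through the alpha.kubernetes.io/nvidia-gpu resource and simply lists the GPU device it was given. The pod name and image are placeholders; the image has to be reachable from your offline registry:

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                              # hypothetical pod name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:8.0-runtime-centos7    # placeholder; must exist in your offline registry
    command: ["/bin/sh", "-c", "ls -l /dev/nvidia*"]
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1     # request one GPU (Accelerators feature gate)
EOF

Check the result with kubectl logs gpu-test. Real CUDA workloads typically also need the NVIDIA driver libraries mounted into the container via a hostPath volume, which the official instructions linked above walk through.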

Crap!

Didn't work? I compiled a list of issues I encountered and successfully troubleshot. To be continued!

Super Easy Sequence Diagrams – PlantUML

http://plantuml.com/ is a very easy-to-use sequence diagram maker that is free. It has an extremely intuitive syntax that will allow you to create complex diagrams within minutes.

There's an online version here: https://www.planttext.com

Take for example:

@startuml

box "FPGA" #LightYellow
 participant FDC_FPGA
end box

box "MPC" #LightBlue
 participant Kernel
 participant User_Space
end box

User_Space -> Kernel : Wait Data Ready ioctl
activate User_Space
activate Kernel
FDC_FPGA -> Kernel : Data Ready Interrupt
Kernel -> User_Space : Data Ready Awake
deactivate Kernel
deactivate User_Space
User_Space -> Kernel : Setup Read Descriptors ioctl
Kernel -> FDC_FPGA : Set Registers
User_Space -> Kernel : Enable DMA Start ioctl
Kernel -> FDC_FPGA : Set Registers
User_Space->Kernel : Wait DMA Done ioctl
activate Kernel
activate User_Space
FDC_FPGA -> Kernel : Dma Done Interrupt
Kernel -> User_Space : Dma Done Awake
deactivate User_Space
deactivate Kernel

@enduml

This gives a very nice diagram that is ready for presentation:

[Rendered sequence diagram: Kernel_Driver_Flow]

I use it all over the place for Software Design Documents. The best thing is that it has "source code", so anytime there's a change, all I have to do is use the "source" to regenerate the diagram. Happiness!
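If you'd rather not rely on the online editor, the same source file can be rendered locally with the PlantUML jar (requires Java; the file name here is just an example):

java -jar plantuml.jar kernel_driver_flow.puml    # writes kernel_driver_flow.png next to the source file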