To reiterate, the setup is as follows (CentOS 7 based):
We will use Weave Net for our pods, so get the Weave Net startup yaml file.
export kubever=$(kubectl version | base64 | tr -d '\n')
Alternatively, use this file I pre-downloaded on an internet-connected computer. Link
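Putting the steps together, a minimal fetch sketch (assuming kubectl is on the PATH and the master has internet access; the URL is Weave's documented manifest endpoint):

```shell
# Build the version string Weave expects, then download the matching
# manifest as weave.yaml, which we apply later with kubectl.
export kubever=$(kubectl version | base64 | tr -d '\n')
curl -L "https://cloud.weave.works/k8s/net?k8s-version=$kubever" -o weave.yaml
```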
At the node designated as your K8s master, type:
kubeadm init --kubernetes-version=v1.8.1 --apiserver-advertise-address=10.100.100.1
Take note of the join command – save it to a text file or something, because you will need it to join your other nodes later.
As your admin user (the user requires "wheel" access), run the following. Don't run it as root.
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f weave.yaml
This procedure makes your current user a Kubernetes user.
Allow pods to be scheduled on the GPU1 master node. Our master has 512GB of RAM and 8 GPUs – no way we're gonna waste that!
kubectl taint nodes --all node-role.kubernetes.io/master-
At GPU1 master,
kubectl get svc
Take note of the cluster IP. You will need it for the next step.
route add <cluster-ip> gw 10.100.100.1
Add the above line (with <cluster-ip> replaced by the cluster IP you noted) to /etc/rc.local (on worker nodes only) so this route is re-created on every boot-up. The nodes sometimes reference the cluster IP, which is not directly reachable from them. This routes those requests through the master node, which is 10.100.100.1.
In our setup, we have an external yum repository server located at 126.96.36.199. Hence, in order for our containers to all have access to this server, we add this to the rc.local file as well:
route add 126.96.36.199 gw 10.100.100.1
Remember to do
chmod +x /etc/rc.d/rc.local
so that this file is executable during the boot sequence process.
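Putting the routes together, the worker's /etc/rc.local might end up looking like this (a sketch; substitute the cluster IP you noted earlier and your own repo server's address):

```shell
#!/bin/bash
# /etc/rc.local on a worker node (sketch)
# Send cluster-IP traffic through the master node:
route add <cluster-ip> gw 10.100.100.1
# Send traffic for the external yum repository through the master as well:
route add 126.96.36.199 gw 10.100.100.1
```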
Reboot, then check the routes using
route -n
Joining the Worker(s) to the Master Node
Remember the join command that was printed out when you created the master node? We need it now for joining the node to the master.
kubeadm join --token=… 10.100.100.1:6443 --discovery-token-ca-cert-hash sha256:…
NOTE: If the token has expired (unlikely if you're doing Part 1 and Part 2 all in one shot), at the master node run:
kubeadm token create --ttl=0
to create a token that never expires, and
kubeadm token list
to see the existing tokens.
Enabling Your NVIDIA GPUs
sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Add --feature-gates="Accelerators=true" to the end of the following line:
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS […] --feature-gates="Accelerators=true"
If the above doesn't work for some reason, try adding --feature-gates=Accelerators=true (without the quotes).
Then reload systemd, and enable and restart the kubelet service.
systemctl daemon-reload && systemctl enable kubelet && systemctl restart kubelet
Follow the official K8s instructions here on how to make use of your GPUs in pod files.
Go to the master node (10.100.100.1), and type
kubectl get nodes
You should see a list of all the nodes that have joined your master. You can start running your pod yaml files now.
Try running GPU-enabled pod files following the instructions here.
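For reference, with the Accelerators feature gate enabled, a Kubernetes 1.8 pod requests GPUs through the alpha resource alpha.kubernetes.io/nvidia-gpu. A minimal sketch (the image and command are illustrative, not from this setup):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda
    command: ["nvidia-smi"]
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
```

Depending on your base image, you may also need to hostPath-mount the NVIDIA driver libraries into the container.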
Didn't work? I compiled a list of issues I encountered and successfully troubleshot – to be continued!