Kubernetes – Offline Installation Guide (Part 2 – Master, Workers and GPUs)


Architecture

To re-iterate, the setup is as follows (CentOS 7 based):

[Architecture diagram]

Master

We will use Weave Net as our pod network, so get the Weave Net startup YAML file.

export kubever=$(kubectl version | base64 | tr -d '\n')
wget "https://cloud.weave.works/k8s/net?k8s-version=$kubever" -O weave.yaml

Alternatively, use this file, which I pre-downloaded on an internet-connected machine: Link

At the node designated as your K8s master, type:

kubeadm init --kubernetes-version=v1.8.1 --apiserver-advertise-address=10.100.100.1

Take note of the join command and save it to a text file; you will need it to join your other nodes later.

As your admin user (user requires “wheel” access), run the following. Don’t run it as root.

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f weave.yaml

This procedure sets up a kubeconfig so that your current user can administer the cluster with kubectl.
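As a quick sanity check that kubectl can now talk to the API server:

kubectl cluster-info
kubectl get pods --all-namespaces

The kube-system pods (and, once you have applied weave.yaml, the weave-net pods) should show up.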

Allow pods to be scheduled on the GPU1 master. Our master has 512 GB of RAM and 8 GPUs – no way we're gonna waste that!

kubectl taint nodes --all node-role.kubernetes.io/master-
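To confirm the taint is gone and the master is schedulable, something like this works (substitute your master's node name from kubectl get nodes):

kubectl describe node gpu1 | grep -i taint

The NoSchedule master taint should no longer be listed.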

At the GPU1 master,

kubectl get svc

Take note of the cluster IP. You will need it for the next step.
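If you just want the IP by itself, the cluster IP of the default kubernetes service can be extracted with jsonpath:

kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}'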

Worker

route add <cluster-ip> gw 10.100.100.1

Add the above line to /etc/rc.local (on the worker nodes only), substituting the cluster IP you noted at the master, so that the route is re-created on every boot. The nodes sometimes reference the cluster IP, which is not directly reachable from them; this routes such requests through the master node at 10.100.100.1.

In our setup, we have an external yum repository server located at 168.102.103.7. For all our containers to have access to this server as well, we add this to the rc.local file too:

route add 168.102.103.7 gw 10.100.100.1
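Putting both routes together, the worker's /etc/rc.local ends up looking something like this sketch, where <cluster-ip> is the cluster IP you noted at the master:

#!/bin/bash
# /etc/rc.local on a worker node (sketch)
touch /var/lock/subsys/local

# route cluster-IP traffic through the master
route add <cluster-ip> gw 10.100.100.1

# route yum repository traffic through the master
route add 168.102.103.7 gw 10.100.100.1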

Remember to run

chmod +x /etc/rc.d/rc.local

so that this file is executable during the boot sequence.

Reboot, then check the routes using

route -n

Joining the Worker(s) to the Master Node

Remember the join command that was printed out when you created the master node? We need it now for joining the node to the master.

kubeadm reset
kubeadm join --token=… 10.100.100.1:6443 --discovery-token-ca-cert-hash sha256:…

NOTE: If the token has expired (unlikely if you're doing Part 1 and Part 2 all in one shot), at the master node run

kubeadm token create --ttl=0

to create a token that never expires, and

kubeadm token list

to see the tokens.
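If you have lost the discovery-token-ca-cert-hash value as well, it can be recomputed on the master from the cluster CA certificate; this is the method given in the kubeadm documentation:

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \
  openssl rsa -pubin -outform der 2>/dev/null | \
  openssl dgst -sha256 -hex | sed 's/^.* //'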

Enabling Your NVIDIA GPUs

sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

To the following line, append --feature-gates="Accelerators=true":

ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS […] --feature-gates="Accelerators=true"

If the above doesn't work for some reason, try adding --feature-gates=Accelerators=true (without the quotes).

Then reload systemd, and enable and restart the kubelet service.

systemctl daemon-reload && systemctl enable kubelet && systemctl restart kubelet

Follow the official K8s instructions here on how to make use of your GPUs in pod spec files.
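For a rough idea of what a GPU pod file looks like under the old Accelerators gate (which exposed GPUs as the alpha.kubernetes.io/nvidia-gpu resource), here is a sketch. The image name is an assumption; substitute whatever CUDA image you have loaded into your offline registry, and note that this era of GPU support may additionally require hostPath-mounting the NVIDIA driver libraries into the container.

cat <<'EOF' > gpu-test.yaml
# Sketch of a GPU pod for the legacy Accelerators feature gate.
# The image is hypothetical; use one from your offline registry.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:8.0-runtime-centos7
    command: ["nvidia-smi"]
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
EOF
kubectl apply -f gpu-test.yaml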

Verification

Go to the master node (10.100.100.1), and type

kubectl get nodes

You should see a list of all the nodes that have joined your master. You can start running your pod yaml files now.

Try running GPU-enabled pod files following the instructions here.

Crap!

Didn't work? I compiled a list of issues I encountered and successfully troubleshot – to be continued!

Loading tmux on Boot in Linux


tmux is a wonderful tool for displaying virtual consoles at the Linux command prompt. It's the next best thing to actual GUI windows you can control with a mouse.

Mainly, I use it for SSH purposes: I can ssh into a PC that I know already has tmux running in the background and type

tmux a

which attaches to the ongoing background tmux session, letting you see everything that is going on in that session. This is especially useful for embedded systems, where there are multiple processes launched in the background and you want to monitor them all.
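If you are not sure whether a session exists, or there is more than one, list them first and attach by name:

tmux ls
tmux attach -t MPC1

(MPC1 being the session name used in the script below.)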

So I have a tmux script launcher.sh here:

#!/bin/bash

SESSION="MPC1"

# Allow re-launch: kill any existing session with the same name first
/usr/bin/tmux has-session -t $SESSION 2> /dev/null && /usr/bin/tmux kill-session -t $SESSION
/usr/bin/tmux -2 new-session -d -s $SESSION

echo "Launching tmux"

# Split into three panes; target the session explicitly, since no
# client is attached when this runs at boot
/usr/bin/tmux split-window -h -t $SESSION
/usr/bin/tmux split-window -v -t $SESSION
/usr/bin/tmux select-pane -t $SESSION.0

# Launch one binary in each pane
/usr/bin/tmux send-keys -t $SESSION.0 "cd /path/to/binary1folder" C-m
/usr/bin/tmux send-keys -t $SESSION.0 "./binary1" C-m

/usr/bin/tmux send-keys -t $SESSION.1 "cd /path/to/binary2folder" C-m
/usr/bin/tmux send-keys -t $SESSION.1 "./binary2" C-m

/usr/bin/tmux send-keys -t $SESSION.2 "cd /path/to/binary3folder" C-m
/usr/bin/tmux send-keys -t $SESSION.2 "./binary3" C-m

This basically opens up three panes: it splits the window horizontally first, then splits one of the resulting panes vertically. It then launches a binary in each pane. I won't go too much into the scripting here, as there are plenty of resources for that, like this.

Configuring tmux to boot on startup on CentOS 7

Normally this should be pretty straightforward, but I ran into some hiccups.

First,

sudo nano /etc/rc.local

And edit the rc.local file to include

su -c /path/to/yourscript/launcher.sh -l your_user_id

The -l your_user_id part means that launcher.sh is launched as the user your_user_id, with a login environment.

Make sure your rc.local is executable.

sudo chmod +x /etc/rc.local

In theory, it should now launch when CentOS boots: rc.local runs launcher.sh in the background, which in turn launches tmux. However, I found that one of the abrt startup scripts, abrt-console-notification.sh, was interfering with the launch of the tmux processes/binaries: it would hang at the console terminal of the tmux screens. Doing the following resolved the problem for me.

cd /etc/profile.d/
sudo chmod a-r abrt-console-notification.sh

Basically, making abrt-console-notification.sh non-readable lets the profile.d startup process skip over this particular script. It's kind of a hack, but it worked. I reckon at most I lose the automatic bug reporting tool's notifications at the console. Note that whether this script runs depends on the type of installation you chose when installing CentOS; I think the minimal install doesn't run into this issue.

Hope this is useful to you; let me know!

P.S. Here's a tmux cheat sheet: https://gist.githubusercontent.com/afair/3489752/raw/e7106ac93c8f9602d3843696692a87cfb43c2d21/tmux.cheat

Installing CUDA 7 on CentOS 7 – The Golden Path for OpenGL Samples to Work

1. Install CentOS 7 – this should be pretty straightforward!

2. Follow this guide to install CUDA 7 : http://developer.download.nvidia.com/compute/cuda/7_0/Prod/doc/CUDA_Getting_Started_Linux.pdf

CUDA installation has to be done in command-line mode, with no X Windows running. So once you are in the CentOS 7 GUI, open a terminal and type

$ systemctl set-default multi-user.target

$ reboot

CentOS 7 will now boot to the CLI by default.

Start the CUDA 7 installation.

Remember to install the OpenGL libraries. Read section 4.2 onwards carefully and follow all the steps.

3. Disable the nouveau drivers (as you are installing NVIDIA drivers)

Create a file at /etc/modprobe.d/blacklist-nouveau.conf with the following contents:

blacklist nouveau

options nouveau modeset=0
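One way to create the file in a single shot:

$ sudo tee /etc/modprobe.d/blacklist-nouveau.conf <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF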

4. Regenerate the kernel initramfs

$ dracut --force

5. Run nvidia-xconfig to recreate the config file for X Windows.

$ nvidia-xconfig

$ reboot

6. Set the library path for the CUDA 7 libraries on boot

$ cd  /etc/profile.d

$ vim cudapaths.sh

Type the following into the cudapaths.sh script.

export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:$LD_LIBRARY_PATH
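If you also want nvcc and the other CUDA tools on your PATH at every boot (the official getting-started guide sets this too), cudapaths.sh can be extended along these lines:

export PATH=/usr/local/cuda-7.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:$LD_LIBRARY_PATH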

Save and reboot the OS.

7. Install 3rd Party Libraries (for GL)

If you try to compile one of the projects under 3_Imaging, you will get a lot of lib* not found errors. You have to install the libraries manually.

$ yum install mesa-libGLES.x86_64 mesa-libGL-devel.x86_64 mesa-libGLU-devel.x86_64 mesa-libGLw.x86_64 mesa-libGLw-devel.x86_64 libXi-devel.x86_64 freeglut-devel.x86_64 freeglut.x86_64

8. Reinstall NVIDIA Drivers to Fix Symbolic Links

Due to a bug when installing the 3rd-party libraries above, you will need to re-run the NVIDIA driver installation to fix some symbolic links. Bug info: https://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5094265

Download drivers here: http://www.geforce.com/drivers/results/84721

The next person who tells me installing stuff in Linux is as easy as doing it in Windows/OSX, I am going to punch him in the face.

Installing Nvidia Drivers on RHEL or CentOS 7

http://www.advancedclustering.com/act-kb/installing-nvidia-drivers-rhel-centos-7/

Another article, more descriptive.

http://www.dedoimedo.com/computers/centos-7-nvidia.html

EDIT: You know what? Fuck the above 2 guides for wasting my time. They're missing steps here and there.

Particularly this:

If the GPU used for display is an NVIDIA GPU, the X server configuration file, /etc/X11/xorg.conf, may need to be modified. In some cases, nvidia-xconfig can be used to automatically generate a xorg.conf file that works for the system. For non-standard systems, such as those with more than one GPU, it is recommended to manually edit the xorg.conf file. Consult the xorg.conf documentation for more information.

So there you have it: after installing CUDA 7, run

$ /usr/local/cuda-7.0/bin/nvidia-xconfig

to generate a new xorg.conf file for the X server. Otherwise, your X Windows configuration may not know about the new driver installed with CUDA 7.

Follow this instead: NVIDIA’s official guide: http://developer.download.nvidia.com/compute/cuda/7_0/Prod/doc/CUDA_Getting_Started_Linux.pdf