Saturday, April 4, 2020

Disaster Recovery for my ProLiant DL160 G6 Ubuntu 18.04

Because my server is my playground, I was looking for a way to roll the installation back to a clean starting point. A really easy way is REAR (Relax-and-Recover): https://github.com/rear/rear

My first try with REAR started with the version shipped with Ubuntu 18.04, following the description at https://linuxwissen.blogspot.com/2018/12/sicheres-backup-mit-rear-relax-and.html

Installation: 

First I installed REAR on the ProLiant server

$ sudo apt-get install rear

Then I created the configuration files described there.

/etc/rear/site.conf
TIMESYNC=NTP # since all your computers use NTP for time synchronisation
### Attention! Specify the NFS path WITHOUT the : before the / ###
BACKUP_URL=nfs://192.168.178.20//srv/nfs/rear
BACKUP=NETFS
OUTPUT=ISO
USE_CFG2HTML=y
BACKUP_PROG=tar
BACKUP_PROG_COMPRESS_OPTIONS="--gzip"


/etc/rear/local.conf
###### The following line must be adapted individually to the respective system ####
BACKUP_PROG_EXCLUDE=( '/dev/shm/*' '/home/[username]/.cache/*' '/root/.cache/*' '/tmp/*' '/var/cache/*' '/var/tmp/*' "$VAR_DIR/output/\*" )
LOGFILE="/var/log/rear/$HOSTNAME.log"
# Keep the ISO build in $TMPDIR/rear.XXXXXXXXXXXXXXX/rootfs/
KEEP_BUILD_DIR="true"
NETFS_KEEP_OLD_BACKUP_COPY=yes

/etc/rear/os.conf
OS_VENDOR=Ubuntu 
OS_VERSION=18.04
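
To sanity-check the settings before the first run, REAR can print its effective configuration. This is just a quick check, assuming the packaged version supports the dump workflow:

$ sudo rear dump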

Backup

I store the files created by REAR on an NFS mount on my laptop. For that I have to mount the NFS share on the server:

$ sudo mount -t nfs 192.xxx.xxx.xxx:/srv/nfs/rear /mnt
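
On the laptop side the directory must be exported via NFS first. As a minimal sketch, assuming nfs-kernel-server is installed there and the home network is 192.168.178.0/24 (adjust both to your setup), /etc/exports could contain:

/srv/nfs/rear 192.168.178.0/24(rw,sync,no_subtree_check,no_root_squash)

$ sudo exportfs -ra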

After that there are two ways to create the backup:

1. Create the rescue ISO and the backup in two steps
$ rear -v -D mkrescue
$ rear -v -D mkbackuponly

2. Create the rescue ISO and the backup in one step
$ rear mkbackup

After the backup has finished you can find the created files on the laptop under /srv/rear/<name_server>/. The important files are <name_server>.iso and backup.tar.gz.

Restore

For the automatic restore of the backup through REAR it is very important that the backup location on the laptop is mounted at the same place as during the backup. It is also necessary that an active connection exists between the device running the virtual console of the iLO and the ProLiant server itself.
In my case I run the virtual console in a VM on my laptop. For that I have to create a bridge connecting to the VM, so that the VM can communicate directly with the server.
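
As a sketch of such a bridge on the laptop, assuming it runs Ubuntu with netplan and the wired interface is called enp0s31f6 (the file name and the interface name are hypothetical, adjust to your machine):

/etc/netplan/99-bridge.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    enp0s31f6:
      dhcp4: no
  bridges:
    br0:
      interfaces: [enp0s31f6]
      dhcp4: yes

$ sudo netplan apply

The VM running the virtual console is then attached to br0.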

1. The first step after starting the virtual console of the iLO is to mount the ISO as virtual media.

2. In the next step connect the virtual media to the server. Click on the mounted media and then click "Connect". If the connection is established successfully, the entry contains the IP of the bridge in the VM.

3. Reboot the server
4. The server now boots into the start menu of REAR. Select the automated REAR recovery


5. REAR then does the rest
6. Press 3 to reboot after the successful recovery

Challenges with the REAR version 2.3

My first try to recover the backup showed that the keyboard does not work. If you don't choose "Automatic Recover <name_server>" you can't do anything, because you can't make any input.
Running the automatic recovery works, but I can't reboot with the REAR menu shown at the end of the recovery. I have to reboot the server over the iLO. That works for me.

After one recovery I decided to update REAR to the current version 2.5.
$ wget http://download.opensuse.org/repositories/Archiving:/Backup:/Rear:/Snapshot/xUbuntu_18.04/amd64/rear_2.5-0git.0.dff3f6a.unknown_amd64.deb
$ sudo dpkg -i rear_2.5-0git.0.dff3f6a.unknown_amd64.deb

Create the same config files as described above.
Create a new complete backup with the ISO file.

After this I wanted to know whether it is functional and tried the recovery directly, as described for version 2.3. The procedure is exactly the same, with the difference that the keyboard is working.
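
A quick check that the new version is really active (rear prints its version with -V):

$ sudo rear -V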

Sunday, January 26, 2020

OpenShift 4.3 nightly on my home server

I want to install an OpenShift cluster on my HP ProLiant DL160 G6 server at home. For me the biggest challenge was to organize all the required network settings.

Thanks to two great posts I got the cluster running and want to write a description as a reminder for myself.

https://itnext.io/install-openshift-4-2-on-kvm-1117162333d0

https://medium.com/@zhimin.wen/preparing-openshift-4-1-environment-dc40ecb7a763

I'm installing an OpenShift 4.3 nightly build cluster with 3 master nodes and 2 worker nodes. The VMs are built with KVM on Ubuntu 18.04.
This description also works with OpenShift 4.2. I tested it many times ;-)

The following steps are necessary:

  • KVM Setup
  • Dnsmasq set up and common configuration
  • Configure Ubuntu to use local dnsmasq 
  • dnsmasq cluster specific configuration
  • Loadbalancer and Gobetween configuration
  • Webserver for installation files
  • Create VMs with CoreOS image
  • Installation of OpenShift
  • Traefik configuration to route the API

KVM-Setup

Install required packages

$sudo apt -y install qemu qemu-kvm libvirt-bin bridge-utils virt-manager libosinfo-bin
Make sure the libvirtd service is enabled and running
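For example, with systemd this can be checked like this:
$sudo systemctl enable --now libvirtd
$systemctl is-active libvirtd
  active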

Install the uvtool to create Ubuntu-VMs in an easy way

$sudo apt-get install -y uvtool
$sudo uvt-simplestreams-libvirt --verbose sync release=bionic arch=amd64
   Adding: com.ubuntu.cloud:server:18.04:amd64 20200107
$ssh-keygen -b 4096 -t rsa -f ~/.ssh/id_rsa -N ""

Create a KVM network with DNS and DHCP disabled in this network

net-ocp.xml
<network>
  <name>ocp</name>
  <forward mode='nat'/>
  <bridge name='br-ocp' stp='on' delay='0'/>
  <dns enable="no"/> 
  <ip address='192.168.10.1' netmask='255.255.255.0'>
  </ip>
</network>
 
$virsh net-define net-ocp.xml
  Network ocp defined from net-ocp.xml
$virsh net-autostart ocp
  Network ocp marked as autostarted
$virsh net-start ocp
  Network ocp started
 
$sudo systemctl restart libvirt-bin
 
The bridge is created and configured:
$ifconfig br-ocp
br-ocp: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.10.1  netmask 255.255.255.0  broadcast 192.168.10.255
        ether 52:54:00:dc:f2:c6  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Dnsmasq setup and common configurations

If you haven't already installed dnsmasq, do that and enable the service
$sudo apt-get install -y dnsmasq

Create a file /etc/dnsmasq.d/common.conf
# Listen on lo and br-ocp only
bind-interfaces
interface=lo,br-ocp 
 
# DHCP
dhcp-option=option:router,192.168.10.1
dhcp-option=option:dns-server,192.168.10.1
dhcp-range=192.168.10.10,192.168.10.254,12h 
 
# forward, use original DNS server
server=10.0.xxx.xxx
server=10.0.xxx.xxx
 
address=/webserver.oc.io/192.168.10.20
 

Configure Ubuntu to use local dnsmasq

Rename /etc/resolv.conf (some applications check this file first)
$ sudo mv /etc/resolv.conf /etc/resolv.conf.bac
 
Backup /etc/systemd/resolved.conf and change the DNS entry to localhost
$sudo cp /etc/systemd/resolved.conf /etc/systemd/resolved.conf.bac 
/etc/systemd/resolved.conf (new): 
[Resolve]
DNS=127.0.0.1
 
$sudo systemctl restart systemd-resolved.service
$sudo systemd-resolve --status
  Global
         DNS Servers: 127.0.0.1
          DNSSEC NTA: 10.in-addr.arpa
          ...
The KVM host now uses the local dnsmasq as DNS server.
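
A quick check that the local dnsmasq really answers, using the webserver entry from common.conf above (dig is part of the dnsutils package):

$dig +short webserver.oc.io @127.0.0.1
  192.168.10.20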
 

DNS configuration for OpenShift Installation

Create a file /etc/dnsmasq.d/test-ocp4.conf
Inside this file, define the following entries for each node of the cluster:
  1. DHCP IP address by MAC address
  2. DNS A record
  3. DNS PTR record
My test-ocp4.conf looks like this:
# Configuration for OCP Cluster *.test-ocp4.oc.io

# Bootstrap
dhcp-host=52:54:00:76:34:04,192.168.10.40
address=/bootstrap.test-ocp4.oc.io/192.168.10.40
ptr-record=40.10.168.192.in-addr.arpa,bootstrap.test-ocp4.oc.io

# Master1
dhcp-host=52:54:00:28:4e:09,192.168.10.41
address=/master1.test-ocp4.oc.io/192.168.10.41
ptr-record=41.10.168.192.in-addr.arpa,master1.test-ocp4.oc.io

# Master2
dhcp-host=52:54:00:07:45:14,192.168.10.42
address=/master2.test-ocp4.oc.io/192.168.10.42
ptr-record=42.10.168.192.in-addr.arpa,master2.test-ocp4.oc.io

# Master3
dhcp-host=52:54:00:78:05:d9,192.168.10.43
address=/master3.test-ocp4.oc.io/192.168.10.43
ptr-record=43.10.168.192.in-addr.arpa,master3.test-ocp4.oc.io

# Worker1
dhcp-host=52:54:00:e4:67:14,192.168.10.51
address=/worker1.test-ocp4.oc.io/192.168.10.51
ptr-record=51.10.168.192.in-addr.arpa,worker1.test-ocp4.oc.io

# Worker2
dhcp-host=52:54:00:e5:c5:38,192.168.10.52
address=/worker2.test-ocp4.oc.io/192.168.10.52
ptr-record=52.10.168.192.in-addr.arpa,worker2.test-ocp4.oc.io

# LoadBalancer
dhcp-host=52:54:00:41:e5:45,192.168.10.49
address=/lb.test-ocp4.oc.io/192.168.10.49
ptr-record=49.10.168.192.in-addr.arpa,lb.test-ocp4.oc.io

# LB settings for the API
address=/api.test-ocp4.oc.io/192.168.10.49
address=/api-int.test-ocp4.oc.io/192.168.10.49
address=/.apps.test-ocp4.oc.io/192.168.10.49

# ETCD instance A and SRV records
address=/etcd-0.test-ocp4.oc.io/192.168.10.41
srv-host=_etcd-server-ssl._tcp.test-ocp4.oc.io,etcd-0.test-ocp4.oc.io,2380

address=/etcd-1.test-ocp4.oc.io/192.168.10.42
srv-host=_etcd-server-ssl._tcp.test-ocp4.oc.io,etcd-1.test-ocp4.oc.io,2380

address=/etcd-2.test-ocp4.oc.io/192.168.10.43
srv-host=_etcd-server-ssl._tcp.test-ocp4.oc.io,etcd-2.test-ocp4.oc.io,2380 

 
Before restarting dnsmasq we may need to clear the lease cache to make sure the DHCP IP allocation is not cached
$sudo rm -rf /var/lib/misc/dnsmasq.leases
$sudo touch /var/lib/misc/dnsmasq.leases
 
$sudo systemctl restart dnsmasq.service
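
To check the cluster-specific records, a few test queries against the local dnsmasq should return the values defined above, for example:

$dig +short master1.test-ocp4.oc.io @127.0.0.1
  192.168.10.41
$dig +short -x 192.168.10.41 @127.0.0.1
  master1.test-ocp4.oc.io.
$dig +short -t SRV _etcd-server-ssl._tcp.test-ocp4.oc.io @127.0.0.1
  0 0 2380 etcd-0.test-ocp4.oc.io.
  0 0 2380 etcd-1.test-ocp4.oc.io.
  0 0 2380 etcd-2.test-ocp4.oc.io.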
 

Loadbalancer and Gobetween configuration

For the load balancer we install Gobetween in a minimal Ubuntu VM. So first create the VM:
$uvt-kvm create lb release=bionic --memory 4096 --cpu 4 --disk 50 --bridge br-ocp --password password 
$virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     lb                             running

$virsh dumpxml lb | grep 'mac address' | cut -d\' -f 2
52:54:00:8e:1d:32

Change the MAC address in the /etc/dnsmasq.d/test-ocp4.conf file.
Clear the lease cache and restart dnsmasq.
Restart the VM to make sure it gets the assigned IP
$virsh destroy lb
Domain lb destroyed

$virsh start lb
Domain lb started
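
To confirm that the VM really got the reserved address, the dnsmasq lease file and a ping against the DNS name can be used as a quick check:

$cat /var/lib/misc/dnsmasq.leases
$ping -c 1 lb.test-ocp4.oc.io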

Install the gobetween software 
$mkdir gobetween
$cd gobetween
$curl -LO https://github.com/yyyar/gobetween/releases/download/0.7.0/gobetween_0.7.0_linux_amd64.tar.gz
$tar xzvf gobetween_0.7.0_linux_amd64.tar.gz
  AUTHORS
  CHANGELOG.md
  LICENSE
  README.md
  config/
  config/gobetween.toml
  gobetween
$sudo cp gobetween /usr/local/bin/

Create a systemd service (copy the file below to /etc/systemd/system/gobetween.service)
gobetween.service
[Unit]
Description=Gobetween - modern LB for cloud era
Documentation=https://github.com/yyyar/gobetween/wiki
After=network.target
 
[Service]
Type=simple
PIDFile=/run/gobetween.pid
#ExecStartPre=prestart some command
ExecStart=/usr/local/bin/gobetween -c /etc/gobetween/gobetween.toml
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
 
[Install]
WantedBy=multi-user.target
 
Create a configuration file for gobetween
/etc/gobetween/gobetween.toml
[servers]
[servers.api]
protocol = "tcp"
bind = "0.0.0.0:6443" 

[servers.api.discovery]
  kind = "static"
  static_list = [ 
"192.168.10.40:6443","192.168.10.41:6443","192.168.10.42:6443","192.168.10.43:6443" ] 

[servers.api.healthcheck]
  kind = "ping"
  fails = 1
  passes = 1
  interval = "2s"
  timeout="1s"
  ping_timeout_duration = "500ms" 

[servers.mcs]
protocol = "tcp"
bind = "0.0.0.0:22623" 

[servers.mcs.discovery]
  kind = "static"
  static_list = [ "192.168.10.40:22623","192.168.10.41:22623","192.168.10.42:22623","192.168.10.43:22623" ] 

[servers.mcs.healthcheck]
  kind = "ping"
  fails = 1
  passes = 1
  interval = "2s"
  timeout="1s"
  ping_timeout_duration = "500ms"

[servers.http]
protocol = "tcp"
bind = "0.0.0.0:80"

[servers.http.discovery]
  kind = "static"
  static_list = [ "192.168.10.51:80","192.168.10.52:80" ]

[servers.http.healthcheck]
  kind = "ping"
  fails = 1
  passes = 1
  interval = "2s"
  timeout="1s"
  ping_timeout_duration = "500ms" 

[servers.https]
protocol = "tcp"
bind = "0.0.0.0:443"

[servers.https.discovery]
  kind = "static"
  static_list = [ "192.168.10.51:443","192.168.10.52:443" ]

[servers.https.healthcheck]
  kind = "ping"
  fails = 1
  passes = 1
  interval = "2s"
  timeout="1s"
  ping_timeout_duration = "500ms" 

Start the gobetween service
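With the unit file from above this boils down to the usual systemd steps:
$sudo systemctl daemon-reload
$sudo systemctl enable --now gobetween.service
$systemctl status gobetween.service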
 

Webserver for installation files

Back on my server I have to install a webserver which will host the files necessary for the installation of the CoreOS VMs
$sudo apt-get install -y apache2

Check if the webserver is online
$ curl http://playground:80

There should be an output ;-) (playground is the name of my server)

Create VMs with CoreOS Image

To create the CoreOS VMs we first need the ISO image, the Ignition files, and the compressed RHCOS metal image. Everything can be downloaded from the Red Hat page with a Red Hat account.
Here you get the installer for the stable release and also the installer for the nightly builds. 
 
Download the installer
$wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/latest-4.3/openshift-install-linux-4.3.0-0.nightly-2020-01-25-074537.tar.gz
 
Download your pull secret

Download the RHCOS image
$wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/rhcos-43.81.202001142154.0-installer.x86_64.iso

Download compressed metal BIOS
$wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/rhcos-43.81.202001142154.0-metal.x86_64.raw.gz

Download the oc-client
$wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/latest-4.3/openshift-install-linux-4.3.0-0.nightly-2020-01-25-074537.tar.gz
To get the Ignition Files we have to take some manual steps:
1. unpack the installer in a new directory "install"
    $tar xzvf openshift-install-linux-4.3.0-0.nightly-2020-01-25-074537.tar.gz
2. Create a install-config.yaml file (see also the documentation)
    apiVersion: v1
    baseDomain: oc.io
    compute:
    - hyperthreading: Enabled
      name: worker
      replicas: 0
    controlPlane:
      hyperthreading: Enabled
      name: master
      replicas: 3
    metadata:
      name: test-ocp4
    networking:
      clusterNetwork:
      - cidr: 10.128.0.0/14
        hostPrefix: 23
      networkType: OpenShiftSDN
      serviceNetwork:
      - 172.30.0.0/16
    platform:
      none: {}
    pullSecret: '<pull-secret>'
    sshKey: '<ssh-key>'

3. Create the Kubernetes manifest
    $./openshift-install create manifests --dir=.
    After this command, there are a lot of manifests created in the new directory "manifests"

4. Modify manifests/cluster-scheduler-02-config.yml to prevent pods from being scheduled on the control plane nodes. For this, change the entry "mastersSchedulable" from true to false
5. Create the Ignition files
   $./openshift-install create ignition-configs --dir=. 
   Now the Ignition files for the bootstrap, master, and worker nodes are present. The kubeconfig file and the kubeadmin password file are also created in the auth directory
6. Upload the Ignition-Files to the webserver
    $ sudo cp bootstrap.ign master.ign worker.ign /var/www/html/
    $ sudo chmod +r ./*
7. Upload the BIOS image to the webserver (a quick check of both uploads follows below)

    $ sudo cp ../rhcos-43.81.202001142154.0-metal.x86_64.raw.gz /var/www/html/rhcos-43-metal.raw.gz
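
To verify that the webserver really serves the Ignition files and the BIOS image from steps 6 and 7 (playground is the name of my server, adjust as needed):
$ curl -I http://playground/bootstrap.ign
$ curl -I http://playground/rhcos-43-metal.raw.gz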

For the creation of the VMs I copied the ISO to the directory /var/lib/libvirt/images/
$ sudo mv ../rhcos-43.81.202001142154.0-installer.x86_64.iso /var/lib/libvirt/images/rhcos-43.iso


Installation of OpenShift

The installation of OpenShift starts with the creation of the bootstrap VM. After the 3 control plane nodes are online, the kube-apiserver starts to create all required operators on the control plane nodes.

Create the Bootstrap-VM:
$virt-install --connect=qemu:///system --name bootstrap --memory 15258 --vcpus 4 --os-type=linux --os-variant=rhel7.5 --disk /var/lib/libvirt/images/bootstrap.qcow2,device=disk,bus=virtio,size=120 --disk /var/lib/libvirt/images/rhcos-43.iso,device=cdrom --network network=ocp,model=virtio,mac=52:54:00:76:34:04

Open the console in virt-manager and start the installation. The system boots into an emergency mode, where you can also start the installation with the command below. This is a little bit more comfortable than the kernel command line.
In the emergency console enter this command:
#/usr/libexec/coreos-installer -d vda -i http://<webserver>/bootstrap.ign -b http://<webserver>/rhcos-43-metal.raw.gz -p qemu 
Then unmount the ISO image and reboot the VM. Check if the VM is reachable via ssh:
$ssh core@bootstrap.test-ocp4.oc.io

Create 3 Control Plane Nodes
$virt-install --connect=qemu:///system --name master1 --memory 15258 --vcpus 4 --os-type=linux --os-variant=rhel7.5 --disk /var/lib/libvirt/images/master1.qcow2,device=disk,bus=virtio,size=120 --disk /var/lib/libvirt/images/rhcos-43.iso,device=cdrom --network network=ocp,model=virtio,mac=52:54:00:28:4e:09 --graphics=vnc --boot hd,cdrom

Again in the emergency console:

#/usr/libexec/coreos-installer -d vda -i http://<webserver>/master.ign -b http://<webserver>/rhcos-43-metal.raw.gz -p qemu 

Repeat this for all 3 Nodes with the different MAC addresses and names.

Create 2 Worker Nodes
$virt-install --connect=qemu:///system --name worker1 --memory 15258 --vcpus 4 --os-type=linux --os-variant=rhel7.5 --disk /var/lib/libvirt/images/worker1.qcow2,device=disk,bus=virtio,size=120 --disk /var/lib/libvirt/images/rhcos-43.iso,device=cdrom --network network=ocp,model=virtio,mac=52:54:00:e4:67:14 --boot hd,cdrom
 

In the emergency console:

#/usr/libexec/coreos-installer -d vda -i http://<webserver>/worker.ign -b http://<webserver>/rhcos-43-metal.raw.gz -p qemu 

If you are looking at the logs on the bootstrap server, have some patience... some things need their time to become ready ;-)
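
To follow the progress from the KVM host, the installer's wait-for command and the bootkube logs on the bootstrap node are helpful (a sketch, run from the install directory used above):

$./openshift-install wait-for bootstrap-complete --dir=. --log-level=debug
$ssh core@bootstrap.test-ocp4.oc.io journalctl -b -f -u bootkube.service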