User Guide

Cluster Upgrade from 4.22 to 4.23

This guide will lead you through the steps specific to upgrading a NetEye Cluster installation from version 4.22 to 4.23.

Warning

Remember that you must upgrade sequentially without skipping versions; therefore an upgrade to 4.23 is possible only from 4.22. For example, if you have version 4.14, you must first upgrade to 4.15, then to 4.16, and so on.

Before starting an upgrade, you should very carefully read the latest release notes on NetEye’s blog and check the feature changes and deprecations specific to the version being upgraded. You should also check the whole Breaking Changes section below.

The remainder of this section is organised as follows. Section Breaking Changes introduces substantial changes that users must be aware of before starting the upgrade procedure, some of which may require tasks to be carried out beforehand; section Prerequisites provides information to be known before starting the upgrade procedure, together with an overview of the whole procedure; section Conventions Used defines some notation used in this procedure; section Running the Upgrade presents the actual procedure, including directions for Single Purpose Nodes; section Additional Tasks shows which tasks must be executed after the upgrade procedure has been successfully completed; and finally section Cluster Reactivation explains how to bring the NetEye Cluster back to complete functionality.

Breaking Changes

New underlying operating system

We recall (see section Before you start) that the upgrade procedure includes a change of operating system. Therefore, the upgrade consists of 3 steps for each NetEye Cluster node:

  1. Conversion from CentOS 7 to RHEL 7

  2. Conversion from RHEL 7 to RHEL 8

  3. NetEye upgrade finalization

A manual reboot of the system is required between each step. The three-step procedure can take a long time and must be performed in its entirety in one session, as systems in an intermediate state are not supported.

Prerequisites

Time estimation

Upgrading a NetEye Cluster will take a nontrivial amount of time. During the upgrade, individual nodes will be put into standby mode and so overall performance will be degraded until the upgrade procedure is completed and all nodes are removed from standby mode.

When the NetEye Cluster is healthy, no additional NetEye modules are installed, and the procedure completes successfully, a full upgrade (update + upgrade) takes approximately 2 hours per node, plus at least 30 minutes of NetEye Cluster downtime (see below).

Warning

The estimate does not take into account the time required to download the packages, nor the time needed for any manual intervention: for example, migrating configurations due to breaking changes, recovering from failed tasks during the execution of the neteye update and neteye node system-upgrade commands, and so on. The time needed to address possible issues with customised services or custom Icinga 2 checks during the upgrade is also not included.

System, Network, and other requirements

Be sure to meet the following requirements before starting:

  1. NetEye must be at version 4.22 and fully updated

  2. NetEye must be up and running with no health checks failing

  3. There must be at least 20GB of free space in /

  4. There must be at least 20GB of free space in /var
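
     You can verify the available space on both filesystems with, for example:

      neteye# df -h / /var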

  5. All nodes must be able to reach the following domains over HTTPS (port 443 TCP):

    • cdn.redhat.com

    • cdn-ubi.redhat.com

    • cert-api.access.redhat.com

    • cert.cloud.redhat.com

    • subscription.rhsm.redhat.com
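
      As a minimal sketch, TCP reachability on port 443 can be checked with bash's /dev/tcp (this only verifies the TCP connection; it does not account for proxies or TLS configuration):

      neteye# for host in cdn.redhat.com cdn-ubi.redhat.com cert-api.access.redhat.com cert.cloud.redhat.com subscription.rhsm.redhat.com; do timeout 5 bash -c "</dev/tcp/$host/443" && echo "$host: OK" || echo "$host: FAILED"; done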

  6. Have a valid organization ID, activation key, and name, which can be obtained through the official channels: sales, a consultant, or the support portal

  7. Kernel requirements:

    • Some kernel modules are incompatible with RHEL 8 and need to be replaced. The system upgrade command will check that those modules are not active in your installation; if any are found, refer to the related troubleshooting section for directions on how to proceed.

    • Ensure the system is running the latest installed kernel. To check this, compare the output of rpm -q kernel with that of uname -r.
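
      For example, the newest entry printed by rpm -q kernel should match the output of uname -r (apart from the kernel- prefix):

      neteye# rpm -q kernel | sort -V | tail -n 1
      neteye# uname -r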

    • At most one unused kernel version should be installed on the system (this is needed to avoid filling the boot partition during the upgrade). To remove all installed kernels except the currently running one, you can use the command:

      neteye# package-cleanup --oldkernels --count=1
      
  8. Filesystem types requirements:

    • Ensure there are no XFS filesystems with ftype=0 mounted.

      Run the script:

      neteye# for dev in $(findmnt -t xfs -n -o SOURCE); do echo "$dev: $(xfs_info $dev | grep -w -o ftype=0)"; done
      

      Case 1: If no output is reported, then no XFS filesystems with ftype=0 are mounted.

      Case 2: If any output is reported, as in the snippet below, then please refer to XFS ftype=0 case.

      /dev/mapper/rhel-root: ftype=0
      /dev/mapper/rhel-usr: ftype=0
      /dev/vda1: ftype=0
      /dev/mapper/rhel-home: ftype=0
      /dev/mapper/rhel-var: ftype=0
      
    • Ensure no network filesystems (e.g. NFS, SMB, GlusterFS) are mounted or configured to be mounted on reboot. If any are, please refer to the support portal for assistance.

  9. Package installation requirements:

    • Ensure there are no unfinished YUM transactions. To do so, run the following command:

      neteye# yum-complete-transaction --cleanup-only
      
    • Ensure no duplicate RPM packages are installed. To check if there are any duplicates you can execute:

      neteye# package-cleanup --dupes
      

      To remove any discovered duplicates run instead:

      neteye# package-cleanup --cleandupes
      
    • Ensure the urllib3 and requests Python modules are installed only via RPM packages. If urllib3 and requests are installed via pip, please run the following commands:

      neteye# pip2 uninstall requests urllib3
      
      neteye# yum reinstall python-urllib3.noarch python-requests.noarch
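
      One way to verify that the modules are managed by RPM is to check whether their directories belong to a package (a minimal sketch; an answer like "not owned by any package" indicates a pip installation):

      neteye# rpm -qf "$(python2 -c 'import urllib3, os; print(os.path.dirname(urllib3.__file__))')"
      neteye# rpm -qf "$(python2 -c 'import requests, os; print(os.path.dirname(requests.__file__))')"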
      
  10. Ensure that no customizations or recurring tasks are running, as they may interfere with the upgrade. This includes, for example, backups, Beats, custom scripts, and so on.
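
      For example, cron entries and systemd timers can be inspected for recurring tasks (an illustrative, non-exhaustive check):

      neteye# crontab -l
      neteye# systemctl list-timers --all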

If your NetEye installation uses custom services or custom Icinga 2 checks, you may need to fix them manually during or after the upgrade, and you may need the assistance of our support or consulting team.

Warning

The upgrade does not store a large amount of data in the /neteye partition, but during the upgrade some applications may write a significant amount of logs. For this reason we suggest having at least 5GB of free space in the /neteye partition, and at least 10% free space on all filesystems not mentioned above.
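
You can review the free space on /neteye and on all other mounted filesystems with, for example:

neteye# df -h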

Procedure Overview

The procedure will upgrade all nodes from CentOS 7 to RHEL 8: each node will be removed from the existing cluster and added to a new RHEL 8 cluster, which is automatically created and configured like the previous one. As soon as quorum in the CentOS 7 cluster is lost, that cluster will be stopped and the nodes already converted to RHEL 8 will be started, replacing the CentOS cluster and taking over its duties. During this transition there will be a downtime of at least 30 minutes.

Before starting the Cluster upgrade, the following files are backed up for security and future reference:

  • /var/lib/pacemaker/cib/cib.xml

  • /etc/corosync/corosync.conf

During the creation of the RHEL 8 cluster, the password of the hacluster user, necessary for a node to join the NetEye Cluster, is saved in /root/.pwd_hacluster.

Conventions Used

A NetEye Cluster can be composed of different types of nodes, including Elastic-only and Voting-only nodes. The following notation has been devised to identify the nodes in the Cluster.

  • (ALL) is the set of all Cluster Nodes

  • (NN) indicates the NetEye Cluster Nodes

  • (E) is an Elastic-only node

  • (V) is a Voting-only node

For example if we take the sample NetEye Cluster defined in The Elected NetEye Master:

  • (ALL) is my-neteye-01, my-neteye-02, my-neteye-03, my-neteye-04, my-neteye-05, and my-neteye-06

  • (NN) is composed of my-neteye-01, my-neteye-02, my-neteye-03, and my-neteye-04

  • (E) is my-neteye-05

  • (V) is my-neteye-06

In the remainder we will refer to the NetEye Cluster before the migration (i.e., the nodes still running NetEye 4.22) as the CentOS Cluster, while the NetEye Cluster after the migration (i.e., the nodes already migrated to NetEye 4.23) will be referred to as the RHEL 8 Cluster.

Running the Upgrade

To perform the Cluster Upgrade, the procedure below must be executed and completed on each node, strictly in the following order:

  • (NN) nodes following the order set in /etc/neteye-cluster

  • (V) nodes

  • (E) nodes

The procedure for upgrading node number (NN + V)/2 + 1 (i.e., the node that would cause the CentOS Cluster to lose quorum) is slightly different and requires the execution of the neteye cluster enable 4.23 command.

This command stops the CentOS 7 Cluster by putting all its nodes in standby, then enables the cluster maintenance mode, and finally activates the RHEL 8 Cluster by removing its nodes from standby and disabling the cluster maintenance mode.

You can proceed with the upgrade only once all resources are correctly started on the RHEL 8 Cluster. The status of the resources can be verified by executing the pcs status command on the 1st node. At this point you can also ensure that all the NetEye services, as well as your custom services and/or custom Icinga 2 checks, are working correctly on the RHEL 8 Cluster.
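
For example, on the 1st node:

cluster# pcs status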

If some resources or other services cannot be correctly started on the RHEL 8 Cluster (for example because of a missing library), you can stop the RHEL 8 Cluster and re-enable the CentOS 7 Cluster by executing

cluster# neteye cluster enable 4.22

This command prevents downtime and allows you to fix the existing issues on the RHEL 8 Cluster in the meantime; once they are fixed, execute the neteye cluster enable 4.23 command again to reactivate the RHEL 8 Cluster.

Step 1: Conversion from CentOS 7 to RHEL 7

This step is the same for all nodes, except for node number (NN + V)/2 + 1 (i.e., the 2nd node in a 3-node cluster and the 3rd node in a 5-node cluster).

Only on this node, execute the following command. On all other nodes, skip to the next command.

cluster# neteye cluster enable 4.23

To perform the conversion of the current node to RHEL 7, run:

cluster# (nohup neteye node system-upgrade --org <organization_id> --key <activation_key> --name <name> &) && tail --retry -f nohup.out

This command is responsible for checking the system status, putting the node into standby, temporarily stopping DRBD, and performing the conversion to RHEL 7. We recall that the organization ID and the activation key can be obtained upon request from our service desk.

If this is the first node to be upgraded, the configurations of the pcs resources will be exported and later used to recreate them on the new RHEL 8 cluster.

At the end of the conversion a success message is shown and you will be prompted to restart the system. To reboot the system run:

cluster# reboot

In case of any errors, the command will show the reason of the failure. After fixing the problem, please repeat Step 1 from the beginning.

Step 2: Upgrade from RHEL 7 to RHEL 8

After the previous reboot, it is time to upgrade from RHEL 7 to RHEL 8 by running the command:

cluster# (nohup neteye node system-upgrade --org <organization_id> --key <activation_key> --name <name> &) && tail --retry -f nohup.out

During the upgrade the following kernel modules will be removed because they are not compatible with RHEL 8:

  • pata_acpi: pata_acpi is used for device configuration and power management of P-ATA storage devices

  • pam_pkcs11: this Linux-PAM login module allows an X.509 certificate-based user login

The command will perform a pre-upgrade check of the status of the system; any of the following conditions may require manual intervention:

  • A newer kernel version is installed but currently not running: it is necessary to reboot the system, then execute the neteye node system-upgrade command again

  • Some packages not provided by NetEye have been installed and are not compatible with RHEL 8: such packages must be removed, or they can be updated by saving a RHEL 8 compatible update package in the folder /var/www/html/rhel8-rpms-migration. Alternatively, additional repositories to be used only during the upgrade can be added to the file /etc/leapp/files/leapp_upgrade_repositories.repo.

  • Other system-specific cases: in this case a detailed report of the causes will be generated in /var/log/leapp/leapp-report.txt

In case of any errors, the command will show the reason of the failure. After fixing the problem, please repeat Step 2 from the beginning.

During this step the node is removed from the running CentOS 7 cluster.

At the end of the upgrade a success message is shown and you will be prompted to restart the system. During the reboot, further tasks will be performed in order to complete the upgrade to RHEL 8.

Warning

The whole reboot procedure will take at least 35-40 minutes.

To reboot the system run the following command:

cluster# reboot

Step 3: Upgrade finalization

To complete the procedure and finalize the NetEye upgrade, run:

cluster# (nohup neteye node system-upgrade --org <organization_id> --key <activation_key> --name <name> &) && tail --retry -f nohup.out

When the first node is converted, the RHEL 8 cluster is created by configuring the previously exported pcs resources; otherwise, the node is simply added to the RHEL 8 cluster. If the RHEL 8 cluster was already enabled, pcs will distribute the resources across the RHEL 8 cluster nodes.

Single Purpose Nodes

In the context of the upgrade procedure, Single Purpose Nodes are Voting-only (V) and Elastic-only (E) nodes. They must be upgraded following the same procedure and order described above (all Voting-only nodes first, then all Elastic-only nodes).

Additional Tasks

In this upgrade, no additional manual step is required.

Cluster Reactivation

When the procedure has been completed also on the last node, all nodes of the cluster will run a finalized NetEye 4.23, none of them will be in standby, and the operating system will be RHEL 8.

Optionally, run the checks in the section Checking that the Cluster Status is Normal. If any of these checks fail, please call our service and support team before proceeding.

The mandatory task to execute on the last (NN) node is to re-enable fencing, if it was enabled on the CentOS 7 cluster before the migration:

cluster# pcs property set stonith-enabled=true
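
Optionally, you can verify the property afterwards with, for example (on newer pcs releases the subcommand may be pcs property config instead):

cluster# pcs property show stonith-enabled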