User Guide

Cluster Upgrade from 4.19 to 4.20

This guide leads you through the steps specific to upgrading a NetEye Cluster installation from version 4.19 to 4.20.

Warning

Remember that you must upgrade sequentially without skipping versions; therefore, an upgrade to 4.20 is possible only from 4.19. For example, if you have version 4.14, you must first upgrade to 4.15, then to 4.16, and so on.

Before starting an upgrade, you should very carefully read the latest release notes on NetEye’s blog and check the feature changes and deprecations specific to the version being upgraded. You should also check the whole section Post Upgrade Steps below to verify whether there are changes or specific steps that might significantly impact your NetEye Cluster installation.

Breaking Changes

Icinga2 Satellites

NetEye now supports Icinga2 satellites. You have to migrate your current installation by following the procedure described in Additional steps after successfully completing the upgrade.

NATS telegraf user

The NATS telegraf user has been deprecated due to security issues and will be removed in future releases. It has been replaced by two new users:

  1. telegraf_wo with write-only privileges on NATS

  2. telegraf_ro with read-only privileges on NATS

Please change your telegraf collectors and consumers to use the two new users as described in Section Write Data to influxDB through NATS master of the User Guide. Once you have removed all occurrences of the telegraf user, please go to Configuration / Modules / neteye / Configuration, click Remove NATS telegraf user and Save Changes.
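
To verify that no collector or consumer still authenticates as the deprecated user, you can search your Telegraf configurations for it. This is only a sketch: the path below is an assumption for a typical NetEye setup, so adjust it to wherever your Telegraf configuration files actually reside, and the exact key and spacing in your configuration may differ:

# grep -ri 'username = "telegraf"' /neteye/shared/telegraf/conf/ 2>/dev/null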

Cluster Upgrade Prerequisites

Upgrading a cluster will take a nontrivial amount of time. During the cluster upgrade, individual nodes will be put into standby mode and so overall cluster performance will be degraded until the upgrade procedure is completed and all nodes are removed from standby mode.

An estimate of the time needed for a full upgrade (update + upgrade) when the cluster is healthy and there are no problems is approximately 30 minutes, plus 15 minutes per node. So for instance, on a 3-node cluster it may take approximately 1 hour and 15 minutes (30 + 15*3). This estimate is a lower bound that does not include additional time should there be a kernel update or if you have additional modules installed.

This user guide uses the following conventions to highlight in which node you should execute the process:

  • (ALL) is the set of all cluster nodes

  • (N) indicates the last node

  • (OTHER) is the set of all nodes excluding (N)

For example if (ALL) is neteye01.wp, neteye02.wp, and neteye03.wp then:

  • (N) is neteye03.wp

  • (OTHER) is neteye01.wp and neteye02.wp

The order in which (OTHER) nodes are upgraded is not important. However, you should note that the last node (N) to be upgraded will require a slightly different process than the other nodes (see Post Upgrade Steps For The Last Node (N) for details).

Cluster Upgrade Preparation

The Cluster Upgrade Preparation is carried out by running the command:

# nohup neteye upgrade

Warning

The neteye upgrade command can be run on a standard NetEye node, but never on an Elastic-only or a Voting-only Node.

Like neteye update, the neteye upgrade command will run a number of checks to make sure that:

  • NetEye installation is healthy

  • The installed NetEye version is eligible for upgrade, that is, it checks which version is installed (i.e., 4.xx) and that the last upgrade was finalized, i.e., that the neteye_finalize_installation script completed successfully

  • NetEye is fully updated and there are no minor (bugfix) updates to be installed

Moreover, if these checks are all successful, neteye upgrade will also perform these additional tasks:

  • Disable fencing (NetEye Clusters only)

  • Put all nodes into standby except the one on which the command is executed (NetEye Clusters only) so that they are no longer able to host cluster resources

Warning

The neteye upgrade command may take a long time before it completes successfully, so please do not interrupt it until it exits.
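
Since the command is launched through nohup, its output is written to a nohup.out file in the directory from which you started it; a simple way to follow the progress from a second terminal is:

# tail -f nohup.out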

If any of these tasks is unsuccessful, a message will explain where the command failed, allowing you to manually fix the corresponding step. For example, if the exit message is similar to the following one, you need to manually install the latest updates.

"Found updates not installed"
"Example: icingacli, version 2.8.2_neteye1.82.1"

Then, if needed, the command will:

  1. Update all the NetEye repositories to the newer version (i.e., 4.yy, which is the next version to which it is possible to upgrade)

  2. Install all the RPMs of the newer version (i.e., 4.yy)

  3. Upgrade NetEye’s yum groups

If the neteye upgrade command is successful, a message will inform you that it is possible to continue the upgrade procedure by checking whether there are manual migrations to carry out: if there are, they will be listed in the next section.

Warning

When executed on a cluster, neteye upgrade will neither bring the nodes back from the standby, nor restore stonith: these steps need to be manually carried out after the upgrade has been successfully completed.

Upgrade All Cluster Nodes (ALL)

Repeat these upgrade steps for all nodes (ALL).

#1 Check cluster status

Run the following cluster command:

# pcs status

and please ensure that:

  • Only the last node (N) MUST be active

  • All cluster resources are marked “Started” on the last node (N)

  • All cluster services under “Daemon Status” are marked active/enabled on the last node (N)

#2 Check DRBD status

Check if the DRBD status is ok by using the drbdmon command, which updates the DRBD status in real time.

See also

Section 4.2 of DRBD’s official documentation contains information and details about the possible statuses.

https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-check-status
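
If you prefer a one-shot, non-interactive report instead of the live view, DRBD 9 also offers the following command, which prints the current resource status once and exits:

# drbdadm status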

#3 Migrate configuration of RPMs

Each upgraded package can potentially create .rpmsave and/or .rpmnew files. You will need to verify and migrate all such files.

You can find more detailed information about what those files are and why they are generated in the official RPM documentation.

Briefly, if the configuration file shipped with the package has changed since the last version, and the file on disk has been edited locally, then the package manager will do one of these two things:

  • If the new system configuration file should replace the edited version, it will save the old edited version as an .rpmsave file and install the new system configuration file.

  • If the new system configuration file should not replace the edited version, it will leave the edited version alone and save the new system configuration file as an .rpmnew file.

Note

You can use the following commands to locate .rpmsave and .rpmnew files:

# updatedb
# locate *.rpmsave*
# locate *.rpmnew*
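
If the locate database is not available on a node, a plain find gives an equivalent (if slower) result:

# find / \( -name "*.rpmsave" -o -name "*.rpmnew" \) 2>/dev/null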

The instructions below will show you how to keep your customized operating system configurations.

How to Migrate an .rpmnew Configuration File

The update process creates an .rpmnew file if a configuration file has changed since the last version so that customized settings are not replaced automatically. Those customizations need to be migrated into the new .rpmnew configuration file in order to activate the new configuration settings from the new package, while maintaining the previous customized settings. The following procedure uses Elasticsearch as an example.

First, run a diff between the original file and the .rpmnew file:

# diff -uN /etc/sysconfig/elasticsearch /etc/sysconfig/elasticsearch.rpmnew

OR

# vimdiff /etc/sysconfig/elasticsearch /etc/sysconfig/elasticsearch.rpmnew

Copy all custom settings from the original into the .rpmnew file. Then create a backup of the original file:

# cp /etc/sysconfig/elasticsearch /etc/sysconfig/elasticsearch.01012018.bak

And then substitute the original file with the .rpmnew:

# mv /etc/sysconfig/elasticsearch.rpmnew /etc/sysconfig/elasticsearch

How to Migrate an .rpmsave Configuration File

The update process creates an .rpmsave file if a configuration file was edited in the past and the updater has automatically replaced it with the new version, so that the new configuration is active immediately. In order to preserve your customizations from the previous version, you will need to migrate them from the .rpmsave file into the new configuration file.

Run a diff between the new file and the .rpmsave file:

# diff -uN /etc/sysconfig/elasticsearch.rpmsave /etc/sysconfig/elasticsearch

OR

# vimdiff /etc/sysconfig/elasticsearch.rpmsave /etc/sysconfig/elasticsearch

Copy all custom settings from the .rpmsave into the new configuration file, and preserve the original .rpmsave file under a different name:

# mv /etc/sysconfig/elasticsearch.rpmsave /etc/sysconfig/elasticsearch.01012018.bak

Post Upgrade Steps

This section describes all steps that must be carried out on specific nodes after the upgrade.

Post Upgrade Steps for the creation of new Cluster resources

In this section you will find directions to manually create new cluster resources if needed, e.g., when installing new applications.

Warning

From NetEye 4.20 the Tornado module is automatically installed on all NetEye instances. For this reason, if Tornado was not already present and running in NetEye, you need to create the cluster resources needed by Tornado with the steps explained here.

Cluster resources required by Tornado

If the Tornado module was not previously installed on your NetEye, please follow the procedure below. You can safely skip to the next section if the Tornado module was already installed before the upgrade to NetEye 4.20.

  • Connect to the terminal of any of the NetEye Cluster Nodes (excluding Elastic-only and Voting-only Nodes).

  • Create the main cluster resources for Tornado:

    • Adapt the template /usr/share/neteye/cluster/templates/Services-tornado.conf.tpl to the settings of your cluster (a sketch of this adapt-and-save sequence is shown after this list)

    • Save it to a file with the same name without the .tpl suffix

    • Execute the following command:

      # /usr/share/neteye/scripts/cluster/cluster_service_setup.pl -c /usr/share/neteye/cluster/templates/Services-tornado.conf
      
  • Create the cluster resources for Tornado NATS JSON Collector:

    • Adapt the template /usr/share/neteye/cluster/templates/Services-tornado-nats-json-collector.conf.tpl to the settings of your cluster

    • Save it to a file with the same name without the .tpl suffix

    • Execute the following command:

      # /usr/share/neteye/scripts/cluster/cluster_service_setup.pl -c /usr/share/neteye/cluster/templates/Services-tornado-nats-json-collector.conf
      
  • If the SIEM module is installed, create the cluster resources for Tornado rsyslog Collector:

    • Adapt the template /usr/share/neteye/cluster/templates/Services-tornado-rsyslog-collector-logmanager.conf.tpl to the settings of your cluster

    • Save it to a file with the same name without the .tpl suffix

    • Execute the following command:

      # /usr/share/neteye/scripts/cluster/cluster_service_setup.pl -c /usr/share/neteye/cluster/templates/Services-tornado-rsyslog-collector-logmanager.conf
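
As referenced in the list above, the adapt-and-save step is the same for every template. The following sketch shows it for the main Tornado services file, assuming vi as the editor (any editor will do); the values inside the copied file must be adapted to your cluster settings before running the setup script:

# cp /usr/share/neteye/cluster/templates/Services-tornado.conf.tpl /usr/share/neteye/cluster/templates/Services-tornado.conf
# vi /usr/share/neteye/cluster/templates/Services-tornado.conf
# /usr/share/neteye/scripts/cluster/cluster_service_setup.pl -c /usr/share/neteye/cluster/templates/Services-tornado.conf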
      

Post Upgrade Steps On (OTHER) Nodes

Run the NetEye Secure Install on the (OTHER) nodes, one at a time: wait for it to complete successfully on one node before running it on the next one:

# nohup neteye_secure_install

Warning

If a new cluster resource has been installed during the upgrade procedure, an error may be thrown. This error can be disregarded, because it will be automatically fixed during the Post Upgrade Steps For The Last Node (N) step.

Post Upgrade Steps on the Elastic-only, Voting-only Nodes

Run the NetEye Secure Install on the Elastic-only and/or the Voting-only nodes:

# nohup neteye_secure_install

Post Upgrade Steps For The Last Node (N)

  • Run the NetEye Secure Install on the last node (N):

    # nohup neteye_secure_install
    

Cluster Reactivation (N)

You can now restore the cluster to high availability operation.

  • Bring all cluster nodes back out of standby with this command on the last node (N):

    # pcs node unstandby --all --wait=300
    # echo $?
    
    0
    

    If the exit code is different from 0, some nodes have not been reactivated, so please make sure that all nodes are active before proceeding (a verification sketch is shown after this list).

  • Run the checks in the section Checking that the Cluster Status is Normal. If any of the above checks fail, please call our service and support team before proceeding.

  • Re-enable fencing on the last node (N):

    # pcs property set stonith-enabled=true
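
As a final verification (referenced above), all nodes should report as online rather than in standby, and fencing should be enabled again. The following is only a sketch of these checks; note that on recent pcs releases pcs property show has been superseded by pcs property config:

# pcs status nodes
# pcs property show stonith-enabled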
    

Finalize the Cluster Upgrade (ALL)

You can now finalize the upgrade process by launching the following script on every node (ALL) one by one:

# neteye_finalize_installation

In this upgrade, no additional manual step is required.

Troubleshooting: Failing health checks, migration of modules

After the finalization procedure has successfully ended, you might notice in the Problems View that some health checks fail and are in state WARNING. The reason is that you are using a module that needs to be migrated, because a breaking change has been introduced in the release.

Hence, you should go to the Problems View and check which health check is failing. There you will also find instructions for the correct migration of the module, which in almost all cases amounts to enabling an option: the actual migration will then be executed manually.

Upgrade NetEye Satellites

Icinga2 Satellites

Migrate Icinga2 Satellites to Standard

Warning

Before starting to upgrade your Satellites, you must carefully check your configuration in /etc/neteye-satellite.d/. The procedure below describes several common scenarios for NetEye users.

With the introduction of support for Icinga2 Satellites in NetEye, existing custom configurations must be migrated to the new standard.

Warning

The following procedures are supposed to be executed on the NetEye Master. Once the migration is completed on the Master side, please refer to the Satellite Upgrade Procedure to conclude the Satellite migration.

Prerequisites

If you are on a cluster you need to put all your nodes in standby except one and execute the migration on the active node.

Migration procedure

Suppose you have an initial configuration file satellites.conf, located in /neteye/shared/icinga2/conf/icinga2/zones.d/, containing the following definitions:

object Endpoint "satellite3.example.com" {
  host = "satellite3.example.com"
}

object Zone "zone B" {
  endpoints = [ "satellite3.example.com" ]
  parent = "master"
}
Migrating the Icinga2 Zone to the new Schema with Tenants

Since Satellites can be arranged in tenants, it is recommended to migrate your Satellite satellite3 under a new zone that clearly indicates the Satellite’s tenant in its name. For example, if the tenant of satellite3 is called tenant_A, the new zone must contain this information. The existing zone zone B must then be renamed to tenant_A_zone B.

The migration can be achieved via the following procedure:

Step 1. Satellite configuration

Create a configuration file for the Satellite in /etc/neteye-satellite.d/tenant_A/satellite3.conf:

{
  "fqdn": "satellite3.example.com",
  "name": "satellite3",
  "icinga2_zone": "zone B",
  "ssh_port": "22",
  "ssh_enabled": true
}
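
Since the Satellite configuration must be valid JSON, a quick optional syntax check (assuming python3 is available on the Master) is:

# python3 -m json.tool /etc/neteye-satellite.d/tenant_A/satellite3.conf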

Step 2. Zone configuration

Create a placeholder for the new zone without any Endpoint configured as follows

object Zone "tenant_A_zone B" {
  endpoints = [ ]
  parent = "master"
}

in a file called zone_tenant_A_zone_B.conf located in /neteye/shared/icinga2/conf/icinga2/zones.d/.
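
One possible way to create this placeholder file with exactly the content shown above is a shell here-document (any editor works just as well):

# cat > /neteye/shared/icinga2/conf/icinga2/zones.d/zone_tenant_A_zone_B.conf << 'EOF'
object Zone "tenant_A_zone B" {
  endpoints = [ ]
  parent = "master"
}
EOF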

Note

Please note that the Zone name must contain only alphanumeric characters, underscores, dashes and whitespaces. The filename that defines the placeholder for the new zone, instead, must not contain whitespaces.

Step 3. Zone deploy and objects migration

Check the current configuration with icinga2-master daemon --validate

Now you have to restart icinga2-master and run the following commands to execute the kickstart and deploy the new configuration.

icingacli director kickstart run
icingacli director config deploy run
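
For the restart mentioned above, the exact command depends on how the Icinga2 master runs on your installation: on a single node it is typically restarted via systemd, while on a NetEye Cluster the corresponding cluster resource is restarted via pcs. The names icinga2-master below are assumptions, so check systemctl list-units and pcs status for the actual unit and resource names:

# systemctl restart icinga2-master
# pcs resource restart icinga2-master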

At this point you have to change all your hosts, services, templates and apply rules which belonged to the zone zone B to the new zone tenant_A_zone B.

Step 4. Satellite deploy

To deploy the Satellite you have to follow the steps below (a consolidated command sketch is shown after this list):

  • Delete your old configuration file /neteye/shared/icinga2/conf/icinga2/zones.d/satellites.conf

  • Sync satellites configuration across all the cluster nodes with neteye config cluster sync (NetEye Clusters only)

  • Generate the new configuration on the Master with neteye satellite config create satellite3; this will also run the kickstart and deploy the new configuration

  • Send the new configuration for the satellite with neteye satellite config send satellite3
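
A consolidated sketch of the command sequence above, using the example Satellite satellite3 (the cluster sync step applies to NetEye Clusters only):

# rm /neteye/shared/icinga2/conf/icinga2/zones.d/satellites.conf
# neteye config cluster sync
# neteye satellite config create satellite3
# neteye satellite config send satellite3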

Keeping the Existing Icinga2 Zone in Single Tenant Environments

In single tenant environments, where multi tenancy is not a requirement, the special tenant master can be used. This allows migrating the Satellite satellite3 while keeping the same Icinga2 zone, namely zone B.

The migration, in this case, can be achieved by the following procedure:

Step 1. Satellite configuration

Create a configuration file for the satellite in /etc/neteye-satellite.d/master/satellite3.conf:

{
  "fqdn": "satellite3.example.com",
  "name": "satellite3",
  "icinga2_zone": "zone B",
  "ssh_port": "22",
  "ssh_enabled": true
}

Step 2. Remove existing Satellite objects

Remove the Endpoint and Zone definitions referring to Satellite satellite3 from file /neteye/shared/icinga2/conf/icinga2/zones.d/satellites.conf, in order to avoid any duplication of Icinga2 objects in the configuration.

Step 3. Satellite configuration and deploy

Synchronize the satellites configuration across all the cluster nodes with neteye config cluster sync (NetEye Clusters only). Generate the new configuration for Satellite satellite3 on the Master with neteye satellite config create satellite3. The command will also run the Icinga2 kickstart and deploy the new Icinga2 configuration files.

Send the new configuration to the Satellite with neteye satellite config send satellite3

Keeping the Existing Icinga2 Zone in Existing Multi Tenant Environments

It can be the case, in very large multi tenant environments, that changing the name of an Icinga2 zone can be detrimental to the monitoring capabilities of the whole NetEye ecosystem.

To avoid disrupting the monitoring, for existing multi tenant environments only, we allow keeping the same Icinga2 zone for the Satellite, namely zone B.

Please refer to the following procedure:

Step 1. Satellite configuration

Create a configuration file for the satellite in /etc/neteye-satellite.d/tenant_A/satellite3.conf:

{
  "fqdn": "satellite3.example.com",
  "name": "satellite3",
  "icinga2_zone": "zone B",
  "ssh_port": "22",
  "ssh_enabled": true,
  "icinga2_tenant_in_zone_name": false
}

Note

Please note that the special tenant master must not be used in multi tenant environments

Note

Please note that the attribute icinga2_tenant_in_zone_name must be used only in already existing multi tenant installations

Step 2. Remove existing Satellite objects

Remove the Endpoint and Zone definitions referring to Satellite satellite3 from file /neteye/shared/icinga2/conf/icinga2/zones.d/satellites.conf, in order to avoid any duplication of Icinga2 objects in the configuration.

Step 3. Satellite configuration and deploy

Synchronize the satellites configuration across all the cluster nodes with neteye config cluster sync (NetEye Clusters only). Generate the new configuration for Satellite satellite3 on the Master with neteye satellite config create satellite3. The command will also run the Icinga2 kickstart and deploy the new Icinga2 configuration files.

Note

Please note that the Zone name must be unique across all tenants, and must contain only alphanumeric characters, underscores, dashes and whitespaces.

Send the new configuration for the Satellite with neteye satellite config send satellite3

Upgrade Satellites

To upgrade a Satellite it is required to have the latest configuration archive located in /root/satellite-setup/config/<neteye_release>/satellite-config.tar.gz. The archive is generated by the upgraded Master.

To generate the configuration archive on the Master, see the Satellite Configuration section.

To automatically download the latest upgrade you can run the following command on the Satellite:

neteye satellite upgrade

The command updates the NetEye repositories to the newer NetEye version, installs all the RPMs of the newer version, and upgrades NetEye’s yum groups.

Please check for any .rpmnew and .rpmsave files (see the Migrate RPM Configuration section for further information).

If the command is successful, a message will inform you that it is possible to continue the upgrade procedure.

Execute the command below to set up the Satellite with the new upgrade:

neteye satellite setup

Complete the satellite upgrade process by launching the following script:

neteye_finalize_installation

Note

You should launch the finalize command only if all previous steps have been completed successfully. If you encounter any errors or problems during the upgrade process, please contact our service and support team to evaluate the best way forward for upgrading your NetEye System.