Cluster Upgrade from 4.18 to 4.19¶
This guide leads you through the steps specific to upgrading a NetEye Cluster installation from version 4.18 to 4.19.
Remember that you must upgrade sequentially without skipping versions; therefore, an upgrade to 4.19 is possible only from 4.18. For example, if you have version 4.14, you must first upgrade to 4.15, then 4.16, and so on.
Before starting an upgrade, you should very carefully read the latest release notes on NetEye’s blog and check the feature changes and deprecations specific to the version being upgraded. You can also check the whole Post Upgrade Steps section below to verify whether there are changes or specific steps that might significantly impact your NetEye Cluster installation.
InfluxDB authentication and TLS encryption are enabled by default. Existing agent configurations must be manually updated to support the new security measures. Please refer to Additional Steps and to the User Guide to understand how to migrate clients that write directly to InfluxDB to the NATS output.
The NATS Server multi-tenancy.conf file is overwritten by default on each neteye_secure_install execution. This is required to automate the configuration of the NATS Server Multi Tenancy features. To avoid losing the previous configuration, the file is backed up before being overwritten. Customization of multi-tenancy.conf is not supported.
The default Telegraf telegraf.conf file is removed from /neteye/shared/telegraf/conf. Telegraf is now considered a local service, thus existing Telegraf configurations must be manually migrated as explained in this section before proceeding with the upgrade.
Logstash default pipelines have been refactored to provide a dedicated pipeline per Beats agent. Pipelines for Auditbeat, Filebeat and Winlogbeat are now part of the default pipelines (see Enabling El Proxy for further details), thus existing pipeline customizations must be manually migrated to the new pipeline schema. Please refer to Additional Steps to understand how to migrate your customizations.
Cluster Upgrade Prerequisites¶
Upgrading a cluster will take a nontrivial amount of time. During the cluster upgrade, individual nodes will be put into standby mode and so overall cluster performance will be degraded until the upgrade procedure is completed and all nodes are removed from standby mode.
When the cluster is healthy and there are no problems, a full upgrade (update + upgrade) takes approximately 30 minutes, plus 15 minutes per node. For instance, on a 3-node cluster it may take approximately 1 hour and 15 minutes (30 + 15*3). This estimate is a lower bound that does not include the additional time needed for a kernel update or for any additional modules you have installed.
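The arithmetic behind this estimate can be sketched as a small shell function. This is only an illustration of the formula given above (30 minutes plus 15 minutes per node); the function name `estimate_upgrade_minutes` is hypothetical and not part of NetEye.

```shell
# Lower-bound estimate from the text: 30 minutes plus 15 minutes per node.
# The function name is hypothetical; it only reproduces the formula above.
estimate_upgrade_minutes() {
    nodes=$1
    echo $((30 + 15 * nodes))
}

# Example: a 3-node cluster
minutes=$(estimate_upgrade_minutes 3)
echo "Estimated lower bound: $((minutes / 60)) h $((minutes % 60)) min"
```

Kernel updates and additional modules are not covered by this figure, so it is best treated as a minimum when planning a maintenance window.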
This user guide uses the following conventions to indicate on which node you should execute each step:
(ALL) is the set of all cluster nodes
(N) indicates the last node
(OTHER) is the set of all nodes excluding (N)
For example if (ALL) is
The order in which (OTHER) nodes are upgraded is not important. However, you should note that the last node (N) to be upgraded will require a slightly different process than the other nodes (see Post Upgrade Steps For The Last Node (N) for details).
Cluster Upgrade Preparation¶
The Cluster Upgrade Preparation is carried out by running the command:
# nohup neteye upgrade
The neteye upgrade command can be run on a standard NetEye node, but never on an Elastic-only or a Voting-only Node.
Like neteye update, the neteye upgrade command will run a number of checks to make sure that:
NetEye installation is healthy
The installed version of NetEye is eligible for upgrade; that is, it checks which version is installed (i.e., 4.xx) and that the last upgrade was finalized, i.e., the neteye_finalize_installation script was carried out successfully
NetEye is fully updated and there are no minor (bugfix) updates to be installed
Moreover, if the checks are all successful, neteye upgrade will also perform these additional tasks:
Disable fencing (NetEye Clusters only)
Put all nodes into standby except the one on which the command is executed (NetEye Clusters only) so that they are no longer able to host cluster resources
The neteye upgrade command may take a long time before it completes successfully, so please do not interrupt it until it exits.
If any of these tasks is unsuccessful, a message will explain where the command failed, allowing you to manually fix the corresponding step. For example, if the exit message is similar to the following one, you need to manually install the latest updates.
"Found updates not installed"
"Example: icingacli, version 2.8.2_neteye1.82.1"
Then, if needed, the command will:
Update all the NetEye repositories to the newer version (i.e., 4.yy, which is the next version to which it is possible to upgrade)
Install all the RPMs of the newer version (i.e., 4.yy)
Upgrade NetEye’s yum groups
If the neteye upgrade command is successful, a message will inform you that it is possible to continue the upgrade procedure, by checking if there are some manual migrations to carry out: if there are, they will be listed in the next section.
When executed on a cluster, neteye upgrade will neither bring the nodes back from the standby, nor restore stonith: these steps need to be manually carried out after the upgrade has been successfully completed.
Upgrade All Cluster Nodes (ALL)¶
Repeat these upgrade steps for all nodes (ALL).
#1 Check cluster status
Run the following cluster command:
# pcs status
and please ensure that:
Only the last node (N) MUST be active
All cluster resources are marked “Started” on the last node (N)
All cluster services under “Daemon Status” are marked active/enabled on the last node (N)
#2 Check DRBD status
Check if the DRBD status is ok by using the drbdmon command, which updates the DRBD status in real time.
Section 4.2 of DRBD’s official documentation contains information and details about the possible statuses.
#3 Migrate configuration of RPMs
Each upgraded package can potentially create .rpmsave and/or .rpmnew files. You will need to verify and migrate all such files.
You can find more detailed information about what those files are and why they are generated in the official RPM documentation.
Briefly, if a configuration file has changed since the last version, and the configuration file was edited since the last version, then the package manager will do one of these two things:
If the new system configuration file should replace the edited version, it will save the old edited version as an .rpmsave file and install the new system configuration file.
If the new system configuration file should not replace the edited version, it will leave the edited version alone and save the new system configuration file as an .rpmnew file.
You can use the following commands to locate .rpmsave and .rpmnew files:
# updatedb
# locate '*.rpmsave*'
# locate '*.rpmnew*'
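As an alternative sketch, the same files can be listed with `find`, which scans the filesystem directly instead of relying on the database built by `updatedb`. The function name `list_rpm_leftovers` is hypothetical, and the default of `/etc` is an assumption based on the `/etc/sysconfig` example used in this guide; configuration files stored elsewhere need their own search path.

```shell
# List leftover .rpmsave and .rpmnew files below a directory.
# Assumption: /etc is used as the default because most package
# configuration files live there; pass another directory if needed.
list_rpm_leftovers() {
    dir=${1:-/etc}
    find "$dir" -type f \( -name '*.rpmsave' -o -name '*.rpmnew' \) 2>/dev/null | sort
}

# Usage:
#   list_rpm_leftovers /etc
```

Quoting the patterns keeps the shell from expanding them against files in the current directory before `find` sees them.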
The instructions below will show you how to keep your customized operating system configurations.
How to Migrate an .rpmnew Configuration File
The update process creates an .rpmnew file if a configuration file has changed since the last version so that customized settings are not replaced automatically. Those customizations need to be migrated into the new .rpmnew configuration file in order to activate the new configuration settings from the new package, while maintaining the previous customized settings. The following procedure uses Elasticsearch as an example.
First, run a diff between the original file and the .rpmnew file:
# diff -uN /etc/sysconfig/elasticsearch /etc/sysconfig/elasticsearch.rpmnew
# vimdiff /etc/sysconfig/elasticsearch /etc/sysconfig/elasticsearch.rpmnew
Copy all custom settings from the original into the .rpmnew file. Then create a backup of the original file:
# cp /etc/sysconfig/elasticsearch /etc/sysconfig/elasticsearch.01012018.bak
And then substitute the original file with the .rpmnew:
# mv /etc/sysconfig/elasticsearch.rpmnew /etc/sysconfig/elasticsearch
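The last two steps (backup, then substitution) can be wrapped in a small helper so that the backup is never skipped. This is a minimal sketch: the function name `activate_rpmnew` is hypothetical, and the date suffix format (day, month, year) is an assumption modelled on the `01012018` example above. Merging your custom settings into the .rpmnew file remains a manual step that must be done first.

```shell
# Back up the current configuration file with a date suffix, then put the
# already merged .rpmnew file in its place.
# Assumption: the suffix uses the format ddmmyyyy, as in the example above.
activate_rpmnew() {
    conf=$1
    [ -f "$conf" ] && [ -f "$conf.rpmnew" ] || {
        echo "Missing $conf or $conf.rpmnew" >&2
        return 1
    }
    backup="$conf.$(date +%d%m%Y).bak"
    # Only replace the original if the backup copy succeeded
    cp -p "$conf" "$backup" && mv "$conf.rpmnew" "$conf"
}

# Usage:
#   activate_rpmnew /etc/sysconfig/elasticsearch
```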
How to Migrate an .rpmsave Configuration File
The update process creates an .rpmsave file if a configuration file has been changed in the past and the updater has automatically replaced customized settings to activate new configurations immediately. In order to preserve your customizations from the previous version, you will need to migrate those from the original .rpmsave into the new configuration file.
Run a diff between the new file and the .rpmsave file:
# diff -uN /etc/sysconfig/elasticsearch.rpmsave /etc/sysconfig/elasticsearch
# vimdiff /etc/sysconfig/elasticsearch.rpmsave /etc/sysconfig/elasticsearch
Copy all custom settings from the .rpmsave into the new configuration file, and preserve the original .rpmsave file under a different name:
# mv /etc/sysconfig/elasticsearch.rpmsave /etc/sysconfig/elasticsearch.01012018.bak
Post Upgrade Steps¶
This section describes the steps that must be performed on specific nodes after the upgrade.
Post Upgrade Steps On (OTHER) Nodes¶
Run the NetEye Secure Install on the (OTHER) nodes one at a time, waiting for it to complete successfully on each node before running it on the next one:
# nohup neteye_secure_install
If a new cluster resource has been installed during the upgrade procedure, an error may be thrown. This error can be disregarded because it will be automatically fixed during the Post Upgrade Steps For The Last Node (N) step.
Post Upgrade Steps on the Elastic-only, Voting-only Nodes¶
Run the NetEye Secure Install on the Elastic-only and/or the Voting-only nodes:
# nohup neteye_secure_install
Post Upgrade Steps For The Last Node (N)¶
Run the NetEye Secure Install on the last node (N):
# nohup neteye_secure_install
In case you have one or more Satellites configured, you need to migrate the existing NATS Server configuration. Please refer to this section to understand how to migrate your NATS Server configuration.
Starting with NetEye 4.19, InfluxDB built-in authentication based on user credentials and TLS encryption are enabled by default, while Icinga2’s InfluxDB Writer feature and Grafana datasource configuration will be updated.
An InfluxDB admin user named root is created at secure install time and its password is stored in
The root user is the only user entitled to perform administrative changes to InfluxDB, like managing users or databases, but it is not meant to be used for reading from or writing to a database.
Existing agent configurations, such as Telegraf, that read or write data directly to InfluxDB must be manually updated to support the new authorization mechanism.
The NetEye administrator is responsible for updating agent configurations that write data to or read data from InfluxDB, because agents will not be updated automatically. Therefore, the NetEye administrator has to create an InfluxDB user with the appropriate privileges (e.g., write or read) and update the client configuration.
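As a hedged illustration of the user creation step, the helper below prints the InfluxQL statements that create a user and grant it a single privilege on one database. It assumes the InfluxQL `CREATE USER` and `GRANT` syntax of InfluxDB 1.x; the function name and the user, password and database names in the usage example are hypothetical placeholders, not values taken from NetEye.

```shell
# Print the InfluxQL statements that create a user and grant it one
# privilege (READ, WRITE or ALL) on a single database.
# Assumption: InfluxDB 1.x InfluxQL syntax; all names are placeholders.
influx_user_statements() {
    user=$1
    password=$2
    database=$3
    privilege=$4
    case $privilege in
        READ|WRITE|ALL) ;;
        *) echo "privilege must be READ, WRITE or ALL" >&2; return 1 ;;
    esac
    printf "CREATE USER \"%s\" WITH PASSWORD '%s'\n" "$user" "$password"
    printf 'GRANT %s ON "%s" TO "%s"\n' "$privilege" "$database" "$user"
}

# Usage (hypothetical names):
#   influx_user_statements telegraf_writer 's3cret' telegraf WRITE
```

Each printed statement can then be run as the root admin user, for example through the `-execute` option of the `influx` command-line client with `-ssl`, `-username` and `-password` set. After that, the agent configuration must be updated with the new credentials.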
Logstash Beats-related configurations are being moved to dedicated pipelines. The change is at the Ingestion (Logstash) level, to better distribute the logs between filters. Each Beats agent has a dedicated pipeline and a dedicated configuration folder in which all the input, filter and output configuration files must be placed.
The NetEye administrator is responsible for migrating custom configuration files related to Beats agents from the conf.d folder into the correct pipeline folder.
For example, the Winlogbeat filter file 1_f03001_agent_beats_windows.filter must be moved from conf.d to the dedicated conf.winlogbeat.d folder.
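The move described in this example could be scripted roughly as follows. This is a sketch under assumptions: the function name `move_to_pipeline` is hypothetical, the Logstash configuration root that contains conf.d is site-specific and must be supplied by you, and the folder naming pattern `conf.<pipeline>.d` is inferred from the conf.winlogbeat.d name above rather than confirmed for the other Beats agents.

```shell
# Move one custom Beats file from the generic conf.d folder to the folder
# of a dedicated pipeline under the same Logstash configuration root.
# Assumption: pipeline folders are named conf.<pipeline>.d.
move_to_pipeline() {
    conf_root=$1   # directory containing conf.d (site-specific)
    file=$2        # e.g. 1_f03001_agent_beats_windows.filter
    pipeline=$3    # e.g. winlogbeat
    target="$conf_root/conf.$pipeline.d"
    [ -d "$target" ] || {
        echo "No such pipeline folder: $target" >&2
        return 1
    }
    mv "$conf_root/conf.d/$file" "$target/"
}

# Usage (replace <conf_root> with your Logstash configuration directory):
#   move_to_pipeline <conf_root> 1_f03001_agent_beats_windows.filter winlogbeat
```

The helper refuses to run when the target folder does not exist, so a mistyped pipeline name cannot silently create a misplaced file.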
If you are unsure about what to migrate or how to migrate it, please contact our support team.
Cluster Reactivation (N)¶
You can now restore the cluster to high availability operation.
Bring all cluster nodes back out of standby with this command on the last node (N):
# pcs node unstandby --all --wait=300
# echo $?
If the exit code is different from 0, some nodes have not been reactivated, so please make sure that all nodes are active before proceeding.
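The exit code check can also be made explicit in a small wrapper, so that a failed reactivation is not overlooked. This is a minimal sketch; the function name `reactivate_nodes` and the messages it prints are hypothetical, while the `pcs` invocation is the one shown above.

```shell
# Bring all nodes out of standby and report success only when pcs exits
# with code 0; otherwise print a warning and return the same exit code.
reactivate_nodes() {
    if pcs node unstandby --all --wait=300; then
        echo "All nodes are active again"
    else
        rc=$?
        echo "pcs exited with code $rc: check 'pcs status' before proceeding" >&2
        return "$rc"
    fi
}

# Usage, on the last node (N):
#   reactivate_nodes
```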
Re-enable fencing on the last node (N):
# pcs property set stonith-enabled=true
Finalize the Cluster Upgrade (ALL)¶
You can now finalize the upgrade process by launching the following script on every node (ALL) one by one:
In this upgrade, no additional manual step is required.
Troubleshooting: Failing health checks, migration of modules¶
After the finalization procedure has successfully ended, you might notice in the Problems View that some health checks fail and are in state WARNING. The reason is that you are using a module that needs to be migrated, because a breaking change has been introduced in the release.
Hence, you should go to the Problems View and check which health check is failing. There you will also find instructions for the correct migration of the module, which in almost all cases amounts to enabling an option: the actual migration will then be executed manually.