User Guide

Cluster Upgrade from 4.38 to 4.39

This guide will lead you through the steps specific to upgrading a NetEye Cluster installation from version 4.38 to 4.39.

During the upgrade, individual nodes will be put into standby mode. Overall performance will therefore be degraded until the upgrade is completed and all nodes have been taken out of standby mode. Provided that network connectivity in your environment is reliable, the upgrade procedure may take up to 30 minutes per node.

Warning

Remember that you must upgrade sequentially without skipping versions, so an upgrade to 4.39 is possible only from 4.38. For example, if you have version 4.27, you must first upgrade to 4.28, then to 4.29, and so on.
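
As an illustration of the sequential rule, the sketch below prints every release you would have to pass through on the way to a target version. The helper name upgrade_path is hypothetical and not part of the neteye CLI, and the sketch assumes that all versions involved belong to the 4.x series.

```shell
#!/bin/sh
# Hypothetical helper (not a NetEye command): print each release that must be
# installed, in order, to get from one 4.x minor version to another.
upgrade_path() {
    current_minor=$1   # e.g. 27 for NetEye 4.27
    target_minor=$2    # e.g. 39 for NetEye 4.39
    next=$((current_minor + 1))
    while [ "$next" -le "$target_minor" ]; do
        echo "4.$next"
        next=$((next + 1))
    done
}

# Example from the warning above: starting at 4.27, aiming for 4.39
upgrade_path 27 39
```

Each printed version corresponds to one complete upgrade procedure, to be carried out by following the guide for that specific version.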

Breaking Changes

Grafana 11

Support for AngularJS-based plugins has been turned off in Grafana 11. This prevents any data source or panel visualization that relies on AngularJS from loading, which may disrupt some of your dashboard panels.

To mitigate this issue, ensure all plugins are up to date and migrate from any remaining AngularJS plugins to a React-based alternative. Official Grafana documentation lists all known public plugins and provides migration advice when possible.

A complete list of all breaking changes can be found in the Grafana documentation under Breaking changes.

Elastic 8.16

In NetEye 4.39, Elastic Stack is upgraded from version 8.15.2 to version 8.16.0.

To ensure the full compatibility of your installation, please review the official release notes, focusing in particular on the breaking changes of the Elastic Stack components.

Among the known issues and breaking changes listed by Elastic, we would like to emphasize those most likely to affect NetEye installations:

Breaking changes:

  1. Elasticsearch: The JDK locale database has changed, which affects you if you use custom date formats with textual or week-date field specifiers

  2. Beats: FQDNs are lowercased when used as host.hostname

  3. Beats: Filebeat now needs the dup3, faccessat2, prctl and setrlimit syscalls to run the journald input

  4. Elastic Agent: When using the System integration, uppercase characters in host.hostname are converted to lowercase in the Elastic Agent output. This can result in duplicated host entries appearing in Kibana
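
To get a rough idea of whether a host is affected by the hostname lowercasing described in the list above, you can check whether its name contains uppercase characters. The following is only a minimal sketch: the helpers has_uppercase and to_lower are hypothetical, and the sketch assumes that the name reported by `hostname -f` (or `uname -n` as a fallback) is the one that ends up in host.hostname.

```shell
#!/bin/sh
# Hypothetical helpers, not part of NetEye, Beats or Elastic Agent.

# Succeed if the argument contains at least one uppercase character.
has_uppercase() {
    case $1 in
        *[[:upper:]]*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Print the argument converted to lowercase.
to_lower() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Assumption: this name is the one reported as host.hostname.
fqdn=$(hostname -f 2>/dev/null || uname -n)

if has_uppercase "$fqdn"; then
    echo "$fqdn contains uppercase characters; expect $(to_lower "$fqdn") after the upgrade"
else
    echo "$fqdn is already lowercase"
fi
```

Hosts reported as containing uppercase characters are the ones to watch for duplicated entries in Kibana after the upgrade.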

Known issues:

  1. Kibana: Stack Monitoring shows an “Unable to load page” error

  2. Kibana: Onboarding, the APM and OpenTelemetry tutorials, and some “Beats Only” integrations show an “Unable to load page” error

  3. Elastic Security: Duplicate alerts can be produced from manually running threshold rules

  4. Elastic Security: Manually running custom query rules with suppression could suppress more alerts than expected

  5. Elastic Security: Alerts page crashes if you upgrade to 8.16 and access it in a non-default Kibana space

Prerequisites

Before starting the upgrade, carefully read the latest release notes on NetEye’s blog and check which features will be changed or deprecated after the upgrade.

  1. All NetEye packages installed on the currently running version must be updated according to the update procedure before running the upgrade.

  2. NetEye must be up and running in a healthy state.

  3. Disk Space required:

    • 3GB for / and /var

    • 150MB for /boot

  4. If the SIEM module is installed:

    • The rubygems.org domain must be reachable from the NetEye Master, but only during the update/upgrade procedure. This domain is needed to update additional Logstash plugins, so the requirement applies only if you manually installed a Logstash plugin that is not present by default.

  5. If you have an RPM mirror configured, run the following command to refresh the configuration and apply the new changes:

    neteye rpmmirror setup
    
  6. From version 4.39 onwards, Keycloak is the only login option on NetEye. To finalize the migration to Keycloak and disable the legacy login, switch to the new authentication method by visiting the Configuration > Modules > neteye > Configuration page and enabling the Finalize Auth Migration switch. On a new 4.38 installation, this option is already set.

    Until this option is set, the service auth-finalization-check-neteyelocal on the host neteye-local will remain in critical state.

    Fig. 247 The service check that reminds you that the migration is not completed yet
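
The disk space prerequisite from the list above can be checked with standard tools before starting. The sketch below uses POSIX `df -Pk`; the helpers free_kb and check_space are hypothetical, and reading "3GB for / and /var" as 3 GB available on each of the two paths is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers, not NetEye commands.

# Print the available space, in KiB, of the filesystem holding the given path.
# Prints nothing if the path does not exist.
free_kb() {
    df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }'
}

# Succeed if the path has at least the required number of KiB available.
check_space() {
    path=$1
    required_kb=$2
    available_kb=$(free_kb "$path")
    available_kb=${available_kb:-0}   # treat a missing path as "no space"
    if [ "$available_kb" -ge "$required_kb" ]; then
        echo "OK: $path has $available_kb KiB free (need $required_kb)"
    else
        echo "WARNING: $path has only $available_kb KiB free (need $required_kb)"
        return 1
    fi
}

# Thresholds from the prerequisites: 3 GB for / and /var, 150 MB for /boot.
# Assumption: 3 GB is required on each of / and /var.
check_space /     $((3 * 1024 * 1024)) || true
check_space /var  $((3 * 1024 * 1024)) || true
check_space /boot $((150 * 1024))      || true
```

In a cluster, a check of this kind would have to be run on every node, since each node is upgraded in turn.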

1. Run the Upgrade

The Cluster Upgrade is carried out by running the following command:

cluster# (nohup neteye upgrade &) && tail --retry -f nohup.out

Warning

If the SIEM feature module is installed and a new version of Elasticsearch is available, please note that the procedure will upgrade one node at a time and wait for the Elasticsearch cluster health status to turn green before proceeding with the next node. For more information, please consult the dedicated section.

Once the command has finished, its output will tell you whether the upgrade was successful:

  • If the upgrade was successful, you might need to restart the nodes to apply the upgrades properly. If no reboot is needed, skip the next step.

  • If the command fails, refer to the troubleshooting section.

2. Reboot Nodes

Restart each node, one at a time, to apply the upgrades correctly.

  1. Run the reboot command

    cluster-node-N# neteye node reboot
    
  2. In case of a standard NetEye node, put it back online once the reboot is finished

    cluster-node-N# pcs node unstandby --wait=300
    

You can now reboot the next node.

3. Cluster Reactivation

At this point you can proceed to restore the cluster to high availability operation.

  1. Bring all cluster nodes back out of standby by running this command on the last standard node:

    cluster# pcs node unstandby --all --wait=300
    cluster# echo $?
    
    0
    

    If the exit code is different from 0, some nodes have not been reactivated, so please make sure that all nodes are active before proceeding.

  2. Run the checks in the section Checking that the Cluster Status is Normal. If any of these checks fail, please contact our service and support team before proceeding.

  3. Re-enable fencing on the last standard node, if it was enabled prior to the upgrade:

    cluster# pcs property set stonith-enabled=true
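
The exit code check from step 1 can also be scripted instead of being inspected by hand with `echo $?`. The wrapper below is a minimal sketch: report_unstandby is a hypothetical name, and the function simply runs whatever command it is given and turns the exit code into a message.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of pcs or NetEye: run the given command and
# report whether it exited with code 0.
report_unstandby() {
    if "$@"; then
        echo "All nodes are active again"
    else
        status=$?
        echo "Command exited with code $status: some nodes have not been reactivated" >&2
        return "$status"
    fi
}

# On the last standard node, it would be called with the command from step 1:
#   report_unstandby pcs node unstandby --all --wait=300
```

Because the wrapper returns the original exit code, it can be chained with the following steps, so that fencing is re-enabled only after all nodes are active.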
    

4. Additional Tasks