User Guide

Troubleshooting

The update and upgrade procedures can stop for a variety of reasons. This section collects the most frequent cases and provides guidelines for resolving each issue and resuming the procedure.

If you encounter a problem that is not covered on this page, please contact one of the official channels (sales, your consultant, or the support portal) for help and directions on how to proceed.

Some check fails

In this case, an informative message points out the check that failed, allowing you to inspect and fix the problem.

For example, if the exit message is similar to the following one, you need to manually install the latest updates.

"Found updates not installed"
"Example: icingacli, version 2.8.2_neteye1.82.1"

Then, after the updates are installed, run the command again: it will start the tasks over.

An .rpmnew and/or .rpmsave file is found

This can happen when some of the installed packages have been customised. See section Migrate .rpmsave and .rpmnew Files for directions on how to proceed. Once done, remember to run neteye update again.

The Elected NetEye Active Node is in standby mode

During a NetEye Cluster update or upgrade, exactly one node must act as the Elected NetEye Active Node. Read more in section The NetEye Active Node.

A cluster resource has not been created

During a NetEye Cluster upgrade, new cluster resources may need to be created before running the neteye_secure_install script. These resources must be created manually; directions can be found in section 4. Additional Tasks of the Cluster Upgrade from 4.30 to 4.31.

A health check is failing

…during the update/upgrade procedure

The NetEye update and upgrade commands run all the deep health checks to ensure that the NetEye installation is healthy before running the update or upgrade procedure. It might happen, however, that one of the checks fails, preventing the procedure from completing successfully.

To solve the problem manually, follow the directions in section The NetEye Health Check.

Once the issue is solved, the NetEye update or upgrade command can be run again.

…after the finalization procedure

After the finalization procedure has successfully ended, you might notice in the Problems View (see Menu / Problems) that some health check fails and is in state WARNING. The reason is that you are using a module that needs to be migrated because a breaking change has been introduced in the release.

In this case, go to the Problems View and check which health check is failing. There you will also find instructions for the correct migration of the module, which in almost all cases amounts to enabling an option: the actual migration will then be executed manually.

How to check the NetEye Cluster status

Run the following cluster command:

# pcs status

and please ensure that:

  1. Only the last (N) node MUST be active

  2. All cluster resources are marked “Started” on the last (N) node

  3. All cluster services under “Daemon Status” are marked active/enabled on the last (N) node
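The three checks above can also be approximated with a small filter over the pcs status output. The following is only a sketch: the function name pcs_status_problems is hypothetical, and the line patterns are assumptions about how pcs prints resource lines (agent in parentheses, such as "(ocf:...)" or "(systemd:...)") and Daemon Status lines ("corosync: active/enabled"). The layout differs between pcs and Pacemaker versions, and clone or promotable resources are not covered, so always compare the result with the full output.

```shell
# Sketch: read `pcs status` output on stdin and print the lines that look wrong.
# $1 (optional) is the name of the node that must hold all resources.
# Prints nothing and returns 0 when no suspicious line is found.
pcs_status_problems() {
    awk -v node="$1" '
        # Assumed resource line format: "... (ocf:...): Started <node>".
        /\((ocf|systemd|stonith|lsb|service):/ && index($0, "Started " node) == 0 {
            print "resource: " $0; bad = 1
        }
        # Assumed Daemon Status format: "corosync: active/enabled".
        /^[ \t]*(corosync|pacemaker|pcsd|sbd):/ && !/active\/enabled/ {
            print "daemon: " $0; bad = 1
        }
        END { exit bad }
    '
}
```

A possible invocation is `pcs status | pcs_status_problems <name-of-last-node>`. Note that the node name is matched as a prefix, so a short name also matches its fully qualified form.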

How to check DRBD status

Check whether the DRBD status is OK by using the drbdmon command, which updates the DRBD status in real time.

See also

Section 4.2 of DRBD’s official documentation contains information and details about the possible statuses.

https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-check-status
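Since drbdmon is interactive, a one-shot check can be handy in scripts. The sketch below is not part of the NetEye tooling: it assumes that the output of drbdadm status (DRBD 9) is piped into a hypothetical helper called drbd_not_uptodate, and that disk states appear as "disk:<state>" or "peer-disk:<state>" tokens. It simply reports every disk state other than UpToDate, which may include states that are legitimate in your setup (for example intentionally diskless nodes).

```shell
# Sketch: read `drbdadm status` output on stdin and list disks not UpToDate.
# Prints "<resource>: <token>" per finding; returns 0 only if nothing is found.
drbd_not_uptodate() {
    awk '
        # Unindented lines start a new resource block; remember its name.
        /^[^ \t]/ { res = $1 }
        {
            # Inspect every "disk:" / "peer-disk:" token on the line.
            for (i = 1; i <= NF; i++)
                if ($i ~ /disk:/ && $i !~ /:UpToDate$/) {
                    print res ": " $i; bad = 1
                }
        }
        END { exit bad }
    '
}
```

A possible invocation is `drbdadm status | drbd_not_uptodate`; no output and a zero exit status mean that every reported disk is UpToDate.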

Elasticsearch rolling procedure: waiting for the green cluster

During updates and upgrades, if a new version of Elasticsearch is available, a rolling procedure is applied, with the nodes being updated or upgraded one at a time.

Moreover, the update or upgrade requires a restart of the service to take effect, which generally requires shards to be re-allocated. Following the official procedure outlined by Elastic, the rolling procedure waits for the Elasticsearch cluster health status to turn green before proceeding with the next node.
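To see what the procedure is waiting for, you can look at the cluster health yourself: the Elasticsearch cluster health API (GET _cluster/health) returns a JSON document whose status field is green, yellow, or red. The helper below is a minimal sketch with a hypothetical name (es_health_status); it only extracts that field from a JSON response supplied on stdin. How to reach the API on your installation (host, port, certificates, credentials) is not covered here.

```shell
# Sketch: print the value of the "status" field of a cluster health
# JSON response read from stdin (green, yellow, or red).
es_health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}
```

For instance, piping the response of a request to the _cluster/health endpoint into es_health_status prints the current status, which can then be compared against green.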

This increases the overall time the procedure may take, depending on the size of the Elasticsearch installation, the number of shards and the connectivity between the various nodes, possibly by up to an hour.

By default, the procedure fails if the green status is not reached within a waiting period of one hour.

However, in installations with a large amount of data, this operation could take longer.

For this reason, the maximum waiting time can be customized by passing two parameters to the update or upgrade command: the number of retries and the number of seconds between retries. For example, to set a maximum waiting time of two hours (120 retries, 60 seconds apart), you can use the following update or upgrade commands:

neteye# (nohup neteye update --extra-vars '{"es_wait_for_green_retries":120,"es_wait_for_green_seconds_between_retries":60}' &) && tail --retry -f nohup.out
neteye# (nohup neteye upgrade --extra-vars '{"es_wait_for_green_retries":120,"es_wait_for_green_seconds_between_retries":60}' &) && tail --retry -f nohup.out
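In the commands above, the maximum waiting time is the number of retries multiplied by the seconds between retries (120 × 60 s = 2 hours). The following sketch derives the two values from a desired waiting time; the function name es_wait_extra_vars is hypothetical and not part of NetEye, and only the two parameter names are taken from the commands above.

```shell
# Sketch: build the --extra-vars JSON for a desired maximum waiting time.
# Usage: es_wait_extra_vars <max_wait_minutes> [seconds_between_retries]
es_wait_extra_vars() {
    es_wait_minutes=$1
    es_wait_interval=${2:-60}   # default: one retry per minute
    # Total wait = retries * interval, so round the retry count up.
    es_wait_retries=$(( (es_wait_minutes * 60 + es_wait_interval - 1) / es_wait_interval ))
    printf '{"es_wait_for_green_retries":%d,"es_wait_for_green_seconds_between_retries":%d}\n' \
        "$es_wait_retries" "$es_wait_interval"
}

# Two hours with the default 60-second interval.
es_wait_extra_vars 120
```

The printed string can be passed as the argument of --extra-vars; a three-hour wait with checks every 30 seconds would be obtained with `es_wait_extra_vars 180 30`.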

Furthermore, if you believe that checking the cluster health status is not beneficial in your situation, you can skip it by using the skip_wait_for_green_cluster parameter, as follows:

neteye# (nohup neteye update --extra-vars '{"skip_wait_for_green_cluster":true}' &) && tail --retry -f nohup.out
neteye# (nohup neteye upgrade --extra-vars '{"skip_wait_for_green_cluster":true}' &) && tail --retry -f nohup.out