
Cluster

Service Resource Management

To manage service resources, the NetEye team has developed several scripts that are provided with every NetEye installation. These scripts are wrappers around the PCS and DRBD APIs, and their use is showcased in section Adding a Service Resource to a Cluster. Commands useful for NetEye Cluster troubleshooting are introduced in section Cluster Management Commands.

Cluster Nodes Roles

In a NetEye cluster environment, some of the distributed NetEye services that run on multiple nodes can be configured to run only on specific nodes of the cluster. This is useful to balance the load across the cluster nodes and to assign specific services to specific nodes according to the customer's needs.

To assign a specific role to a node, or to modify the roles configuration of the cluster, edit the role assignment in the cluster configuration file /etc/neteye-cluster, adding or modifying the “roles” section of the desired node:

{
    "Hostname" : "my-neteye-cluster.example.com",
    "Nodes" : [
        {
            "addr" : "192.168.1.1",
            "hostname" : "my-neteye-01",
            "hostname_ext" : "my-neteye-01.example.com",
            "roles": [
               "mariadb"
            ],
            "id" : 1
        }
    ]
}

The roles that can be assigned to a node can be found in /usr/share/neteye/cluster/config_validators/roles.d/.
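
For example, you can list the available roles directly on a cluster node (the actual set of files depends on the NetEye modules installed):

cluster# ls /usr/share/neteye/cluster/config_validators/roles.d/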

After modifying the configuration file, it is necessary to sync the cluster configuration to all the nodes in the cluster. This can be done by executing the following command:

cluster# neteye config cluster sync

Finally, to apply the changes to the cluster services configuration, run the install procedure restricted to the service(s) you want to apply the changes to:

cluster# neteye install --restrict-to-services <service_name>

Where <service_name> is the name of a service, or a comma-separated list of services, usually corresponding to the roles assigned to the cluster nodes that have been modified.
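
For example, if you assigned the mariadb role as in the snippet above, a possible invocation would be the following (the service name is shown only as an illustration; the exact name depends on your installation):

cluster# neteye install --restrict-to-services mariadb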

Please refer to neteye install for more information about the neteye install command.

Cluster Services Configuration

When dealing with services in a NetEye cluster, it is possible to define and configure NetEye-specific parameters related to the architecture, such as the IP address of the service or the volume group to use. These parameters are configured in the file dedicated to the service, located in /etc/neteye-services.d/<module_name>. The folder contains a set of YAML files, each one named after the service it refers to.
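
As a purely illustrative sketch, such a YAML file might resemble the following; both the file name and the keys are assumptions made here for illustration, since the actual parameters depend on the specific service:

# Hypothetical example: /etc/neteye-services.d/<module_name>/<service_name>.yaml
# Key names below are illustrative assumptions, not a reference.
ip_post: "48"           # last octet of the IP address assigned to the service
volume_group: "vg00"    # volume group hosting the service data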

Once a specific service has been configured, the changes must be applied to the cluster services configuration. This can be done by executing the install procedure restricted to the service you want to apply the changes to:

cluster# neteye install --restrict-to-services <service_name>

Where <service_name> is the name of the service you want to apply the changes to.

Please refer to neteye install for more information about the neteye install command.

Adding a Service Resource to a Cluster

Service resources can be added by copying an existing template, located under the /usr/share/neteye/cluster/templates/ directory, to a suitable location, modifying it, and finally passing it to a setup script.

For example, consider the Services-core-nats-server.conf.tpl template.

{
    "volume_group": "vg00",
    "ip_pre" : "192.168.1",
    "Services": [
        {
            "name": "nats-server",
            "ip_post": "48",
            "drbd_minor": 23,
            "drbd_port": 7810,
            "folder": "/neteye/shared/nats-server/",
            "collocation_resource": "cluster_ip",
            "size": "1024"
        }
    ]
}

Copy it, then edit it.

cluster# cd /usr/share/neteye/cluster/templates/
cluster# cp Services-core-nats-server.conf.tpl /tmp/Services-core-nats-server.conf
cluster# vi /tmp/Services-core-nats-server.conf

Hint

You can copy the edited file to any other location, to be used for reference or in case you need to change settings at any point in the future.

In the file, make sure to change the following values to match your infrastructure network:

  • ip_pre: the corporate network address (i.e., the first three octets of the IP address).

  • ip_post: the last octet of the IP address that will be assigned to the service.
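
For example, with ip_pre set to 192.168.1 and ip_post set to 48, as in the template above, the service will be reachable at the IP address 192.168.1.48.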

Once done, make sure that the JSON file you saved is syntactically valid, for example by using the jq utility:

cluster# jq . /tmp/Services-core-nats-server.conf

If the file is valid, its formatted content will be displayed; if there is a syntactic mistake, an explanatory message will provide a hint to fix the problem. Some possible messages are shown next.

parse error: Expected separator between values at line 7, column 21

parse error: Objects must consist of key:value pairs at line 12, column 10

Note

Even if multiple errors are present in the file, only one error message is shown at a time, so keep fixing and re-running jq until you see the whole content of the file instead of an error message: this proves the file contains valid JSON.

Finally, let the cluster pick up the new configuration.

cluster# cd /usr/share/neteye/scripts/cluster
cluster# ./cluster_service_setup.pl -c /tmp/Services-core-nats-server.conf

Cluster Management Commands

The most important commands used for checking the status of a (NetEye) Cluster and for troubleshooting problems are:

  • drbdmon, a small utility to monitor the DRBD devices and connections in real time

  • drbdadm, DRBD’s primary administration tool

  • pcs, used to manage a cluster and verify its resources, constraints, fencing devices, and much more

Hint

You can find more information about all their functionalities and sub-commands in their respective manual pages: drbdmon, drbdadm, and pcs.

In the remainder of this section, we show some typical uses of these commands, starting from the simplest one.

cluster# drbdmon

As its name implies, this command monitors what is happening in DRBD and shows in real time a lot of information about the DRBD status. Within the interface, any resource highlighted in red is in a degraded state and therefore requires inspection and a fix. Press p to show only problematic resources.

The next command, drbdadm, is the Swiss army knife of DRBD and is used to carry out all configuration, tuning, and management of a DRBD infrastructure. Its most important option is -d (long option: --dry-run): the command behaves exactly as it would without the option, but makes no changes to the system. This option should always be used before making any change to the configuration, to check for possible problems and unexpected side effects.

The command itself has a lot of options and sub-commands, extensively described in the above-mentioned man page. Within a NetEye Cluster, the most used sub-command is perhaps

cluster# drbdadm --dry-run adjust all

This command checks the content of the configuration file and synchronises the configuration on all nodes. As given, the command only shows what would happen; remove the --dry-run option to actually apply the changes, as shown below.
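
Once the dry run reports no problems:

cluster# drbdadm adjust all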

The third command, pcs, is the main tool to manage the corosync/pacemaker stack of a cluster. Like drbdadm, it has a number of sub-commands and options.

cluster# pcs status

This command prints the current status of the NetEye Cluster, its nodes, and its resources, and allows you to check whether there are any ongoing issues.

In the output, right above the Full list of resources, all the nodes (if any) are listed along with their state, Online/Offline and Standby being the most common.

The presence of Offline nodes, that is, nodes disconnected from the cluster or even shut down, is usually a sign of an ongoing problem and requires a quick reaction. Indeed, the only legitimate situation in which a node can be Offline is after a planned reboot (e.g., for a kernel update or a hardware upgrade).

On the other hand, nodes should be in Standby state only during updates: if this is not the case, it is worth checking that node for problems.

If any resource in the list is marked as Stopped, some log entries for each stopped service appear below the list, right above the Daemon status section. While these logs should suffice to give a hint about why the resource is stopped, you can check the full status and log files using the commands systemctl status <resource name> and journalctl -u <resource name>, for example as shown below.
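
For instance, for a stopped nats-server resource (the resource name is used here only as an illustration):

cluster# systemctl status nats-server
cluster# journalctl -u nats-server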

Additional sub-commands of pcs are:

cluster# pcs property list

This command returns some information about the cluster; its output is similar to the following snippet:

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: NetEye
 dc-version: 1.1.23-1.el7_9.1-9acf116022
 have-watchdog: false
 last-lrm-refresh: 1648467995
 stonith-enabled: false
Node Attributes:
 neteye02.neteyelocal: standby=on

The important points here are:

  • stonith-enabled: false. This property should always be true: a value of false, like in the example, means that cluster fencing has been disabled. This should happen only during maintenance windows; otherwise an immediate inspection is required, because it may result in a split-brain situation. It is important to remark that fencing must always be configured on a cluster before starting any resource.

  • neteye02.neteyelocal: standby=on. The node is in Standby status, meaning it cannot host any running services or resources, but will still vote in the quorum (see the example below for returning it to normal operation).
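
If a node is still in Standby after maintenance has finished, it can be brought back with pcs; the node name below is taken from the example output above, and on older pcs releases the equivalent sub-command is pcs cluster unstandby:

cluster# pcs node unstandby neteye02.neteyelocal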

See also

Fencing is described in great detail in NetEye’s blog post Configuring Fencing on Dell Servers.

cluster# pcs constraint

Returns a list of all active constraints on the cluster.

cluster# pcs resource show [cluster_ip]

This command shows all the configured resources; if the optional parameter cluster_ip is added, it shows only the Cluster IP resource.

See also

For more information, troubleshooting options, and debugging commands, you can refer to RedHat’s Reference Documentation for Pacemaker and high availability, in particular Chapter 3 (The pcs CLI), Section 9.7 (Displaying fencing devices), and Section 10.3 (Displaying Configured Resources).