
Cluster Nodes

NetEye 4’s clustering service is based on the Red Hat Enterprise Linux 8 High Availability clustering technologies:

  • Corosync: Provides group communication between a set of nodes, application restart upon failure, and a quorum system.

  • Pacemaker: Provides cluster management, lock management, and fencing.

  • DRBD: Provides data redundancy by mirroring devices (hard drives, partitions, logical volumes, etc.) between hosts in real time.

Cluster resources are typically quartets consisting of an internal floating IP, a DRBD device, a filesystem, and a (systemd) service.

Once you have installed clustering services according to the information on this page, please turn to the Cluster Architecture page for more information on configuration and how to update.

See also

For more information about Red Hat Cluster, check Red Hat’s official documentation on High Availability Clusters.

Prerequisites

A NetEye 4 cluster must consist of between 2 and 16 identical servers (“Nodes”) running RHEL 8; each node must satisfy the following requirements:

  • Networking:

    • Bonding across NICs must be configured

    • A dedicated cluster network interface, named exactly the same on each node

    • One external static IP address which will serve as the external Cluster IP

    • One IP Address for each cluster node (i.e., N addresses)

    • One virtual (internal) subnet for internal floating service IPs (this subnet MUST NOT be reachable from any machine except cluster nodes, as it poses a security risk otherwise)

    • All nodes must know the internal IPs (Virtual IPs) of all other nodes; these must be stored in the file /etc/hosts (see the sketch after this list)

    • All nodes must be reachable over the internal network

    • The Corporate Network’s NIC must be in firewall zone public, while the Heartbeat Network’s NIC must be in firewall zone trusted

  • Storage:

    • At least one volume group with enough free storage to host all service DRBD devices defined in Services.conf

  • In general, each node in a NetEye Cluster…

    • must have SSH keys generated for the root user

    • must store the SSH keys of all nodes in file /root/.ssh/authorized_keys

    • needs Internet connectivity, including the ability to reach repositories of Würth Phoenix and Red Hat

    • must have the dnf group neteye installed

    • must have its tags set with the command neteye node tags set; to learn more about this command, please refer to neteye node tags set

    • must be subscribed with a valid Red Hat Enterprise Linux license; this can be done with the command neteye node register. To learn more about this command, please refer to neteye node register

    • must have the latest operating system and NetEye 4 updates installed

    • if it is a virtual Cluster Node, its RAM must be completely reserved

    • the requirements for characters that can be used in hostnames are the same as for Single and Satellite Nodes and can be checked in the installation procedure
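For orientation, the /etc/hosts entries for the internal (Heartbeat) network of a two-node cluster might look like the following sketch; all names and addresses are illustrative:

192.168.47.1   my-neteye-01.neteyelocal
192.168.47.2   my-neteye-02.neteyelocal

Likewise, the firewall zone assignment mentioned in the list above can be applied with firewalld, assuming eth0 faces the Corporate Network and eth1 is the dedicated cluster (Heartbeat) interface:

firewall-cmd --permanent --zone=public --change-interface=eth0
firewall-cmd --permanent --zone=trusted --change-interface=eth1
firewall-cmd --reload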

See also

Section Cluster Requirements and Best Practices contains more detailed requirements for NetEye cluster installation.

Installation Procedure

The first step of a NetEye Cluster installation is to install the NetEye ISO image; afterwards, for each Node, follow installation’s Part 1: Single Nodes and Satellite Nodes. Then, make sure to copy the SSH key of each node into the /root/.ssh/authorized_keys file of all the other Nodes. To accomplish this, run the following command on each node:

cluster# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.27.0.3

Repeat the command for each Node, replacing 172.27.0.3 with the IP address of each of the other Nodes.
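Equivalently, a small shell loop can distribute the key from one node to all the others in a single pass (the IP addresses below are illustrative):

cluster# for ip in 172.27.0.3 172.27.0.4; do ssh-copy-id -i /root/.ssh/id_rsa.pub root@$ip; done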

Once done, depending on the type of nodes you are installing in your cluster, select the appropriate one(s) among the following procedures: Cluster Services Configuration, NetEye Service Configuration, or Single Purpose Nodes.

Finally, if your NetEye Cluster setup includes Satellites, please make sure to carry out the steps in section Satellite Nodes Only after each Satellite Node’s installation.

Basic Cluster Installation

This task consists of the following steps:

  1. Copy the cluster configuration JSON template from /usr/share/neteye/cluster/templates/ClusterSetup.conf.tpl to /etc/neteye-cluster and edit it to match your intended setup. You will be required to fill in the following fields:

    Key               Type    Description
    ================  ======  ===================================================
    ClusterInterface  str     The name of the internal cluster network interface
    Hostname          str     Cluster’s FQDN that will resolve to ClusterIp
    ClusterIp         str     Floating IP address reserved for the cluster
    ClusterCIDR       int     Netmask in CIDR notation (8-32)
    Nodes             list    List of Operative nodes (must be at least 2)
    VotingOnlyNode    object  (Optional) Definition of the Voting-only node
    ElasticOnlyNodes  list    (Optional) List of Elastic-only nodes

    All the nodes specified in Nodes, VotingOnlyNode and ElasticOnlyNodes must have all of the following fields:

    Key           Type  Description
    ============  ====  ===================================================
    addr          str   The internal IP address of the node
    hostname      str   Internal FQDN of the node
    hostname_ext  str   External FQDN of the node
    roles         list  List of roles assigned to the node. The complete
                        list of roles assignable to a node can be found in
                        /usr/share/neteye/cluster/config_validators/roles.d/
    id            int   A unique, progressive number (Note: ElasticOnlyNodes
                        don’t require this field)

    Note

    Take into account that the first node defined in the Nodes array in the /etc/neteye-cluster file will act as the NetEye Active Node during the update and upgrade procedures.
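    For orientation, a minimal two-node configuration might look like the following sketch. All hostnames, addresses and the interface name are illustrative, the roles lists are left as placeholders (valid values can be found in /usr/share/neteye/cluster/config_validators/roles.d/), and the shipped template remains the authoritative reference for the exact schema:

    cluster# cp /usr/share/neteye/cluster/templates/ClusterSetup.conf.tpl /etc/neteye-cluster

    {
      "ClusterInterface": "eth1",
      "Hostname": "cluster.example.com",
      "ClusterIp": "192.168.47.5",
      "ClusterCIDR": 24,
      "Nodes": [
        {
          "addr": "192.168.47.1",
          "hostname": "my-neteye-01.neteyelocal",
          "hostname_ext": "my-neteye-01.example.com",
          "roles": ["..."],
          "id": 1
        },
        {
          "addr": "192.168.47.2",
          "hostname": "my-neteye-02.neteyelocal",
          "hostname_ext": "my-neteye-02.example.com",
          "roles": ["..."],
          "id": 2
        }
      ]
    }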

  2. After setting up the cluster configuration in /etc/neteye-cluster, run the command neteye config cluster check to verify the configuration. This command checks that the configuration defined in the /etc/neteye-cluster file is valid and that all the roles have a valid configuration in terms of node distribution.

    cluster# neteye config cluster check
    
  3. Run the cluster setup command neteye cluster install to install a basic Corosync/Pacemaker cluster with a floating Cluster IP. If any issue prevents the script from completing correctly, you can run the same command again with the option --force to override; be aware that this will destroy any existing cluster on the nodes.

    cluster# neteye cluster install
    

    Note

    Any unrecognised option given to the neteye cluster install command will be passed on to the internal Ansible installation command.

  4. At this point, all cluster nodes must be online; hence, as a last step, verify that the Cluster installation completed successfully by running the command:

    cluster# pcs status | grep -A4 "Node List"
    

    This command returns something like:

    Node List:
    * Online: [ my-neteye-01.example.com my-neteye-02.example.com ]
    

    If the installation includes also a Voting-only Node, check that it is online by running:

    cluster# pcs quorum status
    

    The bottom part of the output is similar to the following snippet:

    Membership information
    ----------------------
    Nodeid Votes Qdevice Name
    1   1  A,V,NMW my-neteye-01.example.com (local)
    2   1  A,V,NMW my-neteye-02.example.com
    0   1  Qdevice
    

    The last line shows that the Voting-only Node is correctly online.

Cluster Fencing Configuration

This section describes the procedures you can use to configure, test, and manage the fence devices in a cluster. Fencing is needed when a node becomes unresponsive but may still be accessing data: the only way to be certain that your data is safe is to fence the node using STONITH. STONITH is an acronym for “Shoot The Other Node In The Head” and it protects your data from being corrupted by rogue nodes or concurrent access. Using STONITH, you can be certain that a node is truly offline before allowing its data to be accessed from another node.

See also

For more complete general information on fencing and its importance in a Red Hat High Availability cluster, see Fencing in a Red Hat High Availability Cluster.

  1. Initial Setup

    • Fencing is enabled by setting a cluster property. However, it is recommended to keep fencing disabled until it is configured properly:

      pcs property set stonith-enabled=false
      pcs stonith cleanup
      
    • Install the ipmilan fence agent on each node

      yum install fence-agents-ipmilan
      
    • Test that the iDRAC interface is reachable on UDP port 623 from each node

      nmap -sU -p623 10.255.6.106
      

    Note

    Fencing on VMware Cluster: if you are installing a virtual cluster, please keep in mind that the fence device must be different from IPMI. To install a fence device on a VMware Cluster, run the following command:

    dnf install fence-agents-vmware-rest fence-agents-vmware-soap
    
  2. IDRAC Configuration

    • Enable IPMI access to IDRAC: IDRAC Settings > Connectivity > Network > IPMI Settings

      • Enable IPMI Over LAN: Enable

      • Channel Privilege Level Limit: Administrator

      • Encryption Key: <mandatory random string; 00000000 is also acceptable>

    • Create a new user with a username and password of your choice, with read-only privileges on the console but administrative privileges on IPMI (IDRAC Settings > Users > Local Users > Add)

      • User Role: Read Only

      • Login to IDRAC: enable

    • Advanced Settings

      • LAN Privilege Level: Administrator

    To test that the settings were properly applied to the new user, you can check the chassis status from the NetEye machine:

    ipmitool -I lanplus -H <IDRAC IP> -U <your_IPMI_username> -P <your_IPMI_password> -y <your_encryption_key> -v chassis status
    
  3. PCS Configuration

    To obtain information about your fence device run:

    pcs stonith list
    pcs stonith describe fence_idrac
    

    Create a fence device

    The following command creates a fence device:

    pcs stonith create <fence_device_name> fence_idrac ipaddr="<ip or fqdn>" pcmk_delay_base="5" lanplus="1" login="IPMI_username" passwd="IPMI_password" method="onoff" pcmk_host_list="<host_to_be_fenced>"
    

    Where:

    • fence_device_name: device name of your choice (e.g. idrac_node1)

    • fencing_agent: in this case fence_idrac; the list of available agents can be obtained with pcs stonith list

    • ipaddr: IDRAC IP or FQDN

    • pcmk_delay_base: defaults to 0; the values on different nodes must differ by 5 seconds or more, based on how fast iDRAC can initiate a shutdown

    • lanplus: always set to 1, otherwise the connection will fail

    • login: the IPMI username (created before)

    • passwd: the IPMI password (created before)

    • passwd_script: an alternative to passwd; if available, you should use this instead of a plain-text password

    • method: use ‘onoff’ if available; otherwise a clean restart (power off/power on) is not guaranteed

    • pcmk_host_list: the list of hosts controlled by this fence device

    Warning

    In a 2-node cluster it may happen that both nodes are unable to communicate and each tries to fence the other, causing a reboot of both nodes. To avoid this, set a different pcmk_delay_base parameter for each fence device; this way one of the nodes acquires priority over the other (see the sketch below).

    It is strongly suggested to set this parameter for EVERY cluster, regardless of the number of its nodes.
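    As a sketch, on a two-node cluster the delays could be staggered like this (device names, iDRAC addresses and credentials are illustrative):

    pcs stonith create idrac_node1 fence_idrac ipaddr="<idrac1 ip or fqdn>" pcmk_delay_base="5" lanplus="1" login="IPMI_username" passwd="IPMI_password" method="onoff" pcmk_host_list="node1.neteyelocal"
    pcs stonith create idrac_node2 fence_idrac ipaddr="<idrac2 ip or fqdn>" pcmk_delay_base="10" lanplus="1" login="IPMI_username" passwd="IPMI_password" method="onoff" pcmk_host_list="node2.neteyelocal"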

    Note

    If possible, use a passwd_script instead of passwd, as anybody with access to PCS can see the IPMI password. A password script is a simple bash script that echoes the password; it also helps to avoid escaping problems, e.g.:

    #!/bin/bash
    echo "my_secret_psw"

    Make sure that only the root user has read privileges on it (e.g., chmod 500).

    You must put this script on all nodes, e.g. in /usr/local/bin.
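    A minimal sketch for creating and deploying such a script (path, file name and target node are illustrative):

    # create the script and restrict its permissions
    cat > /usr/local/bin/fencing_passwd.sh <<'EOF'
    #!/bin/bash
    echo "my_secret_psw"
    EOF
    chmod 500 /usr/local/bin/fencing_passwd.sh
    # copy it to every other node
    scp -p /usr/local/bin/fencing_passwd.sh root@<other_node>:/usr/local/bin/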

    Example:

    pcs stonith create idrac_node1 fence_idrac ipaddr="idrac-neteye06.intra.tndigit.it" lanplus="1" login="neteye_fencing" passwd_script="/usr/local/bin/fencing_passwd.sh" method="onoff" pcmk_host_list="node1.neteyelocal"
    

    If your fence device has been properly configured, running pcs status should show the fence device in status Stopped; otherwise, check /var/log/messages.

    The command pcs stonith show <fence device> lets you view the current setup of a device.

    Now you have to create a fence device for each node of your cluster (remember to increase the delay for each additional device).

    Note

    If you need to update a fence device’s properties, use the update command, e.g.:

    pcs stonith update <fence device> property="value"
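
    For example, to raise the static fencing delay on the device created in the earlier example (device name taken from there):

    pcs stonith update idrac_node1 pcmk_delay_base="10"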
    
  4. Only for the ‘onoff’ method

    Edit the power key setting in /etc/systemd/logind.conf:

    HandlePowerKey=ignore
    

    To do it programmatically:

    sed -i 's/#HandlePowerKey=poweroff/HandlePowerKey=ignore/g' /etc/systemd/logind.conf
    
  5. Increase totem token timeout

    Increasing the totem token timeout to at least 5 seconds will avoid unwanted fencing (the default is 1 s); on clusters with virtual nodes it should be set to 10 seconds. It is not recommended to set the timeout to more than 10 seconds.

    pcs cluster config update totem token=10000
    

    To check if the value has been updated:

    corosync-cmapctl | grep totem.token
    

    Warning

    STONITH acts after the totem token expires; therefore it may take 30-40 seconds to fence a node.

  6. Testing

    To fence a node you can use the following command:

    pcs stonith fence node1.neteyelocal
    

    Warning

    The host will now be shut down. Fencing should only be tested on a node in standby.

  7. Enable fencing

    To enable fencing, set the property to true:

    pcs property set stonith-enabled=true
    pcs stonith cleanup
    

    Warning

    If fencing fails, the cluster freezes and resources will not be relocated to a different node. Always disable fencing during updates/upgrades, and disable fencing on virtual machines before shutting them down: a fence device may otherwise restart a VM that has been shut down. A restart of a physical node may require several minutes, so please be patient.

Cluster Services Configuration

PCS-managed Services

When installing a feature module, it is necessary to run all the related services on a NetEye Cluster. To do that, adjust all necessary options, including IPs, ports, DRBD devices, and sizes, in the various *.conf.tpl files found in directory /usr/share/neteye/cluster/templates/.

In a typical configuration, e.g. the one below, you need to update only selected options:

{
   "volume_group": "vg00",
   "ip_pre": "192.168.1",
   "Services": [
       {
         "name": "my-service",
         "ip_post": "33",
         "drbd_minor": 12,
         "drbd_port": 7788,
         "folder": "/neteye/shared/my-service",
         "size": "1024",
         "service": "fancy-optional-name"
       }
   ]
}
  • ip_pre, the prefix of the IP network (e.g., 192.168.1 for 192.168.1.0/24), which is combined with the service’s ip_post to generate the virtual IP for the resource (192.168.1.33 in the example above)

  • cidr_netmask, the CIDR of the internal subnet used by IP resources (e.g., 24 for 192.168.1.0/24).

  • size, the amount of storage assigned to the service, in MB; if not specified, default values will be applied

Run the cluster_service_setup.pl script on each *.conf.tpl file starting from Services-core.conf.tpl:

# cd /usr/share/neteye/scripts/cluster
# ./cluster_service_setup.pl -c Services-core.conf.tpl

The cluster_service_setup.pl script is designed to report the last command executed in case of errors. If you manually fix an error, you will need to remove the already successfully configured resource templates from Services.conf and re-run that command; then re-execute the cluster_service_setup.pl script to finalize the configuration.

Non PCS-managed Services

For Cluster services that are not managed by PCS, the service must be configured in the cluster manually. Similarly to the PCS-managed services, you need to define all necessary options, including IPs, ports, volume groups, and sizes, in a .yaml file that is used by the service configuration during the installation.

Templates with the default options for all the available service configuration files can be found in directory /etc/neteye-services.d/<module_name>; copy a template within the same directory to compose the actual configuration file. The configuration file name must be the same as the template file name, without the .tpl extension.
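For example, assuming a hypothetical module my-module that ships a template named my-service.yaml.tpl, the actual configuration file would be created as follows:

cluster# cp /etc/neteye-services.d/my-module/my-service.yaml.tpl /etc/neteye-services.d/my-module/my-service.yaml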

After all the service configuration files are in place, you can proceed to the next step of the installation procedure.

NetEye Service Configuration

  • Run the neteye install script only once on any cluster node. This script is designed to handle the configuration of all nodes specified in the cluster configuration file found at /etc/neteye-cluster.

    cluster# neteye install
    
  • Set up the Director field API user on slave nodes (Director / Icinga Infrastructure / Endpoints)

Single Purpose Nodes

This section applies only if you are going to set up a Single Purpose Node, i.e., an Elastic-only or a NetEye Voting-only node.

Both Elastic-only and Voting-only nodes have the same prerequisites and follow the same installation procedure as a standard NetEye Cluster Node.

After installation, a Single Purpose Node must be configured as Elastic-only or Voting-only: please refer to Section General Elasticsearch Cluster Information for guidelines.