Cluster

The clustering service of NetEye 4 is based on the Red Hat 8 High Availability Clustering technologies, including Corosync, Pacemaker, and DRBD, which are used to set up an HA cluster composed of a combination of operative nodes, Elastic-only nodes, and Voting-only nodes. A NetEye cluster is a failover cluster at the service level: it provides redundancy to avoid downtime or service disruption whenever one node in the cluster goes offline. In such a case, services are moved to another node if necessary.

Reasons for a node to be offline include, but are not limited to:

  • A networking issue (failure of a network interface or of the connectivity) which prevents a node from communicating with the other nodes

  • A hardware or software issue which freezes or blocks a node

  • A problem with the synchronisation of the data

All cluster services run on a dedicated network called the Corporate Network. Every cluster node therefore has two IP addresses: a public one, through which the running services (including e.g., SSH) are accessible, and a private one, used by Corosync, Pacemaker, DRBD, and Elastic-only nodes.
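
As a quick sanity check, you can verify which address Corosync is actually using for cluster communication; on a correctly configured node this is the private Corporate Network IP. A minimal sketch, using the standard Corosync tooling shipped with the HA stack:

# Show the local node ID, the ring address used by Corosync and the ring status
corosync-cfgtool -s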

Cluster resources are typically quartets consisting of a floating IP in the Corporate Network, a DRBD device, a filesystem, and a (systemd) service. Fig. 2 shows the general case for High Availability, where cluster services are distributed across nodes, while other services (e.g., Icinga 2, Elasticsearch) handle their own clustering requirements. The remainder of this section details the architecture and implementation of a NetEye cluster.

Fig. 2 The NetEye cluster architecture.
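
As an illustration of such a resource quartet, the sketch below shows roughly how it could be modeled with pcs. All names, the IP address, the DRBD minor and the mount point are hypothetical; on a real NetEye cluster these resources are created by the NetEye cluster setup procedures, not by hand.

# Hypothetical resource quartet: floating IP, DRBD device, filesystem, systemd service
pcs resource create example_ip ocf:heartbeat:IPaddr2 ip=192.168.1.50 cidr_netmask=24
pcs resource create example_drbd ocf:linbit:drbd drbd_resource=example promotable
pcs resource create example_fs ocf:heartbeat:Filesystem device=/dev/drbd10 directory=/neteye/shared/example fstype=xfs
pcs resource create example_service systemd:example
# Group IP, filesystem and service so they always run on the same node
# (ordering/colocation constraints with the promoted DRBD device are omitted here)
pcs resource group add example_group example_ip example_fs example_service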

If you have not yet installed clustering services, please turn to the Cluster Installation page for setup instructions.

Types of Nodes

Within a NetEye cluster, different types of nodes can be set up. We distinguish between Operative and Single Purpose nodes, the latter being either Elastic-only or Voting-only nodes. They are described below.

Operative node

An operative node runs any of the services offered by NetEye, e.g., Tornado, Icinga 2, slmd, and so on. Operative nodes can be seen as single nodes connected by the clustering technologies mentioned above.

Elastic-only node

Elastic-only nodes host only the database component of the Elastic Stack (Elasticsearch), while Filebeat, Kibana, and the other Elastic Stack components remain clustered resources and run on operative nodes. Elastic-only nodes are used either for data storage or to add resources and processing capacity for Elasticsearch data to the cluster. In the latter case, typical use cases are:

  • Process log data in some way, for example with Machine Learning tools

  • Implement a hot-warm-cold architecture

  • Increase data retention, redundancy, or storage to archive old data

Note

An operative node may also run services of the Elastic Stack, including its DB component. In other words, it is not necessary to have a dedicated node for Elastic services.

Voting-only node

Nodes of this type are silent nodes: they do not run any service and therefore require limited computational resources compared to the other nodes. They are needed only in the case of a node failure, to establish the quorum and avoid cluster disruption.

See also

Voting-only nodes and their use are described in detail in a NetEye blog post: https://www.neteye-blog.com/2020/03/neteye-voting-only-node/

The NetEye Active Node

During update and upgrade operations, it is mandatory that one of the operative nodes is always active. The nodes of a cluster are listed in the /etc/neteye-cluster file, for example:

{
  "Hostname" : "my-neteye-cluster.example.com",
  "Nodes" : [
     {
        "addr" : "192.168.1.1",
        "hostname" : "my-neteye-01",
        "hostname_ext" : "my-neteye-01.example.com",
        "id" : 1
     },
     {
        "addr" : "192.168.1.2",
        "hostname" : "my-neteye-02",
        "hostname_ext" : "my-neteye-02.example.com",
        "id" : 2
     },
     {
        "addr" : "192.168.1.3",
        "hostname" : "my-neteye-03",
        "hostname_ext" : "my-neteye-03.example.com",
        "id" : 3
     },
     {
        "addr" : "192.168.1.4",
        "hostname" : "my-neteye-04",
        "hostname_ext" : "my-neteye-04.example.com",
        "id" : 4
     }
  ],
  "ElasticOnlyNodes": [
     {
        "addr" : "192.168.1.5",
        "hostname" : "my-neteye-05",
        "hostname_ext" : "my-neteye-05.example.com",
        "id" : 5
     }
  ],
  "VotingOnlyNode" : {
       "addr" : "192.168.1.6",
       "hostname" : "my-neteye-06",
       "hostname_ext" : "my-neteye-06.example.com",
       "id" : 6
  },
  "InfluxDBOnlyNodes": [
      {
         "addr" : "192.168.1.7",
         "hostname" : "my-neteye-07",
         "hostname_ext" : "my-neteye-07.example.com"
      }
   ]
}

The NetEye Active Node is always the first node appearing in the list of Nodes; in this case it is the node with FQDN my-neteye-01.example.com, and it is the one that must always be active during the update/upgrade procedure.
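
A quick way to read the Active Node from this file is to extract the first entry of the Nodes array, for example with jq (a sketch, assuming jq is installed on the node):

# Print the FQDN of the NetEye Active Node (the first entry of "Nodes")
jq -r '.Nodes[0].hostname_ext' /etc/neteye-cluster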

Therefore, before running neteye update and neteye upgrade, log in to my-neteye-01.example.com and make sure that it is not in standby mode. To do so, first check the status of the cluster:

cluster# pcs status

Then, if my-neteye-01.example.com is in standby, make it active with the command

cluster# pcs node unstandby my-neteye-01.example.com

See also

How nodes are managed by the NetEye update/upgrade commands is described in detail in a NetEye blog post: https://www.neteye-blog.com/2021/10/hosts-and-neteye-upgrade/

Clustering and Single Purpose Nodes

The following services use their own native clustering capabilities rather than Red Hat HA Clustering. NetEye also takes advantage of their built-in load balancing capabilities.

Icinga 2 Cluster

An Icinga 2 cluster is composed of one master instance, which holds the configuration files, and a variable number of satellites and agents.

See also

Icinga 2 clusters are described in great detail in the official Icinga documentation

Elasticsearch

Each cluster node runs a local master-eligible Elasticsearch service, connected to all other nodes. Elasticsearch itself chooses which nodes can form a quorum (note that all NetEye cluster nodes are master eligible by default), and so manual quorum setup is no longer required.
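
To see which Elasticsearch nodes currently belong to the cluster, their roles, and which one was elected master, you can query the _cat/nodes API. The sketch below reuses the admin certificate paths described in Certificates Storage; the service hostname is an assumption and may differ in your environment:

# List Elasticsearch nodes with their roles; the elected master is marked with "*"
curl -s --cert /neteye/local/elasticsearch/conf/certs/admin.crt.pem \
     --key /neteye/local/elasticsearch/conf/certs/private/admin.key.pem \
     "https://elasticsearch.neteyelocal:9200/_cat/nodes?v&h=name,node.role,master"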

See also

Elastic clusters and Elastic-only nodes are described in more detail in the General Elasticsearch Cluster Information section.

Galera

The Galera cluster is a synchronous multi-master cluster for MariaDB. It is used to provide high availability and redundancy for the MariaDB database service. Each node in the Galera cluster can accept read and write requests, and changes made on one node are automatically replicated to all other nodes in the cluster.

See also

Galera clusters are described in detail in the official Galera documentation.

Warning

When dealing with a Galera cluster, it is important to be aware of the following:

  • When restarting or starting a Galera node, the systemctl command will wait for the node to fully synchronize with the cluster before completing. This ensures that each node is properly aligned with the current cluster state and has consistent data before becoming operational. This synchronization process may take varying amounts of time depending on how much data needs to be transferred to bring the node up to date with the rest of the cluster.

  • The Galera cluster uses a quorum-based approach to ensure data consistency and availability. This means that a Galera cluster will continue operating as long as more than half of the nodes (N/2 + 1) are up and synchronized. If the quorum is lost, the Galera cluster will block all operations to prevent data inconsistency across the cluster; a quick way to check the quorum state is sketched below.
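
To check the current cluster size and the quorum state of the local node, you can query the wsrep status variables (a sketch, assuming the MariaDB root credentials are available to the client, e.g. via /root/.my.cnf):

# Show cluster size, quorum state (Primary/non-Primary) and local sync state
mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_cluster_size','wsrep_cluster_status','wsrep_local_state_comment');"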

Node Roles

Among the different types of nodes in a cluster, it is possible to assign specific roles to specific NetEye nodes, depending on the customer's needs and the node capabilities.

For more information about the node roles, please refer to the Cluster Nodes Roles section.

Clustering Services

The combination of the following software is at the core of NetEye's clustering functionality:

  • Corosync: Provides group communication between a set of nodes, application restart upon failure, and a quorum system.

  • Pacemaker: Provides cluster management, lock management, and fencing.

  • DRBD: Provides data redundancy by mirroring devices (hard drives, partitions, logical volumes, etc.) between hosts in real time.

“Local” NetEye services that run simultaneously on each NetEye node (i.e., not managed by Pacemaker and Corosync) are managed by a dedicated systemd target unit called neteye-cluster-local.target. This reduced set of local services is managed exactly like the Single Node neteye target:

# systemctl list-dependencies neteye-cluster-local.target
neteye-cluster-local.target
● └─drbd.service
● └─elasticsearch.service
● └─icinga2.service
[...]

Cluster Management

There are several CLI commands to be used in the management and troubleshooting of clusters, most notably drbdmon, drbdadm, and pcs.

The first one, drbdmon, is used to monitor the status of DRBD, i.e., to verify whether the nodes of a cluster communicate flawlessly or whether there is some ongoing issue, such as a node or network failure, or a split brain.

The second command, drbdadm, is used to carry out administrative tasks on DRBD.

Finally, the pcs command is used to manage resources on a pcs cluster only; its main purpose is to move services between the cluster nodes when required.

In particular, pcs status retrieves the current status of the nodes and services, while pcs node standby and pcs node unstandby put a node offline and back online, respectively.
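
A typical troubleshooting session might therefore look like the following sketch (the node name is only an example):

# Overall cluster and resource status
pcs status

# Non-interactive DRBD status overview (drbdmon shows the same data interactively)
drbdadm status

# Move the resources away from a node, then bring the node back online
pcs node standby my-neteye-02.example.com
pcs node unstandby my-neteye-02.example.com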

More information and examples about these commands can be found in section Cluster Management Commands.

Self-signed root CA

The NetEye install process creates a self-signed root Certificate Authority in /root/security/. This CA is synchronized throughout the NetEye Cluster.

The common CA is trusted automatically during installation with neteye install, which leverages the update-ca-trust script to update the certificate authorities provided by the system. Once the CA is in place, each module on each cluster node can request its certificate, which is then signed by the common CA.

By default, the NetEye CA is stored in /root/security/ and its trust settings are placed in /usr/share/pki/ca-trust-source/. That directory contains CA certificates and trust settings in the PEM file format, which are interpreted with a default (lower) priority; this allows the administrator to override the CA certificate list. Of course, for correct trust behavior, the NetEye CA should not be overridden.
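
Once the NetEye CA has been added to the system trust store by update-ca-trust, any certificate signed by it can be checked against the system CA bundle with standard OpenSSL tooling, for example (using the Elasticsearch admin certificate from the next section as an illustration):

# Verify a component certificate against the system CA bundle maintained by update-ca-trust
openssl verify -CAfile /etc/pki/tls/certs/ca-bundle.crt \
    /neteye/local/elasticsearch/conf/certs/admin.crt.pem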

Certificates Storage

Each component that uses certificates stores them in its conf folder, under the certs directory.

  • For example, Elasticsearch stores the certificates in the path /neteye/local/elasticsearch/conf/certs/

The certs folder contains the public certificates, while the private keys are stored inside certs/private/.

  • For example, the public certificate of the Elasticsearch admin is stored in /neteye/local/elasticsearch/conf/certs/admin.crt.pem, while its private key is stored in /neteye/local/elasticsearch/conf/certs/private/admin.key.pem
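
Standard OpenSSL commands can be used to inspect any of these certificates, for example to check the subject, the issuer and the expiry date:

# Show subject, issuer and validity period of the Elasticsearch admin certificate
openssl x509 -in /neteye/local/elasticsearch/conf/certs/admin.crt.pem -noout -subject -issuer -dates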

Some components export their certificates in PKCS 12 bundles (.pfx files) inside the folder certs/private/. These bundles contain the private key together with its corresponding certificate. If not otherwise specified, the password to decrypt the .pfx files is blank (i.e., an empty password).

  • For example, Tornado exports its certificate and private key to PKCS 12 format inside the file /neteye/shared/tornado/conf/certs/private/tornado.pfx. This can be decrypted by using an empty password:

    openssl pkcs12 -in /neteye/shared/tornado/conf/certs/private/tornado.pfx -nodes -password pass:
    

Secure Intracluster Communication

Security between the nodes in a cluster is just as important as front-facing security. Because nodes in a cluster must trust each other completely to provide failover services and be efficient, the lack of an intracluster security mechanism means one compromised cluster node can read and modify data throughout the cluster.

NetEye uses certificates signed by a Certificate Authority to ensure that only trusted nodes can join the cluster and to encrypt the data passing between nodes so that third parties cannot tamper with it; it also allows the certificates of each component in each module to be revoked.

Two examples of cluster-based modules are:

  • DRBD, which replicates block devices over the network

  • The Elastic Stack, on which the NetEye 4 Log Management is based.