User Guide Functional Overview Requirements Architecture System Installation NetEye Additional Components Installation Setup The neteye Command Director NetEye Self Monitoring Tornado Business Service Monitoring IT Operation Analytics - Telemetry Geo Maps NagVis Audit Log Shutdown Manager Reporting ntopng Visual Monitoring with Alyvix Elastic Stack IT Operations (Command Orchestrator) Asset Management Service Level Management Cyber Threat Intelligence - SATAYO NetEye Update & Upgrade How To NetEye Extension Packs Troubleshooting Security Policy Glossary
module icon Elastic Stack
Overview Architecture Authorization Kibana Elasticsearch Cluster Elasticsearch Configuration Replicas on a Single Node Elasticsearch Performance tuning Overview Enabling El Proxy Sending custom logs to El Proxy Configuration files Commands Elasticsearch Templates and Retentions El Proxy DLQ Blockchain Verification Handling Blockchain Corruptions El Proxy Metrics El Proxy Security El Proxy REST Endpoints Agents Logstash Elastic APM Elastic RUM Elastic XDR Log Manager - Deprecated
ntopng Visual Monitoring with Alyvix Elastic Stack IT Operations (Command Orchestrator) Asset Management Service Level Management Cyber Threat Intelligence - SATAYO Introduction to NetEye Monitoring Business Service Monitoring IT Operation Analytics Visualization Network Visibility Log Management & Security Orchestrated Datacenter Shutdown Application Performance Monitoring User Experience Service Management Service Level Management & Reporting Requirements for a Node Cluster Requirements and Best Practices NetEye Satellite Requirements TCP and UDP Ports Requirements Additional Software Installation Introduction Single Node Cluster NetEye Master Master-Satellite Architecture Underlying Operating System Acquiring NetEye ISO Image Installing ISO Image Single Nodes and Satellites Cluster Nodes Configuration of Tenants Satellite Nodes Only Nodes behind a Proxy Additional NetEye Components Single Node Cluster Node Satellites Nodes only Verify if a module is running correctly Accessing the New Module Cluster Satellite Security Identity and Access Management External Identity Providers Configure federated LDAP/AD Emergency Reset of Keycloak Configuration Advanced Configuration Roles Single Page Application in NetEye Module Permissions and Single Sign On Within NetEye Importing User Federation Groups inside another Group Importing OIDC IdP Groups inside another Group Resources Tuning Advanced Topics Basic Concepts & Usage Advanced Topics Monitoring Environment Templates Monitored Objects Import Monitored Objects Data Fields Deployment Icinga 2 Agents Configuration Baskets Dashboard Monitoring Status VMD Permissions Notifications Jobs API Configuring Icinga Monitoring Retention Policy NetEye Self Monitoring Concepts Collecting Events Add a Filter Node WHERE Conditions Iterating over Event fields Retrieving Payload of an Event Extract Variables Create a Rule Tornado Actions Test your Configuration Export and Import Configuration Example Under the hood Development Retry Strategy Configuration Thread Pool Configuration API Reference Configure a new Business Process Create your first Business Process Node Importing Processes Operators The ITOA Module Configuring User Permissions Telegraf Metrics in NetEye Telegraf Configuration Telegraf on Monitored Hosts Visualizing Dashboards Customizing Performance Graph The NetEye Geo Map Visualizer Map Viewer Configuring Geo Maps NagVis Audit Log Overview Shutdown Manager user Shutdown Manager GUI Shutdown Commands Advanced Topics Overview User Role Management Cube Use Cases ntopng and NetEye Integration Permissions Retention Advanced Topics Overview User Roles Nodes Test Cases Dashboard Use Cases Overview Architecture Authorization Kibana Elasticsearch Cluster Elasticsearch Configuration Replicas on a Single Node Elasticsearch Performance tuning Overview Enabling El Proxy Sending custom logs to El Proxy Configuration files Commands Elasticsearch Templates and Retentions El Proxy DLQ Blockchain Verification Handling Blockchain Corruptions El Proxy Metrics El Proxy Security El Proxy REST Endpoints Agents Logstash Elastic APM Elastic RUM Elastic XDR Log Manager - Deprecated Overview Authorization in the Command Orchestrator Module Configuring CLI Commands Executing Commands Overview Permissions Installation Single Tenancy Multitenancy Communication through a Satellite Asset collection methods Display asset information in monitoring host page Overview Customers Availability Event Adjustment Outages Resource Advanced Topics Introduction Getting Started SATAYO Items Settings Managed Service Mitre Attack Coverage Changelog Before you start Update Procedure Single Node Upgrade from 4.43 to 4.44 Cluster Upgrade from 4.43 to 4.44 Satellite Upgrade from 4.43 to 4.44 DPO machine Upgrade from 4.43 to 4.44 Create a mirror of the RPM repository Sprint Releases Feature Troubleshooting Tornado Networking Service Management - Incident Response IT Operation Analytics - Telemetry Identity Provider (IdP) Configuration NetEye Cluster on Microsoft Azure Introduction to NEP Getting Started with NEPs Online Resources Obtaining NEP Insights Available Packages Advanced Topics Upgrade to NetEye 4.31 Setup Configure swappiness Restarting Stopped Services Enable stack traces in web UI How to access standard logs Director does not deploy when services assigned to a host have the same name How to enable/disable debug logging Activate Debug Logging for Tornado Modules/Services do not start Sync Rule fails when trying to recreate Icinga object How to disable InfluxDB query logging Managing an Elasticsearch Cluster with a Full Disk Some logs are not indexed in Elasticsearch Elasticsearch is not functioning properly Reporting: Error when opening a report Debugging Logstash file input filter Bugfix Policy Reporting Vulnerabilities Glossary

Elasticsearch Performance tuning

This guide summarises the relevant configuration optimisations that allows to optimize Elastic Stack and boost the performance to optimise the use on Netye 4. Applying these suggestions proves very useful and is suggested, especially on larger Elastic deployments.

Elasticsearch JVM

In Elasticsearch, the default options for the JVM are specified in the /neteye/local/elasticsearch/conf/jvm.options file. Please note how this file must not be modified, since it will be overwritten at each update.

If you would like to specify or override some options, a new .options file should be created in the /neteye/local/elasticsearch/conf/jvm.options.d/ folder, containing the desired options, one per line. Please note that the JVM processes the options files according to the lexicographic order.

For example, we can set the encoding used by Java for reading and saving files to UTF-8 by creating a /neteye/local/elasticsearch/conf/jvm.options.d/01_custom_jvm.options with the following content:

-Dfile.encoding=UTF-8

For more information about the available JVM options and their syntax, please refer to the official documentation.

Elasticsearch Database

Swapping

Swapping is very bad for performance, for node stability, and should be avoided at any costs, because it can cause garbage collections to last for minutes instead of milliseconds, it can cause nodes to respond slowly, or even to disconnect from the cluster. In a resilient distributed system, it proves more effective to let the operating system kill the node than allowing swapping.

Moreover, Elasticsearch performs poorly when the system is swapping the memory to disk. Therefore, it is vitally important to the health of your node that none of the JVM is ever swapped out to disk. The following steps allow to achieve this goal.

  1. Configure swappiness. Ensure that the sysctl value vm.swappiness is set to 1. This reduces the kernel’s tendency to swap and should not lead to swapping under normal circumstances, while still allowing the whole system to swap in emergency conditions. Execute the following commands on each Elastic Node and made changes persistent:

    sysctl vm.swappiness=1
    echo "vm.swappiness=1" > /etc/sysctl.d/zzz-swappiness.conf
    sysctl -p
    
  2. Memory locking. Another best practice on Elastic nodes is use mlockall option, to try to lock the process address space into RAM, preventing any Elasticsearch memory from being swapped out. Set the bootstrap.memory_lock setting to true, so Elasticsearch will lock the process address space into RAM, preventing any portion of memory used by Elasticsearch from being swapped out.

    1. Uncomment or add this line to the /neteye/local/elasticsearch/conf/elasticsearch.yml file:

      bootstrap.memory_lock: true
      
    2. Edit limit of system resources on Service section creating the new file /etc/systemd/system/elasticsearch.service.d/neteye-limits.conf with the following content:

      [Service]
      LimitMEMLOCK=infinity
      
    3. Restart resources:

      systemctl daemon-reload
      systemctl restart elasticsearch
      
    4. After starting Elasticsearch, you can see whether this setting was applied successfully by checking the value of mlockall in the output from this request:

      sh /usr/share/neteye/elasticsearch/scripts/es_curl.sh -XGET 'https://elasticsearch.neteyelocal:9200/_nodes?filter_path=**.mlockall&pretty'``
      

Increase file descriptor

Check if the amount of file descriptor suffices by using the command lsof -p <elastic-pid> | wc -l on each nodes. By default the setting on Neteye is 65,535.

To increase the default value this create a file in /etc/systemd/system/elasticsearch.service.d/neteye-open-file-limit.conf with content such as:

[Service]
LimitNOFILE=100000

For more information, see the official documentation

DNS cache settings

By default, Elasticsearch runs with a security manager in place, which implies that the JVM defaults to caching positive hostname resolutions indefinitely and defaults to caching negative hostname resolutions for ten seconds. Elasticsearch overrides this behavior with default values to cache positive lookups for 60 seconds, and to cache negative lookups for 10 seconds.

These values should be suitable for most environments, including environments where DNS resolutions vary with time. If not, you can edit the values es.networkaddress.cache.ttl and es.networkaddress.cache.negative.ttl in the JVM options drop-in folder /neteye/local/elasticsearch/conf/jvm.options.d/.

Prevent Data from growing

Data is growing very fast, consuming too much cpu, RAM and disk. If a system is over 50% of cpu or disk, an action should be taken, which is indicated by the Icinga checks that help to take the right decisions.

Limit data retention

Index Lifecycle Management (ILM) is designed to manage data retention by automating the lifecycle of indices in Elasticsearch. It allows you to define policies that automatically manage indices based on their age and performance needs, including actions like rollover, shrinking, force merging, and deletion. This helps optimize storage costs and enforce data retention policies.

Learn more about configuring a lifecycle policy with an appropriate retention in official documentation.

Save disk space

  1. You can use a Time series data stream (TSDS) to store metrics data more efficiently. Metrics data stored in a TSDS may use up to 70% less disk space than a regular data stream. The exact impact will vary per data set. Learn more about when to use a TSDS in the official documentation.

  2. Use logsdb index mode. Logsdb index mode significantly reduces storage needs by using slightly more CPU during ingest. After enabling logsdb index mode for your data sources, you may need to adjust cluster sizing in response to the new CPU and storage needs. logsdb mode is very easy to implement. To do that, just add the following parameter into the index template:

    {
     "index": {
       "mode": "logsdb"
    }
    

    To learn more about how logsdb index mode optimizes CPU and storage usage, check the blog on Elasticsearch newly specialized logsdb index mode.

CPU and Data Ingestion

It might happen that events are stored late or not stored at all.

Important

In the case when data ingestion via Elastic agents is performed by means of a network listener, and a user is facing performance problems, they should use TCP in order to be able to see existing performance problems, because UDP is throwing away packets by design (in this case).

To provide a better latency or better throughput, you should set predefined values in the output definition of the agent policy. If setting predefined values as shown above is not enough, try more advanced settings.

In order to receive fast enough, change advanced yaml configuration in a new output definition to define batch size, more workers, etc. Besides settings for ssl.certificate_authorities, apply the following settings on big machines:

bulk_max_size: 1600 worker: 8 queue.mem.events: 100000 queue.mem.flush.min_events: 5000 queue.mem.flush.timeout: 10 compression_level: 1 idle_connection_timeout: 15

Make sure you find the best values matching your particular needs.

Note

The setting “Performance tuning” just above the advanced yaml configuration within the output definition will be set to custom, if you add such values into the advanced yaml configuration.

Memory / RAM settings

Keep Satellite accessible

In some environments, Elastic Agent integrations can unexpectedly consume excessive memory due to different reasons. When this happens, the Linux Kernel may invoke the OoM (Out of Memory) killer of systemd, terminating the Elastic Agent service and usually, disrupting data ingestion.

To avoid the Agent being terminated when exceeding available memory, follow the tips described on our blog.

Choose your license

If you’re using Elastic under the Enterprise licence, GB of RAM are the basis of licensing, and not the number of nodes. Hence, you should consider the type of license you’re using, and in case with Enterprise license make sure the memory is enough to be distributed properly between the nodes depending on your infrastructure.

Elasticsearch Index Recovery Settings

During the process of index recovery on a NetEye cluster, default settings of Elasticsearch may not appear optimal.

The default limit is set to the following values:

{
    "max_concurrent_operations": "1",
    "max_bytes_per_sec": "40mb"
}

which means that the reallocation of indices may appear to be slow when node leaves the Elasticsearch cluster. Default settings may under-utilize the internal cluster network bandwidth, so it is recommended to update the settings dynamically using the cluster update settings API call:

/usr/share/neteye/elasticsearch/scripts/es_curl.sh -XPUT -H 'content-type: application/json' https://elasticsearch.neteyelocal:9200/_cluster/settings -d '{
  "persistent" : {
    "indices.recovery.max_bytes_per_sec" : "100mb",
    "indices.recovery.max_concurrent_operations": 2
  }
}'

This dynamic setting will apply the same limit on every node in the Cluster.

For a NetEye Elastic Stack module it is required to have a 10GB/s Private connection for the Cluster, with all the nodes having the same capabilities.

When updating the settings, the max_bytes_per_sec value should be set to max 50% of the private network bandwidth if Operative nodes are also Elastic Data. In case Operative Nodes are not Data Nodes (i.e. all the data are stored on Elastic Data-only Nodes) the value can go up to 95% of the bandwidth.

The max_concurrent_operations value is recommended to be set to 2, and can be increased (e.g. 4) for larger clusters.

You can find more details in the official documentation.