User Guide

Resources Tuning

This section will contain a collection of suggested settings for various services running on NetEye.

MariaDB

MariaDB is started with the default upstream settings. If the size of an installation requires it, the resource usage of MariaDB can be adjusted to meet higher performance requirements. The following settings can be added to the file /neteye/shared/mysql/conf/my.cnf.d/custom.conf:

[mysqld]
innodb_buffer_pool_size=16G
tmp_table_size = 512M
max_heap_table_size = 512M
innodb_sort_buffer_size=16000000
sort_buffer_size=32M
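
For the new values to take effect, restart MariaDB and verify the running configuration. A minimal sketch, assuming a single-node installation where the service unit is named mariadb (in a cluster environment, restart the corresponding cluster resource instead):

systemctl restart mariadb
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';"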

Icingaweb2 GUI

Performance of the Icingaweb2 Graphical User Interface can be significantly improved in high-load environments by adding indexes and updating the column definitions of the hostgroup and history related tables. To do this, execute the queries below manually:

ALTER TABLE icinga_hostgroups MODIFY hostgroup_object_id bigint(20) unsigned NOT NULL;
ALTER TABLE icinga_hostgroups ADD UNIQUE INDEX idx_hostgroups_hostgroup_object_id (hostgroup_object_id);
ALTER TABLE icinga_commenthistory ADD INDEX idx_icinga_commenthistory_entry_time (entry_time);
ALTER TABLE icinga_downtimehistory ADD INDEX idx_icinga_downtimehistory_entry_time (entry_time);
ALTER TABLE icinga_notifications ADD INDEX idx_icinga_notifications_start_time (start_time);
ALTER TABLE icinga_statehistory ADD INDEX idx_icinga_statehistory_state_time (state_time);
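
The statements can be executed with the mysql command line client. For example, assuming the Icinga IDO database is named icinga and the statements above have been saved to a file (the filename is illustrative):

mysql icinga < icinga_ido_indexes.sql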

InfluxDB

InfluxDB is a time series database designed to handle the high volumes of write and query load that occur in NetEye. If you want to learn more about InfluxDB, you can refer to the official InfluxDB documentation.

Migration of inmem (in-memory) indices to TSI (time-series)

From NetEye 4.14, InfluxDB will use the Time Series Index (TSI).

However, an existing setup will keep using the in-memory (inmem) index for writing and fetching data until you perform the migration procedure, which consists of the following steps.

  1. Build TSI by running the influx_inspect buildtsi command:

    In a cluster environment, the below command must be executed on the node on which the InfluxDB resource is running:

    sudo -u influxdb influx_inspect buildtsi -datadir /neteye/shared/influxdb/data/data -waldir /neteye/shared/influxdb/data/wal -v
    

    Upon execution, the above command will build the TSI index for all databases found in the specified data directory.

    Note

    If you want to build TSI only for a specific database, add the -database <database_name> parameter to the above command (see the example at the end of this section).

  2. Restart the influxdb service:

  • Single node:

    systemctl restart influxdb
    
  • Cluster environment:

    pcs resource restart influxdb
    

The official documentation of InfluxDB Upgrade contains more information about the inmem (in-memory) to TSI (time-series) migration process.
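
For example, to rebuild the index only for a single database (the database name icinga2 is purely illustrative), the command from step 1 becomes:

sudo -u influxdb influx_inspect buildtsi -database icinga2 -datadir /neteye/shared/influxdb/data/data -waldir /neteye/shared/influxdb/data/wal -v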

Log Analytics (Elastic Stack) Performance Tuning

This guide summarises the configuration optimisations that can be applied to the Elastic Stack to boost its performance on NetEye 4. Applying these suggestions is recommended, especially on larger Elastic deployments.

Elasticsearch JVM Optimization

In Elasticsearch, the default options for the JVM are specified in the /neteye/local/elasticsearch/conf/jvm.options file. Please note that this file must not be modified, since it will be overwritten at each update.

If you would like to specify or override some options, a new .options file should be created in the /neteye/local/elasticsearch/conf/jvm.options.d/ folder, containing the desired options, one per line. Please note that the JVM processes the options files according to the lexicographic order.

For example, we can set the encoding used by Java for reading and saving files to UTF-8 by creating a /neteye/local/elasticsearch/conf/jvm.options.d/01_custom_jvm.options with the following content:

-Dfile.encoding=UTF-8
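
As a further illustration of this mechanism, the JVM heap size can be pinned with an additional drop-in file, e.g. /neteye/local/elasticsearch/conf/jvm.options.d/02_heap.options (the filename and the 8 GB value are purely illustrative; as a rule of thumb, the heap should not exceed half of the available RAM):

-Xms8g
-Xmx8g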

For more information about the available JVM options and their syntax, please refer to the official documentation.

Elasticsearch Database Tuning

Swapping

Swapping is very bad for performance and for node stability, and should be avoided at all costs: it can cause garbage collections to last for minutes instead of milliseconds, and it can cause nodes to respond slowly or even to disconnect from the cluster. In a resilient distributed system, it proves more effective to let the operating system kill the node than to allow swapping.

Moreover, Elasticsearch performs poorly when the system is swapping memory to disk. Therefore, it is vitally important to the health of your node that no part of the JVM is ever swapped out to disk. The following steps allow you to achieve this goal.

  1. Configure swappiness. Ensure that the sysctl value vm.swappiness is set to 1. This reduces the kernel’s tendency to swap and should not lead to swapping under normal circumstances, while still allowing the whole system to swap in emergency conditions. Execute the following commands on each Elastic node and make the changes persistent:

    sysctl vm.swappiness=1
    echo "vm.swappiness=1" > /etc/sysctl.d/zzz-swappiness.conf
    sysctl --system
    
  2. Memory locking. Another best practice on Elastic nodes is to use the mlockall option to lock the process address space into RAM. Set bootstrap.memory_lock to true, so that Elasticsearch locks its process address space into RAM, preventing any memory used by Elasticsearch from being swapped out.

    1. Uncomment or add this line to the /neteye/local/elasticsearch/conf/elasticsearch.yml file:

      bootstrap.memory_lock: true
      
    2. Raise the system resource limit in the [Service] section by creating the new file /etc/systemd/system/elasticsearch.service.d/neteye-limits.conf with the following content:

      [Service]
      LimitMEMLOCK=infinity
      
    3. Reload systemd and restart the Elasticsearch service:

      systemctl daemon-reload
      systemctl restart elasticsearch
      
    4. After starting Elasticsearch, you can see whether this setting was applied successfully by checking the value of mlockall in the output from this request:

      sh /usr/share/neteye/elasticsearch/scripts/es_curl.sh -XGET 'https://elasticsearch.neteyelocal:9200/_nodes?filter_path=**.mlockall&pretty'
      

Increase file descriptors

Check whether the number of available file descriptors suffices by running lsof -p <elastic-pid> | wc -l on each node. By default, the limit on NetEye is 65,535.

To increase the default value, create the file /etc/systemd/system/elasticsearch.service.d/neteye-open-file-limit.conf with content such as:

[Service]
LimitNOFILE=100000
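
After creating the drop-in, reload systemd and restart Elasticsearch so that the new limit is applied, following the same pattern used for the memory lock above:

systemctl daemon-reload
systemctl restart elasticsearch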

For more information, see the official documentation.

DNS cache settings

By default, Elasticsearch runs with a security manager in place, which implies that the JVM defaults to caching positive hostname resolutions indefinitely and defaults to caching negative hostname resolutions for ten seconds. Elasticsearch overrides this behavior with default values to cache positive lookups for 60 seconds, and to cache negative lookups for 10 seconds.

These values should be suitable for most environments, including environments where DNS resolutions vary with time. If they are not, you can set the values of es.networkaddress.cache.ttl and es.networkaddress.cache.negative.ttl via the JVM options drop-in folder /neteye/local/elasticsearch/conf/jvm.options.d/.
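
For example, a drop-in file such as /neteye/local/elasticsearch/conf/jvm.options.d/02_dns_cache.options (the filename is illustrative) could set both TTLs, in seconds, explicitly:

-Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10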

SIEM Additional Tuning (X-Pack)

Encrypt sensitive data check

If you use Watcher and have chosen to encrypt sensitive data (by setting xpack.watcher.encrypt_sensitive_data to true), you must also place a key in the secure settings store.

To pass this bootstrap check, you must set the xpack.watcher.encryption_key on each node in the cluster. For more information, see the official documentation.
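
A possible sequence, sketched here under the assumption that the standard Elasticsearch CLI tools are installed under /usr/share/elasticsearch/bin, is to generate a system key and then load it into the secure settings store on each node:

/usr/share/elasticsearch/bin/elasticsearch-syskeygen
/usr/share/elasticsearch/bin/elasticsearch-keystore add-file xpack.watcher.encryption_key <path to the generated system_key file>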

Kibana Configuration tuning

There is some interesting tuning that can also be done on the Kibana settings to improve performance in production.

For more information, see the official documentation.

Require Content Security Policy (CSP)

Kibana uses a Content Security Policy to help prevent the browser from allowing unsafe scripting, but older browsers will silently ignore this policy. If your organization does not need to support Internet Explorer 11 or much older versions of our other supported browsers, we recommend that you enable Kibana’s strict mode for content security policy, which will block access to Kibana for any browser that does not enforce even a rudimentary set of CSP protections.

To do this, set csp.strict to true in file /neteye/shared/kibana/conf/kibana.yml.
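
For example, the following line in /neteye/shared/kibana/conf/kibana.yml enables strict CSP mode (the Kibana service must be restarted afterwards for the change to take effect):

csp.strict: true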

Memory

Kibana has a default maximum memory limit of 1.4 GB, and in most cases we recommend leaving this setting at its default value. However, in some scenarios, such as large reporting jobs, it may make sense to tweak this limit to meet more specific requirements.

You can modify this limit by setting --max-old-space-size in the NODE_OPTIONS environment variable. In NetEye this can be configured by creating a file /etc/systemd/system/kibana-logmanager.service.d/memory.conf containing a limit in MB, such as:

[Service]
Environment="NODE_OPTIONS=--max-old-space-size=2048"
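
As with the other systemd drop-ins described above, reload systemd and restart the Kibana service so the new limit is applied (in a cluster environment, restart the corresponding cluster resource instead):

systemctl daemon-reload
systemctl restart kibana-logmanager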

For more information, see the official documentation.

User Customization

The Kibana environment file /neteye/shared/kibana/conf/sysconfig/kibana contains some options used by the Kibana service. Please note that this file must not be modified, since it will be overwritten at each update.

The dedicated file /neteye/shared/kibana/conf/sysconfig/kibana-user-customization can be used to specify or override one or more Kibana environment variables.
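
For example, a single line in shell sysconfig format can override an environment variable for the Kibana service; the variable and value below are purely illustrative:

NODE_OPTIONS="--max-old-space-size=4096"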

How to Enable Load Balancing For Logstash

Warning

This functionality is in beta stage and may be subject to changes. Beta features may break during minor upgrades and their quality is not ensured by regression testing.

The load balancing feature for logstash exploits NGINX's ability to act as a reverse proxy and distribute incoming (logstash) connections among all nodes in the cluster. In this way, logstash is no longer a cluster resource, but a standalone service running on each node of the cluster.

Note, however, that if you enable this feature, you will lose the ability to sign the log files. This happens because, with this setup, logmanager has access only to the log files that are present on file systems mounted on the node where it is running.

Indeed, rsyslog cannot take advantage of the load balancing feature, therefore only the logs on the node on which logmanager is running will be signed.

In the case of Beats, log files will be sent through the load balancer and therefore will not be signed.

This how-to will guide you in setting up load balancing for logstash. In a nutshell, you first need to disable the logstash cluster resource, then modify or add the logstash and NGINX configurations, and finally keep the logstash configuration in sync on all nodes.

In more detail, these are the steps:

  1. Permanently disable the cluster resource for logstash: run pcs resource disable logstash

  2. Create a local service of logstash on each node in the cluster, by following these steps:

    1. The configuration files will be stored in /neteye/local/logstash/conf, so copy them over from /neteye/shared/logstash/conf

      1. Fix all the paths in the conf files:

        find /neteye/local/logstash/conf -type f -exec sed -i 's/shared/local/g' "{}" \;
        
    2. Edit both the /neteye/shared/logstash/conf/sysconfig/logstash and /neteye/local/logstash/conf/sysconfig/logstash files and add to them the following lines:

      LS_SETTINGS_DIR="/neteye/local/logstash/conf/"
      OPTIONS="--config.reload.automatic"
      
    3. Add the host directive in /neteye/local/logstash/conf/conf.d/0_i03_agent_beats.input (use the cluster internal network IP): host => "192.168.xxx.xxx"

    4. Create a new logstash service (call it e.g., logstash-local.service) with the following content:

      [Unit]
      Description=logstash local
      
      [Service]
      Type=simple
      User=logstash
      Group=logstash
      EnvironmentFile=-/etc/default/logstash
      EnvironmentFile=-/neteye/local/logstash/conf/sysconfig/logstash
      ExecStartPre=/usr/share/logstash/bin/generate-config.sh
      ExecStart=/usr/share/logstash/bin/logstash "--path.settings" "/neteye/local/logstash/conf/" $OPTIONS
      Restart=always
      WorkingDirectory=/
      Nice=19
      LimitNOFILE=16384
      
      [Install]
      WantedBy=multi-user.target
      
    5. Add the service into the neteye cluster local systemd targets. You can refer to the Cluster Technology and Architecture chapter of the user guide for more information.

    6. Edit file /etc/hosts to point the host logstash.neteyelocal to the cluster IP.

  3. Add the NGINX load-balancing configuration in a file called logstash-loadbalanced.j2. The name is very important because it will be used by neteye install to set up the correct mapping between the logstash service and NGINX. The file needs to have the following content. Please pay special attention to copying the whole snippet AS-IS, especially the three lines of the for cycle, because it is essential for configuring NGINX on all the cluster nodes:

    upstream logstash_ingest {
       {% for node in nodes %}
         server {{ hostvars[node].internal_node_addr }}:5044;
       {% endfor %}
    }
    
     server {
       listen logstash.neteyelocal:5044;
       proxy_pass logstash_ingest;
     }
    
  4. Remember that the logstash standalone configuration must be kept in sync on all nodes; therefore, the /neteye/local/logstash/conf/ directory must have the same content on all nodes. To achieve this goal you can, for example, set up a cron job that uses rsync to maintain the synchronisation (see the sketch after these steps).

  5. Run neteye install only once on any cluster node.

  6. Start the local logstash service on every node: systemctl start logstash-local
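
As a minimal sketch of the synchronisation mentioned in step 4, a cron entry placed in /etc/cron.d/ on the node that holds the reference copy of the configuration could push it to the other cluster nodes via rsync over SSH (the hostname, the schedule and passwordless SSH between the nodes are all assumptions of this example):

*/5 * * * * root rsync -a --delete /neteye/local/logstash/conf/ neteye02.neteyelocal:/neteye/local/logstash/conf/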