User Guide

Under the hood

Matcher Engine Implementation Details

The Matcher crate contains the core logic of the Tornado Engine. It is in charge of:

  • Receiving events from the Collectors

  • Processing incoming events and detecting which Filter and Rule they satisfy

  • Triggering the expected actions

Due to its strategic position, its performance is of utmost importance for global throughput.

The code’s internal structure is kept simple on purpose, and the final objective is reached by splitting the global process into a set of modular, isolated and well-tested blocks of logic. Each “block” communicates with the others through a well-defined API, which at the same time hides its internal implementation.

The benefit of this modularization is twofold: first, it minimizes the risk that local changes will have a global impact; second, it separates functional from technical complexity, so that increasing functional complexity does not result in increasing code complexity. As a consequence, the maintenance and evolution costs of the code base are expected to remain linear in the short, mid and long term.

At a very high level, when the Matcher initializes, it follows these steps (a simplified code sketch of these phases is shown after the list):

  • Configuration (see the code in the “config” module): The configuration phase loads a set of files from the file system. Each file is a Filter or a Rule in JSON format. The outcome of this step is a processing tree composed of Filter and Rule configurations created from the JSON files.

  • Validation (see the code in the “validator” module): The Validator receives the processing tree configuration and verifies that all nodes respect a set of predefined constraints (e.g., the identifiers cannot contain dots). The output is either the same processing tree as the input, or else an error.

  • Match Preparation (see the code in the “matcher” module): The Matcher receives the processing tree configuration, and for each node:

    • if the node is a Filter:

      • Builds the Accessors for accessing the event properties using the AccessorBuilder (see the code in the “accessor” module).

      • Builds an Operator for evaluating whether an event matches the Filter itself (using the OperatorBuilder, code in the “operator” module).

    • if the node is a Rule:

      • Builds the Accessors for accessing the event properties using the AccessorBuilder (see the code in the “accessor” module).

      • Builds the Operator for evaluating whether an event matches the “WHERE” clause of the rule (using the OperatorBuilder, code in the “operator” module).

      • Builds the Extractors for generating the user-defined variables using the ExtractorBuilder (see the code in the “extractor” module).

    The output of the Match Preparation step is an instance of the Matcher that contains all the logic required to process an event against all the defined Rules. A Matcher is stateless and thread-safe, so a single instance can be used to serve the entire application load.

  • Listening: Listen for incoming events and then match them against the stored Filters and Rules.
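To make the sequence above concrete, here is a minimal, self-contained Rust sketch of these phases. All types and functions in it (ProcessingTree, load_config, validate, build_matcher, and so on) are illustrative stand-ins invented for this example; they do not reflect the actual API of the Matcher crate.

struct ProcessingTree;

struct Matcher;

struct Event {
    payload: String,
}

// 1. Configuration: read the Filter and Rule JSON files and build the processing tree.
fn load_config() -> ProcessingTree {
    ProcessingTree
}

// 2. Validation: enforce the predefined constraints (e.g., identifiers must not contain dots).
fn validate(tree: ProcessingTree) -> Result<ProcessingTree, String> {
    Ok(tree)
}

// 3. Match preparation: build the Accessors, Operators and Extractors for every node.
fn build_matcher(_tree: &ProcessingTree) -> Matcher {
    Matcher
}

impl Matcher {
    // 4. Listening: match an incoming event against the stored Filters and Rules
    //    and trigger the expected actions.
    fn process(&self, event: &Event) {
        println!("processing event: {}", event.payload);
    }
}

fn main() {
    let tree = validate(load_config()).expect("invalid processing tree");
    let matcher = build_matcher(&tree);
    // The Matcher is stateless and thread-safe: a single instance serves the whole load.
    matcher.process(&Event { payload: "example".to_string() });
}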

Tornado Monitoring and Statistics

Tornado Engine performance metrics are exposed via the Tornado APIs and periodically collected by a dedicated Telegraf instance, telegraf_tornado_monitoring.service. Metrics are stored in the master_tornado_monitoring database in InfluxDB.

Tornado Monitoring and Statistics gives insight into what data Tornado is processing and how. This information can be useful in several scenarios, including workload inspection, identification of bottlenecks, and issue debugging. A common use case is identifying performance-related issues: for example, a gap between the number of events received and the number of events processed by Tornado may indicate a performance problem, i.e., that Tornado does not have enough resources to handle the current workload.

Examples of collected metrics are:

  • events_processed_counter: total number of events processed by Tornado Engine

  • events_received_counter: total number of events received by Tornado Engine through all Collectors

  • actions_processed_counter: total number of actions executed by Tornado Engine

Metrics will be automatically deleted according to the selected retention policy.

The user can configure Tornado Monitoring and Statistics via the GUI under Configuration / Modules / tornado / Configuration. Two parameters are available:

  • Tornado Monitoring Retention Policy: defines the number of days for which metrics are retained in InfluxDB; it defaults to 7 days, after which the data will no longer be available.

  • Tornado Monitoring Polling Interval: sets how often the Collector queries the Tornado APIs to gather metrics and defaults to 5 seconds.

To apply the changes, either run neteye_secure_install (which applies both options) or execute /usr/share/neteye/tornado/scripts/apply_tornado_monitoring_retention_policy.sh or /usr/share/neteye/tornado/scripts/apply_tornado_monitoring_polling_interval.sh, depending on the parameter you changed.

Note

On a NetEye Cluster, execute the command on the node where icingaweb2 is active.

Tornado Engine (Executable)

This crate contains the Tornado Engine executable code, which is a configuration of the Engine based on actix and built as a portable executable.

Structure of Tornado Engine

This specific Tornado Engine executable is composed of the following components:

  • A JSON Collector

  • The Engine

  • The Archive Executor

  • The Elasticsearch Executor

  • The Foreach Executor

  • The Icinga 2 Executor

  • The Director Executor

  • The Monitoring Executor

  • The Logger Executor

  • The Script Executor

  • The Smart Monitoring Executor

Each component is wrapped in a dedicated actix actor.
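As a purely illustrative sketch of this wrapping, the example below exposes a hypothetical component as an actix actor that handles event messages. The names EventMessage and LoggerExecutorActor are invented for this example and do not appear in the Tornado code base.

use actix::prelude::*;

// Hypothetical message type carrying a serialized Tornado Event.
struct EventMessage(String);

impl Message for EventMessage {
    type Result = ();
}

// Hypothetical actor wrapping one of the components (here, a logger-like Executor).
struct LoggerExecutorActor;

impl Actor for LoggerExecutorActor {
    type Context = Context<Self>;
}

impl Handler<EventMessage> for LoggerExecutorActor {
    type Result = ();

    fn handle(&mut self, msg: EventMessage, _ctx: &mut Context<Self>) {
        // A real component would process the event here; this one only prints it.
        println!("received event: {}", msg.0);
    }
}

#[actix::main]
async fn main() {
    let addr = LoggerExecutorActor.start();
    // send() waits until the actor has handled the message.
    addr.send(EventMessage("{\"type\":\"example\"}".to_string()))
        .await
        .unwrap();
}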

This configuration is only one of many possible configurations. Each component has been developed as an independent library, allowing for greater flexibility in deciding whether and how to use it.

At the same time, there are no restrictions that force the components to be used within the same executable. While this is the simplest way to assemble them into a working product, the Collectors and Executors could reside in their own executables and communicate with the Tornado Engine via remote calls. This can be achieved either through a direct TCP or HTTP call, with an RPC technology (e.g., Protobuf, FlatBuffers, or Cap'n Proto), or with a message queue system (e.g., Nats.io or Kafka) in the middle, allowing deployment as a distributed system.

Tornado Interaction with Icinga 2

The interaction between Tornado and Icinga 2 is explained in detail in the sections Icinga 2 and Smart Monitoring Check Result. In particular, the Smart Monitoring Executor interacts with Icinga 2 to create objects and set their statuses. To ensure that the status of the Icinga 2 objects does not get lost, NetEye provides an automatism that stops the execution of Smart Monitoring Actions during any Icinga 2 restart or Icinga Director deployment.

The automatism keeps track of all Icinga 2 restarts (Icinga Director deployments are also considered Icinga 2 restarts) in the icinga2_restarts table of the director database. As soon as an Icinga 2 restart takes place, a new entry with PENDING status is added to that table and, at the same time, the Tornado Smart Monitoring Executor is deactivated via API.

The icinga-director.service unit monitors the Icinga 2 restarts that are in PENDING status and sets them to FINISHED as soon as it recognizes that Icinga 2 has completed the restart; the Tornado Smart Monitoring Executor is then reactivated. In case of Icinga 2 errors, see the troubleshooting page icinga2-not-starting.

Tornado Collectors

Below you will find a list of Tornado Collectors that are preconfigured on the NetEye side, and can be used out of the box.

JMESPath Syntax

JMESPath syntax is used inside the configuration structure (filter, action) for accessing the event and extracting particular properties for further processing.

It is used only internally by Tornado and should never be called directly.

Tornado Email Collector

The Email Collector is an executable binary, built on actix, that generates Tornado Events from MIME email inputs.

On startup, it creates a UDS socket where it listens for incoming email messages. Each email published on the socket is processed by the embedded Tornado email Collector to produce Tornado Events, which are then forwarded to the Tornado Engine's TCP address.

The UDS socket is created with the same user and group as the tornado_email_collector process, with permissions set to 770 (read, write and execute for both the user and the group).

Each client that needs to write an email message to the socket should close the connection as soon as it has finished writing. The Email Collector will not start processing an email until it receives an EOF signal, and only one email per connection is allowed.
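The snippet below is a minimal sketch of such a client in Rust: it writes one email, then shuts down the write side of the connection so that the Collector sees EOF. The socket path is a placeholder, not the path actually configured on a NetEye system.

use std::io::Write;
use std::os::unix::net::UnixStream;

fn main() -> std::io::Result<()> {
    // Placeholder path: use the socket path configured for the Email Collector.
    let mut stream = UnixStream::connect("/path/to/email_collector.sock")?;

    // Write exactly one MIME email per connection.
    stream.write_all(b"From: monitoring@example.com\r\nSubject: test\r\n\r\nbody\r\n")?;

    // Closing the connection (or shutting down the write side) signals EOF,
    // which is what triggers the Collector to start processing the email.
    stream.shutdown(std::net::Shutdown::Write)?;
    Ok(())
}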

Tornado Rsyslog Collector

The rsyslog Collector binary is an executable that generates Tornado Events from rsyslog inputs.

This Collector is meant to be integrated with rsyslog's own logging through the omprog module. Consequently, it is never started manually, but is instead started and managed directly by rsyslog itself.

Here is an example rsyslog configuration template that pipes logs to the rsyslog Collector, started with its logger set to report only messages with severity warning or higher (the parameters are explained in the configuration file tornado_rsyslog_collector/conf/rsyslog_collector.toml):

module(load="omprog")

action(type="omprog"
       binary="/usr/lib64/tornado/bin/tornado_rsyslog_collector --logger=warn")

An example of a fully instantiated startup setup is:

module(load="omprog")

action(type="omprog"
       binary="/usr/lib64/tornado/bin/tornado_rsyslog_collector --config-dir=/tornado-rsyslog-collector/config --tornado-event-socket-ip=192.168.123.12 --tornado-event-socket-port=4747")

Note that all parameters for the binary option must be on the same line. You will need to place this configuration in a file in your rsyslog directory, for instance:

/etc/rsyslog.d/tornado.conf

In this example the Collector will:

  • Read the configuration from the /tornado-rsyslog-collector/config directory

  • Write outgoing Events to the TCP socket at 192.168.123.12:4747

The Tornado Engine must be running in parallel with the Collector before any events can be processed; it can be started, for example, with:

/opt/tornado/bin/tornado --tornado-event-socket-ip=tornado_server_ip

Under this configuration, rsyslog is in charge of starting the Collector when needed and piping the incoming logs to it. As the last stage, the Tornado Events generated by the Collector are forwarded to the Tornado Engine’s TCP socket.

This integration strategy is the best option for supporting high performance given massive amounts of log data.

Because the Collector expects the input to be in JSON format, rsyslog should be pre-configured to properly pipe its inputs in this form.

Tornado Nats JSON Collector

The Nats JSON Collector is a standalone Collector that listens for JSON messages on Nats topics, generates Tornado Events, and sends them to the Tornado Engine.

The Nats JSON Collector executable is built on actix.

On startup, it connects to a set of topics on a Nats server. The messages received are then processed by the embedded JMESPath Collector, which uses them to produce Tornado Events. In the final step, the Events are forwarded to the Tornado Engine through the configured connection type.
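Conceptually, the transformation wraps the JSON received from Nats into the type/created_ms/payload layout used by Tornado Events (shown later in this guide for the snmptrapd Collector). The sketch below illustrates the idea with serde_json; the function name, event type and field values are invented for the example and it does not reproduce the actual Collector code.

use serde_json::{json, Value};
use std::time::{SystemTime, UNIX_EPOCH};

// Illustrative only: wraps a JSON payload received from Nats into an
// Event-shaped JSON value with the type/created_ms/payload layout.
fn to_event(event_type: &str, nats_payload: &[u8]) -> serde_json::Result<Value> {
    let payload: Value = serde_json::from_slice(nats_payload)?;
    let created_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_millis() as u64;
    Ok(json!({
        "type": event_type,
        "created_ms": created_ms,
        "payload": payload,
    }))
}

fn main() {
    let msg = br#"{"hostname":"myhost","temperature":42}"#;
    let event = to_event("nats_json", msg).expect("invalid JSON from Nats");
    println!("{}", event);
}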

To successfully configure the topics, you must provide two values:

  • nats_topics: A list of Nats topics to which the Collector will subscribe.

  • collector_config: (Optional) The transformation logic that converts a JSON object received from Nats into a Tornado Event. It consists of a JMESPath Collector configuration as described in its specific documentation.

SNMP Trap Daemon Collectors

The snmptrapd Collectors of this package are embedded Perl trap handlers for Net-SNMP's snmptrapd. When registered as a subroutine in the Net-SNMP snmptrapd process, they receive snmptrap-specific inputs, transform them into Tornado Events, and forward them to the Tornado Engine.

There are two Collector implementations: the first one sends Events directly to the Tornado TCP socket, while the second one forwards them to a NATS server.

The implementations rely on the Perl NetSNMP::TrapReceiver package. You can refer to its documentation for generic configuration examples and usage advice.

The snmptrapd Collectors receive snmptrapd messages, parse them, generate Tornado Events and, finally, send them to Tornado using their specific communication channel.

The received messages are kept in an in-memory, non-persistent buffer that makes the application resilient to crashes or temporary unavailability of the communication channel. When the connection to the channel is restored, all messages in the buffer are sent. When the buffer is full, the Collectors start discarding the oldest messages. The maximum buffer size is set to 10,000 messages.
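The buffering behaviour can be pictured with the simplified sketch below: a bounded, in-memory queue that drops the oldest entries when full and is drained once the channel is available again. It is an illustration of the concept only (written in Rust for consistency with the other examples), not the actual Collector code, which is implemented in Perl.

use std::collections::VecDeque;

// Simplified illustration of a bounded, in-memory, non-persistent buffer.
struct MessageBuffer {
    messages: VecDeque<String>,
    max_size: usize,
}

impl MessageBuffer {
    fn new(max_size: usize) -> Self {
        MessageBuffer { messages: VecDeque::new(), max_size }
    }

    // Adds a message; if the buffer is full, the oldest message is discarded.
    fn push(&mut self, msg: String) {
        if self.messages.len() >= self.max_size {
            self.messages.pop_front();
        }
        self.messages.push_back(msg);
    }

    // Drains all buffered messages once the communication channel is available again.
    fn flush<F: FnMut(String)>(&mut self, mut send: F) {
        while let Some(msg) = self.messages.pop_front() {
            send(msg);
        }
    }
}

fn main() {
    // The real Collectors keep up to 10,000 messages; 3 keeps the demo short.
    let mut buffer = MessageBuffer::new(3);
    for i in 0..5 {
        buffer.push(format!("trap {}", i));
    }
    // Only the 3 most recent messages remain: trap 2, trap 3, trap 4.
    buffer.flush(|msg| println!("sending {}", msg));
}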

Consider a snmptrapd message that contains the following information:

PDU INFO:
  version                        1
  errorstatus                    0
  community                      public
  receivedfrom                   UDP: [127.0.1.1]:41543->[127.0.2.2]:162
  transactionid                  1
  errorindex                     0
  messageid                      0
  requestid                      414568963
  notificationtype               TRAP
VARBINDS:
  iso.3.6.1.2.1.1.3.0            type=67 value=Timeticks: (1166403) 3:14:24.03
  iso.3.6.1.6.3.1.1.4.1.0        type=6  value=OID: iso.3.6.1.4.1.8072.2.3.0.1
  iso.3.6.1.4.1.8072.2.3.2.1     type=2  value=INTEGER: 123456

The Collector will produce this Tornado Event:

{
   "type":"snmptrapd",
   "created_ms":"1553765890000",
   "payload":{
      "protocol":"UDP",
      "src_ip":"127.0.1.1",
      "src_port":"41543",
      "dest_ip":"127.0.2.2",
      "PDUInfo":{
         "version":"1",
         "errorstatus":"0",
         "community":"public",
         "receivedfrom":"UDP: [127.0.1.1]:41543->[127.0.2.2]:162",
         "transactionid":"1",
         "errorindex":"0",
         "messageid":"0",
         "requestid":"414568963",
         "notificationtype":"TRAP"
      },
      "oids":{
         "iso.3.6.1.2.1.1.3.0":"67",
         "iso.3.6.1.6.3.1.1.4.1.0":"6",
         "iso.3.6.1.4.1.8072.2.3.2.1":"2"
      }
   }
}

The structure of the generated Event is not configurable.