User Guide

Under the hood

Processing Tree Configuration

The location of configuration files in the file system is pre-configured in NetEye. NetEye automatically starts Tornado as follows:

  • Reads the configuration from the /neteye/shared/tornado/conf directory

  • Starts the Tornado Engine

  • Searches for Filter and Rule definitions in /neteye/shared/tornado/conf/rules.d

The structure of the rules.d directory reflects the structure of the Processing Tree. Each subdirectory can contain either:

  • A Filter and a set of subdirectories corresponding to the Filter’s children

  • A Ruleset

Each individual Rule or Filter to be included in the Processing Tree must be saved in its own JSON-formatted file.

Hint

Tornado will ignore all other file types.

For instance, consider this directory structure:

/tornado/config/rules
                 |- node_0
                 |    |- 0001_rule_one.json
                 |    \- 0010_rule_two.json
                 |- node_1
                 |    |- inner_node
                 |    |    \- 0001_rule_one.json
                 |    \- filter_two.json
                 \- filter_one.json

In this example, the Processing Tree is organized as follows:

  • The root node is a filter named filter_one.

  • The filter filter_one has two child nodes: node_0 and node_1:

    • node_0 is a Ruleset that contains two Rules named rule_one and rule_two, with an implicit filter that forwards all incoming Events to both of them.

    • node_1 is a Filter node with a single child named inner_node. Its Filter, filter_two, determines which incoming Events are passed on to inner_node.

  • inner_node is a Ruleset with a single rule called rule_one.

Within a Ruleset, the lexicographic order of the file names determines the execution order. The rule filename is composed of two parts separated by the first _ (underscore) symbol. The first part determines the rule execution order, and the second is the rule name. For example:

  • 0001_rule_one.json -> 0001 determines the execution order, “rule_one” is the rule name

  • 0010_rule_two.json -> 0010 determines the execution order, “rule_two” is the rule name

Note

A Rule name must be unique within its own Ruleset, but two rules with the same name may exist in different Rulesets.
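
To make this concrete, here is a minimal, purely illustrative sketch of what a Rule file such as 0001_rule_one.json could contain. The condition and the action are arbitrary examples (the action id logger is an assumption used only for illustration); refer to the Rule documentation for the exact schema:

{
  "description": "An example rule matching events of type trap",
  "continue": true,
  "active": true,
  "constraint": {
    "WHERE": {
      "type": "equals",
      "first": "${event.type}",
      "second": "trap"
    },
    "WITH": {}
  },
  "actions": [
    {
      "id": "logger",
      "payload": {
        "message": "Received a ${event.type} with protocol ${event.payload.protocol}"
      }
    }
  ]
}

Note that the Rule name does not appear in this sketch: as described above, it is derived from the filename.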

Similar to what happens for Rules, Filter names are also derived from the filenames. In this case, however, the entire filename (without the .json extension) corresponds to the Filter name.

In the example above, the “filter_one” node is the entry point of the Processing Tree. When an Event arrives, the Matcher evaluates whether it satisfies the filter condition: if it does, the Event is passed on to the filter’s child nodes; otherwise the Event is ignored.

A node’s children are processed independently: node_0 and node_1 are processed in isolation, and each of them is unaware of the existence and outcome of the other. This processing logic is applied recursively to every node.
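
As an illustration, the entry point filter_one.json from the example above could look roughly like the following. The condition shown is an arbitrary example; refer to the Filter documentation for the exact schema:

{
  "description": "An example filter that forwards only events of type trap",
  "active": true,
  "filter": {
    "type": "equals",
    "first": "${event.type}",
    "second": "trap"
  }
}

As with Rules, the Filter name (here filter_one) is not repeated inside the file, since it is taken from the filename.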

String Interpolation

An action payload can contain text with placeholders that Tornado will replace at runtime. The values to be used for the substitution are extracted from the incoming Events following the conventions mentioned in the previous section; for example, using that Event definition, this string in the action payload:

Received a ${event.type} with protocol ${event.payload.protocol}

produces:

Received a trap with protocol UDP

Note

Only values of type String, Number, Boolean and null are valid. Consequently, the interpolation will fail, and the action will not be executed, if the value associated with the placeholder extracted from the Event is an Array, a Map, or undefined.
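
For instance, assuming the Event structure referenced above (with type, created_ms and payload fields), an incoming Event shaped like the following would produce exactly that output; the created_ms timestamp is an arbitrary illustrative value:

{
  "type": "trap",
  "created_ms": 1554130814854,
  "payload": {
    "protocol": "UDP"
  }
}

Here both ${event.type} and ${event.payload.protocol} resolve to String values, so the interpolation succeeds and the action can be executed.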

Matcher Engine Implementation Details

The Matcher crate contains the core logic of the Tornado Engine. It is in charge of:

  • Receiving events from the Collectors

  • Processing incoming events and detecting which Filters and Rules they satisfy

  • Triggering the expected actions

Due to its strategic position, its performance is of utmost importance for global throughput.

The code’s internal structure is kept simple on purpose, and the final objective is reached by splitting the global process into a set of modular, isolated and well-tested blocks of logic. Each “block” communicates with the others through a well-defined API, which at the same time hides its internal implementation.

The benefits of this modularization are twofold: first, it minimizes the risk that local changes will have a global impact; second, it separates functional from technical complexity, so that increasing functional complexity does not result in increasing code complexity. As a consequence, the maintenance and evolution costs of the code base are expected to stay linear in the short, mid and long term.

At a very high level, when the Matcher initializes, it follows these steps:

  • Configuration (see the code in the “config” module): The configuration phase loads a set of files from the file system. Each file is a Filter or a Rule in JSON format. The outcome of this step is a Processing Tree composed of Filter and Rule configurations created from the JSON files.

  • Validation (see the code in the “validator” module): The Validator receives the Processing Tree configuration and verifies that all nodes respect a set of predefined constraints (e.g., the identifiers cannot contain dots). The output is either the same Processing Tree as the input, or else an error.

  • Match Preparation (see the code in the “matcher” module): The Matcher receives the Processing Tree configuration, and for each node:

    • if the node is a Filter:

      • Builds the Accessors for accessing the event properties using the AccessorBuilder (see the code in the “accessor” module).

      • Builds an Operator for evaluating whether an event matches the Filter itself (using the OperatorBuilder, code in the “operator” module).

    • if the node is a rule:

      • Builds the Accessors for accessing the event properties using the AccessorBuilder (see the code in the “accessor” module).

      • Builds the Operator for evaluating whether an event matches the “WHERE” clause of the rule (using the OperatorBuilder, code in the “operator” module).

      • Builds the Extractors for generating the user-defined variables using the ExtractorBuilder (see the code in the “extractor” module).

    The output of the Match Preparation phase is an instance of the Matcher that contains all the logic required to process an event against all the defined Filters and Rules. A Matcher is stateless and thread-safe, thus a single instance can be used to serve the entire application load.

  • Listening: Listen for incoming events and then match them against the stored Filters and Rules.

Tornado Monitoring and Statistics

Tornado Engine performance metrics are exposed via the Tornado APIs and periodically collected by a dedicated Telegraf instance, telegraf_tornado_monitoring.service. The metrics are stored in the master_tornado_monitoring database in InfluxDB.

Tornado Monitoring and Statistics gives insight into what data Tornado is processing and how. This information can be useful in several scenarios, including workload inspection, identification of bottlenecks, and issue debugging. A common use case is identifying performance-related issues: for example, a gap between the number of events received and the number of events processed by Tornado may indicate that Tornado does not have enough resources to handle the current workload.

Examples of collected metrics are:

  • events_processed_counter: total number of events processed by the Tornado Engine

  • events_received_counter: total number of events received by the Tornado Engine through all Collectors

  • actions_processed_counter: total number of actions executed by the Tornado Engine

Metrics will be automatically deleted according to the selected retention policy.

The user can configure Tornado Monitoring and Statistics via the GUI under Configuration / Modules / Tornado / Configuration. Two parameters are available:

  • Tornado Monitoring Retention Policy: defines the number of days for which metrics are retained in InfluxDB, after which the data is no longer available; it defaults to 7 days.

  • Tornado Monitoring Polling Interval: sets how often the Collector queries the Tornado APIs to gather metrics; it defaults to 5 seconds.

To apply the changes, either run neteye install (valid for both options) or execute /usr/share/neteye/tornado/scripts/apply_tornado_monitoring_retention_policy.sh or /usr/share/neteye/tornado/scripts/apply_tornado_monitoring_polling_interval.sh, depending on which parameter was changed.

Note

On a NetEye Cluster, execute the command on the node where icingaweb2 is active.

Tornado Engine (Executable)

This crate contains the Tornado Engine executable code, which is a configuration of the Engine based on actix and built as a portable executable.

Structure of Tornado Engine

This specific Tornado Engine executable is composed of the following components:

  • A JSON Collector

  • The Engine

  • The Archive Executor

  • The Elasticsearch Executor

  • The Foreach Executor

  • The Icinga 2 Executor

  • The Director Executor

  • The Monitoring Executor

  • The Logger Executor

  • The Script Executor

  • The Smart Monitoring Executor

Each component is wrapped in a dedicated actix actor.

This configuration is only one of many possible configurations. Each component has been developed as an independent library, allowing for greater flexibility in deciding whether and how to use it.

At the same time, there are no restrictions that force the components to run in the same executable. While this is the simplest way to assemble them into a working product, the Collectors and Executors could reside in their own executables and communicate with the Tornado Engine via a remote call. This can be achieved either through a direct TCP or HTTP call, with an RPC technology (e.g., Protobuf, FlatBuffers, or Cap’n Proto), or with a message queue system (e.g., NATS or Kafka) in the middle, allowing Tornado to be deployed as a distributed system.

Tornado Interaction with Icinga 2

The interaction between Tornado and Icinga 2 is explained in detail in the sections Icinga 2 and Smart Monitoring Check Result. In particular, the Smart Monitoring Executor interacts with Icinga 2 to create objects and set their statuses. To ensure that the status of Icinga 2 objects does not get lost, NetEye provides an automatism that stops the execution of Smart Monitoring Actions during any Icinga 2 restart or Icinga Director deployment.

The automatism keeps track of all Icinga 2 restarts (Icinga Director deployments are also treated as Icinga 2 restarts) in the icinga2_restarts table of the director database. As soon as an Icinga 2 restart takes place, a new entry with PENDING status is added to that table and, at the same time, the Tornado Smart Monitoring Executor is deactivated via API.

The icinga-director.service unit monitors the Icinga 2 restarts that are in PENDING status and sets them to FINISHED as soon as it recognizes that Icinga 2 has completed the restart; the Tornado Smart Monitoring Executor is then reactivated. In case of Icinga 2 errors, see the troubleshooting page ::ref::icinga2-not-starting.