
Event pipeline


Event Processing Explained

  • Events are generated by collector daemons
    • Passive collection is performed by daemons like zentrap, zensyslog, etc.
    • Active collection is performed by daemons like zenpython, zencommand, etc.
  • Queued events are sent to ZenHub
    • If zenhub is not available to receive queued events, they are held internally
      • Collector daemons keep an internal event queue
      • The queue is not persistent; queued events are lost if the collector daemon restarts
      • Defaults to 5000 events
        • Rotating buffer; holds only the 5000 most recent events
      • Adjust via the maxqueuelen setting in the collector daemon's config
  • Zenhub acts as an aggregator for events from collectors
    • zenhub's parent thread receives events from collectors
      • Events are validated: each must contain at least a device (which doesn't have to be added to Resource Manager (RM) for monitoring), a severity, and a message; events missing any of these are dropped
    • Event processing tasks are queued internally in zenhub's worklist
    • zenhub workers consume tasks from the worklist
      • SendEvents tasks are worked by publishing events to the zenoss.queues.zep.rawevents queue in RabbitMQ
  • Zeneventd consumes events from zenoss.queues.zep.rawevents
    • Maps events from /Unknown to the appropriate event class based on event class mappings
    • Event enrichment
      • Provides 'device' and 'component' contexts in transforms
      • Populates device data in events
        • ProductionState
        • Groups/Systems/Locations
        • Device Priority
        • etc.
    • Applies transform code based on event class membership
    • Executes zeneventd post-plugins
    • Publishes contextualized events to zenoss.queues.zep.zenevents
  • Zeneventserver consumes events from zenoss.queues.zep.zenevents
    • Serves as the heart of the event processing engine and has multiple roles:
      • Stores events in the zenoss_zep database
        • MariaDB-Events container
      • Indexes stored events in Lucene indexes
        • $ZENHOME/var/zeneventserver
      • Handles event aging and archiving
      • Provides the back end for the event API
      • Serves the event console and archive UI elements
      • Serves the event 'rainbow' functionality in the UI
      • Maintains a list of ping-down devices, which zenhub uses when it builds configs
      • Handles heartbeat monitoring
      • Evaluates triggers and publishes to zenoss.queues.zep.signal when a match is found
  • Zenactiond consumes events from the zenoss.queues.zep.signal queue
    • Processes notifications, which can:
      • Email a user
      • Run a command to perform corrective action
      • Send a syslog message
      • Send an SNMP trap
      • Generate or update a support ticket
      • Integrate with other custom solutions
        • ServiceNow
        • RemedyITSM
        • etc.
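The collector's internal queue behaves like a bounded, in-memory rotating buffer. The Python sketch below models that behavior with collections.deque(maxlen=...). It only illustrates the "keep the most recent N events" semantics described above; it is not the collector daemons' actual code, and the class and method names are hypothetical.

```python
from collections import deque


class CollectorEventQueue:
    """Hypothetical model of a collector's non-persistent event queue.

    Holds at most `maxqueuelen` events in memory; when full, appending a new
    event discards the oldest one (a rotating buffer).
    """

    def __init__(self, maxqueuelen=5000):
        # A deque with maxlen drops items from the opposite end when full.
        self._events = deque(maxlen=maxqueuelen)
        self.discarded = 0

    def append(self, event):
        if len(self._events) == self._events.maxlen:
            self.discarded += 1  # the oldest event is about to be dropped
        self._events.append(event)

    def drain(self):
        """Return all queued events (oldest first) and empty the queue,
        e.g. once zenhub becomes reachable again."""
        events = list(self._events)
        self._events.clear()
        return events

    def __len__(self):
        return len(self._events)


# Simulate zenhub being unavailable while 5002 events arrive.
queue = CollectorEventQueue(maxqueuelen=5000)
for i in range(5002):
    queue.append({"device": "router1", "severity": 3, "message": f"event {i}"})

print(len(queue), queue.discarded)  # 5000 2
print(queue.drain()[0]["message"])  # event 2
```

Because the buffer lives only in memory, anything still queued is lost if the collector restarts, and anything rotated out during a long zenhub outage is gone for good; raising maxqueuelen trades memory for a longer outage window.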
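The validation rule zenhub applies to incoming events can be sketched as a simple filter. This is a hypothetical Python illustration of the rule stated above (an event needs a device, a severity, and a message); the field names and the choice to treat blank strings as missing are assumptions, not zenhub's actual implementation.

```python
# Hypothetical field names for the three required attributes.
REQUIRED_FIELDS = ("device", "severity", "message")


def is_valid_event(event):
    """Return True if the event carries a device, a severity and a message."""
    for field in REQUIRED_FIELDS:
        value = event.get(field)
        if value is None:
            return False
        # Blank strings count as missing; a numeric severity of 0 is still valid.
        if isinstance(value, str) and not value.strip():
            return False
    return True


def filter_events(events):
    """Split incoming events into (accepted, dropped) lists."""
    accepted, dropped = [], []
    for event in events:
        (accepted if is_valid_event(event) else dropped).append(event)
    return accepted, dropped


incoming = [
    {"device": "router1", "severity": 4, "message": "Interface down"},
    # The device does not need to be modeled for the event to be accepted.
    {"device": "not-modeled.example.com", "severity": 0, "message": "Cleared"},
    {"device": "router1", "message": "No severity"},
    {"severity": 5, "message": "No device"},
]
accepted, dropped = filter_events(incoming)
print(len(accepted), len(dropped))  # 2 2
```

Severity 0 (Clear) is a legitimate value, which is why the sketch checks for None instead of relying on truthiness.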
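To make the zeneventd stage more concrete, here is a toy Python model of class mapping, enrichment, and class-based transforms. Everything in it is hypothetical: the DEVICES table stands in for the device data zeneventd reads, MAPPINGS stands in for event class mappings, and the decorator-based registry is only an analogy for how transforms attach to event classes (in a real transform the event is exposed as `evt`, alongside the `device` and `component` contexts). The choice to run every ancestor class's transforms, root first, is a modeling decision for this sketch.

```python
# Hypothetical device records that the enrichment step copies into events.
DEVICES = {
    "router1": {"prodState": 1000, "DevicePriority": 3,
                "Systems": ["/Core"], "Location": "/DC1"},
}

# Hypothetical mapping table: eventClassKey -> event class.
MAPPINGS = {"linkDown": "/Net/Link"}

# Toy transform registry: event class path -> list of transform functions.
TRANSFORMS = {}


def transform(event_class):
    """Decorator that registers a function as a transform for an event class."""
    def register(func):
        TRANSFORMS.setdefault(event_class, []).append(func)
        return func
    return register


def ancestor_classes(event_class):
    """'/Net/Link' -> ['/', '/Net', '/Net/Link'] (root first)."""
    parts = [p for p in event_class.split("/") if p]
    return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def map_event_class(event):
    """Move events out of /Unknown when a mapping matches their eventClassKey."""
    if event.get("eventClass", "/Unknown") == "/Unknown":
        event["eventClass"] = MAPPINGS.get(event.get("eventClassKey"), "/Unknown")


def enrich(event):
    """Copy device data into the event; returns None for unknown devices."""
    device = DEVICES.get(event["device"])
    if device is not None:
        event.update(device)
    return device


def process(event):
    map_event_class(event)
    device = enrich(event)
    # Run the transforms of every ancestor class, root first.
    for cls in ancestor_classes(event["eventClass"]):
        for func in TRANSFORMS.get(cls, []):
            func(event, device)
    return event


@transform("/Net/Link")
def soften_nonproduction_links(evt, device):
    # Lower the severity of link events from devices not in Production (1000).
    if device is not None and device["prodState"] < 1000:
        evt["severity"] = 2


evt = process({"device": "router1", "severity": 4,
               "eventClassKey": "linkDown", "message": "Link down"})
print(evt["eventClass"], evt["DevicePriority"], evt["severity"])  # /Net/Link 3 4
```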

Troubleshooting event flow

Identifying Bottlenecks

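If you suspect a problem with the event pipeline, check RabbitMQ first. From zenhub onward, each stage passes events to the next through a RabbitMQ queue, so a queue whose message count keeps growing points to a consumer that is slow or stalled:

  • The RabbitMQ public endpoint
  • rabbitmqctl in the RabbitMQ container
    • rabbitmqctl list_queues -p /zenoss
    • rabbitmqctl list_queues -p /zenoss messages consumers name (shows each queue's message count, consumer count, and name)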
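When reading the list_queues output, the useful question is which pipeline queue is growing and which daemon consumes it. The Python sketch below parses the text printed by `rabbitmqctl list_queues -p /zenoss messages consumers name` and maps each backed-up pipeline queue to its consumer as described in the pipeline overview. The backlog threshold is an arbitrary assumption, the sample output is illustrative, and the parser skips any banner or header line that doesn't start with two integers, since that wording varies between RabbitMQ versions.

```python
import subprocess

# Consumers of each pipeline queue, per the event pipeline overview.
PIPELINE_CONSUMERS = {
    "zenoss.queues.zep.rawevents": "zeneventd",
    "zenoss.queues.zep.zenevents": "zeneventserver",
    "zenoss.queues.zep.signal": "zenactiond",
}


def parse_list_queues(output):
    """Parse 'messages consumers name' rows into {name: (messages, consumers)}."""
    queues = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # banner, header, or blank line
        queues[fields[2]] = (int(fields[0]), int(fields[1]))
    return queues


def find_backlogs(queues, threshold=1000):
    """Return (queue, messages, consumers, daemon) for suspicious pipeline queues.

    `threshold` is an arbitrary example value; a queue with zero consumers is
    also flagged because nothing is draining it.
    """
    findings = []
    for name, daemon in PIPELINE_CONSUMERS.items():
        if name not in queues:
            continue
        messages, consumers = queues[name]
        if messages >= threshold or consumers == 0:
            findings.append((name, messages, consumers, daemon))
    return findings


def run_list_queues():
    """Run rabbitmqctl where it is available, e.g. inside the RabbitMQ container."""
    return subprocess.run(
        ["rabbitmqctl", "list_queues", "-p", "/zenoss",
         "messages", "consumers", "name"],
        capture_output=True, text=True, check=True,
    ).stdout


sample = """Listing queues for vhost /zenoss ...
messages\tconsumers\tname
152340\t2\tzenoss.queues.zep.rawevents
0\t1\tzenoss.queues.zep.zenevents
0\t1\tzenoss.queues.zep.signal
"""
for name, messages, consumers, daemon in find_backlogs(parse_list_queues(sample)):
    print(f"{name}: {messages} messages, {consumers} consumers -> check {daemon}")
```

Taking a couple of readings a minute apart is more informative than a single one: a large but shrinking queue is recovering, while a steadily growing one needs attention.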

Rawevents Troubleshooting

Backups in the rawevents queue are typically caused by:

  • An event flood
    • Look at the collector performance graphs (the event queues graph) to identify the source of the flood
    • Look in the event console for new syslog or SNMP trap messages whose counts are rapidly incrementing
    • As a last resort, you can temporarily stop zensyslog/zentrap to halt a flood
  • Slowness in zeneventd
    • Look in the zeneventd logs for long-running transform messages
    • You may need to optimize your event transforms
    • You may need more zeneventd workers or instances
  • Throttling in RabbitMQ (in practice, only an issue in 4.x); RabbitMQ blocks publishers when it runs short of memory or disk space
    • Make sure the RabbitMQ container has at least 1GB of memory available
    • Make sure that /var in the RabbitMQ container has at least 1GB of free disk space
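Spotting a flood in the event console comes down to finding the events whose counts rise fastest between two looks. The sketch below assumes you have two snapshots of per-event counts taken a known interval apart; the snapshot format (a dict keyed by a hypothetical (device, summary) tuple) and the function name are illustrative assumptions.

```python
def rank_by_rate(before, after, interval_seconds, top=5):
    """Rank events by count increase per minute between two snapshots.

    `before` and `after` map an event key (here a (device, summary) tuple)
    to that event's count; events missing from `before` are treated as new.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = []
    for key, count in after.items():
        delta = count - before.get(key, 0)
        if delta > 0:
            rates.append((delta * 60 / interval_seconds, key))
    rates.sort(reverse=True)  # fastest-growing events first
    return rates[:top]


# Hypothetical snapshots taken two minutes apart.
before = {
    ("switch7", "LINK-3-UPDOWN"): 1200,
    ("web01", "High CPU"): 4,
}
after = {
    ("switch7", "LINK-3-UPDOWN"): 9600,
    ("web01", "High CPU"): 5,
    ("fw2", "Denied connection"): 300,
}
for per_minute, (device, summary) in rank_by_rate(before, after, interval_seconds=120):
    print(f"{per_minute:8.1f}/min  {device}  {summary}")
```

The top entries are the candidates to investigate at their source, or to drop with a transform, before resorting to stopping zensyslog or zentrap.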

Zenevents Troubleshooting

Backups in the zenevents queue are typically caused by:

  • An event flood or RabbitMQ throttling
    • Use the steps in the Rawevents Troubleshooting section above
  • A problem in zeneventserver
    • Check the zeneventserver log
      • Database errors normally show up there as JDBC exceptions
      • MariaDB/MySQL error codes are usually easy to diagnose by searching for the code online
    • If there's an error indexing events, it may be necessary to rebuild the zep Lucene indexes
      • Non-graceful shutdowns and other sources of data corruption can corrupt these indexes
  • Slowness in MariaDB-Events
    • Check database tuning
      • Make sure innodb_buffer_pool_size is adequate for the size of your zenoss_zep database
      • Make sure there's no CPU, memory, or disk I/O constraint
      • Use zencheckdbstats to identify tuning issues
  • Slowness in zeneventserver
    • Do you have more than 200 triggers enabled?
      • You may need to increase the trigger cache size in zeneventserver.conf
    • Does zeneventserver need more memory?
      • You can increase its heap size by increasing its RAM commitment
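One way to sanity-check the innodb_buffer_pool_size advice is to compare the buffer pool with the on-disk size of the zenoss_zep schema. The two queries below are standard MariaDB/MySQL; run them against the MariaDB-Events database with any client and pass the numbers to the helper. The rule of thumb that the buffer pool should cover the whole data-plus-index footprint is an assumption for this sketch, not a documented Zenoss requirement, and the helper function is hypothetical.

```python
# Standard MariaDB/MySQL queries; run them against the MariaDB-Events database.
ZEP_SIZE_SQL = (
    "SELECT SUM(data_length + index_length) FROM information_schema.TABLES "
    "WHERE table_schema = 'zenoss_zep'"
)
BUFFER_POOL_SQL = "SELECT @@innodb_buffer_pool_size"

GIB = 1024 ** 3


def check_buffer_pool(zep_bytes, buffer_pool_bytes):
    """Compare the buffer pool with the zenoss_zep data+index footprint.

    Assumes (rule of thumb, not a Zenoss requirement) that the buffer pool
    should be at least as large as the schema so hot data stays in memory.
    Returns (ok, coverage_ratio).
    """
    if zep_bytes <= 0:
        return True, float("inf")
    ratio = buffer_pool_bytes / zep_bytes
    return ratio >= 1.0, ratio


ok, ratio = check_buffer_pool(zep_bytes=12 * GIB, buffer_pool_bytes=8 * GIB)
print(f"buffer pool covers {ratio:.0%} of zenoss_zep; adequate: {ok}")
```

A low ratio is only a hint; zencheckdbstats and the host's CPU, memory, and disk I/O figures remain the better guide to whether the database is actually the bottleneck.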

Signal Troubleshooting

Backups in the signal queue are typically caused by:

  • An event flood
    • Use the flood troubleshooting steps in the Rawevents Troubleshooting section above
  • An external failure, such as an unreachable mail server or ticketing system
    • Put zenactiond in debug mode
    • Check the zenactiond log
      • Slow or failed notification sends will show up there
  • More notifications than your zenactiond workers can keep up with
    • Add more workers to zenactiond and restart it, or add more instances of zenactiond
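Whether zenactiond can keep up is a throughput question: on average, the number of notifications being worked at once equals the arrival rate times the time each one takes to send (Little's law). The sketch below uses that relationship to estimate a minimum worker count. The rate and per-notification time are inputs you would measure yourself (for example, from the zenactiond log in debug mode), and the function is hypothetical. It ignores bursts, so treat the result as a floor rather than a target.

```python
import math


def min_workers(notifications_per_second, seconds_per_notification):
    """Smallest worker count whose combined throughput matches the arrival rate.

    Each worker handles 1 / seconds_per_notification notifications per second,
    so workers * (1 / t) >= rate, i.e. workers >= rate * t.
    """
    if notifications_per_second < 0 or seconds_per_notification <= 0:
        raise ValueError("rate must be >= 0 and time per notification > 0")
    return max(1, math.ceil(notifications_per_second * seconds_per_notification))


# 3 notifications/s, each taking about 1.5 s (e.g. a slow SMTP relay or ticketing API)
print(min_workers(3, 1.5))  # 5
```

If the estimate is far above your current worker count, fixing the slow external endpoint is usually cheaper than adding workers.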