Event pipeline
Event Processing Explained
- Events are generated by collector daemons
- Passive collection is performed by daemons like zentrap, zensyslog, etc.
- Active collection is performed by daemons like zenpython, zencommand, etc.
Queued events are sent to ZenHub
If zenhub is not available to receive queued events, they will be held internally
- Collector daemons keep an internal event queue
- Not persistent
- Defaults to 5000
- Rotating buffer: when full, it keeps only the most recent events (5000 by default)
- Adjust via the maxqueuelen setting in the collector daemon's config
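The rotating-buffer behavior can be modeled with a bounded deque: once the queue is full, each new event silently evicts the oldest one. This is a minimal Python sketch under that assumption; `CollectorEventQueue` and its methods are hypothetical names, not the collector daemons' actual code, and `maxqueuelen` here only mirrors the meaning of the config setting.

```python
from collections import deque


class CollectorEventQueue:
    """Hypothetical model of a collector daemon's in-memory event queue.

    Nothing is written to disk, so queued events are lost if the daemon
    restarts.
    """

    def __init__(self, maxqueuelen=5000):
        # A deque with maxlen discards the oldest item when a new one is
        # appended to a full queue, so only the newest events are kept.
        self._events = deque(maxlen=maxqueuelen)

    def enqueue(self, event):
        self._events.append(event)

    def drain(self):
        """Return every queued event (oldest first) and empty the queue,
        e.g. once zenhub is reachable again."""
        batch = list(self._events)
        self._events.clear()
        return batch

    def __len__(self):
        return len(self._events)


# zenhub is "down" while five events arrive at a queue limited to three.
q = CollectorEventQueue(maxqueuelen=3)
for i in range(5):
    q.enqueue({"device": "router1", "severity": 3, "message": f"event {i}"})
print([e["message"] for e in q.drain()])  # ['event 2', 'event 3', 'event 4']
```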
Zenhub acts as an aggregator for events from collectors
- zenhub's parent thread receives events from collectors
- Events are validated: each must contain at least a device (which doesn't have to be added to RM for monitoring), a severity, and a message; otherwise it is dropped
- Event processing tasks are queued internally in zenhub's worklist
zenhub workers consume tasks from the worklist
- SendEvents tasks are worked by publishing events to the zenoss.queues.zep.rawevents queue in RabbitMQ
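The zenhub flow (validate, queue a task on the worklist, let a worker publish to rawevents) can be sketched in plain Python. Everything here is an illustrative assumption: the field names follow the device/severity/message rule, the `("SendEvents", events)` tuple stands in for zenhub's real task objects, and a list stands in for the zenoss.queues.zep.rawevents RabbitMQ queue.

```python
import json
from collections import deque

# Hypothetical field names for the minimum an event must carry.
REQUIRED_FIELDS = ("device", "severity", "message")

worklist = deque()  # zenhub's internal task list
rawevents = []      # stand-in for the zenoss.queues.zep.rawevents queue


def is_valid(event):
    """True if the event names a device and has a severity and a message.

    The device only has to be named, not modeled in RM. Severity 0 (Clear)
    is a legitimate value, so only None and empty strings count as missing.
    """
    return all(event.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def receive(events):
    """Parent thread: drop invalid events, queue a SendEvents task for the rest.

    Returns the number of events dropped.
    """
    valid = [event for event in events if is_valid(event)]
    if valid:
        worklist.append(("SendEvents", valid))
    return len(events) - len(valid)


def worker_step():
    """Worker: take the oldest task and publish its events."""
    task_name, events = worklist.popleft()
    if task_name == "SendEvents":
        for event in events:
            rawevents.append(json.dumps(event))


dropped = receive([
    {"device": "web01", "severity": 4, "message": "disk full"},
    {"severity": 3, "message": "no device"},  # missing device: dropped
])
worker_step()
print(dropped, len(rawevents))  # 1 1
```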
Zeneventd consumes events from zenoss.queues.zep.rawevents
- Maps events from /Unknown to the appropriate event class based on event class mappings
- Event enrichment
- Provides 'device' and 'component' contexts in transforms
- Populates device data in events
- ProductionState
- Groups/Systems/Locations
- Device Priority
- etc.
- Applies transform code based on event class membership
- Executes zeneventd post-plugins
- Publishes contextualized events to zenoss.queues.zep.zenevents
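To make the zeneventd steps concrete, here is a simplified Python sketch of the order of operations: map /Unknown events to a class, enrich them with device data, run transforms for the event class and its parent classes, then publish. The device inventory, mapping patterns, enrichment keys, and the `cap_low_priority` transform are all hypothetical; real zeneventd mappings, transforms, and post-plugins are configured in Zenoss and are considerably richer.

```python
# Hypothetical device data standing in for what zeneventd looks up in the model.
DEVICES = {
    "lab01": {"production_state": 500, "priority": 1,
              "groups": ["/Lab"], "systems": [], "location": "/DC2"},
}

# Hypothetical mappings: (text to look for in the message, event class).
MAPPINGS = [("link down", "/Net/Link"), ("disk", "/Perf/Filesystem")]

zenevents = []  # stand-in for the zenoss.queues.zep.zenevents queue


def class_path(event_class):
    """'/Perf/Filesystem' -> ['/Perf', '/Perf/Filesystem'] (parents first)."""
    parts = event_class.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def cap_low_priority(evt, device):
    """Hypothetical transform: events from low-priority devices are capped
    at severity 3 (Warning)."""
    if device is not None and device["priority"] < 3:
        evt["severity"] = min(evt["severity"], 3)


# In this sketch, a transform registered on a class also applies to its subclasses.
TRANSFORMS = {"/Perf": [cap_low_priority]}


def process(raw_event):
    evt = dict(raw_event)
    # 1. Map unclassified events using the first matching mapping.
    if evt.get("eventClass", "/Unknown") == "/Unknown":
        text = evt["message"].lower()
        evt["eventClass"] = next(
            (cls for pattern, cls in MAPPINGS if pattern in text), "/Unknown")
    # 2. Enrich with device data (skipped for devices that are not modeled).
    device = DEVICES.get(evt["device"])
    if device is not None:
        evt.update(production_state=device["production_state"],
                   device_priority=device["priority"],
                   groups=device["groups"], systems=device["systems"],
                   location=device["location"])
    # 3. Apply transforms from the outermost class inward.
    for cls in class_path(evt["eventClass"]):
        for transform in TRANSFORMS.get(cls, []):
            transform(evt, device)
    # 4. Post-plugins would run here (omitted), then the event is published.
    zenevents.append(evt)
    return evt


out = process({"device": "lab01", "severity": 5, "message": "Disk usage at 95%"})
print(out["eventClass"], out["severity"])  # /Perf/Filesystem 3
```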
Zeneventserver consumes events from zenoss.queues.zep.zenevents
It is the heart of the event processing engine and has multiple roles:
- Stores events in the zenoss_zep database
- MariaDB-Events container
- Indexes stored events in Lucene indexes
- $ZENHOME/var/zeneventserver
- Handles event aging and archiving
- Provides the back-end for the event API
- Serves the event console and archive UI elements
- Serves the event 'rainbow' functionality in the UI
- Maintains a list of ping-down devices, which zenhub uses when it builds configs
- Handles heartbeat monitoring
- Evaluates triggers and publishes to zenoss.queues.zep.signal when a match is found
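Trigger evaluation can be pictured as checking each enabled rule against each incoming event and publishing a signal for every match. The sketch below assumes triggers are plain Python predicates over an event dict and uses a list as a stand-in for zenoss.queues.zep.signal; `Trigger` and `evaluate_triggers` are hypothetical names, and real trigger rules are defined in the Zenoss UI.

```python
from dataclasses import dataclass
from typing import Callable

signal_queue = []  # stand-in for the zenoss.queues.zep.signal queue


@dataclass
class Trigger:
    """Hypothetical trigger: a name, a rule over the event, and an on/off switch."""
    name: str
    rule: Callable[[dict], bool]
    enabled: bool = True


def evaluate_triggers(evt, triggers):
    """Check every enabled trigger; publish one signal per match."""
    matched = []
    for trigger in triggers:
        if trigger.enabled and trigger.rule(evt):
            signal_queue.append({"trigger": trigger.name, "event": evt})
            matched.append(trigger.name)
    return matched


triggers = [
    Trigger("critical-in-production",
            lambda e: e["severity"] >= 5 and e.get("production_state") == 1000),
    Trigger("errors-and-above", lambda e: e["severity"] >= 4),
    Trigger("disabled-catch-all", lambda e: True, enabled=False),
]
print(evaluate_triggers({"device": "web01", "severity": 4,
                         "production_state": 1000}, triggers))
# ['errors-and-above']
```

In this model the work done per event grows with the number of enabled triggers, which is one way to see why a large trigger count can slow event processing.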
Zenactiond consumes events from the zenoss.queues.zep.signal queue
- Processes notifications, which can:
- Email a user
- Run a command to perform corrective action
- Send a syslog message
- Send an SNMP trap
- Generate or update a support ticket
- Integrate with other custom solutions
- ServiceNow
- RemedyITSM
- etc.
Troubleshooting event flow
Identifying Bottlenecks
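If you suspect a problem with the event pipeline, check RabbitMQ first:
- Through the RabbitMQ public endpoint
- From inside the RabbitMQ container:
    rabbitmqctl list_queues -p /zenoss
    rabbitmqctl list_queues -p /zenoss messages consumers name
- The second form lists the message count, consumer count, and name of each queue in the /zenoss vhost; a queue whose message count keeps climbing shows where events are backing up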
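If you check queues regularly, a small script can flag the ones that are backing up. This is a sketch, assuming rabbitmqctl prints one row per queue with the requested columns (messages, consumers, name) separated by whitespace, plus some banner or header lines that the parser skips; the 1000-message threshold, the canned sample output, and the helper names are illustrative, not Zenoss defaults.

```python
import subprocess


def parse_list_queues(output):
    """Turn `rabbitmqctl list_queues ... messages consumers name` output into
    {queue_name: (messages, consumers)}.

    Rows that do not look like '<int> <int> <name>' (banners, column
    headers) are skipped.
    """
    queues = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].isdigit() and fields[1].isdigit():
            queues[fields[2]] = (int(fields[0]), int(fields[1]))
    return queues


def find_problems(queues, backlog_threshold=1000):
    """Flag queues with a large backlog or with messages but no consumers."""
    problems = []
    for name, (messages, consumers) in sorted(queues.items()):
        if consumers == 0 and messages > 0:
            problems.append(f"{name}: {messages} messages but no consumers")
        elif messages > backlog_threshold:
            problems.append(f"{name}: backlog of {messages} messages")
    return problems


def check_zenoss_queues():
    """Run inside the RabbitMQ container; prints any suspicious queues."""
    result = subprocess.run(
        ["rabbitmqctl", "list_queues", "-p", "/zenoss",
         "messages", "consumers", "name"],
        capture_output=True, text=True, check=True)
    for problem in find_problems(parse_list_queues(result.stdout)):
        print(problem)


# Example with canned output instead of a live broker.
sample = """Listing queues for vhost /zenoss ...
messages\tconsumers\tname
0\t1\tzenoss.queues.zep.zenevents
25000\t1\tzenoss.queues.zep.rawevents
12\t0\tzenoss.queues.zep.signal
"""
for problem in find_problems(parse_list_queues(sample)):
    print(problem)
# zenoss.queues.zep.rawevents: backlog of 25000 messages
# zenoss.queues.zep.signal: 12 messages but no consumers
```

Inside the container, calling `check_zenoss_queues()` runs the same checks against live output.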
Rawevents Troubleshooting
Backups in the rawevents queue are typically caused by:
- An event flood
- Look at collector performance graphs (event queues graph) to identify the source of the flood
- Look in the event console for new syslog or snmp trap messages that are rapidly incrementing in count
- As a last resort, you can temporarily stop zensyslog/zentrap to halt a flood
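One way to spot the source of a flood is to compare event counts at two points in time (for example, the Count column of the same events in the event console a minute apart) and rank events by how fast their count grows. A minimal sketch, assuming you have collected those counts into dicts keyed by something like (device, summary); `find_floods`, the sample data, and the 1 event/second cutoff are hypothetical.

```python
def find_floods(before, after, interval_seconds, min_rate=1.0):
    """Rank events whose occurrence count is growing at >= min_rate per second.

    before/after map an event key (e.g. (device, summary)) to its count at
    two moments interval_seconds apart. Keys missing from `before` are
    treated as new events starting from 0.
    """
    floods = []
    for key, count in after.items():
        rate = (count - before.get(key, 0)) / interval_seconds
        if rate >= min_rate:
            floods.append((key, rate))
    # Fastest-growing first.
    return sorted(floods, key=lambda item: item[1], reverse=True)


before = {("sw1", "link flap"): 100, ("web01", "disk full"): 5}
after = {("sw1", "link flap"): 7300, ("web01", "disk full"): 6,
         ("fw1", "syslog: deny tcp"): 120}
for (device, summary), rate in find_floods(before, after, interval_seconds=60):
    print(f"{device} '{summary}': {rate:.1f} events/s")
# sw1 'link flap': 120.0 events/s
# fw1 'syslog: deny tcp': 2.0 events/s
```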
Slowness in zeneventd
- Look in the zeneventd logs for long-running transform messages
- You may need to optimize your event transforms
- You may need more zeneventd workers or instances
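Because transforms run once per event, even a small per-event cost adds up: at 200 events per second, a transform that takes 5 ms consumes a full second of processing time every second, enough to saturate one worker on its own. The sketch below shows one way to measure per-transform cost while testing transform logic outside Zenoss; `TransformTimer` and the two sample transforms are hypothetical, not part of zeneventd.

```python
import time
from collections import defaultdict


class TransformTimer:
    """Hypothetical helper: accumulates wall-clock time per named transform."""

    def __init__(self):
        self.total_seconds = defaultdict(float)
        self.calls = defaultdict(int)

    def run(self, name, transform, evt):
        start = time.perf_counter()
        try:
            transform(evt)
        finally:
            # Record the time even if the transform raises.
            self.total_seconds[name] += time.perf_counter() - start
            self.calls[name] += 1

    def slowest(self, n=5):
        """(name, average seconds per call), slowest first."""
        averages = {name: self.total_seconds[name] / self.calls[name]
                    for name in self.calls}
        return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]


def quick(evt):
    evt["severity"] = min(evt["severity"], 4)


def slow(evt):
    time.sleep(0.01)  # stands in for an expensive lookup done per event


timer = TransformTimer()
for _ in range(20):
    event = {"severity": 5}
    timer.run("quick", quick, event)
    timer.run("slow", slow, event)
for name, avg in timer.slowest():
    print(f"{name}: {avg * 1000:.2f} ms per event")
```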
Throttling in RabbitMQ (really only an issue in 4.x)
- Make sure the RabbitMQ container has at least 1GB of memory available
- Make sure that /var in the RabbitMQ container has at least 1GB of disk space available
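The two RabbitMQ resource checks can be scripted. This sketch assumes it runs inside the RabbitMQ container, treats 1 GB as 1 GiB, and takes the container's memory allowance as an input, because /proc/meminfo inside a container usually reports the host's memory rather than the container's limit; the function names are hypothetical.

```python
import shutil

GIB = 1024 ** 3
MINIMUM_BYTES = 1 * GIB  # the 1 GB floor, treated here as 1 GiB


def check_rabbitmq_resources(free_var_bytes, memory_bytes, minimum=MINIMUM_BYTES):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if free_var_bytes < minimum:
        problems.append(f"/var has only {free_var_bytes / GIB:.2f} GiB free")
    if memory_bytes < minimum:
        problems.append(f"only {memory_bytes / GIB:.2f} GiB of memory available")
    return problems


def check_this_container(memory_bytes):
    """Run inside the RabbitMQ container. memory_bytes is the container's
    memory allowance (for example, its RAM commitment), supplied by you."""
    free_var = shutil.disk_usage("/var").free
    return check_rabbitmq_resources(free_var, memory_bytes)


print(check_rabbitmq_resources(free_var_bytes=512 * 1024 ** 2, memory_bytes=2 * GIB))
# ['/var has only 0.50 GiB free']
```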
Zenevents Troubleshooting
Backups in the zenevents queue are typically caused by:
- An event flood or RabbitMQ throttling
- Use the steps in the Rawevents Troubleshooting section above
A problem in zeneventserver
- Check the zeneventserver log
- Database errors normally show up here as JDBC exceptions
- MariaDB/MySQL error codes are usually easy to diagnose with a quick web search
- If there's an error indexing events, it may be necessary to rebuild the Lucene indexes
- Non-graceful shutdowns and other causes of data corruption can lead to index corruption, which may require rebuilding the zep Lucene indexes
Slowness in MariaDB-Events
Check database tuning
- Make sure innodb_buffer_pool_size is adequate for the size of the zenoss_zep database
- Make sure there's no CPU, memory, or disk I/O constraint
- Use zencheckdbstats to identify tuning issues
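To compare the buffer pool against the size of the zep database, you can query both values from MariaDB and apply a rule of thumb. The SQL strings use the standard innodb_buffer_pool_size system variable and information_schema.tables columns; the Python helper, its 25% headroom factor, and the idea that the pool should hold all zep data plus indexes are assumptions for illustration, not an official Zenoss sizing formula.

```python
import math

GIB = 1024 ** 3

# Standard MariaDB queries; run them with the mysql client or a DB-API driver
# against the MariaDB-Events database server.
BUFFER_POOL_SQL = "SELECT @@innodb_buffer_pool_size"
ZEP_SIZE_SQL = (
    "SELECT COALESCE(SUM(data_length + index_length), 0) "
    "FROM information_schema.tables WHERE table_schema = 'zenoss_zep'"
)


def assess_buffer_pool(pool_bytes, zep_bytes, headroom=1.25):
    """Rule-of-thumb check (an assumption, not an official sizing formula).

    Returns (fits, suggested_bytes): whether the pool can hold all zep data
    and indexes, and a suggested pool size with `headroom` for growth,
    rounded up to a whole GiB.
    """
    suggested = math.ceil(zep_bytes * headroom / GIB) * GIB
    return pool_bytes >= zep_bytes, suggested


fits, suggested = assess_buffer_pool(pool_bytes=4 * GIB, zep_bytes=6 * GIB)
print(fits, suggested // GIB)  # False 8
```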
Slowness in zeneventserver
- Do you have more than 200 triggers enabled?
- You may need to increase your trigger cache size in zeneventserver.conf
- Does zeneventserver need more memory?
- You can increase the heap size by increasing the service's RAM commitment
Signal Troubleshooting
Backups in the signal queue are typically caused by:
- An event flood
- Use the flood troubleshooting steps described above
An external failure, such as an unreachable mail server
- Put zenactiond in debug mode
- Check the zenactiond log
- Slowness or failures to send notifications will show up here
More notifications than your zenactiond workers can keep up with
- Add more workers to zenactiond and restart it, or add more instances of zenactiond
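Whether zenactiond can keep up is a matter of arithmetic: if notifications arrive at a rate of λ per second and each takes S seconds to send, on average λ·S workers are busy at any moment, and the signal queue grows whenever fewer than that are available. A small Python helper under that assumption; the function name and the 70% utilization target (to leave headroom for bursts) are hypothetical choices.

```python
import math


def workers_needed(notifications_per_minute, seconds_per_notification,
                   target_utilization=0.7):
    """Workers required so that, on average, each is busy at most
    `target_utilization` of the time.

    Offered load (average number of busy workers) = arrival rate x service time.
    With fewer workers than the offered load, the backlog grows without bound.
    """
    arrivals_per_second = notifications_per_minute / 60
    offered_load = arrivals_per_second * seconds_per_notification
    return math.ceil(offered_load / target_utilization)


# 300 notifications/minute, each taking about 2 s (e.g. a slow mail server):
print(workers_needed(300, 2.0))        # 15
print(workers_needed(300, 2.0, 1.0))   # 10 (zero headroom)
```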