Resource Manager Monitoring

Resource Manager provides self-monitoring capabilities. A visual pipeline-based status page in the browser interface shows an overview of the monitored Resource Manager instance, including key metrics for nodes in the metric collection, event generation, and modeling processes.

The following figure shows the Resource Manager status overview page.

Resource Manager monitoring listens to events in the event class /ZenossRM. Each pipeline includes nodes and aggregate data for the associated components. Events on monitored components are used to flag a data pipeline node for your investigation. When a threshold violation occurs, the node changes from green to flashing gray, as shown by the metricshipper node in the figure. To open the detail page for the associated component, click a node in a pipeline.
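The flagging behavior can be pictured with a minimal sketch. It assumes events are available as plain dictionaries that carry `eventClass` and `component` fields, and that subclasses of /ZenossRM count as matches. The helper names and sample events are hypothetical and are not part of Resource Manager.

```python
from typing import Dict, Iterable, Set

RM_EVENT_CLASS = "/ZenossRM"


def in_rm_event_class(event_class: str) -> bool:
    """Return True for /ZenossRM itself or any subclass such as /ZenossRM/Foo."""
    return event_class == RM_EVENT_CLASS or event_class.startswith(RM_EVENT_CLASS + "/")


def flagged_nodes(events: Iterable[Dict[str, str]]) -> Set[str]:
    """Collect the components that have at least one event in /ZenossRM."""
    return {
        event["component"]
        for event in events
        if in_rm_event_class(event.get("eventClass", "")) and event.get("component")
    }


# Hypothetical sample events; only the first is in the /ZenossRM event class.
sample_events = [
    {"eventClass": "/ZenossRM", "component": "metricshipper"},
    {"eventClass": "/Status/Ping", "component": "eth0"},
    {"eventClass": "/ZenossRMX", "component": "unrelated"},
]

print(flagged_nodes(sample_events))
```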

Use the collected data to troubleshoot performance issues with Resource Manager components and to manage performance proactively. For example, the data can help you determine when a component or service in the system is overloaded. Before issues occur, optimize system performance by adding instances, memory, or CPU, or by adjusting configuration settings.

The Resource Manager installation or upgrade process creates device class /ZenossRM, and creates and models the local Resource Manager device with ID 127.0.0.1. Locally stored metrics for the monitored device provide detailed performance tracking for Resource Manager components.

You can add multiple Resource Manager instances as devices. Monitor the health and performance of multiple Resource Manager and Control Center instances from the same or another Resource Manager instance. You can monitor remote Resource Manager instances on the same Control Center and remote Resource Manager instances that are on a different Control Center.

Resource Manager self-monitoring

As described above, the local Resource Manager should appear in the /ZenossRM device class as device 127.0.0.1. No manual configuration should be required.

Resource Manager remote monitoring

To monitor a remote Resource Manager instance:

1. From the Infrastructure > Devices page, click the "Add Devices" button and choose "Add a Single Device."
2. Provide a device name that does not resolve in DNS. This prevents IP conflicts if you are already monitoring the underlying Control Center or Linux device.
3. Confirm the device class ("/ZenossRM") and the appropriate collector.
4. Uncheck "model device," and click "Add."
5. When the device appears in the Infrastructure view, click its name to open the device overview page.
6. Click "Configuration Properties" and populate the zProperties listed below with their appropriate values.
7. Click the "Modeling" button at the bottom of the page, and choose "Model Device."

Monitoring should commence automatically.
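If you prefer to script the first steps instead of using the browser interface, the request might look like the following sketch. It only builds the request body and sends nothing. It assumes the Zenoss JSON API exposes a `DeviceRouter` action with an `addDevice` method that accepts `deviceName`, `deviceClass`, `collector`, and `model`; verify this against the JSON API documentation for your release. The device name `remote-rm-east` is a hypothetical placeholder.

```python
import json


def build_add_device_request(device_name: str, collector: str = "localhost", tid: int = 1) -> dict:
    """Build a JSON-RPC style body for adding a device to /ZenossRM.

    Assumption: the action, method, and parameter names match the
    DeviceRouter.addDevice call of the Zenoss JSON API in your release.
    """
    return {
        "action": "DeviceRouter",
        "method": "addDevice",
        "data": [
            {
                "deviceName": device_name,    # should not resolve in DNS
                "deviceClass": "/ZenossRM",
                "collector": collector,
                "model": False,               # model later, after setting properties
            }
        ],
        "type": "rpc",
        "tid": tid,
    }


request_body = build_add_device_request("remote-rm-east")
print(json.dumps(request_body, indent=2))
```

Modeling is left off here for the same reason as in the manual procedure: the configuration properties must be set before the device is modeled.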

Resource Manager monitoring Configuration Properties

Configuration Property	Description
zRMMonTenantHost	Virtual host name of the Resource Manager instance to monitor (for example, zenoss5.yourrm.loc)
zRMMonTenantUser	Resource Manager username
zRMMonTenantPassword	Resource Manager password
zRMMonCCHost	Hostname or IP address of the Control Center instance on which the monitored Resource Manager is running
zRMMonCCUser	Control Center username
zRMMonCCPassword	Control Center password
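Before modeling a remote device, it can help to confirm that none of the six properties was left empty. The following is a minimal sketch of such a check; the function name and all sample values are hypothetical placeholders, and only the property names come from the table above.

```python
REQUIRED_PROPERTIES = (
    "zRMMonTenantHost",
    "zRMMonTenantUser",
    "zRMMonTenantPassword",
    "zRMMonCCHost",
    "zRMMonCCUser",
    "zRMMonCCPassword",
)


def missing_properties(values: dict) -> list:
    """Return the required property names that are absent or blank."""
    return [
        name
        for name in REQUIRED_PROPERTIES
        if not str(values.get(name) or "").strip()
    ]


# Hypothetical values; zRMMonCCPassword is deliberately left blank.
candidate = {
    "zRMMonTenantHost": "zenoss5.yourrm.loc",
    "zRMMonTenantUser": "monitor",
    "zRMMonTenantPassword": "example-secret",
    "zRMMonCCHost": "cc.example.com",
    "zRMMonCCUser": "ccuser",
    "zRMMonCCPassword": "",
}

print(missing_properties(candidate))
```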

ZenossRM components

Components of the /ZenossRM device are automatically discovered. Attributes are updated on the normal remodeling interval, which defaults to 12 hours.

You can add custom events and custom thresholds, and edit shipped thresholds for components that have them.

Shipped threshold values are subject to change in future releases. If you edit a value, your change will be overwritten by future updates.

Collectors

The aggregation of collector daemons and their supporting services grouped as localhost.

CollectorDaemons

The aggregation of statistics broken down by individual daemon service; for example, zenpython and zencommand.

Durable Queues

Persistent RabbitMQ message queues.

MetricConsumer

Pulls data from the Redis queue and passes it to MetricShipper.

This component ships with a default threshold. The maximum number of seconds that MetricConsumer needs to process its internal queue at the current rate is 300. If this value is exceeded, the MetricConsumer node on the metric pipeline turns gray and flashes.

Shipped threshold values are subject to change in future releases. If you edit a value, your change will be overwritten by future updates.
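To make the threshold concrete, the following sketch estimates the time needed to process a queue as queue length divided by processing rate. This formula is an assumption used for illustration; the document does not state how Resource Manager computes the value. The function names and sample numbers are hypothetical.

```python
THRESHOLD_SECONDS = 300  # shipped default described above


def seconds_to_drain(queue_length: int, items_per_second: float) -> float:
    """Estimate the seconds needed to empty a queue at the current rate."""
    if queue_length == 0:
        return 0.0
    if items_per_second <= 0:
        # A non-empty queue that is not being processed never drains.
        return float("inf")
    return queue_length / items_per_second


def exceeds_threshold(queue_length: int, items_per_second: float,
                      threshold: float = THRESHOLD_SECONDS) -> bool:
    """Return True when the estimated drain time is above the threshold."""
    return seconds_to_drain(queue_length, items_per_second) > threshold


# 45,000 queued items at 100 items per second take 450 seconds: over the limit.
print(seconds_to_drain(45000, 100.0), exceeds_threshold(45000, 100.0))
# 12,000 queued items at the same rate take 120 seconds: within the limit.
print(seconds_to_drain(12000, 100.0), exceeds_threshold(12000, 100.0))
```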

MetricShipper

Inserts metrics into OpenTSDBWriter.

This component ships with a default threshold. The maximum number of seconds MetricShipper needs to process its Redis queue at the current rate is 300. If this value is exceeded, the MetricShipper node on the metric pipeline turns gray and flashes.

Shipped threshold values are subject to change in future releases. If you edit a value, your change will be overwritten by future updates.

OpenTSDBReader

Allows queries to metric storage.

OpenTSDBWriter

Performs writes to metric storage.

Redis

Used as a cache for device configuration and metrics data.

RegionServer

OpenTSDBWriter uses this component for metric storage.

Solr

Provides the index of modeled devices and system objects.

ZODB

Object database that stores the model.

ZenActionD

Watches the event stream and sends configured notifications.

ZenEventD

Filters, enhances, and transforms events.

This component ships with a default threshold. The maximum number of seconds ZenEventD needs to process the rawevents queue at the current rate is 300. If this value is exceeded, the ZenEventD node on the metric pipeline turns gray and flashes.

Shipped threshold values are subject to change in future releases. If you edit a value, your change will be overwritten by future updates.

ZenEventServer

Performs event processing, storage, and retrieval.

This component ships with a default threshold. The maximum number of seconds ZenEventServer needs to process the zenevents queue at the current rate is 300. If this value is exceeded, the ZenEventServer node on the metric pipeline turns gray and flashes.

Shipped threshold values are subject to change in future releases. If you edit a value, your change will be overwritten by future updates.

ZenHub

The central coordinator of event generation and configuration delivery to daemons.

ZenModeler

Periodically scans modeled devices and saves attribute changes to the model.

Zope

Web server processes that serve the browser interface, reports, and Zenoss JSON API requests, and that support debugging.

Customizing Resource Manager monitoring

Customize monitoring of Resource Manager components to fit your environment.

Add and model one or more Resource Manager instances as devices. For each device, specify the device class /ZenossRM, and set the configuration properties that have the prefix zRMMon.
To determine base-level performance for the Resource Manager instance, study component graphs after a week, a month, and so on. If a metric for a component captures data that indicates a potential problem, take the following actions:
- To the template for the affected component, add thresholds for the data point.
  - Create a threshold to send a warning level event that you can investigate.
  - Create another threshold to send an error level event to indicate a more serious problem.
- Create events with event class /ZenossRM for the threshold violations.
- As you collect more data, make adjustments to thresholds and events as needed.
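The two-tier threshold approach can be sketched as follows. This is only an illustration of the logic, not how Resource Manager evaluates thresholds internally; the function name and the limits of 200 and 300 are hypothetical values that you would replace with figures derived from your own baseline.

```python
from typing import Optional


def classify(value: float, warning_at: float, error_at: float) -> Optional[str]:
    """Map a data point value to an event severity name, or None if healthy."""
    if warning_at > error_at:
        raise ValueError("warning threshold must not exceed error threshold")
    if value > error_at:
        return "error"      # more serious problem
    if value > warning_at:
        return "warning"    # worth investigating
    return None


# Hypothetical limits chosen after studying baseline graphs.
WARNING_AT = 200
ERROR_AT = 300

for sample in (150, 250, 350):
    print(sample, classify(sample, WARNING_AT, ERROR_AT))
```

Keeping the warning limit below the error limit gives you time to investigate a trend before it becomes a more serious problem.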