Monitoring using ZenCommand

Resource Manager can run Nagios® and Cacti plug-ins through the ZenCommand process. ZenCommand can run plug-ins locally or remotely over a native SSH transport. When a plug-in runs, the system tracks its return code and creates events from its output. It can also collect performance data from the plug-in.

Plugin format for ZenCommands

Nagios® plugins are configured by using a command template. A template named "Device" will bind to all devices below the template definition. Within each template is a list of commands that will run. The commands can be any program that follows the Nagios® plug-in standard. Inputs are command line arguments; output is the first line of stdout, plus a return code.

Note: Resource Manager return codes differ from Nagios® return codes, as follows:

Value   Resource Manager   Nagios
0       Clear              OK
1       Data Source        WARNING
2       Data Source+1      CRITICAL
3       Data Source        UNKNOWN

For comprehensive information about Nagios® plugins, refer to the Nagios® Plugins Development Guidelines.
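
As a rough illustration of the plug-in contract described above (command line arguments in, one line of output plus a return code out), the following sketch maps a numeric value to the Nagios return codes in the table. The function name check_value and its warning and critical arguments are hypothetical, chosen only for this example.

```shell
#!/bin/bash
# Hypothetical sketch: check_value and its arguments are illustrative names.

# check_value VALUE WARN CRIT
# Prints one status line and returns a Nagios-style code:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
check_value() {
    local value=$1 warn=$2 crit=$3

    # Anything that is not a non-negative integer is reported as UNKNOWN.
    case "$value" in
        ''|*[!0-9]*)
            echo "UNKNOWN: value '$value' is not a number"
            return 3
            ;;
    esac

    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL: value is $value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING: value is $value"
        return 1
    fi

    echo "OK: value is $value"
    return 0
}
```

A real plug-in would end by calling the function with a measured value and exiting with its return code, for example check_value 42 100 500 followed by exit $?.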

A Nagios® command has several fields:

  • name - Specifies the name of the command object.

  • enabled - Indicates whether this command should be used on a given device.

  • component - Specifies the component name to use when zencommand sends events to the system.

  • event class - Specifies the event class to use when sending events to the system.

  • severity - Sets the default severity to use when sending events to the system.

  • cycle time - Sets how often the command runs, in seconds.

  • command template - Specifies the command to run.

    The command template string is built by using Zope TALES expressions. Several variables are passed when evaluating the template:

  • zCommandPath - Path to the zencommand plug-ins on a given host. The value comes from the zCommandPath configuration property and is automatically prepended to a command that does not begin with a path.

  • devname - Device name of the device against which the command is being evaluated.

  • dev - Device object against which the command is being evaluated.

  • here - Context of evaluation. For a device, this is equivalent to dev. For a component (such as a file system or interface), this is the component object.

  • compname - If this command evaluates against a component, specifies its name as a string.

  • now - Current time.

    Template values are accessed like shell variables.
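
The zCommandPath rule described above can be pictured with a small shell sketch. It is only an illustration of the stated behavior, not the product's implementation: resolve_command is a hypothetical helper, and treating "begins with a path" as "begins with /" is an assumption.

```shell
#!/bin/bash
# Illustrative only: resolve_command is a hypothetical helper, not a Zenoss API.

# resolve_command COMMAND_PATH COMMAND_LINE
# Prepends COMMAND_PATH when the command line does not start with a path.
resolve_command() {
    local command_path=$1 command_line=$2
    case "$command_line" in
        /*)
            # Assumption: a leading "/" means the command already has a path.
            echo "$command_line"
            ;;
        *)
            # Strip any trailing slash from the path before joining.
            echo "${command_path%/}/$command_line"
            ;;
    esac
}
```

With this sketch, a command template of check_tmp.sh -w 10 and a zCommandPath of /opt/plugins would resolve to /opt/plugins/check_tmp.sh -w 10, while /usr/bin/uptime would be left unchanged. Both paths are made-up sample values.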

Testing ZenCommands

Use the zentestcommand script to test ZenCommand data sources.

  1. Log in to the Control Center master host as a user with serviced CLI privileges.

  2. Start an interactive session in the zenhub container as the zenoss user.

    serviced service attach zenhub su - zenoss
    
  3. Start the zentestcommand script. Replace DeviceID with the ID of the device on which to run the command, and replace DataSourceName with the name of a template data source associated with the device.

    zentestcommand -d DeviceID --datasource=DataSourceName
    

    The zentestcommand script prints the results of the command to standard output.

  4. Exit the container.

    exit
    

Monitoring devices via SSH

You can monitor devices remotely through SSH. Follow the steps in the following sections to set up remote monitoring.

Changing Resource Manager to monitor devices remotely using SSH

You must edit system properties for the group where you want to collect remote information using SSH.

  1. Navigate to the device class path that you want to monitor remotely. You can apply this monitoring for a device or a device class path.
  2. Change the configuration properties value for the group. After selecting the device class, click Details, and then select Configuration Properties.
  3. On the Configuration Properties page, change the properties that are listed in the following table.

    The table includes sample values for remote devices. The sample setup assumes a pre-shared key (with no password) from the collector to the remote hosts. Password authentication also works if the password is entered in zCommandPassword.

    Configuration property   Sample value
    zCollectorPlugins        snmp|portscan
    zCommandPassword         The SSH password for the remote machine
    zCommandPath             The path to zenplugin.py
    zCommandUsername         The SSH user name for the remote machine
    zSnmpMonitorIgnore       True
  4. Two passes are required for full modeling. The first pass obtains the platform type (so that the system knows which plugins to run). The second pass provides detailed data on interfaces and file systems.

    1. Log in to the Control Center master host as a user with serviced CLI privileges.

    2. Display the list of zenmodeler services.

      serviced service list zenmodeler
      

      On a system with multiple collectors, the result is similar to the following example:

      Name            ServiceID                      DepID/Path
      zenmodeler      7itut0ryz759ua77ntrm3hi8w      1/Zenoss.resmgr/Zenoss/Collection/localhost/localhost/zenmodeler
      zenmodeler      e3bpfy6j6pyl8l346xq446myk      1/Zenoss.resmgr/Zenoss/Collection/localhost/collectorPool2/zenmodeler
      zenmodeler      7dnmgcwexlqxjqko6nja0942y      1/Zenoss.resmgr/Zenoss/Collection/localhost/collectorPool3/zenmodeler
      
    3. Select the zenmodeler service that is associated with the device to model, and then attach to it as the zenoss user. Replace ServiceID with the service ID of a zenmodeler service; for example, 7itut0ryz759ua77ntrm3hi8w.

      serviced service attach ServiceID su - zenoss
      
    4. Run the zenmodeler command. Replace DeviceName with the fully qualified device name.

      zenmodeler run -d DeviceName
      
    5. Repeat the zenmodeler command to employ the plugins the command gathered on the first pass.
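
The two modeling passes can also be wrapped in a short loop, to be run inside the zenmodeler container as the zenoss user. This is a minimal sketch: model_twice and the MODELER variable are hypothetical names introduced here so that the loop can be dry-run, and stopping at the first failed pass is a design choice of the example.

```shell
#!/bin/bash
# MODELER is a hypothetical override for dry runs; by default the sketch
# calls the zenmodeler command used in the steps above.
MODELER=${MODELER:-zenmodeler}

# model_twice DEVICE_NAME
# Runs the modeler twice against the same device and stops at the first
# pass that fails.
model_twice() {
    local device=$1 pass
    for pass in 1 2; do
        echo "Modeling pass $pass for $device"
        $MODELER run -d "$device" || return 1
    done
}
```

Calling model_twice DeviceName runs both passes. Setting MODELER=echo beforehand prints the commands instead of running them.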

Using the predefined /Server/Cmd device class

The /Server/Cmd device class is an example configuration for modeling and monitoring devices using SSH. The configuration properties have been modified (as described in the previous sections), and device, file system, and Ethernet interface templates that gather data over SSH have been created.

You can use this device class as a reference for your own configuration; or, if you have a device that needs to be modeled or monitored via SSH/Command, you can place it under this device class to use the pre-configured templates and configuration properties. You must set the zCommandUsername and zCommandPassword properties to the appropriate SSH login information for each device.

How to create a command data source

The following procedure makes some assumptions:

  • You are monitoring a Linux device in the /Server/SSH/Linux device class.
  • The monitoring user account specified in zCommandUsername is zenmonitor.
  • The chosen script called by the command data source (see below) is already in place on the target device in /home/zenmonitor/scripts/.

Procedure

  1. Create a new monitoring template.

    • In the Collection Zone UI, navigate to ADVANCED > MONITORING TEMPLATES.
    • Click the Add a monitoring template ("plus") button in the lower left corner.
    • Name the template TmpDirectoryMonitor.
    • Select /Server/SSH/Linux from the Path dropdown.
    • Click Submit.
  2. Add a data source to the template.

    • Click the Add data source ("plus") button in the top left.
    • Name the data source tmpcheck.
    • Select COMMAND from the Type dropdown.
    • Click Submit.
  3. Configure the data source.

    • Double-click the data source.
    • In the Command Template field, provide the full path to one of the example scripts:
      • /home/zenmonitor/scripts/example_cacti.sh
      • /home/zenmonitor/scripts/example_json.sh
      • /home/zenmonitor/scripts/example_nagios.sh
    • From the Parser dropdown, choose the parser that matches the chosen script (Cacti, JSON, or Nagios).
    • Check the Use SSH box.
    • Click Save.
  4. Add data points to the data source.

    • From the data source "action wheel," choose Add Data Point.
    • Name the data point tmpCount.
    • Click Submit.
    • Repeat the preceding steps for the tmpSize data point.
  5. Create graph definitions for each data point.

    • Click the Add graph definition ("plus") button on the lower right.
    • Name the graph Count and click Submit.
    • From the graph definition "action wheel," choose Manage Graph Points.
    • On the Manage Graph Points dialog, click the "plus" button and choose Data Point.
    • From the Data Point dropdown, choose tmpcheck.tmpCount, and click Submit.
    • Click Save.
    • Double-click the Count graph definition.
    • In the View and Edit Graph Definition dialog, set the following:
      • Units: Files
      • Min Y: 0
    • Click Submit.
    • Repeat the preceding steps to create the Size graph.
  6. Test the template against a single device.

    • From INFRASTRUCTURE > DEVICES, navigate to the chosen test device.
    • From the "action wheel" in the lower left, choose Bind Templates.
    • From the AVAILABLE column, select the TmpDirectoryMonitor template.
    • Move the template to the SELECTED column by either:
      • double-clicking the template name
      • clicking the template name, then clicking the Add to Selected ("right arrow") button.
    • From the Monitoring button at the bottom of the device page, choose either Collect Device data or Collect Device data (Debug).
    • Check the Graphs page for the device to confirm metrics are being saved.

Scripts

The scripts below are designed to check the /tmp directory for size and the number of files it contains. They then return both values in one of the general purpose parser formats supported by command data sources.

Note

The following scripts are intended only to demonstrate the output formats recognized by each parser. They perform no "sanity checking," are not intended for production use, and may not work in your environment.

example_cacti.sh

#!/bin/bash

# Get the size and file count of /tmp
# and set them as perf variables.

TMPSIZE=$(du -s /tmp | awk '{ print $1 }')
TMPCOUNT=$(ls /tmp | wc -l)

# Substitute the variables into the output string.
# The Cacti output format looks like
#
# <key>:<value>
#
# For a command data source with data points datapoint1 and datapoint2
# the output would look like
#
# datapoint1:1234 datapoint2:9876
#

OUTPUT_STRING="tmpSize:$TMPSIZE tmpCount:$TMPCOUNT"

# Return the output string.

echo $OUTPUT_STRING

example_json.sh

#!/bin/bash

# Get the size and file count of /tmp
# and set them as perf variables.
# Get the 5 largest files and
# set them as an event variable.

TMPSIZE=$(du -s /tmp | awk '{ print $1 }')
TMPCOUNT=$(ls /tmp | wc -l)
TMPFILES=$(du -a /tmp/ | sort -n -r | head -n 6 | tail -n 5)

# Substitute the variables into the output string.
# The JSON parser expects a specific format of JSON
# and cannot accept/interpret an arbitrary payload.
# With a payload of only datapoint1 and datapoint2,
# the output would look like:
#
# { "values": { "": { "datapoint1": 1234, "datapoint2": 9876 } }, "events": [] }
#
# The "values":{} section can contain data points for
# multiple components:
#
# { "values": { "component1": { "datapoint1": 1234 }, \
# "component2": {"datapoint1": 5678} }, "events": [] }
#
# The use of the empty string "" indicates values for the
# device itself and is required for device-level data points.

OUTPUT_STRING="
{    
  \"values\": {
     \"\": {
        \"tmpSize\": $TMPSIZE,
        \"tmpCount\": $TMPCOUNT
      }
    },
    \"events\": [
        {
            \"severity\": 2,
            \"summary\": \"The largest files in /tmp are $TMPFILES\",
            \"eventKey\": \"datapoint1_errors\",
            \"eventClass\": \"/Capacity/Storage\"
        }
    ]
}
"

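# Return the output string. The variable is intentionally unquoted so the
# shell collapses the newlines (including those in $TMPFILES) into spaces,
# which keeps raw line breaks out of the JSON string values.

echo $OUTPUT_STRING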

example_nagios.sh

#!/bin/bash

# Get the size and file count of /tmp
# and set them as perf variables.

TMPSIZE=$(du -s /tmp | awk '{ print $1 }')
TMPCOUNT=$(ls /tmp | wc -l)

# Create the parts of the output string
# and set them as variables.
# The Nagios output format looks like
#
# Some string of text | <'key'>=<value>, <'key'>=<value>
#
# For a command data source with data points datapoint1 and datapoint2
# the output would look like
#
# Data points 1 and 2 collected | 'datapoint1'=1234, 'datapoint2'=9876
#

# Note: du -s reports disk blocks (1 KiB each by default with GNU du), not bytes.
OUTPUT_STRING="/tmp check: Size = $TMPSIZE blocks, File Count = $TMPCOUNT"
OUTPUT_SEPARATOR=" | "
OUTPUT_PERFDATA="'tmpSize'=$TMPSIZE, 'tmpCount'=$TMPCOUNT"

# Assemble the parts of the output string.

echo $OUTPUT_STRING$OUTPUT_SEPARATOR$OUTPUT_PERFDATA
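
Before binding a template, it can help to eyeball what a script emits. The sketch below splits the Nagios-style output shown above into one key=value pair per line. show_perfdata is a hypothetical helper for manual checks only; it is not part of Resource Manager and does not reproduce the behavior of its parsers.

```shell
#!/bin/bash
# show_perfdata is a hypothetical helper for inspecting script output by hand.

# show_perfdata LINE
# Prints one key=value pair per line from Nagios-style output of the form
#   Some string of text | 'key'=value, 'key'=value
# Returns 1 when the line has no "|" separator.
show_perfdata() {
    local line=$1 perfdata

    # Output without a "|" separator carries no performance data.
    case "$line" in
        *"|"*) ;;
        *) return 1 ;;
    esac

    # Keep everything after the first "|".
    perfdata=${line#*|}

    # Split on commas, then strip spaces and single quotes.
    echo "$perfdata" | tr ',' '\n' | tr -d " '"
}
```

For example, show_perfdata "$(/home/zenmonitor/scripts/example_nagios.sh)" would list the tmpSize and tmpCount pairs, assuming the script is in the location used in the procedure.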