Projecting other aggregates

You can project any numeric metric into the future by defining the number of days of past data to analyze along with the number of days into the future you want to predict. For example, you want to predict the next 30 days of an nth percentile calculation based on the last 60 days of data. This prediction returns a value for every day for the next 30 days. In order to accomplish this, a table (projection_config) is defined in the reporting database for the projection configuration of the future values.

Projections are performed using a polynomial function, which is beneficial when data follows a curved pattern toward a maximum or minimum value, such as percentage of disk storage used over time.

The projection_config table has the following form:

name - Name to identify the projection; e.g., bangalore_60_30, austin_95th.
description - Up to 255 characters to describe the projection.
nth_percentile_config_id - ID of the row in the nth_percentile_config table on which to base a projection (only used for nth percentile projections.
days_past - Integer indicating the number of past days to use in making the calculation.
days_future - Integer indicating the number of days into the future to make projections.
agg_column_name - Aggregation column name.

Projections run each day at the completion of aggregation and n th percentile processing. Resultant tables from projection computations have the same structure as aggregation tables with the addition of a column with the timestamp of when the projection was performed. For each aggregate table to be projected, the name of the table from which to read data is constructed as follows:

daily_<business_hours_config.name, if any>_<meta_metric.metric_name>

The name of the table to which to write data is constructed as follows:

proj_<projection_config.name><business_hours_config.name, if any><meta_metric.metric_name>

Each projection row may have one or more associated job filters and metric filters. A job filter reduces the device sample set by allowing you to create a device group and assigning devices to that group. Several job filters may be defined for a given projection. Each filter is additive and increases the number of the devices included in the sample set. If no job filters are defined, then all devices are included in the sample set.

In addition to the group filters, you may also define and associate metric filters which restrict the metrics processed in the projection. If no metric filters are defined, all metrics are processed. Each metric processed is run in a separate thread. The projection calculations use data from the raw_v2 tables. To accommodate the maximum flexibility in which data can be used, the metric name in this context is the name of the daily table without the preceding " daily_ ". For normal aggregated metrics, this equates to the metric name (e.g., cpu__pct). For specially aggregated (computed) data, such as business hours or n th percentile, the metric name includes the computed name (such as austin_cpu__pct).

Inserting records for daily aggregation projections

Perform the following steps to insert records into the reporting database. The code in this procedure is for demonstration purposes only. This example assumes that two data centers (Austin and Bangalore) already defined and the metrics you want to filter on are called analytics_test1 and analytics_test2.

Log in to the Analytics reporting database as the root user or a user with administrative permissions.
```
mysql -u root reporting
```

Enter the following command at the prompt substituting values that apply to your company's situation:

insert into projection_config
   (name, days_past, days_future, agg_column_name)
values
   ('austin_60_30', 60, 30, 'fct_avg'),
   ('bangalore_60_30', 60, 30, 'fct_avg'),

insert into projection_business_hours_assoc
   (projection_config_id, business_hours_config_id)
values
   (
      (select id from projection_config where name='austin_60_30'),
      (select id from business_hours_config where name='austin')
   ),
   (
      (select id from projection_config where name='bangalore_60_30'),
      (select id from business_hours_config where name='bangalore')
   );

insert into projection_group_assoc
   (projection_config_id, group_filter_id)
values
   (
      (select id from projection_config where name='austin_60_30'),
      (select id from group_filter where name='austin')
   ),
   (
      (select id from projection_config where name='bangalore_60_30'),
      (select id from group_filter where name='bangalore')
   );

insert into projection_metric_assoc
   (projection_config_id, metric_key)
values
   (
      (select id from projection_config where name='austin_60_30'),
      (select metric_key from meta_metric where metric_name='analytics_test1')
   ),
   (
      (select id from projection_config where name='austin_60_30'),
      (select metric_key from meta_metric where metric_name='analytics_test2')
   ),
   (
      (select id from projection_config where name='bangalore_60_30'),
      (select metric_key from meta_metric where metric_name='analytics_test1')
   ),
   (
      (select id from projection_config where name='bangalore_60_30'),
      (select metric_key from meta_metric where metric_name='analytics_test2')
   );

Inserting records for nth percentile projections

Similar to the previous section, you can insert records that allow n th percentile calculations. The following is a sample insertion to the projection_config table for a 95th percentile projection:

insert into projection_config
   (name, nth_percentile_config_id, days_past, days_future, agg_column_name)
values
   ('austin_95th', 1, 60, 30, 'fct_percentile')

where 1 is the ID of the row in the nth_percentile_config table where the Austin 95th percentile calculation is defined.