NAME

orca - Make HTML & PNG plots of daily, weekly, monthly & yearly data


SYNOPSIS

  orca [-gifs] [-o] [-r] [-v [-v [-v]]] configuration_file


DESCRIPTION

Orca is a tool for plotting arbitrary data from text files into a directory on a Web server. It has the following features:

  * Configuration file based.
  * Reads white space separated data files.
  * Watches data files for updates and sleeps between reads.
  * Finds new files at specified times.
  * Remembers the last modification times for files so they do not have to
    be reread continuously.
  * Can plot the same type of data from different files into different
    or the same PNGs.
  * Different plots can be created based on the filename.
  * Parses the date from the text files.
  * Create arbitrary plots of data from different columns.
  * Ignore columns or use the same column in many plots.
  * Add or remove columns from plots without having to delete RRDs.
  * Plot the results of arbitrary Perl expressions, including mathematical
    ones, using one or more columns.
  * Group multiple columns into a single plot using regular expressions on
    the column titles.
  * Creates an HTML tree of HTML files and PNG plots.
  * Creates an index of URL links listing all available targets.
  * Creates an index of URL links listing all different plot types.
  * No separate CGI set up required.
  * Can be run under cron or it can sleep itself waiting for file updates
    based on when the file was last updated.

Orca is similar to, but substantially different from, other tools that record and display hourly, daily, monthly, and yearly data, such as MRTG and Cricket. For more information on these tools, see

  http://ee-staff.ethz.ch/~oetiker/webtools/mrtg/mrtg.html

and

  http://www.munitions.com/~jra/cricket/


EXAMPLES

A static example of Orca is at

  http://www.orcaware.com/orca/orca-example/

Please inform me of any other sites using Orca and I will include them here.


COMMAND LINE OPTIONS

Orca has only four command line options. They are:

-gifs: Generate GIFs instead of PNGs. You probably do not want to use this option, since PNGs are 1/3 the size of GIFs and take less time to generate.

-o: Once. This tells Orca to go through the steps of finding files, updating the RRDs, updating the PNGs, and creating the HTML files once. Normally, Orca loops continuously looking for new and updated files.

-r: RRD only. Have Orca only update its RRD files. Do not generate any HTML or PNG files. This is useful if you are loading in a large amount of data in several invocations of Orca and do not want to create the HTML and PNG files in each run since it is time consuming.

-v: Verbose. Have Orca print more verbose messages. Each additional -v on the command line produces more messages; Orca ignores any -v beyond the third.

After the command line options are listed, Orca takes one more argument which is the name of the configuration file to use. Sample configuration files can be found in the sample_configs directory with the distribution of this tool.


RECOGNIZED SIGNALS

When Orca receives a HUP signal, it will look for new source data files the next time it runs through the main loop. If you have a constantly running Orca, this is a simpler and faster solution than restarting Orca, which takes time to reread all the source files.


ARCHITECTURE ISSUES

Because Orca is extremely IO intensive, I recommend that the machine that runs Orca be the same host that locally mounts the RRD data files. The HTML and image files that Orca creates also require a good amount of IO. The machine running Orca should always have the rrd_dir directory locally mounted; for performance, it is more important that rrd_dir be stored locally than html_dir. The html_dir and rrd_dir options are described in more detail below.


INSTALLATION AND CONFIGURATION

The first step in using Orca is to set up a configuration file that instructs Orca on what to do. The configuration file is based on a key/value pair structure. The key name must start at the beginning of a line. Lines that begin with whitespace are concatenated onto the last key's value.
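As a rough illustration of the key/value rules above, the following Python sketch folds continuation lines into the previous key's value. Orca itself is written in Perl and parse_config is a hypothetical name; the sketch only handles flat options (not group { } or plot { } blocks) and skips blank lines as an assumption.

```python
def parse_config(text):
    """Parse flat key/value lines into a list of (key, value) pairs.

    A key starts at the beginning of a line; a line that begins with
    whitespace is appended to the previous key's value. Blank lines are
    skipped. group { } and plot { } blocks are not handled here.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            if not pairs:
                raise ValueError("continuation line before any key: %r" % line)
            key, value = pairs[-1]
            pairs[-1] = (key, (value + " " + line.strip()).strip())
        else:
            parts = line.split(None, 1)
            pairs.append((parts[0], parts[1].strip() if len(parts) > 1 else ""))
    return pairs


config = """\
state_file  orca.state
warn_email  admin@example.com
            oncall@example.com
html_dir    /www/orca
"""
print(parse_config(config))
```

A list of pairs is used rather than a dictionary so that options which may repeat, such as data and legend in a plot, keep their order.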

There are three main groups of options in an Orca configuration file: general options, file specific options, and plot specific options. General options may be used by the file and plot specific options. A required general option is placed only once in the configuration file.

General options break down into two main groups, required and optional. These are the required options:

Required General Options

state_file filename
For Orca to work efficiently, it saves the last modification time of all input data files and the Unix epoch time when they were last read by Orca into a state file. The value for state_file must be a valid, writable filename. If filename does not begin with a / and the base_dir option was set, then the base_dir directory will be prepended to the filename.

Each entry for a data input file is roughly 100 bytes, so for small sites, this file will not be large.

html_dir directory
html_dir specifies the root directory for the main index.html and all underlying HTML and PNG files that Orca generates. This should not be a directory that normal users will edit. Ideally this directory should be on a disk locally attached to the host running Orca, but this is not necessary.

If directory does not begin with a / and the base_dir option was set, then the base_dir directory will be prepended to directory.

rrd_dir directory
rrd_dir specifies the root directory for the location of the RRD data files that Orca generates. For best performance, this directory should be on a disk locally attached to the host running Orca. Otherwise, the many IO operations that Orca performs will be greatly slowed down. For performance, it is more important that rrd_dir be stored locally than html_dir.

If directory does not begin with a / and the base_dir option was set, then the base_dir directory will be prepended to directory.

If rrd_dir is not defined, then base_dir will be used as rrd_dir. Orca will quit with an error if neither rrd_dir nor base_dir is set.

base_dir directory
If base_dir is set, it is prepended to any file or directory name that does not begin with /. These names are currently state_file, html_dir, rrd_dir, and the find_files option in the group options.
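The base_dir rules for state_file, html_dir, rrd_dir and find_files, together with the rrd_dir fallback described under rrd_dir, can be sketched in Python as follows. The function names are hypothetical, and this models the documented behavior rather than reproducing Orca's code.

```python
import posixpath


def resolve(name, base_dir=None):
    """Prepend base_dir to a name that does not begin with '/'."""
    if base_dir and not name.startswith("/"):
        return posixpath.join(base_dir, name)
    return name


def effective_rrd_dir(options):
    """rrd_dir if set (resolved against base_dir), else base_dir, else an error."""
    base_dir = options.get("base_dir")
    rrd_dir = options.get("rrd_dir")
    if rrd_dir is not None:
        return resolve(rrd_dir, base_dir)
    if base_dir is not None:
        return base_dir
    # Orca quits in this case; the sketch raises instead.
    raise ValueError("neither rrd_dir nor base_dir is set")
```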

Optional General Options

late_interval Perl expression
late_interval is used to calculate the time interval between a file's last modification time and the time when that file is considered to be late for an update. When a file is late, an email message may be sent to the warn_email addresses. Because different input files may be updated at different rates, late_interval takes an arbitrary Perl expression, including mathematical expressions, as its argument. If the word interval occurs in the expression, it is replaced with the sampling interval of the input data file in question.

This is useful for allowing the data files to update somewhat later than they would in an ideal world. For example, to add a 10% overhead to the sampling interval before an input file is considered late, use

  late_interval 1.1 * interval

By default, the input file's sampling interval is used as the late_interval.
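A minimal Python sketch of how late_interval might be applied, assuming a file is late once more than late_interval seconds have passed since its last modification (whether Orca compares with > or >= is not stated). Orca evaluates a Perl expression; this sketch substitutes the word interval and evaluates a Python arithmetic expression instead. The function names are hypothetical.

```python
import re


def late_interval(expr, interval):
    """Seconds after a file's mtime before it counts as late."""
    if expr is None:
        return float(interval)  # default: the file's sampling interval
    # Replace the whole word "interval" with the sampling interval.
    code = re.sub(r"\binterval\b", str(interval), expr)
    # Python eval stands in for Orca's Perl eval; no builtins are exposed.
    return float(eval(code, {"__builtins__": {}}, {}))


def is_late(mtime, now, interval, expr=None):
    """True if the file has not been updated within late_interval."""
    return now - mtime > late_interval(expr, interval)


# A file sampled every 300 seconds with "late_interval 1.1 * interval"
# is considered late about 330 seconds after its last update.
print(is_late(mtime=0, now=331, interval=300, expr="1.1 * interval"))
```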

warn_email email_address [email_address ...]
warn_email takes a list of email addresses of people to email when something goes wrong with either Orca or the input data files. Currently, email messages are sent out in the following circumstances:
  1) When a file that existed is now gone.
  2) When a file was being updated regularly and is no longer updated.

By default, nobody is emailed.

expire_images 1
If expire_images is set, then .meta files will be created for all generated PNG files. If the Apache web server 1.3.2 or greater is being used, then the following modifications must be made to srm.conf or httpd.conf, replacing the commented-out #MetaDir .web and #MetaSuffix .meta lines:

  MetaFiles on
  MetaDir .
  MetaSuffix .meta

By default, expiration of images is not enabled.
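For reference, mod_cern_meta (the Apache module that provides MetaFiles, MetaDir and MetaSuffix) looks for the extra headers of dir/image.png in dir/<MetaDir>/image.png<MetaSuffix>, so with the settings above the file is dir/image.png.meta. The following Python sketch builds and writes such a file. That the file holds a single Expires header, and the choice of expiry time, are assumptions for illustration; the function names and paths are hypothetical.

```python
from email.utils import formatdate


def meta_file(png_path, expires_at):
    """Return (meta_path, contents) for an image that expires at expires_at.

    With "MetaDir ." and "MetaSuffix .meta" the meta file sits next to the
    image with ".meta" appended. expires_at is a Unix epoch time.
    """
    header = "Expires: " + formatdate(expires_at, usegmt=True) + "\n"
    return png_path + ".meta", header


def write_meta_file(png_path, expires_at):
    """Write the meta file to disk and return its path."""
    meta_path, contents = meta_file(png_path, expires_at)
    with open(meta_path, "w") as fh:
        fh.write(contents)
    return meta_path
```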

find_times hours:minutes [hours:minutes ...]
The find_times option tells Orca when to go and find new files. This is particularly useful when new input data files are created at midnight. In this case, something like
  find_times 0:10

would work.

By default, files are only searched for when Orca starts up.
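The following Python sketch shows one way to compute the next time to search for files from a list of find_times entries, rolling over to the next day when a time has already passed today. It ignores daylight saving changes, and next_find_time is a hypothetical name rather than part of Orca.

```python
from datetime import datetime, timedelta


def next_find_time(find_times, now):
    """Return the next datetime at which to look for new files.

    find_times holds "hours:minutes" strings as given to the option.
    """
    candidates = []
    for spec in find_times:
        hours, minutes = (int(part) for part in spec.split(":"))
        at = now.replace(hour=hours, minute=minutes, second=0, microsecond=0)
        if at <= now:
            at += timedelta(days=1)  # already passed today, so use tomorrow
        candidates.append(at)
    return min(candidates)


# With "find_times 0:10", a run at 23:00 next searches at 00:10 tomorrow.
print(next_find_time(["0:10"], datetime(2000, 2, 14, 23, 0)))
```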

html_top_title text ...
The text is placed at the top of the main index.html that Orca creates. By default, no additional text is placed at the top of the main index.html.

html_page_header text ...
The text is placed at the top of each HTML file that Orca creates. By default, no additional text is placed at the top of each HTML file.

html_page_footer text ...
The text is placed at the bottom of each HTML file that Orca creates. By default, no additional text is placed at the bottom of each HTML file.

sub_dir directory
In certain cases Orca will not create subdirectories for the different groups of files that it processes. If you wish to force Orca to create subdirectories, then add
  sub_dir 1

Group Options

The next step in configuring Orca is telling it where to find the files to use as input, what the columns of data in each file contain, the interval at which the files are updated, and where the measurement time is stored in the file. This information is stored in a group.

A generic example of a group and its options is:

  group GROUP_NAME1 {
  find_files            filename1 filename2 ...
  column_description    column1_name column2_name ...
  date_source           file_mtime
  interval              300
  .
  .
  .
  }
  group GROUP_NAME2 {
  .
  .
  }

The key for a group, in this example GROUP_NAME1 and GROUP_NAME2, is a descriptive name that must be unique among all groups and is used later when the plots to create are defined. Files that share the same general format of column data may be grouped together. The options for a particular group must be enclosed in curly braces {}. An unlimited number of groups may be listed.

Required Group Options

find_files path|regexp [path|regexp ...]
The find_files option tells Orca what data files to use as its input. The arguments to find_files may be a simple filename, a complete path to a filename, or a regular expression to find files. The regular expression match is not the normal shell globbing that the Bourne shell, C shell or other shells use. Rather, Orca uses Perl regular expressions to find files. For example:
  find_files /data/source1 /data/source2

will have Orca use /data/source1 and /data/source2 as the inputs to Orca. This could have also been written as

  find_files /data/source\d

and both data files will be used.

In the two above examples, Orca will assume that both data files represent data from the same source. If this is not the case, such as when source1 is data from one place and source2 is data from another place, then Orca needs to be told to treat the data from each file as distinct data sources. This can be accomplished in two ways. The first is by creating another group { ... } set. However, this requires copying all of the text and makes maintenance of the configuration file complex. The second and recommended approach is to place ()'s around parts of the regular expression to tell Orca how to distinguish the two data files:

  find_files /data/(source\d)

This creates two subgroups, one named source1 and the other named source2, which will be plotted separately. One more example:

  find_files /data/solaris.*/(.*)/percol-\d{4}-\d{2}-\d{2}(?:\.(?:Z|gz|bz2))?

will use files of the form

  /data/solaris-2.6/olympia/percol-1998-12-01
  /data/solaris-2.6/olympia/percol-1998-12-02.Z
  /data/solaris-2.5.1/sunridge/percol-1998-12-01.gz
  /data/solaris-2.5.1/sunridge/percol-1998-12-02

and treat the files in the olympia and sunridge directories as distinct, but the files within each directory as from the same data source.

You'll notice that all but the first () has the form (?:...). This tells Perl to match the expression but not save the matched text in the $1, $2, etc. variables. Orca uses the matched text to generate a subgroup name, which is used to place files into different subgroups. Here, only the hostname should be used to generate a subgroup name, hence all the (?:...) for matching anything else.
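To make the subgroup rule concrete, here is a Python sketch that sorts candidate paths into subgroups using the text captured by the ()'s. Python's re module accepts this particular Perl pattern unchanged. Requiring the whole path to match, joining several captures with _, and the fallback name default are assumptions of the sketch, not documented Orca behavior; group_files is a hypothetical name.

```python
import re


def group_files(pattern, paths):
    """Sort paths that match a find_files regexp into subgroups."""
    regex = re.compile(pattern)
    subgroups = {}
    for path in paths:
        match = regex.fullmatch(path)  # assumption: the whole path must match
        if match is None:
            continue
        # Only capturing ()'s contribute; (?:...) groups are not in groups().
        name = "_".join(g for g in match.groups() if g) or "default"
        subgroups.setdefault(name, []).append(path)
    return subgroups


pattern = r"/data/solaris.*/(.*)/percol-\d{4}-\d{2}-\d{2}(?:\.(?:Z|gz|bz2))?"
paths = [
    "/data/solaris-2.6/olympia/percol-1998-12-01",
    "/data/solaris-2.6/olympia/percol-1998-12-02.Z",
    "/data/solaris-2.5.1/sunridge/percol-1998-12-01.gz",
    "/data/solaris-2.5.1/sunridge/percol-1998-12-02",
    "/data/solaris-2.6/olympia/notes.txt",
]
print(group_files(pattern, paths))
```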

If any of the paths or regular expressions given to find_files do not begin with a / and the base_dir option was set, then the base_dir directory will be prepended to the path or regular expression.

interval seconds
The interval option takes the number of seconds between updates for the input data files listed in this group.

column_description column_name [column_name ...]
column_description first_line
For Orca to plot the data, it needs to be told what each column of data holds. This is accomplished by creating a text description for each column. There are two ways this may be loaded into Orca. If the input data files for a group do not change, then the column names can be listed after column_description:
  column_description date in_packets/s out_packets/s

Files that have a column description as the first line of the file may use the argument ``first_line'' to column_description:

  column_description first_line

This tells Orca to read the first line of every input data file for the column description. Orca can handle different files in the same group that have different numbers of columns and different column descriptions. The only limitation is that column descriptions are white space separated, so no spaces are allowed within a column description.

date_source column_name column_name
date_source file_mtime
The date_source option tells Orca where the time and date of the measurement are located. The first form of the date_source option lists the column name, as given to column_description, that contains the Unix epoch time. The second form, with the file_mtime argument, tells Orca that the date and time for any new data in the file is the last modification time of the file.
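A small Python sketch of how column_description and date_source fit together. The column names come either from the option itself or from the first line of the file, each later line becomes a mapping from column name to value, and the measurement time comes either from a named column holding Unix epoch time or from the file's modification time. The function and column names are hypothetical.

```python
def read_columns(lines, column_description="first_line"):
    """Turn white space separated data lines into dicts keyed by column name."""
    lines = iter(lines)
    if column_description == "first_line":
        names = next(lines).split()  # header line supplies the column names
    else:
        names = column_description.split()
    rows = []
    for line in lines:
        fields = line.split()
        if fields:
            # zip() silently drops extra fields or names on a length mismatch.
            rows.append(dict(zip(names, fields)))
    return rows


def sample_time(row, date_source, file_mtime):
    """Measurement time: an epoch-time column, or the file's mtime."""
    if date_source == "file_mtime":
        return file_mtime
    return int(row[date_source])


rows = read_columns(["timestamp in_packets/s out_packets/s", "944000000 12 7"])
print(rows, sample_time(rows[0], "timestamp", 0))
```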

date_format string
The date_format option is only required if the column_name argument to date_source is used. Currently, this option is not used by Orca.

Optional Group Options

filename_compare Perl subroutine
The filename_compare option is used to sort the found filenames in a particular group. This function must be written as though it were being passed to the Perl sort() function, which takes the two items to compare in the package global $a and $b variables instead of the @_ array.

Using this option also tells Orca when it can flush data to the RRD files. Orca decides this by comparing the previously loaded filename to the filename about to be loaded using the filename_compare function. If the result of the comparison is greater than 1, then the data is flushed; if it is equal to or less than 1, then the data is not flushed. Orca uses a threshold of 1 instead of 0 since there are cases when the filenames should still be ordered but not flushed.

For example, the orcallator.cfg file uses the following subroutine for filenames of the form ``orcallator-2000-02-14'':

  sub {
    my ($ay, $am, $ad) = $a =~ /-(\d{4})-(\d\d)-(\d\d)/;
    my ($by, $bm, $bd) = $b =~ /-(\d{4})-(\d\d)-(\d\d)/;
    if (my $c = (( $ay       <=>  $by) ||
                 ( $am       <=>  $bm) ||
                 (($ad >> 3) <=> ($bd >> 3)))) {
      return 2*$c;
    }
    $ad <=> $bd;
  }

When Orca is about to load a new data file, it compares the new filename with the previous name. Using this function, if the year or month is different, then the data gets flushed. If these two are equal but the day divided by 8 is different, then the data also gets flushed. So loading orcallator-2000-02-14 followed by orcallator-2000-02-15 will not cause a flush, but when orcallator-2000-02-16 is about to be loaded, the previously loaded data will be flushed.

If the filename_compare option is not used, then the filenames are sorted using the Perl <=> operator and data is not flushed until all of it is loaded.
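The orcallator.cfg comparison can be ported to Python to check the flush behavior described above. The functions are hypothetical, and the argument order in would_flush (new filename first, as $a) is an assumption chosen so that loading files in ascending date order gives a result greater than 1 at a flush boundary.

```python
import re
from functools import cmp_to_key

DATE = re.compile(r"-(\d{4})-(\d\d)-(\d\d)")


def spaceship(x, y):
    """Perl's <=> operator: -1, 0 or 1."""
    return (x > y) - (x < y)


def filename_compare(a, b):
    """Python port of the orcallator.cfg filename_compare subroutine."""
    ay, am, ad = (int(v) for v in DATE.search(a).groups())
    by, bm, bd = (int(v) for v in DATE.search(b).groups())
    # Year, month, or day >> 3 (day divided by 8) differ: return +/-2.
    c = (spaceship(ay, by) or spaceship(am, bm)
         or spaceship(ad >> 3, bd >> 3))
    if c:
        return 2 * c
    return spaceship(ad, bd)  # same 8-day block: order only, no flush


def would_flush(previous, new):
    # Assumption: the new filename is passed as $a.
    return filename_compare(new, previous) > 1


names = ["orcallator-2000-02-16", "orcallator-2000-02-14", "orcallator-2000-02-15"]
print(sorted(names, key=cmp_to_key(filename_compare)))
```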

reopen 1
Using the reopen option for a group instructs Orca to close and reopen any input data files when there is new data to be read. This is of most use when an input data file is erased and rewritten by some other process.

Plot Options

The final step is to tell Orca what plots to create and how to create them. The general format for creating a plot is:

  plot {
  title         Plot title
  source        GROUP_NAME1
  data          column_name1
  data          1024 * column_name2 + column_name3
  legend        First column
  legend        Some math
  y_legend      Counts/sec
  data_min      0
  data_max      100
  .
  .
  }

Unlike the group, there is no key for generating a plot. An unlimited number of plots can be created.

Some plot options perform a substitution when they contain the two characters %g or %G, replacing them with the group name from the find_files ()'s matching. %g gets replaced with the exact text matched by the ()'s and %G gets replaced with the same text with its first character capitalized. For example, if

  find_files /(olympia)/data

was used to locate a file, then %g will be replaced with olympia and %G replaced with Olympia. This substitution is performed on the title and legend plot options.
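The substitution itself is simple; a Python sketch (the function name is hypothetical) that capitalizes only the first character and leaves the rest of the matched text alone:

```python
def expand_group_name(text, group):
    """Replace %g with the matched group name and %G with it capitalized."""
    capitalized = group[:1].upper() + group[1:]
    return text.replace("%g", group).replace("%G", capitalized)


print(expand_group_name("%G CPU Usage (%g)", "olympia"))
```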

Required Plot Options

source group_name
The source argument should be a single group name from which data will be plotted. Currently, only data from a single group may be put into a single plot.

data Perl expression
data regular expression
The data plot option tells Orca which data sources to place in a single PNG plot. At least one data option is required for a particular plot, and as many as needed may be placed into a single plot.

Two forms of arguments to data are allowed. The first form allows arbitrary Perl expressions, including mathematical expressions, that result in a number as a data source to plot. The expression may contain the names of the columns as found in the group given to the source option. The column names must be separated with white space from any other characters in the expression. For example, if you have the number of bytes per second input and output and you want to plot the total number of bits per second, you could do this:

  plot {
  source        bytes_per_second
  data          8 * ( in_bytes_per_second + out_bytes_per_second )
  }
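The reason column names must be separated by white space can be seen in this Python sketch, which splits the expression on white space, replaces each token that is a column name with its value, and evaluates the result. Orca compiles Perl code; Python eval stands in for it here, and eval_data is a hypothetical name.

```python
def eval_data(expr, row):
    """Evaluate a data expression against one row of column values.

    Column names are found by splitting on white space, so "a+b" would be
    a single token that matches no column.
    """
    tokens = []
    for token in expr.split():
        if token in row:
            tokens.append(repr(float(row[token])))  # substitute the value
        else:
            tokens.append(token)  # operator, number or parenthesis
    return eval(" ".join(tokens), {"__builtins__": {}}, {})


row = {"in_bytes_per_second": "100", "out_bytes_per_second": "50"}
print(eval_data("8 * ( in_bytes_per_second + out_bytes_per_second )", row))
```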

The second form allows matching column names against a regular expression and plotting all of the columns that match in a single plot. To tell Orca that a regular expression is being used, give data only a single argument containing no white space. In addition, the argument must contain at least one set of parentheses ()'s. When a regular expression matches a column name, the portion of the match in the ()'s is placed into the normal Perl $1, $2, etc. variables. Take the following configuration for example:

  group throughput {
  find_files /data/solaris.*/(.*)/percol-\d{4}-\d{2}-\d{2}
  column_description hme0Ipkt/s hme0Opkt/s
                     hme1Ipkt/s hme1Opkt/s
                     hme0InKB/s hme0OuKB/s
                     hme1InKB/s hme1OuKB/s
                     hme0IErr/s hme0OErr/s
                     hme1IErr/s hme1OErr/s
  .
  .  
  }
  plot {
  source        throughput
  data          (.*\d)Ipkt/s
  data          $1Opkt/s
  .
  .
  }
  plot {
  source        throughput
  data          (.*\d)InKB/s
  data          $1OuKB/s
  .
  .
  }
  plot {
  source        throughput
  data          (.*\d)IErr/s
  data          $1OErr/s
  .
  .
  }

If the following data files are found by Orca

  /data/solaris-2.6/olympia/percol-1998-12-01
  /data/solaris-2.6/olympia/percol-1998-12-02
  /data/solaris-2.5.1/sunridge/percol-1998-12-01
  /data/solaris-2.5.1/sunridge/percol-1998-12-02

then separate plots will be created for olympia and sunridge, with each plot containing the input and output number of packets per second.
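The matching step can be sketched in Python: each column that matches the first data regular expression starts a plot, and $1, $2, etc. in the following data options are replaced with the captured text. Anchoring the match to the whole column name, and the name expand_regexp_data, are assumptions of the sketch.

```python
import re


def expand_regexp_data(columns, first, rest):
    """Pair each column matching `first` with the columns named by `rest`."""
    regex = re.compile(first)
    plots = []
    for column in columns:
        match = regex.fullmatch(column)
        if match is None:
            continue
        names = [column]
        for template in rest:
            # Replace $1, $2, ... with the text captured by the ()'s.
            names.append(re.sub(r"\$(\d)",
                                lambda m: match.group(int(m.group(1))),
                                template))
        plots.append(names)
    return plots


columns = ["hme0Ipkt/s", "hme0Opkt/s", "hme1Ipkt/s", "hme1Opkt/s",
           "hme0InKB/s", "hme0OuKB/s"]
print(expand_regexp_data(columns, r"(.*\d)Ipkt/s", ["$1Opkt/s"]))
```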

By default, when Orca finds a plot set with a regular expression match, it will only find one match, and then go on to the next plot set. After it reaches the last plot set, it will go back to the first plot set with a regular expression match and look for the next data that matches the regular expression. The net result of this is that the generated HTML files using the above configuration will have links in this order:

  hme0 Input & Output Packets per Second
  hme0 Input & Output Kilobytes per Second
  hme0 Input & Output Errors per Second
  hme1 Input & Output Packets per Second
  hme1 Input & Output Kilobytes per Second
  hme1 Input & Output Errors per Second

If you wanted to have the links listed in order of hme0 and hme1, then you would add the flush_regexps option to tell Orca to find all regular expression matches for a particular plot set and all plot sets before the plot set containing flush_regexps before continuing on to the next plot set. For example, if

  flush_regexps 1

were added to the plot set for InKB/s and OuKB/s, then the order would be

  hme0 Input & Output Packets per Second
  hme0 Input & Output Kilobytes per Second
  hme1 Input & Output Packets per Second
  hme1 Input & Output Kilobytes per Second
  hme0 Input & Output Errors per Second
  hme1 Input & Output Errors per Second

If you wanted to have all of the plots be listed in order of the type of data being plotted, then you would add ``flush_regexps 1'' to all the plot sets and the order would be

  hme0 Input & Output Packets per Second
  hme1 Input & Output Packets per Second
  hme0 Input & Output Kilobytes per Second
  hme1 Input & Output Kilobytes per Second
  hme0 Input & Output Errors per Second
  hme1 Input & Output Errors per Second
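The three orderings above can be reproduced by a simple Python model: plot sets up to and including one with flush_regexps form a segment, and within a segment Orca takes one match from each plot set in turn. This is a hypothetical model that agrees with the documented examples, not Orca's actual code.

```python
from itertools import zip_longest


def link_order(plot_sets):
    """Order of plot links for regexp-based plot sets.

    plot_sets is a list of (label, matches, flush_regexps) tuples, where
    matches lists the regexp matches of that plot set in column order.
    """
    order = []

    def drain(segment):
        # Round-robin: take one match from each plot set per pass.
        columns = [[f"{m} {label}" for m in matches] for label, matches in segment]
        for row in zip_longest(*columns):
            order.extend(item for item in row if item is not None)

    segment = []
    for label, matches, flush in plot_sets:
        segment.append((label, matches))
        if flush:
            drain(segment)  # finish every pending plot set before going on
            segment = []
    drain(segment)
    return order


ifs = ["hme0", "hme1"]
print(link_order([("Packets", ifs, False), ("Kilobytes", ifs, True),
                  ("Errors", ifs, False)]))
```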

Data Source Optional Plot Options

The following plot options are optional. Like the data option, multiple copies of these may be specified. The first option of a particular type applies to the first data option, the second to the second data option, and so on.

data_type type
When defining data types, Orca uses the same data types as provided by RRD. These are (a direct quote from the RRDcreate manual page):
GAUGE
is for things like temperatures or number of people in a room or value of a RedHat share.

COUNTER
is for continuous incrementing counters like the InOctets counter in a router. The COUNTER data source assumes that the counter never decreases, except when a counter overflows. The update function takes the overflow into account. The counter is stored as a per-second rate. When the counter overflows, RRDtool checks if the overflow happened at the 32bit or 64bit border and acts accordingly by adding an appropriate value to the result.

DERIVE
will store the derivative of the line going from the last to the current value of the data source. This can be useful for gauges, for example, to measure the rate of people entering or leaving a room. Internally, derive works exactly like COUNTER but without overflow checks. So if your counter does not reset at 32 or 64 bit you might want to use DERIVE and combine it with a MIN value of 0.

ABSOLUTE
is for counters which get reset upon reading. This is used for fast counters which tend to overflow. So instead of reading them normally you reset them after every read to make sure you have a maximal time available before the next overflow.

If the data_type is not specified for a data option, it defaults to GAUGE.
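To make the differences concrete, the following Python sketch computes the per-second value each data type would yield from two consecutive readings taken a given number of seconds apart. It ignores RRD heartbeats, min/max limits and interpolation, and rate is a hypothetical name; the COUNTER branch follows the 32-bit then 64-bit wrap handling described in the quote.

```python
def rate(data_type, previous, current, seconds):
    """Per-second value RRD would derive from two consecutive readings."""
    if data_type not in ("GAUGE", "COUNTER", "DERIVE", "ABSOLUTE"):
        raise ValueError("unknown data_type: %s" % data_type)
    if data_type == "GAUGE":
        return float(current)  # stored as-is
    if data_type == "ABSOLUTE":
        return current / seconds  # counter was reset after the last read
    delta = current - previous
    if data_type == "COUNTER" and delta < 0:
        delta += 2**32  # assume a wrap at the 32-bit border first
        if delta < 0:
            delta += 2**64 - 2**32  # otherwise a wrap at the 64-bit border
    return delta / seconds  # COUNTER, or DERIVE (which may be negative)


print(rate("COUNTER", 2**32 - 10, 20, 10))
```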

data_min number
data_max number
data_min and data_max are optional entries defining the expected range of the supplied data. If data_min and/or data_max are defined, any value outside the defined range will be regarded as *UNKNOWN*.

If you want to specify the second data source's minimum and maximum but do not want to limit the first data source, then set the first data source's numbers to U. For example:

  plot {
  data          column1
  data          column2
  data_min      U
  data_max      U
  data_min      0
  data_max      100
  }
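The positional pairing of data_min and data_max with data options can be modeled in Python as below, with None standing in for *UNKNOWN*. In Orca these limits most likely end up as the RRD data source minimum and maximum, which RRD enforces; this sketch just applies them directly, and apply_limits is a hypothetical name.

```python
def apply_limits(values, data_min, data_max):
    """Apply positional data_min/data_max options to one sample per data source.

    The n-th data_min/data_max applies to the n-th data option; "U" (or a
    missing entry) means no limit. Out-of-range values become None.
    """
    result = []
    for i, value in enumerate(values):
        low = data_min[i] if i < len(data_min) else "U"
        high = data_max[i] if i < len(data_max) else "U"
        too_low = low != "U" and value < float(low)
        too_high = high != "U" and value > float(high)
        result.append(None if too_low or too_high else value)
    return result


# column1 is unlimited; column2 must lie between 0 and 100.
print(apply_limits([5000.0, 150.0], ["U", "0"], ["U", "100"]))
```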

color rrggbb
The optional color option specifies the color to use for a particular data source. The color should be of the form rrggbb in hexadecimal.

flush_regexps 1
Using the flush_regexps option tells Orca to make sure that the plot set including this option and all previous plot sets have matched all of the columns with their regular expressions. See the above description of using regular expressions in the data option for an example.

required 1
Because some of the input data files may not contain the column names that are listed in a particular plot, Orca provides two ways to handle missing data. By default, Orca will ignore a data item that does not exist, cannot be eval'ed, or returns invalid data. In this case, the plot may never be created. However, if a plot is required, then set the required flag for that plot by placing
  required 1

in the options for a particular plot. In this case, Orca will record a *UNKNOWN* value for all invalid data.

Plotting Options

base number
If memory is being plotted (and not network traffic) this value should be set to 1024 so that one Kb is 1024 bytes. For traffic measurements, 1 Kb/s is 1000 b/s. By default, a base of 1000 is used.
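How base affects the scaled numbers can be seen in this Python sketch, which divides by base until the value fits and picks a prefix. RRDtool does the real scaling when it draws the graph; scale and its prefix list are hypothetical.

```python
def scale(value, base=1000):
    """Divide value by base until it fits, returning (number, prefix)."""
    prefixes = ["", "k", "M", "G", "T"]
    index = 0
    while abs(value) >= base and index < len(prefixes) - 1:
        value /= base
        index += 1
    return value, prefixes[index]


# 3145728 bytes of memory with base 1024 is 3 M; 1500 b/s with base 1000 is 1.5 k.
print(scale(3145728, 1024), scale(1500))
```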

plot_width number
Using the plot_width option specifies how many pixels wide the drawing area inside the PNG is.

plot_height number
Using the plot_height option specifies how many pixels high the drawing area inside the PNG is.

plot_min number
The plot_min option sets the minimum value to be graphed. By default, this is auto-configured from the data you select with the graphing functions.

plot_max number
The plot_max option sets the maximum value to be graphed. By default, this is auto-configured from the data you select with the graphing functions.

rigid_min_max
Normally Orca will automatically expand the lower and upper limit if the graph contains a value outside the valid range. By setting the rigid_min_max option, this is disabled.

logarithmic
Normally Orca will use a linear scale for the Y axis. If a plot contains this option, then a logarithmic scale will be used.

title <text>
Setting the title option sets the title of the plot. If you place %g or %G in the title, it is replaced with the text matched by any ()'s in the group find_files option. %g gets replaced with the exact text matched by the ()'s and %G is replaced with the same text, except the first character is capitalized.

y_legend <text>
Setting y_legend sets the text to be displayed along the Y axis of the PNG plot.

Multiple Plot Plotting Options

The following options should be specified once for each data source in the plot.

line_type type
The line_type option specifies the type of line to plot a particular data set with. The available options are: LINE1, LINE2, and LINE3 which generate increasingly wide lines, AREA, which does the same as LINE? but fills the area between 0 and the graph with the specified color, and STACK, which does the same as LINE?, but the graph gets stacked on top of the previous LINE?, AREA, or STACK graph. Depending on the type of previous graph, the STACK will either be a LINE? or an AREA.

legend text
The legend option specifies for a single data source the comment that is placed below the PNG plot.


IMPLEMENTATION NOTES

Orca makes very heavy use of references to hashes and arrays to store all of the different data it uses.

The Digest::MD5 module is used to cache the result of some expensive calculations that could commonly be performed more than once. In particular, this arises when the same code is used to pull data from many different input data files into the same type of data structures. In this case, the code to be evaluated is run through MD5, and the resulting binary digest is used as a key in a hash whose value is the anonymous subroutine array. This saves both memory and processing time.
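The caching idea can be sketched in Python with hashlib.md5 and compile(): identical source text yields the same binary digest, so the expensive compile step runs only once. The names are hypothetical, and Orca caches Perl anonymous subroutines rather than Python code objects.

```python
import hashlib

_compiled = {}


def compile_cached(source):
    """Compile source once; reuse the result for identical source text."""
    key = hashlib.md5(source.encode("utf-8")).digest()  # binary digest as key
    code = _compiled.get(key)
    if code is None:
        code = compile(source, "<orca-expression>", "eval")
        _compiled[key] = code
    return code


print(eval(compile_cached("1 + 2")))
```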


MAILING LISTS

Four mailing lists exist for Orca. To subscribe to any of the mailing lists, please visit the URL below. You have the option of choosing a digest form of the mailing list if you wish it when you subscribe to the mailing list or anytime thereafter. To send email to any of these lists you must subscribe to the list.

orca-announce

  Subscribe http://www.onelist.com/subscribe/orca-announce
  Archive   http://www.onelist.com/archive/orca-announce

The orca-announce@onelist.com mailing list is a LOW volume moderated mailing list for announcing stable releases of Orca.

orca-users@onelist.com

  Subscribe http://www.onelist.com/subscribe/orca-users
  Archive   http://www.onelist.com/archive/orca-users

The orca-users@onelist.com mailing list is a first stop for getting help in setting up and getting Orca running. Problems relating to downloading, configuring, compiling the necessary Perl modules, and installing Orca belong here. People interested in anything more than this, such as developing data gathering modules or active Perl development, should be on one or both of the orca-discuss@onelist.com or orca-developers@onelist.com mailing lists. Once you get Orca running to your satisfaction, you may want to remove yourself from this list.

orca-discuss@onelist.com

  Subscribe http://www.onelist.com/subscribe/orca-discuss
  Archive   http://www.onelist.com/archive/orca-discuss

The orca-discuss@onelist.com mailing list is for active users of Orca who are doing new interesting things with Orca and want to discuss Orca but are not interested in actively developing Orca source code. These people are also not interested in helping people get Orca running on their systems.

orca-developers@onelist.com

  Subscribe http://www.onelist.com/subscribe/orca-developers
  Archive   http://www.onelist.com/archive/orca-developers

The orca-developers@onelist.com mailing list is for hackers of Orca who actually hack and improve Orca.


AUTHOR, COMMENTS, AND BUGS

Please direct all Orca comments and bugs to one of the above mailing lists.

If you wish to contact the author of Orca, Blair Zajac, directly, please email me at the Orca Users mailing list.