Advanced monitoring and customizable alerting

Monitoring and alerting

Monitoring

One of the basic necessities in operating and optimizing IT infrastructure is the ability to monitor its resources in as many aspects as possible. In traditional IT environments, monitoring software has to be purchased, deployed, and maintained, which inherently raises the total cost of ownership (TCO) of the infrastructure. With BitSwarm cloud, such a service is already part of the basic offer and can be turned on or off by the user at will. BitSwarm offers two versions of monitoring and alerting (MA):

  • Business MA
  • Enterprise MA

The differences between the two are described below in the Monitoring and alerting options chapter.

How it works

Basic parameters of your cloud servers are available at any time in the management console and cannot be turned off. These basic parameters include:

  • Total CPU usage
  • Internal bandwidth use
  • External bandwidth use

Values of extra parameters can also be obtained, but this requires a specialized daemon running in the cloud machine. The daemon is called ScoutSwarm and is already present on all the BitSwarm cloud machine templates. It is a Python script that runs as a service and sends the data to the BitSwarm data collector server once every minute. The impact of this process on the cloud server is minimal. The collector stores these values in the database for a period of 2 days. The client can then generate graphs of this data through the web interface.
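The reporting cycle described above can be sketched in a few lines of Python. This is only an illustration and not the actual ScoutSwarm code: the function names, the payload fields and the `send` callback are assumptions, and the real daemon's data format and collector address are not documented here.

```python
import json
import shutil
import time


def collect_sample(path="/"):
    """Gather one illustrative extra parameter (disk usage) for a reporting cycle.

    The field names are hypothetical; the real ScoutSwarm payload is not documented here.
    """
    usage = shutil.disk_usage(path)
    return {
        "timestamp": int(time.time()),
        "disk_used_percent": round(100 * usage.used / usage.total, 2),
    }


def run(send, interval_seconds=60, cycles=None):
    """Collect a sample every interval and pass it to `send` as a JSON string.

    `send` stands in for whatever transport delivers data to the collector.
    `cycles=None` means run forever, as a daemon would.
    """
    done = 0
    while cycles is None or done < cycles:
        send(json.dumps(collect_sample()))
        done += 1
        if cycles is None or done < cycles:
            time.sleep(interval_seconds)
```

For a quick local trial, `run(print, interval_seconds=1, cycles=3)` prints three samples one second apart instead of sending them anywhere.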

Alerting

Alarms can be configured on the stored values of any basic or extra parameter. Each alarm has three basic properties:

  • The parameter which they depend on
  • The threshold value of the parameter and its direction (above or below)
  • The period of time for which the threshold has to be reached

When a parameter's value reaches or crosses the defined threshold in the configured direction and stays there for the specified period of time, the alarm is triggered and an alert is sent to a predefined e-mail address or to a mobile phone number via SMS.
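The trigger rule can be expressed as a small check over the most recent per-minute values. The sketch below is an assumption about how such a rule could be evaluated, not a description of the BitSwarm implementation; the function name and its arguments are hypothetical, and the period is expressed as a number of consecutive one-minute samples.

```python
def alarm_triggered(samples, threshold, direction, period):
    """Return True if the last `period` samples all reach or cross the threshold.

    samples   -- per-minute values, oldest first (None marks a missing value)
    threshold -- the configured threshold value
    direction -- "above" or "below"
    period    -- number of consecutive minutes the condition must hold
    """
    if direction not in ("above", "below"):
        raise ValueError("direction must be 'above' or 'below'")
    if period <= 0 or len(samples) < period:
        return False
    recent = samples[-period:]
    if any(value is None for value in recent):
        # Missing data is handled separately (see null-value alarms).
        return False
    if direction == "above":
        return all(value >= threshold for value in recent)
    return all(value <= threshold for value in recent)
```

With a threshold of 95, direction "above" and a period of 3, the series `[50, 96, 97, 95]` triggers the alarm, while `[96, 97, 90]` does not because the last value fell back under the threshold.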

Use cases

Here are some example use cases where monitoring and alerting come in very useful:

  • Cloud server uptime monitoring – if the cloud server does not send the extra parameters' data, a null value is written into the database. An alarm for null values can be set up to alert the client in this case.
  • Disk space usage warnings – alarms can be set up for all the cloud machines so that they trigger when disk usage on any attached storage volume exceeds a certain percentage (for example 95%). The administrator can then free some space or extend the storage volume before a service running on that cloud machine becomes unavailable for lack of disk space.
  • Bandwidth monitoring to prevent excess usage and DDoS attacks.
  • Open files or other resource monitoring for discovering memory leaks in applications.
  • Many more – we have used the system extensively in our own infrastructure with great success.
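As an illustration of the uptime use case, the following sketch counts how many of the latest samples are null and treats the server as unreachable after a grace period. The function names and the three-minute grace period are assumptions chosen for the example, not BitSwarm defaults.

```python
def missing_minutes(samples):
    """Count consecutive missing (None) samples at the end of the series."""
    count = 0
    for value in reversed(samples):
        if value is not None:
            break
        count += 1
    return count


def server_looks_down(samples, grace_minutes=3):
    """Treat the server as down once `grace_minutes` samples in a row are missing."""
    return missing_minutes(samples) >= grace_minutes
```

A short grace period avoids alerting on a single lost sample, at the cost of a slightly later notification.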

Monitoring and alerting options

Business MA

Business-grade monitoring and alerting includes the following (as described above):

  • Support for all basic and extra cloud parameters.
  • Parameters' values are saved with a frequency of 1 minute.
  • History of all parameter data for 48 hours.
  • Up to 1 alarm per cloud server.
  • Alerting by e-mail to 1 e-mail address.

Enterprise MA

Enterprise-grade monitoring and alerting includes all the features of Business MA and adds the following:

  • History of all parameter data for 2 years (17520 hours).
  • Unlimited number of alarms per cloud server.
  • Alerting by e-mail to an unlimited number of e-mail addresses.
  • Alerting by SMS to an unlimited number of mobile phone numbers.
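To put the two history lengths in perspective, the number of stored values per parameter follows directly from the one-minute save frequency. The short calculation below assumes every minute produces exactly one stored value; the helper name is hypothetical.

```python
SAMPLES_PER_HOUR = 60  # one value saved per minute


def samples_retained(history_hours):
    """Number of values kept per parameter for a given history length."""
    return history_hours * SAMPLES_PER_HOUR


business = samples_retained(48)        # 48 hours of history
enterprise = samples_retained(17520)   # 2 years = 2 * 365 * 24 hours

print(business)    # 2880
print(enterprise)  # 1051200
```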

Extra parameters

Below is the list [TODO]

