The Queue Monitoring MonALISA Module

Introduction

Resource Brokers need information about the status of Local Resource Management Systems (LRMS) for their decision-making mechanisms and for the implementation of global policies. Resource Brokers, however, do not have access to the LRMS and cannot affect local policies. A mechanism is therefore needed to provide the aggregate status of a queuing system.

We are developing a Queue Monitoring Module for the MonALISA monitoring framework. This custom module will provide aggregate status information for the most popular queuing systems (CONDOR, LSF, PBS, SGE), using the same set of attributes for all of them.
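As a rough illustration of what "the same set of attributes" means in practice, the sketch below reduces a list of native job states to common running and waiting counts. It is not the module's code: the function name `summarize`, the `RUNNING_STATES` table, and the state codes in it (as they usually appear in each system's job listing) are assumptions made for this example.

```python
from collections import Counter

# Assumed native codes that mean "running" in each system's job listing.
# These are illustrative and should be checked against the local installation.
RUNNING_STATES = {
    "CONDOR": {"R"},
    "LSF": {"RUN"},
    "PBS": {"R"},
    "SGE": {"r"},
}


def summarize(lrms, job_states):
    """Reduce native job states of one queue to common counters.

    Assumption: job_states lists only jobs currently in the queue, and
    every job that is not running is counted as waiting.
    """
    running_codes = RUNNING_STATES[lrms.upper()]
    counts = Counter(
        "running" if state in running_codes else "waiting"
        for state in job_states
    )
    return {"running": counts["running"], "waiting": counts["waiting"]}
```

A caller would collect the state column from the local job listing and pass it in, for example `summarize("PBS", ["R", "Q", "H", "R"])`.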

We aim to make the provided information compatible with the GLUE Schema. In particular, the Computing Element (CE) of the GLUE Schema represents the entry point to a Queue. There is one Computing Element per Queue.

The following table lists the attributes of the Status object of the CE:

Attribute              Description
RunningJobs            Number of currently running jobs
WaitingJobs            Number of jobs that are in a state other than running
TotalJobs              Total number of jobs in the CE (Running + Waiting)
Status                 State the Queue is in (Production, Closed, Queueing, ...)
WorstResponseTime      Worst-case time between job submission and the start of job execution (in seconds)
EstimatedResponseTime  Estimated time between job submission and the start of job execution (in seconds)
FreeCPUs               Number of free CPUs available to the Scheduler
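The attributes above can be sketched as a small record in which TotalJobs is derived rather than stored, so that it always equals Running + Waiting. This is only an illustration of the table: the class name `CEStatus`, its field names, and the `as_attributes` helper are hypothetical and are not part of MonALISA or of the GLUE Schema.

```python
from dataclasses import dataclass


@dataclass
class CEStatus:
    """Hypothetical container for the Status attributes of one CE (one Queue)."""
    running_jobs: int
    waiting_jobs: int
    status: str                   # e.g. "Production" or "Closed"
    worst_response_time: int      # seconds
    estimated_response_time: int  # seconds
    free_cpus: int

    @property
    def total_jobs(self) -> int:
        # TotalJobs is defined in the table as Running + Waiting.
        return self.running_jobs + self.waiting_jobs

    def as_attributes(self) -> dict:
        """Return the values keyed by the attribute names used in the table."""
        return {
            "RunningJobs": self.running_jobs,
            "WaitingJobs": self.waiting_jobs,
            "TotalJobs": self.total_jobs,
            "Status": self.status,
            "WorstResponseTime": self.worst_response_time,
            "EstimatedResponseTime": self.estimated_response_time,
            "FreeCPUs": self.free_cpus,
        }
```

Deriving TotalJobs from the two counters avoids publishing a total that disagrees with its parts when the counters are sampled at slightly different times.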

A presentation with ideas on Queue Monitoring was given at the PPDG Collaboration Meeting and is available here.

In developing the ML Queue Monitoring Module we will take into account previous work done by others. In particular, the EDG has made available an MDS information provider (IP) that calculates the same attributes. In a way, our goal is to convert the MDS IP into a MonALISA Monitoring Module.
There is also some existing work on providing attributes of the LSF and PBS systems to MonALISA. Although the attributes provided are not GLUE-like, we will certainly look into this work too.

Installation of the Queue Monitoring ML module

Accessing the Cluster Summaries

Since we run the MonALISA service as a web service, you may access the information we publish in MonALISA using the Web Service Clients. You may also access the same information and make plots using the Global MonALISA client.

Stratos Efstathiadis - page was last modified