Configuration

Writing STAR Unified Meta-Scheduler (SUMS) configuration files

Introduction:

SUMS Configuration Guide Version 1.8.6

Note: This guide deals with the configuration of SUMS 1.8.6; other versions may differ.

SUMS 1.8.6 is configured using standard XML-serialized scheduler objects, which are read by the java.beans.XMLDecoder class. More information about this class can be found in the JDK documentation at http://java.sun.com/ .

Basic syntax:

The basic syntax for primitive types is as follows:

<int>25</int>

<string>Hello World</string>

The first line is the same as writing “int myNum = 25;” in Java. In other words, the tag represents the value held by myNum.

High-level objects are defined in another way. The syntax below defines a java.util.ArrayList object.

<object class="java.util.ArrayList"/>

If we want to call the add method of this object, we need to expand the XML tags:

<object class="java.util.ArrayList">

<void method="add">

<string>Hello World</string>

</void>

</object>

This is the same as writing:

List MyList = new ArrayList();

MyList.add("Hello World");
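To see the equivalence in practice, the ArrayList example can be fed to java.beans.XMLDecoder directly. The following is a minimal sketch, not SUMS code: the class name ListDecodeSketch and the helper decode are made up for illustration, and the XML is held in a string rather than read from a configuration file.

```java
import java.beans.XMLDecoder;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Illustrative sketch only; ListDecodeSketch is a hypothetical name, not part of SUMS.
class ListDecodeSketch {
    // The ArrayList example from the guide, wrapped in the "java" root tag.
    static final String XML =
        "<java version=\"1.4.2\" class=\"java.beans.XMLDecoder\">"
      + "<object class=\"java.util.ArrayList\">"
      + "<void method=\"add\"><string>Hello World</string></void>"
      + "</object>"
      + "</java>";

    // Decodes the first object found in the given XML text.
    static Object decode(String xml) {
        try (XMLDecoder decoder = new XMLDecoder(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        List<?> list = (List<?>) decode(XML);
        System.out.println(list); // prints [Hello World]
    }
}
```

The decoder instantiates the ArrayList and then invokes add on it, which is what the two Java lines above do by hand.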

Some objects have member functions that comply with the JavaBeans standard. These can be called using <void method="..."> as in the example above, but there is also another way to call them. Member functions that comply with this standard come in pairs called a setter and a getter, for example “setMaxMemory” and “getMaxMemory”. In such a case the pair can also be referred to as a property: drop the first three letters of the function name, make the fourth letter lowercase, and use the syntax below:

<void property="maxMemory">

          <int>255</int>

</void>

The syntax above is identical to:

<void method="setMaxMemory">

          <int>255</int>

</void>
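The equivalence of the two forms can be checked with a small experiment. The sketch below assumes a made-up bean, MemoryBean, with a setMaxMemory/getMaxMemory pair; it is not a SUMS class, and the class name is inserted into the XML at run time so that the example does not depend on any package layout.

```java
import java.beans.XMLDecoder;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; PropertySyntaxSketch and MemoryBean are hypothetical names.
class PropertySyntaxSketch {
    /** Hypothetical bean with a setter/getter pair, standing in for a scheduler object. */
    public static class MemoryBean {
        private int maxMemory;
        public MemoryBean() {}
        public int getMaxMemory() { return maxMemory; }
        public void setMaxMemory(int maxMemory) { this.maxMemory = maxMemory; }
    }

    // Builds a one-object document whose <void> tag carries the given attribute,
    // e.g. property="maxMemory" or method="setMaxMemory", and decodes it.
    static MemoryBean decode(String voidAttribute) {
        String xml =
            "<java version=\"1.4.2\" class=\"java.beans.XMLDecoder\">"
          + "<object class=\"" + MemoryBean.class.getName() + "\">"
          + "<void " + voidAttribute + "><int>255</int></void>"
          + "</object>"
          + "</java>";
        try (XMLDecoder decoder = new XMLDecoder(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                null, null, MemoryBean.class.getClassLoader())) {
            return (MemoryBean) decoder.readObject();
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("property=\"maxMemory\"").getMaxMemory());
        System.out.println(decode("method=\"setMaxMemory\"").getMaxMemory());
    }
}
```

Both calls should leave the bean with the same value, since the property form is resolved to the setter by the decoder.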

If an object needs, or may need, to be used more than once, an ID can be used. The example below adds the same object to the list twice. In other words, the list gets two pointers to exactly the same object. If the second add method were written like the first, the list would hold two pointers to two different objects that happen to be identical.

<object class="java.util.ArrayList">

 

<void method="add">

<object id="MySite" class="gov.bnl.star.offline.scheduler.Site"/>

</void>

 

<void method="add">

            <object idref="MySite"/>

</void>

 

</object>

Warning: When using idref, the object that the idref points to must always be defined above the idref, never below. Otherwise, when the scheduler tries to initialize, it will find only a null object there.
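The difference between a shared reference and two identical objects can be made visible with a short test. This is a sketch under stated assumptions: java.util.HashMap is used as a stand-in for the Site class (which is not available outside SUMS), and IdrefSketch is a made-up name.

```java
import java.beans.XMLDecoder;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Illustrative sketch only; java.util.HashMap stands in for the SUMS Site class.
class IdrefSketch {
    // Decodes a list whose first element has id "MySite" and whose second
    // element is whatever XML fragment the caller supplies.
    static List<?> decode(String secondElement) {
        String xml =
            "<java version=\"1.4.2\" class=\"java.beans.XMLDecoder\">"
          + "<object class=\"java.util.ArrayList\">"
          + "<void method=\"add\">"
          + "<object id=\"MySite\" class=\"java.util.HashMap\"/>"
          + "</void>"
          + "<void method=\"add\">" + secondElement + "</void>"
          + "</object>"
          + "</java>";
        try (XMLDecoder decoder = new XMLDecoder(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))) {
            return (List<?>) decoder.readObject();
        }
    }

    public static void main(String[] args) {
        List<?> shared = decode("<object idref=\"MySite\"/>");
        List<?> copies = decode("<object class=\"java.util.HashMap\"/>");
        // Same object twice versus two separate but equal objects.
        System.out.println(shared.get(0) == shared.get(1));
        System.out.println(copies.get(0) == copies.get(1));
    }
}
```

With idref the two list entries are the same instance, so a change made through one is seen through the other. Without it the entries are merely equal.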

This covers everything that needs to be known about the syntax of Java XML objects. The next section describes the actual objects that need to be initialized in order to configure the scheduler correctly. It is highly recommended that a template be used for writing the configuration, because mistakes can be hard to track down and the probability of making mistakes is greater if you start with a blank slate. It is also recommended not to make too many changes at once.

The configuration file defines a whole hierarchy of classes. These classes describe sites and site assets, such as gatekeepers, batch systems and queues, to SUMS. It follows that there are objects inside SUMS that represent each of these blocks.

The root element for the configuration file is the “java” tag which holds a hashtable object. The hashtable is a data structure composed of “String” – “Object” pairs. The “put” method is used to add these pairs. The method requires two arguments. The first is always the String/key and the other is always the object. See the example below.

<java version="1.4.2" class="java.beans.XMLDecoder">

            <object class="java.util.Hashtable">

                       

                        <void method="put">

                                    <string>MyObject</string>

                                    <object class="gov.bnl.star.offline.scheduler.Site"/>

                        </void>

 

</object>

</java>  
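The root structure can be tried out without SUMS on the classpath. The sketch below is an illustration only: it decodes a Hashtable with one key, using java.util.ArrayList as a stand-in for the Site class, and then looks the entry up by key. The class name ConfigRootSketch is hypothetical.

```java
import java.beans.XMLDecoder;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Hashtable;

// Illustrative sketch only; java.util.ArrayList stands in for the SUMS Site class.
class ConfigRootSketch {
    static final String XML =
        "<java version=\"1.4.2\" class=\"java.beans.XMLDecoder\">"
      + "<object class=\"java.util.Hashtable\">"
      + "<void method=\"put\">"
      + "<string>MyObject</string>"                 // first argument: the key
      + "<object class=\"java.util.ArrayList\"/>"   // second argument: the object
      + "</void>"
      + "</object>"
      + "</java>";

    // Decodes the root hashtable of a configuration document.
    static Hashtable<?, ?> decode(String xml) {
        try (XMLDecoder decoder = new XMLDecoder(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))) {
            return (Hashtable<?, ?>) decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Hashtable<?, ?> config = decode(XML);
        System.out.println(config.keySet());
        System.out.println(config.get("MyObject").getClass().getName());
    }
}
```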

 

Keywords:

 

Any valid pair may be stored in this structure, no matter how irrelevant, without breaking SUMS. SUMS looks up these objects by their name/key, so some names are reserved, such as “localSite”, “defaultFileCatalog”, “defaultJobInitializer” and “gridView”.

 

 Keyword: localSite

The reserved key “localSite” is probably the easiest to understand. In all templates you should see the block below commented out. The localSite element is a string-string pair: the second string forces SUMS to believe it is running at the site whose name it holds (see the block below). This setting overrides the -site option that the “star-submit” script passes to the scheduler jar.

   <void method="put">

        <string>localSite</string>

        <string>nersc.gov</string>

    </void>
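The precedence described above can be summarized in a few lines of Java. This is only a sketch of the rule as stated in this guide, not the scheduler's actual code: the method effectiveSite and the class LocalSiteSketch are made-up names, and the configuration is modeled as a plain map.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; these names do not come from the SUMS source.
class LocalSiteSketch {
    // Returns the site name the scheduler would act on: the "localSite"
    // entry if the configuration defines one, otherwise the -site option.
    static String effectiveSite(Map<String, Object> config, String siteOption) {
        Object forced = config.get("localSite");
        return (forced instanceof String) ? (String) forced : siteOption;
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        System.out.println(effectiveSite(config, "rhic.bnl.gov")); // prints rhic.bnl.gov
        config.put("localSite", "nersc.gov");
        System.out.println(effectiveSite(config, "rhic.bnl.gov")); // prints nersc.gov
    }
}
```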

 

 Keyword: defaultFileCatalog

The reserved key “defaultFileCatalog” states the class that should be used for resolving catalog query tags. Currently this is the gov.bnl.star.offline.scheduler.catalog.StarCatalog class. No other interfaces for other catalogs have been written.

  <void method="put">

      <string>defaultFileCatalog</string>

      <object class="gov.bnl.star.offline.scheduler.catalog.StarCatalog"/>

</void>

 

Keyword: defaultJobInitializer

The job initializer is the first configurable block of the scheduler. It is responsible for interpreting the job the user is trying to submit. Currently the job is passed in by way of a command-line argument to the star-submit script. The class responsible for handling this is gov.bnl.star.offline.scheduler.initializer.XMLInitializer. There are two members of this class that have to be set: setDefaultFileListSyntax and setCheckIfFilesExistLocally. setDefaultFileListSyntax sets the default file syntax used in both the .list files and the INPUTFILE(n) environment variables of the .csh files. The syntax options are “paths”, “rootd” and “xrootd”. Below is a typically configured block:

  <void method="put">

 

           <string>defaultJobInitializer</string>

 

           <object class="gov.bnl.star.offline.scheduler.initializer.XMLInitializer">

 

                      <void property="defaultFileListSyntax">

                                 <string>paths</string>

                      </void>

 

                      <void property="checkIfFilesExistLocally">

                                <boolean>true</boolean>

                      </void>

 

           </object>

 

 </void>

Keyword: gridView

The reserved key “gridView” is the bulkiest of these blocks and likewise holds the most information. It should hold a description of all the sites SUMS runs across. The root object is a list (java.util.ArrayList) that contains one or more site (gov.bnl.star.offline.scheduler.Site) objects. The sites in turn hold batch-system (gov.bnl.star.offline.scheduler.BatchSystem) objects, which in turn hold queue (gov.bnl.star.offline.scheduler.Queue) and dispatcher objects. Each of these objects has many members that must be filled out. This block is too big to show an example of here. Examples can be found in the scheduler's preexisting configuration files. It is recommended that these be used as templates.

 

    User Defined Keywords

It was stated earlier that any key-value pair can be added so long as its key is not one of the reserved keys. This comes in handy when users want the option of using more than one policy per site. To backtrack, a policy is an algorithm that selects which queue to send jobs to from a list of possible queues. A user can pass the “-p” or “-policy” option to select a policy other than their default policy. This policy has to be defined in a key-value pair block, and the name of the policy is the key of the block. There is more than one policy class; each uses a different algorithm for splitting jobs and determining which queue a job should go to. You can verify that a class is indeed a policy by checking in the Javadoc whether it implements gov.bnl.star.offline.scheduler.Policy. The most fail-safe policy is gov.bnl.star.offline.scheduler.policy.PassivePolicy, as it relies on no service external to SUMS to do its job. Below is an example of a basic policy.

  <void method="put">

            <string>bnl_condor_cas</string>

            <object class="gov.bnl.star.offline.scheduler.policy.PassivePolicy">

                      <void method="addQueue"><object idref="bnl_condor_cas"/></void>

            </object>

  </void>

Note that this is a passive policy submitting to only one queue, so there is very little decision making involved. The queue is added to the policy's list of possible queues via the “addQueue” method, which may be called repeatedly to add more queues. Also note that the queue itself is not defined there. The queue object being passed is a reference to a queue object whose id is "bnl_condor_cas". Using this reference it is possible to define multiple policies each using the same queue(s). Even if you wanted to define a queue object in the policy block without using a reference, it would not be possible, because the queue has to be defined in the batch system block so that it can inherit the other properties required for it to be fully functional. The user will be able to call this policy using the “-p bnl_condor_cas” or “-policy bnl_condor_cas” option when running from the command line. The available policies should be documented for users so they know what locations SUMS has been configured to send to.
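The interplay between the queue id, the idref inside the policy block, and the policy key can be illustrated with stand-in classes. Everything named QueueStub, PolicyStub and PolicyLookupSketch below is hypothetical; only the addQueue method name and the keys mirror this guide. In a real configuration the queue is nested inside site and batch-system objects rather than placed directly in the gridView list.

```java
import java.beans.XMLDecoder;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Illustrative sketch only; the stub classes are not SUMS classes.
class PolicyLookupSketch {
    /** Hypothetical stand-in for a queue object. */
    public static class QueueStub {
        public QueueStub() {}
    }

    /** Hypothetical stand-in for a policy; only addQueue mirrors the guide. */
    public static class PolicyStub {
        private final List<QueueStub> queues = new ArrayList<>();
        public PolicyStub() {}
        public void addQueue(QueueStub queue) { queues.add(queue); }
        public List<QueueStub> getQueues() { return queues; }
    }

    static Hashtable<?, ?> decode() {
        String xml =
            "<java version=\"1.4.2\" class=\"java.beans.XMLDecoder\">"
          + "<object class=\"java.util.Hashtable\">"
          // The queue is defined first, with an id, so that later blocks can refer to it.
          + "<void method=\"put\"><string>gridView</string>"
          + "<object class=\"java.util.ArrayList\">"
          + "<void method=\"add\">"
          + "<object id=\"bnl_condor_cas\" class=\"" + QueueStub.class.getName() + "\"/>"
          + "</void></object></void>"
          // The policy block only references the queue by idref.
          + "<void method=\"put\"><string>bnl_condor_cas</string>"
          + "<object class=\"" + PolicyStub.class.getName() + "\">"
          + "<void method=\"addQueue\"><object idref=\"bnl_condor_cas\"/></void>"
          + "</object></void>"
          + "</object></java>";
        try (XMLDecoder decoder = new XMLDecoder(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                null, null, PolicyLookupSketch.class.getClassLoader())) {
            return (Hashtable<?, ?>) decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Hashtable<?, ?> config = decode();
        // Look the policy up by the name a user would pass with -p.
        PolicyStub policy = (PolicyStub) config.get("bnl_condor_cas");
        List<?> gridView = (List<?>) config.get("gridView");
        System.out.println(policy.getQueues().get(0) == gridView.get(0));
    }
}
```

The policy ends up holding the very same queue instance that was defined earlier, which is why several policies can share one queue definition.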


A simple tutorial for setting up the scheduler (SUMS).

To make setting up a local copy of the scheduler (SUMS) simpler, a basic configuration file is provided. It can be modified to run at your site by changing as little as one line.

Instructions to modify basic configuration to run at your site:

  1. Download and unzip the SUMS starter kit.

    Click here to download.

    When you unzip it, you should see:
    scheduler.jar – the core executable
    star-submit – the script to invoke the executable
    globalConfig.xml – the configuration file
    job.xml – The most basic sample job description


  2. Open and edit the configuration file. The file is large, so it is highly recommended that you use a non-wrapping text editor with XML syntax coloring (for example EmEditor for Windows or gedit for Linux). Find this line at about line 367:

    <void property="siteName"><string>rhic.bnl.gov</string></void>

    It needs to be changed. Run the command /bin/domainname and replace the string “rhic.bnl.gov” in the configuration with the output of the domainname command. Then save the file. You have just edited the site xml block and bound the site it describes to your own site.


  3. When you invoke the star-submit script, it internally runs the /bin/domainname command and passes the result to the scheduler jar. The scheduler checks every site in the configuration file (this configuration file has only one) until it finds your site by matching the siteName string. You can now test this with the included sample job. Run this command:

    star-submit job.xml

    The configuration is currently set up to send all jobs to the fork job manager, because it is present on every node. The output should look something like this:

    STAR Scheduler 1.8.10
    *** Note: The default directory in which jobs start has be fix to $SCRATCH ***
    Your Log file can be found at: ./lbhajdu.log
    Reading request description file : /afs/rhic/star/users/starjobs/dev/job.xml
    Analyzing XML...XML OK
    Dispatching process 8DA5076A1D1367FB52ACCC27BEA5DE3A_9.... done.
    Dispatching process 8DA5076A1D1367FB52ACCC27BEA5DE3A_8.. done.
    Dispatching process 8DA5076A1D1367FB52ACCC27BEA5DE3A_7.. done.
    Dispatching process 8DA5076A1D1367FB52ACCC27BEA5DE3A_6.. done.
    Dispatching process 8DA5076A1D1367FB52ACCC27BEA5DE3A_5.. done.
    Dispatching process 8DA5076A1D1367FB52ACCC27BEA5DE3A_4.. done.
    Dispatching process 8DA5076A1D1367FB52ACCC27BEA5DE3A_3.. done.
    Dispatching process 8DA5076A1D1367FB52ACCC27BEA5DE3A_2.. done.
    Dispatching process 8DA5076A1D1367FB52ACCC27BEA5DE3A_1.. done.
    Dispatching process 8DA5076A1D1367FB52ACCC27BEA5DE3A_0.. done.
    Wrote scheduling report to : sched8DA5076A1D1367FB52ACCC27BEA5DE3A.report
    Scheduling successful


  4. If you have a batch job manager running at your site, edit the configuration again. Default dispatchers exist for some of the most common batch systems. Go back to the site block you edited in step 2 and find the LocalAccessPoint, inside which is the AccessMethod block. You should see several lines of the form:

    <void property="dispatcher"> <object idref="Fork_Dispatcher"/> </void>

    Note that only one can be active; the others are commented out. Activate the line that corresponds to your batch system and comment out the default Fork line shown above. The current options are Condor, SGE, PBS, FORK and LSF. Other dispatchers, such as XGRID and condorG, are available but have no samples in the sample configuration. For configurations for these dispatchers please contact us. Try to submit the sample job again. This time all the jobs should be sent to your batch system.

  5. You may need to tweak the dispatcher itself for your specific implementation. This can be a more advanced skill. The dispatchers are defined in the configuration. Search for the comment “Start of Dispatcher list” to find the list. More parameters are available than are defined in the sample configuration. Please consult the scheduler Javadoc for these options.
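The site lookup described in the tutorial, where the domain name is compared against each siteName in the configuration, can be pictured roughly as follows. This is a sketch under assumptions: SiteStub and findSite are invented for illustration, and exact string equality is assumed as the matching rule, which the guide does not spell out.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the scheduler's actual lookup code.
class SiteMatchSketch {
    /** Hypothetical stand-in for a configured site; only siteName mirrors the guide. */
    static final class SiteStub {
        final String siteName;
        SiteStub(String siteName) { this.siteName = siteName; }
    }

    // Returns the first site whose siteName equals the domain name, or null if none matches.
    static SiteStub findSite(List<SiteStub> sites, String domainName) {
        for (SiteStub site : sites) {
            if (site.siteName.equals(domainName)) {
                return site;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<SiteStub> gridView = Arrays.asList(new SiteStub("rhic.bnl.gov"));
        // With the unedited sample configuration only rhic.bnl.gov matches.
        System.out.println(findSite(gridView, "rhic.bnl.gov") != null); // prints true
        System.out.println(findSite(gridView, "example.org") != null);  // prints false
    }
}
```

This is why step 2 matters: until the siteName string is replaced with the output of /bin/domainname at your site, no site in the sample configuration would match.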

Levente Hajdu - page was last modified