Scheduler phases

  1. Receive submission
  2. Decide where to run the job
    - Use a policy to decide
  3. Dispatch on a remote machine
    - Use an abstract dispatcher to do that
    - In this implementation, dispatching is split into stage (move the files and prepare the target directory) and execute (LSF or RMI)
  4. When notified, de-stage the job (get the output files).

Receive submission

This part parses the XML and provides the placeholder for handling catalog information. In the future, a submitted file could be described by a catalog ID, and this part will take care of it. There should be an abstract class (JobInitializer) that parses the XML file and returns an object describing the job to be dispatched, through a method called "JobDescription analizeRequest(String xmlFileName)".

As a first implementation, it should just parse the XML file, check that the job is described correctly, and create an object that represents the job description (JobDescription).
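A minimal sketch of that first implementation, using the standard JAXP DOM parser. The XML layout (a <job> root with one <command> and any number of <input> elements), the fields of JobDescription, and the class name XmlJobInitializer are assumptions made for illustration; the note does not define them.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical job description: the real fields are not defined in this note.
class JobDescription {
    final String command;
    final List<String> inputFiles;

    JobDescription(String command, List<String> inputFiles) {
        this.command = command;
        this.inputFiles = inputFiles;
    }
}

// Abstract entry point, with the method name given in the text.
abstract class JobInitializer {
    abstract JobDescription analizeRequest(String xmlFileName);
}

// Assumed first implementation: parse, validate, build the description.
class XmlJobInitializer extends JobInitializer {
    @Override
    JobDescription analizeRequest(String xmlFileName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new File(xmlFileName));
        } catch (Exception e) {
            // Unreadable or malformed XML is reported as a bad request.
            throw new IllegalArgumentException("Cannot parse " + xmlFileName, e);
        }

        Element root = doc.getDocumentElement();

        // Validation: exactly one non-empty <command> (assumed element name).
        NodeList commands = root.getElementsByTagName("command");
        if (commands.getLength() != 1) {
            throw new IllegalArgumentException("Exactly one <command> is required");
        }
        String command = commands.item(0).getTextContent().trim();
        if (command.isEmpty()) {
            throw new IllegalArgumentException("<command> must not be empty");
        }

        // Collect the input files (assumed element name <input>).
        List<String> inputs = new ArrayList<>();
        NodeList inputNodes = root.getElementsByTagName("input");
        for (int i = 0; i < inputNodes.getLength(); i++) {
            inputs.add(inputNodes.item(i).getTextContent().trim());
        }
        return new JobDescription(command, inputs);
    }
}
```

Catalog handling would later slot into the same class, for example by resolving a catalog ID into file names before the JobDescription is built.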

Decide where to run the job

The decision should be delegated to a policy. There should be an abstract class (SchedulerPolicy), or an interface, with an abstract method called "Machine assignTargetMachine(JobDescription job)". The policy should be the one that interrogates the machines about their status: you might envision a policy that accepts a less than optimal assignment in order to reduce the number of status queries sent to the other machines. LSFSchedulerPolicy, which inherits from SchedulerPolicy, should also have a protected method to query a machine's resource status, so that its implementation can be changed by subclassing. The method should be protected because the status is only requested from within the policy: "protected ResourceStatus getResourceInfo(Machine machine)".
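The class layout described above could be sketched as follows. The contents of Machine, ResourceStatus and JobDescription, the addMachine() method, and the "pick the least loaded machine" rule are all assumptions for illustration; the real query would go to LSF, which this sketch replaces with a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical placeholder types: their real contents are not defined here.
class JobDescription { }

class Machine {
    final String hostName;
    Machine(String hostName) { this.hostName = hostName; }
}

class ResourceStatus {
    final double load;
    ResourceStatus(double load) { this.load = load; }
}

// The abstract policy, with the method named in the text.
abstract class SchedulerPolicy {
    abstract Machine assignTargetMachine(JobDescription job);
}

class LSFSchedulerPolicy extends SchedulerPolicy {
    private final List<Machine> candidates = new ArrayList<>();

    // Hypothetical way to register the machines the policy may choose from.
    void addMachine(Machine machine) {
        candidates.add(machine);
    }

    @Override
    Machine assignTargetMachine(JobDescription job) {
        // Assumed rule: query every candidate and keep the least loaded one.
        Machine best = null;
        double bestLoad = Double.MAX_VALUE;
        for (Machine machine : candidates) {
            double load = getResourceInfo(machine).load;
            if (load < bestLoad) {
                best = machine;
                bestLoad = load;
            }
        }
        if (best == null) {
            throw new IllegalStateException("No candidate machines registered");
        }
        return best;
    }

    // Protected so that subclasses can change how the status is obtained.
    // Placeholder only: a real version would ask LSF for the machine status.
    protected ResourceStatus getResourceInfo(Machine machine) {
        return new ResourceStatus(0.0);
    }
}
```

A policy that wants fewer status queries could override assignTargetMachine() to stop at the first machine below some load threshold, without touching getResourceInfo().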

Dispatch: stage job on the remote machine

The Dispatcher should be an abstract class (Dispatcher) with one method called "void dispatch(JobDescription job, Machine target)". This method should run the job remotely. In the implementation, it should be divided into two further phases: stage (prepare the target directory) and execute (start the remote command).
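One way to express this split is a template method: dispatch() is fixed and simply calls stage() and then execute(). The intermediate class name TwoPhaseDispatcher and the empty placeholder types are assumptions for illustration.

```java
import java.util.Properties;

// Hypothetical placeholder types.
class JobDescription { }
class Machine { }

// The abstract dispatcher named in the text.
abstract class Dispatcher {
    abstract void dispatch(JobDescription job, Machine target);
}

// Assumed intermediate class that fixes the order of the two phases.
abstract class TwoPhaseDispatcher extends Dispatcher {
    @Override
    void dispatch(JobDescription job, Machine target) {
        // Phase 1: prepare the target directory, collect environment and command.
        Properties env = stage(job, target);
        // Phase 2: start the remote command with that environment.
        execute(env, target);
    }

    abstract Properties stage(JobDescription job, Machine target);

    abstract void execute(Properties env, Machine target);
}
```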

Stage

The staging should be done through a method called "Properties stage(JobDescription job, Machine target)". It moves everything that needs to be moved to the target machine, and it defines how the files are moved and where they go. It returns a set of properties containing all the environment variables that need to be set, plus a special variable "command", which is the actual command to run on the remote machine to execute the job.

This is what we would like to have as a first implementation: the Dispatcher creates on the target machine a subdirectory for the job, into which all the needed files are copied or soft-linked. The target directory is chosen by the scheduler and created on the machine where the job will run; the user neither chooses nor sees it. The command is run in this directory, where it finds every file specified in the job description as a local file with the same name as the remote file. The environment variables are yet to be defined and will be implemented later. They will be used to pass small pieces of information from the scheduler to the script or program being executed, for example a job ID or one variable for each input file.
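The job directory preparation could look like the sketch below. It assumes that the target's scratch area is reachable as an ordinary path (for example through a shared file system), which the note does not state. The class name LocalStager, the simplified parameter list, and the variable names "JOBID" and "JOBDIR" are hypothetical, since the environment variables are still to be defined; only "command" comes from the text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;

class LocalStager {
    // Creates <scratchRoot>/<jobId>, places the input files in it under their
    // own file names, and returns the properties for the execute phase.
    static Properties stage(Path scratchRoot, String jobId, List<Path> inputFiles,
                            String command, boolean softLink) throws IOException {
        Path jobDir = Files.createDirectories(scratchRoot.resolve(jobId));

        for (Path input : inputFiles) {
            // Same name as the original file, but inside the job directory.
            Path local = jobDir.resolve(input.getFileName());
            if (softLink) {
                Files.createSymbolicLink(local, input.toAbsolutePath());
            } else {
                Files.copy(input, local);
            }
        }

        Properties env = new Properties();
        env.setProperty("command", command);          // special variable from the text
        env.setProperty("JOBID", jobId);              // hypothetical variable name
        env.setProperty("JOBDIR", jobDir.toString()); // hypothetical variable name
        return env;
    }
}
```

Soft-linking avoids copying large inputs but only works when the original files are visible from the target machine, so the choice between the two is likely to be site-specific.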

Execute

Execution is the step that actually starts the job and sets the environment variables. There should be an abstract method called "void execute(Properties env, Machine target)". "env" contains the environment variables to be set when executing the job, and the special variable "command" holds the actual command to be executed remotely. Depending on the implementation, execution can go through LSF, RMI...

In short, changing the implementation class decides how jobs are staged and executed.
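As a sketch of how "env" and "command" might be consumed, the executer below runs the command as a local process, standing in for LSF or RMI. The class name LocalJobExecuter, the helper environmentOf(), and the use of "sh -c" (which assumes a Unix-like shell) are assumptions for illustration; the abstract class name JobExecuter is the one used in the programming tips.

```java
import java.io.IOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical placeholder type.
class Machine { }

abstract class JobExecuter {
    abstract void execute(Properties env, Machine target);
}

// Stand-in implementation: runs locally and ignores the target machine.
class LocalJobExecuter extends JobExecuter {

    // Everything except the special "command" entry is an environment variable.
    static Map<String, String> environmentOf(Properties env) {
        Map<String, String> variables = new TreeMap<>();
        for (String name : env.stringPropertyNames()) {
            if (!name.equals("command")) {
                variables.put(name, env.getProperty(name));
            }
        }
        return variables;
    }

    @Override
    void execute(Properties env, Machine target) {
        String command = env.getProperty("command");
        if (command == null) {
            throw new IllegalArgumentException("Missing \"command\" property");
        }
        // Assumes a Unix-like shell is available to interpret the command line.
        ProcessBuilder builder = new ProcessBuilder("sh", "-c", command);
        builder.environment().putAll(environmentOf(env));
        builder.inheritIO();
        try {
            builder.start(); // the job is started, not waited for
        } catch (IOException e) {
            throw new IllegalStateException("Could not start: " + command, e);
        }
    }
}
```

An LSF or RMI executer would keep the same method signature and replace only the body of execute().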

De-staging the job

The same abstract class (Dispatcher) should also have a method called "void retrieveOutput(JobDescription job, Machine target)" that takes the output files from the remote machine and places them in the directory specified by the user.

As a first implementation, de-staging consists of copying the output files back to where the user specified, and deleting the job directory.
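The two de-staging steps, copy back and clean up, could be sketched as below. As with staging, this assumes the job directory is reachable as an ordinary path; the class name LocalDestager and the simplified parameter list are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LocalDestager {
    // Copies the named output files from the job directory to the user's
    // directory, then removes the job directory and everything in it.
    static void retrieveOutput(Path jobDir, List<String> outputNames, Path userDir)
            throws IOException {
        Files.createDirectories(userDir);
        for (String name : outputNames) {
            Files.copy(jobDir.resolve(name), userDir.resolve(name),
                       StandardCopyOption.REPLACE_EXISTING);
        }

        // Walk the tree and delete in reverse order, so that files and
        // subdirectories are removed before the directories containing them.
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(jobDir)) {
            entries = walk.sorted(Comparator.reverseOrder())
                          .collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
    }
}
```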

Programming tips

One of the nicest features of Java is that you can specify classes at runtime. To do that, you keep a property file from which you read the names of the classes to be used as SchedulerPolicy, Stager, and JobExecuter. Then you call Class.forName() to load each class and get its Class object. Provided that the class has a default constructor, which it must have, you call the newInstance() method of the Class object and get a new object, which you can cast to SchedulerPolicy, Stager, or JobExecuter. This way, the scheduler can be configured very differently on different machines.

This means that we could have different Dispatchers, or even different policies, at different locations (PDSF and BNL), and so execute things differently or apply slightly different policies. To do this, one would prepare two different Dispatchers, for example BNLDispatcher and PDSFDispatcher, and put the name of the Dispatcher class in the property file. By changing the name, you change the implementation without the need to recompile. This is very useful during both development and deployment: in development, because you can work on different implementations without having to work with branches in CVS; in deployment, because it is very easy to revert to an older implementation if the new one is not stable enough.
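A minimal sketch of this loading mechanism follows. The property key "scheduler.dispatcher", the ComponentLoader helper and the empty dispatcher bodies are assumptions for illustration. The sketch calls getDeclaredConstructor().newInstance() rather than Class.newInstance(), because the latter has been deprecated since Java 9; both require a no-argument constructor.

```java
import java.util.Properties;

// Hypothetical placeholder types.
class JobDescription { }
class Machine { }

abstract class Dispatcher {
    abstract void dispatch(JobDescription job, Machine target);
}

// Two site-specific implementations, named as in the text; bodies are empty.
class BNLDispatcher extends Dispatcher {
    @Override
    void dispatch(JobDescription job, Machine target) { /* BNL-specific steps */ }
}

class PDSFDispatcher extends Dispatcher {
    @Override
    void dispatch(JobDescription job, Machine target) { /* PDSF-specific steps */ }
}

class ComponentLoader {
    // Reads a class name from the configuration, loads the class, creates an
    // instance through its no-argument constructor, and checks its type.
    static <T> T load(Properties config, String key, Class<T> type) {
        String className = config.getProperty(key);
        if (className == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        try {
            Class<?> loaded = Class.forName(className);
            return type.cast(loaded.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

With a property file containing a line such as "scheduler.dispatcher=PDSFDispatcher" (key name assumed), the scheduler would call ComponentLoader.load(config, "scheduler.dispatcher", Dispatcher.class) and never mention a concrete dispatcher class in its own code.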

Notes for the Catalog implementation

In the JobInitializer, one has to check whether the catalog ID (or whatever is passed) actually exists. In the policy, one might want to dispatch a job according to catalog information, so the policy should simply consult the catalog. By the time the job reaches the dispatcher, the decision has been made, so the files have to be made present on the remote machine: this is where the copy/move commands are issued.


Gabriele Carcassi