Job requirements and XML Job description

There are two aspects to the interface between the scheduler and the end user. The obvious one is how the user describes the job to be executed by the scheduler. But since the scheduler will be able to divide the job into different sub-jobs, the user also has to be aware of how the scheduler is going to achieve that.

This document serves as a specification for both of these aspects, and hopefully the specification won't change when we migrate to another scheduler.

XML job description

The job will be described to the scheduler via an XML file. Let's look at an example for each use case.

Use case 1: no data files

This is an example of the submission of a program with no data files. A simple command line is executed, and the directory from which the execution is performed is an NFS-mounted directory.

<job username="carcassi" directory="/star/u/carcassi/test" title="My test program" description="Testing my program">
  <command>myProgram -c -d myFile</command>
  <stdin URL="nfs:/star/u/carcassi/test/input" />
  <stdout URL="nfs:/star/u/carcassi/test/output" />
</job>

The main entity is called job, and has the following attributes:

Property      Description
username      (optional) The user that is going to submit the job. The default is the current user.
directory     (optional) The directory in which the program is located. It coincides with the directory in which the program is going to be executed. The default is the current directory. For more information about this property, see "Moving job on local machine vs execution from NFS", which describes issues with the working directory.
title         (optional) A title describing your job. The default is to be determined.
description   (optional) A lengthier description of your job. The default is blank.

The command entity specifies the command line to be executed. [NB This was previously an attribute of job. It had to be changed because quotes can't be used inside an attribute.]

Standard input and standard output must be redirected using the stdin and stdout sub-entities of job. There can only be one of each per job. In the example, both files are located through a URL. Other methods of file specification are available; you can find them in the file location description section.

Use case 2: data files, one job

This example describes the submission of one program with many data files. The program can't be split into different sub-jobs.

<job username="carcassi" directory="/star/u/carcassi/test" title="My test program" description="Testing my program">
  <command>myProgram -c -d ${fileList}</command>
  <stdin URL="nfs:/star/u/carcassi/test/input" />
  <stdout URL="nfs:/star/u/carcassi/test/output" />
  <input URL="file://rcas6023.rhic.bnl.gov/home/starcache/dataFile1" />
  <input URL="file://rcas6023.rhic.bnl.gov/home/starcache/dataFile2" />
  <input URL="nfs:/star/u/carcassi/test/otherDataFile" />
</job>

The difference between this case and the previous one is that some input files were added, with one input entity for each file. As with stdin and stdout, other methods can be used to specify the location; they are described in the file location description section.

Another peculiarity is that the job might need to know the list of files it was assigned. The scheduler will prepare a file containing the list of data files (one line for each entry, with the full path). The name of this file list is set in the environment variable fileList and can be referenced on the command line as shown.
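As a sketch, assuming the scheduler assigns all three input files from the example to a single job, the file named by fileList would contain:

/home/starcache/dataFile1
/home/starcache/dataFile2
/star/u/carcassi/test/otherDataFile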

Use case 3: data files, multiple sub-jobs

This example describes the submission of one program with many data files. The program can be split into different sub-jobs.

<job username="carcassi" directory="/star/u/carcassi/test" title="My test program" description="Testing my program">
  <command>myProgram -c -d ${fileList}</command>
  <stdin URL="nfs:/star/u/carcassi/test/input" />
  <stdout URL="nfs:/star/u/carcassi/test/output_${jobID}" />
  <input URL="file://rcas6023.rhic.bnl.gov/home/starcache/dataFile1" />
  <input URL="file://rcas6023.rhic.bnl.gov/home/starcache/dataFile2" />
  <input URL="nfs:/star/u/carcassi/test/otherDataFile" />
</job>

The only difference between this case and the previous one is that the output file name contains the job ID, provided in the environment variable jobID. This means that each sub-job will save its output in a different location. In the previous case, since the output file name was the same for all sub-jobs, there would have been a clash between two parallel executions. Notice that if stdout is absent altogether, the job can still be divided, since there is no output file to clash over.
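As a sketch, suppose the scheduler splits the example above into two sub-jobs (both the split and the job ID values are illustrative, as the actual assignment is up to the scheduler):

Sub-job 1: fileList contains /home/starcache/dataFile1 and /home/starcache/dataFile2; jobID is 1; output goes to /star/u/carcassi/test/output_1
Sub-job 2: fileList contains /star/u/carcassi/test/otherDataFile; jobID is 2; output goes to /star/u/carcassi/test/output_2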

Open issues

There are still some open issues:

File location description

All file entries (stdin, stdout, input, output) can be specified in different ways. The first one is by a URI/URL.

For a file local to a specific machine, you use the file: scheme, followed by the name of the machine and the full path. For example:
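file://rcas6023.rhic.bnl.gov/home/starcache/dataFile1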

For a file located on NFS, one can use the nfs: scheme, followed directly by the full path. Only one slash is used after the scheme to indicate that the first element of the path is not a machine. For example:
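nfs:/star/u/carcassi/test/input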

For a file located on HPSS, use the hpss: scheme, followed by the full path. Only one slash is used after the scheme to indicate that the first element of the path is not a machine. For example:
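hpss:/home/carcassi/myDataFile

(the path here is hypothetical, shown only to illustrate the scheme)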

The consequences of using HPSS directly are still to be carefully examined.

An input can also be represented by a query to the file catalog. The syntax is not yet settled; this is a working draft. The scheme could be catalog:, followed by the name of the catalog and the query. For example:
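catalog:star.bnl.gov?production=P02gd,filetype=MuDST

(both the catalog name and the query string here are hypothetical, since the actual syntax has not been settled)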

No slash is used to indicate this URI doesn't specify a location (as in mailto:username@host).

Scheduler to job communication

If the scheduler is allowed to decide how to execute the same job on different machines with different files, it must communicate its decisions to the job in one way or another.
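In the examples above this is done through environment variables: the scheduler sets fileList and jobID before starting each sub-job. As a sketch, the environment of a sub-job might include something like this (the variable names come from the use cases; the values are illustrative):

fileList=/tmp/carcassi_fileList_1
jobID=1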

File list communication

Three methods were considered.

It is likely that the second and the third will both be made available. The second is already specified; we still lack a specification for the third.


Other considerations

Moving job on local machine vs execution from NFS

There are two ways to submit the job to the remote machine: either to require that the job is on a network drive (NFS) mounted on all machines, or to actually copy the files needed for the execution.

These are some points to keep in mind when comparing Copy vs NFS:

It is likely that for a first implementation we will keep job execution through NFS. In the future, though, one might want to change it. As stated, the only approach that makes sense is to copy an entire directory. Therefore the directory specified in the job description will be used differently in the two different cases.


Gabriele Carcassi