Job requirements and XML Job description

There are two aspects to the interface between the scheduler and the end user. The obvious one is how the user describes the job to be executed by the scheduler. But since the scheduler will be able to divide the job into different sub-jobs, the user also has to be aware of how the scheduler is going to achieve that.

This document serves as a specification for both of these aspects, and hopefully the specification won't change when we migrate to another scheduler.

XML job description

The job will be described to the scheduler via an XML file. Let's look at an example for each use case.

Use case 1: no data files

This is an example of the submission of a program with no data files. A simple command line is executed, and the directory from which the execution is performed is an NFS-mounted directory.

<job username="carcassi" directory="/star/u/carcassi/test" title="My test program" description="Testing my program">
  <command>myProgram -c -d myFile</command>
  <stdin URL="nfs:/star/u/carcassi/test/input" />
  <stdout URL="nfs:/star/u/carcassi/test/output" />
</job>

The main entity is called job, and has the following attributes:

Property      Description
username      (optional) The user that is going to submit the job. The default is the current user.
directory     (optional) The directory in which the program is located. It coincides with the directory in which the program is going to be executed. The default is the current directory. For more information about this property, see "Moving job on local machine vs execution from NFS", which describes issues with the working directory.
title         (optional) A title describing your job. The default is to be determined.
description   (optional) A lengthier description of your job. The default is blank.

The command entity specifies the command line to be executed. [NB This was previously an attribute of job. It had to be changed because quotes can't be used inside an attribute.]

Standard input and standard output must be redirected using the stdin and stdout sub-entities of job. There can only be one of each per job. In the example, both files are located through a URL. Other methods of file specification are available; you can find them in the file location description section.

Use case 2: data files, one job

This example describes the submission of one program with many data files. The program can't be split into different sub-jobs.

<job username="carcassi" directory="/star/u/carcassi/test" title="My test program" description="Testing my program">
  <command>myProgram -c -d ${fileList}</command>
  <stdin URL="nfs:/star/u/carcassi/test/input" />
  <stdout URL="nfs:/star/u/carcassi/test/output" />
  <input URL="file://rcas6023.rhic.bnl.gov/home/starcache/dataFile1" />
  <input URL="file://rcas6023.rhic.bnl.gov/home/starcache/dataFile2" />
  <input URL="nfs:/star/u/carcassi/test/otherDataFile" />
</job>

The difference between this case and the previous one is that some input files were added, with one input entity for each file. As with stdin and stdout, other methods can be used to specify the location; they are described in the file location description section.

Another peculiarity is that the job might need to know the list of files it was assigned. The scheduler will prepare a file containing the list of data files (one line for each entry, with the full path). The name of this file list is set in the environment variable fileList and can be referenced on the command line as shown.
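As a sketch, assuming the scheduler assigns all three input files from the example to a single job, the file named by fileList would contain:

/home/starcache/dataFile1
/home/starcache/dataFile2
/star/u/carcassi/test/otherDataFile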

Use case 3: data files, multiple sub-jobs

This example describes the submission of one program with many data files. The program can be split into different sub-jobs.

<job username="carcassi" directory="/star/u/carcassi/test" title="My test program" description="Testing my program">
  <command>myProgram -c -d ${fileList}</command>
  <stdin URL="nfs:/star/u/carcassi/test/input" />
  <stdout URL="nfs:/star/u/carcassi/test/output_${jobID}" />
  <input URL="file://rcas6023.rhic.bnl.gov/home/starcache/dataFile1" />
  <input URL="file://rcas6023.rhic.bnl.gov/home/starcache/dataFile2" />
  <input URL="nfs:/star/u/carcassi/test/otherDataFile" />
</job>

The only difference between this case and the previous one is that the output file name contains the job ID, provided in the environment variable jobID. This means that each sub-job will save its output in a different location. In the previous case, since the output file name was the same for all sub-jobs, there would have been a clash between two parallel executions. Notice that if stdout is absent altogether, the job can still be divided, since there is no output file to clash over.
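As a sketch, suppose the scheduler splits the example above into two sub-jobs (both the split and the job ID values are illustrative, as the actual assignment is up to the scheduler):

Sub-job 1: fileList contains /home/starcache/dataFile1 and /home/starcache/dataFile2; jobID is 1; output goes to /star/u/carcassi/test/output_1
Sub-job 2: fileList contains /star/u/carcassi/test/otherDataFile; jobID is 2; output goes to /star/u/carcassi/test/output_2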

Open issues

There are still some open issues:

File location description

All file entries (stdin, stdout, input, output) can be specified in different ways. The first one is by a URI/URL.

For a file local to a specific machine, you use the file: scheme, followed by the name of the machine and the full path. For example:
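file://rcas6023.rhic.bnl.gov/home/starcache/dataFile1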

For a file located on NFS, one can use the nfs: scheme, followed directly by the full path. Only one slash is used after the scheme to indicate that the first element of the path is not a machine. For example:
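nfs:/star/u/carcassi/test/input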

For a file located on HPSS, use the hpss: scheme, followed by the full path. Only one slash is used after the scheme to indicate that the first element of the path is not a machine. For example:
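hpss:/home/carcassi/myDataFile

(the path here is hypothetical, shown only to illustrate the scheme)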

The consequences of using HPSS directly are still to be carefully examined.

An input can also be represented by a query to the file catalog. The syntax is not yet settled; this is a working draft. The scheme could be catalog:, followed by the name of the catalog and the query. For example:
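catalog:star.bnl.gov?production=P02gd,filetype=MuDST

(both the catalog name and the query string here are hypothetical, since the actual syntax has not been settled)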

No slash is used to indicate this URI doesn't specify a location (as in mailto:username@host).

Scheduler to job communication

If the scheduler is allowed to decide how to execute the same job on different machines with different files, it must communicate its decisions to the job in one way or another.
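In the examples above this is done through environment variables: the scheduler sets fileList and jobID before starting each sub-job. As a sketch, the environment of a sub-job might include something like this (the variable names come from the use cases; the values are illustrative):

fileList=/tmp/carcassi_fileList_1
jobID=1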

File list communication

Three methods were considered.

It is likely that the second and the third will both be made available. The second is already specified; we still lack a specification for the third.


Other considerations

Moving job on local machine vs execution from NFS

There are two ways to submit the job to the remote machine: either to require that the job is on a network drive (NFS) mounted on all machines, or to actually copy the files needed for the execution.

These are some points to keep in mind when comparing Copy vs NFS:

It is likely that for a first implementation we will keep job execution through NFS. In the future, though, one might want to change it. As stated, the only approach that makes sense is to copy an entire directory. Therefore the directory specified in the job description will be used differently in the two different cases.


Gabriele Carcassi