XML JDL Schema

Schema SUMS_UJDL.xsd

schema location:  SUMS_UJDL.xsd
   
Elements: command, input, job, output
Complex types: mapType, stdType



element command

type xs:string
used by
element job
annotation
documentation 
The command element doesn't have any attributes. Its content is the actual command script, which will be
submitted using a csh script. You can use environment variables to retrieve special information, such as
$JOBID or $FILELIST. Remember, however, that the command is passed as is: if csh doesn't perform a substitution
(for example, because the part of your command containing the variable is enclosed in '...'), the scheduler
won't either. Refer to the csh man page for details. If you are unsure whether your command will execute
correctly, you can simulate the submission (see the simulateSubmission attribute of the job element) and
check the generated script manually.
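Because csh performs no variable substitution inside single quotes, a common mistake is to wrap $FILELIST or $JOBID in '...'. The sketch below is a hypothetical pre-submission check, not part of the scheduler, that flags such variables. It assumes a simplified quoting model: single quotes open and close a region where nothing is substituted, double quotes are tracked only so that a ' inside "..." is not mistaken for a quote, and backslash escapes and history substitution are ignored. The analysis.C macro name is a placeholder.

```python
import re

# Matches $NAME or ${NAME} starting at a given position.
VAR_PATTERN = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)")


def vars_in_single_quotes(command: str) -> list[str]:
    """Return names of variables csh would NOT substitute because they sit in '...'.

    Simplified model (assumption): no backslash escapes, no history substitution.
    """
    found = []
    in_single = in_double = False
    i = 0
    while i < len(command):
        ch = command[i]
        if ch == "'" and not in_double:
            in_single = not in_single
        elif ch == '"' and not in_single:
            in_double = not in_double
        elif ch == "$" and in_single:
            match = VAR_PATTERN.match(command, i)
            if match:
                found.append(match.group(1))
                i = match.end()
                continue
        i += 1
    return found


print(vars_in_single_quotes("root -b -q 'analysis.C(\"$FILELIST\")'"))  # → ['FILELIST']
```

In this example the program would receive the literal text $FILELIST instead of its value. Simulating the submission and reading the generated script remains the authoritative check.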


element input

used by
element job
attributes
Name  Type  Use  Default  Fixed  Annotation
URL  xs:string  required      
nFiles  xs:string  optional  100    
documentation The number of files returned by the query
singleCopy  xs:boolean  optional  true    
documentation 
Specifies whether the query should return only one copy of each file, or multiple
copies when they are available.

For example, suppose one file has two copies: one on rcas6060.star.bnl.gov and
one on NFS. With "true", only one of them is returned. With "false", both of them
can be returned; in that case your job will actually run on two copies of the same file.

By default, only one copy of each file is returned.
preferStorage  xs:NMTOKEN  optional      
documentation 
When multiple copies of a file are available, this attribute chooses which particular
copy to get. It is meaningful only if singleCopy is not set to false. If more than one
copy is available with the preferred storage (for example, the file is available on two
different machines), one copy is chosen at random.

This attribute was introduced because small jobs on a small set of files are penalized
when dispatched on local files: they have to wait for a particular machine to free
up, which might take a long time even if the rest of the farm is free. Running on
local files makes each job faster, but it might increase the waiting time before the job
gets executed. Therefore, NFS is recommended when testing your analysis on a
small set, and local when you run on the entire set.

Remember that the query might return local files even if you chose NFS. If you want
_only_ NFS files (or only local files), put a storage condition such as "storage=NFS" in your query.
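The copy-selection rule described for singleCopy and preferStorage can be sketched as follows. This is an illustration, not the scheduler's code: the dictionary layout of a copy and the paths are hypothetical, and falling back to a random copy of any storage type when no copy matches preferStorage is an assumption the text above does not specify.

```python
import random


def choose_copies(copies, prefer_storage=None, single_copy=True, rng=random):
    """Pick which copies of one file are handed to the job.

    copies: list of dicts with a "storage" key (hypothetical layout).
    """
    if not single_copy:
        # singleCopy="false": every available copy may be returned.
        return list(copies)
    preferred = [c for c in copies if c["storage"] == prefer_storage]
    # Assumption: with no copy on the preferred storage, choose among all copies.
    pool = preferred or list(copies)
    # Several equally preferred copies: one is chosen at random.
    return [rng.choice(pool)]


copies = [
    {"storage": "local", "host": "rcas6060.star.bnl.gov", "path": "/data/file1.root"},
    {"storage": "NFS", "path": "/star/data/file1.root"},
]
print(choose_copies(copies, prefer_storage="NFS"))
# → [{'storage': 'NFS', 'path': '/star/data/file1.root'}]
```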
annotation
documentation 
The input element is used to specify data input files. An input can be a path and filename on a
network-mounted disk (such as AFS or NFS), a file on a local disk, or a query to the file catalog.
We suggest using catalog queries, because they give the system more flexibility in allocating
your processes. You can specify more than one input element: you can mix NFS files, local files
and catalog queries, and you can have more than one catalog query. In every case, the location of
the input files is given as a URL.
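The URL forms accepted by the input element (documented under the URL attribute of complexType stdType) can be produced with small helpers like the ones below. The helper names are hypothetical, the paths are placeholders, and joining catalog conditions with "," and no spaces is an assumption; check it against the -cond syntax of the file catalog.

```python
def network_file(path: str) -> str:
    # File visible at the same path on every machine (e.g. NFS, AFS).
    return f"file:{path}"


def local_file(host: str, path: str) -> str:
    # File on the local disk of a machine the scheduler may submit to.
    # `path` must be absolute, so the result reads file://host/path/name.
    return f"file://{host}{path}"


def file_list(path: str) -> str:
    # Text file on a network disk listing the files to analyze.
    return f"filelist:{path}"


def catalog_query(conditions: dict, catalog: str = "star.bnl.gov") -> str:
    # keyword=value pairs forwarded to the file catalog.
    query = ",".join(f"{key}={value}" for key, value in conditions.items())
    return f"catalog:{catalog}?{query}"


print(local_file("rcas6060.star.bnl.gov", "/data/file1.root"))
# → file://rcas6060.star.bnl.gov/data/file1.root
print(catalog_query({"keyword1": "value1", "storage": "NFS"}))
# → catalog:star.bnl.gov?keyword1=value1,storage=NFS
```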


element job

children command stdin stdout stderr input output
attributes
Name  Type  Use  Default  Fixed  Annotation
simulateSubmission  xs:boolean  optional      
documentation 
Tells the scheduler whether to dispatch the actual jobs. If true, the script files are
created but not actually submitted. This is useful to check whether everything
is working correctly.
name  xs:string  optional      
documentation 
Gives the job a name by which it can be identified in the underlying batch system.
mail  xs:boolean  optional  false    
documentation 
Tells the scheduler whether to allow the submission of a job that returns its output
by mail. If this is not set, or is not equal to true, the scheduler will fail if no stdout
was specified. This option prevents users from accidentally mailing themselves all
the output.
nProcesses  xs:long  optional      
documentation [New field]
minFilesPerProcess  xs:long  optional      
documentation 
Tells the scheduler the minimum number of files each process should run on. The
scheduler will do its best to satisfy this requirement, but it is not guaranteed to succeed.
If no distribution satisfying it is found, the user will be asked to validate the proposed one.
maxFilesPerProcess  xs:long  optional      
documentation 
Tells the scheduler the maximum number of input files to assign to each process.
This number should represent the largest number of files that your program, by design,
can handle in one process (e.g. beyond 150 files, memory use grows too much because of
a memory leak). The actual number of files dispatched to each process is decided by
the scheduler, which takes into account user requirements (i.e. minFilesPerProcess,
maxFilesPerProcess and filesPerHour) and farm resources (e.g. the length of the different queues).
filesPerHour  xs:double  optional      
documentation 
Tells the scheduler how many input files per hour the job is going to analyze.
The scheduler uses this information to estimate the job's execution time, which
is necessary to use resources correctly (e.g. to choose the long or short queue).
By combining filesPerHour and minFilesPerProcess, you effectively tell the scheduler
the minimum time required by your job (minFilesPerProcess / filesPerHour hours),
and can force the use of long queues. If this attribute is not provided, the job is
assumed to be instantaneous (i.e. each process will be dispatched to the short queue
no matter how many input files it has).
fileListSyntax  xs:NMTOKEN  optional  paths    
documentation 
This attribute tells the scheduler which syntax to use for the file list. The schema
allows only a few values; currently these are "paths" and "rootd".

"paths" syntax returns both files on local disk and on distributed disk as a normal
path used by the filesystem. This syntax is useful within scripts. The "paths"
syntax looks like this /path1/file1 /path2/file2 /path3/file3 ...

"rootd" syntax returns files on distributed disk with paths, and files on local disk
with the rootd syntax. It also appends the number of events contained in each
file. This file syntax is designed to work with the MuDST makers, and has two
advantages:
(1) It allows ROOT to access files that are on the local disk of a different node, making
it possible to guarantee the minFilesPerProcess requirement.
(2) By giving the number of events in the files, the MuDST maker doesn't have to
pre-scan the files, which slightly improves performance.
The "rootd" syntax looks like this /NFSpath1/file1 nEvents1 /NFSpath2/file2 nEvents2
root://machine//path3/file3 nEvents3 root://machine//path4/file4 nEvents4 ...
inputOrder  xs:string  optional      
documentation 
This attribute tells the scheduler that you want your input files ordered according
to the value of some catalog attribute. This does not guarantee that the file lists are
contiguous: there can always be gaps. The file lists are only reordered after they
are produced. This option is available only if all the inputs are catalog queries.
minStorageSpace  xs:long  optional      
documentation Tells the scheduler the minimum storage space (most likely disk) a job will
need to run. A job should not be scheduled on a node having less space
than this number.
maxStorageSpace  xs:long  optional      
documentation Tells the scheduler the maximum storage space (most likely disk) a job will
need to run. If it is not specified, the job may fail for lack of space.
This value may be used for advance reservation of storage space, and
is necessary to determine the correct usage of resources.
minMemory  xs:long  optional      
documentation Minimum memory expected for an individual job (in MB).
Setting this value will affect the scheduling priority.
maxMemory  xs:long  optional      
documentation Maximum memory an individual job is expected to use (in MB).
Setting this value will affect the scheduling priority.
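A program that reads the file list produced by the scheduler has to parse whichever syntax fileListSyntax selected. The parser below is a sketch that assumes entries are separated by whitespace (spaces or newlines); the function names and event counts are hypothetical.

```python
def parse_paths(text: str) -> list[str]:
    # "paths": /path1/file1 /path2/file2 ...
    return text.split()


def parse_rootd(text: str) -> list[tuple[str, int]]:
    # "rootd": each file is followed by its number of events.
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("rootd file list must alternate file and nEvents")
    return [(tokens[i], int(tokens[i + 1])) for i in range(0, len(tokens), 2)]


def is_remote(entry: str) -> bool:
    # Files on another node's local disk use the root://machine//path form.
    return entry.startswith("root://")


listing = "/NFSpath1/file1 1200 root://machine//path3/file3 800"
for path, n_events in parse_rootd(listing):
    print(path, n_events, "rootd" if is_remote(path) else "network path")
```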
annotation
documentation 
The root of the scheduler submission description schema is the "job" element. This element MUST be present, and
all other specifications relate to it. Many characteristics of the job are defined via the attributes
documented in this section.
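A minimal job description can be assembled programmatically. The sketch below uses Python's standard xml.etree.ElementTree; the helper names, log path and catalog condition are hypothetical, and writing the children in the order listed for the job element (command, stdin, stdout, stderr, input, output) is an assumption about what the schema requires. A stdout is always included because mail defaults to false, in which case the scheduler fails without one.

```python
import xml.etree.ElementTree as ET


def xml_value(value) -> str:
    # xs:boolean attributes expect lowercase "true"/"false".
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_job(command: str, stdout_url: str, input_urls: list[str], **attributes) -> str:
    """Hypothetical helper: return a <job> description as an XML string."""
    job = ET.Element("job", {name: xml_value(value) for name, value in attributes.items()})
    # Children in the listed order; stdin, stderr and output are omitted here.
    ET.SubElement(job, "command").text = command
    ET.SubElement(job, "stdout", URL=stdout_url)
    for url in input_urls:
        ET.SubElement(job, "input", URL=url)
    return ET.tostring(job, encoding="unicode")


print(build_job(
    command="cat $FILELIST",
    stdout_url="file:/star/u/someuser/logs/$JOBID.log",
    input_urls=["catalog:star.bnl.gov?keyword1=value1,storage=NFS"],
    simulateSubmission=True,
    maxFilesPerProcess=100,
    filesPerHour=50,
))
```

With maxFilesPerProcess=100 and filesPerHour=50, the scheduler's estimate for a full process is 100 / 50 = 2 hours. Because simulateSubmission is true, submitting this description would create the scripts without dispatching them.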


element job/stdin

type stdType
attributes
Name  Type  Use  Default  Fixed  Annotation
URL  xs:anyURI  required      
discard  xs:boolean  optional      
annotation
documentation Standard input


element job/stdout

type extension of stdType
attributes
Name  Type  Use  Default  Fixed  Annotation
URL  xs:anyURI  required      
discard  xs:boolean  optional      
annotation
documentation Standard output


element job/stderr

type stdType
attributes
Name  Type  Use  Default  Fixed  Annotation
URL  xs:anyURI  required      
discard  xs:boolean  optional      
annotation
documentation Standard error


element output

children copy register
used by
element job
attributes
Name  Type  Use  Default  Fixed  Annotation
fromScratch  xs:string  optional      
documentation 
With this attribute you specify either a file, a wildcard or a directory to be copied
back. The file, wildcard or directory must be expressed relative to the $SCRATCH
directory.

For example, to retrieve all the .root files your job saved in $SCRATCH, simply use
*.root in this attribute.
toURL  xs:anyURI  optional      
documentation 
Tells the scheduler where to copy the output. The URL must represent either a
network file or directory.

Network file. To specify a file, you should write "file:/path/name". You can specify
a file here only if the output you specified is a file (you are not allowed to copy a
directory into one file). You can give the file a different name, in which case it will be
copied back under that name.

Network directory. To specify a directory, you should write "file:/path/".
annotation
documentation 
The output element is used to specify the output produced by your code. With this tag, you can write
your output to a local scratch directory on the node the job is dispatched to, and the scheduler will copy
your output back at the end of the job. This makes better use of I/O resources. The environment variable $SCRATCH
contains a path available to your job. This space is unique for each process your job is divided into, and
is deleted after your job ends. With the output element you specify which files you want to bring
back. You don't need to bring everything back, of course, but anything you leave behind will no longer be
available afterwards.

Remember that your job will be divided into different processes, and that all the processes should use different
output filenames; otherwise they will overwrite each other's outputs. You can always use $JOBID to create unique
filenames.
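An output element combines fromScratch with toURL. The hypothetical helper below builds one and applies the rule stated for toURL: a file URL is accepted only when the output is a single file. Recognizing a directory URL by its trailing "/" follows the "file:/path/" form; treating any wildcard in fromScratch as "possibly more than one file" is a simplifying assumption, and a fromScratch naming a directory cannot be detected from the string alone.

```python
import xml.etree.ElementTree as ET

WILDCARD_CHARS = set("*?[")


def output_element(from_scratch: str, to_url: str) -> ET.Element:
    """Hypothetical helper: build <output fromScratch=... toURL=.../>."""
    if not to_url.startswith("file:"):
        raise ValueError("toURL must be a network file or directory (file:...)")
    # "file:/path/" (trailing slash) names a directory, "file:/path/name" a file.
    to_directory = to_url.endswith("/")
    # Simplification: a wildcard may match several files, so it needs a directory.
    if not to_directory and WILDCARD_CHARS & set(from_scratch):
        raise ValueError("a wildcard can only be copied back to a directory")
    return ET.Element("output", fromScratch=from_scratch, toURL=to_url)


out = output_element("*.root", "file:/star/data/someuser/results/")
print(ET.tostring(out, encoding="unicode"))
```

Copying the outputs of several processes into one directory still requires distinct file names, for example built from $JOBID.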


element output/copy

type extension of mapType
attributes
Name  Type  Use  Default  Fixed  Annotation
ref  xs:ID  optional      
documentation The reference for this object
idref  xs:IDREF  optional      
documentation A reference to another object
URI  xs:anyURI  optional      
documentation A URI describing the final product.
storageService  xs:string  optional      
annotation
documentation 
A service that physically copies a file from A to B.


element output/register

type mapType
attributes
Name  Type  Use  Default  Fixed  Annotation
ref  xs:ID  optional      
documentation The reference for this object
idref  xs:IDREF  optional      
documentation A reference to another object
URI  xs:anyURI  optional      
documentation A URI describing the final product.
annotation
documentation 
A registration service for datasets or files (physical or logical)


complexType mapType

used by
elements output/copy output/register
attributes
Name  Type  Use  Default  Fixed  Annotation
ref  xs:ID  optional      
documentation The reference for this object
idref  xs:IDREF  optional      
documentation A reference to another object
URI  xs:anyURI  optional      
documentation A URI describing the final product.
annotation
documentation 
Describes any action to be performed on an input/output. This complex type has a reference (ref)
and a pointer to a reference (idref). These references should be viewed (and used) as references
to the URI attribute. For example, output has an ID (attribute ref); copy may refer to its
value via an IDREF (attribute idref) and define a final target via the URI, which becomes the
reference (ref) for that action.


complexType stdType

used by
elements job/stderr job/stdin job/stdout
attributes
Name  Type  Use  Default  Fixed  Annotation
URL  xs:anyURI  required      
documentation 
This complex type tells the scheduler which file to redirect the standard input, output
or error to. The URL must use the file protocol (that is, it must name a file accessible
through the local file system), and the file should be visible from all machines (for
example, a file on NFS or AFS).

Remember that stdout and stderr must be different for every process; otherwise
all the processes into which the scheduler divides your job will overwrite the
same file. To achieve this, you can use the $JOBID environment variable in the file name.

For the input element, the URL tells the scheduler which input files to associate with the processes.

Network file. To specify a file that is accessible on all machines at the same file path,
write "file:/path/name".

File on local disk. To specify a file that resides on a target machine (that is, a
machine to which the scheduler is allowed to submit the job), write
"file://host/path/name".

Filelist on network disk. You can specify a text file containing a list
of files on which to run your analysis. Write "filelist:/path/name".

Catalog query. To specify a query to the file catalog, write
"catalog:star.bnl.gov?query". The prefix catalog: marks the URL as a catalog query;
star.bnl.gov indicates that you are querying the STAR catalog at BNL; and query is
the actual query: comma-separated keyword=value pairs
("keyword1=value1, keyword2=value2") that are forwarded to the file
catalog. The syntax is the same as that accepted by the -cond parameter of the
file catalog command-line interface.
discard  xs:boolean  optional      
documentation 
The discard attribute tells the scheduler to discard the stream, that is, to get
rid of it. This attribute is meaningful only for stdout and stderr (and is ignored
otherwise).

Be careful when using this option: when using the GRID you don't know where
your job is going to run, and the standard output/error are crucial to understanding
what went wrong.
annotation
documentation I/O streams: "stdin" "stdout" "stderr" elements.
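To see why the $JOBID advice above matters, the sketch below mimics the substitution each process would see for its stdout URL. The JOBID values and paths are made up for illustration (the real JOBID format is chosen by the scheduler), and plain string replacement stands in for csh substitution.

```python
def per_process_urls(url_template: str, job_ids: list[str]) -> list[str]:
    # Each process sees its own $JOBID; mimic that with string replacement.
    return [url_template.replace("$JOBID", job_id) for job_id in job_ids]


job_ids = ["A1B2_0", "A1B2_1", "A1B2_2"]  # hypothetical JOBID values
shared = per_process_urls("file:/star/u/someuser/logs/job.out", job_ids)
unique = per_process_urls("file:/star/u/someuser/logs/$JOBID.out", job_ids)
print(len(set(shared)))  # → 1: every process would overwrite the same file
print(len(set(unique)))  # → 3: one stdout file per process
```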



XML Schema documentation generated with
XMLSPY Schema Editor http://www.altova.com/xmlspy