XML JDL Schema
Submitted by lbhajdu on Tue, 2008-04-29 17:52.
| diagram |  | | type | xs:string | | used by | | | annotation | | documentation | The command element has no attributes. Its content is the actual command script, which is submitted through a csh script. You can use environment variables, such as JOBID and FILELIST among others, to retrieve job-specific information. Remember that the command line is passed as is: if csh does not perform a substitution (for example, because the part of your command containing the variable is enclosed in '...'), the scheduler will not perform it either. Refer to the csh man pages for details. If you have doubts about whether your command will execute correctly, you can simulate the submission and check the generated script manually. |
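To illustrate the quoting caveat, here is a minimal sketch of two alternative forms of a command element. The script name `runAnalysis.csh` is hypothetical and only serves the example; `$FILELIST` is the variable mentioned above. The two forms are alternatives and are not meant to appear together in one job.

```xml
<!-- csh expands $FILELIST before the (hypothetical) script runAnalysis.csh is called -->
<command>./runAnalysis.csh $FILELIST</command>

<!-- Single quotes stop csh from expanding the variable:
     the script receives the literal text $FILELIST -->
<command>./runAnalysis.csh '$FILELIST'</command>
```

Simulating the submission and reading the generated script is a quick way to see which of the two behaviors a given command line will get.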
| | diagram |  | | used by | | | attributes | | Name | Type | Use | Default | Fixed | Annotation | | URL | xs:string | required | | | | | nFiles | xs:string | optional | 100 | | | documentation | The number of files returned by the query |
| | singleCopy | xs:boolean | optional | true | | | documentation | Specify if the query should return one copy for each file, or it should return multiple copies if they are available.
For example, suppose one file has two copies: one on rcas6060.star.bnl.gov and one on NFS. By selecting "true", only one of them is returned. By selecting "false", both of them can be returned. In the second case, your job will actually run on two copies of the same file.
By default only one copy of the same file is returned. |
| | preferStorage | xs:NMTOKEN | optional | | | | documentation | When multiple copies are available for one file, this attribute is used to choose which particular copy to get. This attribute has meaning only if singleCopy is not set to false. If more than one copy is available with the preferred storage (for example, a file is available on two different machines), one copy is chosen at random.
This attribute was introduced because small jobs on a small set of files are penalized when dispatched on local files: they have to wait for a particular machine to free up, and that might take a long time even if the rest of the farm is free. Executing on local files makes each job faster, but it might increase the waiting time before the job gets executed. Therefore, NFS is recommended only for testing your analysis on a small set, and local storage when you run on the entire set.
Remember that the query might return local files even if you chose NFS. If you want _only_ NFS or local files, then put "storage=NFS" inside your query. |
|
| | annotation | | documentation | The input element is used to specify data input files. An input can be a path and filename on a network-mounted disk, such as AFS or NFS; a file on a local disk; or a query to the file catalog. We suggest that you use the latter, because it gives the system more flexibility in how to allocate your processes. You can specify more than one input element. You can mix NFS files with local files and catalog queries, and you can have more than one catalog query. In every case, the location of the input files is given as a URL. |
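The different kinds of input can be sketched as follows. The URL forms are the ones given in this document's URL documentation; the paths, the host and the `keyword=value` pairs are placeholders, and the attribute values chosen for the catalog query are only illustrative.

```xml
<!-- Network file, visible under the same path on every machine -->
<input URL="file:/path/name" />

<!-- File resident on the local disk of a target machine -->
<input URL="file://host/path/name" />

<!-- Text file that lists the input files -->
<input URL="filelist:/path/name" />

<!-- Catalog query: 20 files requested, one copy per file, NFS copies preferred -->
<input URL="catalog:star.bnl.gov?keyword1=value1,keyword2=value2"
       nFiles="20" singleCopy="true" preferStorage="NFS" />
```

Several of these elements can appear in the same job, since file inputs and catalog queries can be mixed.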
| | diagram |  | | children | command stdin stdout stderr input output | | attributes | | Name | Type | Use | Default | Fixed | Annotation | | simulateSubmission | xs:boolean | optional | | | | documentation | Tells the scheduler whether to dispatch the actual jobs. If true, the file scripts are created, but they are not actually submitted. This is useful to check whether everything is functioning correctly. |
| | name | xs:string | optional | | | | documentation | Gives the job a name by which it can be identified in the underlying batch system. |
| | mail | xs:boolean | optional | false | | | documentation | Tells the scheduler whether to allow the submission of a job that will return its output by mail. If this is not set, or is not equal to true, the scheduler will fail if a stdout wasn't specified. This option exists to prevent users from accidentally mailing themselves all the output. |
| | nProcesses | xs:long | optional | | | | documentation | [New field] |
| | minFilesPerProcess | xs:long | optional | | | | documentation | Tells the scheduler the minimum number of files each process should run on. The scheduler will do its best to keep this requirement, but it's not guaranteed to succeed. If a correct distribution is not found, the user will be asked to validate it. |
| | maxFilesPerProcess | xs:long | optional | | | | documentation | Tells the scheduler the maximum number of input files to assign to each process. This number should be the limit that your program, by design, must not exceed (e.g. after 150 files memory use has increased too much due to a memory leak). The actual number of files dispatched to the process is decided by the scheduler, which takes into account user requirements (i.e. minFiles, maxFiles and filesPerHour) and farm resources (i.e. length of the different queues). |
| | filesPerHour | xs:double | optional | | | | documentation | Tells the scheduler how many input files per hour the job is going to analyze. This information is used by the scheduler to estimate the job execution time, which is necessary to determine the correct usage of resources (e.g. use the long or short queue). By combining filesPerHour and minFilesPerProcess, you can basically tell the scheduler the minimum time required by your job, and force the use of long queues. If this attribute is not provided, the job is assumed to be instantaneous (e.g. the processes will be dispatched to the short queue no matter how many input files they have). |
| | fileListSyntax | xs:NMTOKEN | optional | paths | | | documentation | This attribute tells the scheduler which syntax to use for the file list. The schema allows only a few values, currently: paths, rootd.
"paths" syntax returns both files on local disk and on distributed disk as a normal path used by the filesystem. This syntax is useful within scripts. The "paths" syntax looks like this /path1/file1 /path2/file2 /path3/file3 ...
"rootd" syntax returns files on distributed disk with paths, and files on local disk with the rootd syntax. It also appends the number of events contained in each file. This file syntax is designed to work with the MuDST makers, and has two advantages: (1) It allows root to access files that are on the local disk of a different node, making it possible to guarantee the minFilesPerProcess (2) By giving the number of events in the files, the MuDST maker doesn't have to pre-scan the files, slightly improving performance. The "rootd" syntax looks like this /NFSpath1/file1 nEvents1 /NFSpath2/file2 nEvents2 root://machine//path3/file3 nEvents3 root://machine//path4/file4 nEvents4 ... |
| | inputOrder | xs:string | optional | | | | documentation | This attribute tells the scheduler that you want your input files ordered according to the value of some catalog attribute. This does not guarantee that the file lists are always in sequence: there can always be gaps. It only reorders the file lists after they are produced. This option is available only if all the inputs are catalog queries. |
| | minStorageSpace | xs:long | optional | | | | documentation | Tells the scheduler the minimal storage space (disk most likely) a job will need to run. A job should not be scheduled on a node having less space than this specified number. |
| | maxStorageSpace | xs:long | optional | | | | documentation | Tells the scheduler the maximum storage space (disk most likely) a job will need to run. If not specified, the job may fail if it does not have enough space. This value may be used for advance reservation of storage space. This is necessary to determine the correct usage of resources. |
| | minMemory | xs:long | optional | | | | documentation | Minimum memory expected for an individual job (in MB). Setting this value will affect the scheduling priority. |
| | maxMemory | xs:long | optional | | | | documentation | Maximum memory an individual job is expected to use (in MB). Setting this value will affect the scheduling priority. |
|
| | annotation | | documentation | The top of the Scheduler submission description schema is the "job" element. This element MUST be present, and all other specifications relate to it. The "job" element has many characteristics defined via the attributes documented herein. |
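Putting the pieces together, a complete job description might look like the sketch below. It uses only attributes and child elements listed in this schema, in the documented child order; the job name, the script `runAnalysis.csh`, all paths and the query keywords are hypothetical placeholders.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!-- With simulateSubmission set to true, the scripts are generated
     but nothing is dispatched, so the result can be inspected first -->
<job name="exampleJob" simulateSubmission="true"
     minFilesPerProcess="10" maxFilesPerProcess="50" filesPerHour="25"
     fileListSyntax="paths">
  <!-- runAnalysis.csh is a hypothetical user script -->
  <command>./runAnalysis.csh $FILELIST</command>
  <!-- $JOBID keeps the log files of different processes apart -->
  <stdout URL="file:/path/logs/$JOBID.out" />
  <stderr URL="file:/path/logs/$JOBID.err" />
  <input URL="catalog:star.bnl.gov?keyword1=value1,keyword2=value2" nFiles="100" />
  <output fromScratch="*.root" toURL="file:/path/output/" />
</job>
```

If the execution time estimate is simply the number of files divided by filesPerHour (an assumption, since the exact formula is not given here), a process holding 50 files would be estimated at about two hours with these values.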
| | diagram |  | | type | stdType | | attributes | | Name | Type | Use | Default | Fixed | Annotation | | URL | xs:anyURI | required | | | | documentation | This complex type tells the scheduler which file to redirect the standard input, output or error to. The URL must use the file protocol (that is, it must name a file accessible through the file system), and it should be visible on all machines (for example, a file on NFS or AFS).
Remember that the stdout and the stderr must be different for every process; otherwise all the processes into which the scheduler divides your job will overwrite the same file. To achieve that, you can use the $JOBID environment variable.
For the input element, the URL tells the scheduler which input files to associate with the processes.
Network file. To specify a file that is accessible on all machines on the same file path, you should write "file:/path/name".
File on local disk. To specify a file that is resident on a target machine, that is a machine on which the scheduler is allowed to submit the job, you should write "file://host/path/name".
Filelist on Network disk. You can specify a text file that is going to contain a list of files on which to run your analysis. You should write "filelist:/path/name".
Catalog query. To specify a query to the file catalog, you should write "catalog:star.bnl.gov?query". catalog: tells that the URL is a catalog query; star.bnl.gov tells you are querying the catalog for star at BNL, and query is the actual query. The query is a comma separated keyword value pair ("keyword1=value1, keyword2=value2") that will be forwarded to the file catalog. The syntax is the same allowed for the command line interface of the file catalog at the -cond parameter. |
| | discard | xs:boolean | optional | | | | documentation | The discard attribute tells the scheduler to discard the stream, that is, to get rid of it. This attribute is meaningful only for stdout and stderr (and is ignored otherwise).
Be careful when using this option: when using the GRID you don't know where your job is going to run, and the standard output/error are crucial to understand what went wrong. |
|
| | annotation | | documentation | Standard input |
| | diagram |  | | type | extension of stdType | | attributes | | Name | Type | Use | Default | Fixed | Annotation | | URL | xs:anyURI | required | | | | documentation | This complex type tells the scheduler which file to redirect the standard input, output or error to. The URL must use the file protocol (that is, it must name a file accessible through the file system), and it should be visible on all machines (for example, a file on NFS or AFS).
Remember that the stdout and the stderr must be different for every process; otherwise all the processes into which the scheduler divides your job will overwrite the same file. To achieve that, you can use the $JOBID environment variable.
For the input element, the URL tells the scheduler which input files to associate with the processes.
Network file. To specify a file that is accessible on all machines on the same file path, you should write "file:/path/name".
File on local disk. To specify a file that is resident on a target machine, that is a machine on which the scheduler is allowed to submit the job, you should write "file://host/path/name".
Filelist on Network disk. You can specify a text file that is going to contain a list of files on which to run your analysis. You should write "filelist:/path/name".
Catalog query. To specify a query to the file catalog, you should write "catalog:star.bnl.gov?query". catalog: tells that the URL is a catalog query; star.bnl.gov tells you are querying the catalog for star at BNL, and query is the actual query. The query is a comma separated keyword value pair ("keyword1=value1, keyword2=value2") that will be forwarded to the file catalog. The syntax is the same allowed for the command line interface of the file catalog at the -cond parameter. |
| | discard | xs:boolean | optional | | | | documentation | The discard attribute tells the scheduler to discard the stream, that is, to get rid of it. This attribute is meaningful only for stdout and stderr (and is ignored otherwise).
Be careful when using this option: when using the GRID you don't know where your job is going to run, and the standard output/error are crucial to understand what went wrong. |
|
| | annotation | | documentation | Standard output |
| | diagram |  | | type | stdType | | attributes | | Name | Type | Use | Default | Fixed | Annotation | | URL | xs:anyURI | required | | | | documentation | This complex type tells the scheduler which file to redirect the standard input, output or error to. The URL must use the file protocol (that is, it must name a file accessible through the file system), and it should be visible on all machines (for example, a file on NFS or AFS).
Remember that the stdout and the stderr must be different for every process; otherwise all the processes into which the scheduler divides your job will overwrite the same file. To achieve that, you can use the $JOBID environment variable.
For the input element, the URL tells the scheduler which input files to associate with the processes.
Network file. To specify a file that is accessible on all machines on the same file path, you should write "file:/path/name".
File on local disk. To specify a file that is resident on a target machine, that is a machine on which the scheduler is allowed to submit the job, you should write "file://host/path/name".
Filelist on Network disk. You can specify a text file that is going to contain a list of files on which to run your analysis. You should write "filelist:/path/name".
Catalog query. To specify a query to the file catalog, you should write "catalog:star.bnl.gov?query". catalog: tells that the URL is a catalog query; star.bnl.gov tells you are querying the catalog for star at BNL, and query is the actual query. The query is a comma separated keyword value pair ("keyword1=value1, keyword2=value2") that will be forwarded to the file catalog. The syntax is the same allowed for the command line interface of the file catalog at the -cond parameter. |
| | discard | xs:boolean | optional | | | | documentation | The discard attribute tells the scheduler to discard the stream, that is, to get rid of it. This attribute is meaningful only for stdout and stderr (and is ignored otherwise).
Be careful when using this option: when using the GRID you don't know where your job is going to run, and the standard output/error are crucial to understand what went wrong. |
|
| | annotation | | documentation | Standard error |
| | diagram |  | | children | copy register | | used by | | | attributes | | Name | Type | Use | Default | Fixed | Annotation | | fromScratch | xs:string | optional | | | | documentation | With this attribute you specify either a file, a wildcard or a directory to be copied back. The file, wildcard or directory must be expressed relative to the $SCRATCH directory.
That is, to retrieve all the .root files your job saved in the $SCRATCH, simply use *.root in this attribute |
| | toURL | xs:anyURI | optional | | | | documentation | Tells the scheduler where to copy the output. The URL must represent either a network file or directory.
Network file. To specify a file, you should write "file:/path/name". You can specify a file here only if the output you specified is a file (you are not allowed to copy a directory into a single file). You can specify a different name, so that the file is brought back under that name.
Network directory. To specify a directory, you should write "file:/path/". |
|
| | annotation | | documentation | The output element is used to specify the output produced by your code. With this tag, you can write your output to a local scratch directory on the node the job is dispatched to, and the scheduler will copy your output at the end of the job. This makes better use of I/O resources. The environment variable $SCRATCH contains a path available to your job. This space is unique for each process your job is divided into, and it is deleted after your job ends. With the output element you specify which files you want to bring back. You don't need to bring everything back, of course, but whatever is left behind will no longer be available afterwards.
Remember that your job will be divided into different processes, and that all the processes should use different output filenames; otherwise they will overwrite each other's output. You can always use the $JOBID to create unique filenames. |
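As a minimal sketch of this mechanism, the fragment below writes a result file into the scratch area and asks the scheduler to copy it back. The script `runAnalysis.csh`, its second argument and the target directory are hypothetical; only `$SCRATCH`, `$JOBID`, `$FILELIST`, `fromScratch` and `toURL` come from this document.

```xml
<!-- The (hypothetical) script writes its result into the per-process scratch
     area; $JOBID in the file name keeps the processes from colliding -->
<command>./runAnalysis.csh $FILELIST $SCRATCH/result_$JOBID.root</command>

<!-- At the end of the job, copy every .root file found in $SCRATCH
     to a network directory (note the trailing slash for a directory) -->
<output fromScratch="*.root" toURL="file:/path/output/" />
```

Because fromScratch is relative to $SCRATCH, the wildcard does not need to repeat the scratch path.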
| | diagram |  | | type | extension of mapType | | attributes | | Name | Type | Use | Default | Fixed | Annotation | | ref | xs:ID | optional | | | | documentation | The reference for this object |
| | idref | xs:IDREF | optional | | | | documentation | A reference to another object |
| | URI | xs:anyURI | optional | | | | documentation | A URI describing the final product. |
| | storageService | xs:string | optional | | | |
| | annotation | | documentation | A physical file copy from A to B service. |
| | diagram |  | | type | mapType | | attributes | | Name | Type | Use | Default | Fixed | Annotation | | ref | xs:ID | optional | | | | documentation | The reference for this object |
| | idref | xs:IDREF | optional | | | | documentation | A reference to another object |
| | URI | xs:anyURI | optional | | | | documentation | A URI describing the final product. |
|
| | annotation | | documentation | A registration service for datasets or files (physical or logical) |
| | diagram |  | | used by | | | attributes | | Name | Type | Use | Default | Fixed | Annotation | | ref | xs:ID | optional | | | | documentation | The reference for this object |
| | idref | xs:IDREF | optional | | | | documentation | A reference to another object |
| | URI | xs:anyURI | optional | | | | documentation | A URI describing the final product. |
|
| | annotation | | documentation | Describes any action to be done on an input/output. This complex type has a reference (ref) and a pointer to a reference (idref). Those references should be viewed (and used) as references to the URI attribute. For example, output has an ID (attribute name ref); copy may refer to its value via an IDREF (attribute idref) and define a final target via the URI, which becomes the reference (ref) for that action. |
| | diagram |  | | used by | | | attributes | | Name | Type | Use | Default | Fixed | Annotation | | URL | xs:anyURI | required | | | | documentation | This complex type tells the scheduler which file to redirect the standard input, output or error to. The URL must use the file protocol (that is, it must name a file accessible through the file system), and it should be visible on all machines (for example, a file on NFS or AFS).
Remember that the stdout and the stderr must be different for every process; otherwise all the processes into which the scheduler divides your job will overwrite the same file. To achieve that, you can use the $JOBID environment variable.
For the input element, the URL tells the scheduler which input files to associate with the processes.
Network file. To specify a file that is accessible on all machines on the same file path, you should write "file:/path/name".
File on local disk. To specify a file that is resident on a target machine, that is a machine on which the scheduler is allowed to submit the job, you should write "file://host/path/name".
Filelist on Network disk. You can specify a text file that is going to contain a list of files on which to run your analysis. You should write "filelist:/path/name".
Catalog query. To specify a query to the file catalog, you should write "catalog:star.bnl.gov?query". catalog: tells that the URL is a catalog query; star.bnl.gov tells you are querying the catalog for star at BNL, and query is the actual query. The query is a comma separated keyword value pair ("keyword1=value1, keyword2=value2") that will be forwarded to the file catalog. The syntax is the same allowed for the command line interface of the file catalog at the -cond parameter. |
| | discard | xs:boolean | optional | | | | documentation | The discard attribute tells the scheduler to discard the stream, that is, to get rid of it. This attribute is meaningful only for stdout and stderr (and is ignored otherwise).
Be careful when using this option: when using the GRID you don't know where your job is going to run, and the standard output/error are crucial to understand what went wrong. |
|
| | annotation | | documentation | I/O streams: "stdin" "stdout" "stderr" elements. |
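The three stream elements might be used as in the sketch below. The paths are placeholders. The URL attribute is kept on the discarded stream because the attribute table marks it as required; whether the scheduler actually needs it when discard is true is not stated here.

```xml
<!-- Standard input read from a network file -->
<stdin URL="file:/path/name" />

<!-- One stdout file per process, made unique with $JOBID -->
<stdout URL="file:/path/logs/$JOBID.out" />

<!-- Standard error thrown away; use with care, since this stream
     is often the only clue to what went wrong on a remote node -->
<stderr URL="file:/path/logs/$JOBID.err" discard="true" />
```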
| XML Schema documentation generated with XMLSPY Schema Editor http://www.altova.com/xmlspy
|