A group of requests, a description of how the requests are organized into
a logical sequence of events (AND, OR, XOR, etc. relationships), and the
datasets that may be used by the requests.
MergeRequests { id=[xs:ID] }
    RequestID { idref=[xs:IDREF] }
    ....
ChooseBetweenRequests { id=[xs:ID] }
    RequestID { idref=[xs:IDREF] }
SetupRequestDependancy { id=[xs:ID] }
    RequestToBeDoneFirstID { idref=[xs:IDREF] }
    RequestWaitingForInputID { idref=[xs:IDREF] }
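One way a scheduler might interpret these grouping elements is sketched below. It assumes a hypothetical in-memory form in which each SetupRequestDependancy becomes a (first, waiting) pair. Ordering is done with Python's standard `graphlib.TopologicalSorter`, and ChooseBetweenRequests is read as "pick exactly one runnable candidate" (XOR). The function names are illustrative only and are not part of U-JDL.

```python
from graphlib import TopologicalSorter

# Hypothetical interpretation of the grouping elements:
# - MergeRequests (AND): every listed request runs; here that is the full 'requests' list.
# - ChooseBetweenRequests (XOR): exactly one candidate is chosen.
# - SetupRequestDependancy: (first, waiting) pairs impose an execution order.

def execution_order(requests, dependencies):
    """Return requests in an order that respects every (first, waiting) pair."""
    sorter = TopologicalSorter({r: set() for r in requests})
    for first, waiting in dependencies:
        # 'waiting' needs the output of 'first', so 'first' is its predecessor
        sorter.add(waiting, first)
    # static_order() raises graphlib.CycleError if the dependencies form a loop
    return list(sorter.static_order())

def choose_one(candidates, is_available):
    """XOR semantics: return the first candidate that can run, or None."""
    return next((c for c in candidates if is_available(c)), None)
```

For example, `execution_order(["reco", "sim", "ana"], [("sim", "reco"), ("reco", "ana")])` yields `["sim", "reco", "ana"]`.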
For easier handling of initialization, end-of-task garbage collection and task
wrap-up, we may want to define special initialization/finalization requests.
These could be defined once and reused for every job generated by a
single request.
Also, there may be a need for a single request (task) that runs once for all
jobs rather than once for each job.
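The difference between per-job and once-for-all initialization/finalization can be sketched as follows. This is a hypothetical expansion: the function `expand` and its parameter names are illustrative and not defined by U-JDL. Each inner list is the sequence of steps one job would run.

```python
def expand(request, n_jobs, per_job_init=None, per_job_final=None,
           once_init=None, once_final=None):
    """Expand one request into per-job step lists (hypothetical model).

    per_job_init / per_job_final wrap every generated job;
    once_init / once_final run a single time for the whole set of jobs.
    """
    steps = []
    if once_init:
        steps.append([once_init])           # runs once, before any job
    for i in range(n_jobs):
        job = [f"{request}[{i}]"]
        if per_job_init:
            job.insert(0, per_job_init)     # same init request reused by every job
        if per_job_final:
            job.append(per_job_final)       # same wrap-up request reused by every job
        steps.append(job)
    if once_final:
        steps.append([once_final])          # runs once, after all jobs
    return steps
```

With `expand("reco", 2, "setup", "cleanup", "stage-in", "merge-out")`, "setup" and "cleanup" wrap both jobs while "stage-in" and "merge-out" appear only once.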
This element may be removed later (it is only a placeholder for later consideration).
Ideally, outputs would themselves be datasets; currently, input and output are
not symmetrical.
Example: an XXTask will depend on (or reference) an XXX application. The purpose
is to allow versioning.
In the newly proposed U-JDL, application and task are merged.
Either a JAR file or a class name.
The location of the JAR file containing the main class.
The arguments to use when executing the JAR file.
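A minimal sketch of turning these three fields into a command line, assuming a simple rule (hypothetical, not from the schema) that a value ending in `.jar` is a JAR location launched with `java -jar`, and anything else is a class name launched with `java -cp`. The function name `java_command` is illustrative.

```python
def java_command(main, arguments=(), classpath=None):
    """Build an argv list for either a JAR file or a fully qualified class name.

    Assumption: a value ending in '.jar' is a JAR location; otherwise it is
    treated as a main class name. 'classpath' is only used in the class case,
    since 'java -jar' takes its classpath from the JAR manifest.
    """
    if main.endswith(".jar"):
        cmd = ["java", "-jar", main]
    else:
        cmd = ["java"]
        if classpath:
            cmd += ["-cp", classpath]
        cmd.append(main)
    return cmd + list(arguments)
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting issues with the arguments.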
Below are some ideas from DIAL:
Dataset{ identity=[xs:ID],
mutability=[locked,appending],
location=[virtual,logical,physical,staged,mixed],
composite=[xs:boolean]
}
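The DIAL-style Dataset attributes above map naturally onto a typed record. The sketch below is one possible in-memory form (the class and method names are illustrative); the enumeration values are copied from the attribute lists above, so an unknown value such as `Location("tape")` is rejected with a `ValueError`.

```python
from dataclasses import dataclass
from enum import Enum

class Mutability(Enum):
    LOCKED = "locked"
    APPENDING = "appending"

class Location(Enum):
    VIRTUAL = "virtual"
    LOGICAL = "logical"
    PHYSICAL = "physical"
    STAGED = "staged"
    MIXED = "mixed"

@dataclass(frozen=True)
class Dataset:
    identity: str
    mutability: Mutability
    location: Location
    composite: bool = False   # True if built from other DataSets

    def can_append(self) -> bool:
        # Only an 'appending' dataset accepts new content; 'locked' is final.
        return self.mutability is Mutability.APPENDING
```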
Contentlist
ContentKey { type=[xs:string] }
In this case, type should specify what this generic set relates to (physical, logical, event datasets, etc.). Note that because the content model is an unbounded choice, a list is implied as soon as one element has been chosen.
LFN, PFN, ...
Event ID, ...
Reference to other DataSets (merging)
Range may be used for event number ranges. It may have no special significance for physical files. A range does, however, define a collection, and so fits well alongside Catalog and List.
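Because a Range defines a collection, it can be flattened into the same form as a List. The sketch below assumes, hypothetically, that a range is inclusive at both ends; the schema does not say so. `Range` and `expand_content` are illustrative names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    """Hypothetical event-number range, assumed inclusive at both ends."""
    first: int
    last: int

    def __iter__(self):
        return iter(range(self.first, self.last + 1))

def expand_content(entries):
    """Flatten a mix of plain list items and Range entries into one list."""
    out = []
    for e in entries:
        if isinstance(e, Range):
            out.extend(e)        # a range contributes every event number it covers
        else:
            out.append(e)        # a plain item (e.g. an event ID) is kept as-is
    return out
```

For example, `expand_content([Range(1, 3), 7, Range(10, 11)])` gives `[1, 2, 3, 7, 10, 11]`.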
The Catalog element is the connection to any relevant catalog for the DataSet. Here, we envision the handling of Event, MetaData or File catalogs. We may name the DataSets differently for clarity, but the genericDataSet is meant to cover everything as needed. The return value of a Catalog is compatible with a List (otherwise, use a Token).
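The requirement that a Catalog answer be list-compatible means a resolver can treat both entry kinds uniformly. In this sketch the ('list', items) / ('catalog', query) tuple encoding and the `catalog_lookup` callable are hypothetical stand-ins for a real catalog client.

```python
def resolve(entries, catalog_lookup):
    """Turn ('list', [...]) and ('catalog', query) entries into one flat list.

    catalog_lookup is a hypothetical callable: query -> iterable of items.
    """
    result = []
    for kind, value in entries:
        if kind == "list":
            result.extend(value)
        elif kind == "catalog":
            # A catalog answer must be list-compatible; otherwise a Token is needed.
            result.extend(catalog_lookup(value))
        else:
            raise ValueError(f"unsupported entry kind: {kind}")
    return result
```

A dictionary can stand in for a file catalog during testing, e.g. `resolve(entries, lambda q: files.get(q, []))`.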
A token is a referral to an external iterator returning the DataSet of interest. For example, the scheduler may declare to an external service its intent to work on a particular DataSet and be given back a token, which the application will later use to retrieve that DataSet. In the case of planners, the Token mechanism would be more appropriate. A token-based DataSet is most likely virtual (here again, it is not yet clear how to express this constraint).
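The token handshake described above can be sketched with an in-memory stand-in for the external service. `DatasetService`, `declare_intent` and `iterate` are hypothetical names; a real service would be remote, and tokens here are simply random UUID hex strings from the standard `uuid` module.

```python
import uuid

class DatasetService:
    """Hypothetical external service: the scheduler declares intent on a
    dataset and receives a token; the application later redeems the token."""

    def __init__(self, datasets):
        self._datasets = datasets   # dataset name -> items
        self._tokens = {}           # token -> dataset name

    def declare_intent(self, name):
        if name not in self._datasets:
            raise KeyError(name)
        token = uuid.uuid4().hex
        self._tokens[token] = name
        return token

    def iterate(self, token):
        # Generator: items are produced only as the application consumes them,
        # which fits a virtual, token-based DataSet.
        name = self._tokens[token]
        yield from self._datasets[name]
```

The scheduler would call `declare_intent` at submission time and pass only the token to the job; the application then iterates with `iterate(token)`.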
Any text, including command-line arguments.
Environment is an unbounded list of EnvironmentVariable elements. The notion was taken from datasets.dtd, with attributes changed to elements.
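An unbounded list of EnvironmentVariable elements can be applied over a base environment before launching a job. This sketch assumes each element reduces to a (name, value) pair and that, hypothetically, a later entry with the same name overrides an earlier one.

```python
import os

def build_environment(variables, base=None):
    """Merge an ordered list of (name, value) pairs over a base environment.

    Defaults to the current process environment when 'base' is None.
    Later entries override earlier ones with the same name (an assumption).
    """
    env = dict(os.environ if base is None else base)
    for name, value in variables:
        env[name] = value
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run`.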
The number of resource requirements may change in the future.
Minimum may be used for dispatching purposes. A process will not start unless the available resource is above this minimum.
If unspecified, there is no known upper limit. If specified, this value is the maximum quantity of the resource that the job will take.
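The minimum/maximum semantics can be sketched as two checks: one used at dispatch time, one capping the allocation. The class is illustrative; it reads "above this minimum" as "at least the minimum", and assumes that with no maximum the job may take everything available.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResourceRequirement:
    minimum: float = 0.0
    maximum: Optional[float] = None   # None: no known upper limit

    def can_dispatch(self, available: float) -> bool:
        # Interpreted here as "at least the minimum" (the text says "above").
        return available >= self.minimum

    def allocation(self, available: float) -> float:
        # Never take more than 'maximum' when one is given.
        if self.maximum is None:
            return available
        return min(available, self.maximum)
```

For a memory requirement in MB, `ResourceRequirement(512, 2048)` would refuse a node with 256 MB free and take at most 2048 MB on a node with 4096 MB free.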
Rate estimated by the user for their request.
In MB
Total memory (virtual + RAM). The distinction is probably not relevant to users (open question).
In SUMS, we implemented a numFilesPerHour parameter. This gave an indication of
the input rate for a request. The request would then be split into sub-jobs
according to the available resources (the CPU-time limit associated with a queue).
Each job would end up with, for example, 100 files, while the dataset would
contain several thousand files. However, input may be expressed with a concept
other than files: it can be events, tracks, gene sequences, etc. This element
tries to generalize the idea.
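The splitting logic described above reduces to simple arithmetic: units per job = rate × queue CPU-time limit, and the number of sub-jobs is the ceiling of total units divided by that. This is a sketch of that arithmetic only. The function name and the choice to put the remainder in the last job are assumptions, not a description of the actual SUMS code.

```python
import math

def split_request(total_units, units_per_hour, cpu_limit_hours):
    """Split a dataset into sub-jobs that each fit within a queue's CPU-time limit.

    'units' may be files, events, tracks, gene sequences, ...
    Returns the number of units assigned to each sub-job.
    """
    if total_units <= 0:
        return []
    per_job = int(units_per_hour * cpu_limit_hours)
    if per_job < 1:
        raise ValueError("queue limit too short to process a single unit")
    n_jobs = math.ceil(total_units / per_job)
    sizes = [per_job] * (n_jobs - 1)
    sizes.append(total_units - per_job * (n_jobs - 1))   # remainder goes last
    return sizes
```

With a rate of 50 files per hour and a 2-hour queue limit, each job gets 100 files, so a 3000-file dataset becomes 30 sub-jobs.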
Symmetric to input. Note that the type of storage is not specified here (for
example, whether the storage is NFS, AFS, a database, etc.). The rate combined with the unit
size will obviously limit where the jobs can run. This
implies that this resource alone may map to several low-level JDL
requirements (TransferRate, Type from the StorageDevice element
in Glue CE, or even connection speed in the case of a database).
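The way rate and unit size constrain placement can be sketched as a throughput check: required MB/s = units per hour × unit size in MB / 3600. The storage names and their transfer rates below are hypothetical. A real matchmaker would read them from something like the Glue StorageDevice information rather than a dictionary.

```python
def required_mb_per_s(units_per_hour, unit_size_mb):
    """Sustained throughput implied by an output rate and a unit size."""
    return units_per_hour * unit_size_mb / 3600.0

def eligible_storage(candidates, units_per_hour, unit_size_mb):
    """Keep storage elements whose (hypothetical) transfer rate in MB/s is sufficient."""
    need = required_mb_per_s(units_per_hour, unit_size_mb)
    return [name for name, rate in candidates.items() if rate >= need]
```

For example, producing 360 units of 100 MB per hour needs 10 MB/s, which rules out a 5 MB/s NFS area but not a faster local disk.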