Glue Computing Element Schema
version 1.1
FINAL
Last revision: 12 March 2003
The basic structure of the Glue schema is shown in the figure below. The major components of the initial Glue information model are:
GLUE representations are constructed from the top down. That is, one can deploy a computing element without specifying the cluster, a computing element and cluster without subcluster definitions, and subcluster definitions without providing host definitions.
Every information element in the GLUE schema has a unique name. This enables disambiguation of nodes. This can be important, for example, in recognizing that two different computing elements actually refer to the same physical resources when determining available processing resources.
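To illustrate why unique names matter, the following minimal sketch counts the processors reachable through several computing elements without counting a shared cluster twice. The dictionary layout, the identifiers and the CPU counts are hypothetical sample data, not part of the schema.

```python
# Hypothetical sample data: two computing elements (queues) that refer to the
# same physical cluster, plus one that refers to a different cluster.
computing_elements = [
    {"UniqueID": "ce-short", "ClusterUniqueID": "cluster-a"},
    {"UniqueID": "ce-long", "ClusterUniqueID": "cluster-a"},
    {"UniqueID": "ce-other", "ClusterUniqueID": "cluster-b"},
]
cluster_cpus = {"cluster-a": 64, "cluster-b": 16}


def total_distinct_cpus(ces, cpus_by_cluster):
    """Sum CPUs once per distinct cluster, keyed by the cluster's unique name."""
    distinct_clusters = {ce["ClusterUniqueID"] for ce in ces}
    return sum(cpus_by_cluster[name] for name in distinct_clusters)


# Naive sum over computing elements counts cluster-a twice.
naive_total = sum(cluster_cpus[ce["ClusterUniqueID"]] for ce in computing_elements)
distinct_total = total_distinct_cpus(computing_elements, cluster_cpus)
print(naive_total, distinct_total)
```

Without a unique cluster name, the two queues on cluster-a could not be recognized as sharing the same processors.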
Note that this document does not currently contain information regarding file systems or any storage element data. These are recognized by the group as extremely important and high priority, and will be addressed shortly.
As illustrated in the figure above, information is organized hierarchically: hosts are composed into sub-clusters (is partition a better name?), sub-clusters are grouped into clusters, and computing elements refer to one or more clusters. Depending on the implementation, different techniques will be used to represent the structure of the information. For example, in current MDS technology, the structure can be represented via a DIT in the GRIS that provides access to the information. In an OGSA-based implementation, the structure could be directly represented via an XML document.
Information is represented via named attributes. Attributes may be used in more than one location in the information model. For example, an attribute may be used at the host level to represent details of the host or node, while the same attribute may also be used to represent summary information at the sub-cluster level. Attribute names are not scoped to their use, and therefore should be selected from a namespace that ensures uniqueness.
Attributes are grouped together to form named objects. There are two types of objects in the information model. Structural objects (computing elements, clusters, sub-clusters, nodes and hosts) are containers for other objects. Auxiliary objects carry the attributes that hold the actual information. Each container can have required, advised and optional auxiliary objects associated with it.
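The distinction between structural and auxiliary objects can be sketched as a small data model. This is only an illustration under assumed names: the classes StructuralObject and AuxiliaryObject, their fields, and the sample identifiers are hypothetical and are not defined by the schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AuxiliaryObject:
    """A named group of attributes, e.g. Info, Policy or State."""
    name: str
    attributes: Dict[str, object] = field(default_factory=dict)


@dataclass
class StructuralObject:
    """A container such as a computing element, cluster, sub-cluster or host."""
    kind: str
    unique_id: str
    auxiliary: List[AuxiliaryObject] = field(default_factory=list)
    children: List["StructuralObject"] = field(default_factory=list)

    def walk(self, depth=0):
        """Yield (depth, object) pairs for this object and everything below it."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Lower levels are optional: a computing element may be published with no children.
host = StructuralObject("Host", "node01.example.org")
subcluster = StructuralObject("SubCluster", "sc-1", children=[host])
cluster = StructuralObject("Cluster", "cluster-a", children=[subcluster])
ce = StructuralObject(
    "ComputingElement",
    "ce-1",
    auxiliary=[AuxiliaryObject("State", {"RunningJobs": 3})],
    children=[cluster],
)

for depth, obj in ce.walk():
    print("  " * depth + f"{obj.kind}: {obj.unique_id}")
```

An MDS-based implementation would map the same nesting onto a DIT, and an XML-based one onto nested elements.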
The computing element models a computational service whose access point is a queue. Each queue points to one or more clusters. The Computing Element is a container and can contain the following objects:
ComputingElement | ID # from table | Description
UniqueID | ID.5 | A unique identifier for the computing element. For example, EDG uses CE-hn:CE-port/jobmanager-CE-lrms-CE-queue
InformationServiceURL | | URL of the local information service providing info about this entity
Name | | A name for this service
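As an illustration of the EDG-style UniqueID pattern CE-hn:CE-port/jobmanager-CE-lrms-CE-queue, the sketch below assembles an identifier from its parts. It assumes that CE-hn is the gatekeeper host name, CE-port the gatekeeper port, CE-lrms the LRMS type and CE-queue the queue name; the function name and the sample values are hypothetical.

```python
def edg_style_unique_id(hostname, port, lrms, queue):
    """Build an ID of the form CE-hn:CE-port/jobmanager-CE-lrms-CE-queue.

    The mapping of the four placeholders to these arguments is an assumption.
    """
    return f"{hostname}:{port}/jobmanager-{lrms}-{queue}"


# Illustrative values only.
example_id = edg_style_unique_id("ce01.example.org", 2119, "pbs", "short")
print(example_id)
```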
Info | ID # from table | Description
LRMSType | ID.3 | Name of local resource management system
LRMSVersion | | Version of local resource manager
GRAMVersion | QP.2 | The GRAM version
HostName | ID.1 | Fully qualified host name for host on which the gatekeeper that corresponds to the computing element runs.
GatekeeperPort | ID.2 | Port number for the gatekeeper.
TotalCPUs | Num.1 | Number of CPUs available to the queue.
Policy | ID # from table | Description
MaxWallClockTime | QP.3 | The maximum wall clock time allowed for jobs submitted to the CE in mins (0=not specified)
MaxCPUTime | QP.4 | The maximum CPU time allowed for jobs submitted to the CE in mins (0=not specified)
MaxTotalJobs | QP.6 | The maximum allowed number of jobs in the CE (0=not specified)
MaxRunningJobs | QP.7 | The maximum number of jobs allowed to be running (0=not specified)
Priority | QP.8 | Info about the Queue Priority
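The Policy limits use 0 to mean "not specified", so a consumer has to treat 0 as "no limit published" rather than as a limit of zero. A minimal sketch of that check follows; the function name and the sample policy values are hypothetical.

```python
def exceeds_limit(requested, limit):
    """Return True if a request exceeds a Policy limit; 0 means no limit is specified."""
    if limit == 0:
        return False
    return requested > limit


# Hypothetical published policy, times in minutes.
policy = {"MaxWallClockTime": 120, "MaxCPUTime": 0}

too_long = exceeds_limit(180, policy["MaxWallClockTime"])  # 180 min against a 120 min limit
unlimited = exceeds_limit(180, policy["MaxCPUTime"])       # limit not specified
print(too_long, unlimited)
```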
State | ID # from table | Description
RunningJobs | QS.1 | Number of currently running jobs
TotalJobs | | Number of jobs in the CE (= RunningJobs + WaitingJobs)
Status | QS.2 | States a queue can be in: 1. Queueing: the queue can accept job submissions, but can’t be served by the scheduler 2. Production: the queue can accept job submissions and is served by a scheduler 3. Closed: the queue can’t accept job submissions and can’t be served by a scheduler 4. Draining: the queue can’t accept job submissions, but can be served by a scheduler
WaitingJobs | QS.3 | Number of jobs that are in a state other than running
WorstResponseTime | RT.1 | Worst-case time between job submission and the start of job execution, in seconds
EstimatedResponseTime | RT.2 | Estimated time between job submission and the start of job execution, in seconds
FreeCPUs | Free.1 | Number of free CPUs available to a scheduler (generally used with Condor)
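The four queue states in the State table combine two independent properties: whether the queue accepts job submissions and whether it is served by a scheduler. The sketch below encodes that reading together with the TotalJobs relation; the QueueStatus class and the total_jobs helper are hypothetical names.

```python
from enum import Enum


class QueueStatus(Enum):
    # value = (accepts job submissions, served by a scheduler)
    QUEUEING = (True, False)
    PRODUCTION = (True, True)
    CLOSED = (False, False)
    DRAINING = (False, True)

    @property
    def accepts_submissions(self):
        return self.value[0]

    @property
    def served_by_scheduler(self):
        return self.value[1]


def total_jobs(running_jobs, waiting_jobs):
    """TotalJobs as defined in the State table: RunningJobs + WaitingJobs."""
    return running_jobs + waiting_jobs


status = QueueStatus.DRAINING
print(status.name, status.accepts_submissions, status.served_by_scheduler)
print(total_jobs(3, 5))
```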
AccessControlPolicyBase | ID # from table | Description
Rule | User.1 | A rule that grants or denies access to the Computing Element service; the specific semantics still need to be defined (e.g. list of X509 user certificate subjects, VO names)
Job | ID # from table | Description
LocalOwner | QJ.1 | Owner local username
GlobalOwner | QJ.2 | Owner GSI subject name
LocalID | QJ.3 | Job local id
GlobalID | QJ.4 | Job global id
Status | QJ.5 | Job status {SUBMITTED, WAITING, READY, SCHEDULED, RUNNING, ABORTED, DONE, CLEARED, CHECKPOINTED}
SchedulerSpecific | QJ.6 | Scheduler specific info
The CE will always point to a cluster; this is a containment relationship. In the absence of other information, it is assumed that the CE has access to all of the resources contained in the cluster. If this is not true, the “Accessible” attribute in the QueueInfoOpt object will enumerate which subset of the resources is available to the queue. This is needed in part to construct different logical partitionings of the resource.
The cluster information element provides a grouping of hosts and sub-clusters. Only the name of the cluster is required; enumeration of the underlying cluster structure is optional.
A cluster may represent a grouping of individually described nodes or hosts, or a set of SubClusters. An individual host may be represented in a cluster to capture a unique computational element, such as a head node, or, in the case where subcluster grouping does not make sense, to enumerate the actual computing elements. A cluster has the same attributes as an MDSHostNodeGroup. This is only a name – previously ID.G.4, now also ID.6.
Cluster | ID # from table | Description
Name | ID.6 | Name of the cluster, taken from I.g.4, MDS-host-node-group-name
InformationServiceURL | | URL of the local information service providing info about this entity
UniqueID | ID.9 | Unique ID for the cluster
A subcluster is used to represent a collection of computing resources whose configuration is homogeneous enough that it can be represented by a common set of attributes. For example, a SubCluster can represent the part of a cluster that consists of the nodes with the same CPU, OS, memory and disk configuration. The definition of homogeneous is determined by the set of attributes that are enumerated in the subcluster definition. The values of these attributes must be the same for every node included in the subcluster. Note that elements of a subcluster are only homogeneous with respect to the specified values, and the detailed descriptions of the nodes (if provided) may differ in values that are not specified at the subcluster level. As a further constraint, the decomposition of resources into subclusters must be such that no node is included in more than one subcluster.
Each SubCluster object has a name attribute that provides a unique name for the subcluster. To represent the resources that the subcluster is summarizing, the SubCluster must contain a name (attribute SubClusterName, ID.7) and a count (attribute SubClusterCount, ID.8).
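The two subcluster constraints can be checked mechanically. The sketch below assumes node descriptions are available as plain dictionaries; the function names, attribute keys and sample data are hypothetical.

```python
def is_homogeneous(subcluster_attrs, nodes):
    """Every node must match the subcluster on each attribute the subcluster specifies."""
    return all(
        node.get(key) == value
        for node in nodes.values()
        for key, value in subcluster_attrs.items()
    )


def nodes_in_multiple_subclusters(membership):
    """Return the set of nodes that appear in more than one subcluster."""
    seen, duplicates = set(), set()
    for node_names in membership.values():
        for name in set(node_names):
            if name in seen:
                duplicates.add(name)
            else:
                seen.add(name)
    return duplicates


# Hypothetical node descriptions; RAMAvailable differs but is not specified
# at the subcluster level, so it does not break homogeneity.
nodes = {
    "n1": {"OSName": "Linux", "RAMSize": 512, "RAMAvailable": 100},
    "n2": {"OSName": "Linux", "RAMSize": 512, "RAMAvailable": 300},
}
print(is_homogeneous({"OSName": "Linux", "RAMSize": 512}, nodes))
print(is_homogeneous({"RAMAvailable": 100}, nodes))
print(nodes_in_multiple_subclusters({"sc-1": ["n1", "n2"], "sc-2": ["n2", "n3"]}))
```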
SubCluster | ID # from table | Description
Name | ID.7 | Name of the Subcluster
UniqueID | ID.10 | Unique ID for the Subcluster
InformationServiceURL | | URL of the local information service providing info about this entity
In addition, much of the information available at the Host level may optionally be included in a subcluster if the values are the same across all of the nodes in the subcluster. These are labeled in the tables in the subcluster section.
The Host element is used to represent details of a specific computing element. Many of the objects that may be contained at the host level can also be included in the subcluster; these are marked.
There are several attributes that will be able to be located at different levels of the hierarchy depending on their homogeneity properties. Two sets of these are the network attributes and the filesystem attributes. For example, if all of the nodes that can be addressed by a single CE share all of their file systems, that information could hang off the CE level. However, if it was uniform only at the sub-cluster level, it would be attached there, and if it were different for every node then it would need to be at the node level.
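One way a consumer might cope with attributes that can live at different levels is to look them up from the most specific level outward. This is only a sketch of that idea, not something the schema prescribes; the function name, the attribute keys and the sample values are hypothetical.

```python
def resolve_attribute(name, host, subcluster, computing_element):
    """Return (level, value) from the most specific level that defines the attribute."""
    for level, attrs in (
        ("host", host),
        ("subcluster", subcluster),
        ("computing element", computing_element),
    ):
        if name in attrs:
            return level, attrs[name]
    raise KeyError(name)


# Hypothetical publication: file systems shared by every node behind the CE,
# connectivity uniform per subcluster, address specific to each host.
ce_attrs = {"FileSystemRoot": "/shared"}
subcluster_attrs = {"OutboundIP": True}
host_attrs = {"IPAddress": "192.0.2.10"}

for attr in ("FileSystemRoot", "OutboundIP", "IPAddress"):
    print(attr, resolve_attribute(attr, host_attrs, subcluster_attrs, ce_attrs))
```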
Architecture | ID # from table | Description | Included in SubCluster?
PlatformType | Arch.5 | informally describes the platform type of the computing element | Yes
SMPSize | Num.2 | number of CPUs in an SMP node | Yes
Operating System | ID # from table | Description | Included in SubCluster?
Name | OS.1 | informally names the OS using a vendor-specific convention | Yes
Release | OS.2 | informally names the OS release using a vendor-specific convention | Yes
Version | OS.3 | informally names the OS or kernel version using a vendor-specific convention | Yes
Benchmark | ID # from table | Description | Included in SubCluster?
SI00 | Benchm.1 | The SpecInt2000 benchmark of the nodes associated with the subcluster | Yes
SF00 | Benchm.2 | The SpecFloat2000 benchmark of the nodes associated with the subcluster | Yes
ApplicationSoftware | ID # from table | Description | Included in SubCluster?
RunTimeEnvironment | SW.1 | List of software packages installed on this subcluster | Yes
ArchitectureDetails | ID # from table | Included in SubCluster?
Processor | ID # from table | Description | Included in SubCluster?
Vendor | Arch.1 | Informally names CPU vendor | Yes
Model | Arch.2 | Informally names CPU model | Yes
Version | Arch.3 | Informally names CPU version | Yes
ClockSpeed | Arch.11 | Clock speed, in MHz, of the CPUs in the subcluster | Yes
ComputerISA | Arch.4 | informally names the Instruction Set Architecture (ISA) of the computing element | Yes
Features | Arch.6 | informally names optional CPU features | Yes
CacheL1 | Arch.7 | first-level unified cache size (in kb) of a CPU | Yes
CacheL1I | Arch.8 | first-level instruction cache size (in kb) of a CPU | Yes
CacheL1D | Arch.9 | first-level data cache size (in kb) of a CPU | Yes
CacheL2 | Arch.10 | second-level unified cache size (in kb) of a CPU | Yes
MainMemory | ID # from table | Description | Included in SubCluster?
RAMSize | Mem.1 | configured physical memory on any one CPU in the subcluster in MB | Yes
RAMAvailable | Mem.2 | unallocated RAM size in MB | Yes
VirtualAvailable | | available virtual memory | Yes
VirtualSize | Mem.3 | configured disk-based virtual memory (VM) in MB in a computing node | Yes
The file system class can be specialized as REMOTE (for a remote directory mounted locally) or LOCAL (for a local directory); each local file system can contain directories. Each directory can be associated with a Storage Space.
FileSystem | |
Root | path name or other information defining the root of the file system |
Name | the name for the file system |
Type | the file system type (e.g. NFS, AFS) |
ReadOnly | is the file system readonly? |
Size | Total space assigned for this file type (MB)
AvailableSpace | Total available space for this file type (MB) |
File | |
Name | Name for the file |
Size | File size in bytes |
CreationDate | File creation date and time |
LastModified | Last modified date and time |
LastAccessed | Last access date and time
Latency | Time taken to access file in seconds |
Owner | File owner |
LifeTime | Date and time after which the file can be canceled |
(to add path attribute and cancel owner)
Directory | |
Name | Name for the file |
(to be updated to be specialization of a file)
(Storage Device: to be canceled, none is interested in publishing it; maybe also file class)
RemoteFileSystem | ID # from table | Description | Included in SubCluster?
Name | Mem.1 | configured physical memory on any one CPU in the subcluster in MB | Yes
RAMAvailable | Mem.2 | unallocated RAM size in MB | Yes
VirtualAvailable | | available virtual memory | Yes
VirtualSize | Mem.3 | configured disk-based virtual memory (VM) in MB in a computing node | Yes
ProcessorLoad | ID # from table | Description | Included in SubCluster?
Last1min | Free.2 | 1-minute average processor availability for a single node (the difference between the available CPUs and the average runnable task count during that time) x 100 | No
Last5min | Free.3 | 5-minute average processor availability for a single node (the difference between the available CPUs and the average runnable task count during that time) x 100 | No
Last15min | Free.4 | 15-minute average processor availability for a single node (the difference between the available CPUs and the average runnable task count during that time) x 100 | No
SMPLoad | ID # from table | Description | Included in SubCluster?
Load1min | Free.5 | 1-minute average processor availability for an SMP node (multi CPU), which is the difference between the available CPUs and the average runnable task count during that time x 100 | No
Load5min | Free.6 | 5-minute average processor availability for an SMP node (multi CPU), which is the difference between the available CPUs and the average runnable task count during that time x 100 | No
Load15min | Free.7 | 15-minute average processor availability for an SMP node (multi CPU), which is the difference between the available CPUs and the average runnable task count during that time x 100 | No
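The availability figure used in the ProcessorLoad and SMPLoad tables is the difference between the available CPUs and the average runnable task count over the interval, multiplied by 100. A minimal sketch of that arithmetic follows; the function name and the sample inputs are hypothetical.

```python
def processor_availability(available_cpus, avg_runnable_tasks):
    """(available CPUs - average runnable task count over the interval) x 100."""
    return (available_cpus - avg_runnable_tasks) * 100


single_node = processor_availability(1, 0.25)  # single-CPU node, lightly loaded
smp_node = processor_availability(4, 1.5)      # 4-way SMP node
print(single_node, smp_node)
```

Under this definition the value goes negative when more tasks are runnable than there are CPUs.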
NetworkAdapter | ID # from table | Description | Included in SubCluster?
Name | Net.3 | names a network interface | No
IPAddress | Net.4 | IP address of a network interface | No
OutboundIP | Net.1 | Defines whether outbound connectivity is allowed from "worker nodes", i.e. whether a worker node can initiate outbound connectivity | Yes
InboundIP | Net.2 | Defines whether inbound connectivity is allowed | Yes
MTU | Net.6 | maximum transmission unit size (in bytes) for a network interface | No