PassivePolicy

The PassivePolicy is currently the heart of the scheduler. It decides how to query the catalog, how to split the job, and which resources to use. The individual components are described in more detail in the code; here we give an overview of the scope of the policy.

The name comes from the fact that the policy doesn't change the state of the infrastructure: it doesn't move files from one disk to another, it doesn't restore anything from HPSS, and so on. In that sense, the policy is passive.

Distributed Disk

One of the aims of the STAR scheduler was to make use of the local disks on the cluster nodes. Data files can be located either on a distributed file system, accessible from all the nodes, or on a local file system, accessible only from the node that holds the data.

Job splitting has to take this into account. Files on a local disk can only be grouped with other files accessible on the same node. User requirements can make this harder: some jobs require a minimum number of files, so what do we do if there aren't enough files on the node?
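The grouping constraint above can be illustrated with a minimal sketch. All names here (FileLocation, NodeGrouping, groupByNode, nodesBelowMinimum) are hypothetical and are not taken from the scheduler code; a null node is assumed to stand for the distributed file system. The sketch only detects the nodes that cannot satisfy a minimum on their own; it does not decide what to do about them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value type: where one data file lives.
// node == null is assumed to mean "on the distributed file system".
final class FileLocation {
    final String path;
    final String node;

    FileLocation(String path, String node) {
        this.path = path;
        this.node = node;
    }

    boolean isLocal() {
        return node != null;
    }
}

// Hypothetical helper, not part of the scheduler.
final class NodeGrouping {
    // Groups node-local files by the node that holds them.
    // Files on the distributed file system are skipped, since any node can read them.
    static Map<String, List<String>> groupByNode(List<FileLocation> files) {
        Map<String, List<String>> byNode = new LinkedHashMap<>();
        for (FileLocation f : files) {
            if (f.isLocal()) {
                byNode.computeIfAbsent(f.node, k -> new ArrayList<>()).add(f.path);
            }
        }
        return byNode;
    }

    // Returns the nodes whose local files alone cannot satisfy minFiles.
    static List<String> nodesBelowMinimum(Map<String, List<String>> byNode, int minFiles) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byNode.entrySet()) {
            if (e.getValue().size() < minFiles) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

With two files on one node, one file on another, and a minimum of two files per job, only the second node would be reported by nodesBelowMinimum. How such a node is then handled is a policy decision that this sketch leaves open.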

CopySelectors

Most of the time, the user will require a single copy of each file. The scheduler has to choose wisely, especially when there are min/max constraints. A CopySelector is an object that, given a query result that may contain multiple copies of a file, chooses which copy to use.

The CopySelectorFactory chooses which CopySelector is appropriate for a particular situation. One first contacts the CopySelectorFactory, gets an instance of a copy selector, and then uses it. To use a different strategy for choosing the copy, one would have to create a new class that implements the CopySelector interface and change the logic of the factory.
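The factory pattern described above might look roughly like the following sketch. Only the names CopySelector and CopySelectorFactory come from this page; the method names and signatures, the FileCopy type, the two selector classes, and the criterion used by the factory are all assumptions made for illustration and may differ from the actual code.

```java
import java.util.List;

// Hypothetical value type: one physical copy of a logical file.
final class FileCopy {
    final String logicalName;
    final String path;
    final String node; // null is assumed to mean the distributed file system

    FileCopy(String logicalName, String path, String node) {
        this.logicalName = logicalName;
        this.path = path;
        this.node = node;
    }
}

// Assumed shape of the interface; the real method may be named differently.
interface CopySelector {
    // Chooses one copy among all the copies of the same file.
    FileCopy select(List<FileCopy> copies);
}

// Hypothetical strategy: take the first copy the catalog returned.
final class FirstCopySelector implements CopySelector {
    public FileCopy select(List<FileCopy> copies) {
        return copies.get(0);
    }
}

// Hypothetical strategy: prefer a copy on the distributed file system,
// because it can be grouped with files from any node.
final class DistributedFirstCopySelector implements CopySelector {
    public FileCopy select(List<FileCopy> copies) {
        for (FileCopy c : copies) {
            if (c.node == null) {
                return c;
            }
        }
        return copies.get(0); // no distributed copy: fall back to the first one
    }
}

final class CopySelectorFactory {
    // The criterion (presence of a minimum-files constraint) is an assumption;
    // the real factory may decide on entirely different grounds.
    static CopySelector createCopySelector(boolean hasMinFilesConstraint) {
        if (hasMinFilesConstraint) {
            return new DistributedFirstCopySelector();
        }
        return new FirstCopySelector();
    }
}
```

A caller would ask the factory for a selector and then call select once per file. Adding a new strategy means writing another CopySelector implementation and adding a branch to the factory; the calling code does not change.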

The CopySelector solves a sub-problem of the general problem of job splitting, which is solved by the AssignmentStrategy.

AssignmentStrategy

The scheduler not only has to choose which copy to use, but also how to divide the files among different machines. This job is done by an AssignmentStrategy. Here, too, we use a factory to encapsulate the logic by which the different strategies are chosen.

FileAssignment encapsulates the data structure used by the AssignmentStrategy (and the CopySelectors) to describe which files are used and how they are grouped.
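As a rough illustration of how a strategy and its result structure could fit together, here is a minimal sketch. The interface signature, the FileCopy type, FileAssignmentSketch (a simplified stand-in for FileAssignment), PerNodeAssignmentStrategy, and the maxFilesPerJob parameter are all hypothetical; the real classes may be organized quite differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value type: the copy already chosen for one file.
final class FileCopy {
    final String path;
    final String node; // null is assumed to mean the distributed file system

    FileCopy(String path, String node) {
        this.path = path;
        this.node = node;
    }
}

// Simplified stand-in for FileAssignment: each inner list holds
// the file paths given to one job.
final class FileAssignmentSketch {
    final List<List<String>> groups = new ArrayList<>();
}

// Assumed shape of the interface; the real signature may differ.
interface AssignmentStrategy {
    FileAssignmentSketch assign(List<FileCopy> selectedCopies, int maxFilesPerJob);
}

// Hypothetical strategy: never mix files from different nodes in one job,
// and cut each node's files into chunks of at most maxFilesPerJob.
final class PerNodeAssignmentStrategy implements AssignmentStrategy {
    private static final String DISTRIBUTED = "<distributed>";

    public FileAssignmentSketch assign(List<FileCopy> selectedCopies, int maxFilesPerJob) {
        if (maxFilesPerJob < 1) {
            throw new IllegalArgumentException("maxFilesPerJob must be at least 1");
        }
        // Bucket the files by location, keeping the order of first appearance.
        Map<String, List<String>> byLocation = new LinkedHashMap<>();
        for (FileCopy c : selectedCopies) {
            String key = (c.node == null) ? DISTRIBUTED : c.node;
            byLocation.computeIfAbsent(key, k -> new ArrayList<>()).add(c.path);
        }
        // Split each bucket into groups no larger than maxFilesPerJob.
        FileAssignmentSketch result = new FileAssignmentSketch();
        for (List<String> paths : byLocation.values()) {
            for (int i = 0; i < paths.size(); i += maxFilesPerJob) {
                int end = Math.min(i + maxFilesPerJob, paths.size());
                result.groups.add(new ArrayList<>(paths.subList(i, end)));
            }
        }
        return result;
    }
}
```

This sketch keeps distributed files in their own groups for simplicity. A more careful strategy could use them to top up node-local groups that fall short of a minimum, since those files are readable from every node.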


Gabriele Carcassi - page was last modified