Grid Collector Tutorial
Offline computing tutorial Maintained by Wei-Ming Zhang

This is the long version of the Grid Collector tutorial for STAR offline analysis. The shorter version is here.

ATTENTION: Grid Collector relies on both tags and a bitmap index having been built. If they are not in place, this tool will not work.

  1. A brief introduction
    Grid Collector provides an extension to the existing STAR analysis framework. It enables users to specify events of interest as conditions on tags. This allows the analysis code to read only the events of interest and speeds up the analysis process. The word 'collector' in the name refers to this capability of 'collecting' events of interest for analyses. It is also 'Grid' software because it is able to transfer files over the Grid on demand. This eliminates the need for the user to explicitly retrieve the files from HPSS or other storage sites.

    For more background details, see the slides from a GC talk at the July 04 Collaboration Meeting.

  2. Current status
    A working version is installed at RCAS and at PDSF for STAR users. It supports both event.root and MuDst.root data.

  3. Available productions in GC
    Grid Collector builds a bitmap index to enable searching for events. At this point, all 'P04xx' productions are indexed and are available to the end user. Future productions are expected to be indexed automatically as they are produced.

    Please contact John Wu if you would like to have access to older productions.

  4. Limitations and restrictions
    1. The file IO of GC goes through StIOMaker which invokes StTreeMaker and StMuIOMaker for event.root and MuDst.root data, respectively. Support for other IO classes is NOT currently planned.
    2. A large job can easily be split across multiple machines. However, at this point there is no mechanism to ensure that a repeat run of the same job will split the events in exactly the same order.
    3. For early productions, such as P02gh, StMuIOMaker may have trouble reading certain (bad?) MuDst events. However, we have not encountered any problems reading events in recent productions so far.

  5. How to write analysis makers and macros
    Most existing analysis macros need to define a StFileI* object with new StFile, for example,
    StFileI* setFiles = new StFile("files");
    To use GC, simply initialize the variable with StGridCollector::Create() as follows,
    StFileI* setFiles = StGridCollector::Create("select ...");
    There are example analysis makers, StAnalysisMaker for event.root and StMuAnalysisMaker for MuDst.root, and their associated macro doEvents.C in CVS. StAnalysisMaker is unchanged from before the implementation of GC. StMuAnalysisMaker is new. It shows how to access branches of MuDst data when the standard file IO maker StIOMaker is used to open MuDst data files in a macro. The macro doEvents.C has been updated for GC and MuDst. The default analysis maker in doEvents.C is StAnalysisMaker. To analyze MuDst data, the user has to instantiate a StMuAnalysisMaker instead of the default StAnalysisMaker in doEvents.C, as in
    StMuAnalysisMaker *analysisMaker = new StMuAnalysisMaker("analysis");

  6. Where to find the examples
    They can be checked out from CVS. Please find StAnalysisMaker and StMuAnalysisMaker in $STAR/StRoot/ and doEvents.C in $STAR/StRoot/macros/analysis/

  7. Running doEvents.C on a list of files
    This is the standard way of using the STAR analysis framework. The updated doEvents.C can still be run in a sequential mode to read all events in a list of files. Here is an example:
    .x doEvents.C(100, "filenames", "") 
    As usual, the first input parameter, 100 in the above example, is the number of requested events. The second is a char string for the input file names. A long list of file names can be placed in a file, in which case the second argument can be "@filename". Please include the path in the filename if the file is not in your working directory, and start the path with "./" if it is relative. Other options such as "dbon" still work as usual. The sequential mode can be used to test and debug analysis makers before turning on GC.
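
    The path rules above can be sketched as a small standalone C++ helper. The function name listFileArgument is hypothetical and is not part of doEvents.C or the STAR libraries. It only illustrates how the "@filename" argument could be formed, assuming that a bare file name refers to the working directory and needs no prefix.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of doEvents.C): build the "@filename" form
// of the second argument from the path of a text file that lists file names.
std::string listFileArgument(const std::string& path) {
    assert(!path.empty());
    const bool hasDirectory = path.find('/') != std::string::npos;
    const bool isAbsolute = path[0] == '/';
    const bool startsWithDotSlash = path.compare(0, 2, "./") == 0;
    // A relative path with a directory part must start with "./".
    if (hasDirectory && !isAbsolute && !startsWithDotSlash) {
        return "@./" + path;
    }
    // Bare file names, absolute paths, and "./..." paths are used unchanged.
    return "@" + path;
}
```

    For instance, listFileArgument("lists/run.txt") yields "@./lists/run.txt", while an absolute path or a name in the working directory is only prefixed with "@".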

  8. How to run doEvents.C with GC
    The first parameter is still the number of requested events. If it is negative, GC reads all available events. The second parameter should be a command for Grid Collector. The third parameter has to include "gc". Executing the macro without arguments displays a brief help message:
     .x doEvents.C() 

    To be more user-friendly, the macro doEvents.C allows the user to save the GC command in an ASCII file, say, GC_sample.txt, then run it as in

    .x doEvents.C(100, "@GC_sample.txt", "gc")
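
    As a rough illustration of how the three parameters fit together, the following standalone C++ sketch assembles the ROOT prompt line for either mode. The helper doEventsInvocation is hypothetical and exists only for this example. It also assumes that the third parameter is exactly "gc" in GC mode and empty otherwise, although other options may be combined with it in practice.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: format the ROOT prompt line that runs doEvents.C.
// 'second' is a file name, an "@file" reference, or a GC command.
std::string doEventsInvocation(int nEvents, const std::string& second,
                               bool useGridCollector) {
    // The argument is placed inside double quotes, so it must not contain any.
    assert(second.find('"') == std::string::npos);
    // Assumption: "gc" alone is enough to switch Grid Collector on.
    const std::string third = useGridCollector ? "gc" : "";
    return ".x doEvents.C(" + std::to_string(nEvents) + ", \"" + second +
           "\", \"" + third + "\")";
}
```

    A negative nEvents is passed through unchanged, which corresponds to asking GC for all available events.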

    There are two types of commands for Grid Collector: a SQL select statement and a gc command line. Here are two examples of the SQL select statement:

    "select event where production=P03ia and 20030300<=mProdTime<=20030330"
    for event.root data and
    "select MuDst where production=P02gc and magScale=FullField and numberOfPrimaryTracks > 200"
    for MuDst.

    The statement starts with a "select" clause followed by a "where" clause. It is not case sensitive. For running doEvents.C, the "select" clause can be omitted because "select event" is the default in GC. The "where" clause is mandatory. It consists of the reserved word "where" followed by a list of conditions joined together by the logical operators AND, OR, XOR, and NOT. The conditions take forms such as 'production=P03ia', '10<=NV0<=50', and 'primaryVertexX*primaryVertexX + primaryVertexY*primaryVertexY < 2'. The variable names in the conditions are the names of tags in tags.root files plus three additional names used by the File Catalog: production, trgSetupName, and magScale. These three additional attributes have the same meanings as in the fileCatalog. The majority of the variable names are inherited from tags.root files and are self-explanatory. The number of tags and their names vary from production to production. As examples, tags of production P02gc, P03ia, and P04ie are listed in files P05ia_tags.txt, P02gc_tags.txt, P03ia_tags.txt, and P04ie_tags.txt, which are typical for the data from years 2002, 2003, and 2004, respectively.
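
    The structure described above (a "select" clause, then "where", then conditions joined by a logical operator) can be sketched in standalone C++. Both function names below are hypothetical and are not part of Grid Collector. The sketch joins conditions with "and" only and does not validate tag names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: a two-sided range condition such as 10<=NV0<=50.
std::string rangeCondition(long low, const std::string& tag, long high) {
    assert(low <= high);
    return std::to_string(low) + "<=" + tag + "<=" + std::to_string(high);
}

// Hypothetical helper: "select <dataType> where c1 and c2 and ...".
std::string selectStatement(const std::string& dataType,
                            const std::vector<std::string>& conditions) {
    assert(!conditions.empty());  // the "where" clause is mandatory
    std::string statement = "select " + dataType + " where ";
    for (std::size_t i = 0; i < conditions.size(); ++i) {
        if (i > 0) {
            statement += " and ";
        }
        statement += conditions[i];
    }
    return statement;
}
```

    Passing "event" with the conditions "production=P03ia" and rangeCondition(20030300, "mProdTime", 20030330) reproduces the first example statement shown earlier.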

    The conditions may also include common functions including "acos", "asin", "atan", "ceil", "cos", "cosh", "exp", "fabs", "floor", "frexp", "log10", "log", "modf", "sin", "sinh", "sqrt", "tan", "tanh", "atan2", "fmod", "ldexp", "pow". The definitions of these functions are the same as in a standard math library (see "man math" on most Unix systems). These standard math functions can be used in any condition in place of a simple variable name, for example, "sqrt(VectorX*VectorX + VectorY*VectorY) < abs(Gamma)". In addition to these standard functions, there is a pseudo-function named "any" which can be used as the left-hand side of an equality or of an "IN" expression, such as "any(triggerId)=15007" and "any(triggerId) in (15007, 15006)".
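
    The two forms of the "any" pseudo-function can be generated from a list of values, as in the following standalone C++ sketch. The helper anyCondition is hypothetical and only formats the condition string. It assumes integer values, as in the triggerId examples.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: format an "any" condition on a multi-valued tag.
// One value gives any(tag)=v; several values give any(tag) in (v1, v2, ...).
std::string anyCondition(const std::string& tag,
                         const std::vector<long>& values) {
    assert(!values.empty());
    if (values.size() == 1) {
        return "any(" + tag + ")=" + std::to_string(values[0]);
    }
    std::string condition = "any(" + tag + ") in (";
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i > 0) {
            condition += ", ";
        }
        condition += std::to_string(values[i]);
    }
    condition += ")";
    return condition;
}
```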

    The command-line style of GC commands has elements similar to those of the SQL select statements, but provides more options to control Grid Collector. The two examples above can alternatively be expressed as

    "gc -select event -where 'production=P03ia and 20030300<=mProdTime<=20030330'"
    "gc -select MuDst -where 'production=P02gc and magScale=FullField and numberOfPrimaryTracks > 200'"

    Note the need for an additional level of quotes to make sure that the conditions following the '-where' option are treated as one single string.
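
    The extra level of quoting can be made explicit with a small standalone C++ sketch. The helper gcCommandLine is hypothetical. It wraps the conditions in single quotes and, as a simplifying assumption, rejects conditions that themselves contain a single quote.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: express a selection in the gc command-line style.
// The conditions are wrapped in single quotes so that everything after
// -where is treated as one string.
std::string gcCommandLine(const std::string& dataType,
                          const std::string& where) {
    assert(!where.empty());
    // Assumption: nested single quotes are not supported by this sketch.
    assert(where.find('\'') == std::string::npos);
    return "gc -select " + dataType + " -where '" + where + "'";
}
```

    The resulting string can then be placed inside the double quotes of the second doEvents.C argument without further escaping.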

    To reduce the number of quotes required, one may specify options in the SQL-style commands. This format defines three reserved words, 'SELECT', 'FROM', and 'WHERE', as in a SQL SELECT statement. Each of these reserved words may be followed by the same arguments as above. Options that do not have corresponding reserved words still have to be specified as options, with two restrictions: they must appear before the keyword 'WHERE', and they must not separate the keywords 'SELECT' or 'FROM' from their arguments. For example,

    "select MuDst -m 10 -new where production=P02gc and magScale=FullField and numberOfPrimaryTracks > 200"
    "select MuDst from P02gc% -m 10 -new where magScale=FullField and numberOfPrimaryTracks > 200"

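    The two placement restrictions can be respected by construction, as in this standalone C++ sketch. The helper selectWithOptions is hypothetical. It treats each option (such as "-m 10" or "-new" from the examples above) as an opaque string and makes no assumption about its meaning.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: SQL-style command with extra options.
// Options are emitted after the SELECT and FROM arguments and before WHERE,
// so they never separate a keyword from its argument.
std::string selectWithOptions(const std::string& dataType,
                              const std::string& from,
                              const std::vector<std::string>& options,
                              const std::string& where) {
    assert(!dataType.empty());
    assert(!where.empty());  // the "where" clause is mandatory
    std::string command = "select " + dataType;
    if (!from.empty()) {
        command += " from " + from;
    }
    for (const std::string& option : options) {
        command += " " + option;
    }
    command += " where " + where;
    return command;
}
```

    Leaving the from argument empty omits the FROM clause, which corresponds to the first of the two examples above.
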
    Here is a list of the most useful command-line options for Grid Collector.

    In addition to using variables in the range expressions, the following functions can also be used.