User documentation

The command line interface to the FileCatalog

The FileCatalog database contains information about all files used in production for the STAR experiment. The database itself and the PERL module used to manipulate it are described later in this document. Below you will find a description of the command line utility used to access data in this database.
The utility is called get_file_list.pl. If issued without any arguments it will print the following usage message:

% get_file_list.pl [-all] [-distinct] -keys keyword[,keyword,...] [-cond keyword=value[,keyword=value,...]] [-start #] [-limit #] [-delim $St] [-onefile] [-o outputfile]

Command line options

The command line options are described below:

-all
use all entries regardless of availability flag
-distinct
Return only distinct rows. By default, repeated values are allowed. Note that a selection such as node,path,filename,storage returns a set of values with no duplicates anyway; other queries may return duplicate entries.
-keys
Specify what data you want to get from the database. A comma-separated list of valid keywords should follow this parameter. See the examples for clarification, and the description of aggregate functions for more sophisticated queries.
-cond
Specify the conditions limiting the returned dataset. A comma-separated list of valid expressions (each consisting of a valid keyword, a valid operator and a value) should follow this parameter. Since some of the operators are special characters for the shell, the list of expressions should always be enclosed in single quotes. Again, see the examples for more explanation.
-onefile
A special mode of operation; returns a list of files, but gives only one location (the one with highest persistence) for each file, even if the database has many.
-start #
Specify the record number # to start from (default: the first record). Together with -limit, this can be used to retrieve the data in chunks.
-limit #
Limit the number of records returned (default 100).
-delim $St
Specify the characters that separate the fields in the output (default: "::").
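
When the utility is driven from a script rather than typed at a prompt, the quoting concern described for -cond can be avoided by not going through the shell at all. The following is a minimal sketch in Perl and is not part of the FileCatalog distribution: build_args is a hypothetical helper that joins keywords and conditions with commas, and it assumes that the order of the conditions does not matter.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for get_file_list.pl.
# $keys is an array ref of keywords; $cond maps a keyword to an
# "operator + value" string, e.g. library => '=SL01g'.
sub build_args {
    my ($keys, $cond) = @_;
    my @args = ('-keys', join(',', @$keys));
    my @exprs;
    # Sorted only to get a reproducible order (assumed to be irrelevant).
    for my $kwd (sort keys %$cond) {
        push @exprs, $kwd . $cond->{$kwd};
    }
    push @args, '-cond', join(',', @exprs) if @exprs;
    return @args;
}

my @args = build_args(['path', 'filename'],
                      { library => '=SL01g', filetype => '=daq_reco_dst' });
print join(' ', 'get_file_list.pl', @args), "\n";

# With the list form of system() no shell is involved, so characters such
# as '<', '>' or '|' in a condition need no quoting:
#   system('get_file_list.pl', @args) == 0 or die "get_file_list.pl failed: $?";
```

The system() call is left commented out so that the sketch runs without the utility installed.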



Supported comparison or selection operators

Operator   Meaning                                    Applies to
<=         Not greater than
<          Less than
>=         Not less than
>          Greater than
<>         Not equal to
!=         Not equal to
=          Equal to
!~         Not containing (i.e. does not match)       strings
~          Containing (i.e. approximately matching)   strings
[]         In range
][         Outside the range
%          Modulo                                     integer
%%         Not modulo                                 integer



Logical operators

The following logical operators can also be used in a query. They apply within a -cond expression, in the form keyword=Value1 LogicalOperator Value2 {LogicalOperator Value3 ...}

Operator   Meaning       Applies to
||         Logical OR    strings or numbers
&&         Logical AND   strings or numbers


Note that the logical AND operator will return an empty selection in most cases (for example, a runnumber cannot be equal to Value1 and Value2 at the same time). It was added for later extensions of the database: selections based on metadata such as triggered events (many triggers in a file) are a case where this operator would be used.

The aggregate functions

These are special aggregate functions. They can be used in conjunction with any keyword that describes some data. Note that most of them only make sense for numerical values. See examples for the description on how to use them.

sum     The sum of the values
avg     The average of the values
min     The minimum of the values
max     The maximum of the values
orda    Sort the output in ascending order by this keyword
ordd    Sort the output in descending order by this keyword
count   The count for a given selection
grp     Group the output: put all the records with the same value for a given keyword together. This is required whenever an aggregate function is used together with other keywords.
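
To make the role of grp concrete, the sketch below reproduces in plain Perl what a selection such as grp(path),sum(events) computes: one total per distinct value of the grouping keyword. The rows are made-up stand-ins for catalog records, and sum_by_group is a hypothetical helper. The catalog performs this grouping itself, so this only illustrates the semantics.

```perl
use strict;
use warnings;

# Made-up (path, events) pairs standing in for per-file catalog records.
my @rows = (
    ['/data/a', 100],
    ['/data/a', 250],
    ['/data/b',  40],
);

# Equivalent of grp(path),sum(events): one total per distinct path.
sub sum_by_group {
    my @pairs = @_;
    my %total;
    $total{ $_->[0] } += $_->[1] for @pairs;
    return %total;
}

my %total = sum_by_group(@rows);
# Print in the "::"-delimited form used by the command line utility.
print join('::', $_, $total{$_}), "\n" for sort keys %total;
```

Without a grouping keyword there would be no way to tell which records belong in the same total, which is why grp is required alongside the aggregate.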



Examples of the use of get_file_list.pl command

Below are a few examples of how to use the get_file_list.pl command. Each example poses a common question a user may have and shows how to get the answer.

  1. The database contains files from many productions. What are they, and what is their description ?

    % get_file_list.pl -keys 'production,prodcomment'

    The output should be:

    simulation::
    raw::
    MDC4::
    MDC4::
    MDC4_test::
    MDC4::
    P00hd::
    P00he::
    P00hg::
    P00hi::
    P00hk::
    P00hi::
    P00hm::
    P01he::
    P01hf::
    P01hg::
    P01hi::
    P01gk::
    P02gd::
    P02ge::
    P02gc::
    P01gl::
    P02ga::
    P02gb::
    P02gf::
    P02gg::
    P02gh2::
    P02gx::
    P02gi1::
    P02gi2::
    n/a::
    P03ii::
    P03gi::
    P03ia::
    P03ib::
    P03ib::
     

    Using this kind of simple query one can get a list of valid values for most keywords (e.g. storage, site, runtype, library). In our examples, we enclose the keyword and condition strings in single quotes. Most queries would be accepted without quotes, but some of the syntax uses characters that the shell treats as special. Single quotes around everything prevent the shell from interpreting them.
    The query above returns the value MDC4 for production several times. This value is found multiple times in the catalog, but under different conditions. To eliminate the confusion, consider the following query instead:


    % get_file_list.pl -keys 'production,library,prodcomment'
    simulation::sim::
    raw::raw::
    MDC4::trs2pp::
    MDC4::trsY2::
    MDC4_test::trs2y::
    MDC4::trsY2v::
    P00hd::SL00d::
    P00he::SL00e::
    P00hg::SL00g::
    P00hi::SL00i::
    P00hk::SL00k::
    P00hi::trs1i::
    P00hm::SL00m::
    P01he::SL01e::
    P01hf::SL01f::
    P01hg::SL01g::
    P01hi::SL01i::
    P01gk::SL01k::
    P02gd::SL02d::
    P02ge::SL02e::
    P02gc::SL02c::
    P01gl::SL01l::
    P02ga::SL02a::
    P02gb::SL02b::
    P02gf::SL02f::
    P02gg::SL02g::
    P02gh2::SL02h::
    P02gx::SL02x::
    P02gi1::SL02i::
    P02gi2::SL02i::
    n/a::n/a::
    P03ii::SL02i::
    P03gi::SL02i::
    P03ia::SL03ia::
    P03ib::SL02i::

    In the above, we see that the pair production,library is unique while production alone is not. The two keywords are intimately related, and the library keyword was used to differentiate two production run conditions in this older production version. This is a typical case where the -distinct option comes to the rescue.

    % get_file_list.pl -keys 'production,prodcomment' -distinct
    simulation::
    raw::
    MDC4::
    MDC4_test::
    P00hd::
    P00he::
    P00hg::
    P00hi::
    P00hk::
    P00hm::
    P01he::
    P01hf::
    P01hg::
    P01hi::
    P01gk::
    P02gd::
    P02ge::
    P02gc::
    P01gl::
    P02ga::
    P02gb::
    P02gf::
    P02gg::
    P02gh2::
    P02gx::
    P02gi1::
    P02gi2::
    n/a::
    P03ii::
    P03gi::
    P03ia::
    P03ib::

    This time, each production appears only once.

  2. Building on the above example, how can the list be ordered?

    % get_file_list.pl  -keys 'orda(production),prodcomment' -distinct
    MDC4::
    MDC4_test::
    n/a::
    P00hd::
    P00he::
    P00hg::
    P00hi::
    P00hk::
    P00hm::
    P01gk::
    P01gl::
    P01he::
    P01hf::
    P01hg::
    P01hi::
    P02ga::
    P02gb::
    P02gc::
    P02gd::
    P02ge::
    P02gf::
    P02gg::
    P02gh2::
    P02gi1::
    P02gi2::
    P02gx::
    P03gi::
    P03ia::
    P03ib::
    P03ii::
    raw::
    simulation::
  3. What are the file types coming from a given production ?

    % get_file_list.pl -distinct -keys 'production,filetype' -cond 'library=SL01g'

    The output should be:

    P01hg::daq_reco_tags
    P01hg::daq_reco_runco
    P01hg::daq_reco_hist
    P01hg::daq_reco_event
    P01hg::daq_reco_dst
     

    This lists the valid file types for a given library. Note how the -cond parameter is used. We also used -distinct in this query to ensure that only unique key pairs are returned.

  4. Where are the files of a specific type coming from a given production ?

    % get_file_list.pl -keys 'path,production,filetype' -cond 'library=SL01g,filetype=daq_reco_dst' -distinct

    The output should be:

     /home/starreco/reco/P01hg/2001/242::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/238::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/244::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/239::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/240::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/249::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/236::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/251::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/237::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/252::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/254::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/257::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/258::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/259::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/260::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/261::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/262::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/263::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/267::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/268::P01hg::daq_reco_dst
     /home/starreco/reco/P01hg/2001/270::P01hg::daq_reco_dst

    Note that several conditions may be given, but only one per keyword. Also note that you should NOT assume that the output is ordered (assuming so would be a rather fatal mistake at the PERL module interface level).

  5. Give me a "ready to use" list of files of a given type from a specific production.

    % get_file_list.pl -keys 'path,filename' -cond 'library=SL01g,filetype=daq_reco_dst' -delim '/'

    The output should be:

     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0001.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0002.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0003.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0004.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0005.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0006.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0007.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0008.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0009.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0010.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0011.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0012.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0013.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0014.dst.root
     /home/starreco/reco/P01hg/2001/242/st_physics_2242040_raw_0015.dst.root
     ...
     

    Note that by using the delimiter "/" between path and filename you get a "ready to use" list of files with their paths. A few remarks: the list is long, but by default the interface returns only 100 results. To get the full list (not advisable to do all the time; there are more than 2 million records in our catalog), use the -limit 0 command line option. Any value <= 0 returns the full list, while a positive number makes the interface return at most that number of records.

  6. Give me a "ready to use" list of files of a given type from a specific production, but only one location for each file.

    % get_file_list.pl -keys 'path,filename' -cond 'library=SL01g,filetype=daq_reco_dst' -delim "/" -onefile

    The output should be:

     The field list is path,filename
     The conditions list is library=SL01g,filetype=daq_reco_dst
     /home/starreco/reco/P01hg/2001/238/st_physics_2238001_raw_0105.dst.root
     /home/starreco/reco/P01hg/2001/238/st_physics_2238006_raw_0092.dst.root
     /home/starreco/reco/P01hg/2001/238/st_physics_2238009_raw_0085.dst.root
     /home/starreco/reco/P01hg/2001/238/st_physics_2238009_raw_0228.dst.root
     /home/starreco/reco/P01hg/2001/238/st_physics_2238013_raw_0020.dst.root
     /home/starreco/reco/P01hg/2001/239/st_physics_2239035_raw_0001.dst.root
     /home/starreco/reco/P01hg/2001/239/st_physics_2239041_raw_0068.dst.root
     /home/starreco/reco/P01hg/2001/240/st_physics_2240003_raw_0030.dst.root
     /home/starreco/reco/P01hg/2001/240/st_physics_2240003_raw_0160.dst.root
     /home/starreco/reco/P01hg/2001/240/st_physics_2240003_raw_0283.dst.root
     /home/starreco/reco/P01hg/2001/240/st_physics_2240007_raw_0100.dst.root
     /home/starreco/reco/P01hg/2001/240/st_physics_2240009_raw_0102.dst.root
     /home/starreco/reco/P01hg/2001/240/st_physics_2240013_raw_0045.dst.root
     /home/starreco/reco/P01hg/2001/240/st_physics_2240013_raw_0175.dst.root
     /home/starreco/reco/P01hg/2001/240/st_physics_2240013_raw_0300.dst.root
     /home/starreco/reco/P01hg/2001/249/st_physics_2249008_raw_0006.dst.root
     /home/starreco/reco/P01hg/2001/249/st_physics_2249009_raw_0027.dst.root
     /home/starreco/reco/P01hg/2001/249/st_physics_2249009_raw_0154.dst.root
     /home/starreco/reco/P01hg/2001/236/st_physics_2236019_raw_0027.dst.root
     ...
     

    The -onefile option may be useful when you have several copies of each file but want the location of only one of them (e.g. you need to process every MuDST, but each of them only once). The FileCatalog has been designed to account for multiple instances, or replicas, of each file, and by default it may return several instances of a file.
    The -onefile command line option exists to select a single location in that case.

  7. Count number of events in files in a given directory

    % get_file_list.pl -keys 'grp(path),sum(events)' -cond 'production=P02gc,filetype=daq_reco_event,storage=NFS'
     

    The output should be something like :

     /star/data06/reco/productionCentral600/FullField/P02gc/2001/313::54976
     /star/data06/reco/productionCentral600/FullField/P02gc/2001/318::65275
     /star/data06/reco/productionCentral600/FullField/P02gc/2001/319::67357
     /star/data06/reco/productionCentral600/FullField/P02gc/2001/320::91600
     /star/data06/reco/productionCentral600/FullField/P02gc/2001/321::42974
     /star/data06/reco/productionCentral600/ReversedFullField/P02gc/2001/324::90226
     /star/data06/reco/productionCentral600/ReversedFullField/P02gc/2001/325::63604
     /star/data06/reco/productionCentral600/ReversedFullField/P02gc/2001/326::19403
     /star/data06/reco/productionCentral600/ReversedFullField/P02gc/2001/327::48130
     /star/data24/reco/productionCentral1200/FullField/P02gc/2001/318::2079
     /star/data24/reco/productionCentral1200/FullField/P02gc/2001/319::3513
     /star/data24/reco/productionCentral1200/FullField/P02gc/2001/320::20709
     /star/data24/reco/productionCentral1200/FullField/P02gc/2001/321::35001
     /star/data24/reco/productionCentral1200/FullField/P02gc/2001/322::1006
     /star/data24/reco/productionCentral1200/ReversedFullField/P02gc/2001/324::8985
     /star/data24/reco/productionCentral1200/ReversedFullField/P02gc/2001/326::7646
     /star/data24/reco/productionCentral1200/ReversedFullField/P02gc/2001/327::7390
     /star/data24/reco/productionCentral1200/ReversedFullField/P02gc/2001/328::3116
     /star/data24/reco/productionCentral600/FullField/P02gc/2001/322::1244
     /star/data24/reco/productionCentral600/FullField/P02gc/2001/323::4724
     /star/data25/reco/ProductionMinBias/FullField/P02gc/2001/269::102989
     /star/data25/reco/ProductionMinBias/FullField/P02gc/2001/270::296570
     /star/data25/reco/ProductionMinBias/FullField/P02gc/2001/313::53934
     /star/data25/reco/ProductionMinBias/ReversedFullField/P02gc/2001/269::143083
     /star/data25/reco/ProductionMinBias/ReversedFullField/P02gc/2001/310::37807
     /star/data25/reco/ProductionMinBias/ReversedFullField/P02gc/2001/311::20774
     /star/data26/reco/Central/FullField/P02gc/2001/288::28445
     /star/data26/reco/productionCentral/FullField/P02gc/2001/311::50089
     /star/data26/reco/productionCentral/FullField/P02gc/2001/312::41030
     /star/data26/reco/productionCentral/FullField/P02gc/2001/313::97691
     /star/data26/reco/productionCentral/ReversedFullField/P02gc/2001/310::42260
     /star/data26/reco/productionCentral/ReversedFullField/P02gc/2001/311::75828
     

    The above is only an example of output. Our query used storage=NFS, and what we have on disk changes over time.
    The use of the aggregate function sum() requires, in this case, the use of the aggregate qualifier grp(). This is because sum(events) alone does not specify what you want (the sum by directory? by trigger setup? ...). In this specific example, the data is grouped by directory name, and the number of events is summed within each group.

  8. Give me a "ready to use" list of files of a given type from a specific production, in batches of 1000

     % get_file_list.pl -keys 'path,filename' -cond 'library=SL01g,filetype=MC_reco_dst' -delim "/" -start 0 -limit 1000
     % get_file_list.pl -keys 'path,filename' -cond 'library=SL01g,filetype=MC_reco_dst' -delim "/" -start 1000 -limit 1000
     % get_file_list.pl -keys 'path,filename' -cond 'library=SL01g,filetype=MC_reco_dst' -delim "/" -start 2000 -limit 1000
     % get_file_list.pl -keys 'path,filename' -cond 'library=SL01g,filetype=MC_reco_dst' -delim "/" -start 3000 -limit 1000
     ...
     

    Note the combination of -start and -limit keywords used to divide the output in batches.
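
    The batch commands above follow a simple pattern that can be generated rather than typed. This is a minimal sketch in Perl, assuming the total number of matching records is already known (3500 here is a made-up figure). batch_ranges is a hypothetical helper, and the sketch only prints the command lines instead of running them.

```perl
use strict;
use warnings;

# Hypothetical helper: return [start, limit] pairs covering $total
# records in batches of $size, counting records from 0 as in the
# example above.
sub batch_ranges {
    my ($total, $size) = @_;
    die "batch size must be positive\n" if $size <= 0;
    my @ranges;
    for (my $start = 0; $start < $total; $start += $size) {
        push @ranges, [$start, $size];
    }
    return @ranges;
}

# 3500 is an assumed total, used for illustration only.
for my $r (batch_ranges(3500, 1000)) {
    printf "get_file_list.pl -keys 'path,filename' "
         . "-cond 'library=SL01g,filetype=MC_reco_dst' "
         . "-delim '/' -start %d -limit %d\n", @$r;
}
```

    The last batch may contain fewer records than requested, since -limit only caps the number of records returned. Because the output order should not be assumed, a script relying on batches may want to request an explicit ordering as well.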

There are many more examples we could give, but getting familiar with the keywords, operators and command line options and understanding the basics described above is a good starting point for trying queries of your own.



******* GREEN SECTIONS BELOW ARE UN-CHECKED PARAGRAPHS *******



FileCatalog Database and PERL module

This document briefly describes the FileCatalog database and the PERL module "FileCatalog", which can be used to access and manipulate data in this database.

The database

The following diagram shows the structure of the database. It contains a number of "dictionary" tables: the ones that do not reference any other tables and usually hold only one useful field. They are:

FileTypes
StorageTypes
StorageSites
ProductionConditions
TriggerSetups - contains the online trigger setup name (which itself is a reference for a collection of triggers). The detail of the trigger composition is held by the TriggerCompositions table.
TriggerCompositions - holds data about the number of events collected with a specific TriggerWord that are included in a given file.
TriggerWords - holds collections of trigger words.
RunTypes


There are also special tables:

EventGenerators - holds the data about various event generators and their simulation parameters.
SimulationParams - parameters pertaining to a specific simulation run.
CollisionTypes - the type of particles colliding and their energy. Also holds special values like "cosmic".
DetectorConfiguration - holds a detector configuration used to collect specific data. For simulation it holds the geometry version used.

The following are the three main tables holding data about the files and run parameters, and binding all the other tables together:

RunParams - parameters of a single physics run. Each run is identified by its run number.
FileData - data about a specific file, without any information about its physical location. Each file is identified by a combination of a file name, file type, file sequence, production conditions and the run number it is connected to. FileData can be connected to the run number it comes from.
FileLocations - data describing a specific file location and its physical storage: the file path, the owner, protection, etc.

In general the database is supposed to be hidden from the user. It is to be modified only through the PERL module and its functions. Manual changes to the database should not, in principle, be necessary.

The "FileCatalog" PERL module

The FileCatalog PERL module is intended to provide access to the database for data querying and retrieval as well as for data insertion and modification.
It is based on the concept of a persistent context made of keywords. The user first sets a context, using a set of keywords with the desired values (or conditions), and then uses special commands to get/insert/delete/modify the data in the database. All subsequent operations are made according to the context. In other words, the context consists of keywords, e.g. "filename", "path", "storage", that may have values assigned to them. By calling the method set_context("storage = HPSS"), we say that from now on we will only consider operations on files whose storage is HPSS. Consecutive calls to this method will only refine the context.


The following subroutines are available in the FileCatalog module:

new() : create a new FileCatalog object, on which all the following operations are carried out. It is necessary to issue this command before any other operation with the module.
connect([$user,$passwd]) : connect to the FileCatalog database. If user and password are unspecified, the connection is made with read-only access. You MUST specify the management user and password if you want to insert records.
connect_as($id) : id being a string identifier, connect to the database using the scope 'id'. Information about accessing the database using the scope 'id' must be described in the XML connection interface of the FileCatalog.
get_connection($id) : returns the connection information within the scope 'id'. The output of this routine can be used
destroy() : destroy the object and disconnect from the FileCatalog database
set_context("$kwd = $value"[,"$kwd = $value"...]) : set one or more context keywords to the given operator and value
get_context("$kwd") : get the context value connected to a given keyword $kwd
clear_context() : clears the context (that is, forgets about everything)
debug_off() : turns all kinds of debugging OFF
debug_on([mode]) : turns debugging ON. Default is OFF.
mode is optional and may have the following values :

set_silent($mode) : sets silent mode, i.e. tells the module NOT to display any informational messages. Messages related to errors are still displayed. You should not use this while you debug your code.
set_delimeter($string) : sets the keyword/field delimiter to $string. By default, the delimiter is "::".
get_delimeter() : returns the current delimiter string.
set_delayed() : sets the delay mode on any action you plan to do on the catalog. Actions include inserts of new records, updates, deletions, etc.
print_delayed([$flag]) : prints to the screen the SQL commands it would otherwise have executed internally. The set_delayed() method must be called prior to using it. It also resets the delay mode state, which means that you have to call set_delayed() again after you use this method. $flag, a logical value, if set to TRUE, will print an extra message on the screen (the number of commands and the time) before printing the SQL commands.
flush_delayed([$flag]) : flushes all commands held by the set_delayed() method, i.e. executes them and resets the delay state. As for the print_delayed() method, you must call set_delayed() again after this method if you want to continue to use the delayed mode. The $flag variable, a logical value, if set to TRUE, will print an extra message on the screen (the number of commands and the time) before executing the SQL commands.

The following methods require a connection to the database and are meant to be used from outside the module.

First the methods of everyday use:
get_keyword_list() : returns as an array the list of available keywords
insert_file_location() : after the whole context is set, this method inserts the file location record. If it doesn't find the corresponding file data or run parameters, it inserts the necessary information into the database. It also ensures data integrity and refuses to insert a record if some critical data was not specified in the context. For a new entry to appear, the combination of storage, path, node and site must be different.
run_query() : gets the data from the database. As parameters to this procedure the user gives a list of keywords. The returned data is either an undefined value (if the query fails) or an array of strings. Each string corresponds to one record in the database and is the concatenation by "::" of all requested keywords. Example:
The following code:
$fC->run_query("extension","trgsetupname","runtype","size","fileseq","owner","createtime");
will give an output similar to:
fzd::central::simulation::1042891200::1::starsink::20010913000000
fzd::central::simulation::998146800::2::starsink::20010913000000
fzd::central::simulation::1050894000::1::starsink::20010913000000
fzd::central::simulation::1040850000::2::starsink::20010913000000
fzd::central::simulation::1040590800::3::starsink::20010913000000
fzd::central::simulation::1041076800::4::starsink::20010913000000
fzd::central::simulation::950259600::5::starsink::20010913000000
fzd::central::simulation::1049792400::1::starsink::20010913000000
fzd::central::simulation::1046293200::2::starsink::20010913000000
run_query_st() : same as run_query() but the returned value is a single string, with records separated by '\n', rather than an array. This function returns undef if the query fails.
These subroutines are database specific, and should be used rarely and only if necessary.
check_ID_for_params($kwd) : returns the database row ID from the dictionary table connected to the keyword $kwd
insert_dictionary_value($kwd) : inserts the value for keyword $kwd taken from the context into the dictionary table
insert_detector_configuration() : inserts the detector configuration from current context. This method does not require any arguments.
insert_run_param_info() : insert the run param record taking data from the current context
insert_simulation_params() : insert the simulation parameters taking data from the current context
get_current_run_param() : get the ID of a run params corresponding to the current context
get_current_detector_configuration() : gets the ID of a detector configuration described by the current context
get_current_file_data() : gets the ID of a file data corresponding to the current context
get_file_location() : based on the current Catalog context, returns the FileLocations information as an array. Each element of the array is an association keyword=value, so the returned array can be used as-is in a set_context() statement.
get_file_data() : based on the current Catalog context, returns the FileData information as an array. Each element of the array is an association keyword=value, so the returned array can be used as-is in a set_context() statement.
get_current_simulation_params() : gets the ID of the simulation params corresponding to the current context
delete_records({$doit}) : deletes the current file location. If it finds that the current file data has no file locations left, it deletes it too. The delete_records() method is based on the current context and works within a global scope (i.e. ALL records selected with set_context() will be deleted). Therefore, this function is EXTREMELY dangerous. Note that it is based on run_query(), so you may want to check what would be deleted using that method first to avoid catastrophes. The optional argument $doit is used to confirm the deletion. The default is NOT to delete. This method returns an array of deleted records (the partial information is flid, fdid, path, filename).
clone_location() : based on the most recent query, creates an instance for FileData and a copy of the FileLocations information and returns the success status. WARNING : You should NOT call the update methods if cloning has failed; the update would work but enter inconsistent values. Later versions will protect against this problem. The suggested usage is then
   $fC->set_context(....);               # whatever is needed
   $fC->run_query("size");               # as an example
   if ( $fC->clone_location() ){
       $fC->set_context("storage=NFS");  # change the storage
       ...
       $fC->insert_file_location();      # insert the new entry
   }
This function's usage is as above, i.e. create a clone (a copy of the file META-Data), modify a few characteristics and create a new entry (file replication). This is handy, for example, whenever you copy a file from one place to another. Note : YOU MUST NOT try to change the FileData information; only modify keywords associated with the FileLocations table.
update_record($kwd,$newvalue{,$doit}) : modifies the data in the database tables or dictionaries. This method returns TRUE/FALSE on success/failure.
This method modifies a field set in the current context and corresponding to the given keyword $kwd, changing it to the value passed as argument $newvalue. The last optional argument, $doit, is used to confirm ($doit=1) the update or just debug the operation ($doit=0); the default is 1.
The example below would set the storageSiteName field associated with the keyword site from its original value test to a new value test2
   $fC->set_context("site=test"); 
   $fC->update_record("site","test2",1);
If your intent is to update not a dictionary but one of the main tables (the META-Data or the FileLocation table information), you should use an as-specific-as-possible context; otherwise, an entire range of records matching the most recent context will be updated. For example, trying to update the number of events based on a broad set_context() will probably update far more than you really want.
You should use the update_location() method for changing a file's basic characteristics (much safer).
update_location($kwd,$newvalue{,$doit}) : this method returns TRUE/FALSE on success/failure. It is used to update characteristics associated with the FileLocation table AND ONLY THAT TABLE. The keyword you are attempting to change does not need to be part of a preceding context (unlike the update_record() method). Usage example :
        $fC->set_context("site=test",
                         "storage=NFS",
                         "path=/star/data11/reco/ProductionMinBias/FullField/P02gx/2001/274",
                         "filename=st_physics_2274024_raw_0095.dummy.root");
        $fC->update_location("size",200,1);              
bootstrap($kwd,$delete) : database maintenance procedure. Looks at all the database entries and makes a sanity check on keyword $kwd. Possible values for the keyword $kwd are searched in its dictionary table and compared to the entries of its parent table. If none are found, you can delete the zombie keyword value by specifying the $delete argument (0|1).
The following functions are now implemented and exported but will be merged with the above routine in a later version.
Currently implemented are bootstrap associated with any dictionary table, and the following tables: FileData, FileLocations, TriggerCompositions, and TriggerWords.

Usage Example

The following example is the equivalent of example 4 of the command line section.

use lib "/afs/rhic/star/packages/scripts";
use FileCatalog;

my($fC);

$fC = new FileCatalog; 
$fC->connect();        

$fC->set_context("library=SL01g","filetype=daq_reco_dst");

@all = $fC->run_query("path","production","filetype");
foreach $line (@all){
   print "$line\n";
}

$fC->destroy();         # Delete the instance
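
Each record returned by run_query() is a single delimited string, so a caller usually has to split it back into fields. The following is a minimal sketch of that step and is not taken from the FileCatalog distribution: parse_records is a hypothetical helper, and the two records are copied from the sample run_query() output shown earlier so that the code runs without a database connection.

```perl
use strict;
use warnings;

# Hypothetical helper: split delimited records into hashes keyed by
# the keywords that were requested, in the same order.
sub parse_records {
    my ($delim, $keys, @records) = @_;
    my @parsed;
    for my $rec (@records) {
        my %row;
        # The limit -1 keeps trailing empty fields (an empty last column).
        @row{@$keys} = split /\Q$delim\E/, $rec, -1;
        push @parsed, \%row;
    }
    return @parsed;
}

my @keys = qw(extension trgsetupname runtype size fileseq owner createtime);
my @records = (
    'fzd::central::simulation::1042891200::1::starsink::20010913000000',
    'fzd::central::simulation::998146800::2::starsink::20010913000000',
);

for my $row (parse_records('::', \@keys, @records)) {
    print "$row->{owner} $row->{fileseq} $row->{size}\n";
}
```

This approach assumes that no field value contains the delimiter itself. If that cannot be guaranteed, a delimiter that cannot occur in the data should be selected first.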



The list of examples would be long ... Instead, we include a few script examples which we hope will guide the novice through the details on how to use the FileCatalog interface.



Keyword list

Here are the keywords that can be used in the context. They are color-coded as follows:
Keywords in blue are currently supported by the database schema but unused by the production scripts and are therefore not filled (or poorly filled).
Keywords in aqua are automatically updated (there is no need to reset them).
Keywords in magenta are filled but may need updating (do not rely strongly on their values).



| keyword | Notes | Meaning |
|---|---|---|
| site | | The site where the data is stored, e.g. BNL, LBL |
| sitecmt | | The site comment string |
| siteloc | | A full string describing the site location in the world |
| storage | | The storage medium, e.g. HPSS, NFS, local disk. Note that local disk storage does not allow for a unique file location; one must also select on node |
| node | | The name of the node where the data is stored (necessary to locate local disk storage) |
| path | | The path to a specific copy of the file |
| filename | | The name of the data file |
| filetype | | The type of the file, e.g. "daq_reco_dst", "MC_fzd", etc. |
| extension | | The extension of the file, directly connected to the type (each file type has an associated extension) |
| events | | Number of events or entries in the file |
| size | | The size of the data file |
| fileseq | | The file sequence as determined during data taking by DAQ. Arbitrary for simulation and processed files |
| stream | | The file stream, if applicable (default is 0) |
| md5sum | Early stage db fill did not update this field. It may return 0. | The file's md5 checksum |
| production | | The production tag with which a given file was produced. Can also be "raw" or "simulation" |
| library | | The library version this file was produced with |
| trgsetupname | Used to encode the path in production | The name of the online trigger setup |
| trgname | | The name of one trigger in a collection of triggers associated with a runnumber |
| trgcount | | The count of events having the associated trgname for a given runnumber |
| trgword | Available for Year4 data and beyond, for DAQ files | The trigger word associated with one trigger in a collection |
| trgversion | | The trigger word version associated with a trgname |
| trgdefinition | | The trigger definition of one trigger in a collection |
| runtype | | The type of the run, e.g. "physics", "laser", "pulser", "pedestal", "test", but also "simulation" for simulated datasets |
| configuration | | The detector configuration name. A detector configuration is a combination of detectors that were present during data taking in a given run. Note that the combination configuration/geometry is unique (but neither of the two alone is) |
| geometry | | The geometry definition for a given simulation set |
| runnumber | | The number of the run. Arbitrary for simulations |
| runcomments | | The comments for a given run |
| collision | | The collision type. Specified in the form <first particle><second particle><collision energy>, e.g. "AuAu200" |
| datetaken | The format was corrupted during the conversion from the old to the new Catalog. It can be (and will be) recovered. | The date the data was taken. Arbitrary for simulation |
| magscale | | The name of the magnetic field scale, e.g. FullField |
| magvalue | | The actual magnetic field value |
| filecomment | | The comment on the file |
| owner | | The owner of the file |
| protection | Subject to changes | The protection or read/write permissions, given in a format similar to UNIX 'ls -l' |
| available | | Is the file available? |
| persistent | | Is the file persistent? |
| createtime | Only HPSS files have a createtime which is not subject to changes | The time the file was created. Format is YYYYmmddHHMMSS |
| inserttime | | The time the file's data was inserted into the database |
| simcomment | | The comments for the simulation |
| generator | | The event generator name |
| genversion | | Event generator version |
| gencomment | | Event generator comments |
| genparams | | Event generator parameters |
| tpc | | Was the TPC in the data stream when specific data was taken? |
| svt | | Was the SVT in the data stream when specific data was taken? |
| tof | | Was the TOF in the data stream when specific data was taken? |
| emc | | Was the B-EMC in the data stream when specific data was taken? |
| eemc | | Was the E-EMC in the data stream when specific data was taken? |
| fpd | | Was the FPD in the data stream when specific data was taken? |
| ftpc | | Was the FTPC in the data stream when specific data was taken? |
| pmd | | Was the PMD in the data stream when specific data was taken? |
| rich | | Was the RICH in the data stream when specific data was taken? |
| ssd | | Was the SSD in the data stream when specific data was taken? |
| bbc | | Was the BBC in the data stream when specific data was taken? |
| bsmd | | Was the Barrel EMC SMD in the data stream when specific data was taken? |
| esmd | | Was the End-Cap SMD in the data stream when specific data was taken? |
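
Two of the keyword values above have a documented string layout: collision (<first particle><second particle><collision energy>) and createtime (YYYYmmddHHMMSS). The following is a minimal sketch of how a script might split such values after retrieving them. The helper names parse_collision and format_createtime are hypothetical and not part of the FileCatalog module, and the sketch assumes that each particle name is a single letter optionally followed by one lowercase letter.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of FileCatalog): split a collision string
# such as "AuAu200" into first particle, second particle and energy.
# Assumption: a particle name is one letter optionally followed by one
# lowercase letter, and the energy is a plain number.
sub parse_collision {
    my ($collision) = @_;
    if ($collision =~ /^([A-Za-z][a-z]?)([A-Za-z][a-z]?)(\d+(?:\.\d+)?)$/) {
        return +{ first => $1, second => $2, energy => $3 };
    }
    return;    # nothing returned when the string does not match
}

# Hypothetical helper: turn a YYYYmmddHHMMSS timestamp into a readable
# "YYYY-mm-dd HH:MM:SS" string. No time zone is assumed or applied.
sub format_createtime {
    my ($stamp) = @_;
    if ($stamp =~ /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/) {
        return "$1-$2-$3 $4:$5:$6";
    }
    return;
}

my $c = parse_collision("AuAu200");
print "$c->{first} + $c->{second} at $c->{energy}\n" if $c;

my $t = format_createtime("20011001123045");
print "$t\n" if defined $t;
```

Both helpers return nothing on malformed input, so a caller can skip values that do not follow the documented layout (for example, datetaken values affected by the conversion problem noted above would need separate handling).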



The following keywords are for either internal use or specific management purposes. They have no meaning to users (but are unique).

| keyword | Meaning |
|---|---|
| flid | Access the FileLocation ID of the FileLocation table |
| fdid | Access the FileData ID of the FileData table |
| rfdid | Access the FileData ID of the FileLocation table |
| pcid | Access the ProductionCondition ID of the ProductionConditions table |
| rpcid | Access the ProductionCondition ID of the FileData table |
| rpid | Access the runParam ID of the runParams table |
| rrpid | Access the runParam ID of the FileData table |
| ftid | Access the FileType ID of the FileTypes table |
| rftid | Access the FileType ID of the FileData table |
| stid | Access the storageType ID of the StorageTypes table |
| rtid | Access the storageType ID of the FileLocations table |
| ssid | Access the storageSite ID of the StorageSites table |
| rssid | Access the storageSite ID of the FileLocations table |
| tcfdid | Access the FileData ID of the TriggerCompositions table |
| tctwid | Access the TriggerWords ID of the TriggerCompositions table |
| twid | Access the TriggerWords ID of the TriggerWords table |
| dcid | Access the detectorConfiguration ID of the DetectorConfigurations table |
| rdcid | Access the detectorConfiguration ID of the RunParams table |



| keyword | Meaning |
|---|---|
| lgnm | An aggregate keyword returning an equivalent of the logical name |
| lgpth | An aggregate keyword returning a logical path (a string which uniquely characterizes the file's location) |
| fulld | An aggregate keyword returning a string completely defining all meta-data for real data |
| fulls | An aggregate keyword returning a string completely defining all meta-data for simulation data |



Here are the keywords not connected to a specific field in the database. They change the behaviour of the module itself.

| keyword | Notes | Meaning |
|---|---|---|
| simulation | | Is the data a simulation? |
| nounique | In script mode, this keyword is set to 0 (i.e. unique fields), which may slow down your scripts tremendously. In the user interface get_file_list.pl, however, it is set to 1 by default (unique fields are not ensured). | Should the module return all fields, instead of only unique selected fields? |
| noround | | Turns off rounding of the magnetic field and collision energy |
| startrecord | | The PERL module will skip the first startrecord records and start returning data beginning from the next one |
| limit | | The PERL module will return at most limit records |
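
The startrecord and limit keywords can be combined to read a large result set in chunks. The sketch below only computes the (startrecord, limit) pairs for a known record count. The helper name chunk_windows is hypothetical, and the FileCatalog calls shown in the comments assume that both keywords are passed through set_context() like any other keyword, which should be checked against your version of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of FileCatalog): build the
# (startrecord, limit) pairs needed to walk through $total records
# in chunks of at most $chunk records each.
sub chunk_windows {
    my ($total, $chunk) = @_;
    die "chunk size must be positive\n" unless $chunk > 0;
    my @windows;
    for (my $start = 0; $start < $total; $start += $chunk) {
        # startrecord = number of records to skip, limit = maximum returned
        push @windows, [ $start, $chunk ];
    }
    return @windows;
}

for my $w (chunk_windows(250, 100)) {
    my ($start, $limit) = @$w;
    # Assumed usage with a connected FileCatalog object $fC:
    #   $fC->set_context("startrecord=$start", "limit=$limit");
    #   my @rows = $fC->run_query("path", "filename");
    print "startrecord=$start limit=$limit\n";
}
```

The last window may ask for more records than remain; since limit is only an upper bound, the final query simply returns fewer rows.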