How to submit a job with the scheduler



This page collects the problems I ran into when submitting a job to the scheduler for the first time. I work in my directory under AFS, so I wanted the jobs to run from that location. I started from the guide and the information posted on the STAR web pages.

I finally built the following xml file to run root4star with the macro bfc.C, which reconstructs hits in the TPC, SVT and SSD, loads them into StEvent and saves them to root files. The second tag, command, contains the list of unix commands that I want to execute. The last tags set the input and output files. The input tag is important since it defines one of possibly several input files.
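The listing itself is not reproduced on this page. As a stand-in, the sketch below writes a request file of the kind described above, reconstructed from the generated script shown further down (job name, log locations, input file, output patterns). It is not the original TpcSvtSsd.xml: the file name TpcSvtSsd_sketch.xml, the name attribute and the exact element layout are assumptions and should be checked against the scheduler manual.

```shell
# Write a sketch of the request file. The quoted 'EOF' keeps the backslashes
# and $INPUTFILE0 literal, so they are left for the generated script to expand.
cat > TpcSvtSsd_sketch.xml <<'EOF'
<?xml version="1.0" encoding="utf-8" ?>
<!-- Reconstruction, not the original file; the name attribute is an assumption. -->
<job name="TpcSvtSsd">
  <command>
stardev
root4star -q -b bfc.C\(1,\"ry2004,in,tpc_daq,tpc,-tcl,-tpt,-PreVtx,fcf,Physics,svtDb,dst,event,analysis,EventQA,tags,Tree,evout,l3onl,Corr2,svt_daq,SvtD,trgd,OSpaceZ,OShortR,debug1,ssdDb,ssd_daq,spt\",\"$INPUTFILE0\"\)
  </command>
  <stdout URL="file:/star/u/lmartin/Current/$JOBID.out" />
  <stderr URL="file:/star/u/lmartin/Current/$JOBID.err" />
  <input URL="file:/star/data03/daq/2004/093bis/st_physics_5093007_raw_4030002.daq" />
  <output fromScratch="*.root" toURL="file:/star/u/lmartin/Current/" />
  <output fromScratch="*.ps" toURL="file:/star/u/lmartin/Current/" />
</job>
EOF
echo "wrote TpcSvtSsd_sketch.xml"
```

Each input tag adds files to the list handed to the job, and each output tag becomes one copy command at the end of the generated script.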

To submit the job, I used the following command from my working directory:
[rcas6020] /<2>star/users/lmartin/Current/> star-submit TpcSvtSsd.xml
STAR Scheduler 1.7.0
Reading request desciption file : TpcSvtSsd.xml
Analyzing XML...XML Ok
Wrote file assignment report to sched31E1C237BFA04C81CD7CC893BC834280.report
Dispatching jobs to rcas.rcf.bnl.gov
Dispatching process 31E1C237BFA04C81CD7CC893BC834280_0... done.
Reporting statistics... failed.
Scheduling successful
The jobId is then
31E1C237BFA04C81CD7CC893BC834280_0
and the following files are created in my working directory:
-rwxr-xr-x    1 lmartin  rhstar       1726 Nov 17 10:36 sched31E1C237BFA04C81CD7CC893BC834280_0.csh*
-rw-r--r--    1 lmartin  rhstar         64 Nov 17 10:36 sched31E1C237BFA04C81CD7CC893BC834280_0.list
-rw-r--r--    1 lmartin  rhstar       1528 Nov 17 10:36 sched31E1C237BFA04C81CD7CC893BC834280.report
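The three file names can be derived from the job ID. The short sh sketch below splits the job ID into its request and process parts and rebuilds the names; the naming pattern is inferred from this single job only, so treat it as an assumption for requests that are split into several processes.

```shell
# Job ID as printed by star-submit: <REQUESTID>_<PROCESSID>
JOBID=31E1C237BFA04C81CD7CC893BC834280_0

REQUESTID=${JOBID%_*}    # drop the trailing "_<process number>"
PROCESSID=${JOBID##*_}   # keep only what follows the last "_"

# One script and one file list per process, one report per request
SCRIPT="sched${JOBID}.csh"
FILELIST="sched${JOBID}.list"
REPORT="sched${REQUESTID}.report"

printf '%s\n' "$SCRIPT" "$FILELIST" "$REPORT"
```

The same REQUESTID and PROCESSID values show up as environment variables in the generated script.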
The sched31E1C237BFA04C81CD7CC893BC834280.report file gives some information on the submitted job:
[rcas6020] /<2>star/users/lmartin/Current/> more sched31E1C237BFA04C81CD7CC893BC834280.report
File assignment report for job 31E1C237BFA04C81CD7CC893BC834280

NFS: nFiles 1 - nProc 1 - valid

 
*************************************************************************

*   This is a history of Job assignments.                               *

*   If your Jobs are going to the wrong queue, this may tell you why.   *

*************************************************************************

31E1C237BFA04C81CD7CC893BC834280_0 trying star_cas_short(nfsQueue)
31E1C237BFA04C81CD7CC893BC834280_0 assigned to star_cas_short(nfsQueue)
 


---------------------------------- Jobs -----------------------------------
 ID                                  Queue           NFiles  Time    Target
---------------------------------------------------------------------------
 31E1C237BFA04C81CD7CC893BC834280_0  star_cas_short  1       0.0min  none
---------------------------------------------------------------------------


-------------------------------------- Queues ---------------------------------------
 ID          Name            Type  TimeLimit  MaxMem  Local  S.O.P.  Cluster           
-------------------------------------------------------------------------------------
 localQueue  star_cas_dd     LSF   90min      440MB   Y      1       rcas.rcf.bnl.gov
 nfsQueue    star_cas_short  LSF   90min      440MB   N      1       rcas.rcf.bnl.gov
 longQueue   star_cas_big    LSF   N/A        none    N      100     rcas.rcf.bnl.gov
-------------------------------------------------------------------------------------
The sched31E1C237BFA04C81CD7CC893BC834280_0.list file contains the list of input files specified in the xml file with the input tag:
[rcas6020] /<2>star/users/lmartin/Current/> more sched31E1C237BFA04C81CD7CC893BC834280_0.list 
/star/data03/daq/2004/093bis/st_physics_5093007_raw_4030002.daq
Finally, the last file, sched31E1C237BFA04C81CD7CC893BC834280_0.csh, is the script that will be executed on the machine the job is assigned to. This script can in principle be run interactively to check that it does what you want.
[rcas6020] /<2>star/users/lmartin/Current/> more sched31E1C237BFA04C81CD7CC893BC834280_0.csh 
#!/bin/csh
# ------------------- 
# Script generated  at Wed Nov 17 10:36:10 EST 2004 by the STAR scheduler and submitted with
# bsub -q star_cas_short -J 'TpcSvtSsd' -o /star/u/lmartin/Current/31E1C237BFA04C81CD7CC893BC834280_0.out -e /star/u/lmartin/Current/31E1C237BFA04C81CD7CC893BC834280_0.err -R "rusage[sd3=14]"  /afs/rhic.bnl.gov/star/users/lmartin/Current/sched31E1C237BFA04C81CD7CC893BC834280_0.csh
# ------------------- 

# Preparing environment variables
setenv FILEBASENAME st_physics_5093007_raw_4030002
setenv FILELIST sched31E1C237BFA04C81CD7CC893BC834280_0.list
setenv INPUTFILECOUNT 1
setenv JOBID 31E1C237BFA04C81CD7CC893BC834280_0
setenv PROCESSID 0
setenv REQUESTID 31E1C237BFA04C81CD7CC893BC834280
setenv SCRATCH /tmp/$USER/$JOBID
setenv INPUTFILE0 /star/data03/daq/2004/093bis/st_physics_5093007_raw_4030002.daq

# Creating the scratch directory, return failure status
mkdir -p $SCRATCH
set STS=$status
if (! -d $SCRATCH) then
    echo "Scheduler:: Failed to create $SCRATCH on $HOST"
    exit $STS
endif

###################################################
# User command BEGIN ----------------------------->

stardev
root4star -q -b bfc.C\(1,\"ry2004,in,tpc_daq,tpc,-tcl,-tpt,-PreVtx,fcf,Physics,svtDb,dst,event,analysis,EventQA,tags,Tree,evout,l3onl,Corr2,svt_daq,SvtD,trgd,OSpaceZ,OShortR,debug1,ssdDb,ssd_daq,spt\",\"$INPUTFILE0\"\)

# <------------------------------User command BEGIN
###################################################

# Copy output files (if any where specified)
/bin/cp -r $SCRATCH/*.root /star/u/lmartin/Current/
/bin/cp -r $SCRATCH/*.ps /star/u/lmartin/Current/

# Delete the scratch directory
/bin/rm -fr $SCRATCH
Several environment variables are set, a scratch directory is created, and then the commands specified in the xml file are executed. There are two important remarks to make concerning the root4star command.
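One thing that can be checked directly is the quoting on that line. The backslashes are shell escapes: they stop the shell from interpreting the parentheses and the double quotes, so root4star receives the whole macro call as a single argument, with $INPUTFILE0 already replaced by the file name. The sketch below shows this in sh with a shortened chain option list (the shortening is mine, for readability); the csh used by the generated script should treat these escapes the same way.

```shell
INPUTFILE0=/star/data03/daq/2004/093bis/st_physics_5093007_raw_4030002.daq

# "set --" stores the words exactly as a command would receive them
set -- bfc.C\(1,\"ry2004,in,tpc_daq\",\"$INPUTFILE0\"\)

NARGS=$#   # number of arguments after shell processing
ARG=$1     # the single argument that root4star would see

echo "arguments: $NARGS"
printf '%s\n' "$ARG"
```

The printed argument has no backslashes left, the quotes are ordinary characters inside it, and the file name has been substituted.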

Last but not least, I wanted to run from an AFS directory with my own libraries. When I first tried, I noticed that root4star was crashing at the ssd_daq option, which does not exist in the official library but only in my private version of StBFChain.so. As far as I know, this is a problem of AFS privileges. When the job is submitted, a new session is started but the AFS token is not passed along, so the job does not have the privilege to read the files.
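A quick way to see whether a directory is readable without a token is to look for system:anyuser in its access list. The helper below is a minimal sketch: the function name anyuser_can_read is hypothetical, it only parses text in the format printed by fs listacl, and it ignores any "Negative rights" section. The sample access list is made up for the demonstration.

```shell
# Return 0 if the ACL text on stdin grants system:anyuser both
# read (r) and lookup (l) rights, 1 otherwise.
anyuser_can_read() {
    awk '$1 == "system:anyuser" && $2 ~ /r/ && $2 ~ /l/ { ok = 1 }
         END { exit !ok }'
}

# Hypothetical access list of a directory that has not been opened yet
sample_acl='Access list for demo is
Normal rights:
  star rl
  system:administrators rlidwka
  lmartin rlidwka'

if printf '%s\n' "$sample_acl" | anyuser_can_read; then
    echo "readable without a token"
else
    echo "not readable without a token"
fi

# Real use (needs the AFS client tools):
#   fs listacl .sl302_gcc323/obj/StRoot/StBFChain | anyuser_can_read
```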

So finally, to get a working job, I had to change the AFS privileges of the directories I really want to use (StBFChain, StSsdPointMaker, StSsdDaqMaker):
[rcas6020] /<2>star/users/lmartin/Current/> fs setacl .sl302_gcc323/obj/StRoot/StBFChain system:anyuser rl
[rcas6020] /<2>star/users/lmartin/Current/> fs listacl .sl302_gcc323/obj/StRoot/StBFChain
Access list for .sl302_gcc323/obj/StRoot/StBFChain is
Normal rights:
  star rl
  system:administrators rlidwka
  system:anyuser rl
  lmartin rlidwka
  mailafs rl
I also had to change the privileges of the directory ./StarDb/svt/ssd because StSsdPointMaker uses some root files there for the pedestal and noise.
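Since AFS access lists are set per directory, the same fs setacl call has to be repeated for each directory. The loop below is a sketch of that; the paths of StSsdPointMaker and StSsdDaqMaker are assumed to sit next to StBFChain, and the DRYRUN switch and the function name open_dirs are my own additions so that the commands can be inspected before anything is changed.

```shell
# Directories to open, relative to the working directory.
# Only the StBFChain and StarDb paths appear in the text above;
# the two other obj paths are assumed to follow the same layout.
DIRS=".sl302_gcc323/obj/StRoot/StBFChain
.sl302_gcc323/obj/StRoot/StSsdPointMaker
.sl302_gcc323/obj/StRoot/StSsdDaqMaker
./StarDb/svt/ssd"

# DRYRUN=1 only prints the commands; set DRYRUN=0 to really run them.
DRYRUN=1

open_dirs() {
    for d in $DIRS; do
        if [ "$DRYRUN" = 1 ]; then
            echo "fs setacl $d system:anyuser rl"
        else
            fs setacl "$d" system:anyuser rl
        fi
    done
}

open_dirs
```

Granting rl to system:anyuser makes these directories readable by everybody, so it is worth limiting the list to the directories the job really needs.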
Lilian.Martin