How to submit a job with the scheduler
This page collects the problems I faced when submitting a job to the scheduler for the first time. I am using my directory under afs to work, so I wanted the jobs to run from that location. I followed the guide and information posted on the STAR web to start.
I finally built the following xml file to run root4star with the macro bfc.C to reconstruct hits in the TPC, SVT and SSD, load them into StEvent and save them into root files.
The second tag, command, contains the list of unix commands that I want to execute.
The last tags set the input and output files. The input tag is important since it defines one of possibly several input files.
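The content of TpcSvtSsd.xml is not reproduced on this page, so here is a sketch of what such a job description can look like, reconstructed from the generated script shown further down; the exact attributes (maxFilesPerProcess, the file: URLs) are assumptions and should be adapted to your own setup:

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!-- Illustrative sketch of a STAR scheduler job description;
     paths and attributes must be adapted to your own setup. -->
<job maxFilesPerProcess="1">
  <command>
    stardev
    root4star -q -b bfc.C\(1,\"ry2004,in,tpc_daq,tpc,-tcl,-tpt,-PreVtx,fcf,Physics,svtDb,dst,event,analysis,EventQA,tags,Tree,evout,l3onl,Corr2,svt_daq,SvtD,trgd,OSpaceZ,OShortR,debug1,ssdDb,ssd_daq,spt\",\"$INPUTFILE0\"\)
  </command>
  <stdout URL="file:/star/u/lmartin/Current/$JOBID.out" />
  <input URL="file:/star/data03/daq/2004/093bis/st_physics_5093007_raw_4030002.daq" />
  <output fromScratch="*.root" toURL="file:/star/u/lmartin/Current/" />
  <output fromScratch="*.ps" toURL="file:/star/u/lmartin/Current/" />
</job>
```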
To submit the job, I used the following command from my working directory :
[rcas6020] /<2>star/users/lmartin/Current/> star-submit TpcSvtSsd.xml
STAR Scheduler 1.7.0
Reading request desciption file : TpcSvtSsd.xml
Analyzing XML...XML Ok
Wrote file assignment report to sched31E1C237BFA04C81CD7CC893BC834280.report
Dispatching jobs to rcas.rcf.bnl.gov
Dispatching process 31E1C237BFA04C81CD7CC893BC834280_0... done.
Reporting statistics... failed.
Scheduling successful
The jobId is then 31E1C237BFA04C81CD7CC893BC834280_0
and the following files are created in my working directory :
-rwxr-xr-x 1 lmartin rhstar 1726 Nov 17 10:36 sched31E1C237BFA04C81CD7CC893BC834280_0.csh*
-rw-r--r-- 1 lmartin rhstar 64 Nov 17 10:36 sched31E1C237BFA04C81CD7CC893BC834280_0.list
-rw-r--r-- 1 lmartin rhstar 1528 Nov 17 10:36 sched31E1C237BFA04C81CD7CC893BC834280.report
The sched31E1C237BFA04C81CD7CC893BC834280.report
file gives some information on the submitted job :
[rcas6020] /<2>star/users/lmartin/Current/> more sched31E1C237BFA04C81CD7CC893BC834280.report
File assignment report for job 31E1C237BFA04C81CD7CC893BC834280
NFS: nFiles 1 - nProc 1 - valid
*************************************************************************
* This is a history of Job assignments. *
* If your Jobs are going to the wrong queue, this may tell you why. *
*************************************************************************
31E1C237BFA04C81CD7CC893BC834280_0 trying star_cas_short(nfsQueue)
31E1C237BFA04C81CD7CC893BC834280_0 assigned to star_cas_short(nfsQueue)
---------------------------------- Jobs -----------------------------------
ID Queue NFiles Time Target
---------------------------------------------------------------------------
31E1C237BFA04C81CD7CC893BC834280_0 star_cas_short 1 0.0min none
---------------------------------------------------------------------------
-------------------------------------- Queues ---------------------------------------
ID Name Type TimeLimit MaxMem Local S.O.P. Cluster
-------------------------------------------------------------------------------------
localQueue star_cas_dd LSF 90min 440MB Y 1 rcas.rcf.bnl.gov
nfsQueue star_cas_short LSF 90min 440MB N 1 rcas.rcf.bnl.gov
longQueue star_cas_big LSF N/A none N 100 rcas.rcf.bnl.gov
-------------------------------------------------------------------------------------
The sched31E1C237BFA04C81CD7CC893BC834280_0.list
file contains the list of input files specified in the xml file with the input tag :
[rcas6020] /<2>star/users/lmartin/Current/> more sched31E1C237BFA04C81CD7CC893BC834280_0.list
/star/data03/daq/2004/093bis/st_physics_5093007_raw_4030002.daq
And finally, the last file, sched31E1C237BFA04C81CD7CC893BC834280_0.csh,
is the script that will be executed on a designated machine. This script can in principle be run interactively to check that it does what you wanted.
[rcas6020] /<2>star/users/lmartin/Current/> more sched31E1C237BFA04C81CD7CC893BC834280_0.csh
#!/bin/csh
# -------------------
# Script generated at Wed Nov 17 10:36:10 EST 2004 by the STAR scheduler and submitted with
# bsub -q star_cas_short -J 'TpcSvtSsd' -o /star/u/lmartin/Current/31E1C237BFA04C81CD7CC893BC834280_0.out -e /star/u/lmartin/Current/31E1C237BFA04C81CD7CC893BC834280_0.err -R "rusage[sd3=14]" /afs/rhic.bnl.gov/star/users/lmartin/Current/sched31E1C237BFA04C81CD7CC893BC834280_0.csh
# -------------------
# Preparing environment variables
setenv FILEBASENAME st_physics_5093007_raw_4030002
setenv FILELIST sched31E1C237BFA04C81CD7CC893BC834280_0.list
setenv INPUTFILECOUNT 1
setenv JOBID 31E1C237BFA04C81CD7CC893BC834280_0
setenv PROCESSID 0
setenv REQUESTID 31E1C237BFA04C81CD7CC893BC834280
setenv SCRATCH /tmp/$USER/$JOBID
setenv INPUTFILE0 /star/data03/daq/2004/093bis/st_physics_5093007_raw_4030002.daq
# Creating the scratch directory, return failure status
mkdir -p $SCRATCH
set STS=$status
if (! -d $SCRATCH) then
echo "Scheduler:: Failed to create $SCRATCH on $HOST"
exit $STS
endif
###################################################
# User command BEGIN ----------------------------->
stardev
root4star -q -b bfc.C\(1,\"ry2004,in,tpc_daq,tpc,-tcl,-tpt,-PreVtx,fcf,Physics,svtDb,dst,event,analysis,EventQA,tags,Tree,evout,l3onl,Corr2,svt_daq,SvtD,trgd,OSpaceZ,OShortR,debug1,ssdDb,ssd_daq,spt\",\"$INPUTFILE0\"\)
# <------------------------------User command BEGIN
###################################################
# Copy output files (if any where specified)
/bin/cp -r $SCRATCH/*.root /star/u/lmartin/Current/
/bin/cp -r $SCRATCH/*.ps /star/u/lmartin/Current/
# Delete the scratch directory
/bin/rm -fr $SCRATCH
Several environment variables are set, a scratch directory is created, and then the commands specified in the xml file are executed.
There are two important remarks to be made concerning the root4star
command.
- All the bfc options are separated by commas and not white space, as can be done during an interactive session.
- I have used the variable INPUTFILE0
as the last argument of the macro instead of the variable FILELIST.
At the moment the latter does not work, and this is the only way for me to loop over at least one input file.
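As a hypothetical workaround while FILELIST cannot be passed to bfc.C directly, one could loop explicitly over the file names listed in $FILELIST inside the command section. This is only a sketch in csh using the same chain options as in the script above; I have not tested it:

```csh
# Hypothetical sketch: process every file of the scheduler list in turn,
# instead of passing only $INPUTFILE0 to bfc.C.
set chainopts = "ry2004,in,tpc_daq,tpc,-tcl,-tpt,-PreVtx,fcf,Physics,svtDb,dst,event,analysis,EventQA,tags,Tree,evout,l3onl,Corr2,svt_daq,SvtD,trgd,OSpaceZ,OShortR,debug1,ssdDb,ssd_daq,spt"
foreach inputfile ( `cat $FILELIST` )
    root4star -q -b bfc.C\(1,\"$chainopts\",\"$inputfile\"\)
end
```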
Last but not least, I wanted to run from an afs directory with my own libraries. When I first tried, I noticed that root4star
was crashing at the ssd_daq
option, which does not exist in the official library and only in my private version of StBFChain.so.
As far as I know, this is a problem of afs privileges. When the job is submitted, a new session is started but the afs token is not passed, and thus you do not have the privilege to read the files.
So finally, to get a working job, I had to change the afs privileges of the directories I really want to use (StBFChain,
StSsdPointMaker,
StSsdDaqMaker) :
[rcas6020] /<2>star/users/lmartin/Current/> fs setacl .sl302_gcc323/obj/StRoot/StBFChain system:anyuser rl
[rcas6020] /<2>star/users/lmartin/Current/> fs listacl .sl302_gcc323/obj/StRoot/StBFChain
Access list for .sl302_gcc323/obj/StRoot/StBFChain is
Normal rights:
star rl
system:administrators rlidwka
system:anyuser rl
lmartin rlidwka
mailafs rl
I also had to change the privileges of the directory ./StarDb/svt/ssd
because StSsdPointMaker
uses some root files for the pedestal and noise.
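The same kind of command as for the makers applies there. A sketch, assuming the directory sits under my working directory and that anyuser read/lookup (rl) access is again sufficient:

```csh
fs setacl ./StarDb/svt/ssd system:anyuser rl
```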
Lilian.Martin