- Which runs to examine?
Discuss the recent production with the Production Crew and establish a
prioritized list of runs to QA. The express queue mechanism is still
under discussion and has not been set up yet, but once it is
established it should receive highest priority, to provide timely
feedback to the counting house. Other criteria for setting priorities
are whether urgent feedback is needed for a library release, or
whether particular runs require special attention. Otherwise, the
shift crew should look at the most recent production that has been
QA-ed in each of the various data classes.
Since the autoQA mechanism queries the File Catalog once an hour for
real data (less frequently for other data classes) and submits QA
batch jobs on rcas, there may be a significant delay between when
production is run and when the QA results become available. We will
have to monitor this process and adjust the procedures as
necessary. Feedback on this point from the shift crew is essential.
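To make the source of this delay clearer, the polling-and-submission
loop is sketched below. This is only an illustration: the function
bodies are placeholders and do not reproduce the actual autoQA scripts.

    import time

    POLL_INTERVAL = 3600  # real data is polled roughly once an hour

    def query_file_catalog(data_class):
        # Placeholder: the real autoQA scripts query the STAR File Catalog here
        # and return the runs that have not yet been QA-ed.
        return []

    def submit_qa_batch_job(run):
        # Placeholder: the real scripts submit an LSF batch job on rcas
        # that runs the QA macros for this run.
        print("submitting QA job for", run)

    while True:
        for run in query_file_catalog("Real Data Production"):
            submit_qa_batch_job(run)
        time.sleep(POLL_INTERVAL)  # results appear only after the batch job finishes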
- How to look at a run
I will specify how to look at a run in the data class "Real Data
Production". Other data classes will have different selection
procedures, reflecting the differences in the File Catalog structure
for these different classes, but the changes should be obvious.
- Select "Real Data Production" from the pulldown menu in the banner.
- Use the pulldown menus to compose a DB query that includes the
run you are interested in. The simplest procedure at the moment is to
specify the runID and leave all other fields at "any". In the near
future these selections will include trigger, calibration and geometry
information. Note that the default for "QA status" is "done".
- Press "Display Datasets". A listing of all catalogued runs
corresponding to your query will appear in the upper frame.
- To examine the QA histograms, press the "QA details" button. In
the lower panel, a set of links to the histogram files will
appear. The format is gzipped postscript. If your browser is set up to
launch ghostview for files of type "ps", these files will be
automatically unzipped and displayed. Otherwise, you will have to do
something more complicated, such as save the file and view it another
way. Note that if the macro "bfcread_hist_to_ps" is reported to have
crashed, some or all histograms may be missing.
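If your browser is not set up to launch a viewer, a minimal way to
unzip and display a saved histogram file by hand is sketched below
(this assumes Python and ghostview, "gv", are available on your
machine; the file name is only an example):

    import gzip, shutil, subprocess

    ps_gz = "qa_hist_example.ps.gz"   # example name; use the file you saved
    ps    = ps_gz[:-3]                # strip the ".gz" suffix

    # Unzip the gzipped postscript file ...
    with gzip.open(ps_gz, "rb") as fin, open(ps, "wb") as fout:
        shutil.copyfileobj(fin, fout)

    # ... and display it with ghostview (any postscript viewer will do).
    subprocess.run(["gv", ps])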
- To examine the QA scalars and tests, scroll past the histogram
links in the lower panel and push the button. Tables of scalars for
all the data branches will appear in the auxiliary window.
- To compare the QA scalars to similar runs, press the "Compare
reports" button. Details on how to proceed are found in the autoQA
documentation. Note that until more refined selections are available
for real data (e.g. comparing runs with identical trigger conditions
and processing chains), this facility will be of limited utility. Note
also that the planned functionality of automatically comparing to a
standard reference run has not yet been implemented, for similar
reasons.
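For illustration, the kind of stability check this comparison is meant
to support amounts to something like the sketch below; the scalar
names, values and tolerance are invented for this example and are not
taken from the autoQA reports:

    def compare_scalars(reference, current, tolerance=0.10):
        # Flag scalars that differ from the reference run by more than the
        # given fractional tolerance, or that are missing altogether.
        for name, ref_val in reference.items():
            cur_val = current.get(name)
            if cur_val is None:
                print(f"{name}: missing in the current run")
            elif ref_val and abs(cur_val - ref_val) / abs(ref_val) > tolerance:
                print(f"{name}: {cur_val} vs. reference {ref_val}")

    # Invented example values:
    reference = {"n_primary_tracks": 1520.0, "n_vertices": 1.0}
    current   = {"n_primary_tracks":  980.0, "n_vertices": 1.0}
    compare_scalars(reference, current)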
- What QA data to examine
This area needs significant discussion. What we are generally looking
for is that all data are present and can be read (scalar values should
appear in all branches) and that the results look physically
meaningful (e.g. vertex distribution histograms). Comparison to
previous, similar runs to check for stability is highly desirable but
it is not clear how to carry this out at present, for reasons
described above. We should revisit this question as we gain more
experience.
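As a sketch of the first point (all data present and readable), the
check amounts to verifying that every expected branch reported
scalars; the branch names and values below are invented placeholders:

    # Invented example: which branches reported scalars for this run.
    expected_branches = ["dst", "event", "hist", "runco"]
    scalars_by_branch = {"dst": {"n_tracks": 1520}, "hist": {}}

    for branch in expected_branches:
        if not scalars_by_branch.get(branch):
            print(f"branch {branch}: no scalars -- data missing or unreadable")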
The principal QA tool is the set of histograms generated by
bfcread_hist_to_ps. The number of QA histograms has grown enormously
over the past six months and needs to be pruned back to be useful to
the non-expert. This work is going on now (week of July 10) and more
information will be forthcoming.
A description of all the macros run by autoQA is found here. This
documentation is important for understanding the meaning of the
QA scalars.
Here are some general guidelines on what to report:
- Status of run: completed or, if not, the error status (segmentation violation, etc.)
- Macros that crashed
- Macros whose QA status is not "O.K." (At present, this means
simply that there is no data in the branch that macro is trying to
read. No additional tests are applied to the data.)
- Anomalous histograms and scalars - this is necessarily vague at this point.
More specific rules for what should be in the report will be
forthcoming. Input on this question is welcome.
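As an illustration only, a report entry might be organized as follows
(the run number and findings are placeholders, not real examples):

    Run XXXXXXX (Real Data Production):
      Status: completed
      Macros crashed: none
      Macros with QA status not "O.K.": none
      Anomalous histograms or scalars: vertex z distribution looks shifted
        (see hyperlink to full report)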
- How to report results
Once per shift you should send a status report to the QA
hypernews forum:
starqa-hn@coburn.star.bnl.gov
If you are doing Offline QA shifts, you should subscribe to this forum.
The autoQA framework has a "comment" facility that allows the user to
annotate particular runs or to enter a "global comment" that will
appear chronologically in the listing of all runs. These are displayed
together with the datasets, and while not appropriate for lengthy
reports, can serve as flags for specific problems and supply
hyperlinks to longer reports. Note that this is not a high-security
system (anyone can alter or delete your messages).
You do not need the QA Expert's password to use this facility. Press
the button "Add or edit comments" in the upper right part of the upper
panel. You will be asked for some identifying string that will be
attached to your comments. Enter your name and press return. You will
have to press "Display Datasets" again, at which point a button "Add
global comment" will appear below the pulldown menus, and each run
listing will have an "Add comment" button. Follow the
instructions. Messages are interpreted as html, so links to other
pages can be introduced. One possibility is to enter the hyperlink to
the QA report you have sent to starqa-hn. This can obviously be
automated, but it isn't yet and doing it by hand should be
straightforward.
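For example, a comment along the lines of the following (the URL and
run number are placeholders) would show up as a clickable link next to
the run listing:

    <a href="URL-of-your-starqa-hn-report">QA shift report for run XXXXXXX</a>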
- Checking QA jobs on rcas
Every two hours you should check the status of autoQA jobs running on
rcas, by clicking on "RCAS/LSF monitor" (upper right, under the "Add
or Edit Comments" button). You cannot alter jobs using this browser
unless you have the Expert's password, so there is no possibility of
doing damage. Select jobs called QA_TEST. Each of these is a set of QA
macros for a single run and should require up to 10 minutes of CPU
time. The throughput of this system for QA is as yet unknown, but you
should check that jobs are not sitting in the PENDING queue for more
than an hour or two, and are not stalling while running (a job should
not take more than 15 minutes of CPU time). In case of problems, contact an expert.
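If you also have a shell account on rcas, a rough way to spot jobs
stuck in the PENDING queue from the command line is sketched below,
using the standard LSF "bjobs" command (the parsing of its output is a
simplification; the web monitor shows the same information):

    import subprocess

    # List autoQA jobs by name with the standard LSF "bjobs" command.
    out = subprocess.run(["bjobs", "-J", "QA_TEST"],
                         capture_output=True, text=True).stdout

    for line in out.splitlines()[1:]:          # skip the header line
        fields = line.split()
        if len(fields) > 2 and fields[2] == "PEND":
            print("still pending:", line)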