Background and Approach
Dear Colleagues,

Last week, John Harris asked you to serve on the STAR Computing Requirements Task Force (henceforth known as the Force) to reassess the computing needs of STAR, on about a three month time scale. Some of you have responded positively, none of you has responded negatively, and I conclude that the remainder are either on vacation or wish they were, but in any case are willing to be with the Force. John asked me to chair this effort and I agreed to do so. Some of you have already inquired how we will proceed, and though all the pieces are not yet in place, I thought it would be a good idea to tell you how I think we should approach this problem, so that we can discuss it, arrive at an overall strategy, and get started.

----------------------------------------------------------------------

Background
==========

The need for a new look at this issue arose from the DOE review of the RHIC Computing Facility (RCF) at BNL in July (I was also a member of that committee). In early 1996 a RHIC-wide committee of experimentalists, chaired by Shiva Kumar and with several members of STAR participating, produced the ROCOCO-2 report on the overall requirements for RHIC computing. This report has until now served very usefully as the basis for the design of the RCF. However, the July review committee noted that ROCOCO-2 is not a detailed enough basis for further design of the RCF. The committee recommended that "the experimental computing requirements need more study, especially those requirements that have profound effects on the RCF system architecture and on the size and scope of the remote facilities". I do not want to retype all of the recommendations of the committee here (the final report will be made available to you shortly). Generally speaking, however, two of the most important design issues that need to be addressed very soon are:

(i) Central Analysis Server (CAS): what is the mix of the physics analysis to be performed? Primarily I/O intensive ("data mining") or primarily cpu intensive? This will drive the choice between a farm of commodity (i.e. cheap) Intel-based processors (cpu-intensive tasks) and an array of SMP machines (Symmetric MultiProcessor: more expensive, high I/O). What bandwidth is needed between the CAS and the Mass Data Store (MDS)? What volume of data is needed for physics analysis on disk or on tape in the MDS, as opposed to on the shelf? Note that there are actually two questions here: the general character of the analysis, which drives the design (which is scalable), and the data volumes and rates, which drive the actual capacity.

(ii) Offsite needs (I will address only the STAR issue here): we have stated that our computing needs will not be met by the baseline RCF and have requested additional funds from the DOE to establish a STAR simulations facility at NERSC. Note that there are again two questions here: will the RCF be adequate for STAR to do all its physics, and if not, where is the best place to site the additional computing capacity? We cannot answer the second question, but it is not relevant until the first has been fully addressed and documented, in detail that goes significantly beyond ROCOCO-2.

Note also that the committee felt that the Central Reconstruction Server (CRS) is well designed for its task, namely the production of DSTs from raw data. While the committee did recommend that the CRS design (and its network to the MDS) be scalable, so that it can expand to process the raw data more than once, it did not see any essential open design issues that depend upon input from the experiments right now. In other words, how much cpu time it takes to produce a STAR DST-level event from raw data, which was a main focus of the STAR estimates in ROCOCO-2 and which is crucial to our overall throughput, is not the most important design issue for the RCF right now.
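Returning for a moment to the CAS question in (i): one crude way to frame "data mining vs. cpu intensive" is to compare, for a given analysis task, the time spent fetching an event with the time spent computing on it. A toy sketch of that comparison, with entirely hypothetical numbers (none of these are measured STAR or RCF figures):

```python
# Toy characterization of an analysis task as I/O-limited or CPU-limited.
# All figures are hypothetical illustrations, not measured STAR numbers.

def character(mb_per_event, cpu_sec_per_event, io_mb_per_sec):
    """Compare the time to read one event with the time to process it."""
    io_time = mb_per_event / io_mb_per_sec  # seconds to fetch one event
    if io_time > cpu_sec_per_event:
        return "I/O-limited (data mining)"
    return "CPU-limited"

# A data-mining pass: reads a large DST event, does little computation.
print(character(mb_per_event=1.0, cpu_sec_per_event=0.01, io_mb_per_sec=10.0))

# A compute-heavy task: tiny input, long per-event processing.
print(character(mb_per_event=0.01, cpu_sec_per_event=10.0, io_mb_per_sec=10.0))
```

The point is only that the answer flips depending on per-event size, per-event cpu time, and the CAS-MDS bandwidth, which is why all three need to be estimated per analysis project.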
Rather, we must now focus in some detail on how we will actually do our physics analysis, going from DST to muDST to final physics results. (We must of course say something about the cpu and network needs for the CRS, but I think that experience with the current STAR chain, folded together with experience from currently running experiments, gives us sufficient data on this point. I do not propose any targeted new programming efforts specifically for the purpose of the current round of estimates (i.e. before Nov. 1), but opinions are welcome on this point, as on all others.)

Charge to the Task Force
========================

John's charge to us is quite comprehensive: develop computing requirements (cpu; storage on disk, tape and shelf; network bandwidths) for STAR physics for year 1 and future years, covering the full spectrum of the STAR program. This includes event reconstruction, DST and muDST analysis, and the needed levels of simulation, with special attention paid to those tasks that may require unusually high resources of one type or another. It is clear that we cannot give the ultimate answers to these questions in the coming months, and that this will be an ongoing process for at least the next few years. But we should aim to give the best possible quantitative estimates, concentrating on the areas where they are urgently needed now for RCF design decisions.

Proposed Approach
=================

Given the above background and John's charge to the Force (this metaphor is satisfyingly physical: charges and forces. It has a lot of potential...), which results directly from the recommendations of the July review committee, I propose the following procedure:

(i) Assemble a committee of STAR physicists having practical expertise in each of the general STAR physics areas. That's you. I wanted a committee of modest size, so some experts were necessarily left out. Before we are done, everyone will have a chance to give significant input, and I do not intend to exclude anyone.
Once we have a first draft we will circulate it to a larger circle of wise women and men, but if you know of anyone who wants to be, or should be, in at this level, please tell me quietly.

(ii) Clearly separate the issues of the character of analysis projects from the overall scale of cpu needs, data volume and network bandwidth. The RCF design itself separates these issues by making many elements scalable, so that if future physics requirements demand more of a certain resource, the cost to supply it is only incremental. For the purpose of our present estimates, I propose that each topical expert choose a very small number of specific analysis projects from her/his area (preferably two projects, three or four at most): one or two "garden variety" topics and one or two that are much more difficult and demanding of resources. Analyse these in great detail: number of events needed, major cpu-consuming tasks, number of expected analysis iterations of each type, data volume, all simulations needed, etc. You should use your years of accumulated experience and some WAGs (the converse of experience) to make these estimates, but each person has an as yet unspecified, but certainly limited, number of WAGs at her/his disposal.

(iii) Assumptions about running conditions: there is a bewildering variety of possibilities (Au-Au at 200 GeV or lower energy, lighter symmetric nuclear systems at top or lower energy, pA, pp), and we must choose a subset to give ourselves some focus. There is also the question of how these are mixed in an actual calendar year. For the moment I propose that we bypass this issue and discuss physics signals only for the following limited subset: (a) Au-Au at sqrt(s)=200 GeV, (b) p-Au at sqrt(s)=200 GeV, and (c) polarized pp at sqrt(s)=450 GeV. Let's take the unit of discussion to be a full year's data at one running condition, i.e. 10**7 events of a given collision system (some topics don't need this much, of course).
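To make the scale of this unit concrete, here is the kind of back-of-envelope arithmetic each topical group will need to do, sketched in Python. Every per-event size, per-event time and iteration count below is a hypothetical placeholder, not a STAR estimate; supplying the real numbers is precisely the exercise:

```python
# Back-of-envelope resource estimate for one running condition.
# All per-event figures are hypothetical placeholders, not STAR estimates.

EVENTS = 10**7  # one "unit": a full year's data at one running condition

dst_kb_per_event = 100.0   # hypothetical DST size (kB/event)
mudst_kb_per_event = 10.0  # hypothetical muDST size (kB/event)
cpu_sec_per_pass = 1.0     # hypothetical cpu time per event per analysis pass
passes = 5                 # hypothetical number of analysis iterations

dst_tb = EVENTS * dst_kb_per_event / 1e9     # kB -> TB (10^9 kB per TB)
mudst_tb = EVENTS * mudst_kb_per_event / 1e9
cpu_years = EVENTS * cpu_sec_per_pass * passes / (3600 * 24 * 365)

print(f"DST volume:   {dst_tb:.1f} TB")
print(f"muDST volume: {mudst_tb:.2f} TB")
print(f"Analysis cpu: {cpu_years:.1f} cpu-years on one reference processor")
```

Even with made-up inputs, this shows how quickly 10**7 events turns into terabytes in the MDS and cpu-years on the CAS, and why the iteration count matters as much as the per-event time.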
We can scale appropriately later, based upon input from the runtime committee, but I think there is virtue in simply getting started and writing this stuff down. What are these events (i.e. what triggers)? You tell us: assume what you need to assume, and be explicit (centrality based upon E_T, or all single jets above p_T = 10 GeV in |eta| < 0.5, or all photons above 3 GeV, or whatever). I'm not sure this specific procedure is a good one, and it's certainly something we need to decide upon soon.

(iv) Using (ii) under the assumptions of (iii), each topical group tries to work out how to scale from the specific individual topics in (ii) to the full STAR collaboration, in order to estimate the magnitude of the resources needed.

(v) Each physics topical group of one or two people begins to work now, with no further discussion among us of a common method for making an estimate of resource needs, because a method that works for hyperon physics may be inappropriate for event-by-event or dilepton analysis. The first draft from each topical group is due to me by September 15.

(vi) I try to make some order out of the mess, and then we have a free-for-all for a couple of weeks, trying to poke holes in each other's estimates. We bring it all together again, and this is our preliminary report to John on Oct. 1.

(vii) The preliminary report goes out to a larger group of wise men and women within STAR, and possibly outside of STAR, or maybe to the whole STAR collaboration, with comments due back by Oct. 15.

(viii) Two more weeks to assimilate comments, and we produce our final report by Nov. 1.

Comments
========

We will have no meetings that require travel or picking up the phone. I will try to do everything via email and the web, allowing all of us to work asynchronously. There is no alternative for me personally, since I will be at CERN from next week until the middle of November. I have asked Torre to set up a web page devoted to this task force.
It should contain links to relevant documents as well as an email archive (and there should be an email listserver), and places for drafts (holes for drafts?). Everything should be open to the collaboration. Important documents which you should look at very soon are ROCOCO-2, something on the current RCF design (transparencies from the RCF review?), the DOE RCF Review Committee Final Report, and other resource documents and revisions to ROCOCO-2 (Lanny wrote a revision to it a couple of months ago in preparation for the DOE review). Lanny and others should add to this list - please excuse my limited knowledge of the resources available. I have also asked Torre to locate, if possible, documents from e.g. BaBar to illustrate how they approached this problem in general terms (I don't know and am eager to learn). Torre should send around mail when this is ready (or tell me that I need to do something to set it up!).

Interaction with the RCF: Torre is the main contact person on this matter, and I would like him to write to us his view of the prioritized list of questions that STAR needs to answer for the RCF, and how PHENIX is going about the same process.

Units: I personally prefer GFlop-sec to SPECint95, but for no good reason other than familiarity. I propose that, especially for raw->DST production and for simulation production of a specified flavour of simulation, you simply leave this for now as a number of events. I think that Lanny is the main resource person for what is known in terms of processing times for various tasks. But as has been discussed a number of times, some of this needs to be revisited too, in light of experience with current experiments. I gave my view on the simulations issue at the collab. meeting, so I will now leave it to someone else to speak up on this matter.

This process will be "politics-free", i.e. there will be no requirement to respect previous estimates or to try to match STAR's requests for resources to those of PHENIX.
In this world of limited resources, it is imperative that we make a good estimate of what we really need to do physics.

Summary
=======

Please reply with your comments on this proposal to the entire list of recipients of this mail. Think about how to go about making estimates for your physics topic, and figure out whether the general approach makes sense (I am especially worried about event-by-event physics here). Please comment on item (iii) (assumptions about running conditions), if nothing else. In the meantime, the web page should be set up and the various reference documents made available. If I don't hear from you within a couple of weeks I will contact you personally, and if there is no large change to the above proposal I expect the first drafts by September 15.

May the Force be with you.

Regards,
Peter

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Peter Jacobs, MS 50A-1148,
Lawrence Berkeley National Laboratory, Berkeley, CA 94720
Tel. (510)486-5413   Fax (510)486-4818
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^