Report of the 2002 RHIC Computing Requirements Task Force

N. George, B. Gibbard, K. Hagel, B. Holzman, J. Lauret, D. Morrison, J. Porter, T. Throwe, J. Velkovska, F. Videbaek

(June 2002)

History and Motivation:

As was done prior to the FY 2001 running of the RHIC experimental program [1],
a task force was assembled to assess the computing needs of the RHIC experiments for the FY 2003 running period (Run-3). Using the performance of the RHIC machine, the experiments and the RCF (RHIC Computing Facility) during the FY 2001/02 running period (Run-2), and the expected performance of the RHIC machine and the experiments in the upcoming running period, the representatives of the experiments produced two sets of estimates of their needs. The first was based solely on the expected data rates to each of the experiments with the machine at its expected performance, together with an extrapolation of the experiments' current reconstruction and analysis programs. The second adjusted the requirements to the constraints imposed by the FY 2002 budget and the expected FY 2003 budget.

Traditionally, hardware purchases for the RCF have been made late in the fiscal year in order to maximize the resources obtained with the funds. In keeping with this tradition, the FY 2002 purchases are being made in the summer of 2002. However, the long shutdown of the RHIC machine that arose from combining the FY 2001 and FY 2002 runs, together with the need to accommodate upgrades to the experiments, pushed the start of the next RHIC run into early FY 2003, and it became clear that the FY 2003 purchases could not be put off until the summer of 2003. In order to obtain sufficient resources for the reconstruction and analysis of the data produced in Run-3, it was determined that the FY 2003 purchases would have to be made immediately at the beginning of FY 2003.

The estimates of needs, and of the cost to obtain the resources necessary to accommodate those needs, therefore reflect the time scale of the purchases. The main impact on the estimates is that the same hardware parameters are used for both purchases, rather than relying on Moore’s Law to improve the resources obtained with the FY 2003 purchase.

Current Estimates of Requirements:

The estimates made by the experiments are summarized in the following tables. The first table (Table I) contains the estimates of the CPU, disk and bandwidth needed to reconstruct and analyze the expected data set from a 25-week RHIC Au-Au run, assuming a RHIC duty factor of 0.6.

Table I

The second table (Table II) contains the estimates constrained to the present RCF budget.

Table II

The experiments went from the “unconstrained” scenario to the “constrained” scenario through a variety of methods, ranging from reducing the event rate, to increasing the time period over which the data will be reconstructed and analyzed, to changing the fundamental model of how data is stored by moving from more expensive centralized RAID storage to less expensive distributed disk storage. None of the changes made by the experiments to reduce their needs is desirable, since they all increase the work needed to extract the physics from the experiments and reduce the physics impact of the data.

The third table (Table III) shows the percentage of the unconstrained estimates achieved under the constraints of the RCF budget.

Table III

The current RCF capacities are listed by experiment in Table IV.

Table IV

The above unconstrained and constrained scenarios were produced by the experiments over the course of the meetings of the task force. BRAHMS estimated their needs for a 25-week Au + Au run and, based on their experience with the reconstruction and analysis of the Run-2 data, determined that their greatest need was in the area of centralized disk space, since their present model is based on keeping a significant amount of data on disk. BRAHMS therefore allocates the majority of its funds to disk purchases, with a modest addition to their compute farms.

Using an event rate of 300 Hz, based on the performance of their DAQ system during Run-2, PHENIX estimates that they would accumulate approximately 15 times the data taken in Run-2. Since the budgets
will not allow the purchase of the resources necessary to
reconstruct and analyze this volume of data, PHENIX
has looked to improving their rare event trigger to reduce the
overall event rate, and to reducing the number of reconstruction
passes through the data to 1.2. A significant amount of manpower
will need to be committed to the improvement of calibration
procedures, online monitoring and the content of the micro- and
nano-DSTs in order to achieve the above reductions. Overall, PHENIX
feels that the funding for FY 2002 and FY 2003 falls short of their
needs by a factor of 2 to 3.

PHOBOS
is implementing two significant upgrades to their experiment to
improve the data taking rate and trigger efficiency and therefore
expects to take two orders of magnitude more data during Run-3 than
they took in Run-2. Based on the current usage of their RCF
resources, PHOBOS estimates that the funding for FY
2002 and FY 2003 will leave them with an order of magnitude fewer
resources than are required for the amount of data they expect to
accumulate. PHOBOS plans to record as much of the data from Run-3
as is possible and looks to improvements in their analysis code and
stretching out the analysis time to resolve the gap in resources.
Since it is not clear how large an improvement can be made in the code, and since increasing the analysis time-scale will result in a decline in physics productivity, PHOBOS feels that, if PHOBOS and the RHIC machine perform as expected during the upcoming run, there will be a clear need for additional computing resources in the latter half of FY 2003.

STAR is
implementing new detectors and improvements to their DAQ system (a
five-fold increase in their data rate and buffering of data in the
counting house for later transfer during RHIC downtimes) prior to
the Run-3 running period. As a result of these improvements, STAR
expects the volume of their DST and micro-DST data to exceed the
available resources from the FY 2002 and FY 2003 funding by a factor
of 10 and that, even with an expected 40% reduction in
reconstruction time, the processing time of the data would span on the order of 3 years. To compensate for this resource mismatch, STAR plans to move to a distributed disk model to reduce the gap in storage and to stretch their data production period out to 37 weeks. This approach is considered risky in that it relies on infrastructure for using the distributed disk that does not yet exist. STAR also feels that reaching their physics goals in a timely fashion will require additional funding in the latter half of FY 2003.

Constrained Scenario:

Table V through Table VIII show the breakdown, by experiment, of the cost and capacity by fiscal year with the budget constrained to the current FY 2002 funding and the expected funding for FY 2003.

Table V

Table VI

Table VII

Table VIII

The total capacity of the RCF in FY 2003, given the present funding, is indicated in Table IX.

Table IX

HPSS Tape I/O:

Funding has been set aside to upgrade the mass storage system to support the increased data rates anticipated by the experiments for Run-3. The major item in this upgrade, in terms of cost and criticality, is the replacement of the currently used StorageTek 9940 tape drives with next-generation 9940b tape drives. Two additional HPSS server nodes will also be purchased. The new tape drives will use the
same media but will have approximately three times the I/O rate and
three times the density of data on tape. The new capacity numbers
are expected to be 30 MBytes/sec uncompressed streaming per tape
drive and 200 GBytes per tape cartridge in native (uncompressed)
mode. In typical RCF operations the actual I/O rate of tape drives
is generally observed to be about 70% of the theoretical
performance. As a result of data compression, the volume of data
stored on a tape cartridge typically exceeds the native capacity by
a modest amount. There are 34 tape drives that will be upgraded to this new technology. The new type of tape drive is currently in β-test at multiple sites and is expected to be generally available in August. It should be pointed out that, while this appears unlikely, if the new drives were not available for Run-3 it would present a major problem for RHIC data handling. Meeting the tape I/O requirements would then require RCF to make a more costly purchase of older, lower-speed tape drives and of additional tape storage silos to house the data. It would also require that the experiments purchase three times as many tapes.
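
As a rough illustration (not a figure from the facility plan), the aggregate tape bandwidth implied by these numbers can be sketched as follows; the 70% derating is the typically observed fraction quoted above, and the cartridge count is shown only to indicate the scale of the media savings.

# Back-of-the-envelope aggregate tape bandwidth after the 9940b upgrade,
# using the figures quoted above. The 0.70 factor is the typically observed
# fraction of theoretical drive performance.

n_drives          = 34
native_rate_mb_s  = 30      # MB/s uncompressed streaming per upgraded drive
observed_fraction = 0.70    # typical fraction of theoretical rate seen in RCF operations
cartridge_gb      = 200     # native capacity per cartridge after the upgrade

aggregate_mb_s = n_drives * native_rate_mb_s * observed_fraction
print(f"aggregate tape I/O ~{aggregate_mb_s:.0f} MB/s")              # ~714 MB/s

# The older drives hold roughly one third as much per cartridge, which is
# why the experiments would need about three times as many tapes without
# the upgrade.
print(f"cartridges per 100 TB (new drives): ~{100e12 / (cartridge_gb * 1e9):.0f}")   # ~500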

Summary and Conclusions:

All of the experiments, through upgrades and/or trigger rate improvements and a desire to make up for the lower than expected data rates during Run-2, have increased the aggregate data rate expected during Run-3 to 230 MBytes/sec, well beyond the rate of 90 MBytes/sec predicted for this point in the life of the experiments prior to the turn-on of RHIC. The direct result of this high data rate is a gap between the resources available in the RCF and the resources needed by the experiments to reach their physics goals. The resource gap varies among the experiments from perhaps a factor of two to an order of magnitude, and the experiments have addressed this shortfall in a variety of ways, from reducing the trigger rate, to extending the time period for analysis, to changing their analysis model. Each of these changes from an “unconstrained” data collection and analysis scenario has an impact on the experiments’ ability to reach their physics goals, and the actual impact will depend on the performance of the RHIC machine, the experiments and the RCF during the run.

With the possible exception of BRAHMS, each of the RHIC experiments feels that it will be underpowered in CPU and lacking in disk space to varying degrees for Run-3 given the present funding of the RCF. But since the estimates of needs by the experiments were based on the upper limits of the performance of both the RHIC machine and the experimental systems, and since the actual performance of either cannot be predicted at this time (especially as the species mix to be delivered during the run has not yet been determined), it is too early to know exactly how underpowered the RCF is likely to be. Given the present funding, the RCF will be sized for a certain amount of data, and, as Run-3 proceeds, the experiments and the RCF will be better able to assess the rate of data collection against the available resources in the RCF. As the ratio of collected to expected data volume approaches the ratios expressed in Table III, a decision would be made as to whether or not to request additional funds to make up for the shortfalls the experiments see in the resources at the RCF.

The efforts by the experiments to reduce event sizes, improve triggers to lower event rates, and improve code efficiency in order to produce physics output within the present funding scenario cannot continue indefinitely. If the RHIC machine continues to improve at the rate predicted, the experiments will not be able to keep up with the increased data flow without increases in hardware funding, since they feel they are reaching the limits of the above improvements. In addition, since all of the experiments have manpower limitations, the effort going into these improvements takes away from the effort going into producing the physics output of the experiments.

There is also concern that the current level of funding will result in an RCF that is stretched to its limits in both manpower and resources. We feel that the facility should be able to function at the levels indicated by the “constrained” scenarios, but with little, if any, room to spare; an unexpected failure could have a major impact on the productivity of the facility.

Appendix: Contributions from the Experiments

BRAHMS once again undertook a review of its computing requirements based on the data collected in the 2001 run. In the 2001 run we stored data at an average rate of 5.5 MB/sec over the stores in which data were collected. In the next run, assuming efficiency factors of 0.6 for the accelerator and 0.95 for the experiment, we would obtain an average data rate of 3.1 MB/sec, which would enable us to collect about 48 TB over the course of a 25-week Au + Au run.

The size of the DST data sets generated in the BRAHMS analysis of the data from the past year was approximately 34% of the raw data volume, which would lead to about 16 TB. In our current scheme of data selection, the size of the micro-DSTs (uDSTs) is 10% of the DST data. To be able to work on 3 different analyses concurrently would require a volume of 5 TB.

In part because the available I/O bandwidth from HPSS does not match well with the CPU requirements, the present analysis model is based on keeping a significant amount of the data (in particular the DSTs) on disk for repeated calibration and analysis (multiple times per dataset).

The total disk capacity is calculated by assuming 3.5% of the raw data resident on disk, 15% of the DST data resident on disk, and the full volume of the uDST and nDST (10% of uDST) data resident on disk. This is 48*0.035 + 16*0.15 + 5 + 0.5 = 1.7 + 2.4 + 5 + 0.5 = 9.6 TB. We currently have 5.5 TB of available disk space.
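
The arithmetic behind these volumes can be reproduced with a short sketch; deriving the raw volume from the Run-2 store rate and the quoted efficiencies is an assumption about how the 48 TB figure was obtained, while the fractions are those quoted above.

# Sketch of the BRAHMS data-volume and disk-budget arithmetic above. The
# live-time derivation of the raw volume is an assumption; the fractions
# (34%, 10%, 3.5%, 15%) and efficiencies are the ones quoted in the text.

SEC_PER_WEEK = 7 * 24 * 3600

store_rate = 5.5                       # MB/s while taking data (Run-2 experience)
avg_rate   = store_rate * 0.60 * 0.95  # ~3.1 MB/s after accelerator/experiment efficiencies
raw_tb     = avg_rate * 25 * SEC_PER_WEEK / 1e6   # ~47 TB (text rounds to 48 TB)

dst_tb  = 0.34 * raw_tb                # DSTs ~34% of raw (~16 TB)
udst_tb = 3 * 0.10 * dst_tb            # uDST = 10% of DST, 3 concurrent analyses (~5 TB)
ndst_tb = 0.10 * udst_tb               # nDST = 10% of uDST (~0.5 TB)

disk_tb = 0.035 * raw_tb + 0.15 * dst_tb + udst_tb + ndst_tb
print(f"disk needed ~{disk_tb:.1f} TB vs 5.5 TB currently available")
# With the text's rounded inputs (48, 16, 5, 0.5 TB) this is the 9.6 TB quoted above.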

We have also evaluated the CPU estimates for the upcoming run based on what is necessary to reconstruct and analyze the current data from the 2001 run. We learned that we require about 5 SPECint95-sec/event in our reconstruction pass. We have also learned that we need an average of 2 reconstruction passes in order to properly calibrate and then reconstruct the data. Taking into account the efficiency of the farm, we require about 1800 SPECint95 in order to completely reconstruct the data in 16 weeks.
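
For illustration, one possible route to the ~1800 SPECint95 figure is sketched below; the average event size and the farm efficiency used here are assumptions, while the per-event cost and number of passes are the values quoted above.

# Sketch of the reconstruction-CPU estimate above. The ~30 kB average event
# size and the 90% farm efficiency are assumptions made for illustration;
# the 5 SPECint95-sec/event and the 2 passes are the figures quoted in the text.

SEC_PER_WEEK = 7 * 24 * 3600

raw_tb        = 48          # expected Run-3 raw data volume
event_size_kb = 30          # assumed average BRAHMS event size
n_events      = raw_tb * 1e12 / (event_size_kb * 1e3)

cost_per_event = 5 * 2      # 5 SPECint95-sec/event, average of 2 passes
wall_time_sec  = 16 * SEC_PER_WEEK
farm_eff       = 0.9        # assumed CRS farm efficiency

required_specint95 = n_events * cost_per_event / (wall_time_sec * farm_eff)
print(round(required_specint95))   # ~1800 SPECint95 to reconstruct the data in 16 weeks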

If we examine the table, we note that we already have more than 1800 SPECint95 of processing power in the CRS farm. That would indicate that the CPU resources are adequate in the current scheme. We have also estimated that the CAS farm needs 1.5 times the processing power of the CRS farm. That would indicate 2700 SPECint95, as opposed to the 1738 SPECint95 currently in the farm.

We note that we have a deficit of about 5 TB in required disk space. It is clear, therefore, that disk space is where we should concentrate the available funds. We choose, however, to devote some funds to “small” upgrades of the CPU power in order to ride the technology curve, since a small investment can make a large difference in the ease of analysis given the more powerful processors that can be purchased later.

The preceding philosophy guides the following division of purchases from FY02 and FY03 funds. For BRAHMS, there will be ~$72k and ~$118k available for FY02 and FY03, respectively. We plan to purchase 1 CRS machine and 2 CAS machines in FY02, and 3 CRS machines and 3 CAS machines in FY03. This will bring the total CRS processor power to 3059 SPECint95 and the total CAS processor power to 2467 SPECint95.

The rest of the funds would be spent on disk upgrades. The plan is to purchase 3.2 TB of disk space with FY02 funds and 5.1 TB with FY03 funds. This will bring the total amount of disk space available to BRAHMS to 13.8 TB, a comfortable margin that will allow an efficient analysis of the data.

PHENIX Computing Requirements for FY02 and FY03

The estimate presented here is based upon a 25-week Au-Au run at full RHIC luminosity with
60% RHIC duty factor and 95% PHENIX duty factor. Other running
scenarios were also considered and the overall uncertainty in the
presented requirements numbers was estimated to be within a factor
of 2. During RUN2, the PHENIX DAQ system was demonstrated to be capable of recording data at 500 Hz; however, a more realistic sustained rate of 300 Hz is used for this estimate. Recording data at this rate will result in approximately 776 TB of raw data. This is a
factor of ~15 more data than PHENIX has accumulated during RUN2.
Clearly the combined FY02 and FY03 computing funds do not allow such
volumes of data to be handled in the same way as we have done in the
past. Since PHENIX is a rare-event experiment, we can also look into collecting more triggered data and reducing the overall event rate without significantly affecting the PHENIX physics program. The constrained estimate presented here uses a 200 Hz event rate as an input parameter. The resulting raw data volume in the constrained
case is ~517 TB – a factor of 11 more data than the RUN2
value. Being able to collect data at such rates is contingent upon
the integration into the system of the STK 9940b
high-density/high-speed tape drives (3 in FY02 and 3 in FY03).
Without these tape drives we will be severely limited not only by
the speed at which data can be recorded and read out, but also by
the prohibitive price tag on the tape storage that PHENIX has to buy
from its operation funds.
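
The two raw-data volumes quoted above are mutually consistent with an average event size of roughly 300 kB; the short check below treats that event size and the live-time estimate as assumptions rather than PHENIX-supplied numbers.

# Rough consistency check of the PHENIX raw-data volumes quoted above.
# The live-time formula and the ~300 kB average event size are inferences
# (assumptions), not numbers stated in the text.

SEC_PER_WEEK = 7 * 24 * 3600
live_sec = 25 * SEC_PER_WEEK * 0.60 * 0.95    # 25-week run, RHIC and PHENIX duty factors

def raw_volume_tb(event_rate_hz, event_size_kb=300):
    """Raw data volume in TB for a given sustained event rate."""
    return event_rate_hz * live_sec * event_size_kb * 1e3 / 1e12

print(raw_volume_tb(300))                      # ~776 TB (unconstrained, 300 Hz)
print(raw_volume_tb(200))                      # ~517 TB (constrained, 200 Hz)
print(raw_volume_tb(300) / raw_volume_tb(200)) # factor of 1.5 between the scenarios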

To be able to handle
hundreds of terabytes of data, we need to reduce the number of full
reconstruction passes through the data. Our current experience shows
that at least 1.5 passes through the data are necessary to assure
high quality data analyses. However, to fit into the available
computing funds, we need to restrict ourselves to 1.2 reconstruction
passes. This reduction will require significant improvement of
calibration procedures and online data monitoring.

The expected large volume of data also significantly increases the need for disk space. Currently, PHENIX is able to store 100% of the microDSTs and nanoDSTs on disk. For RUN3 only 30% of the microDSTs can be kept on disk. This constraint limits the speed with which the data can be analyzed. A possible solution is a revision of the DST, microDST and nanoDST formats, optimizing their content and minimizing their size. A few analyses need the full information stored in the DSTs; the limited disk space resources will inevitably affect these.

In conclusion, the RCF funding for FY02 and FY03 may be a factor of 2 or 3 short of the PHENIX needs, depending on the amount of data that we record during RUN3. Within the current constrained scenario we will be able to meet our needs only after committing significant manpower to the improvement of calibration procedures, online monitoring, and micro- and nano-DST content.

PHOBOS Computing Requirements Proposal: 8th May 02

Two significant upgrades affecting the data-taking rate will be implemented for the PHOBOS detector for the upcoming run in FY03. The first is an upgrade to the PHOBOS DAQ allowing data to be taken at a rate of 500 Hz (50 MB/sec). The second is the addition of a spectrometer trigger enabling PHOBOS to trigger on single particles in pp and dA collisions. These upgrades are intended in part to compensate for the reduction in RHIC’s overall running time and for the emphasis on low-multiplicity species scans for systematic studies.

Combining the funding from FY02 and FY03, PHOBOS has calculated that this funding level will undershoot the projected requirements by about an order of magnitude. This is a reasonable estimate, given that our current farm is ~100% loaded and that we expect to take two orders of magnitude more data in the upcoming run. The dominant factor is the increase in CPU resources needed. To resolve the funding gap, PHOBOS has proposed two alternative extreme constrained scenarios. The first is to increase the analysis time by an order of magnitude (1/2 year -> ~4 years), assuming no additional optimization can be achieved. The second is to optimize the code to run effectively an order of magnitude faster. In reality neither of these scenarios is guaranteed to work, and some combination of the two will result in a smaller but still significant shortfall in the needed computing resources. Given last year’s experience, in which RHIC’s net luminosity was over-predicted by about a factor of 4, PHOBOS feels that, although there is clearly a need for additional computing resources above the current FY02+FY03 levels, it is premature at this time to request additional funds until we have a better sense of the expected data volume and processing time; this should be around May of FY03. We feel that, with the currently funded resources, we can take all of the raw data RHIC can deliver. This is contingent upon the functionality of the new tape drives, which offer 3 times the current rate and capacity. The full analysis of the data taken will then have to be delayed until sufficient resources become available. Large increases in analysis time-scales bring problems such as a decline in physics productivity, which prevents the achievement of stated goals; in addition, they can affect funding and student availability. An additional concern remains regarding RCF’s ability to maintain efficient operation while running continuously at its very limits.
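
The trade-off between stretching the analysis period and speeding up the code can be expressed as a simple scaling relation; the sketch below is illustrative only, and the assumed factor-of-ten CPU shortfall is taken from the order-of-magnitude statement above rather than from a PHOBOS planning document.

# Illustrative scaling of the PHOBOS analysis wall-clock time (an assumed
# model, not an official PHOBOS estimate): the time scales with the data
# volume and inversely with the CPU capacity and any code speed-up.

def analysis_time_years(baseline_years, data_factor, cpu_factor=1.0, code_speedup=1.0):
    return baseline_years * data_factor / (cpu_factor * code_speedup)

baseline    = 0.5    # ~1/2 year to analyze the Run-2 sample on the current farm
data_factor = 100    # two orders of magnitude more data expected in Run-3
cpu_factor  = 10     # assumed: the constrained purchases cover ~1/10 of the need

# Scenario 1: no code optimization -> the analysis stretches to several years.
print(analysis_time_years(baseline, data_factor, cpu_factor))        # ~5 years
# Scenario 2: an order-of-magnitude code speed-up restores the original time scale.
print(analysis_time_years(baseline, data_factor, cpu_factor, 10.0))  # ~0.5 years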

STAR Computing Requirements Proposal FY02/FY03

During the Year2 RHIC run, the STAR experiment accumulated over 10 M AuAu events and 25 M pp events. Although this data sample is a factor of 5 larger than in Year1, it was still much smaller than we had predicted due to the reduced RHIC physics running time, the rather poor RHIC duty factor, and the limit on our data rate set by writing data to the RCF. As a result, several measurements in STAR's physics program, such as open charm and event characterization (elliptic flow and HBT) using rare probes, are limited by the statistics of the Year2 data samples. To minimize our dependence on RHIC's performance in the upcoming run, STAR's DAQ-100 project was initiated with the goals of a 60 MB/sec sustained data rate, a 5-fold increase in our event rate, and a 40% reduction in our reconstruction time (ideal). The DAQ-100 project is now fully operational and is expected to be used during the Year3 run.
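
The combined effect of the DAQ-100 goals on the reconstruction load can be seen with a one-line estimate; this is an illustrative calculation, not a STAR planning number.

# Implication of the DAQ-100 goals quoted above: a 5-fold event rate together
# with an ideal 40% cut in per-event reconstruction time still roughly triples
# the reconstruction CPU demand per unit of beam time.

event_rate_factor = 5.0          # 5-fold increase in event rate
reco_time_factor  = 1.0 - 0.40   # 40% reduction in per-event reconstruction time (ideal)

cpu_demand_factor = event_rate_factor * reco_time_factor
print(cpu_demand_factor)         # 3.0x the Year2 reconstruction load per unit of beam time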

The calculations presented here include the effects of the DAQ-100 project as well as the effects of new detectors coming on line during the next year. We stress that the increased data volumes projected here from the DAQ-100 project are an attempt to recover the full physics program not yet realized due to RHIC's past performance and to protect our program from future limitations of RHIC operations. We have made three scenario estimates, taking into account the merging of the FY02 and FY03 funds; only one is proposed as an achievable goal.

The DAQ-100 scenario extends the data deployment model used for the current Year2 datasets to the analysis of the FY03 data. This scenario keeps 100% of the μDSTs and 25% of the DSTs on centralized disk and represents a known (safe) model for delivering data to physics analysis codes. The result shows a 280 TB disk space deficit. This deficit could be reduced to about 160 TB by removing the disk residency of the DST data; however, even with this reduction, we are 2 million dollars short of our budget requirements and a factor of 10 off our storage needs.

The second scenario (Constrained 1) reduces the number of reconstruction passes, increases the raw-to-DST compression level, and assumes 50% (5%) of our μDSTs (DSTs) on central storage. It does not allow us to recover from the shortfalls unless the processing time is spread over 3 years (at the expense of our physics goals), running mainly on our existing resources. This is clearly not viable, since our user farm is already congested and in need of more CPU power, quite apart from the consequences of such a long turnaround for data analyses.

Finally, the Constrained 2, or "challenge," scenario relies on a riskier distributed disk model to cover our disk space requirements without leaving us short of processing power. The approach balances our CPU and disk space needs within a 37-week production period (an assumed 80% RCF farm uptime was also folded in). It relies entirely on STAR infrastructure components that do not yet exist and on a robust, heavily accessed HPSS. However, it appears to be our only alternative given the current budget restrictions.

A factor of 4 reduction in RHIC efficiency relative to the assumptions in these calculations (as occurred last year) would not translate into a factor of 4 reduction in our numbers: DAQ-100 allows us to buffer more events for later transfer during RHIC downtimes, so the net effect would be at worst a factor of 2 reduction in the data volumes and CPU resource requirements presented here. We therefore feel that, to reach our physics objectives in a reasonable time frame, there is a need for extended funding. However, we acknowledge that these numbers depend on factors beyond our control, such as RHIC and RCF performance. Finally, we would like to mention, once again, that HPSS reliability will become an even more critical requirement than ever before, since the success of our distributed disk model depends on it.

[1] M. Baker, et al., Report of the 2001 RHIC Computing Requirements Task Force, April 12, 2001.