
The STAR Offline Computing Challenge

 
(With contributions from Craig Tull)


Scale of Computing Problem for STAR


The expected scale of the computing tasks for the RHIC experiments is unprecedented in nuclear and high energy physics, rivaling that of projects in the military, global weather and environmental surveys, and huge financial corporations. For STAR we face a number of daunting computing problems.

As is typical at this early stage in the lifecycle of a large physics collaboration, almost all of the existing analysis software is presently undergoing intense development. Although much of the reconstruction software will approach an asymptote of finality early in the production stage of analysis (i.e. when a steady state is achieved between data acquisition and data analysis rates), much of the physics software will continue to change significantly over the entire duration of the experiment and data analysis. During this same period, the physicists in charge of various aspects of the physics analysis software will also continuously change as graduate students, post-doctoral researchers, and senior scientists join and leave the collaboration or shift their analysis and physics focus.

This fluid nature of analysis software is fundamental to a dynamic physics analysis process. However, without careful planning a great deal of programming overhead can be required to maintain a complex analysis system for a prolonged time period.



STAR's Approach to Meeting These Challenges


At the RHIC-wide level, the RCF at BNL is an $8M computing and data management facility designed to handle the data volume, CPU, bandwidth and data access needs of the RHIC project. The RCF budget is adequate only to meet the event reconstruction, data storage/retrieval and physics analysis needs of the four RHIC experiments. Computing resources for simulations must be found elsewhere by each RHIC experiment. For STAR, the best opportunities at present for this additional computing are the National Energy Research Supercomputer Center (NERSC) at LBNL and the Pittsburgh Supercomputer Center (PSC).

The SOFtware Infrastructure (SOFI) group in STAR is responsible for developing and maintaining general purpose software tools which facilitate the analysis of data and the investigation of physics issues by the collaboration. This includes the goal of developing a robust, user-friendly analysis framework which will meet the computing challenge. The SOFI group has made great progress towards this goal, but much remains to be done, and new workers are urgently needed for a number of tasks. Progress in this area has been leveraged by utilizing commercial software products and by adopting industry standards for interface definitions wherever possible.

The Simulations and Analysis Software (SAS) group in STAR is responsible for developing all the simulations, event reconstruction and calibration software for STAR. The group has developed a general, comprehensive design, and a great deal of code has been written. But, as with SOFI, much work remains, and additional people and greater effort are urgently needed.



Software Architecture and Analysis Framework


In a collaboration with many writers and users of analysis code, some mechanism for coordination of and communication between analysis software elements must exist to facilitate an efficient overall analysis process. One way of doing this is to write each analysis element as a stand-alone program executing one step in the analysis process and communicating with other analysis elements (other programs) through some well-defined file formats. The chain of analysis is then normally a batch job which executes the programs in order, cleaning up temporary files and saving results to permanent files. Though this approach has been used in many experiments, it works best when a very limited number of people are responsible for the entire analysis. This allows changes to data formats or data flow to be made with the reasonable expectation that all appropriate programs will be updated in a synchronized fashion. One drawback of this approach is that it lends itself to duplication of effort and/or code: each analysis program must handle its own I/O, memory management, etc. This drawback can be mitigated with a set of common utility functions in a central library, but the approach still does not lend itself easily to global changes in data format or in the analysis chain.

Another approach which has been successful is the use of an analysis shell. An analysis shell is a generic program which handles the system-like functions of data analysis such as data I/O, memory management, flow control, etc. without doing any real analysis of data. The actual data analysis is done by ``analysis modules'' conforming to some API (Application Programming Interface) which allows the modules to be ``plugged'' into the analysis shell in a modular fashion. The analysis shell invokes (calls) each analysis module within the analysis chain, either passing data to the module via the API, or presenting data when requested by shell functions invoked within the module. Often the analysis shell also contains other general purpose tools for investigation and analysis of data (e.g. histogramming, plotting, sorting, simple calculations).

SOFI has extended this concept of plug-in modules from the user written analysis code to the central, system-like analysis shell. By dividing the functions of the analysis shell into ``domains'' and adopting an interface standard for these system-like functions, a customizable framework for data analysis has been provided to the collaboration.

STAR begins taking data in 1999, and will continue to take data for many years thereafter. This means that, short of completely replacing the analysis system with another system part way through the experiment, the software design must have a lifespan of order 15 years. This is an incredibly long time in the dynamic world of computer software and hardware. To put this into perspective, one need only consider the state of computing in physics 15 years ago. In 1982, the hot new machine in physics was the VAX 780, FORTRAN 77 was a newer language than FORTRAN 90 is today, and your choice of color terminals was green or amber.

We conclude that it is unrealistic to expect any software system written today to survive unmodified for 15 years. Hence, any sensible design for a software system needed to last that long must incorporate, at a fundamental level, the concept of graceful retirement (i.e. replacement in a controllable manner) of any and all of its constituent components.

Consider further that STAR encompasses many distinct physics programs, each with its own unique analysis needs, and that scores, perhaps hundreds, of physicists of diverse backgrounds and skills will use and contribute to the analysis system over its lifetime; the magnitude of the challenge then begins to be appreciated. The STAR Analysis Framework (STAF, a.k.a. Standard Analysis Framework or StAF) addresses each of these challenges in a manner which SOFI members believe can be expected to succeed over the long term as well as the short.

By making the division between the framework kernel (i.e. the software bus) and the plug-in service packages (see below) clean and well defined (what we have termed vertical modularity), we provide for the graceful retirement of communication protocols and interface standards. By dividing the system-like services into autonomous packages (horizontal modularity), we allow graceful retirement of code libraries, as well as easing the burden of code maintenance.

Finally, rather than try to define all of our own interface standards, external data representations, etc., we have tried wherever possible to adopt open software standards from the computer industry and computer engineering communities. These standards are often better designed and supported than home-grown standards due to the hundreds or even thousands of man-hours devoted to their development. They are generally well documented, providing guidance for programmers attempting to collaborate over long distances. Some of these standards allow use of powerful commercial software. And even standards which don't survive over the long term often provide migration paths and/or tools to other, functionally equivalent standards.

All this means that there are many advantages to adopting a framework architecture, both from the users' perspective and from the perspective of the framework programmer.

The STAR Analysis Framework (STAF) is a major part of the software solution that has been, and is being, developed by the Software Infrastructure (SOFI) group within the STAR collaboration. STAF is a highly modular framework written (largely) in C++ and designed around a CORBA-compliant software bus package (CORBA is an industry-adopted standard). STAF provides a CORBA-compliant encapsulation of data analysis algorithms, which may be written in FORTRAN-77, C, or C++; this allows the seamless integration of physics software components and system-like software components, controlled at run time by a high-level scripting language and/or Graphical User Interfaces. The first production release of STAF to the STAR collaboration occurred in June of 1996. Essentially all of the STAR simulation and analysis code has been converted to run in STAF.



Simulations and Analysis Software


The purpose of the SAS group is to develop, test, use and maintain the offline software which does the following:

The essential functionality of this body of software includes the following tasks:

The basic software elements which SAS members work with and develop include:



Physics Analysis


In STAR, ``Physics Analysis'' refers to the physicist-resource-intensive event selection and analysis of reconstructed events in order to extract physics observables from the data for publication. There are presently seven Physics Working Groups in STAR which are responsible for developing all necessary physics analysis software and simulation tools, and eventually for carrying out the analyses of the data STAR produces. Of course, these groups are open to all STAR collaborators. The groups and their convenors are:

The interaction of these groups with the STAR SOFI and SAS groups is coordinated by Tom LeCompte. Many of the Physics Working Group members are also active in SAS and SOFI. Software development for these analyses is in its infancy and physicists are urgently needed in each working group.


Lanny Ray
2/20/1998