Dr. Jerome Lauret
STAR Software & Computing Leader
Physics Department, Brookhaven National Laboratory
Bldg 510a, Upton, NY 11973

For the Solenoid Tracker At RHIC (STAR) collaboration

Grid Collaboratory Pilot activities & iVDGL expression of interest

STAR Grid activities

Expression of interest in iVDGL participation

The STAR Collaboration

The STAR experiment is one of the four Relativistic Heavy Ion Collider (RHIC) experiments at Brookhaven National Laboratory (BNL). Unlike traditional Nuclear Physics experiments, STAR records a massive amount of data each year: up to a PByte (10^15 bytes) is being accumulated annually, representing to date an integrated total of 3 PB spanning over 3 million files.

The projection for the next RHIC run (the Year4 run, which will start by the end of 2003) shows an increase by a factor of five in the number of collected events, which pushes our production turnaround time to the next order of magnitude: from months to years. Within a very active and aggressive physics program, the locally available processing resources will not suffice to deliver the science on a reasonable time scale. The situation will become more and more problematic as our physics program evolves toward the search for rare probes: the current decadal plan (covering the next 10 years of STAR development) clearly describes the need for several upgrade phases, including a factor of 10 increase in data taking and throughput by 2007.

Data management (including making the second-pass production physics summary tapes available to our remote institutions) and pass0 facility production are becoming problematic. The STAR computing strategy has, however, anticipated and planned for these difficulties by engaging in Grid activities and using middleware tools to resolve them.

The STAR Grid activities

Since its formation, the STAR collaboration has been part of the PPDG (Particle Physics Data Grid) Collaboratory Pilot. Although only lightly supported, our progress and activities have been impressive and speak for themselves.

Our activities in past years have been clearly focused on the realization of a production Grid infrastructure and the deployment of a set of tools without disrupting our main RHIC program, which is our objective and primary mission to the DOE: delivering science to our NSF and/or DOE institutions and universities and to the scientific community at large. Our main strategy was therefore oriented toward introducing either additional (low-risk) tools or stable and robust APIs that shield our scientists from the fabric details.

 

A few activities can illustrate the strategy and path we have taken:

·         With the SRM tools, over 20% of our data is reliably shipped between BNL (Brookhaven National Laboratory) and LBL (Lawrence Berkeley Laboratory). This tool relieves us of the mindless task that manual data management is, freeing manpower for the next phase of Grid tools and needs.

·         Largely driven by the need to prepare our user community for running on the Grid, we have developed a job submission system (a wrapper around Grid middleware) and engaged in the strong support of interactive analysis tools on the Grid (interfacing with SRM and making data movement transparent).

·         In addition to batch submission, the STAR collaboration has the unique capability of an interactive user analysis framework (GridCollector), which is currently able to fetch files across the Grid and deliver a selected set of events to the application.

·         For monitoring resource consumption on the fabric, the focus was placed on a stable and community-accepted solution: the Ganglia monitoring tool at both ends of our two major sites, coupled with our work on publishing the Ganglia information through MDS, now provides the first cornerstone for resource monitoring on the fabric. This work led to reporting discrepancies between the advertised GLUE schema and the Python information provider.

 

Additionally, several activities were started in collaboration with local and remote computer scientists and CS research groups. It is noteworthy that the STAR collaboration has the support of the Information and Technology Division (ITD) at Brookhaven National Laboratory.

 

For the coming year, we also intend to develop, test, and use replica registration service components; freeze our fabric monitoring approach and/or further enhance its capabilities as needed; and set up a multi-site production environment using heterogeneous computing resources for Monte Carlo simulation. Moreover, we believe we will be in a position to open the Grid to our users for running analysis jobs by 2004. This goal will require a functional database solution, logger utilities at the application level, and an error recovery and tracking system, some of which are part of ongoing investigative activities.

 

 

STAR/ITD and iVDGL

Production-level data replication across the US Grid has been demonstrated as a successful first step, but data production/data mining remains at an embryonic stage (developer level) and requires hardening and interoperability work before generalized use amongst the user community. The ongoing physics program, with its increasing data throughput, drives the need to use distributed resources available across collaborating institutions, some of which include European collaborators. In fact, no fewer than four institutions will join our Grid efforts in the coming months, putting us in a good position to test the many facets of the actual Grid infrastructure: not only under controlled, Monte Carlo-driven testing, but also at the level of user analysis jobs and interactive analysis using the Grid.

The STAR collaboration would greatly benefit from joining the iVDGL collaboration and we believe this benefit will be mutual.

·         Several ongoing activities overlap with iVDGL interests (monitoring, possible testing and/or deployment of VDT, use of MDS, etc.). Bringing our experience and feedback to the source of these activities, while benefiting from a greater awareness of their direction and development, will lead to a net resource benefit by avoiding duplication of effort.

·         iVDGL has extensive experience in the packaging and distribution of standard toolkits. We hope to benefit from this experience.

·         While iVDGL aims to drive the Grid to everyday production use at the Petabyte scale, we believe that, as an active ongoing experiment, we can truly bring this goal to reality: we are an active experiment with real Petabyte-scale data samples every year, and being resource constrained, we need to migrate both our production and our user analysis to a Grid infrastructure. In other words, we believe that existing experiments, with a large number of potential early users of the Grid, offer a superb and immediately available test-bed for Grid infrastructure and development.

·         In this context, we believe we can play an important role in projects such as Grid3. While Grid3 is shaped around providing infrastructure and services for LHC-oriented production, we firmly believe that running experiments offer a unique (user) environment in which the base components of the Grid can be stress tested, allowing possible problems to be caught at an early stage. We see this as a major strength of our possible collaborative efforts, as robustness and scalability issues of the components can be immediately assessed by our community.

·         Since we share facilities with other experiments, participation in existing collaboratory pilot projects is fundamental to us: we can cross-benefit from the development, experience, and manpower put into other areas, allowing faster convergence toward our Grid activity goals.

·         Furthermore, we view the participation and engagement of the IT department at Brookhaven National Laboratory as a crucial ingredient of a successful long-term Grid deployment. Previously involved in activities focused on cyber-security, networking, and user and general application services, the IT department has shown a strong interest in Grid activities and has joined the STAR team in the PPDG effort; this collaboration between STAR (in the Physics department) and the IT department is unprecedented. To broaden the scope of Grid activities beyond the Physics department, where they are currently confined, and to bring the Brookhaven Grid to a Laboratory-wide level, we believe strong participation of the IT department in iVDGL activities is an essential step toward success. It is clear in our minds that, since Grid activities push the frontiers of networking, security, and software applications in general, it is natural to include in our long-term planning the local IT team, which has both the most experience in dealing with a large heterogeneous community and lab-wide responsibility for the activities needed. Participation in issues such as security (to take one example) is essential to ensure compatibility between Grid ideals and DOE laboratory regulations. As a facility support group, participating in and teaming with iVDGL / iGOC is natural.

 

Conclusion

The STAR team, with help from the IT department, is engaged in several important tasks on the path toward consolidating its Grid infrastructure and achieving a fully deployed Grid environment by 2005. However, much remains to be done. In order to achieve such an environment, including its operational aspects, it is appropriate for us to explore new approaches to sharing and competitiveness, and to seek new partnerships both within and outside the field. Furthermore, long-term planning and a comprehensive Grid strategy are essential components of our success; within this scope, this document is an opening to partnership with iVDGL.