Table of Contents
1 Introduction
2 PPDG Year 3 Plans
2.1 Action Items from '03 Questionnaires
2.2 File Cataloguing, Replication and Registration Projects, Storage Management
2.2.1 Deliverables
2.2.2 Milestones
2.2.3 Participants
2.2.4 Dependencies
2.3 VO Registration, Membership Handling, Grid Infrastructure
2.3.1 Deliverables
2.3.2 Milestones
2.3.3 Participants
2.3.4 Dependencies
2.4 Job Scheduling and Job Description
2.4.1 Deliverables
2.4.2 Milestones
2.4.3 Participants
2.4.4 Dependencies
2.4.5 Issues and Concerns
2.5 Database Services Integration to the Grid
2.5.1 Deliverables
2.5.2 Milestones
2.5.3 Participants
2.5.4 Issues and Concerns
3 General Concerns
4 PPDG Year 4 and 5
5 Background
5.1 Summary Table from PPDG Proposal
5.2 Evaluation and Continuing Support
The STAR experiment is one of the four Relativistic Heavy Ion Collider (RHIC) experiments at Brookhaven National Laboratory (BNL). Although a Nuclear Physics experiment, STAR records a massive amount of data: up to a PByte (10^15 bytes) is accumulated each year, representing to date an integrated total of 3 PB spanning over 3 million files. Projections for the next RHIC run (also called the Year4 run) show an increase by a factor of five in the number of collected events. Not only managing but also processing such a vast amount of data will become more and more problematic as our Physics program evolves toward the search for rare probes.
The STAR-PPDG team therefore has an important role: to prepare and design the computing infrastructure necessary for coping with this reality. In that regard, our past years' activities have been clearly focused on the realization of a production Grid infrastructure without disrupting our main RHIC program, that is, our primary objective and mission to the DOE: delivering science. Our strategy has therefore been oriented toward introducing additional (low-risk) tools or providing users with stable and robust APIs that shield our scientists from the fabric details.
Over the past years, we have collaborated with computer scientists, mainly the SDM group, to first deliver production-level data transfer between our two main facilities, BNL and NERSC. In full production since early 2002, file transfer alone allows us to de-localize our still-standard analysis approach over two sites. Over 20% of our data is reliably shipped between the two sites, and we are now in the consolidation phase of the tools and software from the SDM group. This tool relieves us of the mindless task that data management is, freeing manpower for the next phase of Grid tools and needs. The second year focused on the development and release of several projects whose intent was to prepare our users for running on the Grid. Among those, the legacy File Catalog is being phased out and re-designed to accommodate a replica and meta-data Catalog. This made possible the emergence of new activities relying on dataset or logical-collection queries rather than direct access to the physical location of the files. Two projects were born or enhanced by this work:
Additionally, several activities were started in collaboration with our local computer scientists from the Information and Technology Division (ITD) at BNL. We were pleased with the addition of the site-facility personnel and could start, in early 2002, a trial of activities of general interest for the community.
Considering the work done and the objectives required for a running experiment, the PPDG-supported STAR team nonetheless amounts to only ~1 FTE in total, namely Gabriele Carcassi (BNL) and Eric Hjort (LBL); the core of the manpower currently comes from collaboration with computer scientists from within or outside the scope of PPDG, additional internal manpower, and external funding.
As outlined in the introduction, a real challenge is ahead of the STAR collaboration for the experiment's Year4 run. The long Gold-on-Gold running period will lead to an unprecedented amount of data, which will require a drastic change in our computing model; waiting years for the data to be analyzed is not an option. In that regard, it will test the strength of our choices in adopting some of the Grid technology. Our general goals, although already foreseen, include:
Consolidation and finalization of our file and data-collection replication mechanism. To this end, we must:
Finalize our monitoring strategy and find/deploy/adopt a common set of tools suitable for most (if not all) experiments at our respective shared facilities.
Finalize the convergence toward a site registration authority agent. Work with facility personnel to define and deploy a STAR “VO” and evaluate existing tools for VO membership handling and account mapping.
Full migration of some of the workload to the Grid, two pieces of which are within our reach:
While our focus must remain on the above objectives, the recent GT3 release will ultimately lead to increased usage in the community: its attempt to generalize services through the OGSA and Web Services approach, using a potentially interoperable way to specify services (WSDL), makes it an attractive and awaited product we cannot ignore. While a full deployment of GT3 is seen as part of our Grid Year 4 activities and hardening, we will actively work on migrating our existing tools toward a Web Service oriented approach (WSDL/gSOAP).
We go into more detail in the following sections.
Considering past activities and our objectives, it is clear that collaboration with computer scientists is essential. We hope to pursue our good collaboration with the members of the SDM group, continue the work we have started on many fronts, and bring to the community the next wave of robust, well-defined and polished tools. We were particularly pleased with the attention and effort this group devoted to our real-world experience and problems, and are comfortable with the current convergence of vision and mutually driven benefits.
We are also seeking a more direct and intense collaboration with the Condor team, as our job submission scheme depends on it. We need immediate technical advice and to evaluate the robustness of submission through Condor-G, considering the initial target of an average of 5k jobs a day with peak activity up to 20k jobs a day. Later, we would like to take advantage of the matchmaking mechanism Condor provides. At the last PPDG collaboration meeting, we understood this collaboration to be a granted upcoming activity, and we surely hope to further increase the use of Condor and Condor-G throughout our facility should this be the case.
We are planning to intensify our collaboration with the Globus team and, especially, to test and provide feedback on the GT3 release as requested. We see this activity as mutually beneficial: our feedback will hopefully lead to a faster convergence toward a stable release, and our involvement at an early stage will allow us to be ready for our planned migration from GT2 to GT3.
On a different front, with our European colleagues soon joining our grid efforts, we hope to fully start the evaluation of OGSA-DAI and the Grid Data Service (GDS). Our goals are to evaluate the existing software and the DAIS specifications from the UK e-Science programme, document the interface and service description wherever necessary, study their applicability and usability for the general US community, work on the definition and consolidation of schemas, and converge toward fully functional Web Service oriented access to the commonly used databases: Oracle, MySQL, Postgres, etc.
Our current file and replica catalog is fully operational at BNL but only partly operational at NERSC. On a local level, every replica made at BNL is currently registered instantaneously in our File Catalog. This is not yet the case when files are transferred over the Grid using the HRM tool. Work has started on converging toward a Replica Registration Service to address this discrepancy. This effort will further evolve toward a fully functional and robust replica management system.
We would greatly benefit from following closely the development of a Web Service interface for SRM, and we commit to its deployment when available. In fact, such an interoperable approach would make the joining of our European collaborators easier and open new avenues.
Finally, we intend to continue to test, support and propagate the use of SRM in the community.
Through a continuous effort on the Replication and Registration Service (RRS), we hope to define and document a common specification for the replication of physical files, datasets or collections and their registration in "a" catalog. We describe below the steps necessary for both the full use of our Catalog and the finalization of the RRS. We also hope to later replace our local BNL replica registration approach with a fully deployed and integrated DRM/HRM approach, completing the work on file replication.
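As a rough illustration of what the RRS contract amounts to, the sketch below registers a new physical replica against a logical file name once a transfer completes. The class, function names, logical names and URL conventions are invented for this example and do not reflect the actual STAR catalog API.

```python
# Hypothetical sketch of a Replica Registration Service (RRS) step:
# after an HRM-driven transfer completes, the new physical location is
# registered against the file's logical name.  The in-memory "catalog"
# below is purely illustrative.

class ReplicaCatalog:
    """Minimal logical-name -> physical-replica mapping."""
    def __init__(self):
        self._replicas = {}   # logical file name -> set of replica URLs

    def register(self, lfn, pfn):
        self._replicas.setdefault(lfn, set()).add(pfn)

    def replicas(self, lfn):
        return sorted(self._replicas.get(lfn, set()))

def on_transfer_complete(catalog, lfn, target_site, path):
    # The RRS would be invoked by the transfer tool (e.g. HRM) once the
    # file is safely on disk at the destination.
    pfn = "gsiftp://%s%s" % (target_site, path)
    catalog.register(lfn, pfn)
    return pfn

catalog = ReplicaCatalog()
catalog.register("star/dst/run1234.root", "gsiftp://bnl.gov/data/run1234.root")
on_transfer_complete(catalog, "star/dst/run1234.root",
                     "pdsf.nersc.gov", "/data/run1234.root")
print(catalog.replicas("star/dst/run1234.root"))
```

The point of the sketch is only the flow: registration is triggered by the transfer tool itself, so local and Grid-made replicas end up in the same catalog instantaneously.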
Early participation in the Web Service for SRM activity would allow us to further extend the use of this tool across borders.
The total PPDG funded FTE-equivalent is projected to be
Further internal manpower for the deployment on another facility will be provided through external funding.
This work is self-contained and requires cooperation from the SDM team for the RRS/HRM interaction. The job scheduling program relies heavily on accurate cataloguing of our data collections and therefore on this project's success.
Priorities: RRS and FileCatalog deployment is of the highest priority; testing of new features will be taken up as time allows (SRM/WSDL first, then V2.x).
Currently, the STAR grid infrastructure is handled by having users request certificates from the DOE Science Grid.
The Grid activities at BNL are currently not well centralized. In particular, the infrastructure does not include a common strategy for error reporting, diagnostics, a help desk, or consistent software deployment. Furthermore, the Grid activities do not connect well with the internal security policies in effect at BNL. As an example, DOE cyber-security policies require centralization of user account information, which is completely disconnected from any VO and certificate information.
Since our team also includes members of the ITD personnel, we hope to drive an effort toward the consolidation of the local infrastructure, including the delocalization of the registration authority agent for BNL, a stronger participation of the ITD security personnel in the Grid activities, the deployment of a common set of tools for bug tracking, problem reports, mailing lists and help desks, and continuous support for Grid tutorials and lectures to the community at large.
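The account-mapping problem described above can be sketched as follows. The DNs, account names and in-memory tables are invented for illustration; a real deployment would consult the VO membership service and the central account database rather than hard-coded dictionaries.

```python
# Illustrative sketch (not BNL's actual tooling): reconciling VO
# membership with the centrally managed local accounts that DOE
# cyber-security policy requires.  A Grid user is admitted only if the
# certificate DN is both in the VO and mapped to a registered account.

vo_members = {                                         # hypothetical VO roster
    "/DC=org/DC=doegrids/OU=People/CN=Jane Doe",
    "/DC=org/DC=doegrids/OU=People/CN=John Smith",
}

local_accounts = {"jdoe", "starprod"}                  # central account database

dn_to_account = {                                      # gridmap-style table
    "/DC=org/DC=doegrids/OU=People/CN=Jane Doe": "jdoe",
    "/DC=org/DC=doegrids/OU=People/CN=John Smith": "jsmith",  # no central account
}

def authorize(dn):
    """Return the local account for dn, or None if the mapping fails."""
    if dn not in vo_members:
        return None                      # not a VO member
    account = dn_to_account.get(dn)
    if account not in local_accounts:
        return None                      # VO member, but no central account
    return account

print(authorize("/DC=org/DC=doegrids/OU=People/CN=Jane Doe"))   # jdoe
print(authorize("/DC=org/DC=doegrids/OU=People/CN=John Smith")) # None
```

The second case is exactly the disconnect described above: VO membership alone says nothing about whether the centrally managed account exists.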
We commit the same emphasis and effort to converging on the deployment of a common set of tools for monitoring the fabric.
Finally, we will be helping in testing and, as far as possible, using VDT as part of our infrastructure.
We identify three distinct projects we will tag in our milestones as
[P22] will be kept within the STAR-specific available manpower in conjunction with the PDSF personnel. A total of 3 FTE months will be required for documentation of the VO, evaluation of user management tools, deployment and consolidation.
The two other projects will involve local Information & Technology teams (ITD) with guidance from the experiments (STAR, Phenix and Atlas).
Some of the activities will be part of the DOESG VO Management project.
Note: An agreement with experiments on this project has not been made to date and this section will have to be reshaped as the situation clarifies.
Priorities: Registration, VO and Grid user management systems are extremely important to open the Grid to our users. Monitoring will receive somewhat less emphasis.
The STAR team will focus on the consolidation and further development of the job scheduling user interface in the coming year. We are planning to proceed in steps, with clear milestones.
Our main objective is to provide a fully functional tool and wrapper for Grid job submission by further developing the STAR Scheduler [W-GJS], to define and implement a logger utility at all levels of the infrastructure [L-GJS], and to document the requirements for a high-level U-JDL and develop its schema [U-JDL]. We would also transition our current job scheduling tool toward a Web-Service based approach and deploy the service to one or more site facilities. This latter activity will be part of a joint effort between STAR and Jefferson Lab (see text in appendix).
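To make the intent-based approach concrete, a user-level job description might look like the sketch below. The tag names, dataset keywords and command are purely illustrative; defining the actual U-JDL schema is precisely what this deliverable sets out to do.

```python
# Hypothetical U-JDL fragment: the user declares *what* should be
# processed (a dataset query) rather than *where* or *how* the job
# runs.  Tag and attribute names are invented for this sketch.
import xml.etree.ElementTree as ET

ujdl = """
<job name="dst-analysis">
  <command>root4star -b -q doEvents.C</command>
  <input dataset="trgsetupname=production_minbias,filetype=daq_reco_mudst"/>
  <stdout url="file:./sched-$JOBID.out"/>
</job>
"""

root = ET.fromstring(ujdl)
# The meta-scheduler would resolve the dataset query against the file
# catalog and pick a site holding the data; here we only read the intent.
query = root.find("input").get("dataset")
print(root.get("name"), "->", query)
```

Because the description carries intent rather than physical file lists, the same job document remains valid wherever the data happens to be replicated.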
The U-JDL definition will require 3-5 FTE months (depending on participation from other teams). This work will be done in collaboration with the JLab team, identified as one of the main collaborating teams.
The first part of the current Scheduler functionality consolidation will require 1 FTE month. The Phenix collaboration has expressed interest in evaluating our current tool and investigating the integration of their replica catalog. Since the tool is organized as Java classes, we do not expect this to be a problem. Should this come to realization, we will provide technical assistance and guidance through our conceptual design, and we hope the feedback will help enhance the current design.
An (optimistic) 3 FTE months will be needed for the deployment, stress and regression testing of the local Condor-G implementation and to reach MC-level testing.
Consolidation of local submission through Condor-G, assistance with the MC production, and further development of the user-level job submission will likely absorb a total of 3 FTE months.
The migration to a Web-Service based client will be part of the STAR/JLab joint project and is estimated at 50% of 6 FTE months spanned over a longer time period. Additional manpower may be recruited as necessary.
Logging implementation will require either a collaborative effort or internal manpower. We believe this task can be accomplished with the equivalent of 4 FTE months.
There is a dire need for increased support from the Condor team; we will also need a better mechanism for reporting problems (if any) to the Globus team.
Our stated objective of 20k job submissions a day may stress the current Globus infrastructure in ways not observed before. We can only repeat and stress one essential point: support and collaboration from the Condor and/or Globus teams is essential and may very well be the key to our success.
This activity relies heavily on the convergence and rapid hardening of tools and infrastructure beyond our control. Here again, good communication with the Globus team is essential to our success.
This activity alone will require the equivalent of 18 FTE months.
Priorities: We will invest and focus our efforts on completing submission through Condor/Condor-G, as it appears to be the more immediately achievable path toward our goals. A strong emphasis and push will be needed to resolve any Condor-G or Globus problems encountered. This will be our top-priority activity.
Re-designing the Scheduler architecture and its evolution toward a Web-Service architecture is next on our priority scale. The logger utility will remain at a somewhat lower level and will be driven by internal STAR manpower, as it involves code and infrastructure re-work.
Document the specifications and requirements for the gridification of a database solution. Implement them within MySQL as an example, and bring the GSI-enabled MySQL work back to the community at large.
Evaluate the usability of the OGSA-DAI release 2 (or later) and the DAIS specification, and work on the definition and/or consolidation of schemas, converging toward Web Service oriented access to databases.
We will seek funding for this project and provide internal manpower and expertise, as well as a test bed for the application to Catalog connections and conditions-database access. The equivalent of 6 to 10 FTE months will be necessary to bring this project to completion.
This work is needed prior to running real-life data mining on the Grid. The timescale being short, we would require assistance from the PIs and executive committee members to shape such a proposal. The time schedule, voluntarily fuzzy and volatile, will depend on funding and/or manpower availability.
Currently helped by PPDG funding at the level of 1 FTE year, our plan requires the equivalent of 45 FTE months. Only 12 months are covered by direct funding; the rest is planned to be covered by collaborative efforts with other teams (SDM, Condor, JLab), internal manpower drawn at great cost from within the experiment, and external funding. The risk of failure is high, while the need to transition to a Grid computing model is within a year. This situation will ultimately lead to a dilemma.
For all projects, we would like PPDG to concentrate its efforts on a strategy for end-to-end diagnostics, resource management policies at all levels, and error recovery. These areas cannot be addressed on a short time scale but are nevertheless necessary for the consolidation of our production Grid infrastructures.
The STAR team will concentrate its efforts on logging utilities and further database developments.
This is the summary table from the PPDG proposal. The full proposal is available at http://lbnl2.ppdg.net/docs/scidac01_ppdg_public.doc
Project Activity | Experiments | Yr1 | Yr2 | Yr3
CS-1 Job Description Language – definition of job processing requirements and policies, file placement & replication in distributed system | | | |
P1-1 Job Description Formal Language | D0, CMS | X | |
P1-2 Deployment of Job and Production Computing Control | CMS | X | |
P1-3 Deployment of Job and Production Computing Control | ATLAS, BaBar, STAR | | X |
P1-4 Extensions to support object collections, event level access etc. | All | | | X
CS-2 Job Scheduling and Management – job processing, data placement, resource discovery and optimization over the Grid | | | |
P2-1 Pre-production work on distributed job management and job placement optimization techniques | BaBar, CMS, D0 | X | |
P2-2 Remote job submission and management of production computing activities | ATLAS, CMS, STAR, JLab | | X |
P2-3 Production tests of network resource discovery and scheduling | BaBar | | X |
P2-4 Distributed data management and enhanced resource discovery and optimization | ATLAS, BaBar | | | X
P2-5 Support for object collections and event level data access. Enhanced data re-clustering and re-streaming services | CMS, D0 | | | X
CS-3 Monitoring and Status Reporting | | | |
P3-1 Monitoring and status reporting for initial production deployment | ATLAS | X | |
P3-2 Monitoring and status reporting – including resource availability, quotas, priorities, cost estimation etc. | CMS, D0, JLab | X | X |
P3-3 Fully integrated monitoring and availability of information to job control and management | All | | X | X
CS-4 Storage resource management | | | |
P4-1 HRM extensions and integration for local storage system | ATLAS, JLab, STAR | X | |
P4-2 HRM integration with HPSS, Enstore, Castor using GDMP | CMS | X | |
P4-2 Storage resource discovery and scheduling | BaBar, CMS | | X |
P4-3 Enhanced resource discovery and scheduling | All | | | X
CS-5 Reliable replica management services | | | |
P5-1 Deploy Globus Replica Catalog services in production | BaBar, JLab | X | |
P5-2 Distributed file and replica catalogs between a few sites | ATLAS, CMS, STAR, JLab | X | |
P5-3 Enhanced replication services including cache management | ATLAS, CMS | | X |
CS-6 File transfer services | | | |
P6-1 Reliable file transfer | ATLAS, BaBar, CMS, STAR, JLab | X | |
P6-2 Enhanced data transfer and replication services | ATLAS, BaBar, CMS, STAR, JLab | | X |
CS-7 Collect and document current experiment practices and potential generalizations | All | X | X | X
Also from the proposal. Please comment on these items from your team's perspective:
“During the last phase of PPDG we will:
Web services are now established as a viable component architecture for constructing large, distributed systems. Good progress has been made on data grid components, in particular the Storage Resource Manager interface specification, for which multiple interoperable implementations are being developed. The logical next step is to begin to prototype the deployment of computational components, in particular meta-scheduler Web Services. Both the STAR experiment and Jefferson Lab are ready to begin this prototyping effort, having already deployed SRMs and other XML technology. These two groups propose, by studying the architecture components of the STAR Scheduler[*] and the JLab AUGER system[†], to develop a common meta-scheduler Web Service interface definition, analogous to the SRM data management interface specification, as part of their respective Year 3 PPDG activities. The prototype implementations will immediately serve the respective communities and will serve as valuable input to a larger community effort.
Web services are typically described using WSDL, with remote service invocations being conveyed via XML encoded messages. These two aspects are ideal for constructing loosely coupled, independently evolving components. Firstly, because the WSDL describes all aspects of the call without regard to implementation, a specification described in this way is by construction an interoperability specification. Secondly, since XML passes data by name and not by position, it supports adding new features without breaking backward compatibility (i.e., schema evolution, with only a few constraints on naming). In particular, optional arguments or returned values can be easily added without breaking old clients or servers; changes can be transparent.
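The backward-compatibility property described here can be illustrated with a small sketch. The message shapes are hypothetical and no real service is contacted; the point is that a client extracting fields by tag name is unaffected when a later server version adds an optional element.

```python
# Sketch of the schema-evolution property described above: because XML
# passes data by name, an old client that looks up fields by tag name
# keeps working when the server adds a new optional element.
import xml.etree.ElementTree as ET

reply_v1 = "<status><state>RUNNING</state></status>"
# A later server version adds an optional <site> element:
reply_v2 = "<status><state>RUNNING</state><site>pdsf.nersc.gov</site></status>"

def old_client_state(xml_reply):
    # The v1 client only ever asks for <state> by name; extra elements
    # are simply never looked at, so no positional breakage occurs.
    return ET.fromstring(xml_reply).findtext("state")

print(old_client_state(reply_v1))  # RUNNING
print(old_client_state(reply_v2))  # RUNNING (new field transparently ignored)
```

A positional encoding (a fixed-order tuple, say) would have broken the old client at the second reply; by-name access is what makes the evolution transparent.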
The SRM interface specification, developed collaboratively and now supported by multiple independent implementations, has demonstrated the value of this approach in a real world data grid component. Additional data grid components are needed, and are beginning to be targeted. In the coming year, both STAR and Jefferson Lab will also need to deploy computational grid components, and would like to take a similar approach for these as has worked so well for the SRM.
The STAR Scheduler project is the core software used by the STAR collaboration to submit jobs on a farm, a site, or the Grid. Initially using a simple description with lists of physical files, the tool is now interfaced with the STAR File and Replica Catalog, allowing the use and discovery of files or data collections statically distributed on the fabric. Coupled with a converging XML-based U-JDL (user job description language), this tool is part of a main strategy to slowly move users to a submission model based on Grid resources. The U-JDL approach was chosen to allow the user to advertise his intent to the Scheduler (for example, which dataset should be processed) rather than specifying how the job has to be run; the meta-scheduler therefore serves the function of a basic resource broker and discovery service.
The currently envisioned approach for Grid submission relies on the use of Condor-G (in development). Web-based job monitoring and statistics gathering (including usage, option tracking, etc.) is provided as a first approach to job monitoring. Being in an early development phase, it has lower priority, as the future direction would greatly benefit from design work and strengthening.
While the architecture is currently well defined, it is clear that its separate components will need to be consolidated and re-engineered to account for the road map laid out by emerging technologies such as GT3 and the OGSA effort. Additionally, some effort is required to implement features such as dynamic flow control for required job inputs, and interoperability testing, to name only those.
Jefferson Lab has recently deployed Auger, a replacement for the JOBS batch system. Auger is slated to have an XML-based web services interface this coming year. Auger provides an overlapping set of functionality with STAR's Scheduler, but without (yet) the connection to the experiment's data and replica catalog.
Since both sites are currently engaged in XML and web services based meta-scheduler developments, the JLab and STAR teams therefore propose to work together to achieve this goal in a few steps:
Many of these activities are either started or part of our mutual plans at this stage (job submission, JDL or U-JDL, the Web service approach and XML schemas, to name only those), so working together will reduce the total effort and yield re-usable solutions. Moreover, the different computational approaches of the two groups working toward a common goal will likely lead to a stronger and better-defined solution.
Specific development activities will become more clearly defined as this activity goes along, but certain components can already be anticipated, and are described in the following section.
The following is a list of development goals leading towards a web services based meta-scheduler system for both STAR and JLab.
1. Prototype web service component interface (near term)
a. Meta Job described in XML, with tags sufficient for near term needs (U-JDL definition)
b. inputs and outputs can reference SRM managed data or RC (replica catalog) entries
c. Batch capabilities will include all basic operations of PBS, LSF, Condor, etc. (submit, status, suspend, resume, cancel or kill) and will support both serial and parallel jobs (fixed size, not dynamic in the first version). We will collaborate as much as achievable with similar efforts aimed at a general batch Web Service interface definition.
2. Implementation for a single site over PBS and/or LSF and/or Condor.
3. Basic meta-scheduler implementation
b. Will use SRM / RC to co-locate job and data, then either dispatch to a site meta-scheduler web service or use existing Web Service components as available (the Condor-G Web Service, for example)
c. Some form of load monitoring will be employed (not the ultimate solution in the first version), and multi-site fair share will be attempted. Experience and components used by the STAR Scheduler may be employed.
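A minimal sketch of steps 3b/3c, under invented site names, replica tables and load figures: rank candidate sites by how many of the job's input files they already hold, breaking ties with a crude load metric.

```python
# Illustrative co-location sketch (not the actual meta-scheduler):
# replica-catalog information drives site selection, with a simple
# load number standing in for the "not ultimate" monitoring of 3c.

replica_sites = {            # logical file -> sites holding a replica
    "f1.root": {"bnl", "pdsf"},
    "f2.root": {"bnl"},
    "f3.root": {"pdsf"},
}
site_load = {"bnl": 0.9, "pdsf": 0.4}   # made-up load metric per site

def choose_site(input_files):
    """Rank sites by data locality, breaking ties with the load metric."""
    counts = {}
    for f in input_files:
        for s in replica_sites.get(f, set()):
            counts[s] = counts.get(s, 0) + 1
    # Most local files first; among equals, the least loaded site wins.
    return min(counts, key=lambda s: (-counts[s], site_load[s]))

print(choose_site(["f1.root", "f2.root"]))  # bnl  (holds both files)
print(choose_site(["f2.root", "f3.root"]))  # pdsf (locality tie, lower load)
```

Dispatching the job to the chosen site's web service, or to an existing component such as Condor-G, would then follow as in 3b.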
Accepted Constraints on the Prototype
1. Grid monitoring and job tracking may remain at a very rudimentary level in the first year
2. Network bandwidth estimates may be poor in the first year
3. Job graphs (multi-site dependencies) may not be supported in the first year
4. Proxy credentials may need to be long-lived enough for an entire batch job
5. Executables must be available for the target platform (no auto-compilation or bundle approach)
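The batch capabilities listed in 1c suggest a single abstract contract that each backend (PBS, LSF, Condor) would implement. The sketch below shows such a contract with a throwaway in-memory backend; suspend and resume are omitted for brevity, all names are invented, and nothing here drives a real batch system.

```python
# Sketch of the batch-operations interface named in 1c: one abstract
# contract that concrete PBS/LSF/Condor backends would each implement.
from abc import ABC, abstractmethod
import itertools

class BatchService(ABC):
    @abstractmethod
    def submit(self, job): ...
    @abstractmethod
    def status(self, job_id): ...
    @abstractmethod
    def cancel(self, job_id): ...

class InMemoryBackend(BatchService):
    """Toy backend: tracks job state in a dict, submits nothing real."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def submit(self, job):
        job_id = next(self._ids)
        self._jobs[job_id] = "QUEUED"
        return job_id

    def status(self, job_id):
        return self._jobs.get(job_id, "UNKNOWN")

    def cancel(self, job_id):
        if job_id in self._jobs:
            self._jobs[job_id] = "CANCELLED"

svc = InMemoryBackend()
jid = svc.submit({"command": "root4star doEvents.C"})
print(jid, svc.status(jid))   # 1 QUEUED
svc.cancel(jid)
print(svc.status(jid))        # CANCELLED
```

A WSDL description of this same contract is what would turn it into the interoperable batch Web Service interface the list envisions.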
Additional external team effort will be sought for the consolidation of this project, as required and as it generates interest within a larger audience.
Possible Future Extensions
· Job Portal as a client above the web services: interactively aids in generating meta-jobs (multi-file and parameter sweep), remembers last parameters so simple changes are easily made, etc.
There exists the possibility that a number of client tools, and ultimately a grid meta-scheduler, could be shared by multiple experiments.
This specification, and the corresponding implementations, will build upon (use) the SRM interface for file migration. It will need at least some level of Replica Catalog lookup and publishing; if no standardization has occurred by then, an interim joint standard will be developed for use by STAR and JLab. Similarly, interim or standard definitions of web services based monitoring of batch sites (compute elements) for load will be needed. All such interfaces will be carefully documented so that the implementations can be adapted to other systems. Finally, some standardization of a mechanism for multi-site logging of usage would be desirable to achieve multi-site fair share (a minor or multi-year goal, to be implemented if resources permit).
The RHIC/Phenix collaboration has approached the PPDG collaboration with the intent to participate. The document can be found at http://www.ppdg.net/docs/PPDGYear3/phx_ppdg.doc.