Online Linux pool


 

GOAL: 

Provide a Linux environment for general computing needs in support of experimental operations.

HISTORY:

A pool of 14 nodes, spanning four hardware classes (all circa 2001), has existed for several years.  For the last three years or more, the nodes have run Scientific Linux 3.x with support for the STAR software environment, along with access to various DAQ and Trigger data sources.  There have probably been fewer than 20 significant users, with the heaviest usage related to L2.

User authentication was originally based on an antique NIS server, into which we had imported the RCF accounts and passwords.  The NIS server is still running, but we have not kept its account information up to date.  Local accounts on each node gradually became the norm, which is tedious to manage.  Home directories fall into three categories:  AFS, NFS on onllinux5, and local home directories on individual nodes.  This arrangement is likewise tedious to maintain.

There are several "special" nodes to be aware of:

  1. Three of the nodes (onllinux1, 2 and 3) are in the Control Room for direct console login as needed.  (The rest are in the DAQ room.)
  2. onllinux5 has the NFS shared home directories (in /online/users).  (NB.  /online/users is being backed up by the ITD Networker backup system.)
  3. onllinux6 is (was?) used for many online database maintenance scripts (check with Mike DePhillps about this -- we had planned to move these scripts to onldb).
  4. onllinux1 was configured as an NIS slave server, in case the NIS master (starnis01) fails.

 

PLAN:

For the run starting in 2008 (2009?), we are replacing all of these nodes with newer hardware.

The basic hardware specs for the replacement nodes are:

Dual 2.4 GHz Intel Xeon processors

1 GB RAM

2 x 120 GB IDE disks

 

These nodes should be configured with Scientific Linux 4.5 (or 4.6 if we can ensure compatibility with STAR software) and support the STAR software environment.

They should have access to various DAQ and Trigger NFS shares.  Here is a starter list of mounts:

 

Shared DAQ and Trigger resources

SERVER            DIRECTORY on SERVER            LOCAL MOUNT POINT  MOUNT OPTIONS
evp.starp         /a                             /evp/a             ro
evb01.starp       /a                             /evb01/a           ro
evb01             /b                             /evb01/b           ro
evb01             /c                             /evb01/c           ro
evb01             /d                             /evb01/d           ro
evb02.starp       /a                             /evb02/a           ro
evb02             /b                             /evb02/b           ro
evb02             /c                             /evb02/c           ro
evb02             /d                             /evb02/d           ro
daqman.starp      /RTS                           /daq/RTS           ro
daqman            /data                          /daq/data          rw
daqman            /log                           /daq/log           ro
trgscratch.starp  /data/trgdata                  /trg/trgdata       ?
startrg.starp     /scratch/startrg/data/scalers  /trg/scalers       ?
online.star       /export                        /onlineweb/www     rw
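
As a rough illustration of how the mount list above could be turned into /etc/fstab entries for the new nodes, here is a minimal Python sketch.  It is not an existing STAR tool: the function names (fstab_line, render_fstab) are hypothetical, the filesystem type "nfs" and the trailing "0 0" dump/pass fields are assumptions, and server names are kept exactly as written in the list (some without the .starp suffix).  Entries whose options are still "?" are emitted as commented-out placeholders rather than guessed.

```python
# Mount list transcribed from the table above:
# (server, directory on server, local mount point, mount options).
# "?" means the plan has not yet settled on ro or rw.
MOUNTS = [
    ("evp.starp", "/a", "/evp/a", "ro"),
    ("evb01.starp", "/a", "/evb01/a", "ro"),
    ("evb01", "/b", "/evb01/b", "ro"),
    ("evb01", "/c", "/evb01/c", "ro"),
    ("evb01", "/d", "/evb01/d", "ro"),
    ("evb02.starp", "/a", "/evb02/a", "ro"),
    ("evb02", "/b", "/evb02/b", "ro"),
    ("evb02", "/c", "/evb02/c", "ro"),
    ("evb02", "/d", "/evb02/d", "ro"),
    ("daqman.starp", "/RTS", "/daq/RTS", "ro"),
    ("daqman", "/data", "/daq/data", "rw"),
    ("daqman", "/log", "/daq/log", "ro"),
    ("trgscratch.starp", "/data/trgdata", "/trg/trgdata", "?"),
    ("startrg.starp", "/scratch/startrg/data/scalers", "/trg/scalers", "?"),
    ("online.star", "/export", "/onlineweb/www", "rw"),
]


def fstab_line(server, export, mount_point, options):
    """Return one fstab-style NFS entry (hypothetical helper).

    Assumes filesystem type "nfs" and "0 0" for the dump/pass fields.
    Undecided options ("?") yield a commented-out placeholder line.
    """
    if options == "?":
        return f"# TODO decide ro/rw: {server}:{export} {mount_point} nfs ? 0 0"
    return f"{server}:{export} {mount_point} nfs {options} 0 0"


def render_fstab(mounts):
    """Join one line per mount into a block suitable for review."""
    return "\n".join(fstab_line(*mount) for mount in mounts)


if __name__ == "__main__":
    print(render_fstab(MOUNTS))
```

Running the script prints one line per mount for review; nothing is written to /etc/fstab, so the output can be checked by hand (or pushed out through whatever configuration management is chosen) before it touches a node.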

 

 

WISHLIST Items with good progress:

  • <Uniform and easy-to-maintain user authentication system to replace the current NIS and local account mess.  Either a local LDAP server or piggybacking on the RCF LDAP seems most feasible.> -- An LDAP server (onlldap.starp.bnl.gov) has been set up, and the 15 onllinux nodes are authenticating to it.
  • <Shared home directories across the nodes with backups> -- The LDAP server also hosts the home directories and shares them via NFS.
  • <Integration into the SSH key management system (mechanism depends upon the user authentication method(s) selected).> -- The LDAP server has been added to the STAR SSH key management system, and users can now log in to the new onllinux nodes with keys.
  • <Common configuration management system> -- Webmin is in use.

 

WISHLIST Items still needing significant work:

  • <Shared home directories across the nodes with backups> -- still needs a backup system in place.  EMC Networker would seem to be a natural choice.
  • <Ganglia monitoring of the nodes> -- Since no Ganglia monitoring is currently working for the online nodes, this is on hold.
  • <Osiris monitoring of the nodes> -- Should be easy to add to the existing Osiris system.