The STAR Run Control Operation Manual

Contacting Experts

Usually, problems that will come up during running are not problems with the run control, but rather problems with the systems that the run control is controlling. Using the procedures in this manual you should be able to determine which subsystem you are having a problem with and contact the appropriate expert. To do this, you should consult the "Expert Schedule" for each subsystem. For problems with the run control system itself contact:

Jeff Landgraf (Cell Phone: xxx-xxxx; email: jml@bnl.gov)

I. Introduction

The STAR run control system is designed around a large number of competing requirements. In short, it is supposed to make STAR easy to use, while allowing reasonably complete control of the STAR Trigger, DAQ, L3, and detector systems.

The Run Control system controls only the parts of STAR that are actively set up from run to run. These systems are trigger, L3, and DAQ. The detector subsystems' slow controls are all separate from the Run Control, and must be turned on before any data taking is started. When a detector or subdetector is referred to in the Run Control system, it is that detector's data acquisition computers that are actually being discussed. Although certain problems (e.g., not turning on the TPC FEEs) can be debugged using the Run Control system, for the most part a problem with one of the physical detectors cannot be discovered from the run control. One must resort to the alarms on the slow controls system for each detector and/or to the online monitoring software to determine whether the detectors are working properly.

The Run Control system will usually be running on the computer RTS02. If it is not running, see "General Setup" under Frequently Asked Questions below.

II. How to determine the state of STAR

Global overview of system state

The STAR Run Control main panel displays the state of the main subsystems at a glance. The boxes in the upper left hand corner refer to run-time and detector subsystems. The color indicates the state of the system:

There are three reasons that a system can be disconnected from the run control.

The PRESENT and READY states are equivalent for the purposes of the run control. The difference is that most systems are placed into the PRESENT state immediately after boot, while the READY state is reserved for systems that have been configured at least once. In either state the system is quiescent: its configuration may be changed, or the run can be started.

The RUNNING state means that the system is taking data. When the system is running you cannot change its configuration. The only meaningful actions are to STOP or PAUSE the run.

The PAUSED state is only meaningful for the trigger. It means that the trigger has temporarily stopped issuing new events, either because the operator has explicitly paused the run or because the trigger has finished sending all of the events that were requested. When the system is paused, the operator may either issue more events or stop the run.

Finally, the ERROR state means that something is wrong with one or more of the subsystems. This can arise from many different circumstances, some of which will be detailed below.
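The operator actions that the state descriptions above allow can be collected into a small lookup table. The following is only an illustrative sketch in Python, not code from the run control itself. The button names are the ones used in this manual; the "Edit Configuration" label and the empty entry for ERROR are assumptions, since the text gives no fixed action for the ERROR state.

```python
from enum import Enum


class State(Enum):
    PRESENT = "present"
    READY = "ready"
    RUNNING = "running"
    PAUSED = "paused"
    ERROR = "error"


# Actions the manual describes as meaningful in each state.
# "Edit Configuration" is a hypothetical label for changing the configuration.
ALLOWED_ACTIONS = {
    State.PRESENT: {"Edit Configuration", "Start Run"},
    State.READY: {"Edit Configuration", "Start Run"},
    State.RUNNING: {"Stop Run", "Stop Triggers"},
    State.PAUSED: {"Issue Triggers", "Stop Run"},
    State.ERROR: set(),  # assumption: debug first, no action listed in the text
}


def is_allowed(state, action):
    """Return True if the manual lists `action` as meaningful in `state`."""
    return action in ALLOWED_ACTIONS[state]
```

For example, is_allowed(State.RUNNING, "Edit Configuration") is False, which matches the rule that the configuration cannot be changed while the system is running.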

In addition to the above states, it is also possible to have a box that is flashing. This means that the system is changing from one state to another. Depending on the specific conditions of the run, starting and stopping runs can take anywhere from one second to two or three minutes. If the system seems to be hung for an unusual amount of time, you should consult the debugging section of this document.

All of the systems have multiple components, so the color shown in the system box is a summary of the states of all of its components. The states of the individual components are displayed on the Show Component Tree screen. You might see some boxes flash odd colors (RED) for short amounts of time when starting or stopping runs. This is not a problem; it simply means that some components of the system transitioned to their final states at different times.
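To make the idea of a summary state concrete, here is a minimal Python sketch of one way a system state could be derived from its component states. The actual rule used by the run control is not given in this manual, so the rule below (any ERROR wins, unanimous components report their common state, anything else is reported as "MIXED") is purely an assumption for illustration.

```python
def summarize(component_states):
    """Collapse a list of component state names into one summary state.

    Assumed rule (not taken from the run control itself):
      * any component in ERROR makes the summary ERROR,
      * if all components agree, the summary is that common state,
      * otherwise the components are mid-transition and the summary is MIXED.
    """
    states = set(component_states)
    if not states:
        raise ValueError("a system must have at least one component")
    if "ERROR" in states:
        return "ERROR"
    if len(states) == 1:
        return next(iter(states))
    return "MIXED"
```

Under this rule, a system whose components are briefly split between READY and RUNNING during a start would be summarized as MIXED until the last component finishes its transition, which is the kind of short-lived disagreement described above.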

In addition to the system boxes, there is also some text information displayed on the main run control panel. This includes the following information:

Finally, at the bottom of the main screen is a box in which messages to the operator may scroll by.

Detailed System State

The first step in debugging any problem that becomes visible on the main screen is to determine the specific components that are having trouble. To do this press the "Show Component Tree..." button on the main screen. You should get a screen that looks like:

By default, only the components that are in the run will be displayed. If you select the "Show all Nodes" box, then all possible components will be displayed. In this screen, the state of each component is reflected in the icon displayed on the left of the component. The color code is the same as for the system boxes on the main screen, although in this case, the state represents the actual state of the component rather than a summary of the states of all the components that are part of the system.

The relationship between the states displayed on this screen and the states displayed for the same components on the DAQ Monitoring system is useful to understand. The states displayed on this screen reflect what the run control handler thinks the state of the system is. On the other hand, the DAQ Monitoring system displays the state as it comes directly from the component. While discrepancies between the two are possible for a correctly functioning system, they usually indicate a problem with the run control.
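The comparison between the two views can be pictured as a simple difference of two tables. The Python sketch below assumes, for illustration only, that each view is available as a mapping from component name to state name; neither the run control nor the DAQ Monitoring system is known to expose its data this way, and the component names in the example are hypothetical.

```python
def find_discrepancies(handler_view, monitor_view):
    """Return {component: (handler_state, monitor_state)} for every mismatch.

    A component missing from one view is reported with None on that side.
    """
    mismatches = {}
    for name in sorted(set(handler_view) | set(monitor_view)):
        handler_state = handler_view.get(name)
        monitor_state = monitor_view.get(name)
        if handler_state != monitor_state:
            mismatches[name] = (handler_state, monitor_state)
    return mismatches


# Hypothetical component names and states, for illustration only.
handler_view = {"TPC01": "RUNNING", "TPC02": "RUNNING", "L3": "READY"}
monitor_view = {"TPC01": "RUNNING", "TPC02": "ERROR", "L3": "READY"}
print(find_discrepancies(handler_view, monitor_view))
```

A mismatch that persists after a transition has completed is the case worth reporting to the run control expert.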

III. How to take data

Taking data should be very simple. First, find out from the shift leader which run configuration to use, and make sure that this configuration has been prepared. Once that is done, just follow these steps:

Starting a run:

  1. Select the detectors that you want in the run: to add or remove a detector, simply left-click on the box for the system you are interested in.
  2. Select the Run Configuration from the drop-down box on the main panel.
  3. Select a data destination (usually RCF) from the drop-down box on the main panel.
  4. Press the "Start Run" button.
  5. You will be asked how many triggers you wish to send. Enter a number. The number can be zero, in which case the run will start up in a paused state and you can issue the triggers later.
  6. Wait five to sixty seconds. When all of the system boxes are green, the run is started.

Pausing a run:

The run will automatically pause when all the triggers you requested have been issued. If you want to pause the run explicitly, follow these steps:

  1. Press "Stop Triggers"
  2. Wait five to sixty seconds for the trigger system box to turn yellow. When the trigger system box is yellow, the run has been paused.

Resuming a paused run:

  1. Press "Issue Triggers..."
  2. Enter the number of triggers you want to send.
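The start, pause, and resume procedures above amount to a small state machine for the trigger. The following toy model in Python is a sketch of that behaviour as described in this manual, not the trigger's actual implementation; the class and method names are hypothetical, and the model assumes the system starts in the READY state.

```python
class TriggerModel:
    """Toy model of the trigger's RUNNING/PAUSED behaviour (illustrative only)."""

    def __init__(self):
        self.state = "READY"
        self.pending = 0  # triggers requested but not yet issued

    def start_run(self, n_triggers):
        if self.state != "READY":
            raise RuntimeError("a run can only be started from READY")
        self.pending = n_triggers
        # Zero requested triggers: the run starts up paused.
        self.state = "RUNNING" if n_triggers > 0 else "PAUSED"

    def issue_triggers(self, n_triggers):
        if self.state != "PAUSED":
            raise RuntimeError("triggers can only be issued while PAUSED")
        self.pending = n_triggers
        if n_triggers > 0:
            self.state = "RUNNING"

    def event_sent(self):
        """Record one issued trigger; pause automatically when none remain."""
        if self.state != "RUNNING":
            raise RuntimeError("no triggers are being issued")
        self.pending -= 1
        if self.pending == 0:
            self.state = "PAUSED"

    def stop_triggers(self):
        """Explicit pause requested by the operator."""
        if self.state == "RUNNING":
            self.pending = 0
            self.state = "PAUSED"
```

In this model, start_run(0) leaves the trigger PAUSED, issue_triggers(2) moves it to RUNNING, and after two calls to event_sent() it returns to PAUSED on its own, mirroring the automatic pause described above.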

Stopping a run:

  1. Press "Stop Run"
  2. You will be asked if the data was useful. Check the appropriate box. If you check no, the run will be marked in the run log database as "JUNK" so that offline will ignore the data. In addition, if the data is being written to local disk, the files from this run will be renamed with the prefix "JUNK_". These files will then be deleted automatically from the local disks after some time. You should only answer "no" if the run should not be analysed for some specific reason (e.g., a test run, or the FEEs were not powered on).
  3. Wait for the system buttons to turn blue. (This step can take up to a few minutes while the buffers are written to storage.)
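As an illustration of the "JUNK_" renaming rule mentioned in step 2, the Python sketch below computes the new name for a data file. Only the prefix comes from this manual; the directory and file name in the example are hypothetical, and the sketch only computes the name rather than renaming anything on disk.

```python
from pathlib import PurePosixPath


def junk_name(path):
    """Return `path` with its file name prefixed by "JUNK_".

    A file that already carries the prefix is returned unchanged, so the
    function can safely be applied twice.
    """
    p = PurePosixPath(path)
    if p.name.startswith("JUNK_"):
        return str(p)
    return str(p.with_name("JUNK_" + p.name))


# Hypothetical file name, for illustration only.
print(junk_name("/data/run_example.daq"))
```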

Making a comment for the Run Log:

  1. Press "Enter Comment"
  2. Enter the comment.

IV. Run Configurations

The run configuration for STAR is non-trivial. Each of the run-time systems (Trigger, DAQ, and L3) is complicated. The configuration of the trigger, at its lowest level, requires specifying which code is running in the DSM boards, documentation of the TCU inputs that result from this code, the mapping of these inputs to the generation of triggers, the setup of interspersed triggers, thresholds, and L2 and L3 algorithms and parameters. The DAQ system parameters describe various kinds of processing (cluster finding, pedestal calculation, zero suppression, gain correction, L3 processing, ASIC parameters, data output formats) which depend on the type of run as well as the detector and even subdetector.

For these reasons, shift operators are not expected to modify most parameters. Run Control enforces this by preventing shift users from changing them. However, all users are encouraged to view the settings. These parameters can be displayed by selecting the "Edit Configurations..." button on the main screen. They are also automatically saved to the run log database.

Some of these parameters are responsible for modifying the character of the data run. For instance, a centrality trigger has a completely different purpose from a peripheral trigger. In the same way, a pedestal run is processed in a completely different way than a physics run. However, there are many parameters set up by the run control that change the behaviour of the system without changing the character of the run. For example, a physics run with a central trigger has the same meaning even if one TPC receiver board is removed from the run. This also holds for entire detectors. A pedestal run for the TPC and for the SVT has the same character, even if the pedestal subtraction rule is slightly different for each detector. In addition, there are also parameters that should have no effect on the intended use of the data at all, but can have important ramifications for operations (e.g., debug options, magic tokens, data destinations).

The organization of the run control parameters was designed to minimize the unnecessary "outer-product" proliferation of configuration files, while retaining full control of parameters and a full history (through the databases) of runs. To do this the configuration is broken up into three categories:

A. Run Parameters vs. Setup Parameters

B. The run parameters

1. Trigger

2. DAQ

3. L3

4. Slow Controls

C. The setup parameters

1. Trigger

2. DAQ

3. L3

4. Slow Controls

V. Run Control Architecture (How it works)

A. Run Control Handler

B. GUI Architecture

1. Configuration Editor

2. Communications

3. Databases

VI. Frequently Asked Questions

General Setup

  1. RTS02 is logged off. How do I start the Run Control?

    1. Log on to RTS02 using the operator account. (The password should be displayed near the computer).
    2. Double Click the "Star Run Control" icon on the desktop.

  2. Why are all of the system buttons black after I start the Run Control?

    This could mean that somebody took all of the systems out of the run. However, most likely, the run control handler is not running. To check this:

    1. Log on to daqman using the operator account. (The password should be displayed near the computer).
    2. Type "ps -ef |grep handler"
    If the handler is not running, then:
    1. Type "cd /RTS/bin/HANDLER"
    2. Type "starthandler.sh"
    and restart the Run Control GUI.
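One caveat with the "ps -ef |grep handler" check above is that the grep command itself normally appears in the process list, so a single matching line does not by itself show that the handler is running. The Python sketch below filters the output of "ps -ef" accordingly. It matches on the substring "handler", as the grep command above does; the exact process name of the handler is not given in this manual, and the sample lines are hypothetical.

```python
def handler_processes(ps_output):
    """Return the lines of `ps -ef` output that mention a handler process.

    Lines belonging to the grep command itself are ignored, because
    `ps -ef | grep handler` usually lists its own grep process.
    """
    matches = []
    for line in ps_output.splitlines():
        if "handler" in line and "grep" not in line:
            matches.append(line)
    return matches


# Hypothetical `ps -ef` output, for illustration only.
sample = (
    "operator  1234     1  0 10:00 ?      00:00:05 handler\n"
    "operator  5678  5600  0 10:05 pts/0  00:00:00 grep handler\n"
)
print(handler_processes(sample))
```

If the returned list is empty, the handler is most likely not running and the restart steps above apply. The text to pass in could be captured on daqman with subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout.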

Stopping the run

  1. The run stopped and I didn't do anything. Why?

    The various components of the system can each run into error conditions and request that the run be stopped. When this happens, the run stops automatically, and a message should be displayed on the screen naming the component that requested the stop along with the reason.

  2. I pressed "Stop Run" and it's taking forever. Why?

    The run doesn't get marked as complete until the data has been stored. There can be up to 1.5 GBytes of data in the Buffer Box's memory buffer that needs to be transferred, and this will take at least a minute.

  3. "Stop Run" is really taking forever. How can I tell if anything is happening at all?

    Go to the DAQ Monitoring display and check the "RCF Taper" node. If the number of events in is decreasing, then the data is still being written.

  4. I checked the DAQ Monitoring display, and nothing was changing. How can I kill the run and try again?

    Press "Stop Run" again. You will be informed that the run is already being stopped, and asked if you want to force the stop. If you force the system to stop, there will be some repercussions for the data. First, the unwritten data will be lost (this should only be the very end of the run). Second, some of the end-run database records might not be properly updated.
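The decision described in questions 2 to 4 (keep waiting or force the stop) can be sketched in Python as follows. This is only an illustration: it assumes the operator has noted a few successive readings of the event count shown for the "RCF Taper" node, and it treats 1 GByte as 1000 MBytes. The function names are hypothetical.

```python
def still_draining(samples):
    """True if the latest reading of the event count is lower than the first.

    `samples` is a list of successive readings, oldest first.
    """
    return len(samples) >= 2 and samples[-1] < samples[0]


def implied_max_rate_mb_per_s(buffer_gbytes=1.5, min_seconds=60.0):
    """Transfer rate implied by 'up to 1.5 GBytes takes at least a minute'.

    Assumes 1 GByte = 1000 MBytes, so a full buffer draining in no less
    than 60 seconds corresponds to at most 25 MBytes per second.
    """
    return buffer_gbytes * 1000.0 / min_seconds
```

For example, readings of [900, 700, 450] indicate that data is still being written, while [450, 450, 450] indicate a stall, which is the situation in which forcing the stop is worth considering.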


Jeff Landgraf
Last modified: Fri Jul 6 14:42:28 EDT 2001