Jeff Landgraf (Cell Phone xxx-xxxx:email jml@bnl.gov)
The STAR run control system is designed around a large number of competing requirements. In short, it is supposed to make STAR easy to use, while allowing reasonably complete control of the STAR Trigger, DAQ, L3, and detector systems.
The Run Control system controls only the parts of STAR that are actively set up from run to run: trigger, L3, and DAQ. The detector subsystems' slow controls are all separate from the Run Control and must be turned on before any data taking is started. When a detector or subdetector is referred to in the Run Control system, it is that detector's data acquisition computers that are actually being discussed. Although certain problems (e.g., not turning on the TPC FEEs) can be debugged using the Run Control system, for the most part a problem with one of the physical detectors cannot be discovered from the run control. One must resort to the alarms on the slow controls system for each detector and/or to the online monitoring software to determine whether they are working properly.
The Run Control system will usually be running on the computer RTS02. If it is not running, see Restarting the Run Control below.
The STAR Run Control main panel displays the state of the main subsystems at a glance. The boxes in the upper left hand corner refer to run-time and detector subsystems. The color indicates the state of the system:
There are three reasons that a system can be disconnected from the run control.
The PRESENT and READY states are the same for the purposes of the run control. The difference is that most systems are placed into the PRESENT state immediately after boot, while the READY state is reserved for systems that have been configured at least once. In either state the system is quiescent: its configuration may be changed, or the run can be started.
The RUNNING state means that the system is taking data. When the system is running you cannot change its configuration. The only meaningful actions are to STOP or PAUSE the run.
The PAUSED state is only meaningful for the trigger. It means that the trigger has temporarily stopped issuing new events, either because the operator has explicitly paused the run or because the trigger has finished sending all of the events that were requested. When the system is paused, the operator may either issue more events or stop the run.
Finally, the ERROR state means that something is wrong with one or more of the subsystems. This can arise from many different circumstances, some of which will be detailed below.
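The states and the operator actions described above can be summarized as a small lookup table. This is only an illustrative sketch: the action labels (configure, start, stop, pause, send_more_events) are names chosen here for readability and are not Run Control command names, and the text does not say which actions apply in the ERROR state, so that entry is left empty.

```python
from enum import Enum


class State(Enum):
    PRESENT = "present"
    READY = "ready"
    RUNNING = "running"
    PAUSED = "paused"
    ERROR = "error"


# Actions that make sense in each state, following the descriptions above.
# The labels are hypothetical; they are not Run Control command names.
ALLOWED_ACTIONS = {
    State.PRESENT: {"configure", "start"},
    State.READY: {"configure", "start"},
    State.RUNNING: {"stop", "pause"},
    State.PAUSED: {"send_more_events", "stop"},
    State.ERROR: set(),  # not specified in the text; left empty on purpose
}


def is_allowed(state, action):
    """Return True if the action is meaningful in the given state."""
    return action in ALLOWED_ACTIONS[state]
```

For example, is_allowed(State.RUNNING, "configure") is False, reflecting that the configuration cannot be changed while a run is in progress.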
In addition to the above states, it is also possible to have a box that is flashing. This means that the system is changing from one state to another. Depending on the specific conditions of the run, starting and stopping runs can take anywhere from one second to two or three minutes. If the system seems to be hung for an unusual amount of time, consult the debugging section of this document.
All of the systems have multiple components, so the color shown in the system box is a summary of the states of all of its components. The states of the individual components are displayed on the Show Component Tree screen. You might see some boxes flash odd colors (RED) for short amounts of time when starting or stopping runs. This is not a problem; it simply means that some components of the system transitioned to their final states at different times.
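One way to picture the summary is sketched below. The actual rule used by the Run Control is not stated in this document; the sketch assumes that a system shows its components' state when they all agree and shows ERROR (red) when they disagree, which would explain the brief red flashes during transitions. The function name and the state strings are hypothetical.

```python
def summarize(component_states):
    """Collapse the states of a system's components into one summary state.

    Assumed rule (not taken from the Run Control itself): if every
    component reports the same state, show that state; otherwise show ERROR.
    """
    distinct = set(component_states)
    if not distinct:
        raise ValueError("a system needs at least one component")
    if len(distinct) == 1:
        return distinct.pop()
    return "ERROR"


# All components have finished starting: the box shows RUNNING.
print(summarize(["RUNNING", "RUNNING", "RUNNING"]))
# One component is still READY while the others run: the box is briefly red.
print(summarize(["RUNNING", "READY", "RUNNING"]))
```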
In addition to the system boxes, there is also some text information displayed on the main run control panel. This includes the following information:
Finally, at the bottom of the main screen is a box in which messages to the operator may scroll by.
The first step in debugging any problem that becomes visible on the main screen is to determine the specific components that are having trouble. To do this, press the "Show Component Tree..." button on the main screen. You should get a screen that looks like:
By default, only the components that are in the run will be displayed. If you select the "Show all Nodes" box, then all possible components will be displayed. In this screen, the state of each component is reflected in the icon displayed on the left of the component. The color code is the same as for the system boxes on the main screen, although in this case, the state represents the actual state of the component rather than a summary of the states of all the components that are part of the system.
It is useful to understand the relationship between the states displayed on this screen and the states displayed for the same components on the DAQ Monitoring system. The states displayed on this screen reflect what the run control handler thinks the state of the system is. On the other hand, the DAQ Monitoring system displays the state as it comes directly from the component. While discrepancies between the two are possible for a correctly functioning system, they usually indicate a problem with the run control.
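The comparison between the two views can be sketched as follows. This is a hypothetical illustration only: neither the Run Control nor the DAQ Monitoring system is known to expose its states as dictionaries, and the component names are placeholders.

```python
def find_discrepancies(handler_view, monitor_view):
    """Return {component: (handler_state, monitor_state)} where the views differ.

    handler_view: states as the run control handler believes them to be.
    monitor_view: states as reported directly by each component.
    A component missing from one view is reported with None for that view.
    """
    mismatches = {}
    for name in sorted(set(handler_view) | set(monitor_view)):
        handler_state = handler_view.get(name)
        monitor_state = monitor_view.get(name)
        if handler_state != monitor_state:
            mismatches[name] = (handler_state, monitor_state)
    return mismatches


# Placeholder component names, for illustration only.
handler = {"component_a": "RUNNING", "component_b": "RUNNING"}
monitor = {"component_a": "RUNNING", "component_b": "READY"}
print(find_discrepancies(handler, monitor))
```

A mismatch that lasts only for a moment during a transition is expected; one that persists is the case that usually points at the run control.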
The run will automatically pause when all the triggers you requested have been issued. If you want to pause the run explicitly, follow these steps:
The run configuration for STAR is non-trivial. Each of the run-time systems (Trigger, DAQ, and L3) is complicated. The configuration of the trigger, at its lowest level, requires specifying which code is running in the DSM boards, documentation of the TCU inputs that result from this code, the mapping of these inputs to the generation of triggers, the setup of interspersed triggers, thresholds, and L2 and L3 algorithms and parameters. The DAQ system parameters describe various kinds of processing (cluster finding, pedestal calculation, zero suppression, gain correction, L3 processing, ASIC parameters, DATA output formats) which depend on the type of run as well as the detector and even subdetector.
For these reasons, shift operators are not expected to modify most parameters. Run Control enforces this by preventing shift users from changing them. However, all users are encouraged to view the settings. These parameters can be displayed by selecting the "Edit Configurations..." button on the main screen. They are also automatically saved to the run log database.
Some of these parameters are responsible for modifying the character of the data run. For instance, a centrality trigger has a completely different purpose from a peripheral trigger. In the same way, a pedestal run is processed in a completely different way than a physics run. However, there are many parameters set up by the run control that change the behaviour of the system without changing the character of the run. For example, a physics run with a central trigger has the same meaning even if one TPC receiver board is removed from the run. This also holds for entire detectors. A pedestal run for the TPC and for the SVT has the same character, even if the pedestal subtraction rule is slightly different for each detector. In addition, there are also parameters that should have no effect on the intended use of the data at all, but can have important ramifications for operations (e.g., debug options, magic tokens, data destinations).
The organization of the run control parameters was designed to minimize the unnecessary "outer-product" proliferation of configuration files, while retaining full control of parameters and a full history (through the databases) of runs. To do this, the configuration is broken up into three categories:
The components in the run are controlled by manipulations done on the main screen and in the "Component Tree" screen.
Run parameters control how the systems work without changing the character of the run. These parameters are not loaded with configuration file changes. Rather, they keep their values from one run to the next. The perfect example of this type of parameter is a field used to set logging levels for a subsystem.
These parameters define the behavior of the run. They are grouped together and encapsulated into configuration files.
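The split into three categories can be pictured with a small data structure. This is a minimal sketch under stated assumptions: the class, field, and key names are invented for illustration and do not correspond to the actual Run Control file format or database schema. The point it shows is that loading a configuration file replaces only the parameters that define the run, while the component list and the run parameters carry over unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class RunSetup:
    # Which components take part in the run (set from the main screen
    # and the Component Tree screen).
    components: set = field(default_factory=set)
    # Run parameters: persist from one run to the next.
    run_parameters: dict = field(default_factory=dict)
    # Parameters that define the character of the run (from a configuration file).
    configuration: dict = field(default_factory=dict)

    def load_configuration(self, file_contents):
        """Swap in a new configuration; the other two categories are untouched."""
        self.configuration = dict(file_contents)


# Hypothetical names and values, for illustration only.
setup = RunSetup(
    components={"component_a", "component_b"},
    run_parameters={"log_level": "debug"},
    configuration={"run_type": "physics"},
)
setup.load_configuration({"run_type": "pedestal"})
print(setup.run_parameters)  # still {'log_level': 'debug'}
```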
This could mean that somebody took all of the systems out of the run. However, most likely, the run control handler is not running. To check this:
The various components of the system can each run into error conditions and request a stop run. When this happens, the run will stop automatically, and a message should be displayed on the screen naming the component that requested the stop run along with the reason.
The run is not marked as complete until the data has been stored. There can be up to 1.5 GBytes of data in the Buffer Box's memory buffer that needs to be transferred, and this will take at least a minute.
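As a rough worked example, the two figures above can be turned into a waiting-time estimate. Taken at face value, a full 1.5 GByte buffer draining in one minute corresponds to about 25 MBytes/s. That rate is derived here from the numbers in the text and is an assumption, not a measured transfer speed; the function name is hypothetical.

```python
def drain_time_seconds(buffered_bytes, rate_bytes_per_second):
    """Estimate how long the buffered data takes to transfer at a given rate."""
    if rate_bytes_per_second <= 0:
        raise ValueError("transfer rate must be positive")
    return buffered_bytes / rate_bytes_per_second


FULL_BUFFER = 1.5e9   # 1.5 GBytes, the maximum quoted in the text
ASSUMED_RATE = 25e6   # 25 MBytes/s, implied by "1.5 GBytes in about a minute"

print(drain_time_seconds(FULL_BUFFER, ASSUMED_RATE))      # full buffer
print(drain_time_seconds(FULL_BUFFER / 3, ASSUMED_RATE))  # one third full
```

Since the text says "at least a minute", the real wait for a full buffer may be longer than this estimate.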
Go to the DAQ Monitoring display and check the "RCF Taper" node. If the number of events in is decreasing, then the data is still being written.
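The check described here amounts to reading the event count a few times and seeing whether it goes down. The sketch below assumes the readings are copied by hand from the display, since no programmatic interface to the DAQ Monitoring display is described in this document; the function name is hypothetical.

```python
def still_draining(readings):
    """Return True if the event count dropped between the first and last reading.

    readings: event counts noted from the display, oldest first.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings to see a trend")
    return readings[-1] < readings[0]


# Counts noted a few seconds apart (made-up numbers).
print(still_draining([1200, 950, 700]))  # data is still being written
print(still_draining([700, 700, 700]))   # no progress between readings
```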
Press stop run again. You will be informed that the run is already being stopped, and asked if you want to force the stop. If you force the system to stop, there will be some repercussions for the data. First, the unwritten data will be lost (this should only be the very end of the run). Second, some of the end-run database records might not be properly updated.