Requirements, Justification and Implementation of the New TCU Design.
E. G. Judd
02/22/02
The aim of this document is to explain and justify the re-design of the STAR Trigger Control Unit (TCU). There are four sections. The first is a list of requirements for the TCU functionality with their justification. Next there is a description of the original TCU implementation. This is followed by an explanation of where the original design did and did not meet each requirement and a list of additional problems that arose during the debugging and usage of this module. Finally there is a description of the proposed new implementation that should meet all the requirements and solve the additional problems.
TCU Requirements
Justification: All the DSM boards and TCD boards that provide input to the TCU are also 9U VME boards and this will enable it to fit easily into that system.
Justification: The DSM tree looks at data from the trigger detectors every tick of the RHIC clock, and presents it to the TCU, so the TCU must be able to receive it.
Justification: The last DSM board can produce 32 output bits. These are split into two identical groups, one to go to the TCU and the other to go to the Scaler board. The maximum size of either group is therefore 16 bits.
Justification: The TCU is only supposed to issue triggers to detectors when they are LIVE.
Justification: Again, the TCU is only supposed to issue triggers to detectors when they are LIVE, and if a trigger is issued to a detector at the end of one RHIC clock tick the TCU MUST know if that detector is supposed to be BUSY at the beginning of the very next tick.
Justification: The detectors do not want to be triggered if the triggered event will be contaminated by data from a preceding event that occurred so close in time to the triggered event that the two cannot be distinguished.
Justification: The user needs to decide whether or not to abort events that are followed too closely by another one.
8. Requirement: Decision - The TCU must issue triggers for events based on the information from the DSMs and which detectors are LIVE.
Justification: This is the only data that is available to the TCU on which to base a decision.
Justification: When a trigger is sent to the detectors it is also passed to the next level of the trigger system. That level is required to read out the DSM data that led to the trigger before it can be over-written. If there is no memory available to store the data, or no CPU available to initiate the reading, then the next trigger level will not be able to meet its requirements, so the TCU should not issue any triggers.
Justification: Some event types are very common and can just be sampled. Others are very rare and every one should be triggered.
Justification: The detectors are not all interested in every trigger, even if they are all LIVE.
Justification: The later levels of the trigger system and DAQ need this information in order to read the data from the DSMs and to know which non-trigger detectors should be read out.
Justification: Data is saved in the DSM circular input buffers for 7 ms after it is written. Data corresponding to events that were triggered must be read out within this time to make it possible to reconstruct why the event was triggered. If, for any reason, the next level of the trigger system does not manage to read out the data in time then the data would get over-written, and the reconstruction would not be possible.
Justification: The TCU makes its decisions based on a very simple analysis of the trigger detector data. The later levels of the trigger system have time to perform a more detailed analysis of this data. This analysis can show the event to be uninteresting, so there is no point in continuing to process it.
Justification: The detectors need to be able to determine when a trigger is being issued to them (either a new event, or an abort) and when they should do nothing.
Justification: If the TCU is LIVE the reasons for not issuing a trigger are that no interesting interaction occurred, the non-trigger detectors are BUSY or that this event is not part of the user-specified fraction for this type (Requirement 10). By counting how many events of every type occurred, and comparing that to the number of events that were triggered (counted elsewhere in the trigger system) it will be possible to calculate relative event rates (part of the cross-section calculation), monitor the dead-time of the non-trigger detectors and monitor the TCU itself to see that the user-specified fractions have been specified correctly.
17. Requirement: LIVE Monitor - The TCU must count the number of RHIC clock ticks when it issues triggers, when it is LIVE but does not issue a trigger and when it is not LIVE.
Justification: This will enable us to monitor the performance of the TCU itself.
Original TCU Implementation
The original TCU, see Figure 1, was a 9U VME board with a 6U back-of-the-crate interface card (TCUI). This interface card received the clock and control signals from the RCC system. It also latched in the DSM input bits and the BUSY/LIVE bits from the other detectors every tick of the clock. All these signals were passed through a VME J3 backplane to the TCU. The TCU sent triggers back to the TCUI, which then distributed them to the Trigger Clock Distribution modules (TCD) (Requirements 1 and 2).
The external BUSY/LIVE status bits were combined with internally generated BUSY bits in an FPGA. The internal BUSY bits were generated for 3 ticks of the RHIC clock for each detector whenever a trigger was issued to that detector (Requirement 5).
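The internal BUSY behaviour described above can be sketched in software as a per-detector countdown. This is only an illustrative model, not the FPGA logic: the class name `InternalBusy`, its method names and the bit convention (1 = BUSY) are assumptions made for the example.

```python
class InternalBusy:
    """Software model of a per-detector internal BUSY held for a fixed number of ticks."""

    def __init__(self, n_detectors, hold_ticks=3):
        self.hold_ticks = hold_ticks          # 3 RHIC clock ticks in the original TCU
        self.remaining = [0] * n_detectors    # ticks of internal BUSY still to run

    def combined(self, external_busy):
        """Combine external BUSY bits (1 = BUSY, assumed) with the internal BUSY."""
        return [int(bool(ext) or rem > 0)
                for ext, rem in zip(external_busy, self.remaining)]

    def end_of_tick(self, triggered):
        """Call once per tick with the list of detectors triggered on this tick."""
        for i, was_triggered in enumerate(triggered):
            if was_triggered:
                self.remaining[i] = self.hold_ticks   # start a new BUSY period
            elif self.remaining[i] > 0:
                self.remaining[i] -= 1                # BUSY period runs down


busy = InternalBusy(n_detectors=2)
history = []
busy.end_of_tick([1, 0])                  # trigger issued to detector 0 only
for _ in range(4):
    history.append(busy.combined([0, 0]))
    busy.end_of_tick([0, 0])
```

In this model detector 0 reads as BUSY for the three ticks following the trigger and LIVE on the fourth, while detector 1 is unaffected.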
The DSM and BUSY/LIVE bits were used as input to the trigger word look-up table. The output of this table was the trigger word that classified the event as being of a particular type. The map from the input bits to the trigger word was specified at the beginning of a run when the TCU was configured.
Figure 1. Block Diagram of the Original TCU.
The trigger word was used as input to the prescale system (Requirement 8). At the start of a run the user assigned a prescale value to every trigger word, and a counter was loaded with this value. A value of zero would mean that the trigger word was completely uninteresting and would never be triggered. During a run the prescale system looked at every trigger word. If the current value of the prescale counter for that trigger word was greater than 1 then the counter was decremented by 1 and the new value was saved until the next time. If the current value had counted down to 1 then a trigger flag was issued, to indicate the event should be accepted, and the counter was reset to its original value. If the current value was zero then no action was taken (Requirement 10). This prescale system was only activated if there were tokens in the token FIFO. Each token represented an available resource (buffer) in the later levels of the trigger system (Requirement 9).
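The prescale rules above can be written out as a short software model. It is a sketch under stated assumptions: the class name `Prescaler`, the dictionary storage and the treatment of unlisted trigger words as having a prescale value of zero are all illustrative choices, and the token FIFO is reduced to a single `tokens_available` flag.

```python
class Prescaler:
    """Software model of the per-trigger-word prescale counters."""

    def __init__(self, prescale_values):
        # prescale_values maps trigger word -> prescale value set at the start of a run
        self.initial = dict(prescale_values)
        self.current = dict(prescale_values)

    def trigger_flag(self, trigger_word, tokens_available):
        """Return True if this occurrence of trigger_word should be accepted."""
        if not tokens_available:
            return False                      # prescale system is not activated
        value = self.current.get(trigger_word, 0)   # unlisted word treated as 0 (assumption)
        if value == 0:
            return False                      # uninteresting: never triggered
        if value > 1:
            self.current[trigger_word] = value - 1  # count down and wait
            return False
        # value == 1: accept the event and reload the counter
        self.current[trigger_word] = self.initial[trigger_word]
        return True


prescaler = Prescaler({5: 3, 7: 0, 9: 1})
flags = [prescaler.trigger_flag(5, True) for _ in range(6)]
```

With a prescale value of 3, the model accepts every third occurrence of trigger word 5; a value of 1 accepts every occurrence and a value of 0 accepts none.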
The trigger word was also used as input to the action word look-up table (Requirement 8). The output of this look-up table was a bitmask indicating which detectors were to be triggered for this trigger word, and what type of action those detectors were supposed to take; normal readout, calibration readout, etc… (Requirement 11).
If the trigger flag was set then the action word and the next token were passed to the TCDs via the TCUI. This information was also written into output FIFOs along with a pointer to the DSM buffers, the trigger word, the input from the DSMs and the modified BUSY/LIVE bits (Requirement 12).
If the trigger flag was not set the TCU would select any commands that had been loaded into the Response FIFO, and distribute them to the TCDs and output FIFOs in the same way. A response could be an abort, or accept, command produced by a later level of the trigger system and loaded into the response FIFO for the TCU to distribute at an appropriate moment (Requirement 14).
If the trigger flag was not set, and there were no responses to issue, then the TCU sent zeros to the TCDs, and put nothing in the output FIFOs (Requirement 15).
In parallel with all of this there were two counters. One counted every combination of the input bits. The other counted every trigger word. Both counters were in fact composed of two identical sub-counters. One sub-counter would count until any one channel came close to overflowing. It would then set a bit, for L1CTL, to indicate it was in the “overflow” state, and the second sub-counter would automatically start counting. While the second sub-counter was counting the user could read and clear the first sub-counter, which resulted in the “overflow” status being reset. When the second sub-counter overflowed counting would resume on the first sub-counter (assuming it was not still in the “overflow” state), and the second one would be available to be read and cleared. The trigger word counter was implemented “gated LIVE”, i.e. it only counted when the TCU was running and there were tokens in the token FIFO (Requirement 16). At the time of writing this document, the input counter was implemented in hardware, but not in software.
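The alternating sub-counter scheme can be illustrated with the following model. It is a sketch only: the names `DualCounter`, `count` and `read_and_clear` and the `threshold` parameter (standing in for "close to overflowing") are assumptions. The behaviour when both sub-counters are in the "overflow" state is not specified in the description above, so the model simply reports that the count was lost.

```python
class DualCounter:
    """Software model of a counter built from two alternating sub-counters."""

    def __init__(self, n_channels, threshold):
        self.threshold = threshold                     # "close to overflowing" level (assumed)
        self.banks = [[0] * n_channels, [0] * n_channels]
        self.overflow = [False, False]                 # status bits visible to L1CTL
        self.active = 0                                # sub-counter currently counting

    def count(self, channel):
        """Count one occurrence; return False if both sub-counters are in overflow."""
        if self.overflow[self.active]:
            other = 1 - self.active
            if self.overflow[other]:
                return False                           # assumption: the count is lost
            self.active = other                        # switch to the free sub-counter
        bank = self.banks[self.active]
        bank[channel] += 1
        if bank[channel] >= self.threshold:
            self.overflow[self.active] = True
        return True

    def read_and_clear(self, index):
        """Read one sub-counter, zero it and reset its overflow status."""
        values = list(self.banks[index])
        self.banks[index] = [0] * len(values)
        self.overflow[index] = False
        return values


counter = DualCounter(n_channels=2, threshold=3)
for _ in range(3):
    counter.count(0)            # first sub-counter reaches its threshold
counter.count(1)                # counting moves to the second sub-counter
first_readout = counter.read_and_clear(0)
```

Reading and clearing the first sub-counter while the second is counting returns its contents and makes it available again for the next switch.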
Problems with the Original TCU Implementation
In summary, we feel the original design concept for the TCU was a good one. It allowed us to issue prescaled triggers based on data from the fast trigger detectors and the BUSY/LIVE state of the rest of the system. The module was fast enough to make the trigger decision within one RHIC clock tick and flexible enough to allow the user fine control over exactly what conditions were triggered on and what triggers were issued. However, there were some problems in the details of the implementation that meant some of the functional requirements were not met; most importantly, it was not possible to accommodate all the input bits from the DSM board tree and the detectors. Also, the board was unnecessarily hard to debug and monitor.
New TCU Implementation
The new TCU will be similar in overall data flow to the original TCU. It is mainly the input section that will be changed (see Figure 2). The bottleneck in the original design was the 18-bit maximum input to the trigger word look-up table. Since this look-up table was right at the input to the board, this effectively limited the total number of input bits to the original TCU. In the five years since the original TCU was designed this bottleneck has eased only slightly, with the advent of 20-bit memory chips. 20 bits is still not enough to allow for the maximum 16 bits from the DSM board tree and the 6 detectors that have been available so far. In the 2002 run we expect to have at least 11 detectors. In order to accommodate all these input bits, and leave room for future expansion, it is proposed that, in the new TCU design, we implement a configurable 28-to-20-bit MUX. Initially the MUX will be configured to select 14 detector bits and 6 bits from the DSM path. In the future, if necessary, the MUX could be reconfigured to select a different combination of detector and DSM bits. A new look-up table will be added to compress the 16 DSM input bits down to the number of bits that the MUX is configured to select.
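The idea of a configurable bit-selecting MUX can be sketched as follows. The function name `mux_select` and, in particular, the layout of the 28 input bits (detector bits in positions 0 to 15, DSM-path bits in positions 16 to 27) are assumptions made purely for illustration; the actual assignment of the 28 inputs is a hardware design detail that this sketch does not define.

```python
N_INPUTS = 28    # MUX input width
N_OUTPUTS = 20   # MUX output width (address width of the trigger word LUT)


def mux_select(input_bits, selection):
    """Route selected input bits to the output word.

    selection[i] is the index of the input bit that drives output bit i.
    """
    if len(selection) != N_OUTPUTS:
        raise ValueError("selection must name exactly 20 input bits")
    output = 0
    for out_pos, in_pos in enumerate(selection):
        if not 0 <= in_pos < N_INPUTS:
            raise ValueError("input bit index out of range")
        output |= ((input_bits >> in_pos) & 1) << out_pos
    return output


# Assumed default: 14 detector bits (inputs 0-13) and 6 DSM-path bits (inputs 16-21).
DEFAULT_SELECTION = list(range(14)) + list(range(16, 22))

word = mux_select((1 << 0) | (1 << 16), DEFAULT_SELECTION)
```

Reconfiguring the MUX for a different mix of detector and DSM bits then amounts to loading a different `selection` list.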
The processing of the input bits will be split into two pieces. 16 bits from the DSM board tree (Requirement 3) will be input to a Physics Word look-up table that will, by default, produce a 6-bit physics word to classify the event type. A 6-bit physics word has 63 non-zero combinations, which is enough to allow each of the detectors a few special triggers per run and still leave room for the physics events. The LUT will also produce a one-bit contamination flag for a set of user-selected combinations of the 16 input bits. This flag will indicate that the trigger detectors saw enough particles to contaminate any close event. Separately, 16 detector LIVE/BUSY bits will go to an FPGA (Requirement 4). This FPGA will also receive the list of detectors that are included whenever a trigger is issued. For user-specified detectors, the FPGA will use the “detector-triggered” bit to generate an internal BUSY state for that detector (Requirement 5). This state will be long enough to cover the time it takes for the trigger to reach that detector, and for it to set its LIVE/BUSY bit to the BUSY state. For each detector the external LIVE/BUSY state and the internal BUSY state will be combined.
Figure 2. Block Diagram of the New TCU
The “preceded” logic will also be implemented in the FPGA (Requirement 6). Inside the FPGA the contamination bit will be used to generate a “Contamination BUSY state” for each of the detectors starting on the next RHIC clock tick. The length of this BUSY state will be configurable for each detector. For certain, user-specified, detectors that contamination BUSY state will then be combined with the external LIVE/BUSY status provided by the detectors themselves and the internal BUSY state (generated by the FPGA whenever a trigger is issued to a detector) to produce the final LIVE/BUSY state. Since the TCU only issues triggers to detectors that are LIVE, the contamination BUSY state will prevent triggers being issued to detectors when they are contaminated with data from a previous interaction.
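The way the three BUSY sources combine into the final LIVE/BUSY state can be sketched as below. This is a software illustration, not the planned FPGA logic: the class name `ContaminationBusy`, the bit convention (1 = BUSY) and the choice to restart the BUSY period if the contamination bit is set again while it is still running are all assumptions.

```python
class ContaminationBusy:
    """Software model of the per-detector contamination BUSY state."""

    def __init__(self, lengths, enabled):
        self.lengths = list(lengths)    # configurable BUSY length in ticks, per detector
        self.enabled = list(enabled)    # True for the user-specified detectors that use it
        self.remaining = [0] * len(self.lengths)

    def final_busy(self, external_busy, internal_busy):
        """Final BUSY bits for the current tick (1 = BUSY, assumed convention)."""
        return [
            int(bool(ext) or bool(internal) or (en and rem > 0))
            for ext, internal, en, rem in zip(
                external_busy, internal_busy, self.enabled, self.remaining)
        ]

    def end_of_tick(self, contamination_bit):
        """Update at the end of a tick, so a set bit takes effect on the next tick."""
        for i, length in enumerate(self.lengths):
            if contamination_bit:
                self.remaining[i] = length    # (re)start the BUSY period (assumption)
            elif self.remaining[i] > 0:
                self.remaining[i] -= 1


logic = ContaminationBusy(lengths=[2, 1], enabled=[True, False])
states = [logic.final_busy([0, 0], [0, 0])]
logic.end_of_tick(contamination_bit=True)
for _ in range(3):
    states.append(logic.final_busy([0, 0], [0, 0]))
    logic.end_of_tick(contamination_bit=False)
```

Here detector 0 is held BUSY for its configured two ticks after the contaminating event, while detector 1, which is not selected to use the contamination BUSY state, stays LIVE.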
The contamination bit from the physics word look-up table can also be used to flag triggered events that are contaminated by a following one (Requirement 7). When a trigger is issued to a detector the FPGA can generate an “event protection” signal for each detector, lasting for as long as the detector needs to be protected. If the contamination bit gets set while the event protection is in place a bit can be set in a register that L1CTL can then read. In this way L1CTL will know that the event may be contaminated by a following event and can decide whether or not to abort it. It should be noted that, in this planned implementation, the followed logic has a hole in it for fast detectors. Only one register is planned, with one bit for each detector, so when the user reads the register there is nothing to specify exactly which event the information applies to. For slow detectors (TPC, SVT, etc…) there is only one event in the trigger system at any one time so the register information obviously applies to that event. However, for fast detectors (e.g. EMC), where there can be multiple events in the trigger system simultaneously, it is not possible to determine which event the register information applies to. In order to solve this ambiguity there would either have to be one register for every token (4095 registers!) or the fast detectors would have to be slowed down. Since neither of these options is really desirable the current plan is to use the single register described above, with the understanding that the information is only supposed to be used by the slow detectors. Since the fast detectors are, by definition, BUSY for a very short period of time, the probability of a triggered event being followed is correspondingly very small, so hopefully the lack of “followed” event logic should not be a problem.
The final LIVE/BUSY states of the detectors, and the output of the physics word look-up table, will then be combined in the MUX and used as input to the trigger word LUT. The default configuration of the MUX will be to select 6 bits from the physics word LUT and 14 LIVE/BUSY states. The output of the trigger word LUT will be the same 16-bit trigger word as is currently implemented on the existing TCU. This 16-bit trigger word will then be used as input to the prescale system and the action word look-up table (Requirement 8), again exactly as is implemented on the current TCU. The output of the action word look-up table will now be 24 bits containing a 16-bit bitmask, indicating which detectors are to be triggered for this trigger word, and 4-bit trigger and DAQ commands indicating what type of action those detectors are supposed to take; normal readout, calibration readout, etc… (Requirement 11).
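The 24-bit action word (16 + 4 + 4 bits) can be illustrated with a simple pack/unpack pair. The field order used here (detector bitmask in bits 0 to 15, trigger command in bits 16 to 19, DAQ command in bits 20 to 23) and the function names are assumptions for the example; only the field widths come from the description above.

```python
def pack_action_word(detector_mask, trigger_cmd, daq_cmd):
    """Pack the three fields into a 24-bit action word (assumed bit layout)."""
    if not 0 <= detector_mask < (1 << 16):
        raise ValueError("detector bitmask must fit in 16 bits")
    if not 0 <= trigger_cmd < (1 << 4):
        raise ValueError("trigger command must fit in 4 bits")
    if not 0 <= daq_cmd < (1 << 4):
        raise ValueError("DAQ command must fit in 4 bits")
    return detector_mask | (trigger_cmd << 16) | (daq_cmd << 20)


def unpack_action_word(word):
    """Split a 24-bit action word back into (detector_mask, trigger_cmd, daq_cmd)."""
    return word & 0xFFFF, (word >> 16) & 0xF, (word >> 20) & 0xF


# Example: trigger detectors 0 and 15 with hypothetical command codes 0x4 and 0xA.
action_word = pack_action_word(0x8001, 0x4, 0xA)
fields = unpack_action_word(action_word)
```

The command codes in the example are placeholders; the real meanings of the 4-bit trigger and DAQ commands are defined elsewhere in the trigger system.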
As before, the trigger flag will be used to decide if the data that is passed to the TCDs, and stored in the output FIFOs, should be the action word and token, any response from the Response FIFO or nothing (Requirements 12, 14 and 15). In order to avoid modifying the TCD hardware, but still make room for the increased number of bits in the detector bitmask (16 rather than 8), and the increased number of detector LIVE/BUSY bits (also 16 rather than 8) these bits will be placed on a cable that used to be used for passing the trigger word to the TCD modules. They never used the trigger word, so the bits are available to be re-defined. In order to fully understand why an event was triggered, more output FIFOs will be added. They will now save all the LIVE/BUSY status components for all the detectors as well as the DSM input bits, the physics word and the trigger word.
The monitor and halt logic will be implemented using dual ported static RAM instead of just static RAM (Requirement 13). This will allow one port to be used by the TCU, every tick of the RHIC clock, for tagging new events as they are triggered, and monitoring old events. The other port can be used, totally asynchronously, by L1CTL to un-tag events that have been fully processed. This will remove the requirement of the old design, of allowing 3 accesses (1 read and 2 writes) to the static RAM every tick of the clock, without significantly increasing the control logic complexity. If the TCU ever detects that data from a triggered event is about to be over-written it will issue a “halt” command. The command will go to the RCC module, which will distribute it back to all the DSMs and the TCU at the same time. L1CTL will also be notified that the “halt” command has been issued. When they receive this command the DSMs will stop moving through their circular input buffers and just halt where they are. The TCU will suspend the prescale system, so it will stop issuing triggers. L1CTL will then have time to read out the DSM data for all currently triggered events. When it has completely finished, L1CTL will notify the TCU, which will then remove the “halt” command and data taking will continue.
A new counter will be added to the new TCU (Requirement 17) to monitor its performance. The input to this counter will be the TCU LIVE bits (the token FIFO empty bit and the halt bit), the physics word and contamination bit from the physics word look-up table and the trigger flag.
The following changes will also be made to solve the remaining implementation problems on the original TCU: