
Requirements, Justification and Implementation of the New TCU Design.

E. G. Judd

02/22/02

The aim of this document is to explain and justify the re-design of the STAR Trigger Control Unit (TCU). There are four sections. The first is a list of requirements for the TCU functionality, with their justification. Next there is a description of the original TCU implementation. This is followed by an explanation of where the original design did and did not meet each requirement, and a list of additional problems that arose during the debugging and usage of this module. Finally there is a description of the proposed new implementation that should meet all the requirements and solve the additional problems.

TCU Requirements

1. Requirement: Format - The TCU must be a 9U VME board.

Justification: All the DSM boards and TCD boards that provide input to the TCU are also 9U VME boards and this will enable it to fit easily into that system.

2. Requirement: Speed - The TCU must accept new input data every tick of the RHIC clock.

Justification: The DSM tree looks at data from the trigger detectors every tick of the RHIC clock, and presents it to the TCU, so the TCU must be able to receive it.

3. Requirement: DSM Data - The TCU must accept all the bits that the DSM tree can provide. Currently this is 16 bits.

Justification: The last DSM board can produce 32 output bits. These are split into two identical groups, one to go to the TCU and the other to go to the Scaler board. The maximum size of either group is therefore 16 bits.

4. Requirement: Detector Status - The TCU must accept a bit from each of the non-trigger detector systems indicating that system’s LIVE/BUSY status.

Justification: The TCU is only supposed to issue triggers to detectors when they are LIVE.

5. Requirement: Internal BUSY - The TCU must generate a short internal BUSY state for user-specified detectors to cover the time between when the TCU issues a trigger to those detectors and when they send their BUSY states back to the TCU.

Justification: Again, the TCU is only supposed to issue triggers to detectors when they are LIVE, and if a trigger is issued to a detector at the end of one RHIC clock tick the TCU MUST know if that detector is supposed to be BUSY at the beginning of the very next tick.

6. Requirement: Preceded - The TCU must also be able to disable a detector (i.e. make it artificially BUSY) for a user-specified amount of time if the trigger detectors indicate the presence of contaminating data in the system.

Justification: The detectors do not want to be triggered if the triggered event will be contaminated by data from a preceding event that occurred so close in time to the triggered event that the two cannot be distinguished.

7. Requirement: Followed - If the trigger detectors indicate that a triggered event has been followed by another one within a user-specified amount of time then that information must be saved and made available to the outside world.

Justification: The user needs to decide whether or not to abort events that are followed too closely by another one.

8. Requirement: Decision - The TCU must issue triggers for events based on the information from the DSMs and which detectors are LIVE.

Justification: This is the only data that is available to the TCU on which to base a decision.

9. Requirement: Downstream Resources - The TCU should only issue a trigger if there are resources available in the rest of the trigger system to deal with it.

Justification: When a trigger is sent to the detectors it is also passed to the next level of the trigger system. That level is required to read out the DSM data that led to the trigger before it can be over-written. If there is no memory available to store the data, or no CPU available to initiate the reading, then the next trigger level will not be able to meet its requirements, so the TCU should not issue any triggers.

10. Requirement: Prescale - The TCU must be capable of triggering on a user-specified fraction of events of a given type.

Justification: Some event types are very common and can just be sampled. Others are very rare and every one should be triggered.

11. Requirement: Detector Selection - The TCU must be able to issue triggers to any subset of the existing detectors.

Justification: The detectors are not all interested in every trigger, even if they are all LIVE.

12. Requirement: Notification - When the TCU issues a trigger to any detector it must also make enough data available to the rest of the trigger system and DAQ so they know where to get the DSM data that led to the event, which detectors were triggered and what the event type was.

Justification: The later levels of the trigger system and DAQ need this information in order to read the data from the DSMs and to know which non-trigger detectors should be read out.

13. Requirement: Halt - The TCU must monitor the location, in the DSMs, of data for triggered events. If this data has not been read out by the time it is due to be over-written, then the TCU must halt the saving of new data in the DSMs, and halt issuing triggers, until the situation has been fixed.

Justification: Data is saved in the DSM circular input buffers for 7 ms after it is written. Data corresponding to events that were triggered must be read out within this time to make it possible to reconstruct why the event was triggered. If, for any reason, the next level of the trigger system does not manage to read out the data in time then the data would get over-written, and the reconstruction would not be possible.

14. Requirement: Abort - A mechanism must be provided to allow the later levels of the trigger system to abort an event that has previously been triggered.

Justification: The TCU makes its decisions based on a very simple analysis of the trigger detector data. The later levels of the trigger system have time to perform a more detailed analysis of this data. This analysis can show the event to be uninteresting, so there is no point in continuing to process it.

15. Requirement: Null Event - If the current event is not triggered, and there are no aborts to be issued, the TCU must actively send zeros to all the non-trigger detectors.

Justification: The detectors need to be able to determine when a trigger is being issued to them (either a new event, or an abort) and when they should do nothing.

16. Requirement: Event Rate Monitor - The TCU must count how many events of each type it sees when it is LIVE (i.e. able to issue triggers) irrespective of whether or not a trigger is actually issued.

Justification: If the TCU is LIVE the reasons for not issuing a trigger are that no interesting interaction occurred, the non-trigger detectors are BUSY or that this event is not part of the user-specified fraction for this type (Requirement 10). By counting how many events of every type occurred, and comparing that to the number of events that were triggered (counted elsewhere in the trigger system) it will be possible to calculate relative event rates (part of the cross-section calculation), monitor the dead-time of the non-trigger detectors and monitor the TCU itself to see that the user-specified fractions have been specified correctly.

17. Requirement: LIVE Monitor - The TCU must count the number of RHIC clock ticks when it issues triggers, when it is LIVE but does not issue a trigger and when it is not LIVE.

Justification: This will enable us to monitor the performance of the TCU itself.

Original TCU Implementation

The original TCU, see Figure 1, was a 9U VME board with a 6U back-of-the-crate interface card (TCUI). This interface card received the clock and control signals from the RCC system. It also latched in the DSM input bits and the BUSY/LIVE bits from the other detectors every tick of the clock. All these signals were passed through a VME J3 backplane to the TCU. The TCU sent triggers back to the TCUI, which then distributed them to the Trigger Clock Distribution modules (TCD) (Requirements 1 and 2).

The external BUSY/LIVE status bits were combined with internally generated BUSY bits in an FPGA. The internal BUSY bits were generated for 3 ticks of the RHIC clock for each detector whenever a trigger was issued to that detector (Requirement 5).

The DSM and BUSY/LIVE bits were used as input to the trigger word look-up table. The output of this table was the trigger word that classified the event as being of a particular type. The map from the input bits to the trigger word was specified at the beginning of a run when the TCU was configured.

Figure 1. Block Diagram of the Original TCU.

The trigger word was used as input to the prescale system (Requirement 8). At the start of a run the user assigned a prescale value to every trigger word, and a counter was loaded with this value. A value of zero would mean that the trigger word was completely uninteresting and would never be triggered. During a run the prescale system looked at every trigger word. If the current value of the prescale counter for that trigger word was greater than 1 then the counter was decremented by 1 and the new value was saved until the next time. If the current value had counted down to 1 then a trigger flag was issued, to indicate the event should be accepted, and the counter was reset to its original value. If the current value was zero then no action was taken (Requirement 10). This prescale system was only activated if there were tokens in the token FIFO. Each token represented an available resource (buffer) in the later levels of the trigger system (Requirement 9).
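The countdown behavior described above can be sketched in a few lines of Python. This is only a software model of the logic, not the FPGA implementation; the class name Prescaler and the use of a dictionary keyed by trigger word are assumptions made for illustration.

```python
class Prescaler:
    """Software model of the per-trigger-word countdown prescaler (illustrative only)."""

    def __init__(self, prescale_values):
        # prescale_values maps trigger word -> prescale value (hypothetical layout).
        self.reload = dict(prescale_values)  # values assigned at the start of a run
        self.count = dict(prescale_values)   # current countdown per trigger word

    def tick(self, trigger_word, tokens_available):
        """Return True if a trigger flag is issued for this trigger word."""
        if not tokens_available:
            # The prescale system is only active when the token FIFO is not empty.
            return False
        current = self.count.get(trigger_word, 0)
        if current == 0:
            # A value of zero means this trigger word is never triggered.
            return False
        if current > 1:
            # Count down and save the new value until the next occurrence.
            self.count[trigger_word] = current - 1
            return False
        # The counter has reached 1: issue the flag and reload the original value.
        self.count[trigger_word] = self.reload[trigger_word]
        return True
```

With a prescale value of 3, for example, every third occurrence of that trigger word raises the trigger flag, while a value of 1 accepts every occurrence.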

The trigger word was also used as input to the action word look-up table (Requirement 8). The output of this look-up table was a bitmask indicating which detectors were to be triggered for this trigger word, and what type of action those detectors were supposed to take: normal readout, calibration readout, etc. (Requirement 11).

If the trigger flag was set then the action word and the next token were passed to the TCDs via the TCUI. This information was also written into output FIFOs along with a pointer to the DSM buffers, the trigger word, the input from the DSMs and the modified BUSY/LIVE bits (Requirement 12).

If the trigger flag was not set the TCU would select any commands that had been loaded into the Response FIFO, and distribute them to the TCDs and output FIFOs in the same way. A response could be an abort, or accept, command produced by a later level of the trigger system and loaded into the response FIFO for the TCU to distribute at an appropriate moment (Requirement 14).

If the trigger flag was not set, and there were no responses to issue, then the TCU sent zeros to the TCDs, and put nothing in the output FIFOs (Requirement 15).
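The three-way output selection described in the last three paragraphs can be summarized by the following sketch. The function name select_output, the tuple return values and the use of Python deques to stand in for the hardware FIFOs are all assumptions made for illustration.

```python
from collections import deque


def select_output(trigger_flag, action_word, token_fifo, response_fifo):
    """Model of what the TCU drives to the TCDs on one RHIC clock tick.

    token_fifo and response_fifo are deques standing in for the hardware FIFOs.
    """
    if trigger_flag:
        # The trigger flag is only set when the token FIFO is not empty,
        # so taking the next token here is safe.
        return ("trigger", action_word, token_fifo.popleft())
    if response_fifo:
        # No new trigger: distribute a pending abort/accept response instead.
        return ("response", response_fifo.popleft())
    # Nothing to issue: actively send zeros (null event).
    return ("null", 0)
```

Note that a new trigger takes precedence over a pending response, which simply waits in the response FIFO for the next tick on which no trigger is issued.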

In parallel with all of this there were two counters. One counted every combination of the input bits. The other counted every trigger word. Both counters were in fact composed of two identical sub-counters. One sub-counter would count until any one channel came close to overflowing. It would then set a bit, for L1CTL, to indicate it was in the “overflow” state, and the second sub-counter would automatically start counting. While the second sub-counter was counting the user could read and clear the first sub-counter, which resulted in the “overflow” status being reset. When the second sub-counter overflowed counting would resume on the first sub-counter (assuming it was not still in the “overflow” state), and the second one would be available to be read and cleared. The trigger word counter was implemented “gated LIVE”, i.e. it only counted when the TCU was running and there were tokens in the token FIFO (Requirement 16). At the time of writing this document, the input counter was implemented in hardware, but not in software.
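The alternating sub-counter scheme can be modeled as below. This is a sketch under stated assumptions: the class name PingPongCounter, the limit parameter (standing in for "close to overflowing") and the decision to drop counts when both sub-counters are in the "overflow" state are not taken from the original design.

```python
class PingPongCounter:
    """Two identical sub-counters that take turns counting (illustrative model)."""

    def __init__(self, n_channels, limit):
        self.banks = [[0] * n_channels, [0] * n_channels]
        self.overflow = [False, False]  # "overflow" status bit per sub-counter
        self.active = 0                 # index of the sub-counter currently counting
        self.limit = limit              # assumed threshold for "close to overflowing"

    def count(self, channel):
        """Count one occurrence; return False if the count had to be dropped."""
        if self.overflow[self.active]:
            other = 1 - self.active
            if self.overflow[other]:
                # Assumption: with both sub-counters full, the count is lost.
                return False
            self.active = other  # switch to the sub-counter that is free
        bank = self.banks[self.active]
        bank[channel] += 1
        if bank[channel] >= self.limit:
            self.overflow[self.active] = True
        return True

    def read_and_clear(self, which):
        """Read out one sub-counter and reset it, clearing its "overflow" status."""
        data = list(self.banks[which])
        self.banks[which] = [0] * len(data)
        self.overflow[which] = False
        return data
```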

Problems with the Original TCU Implementation

1. All the input bits, both from the DSM tree and the detectors, went into the Trigger Word look-up table. This was implemented in fast static RAM, and the largest table that could be constructed (in 1996, when this board was designed) had 18 input (address) bits. These bits had to be split between the DSM input bits and the detector bits. This meant it was not possible to accept all 16 bits from the last DSM as well as the desired number of detector status bits (originally 8) (Requirements 3 and 4).
2. The internal BUSY state, 3 RHIC clock ticks long, was generated automatically for every detector, irrespective of whether or not they needed this feature. It was not a user-selectable option (Requirement 5).
3. The information saying that contaminating data was in the system was not passed to the necessary part of the TCU, so it was not possible to disable a detector in this situation (Requirements 6 and 7).
4. The logic needed to implement the “monitor and halt” functionality would not fit in the FPGA and the memory chosen (along with everything else), so it was never implemented (Requirement 13).
5. Neither the Input Counters nor the Trigger Word Counters included the TCU LIVE information in the set of bits that were counted, so it was not possible to monitor the TCU performance (Requirement 17).
6. There was no direct path to reset the “overflow” state of the trigger word counter sub-counters, so this had to be implemented in some unnecessarily complex logic that involved counting the numbers of “reads” and “clears”.
7. Data going on to and off the TCU was actually latched on the TCUI. This meant there was less than one tick of the clock available for the TCU to do its work because time had to be allowed for the TCUI.
8. The input memory on the TCU could only hold 256 sets of input data and this was not really enough to robustly test the board. It was also not possible to reverse the drivers and record the input data in the input memory. This feature was found to be extremely useful on the DSM boards.
9. The “token FIFO empty” signal should have gone directly to the prescale FPGA, which uses it to enable and disable issuing triggers, and it did not. It was routed through the Operation Control FPGA, which slowed it down and made the timing difficult.
10. The prescale system had an unnecessary latency on startup (i.e. when tokens were added to the token FIFO) that made the TCU behavior difficult to understand when dealing with single tokens.
11. Space and logic were used to partially implement a priority queue. This involved a “priority” token FIFO and “priority” information FIFOs. Events labeled as “priority” events would always be processed first by later levels of the trigger system. Unfortunately the two queues were not implemented totally separately. They shared some signals in the prescale system and Operation Control FPGA, and this actually prevented both queues being fully implemented.
12. The “token FIFO empty” signal was used, redundantly, in deciding which output to select: action word and token, aborts or nothing. That signal was already used by the prescale system in deciding whether or not to issue a trigger flag, and the trigger flag was itself also used in deciding which output to select. This made the decision logic unnecessarily complicated.
13. There was no output memory on the TCU, just output FIFOs that were only filled if an event was actually triggered. This made debugging rather hard.
14. The signals from the RHIC Clock and Control system were received by the TCU in a very different way from the DSMs. This made keeping the TCU in lockstep with the DSMs unnecessarily complicated.
15. The VME interface was implemented by hand, which took a lot of time and produced an interface that was complex and hard to upgrade.
16. The TCU was not compatible with the VME64 transactions that were used by the DSMs. In fact, it disrupted any such transactions occurring in the same crate. This had to be fixed by disconnecting some chips and adding wires to the board.
17. There was no way, without stopping the board and changing the configuration, to tell what the current LIVE/BUSY status of each detector was.
18. There was a twist in the communications bus between the TCU and the Trigger Clock Distribution boards (TCDs) that made debugging unnecessarily complex.
19. There were no monitoring points on the TCU that were accessible when the board was installed in a VME crate with DSM boards. This made it hard to debug problems in the DSM-TCU system as a whole.

In summary, we feel the original design concept for the TCU was a good one. It allowed us to issue prescaled triggers based on data from the fast trigger detectors and the BUSY/LIVE state of the rest of the system. The module was fast enough to make the trigger decision within one RHIC clock tick and flexible enough to allow the user fine control over exactly what conditions were triggered on and what triggers were issued. However, there were some problems in the details of the implementation that meant some of the functional requirements were not met; most importantly, it was not possible to accommodate all the input bits from the DSM board tree and the detectors. Also, the board was unnecessarily hard to debug and monitor.

New TCU Implementation

The new TCU will be similar in overall data flow to the original TCU. It is mainly the input section that will be changed (see Figure 2). The bottleneck in the original design was the 18-bit maximum input to the trigger word look-up table. Since this look-up table was right at the input to the board, this effectively limited the total number of input bits to the original TCU. In the five years since the original TCU was designed this bottleneck has eased only slightly, with the advent of 20-bit memory chips. 20 bits is still not enough to allow for the maximum 16 bits from the DSM board tree and the 6 detectors that have been available so far. In the 2002 run we expect to have at least 11 detectors. In order to accommodate all these input bits, and leave room for future expansion, it is proposed that, in the new TCU design, we implement a configurable 28-to-20-bit MUX. Initially the MUX will be configured to select 14 detector bits and 6 bits from the DSM path. In the future, if necessary, the MUX could be reconfigured to select a different combination of detector and DSM bits. A new look-up table will be added to compress the 16 DSM input bits down to the number of bits that the MUX is configured to select.
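The idea of a configurable bit-selecting MUX can be illustrated with the short sketch below. This document does not specify how the 28 MUX inputs are assigned, so the input bit positions used in DEFAULT_SELECTION (detector bits in input positions 0-13, DSM-path bits in positions 16-21) are purely hypothetical, as are the names mux_select and DEFAULT_SELECTION.

```python
def mux_select(input_word, selection):
    """Build an output word whose bit i is a copy of input bit selection[i]."""
    out = 0
    for out_bit, in_bit in enumerate(selection):
        out |= ((input_word >> in_bit) & 1) << out_bit
    return out


# Hypothetical default configuration: 14 detector bits followed by 6 DSM-path bits,
# giving the 20 address bits of the trigger word look-up table.
DEFAULT_SELECTION = list(range(0, 14)) + list(range(16, 22))
```

Reconfiguring the MUX then amounts to loading a different 20-entry selection list, for example one with fewer detector bits and more bits from the DSM path.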

The processing of the input bits will be split into two pieces. 16 bits from the DSM board tree (Requirement 3) will be input to a Physics Word look-up table that will, by default, produce a 6-bit physics word to classify the event type. A 6-bit physics word has 63 non-zero combinations, which is enough to allow each of the detectors a few special triggers per run and still leave room for the physics events. The LUT will also produce a one-bit contamination flag for a set of user-selected combinations of the 16 input bits. This flag will indicate that the trigger detectors saw enough particles to contaminate any close event. Separately, 16 detector LIVE/BUSY bits will go to an FPGA (Requirement 4). This FPGA will also receive the list of detectors that are included whenever a trigger is issued. For user-specified detectors, the FPGA will use the “detector-triggered” bit to generate an internal BUSY state for that detector (Requirement 5). This state will be long enough to cover the time it takes for the trigger to reach that detector, and for it to set its LIVE/BUSY bit to the BUSY state. For each detector the external LIVE/BUSY state and the internal BUSY state will be combined.

Figure 2. Block Diagram of the New TCU

The “preceded” logic will also be implemented in the FPGA (Requirement 6). Inside the FPGA the contamination bit will be used to generate a “Contamination BUSY state” for each of the detectors starting on the next RHIC clock tick. The length of this BUSY state will be configurable for each detector. For certain, user-specified, detectors that contamination BUSY state will then be combined with the external LIVE/BUSY status provided by the detectors themselves and the internal BUSY state (generated by the FPGA whenever a trigger is issued to a detector) to produce the final LIVE/BUSY state. Since the TCU only issues triggers to detectors that are LIVE, the contamination BUSY state will prevent triggers being issued to detectors when they are contaminated with data from a previous interaction.
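The way the three BUSY components combine for a single detector can be sketched as follows. This is a behavioral model only; the class name DetectorBusy, the parameter names and the choice to have the internal BUSY state also start on the tick after the trigger are assumptions made for illustration.

```python
class DetectorBusy:
    """Model of the final LIVE/BUSY state of one detector (illustrative only)."""

    def __init__(self, internal_len, contam_len, use_internal=True, use_contam=True):
        self.internal_len = internal_len  # ticks of internal BUSY after a trigger
        self.contam_len = contam_len      # ticks of contamination BUSY, per detector
        self.use_internal = use_internal  # user-specified: apply internal BUSY?
        self.use_contam = use_contam      # user-specified: apply contamination BUSY?
        self.internal_left = 0
        self.contam_left = 0

    def tick(self, external_busy, triggered, contamination):
        """Advance one RHIC clock tick and return the final BUSY state for it.

        A trigger or contamination bit seen on this tick takes effect on the next one.
        """
        busy = (external_busy
                or (self.use_internal and self.internal_left > 0)
                or (self.use_contam and self.contam_left > 0))
        # Update the countdowns that will apply from the next tick onwards.
        if triggered:
            self.internal_left = self.internal_len
        else:
            self.internal_left = max(0, self.internal_left - 1)
        if contamination:
            self.contam_left = self.contam_len
        else:
            self.contam_left = max(0, self.contam_left - 1)
        return busy
```

Setting use_contam to False for a detector models a detector for which the user has not selected the “preceded” protection.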

The contamination bit from the physics word look-up table can also be used to flag triggered events that are contaminated by a following one (Requirement 7). When a trigger is issued to a detector the FPGA can generate an “event protection” signal for each detector, lasting for as long as the detector needs to be protected. If the contamination bit gets set while the event protection is in place, a bit can be set in a register that L1CTL can then read. In this way L1CTL will know that the event may be contaminated by a following event and can decide whether or not to abort it. It should be noted that, in this planned implementation, the followed logic has a hole in it for fast detectors. Only one register is planned, with one bit for each detector, so when the user reads the register there is nothing to specify exactly which event the information applies to. For slow detectors (TPC, SVT, etc.) there is only one event in the trigger system at any one time so the register information obviously applies to that event. However, for fast detectors (e.g. EMC), where there can be multiple events in the trigger system simultaneously, it is not possible to determine which event the register information applies to. In order to solve this ambiguity there would either have to be one register for every token (4095 registers!) or the fast detectors would have to be slowed down. Since neither of these options is really desirable, the current plan is to use the single register described above, with the understanding that the information is only supposed to be used by the slow detectors. Since the fast detectors are, by definition, BUSY for a very short period of time, the probability of a triggered event being followed is correspondingly very small, so hopefully the lack of “followed” event logic should not be a problem.
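A minimal model of the single-register “followed” logic is given below. The class name FollowedMonitor, the clear-on-read behavior of the register and the choice to start the protection window on the tick after the trigger (so that the triggering interaction does not flag itself) are assumptions, not part of the planned design as written.

```python
class FollowedMonitor:
    """One register with one "followed" bit per detector (illustrative model)."""

    def __init__(self, protection_len):
        # protection_len[d] is the event-protection length, in ticks, for detector d.
        self.protection_len = list(protection_len)
        self.remaining = [0] * len(self.protection_len)
        self.register = 0

    def tick(self, triggered_mask, contamination):
        """Advance one RHIC clock tick."""
        for det in range(len(self.remaining)):
            if self.remaining[det] > 0:
                if contamination:
                    # Contamination arrived while the event was protected.
                    self.register |= 1 << det
                self.remaining[det] -= 1
            if (triggered_mask >> det) & 1:
                # Assumption: protection starts on the tick after the trigger.
                self.remaining[det] = self.protection_len[det]

    def read_and_clear(self):
        """Model of L1CTL reading the register (clear-on-read is an assumption)."""
        value, self.register = self.register, 0
        return value
```

The sketch also makes the hole for fast detectors visible: the register holds only one bit per detector, with nothing to say which triggered event set it.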

The final LIVE/BUSY states of the detectors and the output of the physics word look-up table will then be combined in the MUX and used as input to the trigger word LUT. The default configuration of the MUX will be to select 6 bits from the physics word LUT and 14 LIVE/BUSY states. The output of the trigger word LUT will be the same 16-bit trigger word as is currently implemented on the existing TCU. This 16-bit trigger word will then be used as input to the prescale system and the action word look-up table (Requirement 8), again exactly as is implemented on the current TCU. The output of the action word look-up table will now be 24 bits containing a 16-bit bitmask, indicating which detectors are to be triggered for this trigger word, and 4-bit trigger and DAQ commands indicating what type of action those detectors are supposed to take: normal readout, calibration readout, etc. (Requirement 11).
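As an illustration of how a 24-bit action word could hold these three fields, consider the sketch below. The ordering of the fields within the word is not specified in this document; placing the detector bitmask in the low 16 bits, followed by the trigger command and then the DAQ command, is an assumption, as are the function names.

```python
def pack_action_word(detector_mask, trigger_cmd, daq_cmd):
    """Pack a 16-bit detector bitmask and two 4-bit commands into 24 bits.

    The field order used here is hypothetical.
    """
    if not 0 <= detector_mask < (1 << 16):
        raise ValueError("detector_mask must fit in 16 bits")
    if not 0 <= trigger_cmd < (1 << 4):
        raise ValueError("trigger_cmd must fit in 4 bits")
    if not 0 <= daq_cmd < (1 << 4):
        raise ValueError("daq_cmd must fit in 4 bits")
    return (daq_cmd << 20) | (trigger_cmd << 16) | detector_mask


def unpack_action_word(word):
    """Return (detector_mask, trigger_cmd, daq_cmd) from a 24-bit action word."""
    return word & 0xFFFF, (word >> 16) & 0xF, (word >> 20) & 0xF
```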

As before, the trigger flag will be used to decide if the data that is passed to the TCDs, and stored in the output FIFOs, should be the action word and token, any response from the Response FIFO or nothing (Requirements 12, 14 and 15). In order to avoid modifying the TCD hardware, but still make room for the increased number of bits in the detector bitmask (16 rather than 8), and the increased number of detector LIVE/BUSY bits (also 16 rather than 8) these bits will be placed on a cable that used to be used for passing the trigger word to the TCD modules. They never used the trigger word, so the bits are available to be re-defined. In order to fully understand why an event was triggered, more output FIFOs will be added. They will now save all the LIVE/BUSY status components for all the detectors as well as the DSM input bits, the physics word and the trigger word.

The monitor and halt logic will be implemented using dual-ported static RAM instead of just static RAM (Requirement 13). This will allow one port to be used by the TCU, every tick of the RHIC clock, for tagging new events as they are triggered, and monitoring old events. The other port can be used, totally asynchronously, by L1CTL to un-tag events that have been fully processed. This will remove the requirement of the old design, of allowing 3 accesses (1 read and 2 writes) to the static RAM every tick of the clock, without significantly increasing the control logic complexity. If the TCU ever detects that data from a triggered event is about to be over-written it will issue a “halt” command. The command will go to the RCC module, which will distribute it back to all the DSMs and the TCU at the same time. L1CTL will also be notified that the “halt” command has been issued. When they receive this command the DSMs will stop moving through their circular input buffers and just halt where they are. The TCU will suspend the prescale system, so it will stop issuing triggers. L1CTL will then have time to read out the DSM data for all currently triggered events. When it has completely finished, L1CTL will notify the TCU, which will then remove the “halt” command, and data taking will continue.
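The tag, monitor and halt sequence can be modeled in software as shown below. This is a sketch only: the class name HaltMonitor, the single look-ahead of one buffer location and the method names are assumptions, and the two ports of the dual-ported RAM are represented simply by the tick method (TCU side) and the untag method (L1CTL side).

```python
class HaltMonitor:
    """Model of tagging triggered events in a circular buffer (illustrative only)."""

    def __init__(self, depth):
        self.depth = depth              # number of locations in the DSM input buffer
        self.tagged = [False] * depth   # one tag bit per buffer location
        self.write_ptr = 0              # location being written on this tick
        self.halted = False

    def tick(self, triggered):
        """TCU port: called every RHIC clock tick; returns True while halted."""
        if self.halted:
            return True  # buffers frozen and triggers suspended
        if triggered:
            self.tagged[self.write_ptr] = True  # tag the data for this event
        nxt = (self.write_ptr + 1) % self.depth
        if self.tagged[nxt]:
            # The next location still holds unread triggered data: issue "halt".
            self.halted = True
        else:
            self.write_ptr = nxt
        return self.halted

    def untag(self, address):
        """L1CTL port: un-tag an event whose DSM data has been read out."""
        self.tagged[address] = False

    def resume(self):
        """L1CTL has finished reading out: remove the "halt" command."""
        self.halted = False
```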

A new counter will be added to the new TCU (Requirement 17) to monitor its performance. The input to this counter will be the TCU LIVE bits (the token FIFO empty bit and the halt bit), the physics word and contamination bit from the physics word look-up table and the trigger flag.

The following changes will also be made to solve the remaining implementation problems on the original TCU:

A direct path to the counter system will be provided to enable the user to reset the “overflow” state simply (Problem 6).
The input data latch will be removed from the TCUI and implemented on the TCU in the same way as on the DSM boards (Problem 7).
The size of the input memory will be increased from 256 to 64K entries and will record as well as play, again, just like on the DSMs (Problem 8).
The “token FIFO empty” signal will be routed directly to the Prescale Control FPGA as well as the Operation Control FPGA (Problem 9).
The startup latency will be removed from the Prescale Control logic so it is easier to understand and debug (Problem 10).
Since no request was ever made to use it, the priority queue will be dropped, leaving just one token FIFO and one set of output FIFOs (Problem 11).
The “token FIFO empty” signal will be removed from the decision of whether or not to drive the action word and token to the TCDs because it is already in the trigger flag, which is itself used in making this decision (Problem 12).
Output memory will be added that is filled every RHIC crossing, irrespective of whether or not a trigger flag is issued (Problem 13).
The reception and processing of the control signals from the RCC will be implemented in the same way as is done on the DSM boards (Problem 14).
The VME interface will be implemented using an off-the-shelf VME interface chipset that is fully compatible with VME64 transactions (Problems 15 and 16).
Registers will be added to read the current busy status: external, local, contamination and the final modified status (Problem 17).
The twist in the trigger token bus between the TCU and TCD modules will be undone (Problem 18).
Four LEMO connectors will be added on the front panel to allow access to the RHIC clock, halt, token FIFO empty and trigger flag signals while the TCU is in place in a VME crate at BNL (Problem 19).