Some Level 1 Timing Numbers for 2001 Run

Z. Milosevich (zoran.milosevich@cmu.edu)

The following figures illustrate timing characteristics of the TRG L1CTL processor, as measured during the RHIC physics run of summer 2001. Several processes ran on this CPU; the main ones to note are the L1_Hardware_Interface (HI) and the L1_Analysis (L1ANA). The HI task read the information FIFOs from the TCU and sent the token to the L1ANA task. The L1ANA task then read the DSM boards in the L1 crate, sent tokens to the CTB, MWC, and BEMC crates, and sent its data to the L2 CPU. Timing points were taken at five places: when the TCU was read, when the token was sent from HI to L1ANA, when the DSM boards were latched and read, after the DSM boards had been read, and when the L1 data had been sent to L2 and L2 had been notified that it was there.
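As a minimal sketch of how the five timing points described above turn into the plotted intervals, the Python below takes one event's timestamps and forms the differences. The field names, the microsecond unit of the stored values, and the choice to end the total time at the L2 notification point are assumptions made for illustration; they are not taken from the L1CTL code.

```python
from dataclasses import dataclass


@dataclass
class TimingPoints:
    """Five timestamps for one event, in microseconds (hypothetical field names)."""
    tcu_read: float        # HI reads the TCU information FIFOs
    token_sent: float      # HI sends the token to L1ANA
    dsm_latched: float     # DSM boards latched, read-out begins
    dsm_read_done: float   # DSM read-out finished
    l2_notified: float     # L1 data sent to L2 and L2 notified


def intervals(t: TimingPoints) -> dict:
    """Return the three time differences, in microseconds, for one event."""
    return {
        # time the HI task held the token
        "hi_time": t.token_sent - t.tcu_read,
        # time spent latching and reading the DSM boards
        "dsm_read_time": t.dsm_read_done - t.dsm_latched,
        # assumption: L1ANA releases the token at the last timing point
        "total_time": t.l2_notified - t.tcu_read,
    }


if __name__ == "__main__":
    # Synthetic values only, not measured data.
    example = TimingPoints(0.0, 10.0, 15.0, 55.0, 90.0)
    print(intervals(example))
```

The values in the example are synthetic and serve only to show the arithmetic.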

The study is incomplete because it was carried out during down time in the run, and the system was not available long enough to finish it. Several peculiarities were noted but not fully investigated. Nevertheless, the four plots shown here indicate that the Level 1 time budget for doing L1 analysis is exceeded with the current L1 design.

The first figure is for run 2246023, in which the system was set up so that the CTB, MWC, and BEMC were in the run, but DAQ and all other STAR detectors were not. The trigger was free running, so a trigger was issued whenever a token was available in the TCU token FIFO. The pre/post was set to zero for this run. The top left plot shows the time difference between when the HI read the token from the TCU and when the HI released the token. The top right plot shows the time between when the DSMs were latched and when they had been read. The bottom left plot shows the total time from when the token was read by the HI to when it was released by the L1ANA. The second figure is for the same run, but shows scatter plots of these times against event number. Note that a total of 10,000 events were taken for each run, and the events were stored in L2 memory during the run so that L2 writes to disk would not interfere with the timing tests.

The third and fourth figures contain the same kinds of plots for run 2246025, which used a similar setup but with pre/post set to five. One peculiarity that shows up here is that the times for the pre/post=5 run were significantly shorter than for the pre/post=0 run. There was no opportunity to investigate this in detail. However, even in the best case, the pre/post=5 run, close to 10% of the events were over the time budget of 80-100 usec for doing Level 1 analysis. Note that this is without a Level 1 analysis task running on the CPU. As the scatter plots show, the timing was very good for the most part, but at regular intervals the time increased, most probably because other, extraneous tasks were running on the CPU. What is alarming is that for pre/post=0 the total time was well over 100 usec even without these extraneous tasks waking up.
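The over-budget fraction quoted above is a simple count over the per-event total times. The sketch below shows one way such a number could be obtained, assuming the total times are available as a list in microseconds; the function name and the sample values are hypothetical and are not measured data.

```python
def fraction_over_budget(total_times_us, budget_us=100.0):
    """Fraction of events whose total L1 time exceeds the budget (microseconds)."""
    if not total_times_us:
        raise ValueError("no events to evaluate")
    over = sum(1 for t in total_times_us if t > budget_us)
    return over / len(total_times_us)


if __name__ == "__main__":
    # Synthetic total times in usec, for illustration only.
    sample = [50, 60, 120, 70, 90, 85, 95, 40, 30, 20]
    # Evaluate at both ends of the 80-100 usec budget range.
    print(fraction_over_budget(sample, budget_us=100.0))
    print(fraction_over_budget(sample, budget_us=80.0))
```

Evaluating at both 80 and 100 usec brackets the result, since the budget itself is quoted as a range.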

It should be clear from this that the L1 design needs to be looked at more closely if one wants to implement an L1 analysis that would issue aborts/accepts.