240717 HjC - We recommend the following actions for runs that stop under guidance from L2, or when there is a burst of errors from a crate while trying to start a run.

- The DAQ monitor will presumably have identified the offending crate.
- It is most likely the BBC crate, although we have also seen errors from BC1, MXQ, BBQ, EQ1, EQ2 and EQ3.

0. Multiple CRC errors mean you should power cycle the L0 crate.
1. Power cycle the offending crate.
2. Check the STP status for that crate in the STP2mMonitor:
   https://online.star.bnl.gov/L2TimingPlotsCeph/scaler2mon/scaler2mon.html
   The status of every STP1 card listed at the bottom of that page should be 0x0100. If the crate does not have this status, power cycle that crate and check again. When the status is correct, start another run.

If the DAQ messages have not identified the offending crate, you can find it with
   grep "Timed Out" /net/daqman/log/trigger.log
If it says that multiple crates timed out, the problem is probably the L0 or L1 processor, so the offending crate is most likely the L0/L1 crate and you should power cycle that. If this does not work, the problem requires Jeff, because he is in charge of the L0 and L1 processor/code in that crate.

NOTE: if you power cycle or reboot a crate, you need to include that crate in the next run. This configures the crate so that it participates in the trigger correctly.

Please send a note to the trigger list informing us of the trouble.

L2 stops the run because a hardware error has occurred. The error is usually not in L2: it is typically an mpic error caused by a hardware problem in a particular DSM board. L2 stops communication with that board, so data collection from that board stops and events time out waiting for that board to respond.
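The "Timed Out" check above can be sketched as a small shell helper. This is only an illustration: it assumes the crate name appears as a whole word on the same line as "Timed Out", which may not match the real trigger.log format. The function names timed_out_crates and suggest_crate are hypothetical, and the crate list is just the set of names mentioned in this note.

```shell
#!/bin/bash
# Sketch: report which known crates appear on "Timed Out" lines of a log.
# ASSUMPTION: the crate name is a whole word on the same line as
# "Timed Out"; the real trigger.log layout may differ.

# Crate names taken from this note (not an exhaustive list).
CRATES="BBC BC1 MXQ BBQ EQ1 EQ2 EQ3"

# Print one crate name per line for every crate seen on a "Timed Out" line.
timed_out_crates() {
    local log="$1" crate
    for crate in $CRATES; do
        # -F: fixed string match; -w: match the crate name as a whole word.
        if grep -F "Timed Out" "$log" | grep -w -- "$crate" > /dev/null; then
            echo "$crate"
        fi
    done
}

# Turn the crate list into the action recommended in this note.
suggest_crate() {
    local found
    found=$(timed_out_crates "$1")
    # Word-split the list so that $# is the number of crates found.
    set -- $found
    case $# in
        0) echo "no known crate found" ;;
        1) echo "power cycle $1" ;;
        *) echo "multiple crates timed out: power cycle the L0/L1 crate" ;;
    esac
}
```

Once the functions are loaded into a shell, suggest_crate /net/daqman/log/trigger.log would print the suggested action. If the log names crates differently, only the CRATES list and the match in timed_out_crates should need changing.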
If power cycling the offending crate, or the L0/L1 crate, does not solve the problem, there are 2 ways to see if you need to power cycle the L2ana01 CPU:

- L2 is red in run control, OR
- you can't ping l2ana01.trg.bnl.local from the startrg machine in the control room.

In either case you need to power cycle the L2ana01 CPU. If you can ping l2ana01 and get a response, you should not power cycle it: you need to look elsewhere for the solution. Best to contact a trigger expert.

After checking that L2 is responsive, you can try restart_l2 in the startrg window. Then reboot all. Make sure all trigger crates are green in the STP2 monitor, and power cycle any crate that is red until it becomes green. The STP1 PMC status must be 0x0100. If it isn't, power cycle the crate until it is.

For experts:
   ssh trg@trgconfig
   ./kill_all_trg_group_run_control
   ./trg_system_run_control_start
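The ping test for l2ana01 can be sketched as a shell helper, meant to be run from the startrg machine. It covers only the ping criterion: whether L2 is red in run control still has to be checked by eye. The function names host_responds and l2_next_step are hypothetical. The optional second argument is an assumption added so the logic can be tried without network access.

```shell
#!/bin/bash
# Sketch: decide the next step for L2ana01 from a single ping.
# ASSUMPTION: one ICMP echo is a good enough test of "responsive".

# Succeeds if the host answers one ping (-c 1 sends a single packet).
host_responds() {
    ping -c 1 "$1" > /dev/null 2>&1
}

# Print the suggested next step. The second argument names the command
# used as the reachability check and defaults to host_responds.
l2_next_step() {
    local host="$1" check="${2:-host_responds}"
    if "$check" "$host"; then
        echo "responds: do not power cycle; try restart_l2 or contact a trigger expert"
    else
        echo "no response: power cycle the L2ana01 CPU"
    fi
}
```

Once the functions are loaded into a shell, l2_next_step l2ana01.trg.bnl.local prints one of the two messages. The run control color is deliberately left out of the sketch, since it is read from the run control display and not from the command line.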