Researchers at the Large Hadron Collider (LHC) deal with enormous datasets on a daily basis. The experiments which observe the LHC’s proton collisions generate hundreds of exabytes of data per year, comparable to the entire current worldwide mobile phone traffic [1] and potentially consuming tens of millions of hard disks. It would not be physically possible to store all of this data, and even if it could be stored, it could never be distributed to the thousands of researchers around the world who are waiting to analyze it. For this reason, most of the analysis carried out on this data is performed in real time.

Custom processing hardware makes microsecond decisions about which data to keep and which to discard, and is backed up by thousands of computers which analyze the remaining data and look for specific interesting physics processes within it. At the end of this process the data stream has been reduced by a factor of 10,000, small enough to finally send to the physicists who will sift through it in search of new particles and forces governing our world.

This process of real-time analysis is called “triggering” by physicists, because it primarily consists of looking for specific interesting features in the data, for example a particle with exceptionally high energy, and selecting the subset of proton collisions with such features for later inspection. Triggering has been at the heart of the more than 1000 papers published by the LHC collaborations. It is nevertheless a primitive approach to real-time analysis: it assumes that only a small fraction of proton collisions produce interesting physics, and that the trigger’s job is to find these needles in the haystack of data.

As the LHC experiments increase their data rates 100 times over the next decade, in order to probe nature with ever greater precision, this assumption will break down in a fundamental way: instead of looking for needles in haystacks, real-time analysis will have to categorize haystacks of needles. To do this, real-time analysis will have to become more sophisticated, no longer relying on simple, easily visible features but instead studying each entire event in detail, using the best-quality detector calibration in order to maximize its power.
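For readers who like to see the filtering idea in code, here is a minimal toy sketch in Python. It is not any experiment’s actual trigger software: the Event class, the 50 GeV threshold, and the simulated event stream are all invented for illustration. The logic, however, is the one described above: scan a stream of collisions, keep only those containing an exceptionally energetic particle, and discard the rest.

```python
# Toy illustration of a trigger: keep only events containing a particle
# above an energy threshold. All names and numbers here are invented for
# the sketch and do not correspond to any real experiment's software.
import random
from dataclasses import dataclass


@dataclass
class Event:
    particle_energies_gev: list  # energies of reconstructed particles, in GeV


def passes_trigger(event: Event, threshold_gev: float = 50.0) -> bool:
    """Keep the event if any particle exceeds the energy threshold."""
    return any(e > threshold_gev for e in event.particle_energies_gev)


# Simulate a stream of events; most contain only low-energy particles.
stream = [
    Event([random.expovariate(1 / 5.0) for _ in range(random.randint(1, 10))])
    for _ in range(100_000)
]

kept = [ev for ev in stream if passes_trigger(ev)]
print(f"kept {len(kept)} of {len(stream)} events "
      f"(reduction factor ~{len(stream) / max(len(kept), 1):.0f})")
```

In the real experiments this decision has to be made tens of millions of times per second, first in custom hardware and then in large computing farms, but the principle of selecting collisions by their features is the same.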
Motivated by these problems, researchers who have been at the forefront of today’s real-time analysis at the four big LHC experiments (ATLAS, CMS, LHCb, and ALICE) met together with industrial partners for a three-day workshop in Lund to launch the SMARTHEP network. Our objective? To build common infrastructure which will allow researchers to confidently perform their analyses in real time: continuously aligning and calibrating their detectors, selecting interesting signals and rejecting backgrounds, and persisting the resulting analysis output for future use. We are linking up because we are convinced that our problems share a common core, and because we believe in benefiting from the best of each other’s ideas. As we develop our network we will keep you posted on its progress, so look out for further stories soon!