Management of Large-Scale Commercial Computing Systems

The aim of the DIRAC project – funded under the CPAN project of the National Centre for Particle, Astroparticle and Nuclear Physics – is to develop the software's capability to manage the computing resources needed by users in all areas of the scientific community. The software has been successfully tested in simulations carried out as part of the Belle experiment in Japan.

Ricardo Graciani, a researcher for the ICCUB and director of the DIRAC project, explains: "Computer simulation of the collisions and the response of the LHCb detector is crucial to our research, but it requires an enormous volume of computing resources". Constructing a computing grid with resources dedicated exclusively to data processing for a single experiment is expensive and comparatively inefficient, since, as Graciani explains, "there are peaks and troughs in the demand on the system".

Graciani gives the example of the Belle project, an international initiative at the KEK particle accelerator (Japan) which conducts tests similar to those of the LHCb project. Belle has a data collection period of six months per year, and a further three months are needed to carry out the corresponding computer simulations. "These requirements lead to the over-provision of computing resources during the remainder of the year", explains Graciani, which entails "additional costs" for installation and operation.

To optimize computing costs, the researchers worked with the CPAN project to adapt the DIRAC software for managing resources from Amazon EC2, a leading provider of online computing services. Together with scientists from the University of Melbourne, they carried out simulations for the Belle project using 250 EC2 virtual machines, which provide the equivalent power of 2,000 networked processors. "The first results show that these new resources provide over 95% efficiency", says Graciani.

The test was carried out over a ten-day period. During the 7,500 computing hours, an operating peak of 2,000 processors running simultaneously was reached for 18 hours. "DIRAC enables us to harness the flexibility of the Amazon system to optimize resource use according to specific requirements", explains Graciani. Calculations from the exercise have given an estimated cost of US$6,000 for 1,426 simulations (equivalent to 120 million collisions or 2,700 GB of experimental data).

Source: University of Barcelona
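For readers who want to put the quoted figures in proportion, the short Python sketch below derives a few unit costs from them. It is only back-of-the-envelope arithmetic on the numbers reported above. The function name `unit_costs` and the constant names are illustrative assumptions and are not part of DIRAC or of any Amazon interface. The results should be read as rough averages, since the article gives an estimated total cost rather than a detailed breakdown.

```python
# Figures quoted in the article (estimates reported by the researchers).
TOTAL_COST_USD = 6_000
SIMULATIONS = 1_426
COLLISIONS = 120_000_000
DATA_GB = 2_700
VIRTUAL_MACHINES = 250
PROCESSORS = 2_000


def unit_costs():
    """Return rough average ratios derived from the quoted figures.

    The name and structure are hypothetical; this is plain arithmetic,
    not a model of how the test was actually billed.
    """
    return {
        "processors_per_vm": PROCESSORS / VIRTUAL_MACHINES,
        "usd_per_simulation": TOTAL_COST_USD / SIMULATIONS,
        "usd_per_million_collisions": TOTAL_COST_USD / (COLLISIONS / 1_000_000),
        "usd_per_gb": TOTAL_COST_USD / DATA_GB,
    }


if __name__ == "__main__":
    for name, value in unit_costs().items():
        print(f"{name}: {value:.2f}")
```

On these figures, each virtual machine corresponds to 8 processors, and the estimated cost works out to roughly US$4.21 per simulation, US$50 per million collisions, or about US$2.22 per GB of data.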