The Tokyo Institute of Technology has built a high-performance computing (HPC) system based on the latest model of Nvidia Corp's graphics processing unit (GPU) and has put it into operation.
The news was announced at Super Computing 2008, an HPC-related international conference and exhibition.
The institute built the "Tsubame Grid Cluster (Tsubame)" HPC system in 2006 and achieved a computing speed of 38.18TFLOPS in floating-point calculation using "double precision" arithmetic, which operates on 64-bit floating-point numbers. The system took seventh place in the global "Top 500" ranking of HPC systems in June 2006.
Although the institute continued to improve the system's performance, Tsubame failed to keep up with the rapid increase in computing speed of HPC systems around the world. It dropped to 24th in the ranking in June 2008, with a computing speed of 67.7TFLOPS.
The latest system was built by externally attaching 170 units of the "Tesla S1070" to the existing Tsubame system. The Tesla S1070 is a rack-mount arithmetic unit consisting of four GPU products that Nvidia Corp had just announced on Nov 18, 2008.
"It took about a week to add the (Tesla S1070) units in mid-October," said Satoshi Matsuoka, a professor at the Global Scientific Information and Computing Center of the institute. "I knew we could do it."
The Tesla S1070 has a maximum computing power of 4.1TFLOPS in 32-bit single precision arithmetic. The total peak performance of the 170 units is 697TFLOPS (= 4.1TFLOPS x 170). Combined with the computing power of the existing Tsubame, the peak performance of the entire system reaches 910TFLOPS. On paper, this puts the new system among the world's top performers, just short of 1PFLOPS.
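The peak-performance arithmetic in the preceding paragraph can be checked with a short Python sketch. The three input figures are the ones quoted in the article; the variable names are illustrative, and the "implied" contribution of the existing Tsubame is only a subtraction of the quoted numbers, not a figure the article states.

```python
# Single precision figures quoted in the article, in TFLOPS.
TESLA_S1070_PEAK_TFLOPS = 4.1   # peak of one Tesla S1070 unit
NUM_UNITS = 170                 # units added to Tsubame
COMBINED_PEAK_TFLOPS = 910      # reported peak of the entire system
ONE_PFLOPS_IN_TFLOPS = 1000


def added_peak_tflops(per_unit: float, units: int) -> float:
    """Peak performance is a theoretical maximum, so it simply scales with unit count."""
    return per_unit * units


added = added_peak_tflops(TESLA_S1070_PEAK_TFLOPS, NUM_UNITS)

# Assumption: whatever is not accounted for by the added units is attributed
# to the existing Tsubame. This is an inference from the quoted totals.
implied_existing = COMBINED_PEAK_TFLOPS - added

# Gap between the reported combined peak and 1PFLOPS.
shortfall = ONE_PFLOPS_IN_TFLOPS - COMBINED_PEAK_TFLOPS

print(f"Added units:      {added:.0f} TFLOPS")
print(f"Implied existing: {implied_existing:.0f} TFLOPS")
print(f"Short of 1PFLOPS: {shortfall} TFLOPS")
```

Because peak performance is a theoretical ceiling rather than a measurement, a plain product and sum are sufficient here; measured benchmark results do not add up this simply.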
"The system would probably be ranked in the top 10 in the world if it was evaluated based on single precision arithmetic," a researcher at the institute said.
However, the Top 500 ranking requires double precision arithmetic. Nvidia's latest GPU is the first model to fully support double precision arithmetic, but its computing speed in double precision is significantly lower than in single precision. The peak double precision performance of the added units is only 59TFLOPS.
As a result, the peak double precision performance of the entire system is only 170TFLOPS. The effective performance measured by "Linpack," the program for solving simultaneous linear equations that is used for evaluation in the Top 500 ranking, was 77.48TFLOPS. The new system was unable to move up in the latest ranking and ended up in 29th place.
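To put these double precision figures in proportion, the following Python sketch derives two ratios from the numbers quoted in the article: the share of peak performance that the Linpack run actually reached, and how the added units' double precision peak compares with their single precision peak. The ratios themselves are derived here and are not stated in the article; the names are illustrative.

```python
# Figures quoted in the article, in TFLOPS.
ADDED_SP_PEAK = 697.0     # added Tesla units, single precision peak
ADDED_DP_PEAK = 59.0      # added Tesla units, double precision peak
SYSTEM_DP_PEAK = 170.0    # entire system, double precision peak
LINPACK_MEASURED = 77.48  # effective performance measured with Linpack


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, rejecting a non-positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Fraction of the theoretical double precision peak reached under Linpack.
linpack_efficiency = ratio(LINPACK_MEASURED, SYSTEM_DP_PEAK)

# Double precision peak of the added units relative to their single precision peak.
dp_to_sp = ratio(ADDED_DP_PEAK, ADDED_SP_PEAK)

print(f"Linpack efficiency: {linpack_efficiency:.1%}")
print(f"DP/SP peak ratio:   {dp_to_sp:.1%}")
```

On these figures the Linpack run reached a little under half of the system's double precision peak, and the added units offer well under a tenth of their single precision peak in double precision.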
Matsuoka cited two reasons for the poor results: (1) a lack of tuning due to the haste of the work and (2) a mismatch between the system and Linpack.
"We could achieve about 90TFLOPS even with Linpack if we tuned the system properly, but this application is rather disadvantageous for our system," he said. "With some tweaking, even single precision arithmetic can achieve good results in most scientific calculations. With our latest efforts to enhance performance, our system has acquired computing power that is virtually on par with the top-ranking systems in the world."
The workstation version of Nvidia's latest GPU is called the "Tesla C1060" and operates at 1.296GHz. The Tesla S1070 has the same hardware configuration as the Tesla C1060 but operates at 1.44GHz and has slightly higher computing power.