Passive cooling for servers

Started by richardBranson, Sep 16, 2022, 09:12 AM


richardBranson (Topic starter)

The more computing resources we pack into each unit of machine-room volume, the more profitable the whole complex becomes, and the hotter it runs. Cooling systems keep improving, but heat removal from electronic components still relies on fans and massive heatsinks.
You can isolate cold or hot aisles, install advanced monitoring and manage airflow in real time, but such solutions have already hit their technological limit. So what is a reasonable alternative?



Data centers consume a large amount of electricity: about 63% of it goes to running the web servers (and ends up as heat), 32% to cooling and 10% to uninterruptible power supply. Older data centers may do worse (40/50/10).
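As a rough illustration, the shares quoted above can be turned into an implied PUE (total facility energy divided by IT energy). This is only a sketch: the function name is made up for the example, and since the first set of figures as quoted adds up to 105%, the calculation divides by the actual sum of the shares rather than assuming 100%.

```python
def implied_pue(it_share, cooling_share, ups_share):
    """Implied PUE = total energy / IT energy, from percentage shares.

    The shares do not need to sum to exactly 100; only their ratio matters.
    """
    if it_share <= 0:
        raise ValueError("IT share must be positive")
    total = it_share + cooling_share + ups_share
    return total / it_share


if __name__ == "__main__":
    # Shares as quoted in the post (IT / cooling / UPS).
    print(f"newer site: PUE ~ {implied_pue(63, 32, 10):.2f}")
    print(f"older site: PUE ~ {implied_pue(40, 50, 10):.2f}")
```

The lower the ratio, the smaller the share of energy spent on anything other than the servers themselves, which is why cooling is the obvious place to look for savings.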

Why "Tercon"?

There are various ways to cool IT equipment, up to and including immersing servers in a dielectric liquid coolant. The choice is limited primarily by the technical conditions of the data center, so the promising exotic options had to be ruled out. We needed a more down-to-earth solution compatible with mass-produced web servers and, most importantly, with the site's engineering infrastructure.

    The data center operator is the first to benefit from more efficient heat removal, and it shares part of that benefit with us: the rental cost per unit of computing power goes down, which makes our web services more competitive.

One of our experiments in that direction was a trial deployment of the Tercon two-phase heat sink system. Its advantages:

    lower direct energy consumption for web server cooling, thanks to reduced airflow;

    higher equipment safety compared to liquid systems;

    the ability to adapt the cooling system to non-standard solutions;

    support for a domestic manufacturer (there are so few of them);

    the system needs little maintenance and works on an "install and forget" basis, so the load on the staff is reduced.

Besides its advantages, the Tercon system also has disadvantages, the main one being that the technology is still immature. Nothing works "out of the box". More precisely, problems arise during start-up and commissioning, although they can be solved.
What is "Tercon"?

The key element of Tercon's cooling systems is loop heat pipes (also rendered as "contour heat pipes", CTT) for heat transfer, cooling and thermal stabilization. Their evaporators are mounted on the processor and other heat-generating components of the web server, and the Tercon condenser is mounted on the heat-exchange cooling bus. The bus is cooled with glycol from the data center air conditioning circuit, and the pipes contain no moving mechanical parts, which increases the service life and stability of the cooling system.

We use grade A ammonia as the working fluid, but the manufacturer also offers versions with ethanol, methanol, acetone, freons and even water. The system needs no compressors because circulation is driven by the liquid-vapor phase transition; this, by the way, is its fundamental difference from liquid cooling, which requires circulation pumps.
Driven by capillary pressure, the vapor moves through the vapor line (this is treated as an isothermal process) and enters the condensation zone, where the reverse phase transition gradually takes place. Subcooled liquid leaves the condensation zone.

The flow through the liquid line is also treated as isothermal, with some pressure loss along the way. The liquid reaches the compensation chamber, where it is heated to the saturation temperature. It then passes through the central channel or the locking layer and soaks into the capillary structure, where the phase transition occurs again at the vapor-removal channels.
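To get a feel for why such a small loop can carry a processor's heat, here is a minimal sketch that estimates how much working fluid has to evaporate per second. It assumes all heat is carried as latent heat and uses an approximate latent heat of vaporization for ammonia near room temperature (about 1.17 MJ/kg, rounded from standard property tables). The heat loads and the function name are illustrative assumptions, not figures from our tests.

```python
# Approximate latent heat of vaporization of ammonia near 25 °C, in J/kg.
# Assumption for illustration; the real value depends on operating temperature.
H_FG_AMMONIA_J_PER_KG = 1.17e6


def vapor_mass_flow_g_per_s(heat_load_w, h_fg=H_FG_AMMONIA_J_PER_KG):
    """Mass flow (g/s) needed to carry heat_load_w purely as latent heat."""
    if heat_load_w < 0:
        raise ValueError("heat load must be non-negative")
    if h_fg <= 0:
        raise ValueError("latent heat must be positive")
    return heat_load_w / h_fg * 1000.0


if __name__ == "__main__":
    # Hypothetical per-processor heat loads in watts.
    for load_w in (100, 150, 200):
        flow = vapor_mass_flow_g_per_s(load_w)
        print(f"{load_w} W -> {flow:.3f} g/s of ammonia")
```

Fractions of a gram per second are enough, which is consistent with capillary pressure alone being able to drive the circulation without a pump.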

A cooling system based on loop heat pipes lets you raise the IT load to as much as 25 kW and fit more web servers into the racks, because bulky heatsinks are no longer needed and the heat is carried outside the machine room. Unlike traditional heat pipes, CTTs can transfer heat over distances of up to 23 meters.
Installation process

Alexander, a specialist from the manufacturer, helped us assemble the Tercon cooling system. He made two trips to machine room 14 at DataPro, where the radiators had been delivered in advance.

On his first trip, Alexander showed our team of engineers how to assemble and install the cooling system. We installed the first web server together with him, then assembled and installed the other two on our own. Note that a server fitted with the pipes is best installed in the rack by two people: doing it alone risks damaging the equipment.

The installation process has two stages. First, the web server is prepared: the standard heatsinks are removed and the Tercon cooling system is installed (this takes about 40 minutes).

Then the finished web server is put into the rack. If the rails are already mounted, this can take up to 15 minutes, because it can be hard to hit the holes in the cooling bus on the first try (the contact has to be adjusted with a screw clamp).

The procedure is simple but takes some skill. The cooling efficiency of the CTT depends on the thermal pastes used and on how well the evaporator fits: the tighter, the better.

Note that we did not remove the fans: the additional passive cooling system was meant to reduce the heating of electronic components and drastically cut airflow. It is hard to say whether the fans can be dropped altogether. In a web server, processors and other components with heatsinks, the ones the heat pipes are attached to, are not the only things that heat up. In addition, without forced ventilation the direction of airflow from the cold aisle to the hot one can be disrupted. In any case, fans consume little electricity at low speeds.

[Video: installing the first web server, ready in 45 minutes]
[Photos: the connection to the heat collector]
The "open" clamp position loosens the locking plates so that the web server can be removed.
The "closed" clamp position presses the plates tightly against the bus, which improves cooling efficiency.

The connecting pipework for the cooling system occupies the two lowest units of the rack (they can no longer be used for servers). The pipes run into the raised floor of the data center directly under the web server rack. As a result, the CTTs carry the heat from the processors out of the web server chassis to an external liquid heat exchanger and then to the outdoor cooling systems of the data center.
The bus in the middle of the rack

In the current assembly the bus sits in the middle of the rack. We plan to move it, because in this configuration it is awkward to connect patch cords and a video cable.

Testing also required LinX 0.7.0 for AMD, a GUI for the Intel Linpack benchmark. The servers were connected to an RPCM 1502 remote power management module, which made the current characteristics more stable and let us record power consumption.

To double-check the power consumption data, we also used a wattmeter and readings from the IPMI module.
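A cross-check like this is easy to automate. The sketch below compares readings from several sources and flags any that stray too far from the median. The source names, the sample values and the 5% tolerance are all hypothetical, chosen only to show the idea; they are not our measurements.

```python
from statistics import median


def cross_check(readings_w, tolerance=0.05):
    """Return the sources whose reading deviates from the median
    by more than `tolerance` (as a fraction of the median)."""
    if not readings_w:
        raise ValueError("no readings given")
    mid = median(readings_w.values())
    return sorted(
        name
        for name, value in readings_w.items()
        if abs(value - mid) > tolerance * mid
    )


if __name__ == "__main__":
    # Hypothetical simultaneous readings for one server, in watts.
    sample = {"rpcm": 384.0, "wattmeter": 381.5, "ipmi": 390.0}
    suspects = cross_check(sample)
    print("suspect sources:", suspects or "none")
```

Using the median rather than the mean keeps a single faulty meter from dragging the reference value toward itself.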

    In total, we had to run three test cycles. The results of the first fell short of both our expectations and the figures declared by the manufacturer. We suspected a defect in the radiators.

During the second trip, the defect was confirmed by the specialist from Tercon-CTT LLC, who sorted through the radiators in the warehouse. At that stage we added a few more test web servers. As a result, we kept 22% of the radiators, and 78% were sent to Yekaterinburg for rework. The third test cycle compared air cooling with the CTT system.

The results for the first web server deserve particular attention: a CPU temperature of +95 °C clearly threatens stable operation. The cause was a radiator defect, which Alexander confirmed during his second trip.

Later, during a visit to us, Arkady Ivanov, General Director of Tercon-CTT LLC, pointed out several other details that improve the quality and stability of the cooling system. MX-4 thermal paste should be applied between the processor and the thermal interface of the pipe, and KPT 19 between the pipe's condenser plate and the bus it is attached to. This combination matters because MX-4 provides better heat transfer, while KPT 19 does not dry out.
Successful tests

After replacing the radiators and using the right combination of thermal pastes, we got acceptable results. In the unsuccessful test runs the temperature spread was large, but once the errors were corrected, the results on different web servers turned out almost identical.

The savings from the Tercon system show most clearly in fan operation: depending on the processor temperature, the automatic fan control lowers fan speed by 10-40% compared to the standard cooling.

Fan speed can be controlled via IPMI. The difference in power consumption between the minimum and maximum speeds (8400 and 12600 rpm) was 30-50 watts, and the temperature difference between maximum and minimum fan speeds was about 5 °C. Performance does not depend on fan speed; the relatively low performance of some web servers (under 100 GFLOPS) is explained by single-channel memory.
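For a rough sense of scale, these numbers can be run through the idealized fan affinity law, under which fan power grows with the cube of rotation speed. The sketch below is only an estimate under that assumption, and it treats the 8400 and 12600 figures as rpm. Real server fans deviate from the cube law, and the function names are made up for the example.

```python
def fan_power_fraction(speed_fraction):
    """Idealized affinity law: power scales with the cube of speed."""
    if speed_fraction < 0:
        raise ValueError("speed fraction must be non-negative")
    return speed_fraction ** 3


def power_bounds_from_delta(delta_w, rpm_min, rpm_max):
    """Given the measured power difference between two speeds, estimate
    fan power at each speed, assuming the cube law holds exactly."""
    ratio = (rpm_max / rpm_min) ** 3
    p_min = delta_w / (ratio - 1.0)
    return p_min, p_min + delta_w


if __name__ == "__main__":
    # A 10-40% speed reduction, as quoted in the post.
    for cut in (0.10, 0.40):
        saved = 1.0 - fan_power_fraction(1.0 - cut)
        print(f"{cut:.0%} slower -> about {saved:.0%} less fan power")

    # 30-50 W measured difference between 8400 and 12600 rpm.
    for delta in (30.0, 50.0):
        low, high = power_bounds_from_delta(delta, 8400, 12600)
        print(f"delta {delta:.0f} W -> {low:.1f} W at min, {high:.1f} W at max")
```

Under these assumptions even a modest drop in fan speed removes a disproportionate share of fan power, which matches where we saw the savings.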
Conclusions

Loop heat pipes provide stable and uniform cooling of both processors and reduce power consumption (on average 383.4 watts with CTT versus 406 watts with air cooling). Better cooling of the far processor also improves the performance of the equipment.
With Tercon, we can use 1U web servers that draw 400 watts. With low-profile heatsinks such machines usually overheat, so you have to move to 2U chassis and active heatsinks.
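The averages above translate into a per-server saving as follows. This is a back-of-the-envelope sketch: it assumes the server runs at the measured average load around the clock for a whole year, which is our simplification and not something we measured.

```python
AIR_COOLING_W = 406.0   # average draw with air cooling (from the tests)
CTT_COOLING_W = 383.4   # average draw with CTT (from the tests)
HOURS_PER_YEAR = 24 * 365  # assumption: constant load all year


def savings(air_w, ctt_w, hours=HOURS_PER_YEAR):
    """Return (watts saved, percent saved, kWh saved over `hours`)."""
    saved_w = air_w - ctt_w
    percent = saved_w / air_w * 100.0
    kwh = saved_w * hours / 1000.0
    return saved_w, percent, kwh


if __name__ == "__main__":
    watts, percent, kwh = savings(AIR_COOLING_W, CTT_COOLING_W)
    print(f"saved per server: {watts:.1f} W ({percent:.1f}%), "
          f"about {kwh:.0f} kWh per year")
```

The figure only covers the server's own draw; any reduction in the load on the room's air conditioning would come on top of it.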

Despite a number of difficulties with setting up the equipment and the defect in the first delivery of radiators, we have to call our experience with the Tercon solution a success.
We managed to assemble a working passive cooling system that meets the stated specifications fairly quickly. The manufacturer's customer focus deserves a separate mention: its representatives took part in commissioning the equipment and in solving the problems with it.

Fan_Tema

An interesting idea. Personally, I have always liked this approach: take the heat out of the server with a heat pipe, then hand it off to a liquid cooling circuit.
But it does need prepared infrastructure.
A question about your implementation: is there a comparison of CTT with conventional heat pipes in terms of performance and price?

mickyrobert

What happens if there is a leak in the circuit?
What does monitoring cost?
Or is it just "an employee sniffed and noticed something was leaking"? Or "when our eyes start to sting, we begin diagnostics"? I did not see any wiring from built-in ammonia pressure sensors on any of the radiators.

The idea may be good, but the implementation is bullshit: there is still liquid in the machine room (fine, fine, it circulates under the raised floor, right next to the power bus), and every web server carries a potential, rather smelly and toxic ammonia "bomb". Needing two people to slide a server onto its rails is not great either (although that is rarely done, admittedly). Reworking a brand-name server, opening the case and so on, is also IMHO not great.

.. So its main heat sink is two 750-watt power supplies. The processors, like the web server itself, are old, and the cooling mode depends on the temperature.
In general, the tested ammonia version seems very exotic to me.

jackgrant1

If you really wanted a fanless server that badly, it would be quite logical to place the cases vertically, on their side, and change the orientation of the heatsink fins accordingly.
What is the point of building such a contraption in a server room where hundreds of fans are roaring all the time anyway?
I would understand it for home use.
I built myself a home server in a midi tower (one "big" motherboard, three Raspberry Pis and 7 disks). I put in three low-speed 140 mm intake fans behind filters, so the case is under positive pressure: it stays clean inside, and it doubles as an air purifier for the apartment. From 2-3 meters away you can no longer hear it; only the rustle of air is noticeable up close.