ARM or x86? Which servers are better suited for cloud hosting

Started by plccourses, Oct 09, 2022, 07:24 AM


plccourses (Topic starter)

Until recently, if you needed high performance, you had to choose an x86 processor for your solutions. Lately there has been more and more talk about the decline of the x86 architecture and a transition to ARM. In 2020, Apple introduced its first Macs built on ARM processors. But will ARM processors be suitable for data centers, and will ARM be able to send x86 to the bench for good?



Paradigm shift

When ordinary users started using smartphones and tablets en masse, it became obvious that x86 processors from Intel, AMD and VIA were simply not suited to these devices. Intel's instruction set is still not a popular choice for mobile processors, although engineers are trying to change that. In any case, in this market segment x86 is not as efficient as other architectures, namely processors based on ARM's 32-bit ARMv7 and 64-bit ARMv8 instruction sets.

Over the past decade, and especially over the past five years, ARM processors have taken a dominant position in the world of smartphones and tablets. They offer better performance per watt and are cheaper to design, to lay out on a board and to deploy. Large companies can license additional blocks (IP cores) for ARM processors, such as modems and graphics accelerators, or develop their own designs based on ARMv7 or ARMv8.

This has prompted some companies to develop their own custom ARM processor cores.
Qualcomm and Apple have become leaders in System-on-Chip (SoC) development, and the design of complex, non-standard processor cores has played a crucial role in their success. However, custom ARM cores were used only in high-performance systems, while the rest of the market used standard ARM cores.

One of the prerequisites for ARM's success was Microsoft's failure in the smartphone and tablet market

It once seemed that nothing could break the Windows+Intel (Wintel) bundle. Windows running on Intel processors reliably dominated corporate and home desktops. For a long time Microsoft had been developing the Windows Mobile platform on top of Windows CE, an OS designed for embedded systems ranging from POS terminals to industrial single-board computers. Windows Mobile devices ran on RISC processors.

When Google's Android came on the scene, it became clear that the Windows Mobile platform was hopelessly outdated. Expensive Windows Mobile smartphones gave way to cheaper Android ones, all the way down to no-name Chinese devices cobbled together on the cheap. It was the openness of the platform, the mature Google services and the low price of devices that made ARM processors a mass-market product.

Trying to jump on a departing train, Microsoft announced a new platform, Windows Phone, to replace Windows Mobile. In 2015, according to IDC, the share of Windows smartphones did not exceed 2.2 percent. Disappointed by this turn of events, Intel tried to enter the smartphone market with Android.
By that time, the leading smartphone manufacturers had already built their business on ARM processors. None of them wanted to work with Intel because of the high risks of putting an x86 processor into a smartphone. At that time, the Google Play Store offered applications only for the ARM architecture. Smartphones on Intel Atom consumed significantly more power and cost the consumer more. As a result, a few such smartphones were released, but they never became popular. By the end of the last decade, it was obvious that Microsoft and Intel had ceded this lucrative market to ARM, Google and Apple. Unfortunately, the management of both companies failed to recognize the potential of smartphones and tablets in time.

Nevertheless, mobile is neither the first nor the only market segment to have witnessed a Microsoft failure. The other is the web server market. At first glance, smartphones and data centers do not have much in common, but from a technological and business point of view they partially overlap.

Regardless of whether you are developing a smartphone or a server, you need to prioritize similar aspects of your hardware platform: energy efficiency, low heat generation, the performance/price ratio and so on. Most importantly, you don't really need an x86-based processor for smartphones or for many types of servers. Thanks to Microsoft's failures, Windows does not dominate these market segments. Instead, Linux- and Unix-based operating systems are used: Android, iOS and various Linux distributions.

Why did ARM servers seem like a good idea from the beginning?

Currently, several billion smartphones and tablets running on ARM processors are in use. In addition to being used in mobile devices, ARM processors are used in a bunch of other devices: routers, set-top boxes and smart TVs, smart watches, some gaming devices, automotive infotainment systems, etc.

Despite this popularity, ARM processors are hard to find in other market segments. When we talk about servers, we usually mean x86 processors: Xeon from Intel and EPYC from AMD.

So what makes ARM servers better than x86? ARM servers are cheaper. This thesis can be broken down into several points:

    A better performance/price ratio.
    It is easier to scale the load.
    Competition and a choice of ARM processors from various manufacturers.
    Any computing unit, for example an FPGA, can be integrated into the processor.
    A great way to reduce dependence on Intel.

In an x86 server, the most expensive component is the Intel Xeon processor. Replacing the Intel CPU with an ARM one can therefore cut costs significantly, with reduced heat generation as a bonus.

Outdated x86 processors are inefficient in terms of energy consumption: they draw and dissipate as much power as newer, more productive ones. For a home computer or an office PC, heat dissipation is not a big problem. But when designing a data center, the first thing to evaluate is the TDP (thermal design power, the amount of heat the cooling system must be able to remove), because servers with a low TDP are cheaper and easier to maintain.
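As a rough illustration of why TDP matters at data-center scale, the sketch below estimates the yearly electricity bill from a CPU's TDP. The TDP values, the fleet size, the electricity price and the PUE factor (overhead for cooling and power delivery) are all made-up assumptions, and treating TDP as constant power draw is a simplification.

```python
HOURS_PER_YEAR = 24 * 365

def yearly_energy_cost(tdp_watts, servers, price_per_kwh, pue=1.5):
    """Estimate yearly electricity cost, assuming CPUs always draw full TDP.

    pue is a hypothetical power usage effectiveness factor: total facility
    power divided by IT power (cooling and power delivery overhead).
    """
    kwh = tdp_watts / 1000 * HOURS_PER_YEAR * servers * pue
    return kwh * price_per_kwh

# Hypothetical fleet of 1000 single-socket servers at 0.10 per kWh.
high_tdp = yearly_energy_cost(tdp_watts=200, servers=1000, price_per_kwh=0.10)
low_tdp = yearly_energy_cost(tdp_watts=100, servers=1000, price_per_kwh=0.10)

print(f"200 W parts: {high_tdp:,.0f} per year")
print(f"100 W parts: {low_tdp:,.0f} per year")
```

In this simple model the bill scales linearly with TDP, so halving TDP halves the electricity cost before any savings on the cooling hardware itself.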

How many cores are enough for a processor?

Just ten years ago, multi-core x86 processors were the preserve of desktop computers and servers, but now you can get a quad-core Intel Atom (x86) processor in a hundred-dollar tablet.

Before the advent of multi-core processors, there was the era of the frequency race: to increase performance, the processor frequency was raised. This led to a significant increase in power consumption and heat generation. Eventually a ceiling was reached, beyond which the heat could no longer be removed without exceeding reasonable system dimensions. With very high heat output you can switch to liquid cooling or even liquid nitrogen, but such an approach is unlikely to become widespread.

The era of Gigahertz was replaced by the era of Multicore. As it turned out, from a technological and financial point of view, building multi-core systems is better than raising the frequency of the processor.

It is much easier to distribute the load on several smaller and more efficient processor cores than to develop one huge core capable of operating at high frequencies. The multi-core approach provides superior efficiency and is simpler in chip design.
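The efficiency argument can be sketched with the textbook approximation for dynamic CMOS power, P = C * V^2 * f. The sketch assumes that supply voltage has to rise roughly in proportion to frequency and that the workload parallelizes perfectly; both are simplifications, and the capacitance and voltage values are arbitrary units.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Textbook approximation of dynamic CMOS power: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency

def core_power(relative_frequency, base_voltage=1.0, capacitance=1.0):
    """Power of one core, assuming voltage scales linearly with frequency."""
    voltage = base_voltage * relative_frequency
    return dynamic_power(capacitance, voltage, relative_frequency)

one_fast_core = core_power(2.0)        # one core at 2x frequency
two_slow_cores = 2 * core_power(1.0)   # two cores at 1x frequency

print("one core at 2x frequency: ", one_fast_core)
print("two cores at 1x frequency:", two_slow_cores)
```

Under these assumptions both configurations offer the same nominal throughput, but the single fast core needs four times the power of the two slower cores.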

In terms of core count, the ARM architecture has huge potential: it is possible to build an ARM processor with 128 or 256 cores. x86 does not have such impressive potential. However, this does not mean that we will have such processors tomorrow, because the limiting factor here is data bus bandwidth.

Qualcomm built its first server processor with 24 ARMv8 cores and made it clear that future models would have even more. Qualcomm is not the only company producing ARM processors for servers.
Manufacturers such as Broadcom, Calxeda, Cavium Networks and Huawei HiSilicon are also working on ARM-based server solutions. Nvidia and Samsung, two heavyweights in the SoC and GPU business, are developing their own as well.

What are custom ARM cores?

Let's look at the differences not from the point of view of x86 and ARM instruction sets, but from the point of view of business. The biggest difference between Intel, AMD and ARM is a fundamentally different business model. Over the years of development, ARM's approach to marketing and licensing its technology has not changed.

Consider how an Intel processor is made. It is designed around Intel's own instruction set, manufactured at one of Intel's factories, then packaged and shipped under the Intel Inside brand. It may sound simple, but we must not forget the billions spent on research and development (R&D) over decades.

What about ARM processors? ARM is not a chip manufacturer but a company that develops and designs ARM cores, so you will not see processors under its own brand anywhere. ARM sells something much more interesting: intellectual property.
This means that ARM customers can choose from many different licensing plans and start building their own processors. Most of them choose ARM's own designs (Cortex series processors, Mali series GPUs) and pay a license fee for each CPU/GPU core produced.

However, a customer does not have to license off-the-shelf cores; instead, it can take an architecture license and develop its own core based on the ARM instruction set. This is what Apple does: it uses the ARMv8 instruction set to create large and powerful 64-bit processor cores for its devices.

Designing your own processor core is not easy. Therefore, most companies use standard ARM Cortex cores instead (for example, a 64-bit Cortex-A76 core can be used in a server environment).

It is important to note that ARM-based chips are designed individually, while the processor cores used in most of them are not.

The vast majority of ARM processors are based on standard ARM cores (Cortex processors), rather than custom processor cores. End processor manufacturers can choose any of a number of ARM cores, third-party GPUs and other components and adapt the processor to suit their needs without having to develop their own ARM core. This is a fairly cheap way to make the architecture more flexible, and it has more to do with ARM licensing policy than with development.

It is also important to note that ARM servers based on the 64-bit ARM architecture offer the same set of peripherals as x86 ones. Many ARM processors already support interfaces such as PCIe, SATA and NVMe.

ARM Neoverse processors for data centers

ARM is actively expanding the segments in which its processors are used. Aiming to challenge Intel's leadership in high-performance systems, ARM has developed the ARM Neoverse processor, designed specifically for data centers.
The processor is based on the Cortex-A76 core, with an operating frequency of up to 3 GHz (in practice up to 2.6 GHz). The design comes in two variants: Neoverse N1 and Neoverse E1. It is expected that in 2021, factories producing the Neoverse processor will switch production from 7 nm to 5 nm.

The basis of SoC on Neoverse N1 cores is the Component Aggregation Layer (CAL). It is designed to work in a mesh topology and contains two interfaces. The first interface addresses the network layer, consisting of numerous intersections of the XP (Crosspoint) mesh network, which are switches or routers. The second CAL interface addresses the System Level Cache. The mesh network operates at 2/3 of the CPU core frequency and is optimized for minimal delays. The bandwidth has been more than doubled compared to the Cortex A72.

ARM offers several reference platforms based on Neoverse N1. The N1 Edge design targets network and cloud solutions and data storage systems; here ARM uses up to eight Neoverse N1 cores. The N1 Hyperscale design is meant to show the full potential of the architecture: it uses 64 to 128 N1 cores in an 8x8 mesh topology. The System Level Cache is 64 or 128 MB. 128 PCI Express 4.0 lanes and an 8-channel memory interface are also available.

The SoC is built from "CPU tiles", each consisting of two N1 cores. Eight such tiles form a "Super Tile". These are the building blocks of the N1 Hyperscale design, which can have up to 128 cores. The mesh network will operate at frequencies between 1.6 and 2.0 GHz. If that is 2/3 of the core frequency, the cores themselves will run at 2.4 to 3.0 GHz.
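A quick sanity check of the tile and clock arithmetic in this section, sketched in Python. It assumes that every crosspoint of the 8x8 mesh hosts one two-core tile (an assumption, not something stated above) and takes the 2.4 to 3.0 GHz core range and the 2/3 mesh ratio as given.

```python
from fractions import Fraction

CORES_PER_TILE = 2           # from the text: each CPU tile holds two N1 cores
TILES_PER_SUPER_TILE = 8     # from the text: eight tiles form a Super Tile
MESH_RATIO = Fraction(2, 3)  # mesh clock as a fraction of the core clock

def cores_in_mesh(rows, cols):
    """Total cores if every mesh crosspoint hosts one CPU tile (assumption)."""
    return rows * cols * CORES_PER_TILE

def mesh_clock_ghz(core_clock_ghz):
    """Mesh clock implied by a core clock (given as a string) at the 2/3 ratio."""
    return Fraction(core_clock_ghz) * MESH_RATIO

print("cores in an 8x8 mesh:", cores_in_mesh(8, 8))
print("cores per Super Tile:", CORES_PER_TILE * TILES_PER_SUPER_TILE)
for core in ("2.4", "3.0"):
    print(f"core {core} GHz -> mesh {float(mesh_clock_ghz(core)):.1f} GHz")
```

Exact fractions are used instead of floats so that the 2/3 ratio does not pick up rounding noise.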

[Figure: mesh (tile) structure of the ARM Neoverse N1.]

Such a SoC will consume 105 watts (in the 64-core version). However, depending on the scenario, it can significantly outpace Intel and AMD processors with 28 and 32 cores in performance, so efficiency should be higher.

The Neoverse E1 processor is focused primarily on maximum throughput: ARM has prioritized the processing of large volumes of data. It is also the first ARM server processor with SMT (Simultaneous Multi-Threading) support.

The caches and fetch units have been reworked to handle several threads executing simultaneously. The Neoverse E1 design provides up to 16 cores per SoC. More is theoretically possible, but many components, including the System Level Cache, are tuned for a cluster of 2x 8 cores. Such a processor works with DDR4-3200 memory via a dual-channel interface.
The power consumption of the entire SoC is less than 15 watts, and the cores themselves need only 4 watts, which once again shows that other components, such as network interfaces, make a significant contribution to the power budget.
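To put the E1 figures in perspective, here is a small back-of-the-envelope sketch. It assumes a standard 64-bit data bus per DDR4 channel and theoretical peak transfer rates; real sustained bandwidth is lower. The 15 W value is used as an upper bound for the SoC, so the computed core share is a lower bound.

```python
def ddr_peak_bandwidth_gbs(transfers_mts, channels, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s for a DDR memory interface.

    Assumes a standard 64-bit (8-byte) data bus per channel.
    """
    return transfers_mts * bus_bytes * channels / 1000

E1_CORES = 16
total_bw = ddr_peak_bandwidth_gbs(transfers_mts=3200, channels=2)
per_core_bw = total_bw / E1_CORES

# Cores need about 4 W out of an SoC budget of less than 15 W.
core_power_share = 4 / 15

print(f"peak bandwidth: {total_bw:.1f} GB/s total, {per_core_bw:.1f} GB/s per core")
print(f"cores account for at least {core_power_share:.0%} of SoC power")
```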

Neoverse E1 and Neoverse N1 show how serious ARM's ambitions in the server market are. ARM is not going to settle for a niche segment; the company plans to compete with Intel and AMD. Huawei has clearly shown in recent months that developing its own server processors based on ARM designs is a very promising direction. ARM has turned its Cortex designs into the specialized Neoverse E1 and Neoverse N1 server platforms.

An interesting battle of concepts awaits us in the next two years. But the server market has a lot of inertia, and replacing platforms may take several years.

Forrester's report on the effectiveness of the transition to ARM Neoverse

In the middle of last year, the research and consulting company Forrester, commissioned by ARM, presented a report assessing the financial benefits of switching to Neoverse processors. To better understand the benefits, costs and risks associated with migration, Forrester interviewed several customers who use Neoverse, as well as two related cloud service providers.

Switching to ARM servers.

The study found the following:

    Initial infrastructure build-out costs are reduced by 30-60%. Thanks to the compact form factor of ARM modules and lower heat dissipation compared to x86, the density of ARM cores per server can be increased significantly. As a result, fewer servers need to be installed. In addition, ARM-based hardware is 20% cheaper.
    Operating costs are reduced by 10-15%, because ARM servers require less cooling and there are fewer of them.
    Switching to a cloud infrastructure running on ARM processors saves up to 80% of costs.


Based on the current costs of the surveyed companies, a 400-core deployment was modeled over a 3-year cost horizon for ARM and for traditional (x86) servers.

Over 3 years of ownership, ARM servers can deliver 40% financial savings, a strong incentive for large IT companies to make the transition.
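To see how the capital and operating percentages above might combine into a 3-year figure, here is a minimal sketch. The baseline x86 costs (600 up front, 100 per year) are invented purely for illustration and do not come from the Forrester report; only the percentage cuts are taken from the list above.

```python
def total_cost(capex, opex_per_year, years=3):
    """Total cost of ownership: up-front cost plus yearly operating cost."""
    return capex + opex_per_year * years

def arm_savings(capex, opex_per_year, capex_cut, opex_cut, years=3):
    """Fraction saved over the period when capex and opex are cut as given."""
    x86 = total_cost(capex, opex_per_year, years)
    arm = total_cost(capex * (1 - capex_cut), opex_per_year * (1 - opex_cut), years)
    return (x86 - arm) / x86

# Hypothetical x86 baseline in arbitrary currency units.
BASE_CAPEX, BASE_OPEX = 600, 100

low = arm_savings(BASE_CAPEX, BASE_OPEX, capex_cut=0.30, opex_cut=0.10)
high = arm_savings(BASE_CAPEX, BASE_OPEX, capex_cut=0.60, opex_cut=0.15)

print(f"pessimistic case: {low:.1%} saved over 3 years")
print(f"optimistic case:  {high:.1%} saved over 3 years")
```

With this particular baseline, the quoted 40% sits near the optimistic end of the range; a different split between up-front and operating costs would shift the result.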

ARM processors are better suited for specialized enterprise applications

In enterprise application development, performance optimization ultimately comes down to better algorithms. For example, if text needs to be searched faster, indexes and various algorithms are used. Either way, there is a ceiling on performance gains, and new algorithms are hard to come up with. All of this holds in the world of x86 processors.

With an ARM processor, better results can be achieved by moving software logic into hardware. The algorithm is executed not on a general-purpose processor but on a special coprocessor working in tandem with the main ARM core. One example is Google, which has developed its own server design. Another is Amazon, with its specialized servers for web hosting services.

AWS Graviton — specialized processor for Amazon Web Services

Amazon was one of the first companies to decide to develop processors for its own use. The new processors were meant to replace x86 solutions for some services.

AWS Graviton processors are based on Amazon's original design using 64-bit Arm Neoverse cores and provide optimal value for money for Amazon EC2 cloud loads.

First-generation AWS Graviton processors power Amazon EC2 A1 instances, the first AWS instances based on the ARM architecture. These instances can cost significantly less than other general-purpose instances for horizontally scalable applications such as web servers, containerized microservices and log processing tools, as well as other workloads that can run on smaller cores and fit within the available memory.

AWS Graviton2 processors offer higher performance and more advanced capabilities than first-generation AWS Graviton processors.
They power Amazon EC2 T4g, M6g, C6g and R6g instances, as well as their variants with local NVMe-based SSD storage. According to AWS, these instances deliver up to 40% better price/performance than comparable current-generation x86 instances and are designed for a wide variety of workloads, including application servers, microservices, high-performance computing, electronic design automation, gaming, open source databases and in-memory caching.

Graviton2 instances are used by Snap Inc., the company behind the Snapchat and Bitmoji services. According to one of the company's software engineers, moving to Graviton2-based C6g instances reduced CPU usage by about 10% thanks to better performance.

Moving to Graviton2-based M6g instances allowed Netflix, the largest streaming service, to increase the performance of Java applications by up to 50% compared to M5 instances (Intel Xeon Scalable or AMD EPYC).

ARM servers, pros and cons

ARM servers are usually used in narrow niches and make little sense for small development teams, which can get by with any server. A small business is more likely to choose an x86 server than an ARM one. Moreover, for a small system the final cost of x86 and ARM servers will be almost the same.

However, despite this, the benefits of using ARM servers will be as follows:

    The total cost of ownership over a long period of time will be less due to lower equipment and electricity costs.
    Flexible extension of functional modules.
    The ability to switch to new technologies and industry standards due to the faster entry of ARM processors into the market compared to x86
    Load scales better
    The potential of choosing the best offer due to greater competition and variety of products than in the x86 market.

Problems with using ARM servers:

    The necessary software for the ARM architecture is not always available
    So far there are few offers to choose from
    There are significantly fewer specialists in the labor market
    It is difficult to migrate from x86

Software-related issues are likely to be the biggest problem. Although many popular applications run on ARM servers, software support will remain a challenge. It is not enough to simply move everything to new hardware: you also need to make sure that services run smoothly and that service quality does not drop.
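A first step in any migration is simply finding out which parts of the stack have no ARM build. The sketch below is one hypothetical way to do that in Python: `platform.machine()` is the real standard-library call, while the package names and the inventory dictionary are invented for illustration.

```python
import platform

# Common spellings reported by different operating systems (not exhaustive).
_ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "arm64",
    "arm64": "arm64",
}

def normalize_arch(machine):
    """Map a platform.machine() string to 'x86_64', 'arm64' or 'unknown'."""
    return _ARCH_ALIASES.get(machine.lower(), "unknown")

def missing_for_arch(required, available_by_arch, arch):
    """Return required packages that have no build for the given architecture.

    available_by_arch is a hypothetical inventory: {arch: set of package names}.
    """
    return sorted(set(required) - available_by_arch.get(arch, set()))

# Invented inventory: one legacy package exists only as an x86_64 build.
inventory = {
    "x86_64": {"webapp", "legacy-billing", "cache"},
    "arm64": {"webapp", "cache"},
}
required = ["webapp", "legacy-billing", "cache"]

print("this host is:", normalize_arch(platform.machine()))
print("missing on arm64:", missing_for_arch(required, inventory, "arm64"))
```

In practice the inventory would come from a package repository or container registry rather than a hard-coded dictionary.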

Another obvious problem is support for legacy software. It is fine when a company develops its own product and can port it to the ARM architecture in the future.
But what do you do if you rely on a product from a company that no longer exists, or one whose support has been discontinued?

Many software developers take a wait-and-see attitude towards the ARM architecture, because there are no requests from large customers for ARM support yet.

And what about old habits? Programmers have a rule: if it works, don't touch it. Since server technologies do not evolve quickly, people tend to stick to proven platforms, namely x86 hardware.
So, any use of ARM servers creates an extra risk, and a fair amount of courage and confidence will be required for the transition.

Bravoman1

For now, ARM servers for the enterprise look like a completely unclear proposition.
It's not even about the different architecture: for server applications there are Linux distributions, compiled applications and, of course, a JVM. If there were a clear gain, the sharks of capitalism would quickly flock to the smell of money.
Independent benchmarks (for example) so far show no "wow!" of any kind (not even in some particular area, as with the same YERUS). We have been hearing promises that "tomorrow all servers will be on ARM" for 10 years ("wolf, wolf"). The effect that Apple got with the M1 hardly applies here (let's start with the fact that it has overclocked soldered memory).

Enterprises really hardly care what they run the same microservices or standard applications on; if ARM is profitable, it will quickly (in 5-15 years) replace the bulk of servers.

jamesanderson11

I've been hearing these fables about ARM conquering the server market ever since ThunderX appeared, and that was 2013.
In that time, AMD has managed to rise from the dead. I see EPYC-based servers all around; they come from different manufacturers, and vdsina advertises them. ARM servers, meanwhile, you still have to hunt for.

Getting a decent ARM workstation also takes effort, and you need one for debugging and porting software. Few people will enjoy developing on an RPi.
The only ones I've personally had my hands on are Huawei servers on their own ARM processor. In general, as a finished product, they make a good impression.
Sufficient performance plus networking with RDMA. But there is vendor lock-in, and the prospects are unclear.
Until some large hardware company, NVIDIA for example, with the backing of a large software company, brings an OEM-ready CPU + chipset line to market, I doubt ARM will succeed in the server segment.

Articru

ARM's recent "victories" include the closure of the ThunderX3 project at Marvell and the dismissal of the entire team, as well as the cancellation of the server line at Nuvia after Qualcomm bought it.
The fact is that the 2012-vintage tales about ARM, which was about to come and bite off a share of the data center, have crashed into cruel reality: worse energy efficiency (performance comparable to x86 comes at the price of a higher TDP), worse scalability (not every load can be spread across weak cores), poor FP performance, near-zero compatibility (a server CPU is not yet another Mediatek that can be thrown together from whatever was at hand), unfinished software, and so on and so forth.
The only ones still running around LinkedIn waving their ARMs are delusional figures like John Masters.