Arc A770/A750 follow-up test: Faster than RTX 3060 Ti and RX 6700 XT in ray tracing?

Intel Arc Alchemist Ray Traversal Pipeline Acceleration

A few days ago, Intel’s Arc graphics cards made their official debut. In our detailed Intel Arc review, we covered the strengths and weaknesses of the brand-new Xe architecture in its “High Performance Gaming” (HPG) variant. The Arc A770 and A750 do in fact deliver plenty of performance in modern games. Arc shows itself in the best light, technically and figuratively speaking, with ray tracing. But why is that, and what level of ray tracing performance can be expected in general? We take a closer look below, with more games and some technical background.

Arc Alchemist: Radiant Man

The ACM-G10, currently Intel’s fastest graphics processor and the engine of the Arc A7 graphics cards, packs an impressive 21.7 billion transistors into a die area of 406 square millimeters. That is a generous budget for a mid-range chip built on the modern TSMC N6 process, and connoisseurs wonder why the chip delivers relatively little performance despite so many transistors. The answer lies both under the hood and in the software.

Looking at the hardware innards, it’s clear that Intel’s engineers invested a lot of transistors in advanced features. As a reminder: the last Nvidia architecture with a pure rasterizer focus is Pascal, which saw the light of day in 2016. Since Turing (2018), the Geforce maker has been installing dedicated units that some players still consider a waste of die space: ray tracing units and auxiliary arithmetic units that perform specialized tasks at very high throughput (“tensor cores”). After a few lean years, both ray tracing and hardware-based upscaling have picked up speed, and the industry agrees: this is the future.

AMD also implemented ray tracing units with the RDNA 2 architecture in 2020, but these turned out to be minimalistic, and dedicated matrix cores are missing altogether. That made the presentation of Xe HPG in 2021 particularly interesting. Intel’s graphics division takes the same line as Nvidia, with powerful ray tracing cores and XMX units (“Xe Matrix Extensions”) in the style of Tensor cores. The latter have their origins in the professional field, but machine learning has long been more than a short-lived trend in the gaming segment. The XMX units are meant to relieve the chronically overloaded FP32 ALUs (vector units) by way of clever upsampling, for which Intel has created XeSS in the style of DLSS.

Intel Xe HPG: Arc-itecture

If you follow Intel’s explanations, Arc’s ray tracing units roughly correspond to their Nvidia counterparts in Ampere, which remain the best of the best until the release of the first Ada graphics card (Geforce RTX 4090). They include relatively inflexible but fast units for specific purposes, so-called fixed-function units. In Turing, Ampere and ACM-G10, these accelerate traversal of the ray tracing data structure, the BVH (Bounding Volume Hierarchy), including the intersection tests, and keep the data in a dedicated cache. That’s not all: Arc also includes an ingredient that Nvidia is only introducing with the Ada silicon, the so-called Thread Sorting Unit (TSU). There is one in every Xe-Core, and it is intended to pre-sort the incoherent stream of ray tracing work into sequences that execute more efficiently. Nvidia calls this function Shader Execution Reordering (SER) and states that the unit must be specifically addressed by developers. Intel, on the other hand, emphasizes that this happens automatically, which is a clear advantage, though probably not quite as efficient as reordering controlled by game code.
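The benefit of such pre-sorting can be illustrated with a toy model. The sketch below is neither Intel's TSU nor Nvidia's SER interface; it assumes a hypothetical execution group of eight threads and four hypothetical hit shaders, and merely counts how many shader passes are needed before and after the rays are sorted by the shader they require.

```python
GROUP_WIDTH = 8  # hypothetical SIMD group width, chosen only for illustration


def groups(shader_ids, width=GROUP_WIDTH):
    """Split a stream of per-ray shader IDs into fixed-size execution groups."""
    return [shader_ids[i:i + width] for i in range(0, len(shader_ids), width)]


def shader_passes(shader_ids, width=GROUP_WIDTH):
    """Toy cost model: a group runs once for every distinct shader it contains."""
    return sum(len(set(group)) for group in groups(shader_ids, width))


def sort_threads(shader_ids):
    """Reorder rays so that rays needing the same shader sit next to each other."""
    return sorted(shader_ids)


# 64 rays whose hits alternate between four hypothetical shaders (worst case).
hits = [i % 4 for i in range(64)]

unsorted_cost = shader_passes(hits)               # every group mixes all 4 shaders
sorted_cost = shader_passes(sort_threads(hits))   # every group holds one shader

print(unsorted_cost, sorted_cost)  # prints "32 8"
```

In this simplified model, sorting cuts the number of shader passes from 32 to 8, because each group then executes a single shader instead of four. Real hardware also has to weigh the cost of the reordering itself, which the model ignores.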

Graphics card                           Arc A770 16GB  Arc A770 8GB  Arc A750      Geforce RTX 3060  Radeon RX 6650 XT
GPU codename                            ACM-G10        ACM-G10       ACM-G10       GA106-300         Navi 23 KXT
Manufacturing process & foundry         N6 TSMC        N6 TSMC       N6 TSMC       8N Samsung        N7P TSMC
Transistors (million)                   21,700         21,700        21,700        12,000            11,060
Die size (mm²)                          406            406           406           276               237
Typical core clock (MHz)                2,100          2,100         2,050         1,777             2,410
Memory clock (MHz) / data rate (GT/s)   8,750 / 17.5   8,000 / 16.0  8,000 / 16.0  7,501 / 15.0      8,750 / 17.5
SIMDs (Xe Cores, CUs, SMs)              32             32            28            28                32
FP32 ALUs (SP)                          4,096          4,096         3,584         3,584             2,048
Ray tracing units                       32             32            28            28                32
FP32 throughput (GFLOPS, SP)            17,203         17,203        14,694        12,738            9,871
Level 2 cache (MiByte)                  16             16            16            2.25              2
Level 3 cache (MiByte)                  -              -             -             -                 32
Memory interface (bit)                  256            256           256           192               128
Memory bandwidth (GByte/s)              560            512           512           360               280
Memory configuration (MiByte)           16,384         8,192         8,192         12,288            8,192

The throughput figures assume the typical GPU boost clock stated by the manufacturer. In practice, the frequency varies (it is usually higher), and the throughput varies with it.
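The derived figures in the table follow from simple arithmetic. As a minimal sketch, assuming the usual convention that each FP32 ALU executes one fused multiply-add (two floating-point operations) per clock, the FP32 throughput, the memory transfer rate and the transistor density can be recomputed from the table values. The function names and the dictionary layout are illustrative only.

```python
def fp32_gflops(alus, clock_mhz):
    """FP32 throughput, assuming one FMA (2 FLOPs) per ALU and clock cycle."""
    return alus * 2 * clock_mhz / 1000


def bandwidth_gbyte_s(bus_bits, data_rate_gt_s):
    """Memory transfer rate: bus width in bytes times transfers per second."""
    return bus_bits / 8 * data_rate_gt_s


def density_mtr_per_mm2(transistors_million, die_mm2):
    """Transistor density in million transistors per square millimeter."""
    return transistors_million / die_mm2


# Values copied from the specification table above.
cards = {
    "Arc A770 16GB":     dict(alus=4096, mhz=2100, bus=256, gts=17.5, tr=21700, die=406),
    "Arc A750":          dict(alus=3584, mhz=2050, bus=256, gts=16.0, tr=21700, die=406),
    "Geforce RTX 3060":  dict(alus=3584, mhz=1777, bus=192, gts=15.0, tr=12000, die=276),
    "Radeon RX 6650 XT": dict(alus=2048, mhz=2410, bus=128, gts=17.5, tr=11060, die=237),
}

for name, c in cards.items():
    print(
        f"{name}: {fp32_gflops(c['alus'], c['mhz']):,.0f} GFLOPS, "
        f"{bandwidth_gbyte_s(c['bus'], c['gts']):.0f} GByte/s, "
        f"{density_mtr_per_mm2(c['tr'], c['die']):.1f} MTr/mm²"
    )
```

For the Arc A770 16GB this yields about 17,203 GFLOPS and 560 GByte/s, matching the table, and roughly 53 million transistors per square millimeter, compared with about 43 for the GA106 and about 47 for Navi 23.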

Long story short: Arc is well prepared for ray tracing on the hardware side. What is holding Intel’s young Alchemist back is the driver, even under DirectX 12 and Vulkan, the current ray tracing APIs. Although the driver has less control here than under DirectX 11 and older, its influence as the mediator between GPU and API is not insignificant. Intel’s graphics boss Raja Koduri explains this in the latest Intel talk about Arc and thereby gives hope for big leaps in performance, since the driver team is far from having tapped the hardware’s potential. This brings us to practice: how do the Arc A770 and A750 perform in ray tracing?

Source: www.pcgameshardware.de