Yesterday, AMD unveiled a key technology for its Heterogeneous System Architecture (HSA for short) called Heterogeneous Queuing (hQ). AMD considers hQ to be a building block for making HSA designs efficient. The first HSA-enabled design aimed at consumers is codenamed Kaveri and is due out towards the end of the year.

AMD hQ Overview

HSA is all about using all the computing potential available in an APU chip. Some workloads are better suited to traditional CPU cores, while others work better on GPU cores. In pre-HSA designs you would have to move data back and forth between the CPU and the GPU, which is a costly operation, so costly that it could eat up the entire speedup gained from GPU processing. HSA aims to unify the memory spaces of the CPU and GPU to allow for better sharing of data among different types of cores. This is called hUMA (Heterogeneous Uniform Memory Access). However, this doesn’t quite solve the problem of handing work from the CPU to the GPU; it just shifts the bottleneck to another part of the process.
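To put rough numbers on how copy overhead can cancel out a GPU speedup, here is a minimal sketch. The timings are made up for illustration and the function name `effective_speedup` is hypothetical; nothing here comes from AMD's material.

```python
def effective_speedup(cpu_ms, gpu_ms, transfer_ms):
    """Speedup of offloading to the GPU once data transfer time is included."""
    return cpu_ms / (gpu_ms + transfer_ms)


# Hypothetical timings in milliseconds (assumptions, not measurements).
cpu_only_ms = 100.0   # running the workload on the CPU alone
gpu_kernel_ms = 10.0  # the same workload on the GPU, compute only
copy_ms = 90.0        # moving the data to the GPU and back

# With no copy cost the GPU is 10x faster; with the copy the gain is gone.
print(effective_speedup(cpu_only_ms, gpu_kernel_ms, 0.0))      # 10.0
print(effective_speedup(cpu_only_ms, gpu_kernel_ms, copy_ms))  # 1.0
```

In this made-up case the GPU kernel is ten times faster than the CPU, yet the offload as a whole is no faster than staying on the CPU.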

Traditional approach to packet queuing

Old.

AMD hQ - The New Way of Queuing

New.

This is where hQ comes in. Traditionally, the CPU dispatches tasks to the GPU through a kernel driver, which requires switching between user mode and kernel mode, itself a costly operation. With hQ, the CPU can dispatch tasks directly to the GPU cores without going through a middleman. Not only that, but the GPU can also spawn tasks itself, for either GPU or CPU computation. The tasks are placed into task queues, one set per application. This is a low-latency process and thus allows workloads to use different types of cores efficiently. There is also a new hardware scheduler that gives different applications fair access to the HSA resources. Potentially, this allows heterogeneous computing to be applied to workloads that previously would have been hamstrung by the overhead of transferring data back and forth.
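The queuing idea can be pictured with a small toy model: each application owns a ring buffer that any producer can write to directly, and a scheduler drains the queues in turn. This is only a sketch in plain Python under stated assumptions. The class `UserModeQueue`, its fields and the round-robin loop are hypothetical stand-ins, not AMD's hardware design or any HSA API.

```python
class UserModeQueue:
    """Toy fixed-size ring buffer shared by a producer and a consumer."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.write_index = 0  # advanced by whoever enqueues (CPU or GPU)
        self.read_index = 0   # advanced by the consumer as it drains packets

    def enqueue(self, packet):
        """Write a packet into the next free slot; no driver call is modeled."""
        if self.write_index - self.read_index == self.capacity:
            return False  # queue is full, the caller has to retry later
        self.slots[self.write_index % self.capacity] = packet
        self.write_index += 1  # stands in for notifying the consumer
        return True

    def dequeue(self):
        """Return the oldest pending packet, or None if the queue is empty."""
        if self.read_index == self.write_index:
            return None
        packet = self.slots[self.read_index % self.capacity]
        self.read_index += 1
        return packet


# One queue per application (names and tasks are made up).
queues = {"app_a": UserModeQueue(4), "app_b": UserModeQueue(4)}
queues["app_a"].enqueue("blur_image")
queues["app_a"].enqueue("sharpen_image")
queues["app_b"].enqueue("sort_keys")

# A round-robin pass standing in for the hardware scheduler:
# each application gets one packet served per turn.
order = []
pending = True
while pending:
    pending = False
    for name, queue in queues.items():
        packet = queue.dequeue()
        if packet is not None:
            order.append((name, packet))
            pending = True

print(order)
```

Even though app_a queued two tasks before app_b queued one, the round-robin pass serves app_b's task second, which is the kind of fairness the hardware scheduler is meant to provide.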

Close-To-Metal (CTM) coming alive - direct access to hardware

The packets are standard - you don't need to program specially for any particular vendor.

hQ is enabled by a standardized task dispatch packet format, part of the HSA Architected Queuing Language, that is the same across HSA vendors. In theory, this would make it possible to mix and match different HSA-enabled components to create new chips with unique capabilities. In the case of AMD, this will mostly be restricted to x86 CPU cores and GPU cores, and possibly ARM CPU cores, depending on how well its ARM business does after its initial launch. AMD makes it clear that this technology is scalable from handheld devices to big iron servers in datacenters.
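To illustrate why a fixed packet format matters, here is a minimal sketch in which a producer and a consumer agree only on a byte layout. The layout, the field names and the helpers `encode_packet` and `decode_packet` are invented for this example; they are not the actual Architected Queuing Language definition.

```python
import struct

# Hypothetical layout for illustration only, NOT the real AQL packet.
# Little-endian, no padding: 2-byte packet type, 2-byte reserved field,
# 4-byte grid size, 8-byte kernel handle, 8-byte argument address.
PACKET_FORMAT = "<HHIQQ"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)  # 24 bytes


def encode_packet(packet_type, grid_size, kernel_handle, args_address):
    """Producer side: serialize a dispatch request into the shared layout."""
    return struct.pack(PACKET_FORMAT, packet_type, 0, grid_size,
                       kernel_handle, args_address)


def decode_packet(raw):
    """Consumer side: any component that knows the layout can read it back."""
    packet_type, _reserved, grid_size, kernel_handle, args_address = \
        struct.unpack(PACKET_FORMAT, raw)
    return {
        "packet_type": packet_type,
        "grid_size": grid_size,
        "kernel_handle": kernel_handle,
        "args_address": args_address,
    }


# Made-up values: type 1, 1024 work items, and two placeholder addresses.
raw = encode_packet(1, 1024, 0x1000, 0x2000)
print(len(raw), decode_packet(raw))
```

Because both sides depend only on the agreed layout, the producer and the consumer could come from different vendors, which is the point of standardizing the packet.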

AMD HSA Architecture overview

AMD will disclose additional details about HSA and related technologies at APU13, AMD’s Developer Summit, taking place November 11-13 at the San Jose Convention Center.