Qualcomm has announced its Snapdragon S4 class of processors, and the first offering is the MSM8960 chip with an Adreno 225 GPU. The five-processor SoC will include an integrated modem on die. The 28nm SoC’s micro-architecture is based on up to four independent ARM Cortex-A15 CPU cores, plus a 32-core GPU, a 128-bit SIMD engine, three DSPs, and hardwired codecs. It will initially run at 1.5GHz and is expected to scale to 2.5GHz.

Earlier this year, Qualcomm’s leaked roadmap showed MSM8960, a 28nm dual-core… and yes, there are quad-cores coming as well.

Qualcomm is working with both TSMC and GlobalFoundries, although TSMC will produce the first chips. Krait is a completely custom design of the ARM Cortex-A15, not an off-the-shelf design taken from either TSMC’s or GlobalFoundries’ technology offerings. Qualcomm says Krait will be the world’s first smartphone CPU built on a 28nm process.

Qualcomm Snapdragon S4 system diagram – the real world version is a 28nm chip manufactured by TSMC or GlobalFoundries

Qualcomm’s S4 design differs from Nvidia’s Kal-El quad-core mobile processor. Kal-El is built around four high-performance ARM Cortex-A9 cores, with a fifth Cortex-A9 "companion" core specifically designed to handle less demanding tasks, such as push email and keeping the phone running while it is idle.

The S4 Krait CPU core is compatible with the ARM instruction-set architecture (ISA). Krait can issue up to four instructions in parallel and can fetch and decode three instructions per clock. Krait has a three-level exclusive cache hierarchy, which Qualcomm calls L0, L1 and L2. The lower two levels (L0 and L1) are private to each core, while the third level (L2) is shared among all cores.

While the specific size of L0 is not disclosed, each core comes with 32KB L1 cache (16KB instruction + 16KB data). The L2 cache is shared among all cores. In dual-core designs the L2 cache is sized at 1MB, while quad-core Krait SoCs will have a 2MB L2. Krait’s L2 cache is 8-way set associative. As you can see from these figures, the amount of cache clearly shows that ARM is targeting contemporary low-power x86 architectures such as Intel Atom and AMD Fusion E-Series.
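As a rough illustration of what 8-way set associativity means for the L2 sizes quoted above, the short Python sketch below works out how many sets each configuration would have. The cache line size is not given in Qualcomm's disclosure, so the 128-byte value is purely an assumption made for the arithmetic, and the function name is invented for this example.

```python
def l2_sets(cache_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    if cache_bytes % (ways * line_bytes) != 0:
        raise ValueError("cache size must be a multiple of ways * line size")
    return cache_bytes // (ways * line_bytes)


MB = 1024 * 1024
ASSUMED_LINE_BYTES = 128  # assumption: the line size is not stated in the article

# L2 sizes and associativity as quoted in the text above.
for label, size in (("dual-core, 1MB L2", 1 * MB), ("quad-core, 2MB L2", 2 * MB)):
    print(label, "->", l2_sets(size, 8, ASSUMED_LINE_BYTES), "sets")
```

With a different line size the set count changes proportionally, so treat the output as an example of the calculation rather than a specification of the chip.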

Qualcomm says it chose TSMC’s Poly/SiON process with LP (low power) transistors only, instead of LPG transistors, which are similar but tend to leak at higher temperatures. According to Qualcomm, there is less risk associated with TSMC’s non-HKMG process.

Qualcomm says it chose to build the MSM8960 SoC on a 28nm LP process, in contrast to Nvidia’s 40nm LPG design in Kal-El. From Qualcomm’s perspective, 40nm LPG transistors are useful only for reducing leakage at high temperatures, whereas for smartphone designs an LP design is better.

NVIDIA's 40nm process for Tegra 250 and Kal-El on top, Qualcomm's 28nm S4 i.e. Krait on bottom

The S4 processor family incorporates Qualcomm’s Adreno GPU technology, starting with the 32-core Adreno 225 GPU (Graphics Processing Unit). Qualcomm says the new chip represents a 50 percent increase in GPU performance over the previous generation GPU, the Adreno 220, and 6 times the processing power of Adreno 200. Qualcomm claims that MSM8960 will be able to significantly outperform Apple’s A5 in GLBenchmark 2.x at qHD resolutions. [Seeing is believing, Ed.]

Comparing the Adreno Family from Qualcomm

The MSM8960 uses Qualcomm’s second generation (3GPP Rel.9) multi-mode modem. This amazing unit is capable of 4G LTE FDD/TDD (Long Term Evolution Frequency Division Duplexing/Time-Division Duplexing), UMTS, CDMA/EVDO (Code Division Multiple Access/Evolution-Data Optimized: 3G, 2G, 1X), TD-SCDMA (Time Division Synchronous Code Division Multiple Access, for Chinese markets), and GERAN (GSM EDGE Radio Access Network). The MSM8960 also includes built-in WLAN 802.11b/g/n (single spatial stream), Bluetooth, and GPS.

The Rel.8 and Rel.9 network processors can support peak cellular network speeds (downlink/uplink) of up to 100/50Mbps for Rel.8 or 150/84Mbps for Rel.9 (coming in 2013).
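To put those peak rates in perspective, here is a small Python sketch that estimates how long a file would take to move at each quoted speed. It assumes the first figure in each pair is downlink and the second uplink, that the link sustains its peak rate with no protocol overhead (real networks will not), and it uses a hypothetical 100MB file. The function and constant names are invented for this example.

```python
def transfer_seconds(file_megabytes: float, rate_mbps: float) -> float:
    """Ideal transfer time: file size in bits divided by link rate in bits/s."""
    bits = file_megabytes * 1_000_000 * 8  # decimal megabytes -> bits
    return bits / (rate_mbps * 1_000_000)  # Mbps -> bits per second


# Peak (downlink, uplink) rates in Mbps as quoted in the text above.
PEAK_RATES_MBPS = {"Rel.8": (100, 50), "Rel.9": (150, 84)}
FILE_MB = 100  # hypothetical file size, chosen only for illustration

for release, (down, up) in PEAK_RATES_MBPS.items():
    print(
        f"{release}: download {transfer_seconds(FILE_MB, down):.1f}s, "
        f"upload {transfer_seconds(FILE_MB, up):.1f}s"
    )
```

These are best-case numbers. Actual throughput depends on signal conditions, network load, and carrier configuration.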

The final item Qualcomm added to this SoC is its own custom DSP (Digital Signal Processor), which it calls "Hexagon DSP". Hexagon DSPs have been in Snapdragon chips since 2006, but Qualcomm has not released much information about them until now.

Qualcomm has built in a serious DSP with a memory management unit, symmetric multiprocessing support and a hypervisor for increased capability. This DSP is ultra-low power so it will see a lot of use for audio, sensors, video, and imaging enhancement. We can expect to see this DSP configuration showing up in future generations of Snapdragon.

Hexagon DSP evolution

About three months ago, TSMC sent the MSM8960 silicon back to Qualcomm. Qualcomm says it is on track for release during the first half of 2012. We will probably see sample smartphones and tablets with the MSM8960 SoC at CES 2012 in January. The competition over at Texas Instruments (TI) has not been sleeping either, with its dual-core, 2GHz OMAP 5 promised for the first half of 2012. Plus, Nvidia always has a thing or two up its sleeve for CES, especially with the 28nm refresh of Kal-El (T35). 2012 should be an exciting year for mobile users.