AI innovation demands more data processing than ever. A redesigned memory device with integrated computational logic will unlock new applications for the future.
PIM Solves the AI Data Dilemma
The growth and evolution of AI algorithms and applications have driven a sharp increase in data-processing requirements. It is clear that current memory solutions, even with incremental improvements in capacity and bandwidth, will not be enough to meet the evolving needs of application areas such as healthcare, speech recognition, and autonomous driving, which must process ever larger volumes of data at ever increasing rates to gain deeper insights. For a challenge as big as the one facing the future of AI, a revolutionary breakthrough is needed.
Processing-in-Memory (PIM) has emerged as a solution to the constraints that current memory places on the growth of AI applications, and in an industry first, Samsung has incorporated PIM into High Bandwidth Memory (HBM). PIM provides a timely bridge between the growing demands of AI data processing and the memory solutions struggling to meet those demands.
PIM in and of itself is not a new technology, but it had previously been explored only as a high-level concept in academia and industry. PIM works by integrating compute and memory, enabling a memory device with logic to perform computation on data locally, a task usually reserved for high-performance logic devices such as CPUs, GPUs, and NPUs. Performing computation locally minimizes latency, increases the rate of processing, and improves energy efficiency. Samsung has implemented the PIM concept within HBM for the first time by incorporating an AI engine called the Programmable Computing Unit (PCU) within an HBM device.
Smarter memory and improved performance
The growth of AI applications has produced large volumes of data that must be repeatedly accessed and processed for insights to be extracted. Because of this heavy data movement, the computational performance of AI applications is typically bounded by the performance of the memory system. The need for a solution that addresses this limitation was apparent. PIM works for AI applications because it reduces the amount of data passing between the memory and the coupled high-performance logic device: some of the data is retained and processed locally in the memory device. The memory system's constraint on computational performance is thus alleviated, and net computational performance improves.
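The traffic savings from local processing can be sketched with a toy calculation. The element width and bank count below are illustrative assumptions for the sketch, not Samsung HBM-PIM specifications:

```python
# Toy comparison of memory-bus traffic for a large reduction (e.g. a sum),
# with and without in-memory processing. All numbers are assumptions.

ELEM = 4  # bytes per FP32 element (assumption)

def traffic_without_pim(n):
    """Host reads all n elements over the bus and sums them itself."""
    return n * ELEM

def traffic_with_pim(n, banks=16):
    """Each bank sums its share locally; only one partial sum per
    bank crosses the bus to the host (bank count is illustrative)."""
    return banks * ELEM

n = 1_000_000
print(traffic_without_pim(n) // traffic_with_pim(n))  # prints 62500
```

The point of the sketch is the ratio: bus traffic shrinks from "every element" to "one partial result per bank," which is why latency and IO energy both drop.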
AI applications can be broadly divided into compute-bound (vision) and memory-bound (voice recognition, machine translation, and recommendation) applications. GPUs and Neural Processing Units (NPUs) are reliable solutions for compute-bound AI applications, but memory-bound AI applications running on big data sets need a memory system with larger capacity and higher bandwidth than compute-bound applications require. Among commercialized DRAM solutions currently available, HBM has so far satisfied the demands of compute-bound AI applications and some memory-bound AI applications, thanks to its relatively large capacity and very high bandwidth in a small form factor. However, the requirements of memory-bound AI applications are rapidly outgrowing the capacity and bandwidth of the HBM device. As a result, Samsung looked to PIM as a way to augment the capability of HBM and improve the performance of memory-bound AI applications.
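The compute-bound versus memory-bound distinction can be made concrete with a roofline-style check on arithmetic intensity (FLOPs performed per byte of memory traffic). The peak figures and kernel numbers below are hypothetical, chosen only to illustrate the two regimes:

```python
# Roofline-style classification of a kernel as compute- or memory-bound.
# Device peaks and kernel numbers are hypothetical illustrations.

PEAK_FLOPS = 100e12  # assumed accelerator peak: 100 TFLOP/s
PEAK_BW    = 400e9   # assumed memory bandwidth: 400 GB/s

def classify(flops, bytes_moved):
    """Compare a kernel's arithmetic intensity against the ridge point
    where the compute limit and the bandwidth limit cross."""
    intensity = flops / bytes_moved      # FLOPs per byte of traffic
    ridge = PEAK_FLOPS / PEAK_BW         # here: 250 FLOPs/byte
    return "compute-bound" if intensity >= ridge else "memory-bound"

# Dense vision layer: each fetched byte is reused many times.
print(classify(flops=2e9, bytes_moved=4e6))   # prints "compute-bound"
# Recommendation embedding lookup: almost no reuse per byte.
print(classify(flops=1e6, bytes_moved=1e8))   # prints "memory-bound"
```

Kernels below the ridge point leave the compute units idle waiting on memory, which is exactly the regime PIM targets.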
The breakthrough in Samsung’s HBM-PIM solution is an AI engine, the Programmable Computing Unit (PCU), that sits in the memory core and performs some logic functions inside the memory device. The PCU works in a way similar to multi-core processing in a CPU, enabling parallel processing in memory to enhance performance.
PIM is well matched to HBM for improving the computational throughput of AI applications because of the HBM device’s high degree of internal parallelism. An HBM device is constructed by stacking DRAM dies on top of each other and enabling simultaneous access to each die in parallel. This internal parallelism is also key to enabling HBM-PIM: the structure of the device allows localized computational processing to occur on some DRAM dies while data accesses continue to other DRAM dies.
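The parallel, partitioned computation described above can be sketched as a toy dot product in which each "bank" reduces its own slice of the data locally and only the partial results travel to the host. The bank count is an illustrative assumption, and the sketch models only the data-flow idea, not the actual PCU instruction set:

```python
# Conceptual sketch of bank-parallel in-memory reduction.
# NUM_BANKS is illustrative, not an HBM-PIM specification.

NUM_BANKS = 16

def pim_dot(a, b):
    """Dot product with per-bank partial sums (toy model)."""
    n = len(a)
    chunk = (n + NUM_BANKS - 1) // NUM_BANKS  # elements per bank
    partials = []
    for bank in range(NUM_BANKS):
        lo, hi = bank * chunk, min((bank + 1) * chunk, n)
        # This inner sum stands in for work done inside the bank's PCU;
        # in hardware, all banks would run their sums concurrently.
        partials.append(sum(x * y for x, y in zip(a[lo:hi], b[lo:hi])))
    # Only NUM_BANKS partial sums cross to the host for the final add.
    return sum(partials)

print(pim_dot([1, 2, 3], [4, 5, 6]))  # prints 32
```

Because each bank touches only its own slice, the per-bank sums are independent, which is what lets computation on some dies overlap with ordinary accesses to others.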
The HBM-PIM device shows promise in both performance and power efficiency. In AI applications such as speech recognition, HBM-PIM delivered twice the performance of existing HBM. On the power-efficiency side, keeping some computation internal to the DRAM die eliminates the IO traffic of moving that data between the memory device and the logic device. As a result, in early tests, HBM-PIM reduced power consumption by upwards of 70% compared with existing HBM solutions.
An open door to the future
PIM can be applied to many different memory technologies, such as LPDDR and GDDR as well as HBM, without the need to radically change or throw out existing memory ecosystems. Consequently, the size of the opportunity is huge. Enabling memory to work harder, or rather smarter, than it can without PIM is exactly what the industry needs as it investigates new ways to develop AI applications. Samsung is starting with HBM in its march to use PIM to shake up the way we see computing, and the company is working with innovators in the AI industry to make it happen.
PIM extends the possibilities of AI, which already has many applications that are changing the world around us. Now, there’s so much more to achieve thanks to this breakthrough in memory technology.