One of the most fundamental problems in computing is the relationship between a CPU and its memory hierarchy. The problem, in a nutshell, is that fast, low latency memory is extremely expensive and can’t hold much information. Larger, slower memory pools are far cheaper, but can’t be searched as quickly and take time to access.
Thus, a typical CPU has a small low latency L1 cache, a larger, slower L2, and so on. If a chip can’t find data in cache, it has to go out to main memory and from there to storage. The problem is that even pulling data from L3 has a noticeable impact on a chip’s performance – going to main memory takes hundreds of cycles, while accessing primary storage takes tens of thousands.
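The penalty those deeper levels impose can be sketched with a back-of-the-envelope expected-access-time calculation. The latency figures and hit rates below are generic, illustrative assumptions, not measurements of any particular chip – the point is only how heavily a tiny fraction of slow accesses weighs on the average:

```python
# Rough, generic per-level latencies in CPU cycles (illustrative only,
# not vendor figures); they echo the article's orders of magnitude:
# main memory in the hundreds, storage in the tens of thousands.
LATENCY_CYCLES = {"L1": 4, "L2": 12, "L3": 40, "DRAM": 200, "SSD": 30_000}

def average_access_cycles(hit_rates):
    """Expected cycles per memory access.

    hit_rates maps each level to the probability a request is
    satisfied there; the probabilities should sum to 1.
    """
    return sum(p * LATENCY_CYCLES[level] for level, p in hit_rates.items())

# All requests satisfied somewhere in cache or DRAM:
mostly_cached = {"L1": 0.90, "L2": 0.06, "L3": 0.03, "DRAM": 0.01}
# Same workload, but one access in a thousand falls through to storage:
with_storage = {"L1": 0.90, "L2": 0.06, "L3": 0.03,
                "DRAM": 0.009, "SSD": 0.001}

print(average_access_cycles(mostly_cached))   # ~7.5 cycles on average
print(average_access_cycles(with_storage))    # ~37 cycles on average
```

Even with 99.9% of requests served from cache or DRAM, that 0.1% of storage accesses roughly quintuples the average access time – which is why where solid state storage attaches, and at what latency, matters so much.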
That’s why I was initially sceptical when Diablo Technologies claimed to have found a way to drop NAND flash into the main memory bus of a system. DRAM and NAND flash are designed for entirely different purposes, starting with the fact that DRAM is volatile, while NAND flash isn’t. NAND isn’t designed to be read or written in the same fashion, and it takes far longer to perform a read/write cycle to flash than it does to DRAM.
After talking to the company, I think my scepticism was misplaced. Diablo has something rather interesting here. The company’s Memory Channel Storage hangs huge (200-400GB) pools of flash directly off the memory bus, at far lower latency than PCIe- or SATA-attached SSDs.
Diablo calls its own particular implementation of this technology TeraDIMM. That’s the reference design – the company has partnered with SMART Storage Systems to bring actual product to market.
What Diablo has done is build a specialised translation ASIC (codenamed Rush) and implemented it on-board a standard DIMM. The TeraDIMM can be configured in one of two ways. It can present itself as standard storage, at which point the OS will detect it as a hard drive and map it accordingly, or it can be used as RAM. This last part is what threw me initially – how do you mount an entirely different memory technology aboard a DIMM without the integrated memory controller (IMC) on the CPU throwing a fit?
Answer: You use a translation layer. Each TeraDIMM contains a specialised ASIC that translates the memory controller’s commands. From the IMC’s perspective, nothing has changed, save that the system now has vastly more memory available than it had previously. Still, even using a hardware translator raises latency questions. Even the fastest NAND flash is far behind conventional DRAM as far as read/write speeds are concerned – how does Memory Channel Storage get around that?
Through the intelligent use of buffers. Here, Diablo Technologies was cagey and clearly didn’t want to give too much away, but it told us that each TeraDIMM contains a DRAM buffer of unspecified size. The company’s patent filings confirm this, and imply that the amount of RAM per DIMM can vary depending on the wishes of the manufacturer.
Stacking additional RAM on the TeraDIMM allows the controller to mask access latencies and perform write cycles in the background. The Rush ASIC pairs up with an SSD controller as well, though it’s not clear which company is providing that aspect of the technology. One thing the company did tell us is that the system can background copy data out of the RAM buffer over to NAND without needing the CPU to get involved with the process, and that 4K page copies from TeraDIMM to conventional RAM are handled without toggling the CPU, as well.
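The general shape of that scheme – a fast DRAM buffer absorbing writes while slower flash is updated in the background – can be sketched in a few lines. This is purely a toy model under my own assumptions (class and method names are invented; Diablo hasn't published its actual buffering design, and in hardware the flush would run without host-CPU involvement rather than as a method call):

```python
from collections import deque

class BufferedFlash:
    """Toy model: a DRAM write buffer fronting slow simulated NAND."""

    def __init__(self):
        self.dram_buffer = deque()   # pending (address, data) writes
        self.nand = {}               # simulated flash contents

    def write(self, address, data):
        """Fast path: the write lands in DRAM and returns immediately,
        hiding the flash programming latency from the caller."""
        self.dram_buffer.append((address, data))

    def read(self, address):
        """Check the buffer first (newest entry wins) so readers always
        see the latest data, flushed or not."""
        for addr, data in reversed(self.dram_buffer):
            if addr == address:
                return data
        return self.nand.get(address)

    def background_flush(self, max_writes=1):
        """Drain buffered writes to NAND in arrival order."""
        for _ in range(min(max_writes, len(self.dram_buffer))):
            addr, data = self.dram_buffer.popleft()
            self.nand[addr] = data

dev = BufferedFlash()
dev.write(0x10, b"hello")
print(dev.read(0x10))    # served from the DRAM buffer
dev.background_flush()
print(dev.read(0x10))    # same data, now served from simulated NAND
```

The caller sees uniform, DRAM-like write latency either way; only the flush pace has to keep up with the long-run write rate.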
Diablo hasn’t released details of its technology beyond the fact that it’s using 19nm flash, but the company says it is aggressively binning and selecting high-quality MLC for the product. We suspect that the back-end of the DIMM is a fairly standard NAND flash drive, and MLC NAND in enterprise devices is common. It’s the front – the ASIC and the DRAM buffers – where the magic happens.
Of course, there’s still an underlying question of why anyone would take this approach to begin with. NAND, after all, is never going to be as fast as DRAM. A 400GB memory expansion might sound incredible, but you can’t access NAND flash in nanoseconds.
The limitations of PCI-Express
Comparing NAND to DRAM, it turns out, is the wrong way to think about this technology. The point of Diablo Technologies’ Memory Channel Storage isn’t to boost NAND to DRAM speed, but to slash the access penalty for solid state storage compared to SATA SSDs or PCI-Express cards. The PCIe bus is high bandwidth, but latency can be a different matter. Sustained performance across PCIe is a function of how many other devices are on the bus, what those devices are doing, and what tasks the CPU is engaged in as well.
This is Diablo Technologies’ own representation of the situation, as compared to the access latency for TeraDIMMs (the tiny red line at the bottom is the claimed TeraDIMM latency). This may be overstating the case, but there’s supplemental documentation available – Dell’s own HPC I/O documentation compares latency on PCIe 1.1, 2.0, and 3.0 to show that PCIe 3.0 can be much lower latency than its predecessors with large message sizes (see the image below).
Diablo Technologies’ first customers are going to be high-frequency stock traders and large-scale enterprise providers – possibly some HPC work at some point. This isn’t technology that’s going to come to the desktop anytime soon, because it’s not technology that the desktop particularly requires. DIMM slots are historically much more “expensive” than PCIe or SATA ports, and software needs to be specifically tuned to take advantage of the way this shifts the memory model. TeraDIMMs can, however, serve as RAM – though you still need at least one conventional DIMM in the system.
That’s one of the most attractive aspects of this technology. You can load TeraDIMMs into any memory slots, in any order, provided you have a single DRAM DIMM. Whether or not this will result in performance enhancements based on the sheer amount of available memory rather than the latency reduction isn’t clear; any such workload would be esoteric, nearly by definition.
Still, I think there’s value here. Even if only a small class of enterprise users buy into the technology, this kind of hybrid approach could characterise the development of flash’s eventual successor. The quest for a non-volatile replacement for both DRAM and NAND flash continues – but that’s something we’re going to discuss further later this week.