Caching, tiering, and hybrids: where and how SSD can fit into your environment

While the Solid State Drive (SSD) is one of the biggest buzzwords of the past year, it is not by any means a new technology. SSD has been with us in some form or another for quite some time. So the real question is, why is it gaining momentum now?

In enterprise environments, RAM-based SSDs have existed for over a decade to serve a niche high-performance market that few of us ever dealt with. Over the course of that decade, traditional disks kept growing in size but not in speed. As a result, the balance between capacity and speed has shifted, confronting us with scenarios where we need to put far more drives in our storage systems than capacity alone would dictate, simply to get the speed we need. (And we most certainly need speed. Consolidation of our server farms, applications, and desktops onto computing platforms where Moore's law still applies has pushed traditional arrays to their limits.) Does this mean we're going to replace our traditional magnetic drives with SSD technology? Not quite, but giving some serious thought to where SSD technology fits in a modern storage landscape is something all of us should do.

Over the last two years, NAND-based SSDs have been making an appearance in the enterprise market, and they are doing well. However, they have their limitations.

As with any storage design, which SSD should be used and where it should be positioned all depends on your application landscape and your budget. Most likely you won't require SSD technology for your entire application set.

So let's take a serious look at what SSD is and what it can do for you, beginning with where you use it.

Where to position SSD technology to best accommodate your storage requirements depends on a number of factors. Each of the ways SSDs are used in the enterprise has pros and cons. Since the applications determine which of these points are important in your environment, we'll discuss at a high level which applications could benefit from a certain type of SSD. SSDs are currently used mainly in the following ways:

  1. Flash-cached (performance acceleration of magnetic disks by extending the cache with SSD)
  2. Flash-tiered (performance acceleration of magnetic disks using a tiered approach)
  3. Hybrid file systems (performance acceleration of magnetic disks by tiering within the file system)
  4. Flash-only arrays (using SSDs in a purpose-built all-flash array)
  5. Server-side flash (connecting SSD directly to the PCI bus of the server)
  6. Network flash (a combination of flash-only arrays and server-side flash)

Flash Cached

Most, if not all, storage arrays have some level of caching available within the system, sometimes just for reads or just for writes, though ideally for both. As a rule this is RAM-based, battery-backed cache that is sized in proportion to the platform. It's generally only expandable on the larger systems, where flash cache cards can extend the internal RAM-based cache with an extra layer of SSD.

With read caching, predictive algorithms load data into cache that is expected to be read, or that has been read several times in the recent past, so it can be served faster.
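As a rough illustration of the idea (not any vendor's actual algorithm), the Python sketch below promotes a block into an SSD read cache once it has been read a given number of times within a sliding time window; the class name, thresholds and block identifiers are all invented for the example.

```python
from collections import OrderedDict, deque
import time

class HotBlockReadCache:
    """Toy read cache: promote a block to an SSD cache after it has been
    read `threshold` times within `window` seconds. Illustrative only."""

    def __init__(self, capacity_blocks=1024, threshold=3, window=300.0):
        self.capacity = capacity_blocks
        self.threshold = threshold
        self.window = window
        self.cache = OrderedDict()   # block_id -> data, kept in LRU order
        self.history = {}            # block_id -> deque of recent read timestamps

    def read(self, block_id, read_from_disk):
        now = time.time()
        if block_id in self.cache:          # cache hit: refresh LRU position
            self.cache.move_to_end(block_id)
            return self.cache[block_id]

        data = read_from_disk(block_id)     # cache miss: go to spinning disk
        hits = self.history.setdefault(block_id, deque())
        hits.append(now)
        while hits and now - hits[0] > self.window:
            hits.popleft()                  # forget reads outside the window

        if len(hits) >= self.threshold:     # block is 'hot': promote it
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)   # evict least recently used
            self.cache[block_id] = data
        return data

# Usage: cache = HotBlockReadCache(); cache.read(42, lambda b: f"block-{b}")
```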

Write caching is generally used to buffer incoming writes and optimize them before writing them to disk, for example by using full-stripe writes, where enough data is cached to write across all disks in an underlying stripe set at once. In modern arrays, SSD technology allows vendors to expand these caches beyond what is normally available in the system.
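A minimal sketch of the full-stripe-write idea follows, again with invented names and sizes: incoming writes are buffered until a complete stripe's worth of data has accumulated, and only then flushed to the disks in one pass, so parity can be computed without reading back old data.

```python
class FullStripeWriteBuffer:
    """Toy write cache: acknowledge writes immediately, flush to disk only
    when a full stripe (data_disks * chunk_size bytes) has accumulated."""

    def __init__(self, data_disks=8, chunk_size=64 * 1024, flush_to_disks=print):
        self.data_disks = data_disks
        self.chunk_size = chunk_size
        self.stripe_size = data_disks * chunk_size
        self.flush_to_disks = flush_to_disks   # stands in for the RAID backend
        self.buffer = bytearray()

    def write(self, payload: bytes) -> None:
        self.buffer.extend(payload)            # the host is acknowledged here
        while len(self.buffer) >= self.stripe_size:
            stripe = bytes(self.buffer[:self.stripe_size])
            del self.buffer[:self.stripe_size]
            # One full-stripe write: split the stripe into per-disk chunks so
            # parity can be computed without reading old data or old parity.
            chunks = [stripe[i * self.chunk_size:(i + 1) * self.chunk_size]
                      for i in range(self.data_disks)]
            self.flush_to_disks(chunks)

# Usage: buf = FullStripeWriteBuffer(); buf.write(b"x" * 1024 * 1024)
```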

When caching writes, the advantage of using SSDs is immediately obvious, but it's only suitable for bursts. Sustained writes at a higher rate than the underlying disks can handle will fill up the cache, and at that point you're at the mercy of the performance of the underlying disks.

With regard to read caching, an SSD cache needs to 'warm up' for a few days so the algorithms in the array can determine which blocks are 'hot' and therefore need caching. Depending on the access pattern of the data, this can increase performance anywhere from slightly to significantly.

Pros and Cons

Strong points of this technology are that it can decrease the number of spindles required in your existing array while requiring very little training for your support staff. In the case of a read cache card, a disadvantage is that if the card fails, or a failover to another head occurs, the cache will need a few days to warm up before it reaches the same efficiency levels as before.

Practical applications

A dataset consisting of VDI desktops, which has a high de-dupe ratio, will perform very well in a de-dupe-aware cache. A completely random access pattern against a volume with millions of small files may not see such a big improvement. In a scenario like that you could still cache your file metadata to achieve a performance boost.

But if your measurements yield a workable average that your magnetic disks can accommodate, extending the cache of your existing array for burst handling can be a good idea. No requirement to train staff on a new platform, a low-impact implementation and a relatively easy data migration are all additional bonuses here.

Examples of Flash cached arrays would be NetApp with its Flash Cache cards. (Note that these cards only cache reads.)

Flash tiered

In flash-tiered arrays, SSD is used as a Tier 0, where a form of tiering moves data either down to the underlying traditional disks or up from traditional disks to SSD. This can be either a dynamic or a scheduled process, and it is important to note that, depending on the implementation, blocks as large as 1GB and even entire volumes can be subject to relocation, regardless of the fact that the hot data might only be a few KB in size. Generally these are the more traditional arrays that are not necessarily optimized for the use of SSDs.
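To make the relocation idea concrete, here is a toy scheduled tiering pass in Python. The function name, heat counters and extent granularity are assumptions for illustration; real arrays use far more elaborate heuristics.

```python
def schedule_tiering_pass(extent_heat, ssd_capacity_extents):
    """Toy scheduled tiering pass.

    extent_heat: dict mapping extent_id -> access count since the last pass.
    Returns (promote_to_ssd, keep_on_hdd) as sets of extent ids.
    Real arrays track heat per extent (often 1 GB or larger), which is why a
    few hot KB can drag a whole extent up to the SSD tier.
    """
    ranked = sorted(extent_heat, key=extent_heat.get, reverse=True)
    hot = set(ranked[:ssd_capacity_extents])    # hottest extents fit on SSD
    cold = set(ranked[ssd_capacity_extents:])   # everything else stays on HDD
    return hot, cold

# Example: four extents of heat data, room for two extents on the SSD tier
promote, demote = schedule_tiering_pass(
    {"ext-a": 9000, "ext-b": 12, "ext-c": 4500, "ext-d": 3},
    ssd_capacity_extents=2)
# promote == {"ext-a", "ext-c"}, demote == {"ext-b", "ext-d"}
```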

Pros and Cons

Wear-leveling can be a serious issue if the array is not optimized to write to flash in a sequential way. Controllers may not be fully optimized for the higher throughput and IOPS rates that flash can provide, which is why quite a few startups have redesigned their arrays from the ground up to take full advantage of what flash drives can deliver. Tiering algorithms can also significantly increase the load on the system.

Practical applications

If you have a significant investment in an existing enterprise platform, then depending on your requirements, adding SSD as a Tier 0 to your array could be a good idea. It would address the issue of having to manage multiple arrays and would allow for more than 100TB under a single pane of glass, which currently seems to be the limit for all-flash arrays. Examples of flash-tiered arrays would be Hitachi's VSP and EMC's VNX platforms.

Hybrid File Systems

Hybrid file systems describe a technology that allows DRAM, SSD and traditional spinning disks to be part of the same file system. Data written to the storage system is stored in battery-protected DRAM or SSD, after which an acknowledgement of the write is sent to the client. The file system subsequently de-stages the data to the backend disks but is also capable of promoting data that is getting hot. It is, in effect, a level of tiering within the file system.
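The write path can be sketched roughly as follows (a simplified Python model, not any particular file system's implementation): writes land in a fast protected log and are acknowledged immediately, while a background thread de-stages them to the slow disks.

```python
import queue
import threading

class HybridWritePath:
    """Toy hybrid-filesystem write path: writes land in a fast 'log' tier and
    are acknowledged; a background thread de-stages them to slow disk."""

    def __init__(self, write_to_hdd):
        self.log = queue.Queue()                 # stands in for NVRAM/SSD log
        self.write_to_hdd = write_to_hdd
        self._worker = threading.Thread(target=self._destage, daemon=True)
        self._worker.start()

    def write(self, block_id, data):
        self.log.put((block_id, data))           # durable in the fast tier
        return "ack"                             # the client sees the ack here

    def _destage(self):
        while True:
            block_id, data = self.log.get()      # drain the log in the background
            self.write_to_hdd(block_id, data)    # slower, batched backend writes
            self.log.task_done()

# Usage: fs = HybridWritePath(lambda b, d: None); fs.write(1, b"data")
```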

Pros and Cons

Applying this kind of tiering requires a file system with enough intelligence to do the work transparently from the client's perspective.

Practical applications

File systems like WAFL and ZFS support this level of intelligence. NetApp has announced Flash Pools, and the Sun/Oracle 7000 series uses ZFS in this way, as does Nexenta.

Flash Only arrays

The architecture of flash-only arrays resembles traditional storage systems in the sense that there is a central array with FC, iSCSI or similar connectivity. Flash-only arrays have some advantages over flash-cached/tiered arrays: they are generally designed from the ground up to take full advantage of the speed that SSD drives can provide, and they have advanced wear-leveling and management capabilities.

Pros and Cons

The price/capacity ratio is a lot less favorable than that of traditional disks, but through thin provisioning, (inline) de-dupe, compression, zeroing and so on, this form of SSD approaches the price per usable/effective GB that traditional Tier 1 arrays offer, only many times faster.
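A quick, hedged bit of arithmetic shows how data reduction changes the picture; the ratios below are made up, and real reduction rates depend entirely on the workload.

```python
def effective_cost_per_gb(raw_cost_per_gb, dedupe_ratio, compression_ratio,
                          thin_provisioning_factor=1.0):
    """Effective $/GB after data reduction. All ratios are illustrative;
    actual reduction depends entirely on the workload."""
    reduction = dedupe_ratio * compression_ratio * thin_provisioning_factor
    return raw_cost_per_gb / reduction

# Made-up example: $10/GB raw flash, 4:1 de-dupe, 2:1 compression
print(effective_cost_per_gb(10.0, 4.0, 2.0))   # -> 1.25 ($ per effective GB)
```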

Only recently have players in this segment started to offer HA within the array as an option. Without it, these arrays can't be used to support business-critical applications unless you organize fail-over/redundancy at higher (application/OS) levels.

Network latency is still an issue here, as the compute layer is separated from the storage layer by some sort of network, be it FC or Ethernet. That means that for sub-millisecond response times a more localized solution is required.

Cost can also be an issue, depending on what you need. If you don't need space but IOPS, a system like this will give you the best $/IOPS price; if, however, you also need a substantial amount of space, it might be cheaper to go with a few more spindles in a traditional array. The serious players in this segment are starting to position their systems as competitive with traditional Tier 1 storage on a $/GB basis thanks to added features like compression, de-duplication, zeroing out and so on.

Practical applications

Customers requiring a consistently high rate of IOPS or throughput may benefit from a system in this category. It will require some extra training, a more complicated migration path and, of course, a scheduled implementation. A new VDI implementation would be a good example of where this technology could do well.

Solutions like Whiptail use SSD drives with MLC cells, but they consolidate IOs so that data gets written a full block at a time, effectively eliminating write amplification by turning all the incoming random writes into a single large write.

Other vendors, like Violin Memory, use PCI flash cards and have a dynamic net storage space that depends on the IO load. They allocate more of the gross space to allow garbage collection to create empty cells; the lower the IO load, the more net space you get.

Pure Storage has just finished customer try-outs and has an excellent team of technicians at work; also, not unimportant for a startup, it has some very strong VC backing.

Nimbus Data Systems, already a financially healthy company, offers HA, unified access, and a decent set of included software covering replication, snapshots and more.

Texas Memory Systems' RAMSAN product is the grandfather of modern-day SSD arrays. These days not all their products are RAM-based; they offer MLC arrays as well.

Server Side Flash

Server side Flash describes a solution where SSDs are connected to the local PCI bus of a server platform.

Pros and Cons

Local flash directly on the PCI bus does not have network latencies to contend with, so this is the fastest variant in the field, with access times measured in microseconds instead of milliseconds. It does have some limitations, at least until flash cards can talk directly to each other over some kind of inter-system PCI framework (something some blade servers already provide). If, however, you need a synchronous multi-site solution, network latency rears its ugly head again and you would have to go for a Hadoop-esque solution to be able to use these cards.
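To put the microseconds-versus-milliseconds point in perspective, here is some back-of-the-envelope arithmetic; the latency figures are rough assumptions for illustration, not vendor specifications.

```python
# Ballpark latency budget per IO path (rough assumptions, not vendor specs).
latencies_us = {
    "PCIe flash, local":          50,    # tens of microseconds
    "flash array over FC/iSCSI":  500,   # add network hops and the array stack
    "15k RPM disk, random read":  6000,  # seek plus rotational delay
}
for path, us in latencies_us.items():
    max_serial_iops = 1_000_000 / us     # IOPS ceiling for one outstanding IO
    print(f"{path:28s} ~{us:>5} us -> at most ~{max_serial_iops:,.0f} serial IOPS")
```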

There is a reason that 57% of FusionIO's total turnover for 2011 came from two customers, Apple and Facebook. The reason they don't see network latency as an issue is that they have solved the problem at the application level. Facebook's Hadoop systems operate on a share-nothing principle and don't require shared storage functionality. Apple uses Oracle's Data Guard, which optimizes the network traffic between the nodes.

For those of us not using this kind of technology, this can be a serious limitation if you want to use, for example, VMware HA. Database systems that you want to mirror must be mirrored at the application level, and again Ethernet will increase latency considerably. However, more and more hypervisor and database manufacturers are adding this level of intelligence; Microsoft SQL Server 2012, for instance, now ships with AlwaysOn functionality. At present the technology is primarily used for accelerating temporary datasets like business intelligence servers, non-persistent virtual desktops, or the temp space and log files of applications.

Practical applications

There are a few manufacturers that have decided to build systems where the servers hosting the PCI flash cards can be interconnected through the use of a common enclosure. Nutanix and Kaminario use standard (blade) form factors to connect PCI busses together, where each individual server has a FusionIO card with NAND-based flash and/or magnetic disk behind it. HP and HDS blade servers now support similar functionality where PCI busses can be interlinked; adding a FusionIO card to each server would create a system where the compute and storage layers are connected via PCI.

Currently, however, server-based Flash is intended for those datasets that have a temporary nature or that have application level redundancy combined with a demand for high IOPS and low latency.

Network Flash

Network Flash is a hybrid solution where server-side flash talks to a flash-tiered, cached or flash-only array. The key here is to notice that this technology involves network latency.

Pros and Cons

This approach creates a bottleneck within itself since the server-side Flash card will need to communicate with the Flash array over some kind of network.

Technologies like compression, inline de-duplication and perhaps a form of tiering that spans the internal card and the central array will alleviate this issue somewhat. With FC speeds going up to 16 and later 32Gbit/s and Ethernet going from 40 to 100Gbit/s, we come a little closer to the roughly 250Gbit/s per direction that PCIe 3.0 x32 can accommodate, but if you are looking at a solution at the time of writing, this is a limitation to consider.
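For a sense of scale, the small calculation below compares the nominal line rates mentioned above; the PCIe figure accounts for 128b/130b encoding, and actual usable throughput will be lower still once protocol overhead is added.

```python
# Rough line-rate comparison of the interconnects mentioned above (Gbit/s).
# PCIe 3.0 runs 8 GT/s per lane with 128b/130b encoding; the FC and Ethernet
# figures are nominal line rates, so usable throughput will be lower.
pcie3_lane_gbit = 8 * (128 / 130)            # ~7.88 Gbit/s usable per lane
links_gbit = {
    "16G Fibre Channel": 16,
    "32G Fibre Channel": 32,
    "40GbE":             40,
    "100GbE":            100,
    "PCIe 3.0 x32":      pcie3_lane_gbit * 32,   # ~252 Gbit/s per direction
}
for name, gbit in links_gbit.items():
    print(f"{name:18s} ~{gbit:6.1f} Gbit/s (~{gbit / 8:5.1f} GB/s)")
```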

Practical applications

EMC started in this network flash space with its Thunder and Lightning projects. NetApp has also announced that it will enter this space. Currently this technology is not ready for production.

Conclusion

So what can we expect from NAND flash memory in the near future? The fact is that cell degradation will keep haunting it. Cell geometry can't shrink much further without this degradation causing problems, and higher density is exactly what flash needs to gain more market share.

Also, within the next couple of years, Heat-Assisted Magnetic Recording (HAMR) for spinning disks will mature and we'll start to see hard disks hundreds of TB in size. IOPS might still be a problem for these disks, but it will be a long time, if ever, before flash memory comes near that GB/$ ratio.
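The arithmetic behind that IOPS worry is simple: a spinning disk delivers a roughly fixed number of random IOPS regardless of its capacity, so IOPS per TB collapse as drives grow. The drive figures below are rough assumptions for illustration only.

```python
# Why ever-larger disks widen the IOPS gap: random IOPS per drive are roughly
# fixed by seek time and rotation, so IOPS per TB shrink as capacity grows.
# All numbers are rough assumptions for illustration.
drives = [
    ("600 GB 15k SAS",            0.6, 180),   # (name, capacity_tb, random_iops)
    ("4 TB 7.2k NL-SAS",          4.0,  80),
    ("hypothetical 100 TB HAMR",  100.0, 100),
]
for name, capacity_tb, iops in drives:
    print(f"{name:26s} ~{iops / capacity_tb:7.1f} random IOPS per TB")
```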

It's more likely we'll see a mix of the two, with hybrid solutions that do real hot-block auto-tiering. Current tiering solutions, where background processes move large blocks of data (sometimes 100MB or more at a time) to and from SSD as a form of cache a couple of times a day, are only viable in specific use cases. With files, a single-byte change will move the whole file up to SSD (something that's rather pointless when it's a database file or a VDI disk image). Only inline auto-tiering at the block level will be able to do the job, and no solutions exist today that offer that.

But there are other alternatives that may eventually render flash memory obsolete altogether. Techniques like phase-change memory (PRAM) have working prototypes that are 50-100 times faster than flash memory. With current projections, the data (bit-)switching times may end up only a few times slower than our current RAM, making it an excellent alternative for all sorts of storage solutions.

Other contestants are Ferroelectric RAM (FeRAM) and Magnetoresistive RAM (MRAM). These are comparable to DRAM in speed but are non-volatile, so they don't require power to retain data. Currently they lack density and have much higher costs, but in the near future one of them may prove to be the one universal memory that runs and stores all data. Until these technologies are ready for mass production and can be purchased at a reasonable price, however, RAM and NAND flash are what we have to work with to close the IOPS gap created by ever-growing traditional spinning disks stuck at 15,000 RPM.

It is worth noting that the five to seven years we expect it will take before HAMR and persistent-RAM-based solutions take over is a significantly shorter timespan than the 40+ years traditional magnetic disks have dominated as the preferred data storage medium.

Is flash therefore an interim solution and not worthy of investment? Not entirely. If you have a high IOPS requirement over the next five to seven years, you may very well not have a choice.

And let's not forget that although PRAM solutions might be technically superior to NAND flash, that doesn't necessarily mean they will be adopted straight away. NAND flash is being produced at an incredible rate and will continue to get cheaper, just as wear-leveling techniques will continue to get better. Advances in these fields will very likely slow the adoption of future technologies, particularly since manufacturers have made some significant investments in NAND flash production. Capital, unlike data, can be pretty slow to move.

Whatever you decide you might need, look before you leap. Measure the impact of your applications on your storage; don't stop at IOPS, block sizes and projected growth, but also look at the level of availability, latency and intelligence you require at the storage level.

Credits

The credit for this information goes to my colleagues Herco van Brug and Marcel Kleine; follow Herco and Marcel on Twitter. Comments or feedback? Please let us know! (rsp@pqr.nl; www.twitter.com/rspruijt)

Download whitepaper

The whitepaper with even more content and technical details can be downloaded here (no registration required).

Join the conversation


Ruben, first of all fantastic work by you and your team… hard to pull so many storage architectures together and even begin to classify them. I do think you've missed a small but important category of solutions that utilize SSD (and/or memory) to optimize traditional spinning disk. While they do share some similarities with the "flash cached" solutions, where they differ is that their focus is on taking that data from cache and optimizing the write IO to disk, optimization being the key differentiator. They typically still use SSD (a large cache) to accelerate random reads and therefore are only as good as their algorithm for warming up the cache.


Spinning disk isn’t that slow at read/write of sequential IO, but rather it’s when you throw in random read/write that the mechanics of spinning disk show their ugly side (duh, preaching to the choir).  By using SSD cache they are able to create large sequential writes, many times in combination with compression/de-duplication, further reducing the amount of times they have to go to disk.  


The one I have personally worked with recently is Nimble Storage.  I’m quite impressed by the amount of random read/write IO the array is able to sustain by simply optimizing the IO pattern.  I’m not here trying to pimp their storage, I’m just pretty surprised by how well it has worked for a few customers we have using it.  It has given me pause and made me rethink whether spinning disk (even for IO intensive applications) is really on the way out.  Here is some quick testing I did with their array. blog.danbrinkmann.com/.../nimble-storage


Not sure where companies like this or Atlantis Computing belong?  Thoughts?



While it's been a while since I really looked into them, I have a feeling you're missing out on Dell's acquisition Compellent. They do tiering; I'm not quite sure if it is inline, but it is at least block level, and has been for some time. They even had "fast track", which was tiering within a particular spindle, where hot blocks are moved to the outer 30% of the tracks on a traditional spindle. Back when I looked into them it was before SSDs, so I have no idea how they incorporate these nowadays, but I can see from their website that SSDs are now an option.


Otherwise: thanks a lot for a great overview and classification!


Best regards



Hi Dan,


Whilst writing this whitepaper we made a conscious choice to exclude devices that support a specific workload but you are right of course. The likes of Nimble, Tintri, Atlantis and many others have created some great products that will suit a VDI workload very well. It would be interesting to see a comparison on both price and performance from devices in this category.


Regards, Marcel.



Hello Brian,


Dell indeed has the smallest page size with regard to tiering; 512 KB, 2 MB or 4 MB can be set.


Although significantly smaller than HDS HDT (42MB) or EMC FAST (1GB), it is still tiering at a page level. That doesn't excuse me for not adding it to the flash-tiered array section though.


Regards, Marcel.



Back to the comment about Nimble and Tintri, unlike Atlantis Computing they sell a storage solution which applies to all workloads, not just a VDI or hypervisor specific one.


In the case of Nimble, their solution really has nothing to do with SSD other than a large read cache, not a write cache as others above use it for. Their storage value is in optimizing write IO and therefore solving the biggest problem in all storage today, which is that writes are hard for everyone: eventually the cache or SSD fills and you have to write IO to disk fast enough to keep up, or you buy more and more SSD (which is valid, but usually more expensive and sometimes at the cost of capacity). Nimble uses SSD as a very large read cache, but all writes go to spinning disk.


I can't speak to Tintri, I'm not familiar with them but I have heard good things about them as well.


BTW...the migration of BM.com seems to hit a snag, had to create a brand new BM.com account to post comments...yesterday no matter what I did I was unable to sign in and post comments.



Ruben, excellent read!


Agree with Dan on “Spinning disk isn’t that slow at read/write of sequential IO”, although write coalescing techniques can significantly improve random writes (4k incoming random writes, for example, are aggregated into, let’s say, 128k writes, thus “de-randomizing” the IO written to the disk).


A relevant publication that was just published today, June 18, 2012:


“Japanese team boosts NAND flash durability and performance with ReRAM buffer”


www.extremetech.com/.../131202-japanese-team-boosts-nand-flash-durability-and-performance-with-reram-buffer


Similar aggregation techniques with ReRAM before writing to NAND.



Hi Alex,


You are right of course; sequentializing random writes can get you as much as 600 IOPS out of a single traditional disk, with only a marginal difference between 15k SAS and 7.2k SATA/NL-SAS drives.


NetApp are a great example of the benefits of this technology. It looks like the Japanese team have moved up a level by doing ReRAM > NAND instead of NAND > SATA/SAS.


Hypervisors and operating systems are becoming increasingly storage aware as well, and in future they will have functionality that currently only third-party software like Atlantis offers. Storage intelligence is moving up the stack. As Greg Schulz tends to say, the best IO is the one you don't have to do.



Brian,


As usual this is another great post by you and your team.  I am wondering if you left someone out of the mix, however?  I have been having very good success with V3 systems.  In fact, there is a recent case study on their web site that elegantly shows how they can dramatically increase performance of VDI implementations in a very cost effective manner.


The case study can be found at:


v3sys.com/.../Healthcare-Case-Study.pdf


