True Customer Stories: Industrial vs. Consumer Storage Solutions
“We were not aware that consumer SSDs are not a good ‘drop-in’ for our industrial device.” – True Customer Stories
Consumer storage products are advertised so extensively that their brands are often the first to come to mind when engineers consider a storage device for an industrial embedded solution. There are many reasons why this might not be a good idea, and understanding the usage model of the storage is a good first step toward choosing the right solution.
Consumer solutions focus on priorities like high capacity and speed, often measured in sheer cost or $/GB. True industrial embedded solutions focus on data integrity, environmental robustness (shock, temperature and vibration) and electrical robustness. When a customer is focused solely on cost, speed and the very highest capacities, it quickly becomes apparent that either the application is not really industrial or the user is unaware of how the system actually uses its storage.
“We chose to use a well-known brand of SSD from a NAND flash manufacturer in our embedded product. We were not overly concerned about speed, as our product is used in a manner in which the data is written to the drive offline, or better said, the user is not waiting. We saw some advantages in that we were able to source these from multiple retail suppliers, and we were able to get the best pricing and delivery terms because these drives were used in many consumer client device applications. We assumed those applications were far more intensive than ours.”
“We first became aware of some issues when drives would randomly switch to read-only. We looked for the triggering event, or the error that might have caused this. The drive went offline and would not allow further data to be written, but all data written up to that point was still readable on the drive. Our technicians would reinitialize the drives in the system, and that seemed to fix it. We were still looking for the reason this was happening when the percentage of failures among the oldest installed drives started creeping up. We could work with 2-4% of the drives reverting to read-only per month, but when this got closer to 10%, with most failures concentrated in the older population, we started to get worried.”
Off-the-shelf SSDs commonly used for client applications are focused on three things: capacity, speed and cost. These features work well in laptops and servers, where the emphasis is on speed without breaking the bank. However, they assume the user is in a compute environment where data is written in a rather specific way. Many industrial embedded systems, in fact most of them, do not follow the compute style of writing data to flash.
Industrial embedded SSD solutions have a completely different set of features to focus on. Data integrity is the number one priority, followed by elements like environmental and electrical robustness, life cycle management, flash utilization and a myriad of other features determined by the specifics of the application.
Many embedded hosts use Linux, which allows easier customization of the host software, and this often results in one or more of the constraints of NAND flash being tested. Flash wears with program and erase cycles (P/E cycles). The size of the writes has a huge effect on how quickly P/E cycles are consumed. All flash has a limited number of P/E cycles available, based on its construction.
A typical flash chip is organized into “pages” of 4 KB, 8 KB or more, grouped into blocks of anywhere from 256 pages to 16K pages depending on the chip. Thus, a single block can hold a large amount of data. One constraint on the device is that the smallest unit that can be erased is an entire block. Thus, if a few bytes in a single page are to be updated, the data must be moved to a new block with the update, and the previous block can then be erased and used for future writes. A second constraint is that a block can be erased only a limited number of times before data retention is in jeopardy.
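To make the erase-block constraint concrete, here is a minimal Python sketch. The page size and pages-per-block values are assumptions picked from the ranges quoted above, not figures for any particular chip, and the function naive_update_cost is a hypothetical worst-case model in which a full block is copied to a fresh block for even a tiny update. Real controllers use mapping tables and garbage collection to avoid much of this cost.

```python
# Assumed geometry, taken from the ranges quoted in the text (not a real part).
PAGE_SIZE_BYTES = 8 * 1024   # 8 KB pages
PAGES_PER_BLOCK = 256        # low end of the 256 to 16K range


def block_size_bytes(page_size=PAGE_SIZE_BYTES, pages_per_block=PAGES_PER_BLOCK):
    """Size of the smallest erasable unit (one block), in bytes."""
    return page_size * pages_per_block


def naive_update_cost(update_bytes, page_size=PAGE_SIZE_BYTES,
                      pages_per_block=PAGES_PER_BLOCK):
    """Worst-case cost of changing `update_bytes` inside a full block.

    Hypothetical model: the whole block is copied to a fresh block with the
    update applied, and the old block is then erased.
    Returns (bytes_programmed, blocks_erased, amplification).
    """
    if update_bytes <= 0:
        raise ValueError("update_bytes must be positive")
    programmed = block_size_bytes(page_size, pages_per_block)
    return programmed, 1, programmed / update_bytes


if __name__ == "__main__":
    programmed, erased, factor = naive_update_cost(16)
    print(f"Block size: {block_size_bytes()} bytes")
    print(f"A 16-byte update programs {programmed} bytes "
          f"and erases {erased} block ({factor:.0f}x the host data)")
```

Under these assumed values a block is 2 MiB, so in this naive model a 16-byte change costs a full 2 MiB of programming plus one erase cycle.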
When new data is written into the physical chip, it is written into a free page. If the user writes a very small amount of data, say 256 bytes, an entire page is used to hold those 256 bytes, and the remainder of the page goes unused. This is because once a page has been written, it cannot be “re-written” with additional data. On a subsequent write of 256 bytes, another free page is used to hold the new 256 bytes of data. If these data are to be consecutive, the firmware will move the previous 256 bytes to a new page along with the new 256 bytes to keep them together. This rewriting of data from one page to another causes “write amplification” (WA), a measure of how much data is physically written to the flash relative to the amount of data the host actually wrote.
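The effect of write size on page usage can be sketched with a few lines of Python. This is a simplified model, not a description of any particular drive's firmware: it assumes a 4 KB page, assumes every host write lands in freshly programmed pages, and counts the unused remainder of a page as flash consumed. The function names are illustrative only.

```python
def pages_needed(write_bytes, page_size=4096):
    """Whole pages consumed by one host write (a page cannot be partly reused)."""
    if write_bytes <= 0 or page_size <= 0:
        raise ValueError("write_bytes and page_size must be positive")
    # Integer ceiling division: a partial page still costs a full page.
    return (write_bytes + page_size - 1) // page_size


def write_amplification(write_bytes, page_size=4096):
    """Flash bytes consumed divided by host bytes written, per write."""
    return pages_needed(write_bytes, page_size) * page_size / write_bytes


if __name__ == "__main__":
    for size in (256, 1024, 4096, 65536):
        print(f"{size:>6}-byte writes -> WA of {write_amplification(size):.2f}")
```

In this model a stream of 256-byte writes consumes sixteen times as much flash as the data it carries, while writes that are a whole multiple of the page size consume exactly what they carry.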
With these factors in mind, you can see that the operating mode, or workload, of the drive can drastically affect its life. The main factor affecting the life of the drive in this case was the small size of the application writes wearing the flash, combined with the use of consumer-grade TLC flash. The P/E cycle limits of some kinds of flash are very low. As blocks reach their maximum P/E cycles they go bad, and the drive begins to use up spare blocks to replace them. There are a limited number of spare blocks, and when these are used up many drives will become read-only. That was what was happening with the customer’s drives. Using TLC flash in some drives may limit you to as few as 800 P/E cycles. The available number of erase cycles for some brands of raw commercial flash is TLC: 800, MLC: 3,000, SLC: 60,000. However, the effective endurance can be extended well beyond these raw figures by error correction and by techniques such as wear leveling.
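A rough, back-of-the-envelope way to see how P/E limits and write amplification combine is sketched below in Python. The P/E figures are the raw numbers quoted above; everything else is an assumption for illustration: the drive capacity, the write amplification factor and the daily write volume are hypothetical, wear leveling is treated as ideal, and spare blocks, over-provisioning and error correction are ignored.

```python
# Raw P/E cycle figures quoted in the text for some brands of commercial flash.
RAW_PE_CYCLES = {"TLC": 800, "MLC": 3000, "SLC": 60000}


def host_write_budget_gb(capacity_gb, flash_type, write_amplification):
    """Approximate host data (GB) a drive can absorb before wear-out.

    Assumes ideal wear leveling: every block is cycled equally.
    """
    if write_amplification < 1:
        raise ValueError("write amplification cannot be below 1")
    return capacity_gb * RAW_PE_CYCLES[flash_type] / write_amplification


def years_until_wearout(capacity_gb, flash_type, write_amplification,
                        host_gb_per_day):
    """Estimated service life in years at a steady daily write volume."""
    budget = host_write_budget_gb(capacity_gb, flash_type, write_amplification)
    return budget / host_gb_per_day / 365


if __name__ == "__main__":
    # Hypothetical workload: 256 GB drive, WA of 16, 10 GB of host writes a day.
    for flash in RAW_PE_CYCLES:
        years = years_until_wearout(256, flash, 16, 10)
        print(f"{flash}: about {years:.1f} years")
```

The absolute numbers depend entirely on the assumed inputs; the point of the sketch is that the write budget scales directly with the P/E limit and inversely with write amplification, so a small-write workload on low-endurance flash runs out far sooner than the capacity alone would suggest.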
Switching to a smaller-capacity SLC drive solved the issue. There are probably other solutions available, such as modifying the host application to change the size of its writes, but adjusting the workflow of the drive in this way would have been time-consuming and expensive. It is critical to match the storage solution to the workflow of the host when choosing a product. Sometimes well-known brands of retail SSDs or other storage products designed for the client market are not the best solution. By using a smaller-capacity SLC-based SSD, the customer was still able to achieve a workable cost compared with the much higher-capacity SSD they were originally using.
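Why a smaller drive can outlast a larger one follows from simple arithmetic, sketched here in Python. The P/E figures are the raw TLC and SLC numbers quoted earlier; the capacities are hypothetical, since the customer's actual drive sizes are not stated, and the model assumes the same workload and ideal wear leveling on both drives.

```python
TLC_PE_CYCLES = 800      # raw figure quoted in the text
SLC_PE_CYCLES = 60000    # raw figure quoted in the text


def break_even_capacity_gb(original_capacity_gb,
                           original_pe=TLC_PE_CYCLES,
                           replacement_pe=SLC_PE_CYCLES):
    """Replacement capacity that matches the original drive's write budget."""
    return original_capacity_gb * original_pe / replacement_pe


def endurance_ratio(original_capacity_gb, replacement_capacity_gb,
                    original_pe=TLC_PE_CYCLES, replacement_pe=SLC_PE_CYCLES):
    """How many times more total writes the replacement drive can absorb."""
    return (replacement_capacity_gb * replacement_pe) / (
        original_capacity_gb * original_pe)


if __name__ == "__main__":
    # Hypothetical capacities: a 256 GB TLC drive replaced by a 32 GB SLC drive.
    print(f"Break-even SLC capacity: {break_even_capacity_gb(256):.2f} GB")
    print(f"32 GB SLC vs 256 GB TLC: {endurance_ratio(256, 32):.3f}x the writes")
```

With these assumed capacities, an SLC drive one eighth the size still offers several times the total write budget, which is consistent with the customer being able to give up capacity without giving up service life.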
Lean on Delkin’s Technical Team
Have you had a problem with your host storage? The Delkin Customer Applications Team stands at the ready to be your trusted advisor for your host critical storage. We have solved hundreds of complicated host failures and look forward to understanding your usage model for flash based Rugged Controlled Storage.