The Effect Of Data Write Size and Patterns on Solid State Drive Performance
General
This article describes the effect of writing small and large data clusters, randomly and sequentially, to a Solid State Drive (SSD).
Mechanical Vs. Solid State Disk Drives
Spinning mechanical disk drives have one big advantage: the media itself is virtually unaffected by write size or pattern. This is because of the nature of the media. It can be overwritten without any intervening erase; the spots on the media are simply re-magnetized. Convenient and fast.
Not so with solid state media. To understand why, a review of NAND Flash basics is required. This was covered in our last paper and is summarized here for first-time readers.
NAND FLASH Summary
NAND FLASH is structured as cells arranged as follows:
Sectors, Pages and Blocks
The block is the top-level element in the Flash hierarchy, consisting of multiple pages. Pages consist of multiple sectors plus overhead bytes. A sector is 512 bytes; this is the smallest usable data chunk size.
Blocks are split into 2 planes: odd blocks in plane 0, even blocks in plane 1. This allows simultaneous programming of 2 pages, one in each plane, adding to throughput.
NAND Flash is written at the page level and can only be erased at the block level. Pages must be written in sequential order, lowest to highest, and generally can only be written one time; no partial page programming is allowed. The exception here is SLC, which has that capability: generally NOP = 4 for SLC, and NOP = 1 (sometimes 2) for other Flash.
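To make these rules concrete, the Python sketch below models a single block under the constraints just described: pages programmed in order, one program per page (NOP = 1 assumed), and erase only at the block level. The geometry values are illustrative assumptions, not taken from any particular part.

    # Minimal sketch of the NAND programming rules described above: pages are
    # programmed in order, each page only once (NOP = 1 assumed), and erase is
    # per block. Geometry values are illustrative, not from any specific part.

    SECTOR_BYTES = 512               # smallest host-visible unit (one LBA)
    PAGE_BYTES = 16 * 1024           # one 16K page = 32 sectors
    PAGES_PER_BLOCK = 256            # assumed block geometry

    class Block:
        def __init__(self):
            self.next_page = 0                      # pages must be written lowest to highest
            self.pages = [None] * PAGES_PER_BLOCK   # None = erased (unwritten)

        def program_page(self, data: bytes) -> int:
            """Program the next free page; returns the page index used."""
            if self.next_page >= PAGES_PER_BLOCK:
                raise RuntimeError("block full: erase required before reuse")
            if len(data) > PAGE_BYTES:
                raise ValueError("data larger than one page")
            page = self.next_page
            self.pages[page] = data                 # a page can be programmed only once
            self.next_page += 1
            return page

        def erase(self):
            """Erase works only on the whole block, never a single page."""
            self.pages = [None] * PAGES_PER_BLOCK
            self.next_page = 0

    blk = Block()
    blk.program_page(b"\xAA" * PAGE_BYTES)          # OK: lands in page 0
    blk.program_page(b"\xBB" * PAGE_BYTES)          # OK: lands in page 1
    # Page 0 cannot be re-written in place; the whole block must be erased first.
    blk.erase()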
Logical to Physical Flash Mapping Schemes – Flash Translation Layer (FTL)
The basic unit of storage at the host level is the sector, or LBA (Logical Block Address). This must be mapped to a physical sector in the Flash. This is one of the tasks of the FTL.
So how can this be done? There are 2 basic schemes for mapping: one is called Block Based Mapping (BBM), the other is called Page Based Mapping (PBM). Of the 2 schemes, block based mapping is the easier to implement, but it has issues with random writes and write amplification. PBM is the preferred mapping scheme, though it suffers slightly in sequential read speeds compared to the BBM scheme.
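As a rough illustration of the difference in granularity, the following sketch shows how an LBA would be located under each scheme. The table layouts and geometry are simplified assumptions; a real FTL layers caching, wear leveling, and journaling on top of this.

    # Sketch of the two mapping granularities (names and sizes are illustrative
    # assumptions; a real FTL adds caching, wear leveling and journaling on top).

    SECTORS_PER_PAGE = 32        # 16K page / 512-byte sectors
    PAGES_PER_BLOCK = 256
    SECTORS_PER_BLOCK = SECTORS_PER_PAGE * PAGES_PER_BLOCK

    # Page Based Mapping (PBM): one table entry per logical page. A rewritten
    # logical page is simply redirected to a fresh physical page; the old copy
    # goes stale and is reclaimed later by garbage collection.
    pbm_table = {}               # logical page number -> (physical block, physical page)

    def pbm_lookup(lba: int):
        logical_page, offset = divmod(lba, SECTORS_PER_PAGE)
        return pbm_table.get(logical_page), offset

    # Block Based Mapping (BBM): one table entry per logical block. The page
    # offset inside the block is fixed by the LBA, so rewriting an LBA forces a
    # data move + block erase + merge into a replacement block.
    bbm_table = {}               # logical block number -> physical block

    def bbm_lookup(lba: int):
        logical_block, rem = divmod(lba, SECTORS_PER_BLOCK)
        return bbm_table.get(logical_block), rem // SECTORS_PER_PAGE

    pbm_table[0] = (7, 3)        # example: logical page 0 lives in block 7, page 3
    print(pbm_lookup(5))         # -> ((7, 3), 5): same logical page, sector offset 5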
Why Is It More Efficient to Transfer Large Data Chunks?
Modes of Data Transfer
Data transfer modes vary by application, and often are the key factor in overall system throughput.
A properly designed storage device will attempt to level the playing field by optimizing transfer speeds in each mode.
Transfer modes for both reads and writes are either sequential or random. One can probably surmise that write speeds, in particular random write speeds, are the most difficult to deal with. This results from the nature of NAND Flash itself and the FTL mapping used, with BBM taking the biggest hit (a high write amplification factor).
Data Transfer Size
The worst practice when dealing with an SSD is to write small chunks of data randomly. This is true for both BBM and PBM based FTLs. The most detrimental impact is seen in systems using BBM. However, PBM also suffers because of the intensive garbage collection required as pages fill, data is deleted, and new data is written. This is inescapable, even if garbage collection is performed as a background task. It results from the nature of the Flash itself: pages can generally be written only once (MLC and TLC). A 16K page contains 32 sectors (LBA/PBA).
Re-Writing Small Data Chunk Size
So as an example, using PBM, a 4K data write results in using only 25% of a 16K page. If another 4K data chunk is written to the same LBA, it must be written to the next available page in a PBM based FTL. This leaves the rest of both pages unusable for new data, and hence they will be dealt with at garbage collection time. Pages are used up fast, and as data is deleted or rewritten, these pages are marked for garbage collection. This places a heavy load on garbage collection and severely impacts performance.
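A back-of-the-envelope model of this behavior, under the simplifying assumption that every 4K host write consumes a fresh 16K physical page with no coalescing, might look like the following.

    # Simplified PBM model: assume every 4K host write consumes a fresh 16K
    # physical page (no write coalescing), and rewriting the same logical chunk
    # leaves its previous page stale until garbage collection reclaims it.

    PAGE_BYTES = 16 * 1024
    WRITE_BYTES = 4 * 1024

    live = {}                 # logical 4K chunk id -> physical page id
    stale_pages = 0
    next_phys_page = 0

    def write_chunk(chunk_id: int):
        global stale_pages, next_phys_page
        if chunk_id in live:
            stale_pages += 1              # the old copy is now dead weight
        live[chunk_id] = next_phys_page
        next_phys_page += 1

    for _ in range(1000):                 # rewrite the same 4K chunk 1000 times
        write_chunk(0)

    print(f"page utilization per write: {WRITE_BYTES / PAGE_BYTES:.0%}")        # 25%
    print(f"physical pages consumed: {next_phys_page}, now stale: {stale_pages}")

Rewriting one 4K chunk a thousand times consumes a thousand physical pages, 999 of them stale, all of which garbage collection eventually has to deal with.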
In the case of a BBM FTL, the situation is even worse. Let's take the same 4K data write. This will be written to the first available page, so in the 16K page, the same 25% of the page is used. If the same LBA is rewritten, a data move, block erase, and data merge is required immediately. As you can imagine, things slow down quickly.
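The cost of that immediate data move, block erase, and merge can be sketched as follows; the page program and block erase times are assumed rough orders of magnitude for illustration, not datasheet values.

    # Simplified BBM model: rewriting an already-programmed LBA triggers an
    # immediate data move (copy of the block's valid pages), block erase, and
    # merge. Timings are assumed rough orders of magnitude, used only to show
    # the relative cost versus a plain page program.

    PAGES_PER_BLOCK = 256
    PAGE_PROGRAM_US = 800        # assumed typical page program time
    BLOCK_ERASE_US = 3000        # assumed typical block erase time

    def rewrite_cost_us(valid_pages_in_block: int) -> int:
        """Cost of rewriting one LBA in place under block mapping."""
        copy_cost = valid_pages_in_block * PAGE_PROGRAM_US   # data move / merge
        return copy_cost + BLOCK_ERASE_US + PAGE_PROGRAM_US  # + erase + new write

    # One 4K rewrite in a nearly full block versus a plain page program:
    print(rewrite_cost_us(valid_pages_in_block=200), "us vs", PAGE_PROGRAM_US, "us")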
Sequential Writing Small Data Chunk Size
As we have seen, a 4K data write results in using only 25% of the page. If another 4K data chunk is written to a different LBA, it must also be written to the next available page in a PBM based FTL. This leaves the rest of both pages unusable for new data, and hence they will be dealt with at garbage collection time. Pages are used up fast, and as data is deleted or rewritten, these pages are marked for garbage collection. This places a heavy load on garbage collection and severely impacts performance.
In this case the BBM FTL acts similarly. The next 4K will be written to the first available page, so in the 16K page, the same 25% of the page is used. This continues until pages are exhausted, similar to the PBM FTL. At this point, block erases start piling up and the SSD slows down.
Random Writing Small Data Chunk Size
Randomly writing the same 4K data chunk sizes results in even poorer performance. For the PBM FTL, it increases the garbage collection load more than sequential writes do. Due to the sequential nature of PBM writes, this is not horrible, but it gets worse as the drive fills up.
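One way to see why random overwrites hurt a PBM FTL more than sequential ones is the toy model below: the drive is filled once, then half of the data is overwritten 4K at a time, and we look at how much valid data garbage collection would have to copy to reclaim the stale space. All geometry figures are illustrative assumptions.

    # Toy page-mapped (PBM) model: fill the drive once, then overwrite half of
    # the data 4K at a time, sequentially vs. randomly, and see how much valid
    # data garbage collection would have to copy to reclaim the stale space.

    import random

    PAGES_PER_BLOCK = 64
    BLOCKS = 64
    TOTAL_PAGES = PAGES_PER_BLOCK * BLOCKS

    def stale_layout(pattern: str, overwrite_fraction: float = 0.5):
        # Initial sequential fill: logical page i sits in block i // PAGES_PER_BLOCK.
        stale_per_block = [0] * BLOCKS
        pages = list(range(TOTAL_PAGES))
        if pattern == "random":
            random.Random(0).shuffle(pages)
        for lpn in pages[: int(TOTAL_PAGES * overwrite_fraction)]:
            stale_per_block[lpn // PAGES_PER_BLOCK] += 1   # old copy goes stale
        return stale_per_block

    for pattern in ("sequential", "random"):
        stale = stale_layout(pattern)
        free_erases = sum(1 for s in stale if s == PAGES_PER_BLOCK)
        # Reclaiming a partially stale block means copying its valid pages first.
        gc_copies = sum(PAGES_PER_BLOCK - s for s in stale if s > 0)
        print(f"{pattern:>10}: fully stale blocks = {free_erases}, "
              f"valid pages to copy = {gc_copies}")

Sequential overwrites invalidate whole blocks, which can be erased with no copying; random overwrites scatter stale pages across every block, so garbage collection must relocate large amounts of still-valid data before it can erase anything.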
The bigger hit is with the BBM FTL. Here, random in-line erases increase many times over, especially as the SSD starts to fill up. This can be seen by running Iometer or a similar test suite on the SSD.
Writing Large Data Chunk Sizes
As we have seen, writing small data chunk sizes is wasteful of Flash page usage. The best case is always writing as close to the page size as possible, and better still, writing on a page boundary. This isn't always practical, but transfers that are a fraction of a Flash page should be avoided.
This really helps reduce garbage collection issues in a PBM scheme, and also helps prevent a large number of unnecessary in-line block erases in a BBM scheme.
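A small helper makes the alignment point concrete; the 16K page size is an assumption for illustration.

    # Page utilization for a write of a given size and starting offset, assuming
    # a 16K Flash page. Page-sized, page-aligned transfers waste the least space
    # (and generate the least garbage collection work later).

    PAGE_BYTES = 16 * 1024

    def pages_touched(offset: int, length: int) -> int:
        first_page = offset // PAGE_BYTES
        last_page = (offset + length - 1) // PAGE_BYTES
        return last_page - first_page + 1

    def utilization(offset: int, length: int) -> float:
        return length / (pages_touched(offset, length) * PAGE_BYTES)

    print(utilization(0, 4 * 1024))          # 0.25 -> 4K write wastes 75% of the page
    print(utilization(0, 16 * 1024))         # 1.0  -> page-sized and page-aligned
    print(utilization(8 * 1024, 16 * 1024))  # 0.5  -> page-sized but misaligned: 2 pages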
All around, large data transfers are preferred on an SSD.
Easing the Handling of Small Data Chunk Writes
There are ways of easing the pain of small data size transfers. Some SSD devices use an external DRAM cache to accumulate sectors until there is at least a page or more worth of data to commit to Flash. This is a double-edged sword, in that on a power failure the cached data is lost. This can be mitigated by a properly designed power-fail cleanup at the drive level, or better yet at the system level, or both. At the enterprise level, this should be a requirement.
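A minimal sketch of this coalescing idea, assuming 512-byte sectors and a 16K page, is shown below; flush_page_to_flash is a hypothetical stand-in for the controller's page program path. Note that everything held in the pending buffer is exactly the data at risk on a power failure.

    # Sketch of write coalescing in a volatile buffer: 512-byte sector writes are
    # accumulated until a full 16K page's worth is present, then committed to
    # Flash in one page program. flush_page_to_flash is a hypothetical stand-in
    # for the controller's page program path.

    SECTOR_BYTES = 512
    SECTORS_PER_PAGE = 32                 # 16K page

    class WriteCache:
        def __init__(self, flush_page_to_flash):
            self.pending = {}             # lba -> sector data (volatile: lost on power fail)
            self.flush = flush_page_to_flash

        def write_sector(self, lba: int, data: bytes):
            self.pending[lba] = data
            page_base = (lba // SECTORS_PER_PAGE) * SECTORS_PER_PAGE
            lbas = range(page_base, page_base + SECTORS_PER_PAGE)
            if all(l in self.pending for l in lbas):          # full page buffered
                payload = b"".join(self.pending.pop(l) for l in lbas)
                self.flush(page_base, payload)                # one page program

    cache = WriteCache(lambda base, buf: print(f"program page at LBA {base}, {len(buf)} bytes"))
    for lba in range(32):
        cache.write_sector(lba, b"\0" * SECTOR_BYTES)         # 32nd sector triggers the flush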
Using OS functionality such as TRIM also helps, but it is not a cure.
Conclusion
Avoiding small data chunk writes (smaller than the Flash page size), especially random ones, is important for overall system performance. If a drive with a cache is used, it is very important that the drive and system have a robust power-fail management task.
Contact
Article Contributor:
Carmine C. Cupani, MSEE
CTech Electronics LLC