Technical Success Stories: Flash Endurance and SMART Commands
“Our product sells for under $200. We could not afford an expensive card, so we just assumed we had to have some failures.” – Technical Success Stories
You would not put jet fuel in your moped, and an inexpensive product would not normally ship with an expensive SLC-based card. No one is going to pay $60 to put an SD card into something that costs $99 or even $199; the math just does not work. Such is the reality of industrial products: sometimes the price point of the right card is too high, and OEMs must live with a less expensive card purely because of economics. That, however, does not mean you must accept failures.
“Failures tended to occur about once a year in each of our products, due to the SD cards wearing out. This seemed manageable, as our customers would deploy hundreds and even thousands of our inexpensive data-collection devices. If they could plot the data from 98% of the devices, that was usually enough to get them what they needed. Our devices tended to wear out SD cards because we wrote to them continuously, 24/7. Using the inexpensive MLC-based cards, we would often exceed the card’s TBW (terabytes written) rating, and eventually the cards would just become read-only.”
“We were aware that our hosts operated in harsh environments. In addition to exceeding the terabytes-written limit, our devices often sat in extreme heat in Texas and saw huge temperature fluctuations at sites in places like Venezuela. These temperature extremes and swings would sometimes accelerate wear and cause the cards to fail sooner. We had a budget of about $10, so we just threw the cards away and replaced them with new ones when a failure was encountered. This practice went on for years.”
“We only reached out to Delkin when we saw a white paper by them on flash endurance. That started a conversation about the failures and the process we used to identify and replace the cards.”
The Delkin Customer Application Engineering team got involved and asked a series of questions about the environment, the OS being used, the application software, and ultimately the workload the card was experiencing. The cards were being worn out: the program/erase cycle count exceeded the flash rating and the spare blocks were consumed, resulting in a “bricked” card. This failure mode was expected for the usage model, and there was not much more that could be done for an MLC-based product in that environment. Sometimes Delkin’s role is more consultant than supplier, and we can be a good resource to double-check that everything that can be done is being done to remedy the situation.
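To make the wear-out arithmetic concrete, here is a rough Python sketch of how a TBW rating and a continuous write workload translate into expected card life. The function name and the numbers are illustrative assumptions, not figures from this customer or from any Delkin datasheet. The model is also deliberately simple: it ignores write amplification and the temperature effects described above, both of which can shorten real-world life.

```python
def years_until_tbw_exceeded(tbw_rating_tb: float, host_writes_gb_per_day: float) -> float:
    """Estimate how many years a card lasts before host writes exceed its TBW rating.

    Simplified model (assumption): constant daily write volume, no write
    amplification, no temperature derating.
    """
    if tbw_rating_tb <= 0 or host_writes_gb_per_day <= 0:
        raise ValueError("rating and daily writes must be positive")
    tb_written_per_day = host_writes_gb_per_day / 1000.0  # decimal GB -> TB
    days = tbw_rating_tb / tb_written_per_day
    return days / 365.0


if __name__ == "__main__":
    # Hypothetical card and workload, chosen only to illustrate the calculation.
    years = years_until_tbw_exceeded(tbw_rating_tb=36.5, host_writes_gb_per_day=100)
    print(f"Estimated life: {years:.1f} years")
```

With these made-up inputs the card reaches its rating in about a year. A device that logs around the clock can exhaust an inexpensive card on that kind of schedule, which is why the workload questions matter as much as the environmental ones.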
One point raised in the discussion was that the situation would be more manageable for the customer if card end-of-life failures could be predicted.
The solution offered was to use Delkin’s support for SMART commands, including the Delkin SMART Dashboard and library. This utility allows the user to extract vital card information at chosen intervals, including the estimated percentage of life remaining, erase counts, and remaining spare blocks. Using this tool, the manufacturer or their customer could interrogate the cards prior to deployment to the field or during regularly scheduled maintenance. Cards with, say, less than 10% of useful life left would then be replaced before failing in the field.
This solution turned out to be perfect for the customer. There were regularly scheduled maintenance intervals where components of the device were inspected, and by adding this step to the process, the customer was able to reduce field failures to almost zero.
The Delkin SMART tools, however, only worked on Delkin cards, so this did force the customer to change card technology.
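The maintenance-time check described above might look something like the following Python sketch. It does not call the Delkin SMART library, whose API is not shown in this article. The `CardHealth` record, its field names, and the `cards_to_replace` helper are hypothetical stand-ins for whatever values the SMART tools report, and the 10% threshold simply mirrors the example figure given earlier.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CardHealth:
    """Hypothetical snapshot of SMART-style data read from one card."""
    card_id: str
    life_remaining_pct: float    # estimated % of life remaining
    spare_blocks_remaining: int  # spare blocks not yet consumed


def cards_to_replace(
    readings: Iterable[CardHealth],
    min_life_pct: float = 10.0,
    min_spare_blocks: int = 1,
) -> List[str]:
    """Return IDs of cards that should be swapped at this maintenance visit."""
    return [
        r.card_id
        for r in readings
        if r.life_remaining_pct < min_life_pct
        or r.spare_blocks_remaining < min_spare_blocks
    ]


if __name__ == "__main__":
    # Made-up readings for illustration only.
    fleet = [
        CardHealth("site-A", life_remaining_pct=62.0, spare_blocks_remaining=40),
        CardHealth("site-B", life_remaining_pct=7.5, spare_blocks_remaining=12),
        CardHealth("site-C", life_remaining_pct=35.0, spare_blocks_remaining=0),
    ]
    print(cards_to_replace(fleet))
```

Checking spare blocks alongside the life estimate is a design choice in this sketch: a card whose spare blocks are exhausted is flagged even if its percentage estimate still looks healthy.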
There may be multiple ways to solve a problem, and it is critical that the customer and Delkin work as a team to find ways to mitigate the issue. Knowing the usage model of the host and the host environment can go a long way towards getting the right solution. Some solutions may be focused on cost, and some level of defects may be allowed. Other designs may have a zero-failure requirement, and a custom design or a more fully featured controller may be warranted in order to work around known issues. Most important is choosing a supplier that is more focused on solving the problem than on finding a reason to avoid any responsibility.
Lean on Delkin’s Technical Team
Have you had a problem with your host storage? The Delkin Customer Applications Team stands at the ready to be your trusted advisor for your host-critical storage. We have solved hundreds of complicated host failures and look forward to understanding your usage model for flash-based Rugged Controlled Storage.