I came across this article at Spiceworks noting a new type of URL typo-squatting based on bits being flipped in memory. However, the article seemed to bury the lede, which is this:

Research has shown that a computer with 4GB of memory has a 96% chance of having a random “bit flip” every three days.

That’s a crazy high chance of data corruption occurring on your computer. So, what causes these bit flip errors? Well, as circuits in computers get smaller and smaller (e.g., the latest Apple chips are based on impossibly fine 5nm circuits, and memory circuits have also shrunk), there is an increased chance that a 0 gets flipped to a 1, or vice versa, when cosmic rays, neutrons, or some other interference pass through them.

This is why devices are ‘radiation-hardened’ for space applications, where cosmic rays are much more likely to pass through devices than on the surface of the Earth. Hardening includes, in part, increasing the size of circuits. Chip fabrication for space applications is generally kept between 65nm and 150nm (a staggering 13x to 30x larger than current 5nm circuits).

Here on Earth we have an easier way to deal with such random bit flips, and it’s called ECC memory. ECC stands for Error Correcting Code, and it stores extra check bits (a more elaborate form of parity) alongside the data so that single-bit flips can be detected and corrected. Parity is used, for example, by network storage devices like Synology, e.g. with RAID 5, to let you replace a bad drive in your RAID without losing your data (so why don’t they use it with RAM?). Currently, the only Apple product that employs ECC memory is the Mac Pro. The question is why?
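To make the difference between plain parity and error correction concrete, here is a minimal sketch using a toy Hamming(7,4) code. This is only an illustration: real ECC memory uses wider codes over much larger words, and the function names below are made up for this example.

```python
def parity_bit(bits):
    """Even parity: the extra bit that makes the total number of 1s even."""
    return sum(bits) % 2

def parity_ok(bits, stored_parity):
    """A single parity bit can tell that a bit flipped, but not which one."""
    return parity_bit(bits) == stored_parity

def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(code):
    """Return (data_bits, corrected_position); position is 0 if no error was seen."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]], syndrome

if __name__ == "__main__":
    data = [1, 0, 1, 1]
    word = hamming74_encode(data)
    word[4] ^= 1                      # simulate a cosmic ray hitting position 5
    fixed, position = hamming74_decode(word)
    print("recovered:", fixed == data, "flipped position:", position)
```

The parity helpers only notice that something went wrong, while the three overlapping check bits in the Hamming code point at the exact bit that flipped, so it can be flipped back.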

If that figure holds, a modern device with far more than 4GB of memory seems likely to flip a bit almost every day. The problem will only get worse with more memory and smaller fabrication techniques. That means on any given day your computer may bomb inexplicably, or some bit of data on your computer may get corrupted. And that data corruption can compound, getting worse and worse over time.
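As a rough back-of-the-envelope check, the quoted figure (96% chance of at least one flip in three days on 4GB) can be turned into a per-day estimate for other memory sizes. The sketch below assumes flips arrive independently (a Poisson process) and that the rate scales linearly with the amount of memory; both are my assumptions, not claims from the research.

```python
import math

# Figures quoted in the post: 96% chance of at least one flip in 3 days on 4 GB.
P_QUOTED = 0.96
DAYS_QUOTED = 3.0
GB_QUOTED = 4.0

def flips_per_gb_day(p=P_QUOTED, days=DAYS_QUOTED, gb=GB_QUOTED):
    """Poisson rate implied by P(at least one flip) = 1 - exp(-rate * gb * days)."""
    return -math.log(1.0 - p) / (gb * days)

def chance_of_flip(gb, days, rate=None):
    """Chance of at least one flip, assuming the rate scales linearly with memory."""
    if rate is None:
        rate = flips_per_gb_day()
    return 1.0 - math.exp(-rate * gb * days)

if __name__ == "__main__":
    for gb in (4, 8, 16, 32):
        per_day = chance_of_flip(gb, 1)
        per_3_days = chance_of_flip(gb, 3)
        print(f"{gb:>2} GB: {per_day:.0%} per day, {per_3_days:.0%} per 3 days")
```

Under those assumptions, a 4GB machine comes out at roughly a two-in-three chance of a flip on any given day, and the odds climb quickly as memory grows.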

So why don’t all modern computer and mobile device makers use ECC memory? Right now ECC memory costs a bit more (you basically pay for a 9th bit of memory for every 8 bits of data; a typical ECC module stores 72 bits for each 64-bit word). However, if everyone moved to ECC memory as a default, these prices would likely fall fast.
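The "one extra bit in nine" overhead can be sanity-checked with a little arithmetic. The sketch below counts the check bits a Hamming code with one added overall parity bit (often called SECDED: single error correction, double error detection) needs for a given word size. The helper name is hypothetical, and I am assuming this is the style of code in use; actual memory controllers may use different schemes.

```python
def secded_check_bits(data_bits):
    """Check bits for a Hamming code extended with one overall parity bit."""
    r = 1
    # Smallest r whose 2**r syndromes can name every bit position plus "no error".
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # one more overall parity bit for double-error detection

if __name__ == "__main__":
    for k in (8, 16, 32, 64):
        r = secded_check_bits(k)
        print(f"{k:>2} data bits: {r} check bits, {r / k:.1%} overhead")
```

Protecting each byte on its own would need 5 check bits per 8 data bits, while protecting a whole 64-bit word needs only 8, which works out to the same 12.5% overhead as a single 9th bit per byte.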

I guess my question is, with error rates so high that even a Mario 64 speed runner is experiencing them, is it at some point negligent for our computer/device makers not to start using ECC memory?
