arclight@oldbytes.space (@arclight@oldbytes.space)
Engineer (nuclear, safety analysis), scientific software developer, rehabilitator of unloved FORTRAN, recovering sysadmin, marginally competent solderator. Occasional Bond Villain (Card Overpunch). Nuclear Scoundrel™
My internet claim to fame was livetweeting the Fukushima reactor failures on the Birdsite. I'm pretty chipper for one spending so much time looking at sad melty reactors and sad creaky software.
In the early days of personal computing, CPU bugs were so rare as to be newsworthy.
@gabrielesvelto Thank you for this detailed and specific explanation. Chris Hobbs discusses the relative unreliability of popular modern CPUs in "Embedded Software Development for Safety-Critical Systems", but not to this depth.
I don't do embedded work, but I do safety-related software QA. Our process has three types of test: acceptance tests, which determine fitness for use; installation tests, which confirm the system is in proper working order; and in-service tests, which are something of a mystery. There's no real guidance on what an in-service test is or how it differs from an installation test; in practice they're run when the operating system is updated or similar changes are made to supporting software. Given the issue of CPU degradation, I wonder if it makes sense to run in-service tests periodically, or to detect CPU degradation directly (that's probably something the infrastructure people should own rather than the application people).
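As a rough illustration of what a periodic degradation check might look like, here's a minimal known-answer-test sketch in Python. Everything in it is an assumption made for illustration: the function name, the particular checks, and the reporting are all hypothetical, and a real in-service test would be scoped to the application's actual compute footprint and wired into whatever monitoring the infrastructure team owns.

```python
"""Sketch of a periodic in-service CPU sanity check built from
known-answer tests: fixed computations with precomputed results.
Illustrative only; the specific checks here are assumptions."""
import hashlib


def cpu_known_answer_tests() -> list[str]:
    """Run a few fixed computations and return a list of failures."""
    failures = []

    # Integer path: a fixed multiply against a precomputed product.
    if 104729 * 104729 != 10968163441:
        failures.append("integer multiply mismatch")

    # Floating-point path: 1000 * 0.5 is exactly representable,
    # so the accumulated sum must compare equal to 500.0.
    if sum(0.5 for _ in range(1000)) != 500.0:
        failures.append("floating-point accumulate mismatch")

    # Logic/shift/rotate paths exercised via SHA-256; the digest of
    # the empty string is a published constant.
    expected = ("e3b0c44298fc1c149afbf4c8996fb924"
                "27ae41e4649b934ca495991b7852b855")
    if hashlib.sha256(b"").hexdigest() != expected:
        failures.append("SHA-256 known-answer mismatch")

    return failures


if __name__ == "__main__":
    bad = cpu_known_answer_tests()
    if bad:
        # In a real system this would feed alerting/monitoring,
        # not just exit with a message.
        raise SystemExit("CPU sanity check failed: " + "; ".join(bad))
    print("CPU sanity checks passed")
```

Known-answer tests are the same idea crypto libraries use for power-on self-tests; the appeal here is that one harness could serve as both the installation test and, re-run on a schedule, give "in-service test" a concrete meaning.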
I've mainly thought of CPU failures as design or manufacturing defects, not in terms of "wear", so this has me questioning the assumptions our testing is based on.