Citiverse
  • gabrielesvelto@mas.to

    @AndresFreundTec I've been in charge of Firefox stability for ten years now and some of my early work to detect hardware issues dates back to then. Before 2020 we would get 2-3 bugs per year, usually across different CPUs. Now we get dozens, it's really on another level.

  • gabrielesvelto@mas.to

    @AndresFreundTec admittedly we get a lot more after a new microarchitecture launches, and then they go down as microcode updates get rolled out. If Microsoft hadn't started shipping microcode updates with their OS updates we'd be swamped.

  • kimsj@mastodon.social

    @gabrielesvelto
    There’s also metastability. If a value is sampled halfway through changing, the output may occasionally be neither one nor zero, but some ‘half’ value. Depending on the circuits using that result, it may be interpreted as either 1 or 0, and different parts of the circuit may even interpret it differently. Such intermediate states are only metastable and will resolve to a firm 1 or 0 at some indeterminate later time, possibly propagating the problem.
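
To put rough numbers on the metastability point above: a common textbook model for a flip-flop synchronizer estimates the mean time between failures as MTBF = exp(t_r / tau) / (T_w * f_clk * f_data), where t_r is the time the flip-flop is given to resolve, tau is its resolution time constant, and T_w is the width of its vulnerable sampling window. The sketch below only evaluates that formula. The function name and every numeric value are illustrative assumptions, not measurements of any real CPU.

```python
import math

def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """Textbook MTBF estimate (seconds) for a flip-flop sampling an
    asynchronous signal: exp(t_resolve / tau) / (t_window * f_clk * f_data)."""
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)

# Hypothetical device parameters, chosen only to show the trend.
TAU = 50e-12        # resolution time constant: 50 ps
T_WINDOW = 100e-12  # vulnerable sampling window: 100 ps
F_CLK = 500e6       # sampling clock: 500 MHz
F_DATA = 50e6       # rate of asynchronous data transitions: 50 MHz

# Giving the flip-flop more time to settle helps exponentially.
mtbf_short = synchronizer_mtbf(1e-9, TAU, T_WINDOW, F_CLK, F_DATA)
mtbf_long = synchronizer_mtbf(3e-9, TAU, T_WINDOW, F_CLK, F_DATA)

print(f"1 ns to resolve: MTBF ~ {mtbf_short:.3g} s")
print(f"3 ns to resolve: MTBF ~ {mtbf_long:.3g} s")
```

The exponential term is why designers chain two or more flip-flops on signals that cross clock domains: each extra stage gives the intermediate value another clock period to settle before anything uses it.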

  • gabrielesvelto@mas.to

    @KimSJ ah yes, very good point. It's been a while since my days in hardware land and I had forgotten about it.

  • gabrielesvelto@mas.to

    @tehstu yes, absolutely. I've encountered several bugs in AMD CPUs, not many on ARM just yet, but our ARM user-base is very small compared to x86, so it's just less likely for us to stumble upon them. Plus we have some machinery that can detect some hardware bugs automatically but it doesn't work on ARM just yet.

  • mdione@en.osm.town

    @gabrielesvelto but UEFI is already quite complex: it has to find block devices, read their partition tables, read FAT file systems, read directories and files, load data into memory and transfer execution. Wouldn't a patch applied after all that be too late?

  • K

    @gabrielesvelto Intel's officially stated reason is that (too) high voltage (and temperature) caused fast degradation of the clock trees inside the cores. This degradation resulted in a duty cycle shift (square wave no longer square?), which caused general instability. If they use both posedge and negedge as triggers, then a change in duty cycle will definitely violate timing.
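
The posedge/negedge argument above can be checked with simple arithmetic. In a deliberately simplified model, a path launched on one clock edge and captured on the opposite edge only gets one clock phase to settle, so the shorter of the high and low phases sets the budget. The function name and the numbers (a 200 ps period, an 80 ps path, a 10 ps setup time, a shift from 50% to 42% duty cycle) are made up for illustration and are not Intel's figures.

```python
def half_cycle_slack(period_ps, duty_percent, path_delay_ps, setup_ps):
    """Worst-case slack (ps) for logic launched on one clock edge and
    captured on the opposite edge, in a simplified two-phase model."""
    high_ps = period_ps * duty_percent / 100  # time the clock spends high
    low_ps = period_ps - high_ps              # time the clock spends low
    # The shorter phase is the tightest window a half-cycle path can get.
    return min(high_ps, low_ps) - path_delay_ps - setup_ps

# Hypothetical 5 GHz clock with an 80 ps half-cycle path.
fresh = half_cycle_slack(200, 50, path_delay_ps=80, setup_ps=10)
aged = half_cycle_slack(200, 42, path_delay_ps=80, setup_ps=10)

print(f"50% duty cycle: slack = {fresh} ps")
print(f"42% duty cycle: slack = {aged} ps")
```

With a perfect square wave the path has slack to spare. Once the duty cycle drifts, the short phase no longer covers the path delay plus setup time and the slack goes negative, even though the clock frequency has not changed.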

  • gabrielesvelto@mas.to

    @arclight timing degradation should not be visible outside of the highest-spec desktop CPUs which are really pushing the envelope even when they're new. Embedded systems and even mid-range desktop CPUs will never fail because of it. What might become visible is increased power consumption over time though.

  • gabrielesvelto@mas.to

    @arclight on the other hand watch out for memory errors. Those can crop up much sooner than CPU problems due to circuit degradation: https://fosstodon.org/@gabrielesvelto/112407741329145666

  • burnitdown@beige.party

    @gabrielesvelto there was also no meaningful computer security nor much need for it in the days of 6502. it's much different when most computers are now connected to the internet and can be infected with malware within seconds of connecting.

  • gabrielesvelto@mas.to

    @mdione yes, it's very complex, but motherboard firmware has a mechanism to load the new microcode right as the CPU is bootstrapped. That happens even before the CPU is capable of accessing DRAM. All the rest of the UEFI machinery runs after that. Note that this early bootstrap mechanism usually involves a separate bootstrap CPU, typically an embedded microcontroller whose task is to get the main x86 core up and running.

  • x0@dragonscave.space

    @gabrielesvelto I wonder if they could use said statistical toys as part of a large-scale fuzzing process to detect such bugs?

  • clanger9@mastodon.online

    Fascinating thread. Do you know if the same issues exist on low power, embedded CPUs like ESP32, or is this something that mostly affects high-end stuff?

  • vfrmedia@social.tchncs.de

    @perpetuum_mobile @gabrielesvelto I even used to code in assembler on 8-bit platforms, but for years I could not quite get my head round how modern CPUs worked until this thread (and now I know a bit more)

  • bflipp@vmst.io

    @gabrielesvelto

    I don’t cut any slack for Intel producing two whole generations of CPUs with manufacturing flaws then trying to cover it up and never really offering full restitution to any customers.

  • hyaniner@mastodon.gamedev.place

    @gabrielesvelto It was a very rich, exciting, interesting, and useful post! Thank you very much!

  • gabrielesvelto@mas.to

    @vfrmedia @perpetuum_mobile if you have some free time this is a good deep dive: https://cseweb.ucsd.edu/classes/fa14/cse240A-a/pdf/04/Gonzalez_Processor_Microarchitecture_2010_Claypool.pdf

    While it doesn't cover some of the most recent advancements, it captures 90% of what you need to know.

    If you have a lot of free time and want to dive deeper there's this: https://www.agner.org/optimize/microarchitecture.pdf

  • N

    @gabrielesvelto The book 'Silicon' by Federico Faggin, the Italian who designed the 4004, 8080 and Z80, is a most splendid read. Fascinating that he had to add optically confusing features to hinder reverse engineering and minimise cloning by rivals.

  • perpetuum_mobile@mastodon.social

    @vfrmedia @gabrielesvelto I did code a little bit in x86 asm when I was a teen. It was the only way to turn on SVGA modes in Turbo Pascal and I wanted to make games back then 😉 I wrote a program which simulated a flame in real time, doing a per-pixel average of surrounding pixels and adding random sparks of value 255 along the bottom to make the flame move and look real
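
For anyone curious, the flame effect described above can be sketched in a few lines. This is one common variant of the classic demo effect, not the original Turbo Pascal program: the choice of neighbours (three pixels below plus one two rows down), the `cooling` term, the wrap-around at the edges, and the name `step_flame` are all assumptions made for illustration.

```python
import random

def step_flame(grid, rng, spark_chance=0.5, cooling=2):
    """Advance the flame by one frame and return the new grid.

    grid is a list of rows of intensities in 0..255; the last row is the bottom."""
    height, width = len(grid), len(grid[0])
    new_grid = []
    for y in range(height - 1):
        below = grid[y + 1]
        two_below = grid[min(y + 2, height - 1)]  # clamp at the bottom row
        row = []
        for x in range(width):
            # Average nearby pixels from the rows underneath (wrapping sideways).
            total = (below[(x - 1) % width] + below[x]
                     + below[(x + 1) % width] + two_below[x])
            # Subtract a little heat so the flame fades as it rises.
            row.append(max(0, total // 4 - cooling))
        new_grid.append(row)
    # Re-seed the bottom row with random full-intensity sparks.
    new_grid.append([255 if rng.random() < spark_chance else 0
                     for _ in range(width)])
    return new_grid

# Run a few frames on a small grid and show the result as ASCII art.
rng = random.Random(0)
flame = [[0] * 16 for _ in range(8)]
for _ in range(20):
    flame = step_flame(flame, rng)
for line in flame:
    print("".join(" .:*#"[value * 5 // 256] for value in line))
```

Because every frame reads only the rows underneath each pixel, heat drifts upward one row per step, and the random re-seeding of the bottom row is what makes the flame flicker.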

