:INFO When Nodes Lie: The Hardest Failure Mode Crash fault-tolerant consensus — Raft, Paxos — assumes that faulty nodes simply stop responding. Byzantine fault tolerance addresses a far more adversarial scenario: a faulty node can behave arbitrarily. It can send different messages to different peers, selectively withhold messages, replay old messages, and collude with other faulty nodes to manipulate the outcome. This is the failure model appropriate for blockchains, financial networks, and any system where participants may have an economic incentive to cheat. :PATH The Original Byzantine Generals Result In 1982, Lamport, Shostak, and Pease proved that to tolerate f Byzantine-faulty nodes in a consensus system, you need at least 3f+1 total nodes. The intuition: with only 3f nodes and f faulty, an honest node receives messages from 2f other nodes, of which :PATH Why 3f+1 Is a Hard Lower Bound The proof that 3f+1 is necessary (not just sufficient) shows that for 3f or fewer nodes, it is impossible to guarantee consensus in the presence of f Byzantine nodes. The argument uses a partitioning construction: with 3 groups of f nodes each and one Byz :PATH PBFT: Practical Byzantine Fault Tolerance PBFT (Castro and Liskov, 1999) was the first practical BFT protocol. It operates in a primary-backup model similar to Raft but with three communication phases: pre-prepare (primary sends proposal), prepare (all nodes broadcast that they received the propo :PATH HotStuff: Linear Message Complexity HotStuff, developed at VMware Research and used as the basis for Facebook's LibraBFT and several Tendermint variants, achieves O(n) message complexity per round using threshold signatures. Instead of every node broadcasting to every other node, nodes send :CHECKLIST Byzantine Fault Tolerance in Storage Systems [ ] RAID provides redundancy against drive failures but cannot detect a drive that silently returns wrong data [ ] Tahoe-LAFS uses cryptographic hashes to verify every read from storage nodes [ ] ZFS uses checksums at the filesystem level to detect and correct silent data corruption [ ] Distributed object stores like Ceph verify block integrity using checksums at the placement group level [ ] A Byzantine-resilient storage system must verify integrity cryptographically, not just check availability :INFO Go Deeper: Cryptographic Complexity Byzantine consensus algorithms use cryptographic signatures to authenticate every message — the security of the entire system reduces to the hardness of forging those signatures. But that is a statement about what is computationally feasible, and making it precise requires formal complexity theory. Understanding the P vs NP question and what computational hardness actually means puts all of the cryptographic assumptions seen throughout this series on rigorous footing. :SLATE 1005 :LINK https://pmg.csail.mit.edu/papers/osdi99.pdf Practical Byzantine Fault Tolerance (Castro and Liskov)