PLDI'26 SRC Paper #54 Reviews and Comments =========================================================================== Paper #54 Formal Proofs of Bit Hacks in Machine Code Review #54A =========================================================================== Overall merit ------------- 3. Accept Reviewer expertise ------------------ 3. Knowledgeable Technical Feedback ------------------ The paper formalizes a number of *bit hacks*, efficient bit-level implementations of common algorithms, in Picinæ, a Rocq library for reasoning about machine code. This seems well-motivated and potentially useful, and a worthwhile poster. Section 1.1 tries to make the case for "bottom-up verification" as opposed to decompilation followed by verification, but this is a bit of a red-herring: The alternative to verifying machine code isn't decompiling that code to a higher level language, it's verifying the code in the language that it was written in. This paper makes a case for verifying machine code, but it's in section 2.1: Picinæ is a general framework capable of reasoning about generic machine code across a range of platforms, meaning that this verification is likely to be broadly applicable. This is a good reason to use Picinæ, though you might also discuss alternatives, like a general IR (LLVM?) One thing that isn't clear: How many of these algorithms were verified for the first time, and to what extent did verifying one help (directly or indirectly) in verifying another? Data like this would help support the case for this library. Writing Feedback ---------------- This is generally well-presented, but I found a few common issues: Related work shouldn't be focus on "taking down" the existing work: 2.3.1 particularly falls victim to this mentality. What are the approaches taken in [1], [7] and [21], what algorithms do they verify and how is your approach different? The referencing in this paper seems broken, there's a reference to "post-condition 3.1" but no such postcondition is present in the paper. Be careful with your use of \label and \ref (or \cref). I wouldn't refer to equation 4 as a "theorem" or the subsequent equations. A theorem should be self-contained and ideally have a major result of the paper. I'd work on the subsection titles a bit "Compute parity of a word, naïve way" doesn't need the "naive way" in the title and I'd probably say "Computing". In the list of bit hacks, you jump between "compute"/"detect" and "counting"/"determining". Be consistent in tense. Review #54B =========================================================================== Overall merit ------------- 2. Unsure Reviewer expertise ------------------ 3. Knowledgeable Technical Feedback ------------------ This paper concerns the verification of Hacker's Delight-style bit twiddling operations at the machine code level. 14 bit hacks are verified using the Picnae framework in Rocq. The paper places a lot of emphasis on the verification of these operations at the machine code level, but I would expect that the main proof work is in the "pure" lemmas about the fixed-width types (as illustrated in 3.1) with the link to machine code coming almost for free from the proof obligations generated by Picnae. These pure lemmas seem relatively simple and I would imagine that a fair number of them already exist in the standard library or in widely-used developments like CompCert. If this isn't the case, describing trends of limitations/omissions in the ecosystem would strengthen the paper. 3.2 introduces a complication as the bit hack involves a loop, so a loop invariant also needs to be provided. My understanding is that most bit hacks avoid control flow and are essentially pure expressions, but a clearer focus on examples which do contain control flow or nontrivial stateful update would help motivate the unique challenges of the project, and why the extra machinery with Picnae is a notable research contribution above and beyond the pure lemmas about fixed-width types. It is briefly mentioned in 2.2 that three of the bit hacks involve while loops. My main question would be - did any other bit hacks beyond these three use control flow constructs or interesting program state, so that Picnae+associated VCG machinery can't simply convert them to pure expressions? Writing Feedback ---------------- The submission is written well on a technical level and is clearly understandable. The main weakness is that there isn't sufficient motivation that the verification problems have unique qualities or challenges. Review #54C =========================================================================== Overall merit ------------- 2. Unsure Reviewer expertise ------------------ 1. No familiarity Technical Feedback ------------------ **Paper summary:** The work proves 14 "bit hacks" in Rocq using the Picinae library. Two of the proofs are described in detail. The paper reviews related work on correctness proofs for bit hacks and more generally the problem of verifying machine code. It also discusses the motivation for verifying machine code, particularly cases where higher-level source code isn't available. It discusses future work on integrating Picinae with SMT solvers for further proof automation. **Comments:** I found sections 1.1 and 2.1 helpful, but I didn't find 2.2 and 2.3 particuarly informative and I feel that space could be used better. I think it might be good to shorten those sections substantially or integrate them into section 3. For example, the part about ARM8 strlen around line 145 isn't helpful because the point seems to be to demonstrate that bit hacks are difficult to prove, but I don't think it explains enough about the example. It doesn't tell me what x is nor what strlen is computing. I think if the goal of that part is to demonstrate the complexity of bit hacks, it would help to explain the example further. But I actually think the difficulty would be better demonstrated by spending more time on the case studies. I found them hard to follow even with the existing explanation in section 3.1, for example. I think one thing that is missing is a more thorough explanation of how exactly Listing 2 connects to Listing 1 (and the same goes for the second example's listings on the next page). The prose starting at line 145 doesn't explain it. Perhaps you could consider a short description of how the assembly implements the C code. **Questions:** It was unclear to me how the Picinae library relates to the broader goal of machine code verification. It seems the specification of the properties one wants to verify, at least those in the case study, are obtained from existing higher-level source code, but you don't want to assume that right? Writing Feedback ---------------- I do think the paper presents enough context for a PL audience to understand the problem. But as I said above, I think the length of Section 2, particularly sections 2.2 and 2.3, is at the expense of more detailed explanation of the proofs in Section 3. It would be interesting to know a bit more about how much effort was involved in these proofs, e.g. how many lines of code? Presentation details: Line 32: What is meant by "utility functions"? Line 152: I didn't see any "post-condition 3.1". Review #54D =========================================================================== Overall merit ------------- 3. Accept Reviewer expertise ------------------ 2. Some familiarity Technical Feedback ------------------ *Summary:* This paper formalizes 14 bit hacks, i.e. tricks to compute special integer functions by exploiting the bit representation of the inputs. They are mechanized at the machine code level within the Picinæ framework in Rocq. Each bit hack is implemented in C code, which is compiled with `gcc -O3` and transformed to the Picinæ intermediate language via Ghidra. Part of the contribution is a Rocq library for properties of binary arithmetic, which is used in the proofs. *Comments for the students:* * Proving 14 bit hacks correct is a solid result and it is nice to see that some of the properties and tricks build on top of each other. * To give a better sense of the scale of the project, it would be interesting to hear how many lines of code each of the bit hacks takes and how big the bit arithmetic library is. * More quantitative results would be helpful in general: it would be good to see some data on reusability. How many of the hacks are reused in other hacks? How many of the lemmas are reused? This could strengthen the reusability claim. * The motivation in Section 1 could be strengthened: the Ariane 5 and Patriot missile examples sound unrelated to bit hacks. On the other hand, you mention the previous work on strlen [10] where a majority of the proof code dealt with bit hacks, and some other pieces of related work. I think those can provide a stronger motivation for the paper. * If there's space (maybe in a future submission), it would be nice to see a sample Rocq mechanization of one example (possibly abridged/simplified). Minor comments: * Listing 3 has the same introductory sentence as Listing 1 (copy-paste error?). * "unpractical" -> "impractical" *Questions for the students:* * I would imagine that gcc is aware of many of these tricks. So could you clarify why `-O3` doesn't defeat the point to some extent? Won't it try to rewrite inefficient bit manipulations to trickier, more efficient versions? How did you have to control for this, e.g. avoiding the naive version being optimized? * What are the advantages of your approach compared to the related work? You mention that some previous work used SMT solvers, which presumably reduces manual verification effort. You mention integration with SMT solvers under Future Work, but as it stands, the advantages of your method could be clarified. Writing Feedback ---------------- *Context:* The submission provides sufficient context in the Introduction and Related Work sections. The trade-offs between the different approaches could be clarified to some extent, in particular for the AWS projects cited. *Technical Presentation:* The technical presentation is solid. It would be nice to show a simple Rocq lemma from the mechanization to give a flavor of the approach. *Evaluation:* The evaluation is mostly example-driven (two case studies). Some of the claims could be better supported quantitatively, such as the reusability claim (see above comment).