  1. Aug 31, 2023
  2. Aug 30, 2023
  3. Aug 10, 2023
  4. Aug 09, 2023
      SFENCE after streaming loops (#4) · 738fe7d7
      Jubilee authored
      MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As
      most stores are not nontemporal, LLVM uses simple stores when lowering
      LLVM IR like `atomic store ... release` on x86, itself a lowering of
      Rust's `AtomicBool::store(.., .., Ordering::Release)`. These facts
      could allow something like the following code to be emitted:
      
      ```asm
      vmovntdq [addr],     ymmreg
      vmovntdq [addr+32],  ymmreg
      vmovntdq [addr+64],  ymmreg
      vmovntdq [addr+96],  ymmreg
      mov byte ptr [flag], 1 ; producer-consumer flag
      ```
      
      But these stores are NOT ordered with respect to each other! Nontemporal
      stores induce the CPU to use write-combining buffers. These writes will
      be resolved in bursts instead of at once, and the write may be further
      deferred until a serialization point. Even a "yes-temporal" write to any
      other location will not force the deferred writes to be resolved first.
      Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve
      these writes in e.g. this actual order:
      
      ```asm
      vmovntdq [addr+64],  ymmreg
      vmovntdq [addr+96],  ymmreg
      mov byte ptr [flag], 1
      vmovntdq [addr+32],  ymmreg
      vmovntdq [addr],     ymmreg
      ```
      
      Another thread that sees the flag set could then read this memory
      through safe code that assumes it is correctly synchronized, and
      observe tearing or other inconsistent program states. The risk grows
      with the number of writes: more writes mean more write buffers that
      may begin retiring simultaneously, and so a greater chance that they
      resolve in an unfortunate order.
      
      To guarantee program soundness, code using nontemporal stores must
      currently use SFENCE in its safety boundary, unless and until LLVM
      decides this combination of facts should be considered a miscompilation
      and motivation to choose lowerings that do not require explicit SFENCE.
      Even `unsafe fn`s must explicitly pass this invariant on to their callers!
      
      The SSE/AVX implementation functions contain their entire loop, so this
      problem can simply be closed over with appropriately placed SFENCEs.
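
      As a rough illustration of the pattern (not the code in this commit),
      the sketch below assumes an x86_64 target, where SSE2 is baseline. The
      names `stream_fill` and `stream_fill_and_publish` are hypothetical;
      only the `std::arch` intrinsics and the atomics API are real.

      ```rust
      use std::arch::x86_64::{__m128i, _mm_set1_epi8, _mm_sfence, _mm_stream_si128};
      use std::sync::atomic::{AtomicBool, Ordering};

      /// Hypothetical helper: fills `dst` with `byte` using nontemporal stores.
      /// The whole streaming loop lives in this function, so the SFENCE can
      /// live here too and callers never see the weakened ordering.
      fn stream_fill(dst: &mut [__m128i], byte: i8) {
          // SAFETY: every pointer is derived from a `&mut __m128i`, so it is
          // valid for writes and 16-byte aligned, as MOVNTDQ requires.
          unsafe {
              let value = _mm_set1_epi8(byte);
              for slot in dst.iter_mut() {
                  _mm_stream_si128(slot as *mut __m128i, value);
              }
              // Order the nontemporal stores before any later store,
              // including a caller's release store to a flag.
              _mm_sfence();
          }
      }

      /// Hypothetical producer: writes the buffer, then publishes the flag.
      fn stream_fill_and_publish(dst: &mut [__m128i], byte: i8, ready: &AtomicBool) {
          stream_fill(dst, byte);
          // On x86 this lowers to a plain `mov`; it is only sound here
          // because `stream_fill` already issued the SFENCE.
          ready.store(true, Ordering::Release);
      }
      ```

      Keeping the fence inside the function that owns the loop means the
      safe wrapper needs no extra documentation; a function that returned
      without fencing would have to state that requirement in its safety
      contract instead.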
  5. Aug 07, 2023