- Aug 31, 2023

Martin Larralde authored (17 commits)
- Aug 30, 2023

Martin Larralde authored (12 commits)
- Aug 10, 2023

Dirk Stolle authored

The following updates are performed:

* update actions/cache to v3
* replace the unmaintained actions-rs/toolchain with dtolnay/rust-toolchain
* replace the unmaintained actions-rs/cargo with direct invocation of cargo

Martin Larralde authored (5 commits)
- Aug 09, 2023

Martin Larralde authored

Jubilee authored

MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As most stores are not nontemporal, LLVM uses simple stores when lowering LLVM IR like `atomic store ... release` on x86, itself a lowering of Rust's `AtomicBool::store(.., .., Ordering::Release)`. These facts could allow something like the following code to be emitted:

```asm
vmovntdq [addr], ymmreg
vmovntdq [addr+32], ymmreg
vmovntdq [addr+64], ymmreg
vmovntdq [addr+96], ymmreg
mov byte ptr [flag], 1   ; producer-consumer flag
```

But these stores are NOT ordered with respect to each other! Nontemporal stores induce the CPU to use write-combining buffers. These writes are resolved in bursts instead of all at once, and a write may be further deferred until a serialization point. Even a "yes-temporal" write to any other location will not force the deferred writes to be resolved first. Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve these writes in, for example, this actual order:

```asm
vmovntdq [addr+64], ymmreg
vmovntdq [addr+96], ymmreg
mov byte ptr [flag], 1
vmovntdq [addr+32], ymmreg
vmovntdq [addr], ymmreg
```

This could result in other threads accessing this address after the flag is set, i.e. accessing memory via safe code that was assumed to be correctly synchronized, and observing tearing or other inconsistent program states. The risk grows with the number of writes: more write-combining buffers may begin retiring simultaneously, increasing the chance of them resolving in an unfortunate order.

To guarantee program soundness, code using nontemporal stores must currently use SFENCE at its safety boundary, unless and until LLVM decides this combination of facts should be considered a miscompilation and motivation to choose lowerings that do not require an explicit SFENCE. Even `unsafe fn`s must explicitly pass this invariant on to their callers!

The SSE/AVX implementation functions contain their entire loop, so this problem can simply be closed over with appropriately placed SFENCEs.
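To illustrate the pattern described above, here is a minimal Rust sketch (hypothetical names, x86_64 only, not code from this repository): a loop of streaming AVX stores is drained with `_mm_sfence()` before a release store publishes the data to other threads.

```rust
// Minimal sketch (x86_64 only; hypothetical names, not code from this repository).
use std::arch::x86_64::{__m256i, _mm256_set1_epi8, _mm256_stream_si256, _mm_sfence};
use std::sync::atomic::{AtomicBool, Ordering};

/// Fill `len` bytes at `dst` using nontemporal AVX stores.
///
/// Safety: AVX must be available, `dst` must be 32-byte aligned, and `len`
/// must be a multiple of 32.
#[target_feature(enable = "avx")]
unsafe fn fill_nontemporal(dst: *mut u8, len: usize, byte: u8) {
    let value: __m256i = _mm256_set1_epi8(byte as i8);
    let mut offset = 0;
    while offset < len {
        // Lowers to vmovntdq: the store bypasses the cache and sits in a
        // write-combining buffer until it is drained.
        _mm256_stream_si256(dst.add(offset) as *mut __m256i, value);
        offset += 32;
    }
    // Drain the write-combining buffers before returning, so every streaming
    // store above is globally visible before anything the caller does next.
    _mm_sfence();
}

/// Publish `buffer` to consumers that spin on `ready`.
fn publish(buffer: &mut [u8], ready: &AtomicBool) {
    // Safety: assumes AVX support was checked (e.g. with
    // `is_x86_feature_detected!("avx")`) and that `buffer` meets the alignment
    // and length requirements documented above.
    unsafe { fill_nontemporal(buffer.as_mut_ptr(), buffer.len(), 0xFF) };
    // Only sound because fill_nontemporal ends with SFENCE; the release store
    // alone does not order the nontemporal stores before the flag.
    ready.store(true, Ordering::Release);
}
```

Keeping the fence inside the same function that issues the streaming stores mirrors the approach the commit message describes for the SSE/AVX loops: the ordering requirement is discharged at the safety boundary rather than being pushed onto every caller.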
- Aug 07, 2023

Martin Larralde authored (2 commits)