Commits · 5d1915841362f387d7427d0d322a062ad2a88a11 · Martin Larralde / LightMotif

Aug 31, 2023
- Add copy protocol to `EncodedSequence` and `StripedSequence` · 5d191584
  Martin Larralde authored 1 year ago
- Add buffer protocol for `EncodedSequence` and `StripedSequence` · a9727b9f
  Martin Larralde authored 1 year ago
- Add support for indexing an `EncodedSequence` · 26c650f5
  Martin Larralde authored 1 year ago
- Implement the buffer protocol for Python `StripedSequence` · 6dc317ee
  Martin Larralde authored 1 year ago
- Update `Encode` trait to return an `EncodedSequence` · 453bdc99
  Martin Larralde authored 1 year ago
- Add a `PartialEq` implementation for `EncodedSequence` · 5aab90ca
  Martin Larralde authored 1 year ago
- Require `Symbol` implementors to be `Eq` · 1858af0e
  Martin Larralde authored 1 year ago
- Add some convenience methods and traits to `TfmPvalue` · a8b8afd3
  Martin Larralde authored 1 year ago
- Make `TfmPvalue` generic over the reference to the `ScoringMatrix` · 9086f036
  Martin Larralde authored 1 year ago
- Avoid initialization when allocating new buffer in `EncodedSequence::encode` · 4d975710
  Martin Larralde authored 1 year ago
- Update Python wrapper to use the dynamic dispatch pipeline · 6830cdb6
  Martin Larralde authored 1 year ago
- Update example in `README.md` to use the dynamic dispatch pipeline · bb00dc69
  Martin Larralde authored 1 year ago
- Fix bug in `Encode` implementation for `Pipeline<_, Dispatch>` · 88b6ad22
  Martin Larralde authored 1 year ago
- Update tests and benchmarks to use the dispatched pipeline · 7eb22b0c
  Martin Larralde authored 1 year ago
- Add a dynamic dispatched pipeline backend to `lightmotif` · 228979ec
  Martin Larralde authored 1 year ago
- Fix name of helper methods for `Stripe` trait · ed2bbb6e
  Martin Larralde authored 1 year ago
- Fix broken tests in `lightmotif::seq` module · cf3f397d
  Martin Larralde authored 1 year ago
Aug 30, 2023
- Add AVX2 implementation of the `Stripe` trait using 32x32 matrix transpose · 9f0c4162
  Martin Larralde authored 1 year ago
- Add tests for the `Transpose` trait implementations · c7bdb341
  Martin Larralde authored 1 year ago
- Make `StripedSequence::new` return an error when given an invalid length · 8d763605
  Martin Larralde authored 1 year ago
- Add new pipeline trait to stripe an encoded sequence · e26e11ef
  Martin Larralde authored 1 year ago
- Add convenience methods to `StripedSequence` · 001da6dd
  Martin Larralde authored 1 year ago
- Fix `Debug` for `DenseMatrix` now to render the padding bytes · 4645a543
  Martin Larralde authored 1 year ago
- Add method to fill all a `DenseMatrix` with a given value · 49267733
  Martin Larralde authored 1 year ago
- Improve performance of AVX2 and NEON `Encode` by removing non-const functions in loop · 51bc3fc8
  Martin Larralde authored 1 year ago
- Add `Alphabet::as_str` method to get all symbols from an alphabet · 7e8bdf38
  Martin Larralde authored 1 year ago
- Add multiplexed NEON implementation of `Encode` for NEON · 53da5b92
  Martin Larralde authored 1 year ago
- Add NEON implementation of the `Threshold` operation · 935ac549
  Martin Larralde authored 1 year ago
- Add NEON benchmarks to `lightmotif` benches · 1aef6b77
  Martin Larralde authored 1 year ago
Aug 10, 2023
- Release v0.4.0 · 5cf94f5d
  Martin Larralde authored 1 year ago
  Tag: v0.4.0
- Update GitHub Actions CI (#3) · 2d0fa41f
  Dirk Stolle authored 1 year ago
  
  The following updates are performed:
  * update actions/cache to v3
  * replace unmaintained actions-rs/toolchain by dtolnay/rust-toolchain
  * replace unmaintained actions-rs/cargo by direct invocation of cargo
- Update code example in `README.md` · 71368843
  Martin Larralde authored 1 year ago
- Add `max` and `argmax` methods to the Python `StripedScores` interface · 4fd9360b
  Martin Larralde authored 1 year ago
- Add `StripedScores.is_empty` method · 6d65a9fa
  Martin Larralde authored 1 year ago
- Rename `BestPosition` trait to `Maximum` to prepare for fast maximum score extraction · 09c99260
  Martin Larralde authored 1 year ago
- Update `BestPosition` to return first position on equal maxima · 67829490
  Martin Larralde authored 1 year ago
Aug 09, 2023

- Fix `Score` causing an overflow on sequences shorter than PSSM · 24f941e7
  Martin Larralde authored 1 year ago

- SFENCE after streaming loops (#4) · 738fe7d7
  Jubilee authored 1 year ago

MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As
most stores are not nontemporal, LLVM uses simple stores when lowering
LLVM IR like `atomic store ... release` on x86, itself a lowering of
Rust's `AtomicBool::store(.., .., Ordering::Release)`. These facts
could allow something like the following code to be emitted:

```asm
vmovntdq [addr],     ymmreg
vmovntdq [addr+32],  ymmreg
vmovntdq [addr+64],  ymmreg
vmovntdq [addr+96],  ymmreg
mov byte ptr [flag], 1 ; producer-consumer flag
```

But these stores are NOT ordered with respect to each other! Nontemporal
stores induce the CPU to use write-combining buffers. These writes will
be resolved in bursts instead of at once, and the write may be further
deferred until a serialization point. Even a "yes-temporal" write to any
other location will not force the deferred writes to be resolved first.
Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve
these writes in e.g. this actual order:

```asm
vmovntdq [addr+64],  ymmreg
vmovntdq [addr+96],  ymmreg
mov byte ptr [flag], 1
vmovntdq [addr+32],  ymmreg
vmovntdq [addr],     ymmreg
```

This could e.g. result in other threads accessing this address after the
flag is set, thus accessing memory via safe code that was assumed to be
correctly synchronized. This could result in observing tearing or other
inconsistent program states, especially as the number of writes, thus
the number of write buffers that may begin retiring simultaneously,
thus the chance of them resolving in an unfortunate order, increases.

To guarantee program soundness, code using nontemporal stores must
currently use SFENCE in its safety boundary, unless and until LLVM
decides this combination of facts should be considered a miscompilation
and motivation to choose lowerings that do not require explicit SFENCE.
Even `unsafe fn` must explicitly pass this invariant to their callers!

The SSE/AVX implementation functions contain their entire loop, so this
problem can simply be closed over with appropriately placed SFENCEs.
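
As an illustration of the pattern described above, here is a minimal Rust sketch. It is not the LightMotif implementation: the function name `fill_streaming` is hypothetical, and the 128-bit `_mm_stream_si128` intrinsic stands in for the AVX `vmovntdq` stores shown earlier. The point it shows is that the whole streaming loop lives inside one function, so a single `_mm_sfence` before returning keeps the nontemporal stores ordered ahead of any later store made by the caller, such as a release store to a flag.

```rust
/// Fill `dst` with `value`. On x86-64 the aligned middle of the slice is
/// written with nontemporal (streaming) stores, followed by an SFENCE so the
/// stores cannot be observed after a later ordinary store from this thread.
pub fn fill_streaming(dst: &mut [u8], value: u8) {
    #[cfg(target_arch = "x86_64")]
    {
        use core::arch::x86_64::{__m128i, _mm_set1_epi8, _mm_sfence, _mm_stream_si128};

        // SAFETY: SSE2 is part of the x86-64 baseline, every bit pattern is a
        // valid `__m128i`, and `align_to_mut` only yields 16-byte-aligned,
        // in-bounds slots in `body`.
        unsafe {
            let (head, body, tail) = dst.align_to_mut::<__m128i>();
            // Unaligned prefix: ordinary stores.
            head.fill(value);
            let v = _mm_set1_epi8(value as i8);
            for slot in body.iter_mut() {
                // Nontemporal store; goes through write-combining buffers.
                _mm_stream_si128(slot as *mut __m128i, v);
            }
            // Drain the write-combining buffers before the function returns,
            // so callers do not have to know that streaming stores were used.
            _mm_sfence();
            // Unaligned suffix: ordinary stores.
            tail.fill(value);
        }
    }
    // Other architectures: plain stores, no fence needed for this sketch.
    #[cfg(not(target_arch = "x86_64"))]
    dst.fill(value);
}
```

A caller can then follow `fill_streaming(&mut buf, x)` with `flag.store(true, Ordering::Release)` without adding its own fence, since the fence is already inside the safety boundary of the function.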


Aug 07, 2023
- Add an AVX2 implementation of `Encode` with benchmarks · 1906c5fc
  Martin Larralde authored 1 year ago
- Add pipeline trait to perform sequence encoding with SIMD · 46baa25a
  Martin Larralde authored 1 year ago