Linux | Windows |
---|---|
Spreads.Native
Spreads' native dependencies and low-level IL methods.
Native compression
Spreads.Native.Compression class exposes methods from Blosc:
- SIMD-optimized shuffle/unshuffle.
- Compression: LZ4, Zstd, Zlib/GZip/Deflate compression/decompression.
Currently works on Windows x64/x86 and Linux x64 (tested on WSL & Docker Ubuntu). Targets
netstandard2.0
.
UnsafeEx
UnsafeEx class contains unsafe IL helper methods that we cannot implement in C#.
Constrained generic calls without constraints
Generic methods ending with Constrained
emit a constrained call to instance methods of known interfaces on instances of a generic type T
without a type constraint where T : IKnownInterface<T>
.
For example, calling the IComparable<T>.CompareTo
method is implemented like this:
.method public hidebysig static int32 CompareToConstrained<T>(!!T& left, !!T& right) cil managed aggressiveinlining
{
.custom instance void System.Runtime.Versioning.NonVersionableAttribute::.ctor() = ( 01 00 00 00 )
.maxstack 8
ldarg.0
ldarg.1
ldobj !!T
constrained. !!T
callvirt instance int32 class [System.Runtime]System.IComparable`1<!!T>::CompareTo(!0)
ret
} // end of method Unsafe::CompareToConstrained
In addition to the IComparable<T>
interface there are IEquatable<T>
and the following custom ones in Spreads
namespace:
public interface IDelta<T>
{
T AddDelta(T delta);
T GetDelta(T other);
}
public interface IInt64Diffable<T> : IComparable<T>
{
T Add(long diff);
long Diff(T other);
}
KeyComparer<T>
The main use case and sample usage is KeyComparer<T>
.
A benchmark shows that the unsafe CompareToConstrained
method and the KeyComparer<T>
that uses it are c.2x faster than the Comparer<T>.Default
when called via the IComparer<T>
interface and are c.1.6x faster when the default comparer is called directly as a class.
ComparerInterfaceAndCachedConstrainedComparer
Case | MOPS | Elapsed | GC0 | GC1 | GC2 | Memory |
---|---|---|---|---|---|---|
Unsafe | 403.23 | 248 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
KeyComparer* | 396.83 | 252 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
Default | 255.75 | 391 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
Interface | 211.42 | 473 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
* KeyComparer<T>
uses the JIT compile-time constant optimization
for known types and falls back to the Unsafe.CompareToConstrained
method for types that implement IComparable<T>
interface.
On .NET 4.6.1 there is no visible difference with and without the special cases: Unsafe.CompareToConstrained
performs as fast as the if (typeof(T) == typeof(Int32)) { ... }
pattern. See the discussion here and
implementation with comments here explaining why
the special cases could be needed on some platforms.
Unsafe methods could only be called on instances of a generic type T
when the type implements a relevant interface. KeyComparer<T>
has a static readonly
field that (in theory) allows to use the same JIT optimization mentioned above:
private static readonly bool IsIComparable = typeof(IComparable<T>).GetTypeInfo().IsAssignableFrom(typeof(T));
public int Compare(T x, T y)
{
...
if (IsIComparable) // JIT compile-time constant
{
return Unsafe.CompareToConstrained(ref x, ref y);
}
...
}
But even if such optimization breaks in this particular case (see the linked discussion) then checking a static bool field is still much cheaper than a virtual call, especially given that its value is constant for the lifetime of a program and branch prediction should be 100% effective.
FastDictionary
Another use case is FastDictionary<TKey,TValue>
that uses unsafe methods via KeyEqualityComparer<T>
,
which is very similar to KeyComparer<T>
above. FastDictionay is a rewrite of S.C.G.Dictionary<TKey,TValue>
that avoids virtual calls
to an equality comparer.
A benchmark for <int,int>
types shows that FastDictionary<int,int>
is c.70% faster than S.C.G.Dictionary<int,int>
:
CompareSCGAndFastDictionaryWithInts
Case | MOPS | Elapsed | GC0 | GC1 | GC2 | Memory |
---|---|---|---|---|---|---|
FastDictionary | 120.48 | 415 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
Dictionary | 71.63 | 698 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
Such implementation is much simpler than one with an additoinal generic parameter for a comparer, as recently discussed in this blog post.
It is also more flexible than constraining TKey
to where TKey : IEquatable<TKey>
and gives the same performance.
Another benchmark with a key as a custom 16-bytes Symbol
struct shows c.50% performance gain:
CompareSCGAndFastDictionaryWithSymbol
Case | MOPS | Elapsed | GC0 | GC1 | GC2 | Memory |
---|---|---|---|---|---|---|
FastDictionary | 63.69 | 157 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
Dictionary | 43.29 | 231 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
Discussion
There was an interesting discussion about intrinsifying EqualityComparer<T>
,
aiming to achieve a similar goal of inlining calls when possible. However, the last comment from a CoreCLR member says that:
I am not aware of anybody working on this right now, so it is pretty unlikely [that it has a chance to appear in .NET Core 2.0].
Later devirtualization support for EqualityComparer<T>.Default
was added.
But there is no support for comparers and especially for custom interfaces. Future versions of .NET Core may have much faster
comparers, but for existing code and platforms Spreads.Native.UnsafeEx
gives the required performance right here and now.
License
MPL 2.0. See the lincese file and third-party licenses.