API

API

Functions

bitonic_merge(input_a::Vec{N,T}, input_b::Vec{N,T}) where {N,T}

Merges two SIMD.Vec objects of the same type and size using a bitonic sort network. The inputs are assumed to be sorted. Returns a pair of vectors with the first and second halves of the merged sequence.

source
bitonic_merge_interleaved(output_a, output_b, input_a, input_b, Val(N))

Merges K pairs of vectors using a bitonic sort network, with interleaved execution. The inputs are assumed to be sorted. Inputs and outputs are acessed directly in memory.

source
ChipSort.chipsort!Method.

Regular Comb sort followed by Insertion sort when the interval becomes 1.

source
chipsort_medium!(data, Val(V), Val(J), Val(K))

Sort a medium array in-place. The data is divided between K blocks of J SIMD vectors of size V. The array should typically fit inside lower levels of cache memory (L1 or L2), for instance 8192 Int32 values on an conventional desktop computer. Our recipe is:

  • Generate small sorted vectors.
  • Use vectorized Comb sort.
  • Transpose in-place.
  • Finish with insertion sort over the nearly sorted array.

Examples

julia> chipsort_medium!(Int32[1:2^13...] .* 0xf0ca .%0x10000, Val(8), Val(8), Val(64))'
1×8192 LinearAlgebra.Adjoint{Int32,Array{Int32,1}}:
 30  32  34  36  38  70  72  74  76  108  110  112  114  116  …  65488  65490  65492  65494  65496  65528  65530  65532  65534
source
chipsort_small!(data, Val(V), Val(J))

Sort a small array using a vectorized sorting network and bitonic merge networks. The initial sort is over J SIMD vectors of size V. The array should typically fit inside register memory, for instance 64 Int32 values on an AVX2 machine.

Examples

julia> chipsort_small!(Int32[1:16...] .* Int32(1729) .%0x100, Val(4), Val(4))'
1×16 LinearAlgebra.Adjoint{Int32,Array{Int32,1}}:
 4  8  12  16  67  71  75  79  130  134  138  142  193  197  201  205
source
ChipSort.combsort!Method.

Regular version of Comb sort.

source

Insertion sort, adapted from the Julia codebase.

source
merge_vecs(input...)

This brave function merges 4, 8 vectors or even more. Relies on the great ability from SIMD.jl and LLVM to handle large vectors. Performance not yet tested.

source

Sort blocks from the input array, optionally transposing to generate sorted sequences of size V (default), or further merging those into sequences of size V*L.

source

In-place transpose of a 3 dimensional array VKJ into VJK.

source
transpose_vecs(input::Vararg{Vec{N,T}, L}) where {L,N,T}

Transposes a matrix of L vectors of size N into N vectors of size L. Sizes should be powers of 2.

source

Find the cycle seeds to transpose a matrix with K rows and J columns into J rows and K columns.

source

Returns the stream with the smallest first element. If the two streams are empty, returns nothing.

source

Index