API
Functions
ChipSort.bitonic_merge
— Method.bitonic_merge(input_a::Vec{N,T}, input_b::Vec{N,T}) where {N,T}
Merges two SIMD.Vec
objects of the same type and size using a bitonic sort network. The inputs are assumed to be sorted. Returns a pair of vectors with the first and second halves of the merged sequence.
ChipSort.bitonic_merge_interleaved
— Method.bitonic_merge_interleaved(output_a, output_b, input_a, input_b, Val(N))
Merges K pairs of vectors using a bitonic sort network, with interleaved execution. The inputs are assumed to be sorted. Inputs and outputs are acessed directly in memory.
ChipSort.chipsort!
— Method.Regular Comb sort followed by Insertion sort when the interval becomes 1.
ChipSort.chipsort_medium!
— Method.chipsort_medium!(data, Val(V), Val(J), Val(K))
Sort a medium array in-place. The data is divided between K
blocks of J
SIMD vectors of size V
. The array should typically fit inside lower levels of cache memory (L1 or L2), for instance 8192 Int32 values on an conventional desktop computer. Our recipe is:
- Generate small sorted vectors.
- Use vectorized Comb sort.
- Transpose in-place.
- Finish with insertion sort over the nearly sorted array.
Examples
julia> chipsort_medium!(Int32[1:2^13...] .* 0xf0ca .%0x10000, Val(8), Val(8), Val(64))'
1×8192 LinearAlgebra.Adjoint{Int32,Array{Int32,1}}:
30 32 34 36 38 70 72 74 76 108 110 112 114 116 … 65488 65490 65492 65494 65496 65528 65530 65532 65534
ChipSort.chipsort_small!
— Method.chipsort_small!(data, Val(V), Val(J))
Sort a small array using a vectorized sorting network and bitonic merge networks. The initial sort is over J
SIMD vectors of size V
. The array should typically fit inside register memory, for instance 64 Int32 values on an AVX2 machine.
Examples
julia> chipsort_small!(Int32[1:16...] .* Int32(1729) .%0x100, Val(4), Val(4))'
1×16 LinearAlgebra.Adjoint{Int32,Array{Int32,1}}:
4 8 12 16 67 71 75 79 130 134 138 142 193 197 201 205
ChipSort.combsort!
— Method.Regular version of Comb sort.
ChipSort.insertion_sort!
— Method.Insertion sort, adapted from the Julia codebase.
ChipSort.merge_vecs
— Method.merge_vecs(input...)
This brave function merges 4, 8 vectors or even more. Relies on the great ability from SIMD.jl and LLVM to handle large vectors. Performance not yet tested.
ChipSort.sort_vecs!
— Method.Sort blocks from the input array, optionally transposing to generate sorted sequences of size V (default), or further merging those into sequences of size V*L.
ChipSort.transpose!
— Method.In-place transpose of a 3 dimensional array VKJ into VJK.
ChipSort.transpose_vecs
— Method.transpose_vecs(input::Vararg{Vec{N,T}, L}) where {L,N,T}
Transposes a matrix of L vectors of size N into N vectors of size L. Sizes should be powers of 2.
ChipSort.find_transpose_cycles
— Method.Find the cycle seeds to transpose a matrix with K rows and J columns into J rows and K columns.
ChipSort.first_stream
— Method.Returns the stream with the smallest first element. If the two streams are empty, returns nothing
.