API
Functions
ChipSort.bitonic_merge — Method.bitonic_merge(input_a::Vec{N,T}, input_b::Vec{N,T}) where {N,T}Merges two SIMD.Vec objects of the same type and size using a bitonic sort network. The inputs are assumed to be sorted. Returns a pair of vectors with the first and second halves of the merged sequence.
ChipSort.bitonic_merge_interleaved — Method.bitonic_merge_interleaved(output_a, output_b, input_a, input_b, Val(N))Merges K pairs of vectors using a bitonic sort network, with interleaved execution. The inputs are assumed to be sorted. Inputs and outputs are acessed directly in memory.
ChipSort.chipsort! — Method.Regular Comb sort followed by Insertion sort when the interval becomes 1.
ChipSort.chipsort_medium! — Method.chipsort_medium!(data, Val(V), Val(J), Val(K))Sort a medium array in-place. The data is divided between K blocks of J SIMD vectors of size V. The array should typically fit inside lower levels of cache memory (L1 or L2), for instance 8192 Int32 values on an conventional desktop computer. Our recipe is:
- Generate small sorted vectors.
- Use vectorized Comb sort.
- Transpose in-place.
- Finish with insertion sort over the nearly sorted array.
Examples
julia> chipsort_medium!(Int32[1:2^13...] .* 0xf0ca .%0x10000, Val(8), Val(8), Val(64))'
1×8192 LinearAlgebra.Adjoint{Int32,Array{Int32,1}}:
30 32 34 36 38 70 72 74 76 108 110 112 114 116 … 65488 65490 65492 65494 65496 65528 65530 65532 65534ChipSort.chipsort_small! — Method.chipsort_small!(data, Val(V), Val(J))Sort a small array using a vectorized sorting network and bitonic merge networks. The initial sort is over J SIMD vectors of size V. The array should typically fit inside register memory, for instance 64 Int32 values on an AVX2 machine.
Examples
julia> chipsort_small!(Int32[1:16...] .* Int32(1729) .%0x100, Val(4), Val(4))'
1×16 LinearAlgebra.Adjoint{Int32,Array{Int32,1}}:
4 8 12 16 67 71 75 79 130 134 138 142 193 197 201 205ChipSort.combsort! — Method.Regular version of Comb sort.
ChipSort.insertion_sort! — Method.Insertion sort, adapted from the Julia codebase.
ChipSort.merge_vecs — Method.merge_vecs(input...)This brave function merges 4, 8 vectors or even more. Relies on the great ability from SIMD.jl and LLVM to handle large vectors. Performance not yet tested.
ChipSort.sort_vecs! — Method.Sort blocks from the input array, optionally transposing to generate sorted sequences of size V (default), or further merging those into sequences of size V*L.
ChipSort.transpose! — Method.In-place transpose of a 3 dimensional array VKJ into VJK.
ChipSort.transpose_vecs — Method.transpose_vecs(input::Vararg{Vec{N,T}, L}) where {L,N,T}Transposes a matrix of L vectors of size N into N vectors of size L. Sizes should be powers of 2.
ChipSort.find_transpose_cycles — Method.Find the cycle seeds to transpose a matrix with K rows and J columns into J rows and K columns.
ChipSort.first_stream — Method.Returns the stream with the smallest first element. If the two streams are empty, returns nothing.