Templates for efficient gemm kernels.
#include "common.h"
#include "vector_intrin.h"
Classes

class MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >
    Matrix multiplication template for architectures with SSE2 or higher and compilers that support C++ intrinsics for access to SSE instructions.
struct MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Loop< T_loop_index, T_end >
struct MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Loop< T_end, T_end >
class MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Pack< T_rows, T_cols, T_ordering_kernel, T_repetitions >
    Class template for packing of matrix elements prior to matrix-matrix multiply.
struct MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Pack< T_rows, T_cols, T_ordering_kernel, T_repetitions >::Assign_to_packed< T_ordering_matrix >
struct MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Pack< T_rows, T_cols, T_ordering_kernel, T_repetitions >::Extract_from_packed< T_ordering_matrix >
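
The pair Loop< T_loop_index, T_end > and Loop< T_end, T_end > has the shape of a compile-time loop written as template recursion, where the second form is the terminating specialization. The members of these structs are not documented here, so the following is only a minimal sketch of that general pattern. The names Unrolled_loop, Dot_body and dot_unrolled are hypothetical and are not part of this file.

```cpp
// Hypothetical illustration, not the library's Loop struct: a compile-time
// loop that calls body(i) for i = T_index, ..., T_end - 1.
template <int T_index, int T_end>
struct Unrolled_loop {
  template <typename T_body>
  static inline void run(T_body& body) {
    body(T_index);                                 // current iteration
    Unrolled_loop<T_index + 1, T_end>::run(body);  // recurse to the next one
  }
};

// Terminating specialization: the index has reached the end, so do nothing.
template <int T_end>
struct Unrolled_loop<T_end, T_end> {
  template <typename T_body>
  static inline void run(T_body&) {}
};

// Example loop body (assumed for illustration): accumulates a dot product.
struct Dot_body {
  const double* x;
  const double* y;
  double sum;
  void operator()(int i) { sum += x[i] * y[i]; }
};

// Dot product of two arrays whose length T_n is known at compile time.
template <int T_n>
double dot_unrolled(const double* x, const double* y) {
  Dot_body body = {x, y, 0.0};
  Unrolled_loop<0, T_n>::run(body);
  return body.sum;
}
```

Because the trip count is a template parameter, the compiler can inline every iteration. That is presumably why a gemm kernel with fixed block sizes such as T_M, T_N and T_K would use this pattern.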
Templates for efficient gemm kernels on architectures with SSE2 or higher.
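
To make "gemm kernel for SSE2" concrete, here is a rough sketch of a fixed-size inner kernel written with SSE2 intrinsics. It is not the interface of MM_kernel_inner_sse2_A. The function name gemm_kernel_sketch, the column-major layout, the restriction to double precision, and the use of unaligned loads on unpacked operands are all assumptions made for illustration. A scalar fallback is included so that the sketch also compiles where SSE2 is unavailable.

```cpp
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics (__m128d, _mm_mul_pd, ...)
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

// Hypothetical sketch: computes C += A * B for column-major double matrices
// with compile-time sizes. A is T_M x T_K, B is T_K x T_N, C is T_M x T_N.
// T_M must be even so each column splits into whole 2-double registers.
template <int T_M, int T_N, int T_K>
void gemm_kernel_sketch(const double* A, const double* B, double* C) {
  static_assert(T_M % 2 == 0, "T_M must be a multiple of 2");
#if SKETCH_HAVE_SSE2
  for (int j = 0; j < T_N; ++j) {
    for (int i = 0; i < T_M; i += 2) {
      // Accumulator holds C(i, j) and C(i+1, j).
      __m128d acc = _mm_loadu_pd(&C[i + j * T_M]);
      for (int k = 0; k < T_K; ++k) {
        __m128d a = _mm_loadu_pd(&A[i + k * T_M]);  // A(i, k), A(i+1, k)
        __m128d b = _mm_set1_pd(B[k + j * T_K]);    // broadcast B(k, j)
        acc = _mm_add_pd(acc, _mm_mul_pd(a, b));
      }
      _mm_storeu_pd(&C[i + j * T_M], acc);
    }
  }
#else
  // Scalar fallback with the same result, used when SSE2 is not available.
  for (int j = 0; j < T_N; ++j)
    for (int k = 0; k < T_K; ++k)
      for (int i = 0; i < T_M; ++i)
        C[i + j * T_M] += A[i + k * T_M] * B[k + j * T_K];
#endif
}
```

A call such as gemm_kernel_sketch<2, 2, 2>(A, B, C) updates a 2 x 2 block in place. A production kernel would typically read from packed, aligned buffers and unroll the loops at compile time, which appears to be the role of the Pack and Loop templates listed above.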