NMSIS-NN
Version 1.3.1
NMSIS NN Software Library
Collection of fully-connected and matrix multiplication functions.

Modules
    GetBufferSizeFC

Functions
riscv_nmsis_nn_status riscv_fully_connected_mat_q7_vec_q15 (const q15_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q15_t *pOut, q15_t *vec_buffer)
    Mixed Q15-Q7 fully-connected layer function.

riscv_nmsis_nn_status riscv_fully_connected_mat_q7_vec_q15_opt (const q15_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q15_t *pOut, q15_t *vec_buffer)
    Mixed Q15-Q7 opt fully-connected layer function.

riscv_nmsis_nn_status riscv_fully_connected_q15 (const q15_t *pV, const q15_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q15_t *bias, q15_t *pOut, q15_t *vec_buffer)
    Q15 basic fully-connected layer function.

riscv_nmsis_nn_status riscv_fully_connected_q15_opt (const q15_t *pV, const q15_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q15_t *bias, q15_t *pOut, q15_t *vec_buffer)
    Q15 opt fully-connected layer function.

riscv_nmsis_nn_status riscv_fully_connected_q7 (const q7_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q7_t *pOut, q15_t *vec_buffer)
    Q7 basic fully-connected layer function.

riscv_nmsis_nn_status riscv_fully_connected_q7_opt (const q7_t *pV, const q7_t *pM, const uint16_t dim_vec, const uint16_t num_of_rows, const uint16_t bias_shift, const uint16_t out_shift, const q7_t *bias, q7_t *pOut, q15_t *vec_buffer)
    Q7 opt fully-connected layer function.

riscv_nmsis_nn_status riscv_fully_connected_s16 (const nmsis_nn_context *ctx, const nmsis_nn_fc_params *fc_params, const nmsis_nn_per_tensor_quant_params *quant_params, const nmsis_nn_dims *input_dims, const int16_t *input, const nmsis_nn_dims *filter_dims, const int8_t *kernel, const nmsis_nn_dims *bias_dims, const int64_t *bias, const nmsis_nn_dims *output_dims, int16_t *output)
    Basic s16 Fully Connected function.

riscv_nmsis_nn_status riscv_fully_connected_s4 (const nmsis_nn_context *ctx, const nmsis_nn_fc_params *fc_params, const nmsis_nn_per_tensor_quant_params *quant_params, const nmsis_nn_dims *input_dims, const int8_t *input, const nmsis_nn_dims *filter_dims, const int8_t *kernel, const nmsis_nn_dims *bias_dims, const int32_t *bias, const nmsis_nn_dims *output_dims, int8_t *output)
    Basic s4 Fully Connected function.

riscv_nmsis_nn_status riscv_fully_connected_s8 (const nmsis_nn_context *ctx, const nmsis_nn_fc_params *fc_params, const nmsis_nn_per_tensor_quant_params *quant_params, const nmsis_nn_dims *input_dims, const int8_t *input, const nmsis_nn_dims *filter_dims, const int8_t *kernel, const nmsis_nn_dims *bias_dims, const int32_t *bias, const nmsis_nn_dims *output_dims, int8_t *output)
    Basic s8 Fully Connected function.

riscv_nmsis_nn_status riscv_vector_sum_s8 (int32_t *vector_sum_buf, const int32_t vector_cols, const int32_t vector_rows, const int8_t *vector_data, const int32_t lhs_offset, const int32_t *bias_data)
    Calculate the sum of each row in vector_data, multiply by lhs_offset and optionally add s32 bias_data.

riscv_nmsis_nn_status riscv_vector_sum_s8_s64 (int64_t *vector_sum_buf, const int32_t vector_cols, const int32_t vector_rows, const int8_t *vector_data, const int32_t lhs_offset, const int64_t *bias_data)
    Calculate the sum of each row in vector_data, multiply by lhs_offset and optionally add s64 bias_data.
Collection of fully-connected and matrix multiplication functions.
A fully-connected layer is basically a matrix-vector multiplication with a bias. The matrix holds the weights and the input/output vectors are the activation values. Supported {weight, activation} precisions include {8-bit, 8-bit} and {8-bit, 16-bit}.
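For the legacy fixed-point functions documented below, the computation can be summarized by the following reference sketch. It is illustrative only, assuming a plain row-major weight matrix and the bias_shift/out_shift handling described in the parameter lists; the helper names are not part of the library, and the optimized kernels may round and saturate differently.

    #include <stdint.h>

    /* Illustrative Q7 fully-connected reference (not the library kernel). */
    static int8_t saturate_q7(int32_t v)
    {
        if (v > 127)  return 127;
        if (v < -128) return -128;
        return (int8_t)v;
    }

    void fully_connected_q7_ref(const int8_t *pV, const int8_t *pM,
                                uint16_t dim_vec, uint16_t num_of_rows,
                                uint16_t bias_shift, uint16_t out_shift,
                                const int8_t *bias, int8_t *pOut)
    {
        for (uint16_t row = 0; row < num_of_rows; row++) {
            /* Bias is left-shifted into the accumulator scale. */
            int32_t acc = ((int32_t)bias[row]) << bias_shift;
            for (uint16_t col = 0; col < dim_vec; col++) {
                acc += (int32_t)pV[col] * (int32_t)pM[row * dim_vec + col];
            }
            /* Result is right-shifted back and saturated to Q7. */
            pOut[row] = saturate_q7(acc >> out_shift);
        }
    }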
riscv_nmsis_nn_status riscv_fully_connected_mat_q7_vec_q15(
    const q15_t *pV,
    const q7_t *pM,
    const uint16_t dim_vec,
    const uint16_t num_of_rows,
    const uint16_t bias_shift,
    const uint16_t out_shift,
    const q7_t *bias,
    q15_t *pOut,
    q15_t *vec_buffer)
Mixed Q15-Q7 fully-connected layer function.
Parameters
    [in]      pV            pointer to input vector
    [in]      pM            pointer to matrix weights
    [in]      dim_vec       length of the vector
    [in]      num_of_rows   number of rows in weight matrix
    [in]      bias_shift    amount of left-shift for bias
    [in]      out_shift     amount of right-shift for output
    [in]      bias          pointer to bias
    [in,out]  pOut          pointer to output vector
    [in,out]  vec_buffer    pointer to buffer space for input
Returns
    RISCV_NMSIS_NN_SUCCESS
Buffer size:
vec_buffer size: 0
Q7_Q15 version of the fully-connected layer. Weights are in q7_t and activations are in q15_t.
riscv_nmsis_nn_status riscv_fully_connected_mat_q7_vec_q15_opt(
    const q15_t *pV,
    const q7_t *pM,
    const uint16_t dim_vec,
    const uint16_t num_of_rows,
    const uint16_t bias_shift,
    const uint16_t out_shift,
    const q7_t *bias,
    q15_t *pOut,
    q15_t *vec_buffer)
Mixed Q15-Q7 opt fully-connected layer function.
Parameters
    [in]      pV            pointer to input vector
    [in]      pM            pointer to matrix weights
    [in]      dim_vec       length of the vector
    [in]      num_of_rows   number of rows in weight matrix
    [in]      bias_shift    amount of left-shift for bias
    [in]      out_shift     amount of right-shift for output
    [in]      bias          pointer to bias
    [in,out]  pOut          pointer to output vector
    [in,out]  vec_buffer    pointer to buffer space for input
Returns
    RISCV_NMSIS_NN_SUCCESS
Buffer size:
vec_buffer size: 0
Q7_Q15 version of the fully-connected layer. Weights are in q7_t and activations are in q15_t.
Limitation: the x4 version requires weight reordering to work.
Here we use only one pointer to read 4 rows in the weight matrix. So if the original q7_t matrix looks like this:
| a11 | a12 | a13 | a14 | a15 | a16 | a17 |
| a21 | a22 | a23 | a24 | a25 | a26 | a27 |
| a31 | a32 | a33 | a34 | a35 | a36 | a37 |
| a41 | a42 | a43 | a44 | a45 | a46 | a47 |
| a51 | a52 | a53 | a54 | a55 | a56 | a57 |
| a61 | a62 | a63 | a64 | a65 | a66 | a67 |
We operate on multiples of 4 rows, so the first four rows become:
| a11 | a21 | a12 | a22 | a31 | a41 | a32 | a42 |
| a13 | a23 | a14 | a24 | a33 | a43 | a34 | a44 |
| a15 | a25 | a16 | a26 | a35 | a45 | a36 | a46 |
The left-over column stays in order, which is: | a17 | a27 | a37 | a47 |
For the left-over rows, we do a 1x1 computation, so the data remains in its original order.
So the stored weight matrix looks like this:
| a11 | a21 | a12 | a22 | a31 | a41 |
| a32 | a42 | a13 | a23 | a14 | a24 |
| a33 | a43 | a34 | a44 | a15 | a25 |
| a16 | a26 | a35 | a45 | a36 | a46 |
| a17 | a27 | a37 | a47 | a51 | a52 |
| a53 | a54 | a55 | a56 | a57 | a61 |
| a62 | a63 | a64 | a65 | a66 | a67 |
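As a sketch of the reordering described above, the following hypothetical helper produces the interleaved layout shown. It is not a library API; the function simply expects pM to already be stored in this order, typically prepared offline.

    #include <stdint.h>

    /* Illustrative offline reordering for riscv_fully_connected_mat_q7_vec_q15_opt.
     * src is the original row-major weight matrix, dst the interleaved copy. */
    void reorder_weights_mat_q7_vec_q15_opt(const int8_t *src, int8_t *dst,
                                            uint16_t num_of_rows, uint16_t dim_vec)
    {
        uint16_t r, c;
        for (r = 0; r + 4 <= num_of_rows; r += 4) {
            /* Column pairs: rows (1,2) first, then rows (3,4). */
            for (c = 0; c + 2 <= dim_vec; c += 2) {
                *dst++ = src[(r + 0) * dim_vec + c];
                *dst++ = src[(r + 1) * dim_vec + c];
                *dst++ = src[(r + 0) * dim_vec + c + 1];
                *dst++ = src[(r + 1) * dim_vec + c + 1];
                *dst++ = src[(r + 2) * dim_vec + c];
                *dst++ = src[(r + 3) * dim_vec + c];
                *dst++ = src[(r + 2) * dim_vec + c + 1];
                *dst++ = src[(r + 3) * dim_vec + c + 1];
            }
            if (dim_vec & 1) {                    /* left-over column stays in order */
                for (uint16_t i = 0; i < 4; i++)
                    *dst++ = src[(r + i) * dim_vec + dim_vec - 1];
            }
        }
        for (; r < num_of_rows; r++)              /* left-over rows keep original order */
            for (c = 0; c < dim_vec; c++)
                *dst++ = src[r * dim_vec + c];
    }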
riscv_nmsis_nn_status riscv_fully_connected_q15(
    const q15_t *pV,
    const q15_t *pM,
    const uint16_t dim_vec,
    const uint16_t num_of_rows,
    const uint16_t bias_shift,
    const uint16_t out_shift,
    const q15_t *bias,
    q15_t *pOut,
    q15_t *vec_buffer)
Q15 basic fully-connected layer function.
Parameters
    [in]      pV            pointer to input vector
    [in]      pM            pointer to matrix weights
    [in]      dim_vec       length of the vector
    [in]      num_of_rows   number of rows in weight matrix
    [in]      bias_shift    amount of left-shift for bias
    [in]      out_shift     amount of right-shift for output
    [in]      bias          pointer to bias
    [in,out]  pOut          pointer to output vector
    [in,out]  vec_buffer    pointer to buffer space for input
Returns
    RISCV_NMSIS_NN_SUCCESS
Buffer size:
vec_buffer size: 0
riscv_nmsis_nn_status riscv_fully_connected_q15_opt(
    const q15_t *pV,
    const q15_t *pM,
    const uint16_t dim_vec,
    const uint16_t num_of_rows,
    const uint16_t bias_shift,
    const uint16_t out_shift,
    const q15_t *bias,
    q15_t *pOut,
    q15_t *vec_buffer)
Q15 opt fully-connected layer function.
Parameters
    [in]      pV            pointer to input vector
    [in]      pM            pointer to matrix weights
    [in]      dim_vec       length of the vector
    [in]      num_of_rows   number of rows in weight matrix
    [in]      bias_shift    amount of left-shift for bias
    [in]      out_shift     amount of right-shift for output
    [in]      bias          pointer to bias
    [in,out]  pOut          pointer to output vector
    [in,out]  vec_buffer    pointer to buffer space for input
Returns
    RISCV_NMSIS_NN_SUCCESS
Buffer size:
vec_buffer size: 0
Here we use only one pointer to read 4 rows in the weight matrix. So if the original matrix looks like this:
| a11 | a12 | a13 |
| a21 | a22 | a23 |
| a31 | a32 | a33 |
| a41 | a42 | a43 |
| a51 | a52 | a53 |
| a61 | a62 | a63 |
We operate on multiples of 4 rows, so the first four rows become:
| a11 | a12 | a21 | a22 | a31 | a32 | a41 | a42 |
| a13 | a23 | a33 | a43 |
The remaining rows are kept in their original order.
So the stored weight matrix looks like this:
| a11 | a12 | a21 | a22 | a31 | a32 | a41 | a42 |
| a13 | a23 | a33 | a43 | a51 | a52 | a53 | a61 |
| a62 | a63 |
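A similar sketch for the Q15 opt layout described above; the helper is hypothetical and not a library API, the function only expects pM to already be stored in this order.

    #include <stdint.h>

    /* Illustrative offline reordering for riscv_fully_connected_q15_opt. */
    void reorder_weights_q15_opt(const int16_t *src, int16_t *dst,
                                 uint16_t num_of_rows, uint16_t dim_vec)
    {
        uint16_t r, c;
        for (r = 0; r + 4 <= num_of_rows; r += 4) {
            for (c = 0; c + 2 <= dim_vec; c += 2) {
                for (uint16_t i = 0; i < 4; i++) {   /* each of the 4 rows contributes its column pair */
                    *dst++ = src[(r + i) * dim_vec + c];
                    *dst++ = src[(r + i) * dim_vec + c + 1];
                }
            }
            if (dim_vec & 1) {                       /* odd left-over column, in order */
                for (uint16_t i = 0; i < 4; i++)
                    *dst++ = src[(r + i) * dim_vec + dim_vec - 1];
            }
        }
        for (; r < num_of_rows; r++)                 /* remaining rows keep original order */
            for (c = 0; c < dim_vec; c++)
                *dst++ = src[r * dim_vec + c];
    }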
riscv_nmsis_nn_status riscv_fully_connected_q7(
    const q7_t *pV,
    const q7_t *pM,
    const uint16_t dim_vec,
    const uint16_t num_of_rows,
    const uint16_t bias_shift,
    const uint16_t out_shift,
    const q7_t *bias,
    q7_t *pOut,
    q15_t *vec_buffer)
Q7 basic fully-connected layer function.
Parameters
    [in]      pV            pointer to input vector
    [in]      pM            pointer to matrix weights
    [in]      dim_vec       length of the vector
    [in]      num_of_rows   number of rows in weight matrix
    [in]      bias_shift    amount of left-shift for bias
    [in]      out_shift     amount of right-shift for output
    [in]      bias          pointer to bias
    [in,out]  pOut          pointer to output vector
    [in,out]  vec_buffer    pointer to buffer space for input
Returns
    RISCV_NMSIS_NN_SUCCESS
Buffer size:
vec_buffer size: dim_vec
This basic function is designed to work with a regular weight matrix without interleaving.
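A minimal usage sketch, assuming a hypothetical 64-input, 10-output layer. The shift values, weight contents, and the riscv_nnfunctions.h header name are placeholders/assumptions; vec_buffer provides dim_vec q15_t elements as noted above.

    #include "riscv_nnfunctions.h"

    #define DIM_VEC      64   /* hypothetical input length  */
    #define NUM_OF_ROWS  10   /* hypothetical output length */

    static const q7_t weights[NUM_OF_ROWS * DIM_VEC] = { 0 /* trained weights */ };
    static const q7_t bias[NUM_OF_ROWS] = { 0 /* trained bias */ };

    static q7_t  input[DIM_VEC];
    static q7_t  output[NUM_OF_ROWS];
    static q15_t scratch[DIM_VEC];    /* vec_buffer: dim_vec q15_t entries */

    void run_fc_q7(void)
    {
        riscv_nmsis_nn_status status =
            riscv_fully_connected_q7(input, weights, DIM_VEC, NUM_OF_ROWS,
                                     0 /* bias_shift */, 7 /* out_shift */,
                                     bias, output, scratch);
        (void)status;
    }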
riscv_nmsis_nn_status riscv_fully_connected_q7_opt(
    const q7_t *pV,
    const q7_t *pM,
    const uint16_t dim_vec,
    const uint16_t num_of_rows,
    const uint16_t bias_shift,
    const uint16_t out_shift,
    const q7_t *bias,
    q7_t *pOut,
    q15_t *vec_buffer)
Q7 opt fully-connected layer function.
Parameters
    [in]      pV            pointer to input vector
    [in]      pM            pointer to matrix weights
    [in]      dim_vec       length of the vector
    [in]      num_of_rows   number of rows in weight matrix
    [in]      bias_shift    amount of left-shift for bias
    [in]      out_shift     amount of right-shift for output
    [in]      bias          pointer to bias
    [in,out]  pOut          pointer to output vector
    [in,out]  vec_buffer    pointer to buffer space for input
Returns
    RISCV_NMSIS_NN_SUCCESS
Buffer size:
vec_buffer size: dim_vec
This opt function is designed to work with an interleaved weight matrix. The vector input is assumed to be in q7_t format; we call the riscv_q7_to_q15_no_shift_shuffle function to expand it into q15_t format with a certain re-ordering (refer to that function's comments for more details). Here we use only one pointer to read 4 rows of the weight matrix. So if the original q7_t matrix looks like this:
| a11 | a12 | a13 | a14 | a15 | a16 | a17 |
| a21 | a22 | a23 | a24 | a25 | a26 | a27 |
| a31 | a32 | a33 | a34 | a35 | a36 | a37 |
| a41 | a42 | a43 | a44 | a45 | a46 | a47 |
| a51 | a52 | a53 | a54 | a55 | a56 | a57 |
| a61 | a62 | a63 | a64 | a65 | a66 | a67 |
We operate on multiples of 4 rows, so the first four rows become:
| a11 | a21 | a13 | a23 | a31 | a41 | a33 | a43 |
| a12 | a22 | a14 | a24 | a32 | a42 | a34 | a44 |
| a15 | a25 | a35 | a45 | a16 | a26 | a36 | a46 |
So within the kernel, we first read the re-ordered vector in as:
| b1 | b3 | and | b2 | b4 |
The four q31_t weight reads will look like:
| a11 | a13 |, | a21 | a23 |, | a31 | a33 |, | a41 | a43 |
| a12 | a14 |, | a22 | a24 |, | a32 | a34 |, | a42 | a44 |
The left-over column stays in order, which is:
| a17 | a27 | a37 | a47 |
For the left-over rows, we do a 1x1 computation, so the data remains in its original order.
So the stored weight matrix looks like this:
| a11 | a21 | a13 | a23 | a31 | a41 |
| a33 | a43 | a12 | a22 | a14 | a24 |
| a32 | a42 | a34 | a44 | a15 | a25 |
| a35 | a45 | a16 | a26 | a36 | a46 |
| a17 | a27 | a37 | a47 | a51 | a52 |
| a53 | a54 | a55 | a56 | a57 | a61 |
| a62 | a63 | a64 | a65 | a66 | a67 |
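Following the description above, a hypothetical offline helper producing this interleaved layout could look like the sketch below. It is not a library API; weights are normally reordered ahead of time, e.g. by a conversion script, and the function only expects pM to already be stored in this order.

    #include <stdint.h>

    /* Illustrative offline reordering for riscv_fully_connected_q7_opt. */
    void reorder_weights_q7_opt(const int8_t *src, int8_t *dst,
                                uint16_t num_of_rows, uint16_t dim_vec)
    {
        uint16_t r, c;
        for (r = 0; r + 4 <= num_of_rows; r += 4) {
            /* Groups of 4 columns: even columns of the group first, then odd columns. */
            for (c = 0; c + 4 <= dim_vec; c += 4) {
                for (uint16_t odd = 0; odd < 2; odd++) {
                    for (uint16_t rp = 0; rp < 4; rp += 2) {   /* row pairs (1,2) then (3,4) */
                        *dst++ = src[(r + rp)     * dim_vec + c + odd];
                        *dst++ = src[(r + rp + 1) * dim_vec + c + odd];
                        *dst++ = src[(r + rp)     * dim_vec + c + odd + 2];
                        *dst++ = src[(r + rp + 1) * dim_vec + c + odd + 2];
                    }
                }
            }
            if (dim_vec - c >= 2) {            /* left-over column pair: one column at a time */
                for (uint16_t k = 0; k < 2; k++)
                    for (uint16_t i = 0; i < 4; i++)
                        *dst++ = src[(r + i) * dim_vec + c + k];
                c += 2;
            }
            if (c < dim_vec) {                 /* single left-over column, in order */
                for (uint16_t i = 0; i < 4; i++)
                    *dst++ = src[(r + i) * dim_vec + c];
            }
        }
        for (; r < num_of_rows; r++)           /* left-over rows keep original order */
            for (c = 0; c < dim_vec; c++)
                *dst++ = src[r * dim_vec + c];
    }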
riscv_nmsis_nn_status riscv_fully_connected_s16(
    const nmsis_nn_context *ctx,
    const nmsis_nn_fc_params *fc_params,
    const nmsis_nn_per_tensor_quant_params *quant_params,
    const nmsis_nn_dims *input_dims,
    const int16_t *input_data,
    const nmsis_nn_dims *filter_dims,
    const int8_t *filter_data,
    const nmsis_nn_dims *bias_dims,
    const int64_t *bias_data,
    const nmsis_nn_dims *output_dims,
    int16_t *output_data)
Basic s16 Fully Connected function.
Parameters
    [in,out]  ctx           Function context (e.g. temporary buffer). Check the function definition file to see if an additional buffer is required. The optional function {API}_get_buffer_size() provides the buffer size if an additional buffer is required. The caller is expected to clear the buffer, if applicable, for security reasons.
    [in]      fc_params     Fully Connected layer parameters.
                            fc_params->input_offset  : 0
                            fc_params->filter_offset : 0
                            fc_params->output_offset : 0
    [in]      quant_params  Per-tensor quantization info. It contains the multiplier and shift values to be applied to the output tensor.
    [in]      input_dims    Input (activation) tensor dimensions. Format: [N, H, W, C_IN]. Input dimension is taken as Nx(H * W * C_IN).
    [in]      input_data    Input (activation) data pointer. Data type: int16
    [in]      filter_dims   Two dimensional filter dimensions. Format: [N, C]
                            N : accumulation depth, equals (H * W * C_IN) from input_dims
                            C : output depth, equals C_OUT in output_dims
                            H & W : Not used
    [in]      filter_data   Filter data pointer. Data type: int8
    [in]      bias_dims     Bias tensor dimensions. Format: [C_OUT]. N, H, W : Not used
    [in]      bias_data     Bias data pointer. Data type: int64
    [in]      output_dims   Output tensor dimensions. Format: [N, C_OUT]
                            N : Batches
                            C_OUT : Output depth
                            H & W : Not used
    [in,out]  output_data   Output data pointer. Data type: int16
Returns
    RISCV_NMSIS_NN_SUCCESS
riscv_nmsis_nn_status riscv_fully_connected_s4(
    const nmsis_nn_context *ctx,
    const nmsis_nn_fc_params *fc_params,
    const nmsis_nn_per_tensor_quant_params *quant_params,
    const nmsis_nn_dims *input_dims,
    const int8_t *input_data,
    const nmsis_nn_dims *filter_dims,
    const int8_t *filter_data,
    const nmsis_nn_dims *bias_dims,
    const int32_t *bias_data,
    const nmsis_nn_dims *output_dims,
    int8_t *output_data)
Basic s4 Fully Connected function.
Parameters
    [in,out]  ctx           Function context (e.g. temporary buffer). Check the function definition file to see if an additional buffer is required. The optional function {API}_get_buffer_size() provides the buffer size if an additional buffer is required. The caller is expected to clear the buffer, if applicable, for security reasons.
    [in]      fc_params     Fully Connected layer parameters.
                            Range of fc_params->input_offset  : [-127, 128]
                            fc_params->filter_offset          : 0
                            Range of fc_params->output_offset : [-128, 127]
    [in]      quant_params  Per-tensor quantization info. It contains the multiplier and shift values to be applied to the output tensor.
    [in]      input_dims    Input (activation) tensor dimensions. Format: [N, H, W, C_IN]. Input dimension is taken as Nx(H * W * C_IN).
    [in]      input_data    Input (activation) data pointer. Data type: int8
    [in]      filter_dims   Two dimensional filter dimensions. Format: [N, C]
                            N : accumulation depth, equals (H * W * C_IN) from input_dims
                            C : output depth, equals C_OUT in output_dims
                            H & W : Not used
    [in]      filter_data   Filter data pointer. Data type: int8_t packed 4-bit weights, e.g. four sequential weights [0x1, 0x2, 0x3, 0x4] packed as [0x21, 0x43] (see the packing sketch below).
    [in]      bias_dims     Bias tensor dimensions. Format: [C_OUT]. N, H, W : Not used
    [in]      bias_data     Bias data pointer. Data type: int32
    [in]      output_dims   Output tensor dimensions. Format: [N, C_OUT]
                            N : Batches
                            C_OUT : Output depth
                            H & W : Not used
    [in,out]  output_data   Output data pointer. Data type: int8
Returns
    RISCV_NMSIS_NN_SUCCESS
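A small sketch of the packing convention above (low nibble first). The helper name is illustrative, and it assumes an even number of weights already quantized to the signed 4-bit range.

    #include <stdint.h>

    /* Pack pairs of 4-bit weights into bytes, low nibble first, so that
     * [0x1, 0x2, 0x3, 0x4] becomes [0x21, 0x43]. */
    void pack_int4_weights(const int8_t *unpacked, int8_t *packed, int32_t count)
    {
        for (int32_t i = 0; i < count / 2; i++) {
            packed[i] = (int8_t)((unpacked[2 * i] & 0x0F) |
                                 ((unpacked[2 * i + 1] & 0x0F) << 4));
        }
    }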
riscv_nmsis_nn_status riscv_fully_connected_s8(
    const nmsis_nn_context *ctx,
    const nmsis_nn_fc_params *fc_params,
    const nmsis_nn_per_tensor_quant_params *quant_params,
    const nmsis_nn_dims *input_dims,
    const int8_t *input_data,
    const nmsis_nn_dims *filter_dims,
    const int8_t *filter_data,
    const nmsis_nn_dims *bias_dims,
    const int32_t *bias_data,
    const nmsis_nn_dims *output_dims,
    int8_t *output_data)
Basic s8 Fully Connected function.
Parameters
    [in,out]  ctx           Function context (e.g. temporary buffer). Check the function definition file to see if an additional buffer is required. The optional function {API}_get_buffer_size() provides the buffer size if an additional buffer is required. The caller is expected to clear the buffer, if applicable, for security reasons.
    [in]      fc_params     Fully Connected layer parameters.
                            Range of fc_params->input_offset  : [-127, 128]
                            fc_params->filter_offset          : 0
                            Range of fc_params->output_offset : [-128, 127]
    [in]      quant_params  Per-tensor quantization info. It contains the multiplier and shift values to be applied to the output tensor.
    [in]      input_dims    Input (activation) tensor dimensions. Format: [N, H, W, C_IN]. Input dimension is taken as Nx(H * W * C_IN).
    [in]      input_data    Input (activation) data pointer. Data type: int8
    [in]      filter_dims   Two dimensional filter dimensions. Format: [N, C]
                            N : accumulation depth, equals (H * W * C_IN) from input_dims
                            C : output depth, equals C_OUT in output_dims
                            H & W : Not used
    [in]      filter_data   Filter data pointer. Data type: int8
    [in]      bias_dims     Bias tensor dimensions. Format: [C_OUT]. N, H, W : Not used
    [in]      bias_data     Bias data pointer. Data type: int32
    [in]      output_dims   Output tensor dimensions. Format: [N, C_OUT]
                            N : Batches
                            C_OUT : Output depth
                            H & W : Not used
    [in,out]  output_data   Output data pointer. Data type: int8
Returns
    RISCV_NMSIS_NN_ARG_ERROR if argument constraints fail, or RISCV_NMSIS_NN_SUCCESS on successful completion.
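A hedged usage sketch for the s8 variant. It assumes the nmsis_nn_* struct fields mirror their CMSIS-NN counterparts (buf/size, n/h/w/c, input_offset/filter_offset/output_offset/activation, multiplier/shift), that the GetBufferSizeFC module provides riscv_fully_connected_s8_get_buffer_size(), and that riscv_nnfunctions.h is the public header; all dimensions and quantization values are placeholders.

    #include "riscv_nnfunctions.h"

    /* Hypothetical 1x64 -> 1x10 fully-connected layer; all values are placeholders. */
    static int8_t  input[64], output[10];
    static int8_t  kernel[64 * 10];     /* quantized weights */
    static int32_t bias[10];
    static int8_t  scratch[256];        /* must be >= the reported buffer size */

    riscv_nmsis_nn_status run_fc_s8(void)
    {
        const nmsis_nn_dims input_dims  = { .n = 1,  .h = 1, .w = 1, .c = 64 };
        const nmsis_nn_dims filter_dims = { .n = 64, .h = 1, .w = 1, .c = 10 };
        const nmsis_nn_dims bias_dims   = { .n = 1,  .h = 1, .w = 1, .c = 10 };
        const nmsis_nn_dims output_dims = { .n = 1,  .h = 1, .w = 1, .c = 10 };

        const nmsis_nn_fc_params fc_params = {
            .input_offset  = 5,     /* placeholder, range [-127, 128] */
            .filter_offset = 0,     /* must be 0, see above           */
            .output_offset = -3,    /* placeholder, range [-128, 127] */
            .activation    = { .min = -128, .max = 127 },
        };
        const nmsis_nn_per_tensor_quant_params quant_params = {
            .multiplier = 1073741824,   /* placeholder output multiplier */
            .shift      = -3,           /* placeholder output shift      */
        };

        /* Optional scratch buffer, sized via the GetBufferSizeFC helper. */
        nmsis_nn_context ctx;
        ctx.buf  = scratch;
        ctx.size = riscv_fully_connected_s8_get_buffer_size(&filter_dims);

        return riscv_fully_connected_s8(&ctx, &fc_params, &quant_params,
                                        &input_dims, input, &filter_dims, kernel,
                                        &bias_dims, bias, &output_dims, output);
    }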
riscv_nmsis_nn_status riscv_vector_sum_s8(
    int32_t *vector_sum_buf,
    const int32_t vector_cols,
    const int32_t vector_rows,
    const int8_t *vector_data,
    const int32_t lhs_offset,
    const int32_t *bias_data)
Calculate the sum of each row in vector_data, multiply by lhs_offset and optionally add s32 bias_data.
Parameters
    [in,out]  vector_sum_buf  Buffer for vector sums
    [in]      vector_cols     Number of vector columns
    [in]      vector_rows     Number of vector rows
    [in]      vector_data     Vector of weights data
    [in]      lhs_offset      Constant multiplied with each sum
    [in]      bias_data       Vector of bias data, optionally added to each sum
Returns
    RISCV_NMSIS_NN_SUCCESS - Successful operation
riscv_nmsis_nn_status riscv_vector_sum_s8_s64(
    int64_t *vector_sum_buf,
    const int32_t vector_cols,
    const int32_t vector_rows,
    const int8_t *vector_data,
    const int32_t lhs_offset,
    const int64_t *bias_data)
Calculate the sum of each row in vector_data, multiply by lhs_offset and optionally add s64 bias_data.
Parameters
    [in,out]  vector_sum_buf  Buffer for vector sums
    [in]      vector_cols     Number of vector columns
    [in]      vector_rows     Number of vector rows
    [in]      vector_data     Vector of weights data
    [in]      lhs_offset      Constant multiplied with each sum
    [in]      bias_data       Vector of bias data, optionally added to each sum
Returns
    RISCV_NMSIS_NN_SUCCESS - Successful operation
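For reference, the computation performed by both vector-sum variants can be summarized by the sketch below. It is illustrative only; the s64 variant differs only in the accumulator and bias width.

    #include <stdint.h>
    #include <stddef.h>

    /* Illustrative reference for riscv_vector_sum_s8: per-row sum of the weights,
     * scaled by lhs_offset, with the bias added when bias_data is provided. */
    void vector_sum_s8_ref(int32_t *vector_sum_buf, int32_t vector_cols,
                           int32_t vector_rows, const int8_t *vector_data,
                           int32_t lhs_offset, const int32_t *bias_data)
    {
        for (int32_t row = 0; row < vector_rows; row++) {
            int32_t sum = 0;
            for (int32_t col = 0; col < vector_cols; col++) {
                sum += vector_data[row * vector_cols + col];
            }
            sum *= lhs_offset;
            if (bias_data != NULL) {
                sum += bias_data[row];
            }
            vector_sum_buf[row] = sum;
        }
    }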