Related Products:
VPF2 - PowerPC MPC8641D, Dual User Programmable Virtex-5 FPGA VXS Digital Signal Processor
Phoenix VPF1 - Dual PowerPC, dual User Programmable Virtex-II Pro FPGA VXS DSP Board
Phoenix 3CPF1 - PowerPC 744X and User Programmable Virtex-II Pro FPGA 3U DSP Board

VSIPLus DSP Libraries VSIPL, CSIPL and VECLIB Libraries


  • Portable: As the VSIPL and CSIPL libraries adhere to the Core VSIPL standard, code portability to other systems is greatly enhanced.
  • High Performance: Vendor-optimized implementations often perform better than ad-hoc handwritten code.
  • Enhanced Productivity: Code is easier to read. Skills learned on one project are applicable to others. Greatly reduces the use of assembly code.

A set of highly optimized libraries for PowerPC 7447 and 7447A CPUs that includes a VSIPL implementation of the Core Profile functionality, a standard C implementation of the same functionality, and a large, systematic set of vector operations.

Introduction

To be of value to developers, DSP libraries must use processor resources efficiently to deliver the fastest possible execution times, yet they must also be easy to use. For many years, the VSIPL (Vector, Signal and Image Processing Library) standard has provided an application programming interface (API) that simplifies development by hiding many implementation details, which has made it widely used in DSP applications.

For some applications it is preferable to use a standard C API or other facilities which are not defined in the VSIPL standard. The VSIPLus libraries meet these requirements by providing: 

VSIPL - an optimized implementation conforming to the VSIPL standard

CSIPL - a library with the same functionality as VSIPL, but using a standard C API

VECLIB - a large set of optimized, low-level vector routines providing functionality beyond the VSIPL standard.

Together, the VSIPLus libraries provide highly optimized implementations of a broad range of vector processing, matrix, FFT, standard filter and windowing functions. The libraries are easy to use and provide developers with the processing performance needed in modern applications.


Ease of Use

The package includes both Development and Production versions of the libraries. The Development versions include full parameter and error checking to assist in tracking down programming bugs. The Production versions omit this checking for greater efficiency.
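The difference between the two versions can be pictured with a minimal C sketch. It assumes a compile-time switch; the names SKETCH_DEVELOPMENT, SKETCH_CHECK and sketch_vadd are hypothetical and are not taken from the VSIPLus libraries, which ship as separate Development and Production builds whose internals are not described here.

```c
#include <stddef.h>

/* Hypothetical status codes, for illustration only. */
enum { SKETCH_OK = 0, SKETCH_BAD_ARG = 1 };

/* Compile with -DSKETCH_DEVELOPMENT to enable argument checking.
 * Without it, the check evaluates nothing at run time. */
#ifdef SKETCH_DEVELOPMENT
#define SKETCH_CHECK(cond) do { if (!(cond)) return SKETCH_BAD_ARG; } while (0)
#else
#define SKETCH_CHECK(cond) do { (void)sizeof(cond); } while (0)
#endif

/* r[i] = a[i] + b[i] for i in [0, n). */
int sketch_vadd(const float *a, const float *b, float *r, size_t n)
{
    SKETCH_CHECK(a != NULL && b != NULL && r != NULL);
    for (size_t i = 0; i < n; ++i)
        r[i] = a[i] + b[i];
    return SKETCH_OK;
}
```

With the switch defined, a null pointer is reported through the return value; without it, the same source compiles to the bare loop, which mirrors the checking-versus-speed trade described above.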

No artificial constraints are placed on the data. The libraries handle data management automatically, allowing:

the data to be strided

the data to have any memory alignment

the data to be of any length (not just a multiple of the processor vector length, which is 4 for the PowerPC 74xx)

the data to be of any basic type supported by the processor

complex vector data to be either split or interleaved.
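As an illustration of what strided, arbitrary-length data means for a caller, the following plain C sketch adds two vectors whose elements are spaced by independent strides. The function name sketch_vadd_strided is hypothetical, strides are assumed to be non-negative element counts, and the code is a portable reference loop rather than the optimized library implementation.

```c
#include <stddef.h>

/* r[i * rs] = a[i * as] + b[i * bs] for i in [0, n).
 * Strides are counted in elements, not bytes. Any n is accepted,
 * including lengths that are not a multiple of 4. */
void sketch_vadd_strided(const float *a, size_t as,
                         const float *b, size_t bs,
                         float *r, size_t rs,
                         size_t n)
{
    for (size_t i = 0; i < n; ++i)
        r[i * rs] = a[i * as] + b[i * bs];
}
```

Passing a stride of 2 for an interleaved complex buffer, for instance, selects only the real parts, and starting from the second element of the buffer selects the imaginary parts.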


VSIPL and CSIPL Functionality

The VSIPL and CSIPL libraries provide the functionality specified for the VSIPL Core Profile; there are a total of 517 functions in the Core Profile.

The range of functions supported includes:

elementwise vector operations (e.g. vector add)

vector math functions (e.g. sin, cos)

elementwise matrix operations (e.g. matrix add)

gather operations (e.g. dot product)

matrix operations (e.g. transpose)

matrix-vector operations (e.g. general matrix-vector product, GEMV)

matrix-matrix operations (e.g. general matrix-matrix product, GEMM)

windowing and filter operations (e.g. moving average)

FFT and convolution 


Table 1: VSIPL Supported Data Types

Data Type         Comments
vsip_scalar_vi    Scalar vector index
vsip_scalar_mi    Scalar matrix index (not in Core Lite)
vsip_scalar_bl    Scalar boolean
vsip_scalar_f     32-bit (single precision) float
vsip_cscalar_f    Complex (single precision) float
vsip_scalar_i     32-bit signed integer


VECLIB Functionality
VECLIB provides highly optimized implementations of a systematic set of vector operations on scalar and one or more vector operands. The library includes:

Binary Functions:
Contains all possible combinations of real, complex, scalar or vector arguments. A total of 110 independent functions, 128 functions in all.

Real Ternary Functions:
Contains all possible combinations of real scalar or real vector arguments. A total of 149 independent functions, 336 functions in all.

Complex Ternary Functions:
Contains all possible combinations of complex scalar and complex vector arguments. A total of 1263 independent functions, 3024 functions in all.

Real-vector Quaternary Functions:
Contains all possible combinations of real vector arguments. A total of 62 independent functions, 64 functions in all.

Complex-vector Quaternary Functions:
Contains all possible combinations of complex vector arguments. A total of 805 independent functions, 1024 functions in all. 

In total, the VECLIB library contains 4576 functions of 2, 3 or 4 operands, real or complex, scalar or vector.


Table 2: Number of Routines in Core VSIPL Libraries

Algorithm Area                 Core Functions  Comments
Initialize/finalize            2               Service routines
Block support                  43              Block/matrix memory management
Vector support                 104             Vector memory management
Vector copy                    12              Real and complex
Matrix support                 52              Matrix memory management
Matrix copy                    2               Real and complex
Scalar functions               47              Indices, arithmetic, random numbers
Vector elementwise functions   147             Arithmetic, math (e.g. sin, cos), min/max, fill, gather, scatter, random numbers, ...
1D and 2D FFT functions        24              Complex-complex, real-complex, complex-real, in-place, out-of-place
Window creation                4               Hanning, Blackman, Kaiser, Chebyshev
FIR Filter                     8               Create, filter, destroy, get attributes
Convolution                    4               Create, filter, destroy, get attributes
Correlation                    8               Create, filter, destroy, get attributes
Histogram                      1
Matrix functions               19              Products, transpose, sum, special products
Linear Algebra                 40              LU, Cholesky, QRD, special solvers incl. Toeplitz
TOTAL FUNCTIONS                517


Optimization and Efficiency
Every routine in the libraries has been specifically optimized for the target processor. The implementations:

block the data: the block sizes are tailored to the target processor, typically a multiple of 4 or 8.

unroll block loops: the depth of unrolling is optimized and depends on the operation and the data size.

prefetch blocks, when this is helpful.

re-order and group low-level operations such as fetch, prefetch and arithmetic operations.

implement advanced cache management strategies.

implement strategies which depend upon the data details. For example, aligned and unaligned data are treated separately, as are vectors with stride 1 (contiguous data), stride 2 (typically interleaved complex data) and general strides.

handle "edge effects" (vector or matrix sizes which are not a multiple of the SIMD length) in a transparent but optimal manner.

use a mix of optimized C and assembler modules.
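Blocking, unrolling and edge handling can be sketched in portable C with a dot product. This is an illustration only: the block size of 4 is an assumption matching the four-float vector length mentioned earlier, the name sketch_dot is hypothetical, and the real libraries use processor-specific code rather than this scalar loop.

```c
#include <stddef.h>

#define SKETCH_BLOCK 4  /* assumed block size: four 32-bit floats */

/* Dot product of a[0..n) and b[0..n), unrolled by SKETCH_BLOCK. */
float sketch_dot(const float *a, const float *b, size_t n)
{
    /* Four independent partial sums, one per unrolled lane. */
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t n_blocked = n - (n % SKETCH_BLOCK);
    size_t i = 0;

    /* Main loop: whole blocks only. */
    for (; i < n_blocked; i += SKETCH_BLOCK) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    float sum = (s0 + s1) + (s2 + s3);

    /* Edge handling: the 0 to 3 elements left over after the last block. */
    for (; i < n; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

The caller sees a single function that accepts any length; the split between the blocked loop and the remainder loop stays internal. Note that accumulating in separate partial sums changes the order of floating-point additions, so results can differ in the last bits from a simple sequential loop.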

Efficiency is also improved for complex data routines by the choices made for data representation. Within VSIPL, the representation of complex data is transparent to the user. The VSIPLus VSIPL library uses a split (rather than interleaved) representation internally, and this is reflected in the excellent performance figures.
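The two complex representations can be compared with a short C sketch. In the interleaved layout, real and imaginary parts alternate in one array; in the split layout, they live in two separate arrays, so each loop below reads contiguous runs of like values. The function names are hypothetical and the code is a plain reference version, not the library's internal implementation.

```c
#include <stddef.h>

/* Convert n interleaved complex values (re0, im0, re1, im1, ...)
 * into split form: re[0..n) and im[0..n). */
void sketch_interleaved_to_split(const float *z, float *re, float *im, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        re[i] = z[2 * i];
        im[i] = z[2 * i + 1];
    }
}

/* Elementwise complex multiply on split data: r = a * b.
 * The output arrays must not overlap the input arrays. */
void sketch_cmul_split(const float *are, const float *aim,
                       const float *bre, const float *bim,
                       float *rre, float *rim, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        rre[i] = are[i] * bre[i] - aim[i] * bim[i];
        rim[i] = are[i] * bim[i] + aim[i] * bre[i];
    }
}
```

In split form, every operand in the multiply loop is read with stride 1, whereas the interleaved layout requires stride-2 access or a shuffle to separate real and imaginary parts.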

Last updated: May 19 2008, 06:02PM