cpl - cross platform library

The cross-platform library (cpl) is a high-performance C++ library aimed at developing audio applications. It contains many basic functions for common tasks on different systems, along with a good deal of signal processing and math. The vision of the library is to make it easier to develop things like audio plugins. It is self-contained and includes an extended graphics and widget system built on JUCE, tailored for audio plugins.

For now, it is a huge work in progress and functions mainly as the common codebase across my projects. As such, the whole library probably shouldn't be used in more serious projects. It does, however, contain certain interesting subprojects (some of which are explained here), and the project is open-source.

Features

This list is ever-growing and may not be up to date; check the repository for the newest commits.

Widget system

Screenshot of UI widgets

The image shows a selection of the available widgets. Many of them are based on the widgets in JUCE, but they all inherit a polymorphic design (CBaseControl) that enables:

Included widgets are currently sliders/knobs (as colour, value or list controls), combo boxes, transform widgets, text and image buttons, and preset widgets. Widgets are drawn either as vector graphics or, for images, from SVG.

Parameterized SIMD math library

Most SIMD libraries either revolve around specializing a specific type (like 128-bit float vectors), provide a wrapper type with built-in operators, or both. The second approach depends heavily on compiler optimizations that may not even be available today. The first lacks generality, and supporting accelerated SIMD operations in common applications is non-trivial: it depends on hardware, OS and CPU support. If you want to take advantage of newer instruction sets, you often have to target your whole application at that platform, so your end product is only compatible with a subset of current consumer systems.

The approach in the SIMD math library instead depends on runtime path selection of parameterized generic functions. This allows support of everything from single scalars to N-vectors, so long as your algorithm is easily parallelizable. The SIMD library contains a set of template metaprogramming helpers that make this system possible, and its API is designed around it. Here's a common method for rotating a set of coordinates:

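#include <cmath>
#include <cstddef>
#include <vector>

using namespace std;

// Rotates every point (x[i], y[i]) around the origin by the given angle.
void matrix_rotate(vector<float> & x, vector<float> & y, size_t size, float radians)
{
	// Loop-invariant, so computed once up front.
	const float sine = sin(radians);
	const float cosine = cos(radians);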

	for (size_t i = 0; i < size; ++i)
	{
		float real = x[i], imag = y[i], temp;

		temp = real * cosine - imag * sine;
		imag = real * sine + imag * cosine;
		real = temp;

		x[i] = real;
		y[i] = imag;
	}
}

Adding a little generality to the mix, the method can be rewritten pretty easily to this:

using namespace cpl::simd;

// V is the processing type: a scalar (float, double) or a SIMD vector type.
template<typename V, typename Vector>
void vmatrix_rotate(Vector & x, Vector & y, size_t size, typename scalar_of<V>::type radians)
{
	// Broadcast the angle to every element of V.
	const V rads = set1<V>(radians);
	const V sine = sin(rads);
	const V cosine = cos(rads);

	auto px = &x[0];
	auto py = &y[0];

	// Each iteration processes elements_of<V>::value elements at once.
	for (size_t i = 0; i < size; i += elements_of<V>::value)
	{
		V real = load<V>(px + i), imag = load<V>(py + i), temp;

		temp = real * cosine - imag * sine;
		imag = real * sine + imag * cosine;
		real = temp;

		store(px + i, real);
		store(py + i, imag);
	}
}

The win here is that the method is now type-agnostic, type-safe and generic. The only thing potentially missing is a scalar remainder loop, needed if you're not in control of your buffers' sizes. Currently, SIMD architectures support only 32- and 64-bit IEEE floating-point formats, but that may change; this piece of code doesn't have to. You can call it with nearly every type of standard memory container:

// Plain arrays are indexable, and extent yields their element count.
double x[] = { 0.2, 0.3, 0.1 };
double y[] = { 0.3, 0.2, 0.1 };

vmatrix_rotate<double>(x, y, extent<decltype(x)>::value, M_PI);

vector<float> x2(1024), y2(1024);

vmatrix_rotate<float>(x2, y2, x2.size(), 3.14f);
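
The scalar remainder loop mentioned earlier could look roughly like the following. This is a minimal, self-contained sketch in plain C++, not cpl code: the names rotate_point and rotate_with_tail and the block width N are made up for the illustration, and the inner fixed-width loop merely stands in for the load/compute/store on a vector type V.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cpl): rotates a single point in place.
inline void rotate_point(float & real, float & imag, float sine, float cosine)
{
	const float temp = real * cosine - imag * sine;
	imag = real * sine + imag * cosine;
	real = temp;
}

// N plays the role of elements_of<V>::value in vmatrix_rotate.
template<std::size_t N>
void rotate_with_tail(std::vector<float> & x, std::vector<float> & y, float radians)
{
	static_assert(N > 0, "block width must be positive");
	assert(x.size() == y.size());

	const float sine = std::sin(radians);
	const float cosine = std::cos(radians);
	const std::size_t size = x.size();
	const std::size_t blocked = size - size % N; // largest multiple of N that fits

	std::size_t i = 0;

	// Main loop: whole blocks of N elements (the part a vector type would handle).
	for (; i < blocked; i += N)
		for (std::size_t k = 0; k < N; ++k)
			rotate_point(x[i + k], y[i + k], sine, cosine);

	// Remainder loop: the last size % N elements, one at a time.
	for (; i < size; ++i)
		rotate_point(x[i], y[i], sine, cosine);
}
```

With buffers whose size is a multiple of N the remainder loop never runs, so padding your own buffers remains the cheaper option when you control them.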

But of course, the idea is to support run-time scalability. With something like this as your entry point, the code selects the best path based on CPU support and thus takes maximum advantage of the available instruction set. The same binary can still run on systems with no SIMD support, with no penalty!

switch (max_vector_capacity<float>())
{
	case 8:
		vmatrix_rotate<vector_of<float, 8>::type>(x, y, x.size(), rads);
		break;
	case 4:
		vmatrix_rotate<vector_of<float, 4>::type>(x, y, x.size(), rads);
		break;
	default:
		vmatrix_rotate<float>(x, y, x.size(), rads);
		break;
}
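
One possible way to implement the detection step, and to keep the switch out of the hot path, is to resolve the path once and store a function pointer. The sketch below is not cpl code: detect_float_lanes is a hypothetical stand-in for max_vector_capacity<float>() that relies on the GCC/Clang builtin __builtin_cpu_supports (x86 only, falling back to 1 elsewhere), and rotate_blocks<N> is a plain C++ placeholder for the vmatrix_rotate instantiations.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for max_vector_capacity<float>():
// 8 with AVX, 4 with SSE2, otherwise 1 (scalar).
inline std::size_t detect_float_lanes()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx"))
		return 8;
	if (__builtin_cpu_supports("sse2"))
		return 4;
#endif
	return 1; // unknown compiler or CPU: take the scalar path
}

// Placeholder kernel: processes the buffers in blocks of up to N elements.
template<std::size_t N>
void rotate_blocks(float * x, float * y, std::size_t size, float radians)
{
	static_assert(N > 0, "block width must be positive");
	assert(size == 0 || (x != nullptr && y != nullptr));

	const float sine = std::sin(radians);
	const float cosine = std::cos(radians);

	for (std::size_t i = 0; i < size; i += N)
	{
		for (std::size_t k = i; k < i + N && k < size; ++k)
		{
			const float temp = x[k] * cosine - y[k] * sine;
			y[k] = x[k] * sine + y[k] * cosine;
			x[k] = temp;
		}
	}
}

using RotateFn = void (*)(float *, float *, std::size_t, float);

// Maps the detected capacity to a concrete instantiation.
inline RotateFn select_rotate()
{
	switch (detect_float_lanes())
	{
		case 8:  return &rotate_blocks<8>;
		case 4:  return &rotate_blocks<4>;
		default: return &rotate_blocks<1>;
	}
}

// Entry point: the path is resolved on the first call only.
inline void rotate(float * x, float * y, std::size_t size, float radians)
{
	static const RotateFn fn = select_rotate();
	fn(x, y, size, radians);
}
```

Caching the pointer means later calls pay for one indirect call instead of repeating the CPU query, which matters when the entry point is called once per audio block.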

And just to be clear, here's the output assembly for the body of the loop of matrix_rotate:

loop:
	movss       xmm2,dword ptr [edx+eax]
	inc         edi
	movss       xmm3,dword ptr [eax]
	movaps      xmm1,xmm2
	movaps      xmm0,xmm3
	mulss       xmm1,xmm5
	mulss       xmm0,xmm4
	mulss       xmm3,xmm5
	mulss       xmm2,xmm4
	subss       xmm1,xmm0
	addss       xmm3,xmm2
	movss       dword ptr [edx+eax],xmm1
	movss       dword ptr [eax],xmm3
	add         eax,4
	cmp         edi,ebx
	jb          loop

And for vmatrix_rotate for SSE2-enabled systems:

loop:
	movaps      xmm3,xmmword ptr [esi+eax]
	movaps      xmm2,xmmword ptr [eax]
	movaps      xmm1,xmm3
	movaps      xmm0,xmm2
	mulps       xmm1,xmm5
	mulps       xmm0,xmm4
	mulps       xmm2,xmm5
	mulps       xmm3,xmm4
	subps       xmm1,xmm0
	addps       xmm2,xmm3
	movaps      xmmword ptr [esi+eax],xmm1
	movaps      xmmword ptr [eax],xmm2
	add         eax,10h
	sub         ecx,1
	jne         loop

As we can see, the compiler failed to auto-vectorize the scalar function, even though the compiler is completely new and the code is compiled at the highest optimization level. The assembly for vmatrix_rotate, by contrast, is optimal.

The library includes overloads of nearly all common math library functions, like exp, sin, cos, tan, atan, log and sqrt. Although the library currently only supports SIMD acceleration on Intel architectures, the plan is to include formats found on other common systems, like ARM. If you follow the aforementioned system, the code will still be compatible with other platforms; it just won't be accelerated (the scalar path is chosen).

What will this library not do? Accelerated array and matrix operations, like performing a log() function on a large array. Other libraries exist for this sort of thing, like yeppp and volk. It can of course be simulated, perhaps even to a similar level of performance, but this library's use case is custom algorithms and application-specific domains.

About boost.SIMD: this library was developed before I knew of boost.SIMD. I'm currently looking into it.

Subpixel font rendering

Since cpl relies mostly on other platform graphics libraries, font rendering, weight and antialiasing were extremely inconsistent across platforms. This led me to develop a separate module that integrates into the JUCE software renderer and performs subpixel rendering of fonts, which (to my eye) looks a lot more pleasing. It is even competitive in rendering speed. Unlike other renderers, it is aware of the screen's orientation and pixel ordering. Here's a comparison of different renderers:

Comparison of font engine renderers

Here's a link to the development thread. The module can be found in cpl/rendering. Even though the code is specific to JUCE, the technique and algorithm are certainly usable in other projects.
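
To illustrate why pixel ordering matters, here is a generic textbook sketch of subpixel blending. It is not the algorithm used in cpl/rendering: all names (SubpixelOrder, Pixel, blend_subpixel_row) are hypothetical, it assumes glyph coverage sampled at three times the horizontal resolution, and it leaves out the colour-fringe filtering a real renderer would typically apply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for the illustration.
enum class SubpixelOrder { RGB, BGR };

struct Pixel { std::uint8_t r, g, b; };

// Blends text colour into one row of pixels.
// coverage holds three samples per pixel (left, middle, right), each 0..255.
inline void blend_subpixel_row(std::vector<Pixel> & row,
                               const std::vector<std::uint8_t> & coverage,
                               Pixel text, SubpixelOrder order)
{
	assert(coverage.size() >= 3 * row.size());

	// Per-channel alpha blend with rounding.
	auto mix = [](std::uint8_t bg, std::uint8_t fg, std::uint8_t a)
	{
		return static_cast<std::uint8_t>((fg * a + bg * (255 - a) + 127) / 255);
	};

	for (std::size_t p = 0; p < row.size(); ++p)
	{
		const std::uint8_t left = coverage[3 * p];
		const std::uint8_t middle = coverage[3 * p + 1];
		const std::uint8_t right = coverage[3 * p + 2];

		// On an RGB panel the red stripe is leftmost; on BGR it is rightmost.
		const std::uint8_t alphaRed = order == SubpixelOrder::RGB ? left : right;
		const std::uint8_t alphaBlue = order == SubpixelOrder::RGB ? right : left;

		row[p].r = mix(row[p].r, text.r, alphaRed);
		row[p].g = mix(row[p].g, text.g, middle);
		row[p].b = mix(row[p].b, text.b, alphaBlue);
	}
}
```

Feeding the same coverage to a panel with the opposite ordering mirrors each glyph edge within the pixel, which is why a renderer that ignores the ordering can look worse than plain greyscale antialiasing.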

License

The license is currently GPL v3; however, license incompatibilities may induce a change. The project will always be open-source though. If you want to use the code (especially for a commercial product), please contact me. See the licenses of the contained libraries as well.

Contained libraries

It's a great and precise FFT library for double precision that is self-contained and has no dependencies. It's also pretty fast. License.

Good, fast, safe. The basis of the lock free queues in cpl. License.

Anything GUI-related needs the JUCE library, which has a separate dual-license.

Development status

The library is actively being worked on nearly every day. I reserve the right to make breaking changes constantly: backwards compatibility is not a priority right now, until I develop a stable modular system. Please don't hesitate to report bugs.