Vector Swizzling in D
March 14th, 2012Warning: DOMXPath::query(): Invalid expression in /home/boscop/public_html/blog/wp-content/plugins/wppygments/php/phpQuery.php on line 1764
Warning: DOMXPath::query(): Invalid expression in /home/boscop/public_html/blog/wp-content/plugins/wppygments/php/phpQuery.php on line 1764
Warning: Invalid argument supplied for foreach() in /home/boscop/public_html/blog/wp-content/plugins/wppygments/php/phpQuery.php on line 1769
Inspired by this article which shows how to implement syntactic sugar for vector swizzlers in C++, I decided to write a blog post about how to implement them in D (and compare it a bit to the C++ version). I ended up with five different approaches.
This post is targeted at people who know a little about D but want to dig deeper or those who want to get a glimpse of what’s possible. There is probably nothing surprising/revolutionary for experienced D users here ![]()
While the C++ article focuses mainly on the swizzlers and the resulting asm code, I will mainly focus on the D programming language and explain some of its features.
Vector swizzlers are a concise syntax to transform a vector into another one by assigning its components in an arbitrary order to those of the resulting vector.
Here are some examples:
Vec(5,6,7,8).zyzx == Vec(7, 6, 7, 5)
Vec(5,6,7,8).zyx == Vec(7, 6, 5, 5)
Vec(5,6,7,8).wy == Vec(8, 6, 6, 6)
Vec(5,6,7,8).z == Vec(7, 7, 7, 7)
The first approach is the least elegant and most brute force way: generating all different swizzler getters using CTFE (compile-time function execution) to build a large string that contains all their definitions and then using the mixin statement to paste it into the Vec struct:
import std.stdio: writeln;
import std.ascii: toUpper;
auto fo_shizzle_ma_swizzle() {
class Perm {
char[] s;
int delegate(char[]) f;
this(int n) {s.length = n;}
void perm(int pos) {
if(pos >= s.length) {
f(s);
return;
}
foreach(i; 0..4) {
s[pos] = cast(char)('w' + i);
perm(pos + 1);
}
}
int opApply(int delegate(char[]) dg) {
f = dg;
perm(0);
return 0;
}
}
char[] join(T)(T[] a, T s) {
if(!a.length) return [];
T[] r = [a[0]];
foreach(e; a[1..$]) r ~= [s, e];
return r;
}
auto extend(char[] p) {
char[] s = p.dup;
foreach(ref c; s) c = cast(char)toUpper(c);
foreach(i; p.length .. 4) s ~= s[$-1];
return s;
}
string s;
foreach(i; 1..5)
foreach(p; new Perm(i))
s ~= "@property auto " ~ p ~ "(){return Vec(" ~ join(extend(p), ',') ~ ");}";
return s;
}
struct Vec {
double[4] v;
@property auto X() {return v[0];}
@property auto Y() {return v[1];}
@property auto Z() {return v[2];}
@property auto W() {return v[3];}
this(double x, double y, double z, double w) {v = [x,y,z,w];}
mixin(fo_shizzle_ma_swizzle());
}
void main() {
Vec a, b = Vec(5,6,7,8);
a = b.zyzx;
writeln(a);
a = b.zyx;
writeln(a);
a = b.wy;
writeln(a);
a = b.z;
writeln(a);
}
/*
output:
Vec([7, 6, 7, 5])
Vec([7, 6, 5, 5])
Vec([8, 6, 6, 6])
Vec([7, 7, 7, 7])
*/
What’s going on?
The fo_shizzle_ma_swizzle() function is responsible for generating the large string. It uses an instance of the Perm class in the foreach statement to iterate over all permutations of the length that Perm gets instantiated with (ranging from 1 to 4 (a..b includes only a to b-1) in the outer loop). This is of course because we have to account for swizzlers of length 1 to 4. foreach calls back Perm’s opApply(), which in turn calls the recursive perm() function that calls the delegate that represents the body of the foreach loop with the current permutation as argument. E.g. the generated getter for zyx looks like this:
@property auto zyx(){return Vec(Z,Y,X,X);}
At that point, p contains “zyx”, which extend() transforms into “ZYXX”, which gets joined to “Z,Y,X,X”, which is then concatenated with the rest.
This approach is bad because it generates a lot of functions (4 + 4^2 + 4^3 + 4^4 = 340) that will never be used which just blows up the object file and it also decreases compilation time (not very noticeable in this range though).
The second approach is to use constants to specify swizzlers which are passed to opIndex (the subscript operator). This approach is similar to the one in the other article, but instead of generating the constants in an external script we can use D’s CTFE again to generate them for us:
import std.stdio: writeln;
import std.conv: to;
struct Perm {
char[] s;
int delegate(char[]) f;
this(int n) {s.length = n;}
void perm(int pos) {
if(pos >= s.length) {
f(s);
return;
}
foreach(i; 0..4) {
s[pos] = cast(char)('W' + i);
perm(pos + 1);
}
}
int opApply(int delegate(char[]) dg) {
f = dg;
perm(0);
return 0;
}
}
auto fo_shizzle_ma_swizzle() {
char[] join(T)(T[] a, T s) {
if(!a.length) return [];
T[] r = [a[0]];
foreach(e; a[1..$]) r ~= [s, e];
return r;
}
auto extend(char[] p) {
char[] s = p;
foreach(i; p.length .. 4) s ~= s[$-1];
return s;
}
string s;
foreach(i; 1..5) {
auto S = "S" ~ cast(char)('0' + i);
s ~= "alias " ~ ["byte","ubyte","int","uint"][i-1] ~ " " ~ S ~ ";";
foreach(p; Perm(i)) {
int v;
foreach(j; 0..i)
v += (t => t[1] > 2];
return Vec(a,b,b,b);
}
Vec opIndex(S3 s) {
auto a = v[s & 0x3], b = v[(s >> 2) & 0x3], c = v[s >> 4];
return Vec(a,b,c,c);
}
Vec opIndex(S4 s) {
auto a = v[s & 0x3], b = v[(s >> 2) & 0x3], c = v[(s >> 4) & 0x3], d = v[s >> 6];
return Vec(a,b,c,d);
}
}
void main() {
Vec a, b = Vec(5,6,7,8);
a = b[ZYZX];
writeln(a);
a = b[ZYX];
writeln(a);
a = b[WY];
writeln(a);
a = b[Z];
writeln(a);
}
/*
output:
Vec([7, 6, 7, 5])
Vec([7, 6, 5, 5])
Vec([8, 6, 6, 6])
Vec([7, 7, 7, 7])
*/
This time the fo_shizzle_ma_swizzle() function generates the constants, which are then mixed into the global scope. For example
immutable S2 YZ=9;
immutable S4 ZYWZ=182;
To print stuff at compile time you use pragma(msg, str), so you could do
pragma(msg, fo_shizzle_ma_swizzle());
to see all the constants that are generated. (Or you could print them at runtime of course.)
The rule to choose the number for each constant is the same as in the C++ article:
The first two bits determine which component becomes the new X (with X=0, …, W=3), the next two bits determine which one becomes Y etc.
But since we want to support swizzlers of shorter length than 4 we have 4 different overloads that use the argument differently (extend it to the right if the swizzler’s length is shorter than 4). For this we have to choose 4 different argument types. D doesn’t have a typedef statement like C++ (anymore), instead it has ‘alias’, which doesn’t introduce a new type though. If I had aliased all 4 types to int, the overloads would have been ambiguous because they would all take arguments of type int. That’s why I decided to use byte (S1) for swizzlers of length 1, ubyte (S2) for swizzlers of length 2, int (S3) for those of length 3 and uint (S4) for those of length 4.
(Note: using e.g. byte for swizzlers of length 4 wouldn’t work, because they become too large (e.g. WWWW=255 because W=3 and W becomes the new X, Y, Z and W, so all two-bit pairs contain a 3)
The alias declarations are also generated by fo_shizzle_ma_swizzle().
The decoding then takes place in the opIndex overloads:
The opIndex for S1 uses the component that is represented by the first 2 bits for all components in the returned vector.
The opIndex for S2 uses the component that is represented by the first 2 bits for the new x component and the one represented by the second pair of bits is used for the remaining y,z,w components.
You get the picture
This approach isn’t really nice either because it pollutes the module namespace with 340 constants and the syntax to access the swizzlers isn’t very appealing.
The third approach uses member function templates to pass the indices at compile-time. It’s more or less a translation of the C++ version:
enum {X, Y, Z, W}
struct Vec {
double[4] v;
@property auto X() {return v[0];}
@property auto Y() {return v[1];}
@property auto Z() {return v[2];}
@property auto W() {return v[3];}
this(double x, double y, double z, double w) {v = [x,y,z,w];}
auto swizzle(uint a)() {
return Vec(v[a], v[a], v[a], v[a]);
}
auto swizzle(uint a, uint b)() {
return Vec(v[a], v[b], v[b], v[b]);
}
auto swizzle(uint a, uint b, uint c)() {
return Vec(v[a], v[b], v[c], v[c]);
}
auto swizzle(uint a, uint b, uint c, uint d)() {
return Vec(v[a], v[b], v[c], v[d]);
}
}
unittest {
auto v = Vec(5,6,7,8);
assert(v.swizzle!(Z,Y,Z,X) == Vec(7, 6, 7, 5));
assert(v.swizzle!(Z,Y,X) == Vec(7, 6, 5, 5));
assert(v.swizzle!(W,Y) == Vec(8, 6, 6, 6));
assert(v.swizzle!(Z) == Vec(7, 7, 7, 7));
}
(This example also demonstrates how unittests can be integrated into the code. Those are run when you pass the -unittest switch to DMD.)
The syntax for using the swizzlers is a bit verbose, even when shortening “swizzle” to “s”.
To demonstrate more of D’s metaprogramming capabilities (and to allow more successful inlining) the fourth example uses templates for every function involved in the swizzlers:
template i(char c) {
enum i = [3,0,1,2][c-'w'];
}
template extend(string s) {
static if(s.length < 4)
enum extend = extend!(s ~ s[$-1]);
else
enum extend = s;
}
template valid(string s) {
template v(char c) {
enum v = 'w' <= c && c <= 'z';
}
static if(s.length == 1)
enum valid = v!(s[0]);
else
enum valid = v!(s[0]) && valid!(s[1..$]);
}
struct Vec {
double[4] v;
@property auto X() {return v[0];}
@property auto Y() {return v[1];}
@property auto Z() {return v[2];}
@property auto W() {return v[3];}
this(double x, double y, double z, double w) {v = [x,y,z,w];}
@property auto s(string s, string p = extend!s)() if(s.length <= 4 && valid!s) {
return Vec(v[i!(p[0])], v[i!(p[1])], v[i!(p[2])], v[i!(p[3])]);
}
}
unittest {
assert(Vec(5,6,7,8).s!"zyzx" == Vec(7, 6, 7, 5));
assert(Vec(5,6,7,8).s!"zyx" == Vec(7, 6, 5, 5));
assert(Vec(5,6,7,8).s!"wy" == Vec(8, 6, 6, 6));
assert(Vec(5,6,7,8).s!"z" == Vec(7, 7, 7, 7));
}
Since templates are by nature declarative, they return values by declaring a variable or type with the same name as themselves, which dictates a functional coding style.
As you can see, templates can be nested. You could even return a local template that captures symbols from the current one like a closure…
The string passed to “s” is checked by the template constraint, so that we only get valid swizzlers, i.e. those of a length that doesn’t exceed 4 and with all their chars in the range ‘w’..’z’. The “i” template is used to convert a char to its index into the “v” array, where again x=0, …, w=3.
This code allows using v.s!”zyx” syntax instead of v.s!(Z,Y,X) but it’s still too verbose. We’d like to be able to write v.zyx instead, right?
This brings us to the fifth approach which uses a rather new feature, called opDispatch, which is a member template that gets called when a method that is called on an object doesn’t exist (and if it’s also not available as a global function that takes objects of this type as first argument. In that case the global function would be called because of UFCS, but that’s a different topic…).
opDispatch gets the name of the non-existing method as compile-time argument and can then dispatch on its own, hence the name ![]()
Here is a short example of what is possible with opDispatch:
struct Foo {auto asd() {return "foo";}}
struct Bar {auto sdf() {return 3;}}
struct Baz {void dfg(int a, int b) {assert(a == b);}}
struct S {
Foo foo;
Bar bar;
Baz baz;
auto opDispatch(string s, T...)(T a) {
static if(s[0..4] == "foo_")
return __traits(getMember, this.foo, s[4..$])(a);
else static if(s[0..4] == "bar_")
return __traits(getMember, this.bar, s[4..$])(a);
else static if(s[0..4] == "baz_")
return __traits(getMember, this.baz, s[4..$])(a);
else static assert(0);
}
}
void main() {
S s;
assert(s.foo_asd() == "foo");
assert(s.bar_sdf() == 3);
s.baz_dfg(42, 42);
}
You can read more about opDispatch on the official site.
It forwards the arguments by using the variadic argument tuple ‘a’. The arguments’ types will be stored in the tuple T, but it’s unused here, since ‘a’ is just passed on to the target method.
Since we can use opDispatch to catch the calls to non-existing methods, our swizzlers will have method syntax. If we additionally mark opDispatch with @property, we can omit the parens when calling them (as we are already doing with the vector component getters X,Y, etc.):
template i(char c) {
enum i = [3,0,1,2][c-'w'];
}
template extend(string s) {
static if(s.length < 4)
enum extend = extend!(s ~ s[$-1]);
else
enum extend = s;
}
template valid(string s) {
template v(char c) {
enum v = 'w' <= c && c <= 'z';
}
static if(s.length == 1)
enum valid = v!(s[0]);
else
enum valid = v!(s[0]) && valid!(s[1..$]);
}
struct Vec {
double[4] v;
@property auto X() {return v[0];}
@property auto Y() {return v[1];}
@property auto Z() {return v[2];}
@property auto W() {return v[3];}
this(double x, double y, double z, double w) {v = [x,y,z,w];}
@property auto opDispatch(string s)() if(s.length <= 4 && valid!s) {
immutable p = extend!s;
return Vec(v[i!(p[0])], v[i!(p[1])], v[i!(p[2])], v[i!(p[3])]);
}
}
unittest {
assert(Vec(5,6,7,8).zyzx == Vec(7, 6, 7, 5));
assert(Vec(5,6,7,8).zyx == Vec(7, 6, 5, 5));
assert(Vec(5,6,7,8).wy == Vec(8, 6, 6, 6));
assert(Vec(5,6,7,8).z == Vec(7, 7, 7, 7));
}
This allows us to use the same syntax as with the first approach but in a much cleaner way. The opDispatch template is only instantiated for the swizzlers that are actually used. If we have a vector v and call v.zyx, it will transform into v.opDispatch!(“zyx”)(), which then calls the other templates (at compile-time of course).
Conclusion:
As you have seen there are many ways to skin a cat (not only in D).
The approach using opDispatch is arguably the cleanest one, because we get the desired syntax without hacky string mixins, without polluting any scope with a huge number of symbols (whether it’s functions, constants or enums) and without having one overload with a special integral type for each swizzler length.
Instead it expresses the intent very clearly in a few lines of code.
Update (2012-03-15)
Often you’d use CTFE instead of templates whenever possible to be able to use the same functions at compile-time and run-time, and because it’s often less verbose.
Originally I was using CTFE in the opDispatch example but I noticed that the operations actually didn’t get executed at compile-time, which is why I switched to using templates. On the D Newsgroup, Don Clugston pointed out to me which operations prevented CTFE with these functions, so here is the improved version that does everything at compile-time without templates (except opDispatch):
bool valid(string s) {
foreach(c; s)
if(c < 'w' || c > 'z') return false;
return true;
}
struct Vec {
double[4] v;
@property auto X() {return v[0];}
@property auto Y() {return v[1];}
@property auto Z() {return v[2];}
@property auto W() {return v[3];}
this(double x, double y, double z, double w) {v = [x,y,z,w];}
@property auto opDispatch(string s)() if(s.length <= 4 && valid(s)) {
auto extend(string s) {
while(s.length < 4) s ~= s[$-1];
return s;
}
enum p = extend(s);
int i(char c) {return [3,0,1,2][c-'w'];}
enum i0 = i(p[0]), i1 = i(p[1]), i2 = i(p[2]), i3 = i(p[3]);
return Vec(v[i0], v[i1], v[i2], v[i3]);
}
}
static assert(Vec(5,6,7,8).zyzx == Vec(7, 6, 7, 5));
static assert(Vec(5,6,7,8).zyx == Vec(7, 6, 5, 5));
static assert(Vec(5,6,7,8).wy == Vec(8, 6, 6, 6));
static assert(Vec(5,6,7,8).z == Vec(7, 7, 7, 7));
The important thing if you want to make sure that something gets evaluated at compile-time is to declare an enum with the expression as value.
Using ‘immutable’ only guarantees that something isn’t changed after after it’s assigned (function-local immutables) / after the constructor returns (class/struct fields) / after the module constructor returns (immutables at module level).
When a type/function isn’t used outside a function I often use function-local types/functions (just as with variables, which are declared just before they are needed). Someone told me that he thinks it makes the code unreadable, but IMO it improves encapsulation and locality (when reading the code in the body of the function you don’t have to look far to see the code of the helper function). Of course, as soon as you need the function/type outside, you can move it to a higher scope.