Friday, March 12, 2010

Linq to Tasks

One of the more interesting properties of the Linq syntax is that, at heart, it has nothing to do with sequences or XML or SQL. Linq is a really syntax for performing operations with monads, and you can add your own types to it relatively easily. (For the uninitiated: a monad is essentially a wrapper class plus a way to transform functions to use wrapped values instead of normal values). Linq works with sequences because sequences are an extremely common Monad, and so the linq-to-list stuff is implemented as part of .Net. In this post, I'm going to show you an example of adding your own Linq-to-X functionality.

Tasks are a new feature in .Net 4.0. A task represents an asynchronous operation, such as a function running on another thread, and provides methods for working with the eventual result. You can use the ContinueWith methods to chain tasks together, feeding the eventual output of one task into the input of another task. For our purposes, the important thing is that tasks are a monad: they wrap a value, and you can modify functions to operate on tasks instead of normal values (using ContinueWith).

Since tasks are a type of monad, we can use them with Linq. All we have to do is define the necessary extension methods: Select and SelectMany. Select is the function used when you have a single 'From' line in your query, while SelectMany is used when there are multiple 'From' lines. Our task-version of Select will take a task and a function, and return a task for the function's eventual result after being given the task's eventual value. SelectMany is essentially the same thing, but with two tasks and two functions.

    Imports System.Threading.Tasks

Public Module LinqToTasks
<Extension()>
Public Function [Select](Of TArg, TResult)(ByVal task As Task(Of TArg),
ByVal func As Func(Of TArg, TResult)) As Task(Of TResult)
Return task.ContinueWith(Function(t) func(t.Result), TaskContinuationOptions.OnlyOnRanToCompletion)
End Function

<Extension()<
Public Function SelectMany(Of TArg, TMid, TReturn)(ByVal task As Task(Of TArg),
ByVal projection1 As Func(Of TArg, Task(Of TMid)),
ByVal projection2 As Func(Of TArg, TMid, TReturn)) As Task(Of TReturn)
Return task.Select(Function(value1) projection1(value1).
Select(Function(value2) projection2(value1, value2))).Unwrap
End Function
End Module

I know that doesn't seem like a lot of code, but that's all of it. We can now use the linq syntax on tasks. But keep in mind that we didn't implement many of the operators, like "where" (what would that even mean for a task?). Here are some examples of using the linq syntax on tasks:

Example #1: Returns a task that evaluates to 35 after five seconds:

        Public  Function TaskTest() As  Task(Of Int32)
Dim t = New TaskCompletionSource(Of Int32)
Call New System.Threading.Thread(Sub()
System.Threading.Thread.Sleep(5000)
t.SetResult(5)
End Sub).Start()
Return From value In t.Task
Let squared = value * value
Let doubled = value + value
Select squared + doubled
End Function

Example #2: Returns a task that evaluates to 20 after ten seconds:

Public Function TaskTest2() As Task(Of Int32)
Dim t = New TaskCompletionSource(Of Task(Of Int32))
Call New System.Threading.Thread(Sub()
System.Threading.Thread.Sleep(5000)
Dim t2 = New TaskCompletionSource(Of Int32)
t.SetResult(t2.Task)
System.Threading.Thread.Sleep(5000)
t2.SetResult(1)
End Sub).Start()
Return From taskValue In t.Task
From value In taskValue
Select value * 20
End Function

As you have seen, allowing linq syntax on a type is as easy as implementing extension methods for the operators. A good learning exercise is to implement the linq operators for the non-generic version of IEnumerable (hint: all the functions involve enumerating). There are tons of types you can linq-ify: collection types, nullables, Lazy<t>, delegates like Func<t>, etc, etc, etc and etc. Try not to overdo it.

Friday, March 5, 2010

Post-Mortem of a Job Application

I applied for a job recently, and was asked to write a small clone of Arkanoid (aka Breakout) using C++/DirectX. My C++ was rusty and the last DirectX stuff I did was five years ago and in VB6, but I figured I might as well try. At least it would refresh my C++ and force me to learn some DirectX. Unfortunately, I never managed to finish the game (on time). My lack of familiarity caused difficulties but, looking back, I can see that familiarity was not the issue. Learning sucked up some time, but the real time sink came about because of a design decision made only three hours into the project: I decided to write a Rational class instead of using floating point values.

Since this was a job application, I didn't want to do the bare minimum. I came up with the idea to record 'ghosts', so you could race against yourself. If I had the time, I could even feed the ghost data in live from another game for some relatively simple networked multiplayer. Thinking about how I would implement this, I remembered that floating points are not reliable in terms of consistency. Small optimizations made by the compiler, and differences across machines, can make the results of even simple additions come out slightly differently. That could be a problem, so I decided to write a Rational class and use it instead of floating points. That way I could guarantee consistency across machines.

At the beginning, I was reaping benefits. The rationals weren't as fast as floats, since constructing one involved computing a gcd, but the game wasn't even close to taxing the CPU. They even made the physics nice and exact, because all the 2d vector operations I used are closed over the rationals (No rounding errors. At all!). For example, this code computes the *exact* intersection point and normal of a moving point into a line segment:

//[example note: Vector2 is an x/y pair of Rationals. CollisionPoint is a location/normal pair which can be empty for no collision]

CollisionPoint CollisionPoint :: PointIntoLine(const Vector2 p, const Vector2 dp,

const Vector2 q1, const Vector2 q2) {

assert(dp != Vector2(0, 0));

assert(q1 != q2);

//Useful intermediate values

Vector2 p1 = p;

Vector2 p2 = p + dp;

Vector2 dq = q2 - q1;

Rational f1 = q1.Cross(q2);

Rational f2 = p1.Cross(p2);

//Intersection point (of lines extended to infinity)

Rational d = dp.Cross(dq);

if (d == 0) return CollisionPoint(); //no collision due to parallel lines

Rational cx = (f1*dp.X() - f2*dq.X())/d;

Rational cy = (f1*dp.Y() - f2*dq.Y())/d;

Vector2 intersectionPoint = Vector2(cx, cy);

//Confirm that the intersection point is within the segments

if (Median(p1.X(), cx, p2.X()) != cx ||

Median(p1.Y(), cy, p2.Y()) != cy ||

Median(q1.X(), cx, q2.X()) != cx ||

Median(q1.Y(), cy, q2.Y()) != cy) return CollisionPoint(); //missed

//return the point of impact and the normal of the line the at point of impact

return CollisionPoint(intersectionPoint, - dp.PerpOnto(q2 - q1));

}

The happy times continued until I hit an assert: "denominator < 0". I had decided that a rational's negative sign must always be on the numerator, and put an assert in the constructor to enforce this. Upon first seeing the error, I thought I had made some stupid error in one of the operator functions. Unfortunately, what had happened was far more insidious: integer overflow.

The thing to understand about rationals is that performing operations on them tends to make them 'share' their factors. If you add a/b and c/d, you get (ad + bc)/(bd), which almost always has a larger denominator and numerator even after removing the gcd. In the worst case each addition will square the numerator and denominator, and you only have to square five times to go from 2 to 2^32 and cause an overflow. Well guess what: a physics function involves way more than five additions, not to mention other operations.

I tried to smooth over the problem by rounding, but the problem kept happening. I rounded more aggressively, but it kept happening. Eventually I had rounding statements on practically every other line outside of the physics functions, but it still kept happening. Even my goat sacrifice to Apollo didn't work (maybe he doesn't like goats?). You could pick a physics function at random, give it touchy input, and cause an assertion. To make matters worse, all this tinkering and rounding kept introducing small bugs. Eventually I ran way over on time, and gave up.

If I had gone with the tried-and-true floating points, I would have finished with a potentially-maybe-slightly-out-of-sync ghost, instead of not finishing at all. The lesson I'm taking from all of this is that I shouldn't be experimenting with ideas when I'm on the clock, unless it's necessary. Sometimes they work, but when they don't the cost is too high. Perhaps, now that there's no time limit, I'll rewrite the rationals to use BigIntegers in order to guarantee there are no overflows. That could be an interesting side project.