Tuesday, August 4, 2009

Emulating Yield-like Iterators in VB10

Introduction

One of the defining characteristics of VB10 and C#4 is feature convergence. VB and C# now share optional arguments, dynamic types, multiline lambdas, and auto-properties. However, there are still differences between the two languages. One of the glaring differences is the lack of yield-style Iterators in VB. In this post, I am going to address that difference by constructing a class that uses lambda expressions to achieve functionality similar to yield.

Explicit Iterators vs Yield Iterators

Explicitly implementing an iterator is a huge pain, because of the sheer amount of code required. You have to write a whole class, implement two interfaces (including their non-generic legacy versions), and manually track state. Put it all together and you get massive line waste.

Just to hammer this in, here is a typical iterator implementation to enumerate a contiguous subset of an array:

Public Class SubArrayIterator(Of T)
Implements IEnumerable(Of T)
Implements IEnumerator(Of T)

Private ReadOnly array As T()
Private ReadOnly offset As Integer
Private ReadOnly length As Integer
Private index As Integer
Private cur As T

Public Sub New(ByVal array As T(), ByVal offset As Integer, ByVal length As Integer)
Me.array = array
Me.offset = offset
Me.length = length
End Sub

Public Function GetEnumerator() As IEnumerator(Of T) Implements IEnumerable(Of T).GetEnumerator
Return New SubArrayIterator(Of T)(array, offset, length)
End Function

Private ReadOnly Property Current As T Implements IEnumerator(Of T).Current
Get
Return cur
End Get
End Property

Private Function MoveNext() As Boolean Implements IEnumerator.MoveNext
If index >= length Then Return False
cur = array(index)
index += 1
Return True
End Function

Private Sub Reset() Implements IEnumerator.Reset
index = 0
End Sub

Private Sub Dispose() Implements IDisposable.Dispose
'no unmanaged state
End Sub

Private Function GetEnumeratorObj() As IEnumerator Implements IEnumerable.GetEnumerator
Return GetEnumerator()
End Function
Private ReadOnly Property CurrentObj As Object Implements IEnumerator.Current
Get
Return cur
End Get
End Property
End Class

Augh! Fifty lines for something I can describe in less than ten words! Now compare that to this:

public IEnumerable IterateSubArray<T>(T[] array, int offset, int length) {
for (int i = 0; i <>
yield return array[i + offset];
}

Wow! We jumped from fifty lines to four! The compiler will explicitly implement a large class (it translates the function into a state machine), but all we see is four lines. There are downsides (eg. Reset is not supported, but nobody uses Reset anyways), but they are mostly inconsequential compared to the code savings. The main problem is the lazy evaluation of the iterator causes argument exceptions to be thrown when the iterator is used instead of when it is constructed.

I should probably point out that, using linq, the example enumeration can be just a single line. Remember that it is only for demonstration purposes.

The Controller

The basic idea I have is to use a generator function to implement the enumeration. The function will take a controller, in order to signal special commands like "end of enumeration".

Public Interface IEnumeratorController(Of T)
Function Break() As T
End Interface

You can add more control methods as you desire them. Other useful possible control methods are "Multiple", which would return a sequence of items to enumerate, and "Repeat", which would return no item to enumerate and run the generator again. Note that the expected usage style of the controller methods is in a return statement (like "return controller.Break()"). This avoids confusing cases like calling break then returning an element to enumerate (Which would take precedence?).

The Enumerator

Now that the controller is defined, we can write a class which implements IEnumerator using a generator function. It's relatively straightforward: you just track the generator's last returned item and a flag for whether or not controller.break has been called yet. We have line waste, but we only have to include this code once instead of once per iterator.

Public NotInheritable Class Enumerator(Of T)
Implements IEnumerator(Of T)
Implements IEnumeratorController(Of T)

Private ReadOnly generator As Func(Of IEnumeratorController(Of T), T)
Private cur As T
Private break As Boolean
Public Sub New(ByVal generator As Func(Of IEnumeratorController(Of T), T))
Me.generator = generator
End Sub

Public Function MoveNext() As Boolean Implements IEnumerator(Of T).MoveNext
If Me.break Then Return False 'have previously called controller.break
Me.cur = generator(Me)
If Me.break Then Return False 'just called controller.break
Return True
End Function

Private Function ControllerBreak() As T Implements IEnumeratorController(Of T).Break
Me.break = True
Return Nothing
End Function

Public ReadOnly Property Current As T Implements IEnumerator(Of T).Current
Get
Return Me.cur
End Get
End Property

Private ReadOnly Property CurrentObj As Object Implements IEnumerator.Current
Get
Return Me.cur
End Get
End Property
Private Sub Reset() Implements IEnumerator.Reset
Throw New NotSupportedException()
End Sub
Private Sub Dispose() Implements IDisposable.Dispose
'no unmanaged state
End Sub
End Class

The Generator

With our Enumerator and Controller in hand, we can write a function to enumerate a contiguous subset of an array. We will use a lambda expression for the generator, and a hoisted local variable to keep state:

Public Function EnumerateSubArray(Of T)(ByVal items() As T, ByVal offset As Integer, ByVal length As Integer) As IEnumerator(Of T)
Dim index = 0 'hoisted variable used for state
Return New Enumerator(Of T)(
Function(controller)
If index >= length Then Return controller.Break()
index += 1
Return items(index - 1)
End Function)
End Function

The result isn't be as compact as using C#'s yield, but is still a huge improvement over explicit implementation. It's essentially like implementing only the MoveNext method instead of an entire enumerator class. The main downside compared to yield is the loss of multiple entry points into the function (we always start at the beginning, not on the line after the last return). We do have one advantage over using yield: we can check arguments upon construction of the iterator (because everything before return New Enumerator(...) is executed immediately).

Enumerable vs Enumerator

An IEnumerable is an object that can be enumerated more than once. It returns a new enumerator for each enumeration, and is the type required by the useful 'for each' loop. In order to implement an IEnumerable we just pull the same trick again: seeding a class with a generator function. Luckily, this time we don't need a controller because an IEnumerable doesn't require state.

Public NotInheritable Class Enumerable(Of T)
Implements IEnumerable(Of T)
Private ReadOnly generator As Func(Of IEnumerator(Of T))
Public Sub New(ByVal generator As Func(Of IEnumerator(Of T)))
Me.generator = generator
End Sub
Public Function GetEnumerator() As IEnumerator(Of T) Implements IEnumerable(Of T).GetEnumerator
Return generator()
End Function
Private Function GetEnumeratorObj() As IEnumerator Implements IEnumerable.GetEnumerator
Return GetEnumerator()
End Function
End Class

At last, we can properly iterate over that contiguous array subset by using our previous EnumerateSubArray function to generate enumerators:

Public Function IterateSubArray(Of T)(ByVal items() As T, ByVal offset As Integer, ByVal length As Integer) As IEnumerable(Of T)
Return New Enumerable(Of T)(Function() EnumerateSubArray(items, offset, length))
End Function

Conclusion

Even though VB doesn't have a yield keyword, we can still cut the number of lines required for an iterator by a factor a five. All we needed were a couple classes to turn generator functions into an enumerator. It's not as good as using yield, but we don't have yield.


No comments:

Post a Comment