One of the defining characteristics of VB10 and C#4 is feature convergence. VB and C# now share optional arguments, dynamic types, multiline lambdas, and auto-properties. However, there are still differences between the two languages. One of the glaring differences is the lack of yield-style Iterators in VB. In this post, I am going to address that difference by constructing a class that uses lambda expressions to achieve functionality similar to yield.
Explicit Iterators vs Yield Iterators
Explicitly implementing an iterator is a huge pain, because of the sheer amount of code required. You have to write a whole class, implement two interfaces (including their non-generic legacy versions), and manually track state. Put it all together and you get massive line waste.
Just to hammer this in, here is a typical iterator implementation to enumerate a contiguous subset of an array:
Public Class SubArrayIterator(Of T)
Implements IEnumerable(Of T)
Implements IEnumerator(Of T)
Private ReadOnly array As T()
Private ReadOnly offset As Integer
Private ReadOnly length As Integer
Private index As Integer
Private cur As T
Public Sub New(ByVal array As T(), ByVal offset As Integer, ByVal length As Integer)
Me.array = array
Me.offset = offset
Me.length = length
End Sub
Public Function GetEnumerator() As IEnumerator(Of T) Implements IEnumerable(Of T).GetEnumerator
Return New SubArrayIterator(Of T)(array, offset, length)
End Function
Private ReadOnly Property Current As T Implements IEnumerator(Of T).Current
Get
Return cur
End Get
End Property
Private Function MoveNext() As Boolean Implements IEnumerator.MoveNext
If index >= length Then Return False
cur = array(index)
index += 1
Return True
End Function
Private Sub Reset() Implements IEnumerator.Reset
index = 0
End Sub
Private Sub Dispose() Implements IDisposable.Dispose
'no unmanaged state
End Sub
Private Function GetEnumeratorObj() As IEnumerator Implements IEnumerable.GetEnumerator
Return GetEnumerator()
End Function
Private ReadOnly Property CurrentObj As Object Implements IEnumerator.Current
Get
Return cur
End Get
End Property
End Class
Augh! Fifty lines for something I can describe in less than ten words! Now compare that to this:
public IEnumerableIterateSubArray<T>(T[] array, int offset, int length) {
for (int i = 0; i <>
yield return array[i + offset];
}
Wow! We jumped from fifty lines to four! The compiler will explicitly implement a large class (it translates the function into a state machine), but all we see is four lines. There are downsides (eg. Reset is not supported, but nobody uses Reset anyways), but they are mostly inconsequential compared to the code savings. The main problem is the lazy evaluation of the iterator causes argument exceptions to be thrown when the iterator is used instead of when it is constructed.
I should probably point out that, using linq, the example enumeration can be just a single line. Remember that it is only for demonstration purposes.
The Controller
The basic idea I have is to use a generator function to implement the enumeration. The function will take a controller, in order to signal special commands like "end of enumeration".
Public Interface IEnumeratorController(Of T)
Function Break() As T
End Interface
You can add more control methods as you desire them. Other useful possible control methods are "Multiple", which would return a sequence of items to enumerate, and "Repeat", which would return no item to enumerate and run the generator again. Note that the expected usage style of the controller methods is in a return statement (like "return controller.Break()"). This avoids confusing cases like calling break then returning an element to enumerate (Which would take precedence?).
The Enumerator
Now that the controller is defined, we can write a class which implements IEnumerator
Public NotInheritable Class Enumerator(Of T)
Implements IEnumerator(Of T)
Implements IEnumeratorController(Of T)
Private ReadOnly generator As Func(Of IEnumeratorController(Of T), T)
Private cur As T
Private break As Boolean
Public Sub New(ByVal generator As Func(Of IEnumeratorController(Of T), T))
Me.generator = generator
End Sub
Public Function MoveNext() As Boolean Implements IEnumerator(Of T).MoveNext
If Me.break Then Return False 'have previously called controller.break
Me.cur = generator(Me)
If Me.break Then Return False 'just called controller.break
Return True
End Function
Private Function ControllerBreak() As T Implements IEnumeratorController(Of T).Break
Me.break = True
Return Nothing
End Function
Public ReadOnly Property Current As T Implements IEnumerator(Of T).Current
Get
Return Me.cur
End Get
End Property
Private ReadOnly Property CurrentObj As Object Implements IEnumerator.Current
Get
Return Me.cur
End Get
End Property
Private Sub Reset() Implements IEnumerator.Reset
Throw New NotSupportedException()
End Sub
Private Sub Dispose() Implements IDisposable.Dispose
'no unmanaged state
End Sub
End Class
The Generator
With our Enumerator and Controller in hand, we can write a function to enumerate a contiguous subset of an array. We will use a lambda expression for the generator, and a hoisted local variable to keep state:
Public Function EnumerateSubArray(Of T)(ByVal items() As T, ByVal offset As Integer, ByVal length As Integer) As IEnumerator(Of T)
Dim index = 0 'hoisted variable used for state
Return New Enumerator(Of T)(
Function(controller)
If index >= length Then Return controller.Break()
index += 1
Return items(index - 1)
End Function)
End Function
The result isn't be as compact as using C#'s yield, but is still a huge improvement over explicit implementation. It's essentially like implementing only the MoveNext method instead of an entire enumerator class. The main downside compared to yield is the loss of multiple entry points into the function (we always start at the beginning, not on the line after the last return). We do have one advantage over using yield: we can check arguments upon construction of the iterator (because everything before return New Enumerator(...) is executed immediately).
Enumerable vs Enumerator
An IEnumerable
Public NotInheritable Class Enumerable(Of T)
Implements IEnumerable(Of T)
Private ReadOnly generator As Func(Of IEnumerator(Of T))
Public Sub New(ByVal generator As Func(Of IEnumerator(Of T)))
Me.generator = generator
End Sub
Public Function GetEnumerator() As IEnumerator(Of T) Implements IEnumerable(Of T).GetEnumerator
Return generator()
End Function
Private Function GetEnumeratorObj() As IEnumerator Implements IEnumerable.GetEnumerator
Return GetEnumerator()
End Function
End Class
At last, we can properly iterate over that contiguous array subset by using our previous EnumerateSubArray function to generate enumerators:
Public Function IterateSubArray(Of T)(ByVal items() As T, ByVal offset As Integer, ByVal length As Integer) As IEnumerable(Of T)
Return New Enumerable(Of T)(Function() EnumerateSubArray(items, offset, length))
End Function
Conclusion
Even though VB doesn't have a yield keyword, we can still cut the number of lines required for an iterator by a factor a five. All we needed were a couple classes to turn generator functions into an enumerator. It's not as good as using yield, but we don't have yield.
No comments:
Post a Comment