|
In Visual Basic .NET I wrote some polymorphic code, and the performance is just AWFUL, about 200 times worse than my similar non-polymorphic code.
But the VB array.sort is polymorphic and nicely fast, nearly as fast as my fastest non-polymorphic code.
Clearly the people who wrote array.sort know some things about writing fast polymorphic code I don't.
What do I need to do to get my polymorphic code to run fast?
Mostly I want to understand the general situation, but I have a pressing problem.
I need to sort an array of strings 'by surrogate' (explained below) to use one of D. Knuth's terms. The number of keys to be sorted should be maybe 100 million now and more later when we move to 64 bit computing. So I wrote some heap sort code. I know; I know: array.sort implements quicksort which on average is faster and has much better locality of reference in virtual memory and has the promise of being able to exploit multi-core processors well, but heap sort has better guaranteed performance. Besides, sort algorithms are not the point of my question.
Beyond the pressing problem, for other work really SHOULD understand how to get good performance with polymorphism.
So, to get started, both on the pressing problem and on understanding performance with polymorphism, I wrote some heap sort routines. One of these is
Sub astsu_heap_sort01( _ ByRef integer_pointers() As Int32, _ ByRef key_array() As String, _ ByVal n As Int32 )
This routine name abbreviates 'ascending string by surrogate heap sort'.
Begin Flame
I know; at least now I know: When working with 'events', there can be some dangers in using the underscore character in names, but the Pascal naming convention, e.g., AStSuHeapSort would be okay except Visual Basic (VB) is case insensitive so that ItSlinks will be the same as ItsLinks, etc. I know; I know; Visual Studio has a practical fix for this issue, but I hate nearly all GUIs and in particular all IDEs.
End Flame
So, upon entry the number of array components to sort is
ByVal n As Int32
Upon entry, for i = 1, 2, ..., n,
integer_pointers(i)
is a permutation on the set of integers {1, 2, ..., n}. In particular, can have
integer_pointers(i) = i
The 'keys' in this sort are
key_array(i)
Upon return, for i = 1, 2, ..., n - 1,
key_array( integer_pointers( i ) ) <= _ key_array( integer_pointers( i + 1 ) )
This 'pointer indirection' explains the meaning of 'by surrogate'.
The components of
key_array()
are read-only, that is, never changed.
Well, on my 1.8 Ghz single core processor, when sorting strings 16 bytes long of random decimal numbers, with n = 500,000 the execution time is
2.9218750 seconds.
Times with different values of n show that the execution time does grow like n*log(n) as it is supposed to.
This execution time is nicely fast.
For something a little faster, I wrote
Sub ast_heap_sort02( _ ByRef key_array() As String, _ ByVal n As Int32 )
'ascending string heap sort', which is just a sort in the usual sense and not sort 'by surrogate'. On the same array of strings, the time was
2.5312500 seconds.
a little faster. The last two timings begin to suggest that in the second case Visual Basic regarded String as a 'reference type' and moved only 'references' (pointers) and never moved any strings at all and, in particular, made no use of 'garbage collection' memory management. Good. So, both cases were in effect 'by surrogate', and the second case that let Visual Basic handle the pointer logic was a little faster. Good.
Begin Note
If Visual Basic is sufficiently clever, then, even when in a sort routine there is one variable for temporary use for the strings being sorted, say, key_father, a statement such as
key_father = key_array( i )
still will move no string data and, instead, just increase by 1 a 'usage count' on the storage for key_array(i).
In the case key_array(i) is an array of instances of a class, there could be an issue and a performance issue for how to handle key_father.
End Note
In comparison, on the same array of strings,
array.sort( Array, Int32, Int32 )
ran in
3.5312500 seconds,
slower but nicely commendable.
But for my application, I need either to sort an array of strings by surrogate or sort an array of instances of a class that has properties for both the key and the 'record'.
So, I defined a Class, matrix_increment, with a method CompareTo, wrote
Dim key_array( n_keys ) As matrix_increment
and passed it to 'ascending object heap sort' routine
Sub ao_heap_sort02( _ ByRef key_array As Array, _ ByVal n As Int32 )
Now the execution time was
605.2031250 seconds.
WOW!!!
So, I changed the definition of the class to implement the IComparable interface, ran again, and got
629.4062500 seconds,
which is even worse!
Then, with this same class with the IComparable interface I passed the same array of instances of the class matrix_increment to
array.sort( Array, Int32, Int32 )
and got execution time
3.5312500 seconds
which is much more like how the world should be. Non-object oriented, non-polymorphic sorting by surrogate is significantly faster, but array.sort totally blew the doors off my disastrous OO-polymorphic results of
605.2031250 seconds
and
629.4062500 seconds.
The MSDN documentation for this version of array.sort says that the parameters are:
Public Shared Sub Sort ( _ array As Array, _ index As Integer, _ length As Integer _ )
which I tried to imitate in
Sub ao_heap_sort02( _ ByRef key_array As Array, _ ByVal n As Int32 )
So, I write some old-fashioned code, non-object oriented, non-polymorphic, that runs in
2.9218750 seconds
for the by surrogate version or
2.5312500 seconds
without by surrogate and where array.sort runs in
3.5312500 seconds.
Then I define a class and pass an array of instances to a parameter As Array and get
605.2031250 seconds.
without IComparable and
629.4062500 seconds
with IComparable while on the same array of instances with IComparable array.sort runs in
3.5312500 seconds.
So, how can I write fast polymorphic code?
What do the authors of array.sort understand about writing fast polymorphic code that I do not?
| | sigmawaite Monday, April 14, 2008 6:55 AM | Arrays in .Net can store an element at index 0. You decalre then with the upper bound, not the size. Hence:
Dim ints(0) As Integer
would have room for 1 element.
This just means your array is 1 element too big and has nothing at index 0, which isn't a big deal as you don't start at 0.
--------------------
Your error catching is a mix of old-style vb and vb.Net. You should use one or the other.
I'll just get rid of it for now.
--------------------
You don't need to pass the Array ByRef as it is a reference type. Changes to the array are propogated back to calling code.
--------------------
If you switch Option Strict Onand Option Explicit On, then you get a huge list of problems. (Always have these two options set).
There are two problems it detected, repeated many times:
--------------------
First accessing the Array like this:
key_father = key_array(i_father0)
Is not allowed, you must use:
key_father = key_array.GetValue(i_father0)
(this is what I was muttering about in an earlier post)
--------------------
Secondly, the type of the elements in the array is Object, and Object doesn't have a CompareTo method. At runtime then, it ishaving to find the CompareTo method using late binding.
You can't implement the generic Comparable(Of T), as you aren't using generics anywhere else. I'll show that later. So I changed it to the normal Comparable.
Now, to get around this late binding, you need to have a Type that DOES have a CompareTo method, and at the moment that type is either matrix_increment or IComparable. You can't go casting the objects to max_increment, as this method is supposed to be able to cope with different types. So you need to cast to IComparable.
--------------------
To follow:
A Demo of the problem.
A fixed, butugly,version.
Then a neater fixed version, using a separate comparer class and Object() instead of Array.
Then a generic version.
It should get faster with each one. | | jo0ls Friday, April 18, 2008 8:01 PM | It's hard to say without seeing the code for these methods.
Have a look at the source code for the Frameworks Array.Sort method. You could use Reflector, or if you have VS 2008 you can set up up to get the source code from microsoft - though this will be in C# (the code from reflector can be in VB.Net). The C# code for the array class is also available here: http://www.dotnet247.com/247reference/System/Array/__rotor Complicated. It seems to end up in public static void Sort(Array keys, Array items, int index, int length, IComparer comparer)
Where it makes a decision, can the keys and items arrays be cast to Object()? If so it calls one sort method that it says can use "fast jit helpers", otherwise it calls another that is slow. I wonder how that works?!? Edit: Ah, it is probably talking about the difference between szarrays and mdarrays. It's not important as you have an array of Classes. Use Object() instead of Array. (Better still, use generic types instead of Object()). " There's always the choice of using object[] and Array when using covariance to right general array functions. Object[] in many cases are faster because it is clear the array is an SZARRAY and IL has special instructions to call for setting and getting elements. However, Array can always hold arrays of value-types as well as multidimensional arrays."
http://www.codeproject.com/KB/dotnet/arrays.aspx
That's probably not the reason for the slow down, just a minor tweak.
| | jo0ls Tuesday, April 15, 2008 12:26 PM | I had a go at heapsort (using the WikiPedia pseudocode) using an IComparer and using Object(). I get about4 seconds.
Here's the class we will sort
Code Snippet
Option Strict On Option Explicit On
Public Class MyThing
Private MyPath As String Private MyNumber As Integer Private Shared rand As New Random
Public ReadOnly Property Path() As String Get Return MyPath End Get End Property
Public ReadOnly Property Number() As Integer Get Return MyNumber End Get End Property
Sub New() Me.MyPath = IO.Path.GetRandomFileName Me.MyNumber = rand.Next End Sub
Public Overrides Function ToString() As String Return String.Format("Path: {0}, Number: {1}", MyPath, MyNumber.ToString) End Function
End Class
Here's an IComparer for it
Code Snippet
Option Strict On Option Explicit On
Public Class MyThingComparer Implements IComparer
Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer Implements System.Collections.IComparer.Compare ' No checks at all. Dim thingX As MyThing = DirectCast(x, MyThing) Dim thingY As MyThing = DirectCast(y, MyThing) ' CompareOrdinal is a bit faster than compare, but there are caveats Return String.CompareOrdinal(thingX.Path, thingY.Path) End Function
End Class
And here's a form that does HeapSort
Code Snippet
Option Strict On Option Explicit On
Imports System.ComponentModel Imports System.Text
Public Class Form1
Private WithEvents bTest As New Button Private WithEvents bTime As New Button Private tb As New TextBox Private Const ArraySize As Integer = 500000 Private WithEvents bw As New BackgroundWorker
Sub New() InitializeComponent() Dim sw As New Stopwatch Me.Controls.AddRange(New Control() {tb, bTest, bTime}) tb.Multiline = True tb.Dock = DockStyle.Top tb.Height = Me.ClientSize.Height - bTest.Height - 10 tb.ScrollBars = ScrollBars.Vertical bTest.Location = New Point(Me.ClientRectangle.Right - (bTest.Width * 2) - 10, tb.Bottom + 5) bTest.Text = "Small Test" bTime.Location = New Point(bTest.Right + 5, bTest.Top) bTime.Text = "Time Test" End Sub
Private Delegate Sub WriteOutDelegate(ByVal s As String)
' Thread safe write to textbox. Private Sub WriteOut(ByVal s As String) If Me.InvokeRequired Then Dim del As New WriteOutDelegate(AddressOf WriteOut) Me.Invoke(del, New Object() {s}) Else tb.Text &= s & vbCrLf End If End Sub
Private Sub bTest_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles bTest.Click tb.Clear() Dim things(10) As MyThing Dim sb As New StringBuilder() For i As Integer = 0 To things.Length - 1 things(i) = New MyThing sb.AppendLine(things(i).ToString) Next WriteOut("Before: ") WriteOut("--------") WriteOut(sb.ToString) HeapSort(things, things.Length, New MyThingComparer) sb = New StringBuilder For i As Integer = 0 To things.Length - 1 sb.AppendLine(things(i).ToString) Next WriteOut("After: ") WriteOut("--------") WriteOut(sb.ToString) End Sub
Private Sub bTime_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles bTime.Click Me.bTime.Enabled = False tb.Clear() WriteOut("Creating array size: " & ArraySize.ToString) bw.RunWorkerAsync() End Sub
Private Sub bw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bw.DoWork Dim rolex As New Stopwatch Dim things(ArraySize - 1) As MyThing For i As Integer = 0 To things.Length - 1 things(i) = New MyThing Next WriteOut("Sorting...") Application.DoEvents() rolex.Reset() rolex.Start() HeapSort(things, things.Length, New MyThingComparer) rolex.Stop() WriteOut("It took: " & (rolex.ElapsedMilliseconds / 1000).ToString("F3") & "s") End Sub
Private Sub bw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bw.RunWorkerCompleted bTime.Enabled = True End Sub
Sub HeapSort(ByVal a() As Object, ByVal count As Integer, ByVal comparer As IComparer) '(first place a in max-heap order) Heapify(a, count, comparer)
Dim last As Integer = count - 1 While last > 0 ' swap the root(maximum value) of the heap with the ' last element of the heap) Dim temp As Object = a(last) a(last) = a(0) a(0) = temp ' decrease the size of the heap by one so that the ' previous max value will stay in its proper placement last -= 1 ' put the heap back in max-heap order SiftDown(a, 0, last, comparer) End While End Sub
Sub Heapify(ByVal a() As Object, ByVal count As Integer, ByVal comparer As IComparer) '(start is assigned the index in a of the last parent node) Dim start As Integer = (count - 1) \ 2
While start >= 0 ' sift down the node at index start to the proper place ' such that all nodes below the start index are in heap ' order SiftDown(a, start, count - 1, comparer) start -= 1 ' after sifting down the root all nodes/elements are in heap order End While End Sub
Sub SiftDown(ByVal a() As Object, ByVal start As Integer, ByVal last As Integer, ByVal comparer As IComparer) ' last represents the limit of how far down the heap to sift. Dim root As Integer = start ' While the root has at least one child While (root * 2) + 1 <= last ' root*2+1 points to the left child Dim child As Integer = root * 2 + 1 ' If the child has a sibling and the child's value is less ' than its sibling's... ' (Here we use the IComparer which hopefully is up to the task. ' Array.Sort checks the comparer is fit for duty.) If (child < last) AndAlso comparer.Compare(a(child), a(child + 1)) < 0 Then ' ... point to the right child instead child += 1 End If If comparer.Compare(a(root), a(child)) < 0 Then ' out of max-heap order Dim temp As Object = a(root) a(root) = a(child) a(child) = temp ' repeat to continue sifting down the child now root = child Else Return End If End While End Sub
End Class
I'll mess with it later and see how other ways compare. It cries out forgeneric types. | | jo0ls Tuesday, April 15, 2008 6:00 PM |
Changing over to Array instead of Object() and accessing values with GetValue SetValue adds a couple of seconds (I get about 6s):
Code Snippet
Option Strict On Option Explicit On
Imports System.ComponentModel Imports System.Text
Public Class Form1
Private WithEvents bTest As New Button Private WithEvents bTime As New Button Private tb As New TextBox Private Const ArraySize As Integer = 500000 Private WithEvents bw As New BackgroundWorker
Sub New() InitializeComponent() Dim sw As New Stopwatch Me.Controls.AddRange(New Control() {tb, bTest, bTime}) tb.Multiline = True tb.Dock = DockStyle.Top tb.Height = Me.ClientSize.Height - bTest.Height - 10 tb.ScrollBars = ScrollBars.Vertical bTest.Location = New Point(Me.ClientRectangle.Right - (bTest.Width * 2) - 10, tb.Bottom + 5) bTest.Text = "Small Test" bTime.Location = New Point(bTest.Right + 5, bTest.Top) bTime.Text = "Time Test" End Sub
Private Delegate Sub WriteOutDelegate(ByVal s As String)
' Thread safe write to textbox. Private Sub WriteOut(ByVal s As String) If Me.InvokeRequired Then Dim del As New WriteOutDelegate(AddressOf WriteOut) Me.Invoke(del, New Object() {s}) Else tb.Text &= s & vbCrLf End If End Sub
Private Sub bTest_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles bTest.Click tb.Clear() Dim things(10) As MyThing Dim sb As New StringBuilder() For i As Integer = 0 To things.Length - 1 things(i) = New MyThing sb.AppendLine(things(i).ToString) Next WriteOut("Before: ") WriteOut("--------") WriteOut(sb.ToString) HeapSort(things, things.Length, New MyThingComparer) sb = New StringBuilder For i As Integer = 0 To things.Length - 1 sb.AppendLine(things(i).ToString) Next WriteOut("After: ") WriteOut("--------") WriteOut(sb.ToString) End Sub
Private Sub bTime_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles bTime.Click Me.bTime.Enabled = False tb.Clear() WriteOut("Creating array size: " & ArraySize.ToString) bw.RunWorkerAsync() End Sub
Private Sub bw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bw.DoWork Dim rolex As New Stopwatch Dim things(ArraySize - 1) As MyThing For i As Integer = 0 To things.Length - 1 things(i) = New MyThing Next WriteOut("Sorting...") Application.DoEvents() rolex.Reset() rolex.Start() HeapSort(things, things.Length, New MyThingComparer) rolex.Stop() WriteOut("It took: " & (rolex.ElapsedMilliseconds / 1000).ToString("F3") & "s") End Sub
Private Sub bw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bw.RunWorkerCompleted bTime.Enabled = True End Sub
Sub HeapSort(ByVal a As Array, ByVal count As Integer, ByVal comparer As IComparer) '(first place a in max-heap order) Heapify(a, count, comparer)
Dim last As Integer = count - 1 While last > 0 ' swap the root(maximum value) of the heap with the ' last element of the heap) Dim temp As Object = a.GetValue(last) a.SetValue(a.GetValue(0), last) a.SetValue(temp, 0) ' decrease the size of the heap by one so that the ' previous max value will stay in its proper placement last -= 1 ' put the heap back in max-heap order SiftDown(a, 0, last, comparer) End While End Sub
Sub Heapify(ByVal a As Array, ByVal count As Integer, ByVal comparer As IComparer) '(start is assigned the index in a of the last parent node) Dim start As Integer = (count - 1) \ 2
While start >= 0 ' sift down the node at index start to the proper place ' such that all nodes below the start index are in heap ' order SiftDown(a, start, count - 1, comparer) start -= 1 ' after sifting down the root all nodes/elements are in heap order End While End Sub
Sub SiftDown(ByVal a As Array, ByVal start As Integer, ByVal last As Integer, ByVal comparer As IComparer) ' last represents the limit of how far down the heap to sift. Dim root As Integer = start ' While the root has at least one child While (root * 2) + 1 <= last ' root*2+1 points to the left child Dim child As Integer = root * 2 + 1 ' If the child has a sibling and the child's value is less ' than its sibling's... If (child < last) AndAlso comparer.Compare(a.GetValue(child), a.GetValue(child + 1)) < 0 Then ' ... point to the right child instead child += 1 End If If comparer.Compare(a.GetValue(root), a.GetValue(child)) < 0 Then ' out of max-heap order Dim temp As Object = a.GetValue(root) a.SetValue(a.GetValue(child), root) a.SetValue(temp, child) ' repeat to continue sifting down the child now root = child Else Return End If End While End Sub
End Class
| | jo0ls Tuesday, April 15, 2008 6:13 PM | Using Generics is consistantly faster than using Object(), I've not tested extensively.
We need an IComparer(Of T) instead of a straight comparer.
Code Snippet
Option Strict On Option Explicit On
Public Class MyThingComparer Implements IComparer(Of MyThing)
Public Function Compare(ByVal x As MyThing, ByVal y As MyThing) As Integer Implements System.Collections.Generic.IComparer(Of MyThing).Compare ' CompareOrdinal is a bit faster than compare, but there are caveats Return String.CompareOrdinal(x.Path, y.Path) End Function
End Class
And change all the methods to use generics.
Code Snippet
Option Strict On Option Explicit On
Imports System.ComponentModel Imports System.Text
Public Class Form1
Private WithEvents bTest As New Button Private WithEvents bTime As New Button Private tb As New TextBox Private Const ArraySize As Integer = 500000 Private WithEvents bw As New BackgroundWorker
Sub New() InitializeComponent() Dim sw As New Stopwatch Me.Controls.AddRange(New Control() {tb, bTest, bTime}) tb.Multiline = True tb.Dock = DockStyle.Top tb.Height = Me.ClientSize.Height - bTest.Height - 10 tb.ScrollBars = ScrollBars.Vertical bTest.Location = New Point(Me.ClientRectangle.Right - (bTest.Width * 2) - 10, tb.Bottom + 5) bTest.Text = "Small Test" bTime.Location = New Point(bTest.Right + 5, bTest.Top) bTime.Text = "Time Test" End Sub
Private Delegate Sub WriteOutDelegate(ByVal s As String)
' Thread safe write to textbox. Private Sub WriteOut(ByVal s As String) If Me.InvokeRequired Then Dim del As New WriteOutDelegate(AddressOf WriteOut) Me.Invoke(del, New Object() {s}) Else tb.Text &= s & vbCrLf End If End Sub
Private Sub bTest_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles bTest.Click tb.Clear() Dim things(10) As MyThing Dim sb As New StringBuilder() For i As Integer = 0 To things.Length - 1 things(i) = New MyThing sb.AppendLine(things(i).ToString) Next WriteOut("Before: ") WriteOut("--------") WriteOut(sb.ToString) HeapSort(things, things.Length, New MyThingComparer) sb = New StringBuilder For i As Integer = 0 To things.Length - 1 sb.AppendLine(things(i).ToString) Next WriteOut("After: ") WriteOut("--------") WriteOut(sb.ToString) End Sub
Private Sub bTime_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles bTime.Click Me.bTime.Enabled = False tb.Clear() WriteOut("Creating array size: " & ArraySize.ToString) bw.RunWorkerAsync() End Sub
Private Sub bw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bw.DoWork Dim rolex As New Stopwatch Dim things(ArraySize - 1) As MyThing For i As Integer = 0 To things.Length - 1 things(i) = New MyThing Next WriteOut("Sorting...") Application.DoEvents() rolex.Reset() rolex.Start() HeapSort(Of MyThing)(things, things.Length, New MyThingComparer) rolex.Stop() WriteOut("It took: " & (rolex.ElapsedMilliseconds / 1000).ToString("F3") & "s") End Sub
Private Sub bw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bw.RunWorkerCompleted bTime.Enabled = True End Sub
Sub HeapSort(Of T)(ByVal a() As T, ByVal count As Integer, ByVal comparer As IComparer(Of T)) '(first place a in max-heap order) Heapify(Of T)(a, count, comparer)
Dim last As Integer = count - 1 While last > 0 ' swap the root(maximum value) of the heap with the ' last element of the heap) Dim temp As T = a(last) a(last) = a(0) a(0) = temp ' decrease the size of the heap by one so that the ' previous max value will stay in its proper placement last -= 1 ' put the heap back in max-heap order SiftDown(a, 0, last, comparer) End While End Sub
Sub Heapify(Of T)(ByVal a() As T, ByVal count As Integer, ByVal comparer As IComparer(Of T)) '(start is assigned the index in a of the last parent node) Dim start As Integer = (count - 1) \ 2
While start >= 0 ' sift down the node at index start to the proper place ' such that all nodes below the start index are in heap ' order SiftDown(a, start, count - 1, comparer) start -= 1 ' after sifting down the root all nodes/elements are in heap order End While End Sub
Sub SiftDown(Of T)(ByVal a() As T, ByVal start As Integer, ByVal last As Integer, ByVal comparer As IComparer(Of T)) ' last represents the limit of how far down the heap to sift. Dim root As Integer = start ' While the root has at least one child While (root * 2) + 1 <= last ' root*2+1 points to the left child Dim child As Integer = root * 2 + 1 ' If the child has a sibling and the child's value is less ' than its sibling's... If (child < last) AndAlso comparer.Compare(a(child), a(child + 1)) < 0 Then ' ... point to the right child instead child += 1 End If If comparer.Compare(a(root), a(child)) < 0 Then ' out of max-heap order Dim temp As T = a(root) a(root) = a(child) a(child) = temp ' repeat to continue sifting down the child now root = child Else Return End If End While End Sub
End Class
| | jo0ls Tuesday, April 15, 2008 6:23 PM | Below are some details on how got absurdly long execution time of
629.4062500 seconds
sorting 500,000 instances of a simple class that implements the interface IComparable.
The materials below are in three parts:
Calling Ascending Object Sort Routine Class Whose Instances Are to Be Sorted Ascending Object Heap Sort
The question is, why is the sorting so slow?
<---------------------------------->
Calling Ascending Object Sort Routine
Here we have
Dim n_keys As Int32
Dim key_array( n_keys ) As matrix_increment
We have
n_keys = 500000
Routine
Private Sub get_times ( _ ByRef total_seconds As Double, _ ByRef increment_seconds As Double )
uses
current_clock.Ticks
to read the clock and help in finding execution time.
To fill array key_array() we run
For i = 1 To n_keys s1 = random_variable.NextDouble() key_array( i ) = New matrix_increment key_array( i ).key = String.Format("{0,16:f8}", s1 ) Next i
For the sort, we run
Console.WriteLine(routine_name & _ ": Calling ast_heap_sort02 ... " )
get_times( test_start_time, time0 )
Console.WriteLine(routine_name & _ ": test_start_time = {0, 15:f7} seconds.", test_start_time )
ao_heap_sort02( key_array, n_keys )
get_times( test_end_time, time0 )
Execution time was
n_keys time (seconds) --------- -------------- 500,000 629.4062500
What's wrong?
<---------------------------------->
Class Whose Instances Are to Be Sorted
Class matrix_increment Implements IComparable(Of matrix_increment)
Public key As String Public inner_product As Double Public index As Int32
Public Overloads Function CompareTo( _ ByVal other As matrix_increment ) As Integer _ Implements IComparable( _ Of matrix_increment).CompareTo
Return key.CompareTo( other.key )
End Function
End Class
<---------------------------------->
Ascending Object Heap Sort
' AO_HEAP_SORT02.VB -- ' ' Ascending object heap sort. ' ' For i = 1, 2, ..., n, sort components of key_array(i) ' into ascending order using heap sort. ' ' We check parameters for reasonable values and, in case of ' a error, raise the error condition. ' ' Upon return, for i = 1, 2, ..., n - 1, ' ' key_array( i ) <= key_array( i + 1 ) ' ' The array subscript logic here is done carefully enough ' that it shuld never overflow even for the largest ' possible value of n. ' ' For some constant k (depending on the computer, compiler, ' etc. but not depending on the parameters) the execution ' time is guaranteed to be closely just ' ' k*n*log(n) ' ' in the worst case. ' ' The code here needs a 'sift' loop in two places; the code ' is the same in both places. ' ' The work here is 'in place', that is, uses no storage ' whose size depends on the passed parameters.
Sub ao_heap_sort02( _ ByRef key_array As Array, _ ByVal n As Int32 )
Dim routine_name As String = "ao_heap_sort02"
Dim error_code As Int32 = 0
Dim i_2 As Int32 = 2
Dim i_first, i_middle, i_last As Int32
Dim i_father0 As Int32
Dim i_father, i_child1, i_child2, i_max_child As Int32
' We need one working location of data type the same as the ' components of parameter key_array():
Dim key_father As Object
Try
' Get started:
error_code = 1001
If n <= 0 Then err.raise(error_code, routine_name)
error_code = 1002
If key_array.GetUpperBound(0) < 0 Then err.raise(error_code, routine_name) End If
error_code = 1003
i_first = 1
i_last = n
i_middle = i_last \ i_2
i_father0 = i_middle + 1
' Build heap
Do i_father0 = i_father0 - 1 If i_father0 < i_first Then Exit Do key_father = key_array( i_father0 ) i_father = i_father0
' Sift
Do If i_father < i_middle Then i_child1 = i_father + i_father i_child2 = i_child1 + 1 If key_array( i_child1 ).CompareTo( key_array( i_child2 ) ) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If If key_array( i_max_child ).CompareTo( key_father ) > 0 Then key_array( i_father ) = key_array( i_max_child ) i_father = i_max_child Continue Do Else key_array( i_father ) = key_father Exit Do End If Else ' i_father >= i_middle If i_father > i_middle Then key_array( i_father ) = key_father Exit Do End If i_child1 = i_father + i_father ' i_father = i_middle If i_child1 < i_last Then i_child2 = i_child1 + 1 If key_array( i_child1 ).CompareTo( key_array( i_child2 ) ) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If Else ' i_child1 = i_last i_max_child = i_child1 End If If key_array( i_max_child ).CompareTo( key_father ) > 0 Then key_array( i_father ) = key_array( i_max_child ) key_array ( i_max_child ) = key_father Else key_array ( i_father ) = key_father End If Exit Do
End If
Loop ' End of sift loop
Loop ' End of build heap loop
' Sort heap
i_first = 1
i_last = n
Do
key_father = key_array( i_last )
key_array( i_last ) = key_array( i_first )
key_array( i_first ) = key_father
i_father = i_first
i_last = i_last - 1
If i_last <= i_first Then Exit Do
i_middle = i_last \ 2
' Sift
Do If i_father < i_middle Then i_child1 = i_father + i_father i_child2 = i_child1 + 1 If key_array( i_child1 ).CompareTo( key_array( i_child2 ) ) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If If key_array( i_max_child ).CompareTo( key_father ) > 0 Then key_array( i_father ) = key_array( i_max_child ) i_father = i_max_child Continue Do Else key_array( i_father ) = key_father Exit Do End If Else ' i_father >= i_middle If i_father > i_middle Then key_array( i_father ) = key_father Exit Do End If i_child1 = i_father + i_father ' i_father = i_middle If i_child1 < i_last Then i_child2 = i_child1 + 1 If key_array( i_child1 ).CompareTo( key_array( i_child2 ) ) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If Else ' i_child1 = i_last i_max_child = i_child1 End If If key_array( i_max_child ).CompareTo( key_father ) > 0 Then key_array( i_father ) = key_array( i_max_child ) key_array ( i_max_child ) = key_father Else key_array ( i_father ) = key_father End If Exit Do
End If
Loop ' End of sift loop
Loop ' End of sort heap loop
Catch
Console.WriteLine( " err.number = " & err.number ) Console.WriteLine( " err.source = " & err.source )
err.raise(error_code, routine_name)
End Try
out:
Return
End Sub
| | sigmawaite Friday, April 18, 2008 2:20 AM | Arrays in .Net can store an element at index 0. You decalre then with the upper bound, not the size. Hence:
Dim ints(0) As Integer
would have room for 1 element.
This just means your array is 1 element too big and has nothing at index 0, which isn't a big deal as you don't start at 0.
--------------------
Your error catching is a mix of old-style vb and vb.Net. You should use one or the other.
I'll just get rid of it for now.
--------------------
You don't need to pass the Array ByRef as it is a reference type. Changes to the array are propogated back to calling code.
--------------------
If you switch Option Strict Onand Option Explicit On, then you get a huge list of problems. (Always have these two options set).
There are two problems it detected, repeated many times:
--------------------
First accessing the Array like this:
key_father = key_array(i_father0)
Is not allowed, you must use:
key_father = key_array.GetValue(i_father0)
(this is what I was muttering about in an earlier post)
--------------------
Secondly, the type of the elements in the array is Object, and Object doesn't have a CompareTo method. At runtime then, it ishaving to find the CompareTo method using late binding.
You can't implement the generic Comparable(Of T), as you aren't using generics anywhere else. I'll show that later. So I changed it to the normal Comparable.
Now, to get around this late binding, you need to have a Type that DOES have a CompareTo method, and at the moment that type is either matrix_increment or IComparable. You can't go casting the objects to max_increment, as this method is supposed to be able to cope with different types. So you need to cast to IComparable.
--------------------
To follow:
A Demo of the problem.
A fixed, butugly,version.
Then a neater fixed version, using a separate comparer class and Object() instead of Array.
Then a generic version.
It should get faster with each one. | | jo0ls Friday, April 18, 2008 8:01 PM | Here's a demo of the problem.
Generating data Created data in: 1.1407463s Starting to build heap heap built in 18.1453599s sorted in: 306.5314217s total: 324.702742s
Code Snippet
Imports System.ComponentModel
Module Module1
Private rand As New Random Private running As Boolean
Sub Main() Go() Go() Console.ReadKey() End Sub
Sub Go() Dim sw As New Stopwatch sw.Start() Console.WriteLine("Generating data") Dim n_keys As Int32 = 500000 Dim key_array(n_keys) As MatrixIncrement For i = 1 To n_keys Dim s1 As Double = rand.NextDouble() key_array(i) = New MatrixIncrement key_array(i).key = String.Format("{0,16:f8}", s1) Next i sw.Stop() Console.WriteLine("Created data in: {0}s", sw.Elapsed.TotalSeconds.ToString) sw.Reset() sw.Start() ao_heap_sort02(key_array, n_keys) sw.Stop() Console.WriteLine("total: {0}s", sw.Elapsed.TotalSeconds.ToString) End Sub
Sub ao_heap_sort02(ByVal key_array As Array, ByVal n As Int32) Dim sw As New Stopwatch sw.Start() Dim routine_name As String = "ao_heap_sort02" Dim i_2 As Int32 = 2 Dim i_first, i_middle, i_last As Int32 Dim i_father0 As Int32 Dim i_father, i_child1, i_child2, i_max_child As Int32 'We need one working location of data type the same as the 'components of parameter key_array(): Dim key_father As Object
If n <= 0 Then Throw New ArgumentOutOfRangeException("n") If key_array.GetUpperBound(0) < 0 Then Throw New ArgumentException("key_array") i_first = 1 i_last = n i_middle = i_last \ i_2 i_father0 = i_middle + 1 'Build heap Console.WriteLine("Starting to build heap") Do i_father0 = i_father0 - 1 If i_father0 < i_first Then Exit Do key_father = key_array(i_father0) i_father = i_father0 'Sift Do If i_father < i_middle Then i_child1 = i_father + i_father i_child2 = i_child1 + 1 If key_array(i_child1).CompareTo(key_array(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If If key_array(i_max_child).CompareTo(key_father) > 0 Then key_array(i_father) = key_array(i_max_child) i_father = i_max_child Continue Do Else key_array(i_father) = key_father Exit Do End If Else ' i_father >= i_middle If i_father > i_middle Then key_array(i_father) = key_father Exit Do End If i_child1 = i_father + i_father' i_father = i_middle If i_child1 < i_last Then i_child2 = i_child1 + 1 If key_array(i_child1).CompareTo(key_array(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If Else' i_child1 = i_last i_max_child = i_child1 End If If key_array(i_max_child).CompareTo(key_father) > 0 Then key_array(i_father) = key_array(i_max_child) key_array(i_max_child) = key_father Else key_array(i_father) = key_father End If Exit Do
End If Loop' End of sift loop Loop' End of build heap loop 'Sort heap Console.WriteLine("heap built in {0}s", sw.Elapsed.TotalSeconds.ToString) sw.Reset() i_first = 1 i_last = n sw.Start() Do key_father = key_array(i_last) key_array(i_last) = key_array(i_first) key_array(i_first) = key_father i_father = i_first i_last = i_last - 1 If i_last <= i_first Then Exit Do i_middle = i_last \ 2 'Sift Do If i_father < i_middle Then i_child1 = i_father + i_father i_child2 = i_child1 + 1 If key_array(i_child1).CompareTo(key_array(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If If key_array(i_max_child).CompareTo(key_father) > 0 Then key_array(i_father) = key_array(i_max_child) i_father = i_max_child Continue Do Else key_array(i_father) = key_father Exit Do End If Else ' i_father >= i_middle If i_father > i_middle Then key_array(i_father) = key_father Exit Do End If i_child1 = i_father + i_father' i_father = i_middle If i_child1 < i_last Then i_child2 = i_child1 + 1 If key_array(i_child1).CompareTo(key_array(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If Else' i_child1 = i_last i_max_child = i_child1 End If If key_array(i_max_child).CompareTo(key_father) > 0 Then key_array(i_father) = key_array(i_max_child) key_array(i_max_child) = key_father Else key_array(i_father) = key_father End If Exit Do End If Loop' End of sift loop Loop' End of sort heap loop Console.WriteLine("sorted in: {0}s", sw.Elapsed.TotalSeconds.ToString) sw.Stop() End Sub
End Module
Public Class MatrixIncrement Implements IComparable(Of MatrixIncrement)
Public key As String Public inner_product As Double Public index As Int32
Public Overloads Function CompareTo( _ ByVal other As MatrixIncrement) As Integer _ Implements IComparable( _ Of MatrixIncrement).CompareTo
Return key.CompareTo(other.key)
End Function
End Class
| | jo0ls Friday, April 18, 2008 8:31 PM | Here I switched on Option Strict and Option Explicit and fixed up the warnings.
This meant CompareTo couldn't be called directly on the items from the array, so they get cast to IComparable first.
And, GetValue and SetValue had to be used.
This eliminated LateBinding, and sped it up dramatically.
Generating data Created data in: 1.1811032s Starting to build heap heap built in 0.3861724s sorted in: 5.1919813s total: 5.5784823s
Code Snippet
Option Strict On Option Explicit On
Imports System.ComponentModel
Module Module1
Private rand As New Random Private running As Boolean
' Normal naming conventions mean Classes and ' methods should start with Capitals. Sub Main() Go() Go() ' use this time. Console.ReadKey() End Sub
Sub Go() Dim sw As New Stopwatch sw.Start() Console.WriteLine("Generating data") Dim n_keys As Int32 = 500000 ' .Net arrays have an element at 0 and ' you declare then with the upper bound Dim key_array(n_keys - 1) As MatrixIncrement For i = 0 To n_keys - 1 Dim s1 As Double = rand.NextDouble() key_array(i) = New MatrixIncrement key_array(i).key = String.Format("{0,16:f8}", s1) Next i sw.Stop() Console.WriteLine("Created data in: {0}s", sw.Elapsed.TotalSeconds.ToString) sw.Reset() sw.Start() ao_heap_sort02(key_array, n_keys) sw.Stop() Console.WriteLine("total: {0}s", sw.Elapsed.TotalSeconds.ToString) End Sub
Sub ao_heap_sort02(ByVal key_array As Array, ByVal n As Int32) Dim sw As New Stopwatch sw.Start() Dim routine_name As String = "ao_heap_sort02" Dim i_2 As Int32 = 2 Dim i_first, i_middle, i_last As Int32 Dim i_father0 As Int32 Dim i_father, i_child1, i_child2, i_max_child As Int32 'We need one working location of data type the same as the 'components of parameter key_array(): Dim key_father As Object
If n <= 0 Then Throw New ArgumentOutOfRangeException("n") If key_array.GetUpperBound(0) < 0 Then Throw New ArgumentException("key_array") ' Array now starts at 0 i_first = 0 ' upper bound is n - 1 as array now starts at 0. i_last = n - 1 i_middle = i_last \ i_2 i_father0 = i_middle + 1 'Build heap Console.WriteLine("Starting to build heap") Do i_father0 = i_father0 - 1 If i_father0 < i_first Then Exit Do key_father = key_array.GetValue(i_father0) i_father = i_father0 'Sift Do ' The class implements IComparable and so it can be cast to ' IComparable, which DOES have a compareTo method. Dim comparable1 As IComparable = DirectCast(key_array.GetValue(i_child1), IComparable) If i_father < i_middle Then i_child1 = i_father + i_father i_child2 = i_child1 + 1 If comparable1.CompareTo(key_array.GetValue(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If Dim comparableMax As IComparable = DirectCast(key_array.GetValue(i_max_child), IComparable) If comparableMax.CompareTo(key_father) > 0 Then key_array.SetValue(key_array.GetValue(i_max_child), i_father) i_father = i_max_child Continue Do Else key_array.SetValue(key_father, i_father) Exit Do End If Else ' i_father >= i_middle If i_father > i_middle Then key_array.SetValue(key_father, i_father) Exit Do End If i_child1 = i_father + i_father' i_father = i_middle comparable1 = DirectCast(key_array.GetValue(i_child1), IComparable) If i_child1 < i_last Then i_child2 = i_child1 + 1 If comparable1.CompareTo(key_array.GetValue(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If Else' i_child1 = i_last i_max_child = i_child1 End If Dim comparableMax As IComparable = DirectCast(key_array.GetValue(i_max_child), IComparable) If comparableMax.CompareTo(key_father) > 0 Then key_array.SetValue(key_array.GetValue(i_max_child), i_father) key_array.SetValue(key_father, i_max_child) Else key_array.SetValue(key_father, i_father) End If Exit Do
End If Loop' End of sift loop Loop' End of build heap loop 'Sort heap Console.WriteLine("heap built in {0}s", sw.Elapsed.TotalSeconds.ToString) sw.Reset() i_first = 0 i_last = n - 1 sw.Start() Do key_father = key_array.GetValue(i_last) key_array.SetValue(key_array.GetValue(i_first), i_last) key_array.SetValue(key_father, i_first) i_father = i_first i_last = i_last - 1 If i_last <= i_first Then Exit Do i_middle = i_last \ 2 'Sift Do If i_father < i_middle Then i_child1 = i_father + i_father i_child2 = i_child1 + 1 Dim comparable1 As IComparable = DirectCast(key_array.GetValue(i_child1), IComparable) If comparable1.CompareTo(key_array.GetValue(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If Dim comparableMax As IComparable = DirectCast(key_array.GetValue(i_max_child), IComparable) If comparableMax.CompareTo(key_father) > 0 Then key_array.SetValue(key_array.GetValue(i_max_child), i_father) i_father = i_max_child Continue Do Else key_array.SetValue(key_father, i_father) Exit Do End If Else ' i_father >= i_middle If i_father > i_middle Then key_array.SetValue(key_father, i_father) Exit Do End If i_child1 = i_father + i_father' i_father = i_middle
If i_child1 < i_last Then i_child2 = i_child1 + 1 Dim comparable1 As IComparable = DirectCast(key_array.GetValue(i_child1), IComparable) If comparable1.CompareTo(key_array.GetValue(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If Else' i_child1 = i_last i_max_child = i_child1 End If Dim comparableMax As IComparable = DirectCast(key_array.GetValue(i_max_child), IComparable) If comparableMax.CompareTo(key_father) > 0 Then key_array.SetValue(key_array.GetValue(i_max_child), i_father) key_array.SetValue(key_father, i_max_child) Else key_array.SetValue(key_father, i_father) End If Exit Do End If Loop' End of sift loop Loop' End of sort heap loop Console.WriteLine("sorted in: {0}s", sw.Elapsed.TotalSeconds.ToString) sw.Stop() End Sub
End Module
Public Class MatrixIncrement ' We don't want it to be generic, unless everything is generic. Implements IComparable
Public key As String Public inner_product As Double Public index As Int32
Public Function CompareTo(ByVal obj As Object) As Integer Implements System.IComparable.CompareTo ' Cast from Object to MatrixIncrement. ' Will bomb if it is passed something else. Dim other As MatrixIncrement = DirectCast(obj, MatrixIncrement) Return key.CompareTo(other.key) End Function End Class
Edit: Highlighted some of the changes.
| | jo0ls Friday, April 18, 2008 8:36 PM | Here I swapped to Object() instead of Array.
And I swapped to using a Comparer class instead of Comparable. This way we don't need to keep casting to IComparable.
I added a bit to check it actually was sorting!
Generating data Created data in: 1.3212753s Starting to build heap heap built in 0.3126916s sorted in: 4.1875777s total: 4.5011018s
The speed increase comes from using Object() instead of Array.
Code Snippet
Option Strict On Option Explicit On
Imports System.ComponentModel
Module Module1
Private rand As New Random
' Normal naming conventions mean Classes and ' methods should start with Capitals. Sub Main() Go() Go() ' use this time. Console.ReadKey() End Sub
Sub Go() Dim sw As New Stopwatch sw.Start() Console.WriteLine("Generating data") Dim n_keys As Int32 = 500000 ' .Net arrays have an element at 0 and ' you declare then with the upper bound Dim key_array(n_keys - 1) As MatrixIncrement For i = 0 To n_keys - 1 Dim s1 As Double = rand.NextDouble() key_array(i) = New MatrixIncrement key_array(i).key = String.Format("{0,16:f8}", s1) Next i sw.Stop() Console.WriteLine("Created data in: {0}s", sw.Elapsed.TotalSeconds.ToString) sw.Reset() sw.Start() Dim comparer As New MatrixImcrementComparer ao_heap_sort02(key_array, n_keys, comparer) sw.Stop() Console.WriteLine("total: {0}s", sw.Elapsed.TotalSeconds.ToString) For i As Integer = 0 To 9 Console.WriteLine(key_array(i).ToString) Next For i As Integer = key_array.Length - 11 To key_array.Length - 1 Console.WriteLine(key_array(i).ToString) Next End Sub
' The framework's Array.Sort methods accept an IComparer Sub ao_heap_sort02(ByVal key_array() As Object, ByVal n As Int32, ByVal comparer As IComparer) Dim sw As New Stopwatch sw.Start() Dim i_2 As Int32 = 2 Dim i_first, i_middle, i_last As Int32 Dim i_father0 As Int32 Dim i_father, i_child1, i_child2, i_max_child As Int32 'We need one working location of data type the same as the 'components of parameter key_array(): Dim key_father As Object If n <= 0 Then Throw New ArgumentOutOfRangeException("n") If key_array.GetUpperBound(0) < 0 Then Throw New ArgumentException("key_array") i_first = 0 i_last = n - 1 i_middle = i_last \ i_2 i_father0 = i_middle + 1 'Build heap Console.WriteLine("Starting to build heap") Do i_father0 = i_father0 - 1 If i_father0 < i_first Then Exit Do key_father = key_array(i_father0) i_father = i_father0 'Sift Do If i_father < i_middle Then i_child1 = i_father + i_father i_child2 = i_child1 + 1 If comparer.Compare(key_array(i_child1), key_array(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If If comparer.Compare(key_array(i_max_child), key_father) > 0 Then key_array(i_father) = key_array(i_max_child) i_father = i_max_child Continue Do Else key_array(i_father) = key_father Exit Do End If Else ' i_father >= i_middle If i_father > i_middle Then key_array(i_father) = key_father Exit Do End If i_child1 = i_father + i_father' i_father = i_middle If i_child1 < i_last Then i_child2 = i_child1 + 1 If comparer.Compare(key_array(i_child1), key_array(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If Else' i_child1 = i_last i_max_child = i_child1 End If If comparer.Compare(key_array(i_max_child), key_father) > 0 Then key_array(i_father) = key_array(i_max_child) key_array(i_max_child) = key_father Else key_array(i_father) = key_father End If Exit Do End If Loop' End of sift loop Loop' End of build heap loop 'Sort heap Console.WriteLine("heap built in {0}s", sw.Elapsed.TotalSeconds.ToString) sw.Reset() i_first = 0 i_last = n - 1 sw.Start() Do key_father = key_array(i_last) key_array(i_last) = key_array(i_first) key_array(i_first) = key_father i_father = i_first i_last = i_last - 1 If i_last <= i_first Then Exit Do i_middle = i_last \ 2 'Sift Do If i_father < i_middle Then i_child1 = i_father + i_father i_child2 = i_child1 + 1 If comparer.Compare(key_array(i_child1), key_array(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If If comparer.Compare(key_array(i_max_child), key_father) > 0 Then key_array(i_father) = key_array(i_max_child) i_father = i_max_child Continue Do Else key_array(i_father) = key_father Exit Do End If Else ' i_father >= i_middle If i_father > i_middle Then key_array(i_father) = key_father Exit Do End If i_child1 = i_father + i_father' i_father = i_middle If i_child1 < i_last Then i_child2 = i_child1 + 1 If comparer.Compare(key_array(i_child1), key_array(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If Else' i_child1 = i_last i_max_child = i_child1 End If If comparer.Compare(key_array(i_max_child), key_father) > 0 Then key_array(i_father) = key_array(i_max_child) key_array(i_max_child) = key_father Else key_array(i_father) = key_father End If Exit Do End If Loop' End of sift loop Loop' End of sort heap loop Console.WriteLine("sorted in: {0}s", sw.Elapsed.TotalSeconds.ToString) sw.Stop() End Sub
End Module
Public Class MatrixIncrement
Public key As String Public inner_product As Double Public index As Int32
Public Overrides Function ToString() As String ' we aren't interested in the others. Return key End Function
End Class
Public Class MatrixImcrementComparer Implements IComparer
Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer Implements System.Collections.IComparer.Compare Dim matX As MatrixIncrement = DirectCast(x, MatrixIncrement) Dim matY As MatrixIncrement = DirectCast(y, MatrixIncrement) Return String.Compare(matX.key, matY.key) End Function
End Class
| | jo0ls Friday, April 18, 2008 8:54 PM | And here's a complicated one with Generics, it's a bit slower than the one above.
Generating data Created data in: 1.2289694s Starting to build heap heap built in 0.3783466s sorted in: 4.7357952s total: 5.1150947s
First we say the method is (Of T) to mean it is a generic method.
Now, we can swap back to using IComparable(Of T) as we are using generics.
If the method was just (Of T) then we still wouldn't be able to use CompareTo, as that is not a method of any Type, and T can stand for any Type. We need to make it a method (Of T but T must implement IComparable Of T). You can do that with a generic constraint. The syntax is:
Public Sub MySub(Of T As IComparable(Of T))
That hasn't got the parameters for the function yet, it is just stating that this is a generic Sub.
We can add more polymorphism to the sub, if we make keys_array of type IList(Of T). This means it can accept an array, a List(Of T), and anything other class that implements IList(Of T). So Yuo end up with a method sig like this:
Sub ao_heap_sort02(Of T As IComparable(Of T))(ByVal keys As IList(Of T), ByVal n As Int32)
So here's the Generic version.
Code Snippet
Option Strict On Option Explicit On
Imports System.ComponentModel
Module Module1
Private rand As New Random
' Normal naming conventions mean Classes and ' methods should start with Capitals. Sub Main() Go() Go() ' use this time. Console.ReadKey() End Sub
Sub Go() Dim sw As New Stopwatch sw.Start() Console.WriteLine("Generating data") Dim n_keys As Int32 = 500000 ' .Net arrays have an element at 0 and ' you declare then with the upper bound Dim key_array(n_keys - 1) As MatrixIncrement For i = 0 To n_keys - 1 Dim s1 As Double = rand.NextDouble() key_array(i) = New MatrixIncrement key_array(i).key = String.Format("{0,16:f8}", s1) Next i sw.Stop() Console.WriteLine("Created data in: {0}s", sw.Elapsed.TotalSeconds.ToString) sw.Reset() sw.Start() ao_heap_sort02(key_array, n_keys) sw.Stop() Console.WriteLine("total: {0}s", sw.Elapsed.TotalSeconds.ToString) For i As Integer = 0 To 9 Console.WriteLine(key_array(i).ToString) Next For i As Integer = key_array.Length - 11 To key_array.Length - 1 Console.WriteLine(key_array(i).ToString) Next End Sub
Sub ao_heap_sort02(Of T As IComparable(Of T))(ByVal keys As IList(Of T), ByVal n As Int32) Dim sw As New Stopwatch sw.Start() Dim i_first, i_middle, i_last As Int32 Dim i_father0 As Int32 Dim i_father, i_child1, i_child2, i_max_child As Int32 'We need one working location of data type the same as the 'components of parameter key_array(): Dim key_father As T i_first = 0 i_last = n - 1 i_middle = i_last \ 2 i_father0 = i_middle + 1 'Build heap Console.WriteLine("Starting to build heap") Do i_father0 = i_father0 - 1 If i_father0 < i_first Then Exit Do key_father = keys(i_father0) i_father = i_father0 'Sift Do If i_father < i_middle Then i_child1 = i_father + i_father i_child2 = i_child1 + 1 If keys(i_child1).CompareTo(keys(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If If keys(i_max_child).CompareTo(key_father) > 0 Then keys(i_father) = keys(i_max_child) i_father = i_max_child Continue Do Else keys(i_father) = key_father Exit Do End If Else ' i_father >= i_middle If i_father > i_middle Then keys(i_father) = key_father Exit Do End If i_child1 = i_father + i_father' i_father = i_middle If i_child1 < i_last Then i_child2 = i_child1 + 1 If keys(i_child1).CompareTo(keys(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If Else' i_child1 = i_last i_max_child = i_child1 End If If keys(i_max_child).CompareTo(key_father) > 0 Then keys(i_father) = keys(i_max_child) keys(i_max_child) = key_father Else keys(i_father) = key_father End If Exit Do End If Loop' End of sift loop Loop' End of build heap loop 'Sort heap Console.WriteLine("heap built in {0}s", sw.Elapsed.TotalSeconds.ToString) sw.Reset() i_first = 0 i_last = n - 1 sw.Start() Do key_father = keys(i_last) keys(i_last) = keys(i_first) keys(i_first) = key_father i_father = i_first i_last = i_last - 1 If i_last <= i_first Then Exit Do i_middle = i_last \ 2 'Sift Do If i_father < i_middle Then i_child1 = i_father + i_father i_child2 = i_child1 + 1 If keys(i_child1).CompareTo(keys(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If If keys(i_max_child).CompareTo(key_father) > 0 Then keys(i_father) = keys(i_max_child) i_father = i_max_child Continue Do Else keys(i_father) = key_father Exit Do End If Else ' i_father >= i_middle If i_father > i_middle Then keys(i_father) = key_father Exit Do End If i_child1 = i_father + i_father' i_father = i_middle If i_child1 < i_last Then i_child2 = i_child1 + 1 If keys(i_child1).CompareTo(keys(i_child2)) > 0 Then i_max_child = i_child1 Else i_max_child = i_child2 End If Else' i_child1 = i_last i_max_child = i_child1 End If If keys(i_max_child).CompareTo(key_father) > 0 Then keys(i_father) = keys(i_max_child) keys(i_max_child) = key_father Else keys(i_father) = key_father End If Exit Do End If Loop' End of sift loop Loop' End of sort heap loop Console.WriteLine("sorted in: {0}s", sw.Elapsed.TotalSeconds.ToString) sw.Stop() End Sub
End Module
Public Class MatrixIncrement Implements IComparable(Of MatrixIncrement)
Public key As String Public inner_product As Double Public index As Int32
Public Overrides Function ToString() As String ' we aren't interested in the others. Return key End Function
Public Function CompareTo(ByVal other As MatrixIncrement) As Integer Implements System.IComparable(Of MatrixIncrement).CompareTo Return key.CompareTo(other.key) End Function End Class
It is possibly faster as:
Sub ao_heap_sort02(Of T As IComparable(Of T))(ByVal keys() As T, ByVal n As Int32)
| | jo0ls Friday, April 18, 2008 10:40 PM | I went through again and tested on a different (slower) computer. I removed the Stopwatch and Console.WriteLine stuff from the sorting sub as it takes time. I started the stopwatch before calling the sorting sub, and stopped it on return. I used the same random seed in each example, so they all sorted the same items. I ran the release build 20 times for each example (except the first, which would take too long), and picked the fastest.
Original problem code: 352.6s First Fix: 6.72s Object() and an IComparer: 5.47s Generics, IComparable(Of T) constraint, keys As IList(Of T): 5.36s Generics, IComparable(Of T) constraint, keys() As T: 3.87 Generics, no constraint on T, keys As IList(Of T), comparer As IComparer(Of T): 5.26 Generics, no constraint on T, keys() As T, comparer As IComparer(Of T): 3.78
So, I think it's best not to get overly fancy with generics. Using an array of T is definately faster than using an IList. I guess there's an extra cast for it to do. Asking the user of the sub to supply an IComparer(Of T) might be a tiny bit faster thanimplementing comparablein the class, but this test wasn't in the least bit thorough. | | jo0ls Saturday, April 19, 2008 12:03 AM | jo01s:
SUPER answers!
Thanks for
Dim sw As New Stopwatch
So, once again I didn't have to write my own!
I also wrote my own for Copies("="c, 72) before I discovered
that this function was available!
Sometimes it's easier to write 10 lines of code than to find
something equivalent in the thousands of Web pages at MSDN!
Yes, on array subscript lower bounds, I was aware, as is
traditional in Basic, that Visual Basic .NET uses 0 instead of
1 as was long traditional and/or default in Fortran, PL/I,
etc.
But for heap sort, the algorithm makes heavy use of subscript
expressions such as
i \ 2
which does work for i = 1, 2, ..., n but which I would have to
check for i = 0, 1, 2, ..., n - 1.
So, for simplicity and one less thing to think about, I just
kept the array indexing i = 1, 2, ..., n. In doing this, I
waste the array component with subscript 0.
Since Visual Basic .NET is my first attempt at a Microsoft
Basic, I don't know about either "old" or "new" error handling
techniques! I just looked a some of
Jim Buyens, 'Web Database Development, Step by Step: .NET
Edition', ISBN 0-7356-1637-X, Microsoft Press, Redmond,
Washington, 2002.
found some simple things that appeared to work, and settled on
those. So I just put each 'contiguous block of executable
code' in Try-Catch with something like
error_code = 1001
error_code = 1002
...
ahead of anything risky, and in the corresponding Catch-End
Try block put some simple code to 'percolate' the error to the
calling routine.
But in some of my sorting routines I did leave some
Console.Writeline statements in the Catch-End Try 'block', and
that was unintentional; with my little scheme, such print
statements are supposed to be only in the main routine and, in
production, write to a log file, use SNMP, or some such.
Okay, I'm starting to understand: My class matrix_increment
does not inherit from a class that has a method CompareTo.
Since my code did run, I guessed that my code was okay. That
my code ran at all sounds like 'late binding' did a lot.
I will start using
Option Strict On
Option Explicit On
nearly universally. I am eager for all the help from the
compiler I can get. Compiler messages about when my code is
using late binding might be helpful also! I'm eager for the
power of late binding but only when I'm willing to pay the
price in execution time.
Gee, just now see from the VB compiler:
error BC30574: Option Strict On disallows late binding.
So, can be told when there needs to be late binding!
How with my class definition array.sort still ran so well, I
don't know! I did look at the C# code for array.sort and
didn't see anything very tricky with 'reflection' or using
actual memory addresses.
Maybe the big difference is that array.sort didn't try to do
the key comparisons with CompareTo and avoided this problem
with its statement
if (comparer == null) comparer = Comparer.Default;
One of your points is to have
Sub ao_heap_sort02(ByVal key_array() As Object, ByVal n As _
Int32, ByVal comparer As IComparer)
that is, so that the sort routine can do its pair-wise
comparisons, just pass to the sort routine an instance of a
class that implements the interface IComparer.
Curious: With your suggestions, I went ahead and wrote such
code. I wrote two versions: In one version I defined a class
that implements the interface IComparer and passed to the sort
routine the instance as an argument corresponding to a
parameter in the routine with type IComparer. So, I was
passing an instance of a class to a parameter of type
interface. Curious that I didn't get an error for this
'mismatch' between argument and parameter.
So, one lesson is NOT to complicate the definition of the
class, whose instances are being sorted in an array, with
comparison code. GOOD. Wondered about that. One reason is
that in some applications might have a class with, say, seven
properties and for various purposes would want to sort on
different ones of the properties. So, would want to have
several comparer classes, not just one, and certainly not just
one named CompareTo in the definition of the class of the
instances being sorted.
So, you showed me how to do that! Good!
You have
Public Class MatrixImcrementComparer
Implements IComparer
Public Function Compare(ByVal x As Object, ByVal y As Object) _
As Integer Implements System.Collections.IComparer.Compare
Dim matX As MatrixIncrement = DirectCast(x, MatrixIncrement)
Dim matY As MatrixIncrement = DirectCast(y, MatrixIncrement)
Return String.Compare(matX.key, matY.key)
End Function
End Class
Here maybe I understand: This piece of code is particular to
class MatrixIncrement yet, still, its parameters are declared
Object instead of MatrixIncrement. Maybe the reason is that
the polymorphic sort routine will pass arguments that are of
type Object so that having the parameters of type
MatrixIncrement would be a mismatch between arguments and
parameters and, possibly, an error, either in compilation or
execution.
But, inside the method, need to be particular to
MatrixIncrement. So, do a DirectCast. Here there is no issue
of 'how' the conversion is to be made because we are sure (we
could put in some code to check if we were not sure -- there
are examples on MSDN) that the actual arguments really are of
type MatrixIncrement.
Maybe this is what is going on.
I would guess that this polymorphic technique of doing all the
pair-wise comparisons would be horribly slow, but some of the
timings from both of us show that, somehow, the run time can
be commendably fast, maybe only a factor of 2 worse than with
no use of polymorphism.
Here is a summary of execution times on my computer for
sorting 500,000 strings of length 16 bytes where each array
component is either a string or an instance of a class where
one property of the class in such a string:
629.4062500 seconds own heap sort code sorting an array
of instances of a class where the
class has a messed up comparison
method definition
7.5312500 seconds own heap sort code sorting an array
of instances of a class where the
class and a comparison method are
defined following your suggestions
3.8125000 seconds array.sort sorting an array of
instances of a class where the
class has a messed up comparison
method definition
2.9218750 seconds own non-polymorphic heap sort code
sorting an array of strings by
surrogate
2.5312500 seconds own non-polymorphic heap sort code
soring an array of strings not by
surrogate
So your suggestions worked! Success! Now at least for simple
things I can write Visual Basic .NET polymorphic code as the
language designers intended and get decent performance!
You have shown me a LOT. THANKS!
The purpose of this project was to get me to understand how to
write polymorphic code, but for sorting more could be done:
o When sorting by surrogate, it is possible to write
the code to do a 'stable' sort, and that can be a
nice tool when sorting on a sequence of keys.
Sorting by surrogate might often be better when
sorting large or complicated objects.
o Sorting exploiting multiple processor cores should
soon be of importance.
For generics, I read the chapter in
Francesco Bolena, 'Programming Microsoft Visual Basic
2005: The Language', Microsoft Press, Redmond,
Washington, 2006.
didn't get even 10% of your understanding, and will have to
read the chapter again even to understand your use of the
syntax "Of T"!
For now, polymorphism will be enough for the real work.
Beyond sorting, the software development project plans include
a large scale use of collection classes -- 64 bit computing on
computers with 256 GB of main memory, hopefully more -- where
we will either have to make polymorphic code fast or write our
own AVL tree code, and very much would prefer just to use what
Microsoft already has in collection classes.
Back of the envelop arithmetic starting with k*n*log(n) from
the heap sort timings indicates that the collection class code
really MUST be able to exploit multiple cores, and not sure
that has been done yet or just how to do it. Hmm ....
Many thanks!
| | sigmawaite Sunday, April 20, 2008 7:27 PM | Curious: With your suggestions, I went ahead and wrote such code. I wrote two versions: In one version I defined a class that implements the interface IComparer and passed to the sort routine the instance as an argument corresponding to a parameter in the routine with type IComparer. So, I was passing an instance of a class to a parameter of type interface. Curious that I didn't get an error for this 'mismatch' between argument and parameter. This another bit of polymorphism - "many forms". If the class MyComparer implements IComparer, then you can assign it directly to a variable declared as being of type IComparer. People often say "an interface is a contract".
The contract means that all the methods of IComparer will be available in MyComparer. So, you create a MyComparer object and you are originally referencing it with a variable declared as being of type MyComparer, and so can use all the available methods and properties of the object.
Later you pass it to a function that assigns it to a variable declared as type IComparer. This is perfectly safe to do, as you can only call the methods defined in the Interface via this reference. All implementing classes have to be able to respond to these calls because of the contract. The object is the same object, but you are only able to use the parts exposed through the interface.
So, MyComparer has at least 3 "forms" - you can reference it with a variable declared as either MyComparer, IComparer or Object.
You have
Public Class MatrixImcrementComparer Implements IComparer
Public Function Compare(ByVal x As Object, ByVal y As Object) _ As Integer Implements System.Collections.IComparer.Compare Dim matX As MatrixIncrement = DirectCast(x, MatrixIncrement) Dim matY As MatrixIncrement = DirectCast(y, MatrixIncrement) Return String.Compare(matX.key, matY.key) End Function
End Class
Here maybe I understand: This piece of code is particular to class MatrixIncrement yet, still, its parameters are declared Object instead of MatrixIncrement. Maybe the reason is that the polymorphic sort routine will pass arguments that are of type Object so that having the parameters of type MatrixIncrement would be a mismatch between arguments and parameters and, possibly, an error, either in compilation or execution.
The class is implementing IComparer, rather than IComparer(Of T). IComparer has Object in its signature, and so we have to use Object. But why didn't I use IComparer(Of T)?..
If we used IComparer(Of MatrixIncrement), then the soring routine would need IComparer(Of MatrixIncrement) in its signature. This would then make it pointless to use Object() or Array, as you are tied to MatrixIncrement. Later when I switched to the generic Function we could use the generic IComparer.
But, inside the method, need to be particular to MatrixIncrement. So, do a DirectCast. Here there is no issue of 'how' the conversion is to be made because we are sure (we could put in some code to check if we were not sure -- there are examples on MSDN) that the actual arguments really are of type MatrixIncrement.
Yes, that's right. We don't want to late bind, but we do want the CompareTo method so we need to cast. I skipped error checking. You need to check you have non-null objects of the correct type. TryCast will attempt to cast the type, it won't throw an exception if there is an error:
Public Class MatrixImcrementComparer Implements IComparer
Public Function Compare(ByVal x As Object, ByVal y As Object) _ As Integer Implements System.Collections.IComparer.Compare Dim matX As MatrixIncrement = TryCast(x, MatrixIncrement) Dim matY As MatrixIncrement = TryCast(y, MatrixIncrement) If matX Is Nothing Then Throw New ArgumentException("x is of the wrong type or null") If matY Is Nothing Then Throw New ArgumentException("y is of the wrong type or null") Return String.Compare(matX.key, matY.key) End Function
End Class
The generics syntax takes quite a whle to get used to.
| | jo0ls Monday, April 21, 2008 4:29 AM |
|