Visual Basic Development Bookmark and Share   
 Home > Visual Basic Language > Polymorphism Performance
 

Polymorphism Performance

In Visual Basic .NET I wrote some
polymorphic code, and the performance is
just AWFUL, about 200 times worse than my
similar non-polymorphic code.

But the VB array.sort is polymorphic and
nicely fast, nearly as fast as my fastest
non-polymorphic code.

Clearly the people who wrote array.sort
know some things about writing fast
polymorphic code I don't.

What do I need to do to get my polymorphic
code to run fast?

Mostly I want to understand the general
situation, but I have a pressing problem.

I need to sort an array of strings 'by
surrogate' (explained below) to use one of
D. Knuth's terms. The number of keys to
be sorted should be maybe 100 million now
and more later when we move to 64 bit
computing. So I wrote some heap sort
code. I know; I know: array.sort
implements quicksort which on average is
faster and has much better locality of
reference in virtual memory and has the
promise of being able to exploit
multi-core processors well, but heap sort
has better guaranteed performance.
Besides, sort algorithms are not the point
of my question.

Beyond the pressing problem, for other
work really SHOULD understand how to get
good performance with polymorphism.

So, to get started, both on the pressing
problem and on understanding performance
with polymorphism, I wrote some heap sort
routines. One of these is

Sub astsu_heap_sort01( _
ByRef integer_pointers() As Int32, _
ByRef key_array() As String, _
ByVal n As Int32 )

This routine name abbreviates 'ascending
string by surrogate heap sort'.

Begin Flame

I know; at least now I know: When working
with 'events', there can be some dangers
in using the underscore character in
names, but the Pascal naming convention,
e.g., AStSuHeapSort would be okay except
Visual Basic (VB) is case insensitive so
that ItSlinks will be the same as
ItsLinks, etc. I know; I know; Visual
Studio has a practical fix for this issue,
but I hate nearly all GUIs and in
particular all IDEs.

End Flame

So, upon entry the number of array
components to sort is

ByVal n As Int32

Upon entry, for i = 1, 2, ..., n,

integer_pointers(i)

is a permutation on the set of integers
{1, 2, ..., n}. In particular, can have

integer_pointers(i) = i

The 'keys' in this sort are

key_array(i)

Upon return, for i = 1, 2, ..., n - 1,

key_array( integer_pointers( i ) ) <= _
key_array( integer_pointers( i + 1 ) )

This 'pointer indirection' explains the
meaning of 'by surrogate'.

The components of

key_array()

are read-only, that is, never changed.

Well, on my 1.8 Ghz single core processor,
when sorting strings 16 bytes long of
random decimal numbers, with n = 500,000
the execution time is

2.9218750 seconds.

Times with different values of n show that
the execution time does grow like n*log(n)
as it is supposed to.

This execution time is nicely fast.

For something a little faster, I wrote

Sub ast_heap_sort02( _
ByRef key_array() As String, _
ByVal n As Int32 )

'ascending string heap sort', which is
just a sort in the usual sense and not
sort 'by surrogate'. On the same array of
strings, the time was

2.5312500 seconds.

a little faster. The last two timings
begin to suggest that in the second case
Visual Basic regarded String as a
'reference type' and moved only
'references' (pointers) and never moved
any strings at all and, in particular,
made no use of 'garbage collection' memory
management. Good. So, both cases were in
effect 'by surrogate', and the second case
that let Visual Basic handle the pointer
logic was a little faster. Good.

Begin Note

If Visual Basic is sufficiently clever,
then, even when in a sort routine there is
one variable for temporary use for the
strings being sorted, say, key_father, a
statement such as

key_father = key_array( i )

still will move no string data and,
instead, just increase by 1 a 'usage
count' on the storage for key_array(i).

In the case key_array(i) is an array of
instances of a class, there could be an
issue and a performance issue for how to
handle key_father.

End Note

In comparison, on the same array of
strings,

array.sort( Array, Int32, Int32 )

ran in

3.5312500 seconds,

slower but nicely commendable.

But for my application, I need either to
sort an array of strings by surrogate or
sort an array of instances of a class that
has properties for both the key and the
'record'.

So, I defined a Class, matrix_increment,
with a method CompareTo, wrote

Dim key_array( n_keys ) As matrix_increment

and passed it to 'ascending object heap
sort' routine

Sub ao_heap_sort02( _
ByRef key_array As Array, _
ByVal n As Int32 )

Now the execution time was

605.2031250 seconds.

WOW!!!

So, I changed the definition of the class
to implement the IComparable interface,
ran again, and got

629.4062500 seconds,

which is even worse!

Then, with this same class with the
IComparable interface I passed the same
array of instances of the class
matrix_increment to

array.sort( Array, Int32, Int32 )

and got execution time

3.5312500 seconds

which is much more like how the world
should be. Non-object oriented,
non-polymorphic sorting by surrogate is
significantly faster, but array.sort
totally blew the doors off my disastrous
OO-polymorphic results of

605.2031250 seconds

and

629.4062500 seconds.

The MSDN documentation for this version of
array.sort says that the parameters are:

Public Shared Sub Sort ( _
array As Array, _
index As Integer, _
length As Integer _
)

which I tried to imitate in

Sub ao_heap_sort02( _
ByRef key_array As Array, _
ByVal n As Int32 )

So, I write some old-fashioned code,
non-object oriented, non-polymorphic, that
runs in

2.9218750 seconds

for the by surrogate version or

2.5312500 seconds

without by surrogate and where array.sort
runs in

3.5312500 seconds.

Then I define a class and pass an array of
instances to a parameter As Array and get

605.2031250 seconds.

without IComparable and

629.4062500 seconds

with IComparable while on the same array
of instances with IComparable array.sort
runs in

3.5312500 seconds.

So, how can I write fast polymorphic code?

What do the authors of array.sort
understand about writing fast polymorphic
code that I do not?
sigmawaite  Monday, April 14, 2008 6:55 AM

Arrays in .Net can store an element at index 0. You decalre then with the upper bound, not the size. Hence:

Dim ints(0) As Integer

would have room for 1 element.

This just means your array is 1 element too big and has nothing at index 0, which isn't a big deal as you don't start at 0.

--------------------

Your error catching is a mix of old-style vb and vb.Net. You should use one or the other.

I'll just get rid of it for now.

--------------------

You don't need to pass the Array ByRef as it is a reference type. Changes to the array are propogated back to calling code.

--------------------

If you switch Option Strict Onand Option Explicit On, then you get a huge list of problems. (Always have these two options set).

There are two problems it detected, repeated many times:

--------------------

First accessing the Array like this:

key_father = key_array(i_father0)

Is not allowed, you must use:

key_father = key_array.GetValue(i_father0)

(this is what I was muttering about in an earlier post)

--------------------

Secondly, the type of the elements in the array is Object, and Object doesn't have a CompareTo method. At runtime then, it ishaving to find the CompareTo method using late binding.

You can't implement the generic Comparable(Of T), as you aren't using generics anywhere else. I'll show that later. So I changed it to the normal Comparable.

Now, to get around this late binding, you need to have a Type that DOES have a CompareTo method, and at the moment that type is either matrix_increment or IComparable. You can't go casting the objects to max_increment, as this method is supposed to be able to cope with different types. So you need to cast to IComparable.

--------------------

To follow:

A Demo of the problem.

A fixed, butugly,version.

Then a neater fixed version, using a separate comparer class and Object() instead of Array.

Then a generic version.

It should get faster with each one.

jo0ls  Friday, April 18, 2008 8:01 PM
It's hard to say without seeing the code for these methods.

Have a look at the source code for the Frameworks Array.Sort method.

You could use Reflector, or if you have VS 2008 you can set up up to get the source code from microsoft - though this will be in C# (the code from reflector can be in VB.Net). The C# code for the array class is also available here:

http://www.dotnet247.com/247reference/System/Array/__rotor

Complicated.

It seems to end up in

public static void Sort(Array keys, Array items, int index, int length, IComparer comparer)

Where it makes a decision,
can the keys and items arrays be cast to Object()? If so it calls one sort method that it says can use "fast jit helpers", otherwise it calls another that is slow. I wonder how that works?!?

Edit: Ah, it is probably talking about the difference between szarrays and mdarrays. It's not important as you have an array of Classes. Use Object() instead of Array. (Better still, use generic types instead of Object()).

"There's always the choice of using object[] and Array when using covariance to right general array functions. Object[] in many cases are faster because it is clear the array is an SZARRAY and IL has special instructions to call for setting and getting elements. However, Array can always hold arrays of value-types as well as multidimensional arrays."

http://www.codeproject.com/KB/dotnet/arrays.aspx


That's probably not the reason for the slow down, just a minor tweak.





jo0ls  Tuesday, April 15, 2008 12:26 PM
I had a go at heapsort (using the WikiPedia pseudocode) using an IComparer and using Object(). I get about4 seconds.

Here's the class we will sort

Code Snippet
Option Strict On
Option
Explicit On

Public
Class MyThing

Private MyPath As String
Private MyNumber As Integer
Private Shared rand As New Random

Public ReadOnly Property Path() As String
Get
Return MyPath
End Get
End Property

Public ReadOnly Property Number() As Integer
Get
Return MyNumber
End Get
End Property

Sub New()
Me.MyPath = IO.Path.GetRandomFileName
Me.MyNumber = rand.Next
End Sub

Public Overrides Function ToString() As String
Return String.Format("Path: {0}, Number: {1}", MyPath, MyNumber.ToString)
End Function

End
Class

Here's an IComparer for it

Code Snippet
Option Strict On
Option
Explicit On

Public
Class MyThingComparer
Implements IComparer

Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer Implements System.Collections.IComparer.Compare
' No checks at all.
Dim thingX As MyThing = DirectCast(x, MyThing)
Dim thingY As MyThing = DirectCast(y, MyThing)
' CompareOrdinal is a bit faster than compare, but there are caveats
Return String.CompareOrdinal(thingX.Path, thingY.Path)
End Function

End
Class

And here's a form that does HeapSort

Code Snippet
Option Strict On
Option
Explicit On

Imports
System.ComponentModel
Imports System.Text

Public Class Form1

Private WithEvents bTest As New Button
Private WithEvents bTime As New Button
Private tb As New TextBox
Private Const ArraySize As Integer = 500000
Private WithEvents bw As New BackgroundWorker

Sub New()
InitializeComponent()
Dim sw As New Stopwatch
Me.Controls.AddRange(New Control() {tb, bTest, bTime})
tb.Multiline =
True
tb.Dock = DockStyle.Top
tb.Height =
Me.ClientSize.Height - bTest.Height - 10
tb.ScrollBars = ScrollBars.Vertical
bTest.Location =
New Point(Me.ClientRectangle.Right - (bTest.Width * 2) - 10, tb.Bottom + 5)
bTest.Text =
"Small Test"
bTime.Location = New Point(bTest.Right + 5, bTest.Top)
bTime.Text =
"Time Test"
End Sub

Private Delegate Sub WriteOutDelegate(ByVal s As String)

' Thread safe write to textbox.
Private Sub WriteOut(ByVal s As String)
If Me.InvokeRequired Then
Dim del As New WriteOutDelegate(AddressOf WriteOut)
Me.Invoke(del, New Object() {s})
Else
tb.Text &= s & vbCrLf
End If
End Sub

Private Sub bTest_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles bTest.Click
tb.Clear()
Dim things(10) As MyThing
Dim sb As New StringBuilder()
For i As Integer = 0 To things.Length - 1
things(i) =
New MyThing
sb.AppendLine(things(i).ToString)
Next
WriteOut("Before: ")
WriteOut(
"--------")
WriteOut(sb.ToString)
HeapSort(things, things.Length,
New MyThingComparer)
sb =
New StringBuilder
For i As Integer = 0 To things.Length - 1
sb.AppendLine(things(i).ToString)
Next
WriteOut("After: ")
WriteOut(
"--------")
WriteOut(sb.ToString)
End Sub

Private Sub bTime_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles bTime.Click
Me.bTime.Enabled = False
tb.Clear()
WriteOut(
"Creating array size: " & ArraySize.ToString)
bw.RunWorkerAsync()
End Sub

Private Sub bw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bw.DoWork
Dim rolex As New Stopwatch
Dim things(ArraySize - 1) As MyThing
For i As Integer = 0 To things.Length - 1
things(i) =
New MyThing
Next
WriteOut("Sorting...")
Application.DoEvents()
rolex.Reset()
rolex.Start()
HeapSort(things, things.Length,
New MyThingComparer)
rolex.Stop()
WriteOut(
"It took: " & (rolex.ElapsedMilliseconds / 1000).ToString("F3") & "s")
End Sub

Private Sub bw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bw.RunWorkerCompleted
bTime.Enabled =
True
End Sub

Sub HeapSort(ByVal a() As Object, ByVal count As Integer, ByVal comparer As IComparer)
'(first place a in max-heap order)
Heapify(a, count, comparer)

Dim last As Integer = count - 1
While last > 0
' swap the root(maximum value) of the heap with the
' last element of the heap)
Dim temp As Object = a(last)
a(last) = a(0)
a(0) = temp
' decrease the size of the heap by one so that the
' previous max value will stay in its proper placement
last -= 1
' put the heap back in max-heap order
SiftDown(a, 0, last, comparer)
End While
End Sub

Sub Heapify(ByVal a() As Object, ByVal count As Integer, ByVal comparer As IComparer)
'(start is assigned the index in a of the last parent node)
Dim start As Integer = (count - 1) \ 2

While start >= 0
' sift down the node at index start to the proper place
' such that all nodes below the start index are in heap
' order
SiftDown(a, start, count - 1, comparer)
start -= 1
' after sifting down the root all nodes/elements are in heap order
End While
End Sub

Sub SiftDown(ByVal a() As Object, ByVal start As Integer, ByVal last As Integer, ByVal comparer As IComparer)
' last represents the limit of how far down the heap to sift.
Dim root As Integer = start
' While the root has at least one child
While (root * 2) + 1 <= last
' root*2+1 points to the left child
Dim child As Integer = root * 2 + 1
' If the child has a sibling and the child's value is less
' than its sibling's...
' (Here we use the IComparer which hopefully is up to the task.
' Array.Sort checks the comparer is fit for duty.)
If (child < last) AndAlso comparer.Compare(a(child), a(child + 1)) < 0 Then
' ... point to the right child instead
child += 1
End If
If comparer.Compare(a(root), a(child)) < 0 Then
' out of max-heap order
Dim temp As Object = a(root)
a(root) = a(child)
a(child) = temp
' repeat to continue sifting down the child now
root = child
Else
Return
End If
End While
End Sub

End
Class

I'll mess with it later and see how other ways compare. It cries out forgeneric types.

jo0ls  Tuesday, April 15, 2008 6:00 PM

Changing over to Array instead of Object() and accessing values with GetValue SetValue adds a couple of seconds (I get about 6s):

Code Snippet
Option Strict On
Option
Explicit On

Imports
System.ComponentModel
Imports System.Text

Public Class Form1

Private WithEvents bTest As New Button
Private WithEvents bTime As New Button
Private tb As New TextBox
Private Const ArraySize As Integer = 500000
Private WithEvents bw As New BackgroundWorker

Sub New()
InitializeComponent()
Dim sw As New Stopwatch
Me.Controls.AddRange(New Control() {tb, bTest, bTime})
tb.Multiline =
True
tb.Dock = DockStyle.Top
tb.Height =
Me.ClientSize.Height - bTest.Height - 10
tb.ScrollBars = ScrollBars.Vertical
bTest.Location =
New Point(Me.ClientRectangle.Right - (bTest.Width * 2) - 10, tb.Bottom + 5)
bTest.Text =
"Small Test"
bTime.Location = New Point(bTest.Right + 5, bTest.Top)
bTime.Text =
"Time Test"
End Sub

Private Delegate Sub WriteOutDelegate(ByVal s As String)

' Thread safe write to textbox.
Private Sub WriteOut(ByVal s As String)
If Me.InvokeRequired Then
Dim del As New WriteOutDelegate(AddressOf WriteOut)
Me.Invoke(del, New Object() {s})
Else
tb.Text &= s & vbCrLf
End If
End Sub

Private Sub bTest_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles bTest.Click
tb.Clear()
Dim things(10) As MyThing
Dim sb As New StringBuilder()
For i As Integer = 0 To things.Length - 1
things(i) =
New MyThing
sb.AppendLine(things(i).ToString)
Next
WriteOut("Before: ")
WriteOut(
"--------")
WriteOut(sb.ToString)
HeapSort(things, things.Length,
New MyThingComparer)
sb =
New StringBuilder
For i As Integer = 0 To things.Length - 1
sb.AppendLine(things(i).ToString)
Next
WriteOut("After: ")
WriteOut(
"--------")
WriteOut(sb.ToString)
End Sub

Private Sub bTime_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles bTime.Click
Me.bTime.Enabled = False
tb.Clear()
WriteOut(
"Creating array size: " & ArraySize.ToString)
bw.RunWorkerAsync()
End Sub

Private Sub bw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bw.DoWork
Dim rolex As New Stopwatch
Dim things(ArraySize - 1) As MyThing
For i As Integer = 0 To things.Length - 1
things(i) =
New MyThing
Next
WriteOut("Sorting...")
Application.DoEvents()
rolex.Reset()
rolex.Start()
HeapSort(things, things.Length,
New MyThingComparer)
rolex.Stop()
WriteOut(
"It took: " & (rolex.ElapsedMilliseconds / 1000).ToString("F3") & "s")
End Sub

Private Sub bw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bw.RunWorkerCompleted
bTime.Enabled =
True
End Sub

Sub HeapSort(ByVal a As Array, ByVal count As Integer, ByVal comparer As IComparer)
'(first place a in max-heap order)
Heapify(a, count, comparer)

Dim last As Integer = count - 1
While last > 0
' swap the root(maximum value) of the heap with the
' last element of the heap)
Dim temp As Object = a.GetValue(last)
a.SetValue(a.GetValue(0), last)
a.SetValue(temp, 0)
' decrease the size of the heap by one so that the
' previous max value will stay in its proper placement
last -= 1
' put the heap back in max-heap order
SiftDown(a, 0, last, comparer)
End While
End Sub

Sub Heapify(ByVal a As Array, ByVal count As Integer, ByVal comparer As IComparer)
'(start is assigned the index in a of the last parent node)
Dim start As Integer = (count - 1) \ 2

While start >= 0
' sift down the node at index start to the proper place
' such that all nodes below the start index are in heap
' order
SiftDown(a, start, count - 1, comparer)
start -= 1
' after sifting down the root all nodes/elements are in heap order
End While
End Sub

Sub SiftDown(ByVal a As Array, ByVal start As Integer, ByVal last As Integer, ByVal comparer As IComparer)
' last represents the limit of how far down the heap to sift.
Dim root As Integer = start
' While the root has at least one child
While (root * 2) + 1 <= last
' root*2+1 points to the left child
Dim child As Integer = root * 2 + 1
' If the child has a sibling and the child's value is less
' than its sibling's...
If (child < last) AndAlso comparer.Compare(a.GetValue(child), a.GetValue(child + 1)) < 0 Then
' ... point to the right child instead
child += 1
End If
If comparer.Compare(a.GetValue(root), a.GetValue(child)) < 0 Then
' out of max-heap order
Dim temp As Object = a.GetValue(root)
a.SetValue(a.GetValue(child), root)
a.SetValue(temp, child)
' repeat to continue sifting down the child now
root = child
Else
Return
End If
End While
End Sub

End
Class

jo0ls  Tuesday, April 15, 2008 6:13 PM

Using Generics is consistantly faster than using Object(), I've not tested extensively.

We need an IComparer(Of T) instead of a straight comparer.

Code Snippet
Option Strict On
Option
Explicit On

Public
Class MyThingComparer
Implements IComparer(Of MyThing)

Public Function Compare(ByVal x As MyThing, ByVal y As MyThing) As Integer Implements System.Collections.Generic.IComparer(Of MyThing).Compare
' CompareOrdinal is a bit faster than compare, but there are caveats
Return String.CompareOrdinal(x.Path, y.Path)
End Function

End
Class

And change all the methods to use generics.

Code Snippet
Option Strict On
Option
Explicit On

Imports
System.ComponentModel
Imports System.Text

Public Class Form1

Private WithEvents bTest As New Button
Private WithEvents bTime As New Button
Private tb As New TextBox
Private Const ArraySize As Integer = 500000
Private WithEvents bw As New BackgroundWorker

Sub New()
InitializeComponent()
Dim sw As New Stopwatch
Me.Controls.AddRange(New Control() {tb, bTest, bTime})
tb.Multiline =
True
tb.Dock = DockStyle.Top
tb.Height =
Me.ClientSize.Height - bTest.Height - 10
tb.ScrollBars = ScrollBars.Vertical
bTest.Location =
New Point(Me.ClientRectangle.Right - (bTest.Width * 2) - 10, tb.Bottom + 5)
bTest.Text =
"Small Test"
bTime.Location = New Point(bTest.Right + 5, bTest.Top)
bTime.Text =
"Time Test"
End Sub

Private Delegate Sub WriteOutDelegate(ByVal s As String)

' Thread safe write to textbox.
Private Sub WriteOut(ByVal s As String)
If Me.InvokeRequired Then
Dim del As New WriteOutDelegate(AddressOf WriteOut)
Me.Invoke(del, New Object() {s})
Else
tb.Text &= s & vbCrLf
End If
End Sub

Private Sub bTest_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles bTest.Click
tb.Clear()
Dim things(10) As MyThing
Dim sb As New StringBuilder()
For i As Integer = 0 To things.Length - 1
things(i) =
New MyThing
sb.AppendLine(things(i).ToString)
Next
WriteOut("Before: ")
WriteOut(
"--------")
WriteOut(sb.ToString)
HeapSort(things, things.Length,
New MyThingComparer)
sb =
New StringBuilder
For i As Integer = 0 To things.Length - 1
sb.AppendLine(things(i).ToString)
Next
WriteOut("After: ")
WriteOut(
"--------")
WriteOut(sb.ToString)
End Sub

Private Sub bTime_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles bTime.Click
Me.bTime.Enabled = False
tb.Clear()
WriteOut(
"Creating array size: " & ArraySize.ToString)
bw.RunWorkerAsync()
End Sub

Private Sub bw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bw.DoWork
Dim rolex As New Stopwatch
Dim things(ArraySize - 1) As MyThing
For i As Integer = 0 To things.Length - 1
things(i) =
New MyThing
Next
WriteOut("Sorting...")
Application.DoEvents()
rolex.Reset()
rolex.Start()
HeapSort(
Of MyThing)(things, things.Length, New MyThingComparer)
rolex.Stop()
WriteOut(
"It took: " & (rolex.ElapsedMilliseconds / 1000).ToString("F3") & "s")
End Sub

Private Sub bw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bw.RunWorkerCompleted
bTime.Enabled =
True
End Sub

Sub HeapSort(Of T)(ByVal a() As T, ByVal count As Integer, ByVal comparer As IComparer(Of T))
'(first place a in max-heap order)
Heapify(Of T)(a, count, comparer)

Dim last As Integer = count - 1
While last > 0
' swap the root(maximum value) of the heap with the
' last element of the heap)
Dim temp As T = a(last)
a(last) = a(0)
a(0) = temp
' decrease the size of the heap by one so that the
' previous max value will stay in its proper placement
last -= 1
' put the heap back in max-heap order
SiftDown(a, 0, last, comparer)
End While
End Sub

Sub Heapify(Of T)(ByVal a() As T, ByVal count As Integer, ByVal comparer As IComparer(Of T))
'(start is assigned the index in a of the last parent node)
Dim start As Integer = (count - 1) \ 2

While start >= 0
' sift down the node at index start to the proper place
' such that all nodes below the start index are in heap
' order
SiftDown(a, start, count - 1, comparer)
start -= 1
' after sifting down the root all nodes/elements are in heap order
End While
End Sub

Sub SiftDown(Of T)(ByVal a() As T, ByVal start As Integer, ByVal last As Integer, ByVal comparer As IComparer(Of T))
' last represents the limit of how far down the heap to sift.
Dim root As Integer = start
' While the root has at least one child
While (root * 2) + 1 <= last
' root*2+1 points to the left child
Dim child As Integer = root * 2 + 1
' If the child has a sibling and the child's value is less
' than its sibling's...
If (child < last) AndAlso comparer.Compare(a(child), a(child + 1)) < 0 Then
' ... point to the right child instead
child += 1
End If
If comparer.Compare(a(root), a(child)) < 0 Then
' out of max-heap order
Dim temp As T = a(root)
a(root) = a(child)
a(child) = temp
' repeat to continue sifting down the child now
root = child
Else
Return
End If
End While
End Sub

End
Class

jo0ls  Tuesday, April 15, 2008 6:23 PM
Below are some details on how got absurdly long execution time
of

629.4062500 seconds

sorting 500,000 instances of a simple class that implements
the interface IComparable.

The materials below are in three parts:

Calling Ascending Object Sort Routine
Class Whose Instances Are to Be Sorted
Ascending Object Heap Sort

The question is, why is the sorting so slow?

<---------------------------------->

Calling Ascending Object Sort Routine

Here we have

Dim n_keys As Int32

Dim key_array( n_keys ) As matrix_increment

We have

n_keys = 500000

Routine

Private Sub get_times ( _
ByRef total_seconds As Double, _
ByRef increment_seconds As Double )

uses

current_clock.Ticks

to read the clock and help in finding execution time.

To fill array key_array() we run

For i = 1 To n_keys
s1 = random_variable.NextDouble()
key_array( i ) = New matrix_increment
key_array( i ).key = String.Format("{0,16:f8}", s1 )
Next i

For the sort, we run

Console.WriteLine(routine_name & _
": Calling ast_heap_sort02 ... " )

get_times( test_start_time, time0 )

Console.WriteLine(routine_name & _
": test_start_time = {0, 15:f7} seconds.", test_start_time )

ao_heap_sort02( key_array, n_keys )

get_times( test_end_time, time0 )

Execution time was

n_keys time (seconds)
--------- --------------
500,000 629.4062500

What's wrong?

<---------------------------------->

Class Whose Instances Are to Be Sorted

Class matrix_increment
Implements IComparable(Of matrix_increment)

Public key As String
Public inner_product As Double
Public index As Int32

Public Overloads Function CompareTo( _
ByVal other As matrix_increment ) As Integer _
Implements IComparable( _
Of matrix_increment).CompareTo

Return key.CompareTo( other.key )

End Function

End Class

<---------------------------------->

Ascending Object Heap Sort


' AO_HEAP_SORT02.VB --
'
' Ascending object heap sort.
'
' For i = 1, 2, ..., n, sort components of key_array(i)
' into ascending order using heap sort.
'
' We check parameters for reasonable values and, in case of
' a error, raise the error condition.
'
' Upon return, for i = 1, 2, ..., n - 1,
'
' key_array( i ) <= key_array( i + 1 )
'
' The array subscript logic here is done carefully enough
' that it shuld never overflow even for the largest
' possible value of n.
'
' For some constant k (depending on the computer, compiler,
' etc. but not depending on the parameters) the execution
' time is guaranteed to be closely just
'
' k*n*log(n)
'
' in the worst case.
'
' The code here needs a 'sift' loop in two places; the code
' is the same in both places.
'
' The work here is 'in place', that is, uses no storage
' whose size depends on the passed parameters.

Sub ao_heap_sort02( _
ByRef key_array As Array, _
ByVal n As Int32 )

Dim routine_name As String = "ao_heap_sort02"

Dim error_code As Int32 = 0

Dim i_2 As Int32 = 2

Dim i_first, i_middle, i_last As Int32

Dim i_father0 As Int32

Dim i_father, i_child1, i_child2, i_max_child As Int32

' We need one working location of data type the same as the
' components of parameter key_array():

Dim key_father As Object

Try

' Get started:

error_code = 1001

If n <= 0 Then err.raise(error_code, routine_name)

error_code = 1002

If key_array.GetUpperBound(0) < 0 Then
err.raise(error_code, routine_name)
End If

error_code = 1003

i_first = 1

i_last = n

i_middle = i_last \ i_2

i_father0 = i_middle + 1

' Build heap

Do
i_father0 = i_father0 - 1
If i_father0 < i_first Then Exit Do
key_father = key_array( i_father0 )
i_father = i_father0

' Sift

Do
If i_father < i_middle Then
i_child1 = i_father + i_father
i_child2 = i_child1 + 1
If key_array( i_child1 ).CompareTo( key_array( i_child2 ) ) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
If key_array( i_max_child ).CompareTo( key_father ) > 0 Then
key_array( i_father ) = key_array( i_max_child )
i_father = i_max_child
Continue Do
Else
key_array( i_father ) = key_father
Exit Do
End If
Else ' i_father >= i_middle
If i_father > i_middle Then
key_array( i_father ) = key_father
Exit Do
End If
i_child1 = i_father + i_father ' i_father = i_middle
If i_child1 < i_last Then
i_child2 = i_child1 + 1
If key_array( i_child1 ).CompareTo( key_array( i_child2 ) ) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
Else ' i_child1 = i_last
i_max_child = i_child1
End If
If key_array( i_max_child ).CompareTo( key_father ) > 0 Then
key_array( i_father ) = key_array( i_max_child )
key_array ( i_max_child ) = key_father
Else
key_array ( i_father ) = key_father
End If
Exit Do

End If

Loop ' End of sift loop

Loop ' End of build heap loop

' Sort heap

i_first = 1

i_last = n

Do

key_father = key_array( i_last )

key_array( i_last ) = key_array( i_first )

key_array( i_first ) = key_father

i_father = i_first

i_last = i_last - 1

If i_last <= i_first Then Exit Do

i_middle = i_last \ 2

' Sift

Do
If i_father < i_middle Then
i_child1 = i_father + i_father
i_child2 = i_child1 + 1
If key_array( i_child1 ).CompareTo( key_array( i_child2 ) ) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
If key_array( i_max_child ).CompareTo( key_father ) > 0 Then
key_array( i_father ) = key_array( i_max_child )
i_father = i_max_child
Continue Do
Else
key_array( i_father ) = key_father
Exit Do
End If
Else ' i_father >= i_middle
If i_father > i_middle Then
key_array( i_father ) = key_father
Exit Do
End If
i_child1 = i_father + i_father ' i_father = i_middle
If i_child1 < i_last Then
i_child2 = i_child1 + 1
If key_array( i_child1 ).CompareTo( key_array( i_child2 ) ) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
Else ' i_child1 = i_last
i_max_child = i_child1
End If
If key_array( i_max_child ).CompareTo( key_father ) > 0 Then
key_array( i_father ) = key_array( i_max_child )
key_array ( i_max_child ) = key_father
Else
key_array ( i_father ) = key_father
End If
Exit Do

End If

Loop ' End of sift loop

Loop ' End of sort heap loop

Catch

Console.WriteLine( " err.number = " & err.number )
Console.WriteLine( " err.source = " & err.source )

err.raise(error_code, routine_name)

End Try

out:

Return

End Sub

sigmawaite  Friday, April 18, 2008 2:20 AM

Arrays in .Net can store an element at index 0. You decalre then with the upper bound, not the size. Hence:

Dim ints(0) As Integer

would have room for 1 element.

This just means your array is 1 element too big and has nothing at index 0, which isn't a big deal as you don't start at 0.

--------------------

Your error catching is a mix of old-style vb and vb.Net. You should use one or the other.

I'll just get rid of it for now.

--------------------

You don't need to pass the Array ByRef as it is a reference type. Changes to the array are propogated back to calling code.

--------------------

If you switch Option Strict Onand Option Explicit On, then you get a huge list of problems. (Always have these two options set).

There are two problems it detected, repeated many times:

--------------------

First accessing the Array like this:

key_father = key_array(i_father0)

Is not allowed, you must use:

key_father = key_array.GetValue(i_father0)

(this is what I was muttering about in an earlier post)

--------------------

Secondly, the type of the elements in the array is Object, and Object doesn't have a CompareTo method. At runtime then, it ishaving to find the CompareTo method using late binding.

You can't implement the generic Comparable(Of T), as you aren't using generics anywhere else. I'll show that later. So I changed it to the normal Comparable.

Now, to get around this late binding, you need to have a Type that DOES have a CompareTo method, and at the moment that type is either matrix_increment or IComparable. You can't go casting the objects to max_increment, as this method is supposed to be able to cope with different types. So you need to cast to IComparable.

--------------------

To follow:

A Demo of the problem.

A fixed, butugly,version.

Then a neater fixed version, using a separate comparer class and Object() instead of Array.

Then a generic version.

It should get faster with each one.

jo0ls  Friday, April 18, 2008 8:01 PM

Here's a demo of the problem.

Generating data
Created data in: 1.1407463s
Starting to build heap
heap built in 18.1453599s
sorted in: 306.5314217s
total: 324.702742s

Code Snippet


Imports System.ComponentModel

Module Module1

Private rand As New Random
Private running As Boolean

Sub Main()
Go()
Go()
Console.ReadKey()
End Sub

Sub Go()
Dim sw As New Stopwatch
sw.Start()
Console.WriteLine(
"Generating data")
Dim n_keys As Int32 = 500000
Dim key_array(n_keys) As MatrixIncrement
For i = 1 To n_keys
Dim s1 As Double = rand.NextDouble()
key_array(i) =
New MatrixIncrement
key_array(i).key =
String.Format("{0,16:f8}", s1)
Next i
sw.Stop()
Console.WriteLine(
"Created data in: {0}s", sw.Elapsed.TotalSeconds.ToString)
sw.Reset()
sw.Start()
ao_heap_sort02(key_array, n_keys)
sw.Stop()
Console.WriteLine(
"total: {0}s", sw.Elapsed.TotalSeconds.ToString)
End Sub

Sub ao_heap_sort02(ByVal key_array As Array, ByVal n As Int32)
Dim sw As New Stopwatch
sw.Start()
Dim routine_name As String = "ao_heap_sort02"
Dim i_2 As Int32 = 2
Dim i_first, i_middle, i_last As Int32
Dim i_father0 As Int32
Dim i_father, i_child1, i_child2, i_max_child As Int32
'We need one working location of data type the same as the
'components of parameter key_array():
Dim key_father As Object

If n <= 0 Then Throw New ArgumentOutOfRangeException("n")
If key_array.GetUpperBound(0) < 0 Then Throw New ArgumentException("key_array")
i_first = 1
i_last = n
i_middle = i_last \ i_2
i_father0 = i_middle + 1
'Build heap
Console.WriteLine("Starting to build heap")
Do
i_father0 = i_father0 - 1
If i_father0 < i_first Then Exit Do
key_father = key_array(i_father0)
i_father = i_father0
'Sift
Do
If i_father < i_middle Then
i_child1 = i_father + i_father
i_child2 = i_child1 + 1
If key_array(i_child1).CompareTo(key_array(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
If key_array(i_max_child).CompareTo(key_father) > 0 Then
key_array(i_father) = key_array(i_max_child)
i_father = i_max_child
Continue Do
Else
key_array(i_father) = key_father
Exit Do
End If
Else ' i_father >= i_middle
If i_father > i_middle Then
key_array(i_father) = key_father
Exit Do
End If
i_child1 = i_father + i_father' i_father = i_middle
If i_child1 < i_last Then
i_child2 = i_child1 + 1
If key_array(i_child1).CompareTo(key_array(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
Else' i_child1 = i_last
i_max_child = i_child1
End If
If key_array(i_max_child).CompareTo(key_father) > 0 Then
key_array(i_father) = key_array(i_max_child)
key_array(i_max_child) = key_father
Else
key_array(i_father) = key_father
End If
Exit Do

End If
Loop' End of sift loop
Loop' End of build heap loop
'Sort heap
Console.WriteLine("heap built in {0}s", sw.Elapsed.TotalSeconds.ToString)
sw.Reset()
i_first = 1
i_last = n
sw.Start()
Do
key_father = key_array(i_last)
key_array(i_last) = key_array(i_first)
key_array(i_first) = key_father
i_father = i_first
i_last = i_last - 1
If i_last <= i_first Then Exit Do
i_middle = i_last \ 2
'Sift
Do
If i_father < i_middle Then
i_child1 = i_father + i_father
i_child2 = i_child1 + 1
If key_array(i_child1).CompareTo(key_array(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
If key_array(i_max_child).CompareTo(key_father) > 0 Then
key_array(i_father) = key_array(i_max_child)
i_father = i_max_child
Continue Do
Else
key_array(i_father) = key_father
Exit Do
End If
Else ' i_father >= i_middle
If i_father > i_middle Then
key_array(i_father) = key_father
Exit Do
End If
i_child1 = i_father + i_father' i_father = i_middle
If i_child1 < i_last Then
i_child2 = i_child1 + 1
If key_array(i_child1).CompareTo(key_array(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
Else' i_child1 = i_last
i_max_child = i_child1
End If
If key_array(i_max_child).CompareTo(key_father) > 0 Then
key_array(i_father) = key_array(i_max_child)
key_array(i_max_child) = key_father
Else
key_array(i_father) = key_father
End If
Exit Do
End If
Loop' End of sift loop
Loop' End of sort heap loop
Console.WriteLine("sorted in: {0}s", sw.Elapsed.TotalSeconds.ToString)
sw.Stop()
End Sub

End
Module

Public
Class MatrixIncrement
Implements IComparable(Of MatrixIncrement)

Public key As String
Public inner_product As Double
Public index As Int32

Public Overloads Function CompareTo( _
ByVal other As MatrixIncrement) As Integer _
Implements IComparable( _
Of MatrixIncrement).CompareTo

Return key.CompareTo(other.key)

End Function

End
Class

jo0ls  Friday, April 18, 2008 8:31 PM

Here I switched on Option Strict and Option Explicit and fixed up the warnings.

This meant CompareTo couldn't be called directly on the items from the array, so they get cast to IComparable first.

And, GetValue and SetValue had to be used.

This eliminated LateBinding, and sped it up dramatically.

Generating data
Created data in: 1.1811032s
Starting to build heap
heap built in 0.3861724s
sorted in: 5.1919813s
total: 5.5784823s

Code Snippet
Option Strict On
Option
Explicit On

Imports
System.ComponentModel

Module Module1

Private rand As New Random
Private running As Boolean

' Normal naming conventions mean Classes and
' methods should start with Capitals.
Sub Main()
Go()
Go()
' use this time.
Console.ReadKey()
End Sub

Sub Go()
Dim sw As New Stopwatch
sw.Start()
Console.WriteLine(
"Generating data")
Dim n_keys As Int32 = 500000
' .Net arrays have an element at 0 and
' you declare then with the upper bound
Dim key_array(n_keys - 1) As
MatrixIncrement
For i = 0 To
n_keys - 1
Dim s1 As Double = rand.NextDouble()
key_array(i) =
New MatrixIncrement
key_array(i).key =
String.Format("{0,16:f8}", s1)
Next i
sw.Stop()
Console.WriteLine(
"Created data in: {0}s", sw.Elapsed.TotalSeconds.ToString)
sw.Reset()
sw.Start()
ao_heap_sort02(key_array, n_keys)
sw.Stop()
Console.WriteLine(
"total: {0}s", sw.Elapsed.TotalSeconds.ToString)
End Sub

Sub ao_heap_sort02(ByVal key_array As Array, ByVal n As Int32)
Dim sw As New Stopwatch
sw.Start()
Dim routine_name As String = "ao_heap_sort02"
Dim i_2 As Int32 = 2
Dim i_first, i_middle, i_last As Int32
Dim i_father0 As Int32
Dim i_father, i_child1, i_child2, i_max_child As Int32
'We need one working location of data type the same as the
'components of parameter key_array():
Dim key_father As Object

If n <= 0 Then Throw New ArgumentOutOfRangeException("n")
If key_array.GetUpperBound(0) < 0 Then Throw New ArgumentException("key_array"
)
' Array now starts at 0
i_first = 0
' upper bound is n - 1 as array now starts at 0.
i_last = n - 1
i_middle = i_last \ i_2
i_father0 = i_middle + 1
'Build heap
Console.WriteLine("Starting to build heap")
Do
i_father0 = i_father0 - 1
If i_father0 < i_first Then Exit Do
key_father = key_array.GetValue(i_father0)
i_father = i_father0
'Sift
Do
' The class implements IComparable and so it can be cast to
' IComparable, which DOES have a compareTo method.
Dim comparable1 As IComparable = DirectCast
(key_array.GetValue(i_child1), IComparable)
If i_father < i_middle Then
i_child1 = i_father + i_father
i_child2 = i_child1 + 1
If comparable1.CompareTo(key_array.GetValue(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
Dim comparableMax As IComparable = DirectCast(key_array.GetValue(i_max_child), IComparable)
If comparableMax.CompareTo(key_father) > 0 Then
key_array.SetValue(key_array.GetValue(i_max_child), i_father)
i_father = i_max_child
Continue Do
Else
key_array.SetValue(key_father, i_father)
Exit Do
End If
Else ' i_father >= i_middle
If i_father > i_middle Then
key_array.SetValue(key_father, i_father)
Exit Do
End If
i_child1 = i_father + i_father' i_father = i_middle
comparable1 = DirectCast(key_array.GetValue(i_child1), IComparable)
If i_child1 < i_last Then
i_child2 = i_child1 + 1
If comparable1.CompareTo(key_array.GetValue(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
Else' i_child1 = i_last
i_max_child = i_child1
End If
Dim comparableMax As IComparable = DirectCast(key_array.GetValue(i_max_child), IComparable)
If comparableMax.CompareTo(key_father) > 0 Then
key_array.SetValue(key_array.GetValue(i_max_child), i_father)
key_array.SetValue(key_father, i_max_child)
Else
key_array.SetValue(key_father, i_father)
End If
Exit Do

End If
Loop' End of sift loop
Loop' End of build heap loop
'Sort heap
Console.WriteLine("heap built in {0}s", sw.Elapsed.TotalSeconds.ToString)
sw.Reset()
i_first = 0
i_last = n - 1
sw.Start()
Do
key_father = key_array.GetValue(i_last)
key_array.SetValue(key_array.GetValue(i_first), i_last)
key_array.SetValue(key_father, i_first)
i_father = i_first
i_last = i_last - 1
If i_last <= i_first Then Exit Do
i_middle = i_last \ 2
'Sift
Do
If i_father < i_middle Then
i_child1 = i_father + i_father
i_child2 = i_child1 + 1
Dim comparable1 As IComparable = DirectCast(key_array.GetValue(i_child1), IComparable)
If comparable1.CompareTo(key_array.GetValue(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
Dim comparableMax As IComparable = DirectCast(key_array.GetValue(i_max_child), IComparable)
If comparableMax.CompareTo(key_father) > 0 Then
key_array.SetValue(key_array.GetValue(i_max_child), i_father)
i_father = i_max_child
Continue Do
Else
key_array.SetValue(key_father, i_father)
Exit Do
End If
Else ' i_father >= i_middle
If i_father > i_middle Then
key_array.SetValue(key_father, i_father)
Exit Do
End If
i_child1 = i_father + i_father' i_father = i_middle

If i_child1 < i_last Then
i_child2 = i_child1 + 1
Dim comparable1 As IComparable = DirectCast(key_array.GetValue(i_child1), IComparable)
If comparable1.CompareTo(key_array.GetValue(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
Else' i_child1 = i_last
i_max_child = i_child1
End If
Dim comparableMax As IComparable = DirectCast(key_array.GetValue(i_max_child), IComparable)
If comparableMax.CompareTo(key_father) > 0 Then
key_array.SetValue(key_array.GetValue(i_max_child), i_father)
key_array.SetValue(key_father, i_max_child)
Else
key_array.SetValue(key_father, i_father)
End If
Exit Do
End If
Loop' End of sift loop
Loop' End of sort heap loop
Console.WriteLine("sorted in: {0}s", sw.Elapsed.TotalSeconds.ToString)
sw.Stop()
End Sub

End
Module

Public
Class MatrixIncrement
' We don't want it to be generic, unless everything is generic.
Implements
IComparable

Public key As String
Public inner_product As Double
Public index As Int32

Public Function CompareTo(ByVal obj As Object) As Integer Implements System.IComparable.CompareTo
' Cast from Object to MatrixIncrement.
' Will bomb if it is passed something else.
Dim other As MatrixIncrement = DirectCast(obj, MatrixIncrement)
Return key.CompareTo(other.key)
End Function
End
Class

Edit: Highlighted some of the changes.

jo0ls  Friday, April 18, 2008 8:36 PM

Here I swapped to Object() instead of Array.

And I swapped to using a Comparer class instead of Comparable. This way we don't need to keep casting to IComparable.

I added a bit to check it actually was sorting!

Generating data
Created data in: 1.3212753s
Starting to build heap
heap built in 0.3126916s
sorted in: 4.1875777s
total: 4.5011018s

The speed increase comes from using Object() instead of Array.

Code Snippet
Option Strict On
Option
Explicit On

Imports
System.ComponentModel

Module Module1

Private rand As New Random

' Normal naming conventions mean Classes and
' methods should start with Capitals.
Sub Main()
Go()
Go()
' use this time.
Console.ReadKey()
End Sub

Sub Go()
Dim sw As New Stopwatch
sw.Start()
Console.WriteLine(
"Generating data")
Dim n_keys As Int32 = 500000
' .Net arrays have an element at 0 and
' you declare then with the upper bound
Dim key_array(n_keys - 1) As MatrixIncrement
For i = 0 To n_keys - 1
Dim s1 As Double = rand.NextDouble()
key_array(i) =
New MatrixIncrement
key_array(i).key =
String.Format("{0,16:f8}", s1)
Next i
sw.Stop()
Console.WriteLine(
"Created data in: {0}s", sw.Elapsed.TotalSeconds.ToString)
sw.Reset()
sw.Start()
Dim comparer As New MatrixImcrementComparer
ao_heap_sort02(key_array, n_keys, comparer)
sw.Stop()
Console.WriteLine(
"total: {0}s", sw.Elapsed.TotalSeconds.ToString)
For i As Integer = 0 To 9
Console.WriteLine(key_array(i).ToString)
Next
For i As Integer = key_array.Length - 11 To key_array.Length - 1
Console.WriteLine(key_array(i).ToString)
Next
End Sub

' The framework's Array.Sort methods accept an IComparer
Sub ao_heap_sort02(ByVal key_array() As Object, ByVal n As Int32, ByVal comparer As IComparer)
Dim sw As New Stopwatch
sw.Start()
Dim i_2 As Int32 = 2
Dim i_first, i_middle, i_last As Int32
Dim i_father0 As Int32
Dim i_father, i_child1, i_child2, i_max_child As Int32
'We need one working location of data type the same as the
'components of parameter key_array():
Dim key_father As Object
If n <= 0 Then Throw New ArgumentOutOfRangeException("n")
If key_array.GetUpperBound(0) < 0 Then Throw New ArgumentException("key_array")
i_first = 0
i_last = n - 1
i_middle = i_last \ i_2
i_father0 = i_middle + 1
'Build heap
Console.WriteLine("Starting to build heap")
Do
i_father0 = i_father0 - 1
If i_father0 < i_first Then Exit Do
key_father = key_array(i_father0)
i_father = i_father0
'Sift
Do
If i_father < i_middle Then
i_child1 = i_father + i_father
i_child2 = i_child1 + 1
If comparer.Compare(key_array(i_child1), key_array(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
If comparer.Compare(key_array(i_max_child), key_father) > 0 Then
key_array(i_father) = key_array(i_max_child)
i_father = i_max_child
Continue Do
Else
key_array(i_father) = key_father
Exit Do
End If
Else ' i_father >= i_middle
If i_father > i_middle Then
key_array(i_father) = key_father
Exit Do
End If
i_child1 = i_father + i_father' i_father = i_middle
If i_child1 < i_last Then
i_child2 = i_child1 + 1
If comparer.Compare(key_array(i_child1), key_array(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
Else' i_child1 = i_last
i_max_child = i_child1
End If
If comparer.Compare(key_array(i_max_child), key_father) > 0 Then
key_array(i_father) = key_array(i_max_child)
key_array(i_max_child) = key_father
Else
key_array(i_father) = key_father
End If
Exit Do
End If
Loop' End of sift loop
Loop' End of build heap loop
'Sort heap
Console.WriteLine("heap built in {0}s", sw.Elapsed.TotalSeconds.ToString)
sw.Reset()
i_first = 0
i_last = n - 1
sw.Start()
Do
key_father = key_array(i_last)
key_array(i_last) = key_array(i_first)
key_array(i_first) = key_father
i_father = i_first
i_last = i_last - 1
If i_last <= i_first Then Exit Do
i_middle = i_last \ 2
'Sift
Do
If i_father < i_middle Then
i_child1 = i_father + i_father
i_child2 = i_child1 + 1
If comparer.Compare(key_array(i_child1), key_array(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
If comparer.Compare(key_array(i_max_child), key_father) > 0 Then
key_array(i_father) = key_array(i_max_child)
i_father = i_max_child
Continue Do
Else
key_array(i_father) = key_father
Exit Do
End If
Else ' i_father >= i_middle
If i_father > i_middle Then
key_array(i_father) = key_father
Exit Do
End If
i_child1 = i_father + i_father' i_father = i_middle
If i_child1 < i_last Then
i_child2 = i_child1 + 1
If comparer.Compare(key_array(i_child1), key_array(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
Else' i_child1 = i_last
i_max_child = i_child1
End If
If comparer.Compare(key_array(i_max_child), key_father) > 0 Then
key_array(i_father) = key_array(i_max_child)
key_array(i_max_child) = key_father
Else
key_array(i_father) = key_father
End If
Exit Do
End If
Loop' End of sift loop
Loop' End of sort heap loop
Console.WriteLine("sorted in: {0}s", sw.Elapsed.TotalSeconds.ToString)
sw.Stop()
End Sub

End
Module

Public
Class MatrixIncrement

Public key As String
Public inner_product As Double
Public index As Int32

Public Overrides Function ToString() As String
' we aren't interested in the others.
Return key
End Function

End
Class

Public
Class MatrixImcrementComparer
Implements
IComparer

Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer Implements
System.Collections.IComparer.Compare
Dim matX As MatrixIncrement = DirectCast
(x, MatrixIncrement)
Dim matY As MatrixIncrement = DirectCast
(y, MatrixIncrement)
Return String
.Compare(matX.key, matY.key)
End
Function

End
Class

jo0ls  Friday, April 18, 2008 8:54 PM

And here's a complicated one with Generics, it's a bit slower than the one above.

Generating data
Created data in: 1.2289694s
Starting to build heap
heap built in 0.3783466s
sorted in: 4.7357952s
total: 5.1150947s


First we say the method is (Of T) to mean it is a generic method.

Now, we can swap back to using IComparable(Of T) as we are using generics.

If the method was just (Of T) then we still wouldn't be able to use CompareTo, as that is not a method of any Type, and T can stand for any Type. We need to make it a method (Of T but T must implement IComparable Of T). You can do that with a generic constraint. The syntax is:

Public Sub MySub(Of T As IComparable(Of T))

That hasn't got the parameters for the function yet, it is just stating that this is a generic Sub.

We can add more polymorphism to the sub, if we make keys_array of type IList(Of T). This means it can accept an array, a List(Of T), and anything other class that implements IList(Of T). So Yuo end up with a method sig like this:

Sub ao_heap_sort02(Of T As IComparable(Of T))(ByVal keys As IList(Of T), ByVal n As Int32)


So here's the Generic version.

Code Snippet
Option Strict On
Option
Explicit On

Imports
System.ComponentModel

Module Module1

Private rand As New Random

' Normal naming conventions mean Classes and
' methods should start with Capitals.
Sub Main()
Go()
Go()
' use this time.
Console.ReadKey()
End Sub

Sub Go()
Dim sw As New Stopwatch
sw.Start()
Console.WriteLine(
"Generating data")
Dim n_keys As Int32 = 500000
' .Net arrays have an element at 0 and
' you declare then with the upper bound
Dim key_array(n_keys - 1) As MatrixIncrement
For i = 0 To n_keys - 1
Dim s1 As Double = rand.NextDouble()
key_array(i) =
New MatrixIncrement
key_array(i).key =
String.Format("{0,16:f8}", s1)
Next i
sw.Stop()
Console.WriteLine(
"Created data in: {0}s", sw.Elapsed.TotalSeconds.ToString)
sw.Reset()
sw.Start()
ao_heap_sort02(key_array, n_keys)
sw.Stop()
Console.WriteLine(
"total: {0}s", sw.Elapsed.TotalSeconds.ToString)
For i As Integer = 0 To 9
Console.WriteLine(key_array(i).ToString)
Next
For i As Integer = key_array.Length - 11 To key_array.Length - 1
Console.WriteLine(key_array(i).ToString)
Next
End Sub

Sub ao_heap_sort02(Of T As IComparable(Of T))(ByVal keys As IList(Of T), ByVal n As Int32)
Dim sw As New Stopwatch
sw.Start()
Dim i_first, i_middle, i_last As Int32
Dim i_father0 As Int32
Dim i_father, i_child1, i_child2, i_max_child As Int32
'We need one working location of data type the same as the
'components of parameter key_array():
Dim key_father As T
i_first = 0
i_last = n - 1
i_middle = i_last \ 2
i_father0 = i_middle + 1
'Build heap
Console.WriteLine("Starting to build heap")
Do
i_father0 = i_father0 - 1
If i_father0 < i_first Then Exit Do
key_father = keys(i_father0)
i_father = i_father0
'Sift
Do
If i_father < i_middle Then
i_child1 = i_father + i_father
i_child2 = i_child1 + 1
If keys(i_child1).CompareTo(keys(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
If keys(i_max_child).CompareTo(key_father) > 0 Then
keys(i_father) = keys(i_max_child)
i_father = i_max_child
Continue Do
Else
keys(i_father) = key_father
Exit Do
End If
Else ' i_father >= i_middle
If i_father > i_middle Then
keys(i_father) = key_father
Exit Do
End If
i_child1 = i_father + i_father' i_father = i_middle
If i_child1 < i_last Then
i_child2 = i_child1 + 1
If keys(i_child1).CompareTo(keys(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
Else' i_child1 = i_last
i_max_child = i_child1
End If
If keys(i_max_child).CompareTo(key_father) > 0 Then
keys(i_father) = keys(i_max_child)
keys(i_max_child) = key_father
Else
keys(i_father) = key_father
End If
Exit Do
End If
Loop' End of sift loop
Loop' End of build heap loop
'Sort heap
Console.WriteLine("heap built in {0}s", sw.Elapsed.TotalSeconds.ToString)
sw.Reset()
i_first = 0
i_last = n - 1
sw.Start()
Do
key_father = keys(i_last)
keys(i_last) = keys(i_first)
keys(i_first) = key_father
i_father = i_first
i_last = i_last - 1
If i_last <= i_first Then Exit Do
i_middle = i_last \ 2
'Sift
Do
If i_father < i_middle Then
i_child1 = i_father + i_father
i_child2 = i_child1 + 1
If keys(i_child1).CompareTo(keys(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
If keys(i_max_child).CompareTo(key_father) > 0 Then
keys(i_father) = keys(i_max_child)
i_father = i_max_child
Continue Do
Else
keys(i_father) = key_father
Exit Do
End If
Else ' i_father >= i_middle
If i_father > i_middle Then
keys(i_father) = key_father
Exit Do
End If
i_child1 = i_father + i_father' i_father = i_middle
If i_child1 < i_last Then
i_child2 = i_child1 + 1
If keys(i_child1).CompareTo(keys(i_child2)) > 0 Then
i_max_child = i_child1
Else
i_max_child = i_child2
End If
Else' i_child1 = i_last
i_max_child = i_child1
End If
If keys(i_max_child).CompareTo(key_father) > 0 Then
keys(i_father) = keys(i_max_child)
keys(i_max_child) = key_father
Else
keys(i_father) = key_father
End If
Exit Do
End If
Loop' End of sift loop
Loop' End of sort heap loop
Console.WriteLine("sorted in: {0}s", sw.Elapsed.TotalSeconds.ToString)
sw.Stop()
End Sub

End
Module

Public
Class MatrixIncrement
Implements IComparable(Of MatrixIncrement)

Public key As String
Public inner_product As Double
Public index As Int32

Public Overrides Function ToString() As String
' we aren't interested in the others.
Return key
End Function

Public Function CompareTo(ByVal other As MatrixIncrement) As Integer Implements System.IComparable(Of MatrixIncrement).CompareTo
Return key.CompareTo(other.key)
End Function
End
Class

It is possibly faster as:
Sub ao_heap_sort02(Of T As IComparable(Of T))(ByVal keys() As T, ByVal n As Int32)
jo0ls  Friday, April 18, 2008 10:40 PM

I went through again and tested on a different (slower) computer. I removed the Stopwatch and Console.WriteLine stuff from the sorting sub as it takes time. I started the stopwatch before calling the sorting sub, and stopped it on return. I used the same random seed in each example, so they all sorted the same items. I ran the release build 20 times for each example (except the first, which would take too long), and picked the fastest.

Original problem code: 352.6s
First Fix: 6.72s
Object() and an IComparer: 5.47s
Generics, IComparable(Of T) constraint, keys As IList(Of T): 5.36s
Generics, IComparable(Of T) constraint, keys() As T: 3.87
Generics, no constraint on T, keys As IList(Of T), comparer As IComparer(Of T): 5.26
Generics, no constraint on T, keys() As T, comparer As IComparer(Of T): 3.78

So, I think it's best not to get overly fancy with generics. Using an array of T is definately faster than using an IList. I guess there's an extra cast for it to do. Asking the user of the sub to supply an IComparer(Of T) might be a tiny bit faster thanimplementing comparablein the class, but this test wasn't in the least bit thorough.

jo0ls  Saturday, April 19, 2008 12:03 AM
jo01s:

SUPER answers!

Thanks for

     Dim sw As New Stopwatch

So, once again I didn't have to write my own!

I also wrote my own for Copies("="c, 72) before I discovered
that this function was available!

Sometimes it's easier to write 10 lines of code than to find
something equivalent in the thousands of Web pages at MSDN!

Yes, on array subscript lower bounds, I was aware, as is
traditional in Basic, that Visual Basic .NET uses 0 instead of
1 as was long traditional and/or default in Fortran, PL/I,
etc.

But for heap sort, the algorithm makes heavy use of subscript
expressions such as

     i \ 2

which does work for i = 1, 2, ..., n but which I would have to
check for i = 0, 1, 2, ..., n - 1.

So, for simplicity and one less thing to think about, I just
kept the array indexing i = 1, 2, ..., n.  In doing this, I
waste the array component with subscript 0.

Since Visual Basic .NET is my first attempt at a Microsoft
Basic, I don't know about either "old" or "new" error handling
techniques!  I just looked a some of

     Jim Buyens, 'Web Database Development, Step by Step: .NET
     Edition', ISBN 0-7356-1637-X, Microsoft Press, Redmond,
     Washington, 2002.

found some simple things that appeared to work, and settled on
those.  So I just put each 'contiguous block of executable
code' in Try-Catch with something like

     error_code = 1001

     error_code = 1002

     ...

ahead of anything risky, and in the corresponding Catch-End
Try block put some simple code to 'percolate' the error to the
calling routine.

But in some of my sorting routines I did leave some
Console.Writeline statements in the Catch-End Try 'block', and
that was unintentional; with my little scheme, such print
statements are supposed to be only in the main routine and, in
production, write to a log file, use SNMP, or some such.

Okay, I'm starting to understand:  My class matrix_increment
does not inherit from a class that has a method CompareTo.

Since my code did run, I guessed that my code was okay.  That
my code ran at all sounds like 'late binding' did a lot.

I will start using

     Option Strict On
     Option Explicit On

nearly universally.  I am eager for all the help from the
compiler I can get.  Compiler messages about when my code is
using late binding might be helpful also!  I'm eager for the
power of late binding but only when I'm willing to pay the
price in execution time.

Gee, just now see from the VB compiler:

     error BC30574: Option Strict On disallows late binding.

So, can be told when there needs to be late binding!

How with my class definition array.sort still ran so well, I
don't know!  I did look at the C# code for array.sort and
didn't see anything very tricky with 'reflection' or using
actual memory addresses.

Maybe the big difference is that array.sort didn't try to do
the key comparisons with CompareTo and avoided this problem
with its statement

     if (comparer == null) comparer = Comparer.Default;

One of your points is to have

    Sub ao_heap_sort02(ByVal key_array() As Object, ByVal n As _
      Int32, ByVal comparer As IComparer)

that is, so that the sort routine can do its pair-wise
comparisons, just pass to the sort routine an instance of a
class that implements the interface IComparer.

Curious:  With your suggestions, I went ahead and wrote such
code.  I wrote two versions:  In one version I defined a class
that implements the interface IComparer and passed to the sort
routine the instance as an argument corresponding to a
parameter in the routine with type IComparer.  So, I was
passing an instance of a class to a parameter of type
interface.  Curious that I didn't get an error for this
'mismatch' between argument and parameter.

So, one lesson is NOT to complicate the definition of the
class, whose instances are being sorted in an array, with
comparison code.  GOOD.  Wondered about that.  One reason is
that in some applications might have a class with, say, seven
properties and for various purposes would want to sort on
different ones of the properties.  So, would want to have
several comparer classes, not just one, and certainly not just
one named CompareTo in the definition of the class of the
instances being sorted.

So, you showed me how to do that!  Good!

You have

     Public Class MatrixImcrementComparer
         Implements IComparer

         Public Function Compare(ByVal x As Object, ByVal y As Object) _
           As Integer Implements System.Collections.IComparer.Compare
             Dim matX As MatrixIncrement = DirectCast(x, MatrixIncrement)
             Dim matY As MatrixIncrement = DirectCast(y, MatrixIncrement)
             Return String.Compare(matX.key, matY.key)
         End Function

     End Class

Here maybe I understand:  This piece of code is particular to
class MatrixIncrement yet, still, its parameters are declared
Object instead of MatrixIncrement.  Maybe the reason is that
the polymorphic sort routine will pass arguments that are of
type Object so that having the parameters of type
MatrixIncrement would be a mismatch between arguments and
parameters and, possibly, an error, either in compilation or
execution.

But, inside the method, need to be particular to
MatrixIncrement.  So, do a DirectCast.  Here there is no issue
of 'how' the conversion is to be made because we are sure (we
could put in some code to check if we were not sure -- there
are examples on MSDN) that the actual arguments really are of
type MatrixIncrement.

Maybe this is what is going on.

I would guess that this polymorphic technique of doing all the
pair-wise comparisons would be horribly slow, but some of the
timings from both of us show that, somehow, the run time can
be commendably fast, maybe only a factor of 2 worse than with
no use of polymorphism.

Here is a summary of execution times on my computer for
sorting 500,000 strings of length 16 bytes where each array
component is either a string or an instance of a class where
one property of the class in such a string:

  629.4062500 seconds      own heap sort code sorting an array
                           of instances of a class where the
                           class has a messed up comparison
                           method definition

     7.5312500 seconds     own heap sort code sorting an array
                           of instances of a class where the
                           class and a comparison method are
                           defined following your suggestions

     3.8125000 seconds     array.sort sorting an array of
                           instances of a class where the
                           class has a messed up comparison
                           method definition

     2.9218750 seconds     own non-polymorphic heap sort code
                           sorting an array of strings by
                           surrogate

     2.5312500 seconds     own non-polymorphic heap sort code
                           soring an array of strings not by
                           surrogate

So your suggestions worked!  Success!  Now at least for simple
things I can write Visual Basic .NET polymorphic code as the
language designers intended and get decent performance!

You have shown me a LOT.  THANKS!

The purpose of this project was to get me to understand how to
write polymorphic code, but for sorting more could be done:

     o    When sorting by surrogate, it is possible to write
          the code to do a 'stable' sort, and that can be a
          nice tool when sorting on a sequence of keys.
          Sorting by surrogate might often be better when
          sorting large or complicated objects.

     o    Sorting exploiting multiple processor cores should
          soon be of importance.

For generics, I read the chapter in

     Francesco Bolena, 'Programming Microsoft Visual Basic
     2005:  The Language', Microsoft Press, Redmond,
     Washington, 2006.

didn't get even 10% of your understanding, and will have to
read the chapter again even to understand your use of the
syntax "Of T"!

For now, polymorphism will be enough for the real work.

Beyond sorting, the software development project plans include
a large scale use of collection classes -- 64 bit computing on
computers with 256 GB of main memory, hopefully more -- where
we will either have to make polymorphic code fast or write our
own AVL tree code, and very much would prefer just to use what
Microsoft already has in collection classes.

Back of the envelop arithmetic starting with k*n*log(n) from
the heap sort timings indicates that the collection class code
really MUST be able to exploit multiple cores, and not sure
that has been done yet or just how to do it.  Hmm ....

Many thanks!

sigmawaite  Sunday, April 20, 2008 7:27 PM
Curious:  With your suggestions, I went ahead and wrote such
code. I wrote two versions: In one version I defined a class
that implements the interface IComparer and passed to the sort
routine the instance as an argument corresponding to a
parameter in the routine with type IComparer. So, I was
passing an instance of a class to a parameter of type
interface. Curious that I didn't get an error for this
'mismatch' between argument and parameter.

This another bit of polymorphism - "many forms". If the class
MyComparer implements IComparer, then you can assign it
directly to a variable declared as being of type IComparer.
People often say "an interface is a contract".

The contract means that all the methods of IComparer will
be available in MyComparer. So, you create a MyComparer object
and you are originally referencing it with a variable declared
as being of type MyComparer, and so can use all the available
methods and properties of the object.

Later you pass it to a function that assigns it to a variable
declared as type IComparer. This is perfectly safe to do, as
you can only call the methods defined in the Interface via this
reference. All implementing classes have to be able to respond
to these calls because of the contract. The object is the same
object, but you are only able to use the parts exposed through
the interface.

So, MyComparer has at least 3 "forms" - you can reference it
with a variable declared as either MyComparer, IComparer or
Object.
You have

Public Class MatrixImcrementComparer
Implements IComparer

Public Function Compare(ByVal x As Object, ByVal y As Object) _
As Integer Implements System.Collections.IComparer.Compare
Dim matX As MatrixIncrement = DirectCast(x, MatrixIncrement)
Dim matY As MatrixIncrement = DirectCast(y, MatrixIncrement)
Return String.Compare(matX.key, matY.key)
End Function

End Class

Here maybe I understand: This piece of code is particular to
class MatrixIncrement yet, still, its parameters are declared
Object instead of MatrixIncrement. Maybe the reason is that
the polymorphic sort routine will pass arguments that are of
type Object so that having the parameters of type
MatrixIncrement would be a mismatch between arguments and
parameters and, possibly, an error, either in compilation or
execution.
The class is implementing IComparer, rather than
IComparer(Of T). IComparer has Object in its signature, and so
we have to use Object. But why didn't I use IComparer(Of T)?..

If we used IComparer(Of MatrixIncrement), then the
soring routine would need IComparer(Of MatrixIncrement) in its
signature. This would then make it pointless to use Object()
or Array, as you are tied to MatrixIncrement. Later when I
switched to the generic Function we could use the generic
IComparer.
But, inside the method, need to be particular to
MatrixIncrement. So, do a DirectCast. Here there is no issue
of 'how' the conversion is to be made because we are sure (we
could put in some code to check if we were not sure -- there
are examples on MSDN) that the actual arguments really are of
type MatrixIncrement.
Yes, that's right. We don't want to late bind, but we do want
the CompareTo method so we need to cast. I skipped error
checking. You need to check you have non-null objects of the
correct type. TryCast will attempt to cast the type, it
won't throw an exception if there is an error:

Public Class MatrixImcrementComparer
Implements IComparer

Public Function Compare(ByVal x As Object, ByVal y As Object) _
As Integer Implements System.Collections.IComparer.Compare
Dim matX As MatrixIncrement = TryCast(x, MatrixIncrement)
Dim matY As MatrixIncrement = TryCast(y, MatrixIncrement)
If matX Is Nothing Then Throw New ArgumentException("x is of the wrong type or null")
If matY Is Nothing Then Throw New ArgumentException("y is of the wrong type or null")
Return String.Compare(matX.key, matY.key)
End Function

End Class

The generics syntax takes quite a whle to get used to.

jo0ls  Monday, April 21, 2008 4:29 AM

You can use google to search for other answers

Custom Search

More Threads

• THE DARK AGES OF COMPUTING PART 4 THE BOSSES
• For Each Question
• Retrieving data
• How can I count a word in VB- I know how to count letters by using variable.Length but I need to count words.
• VS2008 Beta 2 to RTM breaking change?
• Is the order in which you declare arrays significant?
• How to login windows from a VB2005 Win aplication
• Filesystem.Writealltext generates 'UnauthorisedAccessException'????
• find a matching line and replace value
• System.NullReferenceException from a VC++ function called in VB.NET (via DLL)