Hello all,
I'm hoping to create a unique unsigned integer ID to represent a string. The string will always be formatted like this: "w1234_letters.letters", where the 1234 and the letters may vary.
I've tried using myString.GetHashCode, but this may return a negative value, and my integer ID must be positive and between 3 and 999999999. If I invert or in any way remove the negative value, I risk duplicating this value, right?
Please help? So close to finishing this project =)
Thanks for any response!

Jorno, Wednesday, July 04, 2007 7:13 PM

Hi,
Yeah, there is a possibility that a duplicate will occur, and it's very difficult to determine how likely. But as soon as you remove the negative sign, the hash no longer matches the string that was used to create it, so another string could produce the same value.
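To make the duplicate risk concrete: clearing the sign bit of a 32-bit hash keeps only 31 bits, so two hashes that differ only in the top bit always end up as the same non-negative value (the space shrinks from 2^32 to 2^31). Negating with Math.Abs has a further problem, since Math.Abs(Integer.MinValue) throws an OverflowException. Separately, .NET does not guarantee that String.GetHashCode returns the same value across framework versions or platforms, and newer .NET versions randomize string hashes per process, so it is a risky source for an ID that gets stored in the metabase. A minimal sketch; the NonNegative helper is hypothetical.

```vbnet
Imports System

Module SignBitSketch
    ' Hypothetical helper: clears the sign bit so the result is never negative.
    ' Only 31 bits survive, so h and (h Xor Integer.MinValue), which differ
    ' only in the top bit, always map to the same value.
    Public Function NonNegative(h As Integer) As Integer
        Return h And &H7FFFFFFF
    End Function

    Sub Main()
        Dim a As Integer = -5                        ' bit pattern &HFFFFFFFB
        Dim b As Integer = a Xor Integer.MinValue    ' &H7FFFFFFB = 2147483643
        Console.WriteLine("{0} -> {1}", a, NonNegative(a))
        Console.WriteLine("{0} -> {1}", b, NonNegative(b))
        ' Both hashes end up as the same ID, 2147483643.
    End Sub
End Module
```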
I thought maybe the MD5 or SHA1 hashing algorithms in the System.Security.Cryptography namespace would help, but these only produce hash values of 128 bits or upwards, too large for an integer. The other option, though I have no idea whether this would be unique, is to generate a DES key, which is 64 bits in size, from the string. DES is an encryption method that uses a secret key to encrypt the data; I'd like to think this would generate a unique value per unique string. Here is an example console application that outputs the numbers for a list.
Code Snippet
Imports System
Imports System.Security.Cryptography
Imports System.Collections.Generic

Module Module1
    Sub Main()
        Dim values As New List(Of String)
        values.Add("w1234_letters.letters")
        values.Add("w1235_letters.letters")
        values.Add("w1236_letters.letters")
        values.Add("w1237_letters.letters")
        values.Add("w1238_letters.letters")
        values.Add("w1239_letters.letters")
        values.Add("w1240_letters.letters")

        For Each value As String In values
            ' A fresh random 8-byte salt is created for every string.
            Dim salt(7) As Byte
            RandomNumberGenerator.Create().GetBytes(salt)

            ' Derive a 64-bit DES key from the string via PasswordDeriveBytes.
            Dim pdb As New PasswordDeriveBytes(value, salt)
            Dim key As Byte() = pdb.CryptDeriveKey("DES", "SHA1", 64, salt)

            ' BitConverter.ToUInt16 reads only the first 2 bytes of the key.
            Console.WriteLine("{0}:{1}", value, BitConverter.ToUInt16(key, 0))
        Next
        Console.ReadLine()
    End Sub
End Module
The reason it should create unique numbers is that it's almost like a hash based on a string, usually a password: if two different passwords created the same key, then two different people could (although the chance is very remote) decrypt the same data with different passwords. Test it and see, mate; let me know, as I'm curious whether it will work or not.
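Derek notes above that MD5 and SHA-1 outputs are too wide for an integer, but a digest can simply be truncated: keep the first 4 bytes as an unsigned 32-bit number, then fold it into the 3 to 999,999,999 range from the question. Because no salt is involved, the same string gives the same ID on every run and on every machine, which is what an ID stored in the metabase needs. A minimal sketch, assuming SHA-256 over the UTF-8 bytes of the string (a swapped-in choice; MD5 or SHA-1 would behave the same way for this purpose); the names StableId, MinId and MaxId are hypothetical. Truncating does not make collisions impossible, so new IDs still need to be checked against the ones already in use.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module StableIdSketch
    ' Range taken from the question: 3 through 999,999,999 inclusive.
    Public Const MinId As Integer = 3
    Public Const MaxId As Integer = 999999999

    ' Hypothetical helper: hash the string with SHA-256 (no salt), keep the
    ' first 4 digest bytes as a UInt32 and fold that into [MinId, MaxId].
    Public Function StableId(value As String) As Integer
        Dim digest As Byte()
        Using sha As SHA256 = SHA256.Create()
            digest = sha.ComputeHash(Encoding.UTF8.GetBytes(value))
        End Using

        ' Assemble the bytes explicitly so the result does not depend on
        ' the machine's byte order.
        Dim raw As UInteger = (CUInt(digest(0)) << 24) Or (CUInt(digest(1)) << 16) Or _
                              (CUInt(digest(2)) << 8) Or CUInt(digest(3))

        ' Mod folds 0..4294967295 onto 0..999999996. The fold is not perfectly
        ' even (low results are somewhat more likely), which slightly raises
        ' the collision rate but does not affect repeatability.
        Dim span As UInteger = CUInt(MaxId - MinId + 1)
        Return MinId + CInt(raw Mod span)
    End Function

    Sub Main()
        For Each s As String In New String() {"w1234_letters.letters", "w1235_letters.letters"}
            Console.WriteLine("{0}:{1}", s, StableId(s))
        Next
    End Sub
End Module
```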
Derek Smyth, Wednesday, July 04, 2007 8:19 PM

Hi Derek,
This is great stuff, man; thanks a lot for putting so much thought into it. There is one thing which concerns me after testing this, though. There are very few variations available in the 5-digit integer this returns, compared to the much longer string with even more different characters available. I have no idea how to test this, but the mathematician in me (I think he's in there somewhere, hehe) tells me I may run into trouble here.
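This worry can be put in numbers with the birthday approximation. Assuming the IDs behave like values drawn uniformly at random from N possibilities (a good hash approximates this but does not guarantee it), the chance that at least two of n distinct strings share an ID, and the n at which that chance reaches about 50%, are roughly as follows. A five-digit result from a 16-bit conversion means N = 65,536; the range in the question gives N = 999,999,997.

```latex
\begin{aligned}
P(\text{at least one shared ID}) &\approx 1 - e^{-n(n-1)/(2N)} \\
n_{50\%} &\approx \sqrt{2N\ln 2} \approx 1.18\sqrt{N} \\
N = 65\,536 &\;\Rightarrow\; n_{50\%} \approx 301 \\
N = 999\,999\,997 &\;\Rightarrow\; n_{50\%} \approx 37\,233
\end{aligned}
```

By contrast, waiting for new IDs to hit one particular fixed ID takes about N attempts on average (around 65,536 for a 16-bit value), so a test of that kind looks far more reassuring than the many-strings case it is meant to predict.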
The two events that can crash my application are:
1. A string different from one already added to the metabase (yes, this is for IIS) returns the same integer ID.
2. A string identical to one already added gets a different integer ID, and is therefore successfully added to the metabase a second time.
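Both events can be guarded against in code, whatever function produces the candidate ID: remember the ID already assigned to each string, so an identical string always gets its original ID back (event 2), and when a candidate is already owned by a different string, step to the next free value in the range (event 1). A minimal in-memory sketch; the IdRegistry class, its members and the upward-probing strategy are all hypothetical. Because the probe result depends on which strings arrived first, the assigned IDs have to be stored (for example alongside the metabase entries) and reloaded, not recomputed.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical in-memory registry: one ID per distinct string, and no ID is
' ever handed to two different strings.
Public Class IdRegistry
    Private ReadOnly _idOf As New Dictionary(Of String, Integer)(StringComparer.Ordinal)
    Private ReadOnly _ownerOf As New Dictionary(Of Integer, String)()
    Private ReadOnly _candidate As Func(Of String, Integer)
    Private ReadOnly _minId As Integer
    Private ReadOnly _maxId As Integer

    ' candidate: any function (hash, key derivation, ...) that returns a
    ' starting ID between minId and maxId inclusive.
    Public Sub New(candidate As Func(Of String, Integer), minId As Integer, maxId As Integer)
        _candidate = candidate
        _minId = minId
        _maxId = maxId
    End Sub

    Public Function GetId(value As String) As Integer
        ' Event 2: an identical string always gets back the ID it already has.
        Dim existing As Integer
        If _idOf.TryGetValue(value, existing) Then Return existing

        If _ownerOf.Count > _maxId - _minId Then
            Throw New InvalidOperationException("No free IDs left in the range.")
        End If

        Dim id As Integer = _candidate(value)
        If id < _minId OrElse id > _maxId Then
            Throw New ArgumentOutOfRangeException("candidate", "Candidate ID is outside the range.")
        End If

        ' Event 1: if another string already owns this ID, step upward
        ' (wrapping from maxId back to minId) until a free ID is found.
        While _ownerOf.ContainsKey(id)
            id = If(id = _maxId, _minId, id + 1)
        End While

        _idOf(value) = id
        _ownerOf(id) = value
        Return id
    End Function
End Class

Module RegistryDemo
    Sub Main()
        ' Deliberately weak candidate (3 + string length) so that clashes happen.
        Dim registry As New IdRegistry(Function(s) 3 + s.Length, 3, 999999999)
        Console.WriteLine(registry.GetId("w1_ab.cd"))   ' 11
        Console.WriteLine(registry.GetId("w2_ab.cd"))   ' 12: same length, and 11 is taken
        Console.WriteLine(registry.GetId("w1_ab.cd"))   ' 11 again
    End Sub
End Module
```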
If you're wondering why I need to use an integer ID for the metabase, it is to maintain the structure that the old systems are built upon.
I might get away with using the string as the ID, but I'm guessing you agree with me that a direction like the one you suggested would be way cooler =)
Thanks again for the effort, Derek!

Jorno, Wednesday, July 04, 2007 10:14 PM

Hi again,
I've done some tests now. I've put in a constant integer value and compared it to the value returned when "seeding" with DateTime.Now.Ticks. After 22,074 to 64,225 loops it hits a match. I must admit that's pretty good, but I guess it might as well have happened after just a few. Any thoughts on this?
Edit: Just reached new heights: 148,399 loops before a match with different values.
Edit 2: My philosophical approach to mathematics tells me that the further apart the values are, the lower the chance of a match when testing with this loop. The constant value was a DateTime.Now.Ticks value from 10 minutes ago. I just now reached 339,574 loops before hitting a matching ID. This is pretty fun =)

Jorno, Wednesday, July 04, 2007 10:38 PM

Hey mate,
That's looking pretty positive, excellent. The code I posted might not be the best implementation of the idea, but the idea of deriving a key from the string is looking sound.
There are a few ways you could change the code. At the moment the salt value is recreated every time within the loop; that technically doesn't need to happen, and you could use the same salt value throughout the loop. Recreating the salt each and every time might cause a scenario where a newly generated salt value plus the string results in a value that has occurred before.
There is also a class called Rfc2898DeriveBytes that you could use instead of the PasswordDeriveBytes above. I just found the class, and it says you can generate a key based on a salt plus an integer value (an iteration count); this would add another variable to the generation mix and might be more random. There is an example in the help.
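For the derived number to be repeatable, the salt has to stay the same from run to run, not just within one loop: Rfc2898DeriveBytes (PBKDF2) mixes the salt into every output byte, so a salt regenerated at start-up will almost certainly give the same string a different number each time the program runs. A minimal sketch, assuming the salt is generated once and then stored; here it is simply hard-coded, and both the byte values and the DerivedId name are hypothetical. The integer passed to the constructor is the iteration count.

```vbnet
Imports System
Imports System.Security.Cryptography

Module FixedSaltSketch
    ' Hypothetical salt: generate it once, then store and reuse it.
    ' Rfc2898DeriveBytes requires a salt of at least 8 bytes.
    Private ReadOnly FixedSalt As Byte() = {&H3A, &H91, &H5C, &H7, &HE2, &H48, &HB6, &H1D}

    ' Derives 4 bytes from the string and the fixed salt (10 iterations)
    ' and reads them as an unsigned 32-bit number in the machine's byte order.
    Public Function DerivedId(value As String) As UInteger
        Dim rfc As New Rfc2898DeriveBytes(value, FixedSalt, 10)
        Dim key As Byte() = rfc.GetBytes(4)
        Return BitConverter.ToUInt32(key, 0)
    End Function

    Sub Main()
        ' Same string and same salt: both lines print the same number.
        Console.WriteLine(DerivedId("w1234_letters.letters"))
        Console.WriteLine(DerivedId("w1234_letters.letters"))
    End Sub
End Module
```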
Also, the key size could be changed... I'm a bit confused, as the code above should generate a key of size 64 bits, but it's getting converted OK to a UInt16 variable, which should only hold 16 bits. VB.NET generally moans when you do that. If you looped 300,000 times and never got an error, then perhaps you can increase the size of the key. I'd use the Rfc2898DeriveBytes class for this; the help suggests it can create any size of key, while the code above, which derives a DES key, needs very specific sizes.
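The UInt16 conversion raises no error because BitConverter.ToUInt16 is not a narrowing cast: it reads 2 bytes starting at the given index and ignores the rest of the array. That also explains the five-digit numbers, since a UInt16 can only hold 0 to 65,535. A short demonstration with an arbitrary 8-byte array standing in for a key; the values in the comments assume a little-endian machine, which is what BitConverter.IsLittleEndian reports on the common .NET platforms.

```vbnet
Imports System

Module BitConverterSketch
    Sub Main()
        ' Arbitrary 8-byte "key" (made-up values).
        Dim key As Byte() = {&H34, &H12, &HFF, &HFF, &HFF, &HFF, &HFF, &HFF}

        ' Only key(0) and key(1) are read; bytes 2 to 7 have no effect.
        Dim small As UShort = BitConverter.ToUInt16(key, 0)
        ' Reads key(0) through key(3).
        Dim medium As UInteger = BitConverter.ToUInt32(key, 0)

        Console.WriteLine(small)              ' 4660 (&H1234) on little-endian
        Console.WriteLine(UShort.MaxValue)    ' 65535, the largest possible UInt16
        Console.WriteLine(medium)             ' 4294906420 (&HFFFF1234) on little-endian
    End Sub
End Module
```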
Regardless, I think you'll need to store a list of previously used numbers; a match may happen, so you need to catch that and handle it, maybe by changing the salt value. Need to go do some work now, but I'll create an example that uses the Rfc2898 class, see if it's a better solution, and post the code.

Derek Smyth, Thursday, July 05, 2007 9:44 AM

Hey Graham,
This code might be a bit better. I've made the salt a constant random value and generated a 4-byte key this time, which is 32 bits, enough to store in a UInt32 value. The third argument (10) could be based on something else; setting it as a constant is almost like it having no effect.
Code Snippet
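' Assumes the Imports and the values list from the first example.
' The salt is random per run: constant within one run only, so the same
' string will almost certainly get a different number on the next run.
Dim salt(7) As Byte
RandomNumberGenerator.Create().GetBytes(salt)
For Each value As String In values
    ' 10 is the iteration count passed to Rfc2898DeriveBytes.
    Dim rfc As New Rfc2898DeriveBytes(value, salt, 10)
    ' Take 4 derived bytes (32 bits) and read them as a UInt32.
    Dim key As Byte() = rfc.GetBytes(4)
    Console.WriteLine("{0}:{1}", value, BitConverter.ToUInt32(key, 0))
Next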
Give that a test and see how likely a match is, and whether the values it produces are within the valid range you're looking for.

Derek Smyth, Thursday, July 05, 2007 11:34 AM