There are some bits of code you end up writing dozens of times and showing other people how to write dozens of times more.
I thought I'd go through a few of the ones I come across frequently, to save other people the hassle of writing their own. For first-time developers these snippets can seem a little counter-intuitive or daunting, and as for the rest of us, well, we're just plain lazy ;-)
Public Shared Function HashPassword(ByVal password As String) As String
    ' Convert the input string to bytes and compute its MD5 hash
    Dim md5 As New System.Security.Cryptography.MD5CryptoServiceProvider()
    Dim bytes As Byte() = md5.ComputeHash(System.Text.Encoding.UTF8.GetBytes(password))

    ' Build the result as two uppercase hex digits per hash byte
    Dim result As New System.Text.StringBuilder()
    For Each hashByte As Byte In bytes
        result.Append(hashByte.ToString("X2"))
    Next
    Return result.ToString()
End Function
The above bit of code computes the MD5 hash of an input string. It looks like a bit more code than one might expect at first glance, but it is quite logical:
MD5, and other hash algorithms such as SHA1 (which is now more popular than MD5), take their input as a series of bytes and produce as output a hash of a fixed length, also in byte form. So the bulk of the code above is transforming to and from bytes.
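To make the "fixed length" part concrete, here is a minimal sketch that hashes a short and a long string with both algorithms and prints the size of each result. The module name and the sample strings are just placeholders I picked for illustration.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module HashLengthDemo
    Sub Main()
        ' Two inputs of very different length (arbitrary sample values)
        Dim samples As String() = {"a", "a considerably longer input string"}

        Using md5Hasher As MD5 = MD5.Create(), sha1Hasher As SHA1 = SHA1.Create()
            For Each sample As String In samples
                Dim data As Byte() = Encoding.UTF8.GetBytes(sample)

                ' MD5 always yields 16 bytes and SHA1 always yields 20 bytes,
                ' regardless of how many input bytes were hashed
                Console.WriteLine("{0} input bytes -> MD5: {1} bytes, SHA1: {2} bytes",
                                  data.Length,
                                  md5Hasher.ComputeHash(data).Length,
                                  sha1Hasher.ComputeHash(data).Length)
            Next
        End Using
    End Sub
End Module
```

Since each byte becomes two hex digits, an MD5 hash ends up as a 32-character string and a SHA1 hash as a 40-character string.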
First, we take the input string (password) and convert it into a byte representation. There are a number of ways of doing this, some of which are:
- System.Text.Encoding.ASCII - one byte per character, but doesn't support international characters such as ü, ö, ä etc.
- System.Text.Encoding.Unicode - this is UTF-16, which supports tens of thousands of characters from around the globe, but requires a minimum of 2 bytes per character.
- System.Text.Encoding.UTF8 - can encode many characters as 1 byte, and uses more (up to 4) where necessary. For the most part you can think of UTF8 as a compact Unicode format.
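The differences between the three encodings are easy to see by counting bytes. The following is a small sketch, not anything definitive: the module name and the sample word are my own choices, and the ü is written as ChrW(&HFC) so the result doesn't depend on how the source file itself is saved.

```vbnet
Imports System
Imports System.Text

Module EncodingSizeDemo
    Sub Main()
        ' "Grün", with ü written as ChrW(&HFC) so the source file's own encoding doesn't matter
        Dim word As String = "Gr" & ChrW(&HFC) & "n"

        Console.WriteLine(Encoding.ASCII.GetBytes(word).Length)    ' 4 (one byte per character)
        Console.WriteLine(Encoding.Unicode.GetBytes(word).Length)  ' 8 (two bytes per character)
        Console.WriteLine(Encoding.UTF8.GetBytes(word).Length)     ' 5 (ü takes two bytes, the rest one each)

        ' ASCII cannot represent ü, so a round trip replaces it with "?"
        Console.WriteLine(Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(word)))  ' Gr?n
    End Sub
End Module
```

Note that ASCII doesn't fail on the ü, it silently swaps it for a question mark, which is exactly the kind of thing that goes unnoticed until a user complains.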
I highly discourage the use of ASCII in modern applications: internationalization is very important, and if it isn't an issue now, it almost certainly will be at some point.
Which of UTF8 and Unicode you use is up to you. When hashing it doesn't really matter as long as you are consistent, but when transmitting data the choice can matter more. If your audience is primarily using a Latin alphabet, then UTF8 is probably the way to go. For scripts such as Chinese, Japanese or Korean, the Unicode encoding will usually serve better, as those characters typically take 3 bytes in UTF8 but only 2 in Unicode.
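Here is a quick sketch of why consistency matters when hashing. HashWith is a hypothetical helper of my own (the same idea as HashPassword, but with the encoding passed in), and "secret" is just a sample value.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module EncodingConsistencyDemo
    ' Hypothetical helper: same idea as HashPassword, but the encoding is a parameter
    Function HashWith(ByVal password As String, ByVal enc As Encoding) As String
        Using hasher As MD5 = MD5.Create()
            Dim result As New StringBuilder()
            For Each hashByte As Byte In hasher.ComputeHash(enc.GetBytes(password))
                result.Append(hashByte.ToString("X2"))
            Next
            Return result.ToString()
        End Using
    End Function

    Sub Main()
        Dim utf8Hash As String = HashWith("secret", Encoding.UTF8)
        Dim unicodeHash As String = HashWith("secret", Encoding.Unicode)

        ' Same string, same encoding: the hashes match
        Console.WriteLine(utf8Hash = HashWith("secret", Encoding.UTF8))  ' True

        ' Same string, different encoding: different bytes go in, so a different hash comes out
        Console.WriteLine(utf8Hash = unicodeHash)  ' False
    End Sub
End Module
```

So if you store hashes computed with one encoding and later check them with another, every comparison will fail, even for correct input.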
The rest of the code merely asks the MD5 provider to compute the hash, then enumerates the resulting bytes to build up a string. All the .ToString("X2") means is "convert the byte to a string made up of 2 hexadecimal digits". Lowercase x means lowercase hex digits, and uppercase X means uppercase hex digits.
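A tiny sketch of the format strings in action, using two byte values I picked arbitrarily (10 and 171):

```vbnet
Imports System

Module HexFormatDemo
    Sub Main()
        Dim small As Byte = 10
        Dim large As Byte = 171

        Console.WriteLine(large.ToString("X2"))  ' AB
        Console.WriteLine(large.ToString("x2"))  ' ab
        Console.WriteLine(small.ToString("X2"))  ' 0A (padded to two digits)
        Console.WriteLine(small.ToString("X"))   ' A  (no padding without the 2)
    End Sub
End Module
```

The 2 matters: without it, bytes below 16 would produce a single digit and the hash string would no longer have a predictable length.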