Monday, November 5, 2012

Compare two dll files programmatically using Hash

In the past days I needed to compare two .NET assemblies to see if they are functionally the same. As you may know, when you build the same project several times the resulting output .dll or .exe file is always different. That's because of the ModuleVerisonId property(MVID). So in order to check if two assemblies are the same we can disassemble them to IL code and compare them.
The easiest way to disassemble the file is to use MSIL Disassembler. However it is not in a common location. It can be in any of the following locations:

C:\Program Files\Microsoft SDKs\Windows\v6.0\bin\ildasm.exe
C:\Program Files\Microsoft SDKs\Windows\v6.0A\bin\ildasm.exe
C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin\ildasm.exe
C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin\x64\ildasm.exe
C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin\NETFX 4.0 Tools\ildasm.exe
C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin\NETFX 4.0 Tools\x64\ildasm.exe
C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\bin\ildasm.exe
C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\bin\x64\ildasm.exe
C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\bin\NETFX 4.0 Tools\ildasm.exe
C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\bin\NETFX 4.0 Tools\x64\ildasm.exe


So in my following solution I have embedded the ildasm.exe file in a class library to be able to use it on any machine.When we disassemble an assembly we check if the ildasm.exe file exists in the executing assembly folder and if not the file is extracted there from our dll file.
Using the ildasm file we get the IL code and save it to a temporary file.
Then we need to remove the following three rows:
  1. MVID - as I wrote before this is a unique GUID generated with every build
  2. Image Base (The image base tells us as to where the program will be loaded in memory by the Windows loader.) - this is different with every build as well
  3. Time-date stamp - the time and date when the ildasm is run
So we read the temp file content, remove these rows we use regex and then save the file content to the same file.
You can find the Disassembler file here:
using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace FileHasher
{
    public class Disassembler
    {
        public static Regex regexMVID = new Regex("//\\s*MVID\\:\\s*\\{[a-zA-Z0-9\\-]+\\}", RegexOptions.Multiline | RegexOptions.Compiled);
        public static Regex regexImageBase = new Regex("//\\s*Image\\s+base\\:\\s0x[0-9A-Fa-f]*", RegexOptions.Multiline | RegexOptions.Compiled);
        public static Regex regexTimeStamp = new Regex("//\\s*Time-date\\s+stamp\\:\\s*0x[0-9A-Fa-f]*", RegexOptions.Multiline | RegexOptions.Compiled);
        
        private static readonly Lazy<Assembly> currentAssembly = new Lazy<Assembly>(() =>
        {
            return MethodBase.GetCurrentMethod().DeclaringType.Assembly;
        });

        private static readonly Lazy<string> executingAssemblyPath = new Lazy<string>(() =>
        {
            return Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
        });

        private static readonly Lazy<string> currentAssemblyFolder = new Lazy<string>(() =>
        {
            return Path.GetDirectoryName(currentAssembly.Value.Location);
        });

        private static readonly Lazy<string[]> arrResources = new Lazy<string[]>(() =>
        {
            return currentAssembly.Value.GetManifestResourceNames();
        });

        private const string ildasmArguments = "/all /text \"{0}\"";

        public static string ILDasmFileLocation
        {
            get
            {
                return Path.Combine(executingAssemblyPath.Value, "ildasm.exe");
            }
        }

        static Disassembler()
        {
            //extract the ildasm file to the executing assembly location
            ExtractFileToLocation("ildasm.exe", ILDasmFileLocation);
        }

        /// <summary>
        /// Saves the file from embedded resource to a given location.
        /// </summary>
        /// <param name="embeddedResourceName">Name of the embedded resource.</param>
        /// <param name="fileName">Name of the file.</param>
        protected static void SaveFileFromEmbeddedResource(string embeddedResourceName, string fileName)
        {
            if (File.Exists(fileName))
            {
                //the file already exists, we can add deletion here if we want to change the version of the 7zip
                return;
            }
            FileInfo fileInfoOutputFile = new FileInfo(fileName);

            using (FileStream streamToOutputFile = fileInfoOutputFile.OpenWrite())
            using (Stream streamToResourceFile = currentAssembly.Value.GetManifestResourceStream(embeddedResourceName))
            {
                const int size = 4096;
                byte[] bytes = new byte[4096];
                int numBytes;
                while ((numBytes = streamToResourceFile.Read(bytes, 0, size)) > 0)
                {
                    streamToOutputFile.Write(bytes, 0, numBytes);
                }

                streamToOutputFile.Close();
                streamToResourceFile.Close();
            }
        }

        /// <summary>
        /// Searches the embedded resource and extracts it to the given location.
        /// </summary>
        /// <param name="fileNameInDll">The file name in DLL.</param>
        /// <param name="outFileName">Name of the out file.</param>
        protected static void ExtractFileToLocation(string fileNameInDll, string outFileName)
        {
            string resourcePath = arrResources.Value.Where(resource => resource.EndsWith(fileNameInDll, StringComparison.InvariantCultureIgnoreCase)).FirstOrDefault();
            if (resourcePath == null)
            {
                throw new Exception(string.Format("Cannot find {0} in the embedded resources of {1}", fileNameInDll, currentAssembly.Value.FullName));
            }
            SaveFileFromEmbeddedResource(resourcePath, outFileName);
        }

        public static string GetDisassembledFile(string assemblyFilePath)
        {
            if (!File.Exists(assemblyFilePath))
            {
                throw new InvalidOperationException(string.Format("The file {0} does not exist!", assemblyFilePath));
            }

            string tempFileName = Path.GetTempFileName();
            var startInfo = new ProcessStartInfo(ILDasmFileLocation, string.Format(ildasmArguments, assemblyFilePath));
            startInfo.WindowStyle = ProcessWindowStyle.Hidden;
            startInfo.CreateNoWindow = true;
            startInfo.UseShellExecute = false;
            startInfo.RedirectStandardOutput = true;

            using (var process = System.Diagnostics.Process.Start(startInfo))
            {
                string output = process.StandardOutput.ReadToEnd();
                process.WaitForExit();

                if (process.ExitCode > 0)
                {
                    throw new InvalidOperationException(
                        string.Format("Generating IL code for file {0} failed with exit code - {1}. Log: {2}",
                        assemblyFilePath, process.ExitCode, output));
                }

                File.WriteAllText(tempFileName, output);
            }

            RemoveUnnededRows(tempFileName);
            return tempFileName;
        }

        private static void RemoveUnnededRows(string fileName)
        {
            string fileContent = File.ReadAllText(fileName);
            //remove MVID
            fileContent = regexMVID.Replace(fileContent, string.Empty);
            //remove Image Base
            fileContent = regexImageBase.Replace(fileContent, string.Empty);
            //remove Time Stamp
            fileContent = regexTimeStamp.Replace(fileContent, string.Empty);
            File.WriteAllText(fileName, fileContent);
        }

        public static string DisassembleFile(string assemblyFilePath)
        {
            string disassembledFile = GetDisassembledFile(assemblyFilePath);
            try
            {
                return File.ReadAllText(disassembledFile);
            }
            finally
            {
                if (File.Exists(disassembledFile))
                {
                    File.Delete(disassembledFile);
                }
            }
        }
    }
}

So using this class we can get the IL code of a .NET assemlby. Now we can create a Hash Calculator to calculate the hash of the file, so that we can store the hash and compare it with newer files. To calculate the hash I use the following approach:
  1. Check if the file ends with .dll or .exe - if yes this can be a .NET assembly, if no get file hash
  2. Try to open the assembly with Assembly.LoadFile to see if this is a valid .NET assembly, if there is a BadImageFormatException, this is not a .NET one so get file hash
  3. If this is a .net assembly get the IL code file and calculate it's hash
To calculate file hash we use MD5CryptoServiceProvider. So here is the Hash Calculator:
using System;
using System.IO;
using System.Reflection;

namespace FileHasher
{
    public class HashCalculator
    {
        public string FileName { get; private set; }

        public HashCalculator(string fileName)
        {
            this.FileName = fileName;
        }

        public string CalculateFileHash()
        {
            if (Path.GetExtension(this.FileName).Equals(".dll", System.StringComparison.InvariantCultureIgnoreCase)
                || Path.GetExtension(this.FileName).Equals(".exe", System.StringComparison.InvariantCultureIgnoreCase))
            {
                return GetAssemblyFileHash();
            }
            else
            {
                return GetFileHash();
            }
        }

        private string GetFileHash()
        {
            return CalculateHashFromStream(File.OpenRead(this.FileName));
        }

        private string GetAssemblyFileHash()
        {
            string tempFileName = null;
            try
            {
                //try to open the assembly to check if this is a .NET one
                var assembly = Assembly.LoadFile(this.FileName);
                tempFileName = Disassembler.GetDisassembledFile(this.FileName);
                return CalculateHashFromStream(File.OpenRead(tempFileName));
            }
            catch(BadImageFormatException)
            {
                return GetFileHash();
            }
            finally
            {
                if (File.Exists(tempFileName))
                {
                    File.Delete(tempFileName);
                }
            }
        }

        private string CalculateHashFromStream(Stream stream)
        {
            using (var readerSource = new System.IO.BufferedStream(stream, 1200000))
            {
                using (var md51 = new System.Security.Cryptography.MD5CryptoServiceProvider())
                {
                    md51.ComputeHash(readerSource);
                    return Convert.ToBase64String(md51.Hash);
                }
            }
        }
    }
}

You can find the code here AssemblyHasher.zip or on github AssemblyHasher

13 comments:

  1. Vasil, this is pretty valuable!
    I have 2 questions:
    - is this hashing the embedded resources also? (I think it should too)
    - do yo u have or plan to open source this? (maybe on github?)

    ReplyDelete
  2. Hi Benjamin,
    The embedded resource is listed in the generated IL code

    .mresource public TestILDasm.Test.xml
    {
    // Offset: 0x00000000 Length: 0x00000049
    // WARNING: managed resource file TestILDasm.Test.xml created
    }

    but there is only the length of the embedded resource, so if you change the embedded resource content with another having the same
    length, the same hash will be created.
    About github, I don't think there will be much changes to this code, so I'm not sure if there is a need to do this :)

    ReplyDelete
  3. Thank for you response.
    So that's actually not that bad, because the actual content of the embedded resources is output on .res file by ildasm, so we can hash both .il and .res files to cover any change in the assembly.

    Here comes a good reason to use github: I could fork, add the res hashing and send it to you as a pull request :)

    ReplyDelete
  4. If you don't mind, I think I'll create the github repo (with you as author), and I'll transfer it you later if you're interested.

    ReplyDelete
  5. https://github.com/benjamine/AssemblyHasher

    ReplyDelete
  6. OK, looks great :) I can create a repo by myself or you can transfer this one to https://github.com/vtrifonov.

    ReplyDelete
  7. Sure, can you create using fork? I have a few minor changes pending, that I'll commit today.
    BTW I guess it's fine if I add a MIT license?

    ReplyDelete
  8. Hi Vasil, great tool! However I think I may have found a bug, or at least an oversight / limitation of this implementation.

    For the most part I have extended the code, and have been happily comparing folders of .dll files but today I compared two folders of files which I knew where different but it returns the same hash.

    Now these .dll files are primarily WPF files, so .xaml files. When I load up the .dll in ildasm manually look I see no sign of the xaml files, only the c# files which have reamained unchanged.

    Having a nose around on the net, I have found that xaml actually compiles down to baml, and is as an example listed in the MANIFEST under something like
    '.mresource public LandingPage.Theme.g.resources' which isn't being compared.

    ReplyDelete
  9. Hello Vasil, this tool was very helpfull for me
    Thanks!

    ReplyDelete
  10. Hello, Vasil. Thank you for providing this code. One question: do you know if the "Checksum" row is important to detect differences between two DLLs? I had a lot of "false positives" because of this row, and solved it by putting a Regex in Disassembler to remove it in the checking. Thanks in advance.

    ReplyDelete
  11. Good thanks, however you might have an error about missing dll file, and to solve it I recommend use next fixed tool which you can fin right here:http://fix4dll.com/mss32_dll It helped me a lot when I had similar problem.

    ReplyDelete