Skip to main content

GetHashCode Hands-On Session

The following is a hands-on post meant to demonstrate how GetHashCode() and Equals() methods are used by .NET Framework under the hood. For the sake of simplicity I will refer to the popular hashed-base Dictionary type, although any other hash based structure will follow a similar behavior, if not the same one.

After understanding this post you should be able to spot potential problematic behaviors and resolve them, prevent creation of unreachable items in dictionaries and improve CRUD actions performance on hash based structures.

The Theory

GetHashCode() is used to create a unique integer identifier for objects/structs. The hashcode can be used for two purposes:

  • Programmatically, by developers, to distinguish objects/structs form each other (NOTE: Not recommended when the default .NET implementation is used, as it's not guaranteed to preserve the same hash between .NET versions and platforms)
  • Internally, by .NET Framework, when using the object/struct as a key in a hashed based list, i.e. Dictionary. This scenario is the main subject of this article.

Microsoft has supplied both Reference and Value types with two default GetHashCode() implementations. The implementation is designed to provide a generic "good-enough" solution with almost no knowledge of our particular types at runtime. As result the default implementation of GetHashCode() works, but is not optimized. (The default Value.GetHashCode() has an additional issue, discussed later in this post).

Add and Search Mechanism

When using an object/struct as a key in a Dictionary, .NET implementation internally calls the given object/struct's GetHashCode() method and calculates an index from it, based on the length of the list.

Although implementation can vary between .NET versions, the common implementation is similar to the following:

int hash = GetHashCode() % dictionary.Length;
Don't worry, we will review the default implementation later-on.

Hashbased structures are composed by buckets that are used to 'hold' the supplied objects. When trying to insert new items in a Dictionary .NET uses the calculated index and tries to insert the item in that specific index position.

If the calculated index is still not in use (the bucket is empty), the item is placed smoothly in the bucket (O(1) algorithm).

In case the calculated index already 'holds' an object(s), meaning that the bucket is already in use, .NET will still try to add the object to the bucket, but only if the identical key is not already used by that specific bucket. This scenario is called "collision"(O(n) algorithm).

Hash Table Collision

In order to check the existence of identical keys in a bucket .NET compares each existing key in the bucket with the new candidate item by calling the Equals() method on each of the existing items:

bool exist = alreadyInUseKey1.Equals(newKey1);  //* Loops through the entire bucket items list 

If .NET encounters an equivalence between two keys (exist = true) an 'An item with the same key has already been added' exception will be thrown. Otherwise the item will be added with the provided key.

NOTE: Typically, the domain of a hash function (the set of possible keys) is larger than its range (the number of different table indexes), and so it will map several different keys to the same index (called collision).

When searching an item based on its key object, .NET internally retrieves the key's hashcode, recalculates the index and instructs the CPU to retrieve the object from the specific bucket at the index position.

Optimal Implementation

In order to distinguish objects from each other, we need to make use of all, or parts of, its internal values (properties or fields). There are three common golden rules we must follow in order to maximize use of hashed lists:

  1. If two objects are equal (as defined by Equals() method), they must generate the same hash value, Otherwise, their hashcodes can’t be used to find objects in their correspondent buckets.
    The contrary is not required - Multiple non-Equal objects can have the same hashcode (collisions). This is the reason why we need to override Equals() when GetHashCode() has been overridden and vice versa, we need them to be synchronized.
  2. For any object A, A.GetHashCode() must be an instance invariant. No matter what methods are called upon A, A.GetHashCode() must always return the same value. That ensures that an object placed in a bucket will always remain reachable.
  3. To reduce the frequency of hashcode collisions, the hash function should generate a uniform distribution among all integers (32/64 bit) based on all, or part of, the object/struct fields. That’s how you achieve efficiency in a hash-based container.

    Wikipedia: In probability theory and statistics, the discrete uniform distribution is a symmetric probability distribution whereby a finite number of values are equally likely to be observed; every one of n values has equal probability 1/n. Another way of saying "discrete uniform distribution" would be "a known, finite number of outcomes equally likely to happen". A simple example of the discrete uniform distribution is throwing a fair die. The possible values are 1, 2, 3, 4, 5, 6, and each time the die is thrown the probability of a given score is 1/6. If two dice are thrown and their values added, the resulting distribution is no longer uniform since not all sums have equal probability.

In order to use the large number produced by the GetHashCode() as an index, we need to reduce it to the length of the dictionary while preserving the good distribution gained by our GetHashCode().

For that reason, as explained previously, .NET internally calculates the index by modeling it against the length of the list. By the way, .NET preserves the hashcode in order to use it in case the Dictionary capacity is increased at runtime. In this case .NET recalculates all items indexed with the new Length of the dictionary and replaces the items position accordingly.

A common and successful algorithm is to XOR all the return values from GetHashCode() on all fields in a type. If your type contains some mutable fields, exclude those fields from the calculations. At this point it should be clear that a poor hashcode algorithm can cause performance issues and even unreachable items (see samples in "Hands On" section below).

Default .NET Implementation

As mentioned before, mediocre GethashCode() and Equals() implementations where supplied by default by Microsoft:

System.Object doesn't have to follow rule 3, because it uses a simple sequential integer. So when you create reference types that are meant to be hash keys, you should override GetHashCode() to get a better distribution of the hash values across all integers for your specific type.

System.ValueType overrides Objects' implementations and performs a hash calculation based on the object's first Property. Rule 2 and 3 depend on the first property of an object. It must be well distributable and immutable.

NOTE: I have taken the description above, about System.ValueType implementation, from the book "Effective C#". Several tests later, I now understand that it's not always the case. .NET apparently uses a more complex algorithm. In this article we will follow "Effective C#"'s description.

Hands On

The following is an effective illustration demonstrating how small changes can lead to big differences when talking about hash based structures:

Defaul Value Implelemntation

Given a custom value struct-key, where the first field (id) of the key is always different.

    
    struct MyKey
    {
        private int id;
        private string msg;
        public MyKey(int id)
        {
            this.id = id;
            this.msg = "Hello";     
        }
    }

    private static void AddDic()
    {
        Dictionary <MyKey,int> list = new Dictionary<MyKey,int>();            
        for (int i = 0; i < 100; i++)
            list.Add( new MyKey(i), i);

    } 

The code above will take ~8 seconds on my 4 core 8MB RAM machine.

Let's say that time comes and your college made a small refactor:

    
    struct MyKey
    {
        private string msg;//* Now first field in the struct
        private int id;
        public MyKey(int id)
        {
            this.id = id;
            this.msg = "Hello";     
        }
    }

Even though she just changed the order of the fields, now the code will take ~30 seconds, because the first field of the struct is fixed and all the items are inserted in the same bucket.

Custom Equals & GetHashCode

The same is true with custom overridden methods. We need to make sure the hashcode is well distributed, otherwise we will suffer from the same behavior.

Let's consider a similar object-key implementation:


    
    struct MyKey
    {
        private int id;
        public MyKey(int id)
        {
            this.id = id;
        }
              
        public override bool Equals(object obj)
        {
            int int1 = ((MyKey)obj).id;
            bool equ = int1 == id;
            return equ;
        }

        public override int GetHashCode()
        {
            int hash = 1;//* Always same value
            return hash;
        }
    }

The first sample makes use of the above object-key to generate an identical hashcode for all instances.

    
        private static void AddDic()
        {
            Dictionary <MyKey , int> list = new Dictionary<MyKey, int>();

            double beforems = DateTime.Now.TimeOfDay.TotalMilliseconds;
            Console.WriteLine("Befoare Insert: " + beforems);
            for ( int i = 0; i < 100; i++)
            {
                list.Add(new MyKey(i), i);
                Console.WriteLine("----");
            }
            double afterms = DateTime.Now.TimeOfDay.TotalMilliseconds;
            Console.WriteLine("After Insert: " + afterms);

            Console.WriteLine("Total: " + (afterms - beforems));
          
            Console.ReadKey();
        } 

Adding new items into the Dictionary is very expensive, as mentioned before. When an identical hashcode is encountered, .NET tries to determine equality for all the key objects by looping throughout all the items in the specific bucket. This innocent loop of 100 items suffers from extremely low performance just because the hashcode in the custom key is always the same.

Output:

Equals - Internal Object ID:15 External Object ID: 16
Equals - Internal Object ID:14 External Object ID: 16
Equals - Internal Object ID:13 External Object ID: 16
Equals - Internal Object ID:12 External Object ID: 16
Equals - Internal Object ID:11 External Object ID: 16
Equals - Internal Object ID:10 External Object ID: 16
Equals - Internal Object ID:9 External Object ID: 16
Equals - Internal Object ID:8 External Object ID: 16
Equals - Internal Object ID:7 External Object ID: 16
Equals - Internal Object ID:6 External Object ID: 16
Equals - Internal Object ID:5 External Object ID: 16
Equals - Internal Object ID:4 External Object ID: 16
Equals - Internal Object ID:3 External Object ID: 16
Equals - Internal Object ID:2 External Object ID: 16
Equals - Internal Object ID:1 External Object ID: 16
Equals - Internal Object ID:0 External Object ID: 16
----
GetHashCode - Internal Object ID:17 HashCode :1
Equals - Internal Object ID:16 External Object ID: 17
Equals - Internal Object ID:15 External Object ID: 17
Equals - Internal Object ID:14 External Object ID: 17
Equals - Internal Object ID:13 External Object ID: 17
Equals - Internal Object ID:12 External Object ID: 17
Equals - Internal Object ID:11 External Object ID: 17
Equals - Internal Object ID:10 External Object ID: 17
Equals - Internal Object ID:9 External Object ID: 17
Equals - Internal Object ID:8 External Object ID: 17
Equals - Internal Object ID:7 External Object ID: 17
Equals - Internal Object ID:6 External Object ID: 17
Equals - Internal Object ID:5 External Object ID: 17
Equals - Internal Object ID:4 External Object ID: 17
Equals - Internal Object ID:3 External Object ID: 17
Equals - Internal Object ID:2 External Object ID: 17
Equals - Internal Object ID:1 External Object ID: 17
Equals - Internal Object ID:0 External Object ID: 17
----
GetHashCode - Internal Object ID:18 HashCode :1
Equals - Internal Object ID:17 External Object ID: 18
Equals - Internal Object ID:16 External Object ID: 18
Equals - Internal Object ID:15 External Object ID: 18
Equals - Internal Object ID:14 External Object ID: 18
Equals - Internal Object ID:13 External Object ID: 18
Equals - Internal Object ID:12 External Object ID: 18
Equals - Internal Object ID:11 External Object ID: 18
Equals - Internal Object ID:10 External Object ID: 18
Equals - Internal Object ID:9 External Object ID: 18
Equals - Internal Object ID:8 External Object ID: 18
Equals - Internal Object ID:7 External Object ID: 18
Equals - Internal Object ID:6 External Object ID: 18
Equals - Internal Object ID:5 External Object ID: 18
Equals - Internal Object ID:4 External Object ID: 18
Equals - Internal Object ID:3 External Object ID: 18
Equals - Internal Object ID:2 External Object ID: 18
Equals - Internal Object ID:1 External Object ID: 18
Equals - Internal Object ID:0 External Object ID: 18
----

We can see that every new added object (17 and 18) is tested against each existing item in the dictionary.

By changing our implementation (although not the optimal one) to return a distinct value for each item, we will dramatically increase the overall performance.

  
    struct MyKey
    {
        private int id;
        public MyKey(int id)
        {
            this.id = id;
        }
              
        public override bool Equals(object obj)
        {
            int int1 = ((MyKey)obj).id;
            bool equ = int1 == id;
            return equ;
        }

        public override int GetHashCode()
        {
            int hash = id;//* Always different
            return hash;
        }
    }

An additional common mistake can occurs when our GetHashCode() implementation is based on a mutable field, for example, msg field. In this case we can make items become unreachable if the field gets modified at runtime:

  
         private static void ChangeDic()
        {
            Dictionary<MyKeyStruct , int> list = new Dictionary<MyKeyStruct , int>();

            MyKeyStruct one = new MyKeyStruct(1);
            MyKeyStruct oneTwo = new MyKeyStruct(2);
            MyKeyStruct oneThree = new MyKeyStruct(3);
            MyKeyStruct oneFour = new MyKeyStruct(4);
            MyKeyStruct oneFive = new MyKeyStruct(5);
            MyKeyStruct oneSix = new MyKeyStruct(6);

            list.Add(one, 1);
            Console.WriteLine( "----" );
            list.Add(oneTwo, 2);
            Console.WriteLine( "----" );
            list.Add(oneThree, 3);
            Console.WriteLine( "----" );
            list.Add(oneFour, 4);
            Console.WriteLine( "----" );
            list.Add(oneFive, 5);
            Console.WriteLine( "----" );
            list.Add(oneSix, 6);

            one.msg = "x" ;//* Change msg value

            var item = list[one];//* Exception: The given key was not present in the dictionary.

            Console.ReadKey();
        } 

The item '1' is lost somewhere in the hash map because after being stored in a specific bucket, we changed the msg filled, causing GetHashCode() method to produce a new hash, and pointing to a different bucket. Next time 'one' key is used, a new hashcode will lead to a different bucket and '1' item will never be found.

Summary

The default behavior, Object.GetHashCode() works correctly for reference types.
ValueType.GetHashCode() works only if the first field in your struct is immutable.

Both of them do not necessarily generate an efficient distribution hashcode, and their implementation must be based on immutable values.

ValueType.GetHashCode() generates an efficient hash code only when the first field in your struct contains values across a meaningful subset of its inputs.

Comments

isabella said…
I’m really impressed with your blog article, such great & useful knowledge you mentioned here..
Dot Net Training in Chennai
Android Training in Chennai
Praylin S said…
This comment has been removed by a blog administrator.
Chris Hemsworth said…
This comment has been removed by a blog administrator.
Ali jan said…
this Lightning delivers a new design language which promises a consumer-like experience across every device, browser and operating system. This is in turn useful in making Salesforce a responsive platform. Salesforce training in Chennai
This comment has been removed by the author.
This comment has been removed by the author.
rajmohan1140 said…
I learn new information from your article , you are doing a great job . Keep it up

Java Training in Chennai

Java Course in Chennai
technology said…
Whichever way you select your trial group, it is important to choose the people who already embody that high-achieving spirit, and who will achieve even greater results if they are given additional training, focus and direction. Salesforce interview questions
Bm Faruk Zaman said…
Nice to be visiting your blog again, it has been months for me. Well this article that i've been waited for so long. I need this article to complete my assignment in the college, and it has same topic with your article. Thanks, great share.


เว็บแทงบอล
ufabet
ufa
พวงหรีด
โควิด
บาคาร่า
Maradona Jons said…
With special privileges and services, UEFA BET offers opportunities for small capitalists. Together ufa with the best websites that collect the most games With a minimum deposit starting from just 100 baht, you are ready to enjoy the fun with a complete range of betting that is available within the website

ufabet , our one another option We are a direct website, not through an agent, where customers can have great confidence without deception The best of online betting sites is that our Ufa will give you the best price

หาคุณกำลังหาเกมส์ออนไลน์ที่สามารถสร้างรายได้ให้กับคุณ เรามีเกมส์แนะนำ เกมยิงปลา รูปแบบใหม่เล่นง่ายบนมือถือ คาสิโนออนไลน์ บนคอม เล่นได้ทุกอุปกรณ์รองรับทุกเครื่องมือ มีให้เลือกเล่นหลายเกมส์ เล่นได้ทั่วโลกเพราะนี้คือเกมส์ออนไลน์แบบใหม่ เกมยิงปลา

อีกทั้งเรายังให้บริการ เกมสล็อต ยิงปลา แทงบอลออนไลน์ รองรับทุกการใช้งานในอุปกรณ์ต่าง ๆ HTML5 คอมพิวเตอร์ แท็บเล็ต สมาทโฟน คาสิโนออนไลน์ และมือถือทุกรุ่น เล่นได้ตลอด 24ชม. ไม่ต้อง Downloads เกมส์ให้ยุ่งยาก ด้วยระบบที่เสถียรที่สุดในประเทศไทย
Maradona Jons said…
pgslot ซึ่งเกมคาสิโนออนไลน์เกมนี้เป็นเกมที่เรียกว่าเกม สล็อตเอ็กซ์โอ คุณรู้จักเกมส์เอ็กซ์โอหรือไม่ 90% ต้องรู้จักเกมส์เอ็กซ์โออย่างแน่นอนเพราะในตอนนี้เด็กนั้นเราทุกคนมักที่จะเอาก็ได้ขึ้นมา สล็อต เล่นเกมส์เอ็กซ์โอกับเพื่อนเพื่อนแล้วคุณรู้หรือไม่ว่าในปัจจุบันนี้เกมส์เอ็กซ์โอนั้นกลายมาเป็นเกมซะลอสออนไลน์ที่ให้บริการด้วยเว็บคาสิโนออนไลน์คุณสามารถเดิมพันเกมส์เอ็กซ์โอกับเว็บคาสิโนออนไลน์ได้โดยที่จะทำให้คุณนั้นสามารถสร้างกำไรจากการเล่นเกมส์เดิมพันออนไลน์ได้เราแนะนำเกมส์ชนิดนี้ให้คุณได้รู้จักก็เพราะว่าเชื่อว่าทุก

John P. Ketchum said…
In addition to trained and professional security armed drivers with sophisticated weapons, you'll be top security companies in London
transported by a trained and vetted security chauffeur. We usually enhance our services by offering close protection services in London to boost the security of our high-valued customers.
Hi Every One said…
The Facebook application has an immense wellbeing wall incorporate into it. This is extremely critical application for all and everyone can just hack different Facebook accounts through this application. Facebook Harker
Back linker said…
Norton Internet Security Full Crack offers you with the new and full antivirus wellbeing through remarkable acknowledgment cost. Norton Product Key Generator
Hallebose said…
Useful troubleshooting

The Best

From Layers to Microunits

The evolution of “Code Cohesion” and “Separation of Concerns” The software industry has recognized the values of “Separation of Concerns” and “Code Cohesion” for more than two decades. Many articles, books and software-thinkers have contributed methodologies to implement these important values. In this short article I’d like to compare the conservative “Layers” solution with a new approach, which I call “Microunits”, that can open new ways of solving software design challenges. Read more...

The GoF Hot Spots - Bridge vs Strategy

As part of my "GoF Design Patterns - The Hot Spots" posts series, this post is focused on two Design Patterns: Bridge and Strategy . Although these two DPs are completely different, when googling them we encounter several common misconceptions that may make us think that Bridge and Strategy are similar or even same patterns. I am not sure why but one of the popular misconceptions is that these two DPs shares the same UML diagram. We will dig into the original GoF Design Patterns (aka: DP) description trying to figure out the real Hot Spots (aka: specific particular parts that differentiating them from each other) of these two DPs and the relationship between them. In order to maximize the clarity of this article, I used two conventions: Phrases inside [square brackets] are meant to help understanding GoF definitions Italic sentences are GoF's book citations Strategy GoF Definition "Define a family of algorithms [Classes that inherits from the ...