Monday, February 6, 2012


Link List facts in modern computer system 

This question has been bugging me for the last few days. Why would anyone use a linked-list instead of arrays? I argue that linked lists have become irrelevant in the context of a modern computer system. I have asked around a few colleagues and I include their counter arguments and my rebuttal for your reference. Please point out if I am missing something.

Lists have several disadvantages that did not exist in traditional systems:

1.They reduce the benefit of out-of-order execution. 

Out-of-order execution gains a lot of benefit by parallelizing independent memory accesses. Accesses in linked lists are dependent as the address of each node is only known after the current element has been loaded into the cache. This effectively eliminates the benefit of out-of-order executing when traversing the list. 

2. They throw off hardware pre-fetching

Most processors today employ hardware pre-fetchers. Hardware pre-fetchers are good at pre-fetching regular data such as arrays but they are unable to pre-fetch linked data structures.

<    3.They reduce DRAM and TLB locality

Linked lists spread data elements all over the memory. Consecutive elements can be in different DRAM pages. Since modern DRAMs take 3x more cycles to fetch random memory locations, each memory access of a linked list is potentially more expensive (Explained in my post on What every programmer should know about the DRAM memory). For the same reason, a more dispersed memory footprint requires more pages to be brought into the DRAM memory which increases the usage of physical memory. Lastly, if the list is dispersed across more pages, there are a larger number of TLB misses with a list.

      4. They cannot leverage SIMD.

In operations like lookup, multiple elements of an array can be compared to a value using a single SIMD instruction. Since elements in a linked list have to be fetched one at a time, it is unable to use the power-efficient SIMD model.

      5.     They are harder to send of GPUs. 

Today’s GPUs have a different memory address space than the CPU. Thus, when sending a data over to the GPU, all pointers have to convert from one memory space to the other. Linked lists have lots of pointers which makes it considerable expensive to send then to the GPU. Note: This advantage will go away once GPUs share the virtual memory with the CPU. When I gave the above reasons to my colleagues, they gave me some counter arguments about why linked lists are better.
Below are their list and my rebuttal to each point

<1.    Re-sizable Linked lists can be sized dynamically.

  My argument: Array with amortized doubling has solved this problem.

<2.   Different element types Linked lists are superior to arrays as they allow each node to be of a   different type.

My argument: I agree except that this property is rarely exploited.

<3. Cheaper insertions at the ends it is cheaper to insert an element at the head or tail of a list.

 My argument: Arrays can be used with a head and tail pointers and buffers at both ends for faster insertions.

<4.Cheaper insertions in the middle in ordered lists, inserts are often needed in the middle of the data structure. A link list insert is cheaper as on average it requires traversing only 1/2 of the list, finding the right spot, and inserting the node. Inserting in an array seems expensive because it requires traversing half of the array to find the insertion point and then shifting the other half of the array to make space for the incoming element.

My argument: The analysis is naive. Arrays are actually faster. Improved memory-level parallelism, locality, pre-fetching, and the ability to use SIMD far offsets the fact that arrays requires 2x the operations.

<5.  Pointers in the middle we can have jump pointers to make list accesses faster.
     
      My argument: This is called a hash table, not a list.
Do you have any other arguments that support the linked lists?