Tuesday, March 04, 2014

Data Structures

Memory Use of Data Structures - Most to Least

HashSet
-- a wrapper around a HashMap with less functionality
-- default 16 elements
-- default size is 144 bytes
-- 128 bits to wrap hashmap
-- argument - why use because more memory, less functionality

HashMap
-- default size of 16
-- empty size is 128 bytes
-- each object 46 bytes + 36 bytes per entry
-- overhead ~360k (1/3MB to do nothing on creation)
-- not synchronized

Hashtable
-- same as HashMap in terms of memory on creation
-- 11 default entries
-- synchronized by default

LinkedList
--default size is 1
-- default size 48 bytes
-- 24 bytes per entry (no key/hash)

ArrayList
-- default size is 10
-- empty size is 88 bytes
-- for 10,000 item list overehead ~40k

Compresses references reduces amount of ram increase on 64 bit OS over 32 bit.

Speed of Collections

HashMaps offer faster access - you have the key to determine the location - but use more memory

LinkedLists and ArrayLists slower to search but use less memory

LinkedList - insertion and deletion are fast because you simply move the pointers at the point of insertion or deletion rather than moving the whole list. Search is slow.


Arrays are bad for insertion because you have to move all the items after the insertion point +1 on insert.

HashTables offer fast insertion and searching. Limited in size because based on arrays (should avoid resizing). Hard to sort.

Implementations
  
Binary Tree

O (log n)

a data structure where each node has at most two children
left node is less than right
children less than parent

Good for:
- insert, update, delete, search

Implementation

Three classes:

1. Node (key value pair)
2. Binary Node (extends node with right and left child)
3. Binary Tree

package com.radicalsoftware.datastructure;

public class Node {

    protected int key;
    protected String value;
   
    public Node(int key, String name){
        this.key = key;
        this.value = name;
    }
    
    public String getString(){
        return value;
    }
   
    public int getKey(){
        return key;
    }
}

package com.radicalsoftware.datastructure;

public class BinaryNode extends Node {

    private BinaryNode leftChild;
    private BinaryNode rightChild;
   
    public BinaryNode(int key, String name){
        super(key, name);
    }
        
    public void setRightChild(BinaryNode rightChild){
        this.rightChild = rightChild;
    }
   
    public void setLeftChild(BinaryNode leftChild){
        this.leftChild = leftChild;
    }
   
    public BinaryNode getRightChild(){
        return rightChild;
    }
   
    public BinaryNode getLeftChild(){
        return leftChild;
    }
   
    public String toString(){
        return key + " " + value;
    }
   
    public boolean hasChildren(){
        return leftChild != null && rightChild != null;
    }
}


public class BinaryTree {

    private static BinaryNode root;
   
    public static void main(String[] args) {
        BinaryTree bt = new BinaryTree();
        bt.addNode(4, "Mickey");
        bt.addNode(2, "Goofy");
        bt.addNode(3, "Minnie");
        bt.addNode(1, "Donald");
       
        BinaryNode node = root;
       
        System.out.println("*******************");
        System.out.println("in order traversal");
        inOrderTraversal(node);

        System.out.println("*******************");
        System.out.println("pre order traversal");
        preOrderTraversal(node);
       
        System.out.println("*******************");
        System.out.println("post order traversal");
        postOrderTraversal(node);
       
        System.out.println("*******************");
        System.out.println("find 3");
        System.out.print(find(3));
    }
   
    public static BinaryNode find(int key){
       
        BinaryNode searchNode = root;
       
        while(searchNode.key != key){
           
            if (key < searchNode.key){
                searchNode = searchNode.getLeftChild();
            }
            else{
                searchNode = searchNode.getRightChild();
            }
           
            if (searchNode == null) return null;
        }
       
        return searchNode;
    }
    public static void inOrderTraversal(BinaryNode node){
       
        if (node != null){
            inOrderTraversal(node.getLeftChild());
            System.out.println(node);
            inOrderTraversal(node.getRightChild());
        }
       
    }
   
    public static void preOrderTraversal(BinaryNode node){
       
        if (node != null){
            System.out.println(node);
            preOrderTraversal(node.getLeftChild());
            preOrderTraversal(node.getRightChild());
        }
   
    }
   
    public static void postOrderTraversal(BinaryNode node){
       
        if (node != null){
            postOrderTraversal(node.getLeftChild());
            postOrderTraversal(node.getRightChild());
            System.out.println(node);
        }
    }
   
    public BinaryNode getRoot(){
        return root;
    }
   
    public void addNode(int key, String name){
       
        BinaryNode newNode = new BinaryNode(key, name);
       
        if (root == null){
            root = newNode;
        }else{
           
            BinaryNode parent = root;
           
            while(true){
               
               
                if (key < parent.getKey()){
                    if (parent.getLeftChild() == null){
                        parent.setLeftChild(newNode);
                        break;
                    }
                    else{
                        parent = parent.getLeftChild();
                    }
                }else{
                    if (parent.getRightChild() == null){
                        parent.setRightChild(newNode);
                        break;
                    }else{
                        parent = parent.getRightChild();
                    }
               
                }
            }
        }
    }
}


Hash Table
- key values are assigned using a hash function
- hash determines best index
- IDs used to generate hash must be unique
- hashes must be unique (duplicate hashes when generating called collisions and can affect performance)
- hash must be small enough to fit within the bounds of the array
- hashes make it possible to find the value in the array using a calculation vs. search which  means it takes one operation to access the data vs having to go through every element in the worst case search

Avoiding to collisions
- To avoid collisions you want your table storing hashes to be at least twice as big as size of array if you are using a mod has function
- using prime number for array size helps minimize collisions
- also avoid clustering by using a double hash function

To resize
- store values in current array and empty spaces
- get prime number bigger than array size
- use a hash function to fill new array with new hashes

To hash strings
hash value starts at 0
for each character hash =  (hash * 27 + charCode) % arraySize

Separate chaining
- if there is already a value in a spot in the array add it to a list instead of searching for an open spot

Sample Code

Classes:
1. ArrayTools - some toools for arrays such as remove spaces & print elements
2. RadMath - some math functions
3. Hash - some hash functions (note string hash not used below)
4. HashTable - assumes all elements added to array are Strings that convert to unique integers

package com.radicalsoftware.datastructure;

import com.radicalsoftware.util.ArrayTools;
import com.radicalsoftware.util.Hash;
import com.radicalsoftware.util.RadMath;

//This class assumes the values are Strings that can be convereted
//to integers
public class HashTable {

    private String[] array;

    HashTable(int size){
        array = new String[size];
    }
   
    /*************************************************
     * Methods to fill hash table
     *************************************************/
   
    //the hash calculation = parse integer from String value
    public void fillTableHash1(String[] strings){
   
        for (int n = 0; n < strings.length; n++){
            String s = strings[n];
            int hash = Hash.keyToInt(s);
            array[hash] = s;
        }
   
    }
   
    public void fillTableHash2ShowCollisions(String[] strings){
      
        for (int i = 0; i < strings.length; i++){
          
            String s = strings[i];
            int hash = Hash.modHashShowCollisions(s, array);
            System.out.println("hash: " + hash);
            array[hash] = s;
        }
      
    }
   
    //hash 2, don't show collisions
    public void fillTableHash2(String[] strings){
      
        for (int i = 0; i < strings.length; i++){
          
            String s = strings[i];
            int hash = Hash.modHash(s, array);
            array[hash] = s;
        }
      
    }
   
    public void fillTableDoubleHashShowCollisions(String[] strings){
      
        for (int i = 0; i < strings.length; i++){
          
            String s = strings[i];
            int hash = Hash.doubleHash(s, array, 5);
            System.out.println("hash: " + hash);
            array[hash] = s;
        }
      
    }

    /*************************************************
     * Find value in hash table
     *************************************************/
   
    public int find(String key){
      
        int hash = Hash.modHashBase(key, array);
        int count = 1;

        //Exit on null or checked entire array
        while (this.array[hash] != null &&
               count != this.array.length){
      
            if (this.array[hash].equals(key))
                return hash;
          
            hash++;  
            hash %= this.array.length;
            count ++;
        }
      
        return -1;
      
    }
   
    public int findDoubleHash(String key){
      
        int hash = Hash.doubleHashBase(key, array, 5);
        int count = 1;

        //Exit on null or checked entire array
        while (     this.array[hash] != null &&
                 count != this.array.length){
      
            if (this.array[hash].equals(key))
                return hash;
          
            hash++;  
            hash %= this.array.length;
            count ++;
        }
      
        return -1;
      
    }
   
   
    /*************************************************
     * Resize HashTable
     *************************************************/
    public void resize(int size){
      
        ArrayTools.removeSpacesInArray(array);
         size = RadMath.getNextPrime(size);  
         String[] newArray = new String[size];
       
         for(int i = 0; i < array.length; i++){
           
             if(array[i] == null){
                 array = newArray;
                 return;
             }
           
             String s= array[i];
             int hash = Hash.modHash(s, newArray);
           
             while(newArray[hash]!= null){
                 hash++;
                 hash %= newArray.length;
             }
           
             newArray[hash] = s;
         }
       
    }
   
    /*************************************************
     * Print hash table
     *************************************************/
    public void print(){
        ArrayTools.print(array);
    }
   
}
 

package com.radicalsoftware.util;

public class ArrayTools {

    //remove spaces in original array
    //could also create an array list and copy
    //everything over but this uses the existing
    //allocated array
    public static void removeSpacesInArray(String[] array){
       
        int pointer = 0;
       
        for(int i = 0; i < array.length; i++){
       
            if (array[i] != null){
                if (i > pointer){
                    array[pointer] = array[i];
                    array[i] = null;
                   
                    while (array[pointer] != null
                            && pointer < array.length){
                        pointer ++;
                        pointer %= array.length;
                    }
                }
            }
        }
    }

   
    public static void print(String[] array){
   
        for (int i = 0; i < array.length; i++){
           
            System.out.print(i);
            System.out.print(" ");
            if(array[i]!= null){
                System.out.print(array[i]);
            }
           
            System.out.print(System.lineSeparator());
           
        }
       
        System.out.println(System.lineSeparator());
       
    }
}
 


package com.radicalsoftware.util;

public class Hash {

   
    /*************************************************
     * Hash calculations
     *************************************************/
   
    //calculate hash by converting to int
    //assumes all input strings for table are unique
    //and only up to size of array
    public static int keyToInt(String s){
        return Integer.parseInt(s);
    }
   
   
    //let's say we have 900 potential values but only ever store
    //up to 25 values - it is not efficient to create a 900 item array.
    //make the array size of number of values we need to store and hash
    //on up to 900 values in a way that won't cause collisions
    //Solution: use mod size of array
   
    public static int modHashBase(String s, String[] array){
        int hash = Integer.parseInt(s) % array.length;
      
        return hash;
    }
    public static int modHash(String s, String[] array){
   
        int hash = modHashBase(s, array);
      
        while(array[hash]!= null){
            System.out.print("collision: array index (" + hash + ") has a value.");
          
            hash++;
            //if end of array start over at 0
            hash %= array.length;
          
            System.out.println(" Try: " + hash);
        }
      
        return hash;
    }

   

    public static int modHashShowCollisions(String s, String[] array){
      
        int hash = Integer.parseInt(s) % array.length;
      
        //if collision add 1 to hash until no more collisions
        while(array[hash]!= null){
            System.out.print("collision: array index (" + hash + ") has a value.");
          
            hash++;
            //if end of array start over at 0
            hash %= array.length;
          
            System.out.println(" Try: " + hash);
        }
      
        return hash;
    }
   
    //avoid clusters
    public static int doubleHashBase(String s, String[] hashArray, int step){
        int val = Integer.parseInt(s);
        int hash = val % hashArray.length;
        int stepDistance = step -(val % step);
        hash = hash + stepDistance;
        hash %= hashArray.length;
        return hash;
    }
   
    public static int doubleHash(String s, String[] array, int step){
      
        int hash = doubleHashBase(s, array, step);
      
        while(array[hash]!= null){
            hash ++;
            hash %= array.length;
        }
      
        return hash;
    }
   
    public static int hashString(String s, String[] array){
        int hash = 0;
      
        for (int i = 0; i < s.length(); i++){
            int charCode = s.charAt(i) - 96;
            hash += hashCharacter(charCode, hash, array.length);
        }
      
        return hash;
    }
   
    public static int hashCharacter(int charCode, int hash, int arrayLength){
      
        return (hash * 27 + charCode) % arrayLength;
    }
}


Graph

O(V+E)

- collection of verticies and edges
- undirected - no directions on edges, pairs are unordered
- directed - edges associated with directions - edges are ordered pairs
- two vertices are adjacent if connected by an edge
- edge is incident to a vertex if the vertex is one of the two vertices that the edge connects
- degree is number of edges
- directed graphs - in degree (edges in), out degree (edges out)
- path -  sequence of vertices between to vertexes connected by edges
- length of path -- last node is length minus one so in the case of three nodes the first node is 0, second is 1, 3rd is 2 (length -1) so length = 3
- cycle - path that begins and ends at the same point
- simple path - path with no repeating vertices
- simple cycle - simple path plus the original point
- connectedness - connected if path between every single pair of verticies
- complete - an edge connecting every pair of vertices, not just a path
- adjacency list - better for less complete
- adjacency matrix - better for fuller graphs
- diameter of graph - from start to final state you are seeking (i.e. like solving a 2 x 2 Rubik's cube)

Adjacency Lists:
Used to represent a graph
Array - each element is a pointer to a linked list
Array is indexed by a vertex
Index arrays by verticies
Your array could be a hash table
For each vertex store it's neighbors

Array[a] = {b}
Array[b] = {a, c}
Array[c] = {a, b}

Undirected uses parentheses (I think)

So that would mean vertex a connects to b, b connects to a and c, c connects to a and b

http://www.youtube.com/watch?v=vfCo5A4HGKc
http://www.youtube.com/watch?v=s-CYnVz-uh4

Other Data Structures

Tries
Stack
Queue
Vector/ArrayLists

Array
Linked List

StringBuffer
default size is 16 characters
empty size is 72 bytes