@TeriRadichel | Cloud. Security. Software.: March 2014

Thursday, March 27, 2014

Java 8 Crash Course Notes

A summary of new things in Java 8.

This is a little rough. Check the rest of the links on my Twitter account for more in depth articles.

Lambda

() ->

Method reference

Allows retrofitting existing method in as a lambda argument.

Interfaces

- allows adding default methods to interface

- functional interfaces

Types of lambdas:

Predicate

Consumer

- performs function on argument passed into it

Suppliers

- provide things

New forEach:

Collections now have forEach method

coll.forEach((blah)

if(ship .... Etc

)

c.forEach((x)

x::method

)

streams / parallelStreams

- have to synchronize collection to use parallelStream - old way is faster - if done incorrectly. Correct way is to use new Collectors object. e.g. Collectors.toList(), Collectors.toSet()

Variables within a function are thread safe - supports parallel execution. Supports distribution of load across multiple cores.

JVM in Java 8 can run JavaScript. Can execute JavaScript from command line http://www.takipiblog.com/2014/02/10/java-8-compiling-lambda-expressions-in-the-new-nashorn-js-engine/

Nashorn to replace Rhino

New Java Time package.

LocalDate

.toEpochDay()

.plusDays(2)

Human readable time:

d.getYear()

d.getMonth()

Clock

c = Clock.systemUTC()

Timezone

Calculations

Time Gap

Optional

- optional gets you away from nulls

- deals with nullable return values

IDE Support

- NetBeans

- IntelliJ

Code coverage:

Jacoco

Most tools working

FindBugs fails

Sunday, March 23, 2014

Brother Printer Won't Print - Wireless Network

For future reference, because I always forget, if you are ever trying to figure out the IP address of your brother printer if and when it changes and your machines won't connect here's how:

Go to your printer and choose MENU- PRINT REPORTS - NETWORK CONFIG

http://www.brother-usa.com/FAQs/Solution.aspx?FAQID=200000039376&Model=1818&ProductID=MFCJ825DW&Keyword=#.Uy9NHD_n_IU

Get the IP address, go to control panel in Windows, edit the printer properties and you can hard code the correct IP address in the TCP/IP port assigned to the printer.

What is also curious when I print this report are all the different protocols that are enabled. I had no idea. Might be a good idea to turn off the ones you don't need:

TCP/IP
NetBIOS/IP (Most likely want to turn this off if possible)
LegacyAuth
Remote Set Up
Raw Port
Web Services
PC fax
FTP
mDNS
Proxy
IPv6
APIPA
SNMP
LPD
IPP
Network scan
POP3/SMTP
TFTP
LLMNR

Yeah... that's a lot of stuff not sure is all necessary in most cases. Would rather have most turned off by default. Will have to play around with turning on and off and see what happens.

Crypto - Coursera classes

To watch...in my spare time in addition to studying SANS material for grad school program in information security engineering. No rest...

https://class.coursera.org/crypto-preview/lecture

C# + ASPX Web App - 1 week of lunch hours

I spent my lunch hours over the past week helping a friend and co-worker build a one page web application to track software deployments. She knew some C# but not how to build a web site.

It's been ages since I built .NET web sites. I once built a .NET web site through my business ultimately for State of Washington Department of Printing. That was probably most extensive. It integrated with a Java library (itext) over RPC to generate proof of envelopes and letterhead - generating the bar codes was cool.

But I digress. The rest was C#. Login, product catalog, image management, workflow to create and save new types of envelopes and letterhead and send through multiple levels of approval. Once approved the item was available in the product catalog for diffrent state agencies to order. The site also facilitated these orders.

Then I moved to Java and more customized stuff.

But here are some notes from what I forgot and remembered this week:

1. To generate the code behind a control, create an ASPX control in the HTML. Then you can use intellisense or whatever Microsoft calls to by typing an attribute stating with "On" to see all your event handlers. When you choose one then type = and a name of C# function in double quotes. Visual Studio will ask if you want to generate the code behind function. You can say yes and you'll then find it has been created when you switch from ASPX to the c# code behind view.

2. ASPX tags force you to set the runat attribute. Which is funny because you always have to set the value to "server". I bet it's legacy.

3. You can create an ASPX tag for a data source. Then you can assign that data source to other ASPX tags. This is good for beginners - I started out trying to do the connection in the code behind. I wonder how they handle connection pooling in this case but studying too many other things to go find out.

4. To get your page to show all the data you hooked up to it you can put Page.DataBind in the code behind. You can also do that for specific controls like DataGrids.

5. To select a row in a GridView you can add a button for that purpose - see online examples don't have time to find right now. Then the selected item events are triggered when that button is clicked.

6. Why is populating a drop down list so convoluted - in and language. I was right about the selected item, index etc. Depends what you are trying to set the time to but yes is selected index = something.

7. You can reference the GET and POST method using Request object as you would expect in the code behind.

8. Microsoft automagically gives you handles to all your ASPX tag components in the code behind so if you have and ASPX text box named George you can set the value of it by saying George.Text="Code Monkey"; in the code behind with no other code needed to reference the object.

This was a simple one pager and a great refresher. The principles probably get you pretty far for simple in house apps.

Saturday, March 15, 2014

AWS S3 + Encryption: Protecting the Key

I'm playing around with storing encrypted files on AWS S3. In theory if you encrypt the data yourself before you put it on S3 it doesn't matter who accesses it. They won't be able to read it. This makes some assumptions about cryptography. If you don't buy into the axiom that proper encryption protects your data, then really it isn't safe anywhere except on a box that never is connected to any network ultimately accessible from the Internet. I am not sure how useful that would be in most cases.

As for not putting data on shared infrastructure I find this argument interesting because most companies send data across the Internet over shared infrastructure such as Frame Relay or MPLS. All their data is flowing over equipment shared with the whole Internet. It's encrypted.

Companies can classify their data to determine what they feel comfortable putting into the cloud. Once you know what you want to put there, bottom line is you need to encrypt in transit and at rest.

There are things that make running applications in the cloud trickier than data storage. In memory data is a challenge (think Target) and there are legal forensic issues but cloud providers can address these upon request. This is a topic I have researched as part of SANS Master of Information Security Engineering program - still inquiring about the details with vendors and research is still a work in progress.

CipherCloud has a solution that makes sense but I am still researching where and how keys are stored. There are also legal issues when it comes to where the data is stored, as noted in this article:

http://mobile.blogs.wsj.com/cio/2013/06/21/encryption-vendors-compliance-claim-questioned/

Some issues have also been raised about CipherCloud encryption techniques used in this blog post. I can't speak to the accuracy of this because I haven't used the product but I am aware that encrypting data points in such a way the semantics can help decipher the content is a problem.

http://crypto.stackexchange.com/questions/3645/how-is-ciphercloud-doing-homomorphic-encryption

According to this article some improvements were made, but again cannot speak to whether the problems were all solved.

http://www.techworld.com.au/article/528997/ciphercloud_adds_more_randomness/

But then encrypting your data is probably better than no encryption at all - unless it was a scenario where you are letting a Trojan horse into your environment. I'm sure whomever is using these solutions has considered all of that. I think what I am trying to do is much more simple.

If you are storing files encrypted in entirety before they hit the AWS network and not giving any third party your keys, this seems like a good candidate for a trial application on AWS.

Of course you'll also need to consider compliance issues. AWS has the greatest number of compliance certifications so likely they can help meet that requirement in a more cost effective manner.

As I posted on Twitter a while back (I thought it up all on my own but since have heard others say similar things):

Encrypting data and storing the key in plain sight on the same box is like locking your front door and hanging the key on the door knob.

Found this article on general rules for crytopgraphy and key management:

Cryptographic storage cheat sheet.

https://www.owasp.org/index.php/Cryptographic_Storage_Cheat_Sheet

Guide to cryptography:

https://www.owasp.org/index.php/Guide_to_Cryptography

Exploring latest research (and studying material from my last SANS course on cryptography) and will add more later.

Thursday, March 13, 2014

Estimating AWS Costs Using AWS Calculator

Tips for estimating AWS costs

Amazon Calculator:
Amazon Calculator

Gather requirements
Map requirements to services
Right-size service choices
Evaluate pricing-model options
Use the Simple Monthly Calculator
Deliver *estimate* (educated guess)

Gather requirements
- Hardest part
- Location
- Duration
- Availability
- Operating system
- Prioritize Compute and Storage

EC2 will be the majority of the cost

Choose Instance type
- Start with memory requirements and architecture type (32 / 64 bit)
- then choose virtual cores required
- then select between alternatives based on use-case

Reserved instances may be cost effective but pay for the life of the instance even if you cancel it.
- if rates go down you'll get the lower rate going forward, but no refund on prior usage

Spot instances are good for map reduce jobs - bid for an instance, pay a lower cost, but instances can be terminated at any time.

Look for peak IOPS storage requirements for various components of solution and size EBS pIOPS and choose EBS-Optimized instances accordingly. Look at logs for SAN, database usage, etc to determine usage.

- based on specific work load
- database will have different IOPS requirements at different times

Pick the right region -- prices vary.

Detailed monitoring is every 1 minute instead of every 5 minutes. Costs more.

Data transfer in is always free. Data transfer out is not. May be able to estimate data transfer out looking at firewalls.

Storage on instance is ephemeral - will go away with your instance. Storage on EBS volume will remain if stop instance so you'll want EBS storage in addition to instance storage for most applications.

When selecting an instance take a look at dives - SSD vs other and IO - High, Medium, etc.

Competitive Analysis - AWS

AWS rated highest in Gartner magic quadrant in both ability to deliver and completeness of vision.

This analysis includes Google, Azure, RackSpace, SoftLayer, vCHS and is based on a third party presentation. I cannot speak authoritatively on the accuracy of the statements below.

IAAS vs PAAS - latter has some complications in the realm of gathering forensic data.

AWS has been around 8 years - other services much less.

Http://aws.amazon.com/choosing-a-cloud-platform/

Amazon compared to Microsoft (Azure), RackSpace, vCHS, Google, SoftLayer:

AWS - Broadest Global footprint

AWS - More scalable than RackSpace, SoftLayer, vCHS

AWS has most security certifications.

AWS offers the most Instance types for specific app needs: high IO, memory, compute power, etc.

Pricing model - no sub hour but spot, reserved instances.

AWS does not have live migration - Google, vCHS do if you feel that is important.

DevOps -yes - google, Azure, SoftLayer - no

AWS offers hosted desktop - workspaces.

Hadoop - AWS has EMR (elastic map reduce). Azure offers HD Insight.

Benchmarks need to choose apples to apples instance types. Some benchmark papers aren't highest quality.

Pricing: most competitors are more except Google which has lower price in some areas.

Edit 3/28/2014 Scratch that. Google cut their prices in 1/3. Amazon turned around and dropped some of their per hour pricing below Google's. However Google still offers per minute pricing and a cap if instances are on all the time which Amazon's pricing model cannot quickly match. But we'll have to keep an eye on that based on discussion I had with someone today - so that could change by the time you read this. Here's the pricing at this time:

http://aws.typepad.com/aws/2014/03/aws-price-reduction-42-ec2-s3-rds-elasticache-and-elastic-mapreduce.html

Security is the number one concern companies have when considering using the cloud. Amazon is the largest online retailer and understands securing financial transactions.

Ability to innovate:

Try out scripting and spinning up environments on each platform.

Security is the number one barrier to entry for enterprises.

Compare security processes.

Considerations:

https://s3-us-west-2.amazonaws.com/i.radicalsoftware.com/presentations/evaluating_cloud_security.pptx

Prioritizing Projects for AWS

List assets

Determine dependencies

Classify your IT assets

- top secret, secret, public

- low, medium, high compliance

- internal, partner, customer facing

- low medium, high, coupling

- strict vs relaxed licensing

StackRank your assets

- underutilized it assets

- immediate need to scale

- running out of capacity

- easiest to move

- build support, creates awareness and excitement

Build proof of concept - great way to demonstrate value of AWS

AWS - Total Cost of Ownership (TCO)

Consider Total Cost of Ownership before moving to AWS. Without a TCO baseline for the organization cannot adequately compare and contrast AWS infrastructure.

Some considerations when comparing TCO of AWS and other models:

Capacity Planning, Utilization hard to predict. Do you buy for peak load?

Need to consider total cost of ownership, not just cost of a single piece of hardware.

Duplicating datacenter in the cloud is not apples to apples comparison.

Need comprehensive analysis across organization.

TCO should be determined over multiple iterations.

Apply cost benefits of automation.

Consider tiered pricing.

TCO includes power, cooling, adminstration, hardware, software maintenance, redundancy.

Time to market - time to get, set up hardware.

Cost of over capacity.

Cost of disappointed or lost customers if not enough capacity.

Thursday, March 06, 2014

Memory

Memory (Stack vs. Heap)

Java stores primitives on the stack, objects on the heap.

Pointers to objects are stored on the stack.

Elements stored on the heap are FIFO (first in first out).

When the stack memory is used up Java throws java.lang.StackOverFlowError

When the heap memory is used up Java throws java.lang.OutOfMemoryError

Recursion can fill up the stack

Variables on the stack are available only to the thread that owns them.

Objects on the heap are visible to all threads.

http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html

http://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap

http://www.vogella.com/tutorials/JavaPerformance/article.html

Videos - Algorithms & Data Structures

Java Videos

Some interesting Java videos

Java Heap (Memory)
http://www.youtube.com/watch?v=FLcXf9pO27w

Class Loaders
http://www.youtube.com/watch?v=t8sQw3pGJzM

Effective Java
http://www.youtube.com/watch?v=V1vQf4qyMXg

JVM and Low Latency
http://www.youtube.com/watch?v=CZytwF_y_x8

IOPS

I had an interesting discussion with one of my geek buddies last night. Someone told him to run something against a NAS that resulted in many small input and output operations (think map reduce reads and writes) and had some issues with that ...let's just say performance issues no big company wants to experience.

In the example above the NAS was not designed for a lot of small reads and writes.

This brings up an interesting point when considering performance of systems and determining how to architect the systems and write the software used by a particular piece of hardware. You have to understand the reads and writes your application requires and how the hardware you are running it on will handle it. For example if you are backing up data, you have large writes and minimal reads. A database or map reduce application will probably have a lot of small reads and writes.

Reading and writing data to/from disk is affected by IOPS.

IOPS = Input/Output Operations Per Second, pronounced eye-ops
http://en.wikipedia.org/wiki/IOPS

When you get an EC2 instance on AWS, for example, you can specified the IOPS. Understanding the reads and writes your application is doing will help you provision the correct amount of IOPS.

Understanding the devices in your network you are reading from and writing to will help you use the technology appropriately for a given application requirement.

Tuesday, March 04, 2014

Data Structures

Memory Use of Data Structures - Most to Least

HashSet
-- a wrapper around a HashMap with less functionality
-- default 16 elements
-- default size is 144 bytes
-- 128 bits to wrap hashmap
-- argument - why use because more memory, less functionality

HashMap
-- default size of 16
-- empty size is 128 bytes
-- each object 46 bytes + 36 bytes per entry
-- overhead ~360k (1/3MB to do nothing on creation)
-- not synchronized

Hashtable
-- same as HashMap in terms of memory on creation
-- 11 default entries
-- synchronized by default

LinkedList
--default size is 1
-- default size 48 bytes
-- 24 bytes per entry (no key/hash)

ArrayList
-- default size is 10
-- empty size is 88 bytes
-- for 10,000 item list overehead ~40k

Compresses references reduces amount of ram increase on 64 bit OS over 32 bit.

Speed of Collections

HashMaps offer faster access - you have the key to determine the location - but use more memory

LinkedLists and ArrayLists slower to search but use less memory

LinkedList - insertion and deletion are fast because you simply move the pointers at the point of insertion or deletion rather than moving the whole list. Search is slow.

Arrays are bad for insertion because you have to move all the items after the insertion point +1 on insert.

HashTables offer fast insertion and searching. Limited in size because based on arrays (should avoid resizing). Hard to sort.

Implementations

Binary Tree

O (log n)

a data structure where each node has at most two children
left node is less than right
children less than parent

Good for:
- insert, update, delete, search

Implementation

Three classes:

1. Node (key value pair)
2. Binary Node (extends node with right and left child)
3. Binary Tree

package com.radicalsoftware.datastructure;

public class Node {

    protected int key;
    protected String value;

    public Node(int key, String name){
       this.key = key;
       this.value = name;
    }

    public String getString(){
       return value;
    }

    public int getKey(){
       return key;
    }
}

package com.radicalsoftware.datastructure;

public class BinaryNode extends Node {

    private BinaryNode leftChild;
    private BinaryNode rightChild;

    public BinaryNode(int key, String name){
        super(key, name);
    }

    public void setRightChild(BinaryNode rightChild){
        this.rightChild = rightChild;
    }

    public void setLeftChild(BinaryNode leftChild){
        this.leftChild = leftChild;
    }

    public BinaryNode getRightChild(){
        return rightChild;
    }

    public BinaryNode getLeftChild(){
        return leftChild;
    }

    public String toString(){
        return key + " " + value;
    }

    public boolean hasChildren(){
        return leftChild != null && rightChild != null;
    }
}

public class BinaryTree {

    private static BinaryNode root;

    public static void main(String[] args) {
        BinaryTree bt = new BinaryTree();
        bt.addNode(4, "Mickey");
        bt.addNode(2, "Goofy");
        bt.addNode(3, "Minnie");
        bt.addNode(1, "Donald");

        BinaryNode node = root;

        System.out.println("*******************");
        System.out.println("in order traversal");
        inOrderTraversal(node);

        System.out.println("*******************");
        System.out.println("pre order traversal");
        preOrderTraversal(node);

        System.out.println("*******************");
        System.out.println("post order traversal");
        postOrderTraversal(node);

        System.out.println("*******************");
        System.out.println("find 3");
        System.out.print(find(3));
    }

    public static BinaryNode find(int key){

        BinaryNode searchNode = root;

        while(searchNode.key != key){

            if (key < searchNode.key){
                searchNode = searchNode.getLeftChild();
            }
            else{
                searchNode = searchNode.getRightChild();
            }

            if (searchNode == null) return null;
        }

        return searchNode;
    }
    public static void inOrderTraversal(BinaryNode node){

        if (node != null){
            inOrderTraversal(node.getLeftChild());
            System.out.println(node);
            inOrderTraversal(node.getRightChild());
        }

    }

    public static void preOrderTraversal(BinaryNode node){

        if (node != null){
            System.out.println(node);
            preOrderTraversal(node.getLeftChild());
            preOrderTraversal(node.getRightChild());
        }

    }

    public static void postOrderTraversal(BinaryNode node){

        if (node != null){
            postOrderTraversal(node.getLeftChild());
            postOrderTraversal(node.getRightChild());
            System.out.println(node);
        }
    }

    public BinaryNode getRoot(){
        return root;
    }

    public void addNode(int key, String name){

        BinaryNode newNode = new BinaryNode(key, name);

        if (root == null){
            root = newNode;
        }else{

            BinaryNode parent = root;

            while(true){


                if (key < parent.getKey()){
                    if (parent.getLeftChild() == null){
                        parent.setLeftChild(newNode);
                        break;
                    }
                    else{
                        parent = parent.getLeftChild();
                    }
                }else{
                    if (parent.getRightChild() == null){
                        parent.setRightChild(newNode);
                        break;
                    }else{
                        parent = parent.getRightChild();
                    }

                }
            }
        }
    }
}

Hash Table
- key values are assigned using a hash function
- hash determines best index
- IDs used to generate hash must be unique
- hashes must be unique (duplicate hashes when generating called collisions and can affect performance)
- hash must be small enough to fit within the bounds of the array
- hashes make it possible to find the value in the array using a calculation vs. search which means it takes one operation to access the data vs having to go through every element in the worst case search

Avoiding to collisions
- To avoid collisions you want your table storing hashes to be at least twice as big as size of array if you are using a mod has function
- using prime number for array size helps minimize collisions
- also avoid clustering by using a double hash function

To resize
- store values in current array and empty spaces
- get prime number bigger than array size
- use a hash function to fill new array with new hashes

To hash strings
hash value starts at 0
for each character hash = (hash * 27 + charCode) % arraySize

Separate chaining
- if there is already a value in a spot in the array add it to a list instead of searching for an open spot

Sample Code

Classes:
1. ArrayTools - some toools for arrays such as remove spaces & print elements
2. RadMath - some math functions
3. Hash - some hash functions (note string hash not used below)
4. HashTable - assumes all elements added to array are Strings that convert to unique integers

package com.radicalsoftware.datastructure;

import com.radicalsoftware.util.ArrayTools;
import com.radicalsoftware.util.Hash;
import com.radicalsoftware.util.RadMath;

//This class assumes the values are Strings that can be convereted
//to integers
public class HashTable {

    private String[] array;

    HashTable(int size){
       array = new String[size];
    }

    /*************************************************
    * Methods to fill hash table
    *************************************************/

    //the hash calculation = parse integer from String value
    public void fillTableHash1(String[] strings){

       for (int n = 0; n < strings.length; n++){
           String s = strings[n];
           int hash = Hash.keyToInt(s);
           array[hash] = s;
       }

    }

    public void fillTableHash2ShowCollisions(String[] strings){

       for (int i = 0; i < strings.length; i++){

           String s = strings[i];
           int hash = Hash.modHashShowCollisions(s, array);
           System.out.println("hash: " + hash);
           array[hash] = s;
       }

    }

    //hash 2, don't show collisions
    public void fillTableHash2(String[] strings){

       for (int i = 0; i < strings.length; i++){

           String s = strings[i];
           int hash = Hash.modHash(s, array);
           array[hash] = s;
       }

    }

    public void fillTableDoubleHashShowCollisions(String[] strings){

       for (int i = 0; i < strings.length; i++){

           String s = strings[i];
           int hash = Hash.doubleHash(s, array, 5);
           System.out.println("hash: " + hash);
           array[hash] = s;
       }

    }

    /*************************************************
    * Find value in hash table
    *************************************************/

    public int find(String key){

       int hash = Hash.modHashBase(key, array);
       int count = 1;

       //Exit on null or checked entire array
       while (this.array[hash] != null &&
               count != this.array.length){

           if (this.array[hash].equals(key))
               return hash;

           hash++;
           hash %= this.array.length;
           count ++;
       }

       return -1;

    }

    public int findDoubleHash(String key){

       int hash = Hash.doubleHashBase(key, array, 5);
       int count = 1;

       //Exit on null or checked entire array
       while (   this.array[hash] != null &&
               count != this.array.length){

           if (this.array[hash].equals(key))
               return hash;

           hash++;
           hash %= this.array.length;
           count ++;
       }

       return -1;

    }


    /*************************************************
    * Resize HashTable
    *************************************************/
    public void resize(int size){

       ArrayTools.removeSpacesInArray(array);
        size = RadMath.getNextPrime(size);
        String[] newArray = new String[size];

        for(int i = 0; i < array.length; i++){

            if(array[i] == null){
                array = newArray;
                return;
            }

            String s= array[i];
            int hash = Hash.modHash(s, newArray);

            while(newArray[hash]!= null){
                hash++;
                hash %= newArray.length;
            }

            newArray[hash] = s;
        }

    }

    /*************************************************
    * Print hash table
    *************************************************/
    public void print(){
       ArrayTools.print(array);
    }

}

package com.radicalsoftware.util;

public class ArrayTools {

    //remove spaces in original array
    //could also create an array list and copy
    //everything over but this uses the existing
    //allocated array
    public static void removeSpacesInArray(String[] array){

        int pointer = 0;

        for(int i = 0; i < array.length; i++){

            if (array[i] != null){
                if (i > pointer){
                    array[pointer] = array[i];
                    array[i] = null;

                    while (array[pointer] != null
                            && pointer < array.length){
                        pointer ++;
                        pointer %= array.length;
                    }
                }
            }
        }
    }


    public static void print(String[] array){

        for (int i = 0; i < array.length; i++){

            System.out.print(i);
            System.out.print(" ");
            if(array[i]!= null){
                System.out.print(array[i]);
            }

            System.out.print(System.lineSeparator());

        }

        System.out.println(System.lineSeparator());

    }
}

package com.radicalsoftware.util;

public class Hash {


    /*************************************************
    * Hash calculations
    *************************************************/

    //calculate hash by converting to int
    //assumes all input strings for table are unique
    //and only up to size of array
    public static int keyToInt(String s){
       return Integer.parseInt(s);
    }


    //let's say we have 900 potential values but only ever store
    //up to 25 values - it is not efficient to create a 900 item array.
    //make the array size of number of values we need to store and hash
    //on up to 900 values in a way that won't cause collisions
    //Solution: use mod size of array

    public static int modHashBase(String s, String[] array){
       int hash = Integer.parseInt(s) % array.length;

       return hash;
    }
    public static int modHash(String s, String[] array){

       int hash = modHashBase(s, array);

       while(array[hash]!= null){
           System.out.print("collision: array index (" + hash + ") has a value.");

           hash++;
           //if end of array start over at 0
           hash %= array.length;

           System.out.println(" Try: " + hash);
       }

       return hash;
    }



    public static int modHashShowCollisions(String s, String[] array){

       int hash = Integer.parseInt(s) % array.length;

       //if collision add 1 to hash until no more collisions
       while(array[hash]!= null){
           System.out.print("collision: array index (" + hash + ") has a value.");

           hash++;
           //if end of array start over at 0
           hash %= array.length;

           System.out.println(" Try: " + hash);
       }

       return hash;
    }

    //avoid clusters
    public static int doubleHashBase(String s, String[] hashArray, int step){
       int val = Integer.parseInt(s);
       int hash = val % hashArray.length;
       int stepDistance = step -(val % step);
       hash = hash + stepDistance;
       hash %= hashArray.length;
       return hash;
    }

    public static int doubleHash(String s, String[] array, int step){

       int hash = doubleHashBase(s, array, step);

       while(array[hash]!= null){
           hash ++;
           hash %= array.length;
       }

       return hash;
    }

    public static int hashString(String s, String[] array){
       int hash = 0;

       for (int i = 0; i < s.length(); i++){
           int charCode = s.charAt(i) - 96;
           hash += hashCharacter(charCode, hash, array.length);
       }

       return hash;
    }

    public static int hashCharacter(int charCode, int hash, int arrayLength){

       return (hash * 27 + charCode) % arrayLength;
    }
}

Graph

O(V+E)

- collection of verticies and edges
- undirected - no directions on edges, pairs are unordered
- directed - edges associated with directions - edges are ordered pairs
- two vertices are adjacent if connected by an edge
- edge is incident to a vertex if the vertex is one of the two vertices that the edge connects
- degree is number of edges
- directed graphs - in degree (edges in), out degree (edges out)
- path - sequence of vertices between to vertexes connected by edges
- length of path -- last node is length minus one so in the case of three nodes the first node is 0, second is 1, 3rd is 2 (length -1) so length = 3
- cycle - path that begins and ends at the same point
- simple path - path with no repeating vertices
- simple cycle - simple path plus the original point
- connectedness - connected if path between every single pair of verticies
- complete - an edge connecting every pair of vertices, not just a path
- adjacency list - better for less complete
- adjacency matrix - better for fuller graphs
- diameter of graph - from start to final state you are seeking (i.e. like solving a 2 x 2 Rubik's cube)

Adjacency Lists:
Used to represent a graph
Array - each element is a pointer to a linked list
Array is indexed by a vertex
Index arrays by verticies
Your array could be a hash table
For each vertex store it's neighbors

Array[a] = {b}
Array[b] = {a, c}
Array[c] = {a, b}

Undirected uses parentheses (I think)

So that would mean vertex a connects to b, b connects to a and c, c connects to a and b

http://www.youtube.com/watch?v=vfCo5A4HGKc
http://www.youtube.com/watch?v=s-CYnVz-uh4

Other Data Structures

Tries

Stack

Queue

Vector/ArrayLists

Array
Linked List

StringBuffer
default size is 16 characters
empty size is 72 bytes

Monday, March 03, 2014

Videos - Discrete Mathmematics and Linear Algebra

Discrete mathematics

Arsdigita 02 (Discrete Mathematics) Lecture 1/20
http://www.youtube.com/watch?v=h_9WjWENWV8

Arsdigita 02 (Discrete Mathematics) Lecture 2/20
http://www.youtube.com/watch?v=BrDto0heyaQ

Arsdigita 02 (Discrete Mathematics) Lecture 3/20
http://www.youtube.com/watch?v=t3XdRbPNtdg

Arsdigita 02 (Discrete Mathematics) Lecture 4/20
http://www.youtube.com/watch?v=J_vXRNMu1yQ

Lecture 5 not found

Arsdigita 02 (Discrete Mathematics) Lecture 6/20
http://www.youtube.com/watch?v=_FG9hhiZipo

Arsdigita 02 (Discrete Mathematics) Lecture 7/20
http://www.youtube.com/watch?v=uBKb1I70PhA

Arsdigita 02 (Discrete Mathematics) Lecture 8/20
http://www.youtube.com/watch?v=iT_Scp0hWWU

Arsdigita 02 (Discrete Mathematics) Lecture 9/20
http://www.youtube.com/watch?v=9C_ZD8Bqyjk

Arsdigita 02 (Discrete Mathematics) Lecture 10/20
http://www.youtube.com/watch?v=ZJqFHvBQthI

Arsdigita 02 (Discrete Mathematics) Lecture 11/20
http://www.youtube.com/watch?v=MLGTi0ewfBE

Arsdigita 02 (Discrete Mathematics) Lecture 12/20
http://www.youtube.com/watch?v=vWo892ut5Qo

Arsdigita 02 (Discrete Mathematics) Lecture 13/20
http://www.youtube.com/watch?v=0x_b_zPHy3c

Arsdigita 02 (Discrete Mathematics) Lecture 14/20
http://www.youtube.com/watch?v=ikRtC0FTv1k

Arsdigita 02 (Discrete Mathematics) Lecture 15/20
http://www.youtube.com/watch?v=oDEgQTxEjJ0

Arsdigita 02 (Discrete Mathematics) Lecture 16/20
http://www.youtube.com/watch?v=LzD4MqgGBJE

Arsdigita 02 (Discrete Mathematics) Lecture 17/20
http://www.youtube.com/watch?v=5gqQatmQvIc

Arsdigita 02 (Discrete Mathematics) Lecture 18/20
http://www.youtube.com/watch?v=WOZacrs5vT0

Arsdigita 02 (Discrete Mathematics) Lecture 19/20http://www.youtube.com/watch?v=z5vSAC9i3yo

Arsdigita 02 (Discrete Mathematics) Lecture 20/20
http://www.youtube.com/watch?v=Nt7JeGCZL5M

Linear Algebra

MIT - Linear Algebra 18.06

http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/

Sorting Algorithms

Some Notes on Sorting Algorithms...

Big O Complexity Cheat Sheet
http://bigocheatsheet.com/

Comparator Interface

When comparing two objects we'll be using the Comparator interface:
http://docs.oracle.com/javase/7/docs/api/java/util/Comparator.html

Comparator takes two arguments: left and right. Implementation is typically:

left < right = -1
left > right = 1
left = right = 0

String implements the Comparable interface and the examples below use Strings.
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html

So you can compare strings like this:

String s1 = "A";
String s2 = "B";
s1.compareTo(s2);

A < B = -1
A > B = 1
A = B = 0

This could be more generic than using Strings but using a particular data type makes the examples shorter for the moment.

Terminology

Stability: When two items with the same value stay in the same relative position after sort

In Place: Whether or not algorithm uses separate data structure to sort

Bubble Sort

Implementation:

- start at the beginning of the list
- compare the first and second items
- if the first is bigger than the second, swap
- repeat with each subsequent pair until end of list
- biggest item will be at the bottom
- start again at the beginning of the list and skip the last item as already sorted
- continue starting over at beginning of list and skipping already sorted items until all items sorted

Sample Code:

package com.radicalsoftware.example.sort;

import java.util.ArrayList;
import java.util.List;

public class BubbleSort {

   public static void main(String[] args) {

       ArrayList<String> list = new ArrayList<String>();
        list.add("Zero");
        list.add("Alpha");
        list.add("Cat");
        list.add("Boy");
        list.add("Dog");
        list.add("Rose");
        list.add("Lion");
        list.add("Fire");
        list.add("Dragon");
        list.add("Math");
        list.add("Logic");
       sort(list);

       for (String s: list){
           System.out.println(s);
       }

   }

   public static List<String> sort(List<String> list){

       assert list != null : "list cannot be null";

       int size = list.size();

       for (int pass = 1; pass < size; ++pass){
           for (int left = 0; left < (size - pass); ++left){
               int right = left + 1;
               if (list.get(left).compareTo(list.get(right)) > 0){
                   swap(list, left, right);
               }
           }
       }

       return list;

   }

   private static <T> void swap(List<T> list, int left, int right){
       T obj = list.get(left);
       list.set(left, list.get(right));
       list.set(right, obj);
   }


}

Selection Sort

- similar to sorting books on a book shelf
- unstable
- for each spot in the list
- scan all items in list to find the smallest (or largest) item
- put the smallest (or largest) item in that spot
- move to the next spot and repeat skipping already sorted items

Sample Code:

package com.radicalsoftware.sort;

import java.util.ArrayList;
import java.util.List;

public class SelectionSort {

    public static void main(String[] args) {

        ArrayList<String> list = new ArrayList<String>();
        list.add("Zero");
        list.add("Alpha");
        list.add("Cat");
        list.add("Boy");
        list.add("Dog");
        list.add("Rose");
        list.add("Lion");
        list.add("Fire");
        list.add("Dragon");
        list.add("Math");
        list.add("Logic");
        sort(list);

        for (String s: list){
            System.out.println(s);
        }


    }

    public static List<String> sort(List<String> list){

        assert list != null : "list cannot be null";

        int size = list.size();

        for (int slot = 0; slot < size-1; ++slot){

            int smallest = slot;

            //find the smallest in remaining items
            for (int checkSlot = slot + 1; checkSlot < size; ++checkSlot){
                if (list.get(slot).compareTo(list.get(checkSlot)) > 0){
                    smallest = checkSlot;
                    break;
                }
            }

            //if smallest does not equal current slot then swap
            if (smallest != slot)
                swap(list, smallest, slot);
        }

        return list;

    }

    private static <T> void swap(List<T> list, int left, int right){

        T obj = list.get(left);
        list.set(left, list.get(right));
        list.set(right, obj);

    }

}

Insertion Sort

- similar to sorting cards where you sort all the numbers within each suit
- create a new list and insert items into it
- use an iterator instead of an index
- insert the first item into the list
- use linked list so can scan and insert each item into correct relative location

Sample Code:

package com.radicalsoftware.sort;

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class InsertionSort {

    public static void main(String[] args) {

       ArrayList<String> list = new ArrayList<String>();

       list.add("Zero");
       list.add("Alpha");
       list.add("Cat");
       list.add("Boy");
       list.add("Dog");
       list.add("Rose");
       list.add("Lion");
       list.add("Fire");
       list.add("Dragon");
       list.add("Math");
       list.add("Logic");

       List<String> sortedList = sort(list);

       for (String s: sortedList){
           System.out.println(s);
       }


    }

    public static List<String> sort(List<String> list){

       assert list != null : "list cannot be null";

       final List<String> sortedList = new LinkedList<String>();

       for (String s: list){

           int slot = sortedList.size();

           while (slot > 0){
               if (s.compareTo(sortedList.get(slot - 1)) >= 0){
                   break;
               }
               --slot;
           }


           sortedList.add(slot, s);
       }


       return sortedList;

    }

}

Shell Sort

- breaks large lists into smaller lists
- based on concept of H-sorting - every Hth item is sorted relative to other items
- not stable

Sample Code:

package com.radicalsoftware.sort;

import java.util.ArrayList;
import java.util.List;

public class ShellSort {

    private static final int[] _increments = {1, 3, 5, 7, 11, 20};

    public static void main(String[] args) {

        ArrayList<String> list = new ArrayList<String>();

        list.add("Zero");
        list.add("Alpha");
        list.add("Cat");
        list.add("Boy");
        list.add("Dog");
        list.add("Rose");
        list.add("Lion");
        list.add("Fire");
        list.add("Dragon");
        list.add("Math");
        list.add("Logic");
        list.add("Money");
        list.add("Food");
        list.add("Stars");
        list.add("Moon");
        sort(list);

        for (String s: list){
            System.out.println(s);
        }


    }

    public static List<String> sort(List<String> list){

        assert list != null : "list cannot be null";

        for (int i = 0; i < _increments.length; i++){
            int increment = _increments[i];
            if (list.size() < increment * 2)
                break;
            hSort(list, increment);
        }

        return list;

    }

    private static void hSort(List<String> list, int increment){

        for (int startIndex = 0; startIndex < increment; ++startIndex){
            for (int i = startIndex + increment; i < list.size(); i += increment){

                String value = list.get(i);

                int j;

                for (j = i; j > startIndex; j -= increment){

                    String previousValue = list.get(j - increment);

                    if (value.compareTo(previousValue) >=0)
                        break;

                    list.set(j, previousValue);

                }

                list.set(j, value);

            }
        }
    }

}

Quick Sort

- uses recursion
- does the work first, then the recursion
- divide and conquer, recursively sorting smaller parts of list
- place an item in final position, smaller items left, larger items right
- divided into two parts (not necessarily two halves)
- not stable
- use quick sort until length of list is 10 or less and then use insertion sort

Best case: O(n Log(n)) -- partition element is middle element

Worst Case: O(n²) -- completely unbalanced

Implementation

- Choose partition item (last, mean of first, last and middle, etc.)
- indexes start at beginning and end (less the partition item) and advance towards each other
- left index stops when finds item larger than partitioning item
- right index stops when finds item smaller than partitioning item
- when both stop swap items
- point where indexes meet - insert partitioning item
- then sort the sublists to the right and left of the partitioning item

Sample Code:

package com.radicalsoftware.sort;

import java.util.ArrayList;
import java.util.List;

public class QuickSort {

    public static void main(String[] args) {

       ArrayList<String> list = new ArrayList<String>();
       list.add("Alpha");
       list.add("Zero");
       list.add("Cat");
       list.add("Boy");
       list.add("Dog");
       list.add("Rose");
       list.add("Lion");
       list.add("Fire");
       list.add("Dragon");
       list.add("Math");
       list.add("Logic");
       sort(list);

       for (String s: list){
           System.out.println(s);
       }


    }

    public static void sort(List<String> list){

       assert list != null : "list cannot be null";

       quicksort(list, 0, list.size()-1);

    }

    public static void quicksort(List<String> list, int startIndex, int endIndex){

       if (startIndex < 0 || endIndex >= list.size())
           return;

       if (endIndex <= startIndex)
           return;

       String value = list.get(endIndex);

       int partition = partition(list, value, startIndex, endIndex -1);

       if (list.get(partition).compareTo(value) < 0)
           ++partition;

       if (partition != endIndex)
           swap(list, partition, endIndex);

       //sort right and left of partition
       quicksort(list, startIndex, partition -1);
       quicksort(list, partition + 1, endIndex);

    }

    private static int partition(List<String> list, String value,
           int leftIndex, int rightIndex) {
       int left = leftIndex;
       int right = rightIndex;

       while (left < right){

           if(list.get(left).compareTo(value)< 0){
               ++left;
               continue;
           }

           if(list.get(right).compareTo(value)>=0){
               --right;
               continue;
           }

           swap(list, left, right);
           ++left;
       }

       return left;
    }

    private static void swap(List<String> list, int left, int right){
       String obj = list.get(left);
       list.set(left, list.get(right));
       list.set(right, obj);
    }
}

Merge Sort
- not in place

Implementation:

- divide the list in half recursively until you get to one item arrays
- merge each array together two at a time
-- pointer at the beginning of each two array merging
-- compare elements and put the smaller into a new array
-- move pointer of array you took element from +1
-- repeat for all elements
-- repeat for all arrays

package com.radicalsoftware.sort;

import java.util.ArrayList;
import java.util.List;

public class MergeSort {

    public static void main(String[] args) {

       ArrayList<String> list = new ArrayList<String>();

       list.add("Zero");
       list.add("Alpha");
       list.add("Cat");
       list.add("Boy");
       list.add("Dog");
       list.add("Rose");
       list.add("Lion");
       list.add("Fire");
       list.add("Dragon");
       list.add("Math");
       list.add("Logic");
       list.add("Salmon");

       List<String> sortedList = sort(list);

       for (String s: sortedList){
           System.out.println(s);
       }
    }

    public static List<String> sort(List<String> list){

       assert list != null : "list cannot be null";

       return mergesort(list, 0, list.size()-1);

    }

    private static List<String> mergesort(List<String> list, int startIndex,
        int endIndex) {



       int middleIndex = (startIndex + endIndex)/2;

       if (startIndex < endIndex){

           List<String> left = mergesort(list, startIndex, middleIndex);
           List<String> right = mergesort(list, middleIndex + 1, endIndex);

           return merge(left, right);

       }

       List<String> mergedList = new ArrayList<String>();
       mergedList.add(list.get(startIndex));
       return mergedList;



    }

    private static List<String> merge(List<String> left, List<String> right) {


       List<String> mergedList = new ArrayList<String>();

       int rightIndex = 0;
       int leftIndex = 0;

       while(leftIndex < left.size() || rightIndex < right.size()){

           boolean rightHasMore = rightIndex < right.size();
           boolean leftHasMore = leftIndex < left.size();

           if (!rightHasMore ||
                (leftHasMore &&
                left.get(leftIndex).compareTo(right.get(rightIndex)) <= 0)){
               mergedList.add(left.get(leftIndex));
               leftIndex++;
           }else{
               if(rightHasMore){
                   mergedList.add(right.get(rightIndex));
                   rightIndex++;
               }
           }


       }

       return mergedList;
    }


}

Heap Sort
- in place (uses less memory than mergesort which has to allocate additional storage)
- comparison based sort
- selection sort family
- slower than well implemented quick sort, but better worst case
- in place
- not stable

Implementation

An array viewed as a nearly complete binary tree (meaning a subsequent level is not started until the previous level is completely filled).

http://www.youtube.com/watch?v=B7hVxCmfPtM

Bucket, counting and radix sort:
http://www.youtube.com/watch?v=woTbZMli6z4

Bucket Sort
- stable sort
- range of keys can't be too big
- linear time

Counting Sort
- stable sort
- range of keys can't be too big
- linear time

Implementation:
- histogram
- indicies
- place

Radix sort
- stable
- only good when
- use counting sort trick on the least significant digit
- not in place
- linear time
- unlimited number of keys
- uses the stability of bucket sort and counting sort

Teri Radichel - Software Programmer and System Architect - Cloud + Security