How To Get Code Deployed

Here are some ideas to help get code deployed at work if you are a developer or QA:

* Decline extraneous meetings

* Make sure the business requirements are complete for the story you are working before you start.

* Focus on writing and testing code

* Ignore distractions and noise like complaints and process discussions that don't get software deployed.

* Avoid being a distraction by complaining or instigating dissent and conflict.

* Stop talking about all the work you aren't going to be able to get done. Sit down and do what you can.

* Lead by example. Do what's right for the company.

* If you have competing priorities, ask the people giving them to you to prioritize them and start from the top. Make it visible to all the priority list you have been given.

* Think about ways to get everyone working in parallel. Make sure you are not the bottleneck and blocking others from delivering value but rather contributing to them doing so successfully.

* Make new stories for scope creep. This is makes the new effort clear to business.

* Before nixing a deployment go over the pros and cons of releasing all or part of the work in a certain state with business and give them the option of releasing or not.

* Focus on must haves for the end result that delivers business value.

* Sit with people when it makes sense and fix problems together

* Collaborate to resolve problems. Focus on the resolution that delivers the most business value, rather than an idealistic scenario or the one that gets people to stop bringing it up if it is truly hurting the company.

* Call out impediments that can be fixed; deal with those that are part of the process creatively, if you can.

* Write unit tests when you can and while QA is testing; Don't be ivory tower about it and block the whole team.

* Have QA write automated tests while waiting for code drops so their testing is faster and repeatable.

* Find ways to seamlessly share data set up scripts for QA automation and developer unit tests to leverage people's work in more places and get more automated tests in place.

* Create system designs that are flexible to change.

* Design new systems that can run in parallel with old systems and can be turned on or off before or after deployment if something goes wrong.

* Structure code with appropriate interfaces and decoupled components so they can be independently unit tested and team members can work in parallel.

* Avoid monolithic designs with 5000 line stored procedures or classes that are a bottleneck to your project.

* Design the system so it can be deployed in pieces by thinking about how to break the system down into independent processes - which could ultimately become distributed services.

* Work towards a horizontally versus vertically scaling architecture, but make sure your management and visibility into all systems rolls up into a single source. Think correlation of logs and Splunk or a service for logging all actions by all systems.

* Break off most problematic bits of complex systems into decoupled, horizontally scaling component intstead of rewriting and deploying a whole new system in one shot.

* Automate everything. Make people's jobs easier and avoid mistakes.

* Avoid technology for technology's sake.

* Beware of making a new technology or set of libraries a dependency until you have done a proof of concept and completely vetted the technology to understand implications, limitations and security risks (published on security and other web sites). Do research. Test. Consider the cost of every project forced to use it going forward, including maintenance and production support. Consider the potential longevity of the technology vs. other options and industry trends so you can find people who want to work for you in the future.

* Be like Amazon. Use services for interfaces between systems.

* Measure and think about what will deliver the most profit to the bottom line. 

* And if you are like me and obsessive about delivering business value, and your schedule is flexible, adjust your schedule so you can work at least a couple hours when there are less people in the office and no meetings. 

* Working from home one day would be really nice but it would be helpful if the team could agree on which day because in office collaboration can be extremely helpful at times to move a project forward.

* Amazon tells employees to be self-critical. I like that. If you are wrong admit you made a mistake rather than obstinately protecting your ego and blocking delivery of maximum business value.

* Considerations for the process as a whole: http://websitenotebook.blogspot.com/2014/04/agile-and-scum-how-to-make-it-work.html

Agile and Scum - How To Make It Work

Agile is about deploying software that delivers business value - faster.

The purpose is not to put points on stories or prioritize backlogs or look at burndowns. Those things support the goal. If they don't you should stop doing them.

The point of scrum is to get the work done that gets you a deliverable that has business value into production in the shortest window possible. (As opposed to a waterfall process that drags on for months before shipping, if anything gets shipped at all).

Your best measure of a scrum team is how often they deploy code, what the value of that code is to the business (ROI on projects) and how much it costs to maintain their software (defects, complexity, error handling). Determine how much time and money the business saves (or pays) after system improvements vs. how much it cost to get them. Savings could be in the form of meeting business regulations so you don't get fined, making people's jobs take 25 less steps, or making information more secure, accurate and timely to get and keep more customers, for example. If you're putting a new gadget on your web site, make sure it's something customers want more than anything else - that will bring you new customers and keep the ones you have. The great thing about web sites is you can test customer response to changes and see what kind of ROI you get. The value of a scrum team is delivering small pieces of functionality in a timely manner so you can measure the response for small changes as opposed to a monolithic project which may sink a lot of money and not deliver the expected returns.

Businesses are about delivering value and making money.

Ultimately this all flows to your business's profit and loss statement, so make sure your measurements are aligned proportionately with their impact to the bottom line.

Here's how to make agile or scrum work

(or whatever you want to call a technical process that delivers value)

...in my opinion and in the little utopia in my head... but also based on some relevant experience delivering business value. I helped a start up grow from $1,000 per month to a multi-million dollar company delivering customer focused software in frequent iterations in ways that minimized cost.

#1 Measure ROI. Determine which teams are producing the most deliverables with the least problems in regular intervals, as well as the value of the deliverables to the business bottom line. Observe. Emulate those teams. (But don't get in their way and distract them while you are doing it.)

#2 Stop scheduling meetings to talk about agile, process and scrum. Discuss blocking issues and share ideas in the two meetings within the scrum framework which are for this purpose - Retrospective and Sprint Planning. The point of agile is to get work done. If you are having lots of meetings to discuss process, agile and scrum outside of these meetings then my personal view is that your company is not agile or using scrum.

#3 Read the guide on scrum.org. Use what works and throw out the rest. Stop trying to add things that aren't in there unless a team asks for them and are proven to deploy more high quality software faster. I should note that the scrum guide changed from it's original form and seems to have been embellished with potentially extraneous meetings (and confirmed by someone who claims to have helped alter documentation at scrum.org). Focus on what gets software with high ROI shipped. Don't get hung up on symantecs. Also note that a lot of consultants are riding the scrum wave and not helping your ROI. A recent thread on the Seattle Java User Group had a lot of complaints about this. Scrum is simple and that is the point. Consultants make it more complicated to charge you more money so you end up with more meetings and nonsense in scrum meetings that detracts from getting deliverables shipped. My sole job as scrum master was canning meetings that were not productive. That is how I help my team stay on track - getting things out of their way they didn't want to do in the first place.

#4 Scrum is about supporting the technical team. If the technical team says something doesn't work for them stop trying to make them do it. Especially listen to the team members that are delivering the most value to the company, not so much the people who are complaining but not getting anything done -- or the people who say yes to everything and aren't contributing what they could be otherwise. They would probably be happy to contribute more if allowed.

#5 If the team is getting what they want and still not delivering software in a timely manner, either you have a dev lead who doesn't have the experience to design projects for agile deployments (who will learn with time), no leadership or some people blocking the team who are not team players or not good at agile and slowing things down. Perhaps people have personal agendas which are not aligned with business objectives or team success. Make sure you cut through the politics to figure out who is really delivering value. In other words the problem could be the people not the process. Make sure the team can move forward in a way that does not create undue business risk (test and audit everything, have solid team players that know what they are doing and can support those that don't but can learn) and remove impediments - which may mean removing people or adjusting teams to find a good mix of people who work well together and can deliver.

#6 Yes. You need leadership. The leader for implementation of scrum projects should be a technical person if you want to get things done in a timely manner and not have things blow up in production. But your leader can't be a person with a solely technical mindset either. You need to find leaders who understand the value the project is delivering and aligns the technical implementation and cost with the business need. How do you find these leaders? Measure. See #1. Resolve the people issues #5. Some people think there should be no leader and that undermines the person who is the lead. That impedes the overall potential success of the team because it leads to lack of vision and competing objectives. Give your team a leader and back that leader up. If the leader is a good leader, they will not dictate except where there is a potentially dangerous risk or cost or lost opportunity. At that point the leader will press to do what is right for the business and with support the business will win. You can measure your leaders the same way you measure your teams and deliverables - not based on a political campaign, but who delivers value that translates to the bottom line.

#7 Unless you do not care about the cost of the project or the risk to the business, I disagree with letting anyone on the team do whatever they want. If you don't care about the cost of the project or a flawed system go for it. The project can take an eternity if your budget is open ended. Everyone will do their best and you may have to change the code six times but whatever. It may cost ten times more to support it because of poor error handling - or worse. You could be losing money you don't know about. You'll have to catch that through auditing on the business side. So here is where I diverge from the scrum guide. I've had to work until 11:00 PM for weeks to basically re-write flawed software given to someone that didn't have the skills to write it (a prior job). A good dev lead will understand the capabilities of his or her team members and give them work to let them learn and grow - or partition it into pieces within a framework that gives people opportunity without creating undue risk; In other words, without creating a situation where the business has things blow up in production, do costly rewrites, or implement something flawed such as a financial system that doesn't reconcile or a multi-threaded application with a tricky random bug. There are ways to safely structure projects and code and then there's chaos. And yes everyone makes mistakes but keeping them to a manageable amount is better, right? There's also a way to focus on testing instead of dictating - set up tests to determine if each team member's code works or not before you put it in production, rather than dictate how they do things - but if it's a free-for-all those tests are disregarded and bypassed because people can just do whatever they want.

#8. If someone has a new idea try it. Measure it. Test it. With one team. That is open to the idea or is having problems. Don't schedule meetings and training with all teams across the whole company for an unproven concept. Don't force teams that don't want to do it participate if they are functional, happy teams. Every team is different. Let each team do what works as long as within company requirements and getting things done. Let the right team test it - one that can make sure the implementation is efficient and easy to use as possible - if you plan to force it on all teams.  

#9. Continually trying to "improve" processes that are not broken and "solve" problems we don't have adds stress for teams that are trying to focus on getting their work done. Leave functional teams alone. If there is a problem, prove it with measurements to show which teams are delivering the most value to the business - and take into consideration which teams get which projects and why (such as skill or politics). Just because a team is delivering a lot of boring projects doesn't mean they aren't producing a lot of business value. Consider that if you don't hear about a team they are probably doing good work and that's why. But check the stats.

#10. Stop scheduling meetings FOR teams. Teams should be able to tell you when they need a meeting whether for current or future work. Meetings to help non-technical people figure out what the team is doing are the worst. Stop creating separate meetings for projects and work shops and scrum training. There are a handful of meetings in scrum and the team should choose when to have them, not the project manager, backlog owner or scum master - depending on where they are in the process of getting deliverables completed for the current sprint. Because that is the point - delivering software. Let people get work done. Ask the team if they need or want the meeting before you schedule it if the business needs a meeting for some reason. If you have a team of yes-people, check in your scrum tool to see where deliverables for the current sprint are at before scheduling it. Meetings with too many or the wrong people are also a problem. Having business people in a technical design meeting might be fun for the business people but if they aren't needed it's better to have them off gathering much needed business requirements. Business requirement gathering meetings might be interesting to team members but they need to be getting software deployed. Bring everyone together on the business and technical side when there is a true need - such as an early demo of the software to discuss design and requirements where the two sides merge. The team will also let you know when they are ready for more work or stories - and that is a bigger problem - and my next point.

#11. Business people should be focusing on business requirements. Technical teams should focus on implementing them. Define the business rules (not how the screens look or technical implementation) before taking up the team's time. The technical team can't tell the backlog owner what the business rules are - like we only run this process on Tuesdays and it needs to happen before 5 p.m. The backlog owner, analyst, project manager, etc are not the ones who design and build the technical implementation. Stop scheduling meetings for new projects when the business hasn't defined the rules and compliance hasn't signed off on them. Stop writing stories with technical specifications and system design. Do provide the team with the current end to end business process as it exists today before you start showing us the changes - and the changes as business requirements should be to process, not systems. The requirements are from the point of view of a non-technical person - what they see and interact with to support their work flow.  Consider User Experience, not just User Interface. At first I was skeptical of UX thinking it was just another buzzword, but now I am fan. Consider a person trying to transfer money into a brokerage account. They enter their information on a web site. Done right? Not even close. The experience is from the point of the person entering the data to the point it all assets - which may trickle in at different times - and cost basis which may also come later, shows up in the customer's account. That involves the experience of the people that support all of that behind the scenes in both customer service, business operations and possibly third party vendors and contra brokers. And by the way don't forget ALL the scenarios, like the one where the request to transfer goes off to the contra broker and into the ether and never gets completed. If you don't understand scenarios and use cases you are making projects cost more to the business. There are many articles online about these things. The acceptance criteria is what the system users will be able to do (not how or in which system). Think about the value to the user and the business. Define the business process in non-technical terms. The technical team will translate that into a system to support the process.

#12. Use tools appropriately to keep team focused on goal. Some people say people over tools. However if you have lightweight tools it's easy to get a grasp of where a project is at without disrupting the team with extraneous meeting time. Version One is easy to use -- If you keep it simple. People try to over complicate this as well. I'm not a fan of getting everyone in a room to write on pieces of paper or talking about work in general, non-quantifiable terms. And PLEASE don't make us play games and color with markers or do other such goofy things. Project managers like this kind of thing. Technical people hate it. (At least the technical people on my team do.)

Here's how we use Version One:

Two days before upcoming sprint (no meeting!) Dev and QA lead talk team members and divvy out stories. That takes 5-10 minutes. Each person tasks out their stories - at their desk. Assignments can change but it's a starting point. Technical people can equate this to a multi-threaded or distributed process getting things done in parallel vs. a single threaded bottleneck with 10 people in a meeting room. Which will get the job done faster?

On day before or first day of sprint we have a one hour Sprint Planning meeting to review the plan. People can suggest additional tasks and they are added on the fly. Stories can be reassigned or removed from sprint depending on capacity. The other benefit of this is it allows people to have time to think about the tasks between the entering and review. We also hijacked a non-existent field for drop date because QA needs enough time before end of sprint to test if we are going to commit. We fill in release dates if not present for things worked this sprint. There is no need to discuss any of this planning for the upcoming or future sprints in the middle of a sprint unless you are pointing stories (i.e. getting work done). It is a distraction and hurts the team's ability to meet sprint commitments. And it is not scrum. Planning happens in Sprint Planning and Retrospective is for discussing and adjusting your process if needed. I've noticed another long term meeting for planning was added to the scrum guide - if you must have that meeting do it in conjunction with Sprint Planning and don't disrupt the team in the middle of sprint when focused on completing deliverables and put deployment items at risk.

Throughout the sprint the team updates Version One. Story and task status (in progress, done, etc) and hours left. The team also enters release dates and can enter dependencies once they are deep enough into the code to have a technical deployment strategy. Questions about what is in Version One can be asked in stand up if not too disruptive, or addressed to the person the question is for at his or her desk rather than taking up the entire team's time.

Poker Pointing occurs throughout the sprint and we plug story points into Version One - but with the following general rules:

- In my utopia, as has been the case for some projects, we do all our estimating in poker pointing, not in multiple separate meetings for the same purpose. Do it once - it's just a guess anyway and your estimates way in advance will be completely wrong because typically the business hasn't figured out exactly what they want or gotten sign off from all the appropriate people (which then leads to re-estimating and worse if code is started - rewriting).

- Sometimes we revisit and re-point stories if they have changed significantly.

- QA is typically blocked waiting for Dev drops in the first half of a four week sprint so poker pointing is best in the second week of the sprint if things are going well, or possibly third if not so well. Perhaps you wait until everything made it through RC because QA is also slammed in the third week and your deployment is at risk. The team may choose to have a few long poker pointing sessions at the end instead - especially if you have stories in a good state and/or enough points for the upcoming sprint. (Refer to the statement at the top of the article.)

- If your team is slammed consider if you have enough points for the next sprint, in which case you might not want to disrupt and cause the current deployment slide for the sake of poker pointing. (Again, refer to the goal at the top of the article).

- We try to make sure the stories are ready to point before poker pointing and that we have enough of them. Because I currently work with the world's greatest analyst this is generally not a problem.  All team members can review stories before poker pointing if you have issues with this or just your leads to make sure stories are ready before dragging everyone into a room and disrupting their work.

Project managers and backlog owners can look in Version One any time to see the status of the project - instead of scheduling a meeting and disrupting the team. More time can be spent doing work instead of talking about work. If the business people want to know about specific stories or blocking issues, come to stand up - don't schedule yet another meeting. Additionally all the information in Version One can roll up into reports for business showing where projects and teams are at and upcoming work. Stop creating separate spreadsheets, reports, emails and busy work. Create automated reports from Version One that gives business the decision points they need to run the business cost-effectively and profitably with minimal work for the team and less disruptive questions to teams trying to get work done. See #13.

#13. Measure. This is so important I will say it twice. Looking at Version One but more importantly what gets delivered each sprint is far more accurate than scheduling a meeting without looking at the facts and hearing the old "we are 90% complete" line. If the team says they will meet commitments and there are 20 hours left in the sprint but 60 hours left in Version One to do you can avoid a meeting which is only going to make it worse, and address the issue in retrospective and Sprint Planning. Measure production incidents, business value in terms of cost savings, increased income and profitability, new customers, and whether more work is getting done with new systems (as opposed to making things more cumbersome). Some systems create more work but deliver value by increasing assets, revenue, or reducing risk. You need someone who can measure appropriately - not just throw out numbers

#14. The project needs to end. Agile doesn't lend itself to this concept very well without proper planning. This can be a downfall for agile when people don't understand how to make it work. You can think of agile projects as time-boxed. You have a list of things to get done and a budget. You'll get as many things on the list done as possible within that budget. The more up front planning you do and the more time you give your team to focus on deliverables, the more items in the list will get done. You don't have to have every screen designed at the start if the design is flexible, but you need your high priority business rules and processes documented and signed off on by decision makers and compliance before you start. New requirements = increased scope and cost in most cases. Incomplete decisions = project delays and increased cost. You need someone on the team that understands how to prioritize deployments to deliver business value early on and in a manner in which the project can be cut off at any point and not be a total loss. If you have high priority deliverables make sure they get deployed early and leave the nice to haves at the end. The other nice thing about deploying possibly incomplete but functional initial deployments is the ability to minimize risk (run old and new components in parallel or release non customer facing side first) and to get feedback before the targeted end of the project so you have time for changes within the project window.

#15. Remember why we have jobs. Ultimately the measurement of our work boils down to financial statements - profit and loss. If the business is doing well, we keep our jobs. People who drag entire teams into meetings to talk about work instead of doing work aren't focused on improving the bottom line. People who are constantly hopping on the latest buzzword bandwagon but don't understand the cost and value to business aren't really going to help your company in the long run. People on teams are happiest when they are productive and add can value to their full potential. Some have more potential than others perhaps, but everyone can be an important contributor to the overall workings of a company if they focus on the right things. They want to help your business grow. Let them. It will help the bottom line and lead to a more successful company - if you have the right people. The right people understand why we have jobs - because we are paid by a business that needs to make money and be profitable to do so. So I'm back where I started.


Agile is about deploying software that delivers business value - faster.


So get out of that meeting, walk to someone's desk to get your question answered, and get to work!

Java 8 Crash Course Notes

A summary of new things in Java 8. 

This is a little rough. Check the rest of the links on my Twitter account for more in depth articles.

Lambda
() ->

Method reference
::
Allows retrofitting existing method in as a lambda argument.

Interfaces
- allows adding default methods to interface
- functional interfaces

Types of lambdas:

Predicate

Consumer
- performs function on argument passed into it

Suppliers
- provide things

New forEach:

Collections now have forEach method

coll.forEach((blah)
      if(ship .... Etc
)

c.forEach((x)
   x::method 
)

streams / parallelStreams
- have to synchronize collection to use parallelStream - old way is faster - if done incorrectly.  Correct way is to use new Collectors object. e.g. Collectors.toList(), Collectors.toSet()

Variables within a function are thread safe - supports parallel execution. Supports distribution of load across multiple cores.

JVM in Java 8 can run JavaScript. Can execute JavaScript from command line http://www.takipiblog.com/2014/02/10/java-8-compiling-lambda-expressions-in-the-new-nashorn-js-engine/


Nashorn to replace Rhino


New Java Time package.

LocalDate
.toEpochDay()
.plusDays(2)

Human readable time:
d.getYear()
d.getMonth()

Clock
c = Clock.systemUTC()
Timezone
Calculations
Time Gap

Optional
- optional gets you away from nulls
- deals with nullable return values

IDE Support
- NetBeans
- IntelliJ

Code coverage:
Jacoco
Most tools working
FindBugs fails












Brother Printer Won't Print - Wireless Network

For future reference, because I always forget, if you are ever trying to figure out the IP address of your brother printer if and when it changes and your machines won't connect here's how:

Go to your printer and choose MENU- PRINT REPORTS - NETWORK CONFIG

http://www.brother-usa.com/FAQs/Solution.aspx?FAQID=200000039376&Model=1818&ProductID=MFCJ825DW&Keyword=#.Uy9NHD_n_IU

Get the IP address, go to control panel in Windows, edit the printer properties and you can hard code the correct IP address in the TCP/IP port assigned to the printer.

What is also curious when I print this report are all the different protocols that are enabled. I had no idea. Might be a good idea to turn off the ones you don't need:

TCP/IP
NetBIOS/IP (Most likely want to turn this off if possible)
LegacyAuth
Remote Set Up
Raw Port
Web Services
PC fax
FTP
mDNS
Proxy
IPv6
APIPA
SNMP
LPD
IPP
Network scan
POP3/SMTP
TFTP
LLMNR

Yeah... that's a lot of stuff not sure is all necessary in most cases. Would rather have most turned off by default. Will have to play around with turning on and off and see what happens.

Crypto - Coursera classes

To watch...in my spare time in addition to studying SANS material for grad school program in information security engineering. No rest...

https://class.coursera.org/crypto-preview/lecture

C# + ASPX Web App - 1 week of lunch hours

I spent my lunch hours over the past week helping a friend and co-worker build a one page web application to track software deployments. She knew some C# but not how to build a web site. 

It's been ages since I built .NET web sites. I once built a .NET web site through my business ultimately for State of Washington Department of Printing. That was probably most extensive. It integrated with a Java library (itext) over RPC to generate proof of envelopes and letterhead - generating the bar codes was cool.

But I digress. The rest was C#. Login, product catalog, image management, workflow to create and save new types of envelopes and letterhead and send through multiple levels of approval. Once approved the item was available in the product catalog for diffrent state agencies to order. The site also facilitated these orders.

Then I moved to Java and more customized stuff.

But here are some notes from what I forgot and remembered this week:

1. To generate the code behind a control, create an ASPX control in the HTML. Then you can use intellisense or whatever Microsoft calls to by typing an attribute stating with "On" to see all your event handlers. When you choose one then type = and a name of C# function in double quotes. Visual Studio will ask if you want to generate the code behind function. You can say yes and you'll then find it has been created when you switch from ASPX to the c# code behind view.

2. ASPX tags force you to set the runat attribute. Which is funny because you always have to set the value to "server". I bet it's legacy. 

3. You can create an ASPX tag for a data source. Then you can assign that data source to other ASPX tags. This is good for beginners - I started out trying to do the connection in the code behind. I wonder how they handle connection pooling in this case but studying too many other things to go find out.

4. To get your page to show all the data you hooked up to it you can put Page.DataBind in the code behind. You can also do that for specific controls like DataGrids.

5. To select a row in a GridView you can add a button for that purpose - see online examples don't have time to find right now. Then the selected item events are triggered when that button is clicked.

6. Why is populating a drop down list so convoluted - in and language. I was right about the selected item, index etc. Depends what you are trying to set the time to but yes is selected index = something.

7. You can reference the GET and POST method using Request object as you would expect in the code behind.

8. Microsoft automagically gives you handles to all your ASPX tag components in the code behind so if you have and ASPX text box named George you can set the value of it by saying George.Text="Code Monkey"; in the code behind with no other code needed to reference the object.

This was a simple one pager and a great refresher. The principles probably get you pretty far for simple in house apps.

AWS S3 + Encryption: Protecting the Key

I'm playing around with storing encrypted files on AWS S3. In theory if you encrypt the data yourself before you put it on S3 it doesn't matter who accesses it. They won't be able to read it. This makes some assumptions about cryptography. If you don't buy into the axiom that proper encryption protects your data, then really it isn't safe anywhere except on a box that never is connected to any network ultimately accessible from the Internet. I am not sure how useful that would be in most cases.

As for not putting data on shared infrastructure I find this argument interesting because most companies send data across the Internet over shared infrastructure such as Frame Relay or MPLS. All their data is flowing over equipment shared with the whole Internet. It's encrypted.

Companies can classify their data to determine what they feel comfortable putting into the cloud. Once you know what you want to put there, bottom line is you need to encrypt in transit and at rest. 

There are things that make running applications in the cloud trickier than data storage. In memory data is a challenge (think Target) and there are legal forensic issues but cloud providers can address these upon request. This is a topic I have researched as part of  SANS Master of Information Security Engineering program - still inquiring about the details with vendors and research is still a work in progress.

CipherCloud has a solution that makes sense but I am still researching where and how keys are stored. There are also legal issues when it comes to where the data is stored, as noted in this article:


Some issues have also been raised about CipherCloud encryption techniques used in this blog post. I can't speak to the accuracy of this because I haven't used the product but I am aware that encrypting data points in such a way the semantics can help decipher the content is a problem. 


According to this article some improvements were made, but again cannot speak to whether the problems were all solved.

http://www.techworld.com.au/article/528997/ciphercloud_adds_more_randomness/

But then encrypting your data is probably better than no encryption at all - unless it was a scenario where you are letting a Trojan horse into your environment. I'm sure whomever is using these solutions has considered all of that. I think what I am trying to do is much more simple.

If you are storing files encrypted in entirety before they hit the AWS network and not giving any third party your keys, this seems like a good candidate for a trial application on AWS.

Of course you'll also need to consider compliance issues. AWS has the greatest number of compliance certifications so likely they can help meet that requirement in a more cost effective manner.

As I posted on Twitter a while back (I thought it up all on my own but since have heard others say similar things):

Encrypting data and storing the key in plain sight on the same box is like locking your front door and hanging the key on the door knob.

Found this article on general rules for crytopgraphy and key management:

Cryptographic storage cheat sheet.

Exploring latest research (and studying material from my last SANS course on cryptography) and will add more later.

Estimating AWS Costs Using AWS Calculator

Tips for estimating AWS costs

Amazon Calculator:
Amazon Calculator

Gather requirements
Map requirements to services
Right-size service choices
Evaluate pricing-model options
Use the Simple Monthly Calculator
Deliver *estimate* (educated guess)

Gather requirements
- Hardest part
- Location
- Duration
- Availability
- Operating system
- Prioritize Compute and Storage

EC2 will be the majority of the cost

Choose Instance type
- Start with memory requirements and architecture type (32 / 64 bit)
- then choose virtual cores required
- then select between alternatives based on use-case

Reserved instances may be cost effective but pay for the life of the instance even if you cancel it.
- if rates go down you'll get the lower rate going forward, but no refund on prior usage

Spot instances are good for map reduce jobs - bid for an instance, pay a lower cost, but instances can be terminated at any time.

Look for peak IOPS storage requirements for various components of solution and size EBS pIOPS and choose EBS-Optimized instances accordingly. Look at logs for SAN, database usage, etc to determine usage.

- based on specific work load
- database will have different IOPS requirements at different times

Pick the right region -- prices vary.

Detailed monitoring is every 1 minute instead of every 5 minutes. Costs more.

Data transfer in is always free. Data transfer out is not. May be able to estimate data transfer out looking at firewalls.

Storage on instance is ephemeral - will go away with your instance. Storage on EBS volume will remain if stop instance so you'll want EBS storage in addition to instance storage for most applications.

When selecting an instance take a look at dives - SSD vs other and IO - High, Medium, etc.

Competitive Analysis - AWS

AWS rated highest in Gartner magic quadrant in both ability to deliver and completeness of vision.

This analysis includes Google, Azure, RackSpace, SoftLayer, vCHS and is based on a third party presentation. I cannot speak authoritatively on the accuracy of the statements below.

IAAS vs PAAS - latter has some complications in the realm of gathering forensic data.

AWS has been around 8 years - other services much less.

Http://aws.amazon.com/choosing-a-cloud-platform/

Amazon compared to Microsoft (Azure), RackSpace, vCHS, Google, SoftLayer:

AWS - Broadest Global footprint

AWS - More scalable than RackSpace, SoftLayer, vCHS

AWS has most security certifications.

AWS offers the most Instance types for specific app needs: high IO, memory, compute power, etc.

Pricing model - no sub hour but spot, reserved instances.

AWS does not have live migration - Google, vCHS do if you feel that is important.

DevOps -yes - google, Azure, SoftLayer - no

AWS offers hosted desktop - workspaces.

Hadoop - AWS has EMR (elastic map reduce). Azure offers HD Insight.

Benchmarks need to choose apples to apples instance types. Some benchmark papers aren't highest quality.

Pricing: most competitors are more except Google which has lower price in some areas.

Edit 3/28/2014 Scratch that. Google cut their prices in 1/3. Amazon turned around and dropped some of their per hour pricing below Google's. However Google still offers per minute pricing and a cap if instances are on all the time which Amazon's pricing model cannot quickly match. But we'll have to keep an eye on that based on discussion I had with someone today - so that could change by the time you read this. Here's the pricing at this time:

http://aws.typepad.com/aws/2014/03/aws-price-reduction-42-ec2-s3-rds-elasticache-and-elastic-mapreduce.html

Security is the number one concern companies have when considering using the cloud. Amazon is the largest online retailer and understands securing financial transactions.

Ability to innovate: 
Try out scripting and spinning up environments on each platform.

Security is the number one barrier to entry for enterprises.
Compare security processes. 
Considerations:

https://s3-us-west-2.amazonaws.com/i.radicalsoftware.com/presentations/evaluating_cloud_security.pptx





Prioritizing Projects for AWS

List assets

Determine dependencies

Classify your IT assets
- top secret, secret, public
- low, medium, high compliance
- internal, partner, customer facing
- low medium, high, coupling
- strict vs relaxed licensing

StackRank your assets
- underutilized it assets
- immediate need to scale
- running out of capacity
- easiest to move
- build support, creates awareness and excitement

Build proof of concept - great way to demonstrate value of AWS


AWS - Total Cost of Ownership (TCO)

Consider Total Cost of Ownership before moving to AWS. Without a TCO baseline for the organization cannot adequately compare and contrast AWS infrastructure.

Some considerations when comparing TCO of AWS and other models:

Capacity Planning, Utilization hard to predict. Do you buy for peak load?

Need to consider total cost of ownership, not just cost of a single piece of hardware.

Duplicating datacenter in the cloud is not apples to apples comparison.

Need comprehensive analysis across organization.

TCO should be determined over multiple iterations.

Apply cost benefits of automation.

Consider tiered pricing.

TCO includes power, cooling, adminstration, hardware, software maintenance, redundancy.

Time to market - time to get, set up hardware.

Cost of over capacity.

Cost of disappointed or lost customers if not enough capacity.

Memory

Memory (Stack vs. Heap)

Java stores primitives on the stack, objects on the heap.

Pointers to objects are stored on the stack.

Elements stored on the heap are FIFO (first in first out).

When the stack memory is used up Java throws  java.lang.StackOverFlowError

When the heap memory is used up Java throws java.lang.OutOfMemoryError
 
Recursion can fill up the stack

Variables on the stack are available only to the thread that owns them.


Objects on the heap are visible to all threads.

http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html 

http://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap

http://www.vogella.com/tutorials/JavaPerformance/article.html
 


Videos - Algorithms & Data Structures

Big O Notation

http://youtu.be/V6mKVRU1evU

Data Structures

Hash Table
http://www.youtube.com/watch?v=B4vqVDeERhI
http://www.youtube.com/watch?v=QWVBu_GlUgI
http://www.youtube.com/watch?v=SVsT7oG4ap8

Java Heaps
http://www.youtube.com/watch?v=eFCn6udv3gQ

Linked List
http://www.youtube.com/watch?v=195KUinjBpU
http://www.youtube.com/watch?v=iR5wyCaIayk

Stacks and Queues
http://www.youtube.com/watch?v=JvGZh_BdF-8

Java Binary Search Tree I & II
http://www.youtube.com/watch?v=M6lYob8STMI
http://www.youtube.com/watch?v=UcOxGmj45AA

Algorithms

Recursion
http://www.youtube.com/watch?v=neuDuf_i8Sg

Java Algorithms
http://www.youtube.com/watch?v=f5OD9CKrZEw

Shell Sort
http://www.youtube.com/watch?v=IlRyO9dXsYE

Quick Sort
http://www.youtube.com/watch?v=mN5ib1XasSA

Algorithms - Overview - Lecture 1
http://www.youtube.com/watch?v=gwlevtaC-u0

Algorithms - Sorting - Lecture 2
http://www.youtube.com/watch?v=odNJmw5TOEE

Algorithms - Sorting II - Lecture 3
http://www.youtube.com/watch?v=odNJmw5TOEE

Heap Sort
http://www.youtube.com/watch?v=B7hVxCmfPtM

Algorithms - Searching & Data Structures - Lecture 4
http://www.youtube.com/watch?v=1W3x0f_RmUo

Algorithms - Red-Black Trees - Lecture 5
http://www.youtube.com/watch?v=hm2GHwyKF1o

Graph Algorithms I - Topological Sorting, Prim's Algorithm - Lecture 6
http://www.youtube.com/watch?v=i_AQT_XfvD8

Graph Algorithms II - DFS, BFS, Kruskal's Algorithm, Union Find Data Structure - Lecture 7
http://www.youtube.com/watch?v=ufj5_bppBsA

Graph Algorithms III: Shortest Path - Lecture 8
http://www.youtube.com/watch?v=DiedsPsMKXc

Graph Alg. IV: Intro to geometric algorithms - Lecture 9
http://www.youtube.com/watch?v=XIAQRlNkJAw

Geometric Algorithms: Graham & Jarvis - Lecture 10
http://www.youtube.com/watch?v=J5aJEcOr6Eo

Dynamic Programming I - Lecture 11
http://www.youtube.com/watch?v=0EzHjQ_SOeU

Dynamic programming II - Lecture 12
http://www.youtube.com/watch?v=v1qiRwuJU7g

Parsing - Lecture 13
http://www.youtube.com/watch?v=gSsfoEjJ-Tc

Knapsack, Bandwidth Min. Intro: Greedy Algorithms - Lecture 14
http://www.youtube.com/watch?v=x0SJ_5A5MlA

Greedy Algs. II & Intro to NP Completeness - Lecture 15
http://www.youtube.com/watch?v=qcGnJ47Smlo

NP Completeness II & Reductions - Lecture 16
http://www.youtube.com/watch?v=e0tGC6ZQdQE

NP Completeness III - More Reductions - Lecutre 17
http://www.youtube.com/watch?v=fCX1BGT3wjE

NP Completeness IV - Lecture 18
http://www.youtube.com/watch?v=NKLDp3Rch3M

Approximation Algs. - Lecture 19
http://www.youtube.com/watch?v=oDniZCmNmNw

Solving Problems
http://www.youtube.com/watch?v=63BBWWsqsT0
http://www.youtube.com/watch?v=LswVAk59goM

Java Videos

Some interesting Java videos


Java Heap (Memory)
http://www.youtube.com/watch?v=FLcXf9pO27w

Class Loaders
http://www.youtube.com/watch?v=t8sQw3pGJzM

Effective Java
http://www.youtube.com/watch?v=V1vQf4qyMXg 

JVM and Low Latency
http://www.youtube.com/watch?v=CZytwF_y_x8

IOPS

I had an interesting discussion with one of my geek buddies last night. Someone told him to run something against a NAS that resulted in many small input and output operations (think map reduce reads and writes) and had some issues with that ...let's just say performance issues no big company wants to experience.

In the example above the NAS was not designed for a lot of small reads and writes.

This brings up an interesting point when considering performance of systems and determining how to architect the systems and write the software used by a particular piece of hardware. You have to understand the reads and writes your application requires and how the hardware you are running it on will handle it. For example if you are backing up data, you have large writes and minimal reads. A database or map reduce application will probably have a lot of small reads and writes.

Reading and writing data to/from disk is affected by IOPS.

IOPS = Input/Output Operations Per Second, pronounced eye-ops
http://en.wikipedia.org/wiki/IOPS

When you get an EC2 instance on AWS, for example, you can specified the IOPS. Understanding the reads and writes your application is doing will help you provision the correct amount of IOPS.

Understanding the devices in your network you are reading from and writing to will help you use the technology appropriately for a given application requirement.

Data Structures

Memory Use of Data Structures - Most to Least

HashSet
-- a wrapper around a HashMap with less functionality
-- default 16 elements
-- default size is 144 bytes
-- 128 bits to wrap hashmap
-- argument - why use because more memory, less functionality

HashMap
-- default size of 16
-- empty size is 128 bytes
-- each object 46 bytes + 36 bytes per entry
-- overhead ~360k (1/3MB to do nothing on creation)
-- not synchronized

Hashtable
-- same as HashMap in terms of memory on creation
-- 11 default entries
-- synchronized by default

LinkedList
--default size is 1
-- default size 48 bytes
-- 24 bytes per entry (no key/hash)

ArrayList
-- default size is 10
-- empty size is 88 bytes
-- for 10,000 item list overehead ~40k

Compresses references reduces amount of ram increase on 64 bit OS over 32 bit.

Speed of Collections

HashMaps offer faster access - you have the key to determine the location - but use more memory

LinkedLists and ArrayLists slower to search but use less memory

LinkedList - insertion and deletion are fast because you simply move the pointers at the point of insertion or deletion rather than moving the whole list. Search is slow.


Arrays are bad for insertion because you have to move all the items after the insertion point +1 on insert.

HashTables offer fast insertion and searching. Limited in size because based on arrays (should avoid resizing). Hard to sort.

Implementations
  
Binary Tree

O (log n)

a data structure where each node has at most two children
left node is less than right
children less than parent

Good for:
- insert, update, delete, search

Implementation

Three classes:

1. Node (key value pair)
2. Binary Node (extends node with right and left child)
3. Binary Tree

package com.radicalsoftware.datastructure;

public class Node {

    protected int key;
    protected String value;
   
    public Node(int key, String name){
        this.key = key;
        this.value = name;
    }
    
    public String getString(){
        return value;
    }
   
    public int getKey(){
        return key;
    }
}

package com.radicalsoftware.datastructure;

public class BinaryNode extends Node {

    private BinaryNode leftChild;
    private BinaryNode rightChild;
   
    public BinaryNode(int key, String name){
        super(key, name);
    }
        
    public void setRightChild(BinaryNode rightChild){
        this.rightChild = rightChild;
    }
   
    public void setLeftChild(BinaryNode leftChild){
        this.leftChild = leftChild;
    }
   
    public BinaryNode getRightChild(){
        return rightChild;
    }
   
    public BinaryNode getLeftChild(){
        return leftChild;
    }
   
    public String toString(){
        return key + " " + value;
    }
   
    public boolean hasChildren(){
        return leftChild != null && rightChild != null;
    }
}


public class BinaryTree {

    private static BinaryNode root;
   
    public static void main(String[] args) {
        BinaryTree bt = new BinaryTree();
        bt.addNode(4, "Mickey");
        bt.addNode(2, "Goofy");
        bt.addNode(3, "Minnie");
        bt.addNode(1, "Donald");
       
        BinaryNode node = root;
       
        System.out.println("*******************");
        System.out.println("in order traversal");
        inOrderTraversal(node);

        System.out.println("*******************");
        System.out.println("pre order traversal");
        preOrderTraversal(node);
       
        System.out.println("*******************");
        System.out.println("post order traversal");
        postOrderTraversal(node);
       
        System.out.println("*******************");
        System.out.println("find 3");
        System.out.print(find(3));
    }
   
    public static BinaryNode find(int key){
       
        BinaryNode searchNode = root;
       
        while(searchNode.key != key){
           
            if (key < searchNode.key){
                searchNode = searchNode.getLeftChild();
            }
            else{
                searchNode = searchNode.getRightChild();
            }
           
            if (searchNode == null) return null;
        }
       
        return searchNode;
    }
    public static void inOrderTraversal(BinaryNode node){
       
        if (node != null){
            inOrderTraversal(node.getLeftChild());
            System.out.println(node);
            inOrderTraversal(node.getRightChild());
        }
       
    }
   
    public static void preOrderTraversal(BinaryNode node){
       
        if (node != null){
            System.out.println(node);
            preOrderTraversal(node.getLeftChild());
            preOrderTraversal(node.getRightChild());
        }
   
    }
   
    public static void postOrderTraversal(BinaryNode node){
       
        if (node != null){
            postOrderTraversal(node.getLeftChild());
            postOrderTraversal(node.getRightChild());
            System.out.println(node);
        }
    }
   
    public BinaryNode getRoot(){
        return root;
    }
   
    public void addNode(int key, String name){
       
        BinaryNode newNode = new BinaryNode(key, name);
       
        if (root == null){
            root = newNode;
        }else{
           
            BinaryNode parent = root;
           
            while(true){
               
               
                if (key < parent.getKey()){
                    if (parent.getLeftChild() == null){
                        parent.setLeftChild(newNode);
                        break;
                    }
                    else{
                        parent = parent.getLeftChild();
                    }
                }else{
                    if (parent.getRightChild() == null){
                        parent.setRightChild(newNode);
                        break;
                    }else{
                        parent = parent.getRightChild();
                    }
               
                }
            }
        }
    }
}


Hash Table
- key values are assigned using a hash function
- hash determines best index
- IDs used to generate hash must be unique
- hashes must be unique (duplicate hashes when generating called collisions and can affect performance)
- hash must be small enough to fit within the bounds of the array
- hashes make it possible to find the value in the array using a calculation vs. search which  means it takes one operation to access the data vs having to go through every element in the worst case search

Avoiding to collisions
- To avoid collisions you want your table storing hashes to be at least twice as big as size of array if you are using a mod has function
- using prime number for array size helps minimize collisions
- also avoid clustering by using a double hash function

To resize
- store values in current array and empty spaces
- get prime number bigger than array size
- use a hash function to fill new array with new hashes

To hash strings
hash value starts at 0
for each character hash =  (hash * 27 + charCode) % arraySize

Separate chaining
- if there is already a value in a spot in the array add it to a list instead of searching for an open spot

Sample Code

Classes:
1. ArrayTools - some toools for arrays such as remove spaces & print elements
2. RadMath - some math functions
3. Hash - some hash functions (note string hash not used below)
4. HashTable - assumes all elements added to array are Strings that convert to unique integers

package com.radicalsoftware.datastructure;

import com.radicalsoftware.util.ArrayTools;
import com.radicalsoftware.util.Hash;
import com.radicalsoftware.util.RadMath;

//This class assumes the values are Strings that can be convereted
//to integers
public class HashTable {

    private String[] array;

    HashTable(int size){
        array = new String[size];
    }
   
    /*************************************************
     * Methods to fill hash table
     *************************************************/
   
    //the hash calculation = parse integer from String value
    public void fillTableHash1(String[] strings){
   
        for (int n = 0; n < strings.length; n++){
            String s = strings[n];
            int hash = Hash.keyToInt(s);
            array[hash] = s;
        }
   
    }
   
    public void fillTableHash2ShowCollisions(String[] strings){
      
        for (int i = 0; i < strings.length; i++){
          
            String s = strings[i];
            int hash = Hash.modHashShowCollisions(s, array);
            System.out.println("hash: " + hash);
            array[hash] = s;
        }
      
    }
   
    //hash 2, don't show collisions
    public void fillTableHash2(String[] strings){
      
        for (int i = 0; i < strings.length; i++){
          
            String s = strings[i];
            int hash = Hash.modHash(s, array);
            array[hash] = s;
        }
      
    }
   
    public void fillTableDoubleHashShowCollisions(String[] strings){
      
        for (int i = 0; i < strings.length; i++){
          
            String s = strings[i];
            int hash = Hash.doubleHash(s, array, 5);
            System.out.println("hash: " + hash);
            array[hash] = s;
        }
      
    }

    /*************************************************
     * Find value in hash table
     *************************************************/
   
    public int find(String key){
      
        int hash = Hash.modHashBase(key, array);
        int count = 1;

        //Exit on null or checked entire array
        while (this.array[hash] != null &&
               count != this.array.length){
      
            if (this.array[hash].equals(key))
                return hash;
          
            hash++;  
            hash %= this.array.length;
            count ++;
        }
      
        return -1;
      
    }
   
    public int findDoubleHash(String key){
      
        int hash = Hash.doubleHashBase(key, array, 5);
        int count = 1;

        //Exit on null or checked entire array
        while (     this.array[hash] != null &&
                 count != this.array.length){
      
            if (this.array[hash].equals(key))
                return hash;
          
            hash++;  
            hash %= this.array.length;
            count ++;
        }
      
        return -1;
      
    }
   
   
    /*************************************************
     * Resize HashTable
     *************************************************/
    public void resize(int size){
      
        ArrayTools.removeSpacesInArray(array);
         size = RadMath.getNextPrime(size);  
         String[] newArray = new String[size];
       
         for(int i = 0; i < array.length; i++){
           
             if(array[i] == null){
                 array = newArray;
                 return;
             }
           
             String s= array[i];
             int hash = Hash.modHash(s, newArray);
           
             while(newArray[hash]!= null){
                 hash++;
                 hash %= newArray.length;
             }
           
             newArray[hash] = s;
         }
       
    }
   
    /*************************************************
     * Print hash table
     *************************************************/
    public void print(){
        ArrayTools.print(array);
    }
   
}
 

package com.radicalsoftware.util;

public class ArrayTools {

    //remove spaces in original array
    //could also create an array list and copy
    //everything over but this uses the existing
    //allocated array
    public static void removeSpacesInArray(String[] array){
       
        int pointer = 0;
       
        for(int i = 0; i < array.length; i++){
       
            if (array[i] != null){
                if (i > pointer){
                    array[pointer] = array[i];
                    array[i] = null;
                   
                    while (array[pointer] != null
                            && pointer < array.length){
                        pointer ++;
                        pointer %= array.length;
                    }
                }
            }
        }
    }

   
    public static void print(String[] array){
   
        for (int i = 0; i < array.length; i++){
           
            System.out.print(i);
            System.out.print(" ");
            if(array[i]!= null){
                System.out.print(array[i]);
            }
           
            System.out.print(System.lineSeparator());
           
        }
       
        System.out.println(System.lineSeparator());
       
    }
}
 


package com.radicalsoftware.util;

public class Hash {

   
    /*************************************************
     * Hash calculations
     *************************************************/
   
    //calculate hash by converting to int
    //assumes all input strings for table are unique
    //and only up to size of array
    public static int keyToInt(String s){
        return Integer.parseInt(s);
    }
   
   
    //let's say we have 900 potential values but only ever store
    //up to 25 values - it is not efficient to create a 900 item array.
    //make the array size of number of values we need to store and hash
    //on up to 900 values in a way that won't cause collisions
    //Solution: use mod size of array
   
    public static int modHashBase(String s, String[] array){
        int hash = Integer.parseInt(s) % array.length;
      
        return hash;
    }
    public static int modHash(String s, String[] array){
   
        int hash = modHashBase(s, array);
      
        while(array[hash]!= null){
            System.out.print("collision: array index (" + hash + ") has a value.");
          
            hash++;
            //if end of array start over at 0
            hash %= array.length;
          
            System.out.println(" Try: " + hash);
        }
      
        return hash;
    }

   

    public static int modHashShowCollisions(String s, String[] array){
      
        int hash = Integer.parseInt(s) % array.length;
      
        //if collision add 1 to hash until no more collisions
        while(array[hash]!= null){
            System.out.print("collision: array index (" + hash + ") has a value.");
          
            hash++;
            //if end of array start over at 0
            hash %= array.length;
          
            System.out.println(" Try: " + hash);
        }
      
        return hash;
    }
   
    //avoid clusters
    public static int doubleHashBase(String s, String[] hashArray, int step){
        int val = Integer.parseInt(s);
        int hash = val % hashArray.length;
        int stepDistance = step -(val % step);
        hash = hash + stepDistance;
        hash %= hashArray.length;
        return hash;
    }
   
    public static int doubleHash(String s, String[] array, int step){
      
        int hash = doubleHashBase(s, array, step);
      
        while(array[hash]!= null){
            hash ++;
            hash %= array.length;
        }
      
        return hash;
    }
   
    public static int hashString(String s, String[] array){
        int hash = 0;
      
        for (int i = 0; i < s.length(); i++){
            int charCode = s.charAt(i) - 96;
            hash += hashCharacter(charCode, hash, array.length);
        }
      
        return hash;
    }
   
    public static int hashCharacter(int charCode, int hash, int arrayLength){
      
        return (hash * 27 + charCode) % arrayLength;
    }
}


Graph

O(V+E)

- collection of verticies and edges
- undirected - no directions on edges, pairs are unordered
- directed - edges associated with directions - edges are ordered pairs
- two vertices are adjacent if connected by an edge
- edge is incident to a vertex if the vertex is one of the two vertices that the edge connects
- degree is number of edges
- directed graphs - in degree (edges in), out degree (edges out)
- path -  sequence of vertices between to vertexes connected by edges
- length of path -- last node is length minus one so in the case of three nodes the first node is 0, second is 1, 3rd is 2 (length -1) so length = 3
- cycle - path that begins and ends at the same point
- simple path - path with no repeating vertices
- simple cycle - simple path plus the original point
- connectedness - connected if path between every single pair of verticies
- complete - an edge connecting every pair of vertices, not just a path
- adjacency list - better for less complete
- adjacency matrix - better for fuller graphs
- diameter of graph - from start to final state you are seeking (i.e. like solving a 2 x 2 Rubik's cube)

Adjacency Lists:
Used to represent a graph
Array - each element is a pointer to a linked list
Array is indexed by a vertex
Index arrays by verticies
Your array could be a hash table
For each vertex store it's neighbors

Array[a] = {b}
Array[b] = {a, c}
Array[c] = {a, b}

Undirected uses parentheses (I think)

So that would mean vertex a connects to b, b connects to a and c, c connects to a and b

http://www.youtube.com/watch?v=vfCo5A4HGKc
http://www.youtube.com/watch?v=s-CYnVz-uh4

Other Data Structures

Tries
Stack
Queue
Vector/ArrayLists

Array
Linked List

StringBuffer
default size is 16 characters
empty size is 72 bytes

Videos - Discrete Mathmematics and Linear Algebra

Discrete mathematics

Arsdigita 02 (Discrete Mathematics) Lecture 1/20
http://www.youtube.com/watch?v=h_9WjWENWV8

Arsdigita 02 (Discrete Mathematics) Lecture 2/20
http://www.youtube.com/watch?v=BrDto0heyaQ

Arsdigita 02 (Discrete Mathematics) Lecture 3/20
http://www.youtube.com/watch?v=t3XdRbPNtdg

Arsdigita 02 (Discrete Mathematics) Lecture 4/20
http://www.youtube.com/watch?v=J_vXRNMu1yQ

Lecture 5 not found

Arsdigita 02 (Discrete Mathematics) Lecture 6/20
http://www.youtube.com/watch?v=_FG9hhiZipo

Arsdigita 02 (Discrete Mathematics) Lecture 7/20
http://www.youtube.com/watch?v=uBKb1I70PhA

Arsdigita 02 (Discrete Mathematics) Lecture 8/20
http://www.youtube.com/watch?v=iT_Scp0hWWU

Arsdigita 02 (Discrete Mathematics) Lecture 9/20
http://www.youtube.com/watch?v=9C_ZD8Bqyjk

Arsdigita 02 (Discrete Mathematics) Lecture 10/20
http://www.youtube.com/watch?v=ZJqFHvBQthI

Arsdigita 02 (Discrete Mathematics) Lecture 11/20
http://www.youtube.com/watch?v=MLGTi0ewfBE

Arsdigita 02 (Discrete Mathematics) Lecture 12/20
http://www.youtube.com/watch?v=vWo892ut5Qo

Arsdigita 02 (Discrete Mathematics) Lecture 13/20
http://www.youtube.com/watch?v=0x_b_zPHy3c

Arsdigita 02 (Discrete Mathematics) Lecture 14/20
http://www.youtube.com/watch?v=ikRtC0FTv1k

Arsdigita 02 (Discrete Mathematics) Lecture 15/20
http://www.youtube.com/watch?v=oDEgQTxEjJ0

Arsdigita 02 (Discrete Mathematics) Lecture 16/20
http://www.youtube.com/watch?v=LzD4MqgGBJE

Arsdigita 02 (Discrete Mathematics) Lecture 17/20
http://www.youtube.com/watch?v=5gqQatmQvIc

Arsdigita 02 (Discrete Mathematics) Lecture 18/20
http://www.youtube.com/watch?v=WOZacrs5vT0

Arsdigita 02 (Discrete Mathematics) Lecture 19/20http://www.youtube.com/watch?v=z5vSAC9i3yo

Arsdigita 02 (Discrete Mathematics) Lecture 20/20
http://www.youtube.com/watch?v=Nt7JeGCZL5M

Linear Algebra

http://www.youtube.com/watch?v=ZK3O402wf1c

More on YouTube

Sorting Algorithms

Some Notes on Sorting Algorithms...

Big O Complexity Cheat Sheet
http://bigocheatsheet.com/

Comparator Interface

When comparing two objects we'll be using the Comparator interface:
http://docs.oracle.com/javase/7/docs/api/java/util/Comparator.html

Comparator takes two arguments: left and right. Implementation is typically:

left < right = -1
left > right = 1
left = right = 0

String implements the Comparable interface and the examples below use Strings.
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html

So you can compare strings like this:

String s1 = "A";
String s2 = "B";
s1.compareTo(s2);

A < B = -1
A > B = 1
A = B = 0

This could be more generic than using Strings but using a particular data type makes the examples shorter for the moment.

Terminology

Stability: When two items with the same value stay in the same relative position after sort

In Place:  Whether or not algorithm uses separate data structure to sort

Bubble Sort

Implementation:

- start at the beginning of the list
- compare the first and second items
- if the first is bigger than the second, swap
- repeat with each subsequent pair until end of list
- biggest item will be at the bottom
- start again at the beginning of the list and skip the last item as already sorted
- continue starting over at beginning of list and skipping already sorted items until all items sorted

Sample Code:

package com.radicalsoftware.example.sort;

import java.util.ArrayList;
import java.util.List;

public class BubbleSort {

    public static void main(String[] args) {
      
        ArrayList<String> list = new ArrayList<String>();
        
        list.add("Zero");
        list.add("Alpha");
        list.add("Cat");
        list.add("Boy");
        list.add("Dog");
        list.add("Rose");
        list.add("Lion");
        list.add("Fire");
        list.add("Dragon");
        list.add("Math");
        list.add("Logic");

        sort(list);
      
        for (String s: list){
            System.out.println(s);
        }
   
    }
  
    public static List<String> sort(List<String> list){
      
        assert list != null : "list cannot be null";
      
        int size = list.size();
      
        for (int pass = 1; pass < size; ++pass){
            for (int left = 0; left < (size - pass); ++left){
                int right = left + 1;
                if (list.get(left).compareTo(list.get(right)) > 0){
                    swap(list, left, right);
                }
            }
        }
      
        return list;
      
    }
  
    private static <T> void swap(List<T> list, int left, int right){
        T obj = list.get(left);
        list.set(left,  list.get(right));
        list.set(right, obj);
    }
  

}



Selection Sort

- similar to sorting books on a book shelf
- unstable
- for each spot in the list
- scan all items in list to find the smallest (or largest) item
- put the smallest (or largest) item in that spot
- move to the next spot and repeat skipping already sorted items

Sample Code:

package com.radicalsoftware.sort;

import java.util.ArrayList;
import java.util.List;

public class SelectionSort {

    public static void main(String[] args) {
       
        ArrayList<String> list = new ArrayList<String>();
        list.add("Zero");
        list.add("Alpha");
        list.add("Cat");
        list.add("Boy");
        list.add("Dog");
        list.add("Rose");
        list.add("Lion");
        list.add("Fire");
        list.add("Dragon");
        list.add("Math");
        list.add("Logic");
        sort(list);
       
        for (String s: list){
            System.out.println(s);
        }
       
   
    }
   
    public static List<String> sort(List<String> list){
       
        assert list != null : "list cannot be null";
       
        int size = list.size();
       
        for (int slot = 0; slot < size-1; ++slot){
           
            int smallest = slot;
           
            //find the smallest in remaining items
            for (int checkSlot = slot + 1; checkSlot < size; ++checkSlot){
                if (list.get(slot).compareTo(list.get(checkSlot)) > 0){
                    smallest = checkSlot;
                    break;
                }
            }
           
            //if smallest does not equal current slot then swap
            if (smallest != slot)
                swap(list, smallest, slot);
        }
       
        return list;
       
    }
   
    private static <T> void swap(List<T> list, int left, int right){
       
        T obj = list.get(left);
        list.set(left,  list.get(right));
        list.set(right, obj);
   
    }

}


Insertion Sort

- similar to sorting cards where you sort all the numbers within each suit
- create a new list and insert items into it
- use an iterator instead of an index
- insert the first item into the list
- use linked list so can scan and insert each item into correct relative location

Sample Code:

package com.radicalsoftware.sort;

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class InsertionSort {

    public static void main(String[] args) {
      
        ArrayList<String> list = new ArrayList<String>();
      
        list.add("Zero");
        list.add("Alpha");
        list.add("Cat");
        list.add("Boy");
        list.add("Dog");
        list.add("Rose");
        list.add("Lion");
        list.add("Fire");
        list.add("Dragon");
        list.add("Math");
        list.add("Logic");
      
        List<String> sortedList = sort(list);
      
        for (String s: sortedList){
            System.out.println(s);
        }
      
   
    }
   
    public static List<String> sort(List<String> list){
      
        assert list != null : "list cannot be null";
      
        final List<String> sortedList = new LinkedList<String>();
      
        for (String s: list){
   
            int slot = sortedList.size();
      
            while (slot > 0){
                if (s.compareTo(sortedList.get(slot - 1)) >= 0){
                    break;
                }
                --slot;
            }
          
          
            sortedList.add(slot, s);
        }
      
      
        return sortedList;
      
    }
   
}


Shell Sort

- breaks large lists into smaller lists
- based on concept of H-sorting - every Hth item is sorted relative to other items
- not stable

Sample Code: 

package com.radicalsoftware.sort;

import java.util.ArrayList;
import java.util.List;

public class ShellSort {

    private static final int[] _increments = {1, 3, 5, 7, 11, 20};
   
    public static void main(String[] args) {
       
        ArrayList<String> list = new ArrayList<String>();
       
        list.add("Zero");
        list.add("Alpha");
        list.add("Cat");
        list.add("Boy");
        list.add("Dog");
        list.add("Rose");
        list.add("Lion");
        list.add("Fire");
        list.add("Dragon");
        list.add("Math");
        list.add("Logic");
        list.add("Money");
        list.add("Food");
        list.add("Stars");
        list.add("Moon");
        sort(list);
       
        for (String s: list){
            System.out.println(s);
        }
       
   
    }
   
    public static List<String> sort(List<String> list){
       
        assert list != null : "list cannot be null";
       
        for (int i = 0; i < _increments.length; i++){
            int increment = _increments[i];
            if (list.size() < increment * 2)
                break;
            hSort(list, increment);
        }
       
        return list;
       
    }
   
    private static void hSort(List<String> list, int increment){

        for (int startIndex = 0; startIndex < increment; ++startIndex){
            for (int i = startIndex + increment; i < list.size(); i += increment){
               
                String value = list.get(i);
               
                int j;
               
                for (j = i; j > startIndex; j -= increment){
                   
                    String previousValue = list.get(j - increment);
                   
                    if (value.compareTo(previousValue) >=0)
                        break;
                   
                    list.set(j, previousValue);
                   
                }
               
                list.set(j, value);
               
            }
        }
    }
   
}


Quick Sort

- uses recursion
- does the work first, then the recursion
- divide and conquer, recursively sorting smaller parts of list
- place an item in final position, smaller items left, larger items right
- divided into two parts (not necessarily two halves)
- not stable
- use quick sort until length of list is 10 or less and then use insertion sort

Best case: O(n Log(n)) -- partition element is middle element

Worst Case: O(n2) -- completely unbalanced

Implementation

- Choose partition item (last, mean of first, last and middle, etc.)
- indexes start at beginning and end (less the partition item) and advance towards each other
- left index stops when finds item larger than partitioning item
- right index stops when finds item smaller than partitioning item
- when both stop swap items
- point where indexes meet - insert partitioning item
- then sort the sublists to the right and left of the partitioning item

Sample Code:

package com.radicalsoftware.sort;

import java.util.ArrayList;
import java.util.List;

public class QuickSort {

    public static void main(String[] args) {
      
        ArrayList<String> list = new ArrayList<String>();
        list.add("Alpha");
        list.add("Zero");
        list.add("Cat");
        list.add("Boy");
        list.add("Dog");
        list.add("Rose");
        list.add("Lion");
        list.add("Fire");
        list.add("Dragon");
        list.add("Math");
        list.add("Logic");
        sort(list);
      
        for (String s: list){
            System.out.println(s);
        }
      
   
    }
   
    public static void sort(List<String> list){
      
        assert list != null : "list cannot be null";
      
        quicksort(list, 0, list.size()-1);
      
    }
   
    public static void quicksort(List<String> list, int startIndex, int endIndex){
      
        if (startIndex < 0 || endIndex >= list.size())
            return;
      
        if (endIndex <= startIndex)
            return;
      
        String value = list.get(endIndex);
      
        int partition = partition(list, value, startIndex, endIndex -1);
      
        if (list.get(partition).compareTo(value) < 0)
            ++partition;
      
        if (partition != endIndex)
            swap(list, partition, endIndex);
      
        //sort right and left of partition
        quicksort(list, startIndex, partition -1);
        quicksort(list, partition + 1, endIndex);
      
    }
   
    private static int partition(List<String> list, String value,
            int leftIndex, int rightIndex) {
        int left = leftIndex;
        int right = rightIndex;
      
        while (left < right){
          
            if(list.get(left).compareTo(value)< 0){
                ++left;
                continue;
            }
   
            if(list.get(right).compareTo(value)>=0){
                --right;
                continue;
            }
          
            swap(list, left, right);
            ++left;
        }
   
        return left;
    }

    private static void swap(List<String> list, int left, int right){
        String obj = list.get(left);
        list.set(left,  list.get(right));
        list.set(right,  obj);
    }


Merge Sort
- not in place

Implementation:

- divide the list in half recursively until you get to one item arrays
- merge each array together two at a time
-- pointer at the beginning of each two array merging
-- compare elements and put the smaller into a new array
-- move pointer of array you took element from +1
-- repeat for all elements
-- repeat for all arrays

package com.radicalsoftware.sort;

import java.util.ArrayList;
import java.util.List;

public class MergeSort {


    public static void main(String[] args) {
      
        ArrayList<String> list = new ArrayList<String>();
      
        list.add("Zero");
        list.add("Alpha");
        list.add("Cat");
        list.add("Boy");
        list.add("Dog");
        list.add("Rose");
        list.add("Lion");
        list.add("Fire");
        list.add("Dragon");
        list.add("Math");
        list.add("Logic");
        list.add("Salmon");
   
        List<String> sortedList = sort(list);

        for (String s: sortedList){
            System.out.println(s);
        }
    }
   
    public static List<String> sort(List<String> list){
      
        assert list != null : "list cannot be null";
      
        return mergesort(list, 0, list.size()-1);
      
    }

    private static List<String> mergesort(List<String> list, int startIndex, 

        int endIndex) {

      
      
        int middleIndex = (startIndex + endIndex)/2;
      
        if (startIndex < endIndex){
          
            List<String> left = mergesort(list, startIndex, middleIndex);
            List<String> right = mergesort(list, middleIndex + 1, endIndex);
          
            return merge(left, right);
      
        }
      
        List<String> mergedList = new ArrayList<String>();
        mergedList.add(list.get(startIndex));      
        return mergedList;
      
      
      
    }

    private static List<String> merge(List<String> left, List<String> right) {
      

        List<String> mergedList = new ArrayList<String>();
      
        int rightIndex = 0;
        int leftIndex = 0;
      
        while(leftIndex < left.size() || rightIndex < right.size()){

            boolean rightHasMore = rightIndex < right.size();
            boolean leftHasMore = leftIndex < left.size();
          
            if (!rightHasMore || 

                (leftHasMore && 
                left.get(leftIndex).compareTo(right.get(rightIndex)) <= 0)){
                mergedList.add(left.get(leftIndex));
                leftIndex++;
            }else{
                if(rightHasMore){
                    mergedList.add(right.get(rightIndex));
                    rightIndex++;
                }
            }  
      
          
        }
          
        return mergedList;
    }
   

}


Heap Sort
- in place (uses less memory than mergesort which has to allocate additional storage)
- comparison based sort
- selection sort family
- slower than well implemented quick sort, but better worst case
- in place
- not stable

Implementation

An array viewed as a nearly complete binary tree (meaning a subsequent level is not started until the previous level is completely filled).

http://www.youtube.com/watch?v=B7hVxCmfPtM
 

Bucket, counting and radix sort:
http://www.youtube.com/watch?v=woTbZMli6z4

 Bucket Sort
- stable sort
- range of keys can't be too big
- linear time

Counting Sort 
- stable sort
- range of keys can't be too big
- linear time

Implementation:
- histogram
- indicies
- place

Radix sort
- stable
- only good when
- use counting sort trick on the least significant digit
- not in place
- linear time
- unlimited number of keys
- uses the stability of bucket sort and counting sort