Saturday, August 24, 2013

Access restriction: The method ... from the type ... is not accessible due to restriction on required library

Here's a fun one...

Got this error suddenly on a Java project in Eclipse that had been working fine previously.

Access restriction: The method [some method] from the type [some class] is not accessible due to restriction on required library [path to some Java jdk jar]
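For context, this error typically appears when code references a JDK-internal class (anything under sun.*, for example) and Eclipse applies access rules from the execution environment the project is bound to. A hypothetical snippet that can trigger it:

import sun.misc.Unsafe;   // JDK-internal API; Eclipse access rules may flag this reference

public class RestrictedApiExample {
    public static void main(String[] args) {
        // The reference alone is enough for Eclipse to report the access restriction,
        // even though javac will happily compile it.
        System.out.println(Unsafe.class.getName());
    }
}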

Fix:

Right-click the project and choose Properties.
Choose Java Build Path on the left, then the Libraries tab.
Select the JRE System Library and click Edit...

In the dialog, instead of "Execution environment", choose "Alternate JRE", then click Finish and OK.



Code compiles.

AWS Cloud Error Logging: EC2 instances, S3, DynamoDB, Alerts

In the past I have written generic error handlers that log errors by writing out the stack trace and messages in a format that is easier to review later than a typical log file, attaching application-specific data along the way. The problem I have with standard log files is the time it takes to parse through them to find a particular error, and the lack of application-specific data that helps quickly replay and troubleshoot the error. There may be additional overhead in more specific logging, but if it helps me eliminate errors more quickly, the number of errors stays small.

Now I'm attempting to pass these errors asynchronously through an AWS SQS queue over to S3 and/or DynamoDB so the error logging doesn't tie up the system, the logs are separated from my EC2 instances in case one dies, the logs are consolidated, and I can find ways to optimize costs by reviewing usage stats and quickly fixing errors that consume extraneous resources. It is definitely scalable, and the queue will store the messages for up to 4 days. I am not in a hurry to get the errors to the log store, so they can sit in the queue for a while. I can also hook up alerts using the SQS, SES or SNS services from AWS (whichever is most appropriate) for the case where my error logging goes berserk due to some system problem, so I can catch it right away and minimize impact.

A Java program will throw different types of errors, as explained in the article below, plus every other standard and custom exception a programmer might throw, so the code will need to deal with each of those types appropriately. Basically I cast them all to Throwable for my purposes, get the error message, stack trace, root-cause message and root-cause stack trace, and store it all in a file along with the information around that exception that helps me troubleshoot.

http://www.javaworld.com/jw-07-1998/jw-07-exceptions.html
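As a rough sketch of that idea (assuming the AWS SDK for Java 1.x and a hypothetical queue URL), a generic handler might look something like this:

import java.io.PrintWriter;
import java.io.StringWriter;

import com.amazonaws.services.sqs.AmazonSQSAsyncClient;
import com.amazonaws.services.sqs.model.SendMessageRequest;

// Hypothetical sketch: capture the message, stack trace and root cause for any
// Throwable and push it to an SQS queue asynchronously so the caller isn't blocked.
public class ErrorQueueLogger {

    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/error-log"; // placeholder

    private final AmazonSQSAsyncClient sqs = new AmazonSQSAsyncClient(); // default credentials chain

    public void log(Throwable t, String appContext) {
        Throwable root = t;
        while (root.getCause() != null) {
            root = root.getCause();
        }
        StringWriter stack = new StringWriter();
        t.printStackTrace(new PrintWriter(stack));

        String body = "message=" + t.getMessage()
                + "\nrootCause=" + root.getMessage()
                + "\ncontext=" + appContext
                + "\nstack=\n" + stack;

        // Fire-and-forget; SQS holds the message until the queue reader stores it.
        sqs.sendMessageAsync(new SendMessageRequest(QUEUE_URL, body));
    }
}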

I'm wondering about the implications of storing individual errors vs. dumping all the errors to one file, and of using DynamoDB vs. S3. In the case of errors you may want to further extend your basic error with additional information depending on what type of error it is. That means it doesn't fit neatly into a single type of record, because different parts of the system may log different information with each error. There are many alternatives, but for my first cut I'm going to log the unstructured error information to a file and then explore logging a record in DynamoDB with error message, file name, timestamp and ID for later use with a user interface that can pull up the file from S3 if needed. At this time I can probably live without the UI.
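As a rough sketch of that first cut (the table name and attribute names here are placeholders, not from any particular system), the DynamoDB index record might be written like this with the AWS SDK for Java:

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;

// Hypothetical sketch: index an error in DynamoDB (message, S3 file name, timestamp, ID)
// while the full unstructured dump lives in S3.
public class ErrorIndexWriter {

    private final AmazonDynamoDBClient dynamo = new AmazonDynamoDBClient();

    public void indexError(String errorMessage, String s3FileName) {
        Map<String, AttributeValue> item = new HashMap<String, AttributeValue>();
        item.put("ErrorId", new AttributeValue(UUID.randomUUID().toString()));
        item.put("Timestamp", new AttributeValue(String.valueOf(System.currentTimeMillis())));
        item.put("Message", new AttributeValue(errorMessage));
        item.put("S3File", new AttributeValue(s3FileName));

        dynamo.putItem(new PutItemRequest("ErrorLog", item)); // "ErrorLog" table name is a placeholder
    }
}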

Friday, August 23, 2013

AWS SDK: Missing requirements


If you're having problems installing the AWS SDK for Java:

http://aws.amazon.com/sdkforjava/

Error messages I've seen:
  • The software items you selected may not be valid with your current installation. Do you want to open the wizard anyway to review the selections?
  • Cannot find a solution satisfying the following requirements  org.eclipse.swt [3.4.0v3448f]
  •  AWS SDK for Java 2.0.0.v201212211205 (com.amazonaws.eclipse.sdk.
    feature.feature.group 2.0.0.v201212211205)
    Missing requirement: IdentityManagement 1.0.0.v201308121803 (com.amazonaws.eclipse.identitymanagement 1.0.0.v201308121803) requires 'bundle com.amazonaws.eclipse.ec2 1.1.0' but it could not be found
  • Cannot satisfy dependency:
    From: AWS Toolkit for Eclipse Core 2.0.1.v201308121803 (com.amazonaws.eclipse.core.feature.feature.group 2.0.1.v201308121803)
    To: com.amazonaws.eclipse.identitymanagement [1.0.0.v201308121803]
  • Cannot satisfy dependency: From: AWS SDK for Java 2.0.0.v201212211205 (com.amazonaws.eclipse.sdk.feature.feature.group 2.0.0.v201212211205) To: com.amazonaws.eclipse.core.feature.feature.group 1.0.1
Solutions:

Make sure you are downloading software from the correct site within Eclipse:

Add this site to your download sites in Eclipse (AWS instructions should explain how if you haven't read those already):

http://aws.amazon.com/eclipse

Missing Requirement Errors

Java programs have libraries they depend on to run (like many other languages). For one piece of software to install, the libraries it requires may need to be installed first. That's what all the missing-requirement messages are about. I'm not sure which library contains which component above, but by process of elimination, installing one library at a time, I can figure out the correct order and get them to install (and hopefully this is magically fixed in a future release).

I have the AWS SDK working on Windows 8 with the Juno version of Eclipse and JDK 7. When I first installed I got an error stating a requirement was missing. I installed the AWS core package first and then I was able to install the rest.

I am helping a friend, so I decided to try installing again on Windows XP with the old Ganymede version of Eclipse and JDK 6. First of all, this doesn't work because you need an up-to-date version of Eclipse. So I figured I would install the latest, which is Kepler (I installed Juno on the other machine a while back, so at the time it was the latest). I had some problems installing Kepler even after upgrading to Java 7. It's not working properly, possibly because I am using XP, which is not a target platform, but I did get it to open. It just won't let me create a new project. I tried installing AWS anyway.

I installed the packages in this order and got most of them installed (restarting in between each library):

#1 I tried to install core and that failed
#2 I tried to install AWS SDK and that failed
#3 I installed EC2 management - that worked and also installed the core package.
#4 Then I installed the SDK for Java
#5 Next DynamoDB library
#6 Workflow Management
#7 Elastic Beanstalk
#8 Cloud Formation

That left me with the following libraries which I could not install:

SimpleDB Management
RDS
Android

It appears the missing libraries referenced in the error messages below may have been part of an older version of Eclipse (or perhaps they cannot be found because I don't have Eclipse working correctly). I could try to find these libraries in an older version of Eclipse, or install an older version of Eclipse, but since I have this working on Windows 8 + Juno I'm just going to use that for now and hope installing in order resolves my friend's issue.

If you want to attempt finding the libraries you could try what I just suggested regarding Eclipse libraries and versions - or hopefully Amazon fixes this shortly...

http://wiki.eclipse.org/Older_Versions_Of_Eclipse

Error messages:

Cannot complete the install because one or more required items could not be found.
Software currently installed: Amazon RDS Management 1.0.0.v201308121803 (com.amazonaws.eclipse.rds.feature.feature.group 1.0.0.v201308121803)
Missing requirement: RDS 1.0.0.v201308121803 (com.amazonaws.eclipse.rds 1.0.0.v201308121803) requires 'bundle org.eclipse.datatools.connectivity.ui.dse 1.1.0' but it could not be found
Cannot satisfy dependency:
From: Amazon RDS Management 1.0.0.v201308121803 (com.amazonaws.eclipse.rds.feature.feature.group 1.0.0.v201308121803)
To: com.amazonaws.eclipse.rds [1.0.0.v201308121803]

I was not able to install the Android library because I got the following error:

Cannot complete the install because one or more required items could not be found.
  Software being installed: AWS SDK for Android 1.0.0.v201212110105 (com.amazonaws.eclipse.android.sdk.feature.feature.group 1.0.0.v201212110105)
  Missing requirement: AWS SDK for Android 1.0.0.v201212110105 (com.amazonaws.eclipse.android.sdk.feature.feature.group 1.0.0.v201212110105) requires 'com.android.ide.eclipse.adt 18.0.0' but it could not be found


Related: Installing Eclipse and Java JDK

  • If you need to upgrade Eclipse, do not install one version over the other. Download the new version of Eclipse and install it in its own directory.
  • I chose to install the Eclipse IDE for Java Developers
  • Make sure you have an up to date Java JDK. Install Java SE to get the JDK, not just a JRE. If you don't know what that means, read up on it but for now just go to this page and click the download button under JDK: http://www.oracle.com/technetwork/java/javase/downloads/index.html
  • Make sure you install the 32-bit versions of both the JDK and Eclipse if you are on a 32-bit operating system, and the 64-bit versions if you are on a 64-bit operating system. If you are not sure which you have, get a new computer and you'll have a 64-bit machine :) Or...read about that elsewhere. I had someone insisting to me he installed the correct version, but after checking, this was the problem...so double check you downloaded the correct version.
  • If you try to run Eclipse and it doesn't work, run it from the command line: open a command prompt, navigate to the folder containing the eclipse.exe you downloaded, and type this command to start Eclipse with debug information: eclipse -debug -console

Wednesday, August 21, 2013

NoSQL Databases and Financial Applications

-----
Update: AWS DynamoDB offers consistent reads. One consideration, however, is the need for complex queries or queries with changing criteria - unless you know exactly what questions you'll ask in advance, NoSQL may not make sense. For complex and changing queries you may opt for an RDBMS, because NoSQL requires designing for your queries at creation time - not good for ad hoc queries.
-----
I recently attended an Amazon AWS architecture class which included an overview of when and where to apply certain technologies within your architecture. Sometimes new technologies come out and they are all the rage. Everything starts looking like a nail and the new technology is the hammer.

One of these absolutely beyond cool new technologies (actually they have been around a while but starting to get widely used) are NoSQL databases:

http://nosql-database.org/

NoSQL databases are interesting because they overcome certain performance limitations - for example, a vertically scaled database architecture that falls over under load, where the only fix is more database servers plus convoluted replication and ID-management strategies requiring rewrites of all your ID-sequence-dependent applications. According to the trainer at AWS, very few companies do this well because it is very hard to do. Chances are restructuring the database is more desirable.

NoSQL databases are great for what I'm playing around with on Amazon AWS (DynamoDB: http://aws.amazon.com/dynamodb/): creating new databases, fast access, distributed systems, and parallel processing of data using MapReduce. Many kinds of unstructured data, and applications that create user-generated types on the fly, can benefit immensely from this type of database, as can huge amounts of data (ok, I'll use the buzzword: Big Data) that can be processed in parallel. It's also cool to be able to store and retrieve certain types of data very fast, such as when you're loading a web page. DynamoDB runs on SSD so it's zippy. I was able to set up an application that sends data to a queue and logs it to DynamoDB in about three days (having never worked with SQS or DynamoDB before...).

The fact that the data is schema-less does, yes, make things faster. The structure of the database supports horizontal scaling. Cool. The problem with schema-less is similar to the problem with not having a strongly typed programming language: you can have various kinds of flaws a strongly typed language would prevent. I just heard a speaker talking about never using JavaScript for financial applications because there is no decimal type and he had seen many mistakes related to decimal places. Someone at the event mentioned that he works at a very large company, which shall remain nameless, that has an online calculator with exactly that problem at the time of this writing.

Analogous to that, a NoSQL database has a bunch of key-value pairs which can really be anything, or are maybe loosely enforced depending on the database. So you cannot count on a money value having the appropriate number of decimals, for example, or, depending on the system, even on it being a number, if it's a generic list of key-value pairs. (Some systems may enforce this; I don't know them all - left as an exercise for the reader if you are planning to use this type of database for your financial application and you care. You should.) If you use a relational database you can be pretty sure that your defined column, a decimal with two digits, will be just that, or an error will be thrown if someone tries to put anything else in there.

[Please don't start going on about obscure rounding issues with money data types or whatever...that's for another blog post but as an example - you can find many for different vendors - check out the rounding issue after the 15th digit in Sybase IQ  15.2 decimal type: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1510/html/iqrefbb/X315932.htm].
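As a minimal sketch of what that pushes onto the application layer (the names here are illustrative, not from any particular system), validating a money value yourself in Java might look like this, since the schema-less store won't do it for you:

import java.math.BigDecimal;
import java.math.RoundingMode;

// Minimal sketch: since a schema-less store won't enforce "decimal with two digits",
// the application has to validate money values itself before writing them.
public class MoneyValidator {

    public static BigDecimal toMoney(String raw) {
        BigDecimal value = new BigDecimal(raw);   // throws NumberFormatException if not numeric
        if (value.scale() > 2) {
            throw new IllegalArgumentException("More than two decimal places: " + raw);
        }
        return value.setScale(2, RoundingMode.UNNECESSARY);
    }

    public static void main(String[] args) {
        System.out.println(toMoney("19.99"));   // OK
        System.out.println(toMoney("19.999"));  // rejected with IllegalArgumentException
    }
}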

Another factor to consider when thinking about using a NoSQL database is the lack of data integrity that comes from having no schema, traditional joins, or enforcement of complex relationships. The whole reason for 1st, 2nd and 3rd normal form in relational databases, and the whole entity-relationship-diagram exercise, is to maintain data integrity - preventing duplicate mismatched data, preventing orphans, enforcing unique values, etc. If you're unfamiliar with database normalization, it's a big topic worthy of a book:

http://www.amazon.com/Database-Design-Databases-Books/b?ie=UTF8&node=3902

In some cases you don't want all that schema around your data. The reason people use warehouses and flatten everything out is to eliminate the joins and make reports run a lot faster. They still use the relational database in the first place, where the data is entered, for data integrity. In a similar fashion you need to consider the data you are storing, its purpose, how it will be used, and how important data integrity is in this case, to decide where it is most appropriate to put that data. NoSQL databases often repeat the same data across tables, which can be an issue for data integrity but may be just what you need for the system you are implementing because you're only reading, not writing, or speed is more important than precision.

With a NoSQL database you generally lose triggers. Of course you can mimic triggers in your application, but depending on how your system is structured you may end up with multiple points of data entry and the same rules in multiple places which, if not carefully managed, can end up mismatched down the line. (Duplicate code is the root of all evil... http://www.informit.com/articles/article.aspx?p=1313447). The same logic could be in a web site's middle tier, a GUI application used internally, and a batch processing application. Putting this logic in the database makes it easier to enforce certain rules at the point of insert, update and delete, regardless of which application or piece of code is touching the data.

If you are using a NoSQL database for a financial application, you'll also want to make sure you have consistent reads and writes:

http://aws.amazon.com/dynamodb/faqs/#What_does_read_consistency_mean_Why_should_I_care
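For illustration (table and key names here are hypothetical), DynamoDB reads are eventually consistent by default, but the AWS SDK for Java lets you request a strongly consistent read per call:

import java.util.HashMap;
import java.util.Map;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;
import com.amazonaws.services.dynamodbv2.model.GetItemResult;

// Sketch: ask for a strongly consistent read so balance-style data reflects the latest write.
public class ConsistentReadExample {

    public static void main(String[] args) {
        AmazonDynamoDBClient dynamo = new AmazonDynamoDBClient();

        Map<String, AttributeValue> key = new HashMap<String, AttributeValue>();
        key.put("AccountId", new AttributeValue("12345")); // hypothetical table and key

        GetItemResult result = dynamo.getItem(new GetItemRequest()
                .withTableName("Accounts")
                .withKey(key)
                .withConsistentRead(true)); // uses more read capacity, returns the latest write

        System.out.println(result.getItem());
    }
}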

There are, of course, alternative solutions to this problem and other environment-dependent decision criteria, but then you have to weigh the cost of those alternate solutions against just using an RDBMS. What's the point anyway? To prove you can use NoSQL or follow some ivory-tower principle you read in a book... or to create a superior business application that provides more ROI to the business? Some issues with NoSQL can be worked around to design a financial application if needed, because you could funnel all requests through a queue and an application layer that enforces all your rules, and the tool that parses the data on the way out could perform other functions typically provided by an RDBMS. But is it worth it? That's a lot of expense and possibly error-prone code to replace a technology that is pretty solid and has been around a long time. You may have other objectives and pain points not addressed here, such as an RDBMS that is tipping over - these are just some considerations.

Perhaps the solution is applying the right technology in the right place to appropriately solve the problem: distributing data across databases to horizontally scale your application and moving data that doesn't require a formal schema to a more cost-effective, fast, scalable data store. Your NoSQL database holds data that has no strict integrity rules, such as product descriptions, web request data, web content, or other unstructured, free-form data, while the data that is highly critical to reconcile for security and auditing purposes remains in your RDBMS. Querying your RDBMS and stuffing the results into a NoSQL work table for use by a batch process that only reads, doesn't update, for the purpose of sending data to a vendor, but is later queried and used by the application to flag results or retry errors - that could be interesting.

Here's how some people are using NoSQL databases for highly scalable applications:

http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html

I always say there are many ways to solve a problem...So I am not suggesting a one size fits all answer in this post. I am simply highlighting some considerations which are missed in an otherwise interesting and useful book like:

Disruptive Possibilities and How Big Data Changes Everything

Picking an optimal solution has many factors, and typically a combination of ideas can be more productive than turning your system into a nail... In the case of financial applications, where reconciling at the end of the day is pretty darn important, you want to carefully consider the cost of implementing rock-solid alternatives and what each technology buys you vs. what will be required to implement something across a large organization with disparate systems feeding and reading financial data. Then again, the use of parallel processing for end-of-day applications is pretty intriguing... something to ponder.

Saturday, August 17, 2013

Fun with AWS Architecture

...or What I did on my summer vacation.

OK, I only took two days off to review some things I learned in a 3-day AWS architecture class...

http://aws.amazon.com/training/architect/

I do actually take real vacations on occasion. But here's what I have been up to...

Route 53 (DNS Hosting)

Moved all my domains off expensive, redundant DNS hosting to Route 53. DNSStuff.com says there's something odd about the configuration; however, the sites run quickly, and Route 53 allows me to do some interesting things with static content on S3 (more on that to follow). I was able to export the zone file from EasyDNS and import it into Amazon. I'm still testing this out, but so far so good. EasyDNS has served me well but is a bit expensive now compared to Amazon, and Route 53 is the only service Amazon offers with a 100% SLA, because they have servers in something like 43 locations. That's some decent coverage.

http://aws.amazon.com/route53/

IAM Roles (Security)

AWS security allows you to set up roles, and when an EC2 instance (server or virtual server) is launched you give it a role. That role is allowed to do certain things. This prevents having to hard-code permissions in files on the server or embed security credentials in an application. The application specifies the role it should use and gets a temporary token from the AWS Security Token Service. Amazon has set up the instances to securely manage the roles and rotate the credentials periodically.

http://docs.aws.amazon.com/IAM/latest/UserGuide/role-usecase-ec2app.html

http://docs.aws.amazon.com/STS/latest/UsingSTS/Welcome.html#AccessingSTS

http://docs.aws.amazon.com/STS/latest/UsingSTS/UsingTokens.html#RequestWithSTS
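A rough sketch of what that looks like from the SDK side (assuming the AWS SDK for Java 1.x on an instance launched with a role): the client pulls temporary credentials from the instance metadata service rather than reading keys from a file.

import com.amazonaws.auth.InstanceProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3Client;

// Sketch: on an EC2 instance launched with an IAM role, the SDK can fetch
// temporary credentials from the instance metadata service instead of
// embedding access keys in code or config files.
public class RoleBasedClientExample {

    public static void main(String[] args) {
        AmazonS3Client s3 = new AmazonS3Client(new InstanceProfileCredentialsProvider());
        System.out.println("Buckets visible to this role: " + s3.listBuckets().size());
    }
}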

S3 (Application Storage for Static Content)

S3 is for occasionally changing static content for application servers that doesn't really belong in databases - say, images. You can also host a whole static web site on S3, which is cheaper than using an EC2 instance.

http://docs.aws.amazon.com/AmazonS3/latest/dev/website-hosting-custom-domain-walkthrough.html

http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/HowToAliasRRS.html#CreateAliasRRSConsole

There are some benefits to hosting content and images on separate domains for speedier web site loading. I've done this for years but using S3 as opposed to EC2 and saving money is pretty cool. You can also put CloudFront (CDN) in front of it to distribute to multiple parts of the world but I'm not there yet.

For my purposes I set up a static domain and an image domain for my content. I made the images publicly accessible. The other content I want to go through an analysis engine first, so I am keeping that private and granting access to specific roles:

http://docs.aws.amazon.com/AmazonS3/latest/UG/EditingBucketPermissions.html

http://docs.aws.amazon.com/AmazonS3/latest/dev/AccessPolicyLanguage_UseCases_s3_a.html

DynamoDB (NoSQL Database)

I set up a table in DynamoDB for logging requests. DynamoDB is a NoSQL database hosted on SSD, so it is very fast. It will be faster to save data here than in a traditional SQL database, and possibly cheaper than the SQL Server EC2 instance I'm also hosting. I will do some analysis after getting this all set up. DynamoDB also integrates with Amazon's Hadoop MapReduce service (see below) to run queries and analyze NoSQL data efficiently - if the data is structured for parallel processing.

http://aws.amazon.com/dynamodb/
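A sketch of that table setup with the SDK (table and attribute names are placeholders); the provisioned throughput here is set deliberately low, which is what the queue described below compensates for:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
import com.amazonaws.services.dynamodbv2.model.CreateTableRequest;
import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.ScalarAttributeType;

// Sketch: create a request-log table with deliberately low provisioned throughput;
// the SQS queue in front of it absorbs the bursts.
public class CreateLogTable {

    public static void main(String[] args) {
        AmazonDynamoDBClient dynamo = new AmazonDynamoDBClient();

        dynamo.createTable(new CreateTableRequest()
                .withTableName("RequestLog") // placeholder name
                .withAttributeDefinitions(new AttributeDefinition("RequestId", ScalarAttributeType.S))
                .withKeySchema(new KeySchemaElement("RequestId", KeyType.HASH))
                .withProvisionedThroughput(new ProvisionedThroughput(1L, 1L))); // 1 read / 1 write unit
    }
}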

SQS (Queue)

I set up an SQS (Simple Queue Service) queue to accept asynchronous request-logging messages and feed them into an application that will save the data to DynamoDB. The problem when setting up DynamoDB is that it requires estimating throughput, and throughput will vary widely for the web sites I host based on the time of day. Rather than try to predict it, I can set the throughput very low to save money and feed all the requests through the queue. The beauty of the queue is that it will hold the data for up to four days and feed it into DynamoDB over time. I can use the AWS asynchronous client so it won't hold up the web pages from loading, and I'm in no hurry to get it into the database. Additionally, this decouples my logging from my application, so any issues with logging will not bring down the application - the queue is scalable to handle traffic as needed.

http://aws.amazon.com/sqs/

http://tech.shazam.com/server/using-sqs-to-throttle-dynamodb-throughput/

In addition to the above, I want to design for horizontal scaling for a flexible, cost-effective architecture that can expand and contract based on usage. Also, the way SQS is priced, batching messages will save money:

http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/throughput.html
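A sketch of that batching idea (queue URL and message IDs are placeholders): SQS is priced per request, so packing up to 10 messages into one call cuts the request count for high-volume logging.

import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.sqs.AmazonSQSClient;
import com.amazonaws.services.sqs.model.SendMessageBatchRequest;
import com.amazonaws.services.sqs.model.SendMessageBatchRequestEntry;

// Sketch: send up to 10 messages per SQS request instead of one request per message.
public class BatchedQueueSender {

    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/request-log"; // placeholder

    public static void sendBatch(AmazonSQSClient sqs, List<String> messages) {
        List<SendMessageBatchRequestEntry> entries = new ArrayList<SendMessageBatchRequestEntry>();
        for (int i = 0; i < messages.size() && i < 10; i++) {   // SQS limit: 10 entries per batch
            entries.add(new SendMessageBatchRequestEntry("msg-" + i, messages.get(i)));
        }
        sqs.sendMessageBatch(new SendMessageBatchRequest(QUEUE_URL, entries));
    }
}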


EC2 Instances  (Web Servers)

I changed the code on my web servers to test out my theories with one web site. I have a generic servlet which takes an HTTP GET request, sends a message to the logging queue, then returns static content. I will add business analysis and appropriate decision making into this later. This web server is in a VPC (Virtual Private Cloud - Amazon's name for a virtual private network) in a public subnet with Internet access to receive web requests.

http://aws.amazon.com/ec2/
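A bare-bones sketch of that servlet (the queue URL and returned content are placeholders, assuming the async SQS client from the AWS SDK for Java):

import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.amazonaws.services.sqs.AmazonSQSAsyncClient;
import com.amazonaws.services.sqs.model.SendMessageRequest;

// Sketch: log the request to SQS asynchronously, then return static content,
// so the logging call never delays the response.
public class LoggingServlet extends HttpServlet {

    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/request-log"; // placeholder
    private final AmazonSQSAsyncClient sqs = new AmazonSQSAsyncClient();

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String message = req.getRequestURI() + "|" + req.getRemoteAddr() + "|" + System.currentTimeMillis();
        sqs.sendMessageAsync(new SendMessageRequest(QUEUE_URL, message)); // fire-and-forget

        resp.setContentType("text/html");
        resp.getWriter().write("<html><body>OK</body></html>"); // placeholder static content
    }
}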

VPC (Virtual Private Cloud / Network)

I already had a VPC set up to keep certain servers in private and public subnets but here's some info on that:

http://aws.amazon.com/vpc/

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html

Still working on configuring my Cisco firewall and VPN to work with the above. (Time Factor.)

EC2  (Application Server - Queue Reader Service)

Created an application that reads messages from the queue. It will log errors to an error-log repository and send the request data to DynamoDB. I am reviewing AWS buffering to determine if there is a more cost-effective way to retrieve the messages, and also the details to ensure no message is ever picked up twice. Initially I thought this might require multiple threads to talk to DynamoDB while waiting, but DynamoDB is so fast I only need one, even though I chose the lowest possible throughput.

Also had to handle errors:

http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/Query_QueryErrors.html
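A minimal sketch of such a reader loop (the queue URL is a placeholder and the DynamoDB write is stubbed out), showing where the delete call fits: SQS hides a received message for the visibility timeout, and deleting it only after a successful write keeps it from being redelivered. SQS is at-least-once delivery, so truly never processing a message twice also requires idempotent writes.

import java.util.List;

import com.amazonaws.services.sqs.AmazonSQSClient;
import com.amazonaws.services.sqs.model.DeleteMessageRequest;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

// Sketch: poll the queue, persist each message, then delete it so it is not redelivered.
public class QueueReader {

    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/request-log"; // placeholder

    public static void main(String[] args) {
        AmazonSQSClient sqs = new AmazonSQSClient();

        while (true) {
            List<Message> messages = sqs.receiveMessage(
                    new ReceiveMessageRequest(QUEUE_URL).withMaxNumberOfMessages(10)).getMessages();

            for (Message m : messages) {
                // storeInDynamo(m.getBody());  // hypothetical persistence step
                sqs.deleteMessage(new DeleteMessageRequest(QUEUE_URL, m.getReceiptHandle()));
            }
        }
    }
}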


EMR (Amazon Elastic MapReduce - Hadoop)

Also working on using MapReduce to run queries and generate traffic reports for customers. Amazon's MapReduce basically spins up Hadoop on EC2 instances.

http://aws.amazon.com/elasticmapreduce/

Future plans...


The ultimate goal is to set up a horizontally scalable architecture that can be built from a single file plus backups which expands and contracts automatically based on traffic needs and is fault tolerant. (Fault tolerant such that the Netflix Chaos Monkey could not hurt it: http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html)

Move data, files and application logic around for better security, fault tolerance, performance and cost optimization. Set up load balancing and auto scaling. Check out the CloudFront CDN. I should be hosting across multiple availability zones with a master/slave database on RDS. Check out the mail and SMS services to replace other things I'm doing. Move backups to Glacier for cost-effective archiving. Write CloudFormation scripts to spin up the entire architecture from backup with basically one file for disaster recovery. And lots of other fun stuff...