In my last post I explained how to turn on detailed Boto logging to get more information about an AWS error. The specific problem I was having was sporadic timeouts connecting to an S3 Endpoint from a Lambda function.
Update: 7/12/2017 - For the past three nights (since I started tracking more closely) this happens at 8:30 p.m. PST. I post the errors on Twitter when they happen if you want to follow me and compare notes @teriradichel
Update 7/14/2017 - It appears that running the Lambda function one time generates multiple CloudWatch entries. Trying to determine if logs are just duplicate for the function is actually running 2x from one call.
Initially the only information I could see was a generic timeout message when connecting to the S3 bucket.
After turning on the detailed Boto logging I got an error which included this message:
ConnectionError: ('Connection aborted.', gaierror(-3, 'Temporary failure in name resolution'))
Name resolution sounds like DNS name resolution, meaning something is having a hard time translating the S3 bucket URL into an IP address to connect to in order to retrieve or upload files.
In this case it is unlikely that there would be S3 logs since the traffic wouldn't be able to make its way to the S3 bucket.
After getting this error on an off for days and sending the detailed logs to AWS support, it looks like they may have uncovered an issue. Still waiting for the final answer but it seems like a resolution is forthcoming. I am also trying to confirm this has nothing to do with traffic traversing one of my EC2 hosts, but I don't think that is the case.
Update: 7/12/2017 AWS Support closed this issue and said they are working on it. A suggestion was made to architect for timeouts however the Boto library times out after a number of retries in about two minutes. If you ran that function twice it would simply fail twice as I have been running this repeatedly. That would cost double the money and not fix the problem. The other option is to change the Boto library timeout but the max timeout could be is 5 minutes which is max time allowed for a Lambda function.
It looks like the Boto library is configurable and can be set to max time a Lambda function can run: http_socket_timeout = 5
If you experience this issue please report it to AWS Support.
It's always a good idea to get as many logs as possible and submit questions to AWS support to try to get help with errors when possible.
Also it is a very good idea to always use S3 endpoints for data that doesn't need to be publicly accessible, in light of all the recent S3 bucket data breaches. There was another one today...I explain S3 endpoints in my previous post from May 16 (at which point this error was not occurring by the way, so I know this solution can work!)