Tuesday, July 11, 2017

Timeout Connecting to S3 Endpoint From Lambda

In my last post I explained how to turn on detailed Boto logging to get more information about an AWS error. The specific problem I was having was sporadic timeouts connecting to an S3 Endpoint from a Lambda function.

Update: 7/12/2017 - For the past three nights (since I started tracking more closely) this happens at 8:30 p.m. PST. I post the errors on Twitter when they happen if you want to follow me and compare notes @teriradichel 

Update 7/14/2017 - It appears that running the Lambda function one time generates multiple CloudWatch entries. Trying to determine if the logs are just duplicated or if the function is actually running twice from one call.

Initially the only information I could see was a generic timeout message when connecting to the S3 bucket.

After turning on the detailed Boto logging I got an error which included this message:

ConnectionError: ('Connection aborted.', gaierror(-3, 'Temporary failure in name resolution'))

Name resolution sounds like DNS name resolution, meaning something is having a hard time translating the S3 bucket URL into an IP address to connect to in order to retrieve or upload files.

In this case it is unlikely that there would be S3 logs since the traffic wouldn't be able to make its way to the S3 bucket.

After getting this error on and off for days and sending the detailed logs to AWS support, it looks like they may have uncovered an issue. I am still waiting for the final answer, but it seems like a resolution is forthcoming. I am also trying to confirm this has nothing to do with traffic traversing one of my EC2 hosts, but I don't think that is the case.

Update: 7/12/2017 - AWS Support closed this issue and said they are working on it. One suggestion was to architect for timeouts, however the Boto library already retries and then times out after about two minutes. Running the function again would simply fail again, as I have been running it repeatedly; that would cost double the money and not fix the problem. The other option is to change the Boto library timeout, but the maximum would be five minutes, which is the maximum time allowed for a Lambda function.

It looks like the Boto library is configurable, and the socket timeout can be set to the maximum time a Lambda function can run: http_socket_timeout = 5
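For reference, in boto3 the equivalent settings live on a botocore Config object. Here's a minimal sketch (assuming a reasonably recent boto3; the timeout and retry values are just hypothetical examples):

import boto3
from botocore.config import Config

# Hypothetical values: fail fast on connection problems, allow up to
# 60 seconds for reads, and only retry twice before giving up
config = Config(
    connect_timeout=5,
    read_timeout=60,
    retries={"max_attempts": 2},
)

s3 = boto3.client("s3", config=config)

Whatever values you pick, the total time spent on retries still has to fit within the Lambda function's five minute limit.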

If you experience this issue please report it to AWS Support.

It's always a good idea to gather as many logs as possible and submit questions to AWS Support when you need help with errors.

Also it is a very good idea to always use S3 endpoints for data that doesn't need to be publicly accessible, in light of all the recent S3 bucket data breaches. There was another one today...I explain S3 endpoints in my previous post from May 16 (at which point this error was not occurring by the way, so I know this solution can work!)

http://websitenotebook.blogspot.com/2017/05/accessing-files-in-s3-via-lambda.html

Sunday, July 09, 2017

Detailed AWS Boto Library Logging

In my last post I explained how to turn on AWS X-Ray to log details about Lambda functions. That feature is probably most useful when you have a number of Lambda functions chained together. It doesn't change your logs; it simply helps you trace which function called another so you can drill down to the original error (I think - still testing).

For detailed logging you still need to manage this within your code. For the AWS Boto library you can add this line to your code (be careful where you send these logs, because it can output a lot of detail, some of which you may want to remain private):

boto3.set_stream_logger(name='botocore')
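As a minimal sketch of how this fits into a Lambda function (the handler and bucket name below are hypothetical), call it once at module load before any clients are created so the DEBUG output lands in CloudWatch Logs:

import logging
import boto3

# Send detailed botocore debug logging (endpoints, retries, DNS errors) to the logs
boto3.set_stream_logger(name="botocore", level=logging.DEBUG)

s3 = boto3.client("s3")

def handler(event, context):
    # Hypothetical bucket name - the DEBUG output will show exactly what
    # happens when this call tries to reach the S3 endpoint
    return s3.list_objects_v2(Bucket="my-example-bucket")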

In my last blog post the error message simply stated that the Lambda function could not reach the S3 endpoint. This seems to happen randomly. With the detailed Boto library logging, it tells us there is a DNS issue reaching the endpoint.

Starting new HTTPS connection (5): firebox-private-cli-bucket-876833387914-us-west-2.s3.amazonaws.com
2017-07-10 03:30:55,224 botocore.endpoint [DEBUG] ConnectionError received when sending HTTP request.
...
ConnectionError: ('Connection aborted.', gaierror(-3, 'Temporary failure in name resolution'))

Since the error occurs in an AWS Lambda function using AWS DNS servers I submitted a support case to get some help with this issue. Having the extra logs and knowing more precisely what failed should help get the problem resolved more quickly.

Enable AWS X-Ray for Lambda Function using CloudFormation

I just realized there's a check box under the Lambda configuration tab to enable X-Ray. I was trying to enable it in other, more complicated ways.


If you get this error trying to check the X-Ray box on the configuration tab of a Lambda function:

  • The Configuration tab failed to save. Reason: The provided execution role does not have permissions to call PutTraceSegments on XRAY

This means some permissions have to be added to the Lambda role, as the error message states. The message above says the permissions will be automatically added, but it seems they are not.

An IAM role can perform the following X-Ray actions, if allowed:


Out of the full list, the role only requires the two "Put" actions, so following best practices I add only those two actions to my Lambda function role.

          - 
            Effect: "Allow"
            Action: 
              - "xray:PutTelemetryRecords"
            Resource: "*"
          - 
            Effect: "Allow"
            Action: 
              - "xray:PutTraceSegments"
            Resource: "*" 

Here's the lambda role I'm using with the actions added:


While we're at it, let's automatically add tracing to the Lambda function in CloudFormation. The Lambda function CloudFormation properties include something called TracingConfig:


Clicking on TracingConfig shows two modes, Active and PassThrough, with an explanation. We'll go with the default (PassThrough):


Although this is configurable, it doesn't appear to actually turn tracing on. Hopefully that's coming soon. For now, log into the console, click on the Lambda function's configuration tab, then advanced settings, and check the box to enable X-Ray tracing. Save the function and run it to see X-Ray in action.
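If you would rather not click through the console, the same setting can also be flipped through the Lambda API. This is not the CloudFormation TracingConfig property, just a small boto3 sketch with a hypothetical function name:

import boto3

lambda_client = boto3.client("lambda")

# Hypothetical function name; sets the same tracing mode as the console checkbox
lambda_client.update_function_configuration(
    FunctionName="my-function",
    TracingConfig={"Mode": "Active"},
)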

So bummer, the title is a bit misleading but at least we were able to set up our IAM role with the necessary X-Ray permissions.

Once X-ray is turned on, click on the monitoring tab of the Lambda function and you'll see some additional information. The first display has the list of executions:


Click on one of the executions to see more details about the trace:


Finally, click on one of the lines in the trace to get even more details, such as the code stack trace that caused an error - in this case the connection to an S3 endpoint is failing (and I have a support ticket open with AWS about this because it is a random occurrence in my case):


Note that this does not necessarily increase your logging, but it will provide some additional information, and what I hope is that it will track the error to its source in a micro-services environment with APIs calling other APIs. When you need more details in the logs, turn up the logging as explained in this example for the AWS Boto Python library: http://websitenotebook.blogspot.com/2017/07/detailed-aws-boto-library-logging.html


I am not sure what visibility Amazon has into these logs, but as always, be careful what you log where and who has access to see it - especially since I see no way to delete these traces...still looking. The data is auto-deleted in 30 days, and you may be able to delete it using the API. http://docs.aws.amazon.com/xray/latest/devguide/xray-guide.pdf

In the case that you want Amazon to see the logs to help troubleshoot, X-Ray may help. I'll let you know!








Wednesday, July 05, 2017

Waiting For an EC2 Instance To Initialize Before Completing a Script

Sometimes when running a script to create AWS Resources, an EC2 instance needs to be created and up and running before the script can continue.

In my case I'm instantiating a WatchGuard Firebox Cloud. Then I need to wait until it is ready before I can instantiate a Lambda function to configure it.

I have a script that gets a value (called get_value.sh), as described in my last post on deleting a Lambda function from CloudFormation, which is used below. I query for an instance that has a particular tag and is in either the pending or running state. Then I use the instance-status-ok waiter to wait until the instance is ready to run my Lambda function.

#this code assumes one firebox in account
#would need to get fancier to handle multiple instances with same tag

aws ec2 describe-instances --filters Name=tag-value,Values=firebox-network-firebox Name=instance-state-name,Values=pending,running > firebox.txt  2>&1

fireboxinstanceid=$(./execute/get_value.sh firebox.txt "InstanceId")

echo "* waiting for firebox instance...see status check column in EC2 console"

aws ec2 wait instance-status-ok --instance-ids $fireboxinstanceid

echo "* firebox instance running"

Once the instance is running, the Lambda function that connects to it can execute. Note that just checking that the instance is running with the instance-running waiter will cause problems, because at that point the instance isn't actually ready to receive connections.

The above code lives in the following file:

https://github.com/tradichel/PacketCaptureAWS/blob/master/code/execute/action.sh
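If you are doing the same thing from Python rather than bash, here's a minimal boto3 sketch of the same wait (it assumes the same tag values as above and a single matching instance):

import boto3

ec2 = boto3.client("ec2")

# Find the single firebox instance that is pending or running
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag-value", "Values": ["firebox-network-firebox"]},
        {"Name": "instance-state-name", "Values": ["pending", "running"]},
    ]
)["Reservations"]
instance_id = reservations[0]["Instances"][0]["InstanceId"]

# Blocks until both status checks pass, like `aws ec2 wait instance-status-ok`
ec2.get_waiter("instance_status_ok").wait(InstanceIds=[instance_id])
print("firebox instance ready:", instance_id)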

CloudFormation Won't Delete Lambda

I am noticing that AWS CloudFormation has difficulties deleting my particular Lambda function.

If you are having this problem the trick is to use the CLI to first forcibly detach the ENI, then delete it.

In my case I created a generic get_value.sh script that parses a file for a value from a query using the AWS CLI. It's using some funky bash script so if you wanted you could re-write this in your favorite language that's more readable.

Also note I am still *testing* this code. Make sure it does not delete the wrong ENIs. There is no good way to get a handle on the ENI specific to a Lambda function either unfortunately.


# get_value.sh - parse an AWS CLI output file ($1) for the value of a JSON key ($2)
filename=$1;key=$2; var=""
# keep everything after the first colon on the matching line,
# then strip leading whitespace, quotes and trailing commas
var="$(cat $filename | grep "\"$key\""  | cut -d ':' -f 2- | sed -e 's/^[ \t]*//' -e 's/"//' -e 's/"//' -e 's/,//')"
# remove any remaining whitespace
var="$(echo "${var}" | tr -d '[:space:]')"
echo "$var"

https://github.com/tradichel/FireboxCloudAutomation/blob/master/code/execute/get_value.sh

A series of commands uses the above function to query for the attachment ID and network interface ID. These two values are used to force detachment and delete the ENI. Use the name of the Lambda function to find the correct ENI to delete, which is at the end of the requester ID field.

#!/bin/sh
#get our lambda ENI as we need to force a detachment
aws ec2 describe-network-interfaces --filter Name="requester-id",Values="*ConfigureFirebox" > lambda-eni.txt  2>&1
attachmentid=$(./execute/get_value.sh lambda-eni.txt "AttachmentId")
if [ "$attachmentid" != "" ]; then
    echo "aws ec2 detach-network-interface --attachment-id $attachmentid --force"
    aws ec2 detach-network-interface --attachment-id $attachmentid --force

    #I don't see a good way to wait for the network interface to detach.
    #Pausing a few seconds here and hoping that works.
    sleep 5

    networkinterfaceid=$(./execute/get_value.sh lambda-eni.txt "NetworkInterfaceId")
    echo "aws ec2 delete-network-interface --network-interface-id $networkinterfaceid"
    output=$(aws ec2 delete-network-interface --network-interface-id $networkinterfaceid)
    if [ "$output" != "" ]; then
        echo "If an error occurs deleting the network interface run the script again. Contacting AWS for a better solution..."
    fi
fi

This code lives in the following file:

https://github.com/tradichel/FireboxCloudAutomation/blob/master/code/execute/delete_lambda_eni.sh

Once the above code completes, the Lambda function can be deleted successfully.

The full code I am testing is found here:

https://github.com/tradichel/FireboxCloudAutomation



Tuesday, June 27, 2017

Setting NTP Server for AWS EC2 instance in User Data

In a previous post I mentioned using a WatchGuard Firebox Cloud as an NTP server.

Here's some sample code.

Our Firebox CloudFormation template provides IP addresses as outputs:

https://github.com/tradichel/PacketCaptureAWS/blob/master/code/resources/firebox-network/firebox.yaml

This Python code connects to the Firebox to enable the NTP server on the Firebox, via a Lambda function and a key in a secure S3 bucket.

Finally, this sample web server shows how to pass in the Firebox IP and change the NTP configuration file. Of course you would probably want multiple NTP servers for redundancy.



The network acl entry identified by xxxx already exists


Here are some troubleshooting tips if you are getting this error when running a CloudFormation template to create NACLs:

"ResourceStatusReason": "The network acl entry identified by 2012 already exists."

First of all, check that you do not have duplicate rule numbers in your NACL rule list. The rule number property on a CloudFormation NACL resource looks like this:

http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-network-acl-entry.html

  NaclAWSDNS:
    Type: "AWS::EC2::NetworkAclEntry"
    Properties:
      CidrBlock: !Ref paramsAWSGlobalDNS
      Egress: true
      NetworkAclId: !Ref FireboxPublicSubnetNacl
      Protocol: 17
      PortRange:
        From: 53
        To: 53
      RuleAction: "Allow"
      RuleNumber: "2012"

You can simply Ctrl-F and search in the file for 2012.

But what if you only have one rule with the RuleNumber 2012? In that case, perhaps you renamed some rules. For instance, NaclAWSDNS in a prior execution of the template was NaclSomethingElse.

When you rename the NACL entry and create the new one, you would think CloudFormation would delete the old one first and replace it with the new one, but it doesn't seem to do that. It leaves the old one in place, presumably for security reasons. Perhaps taking the old one out first would remove a DENY rule that is being replaced with a new DENY rule and create a window of opportunity for an attacker. Or perhaps, if the second rule addition fails for some other reason, the network is left in a vulnerable state. Who knows why...

The solution in the latter case is to simply give the rule that you are renaming a different rule number that is unused.

Monday, June 19, 2017

not a valid EC private key file

If you are downloading an SSH key pair from a bucket and getting an error trying to use it (for example with Paramiko in a Lambda function, as I explained in earlier blog posts), and the error is:

"errorType": "SSHException",
"errorMessage": "not a valid EC private key file"


Check to see that your key file is not actually encrypted with server side encryption or a KMS key, in which case you will need to decrypt it before using it.
One way to check this would be to print out the contents of the file in the bucket:


f = open(localkeyfile, 'r')
print (f.read())
f.close()

Then compare the contents to your actual key file. If they don't match and the content looks like random characters, chances are the file has been encrypted in the S3 bucket and needs to be decrypted prior to use.
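Another quick check (a sketch with hypothetical bucket and key names) is to ask S3 whether server side encryption is set on the object before you download it:

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and key names
head = s3.head_object(Bucket="my-key-bucket", Key="firebox-ssh-key.pem")

# 'ServerSideEncryption' is 'AES256' or 'aws:kms' when SSE is applied;
# 'SSEKMSKeyId' appears when a KMS key was used
print(head.get("ServerSideEncryption"))
print(head.get("SSEKMSKeyId"))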

Update: 6/26/2017


Very strange...


The Python code was failing with the error above. I was pretty sure this code had worked before. I looked into the Python Boto3 library to see how it handled encryption and tried different options, but it appeared that the file should simply be downloaded unencrypted. There is no download option to specify a server side encryption algorithm (when not using a customer key).

http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.TransferConfig


After fiddling with it a while I removed the extra arguments and now the code is working again. Magic!

WatchGuard Firebox Cloud Subscription Services in 11.12.2

WatchGuard Firebox Cloud offers subscriptions for various security services that help keep your network and instances secure. The full suite of security services can be purchased in a package called Total Security. More information can be found on the WatchGuard web site:


The WatchGuard Firebox Cloud BYOL (bring your own license) AMI includes all the services. To get a WatchGuard Firebox Cloud license, contact a WatchGuard partner. The WatchGuard Firebox Cloud Pay As You Go AMI has some limitations at this time; it will likely have all services in the near future. For more information, refer to the WatchGuard Firebox Cloud documentation.


To enable WatchGuard Firebox Cloud Subscription Services in 11.12.2, restart your instance after setting it up. In later versions this extra step will not be necessary. Check the release notes of new versions of Firebox Cloud for updates, new features, and bug fixes.

Sunday, June 18, 2017

Create Network Interfaces Separately in AWS to Tag With Names

Here's a hint - create network interfaces (ENIs) separately in AWS CloudFormation so you can assign names via tags. That way, when you pull up your list of ENIs, each one will have a name in the first column, making it easier to identify.

Why ENIs vs. EC2 instances only? Because when you look in VPC Flow Logs, the records are assigned to ENI IDs, not EC2 instance IDs.

Of course you'll probably want a better reporting tool in the long run, but in the short term, if you are trying to find the ENI associated with an instance to look up its VPC Flow Logs, it is easier if you have names associated with the ENIs.

Here's an example of ENIs created separately with Name tags assigned:

https://github.com/tradichel/PacketCaptureAWS/blob/master/code/resources/firebox-network/firebox.yaml
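If you already have ENIs that were created without names, a quick boto3 sketch like the following (hypothetical ENI ID and Name value) can tag them after the fact; creating them separately in CloudFormation as shown above is still the cleaner approach:

import boto3

ec2 = boto3.client("ec2")

# Hypothetical ENI ID and Name value
ec2.create_tags(
    Resources=["eni-0123456789abcdef0"],
    Tags=[{"Key": "Name", "Value": "firebox-trusted-eni"}],
)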

ImportValue in UserData with YAML ~ CloudFormation

When trying to concoct a UserData statement for an EC2 instance in a CloudFormation template written in YAML, a !Sub function can be used to replace variables in the UserData with values from the same template, or from another template using the !ImportValue directive.

The UserData must be converted to Base64, followed by a Sub function which takes two inputs: the first is the string with variables to replace; the second is the map of variables. When the Sub function is created, a pipe (|) is added to create a multiline statement. Note that UserData seems to be picky about when you use the Fn:: format and when you reference something with an exclamation point like !Sub. Maybe this will change, but for now this works:

   UserData:
     Fn::Base64: !Sub 
       - | 

The first argument of the Sub function is the UserData section, with variables written as a dollar sign followed by the name enclosed in curly braces, like this:

     this is some text ${Variable1} followed by more text

The second argument is the list of variables.

When only one variable is required the argument can look like this:

        - Variable1:
            'Fn::ImportValue': 'NameOfTemplateOutputValue'

In the case of multiple variables, a map is passed in, so you have to understand the format for a map in YAML. The map format does not use dashes like a list; it is simply key-value pairs on separate lines, like this:

        -
          Variable1:
            'Fn::ImportValue': 'NameOfTemplateOutputValue1'
          Variable2:
            'Fn::ImportValue': 'NameOfTemplateOutputValue2'
          Variable3:
            'Fn::ImportValue': 'NameOfTemplateOutputValue3'

For a complete template in action, you can check out this one, which is used to capture packets from a WatchGuard Firebox Cloud. It retrieves the SSH key from an S3 bucket, SSHes into the Firebox Cloud, and also sets up the instance to use the Firebox Cloud as an NTP server. The IP address for the Firebox Cloud and the S3 bucket name are output values from other templates:

https://github.com/tradichel/PacketCaptureAWS/blob/master/code/resources/firebox-instances/packetcaptureserver.yaml

Tuesday, June 13, 2017

Do You Have HTTP Traffic Hidden in Your HTTPS Page?

I set my Firebox to only allow HTTPS traffic just so I could see which sites don't support SSL. At first I thought the Linksys site did not support SSL. I checked it on my phone and was able to get to the site via https://www.linksys.com. However, if you try to reach the site through firewall rules that enforce HTTPS only, you cannot get to it.

Looking into the firewall logs, I can see that when I try to reach that site via HTTPS there are numerous blocked requests for HTTP content on the Amazon network (AWS?). I didn't dig into the code, but this could be images, scripts or other content embedded in the site.

Including non-HTTPS content in an HTTPS page weakens the security of the site overall. Here's a nice article from Google on the topic:

https://developers.google.com/web/fundamentals/security/prevent-mixed-content/what-is-mixed-content

Developers of mobile sites and apps should also check that ALL content from a mobile site or app is encrypted. It seems there are some mobile apps out there that don't have fully encrypted content.

Linksys makes a lot of consumer products that are easy for the consumer market to use which is great. Hopefully the web site will be updated to a full HTTPS site soon and the mobile app will be checked out as well to make sure it is encrypted end to end.

Many companies believe that simply having an HTTPS web site is sufficient, without understanding the implications of mixed content. Turn on firewall policies that allow HTTPS only, no HTTP, and see what you find...

Monday, June 12, 2017

Why A VPN for AWS Cross Region Traffic?

Although HTTPS API request traffic is encrypted, AWS states that the best practice for cross-region communication is to use a VPN. A VPN will protect the details of the endpoints communicating with each other over the public Internet and will expose less information to an attacker in the network packets. Additionally, if you force all traffic between two endpoints over a VPN, then even when some communications are left unencrypted by a developer or by an application that doesn't properly secure traffic, the VPN still provides an encrypted tunnel for the two endpoints to communicate over the Internet - even when SSL is incorrectly implemented, vulnerable to attack, or non-existent.

An SSL VPN will encrypt the data as it flows between regions; however, it operates at layer 7 of the OSI model, exposing more details within the packet. An IPSEC VPN, such as that provided by a WatchGuard hardware Firebox or a WatchGuard Firebox Cloud, operates at layer 3 and provides greater protection by exposing less data. It encrypts the data at a point in the packet that hides some of the details exposed by the time the packet reaches layer 7.

When relying on a VPN, it is very important to secure the hosts providing the VPN. In other words if you have a hardware device on one side and a software VPN endpoint in AWS, those two endpoints need to be secure because they are encrypting all your traffic. Anyone who can compromise those two hosts could get into your network traffic.

In the case of SSL, every host that is connecting to another must be properly configured and protected. For instance, if you have many APIs for different applications in AWS and you are communicating with those applications from hosts in your data center via HTTPS REST APIs, you must make sure every single one of those applications has SSL properly configured to prevent data leaks. One single misconfiguration could be a hole into your network that would allow attackers to scan your internal network looking for more holes, as I will be discussing in a presentation at AWS Community Day in San Francisco on Thursday.

With a VPN the traffic is encrypted at the two VPN endpoints, such as a hardware and software device to and from AWS or two software devices between two AWS regions. SSL encrypts the data between the two SSL endpoints which could be VPN endpoints or two HTTPS REST API endpoints. The implication here is that if you only use a VPN, when the traffic leaves your VPN tunnel and goes further into your network it is unencrypted. For this reason, it's probably a good idea to use both SSL and an IPSEC VPN tunnel for cross region communications and other places where traffic is exposed to the Internet if possible.

Yum updates on AWS ~ Which Region?

I have been digging into network traffic to truly understand all the sources and destinations when using AWS services - specifically S3 and Yum.

This is important to understand if you have any legal requirements that dictate all your traffic must be maintained in a particular region. It is also important to understand what traffic is encrypted, unencrypted, inside and outside your VPC (Virtual Private Cloud). Additionally it is always a good idea to know the exact source of your software packages.

We would presume that yum updates should go over an encrypted, private network, but this does not always seem to be the case. This is why it's a good idea to monitor your network and ensure your software packages are coming from the source you expect. I would recommend an internal software repository carefully monitored to ensure the updates are always from the correct source.

In my case I was running a Yum update from the us-west-2 AWS region (Oregon, US) and I noticed traffic going to the southeast Asia-Pacific region. Since I am testing and the only person in this account it was easy to spot. Having updates come from an alternate region could pose a problem in some scenarios from a legal perspective. There are jurisdictional and regulatory reasons why some companies need to ensure all their traffic is within the region they define and use.

On top of that, the traffic is on port 80, which I presume is unencrypted (I haven't looked yet). If this were all internal to AWS you might live with that; however, in my last exploratory post I found out from AWS support that traffic to buckets in different regions will traverse the Internet. Since Yum updates are hosted in S3, if they went from Oregon to Southeast Asia, does that mean I'm getting my Yum updates unencrypted over the Internet? Something to look into if that were my only option, but fortunately there is a solution.

AWS support recommended a plugin called "fastestmirror" which will get you to the closest Yum repo.

To install the fastestmirror plugin:

sudo yum install yum-plugin-fastestmirror -y

Then change fastestmirror_enabled=0 to fastestmirror_enabled=1 in amzn-main.repo and amzn-updates.repo, and run:

sudo yum clean all
sudo yum update -y

The plugin will help you get your updates from the closest mirror, but if you want to be absolutely sure your updates are coming from the source you expect, you'll have to set up networking rules to block the alternatives.

For more about encrypting cross-region traffic: VPNs for AWS Cross-Region Encryption

Sunday, June 11, 2017

Where Does Traffic Flow for AWS S3

I've been working with AWS S3 and S3 endpoints lately, digging into the nitty gritty of how they work with the support team at AWS in relation to network traffic. There are some questions the documentation does not explicitly answer, so I just wanted to get clear answers.

I have confirmed that in-region S3 traffic to an S3 endpoint never traverses the public Internet. However, if you access a bucket in a different region, that traffic will traverse the Internet.

8/17 Update from @colmmacc:

Unless you're sending data in/out of China, data between EC2 and S3 in a different region stays on the AWS backbone. That traffic has stayed on the AWS backbone for quite a while, but with the option to fail over to the Internet transit if two or three plus links fail. In our customer messaging we’ve given the most under promise, over deliver kind of answer ...... after a few years of data on fault rates, fiber cuts, our redundancy, etc, we got comfortable enough to turn off the tertiary failover.

Why do I care?

Multiple reasons.

One of the interesting things I forgot but now remember thanks to my co-worker, @kolbyallen, is that Yum updates from Amazon Yum repositories require access to S3 buckets. Looking at the traffic logs and the source and destination IP addresses this became obvious once again.

In order to provide access to S3 buckets, you need to open up the appropriate S3 CIDR ranges from the published AWS IP ranges in your networking rules. That means you have to allow access to these S3 public IP ranges through both security groups and NACLs.

For security groups you might create one S3 access security group but hopefully your instances don't already have too many security groups associated. For the NACLs it gets complicated and eats up a lot of the already limited number of rules if you prefer to go that route.

Not only that: when I tested Yum updates the traffic was going over port 80 (unencrypted? I didn't check the actual packets, but I presume so), while S3 CLI calls for commands like cp use port 443. In addition, I found that the CLI commands required access not only to the region I am using but also to us-east-1 in order to function. So if you really wanted to create rules for all those CIDRs, that would be a lot of rules.
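To get a feel for how many rules that would be, here's a small Python sketch that pulls the published ip-ranges.json file and lists the S3 prefixes for us-west-2:

import json
import urllib.request

# AWS publishes its IP ranges at this well-known URL
URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

with urllib.request.urlopen(URL) as response:
    data = json.load(response)

s3_prefixes = [
    p["ip_prefix"]
    for p in data["prefixes"]
    if p["service"] == "S3" and p["region"] == "us-west-2"
]

print(len(s3_prefixes), "S3 CIDR ranges for us-west-2")
for cidr in s3_prefixes:
    print(cidr)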

So why not just open up port 80 to everywhere? Well, if you just open up port 80 to any/any, then you cannot see if you are getting any DENY entries that indicate a problem. DENY log entries tell you that something is either misconfigured or someone is doing something nefarious.

And if you wonder why I care about that - If I am in the us-west-2 region I would expect that calls to other completely unrelated regions would not be present in my logs. Investigating a few things...

Update on said investigation: Do you know which region your YUM software updates are coming from on AWS?

S3 Endpoints - Traversing the Internet or Not?

If you use an S3 endpoint, you can access buckets without having to set up a NAT to allow the outbound traffic; however, that does not necessarily mean your traffic is not traversing the Internet. It could be going to an AWS NAT and traversing the Internet from that point on. When I last read the S3 endpoint documentation, it was not clear to me whether any of the traffic from my instance to the bucket was traversing the Internet. I think it does not, per my last conversation with AWS support, even though the IPs are public. This is what my coworker and I understand at this point.

In addition, without explicit policies, the S3 endpoint does not offer protection from a misconfiguration within AWS that might send your data to an alternate bucket - nor does a CIDR rule that allows access to all of S3. If you simply set up an endpoint with the * resource, then your traffic could end up in someone else's bucket in someone else's account. Cross-account bucket access was not visible in the S3 console last time I checked, so these buckets won't be obvious to you when they exist. I should test this again... Better would be to create a policy that only allows access to buckets in your own account.
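As a sketch of that idea - the endpoint ID and bucket name here are hypothetical, not a drop-in policy - the endpoint policy can be restricted to your own buckets with boto3:

import json
import boto3

ec2 = boto3.client("ec2")

# Hypothetical: only allow the endpoint to be used for one of our own buckets
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::my-own-bucket",
                "arn:aws:s3:::my-own-bucket/*",
            ],
        }
    ],
}

ec2.modify_vpc_endpoint(
    VpcEndpointId="vpce-0123456789abcdef0",
    PolicyDocument=json.dumps(policy),
)

As the next paragraph notes, a policy this tight can also break access to Amazon-owned buckets such as the Yum repos, so test it carefully.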

Yes, I am aware of how painful it is to get the bucket policies and roles for a bucket working. Throw an S3 endpoint on top of that and it gets even trickier. But I've done it. You can check out the code in this repo if it helps.

https://github.com/tradichel/FireboxCloudAutomation

But this brings us back to our problem of the AWS Yum repos hosted in S3. If we set explicit policies on our S3 endpoints to allow access only to our own buckets, we potentially can't get to the AWS buckets used for the Yum repos and some other AWS services. Without the names of those Amazon-owned buckets to add to our S3 endpoint policies, if they are affected by these endpoints, those service calls will fail. And without those service calls going over an S3 endpoint, we potentially have software updates going over port 80, unencrypted, on the Internet.

Do we even need to set up an S3 endpoint for the AWS repos and services? Or is that magically handled by AWS and all maintained within their network? Still looking for clarification on that last point.




variable names in Fn::Sub syntax must contain only alphanumeric characters, underscores, periods, and colons

While trying to use an ImportValue in the UserData section of a CloudFormation template, in the way specified on various web sites, I got this error:

A client error (ValidationError) occurred when calling the CreateStack operation: Template error: variable names in Fn::Sub syntax must contain only alphanumeric characters, underscores, periods, and colons

The error seems straightforward, but it is not exactly accurate, because the UserData section also allows things like # for comments, single quotes, and other characters. I whittled it down to the curly-brace syntax I was attempting to use based on other web sites, which are apparently incorrect.

Getting the Index or Count in a Filtered List in Python

Attempting to get a count of filtered items while looping through the original list in Python will yield the count of the original list or the original index.

In other words, if I want to query for "eggs" and "ham" in green, eggs, and, ham, the index (counting from one) will be:

2 eggs
4 ham

The count would be 4.

In order to obtain the index of items only based on the filtered list, or a count of the filtered list, perform actions on the output of the filtered list, not the original list.

pseudocode:

list=green,eggs,and,ham
filteredlist = filter list to return eggs, ham

filteredlist.index
filteredlist.count

The second approach gives me what I want:

1 eggs
2 ham

count: 2
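Here's a minimal Python sketch of the filter-first approach, counting from one to match the numbers above:

words = ["green", "eggs", "and", "ham"]
wanted = {"eggs", "ham"}

# Filter first, then take the index and count from the filtered list
filtered = [word for word in words if word in wanted]

for position, word in enumerate(filtered, start=1):
    print(position, word)

print("count:", len(filtered))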

The script linked below is not pretty - it has to be in bash for now, incorporating Python to do the above. It filters the larger list of AWS IP ranges down to the specific ranges I am seeking and grabs the index from the filtered list to create a parameter name for each of the items returned.

https://github.com/tradichel/PacketCaptureAWS/blob/master/code/execute/get_s3_ips.sh

Using a WatchGuard Firebox for an NTP Server on AWS

When your instances run on AWS, by default they reach out over the Internet to an NTP service to update the clock that is used to create all the timestamps in system logs and for other time-related functions. A more detailed explanation is found here on the Amazon web site:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html

For better security you'll want to limit which subnets have Internet access by limiting the route tables that include a route to an Internet Gateway. On the other hand, it is really important to have all your instances synchronized with accurate times so that, in the case of a security incident, the logs can be correctly correlated.

To overcome this issue we can use a WatchGuard Firebox or similar device to get the time from the Internet and have all the instances check the WatchGuard Firebox on the internal network to get the time.

To configure the WatchGuard Firebox as an NTP server the NTP option and the option to use the Firebox as an NTP server must be enabled.

http://www.watchguard.com/help/docs/fireware/11/en-US/Content/en-US/basicadmin/NTP_server_enable_add_c.html

I have some code for automated deployment of a WatchGuard Firebox here:

https://github.com/tradichel/FireboxCloudAutomation

There's a Lambda function that connects to the Firebox to make configuration changes here:

https://github.com/tradichel/FireboxCloudAutomation/blob/master/code/resources/firebox-lambda/fireboxconfig.py

The following commands can be added to that script to enable NTP:

#make Firebox an NTP server
        command="ntp enable\n"
        channel.send(command)
        time.sleep(3)

        command="ntp device-as-server enable\n"
        channel.send(command)
        time.sleep(3)

Note the space after ntp.

Once you have your Firebox set up as an NTP server, go back and update the instances as explained in the article at the top of the page to use the Firebox as an NTP server instead of the Amazon default NTP servers.

You'll need to ensure port 123 is open to and from the NTP server for the UDP protocol.
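In security group terms that could look something like this boto3 sketch (the group IDs are hypothetical - one group attached to the Firebox trusted ENI, one attached to the NTP clients):

import boto3

ec2 = boto3.client("ec2")

# Allow the NTP clients' security group to reach the Firebox's security
# group on UDP 123 (NTP); both group IDs are placeholders
ec2.authorize_security_group_ingress(
    GroupId="sg-0f1e2d3c",  # attached to the Firebox trusted ENI
    IpPermissions=[
        {
            "IpProtocol": "udp",
            "FromPort": 123,
            "ToPort": 123,
            "UserIdGroupPairs": [{"GroupId": "sg-0a1b2c3d"}],  # NTP clients
        }
    ],
)

The clients' outbound rules (and any NACLs in between) need to allow UDP 123 as well.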

Of course you'll want to configure your Firebox Cloud for high availability if you want to ensure that NTP is always available. This blog post presents one way to create a high availability NAT:

https://aws.amazon.com/articles/2781451301784570

Thinking about other options for a scalable, HA configuration of a Firebox Cloud for future blog posts.

Tuesday, June 06, 2017

One of the configured repositories failed (Unknown), and yum doesn't have enough cached data to continue.

Running an AWS minimal Linux instance, I got this error in the logs on start up:

Starting cloud-init: Cloud-init v. 0.7.6 running 'modules:config' at Tue, 06 Jun 2017 09:10:02 +0000. Up 18.86 seconds.
Loaded plugins: priorities, update-motd, upgrade-helper

 One of the configured repositories failed (Unknown),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Disable the repository, so yum won't use it by default. Yum will then
        just ignore the repository until you permanently enable it again or use
        --enablerepo for temporary usage:

            yum-config-manager --disable <repoid>

     4. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true


There's a lot of information there but if you scroll down a bit farther you'll see something like this:

Could not retrieve mirrorlist http://repo.us-west-2.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.us-west-2.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 10001 milliseconds')

This means the network configuration is blocking access to the Amazon Yum repo.

When I look up the IP address associated with the above repo, I get (at this moment):

nslookup repo.us-west-2.amazonaws.com

Name: s3-us-west-2-w.amazonaws.com
Address: 52.218.144.34

OK so apparently the AWS Yum repositories are hosted on S3.

We can look up the latest AWS IP ranges in the published ip-ranges.json file; AWS also explains how IP range updates are announced.

There could be a few things causing this:

1. Security group rules disallow access to S3 IP ranges.
2. NACLs do not allow access to S3 IP ranges
3. No Internet Gateway route in subnet route table
4. The traffic routes from this subnet to another subnet in or outside AWS that can access the Internet, but traffic is blocked by a proxy, NAT or firewall that is disallowing the traffic.

As noted, using a NAT on AWS is challenging, but it can be used to route update traffic from instances in a private network to the public Internet.


It may also be a good idea to host software internally on a repository so instances do not have to traverse the Internet to get updates. Consider Nexus, Artifactory or similar solutions. Then reaching out to the Internet would be limited to the few computers used to run the repository. A private repository could also be hosted on S3 with an S3 endpoint.

Sunday, June 04, 2017

Find all the AMI IDs from a specific vendor in the AWS Marketplace

I am trying to find a way to get a list of AMIs from a specific vendor in the AWS Marketplace. I thought I had figured out a way to do this, but it turns out the owner is just "AWS Marketplace", not the specific vendor.

First of all, I know the description from the vendor (WatchGuard in this case) of the particular AMI I am seeking (a WatchGuard Firebox Cloud); the description starts with "firebox". I can query for "firebox*", however the issue would be if someone else published an AMI with a similar name and I accidentally used an AMI that was not actually from the vendor I intended. In the past this issue existed on Amazon, where people published AMIs with names similar to reputable AMIs to confuse new users about which AMI they should actually use.

To ensure I am using an AMI that is actually from WatchGuard I can query the AMIs I know are from WatchGuard for the "Owner ID" like this:

aws ec2 describe-images --filters "Name=description,Values=firebox*" | grep Owner

I will get back a list something like this:

            "ImageOwnerAlias": "aws-marketplace", 
            "OwnerId": "679593333241", 
            "ImageOwnerAlias": "aws-marketplace", 
            "OwnerId": "679593333241", 
            "ImageOwnerAlias": "aws-marketplace", 
            "OwnerId": "679593333241", 
            "ImageOwnerAlias": "aws-marketplace", 
            "OwnerId": "679593333241", 
            "ImageOwnerAlias": "aws-marketplace", 
            "OwnerId": "679593333241", 

From here I can see that the owner ID for the AMI from WatchGuard is 679593333241. Now I can use that in my query to get all the AMIs from WatchGuard:

aws ec2 describe-images --filters "Name=description,Values=firebox*" --owners 679593333241

Oh but wait... The ImageOwnerAlias is "aws-marketplace". Does that ID truly relate only to WatchGuard AMIs? Let's query without the description filter. This query takes quite a while to execute:

aws ec2 describe-images --owners 679593333241

Unfortunately, this pulls back everything in the AWS Marketplace, not just the WatchGuard AMIs. I will put in a feature request to see if this can be fixed somehow. In the end, I think the following is the best I can do to get the latest Firebox AMI, but I am asking AWS support whether they have a better answer.

amidesc=$(aws ec2 describe-images --filters "Name=description,Values=firebox*" --owners 679593333241  | grep "Description" | grep "pay" | sort -r | grep -m1 -v "rc" | cut -d ":" -f 2 | sed -e 's/^[[:space:]]*//')

imageid=$(aws ec2 describe-images --owners 679593333241 --filters "Name=description,Values=$amidesc" | grep ImageId | cut -d ":" -f 2 | sed -e 's/^[[:space:]]*//')

echo $imageid
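If you would rather do this in Python, a small boto3 sketch can sort on CreationDate instead of relying on grep and sort (it assumes the same owner ID and description filter as above; adjust the filter to pick BYOL or pay-as-you-go images):

import boto3

ec2 = boto3.client("ec2")

response = ec2.describe_images(
    Owners=["679593333241"],
    Filters=[{"Name": "description", "Values": ["firebox*"]}],
)

# Newest image first, based on CreationDate
images = sorted(response["Images"], key=lambda i: i["CreationDate"], reverse=True)
latest = images[0]
print(latest["ImageId"], latest["Description"])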


Using the bash script above, I can get a list of the available WatchGuard AMIs in my account and region and ask the user to select one, which will be used in the subsequent script:



You can see this code in action in the following file:





'capabilities' failed to satisfy constraint: Member must satisfy constraint: [Member must satisfy enum value set: [CAPABILITY_NAMED_IAM, CAPABILITY_IAM]]

When running CloudFormation templates, certain templates require IAM permissions. You will need to add this to your create-stack call:

--capabilities CAPABILITY_NAMED_IAM 

The full command may look something like this:

aws cloudformation create-stack --stack-name firebox-nat-vpc --template-body file://resources/firebox-nat/vpc.yaml --capabilities CAPABILITY_NAMED_IAM --parameters ParameterKey=ParamStackName,ParameterValue=packet-capture-vpc

If you get this error:

'capabilities' failed to satisfy constraint: Member must satisfy constraint: [Member must satisfy enum value set: [CAPABILITY_NAMED_IAM, CAPABILITY_IAM]]

check what follows the --capabilities switch to make sure it is correct. Although the error mentions the capabilities switch, it may be caused by something after that flag being malformed. For example, I left out the --parameters switch when dynamically piecing together the CloudFormation call, produced the following by accident, and got the above error - notice there is no --parameters switch:

aws cloudformation create-stack --stack-name firebox-nat-vpc --template-body file://resources/firebox-nat/vpc.yaml --capabilities CAPABILITY_NAMED_IAM  ParameterKey=ParamStackName,ParameterValue=packet-capture-vpc

To see this code in action check out this GitHub repo:


Here's the file that is generating the CloudFormation calls with parameters and capabilities:





Saturday, June 03, 2017

IP Spoofing

IP spoofing means that someone has likely manually crafted a network packet and put in a bogus IP address as the source address.

A network packet consists of a number of fields following a certain protocol. For more information see this post which contains a diagram and description of the fields in an IP packet header. One of these fields in the IP portion of the packet header is the source IP address. The source IP address should be the IP address of the device that sent the packet, and the IP address to which any response packets should return. If someone puts in a bogus IP address as the source IP, the sender won't be able to receive the responses to the network request. The response will go to the bogus IP address.

Why would someone put in a false return IP address? One reason would be to DDoS another computer. Many packets with an incorrect source IP address could cause responses to be redirected to someone the malicious packet crafter wants to inundate with traffic, to take down network equipment or host machines that cannot handle the load. There could be other malicious reasons for this as well.

An IP packet captured with tcpdump will have more or less information depending on what flags you set when you run the command, but basically it will look something like this:
 0x0000:  0001 0800 0604 0001 02a5 c63c 2226 0a00
 0x0010:  0001 0000 0000 0000 0a00 006d 0000 0000
 0x0020:  0000 0000 0000 0000 0000
That hex data can be converted to a human readable form. If you really want to nerd out you can break down the IP packet header as explained here to determine the source IP in your packets:

http://websitenotebook.blogspot.com/2014/05/decoding-ip-header-example.html

Another option to learn about decoding packets would be to take classes at SANS Institute, starting with the SANS Bootcamp:

https://www.sans.org/course/security-essentials-bootcamp-style

If a packet is truly spoofed, the IP address in the source address field will be incorrect, not matching the host that actually sent the packet. In some cases, however, certain networking equipment tries to figure out whether an IP is spoofed by seeing if it can return a packet to the host. It may not be able to, and will assume the IP is spoofed, when the actual problem is that it received a packet but cannot send a response back to the host due to network routes or firewall rules.

If you know the return IP address in your packets is good, check the route tables and network configuration to make sure the packet is able to return to the host that sent the packet.

If the source IP is actually not valid, hopefully your network security services will detect that and block the traffic before it has a negative impact on your network or applications. Some external providers offer DDoS protection that will handle the extra load produced by traffic like this. AWS has some built-in DDoS protection and a higher-end service for larger customers called Shield. This SANS white paper has more information about defending against DDoS attacks.

Error: The Device Administrator admin from x.x.x.x has selected Edit Mode. You cannot select Edit Mode until admin has selected View Mode.

If you see this error while working on the command line of a WatchGuard Firebox:

WG#configure %Error: The Device Administrator admin from x.x.x.x has selected Edit Mode. You cannot select Edit Mode until admin has selected View Mode.

This could mean that you are trying to run a command after typing configure to enter configuration mode, when the command needs to be run in main mode.

If you are in configure mode you can type "exit" to return to the main, read only mode.

For example, when trying to run tcpdump on a Firebox in configure mode you would get this error because tcpdump should be run in the main, read only mode.

For more information refer to the latest version of the WatchGuard Firebox CLI Documentation

Wednesday, May 24, 2017

0.0.0.0/0 in AWS Route Tables and Network Rules

Public Safety Announcement:

0.0.0.0/0 should be used sparingly. It means any host on any IP address (or any IPv4 address to be precise) on the Internet can use this route to connect to things on the other side. That means any host in this subnet can contact any host on the Internet and vice versa.

Here's a random sampling of traffic that hit my Firebox Cloud as soon as I set it up on the Internet, and why you might want to open up this type of traffic sparingly. As soon as you do, various nefarious (and accidental) traffic will start hitting any host that is reachable from the Internet.

If you do open up to the Internet, you might want security appliances inspecting traffic to and from your hosts to prevent malicious traffic from reaching them. For example, a properly configured WatchGuard Firebox would have prevented anyone behind it from being infected by the WannaCry virus. You can also use NACLs and security groups to limit access, as will be noted in upcoming blog posts.

Route Tables: Protecting Your Network

When you set up your network it is very important to understand how route tables work and how they can open up access to your network in unintended ways on AWS. Route tables define where traffic can flow on your network. They provide the routes, or "roads", traffic can take to get from point A to point B. The following example architecture for a Firebox Cloud with a management network will be used to explain route tables. These routes are also explained in the AWS NAT documentation.

Let's say you want the resources in a subnet to have Internet access. In this case you create a PUBLIC subnet and put an Internet Gateway route in the route table. The Internet Gateway allows resources in that subnet to get to the Internet, and hosts on the Internet can connect to resources in the subnet (unless blocked by other security controls such as security groups, NACLs, or a Firebox Cloud):



If we look at the route table we can see that there are two routes.

The first route is added by default for the local VPC IP range, or CIDR. Since I created my VPC with the CIDR 10.0.0.0/16, the local route will allow any host in the associated subnet(s) to send data to or receive data from any host with an IP in the range 10.0.0.0/16, or 10.0.0.0 to 10.0.255.255. Again, this can be further restricted by other controls.

The second route allows any host in the subnet to send data to or receive data from anywhere on the Internet (0.0.0.0/0) via the AWS Internet Gateway (igw-xxxxxxxx).



If you want to keep a subnet PRIVATE (meaning the hosts in the subnet cannot directly access the Internet), you need to ensure the subnet does not have a route to an Internet Gateway. In addition, you need to ensure that any routes in that subnet do not in turn lead to something that can ultimately route or proxy that traffic to the Internet.

Here's a private subnet:


If we click on the Route Table tab we can see the following:

The local VPC route is again added by default.

In addition, we allow any host in this subnet to send traffic to the PRIVATE (trusted) ENI of a WatchGuard Firebox Cloud. That means nothing can get out of our private subnet to the Internet without going through our Firebox Cloud, if these are the only two subnets in our VPC.

There is NO route to an Internet Gateway. Therefore there is no way for hosts in this subnet to get directly to the Internet (see caveats at the end of this article).

In other words any host in this subnet trying to send traffic and looking for a route to get it from point A to point B has two options - send it to something else in the VPC or send it to the Firebox Cloud.
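For reference, a default route that points at the Firebox trusted ENI rather than an Internet Gateway looks something like this boto3 sketch (the route table and ENI IDs are hypothetical):

import boto3

ec2 = boto3.client("ec2")

# Send all non-local traffic from the private subnet to the Firebox Cloud's
# trusted (private) ENI instead of to an Internet Gateway; IDs are placeholders
ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",
    DestinationCidrBlock="0.0.0.0/0",
    NetworkInterfaceId="eni-0123456789abcdef0",
)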


In addition, we will also add a route to an S3 endpoint, as explained previously, to ensure all access to the S3 bucket used for managing the Firebox stays on the private network - never on the Internet, and only from our management network - to protect our Firebox SSH key:


If I go check my elastic network interfaces, I can verify that the Firebox Cloud private (trusted) ENI is in this private subnet. That means any traffic to this private ENI must come from within the VPC, and this private ENI cannot send data to or receive data from the Internet. Traffic can route through the Firebox to the public ENI, which is how we inspect all the traffic going to and coming from the Internet.


I can look at the details of the ENIs to make sure they are configured correctly. According to the WatchGuard Firebox Cloud documentation, the public interface should be on eth0 and the private interface on eth1.

eth0 - Public

eth1 - Private



By architecting the network this way, we can make sure traffic that is allowed to reach the Firebox management key and the Firebox CLI comes only from the private network (if the rest of our network doesn't have extraneous holes in it). Additionally, we can add security groups to whitelist and further restrict access, as will be discussed in an upcoming blog post.

Now imagine you add a route to an Internet Gateway in one of the subnets that is currently private. You have just allowed hosts in that subnet to bypass the Firebox and get to the Internet without inspection. You may also be sending management traffic for the Firebox over the Internet to the public ENI, instead of to the private (trusted) ENI.

Additionally, as shown in the diagram in the AWS NAT documentation, traffic from hosts placed inside the public subnet can bypass the NAT, which in this case is our Firebox Cloud. There are other possible configurations, but this is the one we are considering at the moment, so if you are using this configuration, be aware of this fact and don't put hosts you want protected by the Firebox Cloud in the Firebox public subnet.

Here's another scenario: you add a route from this VPC to another VPC and another subnet in your account, or in another account, using VPC peering. If that other VPC subnet has Internet access, you have potentially allowed traffic to bypass the Firebox and get to the Internet.

You may be thinking that, due to the local route, anything in the private subnets can route to the public subnet and then to the Internet. This is true. A host in the private subnet, without any other controls, could connect to something in the local VPC public subnet. The host in the public subnet with Internet access could then be used as a proxy to send data to the Internet (including malware stealing data), so it's best to limit what is in the public subnet and use security groups and NACLs to further restrict the traffic, as will be explained.

Understand your network routes. Data on your network is like water: if you open a hole, it will flow there.