I've been working with AWS S3 and S3 endpoints lately and digging into the nitty-gritty of how they handle network traffic with the support team at AWS. The documentation leaves some questions unanswered, so I wanted to get clear answers.
I have confirmed that in-region S3 traffic to an S3 endpoint never traverses the public Internet. However, if you access a bucket in a different region, that traffic will traverse the Internet.
Why do I care?
One of the interesting things I forgot but now remember thanks to my co-worker, @kolbyallen, is that Yum updates from Amazon Yum repositories require access to S3 buckets. Looking at the traffic logs and the source and destination IP addresses, this became obvious once again.
To provide access to S3 buckets, you need to open up the appropriate S3 CIDR ranges, taken from the published AWS IP ranges, in your networking rules. That means you have to allow these public S3 IP ranges through both security groups and NACLs.
For security groups, you might create one S3 access security group, but hopefully your instances don't already have too many security groups associated. For NACLs it gets complicated, and it eats up a lot of the already limited number of rules if you prefer to go that route.
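As a rough illustration of pulling the S3 CIDRs for one region out of the AWS IP ranges, here is a minimal Python sketch. It assumes the JSON layout of the published ip-ranges.json file (a "prefixes" list whose entries carry "ip_prefix", "region" and "service"). The function name s3_prefixes and the sample data are my own, and the CIDRs in the sample are made-up documentation addresses, not real S3 ranges.

```python
import json


def s3_prefixes(ip_ranges, region):
    """Return the sorted IPv4 CIDR blocks tagged service "S3" for one region."""
    return sorted(
        entry["ip_prefix"]
        for entry in ip_ranges.get("prefixes", [])
        if entry.get("service") == "S3" and entry.get("region") == region
    )


# Made-up sample in the same shape as ip-ranges.json (hypothetical CIDRs).
SAMPLE = json.loads("""
{
  "prefixes": [
    {"ip_prefix": "198.51.100.0/24", "region": "us-west-2", "service": "S3"},
    {"ip_prefix": "192.0.2.0/24",    "region": "us-west-2", "service": "S3"},
    {"ip_prefix": "203.0.113.0/24",  "region": "us-east-1", "service": "S3"},
    {"ip_prefix": "192.0.2.0/24",    "region": "us-west-2", "service": "AMAZON"}
  ]
}
""")

for cidr in s3_prefixes(SAMPLE, "us-west-2"):
    print(cidr)
```

In practice you would load the real file instead of SAMPLE and feed the result into whatever generates your security group or NACL rules. The length of that list is what makes the NACL route painful.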
Not only that, when I tested Yum updates the traffic was going over port 80 (unencrypted? I didn't check the actual packets, but I presume so), while S3 CLI calls for commands like cp use port 443. In addition, I found that the CLI commands required access not only to the region I am using but also to us-east-1 in order to function. So if you really wanted to create rules for all those CIDRs, that would be a lot of rules.
So why not just open up port 80 to everywhere? Well, if you just open up port 80 to any/any, then you cannot see if you are getting any DENY entries that indicate a problem. DENY log entries tell you that something is either misconfigured or someone is doing something nefarious.
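To make the point about DENY entries concrete, here is a small Python sketch that picks rejected connections out of traffic log lines. It assumes the logs are VPC Flow Logs in the default space-separated format, where a denied connection is recorded with the action REJECT. The sample lines, account ID and interface ID are invented for illustration.

```python
# Field order of the default VPC Flow Logs format (an assumption about your logs).
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def rejected(lines):
    """Return the parsed records whose action is REJECT."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip lines that do not match the assumed format
        record = dict(zip(FIELDS, parts))
        if record["action"] == "REJECT":
            records.append(record)
    return records


# Invented sample lines (hypothetical account, interface and addresses).
SAMPLE_LINES = [
    "2 123456789012 eni-0example 10.0.1.5 192.0.2.10 49152 80 6 10 840 1500000000 1500000060 ACCEPT OK",
    "2 123456789012 eni-0example 10.0.1.5 203.0.113.7 49153 80 6 1 40 1500000000 1500000060 REJECT OK",
]

rejects = rejected(SAMPLE_LINES)
for r in rejects:
    print(r["srcaddr"], "->", r["dstaddr"], "port", r["dstport"])
```

With an any/any rule on port 80, the second line would show up as ACCEPT, and there would be nothing to flag.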
And if you wonder why I care about that: if I am in the us-west-2 region, I would expect that calls to other, completely unrelated regions would not be present in my logs. Investigating a few things...
Update on said investigation: Do you know which region your YUM software updates are coming from on AWS?
S3 Endpoints - Traversing the Internet or Not?
If you use an S3 endpoint, you can access buckets without having to set up a NAT to allow the outbound traffic. However, that does not necessarily mean your traffic stays off the Internet. It could be going to an AWS NAT and traversing the Internet from that point on. When I last read the S3 endpoint documentation, it was not clear to me whether any of the traffic from my instance to the bucket was traversing the Internet. Per my last conversation with AWS support, I think it does not, even though the IPs are public. This is what my coworker and I understand at this point.
In addition, without explicit policies, the S3 endpoint does not offer protection from some misconfiguration within AWS that might send your data to an alternate bucket, and neither does a CIDR rule that allows access to all of S3. If you simply set up an endpoint with the * resource, then your traffic could end up in someone else's bucket in someone else's account. Cross-account bucket access was not visible in the S3 console last time I checked, so these buckets won't be obvious to you when they exist. I should test this again... Better would be to create a policy that only allows access to buckets in your own account.
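As a sketch of what a tighter endpoint policy could look like, the Python below builds a policy document that lists specific buckets instead of the * resource. Listing buckets by name is just one way to approximate "only my own buckets". The helper name endpoint_policy, the Sid, and the bucket names are hypothetical, and the broad s3:* action is a placeholder you would likely narrow.

```python
import json


def endpoint_policy(bucket_names):
    """Build an endpoint policy document allowing only the named buckets."""
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # the bucket itself
        resources.append(f"arn:aws:s3:::{name}/*")  # the objects inside it
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OwnBucketsOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:*",  # placeholder; narrow to the actions you need
                "Resource": resources,
            }
        ],
    }


# Hypothetical bucket names, for illustration only.
policy = endpoint_policy(["example-app-logs", "example-app-data"])
print(json.dumps(policy, indent=2))
```

Any bucket not named in Resource, including buckets owned by other accounts, is then unreachable through the endpoint.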
Yes, I am aware how painful it is to get the bucket policies and roles for a bucket working. Throw an S3 endpoint on top of that and it's going to get even trickier. But I've done it. You can check out the code in this repo if it helps.
But this brings us back to our problem of the AWS Yum repos hosted in S3. If we set explicit policies on our S3 endpoints to only allow access to our own buckets, we potentially can't get to the AWS buckets used for Yum repos and for some other AWS service repos. If those calls go through our S3 endpoints and we don't have the names of the Amazon-owned buckets to add to our endpoint policies, they will fail. And if those calls don't go over an S3 endpoint, we potentially have software updates going over port 80, unencrypted, on the Internet.
Do we even need to set up an S3 endpoint for the AWS repos and services? Or is that magically handled by AWS and all maintained within their network? Still looking for clarification on that last point.