Applying GENEVE encapsulation (VPC<->VPC NAT at AWS)
This post is about the coolest thing I’ve learned so far from working at AWS: a literal application of Geneve encapsulation to do VPC-to-VPC NAT, which is how you get packets from one VPC to another (effectively sending packets from one datacenter to possibly another datacenter, or to an entirely virtualized network).
Precursor
Before I joined AWS on a team in the VPC org, my understanding of networking was understandably limited. I mean, you can’t just “read books on networking” to actually “get” networking or understand its applications in the wild.
But here’s the thing: it’s really easy to write off knowing these things as unimportant.
Just like knowing the low-level theory of how to write a compiler, these things don’t seem directly applicable to anything, but they give you very valuable insights into almost any job that involves networking or working in the cloud.
A lot of the core products that get built at AWS (especially in the VPC org) are built on knowledge in this area to make it more convenient for users to work in the wild. It is really important to have a grasp of this stuff.
Working here has been a major uplevel for me, exposing me to real-life networking concepts and their applications. I am regularly surprised at how useful the work we do is in a real-world setting, and how little of it is pure theory (it’s mostly applications of things you might learn in school).
Understanding the utility of the TCP/IP networking stack
When you hear “networking stack” you need to understand it in terms of layers of abstraction that build on top of each other. At the lowest level you have physical data transmission; the next level up has Ethernet frames (data link); the next contains IP packets (network layer); and the final level is the transport layer (primarily TCP segments and UDP datagrams), which is what your application’s sockets talk to. Colloquially we just refer to these as layers 1-4.
The really important thing to understand is that on its own this information isn’t that useful, especially because most people only ever touch layer 4 through their application’s sockets, so the other layers seem less meaningful. Writing them off would be a mistake.
One way to understand the usefulness of the networking stack is in terms of what constraints you have: the further down the stack you go, the fewer constraints you have. TCP at the transport layer (layer 4) enforces a socket per connection and has strict ordering rules, but when you drop down to layer 3 (IP, which is the layer a TUN device works at) you can still inspect TCP packets without being bound by TCP’s rules.
In practice at AWS we use TUN devices all the time. With a TUN device you can inspect packets, read the IP header to know where each one is going (it contains a source and destination address for IPv4/IPv6), then read the TCP or UDP header and rewrite the packet to send it wherever you want. If you go down even further to TAP devices on layer 2, you can inspect whole Ethernet frames (including everything encapsulated inside them).
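To make the TUN idea concrete, here is a minimal sketch of what “read the IP header, then look at TCP/UDP” can look like once you have the raw bytes of one packet. It assumes the packet is IPv4 and was already read from a TUN device (opening the device is left out), and the function name parse_ipv4 is made up for illustration; it is not how any AWS component actually does this.

```python
import ipaddress
import struct

def parse_ipv4(packet: bytes) -> dict:
    """Pull the routing-relevant fields out of a raw IPv4 packet (hypothetical helper)."""
    # Fixed 20-byte IPv4 header: version/IHL, TOS, total length, ID, flags/fragment,
    # TTL, protocol, header checksum, source address, destination address.
    (version_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (version_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    info = {
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
        "protocol": proto,
        "ttl": ttl,
        "total_length": total_len,
    }
    # TCP (6) and UDP (17) both start with source port, then destination port.
    if proto in (6, 17):
        info["src_port"], info["dst_port"] = struct.unpack(
            "!HH", packet[header_len:header_len + 4])
    return info

if __name__ == "__main__":
    # A hand-built UDP packet from 10.0.0.1:5000 to 10.0.1.2:53 (checksums left as 0).
    sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 17, 0,
                         bytes([10, 0, 0, 1]), bytes([10, 0, 1, 2]))
    sample += struct.pack("!HHHH", 5000, 53, 8, 0)
    print(parse_ipv4(sample))
```

Rewriting a packet is the same idea in reverse: change the address or port bytes and recompute the checksums before writing the packet back out.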
Within any networking context at AWS, like an EC2 instance, that is literally what is going on. When you send a packet to a gateway, the gateway isn’t limited to looking at the layer 4 payload: it sees the entire packet and can rewrite it according to whatever rules it is following, which is why gateways are so useful.
Geneve encapsulation
RFC 8926 describes Geneve encapsulation. Don’t make the mistake of waving this off as too difficult to parse because I can assure you a high level understanding is good enough to be able to make sense of it.
The most relevant parts are sections 3.1 to 3.3. The main thing you need to know to understand GENEVE is the following: it rides on top of UDP (layer 4) as a way of encapsulating layer 2 through layer 4 information for network virtualization. So you can send these GENEVE-encapsulated packets using an ordinary user space UDP socket, BUT they can communicate a lot more information because each one has both an outer and an inner packet. The inner packet can be an entire Ethernet frame. Any time you need to do network virtualization there’s a good chance you’ll be using GENEVE encapsulation, as it was designed as the standardized successor to VXLAN.
The included figure of GENEVE on IPv4 shows that your payload can be anything you want it to be, as long as it is what the receiver expects. If you’re encapsulating a regular TCP packet, you would just write the IPv4 header and TCP segment here.
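As a rough sketch of what that looks like in bytes, the snippet below packs and unpacks the fixed 8-byte Geneve header described in RFC 8926 (version, option length, flags, protocol type, 24-bit VNI, reserved byte) around an arbitrary inner payload. The function names geneve_encap and geneve_decap are made up for illustration, and which VNI, protocol type, and options a real receiver expects is an assumption that depends entirely on the network you are talking to.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # inner payload is a bare IPv4 packet
ETHERTYPE_TEB = 0x6558   # inner payload is a full Ethernet frame

def geneve_encap(vni: int, inner: bytes,
                 protocol: int = ETHERTYPE_IPV4, options: bytes = b"") -> bytes:
    """Prefix an inner packet with a Geneve header (hypothetical helper)."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    if len(options) % 4 != 0 or len(options) // 4 > 63:
        raise ValueError("options must be a multiple of 4 bytes, at most 252")
    ver_optlen = (0 << 6) | (len(options) // 4)  # version 0, Opt Len in 4-byte words
    flags = 0                                    # O and C bits clear, reserved bits zero
    # The last 32 bits hold the 24-bit VNI followed by one reserved byte.
    header = struct.pack("!BBHI", ver_optlen, flags, protocol, vni << 8)
    return header + options + inner

def geneve_decap(packet: bytes) -> dict:
    """Split a Geneve packet back into header fields and the inner payload."""
    ver_optlen, _flags, protocol, vni_rsvd = struct.unpack("!BBHI", packet[:8])
    opt_bytes = (ver_optlen & 0x3F) * 4
    return {
        "version": ver_optlen >> 6,
        "protocol": protocol,
        "vni": vni_rsvd >> 8,
        "options": packet[8:8 + opt_bytes],
        "inner": packet[8 + opt_bytes:],
    }

if __name__ == "__main__":
    inner_ip_packet = b"\x45" + bytes(19)  # placeholder for a real IPv4 header + TCP segment
    wire = geneve_encap(vni=0x123456, inner=inner_ip_packet)
    print(wire[:8].hex(), geneve_decap(wire)["vni"])
```

The result of geneve_encap is what goes into the body of the outer UDP datagram; the outer IP and UDP headers are added by whatever socket you send it through.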
Importantly, the details of how these packets are used are up to the network you’re connecting to.
Since GENEVE runs over UDP, it naturally matches the connectionless semantics of Ethernet and IP, which is why it lends itself so well to piping in data directly from a TUN device (as opposed to tunneling over something like TCP).
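Because the outer transport is plain UDP, sending an encapsulated packet needs nothing more exotic than a datagram socket. The sketch below is an assumption-heavy illustration: send_geneve is a made-up name, the inner packet is a placeholder standing in for bytes read from a TUN device, and the destination address in the usage example is a documentation-only IP. The one fixed detail is UDP port 6081, which is the IANA-assigned port for Geneve.

```python
import socket
import struct

GENEVE_UDP_PORT = 6081   # IANA-assigned destination port for Geneve
ETHERTYPE_IPV4 = 0x0800  # tells the receiver the inner payload is an IPv4 packet

def send_geneve(sock: socket.socket, dest: tuple, vni: int, inner_ip_packet: bytes) -> int:
    """Wrap one inner IPv4 packet in a minimal Geneve header and send it (hypothetical helper)."""
    # Version 0, no options, no flags, then protocol type, 24-bit VNI, reserved byte.
    header = struct.pack("!BBHI", 0, 0, ETHERTYPE_IPV4, vni << 8)
    # One datagram per inner packet: no connection setup and no ordering guarantees,
    # which mirrors how IP packets arrive from a TUN device.
    return sock.sendto(header + inner_ip_packet, dest)

if __name__ == "__main__":
    inner = b"\x45" + bytes(19)  # placeholder for a packet read from a TUN device
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # 192.0.2.10 is a documentation address; a real tunnel endpoint goes here.
        send_geneve(sock, ("192.0.2.10", GENEVE_UDP_PORT), vni=42, inner_ip_packet=inner)
```

Each inner packet maps to exactly one outer datagram, so there is no stream to frame or reorder, which is the practical upside of not tunneling over TCP.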
GENEVE is supported for IPv4 and IPv6.
Why is this useful?
This has real world applications! At AWS you can’t actually just “send data to a customer’s VPC” (and there are definitely use cases for doing this, e.g. Transit Gateway, PrivateLink, or connecting VPCs to one another), so you have to follow the rules of GENEVE in order to do so. If you want to build something like a gateway that sends packets from customer A’s VPC to customer B’s VPC, this is how you would do it.
In practice it’s more like another team has set up the receiving end (which is functionally a NIC or switch), this NIC exists 1:1 with each AZ in your VPC, and all you have to do is follow this encapsulation format to make use of it and #innovate on top of the AWS substrate.
Conclusion
This post was brief, but I felt it was necessary to capture because I haven’t found any other articles about this online.
Thanks for reading!