Archive for the ‘In The Field’ Category

IPv6, IPv4, and ARP on Xen for VPS

Thursday, October 9th, 2008

A little background is in order. Recently, my team released the DynDNS Spring Server℠ VPS platform, based on Xen, with dual-stack IPv4 and IPv6 connectivity out of the box as one of its main features. For networking the virtual servers, we’ve chosen a bridging approach, which makes it pretty easy to get IPv4 and IPv6 working; as long as all OSes involved support IPv6, it kind of “just works.”

What’s not so easy is trying to secure and lock down the individual virtual servers so they cannot interfere with one another’s network connection. The efforts spent on IPv6 also exposed some issues for IPv4 and ARP that could arise when used in a VPS environment.

Note: We’re going to get down and dirty into the technical nitty gritty, and it’s beyond the scope of this post to bring all of my readers up to speed. The faint of heart may wish to turn back now.

Problem Cases for IPv6

Two main network spoofing cases come to mind that need to be protected against for running IPv6 on Xen in a virtual private server environment:

1) A user changing the IPv6 address of their network interface to an IPv6 address we haven’t authorized them to use, or adding/aliasing additional IPv6 addresses to their network interface that they did not pay for.

2) A user changing/adding/aliasing an IPv6 address that’s in use by another customer, which can wreak all sorts of routing havoc and interfere with our customers’ ability to reliably communicate with others on the Internet.

Perhaps Luke Crawford put it best with his response to my question, “Have people already solved and dealt with IPv6 in Xen successfully (i.e., is it a non-issue at this point)? If not, I’d be happy to submit the changes and a guide to making it work and work well.” on the Xen developers email list:

Guides would be awesome. Also, if you made anti-spoof work for IPv6, that would also be awesome. I’ve had IPv6 going for a while with no problems (though the xen antispoof rules only work with IPv4), but I didn’t even need to enable IPv6 in the Dom0; it’s a layer 2 bridge, so it just works. In fact, stateless autoconfiguration even worked. I asked for an IPv6 allocation from my provider and my customers noticed it was working before I got around to doing any setup at all.

Problem Cases for IPv4 and ARP

When we initially went down the IPv6 road, we started paying closer attention to what was already implemented for IPv4 protection to get an idea of where to begin. In doing so, we found a few scenarios where the existing anti-spoof protection leaves Xen a bit too exposed for use as a VPS platform:

1) If a user aliases an IPv4 gateway IP address for the same subnet that user is on, all network connectivity can go down for that subnet. As an example, if your IP address is 1.2.3.5/24 and your gateway is at 1.2.3.1, and you run “ifconfig 1.2.3.1” to alias the gateway IP address on your default network interface, you will now respond to ARP requests trying to find the gateway with your MAC address… meaning no other users on that subnet can reliably talk to the gateway… meaning their network connectivity is pretty much useless.

2) If a user aliases an IPv4 address that is already in use by another user, all network connectivity can go down for that other user, for the same reasons it goes down when aliasing the gateway IP address (i.e., you’ll be responding to ARP requests for their IP address).

3) Users can deliberately perform ARP poisoning attacks, by sending arbitrary ARP replies on the subnet, which can interfere with authorized communication for the same reasons as 1 and 2 above.

4) Users can see the network traffic for other domUs on the same dom0 by putting their interfaces into promiscuous mode (i.e., packet sniffing). 

My intent here is not to knock the Xen development community; they’ve put together a fantastic platform, and no one expects our business requirements for using Xen to host virtual private servers to be in line with their development requirements. I’m not even certain that we’ve squashed the issues entirely, but we have at least squashed the pressing issues for IPv6, IPv4, and ARP outlined above.

Now that we have it working in the field, I’d like to share the handful of changes with the community so they can benefit, and perhaps suggest better/more complete/more succinct ways of doing it. The solution we implemented was modeled after the current IPv4 anti-spoof protection.

I will start with an overview of the current IPv4 anti-spoofing approach used by Xen, then cover the specific commands we used to alter the behavior for reliable and secure communication, and finish with the actual changes we made to the Xen networking scripts to enable this behavior.

Current IPv4 Anti-spoofing Protection for Xen

The current IPv4 anti-spoof protection scheme relies on iptables, in particular the FORWARD chain that each packet destined for a virtual server hits on its way in or out. Rules get added each time a virtual network interface comes up, and removed when the interface goes down.

For the sake of the remaining discussion, where I’ll be walking through some command examples, we’ll assume:

  • peth0 is the physical network interface that the bridge connects to
  • vif1.0 will be a virtual network interface connected to a domU, and the domU sees that device as eth0
  • The domU is “allowed” to use IPv4 address 216.146.46.43/24, with a gateway IP of 216.146.46.1 for IPv4. 
  • The domU is “allowed” to use IPv6 address 2607:f590:0:ffff::60/64, with a gateway IP of 2607:f590:0:ffff::1 for IPv6.
  • The MAC address of the virtual network interface vif1.0 is 00:16:3E:38:B4:AC.
  • The domU hostname is www.standingonthebrink.com.

Yes, you guessed it… all of the settings above are the real-world settings for the Xen VPS that is hosting this blog (with the exception of the virtual network interface name of vif1.0, which will change over time as the VPS is powered off and on).

In a nutshell, when you enable anti-spoofing as a parameter in the network-bridge script, the following iptables commands are issued when the bridge comes online:

  # Default policy for packets in the FORWARD chain is DROP.
  iptables -P FORWARD DROP

  # Flush all existing rules in the FORWARD chain.
  iptables -F FORWARD  

  # Accept packets that are entering the bridge from the physical
  #   network interface peth0.
  iptables -A FORWARD -m physdev --physdev-in peth0 -j ACCEPT

Now, when each domU virtual network interface comes online, the following iptables command is run for each IP address that the domU is allowed to use, as specified in the www.standingonthebrink.com.cfg file found in /etc/xen/:

  # Accept packets coming into the bridge from the domU if its
  #   source IP address is authorized.
  iptables -A FORWARD -m physdev --physdev-in vif1.0 \
    --source 216.146.46.43 -j ACCEPT

When the domU’s virtual network interface goes offline, the same command is run except the -A is changed to -D to delete the rule.
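
As a minimal sketch of that add/delete symmetry, the helper below builds the rule for one interface/address pair and switches between -A and -D. The function name antispoof_rule and its online/offline argument are hypothetical, not taken from the Xen scripts, and the helper only prints the command rather than running it:

```shell
# Hypothetical helper (not part of the Xen scripts): prints, but does not
# run, the anti-spoof rule for one vif/IP pair.
# Usage: antispoof_rule VIF IP online|offline
antispoof_rule() {
    vif="$1"; ip="$2"; state="$3"
    case "$state" in
        online)  action="-A" ;;   # interface coming up: append the rule
        offline) action="-D" ;;   # interface going down: delete the rule
        *) echo "usage: antispoof_rule VIF IP online|offline" >&2; return 1 ;;
    esac
    echo "iptables $action FORWARD -m physdev --physdev-in $vif --source $ip -j ACCEPT"
}

antispoof_rule vif1.0 216.146.46.43 online
antispoof_rule vif1.0 216.146.46.43 offline
```

Printing the command first makes it easy to check that the delete form matches the add form exactly, which iptables requires for -D to find the rule.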

Into the Bridge? Out of the Bridge? What’s the Difference?

It’s important to point out the difference between “into” and “out of” the bridge. For the purpose of thinking about rules governing the packets that are allowed to traverse the bridge, it can be helpful to think of the bridge as a little switch connecting the physical network interface of the server to the virtual network interface of the dom0 as well as all of the virtual network interfaces for each of the domUs. The rules we write in the FORWARD chain will tell the bridge what packets to allow to enter and/or leave the bridge.

Network Bridge Diagram

All of the rules we’ll write are from the perspective of the network bridge. Perhaps a couple examples will help convey the concept:

  • A packet coming from the Internet destined for the first domU -> Comes IN peth0 and goes OUT vif1.0.
  • A packet coming from the first domU destined for the second domU -> Comes IN vif1.0 and goes OUT vif2.0.
  • A packet sent from the first domU to somewhere on the Internet -> Comes IN vif1.0 and goes OUT peth0.

With this knowledge, let’s revisit the last rule we looked at that is currently being used to prevent IPv4 address spoofing:

  # Accept packets coming into the bridge from the domU if its
  #  source IP address is authorized.
  iptables -A FORWARD -m physdev --physdev-in vif1.0 \
    --source 216.146.46.43 -j ACCEPT

This rule says a packet coming from the first domU (indicated by the “--physdev-in vif1.0”) with a source IPv4 address of 216.146.46.43 and a destination of anywhere will be allowed into the bridge. If the domU tries to use a source IP address of anything else, the packet won’t be allowed into the bridge, and thus the user won’t gain the use of an unauthorized IP address.

Proposed Solution for ARP Poisoning Protection for Xen

Quite simply, we use arptables in a similar manner to using iptables. It can be a little tricky getting the rules right for what valid ARP traffic should be allowed, but with quite a bit of trial-and-error, we think we have it.

When the bridge comes online, we do:

  # Default policy for packets in the FORWARD chain is DROP.
  arptables -P FORWARD DROP

  # Flush all existing rules in the FORWARD chain.
  arptables -F FORWARD

When each virtual network interface comes online, we do:

  # Accept ARP requests coming from the domU into the bridge.
  arptables -A FORWARD --opcode Request --in-interface vif1.0 -j ACCEPT

  # Accept ARP requests coming out of the bridge into the domU.
  arptables -A FORWARD --opcode Request --out-interface vif1.0 -j ACCEPT

  # Accept ARP replies coming out of the bridge from the physical
  #  network into the domU.
  arptables -A FORWARD --opcode Reply --out-interface vif1.0 \
    --in-interface peth0 -j ACCEPT

In addition to the above rules that get added when the virtual network interface comes online, we do the following for each valid IPv4 address:

  # Accept ARP replies coming from the domU into the bridge if they
  #  provide a valid and authorized IP address to MAC address pair.
  arptables -A FORWARD --opcode Reply --in-interface vif1.0 \
    --source-ip 216.146.46.43 --source-mac 00:16:3E:38:B4:AC -j ACCEPT

It’s this last rule that prevents the ARP poisoning attacks and the havoc wreaking by aliasing gateway IP addresses. If invalid ARP replies are denied, as far as I can tell, no harm can be done. When the virtual network interface goes offline, the same commands are run with -D instead of -A to delete the rules.
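
To make the per-interface versus per-address split concrete, here is a rough sketch that emits the three fixed rules once and the ARP reply rule once per authorized address. The function name arp_rules and its argument order are assumptions for illustration; the rule text itself is copied from the commands above, and nothing is executed:

```shell
# Hypothetical helper (illustration only): prints the arptables rules for
# one virtual interface. Nothing is executed.
# Usage: arp_rules ACTION VIF MAC IP [IP...]   (ACTION is -A or -D)
arp_rules() {
    action="$1"; vif="$2"; mac="$3"
    shift 3
    # Three rules that exist once per interface.
    echo "arptables $action FORWARD --opcode Request --in-interface $vif -j ACCEPT"
    echo "arptables $action FORWARD --opcode Request --out-interface $vif -j ACCEPT"
    echo "arptables $action FORWARD --opcode Reply --out-interface $vif --in-interface peth0 -j ACCEPT"
    # One reply rule per authorized IPv4 address, pinned to the vif's MAC.
    for ip in "$@"; do
        echo "arptables $action FORWARD --opcode Reply --in-interface $vif --source-ip $ip --source-mac $mac -j ACCEPT"
    done
}

arp_rules -A vif1.0 00:16:3E:38:B4:AC 216.146.46.43
```

Calling it with -D instead of -A prints the matching teardown commands for when the interface goes offline.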

Proposed Solution for Preventing IPv4 Packet Sniffing

As mentioned earlier, the current IPv4 rules that provide anti-spoofing are:

  # Accept packets that are entering the bridge from the
  #  physical network interface peth0.    
  iptables -A FORWARD -m physdev --physdev-in peth0 -j ACCEPT

  # Accept packets coming into the bridge from the domU if its
  #  source IP address is authorized.
  iptables -A FORWARD -m physdev --physdev-in vif1.0 \
    --source 216.146.46.43 -j ACCEPT

These work great for preventing a user from using an unauthorized IPv4 address, but the rule for incoming traffic from peth0 is not sufficiently explicit, in my opinion. It allows users to see the network traffic of other users on the same dom0. By tightening up the rules a bit, we can prevent this.

The proposed rules are as follows, for the bridge coming online:

  # Default policy for packets in the FORWARD chain is DROP.
  iptables -P FORWARD DROP

  # Flush all existing rules in the FORWARD chain.
  iptables -F FORWARD

For each valid IPv4 address as each virtual network interface comes online:

  # Accept packets leaving the bridge going to the domU only if
  #  the destination IP for that packet matches an authorized IPv4
  #  address for that domU.
  iptables -A FORWARD -m physdev --physdev-out vif1.0 \
    --destination 216.146.46.43 -j ACCEPT

  # Accept packets coming into the bridge from the domU and leaving
  #  via the physical network interface peth0 only if the source IP
  #  for that packet matches an authorized IPv4 address for that domU.
  iptables -A FORWARD -m physdev --physdev-in vif1.0 \
    --physdev-out peth0 --source 216.146.46.43 -j ACCEPT
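
To see what these two rules admit, here is a toy model of them as a shell function. It is only a sketch of the matching logic for vif1.0 and 216.146.46.43 under the DROP default policy, not how netfilter evaluates packets internally. The function name verdict is hypothetical, and the addresses 198.51.100.7 (an outside host) and 216.146.46.44 (some other customer) are made-up examples:

```shell
# Toy model of the two tightened rules for vif1.0 / 216.146.46.43.
# Usage: verdict IN_IF OUT_IF SRC DST   -> prints ACCEPT or DROP
verdict() {
    in_if="$1"; out_if="$2"; src="$3"; dst="$4"
    vif="vif1.0"; ip="216.146.46.43"
    # Rule 1: leaving the bridge toward the domU, destination must match.
    if [ "$out_if" = "$vif" ] && [ "$dst" = "$ip" ]; then
        echo ACCEPT; return
    fi
    # Rule 2: from the domU out through peth0, source must match.
    if [ "$in_if" = "$vif" ] && [ "$out_if" = "peth0" ] && [ "$src" = "$ip" ]; then
        echo ACCEPT; return
    fi
    # Nothing matched: the FORWARD policy applies.
    echo DROP
}

verdict peth0  vif1.0 198.51.100.7  216.146.46.43   # inbound to the domU
verdict peth0  vif1.0 198.51.100.7  216.146.46.44   # someone else's traffic
verdict vif1.0 peth0  216.146.46.43 198.51.100.7    # outbound, valid source
verdict vif1.0 peth0  216.146.46.44 198.51.100.7    # outbound, spoofed source
```

The second call is the packet sniffing case: a packet addressed to another customer is never allowed out of the bridge toward vif1.0, so the domU cannot see it even in promiscuous mode.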

Proposed Solution for IPv6 Anti-spoofing Protection for Xen

Quite simply, we use ip6tables with the exact same rule structure as shown above for IPv4. For the bridge coming online:

  # Default policy for packets in the FORWARD chain is DROP.
  ip6tables -P FORWARD DROP

  # Flush all existing rules in the FORWARD chain.
  ip6tables -F FORWARD

For each valid IPv6 address as each virtual network interface comes online:

  # Accept packets leaving the bridge going to the domU only if
  #  the destination IP for that packet matches an authorized IPv6
  #  address for that domU.
  ip6tables -A FORWARD -m physdev --physdev-out vif1.0 \
    --destination 2607:f590:0:ffff::60 -j ACCEPT

  # Accept packets coming into the bridge from the domU and leaving
  #  via the physical network interface peth0 only if the source IP
  #  for that packet matches an authorized IPv6 address for that domU.
  ip6tables -A FORWARD -m physdev --physdev-in vif1.0 \
    --physdev-out peth0 --source 2607:f590:0:ffff::60 -j ACCEPT

Simple, eh?
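
Because the IPv4 and IPv6 rules share one structure, a single helper could in principle emit both. The sketch below is an assumption about how that might look, not what our scripts do: it picks ip6tables when the address contains a colon and iptables otherwise. The function name forward_rules is hypothetical, and the commands are only printed:

```shell
# Hypothetical helper (illustration only): prints the two per-address
# FORWARD rules, choosing the tool from the address family.
# Usage: forward_rules ACTION VIF ADDRESS   (ACTION is -A or -D)
forward_rules() {
    action="$1"; vif="$2"; addr="$3"
    case "$addr" in
        *:*) tool="ip6tables" ;;   # a colon means an IPv6 address
        *)   tool="iptables"  ;;   # otherwise treat it as IPv4
    esac
    echo "$tool $action FORWARD -m physdev --physdev-out $vif --destination $addr -j ACCEPT"
    echo "$tool $action FORWARD -m physdev --physdev-in $vif --physdev-out peth0 --source $addr -j ACCEPT"
}

forward_rules -A vif1.0 216.146.46.43
forward_rules -A vif1.0 2607:f590:0:ffff::60
```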

Modifications to Xen Network Scripts

So, the above outlined the original IPv4 anti-spoofing approach, and used that as a guideline to IPv6 anti-spoofing, as well as protecting against ARP poisoning attacks. Now, here are the changes to make to actually get this new behavior.

It’s worth noting that these changes were made so as to minimize the impact on the existing networking scripts (i.e., no refactoring was done). If these changes never make it out to the community, we will still need them in place to effectively use Xen for VPS hosting, and we would have to merge community changes alongside ours with each new community release. Thus, there’s a lot of duplicated code. Should these changes make it into Xen, I would recommend refactoring to remove this duplication!

It’s worth noting also that a method was needed for storing the IPv6 address for a domU. This was done by adding the address to the www.standingonthebrink.com.cfg file under /etc/xen/ in the following format:

  #ipv6=2607:F590:0000:FFFF:0000:0000:0000:0060

The modifications to the Xen networking scripts include the ability to read this into the xenstore for usage.
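
For illustration only, pulling that address out of the config file can be as simple as the sketch below. The function name read_ipv6 is hypothetical, and this is not the code from our script changes; it covers only the extraction step and does not write to the xenstore:

```shell
# Hypothetical helper (illustration only): prints the address from the
# first "#ipv6=" line of a domU config file, or nothing if there is none.
# Usage: read_ipv6 /etc/xen/www.standingonthebrink.com.cfg
read_ipv6() {
    sed -n 's/^#ipv6=\(.*\)$/\1/p' "$1" | head -n 1
}
```

Keeping the setting on a line that starts with # means the Xen config parser sees it as a comment, so the file stays valid for tools that know nothing about the IPv6 setting.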

Without further ado, the network script diffs, which were performed against xen-unstable on October 8, 2008:

I would also like to acknowledge the DynDNS team for helping to test and put together these modifications, and in particular Pierre Beaumier and Matthew Horsfall from Dynamic Network Services Inc., who discovered the vulnerabilities and greatly helped in putting a solution together.

A Week with the ‘Geeks in Paradise’

Monday, December 3rd, 2007

Tropical shores. A red convertible. Buddhist temples. And the WebReboot Enterprise.

Why do all of these things belong together? Because Chip and I went down to the University of Hawaii to participate in an InfoWorld Enterprise Shootout - “Pimp My Datacenter” - where we spent a week working with the sys admins building their state-of-the-art new computing research facility.

We actually came across Brian Chee (U of H, InfoWorld), Oliver Rist (PC Mag, InfoWorld), and the other fantastic folks running the event at the last Interop in Las Vegas. Ken of Silverback fame introduced us, so many thanks to Ken.

Ken & Sea Turtle

Our trip consisted of:

  • 10% scuba diving
  • 20% sightseeing
  • 30% Oliver pinning the brute force of his wit upon me when he discovered my age in the first 10 seconds of conversation
  • 3% Oliver hassling everyone else
  • 7% attention reclaimed, Oliver hassling me
  • 30% oh crap, we have a lot of work to do!

In terms of scope, it was pretty simple… on paper. Move a whole bunch of IT equipment from around the campus into the brand new DC room. Many vendors were there donating either time, materials, or both. All showing off the latest and greatest in their IT management arsenals. All unprepared for the challenges ahead.

From severely strict weight-limits-per-square-foot (we’re talking “only a couple people in at a time, please” strict), to cooling systems that love to corrode in the Pacific sea breeze, to missing parts… you name it. Brian went through a heck of a time getting this event going, and like any good geek would do, he’s documented all his trouble, pain, and solutions so you don’t suffer through it yourself.

But hey, there are few things worth doing that are easy. And this was a week’s worth of work crammed into one overnight geek fest, so there was no chance of easy.

Our part? We set the lab up with a 48-port WebReboot Enterprise showcasing a distributed installation (i.e., one of our units connected to servers all over the place) and a 24-port WebReboot Enterprise in a centralized and confined installation (i.e., responsible for a single 24-server cluster in a single cabinet). Above and beyond that, we got a server up and running for Nagios to monitor and alert on all of their servers, added the WebReboot plugin to it for automatically recovering crashed equipment, and we got to be honorary-gorillas-for-a-day with the Silverback Migration’s crew hauling in and connecting all of the equipment.

UH Staff

University of Hawaii Staff in the Foreground, APC Folks in the Background

Getting a final plan together for cabling and arranging the new datacenter.

Battleplan

Brian Chee from the University of Hawaii and Ken Jamaca from Silverback Migration Solutions

Common Question: “What time is it?”
Common Answer: “You don’t want to know.”

We unpacked, racked up, cabled, migrated, and tested 12 cabinets and over 60 servers in about 36 hours. How’s that for service?

Cluster Cabinet

The 24-Server Cluster for the 24-port WebReboot Enterprise

This cluster consisted of 12 Dell PowerEdge SC1425 servers and 12 Dell PowerEdge 2650 servers. A perfect fit for the 24-port WebReboot Enterprise.

Installing WRE24

Chip Installing the WebReboot Enterprise in the Cluster Cabinet

The 1U form factor fit nicely at the top of the rack.

 

WRE24 Racked and Cabled

The Cluster Fully Cabled, with WebReboot Enterprise at Top

Ready for card installation. Each server gets an Advanced Server Card installed in the expansion slot, that in turn connects to the WebReboot Enterprise. This gives us the ability to reboot the server in the same fashion as pressing the power or reset switch on the front of the chassis. As a bonus, we also get temperature monitoring, power monitoring, and asset tracking.

WRE24 Installed and Rack Powered On

Cluster Powered Up

Servers Connected to WRE48

Beyond the Cluster - The Rest of the Datacenter

We deployed enough WebReboot ports to cover the entire facility. For the remaining servers, we centrally located a 48-port WebReboot Enterprise in the patch panel rack, and then used standard Cat6 cabling to each cabinet to carry our necessary signals. Above, a Sun Fire V210, Dell PowerEdge 1950, and Dell PowerEdge 2950 were connected to the WebReboot Enterprise (not shown, since it’s in the patch panel rack!).

Nagios and WRE Testing

Testing it All, and Nagios for Monitoring

Above you see the 48-port WebReboot Enterprise, one of our Power Cycle Modules (little orange box on top), our Servprise Demo Server (square, orange-faced box), a server to run Nagios (the 1U Dell PowerEdge SC1420 under laptop) with the Servprise Nagios plugin for automatically rebooting and power controlling crashed/overheated servers, and a laptop connected to the WebReboot Enterprise.

Wiring the Skittles Railroad

Ross Assembling the “Skittles Railroad”

The engineering folks at the university designed and built a custom cable carrier to run in a loop over the entire facility. Color-coded Cat6 was then run along it between the patch panel rack and each cabinet in the room. As far as I can recall, green cabling was for the production network, red for the KVM over IP network, black for the WebReboot Enterprise network, and I must admit I don’t recall what blue was for.

While all of our equipment is installed, we are holding off on fully configuring Nagios for production use until the datacenter folks have a chance to regroup and take it all in. I’m looking forward to automating as much as I can for them. Ross, let me know when you are ready!

For more photos, I have a full Flickr set dedicated to the trip.

There were definitely three highlights of the trip that I will always remember:

1) The sheer gratitude and hospitality of the U of H computing staff. Ross, Pat, and Sharon were great to work with, loved our products, and shared some good laughs, which made the long hours all worthwhile.

Temple

2) A tour of practically the entire island with Brian and his wife Cathy. What an incredible place Oahu is!

H3 In Convertible

3) Driving down H3 through the mountains in a bright red convertible with the top down. Few things equal it.


Copyright 2007-2008 Cory von Wallenstein. All rights reserved.