Small-Scale Question Sunday for January 5, 2025

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.

Much of this is really building on many decades worth of tech and it's hard to understand the why until you understand much of the whole stack.

Here's some of the whys, from my perspective in the order I would talk about them:

DHCP: when a device joins a network, it can broadcast on the network and ask how it should configure its network stack. Implicit in the request is the MAC (Media Access Control) address of the interface itself, which provides the physical address of the interface. The DHCP server (in a home setting, usually in the router) assigns an IP from a block it manages and gives the rest of the networking details (gateway, subnet, etc.) to the client. DHCP isn't strictly needed, as clients can be configured manually in many cases. Cheap IoT devices tend to rely on it.

Default Gateway: When you're sending any packet to something outside your local network, you send the packet to the gateway and it figures out how to get the packet to the destination. In a home setting, this will just be forwarding the packet upstream to your ISP. In a larger scale setting, it's going to consult things like BGP routing to figure out where to send things to. The beauty of IP is that the client doesn't need to worry about it and it's completely abstracted into the gateway itself.

Duplicate IPs: As mentioned before, every interface has a MAC address. When you're sending a packet on the network to another machine (i.e. not broadcast), you send the packet to the MAC address. But we're dealing with IP, not MACs. To translate from an IP address to a MAC address we send out a broadcast ARP (Address Resolution Protocol) request asking basically "will the device with IP xxx respond?" Broadcasts are received by all the machines on the network. The machine with the requested IP will respond. If there are multiple machines that are configured with the same IP, they'll all respond. What happens here is usually the first one wins, at least until the sender's ARP cache entry expires or gets overwritten by the other machine's reply. Switches don't sort this out for you: a plain layer 2 switch only learns which MACs are on which of its ports, so traffic for that IP goes to whichever MAC the sender currently has cached, and weird things may happen. Lesson: don't do it, things break.

VLANs: From a switch perspective, it just controls which ports can talk to which other ports. If you have a 24-port switch, you can configure multiple VLANs such that, say, ports 1-12 can talk to each other, and 13-24 can talk to each other. It's setting up two "Virtual LANs." You can have a router that attaches to both of the VLANs to handle routing between them if you want. These are typically used to prioritize certain network traffic, or for security (e.g. a guest network can't talk to your servers).
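To make the 24-port example concrete, here's a minimal sketch of the membership check. The VLAN IDs (10 and 20) and the `can_switch` name are made up for illustration; a real switch does this in hardware and also has to handle tagged trunk ports, which this ignores.

```python
# Hypothetical port-to-VLAN assignment for a 24-port switch:
# ports 1-12 in VLAN 10, ports 13-24 in VLAN 20 (IDs are arbitrary).
PORT_VLAN = {port: 10 for port in range(1, 13)}
PORT_VLAN.update({port: 20 for port in range(13, 25)})


def can_switch(src_port: int, dst_port: int) -> bool:
    """True if a frame may be switched directly between the two ports."""
    return PORT_VLAN[src_port] == PORT_VLAN[dst_port]
```

Traffic between port 12 and port 13 fails this check, so it would have to go through a router attached to both VLANs.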

UDP and NAT: Since there's no connection in UDP, the NAT device just remembers things like "when device XX using port YY sends a packet to internet address AA port BB, I sent the packet on my port PP. Later, if I get a packet from AA:BB on port PP, I'll look that up and forward the packet to XX:YY." The key here is that all UDP packets carry the source IP and port and the destination IP and port. When it's doing NATing, it replaces the local IP (which isn't going to be publicly routable) with its own address and port. On the way back, it just does the reverse: it replaces the destination IP/port (which is how the packet got to it in the first place) with the local device's address and port and forwards the packet.
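The bookkeeping described above can be sketched as two lookup tables. This is a toy model, not how any particular router implements it: the `Nat` class, the starting port 40000, and the lack of entry timeouts are all assumptions made to keep it short.

```python
import itertools


class Nat:
    """Toy UDP NAT: endpoints are (ip, port) tuples."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # hypothetical port allocator
        self.out = {}   # (XX, YY, AA, BB) -> PP
        self.back = {}  # (AA, BB, PP) -> (XX, YY)

    def outbound(self, src, dst):
        """Rewrite the source of a packet leaving the local network."""
        key = (*src, *dst)
        if key not in self.out:
            port = next(self._ports)
            self.out[key] = port
            self.back[(*dst, port)] = src
        return (self.public_ip, self.out[key]), dst

    def inbound(self, src, dst):
        """Rewrite the destination of a reply, or return None to drop it."""
        local = self.back.get((*src, dst[1]))
        if local is None:
            return None
        return src, local
```

A reply only gets through if it comes from the same AA:BB the local device originally sent to, on the same PP; anything else has no table entry and is dropped.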

Thanks, that was helpful!

DHCP: when a device joins a network, it can broadcast on the network and ask how it should configure its network stack. Implicit in the request is the MAC (Media Access Control) address of the interface itself, which provides the physical address of the interface. The DHCP server (in a home setting, usually in the router) assigns an IP from a block it manages and gives the rest of the networking details (gateway, subnet, etc.) to the client. DHCP isn't strictly needed, as clients can be configured manually in many cases. Cheap IoT devices tend to rely on it.

How does it broadcast its request if it doesn't have an IP address?

Default Gateway: When you're sending any packet to something outside your local network, you send the packet to the gateway and it figures out how to get the packet to the destination. In a home setting, this will just be forwarding the packet upstream to your ISP. In a larger scale setting, it's going to consult things like BGP routing to figure out where to send things to. The beauty of IP is that the client doesn't need to worry about it and it's completely abstracted into the gateway itself.

The local network is defined by the network mask, right? So with 255.255.255.0 if I send something from 192.168.1.2 to 192.168.1.3 there's no need for the gateway to be set up, but 192.168.2.3 is outside the network and the packets will be routed to the gateway?

This makes me wonder how the packets are routed within the local network, actually. Let's say I'm sending a request from my PC (192.168.1.5) to my NAS (192.168.1.2). The PC is connected to my wireless switch/AP (192.168.1.4), and both the switch/AP and the NAS are connected to the wired router (192.168.1.1). How does the switch/AP know it should send the request to the wired router and not to one of its other LAN ports?

How does it broadcast its request if it doesn't have an IP address?

It’s certainly only a model, but answering questions like this is why the OSI model is taught to students: this is the glory of the data link layer! (Or Network Access layer in the more accurate TCP/IP model)

It’s possible, though not really useful, to run a local network over purely MAC addressing, but few pieces of software actually can. But if you’ve ever used wake-on-LAN, digging deeper than IP is how it works!
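For the curious, the wake-on-LAN "magic packet" payload is simple enough to build by hand: six 0xFF bytes followed by the target MAC repeated sixteen times. A minimal sketch (the `magic_packet` name is mine, and actually sending it, commonly as a UDP broadcast, is left out):

```python
def magic_packet(mac: str) -> bytes:
    """Build a wake-on-LAN payload for a MAC like '00:11:22:33:44:55'."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Synchronization stream, then the target MAC sixteen times over.
    return b"\xff" * 6 + mac_bytes * 16
```

Note that nothing in the payload is an IP address; the sleeping network card only looks for its own MAC in that pattern.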

Every device is intended to have a factory-unique MAC address, though virtual machines, software overrides, and newfangled privacy features just go with the randomize-and-pray model. Since there’s a unique MAC for each device, a host connected to a local network can perform a MAC broadcast without any IP bootstrapping, and hopefully find a DHCP server to hand it IP configuration.
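One way to tell a factory address from an override or a randomized one: the two lowest bits of the first octet are flags. A small sketch, assuming colon-separated input (the `mac_flags` helper is hypothetical):

```python
def mac_flags(mac: str) -> dict:
    """Decode the two flag bits in the first octet of a MAC address."""
    first_octet = int(mac.split(":")[0], 16)
    return {
        # Bit 0: group address (multicast or broadcast) rather than one host.
        "multicast": bool(first_octet & 0x01),
        # Bit 1: locally administered, i.e. not a vendor-assigned address.
        "locally_administered": bool(first_octet & 0x02),
    }
```

Randomized and software-assigned addresses are supposed to set the locally administered bit, and the all-ones broadcast address has both bits set.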

I really love MAC addressing and layer 2 stuff, precisely because this stuff works so transparently in most cases and so you don’t have to think about it. It’s very elegant in that way, and I like elegance and autoconfiguration; it’s the computer’s job to worry about the numbers.

On a tangent: admiration for this elegance was the driving force behind IPv6, and I’d argue the only way to understand IPv6 is to see it’s a design intended to bring the fluidity and elegance of local networks to the internet. This runs into a lot of real-world roadblocks and administrative preferences towards centralized control — yet decentralized but coordinated systems are the great triumph of software engineering and I find it beautiful even if there are real-world obstacles.

How does it broadcast its request if it doesn't have an IP address?

DHCP requests are transmitted over UDP with a destination of the broadcast address, usually 255.255.255.255. The standard says that this packet should have a source address of 0.0.0.0, but in my experience most DHCP servers aren't very picky about that. This packet is just a message going across a wire to every receiver on the local network (i.e., up until the gateway), so the ethernet card doesn't need to have an IP address at that time. EDIT: for clarity, it uses the MAC address to identify itself, so the server can properly respond to just the correct machine. This is one of many reasons that getting DHCP to run across network boundaries is an absolute nightmare. /EDIT
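To show how little the client needs to know, here's a rough sketch of the bytes in a bare-bones DHCPDISCOVER, following the fixed BOOTP layout from RFC 2131 as I understand it. The function name and the transaction ID are placeholders, and real clients add more options (requested parameters, hostname, etc.). On the wire this would go from 0.0.0.0 port 68 to 255.255.255.255 port 67.

```python
import struct


def dhcp_discover(mac: bytes, xid: int) -> bytes:
    """Build a minimal DHCPDISCOVER payload for a 6-byte MAC."""
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,        # op: BOOTREQUEST
        1,        # htype: Ethernet
        6,        # hlen: MAC length in bytes
        0,        # hops
        xid,      # transaction ID chosen by the client
        0,        # secs
        0x8000,   # flags: ask the server to broadcast its reply
        b"\0" * 4, b"\0" * 4, b"\0" * 4, b"\0" * 4,  # ciaddr/yiaddr/siaddr/giaddr: all unknown
        mac,      # chaddr: struct pads this to 16 bytes with zeros
        b"", b"",  # sname and file: unused, zero-filled
    )
    magic_cookie = b"\x63\x82\x53\x63"
    options = bytes([53, 1, 1]) + bytes([255])  # message type = DISCOVER, then end
    return header + magic_cookie + options
```

Every IP field in there is zero; the only thing identifying the client is the MAC in `chaddr`.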

The local network is defined by the network mask, right?

For the purposes of TCP/IP, the local network is defined by the netmask. Physical networks (e.g., having multiple routers with different subnets plugged into the same big switch) and logical networks (VLANs) can be, and often are, different. This is a space with a lot of namespace collision, so be wary of it.

So with 255.255.255.0 if I send something from 192.168.1.2 to 192.168.1.3 there's no need for the gateway to be set up...

At the risk of going too deep into the (lies-to-children!) OSI model:

Before doing anything else, the sending computer looks at its ARP table, which converts IP addresses to MAC addresses. If the destination IP address is not on the ARP table, it will send an ARP request, which is a broadcast message to the local network asking if any devices have that IP address (or, if not on the local network, it sends an ARP request for the local gateway). Once it finds the address, it inserts that IP-MAC pair into the ARP table, and uses it as part of the packet and frame shaping.

The computer forms a packet, with a source IP address of 192.168.1.2 and a destination of 192.168.1.3, at the TCP/IP network layer, or layer three. The IP stack keeps each packet within a maximum size called the MTU (historically 1500 bytes, but can be larger where hardware supports it), splitting the data across several packets if needed, and each packet is wrapped in a "frame", aka the ethernet/MAC data link layer or layer two. The ethernet card then transmits these frames as signals to the network switch, aka the ethernet physical layer or layer one.

This switch will receive the signals, and convert them into the layer two frame. On older hubs, it would simply echo the frame out every port. On modern switches, it then inspects the frame for a destination MAC address. If the switch has records of receiving frames with a source MAC address matching that destination, it only sends the frame to that specific physical port or ports. If it has no record, it floods the frame out every port, and it's up to each receiving device to check whether the frame is addressed to it. But the switch tables get filled with records pretty quickly.
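The learn-then-forward behaviour is easy to model. A simplified sketch, with the caveat that the class and method names are mine and that real switches also age out entries and deal with VLANs and loops:

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy layer 2 switch: learns source MACs, floods unknown destinations."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port  # learn where the sender lives
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            return sorted(self.ports - {in_port})  # flood everywhere else
        return [] if out_port == in_port else [out_port]
```

The first frame to an unknown MAC gets flooded, but as soon as that machine replies, the switch has a record for it and the flooding stops.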

((For older computers, there was a physical layer conversion issue; this is why crossover cables existed. But almost every modern device can automatically switch over.))

but 192.168.2.3 is outside the network and the packets will be routed to the gateway?

In that case, the frame would be configured with a destination MAC of the local gateway, so the switch would look in its MAC table for the MAC of the local gateway, and usually only send the frame to the physical port of the local gateway. This is layer two switching, not layer three routing.

It's only when the frame gets to the gateway, which unwraps the frame to get at the packet inside, that anything examines the target IP address. The gateway then routes the packet by checking its own routing tables and its own default gateway.

How does the switch/AP know it should send the request to the wired router and not to one of its other LAN ports?

There are two kinds of network switches/hubs (well, there are more, but at least two). The dumb kind essentially pretends everybody is on the same bus, so every port gets all the traffic from the other ports. This of course is only good for very simple small networks. A smarter switch remembers which MAC addresses live on which ports and forwards the frames accordingly. Of course, smarter switches are more expensive than the dumb ones. For bigger networks you'd have configuration capacity in the switch to tell it which networks live on which ports.

I don't think you'll see a true 'dumb switch' (technical term 'hub') for ethernet at a major store; I haven't seen a new one since back when 10/100 Mbps switches were just phasing in. But they definitely existed, and it wasn't uncommon for one person to be able to bog down an entire intranet.

In the modern day, the distinction between 'dumb' and 'smart' switches is usually going to emphasize 'smart' switches as having optional routing functionality (aka 'layer 3 switching'). This technically means that the layer 3 switch has one or more ports that can be configured into a router mode, though in practice it'll be missing a lot of other functionality you'd expect from a small home or office router (almost always missing NAT/PAT, usually not having DHCP or DNS).

How does it broadcast its request if it doesn't have an IP address?

This is where IP and ethernet get a bit blurry. ARP operates at the raw ethernet level: it sends out a raw ethernet packet to the ethernet broadcast address. In the packet it has its IP and the requested IP. Implicit in the packet is the MAC address of the requesting machine. (Deeper dive: https://en.wikipedia.org/wiki/Ethernet_frame)
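If it helps to see the layout, this is roughly what an ARP request looks like as bytes: a 14-byte ethernet header addressed to the broadcast MAC, then the 28-byte ARP body. A sketch only; the function name is made up, the field layout is from RFC 826 as I remember it, and putting it on the wire needs a raw socket and privileges, which I'm skipping.

```python
import socket
import struct


def arp_request(src_mac: bytes, src_ip: str, target_ip: str) -> bytes:
    """Build an ethernet frame carrying an ARP 'who has target_ip?' request."""
    ethernet_header = struct.pack(
        "!6s6sH",
        b"\xff" * 6,  # destination: ethernet broadcast
        src_mac,      # source: the asking machine's MAC
        0x0806,       # EtherType: ARP
    )
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # MAC and IP address lengths
        1,            # operation: request
        src_mac, socket.inet_aton(src_ip),
        b"\0" * 6,    # target MAC: unknown, that's the question
        socket.inet_aton(target_ip),
    )
    return ethernet_header + arp_body
```

The switch only ever looks at the first 14 bytes; the IP addresses sit in the payload.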

In most cases you think "I'm IP xxx sending something to IP yyy," the reality is at the ethernet level, the IP stuff is all payload the network really doesn't care about. Internally, everything on the actual network level is working with MAC addresses. IPs are just a really convenient abstraction on top of it. (in this case "network" is the layer 2 of the entire stack -- the data link layer)

The local network is defined by the network mask, right? So with 255.255.255.0 if I send something from 192.168.1.2 to 192.168.1.3 there's no need for the gateway to be set up, but 192.168.2.3 is outside the network and the packets will be routed to the gateway?

That's correct. Anything on the local subnet stays on your local network. Anything outside gets punted to the gateway to deal with.
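The decision the sending machine makes can be sketched with Python's `ipaddress` module. The `next_hop` helper and the gateway address 192.168.1.1 are just for illustration; a real host consults a routing table that can hold more than one route.

```python
import ipaddress


def next_hop(interface: str, destination: str, gateway: str) -> str:
    """Return the IP whose MAC the frame should be addressed to."""
    local_net = ipaddress.ip_interface(interface).network
    if ipaddress.ip_address(destination) in local_net:
        return destination  # on-link: ARP for the destination itself
    return gateway          # off-link: ARP for the gateway instead


print(next_hop("192.168.1.2/255.255.255.0", "192.168.1.3", "192.168.1.1"))
print(next_hop("192.168.1.2/255.255.255.0", "192.168.2.3", "192.168.1.1"))
```

The first call hands back the destination itself and the second hands back the gateway, which matches the two cases in the question.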

This makes me wonder how the packets are routed within the local network, actually. Let's say I'm sending a request from my PC (192.168.1.5) to my NAS (192.168.1.2). The PC is connected to my wireless switch/AP (192.168.1.4), and both the switch/AP and the NAS are connected to the wired router (192.168.1.1). How does the switch/AP know it should send the request to the wired router and not to one of its other LAN ports?

I'm going to cavalierly ignore WiFi in this because it muddies things up and deal with layer 2 of the stack and up and just treat it as a switch. This is what's in my mental model of what's happening in some detail.

  1. You try to access "nas.orthoxerox.com"
  2. DNS lookup for that. Oops, we only have the IP of the DNS server: 192.168.1.254 (making something up)
  3. ARP on ethernet to get the MAC for ...254.
  4. This gets to the switch. It'll broadcast this packet to all its ports. (Once the switch knows that a certain MAC is on a port it remembers it. Most home-grade switches can remember a few thousand MAC addresses)
  5. The DNS server responds, and then the switch and your machine know the MAC of the DNS server.
  6. DNS lookup (several round-trips to do this) -- you now know the IP of the NAS. (Since the switch now knows the MAC of the DNS server, it sends these directly to the port it knows it's on)
  7. ARP for the IP of the NAS. (same as before)
  8. Finally, send an ethernet packet from your machine to the NAS. (Again, from the ethernet perspective, this is sending from your machine to the NAS based on its MAC address when we're at the low level)

If there are multiple switches between you and the destination, the broadcast just keeps going.

If you want to have some "fun," look up "ARP storm." It's likely one of the few times most networking folks (I'm a programmer) even think about things at that level.

Thanks a lot! How does Ethernet deal with someone pulling a Spartacus and spoofing MAC addresses of existing nodes?

By default, absolutely nothing... you've found one of the common attack surfaces of ethernet! You can use this to do all sorts of malicious things. You can overload the switches by just spamming them with new MAC addresses. You can intercept traffic. General denial of service attacks. Circumventing security. All sorts of mayhem.

So, ways of dealing with this... you can have switches that are configured to only allow an interface with a certain MAC to connect to certain ports. Or you can have softer ways of dealing with this by feeding information from the switch to some variety of intrusion detection system. Similarly, a switch can be configured to ensure that a device that got its address via DHCP can't suddenly start using a different MAC.
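The "only allow certain MACs on a port" idea boils down to an allowlist per port. Here's a toy version where the port learns the first MAC(s) it sees and rejects the rest; the `SecurePort` class and its `max_macs` knob are invented for the example and don't correspond to any vendor's actual feature set.

```python
class SecurePort:
    """Toy per-port MAC allowlist that learns up to max_macs addresses."""

    def __init__(self, max_macs=1):
        self.max_macs = max_macs
        self.allowed = set()
        self.violations = 0

    def accept(self, src_mac: str) -> bool:
        """Return True if a frame from src_mac is allowed in on this port."""
        if src_mac in self.allowed:
            return True
        if len(self.allowed) < self.max_macs:
            self.allowed.add(src_mac)  # learn the first address(es) seen
            return True
        self.violations += 1  # a real switch might log this or shut the port
        return False
```

Note this only stops a new MAC from showing up on a locked port. It does nothing about someone who spoofs the allowed MAC from that same port.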

There's a host of enterprise-y tech being built in this arms race if you want to fund some hardcore security-focused teams. That said, I don't think I've ever encountered (maybe because I'm not an attacker) these in the run-of-the-mill office environments. This is including working at Amazon, which is a bit persnickety on security. I'm quite sure that they're running these things in the data centers though. For something like AWS, they have segregated networks for control-plane traffic (the back-end of the services and how they are configured) and customer traffic. And for customer traffic, everything is on its own VLAN to ensure that I can't make a malicious service that would attack neighboring instances on the same machine or subnet. They also have a bunch of security in place to ensure only trusted clients can connect to services and verify the servers' authenticity.

This is one of the underlying reasons that having good physical security is essential. Once you have access to a network you want to attack, you have a lot more surface area that you can use to attack it while (preferably from the attacker's perspective) remaining undetected.

There are an annoying number of shops that used to love Cisco's port security option, which will lock down an interface on a switch to a certain set of MAC addresses (usually configured in adaptive modes). It's... not as unmanageable as it sounds, though it is very unmanageable and very much something that's usually only helpful against very specific threat models and when paired with a lot of other stuff.

How does it broadcast its request if it doesn't have an IP address?

Because network communication doesn't always require an IP. Think of the network as different technologies arranged in a stack, each building on the last. Specifically, the stack generally looks like:

Ethernet

IP

TCP/UDP

Other protocols on top of that (e.g. HTTP)

For DHCP, your machine broadcasts at the Ethernet level which works based on the MAC addresses baked into every network interface. It receives a reply in the same way. And even once you have an IP address, those IP packets will be riding on top of Ethernet frames which are sent out to the local network in much the same way as DHCP traffic is.