
Small-Scale Question Sunday for January 5, 2025

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.


What's a good book about TCP/IP networking? I am currently redoing my home network setup and I've realized my knowledge of networking is very fragmented. I know the right incantations, but I have no idea what they actually do.

  • what does "default gateway" actually do? What happens when it's the "wrong" IP? When it's blank?
  • what happens when two machines claim to have the same IP?
  • how exactly does DHCP work?
  • how does UDP go through NAT?
  • what are VLANs?

I know you asked specifically for a book, but https://learn.cantrill.io/p/tech-fundamentals is a pretty good free course about the various layers of the networking stack. If you want to go deep on the protocol-level stuff then maybe Stevens' TCP/IP Illustrated could be what you're after? It might be too TCP-focused, since you also have questions about VLANs and UDP.

Beej's guide is fairly concise but still covers all the important technical details. The sections about network programming in Python can be skipped or skimmed without missing much.

Default gateway does what it literally says. A gateway, in IP routing, is a term meaning "traffic for X network should be sent to the router at Y destination IP address". You can have potentially many routes on a system specifying what traffic goes where. The default gateway, then, is the gateway which your traffic will use when no other routing rule matches.
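To make the "no other rule matches" part concrete, here's a minimal Python sketch of how a host picks a next hop, using a made-up routing table. Real OS routing code is more involved, but the core rule is longest-prefix match, with the default route (0.0.0.0/0) as the catch-all. Deleting that entry is roughly what a blank default gateway means: destinations outside your subnets simply have no route.

```python
import ipaddress

# Hypothetical routing table for a home PC: (destination network, next hop).
# None as the next hop means "on-link": deliver directly, no router involved.
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), None),
    (ipaddress.ip_network("10.8.0.0/16"), "192.168.1.50"),  # e.g. a VPN box
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1"),     # the default gateway
]

def next_hop(destination: str) -> str:
    """Pick the IP address a packet for `destination` should be handed to.

    Longest-prefix match: among the routes containing the address, the most
    specific (largest prefix length) wins. 0.0.0.0/0 contains every address,
    so the default route only wins when nothing more specific matches.
    """
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    if not matches:
        # What you get with no default gateway and an off-network destination.
        raise LookupError(f"no route to {destination}")
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return destination if hop is None else hop

print(next_hop("192.168.1.20"))  # 192.168.1.20 (same subnet, sent directly)
print(next_hop("10.8.3.4"))      # 192.168.1.50 (more specific route)
print(next_hop("8.8.8.8"))       # 192.168.1.1  (falls through to the default)
```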

I'm not certain about what happens when two machines claim to have the same IP, actually. But I can take an educated guess. When you try to reach out to an IP address, your machine first needs to figure out which Ethernet MAC address it should send that traffic to (it does this using a protocol called ARP). Most likely, what would happen is you would start to see traffic for that one IP address go to both machines sporadically, depending on which is responding to ARP requests first. I'm not certain but that's what I would imagine.

DHCP works by sending out a broadcast on Ethernet asking for a DHCP server. When the server replies, it will give the client an IP address to use. That's the gist, though I don't know the exact details of the DHCP communication (I couldn't write my own software or anything).

UDP goes through NAT the same as TCP does. If you're making an outbound connection, the router will pick a port to listen for reply traffic, and forward replies to your client machine. If you're making an inbound connection, you need a port forwarded to the destination at the router level in advance.

A VLAN is a way to isolate Ethernet networks even if they are plugged into the same physical hardware. The switch you are plugged into lets you configure which ports are part of which VLAN, and only ports which are part of the same VLAN can talk to each other using Ethernet. You can also configure a port so that multiple VLANs are allowed, in which case the device plugged into the port must add a tag to any traffic it sends specifying which VLAN it is for (and it is only allowed to send traffic on the VLANs you configured on the switch).
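For the multi-VLAN (trunk) case, the "tag" is four extra bytes in the Ethernet header, defined by IEEE 802.1Q. A small Python sketch that builds such a header; the MAC addresses and VLAN 20 are made-up examples:

```python
import struct

def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))

def tagged_ethernet_header(dst: str, src: str, vlan_id: int,
                           ethertype: int = 0x0800, priority: int = 0) -> bytes:
    """Build an 802.1Q-tagged Ethernet header (no payload, no FCS).

    Layout: dst MAC | src MAC | TPID 0x8100 | TCI | EtherType of the payload.
    The 16-bit TCI packs a 3-bit priority (PCP), a 1-bit DEI flag (left at 0
    here) and the 12-bit VLAN ID.
    """
    if not 1 <= vlan_id <= 4094:
        raise ValueError("usable VLAN IDs are 1-4094")
    tci = (priority << 13) | vlan_id
    return mac_bytes(dst) + mac_bytes(src) + struct.pack("!HHH", 0x8100, tci, ethertype)

hdr = tagged_ethernet_header("ff:ff:ff:ff:ff:ff", "02:00:00:00:00:01", vlan_id=20)
print(len(hdr), hdr[12:16].hex())  # 18 81000014
```

An untagged frame is the same minus the 0x8100/TCI pair (14 bytes); on an access port the switch adds and strips the tag itself, so the attached device never sees it.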

I'm not certain about what happens when two machines claim to have the same IP, actually

Depends on the IP. If it's a so-called "local" (private) IP, i.e. one in 10.0.0.0/8, 172.16.0.0/12 (172.16.x.x through 172.31.x.x) or 192.168.0.0/16, and the two machines are not on the same local network, then nothing bad happens, since these addresses are specifically designed for such use. If they are on the same local network, there will be trouble; I'm not sure about the exact nature, but likely neither computer sharing the IP will be able to use it properly. Usually your OS will scream at you in some way when it detects this. Using a DHCP server is one way to ensure it never happens.
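A quick way to check whether an address falls in one of those private ranges, as a Python sketch. It spells out the RFC 1918 blocks explicitly because the built-in `is_private` property in Python's `ipaddress` module is broader (it also covers things like loopback and link-local addresses):

```python
import ipaddress

# The three private IPv4 blocks reserved by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 through 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in RFC1918_BLOCKS)

for a in ["10.1.2.3", "172.20.0.5", "172.32.0.1", "192.168.1.10", "22.1.1.1"]:
    print(a, is_rfc1918(a))
# 10.1.2.3 True / 172.20.0.5 True / 172.32.0.1 False / 192.168.1.10 True / 22.1.1.1 False
```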

If you have two hosts with the same IP and those are not local IPs, then weird things will happen. In general, if you have NAT (which most home users for now should and would have), then outgoing connections should work fine (then again, there's no real reason for a machine used by a home user to have a non-local IP at all), but it's better to avoid that situation completely because things get weird. There are special organizations and protocols aimed at segregating IP space so nobody steps on each other's toes there. As a home user, you probably don't need any of that, as the standard setup is to use local IPs for everything inside the home network and only use a non-local IP for the main router's egress address.

Yeah, I meant two machines on the local network. I've never tried that one before. I unfortunately have first-hand experience with machines on different networks using the same IP addresses, as one of my old employers was too cheap to buy new IPv4 subnets and we squatted on the DoD 22.0.0.0/8 subnet. Our network team had a very fun day when the DoD started using that subnet publicly, or so I heard from my old coworkers (I had left by that time).

DHCP server restarts can cause IP conflicts pretty often, especially if you're running the DHCP server on a small home/office router that doesn't persist state. Windows will specifically warn about the IP conflict, and newer versions (Win7+) will often try to automatically re-register with your DHCP server if you're not using a static address; Linux has some optional standards-compliant IP conflict notifiers.

If not corrected, the usual results are inconsistent communication and higher network utilization: switches will see the address resolve to multiple physical ports, which causes packets to be sent to many more places than they need to go and can sometimes make TCP connections go wonky.

((There are exceptions and sometimes even cases where you could use this behavior, but they were always rare and increasingly have been replaced by better solutions.))

DHCP server restarts can cause IP conflicts pretty often, especially if you're running the DHCP server on a small home/office router that doesn't persist state

More fun can be had if there's a rogue DHCP server on the network. Back when I did network admin work (a long time ago), I had to deal with such a case. It turned out to be a new printer with a helpful, on-by-default DHCP server, but it took me a lot of frustration to figure out, because it had never occurred to me that a printer could do that to me.

What idiot thought it was a good idea to add a DHCP server to a printer? That is peak anti-social.

In an office setting, I know it would take me so much time to try to figure that out. At least in a home setting, it's far easier to remember the answer to "what was the last thing that I (or spouse or kids) attached to the network in the past day or two?"

Yeah, that was exactly my question when I finally discovered what happened: who even thought it was a good idea to do this? Thankfully, I haven't heard of anyone doing that for a long while now.

What idiot thought it was a good idea to add a DHCP server to a printer?

It's probably so you can connect directly to the printer, without needing a router. Of course, it might be smarter to first try to acquire a DHCP lease before starting a DHCP server...

Yeah, I think it was some kind of "smart" home solution from when not everybody had a router on their home network. I'm a bit fuzzy on the details now, but that might have been the idea at least. It had a normal "play nice" mode too; for some reason it just wasn't enabled by default... or maybe somebody changed the setting, impossible to know now.

Oh yeah that's a good time too. When I was in college, I worked in the IT department and every so often we had to deal with a student who brought in a router and plugged one of their LAN ports into the campus network. That got your port shut off by the school pretty quickly, and iirc you had to come in and promise to stop using your router to get it turned back on.

Students are something else. I am still ashamed of some of the shenanigans I got up to as a student, especially after eventually finding myself on the other side (not at a university, thankfully). It's a tough job to run IT in such places.

Questions like this are pretty much in the wheelhouse of things like ChatGPT. It's really good at answering these high-level questions and providing good direction with the ability to dive deeper into each of the topics.

I asked on your behalf and everything looks pretty much like I would've written. https://chatgpt.com/share/677bd93a-310c-8004-9dcc-9b36c30fde8c

My take:

For home networking, unless you're setting up a homelab, you can probably ignore VLANs. Honestly, most of these topics are pretty much ignorable for what I expect your home-network use case to be.

Anything vaguely modern in terms of a home router should handle all of these pretty transparently. Without getting into packet-level stuff, DHCP from the router will configure the clients and configure the default gateway to itself as well as prevent duplicate IPs (unless you're configuring them manually). DHCP itself tends to just work out of the box. UDP NATing, similarly, tends to just work. VLANs, at what I'm expecting is your scale, should likely just be ignored.

In my case, I have a small server rack that has a couple of NASes living in it along with a few switches (1GbE and 10GbE). The switches support VLANs, but even for what I'm doing, I'm far from needing any of the functionality they would provide. My router is a set of Eeros; they can provide a mesh network, but all of mine are hardwired to the switch.

If you're looking to experiment from a homelab perspective, that's another story, but it could be a really fun one. A common way to get started and build a solid grounding in the fundamentals is to set up something like a Raspberry Pi cluster and play with it. It's a cheap and approachable way to learn these concepts.

I can already set up my home network (currently an x86 router, a custom-built NAS, a router working as a wireless AP and another router working as a wireless extender, plus all the end-user devices); what I want is to understand why I am doing the things I am doing. I'm sorry, but your ChatGPT log didn't really help with that. I'll see whether asking it for a more textbook, ground-up explanation works.

Much of this is really building on many decades worth of tech and it's hard to understand the why until you understand much of the whole stack.

Here's some of the whys, from my perspective in the order I would talk about them:

DHCP: when a device joins a network, it can broadcast on the network and ask how it should configure its network stack. Implicit in the request is the MAC (Media Access Control) address of the interface itself, which provides the physical address of the interface. The DHCP server (in a home setting, usually in the router) assigns an IP from a block it manages and gives the rest of the networking details (gateway, subnet, etc.) to the client. DHCP isn't strictly needed, as the clients can be configured manually in many cases. Cheap IoT devices tend to rely on it.

Default Gateway: When you're sending any packet to something outside your local network, you send the packet to the gateway and it figures out how to get the packet to the destination. In a home setting, this will just be forwarding the packet upstream to your ISP. In a larger scale setting, it's going to consult things like BGP routing to figure out where to send things to. The beauty of IP is that the client doesn't need to worry about it and it's completely abstracted into the gateway itself.

Duplicate IPs: As mentioned before, every interface has a MAC address. When you're sending a packet on the network to another machine (i.e. not broadcast), you send the packet to the MAC address. But we're dealing with IP, not MACs. To translate from an IP address to a MAC address we send out a broadcast ARP (Address Resolution Protocol) request asking basically "will the device with IP xxx respond?" Broadcasts are received by all the machines on the network. The machine with the requested IP will respond. If there are multiple machines that are configured with the same IP, they'll all respond. What happens here is usually the first one wins. This is complicated by modern switches because they learn what IPs/MACs are on each of their ports. They'll likely assume there are two routes to the same host and weird things may happen. Lesson: don't do it, things break.
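What that ARP broadcast actually looks like on the wire, as a Python sketch that only builds the bytes (it sends nothing; the addresses are made up). Every host that believes it owns the target IP will answer this request, which is where duplicate IPs start causing trouble:

```python
import ipaddress
import struct

def arp_request_frame(src_mac: str, src_ip: str, target_ip: str) -> bytes:
    """Build a broadcast Ethernet frame carrying an ARP 'who has target_ip?' request.

    Ethernet header: dst = broadcast, src = our MAC, EtherType 0x0806 (ARP).
    ARP body (RFC 826, Ethernet/IPv4): hardware type 1, protocol type 0x0800,
    address lengths 6 and 4, opcode 1 (request), then sender MAC/IP and
    target MAC/IP. The target MAC is zeros because that is what we're asking for.
    """
    mac = bytes.fromhex(src_mac.replace(":", ""))
    eth = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4, 1,
        mac, ipaddress.IPv4Address(src_ip).packed,
        b"\x00" * 6, ipaddress.IPv4Address(target_ip).packed,
    )
    return eth + arp

frame = arp_request_frame("02:00:00:00:00:01", "192.168.1.5", "192.168.1.2")
print(len(frame))  # 42: 14-byte Ethernet header + 28-byte ARP body (padded to 60 on the wire)
```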

VLANs: From a switch perspective, it just controls what ports can talk to which other ports. If you have a 24-port switch, you can configure multiple VLANs such that, say, ports 1-12 can talk to each other, and 13-24 can talk to each other. It's setting up two "Virtual LANs." You can have a router that attaches to both of the VLANs to handle routing between them if you want. These are typically used to prioritize certain network traffic, or for security (e.g. a guest network can't talk to your servers).

UDP and NAT: Since there's no connection in UDP, the NAT device just remembers things like "when device XX using port YY sends a packet to internet address AA port BB, I send the packet out from my port PP. Later, if I get a packet from AA:BB on port PP, I'll look that up and forward the packet to XX:YY." The key here is that every UDP (or TCP) packet carries a source IP and port and a destination IP and port: the addresses live in the IP header and the ports in the UDP/TCP header. When doing NAT, the device replaces the local source IP (which isn't publicly routable) with its own address and port. On the way back, it does the reverse: it replaces the destination IP/port (which is how the packet reached it in the first place) with the local device's address and port, and forwards it.
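A toy Python model of that bookkeeping, assuming a single public IP, sequentially assigned ports and no entry timeouts (real routers expire idle UDP mappings and differ in how strictly they filter replies). The addresses are from the documentation ranges, not real hosts:

```python
import itertools

class UdpNat:
    """Toy model of the UDP mapping table a home NAT router keeps.

    Simplifying assumptions (not how any particular router works): one public
    IP, ports handed out sequentially, mappings never expire, and replies are
    accepted only from remote addresses the inside host already sent to.
    """

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self.outward = {}  # (inside_ip, inside_port) -> public port
        self.inward = {}   # public port -> (inside_ip, inside_port)
        self.peers = {}    # public port -> {(remote_ip, remote_port), ...}

    def outbound(self, src, dst):
        """Rewrite an outgoing packet's source address; return (new_src, dst)."""
        if src not in self.outward:
            port = next(self._next_port)
            self.outward[src] = port
            self.inward[port] = src
            self.peers[port] = set()
        port = self.outward[src]
        self.peers[port].add(dst)
        return (self.public_ip, port), dst

    def inbound(self, src, public_port):
        """Return the inside (ip, port) to forward a reply to, or None to drop it."""
        inside = self.inward.get(public_port)
        if inside is None or src not in self.peers[public_port]:
            return None  # unsolicited: no matching outbound traffic
        return inside

nat = UdpNat("203.0.113.7")
print(nat.outbound(("192.168.1.5", 5000), ("198.51.100.9", 53)))
# (('203.0.113.7', 40000), ('198.51.100.9', 53))
print(nat.inbound(("198.51.100.9", 53), 40000))   # ('192.168.1.5', 5000)
print(nat.inbound(("192.0.2.66", 1234), 40000))   # None: never contacted
```

A port forward is roughly a permanent, hand-made entry in a table like this that accepts packets from any remote address, which is why unsolicited inbound traffic needs one.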

Thanks, that was helpful!

DHCP: when a device joins a network, it can broadcast on the network and ask how it should configure its network stack. Implicit in the request is the MAC (Media Access Control) address of the interface itself, which provides the physical address of the interface. The DHCP server (in a home setting, usually in the router) assigns an IP from a block it manages and gives the rest of the networking details (gateway, subnet, etc.) to the client. DHCP isn't strictly needed, as the clients can be configured manually in many cases. Cheap IoT devices tend to rely on it.

How does it broadcast its request if it doesn't have an IP address?

Default Gateway: When you're sending any packet to something outside your local network, you send the packet to the gateway and it figures out how to get the packet to the destination. In a home setting, this will just be forwarding the packet upstream to your ISP. In a larger scale setting, it's going to consult things like BGP routing to figure out where to send things to. The beauty of IP is that the client doesn't need to worry about it and it's completely abstracted into the gateway itself.

The local network is defined by the network mask, right? So with 255.255.255.0, if I send something from 192.168.1.2 to 192.168.1.3 there's no need for the gateway to be set up, but 192.168.2.3 is outside the network and the packets will be routed to the gateway?

This makes me wonder how the packets are routed within the local network, actually. Let's say I'm sending a request from my PC (192.168.1.5) to my NAS (192.168.1.2). The PC is connected to my wireless switch/AP (192.168.1.4), and both the switch/AP and the NAS are connected to the wired router (192.168.1.1). How does the switch/AP know it should send the request to the wired router and not to one of its other LAN ports?

How does it broadcast its request if it doesn't have an IP address?

It’s certainly only a model, but answering questions like this is why the OSI model is taught to students: this is the glory of the data link layer! (Or the Network Access layer in the more accurate TCP/IP model.)

It’s possible, though not really useful, to run a local network over purely MAC addressing, but few pieces of software actually can. But if you’ve ever used wake-on-LAN, digging deeper than IP is how it works!
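The Wake-on-LAN "magic packet" is simple enough to build by hand: six 0xFF bytes followed by the target's MAC repeated 16 times. The sleeping NIC scans incoming frames for that pattern without needing any IP configuration, so the UDP broadcast below is just a convenient carrier. Port 9 is a common convention rather than a requirement, and the MAC is a made-up example; treat this as a sketch:

```python
import socket

def magic_packet(mac: str) -> bytes:
    """Wake-on-LAN payload: 6 bytes of 0xFF, then the target MAC repeated 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("expected a 48-bit MAC address")
    return b"\xff" * 6 + raw * 16

def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))

print(len(magic_packet("00:11:22:33:44:55")))  # 102
# send_wol("00:11:22:33:44:55")  # uncomment to actually wake a machine
```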

Every device is intended to have a factory-unique MAC address, though virtual machines, software overrides, and newfangled privacy features just go with the randomize-and-pray model. Since there’s a unique MAC for each device, a host connected to a local network can perform a MAC broadcast without any IP bootstrapping, and hopefully find a DHCP server to hand it IP configuration.

I really love MAC addressing and layer 2 stuff, precisely because this stuff works so transparently in most cases and so you don’t have to think about it. It’s very elegant in that way, and I like elegance and autoconfiguration; it’s the computer’s job to worry about the numbers.

On a tangent: admiration for this elegance was the driving force behind IPv6, and I’d argue the only way to understand IPv6 is to see it’s a design intended to bring the fluidity and elegance of local networks to the internet. This runs into a lot of real-world roadblocks and administrative preferences towards centralized control — yet decentralized but coordinated systems are the great triumph of software engineering and I find it beautiful even if there are real-world obstacles.

How does it broadcast its request if it doesn't have an IP address?

DHCP requests are transmitted over UDP with a target destination of the broadcast address, usually 255.255.255.255. The standard says that this packet should have a source address of 0.0.0.0, but in my experience most DHCP servers aren't very picky about that. This packet is just a message going across a wire to every receiver on the local network (i.e., up to the gateway), so the ethernet card doesn't need to have an IP address at that time. EDIT: for clarity, it uses the MAC address to identify itself, so the server can properly respond to just the correct machine. This is one of many reasons that getting DHCP to run across network boundaries is an absolute nightmare. /EDIT
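For the curious, here's a Python sketch that builds the payload of a minimal DHCPDISCOVER following the RFC 2131 layout. It only builds bytes; actually sending it would need a socket bound to UDP port 68, which usually requires admin rights. Real clients typically include more options, and many pad the message to the old BOOTP minimum of 300 bytes. The MAC is a made-up example:

```python
import os
import struct

def dhcp_discover(mac: str) -> bytes:
    """Build the UDP payload of a minimal DHCPDISCOVER (RFC 2131 layout).

    The client sends this from 0.0.0.0:68 to 255.255.255.255:67. It has no
    IP yet, so it identifies itself by its MAC in the 'chaddr' field.
    """
    chaddr = bytes.fromhex(mac.replace(":", ""))
    xid = int.from_bytes(os.urandom(4), "big")  # random transaction ID
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,       # op: 1 = BOOTREQUEST
        1,       # htype: 1 = Ethernet
        6,       # hlen: MAC length in bytes
        0,       # hops
        xid,
        0,       # secs
        0x8000,  # flags: ask the server to broadcast its reply
        b"\x00" * 4, b"\x00" * 4, b"\x00" * 4, b"\x00" * 4,  # ciaddr, yiaddr, siaddr, giaddr
        chaddr,  # struct zero-pads this to 16 bytes
        b"", b"",  # sname, file: zero-filled
    )
    magic_cookie = bytes([99, 130, 83, 99])
    options = bytes([53, 1, 1,  # option 53 (message type), length 1, value 1 = DISCOVER
                     255])      # end of options
    return header + magic_cookie + options

pkt = dhcp_discover("02:00:00:00:00:01")
print(len(pkt))  # 244
```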

The local network is defined by the network mask, right?

For the purposes of TCP/IP, the local network is defined by the netmask. Physical networks (e.g., multiple routers with different subnets plugged into the same big switch) and logical networks (VLANs) can be, and often are, different. This is a space with a lot of terminology collision, so be wary of it.

So with 255.255.255.0, if I send something from 192.168.1.2 to 192.168.1.3 there's no need for the gateway to be set up...

At the risk of going too deep into the (lies-to-children!) OSI model:

Before doing anything else, the sending computer looks at its ARP table, which converts IP addresses to MAC addresses. If the destination IP address is not on the ARP table, it will send an ARP request, which is a broadcast message to the local network asking if any devices have that IP address (or, if not on the local network, it sends an ARP request for the local gateway). Once it finds the address, it inserts that IP-MAC pair into the ARP table, and uses it as part of the packet and frame shaping.

The computer forms a packet, with a source IP address of 192.168.1.2 and a destination of 192.168.1.3 (both in 192.168.1.0/24), at the TCP/IP network layer, or layer three. That packet is wrapped in an ethernet "frame", whose payload is limited by the MTU (historically 1500 bytes, but larger where hardware supports it); anything bigger has to be split up first. This is the ethernet/MAC data link layer, or layer two. The card then transmits the frame as signals to the network switch, aka the ethernet physical layer or layer one.

The switch receives the signals and converts them back into the layer two frame. An older hub would simply echo the frame out of every port. A modern switch instead inspects the frame for a destination MAC address. If the switch has records of receiving frames with a source MAC address matching that destination, it only sends the frame to that specific physical port or ports. If it has no record, it floods the frame out of every port, and it's up to each receiving device to discard frames not addressed to it. But the switch tables get filled with records pretty quickly.
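The learn-then-forward behavior described above fits in a few lines. A toy Python model with made-up ports and MACs (real switches also age entries out and have a limited table size):

```python
class LearningSwitch:
    """Toy layer-2 switch: learns source MACs, forwards or floods by destination MAC."""

    BROADCAST = "ff:ff:ff:ff:ff:ff"

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        # Learn: the sender is reachable via the port the frame arrived on.
        self.mac_table[src_mac] = in_port
        out = self.mac_table.get(dst_mac)
        if dst_mac == self.BROADCAST or out is None:
            # Broadcast or unknown destination: flood everywhere except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            return []  # destination is on the same segment; nothing to forward
        return [out]

sw = LearningSwitch([1, 2, 3, 4])
print(sw.receive(1, "aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:02"))  # [2, 3, 4] (unknown, flood)
print(sw.receive(2, "bb:bb:bb:bb:bb:02", "aa:aa:aa:aa:aa:01"))  # [1] (learned)
print(sw.receive(1, "aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:02"))  # [2] (now learned too)
```

A hub is the degenerate case that always floods. Note also that nothing stops a host from sending frames with someone else's source MAC; in this model the table entry would simply move to the spoofer's port.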

((For older computers, there was a physical layer conversion issue; this is why crossover cables existed. But almost every modern device can automatically switch over.))

but 192.168.2.3 is outside the network and the packets will be routed to the gateway?

In that case, the frame would be configured with a destination MAC of the local gateway, so the switch would look in its MAC table for the MAC of the local gateway, and usually only send the packet to the physical ports of the local gateway. This is layer two switching, not layer three routing.

It's only when the frame reaches the gateway, which unwraps it into a packet and inspects the destination IP address, that routing happens: the gateway checks its own routing tables and its own default gateway to decide where to send it next.

How does the switch/AP know it should send the request to the wired router and not to one of its other LAN ports?

There are two kinds of network switches/hubs (well, there are more, but at least two). The dumb ones essentially pretend everybody is on the same bus, so every port gets all the traffic from the other ports. This is of course only good for very small, simple networks. A smarter switch remembers which MAC addresses live on which ports and forwards frames accordingly. Of course, smarter switches are more expensive than dumb ones. For bigger networks, you'd have configuration options in the switch to tell it which networks live on which ports.

I don't think you'll see a true 'dumb switch' (technical term 'hub') for ethernet at a major store; I haven't seen a new one since back when 10/100 Mbps switches were just phasing in. But they definitely existed, and it wasn't uncommon for one person to be able to bog down an entire intranet.

In the modern day, the distinction between 'dumb' and 'smart' switches is usually going to emphasize 'smart' switches as having optional routing functionality, (aka 'layer 3 switching'). This technically means that the layer 3 switch has one or more ports that can be configured into a router mode, though in practice it'll be missing a lot of other functionality you'd expect from a small home or office router (almost always missing NAT/PAT, usually not having DHCP or DNS).

How does it broadcast its request if it doesn't have an IP address?

This is where IP and ethernet get a bit blurry. ARP operates at the raw ethernet level, sending a raw ethernet frame to the ethernet broadcast address. In the packet it has its IP and the requested IP. Implicit in the packet is the MAC address of the requesting machine. (Deeper dive: https://en.wikipedia.org/wiki/Ethernet_frame)

In most cases where you think "I'm IP xxx sending something to IP yyy," the reality is that at the ethernet level, the IP stuff is all payload the network really doesn't care about. Internally, everything at the actual network level works with MAC addresses; IPs are just a really convenient abstraction on top. (In this case "network" means layer 2 of the stack, the data link layer.)

The local network is defined by the network mask, right? So with 255.255.255.0, if I send something from 192.168.1.2 to 192.168.1.3 there's no need for the gateway to be set up, but 192.168.2.3 is outside the network and the packets will be routed to the gateway?

That's correct. Anything on the local subnet stays on your local network. Anything outside gets punted to the gateway to deal with.
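The "is it local?" decision is just a bitwise AND of the netmask with both addresses, then a comparison. A minimal Python sketch using the addresses from the question:

```python
import ipaddress

def is_on_link(src_ip: str, netmask: str, dst_ip: str) -> bool:
    """True if dst_ip is in the same subnet as src_ip under netmask.

    AND both addresses with the mask and compare the resulting network numbers:
    equal means "ARP for it directly", different means "send it to the gateway".
    """
    mask = int(ipaddress.IPv4Address(netmask))
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src & mask) == (dst & mask)

print(is_on_link("192.168.1.2", "255.255.255.0", "192.168.1.3"))  # True
print(is_on_link("192.168.1.2", "255.255.255.0", "192.168.2.3"))  # False
```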

This makes me wonder how the packets are routed within the local network, actually. Let's say I'm sending a request from my PC (192.168.1.5) to my NAS (192.168.1.2). The PC is connected to my wireless switch/AP (192.168.1.4), and both the switch/AP and the NAS are connected to the wired router (192.168.1.1). How does the switch/AP know it should send the request to the wired router and not to one of its other LAN ports?

I'm going to cavalierly ignore WiFi here because it muddies things up; I'll stick to layer 2 and up and just treat the AP as a switch. This is roughly my mental model of what's happening, in some detail.

  1. You try to access "nas.orthoxerox.com"
  2. DNS lookup for that. Oops, we only have the IP of the DNS server: 192.168.1.254 (making something up)
  3. ARP on ethernet to get the MAC for ...254.
  4. This gets to the switch. It'll broadcast this packet to all its ports. (Once the switch knows that a certain MAC is on a port it remembers it. Most home-grade switches can remember a few thousand MAC addresses)
  5. The DNS server responds, and now both the switch and your machine know the DNS server's MAC.
  6. DNS lookup (several round-trips to do this): you now know the IP of the NAS. (Since the switch now knows the DNS server's MAC, it sends those frames directly to the port it knows it's on)
  7. ARP for the IP of the NAS. (same as before)
  8. Finally, send an ethernet frame from your machine to the NAS. (Again, from the ethernet perspective, this is sending from your machine to the NAS based on its MAC address when we're at the low level)

If there are multiple switches between you and the destination, the broadcast just keeps going.

If you want to have some "fun," look up "ARP storm." It's likely one of the few times most networking folks (I'm a programmer) even think about things at that level.

Thanks a lot! How does Ethernet deal with someone pulling a Spartacus and spoofing MAC addresses of existing nodes?

By default, absolutely nothing... you've found one of the common attack surfaces of ethernet! You can use this to do all sorts of malicious things. You can overload the switches by just spamming them with new MAC addresses. You can intercept traffic. General denial of service attacks. Circumventing security. All sorts of mayhem.

So, ways of dealing with this... you can have switches that are configured to only allow an interface with a certain MAC to connect to certain ports. Or you can have softer ways of dealing with this by feeding information from the switch to some variety of intrusion detection system. Similarly, a switch can be configured to ensure that a device DHCPing for an address can't suddenly start using a different MAC.
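A rough Python sketch of the port-lockdown idea, under simplified, made-up rules: each port learns up to `max_macs` addresses, and a frame from any other MAC (or from a MAC already secured on a different port) shuts the port down. Real switch features have more modes and knobs than this; the MACs are hypothetical:

```python
class PortSecurity:
    """Toy per-port MAC allowlist illustrating the port-lockdown idea.

    Made-up simplified rules: each port learns up to `max_macs` source MACs;
    a frame from any other MAC, or from a MAC already secured on a different
    port, disables the port until an admin re-enables it.
    """

    def __init__(self, max_macs: int = 1):
        self.max_macs = max_macs
        self.allowed = {}      # port -> set of MACs learned on it
        self.owner = {}        # MAC -> port it is secured on
        self.disabled = set()  # ports shut down after a violation

    def admit(self, port, src_mac) -> bool:
        """Return True if a frame from src_mac arriving on port may be forwarded."""
        if port in self.disabled:
            return False
        if self.owner.get(src_mac, port) != port:
            self.disabled.add(port)  # someone else's MAC showed up here: likely spoofing
            return False
        macs = self.allowed.setdefault(port, set())
        if src_mac in macs:
            return True
        if len(macs) < self.max_macs:
            macs.add(src_mac)  # learn ("stick") this address to the port
            self.owner[src_mac] = port
            return True
        self.disabled.add(port)  # too many MACs on one port
        return False

ps = PortSecurity(max_macs=1)
print(ps.admit(3, "aa:aa:aa:aa:aa:01"))  # True: learned on port 3
print(ps.admit(4, "aa:aa:aa:aa:aa:01"))  # False: secured on port 3, so port 4 shuts down
print(ps.admit(3, "aa:aa:aa:aa:aa:02"))  # False: second MAC on port 3, so port 3 shuts down
```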

There's a host of enterprise-y tech being built in this arms race if you want to fund some hardcore security-focused teams. That said, I don't think I've ever encountered these (maybe because I'm not an attacker) in run-of-the-mill office environments. This includes my time at Amazon, which is a bit persnickety about security. I'm quite sure they're running these things in the data centers, though. For something like AWS, they have segregated networks for control-plane traffic (the back-end of the services and how they are configured) and customer traffic. And for customer traffic, everything is on its own VLAN to ensure that I can't make a malicious service that would attack neighboring instances on the same machine or subnet. They also have a bunch of security in place to ensure only trusted clients can connect to services and to verify the servers' authenticity.

This is one of the underlying reasons that having good physical security is essential. Once you have access to a network you want to attack, you have a lot more surface area that you can use to attack it while (preferably from the attacker's perspective) remaining undetected.

There are an annoying number of shops that used to love Cisco's port security option, which locks down an interface on a switch to a certain set of MAC addresses (usually configured in adaptive modes). It's... not as unmanageable as it sounds, though it is very unmanageable, and very much something that's usually only helpful against very specific threat models and when paired with a lot of other stuff.

How does it broadcast its request if it doesn't have an IP address?

Because network communication doesn't always require an IP. Think of the network as different technologies arranged in a stack, each building on the last. Specifically, the stack generally looks like:

  • Ethernet
  • IP
  • TCP/UDP
  • Other protocols on top of that (e.g. HTTP)

For DHCP, your machine broadcasts at the Ethernet level which works based on the MAC addresses baked into every network interface. It receives a reply in the same way. And even once you have an IP address, those IP packets will be riding on top of Ethernet frames which are sent out to the local network in much the same way as DHCP traffic is.
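To see that layering concretely, here's a Python sketch that nests a UDP datagram inside an IPv4 packet inside an Ethernet frame. It builds bytes only; the ports, addresses and MAC are illustrative, the UDP checksum is left at 0 (which UDP over IPv4 permits), and the Ethernet FCS is left for the NIC to add:

```python
import ipaddress
import struct

def checksum16(data: bytes) -> int:
    """Internet checksum (RFC 1071): ones' complement of the ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def udp_datagram(src_port: int, dst_port: int, data: bytes) -> bytes:
    # 8-byte header: ports, length (header + data), checksum 0 = "not computed".
    return struct.pack("!HHHH", src_port, dst_port, 8 + len(data), 0) + data

def ipv4_packet(src: str, dst: str, protocol: int, payload: bytes) -> bytes:
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(payload),  # version 4 + 20-byte header; DSCP/ECN; total length
        0, 0,                        # identification; flags + fragment offset
        64, protocol, 0,             # TTL; protocol (17 = UDP); checksum placeholder
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )
    checksum = struct.pack("!H", checksum16(header))
    return header[:10] + checksum + header[12:] + payload

def mac_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))

def ethernet_frame(dst_mac: str, src_mac: str, ethertype: int, payload: bytes) -> bytes:
    # The switch only reads these 14 header bytes; everything after is opaque payload.
    return mac_bytes(dst_mac) + mac_bytes(src_mac) + struct.pack("!H", ethertype) + payload

frame = ethernet_frame(
    "ff:ff:ff:ff:ff:ff", "02:00:00:00:00:01", 0x0800,  # 0x0800 = IPv4
    ipv4_packet("0.0.0.0", "255.255.255.255", 17,
                udp_datagram(68, 67, b"hello")),
)
print(len(frame))  # 47 = 14 (Ethernet) + 20 (IPv4) + 8 (UDP) + 5 (data)
```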

For what you're looking for, I would pick up a cheap CompTIA Net+ book and maybe a Cisco CCENT book and read through the chapters you're interested in. They're written to provide the practical understanding that a junior IT tech would need to perform basic network-related tasks, and they got me through the first few years of my tech career. You can probably find a ton of them on libgen. I would steer clear of single-topic books (e.g. just TCP/IP), since they go into way more depth and detail than you'll ever need (though they are extremely interesting IMO).