Nothing stupid about them...

...I just stole the name from David Letterman's "stupid pet tricks". I hope these tips help you avoid or fix mistakes along the way in your *nix administration duties.

Tuesday, July 22, 2008

Network Troubleshooting 101

When troubleshooting, use IP addresses instead of names. Name resolution should be tackled last.

Start with question one and work your way down the list in order. The first question you answer "no" to is your problem, so you can stop right there.
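If you want to script the "stop at the first no" idea, here is a minimal sketch. The run_checks name is my own invention, and each argument is simply a command line whose exit status stands in for the yes/no answer.

```shell
# run_checks: hypothetical helper. Runs each argument as a command, in order,
# and stops at the first one that fails (the first "no").
run_checks() {
    n=0
    for check in "$@"; do
        n=$((n + 1))
        if ! sh -c "$check" >/dev/null 2>&1; then
            echo "step $n failed: $check"
            return 1
        fi
    done
    echo "all $n steps passed"
}

# Usage (Linux ping syntax; the addresses are the examples from this post):
#   run_checks "ping -c 1 10.123.44.2" "ping -c 1 10.123.14.1"
```

The commands you pass in are up to you. The point is only that a failure at step N means steps N+1 onward are not worth running yet.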

1. Does your network interface have a link up?
  • Look for a green LED on the NIC
  • Linux: ethtool eth0 (you should see "Link detected: yes")
  • Solaris: ndd /dev/bge0 link_status (1 means the link is up; the device and parameter names may differ for your driver, see man ndd)
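On Linux the link check is easy to script. A minimal sketch, assuming your ethtool prints a "Link detected:" line; link_detected is a made-up helper name, and it only reads ethtool output on stdin.

```shell
# link_detected: hypothetical helper. Reads `ethtool <iface>` output on stdin
# and succeeds only if it contains "Link detected: yes".
link_detected() {
    grep -q 'Link detected: yes'
}

# Usage (ethtool usually needs root):
#   ethtool eth0 | link_detected && echo "link up" || echo "no link"
```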

2. Can the NIC see other systems on the network?
# arp -a

(run this command several times, because ARP cache entries expire and are refreshed frequently)

If so, are the IP addresses that appear on the same subnet as your interface?

Alternatively, use tcpdump (Linux) or snoop (Solaris) to examine the traffic on the interface. Make sure you're connected to the right subnet.
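If you'd rather not do the subnet math in your head, here is a rough sketch that compares two addresses under a netmask. The function names are mine, and the 255.255.255.0 mask in the usage line is an assumption; use whatever ifconfig reports for your interface.

```shell
# ip_to_int: convert a dotted-quad IPv4 address to an integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# same_subnet: hypothetical helper. Succeeds if both addresses have the
# same network part under the given netmask.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Usage:
#   same_subnet 10.123.44.2 10.123.44.200 255.255.255.0 && echo "same subnet"
```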


3. Can you ping anything else on this subnet?
# ping 10.123.44.2
(or a known active IP address on the subnet)


4. Can you ping the default gateway? Find the default gateway with:
# netstat -rn

Solaris output looks like this:
Routing Table: IPv4
Destination          Gateway              Flags Ref   Use        Interface
-------------------- -------------------- ----- ----- ---------- ---------
default              10.123.40.1          UG        1        410


Linux looks like this:
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         10.123.14.1     0.0.0.0         UG        0 0          0 eth0


If no default route line like this appears in your netstat -rn output, you have to add one (man route) before you can communicate outside of your subnet.
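To pull the gateway out of that output in a script, something like the following should do. It is a sketch based only on the two output formats shown above (a first column of "default" on Solaris, "0.0.0.0" on Linux); default_gateway is a name I made up.

```shell
# default_gateway: hypothetical helper. Reads `netstat -rn` output on stdin
# and prints the gateway of the first default route, or nothing if none.
default_gateway() {
    awk '$1 == "default" || $1 == "0.0.0.0" { print $2; exit }'
}

# Usage:
#   gw=$(netstat -rn | default_gateway)
#   [ -n "$gw" ] && echo "default gateway is $gw" || echo "no default route"
```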


5. Can you traceroute to your destination system?
# traceroute 10.123.40.85
traceroute to beethoven (10.123.40.85), 30 hops max, 38 byte packets
1 chopin (10.123.14.3) 0.300 ms 0.302 ms 0.228 ms
2 mahler (10.123.248.115) 0.361 ms 0.317 ms 0.239 ms
3 shostakovich (10.123.248.83) 0.360 ms 0.313 ms 0.228 ms
4 beethoven (10.123.40.85) 0.358 ms 0.292 ms 0.239 ms
#
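When a trace dies partway, the last hop that answered is where to start looking. Here is a small sketch that picks it out of traceroute output like the sample above. last_hop is a hypothetical name, and it treats a hop as unanswered if its first probe shows "*".

```shell
# last_hop: hypothetical helper. Reads traceroute output on stdin and prints
# "<hop number> <address>" for the last hop whose first probe was answered.
last_hop() {
    awk '$1 ~ /^[0-9]+$/ && $2 != "*" {
             # "name (addr)" form: take addr from the parentheses.
             # With traceroute -n the address is already in field 2.
             addr = ($3 ~ /^\(/) ? substr($3, 2, length($3) - 2) : $2
             hop = $1 " " addr
         }
         END { if (hop != "") print hop }'
}

# Usage:
#   traceroute 10.123.40.85 | last_hop
```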



6. Is the destination system listening on the expected port? (see /etc/services for port number)

For instance, is the system accepting mail? (port 25)
# telnet 10.123.40.85 25


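If telnet reports that it is connected and prints "Escape character is '^]'.", the port is open. "Connection refused" means nothing is listening on that port. If the attempt just hangs, a firewall may be dropping the traffic.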
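If you don't remember the port number, you can look it up from /etc/services in a script. A minimal sketch, assuming the usual "name port/protocol aliases" layout of that file; service_port is my own helper name, the protocol defaults to tcp, and the optional third argument is only there so you can point it at another file.

```shell
# service_port: hypothetical helper. Prints the port for a service name or
# alias from a services-format file (default /etc/services).
service_port() {
    awk -v name="$1" -v proto="${2:-tcp}" '
        $1 ~ /^#/ { next }                      # skip comment lines
        {
            split($2, pp, "/")                  # "25/tcp" gives pp[1]=25, pp[2]=tcp
            if (pp[2] != proto) next
            for (i = 1; i <= NF; i++) {
                if (i == 2) continue            # field 2 is port/protocol
                if ($i ~ /^#/) break            # trailing comment starts here
                if ($i == name) { print pp[1]; exit }
            }
        }' "${3:-/etc/services}"
}

# Usage:
#   telnet 10.123.40.85 "$(service_port smtp)"
```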



If you pass all these steps, your issue is probably name resolution. Check /etc/hosts, /etc/nsswitch.conf, and /etc/resolv.conf, and test lookups with nslookup, to make sure your names are resolving as they should. Hard-coding the names in /etc/hosts will fix the issue immediately.
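To see what /etc/hosts already says about a name before you hard-code it, here is a quick sketch. hosts_lookup is a made-up helper; it ignores comments and matches any name or alias after the address. The optional second argument lets you point it at a different file.

```shell
# hosts_lookup: hypothetical helper. Prints the address of the first entry
# in a hosts-format file (default /etc/hosts) that lists the given name.
hosts_lookup() {
    awk -v name="$1" '
        {
            sub(/#.*/, "")                      # strip comments
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }' "${2:-/etc/hosts}"
}

# Usage (beethoven is the example host from the traceroute above):
#   ip=$(hosts_lookup beethoven)
#   [ -n "$ip" ] && echo "beethoven is $ip" || echo "beethoven is not in /etc/hosts"
```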
