CIS 307: Structure and Naming in the Internet

Routers, IP Addresses, Forwarding, Subnetting, CIDR, Private Addresses And Network Address Translation (NAT), LAN Addresses, Tunneling, Firewalls, Virtual Private Networks, Overlay Networks

Routers

A router is a box (often a regular computer) with (at least) two ports (i.e. interfaces), used to connect possibly dissimilar networks and help packets go from a source to a destination. It differs from bridges since it operates at the network level. [It will also use different addresses. For example a bridge may use Ethernet addresses while a router uses IP addresses.] It does all the transformations that may be required by the transfer of packets across the networks it connects.
A router is concerned with where to send packets next as they move from a source to a destination through a set of interconnected networks.
It is convenient to distinguish two activities:

Forwarding (or Switching): The process taking place at a router when it receives a packet and has to decide where to send it next on the basis of its destination and of information available at the router.
Routing: The process through which routers receive and process the information that they will need in the forwarding process. Usually this information is gathered and transmitted using protocols called routing protocols, and processed using routing algorithms. The gathered information usually takes the form of routing (or forwarding) tables.

Unfortunately common terminology does not always preserve this distinction, and too often one sees "routing" used in place of "forwarding".

Routing (we really mean forwarding) could be of three kinds (at least!):

Source Routing, where the decision on what intermediate nodes to cross is made before a packet is sent; then the packet knows from the beginning where to go next after arriving at an intermediate node.
Virtual Circuit Routing, where a connection is established before the first packet is sent. Then each packet as it travels will contain as destination the id of a virtual circuit (this id may change as the packet moves across the network) and each intermediate node will contain a table whose rows have 4 fields: an entry-port, an entry-virtual-circuit-id, an exit-port, and an exit-virtual-circuit-id. Routing at a node determines the port the packet arrived from and its virtual circuit; the node then sends the packet forward with the exit virtual circuit id as new destination, using the exit port indicated by the table.
[Packet] Routing, where each packet is individually routed in accordance with a next-hop routing table. In these tables, for a given destination, there is usually a single next-hop. Even when there is more than one next-hop, the decision on where to go next is not made on the basis of the source of the packet, but on the basis of some form of cost.
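
The table lookup done by an intermediate node in virtual circuit routing can be sketched in Python as follows. This is only an illustration: the port numbers, the circuit ids, and the names VC_TABLE and forward_vc are made up for the example.

```python
# Hypothetical virtual-circuit table of one intermediate node.
# Key: (entry_port, entry_vc_id); value: (exit_port, exit_vc_id).
VC_TABLE = {
    (1, 17): (3, 42),
    (2, 5): (3, 8),
}

def forward_vc(entry_port, entry_vc_id, payload):
    """Return (exit_port, exit_vc_id, payload) for a packet of an established circuit."""
    exit_port, exit_vc_id = VC_TABLE[(entry_port, entry_vc_id)]
    # The packet leaves through exit_port, relabeled with the exit circuit id.
    return exit_port, exit_vc_id, payload

print(forward_vc(1, 17, b"data"))
```

Note that the node never looks at a final destination address: the pair (port, circuit id) is all it needs.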

We will only consider packet routing.

Routing tables contain information that will indicate for each packet on the basis of its final destination (usually an IP address) where to go next (next-hop forwarding - the address of the next router). If there is no explicit indication of how to get to some destination, a default next-hop will be used. Cycles can exist in the graph that has routers as nodes and links as edges. Routing tables function also in the presence of cycles since packets have a Time-To-Live (TTL) field that is used to limit the number of hops they can go through. It is important that routing tables not be too large.

Evaluation of Routing Algorithms:

Route quality (optimality): network utilization, path length, delay, bandwidth, communication cost, reliability
Overhead (simplicity): control messages, processing, state (i.e. memory required)
Speed of convergence to best routes
Robustness: Responsiveness to topology changes

Routing characteristics:

Centralized/decentralized
Static/Dynamic
Location of decisions (hop-by-hop[decision at each node]/Source-routing[decision at source])
Frequency of decision (per packet, per session, per topology change)
Single Path/Multipath: The routing algorithm may provide alternative routes to be taken to avoid congestion, improve throughput, etc.
Flat or Hierarchical: i.e. all routers are at the same level, or routing takes place at two levels, one to get to the general area, the other to navigate the local neighborhood.
Protocol: Information distribution and route computation algorithm

IP Addresses

Names such as temple.edu are called domain (or network) names and names such as joda.cis.temple.edu are called host names. Domain names and host names are mapped to IP addresses using the Domain Name System (DNS). IP Version 4 addresses, also called IPv4 addresses, or just IP addresses, are 32 bit integers. [IPv6, which is the new version of IP, and which we do not study, uses an IP address with 128 bits.] They are normally written as 4 small integers representing the bytes of the number separated by periods (dotted decimal notation). For example 155.247.182.1 is an IP address. Each IP address consists of two portions, a network identifier and a host identifier. IP addresses are now allocated by IANA and soon will be by ICANN.
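
To make the dotted decimal notation concrete, here is a small Python sketch that converts between the dotted form and the 32 bit integer. The function names dotted_to_int and int_to_dotted are chosen for this example only.

```python
def dotted_to_int(address):
    """Convert dotted decimal notation to the 32-bit integer it represents."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not an IPv4 address: {address!r}")
    value = 0
    for p in parts:
        value = (value << 8) | p  # each small integer is one byte
    return value

def int_to_dotted(value):
    """Convert a 32-bit integer back to dotted decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

n = dotted_to_int("155.247.182.1")
print(hex(n), int_to_dotted(n))
```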
There are 5 classes of IP addresses:

Class A: The network identifier is 1 byte and the host identifier is 3 bytes. The network identifier will start with a 0 bit. For example 126.46.31.87 is a class A address. The network identifier is 126, often written 126.0.0.0/8 to stress that it is 8 bits.
Class B: The network identifier is 2 bytes and the host identifier is 2 bytes. The network identifier will start with the bits 10. For example 155.247.170.2 is a class B address. The network identifier is 155.247.0.0/16.
Class C: The network identifier is 3 bytes and the host identifier is 1 byte. The network identifier will start with the bits 110. For example 200.77.88.91 is a class C address. The network identifier is 200.77.88.0/24.
Class D: It starts with the bits 1110 and it is used as a multicast or an anycast address. For example 225.65.90.3 is a class D address. [unicast = sending from a source to a specific destination; broadcast = sending from a source to every destination within a network; multicast = sending from a source to a set of destinations. anycast = sending to any of a set of destinations (think of yourself at the supermarket: you don't care who is the teller that takes care of your business) ]
Class E: It starts with bits 1111 and it is currently not in use.
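
Since the class is determined by the leading bits of the first byte, it can be computed with a few bit masks. The following Python sketch (the name address_class is ours) checks the examples given above.

```python
def address_class(address):
    """Return the class (A to E) of an IPv4 address in dotted decimal notation."""
    first_byte = int(address.split(".")[0])
    if (first_byte & 0b10000000) == 0b00000000:   # leading bit 0
        return "A"
    if (first_byte & 0b11000000) == 0b10000000:   # leading bits 10
        return "B"
    if (first_byte & 0b11100000) == 0b11000000:   # leading bits 110
        return "C"
    if (first_byte & 0b11110000) == 0b11100000:   # leading bits 1110
        return "D"
    return "E"                                    # leading bits 1111

for a in ("126.46.31.87", "155.247.170.2", "200.77.88.91", "225.65.90.3"):
    print(a, address_class(a))
```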

Hosts with the same network id are usually close to each other. But networks with similar identifiers could be anywhere (for example 155.247 is in the USA and 155.248 could be in Asia). A number of IP addresses have a standard meaning:

+------------+------------+----------+-------------------------------+
| Network    | Host       | Type of  | Purpose                       |
| Identifier | Identifier | Address  |                               |
+------------+------------+----------+-------------------------------+
| all 0s     |  all 0s    | this     | Used during bootstrap to      |
|            |            | computer | ask for one's own IP address  |
+------------+------------+----------+-------------------------------+
| Network    |  all 0s    | specified| The specified network,        |
| Identifier |            | network  | independent of its hosts      |
+------------+------------+----------+-------------------------------+
| Network    |  all 1s    | directed | Broadcast address for the     |
| Identifier |            | broadcast| specified network.            |
+------------+------------+----------+-------------------------------+
| all 1s     |  all 1s    | local    | Broadcast to local network    |
|            |            | network  | only (limited broadcast)      |
+------------+------------+----------+-------------------------------+
| 127        | anything   | loopback | Testing of TCP/IP while not   |
|            |            |          | using the network(loopback)   |
+------------+------------+----------+-------------------------------+

IP addresses are associated to host interfaces, not directly to hosts. In other words, each network interface of a computer system has its own IP address: the map from hosts to IP addresses is one-to-many. In turn a particular host may have more than one host name, though one of the host names is called the canonical name of the host, thus also the map from IP addresses to host names is one-to-many. Mappings between IP addresses and host (and domain) names are managed by DNS. On Unix you can find out about these mappings using the command:

    %  nslookup ip_address-or-host_name

The association can be static, i.e. the interface always receives the same IP address at start up, or dynamic, i.e. the interface requests and receives its IP address from a central allocator at start up, for instance using DHCP.

Forwarding with a Simple Routing Table

Here is a portion of a (real) routing table:

	Destination      Gateway            Interface
	================================================
	155.247.71/24    155.247.71.60      ln0
	127.0.0.1        127.0.0.1          lo0
	default          155.247.71.1       ln0
	================================================

155.247.71/24 is the name of the local network; packets to it should be sent out through the interface ln0 to the IP address 155.247.71.60. Notice the notation "../24". It means that we are interested only in the first 24 bits of this address. Consequently, if we are trying to reach 155.247.71.83, that destination will match the entry 155.247.71/24. 127.0.0.1 is the loopback address; we can use it to test our networking software even without a network, and packets to it are sent through the interface lo0. For any other destination, the packet will be sent to IP address 155.247.71.1 through the interface ln0. The routing table of a Unix machine can be obtained with the command

    % netstat -rn

By the way, if you want to know what interfaces your computer has, and their characteristics, you can use

    % ifconfig -a

For example on my machine I find 3 interfaces, ln0, sl0, and lo0. I can then find more about each interface with, for example,

    % ifconfig -I ln0

In general, if T is a routing table with entries with fields [destination, gateway, interface], and D is the destination, then we execute the program:

	for each row R of the routing table T
	    if (D == R.destination) // equality is for the bits significant
	                            // in R.destination
	    {
	        send packet to R.gateway through interface R.interface;
	        return;
	    }
	send packet to the default gateway through its interface;
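
The same loop can be written as a runnable Python sketch using the standard ipaddress module and the three rows of the table shown earlier. The names ROUTES, DEFAULT and forward are ours, and the loopback row is written as a /32 entry, which is an assumption about how that row should be matched.

```python
import ipaddress

# Rows of the routing table shown earlier: (destination, gateway, interface).
ROUTES = [
    (ipaddress.ip_network("155.247.71.0/24"), "155.247.71.60", "ln0"),
    (ipaddress.ip_network("127.0.0.1/32"), "127.0.0.1", "lo0"),
]
DEFAULT = ("155.247.71.1", "ln0")

def forward(destination):
    """Return (gateway, interface) for the given destination address."""
    d = ipaddress.ip_address(destination)
    for network, gateway, interface in ROUTES:
        # Membership compares only the bits significant in the row's destination.
        if d in network:
            return gateway, interface
    return DEFAULT

print(forward("155.247.71.83"))
print(forward("200.77.88.91"))
```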

Routing and routing tables are more concerned with reaching networks than with reaching hosts. So in the routing table the destination will denote a network, not a host. Once one reaches the correct network, the local system will worry about local delivery [think of delivery to a host on a LAN: the last step involves translation from IP address to physical address and transmission on the shared medium].

Routing algorithms, i.e. algorithms used to exchange the information needed for computing routing tables, are implemented using routing protocols. Examples of such protocols are RIP (Routing Information Protocol), OSPF (Open Shortest Path First), and BGP (Border Gateway Protocol). IRDP (ICMP Router Discovery Protocol) is used to identify routers and to report their identity. The packets exchanged in the routing protocols are called routing packets and they contain control information, i.e. they are overhead.
In a different set of notes we will study the routing algorithms used in conjunction with the OSPF and RIP routing protocols.

Subnetting

The granularity of IP address classes often leads to poor utilization of the address space and to limited ability to address subgroups within a network. The solution is to use Subnetting. Assume that we have a class B network like 155.247. We can partition the 16 bits of host space into 10 bits for the subnet id and 6 bits for the host id. Thus we have 1024 subnets, each with up to 62 hosts (64 - 1 network - 1 broadcast). Subnetting is based on the use of masks. In our example, the subnet mask is 255.255.255.192. The bitwise AND of an IP address with the subnet mask gives the subnet identity. In our example, if we have the IP address 155.247.182.98, then the subnetwork id, also called extended-network-prefix, is 155.247.182.64, and the subnet is known as 155.247.182.64/26 to stress that it uses 26 bits, leaving the remaining 6 bits for the host-id (which is 34). Notice that from far away packets will go to the network 155.247/16; once there, packets will go to the specific subnet, and from there to the intended host. [The address 155.247.71/24 we encountered earlier means that the class B network 155.247 is split into 256 subnets, each with 254 IP host addresses. In other words, it is as if the class B network were split into class C networks.]
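
The mask arithmetic of this example can be checked with a few lines of Python. This is just a sketch using the standard ipaddress module to convert between dotted notation and integers; the variable names are ours.

```python
import ipaddress

address = int(ipaddress.ip_address("155.247.182.98"))
mask = int(ipaddress.ip_address("255.255.255.192"))

subnet_id = address & mask                # keep the network and subnet bits
host_id = address & ~mask & 0xFFFFFFFF    # keep the remaining host bits
prefix_length = bin(mask).count("1")      # number of 1 bits in the mask

print(ipaddress.ip_address(subnet_id), prefix_length, host_id)
```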

To account for subnetting, the rows of a routing table T take the form:

[subnet-id, subnet-mask, next-hop] where all the subnets of a network share the same mask, i.e. their subnet-ids have the same number of bits. The next-hop consists of the port (interface) of the router through which the current packet should be forwarded, plus the IP address of the gateway, which is the next node on the path to the destination.

Then when an IP address A has to be routed the algorithm used is:

   For each row i of routing table T
       Let D = T[i].subnet-mask BitwiseAnd A;
       If (D == T[i].subnet-id) then
       {
          Forward packet to T[i].next-hop;
          return;
       }
   Forward packet to default;
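
Here is a Python sketch of the same algorithm. The two table rows, the interface names, and the gateway addresses are invented for the example; only the matching rule (mask the address, compare with the subnet-id) comes from the algorithm above.

```python
import ipaddress

def to_int(dotted):
    """Dotted decimal notation to 32-bit integer."""
    return int(ipaddress.ip_address(dotted))

# Hypothetical rows: (subnet-id, subnet-mask, next-hop as (interface, gateway)).
TABLE = [
    ("155.247.182.64", "255.255.255.192", ("ln0", "155.247.182.65")),
    ("155.247.71.0", "255.255.255.0", ("ln1", "155.247.71.60")),
]
DEFAULT_NEXT_HOP = ("ln1", "155.247.71.1")

def next_hop(a):
    """Return the next-hop for IP address a, scanning the rows in order."""
    a_int = to_int(a)
    for subnet_id, subnet_mask, hop in TABLE:
        if (a_int & to_int(subnet_mask)) == to_int(subnet_id):
            return hop
    return DEFAULT_NEXT_HOP

print(next_hop("155.247.182.98"))
print(next_hop("155.247.182.130"))
```

The second address lies in the subnet 155.247.182.128/26, for which the table has no row, so it is forwarded to the default.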

Normally, routing moves packets across the internet until the packet arrives at the destination network. Then the packet is directed to a specific host. With subnetting it becomes possible to route packets across the internet so that they arrive at a specific subnet, and then move within the subnet to a specific host.

Classless Inter Domain Routing (CIDR)

The ideas of masks and subnetting have been generalized to allow more complex partitions of networks than the one we have just discussed. In particular, variable length subnet masks have been used. This is done with Classless Inter Domain Routing (CIDR). Now the masks used in routing can be of any size, and in matching IP addresses one aims for the longest match. For example, suppose that in a routing table we have a row for the network 1101011110110 and a row for the network 11010111101. Then, if we are looking for the destination 11010111101101111110010111010010, we will use the first row, since both rows match the given destination and the first is more specific than the second. CIDR helps reduce two kinds of problems: the fact that IP addresses are not efficiently allocated using the class oriented scheme, and the fact that routing tables may grow to be very large. For example, if an ISP controls four Class C addresses:

    200.77.0/24
    200.77.1/24
    200.77.2/24
    200.77.3/24

then these four addresses can be aggregated into a single address

    200.77.0/22

thus requiring a single entry in routing tables instead of four [this is a form of supernetting, the inverse of subnetting].
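
The aggregation can be verified with the standard Python ipaddress module, whose collapse_addresses function merges adjacent networks. This is a small sketch; the variable names are ours.

```python
import ipaddress

# The four class C networks 200.77.0/24 ... 200.77.3/24.
blocks = [ipaddress.ip_network(f"200.77.{i}.0/24") for i in range(4)]

# collapse_addresses merges adjacent networks into the fewest possible prefixes.
aggregated = list(ipaddress.collapse_addresses(blocks))
print(aggregated)
```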
But what if this ISP has only the networks 200.77.0/24, 200.77.1/24, 200.77.3/24, and another ISP has 200.77.2/24? We can still use aggregation: the first ISP advertises the address 200.77.0/22 and the second ISP advertises 200.77.2/24. In routing tables the entry for 200.77.2/24 is tested before the entry for 200.77.0/22, since it is more specific. Thus packets for the first ISP will use the second entry (their destinations do not match 200.77.2/24 but do match 200.77.0/22), while packets for the second ISP will use the first entry (their destinations match 200.77.2/24).
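
A longest-match lookup for this two-ISP situation can be sketched in Python as follows. The labels "first ISP" and "second ISP" stand in for real next-hops, and the names ROUTES and longest_match are ours.

```python
import ipaddress

# Two overlapping entries: the aggregate and the more specific exception.
ROUTES = {
    ipaddress.ip_network("200.77.0.0/22"): "first ISP",
    ipaddress.ip_network("200.77.2.0/24"): "second ISP",
}

def longest_match(destination):
    """Return the route of the most specific entry matching the destination."""
    d = ipaddress.ip_address(destination)
    candidates = [net for net in ROUTES if d in net]
    if not candidates:
        return None  # a real table would fall back to a default route
    best = max(candidates, key=lambda net: net.prefixlen)
    return ROUTES[best]

print(longest_match("200.77.1.5"))
print(longest_match("200.77.2.9"))
```

Choosing the entry with the largest prefix length has the same effect as testing the more specific entries first.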
Here is another example. ISP X owns 128 class C networks, say 196.74.128.0/17 (128 x 256 = 2^15 addresses, hence a 17 bit prefix), and sells them to 128 clients. Then in the middle of the internet the ISP can advertise only one address, 196.74.128.0/17, and when packets reach the ISP, the ISP can take over the routing to the correct client class C network.

Private Addresses And Network Address Translation (NAT)

An enterprise may have networks that are mainly intended for internal use, i.e. for communicating within the enterprise, not with the outside. In this case the nodes of the enterprise may use any of the addresses in the following three blocks:

	10.0.0.0    to 10.255.255.255
	172.16.0.0  to 172.31.255.255
	192.168.0.0 to 192.168.255.255
	[also 169.254.0.0-169.254.255.255 are reserved for automatic private IP addressing, 
	but we will not talk about this use]

that are guaranteed never to be used anywhere in the (public) internet. As long as the nodes communicate with each other there is no problem since their IP addresses are "unique" within the network. The problem occurs when this private network is connected to the internet. At issue is what to do when communicating to/from an external node [it is assumed that in this situation the communication will be initiated by the local node]. In this case one can use a Network Address Translation (NAT) device (a router, or firewall, or ad-hoc device) to translate between the local addresses and public addresses that belong to the enterprise. This association, local/public, can be static (and usually for only a part of the local network), or dynamic, taking advantage of the fact that at one time only a few nodes will communicate with the environment.
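
A NAT device has to recognize which addresses are private. The following Python sketch tests membership in the three blocks listed above, written in prefix form (10/8, 172.16/12, 192.168/16); the names PRIVATE_BLOCKS and is_private are ours.

```python
import ipaddress

# The three private blocks listed above, written as prefixes.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.0.0.0    to 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.0.0  to 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.0.0 to 192.168.255.255
]

def is_private(address):
    """True if the address belongs to one of the three private blocks."""
    a = ipaddress.ip_address(address)
    return any(a in block for block in PRIVATE_BLOCKS)

print(is_private("10.0.0.5"), is_private("155.247.152.12"))
```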

Another solution is the use of a more sophisticated NAT that uses Port Address Translation (PAT). It goes as follows: The NAT/router has a valid IP address (say 197.48.73.25) for use on the internet, and in the local network it has a local address. When a local node (say 10.0.0.5) wants to talk to an external IP address (say 155.247.152.12) waiting on a port (say 9876), it sends the message to 155.247.152.12,9876 using a local port (say 8888). The message arrives at the NAT/router [it is acting as the single input/output to the Internet]. The NAT/router replaces the source address and port 10.0.0.5,8888 with 197.48.73.25,12345, where 12345 is a free port of the NAT/router. The NAT/router keeps in a table the map 197.48.73.25,12345 <--> 10.0.0.5,8888 for the connection to 155.247.152.12,9876. When it receives a reply from 155.247.152.12,9876, it replaces the destination 197.48.73.25,12345 with 10.0.0.5,8888 and so redirects the reply to the local node. Checksums need to be recomputed at the NAT/router. People have been able to create very successful NAT products. [The way checksums are computed simplifies things: if the only change is the address, the new checksum is obtained from the old one by adding the old address and subtracting the new address (remember that the checksum is kept in one's complement form).] Note that while routing takes place at the network layer, NAT that involves ports is done at the transport layer. In fact people talk of "TCP splicing" when connecting to the NAT box and from there to the true external destination.
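
The bookkeeping done by such a NAT/router can be sketched in Python as follows. This is a simplified illustration, not a description of any real product: the class PatRouter and its methods are invented here, the table is keyed on the public port (one possible organization), free ports are simply handed out by a counter, and checksums are ignored.

```python
import itertools

class PatRouter:
    """Toy Port Address Translation table; endpoints are (ip, port) pairs."""

    def __init__(self, public_ip, first_port=12345):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # assumed supply of free ports
        self.out_map = {}   # (local_ip, local_port) -> public_port
        self.in_map = {}    # public_port -> (local_ip, local_port)

    def outbound(self, src, dst):
        """Rewrite the source of a packet going from the local network outside."""
        if src not in self.out_map:
            port = next(self._ports)
            self.out_map[src] = port
            self.in_map[port] = src
        return (self.public_ip, self.out_map[src]), dst

    def inbound(self, src, dst):
        """Rewrite the destination of a reply; None means no mapping (drop)."""
        ip, port = dst
        if ip != self.public_ip or port not in self.in_map:
            return None
        return src, self.in_map[port]

nat = PatRouter("197.48.73.25")
print(nat.outbound(("10.0.0.5", 8888), ("155.247.152.12", 9876)))
print(nat.inbound(("155.247.152.12", 9876), ("197.48.73.25", 12345)))
```

A packet arriving from outside for a port with no entry is dropped, which is why communication is assumed to be initiated by the local node.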

LAN Addresses

Autonomous System Numbers (16 bits), IP Addresses (32 or 128 bits), Domain and Host Names, are all "logical" identifiers: they are not physically tied to a specific hardware device. Only when we get close to the physical level, at the Data Link layer, we encounter physical addresses, called LAN Addresses or MAC Addresses (Media Access Control). These physical addresses are used when we finally want to communicate on a LAN. They are usually physically tied to the device (the Network Interface).

IP addresses have to be converted to LAN addresses before we can actually access the devices. ARP (Address Resolution Protocol) is the protocol used to convert from IP to LAN addresses. The conversion from LAN addresses to IP addresses can be done with the RARP (Reverse ARP) protocol. LAN addresses are usually 48 bit numbers. At one instant the map between IP addresses and MAC addresses is one-to-one. The stress here is on "at one instant": though usually IP addresses are permanently bound to MAC addresses, it is now possible for a network to dynamically associate IP addresses to interfaces using the Dynamic Host Configuration Protocol (DHCP). For instance an ISP may allocate IP addresses dynamically to its clients as they get on line.
You can see the information currently available to arp with the command

   % arp -a

Addressing and routing becomes more complex when we consider mobile computing, i.e. the situation where portable computers move around the world.

Since we are on the issue of names in the Internet, let's remember other names you have encountered in your computing practice:

Universal Resource Identifiers: Universal Resource Identifiers (URI) form a system of universal names for Internet objects. They take the form scheme : path. When the scheme is an existing Internet protocol, the URI is said to be a URL.
Uniform Resource Locators: Uniform Resource Locators (URL) are URIs where the scheme corresponds to an existing well-known Internet protocol such as HTTP, FTP, mailto, file, etc. In URLs the scheme names are case-insensitive. Only printable ASCII characters can appear within a URL. In a URL the following characters are unsafe: " ", "<", ">", "#", """, "%", "{", "}", "|", "\", "^", "~", "[", "]", "/", ";", ",", "?", ".", "@", "=", "&" since they may have a special meaning. As such, they can be used only where allowed, with the specified meaning. In all other circumstances these characters should be encoded using the form "%xy" where x and y are hexadecimal digits.
E-Mail Addresses: E-mail addresses are well known to us all, as a way to identify interlocutors on the internet. As you can see from the RFC specification, e-mail addresses can be more complex than we usually expect.
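
As an illustration of the "%xy" encoding mentioned for URLs, here is a Python sketch using the standard urllib.parse module. The file name is made up. Note that quote follows the current URI rules, so the set of characters it escapes is not identical to the list given above (for instance it leaves "." and "~" alone).

```python
from urllib.parse import quote, unquote

raw = "annual report <draft>.txt"   # hypothetical name with unsafe characters
encoded = quote(raw, safe="")       # safe="" also escapes "/"
print(encoded)
print(unquote(encoded) == raw)
```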

(Protocol/IP) Tunneling

Tunneling is the process of placing an entire packet within another packet and sending it over a network. The protocol of the outer packet is understood by the network and the end points, called tunnel interfaces (where the packet enters and exits the network).
Tunneling requires three different protocols:

Carrier protocol - The protocol (IP, PPP, HTTP) used by the network that the information is traveling over.
Encapsulating protocol - The protocol (GRE, IPSec, L2F, PPTP, L2TP) that is wrapped around the original data
Passenger protocol - The original data (IPX, NetBEUI, IP, TCP) being carried

For example, we may have IPv4 supported in an internet, but not IPv6. Yet we want to move IPv6 traffic across this internet. IPv4 is the carrier, IPv6 is the passenger, and perhaps L2TP is the encapsulating protocol.
In order for tunneling to work we need special device drivers at the sender and receiver to appropriately encapsulate/decapsulate at the endpoints and to process the content correctly.
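
The idea of a packet carried inside another packet can be sketched in Python as follows. Everything here is illustrative: the Packet class, the functions encapsulate and decapsulate, and the addresses (taken from ranges reserved for documentation) are our assumptions, and the header of the encapsulating protocol is left out to keep the sketch short.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    protocol: str
    source: str
    destination: str
    payload: object   # bytes, or another Packet when tunneling

def encapsulate(passenger, tunnel_entry, tunnel_exit, carrier="IPv4"):
    """Wrap the whole passenger packet as the payload of a carrier packet."""
    return Packet(carrier, tunnel_entry, tunnel_exit, passenger)

def decapsulate(outer):
    """At the exit tunnel interface, recover the original passenger packet."""
    if not isinstance(outer.payload, Packet):
        raise ValueError("not a tunneled packet")
    return outer.payload

inner = Packet("IPv6", "2001:db8::1", "2001:db8::2", b"hello")
outer = encapsulate(inner, "192.0.2.1", "198.51.100.1")
print(outer.protocol, outer.destination, decapsulate(outer) == inner)
```

The network between the tunnel interfaces only looks at the outer packet; the passenger travels unchanged.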
An interesting form of tunneling is HTTP Tunneling. It is very convenient for getting around the obstacle of firewalls (they may allow access to only a few ports, usually 80 - HTTP, and 443 - HTTPS). One uses the HTTP protocol as the carrier protocol for non-web applications. If a client wants to access a server (an application) that is behind a firewall, it encloses data in HTTP PUT requests and sends them to a stub/plugin (for instance a CGI script) of an HTTP server placed in front of the application. The stub/plugin unwraps the data and passes it to the application. Similarly on the way back from application to client. If instead a client behind a firewall wants to access an application outside, it will need a stub to wrap its requests, send them to a proxy that will forward them to the server. Similarly for the responses.

Firewall

A firewall is a device (it may be part of a router) used to separate an interior internet from the external Internet. The firewall has a series of roles. It can be used to filter the packets that go from the inside to the outside (for example, to prevent employees from visiting sport news sites during office hours) or from the outside to the inside (for example, to prevent outsiders from probing the individual nodes of the interior internet). It can also be used to log information about the traffic, to identify attacks from the outside, and to act as a proxy for the internal nodes so as to hide their identity. Firewalls operate above the network layer; for instance they may recognize and prevent certain TCP connections or may deal directly with application level protocols.
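
The filtering role can be sketched in Python as an ordered list of rules where the first match decides. The rule set below (allow incoming traffic only to ports 80 and 443, allow everything going out) is a made-up policy, and the names Rule, RULES and decide are ours; real firewalls match on many more fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    action: str                      # "allow" or "deny"
    direction: str                   # "in" (outside to inside) or "out"
    dst_port: Optional[int] = None   # None matches any destination port

# Hypothetical policy; rules are tried in order and the first match decides.
RULES = [
    Rule("allow", "in", 80),
    Rule("allow", "in", 443),
    Rule("deny", "in"),
    Rule("allow", "out"),
]

def decide(direction, dst_port):
    """Return "allow" or "deny" for a packet with the given direction and port."""
    for rule in RULES:
        if rule.direction == direction and rule.dst_port in (None, dst_port):
            return rule.action
    return "deny"   # nothing matched: drop by default

print(decide("in", 80), decide("in", 22), decide("out", 9876))
```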

Virtual Private Network (VPN)

A virtual private network is a private network that, instead of consisting of private lines and routers, consists of tunnels implemented using IPSec or an equivalent secure protocol within a public network infrastructure. Privacy is preserved through encryption (a public key is used to exchange a secret session key, then the session key is used for encryption and decryption); authentication and integrity are obtained by attaching a digital signature to each packet. A simple use is for employees to securely access their employer's systems. When a variety of tunnels join a number of sites of an enterprise, the various tunnels form a kind of overlay network, that is, a network defined in terms of an existing network, using some of its resources.

Overlay Network

An overlay network is a virtual network defined above the transport layer of an underlying network, with links whose end points are edge nodes of the underlying network. The characteristics of such links, such as latency and data rate, are derived from the characteristics of the underlying network. Users of the overlay network need not be concerned about possible dynamic changes in the structure or characteristics of the underlying network. Overlay networks are used particularly in peer-to-peer (p2p) systems and in wireless systems.