Naming in the Internet: Autonomous Systems, Routers, IP Addresses

CIS 307: Structure and Naming in the Internet

Routers, IP Addresses, Autonomous Systems, LAN Addresses

Routers

A router is a box (often a regular computer) with (at least) two ports (i.e. interfaces), used to connect possibly dissimilar networks and help packets go from a source to a destination. It differs from bridges since it operates at the network level. [It will also use different addresses. For example a bridge may use Ethernet addresses while a router uses IP addresses.] It does all the transformations that may be required by the transfer of packets across the networks it connects.
A router is concerned with where to send packets next as they move from a source to a destination through a set of interconnected networks.
It is convenient to distinguish two activities:

Forwarding (or Switching): The process taking place at a router when it receives a packet and has to decide where to send it next on the basis of its destination and of information available at the router.
Routing: The process through which routers receive and process the information that they will need in the forwarding process. Usually this information is gathered and transmitted using protocols called routing protocols, and processed using algorithms. The gathered information usually takes the form of routing (or forwarding) tables.

Unfortunately common terminology does not always preserve this distinction, and too often one sees "routing" used in place of "forwarding".

Routing (we really mean forwarding) could be of three kinds (at least!):

Source Routing, where the decision on what intermediate nodes to cross is made before a packet is sent; then the packet knows from the beginning where to go next after arriving at an intermediate node.
Virtual Circuit Routing, where a connection is established before the first packet is sent. Each packet then carries as its destination the id of a virtual circuit (this id may change as the packet moves across the network), and each intermediate node holds a table whose rows have four fields: an entry-port, an entry-virtual-circuit-id, an exit-port, and an exit-virtual-circuit-id. Routing at a node determines the port the packet arrived from and its virtual circuit, then sends the packet forward through the exit port indicated by the table, with the exit virtual circuit id as its new destination.
[Packet] Routing, where each packet is individually routed according to a next-hop routing table. In these tables there is usually a single next-hop for a given destination. Even when there is more than one next-hop, the decision on where to go next is made not on the basis of the source of the packet, but on the basis of some form of cost.
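
The virtual circuit table described above can be sketched as a small lookup structure. This is only an illustration: the port numbers and circuit ids are made up, and the names vc_table and forward_vc are hypothetical.

```python
# Hypothetical virtual-circuit table for one intermediate node.
# Key: (entry port, entry virtual circuit id)
# Value: (exit port, exit virtual circuit id)
vc_table = {
    (0, 17): (2, 5),
    (1, 17): (2, 9),
    (2, 5): (0, 17),
}


def forward_vc(entry_port, entry_vc_id):
    """Return (exit port, exit VC id), or None if no circuit was set up."""
    return vc_table.get((entry_port, entry_vc_id))


print(forward_vc(0, 17))  # (2, 5)
```

Note that the same circuit id (17) can appear on two different entry ports without ambiguity, because the lookup key includes the port.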

We will only consider packet routing.

Routing tables contain information that will indicate for each packet on the basis of its final destination (usually an IP address) where to go next (next-hop forwarding - the address of the next router). If there is no explicit indication of how to get to some destination, a default next-hop will be used. Cycles can exist in the graph that has routers as nodes and links as edges. Routing tables function also in the presence of cycles since packets have a Time-To-Live (TTL) field that is used to limit the number of hops they can go through. It is important that routing tables not be too large.
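
To see why the TTL field makes cycles harmless, here is a minimal sketch. It assumes a made-up topology of three routers whose next-hop entries form a cycle, and it simplifies TTL handling to "decrement once per hop, drop when it reaches zero". The names next_hop and forward are hypothetical.

```python
# Hypothetical next-hop entries that form a cycle: A -> B -> C -> A.
next_hop = {"A": "B", "B": "C", "C": "A"}


def forward(start, destination, ttl):
    """Follow next hops until delivery or until the TTL runs out.

    Returns (list of routers visited, delivered?).
    """
    node, visited = start, [start]
    while node != destination:
        if ttl == 0:
            return visited, False  # TTL exhausted: the packet is dropped
        ttl -= 1
        node = next_hop[node]
        visited.append(node)
    return visited, True


print(forward("A", "C", 4))  # (['A', 'B', 'C'], True)
print(forward("A", "Z", 4))  # (['A', 'B', 'C', 'A', 'B'], False)
```

The second call asks for a destination that is never reached; the packet circulates in the cycle but is discarded after four hops instead of looping forever.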

Evaluation of Routing Algorithms:

Route quality (optimality): network utilization, path length, delay, bandwidth, communication cost, reliability
Overhead (simplicity): control messages, processing, state (i.e. memory required)
Speed of convergence to best routes
Robustness: Responsiveness to topology changes

Routing characteristics:

Centralized/decentralized
Static/Dynamic
Location of decisions (hop-by-hop[decision at each node]/Source-routing[decision at source])
Frequency of decision (per packet, per session, per topology change)
Single Path/Multipath: The routing algorithm may provide alternative routes to be taken to avoid congestion, or improve throughput, ..
Flat or Hierarchical: i.e. all routers are at the same level, or routing takes place at two levels, one to get to the general area, the other to navigate the local neighborhood.
Protocol: Information distribution and route computation algorithm

IP Addresses

Names such as temple.edu are called domain (or network) names and names such as joda.cis.temple.edu are called host names. Domain names and host names are mapped to IP addresses using the Domain Name System (DNS). IP Version 4 addresses, also called IPv4 addresses, or just IP addresses, are 32 bit integers. [IPv6, which is the new version of IP, and which we do not study, uses an IP address with 128 bits.] They are normally written as 4 small integers representing the bytes of the number separated by periods (dotted decimal notation). For example 155.247.182.1 is an IP address. Each IP address consists of two portions, a network identifier and a host identifier. IP addresses are now allocated by IANA and soon will be by ICANN.
There are 5 classes of IP addresses:

Class A: The network identifier is 1 byte and the host identifier is 3 bytes. The network identifier will start with a 0 bit. For example 126.46.31.87 is a class A address. The network identifier is 126, often written 126/8 to stress that it is 8 bits.
Class B: The network identifier is 2 bytes and the host identifier is 2 bytes. The network identifier will start with the bits 10. For example 155.247.170.2 is a class B address. The network identifier is 155.247/16.
Class C: The network identifier is 3 bytes and the host identifier is 1 byte. The network identifier will start with the bits 110. For example 200.77.88.91 is a class C address. The network identifier is 200.77.88/24.
Class D: It starts with the bits 1110 and it is used as a multicast address. For example 225.65.90.3 is a class D address.
Class E: It starts with bits 1111 and it is currently not in use.
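
The dotted decimal notation and the class rules above can be checked with a short sketch. The function names to_int and address_class are hypothetical; the classification simply tests the leading bits of the first byte as listed above.

```python
def to_int(dotted):
    """Convert dotted decimal notation to a 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def address_class(dotted):
    """Classify an IPv4 address by the leading bits of its first byte."""
    first = to_int(dotted) >> 24
    if (first & 0b10000000) == 0b00000000:
        return "A"  # starts with 0
    if (first & 0b11000000) == 0b10000000:
        return "B"  # starts with 10
    if (first & 0b11100000) == 0b11000000:
        return "C"  # starts with 110
    if (first & 0b11110000) == 0b11100000:
        return "D"  # starts with 1110
    return "E"      # starts with 1111


for example in ["126.46.31.87", "155.247.170.2", "200.77.88.91", "225.65.90.3"]:
    print(example, address_class(example))
```

Run on the four example addresses from the list, this prints classes A, B, C and D respectively.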

A number of IP addresses have a standard meaning:

+------------+------------+----------+-------------------------------+
| Network    | Host       | Type of  | Purpose                       |
| Identifier | Identifier | Address  |                               |
+------------+------------+----------+-------------------------------+
| all 0s     |  all 0s    | this     | Used during bootstrap to      |
|            |            | computer | ask for one's own IP address  |
+------------+------------+----------+-------------------------------+
| Network    |  all 0s    | specified| The specified network,        |
| Identifier |            | network  | independent of its hosts      |
+------------+------------+----------+-------------------------------+
| Network    |  all 1s    | specified| Broadcast address for the     |
| Identifier |            | network  | specified network.            |
+------------+------------+----------+-------------------------------+
| all 1s     |  all 1s    | local    | Broadcast on local network    |
|            |            | network  |                               |
+------------+------------+----------+-------------------------------+
| 127        | anything   | loopback | Testing of TCP/IP while not   |
|            |            |          | using the network             |
+------------+------------+----------+-------------------------------+

IP addresses are associated with host interfaces, not directly with hosts. In other words, each network interface of a computer system has its own IP address: the map from hosts to IP addresses is one-to-many. In turn, a particular host may have more than one host name, though one of them is called the canonical name of the host; thus the map from IP addresses to host names is also one-to-many. Mappings between IP addresses and host (and domain) names are managed by DNS. On Unix you can find out about these mappings using the command:

    %  nslookup ip_address-or-host_name

Forwarding with a Simple Routing Table

Here is a portion of a (real) routing table:

	Destination      Gateway            Interface
	================================================
	155.247.71/24    155.247.71.60      ln0
	127.0.0.1        127.0.0.1          lo0
	default          155.247.71.1       ln0
	================================================

155.247.71/24 is the name of the local network; packets to it should be sent out through the interface ln0 to the IP address 155.247.71.60. 127.0.0.1 is the loopback address, which we can use to test our networking software even without a network; packets to it are sent through the interface lo0. For any other destination, the packet will be sent to IP address 155.247.71.1 through the interface ln0. The routing table of a Unix machine can be obtained with the command

    % netstat -rn

By the way, if you want to know the interfaces of your computer and their characteristics you can use

    % ifconfig -a

In general, if T is a routing table with entries with fields [destination, gateway, interface], and D is the destination, then we execute the program:

	for each row R of the routing table T
	    if (D matches R.destination)
		send packet to R.gateway through interface R.interface;
		return;
	send packet to the default gateway through its interface.

Routing and routing tables are more concerned with reaching networks than with reaching hosts. So in the routing table the destination will denote a network, not a host. Once one reaches the correct network, the local system will worry about local delivery [think of delivery to a host on a LAN, the last step involves translation from IP address to physical address and transmission on the shared medium].
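
The table lookup above can be sketched in Python with the standard ipaddress module. The rows are the ones from the sample routing table; writing the loopback entry as 127.0.0.1/32 and keeping the default row in a separate constant are choices made for this sketch, and the name next_hop is hypothetical.

```python
import ipaddress

# Rows of the sample table: (destination network, gateway, interface).
ROUTES = [
    ("155.247.71.0/24", "155.247.71.60", "ln0"),
    ("127.0.0.1/32", "127.0.0.1", "lo0"),
]
DEFAULT = ("155.247.71.1", "ln0")  # the "default" row


def next_hop(destination):
    """Return (gateway, interface) for a destination IP address."""
    addr = ipaddress.ip_address(destination)
    for network, gateway, interface in ROUTES:
        # The destination column names a network, so test membership.
        if addr in ipaddress.ip_network(network):
            return gateway, interface
    return DEFAULT


print(next_hop("155.247.71.9"))  # ('155.247.71.60', 'ln0')
print(next_hop("8.8.8.8"))       # ('155.247.71.1', 'ln0')
```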


Routing algorithms, i.e. algorithms used to exchange the information needed for computing routing tables, are implemented using routing protocols. Examples of such protocols are RIP (Routing Information Protocol), OSPF (Open Shortest Path First), and BGP (Border Gateway Protocol). IRDP (ICMP Router Discovery Protocol) is used to identify routers and to report their identity. The packets exchanged in the routing protocols are called routing packets; they contain control information, i.e. they are overhead.
In a different set of notes we will study the routing algorithms used in conjunction with the OSPF and RIP routing protocols.

Subnetting

The granularity of IP address classes often leads to poor utilization of the address space and to limited ability to address subgroups within a network. The solution is to use Subnetting. Assume that we have a class B network like 155.247. We can partition the host space into 10 bits for subnet id and 6 bits for host id. Thus we have 1024 subnets each with up to 62 hosts (64 - 1 network - 1 broadcast). Subnetting is based on the use of masks. In our example, the subnet mask is 255.255.255.192. The bitwise AND of an IP address with the subnet mask results in the subnet identity. In our example, if we have the IP address 155.247.182.98, then the last byte is 98 = 01100010 in binary, and ANDing it with 192 = 11000000 gives 01000000 = 64. Hence the subnetwork id, also called extended-network-prefix, is 155.247.182.64 (the remaining 6 bits, 100010 = 34, are the host id), and the subnet is known as 155.247.182.64/26 to stress that it uses 26 bits, leaving the remaining 6 bits for the host-id. Notice that from far away packets will go to the network 155.247/16, and once there packets will go to the specific subnet, and from there to the intended host. [The address 155.247.71/24 we encountered earlier means that the class B network 155.247 is split into 256 subnets each with 254 IP host addresses. In other words, it is as if the class B network was split into class C networks.]
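
The mask arithmetic for the address 155.247.182.98 and the mask 255.255.255.192 can be worked out with a few lines of Python. This is just a check of the bitwise AND using the standard ipaddress module; the variable names are arbitrary.

```python
import ipaddress

address = int(ipaddress.ip_address("155.247.182.98"))
mask = int(ipaddress.ip_address("255.255.255.192"))

subnet_id = ipaddress.ip_address(address & mask)  # keep the bits under the mask
host_id = address & ~mask & 0xFFFFFFFF            # keep the bits outside the mask
prefix_len = bin(mask).count("1")                 # number of 1 bits in the mask

print(subnet_id, prefix_len, host_id)  # 155.247.182.64 26 34
```

The last byte, 98, splits into the subnet part 64 (top two bits) and the host part 34 (low six bits).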

To account for subnetting, the rows of a routing table T take the form:

[subnet-id, subnet-mask, next-hop] where the subnet-mask is uniquely defined for a network (i.e. all the subnets of a network share the same mask, and so have the same number of subnet bits). The next-hop is the port (interface) of the router through which the current packet should be forwarded.

Then when an IP address A has to be routed the algorithm used is:

   For each row i of routing table T
       Let D = T[i].subnet-mask BitwiseAnd A;
       If (D == T[i].subnet-id) then
       {
          Forward packet to T[i].next-hop;
          return;
       }
   Forward packet to default;
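
A runnable version of this loop might look as follows. The two table rows and the port names are invented for the example, and ip and route are hypothetical helper names.

```python
import ipaddress


def ip(dotted):
    """Dotted decimal string to 32-bit integer."""
    return int(ipaddress.ip_address(dotted))


# Hypothetical rows: (subnet-id, subnet-mask, next-hop port).
TABLE = [
    (ip("155.247.182.64"), ip("255.255.255.192"), "port1"),
    (ip("155.247.182.128"), ip("255.255.255.192"), "port2"),
]
DEFAULT_PORT = "port0"


def route(address):
    """Return the port through which a packet for `address` is forwarded."""
    a = ip(address)
    for subnet_id, subnet_mask, port in TABLE:
        if (a & subnet_mask) == subnet_id:
            return port
    return DEFAULT_PORT


print(route("155.247.182.98"))   # port1
print(route("155.247.182.130"))  # port2
print(route("155.247.71.5"))     # port0
```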

Normally, routing moves packets across the internet until the packet arrives at the destination network. Then the packet is directed to a specific host. With subnetting it becomes possible to route packets across the internet to a specific subnet, and then to move within the subnet to a specific host.

Classless Inter Domain Routing (CIDR)

The ideas of masks and subnetting have been generalized to allow more complex partitions of networks than the one we have just discussed. In particular, variable length subnet masks have been used. This is done with Classless Inter Domain Routing (CIDR). Now the masks used in routing can be of any size, and in matching IP addresses one aims for the longest match. For example, suppose that in a routing table we have a row for the network 1101011110110 and a row for the network 11010111101. Then, if we are looking for the destination 11010111101101111110010111010010, we will use the first row, since it matches the given destination and is more specific than the second row. CIDR helps reduce two kinds of problems: the fact that IP addresses are not efficiently allocated using the class oriented schema, and the fact that routing tables may grow to be very large. For example, if an ISP controls four Class C addresses:

    200.77.0/24
    200.77.1/24
    200.77.2/24
    200.77.3/24

then these four addresses can be aggregated into a single address

    200.77.0/22

thus requiring a single entry in routing tables instead of four [this is a form of supernetting, the inverse of subnetting].
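
The aggregation can be checked with Python's standard ipaddress module, assuming the four Class C networks are 200.77.0 through 200.77.3, as the /22 aggregate implies.

```python
import ipaddress

# The four contiguous /24 networks 200.77.0.0 ... 200.77.3.0.
nets = [ipaddress.ip_network(f"200.77.{i}.0/24") for i in range(4)]

# collapse_addresses merges adjacent networks into the fewest prefixes.
aggregated = list(ipaddress.collapse_addresses(nets))

print(aggregated)  # [IPv4Network('200.77.0.0/22')]
```

The four networks differ only in the last two bits of the third byte, which is why 22 bits of common prefix suffice to describe all of them.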
But what if this ISP has only the networks 200.77.0, 200.77.1, 200.77.2 and another ISP has 200.77.3? We can still use aggregation: the first ISP is advertised with the address 200.77.0/22 and the second ISP with 200.77.3/24. In a routing table the entry for 200.77.3/24 is tested before the entry for 200.77.0/22, since it is the longer match. Thus a packet destined for the first ISP uses the 200.77.0/22 entry (its address does not match 200.77.3/24), while a packet destined for the second ISP uses the 200.77.3/24 entry (its address matches both entries, and the more specific one is preferred).
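
A minimal sketch of longest-match lookup for this two-ISP situation follows. The labels "ISP-1" and "ISP-2" and the function name longest_match are hypothetical; real routers use specialized data structures rather than a linear scan.

```python
import ipaddress

# Two entries: the aggregate for the first ISP and the more specific
# prefix for the second ISP.
ROUTES = {
    ipaddress.ip_network("200.77.0.0/22"): "ISP-1",
    ipaddress.ip_network("200.77.3.0/24"): "ISP-2",
}


def longest_match(destination):
    """Return the route whose prefix matches with the most bits, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


print(longest_match("200.77.1.9"))  # ISP-1
print(longest_match("200.77.3.9"))  # ISP-2
```

The address 200.77.3.9 matches both entries, and the /24 wins because it is more specific.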

Autonomous Systems

The Internet is a collection of Autonomous Systems (AS) which are connected by routers. ASs, in turn, are collections of local area networks (LANs) connected by routers. Paraphrasing [RFC1930], an Autonomous System (see also RFC1772) is a set of routers under a single technical administration, using an interior gateway protocol and common metrics to route packets within the AS, and using an exterior gateway protocol like BGP (Border Gateway Protocol, the de facto standard for inter-AS routing [BGP-4]) to route packets to other ASes. Alternatively, an AS is defined as a connected group of one or more IP prefixes, run by one or more network operators, which has a single well defined routing policy. While an AS may contain many IP prefixes, an IP prefix should belong to a single AS.

Since these definitions were developed, it has become common for a single AS to use several interior gateway protocols and sometimes several metrics. Even when multiple IGPs and metrics are used, the administration of an AS appears to other ASes to follow a single interior routing plan and presents a consistent picture of what networks are reachable through it.

Autonomous System Numbers (ASNs) are globally unique 16-bit numbers that identify autonomous systems (ASes), and enable an AS to exchange exterior routing information with neighboring ASes.

More information on ASs can be obtained by visiting the American Registry for Internet Numbers (ARIN), which assigns the AS numbers, at http://whois.arin.net/ or by using the whois program. For example,

    whois -h whois.arin.net ASN-TEMPLE

You can also find extensive information at http://boardwatch.internet.com/isp/ including maps of the backbones of many providers.
Also interesting is the DIGEX server.

The Internet routers that connect autonomous systems are called AS Border Routers and they exchange routing information using the BGP protocol, version 4. [The routers that actually transmit AS routing information are called speakers.] In this protocol routers use TCP to collect and exchange full path information for reaching other autonomous systems, and they use this information to carry out routing policies (for instance, making the decision to avoid sending traffic through certain ASs) and to build routing tables. The information maintained by a router has a Time-To-Live attribute and it becomes obsolete after it expires. AS Border Routers usually are connected by point-to-point links that support high data rates.
Within a particular autonomous system routers communicate using the OSPF protocol (or using the RIP protocol).
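
Because border routers exchange full path information, a policy such as "avoid certain ASs" can be applied by inspecting each advertised path. The sketch below is only an illustration and not the BGP decision process: the AS numbers are made up, and two rules are assumptions of this sketch, namely rejecting paths that already contain our own AS and preferring the shortest acceptable path.

```python
# Hypothetical AS numbers (all values are assumptions for this sketch).
MY_AS = 65001
AVOID = {65013}  # policy: do not send traffic through these ASs


def acceptable(path):
    """Reject paths that contain our own AS or an AS we want to avoid."""
    return MY_AS not in path and not AVOID.intersection(path)


def best_path(paths):
    """Pick the shortest acceptable AS path, or None if there is none."""
    candidates = [p for p in paths if acceptable(p)]
    return min(candidates, key=len, default=None)


advertised = [
    [65013, 65020],         # short, but crosses an AS we avoid
    [65002, 65003, 65020],  # acceptable
    [65002, 65001, 65020],  # contains our own AS
]
print(best_path(advertised))  # [65002, 65003, 65020]
```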

There is a hierarchy of Internet providers. Providers at the same tier are peers, i.e. they exchange routing information and forward each other's traffic. A provider at a tier is a client of some provider at the tier above [its default destination], and is a server [the default destination] for some providers at the tier below (to be a client of a server in this context may mean having to pay).

A Tier 1 provider has one or more specific routes to any node on the Internet, or at least to peer nodes from which any other node can be reached. That is, it can either transport Internet traffic anywhere in the world over its own LANs or over those accessible to someone else with which it has a mutual service agreement. A Tier 1 provider is usually treated as a single Autonomous System.
Names of some of the tier 1 providers are: in the USA, UUNET (MCI WorldCom), with 30% of the backbone capacity, AT&T, GTE's Internetworking, Global Crossing, Qwest Communications International, and PSINet; internationally, Telstra and GlobalTeleSystems Group.

Tier 2 providers are called "regional aggregators". They collect traffic from Tier 3 sites and, if they cannot satisfy them directly, they pass it on to Tier 1 sites. Typically they provide only transport services. A tier 2 provider may also aggregate IP network addresses.

A Tier 3 provider is the usual "Internet Service Provider" (ISP). ISPs provide transport services and may also provide e-mail and web service.

A Tier 4 provider represents the "backbone LAN" of an organization. It is usually a single autonomous system. Its connection to the outside will go to a tier 3 provider, or, for a sufficiently large organization, directly to a tier 2 provider.

A Tier 5 provider is at the bottom. It is not an autonomous system but one of the LANs that make up such a system.

People distinguish three types of AS:

Stub AS: It is connected to only one other AS. For routing purposes it is treated as part of the parent AS.
Multihomed AS: It is connected to more than one other AS, but does not allow transit traffic. Internally generated traffic can be routed to any of the connected ASs. It is used in large corporate networks that have a number of Internet connections, but do not want to carry traffic for others.
Transit AS: It is connected to more than one other AS and it can be used to carry transit traffic between other AS's.

Tier 1 and Tier 2 providers are usually Transit ASs, Tier 3 providers are usually Transit or Multihomed ASs, and the Tier 4 and 5 providers are usually Stub ASs.

Here is a high-level view of the internet, from your workstation, up to the content providers such as yahoo and the New York Times.

LAN Addresses

Autonomous System Numbers (16 bits), IP Addresses (32 or 128 bits), Domain and Host Names are all "logical" identifiers: they are not physically tied to a specific hardware device. Only when we get close to the physical level, at the Data Link layer, do we encounter physical addresses, called LAN Addresses or MAC Addresses (Media Access Control). These physical addresses are used when we finally want to communicate on a LAN. They are usually physically tied to the device (the Network Interface).

IP addresses have to be converted to LAN addresses before we can actually access the devices. ARP (Address Resolution Protocol) is the protocol used to convert from IP to LAN addresses. The conversion from LAN addresses to IP addresses can be done with the RARP (Reverse ARP) protocol. LAN addresses are usually 48 bit numbers. At one instant the map between IP addresses and MAC addresses is one-to-one. The stress here is on "at one instant": though usually IP addresses are permanently bound to MAC addresses, it is now possible for a network to dynamically associate IP addresses with interfaces using the Dynamic Host Configuration Protocol (DHCP). For instance an ISP may allocate IP addresses dynamically to its clients as they get on line.
You can see the information currently available to arp with the command

   % arp -a
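
The table that arp consults can be pictured as a map from IP addresses to 48-bit LAN addresses. The sketch below uses an invented entry and hypothetical names (arp_cache, format_mac, resolve); a real implementation would send an ARP request on the LAN when the lookup misses, which is not modeled here.

```python
# Hypothetical ARP cache: IP address -> 48-bit LAN (MAC) address.
arp_cache = {"155.247.71.1": 0x08002B1C3D4E}


def format_mac(mac):
    """Render a 48-bit integer as six colon-separated hexadecimal bytes."""
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))


def resolve(ip):
    """Return the LAN address bound to `ip` right now, or None if unknown."""
    mac = arp_cache.get(ip)
    return format_mac(mac) if mac is not None else None


print(resolve("155.247.71.1"))  # 08:00:2b:1c:3d:4e
print(resolve("155.247.71.2"))  # None
```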

Addressing and routing become more complex when we consider mobile computing, i.e. the situation where portable computers move around the world.

Since we are on the issue of names in the Internet, let's remember other names you have encountered in your computing practice:

Universal Resource Identifiers: Universal Resource Identifiers (URI) form a system of universal names for Internet objects. They take the form scheme : path. When the scheme is an existing Internet protocol, the URI is said to be a URL.
Uniform Resource Locators: Uniform Resource Locators (URL) are URIs where the scheme corresponds to existing well-known Internet protocols such as HTTP, FTP, mailto, file, .. In URLs the scheme names are case-insensitive. Only printable ASCII characters can appear within a URL. In a URL the following characters are unsafe: " ", "<", ">", "#", '"', "%", "{", "}", "|", "\", "^", "~", "[", "]", "/", ";", ",", "?", ".", "@", "=", "&" since they may have a special meaning. As such, they can be used only where allowed with the specified meaning. In all other circumstances these characters should be encoded using the form "%xy" where x and y are hexadecimal digits.
E-Mail Addresses: E-mail addresses are well known to us all, as a way to identify interlocutors on the internet. As you can see from the RFC specification, e-mail addresses can be more complex than we usually expect.
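
The "%xy" encoding mentioned for URLs can be tried with Python's standard urllib.parse module. The file name is made up. Note that Python's quote function never encodes letters, digits, or the characters "_", ".", "-" and "~", so its notion of which characters need encoding is narrower than the list given above.

```python
from urllib.parse import quote, unquote

raw = "annual report #3 <draft>.txt"  # hypothetical path component

# safe="" asks quote() to encode "/" as well, as for a single path segment.
encoded = quote(raw, safe="")

print(encoded)           # annual%20report%20%233%20%3Cdraft%3E.txt
print(unquote(encoded))  # annual report #3 <draft>.txt
```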