CIS 307: Structure and Naming in the Internet
Routers,
IP Addresses,
Forwarding,
Subnetting,
CIDR,
Private Addresses and Network Address Translation (NAT),
LAN Addresses,
Tunnelling,
Firewalls,
Virtual Private Networks,
Overlay Networks
A router is a box (often a regular computer) with (at least)
two ports (i.e. interfaces), used to
connect possibly dissimilar networks and help packets go
from a source to a destination.
It differs from a bridge in that it operates at
the network layer, while a bridge operates at the data link layer. [It also uses different addresses:
for example, a bridge may use Ethernet addresses while a router uses
IP addresses.]
It performs whatever transformations may be required to transfer
packets across the networks it connects.
A router is concerned with where to send packets next as they move
from a source to a destination through a set of interconnected networks.
It is convenient to distinguish two activities:
- Forwarding (or Switching): the process that takes place at a router
when it receives a packet and has to decide where
to send it next, on the basis of the packet's destination and of information
available at the router.
- Routing: the process through which routers gather and process the
information that they will need in the forwarding process. Usually this
information is gathered and transmitted using protocols called
routing protocols, and processed using algorithms.
The resulting information usually takes
the form of routing (or forwarding) tables.
Unfortunately common terminology does not always preserve this
distinction, and too often one sees "routing" used in place
of "forwarding".
Routing (we really mean forwarding) can be of three kinds (at least!):
- Source Routing, where
the decision on what intermediate nodes to cross is made before a packet is
sent; then the packet knows from the beginning where to go next after
arriving at an intermediate node.
- Virtual Circuit Routing, where
a connection is established before the first packet is sent. Then each packet
carries as its destination
the id of a virtual circuit (this id may change
as the packet moves across the network), and each intermediate node holds
a table whose entries have 4 fields: an entry-port, an entry-virtual-circuit-id,
an exit-port, and an exit-virtual-circuit-id. Routing at a node determines
the port the packet arrived from and its virtual
circuit id, looks them up in the table, then sends the packet forward with the
exit virtual circuit id as its new destination
through the exit port indicated by the table.
- [Packet] Routing, where each packet is individually routed
in accordance with a next-hop routing table. In these tables, for a given destination,
there is usually a single next-hop. Even when there is more than one next-hop,
the decision on where to go next is not made on the basis of the source of the
packet, but on the basis of some form of cost.
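The virtual circuit table described in the list above can be sketched as a dictionary keyed by the (entry-port, entry-virtual-circuit-id) pair. This is a minimal Python sketch; the port numbers, circuit ids, and the names `VC_TABLE` and `switch` are made up for illustration.

```python
from typing import Dict, Tuple

# Hypothetical table for one node:
# (entry-port, entry-vc-id) -> (exit-port, exit-vc-id)
VC_TABLE: Dict[Tuple[int, int], Tuple[int, int]] = {
    (1, 5): (3, 12),
    (2, 7): (3, 4),
    (3, 12): (1, 5),   # the reverse direction of the first circuit
}

def switch(entry_port: int, entry_vc: int) -> Tuple[int, int]:
    """Return (exit-port, exit-vc-id) for a packet that arrived on entry_port
    carrying entry_vc. The caller writes exit-vc-id into the packet as its new
    destination and sends it out through exit-port."""
    try:
        return VC_TABLE[(entry_port, entry_vc)]
    except KeyError:
        raise ValueError(
            f"no virtual circuit {entry_vc} set up on port {entry_port}") from None

print(switch(1, 5))   # (3, 12)
```

Since the circuit was set up in advance, forwarding is a single exact-match lookup; no destination address is examined.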
We will only consider packet routing.
Routing tables contain information
that will indicate for each packet on the basis of its final destination
(usually an IP address)
where to go next (next-hop forwarding -
the address of the next router). If there is no explicit
indication of how to get to some destination, a default next-hop will be
used.
Cycles can exist in the graph that
has routers as nodes and links as edges.
Routing tables still work in the presence of cycles, since packets carry
a Time-To-Live (TTL) field that limits the number
of hops they can go through: each router decrements the TTL, and a packet whose
TTL reaches zero is discarded.
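To see why the TTL field is enough to stop a looping packet, here is a minimal Python sketch of a two-router cycle. The router names `A` and `B`, the unreachable destination `X`, and the function `forward` are hypothetical; a real router decrements the TTL in the IP header and, when it reaches zero, discards the packet (normally sending an ICMP Time Exceeded message back to the source).

```python
from typing import Tuple

# Hypothetical next-hop tables that form a cycle for destination "X":
# A forwards to B, and B forwards back to A.
NEXT_HOP = {"A": {"X": "B"}, "B": {"X": "A"}}

def forward(start: str, dest: str, ttl: int) -> Tuple[str, int]:
    """Move a packet hop by hop until it reaches dest or its TTL runs out.
    Returns (router where the packet stopped, number of hops taken)."""
    node, hops = start, 0
    while node != dest:
        ttl -= 1                  # every router decrements the TTL
        if ttl <= 0:
            return node, hops     # TTL exhausted: the packet is discarded here
        node = NEXT_HOP[node][dest]
        hops += 1
    return node, hops

print(forward("A", "X", 5))   # ('A', 4): the loop is cut after 4 hops
```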
It is important that routing tables not be too large.
Evaluation of Routing Algorithms:
- Route quality (optimality): network utilization, path length, delay,
bandwidth, communication cost, reliability
- Overhead (simplicity): control messages, processing,
state (i.e. memory required)
- Speed of convergence to best routes
- Robustness: Responsiveness to topology changes
Routing characteristics:
- Centralized/decentralized
- Static/Dynamic
- Location of decisions (hop-by-hop[decision at each node]/Source-routing[decision at source])
- Frequency of decision (per packet, per session, per topology change)
- Single Path/Multipath: The routing algorithm may provide alternative
routes to be taken to avoid congestion, improve throughput, etc.
- Flat or Hierarchical: i.e. all routers are at the same level, or
routing takes place at two levels, one to get to the general
area, the other to navigate the local neighborhood.
- Protocol: Information distribution and route computation algorithm
Names such as temple.edu are called domain (or network) names and
names such as joda.cis.temple.edu are called host names.
Domain names and host names are mapped to IP addresses using the Domain
Name System (DNS).
IP Version 4 addresses, also called IPv4 addresses, or just IP
addresses, are 32 bit integers. [IPv6, which is the new version of IP, and
which we do not study, uses an IP address with 128 bits.]
They are normally written as
4 small integers representing the bytes of the number separated by periods
(dotted decimal notation).
For example 155.247.182.1 is an IP address. Each IP address consists of two
portions, a network identifier and a host identifier.
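Dotted decimal notation is just the four bytes of the 32-bit number written in base 10. A minimal Python sketch of the conversion in both directions; the function names `to_int` and `to_dotted` are our own, not a library API (Python's standard `ipaddress` module offers the same conversion through `int(ipaddress.IPv4Address(...))`).

```python
def to_int(dotted: str) -> int:
    """Convert dotted decimal notation to the 32-bit integer it denotes."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or not all(0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted decimal IPv4 address: {dotted!r}")
    value = 0
    for p in parts:
        value = (value << 8) | p      # most significant byte first
    return value

def to_dotted(value: int) -> str:
    """Convert a 32-bit integer back to dotted decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(to_int("155.247.182.1"))   # 2616702465
```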
IP addresses are now allocated by IANA
and soon will be by ICANN.
There are 5 classes of IP addresses:
- Class A: The network identifier is 1 byte and the host identifier is
3 bytes. The network identifier will start with a 0 bit. For example
126.46.31.87 is a class A address. The network identifier is 126,
often written 126.0.0.0/8 to stress that it is 8 bits.
- Class B: The network identifier is 2 bytes and the host identifier
is 2 bytes. The network identifier will start with the bits 10. For example
155.247.170.2 is a class B address. The network identifier is
155.247.0.0/16.
- Class C: The network identifier is 3 bytes and the host identifier is
1 byte. The network identifier will start with the bits 110. For
example 200.77.88.91 is a class C address. The network identifier
is 200.77.88.0/24.
- Class D: It starts with the bits 1110 and it is used as a
multicast address. For example 225.65.90.3 is a class D
address.
[unicast = sending from a source to a specific destination;
broadcast = sending from a source to every destination
within a network;
multicast = sending from a source to a set of destinations.
anycast = sending to any of a set of destinations (think of
yourself at the supermarket: you don't care who is the teller that takes
care of your business) ]
- Class E: It starts with bits 1111 and it is currently not in use.
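The class of an address can be read off the leading bits of its first byte. A minimal Python sketch of the classful rules listed above; the function names are hypothetical, and since classful addressing has largely been superseded by CIDR this is mainly of didactic interest.

```python
def address_class(dotted: str) -> str:
    """Classify an IPv4 address (A-E) by the leading bits of its first byte."""
    first = int(dotted.split(".")[0])
    if first >> 7 == 0b0:        # 0xxxxxxx
        return "A"
    if first >> 6 == 0b10:       # 10xxxxxx
        return "B"
    if first >> 5 == 0b110:      # 110xxxxx
        return "C"
    if first >> 4 == 0b1110:     # 1110xxxx
        return "D"
    return "E"                   # 1111xxxx

def network_id(dotted: str) -> str:
    """Network identifier of a class A, B or C address, in /n notation."""
    cls = address_class(dotted)
    if cls not in ("A", "B", "C"):
        raise ValueError(f"class {cls} addresses have no network identifier")
    n = {"A": 1, "B": 2, "C": 3}[cls]      # bytes in the network identifier
    octets = dotted.split(".")[:n] + ["0"] * (4 - n)
    return ".".join(octets) + f"/{8 * n}"

for a in ("126.46.31.87", "155.247.170.2", "200.77.88.91"):
    print(a, address_class(a), network_id(a))
```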
Hosts with the same network id are usually close to each other. But
networks with similar identifiers could be anywhere (for example 155.247
is in the USA and 155.248 could be in Asia).
A number of IP addresses have a standard meaning:
+------------+------------+----------+-------------------------------+
| Network    | Host       | Type of  | Purpose                       |
| Identifier | Identifier | Address  |                               |
+------------+------------+----------+-------------------------------+
| all 0s     | all 0s     | this     | Used during bootstrap to ask  |
|            |            | computer | for its own IP address        |
+------------+------------+----------+-------------------------------+
| Network    | all 0s     | specified| The specified network,        |
| Identifier |            | network  | independent of its hosts      |
+------------+------------+----------+-------------------------------+
| Network    | all 1s     | specified| Broadcast address for the     |
| Identifier |            | network  | specified network             |
+------------+------------+----------+-------------------------------+
| all 1s     | all 1s     | local    | Broadcast to local network    |
|            |            | network  | only (limited broadcast)      |
+------------+------------+----------+-------------------------------+
| 127        | anything   | loopback | Testing TCP/IP software       |
|            |            |          | without using the network     |
+------------+------------+----------+-------------------------------+
IP addresses are associated with host interfaces, not directly with hosts.
In other words, each network interface of a computer system has its own
IP address: the map from hosts to IP addresses is one-to-many. In turn
a particular host may have more than one host name, though one of the
host names is called the canonical name of the host, thus also the map
from IP addresses to host names is one-to-many. Mappings between IP addresses
and host (and domain) names are managed by DNS. On Unix you can find out
about these mappings using the command:
% nslookup ip_address-or-host_name
The association can be static, i.e. the interface always receives the same IP address at start-up, or
dynamic, i.e. the interface requests and receives its IP address from a central allocator at start-up,
for instance using
DHCP (Dynamic Host Configuration Protocol).
Here is a portion of a (real) routing table:
Destination Gateway Interface
================================================
155.247.71/24 155.247.71.60 ln0
127.0.0.1 127.0.0.1 lo0
default 155.247.71.1 ln0
================================================
155.247.71/24 is the local network, to which this machine is directly connected:
packets for it are sent out through the interface ln0, whose own address is
155.247.71.60, and are delivered directly to the destination host.
Notice the notation "/24". It means that we are interested only in
the first 24 bits of this address. The consequence is that if we are
trying to reach 155.247.71.83, that will match the entry 155.247.71/24.
127.0.0.1 is the loopback address, we can use it to test our
networking software even without a network: it is sent through the
interface lo0. For any other destination, the packet will be sent to IP
address 155.247.71.1 through the interface ln0. The routing table of a
Unix machine can be obtained with the command
% netstat -rn
By the way, if you want to know which interfaces your computer has, and their
characteristics, you can use
% ifconfig -a
For example on my machine I find 3 interfaces, ln0, sl0, and lo0. I can then
find out more about each interface with, for example,
% ifconfig ln0
In general, if T is a routing table whose rows R have fields [destination,
gateway, interface], and D is the destination of the packet, then we execute the program:
   for each row R of the routing table T, except the default row
      if (D == R.destination)   // equality is only on the bits significant
                                // in R.destination (e.g. the first 24 for /24)
         send packet to R.gateway through interface R.interface;
         return;
   send packet to the gateway of the default row through its interface.
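The lookup loop above can be sketched in Python using the rows of the routing table shown earlier. This is a minimal sketch, assuming rows are tried in order (first match wins) and that in the short form 155.247.71/24 the missing bytes are 0; the helper names are hypothetical, and a real kernel uses more efficient data structures.

```python
from typing import List, Tuple

def parse_prefix(text: str) -> Tuple[int, int]:
    """Parse '155.247.71/24' or '127.0.0.1' into (value, prefix length).
    Missing trailing bytes are taken as 0; no '/n' means a host route (/32)."""
    addr, _, length = text.partition("/")
    octets = [int(p) for p in addr.split(".")]
    octets += [0] * (4 - len(octets))
    value = 0
    for o in octets:
        value = (value << 8) | o
    return value, (int(length) if length else 32)

def same_prefix(dest: int, network: int, length: int) -> bool:
    """Compare only the first `length` bits of the two addresses."""
    shift = 32 - length
    return dest >> shift == network >> shift

# (destination, gateway, interface) rows from the table above
ROUTES: List[Tuple[str, str, str]] = [
    ("155.247.71/24", "155.247.71.60", "ln0"),
    ("127.0.0.1", "127.0.0.1", "lo0"),
]
DEFAULT = ("155.247.71.1", "ln0")

def lookup(dest: str) -> Tuple[str, str]:
    """Return (gateway, interface) for a destination in dotted decimal."""
    d, _ = parse_prefix(dest)
    for destination, gateway, interface in ROUTES:
        network, length = parse_prefix(destination)
        if same_prefix(d, network, length):
            return gateway, interface
    return DEFAULT

print(lookup("155.247.71.83"))   # ('155.247.71.60', 'ln0')
print(lookup("8.8.8.8"))         # ('155.247.71.1', 'ln0')
```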
Routing and routing tables are more concerned with reaching networks than with
reaching hosts. So in the routing table
the destination will denote a network, not a host. Once one
reaches the correct network, the local system will worry about local delivery
[think of delivery to a host on a LAN, the last step involves translation
from IP address to physical address and transmission on the shared medium].
Routing algorithms, i.e. algorithms used to exchange the information
needed for computing routing tables, are
implemented using routing protocols. Examples of such protocols
are RIP (Routing Information Protocol),
OSPF(Open Shortest Path First),
BGP (Border Gateway Protocol).
IRDP (ICMP Router Discovery Protocol) is used by hosts to discover the routers
on their network, and by routers to advertise their presence.
The packets exchanged in
the routing protocols are called routing packets and they contain
control information, i.e. they are overhead.
In a different set of notes we will study the routing algorithms used in
conjunction with the OSPF and RIP routing protocols.
The granularity of IP address classes leads often to poor
utilization of the address space and to limited ability to address
subgroups within a network. The solution is to use Subnetting.
Assume that we have a class B network like 155.247. We can
partition the host space into 10 bits for subnet id and 6 bits for host
id.
Thus we have 1024 subnets each with up to 62 hosts (64 - 1 network
- 1 broadcast).
Subnetting is based on the use of masks. In our example, the subnet mask
is 255.255.255.192. The bitwise AND of an IP address with the subnet mask
results in the subnet identity. In our example, if we have the IP
address 155.247.182.98, then the subnetwork id,
also called extended-network-prefix, is 155.247.182.64 and the
subnet is known as 155.247.182.64/26 to stress that it uses 26 bits, leaving
the remaining 6 bits for the host-id (which is 34).
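The mask arithmetic in this example can be checked with a few lines of Python. This minimal sketch uses the standard `ipaddress` module only to convert between dotted decimal and integers; the masking itself is the plain bitwise AND described above. The constant and function names are our own.

```python
import ipaddress
from typing import Tuple

MASK = int(ipaddress.IPv4Address("255.255.255.192"))
HOST_MASK = ~MASK & 0xFFFFFFFF          # the 6 host bits
PREFIX_LEN = bin(MASK).count("1")       # 26

def split(addr: str) -> Tuple[str, int]:
    """Return (subnet id in dotted decimal, host id) for an address."""
    a = int(ipaddress.IPv4Address(addr))
    return str(ipaddress.IPv4Address(a & MASK)), a & HOST_MASK

SUBNETS = 2 ** (PREFIX_LEN - 16)                # 16 = bits of the class B network id
HOSTS_PER_SUBNET = 2 ** (32 - PREFIX_LEN) - 2   # minus network and broadcast

print(split("155.247.182.98"))    # ('155.247.182.64', 34)
print(SUBNETS, HOSTS_PER_SUBNET)  # 1024 62
# The standard library reaches the same subnet:
print(ipaddress.ip_interface("155.247.182.98/26").network)  # 155.247.182.64/26
```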
Notice that from far away packets will
go to the network 155.247/16, and once there packets will go to the specific
subnet, and from there to the intended host. [The address 155.247.71/24 we encountered
earlier, means that the class B network 155.247 is split into 256 subnets each with
254 IP host addresses. In other words, it is as if the class B network was
split into class C networks.]
To account for subnetting, a routing table T takes the form:
[subnet-id, subnet-mask, next-hop]
where the subnet-id is uniquely defined for a network (i.e. all the
subnets of a network share the same mask, i.e. they have the same
number of bits). The next-hop is the port (interface) of the router through
which the current packet should be forwarded, plus the IP address of the gateway,
i.e. the next node on the path to the destination.
Then, when a packet with destination IP address A has to be forwarded, the algorithm used is:
   For each row i of routing table T
      Let D = T[i].subnet-mask BitwiseAnd A;
      If (D == T[i].subnet-id) then
      {
         Forward packet to T[i].next-hop;
         return;
      }
   Forward packet to default;
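A minimal Python sketch of this loop, assuming a hypothetical router with two /26 subnets of 155.247 reached through interfaces `eth1` and `eth2` (the interface names and gateway addresses are made up). As in the pseudocode, the first matching row wins; because all subnets of the network share the same mask, at most one non-default row can match.

```python
import ipaddress
from typing import Tuple

def ip(s: str) -> int:
    return int(ipaddress.IPv4Address(s))

# Hypothetical table rows: (subnet-id, subnet-mask, next-hop),
# where next-hop is (interface, gateway IP address).
TABLE = [
    (ip("155.247.182.64"),  ip("255.255.255.192"), ("eth1", "155.247.182.65")),
    (ip("155.247.182.128"), ip("255.255.255.192"), ("eth2", "155.247.182.129")),
]
DEFAULT = ("eth0", "155.247.1.1")

def forward(a: str) -> Tuple[str, str]:
    addr = ip(a)
    for subnet_id, mask, next_hop in TABLE:
        if (mask & addr) == subnet_id:   # D = mask BitwiseAnd A
            return next_hop
    return DEFAULT

print(forward("155.247.182.98"))   # ('eth1', '155.247.182.65')
```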
Normally, routing moves packets across the internet until the packet
arrives at the destination network. Then the packet is directed to
a specific host. With subnetting it becomes possible to route packets
across the internet to a specific subnet, and then to move
within the subnet to a specific host.
The ideas of masks and subnetting have been generalized to allow
more complex partitions of networks than the one we have just
discussed. In particular, variable length subnet masks have been used.
This generalization is known as
Classless Inter-Domain Routing (CIDR).
Now the masks used in routing can be of any length, and in matching
IP addresses one looks for the longest match. For example,
suppose that in a routing table we have a row for the network
1101011110110 and a row for the network 11010111101; then, if we are looking
for the destination 11010111101101111110010111010010, we will use the
first row, since it matches the given destination and it is more specific
than the second row. CIDR helps reduce two kinds of problems: the fact
that IP addresses are not efficiently allocated using the class-oriented
scheme, and the fact that routing tables may grow to be very large.
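Longest-match lookup can be sketched with bit strings, reproducing the example above. This minimal Python sketch scans every row; real routers use tries or specialized hardware to perform the same lookup quickly. The function name is hypothetical.

```python
from typing import Dict, Optional

def longest_match(table: Dict[str, str], dest_bits: str) -> Optional[str]:
    """table maps network prefixes (bit strings) to next hops. Return the next
    hop of the longest prefix that dest_bits starts with, or None."""
    best: Optional[str] = None
    for prefix in table:
        if dest_bits.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return table[best] if best is not None else None

TABLE = {"1101011110110": "first row", "11010111101": "second row"}
print(longest_match(TABLE, "11010111101101111110010111010010"))  # first row
```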
For example, if an ISP controls four Class C addresses:
200.77.0/24
200.77.1/24
200.77.2/24
200.77.3/24
then these four addresses can be aggregated into a single address
200.77.0/22
thus requiring a single entry in routing tables instead of four [this is a form of supernetting,
the inverse of subnetting].
But what if this ISP has only the networks 200.77.0/24, 200.77.1/24,
and 200.77.3/24, while
another ISP has 200.77.2/24? We can still use aggregation: the first ISP
advertises 200.77.0/22 and the second ISP advertises 200.77.2/24. A router
holding both entries uses the longest match. A destination in 200.77.2/24
matches both entries, and the more specific entry 200.77.2/24 wins, so the
packet goes to the second ISP; a destination such as 200.77.1.5 matches only
200.77.0/22, so the packet goes to the first ISP.
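Both scenarios can be checked with Python's standard `ipaddress` module: `collapse_addresses` performs the aggregation, and a longest-match rule picks between the /22 and the /24. This is a minimal sketch; the labels "ISP 1" and "ISP 2" and the function `route` are hypothetical.

```python
import ipaddress
from typing import Optional

# Four contiguous class C networks aggregate into one /22.
four = [ipaddress.ip_network(f"200.77.{i}.0/24") for i in range(4)]
print(list(ipaddress.collapse_addresses(four)))  # [IPv4Network('200.77.0.0/22')]

# Second scenario: two ISPs with overlapping advertisements.
ROUTES = {
    ipaddress.ip_network("200.77.0.0/22"): "ISP 1",
    ipaddress.ip_network("200.77.2.0/24"): "ISP 2",
}

def route(dest: str) -> Optional[str]:
    """Return the owner of the longest advertised prefix containing dest."""
    addr = ipaddress.ip_address(dest)
    matching = [net for net in ROUTES if addr in net]
    if not matching:
        return None
    return ROUTES[max(matching, key=lambda net: net.prefixlen)]

print(route("200.77.1.5"), route("200.77.2.5"))  # ISP 1 ISP 2
```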
Here is another example. ISP X owns 128 class C networks, say 196.74.128.0/17
(that is, 196.74.128.0/24 through 196.74.255.0/24: 128 = 2^7 networks, so the
common prefix is 24 - 7 = 17 bits long), and sells them to 128 clients.
Then in the middle of the internet the ISP can advertise a single prefix,
196.74.128.0/17, and when packets reach the ISP, the ISP can take over the routing to
the correct client class C network.
An enterprise may have networks that are mainly intended for internal use,
i.e. for communicating within the enterprise, not with the outside.
In this case the nodes of the enterprise may use
any of the addresses in the following three blocks, which are guaranteed never
to be used anywhere in the (public) Internet:
10.0.0.0 to 10.255.255.255
172.16.0.0 to 172.31.255.255
192.168.0.0 to 192.168.255.255
[also 169.254.0.0-169.254.255.255 are reserved for automatic private IP addressing,
but we will not talk about this use]
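Checking whether an address falls in one of the three private blocks is a simple membership test. A minimal Python sketch using the standard `ipaddress` module, where the range 172.16.0.0 to 172.31.255.255 is written in CIDR form as 172.16.0.0/12. The function name is our own; the standard library's `is_private` attribute is broader, since it also covers other reserved ranges such as loopback and 169.254.0.0/16.

```python
import ipaddress

PRIVATE_BLOCKS = [ipaddress.ip_network(n)
                  for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def in_private_block(addr: str) -> bool:
    """True if addr lies in one of the three private address blocks."""
    a = ipaddress.ip_address(addr)
    return any(a in block for block in PRIVATE_BLOCKS)

print(in_private_block("10.0.0.5"), in_private_block("155.247.152.12"))  # True False
```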
As long as the nodes communicate with each other there is no problem
since their IP addresses are "unique" within the network.
The problem occurs when this private network is connected to the internet.
At issue is what to do when
communicating to/from an external node [it is assumed that in this
situation the communication
will be initiated by the local node].
In this case one can use a Network Address Translation (NAT) device
(a router, or firewall, or ad-hoc device) to translate between
the local addresses and public addresses that belong to the enterprise.
This association, local/public, can be static (and usually for only a
part of the local network), or dynamic, taking advantage of the fact that
at one time only a few nodes will communicate with the environment.
Another solution is the use of a more sophisticated NAT that performs Port
Address Translation (PAT). It goes as follows:
The NAT/router has a valid IP address (say 197.48.73.25) for use on the
Internet, and a local address on the local network.
When a local node (say 10.0.0.5) wants to
talk to an external IP address
(say 155.247.152.12) waiting on a port (say 9876), it sends the message to
155.247.152.12,9876 using a local port (say 8888). The message arrives at
the NAT/router [it is acting as the single input/output to the Internet]. The NAT/router
replaces the source address and port 10.0.0.5,8888 with 197.48.73.25,12345,
where 12345 is a free port of the NAT/router, and keeps in a table the map
197.48.73.25,12345 <--> 10.0.0.5,8888 (together with the remote end 155.247.152.12,9876).
When it receives a reply from 155.247.152.12,9876 addressed to
197.48.73.25,12345, it looks up the map and replaces the
destination with 10.0.0.5,8888.
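The translation table kept by the NAT/router can be sketched in Python. This is a minimal sketch under simplifying assumptions: the class `PatTable` is hypothetical, ports are handed out sequentially from 12345 with no reuse or expiry, and packets are represented only by their addresses and ports.

```python
import itertools
from typing import Dict, Optional, Tuple

Flow = Tuple[str, int, str, int]   # (local ip, local port, remote ip, remote port)

class PatTable:
    def __init__(self, public_ip: str, first_port: int = 12345):
        self.public_ip = public_ip
        self._free_ports = itertools.count(first_port)  # naive allocator
        self._by_flow: Dict[Flow, int] = {}              # flow -> NAT port
        self._by_port: Dict[int, Flow] = {}              # NAT port -> flow

    def outbound(self, src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int) -> Tuple[str, int]:
        """Return the (address, port) that replaces the packet's source."""
        flow = (src_ip, src_port, dst_ip, dst_port)
        if flow not in self._by_flow:
            port = next(self._free_ports)
            self._by_flow[flow] = port
            self._by_port[port] = flow
        return self.public_ip, self._by_flow[flow]

    def inbound(self, src_ip: str, src_port: int,
                dst_port: int) -> Optional[Tuple[str, int]]:
        """Return the local (address, port) that replaces the reply's
        destination, or None if no mapping exists (the packet is dropped)."""
        flow = self._by_port.get(dst_port)
        if flow is None or (flow[2], flow[3]) != (src_ip, src_port):
            return None
        return flow[0], flow[1]

nat = PatTable("197.48.73.25")
print(nat.outbound("10.0.0.5", 8888, "155.247.152.12", 9876))  # ('197.48.73.25', 12345)
print(nat.inbound("155.247.152.12", 9876, 12345))              # ('10.0.0.5', 8888)
```

Keeping the remote end in the table lets the sketch reject packets that arrive on port 12345 from some other host, a common (though not universal) NAT behaviour.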
Checksums need to be recomputed at the NAT/router. People have been
able to create very successful NAT products. [The way checksums are computed simplifies things:
if the only change is the address, the new checksum is obtained from the
old one by adding the old address and subtracting the new address
(remember that the checksum is kept in one's-complement form).]
Note that while routing takes place at the network layer, NAT
that involves ports also works at the transport layer: it rewrites port numbers,
and it must also fix the TCP/UDP checksums, which cover the IP addresses too.
In fact people talk of "TCP
splicing" when connecting to the NAT box and from there to the true external
destination.
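The incremental checksum update can be sketched in Python and checked against a full recomputation. This minimal sketch assumes the header is already split into 16-bit words; it uses the update formula of RFC 1624, HC' = ~(~HC + ~m + m'), which is the "add the old value, subtract the new one" idea expressed in one's-complement arithmetic. The sample header (checksum 0xB861) and the use of 197.48.73.25 as the NAT's public source address are illustrative choices.

```python
from typing import List, Sequence

def ones_complement_sum(words: Sequence[int]) -> int:
    """16-bit one's-complement sum (carries out of bit 15 wrap around)."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)
    return total

def checksum(words: Sequence[int]) -> int:
    """Internet checksum of a header whose checksum field is set to 0."""
    return ~ones_complement_sum(words) & 0xFFFF

def update_checksum(old_checksum: int, old_words: Sequence[int],
                    new_words: Sequence[int]) -> int:
    """Incremental update when old_words are replaced by new_words (RFC 1624)."""
    total = ~old_checksum & 0xFFFF
    for m, m_new in zip(old_words, new_words):
        total = ones_complement_sum([total, ~m & 0xFFFF, m_new])
    return ~total & 0xFFFF

# IPv4 header as 16-bit words, checksum field (index 5) set to 0;
# the source address 192.168.0.1 occupies words 6-7.
header: List[int] = [0x4500, 0x0073, 0x0000, 0x4000, 0x4011,
                     0x0000, 0xC0A8, 0x0001, 0xC0A8, 0x00C7]
new_src = [0xC530, 0x4919]                    # 197.48.73.25
translated = header[:6] + new_src + header[8:]

old = checksum(header)
print(hex(old))                                          # 0xb861
print(hex(update_checksum(old, header[6:8], new_src)))   # 0x6ac1
print(hex(checksum(translated)))                         # 0x6ac1
```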
Autonomous System Numbers (16 bits), IP Addresses (32 or 128 bits),
Domain and Host Names, are all "logical" identifiers: they are not
physically tied to a specific hardware device. Only when we get close to the
physical level, at the Data Link layer, we encounter physical addresses,
called LAN Addresses or MAC Addresses (Media Access Control).
These physical addresses are used when we finally want to communicate on a LAN.
They are usually physically tied to the device (the Network Interface).
IP addresses have to be converted to LAN addresses before we can actually
access the devices. ARP (Address Resolution protocol)
is the protocol used to convert from IP
to LAN addresses. The conversion from LAN addresses to IP addresses
can be done with the RARP (Reverse ARP) protocol. LAN addresses
are usually 48 bit numbers. At one instant the map between IP addresses and
MAC addresses is one-to-one. The stress here is on "at one instant": though
usually IP addresses are permanently bound to MAC addresses, it is now possible
for a network to dynamically associate IP addresses to interfaces using
the Dynamic Host Configuration protocol
(DHCP). For
instance an ISP may allocate IP addresses dynamically to its clients as they
get on line.
You can see the information currently available to arp with the command
% arp -a
Addressing and routing becomes more complex when we consider
mobile computing, i.e. the situation where
portable computers move around the world.
Since we are on the issue of names in the Internet, let's remember other names
you have encountered in your computing practice:
- Universal Resource Identifiers (URI) form a system of universal names for
Internet objects. They take the form scheme:path. When the scheme is
an existing Internet protocol, the URI is said to be a URL.
- Uniform Resource Locators (URL) are URIs where the scheme corresponds to
existing well-known Internet protocols such as HTTP, FTP, mailto, file, etc.
In URLs the scheme names are case-insensitive. Only printable ASCII characters
can appear within a URL. In a URL the following characters
are unsafe: " ", "<", ">", "#", """, "%", "{", "}", "|", "\", "^",
"~", "[", "]", "/", ";", ",", "?", ".", "@", "=", "&", since they
may have a special meaning. As such, they can be used only where allowed,
with the specified meaning. In all other circumstances these characters
should be encoded using the form "%xy", where x and y are hexadecimal digits.
- E-mail addresses are well known to us all, as a way to identify interlocutors
on the internet. As you can see from the RFC specification, e-mail
addresses can be more complex than we usually expect.
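The "%xy" rule for URLs can be sketched as a small Python function that escapes the characters listed above as unsafe, plus anything that is not printable ASCII, when they appear as data rather than in their special role. This is a minimal sketch: the name `escape` is our own, non-ASCII characters are assumed to be sent as their UTF-8 bytes, and in practice Python's `urllib.parse.quote` does this job (with a somewhat different set of safe characters).

```python
# Characters listed above as unsafe in URLs ("%" included).
UNSAFE = set(' <>#"%{}|\\^~[]/;,?.@=&')

def escape(text: str) -> str:
    """Replace unsafe and non-printable-ASCII characters with %xy escapes."""
    out = []
    for ch in text:
        if ch in UNSAFE or not 0x21 <= ord(ch) <= 0x7E:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)

print(escape("50% off & more"))   # 50%25%20off%20%26%20more
```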
Tunneling is the process of placing an entire packet within another packet
and sending it over a network. The protocol of the outer packet is
understood by the network and the end points, called tunnel interfaces
(where the packet enters and exits the network).
Tunneling requires three different protocols:
- Carrier protocol - The protocol (IP, PPP, HTTP)
used by the network that
the information is traveling over.
- Encapsulating protocol - The protocol (GRE, IPSec,
L2F, PPTP, L2TP)
that is wrapped around the original data
- Passenger protocol - The original data
(IPX, NetBeui, IP, TCP) being carried
For example, we may have IPv4 supported in an
internet, but not IPv6, and yet we want to move IPv6 traffic across this
internet. IPv4 is the carrier, IPv6 is the passenger, and maybe L2TP
is the encapsulating protocol.
In order for tunneling to work we need special device drivers at the
sender and receiver to appropriately encapsulate/decapsulate at the
endpoints and to process the content correctly.
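Encapsulation itself is simple: the passenger packet becomes the payload of a carrier packet. A minimal Python sketch, assuming the direct IPv6-in-IPv4 encapsulation used by 6in4 tunnels (the outer IPv4 header carries protocol number 41) rather than L2TP, to keep it short. The function names are hypothetical, the passenger is a placeholder, and options, fragmentation, and packet identification are ignored.

```python
import struct

IPV4_HEADER = "!BBHHHBBH4s4s"   # version/IHL, TOS, total length, id, flags/frag,
                                # TTL, protocol, checksum, source, destination
PROTO_IPV6 = 41                 # IP protocol number for encapsulated IPv6

def internet_checksum(data: bytes) -> int:
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def encapsulate(passenger: bytes, src: bytes, dst: bytes, ttl: int = 64) -> bytes:
    """Prefix the passenger packet with a 20-byte IPv4 carrier header."""
    fields = [0x45, 0, 20 + len(passenger), 0, 0, ttl, PROTO_IPV6, 0, src, dst]
    fields[7] = internet_checksum(struct.pack(IPV4_HEADER, *fields))
    return struct.pack(IPV4_HEADER, *fields) + passenger

def decapsulate(packet: bytes) -> bytes:
    """Strip the carrier header at the tunnel exit and return the passenger."""
    if packet[9] != PROTO_IPV6:
        raise ValueError("carrier packet does not contain IPv6")
    header_len = (packet[0] & 0x0F) * 4
    return packet[header_len:]

passenger = bytes([0x60]) + bytes(39)   # placeholder 40-byte IPv6 header
carrier = encapsulate(passenger, bytes([197, 48, 73, 25]), bytes([155, 247, 152, 12]))
print(decapsulate(carrier) == passenger)   # True
```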
An interesting form of tunneling is HTTP Tunneling.
It is very convenient
for getting around the obstacle of firewalls (they may allow access to only
a few ports, usually 80 - HTTP, and 443 - HTTPS). One uses the HTTP protocol
as the carrier protocol for non-web applications. If a client wants to access
a server (an application) that is behind a firewall, it encloses data in HTTP
PUT requests and sends them to a stub/plugin (for instance a CGI script)
of an HTTP server placed in front of the application.
The stub/plugin unwraps the data and passes it to
the application. Similarly on the way back from application to client.
If instead a client behind a firewall wants to access an application outside,
it will need a stub to wrap its requests, send them to a proxy that will
forward them to the server. Similarly for the responses.
A firewall
is a device (it may be part of a router) used to separate an
interior internet from the external Internet.
The firewall has a series of roles. It can be used to
filter the packets that go from the inside to the outside (for example, to
prevent employees from visiting sports news sites during office hours)
or from the outside to the inside (for example, to prevent outsiders from
probing the individual nodes of the interior internet). It can also be used
to log information about the traffic, to identify attacks from the
outside, and to act as a proxy for the internal nodes so as to hide their
identity. Firewalls may also operate above the network layer: for instance
they may recognize and block certain TCP connections, or may deal
directly with application level protocols.
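Packet filtering is often expressed as an ordered list of rules in which the first matching rule decides and anything unmatched gets a default action. A minimal Python sketch, assuming a made-up rule format (direction, protocol, destination port) and a default of "deny"; real firewalls match on many more fields and usually keep per-connection state.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Rule:
    action: str                     # "allow" or "deny"
    direction: str                  # "in" (towards the interior) or "out"
    protocol: Optional[str] = None  # e.g. "tcp"; None matches any protocol
    port: Optional[int] = None      # destination port; None matches any port

def decide(rules: List[Rule], direction: str, protocol: str, port: int,
           default: str = "deny") -> str:
    """Return the action of the first matching rule, else the default."""
    for r in rules:
        if (r.direction == direction
                and r.protocol in (None, protocol)
                and r.port in (None, port)):
            return r.action
    return default

# Hypothetical policy: outsiders may reach only the web server ports.
RULES = [
    Rule("allow", "in", "tcp", 80),
    Rule("allow", "in", "tcp", 443),
    Rule("allow", "out"),
]
print(decide(RULES, "in", "tcp", 22))   # deny
```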
Virtual Private Network (VPN)
A virtual private network
is a private network that, instead of consisting
of private lines and routers, consists of tunnels implemented using IPSec
or an equivalent secure protocol within a public network
infrastructure. Privacy is preserved through encryption (public-key cryptography
is used to exchange a secret session key, and the session key is then used for
encryption and decryption); authentication and
integrity are obtained by attaching to each packet a cryptographic check value
computed with a secret key (IPSec uses a keyed message authentication code). A
simple use is for
employees to access their employer's systems securely. When a variety of
tunnels join a number of sites of an enterprise, the various tunnels form
a kind of overlay network, that is, a network defined in terms of an
existing network, using some of its resources.
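Per-packet authentication and integrity can be sketched with a keyed message authentication code. This minimal Python sketch uses HMAC-SHA256 from the standard library, with a random stand-in for the negotiated session key; it shows only the integrity check (no encryption) and is not the IPSec packet format. The names `protect` and `verify` are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional

TAG_LEN = 32   # bytes in an HMAC-SHA256 tag

def protect(key: bytes, packet: bytes) -> bytes:
    """Append a tag computed over the packet with the session key."""
    return packet + hmac.new(key, packet, hashlib.sha256).digest()

def verify(key: bytes, data: bytes) -> Optional[bytes]:
    """Return the packet if its tag is valid, else None (packet rejected)."""
    packet, tag = data[:-TAG_LEN], data[-TAG_LEN:]
    expected = hmac.new(key, packet, hashlib.sha256).digest()
    return packet if hmac.compare_digest(tag, expected) else None

session_key = os.urandom(32)          # stand-in for a negotiated session key
sent = protect(session_key, b"payroll data")
tampered = b"P" + sent[1:]
print(verify(session_key, sent), verify(session_key, tampered))  # b'payroll data' None
```

A change to even one byte of the packet, or the use of a different key, makes the tag check fail, which is what lets the receiving tunnel end discard forged or altered packets.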
An overlay network is a virtual network defined above the transport layer of an
underlying network, with links whose end points are edge nodes of the underlying
network. The characteristics of such links, such as latency and data rate
are derived from the characteristics of the underlying network. Users of the
overlay network need not be concerned about possible dynamic changes in the structure
or characteristics of the underlying network. They are used particularly
in peer-to-peer (p2p) systems and in wireless systems.