How the Net Works
This chapter is included should you wish to understand a little more about how the internet works.
Imagine a group of individuals who decide to share information on their computers by connecting them, and by sending information between these computers. Their efforts result in a set of devices able to communicate with each other via a computer network. Of course, the network can be even more valuable and useful if it is connected to other networks and hence to other computers and network users. This simple desire to connect and share information electronically is manifested today in the global Internet. As the Internet has grown rapidly, the complexity of its interconnections has also increased, and the Internet is literally built up from the interconnection of a tremendous number of networks.
The fundamental task of the Internet can be described as facilitating the journey of digital information from its origin to its destination, using a suitable path and an appropriate mode of transportation.
Local computer networks, called Local Area Networks, or LANs, physically connect a number of computers and other devices at the same physical location to one another. They can also connect to other networks via devices called routers that manage the information flow between networks. Computers in a LAN can communicate with each other directly for purposes like sharing files and printers, or playing multi-player networked video games. A LAN could be useful even if it were not connected to the outside world, but it clearly becomes more useful when it is.
The Internet today is a decentralized world-wide network of such local computer networks, as well as larger networks such as university and corporate networks, and the networks of hosting providers.
The organizations that arrange these interconnections between networks are called Internet Service Providers or ISPs. An ISP's responsibility is to deliver data to the appropriate place, usually by forwarding the data to another router (called "the next hop") closer to the data's final destination. Often, the next hop actually belongs to a different ISP. In order to do this, the ISP may purchase its own Internet access from a larger ISP, such as a national provider. (Some countries have only a single national-level provider, perhaps government-operated or government-affiliated, while others have several, which might be competing private telecommunications firms.) National providers may similarly receive their connections from one of the multinational companies that maintain and operate the servers and connections that are often referred to as the backbone of the Internet.
The backbone is made up of major network equipment installations and global connections between them via fiber-optic cables and satellites. These connections enable communications between Internet users in different countries and continents. National and international providers connect to this backbone through routers sometimes known as gateways, which are connections that allow disparate networks to communicate with each other. These gateways, just like other routers, may be a point at which Internet traffic is monitored or controlled.
Building the Internet
The originators of the Internet generally believed that there is only one Internet, that it is global, and that it should allow any two computers anywhere in the world to communicate directly with one another, assuming the owners of both computers want this to happen.
In a 1996 memo, Brian Carpenter, then chairman of the Internet Architecture Board, wrote:
in very general terms, the [Internet engineering] community believes that the goal is connectivity ... [the] growth of the network seems to show that connectivity is its own reward, and is more valuable than any individual application.
The originators of the Internet created and continue to create standards intended to make it easier for others to create their own networks and to join them to each other. Understanding Internet standards helps make clear how the Internet works and how network sites and services become accessible or inaccessible.
The most basic standard that unites all of the devices on the global Internet is called the Internet Protocol (IP).
Standards for identifying devices on the network
When your computer connects to the Internet, it is normally assigned a numeric IP address. Like a postal address, the IP address uniquely identifies a single computer on the Internet. Unlike the postal address, however, an IP address (particularly for a personal computing device) is not necessarily permanently associated with a specific computer. So, when your computer disconnects from the Internet and reconnects at a later time, it may receive a different (unique) IP address. The IP protocol version currently in predominant use is IPv4. In the IPv4 protocol, an IP address is written as four numbers in the range 0-255, separated by dots (e.g. 207.123.209.9).
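As a rough illustration of the dotted format described above, the following Python sketch uses the standard library's ipaddress module to check an address and split it into its four numbers. The helper names ipv4_octets and is_valid_ipv4 are made up for this example.

```python
import ipaddress

def ipv4_octets(text):
    """Return the four numbers (0-255) that make up a dotted IPv4 address."""
    address = ipaddress.IPv4Address(text)  # raises ValueError if malformed
    return list(address.packed)            # the address as four raw bytes

def is_valid_ipv4(text):
    """Return True if text is a well-formed IPv4 address, False otherwise."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False

print(ipv4_octets("207.123.209.9"))      # [207, 123, 209, 9]
print(is_valid_ipv4("207.123.209.9"))    # True
print(is_valid_ipv4("207.123.209.256"))  # False: 256 is outside 0-255
```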
Domain names and IP addresses
All Internet servers, such as those which host Web sites, also have IP addresses. For example, the IP address of www.freepressunlimited.org is 195.190.28.213. Since remembering IP addresses is cumbersome and IP addresses might change over time, a system is in place to make it easier for you to reach your destination on the Internet. This system is the Domain Name System (DNS), in which a set of computers is dedicated to serving your computer with the IP addresses associated with human-memorable "names".
For example, to access the Free Press Unlimited website you would type in the www.freepressunlimited.org address, also known as a domain name, instead of 195.190.28.213. Your computer then sends a message with this name to a DNS server. After the DNS server translates the domain name into an IP address, it shares that information with your computer. This system lets people work with memorable names while computers continue to work with numeric addresses.
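The lookup described above can be tried from a program. The following is a minimal Python sketch, assuming the computer has a working resolver configured; the function name resolve is hypothetical, and socket.getaddrinfo simply asks the operating system to perform the lookup on the program's behalf.

```python
import socket

def resolve(name):
    """Ask the system resolver for the IP addresses associated with a name."""
    # Each result is a tuple whose last element is the socket address;
    # the first item of that address is the IP address as text.
    results = socket.getaddrinfo(name, None)
    return sorted({result[4][0] for result in results})
```

On a connected computer, resolve("www.freepressunlimited.org") should return one or more addresses. The answer depends on the DNS server consulted and on when you ask, so it may differ from the address quoted in this chapter.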
Mathematically speaking, IPv4 allows for a pool of about 4.3 billion different addresses (2 to the power of 32), and therefore about that many computers connected to the Internet at once. There is also technology, known as Network Address Translation (NAT), that lets multiple computers share a single IP address. Despite this, the pool of available addresses was more or less exhausted at the beginning of 2011. As a result, the IPv6 protocol has been devised, with a much larger repository of possible unique addresses. IPv6 addresses are much longer, and even harder to remember, than traditional IPv4 addresses. An example of an IPv6 address is:
2001:0db8:85a3:0000:0000:8a2e:0370:7334
Although as of 2011 less than 1% of the Internet uses the IPv6 protocol, this will probably change dramatically in the near future.
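The sizes of the two address pools follow from the length of the addresses: 32 bits for IPv4 and 128 bits for IPv6. As a small worked example in Python (the variable names are chosen for this illustration only), the arithmetic and the usual shortened spelling of the IPv6 address above look like this:

```python
import ipaddress

ipv4_total = 2 ** 32    # an IPv4 address is 32 bits long
ipv6_total = 2 ** 128   # an IPv6 address is 128 bits long

print(f"{ipv4_total:,}")  # 4,294,967,296 (roughly 4.3 billion)
print(f"{ipv6_total:,}")

example = ipaddress.IPv6Address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")
# Leading zeros may be dropped and one run of all-zero groups replaced by "::".
print(example.compressed)  # 2001:db8:85a3::8a2e:370:7334
```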
Protocols for sending information through the network
The information you exchange as you use the Internet could take many forms:
- an e-mail to your embassy
- a picture or video of an event
- a database of contact information
- a file containing a set of instructions
- a document containing a report on a sensitive topic
- a computer program that teaches a skill.
There is a wide variety of Internet software to accommodate proper handling of the various forms of information according to specific protocols, such as:
- e-mail via Simple Mail Transfer Protocol (SMTP)
- instant messaging via Extensible Messaging and Presence Protocol (XMPP)
- file sharing via File Transfer Protocol (FTP)
- peer-to-peer file sharing via BitTorrent protocol
- Usenet news via Network News Transfer Protocol (NNTP)
- a combination of protocols: voice communication using Voice Over Internet Protocol (VoIP), Session Initiation Protocol (SIP) and Real-time Transport Protocol (RTP)
The Web
Although many people use the terms "the Internet" and "the Web" interchangeably, the Web is actually just one way of communicating using the Internet. When you access the Web, you do so using software called a Web browser, such as Mozilla Firefox, Google Chrome, Opera, or Microsoft Internet Explorer. The protocol that the Web operates on is called the Hypertext Transfer Protocol or HTTP. You might also have heard of HTTPS, which is the secure version of HTTP that uses Transport Layer Security (TLS) encryption to protect your communications.
Following your information on the Internet - the journey
Let's follow the example of visiting a Web site from your home computer.
Browse to the Web site
- You type in http://freepressunlimited.org/. The computer sends the domain name "freepressunlimited.org" to a selected DNS server, which returns a message containing the IP address for the Free Press Unlimited server (currently, 195.190.28.213).
- The browser then sends a request for a connection to that IP address.
- The request goes through a series of routers, each one forwarding a copy of the request to a router closer to the destination, until it reaches a router that finds the specific computer needed.
- This computer sends information back to you, allowing your browser to send the full URL and receive the data to display the page.
The message from the Web site to you travels through other devices (computers or routers). Each such device along a path can be referred to as a "hop"; the number of hops is the number of computers or routers your message comes in contact with along its way and is often between 5 and 30.
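The first and last steps of this journey can be sketched in a few lines of Python. This is a simplified illustration, not what a real browser does internally: the function name plan_request is invented for this example, and the request shown is the bare minimum of an HTTP/1.1 request.

```python
from urllib.parse import urlsplit

def plan_request(url):
    """Split a URL into the name sent to DNS and the request sent to the server."""
    parts = urlsplit(url)
    host = parts.hostname        # only this name is sent to the DNS server
    path = parts.path or "/"     # the rest of the URL goes to the Web server
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return host, request

host, request = plan_request("http://freepressunlimited.org/")
print(host)  # freepressunlimited.org
print(request)
```

Note that the DNS server only ever sees the domain name, while the full path is sent later, over the connection to the Web server itself.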
Why This Matters
Normally all of these complex processes are hidden and you don't need to understand them in order to find the information you need. However, when people or organizations attempting to limit your access to information interfere with the operation of the system, your ability to use the Internet may be restricted. In that case, understanding just what they have done to interfere with your access can become extremely relevant.
Consider firewalls, which are devices that intentionally prevent certain kinds of communication between one computer and another. Firewalls help a network owner enforce policies about what kinds of communication and use of a network are allowed. Initially, the use of firewalls was conceived as a computer security measure, because they can help repel electronic attacks against inadvertently misconfigured and vulnerable computers. But firewalls have come to be used for a much wider range of purposes and for enforcing policies far beyond the purview of computer security, including content controls.
Another example is DNS servers, which were described as helping provide IP addresses corresponding to requested domain names. However, in some cases, these servers can be used as censoring mechanisms by preventing the proper IP address from being returned, and effectively blocking access to the requested information from that domain.
Censorship can occur at different points in the Internet infrastructure, covering whole networks, domains or subdomains, individual protocols, or specific content identified by filtering software. The best method to avoid censorship will depend on the specific censorship technique used. Understanding these differences will help you to choose appropriate measures for you to use the Internet effectively and safely.
Ports and Protocols
In order to share data and resources, computers need to agree on conventions about how to format and communicate information. These conventions, which we call protocols, are sometimes compared to the grammar of human languages. The Internet is based on a series of such protocols.
The layered networking model
Internet protocols rely on other protocols. For example, when you use a Web browser to access a Web site, the browser relies on the HTTP or HTTPS protocol to communicate with the Web server. This communication, in turn, relies on other protocols. Suppose we are using HTTPS for a particular Web site to ensure that we access it securely.
In the above example, the HTTPS protocol relies on the TLS protocol to perform encryption of the communications so that they are private and unmodified as they travel across the network. The TLS protocol, in turn, relies on the TCP protocol to ensure that information is not accidentally lost or corrupted in transmission. Finally, TCP relies on the IP protocol to ensure that data is delivered to the intended destination.
While using the encrypted HTTPS protocol, your computer still uses the unencrypted DNS protocol for retrieving an IP address for the domain name. The DNS protocol uses the UDP protocol to mark the request for proper routing to a DNS server, and UDP relies on IP for actual transmission of data to the intended destination.
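To make the unencrypted nature of a DNS request more concrete, here is a minimal Python sketch that builds the bytes of a simple DNS question by hand. The function name build_dns_query and the query ID 0x1234 are arbitrary choices for this example; real resolvers pick a random ID and support many more options.

```python
import struct

def build_dns_query(name, query_id=0x1234):
    """Build the bytes of a simple DNS question asking for the IPv4 address of name."""
    # Header: ID, flags (0x0100 = standard query, recursion desired),
    # then counts: 1 question, 0 answers, 0 authority, 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels ending with a zero byte.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    # QTYPE 1 = A record (IPv4 address), QCLASS 1 = Internet.
    question = qname + struct.pack("!HH", 1, 1)
    return header + question

query = build_dns_query("www.freepressunlimited.org")
print(len(query))  # 44
print(query)       # the domain name is readable in the raw bytes
```

Sending these bytes in a UDP datagram to port 53 of a DNS server is what delivers the question. That step is left out here so the example runs without a network connection. Because the name appears in plain text inside the packet, any device along the path can read it.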
Because of this hierarchical protocol relationship, we often refer to network protocols as existing in a set of layers. A protocol at each layer is responsible for a particular aspect of the communications functionality.
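The layering can be pictured as wrapping a message in one envelope after another. The following toy Python model is only a picture of that idea, under the assumption that each layer simply adds a labelled wrapper; real protocols add binary headers, and TLS also encrypts what it carries. The names HTTPS_STACK, DNS_STACK and encapsulate are invented for this sketch.

```python
# Each stack lists the protocols from the application down to IP,
# following the two examples in the text.
HTTPS_STACK = ["HTTPS", "TLS", "TCP", "IP"]
DNS_STACK = ["DNS", "UDP", "IP"]

def encapsulate(payload, stack):
    """Wrap the payload once per layer, innermost layer first."""
    wrapped = payload
    for layer in stack:
        wrapped = f"[{layer} {wrapped}]"
    return wrapped

print(encapsulate("GET /", HTTPS_STACK))
# [IP [TCP [TLS [HTTPS GET /]]]]
print(encapsulate("freepressunlimited.org?", DNS_STACK))
# [IP [UDP [DNS freepressunlimited.org?]]]
```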
Using Ports
Computers connect to each other via the TCP protocol mentioned above and stay connected for a period of time to allow higher-level protocols to carry out their tasks. TCP uses a concept of numbered ports to manage these connections and distinguish connections from one another. The use of numbered ports also allows the computer to decide which particular software should handle a specific request or piece of data. (UDP also uses port numbers for this purpose.)
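Port numbers can be observed directly with a small experiment that stays entirely on your own computer. The Python sketch below, whose function name loopback_demo is made up for this example, opens a listening TCP socket on the loopback address, connects to it, and reports the port numbers at each end of the connection.

```python
import socket

def loopback_demo():
    """Open a TCP connection to ourselves and return the ports involved."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 asks the operating system to pick any free port.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        server_port = server.getsockname()[1]
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(("127.0.0.1", server_port))
            connection, peer = server.accept()
            with connection:
                client_port = client.getsockname()[1]
                client.sendall(b"hello")
                data = connection.recv(1024)
    # peer[1] is the client's port as seen from the server side.
    return server_port, client_port, peer[1], data
```

Calling loopback_demo() returns the server's port, the client's port (chosen automatically by the operating system), the client's port as the server sees it, and the data received. The pair of port numbers, together with the two IP addresses, is what lets the computer tell one connection from another.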
The IANA (Internet Assigned Numbers Authority) assigns port numbers for various higher-level protocols used by application services. A few common examples of the standard assigned port numbers are:
- 20 and 21 - FTP (file transfer)
- 22 - SSH (secure shell remote access)
- 23 - Telnet (insecure remote access)
- 25 - SMTP (send e-mail)
- 53 - DNS (resolves a computer's name to an IP address)
- 80 - HTTP (normal Web browsing; also sometimes used for a proxy)
- 110 - POP3 (receive e-mail)
- 143 - IMAP (access e-mail stored on a server)
- 443 - HTTPS (secure Web connections)
- 993 - secure IMAP
- 995 - secure POP3
- 1080 - SOCKS proxy
- 1194 - OpenVPN
- 3128 - Squid proxy
- 8080 - Standard HTTP-style proxy
Using these particular numbers is not generally a technical requirement of the protocols; in fact, any sort of data could be sent over any port (and using non-standard ports can be a useful circumvention technique). However, these assignments are used by default, for convenience. For example, your Web browser knows that if you access a Web site without specifying any port number, it should automatically try using port 80. Other kinds of software have similar defaults so that you can normally use Internet services without knowing or remembering the port numbers associated with the services you use.
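The default behaviour described above can be sketched as follows. This is an illustration in Python rather than the logic of any particular browser; the table DEFAULT_PORTS, the function effective_port, and the address example.org:8080 are assumptions made for the example.

```python
from urllib.parse import urlsplit

# A few defaults taken from the list of assigned ports above.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def effective_port(url):
    """Return the port a client would connect to for the given URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port               # a port written in the URL wins
    return DEFAULT_PORTS[parts.scheme]  # otherwise fall back to the default

print(effective_port("http://freepressunlimited.org/"))   # 80
print(effective_port("https://freepressunlimited.org/"))  # 443
print(effective_port("http://example.org:8080/"))         # 8080
```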