A Basic Rundown of How the Internet Works
So how does the internet work? How does your desktop, phone, laptop, Xbox, television, all get access to YouTube or Facebook? It’s actually kind of miraculous that something that was once used as a long distance communication protocol to share research amongst universities is now being used as what I think is the hottest commodity of the modern world. What’s even more miraculous is that the internet is just a bunch of machines playing the largest game of telephone at an incredibly blistering rate.
Sections
- Finding the IP address with DNS
- Creating a connection with TCP
- Requesting and Sending Data with HTTP
- Rendering the DOM
Finding the IP address with DNS
So the first thing your browser needs is a URL, the address of the website you want it to fetch. Okay perfect, you’ve got one: youtube.com, google.com, facebook.com, all your favorite sites have names. Unfortunately, computers don’t handle human-friendly names very well. What they need is an IP address, which in its most familiar form (IPv4) is a series of four numbers from 0 to 255 separated by periods. Fortunately for the browser, there is a distributed library of domain-name-to-IP-address mappings that it can use to find the IP address behind a domain. This ‘library’ is called the Domain Name System (DNS), and the machines that answer its queries are called DNS servers or name servers.
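To make the “four numbers from 0 to 255” format concrete, here is a minimal Python sketch using the standard library’s ipaddress module. The helper names are my own, and 192.0.2.1 is an address reserved for documentation, not the address of any real site.

```python
import ipaddress

def describe_ipv4(text: str) -> list:
    """Return the four numbers (octets) that make up an IPv4 address."""
    address = ipaddress.IPv4Address(text)  # raises ValueError if malformed
    return list(address.packed)            # four bytes, each in the range 0-255

def is_valid_ipv4(text: str) -> bool:
    """True if the text is four dot-separated numbers, each 0-255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False

print(describe_ipv4("192.0.2.1"))
print(is_valid_ipv4("192.0.2.256"))  # 256 does not fit in one octet
print(is_valid_ipv4("youtube.com"))  # a name, not an address
```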
If I may extend that terrible library analogy, think of your configured DNS server as a librarian and all the domains and IP addresses on the internet as the books. There are many DNS servers in the library of IP addresses, just like there are many librarians in a library. When you ask your local librarian to find a book for you, they may know where it is immediately (usually because someone asked recently and they remember the answer), or they may need to ask someone higher up. It’s the same with DNS servers. Each one might not have the answer, but each has a list of others it can consult. This can repeat several times up the chain, and the asking around continues until it reaches the root name servers, the Head Librarians of the system.
The root name servers don’t have the answer directly, but they know the perfect librarian who does. They pick that librarian by checking the top-level domain (TLD), the rightmost part of the domain name. If you’re asking for a name that ends in .com, the root server gives you a name server that specializes in domains ending in .com; if it ends in .net, you get a name server that specializes in .net. That name server in turn points to the server responsible for the specific domain, which holds the actual IP address. Your original librarian, the DNS server you first asked, finally gets the answer and passes it back to the requester, the browser.
You can query DNS yourself to find the IP address of your favorite sites with a tool like who.is. Try putting your favorite site in there and pasting the IP address into your browser’s address bar. It may seem strange to put a bunch of cryptic numbers in your browser, but that’s essentially what’s happening in the background whenever you type in a site’s name. (Some sites won’t load this way, because many servers host several sites on one IP address and rely on the name in the request to tell them apart.)
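The lookup chain described above can be sketched as a toy model in Python. This is not how a real resolver is implemented; the server names, the tables, and the IP address are all made up for illustration, and real DNS involves network queries, record types, and expiry times that are left out here.

```python
# Toy model of the lookup chain: root -> TLD server -> the domain's own server.
# Every name and address below is hypothetical.
ROOT = {"com": "tld-com", "net": "tld-net"}        # root knows who handles each TLD
TLD_SERVERS = {
    "tld-com": {"example.com": "ns-example"},      # TLD server knows each domain's name server
    "tld-net": {},
}
AUTHORITATIVE = {
    "ns-example": {"example.com": "192.0.2.10"},   # this server holds the real answer
}

class Resolver:
    """The 'local librarian': answers from memory or asks up the chain."""

    def __init__(self):
        self.cache = {}
        self.queries_sent = 0

    def resolve(self, domain):
        if domain in self.cache:                   # already known, answer immediately
            return self.cache[domain]
        tld = domain.rsplit(".", 1)[-1]            # rightmost part, e.g. "com"
        self.queries_sent += 1
        tld_server = ROOT.get(tld)                 # 1. ask a root server
        if tld_server is None:
            return None
        self.queries_sent += 1
        name_server = TLD_SERVERS[tld_server].get(domain)  # 2. ask the TLD server
        if name_server is None:
            return None
        self.queries_sent += 1
        ip = AUTHORITATIVE[name_server].get(domain)        # 3. ask the domain's server
        if ip is not None:
            self.cache[domain] = ip                # remember it for next time
        return ip

resolver = Resolver()
print(resolver.resolve("example.com"))  # walks the full chain
print(resolver.resolve("example.com"))  # answered from the cache
print(resolver.queries_sent)            # no new queries for the second lookup
```

If you want a real answer rather than a toy one, Python’s socket.getaddrinfo hands the question to your operating system’s configured resolver.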
Creating a connection with TCP
So now that the browser has the IP address we just need to send a message asking the server that’s hosting the website to give us the webpage, but before we do that, we need to check if anybody is even there to respond.
If we may continue using terrible analogies, imagine that your browser and the server are testing a snail-mail delivery system for the first time and want to validate that they can send each other letters accurately. Your browser initiates the conversation by sending a letter to the server’s IP address with a header saying “Hello, my name is ‘Browser Client’. My sequence number is 10000. Please acknowledge back with my sequence number + 1, and your sequence number, so that I know you got this message accurately”. Days pass until finally the mailman comes with a response from the server. Browser hurriedly scans the header of the letter, which reads “Hello ‘Browser Client’, my name is ‘Server’. I acknowledge your sequence number is now 10001, my sequence number is 90000. Please respond back with my sequence number + 1 to confirm that you got my message clearly”. Finally the Browser Client sends one last letter to seal the deal and open a connection, something like “Hello ‘Server’, nice to hear from you. I acknowledge that your sequence number is now 90001”. The acknowledgements that client and server send each other let them know that their messages have been received and that they are now synced and ready to exchange more data.
So as you might have guessed, the letters represent TCP packets, the headers of the letters represent the TCP header fields (the SYN and ACK flags plus the sequence and acknowledgement numbers), and the postal service is the routers and switches that deliver the letters. There really is no body in the letters yet, since we’re just trying to establish a connection and have not yet passed any data beyond what’s in the headers. This exchange is known as the three-way handshake, and once it succeeds both Browser Client and Server can start sending each other data.
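Here is the same exchange of letters as a small Python sketch. It only models the numbers in the headers; the Segment class and function name are my own inventions, and a real handshake is carried out by the operating system, not by application code like this.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    """Only the header fields needed for the handshake; there is no body yet."""
    flags: str
    seq: int
    ack: Optional[int] = None

def three_way_handshake(client_isn: int, server_isn: int) -> list:
    # 1. Client -> Server: "my sequence number is client_isn"
    syn = Segment(flags="SYN", seq=client_isn)
    # 2. Server -> Client: acknowledges client_isn + 1 and shares its own number
    syn_ack = Segment(flags="SYN-ACK", seq=server_isn, ack=syn.seq + 1)
    # 3. Client -> Server: acknowledges server_isn + 1, and the connection is open
    ack = Segment(flags="ACK", seq=syn_ack.ack, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]

# The same numbers as in the letters above.
for segment in three_way_handshake(client_isn=10000, server_isn=90000):
    print(segment)
```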
An open connection means both client and server can start sending each other text, images, videos, binary files, anything that can be serialized into bits. Each side picks its initial sequence number at random, and from then on the sequence number advances by the number of bytes sent. The receiver acknowledges with the number of the next byte it expects, so the sender can check that the acknowledgement matches its starting sequence number plus the bytes it has sent so far. If an acknowledgement doesn’t arrive, or a packet fails its checksum, the sender transmits the data again. This is why TCP is considered a reliable protocol: it catches lost or corrupted packets and resends them.
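The byte counting can be shown with a little arithmetic in Python. This is a simplified sketch with hypothetical function names; it assumes every chunk arrives intact and ignores details like sequence numbers wrapping around.

```python
def expected_ack(seq: int, payload: bytes) -> int:
    """The receiver acknowledges the next byte it expects: seq + bytes received."""
    return seq + len(payload)

def send_all(initial_seq: int, chunks: list) -> list:
    """Return (seq, ack) pairs for a run of chunks that all arrive intact."""
    seq = initial_seq
    log = []
    for chunk in chunks:
        ack = expected_ack(seq, chunk)
        log.append((seq, ack))
        seq = ack  # the next chunk starts where the last one ended
    return log

def needs_retransmit(seq: int, payload: bytes, ack_received: int) -> bool:
    """If the acknowledgement does not cover everything sent, send it again."""
    return ack_received < expected_ack(seq, payload)

print(send_all(10001, [b"hello", b" world"]))
print(needs_retransmit(10001, b"hello", ack_received=10001))
```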
Requesting and Sending Data with HTTP
So now we know the address, and now we’ve secured a connection, so it’s time to start checking what our friends are doing on Facebook or check out that new movie on Netflix. The way that information is passed between the client and the server is through the HyperText Transfer Protocol (HTTP). The protocol defines the language that both client and server use when communicating with each other. There’s a set of agreed-upon actions, called methods, that can be used when communicating through HTTP. The most commonly used ones are GET and POST. GET usually indicates the client receiving a file from the server; it’s akin to a READ action in terms of local storage. POST usually indicates a transfer of data that will change something on the server, akin to a WRITE action. Let’s simulate a GET request.
The Browser Client sends a GET request asking for some content, let’s say an index.html file. This GET request contains information that the server program uses to process the request and send back the appropriate file, text, image, or binary data. The server software can read extra details like the URL, the attached cookies, and the user-agent to decide which kind of response to send back. The server then responds with a response message, usually with a data payload, along with the status code and important headers like Content-Type and Content-Length so that whoever is receiving it can interpret the data correctly. In this case, the browser sends the GET request for index.html to the server and waits for a response. The server receives the GET request for index.html and sends back a response with index.html in the body, the status set to ‘200 OK’, and the Content-Type set to text/html. The browser then receives the response, identifies that it is an HTML file, and displays it to the user appropriately. If the content type is an image, it’ll try to render an image from the data; if the content type is JSON, it’ll try to display the JSON (or hand it to a script to turn into a JavaScript object); if the content type is XML, it’ll try to display the XML. But usually, the content type is plain old HTML.
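To show what actually travels over the connection, here is a minimal Python sketch that builds the text of a GET request and picks apart a response. The response is hand-written to stand in for what a server might send, and the function names and the toy-browser user-agent string are assumptions of mine.

```python
def build_get_request(host: str, path: str) -> bytes:
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: toy-browser/0.1",  # hypothetical user-agent string
        "Connection: close",
        "",                             # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_response(raw: bytes):
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])  # "HTTP/1.1 200 OK" -> 200
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body

# A hand-written response standing in for the server's reply.
page = b"<html><body>Hi</body></html>"
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    + f"Content-Length: {len(page)}\r\n".encode("ascii")
    + b"\r\n"
    + page
)

print(build_get_request("example.com", "/index.html").decode("ascii"))
status, headers, body = parse_response(raw_response)
print(status, headers["content-type"], len(body))
```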
POST requests are sent in much the same way. The primary difference is that a POST carries its data in the request body, so it isn’t subject to the practical limits on the type and size of data that can be squeezed into a GET URL. As I said earlier, POSTs are usually used to tell the server to change or update something. The role of POST often gets a little muddled with other HTTP request methods such as PUT and DELETE, which place and delete the resource at the requested URI, respectively. This has kind of been a problem with the web, as these HTTP actions are not hard rules but rather guidelines open to interpretation. Mechanically, most of the verbs look the same: a URI, a set of headers, and an optional data body. This is why it’s up to the developer of the server software to decide how to handle each verb. Most of the time it just boils down to this: if it changes something on the server, make it a POST.
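Since the handling of each verb is up to the server’s developer, here is one possible interpretation as a toy Python handler. Everything here is hypothetical: the store is just a dictionary, and this particular server chooses to treat POST and PUT the same way, which a real application might not.

```python
def handle(store: dict, method: str, path: str, body: str = ""):
    """Return (status code, response body) for one request against `store`."""
    if method == "GET":                # read: changes nothing on the server
        if path in store:
            return 200, store[path]
        return 404, ""
    if method in ("POST", "PUT"):      # this toy server treats both as a write
        created = path not in store
        store[path] = body
        return (201 if created else 200), body
    if method == "DELETE":             # remove the resource if it exists
        if store.pop(path, None) is None:
            return 404, ""
        return 204, ""
    return 405, ""                     # any other method is not allowed

store = {}
print(handle(store, "POST", "/note", "hi"))
print(handle(store, "GET", "/note"))
print(handle(store, "DELETE", "/note"))
print(handle(store, "GET", "/note"))
```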
Why do we use HTTP and not a protocol that’s more suited to the modern web? Why do we still use a protocol that handles only one request at a time per connection and, in its original form, had to open a new connection with the same three-way handshake for each request? (HTTP/1.1 can keep a connection open and reuse it, but requests on that connection are still answered one at a time, in order.) From what I’ve read, it’s because of legacy reasons and pretty much the fact that the web grew up with the protocol. People are just used to it now and are slow to change. Firewalls are typically configured to allow traffic on the standard HTTP port 80, which lets communication flow between server and browser. There has been a push for an upgrade to the protocol called HTTP/2 (standardized in 2015), which reduces the constant request overhead by compressing header information, minimizing constant handshaking, multiplexing many requests over a single TCP connection, and letting the server proactively push responses, amongst other cool things that are certain to speed up the web even more. The great thing is that modern browsers and web companies are actively trying to support this new protocol in the race to be the fastest browser and to serve their content as quickly as possible.
Rendering the DOM
So we have the address, we have the connection, we sent the request, and now we have our index.html file. All the browser needs to do now is show it to you. Since the content type is text/html, the browser will do what it does best and create the Document Object Model (DOM). The DOM is an object tree built from the nodes in the HTML file, which the browser uses to process the page more easily. While the browser parses the HTML file and creates the DOM, it also issues new HTTP GET requests for each extra resource it comes upon, such as CSS files, JavaScript files, images, tracking tags, iframed content, and so on. The browser grabs all available CSS files and generates a corresponding CSS Object Model (CSSOM). The CSSOM is applied to the DOM, which ultimately creates the Render Tree. The Render Tree notes the dimensions, layout, and location of each document object and awaits the draw phase. If we want to continue with the analogies, I guess we can compare the Render Tree to the final sketch of a painting before we color it in permanently on the canvas that is our browser window. The end result is your fully painted index.html homepage.
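A browser’s parser is far more elaborate than this, but the idea of turning HTML into a tree while noting extra resources to fetch can be sketched with Python’s built-in html.parser module. The Node and DomBuilder classes, the short list of void tags, and the sample page are all my own assumptions for illustration.

```python
from html.parser import HTMLParser

VOID_TAGS = {"img", "link", "meta", "br", "hr", "input"}  # tags with no closing tag

class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []

class DomBuilder(HTMLParser):
    """Builds a simple tree of Nodes and notes resources that need fetching."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # path from the root to the currently open element
        self.resources = []       # URLs that would each need another GET request

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        url = node.attrs.get("href") if tag == "link" else node.attrs.get("src")
        if url:
            self.resources.append(url)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

def outline(node, depth=0):
    """Return the tree as indented lines, one tag per line."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines

page = """<html><head><link rel="stylesheet" href="style.css">
<script src="app.js"></script></head>
<body><h1>Hello</h1><img src="cat.png"></body></html>"""

builder = DomBuilder()
builder.feed(page)
print("\n".join(outline(builder.root)))
print(builder.resources)
```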
Of course, this is how it might have been fifteen years ago on very generic, simple pages, but today we’re using a ton of cool technologies that calculate and create custom-generated Facebook feeds, selective YouTube recommendations, Netflix movie suggestions, and tailored Amazon shopping experiences, all within milliseconds. Today, HTML pages are rarely static; they are influenced by a wide range of things, from the movies that you watch to the things that you buy to the sites that you visit. All of these advanced and cool things are still possible just through the simple bits of data that are passed along with very basic HTTP requests. We haven’t even gotten to the ability of sites to update dynamically with JavaScript, but you can imagine how that adds another level of interactivity to a web page.
And That’s That
This explanation really is just the start of it. There’s so much more that goes down in each phase that it’s pretty incredible something like this can come together at all, let alone quickly. The arms race for browsers to render even faster and companies to serve their content as fast as possible is well underway and is definitely going to be the focal point of web development in the coming years. It’s amazing that even with such a simple communication protocol we’re able to get the web to where it is today, as a marketplace, as entertainment, as a collaborative platform… Communication at the speed of light sure can achieve great things.
Resources
More information about DNS
More information about the Three Way Handshake
More information about Request and Response headers
More information on how browser rendering works
A simpler explanation of how browser rendering works