WHAT HAPPENS WHEN YOU TYPE HTTPS://WWW.GOOGLE.COM
- Collins Kim
- May 11, 2024
- 7 min read
Updated: May 12, 2024
Every day, we pick up our devices and key in requests on our favorite browsers with the intent of accessing internet resources such as web pages, documents, or images. Today, I will explore the intricacies of what happens when you input a query (www.google.com) in your browser's search bar and hit enter.
LEGEND
Before we proceed, it is wise to shed light on some key terms and abbreviations that will be used or referenced within this document.
Internet - a collection of interconnected computers working coherently to enable seamless communication.
Domain Name - The actual name of your website.
DNS (Domain Name System) - The system that maintains and maps the domain name of your website to the server hosting it somewhere on the internet.
Uniform Resource Locator (URL) - This is the unique address used to identify a resource over the internet, e.g. google.com/landing_page. The landing page here is contained within a directory on a Google server.
google.com (domain name) is the human-readable form of a website, established by engineers so that we do not have to remember the Internet Protocol (IP) addresses mapped to our websites. Its equivalent IP address could be, for IPv4, 8.8.8.8 and/or 8.8.4.4, and for IPv6, 2001:4860:4860::8888 and/or 2001:4860:4860::8844. Trying to memorize these number sets off-head is truly daunting for us humans.
Internet Protocol (IP) is a set of rules followed to enable smooth communication regardless of the different underlying technologies.
An IP address is a unique number that identifies a device within a network, and it can be either an IPv4 or an IPv6 address, as seen previously.
With the above stated and out of the way, I believe we can proceed without any ambiguity. There are specific steps followed after pressing enter, all stated below.
DNS Look-up
TCP/IP
SSL Handshake
HTTPS request
Server-Side operations
DNS LOOK-UP
When you type a domain name and hit enter, and it is the first time that domain is queried on the device, the look-up is first taken to the DNS resolver, whose prime objective is to forward the query to the root servers.
The root servers are finite in number and located across different zones, and there are 13 distinct IP addresses used to query them. The main role of the root servers is to direct the request to the Top Level Domain (TLD) server.
A TLD nameserver maintains information for all the domain names that share a common domain extension, such as .com, .net, or whatever comes after the last dot in a URL. For example, a .com TLD nameserver contains information for every website that ends in '.com', and in our case, it will be the server of choice.
TLD nameservers then point to the Authoritative Server, which holds the actual IP address of interest. The IP address is usually found within an A (or AAAA) record, one of the many DNS records that serve a variety of functions in DNS resolution.
DNS records - These are text instructions stored on DNS servers. They indicate the IP address associated with a domain and can provide other information as well, e.g. A records, AAAA records, CNAME records, etc.
From here, the IP address is then routed back to the client computer that queried it. This whole retrieval process would be expensive if the user had to repeat it every time the website is visited within a short time frame, and that is where caching comes into place.
Instead of repeating the whole DNS look-up process, the IP address will be stored within the browser cache or the operating system cache. This ensures that retrieval costs are reduced, thus increasing efficiency.
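To make the look-up concrete, here is a minimal sketch using Python's standard socket module. It simply asks the system's configured resolver (which benefits from the caches described above) for the addresses behind a domain; www.google.com is used purely for illustration.

```python
import socket

def resolve(domain: str) -> list[str]:
    """Return the unique IP addresses a domain resolves to."""
    # getaddrinfo queries the configured DNS resolver (or a local cache).
    infos = socket.getaddrinfo(domain, None)
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    for address in resolve("www.google.com"):
        print(address)  # a mix of IPv4 and IPv6 addresses
```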
Transmission Control Protocol/Internet Protocol (TCP/IP)
By now, our endpoint device has the address it is interested in, and it is all a matter of establishing a proper channel for communication. There exists a suite of protocols that makes this all possible, namely the TCP/IP protocol suite.
This suite roughly maps onto the Open Systems Interconnection (OSI) model, a reference model that acts as a blueprint for establishing systems that are meant to connect and communicate with others.
TCP/IP usually operates under the client/server model, in which a remote machine somewhere offers its services to clients intending to connect to it. Collectively, the TCP/IP suite of protocols is classified as stateless, which means each client request is considered new because it is unrelated to previous requests. Being stateless frees up network paths so they can be used continuously.
The transport layer itself, however, is stateful. It transmits a single message and its connection remains in place until all the packets in a message have been received and reassembled at the destination.
The 4 layers of the TCP/IP model
Network Link Layer: This entails the interconnection of nodes at the physical level and ensures a proper pipeline between the nodes.
Internet Layer: Connects independent networks for proper routing of packets.
Transport Layer: Ensures streamlined delivery of packets between different endpoints, with the help of TCP and UDP.
Application Layer: It exposes users to the underlying services with the help of HTTP, HTTPS, FTP, DHCP, etc.
Now we are ready to get to the juicy part of things. With access to the IP address, we are now able to request services from Google, and for that, a request has to be generated. The browser will issue a GET request, which will contain various headers. The request is then broken down into packets, ready to be transmitted over TCP through the internet to the destination address.
Before the packets can be delivered, a proper channel has to be established between source and destination, and for that, a set of procedures has to be followed prior to connection.
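Under the hood, the browser first opens a TCP connection to the resolved address on port 443 before any request is sent. A rough sketch of that step, assuming Python's socket module (the TCP three-way handshake happens inside create_connection):

```python
import socket

# Illustrative host and the standard HTTPS port.
HOST, PORT = "www.google.com", 443

# create_connection performs the TCP three-way handshake for us.
with socket.create_connection((HOST, PORT), timeout=5) as conn:
    print("Connected to", conn.getpeername())
    # At this point a raw, unencrypted TCP channel exists; the SSL/TLS
    # handshake and the HTTP request itself come in the next steps.
```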
Website Validation
Modern websites have to verify their authenticity so that users can trust their resources and interact with their services. This is achieved by deploying Secure Socket Layer (SSL) certificates, which leverage the SSL/TLS protocol. These ensure that the actual vendor of services is a genuine company, trusted and registered as a licensed establishment. The client can then establish a connection by performing an SSL handshake before any request can be made.
SSL Handshake process
Every time a user wants to interact with the website, the browser in use will request the SSL certificate, which will be sent along with a public key (one of the key pair that will be used to encrypt the data).
Afterwards, a set of secrets will be shared between them, and the end result will be a secure tunnel where data is encrypted before being sent; without the private key, it cannot be deciphered. This keeps the data secure even in the event that hackers get ahold of the encrypted packets.
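A minimal sketch of that handshake, assuming Python's built-in ssl module; wrapping the TCP socket is what triggers the certificate exchange and key negotiation described above:

```python
import socket
import ssl

HOST = "www.google.com"  # illustrative host

# The default context loads the system's trusted root certificates and
# verifies the server's certificate during the handshake.
context = ssl.create_default_context()

with socket.create_connection((HOST, 443)) as tcp_sock:
    with context.wrap_socket(tcp_sock, server_hostname=HOST) as tls_sock:
        # The handshake has completed by the time wrap_socket returns.
        print("Negotiated protocol:", tls_sock.version())   # e.g. TLSv1.3
        print("Cipher suite:", tls_sock.cipher()[0])
        print("Certificate subject:", tls_sock.getpeercert()["subject"])
```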
HTTPS request
Web technologies develop every day and grow in ever increasing numbers, but some sites do not deploy safe channels between their infrastructure components. This exposes the traffic travelling to and from those components to malicious actors, who can obtain sensitive information such as passwords travelling within the network. With that, some form of secure channel had to be established, hence the emergence of HTTPS.
HTTPS is a secure version of HTTP in that it ensures an encrypted channel is established, where data is made inaccessible with the help of cryptographic keys. This form of infrastructure is widely referred to as public key infrastructure (PKI), which entails using a set of generated keys (a public and a private key) to ensure information isn't tampered with. It works hand in hand with SSL, as we saw previously, to ensure a safe haven for data.
Most websites these days adopt HTTPS over HTTP as a means of assuring users that their private information is kept secure, as well as maintaining authenticity.
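Putting the pieces together, here is a small sketch of the actual HTTPS GET request using Python's http.client module; the headers are illustrative, and in practice the browser generates all of this for you:

```python
import http.client

# Open an encrypted connection and issue the GET request the browser would send.
conn = http.client.HTTPSConnection("www.google.com", timeout=5)
conn.request("GET", "/", headers={"Accept": "text/html",
                                  "User-Agent": "example-client"})

response = conn.getresponse()
print(response.status, response.reason)        # e.g. 200 OK
print(response.getheader("Content-Type"))      # e.g. text/html; charset=...
body = response.read()
print(len(body), "bytes of HTML received")
conn.close()
```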
TCP/UDP
After a secure pipeline is established between endpoints, information can then be broken down into packets and transmitted safely in a definite order. Packets are usually sent out with the intent that they will arrive at the destination safely, but that is not always the case. Some packets are at times lost, and it is the duty of the underlying system to ensure the packets are received in order and reconstructed accordingly if needed.
TCP generally ensures that all packets sent are received accordingly, and ensures that lost packets are resent so that the information received on both ends makes sense.
UDP contradicts this theory. It operates by sending the packets to the destination without regard to whether they were received or not. It might seem useless to use UDP, but it serves a greater purpose in some use cases, such as online gaming.
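A small sketch contrasting the two, assuming Python sockets and made-up local endpoints: the UDP send simply fires the datagram off with no delivery guarantee, while the TCP connect only succeeds once the handshake completes and fails loudly otherwise.

```python
import socket

HOST, TCP_PORT, UDP_PORT = "127.0.0.1", 8080, 9090  # hypothetical endpoints

# UDP: fire-and-forget. sendto returns as soon as the datagram leaves;
# nothing tells us whether it ever arrived.
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_sock.sendto(b"player position update", (HOST, UDP_PORT))
udp_sock.close()

# TCP: connection-oriented. connect() only succeeds once the handshake
# completes, and lost segments are retransmitted automatically.
try:
    with socket.create_connection((HOST, TCP_PORT), timeout=2) as tcp_sock:
        tcp_sock.sendall(b"important data that must arrive in order")
except OSError as exc:
    print("TCP could not establish or guarantee delivery:", exc)
```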
SERVER-SIDE OPERATIONS
At this point, the connection is set up and people can make requests and receive responses in a secure fashion. But what if two, five, or a million requests are made to the site? Traffic to the site has to be handled in a way that keeps the whole system live throughout while taking optimal time to do so. With that, it is also key to mention a few technologies affiliated with servers.
Server - This is basically a computer with high processing power that offers its services to clients trying to connect to it.
Web Server - It serves static content such as web pages in HTML form. The response generated from the request will, by default, include the search box page, which you can use to search for other resources within the World Wide Web.
Application Server - Usually exposes the business logic through a communication protocol and is usually in charge of delivering dynamic content.
Load Balancer - This is a server whose main purpose is to direct traffic to the appropriate servers so as to maintain full-time connectivity.
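To picture the load balancer's job, here is a toy round-robin sketch in Python. The backend addresses are made up, and real balancers such as those in front of Google also perform health checks, TLS termination, and far smarter routing:

```python
from itertools import cycle

# Hypothetical pool of backend web servers sitting behind the balancer.
BACKENDS = ["10.0.0.11:80", "10.0.0.12:80", "10.0.0.13:80"]
next_backend = cycle(BACKENDS)

def route(request_id: int) -> str:
    """Assign each incoming request to the next backend in turn."""
    backend = next(next_backend)
    return f"request {request_id} -> {backend}"

for i in range(1, 7):
    print(route(i))  # requests are spread evenly across the pool
```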
Of course, Google has the capability to maintain full availability and attain redundancy to curb single points of failure (SPOF) in case of emerging events, but still, there are cases where hackers of ill intent cause damage to infrastructure by a variety of means, such as malicious bots.
Firewalls are thus a preventive measure to ensure the wrong traffic is deterred from accessing critical infrastructure. Their main functionality is to filter packets going in and out to prevent unauthorized access or attack events.
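Conceptually, that filtering boils down to matching each packet against an allow/deny rule set. A toy sketch of the idea in Python; the rules and addresses are made up, and real firewalls operate at the kernel or network-appliance level rather than in application code:

```python
# Hypothetical rule set: only web traffic is allowed in, everything else is dropped.
ALLOWED_INBOUND_PORTS = {80, 443}
BLOCKED_SOURCES = {"203.0.113.66"}  # e.g. a known malicious bot

def allow_packet(src_ip: str, dst_port: int) -> bool:
    """Return True if the packet should be let through."""
    if src_ip in BLOCKED_SOURCES:
        return False
    return dst_port in ALLOWED_INBOUND_PORTS

print(allow_packet("198.51.100.7", 443))   # True  - normal HTTPS traffic
print(allow_packet("203.0.113.66", 443))   # False - blocked source
print(allow_packet("198.51.100.7", 22))    # False - SSH not exposed publicly
```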
It goes without saying that Google's servers are also configured accordingly so as to offer maximum service as well as ensure everything is done securely. This includes:
Redirecting HTTP traffic to HTTPS (a toy sketch follows this list)
Configuring servers with SSL certificates
Using appropriate balancing algorithms to redirect traffic
Proper database configuration to ensure it is free from attacks like SQL injection
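As a toy illustration of the first item above, a minimal sketch of redirecting HTTP traffic to HTTPS using Python's built-in http.server; in production this is normally a one-line rule in the web server or load balancer configuration rather than application code:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectToHTTPS(BaseHTTPRequestHandler):
    """Answer every plain-HTTP request with a permanent redirect to HTTPS."""

    def do_GET(self):
        host = self.headers.get("Host", "www.example.com")  # placeholder fallback
        self.send_response(301)
        self.send_header("Location", f"https://{host}{self.path}")
        self.end_headers()

if __name__ == "__main__":
    # Illustrative port; a real deployment would listen on port 80.
    HTTPServer(("0.0.0.0", 8080), RedirectToHTTPS).serve_forever()
```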
In conclusion, this is a quick overview of what happens when using a web browser to search for resources. There might be other key topics left out, but take this as the fundamentals and build on them to gain more insights later on.