The Web is vast: an ever-expanding universe of resources, a series of interconnected pages that reaches beyond physical boundaries, time zones and languages. Small pockets of it group together like planetary clusters, forming mini sites and communities. Some are larger than others, some sit off the beaten track, and others are purposefully hidden in the dark corners of the web.
The arduous task of keeping an index, and any semblance of order, for this rapidly expanding universe falls to a select few of the wealthiest corporations in the world: corporations with the skill, resources and capital to invest in such a monumental endeavour. You may recognise household names such as Google, Microsoft and Yahoo amongst the interested parties tackling this issue.
Through statistical analysis and working theory, it is generally accepted that the Internet is expanding at an exponential rate: as access to the Internet increases and publishing a website becomes easier, more and more people are able to contribute.
To put the scale of the web into context, it is estimated that there are currently 4.5 Billion users of the Internet worldwide and over 1.7 Billion websites.
Luckily, it is estimated that only around 25% of registered websites are actually live and active; the rest are simply parked or inactive. This still leaves a huge indexing problem, however, and Google, the world's largest indexing engine, recently updated their How Search Works page to reflect that. There, they state that they know of at least 130 Trillion pages.
That's more than 130,000,000,000,000 pages
This means that, as a bare minimum, the web is at least this size: an increase of over 100 Trillion pages since Google first launched their How Search Works page back in March 2013.
According to Google, it takes 100 Million Gigabytes of data to store the index of just a fraction of these pages. Miraculously, when you run a search query, they manage to parse it, understand it, push it through several filters and weigh it against more than 200 other indicators to produce a final set of links considered the most relevant to you. What's even more remarkable is that they do all of this within 1/8th of a second. That's some feat of engineering, and perhaps why the overwhelming majority of the world's Internet users tend to use Google Search to find the things they're after.
HOW DO EFFECTIVE SEO PRACTICES FEATURE IN CRAWLING THE WEB?
So, how does this relate to SEO, you may ask? Well, SEO stands for 'Search Engine Optimisation': the act of optimising your website so that it is ready to be crawled by search spiders. In order to index and rank your website, Google and the other search engines have to crawl it first, making special note of a whole plethora of indicators that help them assess the quality of your pages and their relevance to the terms used by the searcher.
As you can imagine, trying to index even a fraction of 130 Trillion pages is a huge ask. To get through that many pages, concessions have to be made, and concepts like 'Crawl Budget', 'Crawl Frequency' and 'Depth of Crawl' were created to help manage crawling at scale.
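To make those concepts concrete, here is a minimal Python sketch of how a crawler might enforce a budget and a depth limit. It is purely illustrative: the site structure, the budget of five pages and the depth limit of two are invented for the example, and this is not how Googlebot actually allocates its resources.

from collections import deque

# A toy site structure: each page lists the pages it links to.
# Purely illustrative - a real crawler fetches and parses live HTML.
SITE = {
    "/": ["/products", "/blog", "/about"],
    "/products": ["/products/widget-a", "/products/widget-b"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/about": [],
    "/products/widget-a": [],
    "/products/widget-b": [],
    "/blog/post-1": [],
    "/blog/post-2": [],
}

def crawl(start="/", crawl_budget=5, max_depth=2):
    """Breadth-first crawl that stops when the budget or depth limit is hit."""
    seen, indexed = {start}, []
    queue = deque([(start, 0)])           # (url, depth from the start page)
    while queue and len(indexed) < crawl_budget:
        url, depth = queue.popleft()
        indexed.append(url)               # 'index' the page
        if depth >= max_depth:
            continue                      # too deep - spend the budget elsewhere
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return indexed

print(crawl())   # pages deeper in the hierarchy may never be reached

Run it and you will see that once the budget is spent, the deepest pages simply never get visited, which is exactly the problem crawl optimisation tries to solve.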
For example, one of the many things a reputable SEO expert looks at is a website's crawlability and its 'Crawl Budget' - the amount of time the search spiders will spend crawling your site before they move on. The spiders tend to start with the home page, the pages listed in a sitemap file, or the pages of your site with the greatest PageRank (i.e. the pages with the largest number of inbound links pointing to them from other pages). After that, assuming there is any 'budget' left, the spiders will crawl and index the remaining pages. Pages deeper in a site's hierarchy usually receive less PageRank and so tend to be crawled less often. By improving things such as the speed of your site, using a sensible page structure and adding indicators for the search bots in sitemaps or meta tags, your crawl budget can be maximised. An efficient, crawlable website results in more of your content being indexed and hence a better chance of it being discovered in the search results.
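As an example of one of those indicators, the short sketch below builds a minimal sitemap.xml in the format described by the sitemaps.org protocol. The URLs and dates are placeholders; a real sitemap would list your own pages and would typically be referenced from your robots.txt file or submitted through Google Search Console.

from datetime import date
from xml.sax.saxutils import escape

# Hypothetical pages you want crawled first; the URLs are placeholders.
PAGES = [
    ("https://www.example.com/", date(2024, 1, 15)),
    ("https://www.example.com/products", date(2024, 1, 10)),
    ("https://www.example.com/blog/latest-post", date(2024, 1, 12)),
]

def build_sitemap(pages):
    """Return a minimal sitemap.xml string following the sitemaps.org protocol."""
    entries = "\n".join(
        "  <url>\n"
        f"    <loc>{escape(url)}</loc>\n"
        f"    <lastmod>{modified.isoformat()}</lastmod>\n"
        "  </url>"
        for url, modified in pages
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )

print(build_sitemap(PAGES))   # save the output as sitemap.xml at the site root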
After the search engines have indexed your site, they then have to categorise it, assess its quality and finally rank it for the content that it holds. This involves a multifaceted approach, but in Google's case roughly half of it is done on the fly. When a user submits a search query, Google produces a list of relevant results pulled from its index, then runs those results through several filters that look at factors such as PageRank, Trust, Bad Neighbourhoods, Locality, Topic, Content Quality, Dwell Time, Speed, Mobile Friendliness, Security and many, many more. The job of an SEO specialist is to optimise your website and its content to account for as many of these filters as is practically possible.
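Nobody outside Google knows exactly how those filters are combined, but the toy example below shows the general idea of blending several signals into a single score. The factor names, weights and numbers are entirely made up for illustration; they are not Google's, and real ranking is vastly more sophisticated.

# Illustrative only: the factors, weights and scores below are invented to
# show the idea of combining ranking signals - not Google's actual algorithm.
WEIGHTS = {
    "relevance": 0.35,
    "link_authority": 0.25,
    "content_quality": 0.20,
    "page_speed": 0.10,
    "mobile_friendly": 0.10,
}

candidates = {
    "/blog/deep-dive-guide": {"relevance": 0.9, "link_authority": 0.6,
                              "content_quality": 0.8, "page_speed": 0.7,
                              "mobile_friendly": 1.0},
    "/products/widget-a":    {"relevance": 0.7, "link_authority": 0.8,
                              "content_quality": 0.6, "page_speed": 0.9,
                              "mobile_friendly": 1.0},
}

def score(signals):
    """Weighted sum of normalised (0-1) signal scores."""
    return sum(WEIGHTS[name] * value for name, value in signals.items())

ranked = sorted(candidates, key=lambda url: score(candidates[url]), reverse=True)
print(ranked)   # the page with the best overall blend of signals ranks first

The point of the sketch is simply that no single factor wins on its own: a page that is strong across the board tends to beat one that excels in only one area.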
It's not a straightforward task. It can sometimes be difficult, and other times extremely time-consuming, but optimising your site for Search can mean the difference between winning and losing out to your competition. Search Engines have to show something to their users when they search. If it's not you, it will be your rivals.
By optimising your website for search, you help the Search Engines do their job more effectively. As a reward, your website features more prominently in the results and you benefit from it. Quid pro quo.
If you'd like to see how we can improve your rankings, talk to one of our friendly and helpful video SEO experts today.