When you sit at your computer and search on Google, you almost immediately get a list of results from across the Web. What systems does Google use to find web pages matching your query, and to determine the order of the search results?
To put it simply, searching the Web is like consulting a very large book whose index tells you exactly where each item is located. When you perform a Google search, Google's programs check that index to determine which results are the most relevant and show them to you.
These are the three main processes by which search results are provided:
Crawling: Does Google know about your site? Can it find it?
Indexing: Can Google index your site?
Publication: Does the site contain useful, high-quality content that is relevant to the user's query?
Crawling is the process by which the Google robot discovers new and updated pages and adds them to the Google index.
We use a huge number of computers to fetch (or “crawl”) billions of pages on the Web. The program responsible for fetching is the Google robot, also known simply as a robot or spider. The Google robot uses an algorithmic crawling process: software determines which sites to crawl, how often, and how many pages to fetch from each of them.
The Google crawl process begins with a list of web page URLs generated from previous crawls and augmented with Sitemap data provided by webmasters. As the Google robot visits each of these websites, it detects links on each page and adds them to its list of pages to crawl. New sites, changes to existing sites, and dead links are noted and used to update the Google index.
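The crawl loop described above, starting from a seed list, fetching each page, extracting its links, and growing the list of pages to visit, can be sketched in a few lines. This is a minimal illustration, not Google's actual code; the `fetch` callback and the idea of recording dead links are assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl: start from seed URLs (e.g. a previous crawl
    list plus Sitemap entries), fetch each page, and add any newly
    discovered links to the frontier of pages still to crawl."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    fetched = {}
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        html = fetch(url)            # page HTML, or None for a dead link
        if html is None:
            fetched[url] = None      # note the dead link for index updates
            continue
        fetched[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched
```

A real crawler would also respect robots.txt, rate-limit requests per host, and resolve relative URLs; those concerns are omitted here to keep the frontier logic visible.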
Google does not accept payment to crawl a site more frequently, and it keeps the search service separate from its revenue-generating AdWords program.
The Google robot processes each page it crawls to compile a massive index of all the words it sees, along with their location on each page. It also processes information included in key content tags and attributes, such as “title” tags and “alt” attributes. The Google robot can process many types of content, but there are certain types it cannot process, such as the content of some rich media files and dynamic pages.
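An index of “all the words along with their location on each page” is what is usually called an inverted index. The sketch below shows the basic idea under simplified assumptions (plain-text pages, whitespace tokenization); Google's real index is vastly more sophisticated.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Build an inverted index: each word maps to the (url, position)
    pairs where it occurs, mirroring the words-plus-locations index
    described in the text."""
    index = defaultdict(list)
    for url, text in pages.items():
        for position, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word].append((url, position))
    return index

def lookup(index, word):
    """Return the sorted list of URLs containing a single word."""
    return sorted({url for url, _ in index.get(word.lower(), [])})
```

Storing positions (not just URLs) is what lets a search engine match exact phrases and weigh how close query terms appear to each other on a page.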
Publication of results
When a user enters a query, our systems search the index for pages that match it and return the results considered most relevant to the user. Relevance is determined by more than 200 factors, one of which is a page's PageRank. PageRank represents the importance Google assigns to a page based on the links pointing to it from other web pages. In other words, each link to a page on your site from elsewhere on the Web adds to your site's PageRank. Not all links are equal: Google works to improve the service offered to users by identifying fraudulent links and other practices that negatively affect search results. The best types of links are those earned through quality content.
To get your site to rank well in search results pages, it is important to ensure that Google can crawl and index it correctly. In our Webmaster Guidelines we outline best practices that help you avoid common errors and improve your site's ranking.
The “Did you mean” and Google Autocomplete functions are designed to save users time, and to that end they show related terms, common misspellings, and popular queries. Like the search results on google.com, the keywords these functions use are generated automatically by our web crawlers and search algorithms. We show these predictions only when we consider that they can save users time. If a site ranks well for a keyword, it is because we have algorithmically determined that its content is more relevant to the user's query.
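At its simplest, suggesting completions for what the user has typed so far is a prefix lookup over a store of popular queries. The sketch below keeps the queries sorted so the lookup is a binary search; it is an illustration of the mechanism only, and says nothing about how Google actually selects or ranks its predictions.

```python
import bisect

def build_autocomplete(queries):
    """Store known popular queries in sorted order so that all queries
    sharing a prefix sit in one contiguous run."""
    return sorted(queries)

def suggest(sorted_queries, prefix, limit=3):
    """Return up to `limit` stored queries that start with `prefix`,
    using binary search to find the start of the matching run."""
    start = bisect.bisect_left(sorted_queries, prefix)
    matches = []
    for query in sorted_queries[start:]:
        if not query.startswith(prefix):
            break                      # past the run of matches
        matches.append(query)
        if len(matches) == limit:
            break
    return matches
```

A production system would rank the matches by popularity and handle misspellings (the “Did you mean” case) with fuzzier matching such as edit distance, rather than returning them in alphabetical order.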