Crawling
Trawler: This system is responsible for visiting and collecting data from web pages. It keeps track of which pages to visit, how often to visit them, and how frequently the content changes.
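The scheduling idea described above can be sketched as a priority queue in which pages that change often are revisited sooner. This is a minimal illustration under stated assumptions, not Trawler's actual design: the class names, the halve/double rule, and the interval bounds are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class CrawlEntry:
    next_visit: float                        # time at which the page is due (only field compared)
    url: str = field(compare=False)
    interval: float = field(compare=False)   # current revisit interval in seconds


class CrawlScheduler:
    """Hypothetical scheduler: revisit frequently changing pages sooner."""

    def __init__(self, min_interval=60.0, max_interval=86400.0):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self._queue = []

    def add(self, url, now, interval=3600.0):
        # New pages are due immediately.
        heapq.heappush(self._queue, CrawlEntry(now, url, interval))

    def next_due(self, now):
        """Pop the earliest-due entry, or return None if nothing is due yet."""
        if self._queue and self._queue[0].next_visit <= now:
            return heapq.heappop(self._queue)
        return None

    def reschedule(self, entry, now, changed):
        """Halve the interval if the content changed, double it otherwise."""
        factor = 0.5 if changed else 2.0
        interval = entry.interval * factor
        interval = min(self.max_interval, max(self.min_interval, interval))
        heapq.heappush(self._queue, CrawlEntry(now + interval, entry.url, interval))
        return interval
```

A crawler loop would call `next_due`, fetch the page, compare it with the stored copy, and pass the result to `reschedule`.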
Indexing
Alexandria: The core system that stores and organizes the collected web pages into an index so that they can be quickly retrieved during a search.
SegIndexer: This system categorizes documents into different tiers within the index, likely based on their relevance and importance.
TeraGoogle: A secondary indexing system for documents kept in long-term storage on disk, so that rarely accessed documents remain retrievable.
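To make the idea of index tiers concrete, here is a small sketch of an inverted index split into tiers by an importance score. The tier names, the thresholds, and the importance score itself are assumptions made for illustration; the source only says that documents are placed in different tiers, likely by relevance and importance.

```python
from collections import defaultdict

# Hypothetical tier names and thresholds, ordered from most to least important.
TIERS = [("fast", 0.8), ("standard", 0.4), ("long_term", 0.0)]


def assign_tier(importance):
    """Return the first tier whose threshold the importance score meets."""
    for name, threshold in TIERS:
        if importance >= threshold:
            return name
    return TIERS[-1][0]


def build_tiered_index(docs):
    """Build tier -> term -> set of doc ids from {doc_id: (text, importance)}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, (text, importance) in docs.items():
        tier = assign_tier(importance)
        for term in text.lower().split():
            index[tier][term].add(doc_id)
    return index


def search(index, term):
    """Check tiers in priority order and stop at the first tier with a hit."""
    for name, _ in TIERS:
        hits = index.get(name, {}).get(term.lower())
        if hits:
            return name, sorted(hits)
    return None, []
```

Searching the higher tiers first means that the long-term tier is only consulted when nothing more prominent matches.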
Rendering
HtmlrenderWebkitHeadless: A headless rendering system that executes the JavaScript on web pages so that their content can be understood and indexed correctly, even when it is generated dynamically.
Processing
LinkExtractor: Extracts and identifies links from web pages, understanding the relationships between different pages.
WebMirror: Detects duplicate content and decides which version of a page is the primary one (canonicalization).
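Both processing steps can be illustrated with the Python standard library. The sketch below extracts links with their anchor text and groups exact duplicates by a content hash. The function names are hypothetical, and choosing the shortest URL as the canonical one is an arbitrary rule picked for the example, not a description of how LinkExtractor or WebMirror work.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect (absolute_url, anchor_text) pairs from <a href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(base_url, html):
    parser = AnchorCollector(base_url)
    parser.feed(html)
    return parser.links


def pick_canonicals(pages):
    """Group {url: text} by a hash of normalized text; map canonical -> duplicates.

    The shortest URL in each group is treated as canonical (illustrative rule).
    """
    groups = {}
    for url, text in pages.items():
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups.setdefault(digest, []).append(url)
    return {min(urls, key=lambda u: (len(u), u)): sorted(urls) for urls in groups.values()}
```

An exact hash only catches identical text after normalization; near-duplicate detection would need a fuzzier fingerprint.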
Ranking
Mustang: The main system for scoring and ranking web pages to determine their order in search results.
Ascorer: The primary algorithm that initially ranks pages before any adjustments are made.
NavBoost: Adjusts rankings based on user behavior, such as clicks and interactions with search results.
FreshnessTwiddler: Modifies rankings based on the freshness of the content, favoring more recently updated pages.
WebChooserScorer: Defines the feature names used when scoring snippets for search results.
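The ranking flow described above, an initial score followed by adjustments, can be sketched as a base score passed through a chain of adjustment functions. Everything here is an assumption for illustration: the `Doc` fields, the click-share formula, and the freshness half-life are invented and do not reflect the real Ascorer, NavBoost, or FreshnessTwiddler logic.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    url: str
    base_score: float        # stand-in for the initial scorer's output
    good_clicks: int = 0
    bad_clicks: int = 0
    age_days: float = 0.0


def click_twiddler(doc, score):
    """Scale the score by the smoothed share of good clicks (0.5x to 1.5x)."""
    share = (doc.good_clicks + 1) / (doc.good_clicks + doc.bad_clicks + 2)
    return score * (0.5 + share)


def freshness_twiddler(doc, score, half_life_days=30.0):
    """Apply a boost of up to 20% that halves every `half_life_days`."""
    return score * (1.0 + 0.2 * 0.5 ** (doc.age_days / half_life_days))


def rank(docs, twiddlers):
    """Apply each adjustment in order, then sort URLs by final score, best first."""
    scored = []
    for doc in docs:
        score = doc.base_score
        for twiddle in twiddlers:
            score = twiddle(doc, score)
        scored.append((score, doc.url))
    return [url for score, url in sorted(scored, key=lambda pair: -pair[0])]
```

With an empty list of adjustments the order follows the base score alone; adding adjustments can reorder results without touching the initial scorer.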
Serving
Google Web Server (GWS): The server that interfaces with Google’s frontend, delivering the search results to users.
SuperRoot: The central system that coordinates the processing and presentation of search results.
SnippetBrain: Generates the snippets (short descriptions) shown in search results.
Glue: Combines different types of search results (e.g., web, images, videos) using user behavior data.
Cookbook: Generates various signals that influence ranking, potentially created at runtime.
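As a rough illustration of blending result types with user behavior data, the sketch below weights each result's relevance by the share of clicks its result type receives. The `blend` function, the `click_share` input, and the multiplication rule are hypothetical; the source does not describe how Glue actually combines results.

```python
import heapq


def blend(verticals, click_share, limit=5):
    """Merge per-type result lists, weighting relevance by click share.

    verticals:   {"web": [(title, relevance), ...], "images": [...], ...}
    click_share: {"web": 0.6, ...}, hypothetical share of clicks per result type
    """
    weighted = [
        (-relevance * click_share.get(name, 0.0), name, title)
        for name, results in verticals.items()
        for title, relevance in results
    ]
    # Scores are negated, so the smallest tuples are the best results.
    return [(name, title) for _, name, title in heapq.nsmallest(limit, weighted)]
```

A result type that users rarely click would need a much higher relevance score to displace a commonly clicked one.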
Key Ranking Signals Explained
- Click and User Behavior Signals
  - Clicks and Post-Click Behavior: Tracks measures such as the last good click, the longest click, and other user interactions to identify which results are most helpful.
  - NavBoost: Uses click data to adjust rankings based on how users interact with search results.
  - Good and Bad Clicks: Tracks positive and negative clicks to refine rankings.
- Content and Site Quality
  - Content Decay: Pages that no longer attract clicks may decrease in ranking.
  - Site Authority: Measures the overall credibility and importance of a website.
  - Original Content Score: Assesses the originality of short content pieces.
  - TitleMatchScore: Evaluates how well the page title matches the user’s search query.
  - PageRank and SiteAuthority: Used to rank new pages initially, before they have accumulated their own ranking data.
- Demotions and Penalties
  - Anchor Mismatch: Demotes links where the link text does not accurately reflect the linked page.
  - SERP Demotion: Lowers the ranking of pages whose performance in search results indicates user dissatisfaction.
  - Nav Demotion: Penalizes pages with poor navigation or user experience.
  - Exact Match Domains Demotion: Reduces the ranking boost previously given to domains that exactly match search queries.
  - Product Review Demotion: Lowers rankings for poor-quality product reviews.
  - Location Demotions: Demotes global pages when they are less relevant to local searches.
  - Porn Demotions: Ranks explicit content lower.
  - Link Spam Velocity: Identifies and mitigates spikes in spammy links.
- Content Freshness
  - Dates and Freshness: Uses various date signals (publication date, extracted date, date within the content) to prioritize fresh content.
- Link Analysis
  - Homepage Trust and PageRank: Uses the trust and PageRank of the homepage to influence the ranking of other pages on the site.
  - Link Velocity and Diversity: Measures the speed and variety of new links to detect spam or manipulative behavior.
  - Dropped Local Anchor Count: Suggests that some internal links may be disregarded in ranking calculations.
- Special Signals
  - Twiddlers: Post-ranking adjustments that fine-tune the final search results.
  - Embeddings and Semantic Matching: Uses advanced language models to understand and compare the content of pages and how they relate to each other.
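
Two of the signals in the list above lend themselves to a short sketch: a link velocity check and an embedding comparison. Both functions are simplified illustrations under stated assumptions. The z-score threshold of 3.0 and the daily link counts are hypothetical, and cosine similarity is one common way to compare embeddings, not a confirmed detail of the system described here.

```python
import math
from statistics import mean, pstdev


def link_velocity_spike(daily_new_links, threshold=3.0):
    """Flag the latest day if it exceeds the earlier mean by `threshold` std devs."""
    if len(daily_new_links) < 3:
        return False                      # not enough history to judge
    *history, latest = daily_new_links
    spread = pstdev(history) or 1.0       # fall back to 1.0 on a flat history
    return (latest - mean(history)) / spread > threshold


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A similarity near 1 means the two vectors point in nearly the same direction, which is usually read as the two texts being semantically close.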