Adjusting Featured Snippet Answers by Context

November 27, 2020

How Are Featured Snippet Answers Decided Upon?

I recently wrote about Featured Snippet Answer Scores Ranking Signals. In that post, I described how Google was likely using query dependent and query independent ranking signals to create answer scores for queries that were looking like they wanted answers.

One of the inventors of that patent from that post was Steven Baker. I looked at other patents that he had written, and noticed that one of those was about context as part of query independent ranking signals for answers.

Remembering that patent about question-answering and context, I felt it was worth reviewing that patent and writing about it.

This patent is about processing question queries that want textual answers and how those answers may be decided upon.

It is a complicated patent, and at one point the description behind it seems to get a bit murky, but I point out where that happens, and I think the other details provide a lot of insight into how Google is scoring featured snippet answers. There is an additional related patent that I will be following up with after this post, and I will link to it from here as well.

This patent starts by telling us that a search system can identify resources in response to queries submitted by users and provide information about the resources in a manner that is useful to the users.

How Context Scoring Adjustments for Featured Snippet Answers Works

Users of search systems are often searching for an answer to a specific question, rather than a listing of resources, like in this drawing from the patent, showing featured snippet answers:

featured snippet answers

For example, users may want to know what the weather is in a particular location, a current quote for a stock, the capital of a state, etc.

When queries that are in the form of a question are received, some search engines may perform specialized search operations in response to the question format of the query.

For example, some search engines may provide information responsive to such queries in the form of an “answer,” such as information provided in the form of a “one box” to a question, which is often a featured snippet answer.

Some question queries are better served by explanatory answers, which are also referred to as “long answers” or “answer passages.”

For example, for the question query [why is the sky blue], an answer explaining light as waves is helpful.

featured snippet answers - why is the sky blue

Such answer passages can be selected from resources that include text, such as paragraphs, that are relevant to the question and the answer.

Sections of the text are scored, and the section with the best score is selected as an answer.

In general, the patent tells us about one aspect of what it covers in the following process:

  • Receiving a query that is a question query seeking an answer response
  • Receiving candidate answer passages, each passage made of text selected from a text section subordinate to a heading on a resource, with a corresponding answer score
  • Determining a hierarchy of headings on a page, with two or more heading levels hierarchically arranged in parent-child relationships, where each heading level has one or more headings, a subheading of a respective heading is a child heading in a parent-child relationship and the respective heading is a parent heading in that relationship, and the heading hierarchy includes a root level corresponding to a root heading (for each candidate answer passage)
  • Determining a heading vector describing a path in the hierarchy of headings from the root heading to the respective heading to which the candidate answer passage is subordinate, determining a context score based, at least in part, on the heading vector, adjusting the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score
  • Selecting an answer passage from the candidate answer passages based on the adjusted answer scores

Advantages of the process in the patent

  1. Long query answers can be selected, based partially on context signals indicating answers relevant to a question
  2. The context signals may be, in part, query-independent (i.e., scored independently of their relatedness to terms of the query)
  3. This part of the scoring process considers the context of the document (“resource”) in which the answer text is located, accounting for relevancy signals that may not otherwise be accounted for during query-dependent scoring
  4. Following this approach, long answers that are more likely to satisfy a searcher’s informational need are more likely to appear as answers

This patent can be found at:

Context scoring adjustments for answer passages
Inventors: Nitin Gupta, Srinivasan Venkatachary, Lingkun Chu, and Steven D. Baker
US Patent: 9,959,315
Granted: May 1, 2018
Appl. No.: 14/169,960
Filed: January 31, 2014

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for context scoring adjustments for candidate answer passages.

In one aspect, a method includes scoring candidate answer passages. For each candidate answer passage, the system determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading to which the candidate answer passage is subordinate; determines a context score based, at least in part, on the heading vector; and adjusts answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The system then selects an answer passage from the candidate answer passages based on the adjusted answer scores.

Using Context Scores to Adjust Answer Scores for Featured Snippets

A drawing from the patent shows different hierarchical headings that may be used to determine the context of answer passages that may be used to adjust answer scores for featured snippets:

Hierarchical headings for featured snippets

I discuss these headings and their hierarchy below. Note that the headings include the Page title as a heading (About the Moon), and the headings within heading elements on the page as well. And those headings give those answers context.

This context scoring process starts with receiving candidate answer passages and a score for each of the passages.

Those candidate answer passages and their respective scores are provided to a search engine that receives a query determined to be a question.

Each of those candidate answer passages is text selected from a text section under a particular heading from a specific resource (page) that has a certain answer score.

For each resource where a candidate answer passage has been selected, a context scoring process determines a heading hierarchy in the resource.

A heading is text or other data corresponding to a particular passage in the resource.

As an example, a heading can be text summarizing a section of text that immediately follows the heading (the heading describes what the text that follows it, or is contained within it, is about).

Headings may be indicated, for example, by specific formatting data, such as heading elements using HTML.

This next section from the patent reminded me of an observation that Cindy Krum of Mobile Moxie has about named anchors on a page, and how Google might index those to answer a question, to lead to an answer or a featured snippet. She wrote about those in What the Heck are Fraggles?

A heading could also be anchor text for an internal link (within the same page) that links to an anchor and corresponding text at some other position on the page.

A heading hierarchy could have two or more heading levels that are hierarchically arranged in parent-child relationships.

The first level, or the root heading, could be the title of the resource.

Each of the heading levels may have one or more headings, and a subheading of a respective heading is a child heading and the respective heading is a parent heading in the parent-child relationship.

For each candidate passage, a context scoring process may determine a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

To determine the context score, the context scoring process determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The context score could be based, at least in part, on the heading vector.

The context scoring process can then adjust the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The context scoring process can then select an answer passage from the candidate answer passages based on adjusted answer scores.

This flowchart from the patent shows the context scoring adjustment process:

context scoring adjustment flowchart

Identifying Question Queries And Answer Passages

I’ve written about understanding the context of answer passages. The patent tells us more about question queries and answer passages worth going over in more detail.

Some queries are in the form of a question or an implicit question.

For example, the query [distance of the earth from the moon] is in the form of an implicit question “What is the distance of the earth from the moon?”

An implicit question - the distance from the earth to the moon

Likewise, a question may be specific, as in the query [How far away is the moon].

The search engine includes a query question processor that uses processes that determine if a query is a query question (implicit or specific) and if it is, whether there are answers that are responsive to the question.

The query question processor can use several different algorithms to determine whether a query is a question and whether there are particular answers responsive to the question.

For example, to determine question queries and answers, it may use:

  • Language models
  • Machine learned processes
  • Knowledge graphs
  • Grammars
  • Combinations of those

The query question processor may choose candidate answer passages in addition to or instead of answer facts. For example, for the query [how far away is the moon], an answer fact is 238,900 miles. And the search engine may just show that factual information since that is the average distance of the Earth from the moon.

But the query question processor may also choose to identify passages that are very relevant to the question query.

These passages are called candidate answer passages.

The answer passages are scored, and one passage is selected based on these scores and provided in response to the query.

An answer passage may be scored, and that score may be adjusted based on a context, which is the point behind this patent.

Often Google will identify several candidate answer passages that could be used as featured snippet answers.

Google may look at the information on the pages where those answers come from to better understand the context of the answers such as the title of the page, and the headings about the content that the answer was found within.

Contextual Scoring Adjustments for Featured Snippet Answers

The query question processor sends to a context scoring processor some candidate answer passages, information about the resource from which each answer passage was selected, and a score for each of the candidate answer passages.

The scores of the candidate answer passages could be based on the following considerations:

  • Matching a query term to the text of the candidate answer passage
  • Matching answer terms to the text of the candidate answer passages
  • The quality of the underlying resource from which the candidate answer passage was selected

I recently wrote about featured snippet answer scores, and how a combination of query dependent and query independent scoring signals might be used to generate answer scores for answer passages.

The patent tells us that the query question processor may also take into account other factors when scoring candidate answer passages.

Candidate answer passages can be selected from the text of a particular section of the resource. And the query question processor could choose more than one candidate answer passage from a text section.

We are given the following examples of different answer passages from the same page:

(These example answer passages are referred to in a few places in the remainder of the post.)

  • (1) It takes about 27 days (27 days, 7 hours, 43 minutes, and 11.6 seconds) for the Moon to orbit the Earth at its orbital distance
  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Each of those answers could be good ones for Google to use. We are told that:

More than three candidate answers can be selected from the resource, and more than one resource can be processed for candidate answers.

How would Google choose between those three possible answers?

Google might decide based on the number of sentences and a selection of up to a maximum number of characters.

The patent tells us this about choosing between those answers:

Each candidate answer has a corresponding score. For this example, assume that candidate answer passage (2) has the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1). Thus, without the context scoring processor, candidate answer passage (2) would have been provided in the answer box of FIG. 2. However, the context scoring processor takes into account the context of the answer passages and adjusts the scores provided by the query question processor.

So, we see that what might be chosen based on featured snippet answer scores could be adjusted based on the context of that answer from the page that it appears on.

Contextually Scoring Featured Snippet Answers

This process begins with a query determined to be a question query seeking an answer response.

This process next receives candidate answer passages, each candidate answer passage chosen from the text of a resource.

Each of the candidate answer passages is text chosen from a text section that is subordinate to a respective heading (under a heading) in the resource and has a corresponding answer score.

For example, the query question processor provides the candidate answer passages, and their corresponding scores, to the context scoring processor.

A Heading Hierarchy to Determine Context

This process then determines a heading hierarchy from the resource.

The heading hierarchy would have two or more heading levels hierarchically arranged in parent-child relationships (such as a page title and an HTML heading element).

Each heading level has one or more headings.

A subheading of a respective heading is a child heading (an (h2) heading might be a subheading of a (title)) in the parent-child relationship and the respective heading is a parent heading in the relationship.

The heading hierarchy includes a root level corresponding to a root heading.

The context scoring processor can process heading tags in a DOM tree to determine a heading hierarchy.

hierarchical headings for featured snippets

For example, concerning the drawing about the distance to the moon just above, the heading hierarchy for the resource may be:

The ROOT heading (title) is: About The Moon (310)

The main heading (h1) on the page:

H1: The Moon’s Orbit (330)

A secondary heading (h2) on the page:

H2: How long does it take for the Moon to orbit Earth? (334)

Another secondary heading (h2) on the page:

H2: The distance from the Earth to the Moon (338)

Another main heading (h1) on the page:

H1: The Moon (360)

Another secondary heading (h2) on the page:

H2: Age of the Moon (364)

Another secondary heading (h2) on the page:

H2: Life on the Moon (368)

Here is how the patent describes this heading hierarchy:

In this heading hierarchy, the title is the root heading at the root level; headings 330 and 360 are child headings of the root heading, and are at a first level below the root level; headings 334 and 338 are child headings of the heading 330, and are at a second level that is one level below the first level, and two levels below the root level; and headings 364 and 368 are child headings of the heading 360 and are at a second level that is one level below the first level, and two levels below the root level.
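To make the parent-child relationships concrete, here is a minimal sketch of how a heading hierarchy could be pulled out of a page's markup. It uses Python's standard `html.parser`; the `HeadingOutline` class, the `parent_map` helper, and the sample HTML are my own illustrations and are not taken from the patent, which only says that heading tags in a DOM tree can be processed.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs: level 0 is <title>, 1 is <h1>, 2 is <h2>, ..."""

    LEVELS = {"title": 0, "h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.outline = []    # headings in document order
        self._level = None   # level of the heading currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            self._level = self.LEVELS[tag]
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.LEVELS and self._level == self.LEVELS[tag]:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def parent_map(outline):
    """Map each heading's index to its parent's index (None for the root)."""
    parents, stack = {}, []
    for i, (level, _) in enumerate(outline):
        # Drop headings at the same or a deeper level; they cannot be parents.
        while stack and outline[stack[-1]][0] >= level:
            stack.pop()
        parents[i] = stack[-1] if stack else None
        stack.append(i)
    return parents


html = """
<html><head><title>About The Moon</title></head><body>
<h1>The Moon's Orbit</h1>
<h2>How long does it take for the Moon to orbit Earth?</h2>
<h2>The distance from the Earth to the Moon</h2>
<h1>The Moon</h1>
<h2>Age of the Moon</h2>
<h2>Life on the Moon</h2>
</body></html>
"""

parser = HeadingOutline()
parser.feed(html)
parents = parent_map(parser.outline)
for i, (level, text) in enumerate(parser.outline):
    parent = parser.outline[parents[i]][1] if parents[i] is not None else None
    print("  " * level + text, "<-", parent)
```

The stack keeps only the headings that can still be ancestors, so each new heading's parent is whatever sits on top of the stack once shallower-or-equal entries are popped.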

The process from the patent determines a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

This score may be based on a heading vector.

The patent says that the process, for each of the candidate answer passages, determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The heading vector would include the text of the headings for the candidate answer passage.

For the example candidate answer passages (1)-(3) above about how long it takes the moon to orbit the Earth, the respectively corresponding heading vectors V1, V2 and V3 are:

  • V1=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: How long does it take for the Moon to orbit the Earth?]>
  • V2=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>
  • V3=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>

We are also told that because candidate answer passages (2) and (3) are selected from the same text section 340, their respective heading vectors V2 and V3 are the same (they are both in the content under the same (H2) heading.)
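The heading vectors above can be sketched as a walk from a heading back up to the root. This is only an illustration under my own assumptions: the outline is a document-order list of `(level, text)` pairs with the page title at level 0, and the `heading_vector` function and the passage-to-heading mapping are hypothetical names, not something the patent defines.

```python
def heading_vector(outline, heading_index):
    """Return heading texts from the root down to outline[heading_index]."""
    level, text = outline[heading_index]
    path = [text]
    current = level
    # Walk backwards: the nearest earlier heading at a shallower level is an ancestor.
    for lvl, txt in reversed(outline[:heading_index]):
        if lvl < current:
            path.append(txt)
            current = lvl
    return list(reversed(path))


outline = [
    (0, "About The Moon"),
    (1, "The Moon's Orbit"),
    (2, "How long does it take for the Moon to orbit the Earth?"),
    (2, "The distance from the Earth to the Moon"),
    (1, "The Moon"),
    (2, "Age of the Moon"),
    (2, "Life on the Moon"),
]

# Hypothetical mapping: candidate passage number -> index of the heading it sits under.
passage_heading = {1: 2, 2: 3, 3: 3}

vectors = {p: heading_vector(outline, i) for p, i in passage_heading.items()}
for p, v in vectors.items():
    print(f"V{p} =", v)
```

Because passages (2) and (3) sit under the same heading, they end up with identical vectors, which matches what the patent says about V2 and V3.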

The process of adjusting a score, for each answer passage, uses a context score based, at least in part, on the heading vector (410).

That context score can be a single score used to scale the candidate answer passage score or can be a series of discrete scores/boosts that can be used to adjust the score of the candidate answer passage.

Where Things Get Murky in This Patent

There do seem to be several related patents involving featured snippet answers, and this one which targets learning more about answers from their context based on where they fit in a heading hierarchy makes sense.

But, I’m confused by how the patent tells us that one answer based on the context would be adjusted over another one.

The first issue I have is that the answers they are comparing in the same contextual area have some overlap. Here those two are:

  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Note that the second answer and the third answer both include the same line: “Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles.” I find myself a little surprised that the second answer includes a couple of sentences that aren’t in the third answer, and skips a couple of lines from the third answer, and then includes the last sentence, which answers the question.

Since they both appear in the same heading and subheading section of the page they are from, it is difficult to imagine that there is a different adjustment based on context. But, the patent tells us differently:

The candidate answer passage with the highest adjusted answer score (based on context from the headings) is selected and provided as the answer passage.

Recall that in the example above, the candidate answer passage (2) had the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1).

However, after adjustments, candidate answer passage (3) has the highest score, followed by candidate answer passage (2), and then candidate answer passage (1).

Accordingly, candidate answer passage (3) is selected and provided as the answer passage of FIG. 2.

Boosting Scores Based on Passage Coverage Ratio

A query question processor may limit the candidate answers to a maximum length.

The context scoring processor determines a coverage ratio which is a measure indicative of the coverage of the candidate answer passage from the text from which it was selected.

The patent describes alternative question answers:

Alternatively, the text block may include text sections subordinate to respective headings that include a first heading for which the text section from which the candidate answer passage was selected is subordinate, and sibling headings that have an immediate parent heading in common with the first heading. For example, for the candidate answer passage, the text block may include all the text in the portion 380 of the hierarchy; or may include only the text of the sections, of some other portion of text within the portion of the hierarchy. A similar block may be used for the portion of the hierarchy for candidate answer passages selected from that portion.

A small coverage ratio may indicate a candidate answer passage is incomplete. A high coverage ratio may indicate the candidate answer passage captures more of the content of the text passage from which it was selected. A candidate answer passage may receive a context adjustment, depending on this coverage ratio.

A passage coverage ratio is a ratio of the total number of characters in the candidate answer passage to the total number of characters in the passage from which the candidate answer passage was selected.

The passage coverage ratio could also be a ratio of the total number of sentences (or words) in the candidate answer passage to the total number of sentences (or words) in the passage from which the candidate answer passage was selected.

We are told that other ratios can also be used.

Of the three example candidate answer passages (1)-(3) about the distance to the moon above, passage (1) has the highest ratio, passage (2) has the second-highest, and passage (3) has the lowest.

This process determines whether the coverage ratio is less than a threshold value. That threshold value can be, for example, 0.3, 0.35 or 0.4, or some other fraction. In our “distance to the moon” example, each passage coverage ratio meets or exceeds the threshold value.

If the coverage ratio is less than a threshold value, then the process would select a first answer boost factor. The first answer boost factor might be proportional to the coverage ratio according to a first relation, or may be a fixed value, or may be a non-boosting value (e.g., 1.0).

But if the coverage ratio is not less than the threshold value, the process may select a second answer boost factor. The second answer boost factor may be proportional to the coverage ratio according to a second relation, or may be a fixed value, or may be a value greater than the non-boosting value (e.g., 1.1).
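The coverage ratio and the threshold test can be sketched in a few lines. This is a rough illustration, not the patent's implementation: the function names are mine, I use the fixed-value variants of the two boost factors (1.0 and 1.1) with a 0.3 threshold, and the two-sentence source passage is a stand-in, since the post does not reproduce the full text section from the patent's drawing.

```python
def coverage_ratio(candidate, source_passage, unit="chars"):
    """Share of the source passage that the candidate answer passage covers."""
    if unit == "chars":
        return len(candidate) / len(source_passage)
    if unit == "words":
        return len(candidate.split()) / len(source_passage.split())
    raise ValueError("unit must be 'chars' or 'words'")


def coverage_boost(ratio, threshold=0.3, first_boost=1.0, second_boost=1.1):
    """Pick the first boost factor below the threshold, otherwise the second."""
    return first_boost if ratio < threshold else second_boost


# Stand-in source text (an assumption, not the full section from the patent).
source_passage = (
    "The moon's distance from Earth varies because the moon travels in a "
    "slightly elliptical orbit. Thus, the moon's distance from the Earth "
    "varies from 225,700 miles to 252,000 miles."
)
candidate = (
    "Thus, the moon's distance from the Earth varies from 225,700 miles "
    "to 252,000 miles."
)

ratio = coverage_ratio(candidate, source_passage)
print(round(ratio, 2), coverage_boost(ratio))
```

Passing `unit="words"` switches to the word-based ratio the patent also mentions; a sentence-based version would need a sentence splitter, which I have left out.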

Scoring Based on Other Features

The context scoring process can also check for the presence of features in addition to those described above.

Three example features for contextually scoring an answer passage can be based on the additional features of the distinctive text, a preceding question, and a list format.

Distinctive text

Distinctive text is the text that may stand out because it is formatted differently than other text, like using bolding.

A Preceding Question

A preceding question is a question in the text that precedes the candidate answer passage.

The search engine may process various amounts of text to detect the question.

In some cases, only the passage from which the candidate answer passage is extracted is checked.

In other cases, a text window that can include header text and other text from other sections may be checked.

A boost score that is inversely proportional to the text distance from a question to the candidate answer passage is calculated, and the check is terminated at the occurrence of a first question.

That text distance may be measured in characters, words, or sentences, or by some other metric.

If the question is anchor text for a section of text and there is intervening text, such as in the case of a navigation list, then the question is determined to only precede the text passage to which it links, not precede intervening text.

In the drawing above about the moon, there are two questions in the resource: “How long does it take for the Moon to orbit Earth?” and “Why is the distance changing?”

The first question–“How long does it take for the Moon to orbit Earth?”– precedes the first candidate answer passage by a text distance of zero sentences, and it precedes the second candidate answer passage by a text distance of five sentences.

And the second question–“Why is the distance changing?”– precedes the third candidate answer by zero sentences.

If a preceding question is detected, then the process selects a question boost factor.

This boost factor may depend on the text distance, on whether the question is in a text passage subordinate to a header or is itself a header, and, if the question is a header, on whether the candidate answer passage is subordinate to that header.

Considering these factors, the third candidate answer passage receives the highest boost factor, the first candidate answer receives the second-highest boost factor, and the second candidate answer receives the smallest boost factor.
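Here is a small sketch of the distance part of this check. It assumes the text before a passage has already been split into sentences, treats any sentence ending in "?" as a question, and uses a made-up formula for the boost; the patent only says the boost is inversely proportional to the text distance, so the function names, the `max_boost` value, and the formula are my own assumptions.

```python
def distance_to_preceding_question(sentences_before):
    """Count the sentences between the nearest preceding question and the passage.

    Returns None when no question is found. The scan stops at the first
    question it meets, walking backwards from the passage.
    """
    for distance, sentence in enumerate(reversed(sentences_before)):
        if sentence.strip().endswith("?"):
            return distance
    return None


def question_boost(distance, max_boost=1.2):
    """Hypothetical boost that shrinks as the question gets further away."""
    if distance is None:
        return 1.0  # non-boosting value when there is no preceding question
    return 1.0 + (max_boost - 1.0) / (1 + distance)


# A question directly before the passage: distance of zero sentences.
near = distance_to_preceding_question(["Why is the distance changing?"])

# A question five sentences back.
far = distance_to_preceding_question(
    ["How long does it take for the Moon to orbit Earth?", "A.", "B.", "C.", "D.", "E."]
)

print(near, question_boost(near))
print(far, round(question_boost(far), 3))
```

Distance alone would give the first and third candidate passages the same boost, since both are zero sentences from a question. The different boosts the patent describes for them would have to come from the other factors it lists, such as whether the question is a header.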

If a preceding question is not detected, or after the question boost factor is selected, the process then checks for the presence of a list.

The Presence of a List

A list is an indication of several steps usually instructive or informative. The detection of a list may be subject to the query question being a step modal query.

A step modal query is a query where a list-based answer is likely to be a good answer. Examples of step modal queries are queries like:

  • [How to . . . ]
  • [How do I . . . ]
  • [How to install a door knob]
  • [How do I change a tire]
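A very rough way to spot queries like these is a pattern match on how the query starts. The sketch below covers only the two openings the patent lists; the `is_step_modal` name and the regular expression are my own assumptions, and a real system would presumably use something much richer than a prefix check.

```python
import re

# Matches queries that begin with "how to" or "how do I" (case-insensitive).
STEP_MODAL = re.compile(r"^\s*how\s+(to|do\s+i)\b", re.IGNORECASE)


def is_step_modal(query):
    """Return True when the query looks like it wants a list of steps."""
    return bool(STEP_MODAL.search(query))


for query in [
    "How to install a door knob",
    "How do I change a tire",
    "how far away is the moon",
    "why is the sky blue",
]:
    print(query, "->", is_step_modal(query))
```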

The context scoring process may detect lists formed with:

  • HTML tags
  • Micro formats
  • Semantic meaning
  • Consecutive headings at the same level with the same or similar phrases (e.g., Step 1, Step 2; or First; Second; Third; etc.)
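The last signal in that list, consecutive same-level headings that read like steps, is easy to sketch. This assumes the page's headings are available as `(level, text)` pairs in document order; the pattern, the `has_step_heading_list` name, and the two-item minimum are illustrative choices of mine rather than anything the patent specifies.

```python
import re

# Headings that start with "Step N" or an ordinal word (illustrative pattern).
STEP_HEADING = re.compile(
    r"^\s*(step\s+\d+|first|second|third|fourth|fifth)\b", re.IGNORECASE
)


def has_step_heading_list(outline, min_items=2):
    """True if at least `min_items` consecutive same-level headings look like steps."""
    run, run_level = 0, None
    for level, text in outline:
        if STEP_HEADING.match(text):
            # Extend the run only when the level matches the previous step heading.
            run = run + 1 if level == run_level else 1
            run_level = level
            if run >= min_items:
                return True
        else:
            run, run_level = 0, None
    return False


tire_page = [
    (1, "How to change a tire"),
    (2, "Step 1: Loosen the lug nuts"),
    (2, "Step 2: Jack up the car"),
    (2, "Step 3: Swap the wheel"),
]
moon_page = [(1, "The Moon"), (2, "Age of the Moon"), (2, "Life on the Moon")]

print(has_step_heading_list(tire_page))
print(has_step_heading_list(moon_page))
```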

The context scoring process may also score a list for quality.

It would look at things such as:

  • A list in the center of a page, which does not include multiple links to other pages (which are indicative of reference lists), and whose HREF link text does not occupy a large portion of the text of the list, will be of higher quality
  • A list at the side of a page, which does include multiple links to other pages, and/or whose HREF link text does occupy a large portion of the text of the list, will be of lower quality

If a list is detected, then the process selects a list boost factor.

That list boost factor may be fixed or may be proportional to the quality score of the list.

If a list is not detected, or after the list boost factor is selected, the process ends.

In some implementations, the list boost factor may also be dependent on other feature scores.

If other features, such as coverage ratio, distinctive text, etc., have relatively high scores, then the list boost factor may be increased.

The patent tells us that this is because “the combination of these scores in the presence of a list is a strong signal of a high-quality answer passage.”
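To show how a list quality score and a list boost factor might fit together, here is a sketch under heavy assumptions. The patent gives no formulas, so the scoring below (a position weight times the share of non-link text), every constant, and both function names are hypothetical and chosen only to mirror the signals described above.

```python
def list_quality(in_main_content, link_text_chars, total_text_chars):
    """Score in [0, 1]: main-content lists with little link text score higher."""
    link_share = link_text_chars / total_text_chars if total_text_chars else 1.0
    position = 1.0 if in_main_content else 0.5  # assumed penalty for sidebar lists
    return position * (1.0 - link_share)


def list_boost(quality, other_feature_scores=(), base=1.0, max_extra=0.2,
               bonus=0.05, high=0.8):
    """Boost proportional to list quality, raised when other features score high."""
    boost = base + max_extra * quality
    if other_feature_scores and all(s >= high for s in other_feature_scores):
        boost += bonus  # assumed extra for strong coverage ratio, distinctive text, etc.
    return boost


steps_list = list_quality(in_main_content=True, link_text_chars=10, total_text_chars=400)
nav_list = list_quality(in_main_content=False, link_text_chars=180, total_text_chars=200)

print(round(steps_list, 3), round(list_boost(steps_list), 3))
print(round(nav_list, 3), round(list_boost(nav_list), 3))
print(round(list_boost(steps_list, other_feature_scores=(0.9, 0.85)), 3))
```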

Adjustment of Featured Snippet Answers Scores

Answer scores for candidate answer passages are adjusted by scoring components based on heading vectors, passage coverage ratio, and other features described above.

The scoring process can select the largest boost value from those determined above or can select a combination of the boost values.

Once the answer scores are adjusted, the candidate answer passage with the highest adjusted answer score is selected as the featured snippet answer and is displayed to a searcher.
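The two ways of applying boosts that the patent mentions, taking the largest one or combining them, can be compared side by side. In this sketch the combination is a simple product, and all of the scores and boost values are invented so that the ranking flips the way the patent's example does (passage (2) first before adjustment, passage (3) first after); none of the numbers come from the patent.

```python
from math import prod


def adjusted_score(answer_score, boosts, mode="max"):
    """Scale an answer score by the largest boost or by the product of all boosts."""
    if mode == "max":
        factor = max(boosts, default=1.0)
    elif mode == "product":
        factor = prod(boosts)
    else:
        raise ValueError("mode must be 'max' or 'product'")
    return answer_score * factor


def rank(scores):
    """Passage numbers ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Invented answer scores and boost factors (coverage boost, question boost).
answer_scores = {1: 0.60, 2: 0.80, 3: 0.75}
boosts = {1: [1.10, 1.15], 2: [1.10, 1.03], 3: [1.10, 1.20]}

before = rank(answer_scores)
after_max = rank({p: adjusted_score(s, boosts[p], "max") for p, s in answer_scores.items()})
after_product = rank({p: adjusted_score(s, boosts[p], "product") for p, s in answer_scores.items()})

print("before:", before)
print("after (max):", after_max)
print("after (product):", after_product)
```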

More to Come

I will be reviewing the first patent in this series of patents about candidate answer scores because it does have some additional elements to it that haven’t been covered in this post, and the post about query dependent/independent ranking signals for answer scores. If you have been paying attention to how Google has been answering queries that appear to be seeking answers, you have likely seen those improving in many cases. Some answers have been really bad though. It will be nice to have as complete an idea as we can of how Google decides what might be a good answer to a query, based on information available to them on the Web.

Added October 14, 2020 – I have written about another Google patent on Answer Scores, and it’s worth reading about all of the patents on this topic. The new post is at Weighted Answer Terms for Scoring Answer Passages, and is about the patent Weighted answer terms for scoring answer passages.

It is about identifying questions in resources, and answers for those questions, and describes using term weights as a way to score answer passages (along with the scoring approaches identified in the other related patents, including this one.)

Added October 15, 2020 – I have written a few other posts about answer passages that are worth reading if you are interested in how Google finds questions on pages and answers to those, and scores answer passages to determine which ones to show as featured snippets. I’ve linked to some of those in the body of this post, but here is another one of those posts:

Added October 22, 2020 – I have written up a description of how structured and unstructured data has been selected for answer passages based on specific criteria in the patent on Scoring Answer passages, in the post Selecting Candidate Answer Passages.


Copyright © 2020 SEO by the Sea ⚓.

The post Adjusting Featured Snippet Answers by Context appeared first on SEO by the Sea ⚓.




10 Effective ways to boost click-through rate (CTR) using SERPs

November 25, 2020

30-second summary:

  • Search engine results pages and algorithms are evolving quickly and you should keep pace with them to succeed.
  • Did you know, 51% of all searches end without a click?
  • Gone are the days when there are only organic text-based results on the page.
  • Today, there are paid listings, zero-click searches, images, videos, maps, featured snippets, People Also Ask boxes, and even podcasts, all of which can result in dismal click-through rates (CTRs).
  • Branex’s digital marketing strategist, Irfan Ak has created a top 10 list that can boost your CTR in Google SERPs.

If you closely look at the first page of Google for any competitive keyword, you will find tons of elements on it. Gone are the days when there are only organic text-based results on the page. Today, there are images, videos, maps, featured snippets, people also asked for boxes, and even podcasts. Then there are paid listings which are visible on top of organic listings. SEO trends are changing quickly and it is impacting search engine results pages (SERPs). All this translates into declining organic reach, dismal click-through rate (CTR), and the rise of zero-click searches.

In fact, 51% of all searches end without a click. With search engines trying their best to fulfill user needs on the search page itself, fewer users will scroll down and click through to your listing, as they get the desired answer on the search page.

In this article, you will learn about ten effective ways to boost click-through rate (CTR) using SERPs.

How to increase click-through rate (CTR) - Stats

Source: SparkToro

1. Optimize for featured snippets

The coveted number one spot is no longer the target for digital marketers and digital marketing agencies. The focus has shifted to Position Zero. According to an Ahrefs study, 12.3% of search queries have featured snippets. Search engines like Google pull data from the top 10 results to show as a featured snippet. If your blog or website is ranking on the first page of Google, you have an opportunity to grab the featured snippet and boost your visibility.

To do so, you need to understand the purpose of featured snippets. The main reason why search engines show featured snippets is that they want to provide a direct answer to a search query and if your listing does that, you have a bright chance of getting featured on a much sought-after position zero. 

Add featured snippets to increase click-through rate (CTR)

Source: Ahrefs

Secondly, featured snippets are displayed for long-tail keywords or questions-based queries. The focus is usually on offering short and precise answers to the user query and if your listing can do that while optimizing for long-tail keywords, it can rank on featured snippets. 

2. Improve your rankings

According to a study conducted by Backlinko, which analyzed 5 million Google search results, moving one spot up can increase your click-through rate by roughly 30.8%, although this varies depending on your current position and the position you move to. The same study found that jumping from 10th position to 7th position did not have the same impact as moving from 6th position to 5th or from 2nd position to 1st. Instead of striving to rank on the first page of Google, you should focus on ranking in the top three positions, as 75.1% of all clicks go to the top three spots.

Click-through rate (CTR) organic - breakdown stats

Source: Backlinko

3. Write captivating headlines

David Ogilvy, the “Father of Advertising” and Founder of Ogilvy & Mather, once said,

“On the average, five times as many people read the headline as read the body copy. When you have written your headline, you have spent eighty cents out of your dollar.”

What is the first thing that users will read when they look at your listing? It is the headline. It can make or break your listing. That is why it is important to write attention-grabbing headlines. Add an emotional element to your headline, as research has shown that positive or negative sentiment in a headline can increase its click-through rate by 7%. The Backlinko study referenced above also found that titles that contain 15-40 characters have the highest organic click-through rate.

4. Meta description and URL

Have you ever looked closely at a search result? What does it contain? A search engine listing usually comprises three things:

  • Title
  • URL
  • Meta Description

After optimizing your title, you should focus on optimizing your URL and meta description for click-through rate. Add your keyword to the URL, as it will increase your click-through rate by 45% compared to URLs that don’t contain the keyword.

Just like the title and URL, add your keyword in the meta description as well. Write a meta description in active voice and try to make it as actionable as possible. Don’t forget to add a call to action to persuade users to click on your listings. Make sure all the pages on your website have a meta description because pages that contain the meta description generate 5.8% more clicks than pages without meta description.

5. Add a schema markup

Search engines use a spider to crawl web pages and create an index of all those pages. The easier it is for search engines to crawl your website, the faster they will crawl it and the more likely it is to get indexed and ranked. By adding schema markup to your website, you make it easy for search engines to understand what your website is about and what its different pages cover.

There are different types of schema markups and implementing the right kind on your website can do wonders. For example, a review schema markup allows search engines to display ratings in your organic results. If your rating is good, it can increase your credibility, build trust, and help you attract new customers while increasing your click-through rate.
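As a rough illustration of what a review-style markup can look like, here is a minimal Python sketch that builds a JSON-LD snippet using the schema.org Product and AggregateRating types. The product name, rating, and review count are made-up placeholder values, the helper name review_markup is hypothetical, and adding the markup does not guarantee that ratings will appear in search results.

```python
import json


def review_markup(name, rating, review_count):
    """Return a JSON-LD string describing a product and its aggregate rating."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": str(rating),
            "reviewCount": str(review_count),
        },
    }
    return json.dumps(data, indent=2)


# Placeholder values for illustration only.
snippet = review_markup("Example Widget", 4.6, 128)

# JSON-LD is usually embedded in the page inside a script element.
print('<script type="application/ld+json">')
print(snippet)
print("</script>")
```

The printed script element would go in the HTML of the page that the rating describes.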

6. Optimize for Google My Business

Do you have a Google My Business page? If your answer is no, then you are missing out. Get your business featured on Google My Business and enter all the business details. Whether it is location-based searches, branded searches, or business-related or service-related searches, Google My Business results tend to show up.

Another advantage of using Google My Business is that it allows you to collect reviews and ratings from customers as well as allows your business to answer user questions. Both can help you build trust and win new customers. The more positive reviews your business has or the higher the rating, the better. It also offers some useful features to customers such as sharing business information with others or contacting the business directly.

7. Run well-targeted PPC ads

One of the best ways to overcome declining organic reach is to invest in PPC ads. Yes, they might be expensive in certain industries and might not work that well in other industries but if you are looking for quick results, PPC ads are your best bet, provided your PPC targeting strategy is on the money. Run PPC ads on branded keywords and prevent others from occupying your ad space. 

When you run PPC ads, it attracts targeted traffic that is more likely to convert into paying customers. This means that it not only increases your click-through rate but also increases your conversion rates too. The key to success with PPC ads is to choose the right ad type according to your industry.

8. Optimize images and videos for SEO

As mentioned before, SERPs are no longer limited to showing organic results. They also show images, videos, and featured snippets, to name just a few. What’s even more interesting is the fact that SERPs showing images and videos are slowly but surely increasing in number. This means that you can optimize your images and videos to increase your chances of ranking on these SERPs.

Here are some ways to optimize images and videos for SEO:

  • Use targeted keywords in image and video title, description, and alternate text
  • Place the image and video in a section of the page or in content where it best matches the keyword intent
  • Compress large size images and videos
  • Add a caption to images
  • Use common image sizes and optimal image formats

9. Give an irresistible limited time offer

Create a sense of urgency with tactics such as a countdown timer or a mention of the number of items remaining. When users see these things on your page, they feel pushed to take the desired action. Give a limited time offer that your target audience cannot resist, and you will see your click-through rate shoot through the roof. Don’t forget to add a call to action that tells users which action they should take next.

10. Optimize social media channel to show up in knowledge panels

Last but certainly not least is to optimize social media pages for knowledge panels. Search engines display these knowledge panels in order to present all your business information in a concise way. As a business, you can use this as an opportunity to connect your social media accounts and let users contact you directly from search engine results pages. For this process to work, all your social media accounts should pass the verification by Google. You can also use schema markup to highlight your social media accounts.

How do you boost your click-through rate using search engine result pages? Let us know in the comments section below.

Irfan Ak is an experienced digital marketing strategist, growth hacker, and digital transformation expert at Branex. He can be found on Twitter @irrfanAK.

The post 10 Effective ways to boost click-through rate (CTR) using SERPs appeared first on Search Engine Watch.

Search Engine Watch


Adjusting Featured Snippet Answers by Context

November 23, 2020 No Comments

How Are Featured Snippet Answers Decided Upon?

I recently wrote about Featured Snippet Answer Scores Ranking Signals. In that post, I described how Google was likely using query dependent and query independent ranking signals to create answer scores for queries that were looking like they wanted answers.

One of the inventors of that patent from that post was Steven Baker. I looked at other patents that he had written, and noticed that one of those was about context as part of query independent ranking signals for answers.

Remembering that patent about question-answering and context, I felt it was worth reviewing that patent and writing about it.

This patent is about processing question queries that want textual answers and how those answers may be decided upon.

It is a complicated patent, and at one point the description behind it seems to get a bit murky. I point out where that happens, and I think the other details provide a lot of insight into how Google is scoring featured snippet answers. There is an additional related patent that I will be following up with after this post, and I will link to it from here as well.

This patent starts by telling us that a search system can identify resources in response to queries submitted by users and provide information about the resources in a manner that is useful to the users.

How Context Scoring Adjustments for Featured Snippet Answers Work

Users of search systems are often searching for an answer to a specific question, rather than a listing of resources, like in this drawing from the patent, showing featured snippet answers:

featured snippet answers

For example, users may want to know what the weather is in a particular location, a current quote for a stock, the capital of a state, etc.

When queries that are in the form of a question are received, some search engines may perform specialized search operations in response to the question format of the query.

For example, some search engines may provide information responsive to such queries in the form of an “answer,” such as information provided in the form of a “one box” to a question, which is often a featured snippet answer.

Some question queries are better served by explanatory answers, which are also referred to as “long answers” or “answer passages.”

For example, for the question query [why is the sky blue], an answer explaining light as waves is helpful.

featured snippet answers - why is the sky blue

Such answer passages can be selected from resources that include text, such as paragraphs, that are relevant to the question and the answer.

Sections of the text are scored, and the section with the best score is selected as an answer.

In general, the patent tells us about one aspect of what it covers in the following process:

  • Receiving a query that is a question query seeking an answer response
  • Receiving candidate answer passages, each passage made of text selected from a text section subordinate to a heading on a resource, with a corresponding answer score
  • Determining a hierarchy of headings on a page, with two or more heading levels hierarchically arranged in parent-child relationships, where each heading level has one or more headings, a subheading of a respective heading is a child heading in a parent-child relationship and the respective heading is a parent heading in that relationship, and the heading hierarchy includes a root level corresponding to a root heading (for each candidate answer passage)
  • Determining a heading vector describing a path in the hierarchy of headings from the root heading to the respective heading to which the candidate answer passage is subordinate, determining a context score based, at least in part, on the heading vector, adjusting the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score
  • Selecting an answer passage from the candidate answer passages based on the adjusted answer scores
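
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patent summary here does not say how a context score is computed from a heading vector, so the word-overlap formula, the example scores, and the names Candidate, context_score, and select_answer are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    answer_score: float
    heading_vector: List[str]  # root heading first, nearest heading last


def context_score(query, heading_vector):
    """Hypothetical context score: 1.0 plus the share of query words found in the headings."""
    query_words = set(query.lower().split())
    heading_words = set(" ".join(heading_vector).lower().split())
    if not query_words:
        return 1.0
    return 1.0 + len(query_words & heading_words) / len(query_words)


def select_answer(query, candidates):
    """Pick the candidate with the highest context-adjusted answer score."""
    return max(
        candidates,
        key=lambda c: c.answer_score * context_score(query, c.heading_vector),
    )


query = "distance from the earth to the moon"
candidates = [
    Candidate(
        "(text from the Age of the Moon section)",
        0.65,  # made-up answer score
        ["About The Moon", "The Moon", "Age of the Moon"],
    ),
    Candidate(
        "Thus, the moon's distance from the Earth varies from 225,700 miles to 252,000 miles",
        0.60,  # made-up answer score
        ["About The Moon", "The Moon's Orbit", "The distance from the Earth to the Moon"],
    ),
]

best = select_answer(query, candidates)
print(best.heading_vector[-1])
```

With these made-up numbers, the second candidate starts with the lower answer score but is selected once the headings above it are taken into account.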

Advantages of the process in the patent

  1. Long query answers can be selected, based partially on context signals indicating answers relevant to a question
  2. The context signals may be, in part, query-independent (i.e., scored independently of their relatedness to terms of the query)
  3. This part of the scoring process considers the context of the document (“resource”) in which the answer text is located, accounting for relevancy signals that may not otherwise be accounted for during query-dependent scoring
  4. Following this approach, long answers that are more likely to satisfy a searcher’s informational need are more likely to appear as answers

This patent can be found at:

Context scoring adjustments for answer passages
Inventors: Nitin Gupta, Srinivasan Venkatachary, Lingkun Chu, and Steven D. Baker
US Patent: 9,959,315
Granted: May 1, 2018
Appl. No.: 14/169,960
Filed: January 31, 2014

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for context scoring adjustments for candidate answer passages.

In one aspect, a method includes scoring candidate answer passages. For each candidate answer passage, the system determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading to which the candidate answer passage is subordinate; determines a context score based, at least in part, on the heading vector; and adjusts answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The system then selects an answer passage from the candidate answer passages based on the adjusted answer scores.

Using Context Scores to Adjust Answer Scores for Featured Snippets

A drawing from the patent shows different hierarchical headings that may be used to determine the context of answer passages that may be used to adjust answer scores for featured snippets:

Hierarchical headings for featured snippets

I discuss these headings and their hierarchy below. Note that the headings include the Page title as a heading (About the Moon), and the headings within heading elements on the page as well. And those headings give those answers context.

This context scoring process starts with receiving candidate answer passages and a score for each of the passages.

Those candidate answer passages and their respective scores are provided to a search engine that receives a query determined to be a question.

Each of those candidate answer passages is text selected from a text section under a particular heading from a specific resource (page) that has a certain answer score.

For each resource where a candidate answer passage has been selected, a context scoring process determines a heading hierarchy in the resource.

A heading is text or other data corresponding to a particular passage in the resource.

As an example, a heading can be text summarizing a section of text that immediately follows the heading (the heading describes what the text that follows it, or is contained within it, is about).

Headings may be indicated, for example, by specific formatting data, such as heading elements using HTML.

This next section from the patent reminded me of an observation that Cindy Krum of Mobile Moxie has about named anchors on a page, and how Google might index those to answer a question, to lead to an answer or a featured snippet. She wrote about those in What the Heck are Fraggles?

A heading could also be anchor text for an internal link (within the same page) that links to an anchor and corresponding text at some other position on the page.

A heading hierarchy could have two or more heading levels that are hierarchically arranged in parent-child relationships.

The first level, or the root heading, could be the title of the resource.

Each of the heading levels may have one or more headings, and a subheading of a respective heading is a child heading and the respective heading is a parent heading in the parent-child relationship.

For each candidate passage, a context scoring process may determine a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

To determine the context score, the context scoring process could determine a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The context score could be based, at least in part, on the heading vector.

The context scoring process can then adjust the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The context scoring process can then select an answer passage from the candidate answer passages based on adjusted answer scores.

This flowchart from the patent shows the context scoring adjustment process:

context scoring adjustment flowchart

Identifying Question Queries And Answer Passages

I’ve written about understanding the context of answer passages. The patent tells us more about question queries and answer passages worth going over in more detail.

Some queries are in the form of a question or an implicit question.

For example, the query [distance of the earth from the moon] is in the form of an implicit question “What is the distance of the earth from the moon?”

An implicit question - the distance from the earth to the moon

Likewise, a question may be specific, as in the query [How far away is the moon].

The search engine includes a query question processor that determines whether a query is a question query (implicit or specific) and, if it is, whether there are answers that are responsive to the question.

The query question processor can use several different algorithms to determine whether a query is a question and whether there are particular answers responsive to the question.

For example, to determine question queries and answers, it may use:

  • Language models
  • Machine learned processes
  • Knowledge graphs
  • Grammars
  • Combinations of those

The query question processor may choose candidate answer passages in addition to or instead of answer facts. For example, for the query [how far away is the moon], an answer fact is 238,900 miles. And the search engine may just show that factual information since that is the average distance of the Earth from the moon.

But the query question processor may choose to identify passages that are very relevant to the question query.

These passages are called candidate answer passages.

The answer passages are scored, and one passage is selected based on these scores and provided in response to the query.

An answer passage may be scored, and that score may be adjusted based on a context, which is the point behind this patent.

Often Google will identify several candidate answer passages that could be used as featured snippet answers.

Google may look at the information on the pages where those answers come from to better understand the context of the answers such as the title of the page, and the headings about the content that the answer was found within.

Contextual Scoring Adjustments for Featured Snippet Answers

The query question processor sends to a context scoring processor some candidate answer passages, information about the resources from which each answer passage was taken, and a score for each of the featured snippet answers.

The scores of the candidate answer passages could be based on the following considerations:

  • Matching a query term to the text of the candidate answer passage
  • Matching answer terms to the text of the candidate answer passages
  • The quality of the underlying resource from which the candidate answer passage was selected

I recently wrote about featured snippet answer scores, and how a combination of query dependent and query independent scoring signals might be used to generate answer scores for answer passages.

The patent tells us that the query question processor may also take into account other factors when scoring candidate answer passages.

Candidate answer passages can be selected from the text of a particular section of the resource. And the query question processor could choose more than one candidate answer passage from a text section.

We are given the following examples of different answer passages from the same page:

(These example answer passages are referred to in a few places in the remainder of the post.)

  • (1) It takes about 27 days (27 days, 7 hours, 43 minutes, and 11.6 seconds) for the Moon to orbit the Earth at its orbital distance
  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Each of those answers could be good ones for Google to use. We are told that:

More than three candidate answers can be selected from the resource, and more than one resource can be processed for candidate answers.

How would Google choose between those three possible answers?

Google might decide based on the number of sentences and a selection of up to a maximum number of characters.

The patent tells us this about choosing between those answers:

Each candidate answer has a corresponding score. For this example, assume that candidate answer passage (2) has the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1). Thus, without the context scoring processor, candidate answer passage (2) would have been provided in the answer box of FIG. 2. However, the context scoring processor takes into account the context of the answer passages and adjusts the scores provided by the query question processor.

So, we see that what might be chosen based on featured snippet answer scores could be adjusted based on the context of that answer from the page that it appears on.

Contextually Scoring Featured Snippet Answers

This process begins with a query determined to be a question query seeking an answer response.

This process next receives candidate answer passages, each candidate answer passage chosen from the text of a resource.

Each of the candidate answer passages is text chosen from a text section that is subordinate to a respective heading (under a heading) in the resource and has a corresponding answer score.

For example, the query question processor provides the candidate answer passages, and their corresponding scores, to the context scoring processor.

A Heading Hierarchy to Determine Context

This process then determines a heading hierarchy from the resource.

The heading hierarchy would have two or more heading levels hierarchically arranged in parent-child relationships (Such as a page title, and an HTML heading element.)

Each heading level has one or more headings.

A subheading of a respective heading is a child heading in the parent-child relationship, and the respective heading is the parent heading in that relationship (for example, an (h2) heading might be a subheading of a (title)).

The heading hierarchy includes a root level corresponding to a root heading.

The context scoring processor can process heading tags in a DOM tree to determine a heading hierarchy.

hierarchical headings for featured snippets

For example, concerning the drawing about the distance to the moon just above, the heading hierarchy for the resource may be:

The root heading (title) is: About The Moon (310)

The main heading (h1) on the page:

H1: The Moon’s Orbit (330)

A secondary heading (h2) on the page:

H2: How long does it take for the Moon to orbit Earth? (334)

Another secondary heading (h2) on the page:

H2: The distance from the Earth to the Moon (338)

Another main heading (h1) on the page:

H1: The Moon (360)

Another secondary heading (h2) on the page:

H2: Age of the Moon (364)

Another secondary heading (h2) on the page:

H2: Life on the Moon (368)

Here is how the patent describes this heading hierarchy:

In this heading hierarchy, The title is the root heading at the root level; headings 330 and 360 are child headings of the heading, and are at a first level below the root level; headings 334 and 338 are child headings of the heading 330, and are at a second level that is one level below the first level, and two levels below the root level; and headings 364 and 368 are child headings of the heading 360 and are at a second level that is one level below the first level, and two levels below the root level.
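
To make the hierarchy concrete, here is a minimal Python sketch that reads the title and heading elements from a page and records each heading's parent. It uses the standard library html.parser module. The class name HeadingOutline, the sample HTML, and the rule of treating the title as level 0 are my own assumptions for illustration, not something the patent spells out.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collects <title> and <h1>-<h6> text as rows of [level, text, parent_index]."""

    LEVELS = {"title": 0, "h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.rows = []      # each row: [level, text, index of parent row or None]
        self._open = None   # level of the heading currently being read, if any
        self._stack = []    # row indexes on the path from the root to the latest heading

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            level = self.LEVELS[tag]
            # Drop headings at the same or a deeper level; what remains is the parent chain.
            while self._stack and self.rows[self._stack[-1]][0] >= level:
                self._stack.pop()
            parent = self._stack[-1] if self._stack else None
            self.rows.append([level, "", parent])
            self._stack.append(len(self.rows) - 1)
            self._open = level

    def handle_data(self, data):
        if self._open is not None:
            self.rows[-1][1] += data

    def handle_endtag(self, tag):
        if tag in self.LEVELS and self._open is not None:
            self.rows[-1][1] = self.rows[-1][1].strip()
            self._open = None


# Sample page modeled on the headings in the drawing from the patent.
page = """
<html><head><title>About The Moon</title></head><body>
<h1>The Moon's Orbit</h1>
<h2>How long does it take for the Moon to orbit Earth?</h2>
<h2>The distance from the Earth to the Moon</h2>
<h1>The Moon</h1>
<h2>Age of the Moon</h2>
<h2>Life on the Moon</h2>
</body></html>
"""

outline = HeadingOutline()
outline.feed(page)
for level, text, parent in outline.rows:
    parent_text = outline.rows[parent][1] if parent is not None else "(root)"
    print(level, text, "<-", parent_text)
```

Each printed row shows a heading level, the heading text, and the parent heading it sits under, which mirrors the parent-child description quoted above.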

The process from the patent determines a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

This score may be based on a heading vector.

The patent says that the process, for each of the candidate answer passages, determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The heading vector would include the text of the headings for the candidate answer passage.

For the example candidate answer passages (1)-(3) above about how long it takes the moon to orbit the Earth, the respectively corresponding heading vectors V1, V2 and V3 are:

  • V1=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: How long does it take for the Moon to orbit the Earth?]>
  • V2=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>
  • V3=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>

We are also told that because candidate answer passages (2) and (3) are selected from the same text section 340, their respective heading vectors V2 and V3 are the same (they are both in the content under the same (H2) heading.)
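
A heading vector like the ones above can be built by walking from a heading up to the root and reversing the path. The Python sketch below assumes the hierarchy is already available as a table of parent links, keyed by the reference numbers used in the patent drawing. The function name heading_vector and the dictionary layout are hypothetical.

```python
def heading_vector(headings, parent, heading_id):
    """Walk parent links from a heading up to the root, then reverse to get a root-first path."""
    path = []
    current = heading_id
    while current is not None:
        path.append(headings[current])
        current = parent.get(current)  # the root has no parent entry, which ends the walk
    return list(reversed(path))


# Heading text keyed by the reference numbers from the patent drawing.
headings = {
    310: "About The Moon",
    330: "The Moon's Orbit",
    334: "How long does it take for the Moon to orbit Earth?",
    338: "The distance from the Earth to the Moon",
}
# Child heading -> parent heading.
parent = {330: 310, 334: 330, 338: 330}

v1 = heading_vector(headings, parent, 334)  # passage (1)
v2 = heading_vector(headings, parent, 338)  # passage (2)
v3 = heading_vector(headings, parent, 338)  # passage (3), same section as (2)

print(v1)
print(v2 == v3)
```

Because passages (2) and (3) sit under the same heading, the sketch produces identical vectors for them, which matches what the patent says.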

The process of adjusting a score, for each answer passage, uses a context score based, at least in part, on the heading vector (410).

That context score can be a single score used to scale the candidate answer passage score or can be a series of discrete scores/boosts that can be used to adjust the score of the candidate answer passage.
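
The two options in that sentence can be sketched as follows. The patent text quoted here does not say how a series of boosts is combined, so multiplying them together is an assumption on my part, and both function names are hypothetical.

```python
from math import prod


def adjust_with_single_score(answer_score, context_score):
    """Scale the answer score by one overall context score."""
    return answer_score * context_score


def adjust_with_boosts(answer_score, boosts):
    """Apply a series of discrete boosts; multiplying them is an assumed way to combine them."""
    return answer_score * prod(boosts)


print(adjust_with_single_score(0.5, 1.2))
print(adjust_with_boosts(0.5, [1.1, 1.2]))
print(adjust_with_boosts(0.5, []))  # no boosts leaves the score unchanged
```

A boost of 1.0 in either form leaves the original answer score as it was.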

Where Things Get Murky in This Patent

There do seem to be several related patents involving featured snippet answers, and this one which targets learning more about answers from their context based on where they fit in a heading hierarchy makes sense.

But, I’m confused by how the patent tells us that one answer based on the context would be adjusted over another one.

The first issue I have is that the answers they are comparing in the same contextual area have some overlap. Here those two are:

  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Note that the second answer and the third answer both include the same line: “Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles.” I find myself a little surprised that the second answer includes a couple of sentences that aren’t in the third answer, and skips a couple of lines from the third answer, and then includes the last sentence, which answers the question.

Since they both appear in the same heading and subheading section of the page they are from, it is difficult to imagine that there is a different adjustment based on context. But, the patent tells us differently:

The candidate answer passage with the highest adjusted answer score (based on context from the headings) is selected as the answer passage.

Recall that in the example above, the candidate answer passage (2) had the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1).

However, after adjustments, candidate answer passage (3) has the highest score, followed by candidate answer passage (2), and then candidate answer passage (1).

Accordingly, candidate answer passage (3) is selected and provided as the answer passage of FIG. 2.

Boosting Scores Based on Passage Coverage Ratio

A query question processor may limit the candidate answers to a maximum length.

The context scoring processor determines a coverage ratio which is a measure indicative of the coverage of the candidate answer passage from the text from which it was selected.

The patent describes alternative question answers:

Alternatively, the text block may include text sections subordinate to respective headings that include a first heading for which the text section from which the candidate answer passage was selected is subordinate, and sibling headings that have an immediate parent heading in common with the first heading. For example, for the candidate answer passage, the text block may include all the text in the portion 380 of the hierarchy; or may include only the text of the sections, of some other portion of text within the portion of the hierarchy. A similar block may be used for the portion of the hierarchy for candidate answer passages selected from that portion.

A small coverage ratio may indicate a candidate answer passage is incomplete. A high coverage ratio may indicate the candidate answer passage captures more of the content of the text passage from which it was selected. A candidate answer passage may receive a context adjustment, depending on this coverage ratio.

A passage coverage ratio is the ratio of the total number of characters in the candidate answer passage to the total number of characters in the passage from which the candidate answer passage was selected.

The passage coverage ratio could also be the ratio of the total number of sentences (or words) in the candidate answer passage to the total number of sentences (or words) in the passage from which the candidate answer passage was selected.

We are told that other ratios can also be used.
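
Here is a minimal Python sketch of a passage coverage ratio measured in characters, words, or sentences. The function name coverage_ratio and the simple punctuation-based sentence splitting are assumptions for illustration. The sample text comes from the example answer passages above.

```python
import re


def coverage_ratio(candidate, source, unit="chars"):
    """Ratio of the candidate passage's size to the size of the passage it was taken from."""

    def size(text):
        if unit == "chars":
            return len(text)
        if unit == "words":
            return len(text.split())
        if unit == "sentences":
            # Naive split on sentence-ending punctuation; good enough for a sketch.
            return len([s for s in re.split(r"[.!?]+", text) if s.strip()])
        raise ValueError("unit must be 'chars', 'words', or 'sentences'")

    total = size(source)
    return size(candidate) / total if total else 0.0


source = (
    "The moon's distance from Earth varies because the moon travels in a "
    "slightly elliptical orbit. Thus, the moon's distance from the Earth "
    "varies from 225,700 miles to 252,000 miles."
)
candidate = (
    "Thus, the moon's distance from the Earth varies from 225,700 miles "
    "to 252,000 miles."
)

print(coverage_ratio(candidate, source, "sentences"))
print(coverage_ratio(candidate, source, "chars"))
```

A candidate that reproduces its whole source passage would have a ratio of 1.0 under any of the three units.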

Of the three example candidate answer passages (1)-(3) about the distance to the moon above, passage (1) has the highest ratio, passage (2) has the second-highest, and passage (3) has the lowest.

This process determines whether the coverage ratio is less than a threshold value. That threshold value can be, for example, 0.3, 0.35 or 0.4, or some other fraction. In our “distance to the moon” example, each passage coverage ratio meets or exceeds the threshold value.

If the coverage ratio is less than a threshold value, then the process would select a first answer boost factor. The first answer boost factor might be proportional to the coverage ratio according to a first relation, or may be a fixed value, or may be a non-boosting value (e.g., 1.0).

But if the coverage ratio is not less than the threshold value, the process may select a second answer boost factor. The second answer boost factor may be proportional to the coverage ratio according to a second relation, or may be a fixed value, or may be a value greater than the non-boosting value (e.g., 1.1).
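
The threshold test can be sketched in Python using the fixed-value option and the example numbers mentioned above (a 0.3 threshold, 1.0 as the non-boosting value, and 1.1 as the higher value). The function name coverage_boost is hypothetical, and the proportional variants that the patent also allows are left out.

```python
def coverage_boost(ratio, threshold=0.3, low_boost=1.0, high_boost=1.1):
    """Pick a boost factor depending on whether the coverage ratio is below the threshold."""
    if ratio < threshold:
        return low_boost   # first answer boost factor (non-boosting here)
    return high_boost      # second answer boost factor


for ratio in (0.2, 0.3, 0.9):
    print(ratio, coverage_boost(ratio))
```

A ratio exactly at the threshold is "not less than" it, so it receives the second boost factor.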

Scoring Based on Other Features

The context scoring process can also check for the presence of features in addition to those described above.

Three example features for contextually scoring an answer passage can be based on the additional features of the distinctive text, a preceding question, and a list format.

Distinctive text

Distinctive text is the text that may stand out because it is formatted differently than other text, like using bolding.

A Preceding Question

A preceding question is a question in the text that precedes the candidate answer passage.

The search engine may process various amounts of text to detect the question.

In some cases, only the passage from which the candidate answer passage is extracted is checked.

In other cases, a text window that can include header text and other text from other sections may be checked.

A boost score that is inversely proportional to the text distance from a question to the candidate answer passage is calculated, and the check is terminated at the occurrence of a first question.

That text distance may be measured in characters, words, or sentences, or by some other metric.
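A rough sketch of that check might look like the following. It assumes the text has already been split into sentences, treats any sentence ending in a question mark as a question, and uses a made-up inverse relation with a maximum boost of 1.2; none of those details come from the patent.

```python
from typing import List, Optional


def distance_to_preceding_question(sentences: List[str], candidate_start: int) -> Optional[int]:
    """Walk backward from the candidate and stop at the first question found.

    Returns the number of sentences between that question and the candidate,
    or None when no question precedes the candidate.
    """
    for distance, idx in enumerate(range(candidate_start - 1, -1, -1)):
        if sentences[idx].strip().endswith("?"):
            return distance
    return None


def question_boost(distance: Optional[int], max_boost: float = 1.2) -> float:
    """Boost that shrinks as the preceding question gets farther away."""
    if distance is None:
        return 1.0  # non-boosting value when no question was detected
    # Assumed inverse relation: distance 0 gives max_boost, then decays toward 1.0.
    return 1.0 + (max_boost - 1.0) / (1 + distance)


# Hypothetical sentences; the candidate passage starts at index 3.
sentences = ["Why is the sky blue?", "Light is a wave.", "Air scatters it.", "Blue scatters most."]
d = distance_to_preceding_question(sentences, 3)
print(d, question_boost(d))
```

Measuring the distance in characters or words instead would only change how `distance` is counted, not the shape of the boost.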

If the question is anchor text for a section of text and there is intervening text, such as in the case of a navigation list, then the question is determined to only precede the text passage to which it links, not precede intervening text.

In the drawing above about the moon, there are two questions in the resource: “How long does it take for the Moon to orbit Earth?” and “Why is the distance changing?”

The first question–“How long does it take for the Moon to orbit Earth?”– precedes the first candidate answer passage by a text distance of zero sentences, and it precedes the second candidate answer passage by a text distance of five sentences.

And the second question–“Why is the distance changing?”– precedes the third candidate answer by zero sentences.

If a preceding question is detected, then the process selects a question boost factor.

This boost factor may depend on the text distance, on whether the question is in a text passage subordinate to a header or is itself a header, and, if the question is a header, on whether the candidate answer passage is subordinate to that header.

Considering these factors, the third candidate answer passage receives the highest boost factor, the first candidate answer receives the second-highest boost factor, and the second candidate answer receives the smallest boost factor.

If a preceding question is not detected, or after the question boost factor is selected, the process then checks for the presence of a list.

The Presence of a List

A list is an indication of several steps usually instructive or informative. The detection of a list may be subject to the query question being a step modal query.

A step modal query is a query where a list-based answer is likely to be a good answer. Examples of step modal queries are queries like:

  • [How to . . . ]
  • [How do I . . . ]
  • [How to install a door knob]
  • [How do I change a tire]
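The patent does not say how a step modal query is recognized. As an illustration only, a simple prefix check covering the example patterns above could look like this (the regular expression and function name are my own assumptions):

```python
import re

# Assumed pattern: queries that start with "how to" or "how do I".
STEP_MODAL_PREFIX = re.compile(r"^\s*how\s+(to|do\s+i)\b", re.IGNORECASE)


def is_step_modal_query(query: str) -> bool:
    """Return True when the query looks like it wants a list of steps."""
    return STEP_MODAL_PREFIX.search(query) is not None


for q in ["How to install a door knob", "How do I change a tire", "How far away is the moon"]:
    print(q, is_step_modal_query(q))
```

A real system would likely rely on a trained classifier rather than a fixed prefix, so treat this as a stand-in for whatever signal Google actually uses.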

The context scoring process may detect lists formed with:

  • HTML tags
  • Micro formats
  • Semantic meaning
  • Consecutive headings at the same level with the same or similar phrases (e.g., Step 1, Step 2; or First; Second; Third; etc.)

The context scoring process may also score a list for quality.

It would look at things such as:

  • A list in the center of a page, which does not include multiple links to other pages (multiple links are indicative of reference lists), and whose HREF link text does not occupy a large portion of the text of the list, will be of higher quality
  • A list at the side of a page, which does include multiple links to other pages, and/or has HREF link text that does occupy a large portion of the text of the list, will be of lower quality

If a list is detected, then the process selects a list boost factor.

That list boost factor may be fixed or may be proportional to the quality score of the list.

If a list is not detected, or after the list boost factor is selected, the process ends.

In some implementations, the list boost factor may also be dependent on other feature scores.

If other features, such as coverage ratio, distinctive text, etc., have relatively high scores, then the list boost factor may be increased.

The patent tells us that this is because “the combination of these scores in the presence of a list is a strong signal of a high-quality answer passage.”
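Here is a small sketch of how the list quality signals and the list boost could fit together. The three features mirror the ones described above, but the weights, the cutoffs, and the extra 0.05 for strong companion features are invented for illustration and are not values from the patent.

```python
from dataclasses import dataclass


@dataclass
class ListFeatures:
    in_page_center: bool     # list sits in the main content, not a sidebar
    outbound_links: int      # links in the list that point to other pages
    link_text_share: float   # fraction of the list text that is link text


def list_quality(f: ListFeatures) -> float:
    """Score a list between 0 and 1 using assumed weights."""
    score = 0.0
    if f.in_page_center:
        score += 0.4
    if f.outbound_links <= 1:      # multiple links suggest a reference list
        score += 0.3
    if f.link_text_share < 0.5:    # mostly plain text rather than link text
        score += 0.3
    return score


def list_boost(f: ListFeatures, other_scores_high: bool = False) -> float:
    """Boost proportional to list quality, raised when other features score well."""
    boost = 1.0 + 0.2 * list_quality(f)
    if other_scores_high:
        boost += 0.05
    return boost


steps = ListFeatures(in_page_center=True, outbound_links=0, link_text_share=0.1)
sidebar = ListFeatures(in_page_center=False, outbound_links=12, link_text_share=0.9)
print(list_boost(steps), list_boost(sidebar))
```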

Adjustment of Featured Snippet Answers Scores

Answer scores for candidate answer passages are adjusted by scoring components based on heading vectors, passage coverage ratio, and other features described above.

The scoring process can select the largest boost value from those determined above or can select a combination of the boost values.

Once the answer scores are adjusted, the candidate answer passage with the highest adjusted answer score is selected as the featured snippet answer and is displayed to a searcher.
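The final adjustment and selection step can be sketched as below. The answer scores and boost values are invented, and multiplying boosts together is only one possible reading of "a combination of the boost values"; the patent does not spell out the formula.

```python
from math import prod
from typing import List, Tuple

Candidate = Tuple[str, float, List[float]]  # (passage, answer score, boost factors)


def adjusted_score(answer_score: float, boosts: List[float], combine: str = "max") -> float:
    """Scale the answer score by the largest boost or by the product of all boosts."""
    if not boosts:
        return answer_score
    factor = max(boosts) if combine == "max" else prod(boosts)
    return answer_score * factor


def select_answer(candidates: List[Candidate], combine: str = "max") -> str:
    """Return the passage with the highest adjusted answer score."""
    best = max(candidates, key=lambda c: adjusted_score(c[1], c[2], combine))
    return best[0]


# Hypothetical scores: passage (2) leads before any context adjustment.
candidates = [
    ("passage (1)", 0.70, [1.3, 1.1]),
    ("passage (2)", 0.80, [1.0]),
    ("passage (3)", 0.75, [1.1]),
]
print(select_answer(candidates), select_answer(candidates, combine="product"))
```

With these made-up numbers, the passage that had the best unadjusted score is overtaken once context boosts are applied, which is the effect the patent describes.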

More to Come

I will be reviewing the first patent in this series of patents about candidate answer scores because it does have some additional elements to it that haven’t been covered in this post, and the post about query dependent/independent ranking signals for answer scores. If you have been paying attention to how Google has been answering queries that appear to be seeking answers, you have likely seen those improving in many cases. Some answers have been really bad though. It will be nice to have as complete an idea as we can of how Google decides what might be a good answer to a query, based on information available to them on the Web.

Added October 14, 2020 – I have written about another Google patent on Answer Scores, and it’s worth reading about all of the patents on this topic. The new post is at Weighted Answer Terms for Scoring Answer Passages, and is about the patent Weighted answer terms for scoring answer passages.

It is about identifying questions in resources, and answers for those questions, and describes using term weights as a way to score answer passages (along with the scoring approaches identified in the other related patents, including this one.)

Added October 15, 2020 – I have written a few other posts about answer passages that are worth reading if you are interested in how Google finds questions on pages and answers to those, and scores answer passages to determine which ones to show as featured snippets. I’ve linked to some of those in the body of this post, but here is another one of those posts:

Added October 22, 2020 – I have written up details about how structured and unstructured data is selected for answer passages, based on specific criteria in the patent on Scoring Answer Passages, in the post Selecting Candidate Answer Passages.


Copyright © 2020 SEO by the Sea ⚓.

The post Adjusting Featured Snippet Answers by Context appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


Top six reasons you should caption your social media video content

November 21, 2020 No Comments

30-second summary:

  • Video marketing is more than a trend; it’s a must. But most companies are leaving out a key ingredient to ensure customers engage with their videos: captions and subtitles.
  • Captioning videos in English or subtitling them in other languages has been proven to greatly boost the success and accessibility of online video content.
  • Adding captions, subtitles or a transcript to videos allows Google to index the entirety of video content, rather than just indexing the video title.
  • Captions and subtitles ensure videos are accessible by all: those who don’t have their volume on and the 37.5 million Americans who are deaf or hearing impaired.

No matter what industry you’re in, video content is likely part of your marketing strategy. And if it’s not, it should be. According to a report by Cisco, online videos will make up more than 82% of all consumer internet traffic by 2022. And 72% of customers would rather learn about a product or service by video. Even still, videos aren’t some magic token that’ll get you to the next realm of marketing success and customer engagement. The online landscape is crowded, competitive, and moving at lightning speed. You don’t just need users to slow their scroll, you need them to engage. And when it comes to video content, the solution is quite simple, but often overlooked: closed captions.

Captioning your videos in English or subtitling them in other languages will greatly boost the success of your online video content. As a professional captioner and subtitler, I’m here to help you understand why:

1. Google can’t watch videos, but it can crawl captions

If you’re looking to improve your video’s SEO, adding captions is a quick and easy way to do it. Search engines like Google can’t watch your video content, but they can crawl your captions or transcripts and rank your video based on the keywords they find. Although your video will also be indexed for SEO by its title, description, and tags, captions will increase your keyword density and diversity even further. 

Google can't watch videos but can read captions

Next time you’ve got video content creation on the horizon, make sure you incorporate keywords into the script with this tip in mind, as it will pay off when it comes to video performance and SEO results in the long run.

2. Video captions drive more social engagement 

Adding captions to your videos is almost guaranteed to boost engagement, interaction, and conversion. According to a case study by Instapage, call-to-action clicks increased by 25% after they added captions to their Facebook videos. Another study found that captions increase the time viewers spend watching a video by almost 40% and make viewers 80% more likely to watch a video through to the end. Simply adding captions to video content drives up clicks, overall view time, and view longevity.

3. A lot of people don’t (or can’t) turn on video sound

Have you ever insomnia-scrolled through Facebook for some entertainment while your partner slept soundly next to you? Or decided to take a peek at your feed during a boring class lecture? Or what about when you’re riding the public bus, having a cup of joe at your favorite coffee shop, or dining out solo? In all cases, playing a video aloud is not ideal… or socially acceptable.

Example of how captions support video experience without audio

As much as 85% of Facebook videos are played without sound. That means, if you don’t have captions on your video, it’ll be skipped by anyone watching with the mute button on, which could be a sizable chunk of your target audience. If you want to ensure your followers can view your content no matter where they are when they watch it, then do your part by adding captions. 

4. Captions boost comprehension, memory, and attention

Hundreds of studies have proven that captions improve comprehension of, attention to, and memory of video content. I’m a native English speaker, but my husband is Spanish. To improve his comprehension while watching TV shows and movies in English, we always watch content with the captions on. I was surprised to find that this also improved my comprehension and understanding of the content, and I now watch all video content with subtitles, whether or not my husband is sitting next to me on the couch. Including captions is the best way to ensure your takeaway hits home and leaves its mark on your viewers.  

5. Captions make videos more inclusive and accessible

Over 37.5 million Americans are deaf or have trouble hearing, so video audio serves little to no purpose for this group. And only 36% of organizations caption all their video content. So why not get on the right side of that number? Without captions, you’re missing out on connecting with a huge audience. But remember, it’s not all about business and money. Ensuring your video content is inclusive of all viewers is simply the right thing to do.

6. Most of your viewers likely live outside of your country of origin 

Making your content available worldwide is another way to grow your reach and the impact of your video content. According to YouTube, approximately “two-thirds of a channel’s views come from outside the creator’s home country.” Think about that: a huge portion of your audience might not fully understand your message or recognize your call to action. That’s a deep pool of potential customers you are missing out on.

Look at your analytics, figure out where your viewers live and consider creating subtitles in other languages to reach new markets. Make this a very strategic decision. Quality translation and subtitling are an investment, so you’ll want to make sure you choose the right language(s) to reach the target markets you’re able to serve. 

A word of caution: Resist the urge to DIY your captions and subtitles

While there are free machine translation, transcription, and captioning tools available on the market, take it from me: you don’t want to DIY your video captions. Leave this task to the experts. 

Captioning and subtitling are skills unto themselves, and without training and experience, can be time-consuming and delicate tasks. Captions and subtitles must follow strict rules, including character limits, reading speed, and cue-in and -out times (when the text appears on-screen and when it’s taken off the screen). Poorly timed captions and subtitles are difficult or impossible to read, which defeats the purpose of captioning or subtitling in the first place. 

word of caution on DIY captioning

Use resources such as the American Translators Association Directory (go to “Translation Service(s)” and choose “Dubbing/Subtitling” from the dropdown list) or visit GoSub’s job board to find a professional subtitler or captioner.

Don’t spend tons of time, money, and effort on creating the perfect video and leave out the key ingredient to ensuring your message reaches as many people as possible. Well-captioned and -subtitled content will increase your views, boost engagement and comprehension, and improve the overall success of your social media video content. Plus, for once, this is a quick and easy marketing fix that can make a big impact!

Molly Yurick is a Spanish to English translator, professional captioner, and subtitler. She is also Deputy Chair of Public Relations for the American Translators Association (ATA), which represents more than 10,000 translators and interpreters across 103 countries. 

The post Top six reasons you should caption your social media video content appeared first on Search Engine Watch.

Search Engine Watch


Adjusting Featured Snippet Answers by Context

November 19, 2020 No Comments

How Are Featured Snippet Answers Decided Upon?

I recently wrote about Featured Snippet Answer Scores Ranking Signals. In that post, I described how Google was likely using query dependent and query independent ranking signals to create answer scores for queries that were looking like they wanted answers.

One of the inventors of that patent from that post was Steven Baker. I looked at other patents that he had written, and noticed that one of those was about context as part of query independent ranking signals for answers.

Remembering that patent about question-answering and context, I felt it was worth reviewing that patent and writing about it.

This patent is about processing question queries that want textual answers and how those answers may be decided upon.

It is a complicated patent, and at one point the description behind it seems to get a bit murky, but I wrote about when that happened in the patent, and I think the other details provide a lot of insight into how Google is scoring featured snippet answers. There is an additional related patent that I will be following up with after this post, and I will link to it from here as well.

This patent starts by telling us that a search system can identify resources in response to queries submitted by users and provide information about the resources in a manner that is useful to the users.

How Context Scoring Adjustments for Featured Snippet Answers Works

Users of search systems are often searching for an answer to a specific question, rather than a listing of resources, like in this drawing from the patent, showing featured snippet answers:

featured snippet answers

For example, users may want to know what the weather is in a particular location, a current quote for a stock, the capital of a state, etc.

When queries that are in the form of a question are received, some search engines may perform specialized search operations in response to the question format of the query.

For example, some search engines may provide information responsive to such queries in the form of an “answer,” such as information provided in the form of a “one box” to a question, which is often a featured snippet answer.

Some question queries are better served by explanatory answers, which are also referred to as “long answers” or “answer passages.”

For example, for the question query [why is the sky blue], an answer explaining light as waves is helpful.

featured snippet answers - why is the sky blue

Such answer passages can be selected from resources that include text, such as paragraphs, that are relevant to the question and the answer.

Sections of the text are scored, and the section with the best score is selected as an answer.

In general, the patent tells us about one aspect of what it covers in the following process:

  • Receiving a query that is a question query seeking an answer response
  • Receiving candidate answer passages, each passage made of text selected from a text section subordinate to a heading on a resource, with a corresponding answer score
  • Determining a hierarchy of headings on a page, with two or more heading levels hierarchically arranged in parent-child relationships, where each heading level has one or more headings, a subheading of a respective heading is a child heading in a parent-child relationship and the respective heading is a parent heading in that relationship, and the heading hierarchy includes a root level corresponding to a root heading (for each candidate answer passage)
  • Determining a heading vector describing a path in the hierarchy of headings from the root heading to the respective heading to which the candidate answer passage is subordinate, determining a context score based, at least in part, on the heading vector, adjusting the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score
  • Selecting an answer passage from the candidate answer passages based on the adjusted answer scores

Advantages of the process in the patent

  1. Long query answers can be selected, based partially on context signals indicating answers relevant to a question
  2. The context signals may be, in part, query-independent (i.e., scored independently of their relatedness to terms of the query)
  3. This part of the scoring process considers the context of the document (“resource”) in which the answer text is located, accounting for relevancy signals that may not otherwise be accounted for during query-dependent scoring
  4. Following this approach, long answers that are more likely to satisfy a searcher’s informational need are more likely to appear as answers

This patent can be found at:

Context scoring adjustments for answer passages
Inventors: Nitin Gupta, Srinivasan Venkatachary, Lingkun Chu, and Steven D. Baker
US Patent: 9,959,315
Granted: May 1, 2018
Appl. No.: 14/169,960
Filed: January 31, 2014

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for context scoring adjustments for candidate answer passages.

In one aspect, a method includes scoring candidate answer passages. For each candidate answer passage, the system determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading to which the candidate answer passage is subordinate; determines a context score based, at least in part, on the heading vector; and adjusts answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The system then selects an answer passage from the candidate answer passages based on the adjusted answer scores.

Using Context Scores to Adjust Answer Scores for Featured Snippets

A drawing from the patent shows different hierarchical headings that may be used to determine the context of answer passages that may be used to adjust answer scores for featured snippets:

Hierarchical headings for featured snippets

I discuss these headings and their hierarchy below. Note that the headings include the Page title as a heading (About the Moon), and the headings within heading elements on the page as well. And those headings give those answers context.

This context scoring process starts with receiving candidate answer passages and a score for each of the passages.

Those candidate answer passages and their respective scores are provided to a search engine that receives a query determined to be a question.

Each of those candidate answer passages is text selected from a text section under a particular heading from a specific resource (page) that has a certain answer score.

For each resource where a candidate answer passage has been selected, a context scoring process determines a heading hierarchy in the resource.

A heading is text or other data corresponding to a particular passage in the resource.

As an example, a heading can be text summarizing a section of text that immediately follows the heading (the heading describes what the text is about that follows it, or is contained within it.)

Headings may be indicated, for example, by specific formatting data, such as heading elements using HTML.

This next section from the patent reminded me of an observation that Cindy Krum of Mobile Moxie has about named anchors on a page, and how Google might index those to answer a question, to lead to an answer or a featured snippet. She wrote about those in What the Heck are Fraggles?

A heading could also be anchor text for an internal link (within the same page) that links to an anchor and corresponding text at some other position on the page.

A heading hierarchy could have two or more heading levels that are hierarchically arranged in parent-child relationships.

The first level, or the root heading, could be the title of the resource.

Each of the heading levels may have one or more headings, and a subheading of a respective heading is a child heading and the respective heading is a parent heading in the parent-child relationship.

For each candidate passage, a context scoring process may determine a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

To determine the context score, the context scoring process determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The context score could be based, at least in part, on the heading vector.

The context scoring process can then adjust the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The context scoring process can then select an answer passage from the candidate answer passages based on adjusted answer scores.

This flowchart from the patent shows the context scoring adjustment process:

context scoring adjustment flowchart

Identifying Question Queries And Answer Passages

I’ve written about understanding the context of answer passages. The patent tells us more about question queries and answer passages worth going over in more detail.

Some queries are in the form of a question or an implicit question.

For example, the query [distance of the earth from the moon] is in the form of an implicit question “What is the distance of the earth from the moon?”

An implicit question - the distance from the earth to the moon

Likewise, a question may be specific, as in the query [How far away is the moon].

The search engine includes a query question processor that uses processes that determine if a query is a query question (implicit or specific) and if it is, whether there are answers that are responsive to the question.

The query question processor can use several different algorithms to determine whether a query is a question and whether there are particular answers responsive to the question.

For example, to determine question queries and answers, it may use:

  • Language models
  • Machine learned processes
  • Knowledge graphs
  • Grammars
  • Combinations of those

The query question processor may choose candidate answer passages in addition to or instead of answer facts. For example, for the query [how far away is the moon], an answer fact is 238,900 miles. And the search engine may just show that factual information since that is the average distance of the Earth from the moon.

But the query question processor may choose to identify passages that are very relevant to the question query.

These passages are called candidate answer passages.

The answer passages are scored, and one passage is selected based on these scores and provided in response to the query.

An answer passage may be scored, and that score may be adjusted based on a context, which is the point behind this patent.

Often Google will identify several candidate answer passages that could be used as featured snippet answers.

To better understand the context of the answers, Google may look at information on the pages those answers come from, such as the title of the page and the headings above the content in which the answer was found.

Contextual Scoring Adjustments for Featured Snippet Answers

The query question processor sends the context scoring processor some candidate answer passages, information about the resource each answer passage came from, and a score for each of the candidate answer passages.

The scores of the candidate answer passages could be based on the following considerations:

  • Matching a query term to the text of the candidate answer passage
  • Matching answer terms to the text of the candidate answer passages
  • The quality of the underlying resource from which the candidate answer passage was selected

I recently wrote about featured snippet answer scores, and how a combination of query dependent and query independent scoring signals might be used to generate answer scores for answer passages.

The patent tells us that the query question processor may also take into account other factors when scoring candidate answer passages.

Candidate answer passages can be selected from the text of a particular section of the resource. And the query question processor could choose more than one candidate answer passage from a text section.

We are given the following examples of different answer passages from the same page:

(These example answer passages are referred to in a few places in the remainder of the post.)

  • (1) It takes about 27 days (27 days, 7 hours, 43 minutes, and 11.6 seconds) for the Moon to orbit the Earth at its orbital distance
  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Each of those answers could be good ones for Google to use. We are told that:

More than three candidate answers can be selected from the resource, and more than one resource can be processed for candidate answers.

How would Google choose between those three possible answers?

Google might decide based on the number of sentences and a selection of up to a maximum number of characters.

The patent tells us this about choosing between those answers:

Each candidate answer has a corresponding score. For this example, assume that candidate answer passage (2) has the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1). Thus, without the context scoring processor, candidate answer passage (2) would have been provided in the answer box of FIG. 2. However, the context scoring processor takes into account the context of the answer passages and adjusts the scores provided by the query question processor.

So, we see that what might be chosen based on featured snippet answer scores could be adjusted based on the context of that answer from the page that it appears on.

Contextually Scoring Featured Snippet Answers

This process begins with a query determined to be a question query seeking an answer response.

This process next receives candidate answer passages, each candidate answer passage chosen from the text of a resource.

Each of the candidate answer passages is text chosen from a text section that is subordinate to a respective heading (under a heading) in the resource and has a corresponding answer score.

For example, the query question processor provides the candidate answer passages, and their corresponding scores, to the context scoring processor.

A Heading Hierarchy to Determine Context

This process then determines a heading hierarchy from the resource.

The heading hierarchy would have two or more heading levels hierarchically arranged in parent-child relationships (such as a page title and an HTML heading element).

Each heading level has one or more headings.

A subheading of a respective heading is a child heading (an (h2) heading might be a subheading of a (title)) in the parent-child relationship and the respective heading is a parent heading in the relationship.

The heading hierarchy includes a root level corresponding to a root heading.

The context scoring processor can process heading tags in a DOM tree to determine a heading hierarchy.

hierarchical headings for featured snippets

For example, concerning the drawing about the distance to the moon just above, the heading hierarchy for the resource may be:

The ROOT Heading (title) is: About The Moon (310)

The main heading (H1) on the page

H1: The Moon’s Orbit (330)

A secondary heading (h2) on the page:

H2: How long does it take for the Moon to orbit Earth? (334)

Another secondary heading (h2) on the page is:

H2: The distance from the Earth to the Moon (338)

Another Main heading (h1) on the page

H1: The Moon (360)

Another secondary Heading (h2) on the page:

H2: Age of the Moon (364)

Another secondary heading (h2) on the page:

H2: Life on the Moon (368)

Here is how the patent describes this heading hierarchy:

In this heading hierarchy, The title is the root heading at the root level; headings 330 and 360 are child headings of the heading, and are at a first level below the root level; headings 334 and 338 are child headings of the heading 330, and are at a second level that is one level below the first level, and two levels below the root level; and headings 364 and 368 are child headings of the heading 360 and are at a second level that is one level below the first level, and two levels below the root level.

The process from the patent determines a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

This score may be based on a heading vector.

The patent says that the process, for each of the candidate answer passages, determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The heading vector would include the text of the headings for the candidate answer passage.

For the example candidate answer passages (1)-(3) above about how long it takes the moon to orbit the Earth, the respectively corresponding heading vectors V1, V2 and V3 are:

  • V1=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: How long does it take for the Moon to orbit the Earth?]>
  • V2=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>
  • V3=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>

We are also told that because candidate answer passages (2) and (3) are selected from the same text section 340, their respective heading vectors V2 and V3 are the same (they are both in the content under the same (H2) heading.)
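To make the idea of a heading vector more concrete, here is a minimal sketch in Python. The patent does not publish code, so the flat outline format, the `OUTLINE` data, and the `heading_vector` function are my own assumptions for illustration. It walks backwards from a heading to the root, collecting each parent along the way.

```python
from typing import List, Tuple

# Hypothetical flat outline of the example page: (level, heading text).
# Level 0 stands in for the page title (root), 1 for H1, 2 for H2.
OUTLINE: List[Tuple[int, str]] = [
    (0, "About The Moon"),
    (1, "The Moon's Orbit"),
    (2, "How long does it take for the Moon to orbit the Earth?"),
    (2, "The distance from the Earth to the Moon"),
    (1, "The Moon"),
    (2, "Age of the Moon"),
    (2, "Life on the Moon"),
]


def heading_vector(outline: List[Tuple[int, str]], index: int) -> List[str]:
    """Return the heading texts on the path from the root to outline[index]."""
    level, text = outline[index]
    path = [text]
    # Walk backwards; the nearest earlier heading with a smaller level
    # is the parent of the heading we are currently looking at.
    for prev_level, prev_text in reversed(outline[:index]):
        if prev_level < level:
            path.append(prev_text)
            level = prev_level
        if level == 0:
            break
    return list(reversed(path))


# Passages (2) and (3) both sit under the heading at index 3,
# so they share one heading vector.
v1 = heading_vector(OUTLINE, 2)
v2 = heading_vector(OUTLINE, 3)
v3 = heading_vector(OUTLINE, 3)
```

Because the vector depends only on the heading a passage sits under, any two passages taken from the same text section end up with identical vectors in this sketch.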

The process of adjusting a score, for each answer passage, uses a context score based, at least in part, on the heading vector (410).

That context score can be a single score used to scale the candidate answer passage score or can be a series of discrete scores/boosts that can be used to adjust the score of the candidate answer passage.

Where Things Get Murky in This Patent

There do seem to be several related patents involving featured snippet answers, and this one which targets learning more about answers from their context based on where they fit in a heading hierarchy makes sense.

But, I’m confused by how the patent tells us that one answer based on the context would be adjusted over another one.

The first issue I have is that the answers they are comparing in the same contextual area have some overlap. Here those two are:

  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Note that the second answer and the third answer both include the same line: “Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles.” I find myself a little surprised that the second answer includes a couple of sentences that aren’t in the third answer, and skips a couple of lines from the third answer, and then includes the last sentence, which answers the question.

Since they both appear in the same heading and subheading section of the page they are from, it is difficult to imagine that there is a different adjustment based on context. But, the patent tells us differently:

The candidate answer passage with the highest adjusted answer score (based on context from the headings) is selected as the answer passage.

Recall that in the example above, the candidate answer passage (2) had the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1).

However, after adjustments, candidate answer passage (3) has the highest score, followed by candidate answer passage (2), and then candidate answer passage (1).

Accordingly, candidate answer passage (3) is selected and provided as the answer passage of FIG. 2.

Boosting Scores Based on Passage Coverage Ratio

A query question processor may limit the candidate answers to a maximum length.

The context scoring processor determines a coverage ratio which is a measure indicative of the coverage of the candidate answer passage from the text from which it was selected.

The patent describes an alternative text block that the coverage may be measured against:

Alternatively, the text block may include text sections subordinate to respective headings that include a first heading for which the text section from which the candidate answer passage was selected is subordinate, and sibling headings that have an immediate parent heading in common with the first heading. For example, for the candidate answer passage, the text block may include all the text in the portion 380 of the hierarchy; or may include only the text of the sections, of some other portion of text within the portion of the hierarchy. A similar block may be used for the portion of the hierarchy for candidate answer passages selected from that portion.

A small coverage ratio may indicate a candidate answer passage is incomplete. A high coverage ratio may indicate the candidate answer passage captures more of the content of the text passage from which it was selected. A candidate answer passage may receive a context adjustment, depending on this coverage ratio.

A passage coverage ratio is a ratio of the total number of characters in the candidate answer passage to the total number of characters in the passage from which the candidate answer passage was selected.

The passage coverage ratio could also be a ratio of the total number of sentences (or words) in the candidate answer passage to the total number of sentences (or words) in the passage from which the candidate answer passage was selected.

We are told that other ratios can also be used.

Of the three example candidate answer passages (1)-(3) above, passage (1) has the highest ratio, passage (2) has the second-highest, and passage (3) has the lowest.

This process determines whether the coverage ratio is less than a threshold value. That threshold value can be, for example, 0.3, 0.35 or 0.4, or some other fraction. In our “distance to the moon” example, each passage coverage ratio meets or exceeds the threshold value.

If the coverage ratio is less than the threshold value, then the process selects a first answer boost factor. The first answer boost factor might be proportional to the coverage ratio according to a first relation, or may be a fixed value, or may be a non-boosting value (e.g., 1.0).

But if the coverage ratio is not less than the threshold value, the process may select a second answer boost factor. The second answer boost factor may be proportional to the coverage ratio according to a second relation, or may be a fixed value, or may be a value greater than the non-boosting value (e.g., 1.1).
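The coverage ratio check can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are mine, I use the character-based ratio, and I picked the fixed-value options the patent gives as examples (a 0.3 threshold, a non-boosting 1.0, and a boosting 1.1) rather than a proportional relation.

```python
def coverage_ratio(candidate: str, source_passage: str) -> float:
    """Character-based ratio of the candidate passage to its source passage."""
    if not source_passage:
        return 0.0
    return len(candidate) / len(source_passage)


def coverage_boost(ratio: float,
                   threshold: float = 0.3,
                   low_boost: float = 1.0,
                   high_boost: float = 1.1) -> float:
    """Pick the first boost factor below the threshold, the second otherwise."""
    if ratio < threshold:
        return low_boost   # first answer boost factor (non-boosting here)
    return high_boost      # second answer boost factor


# A candidate that covers half of its source section clears the threshold.
ratio = coverage_ratio("abcd", "abcdefgh")
boost = coverage_boost(ratio)
```

Swapping `len(...)` for a count of words or sentences would give the other ratios mentioned above without changing the threshold logic.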

Scoring Based on Other Features

The context scoring process can also check for the presence of features in addition to those described above.

Three example features for contextually scoring an answer passage are distinctive text, a preceding question, and a list format.

Distinctive text

Distinctive text is the text that may stand out because it is formatted differently than other text, like using bolding.

A Preceding Question

A preceding question is a question in the text that precedes the candidate answer passage.

The search engine may process various amounts of text to detect the question.

In some cases, only the passage from which the candidate answer passage was extracted is checked.

In other cases, a text window that can include header text and text from other sections may be checked.

A boost score that is inversely proportional to the text distance from a question to the candidate answer passage is calculated, and the check is terminated at the occurrence of a first question.

That text distance may be measured in characters, words, or sentences, or by some other metric.

If the question is anchor text for a section of text and there is intervening text, such as in the case of a navigation list, then the question is determined to only precede the text passage to which it links, not precede intervening text.

In the drawing above about the moon, there are two questions in the resource: “How long does it take for the Moon to orbit Earth?” and “Why is the distance changing?”

The first question–“How long does it take for the Moon to orbit Earth?”– precedes the first candidate answer passage by a text distance of zero sentences, and it precedes the second candidate answer passage by a text distance of five sentences.

And the second question–“Why is the distance changing?”– precedes the third candidate answer by zero sentences.

If a preceding question is detected, then the process selects a question boost factor.

This boost factor may depend on the text distance, whether the text is in a text passage subordinate to a header or whether the question is a header, and, if the question is in a header, whether the candidate answer passage is subordinate to the header.

Considering these factors, the third candidate answer passage receives the highest boost factor, the first candidate answer receives the second-highest boost factor, and the second candidate answer receives the smallest boost factor.
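Here is a rough Python sketch of the distance part of the preceding question check. The function names, the "ends with a question mark" test, and the `max_bonus` value are assumptions of mine, not details from the patent. The scan stops at the first question it meets, and the boost shrinks as the distance grows.

```python
from typing import List, Optional


def preceding_question_distance(sentences: List[str],
                                passage_start: int) -> Optional[int]:
    """Sentences between the nearest preceding question and the passage.

    Returns None when no question precedes the passage. The scan stops
    at the first question found while walking backwards.
    """
    for distance, idx in enumerate(range(passage_start - 1, -1, -1)):
        if sentences[idx].rstrip().endswith("?"):
            return distance
    return None


def question_boost(distance: Optional[int], max_bonus: float = 0.2) -> float:
    """Boost that is inversely related to the text distance (1.0 = no boost)."""
    if distance is None:
        return 1.0
    return 1.0 + max_bonus / (1 + distance)


# Hypothetical sentence list standing in for part of the example page.
SENTENCES = [
    "How long does it take for the Moon to orbit Earth?",
    "It takes about 27 days for the Moon to orbit the Earth.",
    "The orbit is slightly elliptical.",
    "Why is the distance changing?",
    "The moon's distance from Earth varies.",
]

near = question_boost(preceding_question_distance(SENTENCES, 1))
far = question_boost(5)
```

A distance-only sketch like this would give the first and third candidate passages the same boost, since both sit zero sentences after a question. The ordering described above also weighs whether the question is a header and whether the passage is subordinate to it, which this sketch leaves out.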

If a preceding question is not detected, or after the question boost factor is selected, the process then checks for the presence of a list.

The Presence of a List

A list is an indication of several steps usually instructive or informative. The detection of a list may be subject to the query question being a step modal query.

A step modal query is a query where a list-based answer is likely to be a good answer. Examples of step modal queries are queries like:

  • [How to . . . ]
  • [How do I . . . ]
  • [How to install a door knob]
  • [How do I change a tire]
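The patent does not say how step modal queries are recognized, but the example queries above suggest a simple prefix test. The sketch below is only my assumption of what such a test might look like; the pattern and the function name are hypothetical.

```python
import re

# Hypothetical pattern covering the two example prefixes from the list above.
STEP_MODAL_PATTERN = re.compile(r"^\s*(how to|how do i)\b", re.IGNORECASE)


def is_step_modal_query(query: str) -> bool:
    """True when the query starts with one of the example step modal prefixes."""
    return STEP_MODAL_PATTERN.match(query) is not None
```

A real system would almost certainly use something richer than two prefixes, such as the language models or grammars the patent mentions for classifying question queries.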

The context scoring process may detect lists formed with:

  • HTML tags
  • Micro formats
  • Semantic meaning
  • Consecutive headings at the same level with the same or similar phrases (e.g., Step 1, Step 2; or First; Second; Third; etc.)

The context scoring process may also score a list for quality.

It would look at things such as:

  • A list in the center of a page, which does not include multiple links to other pages (indicative of reference lists), and whose HREF link text does not occupy a large portion of the text of the list, will be of higher quality
  • A list at the side of a page, which does include multiple links to other pages (indicative of reference lists), and/or has HREF link text that does occupy a large portion of the text of the list, will be of lower quality

If a list is detected, then the process selects a list boost factor.

That list boost factor may be fixed or may be proportional to the quality score of the list.
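As a rough illustration of a list boost that is proportional to list quality, here is a small Python sketch. The `ListFeatures` fields, the penalty weights, the 0.5 link-text cutoff, and `max_bonus` are all made-up values of mine; the patent names the signals but not how they are weighted.

```python
from dataclasses import dataclass


@dataclass
class ListFeatures:
    in_main_content: bool   # list sits in the center of the page, not a sidebar
    outbound_links: int     # links from the list to other pages
    link_text_chars: int    # characters of the list that are HREF link text
    total_text_chars: int   # all characters in the list


def list_quality(features: ListFeatures) -> float:
    """Score a list between 0.0 and 1.0 using hypothetical penalties."""
    score = 1.0
    if not features.in_main_content:
        score -= 0.4
    if features.outbound_links > 1:
        score -= 0.3
    if features.total_text_chars:
        if features.link_text_chars / features.total_text_chars > 0.5:
            score -= 0.3
    return max(score, 0.0)


def list_boost(quality: float, max_bonus: float = 0.2) -> float:
    """Boost proportional to the list quality score (1.0 = no boost)."""
    return 1.0 + max_bonus * quality


steps_list = ListFeatures(True, 0, 0, 400)
sidebar_links = ListFeatures(False, 12, 350, 400)
```

In this sketch the step-by-step list in the main content keeps the full quality score, while the sidebar list of links, which looks like a reference list, ends up with little or no boost.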

If a list is not detected, or after the list boost factor is selected, the process ends.

In some implementations, the list boost factor may also be dependent on other feature scores.

If other features, such as coverage ratio, distinctive text, etc., have relatively high scores, then the list boost factor may be increased.

The patent tells us that this is because “the combination of these scores in the presence of a list is a strong signal of a high-quality answer passage.”

Adjustment of Featured Snippet Answers Scores

Answer scores for candidate answer passages are adjusted by scoring components based on heading vectors, passage coverage ratio, and other features described above.

The scoring process can select the largest boost value from those determined above or can select a combination of the boost values.

Once the answer scores are adjusted, the candidate answer passage with the highest adjusted answer score is selected as the featured snippet answer and is displayed to a searcher.
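The final adjustment and selection step can be sketched as follows. The scores and boost values are invented for illustration (the patent gives only the before and after orderings), and the function names are mine. The sketch supports both options mentioned above: taking the largest boost, or combining the boosts, here by multiplying them.

```python
import math
from typing import Dict, List


def adjusted_score(answer_score: float, boosts: List[float],
                   combine: str = "max") -> float:
    """Scale an answer score by the largest boost or by the product of boosts."""
    if not boosts:
        return answer_score
    factor = max(boosts) if combine == "max" else math.prod(boosts)
    return answer_score * factor


def select_answer(candidates: Dict[str, dict], combine: str = "max") -> str:
    """Return the name of the candidate with the highest adjusted score."""
    return max(
        candidates,
        key=lambda name: adjusted_score(
            candidates[name]["score"], candidates[name]["boosts"], combine
        ),
    )


# Hypothetical numbers: before adjustment (2) > (3) > (1), as in the example.
CANDIDATES = {
    "(1)": {"score": 0.50, "boosts": [1.10, 1.15]},
    "(2)": {"score": 0.80, "boosts": [1.00, 1.03]},
    "(3)": {"score": 0.75, "boosts": [1.10, 1.20]},
}

winner = select_answer(CANDIDATES)
```

With these made-up numbers, passage (3) overtakes passage (2) after adjustment under either combining rule, which mirrors the reordering the patent describes.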

More to Come

I will be reviewing the first patent in this series of patents about candidate answer scores because it does have some additional elements to it that haven’t been covered in this post, and the post about query dependent/independent ranking signals for answer scores. If you have been paying attention to how Google has been answering queries that appear to be seeking answers, you have likely seen those improving in many cases. Some answers have been really bad though. It will be nice to have as complete an idea as we can of how Google decides what might be a good answer to a query, based on information available to them on the Web.

Added October 14, 2020 – I have written about another Google patent on Answer Scores, and it’s worth reading about all of the patents on this topic. The new post is at Weighted Answer Terms for Scoring Answer Passages, and is about the patent Weighted answer terms for scoring answer passages.

It is about identifying questions in resources, and answers for those questions, and describes using term weights as a way to score answer passages (along with the scoring approaches identified in the other related patents, including this one.)

Added October 15, 2020 – I have written a few other posts about answer passages that are worth reading if you are interested in how Google finds questions on pages and answers to those, and scores answer passages to determine which ones to show as featured snippets. I’ve linked to some of those in the body of this post, but here is another one of those posts:

Added October 22, 2020 – I have written up a description of how structured and unstructured data is selected for answer passages based on specific criteria, from the patent on Scoring Answer Passages, in the post Selecting Candidate Answer Passages.


Copyright © 2020 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.

The post Adjusting Featured Snippet Answers by Context appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


Top strategies behind link building success

November 17, 2020 No Comments

30-second summary:

  • Link building starts from on-page SEO
  • Get your on-page SEO right by working on pillar and cluster pages
  • Blend your on-page success with off-page SEO strategies
  • Link building is not about the quantity 
  • Create a 12-month plan to maximize your off-page SEO success

SEO is more important than ever. As online usage is increasing and consumers’ heads are turning to online purchases, it’s vital to have a strong search presence for your business. Your ranking and authority will determine whether you’ll get more visits from prospective customers. Link building can help you build your ranking and authority by earning valuable links.

Quality is crucial, which is why you need to aim for relevant content that makes each earned link more valuable.

So how do you get started with off-page SEO?

NO-BS have earned more than 50,000 contextual links after working with 5,000 websites for link building, so they launched a helpful guide on how to build a strong off-page SEO strategy.

Let’s look at the takeaways.

Content created in partnership with NO-BS.

Start with your on-page strategy

You can’t work on your off-page strategy without building your on-page SEO strategy first.

Your site should be optimized for the best results so that you make the most of the links that you’ll earn.

Thus, you should prioritize working on quality content, target keywords, page speed, mobile optimization, and metadata among others.

User experience is very important. You can’t rank high on SERPs without providing the best experience to all users.

Here are a few ideas to get started with on-page SEO.

Create pillar and cluster pages

The pillar and cluster model was first introduced by HubSpot several years ago. It’s an extremely helpful way to organize your content based on the main themes and topics to make your SEO strategy more structured.

For example, if you’re working for a branding agency, a pillar page could be about ‘brand building’ and a cluster article could be ‘5 tips to be authentic when building your brand’.

Think of the pillar pages that define your business and start organizing your current content based on clusters that fit in the pillar pages.

Working on the pillar and cluster pages can help you both with on-page and off-page SEO as you are building your authority through relevant content.

Build themed link funnels

Once you work on your pillars, it’s time to build your themed link funnels.

Link funnels can help you build topical authority starting from your pillars and moving it up to your conversion pages. It’s the best way to link your on-page and off-page strategy to get the best results out of them.

Now it’s time to build different tiers of your links that you can also use in your future content.

First of all, start by building your pillar pages. Think of the main themes that define your site’s content. Your pillar pages should be the pages that you ultimately want to lead everyone to.

It’s the page that will bring everyone closer to any sort of conversion. You can include testimonials, social proof, details of your services, etc.

The next step is to work on your content clusters.

Your content clusters should be pages that are somehow relevant to your pillar page. They are not as broad as your pillar page but they still have a connection to it.

All your content clusters should link to your pillar page.

Your links should be in two to three tiers to help you organize your content and they should be somehow linked back to your cluster page.

This is a useful strategy to blend on-page and off-page optimization.

Once you have your themed links ready, start exploring new ways to diversify your content and your links.

Work on the anchors, experiment with different types of content and always make sure you create unique content.

Now it’s time to improve your off-page SEO strategy.

Create a 12-month off-page strategy

The best way to build your off-page strategy is to work on a 12-month plan. You have more chances to be successful once you document your strategy.

NO-BS have created a template with some examples of content and ideas on how to use the spreadsheet.

Here are a few of the ideas to consider.

Identify your top keywords

Start by identifying your most important keywords. You should sort them out in clusters as part of your on-page strategy.

Add each keyword to a pillar and explore the keyword difficulty for each one of them.

Come up with blog ideas for your pillar pages

Work on blog title ideas that could match your pillar pages. This way you are creating the URL for each page.

Add more topics to the relevant pillar to make it easier when creating new clusters.

Create a plan for your links

Set up a plan for the content you’ll create for the pillar pages. There’s no need to create all your posts at once! Expand your plan to the tier 1 and tier 2 links for each of your clusters.

Don’t make your strategy too complicated; you can focus on one cluster at a time and on less competitive keywords.

Pay attention to the anchor text

Be careful with your anchor text. Make sure that the text you choose is relevant to the link you use on each cluster article.

You don’t need to aim for exact-match keywords; it’s better for the link to be naturally placed in the text.

Track the links you’ve earned

Keep track of the number of links that you’ve earned. It makes it easier for you and your team to keep track of your progress.

There’s no need to aim for large volume links at once. It’s better to start gradually with 1-2 links per month.

To find out more about the tactics to boost your off-page strategy, download the guide ‘How to design a strong off-page SEO strategy’ here.

The post Top strategies behind link building success appeared first on Search Engine Watch.

Search Engine Watch


Adjusting Featured Snippet Answers by Context

November 15, 2020 No Comments

How Are Featured Snippet Answers Decided Upon?

I recently wrote about Featured Snippet Answer Scores Ranking Signals. In that post, I described how Google was likely using query dependent and query independent ranking signals to create answer scores for queries that were looking like they wanted answers.

One of the inventors of that patent from that post was Steven Baker. I looked at other patents that he had written, and noticed that one of those was about context as part of query independent ranking signals for answers.

Remembering that patent about question-answering and context, I felt it was worth reviewing that patent and writing about it.

This patent is about processing question queries that want textual answers and how those answers may be decided upon.

It is a complicated patent, and at one point the description behind it seems to get a bit murky, but I wrote about when that happened in the patent, and I think the other details provide a lot of insight into how Google is scoring featured snippet answers. There is an additional related patent that I will be following up with after this post, and I will link to it from here as well.

This patent starts by telling us that a search system can identify resources in response to queries submitted by users and provide information about the resources in a manner that is useful to the users.

How Context Scoring Adjustments for Featured Snippet Answers Work

Users of search systems are often searching for an answer to a specific question, rather than a listing of resources, like in this drawing from the patent, showing featured snippet answers:

featured snippet answers

For example, users may want to know what the weather is in a particular location, a current quote for a stock, the capital of a state, etc.

When queries that are in the form of a question are received, some search engines may perform specialized search operations in response to the question format of the query.

For example, some search engines may provide information responsive to such queries in the form of an “answer,” such as information provided in the form of a “one box” to a question, which is often a featured snippet answer.

Some question queries are better served by explanatory answers, which are also referred to as “long answers” or “answer passages.”

For example, for the question query [why is the sky blue], an answer explaining light as waves is helpful.

featured snippet answers - why is the sky blue

Such answer passages can be selected from resources that include text, such as paragraphs, that are relevant to the question and the answer.

Sections of the text are scored, and the section with the best score is selected as an answer.

In general, the patent tells us about one aspect of what it covers in the following process:

  • Receiving a query that is a question query seeking an answer response
  • Receiving candidate answer passages, each passage made of text selected from a text section subordinate to a heading on a resource, with a corresponding answer score
  • Determining a hierarchy of headings on a page, with two or more heading levels hierarchically arranged in parent-child relationships, where each heading level has one or more headings, a subheading of a respective heading is a child heading in a parent-child relationship and the respective heading is a parent heading in that relationship, and the heading hierarchy includes a root level corresponding to a root heading
  • For each candidate answer passage: determining a heading vector describing a path in the hierarchy of headings from the root heading to the respective heading to which the candidate answer passage is subordinate; determining a context score based, at least in part, on the heading vector; and adjusting the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score
  • Selecting an answer passage from the candidate answer passages based on the adjusted answer scores

Advantages of the process in the patent

  1. Long query answers can be selected, based partially on context signals indicating answers relevant to a question
  2. The context signals may be, in part, query-independent (i.e., scored independently of their relatedness to terms of the query)
  3. This part of the scoring process considers the context of the document (“resource”) in which the answer text is located, accounting for relevancy signals that may not otherwise be accounted for during query-dependent scoring
  4. Following this approach, long answers that are more likely to satisfy a searcher’s informational need are more likely to appear as answers

This patent can be found at:

Context scoring adjustments for answer passages
Inventors: Nitin Gupta, Srinivasan Venkatachary, Lingkun Chu, and Steven D. Baker
US Patent: 9,959,315
Granted: May 1, 2018
Appl. No.: 14/169,960
Filed: January 31, 2014

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for context scoring adjustments for candidate answer passages.

In one aspect, a method includes scoring candidate answer passages. For each candidate answer passage, the system determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading to which the candidate answer passage is subordinate; determines a context score based, at least in part, on the heading vector; and adjusts answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The system then selects an answer passage from the candidate answer passages based on the adjusted answer scores.

Using Context Scores to Adjust Answer Scores for Featured Snippets

A drawing from the patent shows different hierarchical headings that may be used to determine the context of answer passages that may be used to adjust answer scores for featured snippets:

Hierarchical headings for featured snippets

I discuss these headings and their hierarchy below. Note that the headings include the Page title as a heading (About the Moon), and the headings within heading elements on the page as well. And those headings give those answers context.

This context scoring process starts with receiving candidate answer passages and a score for each of the passages.

Those candidate answer passages and their respective scores are provided to a search engine that receives a query determined to be a question.

Each of those candidate answer passages is text selected from a text section under a particular heading from a specific resource (page) that has a certain answer score.

For each resource where a candidate answer passage has been selected, a context scoring process determines a heading hierarchy in the resource.

A heading is text or other data corresponding to a particular passage in the resource.

As an example, a heading can be text summarizing a section of text that immediately follows the heading (the heading describes what the text is about that follows it, or is contained within it.)

Headings may be indicated, for example, by specific formatting data, such as heading elements using HTML.

This next section from the patent reminded me of an observation that Cindy Krum of Mobile Moxie has about named anchors on a page, and how Google might index those to answer a question, to lead to an answer or a featured snippet. She wrote about those in What the Heck are Fraggles?

A heading could also be anchor text for an internal link (within the same page) that links to an anchor and corresponding text at some other position on the page.

A heading hierarchy could have two or more heading levels that are hierarchically arranged in parent-child relationships.

The first level, or the root heading, could be the title of the resource.

Each of the heading levels may have one or more headings, and a subheading of a respective heading is a child heading and the respective heading is a parent heading in the parent-child relationship.
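To show how a page title and HTML heading elements could be collected as levels of a hierarchy, here is a small sketch using Python's standard `html.parser` module. The `HeadingCollector` class, the `extract_headings` helper, and the choice to treat the title as level 0 are my own assumptions for illustration, not the patent's implementation.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for the title and h1-h6 elements."""

    # Hypothetical mapping: the page title acts as the root (level 0).
    LEVELS = {"title": 0, "h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self) -> None:
        super().__init__()
        self.headings: List[Tuple[int, str]] = []
        self._open_tag = None
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS and self._open_tag is None:
            self._open_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            # Collapse runs of whitespace inside the heading text.
            text = " ".join("".join(self._buffer).split())
            self.headings.append((self.LEVELS[tag], text))
            self._open_tag = None


def extract_headings(html: str) -> List[Tuple[int, str]]:
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


PAGE = (
    "<html><head><title>About The Moon</title></head><body>"
    "<h1>The Moon's Orbit</h1><p>Some text.</p>"
    "<h2>The distance from the Earth to the Moon</h2><p>More text.</p>"
    "</body></html>"
)
headings = extract_headings(PAGE)
```

The resulting list keeps document order, so the parent of any heading is the nearest earlier entry with a smaller level. Headings expressed as anchor text for internal links, which the patent also allows, are not handled by this sketch.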

For each candidate passage, a context scoring process may determine a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

To determine the context score, the context scoring process determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The context score could be based, at least in part, on the heading vector.

The context scoring process can then adjust the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The context scoring process can then select an answer passage from the candidate answer passages based on adjusted answer scores.

This flowchart from the patent shows the context scoring adjustment process:

context scoring adjustment flowchart

Identifying Question Queries And Answer Passages

I’ve written about understanding the context of answer passages. The patent tells us more about question queries and answer passages worth going over in more detail.

Some queries are in the form of a question or an implicit question.

For example, the query [distance of the earth from the moon] is in the form of an implicit question “What is the distance of the earth from the moon?”

An implicit question - the distance from the earth to the moon

Likewise, a question may be specific, as in the query [How far away is the moon].

The search engine includes a query question processor that uses processes that determine if a query is a query question (implicit or specific) and if it is, whether there are answers that are responsive to the question.

The query question processor can use several different algorithms to determine whether a query is a question and whether there are particular answers responsive to the question.

For example, to determine question queries and answers, it may use:

  • Language models
  • Machine learned processes
  • Knowledge graphs
  • Grammars
  • Combinations of those

The query question processor may choose candidate answer passages in addition to or instead of answer facts. For example, for the query [how far away is the moon], an answer fact is 238,900 miles. And the search engine may just show that factual information since that is the average distance of the Earth from the moon.

But, the query question processor may choose to identify passages that are very relevant to the question query.

These passages are called candidate answer passages.

The answer passages are scored, and one passage is selected based on these scores and provided in response to the query.

An answer passage may be scored, and that score may be adjusted based on a context, which is the point behind this patent.

Often Google will identify several candidate answer passages that could be used as featured snippet answers.

Google may look at the information on the pages where those answers come from to better understand the context of the answers such as the title of the page, and the headings about the content that the answer was found within.

Contextual Scoring Adjustments for Featured Snippet Answers

The query question processor sends to a context scoring processor some candidate answer passages, information about the resources from which each answer passage was selected, and a score for each of the candidate answer passages.

The scores of the candidate answer passages could be based on the following considerations:

  • Matching a query term to the text of the candidate answer passage
  • Matching answer terms to the text of the candidate answer passages
  • The quality of the underlying resource from which the candidate answer passage was selected

I recently wrote about featured snippet answer scores, and how a combination of query dependent and query independent scoring signals might be used to generate answer scores for answer passages.

The patent tells us that the query question processor may also take into account other factors when scoring candidate answer passages.

Candidate answer passages can be selected from the text of a particular section of the resource. And the query question processor could choose more than one candidate answer passage from a text section.

We are given the following examples of different answer passages from the same page:

(These example answer passages are referred to in a few places in the remainder of the post.)

  • (1) It takes about 27 days (27 days, 7 hours, 43 minutes, and 11.6 seconds) for the Moon to orbit the Earth at its orbital distance
  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Each of those answers could be good ones for Google to use. We are told that:

More than three candidate answers can be selected from the resource, and more than one resource can be processed for candidate answers.

How would Google choose between those three possible answers?

Google might decide based on the number of sentences and a selection of up to a maximum number of characters.

The patent tells us this about choosing between those answers:

Each candidate answer has a corresponding score. For this example, assume that candidate answer passage (2) has the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1). Thus, without the context scoring processor, candidate answer passage (2) would have been provided in the answer box of FIG. 2. However, the context scoring processor takes into account the context of the answer passages and adjusts the scores provided by the query question processor.

So, we see that what might be chosen based on featured snippet answer scores could be adjusted based on the context of that answer from the page that it appears on.

Contextually Scoring Featured Snippet Answers

This process begins with a query determined to be a question query seeking an answer response.

This process next receives candidate answer passages, each candidate answer passage chosen from the text of a resource.

Each candidate answer passage is text chosen from a text section that is subordinate to a respective heading (under a heading) in the resource, and each has a corresponding answer score.

For example, the query question processor provides the candidate answer passages, and their corresponding scores, to the context scoring processor.

A Heading Hierarchy to Determine Context

This process then determines a heading hierarchy from the resource.

The heading hierarchy would have two or more heading levels hierarchically arranged in parent-child relationships (such as a page title and an HTML heading element).

Each heading level has one or more headings.

A subheading of a respective heading is a child heading in the parent-child relationship (an h2 heading might be a subheading of a title, for example), and the respective heading is a parent heading in that relationship.

The heading hierarchy includes a root level corresponding to a root heading.

The context scoring processor can process heading tags in a DOM tree to determine a heading hierarchy.

hierarchical headings for featured snippets

For example, concerning the drawing about the distance to the moon just above, the heading hierarchy for the resource may be:

The ROOT Heading (title) is: About The Moon (310)

The main heading (h1) on the page:

H1: The Moon’s Orbit (330)

A secondary heading (h2) on the page:

H2: How long does it take for the Moon to orbit Earth? (334)

Another secondary heading (h2) on the page is:

H2: The distance from the Earth to the Moon (338)

Another main heading (h1) on the page:

H1: The Moon (360)

Another secondary Heading (h2) on the page:

H2: Age of the Moon (364)

Another secondary heading (h2) on the page:

H2: Life on the Moon (368)

Here is how the patent describes this heading hierarchy:

In this heading hierarchy, The title is the root heading at the root level; headings 330 and 360 are child headings of the heading, and are at a first level below the root level; headings 334 and 338 are child headings of the heading 330, and are at a second level that is one level below the first level, and two levels below the root level; and headings 364 and 368 are child headings of the heading 360 and are at a second level that is one level below the first level, and two levels below the root level.
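To make that hierarchy a little more concrete, here is a minimal Python sketch of how heading tags might be turned into parent-child relationships. This is only an illustration under my own assumptions, not the method from the patent: the sample HTML, the `HeadingCollector` class, and the `build_hierarchy` function are all hypothetical, the title is treated as level 0 (the root), and heading text is assumed to be unique on the page.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs; the <title> is treated as level 0 (root)."""

    LEVELS = {"title": 0, "h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.headings = []   # (level, text) in document order
        self._level = None   # level of the heading tag currently open
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            self._level = self.LEVELS[tag]
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and self.LEVELS.get(tag) == self._level:
            self.headings.append((self._level, "".join(self._buffer).strip()))
            self._level = None


def build_hierarchy(headings):
    """Map each heading to its parent heading (None for the root).

    Assumes heading text is unique on the page, which real pages may violate.
    """
    parents = {}
    stack = []  # chain of currently open ancestors as (level, text)
    for level, text in headings:
        # Close any headings at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        parents[text] = stack[-1][1] if stack else None
        stack.append((level, text))
    return parents


# Hypothetical markup modeled on the headings in the patent drawing.
page = """
<html><head><title>About The Moon</title></head><body>
<h1>The Moon's Orbit</h1>
<h2>How long does it take for the Moon to orbit Earth?</h2>
<h2>The distance from the Earth to the Moon</h2>
<h1>The Moon</h1>
<h2>Age of the Moon</h2>
<h2>Life on the Moon</h2>
</body></html>
"""

collector = HeadingCollector()
collector.feed(page)
parents = build_hierarchy(collector.headings)
for child, parent in parents.items():
    print(f"{child!r} -> parent: {parent!r}")
```

A stack of open ancestors is enough here because headings arrive in document order, so each new heading only needs to find the nearest earlier heading at a shallower level.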

The process from the patent determines a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

This score may be based on a heading vector.

The patent says that the process, for each of the candidate answer passages, determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The heading vector would include the text of the headings for the candidate answer passage.

For the example candidate answer passages (1)-(3) above, about the moon’s orbit and its distance from the Earth, the corresponding heading vectors V1, V2, and V3 are:

  • V1=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: How long does it take for the Moon to orbit the Earth?]>
  • V2=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>
  • V3=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>

We are also told that because candidate answer passages (2) and (3) are selected from the same text section 340, their respective heading vectors V2 and V3 are the same (they are both in the content under the same (H2) heading.)
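A heading vector is just the path from the root heading down to the heading a passage sits under. As a rough sketch (the `PARENT` and `SECTION_OF` tables and the `heading_vector` function are names I made up for illustration, and the patent does not say how the path is stored), it could be computed by walking up parent links and reversing the result:

```python
# Hypothetical parent links for the example page (child heading -> parent heading).
PARENT = {
    "About The Moon": None,
    "The Moon's Orbit": "About The Moon",
    "How long does it take for the Moon to orbit the Earth?": "The Moon's Orbit",
    "The distance from the Earth to the Moon": "The Moon's Orbit",
}

# Which heading each example candidate answer passage is subordinate to.
SECTION_OF = {
    1: "How long does it take for the Moon to orbit the Earth?",
    2: "The distance from the Earth to the Moon",
    3: "The distance from the Earth to the Moon",
}


def heading_vector(heading, parent=PARENT):
    """Return the path of headings from the root down to `heading`."""
    path = []
    while heading is not None:
        path.append(heading)
        heading = parent[heading]
    return list(reversed(path))


vectors = {n: heading_vector(h) for n, h in SECTION_OF.items()}
for n, vector in vectors.items():
    print(f"V{n} = {vector}")
```

Passages (2) and (3) end up with identical vectors here, which matches what the patent says about them coming from the same text section.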

The process of adjusting a score, for each answer passage, uses a context score based, at least in part, on the heading vector (410).

That context score can be a single score used to scale the candidate answer passage score or can be a series of discrete scores/boosts that can be used to adjust the score of the candidate answer passage.

Where Things Get Murky in This Patent

There do seem to be several related patents involving featured snippet answers, and this one which targets learning more about answers from their context based on where they fit in a heading hierarchy makes sense.

But, I’m confused by how the patent tells us that one answer based on the context would be adjusted over another one.

The first issue I have is that the answers they are comparing in the same contextual area have some overlap. Here are those two:

  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Note that the second answer and the third answer both include the same line: “Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles.” I find myself a little surprised that the second answer includes a couple of sentences that aren’t in the third answer, and skips a couple of lines from the third answer, and then includes the last sentence, which answers the question.

Since they both appear in the same heading and subheading section of the page they are from, it is difficult to imagine that there is a different adjustment based on context. But, the patent tells us differently:

The candidate answer passage with the highest adjusted answer score (based on context from the headings) is selected as the answer passage.

Recall that in the example above, the candidate answer passage (2) had the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1).

However, after adjustments, candidate answer passage (3) has the highest score, followed by candidate answer passage (2), and then candidate answer passage (1).

Accordingly, candidate answer passage (3) is selected and provided as the answer passage of FIG. 2.

Boosting Scores Based on Passage Coverage Ratio

A query question processor may limit the candidate answers to a maximum length.

The context scoring processor determines a coverage ratio which is a measure indicative of the coverage of the candidate answer passage from the text from which it was selected.

The patent also describes an alternative block of text that the candidate answer passage can be measured against:

Alternatively, the text block may include text sections subordinate to respective headings that include a first heading for which the text section from which the candidate answer passage was selected is subordinate, and sibling headings that have an immediate parent heading in common with the first heading. For example, for the candidate answer passage, the text block may include all the text in the portion 380 of the hierarchy; or may include only the text of the sections, of some other portion of text within the portion of the hierarchy. A similar block may be used for the portion of the hierarchy for candidate answer passages selected from that portion.

A small coverage ratio may indicate a candidate answer passage is incomplete. A high coverage ratio may indicate the candidate answer passage captures more of the content of the text passage from which it was selected. A candidate answer passage may receive a context adjustment, depending on this coverage ratio.

A passage coverage ratio is the ratio of the total number of characters in the candidate answer passage to the total number of characters in the passage from which the candidate answer passage was selected.

The passage coverage ratio could also be the ratio of the total number of sentences (or words) in the candidate answer passage to the total number of sentences (or words) in the passage from which the candidate answer passage was selected.

We are told that other ratios can also be used.

Of the three example candidate answer passages (1)-(3) above, passage (1) has the highest ratio, passage (2) has the second-highest, and passage (3) has the lowest.

This process determines whether the coverage ratio is less than a threshold value. That threshold value can be, for example, 0.3, 0.35 or 0.4, or some other fraction. In our “distance to the moon” example, each passage coverage ratio meets or exceeds the threshold value.

If the coverage ratio is less than a threshold value, then the process would select a first answer boost factor. The first answer boost factor might be proportional to the coverage ratio according to a first relation, or may be a fixed value, or may be a non-boosting value (e.g., 1.0).

But if the coverage ratio is not less than the threshold value, the process may select a second answer boost factor. The second answer boost factor may be proportional to the coverage ratio according to a second relation, or may be a fixed value, or may be a value greater than the non-boosting value (e.g., 1.1).
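The coverage ratio and its threshold check can be sketched in a few lines of Python. This is my own illustration using the character-based ratio and the fixed example values from the patent (a 0.3 threshold, 1.0 as the non-boosting value, 1.1 as the boost). The function names and the sample source text are assumptions, not anything taken from the patent.

```python
def coverage_ratio(candidate: str, source_text: str) -> float:
    """Characters in the candidate passage divided by characters in its source text."""
    if not source_text:
        return 0.0
    return len(candidate) / len(source_text)


def answer_boost(ratio: float, threshold: float = 0.3,
                 low_boost: float = 1.0, high_boost: float = 1.1) -> float:
    """Pick the first boost factor below the threshold, the second one otherwise."""
    return low_boost if ratio < threshold else high_boost


# Hypothetical source text; the candidate is its final sentence.
source_text = (
    "The moon's distance from Earth varies because the moon travels in a "
    "slightly elliptical orbit. Thus, the moon's distance from the Earth "
    "varies from 225,700 miles to 252,000 miles."
)
candidate = (
    "Thus, the moon's distance from the Earth varies from 225,700 miles "
    "to 252,000 miles."
)

ratio = coverage_ratio(candidate, source_text)
print(f"coverage ratio: {ratio:.2f}, boost: {answer_boost(ratio)}")
```

Swapping `len()` for a word or sentence count would give the other ratios the patent mentions, and the fixed boosts could be replaced with values proportional to the ratio.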

Scoring Based on Other Features

The context scoring process can also check for the presence of features in addition to those described above.

Three example features for contextually scoring an answer passage are distinctive text, a preceding question, and a list format.

Distinctive text

Distinctive text is text that stands out because it is formatted differently from other text, such as bolded text.

A Preceding Question

A preceding question is a question in the text that precedes the candidate answer passage.

The search engine may process various amounts of text to detect the question.

In some cases, only the passage from which the candidate answer passage is extracted is checked.

In others, a text window that can include header text and other text from other sections may be checked.

A boost score that is inversely proportional to the text distance from a question to the candidate answer passage is calculated, and the check is terminated at the occurrence of a first question.

That text distance may be measured in characters, words, or sentences, or by some other metric.

If the question is anchor text for a section of text and there is intervening text, such as in the case of a navigation list, then the question is determined to only precede the text passage to which it links, not precede intervening text.

In the drawing above about the moon, there are two questions in the resource: “How long does it take for the Moon to orbit Earth?” and “Why is the distance changing?”

The first question–“How long does it take for the Moon to orbit Earth?”– precedes the first candidate answer passage by a text distance of zero sentences, and it precedes the second candidate answer passage by a text distance of five sentences.

And the second question–“Why is the distance changing?”– precedes the third candidate answer by zero sentences.

If a preceding question is detected, then the process selects a question boost factor.

This boost factor may depend on the text distance (a shorter distance earns a larger boost, as noted above), on whether the question is in a text passage subordinate to a header or is itself a header, and, if the question is a header, on whether the candidate answer passage is subordinate to that header.

Considering these factors, the third candidate answer passage receives the highest boost factor, the first candidate answer receives the second-highest boost factor, and the second candidate answer receives the smallest boost factor.
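Here is a small Python sketch of the distance part of that idea: look backward from a passage for the nearest question, measure the distance in sentences, and give a boost that shrinks as the distance grows. The sentence splitter, the `1 + max_boost / (1 + distance)` formula, and the sample text are all assumptions of mine; the patent only says the boost is inversely proportional to text distance and that the check stops at the first question found.

```python
import re


def split_sentences(text):
    """Naive sentence split on ., ? or ! followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def question_distance(sentences, passage_start):
    """Sentences between the nearest preceding question and the passage start.

    Returns None when no question precedes the passage. The search stops at
    the first question it finds while walking backward.
    """
    for i in range(passage_start - 1, -1, -1):
        if sentences[i].endswith("?"):
            return passage_start - 1 - i
    return None


def question_boost(distance, max_boost=0.5):
    """Boost that decreases as the distance grows; 1.0 means no boost."""
    if distance is None:
        return 1.0
    return 1.0 + max_boost / (1 + distance)


# Hypothetical section text, not the text of the page in the patent drawing.
text = (
    "How long does the orbit take? It takes about 27 days. "
    "The orbit is elliptical. The speed varies. "
    "Why is the distance changing? The distance varies with the orbit."
)
sentences = split_sentences(text)

for start in (1, 3, 5):
    d = question_distance(sentences, start)
    print(f"passage at sentence {start}: distance={d}, boost={question_boost(d):.2f}")
```

The header-related factors from the patent (whether the question is a header, and whether the passage is subordinate to it) are left out of this sketch.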

If a preceding question is not detected, or after the question boost factor is selected, the process then checks for the presence of a list.

The Presence of a List

A list indicates a series of steps, usually instructive or informative. The detection of a list may be subject to the query question being a step modal query.

A step modal query is a query where a list-based answer is likely to be a good answer. Examples of step modal queries are queries like:

  • [How to . . . ]
  • [How do I . . . ]
  • [How to install a door knob]
  • [How do I change a tire]
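A very rough way to spot queries like these is a pattern match on how the query starts. The Python sketch below is my own assumption about what such a check could look like; the patent does not describe how step modal queries are recognized, and the pattern here only covers a few English phrasings.

```python
import re

# Hypothetical pattern: queries that open with "how to", "how do I", and similar.
STEP_MODAL = re.compile(r"^\s*how\s+(to|do\s+i|do\s+you|can\s+i)\b", re.IGNORECASE)


def is_step_modal(query: str) -> bool:
    """Return True when the query looks like it wants a step-by-step answer."""
    return bool(STEP_MODAL.search(query))


for q in ["How to install a door knob", "How do I change a tire", "how far away is the moon"]:
    print(f"{q!r}: {is_step_modal(q)}")
```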

The context scoring process may detect lists formed with:

  • HTML tags
  • Microformats
  • Semantic meaning
  • Consecutive headings at the same level with the same or similar phrases (e.g., Step 1, Step 2; or First; Second; Third; etc.)

The context scoring process may also score a list for quality.

It would compare things such as:

  • A list in the center of a page, which does not include multiple links to other pages, and in which HREF link text does not occupy a large portion of the text of the list, will be of higher quality
  • A list at the side of a page, which does include multiple links to other pages (indicative of reference lists), and/or has HREF link text that does occupy a large portion of the text of the list, will be of lower quality

If a list is detected, then the process selects a list boost factor.

That list boost factor may be fixed or may be proportional to the quality score of the list.
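One of those quality signals, how much of a list's text is link text, is easy to sketch in Python. In this illustration (the class, function names, scoring formula, and sample markup are my own assumptions, not the patent's), a list whose text is mostly links scores low, and the list boost grows with the quality score. Page position (center versus side) cannot be read from markup alone, so it is left out.

```python
from html.parser import HTMLParser


class ListLinkStats(HTMLParser):
    """Counts text characters inside <ol>/<ul> and how many of them are link text."""

    def __init__(self):
        super().__init__()
        self.list_depth = 0
        self.link_depth = 0
        self.total_chars = 0
        self.link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            self.list_depth += 1
        elif tag == "a" and self.list_depth:
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag in ("ol", "ul") and self.list_depth:
            self.list_depth -= 1
        elif tag == "a" and self.link_depth:
            self.link_depth -= 1

    def handle_data(self, data):
        if self.list_depth:
            n = len(data.strip())
            self.total_chars += n
            if self.link_depth:
                self.link_chars += n


def list_quality(html: str) -> float:
    """Share of list text that is NOT link text; 0.0 when no list text is found."""
    stats = ListLinkStats()
    stats.feed(html)
    stats.close()
    if stats.total_chars == 0:
        return 0.0
    return 1.0 - stats.link_chars / stats.total_chars


def list_boost(quality: float, base: float = 1.0, scale: float = 0.2) -> float:
    """Boost proportional to the quality score (made-up scale)."""
    return base + scale * quality


steps = "<ol><li>Remove the old knob</li><li>Fit the new latch</li></ol>"
nav = '<ul><li><a href="/a">Home</a></li><li><a href="/b">About</a></li></ul>'

print(list_quality(steps), list_boost(list_quality(steps)))
print(list_quality(nav), list_boost(list_quality(nav)))
```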

If a list is not detected, or after the list boost factor is selected, the process ends.

In some implementations, the list boost factor may also be dependent on other feature scores.

If other features, such as coverage ratio, distinctive text, etc., have relatively high scores, then the list boost factor may be increased.

The patent tells us that this is because “the combination of these scores in the presence of a list is a strong signal of a high-quality answer passage.”

Adjustment of Featured Snippet Answers Scores

Answer scores for candidate answer passages are adjusted by scoring components based on heading vectors, passage coverage ratio, and other features described above.

The scoring process can select the largest boost value from those determined above or can select a combination of the boost values.

Once the answer scores are adjusted, the candidate answer passage with the highest adjusted answer score is selected as the featured snippet answer and is displayed to a searcher.
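Putting the last step together, here is a short Python sketch of adjusting scores with either the largest boost or a combination of boosts, then picking the winner. Every number below is invented for illustration (the patent gives no actual scores), and multiplying the boosts is only one possible "combination"; I chose the values so the base order is (2), (3), (1), as in the patent's example.

```python
from math import prod


def adjust(score, boosts, mode="max"):
    """Scale a base answer score by the largest boost or by the product of all boosts."""
    if not boosts:
        return score
    factor = max(boosts) if mode == "max" else prod(boosts)
    return score * factor


# Hypothetical base scores and boost factors (heading, coverage, question/list).
candidates = {
    "(1)": (0.60, [1.0, 1.1, 1.2]),
    "(2)": (0.80, [1.0, 1.1, 1.0]),
    "(3)": (0.75, [1.2, 1.1, 1.3]),
}

before = max(candidates, key=lambda name: candidates[name][0])
adjusted = {name: adjust(score, boosts, mode="product")
            for name, (score, boosts) in candidates.items()}
best = max(adjusted, key=adjusted.get)

print(f"best before adjustment: {before}")
print(f"best after adjustment:  {best}")
```

With these made-up numbers, passage (2) leads before adjustment and passage (3) leads afterward, mirroring the reordering the patent describes.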

More to Come

I will be reviewing the first patent in this series of patents about candidate answer scores because it does have some additional elements to it that haven’t been covered in this post, and the post about query dependent/independent ranking signals for answer scores. If you have been paying attention to how Google has been answering queries that appear to be seeking answers, you have likely seen those improving in many cases. Some answers have been really bad though. It will be nice to have as complete an idea as we can of how Google decides what might be a good answer to a query, based on information available to them on the Web.

Added October 14, 2020 – I have written about another Google patent on Answer Scores, and it’s worth reading about all of the patents on this topic. The new post is at Weighted Answer Terms for Scoring Answer Passages, and is about the patent Weighted answer terms for scoring answer passages.

It is about identifying questions in resources, and answers for those questions, and describes using term weights as a way to score answer passages (along with the scoring approaches identified in the other related patents, including this one.)

Added October 15, 2020 – I have written a few other posts about answer passages that are worth reading if you are interested in how Google finds questions on pages and answers to those, and scores answer passages to determine which ones to show as featured snippets. I’ve linked to some of those in the body of this post, but here is another one of those posts:

Added October 22, 2020 – I have written up a description of how structured and unstructured data is selected for answer passages based on specific criteria in the patent on Scoring Answer Passages, in the post Selecting Candidate Answer Passages.


Copyright © 2020 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.

The post Adjusting Featured Snippet Answers by Context appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


Adjusting Featured Snippet Answers by Context

November 13, 2020 No Comments

How Are Featured Snippet Answers Decided Upon?

I recently wrote about Featured Snippet Answer Scores Ranking Signals. In that post, I described how Google was likely using query dependent and query independent ranking signals to create answer scores for queries that were looking like they wanted answers.

One of the inventors of that patent from that post was Steven Baker. I looked at other patents that he had written, and noticed that one of those was about context as part of query independent ranking signals for answers.

Remembering that patent about question-answering and context, I felt it was worth reviewing that patent and writing about it.

This patent is about processing question queries that want textual answers and how those answers may be decided upon.

It is a complicated patent, and at one point the description behind it seems to get a bit murky, but I wrote about when that happened in the patent, and I think the other details provide a lot of insight into how Google is scoring featured snippet answers. There is an additional related patent that I will be following up with after this post, and I will link to it from here as well.

This patent starts by telling us that a search system can identify resources in response to queries submitted by users and provide information about the resources in a manner that is useful to the users.

How Context Scoring Adjustments for Featured Snippet Answers Works

Users of search systems are often searching for an answer to a specific question, rather than a listing of resources, like in this drawing from the patent, showing featured snippet answers:

featured snippet answers

For example, users may want to know what the weather is in a particular location, a current quote for a stock, the capital of a state, etc.

When queries that are in the form of a question are received, some search engines may perform specialized search operations in response to the question format of the query.

For example, some search engines may provide information responsive to such queries in the form of an “answer,” such as information provided in the form of a “one box” to a question, which is often a featured snippet answer.

Some question queries are better served by explanatory answers, which are also referred to as “long answers” or “answer passages.”

For example, for the question query [why is the sky blue], an answer explaining light as waves is helpful.

featured snippet answers - why is the sky blue

Such answer passages can be selected from resources that include text, such as paragraphs, that are relevant to the question and the answer.

Sections of the text are scored, and the section with the best score is selected as an answer.

In general, the patent tells us about one aspect of what it covers in the following process:

  • Receiving a query that is a question query seeking an answer response
  • Receiving candidate answer passages, each passage made of text selected from a text section subordinate to a heading on a resource, with a corresponding answer score
  • Determining a hierarchy of headings on a page, with two or more heading levels hierarchically arranged in parent-child relationships, where each heading level has one or more headings, a subheading of a respective heading is a child heading in a parent-child relationship and the respective heading is a parent heading in that relationship, and the heading hierarchy includes a root level corresponding to a root heading (for each candidate answer passage)
  • Determining a heading vector describing a path in the hierarchy of headings from the root heading to the respective heading to which the candidate answer passage is subordinate, determining a context score based, at least in part, on the heading vector, adjusting the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score
  • Selecting an answer passage from the candidate answer passages based on the adjusted answer scores

Advantages of the process in the patent

  1. Long query answers can be selected, based partially on context signals indicating answers relevant to a question
  2. The context signals may be, in part, query-independent (i.e., scored independently of their relatedness to terms of the query)
  3. This part of the scoring process considers the context of the document (“resource”) in which the answer text is located, accounting for relevancy signals that may not otherwise be accounted for during query-dependent scoring
  4. Following this approach, long answers that are more likely to satisfy a searcher’s informational need are more likely to appear as answers

This patent can be found at:

Context scoring adjustments for answer passages
Inventors: Nitin Gupta, Srinivasan Venkatachary, Lingkun Chu, and Steven D. Baker
US Patent: 9,959,315
Granted: May 1, 2018
Appl. No.: 14/169,960
Filed: January 31, 2014

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for context scoring adjustments for candidate answer passages.

In one aspect, a method includes scoring candidate answer passages. For each candidate answer passage, the system determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading to which the candidate answer passage is subordinate; determines a context score based, at least in part, on the heading vector; and adjusts answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The system then selects an answer passage from the candidate answer passages based on the adjusted answer scores.

Using Context Scores to Adjust Answer Scores for Featured Snippets

A drawing from the patent shows different hierarchical headings that may be used to determine the context of answer passages that may be used to adjust answer scores for featured snippets:

Hierarchical headings for featured snippets

I discuss these headings and their hierarchy below. Note that the headings include the Page title as a heading (About the Moon), and the headings within heading elements on the page as well. And those headings give those answers context.

This context scoring process starts with receiving candidate answer passages and a score for each of the passages.

Those candidate answer passages and their respective scores are provided to a search engine that receives a query determined to be a question.

Each of those candidate answer passages is text selected from a text section under a particular heading from a specific resource (page) that has a certain answer score.

For each resource where a candidate answer passage has been selected, a context scoring process determines a heading hierarchy in the resource.

A heading is text or other data corresponding to a particular passage in the resource.

As an example, a heading can be text summarizing a section of text that immediately follows the heading (the heading describes what the text is about that follows it, or is contained within it.)

Headings may be indicated, for example, by specific formatting data, such as heading elements using HTML.

This next section from the patent reminded me of an observation that Cindy Krum of Mobile Moxie has about named anchors on a page, and how Google might index those to answer a question, to lead to an answer or a featured snippet. She wrote about those in What the Heck are Fraggles?

A heading could also be anchor text for an internal link (within the same page) that links to an anchor and corresponding text at some other position on the page.

A heading hierarchy could have two or more heading levels that are hierarchically arranged in parent-child relationships.

The first level, or the root heading, could be the title of the resource.

Each of the heading levels may have one or more headings, and a subheading of a respective heading is a child heading and the respective heading is a parent heading in the parent-child relationship.

For each candidate passage, a context scoring process may determine a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

To determine the context score, the context scoring process determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The context score could be based, at least in part, on the heading vector.

The context scoring process can then adjust the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The context scoring process can then select an answer passage from the candidate answer passages based on adjusted answer scores.

This flowchart from the patent shows the context scoring adjustment process:

context scoring adjustment flowchart

Identifying Question Queries And Answer Passages

I’ve written about understanding the context of answer passages. The patent tells us more about question queries and answer passages worth going over in more detail.

Some queries are in the form of a question or an implicit question.

For example, the query [distance of the earth from the moon] is in the form of an implicit question “What is the distance of the earth from the moon?”

An implicit question - the distance from the earth to the moon

Likewise, a question may be specific, as in the query [How far away is the moon].

The search engine includes a query question processor that uses processes that determine if a query is a query question (implicit or specific) and if it is, whether there are answers that are responsive to the question.

The query question processor can use several different algorithms to determine whether a query is a question and whether there are particular answers responsive to the question.

For example, to determine question queries and answers, it may use:

  • Language models
  • Machine learned processes
  • Knowledge graphs
  • Grammars
  • Combinations of those

The query question processor may choose candidate answer passages in addition to or instead of answer facts. For example, for the query [how far away is the moon], an answer fact is 238,900 miles. And the search engine may just show that factual information since that is the average distance of the Earth from the moon.

But the query question processor may instead choose to identify passages that are very relevant to the question query.

These passages are called candidate answer passages.

The answer passages are scored, and one passage is selected based on these scores and provided in response to the query.

An answer passage may be scored, and that score may be adjusted based on a context, which is the point behind this patent.

Often Google will identify several candidate answer passages that could be used as featured snippet answers.

Google may look at information on the pages those answers come from, such as the title of the page and the headings above the content where the answer was found, to better understand the context of the answers.

Contextual Scoring Adjustments for Featured Snippet Answers

The query question processor sends the context scoring processor some candidate answer passages, information about the resource each answer passage was taken from, and a score for each candidate answer passage.

The scores of the candidate answer passages could be based on the following considerations:

  • Matching a query term to the text of the candidate answer passage
  • Matching answer terms to the text of the candidate answer passages
  • The quality of the underlying resource from which the candidate answer passage was selected

I recently wrote about featured snippet answer scores, and how a combination of query dependent and query independent scoring signals might be used to generate answer scores for answer passages.

The patent tells us that the query question processor may also take into account other factors when scoring candidate answer passages.

Candidate answer passages can be selected from the text of a particular section of the resource. And the query question processor could choose more than one candidate answer passage from a text section.

We are given the following examples of different answer passages from the same page:

(These example answer passages are referred to in a few places in the remainder of the post.)

  • (1) It takes about 27 days (27 days, 7 hours, 43 minutes, and 11.6 seconds) for the Moon to orbit the Earth at its orbital distance
  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Each of those answers could be good ones for Google to use. We are told that:

More than three candidate answers can be selected from the resource, and more than one resource can be processed for candidate answers.

How would Google choose between those three possible answers?

Google might decide based on the number of sentences and a selection of up to a maximum number of characters.

The patent tells us this about choosing between those answers:

Each candidate answer has a corresponding score. For this example, assume that candidate answer passage (2) has the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1). Thus, without the context scoring processor, candidate answer passage (2) would have been provided in the answer box of FIG. 2. However, the context scoring processor takes into account the context of the answer passages and adjusts the scores provided by the query question processor.

So, we see that what might be chosen based on featured snippet answer scores could be adjusted based on the context of that answer from the page that it appears on.

Contextually Scoring Featured Snippet Answers

This process begins with a query determined to be a question query seeking an answer response.

This process next receives candidate answer passages, each candidate answer passage chosen from the text of a resource.

Each candidate answer passage is text chosen from a text section that is subordinate to a respective heading (under a heading) in the resource, and each has a corresponding answer score.

For example, the query question processor provides the candidate answer passages, and their corresponding scores, to the context scoring processor.

A Heading Hierarchy to Determine Context

This process then determines a heading hierarchy from the resource.

The heading hierarchy would have two or more heading levels hierarchically arranged in parent-child relationships (such as a page title and an HTML heading element).

Each heading level has one or more headings.

A subheading of a respective heading is a child heading in the parent-child relationship (an h2 heading might be a subheading of a title, for example), and the respective heading is the parent heading in that relationship.

The heading hierarchy includes a root level corresponding to a root heading.

The context scoring processor can process heading tags in a DOM tree to determine a heading hierarchy.

hierarchical headings for featured snippets

For example, concerning the drawing about the distance to the moon just above, the heading hierarchy for the resource may be:

The ROOT Heading (title) is: About The Moon (310)

The main heading (h1) on the page:

H1: The Moon’s Orbit (330)

A secondary heading (h2) on the page:

H2: How long does it take for the Moon to orbit Earth? (334)

Another secondary heading (h2) on the page is:

H2: The distance from the Earth to the Moon (338)

Another main heading (h1) on the page:

H1: The Moon (360)

Another secondary heading (h2) on the page:

H2: Age of the Moon (364)

Another secondary heading (h2) on the page:

H2: Life on the Moon (368)

Here is how the patent describes this heading hierarchy:

In this heading hierarchy, The title is the root heading at the root level; headings 330 and 360 are child headings of the heading, and are at a first level below the root level; headings 334 and 338 are child headings of the heading 330, and are at a second level that is one level below the first level, and two levels below the root level; and headings 364 and 368 are child headings of the heading 360 and are at a second level that is one level below the first level, and two levels below the root level.

The process from the patent determines a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

This score may be based on a heading vector.

The patent says that the process, for each of the candidate answer passages, determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The heading vector would include the text of the headings for the candidate answer passage.

For the example candidate answer passages (1)-(3) above, about how long it takes the moon to orbit the Earth and how far away it is, the respectively corresponding heading vectors V1, V2 and V3 are:

  • V1=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: How long does it take for the Moon to orbit the Earth?]>
  • V2=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>
  • V3=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>

We are also told that because candidate answer passages (2) and (3) are selected from the same text section 340, their respective heading vectors V2 and V3 are the same (they are both in the content under the same (H2) heading.)
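The patent does not include code, but the heading hierarchy and heading vectors described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Heading` class, `build_hierarchy`, and `heading_vector` are hypothetical names, and the input is assumed to be a list of (level, text) pairs already pulled from the page in document order.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Heading:
    level: int                                   # 0 = root (page title), 1 = h1, 2 = h2
    text: str
    parent: Optional["Heading"] = field(default=None, repr=False)
    children: List["Heading"] = field(default_factory=list, repr=False)


def build_hierarchy(title: str, headings: List[Tuple[int, str]]) -> List[Heading]:
    """Link headings into parent-child relationships (hypothetical helper).

    `headings` holds (level, text) pairs in document order.
    Returns every node, with the root heading first.
    """
    root = Heading(0, title)
    nodes = [root]
    stack = [root]                               # currently open headings, deepest last
    for level, text in headings:
        # Close any open heading at the same or a deeper level.
        while stack[-1].level >= level:
            stack.pop()
        node = Heading(level, text, parent=stack[-1])
        stack[-1].children.append(node)
        stack.append(node)
        nodes.append(node)
    return nodes


def heading_vector(node: Heading) -> List[str]:
    """Path of heading text from the root heading down to `node`."""
    path = []
    current: Optional[Heading] = node
    while current is not None:
        path.append(current.text)
        current = current.parent
    return path[::-1]


# The example page from the patent drawing.
nodes = build_hierarchy(
    "About The Moon",
    [
        (1, "The Moon's Orbit"),
        (2, "How long does it take for the Moon to orbit the Earth?"),
        (2, "The distance from the Earth to the Moon"),
        (1, "The Moon"),
        (2, "Age of the Moon"),
        (2, "Life on the Moon"),
    ],
)
v1 = heading_vector(nodes[2])   # heading above candidate passage (1)
v2 = heading_vector(nodes[3])   # heading above candidate passages (2) and (3)
```

Because passages (2) and (3) sit under the same heading, they share the single vector `v2` here, which matches what the patent says about V2 and V3.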

The process of adjusting a score, for each answer passage, uses a context score based, at least in part, on the heading vector (410).

That context score can be a single score used to scale the candidate answer passage score or can be a series of discrete scores/boosts that can be used to adjust the score of the candidate answer passage.

Where Things Get Murky in This Patent

There seem to be several related patents involving featured snippet answers, and this one, which targets learning more about answers from their context based on where they fit in a heading hierarchy, makes sense.

But, I’m confused by how the patent tells us that one answer based on the context would be adjusted over another one.

The first issue I have is that the answers they are comparing in the same contextual area have some overlap. Here are those two:

  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Note that the second answer and the third answer both include the same line: “Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles.” I find myself a little surprised that the second answer includes a couple of sentences that aren’t in the third answer, and skips a couple of lines from the third answer, and then includes the last sentence, which answers the question.

Since they both appear in the same heading and subheading section of the page they are from, it is difficult to imagine that there is a different adjustment based on context. But, the patent tells us differently:

The candidate answer passage with the highest adjusted answer score (based on context from the headings) is selected as the answer passage.

Recall that in the example above, the candidate answer passage (2) had the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1).

However, after adjustments, candidate answer passage (3) has the highest score, followed by candidate answer passage (2), and then candidate answer passage (1).

Accordingly, candidate answer passage (3) is selected and provided as the answer passage of FIG. 2.

Boosting Scores Based on Passage Coverage Ratio

A query question processor may limit the candidate answers to a maximum length.

The context scoring processor determines a coverage ratio, which is a measure indicative of the coverage of the candidate answer passage from the text from which it was selected.

The patent also describes alternative ways of defining the text block that the candidate answer passage is measured against:

Alternatively, the text block may include text sections subordinate to respective headings that include a first heading for which the text section from which the candidate answer passage was selected is subordinate, and sibling headings that have an immediate parent heading in common with the first heading. For example, for the candidate answer passage, the text block may include all the text in the portion 380 of the hierarchy; or may include only the text of the sections, of some other portion of text within the portion of the hierarchy. A similar block may be used for the portion of the hierarchy for candidate answer passages selected from that portion.

A small coverage ratio may indicate a candidate answer passage is incomplete. A high coverage ratio may indicate the candidate answer passage captures more of the content of the text passage from which it was selected. A candidate answer passage may receive a context adjustment, depending on this coverage ratio.

A passage coverage ratio is the ratio of the total number of characters in the candidate answer passage to the total number of characters in the passage from which the candidate answer passage was selected.

The passage coverage ratio could also be the ratio of the total number of sentences (or words) in the candidate answer passage to the total number of sentences (or words) in the passage from which the candidate answer passage was selected.

We are told that other ratios can also be used.

Of the three example candidate answer passages (1)-(3) above, passage (1) has the highest ratio, passage (2) has the second-highest, and passage (3) has the lowest.

This process determines whether the coverage ratio is less than a threshold value. That threshold value can be, for example, 0.3, 0.35 or 0.4, or some other fraction. In our “distance to the moon” example, each passage coverage ratio meets or exceeds the threshold value.

If the coverage ratio is less than a threshold value, then the process would select a first answer boost factor. The first answer boost factor might be proportional to the coverage ratio according to a first relation, or may be a fixed value, or may be a non-boosting value (e.g., 1.0).

But if the coverage ratio is not less than the threshold value, the process may select a second answer boost factor. The second answer boost factor may be proportional to the coverage ratio according to a second relation, or may be a fixed value, or may be a value greater than the non-boosting value (e.g., 1.1).
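The coverage ratio check can be sketched as follows. This is a minimal illustration, not Google's implementation: the function names are hypothetical, the character-based ratio is just one of the options the patent mentions, and the threshold (0.3) and boost values (1.0 and 1.1) are the example figures quoted above, used here as fixed values.

```python
def coverage_ratio(candidate: str, source_text: str) -> float:
    """Characters in the candidate passage divided by characters in the
    text it was selected from (one of the ratios the patent mentions)."""
    if not source_text:
        return 0.0
    return len(candidate) / len(source_text)


def coverage_boost(ratio: float,
                   threshold: float = 0.3,
                   first_boost: float = 1.0,
                   second_boost: float = 1.1) -> float:
    """Pick the first boost factor below the threshold, otherwise the second.

    Both boosts are fixed values in this sketch; the patent also allows them
    to be proportional to the ratio.
    """
    if ratio < threshold:
        return first_boost      # non-boosting value for low coverage
    return second_boost         # small boost when coverage meets the threshold


# A passage covering half of its source section gets the second boost factor.
ratio = coverage_ratio("abcde", "abcdefghij")
boost = coverage_boost(ratio)
```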

Scoring Based on Other Features

The context scoring process can also check for the presence of features in addition to those described above.

Three example features for contextually scoring an answer passage are distinctive text, a preceding question, and a list format.

Distinctive text

Distinctive text is text that may stand out because it is formatted differently from other text, such as with bolding.

A Preceding Question

A preceding question is a question in the text that precedes the candidate answer passage.

The search engine may process various amounts of text to detect the question.

In some cases, only the passage from which the candidate answer passage is extracted is checked.

In other cases, a text window that can include header text and other text from other sections may be checked.

A boost score that is inversely proportional to the text distance from a question to the candidate answer passage is calculated, and the check is terminated at the occurrence of a first question.

That text distance may be measured in characters, words, or sentences, or by some other metric.
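Here is a rough sketch of how a preceding-question boost might work when distance is measured in sentences. Everything in it is an assumption made for illustration: the naive sentence splitter, the rule that a sentence ending in "?" counts as a question, the function names, and the formula `1 + (max_boost - 1) / (1 + distance)`, which I chose only because it shrinks as the distance grows and stays defined at a distance of zero.

```python
import re
from typing import List, Optional


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def distance_to_preceding_question(sentences: List[str],
                                   passage_start: int) -> Optional[int]:
    """Number of sentences between the nearest preceding question and the
    sentence at index `passage_start`. The scan stops at the first question
    found, and returns None when there is no preceding question."""
    for i in range(passage_start - 1, -1, -1):
        if sentences[i].endswith("?"):
            return passage_start - 1 - i
    return None


def question_boost(distance: Optional[int], max_boost: float = 1.5) -> float:
    """Boost that decreases as the question gets further from the passage."""
    if distance is None:
        return 1.0                                # no question, no boost
    return 1.0 + (max_boost - 1.0) / (1 + distance)


sentences = split_sentences(
    "Why is the distance changing? "
    "The moon travels in a slightly elliptical orbit. "
    "Thus, the distance varies."
)
adjacent = distance_to_preceding_question(sentences, 1)   # question is right before
one_away = distance_to_preceding_question(sentences, 2)   # one sentence in between
```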

If the question is anchor text for a section of text and there is intervening text, such as in the case of a navigation list, then the question is determined to only precede the text passage to which it links, not precede intervening text.

In the drawing above about the moon, there are two questions in the resource: “How long does it take for the Moon to orbit Earth?” and “Why is the distance changing?”

The first question–“How long does it take for the Moon to orbit Earth?”– precedes the first candidate answer passage by a text distance of zero sentences, and it precedes the second candidate answer passage by a text distance of five sentences.

And the second question–“Why is the distance changing?”– precedes the third candidate answer by zero sentences.

If a preceding question is detected, then the process selects a question boost factor.

This boost factor may be proportional to the text distance, whether the text is in a text passage subordinate to a header or whether the question is a header, and, if the question is in a header, whether the candidate answer passage is subordinate to the header.

Considering these factors, the third candidate answer passage receives the highest boost factor, the first candidate answer receives the second-highest boost factor, and the second candidate answer receives the smallest boost factor.

If a preceding question is not detected, or after the question boost factor is selected, the process then checks for the presence of a list.

The Presence of a List

A list is an indication of several steps, usually instructive or informative. The detection of a list may be subject to the query question being a step modal query.

A step modal query is a query where a list-based answer is likely to be a good answer. Examples of step modal queries are queries like:

  • [How to . . . ]
  • [How do I . . . ]
  • [How to install a door knob]
  • [How do I change a tire]
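The patent does not say how step modal queries are recognized. Purely as an illustration, a simple pattern match on the example prefixes above could look like this (the pattern and the function name are my own assumptions, and a real system would likely rely on language models rather than a regular expression):

```python
import re

# Matches queries that start with "how to" or "how do I", ignoring case.
STEP_MODAL_PATTERN = re.compile(r"^\s*how\s+(to|do\s+i)\b", re.IGNORECASE)


def is_step_modal_query(query: str) -> bool:
    """Return True when the query looks like it wants a list of steps."""
    return bool(STEP_MODAL_PATTERN.search(query))


examples = {
    "How to install a door knob": is_step_modal_query("How to install a door knob"),
    "How do I change a tire": is_step_modal_query("How do I change a tire"),
    "why is the sky blue": is_step_modal_query("why is the sky blue"),
}
```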

The context scoring process may detect lists formed with:

  • HTML tags
  • Micro formats
  • Semantic meaning
  • Consecutive headings at the same level with the same or similar phrases (e.g., Step 1, Step 2; or First; Second; Third; etc.)

The context scoring process may also score a list for quality.

It would look at things such as:

  • Whether the list is in the center of a page or at the side of a page
  • Whether the list includes multiple links to other pages (which is indicative of reference lists)
  • Whether HREF link text occupies a large portion of the text of the list

A list in the center of a page, which does not include multiple links to other pages and in which HREF link text does not occupy a large portion of the text, will be of higher quality than a list at the side of a page that does include multiple links to other pages and/or has HREF link text that occupies a large portion of the text of the list.

If a list is detected, then the process selects a list boost factor.

That list boost factor may be fixed or may be proportional to the quality score of the list.
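To make the list quality idea concrete, here is a small sketch that turns the three signals above into a quality score and then into a boost factor proportional to it. The equal weighting, the cutoffs (`max_links`, `max_link_share`), and the boost range are all hypothetical; the patent names the signals but gives no numbers.

```python
from dataclasses import dataclass


@dataclass
class ListFeatures:
    in_center: bool          # list sits in the main content, not a sidebar
    outbound_links: int      # links from the list to other pages
    link_text_chars: int     # characters of list text that are HREF link text
    total_chars: int         # all characters of list text


def list_quality(f: ListFeatures,
                 max_links: int = 1,
                 max_link_share: float = 0.5) -> float:
    """Score from 0.0 to 1.0, one third per favorable signal (assumed weights)."""
    link_share = f.link_text_chars / f.total_chars if f.total_chars else 1.0
    signals = [
        f.in_center,
        f.outbound_links <= max_links,        # does not include multiple links
        link_share <= max_link_share,         # link text is not most of the list
    ]
    return sum(signals) / len(signals)


def list_boost(quality: float, base: float = 1.0, max_extra: float = 0.2) -> float:
    """Boost factor proportional to the list quality score."""
    return base + max_extra * quality


how_to_list = ListFeatures(in_center=True, outbound_links=0,
                           link_text_chars=10, total_chars=200)
sidebar_nav = ListFeatures(in_center=False, outbound_links=8,
                           link_text_chars=180, total_chars=200)
```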

If a list is not detected, or after the list boost factor is selected, the process ends.

In some implementations, the list boost factor may also be dependent on other feature scores.

If other features, such as coverage ratio, distinctive text, etc., have relatively high scores, then the list boost factor may be increased.

The patent tells us that this is because “the combination of these scores in the presence of a list is a strong signal of a high-quality answer passage.”

Adjustment of Featured Snippet Answers Scores

Answer scores for candidate answer passages are adjusted by scoring components based on heading vectors, passage coverage ratio, and other features described above.

The scoring process can select the largest boost value from those determined above or can select a combination of the boost values.

Once the answer scores are adjusted, the candidate answer passage with the highest adjusted answer score is selected as the featured snippet answer and is displayed to a searcher.
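Putting the pieces together, the final adjustment and selection step might be sketched like this. The answer scores and boost values below are invented purely so that the unadjusted ranking is (2), (3), (1), as in the patent's example; the function names and the choice of multiplying boosts for the "combination" mode are also assumptions.

```python
from math import prod
from typing import Dict, List, Tuple


def combined_boost(boosts: List[float], mode: str = "max") -> float:
    """Either take the largest boost or multiply all of them together."""
    if not boosts:
        return 1.0
    return max(boosts) if mode == "max" else prod(boosts)


def select_answer(candidates: Dict[str, Tuple[float, List[float]]],
                  mode: str = "max") -> Tuple[str, Dict[str, float]]:
    """Adjust each answer score by its boost and return the top passage."""
    adjusted = {
        name: score * combined_boost(boosts, mode)
        for name, (score, boosts) in candidates.items()
    }
    best = max(adjusted, key=adjusted.get)
    return best, adjusted


# name -> (answer score, [coverage boost, question boost]); made-up numbers.
CANDIDATES = {
    "(1)": (0.60, [1.1, 1.25]),
    "(2)": (0.80, [1.1, 1.0]),
    "(3)": (0.75, [1.1, 1.5]),
}

best, adjusted = select_answer(CANDIDATES)
ranking = sorted(adjusted, key=adjusted.get, reverse=True)
```

With these made-up values, passage (2) leads before adjustment, but passage (3) comes out on top afterwards, followed by (2) and then (1), which is the same reordering the patent describes.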

More to Come

I will be reviewing the first patent in this series of patents about candidate answer scores because it does have some additional elements to it that haven’t been covered in this post, and the post about query dependent/independent ranking signals for answer scores. If you have been paying attention to how Google has been answering queries that appear to be seeking answers, you have likely seen those improving in many cases. Some answers have been really bad though. It will be nice to have as complete an idea as we can of how Google decides what might be a good answer to a query, based on information available to them on the Web.

Added October 14, 2020 – I have written about another Google patent on Answer Scores, and it’s worth reading about all of the patents on this topic. The new post is at Weighted Answer Terms for Scoring Answer Passages, and is about the patent Weighted answer terms for scoring answer passages.

It is about identifying questions in resources, and answers for those questions, and describes using term weights as a way to score answer passages (along with the scoring approaches identified in the other related patents, including this one.)

Added October 15, 2020 – I have written a few other posts about answer passages that are worth reading if you are interested in how Google finds questions on pages and answers to those, and scores answer passages to determine which ones to show as featured snippets. I’ve linked to some of those in the body of this post, but here is another one of those posts:

Added October 22, 2020 – I have written up details about how structured and unstructured data is selected for answer passages based on specific criteria, from the patent on scoring answer passages, in the post Selecting Candidate Answer Passages.


Copyright © 2020 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.

The post Adjusting Featured Snippet Answers by Context appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


Adjusting Featured Snippet Answers by Context

November 11, 2020 No Comments

How Are Featured Snippet Answers Decided Upon?

I recently wrote about Featured Snippet Answer Scores Ranking Signals. In that post, I described how Google was likely using query dependent and query independent ranking signals to create answer scores for queries that were looking like they wanted answers.

One of the inventors of that patent from that post was Steven Baker. I looked at other patents that he had written, and noticed that one of those was about context as part of query independent ranking signals for answers.

Remembering that patent about question-answering and context, I felt it was worth reviewing that patent and writing about it.

This patent is about processing question queries that want textual answers and how those answers may be decided upon.

it is a complicated patent, and at one point the description behind it seems to get a bit murky, but I wrote about when that happened in the patent, and I think the other details provide a lot of insight into how Google is scoring featured snippet answers. There is an additional related patent that I will be following up with after this post, and I will link to it from here as well.

This patent starts by telling us that a search system can identify resources in response to queries submitted by users and provide information about the resources in a manner that is useful to the users.

How Context Scoring Adjustments for Featured Snippet Answers Works

Users of search systems are often searching for an answer to a specific question, rather than a listing of resources, like in this drawing from the patent, showing featured snippet answers:

featured snippet answers

For example, users may want to know what the weather is in a particular location, a current quote for a stock, the capital of a state, etc.

When queries that are in the form of a question are received, some search engines may perform specialized search operations in response to the question format of the query.

For example, some search engines may provide information responsive to such queries in the form of an “answer,” such as information provided in the form of a “one box” to a question, which is often a featured snippet answer.

Some question queries are better served by explanatory answers, which are also referred to as “long answers” or “answer passages.”

For example, for the question query [why is the sky blue], an answer explaining light as waves is helpful.

featured snippet answers - why is the sky blue

Such answer passages can be selected from resources that include text, such as paragraphs, that are relevant to the question and the answer.

Sections of the text are scored, and the section with the best score is selected as an answer.

In general, the patent tells us about one aspect of what it covers in the following process:

  • Receiving a query that is a question query seeking an answer response
  • Receiving candidate answer passages, each passage made of text selected from a text section subordinate to a heading on a resource, with a corresponding answer score
  • Determining a hierarchy of headings on a page, with two or more heading levels hierarchically arranged in parent-child relationships, where each heading level has one or more headings, a subheading of a respective heading is a child heading in a parent-child relationship and the respective heading is a parent heading in that relationship, and the heading hierarchy includes a root level corresponding to a root heading (for each candidate answer passage)
  • Determining a heading vector describing a path in the hierarchy of headings from the root heading to the respective heading to which the candidate answer passage is subordinate, determining a context score based, at least in part, on the heading vector, adjusting the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score
  • Selecting an answer passage from the candidate answer passages based on the adjusted answer scores

Advantages of the process in the patent

  1. Long query answers can be selected, based partially on context signals indicating answers relevant to a question
  2. The context signals may be, in part, query-independent (i.e., scored independently of their relatedness to terms of the query
  3. This part of the scoring process considers the context of the document (“resource”) in which the answer text is located, accounting for relevancy signals that may not otherwise be accounted for during query-dependent scoring
  4. Following this approach, long answers that are more likely to satisfy a searcher’s informational need are more likely to appear as answers

This patent can be found at:

Context scoring adjustments for answer passages
Inventors: Nitin Gupta, Srinivasan Venkatachary , Lingkun Chu, and Steven D. Baker
US Patent: 9,959,315
Granted: May 1, 2018
Appl. No.: 14/169,960
Filed: January 31, 2014

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for context scoring adjustments for candidate answer passages.

In one aspect, a method includes scoring candidate answer passages. For each candidate answer passage, the system determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading to which the candidate answer passage is subordinate; determines a context score based, at least in part, on the heading vector; and adjusts answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The system then selects an answer passage from the candidate answer passages based on the adjusted answer scores.

Using Context Scores to Adjust Answer Scores for Featured Snippets

A drawing from the patent shows different hierarchical headings that may be used to determine the context of answer passages that may be used to adjust answer scores for featured snippets:

Hierarchical headings for featured snippets

I discuss these headings and their hierarchy below. Note that the headings include the Page title as a heading (About the Moon), and the headings within heading elements on the page as well. And those headings give those answers context.

This context scoring process starts with receiving candidate answer passages and a score for each of the passages.

Those candidate answer passages and their respective scores are provided to a search engine that receives a query determined to be a question.

Each of those candidate answer passages is text selected from a text section under a particular heading from a specific resource (page) that has a certain answer score.

For each resource where a candidate answer passage has been selected, a context scoring process determines a heading hierarchy in the resource.

A heading is text or other data corresponding to a particular passage in the resource.

As an example, a heading can be text summarizing a section of text that immediately follows the heading (the heading describes what the text is about that follows it, or is contained within it.)

Headings may be indicated, for example, by specific formatting data, such as heading elements using HTML.

This next section from the patent reminded me of an observation that Cindy Krum of Mobile Moxie has about named anchors on a page, and how Google might index those to answer a question, to lead to an answer or a featured snippet. She wrote about those in What the Heck are Fraggles?

A heading could also be anchor text for an internal link (within the same page) that links to an anchor and corresponding text at some other position on the page.

A heading hierarchy could have two or more heading levels that are hierarchically arranged in parent-child relationships.

The first level, or the root heading, could be the title of the resource.

Each of the heading levels may have one or more headings, and a subheading of a respective heading is a child heading and the respective heading is a parent heading in the parent-child relationship.

For each candidate passage, a context scoring process may determine a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

The context scoring process could be used to determine the context score and determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The context score could be based, at least in part, on the heading vector.

The context scoring process can then adjust the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The context scoring process can then select an answer passage from the candidate answer passages based on adjusted answer scores.

This flowchart from the patent shows the context scoring adjustment process:

context scoring adjustment flowchart

Identifying Question Queries And Answer Passages

I’ve written about understanding the context of answer passages. The patent tells us more about question queries and answer passages worth going over in more detail.

Some queries are in the form of a question or an implicit question.

For example, the query [distance of the earth from the moon] is in the form of an implicit question “What is the distance of the earth from the moon?”

An implicit question - the distance from the earth to the moon

Likewise, a question may be specific, as in the query [How far away is the moon].

The search engine includes a query question processor that uses processes that determine if a query is a query question (implicit or specific) and if it is, whether there are answers that are responsive to the question.

The query question processor can use several different algorithms to determine whether a query is a question and whether there are particular answers responsive to the question.

For example, it may use to determine question queries and answers:

  • Language models
  • Machine learned processes
  • Knowledge graphs
  • Grammars
  • Combinations of those

The query question processor may choose candidate answer passages in addition to or instead of answer facts. For example, for the query [how far away is the moon], an answer fact is 238,900 miles. And the search engine may just show that factual information since that is the average distance of the Earth from the moon.

But, the query question processor may choose to identify passages that are to be very relevant to the question query.

These passages are called candidate answer passages.

The answer passages are scored, and one passage is selected based on these scores and provided in response to the query.

An answer passage may be scored, and that score may be adjusted based on a context, which is the point behind this patent.

Often Google will identify several candidate answer passages that could be used as featured snippet answers.

Google may look at the information on the pages where those answers come from to better understand the context of the answers such as the title of the page, and the headings about the content that the answer was found within.

Contextual Scoring Adjustments for Featured Snippet Answers

The query question processor sends to a context scoring processor some candidate answer passages, information about the resources from which each answer passages was from, and a score for each of the featured snippet answers.

The scores of the candidate answer passages could be based on the following considerations:

  • Matching a query term to the text of the candidate answer passage
  • Matching answer terms to the text of the candidate answer passages
  • The quality of the underlying resource from which the candidate answer passage was selected

I recently wrote about featured snippet answer scores, and how a combination of query dependent and query independent scoring signals might be used to generate answer scores for answer passages.

The patent tells us that the query question processor may also take into account other factors when scoring candidate answer passages.

Candidate answer passages can be selected from the text of a particular section of the resource. And the query question processor could choose more than one candidate answer passage from a text section.

We are given the following examples of different answer passages from the same page

(These example answer passages are referred to in a few places in the remainder of the post.)

  • (1) It takes about 27 days (27 days, 7 hours, 43 minutes, and 11.6 seconds) for the Moon to orbit the Earth at its orbital distance
  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Each of those answers could be good ones for Google to use. We are told that:

More than three candidate answers can be selected from the resource, and more than one resource can be processed for candidate answers.

How would Google choose between those three possible answers?

Google might decide based on the number of sentences and a selection of up to a maximum number of characters.

The patent tells us this about choosing between those answers:

Each candidate answer has a corresponding score. For this example, assume that candidate answer passage (2) has the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1). Thus, without the context scoring processor, candidate answer passage (2) would have been provided in the answer box of FIG. 2. However, the context scoring processor takes into account the context of the answer passages and adjusts the scores provided by the query question processor.

So, we see that what might be chosen based on featured snippet answer scores could be adjusted based on the context of that answer from the page that it appears on.

Contextually Scoring Featured Snippet Answers

This process starts which begins with a query determined to be a question query seeking an answer response.

This process next receives candidate answer passages, each candidate answer passage chosen from the text of a resource.

Each of the candidate answer passages are text chosen from a text section that is subordinate to a respective heading (under a heading) in the resource and has a corresponding answer score.

For example, the query question processor provides the candidate answer passages, and their corresponding scores, to the context scoring processor.

A Heading Hierarchy to Determine Context

This process then determines a heading hierarchy from the resource.

The heading hierarchy would have two or more heading levels hierarchically arranged in parent-child relationships (such as a page title and an HTML heading element).

Each heading level has one or more headings.

A subheading of a respective heading is a child heading in the parent-child relationship (an h2 heading might be a subheading of the title, for example), and the respective heading is the parent heading in that relationship.

The heading hierarchy includes a root level corresponding to a root heading.

The context scoring processor can process heading tags in a DOM tree to determine a heading hierarchy.
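To make that step concrete, here is a minimal Python sketch of pulling headings out of a page. It is my own illustration, not code from the patent: the class name `HeadingCollector`, the function `extract_headings`, the sample markup, and the choice to treat the title as level 0 are all assumptions.

```python
from html.parser import HTMLParser

# Assumption: the <title> acts as the root (level 0), h1 is level 1, and so on.
HEADING_TAGS = {"title": 0, "h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for <title> and <h1>..<h6> in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []   # list of (level, text)
        self._level = None   # level of the heading currently open, if any
        self._buffer = []    # text fragments seen inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = HEADING_TAGS[tag]
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and HEADING_TAGS.get(tag) == self._level:
            # Normalize whitespace inside the heading text.
            text = " ".join("".join(self._buffer).split())
            self.headings.append((self._level, text))
            self._level = None


def extract_headings(html):
    collector = HeadingCollector()
    collector.feed(html)
    return collector.headings


# Hypothetical markup modeled on the "About The Moon" drawing.
SAMPLE_PAGE = """
<html><head><title>About The Moon</title></head><body>
<h1>The Moon's Orbit</h1>
<h2>How long does it take for the Moon to orbit Earth?</h2><p>...</p>
<h2>The distance from the Earth to the Moon</h2><p>...</p>
<h1>The Moon</h1>
<h2>Age of the Moon</h2>
<h2>Life on the Moon</h2>
</body></html>
"""

for level, text in extract_headings(SAMPLE_PAGE):
    print("  " * level + text)
```

The indentation in the printed output mirrors the parent-child levels that the patent describes.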

hierarchical headings for featured snippets

For example, concerning the drawing about the distance to the moon just above, the heading hierarchy for the resource may be:

The ROOT Heading (title) is: About The Moon (310)

The main heading (h1) on the page:

H1: The Moon’s Orbit (330)

A secondary heading (h2) on the page:

H2: How long does it take for the Moon to orbit Earth? (334)

Another secondary heading (h2) on the page is:

H2: The distance from the Earth to the Moon (338)

Another main heading (h1) on the page:

H1: The Moon (360)

Another secondary heading (h2) on the page:

H2: Age of the Moon (364)

Another secondary heading (h2) on the page:

H2: Life on the Moon (368)

Here is how the patent describes this heading hierarchy:

In this heading hierarchy, the title is the root heading at the root level; headings 330 and 360 are child headings of the root heading, and are at a first level below the root level; headings 334 and 338 are child headings of the heading 330, and are at a second level that is one level below the first level, and two levels below the root level; and headings 364 and 368 are child headings of the heading 360 and are at a second level that is one level below the first level, and two levels below the root level.

The process from the patent determines a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

This score may be based on a heading vector.

The patent says that the process, for each of the candidate answer passages, determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The heading vector would include the text of the headings for the candidate answer passage.

For the example candidate answer passages (1)-(3) above about how long it takes the moon to orbit the Earth, the respectively corresponding heading vectors V1, V2 and V3 are:

  • V1=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: How long does it take for the Moon to orbit the Earth?]>
  • V2=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>
  • V3=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>

We are also told that because candidate answer passages (2) and (3) are selected from the same text section 340, their respective heading vectors V2 and V3 are the same (they are both in the content under the same (H2) heading.)
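The heading vectors above can be reproduced with a short Python sketch. This is only an illustration under my own assumptions: the function name `heading_vectors` and the (level, text) representation are not from the patent, and level 0 stands in for the root heading (the page title).

```python
def heading_vectors(headings):
    """Return, for each heading, the path of heading texts from the root to it.

    `headings` is a list of (level, text) pairs in document order, where
    level 0 is the root (page title), 1 is an h1, 2 is an h2, and so on.
    """
    stack = []    # currently open ancestors, as (level, text)
    vectors = []
    for level, text in headings:
        # Close any headings at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, text))
        vectors.append([t for _, t in stack])
    return vectors


MOON_HEADINGS = [
    (0, "About The Moon"),
    (1, "The Moon's Orbit"),
    (2, "How long does it take for the Moon to orbit the Earth?"),
    (2, "The distance from the Earth to the Moon"),
    (1, "The Moon"),
    (2, "Age of the Moon"),
    (2, "Life on the Moon"),
]

vectors = heading_vectors(MOON_HEADINGS)
v1 = vectors[2]         # passage (1) sits under the "How long..." heading
v2 = v3 = vectors[3]    # passages (2) and (3) share the "distance" heading
print(v1)
print(v2)
```

Because passages (2) and (3) come from the same text section, they get the identical vector, which is why any difference in their adjusted scores has to come from the other context features.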

The process of adjusting a score, for each answer passage, uses a context score based, at least in part, on the heading vector (410).

That context score can be a single score used to scale the candidate answer passage score or can be a series of discrete scores/boosts that can be used to adjust the score of the candidate answer passage.

Where Things Get Murky in This Patent

There do seem to be several related patents involving featured snippet answers, and this one, which targets learning more about answers from their context based on where they fit in a heading hierarchy, makes sense.

But, I’m confused by how the patent tells us that one answer based on the context would be adjusted over another one.

The first issue I have is that the answers they are comparing in the same contextual area have some overlap. Here are those two:

  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Note that the second answer and the third answer both include the same line: “Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles.” I find myself a little surprised that the second answer includes a couple of sentences that aren’t in the third answer, and skips a couple of lines from the third answer, and then includes the last sentence, which answers the question.

Since they both appear in the same heading and subheading section of the page they are from, it is difficult to imagine that there is a different adjustment based on context. But, the patent tells us differently:

The candidate answer passage with the highest adjusted answer score (based on context from the headings) is selected as the answer passage.

Recall that in the example above, the candidate answer passage (2) had the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1).

However, after adjustments, candidate answer passage (3) has the highest score, followed by candidate answer passage (2), and then candidate answer passage (1).

Accordingly, candidate answer passage (3) is selected and provided as the answer passage of FIG. 2.

Boosting Scores Based on Passage Coverage Ratio

A query question processor may limit the candidate answers to a maximum length.

The context scoring processor determines a coverage ratio which is a measure indicative of the coverage of the candidate answer passage from the text from which it was selected.

The patent describes an alternative way of choosing the text block that the candidate answer passage is measured against:

Alternatively, the text block may include text sections subordinate to respective headings that include a first heading for which the text section from which the candidate answer passage was selected is subordinate, and sibling headings that have an immediate parent heading in common with the first heading. For example, for the candidate answer passage, the text block may include all the text in the portion 380 of the hierarchy; or may include only the text of the sections, of some other portion of text within the portion of the hierarchy. A similar block may be used for the portion of the hierarchy for candidate answer passages selected from that portion.

A small coverage ratio may indicate a candidate answer passage is incomplete. A high coverage ratio may indicate the candidate answer passage captures more of the content of the text passage from which it was selected. A candidate answer passage may receive a context adjustment, depending on this coverage ratio.

A passage coverage ratio is a ratio of the total number of characters in the candidate answer passage to the total number of characters in the passage from which the candidate answer passage was selected.

The passage coverage ratio could also be a ratio of the total number of sentences (or words) in the candidate answer passage to the total number of sentences (or words) in the passage from which the candidate answer passage was selected.

We are told that other ratios can also be used.

Of the three example candidate answer passages (1)-(3) above, passage (1) has the highest ratio, passage (2) has the second-highest, and passage (3) has the lowest.

This process determines whether the coverage ratio is less than a threshold value. That threshold value can be, for example, 0.3, 0.35 or 0.4, or some other fraction. In our “distance to the moon” example, each passage coverage ratio meets or exceeds the threshold value.

If the coverage ratio is less than a threshold value, then the process would select a first answer boost factor. The first answer boost factor might be proportional to the coverage ratio according to a first relation, or may be a fixed value, or may be a non-boosting value (e.g., 1.0).

But if the coverage ratio is not less than the threshold value, the process may select a second answer boost factor. The second answer boost factor may be proportional to the coverage ratio according to a second relation, or may be a fixed value, or may be a value greater than the non-boosting value (e.g., 1.1).
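Here is a minimal Python sketch of the coverage ratio and the threshold test. It uses the character-based ratio and the fixed-value variants of the two boost factors (1.0 and 1.1, with a 0.3 threshold, all taken from the examples the patent gives). The function names and default arguments are my own assumptions, and the patent also allows boosts that are proportional to the ratio, which this sketch does not model.

```python
def coverage_ratio(candidate, source_text):
    """Characters in the candidate passage divided by characters in its source passage."""
    if not source_text:
        return 0.0
    return len(candidate) / len(source_text)


def coverage_boost(ratio, threshold=0.3, low_boost=1.0, high_boost=1.1):
    """Pick the first (non-boosting) factor below the threshold, else the second."""
    return low_boost if ratio < threshold else high_boost


# Hypothetical texts: a 4-character candidate taken from a 10-character passage.
ratio = coverage_ratio("abcd", "abcdefghij")
print(ratio, coverage_boost(ratio))
```

Swapping `len(...)` for a word or sentence count would give the other ratio variants mentioned above.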

Scoring Based on Other Features

The context scoring process can also check for the presence of features in addition to those described above.

Three example features for contextually scoring an answer passage are distinctive text, a preceding question, and a list format.

Distinctive text

Distinctive text is text that stands out because it is formatted differently from other text, such as bolded text.

A Preceding Question

A preceding question is a question in the text that precedes the candidate answer passage.

The search engine may process various amounts of text to detect the question.

In some cases, only the passage from which the candidate answer passage is extracted is checked.

In other cases, a text window that can include header text and other text from other sections may be checked.

A boost score that is inversely proportional to the text distance from a question to the candidate answer passage is calculated, and the check is terminated at the occurrence of a first question.

That text distance may be measured in characters, words, or sentences, or by some other metric.

If the question is anchor text for a section of text and there is intervening text, such as in the case of a navigation list, then the question is determined to only precede the text passage to which it links, not precede intervening text.

In the drawing above about the moon, there are two questions in the resource: “How long does it take for the Moon to orbit Earth?” and “Why is the distance changing?”

The first question–“How long does it take for the Moon to orbit Earth?”– precedes the first candidate answer passage by a text distance of zero sentences, and it precedes the second candidate answer passage by a text distance of five sentences.

And the second question–“Why is the distance changing?”– precedes the third candidate answer by zero sentences.

If a preceding question is detected, then the process selects a question boost factor.

This boost factor may be proportional to the text distance, whether the text is in a text passage subordinate to a header or whether the question is a header, and, if the question is in a header, whether the candidate answer passage is subordinate to the header.

Considering these factors, the third candidate answer passage receives the highest boost factor, the first candidate answer receives the second-highest boost factor, and the second candidate answer receives the smallest boost factor.
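The distance part of that boost can be sketched in Python. This is an assumption-heavy illustration: the formula, the `max_boost` value, and the function name `question_boost` are mine, and the sketch models only text distance. It ignores whether the question is a header, so passages (1) and (3), both at a distance of zero sentences, tie here even though the patent ranks (3) above (1).

```python
def question_boost(distance_in_sentences, max_boost=1.2):
    """Boost that shrinks as the preceding question gets farther from the passage.

    `distance_in_sentences` is None when no preceding question was found.
    The bonus above 1.0 is inversely proportional to (1 + distance), so a
    question that immediately precedes the passage gets the full bonus.
    """
    if distance_in_sentences is None:
        return 1.0    # non-boosting value
    return 1.0 + (max_boost - 1.0) / (1 + distance_in_sentences)


# Distances from the moon example: passage (1) = 0, passage (2) = 5, passage (3) = 0.
for name, distance in [("(1)", 0), ("(2)", 5), ("(3)", 0)]:
    print(name, round(question_boost(distance), 3))
```

A fuller version would add a term for the header-versus-body distinction described above to break that tie.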

If a preceding question is not detected, or after the question boost factor is selected, the process then checks for the presence of a list.

The Presence of a List

A list is an indication of several steps usually instructive or informative. The detection of a list may be subject to the query question being a step modal query.

A step modal query is a query where a list-based answer is likely to be a good answer. Examples of step modal queries are queries like:

  • [How to . . . ]
  • [How do I . . . ]
  • [How to install a door knob]
  • [How do I change a tire]

The context scoring process may detect lists formed with:

  • HTML tags
  • Micro formats
  • Semantic meaning
  • Consecutive headings at the same level with the same or similar phrases (e.g., Step 1, Step 2; or First; Second; Third; etc.)

The context scoring process may also score a list for quality.

It would look at things such as where the list sits on the page, how many links it contains, and how much of its text is link text:

  • A list in the center of a page, which does not include multiple links to other pages, and in which HREF link text does not occupy a large portion of the text of the list, will be of higher quality
  • A list at the side of a page, which does include multiple links to other pages (indicative of reference lists), and/or has HREF link text that does occupy a large portion of the text of the list, will be of lower quality

If a list is detected, then the process selects a list boost factor.

That list boost factor may be fixed or may be proportional to the quality score of the list.

If a list is not detected, or after the list boost factor is selected, the process ends.

In some implementations, the list boost factor may also be dependent on other feature scores.

If other features, such as coverage ratio, distinctive text, etc., have relatively high scores, then the list boost factor may be increased.

The patent tells us that this is because “the combination of these scores in the presence of a list is a strong signal of a high-quality answer passage.”
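As a rough illustration of the list step, here is a Python sketch that checks for a step modal query, scores list quality from the signals described above, and turns that into a boost. Everything specific in it is an assumption on my part: the regular expression, the penalty weights, the 0.5 link-text cutoff, and the `max_boost` value are invented for the example, and the patent does not give numbers for any of them.

```python
import re

# Assumption: treat queries starting with "how to" or "how do i" as step modal.
STEP_MODAL = re.compile(r"^\s*how\s+(to|do\s+i)\b", re.IGNORECASE)


def is_step_modal(query):
    return bool(STEP_MODAL.match(query))


def list_quality(in_main_content, link_count, link_text_chars, total_chars):
    """Toy quality score between 0 and 1; the weights are illustrative only."""
    score = 1.0
    if not in_main_content:        # e.g. a list at the side of the page
        score -= 0.4
    if link_count > 1:             # multiple links suggest a reference list
        score -= 0.3
    if total_chars and link_text_chars / total_chars > 0.5:
        score -= 0.3               # link text dominates the list text
    return max(score, 0.0)


def list_boost(query, has_list, quality, max_boost=1.3):
    """Boost proportional to list quality, applied only for step modal queries."""
    if not (has_list and is_step_modal(query)):
        return 1.0
    return 1.0 + (max_boost - 1.0) * quality


good = list_quality(in_main_content=True, link_count=0, link_text_chars=0, total_chars=200)
print(list_boost("How do I change a tire", has_list=True, quality=good))
print(list_boost("why is the sky blue", has_list=True, quality=good))
```

The second call returns the non-boosting value because the query is not a step modal query, which reflects the idea that list detection may be conditional on the type of question.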

Adjustment of Featured Snippet Answers Scores

Answer scores for candidate answer passages are adjusted by scoring components based on heading vectors, passage coverage ratio, and other features described above.

The scoring process can select the largest boost value from those determined above or can select a combination of the boost values.

Once the answer scores are adjusted, the candidate answer passage with the highest adjusted answer score is selected as the featured snippet answer and is displayed to a searcher.
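To see how the pieces might fit together, here is a small Python sketch of the final adjustment and selection. The base scores and boost values are made-up numbers chosen only to reproduce the ordering in the patent's example, where passage (2) leads before adjustment and passage (3) wins after it. The function names and the "product" reading of "a combination of the boost values" are my assumptions.

```python
from math import prod


def adjusted_score(base_score, boosts, combine="product"):
    """Scale a base answer score by the largest boost or by the product of all boosts."""
    if combine == "max":
        factor = max(boosts, default=1.0)
    else:
        factor = prod(boosts)
    return base_score * factor


def select_answer(candidates, combine="product"):
    """`candidates` maps a passage name to (base_score, [boost factors])."""
    return max(
        candidates,
        key=lambda name: adjusted_score(*candidates[name], combine=combine),
    )


# Hypothetical numbers: each passage has a coverage boost and a question boost.
candidates = {
    "(1)": (0.60, [1.1, 1.2]),
    "(2)": (0.80, [1.1, 1.03]),
    "(3)": (0.75, [1.1, 1.2]),
}

before = max(candidates, key=lambda name: candidates[name][0])
print("before adjustment:", before)
print("after adjustment:", select_answer(candidates))
```

With these numbers both the product and the largest-boost variants pick passage (3), matching the outcome the patent describes for FIG. 2.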

More to Come

I will be reviewing the first patent in this series of patents about candidate answer scores because it does have some additional elements to it that haven’t been covered in this post, and the post about query dependent/independent ranking signals for answer scores. If you have been paying attention to how Google has been answering queries that appear to be seeking answers, you have likely seen those improving in many cases. Some answers have been really bad though. It will be nice to have as complete an idea as we can of how Google decides what might be a good answer to a query, based on information available to them on the Web.

Added October 14, 2020 – I have written about another Google patent on Answer Scores, and it’s worth reading about all of the patents on this topic. The new post is at Weighted Answer Terms for Scoring Answer Passages, and is about the patent Weighted answer terms for scoring answer passages.

It is about identifying questions in resources, and answers for those questions, and describes using term weights as a way to score answer passages (along with the scoring approaches identified in the other related patents, including this one.)

Added October 15, 2020 – I have written a few other posts about answer passages that are worth reading if you are interested in how Google finds questions on pages and answers to those, and scores answer passages to determine which ones to show as featured snippets. I’ve linked to some of those in the body of this post, but here is another one of those posts:

Added October 22, 2020 – I have written up a description of how structured and unstructured data is selected for answer passages based on specific criteria in the patent on Scoring Answer Passages, in the post Selecting Candidate Answer Passages.



The post Adjusting Featured Snippet Answers by Context appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


Adjusting Featured Snippet Answers by Context

November 9, 2020 No Comments

How Are Featured Snippet Answers Decided Upon?

I recently wrote about Featured Snippet Answer Scores Ranking Signals. In that post, I described how Google was likely using query dependent and query independent ranking signals to create answer scores for queries that were looking like they wanted answers.

One of the inventors of that patent from that post was Steven Baker. I looked at other patents that he had written, and noticed that one of those was about context as part of query independent ranking signals for answers.

Remembering that patent about question-answering and context, I felt it was worth reviewing that patent and writing about it.

This patent is about processing question queries that want textual answers and how those answers may be decided upon.

It is a complicated patent, and at one point the description behind it seems to get a bit murky, but I wrote about when that happened in the patent, and I think the other details provide a lot of insight into how Google is scoring featured snippet answers. There is an additional related patent that I will be following up with after this post, and I will link to it from here as well.

This patent starts by telling us that a search system can identify resources in response to queries submitted by users and provide information about the resources in a manner that is useful to the users.

How Context Scoring Adjustments for Featured Snippet Answers Works

Users of search systems are often searching for an answer to a specific question, rather than a listing of resources, like in this drawing from the patent, showing featured snippet answers:

featured snippet answers

For example, users may want to know what the weather is in a particular location, a current quote for a stock, the capital of a state, etc.

When queries that are in the form of a question are received, some search engines may perform specialized search operations in response to the question format of the query.

For example, some search engines may provide information responsive to such queries in the form of an “answer,” such as information provided in the form of a “one box” to a question, which is often a featured snippet answer.

Some question queries are better served by explanatory answers, which are also referred to as “long answers” or “answer passages.”

For example, for the question query [why is the sky blue], an answer explaining light as waves is helpful.

featured snippet answers - why is the sky blue

Such answer passages can be selected from resources that include text, such as paragraphs, that are relevant to the question and the answer.

Sections of the text are scored, and the section with the best score is selected as an answer.

In general, the patent tells us about one aspect of what it covers in the following process:

  • Receiving a query that is a question query seeking an answer response
  • Receiving candidate answer passages, each passage made of text selected from a text section subordinate to a heading on a resource, with a corresponding answer score
  • Determining a hierarchy of headings on a page, with two or more heading levels hierarchically arranged in parent-child relationships, where each heading level has one or more headings, a subheading of a respective heading is a child heading in a parent-child relationship and the respective heading is a parent heading in that relationship, and the heading hierarchy includes a root level corresponding to a root heading
  • For each candidate answer passage: determining a heading vector describing a path in the hierarchy of headings from the root heading to the respective heading to which the candidate answer passage is subordinate, determining a context score based, at least in part, on the heading vector, and adjusting the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score
  • Selecting an answer passage from the candidate answer passages based on the adjusted answer scores

Advantages of the process in the patent

  1. Long query answers can be selected, based partially on context signals indicating answers relevant to a question
  2. The context signals may be, in part, query-independent (i.e., scored independently of their relatedness to terms of the query)
  3. This part of the scoring process considers the context of the document (“resource”) in which the answer text is located, accounting for relevancy signals that may not otherwise be accounted for during query-dependent scoring
  4. Following this approach, long answers that are more likely to satisfy a searcher’s informational need are more likely to appear as answers

This patent can be found at:

Context scoring adjustments for answer passages
Inventors: Nitin Gupta, Srinivasan Venkatachary, Lingkun Chu, and Steven D. Baker
US Patent: 9,959,315
Granted: May 1, 2018
Appl. No.: 14/169,960
Filed: January 31, 2014

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for context scoring adjustments for candidate answer passages.

In one aspect, a method includes scoring candidate answer passages. For each candidate answer passage, the system determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading to which the candidate answer passage is subordinate; determines a context score based, at least in part, on the heading vector; and adjusts answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The system then selects an answer passage from the candidate answer passages based on the adjusted answer scores.

Using Context Scores to Adjust Answer Scores for Featured Snippets

A drawing from the patent shows different hierarchical headings that may be used to determine the context of answer passages that may be used to adjust answer scores for featured snippets:

Hierarchical headings for featured snippets

I discuss these headings and their hierarchy below. Note that the headings include the Page title as a heading (About the Moon), and the headings within heading elements on the page as well. And those headings give those answers context.

This context scoring process starts with receiving candidate answer passages and a score for each of the passages.

Those candidate answer passages and their respective scores are provided to a search engine that receives a query determined to be a question.

Each of those candidate answer passages is text selected from a text section under a particular heading from a specific resource (page) that has a certain answer score.

For each resource where a candidate answer passage has been selected, a context scoring process determines a heading hierarchy in the resource.

A heading is text or other data corresponding to a particular passage in the resource.

As an example, a heading can be text summarizing a section of text that immediately follows the heading (the heading describes what the text is about that follows it, or is contained within it.)

Headings may be indicated, for example, by specific formatting data, such as heading elements using HTML.

This next section from the patent reminded me of an observation that Cindy Krum of Mobile Moxie has about named anchors on a page, and how Google might index those to answer a question, to lead to an answer or a featured snippet. She wrote about those in What the Heck are Fraggles?

A heading could also be anchor text for an internal link (within the same page) that links to an anchor and corresponding text at some other position on the page.

A heading hierarchy could have two or more heading levels that are hierarchically arranged in parent-child relationships.

The first level, or the root heading, could be the title of the resource.

Each of the heading levels may have one or more headings, and a subheading of a respective heading is a child heading and the respective heading is a parent heading in the parent-child relationship.

For each candidate passage, a context scoring process may determine a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

To determine the context score, the context scoring process determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The context score could be based, at least in part, on the heading vector.

The context scoring process can then adjust the answer score of the candidate answer passage at least in part by the context score to form an adjusted answer score.

The context scoring process can then select an answer passage from the candidate answer passages based on adjusted answer scores.

This flowchart from the patent shows the context scoring adjustment process:

context scoring adjustment flowchart

Identifying Question Queries And Answer Passages

I’ve written about understanding the context of answer passages. The patent tells us more about question queries and answer passages worth going over in more detail.

Some queries are in the form of a question or an implicit question.

For example, the query [distance of the earth from the moon] is in the form of an implicit question “What is the distance of the earth from the moon?”

An implicit question - the distance from the earth to the moon

Likewise, a question may be specific, as in the query [How far away is the moon].

The search engine includes a query question processor that uses processes that determine if a query is a query question (implicit or specific) and if it is, whether there are answers that are responsive to the question.

The query question processor can use several different algorithms to determine whether a query is a question and whether there are particular answers responsive to the question.

For example, to determine question queries and answers, it may use:

  • Language models
  • Machine learned processes
  • Knowledge graphs
  • Grammars
  • Combinations of those

The query question processor may choose candidate answer passages in addition to or instead of answer facts. For example, for the query [how far away is the moon], an answer fact is 238,900 miles. And the search engine may just show that factual information since that is the average distance of the Earth from the moon.

But, the query question processor may choose to identify passages that are very relevant to the question query.

These passages are called candidate answer passages.

The answer passages are scored, and one passage is selected based on these scores and provided in response to the query.

An answer passage may be scored, and that score may be adjusted based on a context, which is the point behind this patent.

Often Google will identify several candidate answer passages that could be used as featured snippet answers.

Google may look at the information on the pages where those answers come from to better understand the context of the answers such as the title of the page, and the headings about the content that the answer was found within.

Contextual Scoring Adjustments for Featured Snippet Answers

The query question processor sends to a context scoring processor some candidate answer passages, information about the resources from which each answer passage was selected, and a score for each of the candidate answer passages.

The scores of the candidate answer passages could be based on the following considerations:

  • Matching a query term to the text of the candidate answer passage
  • Matching answer terms to the text of the candidate answer passages
  • The quality of the underlying resource from which the candidate answer passage was selected

I recently wrote about featured snippet answer scores, and how a combination of query dependent and query independent scoring signals might be used to generate answer scores for answer passages.

The patent tells us that the query question processor may also take into account other factors when scoring candidate answer passages.

Candidate answer passages can be selected from the text of a particular section of the resource. And the query question processor could choose more than one candidate answer passage from a text section.

We are given the following examples of different answer passages from the same page:

(These example answer passages are referred to in a few places in the remainder of the post.)

  • (1) It takes about 27 days (27 days, 7 hours, 43 minutes, and 11.6 seconds) for the Moon to orbit the Earth at its orbital distance
  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Each of those answers could be good ones for Google to use. We are told that:

More than three candidate answers can be selected from the resource, and more than one resource can be processed for candidate answers.

How would Google choose between those three possible answers?

Google might decide based on the number of sentences and a selection of up to a maximum number of characters.

The patent tells us this about choosing between those answers:

Each candidate answer has a corresponding score. For this example, assume that candidate answer passage (2) has the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1). Thus, without the context scoring processor, candidate answer passage (2) would have been provided in the answer box of FIG. 2. However, the context scoring processor takes into account the context of the answer passages and adjusts the scores provided by the query question processor.

So, we see that what might be chosen based on featured snippet answer scores could be adjusted based on the context of that answer from the page that it appears on.

Contextually Scoring Featured Snippet Answers

This process begins with a query determined to be a question query seeking an answer response.

This process next receives candidate answer passages, each candidate answer passage chosen from the text of a resource.

Each candidate answer passage is text chosen from a text section that is subordinate to a respective heading (under a heading) in the resource, and each has a corresponding answer score.

For example, the query question processor provides the candidate answer passages, and their corresponding scores, to the context scoring processor.

A Heading Hierarchy to Determine Context

This process then determines a heading hierarchy from the resource.

The heading hierarchy would have two or more heading levels hierarchically arranged in parent-child relationships (such as a page title and an HTML heading element).

Each heading level has one or more headings.

A subheading of a respective heading is a child heading in the parent-child relationship (an h2 heading might be a subheading of the title, for example), and the respective heading is the parent heading in that relationship.

The heading hierarchy includes a root level corresponding to a root heading.

The context scoring processor can process heading tags in a DOM tree to determine a heading hierarchy.

hierarchical headings for featured snippets

For example, concerning the drawing about the distance to the moon just above, the heading hierarchy for the resource may be:

The ROOT Heading (title) is: About The Moon (310)

The main heading (h1) on the page:

H1: The Moon’s Orbit (330)

A secondary heading (h2) on the page:

H2: How long does it take for the Moon to orbit Earth? (334)

Another secondary heading (h2) on the page is:

H2: The distance from the Earth to the Moon (338)

Another Main heading (h1) on the page

H1: The Moon (360)

Another secondary Heading (h2) on the page:

H2: Age of the Moon (364)

Another secondary heading (h2) on the page:

H2: Life on the Moon (368)

Here is how the patent describes this heading hierarchy:

In this heading hierarchy, The title is the root heading at the root level; headings 330 and 360 are child headings of the heading, and are at a first level below the root level; headings 334 and 338 are child headings of the heading 330, and are at a second level that is one level below the first level, and two levels below the root level; and headings 364 and 368 are child headings of the heading 360 and are at a second level that is one level below the first level, and two levels below the root level.
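To make that parent-child structure concrete, here is a minimal Python sketch that reads the title and heading tags from a page and works out each heading's parent. The patent only says heading tags in a DOM tree can be processed; the class name `HeadingCollector`, the function `build_parent_map`, and the sample HTML are my own assumptions for illustration, not anything the patent specifies.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for <title> and <h1>-<h6> in document order."""

    # Assumption: the title is treated as level 0 (the root heading).
    LEVELS = {"title": 0, "h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.headings = []    # list of (level, text)
        self._current = None  # level of the heading tag we are inside, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            self._current = self.LEVELS[tag]
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.LEVELS and self._current == self.LEVELS[tag]:
            self.headings.append((self._current, "".join(self._buffer).strip()))
            self._current = None


def build_parent_map(headings):
    """Map each heading index to the index of its parent (None for the root)."""
    parents = {}
    stack = []  # indices of currently open ancestors, shallowest first
    for i, (level, _) in enumerate(headings):
        # Close any heading at the same or a deeper level.
        while stack and headings[stack[-1]][0] >= level:
            stack.pop()
        parents[i] = stack[-1] if stack else None
        stack.append(i)
    return parents


html = """
<html><head><title>About The Moon</title></head><body>
<h1>The Moon's Orbit</h1>
<h2>How long does it take for the Moon to orbit Earth?</h2>
<h2>The distance from the Earth to the Moon</h2>
<h1>The Moon</h1>
<h2>Age of the Moon</h2>
<h2>Life on the Moon</h2>
</body></html>
"""

collector = HeadingCollector()
collector.feed(html)
parents = build_parent_map(collector.headings)
for i, (level, text) in enumerate(collector.headings):
    parent = collector.headings[parents[i]][1] if parents[i] is not None else None
    print(f"{'  ' * level}{text}  (parent: {parent})")
```

Run against the sample page, both H1 headings end up as children of the title, and each H2 is a child of the H1 that comes before it, which matches the hierarchy the patent describes.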

The process from the patent determines a context score based, at least in part, on the relationship between the root heading and the respective heading to which the candidate answer passage is subordinate.

This score may be based on a heading vector.

The patent says that the process, for each of the candidate answer passages, determines a heading vector that describes a path in the heading hierarchy from the root heading to the respective heading.

The heading vector would include the text of the headings for the candidate answer passage.

For the example candidate answer passages (1)-(3) above about how long it takes the Moon to orbit the Earth, the respectively corresponding heading vectors V1, V2 and V3 are:

  • V1=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: How long does it take for the Moon to orbit the Earth?]>
  • V2=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>
  • V3=<[Root: About The Moon], [H1: The Moon's Orbit], [H2: The distance from the Earth to the Moon]>

We are also told that because candidate answer passages (2) and (3) are selected from the same text section 340, their respective heading vectors V2 and V3 are the same (they are both in the content under the same (H2) heading.)
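A heading vector is just the path of heading text from the root down to the heading a passage sits under. Here is a small Python sketch of that idea. The `PARENT` dictionary and the `heading_vector` function are hypothetical names of my own, and the patent doesn't say how the path is stored.

```python
# Hypothetical parent map for the example page: heading text -> parent heading text.
PARENT = {
    "About The Moon": None,  # root heading (the page title)
    "The Moon's Orbit": "About The Moon",
    "How long does it take for the Moon to orbit the Earth?": "The Moon's Orbit",
    "The distance from the Earth to the Moon": "The Moon's Orbit",
    "The Moon": "About The Moon",
    "Age of the Moon": "The Moon",
    "Life on the Moon": "The Moon",
}


def heading_vector(heading, parent_of):
    """Return the path of headings from the root down to `heading`."""
    path = []
    while heading is not None:
        path.append(heading)
        heading = parent_of[heading]
    return tuple(reversed(path))


# Passage (1) sits under the first H2; passages (2) and (3) share the second H2.
v1 = heading_vector("How long does it take for the Moon to orbit the Earth?", PARENT)
v2 = heading_vector("The distance from the Earth to the Moon", PARENT)
v3 = heading_vector("The distance from the Earth to the Moon", PARENT)

print(v1)
print(v2 == v3)
```

Because passages (2) and (3) come from the same section, they get identical vectors, so any difference in their final scores has to come from something other than the heading path.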

The process of adjusting a score, for each answer passage, uses a context score based, at least in part, on the heading vector (410).

That context score can be a single score used to scale the candidate answer passage score or can be a series of discrete scores/boosts that can be used to adjust the score of the candidate answer passage.

Where Things Get Murky in This Patent

There do seem to be several related patents involving featured snippet answers, and this one which targets learning more about answers from their context based on where they fit in a heading hierarchy makes sense.

But, I’m confused by how the patent tells us that one answer based on the context would be adjusted over another one.

The first issue I have is that the answers they are comparing in the same contextual area have some overlap. Here those two are:

  • (2) Why is the distance changing? The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles
  • (3) The moon’s distance from Earth varies because the moon travels in a slightly elliptical orbit. Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles

Note that the second answer and the third answer both include the same line: “Thus, the moon’s distance from the Earth varies from 225,700 miles to 252,000 miles.” I find myself a little surprised that the second answer includes a couple of sentences that aren’t in the third answer, and skips a couple of lines from the third answer, and then includes the last sentence, which answers the question.

Since they both appear in the same heading and subheading section of the page they are from, it is difficult to imagine that there is a different adjustment based on context. But, the patent tells us differently:

The candidate answer passage with the highest adjusted answer score (based on context from the headings) is selected as the answer passage.

Recall that in the example above, the candidate answer passage (2) had the highest score, followed by candidate answer passage (3), and then by candidate answer passage (1).

However, after adjustments, candidate answer passage (3) has the highest score, followed by candidate answer passage (2), and then candidate answer passage (1).

Accordingly, candidate answer passage (3) is selected and provided as the answer passage of FIG. 2.

Boosting Scores Based on Passage Coverage Ratio

A query question processor may limit the candidate answers to a maximum length.

The context scoring processor determines a coverage ratio which is a measure indicative of the coverage of the candidate answer passage from the text from which it was selected.

The patent describes alternative question answers:

Alternatively, the text block may include text sections subordinate to respective headings that include a first heading for which the text section from which the candidate answer passage was selected is subordinate, and sibling headings that have an immediate parent heading in common with the first heading. For example, for the candidate answer passage, the text block may include all the text in the portion 380 of the hierarchy; or may include only the text of the sections, of some other portion of text within the portion of the hierarchy. A similar block may be used for the portion of the hierarchy for candidate answer passages selected from that portion.

A small coverage ratio may indicate a candidate answer passage is incomplete. A high coverage ratio may indicate the candidate answer passage captures more of the content of the text passage from which it was selected. A candidate answer passage may receive a context adjustment, depending on this coverage ratio.

A passage coverage ratio is a ratio of the total number of characters in the candidate answer passage to the total number of characters in the passage from which the candidate answer passage was selected.

The passage coverage ratio could also be a ratio of the total number of sentences (or words) in the candidate answer passage to the total number of sentences (or words) in the passage from which the candidate answer passage was selected.

We are told that other ratios can also be used.
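The three ratios described above are easy to sketch in Python. The function names and the placeholder text below are my own, and the sentence splitter is a deliberately naive assumption (it splits on whitespace after ".", "!" or "?"), since the patent doesn't say how sentences are counted.

```python
import re


def char_coverage(candidate, source):
    """Characters in the candidate divided by characters in the source passage."""
    return len(candidate) / len(source) if source else 0.0


def word_coverage(candidate, source):
    """Words in the candidate divided by words in the source passage."""
    source_words = source.split()
    return len(candidate.split()) / len(source_words) if source_words else 0.0


def split_sentences(text):
    """Naive sentence splitter: break on whitespace that follows ., ! or ?"""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_coverage(candidate, source):
    """Sentences in the candidate divided by sentences in the source passage."""
    source_sentences = split_sentences(source)
    if not source_sentences:
        return 0.0
    return len(split_sentences(candidate)) / len(source_sentences)


# Placeholder text, not taken from the patent.
source = (
    "First sentence here. Second sentence here. "
    "Third sentence here. Fourth sentence here."
)
candidate = "Second sentence here. Third sentence here."

print(sentence_coverage(candidate, source))
print(word_coverage(candidate, source))
print(round(char_coverage(candidate, source), 3))
```

The three measures land close together on this text, but they can drift apart on real pages where sentence lengths vary a lot.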

Of the three example candidate answer passages (1)-(3) about the distance to the moon above, passage (1) has the highest ratio, passage (2) has the second-highest, and passage (3) has the lowest.

This process determines whether the coverage ratio is less than a threshold value. That threshold value can be, for example, 0.3, 0.35 or 0.4, or some other fraction. In our “distance to the moon” example, each passage coverage ratio meets or exceeds the threshold value.

If the coverage ratio is less than a threshold value, then the process would select a first answer boost factor. The first answer boost factor might be proportional to the coverage ratio according to a first relation, or may be a fixed value, or may be a non-boosting value (e.g., 1.0).

But if the coverage ratio is not less than the threshold value, the process may select a second answer boost factor. The second answer boost factor may be proportional to the coverage ratio according to a second relation, or may be a fixed value, or may be a value greater than the non-boosting value (e.g., 1.1).
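Here is a minimal Python sketch of that threshold check, using the fixed-value variant. The 0.35 threshold and the 1.0 and 1.1 boost values are the examples mentioned above; the function name `coverage_boost` is my own, and the "proportional" variants are left out because the patent doesn't spell out those relations.

```python
def coverage_boost(coverage_ratio, threshold=0.35, low_boost=1.0, high_boost=1.1):
    """Pick an answer boost factor from the coverage ratio (fixed-value variant)."""
    if coverage_ratio < threshold:
        # First answer boost factor: here a non-boosting value.
        return low_boost
    # Second answer boost factor: here a value greater than the non-boosting value.
    return high_boost


for ratio in (0.2, 0.35, 0.8):
    print(ratio, coverage_boost(ratio))
```

Note that a ratio exactly at the threshold gets the second boost factor, since the test is "less than" the threshold.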

Scoring Based on Other Features

The context scoring process can also check for the presence of features in addition to those described above.

Three example features for contextually scoring an answer passage can be based on the additional features of the distinctive text, a preceding question, and a list format.

Distinctive text

Distinctive text is the text that may stand out because it is formatted differently than other text, like using bolding.

A Preceding Question

A preceding question is a question in the text that precedes the candidate answer passage.

The search engine may process various amounts of text to detect a question.

In some cases, only the passage from which the candidate answer passage is extracted is checked.

In other cases, a text window that can include header text and other text from other sections may be checked.

A boost score that is inversely proportional to the text distance from a question to the candidate answer passage is calculated, and the check is terminated at the occurrence of a first question.

That text distance may be measured in characters, words, or sentences, or by some other metric.
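As a rough illustration, the sketch below measures text distance in sentences, scans backwards from the passage, and stops at the first question it finds. The formula `1.0 + max_extra / (1 + distance)` is purely my assumption of one way to make a boost shrink as distance grows; the patent doesn't give a formula, and the function names and sample sentences are hypothetical.

```python
def question_distance(sentences, passage_start):
    """Sentences between the nearest preceding question and the passage.

    `passage_start` is the index of the passage's first sentence.
    Returns None when no question precedes the passage.
    """
    for i in range(passage_start - 1, -1, -1):
        if sentences[i].rstrip().endswith("?"):
            # Stop at the first question found while scanning backwards.
            return passage_start - 1 - i
    return None


def question_boost(distance, max_extra=0.5):
    """Boost that gets smaller as the question gets farther away (assumed formula)."""
    if distance is None:
        return 1.0  # no preceding question, no boost
    return 1.0 + max_extra / (1 + distance)


sentences = [
    "Some introductory sentence.",
    "Why is the distance changing?",
    "The moon travels in a slightly elliptical orbit.",
    "Thus, the distance varies.",
]

for start in (1, 2, 3):
    distance = question_distance(sentences, start)
    print(start, distance, question_boost(distance))
```

A passage that starts right after the question gets the full boost, one that starts a sentence later gets less, and one with no question before it gets none.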

If the question is anchor text for a section of text and there is intervening text, such as in the case of a navigation list, then the question is determined to only precede the text passage to which it links, not precede intervening text.

In the drawing above about the moon, there are two questions in the resource: “How long does it take for the Moon to orbit Earth?” and “Why is the distance changing?”

The first question–“How long does it take for the Moon to orbit Earth?”– precedes the first candidate answer passage by a text distance of zero sentences, and it precedes the second candidate answer passage by a text distance of five sentences.

And the second question–“Why is the distance changing?”– precedes the third candidate answer by zero sentences.

If a preceding question is detected, then the process selects a question boost factor.

This boost factor may be proportional to the text distance, whether the text is in a text passage subordinate to a header or whether the question is a header, and, if the question is in a header, whether the candidate answer passage is subordinate to the header.

Considering these factors, the third candidate answer passage receives the highest boost factor, the first candidate answer receives the second-highest boost factor, and the second candidate answer receives the smallest boost factor.

If a preceding question is not detected, or after the question boost factor is selected, the process then checks for the presence of a list.

The Presence of a List

A list is an indication of several steps, usually instructive or informative. The detection of a list may be subject to the query question being a step modal query.

A step modal query is a query where a list-based answer is likely to be a good answer. Examples of step modal queries are queries like:

  • [How to . . . ]
  • [How do I . . . ]
  • [How to install a door knob]
  • [How do I change a tire]
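A very simple way to picture step modal query detection is a prefix check on the query text. This Python sketch only covers the two patterns listed above; the regular expression and the function name `is_step_modal_query` are my own assumptions, and a real system would presumably rely on far more than a prefix match.

```python
import re

# Assumption: a query counts as step modal if it starts with "how to" or "how do I".
STEP_MODAL_PATTERN = re.compile(r"^\s*how\s+(to|do\s+i)\b", re.IGNORECASE)


def is_step_modal_query(query):
    """Return True when the query looks like it wants a list of steps."""
    return bool(STEP_MODAL_PATTERN.match(query))


for query in (
    "How to install a door knob",
    "how do I change a tire",
    "What is the distance to the moon",
):
    print(query, "->", is_step_modal_query(query))
```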

The context scoring process may detect lists formed with:

  • HTML tags
  • Microformats
  • Semantic meaning
  • Consecutive headings at the same level with the same or similar phrases (e.g., Step 1, Step 2; or First; Second; Third; etc.)
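The last of those signals, consecutive same-level headings with similar phrases, can be sketched in a few lines of Python. The pattern below only recognizes "Step N" and a handful of ordinal words, which is my own simplification; the function name `has_heading_list` and the sample headings are hypothetical.

```python
import re

# Assumption: "similar phrases" is reduced to "Step <number>" or a leading ordinal word.
STEP_HEADING = re.compile(
    r"^\s*(step\s+\d+|first|second|third|fourth|fifth)\b", re.IGNORECASE
)


def has_heading_list(headings, min_items=2):
    """Detect a run of consecutive same-level headings that look like steps.

    `headings` is a list of (level, text) pairs in document order.
    """
    run = 0
    run_level = None
    for level, text in headings:
        if STEP_HEADING.match(text):
            # Extend the run only if this heading is at the same level as the last one.
            run = run + 1 if level == run_level else 1
            run_level = level
            if run >= min_items:
                return True
        else:
            run, run_level = 0, None
    return False


steps = [(2, "Step 1: Remove the old knob"), (2, "Step 2: Fit the new latch")]
mixed = [(2, "Step 1: Remove the old knob"), (3, "Step 2: Fit the new latch")]
plain = [(2, "Overview"), (2, "Tools you will need")]

print(has_heading_list(steps), has_heading_list(mixed), has_heading_list(plain))
```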

The context scoring process may also score a list for quality.

It would look at things such as:

  • A list in the center of a page, which does not include multiple links to other pages (which are indicative of reference lists), and whose HREF link text does not occupy a large portion of the text of the list, will be of higher quality
  • A list at the side of a page, which does include multiple links to other pages, and/or has HREF link text that does occupy a large portion of the text of the list, will be of lower quality
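Those list quality signals could be combined in many ways. The sketch below simply gives one point for each favorable signal and averages them. The equal weighting, the cutoffs (`max_links`, `max_link_text_share`), and the names `ListFeatures` and `list_quality` are all assumptions of mine, since the patent names the signals but not how they are weighed.

```python
from dataclasses import dataclass


@dataclass
class ListFeatures:
    in_main_content: bool   # True for a list in the center of the page
    link_count: int         # number of links to other pages inside the list
    link_text_chars: int    # characters of the list that are HREF link text
    total_text_chars: int   # all characters in the list


def list_quality(features, max_links=1, max_link_text_share=0.5):
    """Score a list from 0.0 to 1.0 using three equally weighted signals (assumed)."""
    points = 0
    if features.in_main_content:
        points += 1
    if features.link_count <= max_links:
        points += 1
    if features.total_text_chars:
        share = features.link_text_chars / features.total_text_chars
    else:
        share = 1.0  # an empty list is treated as all link text
    if share <= max_link_text_share:
        points += 1
    return points / 3


how_to_steps = ListFeatures(True, 0, 0, 400)
sidebar_nav = ListFeatures(False, 12, 380, 400)

print(list_quality(how_to_steps), list_quality(sidebar_nav))
```

A set of how-to steps in the main content scores at the top of the range, while a sidebar navigation list made up almost entirely of links scores at the bottom.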

If a list is detected, then the process selects a list boost factor.

That list boost factor may be fixed or may be proportional to the quality score of the list.

If a list is not detected, or after the list boost factor is selected, the process ends.

In some implementations, the list boost factor may also be dependent on other feature scores.

If other features, such as coverage ratio, distinctive text, etc., have relatively high scores, then the list boost factor may be increased.

The patent tells us that this is because “the combination of these scores in the presence of a list is a strong signal of a high-quality answer passage.”

Adjustment of Featured Snippet Answers Scores

Answer scores for candidate answer passages are adjusted by scoring components based on heading vectors, passage coverage ratio, and other features described above.

The scoring process can select the largest boost value from those determined above or can select a combination of the boost values.

Once the answer scores are adjusted, the candidate answer passage with the highest adjusted answer score is selected as the featured snippet answer and is displayed to a searcher.
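Putting the last step together, here is a hedged Python sketch of adjusting scores and picking a winner. The scores and boost values are made-up numbers for illustration, and treating "a combination of the boost values" as their product is my assumption; the patent doesn't say how boosts are combined.

```python
from math import prod


def adjusted_score(answer_score, boosts, combine="max"):
    """Apply either the largest boost or the product of all boosts (assumed)."""
    if not boosts:
        return answer_score
    factor = max(boosts) if combine == "max" else prod(boosts)
    return answer_score * factor


def select_answer(candidates, combine="max"):
    """Return the candidate with the highest adjusted answer score."""
    return max(
        candidates,
        key=lambda c: adjusted_score(c["score"], c["boosts"], combine),
    )


# Hypothetical candidates: "score" is the original answer score,
# "boosts" holds the coverage, question, and list boost factors.
candidates = [
    {"id": "A", "score": 0.80, "boosts": [1.0, 1.1, 1.0]},
    {"id": "B", "score": 0.75, "boosts": [1.1, 1.5, 1.0]},
]

print(select_answer(candidates, combine="max")["id"])
print(select_answer(candidates, combine="product")["id"])
```

In this made-up case, candidate A starts with the higher answer score, but candidate B wins after the boosts are applied, which is the kind of reordering the patent describes.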

More to Come

I will be reviewing the first patent in this series of patents about candidate answer scores because it does have some additional elements to it that haven’t been covered in this post or in the post about query dependent/independent ranking signals for answer scores. If you have been paying attention to how Google has been answering queries that appear to be seeking answers, you have likely seen those improving in many cases. Some answers have been really bad though. It will be nice to have as complete an idea as we can of how Google decides what might be a good answer to a query, based on information available to them on the Web.

Added October 14, 2020 – I have written about another Google patent on Answer Scores, and it’s worth reading about all of the patents on this topic. The new post is at Weighted Answer Terms for Scoring Answer Passages, and is about the patent Weighted answer terms for scoring answer passages.

It is about identifying questions in resources, and answers for those questions, and describes using term weights as a way to score answer passages (along with the scoring approaches identified in the other related patents, including this one.)

Added October 15, 2020 – I have written a few other posts about answer passages that are worth reading if you are interested in how Google finds questions on pages and answers to those, and scores answer passages to determine which ones to show as featured snippets. I’ve linked to some of those in the body of this post, but here is another one of those posts:

Added October 22, 2020 – I have written up a description of how structured and unstructured data is selected for answer passages based on specific criteria in the patent on Scoring Answer Passages, in the post Selecting Candidate Answer Passages.



The post Adjusting Featured Snippet Answers by Context appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓