How would Google Answer Vague Questions in Queries?

July 18, 2019

“How Long is Harry Potter?” is asked in a diagram from a Google patent. The answer is unlikely to have anything to do with a dimension of the fictional character, but may have something to do with one of the best-selling books featuring Harry Potter as a main character.

When questions are asked as queries at Google, they sometimes aren’t phrased clearly, with enough precision to make an answer easy to provide. How do vague questions get answered?

Question answering seems to be a common topic in Google patents recently. I wrote about one not long ago in the post, How Google May Handle Question Answering when Facts are Missing.

So this post is also about question answering, but it concerns issues with the questions rather than the answers, and particularly vague questions.

Early in the description for a recently granted Google Patent, we see this line, which is the focus of the patent:

Some queries may indicate that the user is searching for a particular fact to answer a question reflected in the query.

I’ve written a few posts about Google working on answering questions, and it is good seeing more information about that topic being published in a new patent. As I have noted, this one focuses upon when questions asking for facts may be vague:

When a question-and-answer (Q&A) system receives a query, such as in the search context, the system must interpret the query, determine whether to respond, and if so, select one or more answers with which to respond. Not all queries may be received in the form of a question, and some queries might be vague or ambiguous.

The patent provides an example query for “Washington’s age.”

Washington’s Age could be referring to:

  • President George Washington
  • Actor Denzel Washington
  • The state of Washington
  • Washington D.C.

For the Q&A system to work correctly, it would have to decide which of these the searcher who typed that query into a search box was most likely interested in finding the age of. Trying that query, Google decided that I was interested in George Washington:

Answering vague questions

The problem that this patent is intended to resolve is captured in this line from the summary of the patent:

The techniques described in this paper describe systems and methods for determining whether to respond to a query with one or more factual answers, including how to rank multiple candidate topics and answers in a way that indicates the most likely interpretation(s) of a query.

How would Google potentially resolve this problem?

It would likely start by trying to identify one or more candidate topics from a query. It may try to generate, for each candidate topic, a candidate topic-answer pair that includes both the candidate topic and an answer to the query for the candidate topic.

It would obtain search results based on the query, one or more of which reference an annotated resource. An annotated resource is a resource that, based on automated evaluation of its content, is associated with an annotation identifying one or more likely topics for that resource.

For each candidate topic-answer pair, a score would be determined based on:

(i) The candidate topic appearing in the annotations of the resources referenced by one or more of the search results
(ii) The query answer appearing in annotations of the resources referenced by the search results, or in the resources referenced by the search results.

A decision would also be made on whether to respond to the query, with one or more answers from the candidate topic-answer pairs, based on the scores for each.
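The scoring step above can be sketched roughly in Python. All of the names, weights, and example data below are illustrative assumptions on my part, not anything specified in the patent:

```python
def score_pair(topic, answer, search_results):
    """Score a candidate topic-answer pair by counting how often the topic
    and the answer occur in the annotations (and bodies) of the resources
    referenced by the search results. The weights are made up for this sketch."""
    score = 0.0
    for result in search_results:
        # (i) the candidate topic appears in the resource's annotations
        if topic in result["annotations"]:
            score += 1.0
        # (ii) the answer appears in the annotations or in the resource text
        if answer in result["annotations"] or answer in result["text"]:
            score += 0.5
    return score

# Toy search results for the query [Washington's age]:
results = [
    {"annotations": {"George Washington"}, "text": "born in 1732, died at 67"},
    {"annotations": {"George Washington", "Denzel Washington"}, "text": "aged 67"},
    {"annotations": {"Washington (state)"}, "text": "statehood came in 1889"},
]

print(score_pair("George Washington", "67", results))
print(score_pair("Washington (state)", "130", results))
```

Here the George Washington pair scores highest because that topic and its answer co-occur in the annotated resources returned for the query, which is the intuition the patent describes.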

Topic-Answer Scores

The patent tells us about some optional features as well.

  1. The scores for the candidate topic-answer pairs would have to meet a predetermined threshold
  2. This process may decide to not respond to the query with any of the candidate topic answer pairs
  3. One or More of the highest-scoring topic-answer pairs might be shown
  4. A topic-answer might be selected from one of a number of interconnected nodes of a graph
  5. The Score for the topic-answer pair may also be based upon a respective query relevance score of the search results that include annotations in which the candidate topic occurs
  6. The score for the topic-answer pair may also be based upon a confidence measure associated with each of one or more annotations in which the candidate topic in a respective candidate topic-answer pair occurs, which could indicate the likelihood that the answer is correct for that question
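The decision side of those optional features, deciding whether to answer at all and which pairs to show, might look something like the following sketch. The threshold value and the cap on answers shown are assumed parameters for illustration:

```python
def choose_answers(scored_pairs, threshold=2.0, max_answers=1):
    """Decide whether to respond, and if so with which topic-answer pair(s).
    Pairs below the threshold are dropped; if none survive, the system
    declines to answer the vague query. threshold and max_answers are
    illustrative, not values from the patent."""
    viable = [p for p in scored_pairs if p["score"] >= threshold]
    if not viable:
        return []  # respond with no factual answer at all
    viable.sort(key=lambda p: p["score"], reverse=True)
    return viable[:max_answers]  # one or more of the highest-scoring pairs

pairs = [
    {"topic": "George Washington", "answer": "67", "score": 3.0},
    {"topic": "Denzel Washington", "answer": "64", "score": 1.5},
    {"topic": "Washington (state)", "answer": "130", "score": 0.5},
]
print(choose_answers(pairs))
```

Note that raising the threshold to something none of the pairs meet makes the function return an empty list, matching optional feature 2 above: the system may decide not to respond with any candidate topic-answer pair.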

Knowledge Graph Connection to Vague Questions?

Vague questions answered with a knowledge base

This question-answering system can include a knowledge repository which includes a number of topics, each of which includes attributes and associated values for those attributes.

It may use a mapping module to identify one or more candidate topics from the topics in the knowledge repository, which may be determined to relate to a possible subject of the query.

An answer generator may generate for each candidate topic, a candidate topic-answer pair that includes:

(i) the candidate topic, and
(ii) an answer to the query for the candidate topic, wherein the answer for each candidate topic is identified from information in the knowledge repository.

A search engine may return search results based on the query, which can reference an annotated resource: a resource that, based on an automated evaluation of its content, is associated with an annotation that identifies one or more likely topics associated with the resource.

A score may be generated for each candidate topic-answer pair based on:

(i) an occurrence of the candidate topic in the annotations of the resources referenced by one or more of the search results
(ii) an occurrence of the answer in annotations of the resources referenced by the one or more search results, or in the resources referenced by the one or more search results.

A front-end system at the one or more computing devices can determine whether to respond to the query with one or more answers from the candidate topic-answer pairs, based on the scores.
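The mapping module and answer generator described above can be sketched against a toy knowledge repository. The repository contents, the substring matching, and the attribute names here are all stand-ins I made up to show the shape of the idea:

```python
# A toy knowledge repository: topics with attribute/value pairs, standing in
# for nodes and edges in a knowledge graph. Contents are illustrative only.
KNOWLEDGE_REPO = {
    "George Washington": {"type": "person", "age at death": "67"},
    "Denzel Washington": {"type": "person", "age": "64"},
    "Washington (state)": {"type": "place", "age": "130 years since statehood"},
}

def candidate_topic_answer_pairs(query_subject, query_attribute):
    """Mapping module + answer generator in one: find every topic whose name
    matches the query subject, then read the requested attribute off it to
    form a candidate topic-answer pair."""
    pairs = []
    for topic, attributes in KNOWLEDGE_REPO.items():
        if query_subject.lower() in topic.lower():
            for attribute, value in attributes.items():
                if query_attribute in attribute:
                    pairs.append((topic, value))
    return pairs

# The vague query [Washington's age] yields one pair per plausible topic:
print(candidate_topic_answer_pairs("Washington", "age"))
```

Every "Washington" topic produces a candidate pair; it is the scoring step against annotated search results that then separates them.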

The additional features above for topic-answers appear to be repeated in this knowledge repository approach:

  1. The front end system can determine whether to respond to the query based on a comparison of one or more of the scores to a predetermined threshold
  2. Each of the topics in the knowledge repository can be represented by a node in a graph of interconnected nodes
  3. The returned search results can be associated with a respective query relevance score and the score can be determined by the scoring module for each candidate topic-answer pair based on the query relevance scores of one or more of the search results that reference an annotated resource in which the candidate topic occurs
  4. For one or more of the candidate topic-answer pairs, the score can be further based on a confidence measure associated with each of one or more annotations in which the candidate topic in a respective candidate topic-answer pair occurs, or each of one or more annotations in which the answer in a respective candidate topic-answer pair occurs

Advantages of this Vague Questions Approach

  1. Candidate responses to the query can be scored so that a Q&A system or method can determine whether to provide a response to the query.
  2. If the query is not asking a question or none of the candidate answers are sufficiently relevant to the query, then no response may be provided
  3. The techniques described herein can interpret a vague or ambiguous query and provide a response that is most likely to be relevant to what a user desired in submitting the query.

This patent about answering vague questions is:

Determining question and answer alternatives
Inventors: David Smith, Engin Cinar Sahin and George Andrei Mihaila
Assignee: Google Inc.
US Patent: 10,346,415
Granted: July 9, 2019
Filed: April 1, 2016

Abstract

A computer-implemented method can include identifying one or more candidate topics from a query. The method can generate, for each candidate topic, a candidate topic-answer pair that includes both the candidate topic and an answer to the query for the candidate topic. The method can obtain search results based on the query, wherein one or more of the search results references an annotated resource. For each candidate topic-answer pair, the method can determine a score for the candidate topic-answer pair for use in determining a response to the query, based on (i) an occurrence of the candidate topic in the annotations of the resources referenced by one or more of the search results, and (ii) an occurrence of the answer in annotations of the resources referenced by the one or more search results, or in the resources referenced by the one or more search results.

Vague Questions Takeaways

I am reminded of a 2005 Google Blog post called Just the Facts, Fast when this patent tells us that sometimes it is “most helpful to a user to respond directly with one or more facts that answer a question determined to be relevant to a query.”

The different factors that might be used to determine which answer to show, if an answer is shown, include a confidence level, which may be confidence that an answer to a question is correct. That reminds me of the association scores of attributes related to entities that I wrote about in Google Shows Us How It Uses Entity Extractions for Knowledge Graphs. That patent told us that those association scores for entity attributes might be generated over the corpus of web documents as Googlebot crawled pages extracting entity information, so those confidence levels might be built into the knowledge graph for attributes that may be topic-answers for a question answering query.

A webpage that is relevant for such a query, and that an answer might be taken from, may be used as an annotation for a displayed answer in search results.


Copyright © 2019 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.


How Google May Handle Question Answering when Facts are Missing

July 14, 2019

I wrote about a similar patent in the post, Google Extracts Facts from the Web to Provide Fact Answers.

This one introduces itself with the following statement, indicating a problem that Google may have with answering questions from the facts it may collect from the Web to fill its knowledge graph:

Embodiments relate to relational models of knowledge, such as a graph-based data store, can be used to provide answers to search queries. Such models describe real-world entities (people, places, things) as facts in the form of graph nodes and edges between the nodes. While such graphs may represent a significant amount of facts, even the largest graphs may be missing tens of millions of facts or may have incorrect facts. For example, relationships, edges or other attributes between two or more nodes can often be missing.

That is the problem that this new patent is intended to solve. The patent was filed in November of 2017. The earlier patent I linked to above was granted in June 2017. It does not anticipate missing or incorrect facts like this newer patent warns us about. The newer patent tells us about how they might be able to answer some questions without access to some facts.

It’s also reminding me of another patent that I recently wrote about on the Go Fish Digital Website. That post is titled, Question Answering Explaining Estimates of Missing Facts. Both the patent that post was about and this new patent include Gal Chechik, Yaniv Leviathan, Yoav Tzur, Eyal Segalis, as inventors (the other patent has a couple of additional inventors as well.)

The earlier question answering with estimates patent talks about how they might infer answers, and provide explanations with those answers. This one also tells us it might infer answers, but doesn’t include the explanations:

Facts and/or attributes missing from a relational model of knowledge often can be inferred based on other related facts (or elements of facts) in the graph. For example, a search system may learn that an individual’s grandfather is a male parent of a parent. Accordingly, the system can determine with high confidence that an individual’s grandfather, even though there is no grandfather edge between nodes, is most likely a parent of a parent (given that there is a parent edge between nodes) with an additional check the parent of the parent is male. While this example uses one piece of supporting evidence (called a feature), inferring an individual’s grandfather, functions estimating missing facts are often more complex and can be based on several, even hundreds, of such features. Once the facts and/or attributes missing from a relational model of knowledge can be inferred, queries based on the facts and/or attributes missing from a relational model of knowledge can be resolved.

The process described in this question answering patent describes how Google may go about coming up with an answer to a question. This patent was filed after the one that includes estimates of how answers were created, so it does not include that step:

In one example embodiment, a computer system includes at least one processor and a memory storing a data graph and instructions. The instructions, when executed by the at least one processor, cause the system to generate a template sentence based on a fact including a first node, a second node and a string, wherein the first node and the second node exist in the data graph and the string represents a fact that is absent from the data graph, search the internet for a document including the template sentence, and upon determining the internet includes the document with the template sentence, infer the fact by generating a series of connections between nodes and edges of the data graph that together with the first node and the second node are configured to represent the fact, the series of connections defining a path, in the data graph, from the first node to the second node.

This process isn’t described in too much detail, but the patent does provide an example, which may be helpful in understanding how it may work. Here is that example:

For example, a node may correspond to a fact describing a parent-child relationship. For example, baseball player Bob Boone is the son of baseball player Ray Boone and the father of baseball players Aaron Boone and Bret Boone. Accordingly, the data graph may include an entity as a node corresponding to Bob Boone, which may include an edge for a parent relationship directed to Ray Boone and two edges for child corresponding, respectively, to Aaron Boone and Bret Boone. The entity or node may also be associated with a fact or an attribute that includes an edge (e.g., occupation) between Bob Boone as a node and baseball as a node. Alternatively, the node Bob Boone may include an attribute as a property (e.g., occupation) set to baseball.

However, there may be no edge in the entity (or the graph as a whole) corresponding to a grandparent relationship. Therefore, the relationship between Ray Boone and Aaron Boone may not be shown in the graph. However, the relationship between Ray Boone and Aaron Boone may be inferred from the graph so long as the question answering system knows (i.e., has been instructed accordingly) that there is such an entity as a grandparent.

The inference may be based on the joint distribution of one or more features, which represent facts in the data graph that are related to the missing information. The system may also be used to store the inferences (e.g., as functions or algorithms) and the semantically structured sentence (e.g., X is the attribute of Y) used to generate the inference. It then uses these entities to map new strings that correspond to relationships between nodes. By that, the system may be configured to learn new edges between existing nodes in the data graph. In some implementations, the system can generate an inference and its algorithm from a very large data graph, e.g., one with millions of entities and even more edges. The algorithm (or function) can include a series of connections between nodes and edges of the data graph. Accordingly, the algorithm can represent an attribute as an edge in a fact. The algorithm (or function) can also include a check of a property of a node (e.g., a gender property is male). While the system in FIG. 1 is described as an Internet search system, other configurations and applications may be used. For example, the system may be used in any circumstance where estimates based on features of a joint distribution are generated.

The mentions of joint distributions in this patent are worth studying in more depth, as the relationships between properties of different entities may reveal information worth a system like the knowledge graph knowing about. The son of someone’s son is their grandson. If the knowledge graph doesn’t include that grandson property, then being able to make that connection means a question answering system can start answering questions such as whether Aaron Boone is Ray Boone’s grandson. Relations beyond who is related to whom within a family can use this approach to answer questions as well.
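The Boone example lends itself to a short sketch: a graph with only parent edges and a gender property, from which a grandfather edge is inferred on demand. The graph layout and function names are mine, chosen to mirror the patent’s “male parent of a parent” description:

```python
# A minimal graph with only "parent" edges and a gender property,
# as in the Boone family example. No grandfather edge exists anywhere.
graph = {
    "Bob Boone":   {"parent": ["Ray Boone"], "gender": "male"},
    "Aaron Boone": {"parent": ["Bob Boone"], "gender": "male"},
    "Bret Boone":  {"parent": ["Bob Boone"], "gender": "male"},
    "Ray Boone":   {"parent": [], "gender": "male"},
}

def infer_grandfathers(person):
    """A grandfather is a male parent of a parent. The 'grandfather' edge
    is never stored in the graph; it is inferred by walking two parent
    edges and checking the gender property at the far end."""
    grandfathers = []
    for parent in graph[person]["parent"]:
        for grandparent in graph[parent]["parent"]:
            if graph[grandparent]["gender"] == "male":
                grandfathers.append(grandparent)
    return grandfathers

print(infer_grandfathers("Aaron Boone"))  # -> ['Ray Boone']
```

This is the one-feature case the patent mentions; real inference functions would combine many such features rather than a single two-edge path.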

This patent that is aimed at helping fill in missing and incorrect facts for question answering systems is:

Semi structured question answering system
Inventors: Yaniv Leviathan, Eyal Segalis, Yoav Tzur, and Gal Chechik
Assignee: GOOGLE LLC
US Patent: 10,346,485
Granted: July 9, 2019
Filed: November 8, 2017

Abstract

In one example embodiment, a computer system includes at least one processor and a memory storing a data graph and instructions. The instructions, when executed by the at least one processor, cause the system to generate a template sentence based on a fact including a first node, a second node and a string, wherein the first node and the second node exist in the data graph and the string represents a fact that is absent from the data graph, search the internet for a document including the template sentence, and upon determining the internet includes the document with the template sentence, infer the fact by generating a series of connections between nodes and edges of the data graph that together with the first node and the second node are configured to represent the fact, the series of connections defining a path, in the data graph, from the first node to the second node.

Some posts I’ve written about patents involving question answering:

Last Update July 11, 2019.




How Google May Rank Some Results based on Categorical Quality

June 13, 2019

A New Patent on Categorical Quality

Some of the people who write patents for Google tend to stand out to me. One of those is Trystan Upstill. I noticed that he has published another one that looks worth reading. When I started following his patents, I read his doctoral thesis, Document ranking using web evidence, from the early days of his professional career, before he was listed as the inventor of a number of patents. I’ve written about a number of patents he has participated in creating because they often focus upon site quality, and I learn something from reading them and trying to understand them. Here are posts from his patents which I have written about previously:

I noticed his name on a new one granted at the end of May, and I’ve been working through it now, too.

The patent is titled, “Re-Ranking Resources Based on Categorical Quality.”

It starts off by telling us about the importance of searches based on categories, which reminded me of web directories which have started to disappear.

Back when there were more directories online, such as the Yahoo Directory or the Open Directory Project, those were often good places to begin searches because they showed you what they had in different categories. For instance, if you were interested in San Diego, you could find a category about San Diego, and browse through the sub-categories to see what was included in the broader category. There you could learn about Downtown, Old Town, North County, and other parts of San Diego.

A Categorical Quality Patent from Search Quality

This patent tells us that “a search system ranks the resources based on their relevance to the query and importance.” That is how most search engines rank documents that are returned on a search for a query. If you had a chance to read through the earlier patents I listed for Trystan Upstill, you won’t be surprised that he talks about the intent behind searches, such as informational and navigational intents.

The patent tells us that sometimes searchers perform searches aimed at providing them with broad information, and sometimes they have an idea that a particular site exists and are trying to find information from that site. The intent behind those types of searches, in the first case, is referred to as “informational,” and in the second case is known as “navigational.”

The patent tells us what results are like for informational and navigational searches. There are usually many relevant results for informational searches, and it is often the case that no one particular result receives the vast majority of selections by searchers. These could be broad searches for things such as [football] or [space travel]. When someone searches with a navigational query, they are likely looking for a specific page or resource, which tends to be the result that receives the most selections by searchers. For example, you may search for [espn] or [legoland].

The patent tells us that sometimes when you perform informational searches, there may be a lot of results that often are good ones, and it aims at a way of “re-ranking resources based on the quality of the resources.”

That is the problem that this patent aims at finding a way to solve.

It focuses upon understanding the categories behind a search, and instead of focusing upon relevance and authority as a primary way of ranking those results, it may rerank results based upon what it refers to as categorical quality.

Early on in the patent, it identifies this problem and then points out the advantages behind the patented categorical quality process.

Advantages of the Process involved in this patent

  1. By re-ranking search results for a proper subset of resources that satisfy a quality condition, the search system provides a set of search results that lists resources that belong to a category according to a quality ranking that differs from a search ranking of a received query.
  2. Because the search results are provided according to a ranking that is based, in part, on quality with respect to the category, the search results are more likely to satisfy a user’s informational need when the users issue a query that is categorical for the category.
  3. This also obviates the need for the user to issue several separate navigational queries or several informational queries, as the most popular resources with respect to the category tend to be boosted in the ranking during the re-ranking process.
  4. Furthermore, the re-ranking can be triggered only for certain queries for which there is a signal of a categorical interest, and not triggered when the query signals a non-categorical interest, such as a navigational interest, or where the query is an answer seeking query, etc. In these latter cases, there is a strong signal of the user’s informational need, and thus the re-ranking would likely be of little informational utility to the user.

This Categorical Quality patent is:

Re-Ranking Resources Based on Categorical Quality
Inventors: Trystan G. Upstill, Abhishek Das, Jeongwoo Ko, Neesha Subramaniam and Vishnu P. Natchu
US Patent Application: 20190155948
Published on: May 23, 2019
Filed: March 31, 2015

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, re-ranking resources for categorical queries. In one aspect, a method includes receiving queries, and for each received query: receiving data indicating resources identified by a search operation as being responsive to the query and ranked according to a first order, each resource having corresponding search score by which the resources are ranked in responsiveness to the query and determining whether a proper subset meets a quality condition based on a quality measure that is indicative of the quality of the resources in the proper subset and independent of search scores of the resources for received query. For each query for which the proper subset meets the quality condition, determining a quality score for each resource in the proper subset and re-ranking the resources in the proper subset according to their respective quality scores.

When a searcher doesn’t know very much about a category, it isn’t unusual for them to start off a search with a broader query. They may not know the category well, nor sites or resources that may provide the best answers to questions that they have, or meet the informational or situational needs that they have.

This patent aims at re-ranking results for broad category searches based upon their quality in the category which they are being searched within.

The patent tells us that “if they rerank search results based upon quality, the sites and resources they show will be the ones that best serve the categories searched for.”

So, what does it mean to rank results based upon categorical quality?

How is Category Quality Measured?

  1. Ranked according to responsiveness to received query
  2. A subset of the resources can also be selected, and a determination is made as to whether the proper subset meets a quality condition based on a quality measure that is indicative of the quality of the resources in the proper subset.
  3. A variety of quality conditions can be considered, including:
    1. traffic to each resource
    2. whether each resource is a navigational resource for a corresponding navigational query
    3. the authority of each resource relative to other resources
  4. The quality condition for the subset, for example, may be met when a threshold number of the resources in the proper subset meet a popularity condition. For example, the threshold number may be 70% of the number of resources in the proper subset. The popularity condition may be based on one or more criteria.
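The threshold check in point 4 is simple enough to sketch directly. The 70% figure comes from the patent’s own example; the popularity flag on each resource is a stand-in for whatever popularity criteria the system actually applies:

```python
def subset_meets_quality_condition(resources, threshold=0.7):
    """The subset meets the quality condition when at least `threshold`
    (70% in the patent's example) of its resources satisfy a popularity
    condition. The is_popular flag abstracts away how popularity is judged."""
    if not resources:
        return False
    popular = sum(1 for r in resources if r["is_popular"])
    return popular / len(resources) >= threshold

subset = [
    {"url": "a.example", "is_popular": True},
    {"url": "b.example", "is_popular": True},
    {"url": "c.example", "is_popular": True},
    {"url": "d.example", "is_popular": False},
]
print(subset_meets_quality_condition(subset))  # 3 of 4 = 75% -> True
```

If only half the subset were popular, the check would fail and re-ranking would not be triggered for the query.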

The Categorical Quality Patent tells us that “A resource satisfying the quality condition is a signal that the resource is a high-quality resource for the category to which the received query belongs.”

And it also tells us that “Various criteria can be used to determine if a resource satisfies a quality condition.”

Once resources have been determined to meet quality criteria, and have been given Categorical Quality Scores, they may be re-ranked based upon those scores.

These categorical quality scores appear to be based upon user behavior information about selections of pages in response to queries.

Click logs and query logs may be used to map queries submitted about web pages identified in search results and the actions taken by searchers in results to those pages.

Informational and Navigational Resources

This patent starts off by telling us about informational resources and navigational resources.

The click logs and query logs may be used to determine navigational scores to determine if a query could be determined to be a navigational query for a resource. If it is a navigational resource for a query, it may be given a score of 1. All other resources and websites may be given a score of 0.

They tell us that: “This type of scoring model is based on the premise that a query is only navigational for one resource, or for one website.”

An alternative is also presented, with a navigational score being a score with an upper bound and a lower bound, and that a query may have a “separate navigational score for each of multiple resources.”

Also, “an informational query may have a relatively flat score for many resources, indicating such resources are selected often for the query when identified by search results, and the score may gradually decrease to the lower bound for the remaining resources that are rarely selected for the query.”

Navigational queries may rank highly for one resource (or several resources belonging to one website) and very low scores for all other resources. This makes sense – if someone searches for ESPN, they are going to be satisfied with pages from the ESPN website, and not from other websites.
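Both scoring models described above, the binary one and the bounded one, can be sketched from a click log. The 80% dominance cutoff in the binary branch is my assumption; the patent only says a query is treated as navigational for a single resource or website in that model:

```python
def navigational_scores(click_log, binary=True):
    """Score each resource for a query from its share of recorded clicks.
    Binary model: only a resource with a dominant click share (an assumed
    80% cutoff here) scores 1; every other resource scores 0.
    Bounded model: each resource keeps its raw share, between 0 and 1."""
    total = sum(click_log.values())
    shares = {url: clicks / total for url, clicks in click_log.items()}
    if binary:
        return {url: 1 if share >= 0.8 else 0 for url, share in shares.items()}
    return shares

# Made-up click counts for the navigational query [espn]:
espn_clicks = {"espn.com": 900, "wikipedia.org/wiki/ESPN": 60, "news.example": 40}
print(navigational_scores(espn_clicks))
```

An informational query would instead produce a relatively flat set of shares across many resources, with no single resource reaching the dominance cutoff.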

Categorical Quality Resources

A resource that satisfied a quality condition is one that shows the resource is popular for the category to which the received query belongs. It is in a subset of resources that are likely to “satisfy a user’s informational need with respect to the category.”

Re-Ranking Resources for Categorical Queries

This process of reranking resources is done for each query received. A query can have one or more terms.

Resources are identified as being responsive to the query and are ranked according to a first order (given a search score). For example, the category quality ranking module receives data describing the output of a search of the index using the query. Resources are ranked in terms of how responsive they are to a query relative to the other resources identified. The categorical quality patent tells us that not all indexed resources are scored; only the top 1,000 scored resources may be included.

Only a fraction of those results may be checked to be re-ranked, such as a relatively small value, like the top 20 or 30 results.
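Re-ranking only the head of the result list can be sketched in a few lines. The top-20 cutoff mirrors the patent’s example; the quality-score lookup is supplied by the caller and is assumed here:

```python
def rerank_top(results, quality_score, top_n=20):
    """Re-rank only the top_n search results by their quality scores,
    leaving the tail of the original ranking untouched. quality_score
    is any callable mapping a result to its score."""
    head, tail = results[:top_n], results[top_n:]
    head = sorted(head, key=quality_score, reverse=True)
    return head + tail

results = ["a.example", "b.example", "c.example", "d.example"]
quality = {"a.example": 0.2, "b.example": 0.9, "c.example": 0.5, "d.example": 0.1}
print(rerank_top(results, quality.get, top_n=3))
```

Only the first three results are reordered by categorical quality; "d.example" stays in place regardless of its score, since it falls outside the re-ranked fraction.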

A query can belong to more than one category type, and the number of results to be reranked will be the same for all of those category types.

The patent tells us that multiple categorization techniques may be used, such as “query clustering, vertical categorization based on selections of search results responsive to the query, and so on.”

Quality Conditions for a Resource Set

The quality of resources may be based upon things such as:

  • The authority of the resource relative to other resources
  • The traffic for each resource
  • The relevance of the resource to other queries that are different from the received query
  • Other factors that can be used to determine a quality measure of the set of resources

These quality signals can be said to be indicative of the ability of those resources that are being reranked to “satisfy a user’s informational need for a category to which a received query belongs.”

The patent provides an example of determining whether a set of resources meets a quality condition for a category; it uses four features, which may be different in other determinations. The four features in this example are:

  1. The quality of the resource as measured by navigational queries (if any)
  2. The topicality of the resource to the received query
  3. The performance of search results that reference the resource
  4. Whether the received query is itself a navigational query.

All of the resources in a subset of resources for a query will undergo this analysis.

For each of these features, each resource may be given a value, and those values may be combined into a total score. If the subset of resources meets a threshold value, it is determined to meet the quality condition for the category.

Additional features or fewer features may be used to determine the quality for a category.
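A minimal sketch of how feature values might be combined and checked against a threshold, assuming a simple unweighted sum and an invented threshold (the patent does not give a formula):

```python
# Illustrative only: combine per-feature values for a resource subset into a
# total score and compare it against a threshold. Feature names and the
# threshold value are assumptions, not from the patent.

def meets_quality_condition(features, threshold=2.0):
    """features: dict of feature name -> value for one resource subset."""
    total = sum(features.values())
    return total >= threshold

subset_features = {
    "navigational_quality": 0.8,  # quality measured by navigational queries
    "topicality": 0.7,            # topicality of the resource to the query
    "result_performance": 0.6,    # performance of results referencing the resource
    "query_is_navigational": 0.0, # whether the received query is navigational
}
print(meets_quality_condition(subset_features))  # True, since 2.1 >= 2.0
```

As the patent notes, additional or fewer features could feed into the same kind of check.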

The patent goes on to explain the value of different types of quality features.

Topicality Scores

A score might be determined that measures how topical the resource is for the query. This could be done in a few different ways, such as:

  • The similarity of query terms to terms in the resources can be determined, and the more similar the terms of the query to the terms of the resource, the higher the topicality score.
  • The performance of search results that reference the resource when provided in response to the query can be determined. The higher the performance (e.g., selection rate), the higher the topicality score.

Other topicality scoring processes could also be used. The higher the topicality score, the more likely the resource is to meet that quality condition.
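One plausible way to compute a topicality score along these lines blends term overlap (Jaccard similarity, purely as a stand-in) with a selection rate; the patent does not specify a formula, so everything here is an assumption:

```python
# Illustrative topicality score: blend query/resource term similarity with
# the observed selection rate of results referencing the resource.

def topicality_score(query, resource_text, selection_rate=0.0, blend=0.5):
    q_terms = set(query.lower().split())
    r_terms = set(resource_text.lower().split())
    union = q_terms | r_terms
    # Jaccard overlap: more shared terms -> higher similarity.
    similarity = len(q_terms & r_terms) / len(union) if union else 0.0
    # Blend term similarity with result performance (e.g., selection rate).
    return blend * similarity + (1 - blend) * selection_rate

print(round(topicality_score("harry potter", "harry potter books", selection_rate=0.4), 3))
# 0.533
```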

A Score Based on the Performance of Search Results That Reference the Resource

The category quality ranking feature may determine a score based on an aggregation of selections of search results for the resource across all queries. The better a resource performs, the more likely it is to meet that quality condition.

Navigational Scores

The category quality re-ranking process may look at whether resources fulfill any corresponding navigational queries. One or more navigational queries for a resource, or a number of queries with relatively high navigational scores, indicates that the resource is a popular one. This determination may be based on queries that are different from the received query, since a high navigational score for the received query itself may preclude or otherwise reduce the likelihood of re-ranking the proper subset of the resources.
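A navigational score might be sketched as the share of clicks for a query that land on a single resource, with a score near 1.0 suggesting the query is navigational for that resource. The click counts below are invented:

```python
# Illustrative navigational score: fraction of all clicks for a query that
# go to one particular resource.

def navigational_score(clicks_by_resource, resource):
    total = sum(clicks_by_resource.values())
    return clicks_by_resource.get(resource, 0) / total if total else 0.0

clicks = {"example.com": 950, "other.com": 30, "another.com": 20}
print(navigational_score(clicks, "example.com"))  # 0.95
```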

Categorical Quality Scores Overall

I’ve written about some of the features that may be used to determine categorical quality scores for resources returned in response to a received query. The patent tells us that additional metrics could be used as well, such as:

  • aggregate visits to a resource
  • social network shares for a resource
  • traffic patterns

Additional Implementations

There is a large “Additional Implementations” section that includes more details on how categorical quality might be scored, or even ignored entirely, as in the following:

For queries that have a high locality intent, this reranking based on categories may be disabled, because “the locality intent is a signal that user has a specific informational need that should not be discounted. An example of a query with a high locality intent is [Videos in Mountain View, Calif.].”

If you want to learn all of the details behind this categorical quality approach, where there are potentially a lot of good results for a query and some of those may be re-ranked based upon quality scores, it is worth going through that section, and the patent overall.

This is a fairly complex patent, which includes things like click selections to determine quality scores that pages may be re-ranked upon. We’ve been told by Google spokespeople that pages in Google’s search results aren’t ranked by searchers’ clicks.

If Google is following processes like the ones described in this patent, those clicks aren’t directly being used to rank pages, but they do look like a way of understanding the quality of some resources that may be re-ranked in search results based upon categorical quality.


Copyright © 2019 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.
Plugin by Taragana

The post How Google May Rank Some Results based on Categorical Quality appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


Google Image Classification and Landmarks

May 20, 2019

Image Classification in the past

Back in 2008, I was writing about how a search engine might learn from photo databases like Flickr, and how people label images there, in a post called Community Tagging and Ranking in Images of Landmarks.

In another post that covers the Flickr image classification Landmark work, Faces and Landmarks: Two Steps Towards Smarter Image Searches, I mentioned part of what the Yahoo study uncovered:

Using automatically generated location data, and software that can cluster together similar images to learn about images again goes beyond just looking at the words associated with pictures to learn what they are about.

That approach uses metadata from images in an image collection, which is very different from what Google describes in the post How Google May Interpret Queries Based on Locations and Entities (Tested), where it might identify landmarks based upon knowledge of their actual locations.

More Recent Image Classification of Landmarks

I mention those earlier posts because I wanted to share what I had written about landmarks before pointing to more recent studies from Google about how they might recognize landmarks, published a year apart, with one being a follow-up to the other.

The first of these papers, Google-Landmarks: A New Dataset and Challenge for Landmark Recognition, starts out by telling us about a problem that needs solving:

Image classification technology has shown remarkable improvement over the past few years, exemplified in part by the Imagenet classification challenge, where error rates continue to drop substantially every year. In order to continue advancing the state of the art in computer vision, many researchers are now putting more focus on fine-grained and instance-level recognition problems – instead of recognizing general entities such as buildings, mountains and (of course) cats, many are designing machine learning algorithms capable of identifying the Eiffel Tower, Mount Fuji or Persian cats. However, a significant obstacle for research in this area has been the lack of large annotated datasets.

A year later, Google worked to improve the dataset used for image classification when identifying landmarks, updating the one they had created the year before, as they tell us in Announcing Google-Landmarks-v2: An Improved Dataset for Landmark Recognition & Retrieval. Part of the effort behind that work came from getting a lot of help, as described in the blog post announcing it:

A particular problem in preparing Google-Landmarks-v2 was the generation of instance labels for the landmarks represented since it is virtually impossible for annotators to recognize all of the hundreds of thousands of landmarks that could potentially be present in a given photo. Our solution to this problem was to crowdsource the landmark labeling through the efforts of a world-spanning community of hobby photographers, each familiar with the landmarks in their region.

Google Patent for Image Classification when Identifying Landmarks in Image Collections


Google was recently granted a patent that focuses on identifying popular landmarks in large digital image collections. Considering Google operates Google Photos, that makes a lot of sense. The landmark identification efforts at Flickr sound a little similar to this effort on Google’s part. The patent targets a specific problem, which it tells us is:

However, there is no known system that can automatically extract information such as the most popular tourist destinations from these large collections. As numerous new photographs are added to these digital image collections, it may not be feasible for users to manually label the photographs in a complete and consistent manner that will increase the usefulness of those digital image collections. What is needed, therefore, are systems and methods that can automatically identify and label popular landmarks in large digital image collections.

Some of it does sound similar to the Flickr efforts where it talks about working to populate and update “a database of images of landmarks including geo-clustering geo-tagged images according to geographic proximity to generate one or more geo-clusters, and visual-clustering the one or more geo-clusters according to image similarity to generate one or more visual clusters.”
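That two-stage pipeline (geo-clustering by proximity, then visual-clustering within each geo-cluster) might be sketched as follows. Both steps here are deliberately naive placeholders: a crude degree-based distance check and a stand-in visual-similarity key, neither of which comes from the patent:

```python
# Illustrative geo-clustering: greedily group geo-tagged images that lie
# within roughly max_km of an existing cluster's first image.

def geo_cluster(images, max_km=1.0):
    """images: list of (image_id, lat, lon) tuples."""
    clusters = []
    for img in images:
        for cluster in clusters:
            _, lat, lon = cluster[0]
            # Crude flat-earth check (1 degree ~ 111 km); fine at city scale.
            if abs(img[1] - lat) + abs(img[2] - lon) < max_km / 111.0:
                cluster.append(img)
                break
        else:
            clusters.append([img])
    return clusters

def visual_cluster(geo_cluster_imgs, visual_key):
    """Split one geo-cluster by a visual-similarity key, a stand-in for
    real visual feature matching."""
    groups = {}
    for img in geo_cluster_imgs:
        groups.setdefault(visual_key(img[0]), []).append(img)
    return list(groups.values())

images = [("e1", 48.858, 2.294), ("e2", 48.859, 2.295), ("l1", 51.500, -0.124)]
print(len(geo_cluster(images)))  # 2 geo-clusters: Eiffel Tower area vs. London
```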

How might this play into image classification and search involving landmarks?

The patent describes how it could fit into searches, with the following steps:

  • Enhancing user queries to retrieve images of landmarks, including the stages of receiving a user query
  • Identifying one or more trigger words in the user query
  • Selecting one or more corresponding tags from a landmark database corresponding to the one or more trigger words
  • Supplementing the user query with the one or more corresponding tags, generating a supplemented user query
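The four stages above can be sketched as a tiny query-enhancement step: spot trigger words, look up corresponding landmark tags, and append them to the query. The trigger-word-to-tag table is invented for illustration:

```python
# Hypothetical trigger-word -> landmark-tag table; a real landmark database
# would supply these mappings.
LANDMARK_TAGS = {
    "tower": ["Eiffel Tower", "Tokyo Tower"],
    "monument": ["Washington Monument"],
}

def supplement_query(query):
    """Return the query supplemented with tags for any trigger words found."""
    tags = []
    for word in query.lower().split():
        tags.extend(LANDMARK_TAGS.get(word, []))
    return query if not tags else query + " " + " ".join(tags)

print(supplement_query("tower at night"))
# tower at night Eiffel Tower Tokyo Tower
```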

The idea of trigger words appearing in queries is interesting.

The patent also tells us that it could involve a method of automatically tagging a new digital image, which would cover:

  • Comparing the new digital image to images in a landmark image database, wherein the landmark image database comprises visual clusters of images of one or more landmarks
  • Tagging the new digital image with at least one tag based on at least one of those visual clusters

The patent is:

Automatic discovery of popular landmarks
Inventors: Fernando A. Brucher, Ulrich Buddemeier, Hartwig Adam and Hartmut Neven
Assignee: Google LLC
US Patent: 10,289,643
Granted: May 14, 2019
Filed: October 3, 2016

Abstract

In one embodiment the present invention is a method for populating and updating a database of images of landmarks including geo-clustering geo-tagged images according to geographic proximity to generate one or more geo-clusters, and visual-clustering the one or more geo-clusters according to image similarity to generate one or more visual clusters. In another embodiment, the present invention is a system for identifying landmarks from digital images, including the following components: a database of geo-tagged images; a landmark database; a geo-clustering module; and a visual clustering module. In other embodiments, the present invention may be a method of enhancing user queries to retrieve images of landmarks or a method of automatically tagging a new digital image with text labels.

Even Smarter Image Classification of Landmarks

This system appears to be capable of finding very popular landmarks in photo collections across the web and storing those in a landmark database, where it might geo-cluster them. It’s interesting to think about this effort. If Google uses those landmark images in image search results, it may not stop image classification at that point.

I recently wrote about Google Image Search Labels Becoming More Semantic? where we were told in an updated Google Patent that images were being labeled based upon an ontology related to the topics of those images. A Google image search for a landmark like The Washington Monument shows a number of image classification labels at the top of the results that can be clicked on if you want to narrow down the results to specific aspects of those monuments.

So, image classification may include specific monuments, and then even more narrow classifications, like having the following labels applied to the Washington Monument:

  • Reflecting Pool
  • Lincoln Memorial
  • Washington DC
  • Elevator
  • Inside
  • Trump
  • Construction
  • Top
  • Baltimore
  • Earthquake
  • Building
  • Interior
  • Capstone
  • Original
  • Sunrise
  • Observation Deck
  • National Mall

So, Google may have smarter image classification when it comes to landmarks, but it is labeling them so that they are more meaningful, too.



The post Google Image Classification and Landmarks appeared first on SEO by the Sea ⚓.




Context Clusters in Search Query Suggestions

February 19, 2019


Context Clusters and Query Suggestions at Google

A new patent application from Google tells us about how the search engine may use context to find query suggestions before a searcher has completed typing in a full query. After seeing this patent, I’ve been thinking about previous patents I’ve seen from Google that have similarities.

It’s not the first time I’ve written about a Google Patent involving query suggestions. I’ve written about a couple of other patents that were very informative, in the past:

In both of those, the inclusion of entities in a query impacted the suggestions that were returned. This patent takes a slightly different approach, by also looking at context.

Context Clusters in Query Suggestions

We’ve been seeing the word “context” spring up in Google patents recently: context terms from knowledge bases appearing on pages that focus on the same query term with different meanings, and pages about specific people using a disambiguation approach. While those are recent, I did blog about a paper back in 2007 that talks about query context, with an author from Yahoo. The paper was Using Query Contexts in Information Retrieval. The abstract from the paper provides a good glimpse into what it covers:

User query is an element that specifies an information need, but it is not the only one. Studies in literature have found many contextual factors that strongly influence the interpretation of a query. Recent studies have tried to consider the user’s interests by creating a user profile. However, a single profile for a user may not be sufficient for a variety of queries of the user. In this study, we propose to use query-specific contexts instead of user-centric ones, including context around query and context within query. The former specifies the environment of a query such as the domain of interest, while the latter refers to context words within the query, which is particularly useful for the selection of relevant term relations. In this paper, both types of context are integrated in an IR model based on language modeling. Our experiments on several TREC collections show that each of the context factors brings significant improvements in retrieval effectiveness.

The Google patent doesn’t take a user-based approach either, but it does look at some user contexts and interests. It sounds like searchers might be offered a chance to select a context cluster before being shown query suggestions:

In some implementations, a set of queries (e.g., movie times, movie trailers) related to a particular topic (e.g., movies) may be grouped into context clusters. Given a context of a user device for a user, one or more context clusters may be presented to the user when the user is initiating a search operation, but prior to the user inputting one or more characters of the search query. For example, based on a user’s context (e.g., location, date and time, indicated user preferences and interests), when a user event occurs indicating the user is initiating a process of providing a search query (e.g., opening a web page associated with a search engine), one or more context clusters (e.g., “movies”) may be presented to the user for selection input prior to the user entering any query input. The user may select one of the context clusters that are presented and then a list of queries grouped into the context cluster may be presented as options for a query input selection.

I often look up the inventors of patents to get a sense of what else they may have written, and worked upon. I looked up Jakob D. Uszkoreit in LinkedIn, and his profile doesn’t surprise me. He tells us there of his experience at Google:

Previously I started and led a research team in Google Machine Intelligence, working on large-scale deep learning for natural language understanding, with applications in the Google Assistant and other products.

This passage reminded me of the search results shown to me by the Google Assistant, which are based upon interests that I have shared with Google over time, and that Google allows me to update from time to time. If the inventor of this patent worked on Google Assistant, that doesn’t surprise me. I haven’t been offered context clusters yet (and wouldn’t know what they might look like if Google did offer them; I suspect that if Google does start offering them, I will recognize them when they appear).

Like many patents do, this one tells us what is “innovative” about it. It looks at:

…query data indicating query inputs received from user devices of a plurality of users, the query data also indicating an input context that describes, for each query input, an input context of the query input that is different from content described by the query input; grouping, by the data processing apparatus, the query inputs into context clusters based, in part, on the input context for each of the query inputs and the content described by each query input; determining, by the data processing apparatus, for each of the context clusters, a context cluster probability based on respective probabilities of entry of the query inputs that belong to the context cluster, the context cluster probability being indicative of a probability that at least one query input that belongs to the context cluster and provided for an input context of the context cluster will be selected by the user; and storing, in a data storage system accessible by the data processing apparatus, data describing the context clusters and the context cluster probabilities.

It also tells us that it will calculate probabilities that certain context clusters might be requested by a searcher. So how does Google know what to suggest as context clusters?

Each context cluster includes a group of one or more queries, the grouping being based on the input context (e.g., location, date and time, indicated user preferences and interests) for each of the query inputs, when the query input was provided, and the content described by each query input. One or more context clusters may be presented to the user for input selection based on a context cluster probability, which is based on the context of the user device and respective probabilities of entry of the query inputs that belong to the context cluster. The context cluster probability is indicative of a probability that at least one query input that belongs to the context cluster will be selected by the user. Upon selection of one of the context clusters that is presented to the user, a list of queries grouped into the context cluster may be presented as options for a query input selection. This advantageously results in individual query suggestions for query inputs that belong to the context cluster but that alone would not otherwise be provided due to their respectively low individual selection probabilities. Accordingly, users’ informational needs are more likely to be satisfied.
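The cluster probability described in that passage might be sketched as the chance that at least one member query is selected, given per-query selection probabilities. The independence assumption and the query probabilities below are mine, not the patent’s:

```python
# Illustrative context cluster probability: P(at least one member query is
# selected), assuming each query's selection is independent.

def cluster_probability(query_probs):
    p_none = 1.0
    for p in query_probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

# Hypothetical "Movies" cluster with invented per-query probabilities.
movies = {"movie times": 0.20, "movie trailers": 0.10}
print(round(cluster_probability(movies.values()), 3))  # 0.28
```

This also shows the advantage the patent claims: the cluster probability (0.28) exceeds any individual query’s probability, so queries too unlikely to surface on their own can still be offered through their cluster.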

The patent application is:

(US20190050450) Query Composition System
Publication Number: 20190050450
Publication Date: February 14, 2019
Applicants: Google LLC
Inventors: Jakob D. Uszkoreit
Abstract:

Methods, systems, and apparatus for generating data describing context clusters and context cluster probabilities, wherein each context cluster includes query inputs based on the input context for each of the query inputs and the content described by each query input, and each context cluster probability indicates a probability that at a query input that belongs to the context cluster will be selected by the user, receiving, from a user device, an indication of a user event that includes data indicating a context of the user device, selecting as a selected context cluster, based on the context cluster probabilities for each of the context clusters and the context of the user device, a context cluster for selection input by the user device, and providing, to the user device, data that causes the user device to display a context cluster selection input that indicates the selected context cluster for user selection.

What are Context Clusters as Query Suggestions?

The patent tells us that context clusters might be triggered when someone is starting a query on a web browser. I tried it out, starting a search for “movies” and got a number of suggestions that were combinations of queries, or what seem to be context clusters:

The patent says that context clusters would appear before someone began typing, based upon topics and user information such as location. So, if I were at a shopping mall that had a movie theatre, I might see Search suggestions for movies like the ones shown here:

Context Clusters

One of those clusters involved “Movies about Business”, which I selected, and it showed me a carousel, and buttons with subcategories to also choose from. This seems to be a context cluster:

Movies about Business

This seems to be a pretty new idea, and may be something that Google would announce as an available option if and when it launches, much like they did with the Google Assistant. I usually check through the news from my Google Assistant at least once a day. If it starts offering search suggestions based upon things like my location, that could be very interesting.

User Query Histories

The patent tells us that context clusters selected to be shown to a searcher might be based upon previous queries from a searcher, and provides the following example:

Further, a user query history may be provided by the user device (or stored in the log data) that includes queries and contexts previously provided by the user, and this information may also factor into the probability that a user may provide a particular query or a query within a particular context cluster. For example, if the user that initiates the user event provides a query for “movie show times” many Friday afternoons between 4 PM-6 PM, then when the user initiates the user event on a Friday afternoon in the future between these times, the probability associated with the user inputting “movie show times” may be boosted for that user. Consequentially, based on this example, the corresponding context cluster probability of the context cluster to which the query belongs may likewise be boosted with respect to that user.
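The Friday-afternoon example might be sketched as a simple conditional boost: if the current context matches a context in which the user has repeatedly issued a query, that query’s probability is multiplied up. The boost factor, match threshold, and history format are all assumptions:

```python
# Illustrative history-based boost for a query's selection probability.

def boosted_probability(base_prob, history, current_context, query,
                        boost=1.5, min_matches=3, cap=1.0):
    """history: list of (query, context) pairs previously seen for the user."""
    matches = sum(1 for q, ctx in history if q == query and ctx == current_context)
    # Boost only when the user has issued this query in this context often.
    return min(base_prob * boost, cap) if matches >= min_matches else base_prob

history = [("movie show times", "Fri 4-6PM")] * 4
print(round(boosted_probability(0.2, history, "Fri 4-6PM", "movie show times"), 2))
# 0.3
```

As the patent notes, boosting a member query would in turn lift the probability of the context cluster it belongs to.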

It’s not easy to tell whether the examples I provided about movies above are related to this patent, or whether it is tied more closely to the search results that appear in Google Assistant. It’s worth reading through and thinking about potential experimental searches to see if they might influence the results you see. It is interesting that Google may attempt to anticipate what it suggests to us as query suggestions, after showing us search results based upon what it believes are our interests, drawn from searches we have performed or interests we have identified for Google Assistant.

The context cluster may be related to the location and time at which someone accesses the search engine. The patent provides an example of what a searcher might see:

In the current example, the user may be in the location of MegaPlex, which includes a department store, restaurants, and a movie theater. Additionally, the user context may indicate that the user event was initiated on a Friday evening at 6 PM. Upon the user initiating the user event, the search system and/or context cluster system may access the content cluster data 214 to determine whether one or more context clusters is to be provided to the user device as an input selection based at least in part on the context of the user. Based on the context of the user, the context cluster system and/or search system may determine, for each query in each context cluster, a probability that the user will provide that query and aggregate the probability for the context cluster to obtain a context cluster probability.

In the current example, there may be four queries grouped into the “Movies” cluster, four queries grouped into the “Restaurants” cluster, and three queries grouped into the “Dept. Store” cluster. Based on the analysis of the content cluster data, the context cluster system may determine that the aggregate probability of the queries in each of the “Movies” cluster, “Restaurant” cluster, and “Dept. Store” cluster have a high enough likelihood (e.g., meet a threshold probability) to be input by the user, based on the user context, that the context clusters are to be presented to the user for selection input in the search engine web site.

I could see running such a search at a shopping mall, to learn more about the location I was at, and what I could find there, from dining places to movies being shown. That sounds like it could be the start of an interesting adventure.



The post Context Clusters in Search Query Suggestions appeared first on SEO by the Sea ⚓.




Universal Search Updated at Google

January 5, 2019


Sura gave up on her debugging for the moment. “The word for all this is ‘mature programming environment.’ Basically, when hardware performance has been pushed to its final limit, and programmers have had several centuries to code, you reach a point where there is far more significant code than can be rationalized. The best you can do is understand the overall layering, and know how to search for the oddball tool that may come in handy—take the situation I have here.” She waved at the dependency chart she had been working on. “We are low on working fluid for the coffins. Like a million other things, there was none for sale on dear old Canberra. Well, the obvious thing is to move the coffins near the aft hull, and cool by direct radiation. We don’t have the proper equipment to support this—so lately, I’ve been doing my share of archeology. It seems that five hundred years ago, a similar thing happened after an in-system war at Torma. They hacked together a temperature maintenance package that is precisely what we need.

“Almost precisely.”

~ Vernor Vinge, A Deepness in the Sky

In a science fiction novel set far in the future, Vernor Vinge writes about how people might engage in software archaeology. I understand the desire to do that, looking at some patents that give us hints about how technology is changing, and processes behind search engines do as well.

Google has just been granted a continuation patent for universal search. This post looks at how the patents covering universal search at Google have changed. It is not intended as a lesson on how patents work, but knowing something about how continuation patents work can provide some insight into the processes that people at Google are trying to protect when they update the universal search patent. Nor is this post intended as an analysis of patents; rather, it is a look at how search works, and how it has changed over the last dozen years or so.

A patent is pursued by a company to protect the process described within it. It isn’t unusual for the process protected by a patent to change in some way as it is implemented and put into use. When that happens, the company that was assigned the initial patent might file another one, referred to as a continuation patent, which claims the benefit of the filing date of the first version of the patent.

The continuation patents are usually very similar to the earlier versions, with the description sections often close to identical. The parts that change are the claims sections, which are what patent examiners review to decide whether the claimed invention is new, non-obvious, and useful, and should be granted.

So, in looking at updated patents covering a specific process, ideally it makes sense to look at how the claims have changed over time.

The Original Universal Search Patent Application

Before the patent was granted, I wrote about it in the post How Google Universal Search and Blended Results May Work, which was about the universal search patent application published in 2008. That application was granted in 2011, with claims updated from the original filing (sometimes processes in original applications have to be amended for a patent to be granted, and the claims may change to match).

The First Universal Search Patent

In the 2011 granted version of Interleaving Search Results, the first six claims to the patent give us a flavor for what the patent covers:

The invention claimed is:

1. A computer-implemented method, comprising: receiving a plurality of first search results in a first presentation format, the first search results received from a first search engine, the first search results identified for a search query directed to the first search engine, the first search results having an associated order indicative of respective first quality scores that are used to rank the first search results; receiving one or more second search results in a second presentation format different from the first presentation format, the second search results received from a second search engine, the second search results identified for the search query directed to the second search engine, wherein the first search engine searches a first corpus of first resources, wherein the second search engine searches a second corpus of second resources, and wherein the first search engine and the second search engines are distinct from each other; obtaining a respective first quality score for a plurality of the first search results, the respective first quality score determined in relation to the corpus of first resources and obtaining a respective second quality score for each of the one or more second search results, each respective second quality score determined in relation to the corpus of second resources; and inserting one or more of the second search results into the order including decreasing one or more of the respective first quality scores by reducing a contribution of a scoring feature unique to the first search results and distinct from scoring features of the second search results so that the inserted second search results occur within a number of top-ranked search results in the order.

2. The method of claim 1, wherein the plurality of first search results comprises an ordered list of search results, and wherein the plurality of first search results is a number of highest-quality search results provided by the first search engine that are identified as responsive to the search query.

3. The method of claim 1, further comprising: receiving a third search result, the third search result received from a third search engine, wherein the third search engine searches a corpus of third resources, and wherein the third search engine is distinct from the first search engine and the second search engine; and inserting the third search result into the order.

4. The method of claim 1, wherein: the first resources are generic web pages and the second resources are video resources.

5. The method of claim 1, wherein: the first resources are generic web pages and the second resources are news resources.

6. The method of claim 4, further comprising: receiving a third search result from the second search engine; and inserting the third search result at a position between two otherwise adjacent first search results in the order, the position not being adjacent to the inserted one or more second search results.
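Claim 1 above describes decreasing the first results' quality scores, by reducing the contribution of a scoring feature unique to the first search engine, until one or more inserted second results land within the top-ranked results. Here is a minimal sketch of that idea, not Google's implementation; the score values and the single "link score" feature standing in for the unique scoring feature are illustrative assumptions:

```python
def interleave(first, second, top_n=3, step=0.1):
    """first: list of (title, base_score, link_score); second: list of (title, score)."""
    second_titles = {title for title, _ in second}
    weight = 1.0
    while weight > 0:
        # first results score = base relevance + weighted unique feature
        scored = [(title, base + weight * link) for title, base, link in first]
        scored += list(second)
        ranked = [title for title, _ in sorted(scored, key=lambda pair: -pair[1])]
        if any(title in second_titles for title in ranked[:top_n]):
            return ranked   # a second result now sits among the top-ranked slots
        weight -= step      # reduce the unique feature's contribution and retry
    return ranked
```

With three web results scored 0.5 plus a link score, and one blended result scored 0.62, the link-score weight falls step by step until the blended result surfaces in the top three.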

The Second Universal Search Patent

We know that Google introduced Universal Search results at a Searchology presentation in 2007 (a few months before the patent was originally filed), and the patent has been updated since then, with a continuation patent titled Interleaving Search Results granted in 2015. That continuation has new claims, which introduce the concept of historical click data. Here are the first five claims from that version of the patent:

The invention claimed is:

1. A computer-implemented method comprising: receiving in a search engine system a query, the query comprising query text submitted by a user; searching a first collection of resources to obtain one or more first search results, wherein each of the one or more first search results has a respective first search result score; searching a second collection of web resources to obtain one or more second search results, wherein each of the one or more second search results has a respective second search result score, wherein the resources of the first collection of resources are different from the resources of the second collection of web resources; determining from historical user click data that resources from the first collection of resources are more likely to be selected by users than resources from other collections of data when presented by the search engine in a response to the query text; generating enhanced first search result scores for the first search results as a consequence of the determining, the enhanced first search result scores being greater than the respective first search result scores for the first search results; generating a presentation order of first search results and second search results in order of the enhanced first search result scores and the second search result scores; generating a presentation of highest-ranked first search results and second search results in the presentation order; and providing the presentation in a response to the query.

2. The method of claim 1, wherein the historical click data represents resource collections of search results selected by users after submitting the query.

3. The method of claim 1, wherein determining from historical user click data that resources from the first collection of resources are more likely to be selected by users than resources from other collections of data when presented by the search engine in a response to the query text comprises: obtaining one or more user characteristics of the user; and determining that users having the one or more user characteristics are more likely to select resources from the first collection of resources than resources from other collections of data.

4. The method of claim 1, wherein generating the presentation of highest-ranked first search results and second search results in the presentation order comprises generating the presentation so that at least one first search result occurs within a number of highest-ranked second search results.

5. The method of claim 1, wherein generating the presentation of highest-ranked first search results and second search results in the presentation order comprises: generating each of the second search results in a web search results presentation format; and; generating each of the first search results in a different presentation format

The Updated Universal Search Patent

The newest version of Interleaving Search Results is still a pending patent application at this point. It was published on January 2, 2019:

Publication Number: 3422216
Publication Date: 02.01.2019
Applicants: GOOGLE LLC
Inventors: Bailey David R, Effrat Jonathan J, Singhal Amit
(EN) Interleaving Search Results

Abstract:

(EN) A method comprising receiving a plurality of first search results that satisfy a search query directed to a first search engine, each of the plurality of first search results having a respective first score, receiving a second search result from a second search engine, the second search result having a second score, wherein the search query is not directed to the second search engine, wherein at least one of the first and second scores is based on characteristics of queries or results of queries learned from user click data; and determining from the second score whether to present the second search result, and if so, presenting the first search results in an order according to their respective scores, and presenting the second search result at a position relative to the order, the position being determined using the first scores and the second score

1. A method comprising:

receiving a plurality of first search results that satisfy a search query directed to a first search engine, each of the plurality of first search results having a respective first score;

receiving a second search result from a second search engine, the second search result having a second score, wherein the search query is not directed to the second search engine;
wherein at least one of the first and second scores is based on characteristics of queries or results of queries learned from user click data; and

determining from the second score whether to present the second search result, and if so:

presenting the first search results in an order according to their respective scores, and

presenting the second search result at a position relative to the order, the position being determined using the first scores and the second score.

2. The method of claim 1, wherein receiving a second search result from a second search engine comprises:

receiving a plurality of second search results, each second search result having a respective second score, each second search results from a respective second search engine, wherein the search query is not directed to the respective second search engines; and

determining from the respective second scores whether to present respective ones of the second search results.

3. The method of claim 1, wherein presenting the second search result at a position relative to the order comprises inserting the second search result at a position between two otherwise adjacent first search results in the order.

4. The method of any preceding claim, wherein the first and second search result scores are based on multiple distinct scoring features, the multiple distinct scoring features including at least one unique scoring feature of the first search engine distinct from the scoring features of the second search engine.

5. The method of any preceding claim, wherein the characteristics of queries or results of queries learned from user click data comprise a relationship between one of the first corpus of first resources and the second corpus of second resources and a particular search query.

Changes to Universal Search

If you look at them, you will see David Bailey’s name on those patents. He wrote a guest post at Search Engine Land about Universal Search that provides a lot of insight into how it works: An Insider’s View Of Google Universal Search. It’s worth reading through his analysis of Universal Search carefully before trying to compare the claims from one version of the patent to another.

The second version of the claims refers to historical click data, and the newest version changes that to “user click data,” but doesn’t provide any insight into why that change was made. We’ve heard spokespeople from Google tell us that they don’t use user click data to rank content, so this gets a little confusing if they are taken at their word.
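The 2015 claims describe generating “enhanced” first search result scores when historical click data shows users favor one collection for a query. A hedged sketch of that step, with the click rates, baseline, and boost formula all invented for illustration:

```python
def enhance_scores(results, click_rate_by_collection, baseline=0.25):
    """results: list of (title, collection, score) -> dict of title -> enhanced score."""
    enhanced = {}
    for title, collection, score in results:
        rate = click_rate_by_collection.get(collection, baseline)
        if rate > baseline:
            # users historically favor this collection for the query: enhance
            score *= 1.0 + (rate - baseline)
        enhanced[title] = score
    return enhanced
```

The blended presentation order would then be built from these enhanced scores together with the second search result scores, per the claim language.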

Another difference in the latest claims is the reference to multiple distinct scoring features: each type of search that is blended into the results has some unique scoring feature that sets it apart from the results already inserted onto the search results page. We do know that different types of search are ranked by different signals, such as freshness for news results and links for Web results. So results shown in universal search may all be relevant for a query, but each carries some unique feature, which adds diversity to what we see in SERPs.
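To make that concrete, here is a toy illustration (my own assumption, not a formula from the patent) of scoring functions that share a relevance feature but each carry one distinct feature, link data for web results and freshness for news results:

```python
def score_web(relevance, link_score):
    # link data: a scoring feature unique to the web corpus
    return 0.6 * relevance + 0.4 * link_score

def score_news(relevance, hours_old):
    # freshness: a scoring feature unique to the news corpus
    freshness = 1.0 / (1.0 + hours_old / 24.0)
    return 0.6 * relevance + 0.4 * freshness
```

Two results with identical relevance can end up ranked differently in the blend depending on their corpus-specific feature, which is one way to read the “distinct scoring features” language.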


Copyright © 2019 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.
Plugin by Taragana

The post Universal Search Updated at Google appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


How Google’s Knowledge Graph Updates Itself by Answering Questions

October 31, 2018 No Comments

How A Knowledge Graph Updates Itself


To those of us who are used to doing Search Engine Optimization, we’ve been looking at URLs filled with content, the links between that content, and how algorithms such as PageRank (based upon the links pointing between pages) and information retrieval scores (based upon the relevance of that content) have determined how well pages rank in search results in response to queries entered into search boxes by searchers. Web pages connected by links have been seen as nodes connected by edges. This was the first generation of SEO.

Search has been going through a transformation. Back in 2012, Google introduced what it refers to as the knowledge graph, telling us that it would begin focusing upon indexing things instead of strings. By “strings,” they meant the words that appear in queries and in documents on the Web. By “things,” they meant named entities, or real and specific people, places, and things. When people searched at Google, the search engine would show Search Engine Results Pages (SERPs) filled with URLs to pages that contained the strings of letters we were searching for. Google still does that, but it is slowly moving toward showing search results that are about people, places, and things.

Google started showing us in patents how they were introducing entity recognition to search, as I described in this post:
How Google May Perform Entity Recognition

They now show us knowledge panels in search results that tell us about the people, places, and things they recognize in the queries we perform. In addition to crawling webpages and indexing the words on those pages, Google is collecting facts about the people, places, and things it finds on those pages.

A Google Patent that was granted in the past week tells us how Google’s knowledge graph updates itself as it collects information about entities, their properties and attributes, and the relationships involving them. This is part of the evolution of SEO taking place today: learning how search is changing from being based upon strings to being based upon knowledge.

What does the patent tell us about knowledge? This is one of the sections that details what a knowledge graph is like that Google might collect information about when it indexes pages these days:

Knowledge graph portion includes information related to the entity [George Washington], represented by [George Washington] node. [George Washington] node is connected to [U.S. President] entity type node by [Is A] edge with the semantic content [Is A], such that the 3-tuple defined by nodes and the edge contains the information “George Washington is a U.S. President.” Similarly, “Thomas Jefferson Is A U.S. President” is represented by the tuple of [Thomas Jefferson] node 310, [Is A] edge, and [U.S. President] node. Knowledge graph portion includes entity type nodes [Person], and [U.S. President] node. The person type is defined in part by the connections from [Person] node. For example, the type [Person] is defined as having the property [Date Of Birth] by node and edge, and is defined as having the property [Gender] by node 334 and edge 336. These relationships define in part a schema associated with the entity type [Person].
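The passage above describes the knowledge graph as nodes joined by labeled edges, forming 3-tuples. A minimal sketch of storing those tuples and reading facts back out (the tuple list simply mirrors the quoted example):

```python
# (subject, predicate, object) triples, per the quoted knowledge graph portion
triples = [
    ("George Washington", "Is A", "U.S. President"),
    ("Thomas Jefferson",  "Is A", "U.S. President"),
    ("U.S. President",    "Is A", "Person"),
    ("Person", "Has Property", "Date Of Birth"),  # part of the Person schema
    ("Person", "Has Property", "Gender"),
]

def facts_about(subject):
    """Return every fact whose subject node matches, as a readable sentence."""
    return [f"{s} {p} {o}" for s, p, o in triples if s == subject]
```

Reading the [George Washington] node back out yields the sentence the patent gives: “George Washington Is A U.S. President.”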

Note that SEO is no longer just about how often certain words appear on pages of the Web, what words appear in links to those pages, in page titles, and headings, alt text for images, and how often certain words may be repeated or related words may be used. Google is looking at the facts that are mentioned about entities, such as entity types like a “person,” and properties, such as “Date of Birth,” or “Gender.”

Note that quote also mentions the word “Schema” as in “These relationships define in part a schema associated with the entity type [Person].” As part of the transformation of SEO from Strings to Things, The major Search Engines joined forces to offer us information on how to use Schema for structured data on the Web to provide a machine readable way of sharing information with search engines about the entities that we write about, their properties, and relationships.

I’m writing about this patent because I am participating in a Webinar online about Knowledge Graphs and how those are being used, and updated. The Webinar is tomorrow at:
#SEOisAEO: How Google Uses The Knowledge Graph in its AE algorithm. I haven’t been referring to SEO as Answer Engine Optimization, or AEO, and it’s unlikely that I will start, but I see it as an evolution of SEO.

I’m writing about this Google Patent, because it starts out with the following line which it titles “Background:”

This disclosure generally relates to updating information in a database. Data has previously been updated by, for example, user input.

This line points to the fact that this approach no longer needs updates from users; instead, it involves how Google’s knowledge graph updates itself.

Updating Knowledge Graphs

I attended a Semantic Technology and Business conference a couple of years ago, where the head of Yahoo’s knowledge base presented and took questions afterward. Someone asked him what happens when information in a knowledge graph changes and needs to be updated.

His answer was that a knowledge graph would have to be updated manually to have new information placed within it.

That wasn’t a satisfactory answer, because it would have been good to hear that the information from such a source could be updated easily. I’ve been waiting for Google to answer a question like this, which made seeing this line from the patent a good experience:

In some implementations, a system identifies information that is missing from a collection of data. The system generates a question to provide to a question answering service based on the missing information, and uses the response from the question answering service to update the collection of data.

This would be a knowledge graph update, and the patent provides details using language that reflects that exactly:

In some implementations, a computer-implemented method is provided. The method includes identifying an entity reference in a knowledge graph, wherein the entity reference corresponds to an entity type. The method further includes identifying a missing data element associated with the entity reference. The method further includes generating a query based at least in part on the missing data element and the type of the entity reference. The method further includes providing the query to a query processing engine. The method further includes receiving information from the query processing engine in response to the query. The method further includes updating the knowledge graph based at least in part on the received information.

How does the search engine do this? The patent provides more information that fills in such details.

The approaches to achieve this would be to:

…Identifying a missing data element comprises comparing properties associated with the entity reference to a schema table associated with the entity type.

…Generating the query comprises generating a natural language query. This can involve selecting, from the knowledge graph, disambiguation query terms associated with the entity reference, wherein the terms comprise property values associated with the entity reference, or updating the knowledge graph by updating the data graph to include information in place of the missing data element.

…Identifying an element in a knowledge graph to be updated based at least in part on a query record. Operations further include generating a query based at least in part on the identified element. Operations further include providing the query to a query processing engine. Operations further include receiving information from the query processing engine in response to the query. Operations further include updating the knowledge graph based at least in part on the received information.

A knowledge graph updates itself in these ways:

(1) The knowledge graph may be updated with one or more previously performed searches.
(2) The knowledge graph may be updated with a natural language query, using disambiguation query terms associated with the entity reference, wherein the terms comprise property values associated with the entity reference.
(3) The knowledge graph may use properties associated with the entity reference to include information updating missing data elements.
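The self-updating loop described above can be sketched roughly: compare an entity’s stored properties against its type’s schema, generate a natural-language question for each gap, and fold the question answering service’s answers back into the graph. The schema, entity, and question template here are my own illustrative assumptions, not the patent’s:

```python
# Hypothetical schema table: expected properties per entity type
schema = {"Person": ["Date Of Birth", "Gender", "Place Of Birth"]}

def missing_properties(entity):
    """Compare the entity's properties to the schema table for its type."""
    expected = schema[entity["type"]]
    return [p for p in expected if p not in entity["properties"]]

def generate_questions(entity):
    """Generate a natural language query for each missing data element."""
    return [f"What is the {prop.lower()} of {entity['name']}?"
            for prop in missing_properties(entity)]

def update_entity(entity, answers):
    """answers: property -> value, as returned by the question answering service."""
    entity["properties"].update(answers)
    return entity
```

For a [George Washington] entity of type [Person] with only a [Gender] property, this would produce questions such as “What is the date of birth of George Washington?” and fill the gaps from the answers.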

The patent that describes how Google’s knowledge graph updates itself is:

Question answering to populate knowledge base
Inventors: Rahul Gupta, Shaohua Sun, John Blitzer, Dekang Lin, Evgeniy Gabrilovich
Assignee: Google
US Patent: 10,108,700
Granted: October 23, 2018
Filed: March 15, 2013

Abstract

Methods and systems are provided for a question answering. In some implementations, a data element to be updated is identified in a knowledge graph and a query is generated based at least in part on the data element. The query is provided to a query processing engine. Information is received from the query processing engine in response to the query. The knowledge graph is updated based at least in part on the received information.







How Google might Identify Primary Versions of Duplicate Pages

October 15, 2018 No Comments

I came across this statement on the Web earlier this week, and wondered about it, and decided to investigate more:

If there are multiple instances of the same document on the web, the highest authority URL becomes the canonical version. The rest are considered duplicates.

~ Link inversion, the least known major ranking factor.


I read that article from Dejan SEO, and thought it was worth exploring more. As I was looking around at Google patents that included the word “Authority” in them, I found this patent which doesn’t quite say the same thing that Dejan does, but is interesting in that it finds ways to distinguish between duplicate pages on different domains based upon priority rules, which is interesting in determining which duplicate page might be the highest authority URL for a document.

The patent is:

Identifying a primary version of a document
Inventors: Alexandre A. Verstak and Anurag Acharya
Assignee: Google Inc.
US Patent: 9,779,072
Granted: October 3, 2017
Filed: July 31, 2013

Abstract

A system and method identifies a primary version out of different versions of the same document. The system selects a priority of authority for each document version based on a priority rule and information associated with the document version and selects a primary version based on the priority of authority and information associated with the document version.

Since the claims of a patent are what examiners at the USPTO look at when prosecuting a patent and deciding whether or not it should be granted, I thought it would be worth looking at the claims contained within the patent to see if they helped encapsulate what it covers. The first one captures some aspects worth thinking about: different versions of a particular document, and how the metadata associated with a document might be examined to determine which is the primary version:

What is claimed is:

1. A method comprising: identifying, by a computer system, a plurality of different document versions of a particular document; identifying, by the computer system, a first type of metadata that is associated with each document version of the plurality of different document versions, wherein the first type of metadata includes data that describes a source that provides each document version of the plurality of different document versions; identifying, by the computer system, a second type of metadata that is associated with each document version of the plurality of different document versions, wherein the second type of metadata describes a feature of each document version of the plurality of different document versions other than the source of the document version; for each document version of the plurality of different document versions, applying, by the computer system, a priority rule to the first type of metadata and the second type of metadata, to generate a priority value; selecting, by the computer system, a particular document version, of the plurality of different document versions, based on the priority values generated for each document version of the plurality of different document versions; and providing, by the computer system, the particular document version for presentation.

This doesn’t advance the claim that the primary version of a document is considered the canonical version of that document, with all links pointing to the duplicates redirected to the primary version.

There is another patent that shares an inventor with this one, which refers to one of the duplicate content URLs being chosen as a representative page, though it doesn’t use the word “canonical.” From that patent:

Duplicate documents, sharing the same content, are identified by a web crawler system. Upon receiving a newly crawled document, a set of previously crawled documents, if any, sharing the same content as the newly crawled document is identified. Information identifying the newly crawled document and the selected set of documents is merged into information identifying a new set of documents. Duplicate documents are included and excluded from the new set of documents based on a query-independent metric for each such document. A single representative document for the new set of documents is identified in accordance with a set of predefined conditions.

In some embodiments, a method for selecting a representative document from a set of duplicate documents includes: selecting a first document in a plurality of documents on the basis that the first document is associated with a query independent score, where each respective document in the plurality of documents has a fingerprint that identifies the content of the respective document, the fingerprint of each respective document in the plurality of documents indicating that each respective document in the plurality of documents has substantially identical content to every other document in the plurality of documents, and a first document in the plurality of documents is associated with the query-independent score. The method further includes indexing, in accordance with the query independent score, the first document thereby producing an indexed first document; and with respect to the plurality of documents, including only the indexed first document in a document index.
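The embodiment quoted above can be sketched simply: group documents by a content fingerprint, and from each duplicate set index only the document with the highest query-independent score. Using a hash as the fingerprint and the score values below are simplifying assumptions on my part:

```python
import hashlib
from collections import defaultdict

def fingerprint(content):
    """A stand-in content fingerprint; identical content -> identical fingerprint."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()

def pick_representatives(docs):
    """docs: list of (url, content, query_independent_score) -> set of URLs to index."""
    groups = defaultdict(list)
    for url, content, score in docs:
        groups[fingerprint(content)].append((score, url))
    # one representative per duplicate set: the highest-scoring URL
    return {max(group)[1] for group in groups.values()}
```

Only the representative from each duplicate set would then be included in the document index, per the patent’s language.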

This other patent is:

Representative document selection for a set of duplicate documents
Inventors: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat and Jeffrey A. Dean
Assignee: Google Inc.
US Patent: 8,868,559
Granted: October 21, 2014
Filed: August 30, 2012

Abstract

Systems and methods for indexing a representative document from a set of duplicate documents are disclosed. Disclosed systems and methods comprise selecting a first document in a plurality of documents on the basis that the first document is associated with a query independent score. Each respective document in the plurality of documents has a fingerprint that indicates that the respective document has substantially identical content to every other document in the plurality of documents. Disclosed systems and methods further comprise indexing, in accordance with the query independent score, the first document thereby producing an indexed first document. With respect to the plurality of documents, only the indexed first document is included in a document index.

Regardless of whether the primary version of a set of duplicate documents is treated as the representative document as suggested in this second patent (whatever that may mean exactly), I think it’s important to get a better understanding of what a primary version of a document might be.

The primary version patent provides some reasons why one of them might be considered a primary version:

(1) Including different versions of the same document does not provide additional useful information, and it does not benefit users.
(2) Search results that include different versions of the same document may crowd out diverse contents that should be included.
(3) Where there are multiple different versions of a document present in the search results, the user may not know which version is most authoritative, complete, or best to access, and thus may waste time accessing the different versions in order to compare them.

Those are the three reasons this duplicate document patent says it is ideal to identify a primary version from different versions of a document that appears on the Web. The search engine also wants to furnish “the most appropriate and reliable search result.”

How does it work?

The patent tells us that one method of identifying a primary version is as follows.

The different versions of a document are identified from a number of different sources, such as online databases, websites, and library data systems.

For each document version, a priority of authority is selected based on:

(1) The metadata information associated with the document version, such as

  • The source
  • Exclusive right to publish
  • Licensing right
  • Citation information
  • Keywords
  • Page rank
  • The like

(2) As a second step, the document versions are checked for length qualification using a length measure. The version with a high priority of authority and a qualified length is deemed the primary version of the document.

If none of the document versions has both a high priority and a qualified length, then the primary version is selected based on the totality of information associated with each document version.

The patent tells us that scholarly works tend to work under the process in this patent:

Because works of scholarly literature are subject to rigorous format requirements, documents such as journal articles, conference articles, academic papers and citation records of journal articles, conference articles, and academic papers have metadata information describing the content and source of the document. As a result, works of scholarly literature are good candidates for the identification subsystem.

Meta data that might be looked at during this process could include such things as:

  • Author names
  • Title
  • Publisher
  • Publication date
  • Publication location
  • Keywords
  • Page rank
  • Citation information
  • Article identifiers such as Digital Object Identifier, PubMed Identifier, SICI, ISBN, and the like
  • Network location (e.g., URL)
  • Reference count
  • Citation count
  • Language
  • So forth

The patent goes into more depth about the methodology behind determining the primary version of a document:

The priority rule generates a numeric value (e.g., a score) to reflect the authoritativeness, completeness, or best to access of a document version. In one example, the priority rule determines the priority of authority assigned to a document version by the source of the document version based on a source-priority list. The source-priority list comprises a list of sources, each source having a corresponding priority of authority. The priority of a source can be based on editorial selection, including consideration of extrinsic factors such as reputation of the source, size of source’s publication corpus, recency or frequency of updates, or any other factors. Each document version is thus associated with a priority of authority; this association can be maintained in a table, tree, or other data structures.

The patent includes a table illustrating the source-priority list.

The patent includes some alternative approaches as well. It tells us that “the priority measure for determining whether a document version has a qualified priority can be based on a qualified priority value.”

A qualified priority value is a threshold to determine whether a document version is authoritative, complete, or easy to access, depending on the priority rule. When the assigned priority of a document version is greater than or equal to the qualified priority value, the document is deemed to be authoritative, complete, or easy to access, depending on the priority rule. Alternatively, the qualified priority can be based on a relative measure, such as given the priorities of a set of document versions, only the highest priority is deemed as qualified priority.
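The two-step selection the patent describes might look something like the following sketch. The source-priority list, the qualified priority value, and the length threshold are invented for illustration; only the shape of the logic comes from the patent:

```python
# Hypothetical source-priority list and thresholds
SOURCE_PRIORITY = {"publisher": 3, "aggregator": 2, "citation-record": 1}
QUALIFIED_PRIORITY = 3
QUALIFIED_LENGTH = 1000   # e.g., a minimum character count

def primary_version(versions):
    """versions: list of dicts with 'source' and 'length' keys."""
    for v in versions:
        v["priority"] = SOURCE_PRIORITY.get(v["source"], 0)
    qualified = [v for v in versions
                 if v["priority"] >= QUALIFIED_PRIORITY and v["length"] >= QUALIFIED_LENGTH]
    if qualified:
        # a version with both a high priority of authority and a qualified length
        return max(qualified, key=lambda v: (v["priority"], v["length"]))
    # otherwise, fall back to the totality of information about each version
    return max(versions, key=lambda v: (v["priority"], v["length"]))
```

A publisher-hosted copy would win over a longer aggregator copy here, because only the publisher clears the qualified priority value.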

Takeaways

I was in a Google Hangout on Air within the last couple of years where a number of other SEOs (Ammon Johns, Eric Enge, and Jennifer Slegg) and I asked John Mueller and Andrey Lipattsev some questions about duplicate content. It seems to be something that still raises questions among SEOs.

The patent goes into more detail about determining which duplicate document might be the primary document. We can’t tell whether that primary document might be treated as the canonical URL for all of the duplicate documents, as suggested in the Dejan SEO article linked at the start of this post, but it is interesting to see that Google has a way of deciding which version of a document might be the primary version. I didn’t go into much depth about qualified lengths being used to help identify the primary document, but the patent does spend some time going over that.

Is this a little-known ranking factor? The Google patent on identifying a primary version of duplicate documents does seem to find some importance in identifying what it believes to be the most important version among many duplicate documents. I’m not sure if there is anything here that most site owners can use to help them have their pages rank higher in search results, but it’s good seeing that Google may have explored this topic in more depth.



The post How Google might Identify Primary Versions of Duplicate Pages appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


Quality Scores for Queries: Structured Data, Synthetic Queries and Augmentation Queries

July 31, 2018 No Comments

Augmentation Queries

In general, the subject matter of this specification relates to identifying or generating augmentation queries, storing the augmentation queries, and identifying stored augmentation queries for use in augmenting user searches. An augmentation query can be a query that performs well in locating desirable documents identified in the search results. The performance of the query can be determined by user interactions. For example, if many users that enter the same query often select one or more of the search results relevant to the query, that query may be designated an augmentation query.

In addition to actual queries submitted by users, augmentation queries can also include synthetic queries that are machine generated. For example, an augmentation query can be identified by mining a corpus of documents and identifying search terms for which popular documents are relevant. These popular documents can, for example, include documents that are often selected when presented as search results. Yet another way of identifying an augmentation query is mining structured data, e.g., business telephone listings, and identifying queries that include terms of the structured data, e.g., business names.

These augmentation queries can be stored in an augmentation query data store. When a user submits a search query to a search engine, the terms of the submitted query can be evaluated and matched to terms of the stored augmentation queries to select one or more similar augmentation queries. The selected augmentation queries, in turn, can be used by the search engine to augment the search operation, thereby obtaining better search results. For example, search results obtained by a similar augmentation query can be presented to the user along with the search results obtained by the user query.
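The matching step the patent describes, pairing a submitted query's terms against stored augmentation queries, could look something like the sketch below. The store, the term-overlap measure, and the cutoff are my assumptions for illustration, not Google's actual implementation.

```python
# Minimal sketch of selecting similar augmentation queries from a store
# by term overlap (Jaccard similarity). Threshold and scoring are assumed.

def jaccard(a, b):
    """Term-set overlap between two query strings, 0.0 to 1.0."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def select_augmentation_queries(user_query, augmentation_store, min_sim=0.5):
    """Return stored queries similar enough to the user query, best first."""
    scored = [(q, jaccard(user_query, q)) for q in augmentation_store]
    return [q for q, s in sorted(scored, key=lambda t: -t[1]) if s >= min_sim]

store = ["cheap flights to boston", "boston flights deals", "boston weather"]
print(select_augmentation_queries("flights to boston", store))
# → ['cheap flights to boston', 'boston flights deals']
```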

This past March, Google was granted a patent that involves giving quality scores to queries (the quote above is from that patent). The patent refers to high-scoring queries as augmentation queries. It is interesting to see that searcher selection is one signal that might be used to determine the quality of queries. So, when someone searches, Google may compare the SERPs they receive from the original query to augmentation query results based upon previous searches using the same query terms, or upon synthetic queries. This evaluation against augmentation queries is based upon which search results have received more clicks in the past. Google may decide to add results from an augmentation query to the results for the query searched for, to improve the overall search results.

How does Google find augmentation queries? One place to look for those is in query logs and click logs. As the patent tells us:

To obtain augmentation queries, the augmentation query subsystem can examine performance data indicative of user interactions to identify queries that perform well in locating desirable search results. For example, augmentation queries can be identified by mining query logs and click logs. Using the query logs, for example, the augmentation query subsystem can identify common user queries. The click logs can be used to identify which user queries perform best, as indicated by the number of clicks associated with each query. The augmentation query subsystem stores the augmentation queries mined from the query logs and/or the click logs in the augmentation query store.
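The mining step in that passage, identifying common queries from the query logs and well-performing ones from the click logs, can be sketched as follows. The log formats and the click cutoff are hypothetical.

```python
# Toy sketch of mining query and click logs: count how often each query
# was issued and how many clicks it attracted, then keep queries whose
# click counts suggest they perform well. Field names are assumptions.
from collections import Counter

def mine_augmentation_queries(query_log, click_log, min_clicks=2):
    """Return the set of queries qualifying as augmentation queries."""
    impressions = Counter(query_log)                  # query submissions
    clicks = Counter(e["query"] for e in click_log)   # clicks per query
    return {q for q in impressions if clicks[q] >= min_clicks}

query_log = ["boston flights", "boston flights", "nyc hotels"]
click_log = [{"query": "boston flights"}, {"query": "boston flights"}]
print(mine_augmentation_queries(query_log, click_log))
# → {'boston flights'}
```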

This doesn’t mean that Google is using clicks to directly determine rankings, but it is deciding which augmentation queries might be worth using to provide SERPs that people may be satisfied with.

There are other things that Google may look at to decide which augmentation queries to use in a set of search results. The patent points out some other factors that may be helpful:

In some implementations, a synonym score, an edit distance score, and/or a transformation cost score can be applied to each candidate augmentation query. Similarity scores can also be determined based on the similarity of search results of the candidate augmentation queries to the search query. In other implementations, the synonym scores, edit distance scores, and other types of similarity scores can be applied on a term by term basis for terms in search queries that are being compared. These scores can then be used to compute an overall similarity score between two queries. For example, the scores can be averaged; the scores can be added; or the scores can be weighted according to the word structure (nouns weighted more than adjectives, for example) and averaged. The candidate augmentation queries can then be ranked based upon relative similarity scores.
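The weighted, term-by-term combination described in that passage can be illustrated with a small sketch. The part-of-speech weights, the use of `difflib` as an edit-distance-style score, and the averaging scheme are all my assumptions; the patent only says such scores "can be weighted according to the word structure."

```python
# Sketch of combining per-term similarity scores into an overall query
# similarity, weighting nouns above adjectives as the patent suggests.
import difflib

POS_WEIGHT = {"noun": 2.0, "adjective": 1.0}

def term_similarity(t1, t2):
    """Edit-distance-style score via difflib's ratio (1.0 = identical)."""
    return difflib.SequenceMatcher(None, t1, t2).ratio()

def query_similarity(terms1, terms2, pos_tags):
    """Weighted average of per-term similarity scores between two queries."""
    total, weight_sum = 0.0, 0.0
    for (a, b), pos in zip(zip(terms1, terms2), pos_tags):
        w = POS_WEIGHT.get(pos, 1.0)
        total += w * term_similarity(a, b)
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0

# Identical nouns dominate a mismatched adjective.
print(query_similarity(["red", "shoes"], ["crimson", "shoes"],
                       ["adjective", "noun"]))
```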

I’ve seen white papers from Google before that mention synthetic queries, which are queries performed by the search engine instead of by human searchers. It makes sense for Google to explore query spaces in this manner, to see what results are like, and to use information such as structured data as a source of those synthetic queries. I’ve written about synthetic queries at least a couple of times before, including in the post Does Google Search Google? How Google May Create and Use Synthetic Queries.

Implicit Signals of Query Quality

It is an interesting patent in that it talks about things such as long clicks and short clicks, and evaluating queries on the basis of such things. The patent refers to these as “implicit signals of query quality.” More about that from the patent here:

In some implementations, implicit signals of query quality are used to determine if a query can be used as an augmentation query. An implicit signal is a signal based on user actions in response to the query. Example implicit signals can include click-through rates (CTR) related to different user queries, long click metrics, and/or click-through reversions, as recorded within the click logs. A click-through for a query can occur, for example, when a user of a user device, selects or “clicks” on a search result returned by a search engine. The CTR is obtained by dividing the number of users that clicked on a search result by the number of times the query was submitted. For example, if a query is input 100 times, and 80 persons click on a search result, then the CTR for that query is 80%.

A long click occurs when a user, after clicking on a search result, dwells on the landing page (i.e., the document to which the search result links) of the search result or clicks on additional links that are present on the landing page. A long click can be interpreted as a signal that the query identified information that the user deemed to be interesting, as the user either spent a certain amount of time on the landing page or found additional items of interest on the landing page.

A click-through reversion (also known as a “short click”) occurs when a user, after clicking on a search result and being provided the referenced document, quickly returns to the search results page from the referenced document. A click-through reversion can be interpreted as a signal that the query did not identify information that the user deemed to be interesting, as the user quickly returned to the search results page.

These example implicit signals can be aggregated for each query, such as by collecting statistics for multiple instances of use of the query in search operations, and can further be used to compute an overall performance score. For example, a query having a high CTR, many long clicks, and few click-through reversions would likely have a high-performance score; conversely, a query having a low CTR, few long clicks, and many click-through reversions would likely have a low-performance score.
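Those three implicit signals, CTR, long clicks, and click-through reversions, could be aggregated into an overall performance score along these lines. The linear form and the weights are my assumptions; the patent describes the inputs and the direction of their influence, but not a formula.

```python
# Toy performance score from the patent's three implicit signals. A high
# CTR and many long clicks raise the score; many reversions lower it.
# The linear combination itself is an assumption for illustration.

def performance_score(submissions, clicks, long_clicks, reversions):
    """Aggregate implicit signals into a single query-quality score."""
    ctr = clicks / submissions if submissions else 0.0
    long_rate = long_clicks / clicks if clicks else 0.0
    reversion_rate = reversions / clicks if clicks else 0.0
    return ctr + long_rate - reversion_rate

# The patent's own example: a query issued 100 times with 80 clicks
# has an 80% CTR.
assert performance_score(100, 80, 0, 0) == 0.8
```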

The reasons for the process behind the patent are explained in its description section, where we are told:

Often users provide queries that cause a search engine to return results that are not of interest to the users or do not fully satisfy the users’ need for information. Search engines may provide such results for a number of reasons, such as the query including terms having term weights that do not reflect the users’ interest (e.g., in the case when a word in a query that is deemed most important by the users is attributed less weight by the search engine than other words in the query); the queries being a poor expression of the information needed; or the queries including misspelled words or unconventional terminology.

A quality signal for a query can be defined in this way:

the quality signal being indicative of the performance of the first query in identifying information of interest to users for one or more instances of a first search operation in a search engine; determining whether the quality signal indicates that the first query exceeds a performance threshold; and storing the first query in an augmentation query data store if the quality signal indicates that the first query exceeds the performance threshold.
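The flow in that claim, compute the quality signal, compare it to a performance threshold, and store the query only if it qualifies, is simple enough to sketch directly. Function and store names are hypothetical.

```python
# Sketch of the claim's threshold check: a query is added to the
# augmentation query data store only if its quality signal exceeds
# the performance threshold. Names are hypothetical.

def maybe_store_augmentation_query(query, quality_signal, threshold, store):
    """Store the query if its quality signal exceeds the threshold."""
    if quality_signal > threshold:
        store.add(query)
        return True
    return False

store = set()
assert maybe_store_augmentation_query("boston flights", 0.9, 0.5, store)
assert "boston flights" in store
```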

The patent can be found at:

Query augmentation
Inventors: Anand Shukla, Mark Pearson, Krishna Bharat and Stefan Buettcher
Assignee: Google LLC
US Patent: 9,916,366
Granted: March 13, 2018
Filed: July 28, 2015

Abstract

Methods, systems, and apparatus, including computer program products, for generating or using augmentation queries. In one aspect, a first query stored in a query log is identified and a quality signal related to the performance of the first query is compared to a performance threshold. The first query is stored in an augmentation query data store if the quality signal indicates that the first query exceeds a performance threshold.

References Cited about Augmentation Queries

These are a number of references cited by the applicants of the patent that looked interesting, so I looked them up to read them and share them here.

  1. Boyan, J. et al., “A Machine Learning Architecture for Optimizing Web Search Engines,” School of Computer Science, Carnegie Mellon University, May 10, 1996, pp. 1-8.
  2. Brin, S. et al., “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Computer Science Department, 1998.
  3. Sahami, M. et al., “A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets,” in Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23-26, 2006), WWW ’06, ACM Press, New York, NY, pp. 377-386.
  4. Baeza-Yates, R.A. et al., “The Intention Behind Web Queries,” SPIRE 2006, pp. 98-109.
  5. Smith et al., “Leveraging the Structure of the Semantic Web to Enhance Information Retrieval for Proteomics,” Vol. 23, Oct. 7, 2007, 7 pages.
  6. Robertson, S.E., “On Term Selection for Query Expansion,” Journal of Documentation, 46(4), Dec. 1990, pp. 359-364.
  7. Abdessalem, T., Cautis, B., and Derouiche, N., “ObjectRunner: Lightweight, Targeted Extraction and Querying of Structured Web Data,” Proc. VLDB Endow. 3, 1-2, Sep. 2010.
  8. Hsu, J.Y. and Yih, W., “Template-based Information Mining from HTML Documents,” in Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI ’97/IAAI ’97), AAAI Press, 1997, pp. 256-262.
  9. Agarwal, G., Kabra, G., and Chang, K.C., “Towards Rich Query Interpretation: Walking Back and Forth for Mining Query Templates,” in Proceedings of the 19th International Conference on World Wide Web (WWW ’10), ACM, New York, NY, 2010, pp. 1-10, http://doi.acm.org/10.1145/1772690.1772692.

This is a Second Look at Augmentation Queries

This is a continuation patent, which means that it was granted before with the same description and now has new claims. When that happens, it can be worth looking at the old claims and the new claims to see how they have changed. I like that the new version seems to focus more strongly upon structured data: it tells us that structured data from sites that appear for queries might be used as synthetic queries, and if those meet the performance threshold, results from them may be added to the search results for the original queries. The claims do seem to focus a little more on structured data as synthetic queries, but beyond that they haven’t changed enough to be worth publishing side by side and comparing.

What Google Has Said about Structured Data and Rankings

Google spokespeople have been telling us that structured data doesn’t impact rankings directly, but what they have been saying seems to have changed somewhat recently. In the Search Engine Roundtable post, Google: Structured Data Doesn’t Give You A Ranking Boost But Can Help Rankings, we are told that just having structured data on a site doesn’t automatically boost the rankings of a page. But if the structured data for a page is used as a synthetic query, and it meets the performance threshold as an augmentation query, its results might be shown in rankings, thus helping rankings (as this patent tells us).

Note that this isn’t new; the continuation patent’s claims don’t appear to have changed much, so structured data is still being used for synthetic queries, which are checked to see if they work as augmentation queries. This does seem to be a really good reason to make sure you are using the appropriate structured data for your pages.



The post Quality Scores for Queries: Structured Data, Synthetic Queries and Augmentation Queries appeared first on SEO by the Sea ⚓.




Learning to Rank

July 17, 2018 No Comments

My last post was Five Years of Google Ranking Signals, and I started that post by saying that there are other posts about ranking signals that have some issues. But I don’t want to turn people away from looking at one recent post that did contain a lot of useful information.

Cyrus Shepard recently published a post about Google Success Factors on Zyppy.com, which I would recommend that you also check out.

Cyrus also did a video with Ross Hudgins of Siege Media, called Google Ranking Factors with Cyrus Shepard, where he talked about those ranking signals. I’m keeping this post short on purpose, to make that discussion about ranking the focus and the star of this post. There is some really good information in the video and in the post from Cyrus. He takes a different approach to writing about ranking signals from the one I took, but it’s worth the time visiting, listening, and watching.

And have fun learning to rank.



The post Learning to Rank appeared first on SEO by the Sea ⚓.

