
How Google Might Rank Image Search Results

July 9, 2020

Changes to How Google Might Rank Image Search Results

We are seeing more references to machine learning in how Google is ranking pages and other documents in search results.

That direction seems likely to leave behind what we know as traditional, old-school ranking signals.

It’s still worth considering some of those older ranking signals because they may play a role in how things are ranked.

As I was going through a new patent application from Google on ranking image search results, I decided that it was worth including what I used to look at when trying to rank images.

Images can rank highly in image search, and they can also help pages that they appear upon rank higher in organic web results, because they can help make a page more relevant for the query terms that page may be optimized for.

Here are signals that I would include when I rank image search results:

  • Use meaningful images that reflect what the page those images appear on is about – make them relevant to that query
  • Use a file name for your image that is relevant to what the image is about (I like to separate words in file names for images with hyphens, too)
  • Use alt text that describes the image well, is relevant to the query terms the page is optimized for, and avoids keyword stuffing
  • Use a caption that is helpful to viewers and relevant to what the page is about and the query term the page is optimized for
  • Use a title and associated text on the page the image appears upon that is relevant for what the page is about, and what the image shows
  • Use a decent sized image at a decent resolution that isn’t mistaken for a thumbnail
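Several of those checklist items can be audited mechanically. As a rough sketch (the heuristics and thresholds here are my own, not anything Google has published), a script like this flags images that miss the file-name and alt-text signals above:

```python
from html.parser import HTMLParser

class ImageAudit(HTMLParser):
    """Collects simple, heuristic findings about <img> tags."""
    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        filename = attrs.get("src", "").rsplit("/", 1)[-1]
        name = filename.rsplit(".", 1)[0]
        alt = attrs.get("alt", "")
        # Heuristic: descriptive file names use hyphen-separated words.
        if "-" not in name:
            self.findings.append(f"{filename}: file name is not hyphen-separated")
        # Heuristic: alt text should exist and describe the image briefly.
        if not alt:
            self.findings.append(f"{filename}: missing alt text")
        elif len(alt.split()) > 16:
            self.findings.append(f"{filename}: alt text may be keyword-stuffed")

auditor = ImageAudit()
auditor.feed('<img src="/img/IMG1234.jpg">'
             '<img src="/img/blue-heron.jpg" alt="A blue heron wading">')
print(auditor.findings)
```

The first tag trips both the file-name and alt-text checks; the second passes cleanly.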

Those are signals that I would consider when I rank image search results and include images on a page to help that page rank as well.

A patent application that was published this week tells us about how machine learning might be used in ranking image search results. It doesn’t itemize features that might help an image in those rankings, such as alt text, captions, or file names, but it does refer to “features” that likely include those as well as other signals. It makes sense to start looking at these patents that cover machine learning approaches to ranking because they may end up becoming more common.

Machine Learning Models to Rank Image Search Results

We are told that the system can use many different types of machine learning models, giving Google a chance to try out different approaches.

The machine learning model can be a:

  • A deep machine learning model (e.g., a neural network that includes multiple layers of non-linear operations)
  • A different type of machine learning model (e.g., a generalized linear model, a random forest, a decision tree model, and so on)

We are told more about this machine learning model. It is “used to accurately generate relevance scores for image-landing page pairs in the index database.”

We are told about an image search system, which includes a training engine.

The training engine trains the machine learning model on training data generated using image-landing page pairs that are already associated with ground truth or known values of the relevance score.

The patent shows an example of the machine learning model generating a relevance score for a particular image search result from an image, landing page, and query features. In this image, a searcher submits an image search query. The system generates image query features based on the user-submitted image search query.

Rank Image Search Results includes Image Query Features

That system also learns about landing page features for the landing page that has been identified by the particular image search result as well as image features for the image identified by that image search result.

The image search system would then provide the query features, the landing page features, and the image features as input to the machine learning model.

Google may rank image search results based on various factors

In conventional approaches, those are separate signals based on:

  1. Features of the image
  2. Features of the landing page

which are then combined following a fixed weighting scheme that is the same for each received search query.

This patent describes how it would rank image search results in this manner:

  1. Obtaining many candidate image search results for the image search query; each candidate image search result identifies a respective image and a respective landing page for that image
  2. For each of the candidate image search results, processing:
    • Features of the image search query
    • Features of the respective image identified by the candidate image search result
    • Features of the respective landing page identified by the candidate image search result
    using an image search result ranking machine learning model that has been trained to generate a relevance score that measures the relevance of the candidate image search result to the image search query
  3. Ranking the candidate image search results based on the relevance scores generated by the image search result ranking machine learning model
  4. Generating an image search results presentation that displays the candidate image search results ordered according to the ranking
  5. Providing the image search results for presentation by a user device
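The steps above can be sketched in a few lines of Python. The feature names are made up for illustration, and the simple weighted sum stands in for the trained machine learning model the patent describes:

```python
def relevance_score(query_feats, image_feats, page_feats, weights):
    # Stand-in for the trained model: in the patent this would be a
    # neural network or other learned model, not a hand-set dot product.
    feats = {**query_feats, **image_feats, **page_feats}
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

def rank_image_results(query_feats, candidates, weights):
    """candidates: list of (result_id, image_feats, page_feats) tuples,
    i.e., one entry per image-landing page pair."""
    scored = [
        (relevance_score(query_feats, img, page, weights), rid)
        for rid, img, page in candidates
    ]
    scored.sort(reverse=True)  # highest relevance score first
    return [rid for _, rid in scored]

candidates = [
    ("result-a", {"img_quality": 0.9}, {"page_topicality": 0.2}),
    ("result-b", {"img_quality": 0.5}, {"page_topicality": 0.9}),
]
weights = {"img_quality": 1.0, "page_topicality": 1.0}
print(rank_image_results({"query_len": 0.3}, candidates, weights))
```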

Advantages to Using a Machine Learning Model to Rank Image Search Results

If Google can rank image-landing page pairs based on relevance scores using a machine learning model, it can improve the relevance of the image search results returned in response to the image search query.

This differs from conventional methods of ranking resources because the machine learning model receives a single input that includes features of the image search query, the landing page, and the image identified by a given image search result to predict the relevance of that image search result to the received query.

This process allows the machine learning model to be more dynamic and give more weight to landing page features or image features in a query-specific manner, improving the quality of the image search results that are returned to the user.

By using a machine learning model, the image search engine does not apply the same fixed weighting scheme for landing page features and image features for each received query. Instead, it combines the landing page and image features in a query-dependent manner.
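A toy contrast between the two approaches, with a made-up "is_navigational" query feature standing in for whatever a learned model actually conditions on:

```python
def fixed_combine(page_score, image_score):
    # Conventional approach: one weighting scheme for every query.
    return 0.5 * page_score + 0.5 * image_score

def query_dependent_combine(page_score, image_score, query_feats):
    # Toy stand-in for what a learned model does implicitly: let the
    # query shift how much the landing page vs. the image matters.
    # "is_navigational" is an invented example feature.
    w_page = 0.8 if query_feats.get("is_navigational") else 0.3
    return w_page * page_score + (1 - w_page) * image_score

q = {"is_navigational": True}
print(fixed_combine(0.2, 0.9))               # identical weights for every query
print(query_dependent_combine(0.2, 0.9, q))  # weights shift with the query
```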

The patent also tells us that a trained machine learning model can easily and optimally adjust weights assigned to various features based on changes to the initial signal distribution or additional features.

In a conventional image search, we are told that significant engineering effort is required to adjust the weights of a traditional manually tuned model based on changes to the initial signal distribution.

But under this patented process, adjusting the weights of a trained machine learning model based on changes to the signal distribution is significantly easier, thus improving the ease of maintenance of the image search engine.

Also, if a new feature is added, manually tuned functions must adjust the function on the new feature independently against an objective (i.e., a loss function) while holding the existing feature functions constant.

But, a trained machine learning model can automatically adjust feature weights if a new feature is added.

Instead, the machine learning model can include the new feature and rebalance all its existing weights appropriately to optimize for the final objective.

Thus, the accuracy, efficiency, and maintenance of the image search engine can be improved.

The patent application on ranking image search results can be found at:

Ranking Image Search Results Using Machine Learning Models
US Patent Application Number: 16263398
Filed: January 31, 2019
Publication Number: US20200201915
Publication Date: June 25, 2020
Applicants: Google LLC
Inventors: Manas Ashok Pathak, Sundeep Tirumalareddy, Wenyuan Yin, Suddha Kalyan Basu, Shubhang Verma, Sushrut Karanjkar, and Thomas Richard Strohmann

Abstract

Methods, systems, and apparatus including computer programs encoded on a computer storage medium, for ranking image search results using machine learning models. In one aspect, a method includes receiving an image search query from a user device; obtaining a plurality of candidate image search results; for each of the candidate image search results: processing (i) features of the image search query and (ii) features of the respective image identified by the candidate image search result using an image search result ranking machine learning model to generate a relevance score that measures a relevance of the candidate image search result to the image search query; ranking the candidate image search results based on the relevance scores; generating an image search results presentation; and providing the image search results for presentation by a user device.

The Indexing Engine

The search engine may include an indexing engine and a ranking engine.

The indexing engine indexes image-landing page pairs, and adds the indexed image-landing page pairs to an index database.

That is, the index database includes data identifying images and, for each image, a corresponding landing page.

The index database also associates the image-landing page pairs with:

  • Features of the image search query
  • Features of the images, i.e., features that characterize the images
  • Features of the landing pages, i.e., features that characterize the landing page

Optionally, the index database also associates the indexed image-landing page pairs in the collections of image-landing pairs with values of image search engine ranking signals for the indexed image-landing page pairs.

Each image search engine ranking signal is used by the ranking engine in ranking the image-landing page pair in response to a received search query.

The ranking engine generates respective ranking scores for image-landing page pairs indexed in the index database based on the values of image search engine ranking signals for the image-landing page pair, e.g., signals accessed from the index database or computed at query time, and ranks the image-landing page pair based on the respective ranking scores. The ranking score for a given image-landing page pair reflects the relevance of the image-landing page pair to the received search query, the quality of the given image-landing page pair, or both.

The image search engine can use a machine learning model to rank image-landing page pairs in response to received search queries.

The machine learning model is configured to receive an input that includes:

(i) features of the image search query,
(ii) features of an image, and
(iii) features of the landing page of the image,

and to generate a relevance score that measures the relevance of the candidate image search result to the image search query.

Once the machine learning model generates the relevance score for the image-landing page pair, the ranking engine can then use the relevance score to generate ranking scores for the image-landing page pair in response to the received search query.

The Ranking Engine behind the Process to Rank Image Search Results

In some implementations, the ranking engine generates an initial ranking score for each of multiple image-landing page pairs using the signals in the index database.

The ranking engine can then select a certain number of the highest-scoring image-landing page pairs for processing by the machine learning model.

The ranking engine can then rank candidate image-landing page pairs based on relevance scores from the machine learning model or use those relevance scores as additional signals to adjust the initial ranking scores for the candidate image-landing page pairs.
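That two-stage arrangement (cheap initial scoring, then model-based re-scoring of the top candidates) might be sketched like this; the blend parameter and all of the scores are illustrative assumptions, not values from the patent:

```python
def two_stage_rank(pairs, initial_score, model_score, top_n=3, blend=0.5):
    """pairs: candidate ids. Stage 1 ranks with cheap index signals;
    stage 2 re-scores only the top candidates with the model, using
    the model's relevance score to adjust (not replace) the initial
    ranking score."""
    stage1 = sorted(pairs, key=initial_score, reverse=True)
    head, tail = stage1[:top_n], stage1[top_n:]

    def adjusted(pair):
        return (1 - blend) * initial_score(pair) + blend * model_score(pair)

    return sorted(head, key=adjusted, reverse=True) + tail

initial = {"a": 4.0, "b": 3.8, "c": 2.0, "d": 1.0}.get  # index-time signals
model = {"a": 0.0, "b": 1.0, "c": 0.5, "d": 0.0}.get    # model relevance scores
print(two_stage_rank(["a", "b", "c", "d"], initial, model, top_n=3))
```

With `blend=0.0` the initial order survives untouched; larger blends let the model reorder the head of the list.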

The machine learning model would receive a single input that includes features of the image search query, the landing page, and the image to predict the relevance (i.e., the relevance score) of the particular image search result to the user's image query.

We are told that this allows the machine learning model to give more weight to landing page features, image features, or image search query features in a query-specific manner, which can improve the quality of the image search results returned to the user.

Features That May Be Used from Images and Landing Pages to Rank Image Search Results

The first step is to receive the image search query.

Once that happens, the image search system may identify initial image-landing page pairs that satisfy the image search query.

It would do that from pairs indexed in a search engine index database, using signals that measure the quality of the pairs, the relevance of the pairs to the search query, or both.

For those pairs, the search system identifies:

  • Features of the image search query
  • Features of the image
  • Features of the landing page

Features Extracted From the Image

These features can include vectors that represent the content of the image.

Vectors to represent the image may be derived by processing the image through an embedding neural network.

Or those vectors may be generated through other image processing techniques for feature extraction. Examples of feature extraction techniques include edge, corner, ridge, and blob detection. Feature vectors can also include vectors generated using shape extraction techniques (e.g., thresholding, template matching, and so on). Instead of, or in addition to, the feature vectors, when the machine learning model is a neural network the features can include the pixel data of the image.
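As a minimal illustration of one of the named techniques, here is a hand-rolled gradient-based edge detector on a toy grayscale grid; a production system would use an embedding neural network or an image-processing library rather than anything like this:

```python
def edge_magnitudes(image):
    """image: 2D list of grayscale values. Returns a coarse feature
    vector of per-pixel horizontal + vertical gradient magnitudes."""
    h, w = len(image), len(image[0])
    feats = []
    for y in range(h):
        for x in range(w):
            # Forward differences, clamped at the image border.
            gx = image[y][min(x + 1, w - 1)] - image[y][x]
            gy = image[min(y + 1, h - 1)][x] - image[y][x]
            feats.append(abs(gx) + abs(gy))
    return feats

# A toy 3x3 image with a bright right column: the vertical edge
# shows up as large gradient magnitudes in the middle column.
img = [[0, 0, 9],
       [0, 0, 9],
       [0, 0, 9]]
print(edge_magnitudes(img))
```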

Features Extracted From the Landing Page

These aren't the kinds of features I have usually thought about when optimizing images. These features can include:

  • The date the page was first crawled or updated
  • Data characterizing the author of the landing page
  • The language of the landing page
  • Features of the domain that the landing page belongs to
  • Keywords representing the content of the landing page
  • Features of the links to the image and landing page such as the anchor text or source page for the links
  • Features that describe the context of the image in the landing page
  • And so on

Features Extracted From The Landing Page That Describe The Context of the Image in the Landing Page

The patent interestingly separated these features out:

  • Data characterizing the location of the image within the landing page
  • Prominence of the image on the landing page
  • Textual descriptions of the image on the landing page
  • Etc.

More Details on the Context of the Image on the Landing Page

The patent points out some alternative ways that the location of the image within the Landing Page might be found:

  • Using pixel-based geometric location in horizontal and vertical dimensions
  • User-device based length (e.g., in inches) in horizontal and vertical dimensions
  • An HTML/XML DOM-based XPATH-like identifier
  • A CSS-based selector
  • Etc.

The prominence of the image on the landing page can be measured using the relative size of the image as displayed on a generic device and a specific user device.

The textual descriptions of the image on the landing page can include alt-text labels for the image, text surrounding the image, and so on.

Features Extracted from the Image Search Query

The features from the image search query can include:

  • Language of the search query
  • Some or all of the terms in the search query
  • Time that the search query was submitted
  • Location from which the search query was submitted
  • Data characterizing the user device from which the query was received
  • And so on

How the Features from the Query, the Image, and the Landing Page Work Together

  • The features may be represented categorically or discretely
  • Additional relevant features can be created through pre-existing features (Relationships may be created between one or more features through a combination of addition, multiplication, or other mathematical operations.)
  • For each image-landing page pair, the system processes the features using an image search result ranking machine learning model to generate a relevance score output
  • The relevance score measures a relevance of the candidate image search result to the image search query (i.e., the relevance score of the candidate image search result measures a likelihood of a user submitting the search query clicking on or otherwise interacting with the search result; a higher relevance score indicates the user submitting the search query would find the candidate image search result more relevant and click on it)
  • The relevance score of the candidate image search result can be a prediction of a score generated by a human rater to measure the quality of the result for the image search query

Adjusting Initial Ranking Scores

The system may adjust initial ranking scores for the image search results based on the relevance scores to:

  • Promote search results having higher relevance scores
  • Demote search results having lower relevance scores
  • Or both

Training a Ranking Machine Learning Model to Rank Image Search Results

The system receives a set of training image search queries and, for each training image search query, training image search results that are each associated with a ground truth relevance score.

A ground truth relevance score is the relevance score that should be generated for the image search result by the machine learning model (i.e., when the relevance scores measure a likelihood that a user would select a search result in response to a given search query, each ground truth relevance score can identify whether a user submitting the given search query selected the image search result or a proportion of times that users submitting the given search query select the image search result.)
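When the ground truth is the proportion of times users selected a result, computing it from a click log is straightforward. This sketch assumes a simple (query, result, clicked) log format of my own invention:

```python
from collections import Counter

def ground_truth_scores(click_log):
    """click_log: iterable of (query, result_id, clicked) tuples.
    Ground truth = proportion of impressions where users clicked."""
    shown = Counter()
    clicked = Counter()
    for query, result_id, was_clicked in click_log:
        key = (query, result_id)
        shown[key] += 1
        clicked[key] += int(was_clicked)
    return {key: clicked[key] / shown[key] for key in shown}

log = [
    ("blue heron", "img-1", True),
    ("blue heron", "img-1", True),
    ("blue heron", "img-1", False),
    ("blue heron", "img-2", False),
]
print(ground_truth_scores(log))
```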

The patent provides another example of how ground-truth relevance scores might be generated:

When the relevance scores generated by the model are a prediction of a score assigned to an image search result by a human, the ground truth relevance scores are actual scores assigned to the search results by human raters.

For each of the training image search queries, the system may generate features for each associated image-landing page pair.

For each of those pairs, the system may identify:

(i) features of the image search query
(ii) features of the image and
(iii) features of the landing page.

We are told that extracting, generating, and selecting features may take place before training or using the machine learning model. Examples of features are the ones I listed above related to the images, landing pages, and queries.

The ranking engine trains the machine learning model by processing, for each image search query:

  • Features of the image search query
  • Features of the respective image identified by the candidate image search result
  • Features of the respective landing page identified by the candidate image search result
  • The respective ground truth relevance score that measures the relevance of the candidate image search result to the image search query

The patent provides some specific implementation processes that might differ based upon the machine learning system used.

Take Aways to Rank Image Search Results

I’ve provided some information about the kinds of features Google may have used in the past in ranking image search results.

Under a machine learning approach, Google may be paying more attention to features from an image query, features from Images, and features from the landing page those images are found upon. The patent lists many of those features, and if you spend time comparing the older features with the ones under the machine learning model approach, you can see there is overlap, but the machine learning approach covers considerably more options.


Copyright © 2020 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.

The post How Google Might Rank Image Search Results appeared first on SEO by the Sea ⚓.




A Well-Formed Query Helps Search Engines Understand User Intent in the Query

July 1, 2020

A Well-Formed Query Helps a Search Engine understand User Intent Behind the Query

To start this post, I wanted to include a couple of whitepapers that include authors from Google. The authors of the first paper are the inventors of a patent application that was just published on May 28, 2020, and it is very good to see a white paper from the inventors of a recent patent published by Google. Both papers are worth reading to get a sense of how Google is trying to rewrite queries into “Well-Formed Natural Language Questions.”

August 28, 2018 – Identifying Well-formed Natural Language Questions

The abstract for that paper:

Understanding search queries is a hard problem as it involves dealing with “word salad” text ubiquitously issued by users. However, if a query resembles a well-formed question, a natural language processing pipeline can perform more accurate interpretation, thus reducing downstream compounding errors.

Hence, identifying whether or not a query is well-formed can enhance query understanding. Here, we introduce a new task of identifying a well-formed natural language question. We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-wellformed categories and report an accuracy of 70.7% on the test set.

We also show that our classifier can be used to improve the performance of neural sequence-to-sequence models for generating questions for reading comprehension.

The paper provides examples of well-formed queries and ill-formed queries:

Examples of well-formed and non-well-formed queries

November 21, 2019 – How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions

The abstract for that paper:

We present a large-scale dataset for the task of rewriting an ill-formed natural language question to a well-formed one. Our multi-domain question rewriting (MQR) dataset is constructed from human contributed Stack Exchange question edit histories.

The dataset contains 427,719 question pairs which come from 303 domains. We provide human annotations for a subset of the dataset as a quality estimate. When moving from ill-formed to well-formed questions, the question quality improves by an average of 45 points across three aspects.

We train sequence-to-sequence neural models on the constructed dataset and obtain an improvement of 13.2% in BLEU-4 over baseline methods built from other data resources. We release the MQR dataset to encourage research on the problem of question rewriting.

examples of ill-formed and well-formed questions

The patent application I am writing about was filed on January 18, 2019, which puts it around halfway between those two whitepapers, and both of them are recommended reading to get a good sense of the topic if you are interested in featured snippets, People Also Ask questions, and queries that Google tries to respond to. The second whitepaper refers to the first one and tells us how it is trying to improve upon it:

Faruqui and Das (2018) introduced the task of identifying well-formed natural language questions. In this paper, we take a step further to investigate methods to rewrite ill-formed questions into well-formed ones without changing their semantics. We create a multi-domain question rewriting dataset (MQR) from human contributed StackExchange question edit histories.

Rewriting Ill-Formed Search Queries into Well-Formed Queries

Interestingly, the patent is also about rewriting search Queries.

It starts by telling us that “Rules-based rewrites of search queries have been utilized in query processing components of search systems.”

Sometimes this happens by removing certain stop-words from queries, such as “the”, “a”, etc.

After Rewriting a Query

Once a query is rewritten, it may be “submitted to the search system and search results returned that are responsive to the rewritten query.”

The patent also tells us about “people also search for X” queries (the first patent I have seen them mentioned in).

We are told that these similar queries are used to recommend additional queries that are related to a submitted query (e.g., “people also search for X”).

These “similar queries to a given query are often determined by navigational clustering.”

As an example, we are told that for the query “funny cat pictures”, a similar query of “funny cat pictures with captions” may be determined because that similar query is frequently submitted by searchers following submission of the query “funny cat pictures”.
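That kind of follow-up relationship can be mined from query sessions with a simple counter; the session format here is an assumption for illustration, not anything specified in the patent:

```python
from collections import Counter

def related_queries(sessions, query, top_k=1):
    """sessions: lists of queries in submission order. Counts which
    queries users submit immediately after the given query."""
    followers = Counter()
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            if prev == query:
                followers[nxt] += 1
    return [q for q, _ in followers.most_common(top_k)]

sessions = [
    ["funny cat pictures", "funny cat pictures with captions"],
    ["funny cat pictures", "funny cat pictures with captions"],
    ["funny cat pictures", "cat memes"],
]
print(related_queries(sessions, "funny cat pictures"))
```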

Determining if a Query is a Well-Formed Query

The patent tells us about a process that can be used to determine if a natural language search query is well-formed and if it is not, to use a trained canonicalization model to create a well-formed variant of that natural language search query.

First, we are given a definition of “well-formedness.” We are told that it is “an indication of how well a word, a phrase, and/or another additional linguistic element(s) conform to the grammar rules of a particular language.”

There are three tests to tell whether something is a well-formed query. It:

  • Is grammatically correct
  • Does not contain spelling errors
  • Asks an explicit question
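A trained classifier handles this in the patent, but the three criteria can be caricatured with a rule-based check; the vocabulary lookup standing in for spell-checking and the question-mark test standing in for grammaticality are my own crude proxies:

```python
QUESTION_WORDS = {"what", "who", "where", "when", "why", "how", "which"}

def looks_well_formed(query, dictionary):
    """Toy heuristic stand-in for the trained classification model."""
    words = query.lower().rstrip("?").split()
    if not words or words[0] not in QUESTION_WORDS:
        return False                      # not an explicit question
    if any(w not in dictionary for w in words):
        return False                      # possible spelling error
    return query.strip().endswith("?")    # crude grammaticality proxy

vocab = {"what", "are", "directions", "to", "hypothetical", "cafe"}
print(looks_well_formed("hypothetical cafe directions", vocab))
print(looks_well_formed("What are directions to Hypothetical Cafe?", vocab))
```

The first query (word salad) fails the explicit-question test; the second passes all three checks.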

The first paper from the authors of this patent tells us the following about queries:

The lack of regularity in the structure of queries makes it difficult to train models that can optimally process the query to extract information that can help understand the user intent behind the query.

That translates to the most important takeaway for this post:

A Well-Formed Query is structured in a way that allows a search engine to understand the user intent behind the query

The patent gives us an example:

“What are directions to Hypothetical Café?” is an example of a well-formed version of the natural language query “Hypothetical Café directions”.

How the Classification Model Works

It also tells us that the purpose behind the process in the patent is to determine whether a query is well-formed using a trained classification model and, if it is not, whether a well-formed variant of the query can be generated using a trained canonicalization model.

It can do that by using features of the search query as input to the classification model and deciding whether the search query is well-formed.

Those features of the search query can include, for example:

  • Character(s)
  • Word(s)
  • Part(s) of speech
  • Entities included in the search query
  • And/or other linguistic representation(s) of the search query (such as word n-grams, character bag of words, etc.)

And the patent tells us more about the nature of the classification model:

The classification model is a machine learning model, such as a neural network model that contains one or more layers such as one or more feed-forward layers, softmax layer(s), and/or additional neural network layers. For example, the classification model can include several feed-forward layers utilized to generate feed-forward output. The resulting feed-forward output can be applied to softmax layer(s) to generate a measure (e.g., a probability) that indicates whether the search query is well-formed.
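A bare-bones version of that feed-forward-plus-softmax arrangement, with tiny illustrative weights rather than anything learned:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def feed_forward(features, layers):
    """layers: list of (weight_matrix, bias_vector) pairs, with a ReLU
    applied after each layer. Sizes and weights here are illustrative,
    not from the patent."""
    out = features
    for weights, biases in layers:
        out = [
            max(0.0, sum(w * x for w, x in zip(row, out)) + b)
            for row, b in zip(weights, biases)
        ]
    return out

# Two outputs: scores for "well-formed" and "not well-formed".
layers = [([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.0])]
probs = softmax(feed_forward([1.0, 2.0], layers))
print(probs)  # two probabilities summing to 1
```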

A Canonicalization Model May Be Used

If the Classification model determines that the search query is not well-formed, the query is turned over to a trained canonicalization model to generate a well-formed version of the search query.

Features may be extracted from the search query, and/or additional input, and processed using the canonicalization model to generate a well-formed version that correlates with the search query.

The canonicalization model may be a neural network model. The patent provides more details on the nature of the neural network used.

The neural network can indicate a well-formed query version of the original query.

We are also told that in addition to identifying a well-formed query, it may also determine “one or more related queries for a given search query.”

A related query can be determined based on the related query being frequently submitted by users following the submission of the given search query.

The query canonicalization system can also determine if the related query is well-formed. If it isn’t, then it can determine a well-formed variant of the related query.

For example, in response to the submission of the given search query, a selectable version of the well-formed variant can be presented along with the search results for that query. If selected, the well-formed variant (or, in some implementations, the related query itself) can be submitted as a search query, and results for it can then be presented.

Again, the idea of “intent” surfaces in the patent regarding related queries (“people also search for” queries).

The value of showing a well-formed variant of a related query, instead of the related query itself, is to let a searcher more easily and/or more quickly understand the intent of the related query.

The patent tells us that this has a lot of value by stating:

Such efficient understanding enables the user to quickly submit the well-formed variant to quickly discover additional information (i.e., result(s) for the related query or well-formed variant) in performing a task and/or enables the user to only submit such query when the intent indicates likely relevant additional information in performing the task.

We are given an example of a related well-formed query in the patent:

As one example, the system can determine the phrase “hypothetical router configuration” is related to the query “reset hypothetical router” based on historical data indicating the two queries are submitted proximate (in time and/or order) to one another by a large number of users of a search system.

In some such implementations, the query canonicalization system can determine the related query “reset hypothetical router” is not a well-formed query, and can determine a well-formed variant of the related query, such as: “how to reset hypothetical router”.

The well-formed variant “how to reset hypothetical router” can then be associated, in a database, as a related query for “hypothetical router configuration”—and can optionally supplant any related query association between “reset hypothetical router” and “hypothetical router configuration”.
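The supplanting step amounts to rewriting the stored association. A dictionary-based sketch, with the hard-coded rewrite table standing in for the trained canonicalization model:

```python
# Stored association: source query -> related query (not well-formed).
related = {"hypothetical router configuration": "reset hypothetical router"}

def canonicalize(query):
    # Toy lookup standing in for the trained canonicalization model.
    rewrites = {"reset hypothetical router": "how to reset hypothetical router"}
    return rewrites.get(query, query)

# Supplant each stored related query with its well-formed variant.
for source, rel in list(related.items()):
    related[source] = canonicalize(rel)

print(related)
```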

The patent tells us that sometimes a well-formed related query might be presented as a link to search results.

Again, one of the features of a well-formed query is that it is grammatical, is an explicit question, and contains no spelling errors.

The patent application can be found at:

Canonicalizing Search Queries to Natural Language Questions
Inventors: Manaal Faruqui and Dipanjan Das
Applicants: Google LLC
Publication Number: 20200167379
Filed: January 18, 2019
Publication Date: May 28, 2020

Abstract

Techniques are described herein for training and/or utilizing a query canonicalization system. In various implementations, a query canonicalization system can include a classification model and a canonicalization model. A classification model can be used to determine if a search query is well-formed. Additionally, a canonicalization model can be used to determine a well-formed variant of a search query in response to determining a search query is not well-formed. In various implementations, a canonicalization model portion of a query canonicalization system can be a sequence to sequence model.

Well-Formed Query Takeaways

I have summarized the summary of the patent; if you want more details, click through and read the detailed description. The two white papers I started the post off with describe databases of well-formed questions that people at Google (including the inventors of this patent) have built, and they show the effort Google has put into rewriting queries so that they are well-formed and the intent behind them can be better understood by the search engine.

As we have seen from this patent, the analysis that is undertaken to find canonical queries also is used to surface “people also search for” queries, which may also be canonicalized and displayed in search results.

A well-formed query is grammatically correct, contains no spelling mistakes, and asks an explicit question. It also makes it clear to the search engine what the intent behind the query may be.


Copyright © 2020 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.
Plugin by Taragana

The post A Well-Formed Query Helps Search Engines Understand User Intent in the Query appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


Google advanced search: Six powerful tips for better SEO

June 30, 2020 No Comments

30-second summary:

  • Google advanced search helps you get granular with your searches and deliver hyper-focused searches with the help of search operators (or a combination of them).
  • For example, you can search for articles published in the last week by your competitors or discover internal linking opportunities you might’ve missed.
  • In this how-to guide, Venngage’s Aditya Sheth outlines six Google advanced search hacks you need to know to master Google search and become a better SEO.

I have to come clean on something: I’m lazy.

While being lazy may not be a virtue, it does come with an unseen advantage: It allows you to look for creative ways to get things done without necessarily spending more time.

And as an SEO, I’m always looking for ways to get more done without working longer hours. Essentially: aiming to accomplish more with less.

One way to do more with less is to look for tools, tactics or even hacks that help you cut down time wasted and get more done, faster. 

One of my favorite hacks ever? Google advanced search.

But what is it? In simple terms, the Google advanced search helps you fine-tune your searches to find exactly what you’re looking for. 

This is an especially useful skill if you want to quickly pull up small bits of information without always having to rely on tools like Ahrefs, Moz, or SEMrush to do it for you.

In this how-to SEO guide, you'll use advanced search operators for six practical SEO tasks.

Before we dive into the meat of this guide, first things first:

A mini-crash course on advanced search operators

To keep things simple, we’re going to cover four operators I, as an SEO, use most often.

The first operator is the site search operator. What this allows you to do is retrieve results from a single website. All you have to do is type site:[any website] into Google.

For example, if I enter site:semrush.com, I will only see results pertaining to SEMrush:

You don’t need the http://, https://, or www prefixes when using the site operator.

That’s not all, you can even use a keyword in addition to the site operator to find if that site has written any content around that keyword.

Let’s say I want to find whether we’ve covered the keyword “infographic” on the site. I’ll enter “site:semrush.com infographic” and this is what comes up:

I personally use the site operator very frequently as it limits my search results to a single domain. Keep this operator in mind as we’re going to be relying on it later.

The next operator you’ll find useful is the quotes or exact-match (“”) operator. What the exact-match operator does is limit your searches to exact-match phrases only.

For example, here is a normal Google search (notice the number of results):

And now the same phrase wrapped in quotation marks: 

 

Notice something different? 

Compared to a normal Google search, exact-match queries will only show you results where your keyphrase has been mentioned exactly as it is (and not a variation). 

This operator is especially powerful to identify if your site has any duplicate content that could be sabotaging your rankings (more on this later).

Last but not least, we're going to learn the dash (-) and plus (+) operators to perform laser-targeted searches.

What the dash (-) operator does is exclude certain keywords from appearing in the search results. So if I wanted to read about the topic of search engines but not search engine optimization, I'd use the following query:

 

By adding "-optimization" to my search (the dash attaches directly to the excluded word, with no space), I'll only see results about search engines and not search engine optimization.

The plus (+) operator, you guessed it — does the exact opposite. You can use the plus operator to add words to your original search and show you a different set of results. 

For example, here’s a query I entered in Google search:

What did I do here? I used the site:, dash and plus operators in conjunction to show me articles that closely relate to search engine marketing but not SEO on the Search Engine Watch blog.

Venngage

There are many search operators out there (too many to list in fact). You can find a much more comprehensive list of search operators on the Moz blog.

But for simplicity’s sake, we’re going to stick to the site, exact match, dash, and plus operators in this guide.

Six Google advanced search tips for better SEO

Using the Google advanced search operators above, you can access exactly what you’re looking for and spend less time searching for it.

Advanced search can come really handy especially when you’re just starting out and don’t have the budget for expensive SEO tools.

Imagine all the possibilities that lie in wait for you as an SEO if only you got better at googling. Well, it's easier than you think. I'll show you:

1. Conduct basic but insightful competitor research

Conducting competitor research on Google is really easy. All you have to do is use the “related:” search operator followed by a website URL. 

“Related:” allows you to find sites that are closely related to a specific URL. You can use related to identify not only direct competitors but also indirect peripheral competitors that you might’ve missed in your competitor research.

Not only that, the related: operator also helps you understand how Google is categorizing your competitors and your website.

Let’s look at what Google returns if we search for competitors related to Venngage

I already know the first three results are our direct competitors, but the last two are surprising because they seem to be indirectly competing with us (and I wasn’t even aware of them).

We’re an online infographic maker tool while both Column Five Media and InfoNewt appear to be done-for-you agencies. Google has identified and categorized them as sites related to Venngage which is an insightful find.

Don’t dismiss this advanced search hack because of its simplicity. Try it for yourself and see what Google comes up with. You might just come away with a better understanding of the competition as it pertains to SEO.

2. Stalk your competitor’s content strategy

Sticking to the topic of competitor research, here’s a cool way you can spy on your competitor’s content strategy: combining the site operator and Google’s date-range filter.

Let’s try this on one of our direct competitors: Piktochart.

To limit my search to blog-related results only, I'll use Piktochart's /blog subfolder instead of the bare domain. And by the looks of it, they have 790 pages on their blog.

I can use the date-range filter (click on tools and filter by date) to further drill down these results to identify what content they published in the last month only. Here’s what comes up: 

This not only tells me Piktochart published four new articles last month but also gives me insight into Piktochart's content strategy and the keywords they're targeting.

You can find even more data by filtering the results by days, months, or custom time periods. 

I can even include exact-match (“your keyword” in quotes) keywords to find out how much content Piktochart has published on any given topic, which is a clever way to uncover their topic cluster strategy. 

Let’s take content marketing as a topic for example

Using the site operator in conjunction with the date filters on Google search gives you information on: 

  • How much content your competition has published to date
  • How often they publish new content in a given time period
  • What kind of content they publish at a certain point in time
  • How often your competitor has written about a given topic

Pretty cool right? 

3. Unearth a gold mine of guest posting opportunities 

If your goal is to drive quality traffic back to your website, pick up high-quality backlinks, boost your website’s domain authority and even rank higher on Google — guest blogging will help you do all of the above.

Anybody that tells you guest blogging is dead is either lying or in on it. Guest blogging still works, even in 2020.

Now that we’ve briefly covered how important guest blogging really is, how do you uncover guest blogging opportunities in your niche or industry?

Here are a few advanced search queries you can copy and paste into Google:

  • Your Keyword “guest post opportunities”
  • Your Keyword “guest post”
  • Your Keyword “submit guest post”
  • Your Keyword “submit blog post”
  • Your Keyword intitle:“write for us”
  • Your Keyword intitle:“guest post guidelines”
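The queries above follow a fixed pattern, so you can expand a seed keyword into the whole list at once. A small sketch (my own helper, not from the article):

```javascript
// Expands one seed keyword into the guest-posting queries listed above.
function guestPostQueries(keyword) {
  const patterns = [
    '"guest post opportunities"',
    '"guest post"',
    '"submit guest post"',
    '"submit blog post"',
    'intitle:"write for us"',
    'intitle:"guest post guidelines"',
  ];
  return patterns.map((pattern) => `${keyword} ${pattern}`);
}

guestPostQueries("design");
// → ['design "guest post opportunities"', 'design "guest post"', ...]
```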

If I’m looking to guest post for sites in the design space, for example, I’d use the following query:

Sites bookmarked. Guest post pitches sent. Fingers crossed. 

Try out these search queries for yourself and you’ll be able to build a respectable list of sites to contribute for.

Brian Dean has the most exhaustive guide on guest blogging I’ve read (it includes a huge list of search operators that will help you find even more guest posting opportunities).

4. Discover hidden opportunities for internal linking

Internal linking plays a small but important role in the ranking factors that determine how well you rank on Google.

Irrespective of how well-designed and easy-to-navigate your site may be, a great internal linking structure can make all the difference when it comes to driving traffic from one post to another across your entire blog.

Internal linking also creates topical relevance by creating supporting content for the main topics of your website.

A few weeks ago, I published a mammoth webinar guide on the Venngage blog. I wanted it to start driving traffic to the post and rank for high-volume keywords immediately.

I got to work by finding out where I could link to our guide internally from as many relevant posts on our blog as possible. All I did was use the site operator and the keyword “webinar”: 

Boom! Barring the first result, I found 47 internal linking opportunities with a simple search. And all it took was a few seconds.

You can even use this search query: site:www.yourwebsite.com/blog intext:”your keyword” to pretty much do the same thing.

This advanced search hack won’t be as useful if you’ve recently started blogging, but it will come in handy if you manage a huge blog that already has a lot of existing content.

5. Find duplicate content on your website

Duplicate content is content that appears on more than one location on your website and can confuse search engines when it comes to deciding which page to rank higher. 

In short: Duplicate content can hurt your website rankings and it’s a technical SEO issue you cannot afford to ignore.

To show you an example of duplicate content, I’ll use this small piece of copy from the Apple Airpods product description on Walmart

Google advanced search tips: Duplicate Content

Using the site operator together with the exact-match operator, I'll paste the copy into Google. Here's what I come up with:

The same piece of copy shows up on six other pages on Walmart. Things could be a lot worse but still, not ideal.

But if I were to search for the same piece of copy across the web (not just Walmart), using the dash operator to exclude Walmart's domain, this is what comes up:

The same piece of copy appears on ~19,000 other websites (excluding Walmart). That’s a lot of duplicate content. 

Duplicate content is an especially major issue for blogs with thousands of pages or for ecommerce sites sharing the same product descriptions.

6. Find missed content opportunities

One of the last search operators I’ll cover is the “filetype” operator. 

Filetype can help you find non-HTML content on your site, such as Word Documents or PDF files. This content is often valuable, but not search optimized. And traffic to it doesn’t show up in your Analytics.

To use this search operator, simply type in "site:yourwebsite.com filetype:pdf" like so:

Then look at that content. Have you published it as HTML content? Is it search optimized? Is there an opportunity to make it a valuable, rank-worthy and trackable webpage?

PDF files are often the rust of the internet, added to sites because the content manager doesn’t have an easy way to publish actual web pages.

They should always be an alternate (print-friendly, download-friendly) version of HTML content. They should almost never be the only version of a piece of content.  

Your turn to master Google search

Congratulations! You’ve officially made it to the end of this mammoth guide. 

Google is far more powerful and robust than we realize or give it credit for. 

Knowing what to search for and how to search for it with the help of Google advanced search operators will help you harness Google’s true power and in turn, grow your site.

As SEOs, our job comprises running SEO tests, tinkering with Google’s algorithms, and staying on top of the latest search trends.

Google advanced search is not only a fun skill that you can learn over the weekend. It can help you uncover opportunities hiding in plain sight and help you be more effective at your job.

The real kicker

Google is and always will be free. The know-how to fine-tune your searches will help you become a better SEO and pay dividends over the long term.

Has using Google advanced search in your day-to-day made you a better SEO? Which search operators do you use most frequently? Did I miss any advanced search tips? Drop them in the comments below.

Aditya Sheth does Content & SEO at Venngage. You can connect with him on Linkedin or find him on Twitter @iamadityashth.

The post Google advanced search: Six powerful tips for better SEO appeared first on Search Engine Watch.

Search Engine Watch


How Google May Annotate Images to Improve Search Results

June 25, 2020 No Comments

How might Google improve on information from sources such as knowledge bases to help them answer search queries?

That information may be learned from or inferred from sources outside of those knowledge bases when Google may:

  • Analyze and annotate images
  • Consider other data sources

A recent Google patent on this topic defines knowledge bases for us, why those are important, and it points out examples of how Google looks at entities while it may annotate images:

A knowledge base is an important repository of structured and unstructured data. The data stored in a knowledge base may include information such as entities, facts about entities, and relationships between entities. This information can be used to assist with or satisfy user search queries processed by a search engine.

Examples of knowledge bases include Google Knowledge Graph and Knowledge Vault, Microsoft Satori Knowledge Base, DBpedia, Yahoo! Knowledge Base, and Wolfram Knowledgebase.

The focus of this patent is upon improving upon information that can be found in knowledge bases:

The data stored in a knowledge base may be enriched or expanded by harvesting information from a wide variety of sources. For example, entities and facts may be obtained by crawling text included in Internet web pages. As another example, entities and facts may be collected using machine learning algorithms, while it may annotate images.

All gathered information may be stored in a knowledge base to enrich the information that is available for processing search queries.

Analyzing Images to Enrich Knowledge Base Information

This approach may annotate images and select object entities contained in those images. It reminded me of a post I recently wrote about Google annotating images, How Google May Map Image Queries.

This is an effort to better understand and annotate images, and explore related entities in images, so Google can focus on “relationships between the object entities and attribute entities, and store the relationships in a knowledge base.”

Google can learn from images of real-world objects (a phrase they used for entities when they started the Knowledge Graph in 2012).

I wrote another post about image search becoming more semantic, in the labels they added to categories in Google image search results. I wrote about those in Google Image Search Labels Becoming More Semantic?

When writing about mapping image queries, I couldn’t help but think about labels helping to organize information in a useful way. I’ve suggested using those labels to better learn about entities when creating content or doing keyword research. Doing image searches and looking at those semantic labels can be worth the effort.

This new patent tells us how Google may annotate images to identify entities contained in those images. While labeling, they may select an object entity from the entities pictured and then choose at least one attribute entity from the annotated images that contain the object entity. They could also infer a relationship between the object entity and the attribute entity or entities and include that relationship in a knowledge base.

In accordance with one exemplary embodiment, a computer-implemented method is provided for enriching a knowledge base for search queries. The method includes assigning annotations to images stored in a database. The annotations may identify entities contained in the images. An object entity among the entities may be selected based on the annotations. At least one attribute entity may be determined using the annotated images containing the object entity. A relationship between the object entity and the at least one attribute entity may be inferred and stored in a knowledge base.

For example, when I search for my hometown, Carlsbad in Google image search, one of the category labels is for Legoland, which is an amusement park located in Carlsbad, California. Showing that as a label tells us that Legoland is located in Carlsbad (the captions for the pictures of Legoland tell us that it is located in Carlsbad.)

Carlsbad-Legoland-Attribute Entity

This patent can be found at:

Computerized systems and methods for enriching a knowledge base for search queries
Inventors: Ran El Manor and Yaniv Leviathan
Assignee: Google LLC
US Patent: 10,534,810
Granted: January 14, 2020
Filed: February 29, 2016

Abstract

Systems and methods are disclosed for enriching a knowledge base for search queries. According to certain embodiments, images are assigned annotations that identify entities contained in the images. An object entity is selected among the entities based on the annotations and at least one attribute entity is determined using annotated images containing the object entity. A relationship between the object entity and the at least one attribute entity is inferred and stored in the knowledge base. In some embodiments, confidence may be calculated for the entities. The confidence scores may be aggregated across a plurality of images to identify an object entity.

Confidence Scores While Labeling of Entities in Images

One of the first phrases to jump out at me when I scanned this patent was "confidence scores." It reminded me of the association scores I wrote about in Entity Extractions for Knowledge Graphs at Google, which discussed Google extracting information about entities, their relationships with other entities, and confidence scores for those relationships and attributes. Those association scores were described in the patent Computerized systems and methods for extracting and storing information regarding entities.

I also referred to these confidence scores when I wrote about Answering Questions Using Knowledge Graphs because association scores or confidence scores can lead to better answers to questions about entities in search results, which is an aim of this patent, and how it attempts to analyze and label images and understand the relationships between entities shown in those images.

The patent lays out the purpose it serves when it may analyze and annotate images like this:

Embodiments of the present disclosure provide improved systems and methods for enriching a knowledge base for search queries. The information used to enrich a knowledge base may be learned or inferred from analyzing images and other data sources.

Per some embodiments, object recognition technology is used to annotate images stored in databases or harvested from Internet web pages. The annotations may identify who and/or what is contained in the images.

The disclosed embodiments can learn which annotations are good indicators for facts by aggregating annotations over object entities and facts that are already known to be true. Grouping annotated images by the object entity helps identify the top annotations for the object entity.

Top annotations can be selected as attributes for the object entities and relationships can be inferred between the object entities and the attributes.

As used herein, the term “inferring” refers to operations where an entity relationship is inferred from or determined using indirect factors such as image context, known entity relationships, and data stored in a knowledge base to draw an entity relationship conclusion instead of learning the entity-relationship from an explicit statement of the relationship such as in text on an Internet web page.

The inferred relationships may be stored in a knowledge base and subsequently used to assist with or respond to user search queries processed by a search engine.

The patent then tells us about how confidence scores are used, that they calculate confidence scores for annotations assigned to images. Those “confidence scores may reflect the likelihood that an entity identified by an annotation is contained in an image.”

If you look back up at the pictures for Legoland above, it may be considered an attribute entity of the Object Entity Carlsbad, because Legoland is located in Carlsbad. The label annotations indicate what the images portray, and infer a relationship between the entities.

Similarly, an image search for Milan, Italy shows a category label for the Duomo, a cathedral located in that city. The Duomo is an attribute entity of the object entity Milan because it is located in Milan.

In those examples, we are inferring from Legoland being included under pictures of Carlsbad that it is an attribute entity of Carlsbad and that the Duomo is an attribute entity of Milan because it is included in the results of a search for Milan.

Milan Duomo Attribute Entity

A search engine may learn from label annotations and because of confidence scores about images because the search engine (or indexing engine thereof) may index:

  • Image annotations
  • Object entities
  • Attribute entities
  • Relationships between object entities and attribute entities
  • Facts learned about object entities

The illustrations from the patent show us images of a bear eating a fish, to tell us that the bear is an object entity, the fish is an attribute entity, and that bears eat fish.

Annotate images with Bear (Object Entity) and Fish (Attribute Entity) entities

We are also shown that bears, as object entities, have other attribute entities associated with them, since they will go into the water to hunt fish and roam around on the grass.

Bears and attribute Entities

Annotations may be detailed and cover objects within photos or images, like the bear eating the fish above. The patent points out a range of entities that might appear in a single image by telling us about a photo from a baseball game:

An annotation may identify an entity contained in an image. An entity may be a person, place, thing, or concept. For example, an image taken at a baseball game may contain entities such as “baseball fan”, “grass”, “baseball player”, “baseball stadium”, etc.

An entity may also be a specific person, place, thing, or concept. For example, the image taken at the baseball game may contain entities such as “Nationals Park” and “Ryan Zimmerman”.

Defining an Object Entity When Google May Annotate Images

The patent provides more insights into what object entities are and how they might be selected:

An object entity may be an entity selected among the entities contained in a plurality of annotated images. Object entities may be used to group images to learn facts about those object entities. In some embodiments, a server may select a plurality of images and assign annotations to those images.

A server may select an object entity based on the entity contained in the greatest number of annotated images as identified by the annotations.

For example, a group of 50 images may be assigned annotations that identify George Washington in 30 of those images. Accordingly, a server may select George Washington as the object entity if 30 out of 50 annotated images is the greatest number for any identified entity.

Confidence scores may also be determined for annotations. Confidence scores are an indication that an entity identified by an annotation is contained in an image. It “quantifies a level of confidence in an annotation being accurate.” That confidence score could be calculated by using a template matching algorithm. The annotated image may be compared with a template image.
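The selection logic from the last few paragraphs, counting which entity appears in the greatest number of annotated images while discarding low-confidence annotations, can be sketched roughly as follows. The data shape, the confidence threshold, and the function names are all my own illustrative choices, not from the patent.

```javascript
// Toy sketch of object-entity selection: per entity, count how many
// annotated images contain it (keeping only annotations above a
// confidence threshold), then pick the entity in the most images.
function selectObjectEntity(annotatedImages, minConfidence = 0.5) {
  const counts = new Map();
  for (const image of annotatedImages) {
    const seen = new Set(); // count each entity at most once per image
    for (const { entity, confidence } of image.annotations) {
      if (confidence >= minConfidence && !seen.has(entity)) {
        seen.add(entity);
        counts.set(entity, (counts.get(entity) || 0) + 1);
      }
    }
  }
  // The entity contained in the greatest number of images wins.
  let best = null;
  for (const [entity, count] of counts) {
    if (!best || count > best.count) best = { entity, count };
  }
  return best;
}
```

Under this sketch, the George Washington example from above (30 of 50 images) would return George Washington as the object entity.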

Defining Attribute Entities When Google May Annotate Images

An attribute entity may be an entity that is among the entities contained in images that contain the object entity. They are entities other than the object entity.

Annotated images that contain the object entity may be grouped and an attribute entity may be selected based on what entity might be contained in the greatest number of grouped images as identified by the annotations.

So, a group of 30 annotated images containing object entity “George Washington” may also include 20 images that contain “Martha Washington.”

In that case, “Martha Washington,” may be considered an attribute entity

(Of Course, “Martha Washington Could be an object Entity, and “George Washington, appearing in a number of the “Martha Washington” labeled images could be considered the attribute entity.)

Inferring Relationships Between Entities by Analyzing Images

If more than a threshold of images of "Michael Jordan" contain a basketball in his hand, a relationship between "Michael Jordan" and basketball might be inferred (that Michael Jordan is a basketball player).

From analyzing images of bears hunting for fish in water, and roaming around on grassy fields, some relationships between bears and fish and water and grass can be made also:

inferences between entities

By analyzing images of Michael Jordan with a basketball in his hand wearing a Chicago Bulls jersey, a search query asking a question such as “What basketball team does Michael Jordan play for?” may be satisfied with the answer “Chicago Bulls”.

To answer a query such as "What team did Michael Jordan play basketball for?", Google could perform an image search for "Michael Jordan playing basketball". Having those images that contain the object entity of interest can allow the images to be analyzed and an answer provided. See the picture at the top of this post, showing Michael Jordan in a Bulls jersey.

Takeaways

This process to collect and annotate images can be done using any images found on the Web, and isn’t limited to images that might be found in places like Wikipedia.

Google can analyze images online in a way that scales on a web-wide basis, and by analyzing images, it may provide insights that a knowledge graph might not, such as to answer the question, “where do Grizzly Bears hunt?” an analysis of photos reveals that they like to hunt near water so that they can eat fish.

The confidence scores in this patent aren’t like the association scores in the other patents about entities that I wrote about, because they are trying to gauge how likely it is that what is in a photo or image is indeed the entity that it might then be labeled with.

The association scores that I wrote about were trying to gauge how likely relationships between entities and attributes might be more likely to be true based upon things such as the reliability and popularity of the sources of that information.

So, Google is trying to learn about real-world objects (entities) by analyzing pictures of those entities when it may annotate images (ones that it has confidence in), as an alternative way of learning about the world and the things within it.



The post How Google May Annotate Images to Improve Search Results appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


Eight great tips for internal site search optimization

June 21, 2020 No Comments

30-second summary:

  • An under-optimized UX can sometimes limit the user’s journey by creating roadblocks to conversions.
  • Internal site search is one of the hidden gems every SEO should know about. 
  • Not only does it improve UX and conversions, it also shows us what's missing in our content strategy and product range.
  • CRO Expert Marco Bonomo reveals the top eight tips on internal site search optimization.

Making content easy to access and discovering content gaps are two of the greatest challenges in SEO. Aside from that, an under-optimized UX can sometimes limit the user's journey by creating roadblocks to conversions. Thankfully, internal site search can be used to reveal these gaps and help make the most of your existing traffic. Here are the top eight tips to take advantage of:

1. Make internal site search part of your optimization routine 

Once you’ve set up the internal site search functionality and mastered the internal site search basicsit’s time to make site search part of your optimization routine. Important metrics such as exit rates, search refinements, or the “zero-result” search queries are particularly relevant in adding an extra layer of information to your audits and reports and help you in making more data-driven recommendations. 

2. Dive into users’ behavior using the Google Analytics segments 

Google Analytics segments come in handy when you're looking to filter a portion of the traffic by a specific behavior, for example by selecting "Organic Only" visitors. What is less known, though, is that Google updated the default segment "Performed Site Search" from "sessions" to "users" not long ago. Even if this might seem like a small change, it means you might be looking at users with several sessions and purchases in their history who are still looped into this segment because of the way it is measured.

To mitigate this issue, you can copy the "Performed Site Search" segment and make it more targeted in two simple steps. The first is switching the scope from "users" to "sessions", while the second (optional) is adding a "Bounces = 0" condition to exclude users who are less relevant to your site search analysis. For more information about the implementation, I recommend reading this piece from Loop54.

Sessions performed site search Google Analytics custom segment
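For those who prefer pulling this data programmatically, the same segment can be sketched as a Google Analytics Reporting API v4 segment definition. The shape below, including the `ga:searchUsed` dimension and the metric filter, reflects my reading of the v4 API and should be verified against the API reference before use:

```python
# My reading of the Reporting API v4 segment shape (verify field names such as
# "ga:searchUsed" and "Visits With Site Search" against the API reference).
performed_site_search_sessions = {
    "dynamicSegment": {
        "name": "Sessions with site search, no bounces",
        # sessionSegment = the "sessions" scope, the first of the two steps
        "sessionSegment": {
            "segmentFilters": [
                {"simpleSegment": {"orFiltersForSegment": [{
                    "segmentFilterClauses": [{
                        "dimensionFilter": {
                            "dimensionName": "ga:searchUsed",
                            "operator": "EXACT",
                            "expressions": ["Visits With Site Search"],
                        }}]}]}},
                # the optional "Bounces = 0" condition, the second step
                {"simpleSegment": {"orFiltersForSegment": [{
                    "segmentFilterClauses": [{
                        "metricFilter": {
                            "metricName": "ga:bounces",
                            "operator": "EQUAL",
                            "comparisonValue": "0",
                        }}]}]}},
            ]
        },
    }
}
print(performed_site_search_sessions["dynamicSegment"]["name"])
```

Swapping `sessionSegment` for `userSegment` would reproduce the user-scoped behavior the article warns about.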

3. Analyze the internal site search flow in Google Analytics  

Another report I recommend diving into is the Audience section's "Users Flow". This particular report allows you to analyze user behavior like never before and reveal further gaps in your UX and internal linking. To access the report, click on Audience > Users Flow and add your site's custom search path (for example "/?s") into the pop-up window from the node you are interested in querying, as per the screenshot below. 

Internal site search flow in Google Analytics

As a result, you should be able to see the specific path visitors took after triggering the on-site search (see example below). For more information about the setup, I recommend having a look at this post from Bounteous.

Internal site search flow screenshot example

4. Use Google Tag Manager to unveil zero-result search queries 

Looking at search terms is a great way to understand what users are looking for, but what about the internal search queries that produced no results at all? Luckily, there's a quick fix for this: creating a custom Google Tag Manager tag and an event in Analytics enables you to track these queries and identify even more gaps in your content or product ranges. To implement this (Analytics does not come with "zero-result" reports out of the box), have a read of the guide 'How to implement the zero-result google analytics track' by Dmitri Ilin.

Zero result search queries with analytics and tag manager
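Conceptually, the tag fires an event whenever a results page renders zero results; aggregating those events yields a list of failed queries. This is a hedged sketch of that aggregation step only (the log format is hypothetical, and the actual tracking happens in Tag Manager and Analytics):

```python
# Hedged sketch: aggregate hypothetical (query, result_count) records into
# the list of zero-result queries that the GTM event would surface.
def zero_result_queries(search_log):
    return sorted({q.strip().lower() for q, n in search_log if n == 0})

log = [("wireless mouse", 14), ("wirless mouse", 0),
       ("gift card", 0), ("gift card", 0), ("usb hub", 3)]
print(zero_result_queries(log))
# ['gift card', 'wirless mouse']
```

Note how the output immediately suggests two distinct fixes: a missing product ("gift card") and a misspelling the search engine should tolerate ("wirless").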

5. Implement a smart site search solution  

Using third-party software can enhance internal site search like never before. Especially for ecommerce websites, navigating through thousands of products can now be done in a more efficient way by simply implementing features that help improve the conversion rate. Features like auto-suggest or providing query results for different categories (see screenshot below) are now a must-have for ecommerce brands, as they aid in maximizing the site search revenue. 

Smart site search solution ecommerce example

6. Add a voice search functionality to your internal site search 

Especially if you're working in a B2C niche, it's definitely worth considering adding a voice search feature to your internal site search. Considering that mobile and tablet may already be generating over 50% of your traffic, it makes sense to make site search even more accessible by adding a feature that can only grow in use over time.   

Voice search feature for internal site search ecommerce

7. Consider disabling the Google search box 

Even though disabling the Google search box might sound like a bizarre idea, I recommend considering it in particular cases. For example, a big ecommerce site like Amazon might want to leverage the homepage to display personalized deals. In doing so, Amazon encourages users to purchase items they hadn't even thought about, made irresistible by a clever combination of search history and flash discounts. To try this, simply follow the Google sitelinks search box guidelines and test it for a short period of time to see if this solution works for your ecommerce site too. 

Google sitelinks search box Amazon example

8. Use Google Data Studio for internal site search reporting 

In order to make sure that on-site search is part of your optimization routine, I also advise you to create a dedicated Google Data Studio dashboard for your weekly or monthly reporting. The key metrics I suggest displaying are the top search terms, search exits, search refinements, and revenue, with the ability to filter by country and dates.  

Wrapping up 

Internal site search is one of the hidden gems every SEO should know about. It not only represents another way to improve UX and conversions, but also shows us what's missing in our content strategy and product range. So why not start using these practical tips, and make the most of this underrated tool today? 

Marco Bonomo is an SEO & CRO Expert at MediaCom London. He can be found on Twitter @MarcoBonomoSEO.

The post Eight great tips for internal site search optimization appeared first on Search Engine Watch.

Search Engine Watch


Google Product Search and Learning about New Product Lines

June 17, 2020 No Comments

It’s interesting seeing patents from Google that focus on eCommerce topics. The last one I recall had Google distinguishing between products and accessories for those products in search results. I wrote about it in Ranking Search Results and Product Queries.

New Product Lines in Product Search

A new patent from Google is about when new products appear in existing product lines, like a laptop that comes with more RAM or a bigger hard drive, or a camera with a zoom lens that it didn't have before.

This patent is about determining in product search whether a query is looking for a particular product line, from within a specific brand.

Searchers frequently search for products offered for sale. Google is trying to understand the intent behind shopping-related search queries.

For Google to be able to do that well, it has to understand different aspects of product categories. This can include such things as:

  • Whether a product has an association with a brand
  • Whether a product is in a specific product line

The patent tells us it is essential to detect terms designating product lines from within product queries from searchers.

That includes associating detected product line terms with their corresponding brands, letting Google keep up with new and retiring product lines soon after changes occur.

The new Google patent describes a process for determining product lines from product search queries:

  • A product query might be classified to identify a product category
  • A brand may be identified for the product query
  • The brand may be chosen from a list of known brands for the product category

Unknown Product Lines

The patent tells us that unknown product line terms may be identified within a product query.

A metric may indicate how well the unknown product line terms correspond to an actual product line within the brand.

The metric may be compared to a specified threshold. The unknown product line terms may be designated as a new product line of the brand if the metric compares favorably to the specified threshold.

A product search may be performed using the product query. Product search results may be returned according to the product search.

This product lines patent can be found at:

Detecting product lines within product search queries
Inventors: Ritendra Datta
Assignee: GOOGLE LLC
US Patent: 10,394,816
Granted: August 27, 2019
Filed: December 27, 2012

Abstract

Systems and methods can determine product lines from product searches.

One or more computing devices can receive a product query of search terms. The product query may be classified to identify a product category. A brand may be identified for the product query. The brand may be selected from a list of known brands for the product category.

One or more unknown product line terms may be identified within the product query. A metric may be computed to indicate how well the unknown product line terms correspond to an actual product line within the brand. The metric may be compared to a specified threshold. The unknown product line terms may be designated as a new product line of the brand if the metric favorably compares to the specified threshold. A product search may be performed on the product query. Product search results may be returned according to the product search.

High Precision Query Classifiers

This patent shows Google trying to identify new products and product lines, so it can distinguish them from older product lines.

Interestingly, Google is looking at search queries to identify products and product lines. As the patent tells us:

Product lines associated with product brands may be determined from analyzing the received product search queries.

The patent refers to a “high-precision query classifier,” which is the first time I have seen that mentioned anywhere at all.

How does a “high precision query classifier” work?

As described in this patent:

  • A search query may be automatically mapped to a product category
  • A list of known brands within the product category may be used to identify terms within the product query specifying the product brand
  • Similarly, a list of known category attributes may be used to identify terms within the product query specifying attributes of the product being searched

Attributes of Products

Product Attributes

The patent provides some examples of attributes for products:

  • A number of megapixels for digital cameras
  • An amount of RAM memory for laptop computers
  • A number of cylinders for a motor vehicle

Product Query Forms

We are told that the forms that a product query may take may vary a bit, but we are provided with some examples.

A product query could take the form “[B] [PL] [A].”

In such a query form, one or more terms [B] may indicate a brand that is a known brand within a list of known product brands, and one or more terms [A] may indicate attributes that are known attributes of the category. One or more unknown terms [PL] may then be identified as a potential new product line. Such an identification may be strengthened where [PL] is in a form associated with product lines. The identification may also be strengthened where [PL] is found with brand [B] frequently over time within various product queries. The identification may be further strengthened where the terms [PL] are infrequently, or never, found with brands other than the brand [B] throughout many product queries over time.

A metric is calculated by comparing what might be the attributes of products from a new product line, with attributes of an actual product line associated with a brand.

This metric may consider the number of unique product queries containing the terms [PL] having the correct structure and/or category along with the extent to which [B] dominates among every query that has a brand preceding [PL].
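As a rough sketch of that "[B] [PL] [A]" decomposition and the dominance component of the metric (all brand names, dictionaries, and logic below are illustrative assumptions; the patent publishes no concrete implementation):

```python
# Illustrative dictionaries; a real system would hold many brands/attributes.
KNOWN_BRANDS = {"acme"}
KNOWN_ATTRIBUTES = {"16gb", "ram", "20mp"}

def decompose(query):
    """Split a query into (brand [B], candidate product-line terms [PL],
    attribute terms [A]), following the "[B] [PL] [A]" form."""
    tokens = query.lower().split()
    brand = tokens[0] if tokens and tokens[0] in KNOWN_BRANDS else None
    rest = tokens[1:] if brand else tokens
    attrs = [t for t in rest if t in KNOWN_ATTRIBUTES]
    pl_terms = [t for t in rest if t not in KNOWN_ATTRIBUTES]
    return brand, pl_terms, attrs

def brand_dominance(queries, brand, pl_terms):
    """Fraction of brand-prefixed queries containing pl_terms that use `brand`."""
    target = " ".join(pl_terms)
    branded = [b for b, terms, _ in map(decompose, queries)
               if b and " ".join(terms) == target]
    return branded.count(brand) / len(branded) if branded else 0.0

queries = ["acme zoombook 16gb ram", "acme zoombook", "acme zoombook 20mp"]
print(decompose("acme zoombook 16gb ram"))  # ('acme', ['zoombook'], ['16gb', 'ram'])
print(brand_dominance(queries, "acme", ["zoombook"]))  # 1.0
```

A dominance near 1.0 means the candidate terms almost never appear under other brands, which the patent lists as a signal strengthening the new-product-line designation.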

Why would Google look at queries to learn about new product lines from brands, instead of learning from product pages that describe the attributes of those products?

Identifying Product Lines

How this identification process may work:

  • Software for product line resolution may identify product lines associated with brands for product categories determined by the query classifier
  • Product line resolution may use a category attribute dictionary and a product brand dictionary to establish pairings between brands and product lines
  • The product query and the determined brands and product lines may then be provided to a product search engine
  • The product search engine may then provide search results to the searcher
  • The query classifier may map the product query to a product category
  • Product line resolution can use product category information, with the category attribute dictionary and the product brand dictionary, to identify terms from the product query that relate to specific product lines
  • The unknown terms identified by the product line resolution module for a category may be fed back into the category attribute dictionary as attributes for that category
  • Each identified product line may also be related to a particular brand listed in the product brand dictionary
  • The product brand dictionary can provide a list of known brands within various product categories
  • The known brands may be used to determine and resolve terms associated with product lines within each brand
  • The product line terms may then be used to identify a potential new product line

The identification of a new product line may be strengthened:

  • When unknown terms information is in a form associated with product lines
  • Where the unknown terms are found with a brand frequently over time within various product queries
  • Where the unknown terms are infrequently, or never, found with brands other than the brand identified throughout many product queries over time

Identifying When Unknown Terms May Be in a Form Associated with Product Lines

Here are some observations about the form of product lines:

  • Product line terms generally start with a letter
  • Product lines generally contain few or no numbers (differentiating product line terms from model numbers or serial numbers)
  • Product lines may be related to a category or a brand (one brand may generally have single-word product lines, while a second brand may use two-word product lines where the first word relates to performance and the second is a three-digit number)

These kinds of patterns or forms about product lines could be used to associate unknown terms within a product query as product line terms.
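Those form heuristics could be sketched like this (the digit threshold is an arbitrary illustration, not a disclosed rule):

```python
def looks_like_product_line(terms):
    """Heuristic check that candidate terms look like a product line name."""
    text = " ".join(terms)
    if not text or not text[0].isalpha():   # generally starts with a letter
        return False
    digits = sum(ch.isdigit() for ch in text)
    return digits <= len(text) // 4         # few or no numbers

print(looks_like_product_line(["zoombook"]))   # True
print(looks_like_product_line(["xr-500290"]))  # False (reads like a model number)
```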

Using a Category Attribute Dictionary to Resolve Product Line Terms within Product Queries

The category attribute dictionary can provide a dictionary of attributes associated with various product categories and brands.

Terms from the category attribute dictionary may be used to resolve product line terms within the product query.

When unknown terms are often found within product queries along with brand information, those unknown terms could be seen as product line terms associated with a specific brand, particularly when known attribute terms from the category attribute dictionary are consistent with the brand [B] or with the category the query classifier associated with the product query.

Product Query Processing

The patent includes a flowchart describing the process behind the product search patent.

Where does Google Learn about product lines?

The patent doesn't mention product schema or merchant product feeds. It does tell us that Google gets a lot of information about product lines from searchers' queries.

Google also collects information about products and product attributes from web sites that sell those products, in addition to looking at product queries, as described in this patent.

Collecting such information from site owners may be the starting source of much information found in the product and category dictionaries and product attribute categories that are mentioned in this patent.

The process of updating information about products and product lines from product queries from searchers is a way to crowdsource information about products from searchers and get an idea of how much interest there might be in specific products.

Google can learn a lot about products from product data feeds that merchants submit to Google. Google is trying to get merchants to submit product feeds even if they don’t use paid product search, to make those products visible in more places on Google in Surfaces across Google as described on this Google Support page: Show your products on Surfaces Across Google.

We saw that Google is using product feed information to help it distinguish between product pages and accessory pages for those products as I wrote about in the blog post I linked to at the start of this post.

Google also describes product markup on their developers page Product. Google tells site owners that they should include that markup for their products because:

Product markup enables a badge on the image in mobile image search results, which can encourage more users to click your content.

By collecting information about products from product feeds, Product Schema, product web pages, and product queries from searchers Google is collecting a lot of data about products, which could enable it to be pretty good at providing answers to product queries, and to understand when new product lines are launched.


Copyright © 2020 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.
Plugin by Taragana

The post Google Product Search and Learning about New Product Lines appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


A Well-Formed Query Helps Search Engines Understand User Intent in the Query

June 13, 2020 No Comments

A Well-Formed Query Helps a Search Engine understand User Intent Behind the Query

To start this post, I wanted to include a couple of whitepapers with authors from Google. The authors of the first paper are the inventors of a patent application that was just published on April 28, 2020, and it is good to see a white paper from the inventors of a recent Google patent. Both papers are worth reading to get a sense of how Google is trying to rewrite queries into "Well-Formed Natural Language Questions."

August 28, 2018 – Identifying Well-formed Natural Language Questions

The abstract for that paper:

Understanding search queries is a hard problem as it involves dealing with “word salad” text ubiquitously issued by users. However, if a query resembles a well-formed question, a natural language processing pipeline is able to perform more accurate interpretation, thus reducing downstream compounding errors. Hence, identifying whether or not a query is well-formed can enhance query understanding. Here, we introduce a new task of identifying a well-formed natural language question. We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-wellformed categories and report an accuracy of 70.7% on the test set. We also show that our classifier can be used to improve the performance of neural sequence-to-sequence models for generating questions for reading comprehension.

The paper provides examples of well-formed queries and ill-formed queries:

Examples of well-formed and non-well-formed queries

November 21, 2019 – How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions

The abstract for that paper:

We present a large-scale dataset for the task of rewriting an ill-formed natural language question to a well-formed one. Our multi-domain question rewriting (MQR) dataset is constructed from human contributed Stack Exchange question edit histories. The dataset contains 427,719 question pairs which come from 303 domains. We provide human annotations for a subset of the dataset as a quality estimate. When moving from ill-formed to well-formed questions, the question quality improves by an average of 45 points across three aspects. We train sequence-to-sequence neural models on the constructed dataset and obtain an improvement of 13.2% in BLEU-4 over baseline methods built from other data resources. We release the MQR dataset to encourage research on the problem of question rewriting.

examples of ill-formed and well-formed questions

The patent application I am writing about was filed on January 18, 2019, which puts it around halfway between those two whitepapers, and both are recommended reading if you are interested in featured snippets, People Also Ask questions, and the queries that Google tries to respond to. The second whitepaper refers to the first one, and tells us how it tries to improve upon it:

Faruqui and Das (2018) introduced the task of identifying well-formed natural language questions. In this paper, we take a step further to investigate methods to rewrite ill-formed questions into well-formed ones without changing their semantics. We create a multi-domain question rewriting dataset (MQR) from human contributed StackExchange question edit histories.

Rewriting Ill-Formed Search Queries into Well-Formed Queries

Interestingly, the patent is also about rewriting search queries.

It starts by telling us that “Rules-based rewrites of search queries have been utilized in query processing components of search systems.”

Sometimes this happens by removing certain stop-words from queries, such as “the”, “a”, etc.

After Rewriting a Query

Once a query is rewritten, it may be "submitted to the search system and search results returned that are responsive to the rewritten query."

The patent also tells us about "people also search for X" queries (the first patent I have seen mention them).

We are told that these similar queries are used to recommend additional queries that are related to a submitted query (e.g., “people also search for X”).

These “similar queries to a given query are often determined by navigational clustering.”

As an example, we are told that for the query “funny cat pictures”, a similar query of “funny cat pictures with captions” may be determined because that similar query is frequently submitted by searchers following submission of the query “funny cat pictures”.

Determining if a Query is a Well Formed Query

The patent tells us about a process that can be used to determine if a natural language search query is well-formed, and if it is not, to use a trained canonicalization model to create a well-formed variant of that natural language search query.

First, we are given a definition of "well-formedness." We are told that it is "an indication of how well a word, a phrase, and/or another additional linguistic element(s) conform to the grammar rules of a particular language."

A query is well-formed if it meets three criteria. It:

  • Is grammatically correct
  • Does not contain spelling errors
  • Asks an explicit question
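As a toy stand-in only: the patent relies on a trained classifier, but the three surface criteria can be approximated with naive checks (the question-word list and mini-dictionary below are placeholders, and real grammaticality checking needs a model):

```python
QUESTION_WORDS = {"what", "who", "where", "when", "why", "how", "which"}
DICTIONARY = {"what", "are", "directions", "to", "hypothetical", "cafe"}

def looks_well_formed(query):
    """Naive proxy for the three criteria: explicit question, no misspellings,
    and (very loosely) grammatical question form."""
    tokens = query.lower().rstrip("?").split()
    asks_question = bool(tokens) and tokens[0] in QUESTION_WORDS
    spelled_ok = all(t in DICTIONARY for t in tokens)
    return asks_question and spelled_ok and query.endswith("?")

print(looks_well_formed("What are directions to Hypothetical Cafe?"))  # True
print(looks_well_formed("hypothetical cafe directions"))               # False
```

The second query fails exactly the way the patent's example does: it carries the same intent but never states the question explicitly.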

The first paper from the authors of this patent tells us the following about queries:

The lack of regularity in the structure of queries makes it difficult to train models that can optimally process the query to extract information that can help understand the user intent behind the query.

That translates to the most important takeaway for this post:

A Well-Formed Query is structured in a way that allows a search engine to understand the user intent behind the query

The patent gives us an example:

“What are directions to Hypothetical Café?” is an example of a well-formed version of the natural language query “Hypothetical Café directions”.

How the Classification Model Works

It also tells us that the purpose of the process in the patent is to determine whether a query is well-formed using a trained classification model and, if it is not, whether a well-formed variant of that query can be generated using a trained canonicalization model.

The system uses features of the search query as input to the classification model to decide whether the search query is well-formed.

Those features of the search query can include, for example:

  • Character(s)
  • Word(s)
  • Part(s) of speech
  • Entities included in the search query
  • And/or other linguistic representation(s) of the search query (such as word n-grams, character bag of words, etc.)

And the patent tells us more about the nature of the classification model:

The classification model is a machine learning model, such as a neural network model that contains one or more layers such as one or more feed-forward layers, softmax layer(s), and/or additional neural network layers. For example, the classification model can include several feed-forward layers utilized to generate feed-forward output. The resulting feed-forward output can be applied to softmax layer(s) to generate a measure (e.g., a probability) that indicates whether the search query is well-formed.
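That architecture can be pictured in plain Python: feed-forward layers whose output passes through a softmax to yield a probability that the query is well-formed. The weights below are arbitrary placeholders, not a trained model:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, weights, biases):
    # One feed-forward (dense + ReLU) layer...
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(weights["hidden"], biases["hidden"])]
    # ...then a 2-way output layer whose logits go through the softmax.
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(weights["out"], biases["out"])]
    return softmax(logits)  # [P(not well-formed), P(well-formed)]

weights = {"hidden": [[0.5, -0.2], [0.1, 0.4]],
           "out": [[0.3, -0.6], [-0.3, 0.6]]}
biases = {"hidden": [0.0, 0.1], "out": [0.0, 0.0]}
probs = classify([1.0, 2.0], weights, biases)
print(probs)  # two probabilities summing to 1.0
```

The features fed in would be the character, word, part-of-speech, and entity representations the patent lists, encoded numerically.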

A Canonicalization Model May Be Used

If the Classification model determines that the search query is not a well-formed query, the query is turned over to a trained canonicalization model to generate a well-formed version of the search query.

Features may be extracted from the search query and/or additional input, then processed using the canonicalization model to generate a well-formed version that correlates with the search query.

The canonicalization model may be a neural network model. The patent provides more details on the nature of the neural network used.

The neural network can indicate a well-formed query version of the original query.

We are also told that in addition to identifying a well-formed query, it may also determine “one or more related queries for a given search query.”

A related query can be determined based on the related query being frequently submitted by users following the submission of the given search query.

The query canonicalization system can also determine if the related query is a well-formed query. If it isn’t, then it can determine a well-formed variant of the related query.

For example, in response to the submission of the given search query, a selectable version of the well-formed variant can be presented along with search results for the given query and, if selected, the well-formed variant (or the related query itself in some implementations) can be submitted as a search query and results for the well-formed variant (or the related query) then presented.

Again, the idea of "intent" surfaces in the patent regarding related queries ("people also search for" queries).

The value of showing a well-formed variant of a related query, instead of the related query itself, is to let a searcher more easily and/or more quickly understand the intent of the related query.

The patent tells us that this has a lot of value by stating:

Such efficient understanding enables the user to quickly submit the well-formed variant to quickly discover additional information (i.e., result(s) for the related query or well-formed variant) in performing a task and/or enables the user to only submit such query when the intent indicates likely relevant additional information in performing the task.

We are given an example of a related well-formed query in the patent:

As one example, the system can determine the phrase “hypothetical router configuration” is related to the query “reset hypothetical router” based on historical data indicating the two queries are submitted proximate (in time and/or order) to one another by a large number of users of a search system.

In some such implementations, the query canonicalization system can determine the related query “reset hypothetical router” is not a well-formed query, and can determine a well-formed variant of the related query, such as: “how to reset hypothetical router”.

The well-formed variant “how to reset hypothetical router” can then be associated, in a database, as a related query for “hypothetical router configuration”—and can optionally supplant any related query association between “reset hypothetical router” and “hypothetical router configuration”.
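That supplanting step can be pictured as a simple table update (a sketch; the patent does not disclose storage details):

```python
# Hypothetical related-query table keyed by the originally submitted query.
related = {"hypothetical router configuration": "reset hypothetical router"}

def supplant_with_well_formed(table, query, well_formed_variant):
    """Replace the stored related query with its well-formed variant."""
    table[query] = well_formed_variant
    return table

supplant_with_well_formed(related, "hypothetical router configuration",
                          "how to reset hypothetical router")
print(related["hypothetical router configuration"])
# how to reset hypothetical router
```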

The patent tells us that sometimes a well-formed related query might be presented as a link to search results.

Again, one of the features of a well-formed query is that it is grammatical, is an explicit question, and contains no spelling errors.

The patent application can be found at:

Canonicalizing Search Queries to Natural language Questions
Inventors Manaal Faruqui and Dipanjan Das
Applicants Google LLC
Publication Number 20200167379
Filed: January 18, 2019
Publication Date May 28, 2020

Abstract

Techniques are described herein for training and/or utilizing a query canonicalization system. In various implementations, a query canonicalization system can include a classification model and a canonicalization model. A classification model can be used to determine if a search query is well-formed. Additionally or alternatively, a canonicalization model can be used to determine a well-formed variant of a search query in response to determining a search query is not well-formed. In various implementations, a canonicalization model portion of a query canonicalization system can be a sequence to sequence model.

Well-Formed Query Takeaways

I have summarized the patent's own summary; if you want more details, click through and read the detailed description. The two white papers I opened this post with describe datasets of well-formed questions that people at Google (including the inventors of this patent) have built, and they show the effort Google has put into rewriting queries into well-formed queries whose intent the search engine can better understand.

A well-formed query is grammatically correct, contains no spelling mistakes, and asks an explicit question. It also makes it clear to the search engine what the intent behind the query may be.



The post A Well-Formed Query Helps Search Engines Understand User Intent in the Query appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


A Well-Formed Query Helps a Search Engine Understand User Intent in the Query

June 9, 2020 No Comments

A Well-Formed Query Helps a Search Engine understand User Intent Behind the Query

To start this post, I wanted to include a couple of whitepapers that include authors from Google. The authors of the first paper are the inventors of a patent application that was just published on April 28, 2020, and it is very good seeing a white paper from the inventors of a recent patent published by Google. Both papers are worth reading to get a sense of how Google is trying to rewrite queries into “Well-Formed Natural Language Questions.

August 28, 2018 – Identifying Well-formed Natural Language Questions

The abstract for that paper:

Understanding search queries is a hard problem as it involves dealing with “word salad” text ubiquitously issued by users. However, if a query resembles a well-formed question, a natural language processing pipeline is able to perform more accurate interpretation, thus reducing downstream compounding errors. Hence, identifying whether or not a query is well-formed can enhance query understanding. Here, we introduce a new task of identifying a well-formed natural language question. We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-wellformed categories and report an accuracy of 70.7% on the test set. We also show that our classifier can be used to improve the performance of neural sequence-to-sequence models for generating questions for reading comprehension.

The paper provides examples of well-formed queries and ill-formed queries:

Examples of Well forned and non wll formed queries

November 21, 2019 – How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions

The abstract for that paper:

We present a large-scale dataset for the task of rewriting an ill-formed natural language question to a well-formed one. Our multi-domain question rewriting (MQR) dataset is constructed from human contributed Stack Exchange question edit histories. The dataset contains 427,719 question pairs which come from 303 domains. We provide human annotations for a subset of the dataset as a quality estimate. When moving from ill-formed to well-formed questions, the question quality improves by an average of 45 points across three aspects. We train sequence-to-sequence neural models on the constructed dataset and obtain an improvement of 13.2%in BLEU-4 over baseline methods built from other data resources. We release the MQR dataset to encourage research on the problem of question rewriting.

Examples of ill-formed and well-formed questions

The patent application I am writing about was filed on January 18, 2019, which puts it around halfway between those two whitepapers. Both papers are recommended reading if you are interested in featured snippets, "People Also Ask" questions, and the queries Google tries to respond to. The second whitepaper refers to the first, and tells us how it tries to improve upon it:

Faruqui and Das (2018) introduced the task of identifying well-formed natural language questions. In this paper, we take a step further to investigate methods to rewrite ill-formed questions into well-formed ones without changing their semantics. We create a multi-domain question rewriting dataset (MQR) from human contributed Stack Exchange question edit histories.

Rewriting Ill-Formed Search Queries into Well-Formed Queries

Interestingly, the patent is also about rewriting search queries.

It starts by telling us that “Rules-based rewrites of search queries have been utilized in query processing components of search systems.”

Sometimes this happens by removing certain stop-words from queries, such as “the”, “a”, etc.

After Rewriting a Query

Once a query is rewritten, it may be "submitted to the search system and search results returned that are responsive to the rewritten query."

The patent also tells us about "people also search for X" queries (the first patent I have seen mention them).

We are told that these similar queries are used to recommend additional queries that are related to a submitted query (e.g., “people also search for X”).

These “similar queries to a given query are often determined by navigational clustering.”

As an example, we are told that for the query “funny cat pictures”, a similar query of “funny cat pictures with captions” may be determined because that similar query is frequently submitted by searchers following submission of the query “funny cat pictures”.
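That "frequently submitted following" relationship can be sketched with a toy query log. The counting below is my illustration of the general idea, not the patent's actual navigational clustering:

```python
from collections import Counter, defaultdict

def related_queries(session_log, min_count=2, top_n=3):
    """Count which queries users submit next, per query.
    A rough stand-in for mining co-submitted queries from search logs."""
    follows = defaultdict(Counter)
    for session in session_log:  # each session is an ordered list of queries
        for query, next_query in zip(session, session[1:]):
            follows[query][next_query] += 1
    return {
        q: [nxt for nxt, c in counts.most_common(top_n) if c >= min_count]
        for q, counts in follows.items()
    }

log = [
    ["funny cat pictures", "funny cat pictures with captions"],
    ["funny cat pictures", "funny cat pictures with captions"],
    ["funny cat pictures", "cat videos"],
]
print(related_queries(log)["funny cat pictures"])
# → ['funny cat pictures with captions']
```

With the threshold of two submissions, "cat videos" is dropped and only the frequently following query survives as a "people also search for" candidate.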

Determining if a Query is a Well Formed Query

The patent tells us about a process that can be used to determine if a natural language search query is well-formed, and if it is not, to use a trained canonicalization model to create a well-formed variant of that natural language search query.

First, we are given a definition of "well-formedness." We are told that it is "an indication of how well a word, a phrase, and/or another additional linguistic element(s) conform to the grammar rules of a particular language."

There are three criteria that tell whether a query is well-formed. It:

  • Is grammatically correct
  • Contains no spelling errors
  • Asks an explicit question
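As a rough illustration of those criteria, here is a toy heuristic check. The patent's actual check is a trained classifier, not rules like these, and this sketch only approximates two of the criteria (explicit question, spelling) while skipping real grammar checking:

```python
import re

QUESTION_WORDS = {"what", "how", "why", "where", "when", "who",
                  "which", "is", "are", "can", "do", "does"}

def looks_well_formed(query, known_words):
    """Toy approximation: starts with a question word, ends with '?',
    and every word is in a known-word list (a crude spelling check)."""
    words = re.findall(r"[a-z']+", query.lower())
    if not words:
        return False
    asks_question = words[0] in QUESTION_WORDS and query.strip().endswith("?")
    no_spelling_errors = all(w in known_words for w in words)
    return asks_question and no_spelling_errors

vocab = {"what", "are", "directions", "to", "hypothetical", "cafe"}
print(looks_well_formed("What are directions to Hypothetical Cafe?", vocab))  # True
print(looks_well_formed("Hypothetical Cafe directions", vocab))               # False
```

The trained model in the patent learns these kinds of signals from labeled data instead of hand-written rules.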

The first paper from the authors of this patent tells us the following about queries:

The lack of regularity in the structure of queries makes it difficult to train models that can optimally process the query to extract information that can help understand the user intent behind the query.

That translates to the most important takeaway for this post:

A Well-Formed Query is structured in a way that allows a search engine to understand the user intent behind the query

The patent gives us an example:

“What are directions to Hypothetical Café?” is an example of a well-formed version of the natural language query “Hypothetical Café directions”.

How the Classification Model Works

It also tells us that the purpose behind the process in the patent is to determine, using a trained classification model, whether a query is well-formed, and, if it is not, to generate a well-formed variant of that query using a trained canonicalization model.

The classification model takes features of the search query as input and decides whether the search query is well-formed.

Those features of the search query can include, for example:

  • Character(s)
  • Word(s)
  • Part(s) of speech
  • Entities included in the search query
  • And/or other linguistic representation(s) of the search query (such as word n-grams, character bag of words, etc.)

And the patent tells us more about the nature of the classification model:

The classification model is a machine learning model, such as a neural network model that contains one or more layers such as one or more feed-forward layers, softmax layer(s), and/or additional neural network layers. For example, the classification model can include several feed-forward layers utilized to generate feed-forward output. The resulting feed-forward output can be applied to softmax layer(s) to generate a measure (e.g., a probability) that indicates whether the search query is well-formed.
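A minimal sketch of that feed-forward-plus-softmax shape, using NumPy with random, untrained weights. The character-n-gram featurization here is my stand-in for the patent's unspecified query features:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def featurize(query, n=3, dim=64):
    """Hashed character-n-gram counts: an assumed, illustrative featurization."""
    vec = np.zeros(dim)
    q = f" {query.lower()} "
    for i in range(len(q) - n + 1):
        vec[hash(q[i:i + n]) % dim] += 1.0
    return vec

W1, b1 = rng.normal(size=(64, 16)), np.zeros(16)  # feed-forward layer
W2, b2 = rng.normal(size=(16, 2)), np.zeros(2)    # logits: [not well-formed, well-formed]

def p_well_formed(query):
    h = np.maximum(featurize(query) @ W1 + b1, 0.0)  # ReLU feed-forward output
    return softmax(h @ W2 + b2)[1]                   # probability from the softmax layer

print(p_well_formed("what are directions to hypothetical cafe?"))
```

Since the weights are untrained, the output probability is meaningless; the point is only the shape of the computation the patent describes.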

A Canonicalization Model May Be Used

If the Classification model determines that the search query is not a well-formed query, the query is turned over to a trained canonicalization model to generate a well-formed version of the search query.

Some of the search query's features, and/or additional input, may be processed using the canonicalization model to generate a well-formed version that correlates with the search query.

The canonicalization model may be a neural network model. The patent provides more details on the nature of the neural network used.

The neural network can indicate a well-formed query version of the original query.
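A trained sequence-to-sequence model is beyond a snippet, but a template rewriter can stand in for it to show the input/output contract of the canonicalization step. The templates below are invented for illustration:

```python
# Stand-in for the patent's neural canonicalization model: map a
# keyword-style query to an explicit-question form.
REWRITE_TEMPLATES = [
    ("directions", "What are directions to {rest}?"),
    ("reset", "How to reset {rest}?"),
]

def canonicalize(query):
    words = query.lower().split()
    for trigger, template in REWRITE_TEMPLATES:
        if trigger in words:
            rest = " ".join(w for w in words if w != trigger)
            return template.format(rest=rest)
    return query  # treated as already well-formed

print(canonicalize("hypothetical cafe directions"))
# → What are directions to hypothetical cafe?
```

The learned model generalizes far beyond fixed templates, but the transformation it performs has this same shape: ill-formed query in, explicit question out.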

We are also told that in addition to identifying a well-formed query, it may also determine “one or more related queries for a given search query.”

A related query can be determined based on the related query being frequently submitted by users following the submission of the given search query.

The query canonicalization system can also determine if the related query is a well-formed query. If it isn’t, then it can determine a well-formed variant of the related query.

For example, in response to the submission of the given search query, a selectable version of the well-formed variant can be presented along with search results for the given query and, if selected, the well-formed variant (or the related query itself in some implementations) can be submitted as a search query and results for the well-formed variant (or the related query) then presented.

Again, the idea of "intent" surfaces in the patent regarding related queries ("people also search for" queries).

The value of showing a well-formed variant of a related query, instead of the related query itself, is to let a searcher more easily and/or more quickly understand the intent of the related query.

The patent tells us that this has a lot of value by stating:

Such efficient understanding enables the user to quickly submit the well-formed variant to quickly discover additional information (i.e., result(s) for the related query or well-formed variant) in performing a task and/or enables the user to only submit such query when the intent indicates likely relevant additional information in performing the task.

We are given an example of a related well-formed query in the patent:

As one example, the system can determine the phrase “hypothetical router configuration” is related to the query “reset hypothetical router” based on historical data indicating the two queries are submitted proximate (in time and/or order) to one another by a large number of users of a search system.

In some such implementations, the query canonicalization system can determine the related query “reset hypothetical router” is not a well-formed query, and can determine a well-formed variant of the related query, such as: “how to reset hypothetical router”.

The well-formed variant “how to reset hypothetical router” can then be associated, in a database, as a related query for “hypothetical router configuration”—and can optionally supplant any related query association between “reset hypothetical router” and “hypothetical router configuration”.

The patent tells us that sometimes a well-formed related query might be presented as a link to search results.

Again, one of the features of a well-formed query is that it is grammatical, is an explicit question, and contains no spelling errors.

The patent application can be found at:

Canonicalizing Search Queries to Natural Language Questions
Inventors: Manaal Faruqui and Dipanjan Das
Applicant: Google LLC
Publication Number: 20200167379
Filed: January 18, 2019
Publication Date: May 28, 2020

Abstract

Techniques are described herein for training and/or utilizing a query canonicalization system. In various implementations, a query canonicalization system can include a classification model and a canonicalization model. A classification model can be used to determine if a search query is well-formed. Additionally or alternatively, a canonicalization model can be used to determine a well-formed variant of a search query in response to determining a search query is not well-formed. In various implementations, a canonicalization model portion of a query canonicalization system can be a sequence to sequence model.

Well-Formed Query Takeaways

I have summarized the summary of the patent; if you want more details, click through and read the full description. The two whitepapers I opened the post with describe databases of well-formed questions that people at Google (including the inventors of this patent) have built, and they show the effort Google has put into rewriting queries into well-formed queries whose intent the search engine can better understand.

A well-formed query is grammatically correct, contains no spelling mistakes, and asks an explicit question. It also makes it clear to the search engine what the intent behind the query may be.


Copyright © 2020 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.
Plugin by Taragana

The post A Well-Formed Query Helps a Search Engine Understand User Intent in the Query appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


Job Search Engine Using Occupation Vectors

June 1, 2020 No Comments

I worked for the Courts of Delaware at Superior Court.

I started working there as the Assistant Criminal Deputy Prothonotary.

I changed positions after 7 years there, and I became a Mini/Micro Computer Network Administrator.

The Court used an old English title for that first position which meant that I supervised Court Clerks in the Criminal Department of the Court. In the second position, I never saw a mini/micro-computer but it was a much more technical position. I was reminded of those titles when writing this post.

What unusual job titles might you have held in the past?

A Job Search Engine Based on Occupation Vectors and a Job Identification Model

An Example of Job Search at Google:

job search example

Over two consecutive weeks, Google was granted a patent with the same name each week. This is the first of the two patents granted during that period under the name "Search Engine."

It is about a specific type of search engine, one that focuses upon a specific search vertical: a job search engine.

The second patent granted under the name "Search Engine" was one that focused upon indexing data related to applications on mobile devices. I wrote about it in the post A Native Application Vertical Search Engine at Google.

The reason why I find it important to learn about and understand how these new “Search Engine” patents work is that they adopt some newer approaches to answering searches than some of the previous vertical search engines developed by Google. Understanding how they work may provide some ideas about how older searches at Google may have changed.

This Job Search Engine patent works with a job identification model to enhance job search by improving the quality of search results in response to a job search query.

We are told that the job identification model can identify relevant job postings that could otherwise go unnoticed by conventional algorithms due to inherent limitations of keyword-based searching. What implications does this have for organic search at Google that has focused upon keyword searches?

This job search may use methods in addition to conventional keyword-based searching. It uses an identification model that can identify relevant job postings which include job titles that do not match the keywords of a received job search query.

So, the patent tells us that in a query using the words “Patent Guru,” the job identification model may identify postings related to a:

  • “Patent Attorney”
  • “Intellectual Property Attorney”
  • “Attorney”
  • And the like

The method behind job searching may include (remember the word “vector.” It is one I am seeing from Google a lot lately):

  • Defining a vector vocabulary
  • Defining an occupation taxonomy including multiple different occupations
  • Obtaining multiple labeled training data items, wherein each labeled training data item is associated with at least:
    • (i) a job title
    • (ii) an occupation
  • Generating an occupation vector which includes a feature weight for each respective term in the vector vocabulary
  • Associating each respective occupation vector with an occupation in the occupation taxonomy based on the occupation of the labeled training data item used to generate the occupation vector
  • Receiving a search query that includes a string related to a characteristic of one or more potential job opportunities
  • Generating a first vector based on the received query
  • Determining, for each respective occupation of the multiple occupations in the occupation taxonomy, a confidence score that is indicative of whether the query vector is correctly classified in the respective occupation
  • Selecting the particular occupation that is associated with the highest confidence score
  • Obtaining one or more job postings using the selected occupation
  • Providing the obtained job postings in a set of search results in response to the search query
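The steps above can be sketched end-to-end with a tiny, invented taxonomy. Cosine similarity between bag-of-words vectors stands in for whatever confidence score the patent actually computes:

```python
import math
from collections import Counter

def vectorize(title, vocab):
    """Bag-of-words vector over a fixed vector vocabulary."""
    counts = Counter(title.lower().split())
    return [float(counts[t]) for t in vocab]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

# Hypothetical tiny occupation taxonomy: occupation -> labeled job titles
taxonomy = {
    "attorney": ["patent attorney", "intellectual property attorney"],
    "hairdresser": ["barber", "stylist"],
}
vocab = sorted({w for titles in taxonomy.values() for t in titles for w in t.split()})
occupation_vectors = {
    occ: [sum(col) for col in zip(*(vectorize(t, vocab) for t in titles))]
    for occ, titles in taxonomy.items()
}

def classify(query):
    """Score the query vector against each occupation vector and
    select the occupation with the highest confidence score."""
    qv = vectorize(query, vocab)
    scores = {occ: cosine(qv, ov) for occ, ov in occupation_vectors.items()}
    return max(scores, key=scores.get)

print(classify("patent guru"))  # → attorney
```

Even though "guru" never appears in the taxonomy, the shared term "patent" is enough for the query to land in the right occupation, which is the keyword-gap problem the patent is addressing.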

These operations may include:

  • Receiving a search query that includes a string related to a characteristic of one or more job opportunities
  • Generating, based on the query, a query vector that includes a feature weight for each respective term in a predetermined vector vocabulary
  • Determining, for each respective occupation of the multiple occupations in the occupation taxonomy, a confidence score that is indicative of whether the query vector is correctly classified in the respective occupation
  • Selecting the particular occupation that is associated with the highest confidence score
  • Obtaining one or more job postings using the selected occupation, and providing the obtained job postings in a set of search results in response to the search query
Feature Weights for Terms in Vector Vocabularies

It sounds like Google is trying to understand job position titles and how they may be connected, developing a vector vocabulary, and building ontologies of related positions.

A feature weight may be based on:

  • A term frequency determined from the number of occurrences of each term in the job title of the training data item
  • An inverse occupation frequency determined based on the number of occupations in the occupation taxonomy in which each respective term in the job title of the respective training data item is present
  • An occupation derivative based on a density of each respective term in the job title of the respective training data item across each of the respective occupations in the occupation taxonomy
  • Both (i) a second value representing the inverse occupation frequency and (ii) a third value representing the occupation derivative
  • A sum of (i) the second value representing the inverse occupation frequency, and (ii) one-third of the third value representing the occupation derivative
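One hedged reading of how those values might combine, with the "density" term interpreted as the share of occupations containing the term. The patent implies, but does not spell out, an exact formula, so treat this as a sketch:

```python
import math

def feature_weight(term, title_terms, taxonomy_titles):
    """Assumed combination: term frequency scaled by
    (inverse occupation frequency + occupation derivative / 3)."""
    tf = title_terms.count(term)                 # term frequency in the job title
    n_occ = len(taxonomy_titles)
    occ_with_term = sum(
        any(term in title for title in titles)
        for titles in taxonomy_titles.values()
    )
    iof = math.log(n_occ / (1 + occ_with_term))  # inverse occupation frequency
    density = occ_with_term / n_occ              # "occupation derivative" as cross-occupation density
    return tf * (iof + density / 3)

# Hypothetical taxonomy: occupation -> list of tokenized job titles
tax = {"attorney": [["patent", "attorney"]], "hairdresser": [["barber"]]}
print(feature_weight("patent", ["patent", "attorney"], tax))
```

A term that appears in many occupations gets a low (even negative) inverse occupation frequency, so distinctive title terms end up carrying more weight, which matches the TF-IDF-like intuition behind the listed values.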

The predetermined vector vocabulary may include terms that are present in training data items stored in a text corpus, and terms that are not present in at least one training data item stored in the text corpus.

This job search engine patent can be found at:

Search Engine
Inventors: Ye Tian, Seyed Reza Mir Ghaderi, Xuejun Tao, Matthew Courtney, Pei-Chun Chen, and Christian Posse
Assignee: Google LLC
US Patent: 10,643,183
Granted: May 5, 2020
Filed: October 18, 2016

Abstract

Methods, systems, and apparatus, including computer programs encoded on storage devices, for performing a job opportunity search. In one aspect, a system includes a data processing apparatus, and a computer-readable storage device having stored thereon instructions that, when executed by the data processing apparatus, cause the data processing apparatus to perform operations.

The operations include defining a vector vocabulary, defining an occupation taxonomy that includes multiple different occupations, obtaining multiple labeled training data items, wherein each labeled training data item is associated with at least (i) a job title, and (ii) an occupation, generating, for each of the respective labeled training data items, an occupation vector that includes a feature weight for each respective term in the vector vocabulary and associating each respective occupation vector with an occupation in the occupation taxonomy based on the occupation of the labeled training data item used to generate the occupation vector.

The Job Identification Model

Job identification model

Job postings from many different sources may be related to one or more occupations.

An occupation may include a particular category that encompasses one or more job titles that describe the same profession.

Two or more of the obtained job postings may be related to the same, or substantially similar, occupation while using different terminology to describe a job title for each of the two or more particular job postings.

Such differences in the terminology used to describe a particular job title of a job posting may arise for a variety of different reasons:

  • Different people from different employers draft each respective job posting
  • Unique job titles may be based on the culture of the employer’s company, the employer’s marketing strategy, or the like

occupation taxonomy

How a Job Identification Model May Work

An example:

1. A first hair salon, marketed as a rugged barbershop, advertises a job posting for a “barber”
2. A second hair salon, marketed as a trendy beauty salon, advertises a job posting for a “stylist”
3. Both job postings seek a person for the occupation of a “hairdresser” who cuts and styles hair
4. In a search system limited to keyword-based searching, a searcher seeking job opportunities for a “hairdresser” who searches using the term “barber” may not receive available job postings for a “stylist,” “hairdresser,” or the like if those job postings do not include the term “barber”
5. The process in this patent uses a job identification model that seeks to address this problem

The job occupation model includes:

  • A classification unit
  • An occupation taxonomy

The occupation taxonomy associates known job titles from existing job posts with one or more particular occupations.

During training, the job identification model associates each occupation vector that was generated for an obtained job posting with an occupation in the occupation taxonomy.

The classification unit may receive the search query and generate a query vector.

The classification unit may access the occupation taxonomy and calculate, for each particular occupation in the occupation taxonomy, a confidence score that is indicative of the likelihood that the query vector is properly classified into that occupation.

Then, the classification unit may select the occupation associated with the highest confidence score as the occupation related to the query vector, and provide the selected occupation to the job identification model.

An Example of a Search Under this Job Opportunities Search Engine:

1. A searcher queries “Software Guru” into a search box
2. The search query may be received by the job identification model
3. The job identification model provides an input to the classification unit including the query
4. The classification unit generates a query vector
5. The classification unit analyzes the query vector against the one or more occupation vectors that were generated and associated with each particular occupation in the occupation taxonomy
6. The classification unit may then determine that the query vector is associated with a particular occupation based on a calculated confidence score, and select that particular occupation
7. The job identification model may receive the particular occupation from the classification unit
8. Alternatively, or in addition, the output from the classification unit may include a confidence score that indicates the likelihood that the query vector is related to the occupation output by the occupation taxonomy
9. The occupation output from the occupation taxonomy can be used to retrieve relevant job postings
10. Specifically, given the output of a particular occupation, the job identification model can retrieve one or more job postings using a job posting index that stores references to job postings based on occupation type
11. The references to job postings that were identified using the job posting index are returned to the user device
12. The obtained references to job postings may be displayed on the graphical user interface
13. The obtained references to job postings may be presented as search results and include references to job postings for a “Senior Programmer,” a “Software Engineer,” a “Software Ninja,” or the like
14. The job postings included in the search results were determined to be responsive to the search query “Software Guru” based at least in part on the vector analysis of the query vector and one or more occupation vectors used to train the occupation taxonomy, and not merely on keyword searching alone

Takeaways About this Job Search Engine

In addition to the details above, the patent tells us how an occupation taxonomy may be trained using training data. It also provides more details about the job identification model, and then tells us how a job search is performed using that model.

I mentioned above that this job search engine patent and the application search engine patent use methods that we may see in other search verticals at Google. I have written about one approach that could be used in organic search in the post Google Using Website Representation Vectors to Classify with Expertise and Authority.

Another one of those may involve image searching at Google. I’ve written about Google Image Search Labels Becoming More Semantic?

I will be posting more soon about how Google Image search is using neural networks to categorize and cluster images to return in search results.


The post Job Search Engine Using Occupation Vectors appeared first on SEO by the Sea ⚓.


SEO by the Sea ⚓


COVID-19 has altered paid search: How marketers can adjust strategies

May 30, 2020 No Comments

30-second summary:

  • Since shelter-in-place rules were enacted, the way people use the internet has changed. They’re consuming more media and increasing web research and browsing.
  • Paid search strategy is not one-size-fits-all. Each vertical must be treated differently, as some industries like ecommerce have seen improved performance while others have seen declining performance.
  • A pandemic is not the time to cut ad budget. Instead, investing in advertising now should pay dividends when the market normalizes.
  • Ensure your ad copy is appropriate for the landscape. That means going back to campaigns that started before the pandemic to update any language that no longer applies.
  • Marketers must stay flexible and agile during this time, monitor what is and isn’t working, and create a quick plan to adjust.

When COVID-19 began spreading across the U.S., marketers scrambled to figure out how to respond. Sudden work-from-home mandates, cancelled business trips, postponed conferences and frozen budgets threw a wrench into usual expectations and plans. Users’ needs and online behaviours have changed in tandem, forcing marketers to meet them on their new terms.

Search is more important than ever now because people are spending almost all of their time at home and online, consuming media, researching, browsing and shopping. According to Forbes, total internet hits have surged by 50% to 70% with people under lockdown, while 32% of people say they are spending longer on social media. Hours spent in non-gaming apps are up as people turn to TikTok, WhatsApp, Instagram and Twitter to keep entertained, connected and informed. To stay relevant in these turbulent times, it’s imperative that marketers maintain their paid search presence while adjusting to the needs of the moment.

Vary strategy by vertical

While no industry is immune from the impact of coronavirus, businesses are affected differently and should adapt their paid search strategies accordingly. Industries like B2B and ecommerce have seen improved performance, while industries like travel and healthcare have struggled with poor results.

The fact that healthcare is struggling may seem paradoxical, given the overwhelming need for healthcare services right now. While hospitals are busy with COVID-19 patients, people who don’t have the virus are avoiding medical centres, hospitals, and non-essential medical services like bariatric surgery and physical therapy.

Users are shifting their searches for their healthcare needs. Notably, people under shelter-in-place orders are seeking to receive care while staying in their homes. eMarketer published data from CivicScience which found that between February and March 2020, the number of U.S. adults who reported intent to use telemedicine rose from 18% to 30%. As a result, healthcare providers have to switch their offerings – along with their messaging – to emphasize virtual and telehealth services. The same is true for many restaurants as they pivot to pick-up or delivery only.

The situation is different for B2B companies

The situation is different for B2B companies, which have longer sales cycles. While businesses like restaurants are worried about running out of money now, B2B companies are concerned about how they’ll fare months and, in some cases, years from now. The instinct may be to cut down on marketing budgets to save money, but extreme changes in paid search strategies can have long-lasting effects on performance. During this time, it’s important B2B companies continue filling the funnel and building brand awareness to alleviate large sales gaps that can occur later in the year.

Financial service-related searches are surging

Financial service-related searches are surging right now as people explore their options for economic relief like loans. Many companies in this space are smartly increasing their ad spending and shifting the bulk of it toward campaigns that push their best performing service lines. The same is true for ecommerce companies, especially those that sell household products and cleaning supplies, loungewear, cooking equipment, workout gear and entertainment items like board games and puzzles. Shares of Hasbro, for instance, have soared. For these companies, the adjustment is less about the offerings and more about the messaging.

Don’t stop advertising when times are tough

There are universal principles for how to optimize paid search strategies that apply to marketers in every industry. The first is not to neglect paid search, even during difficult times. The World Federation of Advertisers (WFA) recently ran a survey which found that 81% of large advertisers have deferred planned ad campaigns and cut budgets due to the coronavirus pandemic. Of those surveyed, 57% said they had decreased budgets greatly or somewhat due to the virus outbreak; however, cutting out advertising or marketing completely can make the road to recovery more challenging.

Experts advise not to stop advertising during a downturn. Evidence from recent economic downturns like the 2008 housing crash shows that companies come out stronger in the end if they continue investing in brand awareness. According to Google, “Even in categories where consumers have pulled back spending right now, creating a branding impact now will have a halo and pay dividends when the market normalizes. Research and historical examples of economic downturn have shown this to work.” It’s important to keep investing in your brand and branded keywords, regardless of industry. The last thing an organization wants is competitors monetizing on branded search results.

Every cent counts these days. Not only is paid search cost-effective with a low barrier to entry, but it also enables companies to be extremely agile. A company can get a campaign up and running pretty quickly, run tests, collect data and easily alter the messaging as things change day-to-day. Marketers can also see the results of engagement, click-through rates and conversions in real time, so they know whether their investment is paying off. COVID-19 is an unprecedented situation, so testing and learning are critical during this volatile time in the market.

Best practices for paid search

For any marketer thinking about how to adjust during COVID-19, here are a few best practices for how to optimize paid search.

1. Pivot messaging

Messaging needs to be both accurate and appropriate for the current landscape. Confirm that messaging is updated with current business hours and offerings, and revise CTAs away from messages like “Visit in-store.”

2. Keep an eye on the tone of messaging

Is your copy appropriate and empathetic? An ad for booking a vacation package could feel out-of-touch. Customers will be turned off by companies that seem like they are trying to profit from the pandemic, so craft communication to focus more on brand identity and values. Businesses can also use marketing to let customers know how they are responding to the pandemic. A construction firm or ecommerce company could talk about safety practices for workers, for example.

3. Adapt offerings to what your customers need

As mentioned above, healthcare companies are moving to telehealth, restaurants are moving to pick-up and delivery, and B2B companies are repurposing content planned for conferences into virtual webinars. Marketers should be connecting with customers virtually to let them know how they are being supported.

4. Adapt your strategy to your customers’ changing digital behaviour

During the quarantine, desktop usage has increased. Conversely, the rise of remote work and people being less on-the-go has caused mobile search traffic to decline by nearly 25%. We’ve all become accustomed to a mobile-first world, but given the predominance of desktop, it’s especially important to ensure all search ads and landing pages are optimized for both mobile and desktop.

Move fast

This pandemic has thrown so much of what used to be normal out of the window. Whereas before, marketers might have used a multiphase process for developing campaigns that involved planning, back-and-forth, and feedback, now they have to act fast to keep up with the rapidly changing world. Marketers need to craft campaigns that are affordable, cost-effective and agile – and that means paid search.

As marketing and advertising professionals, we’re all trying to figure this out together as we go. There is no roadmap or rules, but there’s no doubt that staying flexible and using this time to connect with customers is a smart strategy.

Brianna Desmet is a media specialist at digital and demand gen agency R2i.

The post COVID-19 has altered paid search: How marketers can adjust strategies appeared first on Search Engine Watch.

Search Engine Watch