Monthly Archives: September 2017
Navneet Panda, after whom the Google Panda update is named, is co-inventor of a newly granted patent that focuses on site quality scores. The patent is worth studying to understand how Google might determine the quality of sites.
Back in 2013, I wrote the post Google Scoring Gibberish Content to Demote Pages in Rankings, about Google using ngrams from sites and building language models from them to determine if those sites were filled with gibberish, or spammy content. I was reminded of that post when I read this patent.
Rather than explain ngrams again in this post (I covered them in the gibberish post), I'll point to an example of ngrams at the Google Ngram Viewer, which lets you explore phrases Google has indexed from scanned books. This Wired article also focuses on ngrams: The Pitfalls of Using Google Ngram to Study Language.
An ngram phrase can be a 2-gram, a 3-gram, a 4-gram, or a 5-gram: pages are broken down into two-word, three-word, four-word, or five-word phrases. If a body of pages is broken down into ngrams, those ngrams can be used to build language models or phrase models, which can then be compared against other pages.
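To make this concrete, here is a minimal sketch of ngram extraction in Python. The function name and example sentence are my own, not from the patent:

```python
def ngrams(text, n):
    """Split text into overlapping n-word phrases (ngrams)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# A six-word sentence yields five 2-grams:
print(ngrams("predicting site quality from phrase models", 2))
# ['predicting site', 'site quality', 'quality from', 'from phrase', 'phrase models']
```

The same function with `n=3` through `n=5` produces the longer phrases the patent mentions.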
Language models like the ones Google used to create gibberish scores for sites could also be used to determine the quality of sites, if example sites were used to generate those language models. That seems to be the idea behind the patent granted this week. The summary section of the patent tells us about the process it describes and protects:
In general, one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of obtaining baseline site quality scores for a plurality of previously-stored sites; generating a phrase model for a plurality of sites including the plurality of previously-scored sites, wherein the phrase model defines a mapping from phrase-specific relative frequency measures to phrase-specific baseline site quality scores; for a new site, the new site not being one of the plurality of previously-scored sites, obtaining a relative frequency measure for each of a plurality of phrases in the new site; determining an aggregate site quality score for the new site from the phrase model using the relative frequency measures of the plurality of phrases in the new site; and determining a predicted site quality score for the new site from the aggregate site quality score.
The newly granted patent from Google is:
Predicting site quality
Inventors: Navneet Panda and Yun Zhou
US Patent: 9,767,157
Granted: September 19, 2017
Filed: March 15, 2013
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for predicting a measure of quality for a site, e.g., a web site. In some implementations, the methods include obtaining baseline site quality scores for multiple previously scored sites; generating a phrase model for multiple sites including the previously scored sites, wherein the phrase model defines a mapping from phrase specific relative frequency measures to phrase specific baseline site quality scores; for a new site that is not one of the previously scored sites, obtaining a relative frequency measure for each of a plurality of phrases in the new site; determining an aggregate site quality score for the new site from the phrase model using the relative frequency measures of phrases in the new site; and determining a predicted site quality score for the new site from the aggregate site quality score.
In addition to generating ngrams from the text on sites, some implementations of the patent also generate ngrams from the anchor text of links pointing to pages of those sites. Building a phrase model involves calculating the relative frequency of each n-gram on a site, "based on the count of pages divided by the number of pages on the site"; that is, the fraction of the site's pages on which the phrase appears.
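That relative frequency measure can be sketched in a few lines of Python. This is my own illustrative reading of the patent's wording (a phrase is counted once per page it appears on, then divided by the site's page count), not Google's implementation:

```python
from collections import Counter

def relative_frequencies(pages, n=2):
    """For each n-gram, the fraction of a site's pages containing it:
    the count of pages the phrase appears on, divided by the total
    number of pages on the site, per the patent's description."""
    page_counts = Counter()
    for page in pages:
        words = page.lower().split()
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        page_counts.update(grams)  # each phrase counted once per page
    return {gram: count / len(pages) for gram, count in page_counts.items()}
```

A phrase appearing on every page of a site gets a relative frequency of 1.0; one appearing on half the pages gets 0.5.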
The patent tells us that site quality scores can affect the rankings of pages from those sites:
Obtain baseline site quality scores for a number of previously-scored sites. The baseline site quality scores are scores used by the system, e.g., by a ranking engine of the system, as signals, among other signals, to rank search results. In some implementations, the baseline scores are determined by a backend process that may be expensive in terms of time or computing resources, or by a process that may not be applicable to all sites. For these or other reasons, baseline site quality scores are not available for all sites.
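Putting the pieces together, the phrase model maps phrase-specific relative frequencies to baseline quality scores, and a new site's aggregate score is derived from the scores of its phrases. The sketch below is a loose guess at how that mapping and aggregation might look; the frequency bucketing and simple averaging are my assumptions, since the patent leaves the exact aggregation open:

```python
def predict_site_quality(phrase_model, site_freqs, default=0.5):
    """Map each of a new site's phrases (keyed with its frequency bucket)
    to the average baseline quality score observed for previously-scored
    sites, then average those per-phrase scores into one aggregate score.
    Bucketing and plain averaging are illustrative assumptions."""
    scores = []
    for phrase, freq in site_freqs.items():
        bucket = round(freq, 1)  # coarse frequency bucket (an assumption)
        scores.append(phrase_model.get((phrase, bucket), default))
    return sum(scores) / len(scores) if scores else default
```

Under this reading, a site whose phrase/frequency pairs resemble those of high-scoring example sites inherits a high predicted score, which could then feed into ranking as one signal among many.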
Copyright © 2017 SEO by the Sea ⚓.
The post Using Ngram Phrase Models to Generate Site Quality Scores appeared first on SEO by the Sea ⚓.
Researchers have been rushing to tease apart how Zika went from tepid to toxic, and they may have found the genetic tweak to blame.
Feed: All Latest
Facebook has made no secret of its wish to do more in the online recruitment market — encroaching on territory today dominated by LinkedIn, the leader in tapping social networking graphs to boost job-hunting. Today, Facebook is taking the next step in that process. Facebook will now integrate with ZipRecruiter — an aggregator that allows those looking to fill jobs to post ads to… Read More
Social – TechCrunch
It seems odd that millennials get all of the marketing column inches when the over 50s have all of the cash.
Read more at PPCHero.com
This month’s Brighton SEO delegates all hoped for Google’s Gary Illyes to enlighten them on the major talking points in search this year. They weren’t disappointed.
Google algorithm updates are frequently on the minds of SEOs and webmasters, and have been a hot topic for years. We are always on tenterhooks, waiting for the next change that could damage our site’s rankings.
We are never able to rest, always at risk of being penalized by the next animal to enter Google’s zoo of updates.
Past assumptions about Google Fred
Back on March 7th 2017, many webmasters reported unexpected fluctuations to rankings. The name Google Fred then began to circulate, following a chat on Twitter between Barry Schwartz and Google’s Gary Illyes where Gary joked about future updates being named Fred.
sure! From now on every update, unless otherwise stated, shall be called Fred
— Gary "鯨理" Illyes (@methode) March 9, 2017
We safely assumed there was an adjustment to the algorithm as Google confirmed there are updates happening every day. As usual, Google did not confirm any details about this particular update, but analysis of affected sites suggested it focused on poor quality content sites that were benefiting from monetization tactics.
As this update felt larger than the normal day-to-day algorithm changes, it seemed only natural it should be worthy of a name. As a result, the name “Google Fred” officially stuck, despite Gary Illyes intending his tongue-in-cheek comment to refer to all future updates.
So how can we tell the difference between the Fred update in March and other updates?
What is Google Fred, really?
In a Q&A session at September’s Brighton SEO, Google Fred was brought up once again, and we got the final word on Fred from Gary Illyes himself. Here’s what Fred’s creator had to say:
Interviewer: Let’s talk about Fred.
Gary Illyes: Who?
Interviewer: You are the person that created Fred. So Fred is basically an algo that…
Gary Illyes: It’s not one algo, it’s all the algos.
Interviewer: So you can confirm it’s not a single algo – it’s a whole umbrella of a bunch of different changes and updates that everyone has just kind of put under this umbrella of “Fred”.
Gary Illyes: Right, so the story behind Fred is that basically I’m an asshole on Twitter. And I’m also very sarcastic which is usually a very bad combination. And Barry Schwartz, because who else, was asking me about some update that we did to the search algorithm.
And I don’t know if you know, but in average we do three or two to three updates to the search algorithm, ranking algorithm every single day. So usually our response to Barry is that sure, it’s very likely there was an update. But that day I felt even more sarcastic than I actually am, and I had to tell him that.
Oh, he was begging me practically for a name for the algorithm or update, because he likes Panda or Penguin and what’s the new one. Pork, owl, shit like that. And I just told him that, you know what, from now on every single update that we make – unless we say otherwise – will be called Fred; every single one of them.
Interviewer: So now we’re in a perpetual state of Freds?
Gary Illyes: Correct. Basically every single update that we make is a Fred. I don’t like, or I was sarcastic because I don’t like that people are focusing on this.
Every single update that we make is around quality of the site or general quality, perceived quality of the site, content and the links or whatever. All these are in the Webmaster Guidelines. When there’s something that is not in line with our Webmaster Guidelines, or we change an algorithm that modifies the Webmaster Guidelines, then we update the Webmaster Guidelines as well.
Or we publish something like a Penguin algorithm, or work with journalists like you to publish, throw them something like they did with Panda.
Interviewer: So for all these one to two updates a day, when webmasters go on and see their rankings go up or down, how many of those changes are actually actionable? Can webmasters actually take something away from that, or is it just under the generic and for the quality of your site?
Gary Illyes: I would say that for the vast majority, and I’m talking about probably over 95%, 98% of the launches are not actionable for webmasters. And that’s because we may change, for example, which keywords from the page we pick up because we see, let’s say, that people in a certain region put up the content differently and we want to adapt to that.
Basically, if you publish high quality content that is highly cited on the internet – and I’m not talking about just links, but also mentions on social networks and people talking about your branding, crap like that.
Then, I shouldn’t have said that right? Then you are doing great. And fluctuations will always happen to your traffic. We can’t help that; it would be really weird if there wasn’t fluctuation, because that would mean we don’t change, we don’t improve our search results anymore.
(Transcript has been lightly edited for clarity)
So there we have it: every update is a Fred unless otherwise stated. The ranking drops in March may well have been triggered by the "original" Fred update, but by this definition so is every other fluctuation: they are all Fred.
How can we optimize for Fred?
Gary says that 95-98% of updates are not actionable for webmasters. With two to three updates a day, that adds up to somewhere between 700 and 1,100 updates a year. So what do we do?
The answer is simple – do what you were doing before. Build great websites, build your brand and produce high quality content aimed to satisfy the needs of searchers whilst adhering to the Webmaster Guidelines.
As Simon Ensor wrote in his recent article on the SEO industry and its sweeping statements, SEOs shouldn’t fear algorithm updates from Google:
“Many may complain that Google moves the goalposts but in actual fact, the fundamentals remain the same. Avoiding manipulative behavior, staying relevant, developing authority and thinking about your users are four simple factors that will go a long way to keeping you on the straight and narrow.
The Google updates are inevitable. Techniques will evolve, and results will require some hard graft. Every campaign is different, but if you stick to the core principles of white-hat SEO, you need not take notice of the sweeping statements that abound in our corner of the marketing world. Nor should you have to fear future Google updates.”
What does it mean for SEOs?
Sage advice aside, this explanation from Gary Illyes may still leave SEOs feeling slightly frustrated. We appreciate that not every small update warrants a name or set of webmaster guidelines, but we still have a job to do and a changeable industry to make sense of.
We have stakeholders and clients to whom we must explain ranking fluctuations. It doesn't help us to sweep every update under the umbrella of "Fred".
Of course, we would find it really useful if each major update came with clear guidelines immediately, rather than leaving us in the dark for days, trying to work out what changed and stabilize our rankings.
But maybe, as Gary may have been hinting, where would the fun be if it were that simple?
To read the full transcript of the Q&A with Gary Illyes or watch a recording of the interview, check out this blog post by iThinkMedia.
Microsoft today announced that it has joined the Open Source Initiative (OSI) as a Premium Sponsor. The OSI, which launched in 1998, takes a relatively pragmatic approach to open source and advocates for open source in business and government. The OSI also reviews open source licenses, which are often vendor specific, to ensure that they conform to “community norms and… Read More
Enterprise – TechCrunch
With some help from Luminar and its 22-year-old CEO Austin Russell.
Feed: All Latest
Late yesterday Twitter announced the unthinkable for time-pressed journalists everywhere by revealing it is testing doubling the character length of tweets. Farewell sweet brevity of 140chars. Hello pointless extra blah-blah-blah #280. Read More
Social – TechCrunch
Pinterest today announced a partnership with Target to integrate its visual search technology known as Lens into Target’s apps, and later its desktop website. The deal, which represents the first time the company has integrated Lens into a retailer’s app, will allow Target shoppers to snap a photo of any product, and then find similar items available for sale at Target. The… Read More