More than 500 engineers, designers, managers and other staff from Google offices around the world have signed an open letter on Medium.com calling on Google to drop its Dragonfly project and for company leaders to “commit to transparency, clear communication, and real accountability.”
Google’s Project Dragonfly first came to light in August via The Intercept and its reports on leaked documents containing details about the project. In a nutshell, the project looked to be a fresh attempt by Google to re-enter the Chinese search market, from which it has been blocked by the state since 2010.
Dragonfly was speculated to be taking the shape of a mobile search engine. This is unsurprising, as more than 97% of internet users in China go online via mobile devices (according to CNNIC) and it is in this vertical that the industry is seeing most disruption from companies such as Shenma, Sogou and Haosou.
What was surprising were the subsequent leaked details from The Intercept and other sources hinting that Dragonfly users would be forced to sign in to make searches, with their IP addresses and phone numbers linked to their activity. It is also expected that the service would be censored in order to adhere to the country’s ‘cyber sovereignty’ laws, and that an as-yet-unknown domestic partner would also have access to this customer data.
Consequently, there was significant uproar among many Google employees. On August 20th, senior research scientist Jack Poulson left the company and published his lengthy resignation letter online. ‘I believe that Google is largely composed of altruistic employees,’ he wrote. ‘But, due to my conviction that dissent is fundamental to functioning democracies, I am forced to resign in order to avoid contributing to, or profiting from, the erosion of protections for dissidents.’
A dangerous precedent at a volatile political moment
The latest open letter published on November 27th echoes much of the sentiment expressed by Poulson in his resignation document. ‘We are Google employees and we join Amnesty International in calling on Google to cancel project Dragonfly, Google’s effort to create a censored search engine for the Chinese market that enables state surveillance,’ it says.
‘Our opposition to Dragonfly is not about China: we object to technologies that aid the powerful in oppressing the vulnerable, wherever they may be. The Chinese government certainly isn’t alone in its readiness to stifle freedom of expression, and to use surveillance to repress dissent. Dragonfly in China would establish a dangerous precedent at a volatile political moment, one that would make it harder for Google to deny other countries similar concessions.’
The post also describes 2018 as a year of disappointment for Google employees. It references Project Maven (where Google assisted with the development of AI for US military drones), which saw protests and resignations, as well as the Rubingate scandal, in which Android co-founder Andy Rubin was given ‘a hero’s farewell’ and a $90m exit package after claims of sexual misconduct were made against him, ultimately leading to a global synchronized walkout on November 1st.
Official responses from Google have not satisfied critics
Google has yet to address the direct requests detailed in this latest open letter. At a Q&A session during the WIRED 25 Summit in October, Google CEO Sundar Pichai sang the praises of the technical achievements of Dragonfly during recent tests. He also justified Google’s position by saying: ‘We are compelled by our mission [to] provide information to everyone, and [China is] 20 percent of the world’s population.’
Pichai also made reference to Google’s stated position on China back in 2010, when it decided it would no longer censor its SERPs at google.cn, essentially putting an end to its operations in the country (this is regarded as the preferable position among employees signing the latest open letter). He stated it was time to ‘re-evaluate that choice’ and, when asked about employees who were critical of such changes in company policy, remarked, ‘we don’t run the company by holding referendums.’
This letter arguably hasn’t yet captured as many headlines as the walkout earlier this month. There are a number of likely reasons for this, the biggest being that Dragonfly isn’t yet a tangible product that we can see. It is understandably hard to get the public and mainstream press behind a campaign criticising a product that doesn’t even have a launch date.
The ethical issues with Dragonfly are also less black and white than those raised with Project Maven and Rubingate earlier in the year. As one user asks in the comments under the letter: ‘If Google drops dragonfly, the Chinese netizens will end up using “Baidu”, is it better or worse for the netizens of China?’ Another user argues: ‘The debate here really is about sacrificing the western pride and values surrounding censorship in favour of enabling a billion people that are already censored to the freedoms of information the west has.’
These positions might be surprising to some of the Google employees who have signed the letter or left the company, but they are in keeping with data published at The Drum showing that more than 72% of Weibo users (one of the country’s leading microblogging sites) would choose Google over Baidu et al. if it were to launch its new service. There is something to be said for Google’s potential to disrupt Baidu’s monopoly in the country, and to deliver better-quality results, if not less censored ones.
The signees, of course, are not alone. They are joined by Amnesty International (and a number of other organisations), which argues that Dragonfly will endanger the human rights defenders and journalists who might use it. The number of employees signing the letter is growing, and so, it seems, is the number of organisations opposing Dragonfly.
Many are now looking to a congressional hearing on December 5th, where Pichai is set to defend Google against accusations of bias in its algorithm. The subject of China and Dragonfly will probably come up there too, and Pichai may face his toughest round of questioning yet. As things stand, Google re-entering the Chinese search market looks increasingly likely. We can reasonably expect that the closer we get to that point, the more people will join the signees of this latest letter, and the more heated the debate will become.
The post Dragonfly: 500+ staff sign open letter for Google to drop new Chinese search engine appeared first on Search Engine Watch.
But don’t expect a flawless experience — it takes multiple steps. After updating the Google Assistant app on iOS, you need to open the app to set up a new Siri Shortcut for Google Assistant.
As the name suggests, Siri Shortcuts lets you record custom phrases to launch specific apps or features. For instance, you can create Siri Shortcuts to play your favorite playlist, launch directions to a specific place, text someone and more. If you want to chain multiple actions together, you can even build complex multi-step workflows using Apple’s Shortcuts app.
By default, Google suggests the phrase “OK Google.” You can choose something shorter, or something else entirely, such as “Hey Google.” After setting that up, you can summon Siri and use this custom phrase to launch Google’s app.
You may need to unlock your iPhone or iPad to let iOS open the app. The Google Assistant app then automatically listens to your query. Again, you need to pause and wait for the app to appear before saying your query.
This is quite a cumbersome workaround, and I’m not sure many people are going to use it. But the fact that “Hey Siri, OK Google” exists is still very funny.
On another note, Google Assistant is still the worst when it comes to your privacy. The app pushes you to enable “web & app activity,” the infamous all-encompassing privacy destroyer. If you activate that setting, Google will collect your search history, your Chrome browsing history, your location, your credit card purchases and more.
It’s a great example of dark pattern design. If you haven’t enabled web & app activity, there’s a flashy blue banner at the bottom of the app that tells you that you can “unlock more Assistant features.”
When you tap it, you get a cute little animated drawing to distract you from the text. There’s only one button, which says “More.” If you tap it, the “More” button becomes “Turn on,” and many people are not even going to see “No thanks” on the bottom left.
It’s a classic persuasion method. If somebody asks you multiple questions and you say yes every time, you’ll tend to say yes to the last question even if you don’t agree with it. You tapped on “Get started” and “More” so you want to tap on the same button one more time. If you say no, Google asks you one more time if you’re 100 percent sure.
So make sure you read everything and you understand that you’re making a privacy trade-off by using Google Assistant.
Greene took over the position almost exactly three years ago when Google bought Bebop, the startup she was running. The thinking at the time was that the company needed someone with a strong enterprise background and Greene, who helped launch VMware, certainly had the enterprise credentials they were looking for.
In the blog post announcing the transition, she trumpeted her accomplishments. “The Google Cloud team has accomplished amazing things over the last three years, and I’m proud to have been a part of this transformative work. We have moved Google Cloud from having only two significant customers and a collection of startups to having major Fortune 1000 enterprises betting their future on Google Cloud, something we should accept as a great compliment as well as a huge responsibility,” she wrote.
The company had a disparate set of cloud services when she took over, and one of the first things Greene did was to put them all under a single Google Cloud umbrella. “We’ve built a strong business together — set up by integrating sales, marketing, Google Cloud Platform (GCP), and Google Apps/G Suite into what is now called Google Cloud,” she wrote in the blog post.
As for Kurian, he stepped down as president of product development at Oracle at the end of September. He had announced a leave of absence earlier in the month before making the exit permanent. Like Greene before him, he brings a level of enterprise street cred, which the company needs as it continues to try to grow its cloud business.
After three years with Greene at the helm, Google, which has tried to position itself as the more open cloud alternative to Microsoft and Amazon, has still struggled to gain market share against its competitors, remaining under 10 percent consistently throughout Greene’s tenure.
As Synergy’s John Dinsdale told TechCrunch in an article on Google Cloud’s strategy in 2017, the company had not been particularly strong in the enterprise to that point. “The issues of course are around it being late to market and the perception that Google isn’t strong in the enterprise. Until recently Google never gave the impression (through words or deeds) that cloud services were really important to it. It is now trying to make up for lost ground, but AWS and Microsoft are streets ahead,” Dinsdale explained at the time. Greene was trying hard to change that perception.
Holger Mueller, an analyst at Constellation Research, says Greene was able to shift the focus more toward the enterprise, but he likes what Kurian brings to the table, even if it will take a bit of a cultural shift after his many years at Oracle. “What Greene did not address has been how to tie the product portfolio of Google’s autonomous and disparate development teams together. Kurian is a great fit for that job, having led 35k+ developers at Oracle, ending the trench warfare between product teams and divisions that has plagued Oracle a decade ago,” Mueller explained.
Google has not released many revenue numbers related to the cloud, but in February it indicated it was earning a billion dollars a quarter, a number that Greene felt put Google in elite company. Amazon and Microsoft were reporting numbers like that for a quarter at the time. Google stopped reporting cloud revenue after that report.
Regardless, the company will turn to Kurian to continue growing those numbers now. “I will continue as CEO through January, working with Thomas to ensure a smooth transition. I will remain a Director on the Alphabet board,” Greene wrote in her blog post.
Interestingly enough, Oracle has struggled with its own transition to the cloud. Kurian gets a company that was born in the cloud, rather than one that has made a transition from on-prem software and hardware to one solely in the cloud. It will be up to him to steer Google Cloud moving forward.
Over the past two years, the notch moved from anomaly to fact of life, and no company has proven itself more pro-notch than Google. From its embrace of #notchlife in Android Pie to the downright gigantic one found up top on the Pixel 3 XL, Google’s really notchin’ it up.
In fact, as noted by Android Police, the Pixel 3 XL has a notch so nice, Google’s delivering it twice. A number of owners have reported an admittedly hilarious bug that’s causing the massive handset to double up on the notch, with a second cutout appearing on the side of the device.
Google has acknowledged (acknotchleged?) the issue and noted that it’s working on a fix, which should be coming soon. The company hasn’t offered a reason behind the issue, but it appears to stem from Pie’s built-in notch feature, and likely has something to do with how the background adjusts when the handset changes from portrait to landscape mode.
It seems even in 2018, that’s a notch too far.
Yesterday, while I was having a blast reading “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” I happened across some fun facts.
We got into some of the more technical goods from the paper yesterday, but figured these would also be a worthwhile (or at least more enjoyable) read. Friday and all.
1. “Wow, you looked at a lot of pages from my web site. How did you like it?” – people encountering a crawler for the first time
They note that they received almost daily emails from people either concerned about copyright issues or asking if they liked the site after looking at it. For many people with web pages, this was one of the first crawlers they had seen.
“It turns out that running a crawler which connects to more than half a million servers, and generates tens of millions of log entries generates a fair amount of email and phone calls. Because of the vast number of people coming on line, there are always those who do not know what a crawler is, because this is the first one they have seen. Almost daily, we receive an email something like, “Wow, you looked at a lot of pages from my web site. How did you like it?” There are also some people who do not know about the robots exclusion protocol, and think their page should be protected from indexing by a statement like, “This page is copyrighted and should not be indexed.”
More innocent times.
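The answer to the “This page is copyrighted” confusion was, and still is, a robots.txt file rather than a notice on the page itself. Python’s standard library ships a parser for the robots exclusion protocol, so a quick sketch of how a well-behaved crawler checks it might look like this. The rules, the bot name and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt: block every crawler from the /private/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules directly, no network fetch

for url in ("https://example.com/private/page.html", "https://example.com/index.html"):
    allowed = parser.can_fetch("ExampleBot", url)
    print(url, "->", "crawl" if allowed else "skip")
```

A copyright line inside the HTML never enters into this check; only the rules the crawler reads from robots.txt do.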
2. A billion web documents predicted by 2000
“It is foreseeable that by the year 2000, a comprehensive index of the Web will contain over a billion documents. . . The goal of our system is to address many of the problems, both in quality and scalability, introduced by scaling search engine technology to such extraordinary numbers.”
Now in 2018, there are reportedly 130 trillion documents on the web — an extraordinary number indeed. And sure enough, their search has scaled to meet it.
3. Google took up 55 GB of storage
“The total of all the data used by the search engine requires a comparable amount of storage, about 55 GB.”
Today, Google’s codebase runs to some 2 billion lines of code; as one of its engineering managers noted in 2016, that repository contains 86TB of data.
4. “People are still only willing to look at the first few tens of results.”
Please note: “tens.”
They write about the need for more precision in search. Remember the days when people regularly clicked past page 1?
5. Percentage of .com domains: from 1.5 to 60, to now 46.5
They note how “commercialized” the web was already becoming, leaving search engine technology “to be largely a black art and to be advertising oriented.”
“The Web has also become increasingly commercial over time. In 1993, 1.5% of web servers were on .com domains. This number grew to over 60% in 1997.”
According to Statista, the .com share is down to 46.5% as of May 2018.
“With Google,” they wrote, “we have a strong goal to push more development and understanding into the academic realm.”
6. “There are two types of hits: fancy hits and plain hits”
After going into some technical detail about their optimized compact encoding, they reveal that hits are categorized simply, and endearingly, into fancy and plain.
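As we read the paper’s section on hit lists, each hit is packed into two bytes. A plain hit holds a capitalization bit, a three-bit relative font size and a 12-bit word position; a fancy hit (a word appearing in a URL, title, anchor text or meta tag) sets the font field to 7 as a flag, then uses four bits for the hit type and eight bits for position. The Python sketch below packs and unpacks hits along those lines. The bit order, the function names, the example type code and clamping out-of-range positions to the field maximum are our assumptions, and the paper’s further split of anchor-hit positions is left out.

```python
# A minimal sketch of two-byte hit encoding, modeled on our reading of the
# paper's "Hit Lists" section. Bit order and helper names are assumptions.

FANCY_FLAG = 0b111       # font-size value 7 marks a fancy hit
MAX_PLAIN_POS = 0xFFF    # 12-bit word position
MAX_FANCY_POS = 0xFF     # 8-bit position in fancy hits


def pack_plain_hit(capitalized: bool, font_size: int, position: int) -> int:
    """Layout (high to low bits): cap(1) | font(3) | position(12)."""
    if not 0 <= font_size < FANCY_FLAG:
        raise ValueError("plain hits use font sizes 0-6; 7 is reserved")
    pos = min(position, MAX_PLAIN_POS)  # assumed: clamp very late positions
    return (int(capitalized) << 15) | (font_size << 12) | pos


def pack_fancy_hit(capitalized: bool, hit_type: int, position: int) -> int:
    """Layout (high to low bits): cap(1) | 111 | type(4) | position(8)."""
    if not 0 <= hit_type <= 0xF:
        raise ValueError("hit type must fit in 4 bits")
    pos = min(position, MAX_FANCY_POS)  # assumed: clamp to 8 bits
    return (int(capitalized) << 15) | (FANCY_FLAG << 12) | (hit_type << 8) | pos


def unpack_hit(hit: int) -> dict:
    """Decode a 16-bit hit back into its fields."""
    capitalized = bool(hit >> 15)
    font = (hit >> 12) & 0b111
    if font == FANCY_FLAG:
        return {"kind": "fancy", "capitalized": capitalized,
                "type": (hit >> 8) & 0xF, "position": hit & 0xFF}
    return {"kind": "plain", "capitalized": capitalized,
            "font_size": font, "position": hit & 0xFFF}


plain = pack_plain_hit(True, 3, 100)
fancy = pack_fancy_hit(False, 2, 10)  # type 2 is an arbitrary example code
print(plain < 2**16, unpack_hit(plain))
print(fancy < 2**16, unpack_hit(fancy))
```

Reserving font value 7 as the flag is what lets both kinds of hit share the same two bytes without a separate type bit.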
7. Already defending user experience in anticipating search
From the start, it seems Brin and Page fought to spare users from having to over-specify their queries in order to get the information they wanted. They wrote:
“Some argue that on the web, users should specify more accurately what they want and add more words to their query. We disagree vehemently with this position. If a user issues a query like “Bill Clinton” they should get reasonable results since there is a enormous amount of high quality information available on this topic. Given examples like these, we believe that the standard information retrieval work needs to be extended to deal effectively with the web.”
It’s interesting that this was so clearly in their thinking from the beginning. At last week’s Search Summit, Googler Juan Felipe Rincon said, “The future of search is no search, because search implies uncertainty. Instead, it will be about how you populate something before someone knows what they don’t know.”
8. There was a typo
In the second paragraph of section 3.2, they write “Couple this flexibility to publish anything with the enormous influence of search engines to route traffic and companies which deliberately manipulating search engines for profit become a serious problem.”
Did you catch it? The verb should be, “companies which are deliberately manipulating search engines become” or “companies which deliberately manipulate search engines become.” Of the utmost gravity, we know.
Just goes to show that an incomplete verb phrase won’t keep you from doing some pretty cool stuff in the world. And, of course, that even the best of us need editors.
9. Search Engine Watch shout out
We tweeted this yesterday, but felt the need to share again for extra emphasis. Our very own Search Engine Watch was cited in the paper, stating that top search engines claimed to index 100 million web documents as of November 1997. Been a fun 21 years.
10. They chose these photos
Happy Friday, everyone.
Google Maps has been steadily rolling out new features to make its app more than just a way to find places and navigate to them. In recent months, it’s added things like group trip planning, music controls, commuter tools, ETA sharing, personalized recommendations, and more. Now, it’s introducing a new way for users to follow their favorite businesses, as well – like restaurants, bars, or stores, for example – in order to stay on top of their news and updates.
If that sounds a lot like Google Maps’ own version of Facebook Pages, you’re right.
As the company explains, once you tap the new “Follow” button to track a business, you’ll be able to see news from that place, like upcoming events, offers and other updates, right in the “For You” tab on Google Maps.
Events, deals and photo-filled posts designed to encourage foot traffic? That definitely sounds like a Facebook Page competitor aimed at the brick-and-mortar crowd.
Businesses can also use the Google Maps platform to start reaching potential customers before they open to the public, Google notes.
After a business builds a Business Profile in Google My Business that includes its opening date, it will be surfaced in users’ searches on the mobile web and in the app up to three months before it opens.
This profile will display the opening date in orange just below the business name, and users can save the business to one of their lists, if they choose. Users can also view all the other usual business information, like address, phone, website and photos.
The new “follow” feature will be accessible to the over 150 million places already on Google Maps, as well as the millions of users who are seeking them out.
The feature had been spotted in the wild for some time before Google’s official announcement this week, and is rolling out over the next few weeks, initially on Android.
The “For You” tab is currently available in limited markets, with more countries coming soon, says Google.
In June 2017, the European Commission fined Google 2.4 billion euros. Now, up to 5% of Google’s daily revenue is at stake if it fails to make its Google Shopping channel more competitive. Google is taking aggressive measures to comply with the EU’s ruling…
Read more at PPCHero.com
I came across this statement on the Web earlier this week, wondered about it, and decided to investigate further:
If there are multiple instances of the same document on the web, the highest authority URL becomes the canonical version. The rest are considered duplicates.
I read that article from Dejan SEO and thought it was worth exploring further. While looking at Google patents that include the word “Authority,” I found this patent. It doesn’t quite say the same thing Dejan does, but it describes ways to distinguish between duplicate pages on different domains based upon priority rules, which could help determine which duplicate page might be the highest authority URL for a document.
The patent is:
Identifying a primary version of a document
Inventors: Alexandre A. Verstak and Anurag Acharya
Assignee: Google Inc.
US Patent: 9,779,072
Granted: October 3, 2017
Filed: July 31, 2013
A system and method identifies a primary version out of different versions of the same document. The system selects a priority of authority for each document version based on a priority rule and information associated with the document version and selects a primary version based on the priority of authority and information associated with the document version.
Because the claims of a patent are what examiners at the USPTO focus on when deciding whether it should be granted, I thought it would be worth looking at the claims in this patent to see whether they help capture what it covers. The first claim describes different versions of a particular document, and how the metadata associated with each version might be used to determine which is the primary version:
What is claimed is:
1. A method comprising: identifying, by a computer system, a plurality of different document versions of a particular document; identifying, by the computer system, a first type of metadata that is associated with each document version of the plurality of different document versions, wherein the first type of metadata includes data that describes a source that provides each document version of the plurality of different document versions; identifying, by the computer system, a second type of metadata that is associated with each document version of the plurality of different document versions, wherein the second type of metadata describes a feature of each document version of the plurality of different document versions other than the source of the document version; for each document version of the plurality of different document versions, applying, by the computer system, a priority rule to the first type of metadata and the second type of metadata, to generate a priority value; selecting, by the computer system, a particular document version, of the plurality of different document versions, based on the priority values generated for each document version of the plurality of different document versions; and providing, by the computer system, the particular document version for presentation.
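To make the structure of that claim easier to follow, here is a minimal Python sketch. It assumes a hypothetical priority rule: a numeric rank per source (the first type of metadata) plus a small bonus from citation count (standing in for the second type, a feature other than the source). The source categories, ranks, weighting and URLs are invented for illustration; the patent does not disclose a formula.

```python
from dataclasses import dataclass


@dataclass
class DocumentVersion:
    url: str
    source: str          # first type of metadata: who provides this version
    citation_count: int  # second type of metadata: a feature other than the source


# Hypothetical source ranks; higher means more authoritative.
SOURCE_RANK = {"publisher": 3.0, "repository": 2.0, "personal-site": 1.0}


def priority_rule(version: DocumentVersion) -> float:
    """Combine both metadata types into one priority value (illustrative weights)."""
    source_score = SOURCE_RANK.get(version.source, 0.0)
    feature_score = min(version.citation_count, 100) / 100  # capped at 1.0
    return source_score + feature_score


def select_primary(versions: list[DocumentVersion]) -> DocumentVersion:
    """Select the version with the highest priority value."""
    return max(versions, key=priority_rule)


versions = [
    DocumentVersion("https://example.org/paper.pdf", "personal-site", 40),
    DocumentVersion("https://journal.example.com/article/123", "publisher", 12),
    DocumentVersion("https://repo.example.net/abs/123", "repository", 90),
]
print(select_primary(versions).url)  # the version "provided for presentation"
```

With these made-up weights the source dominates and citations only break near-ties; a real priority rule could weigh things very differently.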
This doesn’t go as far as claiming that the primary version of a document is treated as its canonical version, or that all links pointing to that document are redirected to the primary version.
Another patent that shares an inventor with this one refers to one of the duplicate content URLs being chosen as a representative page, though it doesn’t use the word “canonical.” From that patent:
In some embodiments, a method for selecting a representative document from a set of duplicate documents includes: selecting a first document in a plurality of documents on the basis that the first document is associated with a query independent score, where each respective document in the plurality of documents has a fingerprint that identifies the content of the respective document, the fingerprint of each respective document in the plurality of documents indicating that each respective document in the plurality of documents has substantially identical content to every other document in the plurality of documents, and a first document in the plurality of documents is associated with the query-independent score. The method further includes indexing, in accordance with the query independent score, the first document thereby producing an indexed first document; and with respect to the plurality of documents, including only the indexed first document in a document index.
This other patent is:
Representative document selection for a set of duplicate documents
Inventors: Daniel Dulitz, Alexandre A. Verstak, Sanjay Ghemawat and Jeffrey A. Dean
Assignee: Google Inc.
US Patent: 8,868,559
Granted: October 21, 2014
Filed: August 30, 2012
Systems and methods for indexing a representative document from a set of duplicate documents are disclosed. Disclosed systems and methods comprise selecting a first document in a plurality of documents on the basis that the first document is associated with a query independent score. Each respective document in the plurality of documents has a fingerprint that indicates that the respective document has substantially identical content to every other document in the plurality of documents. Disclosed systems and methods further comprise indexing, in accordance with the query independent score, the first document thereby producing an indexed first document. With respect to the plurality of documents, only the indexed first document is included in a document index.
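The second patent’s idea can be sketched in a few lines: group documents whose fingerprints match, then index only the one with the highest query-independent score from each group. This sketch assumes a SHA-256 hash of lowercased, whitespace-normalized text as the fingerprint, which only catches duplicates that are identical after normalization; the patent’s notion of “substantially identical” fingerprints is not reproduced here, and the scores are hypothetical stand-ins for something like PageRank.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Exact-match fingerprint of normalized text (assumed, not the patent's method)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(docs: dict[str, tuple[str, float]]) -> list[str]:
    """docs maps URL -> (content, query_independent_score).

    Returns the URLs to index: one representative per duplicate group.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, (content, _score) in docs.items():
        groups[fingerprint(content)].append(url)
    # Keep only the highest-scoring document in each group.
    return sorted(max(urls, key=lambda u: docs[u][1]) for urls in groups.values())


docs = {
    "https://a.example/page": ("Hello   World", 0.7),
    "https://b.example/copy": ("hello world", 0.9),
    "https://c.example/other": ("Something else", 0.2),
}
print(build_index(docs))
```

Here the two “hello world” pages collapse into one group, and only the higher-scoring copy makes it into the index.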
Regardless of whether the primary version of a set of duplicate documents is treated as the representative document as suggested in this second patent (whatever that may mean exactly), I think it’s important to get a better understanding of what a primary version of a document might be.
The primary version patent gives some reasons why it is worth identifying one version as the primary version:
(1) Including different versions of the same document does not provide additional useful information, and it does not benefit users.
(2) Search results that include different versions of the same document may crowd out diverse contents that should be included.
(3) Where there are multiple different versions of a document present in the search results, the user may not know which version is most authoritative, complete, or best to access, and thus may waste time accessing the different versions in order to compare them.
Those are the three reasons this duplicate document patent says it is ideal to identify a primary version from different versions of a document that appears on the Web. The search engine also wants to furnish “the most appropriate and reliable search result.”
How does it work?
The patent tells us that one method of identifying a primary version is as follows.
The different versions of a document are identified from a number of different sources, such as online databases, websites, and library data systems.
For each document version, a priority of authority is selected based on:
(1) The metadata information associated with the document version, such as
- The source
- Exclusive right to publish
- Licensing right
- Citation information
- Page rank
- The like
(2) As a second step, each document version is checked for a qualified length using a length measure. The version with a high priority of authority and a qualified length is deemed the primary version of the document.
If none of the document versions has both a high priority and a qualified length, then the primary version is selected based on the totality of information associated with each document version.
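The two-step selection and its fallback can be sketched like this. The patent does not give its thresholds or say how the “totality of information” is weighed, so the priority cutoff, the minimum word count used as the length qualification, the fallback score and the URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Version:
    url: str
    priority: float  # priority of authority, assumed normalized to 0..1
    length: int      # e.g. word count of this version


HIGH_PRIORITY = 0.8  # hypothetical cutoff for a "high" priority
MIN_LENGTH = 2000    # hypothetical length qualification (full text, not an abstract)


def choose_primary(versions: list[Version]) -> Version:
    # Steps 1 and 2: keep versions with both a high priority and a qualified length.
    qualified = [v for v in versions
                 if v.priority >= HIGH_PRIORITY and v.length >= MIN_LENGTH]
    if qualified:
        return max(qualified, key=lambda v: v.priority)
    # Fallback: weigh all available information together (assumed scoring).
    longest = max(v.length for v in versions) or 1
    return max(versions, key=lambda v: v.priority + v.length / longest)


versions = [
    Version("https://publisher.example/abstract", 0.95, 250),
    Version("https://repository.example/fulltext", 0.85, 6000),
    Version("https://blog.example/summary", 0.40, 900),
]
print(choose_primary(versions).url)
```

In this example the publisher’s page has the highest priority but fails the length check, so the full-text repository copy wins.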
The patent tells us that scholarly works are well suited to the process in this patent:
Because works of scholarly literature are subject to rigorous format requirements, documents such as journal articles, conference articles, academic papers and citation records of journal articles, conference articles, and academic papers have metadata information describing the content and source of the document. As a result, works of scholarly literature are good candidates for the identification subsystem.
Metadata that might be looked at during this process could include such things as:
- Author names
- Publication date
- Publication location
- Page rank
- Citation information
- Article identifiers such as Digital Object Identifier, PubMed Identifier, SICI, ISBN, and the like
- Network location (e.g., URL)
- Reference count
- Citation count
- So forth
The patent goes into more depth about the methodology behind determining the primary version of a document:
The priority rule generates a numeric value (e.g., a score) to reflect the authoritativeness, completeness, or best to access of a document version. In one example, the priority rule determines the priority of authority assigned to a document version by the source of the document version based on a source-priority list. The source-priority list comprises a list of sources, each source having a corresponding priority of authority. The priority of a source can be based on editorial selection, including consideration of extrinsic factors such as reputation of the source, size of source’s publication corpus, recency or frequency of updates, or any other factors. Each document version is thus associated with a priority of authority; this association can be maintained in a table, tree, or other data structures.
The patent includes a table illustrating the source-priority list.
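A source-priority list like that one boils down to a lookup table. In the sketch below the source names, their priority numbers and the default for unlisted sources are invented; the patent only says priorities can come from editorial selection.

```python
# Hypothetical source-priority list. The patent says a source's priority can
# come from editorial selection (reputation, corpus size, update frequency...).
SOURCE_PRIORITY = {
    "journal-publisher.example": 10,
    "preprint-server.example": 7,
    "university-repository.example": 5,
}
DEFAULT_PRIORITY = 1  # assumed priority for sources missing from the list


def assign_priorities(version_sources: dict[str, str]) -> dict[str, int]:
    """Map each document version (keyed by URL) to its source's priority.

    The returned dict stands in for the table the patent mentions for
    keeping the version-to-priority association.
    """
    return {url: SOURCE_PRIORITY.get(source, DEFAULT_PRIORITY)
            for url, source in version_sources.items()}


version_sources = {
    "https://journal-publisher.example/a/42": "journal-publisher.example",
    "https://university-repository.example/a/42.pdf": "university-repository.example",
    "https://mirror.example/a/42": "mirror.example",
}
priorities = assign_priorities(version_sources)
for url, priority in sorted(priorities.items(), key=lambda kv: -kv[1]):
    print(priority, url)
```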
The patent includes some alternative approaches as well. It tells us that “the priority measure for determining whether a document version has a qualified priority can be based on a qualified priority value.”
A qualified priority value is a threshold to determine whether a document version is authoritative, complete, or easy to access, depending on the priority rule. When the assigned priority of a document version is greater than or equal to the qualified priority value, the document is deemed to be authoritative, complete, or easy to access, depending on the priority rule. Alternatively, the qualified priority can be based on a relative measure, such as given the priorities of a set of document versions, only the highest priority is deemed as qualified priority.
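The two ways of deciding what counts as a qualified priority, a fixed threshold versus “only the highest priority,” can be contrasted in a short sketch. The priority values and threshold are made up, and letting tied top scores all qualify is our assumption, since the patent doesn’t address ties.

```python
def qualified_absolute(priorities: dict[str, float], threshold: float) -> set[str]:
    """Versions whose priority meets a fixed qualified priority value."""
    return {url for url, p in priorities.items() if p >= threshold}


def qualified_relative(priorities: dict[str, float]) -> set[str]:
    """Only the highest priority in the set qualifies (assumed: ties all qualify)."""
    top = max(priorities.values())
    return {url for url, p in priorities.items() if p == top}


priorities = {"a": 0.9, "b": 0.75, "c": 0.9, "d": 0.3}
print(sorted(qualified_absolute(priorities, threshold=0.7)))
print(sorted(qualified_relative(priorities)))
```

The absolute rule can qualify several versions at once, while the relative rule narrows the field to the top-scoring ones regardless of how high that score actually is.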
Within the last couple of years, I took part in a Google Hangout on Air where a number of SEOs (Ammon Johns, Eric Enge, Jennifer Slegg, and I) asked John Mueller and Andrey Lipattsev some questions about duplicate content. It seems to be a topic that still raises questions among SEOs.
The patent goes into more detail about determining which duplicate document might be the primary document. We can’t tell whether that primary document might be treated as the canonical URL for all of the duplicate documents, as suggested in the Dejan SEO article I linked to at the start of this post, but it is interesting to see that Google has a way of deciding which version of a document might be the primary version. I didn’t go into much depth about qualified lengths being used to help identify the primary document, but the patent does spend some time going over that.
Is this a little-known ranking factor? The Google patent on identifying a primary version of duplicate documents does seem to find some importance in identifying what it believes to be the most important version among many duplicate documents. I’m not sure if there is anything here that most site owners can use to help them have their pages rank higher in search results, but it’s good seeing that Google may have explored this topic in more depth.
Copyright © 2018 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.
The post How Google might Identify Primary Versions of Duplicate Pages appeared first on SEO by the Sea ⚓.