CBPO

Tag: World

Schema, Structured Data, and Scattered Databases such as the World Wide Web

June 16, 2018

I spoke at SMX Advanced this week on Schema markup and Structured Data, as part of an introduction to their use at Google.

I had the chance to visit Seattle and tour some of it. I took some photos, but would like to go back sometime to take a few more and see more of the city.

One of the places that I did want to see was Pike Place Market. It was a couple of blocks away from the hotel I stayed at (the Marriott Waterfront).

It is a combination fish and produce market, and is home to one of the earliest Starbucks.

Pike Place Market entrance

I could see living near the market and shopping there regularly. It has a comfortable feel to it.

Pike Place Farmers Market

This is a view of the Farmers Market from the side. I wish I had been able to come back later in the day to see what it was like at a time other than the morning.

Victor Steinbrueck Park

This was a nice little park next to Pike Place Market, which looked like a place to take your dog for a walk while in the area, and it had a great view of Elliott Bay (the central part of Puget Sound).

A view of Puget Sound

This is a view of the waterfront from closer to the conference center.

Mount Rainier

You can see Mount Rainier from the top of the conference center.

My presentation for SMX Advanced 2018:

Schema, Structured Data & Scattered Databases Such as the World Wide Web. My role in this session is to introduce Schema and Structured Data and how Google is using them on the Web.

Google is possibly best known for the PageRank algorithm invented by founder Lawrence Page, after whom it is named. What looks like the second patent filed by someone at Google was the DIPRE (Dual Iterative Pattern Relation Expansion) patent, invented and filed by Sergey Brin. He didn’t name it after himself (BrinRank) the way Page did with PageRank.

The provisional patent filed for this invention was the whitepaper, “Extracting Patterns and Relations from Scattered Databases such as the World Wide Web.” The process behind it is set out in the paper, and it starts with a seed list of five books, along with their titles, authors, publishers, and years published. Unlike PageRank, it doesn’t involve indexing links from page to page and the anchor text of those links. Instead, it collects facts as it crawls: when it finds pages that contain the properties and attributes of those five books, it collects similar facts about other books on the same site. Once it has finished there, it moves on to other sites, looks for those same five books, and collects still more books. The idea is to eventually know where all of the books on the Web are, along with facts about those books that could be used to answer questions about them.
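To make that bootstrapping idea a little more concrete, here is a toy sketch of a DIPRE-style loop in Python. It is my own illustration, with made-up pages and helper names rather than anything from Brin’s patent or code: seed (author, title) pairs are matched on pages, simple text patterns are induced around those matches, and those patterns are then used to harvest new pairs from other pages.

```python
# A toy DIPRE-style bootstrapping sketch (my own illustration, not Brin's code).
# Seed (author, title) pairs -> find them on pages -> induce surrounding text
# patterns -> use the patterns to extract new (author, title) pairs.
import re

seed_books = {
    ("Isaac Asimov", "The Robots of Dawn"),
    ("David Brin", "Startide Rising"),
}

def induce_patterns(pages, tuples):
    """Collect (prefix, middle, suffix) strings seen around known pairs."""
    patterns = set()
    for text in pages:
        for author, title in tuples:
            for m in re.finditer(re.escape(author) + r"(.{1,30}?)" + re.escape(title), text):
                prefix = text[max(0, m.start() - 10):m.start()]
                suffix = text[m.end():m.end() + 10]
                patterns.add((prefix, m.group(1), suffix))
    return patterns

def extract_tuples(pages, patterns):
    """Apply the induced patterns to harvest new (author, title) pairs."""
    found = set()
    for text in pages:
        for prefix, middle, suffix in patterns:
            regex = re.escape(prefix) + r"(.+?)" + re.escape(middle) + r"(.+?)" + re.escape(suffix)
            for m in re.finditer(regex, text):
                found.add((m.group(1).strip(), m.group(2).strip()))
    return found

pages = [
    "Review: Isaac Asimov, author of The Robots of Dawn ...",
    "Review: Ursula K. Le Guin, author of The Dispossessed ...",
]
patterns = induce_patterns(pages, seed_books)
# Prints the seed pair it re-finds plus the newly harvested Le Guin pair.
print(extract_tuples(pages, patterns))
```

The paper describes repeating this loop across sites and keeping the patterns specific enough that they don’t pull in junk tuples; the sketch above only shows the grow-from-seeds shape of the process.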

This is where we see Google being concerned about structured data on the web, and how helpful knowing about it could be.

When I first started out doing in-house SEO, it was for a Delaware incorporation business, and geography was an important part of the queries that my pages were found for. I had started looking at patents, and ones such as this one on “Generating Structured Data” caught my attention. It focused on collecting data about local entities, or local businesses, and the properties related to those. It came from the team led by Andrew Hogue, who was in charge of the annotation framework at Google and whose team was responsible for the “Fact Repository”, an early version of Google’s Knowledge Graph.

If you’ve heard of NAP consistency, and of mentions being important to local search, it is because local search focused on collecting structured data that could be used to answer questions about businesses. Patents about location prominence followed, which told us that a link counts as a mention, along with a patent on local authority, which determined which website was the authoritative one for a business. But it all seemed to start with collecting structured data about businesses at places.

The DIPRE Algorithm focused upon crawling the web to find facts, and Google Maps built that into an approach that could be used to rank places and answer questions about them.

If you haven’t had a chance to use Google’s experimental table search, it is worth trying out. It can find answers to questions from data-based tables across the Web, such as “what is the longest wooden pier in California,” which is the one in Oceanside, a town next to the one I live in. It comes from the WebTables project at Google.

Database fields are sometimes referred to as schema, and the table headers that tell us what kind of data is in a table column may also be referred to as “schema.” A data-based web table can be considered a small structured database, and Google’s WebTables project found that a lot of information could be found in tables across the Web.

Try out the first link above (the WebTables Project Slide) when you get the chance, and do some searches on Google’s table search. The second paper is one that described the WebTables project when it first started out, and the one that follows it describes some of the things that Google researchers learned from the Project. We’ve seen Structured Snippets like the one above grabbing facts to include in a snippet (in this case from a data table on the Wikipedia page about the Oceanside Pier.)

When a column in one data table contains the same data as a column in another table, and the first doesn’t have a header label, it might learn a label from the second table (this is considered a way to learn semantics, or meaning, from tables). These are truly scattered databases across the World Wide Web, but through the use of crawlers that information can be collected and made useful, much as the DIPRE algorithm described.
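Here is a toy sketch of that label-borrowing idea (my own illustration, not the WebTables code): if most of an unlabeled column’s values also appear under a labeled column somewhere else, the label can be borrowed from that other table.

```python
# A toy sketch (my own illustration, not the WebTables code) of learning a
# column label by value overlap: if an unlabeled column shares most of its
# values with a labeled column from another table, borrow that header.
def borrow_label(unlabeled_values, labeled_columns, threshold=0.6):
    """labeled_columns maps a header label to the set of values seen under it."""
    unlabeled = set(unlabeled_values)
    best_label, best_overlap = None, 0.0
    for label, values in labeled_columns.items():
        overlap = len(unlabeled & values) / len(unlabeled)
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label if best_overlap >= threshold else None

labeled_columns = {
    "Pier": {"Oceanside Pier", "Santa Monica Pier", "Stearns Wharf"},
    "City": {"Oceanside", "Santa Monica", "Santa Barbara"},
}
# Two of the three values overlap with the "Pier" column, so "Pier" is borrowed.
print(borrow_label(["Oceanside Pier", "Stearns Wharf", "Pismo Beach Pier"], labeled_columns))
```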

In 2005, the Official Google Blog published this short story, which told us about Google sometimes answering direct questions at the top of Web results. I don’t remember when these first started appearing, but I do remember definition results from about a year earlier: you could type “Define:” before a word, or ask “What is” followed by a word, and Google would show a definition. There was a patent that described how Google was finding definitions from glossary pages, and how to ideally set up those glossaries so that your definitions might be the ones that end up as responses.

In 2012, Google introduced the Knowledge Graph, which told us that they would be focusing upon learning about specific people, places and things, and answering questions about those instead of just continuing to match keywords in queries to keywords in documents. They told us that this was a move to things instead of strings. Like the books in Brin’s DIPRE or Local Entities in Google Maps.

We could start using the Web as a scattered database, with questions and answers from places such as Wikipedia tables helping to answer queries such as “What is the capital of Poland?”

And Knowledge bases such as Wikipedia, Freebase, IMDB and Yahoo Finance could be the sources of facts about properties and attributes about things such as movies and actors and businesses where Google could find answers to queries without having to find results that had the same keywords in the document as the query.

In 2011, the Schema.org site was launched as a joint project of Google, Yahoo, Bing, and Yandex that provides machine-readable text which can be added to web pages. This text is intended to be machine-readable only, much like XML sitemaps are, and it gives search engines an alternative channel of information about the entities that pages are about, and the properties and attributes on those pages.
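As a rough illustration of what that machine-readable layer can look like (the business details below are invented for the example), this snippet builds Schema.org markup for a local business as JSON-LD, one of the formats that can be embedded in a page for search engines to read:

```python
# A rough illustration of Schema.org structured data as JSON-LD
# (the business details are invented for the example).
import json

local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Puget Sound Chowder House",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Example Alley",
        "addressLocality": "Seattle",
        "addressRegion": "WA",
    },
    "telephone": "+1-206-555-0100",
    "openingHours": "Mo-Su 09:00-17:00",
}

# The resulting JSON would sit inside a <script type="application/ld+json">
# tag on the page, invisible to visitors but readable by crawlers.
print(json.dumps(local_business, indent=2))
```

The visible page stays exactly the same for human visitors; the markup simply tells search engines which entity the page is about and what its properties are.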

While Schema.org was introduced in 2011, it was built to be extendable, and to let subject-matter experts add new schema, like this extension from GS1 (the inventors of the barcodes used in brick-and-mortar stores). If you haven’t tried out their demo, it is worth getting your hands on it to see what is possible.

In 2014, Google published their Biperpedia paper, which tells us how they might create ontologies from query streams (sessions about specific topics) by finding terms to extract data about from the Web. At one point in time, search engines would do focused crawls of the web starting at sources such as DMOZ, so that the index of the Web they were constructing contained pages about a wide range of categories. By using query-stream information, they are in effect crowdsourcing the choice of what to build ontologies about. The paper tells us that Biperpedia enabled them to build ontologies larger than what they had developed through Freebase, which may be partially why Freebase was replaced by Wikidata.

The Google+ group I’ve linked to above on the Schema Resources page has members from Google who work on Schema, such as Dan Brickley, who heads up work on Schema at Google. Learning about extensions is a good idea, especially if you might consider participating in building new ones, and the community group has a mailing list, which lets you see and participate in discussions about the growth of Schema.





SEO by the Sea ⚓


The Future of Ad Copy in an AI World

April 24, 2018

In this webinar, AdFury’s Christopher Hoover and Hanapin’s Lauren Rosner will discuss the hot topic of AI and what the future could look like when automation becomes more and more popular in our industry – the good, the bad, and the challenging.

Read more at PPCHero.com
PPC Hero



Toyota’s Self-Driving Cars Get New Lasers to See the World

September 27, 2017

With some help from Luminar and its 22-year-old CEO Austin Russell.
Feed: All Latest


Optimizing Display Advertising in an Omni-Channel World

September 19, 2017

We’re teaming up with DialogTech to tell you all about display tactics you can put in place for conversion success, both online and over the phone. DialogTech’s Blair Symes and Hanapin’s Stephanie White will show you the advanced strategies that will have a big impact on your display ROI.

Read more at PPCHero.com
PPC Hero


A world without “(not provided)”: How unlocking organic keyword data leads to a better web

August 9, 2017

Beginning in 2011, search marketers began to lose visibility over the organic keywords that consumers were using to find their websites, as Google gradually switched all of its searches over to secure search using HTTPS.

As it did so, the organic keyword data available to marketers in Google Analytics and other analytics platforms was slowly replaced by “(not provided)”. By 2014, the (not provided) issue was estimated to affect 80-90% of organic traffic, representing a massive loss in visibility for search marketers and website owners.

Marketers have gradually adjusted to the situation, and most have developed rough workarounds or ways of guessing what searches are bringing customers to their site. Even so, there’s no denying that having complete visibility over organic keyword data once more would have a massive impact on the search industry – as well as benefits for SEO.

One company believes that it has found the key to unlocking “(not provided)” keyword data. We spoke to Daniel Schmeh, MD and CTO at Keyword Hero, a start-up which has set out to solve the issue of “(not provided)”, and ‘Wizard of Moz’ Rand Fishkin, about how “(not provided)” is still impacting the search industry in 2017, and what a world without it might look like.

Content produced in association with Keyword Hero.

“(not provided)” in Google Analytics: How does it impact SEO?

“The “(not provided)” keyword data issue is caused by Google the search engine, so that no analytics program, Google Analytics included, can get the data directly,” explains Rand Fishkin, founder and former CEO of Moz.

“Google used to pass a referrer string when you performed a web search with them that would tell you – ‘This person searched for “red shoes” and then they clicked on your website’. Then you would know that when people searched for “red shoes”, here’s the behavior they showed on your website, and you could buy ads against that, or choose how to serve them better, maybe by highlighting the red shoes on the page better when they land there – all sorts of things.”

“You could also do analytics to understand whether visitors for that search were converting on your website, or whether they were having a good experience – those kinds of things.

“But Google began to take that away around 2011, and their reasoning behind it was to protect user privacy. That was quickly debunked, however, by folks in the industry, because Google provides that data with great accuracy if you choose to buy ads with them. So there’s obviously a huge conflict of interest there.

“I think the assumption at this point is that it’s just Google throwing their weight around and being the behemoth that they can be, and saying, ‘We don’t want to provide this data because it’s too valuable and useful to potential competitors, and people who have the potential to own a lot of the search ranking real estate and have too good of an idea of what patterns are going on.’

“I think Google is worried about the quality and quantity of data that could be received through organic search – they’d prefer that marketers spend money on advertising with Google if they want that information.”

Where Google goes, its closest competitors are sure to follow, and Bing and Yandex soon followed suit. By 2013, the search industry was experiencing a near-total eclipse of visibility over organic keyword data, and found itself having to simply deal with the consequences.

“At this point, most SEOs use the data of which page received the visit from Google, and then try to reverse-engineer it: what keywords does that page rank for? Based on those two points, you can sort of triangulate the value you’re getting from visitors from those keywords to this page,” says Fishkin.

However, data analysis and processing have come a long way since 2011, or even 2013. One start-up believes that it has found the key to unlocking “(not provided)” keyword data and giving marketers back visibility over their organic keywords.

How to unlock “(not provided)” keywords in Google Analytics

“I started out as an SEO, first in a publishing company and later in ecommerce companies,” says Daniel Schmeh, MD and CTO of SEO and search marketing tool Keyword Hero, which aims to provide a solution to “(not provided)” in Google Analytics. “I then got into PPC marketing, building self-learning bid management tools, before finally moving into data science.

“So I have a pretty broad understanding of the industry and ecosystem, and was always aware of the “(not provided)” problem.

“When we then started buying billions of data points from browser extensions for another project that I was working on, I thought that this must be solvable – more as an interesting problem to work on than a product that we wanted to sell.”

Essentially, Schmeh explains, solving the problem of “(not provided)” is a matter of getting access to the data and engineering around it. Keyword Hero uses a wide range of data sources to deduce the organic keywords hidden behind the screen of “(not provided)”.

“In the first step, the Hero fetches all our users’ URLs,” says Schmeh. “We then use rank monitoring services – mainly other SEO tools and crawlers – as well as what we call “cognitive services” – among them Google Trends, Bing Cognitive Services, Wikipedia’s API – and Google’s search console, to compute a long list of possible keywords per URL, and a first estimate of their likelihood.

“All these results are then tested against real, hard data that we buy from browser extensions.

“This info will be looped back to the initial deep learning algorithm, using a variety of mathematical concepts.”

Ultimately, the process used by Keyword Hero to obtain organic keyword data is still guesswork, but very advanced guesswork.

“All in all, the results are pretty good: in 50 – 60% of all sessions, we attribute keywords with 100% certainty,” says Schmeh.

“For the remainder, at least 83% certainty is needed, otherwise they’ll stay (not provided). For most of our customers, 94% of all sessions are matched, though in some cases we need a few weeks to get to this matching rate.”
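To picture how that kind of thresholded attribution might work, here is a minimal sketch (my own illustration with invented numbers, not Keyword Hero’s code): each session’s landing URL gets a set of candidate keywords with estimated probabilities, and a keyword is only attributed when the best candidate clears the certainty bar; otherwise the session stays (not provided).

```python
# My own illustration of thresholded keyword attribution, not Keyword Hero's
# code: attribute a session's most likely keyword only if its estimated
# certainty clears the bar; otherwise leave the session as "(not provided)".
def attribute_session(landing_url, candidates, threshold=0.83):
    """candidates maps a landing URL to {keyword: estimated probability}."""
    scores = candidates.get(landing_url, {})
    if not scores:
        return "(not provided)"
    keyword, certainty = max(scores.items(), key=lambda kv: kv[1])
    return keyword if certainty >= threshold else "(not provided)"

candidates = {
    "/red-shoes": {"red shoes": 0.91, "buy red shoes": 0.06, "crimson sneakers": 0.03},
    "/blog/schema": {"schema markup": 0.55, "structured data": 0.40},
}
print(attribute_session("/red-shoes", candidates))    # attributed: "red shoes"
print(attribute_session("/blog/schema", candidates))  # stays "(not provided)"
```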

If the issue of “(not provided)” organic keywords has been around since 2011, why has it taken this long to find a solution that works? Schmeh believes that Keyword Hero has two key advantages: one, they take a scientific approach to search, and two, they have much greater data processing power than was available six years ago.

“We have a very scientific approach to SEO,” he says.

“We have a small team of world-class experts, mostly from Fraunhofer Institute of Technology, that know how to make sense of large amounts of data. Our background in SEO and the fact that we have access to vast amounts of data points from browser extensions allowed us to think about this as more of a data science problem, which it ultimately is.

“Processing the information – the algorithm and its functionalities – would have worked back in 2011, too, but the limiting factor is our capability to work with these extremely large amounts of data. Just uploading the information back into our customers’ accounts would take 13 hours on AWS’s [Amazon Web Services] largest instance, the X1 – something we could never afford.

“So we had to find other cloud solutions – ending up with things that didn’t exist even a year ago.”

A world without “(not provided)”: How could unlocking organic keyword data transform SEO?

If marketers and website owners could regain visibility over their organic keywords, this would obviously be a huge help to their efforts in optimizing for search and planning a commercial strategy.

But Rand Fishkin also believes it would have two much more wide-reaching benefits: it would help to prove the worth of organic SEO, and would ultimately lead to a better user experience and a better web.

“Because SEO has such a difficult time proving attribution, it doesn’t get counted and therefore businesses don’t invest in it the way they would if they could show that direct connection to revenue,” says Fishkin. “So it would help prove the value, which means that SEO could get budget.

“I think the thing Google is most afraid of is that some people would see that they rank organically well enough for some keywords they’re bidding on in AdWords, and ultimately decide not to bid anymore.

“This would cause Google to lose revenue – but of course, many of these websites would save a lot of money.”

And in this utopian world of keyword visibility, marketers could channel that revenue into better targeting the consumers whose behavior they would now have much higher-quality insights into.

“I think you would see more personalization and customization on websites – so for example, earlier I mentioned a search for ‘red shoes’ – if I’m an ecommerce website, and I see that someone has searched for ‘red shoes’, I might actually highlight that text on the page, or I might dynamically change the navigation so that I had shades of red inside my product range that I helped people discover.

“If businesses could personalize their content based on the search, it could create an improved user experience and user performance: longer time on site, lower bounce rate, higher engagement, higher conversion rate. It would absolutely be better for users.

“The other thing I think you’d see people doing is optimizing their content efforts around keywords that bring valuable visitors. As more and more websites optimized for their unique search audience, you would generally get a better web – some people are going to do a great job for ‘red shoes’, others for ‘scarlet sandals’, and others for ‘burgundy sneakers’. And as a result, we would have everyone building toward what their unique value proposition is.”

Daniel Schmeh adds that unlocking “(not provided)” keyword data has the ability to make SEO less about guesswork and more substantiated in numbers and hard facts.

“Just seeing simple things, like how users who include your brand name in their search phrase convert versus those who don’t, has a huge impact on our customers,” he says. “We’ve had multiple people telling us that they have based important business decisions on the data.

“Seeing thousands of keywords again is very powerful for the more sophisticated, data-driven user, who is able to derive meaningful insights; but we’d really like the Keyword Hero to become a standard tool. So we’re working hard to make this keyword data accessible and actionable for all of our users, and will soon be offering features like keyword clustering – all through their Google Analytics interface.”

To find out more about how to unlock your “(not provided)” keywords in Google Analytics, visit the Keyword Hero website.

Search Engine Watch


Breathtaking Photos of a World Without Light Pollution

April 26, 2017

Gavin Heffernan and Harun Mehmedinovic set out to document the last dark skies around North America.
WIRED


The New World Of PPC Targeting

January 15, 2017

As more steps of the customer journey have moved online and with the great shift to mobile, we shouldn’t be stuck in a last click, myopic view of our marketing.

Read more at PPCHero.com
PPC Hero


60% Better CTR With Google DSK Campaigns And Other Real World Results

December 28, 2016

Google will pair your target with the placements that most frequently lead to post-click conversions through a feature called Display Select Keywords (DSK).

Read more at PPCHero.com
PPC Hero


As Software Eats The World, Agile Tool Provider Hansoft Raises $8.4M To Steal Competitors’ Lunch

February 11, 2014

Launched all the way back in 2005, Sweden-headquartered Hansoft provides tools to help teams collaborate and manage the development of products and services using “Agile software development” — the umbrella term for a group of software development methods based on iterative and incremental development. The goal with Agile is to enable code to be developed, improved and shipped quicker. Software is eating the world, after all.

Interestingly, however, though not wholly surprising for a company that’s generated revenue from the get-go, Hansoft has, until now, been entirely self-funded. Today that changes with the announcement that it’s raised a first round of funding.

The company has raised $8.4 million in investment led by Creandum, the Stockholm-based VC and early backer of Spotify. A number of private individual investors also participated in the round, including (and noteworthy) Mårten Mickos, the former CEO of MySQL and currently CEO of Eucalyptus Systems.

Although its legacy is in supporting video game, largely console game, studios, today Hansoft’s customers are companies making software and related hardware products and cloud services in a range of industries, such as telecom, game development, electronics, aerospace, space and defence.

What these companies have in common, says Hansoft co-founder and CEO Patric Palm, is that they need to “not only scale Agile software development to very large teams, but also out-innovate their competition to stay competitive”. That’s the specific problem that Hansoft’s tools, which run on Mac, Windows and multiple Linux distributions, have been designed to solve.

“Most of our customers are trying to scale Agile development to large complex environments. Typically 5-10 scrum teams in each program and several programs making up a portfolio,” he adds.

My understanding is that this translates into a typical client buying around 1,000 “seats” — the company charges per user — working on between 5 and 10 or so product programs, with 30-200 programmers working on the same product.

When Agile methods were conceived and became popular, however, one could argue they were never originally envisioned to scale up to such large development teams, but this is exactly the space that Hansoft and its competitors are playing in, although Palm says that Hansoft is also used by startups and other smaller teams. It’s free for up to 9 users.

I asked him why the company is only taking external funding now. “Our competitors are quite rich,” says Palm, adding that, while it’s been fun playing David and Goliath, “we’re not picking enough fights”. Those wealthy competitors include Jira (Atlassian) and IBM RTC, which Palm says many Hansoft customers migrate from. An even more direct rival is the recently IPOed Rally.

Hansoft currently has 35 employees based in Sweden, with a second office in San Francisco housing 2 employees. With new funding, Palm says the company will expand the latter, including recruiting a Chief Growth Officer. Its distributed executive team will see Sweden focus on development, and San Francisco, sales.


TechCrunch » Enterprise