TechSEO Boost: Machine Learning for SEOs
This year’s TechSEO Boost, an event dedicated to technical SEO and hosted by Catalyst, took place on November 29 in Boston.
Billed as the conference “for developers and advanced SEO specialists,” TechSEO Boost built on the success of the inaugural event in 2017 with a day of enlightening, challenging talks from the sharpest minds in the industry.
Some topics permeated the discourse throughout the day; machine learning, in particular, was a recurring theme.
As is the nature of the TechSEO Boost conference, the sessions aimed to go beyond the hype to define what precisely machine learning means for SEO, both today and in future.
What follows is a recap of the excellent talk from Britney Muller, Senior SEO Scientist at Moz, entitled (fittingly enough) “Machine Learning for SEOs.”
What is machine learning? A quick recap.
The session opened with a brief primer on the key terms and concepts that fit under the umbrella of “machine learning.”
Muller used the definition in the image below to capture the sense of machine learning as “a subset of AI (Artificial Intelligence) that combines statistics and programming to give computers the ability to ‘learn’ without explicitly being programmed.”
That core idea of “learning” from new stimuli is an important one to grasp as we consider how machine learning can be applied to daily SEO tasks.
Machine learning excels at identifying patterns in huge quantities of data. As such, some of the common examples of machine learning applications today include:
- Recommender systems (Netflix, Spotify)
- Ridesharing apps (Uber, Lyft)
- Digital assistants (Amazon Alexa, Apple Siri, Google Assistant)
This very ubiquity can make it a challenging concept to grasp, however. In fact, Eric Schmidt at Google has gone so far as to say, “The core thing Google is working on is basically machine learning.”
It is helpful to break this down into the steps that comprise a typical machine learning project, in order to see how we might apply this to everyday SEO tasks.
The machine learning process
The image below represents the machine learning process Muller shared at TechSEO Boost:
It is important to bear in mind that some of the training data should be reserved for testing at a later point in the process.
Where possible, this data should also be labelled clearly to help the machine learning algorithm identify classifications and categories within a noisy data set.
It is for precisely this reason that Google asks us to label images to verify our identity:
This demonstrates our human ability to pick out objects in cluttered contexts, but it has the added benefit of providing Google with higher quality image data.
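To make the earlier point about reserving data for testing concrete, here is a minimal sketch of a holdout split in plain Python. The labelled queries, the 20% test fraction, and the function name are all assumptions made for illustration; none of them come from Muller's talk.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle labelled rows and hold back a fraction of them for testing."""
    shuffled = list(rows)                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled examples: (search query, intent label).
labelled_queries = [
    ("buy running shoes", "transactional"),
    ("running shoes sale", "transactional"),
    ("order trail shoes online", "transactional"),
    ("cheap marathon shoes", "transactional"),
    ("running shoe discount code", "transactional"),
    ("how to lace running shoes", "informational"),
    ("what is pronation", "informational"),
    ("marathon training plan", "informational"),
    ("how long do running shoes last", "informational"),
    ("trail vs road running", "informational"),
]

train_rows, test_rows = train_test_split(labelled_queries)
print(len(train_rows), "rows for training,", len(test_rows), "held back for testing")
```

The held-back rows are only used once the model has been trained, which gives an honest read on how it handles data it has not seen before.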
The pitfalls of an unsupervised approach to machine learning, and a training data set that is open to interpretation, were laid bare just last week.
Google’s ‘Smart Compose’ feature within Gmail has demonstrated gender bias by preferring certain pronouns when predicting what a user might want to say.
As reported by Reuters, “Gmail product manager Paul Lambert said a company research scientist discovered the problem in January when he typed ‘I am meeting an investor next week,’ and Smart Compose suggested a possible follow-up question: ‘Do you want to meet him?’ instead of ‘her.’”
The challenge here is not restricted to projects on such a scale. Marketers who want to get their hands dirty must be aware of the limitations of machine learning, as well as its exciting possibilities.
Muller added that people tend to overfit their data, which reduces the accuracy and flexibility of the model they are using. This (very common) phenomenon occurs when a model corresponds very closely with one specific data set, reducing its applicability to new scenarios.
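A toy example can show what this looks like in practice. The sketch below is an illustration under stated assumptions, not something shown in the talk: the data is invented, with two deliberately noisy training labels, and the "overfit" model is a simple nearest-neighbour lookup that memorizes the training set.

```python
# Toy data, invented for illustration: (feature value, label).
# The underlying rule is "label is 1 when the value is 50 or more",
# but two training labels (at 25 and 75) are deliberately noisy.
train = [(5, 0), (15, 0), (25, 1), (35, 0), (45, 0),
         (55, 1), (65, 1), (75, 0), (85, 1), (95, 1)]
test = [(8, 0), (24, 0), (33, 0), (42, 0),
        (58, 1), (67, 1), (76, 1), (92, 1)]

def accuracy(predict, rows):
    """Share of rows where the prediction matches the label."""
    return sum(predict(x) == label for x, label in rows) / len(rows)

def memorizer(x):
    """Overfit model: copy the label of the closest training point, noise included."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def fit_threshold(rows):
    """Simple model: choose the single cut-off that best fits the training rows."""
    candidates = [x for x, _ in rows]
    return max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), rows))

threshold = fit_threshold(train)

def simple_rule(x):
    return int(x >= threshold)

memorizer_scores = (accuracy(memorizer, train), accuracy(memorizer, test))
simple_scores = (accuracy(simple_rule, train), accuracy(simple_rule, test))
print("memorizer   train/test:", memorizer_scores)
print("simple rule train/test:", simple_scores)
```

The memorizer scores perfectly on the data it was trained on because it has learned the noise as well as the pattern, and it then does worse on the held-back rows. The simpler rule fits the training data less closely but carries over better to new data.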
The ability to scale effectively is what gives machine learning its appeal, so overfitting is something to guard against carefully. There is a good primer on this topic here and it is also explained very well through this image:
So, how exactly can this subset of AI be used to improve SEO performance?
How you can use machine learning for SEO
As is the case with all hype-friendly technologies, businesses are keen to get involved with machine learning. However, the point is not to “use machine learning” through fear of being left behind, but rather to find the best uses of machine learning for each business.
Britney Muller shared some examples from her role at Moz during her session at TechSEO Boost.
The first was an approach to automated meta description generation using the Algorithmia Advanced Content Summarizer, which was then compared to Google’s approach to automated descriptions pulled directly from the landing page.
Meta descriptions remain an important asset when trying to encourage a positive click-through rate, but a lot of time is spent crafting these snippets. An automated alternative that can interpret the meaning of landing pages and create clickable summaries for display in the SERPs would be very useful.
Muller shared some examples, such as the image above, to demonstrate the comparison between the two approaches. The machine learning approach is not perfect and may require some tweaking, but it does an excellent job of conveying the page’s intent when compared to Google’s selection.
The team at Moz has since built this into Google Sheets:
Although this is not a product other businesses can access right now, an alternative way of achieving automated meta descriptions has been shared by Paul Shapiro (the TechSEO Boost host) via GitHub here.
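For readers who want a feel for how automated summaries of this kind can work, below is a minimal sketch of extractive summarization in plain Python. It is not the Algorithmia summarizer, the Moz sheet, or Paul Shapiro's script. It simply scores each sentence by how often its words appear on the page and keeps the best ones that fit. The stopword list, the sample page copy, and the 155-character limit are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small, hand-picked stopword list; an assumption for this sketch.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "are",
             "on", "with", "that", "this", "it", "as", "our", "we", "you", "your"}

def tokenize(text):
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]

def draft_meta_description(page_text, max_chars=155):
    """Keep the highest-scoring sentences that fit within max_chars."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    frequency = Counter(tokenize(page_text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(frequency[t] for t in tokens) / len(tokens) if tokens else 0.0

    chosen, used = [], 0
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    for index in ranked:
        extra = len(sentences[index]) + (1 if chosen else 0)  # +1 for the joining space
        if used + extra <= max_chars:
            chosen.append(index)
            used += extra
    # Restore the original sentence order so the snippet reads naturally.
    return " ".join(sentences[i] for i in sorted(chosen))

# Hypothetical landing page copy.
page_text = (
    "Our trail running shoes are built for muddy, uneven ground. "
    "Every pair of trail running shoes has a grippy rubber outsole. "
    "Free returns are available within 30 days. "
    "Sign up to the newsletter for news."
)

description = draft_meta_description(page_text)
print(description)
```

As with the approach described in the talk, the output of a sketch like this would need a human review before it goes anywhere near a live page.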
Automated image optimization
Another fascinating use of machine learning for SEO is the automation of image optimization. Britney Muller showed how, in under 20 minutes, it is possible to train an algorithm to distinguish between cats and ducks, then use this model on a new data set with a high level of accuracy.
For large retailers, the application of this method could be very beneficial. With so many new images added to the inventory every day, and with visual search on the rise, a scalable image labeling system would prove very profitable. As demonstrated at TechSEO Boost, this is now a very realistic possibility for businesses willing to build their own model.
A further use of machine learning described by Britney Muller was the transcription of podcasts. An automated approach to this task can turn audio files into something much more legible for a search engine, thereby helping with indexation and ranking for relevant topics.
Muller detailed an approach using the Amazon Transcribe product through Amazon Web Services to achieve this aim.
The transcribed audio is delivered as a detailed JSON file, with the different speakers on the podcast labelled separately.
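To give a sense of what working with that file involves, here is a rough sketch that turns a transcript into speaker-by-speaker text. The field names follow my understanding of the Amazon Transcribe output when speaker identification is enabled, and the sample below is invented and heavily trimmed, so treat the structure as an assumption and check it against a real output file.

```python
import json

# Invented, trimmed-down sample; a real file contains many more fields.
sample_output = """
{
  "results": {
    "speaker_labels": {
      "segments": [
        {"speaker_label": "spk_0", "items": [
          {"start_time": "0.00", "speaker_label": "spk_0"},
          {"start_time": "0.40", "speaker_label": "spk_0"}]},
        {"speaker_label": "spk_1", "items": [
          {"start_time": "1.10", "speaker_label": "spk_1"},
          {"start_time": "1.50", "speaker_label": "spk_1"}]}
      ]
    },
    "items": [
      {"type": "pronunciation", "start_time": "0.00", "alternatives": [{"content": "Welcome"}]},
      {"type": "pronunciation", "start_time": "0.40", "alternatives": [{"content": "back"}]},
      {"type": "punctuation", "alternatives": [{"content": "."}]},
      {"type": "pronunciation", "start_time": "1.10", "alternatives": [{"content": "Thanks"}]},
      {"type": "pronunciation", "start_time": "1.50", "alternatives": [{"content": "everyone"}]},
      {"type": "punctuation", "alternatives": [{"content": "."}]}
    ]
  }
}
"""

def speaker_turns(transcribe_json):
    """Group transcribed words into (speaker, text) turns."""
    results = json.loads(transcribe_json)["results"]
    # Map each word's start time to the speaker it was attributed to.
    speaker_at = {
        item["start_time"]: item["speaker_label"]
        for segment in results["speaker_labels"]["segments"]
        for item in segment["items"]
    }
    turns = []  # each entry is [speaker, text so far]
    for item in results["items"]:
        word = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if turns:
                turns[-1][1] += word  # punctuation has no timestamp; attach to previous word
            continue
        speaker = speaker_at.get(item["start_time"], "unknown")
        if turns and turns[-1][0] == speaker:
            turns[-1][1] += " " + word
        else:
            turns.append([speaker, word])
    return [(speaker, text) for speaker, text in turns]

turns = speaker_turns(sample_output)
for speaker, text in turns:
    print(f"{speaker}: {text}")
```

Text in this shape can be published alongside the audio, which is what makes the episode's content readable for a search engine.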
There was not enough time in the session to work through every potential use of machine learning for SEO, but Muller’s core message was that everyone in the industry should be working towards at least a working knowledge of these concepts.
Some further opportunities for experimentation were listed as follows:
As we can see, machine learning truly excels when working with large data sets to identify patterns.
Tools and resources
The best way to get engaged is to combine theory with practice. This is almost always the case, but it is a particularly valid piece of advice in relation to programming.
Muller’s talk was neither the first nor the last of the day to reference Google Codelabs.
There are more resources out there than ever before and the likes of Amazon and Google want machine learning to be approachable. Amazon has launched a machine learning course and Google’s crash course is a fantastic way to learn the components of a successful project.
Once a basic grasp has been attained, the Google-owned Kaggle is a great place to trial new data sets and review the innovative work performed by data scientists around the world.
Furthermore, Google’s Colaboratory makes it easy to get started on a project and work with a remote team.
Key takeaways: machine learning for SEOs
What became particularly clear through Muller’s talk is how approachable machine learning applications can be for SEOs. Moreover, the room for experimentation is unprecedented for those willing to invest some time in the discipline.