Friday, March 13, 2015

Are On-Topic Links Important? - Whiteboard Friday

Posted by randfish


How much does the context of a link really matter? In today's Whiteboard Friday, Rand looks at on- and off-topic links to uncover what packs the greatest SEO punch and shares what you should be looking for when building a high-quality link.



For reference, here's a still of this week's whiteboard!


On-Topic Links Whiteboard


Video Transcription



Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we're going to chat a little bit about on-topic and off-topic links. One of the questions and one of the topics that you see discussed all the time in the SEO world is: Do on-topic links matter more than off-topic links? By on topic, people generally mean they come from sites and pages that are on the same or very similar subject matter to the site or page that I'm trying to get the link to.


It sort of makes intuitive sense to us that Google would care somewhat about this, that they would say, "Oh, well, here's our friend over here," we'll call him Steve. No, we're going to call him Carl, because Carl is a great name.


Carl, of course, has CarlsCloset.net, CarlsCloset.net being a home organization site. Carl is going out, and he's doing some link building, which he should, and so he's got some link targets in mind. He looks at places like RealSimple.com, the magazine site, Sunset Magazine, UnderwaterHoagies.com, Carl being a great fan of all things underwater and sandwich-related. So as he's looking at these sites, he's thinking to himself, well, from an SEO perspective, is it necessarily the case that Real Simple, which has a lot of content on home organization and on cleaning up clutter and those kinds of things, is that going to help Carl's Closet site rank better than, say, a link from UnderwaterHoagies.com?


The answer is a little tough here. It could be the case that UnderwaterHoagies.com has a feature article all about how submariners can keep their home in order, even as they brunch under the sea. But maybe the link from RealSimple.com is coming from a less on-topic article and page. So this starts to get really messy. Is it the site that matters, or is it the page that matters? Is it the context that matters? Is it the link itself and where that's embedded in the site? What is the real understanding that Google has between relationships of on-topic and off-topic? That's where you get a lot of convoluted information.


I have seen and we have probably all heard a ton of anecdotal evidence on both sides. There are SEOs who will argue passionately from their experience that what they've seen is that on-topic links are hugely more beneficial than off-topic ones. You'll see the complete opposite from some other folks. In fact, most of my personal experiences, when I was doing more directed link building for clients way back in my SEO consulting days and even more recently as I've helped startups and advised folks, have been that off-topic links, UnderwaterHoagies.com linking to Carl's Closet, still seem to provide quite a bit of benefit, and it's very hard to gauge whether it's as much as, less than, or more than any of these other ones. So I think, on the anecdotal side, we're in a tough spot.


What we can say is that probably there's some additional value from on-topic sites, on-topic pages, or on-topic link connections, that Google has some idea of context. We've seen them make huge strides with algorithms like Hummingbird, certainly with their keyword matching and topic modeling algorithms. It seems very unlikely that there would be nothing in Google's algorithm that looks at the context or relationship of content between linking pages and linking websites.


However, in the real world, things are almost never equal. It's not like he's going to get exactly the same anchor text, from pages of the same importance, with the same number of external links, and with content that is exactly the same on all three of these websites pointing over to Carl's Closet. In the real world, Carl is going to struggle much harder to get some of these links than others. So I think that the questions we need to ask ourselves, as folks who are doing directed marketing and trying to earn links, are: Will the link actually help people? Is that link going to be clicked?


If you're on a page on Real Simple that you think very few people ever reach, you think very few people will ever click that link because it just doesn't appear to provide much value, versus you're in an article all about home organization on Underwater Hoagies, and it was featured on their home page, and you're pretty sure that a lot of the submariners who are eating their subs under the sea are very interested in this topic and they're going to click on that link, well you know what? That's a link that helps people. That probably means search engines are going to treat it with some reverence as well.


Does the link make sense in context? This is a good one to ask yourself when you are doing any kind of link building that's directed that could potentially be manipulative. If the link makes sense in context, it tends to be the case that it's going to be more useful. So if Carl contributes the article to UnderwaterHoagies.com, and the link makes sense in context, and it will help people, I think it's appropriate to put it there. If that's not the case, it could look a little manipulative. It could certainly be perceived as self-serving.


Then, can you actually acquire the link? It's wonderful when you go out and you make a list of, hey, here's the most important and relevant sites in our sector and niche, and this is how we're going to build topical authority. But if you can't get those links, hey that's tough potatoes, man. It's no better than putting a list of links and just sorting them by, God knows, a horrible metric like PageRank or Alexa rank or something like that.


I would instead ask whether it's realistic for you to be able to get those links, and pursue those, as well as looking at the metrics, the importance, and the topical relevance.


Let's think about this from a broad perspective. What do search engines care about? They care about matching the content's relevance to the searcher's query. They care about raw link popularity. That's sort of like the old-school algorithms of PageRank and number of links and that kind of thing. They do care about topical authority and brand authority. We talked on a previous Whiteboard Friday about topical authority and how Google determines the subject matter a site is authoritative on. They care about domain authority, the raw importance of a domain on the web, and they care about things like engagement and user and usage data, and given how much they can follow all of us around the web these days, they probably know pretty well whether people are clicking on these articles and using these pages or not.


Then anchor text. Not every link that you might build or acquire or earn is going to provide all of these in one single package. Each of them is going to contribute pieces of that puzzle. When it comes to the on-topic/off-topic link debate, I care much more about the answers to these kinds of questions -- Can I acquire the link? Is it useful to people? Will they actually use it? Does the link make sense in context? -- than I do about whether it's on-topic or off-topic. I'm not sure that I would ever urge you to prioritize based on that.


That said, I'm certainly looking forward to your feedback this week and hearing about your experiences with on-topic and off-topic links, and hopefully we'll see you again next week for another edition of Whiteboard Friday. Take care.



Video transcription by Speechpad.com




Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!


Friday, March 6, 2015

What Deep Learning and Machine Learning Mean For the Future of SEO - Whiteboard Friday

Posted by randfish


Imagine a world where even the high-up Google engineers don't know what's in the ranking algorithm. We may be moving in that direction. In today's Whiteboard Friday, Rand explores and explains the concepts of deep learning and machine learning, drawing us a picture of how they could impact our work as SEOs.





For reference, here's a still of this week's whiteboard!


Whiteboard Friday Image of Board


Video transcription



Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we are going to take a peek into Google's future and look at what it could mean as Google advances their machine learning and deep learning capabilities. I know these sound like big, fancy, important words. They're not actually that tough of topics to understand. In fact, they're simplistic enough that even a lot of technology firms like Moz do some level of machine learning. We don't do anything with deep learning and a lot of neural networks. We might be going that direction.


But I found an article that was published in January that is absolutely fascinating and, I think, really worth reading, and I wanted to extract some of its contents here for Whiteboard Friday because I do think this is tactically and strategically important for SEOs to understand, and really important for us to understand so that we can explain to our bosses, our teams, and our clients how SEO works and will work in the future.


The article is called "Google Search Will Be Your Next Brain." It's by Steve Levy. It's over on Medium. I do encourage you to read it. It's a relatively lengthy read, but just a fascinating one if you're interested in search. It starts with a profile of Geoff Hinton, who was a professor in Canada and worked on neural networks for a long time and then came over to Google and is now a distinguished engineer there. As the article says, a quote from the article: "He is versed in the black art of organizing several layers of artificial neurons so that the entire system, the system of neurons, could be trained or even train itself to divine coherence from random inputs."


This sounds complex, but basically what we're saying is we're trying to get machines to come up with outcomes on their own rather than us having to tell them all the inputs to consider, how to process those inputs, and the outcome to spit out. So this is essentially machine learning. Google has used this, for example, so that when you give it a bunch of photos it can say, "Oh, this is a landscape photo. Oh, this is an outdoor photo. Oh, this is a photo of a person." Have you ever had that creepy experience where you upload a photo to Facebook or to Google+ and they say, "Is this your friend so and so?" And you're like, "God, that's a terrible shot of my friend. You can barely see most of his face, and he's wearing glasses which he usually never wears. How in the world could Google+ or Facebook figure out that this is this person?"


That's what they use these neural networks, these deep machine learning processes, for. So I'll give you a simple example. Here at Moz, we do machine learning very simplistically for page authority and domain authority. We take all the inputs -- number of links, number of linking root domains, every single metric that you could get from Moz on the page level, on the sub-domain level, on the root-domain level, all these metrics -- and then we combine them together and we say, "Hey machine, we want you to build us the algorithm that best correlates with how Google ranks pages, and here's a bunch of pages that Google has ranked." I think we use a base set of 10,000, and we do it about quarterly or every 6 months, feed that back into the system, and the system pumps out the little algorithm that says, "Here you go. This will give you the best correlating metric with how Google ranks pages." That's how you get Page Authority and Domain Authority.
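To make that idea a little more concrete, here is a minimal sketch in Python of that kind of correlation fitting. It is not Moz's actual pipeline; the CSV, its columns, and the simple linear model are all placeholder assumptions used only to illustrate the approach of training on metrics and checking the correlation with observed rankings.

```python
import pandas as pd
from scipy.stats import spearmanr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical training set: a sample of pages with their link metrics and the
# position Google ranked them at for a known query.
pages = pd.read_csv("ranked_pages_sample.csv")

features = pages[["links", "linking_root_domains"]]   # any page/domain-level metrics
target = pages["google_rank"]                         # observed ranking position

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2)

# "Hey machine, build us the formula that best matches how Google ranks pages."
model = LinearRegression().fit(X_train, y_train)

# The score itself is arbitrary; what matters is how well it correlates with
# the order Google actually puts pages in.
correlation, _ = spearmanr(model.predict(X_test), y_test)
print(f"Spearman correlation with observed rankings: {correlation:.2f}")
```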


Cool, really useful, helpful for us to say like, "Okay, this page is probably considered a little more important than this page by Google, and this one a lot more important." Very cool. But it's not a particularly advanced system. The more advanced system is to have these kinds of neural nets in layers. So you have a set of networks, and these neural networks, by the way, they're designed to replicate nodes in the human brain, which is in my opinion a little creepy, but don't worry. The article does talk about how there's a board of scientists who make sure Terminator 2 doesn't happen, or Terminator 1 for that matter. Apparently, no one's stopping Terminator 4 from happening? That's the new one that's coming out.


So one layer of the neural net will identify features. Another layer of the neural net might classify the types of features that are coming in. Imagine this for search results. Search results are coming in, and Google's looking at the features of all the websites and web pages, your websites and pages, to try and consider like, "What are the elements I could pull out from there?"


Well, there's the link data about it, and there are things that happen on the page. There are user interactions and all sorts of stuff. Then we're going to classify types of pages and types of searches, and then we're going to extract the features or metrics that predict the desired result -- that a user gets a search result they really like -- so that we have an algorithm that can consistently produce those. The neural networks are hopefully designed -- that's what Geoff Hinton has been working on -- to train themselves to get better. So it's not like with PA and DA, where our data scientist Matt Peters and his team look at it and go, "I bet we could make this better by doing this."


This is standing back and the guys at Google just going, "All right machine, you learn." They figure it out. It's kind of creepy, right?
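For readers who like to see the moving parts, here is a toy sketch of what "layers" look like in code. The weights are random placeholders rather than anything learned, and none of this reflects Google's actual systems; it only illustrates the idea of one layer extracting intermediate features and another combining them into a score.

```python
import numpy as np

rng = np.random.default_rng(0)

# 5 pages x 4 raw signals (links, on-page factors, engagement, ...).
page_signals = rng.random((5, 4))

# In a real system these weights are learned; here they are random placeholders.
W1 = rng.normal(size=(4, 8))   # layer 1: turn raw signals into 8 intermediate features
W2 = rng.normal(size=(8, 1))   # layer 2: combine those features into one score

hidden = np.maximum(0, page_signals @ W1)      # ReLU units act as extracted "features"
scores = 1 / (1 + np.exp(-(hidden @ W2)))      # squash to a 0-1 score per page

# During training, these scores would be nudged toward whatever the system
# counts as a "successful search" -- that is the part that trains itself.
print(scores.ravel())
```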


In the original system, you needed those people, these individuals here to feed the inputs, to say like, "This is what you can consider, system, and the features that we want you to extract from it."


Then unsupervised learning, which is kind of this next step, the system figures it out. So this takes us to some interesting places. Imagine the Google algorithm, circa 2005. You had basically a bunch of things in here. Maybe you'd have anchor text, PageRank and you'd have some measure of authority on a domain level. Maybe there are people who are tossing new stuff in there like, "Hey algorithm, let's consider the location of the searcher. Hey algorithm, let's consider some user and usage data." They're tossing new things into the bucket that the algorithm might consider, and then they're measuring it, seeing if it improves.


But you get to the algorithm today, and gosh there are going to be a lot of things in there that are driven by machine learning, if not deep learning yet. So there are derivatives of all of these metrics. There are conglomerations of them. There are extracted pieces like, "Hey, we only want to look at and measure anchor text on these types of results when we also see that the anchor text matches up to the search queries that have previously been performed by people who also search for this." What does that even mean? But that's what the algorithm is designed to do. The machine learning system figures out things that humans would never extract, metrics that we would never even create from the inputs that it can see.


Then, over time, the idea is that in the future even the inputs aren't given by human beings. The machine is getting to figure this stuff out itself. That's weird. That means that if you were to ask a Google engineer in a world where deep learning controls the ranking algorithm, if you were to ask the people who designed the ranking system, "Hey, does it matter if I get more links," they might be like, "Well, maybe." But they don't know, because they don't know what's in this algorithm. Only the machine knows, and the machine can't even really explain it. You could go take a snapshot and look at it, but (a) it's constantly evolving, and (b) a lot of these metrics are going to be weird conglomerations and derivatives of a bunch of metrics mashed together and torn apart and considered only when certain criteria are fulfilled. Yikes.


So what does that mean for SEOs? What do we have to care about from all of these systems and this evolution and this move towards deep learning? By the way, that's what Jeff Dean is pushing for. He is, I think, a senior fellow over at Google, the guy everyone jokingly calls the world's smartest computer scientist over there, and Jeff Dean has basically said, "Hey, we want to put this into search. It's not there yet, but we want to take these models, these things that Hinton has built, and we want to put them into search." For SEOs, that future is going to mean far fewer distinct, universal ranking inputs, or ranking factors. We won't really have ranking factors in the way that we know them today. It won't be like, "Well, they have more anchor text and so they rank higher." That might be something we'd still look at and we'd say, "Hey, they have this anchor text. Maybe that's correlated with what the machine, the system, is finding to be useful, and that's still something I want to care about to a certain extent."


But we're going to have to consider those things a lot more seriously. We're going to have to take another look at them and determine whether the things that we thought were ranking factors still are when the neural network system takes over. It also is going to mean something that I think many, many SEOs have been predicting for a long time and have been working towards, which is more success for websites that satisfy searchers. If the output is successful searches, and that's what the system is looking for, and that's what it's trying to correlate all its metrics to, if you produce something that means more successful searches for Google searchers when they get to your site, and your ranking at the top means Google searchers are happier, well, you know what? The algorithm will catch up to you. That's kind of a nice thing. It does mean a lot less info from Google about how they rank results.


So today you might hear from someone at Google, "Well, page speed is a very small ranking factor." In the future they might say, "Well, page speed, like all ranking factors, is totally unknown to us." Because the machine might say, "Well yeah, page speed as a distinct metric, one that a Google engineer could actually look at, looks very small." But derivatives of things that are connected to page speed may be huge inputs. Maybe page speed is something that, across all of these, is very well connected with happier searchers and successful search results. Weird things that we never thought of before might be connected with them as the machine learning system tries to build all those correlations, and that means potentially many more inputs into the ranking algorithm, things that we would never consider today, things we might consider wholly illogical, like, "What servers do you run on?" Well, that seems ridiculous. Why would Google ever grade you on that?


If human beings are putting factors into the algorithm, they never would. But the neural network doesn't care. It doesn't care. It's a honey badger. It doesn't care what inputs it collects. It only cares about successful searches, and so if it turns out that Ubuntu is poorly correlated with successful search results, too bad.


This world is not here yet today, but certainly there are elements of it. Google has talked about how Panda and Penguin are based off of machine learning systems like this. I think, given what Geoff Hinton and Jeff Dean are working on at Google, it sounds like this will be making its way more seriously into search and therefore it's something that we're really going to have to consider as search marketers.


All right everyone, I hope you'll join me again next week for another edition of Whiteboard Friday. Take care.



Video transcription by Speechpad.com






Thursday, March 5, 2015

The Most Important Link Penalty Removal Tool: Your Mindset

Posted by Eric Enge


mindset - your best link removal tool


Let's face it. Getting slapped by a manual link penalty, or by the Penguin algorithm, really stinks. Once this has happened to you, your business is in a world of hurt. Worse still is the fact that you can't get clear information from Google on which of your links are the bad ones. In today's post, I am going to focus on the number one reason why people fail to get out from under these types of problems, and how to improve your chances of success.


The mindset


Success begins, continues, and ends with the right mindset. A large percentage of people I see who go through a link cleanup process are not aggressive enough about cleaning up their links. They worry about preserving some of that hard-won link juice they obtained over the years.


You have to start by understanding what a link cleanup process looks like, and just how long it can take. Some of the people I have spoken with have gone through a process like this one:


link removal timeline


In this fictitious timeline example, we see someone who spends four months working on trying to recover, and at the end of it all, they have not been successful. A lot of time and money have been spent, and they have nothing to show for it. Then, the people at Google get frustrated and send them a message that basically tells them they are not getting it. At this point, they have no idea when they will be able to recover. The result is that the complete process might end up taking six months or more.


In contrast, imagine someone who is far more aggressive in removing and disavowing links. They are so aggressive that 20 percent of the links they cut out are actually ones that Google has not currently judged as being bad. They also start on March 9, and by April 30, the penalty has been lifted on their site.


Now they can begin rebuilding their business, five or more months sooner than the person who does not take as aggressive an approach. Yes, they cut out some links that Google was not currently penalizing, but this is a small price to pay for getting your penalty cleared five months sooner. In addition, using our mindset-based approach, the 20 percent of links we cut out were probably not links that were helping much anyway, and Google might also take action on them in the future.


Now that you understand the approach, it's time to make the commitment. You have to make the decision that you are going to do whatever it takes to get this done, and that getting it done means cutting hard and deep, because that's what will get you through it the fastest. Once you've got your head on straight about what it will take and have summoned the courage to go through with it, then and only then, you're ready to do the work. Now let's look at what that work entails.


Obtaining link data


We use four sources of data for links:



  1. Google Webmaster Tools

  2. Open Site Explorer

  3. Majestic SEO

  4. ahrefs


You will want to pull in data from all four of these sources, get them into one list, and then dedupe them to create a master list. Focus only on followed links as well, as nofollowed links are not an issue. The overall process is shown here:


pulling a link set
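If you want to script the merge-and-dedupe step, here is a minimal sketch. It assumes you have exported each tool's links to CSV with "url" and "rel" columns; the real exports use different column names and filenames, so adjust to match what you actually have.

```python
import pandas as pd

# Hypothetical filenames for the four exports.
sources = [
    "gwt_links.csv",        # Google Webmaster Tools
    "ose_links.csv",        # Open Site Explorer
    "majestic_links.csv",   # Majestic SEO
    "ahrefs_links.csv",     # ahrefs
]

all_links = pd.concat([pd.read_csv(path) for path in sources], ignore_index=True)

# Nofollowed links are not an issue, so keep followed links only
# (assumes a "rel" column with values like "follow" / "nofollow").
followed = all_links[all_links["rel"] != "nofollow"]

# Dedupe into one master list.
master_list = followed.drop_duplicates(subset="url")
master_list.to_csv("master_link_list.csv", index=False)

print(f"{len(master_list)} unique followed links to review")
```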


One other simplification is also possible at this stage. Once you have obtained a list of the followed links, there is another thing you can do to dramatically simplify your life. You don't need to look at every single link.


You do need to look at a small sampling of links from every domain that links to you. Chances are that this is a significantly smaller quantity of links to look at than all links. If a domain has 12 links to you, and you look at three of them, and any of those are bad, you will need to disavow the entire domain anyway.


I take the time to emphasize this because I've seen people with more than 1 million inbound links from 10,000 linking domains. Evaluating 1 million individual links could take a lifetime. Looking at 10,000 domains is not small, but it's 100 times smaller than 1 million. But here is where the mindset comes in. Do examine every domain.


This may be a grinding and brutal process, but there is no shortcut available here. What you don't look at will hurt you. The sooner you start on the entire list, the sooner you will get the job done.
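Here is a minimal sketch of the per-domain sampling described above: group the master list by linking domain and pull up to three links from each, so every domain still gets looked at. The filenames and the "url" column are assumptions carried over from the previous sketch.

```python
from urllib.parse import urlparse

import pandas as pd

master_list = pd.read_csv("master_link_list.csv")

# Pull the linking domain out of each source URL (assumes full URLs with a scheme).
master_list["domain"] = master_list["url"].map(lambda u: urlparse(u).netloc)

# Up to three sample links per linking domain: far fewer rows to evaluate,
# but still at least one look at every single domain.
sample = master_list.sort_values("domain").groupby("domain").head(3)
sample.to_csv("per_domain_sample.csv", index=False)

print(f"{master_list['domain'].nunique()} domains, {len(sample)} links to review")
```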


How to evaluate links


Now that you have a list, you can get to work. This is a key part where having the right mindset is critical. The first part of the process is really quite simple. You need to eliminate each and every one of these types of links:



  1. Article directory links

  2. Links in forum comments, or their related profiles

  3. Links in blog comments, or their related profiles

  4. Links from countries where you don't operate/sell your products

  5. Links from link sharing schemes such as Link Wheels

  6. Any links you know were paid for


Here is an example of a foreign language link that looks somewhat out of place:


foreign language link


For the most part, you should also remove any links you have from web directories. Sure, if you have a link from DMOZ, Business.com, or BestofTheWeb.com, and the most important one or two directories dedicated to your market space, you can probably keep those.


For a decade I have offered people a rule for these types of directories, which is "no more than seven links from directories." Even the good ones carry little to no value, and the bad ones can definitely hurt you. So there is absolutely no win to be had running around getting links from a bunch of directories, and there is no win in trying to keep them during a link cleanup process.


Note that I am NOT talking about local business directories such as Yelp, CityPages, YellowPages, SuperPages, etc. Those are a different class of directory that you don't need to worry about. But general purpose web directories are, generally speaking, a poison.


Rich anchor text


Rich anchor text has been the downfall of many a publisher. Here is one of my favorite examples ever of rich anchor text:



The author wanted the link to say "buy cars," but was too lazy to fit the two words into the same sentence! Of course, you may have many guest posts that you have written that are not nearly as obvious as this one. One great way to deal with that is to take your list of links that you built and sort them by URL and look at the overall mix of anchor text. You know it's a problem if it looks anything like this:


overly optimized anchor text


The problem with the distribution in the above image is that the percentage of links that are non "rich" in nature is way too small. In the real world, most people don't conveniently link to you using one of your key money phrases. Some do, but it's normally a small percentage.
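If you want to quantify that mix rather than eyeball it, here is a minimal sketch. It assumes your link list also carries "anchor_text" and "target_url" columns, and the money phrases are example values you would replace with your own.

```python
import pandas as pd

links = pd.read_csv("master_link_list.csv")

# Example "money" phrases -- replace with your own key phrases.
money_phrases = ["buy cars", "cheap car deals"]

links["is_rich"] = links["anchor_text"].str.lower().isin(money_phrases)

# Share of rich-anchor links pointing at each target URL.
mix = (
    links.groupby("target_url")["is_rich"]
    .mean()
    .sort_values(ascending=False)
)

# URLs where most inbound links use money phrases deserve a hard look.
print(mix.head(10))
```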


Other types of bad links


There is no way for me to cover every type of bad link in this post, but here are other types of links, or link scenarios, to be concerned about:



  1. If a large percentage of your links are coming from over on the right rail of sites, or in the footers of sites

  2. If there are sites that give you a site-wide link, or a very large number of links from one domain

  3. Links that come from sites whose IP address is identical in the A block, B block, and C block (read more about what these are here; a quick grouping sketch appears below)

  4. Links from crappy sites


The definition of a crappy site may seem subjective, but if a site has not been updated in a while, or its information is of poor quality, or it just seems to have no one who cares about it, you can probably consider it a crappy site. Remember our discussion on mindset. Your objective is to be harsh in cleaning up your links.
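For item 3 in the list above, the C-block check, here is a rough sketch of how you might group linking domains by the first three octets of their IP addresses. The domain list is a placeholder, and domains that no longer resolve are simply skipped.

```python
import socket
from collections import defaultdict

# Placeholder list -- in practice, the linking domains from your master list.
linking_domains = ["example-blog.com", "another-site.net", "third-site.org"]

c_blocks = defaultdict(list)
for domain in linking_domains:
    try:
        ip = socket.gethostbyname(domain)
    except socket.gaierror:
        continue                              # skip domains that no longer resolve
    c_block = ".".join(ip.split(".")[:3])     # e.g. "93.184.216" from "93.184.216.34"
    c_blocks[c_block].append(domain)

# Several linking sites sharing one C block is the pattern to flag.
for block, domains in c_blocks.items():
    if len(domains) > 1:
        print(f"{block}.* -> {domains}")
```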


In fact, the most important principle in evaluating links is this: If you can argue that it's a good link, it's NOT. You don't have to argue for good quality links. To put it another way, if they are not obviously good, then out they go!


Quick case study anecdote: I know of someone who really took a major knife to their backlinks. They removed and/or disavowed every link they had that was below a Moz Domain Authority of 70. They did not even try to justify or keep any links with lower DA than that. It worked like a champ. The penalty was lifted. If you are willing to try a hyper-aggressive approach like this one, you can avoid all the work evaluating links I just outlined above. Just get the Domain Authority data for all the links pointing to your site and bring out the hatchet.


No doubt they ended up cutting out a large number of links that were perfectly fine, but their approach was far faster than doing the complete domain-by-domain analysis.
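If you want to try that hyper-aggressive cut, here is a minimal sketch of the filtering step. It assumes a CSV of linking domains with "domain" and "domain_authority" columns (for example, from a Moz export); the threshold of 70 simply mirrors the anecdote above.

```python
import pandas as pd

# Hypothetical export with one row per linking domain and its Domain Authority.
domains = pd.read_csv("linking_domains_with_da.csv")

keep = domains[domains["domain_authority"] >= 70]
cut = domains[domains["domain_authority"] < 70]

cut[["domain"]].to_csv("domains_to_disavow.csv", index=False)
print(f"Keeping {len(keep)} domains, disavowing {len(cut)} domains")
```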


Requesting link removals


Why is it that we request link removals? Can't we just build a disavow file and submit that to Google? In my experience, for manual link penalties, the answer to this question is no, you can't. (Note: if you have been hit by Penguin, and not a manual link penalty, you may not need to request link removals.)


Yes, disavowing a link is supposed to tell Google that you don't want to receive any PageRank, or benefit, from it. However, there is a human element at play here. Google likes to see that you put some effort into cleaning up the bad links that you have gotten that led to your penalty. The more bad links you have, the more important this becomes.


This does make the process a lot more expensive to get through, but if you approach this with the "whatever it takes" mindset, you'll dive into the link removal request process and go ahead and get it done.


I usually have people go through three rounds of requests asking people to remove links. This can be a very annoying process for those receiving your request, so you need to be aware of that. Don't start your email with a line like "Your site is causing mine to be penalized ...", as that's just plain offensive.


I'd be honest, and tell them "Hey, we've been hit by a penalty, and as part of our effort to recover we are trying to get many of the links we have gotten to our site removed. We don't know which sites are causing the problem, but we'd appreciate your help ..."


Note that some people will come back to you and ask for money to remove the link. Just ignore them, and put their domains in your disavow file.


Once you are done with the overall removal requests, and have had whatever success you're going to have, take the rest of the domains and disavow them. There is a complete guide to creating a disavow file here. The one incremental tip I would add is that you should nearly always disavow entire domains, not just the individual links you see.


This is important because even with the four tools we used to get information on as many links as we could, we still only have a subset of the total links. For example, the tools may have only seen one link from a domain, but in fact you have five. If you disavow only the one link, you still have four problem links, and that will torpedo your reconsideration request.


Disavowing the domain is a better-safe-than-sorry step you should take almost every time. As I illustrated at the beginning of this post, adding extra cleanup/reconsideration request loops is very expensive for your business.
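Here is a minimal sketch of writing the disavow file itself. The "domain:" prefix is Google's syntax for disavowing an entire domain, which, as discussed, is almost always the safer choice; the input filename is an assumption carried over from the earlier sketches.

```python
import pandas as pd

to_disavow = pd.read_csv("domains_to_disavow.csv")

with open("disavow.txt", "w") as f:
    f.write("# Domains disavowed as part of penalty cleanup\n")
    for domain in sorted(to_disavow["domain"].dropna().unique()):
        f.write(f"domain:{domain}\n")

print(f"Wrote {to_disavow['domain'].nunique()} domains to disavow.txt")
```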


The overall process


When all is said and done, the process looks something like this:


link removal process


If you run this process efficiently, and you don't try to cut corners, you might be able to get out from your penalty in a single pass through the process. If so, congratulations!


What about tools?


There are some fairly well-known tools that are designed to help you with the link cleanup process. These include Link Detox and Remove'em. In addition, at STC we have developed our own internal tool that we use with our clients.


These tools can be useful in flagging some of your links, but they are not comprehensive—they will help identify some really obvious offenders, but the great majority of links you need to deal with and remove/disavow are not identified. Plan on investing substantial manual time and effort to do the heavy lifting of a comprehensive review of all your links. Remember the "mindset."


Summary


As I write this post, I have this sense of being heartless because I outline an approach that is often grueling to execute. But consider it tough love. Recovering from link penalties is indeed brutal. In my experience, the winners are the ones who come with meat cleaver in hand, don't try to cut corners, and take on the full task from the very start, no matter how extensive an effort it may be.


Does this type of process succeed? You bet. Here is an example of a traffic chart from a successful recovery:


manual penalty recovery graph



