- Apr 28, 2016
Why Should You Read This?
I felt the need to write this tutorial because I got $70.00 worth of Facebook accounts banned when I started promoting CPA offers in Facebook groups, and later found out it was due to not re-hashing the OG-tag images of my landing pages (see "http://ogp.me" for more information on OG tags). Of all the tutorials I've read here on CPA and social media, none has stressed the importance of creating unique images to stop Facebook and other social networks from linking your accounts together and flagging them.
For a large part of the tutorial, I'll be referring to Facebook specifically. This is because it's the only social network I've researched enough to be able to talk about in-depth. However, my best guess is that almost every other social network uses a similar method of spam detection. I'd invite members who have tested sites like Twitter, Tumblr, and Instagram for hash-based spam detection algorithms to let BHW readers know what their findings were.
Why Are Unique Images Important?
A big part of flying under the spam filters of social media sites like Facebook lies in making sure they cannot link your resources or spam activity together. This means checking your post text, landing pages, images, domains, and everything else to make sure no single variable can be traced across your accounts by Facebook's anti-spam algorithm. If you're spamming on a large scale on any social network, this is critical. That's why it's important to make sure that the image hashes for the photos you upload to your accounts or include in landing-page OG tags aren't all the same. If your accounts can all be linked together by an anti-spam filter because of one variable you overlooked, they can and likely will get banned.
For instance, if I have 20 Facebook accounts and they all link to 1 previously unshared domain within 1 hour, the accounts are all going to get banned. This is because Facebook's spam detection algorithm determines that it's highly unlikely that a domain which was previously unshared on Facebook would suddenly get shared by real people so quickly and in such high numbers. Thus, it bans each and every account that shared the link, because they're participating in spam.
Another example of this principle would be in spinning text. If I post the exact same status on 20 Facebook accounts within 1 hour, Facebook's spam filter will likely ban them all, because it's highly unlikely that 20 real people would all choose exactly the same wording for their status in such a short period of time. This is circumvented by a method called "spinning," in which you identify multiple linguistically equivalent rephrasings of parts of a sentence, and then use a script or tool to string the equivalent combinations together randomly so that the text is always unique (see http://www.massplanner.com/new-feature-spinning-syntax-text-fields/ for more information on text spinning).
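To make the idea concrete, here's a minimal spintax expander in plain Python. The `spin` function and the sample sentence are my own illustration of the `{a|b|c}` syntax, not Mass Planner's actual implementation:

```python
import random
import re

def spin(text, rng=random):
    """Expand one random variant of a {a|b|c} spintax string.

    Repeatedly resolves the innermost brace group, so nested
    spintax like {Check {this|it} out|Look} also works.
    """
    pattern = re.compile(r"\{([^{}]*)\}")  # a group with no nested braces
    while True:
        match = pattern.search(text)
        if match is None:
            return text
        choice = rng.choice(match.group(1).split("|"))
        text = text[:match.start()] + choice + text[match.end():]

# Each call produces one of the equivalent variants at random,
# e.g. "Look at this awesome offer!"
print(spin("{Check out|Look at} this {great|awesome} offer!"))
```

Run it 20 times and you get up to four distinct statuses from one template; with a few more groups per sentence, the number of combinations grows multiplicatively.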
Yet another example comes from posting images on Facebook. Suppose I'm trying to make my accounts seem real by posting images, like an image of nature or a quote, on their walls every once in a while. If I have 20 accounts and they all upload the exact same image to their wall within a 24 hour period, this will raise flags against Facebook's spam detection algorithm because it's highly unlikely that the exact same image would be uploaded so many times in such a short period of time by real people. Image rehashing comes in to circumvent penalties like the aforementioned.
How Do You Make Images Unique?
If you don't already know, making unique images consists of rehashing them in some shape or form. This is the process of taking a file and making a copy of it so that the copied file still serves its intended purpose perfectly, yet looks like a different file than the original when a computer analyzes it. Computers compare files by calculating a "hash" value, which is produced by feeding each and every bit contained within a file into a mathematical function that returns a small string of letters and numbers. Because each and every bit of the file is inputted sequentially, modifying even one bit within a file will result in a completely different hash value. This means that, in practice, every distinct file generates a distinct hash. This hashing process is what anti-spam algorithms like Facebook's use to catch and ban spammers using the same image over and over again.
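You can see this one-bit sensitivity for yourself with Python's standard `hashlib`; the byte string below is just a stand-in for real image data:

```python
import hashlib

data = b"\x89PNG example image bytes"  # stand-in for a real image file
tweaked = bytearray(data)
tweaked[0] ^= 0x01                     # flip a single bit

h_original = hashlib.md5(data).hexdigest()
h_tweaked = hashlib.md5(bytes(tweaked)).hexdigest()

# The two digests share no structure at all, even though the
# inputs differ by one bit.
print(h_original)
print(h_tweaked)
```

This is exactly why rehashing works: any change that survives the upload process, however tiny, gives the spam filter a hash it has never seen before.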
[Image: A simplified diagram of the MD5 hashing process as it applies to passwords and file verification.]
Any image that is either uploaded to Facebook or used in an OG tag has its hash calculated and tracked. Facebook's anti-spam algorithm then leverages this data to detect and block spam activity by monitoring the rest of the social network for an unusually high count of the exact same hash being either uploaded or referenced in the OG data of other shared pages. Much like spamming the same link across many different accounts will get you flagged or banned on Facebook, linking your OG images from the same source or using images on alternate domains with the exact same hash will also get you banned. As spam filters get better and better, it's important to know which methods of rehashing are detectable and which aren't, so that Facebook cannot catch group spammers.
What Are Bad Methods of Rehashing Images?
There are many bad ways of re-hashing images that Facebook will detect, some of which I'll outline briefly:
End-of-File Rehashing:
What does it do? End-of-file rehashing appends "junk data" to the end of images. This makes the file look unique to a hashing program without affecting the image itself in any way, because no data is taken away; the end of the file is merely padded with random characters. I created a program that automatically does this up to 50 times in a row and released it in a thread a while ago (see "http://www.blackhatworld.com/seo/me...es-unique-perfect-for-social-networks.852010/").
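For illustration, here's a minimal sketch of the padding step in Python. The function name and the 16-byte default are my own, not taken from the released tool:

```python
import os

def pad_end_of_file(src_path, dst_path, n_junk_bytes=16):
    """Copy an image file, appending random 'junk' bytes to the end.

    Viewers stop reading a JPEG at its end-of-image marker (FF D9),
    so the picture renders identically while the file's hash changes.
    Illustrative sketch only -- names and sizes are my own choices.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(data + os.urandom(n_junk_bytes))
```

Calling this repeatedly on the same source image produces copies that render identically but each carry a different hash, until something in the upload pipeline strips the trailing bytes.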
Why isn't it recommended? Though this method is currently working for OG tags in Facebook landing pages, there are three good reasons why it's better to use a different method of rehashing.
First, this method does not work for images hosted by Facebook itself, because any image uploaded to Facebook has its "junk data" stripped from the end of the image upon upload. This means that if I have 100 images that I've rehashed with this method and I upload them all to Facebook, Facebook will revert the rehashed images to the exact bytes contained within the original image. As I've just mentioned, uploading a photo to Facebook removes the "junk data" from it, which was the very data the process added to make it distinct from other images. After this happens, the hashes of all the images will be the same as the original's, completely defeating the purpose of adding end-of-file data in the first place.
Second, though this method works for making unique images referenced by OG tags within landing pages, it is not recommended for use in this way due to the fact that it isn't future-proof and could result in a mass-ban. Just because Facebook isn't stripping end-of-file data off of images referenced within OG tags right now doesn't mean that they won't start doing it next week, tomorrow, or as you're reading this. If you use this method and Facebook's spam detection algorithms get an update that hashes OG tagged images after they've been stripped of end-of-file data, all your landing pages will be banned and (I'm pretty sure) any account that posted the blocked landing pages will have strict filters applied to them for an indeterminate amount of time.
Third, any image that you reference within an OG tag needs to be hosted on the internet somewhere, and you'll experience a great deal of difficulty getting it there with your end-of-file data (and thus your rehash) intact. Of the seven different image uploading services I've tested, only one actually preserved the end-of-file data after the upload. To test whether or not your image host strips end-of-file data, first upload your image, then download it back to your local computer and check its hash value. If the hash no longer matches your rehashed file, or has reverted to the hash of the original file, the chances are high that the end-of-file data was stripped and your future uploads will all end up with the exact same hash.
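That before-and-after check boils down to hashing the local copy and the re-downloaded copy and comparing. A small helper (the function name and the comparison workflow in the comments are my own illustration):

```python
import hashlib

def file_md5(path):
    """Return the MD5 hex digest of a file's full contents."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

# Workflow: upload your rehashed image, download the hosted copy,
# then compare three hashes -- the original, your rehashed version,
# and the round-tripped file. If the downloaded copy's hash matches
# the ORIGINAL instead of your rehashed version, the host stripped
# your end-of-file data.
```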
PE-Header Rehashing:
What does it do? This method of rehashing modifies the Portable Executable Header of a file to a different value, resulting in a different hash. The Portable Executable Header section of a file is used as a resource for .exe files within Windows, but is (apparently) present in every Windows-readable file larger than a couple of bytes, and can be modified without damaging the file in almost every instance (see "http://imristo.com/hash-manager-change-the-hash-of-any-file/", where the creator of Hash Manager, a program that employs the PE-Header rehashing process, talks more about this method).
Why isn't it recommended? As far as I know, this method has many of the same problems as the end-of-file method. When uploaded to Facebook, as well as to most image hosts, images rehashed using this method end up either reverting to their original hash or changing to a hash that is the same as all the others' post-upload. In either case the images all end up with the same hash, as the PE-Header data is likely being stripped as the image is put through the upload filter. In the same way that I don't recommend using end-of-file rehashing for OG-tagged images or on-site uploads, I also don't recommend using this method, because it has the same problems.
Rehashing Through Resizing:
What does it do? It's just like it sounds: resizing an image in any way, shape, or form will change its hash. The nice part about this method is that almost no image-hosting services will resize your file unless you tell them to. However, it has its flaws.
Why isn't it recommended? People have been and are using this method successfully, but it's not optimal and runs into three annoying problems if used too much:
First, it's very easy to run out of possible resolutions to resize an image to without distorting it when using this method on small images. If you have an image of 100 by 100 pixels, you'll only have a couple dozen possible alternate resolutions that don't look significantly different from the original.
Second, even if you're using a large image, it's very hard to keep track of the resolutions that you've used and the resolutions that you haven't used. If you use a mass rehasher based on a script, you'd need to configure that script to not make your rehashed images too big, too wide, or too tall.
Third, you'd either need to remember how far in every orientation you've resized or get the program that does the resizing to log it for you so that you don't generate images with resolutions and hashes that you've previously used. There is currently no program that I know of which has these features. It's honestly just an unnecessary mess to deal with, so I'd steer clear of this method.
EXIF-Data Rehashing:
What does it do? EXIF data is metadata stored within image files that can include anything from the shutter speed of the camera that took the photo to the GPS coordinates of the smartphone that took it. The default EXIF data included within a photo varies depending on the device it was taken on, and it's by either removing or randomizing this EXIF data that the hash of an image can be changed (see "http://www.makeuseof.com/tag/exif-photo-data-find-understand/" for more information on EXIF data).
Why isn't it recommended? Almost every major upload service strips EXIF data, so taking a file that has EXIF data, stripping it to make it unique, and then uploading it to the internet will only work once. Once the EXIF data previously contained within the file has been stripped, stripping it again will not generate a new hash, because the file no longer has any EXIF data to remove.
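For the curious, the stripping half is mechanically simple: in a JPEG, EXIF lives in an APP1 marker segment, so deleting that segment changes the file's hash without touching pixel data. A rough stdlib sketch of my own (it only strips, it does not randomize, and real-world files may need more careful parsing):

```python
def strip_app1(jpeg_bytes):
    """Remove APP1 (EXIF) segments from a JPEG byte string.

    Walks the marker segments after the SOI marker (FF D8). APP1
    (FF E1) normally holds the EXIF block; dropping it changes the
    file's hash while leaving the image data intact. Sketch only.
    """
    out = bytearray(jpeg_bytes[:2])              # keep SOI (FF D8)
    i = 2
    while i + 4 <= len(jpeg_bytes) and jpeg_bytes[i] == 0xFF:
        marker = jpeg_bytes[i + 1]
        if marker == 0xDA:                       # SOS: entropy-coded data follows
            out += jpeg_bytes[i:]
            return bytes(out)
        length = int.from_bytes(jpeg_bytes[i + 2:i + 4], "big")
        if marker != 0xE1:                       # drop APP1, keep everything else
            out += jpeg_bytes[i:i + 2 + length]
        i += 2 + length
    out += jpeg_bytes[i:]
    return bytes(out)
```

Run this on a camera photo and the hash changes; run it again on the result and nothing changes, which is exactly the "only works once" problem described above.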
If you're not stripping EXIF data to rehash, you might look for a good way to randomize it like I did (see "http://www.blackhatworld.com/seo/exif-randomizer.852270/"). I wasn't able to find a program that did this, so anyone who wants to use this method on a large scale needs to have one developed. Even with such a program, remember that rehashing through randomizing EXIF data is still not viable for on-site social media use: the rehashed files would only be usable as OG images, due to Facebook's built-in EXIF stripper for on-site uploads. And just like the end-of-file and PE-Header methods, Facebook could start storing and detecting EXIF-stripped landing-page OG image hashes any day. Between the custom program needed and the fact that, like the methods before it, it can get patched any day, this method is far from viable.
What Is the Best Way to Create Unique Images?
Having learned about the lack of solutions for those who are looking to create reliable and currently undetectable rehashed images on a large scale, as well as cringing at the thought of using Photoshop to paint a random pixel on an image 1,000 different times, I went ahead and developed another program that employs what I think is the safest and most future-proof image rehashing method.
What does it do? Pixel-based rehashing is a method that rehashes an image by changing the color of a pixel. This is the process used by the very, very time consuming manual Photoshop method that I was outlining in the paragraph above. I went looking for a possible method to automate this process and found a shell-based framework called ImageMagick that can imitate the functions of conventional image editors from the command-line.
Coupled with a random number generator, I leveraged these resources to create a program that is equivalent to opening up Photoshop, painting a single random pixel a random color, and then repeating the process thousands of times while choosing a different pixel and different color each time.
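Here's a minimal pure-Python sketch of the same idea, operating on a raw RGB byte buffer (such as the body of a binary PPM file). My released program drives ImageMagick instead; the `rehash_pixels` helper and its parameters here are illustrative only:

```python
import hashlib
import random

def rehash_pixels(pixels, width, height, rng=random):
    """Recolor one random pixel in a raw RGB byte buffer.

    The visual change is a single pixel, but the hash of the data
    changes completely. Shifting each channel by 1..255 (mod 256)
    guarantees the chosen pixel really does change color.
    """
    out = bytearray(pixels)
    i = 3 * rng.randrange(width * height)        # byte offset of a random pixel
    for channel in range(3):                     # shift each of R, G, B
        out[i + channel] = (out[i + channel] + 1 + rng.randrange(255)) % 256
    return bytes(out)

original = bytes(300)                            # a 10x10 all-black RGB buffer
variant = rehash_pixels(original, 10, 10)
# True: a one-pixel recolor yields a completely different hash
print(hashlib.md5(original).hexdigest() != hashlib.md5(variant).hexdigest())
```

Because every call picks a different pixel and color, the number of distinct variants for even a small image runs into the millions.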
Why is it recommended? This method generates unique rehashed images that cannot be reverted back to their source, keep their original resolution, and allow millions of distinct rehashes per image; it's also future-proof. The reason this method is future-proof is that far more people are using the other rehashing methods I've outlined, so Facebook will likely start cracking down on PE-Header and end-of-file rehashing long before it deploys algorithms that can detect and track similar images at the pixel level.
Where can I download it? Image Rehasher can be found here: "http://www.blackhatworld.com/seo/get-pixel-based-image-rehasher-make-1-000s-of-unique-images.869497/".
Feel free to let me know in the comments if you have any questions, comments or data to contribute. I tried to write something valuable to both beginner and experienced internet marketers, and hope it was worth the read.
EDIT: I personally wouldn't recommend using this for any form of SEO. I don't know much about SEO, but the fact that Google possesses algorithms that can identify similar images in the Google Images search results leads me to believe that complex alterations are necessary to trick Google into failing to recognize a pixel-based rehash while simultaneously making both look nearly identical to the human eye.
I've read a little bit about a method that, in theory, may be able to accomplish this, but it involves transforming the source image's pixels into Fourier space, adjusting the frequency coefficients in proportion to their own values, and then converting the Fourier data back into pixels. No thanks.