How To Build The Ultimate Auto Accept List For SEO Tools!

Welcome to my tutorial on building the ultimate auto-accept list for SEO tools. Unlike my post based around using only GSA Search Engine Ranker, this post will introduce additional tools to speed up the overall process. If you are just starting out or are on a budget then the SER-based post is probably a better route for you, as it is able to do the majority of the things covered in this post, just slower. If you would rather invest money and save time then you can skip both processes, purchase a premium list, and then use my list filtering process explained here to achieve lightning-fast links-per-minute counts.

The Tools Required For This Process

As I previously touched on, we will be introducing some additional tools and services with this method to maximize the link output of the process while increasing speed. These recommendations can be swapped out depending on what you have available but I have laid out the ideal tool setup below as well as a basic explanation of what the tool will be used for.

A Dedicated Server Or Multiple High-End VPS’ – The process requires a large amount of system resources, meaning that ideally it will run on either a dedicated server that has been broken down into three VPS’ or three high-end VPS’. Once set up, one VPS will be used for each of the three stages: link acquisition, link identification, and link verification. Additional VPS’ can be added as required to scale up the link acquisition phase.

Dropbox – Personally, I use Dropbox but any file sharing system can be used in its place to sync the various file lists as required between the different VPS’.

Scrapebox and Scrapebox Automator – Scrapebox will be used for both link acquisition methods, link extraction, and footprint scraping. The use of Scrapebox Automator can be added in to have Scrapebox run a number of footprint scraping tasks all day long without any downtime.

GSA Platform Identifier – PI will be used for the link identification phase of the process to sort the links from the link acquisition phase.

GSA Search Engine Ranker – Although SER will be used on your live VPS’ to build links for your live projects, a dedicated instance of it will be used to verify the links that PI pushed out from the link identification phase.

Why We Introduce Additional Tools

Due to GSA Search Engine Ranker being a 32-bit tool it is only able to use around 2 GB of RAM straight out of the box. This means it has a finite number of resources available to it limiting the performance of the tool.

By adding additional tools to complete various tasks in the process, SER is able to focus on what it does best, link building/verification. In addition, Scrapebox is a much better link extraction tool and allows users to run multiple instances of the tool on the same VPS completing various tasks all for a single license fee.

Breaking the various tasks down to individual tools also allows us to scale various areas of the process as shown in the example diagram below.

Auto Accept List Building Set Up.

A good dedicated verification server will be able to process many more links than a single identification VPS is able to push to it. A single link identification VPS is able to process many more links than a single link acquisition VPS is able to push to it and so on enabling the user to scale as and where required.

Preparing Your Dropbox

As this process will be spread over a number of different VPS’ or servers, we will utilize Dropbox to keep our lists synced between the various phases of the process as well as to push our filtered list out to our live link building projects.

Personally, I used to use a main folder for each of the phases so we start with a folder for link acquisition, a folder for link identification, a folder for link verification and one final folder for our filtered links. We then set up subfolders in each of these main folders for each VPS working on a task similar to how the example bullet points are laid out below.

  1. Link Acquisition
    1. Footprint Scraping 1
    2. Footprint Scraping 2
    3. Footprint Scraping 3
    4. Link Extraction 1
    5. Link Extraction 2
  2. Link Identification
    1. Link Identification 1
  3. Link Verification
    1. Link Verification 1
  4. List Filtering
    1. List Filtering 1
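If you would rather script the folder creation than click it out by hand, the layout above can be sketched in a few lines of Python. This is only a minimal sketch: the folder names are copied from the example list, while the root path you pass in (for example a folder inside your local Dropbox directory) is an assumption you will need to adjust for your own setup.

```python
from pathlib import Path

# Folder names mirror the example list above; rename them to suit your setup.
LAYOUT = {
    "Link Acquisition": [
        "Footprint Scraping 1",
        "Footprint Scraping 2",
        "Footprint Scraping 3",
        "Link Extraction 1",
        "Link Extraction 2",
    ],
    "Link Identification": ["Link Identification 1"],
    "Link Verification": ["Link Verification 1"],
    "List Filtering": ["List Filtering 1"],
}


def create_layout(root: Path, layout: dict = LAYOUT) -> list:
    """Create every phase folder and sub folder under root and return the sub folder paths."""
    created = []
    for phase, subfolders in layout.items():
        for name in subfolders:
            path = root / phase / name
            path.mkdir(parents=True, exist_ok=True)  # safe to run more than once
            created.append(path)
    return created
```

Calling something like create_layout(Path.home() / "Dropbox" / "Auto Accept List") would build the whole tree in one go. That root path is hypothetical, and adding another scraping VPS later is just one more entry in the dictionary.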

Setting things out like this allows us to hot swap the various stages as we please. For example, each of the link acquisition VPS’ gathers links and saves them into its respective sub folder. We then set GSA Platform Identifier up to monitor the folders we require. In this example, say we have two instances of PI running. One is monitoring link acquisition sub folders 1-3 with the other dealing with sub folders 4 and 5.

As link extraction usually produces a much higher link yield than footprint scraping, it is safe to say that the first instance of PI may sit idle for a few hours a day while the second instance is working flat out at maximum capacity and is still unable to catch up with the links being placed in the folders it is monitoring.

As we have separate folders for each of the link acquisition VPS’, we are able to tell the first instance of PI to monitor one of the link extraction sub folders and move one of the footprint scraping folders over to the second instance of PI quickly and easily. When I first started doing this process myself I used to dump all of the scraped and extracted links into the same folder and it presented no end of problems!

Moving on with the example, say we have both of the PI instances identifying their links and dropping anything we have told it to keep into the link identification 1 sub folder. A verification instance of SER can pick the targets up from this folder and push them through its projects in an attempt to verify them and save any verified links to the link verification 1 subfolder.

Then, depending on workload, you can have that same instance of SER or a totally new instance on a new VPS set up with projects as explained in my guide to filtering your lists, pulling the links from the verified and list filtering folders once every week or so. This instance will focus on reprocessing past verified targets to purge any dead links and keep your live link building instances running at optimum speed. Once the filtering projects have completed, you simply delete the current target URLs in the list filtering folder, export the newly filtered verifieds to that folder and have your live link building instances pull their targets from that folder.

As I said, this is just an example layout for me to attempt to explain the process and display how it allows you to hot swap and scale your folders at the various stages, with Dropbox syncing the required folders to the required VPS’ in the background to reduce your workload.

Turning Link Acquisition Up To Eleven

As I have touched on, we are dropping GSA Search Engine Ranker for the link acquisition phase in favor of Scrapebox. We will be starting with footprint scraping, as link extraction requires a target pool of URLs to begin extracting from. Ideally, the target pool will be made up of engines such as blog comments, image comments, and guestbooks.

Starting off With Footprint Scraping

Due to Scrapebox and SER supporting different engines, we need to export the footprints for the engines we wish to target from GSA Search Engine Ranker and import them into Scrapebox. Thankfully, a recent update to SER has made this process very easy!

In this example, I will use blog comments, image comments, and guestbooks as it will provide me with targets that I can use in the link extraction phase but you will select the engines you wish to build a list around.

To export your footprints, open up a new SER project, select the platforms or engines you wish to target and right click the platform window. Scroll to the bottom of the options menu that pops up and select “Export Footprints of selected engines” and select either file or clipboard as shown in the screenshot below.

How to extract footprints in GSA Search Engine Ranker.

Once you have a set of extracted footprints I highly recommend you complete the process I explain in my post here on cleaning up your footprints. An unoptimized footprint list will take longer for your tools to scrape through but will not necessarily increase your link yield so completing the process once saves so much time in the long run.

Once you have your clean footprint list it is time to prepare Scrapebox for search engine scraping. As with SER, you have two options to choose from when scraping with Scrapebox. The first is to use your own premium proxies and the second is to use free public proxies. Premium proxies scrape slower to avoid soft bans but are more reliable when their threads and timeouts are configured correctly. Public proxies are free but are often already soft banned, meaning a bunch of your scrapes will fail, but they are fine if you are just starting out. Either way, you need to optimize your threads and timeouts.

To do this, click the “settings” option and then “Connections, Timeout, and Other Settings” as shown in the screenshot below.

How to open the threads and timeout settings in Scrapebox.

Once clicked, the menu shown in the screenshot below will open.

The Scrapebox threads window.
  1. The harvester thread count for the tool. If you have chosen to use your own premium proxies to scrape with then it is a good idea to set this low. Personally, I set it to one as it is easier to work out the required timeout per proxy. If you are using public proxies then I set the thread count as high as my VPS or server can handle.
  2. The advanced settings option. If you are using a dedicated server or high-end VPS and have chosen to scrape with public proxies then it’s a good idea to enable this option, as it will allow you to increase your thread count beyond the default.
  3. The timeout settings. Once the initial screen has been optimized we will need to optimize the timeout counts.

The two screenshots below show the difference between the standard max thread count for harvesting and the advanced max thread count when enabling advanced settings. If you are only using a low-end VPS or a desktop/laptop then leave your harvester thread count on the standard max thread count.

Standard scrapebox thread count for harvesting.
Advanced thread count for scraping with Scrapebox.

Next up, we move onto the timeout settings as shown in the screenshot below.

Scrapebox timeout settings.

If you choose to use premium proxies to scrape the search engines then you will need to work out your target search engine’s soft ban threshold. Until recently, a single proxy could query Google around once every seventy seconds without becoming soft banned. My recent scraping suggests the required delay has gone up again, meaning you will have to test and adjust to find the timeout that works for you.

If you chose to use public proxies then I set my timeout as low as possible. As they are public proxies there could be a large number of people using them meaning it is best for you to get as much from them as possible before they are soft banned.
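To get a feel for what a per-proxy delay means for your overall scraping speed, here is a rough back-of-the-envelope sketch in Python. The seventy-second figure comes from the paragraph above; the proxy counts and target rates are hypothetical examples, and the result is only an upper bound because it assumes every query succeeds.

```python
import math


def queries_per_hour(proxy_count: int, seconds_between_queries: float) -> float:
    """Upper bound on queries per hour when every proxy waits the given delay between queries."""
    return proxy_count * 3600 / seconds_between_queries


def proxies_needed(target_queries_per_hour: float, seconds_between_queries: float) -> int:
    """Smallest proxy count that can reach the target rate at the given per-proxy delay."""
    return math.ceil(target_queries_per_hour * seconds_between_queries / 3600)


# Hypothetical example: 50 premium proxies, each querying once every 70 seconds.
print(round(queries_per_hour(50, 70)))  # about 2571 queries per hour at best
print(proxies_needed(1000, 70))         # 20 proxies to reach 1000 queries per hour
```

If your testing shows you need a longer delay to stay clear of soft bans, plug the new number in and you can see straight away how many extra proxies it would take to keep the same throughput.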

Now that the threads and timeouts are set up we move on to adding our footprints. Although they are actually footprints, we need Scrapebox to treat them as keywords so it will query the various search engines with them and harvest any results. To do this, simply paste them into the keywords window in the top left of Scrapebox’s main window as shown in the screenshot below.

Scrapebox keyword window.

If you wish to use your own premium proxies then you can add them by clicking the load button in the screenshot below and selecting their file. Once your proxies have been added be sure to tick the “use proxies” button. If you chose to use public proxies then we will set them up in the next stage.

Scrapebox proxy menu.

Next up, click the “Start Harvesting” button in the URL’s Harvested window in the top right of the Scrapebox window as shown in the screenshot below.

Scrapebox URL Harvested window.

You should now be presented with the custom harvester window as shown in the screenshot below.

Scrapebox Custom Harvester.

Select the search engines you wish to target from the search engine pane on the left of the window. If you wish to scrape with public proxies you are able to scrape your own but Scrapebox pushes their own scraped proxies out to the tool. You can enable these by clicking the proxies button at the bottom center of the window and selecting the “use server proxies” option shown in the screenshot below.

Scrapebox server proxies.

Selecting this option will pull the pre-scraped public proxies from the Scrapebox team for you. If you chose to use your own premium proxies you can skip this step. No matter which proxy method you decided to use, you now press the start button on the harvester window and leave Scrapebox to scrape your desired search engines.

Scrapebox harvested results.

Once the scrape is complete the popup shown in the screenshot above will be presented. As you can see from the two right-hand columns that are shades of red, there is a high failure rate due to the public proxies but at least they are free and harvest some results. Closing the popup will return you to the Scrapebox harvester where you are able to view some data on how the scrape was completed as shown in the screenshot below.

Scrapebox harvested results.

Closing the harvester will add the harvested URLs to your URL’s Harvested pane in the top right of your main Scrapebox window. Be sure to remove duplicates from the results by using the “Remove / Filter” button as shown in the screenshot below.

How to remove duplicates in Scrapebox.

If you are scraping non-contextual targets then only remove duplicate URLs as platforms like blog and image comments can hold multiple valid pages all on the same domain. If you are scraping for contextual targets then you can remove duplicate domains as tools such as GSA Search Engine Ranker are clever enough to navigate to the account creation page for supported engines.

Once duplicates have been removed, save your harvested URLs to your desktop ready for the link identification phase.
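To make the difference between the two duplicate removal modes concrete, here is a minimal Python sketch of both. It is not how Scrapebox does it internally; in particular, treating the full host name as the "domain" is my own simplifying assumption, and the example URLs are made up.

```python
from urllib.parse import urlsplit


def dedupe_by_url(urls):
    """Keep the first occurrence of each exact URL, preserving order."""
    return list(dict.fromkeys(urls))


def dedupe_by_domain(urls):
    """Keep only the first URL seen for each host name."""
    seen = set()
    kept = []
    for url in urls:
        host = urlsplit(url).netloc.lower()
        if host and host not in seen:
            seen.add(host)
            kept.append(url)
    return kept


# Made-up example list: two different comment pages on the same blog plus one repeat.
harvested = [
    "http://blog.example.com/post-1",
    "http://blog.example.com/post-2",
    "http://blog.example.com/post-1",
    "http://other.example.org/guestbook",
]
print(len(dedupe_by_url(harvested)))     # 3, both blog pages survive
print(len(dedupe_by_domain(harvested)))  # 2, only one page per host survives
```

For non-contextual targets the second blog page is a perfectly valid extra target, which is exactly what the domain-level mode throws away.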

Automating The Footprint Scraping Process

As I have previously touched on, Scrapebox allows license holders to run as many instances of the tool on a single VPS as it can handle with a single license. You could, in theory, split your footprint scraping load across 10 different instances all running at the same time. Unfortunately, this will cause problems with the number of proxies available for the scrapes and overall reduce your link yield.

A better way to run the process is to use the Scrapebox Automator plugin. This premium plugin allows the user to queue up jobs for Scrapebox to run throughout the day and with a little tweaking, you are able to have it run 24 hours a day non-stop dumping scraped links into one of your link acquisition sub-folders to pass onto your GSA Platform Identifier installs.

At the time of writing, the Scrapebox Automator costs $20 and requires a regular Scrapebox license. To purchase the plugin, open Scrapebox, click the “Premium Plugins” menu and then click “Show Available Plugins” as shown in the screenshot below.

Buying premium plugins in Scrapebox.

The premium plugin menu will then open and allow you to purchase any of the available premium plugins for Scrapebox as shown in the screenshot below. In this case, we are only interested in the Scrapebox Automator.

Purchasing the Scrapebox Automator Plugin.

Once purchased and installed to your Scrapebox you will have two Automator options enabled on your menu bar. The first is to open the config menu for the plugin and is found by selecting the “Premium Plugins” menu bar and then selecting “Automator Plugin 64-Bit” as shown in the screenshot below.

Opening the scrapebox automator plugin.

Once opened you will be presented with the tools configuration menu window as shown in the screenshot below.

Scrapebox Automator main configuration window.

The left-hand pane holds options for almost every command Scrapebox is capable of completing, the center pane holds the sequence of commands to be completed in the order the user adds them and the right-hand pane presents the various options available to the user for any specific task that is highlighted in the center pane.

There are a number of different commands you can use for your Automator job that will enable you to customize exactly what the tool is doing. One thing I will add is that there is no requirement to add an individual “Import/Export Keyword List” command, as the “Harvest URLs” command has this option included at the bottom of its settings, which are displayed in the right-hand pane once the command is selected. The screenshot below shows the required commands for a basic automated scraping session.

Basic Scrapebox Automator Project.

It will run three consecutive scrapes for you, exporting the harvested URLs to the location you specify in your “Harvested URLs” command as well as allowing you to select a different set of keywords/footprints to scrape with for each run. Keep in mind, this is just an example. You can have these projects going on indefinitely provided you supply the tool with fresh proxies, so the commands in the screenshot above may be duplicated ten or twenty times to reduce the amount of time you need to spend interacting with the footprint scraping phase of link acquisition.
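Since each run in the job can point at a different keyword/footprint file, it can help to split one master footprint list into several smaller files up front. The Automator itself is configured through its own window, so the Python sketch below only prepares the input files; the file names and the round-robin split are my own assumptions rather than anything the plugin requires.

```python
from pathlib import Path


def split_into_batches(lines, batch_count):
    """Distribute lines round-robin into batch_count lists."""
    batches = [[] for _ in range(batch_count)]
    for index, line in enumerate(lines):
        batches[index % batch_count].append(line)
    return batches


def write_batches(footprint_file: Path, out_dir: Path, batch_count: int) -> list:
    """Split a master footprint file into numbered batch files and return their paths."""
    text = footprint_file.read_text(encoding="utf-8", errors="ignore")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, batch in enumerate(split_into_batches(lines, batch_count), start=1):
        path = out_dir / f"footprints_{number}.txt"  # hypothetical naming scheme
        path.write_text("\n".join(batch) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

With three runs in the job you would call write_batches with a batch count of three and point each run at one of the resulting files.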

If you are a customer of the SEOSpartans Catch-Alls service then you are able to use their handy free tool available here that takes the Scrapebox Automator to the next level. It comes with an easy to follow readme file that takes around five minutes to set up and then allows you to scrape indefinitely with a single project while changing your scraping keywords/footprints each time. It dumps all of your harvested URLs into the folder you specify meaning GSA Platform Identifier is able to pick the harvested URLs up for processing instantly and it only requires its initial set up to run.

Doubling Your Public Proxies With GSA Proxy Scraper

As I have mentioned, Scrapebox will allow you to run multiple instances of the tool on the same VPS or server. This means that, provided the hardware of the machine it is running on is good enough, we can have one instance open that is gathering links via link extraction, one instance open that is footprint scraping using the default Scrapebox server proxies, and another instance open that is footprint scraping using proxies gathered from GSA Proxy Scraper or from another instance of Scrapebox that is set up to scrape proxies.

I usually use GSA Proxy Scraper for this as it is easy to set up and you can leave it to run indefinitely building up a list of proxies. The best thing about GSA PS is that you can set different export tasks so you can have it spitting out proxies for Google scraping, Bing scraping as well as a number of other tasks.

Now I’m sure GSA PS can do much more than I use it for but I have always kept it as simple as possible. To my knowledge, the tool will automatically begin scraping proxies as soon as it is open, once open you will be presented with the screen in the below screenshot.

GSA Proxy Scraper

As it has already begun to scrape proxies for you, you can go directly to the “Settings” option to start to set up your auto export rules for your proxies as shown in the screenshot below.

GSA Proxy Scraper export rules.

Select the “Automatic Export” tab and select the “Add” button and then the “Send To File” option as highlighted by the red arrows in the above screenshot.

GSA Proxy Scraper Bing Proxies.

You will then be presented with the window in the screenshot above. You can name the export as you like but for this example, I will be setting it up ready to export Bing passed proxies. I set the interval in minutes setting to 1 minute as I want Scrapebox to have access to the latest version of proxies when its Automator job resets and pulls proxies. The export file name can be anything you like and you also have a number of export formats to choose from but as these are to be used in Scrapebox I decided to use its internal export setting.

GSA Proxy Scraper Export Settings For Bing.

On the next screen for your export rule be sure to tick “Export only working proxies” and set your “Include Tags” to whatever platform you wish to export working proxies for. In this case, I have only selected Bing. As these public proxies will be used to scrape, I export proxies from all regions and of all proxy types.

Once complete your export rule has been set up and will run automatically. All that is left to do is copy and paste your Scrapebox folder, name it Scrapebox 2 and create a shortcut of its .exe on your desktop for ease of access. When you set this second Scrapebox Automator job up, be sure to set the proxy source file in the Automator job as the file your GSA Proxy Scraper is exporting to. This way it will pick up fresh proxies that have been exported by GSA PS before beginning each scrape. There will be some crossover between the proxies being used by the first instance of Scrapebox, which gets its proxies from the Scrapebox server, and the second instance using proxies from GSA PS, but there will be plenty of different proxies available to make it worth the effort.

Growing Exponentially With Link Extraction

As I have previously touched on in other posts, link extraction has the ability to grow your list exponentially depending on the platforms and engines you are targeting and better yet, it has no requirement for proxies! Although SER does have some link extraction capabilities I prefer to use Scrapebox for it as it gives the user more control over the process.

Scrapebox does not come with the ability to extract links natively; you have to download their free link extraction plugin. To do this simply click the “Addons” tab and then select “Show available add-ons” as shown in the screenshot below.

Downloading Scrapebox Plugins.

The plugin manager will then open up and give you the option to download any or all of the add-ons you wish. The one we are interested in for this process is the “ScrapeBox Link Extractor” plugin. Select it and then press the “Install Addon” button at the bottom of the plugin menu as shown in the screenshot below.

Installing the scrapebox link extractor plugin.

Before we open the addon we need to add the target URLs we want to extract from. To do this we simply add them to the URLs harvester in the top right of Scrapebox as shown in the screenshot below. In this example, I am adding the URLs I scraped during the footprint scraping example above.

Loading URLs into the Scrapebox URL's Harvester.

To open the Scrapebox link extractor simply click the “Addons” menu and select it as shown in the screenshot below.

Once open we need to load the target URLs we wish to run through the link extractor, to do this click on the “load” button in the bottom left and then select the “Load URLs from Scrapebox Harvester” option as shown in the screenshot below.

How to import URLs to link extract in Scrapebox.

Now as I have said multiple times before, link extraction can be extremely resource intensive. On a low-end VPS it may be a good idea to monitor your system resource usage when running the link extractor, and if the load is maxing out the VPS hardware then it may be an idea to pause or close any other tools running on the machine until the extraction is complete. That said, a high-end VPS or dedicated server should be able to run tools like SER alongside the link extractor.

Internal Or External Link Extraction

When looking at the bottom of the link extractor’s window you will see you have three options for what you require the tool to do. They are internal, external, and internal & external as shown in the screenshot below.

Internal and external extraction settings on Scrapebox.

The default for the tool is to be set to “Internal & External” as shown, but in my opinion this is a waste and requires even more system resources. At this stage of the link acquisition process, I personally prefer to run it in external mode. This means that the tool will scan all of the target URLs we added to Scrapebox and check the pages for outbound links to external domains. This helps us increase the total number of domains within our auto-accept list as well as provide us with new domains we can link extract from at a later stage.

That being said, internal extraction does have its place. The problem is, we could have the tool scan the URLs we added to Scrapebox for links on their pages pointing to other pages on the same domain but at this stage we have not identified or verified the list meaning we could be internally extracting a whole bunch of links that we can never use.
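If you want to see what the internal and external split actually means for a single page, here is a minimal Python sketch using only the standard library. It is not how the Scrapebox add-on works under the hood, and comparing full host names to decide what counts as "the same domain" is an assumption on my part.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class HrefCollector(HTMLParser):
    """Collect the href value of every anchor tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(page_url, html):
    """Return (internal, external) lists of absolute http(s) links found in html."""
    collector = HrefCollector()
    collector.feed(html)
    page_host = urlsplit(page_url).netloc.lower()
    internal, external = [], []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar noise
        if parts.netloc.lower() == page_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external
```

Running in external mode is the equivalent of keeping only the second list, which is where the brand new domains come from.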

Once this full process has been completed and we have pushed our links through an instance of GSA Search Engine Ranker to make sure our system can both submit the link and that it will be verified, then we can take advantage of the internal link extractor.

For example, platforms such as blog comments and image comments usually have a large number of pages on the same domain that we are able to take advantage of. Once we have a verified list of blog and image comments that have been pushed through an instance of SER we can then load those targets into Scrapebox, open its link extractor and set it to internal extraction. The tool will then scan the pages for as many internal links as it is able to find. Once the tool has completed its internal link extraction, export the URLs it found, remove duplicate URLs from the list and then rerun it through the internal link extractor again. Repeat this process a few times and then export the list of URLs you have acquired to a file.

Now that this file contains a large amount of target URLs from domains you have already pushed through SER, it is safe to say that there is a good chance that a large percentage of these will verify so I skip the link identification phase and import them directly into a SER project for verification. While SER is processing those links you can reload your file of all the internal links back into Scrapebox, open its link extractor, set it to external and run it.

Although you have already extracted your original set of pages in the list, your internal extraction should have increased the number of targets and, as they are blog and image comment domains, there is a high chance other people are using those pages for their link building. This means when you run those internally extracted pages through the external extractor it scans all of those pages for outbound links, essentially letting you acquire other people’s SER lists for free.

I hope that it is starting to become apparent just how exponentially fast link extraction can grow your targets. If you are willing to invest your time and some money into the tools required for this process you are quickly able to acquire the targets on a number of premium lists as well as a number of private lists and grow your own auto-accept list from it.

Pulling The Trigger

Now that the targets to be extracted have been loaded into the extractor and it has been set to external, it is time to press start! Depending on the system resources available and the size of the target pool of URLs the tool has to extract from, the process can take a few hours, but once it has completed its run the link extractor’s menu should look similar to the screenshot below.

Scrapebox link extractor addon complete.

As you can see, the extractor has completed 20961 targets that resulted in 546359 URLs before the removal of duplicates. I have found the easiest way to access and maintain your URL lists is to click the “Show save folder” button in the bottom right of the tool’s window as shown in the screenshot below.

Scrapebox show save folder

Once clicked, the folder will open allowing you to drag and drop the file into the Scrapebox Harvested URLs section in the top right window and then delete the file as they take up so much space when you are running this process 24 hours a day.

Once your extracted URLs have been added to the harvested URLs section you have some removal to do. As link extraction will collect anything that is put in link tags, it can end up with a whole bunch of noise that is not actually URLs, so my first step is to remove entries that are not URLs as shown in the screenshot below.
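If you ever need to do the same clean-up on an exported file outside of Scrapebox, a rough Python sketch of a "not a URL" filter looks like this. The rules here (http or https scheme, a dot in the host, no whitespace) are my own assumptions about what counts as usable and will not exactly match Scrapebox's built-in option.

```python
from urllib.parse import urlsplit


def is_probable_url(entry: str) -> bool:
    """Return True if entry looks like a usable http(s) URL."""
    entry = entry.strip()
    if not entry or any(ch.isspace() for ch in entry):
        return False
    try:
        parts = urlsplit(entry)
    except ValueError:  # raised for some malformed hosts
        return False
    return parts.scheme in ("http", "https") and "." in parts.netloc


def remove_non_urls(entries):
    """Keep only the entries that pass is_probable_url, preserving order."""
    return [entry.strip() for entry in entries if is_probable_url(entry)]
```

Typical link tag noise such as "#top", "javascript:void(0)" or "mailto:" addresses all fails the check and drops out of the list.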

Next up we have the removal of duplicates. Remember, if you are focusing on contextual platforms then you can remove duplicates at the domain level but if you are collecting non-contextual links then only remove duplicates at the URL level.

To better explain why you do this here are two screenshots, one of the extracted URLs with duplicates removed at URL level and one with duplicates removed at the domain level.

Duplicates removed at URL level – 987504 URLs remaining.

Duplicates removed at URL level.

Duplicates removed at domain level – 224663 URLs remaining.

Duplicates removed at domain level.

That means if you are prospecting for non-contextual links you have potentially lost roughly 77% of your targets by filtering at the domain level rather than the URL level.
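For anyone who wants to check the maths on those two screenshots, here is the calculation as a few lines of Python. Both counts are taken straight from the figures above.

```python
url_level = 987504     # URLs remaining after removing duplicates at URL level
domain_level = 224663  # URLs remaining after removing duplicates at domain level

lost = url_level - domain_level  # targets thrown away by the domain-level filter
lost_share = lost / url_level    # the same loss as a fraction of the URL-level list

print(f"{lost} URLs dropped, {lost_share:.1%} of the URL-level list")
```

That works out to 762841 URLs, or a little over three quarters of the list, gone before the identification phase has even seen them.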

Finally, we have the removal of our personal list of URLs we wish to purge from the list. I don’t plan to do this stage in the example as it will remove too much of the list making the rest of the tutorial look unrealistic for people running through it for the first time. This list is essentially a list of domains that we don’t believe we will be able to post to. These can be parasites such as and YouTube or targets that have failed the verification process.

GSA Search Engine Ranker failed folder ticked.

Thankfully, SER has a tick box that can make this extremely easy as shown in the screenshot above. All you need to do is tick the failed folder’s tick box and SER will automatically save each failed URL to the folder. Once you have some failed URLs, you merge all of the .txt files in the failed folder together and add them to your master purge file.
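Merging the failed files by hand gets old quickly, so here is a minimal Python sketch of that merge step. It assumes the failed folder holds plain .txt files with one URL per line, and the folder and purge file paths you pass in are placeholders for wherever yours actually live.

```python
from pathlib import Path


def read_entries(path: Path) -> list:
    """Return the non-empty, stripped lines of a text file."""
    text = path.read_text(encoding="utf-8", errors="ignore")
    return [line.strip() for line in text.splitlines() if line.strip()]


def merge_failed_lists(failed_dir: Path, purge_file: Path) -> int:
    """Merge every .txt file in failed_dir into purge_file and return the entry count."""
    # dict keys keep their insertion order and silently drop duplicates
    entries = dict.fromkeys(read_entries(purge_file)) if purge_file.exists() else {}
    for txt_file in sorted(failed_dir.glob("*.txt")):
        if txt_file.resolve() == purge_file.resolve():
            continue  # do not re-read the purge file if it lives in the same folder
        entries.update(dict.fromkeys(read_entries(txt_file)))
    purge_file.write_text("\n".join(entries) + "\n", encoding="utf-8")
    return len(entries)
```

Because existing purge entries are loaded first, you can rerun it every time SER adds more failed URLs and the purge file just keeps growing without duplicates.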

Purging these URLs at this stage reduces the workload further down the line as URLs we know we will never be able to post to are removed before wasting resources in GSA Platform Identifier in the link identification phase and GSA Search Engine Ranker in the link verification phase.

To purge the URLs we simply select the “Remove / Filter” option on Scrapebox and select the “Remove URLs containing entries from” option as shown in the screenshot below.

Scrapebox purging lists.

This will then run through your purge list and remove any URLs from your harvested URLs that are from the domains saved in your purge file. Once done, export your harvested URLs ready for the link identification phase.
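The same purge can be sketched in Python if you want to run it on an exported file. Note the assumption: this version matches on the exact host name, so a purge entry of example.com will not remove www.example.com, and it may not behave identically to the Scrapebox option, which I have not tested side by side.

```python
from urllib.parse import urlsplit


def host_of(entry: str) -> str:
    """Return the lower-cased host of a URL or bare domain, or '' if it cannot be parsed."""
    entry = entry.strip().lower()
    if "://" not in entry:
        entry = "http://" + entry  # lets bare domains such as example.com parse as hosts
    try:
        return urlsplit(entry).netloc
    except ValueError:
        return ""


def purge(urls, purge_entries):
    """Drop every URL whose host appears in purge_entries, preserving order."""
    blocked = {host_of(entry) for entry in purge_entries} - {""}
    return [url for url in urls if host_of(url) not in blocked]
```

The purge entries can be full URLs or bare domains, so the merged failed list from SER can be fed in as it is.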

Moving Onto Link Identification

The main tool we will be using for the link identification phase is GSA Platform Identifier but you are able to use an instance of GSA Search Engine Ranker for this if you wish. As I have mentioned a number of times on the blog, I personally prefer to leave my SER installs to focus on building links rather than wasting their resources on link identification.

That being said, if you are on a budget and have no other option but to use SER to identify your links you can complete the navigation in the screenshot below.

How To import and identify links with GSA Search Engine Ranker.

Bear in mind, this will take resources away from SERs ability to build links and depending on the size of the link list required for processing it can be left operating in this state for a very long time. If you do have to use SER for this process then it maybe better for you to just purchase a premium SER list or use my guide on building a basic auto accept list using only GSA Search Engine Ranker.

Moving on to GSA PI, the screenshot below shows the tool's main window without any projects. This is how the tool will look the first time you open it.

GSA Platform Identifier's main screen.

Depending on the types of platforms and engines you have targeted in your link acquisition phase, GSA PI may have to query a single domain multiple times. If you have targeted blog and image comments, this can easily go into the hundreds or even thousands of queries. Due to this, the first thing we need to do on a fresh install is add our premium proxies. This will stop domains being able to ticket our VPS or server providers with complaints of abuse and will give us fewer problems in the long run. To do this, click the proxies option on the main menu shown in the screenshot below.

Adding proxies to GSA Platform Identifier.

Once done, click the import box in the top left of the proxy menu and import your proxies from either a file or your clipboard. As we require reliable proxies for this that will not be randomly dropping due to overuse it is best to use premium proxies for this phase.

Next up, we need to create our projects that will pull the acquired links from their various folders and identify them for the link verification stage of the process. To do this we click the “New” option on the tool's main menu.

Creating a new project in GSA Platform Identifier.

I will break the create project window down into different sections as I feel it will make it easier to explain. First up we have the target link path settings as shown in the below screenshot.

GSA Platform Identifier project settings window.
  1. Enter your project name here, I usually name it after the Dropbox folder it is monitoring.
  2. Sets your project up to process a file. This can be handy for one-off scrapes. For example, you may need to do a one-off keyword relevant blog comment scrape. As you don’t want its results getting mixed in with your generic URLs, you can set a one-time PI project up to identify from its file.
  3. Sets the project up to monitor a folder for its targets. This is the option I usually use when running this process on a large scale as it requires little to no upkeep. Scrapebox drops targets into the folder, GSA Platform Identifier picks them up and churns them out.
  4. Sets a project up to do nothing but remove duplicates. This is an excellent feature and I usually have one remove duplicates project set up for each of my folders to help reduce the duplicate URLs being pushed through GSA PI. Although Scrapebox does offer a remove duplicates feature, this is on a scrape by scrape basis meaning the duplicates between two scrapes will remain. Running a PI project to remove duplicates from the folder will remove them and further optimize your process.
  5. These are your navigation options to find your files and folders for the process.
  6. This will show your file or path for the folders being monitored. You can also directly drag and drop files into here to increase speed.
  7. Set the destination folder you want your identified links to be exported to. This should be the folder that your SER verification rig is picking its targets up from.
  8. Exports results to a single file, carrying on from my earlier example. If you did a keyword specific scrape and you don’t want its identified URLs exported into the same place as your other URLs then you can set up a specific file for it.
  9. I have actually had some success with pushing unrecognized URLs through my SER rig to get some verified URLs. You can tick this and push out the URLs GSA Platform Identifier is unable to identify to a file for processing in SER. If you do choose to do this then make sure the SER project processing the file is set to a low priority as its URL yield is extremely low when compared to the identified links.
  10. Be sure to match this setting to the one you use in your SER installs or SER will not be able to read the targets.
  11. Set your retry count as you see fit.
  12. Set the number of threads your VPS or server can handle. Start at the default of 64 and adjust as required.
  13. Tick this to make sure the project uses your proxies.
  14. You are able to limit the outbound link count on pages that will be passed through to your exported list. Usually, I don’t even enable the option due to the way I use the different link types in my pyramids.
  15. Set how often you want PI to check your folders.
  16. The platforms and engines control pane, you can select the various platforms and engines you wish to filter out of your exported list.
  17. This option allows you to filter by keywords if you are trying to build a niche specific list.
  18. You are able to filter the URLs you let pass through by language.
  19. Sets a Moz metric filter to your exported list.
  20. Similar to above but uses PageRank rather than Moz.
  21. This will remove all of the domains that are on your GSA Platform Identifier blacklist, if you choose to use one, to help prevent duplicates passing through to the link verification phase.
  22. Automatically adds processed URLs to your blacklist.
  23. Automatically creates a remove duplicates project for this project.

That about covers the project set up options for GSA PI. All that is left to do is start the projects and leave them to process the acquired links and export them to your verification folders.
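The cross-scrape duplicate removal described in option 4 is easy to picture with a short example. The Python sketch below is not how GSA PI works internally; it just shows the idea under my own assumptions, namely that each scrape is a .txt file with one URL per line and that two URLs are duplicates only when the strings match exactly. The function name `dedupe_folder` is hypothetical.

```python
from pathlib import Path


def dedupe_folder(folder):
    """Remove duplicate URLs across every .txt file in a folder.

    The first occurrence of a URL is kept, in file-name order, and later
    repeats in the same file or in any other file are dropped.
    Returns the number of duplicate lines removed.
    """
    seen = set()
    removed = 0
    for path in sorted(Path(folder).glob("*.txt")):
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        kept = []
        for line in lines:
            url = line.strip()
            if not url:
                continue  # blank lines are dropped but not counted
            if url in seen:
                removed += 1
            else:
                seen.add(url)
                kept.append(url)
        # Rewrite the file with only the URLs that survived.
        path.write_text("".join(u + "\n" for u in kept), encoding="utf-8")
    return removed
```

This is the difference from a scrape-by-scrape duplicate removal: the `seen` set is shared across all of the files, so a URL that shows up in two separate scrapes only survives in the first one.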

Time For Link Verification

The link verification phase is relatively simple if you are familiar with GSA Search Engine Ranker, as it is basically a number of projects designed to take your identified URLs and push them through SER's system to try and produce verified links. If you have previously run any form of project with SER then this phase should be easy.

Firstly, you will need to map one of SER's folder paths to the folder your GSA PI project is exporting identified URLs to. To do this, open SER, click the options button on the main screen and select the advanced tab. I usually use my identified folder, so I click the little drop-down arrow to the right of the file path and map the folder accordingly, as shown in the screenshot below.

GSA Search Engine Ranker's Folder Options.

Once done, PI will keep this folder updated with freshly identified targets and free from duplicates as best it can provided you set up a remove duplicates project for the folder in PI.

Next up, we need to set up a SER project to actually process the URLs. There are so many settings and tweaks you can enable or disable on SER projects that I am not going to go through it all but if you need help with what the various options do then be sure to check out my ultimate guide to GSA Search Engine Ranker.

That being said, I would consider the minimum for a verification project to be three things: the correct engines and platforms selected for the targets you wish to hit; a fake or made-up URL set as the page it is building links for, so some random site is not getting all of your verified links; and the folder you mapped as your PI export folder set as the project’s target source, as shown in the screenshot below.

Setting a GSA Search Engine Ranker Project up to pull targets from the correct folder.

Finally, set the project to active and let it run. It will push all of the identified URLs through its system to remove all of the unstable targets for you, creating a clean list of targets that you can then export from SER to your live Dropbox folder to be used by your SER installs building links for live projects.

If you want to automate the export of the verified links to your live Dropbox folder, map your live links Dropbox folder to the verified folder of your verification instance of SER, and be sure to tick the box next to it to allow SER to automatically write the verified URLs to the folder, as shown in the screenshot below.

GSA Search Engine Ranker verified folder mapping.

Then, on your live link building instances of SER, map one of their folders to the live links Dropbox folder and set their projects up to pull their targets from that folder so they automatically have a nice clean list of targets to pull from.

The final step of the process is filtering your live links Dropbox folder to keep the dead URLs out of it and keep your live link building instances of SER operating as efficiently as possible. I have fully explained the process in my guide on filtering lists so I won’t be going into it at all in this post.

That concludes my post on building the ultimate auto accept list. I hope this has helped some of my readers understand how they are able to automate large amounts of the process to reduce the amount of time they are required to spend on the task, while offering an insight into other tools available in the GSA toolset as well as some of the capabilities offered by Scrapebox.

