When scripting data mining, you may want to fetch data with wget and tunnel it through the Tor anonymity network so that the server farm does not block your IP address. One way to do this is to combine wget, Tor, and Privoxy.
Explanation: Tor exposes a SOCKS proxy through which your data is sent over a network in a fairly anonymous fashion. The problem is that Tor does not offer an HTTP proxy, which is what wget can use. To get around this, you can install Privoxy, which lets you connect to Tor through a simple HTTP proxy.
So, let's get started.
Step 1 - Install the stuff
You can install everything you need with the following command:
sudo apt-get install -y tor tor-geoipdb privoxy
Step 2 - Configuration
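There are a few things that need to be configured.
First, edit wget's system-wide configuration file (typically /etc/wgetrc on Debian and Ubuntu).
Find the line starting with: #http_proxy =
Replace the whole line with: http_proxy = http://localhost:8118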
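If you would rather not change wget's configuration for every download, a per-call wrapper is one alternative. The sketch below is only an illustration: the function name torwget and the variable PRIVOXY_URL are made up for this example, and the address assumes Privoxy listens on localhost:8118 as configured in this article. It also sets https_proxy, because wget reads a separate proxy setting for https:// URLs and the http_proxy line alone does not cover them.

```shell
# Hypothetical wrapper: run wget through Privoxy for this call only.
# PRIVOXY_URL is an assumed name; its default matches the address used in this article.
PRIVOXY_URL="${PRIVOXY_URL:-http://localhost:8118}"

torwget() {
    # wget honors the http_proxy and https_proxy environment variables,
    # so both plain HTTP and HTTPS downloads are handed to Privoxy.
    http_proxy="$PRIVOXY_URL" https_proxy="$PRIVOXY_URL" wget "$@"
}
```

After defining the function in your shell, you would call it exactly like wget, for example: torwget -q http://example.com/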
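Next, edit the Privoxy configuration file (typically /etc/privoxy/config on Debian and Ubuntu) and add the following two lines to the top of the file. The trailing dot on the second line is part of the directive.
listen-address localhost:8118
forward-socks5 / 127.0.0.1:9050 .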
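A quick way to confirm the Privoxy edit took is to search the file for both directives. This is a rough sketch, not an official Privoxy tool: the function name check_privoxy_config is hypothetical, and the default path /etc/privoxy/config is an assumption based on the usual Debian/Ubuntu package layout. It only checks that each directive sits on its own uncommented line.

```shell
# Hypothetical helper: check that a Privoxy config file contains the two
# directives from this article, each on its own uncommented line.
check_privoxy_config() {
    config_file="${1:-/etc/privoxy/config}"   # assumed default path
    status=0

    # The ^ anchor means commented-out lines (starting with '#') do not match.
    if grep -Eq '^[[:space:]]*listen-address[[:space:]]+(localhost|127\.0\.0\.1):8118[[:space:]]*$' "$config_file"; then
        echo "listen-address: ok"
    else
        echo "listen-address: missing"
        status=1
    fi

    # The final '\.' is the trailing dot that forward-socks5 requires.
    if grep -Eq '^[[:space:]]*forward-socks5[[:space:]]+/[[:space:]]+127\.0\.0\.1:9050[[:space:]]+\.[[:space:]]*$' "$config_file"; then
        echo "forward-socks5: ok"
    else
        echo "forward-socks5: missing"
        status=1
    fi

    return "$status"
}
```

Calling check_privoxy_config with no argument checks the assumed default path; pass another path if your distribution stores the file elsewhere.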
Step 3 - Start everything up
sudo service tor restart; sudo service privoxy restart
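Before running any downloads, it can help to confirm that both services are actually listening. The following is a minimal sketch that assumes the ss utility from iproute2 is available and that Tor and Privoxy use ports 9050 and 8118 as in this article. The function names port_in_listing and check_proxy_ports are hypothetical.

```shell
# Hypothetical helper: read an `ss -ltn` style listing on stdin and
# return 0 if some socket is listening on the port given as $1.
port_in_listing() {
    port="$1"
    # Field 4 of `ss -ltn` is the local address:port; compare its suffix
    # with ":PORT" and skip the header line.
    awk -v p=":$port" '
        NR > 1 && substr($4, length($4) - length(p) + 1) == p { found = 1 }
        END { exit !found }
    '
}

# Report the state of the Tor SOCKS port and the Privoxy HTTP port.
check_proxy_ports() {
    for port in 9050 8118; do
        if ss -ltn 2>/dev/null | port_in_listing "$port"; then
            echo "port $port: listening"
        else
            echo "port $port: not listening"
        fi
    done
}
```

Run check_proxy_ports after starting the services. If either port is reported as not listening, check that service's log before going further.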
Now when you use the wget command, your data will be tunneled through the Tor network. When you run wget, you will see lines like the following:
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:8118... connected.
The :8118 shows that your connection is going to Privoxy, which in turn forwards it to Tor.
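The wget output only proves that the request reached Privoxy, not that it left through Tor. One way to check the second hop is to ask the Tor Project's check service. The sketch below assumes that https://check.torproject.org/api/ip returns a small JSON document containing an IsTor field. That response format is an assumption you should verify yourself, and the function names is_tor_response and check_tor are hypothetical.

```shell
# Hypothetical helper: return 0 if the JSON text in $1 reports IsTor as true.
# The field name is an assumption about the check service's response format.
is_tor_response() {
    printf '%s' "$1" | grep -Eq '"IsTor"[[:space:]]*:[[:space:]]*true'
}

# Fetch the check result through Privoxy and evaluate it.
# The -e options set the proxy for this call, including HTTPS requests.
check_tor() {
    response=$(wget -qO- \
        -e use_proxy=on \
        -e https_proxy=http://localhost:8118 \
        https://check.torproject.org/api/ip) || return 2
    is_tor_response "$response"
}
```

With this in place, something like "check_tor && echo via Tor" gives a quick yes or no. A return status of 2 means the request itself failed.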
Note: Your download speeds will be significantly reduced because your data is tunneled through the Tor network. Configuring Tor itself is outside the scope of this article.