Thursday, June 28, 2012

Bash Text Manipulation

I've been trying to get into programming, so I decided to start with some scripting to automate enumeration. I started with Bash text manipulation; here is an example that uses the yahoo.com home page to enumerate hosts.

[root@localhost yahoo]# wget yahoo.com
--15:37:08--  http://yahoo.com/
           => `index.html'
Resolving yahoo.com... 98.139.183.24, 72.30.38.140, 209.191.122.70
Connecting to yahoo.com|98.139.183.24|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.yahoo.com/ [following]
--15:37:09--  http://www.yahoo.com/
           => `index.html'
Resolving www.yahoo.com... 98.139.183.24, 2001:4998:f00b:1fe::3001, 2001:4998:f00b:1fe::3000
Connecting to www.yahoo.com|98.139.183.24|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    [       <=>                                ] 225,798      177.62K/s

15:37:10 (177.13 KB/s) - `index.html' saved [225798]

[root@localhost yahoo]# cat index.html | grep href | cut -d "/" -f3 | grep yahoo.com | cut -d '"' -f1 | sort -u
apps.search.yahoo.com
autos.yahoo.com
everything.yahoo.com
finance.yahoo.com
images.search.yahoo.com
info.yahoo.com
local.search.yahoo.com
login.yahoo.com
movies.yahoo.com
music.yahoo.com
news.yahoo.com
omg.yahoo.com
screen.yahoo.com
search.yahoo.com
shine.yahoo.com
shopping.yahoo.com
sports.yahoo.com
tools.search.yahoo.com
tv.yahoo.com
video.search.yahoo.com
weather.yahoo.com
www.yahoo.com

To explain what I did here, I'll take one link out of index.html and break it down:
a href="http://weather.yahoo.com/redirwoei/12760452"
First, grep href outputs every line of index.html that contains href, like the one shown above.
Next, cut splits each line on the delimiter / and keeps only field 3.
The 1st field is a href="http:
The 2nd field is empty, because nothing sits between the two slashes
The 3rd field is weather.yahoo.com
Next, grep yahoo.com keeps only hosts in the yahoo.com domain, since the page also links to other sites such as imgur.
This is pretty good, but I still get some stragglers where the link has no path after the host name, such as
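To check where the field boundaries fall, the cut step can be run on the sample link by itself. This is only a minimal sketch: the variable name link is made up for illustration, and the square brackets are there just to make an empty field visible.

```shell
#!/bin/bash
# Sample link from the walkthrough; "link" is an illustrative variable name.
link='a href="http://weather.yahoo.com/redirwoei/12760452"'

# Print the first three "/"-delimited fields, one per line.
# The brackets make an empty field easy to spot.
for n in 1 2 3; do
    field=$(printf '%s\n' "$link" | cut -d '/' -f "$n")
    printf 'field %s: [%s]\n' "$n" "$field"
done
# field 1: [a href="http:]
# field 2: []
# field 3: [weather.yahoo.com]
```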
www.yahoo.com">
www.yahoo.com">
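So I cut again, this time using " as the delimiter and keeping field 1. Because the delimiter is itself a double quote, it has to be wrapped in single quotes.
Finally, sort -u sorts the list and removes duplicates. To save the result, redirect it to a file by adding > yahoohost.txt to the end of the command.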
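Put together as a small script, the steps above might look like the sketch below. The function name extract_hosts and the file sample.html are assumptions made for illustration; the sample is a few made-up lines standing in for the real index.html, so the script can be tried without downloading anything.

```shell
#!/bin/bash
# extract_hosts is a hypothetical helper name. It reads HTML on stdin
# and prints the unique yahoo.com host names, one per line.
extract_hosts() {
    grep href |               # keep lines that contain a link
        cut -d '/' -f 3 |     # field 3 is the host part of http://host/path
        grep 'yahoo\.com' |   # drop hosts outside yahoo.com (dot escaped)
        cut -d '"' -f 1 |     # trim stragglers such as www.yahoo.com">
        sort -u               # sort and remove duplicates
}

# Made-up sample input standing in for index.html.
cat > sample.html <<'EOF'
<a href="http://weather.yahoo.com/redirwoei/12760452">Weather</a>
<a href="http://www.yahoo.com">Home</a>
<a href="http://www.yahoo.com">Home again</a>
<a href="http://imgur.com/gallery">Elsewhere</a>
EOF

# Redirect the result to a file, then show it.
extract_hosts < sample.html > yahoohost.txt
cat yahoohost.txt
# weather.yahoo.com
# www.yahoo.com
```

Because cut works line by line, this only picks up the first link on each line, so a page with several links per line will yield fewer hosts than it really contains.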
