Burrow-The-Burrows

A Gopher burrower in a shell script. By using `burrow` and a bit of
plumbing, you can get all the links in a Gopher menu, recursively visit
all the available subdirectories, and create a directed graph of the
visited selectors.

`burrow` takes as input a gopher identifier, as generated by
`url_to_id`, which is treated as a gophermap, and prints on stdout the
list of menu selectors found in that document. `burrow` also dumps
on stderr the list of all the edges (to any kind of selector) found in
that page, in the format:

    src_SHA256 dst_SHA256

where `src_SHA256` is the SHA256 of the source selector (the current
document), while `dst_SHA256` is the SHA256 of the destination selector
(the document being pointed to).
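
Because each edge is a plain pair of hashes, the stderr output is easy to post-process. As a minimal sketch, the following turns a list of edges into Graphviz DOT. The helper name `edges_to_dot` and the shortened placeholder hashes are made up for illustration and are not part of this repository.

```shell
# Hypothetical helper: convert "src dst" edge lines on stdin to Graphviz DOT.
edges_to_dot() {
    echo 'digraph gopherspace {'
    # Drop duplicate edges, then quote each endpoint as a DOT node ID.
    # Lines that do not have exactly two fields are ignored.
    sort -u | awk 'NF == 2 { printf "  \"%s\" -> \"%s\";\n", $1, $2 }'
    echo '}'
}

# Example with placeholder hashes (real SHA256 values are 64 hex characters).
printf '%s\n' 'aaa bbb' 'aaa ccc' 'aaa bbb' | edges_to_dot
```

If the edges of a crawl were collected in a file, say `graph.txt`, one could run `edges_to_dot < graph.txt > graph.dot` and render the result with Graphviz.
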
To start a crawl, one can do something like:

    $ ./url_to_id gopher://your.gopher.url/ > ids
    $ tail -f ids | parallel -j2 './burrow {}' 2>> graph.txt | tee -a ids >/dev/null &
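
The pipeline above is a feedback loop: `ids` is both the work queue that `tail -f` reads and the file to which `tee -a` appends newly discovered identifiers, while the edges go to `graph.txt` through stderr. The sketch below imitates that loop in a sequential form that terminates, which makes it easier to follow. It is only an illustration: `fake_burrow` is a toy stand-in for `./burrow` working on a hard-coded three-level graph, and the `seen` file is an assumption of this sketch, not the way `burrow` itself tracks visited selectors.

```shell
# Toy stand-in for ./burrow (illustration only): prints the children of a
# node on stdout and the corresponding edges on stderr.
fake_burrow() {
    case "$1" in
        root) children="a b" ;;
        a)    children="b c" ;;
        *)    children="" ;;
    esac
    for child in $children; do
        echo "$child"
        echo "$1 $child" >&2
    done
}

# Crawl from a start id, using a file as a queue that grows while it is read.
crawl() {
    workdir=$(mktemp -d)
    ids="$workdir/ids"; seen="$workdir/seen"; graph="$workdir/graph.txt"
    echo "$1" > "$ids"; : > "$seen"; : > "$graph"
    n=1
    # Process the n-th queued id until we run past the end of the queue.
    while [ "$n" -le $(( $(wc -l < "$ids") )) ]; do
        id=$(sed -n "${n}p" "$ids")
        n=$((n + 1))
        # Skip ids that were already visited.
        if grep -qxF -e "$id" "$seen"; then continue; fi
        echo "$id" >> "$seen"
        # New ids are appended to the queue, edges to the graph file.
        fake_burrow "$id" 2>> "$graph" >> "$ids"
    done
    cat "$graph"
    rm -rf "$workdir"
}

crawl root
```

The real pipeline differs in two ways: `tail -f` never reaches the end of the queue, so the crawl keeps running until it is stopped by hand, and `parallel -j2` processes two identifiers at a time.
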

Note that `burrow` creates a number of folders in the current
directory, which it uses to keep track of the selectors that have
already been retrieved.
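
The exact folder layout is not described here. As a hedged sketch of the general technique, directories can serve as "already visited" markers because `mkdir` fails when the target exists, so the check and the creation happen in one atomic step even with parallel jobs. The helper `mark_visited`, the `VISITED_DIR` variable, and the use of `sha256sum` (from GNU coreutils) are assumptions made for this example.

```shell
# Hypothetical helper: succeeds only the first time a given id is seen.
mark_visited() {
    # Hash the id so that it is safe to use as a directory name.
    hash=$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)
    # Without -p, mkdir fails if the directory already exists.
    mkdir "${VISITED_DIR:-visited}/$hash" 2>/dev/null
}

# Example usage with a throwaway marker directory.
VISITED_DIR=$(mktemp -d)
if mark_visited 'gopher://example.org/1/'; then echo "first visit"; fi
if ! mark_visited 'gopher://example.org/1/'; then echo "already seen"; fi
```
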