Burrow The Burrows. A gopher crawler.



A Gopher burrower in a shell script. By using burrow and a bit of plumbing, you can get all the links in a Gopher menu, recursively visit all the available subdirectories, and create a directed graph of the visited selectors.

burrow takes as input a Gopher identifier, as generated by url_to_id, and treats the document it points to as a gophermap. It prints on stdout the list of menu selectors found in that document. It also dumps on stderr the list of all the edges (to any kind of selector) found in that page, in the format:

src_SHA256 dst_SHA256

where src_SHA256 is the SHA256 of the source selector (the current document), while dst_SHA256 is the SHA256 of the destination selector (the document pointed to).

To start a crawl, one can do something like:

	$ ./url_to_id gopher://your.gopher.url/ > ids
	$ tail -f ids | parallel -j2 './burrow {}' 2>> graph.txt | tee -a ids >/dev/null &

Note that burrow creates a number of folders in the current directory, which it uses to keep track of the selectors that have already been retrieved.
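
Once a crawl has written some edges to a file such as graph.txt, the edge list can be summarised with standard tools. The following is only a sketch and not part of burrow: the function names are hypothetical, and it assumes that each digest is printed as 64 lowercase hexadecimal characters and that any other line that ended up in the file (for instance, error messages) can be skipped.

```shell
#!/bin/sh
# Sketch: summarise an edge list in the "src_SHA256 dst_SHA256" format.
# Assumption: digests are 64 lowercase hex characters; any line that
# does not look like "digest digest" is ignored.

# Print the unique, well-formed edges found in the file given as $1.
clean_edges() {
	grep -E '^[0-9a-f]{64} [0-9a-f]{64}$' "$1" | sort -u
}

# Print "out_degree src_SHA256" for each source node, highest degree first.
out_degrees() {
	clean_edges "$1" | cut -d' ' -f1 | sort | uniq -c | sort -rn |
		awk '{print $1, $2}'
}

# Print the number of distinct nodes (sources and destinations together).
count_nodes() {
	clean_edges "$1" | tr ' ' '\n' | sort -u | wc -l | tr -d ' '
}
```

After sourcing these functions, `clean_edges graph.txt > edges.txt` would produce a deduplicated edge list, and `out_degrees graph.txt | head` would show the documents with the most outgoing links. Since the crawl appends to graph.txt while it runs, the results reflect only what has been written so far.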