## Burrow-The-Burrows

A Gopher burrower in a shell script. By using `burrow` and a bit of
plumbing you can get all the links in a Gopher MENU, recursively visit
all the available subdirs, and create a directed graph of the visited
selectors.

`burrow` takes as input a Gopher identifier, as generated by
`url_to_id`, treats the corresponding document as a gophermap, and
writes to stdout the list of menu selectors found in it. `burrow` also
writes to stderr the list of all the edges (to any kind of selector)
found in that page, in the format:

    src_SHA256 dst_SHA256

where `src_SHA256` is the SHA256 of the source selector (the current
document), while `dst_SHA256` is the SHA256 of the destination selector
(the document it points to).

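
As a rough illustration of the edge format, the sketch below prints one
such line for two made-up selectors. The helpers `sha256_of` and
`edge_line` are hypothetical and not part of this repository, and the
sketch assumes that each selector is hashed as a plain string without a
trailing newline and printed as hex; the actual `burrow` script may hash
a different representation.

```shell
#!/bin/sh
# Hypothetical helper: hex SHA256 of a string (no trailing newline).
# Assumption: this is NOT necessarily how `burrow` hashes selectors.
sha256_of() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Hypothetical helper: print one "src_SHA256 dst_SHA256" edge line.
edge_line() {
    printf '%s %s\n' "$(sha256_of "$1")" "$(sha256_of "$2")"
}

# Example with made-up selectors: prints two 64-character hex digests
# separated by a single space.
edge_line 'gopher://example.org/1/' 'gopher://example.org/1/docs'
```

This relies on `sha256sum` from GNU coreutils; other systems may need a
different command to compute the digest.
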
To start a crawl, one can do something like:

```
$ ./url_to_id gopher://your.gopher.url/ > ids
$ tail -f ids | parallel -j2 './burrow {}' 2>> graph.txt | tee -a ids >/dev/null &
```
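
The edges collected in `graph.txt` by the command above could then be
turned into a drawable graph. The following is only a sketch: the
function name `edges_to_dot` and the output file `graph.dot` are
illustrative and not part of this repository, and it assumes that every
useful line in `graph.txt` has exactly two whitespace-separated fields.
Any other line (for instance other messages that ended up on stderr) is
skipped.

```shell
#!/bin/sh
# Convert an edge list ("src_SHA256 dst_SHA256" per line) to Graphviz DOT.
# Assumption: well-formed edges have exactly two fields; other lines are
# ignored rather than reported.
edges_to_dot() {
    printf 'digraph burrow {\n'
    # sort -u drops duplicate edges before awk formats each one.
    sort -u "$1" | awk 'NF == 2 { printf "  \"%s\" -> \"%s\";\n", $1, $2 }'
    printf '}\n'
}

# Only run the conversion when a graph.txt is present in this directory.
if [ -f graph.txt ]; then
    edges_to_dot graph.txt > graph.dot
fi
```

If Graphviz is installed, something like `dot -Tsvg graph.dot -o graph.svg`
should render the result, although large crawls may produce graphs too
big to lay out comfortably.
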

Note that `burrow` creates a number of folders in the current
directory, which it uses to keep track of the selectors that have
already been retrieved.