parent
040ba18f7f
commit
1185679222
@@ -0,0 +1,29 @@
## Burrow-The-Burrows

A Gopher burrower in a shell script. By using `burrow` and a bit of
plumbing, you can get all the links in a Gopher menu, recursively visit
all the available subdirs, and create a directed graph of the visited
selectors.

`burrow` takes as input a gopher identifier, as generated by
`url_to_id`, treats the document it points to as a gophermap, and
prints on stdout the list of menu selectors found in that document.
`burrow` also dumps on stderr the list of all the edges (to any kind
of selector) found in that page, one per line, in the format:

    src_SHA256 dst_SHA256

where `src_SHA256` is the SHA256 of the source selector (the current
document), while `dst_SHA256` is the SHA256 of the destination selector
(the document pointed to).

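As a rough illustration of the edge format, the sketch below builds one
such line from two selector strings. It is not taken from `burrow`
itself: the helper names `sha256_of` and `edge_line` are hypothetical,
the exact string that `burrow` hashes is an assumption, and
`sha256sum` (GNU coreutils) is assumed to be available.

```sh
#!/bin/sh
# Sketch only: assumes each node ID is the SHA256 of a selector string.
# Which exact string `burrow` hashes is NOT stated in this README.

# sha256_of STRING: print the hex SHA256 digest of STRING
# (hashed without a trailing newline).
sha256_of() {
    printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# edge_line SRC DST: print one edge as "src_SHA256 dst_SHA256".
edge_line() {
    printf '%s %s\n' "$(sha256_of "$1")" "$(sha256_of "$2")"
}
```

For example, `edge_line 'gopher://example.org/1/' 'gopher://example.org/1/docs' >&2`
would write one edge to stderr; the `example.org` URLs are placeholders.
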
To start a crawl, one can do something like:

```
$ ./url_to_id gopher://your.gopher.url/ > ids
$ tail -f ids | parallel -j2 './burrow {}' 2>> graph.txt | tee -a ids >/dev/null &
```

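The edges collected in `graph.txt` by the crawl above can then be
turned into a picture of the directed graph. The following is a minimal
sketch, assuming Graphviz DOT as the target format; the function name
`edges_to_dot` is hypothetical and not part of this repository. Because
`graph.txt` receives everything written to stderr, the sketch keeps only
lines with exactly two fields and drops repeated edges.

```sh
#!/bin/sh
# Sketch only: convert "src_SHA256 dst_SHA256" lines on stdin
# into a Graphviz DOT digraph on stdout.
edges_to_dot() {
    echo 'digraph burrow {'
    # Keep well-formed two-field lines; print each distinct edge once.
    awk 'NF == 2 && !seen[$1 " " $2]++ {
        printf "  \"%s\" -> \"%s\";\n", $1, $2
    }'
    echo '}'
}
```

A possible invocation is `edges_to_dot < graph.txt > graph.dot`, after
which any DOT-aware tool can render `graph.dot`.
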
Notice that `burrow` will create a certain number of folders in the
current directory, used to keep track of the selectors that have
already been retrieved.