Directory Archive
Persistent filesystem-backed archive for Tor directory protocol
descriptors. This is intended to be used as part of an asyncio
application. File I/O operations are provided by coroutines and coroutine
methods, with the actual I/O performed in an executor.
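The executor pattern described above can be sketched as follows. This is a minimal illustration, not bushel's implementation: `read_file` and `_read_blocking` are hypothetical names, and the semaphore limit only loosely mirrors the `max_file_concurrency` parameter of `DirectoryArchive`.

```python
import asyncio
import os
import tempfile


def _read_blocking(path):
    # Ordinary blocking read; this runs in a worker thread.
    with open(path, "rb") as source:
        return source.read()


async def read_file(path, semaphore):
    # Hypothetical helper: bound the number of files open at once, then
    # hand the blocking read to the event loop's default executor.
    async with semaphore:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, _read_blocking, path)


async def main():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "descriptor")
        with open(path, "wb") as target:
            target.write(b"router example\n")
        semaphore = asyncio.Semaphore(100)
        return await read_file(path, semaphore)


content = asyncio.run(main())
print(content)
```

The event loop stays free to service other coroutines while the read runs in a worker thread.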
-
class
bushel.archive.
CollectorOutBridgeDescsMarker
[source]¶ Enumeration of marker names under the “bridge-descriptors” directory as specified in §5.2 of [collector-protocol].
Name
Description
EXTRA_INFO
Bridge extra-info descriptors (§5.2.1)
SERVER_DESCRIPTOR
Bridge server descriptors (§5.2.1)
STATUS
Bridge statuses (§5.2.2)
-
class
bushel.archive.
CollectorOutRelayDescsMarker
[source]¶ Enumeration of marker names under the “relay-descriptors” directory as specified in §5.3 of [collector-protocol].
Name
Description
CONSENSUS
Network status consensuses (§5.3.2)
EXTRA_INFO
Relay extra-info descriptors (§5.3.2)
SERVER_DESCRIPTOR
Relay server descriptors (§5.3.2)
VOTE
Network status votes (§5.3.2)
-
class
bushel.archive.
CollectorOutSubdirectory
[source]¶ Enumeration of subdirectory names under the “out” directory as specified in §5.0 of [collector-protocol].
Name
Description
BRIDGE_DESCRIPTORS
Bridge descriptors (§5.2)
EXIT_LISTS
Exit lists (§5.1)
RELAY_DESCRIPTORS
Relay descriptors (§5.3)
TORPERF
Torperf and Onionperf (§5.1)
WEBSTATS
Web server access logs (§5.4)
-
class
bushel.archive.
DirectoryArchive
(archive_path, max_file_concurrency=100)[source]¶ Persistent filesystem-backed archive for Tor directory protocol descriptors.
This implements the CollecTor File Structure Protocol as detailed in [collector-protocol].
- Parameters
archive_path (str) – Either an absolute or relative path to the location of the directory to use for the archive. This location must exist, but may be an empty directory.
-
bridge_extra_info_descriptor_path
(published, digest)[source]¶ Generates a path, including the archive path, for a bridge extra-info descriptor with a given published time and digest. For example:
>>> archive = DirectoryArchive("/srv/archive")
>>> published = datetime.datetime(2018, 11, 19, 9, 17, 56)
>>> digest = "a94a07b201598d847105ae5fcd5bc3ab10124389"
>>> archive.bridge_extra_info_descriptor_path(published, digest) # doctest: +ELLIPSIS
'/srv/archive/bridge-descriptors/extra-info/2018/11/a/9/a94a...389'
These paths are defined in §5.2.1 of [collector-protocol].
-
bridge_server_descriptor_path
(published, digest)[source]¶ Generates a path, including the archive path, for a bridge server descriptor with a given published time and digest. For example:
>>> archive = DirectoryArchive("/srv/archive")
>>> published = datetime.datetime(2018, 11, 19, 15, 1, 2)
>>> digest = "a94a07b201598d847105ae5fcd5bc3ab10124389"
>>> archive.bridge_server_descriptor_path(published, digest) # doctest: +ELLIPSIS
'/srv/archive/bridge-descriptors/server-descriptor/2018/11/a/9/a94a...389'
These paths are defined in §5.2.1 of [collector-protocol].
-
bridge_status_path
(valid_after, fingerprint)[source]¶ Generates a path, including the archive path, for a bridge status with a given valid-after time, generated by the authority with the given fingerprint. For example:
>>> archive = DirectoryArchive("/srv/archive")
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> fingerprint = "BA44A889E64B93FAA2B114E02C2A279A8555C533" # Serge
>>> archive.bridge_status_path(valid_after, fingerprint) # doctest: +ELLIPSIS
'/srv/archive/bridge-descriptors/statuses/2018/11/19/20181119-150000-BA...33'
These paths are defined in §5.2.2 of [collector-protocol].
-
path_for
(descriptor, create_dir=False)[source]¶ The filesystem path that a descriptor will be archived at. These paths are defined in [collector-protocol].
The descriptor argument may also be a str, in which case it is treated as a path relative to the root of the archive. For example:
>>> DirectoryArchive("/srv/archive").path_for("path/to/descriptor")
'/srv/archive/path/to/descriptor'
-
relay_consensus
(flavor='ns', valid_after=None)[source]¶ Retrieves a consensus from the archive.
- Parameters
valid_after (datetime) – If set, will retrieve a consensus with the given valid_after time, otherwise a consensus that became valid at the top of the current hour will be retrieved.
- Returns
A
NetworkStatusDocumentV3
if found, otherwise None.
-
relay_consensus_path
(valid_after)[source]¶ Generates a path, including the archive path, for a network-status consensus with a given valid-after time. For example:
>>> archive = DirectoryArchive("/srv/archive")
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> archive.relay_consensus_path(valid_after)
'/srv/archive/relay-descriptors/consensus/2018/11/19/2018-11-19-15-00-00-consensus'
These paths are defined in §5.3.2 of [collector-protocol].
-
relay_extra_info_descriptor
(digest, published_hint)[source]¶ Retrieves a relay’s extra-info descriptor from the archive.
- Parameters
- Returns
A
RelayExtraInfoDescriptor
if found, otherwise None.
-
relay_extra_info_descriptor_path
(published, digest)[source]¶ Generates a path, including the archive path, for a relay extra-info descriptor with a given published time and digest. For example:
>>> archive = DirectoryArchive("/srv/archive")
>>> published = datetime.datetime(2018, 11, 19, 9, 17, 56)
>>> digest = "a94a07b201598d847105ae5fcd5bc3ab10124389"
>>> archive.relay_extra_info_descriptor_path(published, digest) # doctest: +ELLIPSIS
'/srv/archive/relay-descriptors/extra-info/2018/11/a/9/a94a...389'
These paths are defined in §5.3.2 of [collector-protocol].
-
relay_extra_info_descriptors
(digests, published_hint)[source]¶ Retrieves multiple extra-info descriptors published around the same time (e.g. all referenced by server-descriptors in the same consensus).
- Parameters
- Returns
A list of stem.descriptor.extrainfo_descriptor.RelayExtraInfoDescriptor.
-
relay_microdescriptor
(digest, valid_after_hint)[source]¶ Retrieves a relay’s microdescriptor from the archive.
- Parameters
- Returns
A
stem.descriptor.microdescriptor.Microdescriptor
if found, otherwise None.
-
relay_microdescriptors
(digests, valid_after_hint)[source]¶ Retrieves multiple microdescriptors around the same valid_after time (e.g. all referenced by the same microdescriptor consensus).
- Parameters
- Returns
-
relay_server_descriptor
(digest, published_hint)[source]¶ Retrieves a relay’s server descriptor from the archive.
- Parameters
- Returns
A
stem.descriptor.server_descriptor.RelayDescriptor
if found, otherwise None.
-
relay_server_descriptor_path
(published, digest)[source]¶ Generates a path, including the archive path, for a relay server descriptor with a given published time and digest. For example:
>>> archive = DirectoryArchive("/srv/archive")
>>> published = datetime.datetime(2018, 11, 19, 15, 1, 2)
>>> digest = "a94a07b201598d847105ae5fcd5bc3ab10124389"
>>> archive.relay_server_descriptor_path(published, digest) # doctest: +ELLIPSIS
'/srv/archive/relay-descriptors/server-descriptor/2018/11/a/9/a94a...389'
These paths are defined in §5.3.2 of [collector-protocol].
-
relay_server_descriptors
(digests, published_hint)[source]¶ Retrieves multiple server descriptors published around the same time (e.g. all referenced by the same consensus).
- Parameters
- Returns
A list of stem.descriptor.server_descriptor.RelayDescriptor.
-
relay_vote
(v3ident, digest='*', valid_after=None)[source]¶ Retrieves a vote from the archive.
- Parameters
v3ident (str) – The v3ident of the authority that created the vote.
digest (str) – A hex-encoded digest of the vote. This will automatically be fixed to upper-case.
valid_after (datetime) – If set, will retrieve a vote with the given valid_after time, otherwise a vote that became valid at the top of the current hour will be retrieved.
- Returns
A
NetworkStatusDocumentV3
if found, otherwise None.
-
relay_vote_path
(valid_after, v3ident, digest)[source]¶ Generates a path, including the archive path, for a network-status vote with a given valid-after time, generated by the authority with the given v3ident, and with the given digest. For example:
>>> archive = DirectoryArchive("/srv/archive")
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> v3ident = "D586D18309DED4CD6D57C18FDB97EFA96D330566" # moria1
>>> digest = "663B503182575D242B9D8A67334365FF8ECB53BB"
>>> archive.relay_vote_path(valid_after, v3ident, digest) # doctest: +ELLIPSIS
'/srv/archive/relay-descriptors/vote/2018/11/19/2018-11-19-15-00-00-vote-D...-...B'
These paths are defined in §5.3.2 of [collector-protocol].
-
bushel.archive.
aglob
(pathname, *, recursive=False)[source]¶ asyncio wrapper for glob.glob().
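One way such a wrapper might be written is shown below. This is a sketch under assumptions, not the actual `aglob` source: `aglob_sketch` is a hypothetical name, and the default executor is assumed.

```python
import asyncio
import glob
import os
import tempfile


async def aglob_sketch(pathname, *, recursive=False):
    # Hypothetical stand-in for aglob(): run the blocking glob.glob() call
    # in the default executor so the event loop is not blocked.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None, lambda: glob.glob(pathname, recursive=recursive))


async def main():
    with tempfile.TemporaryDirectory() as directory:
        for name in ("a.txt", "b.txt", "c.dat"):
            open(os.path.join(directory, name), "w").close()
        matches = await aglob_sketch(os.path.join(directory, "*.txt"))
        return sorted(os.path.basename(match) for match in matches)


names = asyncio.run(main())
print(names)
```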
-
bushel.archive.
collector_422_filename
(valid_after, fingerprint)[source]¶ Create a filename for a bridge status according to §4.2.2 of the [collector-protocol]. For example:
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> fingerprint = "BA44A889E64B93FAA2B114E02C2A279A8555C533" # Serge
>>> collector_422_filename(valid_after, fingerprint)
'20181119-150000-BA44A889E64B93FAA2B114E02C2A279A8555C533'
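The filename layout in the example above can be reproduced with a format string. The sketch below infers the layout from that example only; `sketch_422_filename` is a hypothetical name and is not the library function.

```python
import datetime


def sketch_422_filename(valid_after, fingerprint):
    # Layout inferred from the documented example:
    # YYYYMMDD-HHMMSS-<fingerprint>. The fingerprint is passed through
    # unchanged; any case normalisation in the real function is unknown.
    return f"{valid_after:%Y%m%d-%H%M%S}-{fingerprint}"


valid_after = datetime.datetime(2018, 11, 19, 15)
fingerprint = "BA44A889E64B93FAA2B114E02C2A279A8555C533"
status_filename = sketch_422_filename(valid_after, fingerprint)
print(status_filename)
```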
-
bushel.archive.
collector_431_filename
(valid_after)[source]¶ Create a filename for a network status consensus according to §4.3.1 of the [collector-protocol]. For example:
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> collector_431_filename(valid_after)
'2018-11-19-15-00-00-consensus'
-
bushel.archive.
collector_433_filename
(valid_after, v3ident, digest)[source]¶ Create a filename for a network status vote according to §4.3.3 of the [collector-protocol].
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> v3ident = "D586D18309DED4CD6D57C18FDB97EFA96D330566" # moria1
>>> digest = "663B503182575D242B9D8A67334365FF8ECB53BB"
>>> collector_433_filename(valid_after, v3ident, digest) # doctest: +ELLIPSIS
'2018-11-19-15-00-00-vote-D586D18309DED4CD6D57C18FDB97EFA96D330566-663B...3BB'
Paths in the Collector File Structure Protocol using this filename expect upper-case hex-encoded SHA-1 digests.
>>> v3ident = "d586d18309ded4cd6d57c18fdb97efa96d330566" # Lower case gets corrected
>>> digest = "663b503182575d242b9d8a67334365ff8ecb53bb" # Lower case gets corrected
>>> collector_433_filename(valid_after, v3ident, digest) # doctest: +ELLIPSIS
'2018-11-19-15-00-00-vote-D586D18309DED4CD6D57C18FDB97EFA96D330566-663B...3BB'
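The case correction shown above could be implemented roughly as follows. This is a sketch inferred from the documented examples; `sketch_433_filename` is a hypothetical name and the real function may differ in detail.

```python
import datetime


def sketch_433_filename(valid_after, v3ident, digest):
    # Both hex strings are forced to upper case, matching the behaviour
    # the examples above describe.
    return (f"{valid_after:%Y-%m-%d-%H-%M-%S}-vote-"
            f"{v3ident.upper()}-{digest.upper()}")


valid_after = datetime.datetime(2018, 11, 19, 15)
upper = sketch_433_filename(
    valid_after,
    "D586D18309DED4CD6D57C18FDB97EFA96D330566",
    "663B503182575D242B9D8A67334365FF8ECB53BB")
lower = sketch_433_filename(
    valid_after,
    "d586d18309ded4cd6d57c18fdb97efa96d330566",
    "663b503182575d242b9d8a67334365ff8ecb53bb")
print(upper)
```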
-
bushel.archive.
collector_434_filename
(valid_after)[source]¶ Create a filename for a microdesc-flavoured network status consensus according to §4.3.4 of the [collector-protocol]. For example:
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> collector_434_filename(valid_after)
'2018-11-19-15-00-00-consensus-microdesc'
-
bushel.archive.
collector_521_path
(subdirectory, marker, published, digest)[source]¶ Create a path according to §5.2.1 of the [collector-protocol]. This is used for server-descriptors and extra-info descriptors for both relays and bridges. For example:
>>> subdirectory = CollectorOutSubdirectory.RELAY_DESCRIPTORS
>>> marker = CollectorOutRelayDescsMarker.SERVER_DESCRIPTOR
>>> published = datetime.datetime(2018, 11, 19, 9, 17, 56)
>>> digest = "a94a07b201598d847105ae5fcd5bc3ab10124389"
>>> collector_521_path(subdirectory, marker, published, digest) # doctest: +ELLIPSIS
'relay-descriptors/server-descriptor/2018/11/a/9/a94a...389'
Paths in the Collector File Structure Protocol using this substructure expect lower-case hex-encoded SHA-1 digests.
>>> digest = "A94A07B201598D847105AE5FCD5BC3AB10124389" # Upper case gets corrected
>>> collector_521_path(subdirectory, marker, published, digest) # doctest: +ELLIPSIS
'relay-descriptors/server-descriptor/2018/11/a/9/a94a...389'
- Parameters
subdirectory (str) – The subdirectory under the “out” directory to use. Standard values can be found in CollectorOutSubdirectory.
marker (str) – The marker under the subdirectory to use. Standard values can be found in CollectorOutRelayDescsMarker and CollectorOutBridgeDescsMarker.
published (datetime) – The published time.
digest (str) – The hex-encoded SHA-1 digest for the descriptor. The case will automatically be fixed to lower-case.
- Returns
Path for the descriptor as a str.
-
bushel.archive.
collector_521_substructure
(published, digest)[source]¶ Create a path substructure according to §5.2.1 of the [collector-protocol]. This is used for server-descriptors and extra-info descriptors for both relays and bridges. For example:
>>> published = datetime.datetime(2018, 11, 19, 9, 17, 56)
>>> digest = "a94a07b201598d847105ae5fcd5bc3ab10124389"
>>> collector_521_substructure(published, digest)
'2018/11/a/9'
Paths in the Collector File Structure Protocol using this substructure expect lower-case hex-encoded SHA-1 digests.
>>> digest = "A94A07B201598D847105AE5FCD5BC3AB10124389" # Upper case gets corrected
>>> collector_521_substructure(published, digest)
'2018/11/a/9'
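The substructure in these examples is the year, the month, and the first two hex characters of the lower-cased digest. A minimal sketch, inferred from the examples rather than taken from the source (`sketch_521_substructure` is a hypothetical name):

```python
import datetime


def sketch_521_substructure(published, digest):
    # Lower-case the digest first so that upper-case input yields the
    # same directory names as lower-case input.
    digest = digest.lower()
    return f"{published:%Y/%m}/{digest[0]}/{digest[1]}"


published = datetime.datetime(2018, 11, 19, 9, 17, 56)
from_lower = sketch_521_substructure(
    published, "a94a07b201598d847105ae5fcd5bc3ab10124389")
from_upper = sketch_521_substructure(
    published, "A94A07B201598D847105AE5FCD5BC3AB10124389")
print(from_lower)
```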
-
bushel.archive.
collector_522_path
(subdirectory, marker, valid_after, filename)[source]¶ Create a path according to §5.2.2 of the [collector-protocol]. This is used for bridge statuses, and for network-status consensuses (both ns and microdesc flavors) and votes. For example, for a bridge status:
>>> subdirectory = CollectorOutSubdirectory.BRIDGE_DESCRIPTORS
>>> marker = CollectorOutBridgeDescsMarker.STATUS
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> fingerprint = "BA44A889E64B93FAA2B114E02C2A279A8555C533" # Serge
>>> filename = collector_422_filename(valid_after, fingerprint)
>>> collector_522_path(subdirectory, marker, valid_after, filename) # doctest: +ELLIPSIS
'bridge-descriptors/statuses/2018/11/19/20181119-150000-BA44...533'
Or alternatively for a network-status consensus:
>>> subdirectory = CollectorOutSubdirectory.RELAY_DESCRIPTORS
>>> marker = CollectorOutRelayDescsMarker.CONSENSUS
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> filename = collector_431_filename(valid_after)
>>> collector_522_path(subdirectory, marker, valid_after, filename)
'relay-descriptors/consensus/2018/11/19/2018-11-19-15-00-00-consensus'
- Parameters
subdirectory (str) – The subdirectory under the “out” directory to use. Standard values can be found in CollectorOutSubdirectory.
marker (str) – The marker under the subdirectory to use. Standard values can be found in CollectorOutRelayDescsMarker and CollectorOutBridgeDescsMarker.
valid_after (datetime) – The valid_after time.
filename (str) – The filename to use as a str, typically created with collector_422_filename() for bridge statuses, collector_431_filename() for network-status consensuses, or collector_433_filename() for network-status votes.
- Returns
Path for the descriptor as a str.
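The composition described above (subdirectory, marker, date substructure, then filename) can be sketched with plain strings. This is an illustration inferred from the documented output, not the library code; `sketch_522_path` is a hypothetical name, and string values are used in place of the enumeration members.

```python
import datetime


def sketch_522_path(subdirectory, marker, valid_after, filename):
    # The date substructure is year/month/day of the valid-after time.
    substructure = f"{valid_after:%Y/%m/%d}"
    return "/".join([subdirectory, marker, substructure, filename])


valid_after = datetime.datetime(2018, 11, 19, 15)
filename = f"{valid_after:%Y-%m-%d-%H-%M-%S}-consensus"
path = sketch_522_path(
    "relay-descriptors", "consensus", valid_after, filename)
print(path)
```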
-
bushel.archive.
collector_522_substructure
(valid_after)[source]¶ Create a path substructure according to §5.2.2 of the [collector-protocol]. This is used for bridge statuses, and network-status consensuses and votes. For example:
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> collector_522_substructure(valid_after)
'2018/11/19'
-
bushel.archive.
collector_533_substructure
(valid_after)[source]¶ Create a substructure according to §5.3.3 of the [collector-protocol]. This is used for microdesc-flavored consensuses and microdescriptors. For example:
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> collector_533_substructure(valid_after)
'2018/11'
-
bushel.archive.
collector_534_consensus_path
(valid_after)[source]¶ Create a path according to §5.3.4 of the [collector-protocol] for a microdesc-flavored consensus. For example:
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> collector_534_consensus_path(valid_after)
'relay-descriptors/microdesc/2018/11/consensus-microdesc/19/2018-11-19-15-00-00-consensus-microdesc'
-
bushel.archive.
collector_534_microdescriptor_path
(valid_after, digest)[source]¶ Create a path according to §5.3.4 of the [collector-protocol] for a microdescriptor. For example:
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> digest = "00d91cf96321fbd536dd07e297a5e1b7e6961ddd10facdd719716e351453168f"
>>> collector_534_microdescriptor_path(valid_after, digest)
'relay-descriptors/microdesc/2018/11/micro/0/0/00d91cf96321fbd536dd07e297a5e1b7e6961ddd10facdd719716e351453168f'
Paths in the Collector File Structure Protocol using this substructure expect lower-case hex-encoded SHA-256 digests.
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> digest = "00D91CF96321FBD536DD07E297A5E1B7E6961DDD10FACDD719716E351453168F"
>>> collector_534_microdescriptor_path(valid_after, digest)
'relay-descriptors/microdesc/2018/11/micro/0/0/00d91cf96321fbd536dd07e297a5e1b7e6961ddd10facdd719716e351453168f'
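The microdescriptor layout shown above combines the year and month of the valid-after time with the first two characters of the lower-cased SHA-256 digest. A sketch inferred from the documented output (`sketch_534_microdescriptor_path` is a hypothetical name):

```python
import datetime


def sketch_534_microdescriptor_path(valid_after, digest):
    # Normalise to lower case so both spellings map to the same file.
    digest = digest.lower()
    return (f"relay-descriptors/microdesc/{valid_after:%Y/%m}/micro/"
            f"{digest[0]}/{digest[1]}/{digest}")


valid_after = datetime.datetime(2018, 11, 19, 15)
digest = "00D91CF96321FBD536DD07E297A5E1B7E6961DDD10FACDD719716E351453168F"
micro_path = sketch_534_microdescriptor_path(valid_after, digest)
print(micro_path)
```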
-
bushel.archive.
parse_file
(path, **kwargs)[source]¶ Parses a descriptor from a file.
- Parameters
path (str) – Path of the file to construct the descriptor from.
kwargs (dict) – Additional arguments for stem.descriptor.Descriptor.parse_file().
- Returns
stem.descriptor.Descriptor
subclass for the given content, or a list of descriptors if multiple=True is provided.
-
bushel.archive.
prepare_annotated_content
(descriptor)[source]¶ Encodes annotations and prepends them to the descriptor bytes for writing to disk.
- Parameters
descriptor (Descriptor) – The descriptor to prepare.
- Returns
bytes
for the annotated descriptor.
-
bushel.archive.
valid_after_now
()[source]¶ Takes a good guess at the valid-after time of the latest consensus. There is an assumption that there is a new consensus every hour and that it is valid from the top of the hour. Different valid-after times are compliant with [dir-spec] however, and so this may be wrong.
- Returns
A
datetime
for the top of the hour.
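The guess described above amounts to truncating the current time to the hour. A minimal sketch, assuming UTC and using a hypothetical `now` parameter (not part of the documented signature) so that the behaviour can be checked deterministically:

```python
import datetime


def sketch_valid_after_now(now=None):
    # `now` is a hypothetical parameter added for testing; when omitted,
    # the current time in UTC is assumed.
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # Drop minutes, seconds and microseconds to land on the top of the hour.
    return now.replace(minute=0, second=0, microsecond=0)


guess = sketch_valid_after_now(datetime.datetime(2018, 11, 19, 15, 42, 7))
print(guess)
```

As the description notes, a consensus with a valid-after time that is not on the hour would not be found by a guess of this kind.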