Directory Archive

Persistent filesystem-backed archive for Tor directory protocol descriptors. This is intended to be used as part of an asyncio application. File I/O operations are provided by coroutines and coroutine methods, with the actual I/O performed in an executor.
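The sketch below roughly illustrates the pattern described above. It is not bushel's actual code. A coroutine hands a blocking read to the event loop's default executor with loop.run_in_executor(). The helper names read_file and read_many are hypothetical. The asyncio.Semaphore that caps concurrent reads is an assumption about how a limit such as DirectoryArchive's max_file_concurrency might be applied.

```python
import asyncio
import pathlib


async def read_file(path, semaphore):
    """Read a file's bytes without blocking the event loop (hypothetical helper)."""
    loop = asyncio.get_running_loop()
    async with semaphore:  # at most N reads are in flight at once
        # pathlib.Path.read_bytes() blocks, so run it in the default executor.
        return await loop.run_in_executor(None, pathlib.Path(path).read_bytes)


async def read_many(paths, max_concurrency=100):
    """Read several files concurrently, returning their contents in order."""
    semaphore = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(read_file(p, semaphore) for p in paths))
```

asyncio.gather() preserves the order of its arguments, so the results line up with the input paths.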

class bushel.archive.CollectorOutBridgeDescsMarker

Enumeration of marker names under the “bridge-descriptors” directory as specified in §5.2 of [collector-protocol].

Name                 Description
EXTRA_INFO           Bridge extra-info descriptors (§5.2.1)
SERVER_DESCRIPTOR    Bridge server descriptors (§5.2.1)
STATUSES             Bridge statuses (§5.2.2)

class bushel.archive.CollectorOutRelayDescsMarker

Enumeration of marker names under the “relay-descriptors” directory as specified in §5.3 of [collector-protocol].

Name                 Description
CONSENSUS            Network status consensuses (§5.3.2)
EXTRA_INFO           Relay extra-info descriptors (§5.3.2)
SERVER_DESCRIPTOR    Relay server descriptors (§5.3.2)
VOTE                 Network status votes (§5.3.2)
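Each enumeration member stands for a directory name on disk. The sketch below is a hypothetical stand-in for CollectorOutRelayDescsMarker. Its member values are taken from the example paths elsewhere in this document. They are assumptions, not the library's own definitions.

```python
import enum


class RelayDescsMarker(enum.Enum):
    """Hypothetical stand-in for CollectorOutRelayDescsMarker.

    Values are the directory names seen in this document's example paths.
    """
    CONSENSUS = "consensus"
    EXTRA_INFO = "extra-info"
    SERVER_DESCRIPTOR = "server-descriptor"
    VOTE = "vote"


def marker_dir(subdirectory, marker):
    """Join an "out" subdirectory name with the marker's on-disk name."""
    return "{}/{}".format(subdirectory, marker.value)


print(marker_dir("relay-descriptors", RelayDescsMarker.VOTE))  # relay-descriptors/vote
```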

class bushel.archive.CollectorOutSubdirectory

Enumeration of subdirectory names under the “out” directory as specified in §5.0 of [collector-protocol].

Name                  Description
BRIDGE_DESCRIPTORS    Bridge descriptors (§5.2)
EXIT_LISTS            Exit lists (§5.1)
RELAY_DESCRIPTORS     Relay descriptors (§5.3)
TORPERF               Torperf and Onionperf (§5.1)
WEBSTATS              Web server access logs (§5.4)

class bushel.archive.DirectoryArchive(archive_path, max_file_concurrency=100)

Persistent filesystem-backed archive for Tor directory protocol descriptors.

This implements the CollecTor File Structure Protocol as detailed in [collector-protocol].

Parameters

archive_path (str) – Either an absolute or relative path to the location of the directory to use for the archive. This location must exist, but may be an empty directory.

bridge_extra_info_descriptor_path(published, digest)

Generates a path, including the archive path, for a bridge extra-info descriptor with a given published time and digest. For example:

>>> archive = DirectoryArchive("/srv/archive")
>>> published = datetime.datetime(2018, 11, 19, 9, 17, 56)
>>> digest = "a94a07b201598d847105ae5fcd5bc3ab10124389"
>>> archive.bridge_extra_info_descriptor_path(published, digest)  # doctest: +ELLIPSIS
'/srv/archive/bridge-descriptors/extra-info/2018/11/a/9/a94a...389'

These paths are defined in §5.2.1 of [collector-protocol].

Parameters
  • published (datetime) – The published time of the descriptor.

  • digest (str) – The hex-encoded SHA-1 digest of the descriptor.

Returns

Archive path as a str.

bridge_server_descriptor_path(published, digest)

Generates a path, including the archive path, for a bridge server descriptor with a given published time and digest. For example:

>>> archive = DirectoryArchive("/srv/archive")
>>> published = datetime.datetime(2018, 11, 19, 15, 1, 2)
>>> digest = "a94a07b201598d847105ae5fcd5bc3ab10124389"
>>> archive.bridge_server_descriptor_path(published, digest)  # doctest: +ELLIPSIS
'/srv/archive/bridge-descriptors/server-descriptor/2018/11/a/9/a94a...389'

These paths are defined in §5.2.1 of [collector-protocol].

Parameters
  • published (datetime) – The published time of the descriptor.

  • digest (str) – The hex-encoded SHA-1 digest of the descriptor.

Returns

Archive path as a str.

bridge_status_path(valid_after, fingerprint)

Generates a path, including the archive path, for a bridge status with a given valid-after time, generated by the bridge authority with the given fingerprint. For example:

>>> archive = DirectoryArchive("/srv/archive")
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> fingerprint = "BA44A889E64B93FAA2B114E02C2A279A8555C533"  # Serge
>>> archive.bridge_status_path(valid_after, fingerprint)  # doctest: +ELLIPSIS
'/srv/archive/bridge-descriptors/statuses/2018/11/19/20181119-150000-BA...33'

These paths are defined in §5.2.2 of [collector-protocol].

Parameters
  • valid_after (datetime) – The valid-after time for the status.

  • fingerprint (str) – The fingerprint of the bridge authority.

Returns

Path as a str.

path_for(descriptor, create_dir=False)

The filesystem path at which a descriptor will be archived. These paths are defined in [collector-protocol].

The descriptor may also be given as a str, in which case it is treated as a path relative to the root of the archive. For example:

>>> DirectoryArchive("/srv/archive").path_for("path/to/descriptor")
'/srv/archive/path/to/descriptor'

Parameters
  • descriptor (Descriptor or str) – The descriptor to archive, or a path relative to the root of the archive.

  • create_dir (bool) – Create the directory, ready to archive the descriptor.

Returns

Archive path for the descriptor as a str.
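Here is a rough sketch of the str case described above. A relative path is joined onto the archive root and, when requested, the containing directory is created first. Two things are assumptions: that create_dir means "create the parent directory", and the helper name. The real method also accepts descriptor objects, which this sketch does not handle.

```python
import os


def path_for(archive_path, relative_path, create_dir=False):
    """Hypothetical standalone version of DirectoryArchive.path_for() for str input."""
    path = os.path.join(archive_path, relative_path)
    if create_dir:
        # Assumed behaviour: make sure the parent directory exists so the
        # descriptor can be written straight away.
        os.makedirs(os.path.dirname(path), exist_ok=True)
    return path


print(path_for("/srv/archive", "path/to/descriptor"))  # /srv/archive/path/to/descriptor
```

exist_ok=True makes the call safe to repeat for descriptors that share a directory.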

relay_consensus(flavor='ns', valid_after=None)

Retrieves a consensus from the archive.

Parameters
  • flavor (str) – The consensus flavor to retrieve. Defaults to 'ns'.

  • valid_after (datetime) – If set, will retrieve a consensus with the given valid-after time, otherwise a consensus that became valid at the top of the current hour will be retrieved.

Returns

A NetworkStatusDocumentV3 if found, otherwise None.

relay_consensus_path(valid_after)

Generates a path, including the archive path, for a network-status consensus with a given valid-after time. For example:

>>> archive = DirectoryArchive("/srv/archive")
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> archive.relay_consensus_path(valid_after)
'/srv/archive/relay-descriptors/consensus/2018/11/19/2018-11-19-15-00-00-consensus'

These paths are defined in §5.3.2 of [collector-protocol].

Parameters

valid_after (datetime) – The valid-after time for the consensus.

Returns

Path as a str.

relay_extra_info_descriptor(digest, published_hint)

Retrieves a relay’s extra-info descriptor from the archive.

Parameters
  • digest (str) – A hex-encoded digest of the descriptor.

  • published_hint (datetime) – Provides a hint on the published time to allow the descriptor to be found in the archive. If the descriptor was not published in the same month as this, it will not be found.

Returns

A RelayExtraInfoDescriptor if found, otherwise None.
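The month restriction follows from the path layout: the hint supplies the year and month directories that are searched. The sketch below assumes the §5.2.1 layout shown in this document and uses a hypothetical helper name. In it, a descriptor published late in October is searched for under November because the hint falls in November.

```python
import datetime


def extra_info_lookup_dir(published_hint, digest):
    """Hypothetical: the directory searched for a relay extra-info descriptor."""
    digest = digest.lower()
    return "relay-descriptors/extra-info/{}/{}/{}/{}".format(
        published_hint.strftime("%Y"), published_hint.strftime("%m"), digest[0], digest[1]
    )


digest = "a94a07b201598d847105ae5fcd5bc3ab10124389"
actually_published = datetime.datetime(2018, 10, 31, 23, 40)
hint = datetime.datetime(2018, 11, 1, 0, 0)

# The descriptor was written under 2018/10, but the hint points at 2018/11,
# so a lookup driven by this hint misses it.
print(extra_info_lookup_dir(actually_published, digest))  # relay-descriptors/extra-info/2018/10/a/9
print(extra_info_lookup_dir(hint, digest))  # relay-descriptors/extra-info/2018/11/a/9
```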

relay_extra_info_descriptor_path(published, digest)

Generates a path, including the archive path, for a relay extra-info descriptor with a given published time and digest. For example:

>>> archive = DirectoryArchive("/srv/archive")
>>> published = datetime.datetime(2018, 11, 19, 9, 17, 56)
>>> digest = "a94a07b201598d847105ae5fcd5bc3ab10124389"
>>> archive.relay_extra_info_descriptor_path(published, digest)  # doctest: +ELLIPSIS
'/srv/archive/relay-descriptors/extra-info/2018/11/a/9/a94a...389'

These paths are defined in §5.3.2 of [collector-protocol].

Parameters
  • published (datetime) – The published time of the descriptor.

  • digest (str) – The hex-encoded SHA-1 digest of the descriptor.

Returns

Path as a str.

relay_extra_info_descriptors(digests, published_hint)

Retrieves multiple extra-info descriptors published around the same time (e.g. all referenced by server-descriptors in the same consensus).

Parameters
  • digests (list(str)) – Hex-encoded digests for the descriptors.

  • published_hint (datetime) – Provides a hint on the published time to allow the descriptors to be found in the archive. Descriptors not published in the same month as this will not be found.

Returns

A list of stem.descriptor.extrainfo_descriptor.RelayExtraInfoDescriptor.
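Retrieving several descriptors at once maps naturally onto asyncio.gather(). The sketch below shows one possible structure for such a batch method; it is not bushel's code. fetch_one is a hypothetical stand-in for a single-descriptor lookup that returns None on a miss. Dropping misses from the result is a choice made here, because this document does not say how the real methods handle them.

```python
import asyncio


async def fetch_many(fetch_one, digests, published_hint):
    """Look up many descriptors concurrently (hypothetical helper).

    fetch_one(digest, published_hint) is any coroutine function returning a
    descriptor or None; missing descriptors are left out of the result.
    """
    results = await asyncio.gather(
        *(fetch_one(digest, published_hint) for digest in digests)
    )
    return [result for result in results if result is not None]
```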

relay_microdescriptor(digest, valid_after_hint)

Retrieves a relay’s microdescriptor from the archive.

Parameters
  • digest (str) – A hex-encoded digest of the descriptor.

  • valid_after_hint (datetime) – Provides a hint on the valid_after time to allow the descriptor to be found in the archive. If the descriptor did not become valid in the same month as this, it will not be found.

Returns

A stem.descriptor.microdescriptor.Microdescriptor if found, otherwise None.

relay_microdescriptors(digests, valid_after_hint)

Retrieves multiple microdescriptors around the same valid_after time (e.g. all referenced by the same microdescriptor consensus).

Parameters
  • digests (list(str)) – Hex-encoded digests for the descriptors.

  • valid_after_hint (datetime) – Provides a hint on the valid_after time to allow the descriptors to be found in the archive. Descriptors that did not become valid in the same month as this will not be found.

Returns

A list of stem.descriptor.microdescriptor.Microdescriptor.

relay_server_descriptor(digest, published_hint)

Retrieves a relay’s server descriptor from the archive.

Parameters
  • digest (str) – A hex-encoded digest of the descriptor.

  • published_hint (datetime) – Provides a hint on the published time to allow the descriptor to be found in the archive. If the descriptor was not published in the same month as this, it will not be found.

Returns

A stem.descriptor.server_descriptor.RelayDescriptor if found, otherwise None.

relay_server_descriptor_path(published, digest)

Generates a path, including the archive path, for a relay server descriptor with a given published time and digest. For example:

>>> archive = DirectoryArchive("/srv/archive")
>>> published = datetime.datetime(2018, 11, 19, 15, 1, 2)
>>> digest = "a94a07b201598d847105ae5fcd5bc3ab10124389"
>>> archive.relay_server_descriptor_path(published, digest)  # doctest: +ELLIPSIS
'/srv/archive/relay-descriptors/server-descriptor/2018/11/a/9/a94a...389'

These paths are defined in §5.3.2 of [collector-protocol].

Parameters
  • published (datetime) – The published time of the descriptor.

  • digest (str) – The hex-encoded SHA-1 digest of the descriptor.

Returns

Path as a str.

relay_server_descriptors(digests, published_hint)

Retrieves multiple server descriptors published around the same time (e.g. all referenced by the same consensus).

Parameters
  • digests (list(str)) – Hex-encoded digests for the descriptors.

  • published_hint (datetime) – Provides a hint on the published time to allow the descriptors to be found in the archive. Descriptors not published in the same month as this will not be found.

Returns

A list of stem.descriptor.server_descriptor.RelayDescriptor.

relay_vote(v3ident, digest='*', valid_after=None)

Retrieves a vote from the archive.

Parameters
  • v3ident (str) – The v3ident of the authority that created the vote.

  • digest (str) – A hex-encoded digest of the vote. This will automatically be fixed to upper-case.

  • valid_after (datetime) – If set, will retrieve a vote with the given valid-after time, otherwise a vote that became valid at the top of the current hour will be retrieved.

Returns

A NetworkStatusDocumentV3 if found, otherwise None.
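The default digest of '*' looks like a glob wildcard. The sketch below assumes that reading. It also assumes that votes are stored using the day directories and filenames shown under relay_vote_path(). Under those assumptions, a lookup pattern could be built as follows; the helper name is hypothetical. A pattern like this could then be expanded with aglob().

```python
import datetime
import os.path


def vote_glob_pattern(archive_path, valid_after, v3ident, digest="*"):
    """Hypothetical: a glob pattern matching archived votes."""
    filename = "{}-vote-{}-{}".format(
        valid_after.strftime("%Y-%m-%d-%H-%M-%S"),
        v3ident.upper(),
        digest.upper(),  # "*".upper() is still "*", so the wildcard survives
    )
    return os.path.join(
        archive_path, "relay-descriptors", "vote", valid_after.strftime("%Y/%m/%d"), filename
    )


valid_after = datetime.datetime(2018, 11, 19, 15)
print(vote_glob_pattern("/srv/archive", valid_after, "d586d18309ded4cd6d57c18fdb97efa96d330566"))
```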

relay_vote_path(valid_after, v3ident, digest)

Generates a path, including the archive path, for a network-status vote with a given valid-after time, generated by the authority with the given v3ident, and with the given digest. For example:

>>> archive = DirectoryArchive("/srv/archive")
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> v3ident = "D586D18309DED4CD6D57C18FDB97EFA96D330566"  # moria1
>>> digest = "663B503182575D242B9D8A67334365FF8ECB53BB"
>>> archive.relay_vote_path(valid_after, v3ident, digest)  # doctest: +ELLIPSIS
'/srv/archive/relay-descriptors/vote/2018/11/19/2018-11-19-15-00-00-vote-D...-...B'

These paths are defined in §5.3.2 of [collector-protocol].

Parameters
  • valid_after (datetime) – The valid-after time.

  • v3ident (str) – The v3ident of the directory authority.

  • digest (str) – The digest of the vote.

Returns

Path as a str.

bushel.archive.aglob(pathname, *, recursive=False)

asyncio wrapper for glob.glob().
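Here is one way to write such a wrapper, assuming it simply runs glob.glob() in the default executor; the name aglob_sketch is hypothetical. functools.partial is needed because run_in_executor() forwards only positional arguments, so recursive would otherwise be lost.

```python
import asyncio
import functools
import glob


async def aglob_sketch(pathname, *, recursive=False):
    """Run glob.glob() in the default executor so the event loop is not blocked."""
    loop = asyncio.get_running_loop()
    # run_in_executor() only forwards positional arguments, hence partial().
    return await loop.run_in_executor(
        None, functools.partial(glob.glob, pathname, recursive=recursive)
    )
```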

bushel.archive.collector_422_filename(valid_after, fingerprint)

Create a filename for a bridge status according to §4.2.2 of the [collector-protocol]. For example:

>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> fingerprint = "BA44A889E64B93FAA2B114E02C2A279A8555C533" # Serge
>>> collector_422_filename(valid_after, fingerprint)
'20181119-150000-BA44A889E64B93FAA2B114E02C2A279A8555C533'

Parameters
  • valid_after (datetime) – The valid-after time.

  • fingerprint (str) – The fingerprint of the bridge authority.

Returns

Filename as a str.

bushel.archive.collector_431_filename(valid_after)

Create a filename for a network status consensus according to §4.3.1 of the [collector-protocol]. For example:

>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> collector_431_filename(valid_after)
'2018-11-19-15-00-00-consensus'

Parameters

valid_after (datetime) – The valid-after time.

Returns

Filename as a str.

bushel.archive.collector_433_filename(valid_after, v3ident, digest)

Create a filename for a network status vote according to §4.3.3 of the [collector-protocol]. For example:

>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> v3ident = "D586D18309DED4CD6D57C18FDB97EFA96D330566"  # moria1
>>> digest = "663B503182575D242B9D8A67334365FF8ECB53BB"
>>> collector_433_filename(valid_after, v3ident, digest)  # doctest: +ELLIPSIS
'2018-11-19-15-00-00-vote-D586D18309DED4CD6D57C18FDB97EFA96D330566-663B...3BB'

Paths in the CollecTor File Structure Protocol using this filename expect upper-case hex-encoded SHA-1 digests.

>>> v3ident = "d586d18309ded4cd6d57c18fdb97efa96d330566"  # Lower case gets corrected
>>> digest = "663b503182575d242b9d8a67334365ff8ecb53bb"  # Lower case gets corrected
>>> collector_433_filename(valid_after, v3ident, digest)  # doctest: +ELLIPSIS
'2018-11-19-15-00-00-vote-D586D18309DED4CD6D57C18FDB97EFA96D330566-663B...3BB'
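
Parameters
  • valid_after (datetime) – The valid-after time.

  • v3ident (str) – The v3ident of the directory authority. The case will automatically be fixed to upper-case.

  • digest (str) – The hex-encoded SHA-1 digest of the vote. The case will automatically be fixed to upper-case.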

Returns

Filename as a str.

bushel.archive.collector_434_filename(valid_after)

Create a filename for a microdesc-flavored network status consensus according to §4.3.4 of the [collector-protocol]. For example:

>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> collector_434_filename(valid_after)
'2018-11-19-15-00-00-consensus-microdesc'

Parameters

valid_after (datetime) – The valid-after time.

Returns

Filename as a str.
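The three §4.3 filenames share a timestamp prefix and differ only in what follows it. The compact sketch below reproduces the examples above, assuming only the formats shown there. The function names are hypothetical.

```python
import datetime


def _timestamp(valid_after):
    # e.g. 2018-11-19-15-00-00
    return valid_after.strftime("%Y-%m-%d-%H-%M-%S")


def consensus_filename(valid_after):  # §4.3.1
    return _timestamp(valid_after) + "-consensus"


def vote_filename(valid_after, v3ident, digest):  # §4.3.3
    # Both the authority identity and the digest are upper-cased.
    return "{}-vote-{}-{}".format(_timestamp(valid_after), v3ident.upper(), digest.upper())


def microdesc_consensus_filename(valid_after):  # §4.3.4
    return _timestamp(valid_after) + "-consensus-microdesc"


valid_after = datetime.datetime(2018, 11, 19, 15)
print(consensus_filename(valid_after))  # 2018-11-19-15-00-00-consensus
print(microdesc_consensus_filename(valid_after))  # 2018-11-19-15-00-00-consensus-microdesc
```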

bushel.archive.collector_521_path(subdirectory, marker, published, digest)

Create a path according to §5.2.1 of the [collector-protocol]. This is used for server-descriptors and extra-info descriptors for both relays and bridges. For example:

>>> subdirectory = CollectorOutSubdirectory.RELAY_DESCRIPTORS
>>> marker = CollectorOutRelayDescsMarker.SERVER_DESCRIPTOR
>>> published = datetime.datetime(2018, 11, 19, 9, 17, 56)
>>> digest = "a94a07b201598d847105ae5fcd5bc3ab10124389"
>>> collector_521_path(subdirectory, marker, published, digest)  # doctest: +ELLIPSIS
'relay-descriptors/server-descriptor/2018/11/a/9/a94a...389'

Paths in the CollecTor File Structure Protocol using this substructure expect lower-case hex-encoded SHA-1 digests.

>>> digest = "A94A07B201598D847105AE5FCD5BC3AB10124389" # Upper case gets corrected
>>> collector_521_path(subdirectory, marker, published, digest)  # doctest: +ELLIPSIS
'relay-descriptors/server-descriptor/2018/11/a/9/a94a...389'
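
Parameters
  • subdirectory (CollectorOutSubdirectory) – The subdirectory under the “out” directory.

  • marker (CollectorOutRelayDescsMarker or CollectorOutBridgeDescsMarker) – The marker for the descriptor type.

  • published (datetime) – The published time.

  • digest (str) – The hex-encoded SHA-1 digest for the descriptor. The case will automatically be fixed to lower-case.

Returns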

Path for the descriptor as a str.

bushel.archive.collector_521_substructure(published, digest)

Create a path substructure according to §5.2.1 of the [collector-protocol]. This is used for server-descriptors and extra-info descriptors for both relays and bridges. For example:

>>> published = datetime.datetime(2018, 11, 19, 9, 17, 56)
>>> digest = "a94a07b201598d847105ae5fcd5bc3ab10124389"
>>> collector_521_substructure(published, digest)
'2018/11/a/9'

Paths in the CollecTor File Structure Protocol using this substructure expect lower-case hex-encoded SHA-1 digests.

>>> digest = "A94A07B201598D847105AE5FCD5BC3AB10124389" # Upper case gets corrected
>>> collector_521_substructure(published, digest)
'2018/11/a/9'

Parameters
  • published (datetime) – The published time.

  • digest (str) – The hex-encoded SHA-1 digest for the descriptor. The case will automatically be fixed to lower-case.

Returns

Path substructure as a str.
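The §5.2.1 substructure puts each digest under the year and month, followed by its first two hex characters. The sketch below reproduces the examples for collector_521_substructure() and collector_521_path(). Its names are hypothetical, and it passes the subdirectory and marker as plain strings rather than enum members. Zero-padding of the month is an assumption, because the examples only show November.

```python
import datetime


def substructure_521(published, digest):
    """Year/month, then the first two hex characters of the lower-cased digest."""
    digest = digest.lower()
    return "{}/{}/{}".format(published.strftime("%Y/%m"), digest[0], digest[1])


def path_521(subdirectory, marker, published, digest):
    """subdirectory and marker are plain directory-name strings here."""
    return "/".join([subdirectory, marker, substructure_521(published, digest), digest.lower()])


published = datetime.datetime(2018, 11, 19, 9, 17, 56)
print(substructure_521(published, "A94A07B201598D847105AE5FCD5BC3AB10124389"))  # 2018/11/a/9
```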

bushel.archive.collector_522_path(subdirectory, marker, valid_after, filename)

Create a path according to §5.2.2 of the [collector-protocol]. This is used for bridge statuses and for network-status consensuses and votes. For a bridge status, for example:

>>> subdirectory = CollectorOutSubdirectory.BRIDGE_DESCRIPTORS
>>> marker = CollectorOutBridgeDescsMarker.STATUSES
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> fingerprint = "BA44A889E64B93FAA2B114E02C2A279A8555C533" # Serge
>>> filename = collector_422_filename(valid_after, fingerprint)
>>> collector_522_path(subdirectory, marker, valid_after, filename)  # doctest: +ELLIPSIS
'bridge-descriptors/statuses/2018/11/19/20181119-150000-BA44...533'

Or alternatively for a network-status consensus:

>>> subdirectory = CollectorOutSubdirectory.RELAY_DESCRIPTORS
>>> marker = CollectorOutRelayDescsMarker.CONSENSUS
>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> filename = collector_431_filename(valid_after)
>>> collector_522_path(subdirectory, marker, valid_after, filename)
'relay-descriptors/consensus/2018/11/19/2018-11-19-15-00-00-consensus'
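
Parameters
  • subdirectory (CollectorOutSubdirectory) – The subdirectory under the “out” directory.

  • marker (CollectorOutRelayDescsMarker or CollectorOutBridgeDescsMarker) – The marker for the descriptor type.

  • valid_after (datetime) – The valid-after time.

  • filename (str) – The filename for the descriptor, such as one created by collector_422_filename() or collector_431_filename().

Returns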

Path for the descriptor as a str.

bushel.archive.collector_522_substructure(valid_after)

Create a path substructure according to §5.2.2 of the [collector-protocol]. This is used for bridge statuses, and network-status consensuses and votes. For example:

>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> collector_522_substructure(valid_after)
'2018/11/19'

Parameters

valid_after (datetime) – The valid-after time.

Returns

Path substructure as a str.
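Combining the §5.2.2 day directories with the §4.2.2 filename reproduces the bridge-status example given for collector_522_path(). The sketch below uses hypothetical names and passes plain strings for the subdirectory and marker.

```python
import datetime


def substructure_522(valid_after):
    # Year, month and day directories, e.g. 2018/11/19.
    return valid_after.strftime("%Y/%m/%d")


def bridge_status_filename(valid_after, fingerprint):  # §4.2.2
    # Compact timestamp followed by the bridge authority fingerprint.
    return "{}-{}".format(valid_after.strftime("%Y%m%d-%H%M%S"), fingerprint)


def path_522(subdirectory, marker, valid_after, filename):
    return "/".join([subdirectory, marker, substructure_522(valid_after), filename])


valid_after = datetime.datetime(2018, 11, 19, 15)
fingerprint = "BA44A889E64B93FAA2B114E02C2A279A8555C533"  # Serge
print(path_522("bridge-descriptors", "statuses", valid_after,
               bridge_status_filename(valid_after, fingerprint)))
```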

bushel.archive.collector_533_substructure(valid_after)

Create a path substructure according to §5.3.3 of the [collector-protocol]. This is used for microdesc-flavored consensuses and microdescriptors. For example:

>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> collector_533_substructure(valid_after)
'2018/11'

bushel.archive.collector_534_consensus_path(valid_after)

Create a path according to §5.3.4 of the [collector-protocol] for a microdesc-flavored consensus. For example:

>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> collector_534_consensus_path(valid_after)
'relay-descriptors/microdesc/2018/11/consensus-microdesc/19/2018-11-19-15-00-00-consensus-microdesc'

bushel.archive.collector_534_microdescriptor_path(valid_after, digest)

Create a path according to §5.3.4 of the [collector-protocol] for a microdescriptor. For example:

>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> digest = "00d91cf96321fbd536dd07e297a5e1b7e6961ddd10facdd719716e351453168f"
>>> collector_534_microdescriptor_path(valid_after, digest)
'relay-descriptors/microdesc/2018/11/micro/0/0/00d91cf96321fbd536dd07e297a5e1b7e6961ddd10facdd719716e351453168f'

Paths in the CollecTor File Structure Protocol using this substructure expect lower-case hex-encoded SHA-256 digests.

>>> valid_after = datetime.datetime(2018, 11, 19, 15)
>>> digest = "00D91CF96321FBD536DD07E297A5E1B7E6961DDD10FACDD719716E351453168F"
>>> collector_534_microdescriptor_path(valid_after, digest)
'relay-descriptors/microdesc/2018/11/micro/0/0/00d91cf96321fbd536dd07e297a5e1b7e6961ddd10facdd719716e351453168f'
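Both microdesc paths hang off the same year/month substructure. The consensus path adds a day directory, and each microdescriptor path adds the first two hex characters of its digest. The sketch below reproduces the three examples above from those formats alone. Its names are hypothetical.

```python
import datetime


def substructure_533(valid_after):
    return valid_after.strftime("%Y/%m")


def microdesc_consensus_path(valid_after):
    return "relay-descriptors/microdesc/{}/consensus-microdesc/{}/{}-consensus-microdesc".format(
        substructure_533(valid_after),
        valid_after.strftime("%d"),
        valid_after.strftime("%Y-%m-%d-%H-%M-%S"),
    )


def microdescriptor_path(valid_after, digest):
    digest = digest.lower()  # SHA-256 hex digests are stored in lower case
    return "relay-descriptors/microdesc/{}/micro/{}/{}/{}".format(
        substructure_533(valid_after), digest[0], digest[1], digest
    )


valid_after = datetime.datetime(2018, 11, 19, 15)
print(substructure_533(valid_after))  # 2018/11
```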

bushel.archive.parse_file(path, **kwargs)

Parses a descriptor from a file.

Parameters
  • path (str) – Path of the file to parse the descriptor from.

  • kwargs (dict) – Additional arguments for stem.descriptor.Descriptor.parse_file().

Returns

stem.descriptor.Descriptor subclass for the given content, or a list of descriptors if multiple=True is provided.

bushel.archive.prepare_annotated_content(descriptor)

Encodes annotations and prepends them to the descriptor bytes for writing to disk.

Parameters

descriptor (Descriptor) – The descriptor to prepare.

Returns

bytes for the annotated descriptor.
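In stem, descriptor annotations are lines beginning with "@", such as "@type server-descriptor 1.0", that precede the descriptor content. The minimal sketch below prepends such lines. It assumes the annotations are already available as a list of byte strings and that each is written on its own line. prepend_annotations is a hypothetical name, and bushel's actual encoding may differ.

```python
def prepend_annotations(annotation_lines, descriptor_bytes):
    """Hypothetical: put each annotation on its own line before the descriptor."""
    header = b"".join(line.rstrip(b"\n") + b"\n" for line in annotation_lines)
    return header + descriptor_bytes


print(prepend_annotations([b"@type server-descriptor 1.0"], b"router example\n"))
```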

bushel.archive.valid_after_now()

Makes a best guess at the valid-after time of the latest consensus. It assumes that a new consensus is produced every hour and becomes valid at the top of the hour. [dir-spec] also permits other valid-after times, however, so this guess may be wrong.

Returns

A datetime for the top of the hour.
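Under the hourly assumption, the guess amounts to truncating the current UTC time to the hour. The sketch below assumes naive UTC datetimes like those in this document's examples. The function name and the optional now argument, which makes it easy to test, are hypothetical.

```python
import datetime


def top_of_the_hour(now=None):
    """Truncate a (naive, UTC) time to the start of its hour."""
    if now is None:
        # Current UTC time with the tzinfo stripped, matching naive examples.
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    return now.replace(minute=0, second=0, microsecond=0)


print(top_of_the_hour(datetime.datetime(2018, 11, 19, 15, 42, 7)))  # 2018-11-19 15:00:00
```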