Directory Documents

The bushel.directory.document module provides base classes and utility methods for handling documents that implement the Tor directory protocol version 3 meta format (§1.2 [dir-spec]).

For specific document types, see:

class bushel.directory.document.DirectoryCertificate(raw_content)

A Tor Ed25519 certificate as specified by [cert-spec]. This is not the only certificate format that Tor uses. These certificates are typically found as the data contained within DirectoryDocumentObject instances.

digraph g {
    rankdir=LR;

    certificate [label="Certificate",shape="box",style="filled",fillcolor="yellow"];
    extension [label="Extension",shape="box"];

    certificate->extension [label="has zero or more"];
}
Parameters

raw_content (bytes) – raw certificate contents

Variables
  • data (bytes) – raw certificate contents

  • version (int) – version of the certificate format (currently always 1)

  • cert_type (int) – type of certificate

  • expiration_date (datetime) – expiration date of certificate

  • cert_key_type (int) – type of certified key

  • certified_key (bytes) – an Ed25519 public key if cert_key_type is 1, or a SHA256 hash of some other key type depending on the value of cert_key_type

  • n_extensions (int) – declared number of extensions

  • extensions (list(DirectoryCertificateExtension)) – parsed extensions

  • signature (bytes) – certificate signature

is_valid()

Checks that the certificate is valid. This is the counterpart to verify(): where verify() checks the signature, this method checks that the certificate data conforms to the specification. Two checks are performed:

  • the expiration date has not passed

  • there are no extensions that affect validation that we do not understand

Note

In the Tor Metrics use case, we need to check that certificates were valid at the time they were expected to be valid, but the current API does not support this.
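
The two checks, together with the point-in-time variant mentioned in the note, can be sketched as follows. This is an illustration only, not part of the bushel API: the function name is_valid_at, its at parameter, and the (type, flags) tuples are assumptions made for this example. The AFFECTS_VALIDATION flag value and the type 4 extension are taken from [cert-spec].

```python
from datetime import datetime, timezone

AFFECTS_VALIDATION = 0x01        # ExtFlags bit defined in [cert-spec]
KNOWN_EXTENSION_TYPES = {0x04}   # signed-with-ed25519-key


def is_valid_at(expiration_date, extensions, at=None):
    """Hypothetical helper: check validity at a chosen point in time.

    extensions is an iterable of (type, flags) tuples.
    """
    if at is None:
        at = datetime.now(timezone.utc)
    # Check 1: the expiration date has not passed.
    if expiration_date <= at:
        return False
    # Check 2: no unknown extension claims to affect validation.
    for ext_type, flags in extensions:
        if flags & AFFECTS_VALIDATION and ext_type not in KNOWN_EXTENSION_TYPES:
            return False
    return True
```

Passing an explicit at value is what a Tor Metrics style check would need, since archived certificates should be judged against the time at which they were published rather than the current time.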

parse()

Parses the certificate to make the fields available via instance attributes. This does not validate or verify the certificate, but must be called before making calls to is_valid() or verify().
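
To show how the instance attributes map onto the binary layout in [cert-spec], here is a minimal standard-library sketch. It is not bushel's implementation: the function name parse_certificate and the dictionary it returns are assumptions made for this example, and only basic length checks are performed.

```python
import struct
from datetime import datetime, timedelta, timezone

# version, cert_type, expiration (hours since epoch), cert_key_type,
# certified_key (32 bytes), n_extensions: 40 bytes in total.
HEADER = struct.Struct(">BBLB32sB")
# ExtLength (2 bytes), ExtType, ExtFlags
EXT_HEADER = struct.Struct(">HBB")
SIGNATURE_LEN = 64
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_certificate(data: bytes) -> dict:
    """Hypothetical parser for the [cert-spec] layout (no validation)."""
    if len(data) < HEADER.size + SIGNATURE_LEN:
        raise ValueError("certificate too short")
    version, cert_type, hours, key_type, key, n_ext = HEADER.unpack_from(data)
    offset = HEADER.size
    end = len(data) - SIGNATURE_LEN  # the signature is the last 64 bytes
    extensions = []
    for _ in range(n_ext):
        if offset + EXT_HEADER.size > end:
            raise ValueError("truncated extension header")
        length, ext_type, flags = EXT_HEADER.unpack_from(data, offset)
        offset += EXT_HEADER.size
        if offset + length > end:
            raise ValueError("truncated extension data")
        extensions.append(
            {"type": ext_type, "flags": flags, "data": data[offset:offset + length]}
        )
        offset += length
    if offset != end:
        raise ValueError("unexpected bytes before signature")
    return {
        "version": version,
        "cert_type": cert_type,
        "expiration_date": EPOCH + timedelta(hours=hours),
        "cert_key_type": key_type,
        "certified_key": key,
        "n_extensions": n_ext,
        "extensions": extensions,
        "signature": data[end:],
    }
```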

verify(verify_key_data=None)

Verifies the certificate’s signature using the verification key. Key material may be provided explicitly; otherwise the key found in the “signed-with-ed25519-key” (type 4) extension is used.

This only verifies the signature. To validate the certificate data, the separate DirectoryCertificate.is_valid() method must be used.

Warning

This verifies the raw data that the object was initialized with. The parsed fields may have been modified since parsing, and the parser may also have unknown bugs.

Parameters

verify_key_data (bytes) – an Ed25519 verification key
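
The key selection described above can be sketched as follows. This is an illustration rather than bushel's code: the helper name verification_inputs and the (type, data) tuples are assumptions. Per [cert-spec], the signature is the final 64 bytes and covers everything that precedes it.

```python
SIGNATURE_LEN = 64
SIGNED_WITH_ED25519_KEY = 0x04  # extension type from [cert-spec]


def verification_inputs(data, extensions, verify_key_data=None):
    """Hypothetical helper returning (key, signed message, signature).

    extensions is an iterable of (type, data) tuples.
    """
    if verify_key_data is None:
        # Fall back to the key embedded in the type 4 extension.
        for ext_type, ext_data in extensions:
            if ext_type == SIGNED_WITH_ED25519_KEY:
                verify_key_data = ext_data
                break
        else:
            raise ValueError("no verification key available")
    if len(verify_key_data) != 32:
        raise ValueError("Ed25519 keys are 32 bytes long")
    # Split the raw certificate into the signed portion and the signature.
    return verify_key_data, data[:-SIGNATURE_LEN], data[-SIGNATURE_LEN:]
```

The three values returned would then be handed to an Ed25519 implementation, for example PyNaCl's VerifyKey(key).verify(message, signature). The sketch stops short of that step so that it needs no third-party packages.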

class bushel.directory.document.DirectoryCertificateExtension

A Tor Ed25519 certificate extension as specified by [cert-spec].

digraph g {
    rankdir=LR;

    certificate [label="Certificate",shape="box"];
    extension [label="Extension",shape="box",style="filled",fillcolor="yellow"];

    certificate->extension [label="has zero or more"];
}
Variables
  • type (int) – extension type

  • flags (int) – extension flags

  • data (bytes) – extension data

See also

Extensions are found in DirectoryCertificate instances.

class bushel.directory.document.DirectoryDocument(raw_content)

A directory document as described in the Tor directory protocol meta format (§1.2 [dir-spec]).

digraph g {
    rankdir=LR;

    document [label="Document",shape="box",style="filled",fillcolor="yellow"];
    item [label="Item",shape="box"];
    object [label="Object",shape="box"];

    document->item [label="has one or more"];
    item->object [label="has zero or more"];
}
Parameters

raw_content (bytes) – raw document contents

tokenize()

Tokenizes the document using the following tokens:

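=========  =======================================  ========
Kind       Matches on                               Value
=========  =======================================  ========
END        "-----END " Keyword "-----"              Keyword
BEGIN      "-----BEGIN " Keyword "-----"            Keyword
NL         The ASCII LF character (hex value 0x0a)  Raw data
PRINTABLE  Printing, non-whitespace, UTF-8          Raw data
WS         Space or tab                             Raw data
MISMATCH   Anything else (likely binary nonsense)   Raw data
=========  =======================================  ========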

Note that these tokens do not correspond exactly to the non-terminals specified in the Tor directory protocol meta format. In particular, the PRINTABLE token is used for keywords, arguments, and object data alike. It is up to the consumer of these tokens to decide whether a given PRINTABLE token is a valid keyword or argument.

>>> document_bytes = b'''super-keyword 3
... onion-magic
... -----BEGIN ONION MAGIC-----
... AQQABp6MAT7yJjlcuWLDbr8A5J8YgyDh5SPYkLpj7fmcBaFbKekjAQAgBADKnR/C
... -----END ONION MAGIC-----
... '''
>>> for token in DirectoryDocument(document_bytes).tokenize():
...     print(token) # doctest: +ELLIPSIS
DirectoryDocumentToken(kind='PRINTABLE', value='super-keyword', line=1, column=0)
DirectoryDocumentToken(kind='WS', value=' ', line=1, column=13)
DirectoryDocumentToken(kind='PRINTABLE', value='3', line=1, column=14)
DirectoryDocumentToken(kind='NL', value='\n', line=1, column=15)
DirectoryDocumentToken(kind='PRINTABLE', value='onion-magic', line=2, column=0)
DirectoryDocumentToken(kind='NL', value='\n', line=2, column=11)
DirectoryDocumentToken(kind='BEGIN', value='ONION MAGIC', line=3, column=0)
DirectoryDocumentToken(kind='PRINTABLE', value='AQQ...DKnR/C', line=4, column=0)
DirectoryDocumentToken(kind='NL', value='\n', line=4, column=64)
DirectoryDocumentToken(kind='END', value='ONION MAGIC', line=5, column=0)
DirectoryDocumentToken(kind='EOF', value=None, line=6, column=0)
Returns

iterator for DirectoryDocumentToken
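
A tokenizer of this kind can be built from one combined regular expression with a named group per token kind. The following is a sketch under stated assumptions, not bushel's implementation: it works on str rather than bytes, PRINTABLE is restricted to printable ASCII instead of UTF-8, and the BEGIN and END patterns also consume their line's newline so that the token stream has the same shape as the example output above. The names Token, TOKEN_SPEC, and tokenize are hypothetical.

```python
import re
from typing import Iterator, NamedTuple, Optional


class Token(NamedTuple):
    kind: str
    value: Optional[str]
    line: int
    column: int


# Keyword (" " Keyword)* as it appears on BEGIN and END lines.
_KEYWORDS = r"[A-Za-z0-9-]+(?: [A-Za-z0-9-]+)*"

# Order matters: END and BEGIN must be tried before PRINTABLE.
TOKEN_SPEC = [
    ("END", r"-----END " + _KEYWORDS + r"-----\n?"),
    ("BEGIN", r"-----BEGIN " + _KEYWORDS + r"-----\n?"),
    ("NL", r"\n"),
    ("PRINTABLE", r"[!-~]+"),  # simplification: printable ASCII only
    ("WS", r"[ \t]+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Token]:
    line, line_start = 1, 0
    for match in TOKEN_RE.finditer(text):
        kind, raw = match.lastgroup, match.group()
        value = raw
        if kind in ("BEGIN", "END"):
            # Keep only the keyword(s) between the dashes.
            value = raw.rstrip("\n")[len(f"-----{kind} "):-len("-----")]
        yield Token(kind, value, line, match.start() - line_start)
        if raw.endswith("\n"):
            line += 1
            line_start = match.end()
    yield Token("EOF", None, line, len(text) - line_start)
```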

class bushel.directory.document.DirectoryDocumentItem(keyword, arguments, objects, errors)

A directory document item as described in the Tor directory protocol meta format (§1.2 [dir-spec]).

digraph g {
    rankdir=LR;

    document [label="Document",shape="box"];
    item [label="Item",style="filled",fillcolor="yellow",shape="box"];
    object [label="Object",shape="box"];

    document->item [label="has one or more"];
    item->object [label="has zero or more"];
}
Parameters
Variables
class bushel.directory.document.DirectoryDocumentItemError

Enumeration of forgivable errors that may be encountered during itemization of a directory document.

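===================  =======================================================================
Name                 Description
===================  =======================================================================
TRAILING_WHITESPACE  Trailing whitespace on KeywordLines (https://bugs.torproject.org/30105)
===================  =======================================================================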

class bushel.directory.document.DirectoryDocumentItemizer(allowed_errors=None)

Parses DirectoryDocumentToken instances into DirectoryDocumentItem instances. By default this is a strict implementation of the Tor directory protocol meta format (§1.2 [dir-spec]), but it can be relaxed to tolerate bugs in known Tor implementations.

Items are produced by processing tokens according to a state machine:

digraph g {
    start [label="START"];
    keyword_line [label="KEYWORD-LINE"];
    keyword_line_ws [label="KEYWORD-LINE-WS"];
    keyword_line_end [label="KEYWORD-LINE-END"];
    object_data [label="OBJECT-DATA"];
    object_data_eol [label="OBJECT-DATA-EOL"];

    start -> keyword_line [label="PRINTABLE"];
    keyword_line -> keyword_line_end [label="NL"];
    keyword_line -> keyword_line_ws [label="WS"];
    keyword_line_ws -> keyword_line [label="PRINTABLE"];
    keyword_line_ws -> keyword_line_end [label="NL", color="red"];
    keyword_line_end -> object_data [label="BEGIN"];
    keyword_line_end -> start [label="EOF"];
    keyword_line_end -> keyword_line [label="PRINTABLE"];
    object_data -> object_data_eol [label="PRINTABLE"];
    object_data_eol -> object_data [label="NL"];
    object_data -> keyword_line_end [label="END"];
}

State transitions shown in red are protocol violations and would ideally not be needed, but some implementations of the protocol produce documents that require them, so the itemizer has to be bug-compatible.

Warning

All printable strings are currently treated equally: keywords are not checked against the restricted keyword character set, and object data is not yet decoded.

Parameters

allowed_errors (list(DirectoryDocumentItemError)) – A list of errors that will be considered non-fatal during itemization.
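
The state machine above can be written down almost literally. The sketch below is an illustration under assumptions, not the library's code: it consumes plain (kind, value) tuples shaped like the tokenizer output, uses a boolean allow_trailing_whitespace in place of allowed_errors, and additionally rejects an END keyword that does not match its BEGIN. The names Item and itemize are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional, Tuple


class Item(NamedTuple):
    keyword: str
    arguments: List[str]
    objects: List[Tuple[str, List[str]]]  # (object keyword, data lines)
    errors: List[str]


def itemize(tokens: Iterable[Tuple[str, Optional[str]]],
            allow_trailing_whitespace: bool = False) -> Iterator[Item]:
    state, item, obj = "START", None, None
    for kind, value in tokens:
        if state in ("START", "KEYWORD-LINE-END") and kind == "PRINTABLE":
            if item is not None:
                yield item  # the previous item is complete
            item = Item(value, [], [], [])
            state = "KEYWORD-LINE"
        elif state == "KEYWORD-LINE" and kind == "NL":
            state = "KEYWORD-LINE-END"
        elif state == "KEYWORD-LINE" and kind == "WS":
            state = "KEYWORD-LINE-WS"
        elif state == "KEYWORD-LINE-WS" and kind == "PRINTABLE":
            item.arguments.append(value)
            state = "KEYWORD-LINE"
        elif state == "KEYWORD-LINE-WS" and kind == "NL":
            # The red transition: only taken when explicitly tolerated.
            if not allow_trailing_whitespace:
                raise ValueError("trailing whitespace on keyword line")
            item.errors.append("TRAILING_WHITESPACE")
            state = "KEYWORD-LINE-END"
        elif state == "KEYWORD-LINE-END" and kind == "BEGIN":
            obj = (value, [])
            state = "OBJECT-DATA"
        elif state == "KEYWORD-LINE-END" and kind == "EOF":
            yield item
            item = None
            state = "START"
        elif state == "OBJECT-DATA" and kind == "PRINTABLE":
            obj[1].append(value)
            state = "OBJECT-DATA-EOL"
        elif state == "OBJECT-DATA-EOL" and kind == "NL":
            state = "OBJECT-DATA"
        elif state == "OBJECT-DATA" and kind == "END":
            if value != obj[0]:  # extra check, an assumption of this sketch
                raise ValueError("END keyword does not match BEGIN")
            item.objects.append(obj)
            obj = None
            state = "KEYWORD-LINE-END"
        else:
            raise ValueError(f"unexpected {kind} token in state {state}")
```

Any token that has no outgoing edge from the current state in the diagram ends up in the final else branch, which is what makes the default behaviour strict.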

class bushel.directory.document.DirectoryDocumentObject

A directory document object as described in the Tor directory protocol meta format (§1.2 [dir-spec]).

digraph g {
    rankdir=LR;

    document [label="Document",shape="box"];
    item [label="Item",shape="box"];
    object [label="Object",shape="box",style="filled",fillcolor="yellow"];

    document->item [label="has one or more"];
    item->object [label="has zero or more"];
}
Variables
class bushel.directory.document.DirectoryDocumentToken
Variables
  • kind (str) – the kind of token

  • value (bytes) – kind-dependent value

  • line (int) – line number

  • column (int) – column number

bushel.directory.document.decode_object_data(lines)

Decodes the base64 encoded data found within directory document objects.

Parameters

lines (list(str)) – the lines as found in a directory document object, not including newlines or the begin/end lines

Returns

the decoded data

Return type

bytes

bushel.directory.document.encode_object_data(data)

Encodes bytes using base64 and wraps the lines at 64 characters.

Parameters

data (bytes) – the data to be encoded

Returns

the line-wrapped base64 encoded data as a list of strings, one string per line

Return type

list(str)
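
The behaviour described for encode_object_data() and decode_object_data() can be approximated with the standard base64 module. The functions below are stand-ins written for this example, under the assumption that no validation beyond base64 decoding is needed. They are not the library's own code.

```python
import base64
from typing import List


def encode_lines(data: bytes, width: int = 64) -> List[str]:
    """Base64-encode data and wrap the result at width characters."""
    encoded = base64.b64encode(data).decode("ascii")
    return [encoded[i:i + width] for i in range(0, len(encoded), width)]


def decode_lines(lines: List[str]) -> bytes:
    """Join the lines (no newlines, no BEGIN/END lines) and decode them."""
    return base64.b64decode("".join(lines))


payload = bytes(range(100))
wrapped = encode_lines(payload)
# Encoding followed by decoding returns the original bytes.
assert decode_lines(wrapped) == payload
```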

bushel.directory.document.parse_timestamp(item, argindex=0)

Parses a timestamp from a directory document’s item using the common format from [dir-spec]. This format is not defined explicitly but is used with many keywords including valid-after, fresh-until, and valid-until.

Note

Due to the way the tokenizer works, timestamps are parsed as two arguments split by whitespace. This function takes this into account when parsing the timestamp.

Most items will have the timestamp as the first argument on the keyword line. At the time of writing, there are no keywords defined that expect timestamps at other indexes. Should this be required though, argindex may be used to parse a timestamp from a later argument.

Parameters
  • item (DirectoryDocumentItem) – the directory document item

  • argindex (int) – zero-based index of the date portion of the timestamp; the time portion is expected at argindex+1

Returns

the parsed timestamp

Return type

datetime
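
A minimal sketch of the two-argument handling, assuming the usual YYYY-MM-DD HH:MM:SS layout used in [dir-spec]. The helper below takes a plain list of argument strings rather than a DirectoryDocumentItem, and its name is hypothetical. It returns a naive datetime; whether the library attaches a timezone is not stated here.

```python
from datetime import datetime
from typing import List


def parse_timestamp_args(arguments: List[str], argindex: int = 0) -> datetime:
    """Join the date and time arguments and parse them as one timestamp."""
    if len(arguments) < argindex + 2:
        raise ValueError("timestamp needs a date and a time argument")
    text = f"{arguments[argindex]} {arguments[argindex + 1]}"
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# For a line such as "valid-after 2019-06-01 12:00:00" the tokenizer
# yields the date and the time as two separate arguments.
valid_after = parse_timestamp_args(["2019-06-01", "12:00:00"])
```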