author David Bremner <david@tethera.net> 2022-06-23 09:30:44 -0300
committer David Bremner <david@tethera.net> 2022-07-07 06:56:05 -0300
commit 6219e7380ae34cc0c8142f4174bee3cde9bf9662
tree 633a3b5ac38b66c8d33f171320c9b2c52d28fadc /doc
parent b07e121923a4ca00d0ec68ba9eebe8dafb70e13a
CL/git: add format version 1
The original nmbug format (now called version 0) creates one subdirectory of 'tags/' per message. This causes problems for repositories with more than (roughly) 100k messages. Version 1 introduces two layers of hashed directories. This scheme was chosen to balance the number of subdirectories in each directory against the number of extra directories (and git objects) created via hashing. This should be upward compatible in the sense that old repositories will continue to work with the updated notmuch-git.
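The two layouts can be illustrated with a minimal Python sketch. It is not the notmuch-git implementation: the helper names `encode` and `tag_path` are hypothetical, and several details are assumptions that the documentation in this patch does not pin down, namely that the hash is taken over the UTF-8 bytes of the raw message-id, that the two directory levels are the first and second bytes of the default-length `blake2b` hex digest, and that escaped octets use lowercase hex digits.

```python
import hashlib
import string

# Octets kept as-is: ASCII alphanumerics plus the characters +-_@=.,:
SAFE = frozenset(
    (string.ascii_letters + string.digits + "+-_@=.,:").encode("ascii")
)


def encode(text: str) -> str:
    """Escape every octet outside SAFE as '%' plus two hex digits.

    Lowercase hex digits are an assumption of this sketch.
    """
    out = []
    for octet in text.encode("utf-8"):
        if octet in SAFE:
            out.append(chr(octet))
        else:
            out.append("%{:02x}".format(octet))
    return "".join(out)


def tag_path(message_id: str, tag: str, version: int = 1) -> str:
    """Return the path of the empty file recording `tag` on `message_id`."""
    eid, etag = encode(message_id), encode(tag)
    if version == 0:
        # Version 0: one directory per message directly under tags/
        return "tags/{}/{}".format(eid, etag)
    # Version 1: two hashed directory levels, one digest byte each.
    # Assumption: hash the raw id and use the first two digest bytes.
    digest = hashlib.blake2b(message_id.encode("utf-8")).hexdigest()
    return "tags/{}/{}/{}/{}".format(digest[0:2], digest[2:4], eid, etag)


if __name__ == "__main__":
    print(tag_path("some-id@example.com", "inbox", version=0))
    print(tag_path("some-id@example.com", "inbox", version=1))
```

With one byte per level, each hashed directory has at most 256 children, so 100k messages spread over up to 65536 leaf directories instead of sitting in a single 'tags/' directory.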
Diffstat (limited to 'doc')
-rw-r--r-- doc/man1/notmuch-git.rst 40
1 files changed, 36 insertions, 4 deletions
diff --git a/doc/man1/notmuch-git.rst b/doc/man1/notmuch-git.rst
index fa7a748e..59d02fb4 100644
--- a/doc/man1/notmuch-git.rst
+++ b/doc/man1/notmuch-git.rst
@@ -235,14 +235,46 @@ REPOSITORY CONTENTS
===================
The tags are stored in the git repo (and exported) as a set of empty
-files. For a message with Message-Id *id*, for each tag *tag*, there
+files. These empty files are contained within a directory named after
+the message-id.
+
+In what follows `encode()` represents a POSIX filesystem safe
+encoding. The encoding preserves alphanumerics, and the characters
+`+-_@=.,:`. All other octets are replaced with `%` followed by a two
+digit hex number.
+
+Currently :any:`notmuch-git` can read any format version, but can only
+create (via :any:`init`) :ref:`version 1 <format_version_1>` repositories.
+
+.. _format_version_0:
+
+Version 0
+---------
+
+This is the legacy format created by the `nmbug` tool prior to release
+0.37. For a message with Message-Id *id*, for each tag *tag*, there
is an empty file with path
tags/ `encode` (*id*) / `encode` (*tag*)
-The encoding preserves alphanumerics, and the characters `+-_@=.,:`.
-All other octets are replaced with `%` followed by a two digit hex
-number.
+.. _format_version_1:
+
+Version 1
+---------
+
+In format version 1 and later, the format version is contained in a
+top level file called FORMAT.
+
+For a message with Message-Id *id*, for each tag *tag*, there
+is an empty file with path
+
+ tags/ `hash1` (*id*) / `hash2` (*id*) `encode` (*id*) / `encode` (*tag*)
+
+The hash functions each represent one byte of the `blake2b` hex
+digest.
+
+Compared to :ref:`version 0 <format_version_0>`, this reduces the
+number of subdirectories within each directory.
.. _repo_location: