<feed xmlns='http://www.w3.org/2005/Atom'>
<title>notmuch/lib, branch debian/bookworm-backports</title>
<subtitle>thread-based email index, search, and tagging</subtitle>
<id>https://git.notmuchmail.org/git/notmuch/atom?h=debian%2Fbookworm-backports</id>
<link rel='self' href='https://git.notmuchmail.org/git/notmuch/atom?h=debian%2Fbookworm-backports'/>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/'/>
<updated>2023-09-23T11:34:48Z</updated>
<entry>
<title>Pass error message from GLib ini parser to CLI</title>
<updated>2023-09-23T11:34:48Z</updated>
<author>
<name>David Bremner</name>
<email>david@tethera.net</email>
</author>
<published>2023-09-15T12:50:04Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=1c10d91d8e4a3e5bc76ca4c6b9939f3759e6ef5e'/>
<id>urn:sha1:1c10d91d8e4a3e5bc76ca4c6b9939f3759e6ef5e</id>
<content type='text'>
The function _notmuch_config_load_from_file is only called in two
places in open.cc. Update internal API to match the idiom in open.cc.
Adding a newline is needed for consistency with other status strings.

Based in part on a patch [1] from Eric Blake.

[1]: id:20230906153402.101471-1-eblake@redhat.com
</content>
</entry>
<entry>
<title>lib/n_d_remove_message: do not remove unique filename</title>
<updated>2023-07-22T10:15:59Z</updated>
<author>
<name>David Bremner</name>
<email>david@tethera.net</email>
</author>
<published>2023-07-20T12:08:01Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=b6f144abe1f5aa3519240cf52f4cb9907fefcd0e'/>
<id>urn:sha1:b6f144abe1f5aa3519240cf52f4cb9907fefcd0e</id>
<content type='text'>
It is wasteful to remove a filename term when the whole message
document is about to be removed from the database. Profiling with perf
shows this takes a significant portion of the time when cleaning up
removed files in the database.

The logic of n_d_remove_message becomes a bit more convoluted here in
order to make the change minimal.

It is possible that this function can be further optimized, since the
expansion of filename terms into filenames is probably not needed
here.
</content>
</entry>
<entry>
<title>lib/message: check message type before deleting document</title>
<updated>2023-07-22T10:11:46Z</updated>
<author>
<name>David Bremner</name>
<email>david@tethera.net</email>
</author>
<published>2023-07-20T12:08:00Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=d93d49b6aed5b3f71651ffe79225da08c7d8f1aa'/>
<id>urn:sha1:d93d49b6aed5b3f71651ffe79225da08c7d8f1aa</id>
<content type='text'>
It isn't really clear how this worked before. Traversing the terms of
a document after deleting it from the database seems likely to be
undefined behaviour at best.
</content>
</entry>
<entry>
<title>doc/lib: clarify ownership for notmuch_database_get_revision</title>
<updated>2023-07-09T15:08:28Z</updated>
<author>
<name>David Bremner</name>
<email>david@tethera.net</email>
</author>
<published>2023-05-29T11:01:40Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=a62b8a95c0e890f11cb39cc2aaea3a4893c8268d'/>
<id>urn:sha1:a62b8a95c0e890f11cb39cc2aaea3a4893c8268d</id>
<content type='text'>
The ownership is implicit in the const declaration (I think!), but
that does not show up in the doxygen generated API docs.
</content>
</entry>
<entry>
<title>lib: index attachments with mime types matching index.as_text</title>
<updated>2023-04-02T22:24:43Z</updated>
<author>
<name>David Bremner</name>
<email>david@tethera.net</email>
</author>
<published>2023-01-06T00:02:06Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=a554690d6af0ac8cb55166a20efd0f449abde389'/>
<id>urn:sha1:a554690d6af0ac8cb55166a20efd0f449abde389</id>
<content type='text'>
Instead of skipping the indexing of all attachments, we check for a
(user-configured) MIME type that is indexable as text.
</content>
</entry>
<entry>
<title>lib: parse index.as_text</title>
<updated>2023-04-02T22:22:36Z</updated>
<author>
<name>David Bremner</name>
<email>david@tethera.net</email>
</author>
<published>2023-01-06T00:02:05Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=3f5809bf28becbddfed9ff33d6f1242346904c23'/>
<id>urn:sha1:3f5809bf28becbddfed9ff33d6f1242346904c23</id>
<content type='text'>
We pre-parse into a list of compiled regular expressions to avoid
calling regcomp on the hot (indexing) path.  As explained in the code
comment, this cannot be done lazily with reasonable error reporting,
at least not without touching a lot of the code in index.cc.
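
<![CDATA[
The pre-parsing idea above can be sketched as follows. This is only an
illustration, not the code in index.cc: CompiledPatterns,
compile_patterns and matches_any are hypothetical names, and the only
real API assumed is POSIX regcomp/regexec/regerror/regfree. Every
pattern is compiled once, up front, so a bad pattern is reported at
configuration time and the per-attachment check only calls regexec.

```cpp
#include <regex.h>

#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical holder for patterns that were compiled once, up front.
struct CompiledPatterns {
    std::vector<std::unique_ptr<regex_t>> regexps;

    CompiledPatterns () = default;
    CompiledPatterns (const CompiledPatterns &) = delete;
    CompiledPatterns &operator= (const CompiledPatterns &) = delete;

    ~CompiledPatterns ()
    {
        // Only successfully compiled patterns are stored, so each one
        // must be released with regfree.
        for (auto &re : regexps)
            regfree (re.get ());
    }
};

// Compile every pattern eagerly. On failure, return false and describe
// the offending pattern in `error`; nothing is deferred to match time.
static bool
compile_patterns (const std::vector<std::string> &patterns,
                  CompiledPatterns &out, std::string &error)
{
    for (const std::string &pattern : patterns) {
        auto re = std::make_unique<regex_t> ();
        int rc = regcomp (re.get (), pattern.c_str (),
                          REG_EXTENDED | REG_NOSUB);
        if (rc != 0) {
            char buf[256];
            regerror (rc, re.get (), buf, sizeof (buf));
            error = "cannot compile '" + pattern + "': " + buf;
            return false;
        }
        out.regexps.push_back (std::move (re));
    }
    return true;
}

// Hot path: no compilation here, only regexec on precompiled patterns.
static bool
matches_any (const CompiledPatterns &compiled, const std::string &mime_type)
{
    for (const auto &re : compiled.regexps) {
        assert (re != nullptr);
        if (regexec (re.get (), mime_type.c_str (), 0, nullptr, 0) == 0)
            return true;
    }
    return false;
}
```

A caller would run compile_patterns once when the configuration is
loaded and then call matches_any for each attachment's MIME type.
]]>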
</content>
</entry>
<entry>
<title>lib: add config key INDEX_AS_TEXT</title>
<updated>2023-04-02T22:21:37Z</updated>
<author>
<name>David Bremner</name>
<email>david@tethera.net</email>
</author>
<published>2023-01-06T00:02:04Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=c6733a45c8ff698505ff330d2edce92c90cbc946'/>
<id>urn:sha1:c6733a45c8ff698505ff330d2edce92c90cbc946</id>
<content type='text'>
Higher level processing as a list of regular expressions and
documentation will follow.
</content>
</entry>
<entry>
<title>lib: replace some uses of Query::MatchAll with a thread-safe alternative</title>
<updated>2023-03-31T11:11:39Z</updated>
<author>
<name>Kevin Boulain</name>
<email>kevin@boula.in</email>
</author>
<published>2023-03-02T17:59:15Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=6273966d0b50541a37a652ccf6113f184eff5300'/>
<id>urn:sha1:6273966d0b50541a37a652ccf6113f184eff5300</id>
<content type='text'>
This replaces two instances of Xapian::Query::MatchAll with the
equivalent but thread-safe alternative Xapian::Query(std::string()).
Xapian::Query::MatchAll maintains an internal pointer to a refcounted
Xapian::Internal::QueryTerm.

None of this is thread-safe but that wouldn't be an issue if
Xapian::Query::MatchAll wasn't static. Because it's static, the
refcounting goes awry when Notmuch is called from multiple threads.
This is actually documented by Xapian:
https://github.com/xapian/xapian/blob/4715de3a9fcee741587439dc3cc1d2ff01ffeaf2/xapian-core/include/xapian/query.h#L65

While static, Xapian::Query::MatchNothing is safe because it doesn't
maintain an internal object and as such, doesn't use references.
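
<![CDATA[
As a rough illustration of the problem, here is a toy model, not
Xapian's implementation: Node and Query below are hypothetical
stand-ins. Every copy of the shared static object updates the same
plain (non-atomic) counter, which is what races when copies are made
from several threads, whereas a freshly constructed query owns a
private node.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a query's internal node with a plain, non-atomic
// reference count (an assumption made for illustration only).
struct Node {
    std::string term;
    unsigned refs;
};

class Query {
    Node *node_;

public:
    // Each explicitly constructed query gets its own node.
    explicit Query (const std::string &term)
        : node_ (new Node{ term, 1 }) {}

    // Copies share the node and bump its counter without any
    // synchronization.
    Query (const Query &other) : node_ (other.node_) { ++node_->refs; }

    Query &operator= (const Query &) = delete;

    ~Query ()
    {
        if (--node_->refs == 0)
            delete node_;
    }

    const Node *node () const { return node_; }

    // One process-wide instance: all copies touch the same counter.
    static const Query MatchAll;
};

const Query Query::MatchAll{ std::string () };

// The alternative: build a fresh, equivalent query for each use so
// that no counter is shared between callers.
static Query
make_match_all ()
{
    return Query (std::string ());
}
```

In this model, copying Query::MatchAll from two threads at once would
update refs concurrently, while two calls to make_match_all never
touch the same node.
]]>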

Two best-effort tests making use of TSan were added to showcase the
issue (I couldn't figure out a way to deterministically reproduce it
without making an unmaintainable mess).

First, when two databases are created in parallel, a query that uses
Xapian::Query::MatchAll is made (lib/query.cc), resulting in the
following backtrace on a segfault:
  #0  0x00007ffff76822af in Xapian::Query::get_terms_begin (this=0x7fffe80137f0) at api/query.cc:141
  #1  0x00007ffff7f933f5 in _notmuch_query_cache_terms (query=0x7fffe80137c0) at lib/query.cc:176
  #2  0x00007ffff7f93784 in _notmuch_query_ensure_parsed_xapian (query=0x7fffe80137c0) at lib/query.cc:225
  #3  0x00007ffff7f9381a in _notmuch_query_ensure_parsed (query=0x7fffe80137c0) at lib/query.cc:260
  #4  0x00007ffff7f93bfe in _notmuch_query_search_documents (query=0x7fffe80137c0, type=0x7ffff7fa9b1e "mail", out=0x7ffff666da18) at lib/query.cc:361
  #5  0x00007ffff7f93ba4 in notmuch_query_search_messages (query=0x7fffe80137c0, out=0x7ffff666da18) at lib/query.cc:349
  #6  0x00007ffff7f83d98 in notmuch_database_upgrade (notmuch=0x7fffe8000bd0, progress_notify=0x0, closure=0x0) at lib/database.cc:934
  #7  0x00007ffff7fa110f in notmuch_database_create_with_config (database_path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", config_path=0x7ffff7faab3c "", profile=0x0, database=0x0, status_string=0x7ffff666dc90) at lib/open.cc:754
  #8  0x00007ffff7fa0d6f in notmuch_database_create_verbose (path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", database=0x0, status_string=0x7ffff666dc90) at lib/open.cc:653
  #9  0x00007ffff7fa0ceb in notmuch_database_create (path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", database=0x0) at lib/open.cc:637
  ...

Second, some queries would make use of Xapian::Query::MatchAll
(lib/regexp-fields.cc), resulting in the following backtrace on a
segfault:
  #0  0x00007f629828b690 in Xapian::Internal::QueryBranch::gather_terms (this=0x7f628800def0, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245
  #1  0x00007f629828c260 in Xapian::Internal::QueryScaleWeight::gather_terms (this=0x7f628800df70, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1434
  #2  0x00007f629828b69f in Xapian::Internal::QueryBranch::gather_terms (this=0x7f628800dd90, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245
  #3  0x00007f6298282571 in Xapian::Query::get_unique_terms_begin (this=0x7f628800dcd8) at api/query.cc:166
  #4  0x00007f629841a59b in Xapian::Weight::Internal::accumulate_stats (this=0x7f628800dca0, subdb=..., rset=...) at weight/weightinternal.cc:86
  #5  0x00007f62983c15ba in LocalSubMatch::prepare_match (this=0x7f628800df20, nowait=true, total_stats=...) at matcher/localsubmatch.cc:172
  #6  0x00007f62983c8fcc in prepare_sub_matches (leaves=std::vector of length 1, capacity 1 = {...}, stats=...) at matcher/multimatch.cc:237
  #7  0x00007f62983c98a3 in MultiMatch::MultiMatch (this=0x7f629726d9a0, db_=..., query_=..., qlen=3, omrset=0x0, collapse_max_=0, collapse_key_=4294967295, percent_cutoff_=0, weight_cutoff_=0, order_=Xapian::Enquire::ASCENDING, sort_key_=0, sort_by_=Xapian::Enquire::Internal::VAL, sort_value_forward_=true, time_limit_=0, stats=..., weight_=0x7f6288008d50, matchspies_=std::vector of length 0, capacity 0, have_sorter=false, have_mdecider=false) at matcher/multimatch.cc:353
  #8  0x00007f629826fcba in Xapian::Enquire::Internal::get_mset (this=0x7f628800e0b0, first=0, maxitems=0, check_at_least=0, rset=0x0, mdecider=0x0) at api/omenquire.cc:569
  #9  0x00007f629827181c in Xapian::Enquire::get_mset (this=0x7f629726db80, first=0, maxitems=0, check_at_least=0, rset=0x0, mdecider=0x0) at api/omenquire.cc:937
  #10 0x00007f6298be529a in _notmuch_query_search_documents (query=0x7f6288009750, type=0x7f6298bfaafe "mail", out=0x7f629726dcc0) at lib/query.cc:447
  #11 0x00007f6298be4ae8 in notmuch_query_search_messages (query=0x7f6288009750, out=0x7f629726dcc0) at lib/query.cc:349
  ...

Printing Xapian::Query::MatchAll-&gt;internal.px-&gt;_refs in these
circumstances can help quickly identify this scenario.

This is motivated by some test frameworks (like Rust's Cargo) that
run unit tests in parallel and would easily encounter this issue,
unless client code gates every call to Notmuch behind a lock.

This is what can be expected from the tests when they fail:
   == stderr ==
  +==================
  +WARNING: ThreadSanitizer: data race (pid=207931)
  +  Read of size 1 at 0x7b10000001a0 by thread T2:
  +    #0 memcpy &lt;null&gt; (libtsan.so.2+0x62506)
  +    #1 void std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;::_M_construct&lt;char*&gt;(char*, char*, std::forward_iterator_tag) [clone .isra.0] &lt;null&gt; (libxapian.so.30+0x872b3)
  +
  +  Previous write of size 8 at 0x7b10000001a0 by thread T1:
  +    #0 operator new(unsigned long) &lt;null&gt; (libtsan.so.2+0x8ba83)
  +    #1 Xapian::Query::Query(std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, unsigned int, unsigned int) &lt;null&gt; (libxapian.so.30+0x855cd)
  ...
</content>
</entry>
<entry>
<title>lib/message-property: sync removed properties to the database</title>
<updated>2023-03-30T11:01:09Z</updated>
<author>
<name>Kevin Boulain</name>
<email>kevin@boula.in</email>
</author>
<published>2023-03-29T16:13:32Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=fb55ff28a2fdaa9c218af5ca10b1cae674869edd'/>
<id>urn:sha1:fb55ff28a2fdaa9c218af5ca10b1cae674869edd</id>
<content type='text'>
_notmuch_message_remove_all_properties wasn't syncing the message back
to the database but was still invalidating the metadata, giving the
impression the properties had actually been removed.

Also move the metadata invalidation to _notmuch_message_remove_terms
to be closer to what's done in _notmuch_message_modify_property and
_notmuch_message_remove_term.
</content>
</entry>
<entry>
<title>lib/message-property: catch xapian exceptions</title>
<updated>2023-03-30T10:08:47Z</updated>
<author>
<name>Kevin Boulain</name>
<email>kevin@boula.in</email>
</author>
<published>2023-03-29T16:19:58Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=568f6bc3c2fd2396c05d254e2649750fb82b00b6'/>
<id>urn:sha1:568f6bc3c2fd2396c05d254e2649750fb82b00b6</id>
<content type='text'>
Since libnotmuch exposes a C interface, there's no way for clients to
catch these exceptions.
Inspired by what's done for tags (see notmuch_message_remove_tag).
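
<![CDATA[
The general pattern can be sketched like this. It is a minimal
illustration under stated assumptions, not the libnotmuch code:
sketch_status_t, backend_remove_property and sketch_remove_property
are hypothetical names, and the backend is a stand-in that throws a
standard exception. The point is that the C-callable entry point
converts any C++ exception into a status code before returning.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

extern "C" {
// Hypothetical status codes standing in for a library's status enum.
typedef enum {
    SKETCH_STATUS_SUCCESS = 0,
    SKETCH_STATUS_BACKEND_EXCEPTION
} sketch_status_t;
}

// Stand-in for a C++ backend operation that may throw.
static void
backend_remove_property (const std::string &key)
{
    if (key.empty ())
        throw std::runtime_error ("empty property key");
}

// C-callable entry point: no exception may escape, because a C caller
// has no way to catch it.
extern "C" sketch_status_t
sketch_remove_property (const char *key)
{
    try {
        backend_remove_property (key ? key : "");
        return SKETCH_STATUS_SUCCESS;
    } catch (...) {
        // Catch everything, since the backend's exception types need
        // not derive from std::exception.
        return SKETCH_STATUS_BACKEND_EXCEPTION;
    }
}
```

A C client then only checks the returned status, exactly as it would
for any other call in a C API.
]]>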
</content>
</entry>
</feed>
