aboutsummaryrefslogtreecommitdiff
path: root/lib/regexp-fields.cc
AgeCommit message (Collapse)Author
2023-03-31lib: replace some uses of Query::MatchAll with a thread-safe alternativeKevin Boulain
This replaces two instances of Xapian::Query::MatchAll with the equivalent but thread-safe alternative Xapian::Query(std::string()). Xapian::Query::MatchAll maintains an internal pointer to a refcounted Xapian::Internal::QueryTerm. None of this is thread-safe but that wouldn't be an issue if Xapian::Query::MatchAll wasn't static. Because it's static, the refcounting goes awry when Notmuch is called from multiple threads. This is actually documented by Xapian: https://github.com/xapian/xapian/blob/4715de3a9fcee741587439dc3cc1d2ff01ffeaf2/xapian-core/include/xapian/query.h#L65 While static, Xapian::Query::MatchNothing is safe because it doesn't maintain an internal object and as such, doesn't use references. Two best-effort tests making use of TSan were added to showcase the issue (I couldn't figure out a way to deterministically reproduce it without making an unmaintainable mess). First, when two databases are created in parallel, a query that uses Xapian::Query::MatchAll is made (lib/query.cc), resulting in the following backtrace on a segfault: #0 0x00007ffff76822af in Xapian::Query::get_terms_begin (this=0x7fffe80137f0) at api/query.cc:141 #1 0x00007ffff7f933f5 in _notmuch_query_cache_terms (query=0x7fffe80137c0) at lib/query.cc:176 #2 0x00007ffff7f93784 in _notmuch_query_ensure_parsed_xapian (query=0x7fffe80137c0) at lib/query.cc:225 #3 0x00007ffff7f9381a in _notmuch_query_ensure_parsed (query=0x7fffe80137c0) at lib/query.cc:260 #4 0x00007ffff7f93bfe in _notmuch_query_search_documents (query=0x7fffe80137c0, type=0x7ffff7fa9b1e "mail", out=0x7ffff666da18) at lib/query.cc:361 #5 0x00007ffff7f93ba4 in notmuch_query_search_messages (query=0x7fffe80137c0, out=0x7ffff666da18) at lib/query.cc:349 #6 0x00007ffff7f83d98 in notmuch_database_upgrade (notmuch=0x7fffe8000bd0, progress_notify=0x0, closure=0x0) at lib/database.cc:934 #7 0x00007ffff7fa110f in notmuch_database_create_with_config (database_path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", config_path=0x7ffff7faab3c "", profile=0x0, database=0x0, status_string=0x7ffff666dc90) at lib/open.cc:754 #8 0x00007ffff7fa0d6f in notmuch_database_create_verbose (path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", database=0x0, status_string=0x7ffff666dc90) at lib/open.cc:653 #9 0x00007ffff7fa0ceb in notmuch_database_create (path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", database=0x0) at lib/open.cc:637 ... Second, some queries would make use of Xapian::Query::MatchAll (lib/regexp-fields.cc), resulting in the following backtrace on a segfault: #0 0x00007f629828b690 in Xapian::Internal::QueryBranch::gather_terms (this=0x7f628800def0, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245 #1 0x00007f629828c260 in Xapian::Internal::QueryScaleWeight::gather_terms (this=0x7f628800df70, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1434 #2 0x00007f629828b69f in Xapian::Internal::QueryBranch::gather_terms (this=0x7f628800dd90, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245 #3 0x00007f6298282571 in Xapian::Query::get_unique_terms_begin (this=0x7f628800dcd8) at api/query.cc:166 #4 0x00007f629841a59b in Xapian::Weight::Internal::accumulate_stats (this=0x7f628800dca0, subdb=..., rset=...) at weight/weightinternal.cc:86 #5 0x00007f62983c15ba in LocalSubMatch::prepare_match (this=0x7f628800df20, nowait=true, total_stats=...) at matcher/localsubmatch.cc:172 #6 0x00007f62983c8fcc in prepare_sub_matches (leaves=std::vector of length 1, capacity 1 = {...}, stats=...) at matcher/multimatch.cc:237 #7 0x00007f62983c98a3 in MultiMatch::MultiMatch (this=0x7f629726d9a0, db_=..., query_=..., qlen=3, omrset=0x0, collapse_max_=0, collapse_key_=4294967295, percent_cutoff_=0, weight_cutoff_=0, order_=Xapian::Enquire::ASCENDING, sort_key_=0, sort_by_=Xapian::Enquire::Internal::VAL, sort_value_forward_=true, time_limit_=0, stats=..., weight_=0x7f6288008d50, matchspies_=std::vector of length 0, capacity 0, have_sorter=false, have_mdecider=false) at matcher/multimatch.cc:353 #8 0x00007f629826fcba in Xapian::Enquire::Internal::get_mset (this=0x7f628800e0b0, first=0, maxitems=0, check_at_least=0, rset=0x0, mdecider=0x0) at api/omenquire.cc:569 #9 0x00007f629827181c in Xapian::Enquire::get_mset (this=0x7f629726db80, first=0, maxitems=0, check_at_least=0, rset=0x0, mdecider=0x0) at api/omenquire.cc:937 #10 0x00007f6298be529a in _notmuch_query_search_documents (query=0x7f6288009750, type=0x7f6298bfaafe "mail", out=0x7f629726dcc0) at lib/query.cc:447 #11 0x00007f6298be4ae8 in notmuch_query_search_messages (query=0x7f6288009750, out=0x7f629726dcc0) at lib/query.cc:349 ... Printing Xapian::Query::MatchAll->internal.px->_refs in these circumstances can help quickly identifying this scenario. This is motivated by some test frameworks (like Rust's Cargo) that runs unit tests in parallel and would easily encounter this issue, unless client code gates every call to Notmuch behind a lock. This is what can be expected from the tests when they fail: == stderr == +================== +WARNING: ThreadSanitizer: data race (pid=207931) + Read of size 1 at 0x7b10000001a0 by thread T2: + #0 memcpy <null> (libtsan.so.2+0x62506) + #1 void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.0] <null> (libxapian.so.30+0x872b3) + + Previous write of size 8 at 0x7b10000001a0 by thread T1: + #0 operator new(unsigned long) <null> (libtsan.so.2+0x8ba83) + #1 Xapian::Query::Query(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int) <null> (libxapian.so.30+0x855cd) ...
2022-03-19lib: do not phrase parse prefixed bracketed subexpressionsDavid Bremner
Since Xapian does not preserve quotes when passing the subquery to a field processor, we have to make a guess as to what the user intended. Here the added assumption is that a string surrounded by parens is not intended to be a phrase.
2022-01-27lib: drop trailing slash for path and folder searches (infix)David Bremner
This resolves an old bug reported by David Edmondson in 2014. The fix is only needed for the "boolean" case, as probabilistic / phrase searching already ignores punctuation. This fix is only for the infix (xapian provided) query parser. [1]: id:cunoasuolcv.fsf@gargravarr.hh.sledj.net
2021-09-04lib: factor out query construction from regexpDavid Bremner
This will allow re-use of this code outside of the Xapian query parser.
2020-04-23build: drop support for xapian versions less than 1.4Tomi Ollila
Xapian 1.4 is over 3 years old now (1.4.0 released 2016-06-24), and 1.2 has been deprecated in Notmuch version 0.27 (2018-06-13). Xapian 1.4 supports compaction, field processors and retry locking; conditionals checking compaction and field processors were removed but user may want to disable retry locking at configure time so it is kept.
2019-06-14lib: run uncrustifyuncrustify
This is the result of running $ uncrustify --replace --config ../devel/uncrustify.cfg *.c *.h *.cc in the lib directory
2019-03-28lib: use phrase search for anything not ending in '*'David Bremner
Anything that does not look like a wildcard should be safe to quote. This should fix the problem searching for xapian keywords.
2019-03-11Prepend regerror() messages with "regexp error: "Luis Ressel
The exact error messages returned by regerror() aren't standardized; relying on them isn't portable. Thus, add a a prefix to make clear that the subsequent message is a regexp parsing error, and only look for this prefix in the test suite, ignoring the rest of the message.
2017-05-09lib: Add regexp expansion for for tags and pathsDavid Bremner
From a UI perspective this looks similar to what was already provided for from, subject, and mid, but the implementation is quite different. It uses the database's list of terms to construct a term based query equivalent to the passed regular expression.
2017-05-09lib: Add regexp searching for mid: prefixDavid Bremner
The bulk of the change is passing in the field options to the regexp field processor, so that we can properly handle the fallback (non-regexp case).
2017-03-29lib: handle empty string in regexp field processorsDavid Bremner
The non-field processor behaviour is is convert the corresponding queries into a search for the unprefixed terms. This yields pretty surprising results so I decided to generate a query that would match the terms (i.e. none with that prefix) generated for an empty header.
2017-03-24lib: only trigger phrase processing for regexp fields when neededDavid Bremner
The argument is that if the string passed to the field processor has no spaces, then the added quotes won't have any benefit except for disabling wildcards. But disabling wildcards doesn't seem very useful in the normal Xapian query parser, since they're stripped before generating terms anyway. It does mean that the query 'from:"foo*"' will not be precisely equivalent to 'from:foo' as it is for the non field-processor version.
2017-03-07lib: Fix RegexpPostingSourceOlly Betts
Remove incorrect skipping to first match from init(), and add explicit skip_to() and check() methods to work around xapian-core bug (the check() method will also improve speed when filtering by one of these).
2017-03-04lib: use delete[] to free buffer allocated using new[]Jani Nikula
Fix warning caught by clang: lib/regexp-fields.cc:41:2: warning: 'delete' applied to a pointer that was allocated with 'new[]'; did you mean 'delete[]'? [-Wmismatched-new-delete] delete buffer; ^ [] lib/regexp-fields.cc:37:17: note: allocated with 'new[]' here char *buffer = new char[len]; ^
2017-03-03lib: regexp matching in 'subject' and 'from'David Bremner
the idea is that you can run % notmuch search subject:/<your-favourite-regexp>/ % notmuch search from:/<your-favourite-regexp>/ or % notmuch search subject:"your usual phrase search" % notmuch search from:"usual phrase search" This feature is only available with recent Xapian, specifically support for field processors is needed. It should work with bindings, since it extends the query parser. This is easy to extend for other value slots, but currently the only value slots are date, message_id, from, subject, and last_mod. Date is already searchable; message_id is left for a followup commit. This was originally written by Austin Clements, and ported to Xapian field processors (from Austin's custom query parser) by yours truly.