path: root/lib/message.cc
Age | Commit message | Author
2020-05-04 | lib: replace STRNCMP_LITERAL in __message_remove_indexed_terms | David Bremner
strncmp looks for a prefix that matches, which is very much not what we want here. This fixes the bug reported by Franz Fellner in id:1588595993-ner-8.651@TPL520
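The difference between a prefix match and an exact match can be sketched as follows. This is only an illustration: the helper names and the sample strings are hypothetical and are not taken from notmuch.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: true when `term` merely starts with `literal`.
// A strncmp bounded by the literal's length checks exactly this.
bool starts_with_literal(const char *term, const char *literal)
{
    return std::strncmp(term, literal, std::strlen(literal)) == 0;
}

// Hypothetical helper: true only when `term` is exactly `literal`.
bool equals_literal(const char *term, const char *literal)
{
    return std::strcmp(term, literal) == 0;
}
```

With these helpers, "tagged" passes the prefix check against "tag" but fails the exact comparison, which is the kind of false match the commit message describes.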
2019-06-14 | lib: run uncrustify | uncrustify
This is the result of running
    $ uncrustify --replace --config ../devel/uncrustify.cfg *.c *.h *.cc
in the lib directory
2019-05-29 | indexing: record protected subject when indexing cleartext | Daniel Kahn Gillmor
When indexing the cleartext of an encrypted message, record any protected subject in the database, which should make it findable and visible in search. Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
2019-05-25 | lib: support user prefix names in term generation | David Bremner
This should not change the indexing process yet as nothing calls _notmuch_message_gen_terms with a user prefix name. On the other hand, it should not break anything either. _notmuch_database_prefix does a linear walk of the list of (built-in) prefixes, followed by a logarithmic time search of the list of user prefixes. The latter is probably not really noticeable.
2019-05-23 | n_m_remove_indexed_terms: reduce number of Xapian API calls. | David Bremner
Previously this function scanned every term attached to a given Xapian document. It turns out we know how to read only the terms we need to preserve (and we might have already done so). This commit replaces many calls to Xapian::Document::remove_term with one call to ::clear_terms, and a (typically much smaller) number of calls to ::add_term. Roughly speaking this is based on the assumption that most messages have more text than they have tags. According to the performance test suite, this yields a roughly 40% speedup on "notmuch reindex '*'"
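The trade-off described above can be sketched with a toy model. This is an assumption-laden illustration: `TermSet` is a plain `std::set` standing in for a document's term list, not the Xapian API, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Toy stand-in for a document's term list; this is not the Xapian API.
using TermSet = std::set<std::string>;

// Old approach (sketch): visit every term and erase those not preserved.
// Returns how many terms had to be examined.
std::size_t remove_by_scanning(TermSet &terms, const TermSet &preserved)
{
    std::size_t examined = 0;
    for (auto it = terms.begin(); it != terms.end();) {
        ++examined;
        if (preserved.count(*it) == 0)
            it = terms.erase(it);
        else
            ++it;
    }
    return examined;
}

// New approach (sketch): one clear, then re-add only the preserved terms.
// Returns how many add operations were needed.
std::size_t clear_and_readd(TermSet &terms, const TermSet &preserved)
{
    terms.clear();
    std::size_t added = 0;
    for (const auto &term : preserved) {
        terms.insert(term);
        ++added;
    }
    return added;
}
```

Both functions leave the same terms behind; the second does work proportional to the number of preserved terms rather than to the total number of terms, which matches the stated assumption that messages have more text than tags.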
2019-04-17 | lib: add 'body:' field, stop indexing headers twice. | David Bremner
The new `body:` field (in Xapian terms) or prefix (in slightly sloppier notmuch terms) allows matching terms that occur only in the body. Unprefixed query terms should continue to match anywhere (header or body) in the message. This follows a suggestion of Olly Betts to use the facility (since Xapian 1.0.4) to add the same field with multiple prefixes. The double indexing of previous versions is thus replaced with a query-time expansion of unprefixed query terms to the various prefixed equivalents. Reindexing will be needed for 'body:' searches to work correctly; otherwise they will also match messages where the term occurs in headers (demonstrated by the new tests in T530-upgrade.sh)
2018-09-06 | lib: calculate message depth in thread | David Bremner
This will be used in reparenting messages without useful in-reply-to, but with useful references
2018-09-06 | lib: read reference terms into message struct. | David Bremner
The plan is to use these in resolving threads.
2018-09-06 | lib/thread: sort sibling messages by date | David Bremner
For non-root messages, this should not change anything currently, as the messages are already added in date order. In the future we will add some non-root messages in a second pass out of order and the sorting will be useful. It does fix the order of multiple root-messages (although it is overkill for that).
2018-05-26 | lib: make notmuch_message_get_database() take a const notmuch_message_t* | Daniel Kahn Gillmor
This is technically an API change, but it is not an ABI change, and it's merely a statement that limits what the library can do. This is in parallel to notmuch_query_get_database(), which also takes a const pointer.
2018-05-26 | lib: expose notmuch_message_get_database() | Daniel Kahn Gillmor
We've had _notmuch_message_database() internally for a while, and it's useful. It turns out to be useful on the other side of the library interface as well (i'll use it later in this series for "notmuch show"), so we expose it publicly now.
2018-05-07 | lib: define specialized get_thread_id for use in thread subquery | David Bremner
The observation is that we are only using the messages to get their thread_id, which is kind of a pessimal access pattern for the current notmuch_message_get_thread_id
2017-12-08 | cli/reindex: destroy stashed session keys when --decrypt=false | Daniel Kahn Gillmor
There are some situations where the user wants to get rid of the cleartext index of a message. For example, if they're indexing encrypted messages normally, but suddenly they run across a message that they really don't want any trace of in their index. In that case, the natural thing to do is:
    notmuch reindex --decrypt=false id:whatever@example.biz
But of course, clearing the cleartext index without clearing the stashed session key is just silly. So we do the expected thing and also destroy any stashed session keys while we're destroying the index of the cleartext. Note that stashed session keys are stored in the xapian database, but xapian does not currently allow safe deletion (see https://trac.xapian.org/ticket/742). As a workaround, after removing session keys and cleartext material from the database, the user probably should do something like "notmuch compact" to try to purge whatever recoverable data is left in the xapian freelist. This problem really needs to be addressed within xapian, though, if we want it fixed right.
2017-10-21 | crypto: index encrypted parts when indexopts try_decrypt is set. | Daniel Kahn Gillmor
If we see index options that ask us to decrypt when indexing a message, and we encounter an encrypted part, we'll try to descend into it. If we can decrypt, we add the property index.decryption=success. If we can't decrypt (or recognize the encrypted type of mail), we add the property index.decryption=failure. Note that a single message may have both values of the "index.decryption" property: "success" and "failure". For example, consider a message that includes multiple layers of encryption: we may manage to decrypt the outer layer ("index.decryption=success"), but fail on the inner layer ("index.decryption=failure"). Because of the property name, this will be automatically cleared (and possibly re-set) during re-indexing. This means it will subsequently correspond to the actual semantics of the stored index.
2017-10-21 | reindex: drop all properties named with prefix "index." | Daniel Kahn Gillmor
This allows us to create new properties that will be automatically set during indexing, and cleared during re-indexing, just by choice of property name.
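The naming convention above can be sketched with a toy property store. This is a hedged illustration only: `PropertyMap` and `drop_properties_with_prefix` are hypothetical names, and a `std::multimap` stands in for however the library actually stores properties.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy property store: one key may carry several values.
using PropertyMap = std::multimap<std::string, std::string>;

// Erase every property whose key starts with `prefix`.
// Returns the number of (key, value) pairs removed.
std::size_t drop_properties_with_prefix(PropertyMap &props,
                                        const std::string &prefix)
{
    std::size_t removed = 0;
    for (auto it = props.begin(); it != props.end();) {
        if (it->first.compare(0, prefix.size(), prefix) == 0) {
            it = props.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

Calling it with the prefix "index." before re-indexing removes every automatically set property in one pass, while properties under other names are left alone.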
2017-10-09 | lib: convert notmuch_bool_t to stdbool internally | Jani Nikula
C99 stdbool turned 18 this year. There really is no reason to use our own, except in the library interface for backward compatibility. Convert the lib internally to stdbool.
2017-09-05 | lib: enforce that n_message_reindex takes headers from first file | David Bremner
Choosing only one set of headers is still a bit of a stopgap, but this seems like a more defensible set of headers to choose.
2017-08-29 | lib: add notmuch_message_has_maildir_flag | David Bremner
I considered a higher level interface where the caller passes a tag name rather than a flag character, but the role of the "unread" tag is particularly confusing with such an interface.
2017-08-29 | lib/message: split n_m_maildir_flags_tags, store maildir flags | David Bremner
In a future commit this will allow querying maildir flags separately from tags to allow resolving certain conflicts.
2017-08-23 | reindex: drop notmuch_param_t, use notmuch_indexopts_t instead | Daniel Kahn Gillmor
There are at least three places in notmuch that can trigger an indexing action:
* notmuch new
* notmuch insert
* notmuch reindex
I have plans to add some indexing options (e.g. indexing the cleartext of encrypted parts, external filters, automated property injection) that should properly be available in all places where indexing happens. I also want those indexing options to be exposed by (and constrained by) the libnotmuch C API. This isn't yet an API break because we've never made a release with notmuch_param_t.
These indexing options are relevant in the listed places (and in the libnotmuch analogues), but they aren't relevant in the other kinds of functionality that notmuch offers (e.g. dump/restore, tagging, search, show, reply). So i think a generic "param" object isn't well-suited for this case. In particular:
* a param object sounds like it could contain parameters for some other (non-indexing) operation. This sounds confusing -- why would i pass non-indexing parameters to a function that only does indexing?
* bremner suggests online a generic param object would actually be passed as a list of param objects, argv-style. In this case (at least in the obvious argv implementation), the params might be some sort of generic string. This introduces a problem where the API of the library doesn't grow as new options are added, which means that when code outside the library tries to use a feature, it first has to test for it, and have code to handle it not being available.
The indexopts approach proposed here instead makes it clear at compile time and at dynamic link time that there is an explicit dependency on that feature, which allows automated tools to keep track of what's needed and keeps the actual code simple. My proposal adds the notmuch_indexopts_t as an opaque struct, so that we can extend the list of options without causing ABI breakage.
The cost of this proposal appears to be that the "boilerplate" API increases a little bit, with a generic constructor and destructor function for the indexopts struct. More patches will follow that make use of this indexopts approach.
2017-08-01 | lib: add notmuch_message_reindex | Daniel Kahn Gillmor
This new function asks the database to reindex a given message. The parameter `indexopts` is currently ignored, but is intended to provide an extensible API to support e.g. changing the encryption or filtering status (e.g. whether and how certain non-plaintext parts are indexed).
2017-08-01 | lib: add _notmuch_message_remove_indexed_terms | David Bremner
Testing will be provided via use in notmuch_message_reindex
2017-08-01 | lib: add notmuch_message_count_files | David Bremner
This operation is relatively inexpensive, as the needed metadata is already computed by our lazy metadata fetching. The goal is to support better UI for messages with multiple files.
2017-07-14 | lib: wrap use of g_mime_utils_header_decode_date | David Bremner
This changes return type in gmime 3.0
2017-05-13 | build: visibility=default for library structs is no longer needed | Jani Nikula
Commit d5523ead90b6 ("Mark some structures in the library interface with visibility=default attribute.") fixed some mixed visibility issues with structs. With the symbol default visibility reversed, this is no longer a problem.
2017-04-20 | Replace index(3) with strchr(3) | Fredrik Fornwall
The index(3) function has been deprecated in POSIX since 2001 and removed in 2008, and most code in notmuch already calls strchr(3). This fixes a compilation error on Android whose libc does not have index(3).
2017-03-22 | lib: replace deprecated n_q_count_messages with status returning version | David Bremner
This function was deprecated in notmuch 0.21. We re-use the name for a status returning version, and deprecate the _st name. One or two remaining uses of the (removed) non-status returning version are fixed at the same time.
2017-03-18 | Merge branch 'release' | David Bremner
Merge in memory fixes
2017-03-18 | lib/message.cc: fix Coverity finding (use after free) | Tomi Ollila
The object where pointer to `data` was received was deleted before it was used in _notmuch_string_list_append(). Relevant Coverity messages follow:
3: extract Assigning: data = std::__cxx11::string(message->doc.()).c_str(), which extracts wrapped state from temporary of type std::__cxx11::string.
4: dtor_free The internal representation of temporary of type std::__cxx11::string is freed by its destructor.
5: use after free: Wrapper object use after free (WRAPPER_ESCAPE) Using internal representation of destroyed object local data.
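The class of bug reported here can be sketched in isolation. This is a minimal illustration under stated assumptions: `get_value` is a hypothetical stand-in for an accessor that returns a `std::string` by value, and a `std::vector<std::string>` stands in for the string list; neither is the actual notmuch code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical accessor returning a std::string by value.
std::string get_value()
{
    return "some-value";
}

// Buggy pattern (shown as a comment only, do not use):
//   const char *data = get_value().c_str(); // temporary dies at the ';'
//   list.push_back(data);                   // reads freed memory

// Safe pattern: keep the string in a named local so that the pointer
// from c_str() stays valid for as long as it is used.
void append_value(std::vector<std::string> &list)
{
    const std::string value = get_value();
    const char *data = value.c_str();
    list.push_back(data); // copies the characters while `value` is alive
}
```

The point is only that the owner of the character buffer must outlive every use of the raw pointer.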
2017-03-15 | lib: clamp return value of g_mime_utils_header_decode_date to >=0 | David Bremner
For reasons not completely understood at this time, gmime (as of 2.6.22) is returning a date before 1900 on bad date input. Since this confuses some other software, we clamp such dates to 0, i.e. 1970-01-01.
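The clamping itself is simple; a sketch, assuming a hypothetical wrapper name and a signed `time_t` (as on mainstream platforms), might look like this:

```cpp
#include <cassert>
#include <ctime>

// Hypothetical wrapper: map any pre-epoch result to 0 (1970-01-01),
// and pass every other value through unchanged.
std::time_t clamp_date(std::time_t decoded)
{
    return decoded < 0 ? 0 : decoded;
}
```

Callers then never see a negative timestamp, at the cost of losing the distinction between a genuinely old date and a badly formed one.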
2017-02-25 | lib/message.cc: use view number to invalidate cached metadata | David Bremner
Currently the view number is incremented by notmuch_database_reopen
2017-02-25 | lib: handle DatabaseModifiedError in _n_message_ensure_metadata | David Bremner
The retries are hardcoded to a small number, and error handling aborts rather than propagating errors from notmuch_database_reopen. These are both somewhat justified by the assumption that most things that can go wrong in Xapian::Database::reopen are rare and fatal. Here's the brief discussion with Xapian upstream:
24-02-2017 08:12:57 < bremner> any intuition about how likely Xapian::Database::reopen is to fail? I'm catching a DatabaseModifiedError somewhere where handling any further errors is tricky, and wondering about treating a failed reopen as as "the impossible happened, stopping"
24-02-2017 16:22:34 < olly> bremner: there should not be much scope for failure - stuff like out of memory or disk errors, which are probably a good enough excuse to stop
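A bounded retry loop of the kind described can be sketched generically. Everything here is an assumption made for illustration: `ModifiedError` is a hypothetical exception standing in for Xapian's DatabaseModifiedError, `read_with_retry` and its callbacks are invented names, and the retry limit of 3 is arbitrary.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for a "database changed under the reader" error.
struct ModifiedError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Run `read`; when it throws ModifiedError, call `reopen` and try again.
// Gives up by aborting once `max_retries` retries have been used,
// rather than propagating the error. Returns the number of attempts.
int read_with_retry(const std::function<void()> &read,
                    const std::function<void()> &reopen,
                    int max_retries = 3)
{
    for (int attempt = 1;; ++attempt) {
        try {
            read();
            return attempt;
        } catch (const ModifiedError &) {
            if (attempt > max_retries)
                std::abort(); // treated as "the impossible happened"
            reopen();
        }
    }
}
```

Keeping the retry count small and fixed reflects the stated assumption that repeated failures after a reopen are rare and effectively fatal.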
2017-02-23 | lib: make _notmuch_message_ensure_property_map static | David Bremner
It's not called outside message.cc
2017-02-23 | lib: make _notmuch_message_ensure_metadata static | David Bremner
It's not called anywhere outside message.cc.
2016-09-21 | lib: basic message-property API | David Bremner
Initially, support get, set and removal of single key/value pair, as well as removing all properties.
2016-09-21 | lib: read "property" terms from messages. | David Bremner
This is a first step towards providing an API to attach arbitrary (key,value) pairs to messages and retrieve all of the values for a given key.
2016-06-05 | Use https instead of http where possible | Daniel Kahn Gillmor
Many of the external links found in the notmuch source can be resolved using https instead of http. This changeset addresses as many as i could find, without touching the e-mail corpus or expected outputs found in tests.
2016-06-05 | lib: whitespace cleanup | Tomi Ollila
Cleaned the following whitespace in lib/* files:
lib/index.cc: 1 line: trailing whitespace
lib/database.cc: 5 lines: 8 spaces at the beginning of line
lib/notmuch-private.h: 4 lines: 8 spaces at the beginning of line
lib/message.cc: 1 line: trailing whitespace
lib/sha1.c: 1 line: empty lines at the end of file
lib/query.cc: 2 lines: 8 spaces at the beginning of line
lib/gen-version-script.sh: 1 line: trailing whitespace
2016-04-15 | complete ghost-on-removal-when-shared-thread-exists | Daniel Kahn Gillmor
To fully complete the ghost-on-removal-when-shared-thread-exists proposal, we need to clear all ghost messages when the last active message is removed from a thread. Amended by db: Remove the last test of T530, as it no longer makes sense if we are garbage collecting ghost messages.
2016-04-15 | On deletion, replace with ghost when other active messages in thread | Daniel Kahn Gillmor
There is no need to add a ghost message upon deletion if there are no other active messages in the thread. Also, if the message being deleted was a ghost already, we can just go ahead and delete it.
2016-04-15 | Introduce _notmuch_message_has_term() | Daniel Kahn Gillmor
It can be useful to easily tell if a given message has a given term associated with it.
2016-04-15 | fix thread breakage via ghost-on-removal | Daniel Kahn Gillmor
implement ghost-on-removal, the solution to T590-thread-breakage.sh that just adds a ghost message after removing each message. It leaks information about whether we've ever seen a given message id, but it's a fairly simple implementation. Note that _resolve_message_id_to_thread_id already introduces new message_ids to the database, so i think just searching for a given message ID may introduce the same metadata leakage.
2016-01-16 | clean up stray apostrophe in comment | Daniel Kahn Gillmor
This is a nit-picky orthographical fix for a nit-picky ontological comment.
2016-01-16 | correct comment referring to notmuch_database_remove_message | Daniel Kahn Gillmor
notmuch_database_remove_message has no leading underscore in its name.
2015-08-13 | lib: Add per-message last modification tracking | Austin Clements
This adds a new document value that stores the revision of the last modification to message metadata, where the revision number increases monotonically with each database commit.
An alternative would be to store the wall-clock time of the last modification of each message. In principle this is simpler and has the advantage that any process can determine the current timestamp without support from libnotmuch. However, even assuming a computer's clock never goes backward and ignoring clock skew in networked environments, this has a fatal flaw. Xapian uses (optimistic) snapshot isolation, which means reads can be concurrent with writes. Given this, consider the following time line with a write and two read transactions:
    write  |-X-A--------------|
    read 1       |---B---|
    read 2                      |---|
The write transaction modifies message X and records the wall-clock time of the modification at A. The writer hangs around for a while and later commits its change. Read 1 is concurrent with the write, so it doesn't see the change to X. It does some query and records the wall-clock time of its results at B. Transaction read 2 later starts after the write commits and queries for changes since wall-clock time B (say the reads are performing an incremental backup). Even though read 1 could not see the change to X, read 2 is told (correctly) that X has not changed since B, the time of the last read. In fact, X changed before wall-clock time A, but the change was not visible until *after* wall-clock time B, so read 2 misses the change to X.
This is tricky to solve in full-blown snapshot isolation, but because Xapian serializes writes, we can use a simple, monotonically increasing database revision number. Furthermore, maintaining this revision number requires no more IO than a wall-clock time solution because Xapian already maintains statistics on the upper (and lower) bound of each value stream.
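The revision-number idea can be sketched with a toy in-memory model. This is an illustration under assumptions, not the library's implementation: `ToyDatabase`, `commit`, and `changed_since` are hypothetical names, and the model simply bumps one counter per committed write and stamps each modified message with it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model: writes are serialized, so one counter orders all commits.
struct ToyDatabase {
    unsigned long revision = 0;
    std::map<std::string, unsigned long> lastmod; // message id -> revision

    // Commit one write transaction that modified the given messages.
    void commit(const std::vector<std::string> &modified_ids)
    {
        ++revision;
        for (const auto &id : modified_ids)
            lastmod[id] = revision;
    }

    // Messages modified by any commit after revision `since`.
    std::vector<std::string> changed_since(unsigned long since) const
    {
        std::vector<std::string> ids;
        for (const auto &entry : lastmod)
            if (entry.second > since)
                ids.push_back(entry.first);
        return ids;
    }
};
```

A reader that records the revision it observed, rather than a wall-clock time, can later ask for everything newer than that revision; a change committed after the reader's snapshot necessarily carries a larger revision, so it cannot be missed in the way the time line above describes.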
2015-08-04 | lib: Only sync modified message documents | Austin Clements
Previously, we updated the database copy of a message on every call to _notmuch_message_sync, even if nothing had changed. In particular, this always happens on a thaw, so a freeze/thaw pair with no modifications between still caused a database update. We only modify message documents in a handful of places, so keep track of whether the document has been modified and only sync it when necessary. This will be particularly important when we add message revision tracking.
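The "only sync when modified" pattern can be sketched with a toy message type. The names `ToyMessage`, `add_tag`, and `sync` are hypothetical, and the `writes` counter merely simulates database updates; in particular, treating a duplicate tag as "no modification" is a choice made for this sketch.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy message: tracks whether its in-memory state differs from the
// stored copy, and counts simulated database writes.
struct ToyMessage {
    std::set<std::string> tags;
    bool modified = false;
    int writes = 0;
};

// Mark the message dirty only when the tag set actually changes.
void add_tag(ToyMessage &message, const std::string &tag)
{
    if (message.tags.insert(tag).second)
        message.modified = true;
}

// Write back only when something changed since the last sync.
void sync(ToyMessage &message)
{
    if (!message.modified)
        return;
    ++message.writes;
    message.modified = false;
}
```

With this flag in place, a sync that follows no modifications (for example at the end of an empty freeze/thaw pair) performs no write at all.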
2015-03-29 | lib: eliminate fprintf from _notmuch_message_file_open | David Bremner
You may wonder why _notmuch_message_file_open_ctx has two parameters. This is because we sometimes need to use a ctx which is a notmuch_message_t. While we could get the database from this, there is no easy way in C to tell what type we are getting.
2015-03-29 | lib: replace almost all fprintfs in library with _n_d_log | David Bremner
This is not supposed to change any functionality from an end user point of view. Note that it will eliminate some output to stderr. The query debugging output is left as is; it doesn't really fit with the current primitive logging model. The remaining "bad" fprintf will need an internal API change.
2015-03-29 | lib: add private function to extract the database for a message. | David Bremner
This is needed by logging in functions outside message.cc that take only a notmuch_message_t object.
2015-01-02 | lib: convert two "iterator copy strings" into references. | David Bremner
Apparently this is a supported and even idiomatic way of keeping a temporary object (e.g. like that returned from an operator dereference) alive.
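The C++ rule referred to here is that binding a temporary to a const reference extends the temporary's lifetime to that of the reference. A minimal sketch, where `current_term` is a hypothetical stand-in for a dereference operator that returns a `std::string` by value:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an operator* that returns by value.
std::string current_term()
{
    return "example-term";
}

std::string copy_term_via_reference()
{
    // The temporary returned by current_term() lives as long as `term`,
    // so using it on the following lines is well defined.
    const std::string &term = current_term();
    return term;
}
```

This differs from holding a raw pointer obtained via `c_str()` on a temporary, where no lifetime extension applies.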