notmuch/lib/index.cc, branch 0.15.1

_notmuch_message_index_file: unref (free) address lists from gmime.

2012-12-24T23:02:22Z

Apparently as of GMime 2.4, you don't need to call internet_address_list_destroy anymore, but you still need to call g_object_unref (from the GMime Changelog). On the medium performance corpus, valgrind shows "possibly lost" leakage in "notmuch new" dropping from 7M to 300k.

lib: Reject multi-message mboxes and deprecate single-message mbox

2012-11-27T01:12:10Z

Previously, we would treat multi-message mboxes as one giant email, which, besides the obvious incorrect indexing, often led to out-of-memory errors for archival mboxes. Now we explicitly reject multi-message mboxes. For historical reasons, we retain support for single-message mboxes, but officially deprecate this behavior.
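A minimal sketch of the idea, assuming a file already read into memory: count the lines that begin with the mbox separator "From " and reject the file when there is more than one. The helper names count_mbox_separators and is_multi_message_mbox are hypothetical and are not taken from index.cc, and the check is deliberately simplistic (it does not require a blank line before the separator).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the lines in `text` that begin with the mbox separator "From ".
 * Hypothetical helper for illustration, not the notmuch implementation. */
static int
count_mbox_separators (const char *text)
{
    int count = 0;
    const char *line = text;

    assert (text != NULL);

    while (*line != '\0') {
        if (strncmp (line, "From ", 5) == 0)
            count++;

        /* Advance to the start of the next line, if there is one. */
        const char *newline = strchr (line, '\n');
        if (newline == NULL)
            break;
        line = newline + 1;
    }

    return count;
}

/* Return 1 if `text` holds more than one message and should be rejected.
 * A single "From " line is still accepted (the deprecated case). */
static int
is_multi_message_mbox (const char *text)
{
    return count_mbox_separators (text) > 1;
}
```

Lines escaped as ">From " in a message body are not counted, since only lines that start with "From " match.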

Convert non-UTF-8 parts to UTF-8 before indexing them

2012-02-29T11:41:39Z

This fixes a bug that made it impossible to search for non-ASCII words in such parts. The code here was copied from show_text_part_content(), because the show command already does the needed conversion when showing the message.
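To illustrate why the conversion matters, here is a hedged sketch for one charset only, ISO-8859-1. It does not reproduce the code copied from show_text_part_content(); the function name latin1_to_utf8 is hypothetical. Without this step, a byte such as 0xE9 ("é" in ISO-8859-1) is not valid UTF-8 on its own, so a UTF-8 search term can never match it.

```c
#include <assert.h>
#include <stddef.h>

/* Convert `len` bytes of ISO-8859-1 text in `in` to UTF-8 in `out`.
 * `out` must have room for 2 * len + 1 bytes. Returns the number of
 * bytes written, not counting the terminating NUL.
 * Hypothetical helper for illustration only. */
static size_t
latin1_to_utf8 (const unsigned char *in, size_t len, unsigned char *out)
{
    size_t written = 0;

    assert (in != NULL && out != NULL);

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII is encoded identically in both charsets. */
            out[written++] = c;
        } else {
            /* U+0080..U+00FF become a two-byte UTF-8 sequence. */
            out[written++] = (unsigned char) (0xC0 | (c >> 6));
            out[written++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }

    out[written] = '\0';
    return written;
}
```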

Ignore encrypted parts when indexing.

2011-12-29T21:44:43Z

It appears to be an oversight that encrypted parts were indexed previously. The terms generated from encrypted parts are meaningless and do nothing but add bloat to the database. It is not worth indexing the encrypted content, just as it's not worth indexing the signatures in signed parts.

tag signed/encrypted during notmuch new

2011-05-27T23:22:00Z

This patch adds the tag "signed" to messages with any multipart/signed parts, and the tag "encrypted" to messages with any multipart/encrypted parts. This only occurs when messages are indexed during notmuch new, so a database rebuild is required to have old messages tagged.
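The tagging rule can be sketched as follows, assuming the content types of a message's parts have already been collected as strings. The names TAG_SIGNED, TAG_ENCRYPTED, mime_type_equal and crypto_tags_for_parts are hypothetical and chosen for illustration; the patch itself works on the parsed MIME tree during indexing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define TAG_SIGNED    (1u << 0)
#define TAG_ENCRYPTED (1u << 1)

/* Compare two ASCII strings ignoring case; MIME types are
 * case-insensitive. Returns 1 when they are equal. */
static int
mime_type_equal (const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Return the set of tags implied by the content types of all parts:
 * any multipart/signed part yields "signed", any multipart/encrypted
 * part yields "encrypted". Hypothetical helper, not the notmuch API. */
static unsigned
crypto_tags_for_parts (const char *const *types, size_t n)
{
    unsigned tags = 0;

    assert (types != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (mime_type_equal (types[i], "multipart/signed"))
            tags |= TAG_SIGNED;
        else if (mime_type_equal (types[i], "multipart/encrypted"))
            tags |= TAG_ENCRYPTED;
    }

    return tags;
}
```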

Fix to index the "Re" term present in any subject.

2010-11-24T02:11:04Z

This was a misfeature where notmuch had extra code that just threw away legitimate information. It was never indexing an initial "Re" term in a subject. But some users have legitimately wanted to search for this term. The original code was written this way merely for strict compatibility with the indexing performed by sup, but we're not taking advantage of that now anyway.

lib: Add some missing static qualifiers.

2010-11-02T04:58:43Z

These various functions and data are all used only locally, so should be marked static. Ensuring we get these right will avoid us accidentally leaking unintended symbols through the library interface.

Do not segfault on empty mime parts

2010-04-13T15:49:06Z

notmuch previously unconditionally checked mime parts for various properties, but not for NULL, which is the case if libgmime encounters an empty mime part.

Upon encounter of an empty mime part, the following is printed to stderr (the second line due to my patch):

  (process:17197): gmime-CRITICAL **: g_mime_message_get_mime_part: assertion `GMIME_IS_MESSAGE (message)' failed
  Warning: Not indexing empty mime part.

This is probably a bug that should get addressed in libgmime, but for now, my patch is an acceptable workaround.

Signed-off-by: martin f. krafft

Eliminate some useless gobject boilerplate.

2010-02-05T01:26:00Z

If we had external users of this filter then they might expect some of these macros to exist. But since this is just internal, that's just unneeded noise.

notmuch new: Don't index uuencoded data.

2010-02-05T01:08:11Z

With modern MIME attachments, we're already avoiding indexing the attachments. But for old-school uuencoded data in the mail, we have been directly indexing the encoded data as terms, (which is not useful at all---nobody will ever try to search based on the seemingly random uuencoded data). Additionally, indexing a modestly large uuencoded file seems to make Xapian go insane, (consuming *lots* of memory). We fix both problems by detecting uuencoded content and not performing any indexing of it.
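A line-based sketch of the detection, under the assumption that a uuencoded block starts with a "begin <octal mode> <name>" line and ends with a line reading "end". The helpers is_uuencode_begin and strip_uuencoded are hypothetical; the actual change may detect the block differently (for example inside a stream filter), so treat this only as an illustration of the technique.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `line` looks like a uuencode header:
 * "begin", a 3- or 4-digit octal mode, and a file name. */
static int
is_uuencode_begin (const char *line)
{
    assert (line != NULL);

    if (strncmp (line, "begin ", 6) != 0)
        return 0;

    const char *p = line + 6;
    int digits = 0;
    while (*p >= '0' && *p <= '7') {
        p++;
        digits++;
    }

    return (digits == 3 || digits == 4) && *p == ' ' &&
           p[1] != '\0' && p[1] != '\n';
}

/* Copy `text` to `out`, dropping every line from a uuencode "begin"
 * line through the matching "end" line. `out` must hold
 * strlen (text) + 1 bytes. Returns the number of lines dropped. */
static int
strip_uuencoded (const char *text, char *out)
{
    int dropped = 0, inside = 0;
    const char *line = text;

    assert (text != NULL && out != NULL);

    while (*line != '\0') {
        const char *newline = strchr (line, '\n');
        size_t len = newline ? (size_t) (newline - line) + 1 : strlen (line);

        if (! inside && is_uuencode_begin (line))
            inside = 1;

        if (inside) {
            dropped++;
            /* The block is closed by a line consisting of "end". */
            if (strncmp (line, "end", 3) == 0 &&
                (len == 3 || line[3] == '\n' || line[3] == '\r'))
                inside = 0;
        } else {
            memcpy (out, line, len);
            out += len;
        }

        line += len;
    }

    *out = '\0';
    return dropped;
}
```

Only the text that survives strip_uuencoded would then be handed to the term generator, so neither the useless terms nor the memory blow-up described above can occur for the skipped block.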