notmuch - thread-based email index, search, and tagging

Age	Commit message (Collapse)	Author
2016-06-05	Use https instead of http where possible	Daniel Kahn Gillmor
	Many of the external links found in the notmuch source can be resolved using https instead of http. This changeset addresses as many as i could find, without touching the e-mail corpus or expected outputs found in tests.
2016-06-05	lib: whitespace cleanup	Tomi Ollila
	Cleaned the following whitespace in lib/* files: lib/index.cc: 1 line: trailing whitespace lib/database.cc 5 lines: 8 spaces at the beginning of line lib/notmuch-private.h: 4 lines: 8 spaces at the beginning of line lib/message.cc: 1 line: trailing whitespace lib/sha1.c: 1 line: empty lines at the end of file lib/query.cc: 2 lines: 8 spaces at the beginning of line lib/gen-version-script.sh: 1 line: trailing whitespace
2015-11-19	lib: content disposition values are not case-sensitive	Jani Nikula
	Per RFC 2183, the values for Content-Disposition values are not case-sensitive. While at it, use the gmime function for getting at the disposition string instead of referencing the field directly. This fixes "attachment" tagging and filename term generation for attachments while indexing.
2015-03-29	lib: replace almost all fprintfs in library with _n_d_log	David Bremner
	This is not supposed to change any functionality from an end user point of view. Note that it will eliminate some output to stderr. The query debugging output is left as is; it doesn't really fit with the current primitive logging model. The remaining "bad" fprintf will need an internal API change.
2015-01-24	Add indexing for the mimetype term	Todd
	This adds the indexing support for the "mimetype:" term and removes the broken test flag. The indexing is probablistic in Xapian terms, which gives a better experience to end users. Standard content-types of the form "foo/bar" are automatically interpreted as phrases in Xapian due to the embedded slash. Assume, separate messages with application/pdf and application/x-pdf are indexed, then: - mimetype:application/x-pdf will find only the application/x-pdf - mimetype:application/pdf will find only the application/pdf - mimetype:pdf will find both of the messages
2014-06-18	lib: Index name and address of from/to headers as a phrase	Austin Clements
	Previously, we indexed the name and address parts of from/to headers with two calls to _notmuch_message_gen_terms. In general, this indicates that these parts are separate phrases. However, because of an implementation quirk, the two calls to _notmuch_message_gen_terms generated adjacent term positions for the prefixed terms, which happens to be the right thing to do in this case, but the wrong thing to do for all other calls. Furthermore, _notmuch_message_gen_terms produced potentially overlapping term positions for the un-prefixed copies of the terms, which is simply wrong. This change indexes both the name and address in a single call to _notmuch_message_gen_terms, indicating that they should be part of a single phrase. This masks the problem with the un-prefixed terms (fixing the two known-broken tests) and puts us in a position to fix the unintentionally phrases generated by other calls to _notmuch_message_gen_terms.
2014-04-05	lib: replace the header parser with gmime	Jani Nikula
	The notmuch library includes a full blown message header parser. Yet the same message headers are parsed by gmime during indexing. Switch to gmime parsing completely. These are the main changes: * Gmime stops header parsing at the first invalid header, and presumes the message body starts from there. The current parser is quite liberal in accepting broken headers. The change means we will be much pickier about accepting invalid messages. * The current parser converts tabs used in header folding to spaces. Gmime preserve the tabs. Due to a broken python library used in mailman, there are plenty of mailing lists that produce headers with tabs in header folding, and we'll see plenty of tabs. (This change has been mitigated in preparatory patches.) * For pure header parsing, the current parser is likely faster than gmime, which parses the whole message rather than just the headers. Since we parse the message and its headers using gmime for indexing anyway, this avoids and extra header parsing round when adding new messages. In case of duplicate messages, we'll end up parsing the full message although just headers would be sufficient. All in all this should still speed up 'notmuch new'. * Calls to notmuch_message_get_header() may be slightly slower than previously for headers that are not indexed in the database, due to parsing of the whole message. Within the notmuch code base, notmuch reply is the only such user.
2014-04-05	lib: drop support for single-message mbox files	Jani Nikula
	We've supported mbox files containing a single message for historical reasons, but the support has been deprecated, with a warning message while indexing, since Notmuch 0.15. Finally drop the support, and consider all mbox files non-email.
2013-09-14	lib/cli: pass GMIME_ENABLE_RFC2047_WORKAROUNDS to g_mime_init()	Jani Nikula
	As explained by Jeffrey Stedfast, the author of GMime, quoted in [1]: > Passing the GMIME_ENABLE_RFC2047_WORKAROUNDS flag to g_mime_init() > should solve the decoding problem mentioned in the thread. This > flag should be safe to pass into g_mime_init() without any bad side > effects and my unit tests do test that code-path. The thread being referred to is [2]. [1] id:87bo56viyo.fsf@nikula.org [2] id:08cb1dcd-c5db-4e33-8b09-7730cb3d59a2@gmail.com
2012-12-24	_notmuch_message_index_file: unref (free) address lists from gmime.	David Bremner
	Apparently as of GMime 2.4, you don't need to call internet_address_list_destroy anymore, but you still need to call g_object_unref (from the GMime Changelog). On the medium performance corpus, valgrind shows "possibly lost" leakage in "notmuch new" dropping from 7M to 300k.
2012-11-26	lib: Reject multi-message mboxes and deprecate single-message mbox	Austin Clements
	Previously, we would treat multi-message mboxes as one giant email, which, besides the obvious incorrect indexing, often led to out-of-memory errors for archival mboxes. Now we explicitly reject multi-message mboxes. For historical reasons, we retain support for single-message mboxes, but official deprecate this behavior.
2012-02-29	Convert non-UTF-8 parts to UTF-8 before indexing them	Michal Sojka
	This fixes a bug that didn't allow to search for non-ASCII words such parts. The code here was copied from show_text_part_content(), because the show command already does the needed conversion when showing the message.
2011-12-29	Ignore encrypted parts when indexing.	Jameson Graef Rollins
	It appears to be an oversight that encrypted parts were indexed previously. The terms generated from encrypted parts are meaningless and do nothing but add bloat to the database. It is not worth indexing the encrypted content, just as it's not worth indexing the signatures in signed parts.
2011-05-27	tag signed/encrypted during notmuch new	Jameson Graef Rollins
	This patch adds the tag "signed" to messages with any multipart/signed parts, and the tag "encrypted" to messages with any multipart/encrypted parts. This only occurs when messages are indexed during notmuch new, so a database rebuild is required to have old messages tagged.
2010-11-23	Fix to index the "Re" term present in any subject.	Carl Worth
	This was a misfeature where notmuch had extra code that just threw away legitimate information. It was never indexing an initial "Re" term in a subject. But some users have legitimately wanted to search for this term. The original code was written this way merely for strict compatiblity with the indexing performed by sup, but we're not taking advantage of that now anyway.
2010-11-01	lib: Add some missing static qualifiers.	Carl Worth
	These various functions and data are all used only locally, so should be marked static. Ensuring we get these right will avoid us accidentally leaking unintended symbols through the library interface.
2010-04-13	Do not segfault on empty mime parts	martin f. krafft
	notmuch previously unconditionally checked mime parts for various properties, but not for NULL, which is the case if libgmime encounters an empty mime part. Upon encounter of an empty mime part, the following is printed to stderr (the second line due to my patch): (process:17197): gmime-CRITICAL **: g_mime_message_get_mime_part: assertion `GMIME_IS_MESSAGE (message)' failed Warning: Not indexing empty mime part. This is probably a bug that should get addressed in libgmime, but for not, my patch is an acceptable workaround. Signed-off-by: martin f. krafft <madduck@madduck.net>
2010-02-04	Eliminate some useless gobject boilerplate.	Carl Worth
	If we had external users of this filter then they might expect some of these macros to exist. But since this is just internal, that's just unneeded noise.
2010-02-04	notmuch new: Don't index uuencoded data.	Carl Worth
	With modern MIME attachments, we're already avoiding indexing the attachments. But for old-school uuencoded data in the mail, we have been directly indexing the encoded data as terms, (which is not useful at all---nobody will ever ytry to search based on the seemingly random uuencoded data). Additionally, indexing a modestly large uuencoded file seems to make Xapian go insane, (consuming lots of memory). We fix both problems by detecting uuencoded content and not performing any indexing of it.
2010-01-06	Index content from citations and signatures.	Carl Worth
	In the presentation we often omit citations and signatures, but this is not content that should be omitted from the index, (especially when the citation detection is wrong---see cases where a line beginning with "From" is corrupted to ">From" by mail processing tools).
2010-01-06	database: Store mail filename as a new 'direntry' term, not as 'data'.	Carl Worth
	Instead of storing the complete message filename in the data portion of a mail document we now store a 'direntry' term that contains the document ID of a directory document and also the basename of the message filename within that directory. This will allow us to easily store multple filenames for a single message, and will also allow us to find mail documents for files that previously existed in a directory but that have since been deleted.
2009-12-01	lib/index: Fix memory leak for email addresses without names.	Carl Worth
	We carefully noted the fact that we had locally allocated the string here, but then we neglected to free it. Switch to talloc instead which makes it easier to get the behavior we want. It's simpler since we can just call talloc_free unconditionally, without having to track the state of whether we allocated the storage for name or not.
2009-11-18	Typsos	Ingmar Vanhassel

2009-11-12	Don't create "contact" terms in the database.	Carl Worth
	We never did export any interface to get at these, and when I went to use these, I found them inadequate, (because I wanted to distinguish address found in from: from those found in To:). Meanwhile, it was easy enough to extract addresses with a search like: notmuch show tag:sent \| grep ^To: so the storage of contact terms was just wasting space. Stop that.
2009-11-09	libify: Move library sources down into lib directory.	Carl Worth
	A "make" invocation still works from the top-level, but not from down inside the lib directory yet.