notmuch - thread-based email index, search, and tagging

Age	Commit message	Author
2009-11-09	libify: Move library sources down into lib directory.	Carl Worth
	A "make" invocation still works from the top-level, but not from down inside the lib directory yet.
2009-11-09	add_message: Fix crash for file recognized as not email.	Carl Worth
	This crash was introduced sometime recently, as previously things worked fine when notmuch detected that a file is not an email. We're definitely overdue for that test suite.
2009-11-06	add_message: Start storing In-Reply-To information in the database.	Carl Worth
	We'll use this eventually for properly nesting messages in the output of "notmuch show", etc.
2009-10-29	notmuch show: Add a one-line summary of the message before the header.	Carl Worth
	The idea here is that a client could usefully display just this one line while optionally hiding the other header fields.
2009-10-28	Fix add_message and get_filename to strip/re-add the database path.	Carl Worth
	We now store only a relative path inside the database so the database is now nicely relocatable.
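As a rough illustration of the strip/re-add idea, here is a minimal sketch in C. The helper names `relative_path` and `absolute_path` are hypothetical and are not taken from the notmuch sources; the sketch only assumes that the database path is a non-empty directory prefix of each message filename.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the part of `path` below `db_path`,
 * or `path` unchanged if it does not live under `db_path`.
 * Assumes `db_path` is a non-empty directory name. */
static const char *
relative_path (const char *db_path, const char *path)
{
    size_t len = strlen (db_path);

    /* Ignore trailing slashes on the database path. */
    while (len > 1 && db_path[len - 1] == '/')
        len--;

    /* Only strip when the prefix ends exactly at a path separator. */
    if (strncmp (path, db_path, len) == 0 && path[len] == '/') {
        path += len;
        while (*path == '/')
            path++;
    }

    return path;
}

/* Hypothetical helper: re-add `db_path` to a stored relative path.
 * Returns a malloc'd string that the caller must free, or NULL. */
static char *
absolute_path (const char *db_path, const char *relative)
{
    size_t size = strlen (db_path) + 1 + strlen (relative) + 1;
    char *result = malloc (size);

    if (result)
        snprintf (result, size, "%s/%s", db_path, relative);

    return result;
}
```

Because only the relative part is stored, moving the mail directory would just mean passing a different `db_path` when reading filenames back.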
2009-10-28	notmuch_database_add_message: Sanity check the file as the first thing	Carl Worth
	This avoids us wasting a bunch of time doing an expensive SHA-1 over a large file only to discover later that it doesn't even look like an email message.
2009-10-28	Tweak formatting of internal error messages.	Carl Worth
	Was neglecting to print the phrase "Internal error: " before, and for the duplicate message-ID error it's nice to actually see the duplicate IDs.
2009-10-28	index: Store "Full Name <user@example.com>" addresses in the database	Carl Worth
	We put these in as a separate term so that they can be extracted. We don't actually need this for searching, since typing an email address in as a search term will already trigger a phrase search that does exactly what's wanted.
2009-10-28	Add full-text indexing using the GMime library for parsing.	Carl Worth
	This is based on the old notmuch-index-message.cc from early in the history of notmuch, but considerably cleaned up now that we have some experience with Xapian and know just what we want to index, (rather than just blindly trying to index exactly what sup does). This does slow down notmuch_database_add_message a lot, but I've got some ideas for getting some time back.
2009-10-27	Fix segfault in case of the database lock not being available.	Carl Worth
	We were nicely reporting the lock-acquisition failure, but then marching along trying to use the database object and just crashing badly. So don't do that.
2009-10-27	Update prefix so that "thread:" can be used in search strings.	Carl Worth
	It's convenient to be able to do things like: notmuch tag -inbox thread:<thread-id> (even though this can run into a race condition as noted in TODO--the fix for the race is simply to not run "notmuch new" between reading a thread with the (not yet existent) "notmuch show" and removing its inbox tag with a command like the above). So we now allow such a thing.
2009-10-27	notmuch_database_add_message: Do not return a message on failure.	Carl Worth
	The recent, disastrous failure of "notmuch new" would have been avoided with this change. The new_command function was basically assuming that it would only get a message object on success so wasn't destroying the message in the other cases.
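The contract described here (the optional output parameter is only ever set on success) can be sketched as follows. This is a toy under stated assumptions: `status_t`, `message_t`, and this `add_message` are hypothetical stand-ins, and the "From:" test is a placeholder for real header parsing, not what notmuch does.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes and message type for illustration only. */
typedef enum {
    STATUS_SUCCESS = 0,
    STATUS_OUT_OF_MEMORY,
    STATUS_FILE_NOT_EMAIL
} status_t;

typedef struct {
    char first_header[64];
} message_t;

/* Add a message. If `message_ret` is non-NULL it receives the new
 * message on success and NULL on every failure path, so callers
 * never have an object to destroy after an error. */
static status_t
add_message (const char *contents, message_t **message_ret)
{
    message_t *message;

    /* Clear the output first so no early return can leak a value. */
    if (message_ret)
        *message_ret = NULL;

    /* Toy sanity check standing in for real header parsing. */
    if (strncmp (contents, "From:", 5) != 0)
        return STATUS_FILE_NOT_EMAIL;

    message = calloc (1, sizeof (*message));
    if (message == NULL)
        return STATUS_OUT_OF_MEMORY;

    /* Remember the first line, truncated to fit the buffer. */
    snprintf (message->first_header, sizeof (message->first_header), "%.*s",
              (int) strcspn (contents, "\n"), contents);

    /* Hand ownership to the caller only if it asked for the object. */
    if (message_ret)
        *message_ret = message;
    else
        free (message);

    return STATUS_SUCCESS;
}
```

A caller that wants to tag the just-added message passes a pointer; a caller that does not care passes NULL and has nothing to clean up.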
2009-10-27	notmuch_database_close: Explicitly flush the Xapian database.	Carl Worth
	This would have helped with the recent bug causing "notmuch new" to not record any results in the database. I'm not sure why the explicit flush would be required, (shouldn't the destructor always ensure that things flush?), but perhaps some outstanding references from the leak prevented that. In any case, an explicit flush on close() seems to make sense.
2009-10-26	notmuch restore: Fix to remove all tags before adding tags.	Carl Worth
	This means that the restore operation will now properly pick up the removal of tags indicated by the tag just not being present in the dump file. We added a few new public functions in order to support this: notmuch_message_freeze notmuch_message_remove_all_tags notmuch_message_thaw
2009-10-26	add_message: Add an optional parameter for getting the just-added message.	Carl Worth
	We use this to implement the addition of "inbox" and "unread" tags for all messages added by "notmuch new".
2009-10-26	Remove all calls to g_strdup_printf	Carl Worth
	Replacing them with calls to talloc_asprintf if possible, otherwise to asprintf (with its painful error-handling leaving the pointer undefined).
2009-10-25	Drop dead function add_term.	Carl Worth
	Even with the recent warnings work, gcc didn't tell me about a static function that I'm not calling? Apparently I get "defined but not used" in C files, but not C++ files. That's bogus, and yet one more reason for me to push the C++ to a minimal lower layer.
2009-10-25	Add -Wswitch-enum and fix warnings.	Carl Worth
	Having to enumerate all the enum values at every switch is annoying, but this warning actually found a bug, (missing support for NOTMUCH_STATUS_OUT_OF_MEMORY in notmuch_status_to_string).
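For readers unfamiliar with the flag, a small sketch of the pattern it enforces. The enum and strings below are hypothetical, modelled only loosely on the notmuch status codes. With gcc's -Wswitch-enum, removing any one of the `case` labels produces a warning even though a `default` label is present.

```c
/* Hypothetical status enum for illustration only. */
typedef enum {
    STATUS_SUCCESS = 0,
    STATUS_OUT_OF_MEMORY,
    STATUS_FILE_ERROR
} status_t;

/* Every enumerator gets its own case. Under -Wswitch-enum, gcc warns
 * about any enumerator without a case, default label or not. */
static const char *
status_to_string (status_t status)
{
    switch (status) {
    case STATUS_SUCCESS:
        return "No error occurred";
    case STATUS_OUT_OF_MEMORY:
        return "Out of memory";
    case STATUS_FILE_ERROR:
        return "Something went wrong trying to read or write a file";
    default:
        return "Unknown error status value";
    }
}
```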
2009-10-25	Add -Wmissing-declarations and fix warnings.	Carl Worth
	Wow, lots of missing 'static' on internal functions.
2009-10-25	_notmuch_database_link_message: Fix error-status propagation.	Carl Worth
	The _notmuch_database_link_message_to_parents function was void in an earlier draft. Now, ensure that we don't miss any error return value from it.
2009-10-25	Change database to store only a single thread ID per message.	Carl Worth
	Instead of supporting multiple thread IDs, we now merge together thread IDs if one message is ever found to belong to more than one thread. This allows for constructing complete threads when, for example, a child message doesn't include a complete list of References headers back to the beginning of the thread. It also simplifies dealing with mapping a message ID to a thread ID which is now a simple get_thread_id just like get_message_id, (and no longer an iterator-based thing like get_tags).
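The merging idea can be illustrated with a small union-find sketch. This is a swapped-in technique chosen for brevity, not a description of how notmuch stores or rewrites thread IDs in Xapian; thread IDs are assumed to be small integers and all names are hypothetical.

```c
#define MAX_THREADS 16

/* parent[t] == t means thread t is its own representative. */
static int parent[MAX_THREADS];

/* Start with every thread ID standing for itself. */
static void
threads_init (void)
{
    int i;

    for (i = 0; i < MAX_THREADS; i++)
        parent[i] = i;
}

/* Return the representative thread ID for `t` (with path halving).
 * No bounds checking: `t` must be in [0, MAX_THREADS). */
static int
thread_find (int t)
{
    while (parent[t] != t) {
        parent[t] = parent[parent[t]];
        t = parent[t];
    }

    return t;
}

/* A message was found to belong to both `a` and `b`: merge them so
 * that every message ends up with a single thread ID. */
static int
thread_merge (int a, int b)
{
    a = thread_find (a);
    b = thread_find (b);

    if (a != b)
        parent[b] = a;

    return a;
}
```

After a merge, looking up the thread for any message in either original thread yields the same ID, which is the "simple get_thread_id" property the commit describes.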
2009-10-25	link_message: Remove dead code.	Carl Worth
	We dropped the THREAD_ID value from the database a while back, but here is code that's carefully computing that value and then never doing anything with it. Delete, delete, delete.
2009-10-25	add_message: Pull the thread-stitching portion out into new _notmuch_database_link_message	Carl Worth
	The function was getting too long-winded before. And since I'm about to change how we handle the thread linking, it's convenient to have it in an isolated function.
2009-10-25	Add an INTERNAL_ERROR macro and use it for all internal errors.	Carl Worth
	We were previously just doing fprintf;exit at each point, but I wanted to add file and line-number details to all messages, so it makes sense to use a single macro for that.
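A macro along these lines might look like the sketch below. It is not the macro from the notmuch sources: the helper `format_internal_error`, the buffer size, and the exact message layout are assumptions made for illustration.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "Internal error: <message> (<file>:<line>)"
 * in `buf`, truncating if needed. `size` must be greater than zero. */
static void
format_internal_error (char *buf, size_t size, const char *file, int line,
                       const char *format, ...)
{
    va_list args;
    size_t used;

    snprintf (buf, size, "Internal error: ");
    used = strlen (buf);

    va_start (args, format);
    vsnprintf (buf + used, size - used, format, args);
    va_end (args);
    used = strlen (buf);

    snprintf (buf + used, size - used, " (%s:%d)", file, line);
}

/* Report the error with the caller's file and line, then exit. */
#define INTERNAL_ERROR(...)                                         \
    do {                                                            \
        char message_[512];                                         \
        format_internal_error (message_, sizeof (message_),         \
                               __FILE__, __LINE__, __VA_ARGS__);    \
        fprintf (stderr, "%s\n", message_);                         \
        exit (1);                                                   \
    } while (0)
```

The point of routing everything through one macro is that `__FILE__` and `__LINE__` expand at each call site, so no caller has to spell out its own location.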
2009-10-25	add_message: Propagate error status from notmuch_message_create_for_message_id	Carl Worth
	What a great feeling to remove an XXX comment.
2009-10-25	Add comment documenting our current database schema.	Carl Worth
	I've got schemes to change this schema somewhat dramatically, so I want a place to be able to record and review those changes.
2009-10-25	Drop the storage of thread ID(s) in a value.	Carl Worth
	Now that we are iterating over the thread terms instead, we can drop this redundant storage (which should shrink our database a tiny bit).
2009-10-24	Shuffle the value numbers around in the database.	Carl Worth
	It's nice that for now we don't have any users yet, so we can make incompatible changes to the database layout like this without causing trouble. ;-) There are a few reasons for this change. First, we now use value 0 uniformly as a timestamp for both mail and timestamp documents, (which lets us clean up an ugly and fragile bare 0 in the add_value and get_value calls in the timestamp code). Second, I want to drop the thread value entirely, so putting it at the end of the list means we can drop it as a compatible change in the future. (I almost want to drop the message-ID value too, but it's nice to be able to sort on it to get diff-able output from "notmuch dump".) But the thread value we never use as a value, (we would never sort on it, for example). And it's totally redundant with the thread terms we store already. So expect it to disappear soon.
2009-10-24	Invent our own prefix values.	Carl Worth
	We're now dropping all pretense of keeping the database directly compatible with sup's current xapian backend. (But perhaps someone might write a new notmuch backend for sup in the future.) In coming up with the prefix values here, I tried to follow the conventions of http://xapian.org/docs/omega/termprefixes.html as closely as makes sense, (with some domain translation from "web" to "email archive").
2009-10-24	Split BOOLEAN_PREFIX into INTERNAL and EXTERNAL subsets.	Carl Worth
	The idea here is that only some of the prefix names (such as "id" and "tag") actually make sense in external user-supplied query strings. Other things like "type" are internal implementation details of how we store things in the database. So internal machinery will add those terms to the database and we don't need to support them in the string itself. With this, we can now simply loop over the external prefix values to let the query parser know about them. So as we add prefixes in the future, we'll only need to add them to this list.
2009-10-24	Change all occurrences of "msgid" to "id".	Carl Worth
	What's good for the user is good for the internals.
2009-10-24	Add the magic to allow searches such as "tag:inbox".	Carl Worth
	The key for this is to call add_boolean_prefix on the QueryParser object. That tells the query parser to take something like "tag:inbox" and transform it into the "Linbox" term and do what it needs to do to make this term a requirement of the search. We're starting to have a real system here. Also, I didn't want to expose the ugly name of "msgid" to the user, so we add a prefix name of simply "id" instead.
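To make the "tag:inbox" to "Linbox" translation concrete, here is a toy in plain C that mimics only the name-to-prefix mapping. It does not use Xapian and is not how the query parser works internally. The "tag" to "L" entry follows the example in the commit message; the "Q" prefix for "id" is a placeholder assumption, and all identifiers are hypothetical.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *name;    /* what the user types before the colon */
    const char *prefix;  /* what is prepended to the stored term */
} prefix_t;

/* "tag" -> "L" matches the "Linbox" example; the "id" prefix is a
 * placeholder chosen for this sketch only. */
static const prefix_t EXTERNAL_PREFIXES[] = {
    { "tag", "L" },
    { "id",  "Q" }
};

/* Translate "name:value" into a prefixed term written to `term`.
 * Returns 0 on success, -1 if the name is unknown or the result
 * does not fit in `size` bytes. */
static int
term_for_query (const char *query, char *term, size_t size)
{
    const char *colon = strchr (query, ':');
    size_t i, name_len;

    if (colon == NULL)
        return -1;
    name_len = (size_t) (colon - query);

    for (i = 0; i < sizeof (EXTERNAL_PREFIXES) / sizeof (EXTERNAL_PREFIXES[0]); i++) {
        const prefix_t *p = &EXTERNAL_PREFIXES[i];

        if (strlen (p->name) == name_len &&
            strncmp (query, p->name, name_len) == 0) {
            int n = snprintf (term, size, "%s%s", p->prefix, colon + 1);

            return (n >= 0 && (size_t) n < size) ? 0 : -1;
        }
    }

    return -1;
}
```

Names missing from the table, such as an internal "type", are simply rejected, which is roughly the effect of registering only the external prefixes with the parser.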
2009-10-24	Fix timestamp generation to avoid overflowing the term limit	Carl Worth
	The previous code was only correct as long as the timestamp prefix was only a single character. But with the recent change to a multi-character prefix, this broke. So fix it now.
2009-10-24	Trim down prefix list to things we are actually using.	Carl Worth
	I've decided not to try for sup compatibility at the level of the xapian database. There's just too much about sup's usage of the database that I don't like, (beyond the embedded ruby data structures there is redundant storage of message IDs, thread IDs, and dates (in both terms and values)). I'm going to fix that up in the database of notmuch, with some other changes as well. (I plan to drop "reference" terms once linkage to a thread ID through the reference is established. I also plan to add actual documents to represent threads.) So with all that incompatibility, I might as well make my own prefix values. And while doing that, I should try to be as compatible as possible with the conventions described here: http://xapian.org/docs/omega/termprefixes.html
2009-10-24	Move the prefix-string arrays back into database.cc from message.cc	Carl Worth
	Yes, I'm being wishy-washy here, moving code back and forth. But this is where these really do belong.
2009-10-23	Add NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID	Carl Worth
	And document that notmuch_database_add_message can return this value. This pushes the hard decision of what to do with duplicate messages out to the user, but that's OK. (We weren't really doing anything with these ourselves, and this way the user is at least informed of the issue, rather than it just getting papered over internally.)
2009-10-23	Clarify documentation and error string for NOTMUCH_STATUS_TAG_TOO_LONG	Carl Worth
	It's helpful to point out NOTMUCH_STATUS_TAG_MAX for users.
2009-10-23	Add notmuch_database_set_timestamp and notmuch_database_get_timestamp	Carl Worth
	These will be very helpful to implement an efficient "notmuch new" command which imports new mail messages that have appeared.
2009-10-23	database: Add private find_unique_doc_id and find_unique_document functions	Carl Worth
	These are a generalization of the uniqueness testing of notmuch_database_find_message. More preparation for directory timestamps.
2009-10-23	database: Similarly rename find_message_by_docid to find_document_for_doc_id	Carl Worth
	Again preferring notmuch_database_t* over Xapian::Database*. Also, we're standardizing on "doc_id" rather than "docid" locally, (as an analogue to "message_id"), in spite of the "Xapian::docid" name, (which, fortunately, we can ignore and just use "unsigned int" instead).
2009-10-23	database: Rename internal find_messages_by_term to find_doc_ids	Carl Worth
	This name is a more accurate description of what it does, and the more general naming will make sense as we start storing non-message documents in the database (such as directory timestamps). Also, don't pass around a Xapian::Database where it's more our style to pass a notmuch_database_t*.
2009-10-23	add_message: Fix to not add multiple documents with the same message ID	Carl Worth
	Here's the second big fix to message-ID handling, (the first was to generate message IDs when an email contained none). Now, with no document missing a message ID, and no two documents having the same message ID, we have a nice consistent database where the message ID can be used as a unique key.
2009-10-23	add_message: Re-order the code a bit (find message-id first).	Carl Worth
	We're preparing for being able to deal with files with duplicate message IDs here. The plan is to create a notmuch_message_t object in add_message that may or may not reference a document that exists in the database. So to do this, we have to find the message ID before we do any manipulation of the doc.
2009-10-23	Move thread_id generation code from database.cc to message.cc	Carl Worth
	It's really up to the message to decide how to generate these.
2009-10-23	add_message: Rename message to message_file	Carl Worth
	I still don't like the name message_file at all, but we're about to start using a notmuch_message_t in this function so we need to do something to keep the identifiers separate for now. Eventually, it probably makes sense to push the message-parsing code from database.cc to message.cc.
2009-10-22	Don't forget the "to" header when restricting parsing to certain headers	Carl Worth
	We recently started discarding files as "not email" if they have none of Subject, From, or To. Apparently, my mail collection contains a number of messages that I sent, that are saved without Subject and From, (perhaps these were drafts?). Anyway, it's fortunate I had those since they alerted me to this bug, where we were not parsing the "To" header in some cases.
2009-10-22	Fix missing error check.	Carl Worth
	The notmuch_message_file_open function is perfectly capable of returning NULL. So check for it.
2009-10-22	Generate message ID (using SHA1) when a mail message contains none.	Carl Worth
	This is important as we're using the message ID as the unique key in our database. So previously, all messages with no message ID would be treated as the same message---not good at all.
2009-10-21	Merge branch from fixing up bugs after bisecting.	Carl Worth
	I'm glad that when I implemented "notmuch restore" I went through the extra effort to split the code I had written in one sitting into over a dozen commits. Sure enough, I hadn't tested well enough and had totally broken "notmuch setup", (segfaults and bogus thread_id values). With the little commits I had made, git bisect saved the day, and I went back to make the fixes right on top of the commits that introduced the bugs. So now we octopus merge those in.
2009-10-21	Bring back the insert_thread_id function.	Carl Worth
	We deleted this in favor of our fancy new thread_ids iterator from the message object. But one of the previous callers of insert_thread_id isn't using notmuch_message_t yet. I made the mistake of thinking I could just call g_hash_table_insert directly, but the problem was that nobody was splitting up the thread_id string at its commas. So with this, we were inserting bogus comma-separated IDs into the hash table, so thread_id values were ballooning out of control. Should be much better now.
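The missing step, splitting the stored string at its commas before inserting each ID, might be sketched like this. The sketch stores the pieces in a plain array rather than a GLib hash table, and `split_thread_ids` and `MAX_ID_LEN` are hypothetical names.

```c
#include <string.h>

#define MAX_ID_LEN 64

/* Split "id1,id2,..." into `ids`, returning the number stored.
 * Empty pieces and pieces too long for MAX_ID_LEN are skipped;
 * splitting stops once `max_ids` entries have been stored. */
static int
split_thread_ids (const char *thread_ids, char ids[][MAX_ID_LEN], int max_ids)
{
    int count = 0;
    const char *start = thread_ids;

    while (*start && count < max_ids) {
        /* Length of the piece up to the next comma or end of string. */
        size_t len = strcspn (start, ",");

        if (len > 0 && len < MAX_ID_LEN) {
            memcpy (ids[count], start, len);
            ids[count][len] = '\0';
            count++;
        }

        start += len;
        if (*start == ',')
            start++;
    }

    return count;
}
```

Inserting the whole unsplit string as a single key is what would make each new value a longer and longer comma-separated list, as the commit message describes.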