notmuch - thread-based email index, search, and tagging

Age	Commit message	Author
2009-10-24	Add a preliminary "notmuch search" command.	Carl Worth
	This isn't behaving at all like it's documented yet, (for example, it's returning message IDs not thread IDs[*]). In fact, the output code is just a copy of the body of "notmuch dump", so all you get for now is message ID and tags. But this should at least be enough to start exercising the query functionality, (which is currently very buggy). [*] I'll want to convert the database to store thread documents before fixing that.
2009-10-24	notmuch_database_create: Document idea to (optionally) return a status	Carl Worth
	The current problem is that when this function fails the caller doesn't get any information about what the particular failure was, (something in the filesystem? or in Xapian?). We should fix that.
2009-10-24	notmuch setup/new: Propagate failure from notmuch_database_set_timestamp	Carl Worth
	With some recent testing, the timestamp was failing, (overflowing the term limit), and reporting an error, but the top-level notmuch command was still returning a success return value. I think it's high time to add a test suite, (and the code base is small enough that if we add it now it shouldn't be too hard to shoot for a very high coverage percentage).
2009-10-24	Fix timestamp generation to avoid overflowing the term limit	Carl Worth
	The previous code was only correct as long as the timestamp prefix was only a single character. But with the recent change to a multi-character prefix, this broke. So fix it now.
2009-10-24	Trim down prefix list to things we are actually using.	Carl Worth
	I've decided not to try for sup compatibility at the level of the Xapian database. There's just too much about sup's usage of the database that I don't like, (beyond the embedded ruby data structures there is redundant storage of message IDs, thread IDs, and dates (in both terms and values)). I'm going to fix that up in the database of notmuch, with some other changes as well. (I plan to drop "reference" terms once linkage to a thread ID through the reference is established. I also plan to add actual documents to represent threads.) So with all that incompatibility, I might as well make my own prefix values. And while doing that, I should try to be as compatible as possible with the conventions described here: http://xapian.org/docs/omega/termprefixes.html
2009-10-24	Move the prefix-string arrays back into database.cc from message.cc	Carl Worth
	Yes, I'm being wishy-washy here, moving code back and forth. But this is where these really do belong.
2009-10-24	Revert "Remove some unneeded initializers."	Carl Worth
	This reverts commit fb1bae07002d45138832eacb280419dbd7a19774. These initializers were totally necessary. I clearly wasn't thinking straight when I removed them.
2009-10-23	Cut the enthusiasm a bit.	Carl Worth
	It gets annoying pretty quick.
2009-10-23	Make "notmuch new" ignore directories that are read-only.	Carl Worth
	With this, "notmuch new" is now plenty fast even with large archives spanning many sub-directories. Document this both in "notmuch help" and also in the output of notmuch setup.
2009-10-23	add_files: Pull one stat out of the recursive function.	Carl Worth
	There's no need to stat each directory both before and after each recursive call.
2009-10-23	More fixing of plurals.	Carl Worth
	It definitely doesn't help that we have the same messages in both "setup" and "new". Should combine those really.
2009-10-23	More care in final status reporting.	Carl Worth
	Printing "Added 1 new messages" just looks like lack of attention to detail, (but yes plurals can be annoying this way).
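	A minimal sketch of the plural handling described above. The helper names pluralize and format_added are hypothetical and are not taken from the notmuch source; the exact wording of the status line is likewise an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: pick the singular or plural noun for a count. */
static const char *
pluralize (int count, const char *singular, const char *plural)
{
    return count == 1 ? singular : plural;
}

/* Hypothetical helper: format the final status line into buf.
 * Returns what snprintf returns (the length of the full message). */
static int
format_added (char *buf, size_t size, int count)
{
    assert (buf != NULL && count >= 0);

    return snprintf (buf, size, "Added %d new %s.",
                     count, pluralize (count, "message", "messages"));
}
```

	Keeping the noun choice in one helper also addresses the duplication mentioned for "setup" and "new": both commands could share the same formatting function.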
2009-10-23	Print a better message than "0s" for zero seconds.	Carl Worth
	It's nice to have a tool that at least constructs actual sentences.
2009-10-23	Add new "notmuch new" command.	Carl Worth
	Finally, I can get new messages into my notmuch database without having to run a complete "notmuch setup" again. This takes advantage of the recent timestamp capabilities in the database to avoid looking into directories that haven't changed since the last time "notmuch new" was run.
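	The timestamp check can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual notmuch code: the function names are hypothetical, a stored timestamp of 0 is assumed to mean "never scanned", and the strict comparison is one possible choice (a more conservative version would also rescan when the two times are equal).

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical predicate: should a directory be scanned again?
 * stored_timestamp is the time recorded at the previous scan;
 * 0 means the directory has never been scanned (an assumption). */
static bool
directory_needs_scan (time_t dir_mtime, time_t stored_timestamp)
{
    if (stored_timestamp == 0)
        return true;

    return dir_mtime > stored_timestamp;
}

/* Hypothetical wrapper: look up the mtime of path with stat(2). */
static bool
path_needs_scan (const char *path, time_t stored_timestamp)
{
    struct stat st;

    assert (path != NULL);

    if (stat (path, &st) != 0)
        return false; /* Cannot stat: nothing to scan in this sketch. */

    return directory_needs_scan (st.st_mtime, stored_timestamp);
}
```

	A directory's mtime changes when entries are added to or removed from it, which is why a single comparison per directory is enough to skip unchanged parts of a large archive.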
2009-10-23	add_files: Change to return a status value instead of void	Carl Worth
	Also change to use goto rather than early returns. And once again, there were lots of bugs in the error cases previously.
2009-10-23	notmuch setup: Clean up the progress printing a bit.	Carl Worth
	Get rid of a useless leading 0 on the seconds value, and make a distinction between "files" and "messages", (we process many files, but not all of them are recognized as messages). Finally, add a summary line at the end saying how many unique messages were added to the database. Since this comes right after the total number of files, it gives the user at least a hint as to how many messages were encountered with duplicate message IDs.
2009-10-23	Re-order documentation a bit.	Carl Worth
	The notmuch_database_get_default_path function is unique in not accepting a notmuch_database_t* (nor creating one). So list it outside the other notmuch_database functions.
2009-10-23	notmuch_message_get_filename: Improve documentation.	Carl Worth
	Fix a typo, and add clarifications about the lifetime and readonly nature of the return value.
2009-10-23	Remove some unneeded initializers.	Carl Worth
	Some people might argue for more initializers to be "safer", but I actually prefer to leave things this way. It saves typing, but the real benefit is that the things that do require initialization stand out so we know to watch them carefully. And with valgrind, we actually get to catch errors earlier if we don't initialize them. So that can be "safer" ironically enough.
2009-10-23	notmuch setup: Fix a couple of error paths.	Carl Worth
	We had early returns instead of goto statements, and sure enough, they were leaking. Much cleaner this way.
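	The goto-based cleanup style referred to here looks roughly like the sketch below. The function count_lines and its status codes are hypothetical, chosen only to show the pattern: every error path jumps to one label, so each resource is released in exactly one place.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
typedef enum {
    SKETCH_SUCCESS = 0,
    SKETCH_OUT_OF_MEMORY,
    SKETCH_FILE_ERROR
} sketch_status_t;

#define SKETCH_BUF_SIZE 4096

/* Count newline characters in a file. All exits go through DONE,
 * so the buffer and the file are released on every path. */
static sketch_status_t
count_lines (const char *path, size_t *count_out)
{
    sketch_status_t ret = SKETCH_SUCCESS;
    char *buf = NULL;
    FILE *file = NULL;
    size_t count = 0;

    assert (path != NULL && count_out != NULL);

    buf = malloc (SKETCH_BUF_SIZE);
    if (buf == NULL) {
        ret = SKETCH_OUT_OF_MEMORY;
        goto DONE;
    }

    file = fopen (path, "r");
    if (file == NULL) {
        ret = SKETCH_FILE_ERROR;
        goto DONE; /* An early return here would leak buf. */
    }

    while (fgets (buf, SKETCH_BUF_SIZE, file) != NULL) {
        if (strchr (buf, '\n') != NULL)
            count++;
    }

    if (ferror (file)) {
        ret = SKETCH_FILE_ERROR;
        goto DONE;
    }

    *count_out = count;

  DONE:
    if (file != NULL)
        fclose (file);
    free (buf); /* free (NULL) is a no-op. */

    return ret;
}
```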
2009-10-23	_find_prefix: Exit when given an invalid prefix name.	Carl Worth
	This will be a nice safety check for internal sanity.
2009-10-23	Add NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID	Carl Worth
	And document that notmuch_database_add_message can return this value. This pushes the hard decision of what to do with duplicate messages out to the user, but that's OK. (We weren't really doing anything with these ourselves, and this way the user is at least informed of the issue, rather than it just getting papered over internally.)
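	To illustrate the idea of reporting duplicates to the caller, here is a toy in-memory sketch. It does not use the notmuch API: sketch_database_t, sketch_add_message, and the SKETCH_STATUS_* values are all hypothetical stand-ins, and a fixed-size array replaces the real database.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical status values, modeled loosely on the idea above. */
typedef enum {
    SKETCH_STATUS_SUCCESS = 0,
    SKETCH_STATUS_DUPLICATE_MESSAGE_ID,
    SKETCH_STATUS_FULL
} sketch_add_status_t;

#define SKETCH_MAX_MESSAGES 16

/* Toy "database": just a list of borrowed message-ID strings. */
typedef struct {
    const char *message_ids[SKETCH_MAX_MESSAGES];
    size_t count;
} sketch_database_t;

/* Add message_id unless it is already present. The caller learns
 * about the duplicate and decides what to do with it. */
static sketch_add_status_t
sketch_add_message (sketch_database_t *db, const char *message_id)
{
    size_t i;

    assert (db != NULL && message_id != NULL);

    for (i = 0; i < db->count; i++) {
        if (strcmp (db->message_ids[i], message_id) == 0)
            return SKETCH_STATUS_DUPLICATE_MESSAGE_ID;
    }

    if (db->count == SKETCH_MAX_MESSAGES)
        return SKETCH_STATUS_FULL;

    db->message_ids[db->count++] = message_id;

    return SKETCH_STATUS_SUCCESS;
}
```

	A caller might count duplicates for a summary line, log the second filename, or ignore the status entirely; the point is that the choice is visible rather than hidden inside the library.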
2009-10-23	Clean up comments to not include spaces before tabs.	Carl Worth
	These were just unclean, (an invisible sort of uncleanliness, but still these are liable to make for ugly diffs). Oh, wait, like this one! But at least it's not sprinkled among code changes.
2009-10-23	Clarify documentation and error string for NOTMUCH_STATUS_TAG_TOO_LONG	Carl Worth
	It's helpful to point out NOTMUCH_STATUS_TAG_MAX for users.
2009-10-23	Add notmuch_database_set_timestamp and notmuch_database_get_timestamp	Carl Worth
	These will be very helpful to implement an efficient "notmuch new" command which imports new mail messages that have appeared.
2009-10-23	database: Add private find_unique_doc_id and find_unique_document functions	Carl Worth
	These are a generalization of the uniqueness testing of notmuch_database_find_message. More preparation for directory timestamps.
2009-10-23	database: Similarly rename find_message_by_docid to find_document_for_doc_id	Carl Worth
	Again preferring notmuch_database_t* over Xapian::Database*. Also, we're standardizing on "doc_id" rather than "docid" locally, (as an analogue to "message_id"), in spite of the "Xapian::docid" name, (which, fortunately, we can ignore and just use "unsigned int" instead).
2009-10-23	database: Rename internal find_messages_by_term to find_doc_ids	Carl Worth
	This name is a more accurate description of what it does, and the more general naming will make sense as we start storing non-message documents in the database (such as directory timestamps). Also, don't pass around a Xapian::Database where it's more our style to pass a notmuch_database_t*.
2009-10-23	sha1: Add new notmuch_sha1_of_string function	Carl Worth
	We'll be using this for storing really long terms in the database and when we just need to look them up, (and never read back the original data directly from the database). For example, storing arbitrarily long directory paths in the database along with mtime timestamps. Note that if we did want to store arbitrarily long terms and also be able to read them back, the Xapian folks recommend splitting the term off with multiple prefixes. See the note near the end of this page: http://trac.xapian.org/wiki/FAQ/UniqueIds
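	The idea of replacing a long string with a fixed-length digest can be sketched as below. To stay self-contained, this sketch swaps in 64-bit FNV-1a for SHA-1; FNV-1a is not collision-resistant and is used here only to show the shape of the technique. The function names and the "XDIR" prefix in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_DIGEST_HEX_LEN 16 /* 64 bits printed as hex */

/* 64-bit FNV-1a. A stand-in for SHA-1 in this sketch only. */
static uint64_t
fnv1a_64 (const char *str)
{
    uint64_t hash = 14695981039346656037ULL; /* FNV offset basis */

    for (; *str != '\0'; str++) {
        hash ^= (unsigned char) *str;
        hash *= 1099511628211ULL;            /* FNV prime */
    }

    return hash;
}

/* Write "<prefix><hex digest of path>" into term. The result has the
 * same length no matter how long path is, so it cannot overflow a
 * term-length limit. Returns 0 on success, -1 if term is too small. */
static int
make_hashed_term (char *term, size_t size,
                  const char *prefix, const char *path)
{
    assert (term != NULL && prefix != NULL && path != NULL);

    if (strlen (prefix) + SKETCH_DIGEST_HEX_LEN >= size)
        return -1;

    snprintf (term, size, "%s%016llx",
              prefix, (unsigned long long) fnv1a_64 (path));

    return 0;
}
```

	For example, make_hashed_term (term, sizeof term, "XDIR", path) always yields a 20-character term. The original path cannot be recovered from it, which matches the look-up-only use described above.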
2009-10-23	notmuch restore: Print names of tags that cannot be applied	Carl Worth
	This helps the user gauge the severity of the error. For example, when restoring my sup tags I see a bunch of tags missing for message IDs of the form "sup-faked-...". That's not surprising since I know that sup generates these with the md5sum of the message header while notmuch uses the sha-1 of the entire message. But how much will this hurt? Well, now that I can see that most of the missing tags are just "attachment", then I'm not concerned, (I'll be automatically creating that tag in the future based on the message contents). But if a missing tag is "inbox" then that's more concerning because that's data that I can't easily regenerate outside of sup.
2009-10-23	notmuch_tags_has_more: Fix to use string.empty rather than string.size	Carl Worth
	I'm really interested in the length of the data here, not the size of the storage.
2009-10-23	Fix notmuch_message_get_message_id to never return NULL.	Carl Worth
	With the recent improvements to the handling of message IDs we "know" that a NULL message ID is impossible, (so we simply abort if the impossible happens).
2009-10-23	add_message: Fix to not add multiple documents with the same message ID	Carl Worth
	Here's the second big fix to message-ID handling, (the first was to generate message IDs when an email contained none). Now, with no document missing a message ID, and no two documents having the same message ID, we have a nice consistent database where the message ID can be used as a unique key.
2009-10-23	Add _notmuch_message_create_for_message_id	Carl Worth
	This is the last piece needed for add_message to be able to properly support a message with a duplicate message ID. This function creates a new notmuch_message_t object but one that may reference an existing document in the database.
2009-10-23	Fix _notmuch_message_create to catch Xapian DocNotFoundError.	Carl Worth
	This function is only supposed to be called with a doc_id that was queried from the database already. So there's an internal error if no document with that doc_id can be found in the database. In that case, return NULL.
2009-10-23	Add internal functions for manipulating a new notmuch_message_t	Carl Worth
	This will support the add_message function in incrementally creating state in a new notmuch_message_t. The new functions are: _notmuch_message_set_filename, _notmuch_message_add_thread_id, _notmuch_message_ensure_thread_id, _notmuch_message_set_date, and _notmuch_message_sync.
2009-10-23	Add notmuch_message_get_filename	Carl Worth
	This is a new public function to find the filename of the original email message for a message-object that was found in the database. We may change this function in the future to support returning a list of filenames, (for messages with duplicate message IDs).
2009-10-23	add_message: Re-order the code a bit (find message-id first).	Carl Worth
	We're preparing for being able to deal with files with duplicate message IDs here. The plan is to create a notmuch_message_t object in add_message that may or may not reference a document that exists in the database. So to do this, we have to find the message ID before we do any manipulation of the doc.
2009-10-23	Move thread_id generation code from database.cc to message.cc	Carl Worth
	It's really up to the message to decide how to generate these.
2009-10-23	Move the _notmuch_message_sync from private to public interfaces	Carl Worth
	The idea here is to allow internal users to see a non-synced message object, (for example, while parsing a message file and incrementally adding terms, etc.). We're willing to take the care to get the improved performance. But for the public interface, keeping everything synced will be much less confusing, (reference lots of sup bugs that happen due to message state being altered by the user but not synced to the database).
2009-10-23	add_message: Rename message to message_file	Carl Worth
	I still don't like the name message_file at all, but we're about to start using a notmuch_message_t in this function so we need to do something to keep the identifiers separate for now. Eventually, it probably makes sense to push the message-parsing code from database.cc to message.cc.
2009-10-22	Prevent that last bug from reoccurring.	Carl Worth
	It's easy enough to check if a "missing" header was accidentally left off the list in the call to restrict_headers. (And it's cheap since we only check in case no such header was found in the message.)
2009-10-22	Don't forget the "to" header when restrict parsing to certain headers	Carl Worth
	We recently started discarding files as "not email" if they have none of Subject, From, nor To. Apparently, my mail collection contains a number of messages that I sent, that are saved without Subject and From, (perhaps these were drafts?). Anyway, it's fortunate I had those since they alerted me to this bug, where we were not parsing the "To" header in some cases.
2009-10-22	Fix missing error check.	Carl Worth
	The notmuch_message_file_open function is perfectly capable of returning NULL. So check for it.
2009-10-22	Generate message ID (using SHA1) when a mail message contains none.	Carl Worth
	This is important as we're using the message ID as the unique key in our database. So previously, all messages with no message ID would be treated as the same message---not good at all.
2009-10-21	Rename sha1.c to libsha1.c	Carl Worth
	This way both the .c and .h files have the same name, and all of the code imported from the "libsha1" implementation is in filenames matching libsha1.*. This also gives me room to make my own notmuch_sha1 wrapper functions in sha1.c.
2009-10-21	Merge branch from fixing up bugs after bisecting.	Carl Worth
	I'm glad that when I implemented "notmuch restore" I went through the extra effort to split the code I had written in one sitting into over a dozen commits. Sure enough, I hadn't tested well enough and had totally broken "notmuch setup", (segfaults and bogus thread_id values). With the little commits I had made, git bisect saved the day, and I went back to make the fixes right on top of the commits that introduced the bugs. So now we octopus merge those in.
2009-10-21	Bring back the insert_thread_id function.	Carl Worth
	We deleted this in favor of our fancy new thread_ids iterator from the message object. But one of the previous callers of insert_thread_id isn't using notmuch_message_t yet. I made the mistake of thinking I could just call g_hash_table_insert directly, but the problem was that nobody was splitting up the thread_id string at its commas. So with this, we were inserting bogus comma-separated IDs into the hash table, so thread_id values were ballooning out of control. Should be much better now.
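	The splitting step that was missing can be sketched as follows. This is a plain-C illustration rather than the actual insert_thread_id code: split_thread_ids is a hypothetical name, and the IDs are copied into an array instead of being inserted into a GLib hash table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split a comma-separated list of thread IDs into separately
 * allocated strings. Stores at most max pointers in out and returns
 * how many were stored. The caller frees each returned string. */
static size_t
split_thread_ids (const char *thread_ids, char **out, size_t max)
{
    const char *start = thread_ids;
    size_t n = 0;

    assert (thread_ids != NULL && out != NULL);

    while (*start != '\0' && n < max) {
        /* Length of the run up to the next comma or end of string. */
        size_t len = strcspn (start, ",");

        if (len > 0) {
            char *id = malloc (len + 1);

            if (id == NULL)
                break;

            memcpy (id, start, len);
            id[len] = '\0';
            out[n++] = id;
        }

        start += len;
        if (*start == ',')
            start++; /* Skip the separator. */
    }

    return n;
}
```

	Inserting each element of the result individually, rather than the whole comma-separated string, keeps every key in the table a single thread ID.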
2009-10-21	Fix lifetime-maintenance bug with std::string and c_str()	Carl Worth
	Here's more evidence that C++ is a nightmare to program---or that I'm smart enough to realize that C++ is more clever than I will ever be. Most of my issues with C++ have to do with it hiding things from me that I'd really like to and expect to be aware of as a C programmer. For example, the specific problem here is that there's a short-lived std::string, from which I just want to copy the C string. I try to do that on the next line, but before I can, C++ has already called the destructor on the std::string. Now, C++ isn't alone in doing garbage collecting like this. But in a real garbage-collecting system, everything would work that way. For example, here, I'm still holding a pointer to the C string contents, so if the garbage collector were aware of that reference, then it might clean up the std::string container and leave the data I'm still using. But that's not what we get with C++. Instead, some things are reference counted and collected, (like the std::string), and some things just aren't (like the C string it contains). The end result is that it's very fragile. It forces me to be aware of the timing of hidden functions. In a "real" system I wouldn't have to be aware of that timing, and in C the function just wouldn't be hidden.
2009-10-21	List a few more co-conspirators.	Carl Worth
	Keith's name already shows up in the git log, so it would be wrong to not mention him. And Martin and Jamey have been helpful in discussions about what an ideal mail system would look like.