<feed xmlns='http://www.w3.org/2005/Atom'>
<title>notmuch/lib/index.cc, branch 0.19</title>
<subtitle>thread-based email index, search, and tagging</subtitle>
<id>https://git.notmuchmail.org/git/notmuch/atom?h=0.19</id>
<link rel='self' href='https://git.notmuchmail.org/git/notmuch/atom?h=0.19'/>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/'/>
<updated>2014-06-18T20:55:14Z</updated>
<entry>
<title>lib: Index name and address of from/to headers as a phrase</title>
<updated>2014-06-18T20:55:14Z</updated>
<author>
<name>Austin Clements</name>
<email>amdragon@MIT.EDU</email>
</author>
<published>2014-06-16T02:40:32Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=44327ca86d8e3563490801f57a2d1ca455d9588e'/>
<id>urn:sha1:44327ca86d8e3563490801f57a2d1ca455d9588e</id>
<content type='text'>
Previously, we indexed the name and address parts of from/to headers
with two calls to _notmuch_message_gen_terms.  In general, this
indicates that these parts are separate phrases.  However, because of
an implementation quirk, the two calls to _notmuch_message_gen_terms
generated adjacent term positions for the prefixed terms, which
happens to be the right thing to do in this case, but the wrong thing
to do for all other calls.  Furthermore, _notmuch_message_gen_terms
produced potentially overlapping term positions for the un-prefixed
copies of the terms, which is simply wrong.

This change indexes both the name and address in a single call to
_notmuch_message_gen_terms, indicating that they should be part of a
single phrase.  This masks the problem with the un-prefixed terms
(fixing the two known-broken tests) and puts us in a position to fix
the unintentional phrases generated by other calls to
_notmuch_message_gen_terms.
</content>
</entry>
<entry>
<title>lib: replace the header parser with gmime</title>
<updated>2014-04-05T15:53:04Z</updated>
<author>
<name>Jani Nikula</name>
<email>jani@nikula.org</email>
</author>
<published>2014-03-30T21:21:49Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=473930bb6fb167078a9428ad85f53accf7d4559f'/>
<id>urn:sha1:473930bb6fb167078a9428ad85f53accf7d4559f</id>
<content type='text'>
The notmuch library includes a full-blown message header parser. Yet
the same message headers are parsed by gmime during indexing. Switch
to gmime parsing completely.

These are the main changes:

* Gmime stops header parsing at the first invalid header, and presumes
  the message body starts from there. The current parser is quite
  liberal in accepting broken headers. The change means we will be
  much pickier about accepting invalid messages.

* The current parser converts tabs used in header folding to
  spaces. Gmime preserves the tabs. Due to a broken Python library used
  in mailman, there are plenty of mailing lists that produce headers
  with tabs in header folding, and we'll see plenty of tabs. (This
  change has been mitigated in preparatory patches.)

* For pure header parsing, the current parser is likely faster than
  gmime, which parses the whole message rather than just the
  headers. Since we parse the message and its headers using gmime for
  indexing anyway, this avoids an extra header parsing round when
  adding new messages. In case of duplicate messages, we'll end up
  parsing the full message although just headers would be
  sufficient. All in all this should still speed up 'notmuch new'.

* Calls to notmuch_message_get_header() may be slightly slower than
  previously for headers that are not indexed in the database, due to
  parsing of the whole message. Within the notmuch code base, notmuch
  reply is the only such user.
</content>
</entry>
<entry>
<title>lib: drop support for single-message mbox files</title>
<updated>2014-04-05T15:52:42Z</updated>
<author>
<name>Jani Nikula</name>
<email>jani@nikula.org</email>
</author>
<published>2014-03-30T21:21:48Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=6812136bf576d894591606d9e10096719054d1f9'/>
<id>urn:sha1:6812136bf576d894591606d9e10096719054d1f9</id>
<content type='text'>
We've supported mbox files containing a single message for historical
reasons, but the support has been deprecated, with a warning message
while indexing, since Notmuch 0.15. Finally drop the support, and
consider all mbox files non-email.
</content>
</entry>
<entry>
<title>lib/cli: pass GMIME_ENABLE_RFC2047_WORKAROUNDS to g_mime_init()</title>
<updated>2013-09-14T17:13:43Z</updated>
<author>
<name>Jani Nikula</name>
<email>jani@nikula.org</email>
</author>
<published>2013-09-11T17:36:43Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=71521f06b00a01c5b0eaea5f5f624fe57ed7f426'/>
<id>urn:sha1:71521f06b00a01c5b0eaea5f5f624fe57ed7f426</id>
<content type='text'>
As explained by Jeffrey Stedfast, the author of GMime, quoted in [1]:

&gt; Passing the GMIME_ENABLE_RFC2047_WORKAROUNDS flag to g_mime_init()
&gt; *should* solve the decoding problem mentioned in the thread. This
&gt; flag should be safe to pass into g_mime_init() without any bad side
&gt; effects and my unit tests do test that code-path.

The thread being referred to is [2].

[1] id:87bo56viyo.fsf@nikula.org
[2] id:08cb1dcd-c5db-4e33-8b09-7730cb3d59a2@gmail.com
</content>
</entry>
<entry>
<title>_notmuch_message_index_file: unref (free) address lists from gmime.</title>
<updated>2012-12-24T23:02:22Z</updated>
<author>
<name>David Bremner</name>
<email>bremner@debian.org</email>
</author>
<published>2012-12-11T03:33:40Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=47693539a64b884cbd9bffc9c832162848ad98f2'/>
<id>urn:sha1:47693539a64b884cbd9bffc9c832162848ad98f2</id>
<content type='text'>
Apparently as of GMime 2.4, you don't need to call
internet_address_list_destroy anymore, but you still need to call
g_object_unref (from the GMime Changelog).

On the medium performance corpus, valgrind shows "possibly lost"
leakage in "notmuch new" dropping from 7M to 300k.
</content>
</entry>
<entry>
<title>lib: Reject multi-message mboxes and deprecate single-message mbox</title>
<updated>2012-11-27T01:12:10Z</updated>
<author>
<name>Austin Clements</name>
<email>amdragon@MIT.EDU</email>
</author>
<published>2012-11-25T06:16:01Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=610f0e09929a5f351f7c1c3850ac7e0d83ffe388'/>
<id>urn:sha1:610f0e09929a5f351f7c1c3850ac7e0d83ffe388</id>
<content type='text'>
Previously, we would treat multi-message mboxes as one giant email,
which, besides the obvious incorrect indexing, often led to
out-of-memory errors for archival mboxes.  Now we explicitly reject
multi-message mboxes.  For historical reasons, we retain support for
single-message mboxes, but officially deprecate this behavior.
</content>
</entry>
<entry>
<title>Convert non-UTF-8 parts to UTF-8 before indexing them</title>
<updated>2012-02-29T11:41:39Z</updated>
<author>
<name>Michal Sojka</name>
<email>sojkam1@fel.cvut.cz</email>
</author>
<published>2012-02-24T07:36:22Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=40edc971a82e236704216058591d4c7684f8058f'/>
<id>urn:sha1:40edc971a82e236704216058591d4c7684f8058f</id>
<content type='text'>
This fixes a bug that prevented searching for non-ASCII words in such
parts. The code here was copied from show_text_part_content(), because
the show command already does the needed conversion when showing the
message.
</content>
</entry>
<entry>
<title>Ignore encrypted parts when indexing.</title>
<updated>2011-12-29T21:44:43Z</updated>
<author>
<name>Jameson Graef Rollins</name>
<email>jrollins@finestructure.net</email>
</author>
<published>2011-12-28T20:14:29Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=ac7f84306474dbecea8f6fee2ef2e8d71cc950f7'/>
<id>urn:sha1:ac7f84306474dbecea8f6fee2ef2e8d71cc950f7</id>
<content type='text'>
It appears to be an oversight that encrypted parts were indexed
previously.  The terms generated from encrypted parts are meaningless
and do nothing but add bloat to the database.  It is not worth
indexing the encrypted content, just as it's not worth indexing the
signatures in signed parts.
</content>
</entry>
<entry>
<title>tag signed/encrypted during notmuch new</title>
<updated>2011-05-27T23:22:00Z</updated>
<author>
<name>Jameson Graef Rollins</name>
<email>jrollins@finestructure.net</email>
</author>
<published>2011-05-26T01:01:20Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=1d6b49561f50d6cde1b473f9887e37748e49c02c'/>
<id>urn:sha1:1d6b49561f50d6cde1b473f9887e37748e49c02c</id>
<content type='text'>
This patch adds the tag "signed" to messages with any multipart/signed
parts, and the tag "encrypted" to messages with any
multipart/encrypted parts.  This only occurs when messages are indexed
during notmuch new, so a database rebuild is required to have old
messages tagged.
</content>
</entry>
<entry>
<title>Fix to index the "Re" term present in any subject.</title>
<updated>2010-11-24T02:11:04Z</updated>
<author>
<name>Carl Worth</name>
<email>cworth@cworth.org</email>
</author>
<published>2010-11-24T02:11:04Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=c7b4d15d0ad78b6f28b50310358ae255e6a08008'/>
<id>urn:sha1:c7b4d15d0ad78b6f28b50310358ae255e6a08008</id>
<content type='text'>
This was a misfeature where notmuch had extra code that just threw
away legitimate information. It was never indexing an initial "Re"
term in a subject. But some users have legitimately wanted to search
for this term.

The original code was written this way merely for strict compatibility
with the indexing performed by sup, but we're not taking advantage of
that now anyway.
</content>
</entry>
</feed>
