<feed xmlns='http://www.w3.org/2005/Atom'>
<title>notmuch/lib/index.cc, branch 0.23.5</title>
<subtitle>thread-based email index, search, and tagging</subtitle>
<id>https://git.notmuchmail.org/git/notmuch/atom?h=0.23.5</id>
<link rel='self' href='https://git.notmuchmail.org/git/notmuch/atom?h=0.23.5'/>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/'/>
<updated>2016-06-05T11:32:17Z</updated>
<entry>
<title>Use https instead of http where possible</title>
<updated>2016-06-05T11:32:17Z</updated>
<author>
<name>Daniel Kahn Gillmor</name>
<email>dkg@fifthhorseman.net</email>
</author>
<published>2016-06-02T16:26:14Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=6a833a6e83865f6999707cc30768d07e1351c2cb'/>
<id>urn:sha1:6a833a6e83865f6999707cc30768d07e1351c2cb</id>
<content type='text'>
Many of the external links found in the notmuch source can be resolved
using https instead of http.  This changeset addresses as many as i
could find, without touching the e-mail corpus or expected outputs
found in tests.
</content>
</entry>
<entry>
<title>lib: whitespace cleanup</title>
<updated>2016-06-05T11:23:28Z</updated>
<author>
<name>Tomi Ollila</name>
<email>tomi.ollila@iki.fi</email>
</author>
<published>2016-05-28T17:45:31Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=cf09631a45d276826255d197c1d5c913a29c79f4'/>
<id>urn:sha1:cf09631a45d276826255d197c1d5c913a29c79f4</id>
<content type='text'>
Cleaned the following whitespace in lib/* files:

lib/index.cc:              1 line:  trailing whitespace
lib/database.cc            5 lines: 8 spaces at the beginning of line
lib/notmuch-private.h:     4 lines: 8 spaces at the beginning of line
lib/message.cc:            1 line:  trailing whitespace
lib/sha1.c:                1 line:  empty lines at the end of file
lib/query.cc:              2 lines: 8 spaces at the beginning of line
lib/gen-version-script.sh: 1 line:  trailing whitespace
</content>
</entry>
<entry>
<title>lib: content disposition values are not case-sensitive</title>
<updated>2015-11-19T11:47:29Z</updated>
<author>
<name>Jani Nikula</name>
<email>jani@nikula.org</email>
</author>
<published>2015-09-26T09:35:21Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=506b81679a883d2a96bcd17e7c826a3166bdf82e'/>
<id>urn:sha1:506b81679a883d2a96bcd17e7c826a3166bdf82e</id>
<content type='text'>
Per RFC 2183, the values for Content-Disposition values are not
case-sensitive. While at it, use the gmime function for getting at the
disposition string instead of referencing the field directly.

This fixes "attachment" tagging and filename term generation for
attachments while indexing.
</content>
</entry>
<entry>
<title>lib: replace almost all fprintfs in library with _n_d_log</title>
<updated>2015-03-28T23:34:15Z</updated>
<author>
<name>David Bremner</name>
<email>david@tethera.net</email>
</author>
<published>2014-12-26T16:25:35Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=736ac26407914425a9c94e86616225292cf716dd'/>
<id>urn:sha1:736ac26407914425a9c94e86616225292cf716dd</id>
<content type='text'>
This is not supposed to change any functionality from an end user
point of view. Note that it will eliminate some output to stderr. The
query debugging output is left as is; it doesn't really fit with the
current primitive logging model. The remaining "bad" fprintf will need
an internal API change.
</content>
</entry>
<entry>
<title>Add indexing for the mimetype term</title>
<updated>2015-01-24T15:47:59Z</updated>
<author>
<name>Todd</name>
<email>todd@electricoding.com</email>
</author>
<published>2015-01-22T23:43:38Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=b04bc967f9837e9d451ef88c276c744aa55accaa'/>
<id>urn:sha1:b04bc967f9837e9d451ef88c276c744aa55accaa</id>
<content type='text'>
This adds the indexing support for the "mimetype:" term and removes
the broken test flag.  The indexing is probablistic in Xapian terms,
which gives a better experience to end users.  Standard content-types
of the form "foo/bar" are automatically interpreted as phrases in
Xapian due to the embedded slash.

Assume, separate messages with application/pdf and application/x-pdf
are indexed, then:

- mimetype:application/x-pdf will find only the application/x-pdf
- mimetype:application/pdf will find only the application/pdf
- mimetype:pdf will find both of the messages
</content>
</entry>
<entry>
<title>lib: Index name and address of from/to headers as a phrase</title>
<updated>2014-06-18T20:55:14Z</updated>
<author>
<name>Austin Clements</name>
<email>amdragon@MIT.EDU</email>
</author>
<published>2014-06-16T02:40:32Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=44327ca86d8e3563490801f57a2d1ca455d9588e'/>
<id>urn:sha1:44327ca86d8e3563490801f57a2d1ca455d9588e</id>
<content type='text'>
Previously, we indexed the name and address parts of from/to headers
with two calls to _notmuch_message_gen_terms.  In general, this
indicates that these parts are separate phrases.  However, because of
an implementation quirk, the two calls to _notmuch_message_gen_terms
generated adjacent term positions for the prefixed terms, which
happens to be the right thing to do in this case, but the wrong thing
to do for all other calls.  Furthermore, _notmuch_message_gen_terms
produced potentially overlapping term positions for the un-prefixed
copies of the terms, which is simply wrong.

This change indexes both the name and address in a single call to
_notmuch_message_gen_terms, indicating that they should be part of a
single phrase.  This masks the problem with the un-prefixed terms
(fixing the two known-broken tests) and puts us in a position to fix
the unintentionally phrases generated by other calls to
_notmuch_message_gen_terms.
</content>
</entry>
<entry>
<title>lib: replace the header parser with gmime</title>
<updated>2014-04-05T15:53:04Z</updated>
<author>
<name>Jani Nikula</name>
<email>jani@nikula.org</email>
</author>
<published>2014-03-30T21:21:49Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=473930bb6fb167078a9428ad85f53accf7d4559f'/>
<id>urn:sha1:473930bb6fb167078a9428ad85f53accf7d4559f</id>
<content type='text'>
The notmuch library includes a full blown message header parser. Yet
the same message headers are parsed by gmime during indexing. Switch
to gmime parsing completely.

These are the main changes:

* Gmime stops header parsing at the first invalid header, and presumes
  the message body starts from there. The current parser is quite
  liberal in accepting broken headers. The change means we will be
  much pickier about accepting invalid messages.

* The current parser converts tabs used in header folding to
  spaces. Gmime preserve the tabs. Due to a broken python library used
  in mailman, there are plenty of mailing lists that produce headers
  with tabs in header folding, and we'll see plenty of tabs. (This
  change has been mitigated in preparatory patches.)

* For pure header parsing, the current parser is likely faster than
  gmime, which parses the whole message rather than just the
  headers. Since we parse the message and its headers using gmime for
  indexing anyway, this avoids and extra header parsing round when
  adding new messages. In case of duplicate messages, we'll end up
  parsing the full message although just headers would be
  sufficient. All in all this should still speed up 'notmuch new'.

* Calls to notmuch_message_get_header() may be slightly slower than
  previously for headers that are not indexed in the database, due to
  parsing of the whole message. Within the notmuch code base, notmuch
  reply is the only such user.
</content>
</entry>
<entry>
<title>lib: drop support for single-message mbox files</title>
<updated>2014-04-05T15:52:42Z</updated>
<author>
<name>Jani Nikula</name>
<email>jani@nikula.org</email>
</author>
<published>2014-03-30T21:21:48Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=6812136bf576d894591606d9e10096719054d1f9'/>
<id>urn:sha1:6812136bf576d894591606d9e10096719054d1f9</id>
<content type='text'>
We've supported mbox files containing a single message for historical
reasons, but the support has been deprecated, with a warning message
while indexing, since Notmuch 0.15. Finally drop the support, and
consider all mbox files non-email.
</content>
</entry>
<entry>
<title>lib/cli: pass GMIME_ENABLE_RFC2047_WORKAROUNDS to g_mime_init()</title>
<updated>2013-09-14T17:13:43Z</updated>
<author>
<name>Jani Nikula</name>
<email>jani@nikula.org</email>
</author>
<published>2013-09-11T17:36:43Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=71521f06b00a01c5b0eaea5f5f624fe57ed7f426'/>
<id>urn:sha1:71521f06b00a01c5b0eaea5f5f624fe57ed7f426</id>
<content type='text'>
As explained by Jeffrey Stedfast, the author of GMime, quoted in [1]:

&gt; Passing the GMIME_ENABLE_RFC2047_WORKAROUNDS flag to g_mime_init()
&gt; *should* solve the decoding problem mentioned in the thread. This
&gt; flag should be safe to pass into g_mime_init() without any bad side
&gt; effects and my unit tests do test that code-path.

The thread being referred to is [2].

[1] id:87bo56viyo.fsf@nikula.org
[2] id:08cb1dcd-c5db-4e33-8b09-7730cb3d59a2@gmail.com
</content>
</entry>
<entry>
<title>_notmuch_message_index_file: unref (free) address lists from gmime.</title>
<updated>2012-12-24T23:02:22Z</updated>
<author>
<name>David Bremner</name>
<email>bremner@debian.org</email>
</author>
<published>2012-12-11T03:33:40Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=47693539a64b884cbd9bffc9c832162848ad98f2'/>
<id>urn:sha1:47693539a64b884cbd9bffc9c832162848ad98f2</id>
<content type='text'>
Apparently as of GMime 2.4, you don't need to call
internet_address_list_destroy anymore, but you still need to call
g_object_unref (from the GMime Changelog).

On the medium performance corpus, valgrind shows "possibly lost"
leakage in "notmuch new" dropping from 7M to 300k.
</content>
</entry>
</feed>
