<feed xmlns='http://www.w3.org/2005/Atom'>
<title>notmuch/test/T680-html-indexing.sh, branch master</title>
<subtitle>thread-based email index, search, and tagging</subtitle>
<id>https://git.notmuchmail.org/git/notmuch/atom?h=master</id>
<link rel='self' href='https://git.notmuchmail.org/git/notmuch/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/'/>
<updated>2017-10-20T22:52:49Z</updated>
<entry>
<title>test: use $(dirname "$0") for sourcing test-lib.sh</title>
<updated>2017-10-20T22:52:49Z</updated>
<author>
<name>Jani Nikula</name>
<email>jani@nikula.org</email>
</author>
<published>2017-09-25T20:38:19Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=a863de1e43ee34f6f5794a2759fdceb287e851aa'/>
<id>urn:sha1:a863de1e43ee34f6f5794a2759fdceb287e851aa</id>
<content type='text'>
Don't assume the tests are always run from within the source tree.
</content>
</entry>
<entry>
<title>lib/index: add simple html filter</title>
<updated>2017-07-01T15:32:27Z</updated>
<author>
<name>David Bremner</name>
<email>david@tethera.net</email>
</author>
<published>2017-06-08T02:11:49Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=6dd00d64863dfc0563877ca7899231b8c3058c49'/>
<id>urn:sha1:6dd00d64863dfc0563877ca7899231b8c3058c49</id>
<content type='text'>
The filter just drops all (HTML) tags. As an enabling change, pass the
content type to the filter constructor so we can decide which scanner
to use.
</content>
</entry>
<entry>
<title>test: add known broken test for indexing html</title>
<updated>2017-04-20T09:59:40Z</updated>
<author>
<name>David Bremner</name>
<email>david@tethera.net</email>
</author>
<published>2017-03-22T11:23:00Z</published>
<link rel='alternate' type='text/html' href='https://git.notmuchmail.org/git/notmuch/commit/?id=77c9ec1fddcbe145facfc3d65eee55b11ad61fb9'/>
<id>urn:sha1:77c9ec1fddcbe145facfc3d65eee55b11ad61fb9</id>
<content type='text'>
'quite' on IRC reported that notmuch new was grinding to a halt during
initial indexing, and we eventually narrowed the problem down to some
html parts with large embedded images. These cause the number of terms
added to the Xapian database to explode (the first 400 messages
generated 4.6M unique terms), and of course the resulting terms are
not much use for searching.

The second test is a sanity check for any "improved" indexing of HTML.
</content>
</entry>
</feed>
