author David Bremner <david@tethera.net> 2017-03-22 08:23:00 -0300
committer David Bremner <david@tethera.net> 2017-04-20 06:59:40 -0300
commit 77c9ec1fddcbe145facfc3d65eee55b11ad61fb9
tree bd8adc589322454463db36b966a84501858fa4d2 /test/corpora/html/attribute-text
parent e56511817284afc14352f47a13fcf85b2fabd628
test: add known broken test for indexing html
'quite' on IRC reported that notmuch new was grinding to a halt during initial indexing, and we eventually narrowed the problem down to some HTML parts with large embedded images. These cause the number of terms added to the Xapian database to explode (the first 400 messages generated 4.6M unique terms), and of course the resulting terms are not much use for searching.

The second test is a sanity check for any "improved" indexing of HTML.
Diffstat (limited to 'test/corpora/html/attribute-text')
-rw-r--r-- test/corpora/html/attribute-text | 15
1 files changed, 15 insertions, 0 deletions
diff --git a/test/corpora/html/attribute-text b/test/corpora/html/attribute-text
new file mode 100644
index 00000000..6dae8194
--- /dev/null
+++ b/test/corpora/html/attribute-text
@@ -0,0 +1,15 @@
+From: David Bremner <david@example.net>
+To: David Bremner <david@example.net>
+Subject: test html attachment
+Date: Tue, 17 Nov 2009 21:28:38 +0600
+Message-ID: <87d1dajhgf.fsf@example.net>
+MIME-Version: 1.0
+Content-Type: text/html
+Content-Disposition: inline; filename=test.html
+
+<html>
+ <body>
+ <input value="a>swordfish">
+ </body>
+ hunter2
+</html>
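
The fixture above appears designed so that text inside an attribute value ("swordfish", placed after a literal ">" inside the quotes) can be told apart from text outside any tag ("hunter2"). As a rough illustration of that distinction, and not of how notmuch or Xapian actually index HTML, the following sketch uses Python's standard html.parser to keep only the text outside tags. The names VisibleTextExtractor, indexable_terms, and FIXTURE_BODY are hypothetical, and the body string is an approximation of the fixture with its mail headers omitted.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect only character data, ignoring tags and their attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; attribute values never arrive here.
        self.chunks.append(data)


def indexable_terms(html_text):
    """Return whitespace-separated terms taken from text outside of tags."""
    parser = VisibleTextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return " ".join(parser.chunks).split()


# Approximation of the fixture's HTML body (assumption: headers omitted).
FIXTURE_BODY = """<html>
 <body>
 <input value="a>swordfish">
 </body>
 hunter2
</html>
"""

terms = indexable_terms(FIXTURE_BODY)
print(terms)
```

Under these assumptions, only "hunter2" survives as a term. A naive approach that ends a tag at the first ">" would instead leak "swordfish" into the indexed text, which may be the kind of regression the second test is meant to catch.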