git.notmuchmail.org Git - cworth.org/blobdiff - src/exa/i965/eliminating_glyph_fallbacks.mdwn
Add entry on eliminating glyph fallbacks
diff --git a/src/exa/i965/eliminating_glyph_fallbacks.mdwn b/src/exa/i965/eliminating_glyph_fallbacks.mdwn
new file mode 100644 (file)
index 0000000..25dc8b6
--- /dev/null
@@ -0,0 +1,101 @@
+[[meta title="Eliminating glyph fallbacks"]]
+
+[[tag exa performance xorg i965]]
+
+Sometimes things get worse before they get better.
+
+A few days ago, I presented a patch for [storing glyphs as
+pixmaps](http://cworth.org/exa/storing_glyphs_as_pixmaps/) which
+improved performance, but not as dramatically as one would have hoped.
+
+I profiled the result and found that there were still a lot of
+software fallbacks going on. Tracking things down (hints: enable
+DEBUG\_TRACE\_FALL in xserver/exa/exa_priv.h and I830DEBUG in
+xf86-video-intel/src/i830.h), I found a simple case statement that was
+falling back to software for any compositing operation targeting an A8
+buffer. Fortunately, it looks like this fallback was due to a
+limitation in older graphics cards that doesn't exist on the i965. So a
+very simple [patch](Allow-i965-compositing-to-target-A8-buffers.patch)
+eliminates the software fallback.
+
+So let's take a look at before-and-after profiles:
+
+<dl class="chart barchart">
+    <dt><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//system.oprofile">aa10text-fallbacks/</a> (<a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//timing">144000 chars./sec.</a>) <a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//system.symbols">symbols profile</a></dt>
+    <dd style="width:65.9722%;">
+        <ul>
+            <li class="libexa" style="width:27.1577%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//libexa.oprofile">libexa</a><span>27%</span></li>
+            <li class="libpixman" style="width:26.8338%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//libpixman.oprofile">libpixman</a><span>27%</span></li>
+            <li class="vmlinux" style="width:24.6667%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//vmlinux.oprofile">vmlinux</a><span>25%</span></li>
+            <li class="Xorg" style="width:4.9222%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//Xorg.oprofile">Xorg</a><span>5%</span></li>
+            <li class="libc-2_6" style="width:4.4754%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//libc-2.6.oprofile">libc-2.6</a><span>4%</span></li>
+            <li class="oprofiled" style="width:3.4928%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//oprofiled.oprofile">oprofiled</a><span>3%</span></li>
+            <li class="intel_drv" style="width:2.5876%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//intel_drv.oprofile">intel_drv</a><span>3%</span></li>
+            <li class="other" style="width:5.8638%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//other.oprofile">other</a><span>6%</span></li>
+        </ul>
+    </dd>
+    <dt><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//system.oprofile">aa10text-no-fallbacks/</a> (<a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//timing">95000 chars./sec.</a>) <a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//system.symbols">symbols profile</a></dt>
+    <dd style="width:100%;">
+        <ul>
+            <li class="vmlinux" style="width:42.1575%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//vmlinux.oprofile">vmlinux</a><span>42%</span></li>
+            <li class="intel_drv" style="width:26.7106%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//intel_drv.oprofile">intel_drv</a><span>27%</span></li>
+            <li class="libexa" style="width:7.9861%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//libexa.oprofile">libexa</a><span>8%</span></li>
+            <li class="librt-2_6" style="width:7.7359%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//librt-2.6.oprofile">librt-2.6</a><span>8%</span></li>
+            <li class="libc-2_6" style="width:5.3533%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//libc-2.6.oprofile">libc-2.6</a><span>5%</span></li>
+            <li class="Xorg" style="width:3.9202%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//Xorg.oprofile">Xorg</a><span>4%</span></li>
+            <li class="oprofiled" style="width:3.2670%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//oprofiled.oprofile">oprofiled</a><span>3%</span></li>
+            <li class="other" style="width:2.8694%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//other.oprofile">other</a><span>3%</span></li>
+        </ul>
+    </dd>
+</dl>
+
+Yikes! The patch takes us from 144k chars/sec. to only 95k
+chars/sec. I'm regressing performance! But look again, and see that
+the libexa time has been cut dramatically, and the libpixman time has
+been eliminated altogether. That's exactly what we would hope to see
+for eliminating software fallbacks. So I've finally gotten this
+text-rendering benchmark to involve no software fallbacks. Hurrah!
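The chart's percentages are shares of each run's own total, so they cannot be compared directly across two runs with different throughput. A minimal sketch of one way to normalize them, assuming the cost per character is simply the inverse of the reported chars/sec rate. The names `RATE`, `SHARE`, `relative_cost`, and `normalized_share` are illustrative and do not come from any tool used in the post. The numbers are copied from the chart above. libpixman is not listed in the second run, so it is treated as zero here, which is an assumption (it may be hidden in "other").

```python
# Throughput reported for each run, in characters per second.
RATE = {"fallbacks": 144000, "no-fallbacks": 95000}

# Per-module share of each run's samples, in percent (from the chart).
# Modules missing from a run are assumed to contribute 0 (assumption).
SHARE = {
    "fallbacks": {"libexa": 27.1577, "libpixman": 26.8338,
                  "vmlinux": 24.6667, "intel_drv": 2.5876},
    "no-fallbacks": {"libexa": 7.9861,
                     "vmlinux": 42.1575, "intel_drv": 26.7106},
}


def relative_cost(run, baseline="no-fallbacks"):
    """Time per character of `run` relative to `baseline` (1.0 = same)."""
    return RATE[baseline] / RATE[run]


def normalized_share(run, module, baseline="no-fallbacks"):
    """Module cost per character, as a percent of the baseline run's total."""
    return SHARE[run].get(module, 0.0) * relative_cost(run, baseline)


if __name__ == "__main__":
    for module in ("libexa", "libpixman", "vmlinux", "intel_drv"):
        before = normalized_share("fallbacks", module)
        after = normalized_share("no-fallbacks", module)
        print(f"{module:10s} {before:6.2f} -> {after:6.2f}")
```

The relative cost of the fallback run works out to 95000/144000, about 0.6597, which matches the 65.9722% width of its bar in the chart. On this scale libexa falls from roughly 17.9 to 8.0 per character, while intel_drv rises from roughly 1.7 to 26.7.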
+
+Meanwhile, the intel_drv and vmlinux times have increased
+dramatically. Take a look at how hot those hotspots are in their
+profiles:
+
+intel_drv:
+
+       samples  %        symbol name
+       29614    41.2170  i965_prepare_composite
+       26641    37.0792  I830WaitLpRing
+       9143     12.7253  i965_composite
+       1618      2.2519  I830Sync
+
+vmlinux:
+
+       samples  %        symbol name
+       28775    25.3748  delay_tsc
+       21956    19.3616  system_call
+       7535      6.6446  getnstimeofday
+       5109      4.5053  schedule
+
+So this is just the same old [synchronous
+compositing](http://cworth.org/exa/i965/synchronous_composite/) bug I
+identified earlier. Performance has gotten worse since I'm stressing
+out the driver and this bug more.
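A rough way to put a number on how much of each profile looks like waiting is to sum the shares of the symbols whose names suggest a wait. The sketch below parses the two listings above as plain text. Which symbols count as waiting (`I830WaitLpRing`, `I830Sync`, `delay_tsc`) is an assumption based on their names only, and `parse_profile` and `waiting_share` are illustrative helpers, not part of oprofile.

```python
INTEL_DRV = """\
samples  %        symbol name
29614    41.2170  i965_prepare_composite
26641    37.0792  I830WaitLpRing
9143     12.7253  i965_composite
1618      2.2519  I830Sync
"""

VMLINUX = """\
samples  %        symbol name
28775    25.3748  delay_tsc
21956    19.3616  system_call
7535      6.6446  getnstimeofday
5109      4.5053  schedule
"""

# Assumption: these names indicate time spent waiting rather than working.
WAIT_SYMBOLS = {"I830WaitLpRing", "I830Sync", "delay_tsc"}


def parse_profile(text):
    """Return {symbol: percent} from a 'samples  %  symbol' listing."""
    shares = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only rows shaped like "<samples> <percent> <symbol>";
        # this skips the header row and blank lines.
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        shares[fields[2]] = float(fields[1])
    return shares


def waiting_share(shares, wait_symbols=WAIT_SYMBOLS):
    """Sum the percentages of the symbols assumed to be waits."""
    return sum(pct for sym, pct in shares.items() if sym in wait_symbols)


if __name__ == "__main__":
    for name, text in (("intel_drv", INTEL_DRV), ("vmlinux", VMLINUX)):
        print(f"{name}: {waiting_share(parse_profile(text)):.2f}% waiting")
```

Under that assumption, about 39% of the intel_drv samples and 25% of the vmlinux samples land in wait symbols. This only covers the four hottest symbols listed for each module, so it is a lower bound on the listed data rather than a full accounting.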
+
+Dave Airlie has been doing some recent work that should let us fix
+that bug once and for all. Hopefully it won't be too long before I can
+actually post some positive progress here.
+
+PS. I've also gotten one report that my patch for storing glyphs as
+pixmaps speeds glyph rendering up initially, but after the X server
+has been running for about an hour, things get *really*
+slow. Shame on me for not doing any testing more extensive than
+starting the X server and then running a single client for a few
+minutes (either firefox or x11perf). The report is that most of the
+time is disappearing into ExaOffscreenMarkUsed. Well, the good news is
+that Dave's work eliminates that function entirely (along with lots
+of migration code in EXA), so hopefully there's not any big problem to
+fix there. I'll have to test more thoroughly after synching up with
+Dave.