--- /dev/null
+[[meta title="Eliminating glyph fallbacks"]]
+
+[[tag exa performance xorg i965]]
+
+Sometimes things get worse before they get better.
+
+A few days ago, I presented a patch for [storing glyphs as
+pixmaps](http://cworth.org/exa/storing_glyphs_as_pixmaps/) which
+improved performance, but not as dramatically as one would have hoped.
+
+I profiled the result and found that there were still a lot of
+software fallbacks going on. While tracking them down (hints: enable
+DEBUG\_TRACE\_FALL in xserver/exa/exa_priv.h and I830DEBUG in
+xf86-video-intel/src/i830.h), I found a simple case statement that was
+falling back to software for any compositing operation targeting an A8
+buffer. Fortunately, it looks like this fallback was due to a
+limitation in older graphics cards that doesn't exist on the i965. So a
+very simple [patch](Allow-i965-compositing-to-target-A8-buffers.patch)
+eliminates the software fallback.
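+
+The shape of such a check can be sketched in C. This is a minimal
+illustration, not the actual driver code or the patch. The enums,
+`can_accelerate_dest`, and the `TRACE_FALLBACKS`/`FALLBACK` macros are
+hypothetical stand-ins. The tracing macro is modeled loosely on the idea
+behind DEBUG\_TRACE\_FALL: print a reason whenever a fallback is taken.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stdio.h>
+
+/* Hypothetical stand-ins, NOT the real Render or driver types. */
+typedef enum { FMT_A8R8G8B8, FMT_X8R8G8B8, FMT_R5G6B5, FMT_A8 } dst_format;
+typedef enum { GEN_OLDER, GEN_I965 } hw_generation;
+
+/* Hypothetical compile-time switch: when non-zero, every fallback
+ * prints where and why it happened. */
+#define TRACE_FALLBACKS 1
+
+#if TRACE_FALLBACKS
+#define FALLBACK(reason)                                               \
+    do {                                                               \
+        fprintf(stderr, "fallback in %s: %s\n", __func__, reason);     \
+        return false;                                                  \
+    } while (0)
+#else
+#define FALLBACK(reason) return false
+#endif
+
+/* Returns true if the hardware path can render to this destination,
+ * false if the caller must fall back to software. */
+static bool can_accelerate_dest(hw_generation gen, dst_format fmt)
+{
+    switch (fmt) {
+    case FMT_A8R8G8B8:
+    case FMT_X8R8G8B8:
+    case FMT_R5G6B5:
+        return true;
+    case FMT_A8:
+        /* Before the fix, A8 fell back unconditionally. After it, only
+         * hardware with the older limitation falls back. */
+        if (gen == GEN_I965)
+            return true;
+        FALLBACK("A8 destination unsupported on this hardware");
+    }
+    FALLBACK("unknown destination format");
+}
+
+int main(void)
+{
+    printf("i965, A8 target:  %s\n",
+           can_accelerate_dest(GEN_I965, FMT_A8) ? "hardware" : "software");
+    printf("older, A8 target: %s\n",
+           can_accelerate_dest(GEN_OLDER, FMT_A8) ? "hardware" : "software");
+    return 0;
+}
+```
+
+With the A8 case accepted on i965, that destination no longer drops to
+software, while hardware with the older limitation still does.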
+
+So let's take a look at the before-and-after profiles:
+
+<dl class="chart barchart">
+ <dt><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//system.oprofile">aa10text-fallbacks/</a> (<a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//timing">144000 chars./sec.</a>) <a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//system.symbols">symbols profile</a></dt>
+ <dd style="width:65.9722%;">
+ <ul>
+ <li class="libexa" style="width:27.1577%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//libexa.oprofile">libexa</a><span>27%</span></li>
+ <li class="libpixman" style="width:26.8338%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//libpixman.oprofile">libpixman</a><span>27%</span></li>
+ <li class="vmlinux" style="width:24.6667%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//vmlinux.oprofile">vmlinux</a><span>25%</span></li>
+ <li class="Xorg" style="width:4.9222%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//Xorg.oprofile">Xorg</a><span>5%</span></li>
+ <li class="libc-2_6" style="width:4.4754%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//libc-2.6.oprofile">libc-2.6</a><span>4%</span></li>
+ <li class="oprofiled" style="width:3.4928%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//oprofiled.oprofile">oprofiled</a><span>3%</span></li>
+ <li class="intel_drv" style="width:2.5876%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//intel_drv.oprofile">intel_drv</a><span>3%</span></li>
+ <li class="other" style="width:5.8638%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-fallbacks//other.oprofile">other</a><span>6%</span></li>
+ </ul>
+ </dd>
+ <dt><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//system.oprofile">aa10text-no-fallbacks/</a> (<a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//timing">95000 chars./sec.</a>) <a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//system.symbols">symbols profile</a></dt>
+ <dd style="width:100%;">
+ <ul>
+ <li class="vmlinux" style="width:42.1575%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//vmlinux.oprofile">vmlinux</a><span>42%</span></li>
+ <li class="intel_drv" style="width:26.7106%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//intel_drv.oprofile">intel_drv</a><span>27%</span></li>
+ <li class="libexa" style="width:7.9861%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//libexa.oprofile">libexa</a><span>8%</span></li>
+ <li class="librt-2_6" style="width:7.7359%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//librt-2.6.oprofile">librt-2.6</a><span>8%</span></li>
+ <li class="libc-2_6" style="width:5.3533%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//libc-2.6.oprofile">libc-2.6</a><span>5%</span></li>
+ <li class="Xorg" style="width:3.9202%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//Xorg.oprofile">Xorg</a><span>4%</span></li>
+ <li class="oprofiled" style="width:3.2670%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//oprofiled.oprofile">oprofiled</a><span>3%</span></li>
+ <li class="other" style="width:2.8694%;"><a href="/exa/i965/eliminating_glyph_fallbacks/aa10text-no-fallbacks//other.oprofile">other</a><span>3%</span></li>
+ </ul>
+ </dd>
+</dl>
+
+Yikes! The patch takes us from 144k chars/sec. to only 95k
+chars/sec. I'm regressing performance! But look again, and see that
+the libexa time has been cut dramatically, and the libpixman time has
+been eliminated altogether. That's exactly what we would hope to see
+for eliminating software fallbacks. So I've finally gotten this
+text-rendering benchmark to involve no software fallbacks. Hurrah!
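+
+To put numbers on the regression, here is the arithmetic as a small C
+sketch. The throughputs are the two timing results above. The one
+assumption concerns how the chart is drawn: its bar widths (65.9722% and
+100%) appear to be proportional to time per character, normalized to the
+slower run, since 95000 / 144000 = 0.659722.
+
+```c
+#include <assert.h>
+#include <stdio.h>
+
+/* Throughput from the two timing runs above, in chars/sec. */
+#define WITH_FALLBACKS    144000.0
+#define WITHOUT_FALLBACKS  95000.0
+
+/* Fractional change in throughput; negative means a regression. */
+static double throughput_change(double before, double after)
+{
+    return (after - before) / before;
+}
+
+/* Cost of a single character, in microseconds. */
+static double usec_per_char(double chars_per_sec)
+{
+    return 1e6 / chars_per_sec;
+}
+
+/* Bar width in percent, assuming bars are proportional to time per
+ * character and the slowest run is drawn at 100%. */
+static double bar_width_percent(double rate, double slowest_rate)
+{
+    return 100.0 * slowest_rate / rate;
+}
+
+int main(void)
+{
+    printf("change:     %.1f%%\n",
+           100.0 * throughput_change(WITH_FALLBACKS, WITHOUT_FALLBACKS));
+    printf("us/char:    %.2f -> %.2f\n",
+           usec_per_char(WITH_FALLBACKS), usec_per_char(WITHOUT_FALLBACKS));
+    printf("bar widths: %.4f%% and %.4f%%\n",
+           bar_width_percent(WITH_FALLBACKS, WITHOUT_FALLBACKS),
+           bar_width_percent(WITHOUT_FALLBACKS, WITHOUT_FALLBACKS));
+    return 0;
+}
+```
+
+That is a throughput drop of about 34%, or roughly 6.9 to 10.5
+microseconds per character.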
+
+Meanwhile, the intel_drv and vmlinux times have increased
+dramatically. Take a look at how hot those hotspots are in their
+profiles:
+
+intel_drv:
+
+    samples  %        symbol name
+    29614    41.2170  i965_prepare_composite
+    26641    37.0792  I830WaitLpRing
+    9143     12.7253  i965_composite
+    1618      2.2519  I830Sync
+
+vmlinux:
+
+    samples  %        symbol name
+    28775    25.3748  delay_tsc
+    21956    19.3616  system_call
+    7535      6.6446  getnstimeofday
+    5109      4.5053  schedule
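+
+The percentages in these listings are shares of each module's own
+samples. This sketch assumes that the module-level shares in the chart
+(26.7106% for intel_drv and 42.1575% for vmlinux) and these symbol-level
+shares come from the same no-fallbacks run. Under that assumption,
+multiplying the two gives each symbol's share of all system samples.
+
+```c
+#include <assert.h>
+#include <stdio.h>
+
+/* One row of a per-module profile listing. */
+struct hotspot {
+    const char *symbol;
+    double module_pct;  /* module's share of all samples (from the chart) */
+    double symbol_pct;  /* symbol's share of that module's samples */
+};
+
+/* Symbol's share of all system samples, in percent. */
+static double system_share(double module_pct, double symbol_pct)
+{
+    return module_pct * symbol_pct / 100.0;
+}
+
+int main(void)
+{
+    const struct hotspot spots[] = {
+        { "i965_prepare_composite", 26.7106, 41.2170 },
+        { "I830WaitLpRing",         26.7106, 37.0792 },
+        { "delay_tsc",              42.1575, 25.3748 },
+        { "system_call",            42.1575, 19.3616 },
+    };
+    for (size_t i = 0; i < sizeof spots / sizeof spots[0]; i++)
+        printf("%-24s %5.2f%% of all samples\n", spots[i].symbol,
+               system_share(spots[i].module_pct, spots[i].symbol_pct));
+    return 0;
+}
+```
+
+By this estimate, I830WaitLpRing and delay_tsc each account for roughly
+10% of all samples in the run.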
+
+So this is just the same old [synchronous
+compositing](http://cworth.org/exa/i965/synchronous_composite/) bug I
+identified earlier. Performance has gotten worse because I'm now
+stressing the driver, and with it this bug, much harder.
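+
+A toy cost model helps explain why more driver work makes a
+synchronization bug hurt more. It is an illustration only: the unit costs
+are made up rather than measured, and the functions are hypothetical. If
+the CPU must wait for the GPU to finish each composite before preparing
+the next one, every operation pays both costs in full. If the two could
+overlap, the slower side would set the pace.
+
+```c
+#include <assert.h>
+#include <stdio.h>
+
+/* Serialized: the CPU waits for the GPU to finish each operation before
+ * preparing the next, so every operation pays both costs in full. */
+static double serialized_time(int n_ops, double cpu_cost, double gpu_cost)
+{
+    return n_ops * (cpu_cost + gpu_cost);
+}
+
+/* Overlapped: a two-stage pipeline in which the CPU prepares operation
+ * k+1 while the GPU executes operation k. The slower stage sets the
+ * pace, and the faster stage's cost is paid once to fill the pipeline. */
+static double overlapped_time(int n_ops, double cpu_cost, double gpu_cost)
+{
+    double slow = cpu_cost > gpu_cost ? cpu_cost : gpu_cost;
+    double fast = cpu_cost > gpu_cost ? gpu_cost : cpu_cost;
+    return n_ops > 0 ? n_ops * slow + fast : 0.0;
+}
+
+int main(void)
+{
+    /* Hypothetical unit costs, chosen only for illustration. */
+    const double cpu = 1.0, gpu = 2.0;
+    for (int n = 1000; n <= 4000; n *= 2)
+        printf("%5d ops: serialized %7.0f, overlapped %7.0f\n", n,
+               serialized_time(n, cpu, gpu), overlapped_time(n, cpu, gpu));
+    return 0;
+}
+```
+
+In the serialized case, each extra operation sent to the hardware adds
+its full CPU-plus-GPU cost. That is consistent with the profiles above
+being led by symbols whose names suggest waiting (I830WaitLpRing,
+delay_tsc) rather than rendering.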
+
+Dave Airlie has been doing some recent work that should let us fix
+that bug once and for all. Hopefully it won't be too long before I can
+actually post some positive progress here.
+
+PS. I've also gotten one report that my patch for storing glyphs as
+pixmaps speeds glyph rendering up initially, but after the X server
+has been running for an hour or so, things get *really*
+slow. Shame on me for not testing anything more extensive than
+starting the X server and then running a single client (either
+firefox or x11perf) for a few minutes. The report is that most of the
+time is disappearing into ExaOffscreenMarkUsed. Well, the good news is
+that Dave's work eliminates that function entirely (along with lots
+of migration code in EXA), so hopefully there's no big problem to
+fix there. I'll have to test more thoroughly after syncing up with
+Dave.