Wednesday, May 31, 2006

Beagle memory optimization - Filters

On 26th November 2005, Debajyothi Bera and I started discussing Beagle's memory consumption, especially that of its filters. DBera gave a link to the heap-buddy output for the Source filter, which showed around 2.9M of allocations.

I immediately pulled heap-buddy from trunk, installed it, and ran beagle-extract-content on a C# source file of approximately 130K. The report showed around 2.6M of allocations for extracting the contents of that file, even though the extracted content was only around 68K. Here is the heap-buddy output before the fix:

bhargavi@vvaradhan-lap:~/cvs/beagle/beagle-before-fix/beagled> heap-buddy outfile
SUMMARY

Filename: outfile
Allocated Bytes: 2.6M
Allocated Objects: 68736
GCs: 10
Resizes: 8
Final heap size: 1.3M

Distinct Types: 170
Backtraces: 2572

and the individual type-based allocations were as follows:

bhargavi@vvaradhan-lap:~/cvs/beagle/beagle-before-fix/beagled> heap-buddy outfile types

Type                                         #  Total    AvSz  AvAge   BT#
string                                   44744   2.0M    47.4    0.2   636
string[]                                  4460   106k    24.4    0.1    59
char[]                                    3922    98k    25.7    0.3    59
char                                      7141    69k    10.0    0.0     8
System.Text.StringBuilder                 1940    45k    24.0    0.0    71
System.Collections...t/SimpleEnumerator   1339    31k    24.0    0.0    17
byte[]                                      21    29k  1422.7    5.6    21
System.Collections.Hashtable/Slot[]        106    26k   251.2    1.8    71
object[]                                   321    23k    76.2    0.7   182
Beagle.Filters.FilterHtml                  695    21k    32.0    3.9     1
System.MonoType                            650    16k    25.7    8.1   166
System.Reflection.MonoMethod               293    11k    41.0    7.8    20
System.Xml.NameTable/Entry[]                20    10k   528.0    0.1    13
System.Xml.NameTable/Entry                 226   5.3k    24.0    0.4    94
System.IO.FileInfo                          65   5.1k    80.0    1.1     2
System.Collections.ArrayList               255   5.0k    20.0    0.8   154
System.Xml.XmlName...eManager/NsScope[]     15   4.9k   336.0    0.1     8
System.Reflection.PropertyInfo[]           116   4.6k    40.9    0.0    10
System.Collections.Hashtable                93   4.4k    48.0    2.1    62
System.Reflection.MethodInfo[]              20   3.9k   199.6    0.0     3
System.Attribute[]                         197   3.6k    18.6    0.4    66
System.Xml.Serialization.XmlAttributes      55   3.2k    60.0    0.3    25
System.MonoType[]                          201   3.1k    16.0    6.0     8
System.Reflection.MonoCMethod              125   2.6k    21.5    5.3     6
System.Xml.Serialization.TypeData           58   2.5k    44.0    8.9    57

(skipped 145 types)

Look at the top five types: string, string[], char[], char and StringBuilder. These are the types that account for most of the allocations.

A closer look at Filter.cs revealed a potentially unnecessary string allocation. Removing it saved around 600K of allocations when filtering the same C# file.

Heap-buddy report after the fix:

bhargavi@vvaradhan-lap:~/cvs/beagle/beagle/beagled> heap-buddy outfile

SUMMARY

Filename: outfile
Allocated Bytes: 2.0M
Allocated Objects: 47082
GCs: 8
Resizes: 8
Final heap size: 1.3M

Distinct Types: 170
Backtraces: 2581

and the individual type-based allocations:

bhargavi@vvaradhan-lap:~/cvs/beagle/beagle/beagled> heap-buddy outfile types

Type                                         #  Total    AvSz  AvAge   BT#
string                                   32361   1.6M    50.8    0.2   639
char                                      7141    69k    10.0    0.0     8
char[]                                     218    33k   157.3    4.4    54
System.Collections...t/SimpleEnumerator   1334    31k    24.0    0.0    17
byte[]                                      21    29k  1422.7    4.4    21
string[]                                   685    27k    40.9    0.6    55
System.Collections.Hashtable/Slot[]        107    26k   250.3    1.4    72
object[]                                   319    23k    75.7    0.6   182
Beagle.Filters.FilterHtml                  734    22k    31.6    3.2     1
System.MonoType                            652    16k    25.6    6.4   166
System.Reflection.MonoMethod               293    11k    41.0    6.1    20
System.Xml.NameTable/Entry[]                20    10k   528.0    0.1    13
System.Xml.NameTable/Entry                 226   5.3k    24.0    0.4    94
System.Collections.ArrayList               253   4.9k    20.0    0.6   152
System.Xml.XmlName...eManager/NsScope[]     15   4.9k   336.0    0.1     8
System.Reflection.PropertyInfo[]           116   4.6k    40.9    0.0    10
System.IO.FileInfo                          57   4.5k    80.0    0.1     2
System.Collections.Hashtable                94   4.4k    48.0    1.6    63
System.Reflection.MethodInfo[]              20   3.9k   199.6    0.0     3
System.Attribute[]                         197   3.6k    18.6    0.3    66
System.Xml.Serialization.XmlAttributes      55   3.2k    60.0    0.3    25
System.MonoType[]                          202   3.2k    16.0    4.5     8
System.Text.StringBuilder                  127   3.0k    24.0    0.2    85
System.Reflection.MonoCMethod              122   2.6k    21.5    4.3     6
System.Xml.Serialization.TypeData           58   2.5k    44.0    6.9    57

(skipped 145 types)

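Comparing the two reports, the main contributors to memory allocation all dropped after the fix: string went from 2.0M to 1.6M, string[] from 106k to 27k, char[] from 98k to 33k, and StringBuilder from 45k to 3.0k. Only char stayed the same, at 69k.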
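
To put rough percentages on that comparison, here is a small Python sketch that takes the "Total" figures from the two reports above and works out how much each of the top types shrank. The helper names (parse_size, reduction_percent) are mine, and treating heap-buddy's "k" and "M" suffixes as 1024 and 1024*1024 bytes is an assumption. It does not change the percentages here, since each before/after pair uses the same unit.

```python
# Assumption: heap-buddy's size suffixes are binary multiples.
UNITS = {"k": 1024, "M": 1024 * 1024}


def parse_size(text):
    """Convert a size string such as '106k' or '2.0M' to a byte count."""
    suffix = text[-1]
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)


def reduction_percent(before, after):
    """Percentage by which 'after' is smaller than 'before'."""
    b, a = parse_size(before), parse_size(after)
    return 100.0 * (b - a) / b


# (before fix, after fix) "Total" column values, copied from the two reports.
totals = {
    "string": ("2.0M", "1.6M"),
    "string[]": ("106k", "27k"),
    "char[]": ("98k", "33k"),
    "char": ("69k", "69k"),
    "System.Text.StringBuilder": ("45k", "3.0k"),
}

report = {
    name: round(reduction_percent(before, after), 1)
    for name, (before, after) in totals.items()
}

for name, pct in report.items():
    print(f"{name:28s} {pct:5.1f}% fewer bytes allocated")

# Allocated object counts from the two SUMMARY sections.
object_reduction = round(100.0 * (68736 - 47082) / 68736, 1)
print(f"{'all objects':28s} {object_reduction:5.1f}% fewer objects allocated")
```

By these figures, StringBuilder allocations dropped by roughly 93%, string[] by about 75%, char[] by about 66% and string by 20%, while the total number of allocated objects fell by about a third.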

The fix should show even better numbers when run on bigger documents. (I will post them later.)

As a side effect of this fix, the filters run a little faster than they used to. ;-)
