Tuesday, July 18, 2006

Document Icons

Document Icons is software designed by Amber Frid-Jimenez that generates graphical shapes from word histograms, letting the user search intuitively through the content of millions of text documents. The program creates a shape that becomes the document's profile, showing the concept areas within the document.
It basically works like this: it starts with a histogram of words, which is a graph of how often each word occurs. The words are then grouped around major concepts, shown in red, and wrapped around a point. In the example below, the words on the border of the circle are the most used words. The frequency of each word is depicted by the gray area. The three red words at the corners are the major concepts. The shape of the gray area gives the user a sense of what the document is about and how much it emphasizes the selected concepts. The result is a quick profile of the document.
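To make the idea concrete, here is a rough Python sketch of the steps described above. It is not the Document Icons code: the function names, the `concepts` mapping from a concept to its words, and the scaling of the radius by the most frequent word are all assumptions made for illustration. It counts the words, orders them so that each concept's words sit side by side, and wraps them around a point so that each word's distance from the center reflects how often it occurs.

```python
import math
from collections import Counter


def word_histogram(text):
    """Count how often each word occurs (lowercased, edge punctuation stripped)."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return Counter(w for w in words if w)


def icon_outline(hist, concepts, max_radius=1.0):
    """Wrap the histogram around a point.

    `concepts` is a hypothetical mapping {concept: [words]}; the real tool's
    grouping method is not described in the post.
    Returns a list of (word, x, y) vertices for the gray shape.
    """
    # Keep the words of one concept next to each other on the circle.
    ordered = [w for words in concepts.values() for w in words]
    # Scale by the most frequent word (assumption); avoid dividing by zero.
    peak = max((hist[w] for w in ordered), default=0) or 1
    points = []
    for i, word in enumerate(ordered):
        angle = 2 * math.pi * i / len(ordered)
        radius = max_radius * hist[word] / peak
        points.append((word, radius * math.cos(angle), radius * math.sin(angle)))
    return points


hist = word_histogram("design code design art code design")
outline = icon_outline(hist, {"making": ["design", "code"], "culture": ["art"]})
```

Joining the returned vertices in order gives a closed polygon that bulges toward the concepts the document uses most, which is the kind of shape you would compare across documents.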
In this other graph you can see how multiple documents can be compared at a glance.