TY - CHAP
T1 - Mapping texts through dimensionality reduction and visualization techniques for interactive exploration of document collections
AU - De Andrade Lopes, Alneu
AU - Minghim, Rosane
AU - Melo, Vinícius
AU - Paulovich, Fernando Vieira
PY - 2006
Y1 - 2006
AB - The amount of information currently available often impairs the tasks of searching, browsing, and analyzing information pertinent to a topic of interest. This paper presents a methodology for creating a meaningful graphical representation of document corpora, aimed at supporting the exploration of correlated documents. The purpose of the approach is to produce a map of a body of documents on a research topic or field, based on an analysis of their contents and of the similarities among articles. After text pre-processing, the document map is generated by projecting the data into two dimensions using Latent Semantic Indexing. The projection is followed by hierarchical clustering to support sub-area identification. The map can be explored interactively, helping to narrow down the search for relevant articles. Tests were performed on a collection of documents pre-classified into three research subject classes: Case-Based Reasoning, Information Retrieval, and Inductive Logic Programming. The resulting map separated the main areas and placed documents close together according to their similarity, revealing possible topics and identifying boundaries between them. The tool supports the exploration of inter-topic and intra-topic relationships and is useful in many contexts that require deciding which articles are relevant to read, such as scientific research, education, and training.
KW - Dimension reduction
KW - LSI
KW - Text visualization
UR - https://www.scopus.com/pages/publications/33645677525
U2 - 10.1117/12.650899
DO - 10.1117/12.650899
M3 - Chapter
AN - SCOPUS:33645677525
SN - 0819461008
SN - 9780819461001
T3 - Proceedings of SPIE - The International Society for Optical Engineering
BT - Visualization and Data Analysis 2006 - Proceedings of SPIE-IS&T Electronic Imaging
T2 - Visualization and Data Analysis 2006
Y2 - 16 January 2006 through 17 January 2006
ER -