
A study on the role of similarity measures in visual text analytics

  • F. S. San Roman
  • R. D. De Pinho
  • R. Minghim
  • M. C.F. De Oliveira

Research output: Chapter in Book/Report/Conference proceedings › Conference proceeding › peer-review

Abstract

Text Analytics is essential for a large number of applications, and good approaches to obtaining visual mappings of text are paramount. Many visualization techniques, such as similarity-based point placement layouts, have proved useful in supporting visual analysis of documents. However, they are sensitive to data quality, which in turn relies on a critical preprocessing step that involves text cleaning and, in some cases, term detection and weighting, as well as the definition of a similarity function. Little has been discussed about the effect of these important similarity calculations on the quality of visual representations. This paper presents a study on the role of different text similarity measurements in the generation of visual text mappings. We focus mainly on two types of distance functions, those based on the well-known text vector representation and those based on direct string comparison, comparing their effect on visual mappings obtained with point placement techniques. We find that both have their value but that, in many circumstances, the vector space model (VSM) is the best solution when discrimination is important. However, the VSM is not incremental: new additions to a collection force a recalculation of the whole feature space and of the similarities. In this work we also propose a new incremental model based on the VSM, which is shown to present the best visualization results in many of the configurations tested. We show the evaluation results and offer recommendations on the application of different text similarity measurements for Visual Text Analytics tasks.
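To make the two families of distance functions concrete, the sketch below contrasts a textbook vector space model (tf-idf weights with cosine distance) against a direct string comparison (normalized Levenshtein edit distance). It is an illustration under stated assumptions, not the authors' implementation: the tokenizer, the `log(N/df)` IDF variant, and all function names (`tokenize`, `tfidf_vectors`, `cosine_distance`, `normalized_edit_distance`, `distance_matrix`) are hypothetical choices, and the paper's incremental VSM is not reproduced here. The usage part shows why a plain VSM is not incremental: adding one unrelated document changes the IDF weights and therefore the distance between two documents that were already in the collection.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical minimal cleaning: lowercase, split on whitespace, strip
    # punctuation. Real pipelines also remove stopwords and apply stemming.
    tokens = (t.strip(".,;:!?") for t in text.lower().split())
    return [t for t in tokens if t]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    # Vector space model: each document becomes a sparse term -> weight map.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # IDF is computed over the whole collection, so adding a document
    # changes every weight (the reason a plain VSM is not incremental).
    idf = {term: math.log(n / count) for term, count in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine_distance(u: dict[str, float], v: dict[str, float]) -> float:
    # 1 - cosine similarity of two sparse vectors; all-zero vectors count as maximally distant.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    # Direct string comparison scaled to [0, 1]; needs no collection statistics.
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def distance_matrix(items, dist):
    # Full pairwise matrix, the usual input to a point placement technique.
    return [[dist(x, y) for y in items] for x in items]


if __name__ == "__main__":
    docs = [
        "visual text analytics with projections",
        "text similarity for visual analytics",
        "incremental vector space model",
    ]
    before = tfidf_vectors(docs[:2])
    after = tfidf_vectors(docs)
    # 1.0: the shared terms have zero IDF in a two-document collection.
    print(cosine_distance(before[0], before[1]))
    # Below 1.0 once an unrelated third document is added.
    print(cosine_distance(after[0], after[1]))
    print(distance_matrix(docs, normalized_edit_distance))
```

The edit-distance matrix can be updated by computing only the new row when a document arrives, whereas every tf-idf vector must be recomputed; this is the trade-off the study examines, although the specific measures it compares may differ from the ones sketched here.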

Original language: English
Title of host publication: GRAPP 2013 IVAPP 2013 - Proceedings of the International Conference on Computer Graphics Theory and Applications and International Conference on Information Visualization Theory and Applications
Pages: 429-438
Number of pages: 10
Publication status: Published - 2013
Externally published: Yes
Event: International Conference on Computer Graphics Theory and Applications, GRAPP 2013 and International Conference on Information Visualization Theory and Applications, IVAPP 2013 - Barcelona, Spain
Duration: 21 Feb 2013 to 24 Feb 2013

Publication series

Name: GRAPP 2013 IVAPP 2013 - Proceedings of the International Conference on Computer Graphics Theory and Applications and International Conference on Information Visualization Theory and Applications

Conference

Conference: International Conference on Computer Graphics Theory and Applications, GRAPP 2013 and International Conference on Information Visualization Theory and Applications, IVAPP 2013
Country/Territory: Spain
City: Barcelona
Period: 21/02/13 to 24/02/13

Keywords

  • High-dimensional data visualization and multidimensional projections
  • Vector space model
  • Visual text analytics
  • Visual text mining
