Detecting Textual Information in Images from Onion Domains Using Text Spotting

  1. Pablo Blanco 1
  2. Eduardo Fidalgo 1
  3. Enrique Alegre 1
  4. Mhd Wesam Al-Nabki 1
  1. 1 Universidad de León, León, Spain (ROR: https://ror.org/02tzt0b78)

Book:
XXXIX Jornadas de Automática: actas. Badajoz, 5-7 de septiembre de 2018
  1. Inés Tejado Balsera (coord.)
  2. Emiliano Pérez Hernández (coord.)
  3. Antonio José Calderón Godoy (coord.)
  4. Isaías González Pérez (coord.)
  5. Pilar Merchán García (coord.)
  6. Jesús Lozano Rogado (coord.)
  7. Santiago Salamanca Miño (coord.)
  8. Blas M. Vinagre Jara (coord.)

Publisher: Universidad de Extremadura

ISBN: 978-84-9749-756-5, 978-84-09-04460-3

Year of publication: 2018

Pages: 975-982

Congress: Jornadas de Automática (39. 2018. Badajoz)

Type: Conference paper

DOI: 10.17979/SPUDC.9788497497565.0975

Abstract

Owing to the efforts of various authorities to combat illegal activities on the Tor network, traders have developed new ways to circumvent the monitoring tools used to gather evidence of those activities. In particular, embedding textual content in graphical objects prevents text analysis based on Natural Language Processing (NLP) algorithms from being used to monitor such onion web content. In this paper, we present a Text Spotting framework for detecting and recognizing textual information within images hosted on onion domains. We found that the Connectionist Text Proposal Network and the Convolutional Recurrent Neural Network, run as a combined pipeline, achieve an F-measure of 0.57 on a subset of 100 manually labeled images from the TOIC dataset. We also identified the parameters that have a critical influence on the Text Spotting results. The proposed technique could support tools that help the authorities detect these activities.
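
For readers unfamiliar with the metric reported above, the F-measure is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function name `f_measure` and the counts are hypothetical, and the abstract does not state how a spotted word is matched against a manual annotation, so the sketch takes the match counts as given.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F-measure (harmonic mean of precision and recall).

    true_positives:  spotted words that match a manual annotation
    false_positives: spotted words with no matching annotation
    false_negatives: annotated words the pipeline failed to spot
    """
    detected = true_positives + false_positives    # everything the pipeline output
    annotated = true_positives + false_negatives   # everything in the ground truth
    if detected == 0 or annotated == 0:
        return 0.0
    precision = true_positives / detected
    recall = true_positives / annotated
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only for illustration (not taken from the paper).
score = f_measure(true_positives=6, false_positives=2, false_negatives=4)
print(f"F-measure: {score:.3f}")
```

With these illustrative counts, precision is 6/8 and recall is 6/10. Because the harmonic mean penalizes an imbalance between the two, a pipeline that detects many text regions but recognizes few of them correctly still scores low.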