Enhancing text recognition on Tor Darknet images

  1. Pablo Blanco-Medina 1
  2. Eduardo Fidalgo 1
  3. Enrique Alegre 1
  4. Mhd Wesam Al-Nabki 1
  5. Deisy Chaves 1
  1. 1 Universidad de León, León, Spain (ROR https://ror.org/02tzt0b78)

Book:
XL Jornadas de Automática: libro de actas. Ferrol, 4-6 de septiembre de 2019
  1. Jose Luis Calvo Rolle (coord.)
  2. Jose Luis Casteleiro Roca (coord.)
  3. María Isabel Fernández Ibáñez (coord.)
  4. Óscar Fontenla Romero (coord.)
  5. Esteban Jove Pérez (coord.)
  6. Alberto José Leira Rejas (coord.)
  7. José Antonio López Vázquez (coord.)
  8. Vanesa Loureiro Vázquez (coord.)
  9. María Carmen Meizoso López (coord.)
  10. Francisco Javier Pérez Castelo (coord.)
  11. Andrés José Piñón Pazos (coord.)
  12. Héctor Quintián Pardo (coord.)
  13. Juan Manuel Rivas Rodríguez (coord.)
  14. Benigno Rodríguez Gómez (coord.)
  15. Rafael Alejandro Vega Vega (coord.)

Publisher: Servizo de Publicacións; Universidade da Coruña

ISBN: 978-84-9749-716-9

Year of publication: 2019

Pages: 828-835

Conference: Jornadas de Automática (40. 2019. Ferrol)

Type: Conference contribution

Abstract

Text Spotting can be used to retrieve information from images that cannot be obtained otherwise, by first detecting text and then recognizing the located text. Tor network images are one example: they contain information that may not be available as plain text. Of the two stages, recognition performs worse than detection, due to the low resolution of the cropped areas among other problems. Focusing on the recognition part of the pipeline, we study the performance of five recognition approaches, based on state-of-the-art neural network models, standalone OCR, and OCR enhancements. We complement them using string-matching techniques with two lexicons and compare computational time on five different datasets, including Tor network images. Our final proposal achieved 39.70% precision in text recognition on a custom dataset of images taken from Tor domains.
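
The abstract does not say which string-matching technique or lexicons were used. As a minimal sketch of the general idea only, the following Python snippet corrects a recognized word by replacing it with the closest lexicon entry under Levenshtein edit distance. The choice of Levenshtein distance, the function names, the `max_distance` threshold, and the sample lexicon are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def correct_with_lexicon(word: str, lexicon: Iterable[str],
                         max_distance: int = 2) -> str:
    """Return the closest lexicon entry, or the word itself if none is
    within max_distance edits (hypothetical threshold, not from the paper)."""
    lowered = word.lower()
    best, best_dist = word, max_distance + 1
    for entry in lexicon:
        dist = levenshtein(lowered, entry.lower())
        if dist < best_dist:
            best, best_dist = entry, dist
    return best


# Illustrative lexicon; the paper's two lexicons are not described here.
sample_lexicon = ["bitcoin", "wallet", "market"]
corrected = correct_with_lexicon("b1tcoin", sample_lexicon)
unchanged = correct_with_lexicon("xyzxyz", sample_lexicon)
```

In this sketch, a recognizer output such as `b1tcoin` is mapped to `bitcoin`, while a string far from every lexicon entry is returned unchanged. The threshold trades missed corrections against wrongly replacing valid out-of-lexicon words.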