Image preprocessing and OCR to improve smishing detection

  1. Blanco Medina, Pablo 1
  2. Carofilis, Andrés 1
  3. Fidalgo, Eduardo 1
  4. Alegre, Enrique 1
  1. 1 Universidad de León, León, Spain (ROR: https://ror.org/02tzt0b78)

Journal:
Jornadas de Automática
  1. Cruz Martín, Ana María (coord.)
  2. Arévalo Espejo, V. (coord.)
  3. Fernández Lozano, Juan Jesús (coord.)

ISSN: 3045-4093

Year of publication: 2024

Issue: 45

Type: Article

DOI: 10.17979/JA-CEA.2024.45.10955

Access: open access (publisher)

Abstract

The globalization of communication technologies has led to an increase in phishing scams. Computer Emergency Response Teams (CERTs) receive smartphone screenshots from citizens showing suspicious short messages. These SMS messages attempt to impersonate well-known companies and persuade users to act urgently through a URL, with the aim of stealing their data or making unauthorized charges to their bank accounts. Such messages are known as smishing, and CERTs could benefit from tools that automatically extract the URLs from these screenshots so that each URL can later be checked for phishing. In this work, we propose a pipeline for extracting smishing URLs from the screenshots that CERTs may receive. We combine traditional computer vision techniques, such as preprocessing and morphological operations, with an OCR engine to recognize the suspicious URLs. We applied our pipeline to 117 screenshots of smishing messages containing 121 URLs, achieving an accuracy of 61.16% in retrieving complete URLs.
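The abstract does not detail how URLs are isolated once the OCR engine has produced text. What follows is a minimal Python sketch of that final step only. It assumes the OCR engine (for example Tesseract) has already returned the screenshot text as a plain string. The regular expression, the line-joining heuristic for URLs wrapped by the SMS layout, and all function names (`join_wrapped_urls`, `extract_urls`, `url_accuracy`) are hypothetical and do not come from the paper. The accuracy function assumes a URL counts as correct only when it is recovered exactly, which is one reading of "retrieving complete URLs".

```python
import re

# Hypothetical URL pattern (not from the paper): an explicit scheme or "www."
# prefix, or a bare domain followed by a path, like the shortened links that
# are common in SMS ("bit.ly/3xYz").
URL_PATTERN = re.compile(
    r"(?:https?://|www\.)\S+"
    r"|\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}/\S*",
    re.IGNORECASE,
)

# Characters that usually close a sentence rather than a URL.
TRAILING_PUNCTUATION = ".,;:!?)\"'"

# Assumed heuristic: a first token containing one of these characters is
# treated as the continuation of a URL that the SMS bubble wrapped.
URL_CONTINUATION_CHARS = set("/?=&%")


def join_wrapped_urls(ocr_text: str) -> str:
    """Merge lines where a URL appears to have been wrapped across lines."""
    lines = [line.strip() for line in ocr_text.splitlines() if line.strip()]
    merged: list[str] = []
    for line in lines:
        if merged:
            last_token = merged[-1].split()[-1]
            first_token = line.split()[0]
            if (
                URL_PATTERN.fullmatch(last_token)
                and not last_token.endswith(tuple(TRAILING_PUNCTUATION))
                and URL_CONTINUATION_CHARS & set(first_token)
                and not first_token.lower().startswith(("http://", "https://", "www."))
            ):
                merged[-1] += line  # glue the continuation without a space
                continue
        merged.append(line)
    return "\n".join(merged)


def extract_urls(ocr_text: str) -> list[str]:
    """Return URL candidates found in the OCR output, in reading order."""
    text = join_wrapped_urls(ocr_text)
    urls = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(TRAILING_PUNCTUATION)
        if url:
            urls.append(url)
    return urls


def url_accuracy(predicted: list[str], ground_truth: list[str]) -> float:
    """Fraction of ground-truth URLs recovered exactly (each match used once)."""
    remaining = list(predicted)
    hits = 0
    for url in ground_truth:
        if url in remaining:
            remaining.remove(url)
            hits += 1
    return hits / len(ground_truth) if ground_truth else 0.0


if __name__ == "__main__":
    sample = (
        "Your parcel is on hold. Pay the fee at https://parcel-example.com/pa\n"
        "y?id=42 before 24h. Info: bit.ly/3xYz."
    )
    print(extract_urls(sample))
    # ['https://parcel-example.com/pay?id=42', 'bit.ly/3xYz']
```

The wrapping heuristic is deliberately conservative: it does not merge after a token ending in punctuation or before a token that starts a new URL, and it misses breaks whose continuation contains none of `/`, `?`, `=`, `&` or `%` (for example, a bare domain with no path). If accuracy is taken as the fraction of the 121 URLs recovered exactly, 61.16% would correspond to 74 URLs (74 / 121 ≈ 0.6116).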
