Improvements in Smishing URL Extraction Using Text Spotting

  1. Blanco Medina, Pablo 1
  2. Biswas, Rubel 1
  3. González Castro, Victor 1
  4. Alaiz Rodríguez, Rocío 1
  5. Fidalgo, Eduardo 1
  6. Alegre, Enrique 1
  1. Universidad de León, León, Spain (ROR: https://ror.org/02tzt0b78)

Journal:
Jornadas de Automática
  1. Cruz Martín, Ana María (coord.)
  2. Arévalo Espejo, V. (coord.)
  3. Fernández Lozano, Juan Jesús (coord.)

ISSN: 3045-4093

Year of publication: 2024

Issue: 45

Type: Article

DOI: 10.17979/JA-CEA.2024.45.10954

Access: Open access

Abstract

Computer Emergency Response Teams (CERTs) frequently receive screenshots of smishing messages that impersonate various kinds of organizations in order to steal users' personal information or divert funds from their accounts through malicious links. CERTs therefore need automated solutions that can recover URLs from these screenshots. Methods based on optical character recognition (OCR) can extract the text, but they perform poorly because of problems such as low image quality or text split across multiple fragments. We propose a pipeline for extracting URLs from smishing screenshots based on text spotting techniques, complemented by a custom URL reconstruction step that uses features highlighted in the image. Applying the proposed method to a custom dataset of 244 screenshots containing 262 URLs raises recognition accuracy from 3.05% to 22.90%, after which the extracted smishing text can be processed further.
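The abstract describes the reconstruction step only at a high level. As a rough illustration, the sketch below assumes a text spotter returns word-level fragments with bounding-box coordinates and a boolean flag marking whether each fragment looks visually highlighted as a link (for example, coloured or underlined). The `Fragment` class, the `highlighted` flag, the line-grouping tolerance and the URL regular expression are all hypothetical and are not taken from the paper. The sketch orders the highlighted fragments in reading order, joins them without spaces to undo line breaks inside a URL, and returns the first URL-like substring. An exact-match accuracy helper is included because the abstract reports a recognition rate; whether the paper scores URLs by exact match is also an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    """One word-level detection from a text spotter (hypothetical structure)."""
    text: str
    x: float           # left edge of the bounding box, in pixels
    y: float           # top edge of the bounding box, in pixels
    highlighted: bool  # assumed cue: rendered like a link (colour, underline)


# Loose pattern for a URL with an optional scheme; an illustrative assumption.
URL_RE = re.compile(r"(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/\S*)?", re.IGNORECASE)


def reading_order(frags: List[Fragment], line_tol: float = 5.0) -> List[Fragment]:
    """Group fragments into lines by vertical position, then sort each line left to right."""
    lines: List[List[Fragment]] = []
    current: List[Fragment] = []
    for frag in sorted(frags, key=lambda f: f.y):
        # Start a new line when the fragment sits too far from the line's first fragment.
        if current and abs(frag.y - current[0].y) > line_tol:
            lines.append(current)
            current = []
        current.append(frag)
    if current:
        lines.append(current)
    return [f for line in lines for f in sorted(line, key=lambda g: g.x)]


def reconstruct_url(frags: List[Fragment], line_tol: float = 5.0) -> Optional[str]:
    """Join highlighted fragments without spaces and return the first URL-like match."""
    linked = [f for f in frags if f.highlighted]
    joined = "".join(f.text for f in reading_order(linked, line_tol))
    match = URL_RE.search(joined)
    return match.group(0) if match else None


def exact_match_accuracy(predicted: List[Optional[str]], truth: List[str]) -> float:
    """Fraction of URLs recovered exactly; a missing prediction (None) counts as wrong."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("need equally sized, non-empty lists")
    hits = sum(p == t for p, t in zip(predicted, truth))
    return hits / len(truth)


if __name__ == "__main__":
    # A URL wrapped over two lines, next to plain (non-highlighted) message text.
    frags = [
        Fragment("Your", 10, 100, False),
        Fragment("parcel", 60, 100, False),
        Fragment("https://parcel-", 10, 120, True),
        Fragment("track.example.com/ab", 140, 121, True),
        Fragment("c12", 10, 140, True),
    ]
    print(reconstruct_url(frags))  # https://parcel-track.example.com/abc12
```

Joining without spaces is the key design choice: it is what undoes the wrapping of a long URL across lines, which the abstract names as one reason plain OCR fails. As a side note, if the reported figures are per-URL exact-match rates over the 262 URLs, they are consistent with 8 and 60 correctly recovered URLs (8/262 ≈ 3.05%, 60/262 ≈ 22.90%); the abstract itself does not state these counts.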
