Repo for an individual project on URL clustering. 2023Z
Part 1:
Basic exploration work on URLs from https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset
Data description: 600k URLs in 4 classes: benign, malware, phishing, defacement
Goal: Explore malware classification of URL data using clustering methods.
Find a "good" clustering for this purpose and explore how to evaluate what "good" means in this case.
Try to approach more sophisticated data (XSS, SQLi, DGA, obfuscation techniques, phishing addresses, etc.) and discriminate between types of malicious URLs.
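
One possible way to make the "what is a good clustering" question concrete is sketched below. It is only an illustration, not the method used in this repo: the lexical features, the small k-means loop, and the purity score are all assumptions chosen to keep the example dependency-free, and the URLs and labels are invented toy values rather than rows from the Kaggle dataset. Purity compares clusters against the known class labels; an internal measure such as a silhouette score could be swapped in when labels are not available.

```python
import math
import random
from collections import Counter


def url_features(url):
    """Map a URL string to a small vector of lexical features (hypothetical choice)."""
    return [
        len(url),                            # total length
        sum(c.isdigit() for c in url),       # number of digits
        url.count("."),                      # number of dots
        url.count("/"),                      # number of slashes
        sum(not c.isalnum() for c in url),   # number of non-alphanumeric characters
    ]


def standardize(rows):
    """Scale every feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # avoid division by zero
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        new_assign = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_assign == assign:  # converged
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return assign


def purity(clusters, labels):
    """Fraction of points that carry the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values()) / len(labels)


# Invented toy URLs and labels, for illustration only.
urls = [
    "http://example.com/",
    "http://example.org/about",
    "http://example.net/docs/index.html",
    "http://192.0.2.10/login.php?id=12345&session=98765432",
    "http://secure-update.example.com/verify/account/00112233/confirm.php",
    "http://198.51.100.7/a/b/c/d/pay.php?token=5551234567",
]
labels = ["benign", "benign", "benign", "phishing", "phishing", "phishing"]

points = standardize([url_features(u) for u in urls])
clusters = kmeans(points, k=2)
score = purity(clusters, labels)
print("cluster assignment:", clusters)
print("purity:", round(score, 3))
```

Purity rises trivially as the number of clusters grows, so it is best compared across runs with the same k or combined with a chance-corrected score.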