- Title
Performance Analysis of Regex-Based Processing for Dark Web Targeted Crawling.
- Authors
Ruriawan, Muhammad Faris; Purwanto, Yudha; Yunelfi, Putri R.; Popalia, Agus S.
- Abstract
Crawling data on the dark web is critical to security intelligence efforts. Previous research has developed fast crawlers for specific purposes such as digital investigations, abusive content, and automated CAPTCHA breaking. However, that research mostly focuses on faster download times and has not addressed the importance of assessing crawl accuracy. Because the shape and content of the dark web change quickly, accurate and complete crawled data is a vital part of security intelligence. This research developed a targeted dark web crawler for The Onion Router (TOR) network by combining focused and in-depth crawling. Regex Text, Regex Wildcard, and Regex Optional are used to automatically filter the content by a specific keyword. The effectiveness of the crawler was tested in five real-world dark website environments. In tests at a crawl depth of 3, the application achieved more than 98% accuracy. Regex Optional processing was faster than Regex Text and Regex Wildcard by over a second, due to the swift crawling attempt. In terms of accuracy, Regex Optional achieved 99.14%, which is 4.83% higher than Regex Text. The best keyword processing method in targeted crawling is Regex Optional, with an accuracy rate of over 99%.
- Subjects
DARKNETS (File sharing); INTERNET content
- Publication
International Journal of Safety & Security Engineering, 2024, Vol 14, Issue 2, p467
- ISSN
2041-9031
- Publication type
Article
- DOI
10.18280/ijsse.140214
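
The abstract names three keyword-processing modes (Regex Text, Regex Wildcard, Regex Optional) but does not define them. The Python sketch below shows one plausible reading, purely as an illustration: "text" as a literal whole-word match, "wildcard" as a `*` that stands for any run of word characters, and "optional" as a parenthesised part of the keyword that may or may not be present. The function names `build_pattern` and `filter_pages`, the keyword syntax, and the sample pages are all assumptions for this example and are not taken from the paper.

```python
import re
from typing import Dict, List


def build_pattern(keyword: str, mode: str) -> "re.Pattern[str]":
    """Compile a keyword into a regex for one of three illustrative modes.

    The mode semantics here are assumptions, not the paper's definitions.
    """
    if mode == "text":
        # Literal keyword: escape every regex metacharacter.
        body = re.escape(keyword)
    elif mode == "wildcard":
        # Assumed syntax: '*' stands for any run of word characters.
        body = r"\w*".join(re.escape(part) for part in keyword.split("*"))
    elif mode == "optional":
        # Assumed syntax: one parenthesised part is optional, e.g. "market(place)".
        parts = re.fullmatch(r"([^()]*)\(([^()]+)\)([^()]*)", keyword)
        if parts is None:
            raise ValueError("optional mode expects a keyword like 'market(place)'")
        head, opt, tail = parts.groups()
        body = re.escape(head) + "(?:" + re.escape(opt) + ")?" + re.escape(tail)
    else:
        raise ValueError("unknown mode: " + mode)
    # Whole-word, case-insensitive matching.
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def filter_pages(pages: Dict[str, str], pattern: "re.Pattern[str]") -> List[str]:
    """Return the keys of pages whose text contains a match, in sorted order."""
    return sorted(url for url, text in pages.items() if pattern.search(text))


# Hypothetical page texts keyed by placeholder identifiers.
SAMPLE_PAGES = {
    "page-a": "Hidden Market listing",
    "page-b": "Marketplace rules",
    "page-c": "Forum index",
}
```

Under these assumed semantics, `filter_pages(SAMPLE_PAGES, build_pattern("market", "text"))` keeps only `page-a`, while the wildcard keyword `market*` and the optional keyword `market(place)` both keep `page-a` and `page-b`. This only illustrates how the choice of pattern changes which pages a targeted crawler retains; it says nothing about the timing or accuracy figures reported in the abstract.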