Machine Learning Bot Classifier (Edgio)

My Role

I architected and engineered a bot classification model for Edgio that lets customers defend against bots too sophisticated for our WAF alone to detect. Some details are documented below, and more is available in the announcement post: https://www.edgecast.io/post/introducing-edgecasts-advanced-bot-manager

Impact

The model I built had an estimated accuracy of 99.6% and protected:

3B+ daily requests
10k+ websites
20k+ edge servers

For our client Shoe Carnival, our ML-powered bot solution blocked 8M malicious requests in one month and reduced security exploit mitigation time by 85%.

Categories of Bots

In my analysis of bot traffic crossing our network, I identified several categories of bots, each with a distinct network signature, such as:

Scalping
DDoS attacks
Scraping
Search engine optimization (SEO) crawling
Monitoring
etc.

The Signal

Drawing on published research and my own analysis, I selected network and behavioral fingerprints and features to identify and block bot traffic targeting our customers. Some of the most powerful features include:

Standard deviation of inter-request time
Entropy of requested resource types
Request rate
User-agent analysis*
Referrer
TLS data (e.g., number of cipher suites supported)
Session tracking
Firewall events
etc.
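As an illustration of how the first few behavioral features above can be computed, here is a minimal Python sketch. It assumes requests have already been grouped into a session and sorted by time; the `Request` type and `session_features` helper are hypothetical names for this example, not part of the production system.

```python
import math
import statistics
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float    # seconds since epoch
    resource_type: str  # e.g. "html", "js", "css", "image"


def session_features(requests: list[Request]) -> dict[str, float]:
    """Compute a few per-session behavioral features.

    Hypothetical helper: assumes `requests` belong to one session
    and are sorted by timestamp.
    """
    n = len(requests)
    times = [r.timestamp for r in requests]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    # Scripted clients often fire requests at near-constant intervals,
    # so a low standard deviation of inter-request gaps is suspicious.
    gap_std = statistics.pstdev(gaps) if gaps else 0.0

    # Shannon entropy (bits) of the resource-type mix: a browser loading a
    # page fetches HTML, scripts, styles, and images, while a scraper
    # often requests a single type over and over.
    counts = Counter(r.resource_type for r in requests)
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values()) if n else 0.0

    # Requests per second; undefined for a single request or a zero-length
    # session, reported as 0.0 in that case.
    duration = times[-1] - times[0] if n > 1 else 0.0
    rate = n / duration if duration > 0 else 0.0

    return {
        "interrequest_std": gap_std,
        "resource_entropy": entropy,
        "request_rate": rate,
    }
```

A session of identically timed HTML-only requests scores zero on both the timing spread and the entropy, which is the profile of a simple scraper; a real browser page load typically scores higher on both.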

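Building evasion-resistant fingerprints was a compound effort combining JA4+ and IP-level tracking. JA4 is a network fingerprint that is difficult to evade and easy for bot authors to overlook, because it is derived from the TLS Client Hello packet (the TLS version, cipher suites, extensions, and related fields the client offers). Those fields are produced by the client's TLS library rather than by request headers that a bot can trivially rewrite.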
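For reference, the sketch below builds a JA4 (TLS client) fingerprint following the public FoxIO JA4 specification. The fingerprint has three parts: a readable prefix (transport, TLS version, SNI presence, cipher and extension counts, first ALPN value), then truncated SHA-256 hashes of the sorted cipher suites and of the sorted extensions plus signature algorithms. It assumes the Client Hello has already been parsed; the `ClientHello` dataclass is hypothetical, and edge cases such as non-alphanumeric ALPN values and DTLS are simplified.

```python
import hashlib
from dataclasses import dataclass, field

# Two-character version codes used in the first part of the fingerprint.
TLS_VERSIONS = {0x0304: "13", 0x0303: "12", 0x0302: "11", 0x0301: "10"}
SNI_EXT, ALPN_EXT = 0x0000, 0x0010


def is_grease(value: int) -> bool:
    # GREASE values (RFC 8701) have equal bytes of the form 0x?a,
    # e.g. 0x0a0a, 0x1a1a, ..., 0xfafa.
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


@dataclass
class ClientHello:
    """Hypothetical, already-parsed TLS Client Hello fields."""
    version: int                     # highest TLS version offered by the client
    ciphers: list[int]               # cipher suites, wire order
    extensions: list[int]            # extension types, wire order
    signature_algorithms: list[int]  # wire order
    alpn: list[str] = field(default_factory=list)
    transport: str = "t"             # "t" = TCP, "q" = QUIC


def _hash12(text: str) -> str:
    # Truncated SHA-256: the first 12 hex characters of the digest.
    return hashlib.sha256(text.encode()).hexdigest()[:12]


def ja4(hello: ClientHello) -> str:
    ciphers = [c for c in hello.ciphers if not is_grease(c)]
    exts = [e for e in hello.extensions if not is_grease(e)]
    sigs = [s for s in hello.signature_algorithms if not is_grease(s)]

    # Part a: human-readable summary of the handshake.
    first_alpn = hello.alpn[0] if hello.alpn else ""
    part_a = (
        hello.transport
        + TLS_VERSIONS.get(hello.version, "00")
        + ("d" if SNI_EXT in exts else "i")  # SNI present: destination is a domain
        + f"{min(len(ciphers), 99):02d}"
        + f"{min(len(exts), 99):02d}"         # count includes SNI and ALPN
        + (first_alpn[0] + first_alpn[-1] if first_alpn else "00")
    )

    # Part b: hash of the cipher suites, sorted, as 4-digit hex.
    part_b = _hash12(",".join(f"{c:04x}" for c in sorted(ciphers))) if ciphers else "0" * 12

    # Part c: hash of the sorted extensions (without SNI and ALPN), then the
    # signature algorithms in their original order.
    body = ",".join(f"{e:04x}" for e in sorted(exts) if e not in (SNI_EXT, ALPN_EXT))
    if sigs:
        body += "_" + ",".join(f"{s:04x}" for s in sigs)
    part_c = _hash12(body) if exts else "0" * 12

    return f"{part_a}_{part_b}_{part_c}"
```

Sorting cipher suites and extensions before hashing is a deliberate JA4 design choice: it keeps the fingerprint stable even when a client randomizes its extension order, as recent versions of Chrome do.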

For the model, we used the full JA4+ suite, which provides fingerprints at different layers of the OSI model (for example, TLS, HTTP, and TCP). The strength of JA4-based fingerprinting has also led competitors such as Cloudflare to adopt it in their bot management solutions.
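The fingerprinting described above is paired with IP-level tracking. One hypothetical way to combine the two is sketched below: a per-IP tracker that flags an address presenting many distinct JA4 + JA4H combinations within a short window, a pattern consistent with a client rotating or spoofing its stack. The class name, window length, and threshold are illustrative assumptions, not values from the production model.

```python
from collections import defaultdict, deque


class FingerprintRotationTracker:
    """Hypothetical per-IP tracker combining JA4+ fingerprints with IP state.

    Flags an IP that presents more than `max_distinct` distinct JA4/JA4H
    combinations within a sliding window of `window_s` seconds. Both values
    are illustrative placeholders, not tuned production settings.
    Assumes timestamps arrive in non-decreasing order per IP.
    """

    def __init__(self, window_s: float = 60.0, max_distinct: int = 3) -> None:
        self.window_s = window_s
        self.max_distinct = max_distinct
        # ip -> deque of (timestamp, composite fingerprint), oldest first
        self.events: dict[str, deque[tuple[float, str]]] = defaultdict(deque)

    def observe(self, ip: str, ts: float, ja4: str, ja4h: str) -> bool:
        """Record one request; return True if the IP appears to rotate client stacks."""
        # Composite key: TLS-level (JA4) plus HTTP-level (JA4H) fingerprint.
        key = f"{ja4}|{ja4h}"
        window = self.events[ip]
        window.append((ts, key))
        # Evict events older than the sliding window.
        while window and window[0][0] < ts - self.window_s:
            window.popleft()
        return len({k for _, k in window}) > self.max_distinct
```

Shared addresses (corporate NAT, mobile carriers, proxies) can legitimately present several client stacks, so a signal like this fits better as one input feature to the model than as a blocking rule on its own.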

In Practice

Bot manager internal dashboard