Machine Learning Bot Classifier (Edgio)

My Role

I architected and engineered a Bot Classification Model for Edgio, allowing customers to defend against more sophisticated bots than our WAF alone could detect. I will document some details below, though you can read more here: https://www.edgecast.io/post/introducing-edgecasts-advanced-bot-manager

Impact

The model I built achieved an estimated 99.6% accuracy and protected:

  • 3B+ daily requests
  • 10k+ websites
  • 20k+ edge servers

For our client Shoe Carnival, our ML-powered bot solution blocked 8M malicious requests in one month and reduced security exploit mitigation time by 85%.

Various Categories of Bots

In my analysis of bots coming through our network, I identified a variety of bots with distinct network signatures, such as:

  • Scalping
  • DDoS attacks
  • Scraping
  • Search Engine Optimization
  • Monitoring
  • etc.

The Signal

Drawing on published research as well as my own analysis, I used network and behavioral fingerprints and features to identify and block bot traffic aimed at our customers. Some of the most powerful features include:

  • Standard deviation of inter-request time
  • Entropy of requested resource types
  • Request rate
  • User agent analysis
  • Referrer
  • TLS data (e.g. number of ciphers supported)
  • Session tracking
  • Firewall events
  • etc.
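
To make the first three features concrete, here is a minimal Python sketch of how they could be computed for a single session. It is an illustration only, not the production feature pipeline: the function name session_features, the input shapes (timestamps in seconds, resource types as strings), and the output keys are all assumptions made for this example.

```python
import math
import statistics
from collections import Counter


def session_features(timestamps, resource_types):
    """Compute three illustrative per-session features.

    timestamps: request times in seconds (hypothetical input format)
    resource_types: one label per request, e.g. "html", "css", "image"
    """
    ts = sorted(timestamps)

    # Gaps between consecutive requests; scripted clients often show
    # very regular gaps, which drives the standard deviation toward zero.
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    inter_request_std = statistics.pstdev(gaps) if len(gaps) >= 2 else 0.0

    # Shannon entropy (in bits) of the resource type distribution; a session
    # that only ever fetches one kind of resource has entropy 0.
    counts = Counter(resource_types)
    total = sum(counts.values())
    resource_entropy = sum(
        (c / total) * math.log2(total / c) for c in counts.values()
    )

    # Requests per second over the observed session window.
    duration = ts[-1] - ts[0] if len(ts) >= 2 else 0.0
    request_rate = len(ts) / duration if duration > 0 else 0.0

    return {
        "inter_request_std": inter_request_std,
        "resource_entropy": resource_entropy,
        "request_rate": request_rate,
    }


# A perfectly regular, single-resource session versus a more varied one.
regular = session_features([0, 1, 2, 3, 4], ["html"] * 5)
varied = session_features([0, 2, 3, 10], ["html", "css", "js", "image"])
```

In this toy comparison the regular session has zero timing variance and zero entropy, while the varied session scores higher on both. Real traffic is far noisier, so features like these are inputs to a classifier rather than a verdict on their own.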

Building evasion-resistant fingerprints combined JA4+ with IP-level tracking. JA4 is a network fingerprint that is difficult to evade and easy to overlook, as it is derived from information in the TLS Client Hello packet.

For the model, we used the JA4+ suite, which provides fingerprints at several layers of the OSI model. The power of JA4-based fingerprinting led to its adoption by competitors like Cloudflare in their bot management solution.
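
As a rough illustration of how a JA4-style TLS fingerprint is assembled from Client Hello fields, here is a simplified Python sketch. It is not the reference JA4 implementation and not the code used in the model: the function name ja4_like and its parameters are hypothetical, the Client Hello is assumed to be already parsed into integers and strings, and edge cases such as empty cipher lists are ignored.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (0x0a0a, 0x1a1a, ..., 0xfafa) are random padding and are skipped."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _hash12(text: str) -> str:
    """First 12 hex characters of the SHA-256 digest."""
    return hashlib.sha256(text.encode()).hexdigest()[:12]


def ja4_like(tls_version, has_sni, ciphers, extensions, alpn, sig_algs, transport="t"):
    """Build a JA4-style string from already-parsed Client Hello fields.

    tls_version: two-character code such as "13" (assumed pre-computed)
    ciphers, extensions, sig_algs: lists of 16-bit integers
    alpn: first ALPN value as a string, e.g. "h2", or "" if absent
    """
    ciphers = [c for c in ciphers if not is_grease(c)]
    extensions = [e for e in extensions if not is_grease(e)]

    # Part a: human-readable summary of the handshake.
    alpn_tag = alpn[0] + alpn[-1] if alpn else "00"
    part_a = (
        f"{transport}{tls_version}{'d' if has_sni else 'i'}"
        f"{min(len(ciphers), 99):02d}{min(len(extensions), 99):02d}{alpn_tag}"
    )

    # Part b: hash of the sorted cipher list, so reordering ciphers
    # does not change the fingerprint.
    cipher_str = ",".join(f"{c:04x}" for c in sorted(ciphers))

    # Part c: hash of the sorted extensions (SNI 0x0000 and ALPN 0x0010 are
    # already reflected in part a), plus signature algorithms in original order.
    hashed_exts = sorted(e for e in extensions if e not in (0x0000, 0x0010))
    ext_str = ",".join(f"{e:04x}" for e in hashed_exts)
    if sig_algs:
        ext_str += "_" + ",".join(f"{s:04x}" for s in sig_algs)

    return f"{part_a}_{_hash12(cipher_str)}_{_hash12(ext_str)}"


# Two Client Hellos with the same contents in a different order (one with
# GREASE values mixed in) map to the same fingerprint.
fp_a = ja4_like("13", True, [0x1301, 0x1302, 0x0A0A],
                [0x0000, 0x0010, 0x000D, 0x002B, 0x1A1A], "h2", [0x0403, 0x0804])
fp_b = ja4_like("13", True, [0x1302, 0x1301],
                [0x002B, 0x000D, 0x0010, 0x0000], "h2", [0x0403, 0x0804])
```

Sorting before hashing is what makes this kind of fingerprint hard to evade by simply shuffling cipher or extension order: a client has to change what it actually offers, not just the order in which it offers it.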

In Practice

Bot manager internal dashboard