How to See Your Site or App Bot Traffic

The first step to stopping bots wasting money and stealing content is seeing it happen

March 11th 2025
by
Tim Sylvester

Inconceivable!

You keep using that word, I do not think it means what you think it means.

This is the conversation I have when I tell digital publishers and digital marketers that robots.nxt can make them money from the bots on their website.

Most don’t know how much bot traffic they have because their analytics hides bots.

Most think they can ignore bots, because bots are page indexers or web crawlers for search engine ranking and search engine optimization, or that they can “just” block bad bots attempting distributed denial of service (DDoS) attacks or hacking.

When Has Hiding a Problem Ever Solved It?

Hiding or ignoring was fine for most of the existence of the internet, but AI changed everything, and most content owners haven’t woken up to greet the new day yet.

Now half of internet traffic is bots, and only a small percentage of that traffic is from traditional crawlers that index webpages, like Googlebot-Image or Googlebot-News. We have details on pretty much every good web crawler on our bot page for indexers. There are a bunch of filters so you can see whatever kinds of bots you want.

In our experience, less than 2% of a website’s traffic is bots they want visiting.

Look at this graph from one of our users. Only 4% of their bot traffic even uses good names. Only half of that is content indexers or collecting analytics for SEO.

A Sankey showing bot traffic split between known and unknown bots.
What’s your web traffic look like? Do you even know?

About a third are bad bots, like penetration hacker bots, that should be blocked. These bots generally don’t use a consistent name, so we identify them in other ways.

But a growing number of bots, about 20% of the overall traffic on your website or app, are content scrapers that collect data for AI training. That’s a lot of cost and lost revenue.

These content scraper bots take a copy of every page on every website to feed into an AI model for training. Sometimes these bots are so aggressive they’ll DDoS (distributed denial of service attack) sites, as happened with iFixit and Game UI Database.

(These were actually “just” denial of service attacks, since they weren’t “distributed”.)

Do you want these bots visiting? Then let them. It’s your loss.

Do you want to stop them stealing content from you? robots.nxt stops them.

Here’s the real question: Do you want to make money from content scraping bots?

A confused man looks back and forth between different cameras.
Rumor has it Dr. Steve Brule and Dr. Mantis Toboggan vacation together

You heard me. What if we convert AI companies scraping content into paying clients?

People want to know, how can I make money from my recipes, how can I make money from my blog posts, from photos, videos. And on and on they go.

Licensing to AI is how to make money from your website on an AI-majority internet.

Any content, professional or amateur. Slap a bot paywall on it, set prices, and make money from creating content. That’s what we’re making happen here. For you.

A cartoon character does a dance with the caption 'What can I say except you're welcome.'
You can thank me by using robots.nxt

Robots.txt, Captchas, and Recaptchas for Bots

I talk to digital marketing firms and local publishers that are most in need of protection from bad bots and AI content scraping. They have a huge amount of super valuable content and they’re getting ripped off every day, every time they post.

Often they don’t understand what they’re dealing with. For the last 20 years they’ve been trained to hide and ignore bots instead of seeing and controlling them.

For the local services companies that digital marketing firms work for, AI companies training on their content marketing ruins their click-through rates and destroys SEO.

That dilutes the inbound lead generation local professional services companies use content marketing for. They do more work for less outcome. That’s a waste of time and effort for everyone except the AI companies, who get the benefits. Nice.

Then the bad bots come through and flood their lead generation forms with spam, meaning they have less good traffic but are swamped with bad traffic. Nice.

For publishers, AI companies training on their articles ruins their ad revenue and paywall revenue — after all, why load the page when it’s summarized in search?

Why buy the cow when the milk is free?

But most digital marketing firms and publishers are stuck in an outdated mindset from back when they started their marketing firm or media company, and don’t understand that “the way we’ve always done it” is no longer relevant in the age of AI.

Like asking grandpa for career advice, they focus on things that don’t work anymore.

An older man points a finger accusingly.
Take a nap, Grampa. The world’s moved on.

“robots.txt controls crawling”, they’ll say, but robots.txt is voluntary, and bots ignore it.
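To see why robots.txt is only a request, here is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt rules, the `ExampleScraper` bot name, and the example.com URL are all made up for illustration. The thing to notice is where the check runs: inside the crawler itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (not taken from any real site).
ROBOTS_TXT = """\
User-agent: ExampleScraper
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/my-best-post"

# A well-behaved crawler runs this check on its own side and skips the page.
polite_bot_may_fetch = parser.can_fetch("ExampleScraper", url)

# Crawlers with no rule of their own fall under the wildcard rule.
other_bot_may_fetch = parser.can_fetch("SomeOtherBot", url)

print("ExampleScraper allowed:", polite_bot_may_fetch)
print("SomeOtherBot allowed:", other_bot_may_fetch)
```

Nothing on the server enforces the answer. A scraper that never calls `can_fetch`, or that sends a different user-agent name, gets the page anyway.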

“Captcha prevents form spam”, they say, but AI bots can solve captcha or recaptcha half the time and get better every day. Or they ship it to a spam farm in a low-income nation where people sit at phones solving captcha / recaptcha all day as a job.

Captcha and recaptcha don’t work in this new AI age, and Google has been using them to make money from your website without giving you a cut. How rude!

A young girl says how rude.
You tell ’em Steph!

Here’s today: For the bots that matter, for the content that’s actually valuable to AI, robots.txt, captcha, and recaptcha don’t work.

How Does robots.nxt Estimate Your Revenue?

Most businesses have no idea how much their site makes them, costs them, and profits them per request. Wild! I couldn’t believe it. So we built a tool to show you.

When you register, we ask for some info on your hosting costs, current revenues, and traffic. We use that to build a simple revenue, expense, and profit model for you.

That shows the financial impact of traffic. Now you can measure and manage.

A user interface to estimate website traffic revenue, expenses, and profit.
A look at robots.nxt revenue estimator

We charge $0.0001 per server request, which means we’re 20% of the cost of serving the page in this example. And we assume you want us to block every bot that doesn’t pay or isn’t an indexer or SEO bot.

That means, in this case, 45% of total traffic is blocked and 2.5% of total traffic pays. (On top of whatever you’re already making from human traffic, which we don’t touch.)

We assume that you’re charging $0.20 per page served to a paying bot, but your prices can be whatever you want them to be.

In this example, this website would increase its revenue by 50% by charging bots. At our current rates, that would cost them $49 for the traffic management, and $749 in transaction fees. Less than $800 to make an extra $5k, passively. While sleeping.
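If you want to check the math, here is a rough sketch in Python that uses only the figures quoted in this example. The implied cost to serve a request and the implied number of paid pages are back-of-the-envelope inferences, not numbers from the estimator, and the “extra $5k” is assumed to be exactly $5,000.

```python
from decimal import Decimal

# Figures quoted in the example.
fee_per_request = Decimal("0.0001")     # robots.nxt charge per server request
fee_share_of_cost = Decimal("0.20")     # that charge is 20% of the cost to serve
price_per_paid_page = Decimal("0.20")   # price charged to a paying bot
traffic_management = Decimal("49")
transaction_fees = Decimal("749")
extra_revenue = Decimal("5000")         # assumption: "$5k" taken as exactly $5,000

# Inferred values (not stated in the article).
implied_cost_to_serve = fee_per_request / fee_share_of_cost
implied_paid_pages = int(extra_revenue / price_per_paid_page)

total_fees = traffic_management + transaction_fees
net_gain = extra_revenue - total_fees

print("Implied cost to serve one request:", implied_cost_to_serve)
print("Implied paid bot pages:", implied_paid_pages)
print("Total fees:", total_fees)
print("Net gain after fees:", net_gain)
```

Under those assumptions the fees come to $798, which leaves about $4,202 of the extra $5,000. Your own numbers will differ, so treat this as a template rather than a forecast.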

A 50% bump from doing nothing. Don’t you want that? What’re your options?

How Does robots.nxt Traffic Tracking Work?

robots.nxt is super easy to set up, and it should only take minutes.

The tracking script is a JavaScript snippet you put in your sitewide context, usually a footer that loads on every page, no different from Google Analytics or Facebook. (Most others like Ahrefs, SEMrush, SpyFu, diib, and UberSuggest plug into Google.)

Shout out to diib, which seems to be the best value for capabilities, features, and pricing.

All of your human visitors and about half your bot traffic will request the tracking script. (Low quality bots don’t interact with JavaScript, so they don’t trigger the tracking script.)

Every time the script is invoked, we capture the route and user-agent string to compare to our bots database. If we recognize the user-agent as a bot, we log it.

If we don’t recognize the user-agent but it has markings of a bot (there are ways to tell), we go through a couple of checks. If the confidence score is high enough, we give it an identity so we recognize it if it comes back to any of our users’ sites, and log it.
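Here is a toy sketch of that flow in Python: look up known bots first, score the unknowns, and give high-confidence unknowns a stable identity. This is not the robots.nxt implementation. The `KNOWN_BOTS` table, the scoring signals, the 0.7 threshold, and the hashed identity are all hypothetical stand-ins for checks the article doesn’t spell out.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical known-bot table: user-agent substring -> bot name.
KNOWN_BOTS = {"googlebot": "Googlebot", "gptbot": "GPTBot"}

# Hypothetical cutoff; the real scoring is not described in the article.
CONFIDENCE_THRESHOLD = 0.7


@dataclass
class Verdict:
    kind: str                 # "known_bot", "suspected_bot", or "not_logged"
    identity: Optional[str]   # bot name or generated identity, if any


def bot_confidence(user_agent: str) -> float:
    """Toy score in [0, 1]; these signals are illustrative only."""
    ua = user_agent.lower()
    score = 0.0
    if not ua:
        score += 0.7  # an empty user-agent is suspicious on its own
    if any(word in ua for word in ("bot", "crawler", "spider", "scraper")):
        score += 0.5
    if any(tool in ua for tool in ("python-requests", "curl", "headless")):
        score += 0.4
    return min(score, 1.0)


def classify(user_agent: str) -> Verdict:
    ua = user_agent.lower()
    # Step 1: compare against the known-bot table.
    for needle, name in KNOWN_BOTS.items():
        if needle in ua:
            return Verdict("known_bot", name)
    # Step 2: unknown user-agent, so score it and keep only confident hits.
    if bot_confidence(user_agent) >= CONFIDENCE_THRESHOLD:
        # A stable identity lets the same bot be recognized on a later visit.
        identity = "unknown-" + hashlib.sha256(ua.encode()).hexdigest()[:12]
        return Verdict("suspected_bot", identity)
    # Step 3: looks human (or not confident enough), so it is left alone.
    return Verdict("not_logged", None)


googlebot = classify("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
scraper = classify("python-requests/2.31 my-scraper")
browser = classify("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36")
print(googlebot.kind, scraper.kind, browser.kind)
```

A real system would lean on more than the user-agent string, since that string is trivial to fake. The sketch only shows the order of the steps.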

We don’t do anything about humans. You’ve got analytics and revenue methods for them.

We’re here for the bots.

We track all the requests bots make — what pages and routes they request, what chunks they load, what code they invoke, what API endpoints they request.

That means we can see how many bots visit your site, who they are, when they visit, and what they want from you.
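As a rough picture of what that looks like, here is a small Python sketch that rolls a bot request log up into who, when, and what. The log format and the entries are invented for illustration and don’t reflect how robots.nxt stores its data.

```python
from collections import Counter

# Hypothetical log entries: (bot identity, ISO timestamp, requested route).
bot_log = [
    ("Googlebot", "2025-03-10T02:15:00", "/"),
    ("GPTBot", "2025-03-10T02:16:30", "/recipes/chili"),
    ("GPTBot", "2025-03-10T14:05:10", "/recipes/chili"),
    ("unknown-1a2b3c", "2025-03-11T14:40:00", "/api/search"),
]

# Who they are, and how often each one shows up.
requests_per_bot = Counter(bot for bot, _, _ in bot_log)

# When they visit, bucketed by hour of day (characters 11-12 of the timestamp).
requests_per_hour = Counter(ts[11:13] for _, ts, _ in bot_log)

# What they want from you.
requests_per_route = Counter(route for _, _, route in bot_log)

print("Distinct bots:", len(requests_per_bot))
print("Busiest bot:", requests_per_bot.most_common(1))
print("Busiest hour:", requests_per_hour.most_common(1))
print("Most requested route:", requests_per_route.most_common(1))
```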

When you have enough bot traffic that you’d save money by controlling it, we suggest you turn on our management service.

And once you’re managing bot traffic, well, take Anakin’s word for it!

Anakin Skywalker says 'This is where the fun begins.'
This is one of the few times I agree with Anakin about how to have fun.