[Ohiodig] [EXTERNAL] Dealing with increased bot traffic on Omeka sites?
Nicholas A Pavlik
npavlik at bgsu.edu
Mon Mar 9 14:54:18 EDT 2026
Hi Matt,
This is from our Web Applications Developer, John Kloor, who said you're welcome to share this with Maria if it may be useful (he can also share additional technical information). He never ended up needing to try it out because the aggressive bot traffic subsided before he got around to it, but sadly I'm sure we'll probably have occasion to try it at some point in the future...
"We did experience an aggressive crawler on our Omeka Classic and S sites last October. The crawler appeared to use numerous different IP addresses from several hosts and geolocations, so there was no effective way for us to block it at the network level. Our server administrator did increase the resources, particularly RAM, for the affected server, which eased the issues somewhat.
We do request that known AI crawlers do not crawl our sites via the robots.txt file with information from this project:
https://github.com/ai-robots-txt/ai.robots.txt
But that file is only observed by well-behaved crawlers, so I do not think it was effective against the aggressive crawling. The aggressive crawler was attempting to access every link on our site, so our largest vulnerability was due to having numerous links to search results, particularly for faceted terms. In the future, I plan to implement facets in a different manner that does not produce individual links for each term on every page.
I did notice that the aggressive crawler would not send an HTTP Referer header. I've started experimenting with checking for the existence of that header and, if it isn't present, redirecting to a page that uses JavaScript to redirect browsers back to the page that was originally requested. My hope is that regular users who did not yet have a Referer header will not notice the redirect, which should set the header, but crawlers will either not set the header or not run the JavaScript for the redirect. This would result in the crawler only receiving a minimal static HTML file instead of running the entire Omeka application and potentially receiving further links to crawl.
The aggressive crawler seemed to run out of pages to crawl before I could fully see if that mitigation was effective, but we haven't run into an issue where our server has been overwhelmed since. I can provide technical details if anyone is interested, but there is no way of knowing if it will be helpful for other aggressive crawlers."
--John Kloor
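
The Referer check described in the quoted message could be sketched roughly as follows. This is not John Kloor's implementation (no code was shared on the list, and Omeka itself is written in PHP); it is a minimal Python illustration under stated assumptions. The names `handle`, `interstitial_page`, `safe_local_target`, and `run_app` are hypothetical, and for brevity the sketch serves the minimal page directly at the requested URL rather than redirecting to a separate page first.

```python
import json
from typing import Callable, Dict, Mapping, Tuple

# (status code, response headers, body) -- a simplified stand-in for a real response
Response = Tuple[int, Dict[str, str], str]


def safe_local_target(path: str) -> str:
    """Only allow same-site paths, so the page cannot bounce visitors elsewhere."""
    if path.startswith("/") and not path.startswith(("//", "/\\")):
        return path
    return "/"


def interstitial_page(target: str) -> str:
    """Minimal static HTML that sends JavaScript-capable browsers to `target`.

    Browsers typically attach a Referer header to the navigation this script
    triggers (subject to the site's Referrer-Policy), so the second request
    should pass the check in `handle`.
    """
    # json.dumps yields a valid JavaScript string literal; escaping "<" keeps
    # the path from closing the <script> element early.
    js_target = json.dumps(target).replace("<", "\\u003c")
    return (
        "<!doctype html>\n"
        '<html><head><meta charset="utf-8"><title>Redirecting</title></head>\n'
        f"<body><script>window.location.replace({js_target});</script></body>\n"
        "</html>\n"
    )


def handle(path: str, headers: Mapping[str, str],
           run_app: Callable[[str], str]) -> Response:
    """Run the full application only when the request carries a Referer header."""
    # Header names are case-insensitive, so compare in lower case.
    referer = next((v for k, v in headers.items() if k.lower() == "referer"), "")
    html = {"Content-Type": "text/html; charset=utf-8"}
    if referer.strip():
        # Hypothetical hook: `run_app` stands in for the expensive application.
        return 200, html, run_app(path)
    # No Referer: answer cheaply, without touching the application at all.
    return 200, {**html, "Cache-Control": "no-store"}, interstitial_page(
        safe_local_target(path))


if __name__ == "__main__":
    def demo_app(path: str) -> str:
        return f"<html><body>Full page for {path}</body></html>"

    # First visit without a Referer gets the small static page.
    print(handle("/items/browse?tags=maps", {}, demo_app)[2])
    # The follow-up request, now carrying a Referer, reaches the application.
    print(handle("/items/browse?tags=maps",
                 {"Referer": "https://example.org/items/browse?tags=maps"},
                 demo_app)[2])
```

In this sketch a client that neither sends a Referer nor runs JavaScript never reaches `run_app`, which is the effect the message hopes for. As the message itself notes, there is no way of knowing whether this would help against other crawlers, and visitors with JavaScript disabled or with Referer suppressed would also be held at the minimal page.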
Nick Pavlik
Digital Archivist
Center for Archival Collections
Bowling Green State University <https://www.bgsu.edu/>
603 Jerome Library
Bowling Green, OH 43403
Office: 419-372-7914
From: Ohiodig <ohiodig-bounces at lists.library.ohio.gov> On Behalf Of Carissimi, Matt via Ohiodig
Sent: Friday, March 6, 2026 2:53 PM
To: ohiodig at lists.library.ohio.gov
Subject: [EXTERNAL] [Ohiodig] Dealing with increased bot traffic on Omeka sites?
Originally from Maria Nucilli (mnuccilli at wayne.edu), Director, Discovery & Innovation at Wayne State in Detroit, Michigan, via the VRA Listserv. She's wondering how Omeka users are dealing with increased bot traffic. I know this came up in the fall when we were discussing Digital Repositories, but I couldn't find any meeting notes on the website.
"Hey everyone, wondering if there are any other Omeka Classic or Omeka S users out there who have found a way to effectively deal with bot traffic.
Since the "AI gold rush" is now in full swing, my team has been dealing with a huge amount of crawler traffic, probably scraping our collections to train AI. Our IT folks have been looking at rate limiting on the server and a geo-location limit module, but nothing solid yet. They initially asked us to consider limiting access to our institution's IP address range, but that is not possible since these are supposed to be public sites.
Interested in what approach others have been taking.
Thanks,
Maria"
Matt Carissimi
Senior Digitization Specialist
The Ohio State University
University Libraries | Technology and Digital Programs
Digitization Department
Library Tech Center
1165 Kinnear Road Columbus, OH 43212
614-247-8699