<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Aptos;}
@font-face
{font-family:europa;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Arial",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi Matt,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">This is from our Web Applications Developer, John Kloor, who said you’re welcome to share this with Maria if it may be useful (he can also share additional technical information).
He never ended up needing to try out the mitigation he describes below because the aggressive bot traffic subsided before he got around to it, but sadly I’m sure we’ll have occasion to try it at some point in the future…<o:p></o:p></span></p>
<div style="mso-element:para-border-div;border:none;border-bottom:solid windowtext 1.0pt;padding:0in 0in 1.0pt 0in">
<p class="MsoNormal" style="border:none;padding:0in"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:black">“We did experience an aggressive crawler on our Omeka Classic and S sites last October. The crawler appeared to use numerous different IP addresses from several hosts and geolocations, so there was no effective
way for us to block it at the network level. Our server administrator did increase the resources, particularly RAM, on the affected server, which eased the issues somewhat.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:black"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:black">We do request, via our robots.txt file, that known AI crawlers not crawl our sites, using the list from this project:</span><o:p></o:p></p>
<p class="MsoNormal"><u><span style="color:blue"><a href="https://github.com/ai-robots-txt/ai.robots.txt">https://github.com/ai-robots-txt/ai.robots.txt</a></span></u><o:p></o:p></p>
<p class="MsoNormal"><span style="color:black"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:black">But that file is only observed by well-behaved crawlers, so I do not think it was effective against the aggressive crawling. The aggressive crawler was attempting to access every link on our site, so our largest
vulnerability was the numerous links to search results, particularly for faceted terms. In the future, I plan to implement facets in a different manner that does not produce individual links for each term on every page.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:black"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:black">I did notice that the aggressive crawler would not send an HTTP Referer header. I’ve started experimenting with checking for the existence of that header, and if it isn’t present, redirecting to a page that uses
JavaScript to redirect browsers back to the original page that was requested. My hope is that regular users who arrive without a Referer header will not notice the redirect, which should cause the browser to set the header on the follow-up request, but crawlers will either not set the header or not
run the JavaScript for the redirect. This would result in the crawler only receiving a minimal static HTML file instead of running the entire Omeka application and potentially receiving further links to crawl.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:black"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:black">The aggressive crawler seemed to run out of pages to crawl before I could fully see if that mitigation was effective, but we haven’t run into an issue where our server has been overwhelmed since. I can provide
technical details if anyone is interested, but there is no way of knowing if it will be helpful for other aggressive crawlers.”</span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:black"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">--John Kloor</span><o:p></o:p></p>
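<p class="MsoNormal"><span style="color:black">For anyone who wants to experiment along the same lines, here is a minimal sketch of the Referer check described above, written in Python as WSGI middleware purely for illustration. It is not John’s implementation: Omeka is a PHP application, so equivalent logic would have to live in the web server configuration or in PHP. The names RefererGate and build_interstitial are hypothetical.</span></p>

```python
import json


def build_interstitial(url):
    """Return a minimal static page whose script re-requests `url`."""
    # json.dumps yields a safely quoted JavaScript string literal;
    # "</" is escaped so the URL cannot close the script element early.
    js_url = json.dumps(url).replace("</", "<\\/")
    return (
        "<!doctype html><html><head><title>Loading</title></head><body>"
        "<script>window.location.replace(" + js_url + ");</script>"
        "</body></html>"
    )


class RefererGate:
    """Hypothetical WSGI middleware: requests without a Referer header
    get a tiny static page instead of reaching the wrapped application."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # A non-empty Referer header: hand the request to the real app.
        if environ.get("HTTP_REFERER"):
            return self.app(environ, start_response)

        # Rebuild a same-site relative URL. Collapsing leading slashes
        # keeps a path like "//other.host/x" from redirecting off-site.
        url = "/" + environ.get("PATH_INFO", "/").lstrip("/")
        query = environ.get("QUERY_STRING", "")
        if query:
            url += "?" + query

        body = build_interstitial(url).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
            ("Cache-Control", "no-store"),
        ])
        return [body]
```

<p class="MsoNormal"><span style="color:black">Wrapping an application as RefererGate(app) would answer header-less requests with the small static page only. Two caveats under this sketch: visitors whose browsers suppress the Referer header or do not run JavaScript would never get past the interstitial, and a crawler that starts sending the header would pass straight through.</span></p>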
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<div>
<table class="MsoNormalTable" border="0" cellspacing="3" cellpadding="0" width="560" style="width:420.0pt">
<tbody>
<tr>
<td width="188" style="width:140.95pt;border:none;border-right:solid #EAE1D8 1.0pt;padding:0in 7.5pt 0in 0in">
<p class="MsoNormal" align="center" style="text-align:center"><a href="https://www.bgsu.edu/" target="_blank"><span style="border:none windowtext 1.0pt;padding:0in;text-decoration:none"><img border="0" width="114" height="34" style="width:1.1833in;height:.35in" id="Picture_x0020_1" src="cid:image001.png@01DCAFD4.9A401210" alt="Bowling Green State University"></span></a><span style="mso-ligatures:standardcontextual"><o:p></o:p></span></p>
</td>
<td width="366" style="width:274.55pt;padding:0in 0in 0in 12.75pt">
<p class="MsoNormal" style="line-height:11.25pt"><b><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black;mso-ligatures:standardcontextual">Nick Pavlik</span></b><span style="font-size:9.0pt;font-family:"Arial",sans-serif;color:#6A6A6A;mso-ligatures:standardcontextual"><br>
</span><b><span style="font-size:9.0pt;font-family:"Arial",sans-serif;color:#4D4D4D;mso-ligatures:standardcontextual">Digital Archivist</span></b><span style="font-size:9.0pt;font-family:"Arial",sans-serif;color:#6A6A6A;mso-ligatures:standardcontextual"><br>
Center for Archival Collections<br>
<a href="https://www.bgsu.edu/?utm_campaign=mc-signature&utm_source=signature&utm_medium=email" target="_blank"><b><span style="color:#FD5000">Bowling Green State University</span></b></a>
<o:p></o:p></span></p>
<p class="MsoNormal" style="line-height:11.25pt"><span style="font-size:9.0pt;font-family:"Arial",sans-serif;color:#6A6A6A;mso-ligatures:standardcontextual">603 Jerome Library<br>
Bowling Green, OH 43403 <o:p></o:p></span></p>
<p class="MsoNormal" style="line-height:11.25pt"><span style="font-size:9.0pt;font-family:"Arial",sans-serif;color:#6A6A6A;mso-ligatures:standardcontextual">Office: 419-372-7914<o:p></o:p></span></p>
</td>
</tr>
<tr>
<td style="padding:.75pt .75pt .75pt .75pt"></td>
<td style="padding:.75pt .75pt .75pt .75pt"></td>
</tr>
</tbody>
</table>
<p class="MsoNormal"><span style="mso-ligatures:standardcontextual"><o:p> </o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Ohiodig <ohiodig-bounces@lists.library.ohio.gov>
<b>On Behalf Of </b>Carissimi, Matt via Ohiodig<br>
<b>Sent:</b> Friday, March 6, 2026 2:53 PM<br>
<b>To:</b> ohiodig@lists.library.ohio.gov<br>
<b>Subject:</b> [EXTERNAL] [Ohiodig] Dealing with increased bot traffic on Omeka sites?<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal"><span style="color:black">Originally from Maria Nuccilli (<a href="mailto:mnuccilli@wayne.edu">mnuccilli@wayne.edu</a>), </span><span style="font-size:11.0pt;color:black;background:white">Director, Discovery & Innovation at Wayne State in Detroit, Michigan,</span> via the VRA Listserv. She’s wondering how Omeka users are dealing with increased bot traffic.
I know this came up in the fall when we were discussing Digital Repositories, but I couldn’t find any meeting notes on the website. <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">“<i>Hey everyone, wondering if there are any other Omeka Classic or Omeka S users out there who have found a way to effectively deal with bot traffic. </i><o:p></o:p></p>
</div>
<p class="MsoNormal"><i><span style="color:black">Since the “AI gold rush” is now in full swing, my team has been dealing with a huge amount of crawler traffic, probably scraping our collections to train AI. Our IT folks have been looking at rate limiting
on the server and a geolocation limit module, but have nothing solid yet. They initially asked us to consider limiting access to our institution’s IP address range, but that is not possible since these are supposed to be public sites.</span></i><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:black"><o:p> </o:p></span></p>
<p class="MsoNormal"><i><span style="color:black">Interested in what approach others have been taking. </span></i><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:black"><o:p> </o:p></span></p>
<p class="MsoNormal"><i><span style="color:black">Thanks,</span></i><span style="color:black"><o:p></o:p></span></p>
<div>
<p class="MsoNormal"><i><span style="color:black">Maria</span></i><span style="color:black">”<o:p></o:p></span></p>
</div>
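<p class="MsoNormal"><span style="color:black">On the rate limiting idea mentioned above, a minimal token-bucket sketch in Python may help frame the discussion with IT. It is an illustration only, not a module from Omeka or any particular web server; the class name TokenBucketLimiter and its parameters are hypothetical. Because crawlers of this kind can spread requests across many IP addresses, the key is left open: it could be a client address, or a shared label covering all search result pages.</span></p>

```python
import time


class TokenBucketLimiter:
    """Hypothetical in-memory limiter: each key may burst up to `burst`
    requests, then is refilled at `rate` requests per second."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)      # tokens added per second
        self.burst = float(burst)    # maximum tokens a bucket can hold
        self.clock = clock           # injectable so the logic is testable
        self.buckets = {}            # key -> (tokens, last_seen_time)

    def allow(self, key):
        """Return True if a request for `key` may proceed right now."""
        now = self.clock()
        # Unknown keys start with a full bucket.
        tokens, last = self.buckets.get(key, (self.burst, now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

<p class="MsoNormal"><span style="color:black">A caller would check limiter.allow(key) before doing expensive work and return an HTTP 429 response when it is False. In this sketch the buckets live in one process and are never pruned, so a real deployment would more likely rely on the web server’s or a proxy’s own rate limiting.</span></p>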
<div>
<p class="MsoNormal"><span style="color:black"><o:p> </o:p></span></p>
</div>
<div id="ms-outlook-mobile-signature">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#C00000">Matt Carissimi</span></b><b><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black"> </span></b><o:p></o:p></p>
<p style="margin:0in"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:black">Senior Digitization Specialist</span><o:p></o:p></p>
<p style="margin:0in"><b><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#C00000">The Ohio State University</span></b><o:p></o:p></p>
<p style="margin:0in"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:black">University Libraries | Technology and Digital Programs</span><o:p></o:p></p>
<p style="margin:0in"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:black">Digitization Department<br>
Library Tech Center</span><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black"> </span><o:p></o:p></p>
<p style="margin:0in"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:black">1165 Kinnear Road Columbus, OH 43212</span><o:p></o:p></p>
<p style="margin:0in"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:black">614-247-8699</span><o:p></o:p></p>
</div>
</div>
</body>
</html>