<div dir="ltr">Hey Ginnie,<br><br>Back in November/December at Lehigh University, we evaluated roughly three HTR (handwritten text recognition) solutions and seven different Ollama LLM models, and found that OpenAI's ChatGPT was the best at transcribing handwritten documents.<br><br>We made <a href="https://github.com/lehigh-university-libraries/scyllaridae/tree/main/examples/openai-htr">an Islandora microservice</a> and have successfully OCR'd our first documents in our Islandora repository and added them to our search index.<br><br>For example, for this image: <a href="https://preserve.lehigh.edu/sites/default/files/2024-01/328551.jpg">https://preserve.lehigh.edu/sites/default/files/2024-01/328551.jpg</a><br><br>With this prompt: <a href="https://github.com/lehigh-university-libraries/scyllaridae/blob/main/examples/openai-htr/Dockerfile#L20">https://github.com/lehigh-university-libraries/scyllaridae/blob/main/examples/openai-htr/Dockerfile#L20</a><br><br>ChatGPT returned this transcription:<br><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><font face="monospace">Dear Sir</font> </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span style="font-family:monospace">I have been expecting to hear from you every day on the subject of the Little Poems I am about to publish. No more time must it be delayed. This pamphlet will not interfere with any negotiations between us, as it is quite a separate thing and is printed for Chandos in purpose—Mr. </span>Wesley who<span style="font-family:monospace"> has endorsed the letter which is to be printed with it, </span>called on me<span style="font-family:monospace"> the other day, and he promised to see you as soon as he returned to Town. I believe he </span>is there<span style="font-family:monospace"> before now. 
You will therefore be so good as</span></blockquote><br>We plan to build tooling around this to support more models and provide a GUI, and we eventually hope to generate hOCR for handwritten manuscripts.<br><br>Joe</div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Thu, Feb 27, 2025 at 4:35 PM DRESSLER, Virginia via Ohiodig &lt;<a href="mailto:ohiodig@lists.library.ohio.gov">ohiodig@lists.library.ohio.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg-6479198727705599045">
<div dir="ltr">
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi OhioDIG-</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I'm working with a faculty member who is looking for advice on turning some printed data collection tables with handwritten content into a tabular format (without having to manually transfer or retype it).</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I tried a few tests using ABBYY FineReader to convert an image and PDF version of one sample page, but the results were pretty awful.</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Has anyone worked on anything like this, or does anyone have an idea? One of our librarians suggested trying ChatGPT, though I'm not sure whether this data would be a good candidate, and another suggested Transkribus:
<a href="https://www.transkribus.org" id="m_-6479198727705599045LPlnk" target="_blank">https://www.transkribus.org</a></div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks in advance!</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Ginnie</div>
<div id="m_-6479198727705599045Signature">
<div id="m_-6479198727705599045divtagdefaultwrapper" dir="ltr" style="font-size:12pt;color:rgb(0,0,0);font-family:Calibri,Helvetica,sans-serif">
<br>
</div>
</div>
</div>
_______________________________________________<br>
Ohiodig mailing list<br>
<a href="mailto:Ohiodig@lists.library.ohio.gov" target="_blank">Ohiodig@lists.library.ohio.gov</a><br>
<a href="https://lists.library.ohio.gov/mailman/listinfo/ohiodig" rel="noreferrer" target="_blank">https://lists.library.ohio.gov/mailman/listinfo/ohiodig</a><br>
To contact the list owner send an email to <a href="mailto:Ohiodig-owner@lists.library.ohio.gov" target="_blank">Ohiodig-owner@lists.library.ohio.gov</a><br>
To unsubscribe send an email to <a href="mailto:Ohiodig-unsubscribe@lists.library.ohio.gov" target="_blank">Ohiodig-unsubscribe@lists.library.ohio.gov</a><br>
</div></blockquote></div>