[Ohiodig] "Jellification" of Text

Tallman, Nathan ntt7 at psu.edu
Thu Jun 2 09:15:13 EDT 2022


Good morning,

To me, this looks like an artifact of image compression within the PDF, likely lossy JPEG2000. I've seen it before. I'm not sure of your workflow, but it could be introduced by OCR software that outputs a compressed PDF with searchable text, or by Acrobat if you're using it to reduce file size. LuraTech PDF Compressor does this too. You could try playing around with the compression options, perhaps using JPEG instead of JPEG2000 or reducing the compression ratio, to try to improve the image quality.
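
If you want to check which compression a given PDF actually uses before changing settings, here is a minimal sketch in Python (standard library only). It assumes the file is not encrypted and simply scans the raw bytes for the stream filter names defined in the PDF specification (JPXDecode is JPEG2000, DCTDecode is JPEG). The function name count_filters and the script layout are my own for illustration, not part of any tool mentioned above, and a byte scan like this is only a rough heuristic.

```python
import re
import sys
from collections import Counter

# Stream filter names from the PDF specification, with a plain description.
FILTERS = {
    b"JPXDecode": "JPEG2000 (lossy or lossless)",
    b"DCTDecode": "JPEG (lossy)",
    b"JBIG2Decode": "JBIG2 (bitonal, lossy or lossless)",
    b"CCITTFaxDecode": "CCITT fax (bitonal, lossless)",
    b"FlateDecode": "Flate/zlib (lossless)",
    b"LZWDecode": "LZW (lossless)",
}

# Match a filter name written as a PDF name object, e.g. "/JPXDecode".
PATTERN = re.compile(rb"/(" + b"|".join(FILTERS) + rb")\b")


def count_filters(data: bytes) -> Counter:
    """Count occurrences of each known filter name in raw PDF bytes.

    Heuristic only: FlateDecode is also used for fonts and page content,
    not just images, and compressed stream data could in rare cases
    contain a matching byte sequence by chance.
    """
    return Counter(m.group(1).decode("ascii") for m in PATTERN.finditer(data))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        counts = count_filters(f.read())
    for name, n in counts.most_common():
        print(f"{name:16} {n:6}  {FILTERS[name.encode('ascii')]}")
```

Run it as "python check_filters.py yourfile.pdf" (the script name is arbitrary). Many JPXDecode entries in the affected file and few or none in the normal one would be consistent with the JPEG2000 theory, though it would not prove it.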

Thanks,
Nathan

--
Nathan Tallman (he/him)
Schedule a Meeting<https://outlook.office365.com/owa/calendar/PennStateUniversityLibraries2@PennStateOffice365.onmicrosoft.com/bookings/>
Chat with me on Teams<https://teams.microsoft.com/l/chat/0/0?users=ntt7@psu.edu>

From: Ohiodig <ohiodig-bounces at lists.library.ohio.gov> On Behalf Of Noah Stegman Rechtin via Ohiodig
Sent: Wednesday, June 1, 2022 3:09 PM
To: ohiodig at lists.library.ohio.gov; Chatham.Ewing at cpl.org; Klose.16 at osu.edu
Subject: Re: [Ohiodig] "Jellification" of Text

Dear All,

For reference, I used the term "jellification" because the results look like jelly to me for some reason. However, a more technical way to describe it might be "a blurry background with heavily non-anti-aliased text".

The "jellified" version takes significantly longer to open than the normal one. It could be the result of the former attempting to open the entire document at once while the latter only displays two pages at a time, but it might also be related to the viewer attempting to render an additional layer of poorly OCR'd text.

> Why not write to Ideals and ask?
I can't believe I didn't think of doing that. I just sent them an email about it. Good suggestion.

> On a related or unrelated note, here is an older post from a digitization expert (https://page2pixel.org/2013/08/when-copiers-arent-copying-as-they-should/) that mentions copy/scan stations interpreting numbers incorrectly and changing them in derivatives.
If you click through to the linked blog post<http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning>, the provided examples really do look like the above. So it does indeed appear related. Thanks for the excellent link.

The concern he lays out for accurate replication is well taken. While whatever mechanism does this has not, in my experience, caused issues with text, many of the documents I encounter (e.g. Technical Manuals<http://catalog.hathitrust.org/Record/006125802>) have diagrams or schematics that it fails to correctly reproduce. For the subject matter our museum deals with, this is important. (For example, while it's not the same cause, the difference between microfilmed and scanned originals<http://www.aircorpsaviation.com/ken-jungeberg-collection> can be drastic.) Being able to see the minutiae can often be the difference between being able to reproduce something correctly or not. I know this is all a bit of preaching to the choir, but I mention it because it seems that the digitizer may simply not be aware that the drawings are present because the document appears so text heavy.

Sincerely,
Noah Stegman Rechtin
Tri-State Warbird Museum<http://tri-statewarbirdmuseum.org/>
Collections Manager & Museum Attendant