<div dir="ltr"><div class="gmail_quote"><div dir="ltr">Dear All,<div><br></div><div>I haven't forgotten about the text "jellification" issue I brought up the other day. I followed Chatham's suggestion and reached out to IDEALS. IDEALS informed me that they simply host the content and play no role in digitization. However, they directed me to <a href="http://www.library.illinois.edu/preservation/digitization-services" target="_blank">Digitization Services Unit</a>, who provided me with an answer. I have reproduced the reply below:<br><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">It might be the compression selected in whatever program was used to create the PDF/OCR. There is a compression setting in Abbyy FineReader and Adobe Acrobat. If the original images were saved at a high DPI and the PDF downsized the images significantly you might get this result. We’ve had this happen. This could also happen if you are using JP2000 files to create PDFS.<br></blockquote></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Another idea - Abbyy doesn’t always handle graphs or tables well. There are also issues with how it handles saving the text and OCR within the image in the PDF. The scanned image and OCR end up appearing blurry.<br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Here are a few suggestions I have:<br><ul><li>You can customize the recognition mode in the “Analyze and recognize image” steps in Abbyy. Go to PDF recognition mode. 
It defaults to Auto but you want to choose only OCR from the PDF.</li><li>Go to the Recognition option in Abbyy. Unselect detect header and footers or other structural elements of the document.</li><li>Go to the Preprocessing settings in Abbyy and deselect Correct Image Resolution and Reduce ISO Noise.</li><li>In the Save Results section of Abbyy select Best Quality next to the Keep Pictures check box.<br></li></ul><br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Without working with the files used to generate the PDFs it’s hard to say for certain but those are some thoughts I have. I would guess Abbyy was used to generate this PDF and OCR.<br></blockquote><div><br></div><div>What struck me as strange about this whole problem is that I couldn't figure out how – if my assumption that it is related to OCR is correct – the OCR would affect the actual image. I had presumed the OCR text is saved as an "overlay" of sorts that doesn't affect the quality of the original image.</div><div><br></div><div>On a related note, does anyone know whether PDF saves images to another, internal file format? Given that it can handle vector graphics, raster graphics, and text, I presume PDF isn't <i>technically</i> an image file format itself, but rather a container that holds the disparate formats together, with some editing features on top. <strike>The reason I ask is that when you open a large PDF file in both Adobe Acrobat and an image editor like <a href="http://www.gimp.org" target="_blank">GIMP</a>, the latter shows a grayscale image, while the former shows only a black-and-white one.</strike> (Never mind this point; it is apparently a result of the resolution, as opening the file in the latter at a higher resolution (200 vs. 100 pixels per inch) produces a result like the former.) 
However, I could be completely wrong.</div><div><br></div><div>Sincerely<span style="font-family:arial,sans-serif;font-size:12.8px">,</span></div><div style="font-size:12.8px;font-family:arial,sans-serif">Noah Stegman Rechtin</div><div style="font-size:12.8px;font-family:arial,sans-serif"><div style="font-size:12.8px"><i><a href="http://tri-statewarbirdmuseum.org/" target="_blank">Tri-State Warbird Museum</a></i></div></div><div><i>Collections Manager & Museum Attendant</i> </div></div>
</div></div>