[Ohiodig] FW: LOC: Filling in the File Format Gaps

Carleton, Janet (she/her) carleton at ohio.edu
Thu Jun 29 11:08:16 EDT 2023


May be of interest! From the Library of Congress.

---
Janet Carleton| Digital Initiatives Coordinator | Digital Initiatives | Mahn Center for Archives and Special Collections, Preservation & Digital Initiatives | OHIO University Libraries | Alden 333 | Athens, Ohio | 740.597.2527 | carleton at ohio.edu<mailto:carleton at ohio.edu> | https://media.library.ohio.edu<https://media.library.ohio.edu/> | she/her/hers

Feed: The Signal
Posted on: Thursday, June 29, 2023 10:10 AM
Author: Liz Holdzkom
Subject: Filling in the File Format Gaps


Today’s guest post is from Kate Murray, Marcus Nappier, and Liz Holdzkom of the Digital Collections Management & Services Division at the Library of Congress.

________________________________

This is the fourth installment of our semi-annual blog series about file format research for the Sustainability of Digital Formats: Planning for Library of Congress Collections<https://www.loc.gov/preservation/digital/formats/index.html?loclr=blogsig> at the Library of Congress. If you’re a file format fan, take a look at the other entries in the series: Fun with File Formats<http://blogs.loc.gov/thesignal/2021/12/fun-with-file-formats/?loclr=blogsig>, Return to the Fascinating World of File Formats!<http://blogs.loc.gov/thesignal/2022/06/return-to-the-fascinating-world-of-file-formats/?loclr=blogsig>, and Even More Fun with File Formats!<http://blogs.loc.gov/thesignal/2022/12/even-more-fun-with-file-formats/?loclr=blogsig>. We may not have the most creative blog post titles, but we know our way around a specification and how to find a magic number<https://www.garykessler.net/library/file_sigs.html>.
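
A magic number is a fixed byte sequence, usually at the very start of a file, that identifies its format. As a minimal sketch (Python standard library only), a format sniffer can compare a file's first bytes against a table of known signatures. The table below is an illustrative assumption covering a few formats named in this post (gzip, bzip2, WebP, and the ZIP container that WACZ files use) plus PNG and PDF; it is far from exhaustive and is not drawn from the FDDs or any Library tool.

```python
# Minimal magic-number sniffer. The signature table is illustrative, not exhaustive.
from typing import Optional

# Each format maps to (offset, signature bytes) pairs that must ALL match.
SIGNATURES = {
    "gzip": [(0, b"\x1f\x8b")],
    "bzip2": [(0, b"BZh")],
    "ZIP container (e.g. WACZ)": [(0, b"PK\x03\x04")],
    "WebP": [(0, b"RIFF"), (8, b"WEBP")],
    "PNG": [(0, b"\x89PNG\r\n\x1a\n")],
    "PDF": [(0, b"%PDF-")],
}


def sniff(header: bytes) -> Optional[str]:
    """Return the first format whose signature bytes all match, else None."""
    for name, parts in SIGNATURES.items():
        if all(header[off:off + len(sig)] == sig for off, sig in parts):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read just enough of a file (16 bytes covers every entry above) and sniff it."""
    with open(path, "rb") as fh:
        return sniff(fh.read(16))


if __name__ == "__main__":
    print(sniff(b"\x1f\x8b\x08\x00"))              # gzip
    print(sniff(b"RIFF\x24\x00\x00\x00WEBPVP8 "))  # WebP
    print(sniff(b"hello"))                         # None
```

WebP needs two checks because the "RIFF" prefix alone also matches WAV and AVI files; only the "WEBP" tag at offset 8 tells them apart. Likewise, the ZIP signature matches any ZIP-based format (DOCX, EPUB, WACZ), so distinguishing a WACZ from other ZIPs requires looking inside the archive.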

This has been a busy few months for your favorite file format folks! Let’s catch you up on all the goings on.

New and updated file format descriptions (and LOTS of them)

Thanks in part to a contract with NVision Solutions, we have published 30 new file format descriptions (known as FDDs) to our site this calendar year. A full list of the new entries is available on our 2022-2023 workplan<https://www.loc.gov/preservation/digital/formats/fdd/fdd_workplan.shtml#2022-2023?loclr=blogsig> and we’re also keeping our publication log<https://www.loc.gov/preservation/digital/formats/fdd/fdd_workplan.shtml#pub-log?loclr=blogsig> up-to-date so you can follow along at home when we publish a new one.

These new FDDs fall into several content categories:

  *   Accessibility support formats, which include both formats for screen readers and audio players and formats for captions and subtitles in audiovisual content. A few highlights include BRF (Braille Ready Format)<https://www.loc.gov/preservation/digital/formats/fdd/fdd000553.shtml?loclr=blogsig> (FDD 553), HBL (Braille Sense Format File)<https://www.loc.gov/preservation/digital/formats/fdd/fdd000551.shtml?loclr=blogsig> (FDD 551), WebVTT (Web Video Text Tracks Format)<https://www.loc.gov/preservation/digital/formats/fdd/fdd000567.shtml?loclr=blogsig> (FDD 567), SRT (SubRip Subtitle Format)<https://www.loc.gov/preservation/digital/formats/fdd/fdd000569.shtml?loclr=blogsig> (FDD 569) and SUB (VobSub Subtitle Format)<https://www.loc.gov/preservation/digital/formats/fdd/fdd000571.shtml?loclr=blogsig> (FDD 571). This focus on accessibility is linked to related projects in the Federal Agencies Digital Guidelines Initiative (FADGI)’s AudioVisual Working Group’s Accessibility Subgroup<https://www.digitizationguidelines.gov/guidelines/accessibilty_AV_collections.html>. An additional FADGI project reflects research into accessibility for open-source digital preservation applications<https://blogs.loc.gov/thesignal/2023/05/new-fadgi-project-researching-accessibility-in-open-source-digital-preservation-applications/?loclr=blogsig>.
  *   3D, Virtual Reality and related design formats support preferences in the Recommended Formats Statement<https://www.loc.gov/preservation/resources/rfs/design3D.html?loclr=blogsig> as well as other efforts in the Library. A few new entries to note include 3MF (3D Manufacturing Format)<https://www.loc.gov/preservation/digital/formats/fdd/fdd000557.shtml?loclr=blogsig> (FDD 557), E57 (ASTM E57 3D file format)<https://www.loc.gov/preservation/digital/formats/fdd/fdd000563.shtml?loclr=blogsig> (FDD 563), VRM<https://www.loc.gov/preservation/digital/formats/fdd/fdd000564.shtml?loclr=blogsig> (FDD 564) and ARML 2.0 (Augmented Reality Markup Language)<https://www.loc.gov/preservation/digital/formats/fdd/fdd000556.shtml?loclr=blogsig> (FDD 556).
  *   Web-enabled format entries include WebP<https://www.loc.gov/preservation/digital/formats/fdd/fdd000577.shtml?loclr=blogsig> (FDD 577), VP8 Video Codec<https://www.loc.gov/preservation/digital/formats/fdd/fdd000578.shtml?loclr=blogsig> (FDD 578) and VP9 Video Codec<https://www.loc.gov/preservation/digital/formats/fdd/fdd000579.shtml?loclr=blogsig> (FDD 579). WACZ (Web Archive Collection Zipped)<https://www.loc.gov/preservation/digital/formats/fdd/fdd000586.shtml?loclr=blogsig> (FDD 586) for web archiving is also included on the Recommended Formats Statement<https://www.loc.gov/preservation/resources/rfs/webarchives.html?loclr=blogsig>.
[Screenshot of spreadsheet showing newest Format Description Documents (order from newest to oldest), with FDD numbers, names, URLs, and publication dates.]<http://blogs.loc.gov/thesignal/files/2023/06/Figure1-PubLog.png>
Formats publication log for new additions from January – June 2023. For the live version, see www.loc.gov/preservation/digital/formats/fdd/fdd_workplan.shtml<https://www.loc.gov/preservation/digital/formats/fdd/fdd_workplan.shtml?loclr=blogsig>.

RFS FDD prioritization

Let’s keep the FDD update train rolling! In preparation for the release of the 2023 Recommended Formats Statement (RFS)<https://www.loc.gov/preservation/resources/rfs/?loclr=blogsig>, we’ve also been updating the FDDs called out in the RFS’s various content categories. You may remember from our Return to the Fascinating World of File Formats!<http://blogs.loc.gov/thesignal/2022/06/return-to-the-fascinating-world-of-file-formats/?loclr=blogsig> blog post last June that we developed a new process to pull the date of last update from our FDD XML to target those RFS FDDs. We’ve continued to build on this work and to standardize the process for updating these FDDs. “What information are we updating?” is probably the question you’re asking right now. We’re sure by now you’ve checked out an FDD or two and noticed LOTS of links to external resources. That’s where we start our updates: making sure that links are still active and resolve to the correct source. We’ve also developed template language for the “LC Experience” and “LC Preference” sections in our FDDs to better clarify whether the Library holds a particular format and whether that format is listed in the RFS. Clarity in the “LC Preference” field is especially important because we haven’t always been consistent in the past, and that inconsistency has caused a few (or many) headaches when running our XML parsing script. We’re continuing to work on making that field consistent to save ourselves from future headaches.

Unlike last year, we actually had a priority one FDD from our prioritization list! WACZ (Web Archive Collection Zipped)<https://www.loc.gov/preservation/digital/formats/fdd/fdd000586.shtml?loclr=blogsig>, mentioned above, is a brand-new FDD in the RFS. We still prioritized FDDs listed as preferred or acceptable formats that had gone 5-10 years without a significant update, but we also reviewed newer FDDs. With over 50 completed FDD updates, we continue to see the high value of this work, and it will remain a critical part of our yearly review.
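
The prioritization step described above (pull each FDD's date of last update from its XML, then flag records that have gone years without an update) can be sketched roughly as follows. This is a hypothetical illustration: the element names (`<fdd>`, `<id>`, `<updates>`, `<date>`), the sample records and their IDs, and the five-year default are assumptions for the example, not the actual FDD XML schema or the team's script.

```python
# Hypothetical sketch: element names and sample records are placeholders,
# NOT the actual Library of Congress FDD XML schema.
import xml.etree.ElementTree as ET
from datetime import date
from typing import List, Tuple

SAMPLE_FDDS = [
    """<fdd><id>fdd-example-old</id><updates>
         <date>2014-03-01</date><date>2016-11-10</date>
       </updates></fdd>""",
    """<fdd><id>fdd-example-new</id><updates>
         <date>2023-05-15</date>
       </updates></fdd>""",
]


def last_update(xml_text: str) -> Tuple[str, date]:
    """Return (FDD id, most recent update date) for one hypothetical FDD record."""
    root = ET.fromstring(xml_text)
    fdd_id = root.findtext("id")
    dates = [date.fromisoformat(d.text.strip()) for d in root.iter("date")]
    return fdd_id, max(dates)


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # Feb 29 in a non-leap target year
        return d.replace(year=d.year - years, day=28)


def stale_fdds(records: List[str], today: date, years: int = 5) -> List[str]:
    """List FDD ids whose latest update is at least `years` years before `today`."""
    cutoff = years_before(today, years)
    stale = []
    for xml_text in records:
        fdd_id, updated = last_update(xml_text)
        if updated <= cutoff:
            stale.append(fdd_id)
    return stale


if __name__ == "__main__":
    print(stale_fdds(SAMPLE_FDDS, today=date(2023, 6, 29)))  # ['fdd-example-old']
```

A date-only check like this cannot distinguish a significant revision from a minor link fix, so its output is best treated as a review queue rather than a final list.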

The 2023-2024 version of the RFS will be published in the coming weeks so stay tuned for a follow up blog post highlighting all the changes.

Upcoming work

We are excited to begin a new contract this month with Ashley Blewer, Abi Simkovic and Frances Harrell through Myriad Consulting. Over the next 12 months, this team will research and write close to 40 new FDDs. The 2023-2024 work plan<https://www.loc.gov/preservation/digital/formats/fdd/fdd_workplan.shtml#2023-2024?loclr=blogsig> is available and includes a few new areas of interest such as mobile device support, packaging, software and installation support, forensics and disc imaging as well as filling in gaps for existing content categories Email and Personal Information Manager (PIM) Formats<https://www.loc.gov/preservation/digital/formats/fdd/email_fdd.shtml?loclr=blogsig>, Design and 3D<https://www.loc.gov/preservation/digital/formats/fdd/design3D_fdd.shtml?loclr=blogsig>, Datasets and Databases<https://www.loc.gov/preservation/digital/formats/fdd/dataset_fdd.shtml?loclr=blogsig>, Still Images<https://www.loc.gov/preservation/digital/formats/fdd/still_fdd.shtml?loclr=blogsig> and Text<https://www.loc.gov/preservation/digital/formats/fdd/text_fdd.shtml?loclr=blogsig>. We’re personally looking forward to the research work on Audio Definition Model (ADM)<https://adm.ebu.io/>, gzip<https://formats.kaitai.io/gzip/#:~:text=Gzip%20is%20a%20popular%20and,by%20a%20chosen%20compression%20algorithm>, bzip<https://github.com/dsnet/compress/blob/master/doc/bzip2-format.pdf>, and Apple ProRaw<https://support.apple.com/en-us/HT211965> just to name a few.

We’ve discussed how we prioritize which formats to work on in a previous blog post<http://blogs.loc.gov/thesignal/2021/12/fun-with-file-formats/?loclr=blogsig>. More specifically for this upcoming group of FDDs, priority formats were identified via the Library’s Music<https://www.loc.gov/rr/perform/?loclr=blogsig> and Manuscript<https://www.loc.gov/rr/mss/?loclr=blogsig> divisions’ research efforts and holdings, inclusion in projects such as BitCurator (the Library of Congress is a member of the BitCurator Consortium<https://bitcuratorconsortium.org/about/>) and wider community discussion.

Fan favorite formats

But it’s not just all about the new FDDs, so let’s look at the old favorites. We looked at the analytics from the last 12 months, and found that CSV<https://www.loc.gov/preservation/digital/formats/fdd/fdd000323.shtml?loclr=blogsig> is our most popular FDD, followed closely by Wavefront Material Template Library (MTL)<https://www.loc.gov/preservation/digital/formats/fdd/fdd000508.shtml?loclr=blogsig>. We love a good CSV so this makes sense.
[Screenshot of CSV Format Description Document]<http://blogs.loc.gov/thesignal/files/2023/06/Figure2-CSV.png>
A snippet of everyone’s favorite FDD, CSV Comma Separated Values (RFC 4180)! See www.loc.gov/preservation/digital/formats/fdd/fdd000323.shtml<https://www.loc.gov/preservation/digital/formats/fdd/fdd000323.shtml?loclr=blogsig> for the full version.

Next, DWG (AutoCAD Drawing) Format Family<https://www.loc.gov/preservation/digital/formats/fdd/fdd000445.shtml?loclr=blogsig> and Email (Electronic Mail Format)<https://www.loc.gov/preservation/digital/formats/fdd/fdd000388.shtml?loclr=blogsig> come in third and fourth, though with thousands fewer views than our top two.

And more stats we can love: Thousands of visitors came to our site over the past year from The Signal blog and blog posts just like this! And Wikipedia is also a major referring site, which means Wikipedians are using our FDDs for source material. No matter where you are coming from, whether you are linking from a different site or coming to us directly, we love our visitors just the same.

As always, comments and feedback are very welcome! Leave a comment here or send us a note at formats at loc.gov<mailto:formats at loc.gov>.

View article...<https://blogs.loc.gov/thesignal/2023/06/filling-in-the-file-format-gaps/>