Oklahoma mentor Lisa Henry (left) cleaning a U-matic deck with Public Broadcasting Preservation Fellow Tanya Yule.
This Thursday, February 15th at 8 pm EST, American Archive of Public Broadcasting (AAPB) staff will host a webinar covering quality control tools and technologies used when ingesting digitized collections into the AAPB archive, including MDQC, MediaConch, Sonic Visualiser, and QCTools.
The public is welcome to join for the first half hour. The last half hour will be limited to Q&A with our Public Broadcasting Preservation Fellows, who are just now beginning the process of digitizing at-risk public broadcasting collections to be preserved in the AAPB.
In January 2016, the Council on Library and Information Resources awarded WGBH, the Library of Congress, WETA, and NewsHour Productions, LLC a grant to digitize, preserve, and make publicly accessible on the AAPB website 32 years of NewsHour predecessor programs, from October 1975 to December 2007, that currently exist on obsolete analog formats. Described by co-creator Robert MacNeil as “a place where the news is allowed to breathe, where we can calmly, intelligently look at what has happened, what it means and why it is important,” the NewsHour has consistently provided a forum for newsmakers and experts in many fields to present their views at length in a format intended to achieve clarity and balance, rather than brevity and ratings. A Gallup Poll found the NewsHour America’s “most believed” program. We are honored to preserve this monumental series and include it in AAPB.
Last week, our contract archivist Alexander (AJ) Lawrence completed the inventory of 7,320 NewsHour tapes stored in 523 boxes located in WETA’s storage units in Arlington, Virginia, comprising the bulk of the collection. (Additional content is located at two other locations.)
“I was so excited to receive Casey’s initial email asking about my interest in the NewsHour project. I’ve been a lifelong watcher of the program and the chance to be involved in the preservation of such a valuable resource for historical research seemed like a wonderful opportunity.
The process of inventorying the entire collection seemed pretty daunting on my first day when I got my first in-person look at the storage units housing the estimated 7,500 tapes. However, the process has gone quite smoothly overall and we’ve now surpassed the halfway point. Generally, the tapes have little more than a date to identify them, but it’s been especially interesting to come across the tapes for significant historical events over the past 40+ years. These tapes in particular offered me a chance to reflect on some major cultural milestones I’ve witnessed, often through coverage by the NewsHour team. That said, it was also fun to come across the broadcast that aired on the day I was born, as well as the very first broadcast of The MacNeil/Lehrer NewsHour.
Thankfully, I haven’t been tackling the entire inventory alone. I need to offer a special thanks to Matthew Graylin, a desk assistant with the NewsHour who’s been tasked with assisting me with the work. Needless to say, conducting an archival inventory is well beyond the normal duties of a broadcast news assistant, but Matthew has dived in with gusto. We still have a few weeks together, so hopefully I can convert him into a future audiovisual archivist in that time.”
We have also selected a digitization vendor for the project and are looking to begin pilot tests for digitization within the next month. Meanwhile, the Library has instituted quality control procedures to ensure that all digitized files will be properly preserved for present and future generations.
We can’t wait to get started with digitization and look forward to making this monumental series accessible as part of the AAPB collection. In the meantime, we’re pleased to share this clip reel sampling of content that will be digitized, courtesy of NewsHour Productions.
The following is a guest post by Emily Halevy, Director of Media Management Sales at Crawford Media Services. In this blog post, Emily records her interview with Chip Stephenson, Crawford Project Manager, and David Braught, Crawford Logistics Coordinator. Crawford and the AAPB Project Team recently completed the American Archive of Public Broadcasting Digitization Project, funded by the Corporation for Public Broadcasting. Crawford’s role in the project was the coordination and digitization of approximately 40,000 hours of public broadcasting video and audio archival content, as well as the transcoding of approximately 20,000 born-digital files, contributed by more than 100 stations and organizations nationwide!
Now that the digitization is complete, the files will be preserved and made accessible as much as possible through the American Archive of Public Broadcasting, and the AAPB Project Team at WGBH and the Library of Congress is excited to begin working on these efforts. Continue reading below for an account of Crawford’s experience throughout the AAPB digitization project.
Happy New Year, Everyone! I’m delighted to be a guest blogger for the American Archive of Public Broadcasting, once again! As we come to the end of this migration project, I thought this time it would be fun to sit down with Chip Stephenson and David Braught and discuss some of the successes and challenges this project brought. It’s also a great time to reflect on the importance and value of the project as a whole.
Emily: What’s the first thing that comes to mind now that the project is over?
Chip: It’s over? What? We’ve been living it for over three years!
David: It’s hard to believe it’s over.
Chip: Well, it’s not quite over yet. We’re still wrapping- the engineers are finalizing data, project management is compiling spreadsheets and financials. But we’re almost there.
David: I’ve never worked on anything like this before- the logistics- everything.
Chip: Logistics of shipping, receiving, and accounting for all of the content. And then the amount of data, file configurations, bags, copying files for the individual stations. Over 125 different spreadsheets- audio, video, born digital, plus over 100 stations, which sometimes had multiple spreadsheets. It was more like 100 individual projects than one big project.
David: And every station had its own set of quirks to deal with.
Chip: Every station required multiple phone calls and emails to set things up. It’s an amazing project. The stations were all great to work with and they all had an amazing amount of work to do to make it happen. Some like New Jersey Network and University of Maryland had an incredible amount of content.
David: I’m sure the stations wanted to kill me with the number of emails about checking their files so we could delete them from our system.
Chip: Our engineers were amazing.
David: I can’t say enough good things about our engineers. Guy (Boyd) was able to adapt and push through data, JP (Lesperance) handling all of the born digital, Nathan (Lewis) re-transcoding every single proxy to meet the requirements for the Library of Congress, Herve (Bergeron) and Dr. Dave (Wolaver) switching out and repairing decks.
Chip: And don’t forget the thousands of tapes baked and repaired by Dr. Dave as well.
David: It really was a tremendous team effort.
Emily: We really do have a great team, don’t we? And we can’t leave out the migrators.
Chip: At the peak we had 3 audio migrators running 5 days a week, 24 hours a day. We had 5 video migrators digitizing content, with one pod running 5 days a week, 16 hours a day, and the other pod running 24 hours, 5 days a week. There were even many months running 7 days a week. There were also others just doing QC. And others handling born digital content: copying files into working storage, checking to be sure they worked, renaming them, and creating the proxy files.
David: Haha! So what was the question again?
Emily: The question was “What’s the first thing that comes to mind now that it’s over?”
David: Evidently everything! Haha!
Chip: You never understand the true complexity of the project until you look back and have time to reflect. Before the project even started, during a visit by Stephanie (Sapienza) and Caitlin (Hammer) from CPB, we were reviewing the process and we all started to realize how complex the overall project was going to be. Caitlin kept asking me, “How are you going to do this?” And my answer was “One station at a time.” Thinking about all of it at once was just overwhelming. So David and I sat down and thought about how we wanted to parse this project out. How do we want to think about this on a daily and weekly basis? So we came up with an operational spreadsheet, which then became two spreadsheets, which then became multiple spreadsheets. And there were times over the past year when we just took a deep breath and said, “Ok. 40 stations down, 60 to go.”
David: It was a constant balancing act. Nothing ended up being accurate in terms of tape counts. More audio, less video, double ¾”, which is more time consuming. We had to rearrange our thinking and the pods on a regular basis. And adjust accordingly.
Chip: But working with CPB, then the transfer of the host to WGBH went incredibly smooth. We had some discussions about what they thought and what we thought, but it was very easy moving through issues and problems as they came up.
David: And we always got great support from CPB and then WGBH.
Emily: What turned out to be the most challenging aspect of the project? (If you could name one thing.)
Chip: For me-
David: Oh! Born digital.
Chip: For me it was the born digital for a couple of reasons…
David: Well you take the issues we had with receiving the physical assets and multiply that times a million.
Emily: The born digital was one of the “orphan items” that wasn’t completely fleshed out when we got started.
Chip: We started the born digital about 8 months later than we’d hoped and there were many more individual steps dealing with the stations and how they’d build their drive and name their files and create their spreadsheets. So we had to develop ways to review the file names and correct them to make them legal- spaces had to be replaced with underscores, no illegal characters, they all must have file extensions, etc. Then we had to combine GUIDs for the project with the individual station’s file name. When you do this with thousands and thousands and thousands of files, it becomes complex. And then we had to create proxy files for all of them. And the process you use to create a proxy of one file type might be different from another file type. And then all of the files needed to be QC’d and compared to the master file. Some stations, when they built their initial hard drives, had a large number of bad files. Sometimes up to 50% of the files were bad. And we had to give the stations time to rebuild. Remember the whole purpose of this project was to migrate, capture and acquire as many of these files as possible. Migrate as much as we could within the time frame we had to work with and that time frame was closing in on us.
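The file-name cleanup Chip describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Crawford's actual tooling: the set of "legal" characters, the function names `legalize` and `project_name`, and the way a GUID is joined to the station's file name are all hypothetical, since the interview does not spell them out.

```python
import re
from pathlib import PurePath

# Assumption: a "legal" name contains only ASCII letters, digits,
# underscores, hyphens, and periods. The project's real rule set is not
# given in the interview.
ILLEGAL_CHARS = re.compile(r"[^A-Za-z0-9_.-]")


def legalize(name: str) -> str:
    """Replace spaces with underscores, drop other illegal characters,
    and reject names that have no file extension."""
    cleaned = ILLEGAL_CHARS.sub("", name.strip().replace(" ", "_"))
    if not PurePath(cleaned).suffix:
        raise ValueError(f"missing file extension: {name!r}")
    return cleaned


def project_name(guid: str, station_file_name: str) -> str:
    """Join a project GUID and the cleaned station file name.

    Hypothetical scheme: the GUID is simply prefixed with an underscore
    separator.
    """
    return f"{guid}_{legalize(station_file_name)}"


if __name__ == "__main__":
    # "example-guid-0001" is a made-up placeholder, not a real AAPB GUID.
    print(project_name("example-guid-0001", "Evening News #12 (final).mov"))
```

Raising an error for a missing extension, rather than guessing one, keeps a person in the loop for files that need a station to confirm what they are.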
Emily: Again- another area where we got great support from Casey and the American Archive team.
Were there any hurdles that turned out to be no big deal?
David: Just getting the content here.
Chip: In the beginning, logistics were slow. We were still trying to figure out the most efficient way to get stuff here.
David: And at the start the stations didn’t really know what they were getting into, but honestly, it went smooth.
Chip: We started to realize- let’s not worry about having too many tapes here, let’s worry about not having enough.
David: KQED for instance, they were ready to ship immediately. So we told Robert (Chehoski), “Alright, let’s bring it on!”
Chip: At one point, we had the equivalent of 65 pallets of assets in our crypt. And of course it was interesting shipping things from Alaska. But every single station helped us find a way to get their assets to us. And every single station, despite issues (time of year, reduction of staff, etc.) they all worked their butts off. They all worked really hard to pull, barcode, pack and ship their tapes to us and make this a success. Between dozens of FedEx shipments, three semi-truck runs across the country and an airline delivery, we managed to get everything here and under budget!
Emily: What did you learn from the project?
Chip: Efficiency. Efficiency. Efficiency. Rethink everything you do and realize there might be a better way to do something. And if it sounds like there might be, try it. When David and I sat down and put a plan together we realized quickly we were too rigid. We needed to be flexible. We had to find compromises throughout the project. There were many times we’d get off the phone with a station and say to each other, “How is this going to work?” We could not be afraid to come up with new solutions for the stations. We had to be receptive to their ideas, especially when it came to timing.
David: It didn’t do any good to stick to a timeline that wouldn’t work for them.
Chip: Initially, our idea was to do all the beta tapes together, then all the DVCPro tapes together, but we ended up digitizing several formats simultaneously.
David: Sometimes even 6 video tape formats simultaneously.
Chip: We had a few stations that had only one or two formats, but most of them had a little of everything.
David: Halfway through the project we realized we were dealing with 20 stations at one time- shipping tapes, migrating, moving data, shipping delivery drives, bagging and backing up file data, literally tracking upwards of 30 stations in a given period.
Chip: So being as flexible as possible was important, because no matter how well you thought you had it figured out, it changed on you. And, honestly, at first we fought it, but then we realized that it just wasn’t going to work. So stop fighting it. We had to maintain the flow of tapes required in order to meet the deadline, and being rigid was not going to get us there.
David: I don’t know if a day went by without asking Dr. Dave to switch out tape decks to accommodate our revised workflows.
Emily: What was your favorite “found” item from the project?
David: For me, it was the famous Akira Kurosawa footage. One of our migrators found that the tape label didn’t match the content. It was labeled as a cooking show, but turned out to be an interview with Kurosawa and George Lucas and Francis Ford Coppola. I was like, “Give me that tape!” It turned out to be a program that was thought lost for many years at the station.
Chip: For me, at one point it was all hands on deck, so I had to QC several hundred files. The content just happened to be all the history of New York City and Boston and The Revolutionary War. WNET had a whole series on the history of Manhattan dating back to the revolution. Growing up in that area, I knew a lot of the city’s history, but I never really knew the intricate history of Manhattan and the Bronx and Queens. I didn’t know that Wall Street really was a wall. I learned there’s a fence in Bowling Green Park, which still exists to this day, that was erected in 1770 to protect a statue of George III. The history in this collection is amazing. Meanwhile, I was supposed to be spending 2-3 minutes QC’ing these files and 20 minutes later I had to stop myself and get back to work!
David: That happened all the time!
Chip: The programming is so great! From arts and symphonies to theatricals, history- everything you can think of from all across the country.
Emily: Hence the “American Archive” project!
Chip: Now that the project is coming to an end, I’m just dealing with the data and the files. We did massive shipments out in October and November. It was amazing. The last truck run went up in the first week of December. Right now we’re just pulling the little tidbits and reviewing everything and making sure we crossed all of our Ts and dotted all of our Is. We’re shipping out LTO tapes to the Library of Congress. And I’m a little sad it’s come to an end. On the other side, it’s a great sense of accomplishment. A year of planning and discussions. Two years of migration. Then changing all of the planning several times throughout. It all comes back to flexibility. Understanding you can’t be rigid.
By Emily Halevy, Director of Media Management Sales at Crawford Media Services
Hi everyone! My name is Emily Halevy. I’m the Director of Media Management Sales at Crawford Media Services. Hopefully by now, stations have been able to work with Chip and David- our fabulous project managers- and are well on their way to receiving their digitized content.
I want to take a moment to first say how much this project means to me. Growing up, I was an army brat and moved nearly every year of my childhood, sometimes even twice a year, until I hit 10 years old. My sister and I figured out that by the time we’d moved out of our parents’ house we’d moved a total of 24 times. I say that because there weren’t many constants in my life … just my sister, my parents, and PBS. I used to lay my blanket out on the living room floor, sit on it with my stuffed animals pretending it was a magic carpet and watch Sesame Street, Mr. Rogers, 3-2-1 Contact and all the other children’s programming for hours on end. No matter where we lived, no matter where life took us, I always had my blanket and PBS. I feel like in some way I’m now helping to preserve this programming much in the same way it helped me preserve some sense of stability throughout my childhood. To all of you who helped those great programs find their way into my home and my life, I thank you.
Enough about me! Let’s talk about this project!
The task as outlined was to digitize 35,000 hours of audio and video content across 55,000 tapes, and transcode another 5,000 hours of born digital content from approximately 100 stations. Easy enough, right?! Well, our first head-scratching, “how are we gonna do this” moment came when we realized that we would actually need to hold the majority of this content simultaneously. Fifty-eight pallets of tapes and hundreds of additional boxes to be exact. So, we allocated some of our space to creating a secure crypt with temperature control and FM-200 fire suppression.
And then we thought, hmm … how is everyone going to barcode their materials consistently, so that when they arrive there is no issue with scanning them? Well, it turned out the easiest solution was for us to print the barcodes and ship them out to all of the stations.
Then we realized, huh … while this project is one project, it’s actually more like 100 different projects with clients all over the country. Even in Guam and Alaska. And about Alaska … Unalaska in the Aleutian Islands to be specific … a truck run was impossible. We couldn’t do a FedEx or UPS run. So, our solution was to have the station book their tapes as luggage on Alaska Airlines, which just so happens to fly into Atlanta. As for our other stations, where possible, Chip was able to coordinate shipping between stations, using 53-foot, climate-controlled pharmaceutical trucks instead of overnight carriers. We project this logistical feat has saved the project approximately $85,000 in shipping costs, which will in turn be used to digitize more media. Yay!
Now for the files … three for video tapes, two for audio tapes and one transcode for born digital. And then there’s the BagIt container … each source tape yields up to 27 objects including the media essence files, closed caption files, SAMMA migration log, technical metadata files, checksums and so on. That’s nearly 1.5M pieces of information generated and tracked throughout the project!
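To give a sense of what the checksums in a BagIt container are for, here is a small sketch of writing and then re-checking a payload manifest. It follows the general BagIt layout (payload files under `data/`, one `manifest-<algorithm>.txt` line per file), but it is an illustration only: the choice of SHA-256, the function names, and the omission of the other files a complete bag carries (such as the `bagit.txt` declaration) are assumptions made here, not a description of the project's actual workflow. As a side note on scale, 55,000 tapes at up to 27 objects each comes to at most 1,485,000 objects, which is in line with the "nearly 1.5M" figure.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large media files are not read into memory at once."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(bag_dir: Path, algorithm: str = "sha256") -> Path:
    """Write one 'checksum  relative/path' line for each file under data/."""
    lines = []
    for p in sorted((bag_dir / "data").rglob("*")):
        if p.is_file():
            rel = p.relative_to(bag_dir).as_posix()
            lines.append(f"{file_digest(p, algorithm)}  {rel}")
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(bag_dir: Path, algorithm: str = "sha256") -> list[str]:
    """Return the payload paths that are missing or no longer match the manifest."""
    failures = []
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or file_digest(target, algorithm) != expected:
            failures.append(rel_path)
    return failures
```

An empty list from `verify_manifest` means every listed file is present and unchanged since the manifest was written; anything else names the files that need a second look before delivery.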
Along the way, we’ve uncovered a few priceless gems, including Robert Frost reading a selection of his poems from WFCR, a Frank Zappa interview from KGNU, an Ayn Rand speech from WFCR, and film studies major and movie buff David Braught’s favorite: three tapes from KQED that were actually labeled as “Over Easy” programs. These three turned out to be interviews with film director Akira Kurosawa and a tribute to Japanese Cinema, which included interviews with Kurosawa, Coppola and Lucas. These tapes were thought to be lost. No longer, thanks to the American Archive of Public Broadcasting!
Here are some other little factoids:
Tapes are being digitized 24 hours a day, five days a week and even some weekends to stay on schedule.
Thousands of ¼” reel-to-reel audio tapes and ¾” U-matic tapes have been baked.
The project will result in over 1 Petabyte of new data, 2 Petabytes with the copies.
We are just starting to tackle born digital. Our original data estimate for born digital was in the neighborhood of 6 TB of data. We now anticipate handling over 33,000 files, which will result in around 280 Terabytes of data.
To date, we’ve written over 1,000 LTO-5 data tapes.
We have thoroughly enjoyed working with all of the stations over the past year and a half. As we wind down this phase of the project over the next few months, we hope that the American Archive of Public Broadcasting continues to grow into what surely will become one of the most educational and culturally diverse archives in the country.