In January 2016, the Council on Library and Information Resources awarded WGBH, the Library of Congress, WETA, and NewsHour Productions, LLC a grant to digitize, preserve, and make publicly accessible on the AAPB website 32 years of NewsHour predecessor programs, from October 1975 to December 2007, that currently exist on obsolete analog formats. Described by co-creator Robert MacNeil as “a place where the news is allowed to breathe, where we can calmly, intelligently look at what has happened, what it means and why it is important,” the NewsHour has consistently provided a forum for newsmakers and experts in many fields to present their views at length in a format intended to achieve clarity and balance, rather than brevity and ratings. A Gallup Poll found the NewsHour America’s “most believed” program. We are honored to preserve this monumental series and include it in AAPB.
Last week, our contract archivist Alexander (AJ) Lawrence completed the inventory of 7,320 NewsHour tapes stored in 523 boxes located in WETA’s storage units in Arlington, Virginia, comprising the bulk of the collection. (Additional content is located at two other locations.)
“I was so excited to receive Casey’s initial email asking about my interest in the NewsHour project. I’ve been a lifelong watcher of the program, and the chance to be involved in the preservation of such a valuable resource for historical research seemed like a wonderful opportunity.
The process of inventorying the entire collection seemed pretty daunting on my first day when I got my first in-person look at the storage units housing the estimated 7,500 tapes. However, the process has gone quite smoothly overall and we’ve now surpassed the halfway point. Generally, the tapes have little more than a date to identify them, but it’s been especially interesting to come across the tapes for significant historical events over the past 40+ years. These tapes in particular offered me a chance to reflect on some major cultural milestones I’ve witnessed, often through coverage by the NewsHour team. That said, it was also fun to come across the broadcast that aired on the day I was born, as well as the very first broadcast of The MacNeil/Lehrer NewsHour.
Thankfully, I haven’t been tackling the entire inventory alone. I need to offer a special thanks to Matthew Graylin, a desk assistant with the NewsHour who’s been tasked with assisting me with the work. Needless to say, conducting an archival inventory is well beyond the normal duties of a broadcast news assistant, but Matthew has dived in with gusto. We still have a few weeks together, so hopefully I can convert him into a future audiovisual archivist in that time.”
We have also selected a digitization vendor for the project and are looking to begin pilot tests for digitization within the next month. Meanwhile, the Library has instituted quality control procedures to ensure that all digitized files will be properly preserved for present and future generations.
We can’t wait to get started with digitization and look forward to making this monumental series accessible as part of the AAPB collection. In the meantime, we’re pleased to share this clip reel sampling of content that will be digitized, courtesy of NewsHour Productions.
The following is a guest post by Lily Troia, AAPB Cataloging Intern.
Hi. My name is Lily Troia and I am a public media junkie. I will admit, it is a bit of a problem. The first thing I do when traveling to any new town is find the local radio affiliate for my fix of daily news. I frequently cry along to This American Life, sit in my parked car laughing hysterically at Wait Wait… Don’t Tell Me!’s antics, and I am certain Antiques Roadshow curtailed more than one family fight over the remote during my childhood.
I blame my mom and dad, ultimately, for a northern Wisconsin upbringing entrenched in public media. In the expanse of the rural Northwoods, commercial radio and static occupied most of the airwaves, with one local NPR affiliate, WOJB, broadcast from a nearby Ojibwe reservation, serving as a beacon of independent thought and music for our small community. Cable was a luxury not yet accessible to remote country residents in the 1980s, and since my back-to-the-lander family couldn’t entertain the idea of a satellite dish, our viewing options included only NBC and PBS, with the occasional blurry-screened ABC when snowmobile traffic was reduced (seriously). Thus, I was the kid carrying my parents’ Wisconsin Public Television member tote bag to the summer pool, raised on a diet of Sesame Street, Square One, and 3-2-1 Contact in an era of Nickelodeon.
Decades later I found myself collaborating professionally with Minnesota Public Radio and Twin Cities Public Television on a regular basis. A classical music performer throughout my youth, I studied ethnomusicology at Northwestern University, yet felt disconnected from the cloistered world of academia, and eventually turned my musical interests to the business world. While running my own music management firm in Minneapolis, I produced numerous live and recorded projects, and frequently contributed content to MPR as a music and arts culture commentator. These experiences further solidified my lifelong love of and dedication to public media. Now back in school, pursuing a Master’s in Library and Information Science at Simmons College, I have the unique opportunity to apply my music and humanities background in the arena of preservation and access, synthesizing my passion for scholarship and public service.
Life occasionally delivers instances of perfect serendipity; joining the American Archive of Public Broadcasting feels like such an instance. It truly is a professional dream to work on such a socially vital, dynamic project. Already in my brief time cataloging archival content from member stations across the country, I have learned about an influx of Mexican immigrants to Wyoming in the 1990s, listened to a decades-old KUT broadcast featuring Eliza Gilkyson, and discovered that Oregon hipster culture began long before Portlandia, in the form of a 1985 municipally sponsored beard-growing contest. In a time when public media is forced to fight for basic funding (my Wisconsin stations are currently facing potential demise), ensuring the longevity and availability of this immeasurably valuable cultural material has never been more important. What an inspiration to be at an organization like WGBH, committed to protecting and providing access to these historical gems that document our diverse American stories.
The following is a guest post by Ingrid Ockert, a doctoral student at Princeton University studying the history of science. Currently, she’s gathering material for her dissertation, which will be on the history of science educational television. Follow her on Twitter @i_rockt.
Back in January, while I was furiously planning my dissertation travel for the upcoming semester, I needed to compile a list of archives. Immediately. I wanted to plan a series of trips to archives holding television production materials, but I didn’t know where to start my search. My only option was to cold-call archives. I hoped that some friendly archivist would take pity on a poor graduate student and let me into their collection.
My timing could not have been more fortuitous; one of the first people I emailed was Casey Davis, the amazing project manager at the American Archive of Public Broadcasting. Casey exemplifies the AAPB; she’s a friendly librarian dedicated to opening access to public broadcasting materials and to connecting researchers with archivists. At the time, the American Archive of Public Broadcasting had a basic webpage (they have since launched a beautiful website). Casey generously helped me get in touch with the archivists at WGBH. She also suggested other contacts for me within the AAPB network.
Trip to Boston
A few months later, I was on a train headed to Boston to visit WGBH. The glistening glass building that houses WGBH instantly wowed me. Keith Luf, the head of the archives, met me in the foyer of the building. He graciously gave me a tour, allowing me to glimpse the studios and the offices of NOVA, Masterpiece, Antiques Roadshow, and American Experience. As a longtime fan of PBS, I was thrilled at the chance to walk the same halls as the people who create these amazing programs.
I researched the history of one of these programs, NOVA, for the next two weeks. Premiering in 1974, NOVA is the longest-running science television program in the United States. Luckily for me, WGBH has files related to the history of the program that stretch back to the earliest discussions of the program in 1973.
One of the highlights of my trip was simply poring over files upon files of material, or taking tea breaks from my research to gaze out at Boston’s skyline. But just as valuable were my chats with Keith about the history of WGBH, Leah Weisse about the management of the collection, and Casey about the future of the AAPB. I am so grateful for the time they took to talk with me!
Best hidden gem at WGBH? I spent a lot of time hanging out in the ‘viewing room’ watching old episodes of NOVA. This room was a goldmine for researchers like me – it had a working U-matic cassette player! And the best part? Leaning against the back wall was a vintage ‘Edward Gorey’, a life-size sketch of a bat hanging from a bird perch, drawn by the artist Edward Gorey for the television program “Mystery!” It was just one of the many interesting artifacts in the WGBH collection.
Trip to College Park
In May, I was on another train, this time bound for College Park, Maryland, to visit another archive that participates in the AAPB: the National Public Broadcasting Archives at the University of Maryland. Chuck Howell, who specializes in the history of mass media, was my main contact. The National Public Broadcasting Archives is home to many interesting collections related to the history of television, including the papers of National Public Radio and the Corporation for Public Broadcasting. I was there to peruse correspondence, memos, and publicity held within the Children’s Television Workshop’s papers. What does the Children’s Television Workshop have to do with science? In the 1970s, the creators of Sesame Street and The Electric Company wanted to create a daily science series for children. The resulting program was 3-2-1 Contact!
I spent less than a week at the University of Maryland, but I was just as impressed with the collection as I had been at WGBH. Michael Henry, another archivist specializing in broadcast journalism, greatly helped familiarize me with the collection. As at WGBH, I discovered that I learned a lot simply by talking to the archivists. On my last day, Michael showed me the Broadcasting Reading Room on the library’s second floor. The Reading Room was an impressive space, lined with several dramatic murals from the 1940s, each extolling the virtues of the age of radio and television. Radios, record players, and televisions – each restored to impeccable condition – lined the walls. Wandering up to each item and peering at it, I felt like a kid in a candy shop! One of my favorite artifacts in this collection was a German entertainment system from the 1950s that included a recordable tape deck and turntable, housed in a beautiful wooden cabinet – perfect for what must have been a top-of-the-line luxury item.
I’ve been really lucky to be able to research in such wonderful collections. I’m grateful that nonprofit and government institutions like WGBH and the University of Maryland are equally committed to providing open access to historians and researchers; I applaud the AAPB on their mission to heighten the public’s awareness of historic public media. Hopefully, through my own research, I can also contribute to a greater cultural appreciation of the history of public broadcasting.
The following is a guest post by Rebecca Fraimow, National Digital Stewardship Resident at WGBH and the AAPB.
As the National Digital Stewardship Resident with WGBH and the AAPB, I’ve backed up a lot of drives, designed a lot of workflow diagrams, and written up a lot of documentation, but for my final deliverable for the residency, I got to do something with a slightly broader focus: create a webinar that focused on digital preservation concepts through the lens of the unique needs of a public broadcasting organization.
Although I’ve spent most of the past year in a public media context, WGBH is unusual among public media organizations: we have a strong archival department and a dedicated budget for preservation. That gives us a lot of opportunities to invest in tools and techniques that most public media organizations aren’t going to have. As a result, creating a webinar about digital preservation best practices from a public broadcasting perspective isn’t as simple as saying ‘here’s what we do and why we do it.’ While it would be great if all stations had the same level of resources, that level of buy-in is something most archivally minded station employees have to fight hard for.
Therefore, instead of designing the webinar based around our workflows at WGBH, I sent out an open call for topics to see what the audience of (primarily AAPB) stations really wanted to hear about. I got a wide range of responses:
– where to start when creating a digital library
– best practices for migrating videotape to digital files
– how to manage the volume with a small staff
– tools for embedding metadata into audio and video files
– systems for small organizations with little IT support
– integrity checking, video file standards, naming conventions
– getting producers onboard from the get-go
– how to go back into the archives where proper documentation doesn’t exist
– how to properly use the PBCore field called instantiationStandard
Obviously, I don’t have the answer to all these questions (to be honest, instantiationStandard is kind of a confusing field) and, of course, for many of them, there is no right answer — as I can tell you from the experiences of my entire NDSR cohort, even organizations with huge dedicated preservation departments are still trying to figure out the solutions that make the most sense for them. Next year, the AAPB will be sending a new crop of NDSR residents into public media stations to help grapple with some of these issues, but before finding answers, the first step is figuring out the right questions to ask. The webinar is designed to provide a guide to some of those questions, and an overview of the issues to consider when making a case for digital preservation.
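One of the topics on that list, integrity checking, is concrete enough to sketch. Below is a minimal example of fixity checking: computing a checksum manifest for a directory of files and later re-verifying it. The function names and the two-space manifest layout are illustrative choices, not an AAPB or station standard.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large media files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(directory, manifest_path):
    """Record a checksum for every file under `directory` (one 'hash  name' line each)."""
    entries = [p for p in sorted(Path(directory).rglob("*")) if p.is_file()]
    with open(manifest_path, "w") as out:
        for path in entries:
            out.write(f"{sha256_of(path)}  {path.relative_to(directory)}\n")

def verify_manifest(directory, manifest_path):
    """Return the names of files whose current checksum no longer matches the manifest."""
    mismatches = []
    with open(manifest_path) as f:
        for line in f:
            recorded, name = line.rstrip("\n").split("  ", 1)
            if sha256_of(Path(directory) / name) != recorded:
                mismatches.append(name)
    return mismatches
```

Re-running `verify_manifest` on a schedule is the simplest way for a small station to notice silent corruption before the only good copy is gone.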
You can view the full webinar below (click on the title to open in a larger screen):
The following is a guest post by Jessica Brandt, PhD candidate at Drew University.
At the beginning of April, I had the pleasure of being one of the first researchers to visit the American Archive of Public Broadcasting at the Library of Congress. I am a doctoral student researching non-commercial radio during the Cold War, and I had happened across the AAPB blog while on a quest for records relating to the 1981 production of Star Wars as a radio play for NPR. Shortly after submitting the “Contact Us” form, I received a reply from Sadie Roosa with a few possible assets for me to look into.
At the time, the digitized media wasn’t available to stream through the website yet, so I had to arrange a site visit. With the help of Casey Davis at WGBH and Alan Gevinson at the Library of Congress, I was able to set up a visit in no time. Once there, I found the interface easy to navigate and the quality of the audio was excellent. I was also able to poke around more of the archive, including assets that had not yet been digitized, and I left with more leads to pursue.
Every step along the way, I found the people involved with the AAPB to be responsive and helpful, eager to make my research experience successful. And that brings me to a broader observation about the world of public media — in exploring the story behind this alternate Star Wars, I’ve had occasion to contact public radio stations across the country, and without exception, every one has responded as fully as possible. In each case, if they couldn’t offer any archives or records, they made suggestions of other places to look, and offered to make introductions where necessary. Very few fields have such a way of feeling so tight-knit and collegial.
I’m sure this is preaching to the choir, but I can’t overstate the value of digitizing these public media assets. Organizations like the Paley Center have done a great job with commercial television, in particular, but so much of the product of public radio and television stations languishes in the limited storage facilities of those stations, scattered around the country. That’s if it has been preserved at all. The nature of funding makes it unlikely that any but the largest stations in major markets will have any staff dedicated to managing their archives locally. So the service that the AAPB has to offer is opening a new world for people like me, who spend our time studying the public airwaves of the past. Take a look (or a listen) for yourselves — untold treasures await!
The following blog post was written by Margaret Bresnahan from Minnesota Public Radio.
I’m writing to share the next installment in the American Archive success story. Thanks to the cataloging done during the American Archive inventory project, Minnesota Public Radio was able to identify about 900 MPR News stories covering the Hmong settlement in Minnesota, with recordings dating from 1975 to the present day. This discovery led to a collaboration with the Minnesota Historical Society (MNHS), informing an exhibition/celebration that launches this month (March 2015), and it led to new broadcasts from the MPR News Room.
Marking the 40th anniversary of the first large-scale arrival of Hmong people in Minnesota, MPR News recently launched a Hmong collection page and broadcast a few news stories–all using archive recordings to tell the story of Hmong-Minnesotans. Two of our main collaborators in the News Room plan on continuing the coverage throughout the year, bringing more archive recordings on air and online. This is a wonderful example of the power of access. The inventory made it clear that these recordings existed and enabled this great use of archive material to tell a contemporary, ongoing story.
Here are some links to the archive usage, and more are to come:
The following is a post by Karen Cariani, Director of the WGBH Media Library and Archives and Project Director for the American Archive of Public Broadcasting.
This past weekend, a group of dedicated PBCore enthusiasts met for two days in a suburban Portland, Oregon house, just prior to the Code4Lib conference. It was a healthy mix of developers, archivists, and managers. The goal was to discuss how to move PBCore toward an RDF ontology. With the desire to fully utilize repositories like Fedora 4 and to store data as RDF, users of PBCore had begun talking about building a PBCore ontology.
Before I continue, I want to sincerely thank everyone else who participated in the hackathon:
Jack Brighton, Illinois Public Media
Glenn Clatworthy, PBS
Laurence Cook, MetaCirque
Casey E. Davis, WGBH
Jean-Pierre Evain, EBU
Rebecca Fraimow, WGBH
Peggy Griesinger, Museum of Modern Art (MoMA)
Rebecca Guenther, New York University
Julie Louise Hardesty, Indiana University
Cliff Ingham, City of Bloomington
Andrew Myers, WGBH
Adam Wead, Penn State
PBCore is a metadata schema for audiovisual materials. Its original development in 2004 was funded by the Corporation for Public Broadcasting, with a goal of creating a metadata standard for public broadcasters to share information about their video and audio assets within and among public media stations. Since its conception, PBCore has been adopted by a growing number of audiovisual archives and organizations that needed a way to describe their archival audiovisual collections. The schema has been reviewed multiple times and is currently in further development via the American Archive of Public Broadcasting and the Association of Moving Image Archivists (AMIA) PBCore Advisory Subcommittee.
A number of PBCore users contribute to and are part of the Project Hydra community, a collaborative, open-source effort to build digital repository software solutions for archival institutions. Hydra is built on a framework that uses Fedora Commons as the repository for storing metadata. Many users are seeking to update their Fedora repositories to the latest version (Fedora 4), which provides a great opportunity to develop an RDF data structure. If PBCore had an RDF ontology, it would be easier for PBCore users to take full advantage of Fedora 4’s data management capabilities, which in turn would encourage adoption of Fedora 4. In addition, managing data in RDF allows much more flexibility in expressing data relationships and linking data to other repositories.
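To make the XML-versus-RDF difference concrete, here is a rough sketch of how a single asset might be described as RDF triples using EBUCore-style terms. The asset URI is invented, and the specific class and property names should be treated as illustrative rather than as the exact EBUCore vocabulary:

```turtle
@prefix ebucore: <http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#> .
@prefix dc:      <http://purl.org/dc/elements/1.1/> .

# One program asset described as a set of triples, rather than as
# nested PBCore XML elements. Each statement can link out to other
# resources (formats, people, repositories) by URI.
<http://example.org/asset/1234>
    a ebucore:MediaResource ;
    dc:title "Example Public Affairs Program" ;
    ebucore:dateCreated "1985-04-02" ;
    ebucore:hasFormat <http://example.org/format/umatic> .
```

Because each statement is an independent triple, relationships can be added or linked across repositories without reworking a fixed document structure, which is the flexibility described above.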
Knowing how much work building an ontology can be, the hope was to build upon existing, well-established work. In particular, the EBUCore ontology is quite mature. EBUCore grew out of the European broadcasting community’s need to express audiovisual materials in common data structures to allow for easier sharing. There seemed no need to develop something that already exists and does much of what we need it to do. In fact, the uses of EBUCore and PBCore are so similar that we began to wonder why the two exist separately and why we are not joining forces to develop one standard. Certainly, in this day and age of limited resources and time, collaborating is more productive than working at odds with each other on different but similar paths.
We were graced with the presence of Jean-Pierre Evain from the European Broadcasting Union (EBU). He clearly showed us what EBUCore does, how similar it is to PBCore, and how far the EBU had gotten with an RDF ontology. The gap between EBUCore and PBCore turned out not to be so wide, and bridging it seemed easier than building a brand-new ontology based on PBCore. Within a day, many of the issues had been identified, and the rest felt doable in a reasonable time frame with a solid workplan in place.
The group quickly decided not to start from scratch by building a PBCore ontology, but instead to build a bridge between PBCore and EBUCore so that PBCore adopters could use the EBUCore ontology. We even talked about a new name for this new collaborative schema.
However, it was fully recognized that current PBCore users would need a migration path, and that some would not be interested in using an RDF ontology or in migrating at all. So how do we manage this community of diverse needs? There is certainly more work to do within the PBCore community around communication and education, and the PBCore community should speak up about this idea.
I am always amazed at how productive it is to gather dedicated people together, face to face. Had we not set aside the weekend to focus on this issue, the work and decisions would have lagged for months through bi-weekly one-hour phone calls and virtual meetings. The group more or less self-organized and stayed focused, with great guidance from Casey Davis. By the end, most everyone was in GitHub making XSLT mappings from PBCore to EBUCore as we completed a gap analysis (still in progress). We finished the day with a plan to move forward and a group dinner.
The PBCore Schema Team is working on an updated version of PBCore (PBCore 2.1), consisting of minor tweaks and bug fixes and expected to be released in March 2015. The group thought that this work should continue until 2.1 is released; at that point, work on the PBCore XML schema should freeze, and efforts will go into aligning with EBUCore: making sure elements can map across, that we all understand the mapping, and building tools to help with the mapping. The PBCore community needs to comment on this direction. Does it make sense? What are your concerns? The group that met is by no means the end of the discussion.
In the end, it was worth it. For the cost of some snacks and a homemade pasta dinner, we had 11 people from across the country working on a solution, coming to a consensus, and enjoying the camaraderie. I really want to thank everyone who participated and took the time to join us. It was a weekend, after all.
The hackathon notes are documented here: http://wiki.code4lib.org/PBCore_RDF_Hackathon
The following is a guest post by Emily Halevy, Director of Media Management Sales at Crawford Media Services. In this blog post, Emily records her interview with Chip Stephenson, Crawford Project Manager, and David Braught, Crawford Logistics Coordinator. Crawford and the AAPB Project Team recently completed the American Archive of Public Broadcasting Digitization Project, funded by the Corporation for Public Broadcasting. Crawford’s role in the project was the coordination and digitization of approximately 40,000 hours of public broadcasting video and audio archival content, as well as the transcoding of approximately 20,000 born-digital files, contributed by more than 100 stations and organizations nationwide!
Now that the digitization is complete, the files will be preserved and made accessible as much as possible through the American Archive of Public Broadcasting, and the AAPB Project Team at WGBH and the Library of Congress is excited to begin working on these efforts. Continue reading below for an account of Crawford’s experience throughout the AAPB digitization project.
Happy New Year, Everyone! I’m delighted to be a guest blogger for the American Archive of Public Broadcasting, once again! As we come to the end of this migration project, I thought this time it would be fun to sit down with Chip Stephenson and David Braught and discuss some of the successes and challenges this project brought. It’s also a great time to reflect on the importance and value of the project as a whole.
Emily: What’s the first thing that comes to mind now that the project is over?
Chip: It’s over? What? We’ve been living it for over three years!
David: It’s hard to believe it’s over.
Chip: Well, it’s not quite over yet. We’re still wrapping up – the engineers are finalizing data, and project management is compiling spreadsheets and financials. But we’re almost there.
David: I’ve never worked on anything like this before – the logistics, everything.
Chip: The logistics of shipping, receiving, and accounting for all of the content. And then the amount of data, file configurations, bags, copying files for the individual stations. Over 125 different spreadsheets – audio, video, born digital – plus over 100 stations, some of which had multiple spreadsheets. It was more like 100 individual projects than one big project.
David: And every station had its own set of quirks to deal with.
Chip: Every station required multiple phone calls and emails to set things up. It’s an amazing project. The stations were all great to work with and they all had an amazing amount of work to do to make it happen. Some like New Jersey Network and University of Maryland had an incredible amount of content.
David: I’m sure the stations wanted to kill me with the number of emails about checking their files so we could delete them from our system.
Chip: Our engineers were amazing.
David: I can’t say enough good things about our engineers. Guy (Boyd) was able to adapt and push through data, JP (Lesperance) handling all of the born digital, Nathan (Lewis) re-transcoding every single proxy to meet the requirements for the Library of Congress, Herve (Bergeron) and Dr. Dave (Wolaver) switching out and repairing decks.
Chip: And don’t forget the thousands of tapes baked and repaired by Dr. Dave as well.
David: It really was a tremendous team effort.
Emily: We really do have a great team, don’t we? And we can’t leave out the migrators.
Chip: At the peak we had 3 audio migrators running 5 days a week, 24 hours a day. We had 5 video migrators digitizing content, with one pod running 5 days a week, 16 hours a day, and the other pod running 24 hours, 5 days a week. There were even many months running 7 days a week. There were also others just doing QC, and others handling born-digital content: copying files into working storage, checking that they worked, renaming them, and creating the proxy files.
David: Haha! So what was the question again?
Emily: The question was “What’s the first thing that comes to mind now that’s it is over?”
David: Evidently everything! Haha!
Chip: You never understand the true complexity of the project until you look back and have time to reflect. Before the project even started, during a visit by Stephanie (Sapienza) and Caitlin (Hammer) from CPB, we were reviewing the process and we all started to realize how complex the overall project was going to be. Caitlin kept asking me, “How are you going to do this?” And my answer was “One station at a time.” Thinking about all of it at once was just overwhelming. So David and I sat down and thought about how we wanted to parse this project out. How do we want to think about this on a daily and weekly basis? So we came up with an operational spreadsheet, which then became two spreadsheets, which then became multiple spreadsheets. And there were times over the past year when we just took a deep breath and said, “Ok. 40 stations down, 60 to go.”
David: It was a constant balancing act. Nothing ended up being accurate in terms of tape counts: more audio, less video, double the ¾-inch tape, which is more time-consuming. We had to rearrange our thinking and the pods on a regular basis, and adjust accordingly.
Chip: But working with CPB, and then the transfer of hosting to WGBH, went incredibly smoothly. We had some discussions about what they thought and what we thought, but it was very easy moving through issues and problems as they came up.
David: And we always got great support from CPB and then WGBH.
Emily: What turned out to be the most challenging aspect of the project? (If you could name one thing.)
Chip: For me-
David: Oh! Born digital.
Chip: For me it was the born digital for a couple of reasons…
David: Well you take the issues we had with receiving the physical assets and multiply that times a million.
Emily: The born digital was one of the “orphan items” that wasn’t completely fleshed out when we got started.
Chip: We started the born digital about 8 months later than we’d hoped, and there were many more individual steps dealing with the stations and how they’d build their drives, name their files, and create their spreadsheets. So we had to develop ways to review the file names and correct them to make them legal: spaces had to be replaced with underscores, no illegal characters, and every file had to have a file extension. Then we had to combine GUIDs for the project with the individual station’s file name. When you do this with thousands and thousands and thousands of files, it becomes complex. And then we had to create proxy files for all of them. And the process you use to create a proxy of one file type might be different from another file type. And then all of the files needed to be QC’d and compared to the master file. Some stations, when they built their initial hard drives, had a large number of bad files. Sometimes up to 50% of the files were bad. And we had to give the stations time to rebuild. Remember, the whole purpose of this project was to migrate, capture, and acquire as many of these files as possible. Migrate as much as we could within the time frame we had to work with, and that time frame was closing in on us.
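The kind of renaming step Chip describes can be sketched roughly as follows. The safe character set, the delimiter between GUID and filename, and the function names are all assumptions for illustration, not Crawford's actual rules:

```python
import re

def sanitize_filename(name):
    """Make a station-supplied filename 'legal': spaces become underscores,
    characters outside a conservative safe set are dropped, and a file
    extension is required."""
    name = name.strip().replace(" ", "_")
    name = re.sub(r"[^A-Za-z0-9._-]", "", name)  # drop illegal characters
    if "." not in name:
        raise ValueError(f"missing file extension: {name!r}")
    return name

def archival_filename(guid, station_name):
    """Combine a project GUID with the station's cleaned-up filename."""
    return f"{guid}__{sanitize_filename(station_name)}"
```

Applied to thousands of files per station, even a rule set this small has to be automated and logged so that every renamed file can still be traced back to the station's original spreadsheet entry.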
Emily: Again- another area where we got great support from Casey and the American Archive team.
Were there any hurdles that turned out to be no big deal?
David: Just getting the content here.
Chip: In the beginning, logistics were slow. We were still trying to figure out the most efficient way to get stuff here.
David: And at the start the stations didn’t really know what they were getting into, but honestly, it went smoothly.
Chip: We started to realize- let’s not worry about having too many tapes here, let’s worry about not having enough.
David: KQED for instance, they were ready to ship immediately. So we told Robert (Chehoski), “Alright, let’s bring it on!”
Chip: At one point, we had the equivalent of 65 pallets of assets in our crypt. And of course it was interesting shipping things from Alaska. But every single station helped us find a way to get their assets to us. And every single station, despite issues (time of year, reduction of staff, etc.), worked their butts off. They all worked really hard to pull, barcode, pack and ship their tapes to us and make this a success. Between dozens of FedEx shipments, three semi-truck runs across the country and an airline delivery, we managed to get everything here and under budget!
Emily: What did you learn from the project?
Chip: Efficiency. Efficiency. Efficiency. Rethink everything you do and realize there might be a better way to do something. And if it sounds like there might be, try it. When David and I sat down and put a plan together we realized quickly we were too rigid. We needed to be flexible. We had to find compromises throughout the project. There were many times we’d get off the phone with a station and say to each other, “How is this going to work?” We could not be afraid to come up with new solutions for the stations. We had to be receptive to their ideas, especially when it came to timing.
David: It didn’t do any good to stick to a timeline that wouldn’t work for them.
Chip: Initially, our idea was to do all the beta tapes together, then all the DVCPro tapes together, but we ended up digitizing several formats simultaneously.
David: Sometimes even 6 video tape formats simultaneously.
Chip: We had a few stations that had only one or two formats, but most of them had a little of everything.
David: Halfway through the project we realized we were dealing with 20 stations at one time- shipping tapes, migrating, moving data, shipping delivery drives, bagging and backing up file data, literally tracking upwards of 30 stations in a given period.
Chip: So being as flexible as possible was important, because no matter how well you thought you had it figured out, it changed on you. And, honestly, at first we fought it, but then we realized that it just wasn’t going to work. So stop fighting it. We had to maintain the flow of tapes required in order to meet the deadline, and being rigid was not going to get us there.
David: I don’t know if a day went by without asking Dr. Dave to switch out tape decks to accommodate our revised workflows.
Emily: What was your favorite “found” item from the project?
David: For me, it was the famous Akira Kurosawa footage. One of our migrators found that the tape label didn’t match the content. It was labeled as a cooking show, but turned out to be an interview with Kurosawa and George Lucas and Francis Ford Coppola. I was like, “Give me that tape!” It turned out to be a program that was thought lost for many years at the station.
Chip: For me, at one point it was all hands on deck, so I had to QC several hundred files. The content just happened to be all the history of New York City and Boston and The Revolutionary War. WNET had a whole series on the history of Manhattan dating back to the revolution. Growing up in that area, I knew a lot of the city’s history, but I never really knew the intricate history of Manhattan and the Bronx and Queens. I didn’t know that Wall Street really was a wall. I learned there’s a fence in Bowling Green Park, which still exists to this day, that was erected in 1770 to protect a statue of George III. The history in this collection is amazing. Meanwhile, I was supposed to be spending 2-3 minutes QC’ing these files and 20 minutes later I had to stop myself and get back to work!
David: That happened all the time!
Chip: The programming is so great! From arts and symphonies to theatricals, history- everything you can think of from all across the country.
Emily: Hence the “American Archive” project!
Chip: Now that the project is coming to an end, I’m just dealing with the data and the files. We did massive shipments out in October and November. It was amazing. The last truck run went up in the first week of December. Right now we’re just pulling the little tidbits and reviewing everything and making sure we crossed all of our Ts and dotted all of our Is. We’re shipping out LTO tapes to the Library of Congress. And I’m a little sad it’s come to an end. On the other hand, it’s a great sense of accomplishment. A year of planning and discussions. Two years of migration. Then changing all of the planning several times throughout. It all comes back to flexibility. Understanding you can’t be rigid.
The following is a guest post by Producer/Writer Elizabeth Deane.
Every Picture Tells a Story had its premiere in the Great Hall of the Library of Congress in February 2014, at the launch of the American Archive of Public Broadcasting (AAPB).
Sound and images from six decades of public media filled that stately space, giving the audience a six-minute tip-of-the-iceberg glimpse at some of the treasures that will be part of the AAPB collection.
We’d made the film drawing mostly on media that had already been digitized by the AAPB — the first wave of stories that I had come to think of as locked away, imprisoned on ¾” videotape, VHS and Betacam tapes, ¼” audio tape, DVCPRO and more — the dreaded “obsolete formats” that can be such a barrier to access.
Few stations maintain playback machines for them any more, and the few in existence can be tricky to maintain and possibly risky to use; if they’re not working properly they can damage the footage, sometimes irrevocably.
Worse, as Every Picture points out, old videotapes can deteriorate, and the images are lost forever.
I found it heartening to know that even as the launch ceremony unfolded on that wintery day in Washington, trucks containing thousands of video and audio tapes from public stations all over the country were rolling towards Atlanta, where Crawford Media Services would create multiple digital versions of each tape — television and radio shows, raw footage, even outtakes and experiments — in science, natural history, drama, children’s programs, arts, education, history, local lore, news, and more — the entire broad and inspiring realm of public media programming.
Master copies will be kept safe for future generations at the Library of Congress, with access copies going to WGBH to be added to the growing AAPB database, and made available on a forthcoming website, when rights permit, to a national audience – researchers and scholars, filmmakers, educators, students, and kids of all ages. In addition, all of the digitized materials will be made available to researchers who visit WGBH and the Library’s Moving Image and Recorded Sound Research Centers.
The film is a celebration of the American Archive of Public Broadcasting at its moment of birth, just beginning to tap into its vast collection. “As of this posting close to a year later, all of it has been digitized,” says AAPB Project Manager Casey Davis. “But much of it came with only a brief description. Now we have the pleasure of watching and listening, so we can improve our records and make this remarkable collection more discoverable for all.”
Watch for the new AAPB website, set to launch with the first batch of records in April 2015, with video and audio to follow in October.
In the fall of 2012 AVPreserve received a data dump of the 2.4 million records that had been generated as part of the American Archive Content Inventory Project (AACIP) managed by WGBH Media Library & Archives. There’s a reason it’s referred to as a dump — parsing, mapping, and making that data useable or accessible is complex and messy, no matter how clean or well packaged it is at the point of transfer. Now why were we so lucky to get this dump? AVPreserve had been contracted by the Corporation for Public Broadcasting to manage the inventory metadata during the first digitization phase of the American Archive of Public Broadcasting (AAPB) project, a task that primarily centered around the development of the Archival Management System (AMS). In short, the AMS was to be a web-based tool accessible by all AAPB stakeholders (stations, CPB, digitization vendor) that would:
– Send alerts to stations regarding their timelines for packing and shipping materials for digitization;
– Identify and track materials that had been selected for digitization;
– Provide an access point for viewing or listening to digitized content for contributing stations;
– Provide an access point for searching inventory records, editing or adding metadata, and performing cleanup or normalization of records;
– Provide reporting features to allow AAPB staff to track the progress of digitization according to region, number of hours, number of assets, radio versus television, format types, and more.
You could say the purpose of the AMS was to take all those thousands of records created across the country and make the data do what the digitization project needed it to do… which was a lot.
Having all of those item level station records was fantastic (and kudos to the WGBH team and the stations for getting that massive task done), but if that data is not searchable and useable, there is little point to having it. Before we get too deep into AMS here, I’d like to step back and take a look at how we got to this point.
The AAPB has been years in the making (and the planning, and the contracting, and the planning), our involvement in the project dating back to at least 2010 when Senior Consultant Kara Van Malssen (just prior to joining AVPreserve) worked on the research and writing of a comprehensive plan for how the AAPB could be built, taking into consideration things such as what metadata to capture, how it could be captured, what the infrastructure could look like, and what specifications should be used for digitization.
Around this same time, AVPreserve was a contributing consultant on the development of PBCore 2.0. A metadata schema developed specifically for use by public broadcasting, PBCore is one of the few audiovisual-specific schemas around that incorporates both descriptive and technical metadata to a significant degree, but development and revisions on it had ended many years ago at version 1.2. As there were plans to use PBCore as the schema of the AAPB, there was an immediate need to fix long-standing issues with the structure and update it to better fit the new realities of media production and distribution. A group was put together to make quick fixes that could be released as version 1.3 — which then could also be used for inventory gathering — and work longer term to make more substantial revisions for release as version 2.0.
With a revamped PBCore in place, WGBH was able to build templates for the stations to use in the CIP. As participants may recall, stations had the option of conducting their own inventories in-house, or they could apply to CPB to have a third party come onsite and perform the inventory. AVPreserve was also active in this phase, generating the inventory for WXXI in Rochester, NY, and then, separate from the CIP, for the former NJN (New Jersey Network). NJN, the only public broadcasting station in New Jersey, was shut down by the State in 2011 (ostensibly for budgetary reasons) just before the inventories were going to kick off. AVPreserve was hired separately to perform an inventory of their 120,000-item collection left in the former studios in Trenton and, thanks to the AAPB staff, that inventory was then included in the AAPB database and digitization funding was made available.
Back to the AMS, however. From the get-go, we had proposed developing the AMS as an open source application using an agile development process. We used the Scrum form of the Agile project management methodology — applicable to many areas but most widely adopted in software development — which takes an iterative approach to projects, the goal being to produce working, usable software at regular intervals, as opposed to the “waterfall” approach of doing all of the design, and then all of the backend, and then all of the front end, and then all of the quality control, and then the release. In Agile Scrum speak the intervals are called sprints. A sprint, typically 2 to 4 weeks long, essentially consists of: a planning meeting to create and prioritize a list of tasks or features based on which ones are most critical to have completed, and to identify the number of those tasks that can reasonably be completed in the sprint; performing the agreed-upon work; and then demoing the completed features for review and actual live implementation when tested and approved. Then the process starts all over again to produce the next set of features.
We were fortunate to have proposed an agile approach, because as it turned out the digitization would begin only a few months after our development began and then run in parallel during the term of our work. There were also many other moving parts, lots of unknowns, and we needed working, reliable software almost immediately. Under the agile process we were able to be flexible, prioritize and develop immediate needs (basic framework, station alerts, shipping and tracking functions, asset prioritization) and save for a later date those functions that would not be critical until after digitization had begun in earnest (digitized asset playback, reporting, metadata clean up).
We were also fortunate that AAPB Project Manager Stephanie Sapienza was a willing collaborator in the role of the Product Owner. Within the Agile Scrum methodology, the Product Owner makes the call on what functions are a priority, whether the functioning code at the end of the sprint meets their needs or not, and also decides when previously defined functions are no longer needed for development or if an unthought-of function is needed. One of the benefits of agile is this flexibility in rethinking or reprioritizing projects as they grow, without resources being wasted on unneeded functions or mistakes in what direction a project is going.
And that flexibility was very important during our 18 months building the AMS and managing the AAPB inventory data, because dealing with such a large data set and an unprecedented, ambitious project like the AAPB brought all kinds of unforeseen issues or prompted new ideas as the project grew. It can be hard enough to keep good internal controls on data entry, making sure the right fields are used for the right data and that terms are spelled or used consistently. WGBH had to plan and manage that type of thing for over 120 nationwide stations. They did an excellent job under the circumstances, but there was no true oversight during the inventory process. Data could only be reviewed after it was submitted, and then it was up to a Metadata Manager to enforce consistency. This type of normalization and cleanup (make sure all the dates are written the same way and didn’t get corrupted in Excel; make sure the value in the Format field is actually a format and is spelled in the approved way; make sure all the required fields are completed; etc.) generally takes several passes to do, because there are varying levels of complexity in problems and solutions (for which there may not be resources to fix all of them) and because certain functions in the database system may require the data to be presented a certain way that was not anticipated beforehand. Both WGBH and AVPreserve spent a considerable amount of time performing cleanup and normalization of data in order to turn it into a consolidated data set representing usable information.
One example of a common problem area here is dates. In official PBCore the accepted format for dates is the ISO 8601 standard, which at its simplest is expressed as YYYY-MM-DD. If the field does not match that pattern, the record is invalid. However, as many people who work with audiovisual materials know, tapes frequently do not list a date, or the information may be incomplete (“Spring 1998”, “7/15”, “April 10th”, etc.). In these cases, if not directed otherwise, inventory takers transcribed the date exactly as written, or used notation like “7/15/????”, or wrote some version of “Unknown”. Now in the case of something like formats, it’s fairly easy to normalize data that may include something like BetacamSP, Betacam SP, BetaSP, BetaCamSP, etc., because patterns in the text are identifiable and limited in variation. But in the case of dates, where there is a mixture of letters, numbers, and characters, separating punctuation, varying order of the date parts, and so on, it can be quite a mess.
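A minimal sketch of this kind of date normalization, handling a few of the patterns mentioned above. The rules and the function name are illustrative assumptions, not the actual algorithms AVPreserve's developers wrote:

```python
import re

def normalize_date(raw):
    """Best-effort normalization of a free-text date toward ISO 8601.
    Returns None when no reliable date can be recovered."""
    raw = raw.strip()
    # "Unknown", "7/15/????" and the like carry no usable date.
    if not raw or "unknown" in raw.lower() or "?" in raw:
        return None
    # Full US-style date: 7/15/1998 -> 1998-07-15
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", raw)
    if m:
        month, day, year = m.groups()
        return "%s-%02d-%02d" % (year, int(month), int(day))
    # Bare year: ISO 8601 allows reduced precision, so keep it.
    if re.fullmatch(r"\d{4}", raw):
        return raw
    # "Spring 1998" and similar: salvage just the year.
    m = re.search(r"(19|20)\d{2}", raw)
    if m:
        return m.group(0)
    return None
```

In practice each new pattern discovered in the data would need its own rule, which is why several passes over the date fields were required.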
On top of this, prior to upload the dates could have been corrupted in Excel if not formatted as Text. When a cell in Excel is formatted as Date, it stores dates (or anything that looks like a date) as a serial number that represents that date in the program. The date is visually presented as 7/15/1998, but underneath Excel actually stores it as 35991, which you can see if you change the formatting of the cell from Date to Text. When these fields get moved between systems the data can sometimes flip to text and end up encoded as that number string in the new database. In these instances the data itself becomes unreliable for analysis because it is inconsistent. Our developers had to write several algorithms to run through the date fields and normalize things piece by piece.
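Recovering real dates from those serial numbers is mechanical once you account for Excel's epoch quirk. A minimal Python sketch, as an illustration rather than the project's actual cleanup code:

```python
from datetime import date, timedelta

# Excel's 1900 date system: serial 1 displays as 1900-01-01, but Excel
# also counts a nonexistent 1900-02-29, so for any date after
# 1900-02-28 the effective epoch works out to 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)

def excel_serial_to_iso(serial):
    """Convert a 1900-system Excel date serial back to an ISO 8601 string."""
    return (EXCEL_EPOCH + timedelta(days=serial)).isoformat()
```

For example, the serial 35991 converts back to 1998-07-15. The harder part in the real data was deciding which numeric strings in a date field were Excel serials at all, as opposed to years or other transcribed digits.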
It should be underscored that the AMS was not developed as a long-term records database or access portal. It was built for the very specific purpose of aggregating records from multiple organizations, cleaning up and normalizing the data to a central standard, and managing the digitization of materials from multiple, geographically distant areas. Nothing like this had really been built before, and though the AAPB is a unique project, tools like the AMS are becoming a genuine need. In the past year AVPreserve developed a version of the AMS for VIAA (Flemish Institute for Archiving – http://viaa.be/), which is the central manager for the large-scale digitization of Flemish audiovisual materials held by universities, broadcasters, museums, and libraries in Belgium. Their version of the AMS is now being expanded to manage the digitization workflow for newspaper collections across the country.
We’re seeing an increasing number of efforts like this, especially in Europe, where there is a stronger tie of archives to the government, and at US universities beginning to follow the Indiana University Media Preservation Initiative model, where there is a central manager of digitization efforts for audiovisual materials held across all departments. The efforts of the AAPB and their agreement to let us develop the AMS as open source has now contributed a valuable preservation management tool to archives and collections across the globe. The source code for AMS is now available on GitHub at https://github.com/avpreserve/AMS. We’re looking forward to seeing the AAPB continue to grow and have been proud contributors to this important project.
This blog post was contributed by Josh Ranger, Senior Consultant at AVPreserve.