Tune in Tomorrow for AAPB’s ‘Ask Me Anything’ Forum on Reddit!

Join the Ask Me Anything forum (AMA) at https://www.reddit.com/r/AskHistorians/!

Staff from the American Archive of Public Broadcasting will be answering questions during an AMA hosted by Ask Historians (a.k.a. /r/AskHistorians) tomorrow, Wednesday, February 13, 2019, from 12pm to 4pm ET! The AMA will take place at https://www.reddit.com/r/AskHistorians/!

The American Archive of Public Broadcasting (AAPB), a collaboration between the Library of Congress and Boston public broadcaster WGBH, will be answering the public’s questions about how we collect, preserve and provide access to the collection, as well as any specific questions about the content of the archive, and of course how scholars might collaborate with the AAPB to use the archive for research or in their teaching (we’d love to hear your ideas!).

The AAPB coordinates a national effort to preserve at-risk public media before its content is lost to posterity and provides a centralized web portal for access to the unique programming aired by public stations over the past 70+ years. To date, we have digitized nearly 100,000 historic public television and radio programs and original materials (such as raw interviews). The entire collection is accessible for research on location at the Library of Congress and WGBH, and more than 45,000 programs are available for listening and viewing online, within the United States, at http://americanarchive.org.

Among the collections preserved are more than 8,000 episodes of the PBS NewsHour Collection, dating back to 1975; more than 1,300 programs and documentaries from National Educational Television, the predecessor to the Public Broadcasting Service (PBS); raw, unedited interviews from the landmark documentary Eyes on the Prize; raw, unedited interviews with eyewitnesses and historians recorded for American Experience documentaries including Stonewall Uprising, The Murder of Emmett Till, Freedom Riders, 1964, The Abolitionists and many others. We aim to grow the archive by up to 25,000 hours of additional digitized content per year. The AAPB also works with scholars to publish curated exhibits and essays that provide historical and cultural context to the Archive’s content. We have also worked with researchers who are interested in using the collection (metadata, transcripts, and media) as a dataset for digital humanities and other computational scholarship.

The collection, acquired from more than 100 stations and producers across the U.S., provides not only national news, public affairs, and cultural programming from the past 70 years but also local programming. Researchers using the collection can uncover events, issues, institutional shifts, and social movements on the local scene that have not yet made it into the larger historical narrative. Because of the geographical breadth of the collection, scholars can use it to help uncover ways that national and even global processes played out on the local scene. The long chronological reach, from the late 1940s to the present, will supply historians with previously inaccessible primary source material to document change (or stasis) over time.

The staff who will be answering questions are:

Karen Cariani, Executive Director of the WGBH Media Library and Archives and WGBH Project Director for the American Archive of Public Broadcasting

Casey Davis Kaufman, Associate Director of the WGBH Media Library and Archives and Project Manager for the American Archive of Public Broadcasting

Ryn Marchese, Engagement and Use Manager for the American Archive of Public Broadcasting at WGBH

Tune in tomorrow at https://www.reddit.com/r/AskHistorians/.

#InnovationMonday: Dr. Michael DeBakey and Heart Surgery

Want to help make this interview searchable and accessible online? While listening to Dr. DeBakey’s interview, audiences can edit the grammatical errors made in the computer-generated transcript at http://fixitplus.americanarchive.org/transcripts/cpb-aacip_17-73bzmhcc!

Produced by Louisiana Public Broadcasting (LPB), this episode of the series “Louisiana Legends” (1982) features the first part of an interview with Dr. Michael DeBakey, a Lake Charles, LA native and preeminent surgeon whose innovations revolutionized heart surgery. During his interview, Dr. DeBakey discusses his father’s immigration to Lake Charles from Lebanon, how he became interested in the heart, the impact of Dr. Alton Ochsner on his career, and his interactions with President Richard Nixon, President John F. Kennedy, and President Lyndon B. Johnson.

“Louisiana Legends” is a talk show in which host Gus Weill conducts in-depth conversations with Louisiana cultural icons. This series has been digitized and preserved in the American Archive of Public Broadcasting (AAPB), and the public can help make this interview searchable and accessible through the Transcribe to Digitize Challenge!

How does it work? The AAPB has created computer-generated transcripts for each radio and television program in the archive. Stations like LPB are engaging the public to help correct punctuation and misspelled words in these transcripts so the programs can be made available online. The corrected programs are then searchable by keyword and timestamp, like this interview with James Baldwin (WGBH, 1963): http://americanarchive.org/catalog/cpb-aacip_15-0v89g5gf5r.
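To illustrate what "searchable by keyword and timestamp" means in practice, here is a minimal Python sketch. It assumes a hypothetical transcript format (a list of segments, each with a start time in seconds and its text); the `Segment`, `find_keyword`, and `format_timestamp` names and the sample segments are invented for illustration and are not the AAPB's actual transcript schema or search code.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the media file (assumed format)
    text: str


def find_keyword(segments, keyword):
    """Return the start times of segments that contain keyword as a whole word."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [seg.start for seg in segments if pattern.search(seg.text)]


def format_timestamp(seconds):
    """Render seconds as H:MM:SS so a hit can be shown as a jump-to point."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Invented sample segments, for illustration only.
transcript = [
    Segment(0.0, "Good evening and welcome."),
    Segment(12.5, "Tonight we talk about the heart."),
    Segment(95.0, "Heart surgery changed quickly."),
]

for t in find_keyword(transcript, "heart"):
    print(format_timestamp(t))  # prints 0:00:12, then 0:01:35
```

Matching whole words means a search for "hear" does not return segments that only contain "heart".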

You can start editing here: http://fixitplus.americanarchive.org/transcripts/cpb-aacip_17-73bzmhcc.

Or watch the full interview at http://americanarchive.org/catalog/cpb-aacip_17-73bzmhcc.

To learn how the Transcribe to Digitize Challenge is providing FREE digitization to AAPB’s participating organizations, visit https://americanarchivepb.wordpress.com/2018/10/22/aapbs-transcribe-to-digitize-challenge-with-george-blood/.

Thank you!

Eric Saxon, Public Broadcasting Preservation Fellow at KOPN

KOPN’s transmitter, located east of Columbia, MO

Greetings, gentle reader! I’m Eric Saxon, a master’s student in Information and Library Science, specializing in archives, at the University of Missouri – Columbia and a member of the second cohort of the Public Broadcasting Preservation Fellowship (PBPF). This summer, I embarked on a deep tape-diving expedition at the radio station KOPN.

KOPN 89.5 FM, community radio from Columbia, Missouri, broadcasts throughout the central part of the state and online at kopn.org. KOPN has transmitted information and music since 1973 AD. As part of the PBPF mission to record local histories across the nation, I set out to discover Columbia and KOPN as they existed in the station’s first twenty or so years, through a media format heretofore unfamiliar to me: the ¼ in. audio tape reel.

The idea was to give these audio reels new life through digital preservation, and, subsequently, new access points to the history of community radio in Columbia, MO in the era of the ¼ in. magnetic tape.

A ¼ in. magnetic audio tape reel

What I ended up recording is only a small piece of this history, but the audible trace there tells a story of a community radio station being born out of the progressive ethos of the 1960s, open to and actively exploring all available ideas during the 1970s, and incompletely mutating into new wave ideals of the 1980s. During the era of the magnetic tape, KOPN filled a void in mid-Missouri left by mainstream broadcast radio and television, serving across an intersection of race, class, gender, style, sexuality, attitude, and musical preference.

The collection is particularly strong in broadcasts that represent feminist discourse and practice of the time, and my predecessor (Rebecca Benson, PBPF Spring 2018 Fellow) had already begun work that focused on feminist community radio. Having inherited her excellent start to the project, I built upon the theme and expanded it to include live music broadcasts and a wide range of programming, all under the umbrella of feminist community radio.

To convey an idea of this breadth, some titles of the audio broadcasts I digitized include Betty Friedan in Columbia (1973); Don Cooper Live at KOPN (1973); Consciousness Across the Void (1973); Angela Davis in Columbia (1974); Political Gayness (1974); National Women’s Music Festival (1975); The End of “Alternative Radio” on WGTB (1976); Off Our Backs (1976); The Fabulish Winotones Live (1977); Numerology (1978); The Booty Band: Demo Tape (1978); Reasonably Polite New Wave (1981); Program on Lesbian Separatism (1981); DuChamp Live at the Blue Note (1981); Bella Abzug at MU (1984); Gloria Kaufman, “The Politics of Humor: A Feminist View” (1992); City Council Meetings; and discussions by the Women’s Health Collective.

I transferred only a few reels from the 1990s to a digital format, and none from the 2000s (by that time, the station had switched to digital machines). However, a quick listen to KOPN today will tell you that the community values and open radio format present at the beginning continue to be the station’s guiding forces.

Kansas City new wave band, DuChamp. Handmade collage on tape reel box.

The digitization process not only transferred content but often also captured the unique physical characteristics of the tape and its interaction with the reel-to-reel tape machines, which, in the case of the University of Missouri – Columbia KOPN Digitization Station, are the Studer A807 (mono) and the Studer B67 (stereo). These were connected to a PC and a Mac desktop computer, respectively, both running the audio editing software Audacity. I could have removed some tape hiss, a sizzle of magnetic particles here and there, and other imperfections, but I left in all but the most egregious content obfuscators, not only to digitize as much as possible in the time allotted but also as an aesthetic choice and to preserve the unique qualities of the tape medium itself.

The Studer A807

Emancipating the tape reels from their media-specific obscurity required multiple other steps, with some reels needing more TLC and resuscitation than others. Besides vigilant cleaning of the machines between reels, this process might entail repairing splices that popped off during the recording process, adding leader tape to the heads and tails of reels, re-housing tapes with broken parts, periodically demagnetizing the tape machines, untangling and re-spooling tape that had become curled and twisted, and baking/dehydrating tapes exhibiting “sticky-shed syndrome,” in which deteriorating binder material comes loose in the tape path and gums up the machine’s moving parts. In addition to the more physical aspects of the project, there was record creation for each reel, inventory production, metadata research, checksum generation, audio file conversion, and ingest into the mothership servers at WGBH.
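Checksum generation is what lets an archive confirm later that a digitized file has not changed in transit or storage. The sketch below is a minimal Python illustration, not the workflow actually used at KOPN or WGBH: it assumes MD5 as the algorithm and a folder of `.wav` files, and `file_md5` and `write_manifest` are hypothetical names. The manifest uses the `<hash>  <filename>` line format that GNU `md5sum -c` can check.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Hash a file in 1 MiB chunks so large WAV files never need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(directory, pattern="*.wav", manifest_name="manifest.md5"):
    """Write one '<md5>  <filename>' line per matching file and return the manifest path."""
    directory = Path(directory)
    lines = [f"{file_md5(p)}  {p.name}" for p in sorted(directory.glob(pattern))]
    manifest = directory / manifest_name
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

Running `md5sum -c manifest.md5` from inside that folder would then re-verify every file after it has been copied or ingested.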

Although I worked independently, at every stage I had a network of experts and mentors to turn to when encountering an obstacle, from the immersion week of audiovisual preservation training in Boston to the final handoff of the files. Thanks go out to the amazing folks at WGBH and all involved in immersion week, including George Blood and Jackie Jay for introducing me to legacy A/V equipment, all my fellow Fellows, host mentor Jackie Casteel and everyone at KOPN, faculty mentor Dr. Sarah Buchanan and the scholars at MU’s Allen Institute, local mentor Jim Hone, and everyone else involved in this far-reaching project.

Going forward, I’m excited to bring forth more untold and seldom heard stories from their various limbos, utilizing what I learned as a PBPF fellow to help make a more complete historical record that is inclusive of the entire spectrum of human experience.

Minimal audio preservation setup: computer, reel-to-reel tape machine, human

Written by Eric Saxon, PBPF Summer 2018 Cohort

*******************

About PBPF

The Public Broadcasting Preservation Fellowship (PBPF), funded by the Institute of Museum and Library Services, supports ten graduate student fellows at University of North Carolina, San Jose State University, Clayton State University, University of Missouri, and University of Oklahoma in digitizing at-risk materials at public media organizations around the country. Host sites include the Center for Asian American Media, Georgia Public Broadcasting, WUNC, the Oklahoma Educational Television Authority, and KOPN Community Radio. Contents digitized by the fellows will be preserved in the American Archive of Public Broadcasting. The grant also supports participating universities in developing long-term programs around audiovisual preservation and ongoing partnerships with their local public media stations.

For more updates on the Public Broadcasting Preservation Fellowship project, follow the project at pbpf.americanarchive.org and on Twitter at #aapbpf, and come back in a few months to check out the results of their work.

 

Steve Wilcer, Public Broadcasting Preservation Fellow at WUNC


Hello! My name is Steve Wilcer. I coordinated with WGBH and WUNC Radio in Chapel Hill, North Carolina, as a member of the second cohort of fellows for the AAPB Public Broadcasting Preservation Fellowship. I am currently working towards a Master of Science in Library Science at the University of North Carolina and plan to graduate next spring. Prior to my time in North Carolina, I studied musicology at the Ohio State University, where I was exposed to a wide variety of media formats and materials, ranging from microfiche to medieval manuscripts. I developed a strong passion for libraries and archives through these experiences, which led me to pursue a second master’s degree in library science.

Learning as I work

As someone who only moved to North Carolina last fall, I found that my work with WUNC Radio offered a unique opportunity to learn about the area and its people. Public radio provides a versatile platform for education, entertainment, and awareness programming. I was thrilled to experience the myriad WUNC programs from over the years and to contribute directly to their preservation for the future. During my portion of the fellowship, I digitized approximately forty assets, most of them digital audio tapes (DATs). I also continued to develop the cataloging and documentation for WUNC, which let me experience the digitization and preservation process from a more holistic standpoint.

One particularly informative component of the fellowship for me was the North Carolina Voices special collection, which contains materials from two of WUNC’s special program series: Understanding Poverty and Civil War. Understanding Poverty offered a wide assortment of programs and features on various financial and social issues in the state, as well as on how North Carolina has developed over the last several decades. The Civil War series contained family stories of ancestors who lived during or served in the United States Civil War. Both series gave me valuable, more tangible insight into the people of Chapel Hill and North Carolina as I listened to their stories and firsthand experiences. I also had the artistic opportunity to design the thumbnail image for the special collection as it appears on the AAPB.

Building up foundations

As the second UNC fellow on the project, I was fortunate that our digitization station was already set up and operational. Getting the station working had been a significant challenge during the first round of the fellowship, but thanks to all the hard work from everyone involved, it operated without any issues for me. One of my duties in the project was to build on the records for the digitized materials and ensure that WUNC’s own records were uniform and easy to understand. I frequently consulted with WUNC’s Keith Weston to confirm dates, names, and programming details. In some cases, newly rediscovered items forced us to reevaluate how we defined a particular series or piece of programming, and I edited our records as necessary.

UNC SILS Digitization station

While the fellowship focuses on digitization, cataloging the physical DATs and cassettes I handled proved to be equally important. Without proper labeling and documentation, an asset could unknowingly be digitized a second time, costing extra time. In addition to maintaining our digital master table of records, I was responsible for labeling the physical objects and their cases with WUNC’s newly determined local identifiers. With these markings, the cases can be quickly scanned for items that have yet to be digitized, which will make future digitization projects easier for WUNC.
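Scanning labeled cases for items not yet digitized boils down to comparing the shelf inventory against the master table of digitized records. A minimal Python sketch, assuming local identifiers are plain strings; the `WUNC-0001`-style IDs and the function names are hypothetical, not WUNC’s actual scheme or tooling:

```python
from collections import Counter
from typing import Iterable, List


def remaining_to_digitize(inventory: Iterable[str], digitized: Iterable[str]) -> List[str]:
    """Return shelf IDs with no digitized record yet, keeping shelf order."""
    done = set(digitized)
    return [item_id for item_id in inventory if item_id not in done]


def duplicate_labels(inventory: Iterable[str]) -> List[str]:
    """Flag identifiers that were accidentally written on more than one object."""
    counts = Counter(inventory)
    return sorted(item_id for item_id, n in counts.items() if n > 1)


# Hypothetical shelf scan and master-table contents.
shelf = ["WUNC-0001", "WUNC-0002", "WUNC-0003", "WUNC-0002"]
already_done = ["WUNC-0002"]
print(remaining_to_digitize(shelf, already_done))  # ['WUNC-0001', 'WUNC-0003']
print(duplicate_labels(shelf))                      # ['WUNC-0002']
```

Checking for duplicate labels matters because a reused identifier is exactly how an asset could get digitized twice or skipped.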

I developed a strong personal connection to these items as I cataloged and marked them. Each DAT and cassette had a story to tell, and it was up to me to piece together their metadata and see that they were digitized and made publicly accessible so others could listen to them. It was very exciting for me that WUNC was one of the first North Carolina-based organizations included in the AAPB, as our work here laid a foundation not only for WUNC and its archives but also for North Carolina as a state. Materials like the recording of WUNC’s 1953 sign-on event reminded me how long ago some of these recordings were made, and how many more may still be at WUNC, waiting to be digitized and heard once more.

Overall, the fellowship has been a wonderful opportunity for me. It allowed me not only to develop my skills in handling audio materials and digital records but also to learn about the area, its people, and its history. I am incredibly grateful for all the support and effort from everyone who made this project possible: my advisor, Dr. Helen Tibbo; Erica Titkemeyer from the Southern Folklife Collection, for her technical assistance; Dena Schultz, our first fellow for the project; Keith Weston at WUNC; and all the staff at WGBH, for their supervision, planning, and feedback.

Written by Steve Wilcer, PBPF Summer 2018 Cohort

———


Ben Gogel, Research Assistant on the NewsHour Digitization Project


Over the last several months, I’ve worked as a Research Assistant at WGBH on the PBS NewsHour Digitization Project. This project involves taking the predecessor programs to the PBS NewsHour, including The MacNeil/Lehrer Report, The MacNeil/Lehrer NewsHour, and The NewsHour with Jim Lehrer, and making them available to a wider audience through digitization, preservation, and online access. My specific responsibilities include reviewing the proxy files (digital access copies) of NewsHour episodes, making sure they are presentable (no major audiovisual glitches, complete transcripts, legible subtitles), and recording the results in an online spreadsheet. This may sound like a straightforward job, but working at WGBH taught me that even straightforward jobs can have unpredictable aspects, and I learned a lot about adapting to new challenges and going outside my comfort zone.

Before working on this project, I attended a rigorous Archives Management program at Simmons University, where I learned several archival processing practices, chief among them More Product, Less Process (MPLP). The idea behind MPLP is that when large amounts of archival content need to be preserved, archivists should focus on processing as much material as possible rather than processing a small amount in exhaustive detail. This approach served me well in several real-world internships, including two in different departments at WGBH. The first was in the Creative department during the summer of 2015, where I helped my co-workers not only track data but also set up a Google Drive account to store it in a spreadsheet. I then parlayed this experience into my Simmons Archives Field Study capstone project in the WGBH Media Library and Archives (MLA). Throughout the winter of 2016, I reviewed and cataloged episodes of regional news magazines produced by Wyoming PBS and Oregon Public Broadcasting. Between the academic training and real-world experience, I thought I could handle working on the NewsHour Digitization Project, but over time, I found out just how unprepared I was, in the best way possible.

While archives share general principles, every place and department I’ve worked in has had its own unique, unpredictable challenges, and this project was no exception. A typical day on the job involved watching NewsHour episodes in bits and pieces, making sure the videos were watchable and their accompanying materials (i.e., transcripts and subtitles) were present and accurate. Most of the time, review was straightforward, and the clips themselves were occasionally interesting looks at iconic figures from new perspectives: personal favorites include retrospectives marking what would have been the 100th birthdays of Alfred Hitchcock and Walt Disney. But there were times when I was thrown for a loop and needed to adapt.

For clips without transcripts or subtitles, I had no choice but to watch them for longer periods, paying close attention to the audio. That close attention cut both ways: during graphic reports (like coverage of 9/11 and Hurricane Katrina), I needed to take short breaks to keep from becoming emotionally overwhelmed. Fortunately, my co-workers and supervisor, who remembered me from my previous Archives internship, were remarkably sympathetic and understanding, which helped alleviate this stress, among other worries. The friendly, open atmosphere also encouraged me to branch out and extend a helping hand in kind, both to them and to new people at WGBH.

Over the summer, several interns joined the MLA. As a welcoming gesture, I sat down with each of them for lunch on their first day, and throughout their time I offered practical advice whenever I could, most importantly not to rule anything out when it came to future work opportunities. At the same time, I was myself a newcomer at several MeetUps and SpeakEasys, monthly events for networking and socializing with people from different departments across WGBH. The MeetUps even set aside a whole minute for introducing yourself to strangers, a nice and much-appreciated touch. Between this mentorship and the more sociable environment, I had a support network that helped me a great deal.

As a kid, two of the things that scared me most were thunderstorms and spicy foods, particularly buffalo chicken. I avoided both at every possible opportunity to spare myself anxiety and discomfort. The last few months had their fair share of stormy heat waves and spicy hot wings, but as with archival work in general, uncomfortable situations can only be avoided for so long. In the end, I had to buck up and accept that summer storms could at least be tolerated, and it helped that my co-workers never treated my fear as a debilitating setback. Spicy food, on the other hand, was something I did have control over, so to set a positive example for the interns, I tried not only buffalo chicken but also pulled pork covered in Jamaican jerk seasoning. To my surprise, neither burned my mouth off or caused searing pain, and I credit this growth to both my at-work support group and my need (and willingness) to handle unforeseen archival circumstances.

Being adaptable to unpredictable elements is the most valuable lesson I learned from this experience. On-the-nose food metaphors aside, the turbulence in both the clouds and the video files forced me out of my comfort zone, but it was all in terms I could understand thanks to my years of real-world experience. In working to preserve and make accessible the NewsHour files, I persevered and made myself more accessible as well.

Written by Ben Gogel, https://www.linkedin.com/in/bengogel/

 

 

Riley Griffin, Public Broadcasting Preservation Fellow at GPB

When we toured WGBH, we took turns holding an Emmy Award trophy (Image: Riley Griffin, author, holding an Emmy Award)

Hi, everyone! My name is Riley Griffin (xe/xir). I am just now entering my second year in Clayton State University’s Master of Archival Studies program. I am the second fellow, after Virginia Angles, to take part in the American Archive of Public Broadcasting (AAPB) Public Broadcasting Preservation Fellowship (PBPF). My part of the project focused on digitizing Georgia Public Broadcasting’s (GPB) Georgia Gazette under the incredibly trusting supervision of Ellen Reinhardt, Kathy Christensen, and Joshua Kitchens. I was looking for summer opportunities when a chance to pursue a career path in my newfound love of preservation presented itself through the AAPB’s PBPF. I was overjoyed by the scope of the fellowship, the organizations working with it, and the special collections it included.

Every fellowship starts with certain expectations and ends with different lessons and new perspectives. At the start of my fellowship, I spent a lot of time comparing my project with everyone else’s. There were a lot of things I was not expecting, my own reactions among them. As we visited Boston and learned about all the different media formats we could be working with, I couldn’t help but feel a sort of jealousy, wishing I could work with as many formats and topics as possible.

Of course, this hunger faded to a low rumble as I was humbled by the Georgia Gazette materials. I quickly realized I craved difficulty, so I became grateful instead of jealous. In training, we were prepared to scrub our machines clean again and again, spend precious time delicately fixing things, and balance everything just right. However, my project was given a bit of grace by being a more modern collection. Digital Audio Tapes (DATs) are often considered one of the most fragile media formats, but most of these were recorded at decent quality between the 1990s and the 2000s, rewound to the beginning, and left undisturbed in an air-conditioned radio station. So, please forgive me for being grateful that the worst of my worries was how many times I dropped the (very loose) pinch roller into the machine that day.

GPB Digitization Station (Image: two desks with two computers, a DAT machine, cleaning materials, and various electronics everywhere)

The topics of everyone’s materials had me curious, too. I wondered what it was like to have video (my project was audio only) and to work with materials like oral histories. I quickly counted my blessings when I heard what my colleague was working on: images of war, tragedy, death, and disaster. I was thankful for GPB’s straightforward approach to its topics, for reporters who were nearly emotionless by comparison, and for pert news reports. I am a very sensitive soul and could imagine having to wait out the tears before being able to see what you’re working on. I also realized I was having a hard time with some of the Georgia Gazette material. One thing I experience as an archivist who moves all over is major culture shock. I think being an archivist is one of the best ways to learn about a place you have just moved to, but it also exposes you to things much more quickly than you expect.

I’m from upstate New York, which has a different demographic and historical context. Although I’m not unfamiliar with racism, being immersed in Georgia’s racial history as I digitized GPB’s daily news was a new experience for me. I had moments of weeping at work as I listened to news reports about the Georgia General Assembly holding expensive special sessions to redistrict purely based on race, schoolchildren being prevented from going to the schools they wanted as a result of segregation, and segregation’s long-term effects on Georgia school districts, which I still hear about today. Although I knew about these issues in the abstract, hearing them firsthand was very emotional for me, and adding visuals might have been overwhelming.

I would be lying if I said I came away from this project without a deeper attachment to Georgia. Although it exposed me to some of the ugly parts I try to avoid in my daily life, it also exposed me to so much more. Even the drive to work showed me the oldest still-operating drive-in movie theater in the area. I also got the opportunity to listen to all of the preparation for and execution of the 1996 Olympics. I am a huge fan of all things Olympics, so this was a special treat for me. The Georgia Gazette has given me a sort of pseudo-pride in Georgia; every guest and topic on the show had a connection to Georgia. Learning about popular historical figures like Blind Tom Wiggins or popular events like the National Grits Festival in Warwick gives me a great appreciation for where I live and the opportunities available to me here. It has also given me a deeper and fuller appreciation for public broadcasting, something that had already been instilled in me. At a time when everyone is flocking to Georgia for jobs, often displacing long-term Georgians, I remind myself that my brief time here must be purposeful. I hope to help make their history more accessible so that they can feel the true sense of pride they deserve. With the Georgia Gazette, I hope I did just that, even if only a little.

Indeed, this was the “WORST Gazette ever” (Image: close-up of a DAT labelled “Maxell DAT; Gazette 01-20 95; WORST Gazette ever”)

Written by Riley Griffin, PBPF Summer 2018 Cohort

———


AAPB Transcription Workflow

The AAPB started creating transcripts as part of our “Improving Access to Time-Based Media through Crowdsourcing and Machine-Learning” grant from the Institute of Museum and Library Services (IMLS). For the initial 40,000 hours of the AAPB’s collection, we worked with Pop Up Archive to create machine-generated transcripts, which are primarily used for keyword indexing, to help users find otherwise under-described content. These transcripts are also being corrected through our crowdsourcing platforms FIX IT and FIX IT+.

As the AAPB continues to grow its collection, we have added transcript creation to our standard acquisitions workflow. Now, when the first steps of acquisition are done, i.e., metadata has been mapped and all of the files have been verified and ingested, the media is passed into the transcription pipeline. The proxy media files are either copied directly off the original drive or pulled down from Sony Ci, the cloud-based storage system that serves americanarchive.org’s video and audio files. These are copied into a folder on the WGBH Archives’ server, where they wait for an available computer running transcription software.

Dockerized Kaldi

The AAPB uses a Docker image of Pop Up Archive’s Kaldi, running on many machines across WGBH’s Media Library and Archives. Rather than paying extra to run this in the cloud or on a supercomputer, we decided to take advantage of the resources already sitting in our department. AAPB and Archives staff at WGBH who regularly leave their computers in the office overnight are good candidates for the transcription team. All they have to do is follow the instructions on the internal wiki to install Docker and a simple Macintosh application, built in-house, that runs scripts in the background and reports progress to the user. The application handles launching Docker, pulling the Kaldi image (or checking that it has already been pulled), and starting a container from the image. The user doesn’t need any specific knowledge of how Docker images work to run the application. The app sits minimized in the dock and keeps running in the background as the staff member goes about their work during the day.* But that’s not all! When they leave for the night and their computer would otherwise sit idle, it continues to transcribe media files, making use of processing power that we were already paying for but hadn’t been using.

*There have been reports of systems being perceptibly slower when running this Docker image during the day. So far it has not had a significant impact on any staff member’s ability to do their job.

Square application window that shows list of transcripts that have been processed
Application user-interface

Centralized Solution

Now, we could just have multiple machines running Kaldi through Docker and that would let us create a lot of transcripts. However, it would be cumbersome and time-consuming to split the files into batches, manage starting a different batch on each computer, and collect the disparate output files from various machines at the end of the process. So we developed a centralized way of handling the input and output of each instance of Kaldi running on a separate machine.

The same Macintosh application that runs the Kaldi Docker image also manages files in a network-shared folder on the Archives server. When a user launches the application, it checks that folder on the server for media files. If there are any, it takes the oldest file, copies it locally, and starts transcribing it. When Kaldi has finished, the output transcripts, in plain-text and JSON formats, are copied to a subfolder on the Archives server, and the local copy of the media file is deleted. Then the application checks the folder again, picks up the next media file, and the process continues.
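The loop described above can be sketched in a few lines. This is a minimal illustration, not the in-house application: the media extensions, the output file naming, and the `transcribe` callable (a stand-in for running Kaldi) are all hypothetical. It assumes claimed files have been renamed to names without a media extension and failed files carry “_failed” in their name. On its own it omits the claim-by-rename step, so it is only safe for a single worker.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical: extensions treated as media waiting in the shared queue folder.
MEDIA_EXTENSIONS = {".mp4", ".mov", ".mp3", ".wav"}


def is_waiting(path: Path) -> bool:
    """True for unclaimed, unfailed media files.

    Assumes claimed files were renamed to a name without a media extension
    and failed files carry "_failed" in their name.
    """
    return (
        path.is_file()
        and path.suffix.lower() in MEDIA_EXTENSIONS
        and "_failed" not in path.name
    )


def oldest_waiting_file(queue_dir: Path) -> Optional[Path]:
    """Return the waiting media file with the oldest modification time, or None."""
    candidates = [p for p in queue_dir.iterdir() if is_waiting(p)]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.stat().st_mtime)


def run_worker(
    queue_dir: Path,
    output_dir: Path,
    transcribe: Callable[[Path], Tuple[str, str]],
) -> None:
    """Process queued files oldest-first until none are left.

    `transcribe` is a hypothetical stand-in for running Kaldi: it takes a
    local media path and returns (plain_text, word_json) as strings.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    while (src := oldest_waiting_file(queue_dir)) is not None:
        with tempfile.TemporaryDirectory() as tmp:
            local_copy = Path(tmp) / src.name
            shutil.copy2(src, local_copy)  # work on a local copy, not the share
            text, word_json = transcribe(local_copy)
            (output_dir / f"{src.stem}.txt").write_text(text, encoding="utf-8")
            (output_dir / f"{src.stem}.json").write_text(word_json, encoding="utf-8")
        # The local copy vanished with the temp dir; remove the queued original too.
        src.unlink()
```

Because the output folder can live inside the queue folder, `is_waiting` checks `is_file()` so subfolders are never mistaken for media.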

Screenshot of a file directory with many .mp4 files, a few folders, and a few files named with base64 encoded strings
Files on the Archives server: the files at the top are waiting to be processed, the files near the bottom are the ones being processed by local machines

Avoiding Duplicate Effort

Now, since we have multiple computers running in parallel, all looking at the same folder on the server, how do we make sure they aren’t duplicating effort by transcribing the same file? The process first tries to rename the file it is about to process, using the person’s name and a base-64 encoding of the original filename. If the rename succeeds, the file is copied into the Docker container for local processing, and the processes on every other workstation ignore files named this way in their search for the oldest qualifying file. After Kaldi successfully processes a file, the file is deleted from the server, so no one else can pick it up. When Kaldi fails on a file, the file on the server is renamed to its original filename with “_failed” appended, and again the scripts know to ignore it. A human can later check for failed files and investigate why they failed. (It is rare for Kaldi to fail on an AAPB media file, so we did not feel this part of the workflow needed further automation.)
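The rename-as-claim idea can be sketched as follows. This is an illustration under stated assumptions, not the AAPB’s scripts: the exact claim-name format is not given in the post, so the `<worker>__<base64>` pattern and the use of URL-safe base64 (which avoids `/` in filenames) are hypothetical. The sketch relies on the rename failing for whichever worker comes second; rename is atomic on a local POSIX filesystem, but whether a network share gives the same guarantee depends on the sharing protocol and server.

```python
import base64
from pathlib import Path
from typing import Optional


def claimed_name(worker: str, original: str) -> str:
    """Build a hypothetical claim name: '<worker>__<base64 of original name>'.

    URL-safe base64 is used so the encoded name never contains '/'.
    """
    encoded = base64.urlsafe_b64encode(original.encode("utf-8")).decode("ascii")
    return f"{worker}__{encoded}"


def try_claim(path: Path, worker: str) -> Optional[Path]:
    """Rename a queued file to claim it; return None if another worker got there first."""
    target = path.with_name(claimed_name(worker, path.name))
    try:
        path.rename(target)
    except FileNotFoundError:
        # The file was already renamed (claimed) or deleted by another machine.
        return None
    return target


def mark_failed(claimed: Path, original: str) -> Path:
    """Rename a claimed file back to its original name with '_failed' appended."""
    failed = claimed.with_name(original + "_failed")
    claimed.rename(failed)
    return failed
```

After a successful transcription, the worker would simply delete its claimed file (for example with `claimed.unlink()`), matching the cleanup step described above.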

Handling Computer and Human Errors

The centralized workflow relies on the application not quitting in the middle of a transcription. If someone shuts their laptop, the application will stop, but when they open it again, the application will pick up right where it left off. It will even continue transcribing the current file if the computer is not connected to the WGBH network, because it keeps a local copy of the file being processed. This allows some flexibility for staff taking their computers home or to conferences.

The problem starts when the application quits, which can happen when someone quits it intentionally, accidentally hits the quit button instead of the minimize button, or shuts down or restarts their computer, or when a computer fails and shuts itself down. We have built the application to minimize the effects of this problem: when it is restarted, it simply picks up the next available file and keeps going as if nothing happened. The only reason this is a problem at all is that the file it was in the middle of working on is still sitting on the Archives server, renamed, so no other computer will pick it up.

We consider the few downsides of this setup completely manageable:

  • At regular intervals a human must look into the folder on the server to check that a file hasn’t been sitting renamed for a long time. These are easy to spot because there will be two renamed files with the same person’s name. The older of these two files is the one that was started and never finished. The filename can be changed to its original name by decoding the base-64 string. Once the name is changed, another computer will pick up the file and start transcribing.
  • Because the file stopped being transcribed in the middle of the process, the processing time spent on that interrupted transcription is wasted. The next computer to start transcribing this file will start again at the beginning of the process.
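The manual check in the first bullet could be partly scripted. The sketch below assumes the same hypothetical `<worker>__<URL-safe base64>` claim names used for illustration earlier, and a hypothetical list of media extensions; it is not part of the AAPB workflow. It also assumes renaming leaves a file’s modification time unchanged (true of ordinary renames), so since each worker takes the oldest file first, the older of a worker’s two claims is the abandoned one, as described above.

```python
import base64
from collections import defaultdict
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical: extensions the original (decoded) filenames are expected to have.
MEDIA_EXTENSIONS = {".mp4", ".mov", ".mp3", ".wav"}


def decode_claim(name: str) -> Optional[Tuple[str, str]]:
    """Split a hypothetical '<worker>__<base64>' name into (worker, original name)."""
    worker, sep, encoded = name.partition("__")
    if not sep or not worker or not encoded:
        return None
    try:
        # altchars=b"-_" decodes the URL-safe alphabet; validate=True rejects
        # names that are not clean base64.
        raw = base64.b64decode(encoded, altchars=b"-_", validate=True)
        return worker, raw.decode("utf-8")
    except ValueError:  # binascii.Error and UnicodeDecodeError subclass ValueError
        return None


def stale_claims(queue_dir: Path) -> List[Path]:
    """For each worker, return every claimed file except the newest one."""
    by_worker = defaultdict(list)
    for p in queue_dir.iterdir():
        decoded = decode_claim(p.name) if p.is_file() else None
        if decoded and Path(decoded[1]).suffix.lower() in MEDIA_EXTENSIONS:
            by_worker[decoded[0]].append(p)
    stale = []
    for claims in by_worker.values():
        claims.sort(key=lambda p: p.stat().st_mtime)
        stale.extend(claims[:-1])  # the newest claim is the one still in progress
    return stale


def release(claimed: Path) -> Path:
    """Rename a stale claim back to its original name so another worker picks it up."""
    decoded = decode_claim(claimed.name)
    if decoded is None:
        raise ValueError(f"not a claimed file: {claimed.name}")
    restored = claimed.with_name(decoded[1])
    claimed.rename(restored)
    return restored
```

A human would still review the output of `stale_claims` before calling `release`, since a worker with two claims might also just be mid-handoff between files.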

Managing Prioritization

Because the AAPB has a busy acquisitions workflow, we wanted to make sure there was a way to manage the prioritization of media being transcribed. Priority can be determined by many variables, including project timelines, user interest, and grant deadlines. Rather than spending a lot of time building a system to track each file’s priority ranking, we opted for a simpler, more manual approach. While it does require human intervention, the time commitment is minimal.

As described above, the local desktop applications only look in one folder on the Archives server. By controlling what is copied into that folder, it is easy to control what files get transcribed next. The default is for a computer to pick up the oldest file in the folder. If you have a set of more recent files that you want transcribed before the rest of the files, all you have to do is remove any older files from that folder. You can easily put them in another folder, so that when the prioritized files are completed, it’s easy to move the rest of the files into the main folder.
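The “move the older files aside” step can be done by hand, but a small helper makes it repeatable. This is a hypothetical sketch, not an AAPB tool: the holding-folder approach comes from the paragraph above, while the cutoff-by-modification-time rule and the list of media extensions are assumptions. It deliberately leaves claimed and failed files in place so in-progress work is not disturbed.

```python
import shutil
from pathlib import Path
from typing import List

# Hypothetical: extensions treated as media waiting in the queue folder.
MEDIA_EXTENSIONS = {".mp4", ".mov", ".mp3", ".wav"}


def hold_back_older(queue_dir: Path, holding_dir: Path, cutoff: float) -> List[str]:
    """Move waiting media older than `cutoff` (a Unix timestamp) into a holding folder.

    Claimed and failed files are left alone so in-progress work is not disturbed.
    """
    holding_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for p in sorted(queue_dir.iterdir()):
        waiting = (
            p.is_file()
            and p.suffix.lower() in MEDIA_EXTENSIONS
            and "_failed" not in p.name
        )
        if waiting and p.stat().st_mtime < cutoff:
            shutil.move(str(p), str(holding_dir / p.name))
            moved.append(p.name)
    return moved


def restore_held(queue_dir: Path, holding_dir: Path) -> None:
    """Move everything in the holding folder back into the queue."""
    for p in sorted(holding_dir.iterdir()):
        shutil.move(str(p), str(queue_dir / p.name))
```

`shutil.move` preserves modification times (it renames on the same filesystem and copies metadata otherwise), so restored files keep their original place in the oldest-first order.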

For smaller sets of files that need to be transcribed, we can also have someone who is not running the application stand up an instance of dockerized Kaldi and run the media through it locally. Their machine won’t be tied into the folder on the server, so it will only process the prioritized files they feed to Kaldi locally.

Transforming the Output

At any point we can go to the Archives server and grab the transcripts that have been created so far. These transcripts are output as text files and as JSON files that pair timestamp data with each word. However, the AAPB prefers JSON transcripts that are time-stamped by phrase, with each phrase spanning roughly 5-7 seconds.

We use a script that parses the word-stamped JSON files and outputs phrase-stamped JSON files.
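The transformation can be sketched as a simple grouping pass. This is not the AAPB’s script. The input keys (`words`, with `word`, `time`, `duration`) follow the screenshot description below; the output keys `parts`, `text`, `start_time`, and `end_time` are assumptions, since the exact spelling of the start- and end-time keys isn’t shown in the text. The grouping rule (close a phrase once it reaches 5 seconds, and never let a new word stretch it past 7) is likewise an assumed reading of “5-7 second phrases.”

```python
import json
from typing import Any, Dict, List


def words_to_phrases(
    words_doc: Dict[str, Any], min_seconds: float = 5.0, max_seconds: float = 7.0
) -> Dict[str, List[Dict[str, Any]]]:
    """Group word-level timestamps into phrases of roughly min..max seconds."""
    parts: List[Dict[str, Any]] = []
    current: List[Dict[str, Any]] = []

    def flush() -> None:
        # Emit the words collected so far as one phrase, then start fresh.
        if current:
            last = current[-1]
            parts.append({
                "text": " ".join(w["word"] for w in current),
                "start_time": round(current[0]["time"], 2),
                "end_time": round(last["time"] + last["duration"], 2),
            })
            current.clear()

    for w in words_doc["words"]:
        end = w["time"] + w["duration"]
        # Start a new phrase rather than let this word push it past max_seconds.
        if current and end - current[0]["time"] > max_seconds:
            flush()
        current.append(w)
        # Close the phrase as soon as it is long enough.
        if end - current[0]["time"] >= min_seconds:
            flush()
    flush()  # whatever is left becomes a final, possibly shorter, phrase
    return {"parts": parts}


def convert_file(word_json_path: str, phrase_json_path: str) -> None:
    """Read a word-stamped JSON file and write a phrase-stamped one."""
    with open(word_json_path, encoding="utf-8") as f:
        doc = json.load(f)
    with open(phrase_json_path, "w", encoding="utf-8") as f:
        json.dump(words_to_phrases(doc), f, indent=2)
```

A real implementation might also prefer to break phrases at longer pauses between words; this sketch uses duration alone to keep the rule easy to follow.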

Word time-stamped JSON

Screenshot from a text editor showing a json document with wrapping json object called words with sub-objects with keys for word, time, and duration
Snippet of Kaldi output as JSON transcript with timestamps for each word

Phrase time-stamped JSON

Screenshot from a text editor of JSON with a container object called parts and sub-objects with keys text, start time, and end time.
Snippet of transformed JSON transcript with timestamps for 5-7 second phrases

Once we have the transcripts in the preferred AAPB format, we can use them to make our collections more discoverable and share them with our users. More on that part of the workflow in Part 2 (coming soon!).

Rebecca Benson, Public Broadcasting Preservation Fellow at KOPN

My name is Rebecca Benson, and I’m a graduate student at the University of Missouri, working on a Master’s in Library Science and focusing on work in special collections libraries. I am so excited for the experience I have gained working with the AAPB: I am familiar with much older materials, but the history of the past 100 years really demands broadcast media to be fully understood. The opportunity to work with AAPB and the materials from our local community radio station has expanded my archival horizons, and I look forward to sharing these materials and this history with researchers, as well as sharing this technology with other archivists.

The University of Missouri partnered with one of the local community radio stations to work on this project. KOPN has been broadcasting from the same office in downtown Columbia since it was founded in 1973, and I’m pretty sure some of the reels I digitized had not been touched since then. As one of the first open-access community radio stations, KOPN has an amazing perspective on the history of the past several decades. The collection spans an incredible range, from radio theatre to concerts to talk shows, and includes feminist, queer, indigenous, and otherwise marginalized voices. Working with Jackie Casteel, we decided to begin by digitizing the women’s programming, including the annual Women’s Weekend, the League of Women Voters, and the local Women’s Health collective, among others. Even within this subset, the programming ranges from interview shows with women in prison to a discussion by one of the first female dentists in the area. Every time I start a new reel, I learn something new and interesting about Columbia or the world, and I cannot wait for others to use this trove of information in their research. I have benefited from the information myself: by chance, I digitized the 1986 League of Women Voters panel on hospital trustees a week before another hospital trustee election in town, which dealt with the same hospital lease discussed in 1986!

As I have worked with these materials, I have found that this sort of archival work can reunite communities and bring people together. Not only have I worked with the university and our initial contacts at the station, but I have also encountered numerous other people who are, or were, connected with programming that I have now heard. Working on the metadata for our programs led me to the State Historical Society and its archive of broadcast lists. My time sorting reels at the station led to a meeting with a woman who had run much of the radio theatre programming for decades. A chance mention of KOPN led to learning more about the alternative ’zine community in Columbia and its connection with the radio station. This project has shown me the many ways in which archival projects are not just scholarly work but also a way to build and rebuild communities.

Getting all of these reels digitized has been, and continues to be, a massive project. As a community radio station, KOPN did not have the most standardized procedures for recording, broadcasting, and documentation, which has led to some interesting moments at the work station. I’m still uncertain how someone managed to splice one tape inside out and backwards! On the other hand, all of these quirks are a result of the creative community that grew around KOPN, and without it, the history of the station would be much poorer. We are so excited to share this vibrant part of our local history with the world.

Written by Rebecca Benson, PBPF Spring 2018 Cohort
