AAPB Transcription Workflow, Part 1

The AAPB started creating transcripts as part of our “Improving Access to Time-Based Media through Crowdsourcing and Machine-Learning” grant from the Institute of Museum and Library Services (IMLS). For the initial 40,000 hours of the AAPB’s collection, we worked with Pop Up Archive to create machine-generated transcripts, which are primarily used for keyword indexing, to help users find otherwise under-described content. These transcripts are also being corrected through our crowdsourcing platforms FIX IT and FIX IT+.

As the AAPB continues to grow its collection, we have added transcript creation to our standard acquisitions workflow. Now, when the first steps of acquisition are done, i.e., metadata has been mapped and all of the files have been verified and ingested, the media is passed into the transcription pipeline. The proxy media files are either copied directly off the original drive or pulled down from Sony Ci, the cloud-based storage system that serves americanarchive.org’s video and audio files. These are copied into a folder on the WGBH Archives’ server, where they wait for an available computer running transcription software.

Dockerized Kaldi

The AAPB uses the Docker image of Pop Up Archive’s Kaldi running on many machines across WGBH’s Media Library and Archives. Rather than paying additional money to run this in the cloud or on a supercomputer, we decided to take advantage of the resources we already had sitting in our department. AAPB and Archives staff at WGBH who regularly leave their computers in the office overnight are good candidates for being part of the transcription team. All they have to do is follow instructions on the internal wiki to install Docker and a simple Macintosh application, built in-house, that runs scripts in the background and reports progress to the user. The application manages launching Docker, pulling the Kaldi image (or checking that you already have it pulled), and launching the image. The user doesn’t need any specific knowledge about how Docker images work to run the application. The app gets minimized to the dock and continues to run in the background as the staff member goes about their work during the day.* But that’s not all! When they leave for the night and their computer typically wouldn’t be doing anything, it continues to transcribe media files, making use of processing power that we were already paying for but hadn’t been utilizing.

*There have been reports of systems being perceptibly slower when running this Docker image throughout the day. It has yet to have a significant impact on any staff member’s ability to do their job.

Square application window that shows list of transcripts that have been processed
Application user-interface

Centralized Solution

Now, we could just have multiple machines running Kaldi through Docker and that would let us create a lot of transcripts. However, it would be cumbersome and time-consuming to split the files into batches, manage starting a different batch on each computer, and collect the disparate output files from various machines at the end of the process. So we developed a centralized way of handling the input and output of each instance of Kaldi running on a separate machine.

That same Macintosh application that manages running the Kaldi Docker image also manages files in a network-shared folder on the Archives server. When a user launches the application, it checks that specific folder on the server for media files. If there are any media files in that folder, it takes the oldest file, copies it locally, and starts transcribing it. When Kaldi has finished transcribing it, the output text and JSON-formatted transcripts are copied to a subfolder on the Archives server, and the copy of the media file is deleted. Then the application checks the folder again, picks up the next media file, and the process continues.
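
The "take the oldest file" step can be sketched in a few lines of Python. This is only an illustration, not the in-house application's code: the function name, the list of media extensions, and the use of modification time as the measure of "oldest" are all assumptions.

```python
from pathlib import Path
from typing import Optional

# Assumption: which extensions count as media files in the shared folder.
MEDIA_SUFFIXES = {".mp4", ".mp3", ".wav"}


def oldest_media_file(queue_dir: Path) -> Optional[Path]:
    """Return the oldest waiting media file in queue_dir, or None if there is none.

    "Oldest" is judged by modification time, which is an assumption; the real
    application may use a different rule.
    """
    candidates = [
        p
        for p in queue_dir.iterdir()
        if p.is_file()
        and p.suffix.lower() in MEDIA_SUFFIXES  # skips subfolders' contents and non-media names
        and not p.stem.endswith("_failed")      # skips files marked as failed
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.stat().st_mtime)
```

A worker loop would call this function, copy the returned file locally, run Kaldi on it, and then call the function again until it returns None.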

Screenshot of a file directory with many .mp4 files, a few folders, and a few files named with base64 encoded strings
Files on the Archives server: the files at the top are waiting to be processed, the files near the bottom are the ones being processed by local machines

Avoiding Duplicate Effort

Now, since we have multiple computers running in parallel, all looking at the same folder on the server, how do we make sure that multiple computers aren’t duplicating efforts by transcribing the same file? Well, the process first tries to rename the file to be processed, using the person’s name and a base-64 encoding of the original filename. If the renaming succeeds, the file is copied into the Docker container for local processing, and the process on every other workstation will ignore files named that way in its quest to pick up the oldest qualifying file. After a file is successfully processed by Kaldi, it is deleted, so no one else can pick it up. When Kaldi fails on a file, the file on the server is renamed to its original file name with “_failed” appended, and again the scripts know to ignore the file. A human can later go in to see if any files have failed and investigate why. (It is rare for Kaldi to fail on an AAPB media file, so this is not part of the workflow we felt we needed to automate further.)
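
Here is a minimal sketch of that rename-to-claim idea in Python. The exact naming pattern is not spelled out above, so the `<person>_<base64>` layout, the URL-safe base-64 variant (chosen so the encoded name never contains a slash), and the function names are all assumptions for illustration. It also assumes that a rename on the shared folder fails cleanly when another workstation has already renamed the file, which depends on the file system behind the network share.

```python
import base64
from pathlib import Path
from typing import Optional


def claimed_name(person: str, original_name: str) -> str:
    """Build the hypothetical claim name: person, underscore, base-64 of the filename."""
    token = base64.urlsafe_b64encode(original_name.encode("utf-8")).decode("ascii")
    return f"{person}_{token}"


def try_claim(media_file: Path, person: str) -> Optional[Path]:
    """Try to claim a file by renaming it; return the new path, or None if we lost the race."""
    target = media_file.with_name(claimed_name(person, media_file.name))
    try:
        media_file.rename(target)
    except FileNotFoundError:
        # The file is gone, most likely because another workstation renamed it first.
        return None
    return target


def mark_failed(claimed: Path, original_name: str) -> Path:
    """Rename a claimed file back to its original name with "_failed" appended."""
    target = claimed.with_name(original_name + "_failed")
    claimed.rename(target)
    return target
```

The rename doubles as a lock: whichever workstation renames the file first owns it, and no separate lock file or database is needed.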

Handling Computer and Human Errors

The centralized workflow relies on the assumption that the application does not quit in the middle of a transcription. If someone shuts their laptop, the application will stop, but when they open it again, the application will pick up right where it left off. It will even continue transcribing the current file if the computer is not connected to the WGBH network, because it maintains a local copy of the file that is processing. This allows a little flexibility in terms of staff taking their computers home or to conferences.

The problem starts when the application quits, which could occur when someone quits it intentionally, someone accidentally hits the quit button rather than the minimize button, someone shuts down or restarts their computer, or a computer fails and shuts itself down automatically. We have built the application to minimize the effects of this problem. When the application is restarted, it will just pick up the next available file and keep going as if nothing happened. The only reason this is a problem at all is that the file it was in the middle of working on is still sitting on the Archives server, renamed, so another computer will not pick it up.

We consider the few downsides of this setup completely manageable:

  • At regular intervals a human must look into the folder on the server to check that a file hasn’t been sitting renamed for a long time. These are easy to spot because there will be two renamed files with the same person’s name. The older of these two files is the one that was started and never finished. The filename can be changed to its original name by decoding the base-64 string. Once the name is changed, another computer will pick up the file and start transcribing.
  • Because the file stopped being transcribed in the middle of the process, the processing time spent on that interrupted transcription is wasted. The next computer to start transcribing this file will start again at the beginning of the process.
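
The manual check described in the list above could be partly scripted. The sketch below is hypothetical: it assumes claimed files are named `<person>_<base64 of the original filename>` using URL-safe base-64, that a person's name contains no underscore, and that file modification time is a fair stand-in for "older". None of these details are confirmed by the workflow itself.

```python
import base64
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def decode_claim(name: str) -> Optional[Tuple[str, str]]:
    """Split a claimed filename into (person, original filename), or None if it is not a claim."""
    person, sep, token = name.partition("_")
    if not sep or not person or not token:
        return None
    try:
        original = base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")
    except ValueError:  # covers bad base-64, bad padding, and non-text results
        return None
    # Round-trip check: reject names that only decode because stray characters were ignored.
    if base64.urlsafe_b64encode(original.encode("utf-8")).decode("ascii") != token:
        return None
    return person, original


def stale_claims(queue_dir: Path) -> List[Tuple[Path, str]]:
    """List (claimed path, original name) for every claim that is not its owner's newest one."""
    by_person: Dict[str, List[Tuple[float, Path, str]]] = defaultdict(list)
    for p in queue_dir.iterdir():
        decoded = decode_claim(p.name) if p.is_file() else None
        if decoded:
            person, original = decoded
            by_person[person].append((p.stat().st_mtime, p, original))
    stale: List[Tuple[Path, str]] = []
    for claims in by_person.values():
        claims.sort(key=lambda c: c[0])
        stale.extend((p, original) for _, p, original in claims[:-1])
    return stale


def release(claimed: Path, original: str) -> Path:
    """Give a stale claim its original name back so another computer can pick it up."""
    target = claimed.with_name(original)
    claimed.rename(target)
    return target
```

A person would still want to look over the output of `stale_claims` before calling `release`, since a machine that is merely asleep may resume its file later.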

Managing Prioritization

Because the AAPB has a busy acquisitions workflow, we wanted to make sure there was a way to manage prioritization of the media getting transcribed. Prioritization can be determined by many variables, including project timelines, user interest, and grant deadlines. Rather than spending a lot of time to build a system that let us track each file’s prioritization ranking, we opted for a simpler, more manual operation. While it does require human intervention, the time commitment is minimal.

As described above, the local desktop applications only look in one folder on the Archives server. By controlling what is copied into that folder, it is easy to control what files get transcribed next. The default is for a computer to pick up the oldest file in the folder. If you have a set of more recent files that you want transcribed before the rest of the files, all you have to do is remove any older files from that folder. You can easily put them in another folder, so that when the prioritized files are completed, it’s easy to move the rest of the files into the main folder.
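
We do this by hand, but the same move could be scripted. The following is a small sketch under assumptions: the function name, the holding folder, the `.mp4` filter, and the use of modification time as the cutoff are all hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def defer_older_files(queue_dir: Path, holding_dir: Path, cutoff: float) -> List[Path]:
    """Move waiting .mp4 files last modified before `cutoff` (a Unix timestamp) out of the queue.

    Renamed (claimed) files have no .mp4 extension under the naming assumed here,
    so files already being processed are left alone.
    """
    holding_dir.mkdir(parents=True, exist_ok=True)
    moved: List[Path] = []
    for p in sorted(queue_dir.iterdir()):
        if p.is_file() and p.suffix.lower() == ".mp4" and p.stat().st_mtime < cutoff:
            destination = shutil.move(str(p), str(holding_dir / p.name))
            moved.append(Path(destination))
    return moved
```

Moving the files back later is the same operation in reverse, with the holding folder as the source and the main folder as the destination.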

For smaller sets of files that need to be transcribed, we can also have someone who is not running the application stand up an instance of Dockerized Kaldi and run the media through it locally. Their machine won’t be tied into the folder on the server, so they will only process those prioritized files they feed Kaldi locally.

Transforming the Output

At any point we can go to the Archives server and grab the transcripts that have been created so far. These transcripts are output as text files and as JSON files which pair time-stamp data with each word. However, the AAPB prefers JSON transcripts that are time-stamped at each 5-7 second phrase.

We use a script that parses the word-stamped JSON files and outputs phrase-stamped JSON files.

Word time-stamped JSON

Screenshot from a text editor showing a json document with wrapping json object called words with sub-objects with keys for word, time, and duration
Snippet of Kaldi output as JSON transcript with timestamps for each word

Phrase time-stamped JSON

Screenshot from a text editor of JSON with a container object called parts and sub-objects with keys text, start time, and end time.
Snippet of transformed JSON transcript with timestamps for 5-7 second phrases
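
To give a feel for the transformation, here is a minimal Python sketch, not our actual script. The key names (`words`, `word`, `time`, `duration` on the input side; `parts`, `text`, `start_time`, `end_time` on the output side) are assumptions based on the snippets described above, and the grouping rule (close a phrase once it would run past a maximum length) is just one simple way to land in the 5-7 second range.

```python
from typing import Any, Dict, List


def words_to_phrases(transcript: Dict[str, Any], max_seconds: float = 6.0) -> Dict[str, Any]:
    """Group word-level timestamps into phrases no longer than max_seconds each."""
    parts: List[Dict[str, Any]] = []
    current: List[str] = []
    start = end = 0.0
    for w in transcript.get("words", []):
        w_start = float(w["time"])
        w_end = w_start + float(w["duration"])
        # Close the current phrase if adding this word would make it too long.
        if current and w_end - start > max_seconds:
            parts.append({"text": " ".join(current), "start_time": start, "end_time": end})
            current = []
        if not current:
            start = w_start
        current.append(w["word"])
        end = w_end
    if current:
        parts.append({"text": " ".join(current), "start_time": start, "end_time": end})
    return {"parts": parts}
```

A real script would read each word-stamped file with `json.load`, pass the result through a function like this, and write the phrase-stamped version with `json.dump`.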

Once we have the transcripts in the preferred AAPB format, we can use them to make our collections more discoverable and share them with our users. More on that part of the workflow in Part 2 (coming soon!).

Rebecca Benson, Public Broadcasting Preservation Fellow at KOPN

My name is Rebecca Benson, and I’m a graduate student at the University of Missouri, working on a Master’s in Library Science and focusing on work in special collections libraries. I am so excited for the experience I have gained working with the AAPB: I am familiar with much older materials, but the history of the past 100 years really demands broadcast media to be fully understood. The opportunity to work with AAPB and the materials from our local community radio station has expanded my archival horizons, and I look forward to sharing these materials and this history with researchers, as well as sharing this technology with other archivists.

The University of Missouri partnered with one of the local community radio stations to work on this project. KOPN has been broadcasting from the same office in downtown Columbia since it was founded in 1973, and I’m pretty sure some of the reels I digitized had not been touched since then. As one of the first open-access community radio stations, they have an amazing perspective on the history of the past several decades. The collection spans an incredible number of areas, from radio theatre to concerts to talk shows, from feminist, queer, indigenous, and otherwise marginalized voices. Working with Jackie Casteel, we decided to begin by digitizing the women’s programming, from the annual Women’s Weekend, the League of Women Voters, and the local Women’s Health collective, among others. Even within this subset, the programming ranges from interview shows with women in prison to a discussion from one of the first female dentists in the area. Every time I start a new reel, I learn something new and interesting about Columbia or the world, and I cannot wait for others to use this trove of information to begin doing research. I have benefited from the information myself: by chance, I digitized the 1986 League of Women Voters panel on hospital trustees a week before another hospital trustee election in town, which dealt with the hospital lease discussed in 1986!

As I have worked with these materials, I have found that this sort of archival work can re-unite communities and bring people together. Not only have I worked with the university and our initial contacts at the station, I have encountered numerous other people who are, or were, connected with programming that I have now heard. Working on the metadata for our programs led me to the State Historical Society, and their archives of broadcast lists. My time sorting reels at the station led to meeting with a woman who had run much of the radio theatre programming for decades. A chance mention of KOPN led to learning more about the alternative ‘zine community in Columbia, and its connection with the radio station. This project has shown me all the ways in which archival projects are more than just scholarly work, but a way to build and re-build communities.

Getting all of these reels digitized has been — and continues to be — a massive project. As a community radio station, KOPN did not have the most standardized procedures for recording, broadcasting, and documentation, which has led to some interesting moments at the work station. I’m still uncertain how someone managed to splice one tape inside out and backwards! On the other hand, all of these quirks are a result of the creative community that grew around KOPN, and without it, the history of the station would be much poorer. We are so excited to share this vibrant part of our local history with the world.

Written by Rebecca Benson, PBPF Spring 2018 Cohort

*******************

About PBPF

The Public Broadcasting Preservation Fellowship (PBPF), funded by the Institute of Museum and Library Services, supports ten graduate student fellows at University of North Carolina, San Jose State University, Clayton State University, University of Missouri, and University of Oklahoma in digitizing at-risk materials at public media organizations around the country. Host sites include the Center for Asian American Media, Georgia Public Broadcasting, WUNC, the Oklahoma Educational Television Authority, and KOPN Community Radio. Contents digitized by the fellows will be preserved in the American Archive of Public Broadcasting. The grant also supports participating universities in developing long-term programs around audiovisual preservation and ongoing partnerships with their local public media stations.

For more updates on the Public Broadcasting Preservation Fellowship project, follow the project at pbpf.americanarchive.org and on Twitter at #aapbpf, and come back in a few months to check out the results of their work.

Tanya Yule, Public Broadcasting Preservation Fellow at CAAM

 

Drives loaded up and ready to be sent to the AAPB!!

Hello, my name is Tanya Yule and I am one of the five fellows in the first cohort of the AAPB Public Broadcasting Preservation Fellowship. Later this month I will be receiving my Master’s in Library and Information Science, and an advanced certificate in Digital Assets Management, from San José State University, with an emphasis in archives and preservation.

When I began the program at SJSU it was with a focus on photography preservation; this was initially a means of utilizing my background in historic photography practices as a way to protect and preserve images for future generations. However, through my work at the Hoover Institution Archives (where I am an intern), I began to fall in love with working in all areas of archives, not just with photographs, and have had the good fortune to process incredible collections that range from the Russian Revolution to the Vietnam War, each providing a unique glimpse of someone’s life that I get to describe, organize, and preserve for future generations. When the fellowship was posted, I had a “this was made for me” moment and applied instantly. I have wanted to work with A/V media for quite some time, and had not had the opportunity until now.

For the last three months I have been entrenched in material spanning the globe, each item as unique as the next, and giving me more in return than I was prepared for. As I am sitting here trying to tap out a structure and synthesis of what the heck just occurred during the American Archive of Public Broadcasting’s Preservation Fellowship, I am almost overwhelmed with the task.

 

Bay Area Video Coalition (BAVC) Set-up

What makes this particular fellowship special has been the opportunity to work with at-risk magnetic media and multiple stakeholders, and to learn a very complex technique for capturing. I was fortunate to be able to work with two amazing San Francisco-based non-profit organizations that focus on representing arts and culture for underrepresented communities, and that have been pillars in what they do for several decades. The collection I worked from came from the Center for Asian American Media (CAAM); CAAM isn’t a traditional archives, but their holdings are significant and represent a wide range of diverse films and documentaries, many of which have appeared on local and national PBS stations over the years. The collection contained U-matic, Betacam, and Digibeta tapes, many of which haven’t been viewed in decades. The majority of the fellowship was spent over at the Bay Area Video Coalition (BAVC), under the watchful (and extremely patient and knowledgeable) eye of Jackie Jay. I was fortunate to have my experience take place with the help of a staff that does this work daily and could help me capture and learn in the best possible situation. I would also like to give a shout-out to Morgan Morel for suffering through my lack of command-line knowledge; he has inspired me to take a Python class when this is all over.

What is in a name?

While inventorying the items for the collection at CAAM, I couldn’t help but be curious about some of the titles: Anatomy of a Springroll, Dollar a Day, 10 Cents a Dance, A Village Called Versailles, Sewing Woman, to name a few. Since all of the items are on some form of video (magnetic media), it isn’t as easy as just popping in a deck and taking a peek. While capturing in the dark room with my noise-cancelling headphones on, there were moments when I would literally laugh out loud, or cry. The subjects are heavy, as are the perspective and history; my work at the Hoover Archives had helped prepare me for dealing with difficult collections, especially when it comes to visual materials regarding war and atrocities.

 

Many videos have some form of image error. The “watermark” above is a blemish on an old tape that is visible for 1/30 of a second. After capturing, I would go back to any discrepancy to investigate further.

Cleaning, cleaning, and some baking!

I soon learned that the majority of my time was spent making sure that the decks and tapes were in tip-top shape before capturing. It is quite amazing how much time is spent cleaning tapes, cleaning the decks, baking tapes (in a really high-tech food dehydrator), re-cleaning tapes, and re-cleaning machines, as well as setting up levels and making sure that the item being digitized is as close to the original as possible. The cleaning ensures that there is no transfer of dust or debris from another tape, and that the output from the deck is precise. I am extremely fortunate to have had my digitization station at BAVC, as they understand the fundamentals of video preservation and digitization, and helped me learn more about the process than I thought I would be capable of in such a short time.

About the collection

As archivists, oftentimes we really don’t know what the collection is “about” until the end. There are usually surprises, and most of the time these records don’t come with a “read me” file, so I figured I would save this portion for the end as well. The collection as a whole speaks to the diversity of Asian American life, culture, and experiences, evoking the universal struggle of the human condition. When curating the featured films for the AAPB Special Collections page, it was difficult to choose. However, many of the films tell the history of women who have defied odds, been outspoken, or sacrificed so much for so little in return, and I wanted to put these women up front and recognize their stories and the ones who decided to tell them.

 

CAAM Video Archive

Having this wonderful opportunity to participate in this fellowship while completing my degree allowed me to expand my technical and historical knowledge base, for which I am forever grateful. I would like to thank SJSU and my wonderful advisor Alyce Scott, James Ott and Davin Agatep at CAAM for helping me out with the project, the entire preservation crew at BAVC for making sure I didn’t break anything, and of course the AAPB and all of the wonderful WGBH folks that made this fellowship happen.

If you are interested in learning more, here is a Q & A I did with CAAM when I started. You can also follow #aapbpf for photos of the stations and process.

Written by Tanya Yule, PBPF Spring 2018 Cohort

Evelyn Cox, Public Broadcasting Preservation Fellow at OETA

Oklahoma Legacy: Indelible Impressions of Perseverance, Fortitude, Resilience and Pride

Storm Chaser Footage of a Tornado in Newcastle, Oklahoma headed towards the Newcastle High School.

Greetings from the lovely state of Oklahoma. My name is Evelyn Cox and I am the Public Broadcasting Preservation fellow partnered with Oklahoma Educational Television Authority (OETA). I represent the Spring 2018 Cohort from the School of Library and Information Studies at the University of Oklahoma and have been blessed to work with outstanding mentors and advisors throughout this fellowship, collaborating with my host station mentor and Vice President of Operations, Janette Thornbrue and the talented staff at OETA; my local mentor and Political Commercial Archivist at the University of Oklahoma, Lisa Henry; and my faculty advisor and Director of the School of Library and Information Studies at OU, Dr. Susan Burke. It has been my honor to explore and select for preservation from the treasure trove of audiovisual content within the OETA Archives housed on both analog and digital tapes dating back to the 1970s.

About OETA’s Collection

Oklahoma City Murrah Bombing Memorial. The chairs represent lives taken.

Oklahoma was identified as Native American territory prior to statehood, and the Oklahoma Educational Television Authority’s collection is a glimpse into the past, covering topics and exploring issues that are relevant to the diverse cultures represented, both then and now. Issues such as racial diversity, terrorism, natural disasters, war, and poverty become the catalyst for unity and the impetus for exploration, growth, and acceptance. This collection is an eclectic mix of at-risk public media material from the Oklahoma Educational Television Authority (OETA) Archive with contributions from the Oklahoma Department of Wildlife Conservation Archive. At the heart of this collection are the people: the resilient men and women who have contributed both to the legacy of Oklahoma and to the mosaic of our great nation in the areas of art, music, science, exploration, politics, religion, architecture, literature, language, and more. Oklahoma Legacy is a culmination of indelible impressions of perseverance, fortitude, resilience and pride.

Exploring the Legacy of Oklahoma

Elizabeth Smith (right) and Margaret Anne Hamilton of Enid, OK (left) are WASPs who were given Congressional Medals for service during WWII.

As I combed through the OETA Archive, I felt giddy with excitement. Oklahoma has so much rich, culturally significant and diverse history that many people do not have access to. I could not believe that because of the PBPF fellowship, I would have the opportunity to select material that would be accessible online at the American Archive of Public Broadcasting’s website and preserved at the Library of Congress. What an honor. I was like a kid in a candy store, eagerly anticipating the chance to break out the audiovisual equipment and get reacquainted with the treasures of our past. I found information about Amelia Earhart and the Women Airforce Service Pilots (WASP) of Oklahoma who bravely contributed during World War II. People like Betty Riddle of Tulsa, Oklahoma are our very own Wonder Women. Talk about girl power. There was information about Clara Luper, known in Oklahoma as the mother of the sit-in and a pioneering leader during the American Civil Rights Movement. She marched with Dr. Martin Luther King, Jr. I found information about “Pistol Pete” Eaton and black and white footage of the Land Run of 1889, as well as Quanah Parker, the great Comanche leader. I was just scratching the surface! Thanks to the American Archive of Public Broadcasting (@amarchivepub), a collaboration between the Library of Congress and WGBH, this will be available to people throughout the United States from a centralized web portal at americanarchive.org.

Digitizing the At-Risk Material: Collaboration is the Key to Success

I was chomping at the bit and excited to exercise what I learned during Immersion Week, hosted at WGBH Educational Foundation in Boston. Like any worthwhile venture, I had setbacks of my own to overcome; but if I learned anything from the material selected for this collection, I learned that adversity is just a temporary setback that can be endured with perseverance. I counted my setbacks, which were many, as badges of honor. We experienced setbacks regarding copyright issues. We had equipment issues right out of the gate. We had a BetacamSP deck that worked for two seconds. We had issues getting the older technology to play nicely with the new technology. I had so much support and help from my Academic Advisor here at the University of Oklahoma School of Library and Information Studies as well as from the staff in our SLIS office, my local mentor Lisa Henry, and OU technical support Gary Bates, all of whom devoted countless hours trying to get our equipment up and running.

School of Library and Information Studies at the University of Oklahoma’s digitizing station with BetacamSP and DVCPro decks.

I also had great support from Janette Thornbrue at OETA. I can’t say enough about how wonderful everyone has been through this entire process. The collaboration between AAPB, WGBH, OETA, and the University of Oklahoma models the kind of collaboration needed to effectively provide access to training in audiovisual preservation, allowing for a pool of resources and support to future archivists on a local as well as national level. I feel so blessed to be part of such a wonderful program!

From left to right: OETA Host Station Mentor Janette Thornbrue, Director of SLIS and Project Advisor Dr. Susan Burke; Political Commercial Archivist and Local Mentor Lisa Henry; Public Broadcasting Preservation Fellow Spring 2018, Evelyn Cox

Written by Evelyn Cox, PBPF Spring 2018 Cohort


Virginia Angles, Public Broadcasting Preservation Fellow at GPB


Dena Schulze, Public Broadcasting Preservation Fellow at WUNC

My name is Dena Schulze and I am the Public Broadcasting Preservation fellow partnered with WUNC radio station in Chapel Hill, North Carolina and the University of North Carolina at Chapel Hill. I graduate in May from the Archives and Records Management track in the Library Science School at UNC. It has been my privilege to digitize over 170 assets from WUNC radio station that were deemed at risk.  Formats included CDs, cassettes and DAT tapes. Check out some pictures and ramblings about my experience below!

WUNC-FM

Time Travelin’ with WUNC

Every time I put on the headphones, cue up the tape or CD and press record it’s like stepping into a time machine! I had noise reducing headphones that allowed me to be totally immersed in the recordings. Shows at WUNC that I digitized were mostly weekly talk shows about current events and the people, places and things of North Carolina. There were also special programs and recordings that changed up the monotony of talk shows. I enjoyed learning about the state that I have called home for the last fifteen years. Over the course of the fellowship I was able to digitize about 170 assets and learned so much about both the process and the content. Here are a few key words that summarize my experience:

Relevance

There were times when I was listening to a talk show or news segment and if you had changed the names and dates, I would have thought it was a current broadcast. Topics included poverty, politics, abortion, economics, gay marriage, health care, etc. These issues are still constantly in the news and being debated in our country. Listening to people talk about these issues 5, 10, or 20 years ago brought a new perspective to the news I was reading about in the present. Will we ever solve these problems or end the debate? Maybe not, but I think the continuing discussion is vital, and looking back on what has been said before can help the present conversation move forward.

Appreciation

Many of the shows and recordings also featured performing arts and music. Gary Shivers on Jazz played collections of jazz music, including an episode on Frank Sinatra and Ella Fitzgerald which I thoroughly enjoyed. The first episode of The Linda Belans show focused on television, specifically the popular shows airing at the time: Friends and Frasier. There was also a collection of short stories recorded by authors including Lee Smith and Haven Kimmel. As someone who loves the arts, I loved this theme throughout the assets and listening to things I would never have heard of otherwise.

Treasures

Cueing up a tape was almost like going on a treasure hunt! The titles of the episodes didn’t necessarily tell me what I was going to be listening to for the next hour or so. Sometimes they were pretty simple: “Ray Bradbury” was a conversation with the famous author. Others had one description or name, but that covered only part of the tape. I was surprised to discover a whole segment on the art of fiddling and another interview featuring actress Amy Adams at the beginning of her career. Some did not even have a description on the tape, and that content was a total surprise! It kept me on my toes!

North Carolina!

As mentioned above, I have lived in North Carolina for the past fifteen years and felt a strong connection to the shows focusing on the people, places and issues of the state. One show discussed a school being built near where I lived, and I had no idea of its history and beginnings. Another had an interview with Dr. William Friday, who is basically North Carolina royalty and at one time was the president of the University of North Carolina system. Every recording dealt with a person, issue or place concerning the state of North Carolina. It gave me a greater knowledge and appreciation for the state I call home!

Flexibility

This word describes more of the process than the content. Because we were creating the workstation and workflow from the ground up, there were a lot of hiccups to work through. Equipment that did not arrive on time or did not work properly, a computer that did not read the CDs or programs correctly, and miscommunication in emails are just a few examples. I had to be ready to move on to another part of the fellowship while other factors were figured out or fixed. Once the workstation and workflow were set up, everything ran a lot more smoothly, but it takes time to get all the different pieces working together. I found it vital to have mentors and professionals at my university and at the station to ask for help, and I would not have gotten the workstation up and running without them!

I had so much fun immersing myself in recordings from the past and learning some history! I think these recordings are going to be so valuable on the AAPB website and I am so glad I was able to help get them online!

– Written by PBPF Fellow Dena Schulze

*********************

About PBPF

The Public Broadcasting Preservation Fellowship (PBPF), funded by the Institute of Museum and Library Services, supports ten graduate student fellows at University of North Carolina, San Jose State University, Clayton State University, University of Missouri, and University of Oklahoma in digitizing at-risk materials at public media organizations around the country. Host sites include the Center for Asian American Media, Georgia Public Broadcasting, WUNC, the Oklahoma Educational Television Authority, and KOPN Community Radio. Contents digitized by the fellows will be preserved in the American Archive of Public Broadcasting. The grant also supports participating universities in developing long-term programs around audiovisual preservation and ongoing partnerships with their local public media stations.

For more updates on the Public Broadcasting Preservation Fellowship project, follow the project at pbpf.americanarchive.org and on Twitter at #aapbpf, and come back in a few months to check out the results of their work.

Upcoming AAPB Webinar Featuring Kathryn Gronsbell, Digital Collections Manager at Carnegie Hall

Photo courtesy of Rebecca Benson, @jeybecques, PBPF Fellow at University of Missouri.

This Thursday, March 15th at 8 pm EST, American Archive of Public Broadcasting (AAPB) staff will host a webinar with Kathryn Gronsbell, Digital Collections Manager at Carnegie Hall. The webinar will cover topics in documentation, including why documentation is important, what to think about when recording workflows for future practitioners, and where to find examples of good documentation in the wild.

The public is welcome to join for the first half hour. The last half hour will be limited to Q&A with our Public Broadcasting Preservation Fellows, who have now begun to inventory their digitized public broadcasting collections to be preserved in the AAPB.

Webinar URL: http://wgbh1.adobeconnect.com/documentation/

For anyone who missed the last webinar on tools for Quality Control, it’s now also available for viewing through this link: http://wgbh1.adobeconnect.com/psv1042lp222/.

*******************************

For more updates on the Public Broadcasting Preservation Fellowship project, follow the project at pbpf.americanarchive.org and on Twitter at #aapbpf, and come back in a few months to check out the results of their work: digitized content preserved in the American Archive of Public Broadcasting from our collaborating host organizations WUNC, KOPN, Oklahoma Educational Television Authority, Georgia Public Broadcasting, and the Center for Asian American Media, as well as documentation created to support ongoing audio and video preservation education at the University of Missouri, University of Oklahoma, Clayton State University, University of North Carolina at Chapel Hill, and San Jose State University.

Celebrate Women’s History Month by Preserving Women’s Voices in Public Media

One of the most fascinating aspects of the American Archive of Public Broadcasting (AAPB) is discovering how local broadcasting stations used their platforms to communicate national issues to local audiences.

As second-wave feminism gained momentum between 1960 and 1980, WNED from Buffalo, New York documented the movement’s ripple effect in a half-hour public affairs talk show series titled Woman. Syndicated by over 200 PBS stations from 1973 to 1977, Woman was the only year-round, national public television forum where a wide variety of national experts provided perspectives on the (then) evolving world of women’s history.

To celebrate this milestone in women’s public media history, the American Archive of Public Broadcasting (AAPB) launched a new Special Collection featuring the Woman series! Over 190 episodes are available online via the AAPB website: http://americanarchive.org/special_collections/woman-series.

Woman Series, WNED – Buffalo, NY (1973-1977)

The AAPB invites you to celebrate Women’s History Month by helping preserve and make accessible six Woman transcripts. We’re launching a demo version of our *NEW* transcript editor tool FIX IT+, a line-by-line editing platform initially developed by the New York Public Library. The six featured interviews include conversations with Gloria Steinem (editor and co-founder of Ms. Magazine), Dorothy Pitman Hughes (African American activist and co-founder of Ms. Magazine), Betty Friedan (author of The Feminine Mystique), Nora Ephron (editor for Esquire magazine and the author of the best-selling book Crazy Salad), Marcia Ann Gillespie (editor-in-chief of Essence Magazine and a board member of Essence Communications), Connie Uri, M.D. (on the National Board of Research on the Plutonium Economy and the advisory board of NASC, the Native American Solidarity Committee), and Marie Sanchez (Chief Judge of the Northern Cheyenne Tribe, member of the Indian Women United for Social Justice).

These transcripts will be made available online through the AAPB’s website, allowing women’s voices in public media to be more readily searchable and accessible for future generations.

Below are sample recordings of the six interviews mentioned above. Search the Woman Special Collection for more interviews with activists, journalists, writers, scholars, lawyers, artists, psychologists, and doctors, covering topics such as women in sports, the Equal Rights Amendment, sexuality, marriage, women’s health, divorce, the Women’s Liberation Movement, motherhood, and ageism, among others.

Direct link to FIX IT+: http://54.205.165.195.xip.io/

Sample Recordings of Featured Transcripts:

Connie Uri, M.D. and Marie Sanchez, Chief Judge of the Northern Cheyenne Tribe, FIX IT+ Transcript: http://54.205.165.195.xip.io/transcripts/cpb-aacip_81-67wm3fxh

Marcia Ann Gillespie, FIX IT+ Transcript: http://54.205.165.195.xip.io/transcripts/cpb-aacip_81-69z08t6x

Nora Ephron, FIX IT+ Transcript: http://americanarchive.org/catalog/cpb-aacip_81-988gttr0

Gloria Steinem, FIX IT+ Transcript: http://americanarchive.org/catalog/cpb-aacip_81-57np5qgv

Betty Friedan, FIX IT+ Transcript: http://americanarchive.org/catalog/cpb-aacip_81-9995xhm0

Dorothy Pitman Hughes, FIX IT+ Transcript: http://54.205.165.195.xip.io/transcripts/cpb-aacip_81-59c5b5nr

Written by Ryn Marchese, AAPB Engagement and Use Manager

Upcoming Webinar: AAPB’s Quality Control Tools and Techniques for Ingesting Digitized Collections

Oklahoma mentor Lisa Henry (left) cleaning a U-matic deck with Public Broadcasting Preservation Fellow Tanya Yule.

This Thursday, February 15th at 8 pm EST, American Archive of Public Broadcasting (AAPB) staff will host a webinar covering quality control tools and technologies used when ingesting digitized collections into the AAPB archive, including MDQC, MediaConch, Sonic Visualiser, and QCTools.

The public is welcome to join for the first half hour. The last half hour will be limited to Q&A with our Public Broadcasting Preservation Fellows, who are just now beginning the process of digitizing at-risk public broadcasting collections to be preserved in the AAPB.

Webinar URL: http://wgbh1.adobeconnect.com/psv1042lp222/

*******************************

For more updates on the Public Broadcasting Preservation Fellowship project, follow the project at pbpf.americanarchive.org and on Twitter at #aapbpf, and come back in a few months to check out the results of their work: digitized content preserved in the American Archive of Public Broadcasting from our collaborating host organizations WUNC, KOPN, Oklahoma Educational Television Authority, Georgia Public Broadcasting, and the Center for Asian American Media, as well as documentation created to support ongoing audio and video preservation education at the University of Missouri, University of Oklahoma, Clayton State University, University of North Carolina at Chapel Hill, and San Jose State University.

Resources Roundup: AAPB Presentations from 2017 AMIA Conference

Earlier this month the American Archive of Public Broadcasting staff hosted several workshops at the 2017 Association of Moving Image Archivists (AMIA) conference in New Orleans. Their presentations on workflows, crowdsourcing, and best copyright practices are now available online! Be sure to also check out AMIA’s YouTube channel for recorded sessions.

THURSDAY, November 30th

  • PBCore Advisory Sub-Committee Meeting
    Rebecca Fraimow reported on general activities of the Sub-Committee and the PBCore Development and Training Project. The following current activities were presented:

      • PBCore Cataloging Tool (Linda Tadic)
      • PBCore MediaInfo updates (Dave Rice)
      • ProTrack integration (Rebecca Fraimow)
      • Updated CSV templates (Sadie Roosa)
      • PBCore crosswalks (Rebecca Fraimow and Sadie Roosa)

FRIDAY, December 1st

Archives that hold A/V materials are at a critical point, with many cultural heritage institutions needing to take immediate action to safeguard at-risk media formats before the content they contain is lost forever. Yet, many in the cultural heritage communities do not have sufficient education and training in how to handle the special needs that A/V archive materials present. In the summer of 2015, a handful of archive educators and students formed a pan-institutional group to help foster “educational opportunities in audiovisual archiving for those engaged in the cultural heritage sector.” The AV Competency Framework Working Group is developing a set of competencies for audiovisual archive training of students in graduate-level education programs and in continuing education settings. In this panel, core members of the working group will discuss the main goals of the project and the progress that has been made on it thus far.

Born-digital audiovisual files continue to present a conundrum to archivists in the field today: should they be accepted as-is, transcoded, or migrated? Is transcoding to a recommended preservation format always worth the potential extra storage space and staff time? If so, what are the ideal target specifications? In this presentation, individuals working closely with born-digital audiovisual content from the University of North Carolina, WGBH, and the American Folklife Center at the Library of Congress will present their own use cases involving collections processing practices, from “best practice” to the practical reality of “good enough”. These use cases will highlight situations wherein video quality, subject matter, file size and stakeholder expectations end up playing important roles in directing the steps taken for preservation. From these experiences, the panel will put forth suggestions for tiered preservation decision making, recognizing that not all files should necessarily be treated alike.

  • Crowdsourcing Anecdotes

How does the public play a role in making historical AV content accessible? The American Archive of Public Broadcasting has launched two games that engage the public in transcribing and describing 70+ years of audio and visual content comprising more than 50,000 hours.

THE TOOLS:

(Speech-to-Text Transcript Correction) FIX IT is an online game that allows the public to identify and correct errors in our machine-generated transcripts. FIX IT players have exclusive access to historical content and long-lost interviews from stations across the country.

AAPB KALDI is a tool and profile for speech-to-text transcription of video and audio, released by Pop Up Archive and made available on GitHub at github.com/WGBH/american-archive-kaldi.

(Program Credits Cataloging) ROLL THE CREDITS is a game that allows the public to identify and transcribe information about the text that appears on the screen in so many television broadcasts. ROLL THE CREDITS asks users to collect this valuable information and classify it into categories that can be added to the AAPB catalog. To accomplish this goal, we’ve extracted frames from uncataloged video files and are asking for help to transcribe the important information contained in each frame.

SATURDAY, December 2nd

Digitized collections often remain almost as inaccessible as they were on their original analog carriers, primarily due to institutional concerns about copyright infringement and privacy. The American Archive of Public Broadcasting has taken steps to overcome these challenges, making available online more than 22,000 historic programs with zero take-down notices since the 2015 launch. This copyright session will highlight practical and successful strategies for making collections available online. The panel will share strategies for: 1) developing template forms with standard terms to maximize use and access, 2) developing a rights assessment framework with limited resources (an institutional “Bucket Policy”), 3) providing limited access to remote researchers for content not available in the Online Reading Room, and 4) promoting access through online crowdsourcing initiatives.

The American Archive of Public Broadcasting seeks to preserve and make accessible significant historical public media content, and to coordinate a national effort to save at-risk public media recordings. In the four years since WGBH and the Library of Congress began stewardship of the project, significant steps have been taken towards accomplishing these goals. The effort has inspired workflows that function constructively, beginning with preservation at local stations and building to national accessibility on the AAPB. Archivists from two contributing public broadcasters will present their institutions’ local preservation and access workflows. Representatives from WGBH and the Library of Congress will discuss collaborating with contributors and the AAPB’s digital preservation and access workflows. By sharing their institutions’ roles and how collaborators participate, the speakers will present a full picture of the AAPB’s constructive inter-institutional work. Attendees will gain knowledge of practical workflows that facilitate both local and national AV preservation and access.

As an increasing number of audiovisual formats become obsolete and the available hours remaining on deteriorating playback machines decrease, it is essential for institutions to digitize their AV holdings to ensure long-term preservation and access. With an estimated hundreds of millions of items to digitize, it is impractical, even impossible, that institutions would be able to perform all of this work in-house before time runs out.  While this can seem like a daunting process, why learn the hard way when you can benefit from the experiences of others? From those embarking on their first outsourced AV digitization project to those who have completed successful projects but are looking for ways to refine and scale up their process, everyone has something to learn from these speakers about managing AV digitization projects from start to finish.

How do you bring together a collection of broadcast materials scattered in various geographical locations across the country? National Educational Television (NET), the precursor to PBS, distributed programs nationally to educational television stations from 1954 to 1972. Although this collection is tied together through provenance, it presents a challenge to processing due to differing approaches in descriptive practices across many repositories over many years. By aggregating inventories into one catalog and describing titles more fully, the NET Collection Catalog will help institutions holding these materials make informed preservation decisions. By its conclusion, AAPB will publish an online list of NET titles annotated with relevant descriptive information culled from NET textual records that will greatly improve discoverability of NET materials for archivists, scholars, and the general public. Examples of specific cataloging issues, including contradictory metadata documentation and legacy records, inconsistent titling practices, and the existence of international versions, will be explored.

ABOUT THE AAPB

The American Archive of Public Broadcasting (AAPB) is a collaboration between the Library of Congress and the WGBH Educational Foundation to coordinate a national effort to preserve at-risk public media before its content is lost to posterity and provide a central web portal for access to the unique programming that public stations have aired over the past 70 years. To date, over 50,000 hours of television and radio programming contributed by more than 100 public media organizations and archives across the United States have been digitized for long-term preservation and access. The entire collection is available on location at WGBH and the Library of Congress, and almost 25,000 programs are available online at americanarchive.org.