AAPB Transcription Workflow, Part 1

The AAPB started creating transcripts as part of our “Improving Access to Time-Based Media through Crowdsourcing and Machine-Learning” grant from the Institute of Museum and Library Services (IMLS). For the initial 40,000 hours of the AAPB’s collection, we worked with Pop Up Archive to create machine-generated transcripts, which are primarily used for keyword indexing, to help users find otherwise under-described content. These transcripts are also being corrected through our crowdsourcing platforms FIX IT and FIX IT+.

As the AAPB continues to grow its collection, we have added transcript creation to our standard acquisitions workflow. Now, when the first steps of acquisition are done, i.e., metadata has been mapped and all of the files have been verified and ingested, the media is passed into the transcription pipeline. The proxy media files are either copied directly off the original drive or pulled down from Sony Ci, the cloud-based storage system that serves americanarchive.org’s video and audio files. These are copied into a folder on the WGBH Archives’ server, where they wait for an available computer running transcription software.

Dockerized Kaldi

The AAPB uses the Docker image of Pop Up Archive’s Kaldi running on many machines across WGBH’s Media Library and Archives. Rather than paying additional money to run this in the cloud or on a supercomputer, we decided to take advantage of the resources we already had sitting in our department. AAPB and Archives staff at WGBH who regularly leave their computers in the office overnight are good candidates for the transcription team. All they have to do is follow instructions on the internal wiki to install Docker and a simple Macintosh application, built in-house, that runs scripts in the background and reports progress to the user. The application manages launching Docker, pulling the Kaldi image (or checking that it has already been pulled), and launching the image. The user doesn’t need any specific knowledge of how Docker images work to run the application. The app gets minimized on the dock and continues to run in the background as the staff member goes about their work during the day.* But that’s not all! When they leave for the night and their computer typically wouldn’t be doing anything, it continues to transcribe media files, making use of processing power that we were already paying for but hadn’t been utilizing.

*There have been reports of systems being perceptibly slower while running this Docker image throughout the day. It has yet to have a significant impact on any staff member’s ability to do their job.

Square application window that shows list of transcripts that have been processed
Application user-interface

Centralized Solution

Now, we could just have multiple machines running Kaldi through Docker and that would let us create a lot of transcripts. However, it would be cumbersome and time-consuming to split the files into batches, manage starting a different batch on each computer, and collect the disparate output files from various machines at the end of the process. So we developed a centralized way of handling the input and output of each instance of Kaldi running on a separate machine.

That same Macintosh application that manages running the Kaldi Docker image also manages files in a network-shared folder on the Archives server. When a user launches the application, it checks that folder on the server for media files. If there are any, it takes the oldest file, copies it locally, and starts transcribing it. When Kaldi has finished transcribing it, the output text and JSON-formatted transcripts are copied to a subfolder on the Archives server, and the local copy of the media file is deleted. Then the application checks the folder again, picks up the next media file, and the process continues.
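The polling loop can be sketched roughly as follows. The paths, image name, and file extensions here are placeholders, not our real configuration, and the Docker invocation is a stand-in for the actual Kaldi command:

```python
import os
import shutil
import subprocess

# Placeholder paths and image name -- the real setup differs.
QUEUE_DIR = "/Volumes/Archives/transcription_queue"
OUTPUT_DIR = "/Volumes/Archives/transcription_queue/output"
LOCAL_DIR = "/tmp/kaldi_work"
MEDIA_EXTS = (".mp4", ".mp3", ".wav")

def oldest_media_file(folder):
    """Return the oldest waiting media file in the shared folder, or None."""
    candidates = [
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(MEDIA_EXTS)
    ]
    return min(candidates, key=os.path.getmtime) if candidates else None

def process_next():
    """Copy the oldest queued file locally, transcribe it, push results back."""
    media = oldest_media_file(QUEUE_DIR)
    if media is None:
        return False  # nothing to do; the app would sleep and poll again
    os.makedirs(LOCAL_DIR, exist_ok=True)
    local = shutil.copy2(media, LOCAL_DIR)
    # Stand-in for the Dockerized-Kaldi run; "kaldi-image" is hypothetical.
    subprocess.run(
        ["docker", "run", "--rm", "-v", f"{LOCAL_DIR}:/media",
         "kaldi-image", f"/media/{os.path.basename(local)}"],
        check=True,
    )
    for ext in (".txt", ".json"):  # copy the output transcripts back
        out = os.path.splitext(local)[0] + ext
        shutil.copy2(out, OUTPUT_DIR)
    os.remove(media)   # server copy: done, no one else should pick it up
    os.remove(local)   # local scratch copy
    return True
```

The real application wraps a loop like this in a GUI that reports progress, but the shape of the work is the same: poll, claim, transcribe, copy back, repeat.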

Screenshot of a file directory with many .mp4 files, a few folders, and a few files named with base64 encoded strings
Files on the Archives server: the files at the top are waiting to be processed, the files near the bottom are the ones being processed by local machines

Avoiding Duplicate Effort

Now, since we have multiple computers running in parallel, all watching the same folder on the server, how do we make sure that multiple computers aren’t duplicating effort by transcribing the same file? Well, the process first tries to rename the file to be processed, using the person’s name and a base64 encoding of the original filename. If the renaming succeeds, the file is copied into the Docker container for local processing, and the process on every other workstation will ignore files named that way in its quest to pick up the oldest qualifying file. After a file is successfully processed by Kaldi, it is then deleted, so no one else can pick it up. When Kaldi fails on a file, the file on the server is renamed to its original name with “_failed” appended, and again the scripts know to ignore it. A human can later check whether any files have failed and investigate why. (It is rare for Kaldi to fail on an AAPB media file, so this is not a part of the workflow we felt we needed to automate further.)
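The claim-by-rename trick works because a rename on the same filesystem either succeeds or fails atomically. A minimal sketch, assuming a hypothetical naming scheme of the worker’s name plus a base64 encoding of the original filename (the exact format we use may differ):

```python
import base64
import os

def claim_file(path, worker_name):
    """Try to claim a queued media file by renaming it.

    The new name embeds the worker's name plus a base64 encoding of the
    original filename, so other workers skip it and the original name can
    be recovered later. URL-safe base64 avoids "/" in filenames.
    """
    folder, original = os.path.split(path)
    encoded = base64.urlsafe_b64encode(original.encode()).decode()
    claimed = os.path.join(folder, f"{worker_name}_{encoded}")
    try:
        os.rename(path, claimed)  # atomic on the same filesystem
        return claimed
    except OSError:
        return None  # someone else claimed or removed it first

def original_name(claimed_path):
    """Recover the original filename from a claimed file's name."""
    encoded = os.path.basename(claimed_path).split("_", 1)[1]
    return base64.urlsafe_b64decode(encoded.encode()).decode()
```

Because only one worker’s rename can succeed, every other machine sees either the claimed name (and skips it) or no file at all.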

Handling Computer and Human Errors

The centralized workflow relies on the application not quitting in the middle of a transcription. If someone shuts their laptop, the application will stop, but when they open it again, the application will pick up right where it left off. It will even continue transcribing the current file if the computer is not connected to the WGBH network, because it maintains a local copy of the file being processed. This allows a little flexibility in terms of staff taking their computers home or to conferences.

The problem starts when the application quits, which could occur when someone quits it intentionally, someone accidentally hits the quit button rather than the minimize button, someone shuts down or restarts their computer, or a computer fails and shuts itself down automatically. We have built the application to minimize the effects of this problem. When the application is restarted, it will just pick up the next available file and keep going as if nothing happened. The only reason this is a problem at all is that the file it was in the middle of working on is still sitting on the Archives server, renamed, so another computer will not pick it up.

We consider the few downsides of this setup completely manageable:

  • At regular intervals a human must look into the folder on the server to check that a file hasn’t been sitting renamed for a long time. These are easy to spot because there will be two renamed files with the same person’s name. The older of these two files is the one that was started and never finished. The filename can be changed back to its original name by decoding the base64 string. Once the name is changed, another computer will pick up the file and start transcribing.
  • Because the file stopped being transcribed in the middle of the process, the processing time spent on that interrupted transcription is wasted. The next computer to start transcribing this file will start again at the beginning of the process.
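Even the stale-claim check in the first point could be scripted. A sketch, assuming claimed files are named with the worker’s name followed by a base64 encoding of the original filename (our guess at the scheme described above), and a hypothetical one-day staleness threshold:

```python
import base64
import os
import time

STALE_SECONDS = 24 * 60 * 60  # assume anything claimed > 1 day is stuck

def release_stale_claims(folder, now=None):
    """Rename long-claimed files back to their original names.

    Assumes claimed files look like "<worker>_<base64-of-original-name>";
    anything that doesn't decode is left alone.
    """
    now = time.time() if now is None else now
    released = []
    for name in os.listdir(folder):
        if "_" not in name:
            continue
        path = os.path.join(folder, name)
        if now - os.path.getmtime(path) < STALE_SECONDS:
            continue  # recently touched; probably still being worked on
        try:
            original = base64.urlsafe_b64decode(
                name.split("_", 1)[1].encode()).decode()
        except Exception:
            continue  # not a claimed file after all
        os.rename(path, os.path.join(folder, original))
        released.append(original)
    return released
```

Once renamed back, the file looks freshly queued, and the next idle machine picks it up and starts over, which is exactly the second downside above.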

Managing Prioritization

Because the AAPB has a busy acquisitions workflow, we wanted to make sure there was a way to manage prioritization of the media getting transcribed. Prioritization can be determined by many variables, including project timelines, user interest, and grant deadlines. Rather than spending a lot of time to build a system that let us track each file’s prioritization ranking, we opted for a simpler, more manual operation. While it does require human intervention, the time commitment is minimal.

As described above, the local desktop applications only look in one folder on the Archives server. By controlling what is copied into that folder, it is easy to control what files get transcribed next. The default is for a computer to pick up the oldest file in the folder. If you have a set of more recent files that you want transcribed before the rest of the files, all you have to do is remove any older files from that folder. You can easily put them in another folder, so that when the prioritized files are completed, it’s easy to move the rest of the files into the main folder.

For smaller sets of files that need to be transcribed, we can also have someone who is not running the application stand up an instance of Dockerized Kaldi and run the media through it locally. Their machine won’t be tied to the folder on the server, so they will only process the prioritized files they feed Kaldi locally.

Transforming the Output

At any point we can go to the Archives server and grab the transcripts that have been created so far. These transcripts are output as text files and as JSON files which pair time-stamp data with each word. However, the AAPB prefers JSON transcripts that are time-stamped at each 5-7 second phrase.

We use a script that parses the word-stamped JSON files and outputs phrase-stamped JSON files.

Word time-stamped JSON

Screenshot from a text editor showing a json document with wrapping json object called words with sub-objects with keys for word, time, and duration
Snippet of Kaldi output as JSON transcript with timestamps for each word

Phrase time-stamped JSON

Screenshot from a text editor of JSON with a container object called parts and sub-objects with keys text, start time, and end time.
Snippet of transformed JSON transcript with timestamps for 5-7 second phrases
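A minimal sketch of that transformation, assuming the key names shown in the screenshots ("words"/"word"/"time"/"duration" in, "parts"/"text"/"start_time"/"end_time" out, which may differ from the real schema) and a simple greedy grouping that closes a phrase once it reaches the target length:

```python
def words_to_phrases(word_json, max_seconds=7.0):
    """Group per-word timestamps into phrases of roughly max_seconds.

    Key names are read off the screenshots above and are assumptions,
    as is the greedy grouping strategy.
    """
    parts = []
    current, start = [], None
    for w in word_json["words"]:
        if start is None:
            start = w["time"]
        current.append(w)
        end = w["time"] + w["duration"]
        if end - start >= max_seconds:
            parts.append({
                "text": " ".join(x["word"] for x in current),
                "start_time": start,
                "end_time": end,
            })
            current, start = [], None
    if current:  # flush the final, possibly short, phrase
        last = current[-1]
        parts.append({
            "text": " ".join(x["word"] for x in current),
            "start_time": start,
            "end_time": last["time"] + last["duration"],
        })
    return {"parts": parts}
```

With a structure like this, each phrase keeps the start time of its first word and the end time of its last, which is all a player needs to highlight the right passage during playback.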

Once we have the transcripts in the preferred AAPB format, we can use them to make our collections more discoverable and share them with our users. More on that part of the workflow in Part 2 (coming soon!).

Summer in the City: Farmers’ Markets and Their Origins

As an intern at the American Archive of Public Broadcasting at WGBH, I am living in Boston for the first time. I’ve decided to make it my goal to explore the city and since it’s summertime, the sun is out and beckoning the city’s inhabitants to head outside. One popular activity is frequenting the farmers’ markets that Boston has to offer! The City of Boston reports that it handles almost thirty markets, but that number doesn’t even include the numerous markets that are in the surrounding suburbs. But when did farmers’ markets become so popular? We might take their existence for granted now, but they haven’t always had the thriving customer base they do today. Looking through content at the American Archive of Public Broadcasting, we can see how farmers’ markets have evolved throughout the years.

In July of 1978, a Boston WGBH production called GBH Journal presented a story about a farmers’ market in Dorchester. In the story, the reporter explains how the markets provide benefits for both the farmers and the buyers. For example, the farmers can “bypass the middle person” and the consumers pay “less for their produce and also get fresh, nutritious vegetables and fruits for their money.” The program also describes how farmers’ markets aid the economy in Massachusetts by providing an economic boost for struggling farmers and an affordable food source for lower-income citizens. This same farmers’ market in Dorchester still runs today at Fields Corner every Saturday from 9 a.m. to 12 p.m.

Farmers’ markets continued to grow in popularity throughout the country, and in 2007, Tampa public broadcasting station WEDU ran a story about a popular market in Sarasota, Florida on its series Gulf Coast Journal with Jack Perkins. Featured is a local citrus farmer, Tim Brown of Brown’s Grove Citrus and Produce. Brown talks about the high quality of his family’s produce, emphasizing its freshness: “the citrus that we pick on Friday night is on the street Saturday morning.” Tony Souza from the Downtown Partnership of Sarasota explains the market’s popularity in the clip, stating that “the locals come up because it’s the thing to do.” Throughout the story, the program highlights the community involvement found at farmers’ markets as a main attraction. Like the Dorchester market in Boston, the Sarasota farmers’ market still runs every Saturday. The Brown family even still sells their produce.


To understand why farmers’ markets are popular today, it is helpful to understand how organic and small farmers gained prevalence in an industry that favors corporate, high-quantity producers. In September of 2004 at Washington State University, Northwest Public Television recorded a presentation by the former Executive Director of the Organic Farming Research Foundation (OFRF), Bob Scowcroft. In this talk, Scowcroft discusses how the OFRF assisted in bringing national attention to organic farming, citing press interviews, conferences, and grant research as key factors in its success. He also reads a passage from a 1970 issue of LIFE magazine: “the ideas are simple and appealing: we eat too much, mostly of the wrong things; our food comes to us not as nature intended, but altered by man during both growth and processing.” As a pioneer in organic farming, Scowcroft offers insight into how organic, small farming has grown throughout the years and the challenges it still faces.

Today, farmers’ markets continue to flourish. In 1978, public broadcasting aimed to inform the public about the basic facts of farmers’ markets. Thirty years later in 2007, public broadcasting instead demonstrated how farmers’ markets had become a community staple where people from different backgrounds could come together to support the local economy. These markets remain an excellent way to learn, explore, and enjoy a variety of unique and vibrant cultural areas all over the United States and even beyond its borders.

This post was written by Hannah Gore, AAPB Intro to Media Archives Intern and student at Dickinson College.

Forty Years through Women’s Healthcare

As lawmakers currently decide the future of American healthcare, many politicians and organizations are seizing the opportunity to express their own sentiments on the subject. A particularly hot topic has been how the new law will affect women, which has long been a controversial subject in the United States. The modern women’s healthcare movement has its roots in the figure of Margaret Sanger. Though she had to take temporary asylum in Europe after illegally distributing contraceptive information, Sanger eventually established the modern-day Planned Parenthood in 1921. To learn more about the evolution of women’s healthcare, content from the American Archive of Public Broadcasting facilitates an analysis of how public opinion of healthcare has developed over the last forty years.

In the 70s, WNED of Buffalo, N.Y. produced a series of interviews entitled “Woman.” On Dec. 4, 1975, WNED recorded an episode with breast cancer advocate Rose Kushner. Interviewed by Sandra Elkin, Kushner criticizes the routine medical practices taken in treatment of breast cancer in the United States. Instead, she praises the practices of other countries with nationalized healthcare systems: “There is a big difference in countries where medicine is nationalized,” she explains. For context, a timeline by PBS reports that in the 70s, healthcare was “seen as in crisis” due to the quickly rising costs.

About twenty years later in 1992, healthcare costs continued to rise due to economic inflation. On July 29, Kojo Nnamdi hosted a group of four women to speak on his WHUT-produced program, Evening Exchange, in Washington, D.C. The guests included physician Maggie Covington, naturopathic physician Andrea Sullivan, president of the National Black Women’s Health Project Julia Scott, and president of the Black Lesbian Support Group Cindy Smith. In the program, Scott explains her view that minority women do not receive enough care in medicine, stating that “there is a lot happening in our society that has to do with racism and classism that makes poor women be much more ill than other segments of the population.” The program ends with the conclusion that women need greater representation in the healthcare system, as Cindy Smith asserts, “it’s still a man’s world.”

Another twenty years later, President Barack Obama signed his healthcare reform bill into law in 2010. In a 2013 Louisiana Public Broadcasting broadcast, Donna Fraiche speaks to the Baton Rouge Rotary Club on the now three-year-old Affordable Care Act. Although moving away from the lens of the female experience, she details the high costs of American healthcare. Notably, Fraiche observes that “we’re the richest country in the world, but we spend the most of our dollars on healthcare.” Her knowledge of Japan allows her to explain the differences in the American and Japanese systems, since she served as honorary consul-general of Japan for New Orleans.

Today, this passion for a stronger healthcare system thrives as the American people continue the journey to find a system that is beneficial for our country. Though laws and politics may continue to change, history affirms our dedication towards bettering our medical care. In these programs, the women all had drastically different perspectives and backgrounds, ranging from writers to lawyers to advocates. Yet, they all provided information that helped to develop and challenge public notions of healthcare in the United States.

This post was written by Hannah Gore, AAPB Intro to Media Archives Intern and student at Dickinson College.

Pride in 1978

“Come on out! Join us, bring a friend!” This joyous sentiment, spoken by Harvey Milk at the San Francisco parade for the 8th Gay Freedom Day in 1978, still resonates today. Over the past month, there have been countless events right here in Boston, and across the country, celebrating LGBTQ+ pride. In the last week of Pride Month, WGBH wanted to take the opportunity to look back on the history of the Pride movement in the United States through some broadcasts in the American Archive of Public Broadcasting.

This particular program is a sound portrait of one of the first pride parades in the United States, and it captures the spirit of Pride that we still see today. Listening to the program, one can hear many chants and songs, some silly and lighthearted, like “we’re here because we’re queer because we’re here because we’re queer,” and some serious, like “Ain’t gonna let Anita Bryant turn us around,” referencing the famed anti-gay-rights activist. Fresh off the heels of the Dade County-Miami decision, and just as the Briggs Initiative was proposed in California, the sound bites and interviews in this program capture the sentiment of the LGBTQ+ movement at this moment in time.

Many of those participating in the parade were interviewed about the Briggs Initiative, also known as California Proposition 6, which would have banned gay men and lesbian women, as well as their straight allies, from working in the California public school system. This initiative was one of the most important gay rights issues in California at the time. One parade-goer states, “My friends, anyone who supports my right to be gay, can be fired, just for believing that. People think of it in terms of, it’s a gay rights issue, as opposed to being a free speech issue.” Another states, “If you’re white, male, straight, educated, and, uh, well-off, then that’s who gets the rights in this country. It’s been everyone else who’s had to go out into the streets and fight for their rights.” As is still true today, Pride parades have not only been a space for celebration, but also a space for activism, since their very beginnings.

One thing the listener notes is how diverse the interviewees are. They range from Vietnam veterans, to parents, to straight allies, to Mormons, to businessmen. There are people who believe that others are playing into gay stereotypes, and those who are completely unapologetic about their own flamboyance. There is even one man interviewed who does not know that a gay pride parade is happening until he is informed by the interviewer. All of these people freely give their opinions on the parade, as well as on gay rights, to the interviewers.

This sound portrait is only one of many broadcasts in the American Archive of Public Broadcasting that traces the LGBTQ+ movement in American history and broadcasting. As one parade-goer, Cathy Patterson, states, “Gay and straight are one and the same really, and we all have the same goal—or at least we should.”

You can enjoy more materials like this at http://americanarchive.org, under the LGBTQ tab in our browsing catalog.

This post was written by Olivia Hess, AAPB Intern and student at St. Lawrence University.

Using Linked Data for the NET Collection Catalog

Who I Am

I am Chris Pierce, the Cataloger/Metadata Specialist for the American Archive of Public Broadcasting and the National Educational Television (NET) Collection Catalog project at the Library of Congress. The NET Collection Catalog Project is a collaboration between WGBH and the Library of Congress, funded by the Council on Library and Information Resources (CLIR). The project involves creating a national catalog of records that document and robustly describe titles distributed by NET, public media’s first national network, whose programs are among public media’s earliest and most at-risk content.

In addition to cataloging moving image material distributed by NET during the mid to late fifties to early seventies, I am also working on a feasibility report on the implementation of linked data for the NET catalog.

Linked data? Huh?

What is linked data? The Wikipedia definition is “a method of publishing structured data so that it can be interlinked.” To put it simply, linked data is data that can be linked to other data, very much like how browsers manage hyperlinks.

Why would we want to implement linked data? There are several reasons:

  • AAPB/NET metadata contains valuable and largely undiscovered relationships that, when reused by others on the internet, can enhance the information already online.
  • It would open AAPB/NET metadata to web applications, making the metadata more discoverable and shareable on the web.
  • It would contribute to the sustainability of metadata creation for future cataloging at the AAPB, with metadata that is more deeply connected to external metadata, which could then be reused for description of AAPB material.

Very often we talk about linked data being actionable, by which we mean that the data can be linked to other data through Uniform Resource Identifiers (URIs) (or hyperlinks that direct the user to more information about the resource or property). A key part of being actionable is that data that has been designed to be interlinked in such a way can be said to be a node in a traversable “web” of data. Thus, the model for linked data is a graph, and linked datasets are typically modelled on a graph model rather than relational or hierarchical structures. It is very common to see linked data visualized through this sort of image:

Image from The Oracle Alchemist

These links are structured through relationships expressed as triples. In the image above, these triples are represented in graph form, but they can also be serialized in machine readable code. In both the serialization and the graph, these triples are logical statements:

This person hasRealName Stephen King

This person hasTwitter @StephenKing

@StephenKing hasContent [pictures of his dog Molly aka Thing of Evil]

A triple is simply a relationship between a subject and an object communicated through a predicate:

SUBJECT——PREDICATE——OBJECT
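To make that concrete, here is a toy, hand-rolled sketch in Python (plain tuples, not real RDF tooling): a set of triples mirroring the Stephen King example above, with a helper that follows a predicate from one node to the next, which is exactly the graph traversal the prose describes. The identifiers are made up for illustration.

```python
# A tiny in-memory "triple store"; names mirror the example above.
triples = {
    ("person:1", "hasRealName", "Stephen King"),
    ("person:1", "hasTwitter", "@StephenKing"),
    ("@StephenKing", "hasContent", "pictures of Molly"),
}

def objects(subject, predicate):
    """All objects linked from `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# Traversal: follow hasTwitter, then hasContent, hopping node to node.
handle = next(iter(objects("person:1", "hasTwitter")))
content = objects(handle, "hasContent")
```

The object of one triple ("@StephenKing") is the subject of another, which is what makes the data a traversable web rather than a flat table.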

The data model that supports the exchange of data structured in this way (as a web of interlinked nodes connected through relationships expressed as triples) is the Resource Description Framework (RDF). RDF can be semantically structured through specifications that define what types of data are being modelled. For instance, RDF Schema (RDFS) is a data modelling vocabulary that can be used to define classes and possible relationships between classes. BIBFRAME is another vocabulary, being developed by the Library of Congress to represent library bibliographic metadata in RDF. Another example is EBUCore, a vocabulary designed by the European Broadcasting Union to support linked data in various stages of the life cycle of broadcasting material, including production, business, and archives. Vocabularies such as these are central to having every object, subject, and predicate defined and expressed as Uniform Resource Identifiers (URIs) rather than literal string values (strings that are not actionable through links), and they expand upon the types of things that can be described as linked data (at various levels of granularity).

This framework of linked data advances the principles proposed by Tim Berners-Lee as the foundation of linked data:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF).
  4. Include links to other URIs, so that they can discover more things.

The NET project

The feasibility report on which my colleagues at the Library of Congress and I are working will focus on records generated through the NET catalog project (where I spend the majority of my day cataloging). We catalog these records in our content management system, MAVIS. MAVIS outputs the data to MAVIS XML, a hierarchically structured format for representing metadata. We are looking at ways to transform MAVIS XML to PBCore (the XML schema in use by the AAPB) and then to RDF linked data. We are examining existing technologies, vocabularies, and workflows, and identifying other problems we need to solve. The results of this research will benefit not only the AAPB, but also other cultural heritage institutions and the public broadcasting community working to implement linked data. I am currently in the “literature review” stage of the linked data research. Look forward to future posts about our process!

This post was written by Chris Pierce, AAPB and NET Cataloger/Metadata Specialist.

A Day In the Life of NET

Hi there! We’re part of the National Educational Television (NET) collection at the Library of Congress’s National Audiovisual Conservation Center (NAVCC) – maybe you’ve heard of us? Recently, the Council on Library and Information Resources (CLIR) funded the AAPB to complete the NET Collection Catalog Project, whereby some nifty catalogers are working to create fabulous descriptions of programs distributed by NET (1952-1972, which makes up some of the earliest public television content!). People know so little about us because, up until now, we’ve been stored in unprocessed collections! So we’re looking to get makeovers, too. We are happy here, NAVCC has optimal storage facilities for us – we’re stored at a cool 50 degrees with 30% relative humidity – but we would like it if people could find us more easily.

To give you a better idea of just what processing a film title in the collection entails, we’re going to give you an inside look. The first part of our journey? Getting pulled from the stacks, of course! When we’re pulled, we make our way down from the shelves, onto an obliging cart, and are rolled out of the vaults. Yippee!

But because we like it chilly, we don’t appreciate temperature shock. So we get wheeled into the acclimatization room, where we can get adjusted to the new climes.

After gradually thawing out, we get picked up. Today’s the big day, we’re getting processed today!

Quick, time to make a break for it!!

We find our way here, to a work bench, where the magic happens.

All right, Mr. DeMille, I’m ready for my close-up. We get pulled, one by one from the cart. But you can find a lot of great metadata on us, so all that info gets written down first for input into our collection database system later.

Sometimes when you open us up, there’s a prize inside! No, not of the Cracker Jack variety – these prizes come in the form of broadcast histories and/or condition assessments. They get re-foldered and stored safely away, too, but hey, this is about us, the NET film!

We get placed up on the spindle, ready to wind! (Good thing Sleeping Beauty isn’t a film archivist, whew.)

We’re going to transfer from an old reel onto a slick, plastic “core.” The core (you can see cores stored in the boxes below the bench) is fixed inside the split reel on the right.

When we’ve been wound through, I end up on the right now, wrapped around a core.

How embarrassing! Look away!

Like a beautiful butterfly, now that we’ve been transformed, we shed the old reel and accompanying film can (that is, they are promptly disposed of).

Ouch!

I’m then rehoused into a – blue, blue, ‘lectric blue (that’s the color of my room) – plastic can.

And I’m taken over to a computer, to complete my cataloging in the collection database system MAVIS.

And now for my favorite part! I get labeled with a Library of Congress item barcode, new rack number, and a snazzy title label so people can find me again!

Now I’m all set! Ahhh 1331 – I’ve always liked the sound of a palindrome. Now I’m headed back to the vaults to get some well-deserved shut-eye. Later!

This post was written by Susie Booth, NET Cataloger at NAVCC, on behalf of the NET film.

17k for 2017

AAPB is kicking off the new year by adding a lot more content to our Online Reading Room. We now have more than 17,000 historic public broadcasting programs available for anyone in the United States to watch or listen to on our site!

Highlights from the newly available recordings include:


  • Episodes of WHUT’s Evening Exchange, including this episode on The Future of the Black Family (see left). Evening Exchange is a series featuring discussions with “writers, philosophers and newsmakers whose work offers insight into the black community.”

  • Episodes of the children’s radio series, Afield with Ranger Mac, which was broadcast on Wisconsin Public Radio as part of the Wisconsin School of the Air.
  • A speech by a United Mine Workers of America official recorded for the Appalshop documentary UMWA 1970: A House Divided.
  • Episodes of WFMU’s series Wasted Vinyl, including this interview with Joseph Shabalala, founder of Ladysmith Black Mambazo.
  • A locally-produced chronicle of the Modoc War (1872-1873) and Modoc leader Captain Jack, from Southern Oregon Public Television’s collection (see right).
  • Episodes of Iowa Press, including this one about Rural Poverty. Iowa Press is a news talk show, featuring an in-depth news report on one topic each episode, followed by a conversation between experts on the issue.

Overall, the new content in the ORR includes recordings from 23 different organizations across the country.

We are very excited to continue making more historic public media available again to the American public, helping to fulfill public media’s mission to enlighten, inspire, and educate its audiences.

Cataloging Old Wazzu: AAPB and Northwest Public Television

This post was written by Caitlin Sanders, student at Simmons College and intern at the AAPB.

As a longtime fan of public television (from “Arthur” to “Masterpiece”), and a current graduate student studying archives and library science at Simmons College, I feel fortunate to have spent this past fall as a cataloging intern for the American Archive of Public Broadcasting, at the beautiful WGBH studio headquarters. Despite this amazing and surreal experience, being in Boston does not mean I never get homesick for my native state of Washington. Therefore, I was extremely pleased with the project I was assigned: describing videos from Northwest Public Television (KWSU/KTNW), a public broadcasting station associated with Washington State University.

You don’t have to be a “coug” to appreciate archival programming from this station. Content varies from lecture series coverage, to dramatic re-enactments, to concerts and sports coverage. I might also add that its videos of moose (including the famous Morty) and of black bear cubs frolicking about the academic campus are among the most adorable animal videos that I have seen in quite some time.

In all seriousness, the significance of this collection should not be understated. Alone and together, these videos work to tell the story of Washington state’s past. Difficult subjects, such as the 1979 bombing of the Streit-Perham residence hall, are documented for posterity with forensic footage of the ruined building. Reflections on natural disasters, such as the environmental impact of the eruption of Mount St. Helens, can be referred to in the event of future catastrophe. Washington State University’s programming is also important in its attempt to include more voices in the presentation of history. Local interviews from the PBS series “Our Neighbors’ Stories” record the experiences of African Americans who worked at the Hanford site as part of the Manhattan Project. Similarly, “South by Northwest: Blacks in the Pacific Northwest” is an earnest attempt to accessibly dramatize the experiences of African Americans as they moved to the Pacific Northwest. Even if the 1976-1981 series occasionally shows its age, it nonetheless stands as a record of a perspective often untold in standard American history classes.

I am proud to announce that these videos are now all cataloged, many can be viewed in the AAPB Online Reading Room, and all of them are available on location at WGBH and the Library of Congress. I encourage you to check them out!

List of Early Public Television Content: An NET Project Update

We are excited to post lists of NET Series Titles and Individual Program Titles on the AAPB website, as part of the National Educational Television (NET) Collection Catalog Project, funded by the Council on Library and Information Resources (CLIR). To read more about the history and significance of NET, public television’s first national programming network, check out our September update.

To begin this project, we needed to determine what content should be part of the “NET Collection.” Since there is no single complete list of programs distributed by NET, we’ve been working very hard to cobble together the most comprehensive list possible. So far we’ve compared titles from:

  • NET’s Program files
  • NET’s Flexible Service Catalog
  • WGBH databases
  • Library of Congress original inventory printouts
  • Additional inventories created for and by the Library of Congress and PBS

From these sources, we’ve gathered additional metadata. Often we could identify broadcast years, producers, runtimes, original formats and color. We’ve included this information in our title lists, and we’re hoping it will help institutions identify any NET content held in their collections.

The next phase of the project is locating NET media assets at archives and public media organizations across the country. In the next few months we’ll add episode title information to the Series Title list and contact institutions with NET content, as well as all previous producing stations. As we locate relevant materials, we’ll build inventory level records and add them to our database. By the end of the project, people will be able to see where copies of the content exist, and we’ll be able to better prioritize digitization and preservation efforts. If you have NET materials in your collection, we’d really appreciate it if you reached out to us. You can contact Sadie Roosa at sadie_roosa@wgbh.org.

Cataloging the Earliest Public Television Content: An NET Project Update

In January 2015, we announced that WGBH and the Library of Congress, on behalf of the AAPB, were awarded a grant by the Council on Library and Information Resources (CLIR) to catalog the National Educational Television (NET) programs. We’re so excited to be working on this project to further the mission of the AAPB, to preserve and make accessible significant historic content created by public media and to coordinate a national effort to save it before it is lost to posterity.

NET was public television’s first national programming network, the precursor to PBS, and NET titles are among public media’s earliest and most at-risk content. The NET Collection includes 8,000–10,000 programs produced by more than 30 stations across the country from 1952 to 1972, a period marked by societal and cultural shifts of great importance. Public television itself changed significantly during this time. From its early dedication to childhood and adult education, by 1963 NET had transitioned to serving adult audiences with documentaries exploring urgent citizenship issues and cultural programming dedicated to the arts, humanities and sciences.

NET programs often covered internationally relevant topics and events, including new scientific research, the Vietnam War, the Civil Rights Movement, the treatment of prisoners in America, the Cuban Missile Crisis, the environment, various new approaches to human psychology, senior citizens, poverty, space exploration, and critical analysis of modern art.

In our previous digitization project, the earliest video formats that we digitized were 3/4-inch. The majority of the NET collection is on 16mm film, 2″ and 1″ videotape, and copies exist at multiple locations including the Library of Congress, Indiana University, WNET, WGBH and other stations that produced for NET. Before we can prioritize these materials for digitization and preservation, we need to know what the titles are, where they exist and who has the best copy.

We have been working hard on the first phase of the project, which includes developing a complete title list, or at least one that’s as complete as possible. We’ve gathered titles and other descriptive information from a variety of sources including:
  • WGBH databases
  • Library of Congress’s original inventory printouts
  • Microfiche of NET program records
  • NET’s Flexible Service Catalog
  • Additional inventories created at the Library of Congress and PBS

The majority of these were only available on paper, some even on handwritten lists. We were able to OCR a few sources, while others had to be manually transcribed. Once we had the titles from each source stored electronically, we were able to compare them with each other. The resulting list includes more than 500 series, with over 8,500 episodes, as well as over 800 individually broadcast programs. We’re working on getting the list ready to publish on the AAPB website, so that collection holders and NET-era producers will be able to see which titles NET distributed, and see if any of these titles exist in their own collections.
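The cross-source comparison described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the project’s actual tooling: the source names and titles below are invented, and the normalization rule is just one plausible way to keep minor transcription differences from creating false duplicates.

```python
import re

def normalize(title):
    """Lowercase and strip punctuation/extra whitespace so that small
    differences between transcribed sources don't split one title in two."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

# Invented example data: titles gathered from two of the sources.
sources = {
    "wgbh_db": ["At Issue", "Prospect of Mankind"],
    "loc_inventory": ["AT ISSUE", "Black Journal"],
}

# Map each normalized title to the set of sources that list it.
merged = {}
for source, titles in sources.items():
    for title in titles:
        merged.setdefault(normalize(title), set()).add(source)

print(len(merged))  # 3 distinct titles across both sources
```

Building the map this way also makes it easy to flag titles that appear in only one source, which are the ones most likely to need manual review.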

Starting with an authoritative title list is important because it will help us clear up potential duplication of titles and duplicated preservation efforts. One possible source of duplication is that some pieces ended up airing multiple times but under different series. In situations like these, we will have one record for the content and assign that record multiple series titles and NOLA codes, since the content itself was the same each time it was broadcast.
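The one-record-per-content model described above might look something like the following sketch. All identifiers here (the program title, series titles, and NOLA codes) are invented placeholders, not actual NET records.

```python
from dataclasses import dataclass, field

@dataclass
class ProgramRecord:
    """One record per piece of content, regardless of how many series
    it was broadcast under."""
    canonical_title: str
    series_titles: set = field(default_factory=set)
    nola_codes: set = field(default_factory=set)

record = ProgramRecord("Example Program")

# The same content aired under two different series, so both series
# titles and both NOLA codes attach to the single record.
record.series_titles.update({"Series A", "Series B"})
record.nola_codes.update({"EXPA", "EXPB"})

print(sorted(record.series_titles))  # ['Series A', 'Series B']
```

Keeping the titles and codes as sets on one record, rather than creating a record per broadcast, is what prevents the same program from being digitized twice under two different names.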

The authoritative title list also helps us keep track of what was and wasn’t distributed by NET. Now that we have this information, we’ve started going through the existing inventory records in the AAPB and pulling out records for NET titles. This is a good starting place for the ultimate goal of the project, which is to create a catalog of NET titles that describes the content and also tracks where copies of the content exist across the country. Based on a cursory analysis, we believe that over 60% of the NET titles exist in the inventories of at least one of the AAPB participating organizations. We’re hoping to increase that number by reaching out to other stations and archives with NET materials.
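The coverage estimate above boils down to one set operation: the fraction of NET titles matched by at least one participating organization’s inventory. This sketch uses invented numbers, not the project’s actual figures.

```python
# Invented example: 100 NET titles, two organizations' inventories
# with overlapping holdings.
net_titles = {f"title_{i}" for i in range(100)}
inventories = {
    "org_a": {f"title_{i}" for i in range(0, 40)},
    "org_b": {f"title_{i}" for i in range(30, 65)},
}

# A title counts as "located" if any organization holds it.
located = net_titles & set().union(*inventories.values())
coverage = len(located) / len(net_titles)
print(f"{coverage:.0%}")  # 65%
```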

While we work on getting together more information to share with you, if you have any questions please reach out to the NET project coordinator Sadie Roosa at sadie_roosa@wgbh.org.