PBS NewsHour Digitization Project Update: “Asset Review” and Access and Description Workflows

I’ve previously written about developing and automating management of our workflows for the NewsHour project (click for link), and about WGBH’s processes for ingesting and preserving the NewsHour digitizations (click for link). Now that the project is moving along and over one thousand episodes of the NewsHour are already on the AAPB (with recently added transcript search functionality!), I thought I would share more information about our access workflows and how we make NewsHour recordings available.

In this post I will describe our “Asset Review” and “Online Workflow” phases. The “Asset Review” phase is where we determine what work we will need to do to a recording to make it available online, and the “Online Workflow” phase is where we extract metadata from a transcript, add the metadata to our repository, and make the recording available online.

The goals and realities of the NewsHour project necessitate an item-level content review of each recording. The reasons for this are distinct and compounding. Because of the scale of the collection (nearly 10,000 assets), the inventories from which we derived our metadata were generated only from legacy databases and tape labels, which are sometimes wrong. At no point were we able to confirm that the content on any tape was complete and correct prior to digitization; in fact, some tapes cannot even be played until they are prepared for digitization. Additionally, some episodes of the NewsHour contain third-party content that must be redacted before they can be made available. A major complication is that the transcripts only match the 7pm Eastern broadcasts, and 9pm or 11pm updates were sometimes recorded and broadcast when breaking news occurred. The tapes are not always marked with broadcast times, and sometimes do not contain the expected content – or even an episode of the NewsHour!

These complications would be fine if we were only preserving the collection, but our project goal is to make each recording and corresponding transcript or closed caption file broadly accessible. To accomplish that goal each record must have good metadata, and to have that we must review and describe each record! Luckily, some of the description, redaction, and our workflow tracking is automatable.

Access and Description Workflow Overview

As I’ve mentioned before, we coordinate and document all our NewsHour work in a large Google Sheet we call the “NewsHour Workflow workbook” (click here for link). The chart below explains how a GUID moves through sheets of the NewsHour workbook throughout our access and description work.

NewsHour_AccessWorkflowChart.png
AAPB NewsHour Access and Description workflow chart

After a digitized recording has been delivered to WGBH and preserved, it is automatically placed in the queue on the “Asset Review” sheet of our workbook. During the Asset Review, the reviewer answers a series of questions about the GUID (the full list appears later in this post). Using these responses, the Google Sheet automatically places the asset into the appropriate workflow trackers in our workbook. For instance, if a recording doesn’t have a transcript, it is placed in the “No Transcript tracker”, which has extra workflow steps for generating description and subject metadata. A GUID can have multiple issues that place it into multiple trackers simultaneously. For instance, a tape that is not an episode will also not have a transcript, and will be placed on both the “Not an Episode tracker” and the “No Transcript tracker”. The Asset Review is critical because the answers determine the work we must perform and ensure that each record will be correctly presented to the public when work on it is completed.

A GUID’s status in the various trackers is reflected in the “Master GUID Status sheet”, and is automatically updated when different criteria in the trackers are met and documented. When a GUID’s workflow tasks have been completely resolved in all the trackers, it appears as “Ready to go online” on the “Master GUID Status sheet.” The GUID is then automatically placed into the “AAPB Online Status tracker”, which presents the metadata necessary to put the GUID online and indicates whether tasks have been completed in the “Online Workflow tracker”. When all tasks are completed, the GUID is online and our work on it is finished.

In this post I am focusing on the workflow for digitizations that don’t have problems: the GUIDs are episodes, contain no technical errors, and have transcripts that match (green arrows in the chart). In future blog posts I’ll elaborate on our workflows for recordings that go into the other trackers (red arrows).

Asset Review

NewsHour_AssetReview
An image of a portion of our Asset Review spreadsheet

Each row of the “Asset Review sheet” represents one asset, or GUID. Columns A-G (green cell color) on the sheet are filled with descriptive and administrative metadata for each item. This metadata is auto-populated from other sheets in the workbook. Columns H-W (yellow cell color) are the reviewer’s working area, with questions to answer about each item reviewed. As mentioned earlier, the answers to the questions determine the actions that need to be taken before the recording is ready to go online, and place the GUID into the appropriate workflow trackers.

The answers to some questions on the sheet impact the need to answer others, and cells auto-populate with “N/A” when one answer precludes another. Almost all the answers require controlled values, and the cells will not accept input besides those values. If any of the cells are left blank (besides questions #14 and #15), the review will not register as completed on the “Master GUID Status Sheet”. I have automated and applied value control to as much of the data entry in the workbook as possible, because doing so helps mitigate human error. The controlled values also facilitate workbook automation, because we’ve programmed different actions to trigger when specific expected text strings appear in cells. For instance, the answer to “Is there a transcript for this video?” must be “Yes” or “No”, and those are the only inputs the cell will accept. A “No” answer places the GUID on the “No Transcript tracker”, and a “Yes” does not.
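For readers curious what that trigger logic looks like outside the spreadsheet, here is a minimal command-line sketch of the same idea. It assumes a CSV export of the Asset Review sheet with the GUID in column 1 and the transcript answer in column 10; those positions are hypothetical, and the real workbook does this with formulas rather than a script.

    # Illustration only: queue every GUID whose transcript answer is "No" for
    # the "No Transcript tracker". Column positions are placeholders.
    awk -F',' 'NR > 1 && $10 == "No" { print $1 }' asset_review_export.csv \
      > no_transcript_queue.txt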

To review an item, staff open the GUID on an access hard drive. We have multiple access drives that contain copies of the proxy files for all delivered NewsHour digitizations. Reviewers are expected to watch one and a half to three minutes of the beginning, middle, and end of a recording, and to check for errors while fast-forwarding through everything not watched. The questions reviewers answer are:

  1. Is this video a nightly broadcast episode?
  2. If an episode, is the recording complete?
  3. If incomplete, describe the incompleteness.
  4. Is the date we have recorded in the metadata correct?
  5. If not, what is the corrected date?
  6. Has the date been updated in our metadata repository, the Archival Management System?
  7. Is the audio and video as expected, based on the digitization vendor’s transfer notes?
  8. If not, what is wrong with the audio or video?
  9. Is there a transcript for this video?
  10. If yes, what is the transcript’s filename?
  11. Does the video content completely match the transcript?
  12. If no, in what ways and where doesn’t the transcript match?
  13. Does the closed caption file match completely (if one exists)?
  14. Should this video be part of a promotional exhibit?
  15. Any notes to project manager?
  16. Date the review is completed.
  17. Initials of the reviewer.

Our internal documentation has specific guidelines on how to answer each of these questions, but I will spare you those details! If you’re conducting quality control and description of media at your institution, these questions are probably familiar to you. After a bit of practice reviewers become adept at locating transcripts, reviewing content, and answering the questions. Each asset takes about ten minutes to review if the transcript matches, the content is the expected recording, and the digitization is error free. If any of those criteria are not true, the review will take longer. The review is laborious, but an essential step to make the records available.

Online Workflow

A large majority of recordings are immediately ready to go online following the asset review. These ready GUIDs are automatically placed into the “AAPB Online Status tracker,” where we track the workflow to generate metadata from the transcript and upload that and the recording to the AAPB.

About once a month I use the “AAPB Online Status tracker” to generate a list of GUIDs and corresponding transcripts and closed caption files that are ready to go online. To do this, I simply filter the “AAPB Online Status tracker” for GUIDs that have the workflow status “Incomplete” and copy the relevant data for those GUIDs out of the tracker and into a text file. I then import this list into “NH-DAVE,” a FileMaker tool our Systems Analyst built for the project.
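If I ever wanted to script that step instead, the filtering could be done against a CSV export of the tracker with something like the sketch below; the column positions and filenames are hypothetical.

    # Illustration only: keep the GUID, transcript filename, and caption
    # filename (columns 1-3 here) for every row whose status column (column 4
    # here) reads "Incomplete", and write them to a tab-separated list.
    awk -F',' '$4 == "Incomplete" { print $1 "\t" $2 "\t" $3 }' \
      online_status_export.csv > ready_for_nh-dave.txt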

NewsHour_NHDAVE.png
A screenshot of our FileMaker tool “NH-DAVE”

“NH-DAVE” is a relational database containing all of the metadata that was originally encoded within the NewsHour transcripts. The episode transcripts provided by NewsHour contained the names of individuals appearing in each episode and subject terms for that episode as marked-up values. Their subject terms were much more specific than ours, so we mapped them to the broader AAPB controlled vocabulary we use to facilitate search and discovery on our website. When I ingest a list of GUIDs and transcripts into “NH-DAVE” and click a few buttons, it uses an AppleScript to match metadata from the transcripts to the corresponding NewsHour metadata records in our Archival Management System and generate SQL statements. We use those statements to insert the contributor and subject metadata from the transcripts into the GUIDs’ AAPB metadata records in the Archival Management System.
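To give a sense of what that step produces, here is a minimal sketch of a script in the same spirit, reading matched contributor names and roles from a tab-separated file and emitting INSERT statements. The table and column names are hypothetical, not the actual Archival Management System schema, and the GUID is the example record linked at the end of this post.

    # Illustration only: emit SQL INSERT statements for one GUID's contributor
    # metadata. Table and column names are placeholders, not the real AMS schema.
    guid="cpb-aacip-507-0r9m32nr3f"
    while IFS=$'\t' read -r name role; do
      printf "INSERT INTO contributors (guid, name, role) VALUES ('%s', '%s', '%s');\n" \
        "$guid" "$name" "$role"
    done < matched_contributors.tsv > contributor_inserts.sql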

Once the transcript metadata has been ingested, we use a Bash script and a Ruby script to upload the proxy recordings to our streaming service, Sony Ci, and the transcripts and closed caption SRT files to our web platform, Amazon. We run a Bash script to generate another set of SQL statements to add the Sony Ci URLs and some preservation metadata (generated during the digital preservation phase) to our Archival Management System. We then export the GUIDs’ Archival Management System records into PBCore XML and ingest the XML into the AAPB’s website. As each step of this process is completed, we document it in the “Online Workflow tracker,” which will eventually register that work on the GUID is completed. When the PBCore ingest is completed and documented on the “Online Workflow tracker,” the recording and transcript are immediately accessible online and the record displays as complete on the “Master GUID Status spreadsheet”!
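As a rough sketch of the Amazon half of that upload step (assuming the transcripts and captions land in an S3 bucket, whose name here is a placeholder; the proxy upload to Sony Ci is handled by the Ruby script):

    # Illustration only: copy transcripts and SRT caption files to a
    # hypothetical S3 bucket with the AWS CLI.
    for f in transcripts_to_publish/*.txt captions_to_publish/*.srt; do
      aws s3 cp "$f" "s3://example-aapb-bucket/newshour/$(basename "$f")"
    done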

We consider a record that has an accurate full text transcript, contributor names, and subject terms to be sufficiently described for discovery functions on the AAPB. The transcript and terms will be fully indexed to facilitate searching and browsing. When a transcript matches, our descriptive process for NewsHour is fully automated. This is because we’re able to utilize the NewsHour’s legacy data. Without that data, the descriptive work required for this collection would be tremendous.

A large majority of NewsHour records follow the workflow I’ve described in this post in their journey to the AAPB. If, unlike those covered here, a record is not an episode, does not have a matching transcript, needs to be redacted, or has technical errors, then it requires more work than I have outlined. Look forward to blog posts about those records in the future! Click here to see a NewsHour record that went through this workflow. If you’re interested in our workflow, I encourage you to open the workbook and use “Find” to follow this GUID (“cpb-aacip-507-0r9m32nr3f”) through the various trackers. Click here to see all NewsHour records that have been put online!

Forty Years, Forty Films, Forty Weeks: The Medicine Game

Vision Maker Media’s “Forty Years, Forty Films, Forty Weeks” promotion concludes this week with our final featured Vision Maker Media film.

“The Medicine Game” follows the story of brothers from the Onondaga Nation who pursue their dreams of playing lacrosse for Syracuse University. With their dream nearly in reach, the boys are caught in a constant struggle to define their Native identity, live up to their family’s expectations, and balance challenges on and off the Reservation.

Screen Shot 2017-08-01 at 8.55.59 AM.png

Watch “The Medicine Game” on the American Archive of Public Broadcasting website.

Vision Maker Media would like to thank all the viewers who tuned in to stream 40 Years. 40 Films. 40 Weeks. In the last 40 years, the organization has created more than 500 films, awarded $11 million to independent producers, and held hundreds of film-screening events across the nation. While only a portion of that work could be shared over the last 40 weeks, Vision Maker Media hopes that these films have inspired viewers to look at the world through Indigenous eyes.

The AAPB has been proud to collaborate with Vision Maker Media to share these films and celebrate the amazing work done by Vision Maker Media over the past forty years.

About Vision Maker Media

Vision Maker Media is the premier source for quality American Indian and Alaska Native educational and home videos. All aspects of Vision Maker Media programs encourage the involvement of young people to learn more about careers in the media – to be the next generation of storytellers. Vision Maker Media envisions a world changed and healed by understanding Native stories and the public conversations they generate.

With funding from the Corporation for Public Broadcasting (CPB), Vision Maker Media’s Public Media Content Fund awards support to projects with a Native American theme and significant Native involvement that ultimately benefits the entire public media community. Vision Maker Media, a nonprofit 501(c)(3), empowers and engages Native People to tell stories. For more information, visit http://www.visionmakermedia.org

Forty Years, Forty Films, Forty Weeks: Sousa on the Rez

This week’s featured Vision Maker Media film focuses on Native American marching bands, their history, and their players’ relationships with American and Western music. The film contains interviews with musicians in Native American bands and features both contemporary and historical footage of Native players.

“Sousa on the Rez” examines the history of American marching music, the influence of boarding schools on Native American musicians, stereotypes, and the continuing legacy of Native American marching bands throughout the country.

cpb-aacip_508-3t9d50gh3n.jpg

Watch “Sousa on the Rez” on the American Archive of Public Broadcasting website.

Check back here every Tuesday, or follow us at @amarchivepub on Twitter to keep up with featured streaming films over the 40 weeks of the celebration. You can find the complete schedule here.

About Vision Maker Media

Vision Maker Media is the premier source for quality American Indian and Alaska Native educational and home videos. All aspects of Vision Maker Media programs encourage the involvement of young people to learn more about careers in the media – to be the next generation of storytellers. Vision Maker Media envisions a world changed and healed by understanding Native stories and the public conversations they generate.

With funding from the Corporation for Public Broadcasting (CPB), Vision Maker Media’s Public Media Content Fund awards support to projects with a Native American theme and significant Native involvement that ultimately benefits the entire public media community. Vision Maker Media, a nonprofit 501(c)(3), empowers and engages Native People to tell stories. For more information, visit www.visionmakermedia.org

Each week for the next forty weeks, a different film featuring Native voices from Native producers will be available to stream free online, in celebration of Vision Maker Media’s 40 years supporting American Indian and Alaska Native film projects.

Follow Vision Maker Media on Facebook, Twitter, YouTube, Instagram, Tumblr, LinkedIn, Vimeo, Pinterest, or Google+.

PBS NewsHour Digitization Project Update: Ingest and Digital Preservation Workflows

In our last blog post (click for link) on managing the PBS NewsHour Digitization Project, I briefly discussed WGBH’s digital preservation and ingest workflows. Though many of our procedures follow standard practices common to archival work, I thought it would be worthwhile to cover them in more depth for those who might be interested. We at WGBH are responsible for describing, providing access to, and digitally preserving the proxy files for all of our projects; the Library of Congress preserves the masters. In this post I cover how we preserve and prepare to provide access to the proxy files.

Before a file is digitized, we ingest the item-level tape inventory generated during the project planning stages into our Archival Management System (AMS – see link for the GitHub). The inventory is a CSV that we normalize to our standards, upload, and then map to PBCore in MINT, or “Metadata Interoperability Services,” an open-source, web-based plugin designed for metadata mapping and aggregation. The AMS ingests the data and creates new PBCore records, which are stored as individual elements in tables in the AMS. The AMS generates a unique ID (GUID) for each asset. We then export the metadata, provide it to the digitization vendor, and use the GUIDs to track records throughout the project workflow.

Screen Shot 2016-07-07 at 3.30.19 PM.png
Mapping a CSV to PBCore in MINT

For the NewsHour project, George Blood L.P. receives the inventory metadata and the physical tapes to digitize to our specifications. For every GUID, George Blood creates an MP4 proxy for access, a JPEG2000 MXF preservation master, sidecar MD5 checksums for both video files, and a QCTools XML report for the master. George Blood names each file after the corresponding GUID and organizes the files into an individual folder for each GUID. During the digitization process, they record digitization event metadata in PREMIS spreadsheets, which the AMS automatically harvests on a regular basis, inserting the metadata into the corresponding catalog records. With each delivery batch George Blood also provides MediaInfo XML saved in BagIt containers for every GUID, and a text inventory of the delivery’s assets and corresponding MD5 checksums. The MediaInfo bags are uploaded via FTP to the AMS, which harvests technical metadata from them and creates PBCore instantiation metadata records for the proxies and masters. WGBH receives the digitized files on LTO 6 tapes, and the Library of Congress receives theirs on rotating large-capacity external hard drives.

For those who are not familiar with the tools I just mentioned, I will briefly describe them. A checksum is a computer-generated cryptographic hash. There are different types of hashes, but we use MD5, as do many other archives. The computer analyzes a file with the MD5 algorithm and produces a 32-character code. If a file does not change, the MD5 value generated will always be the same. We use MD5s to ensure that files are not corrupted during copying and that they stay the same (“fixed”) over time. QCTools is an open-source program developed by the Bay Area Video Coalition and its collaborators. The program analyzes the content of a digitized asset, generates reports, and facilitates the inspection of videos. BagIt is a file packaging format developed by the Library of Congress and partners that facilitates the secure transfer of data. MediaInfo is a tool that reports technical metadata about media files; it’s used by many in the AV and archives communities. PREMIS is a metadata standard used to record data about an object’s digital preservation.
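For example, on a Mac you can generate an MD5 value for a file in Terminal; as long as the file does not change, the command will always return the same 32-character value (the filename here is illustrative).

    # Generate an MD5 checksum for one proxy file (macOS ships an `md5`
    # command; many Linux systems use `md5sum` instead).
    md5 cpb-aacip-507-0r9m32nr3f.mp4
    # Output looks like: MD5 (cpb-aacip-507-0r9m32nr3f.mp4) = <32-character hash>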

Now a digression about my inventories – sorry in advance. ¯\_(ツ)_/¯

I keep two active inventories of all digitized files received. One is an Excel spreadsheet “checksum inventory” in which I track whether any GUID that was supposed to be delivered was not received, and whether any GUID was delivered more than once. I also use it to confirm that the checksums George Blood gave us match the checksums we generate from the delivered files, and it serves as a backup for checksum storage and organization during the project. The inventory has a master sheet with info for every GUID, and each tape has an individual sheet with an inventory and checksums of its contents. I set up simple formulas that report any GUIDs or checksums that have issues. I could use scripts to automate the checksum validation process, but I like having the data visually organized for the NewsHour project. Given the relatively small volume of fixity checking I’m doing, this manual verification works fine for this project.
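If I did want to script the validation, a minimal sketch could be as simple as comparing a sorted copy of the vendor’s checksum manifest against one generated locally; this assumes both manifests are plain text in "checksum  filename" format.

    # Illustration only: any line printed by diff is a mismatched checksum, a
    # missing file, or an unexpected duplicate.
    sort -k2 vendor_manifest.md5 > vendor_sorted.md5
    sort -k2 local_manifest.md5  > local_sorted.md5
    diff vendor_sorted.md5 local_sorted.md5 && echo "All checksums match"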

Screen Shot 2017-04-10 at 2.37.28 PM.png
Excel “checksum inventory” sheet page for NewsHour LTO tape #27.

The other inventory is the Approval Tracker spreadsheet in our Google Sheets NewsHour Workflow workbook (click here for link). The Approval Tracker is used to manage reporting on each GUID’s ingest and digital preservation workflow status. I record in it when I have finished the digital preservation workflow on a batch, and I mark when the files have been approved by all project partners. Partners have two months from the date of delivery to report approvals to George Blood. Once the files are approved, they’re automatically placed on the Intern Review sheet for the arrangement and description phase of our workflow.

Screen Shot 2017-04-10 at 2.38.11 PM.png
The Approval Tracker in the NewsHour Workflow workbook.

Okay, forgive me for that; now back to WGBH’s ingest and digital preservation workflow for the NewsHour project!

The first thing I do when we receive a shipment from George Blood is the essential routine I learned the hard way while stocking a retail store – always make sure everything that you paid for is actually there! I do this for the physical LTO tapes, the files on the tapes, the PREMIS spreadsheet, the bags, and the delivery’s inventory. In Terminal I use a Bash script that checks a list of GUIDs against the files present on our server to ensure that all bags have been correctly uploaded to the AMS. If we’ve received everything expected, I then organize the data from the inventory, copying the submission checksums into each tape’s sheet in my Excel “checksum inventory”. Then I start working with the tapes.
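A minimal sketch in the spirit of that check, with a placeholder server path, looks something like this:

    # Illustration only: report any GUID from the delivery inventory that does
    # not have a matching bag directory on the server.
    while read -r guid; do
      [ -d "/server/ams_uploads/$guid" ] || echo "MISSING: $guid"
    done < expected_guids.txt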

Important background information: the AAPB staff at WGBH work in a Mac environment, so what I’m writing about works for Mac, but it could easily be adapted to other systems. The first step I take with the tapes is to check them for viruses. We use Sophos to do that in Terminal, with the sweep command. If no viruses are found, I use one of our three LTO workstations to copy the MP4 proxies, proxy checksums, and QCTools XML reports from the LTO to a hard drive. I do the copying in Terminal and leave it running while I go on to other work. When the tape is done copying, I use Terminal to confirm that the number of files copied matches the number of files I expected to copy. After that, I use it to run an MD5 report (with the find, -exec, and md5 commands) on the copied files on the hard drive. I put those checksums into my Excel sheet and confirm they match the sums provided by George Blood, that there are no duplicates, and that we received everything we expected. If all is well, I put the checksum report onto our department server and move on to examining the delivered files’ specifications.
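Strung together, a pass over one tape looks roughly like the sketch below; the volume names and paths are placeholders, and sweep is the Sophos command-line scanner mentioned above.

    # Illustration only: scan, copy, count, and checksum one delivered LTO tape.
    sweep /Volumes/NEWSHOUR_LTO_27                        # virus scan
    cp -R /Volumes/NEWSHOUR_LTO_27/proxies /Volumes/AccessDrive/batch27
    ls /Volumes/AccessDrive/batch27 | wc -l               # confirm the file count
    find /Volumes/AccessDrive/batch27 -type f -exec md5 {} \; > batch27_md5_report.txt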

I use MediaInfo and MDQC to confirm that the files we receive conform to our expectations. Again, this is something I could streamline with scripts if the workflow needed it, but MDQC gets the job done for the NewsHour project. MDQC is a free program from AVPreserve that checks a group of files against a reference file and passes or fails them according to rules you specify. I set the test to check that the delivered batch is encoded to our specifications (click here for those). If any files fail the test, I use MediaInfo in Terminal to examine why they failed. I record any failures at this stage, or earlier in the checksum stage, in an issue tracker spreadsheet the project partners share, and report the problems to the vendor so that they can deliver corrected files.

Screen Shot 2017-04-10 at 2.39.55 PM
MDQC’s simple and effective user interface.
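For instance, a quick way to see where a failed file deviates from the spec is to pull a few key fields with MediaInfo’s --Inform option; the fields and filename below are just one example.

    # Illustration only: report the video format, frame size, and bit rate of a
    # file that failed the MDQC test.
    mediainfo --Inform="Video;%Format%, %Width%x%Height%, %BitRate% bps" \
      cpb-aacip-507-0r9m32nr3f.mp4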

Next I copy the files from the hard drive onto other working hard drives for the interns to use during the review stage. I then skim a small sample of the files to confirm their content meets our expectations, comparing the digitizations to the transfer notes provided by George Blood in the PREMIS metadata. I review a few of the QCTools reports, looking at the video’s levels. I don’t spend much time doing that, though, because the Library of Congress reviews the levels and characteristics of every master file. If everything looks good I move on, because all the proxies will be reviewed at an item level by our interns during the next phase of the project’s workflow anyway.

The last steps are to mark the delivery batch’s digital preservation as complete and the files as approved in the Approval Tracker, create a WGBH catalog record for the LTO, run a final MD5 manifest of the LTO and hard drive, upload some preservation metadata (archival LTO name, file checksums, and the project’s internal identifying code) to the AMS, and place the LTO and drive in our vault. The interns then review and describe the records and, after that, the GUIDs move into our access workflow. Look forward to future blog posts about those phases!

PBS NewsHour Digitization Project Update: Workflow Management

NewsHour_Project_Logos
In January 2016, the Council on Library and Information Resources awarded WGBH, the Library of Congress, WETA, and NewsHour Productions, LLC a grant to digitize, preserve, and make publicly accessible on the AAPB website 32 years of NewsHour predecessor programs, from October 1975 to December 2007, that currently exist on obsolete analog formats. Described by co-creator Robert MacNeil as “a place where the news is allowed to breathe, where we can calmly, intelligently look at what has happened, what it means and why it is important,” the NewsHour has consistently provided a forum for newsmakers and experts in many fields to present their views at length in a format intended to achieve clarity and balance, rather than brevity and ratings. A Gallup Poll found the NewsHour America’s “most believed” program. We are honored to preserve this monumental series and include it in AAPB.

Today, we’re pleased to update you on our project progress, specifically regarding the new digitization project workflows that we have developed and implemented to achieve the goals of the project.

The physical work of digitizing the NewsHour tapes and ingesting the new files across the project collaborators has been moving forward since last fall and is now progressing steadily. Like many projects, ours started out as a great idea with many enthusiastic partners – and that’s good, because we needed some enthusiasm to help us sort out a practical workflow for simultaneously tracking, ingesting, quality checking, digitally preserving, describing, and making available at least 7512 unique programs!

In practice the workflow has become quite different from what the AAPB experienced with our initial project to digitize 40,000 hours of programming from more than 100 stations. With NewsHour, we started by examining the capabilities of each collaborator and what they already intended to do regarding ingestion and quality control on their files. That survey identified efficiencies: The Library of Congress (the Library) took the lead on ingesting preservation quality files and conducting item level quality control of the files. WGBH focused on ingestion of the proxies and communication with George Blood, the digitization vendor. The Library uses the Baton quality control software to individually pass or fail every file received. At WGBH, we use MDQC from AVPreserve to check that the proxy files we receive are encoded in accordance with our desired specifications. Both institutions use scripts to validate the MD5 file checksums the vendor provides us. If any errors are encountered, we share them in a Google Sheet and WGBH notifies the vendor. The vendor then rectifies the errors and submits a replacement file. Once approved, it is time for WGBH to make the files accessible on the AAPB website.

I imagined that making the files accessible would be a smooth routine – I would put the approved files online and everything would be great. What a nice thought that was! In truth, any one work (Global Unique Identifier or “GUID” – our unique work-level identifier) could have many factors that influence what actions need to be taken to prepare it to go online. When I started reviewing the files we were receiving, looking at transcripts, and trying to keep track of the data and where various GUIDs were in the workflow, I realized that the “some spreadsheets and my mind” system I intended to employ would result in too many GUIDs falling through the cracks, and would likely necessitate far too much duplicate work. I decided to identify the possible statuses of GUIDs in the NewsHour series and every action that would need to be taken to resolve each status. After I stared at a wall for probably too long, my coworkers found me with bloodshot eyes (JK?) and this map:

newshourworkflowwall
(It seems appropriate that the fire alarm is in this picture of the map)

Some of the statuses I identified are:

  • Tapes we do not want captured
  • Tapes that are not able to be captured
  • GUIDs where the digitization is not yet approved
  • GUIDs that don’t have transcripts
  • GUIDs that have transcripts, but they don’t match the content
  • GUIDs that are not a broadcast episode of the NewsHour
  • GUIDs that are incomplete recordings
  • GUIDs that need redacting
  • GUIDs that passed QC but should not have

Every status has multiple actions that need to be taken to resolve that issue and move the GUID toward being accessible. The statuses are not mutually exclusive, though some are contingent on or preclude others. It was immediately clear to me that this would be too much to track manually and that I needed a centralized, automated solution. The system would have to allow simultaneous users and would need to be low-cost and low-maintenance. After discussions with my colleagues, we decided that the best solution would be a Google Spreadsheet that everyone at the AAPB could share.

Here is a link to a copy of the NewsHour Workflow workbook we built. The workbook functions through a “Master List” with a row of metadata for every GUID, an “Intern Review” phase worksheet that automatically assigns statuses to GUIDs based on answers to questions, workflow “Tracker” sheets with resolutive actions for each status, and a “Master GUID Status Sheet” that automatically displays the status of every GUID and where each one is in the overall workflow. Some actions in trackers automatically place the GUID into another tracker – for instance, if a reviewer working in the “No Transcript Tracker” on an episode for which we don’t have a transcript identifies content that needs to be redacted, the GUID is automatically placed on the “Redaction Tracker”.

A broad description of our current project workflow is: All of the project’s GUIDs are on the “Master GUID List” and their presence on that list automatically puts them on the “Master GUID Status Sheet”. When we receive a GUID’s digitized file, staff put the GUID on the “Approval Tracker”. When a GUID passes both WGBH and the Library’s QC workflows it is marked approved on the “Approval Tracker” and automatically placed on the “Intern Review Sheet.” Interns review each GUID and answer questions about the content and transcript, and the answers to those questions automatically place the GUID into different status trackers. We then use the trackers to track the actions that resolve the GUIDs’ statuses. When a GUID’s issues in all the status trackers are resolved, it is marked as “READY!” to go online and placed in the “AAPB Online Tracker.” When we’ve updated the GUID’s metadata, put the file online, and recorded those actions in the “AAPB Online Tracker,” the GUID is automatically marked complete. Additionally, any statuses that indicate a GUID cannot go online (for instance, a tape was in fatal condition and unable to be captured) are marked as such in the “Master GUID Status Sheet.” This function helps us differentiate between GUIDs that will not be able to go online and GUIDs that are not yet online but should be when the project is complete.

Here is a picture of a portion of the “Master GUID Status Sheet.”

newshourworkflowstatus
Right now there are a lot of red GUIDs in this picture of the Master sheet, but in the coming months they will be switching to green!

The workbook functions through cross-sheet references and simple logic. It is built mostly with “IF,” “COUNTIF,” and “VLOOKUP” statements. Its functionality depends on users inputting the correct values in action cells and confirming that they’ve completed their work, but generally those values are locked in with data validation rules and sheet permissions. The workflow review I had conducted proved valuable because it provided the logic needed to construct the formulas and tracking sheets.
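Purely as an illustration of that logic outside the spreadsheet, the COUNTIF-style completeness check amounts to counting blank task cells for each GUID; against a hypothetical CSV export of a tracker it could be expressed like this.

    # Illustration only: the workbook does this with COUNTIF formulas. Here,
    # columns 2-5 stand in for a tracker's required task columns; a GUID with
    # no blanks among them is "READY!".
    awk -F',' 'NR > 1 {
      blanks = 0
      for (i = 2; i <= 5; i++) if ($i == "") blanks++
      print $1, (blanks == 0 ? "READY!" : "Incomplete")
    }' tracker_export.csv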

Building the workflow manager in Google Sheets took a few drafts. I tested the workflow with our first few NewsHour pilot digitizations, unleashed it on a few kind colleagues, and then improved it with their helpful feedback. I hope that the workbook will save us time figuring out what needs to happen to each GUID and will help prevent any GUIDs from falling through the cracks or incorrectly being put online. Truthfully, the workbook struggles under its own weight sometimes (at one point in my design I reached the 2,000,000-cell limit and had to delete all the extra cells spreadsheet programs automatically create). Anyone conducting a project larger or more complicated than the NewsHour would likely need to upgrade to true workflow management software or a program designed to work from the command line. I hope, if you’re interested, that you take some time to try out the copy of the NewsHour Workflow workbook! If you’d like more information, a link to our workflow documentation that further explains the workbook can be provided.

This post was written by Charles Hosale, WGBH.

Forty Years, Forty Films, Forty Weeks: Weaving Worlds

This week’s featured Vision Maker Media film, “Weaving Worlds,” explores the lives of Navajo artisans and their unique, and often controversial, relationship with Reservation traders.

In this compelling and intimate portrait of economic and cultural survival through art, Navajo filmmaker Bennie Klain takes viewers into the world of contemporary Navajo weavers and their struggles for self-sufficiency. The film artfully relates the Navajo concepts of kinship and reciprocity to the human and cultural connections to sheep, wool, water, and land.

screen-shot-2017-01-04-at-4-29-47-pm

Watch “Weaving Worlds” on the American Archive of Public Broadcasting website.

Check back here every Tuesday, or follow us at @amarchivepub on Twitter to keep up with featured streaming films over the 40 weeks of the celebration. You can find the complete schedule here.

About Vision Maker Media

Vision Maker Media is the premier source for quality American Indian and Alaska Native educational and home videos. All aspects of Vision Maker Media programs encourage the involvement of young people to learn more about careers in the media – to be the next generation of storytellers. Vision Maker Media envisions a world changed and healed by understanding Native stories and the public conversations they generate.

With funding from the Corporation for Public Broadcasting (CPB), Vision Maker Media’s Public Media Content Fund awards support to projects with a Native American theme and significant Native involvement that ultimately benefits the entire public media community. Vision Maker Media, a nonprofit 501(c)(3), empowers and engages Native People to tell stories. For more information, visit www.visionmakermedia.org

Each week for the next forty weeks, a different film featuring Native voices from Native producers will be available to stream free online, in celebration of Vision Maker Media’s 40 years supporting American Indian and Alaska Native film projects.

Follow Vision Maker Media on Facebook, Twitter, YouTube, Instagram, Tumblr, LinkedIn, Vimeo, Pinterest, or Google+.

Meet Charles Hosale, our new American Archive team member

charles_profile
Hi! I’m Charles Hosale, and I’m very glad to be joining the AAPB team as a Special Projects Assistant at WGBH. I come from Milwaukee, Wisconsin, where I worked as the AV Project Archivist at the University of Wisconsin-Milwaukee. I’ve also been a Contract Archivist for MillerCoors and Robert W. Baird & Co. I’m really excited to be able to contribute to AAPB because it is a project I’ve been enthusiastic about since it began!

At WGBH I will be working on a few different projects. On the NET Collection Catalog Project, I will be working to create robust catalog records and doing historical research into titles for which little information exists. For the NewsHour Digitization Project, I will be cataloging titles, processing transcripts, working with our vendors to ensure the quality of digitized episodes, and ingesting the files into our digital holdings. I’ll be handling ingestion and quality control for the American Masters Digital Archive Project too. I’ll also be adding some records besides NewsHour and American Masters to the AAPB.

I developed a passion for public media while growing up watching public programming, including the NewsHour and NOVA. I became an archivist so that I could interact with a different story every day and help keep those stories safe for the future. I’m grateful today to be able to work on stories that shaped me into the person I am! Even more than that, I’m thrilled to increase access to and promote the records so that they can have continued use. Public media programs remain as vital today as when they were produced! Television is an interesting artistic medium to archive because, as I’ve already seen while working on the NET Project, there are some incredible productions that might have been broadcast only once and then put on the shelves, forgotten without any method of continued access. I hope that our work leads to the rediscovery of some compelling stories.

If you’d like to know a bit more about me: I received an undergraduate degree in Comparative Literature and an MLIS with an Archives concentration from the University of Wisconsin-Milwaukee. When I’m not working I like to cook (I worked as a line cook during undergrad!) and listen to music. Having lived in Milwaukee my whole life, I’m looking forward to experiencing the East Coast and exploring Boston!