Section One - Site Selection and Planning
Section Two - Workflow, Project and Time Management
Section Three - Standards and Practices: Digitization, Metadata and Rights
Section Four - Assessment and Outcomes
Institutional scanning for Culture in Transit was developed as a service for cultural heritage institutions. Activities included:
Onsite digitization & metadata creation for a small collection.
Collection hosting on DCMNY, METRO’s digital hosting platform.
Contribution of records for all digitized items to the Digital Public Library of America (DPLA) and Worldcat Digital Gateway.
Copies of the images provided to the local institution: the master files (TIFFs) for preservation, derivative versions (JP2s or JPEGs) and metadata.
The service was primarily aimed at METRO member institutions that had little or no experience with digitizing their collections. Priority was given to those faced with some combination of the following circumstances:
Had no digitization equipment or budget to purchase.
Did not have the staff expertise to undertake digitization work.
Had no budget to outsource digitization work.
Lacked staff time or necessary support to undertake digitization projects/planning.
What do we mean by ‘institutional scanning’?
We developed the term ‘institutional scanning’ to differentiate the work done at METRO member institutions under the Knight Foundation Culture in Transit grant from the community scanning strand of the project. Institutional scanning refers to working with small cultural heritage institutions in METRO’s membership. This ranged from small private collecting institutions to community colleges to public libraries.
Under the grant, we had a one-year activity period, with a goal of working with 10-15 institutions. The time-limited nature of the project saw us work with about one institution per month, which limited on-site visits to a maximum of two weeks. We typically digitized small collections that nonetheless encompassed a wide range of formats, including manuscripts, photos, lantern slides, glass plate negatives and oversize items such as posters. Section 4 of this toolkit covers in more detail the amount and types of material digitized in the given time frame.
Who is this toolkit for?
The institutional toolkit is aimed at others who are interested in creating a replicable small-scale digitization service. Use cases may include:
Library councils and consortia.
Individual institutions, such as an archive.
Academic institutions with multiple campuses or distributed locations.
Historical societies.
The content in the Institution Section of this toolkit was authored by Caroline Catchpole, Culture in Transit Mobile Digitization Specialist for METRO.
Our approach to selection for Culture in Transit differed from a traditional digitization project where the focus is typically centered on things like demand for use or fragility of the materials. For Culture in Transit we also needed to consider the coherence of the materials for purposes of presenting them online as a small digital collection, as well as the suitability of the project partner and the working environment they could provide.
Our approach to selecting project partners consisted of the following:
Selection criteria and guidance for METRO members
We developed a page on the METRO website to offer information about:
The Culture in Transit project.
The service we could offer.
What was expected of the project partner.
What we could and could not digitize.
Guidelines for collection selection.
Interest Form
We created an interest form for METRO members to express their interest in the mobile digitization service. We asked questions covering basic collection information including:
Description of materials.
Size of collection.
Extent to which collection was organized and processed.
Previous digitization work.
Confirmation that the institution was willing to contribute metadata to DPLA.
Initial Screening Interview
After evaluating the interest forms based on eligibility and other basic criteria, we arranged phone conversations with eligible project partners and used an internal form to gather more detailed information.
Topics covered in the initial screening interview were split into 3 main categories:
Collection Information
We needed to understand the size of the collection, the type of materials in the collection, what metadata existed, the extent of arrangement/processing and if there were any prohibitive rights issues that would keep the material from being placed online.
A key part of the project was contribution of the online records to DPLA. The initial screening interview was important in explaining this to potential partners and ensuring they understood the process involved, from securing a permission letter to make all metadata available under a Creative Commons CC0 license to aligning metadata and rights statements in accordance with DPLA guidelines. We worked with project partners to implement the appropriate rights statements, but it was the responsibility of the partner to ensure the collection put forward for digitization had no restrictive rights issues and could be displayed publicly online.
Previously Digitized Materials
Institution and Staffing Information
The initial screening interviews helped us make an informed decision on the fit of the institution for the mobile digitization service by giving us a clearer picture of:
The suitability of the collection.
The total amount of material.
Any conservation concerns.
To what extent metadata existed for the collection and what might need to be done to complete it on the part of institution and CIT staff.
Capability and willingness of staff to undertake necessary work to ensure successful participation in the project.
Site Visit
After the initial screening interviews, site visits were arranged with institutions that seemed viable as Culture in Transit participants.
Site visits allowed us to:
View the collection and better understand its size, quantity, and condition.
Assess the institution’s physical working space.
Answer any questions the institution had regarding the project.
Pre On-site Communication
Once an institution was selected as a project partner, we sent a confirmation email with approximate dates for onsite work. In the intervening period, final arrangements were made, including:
Logistics for the on-site work:
Confirmation of dates.
Arrival time on first day.
Expected length of time on-site.
Discussion regarding metadata for the project. Metadata guidelines and a blank metadata spreadsheet were shared with the institution if they were assisting in creation of metadata.
Permission letter signed by the institution for DPLA contribution was required before any on-site work could begin.
Project planning for the mobile digitization service involved both METRO’s project team and the project partner (the institution who received the digitization service).
METRO’s Project Team
We developed and executed the mobile digitization service with a small team of three that included:
Digitization Specialist: Responsible for selection of project partners, the majority of pre on-site, on-site and post on-site work (specific tasks set out below).
Metadata Specialist: Responsible for ingest of images and metadata into DCMNY.
Project Manager: Responsible for overall management of the project, oversight of Digitization and Metadata Specialists, guidance and final selection of project partners.
Workflow Management
Workflow documentation and tracking tools to monitor progress are a good way to ensure all involved in a project are kept informed of progress and of their responsibilities.
We created a visual aid of the end-to-end CIT workflow at METRO to help us understand and communicate the steps involved in the project.
We used a checklist for every project partner we worked with to track every step of our work with them (the checklist details METRO-centric tasks). It was kept in the team’s Google Drive folders, ensuring everyone had access to it.
Overall Project Management
Google Drive is an excellent tool for team collaboration and for sharing and editing project documentation. Google Drive allowed us an easier working relationship with some of our project partners. Some partners were involved in the creation of collection metadata and were able to edit the metadata spreadsheet while other work was being done.
We used the task management tool Asana at METRO to track and manage project progress. Asana gives you the ability to create a ‘team’, add colleagues to that team and then create ‘projects’ in your team area. We created a project for every partner we worked with and used Asana to:
Assign tasks to different team members for different project steps.
Ask questions or comment on specific tasks.
Set deadlines for tasks.
Store documentation in the project area.
A tool such as Asana is similar to the checklist we developed. Both were used simultaneously, but Asana offered an added layer of functionality; it supported group discussions and time management and allowed us to add deadlines to specific tasks and be notified of those deadlines via email.
Project Management - pre on-site tasks:
Pre-onsite planning should equip you with enough knowledge of a collection, particularly the extent of any metadata that already exists and the number of items in the collection, to ensure a smooth transition to digitization. Tasks before digitization commenced included:
Secure signed DPLA permission letter from partner institution.
Liaise with project partner on copyright status of collection.
Confirm name of collection.
Confirm unique identifier for collection.
Metadata preparation.
Set up collections on stage and production servers of DCMNY.
Project Management - on-site tasks:
On-site work at project partner locations was limited to two working weeks maximum. Tasks onsite included:
Assessment of item types in a collection to see which equipment was needed for digitization.
Creation of digital surrogates.
Basic metadata creation (identifier, title, description, size).
Preparation of small test-batch for ingest. This included images and metadata records for 5 items.
Quality assurance of scanned images (1 in every 10 images was tested).
Daily backup of images and metadata spreadsheet to external hard drive.
Daily statistics tracking - number of images created and number of items digitized.
Reconciliation of stats sheet to number of image files and items.
Statistics tracking was invaluable for monitoring progress during each project, either to ensure that digitization could be completed on time or, if not, to prioritize the remaining items for inclusion.
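The quality assurance step above (1 in every 10 images) can be sketched as a simple systematic sample. This is only an illustration: the source does not say how the images to test were chosen, so taking every tenth file from a sorted list is an assumption, and `qa_sample` is a hypothetical helper name.

```python
def qa_sample(filenames: list[str], every: int = 10) -> list[str]:
    """Return every `every`-th file from the sorted list, starting with the first.

    Assumption: a systematic sample is acceptable for quality assurance;
    a random sample would be an equally valid reading of "1 in every 10".
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    return sorted(filenames)[::every]


# Hypothetical file names for a day's scanning.
days_scans = [f"img_{number:03d}.tif" for number in range(1, 26)]
for name in qa_sample(days_scans):
    print("check:", name)
```

With 25 images in the day's batch, three files are selected for checking.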
Project Management - post on-site tasks:
Each collection needed to be finalized before it was published online. Post on-site work involved different members of the team at different stages, with the Digitization Specialist undertaking the bulk of the tasks:
Metadata completion and review.
Image conversion to derivatives.
Derivative image editing (crop out color chart and resize).
Image backup (to second external hard drive or cloud solution).
Final reconciliation between statistics, metadata spreadsheet, master images and derivative images.
Convert metadata into required format for DCMNY.
Ingest of metadata and images into DCMNY production server.
Quality review (metadata and image display).
Collect and finalize collection information from project partner, such as collection description, logos etc.
Add information about collection to DCMNY collections spreadsheet.
Publish the collection so it is live/available online on DCMNY production server.
Quality review of collection by collection owner.
Final sign off by collection owner after any needed corrections and updates.
Publicity (see below).
Collection harvested for contribution to DPLA.
Collection set up for sync with Worldcat Digital Gateway.
Send project partner copies of image files (master and derivatives) and metadata spreadsheet.
Publicizing Digitized Collections
It is important to build time into your workflow to publicize digitized collections once online. Ensuring that stakeholders and the public know the collections are accessible is just as important as the digitization itself.
For Culture in Transit, we publicized the collections at various points throughout the process of working with a partner institution:
Immediately after digitization work was completed, the Digitization Specialist wrote a post for the Culture in Transit blog, giving readers a glimpse into the collection, such as the subject matter, types of material and a few highlights from the collection.
Once the collection was published on DCMNY, it was publicized on Culture in Transit’s Twitter page with the link to the collection in DCMNY. The institution was notified so they could cross-promote via their channels, and a press release was issued on the METRO website (which was also then promoted via METRO and Culture in Transit Twitter pages).
Once the collection’s records went live in DPLA, we promoted via Twitter with the link to the collection in DPLA. We also notified the institution of DPLA inclusion so they could cross promote via their channels.
Some institutions took advantage of the opportunity to highlight the digitized material shortly after digitization. For example, one institution was able to use their materials in community engagement events & in promotional materials for a local anniversary.
Time Management
Due to the time-limited nature of our project, we often had to work simultaneously with multiple project partners at varying stages of the process. Generally, the timeline for working with partners followed this schedule:
1-4 months prior: Initial screening interviews.
1-3 months prior: Site visits.
2 weeks prior: Finalizing details with the project partner re. collection name, metadata and rights statement(s).
On-site: Digital surrogate and metadata creation, including sending test images and metadata to Metadata Specialist at METRO for review.
1-2 weeks post: Finalizing collection for ingest - metadata completion and derivative image editing (crop and resize).
2-6 weeks post: Ingest of metadata and images into DCMNY. Obtain collection description and logo from project partner.
6-10 weeks post: Quality review of collection, publication online, publicize.
We create master files and access files when digitizing, copies of which are provided to the project partner after completion of the on-site work.
A master file is a digital surrogate of an analog object. Scanned at high resolution, stored uncompressed, it should not be altered in any way. Access files are created from the master file. For our project, we used the following standards for the most common items we digitized. They form part of the DCMNY Digitization Requirements and Guidelines produced by METRO:
Specification | Master Files | Access Files |
---|---|---|
Document Type | Reflective | Reflective |
Bit Depth | 24-bit | 24-bit |
Color Space | Adobe RGB | Adobe RGB |
Resolution | 600ppi | 300ppi (image size 2000-2400 pixels on the longest side) |
File Type | TIFF | JPEG or JPEG2000 |
Project partners were given 300dpi access files for printing/reproduction purposes. Access files for web viewing are generally restricted to 72dpi.
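As a rough illustration of the access file sizing in the table above, the sketch below works out the pixel dimensions of a derivative whose longest side is capped at 2400. It assumes the 2000-2400 figure refers to pixels, and `access_dimensions` is a hypothetical helper, not part of the DCMNY guidelines. It only does the arithmetic; the actual resizing would be done in whatever image software is in use.

```python
def access_dimensions(width: int, height: int, longest_side: int = 2400) -> tuple[int, int]:
    """Scale (width, height) so the longest side equals `longest_side` pixels.

    The aspect ratio is preserved. Images whose longest side is already
    within the limit are returned unchanged rather than enlarged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    current_longest = max(width, height)
    if current_longest <= longest_side:
        return width, height
    scale = longest_side / current_longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# An 8 x 10 inch item scanned at 600ppi gives a 4800 x 6000 pixel master.
print(access_dimensions(4800, 6000))  # (1920, 2400)
```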
Image Referencing Tools:
Use a color target and/or ruler when digitizing. These tools ensure that the viewing environment for the master files can be adjusted to mimic the settings when the materials were digitized.
File Name Conventions:
Develop a standardized file naming convention that adheres to existing institutional standards. Information to consider building into the logic of your file naming convention includes:
Catalog number of item (if processed).
Abbreviated name of collection.
Date of digitization.
Sequential numbering of each item digitized.
Information about structure of object (such as recto, verso).
For Culture in Transit we developed a filename structure that included:
Repository identifier.
Collection Identifier.
Catalog/Unique ID.
Sequential Number.
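A filename built from the four components above might be assembled as in the sketch below. The underscore separator, three-digit zero padding, `.tif` extension and the example values are all assumptions made for illustration; the source lists the components but not how they were joined.

```python
def build_filename(repository: str, collection: str, catalog_id: str,
                   sequence: int, extension: str = "tif") -> str:
    """Join repository, collection, catalog ID and sequence number into a filename.

    Assumptions (not from the toolkit): underscore separators, a
    three-digit zero-padded sequence number and a lowercase extension.
    """
    if sequence < 1:
        raise ValueError("sequence numbers start at 1")
    parts = [repository, collection, catalog_id, f"{sequence:03d}"]
    for part in parts:
        # Allow only letters, digits and hyphens so the underscore stays unambiguous.
        if not part or not part.replace("-", "").isalnum():
            raise ValueError(f"invalid filename component: {part!r}")
    return "_".join(parts) + "." + extension


# Hypothetical identifiers, for illustration only.
print(build_filename("xyz", "posters", "2016-014", 2))  # xyz_posters_2016-014_002.tif
```

Keeping the separator out of the individual components means each filename can later be split back into its parts reliably.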
Metadata is an integral part of any digitization project and should follow a recognized metadata schema. Metadata was by far the most time-consuming part of the work with our project partners and is just as important in project planning as the digitization work.
As the collections we digitize go into Digital Culture of Metropolitan New York (DCMNY), we use the following guidelines:
The requirements for DCMNY include required and recommended metadata fields for completion. Our approach for Culture in Transit was to complete as many of the metadata fields as possible. As each collection we worked with was different, the metadata work with our project partners fell into 3 categories:
Full: No pre-existing metadata available; all metadata about the item needed to be created.
Partial: Descriptive metadata available; technical metadata about the item and subject terms still needed to be created.
Minimal: Descriptive and subject terms available; technical metadata about the item still needed to be created.
Description of item and subject terms were the most time-consuming areas of metadata to complete, but are also some of the most important, value-added areas that allow for greater accessibility and discoverability of the item once it’s online.
The table below shows the minimum elements from the DCMNY metadata spreadsheet completed by the METRO Digitization Specialist. Which elements were needed depended on what metadata already existed for the collection and what metadata was being completed by the project partner.
Full Metadata | Partial Metadata | Minimal Metadata |
---|---|---|
Collection title | Filename | Filename |
Filename | Identifier | Identifier |
Identifier | Subject terms | Note |
Item title | Note | Digital Format |
Subject terms | Genre | Digital Origin |
Dates | Extent | |
Item Description | Digital Format | |
Note | Digital Origin | |
Type of Resource | | |
Genre | | |
Extent | | |
Language | | |
Rights | | |
Owning Institution/Held By | | |
Digital Format | | |
Digital Origin | | |
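The table above can be turned into a simple completeness check for a metadata spreadsheet row. The sketch below is an assumption-laden illustration: it treats the element names in the table as column headings (the real spreadsheet headings may differ), and `missing_elements` is a hypothetical helper.

```python
# Elements the Digitization Specialist completed at each metadata level,
# transcribed from the table above.
ELEMENTS_BY_LEVEL = {
    "full": [
        "Collection title", "Filename", "Identifier", "Item title",
        "Subject terms", "Dates", "Item Description", "Note",
        "Type of Resource", "Genre", "Extent", "Language", "Rights",
        "Owning Institution/Held By", "Digital Format", "Digital Origin",
    ],
    "partial": [
        "Filename", "Identifier", "Subject terms", "Note", "Genre",
        "Extent", "Digital Format", "Digital Origin",
    ],
    "minimal": [
        "Filename", "Identifier", "Note", "Digital Format", "Digital Origin",
    ],
}


def missing_elements(level: str, record: dict[str, str]) -> list[str]:
    """List the elements for this level that are absent or blank in the record."""
    required = ELEMENTS_BY_LEVEL[level.lower()]
    return [name for name in required if not str(record.get(name, "")).strip()]


# Hypothetical spreadsheet row with two gaps.
row = {"Filename": "a_001.tif", "Identifier": "a", "Note": "", "Digital Format": "image/tiff"}
print(missing_elements("minimal", row))  # ['Note', 'Digital Origin']
```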
METRO’s DCMNY metadata is based on the MODS schema, whereas many of our project partners use Dublin Core, VRA Core or other local schemas. Some considerations for metadata creation include:
Mapping an existing metadata schema to the one in use by the project may be necessary.
It can speed up work if the project partner can assist with metadata creation.
Foreign language collections that have little or no metadata may require more time to complete because of translation requirements.
Some elements of metadata can be populated after onsite work, such as subject terms.
If the project partner is not involved in metadata creation, it is useful to have them review the metadata in case anything has been missed or left out.
If the project partner has created some or most of the metadata, it will need to be reviewed by project staff to ensure that metadata guidelines have been followed.
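Where a partner's records are in Dublin Core, the schema mapping mentioned above can be sketched as a lookup table. The crosswalk below is a simplified assumption, loosely following the Library of Congress Dublin Core to MODS mapping; it is not METRO's actual mapping, and several Dublin Core elements have more than one possible MODS target depending on the material.

```python
# Simplified, illustrative crosswalk from Dublin Core elements to MODS paths.
# Assumption: one target per element; a real mapping needs case-by-case choices
# (for example, dc:date may belong in dateCreated or dateIssued).
DC_TO_MODS = {
    "title": "titleInfo/title",
    "creator": "name/namePart",
    "subject": "subject/topic",
    "description": "abstract",
    "publisher": "originInfo/publisher",
    "date": "originInfo/dateCreated",
    "type": "typeOfResource",
    "format": "physicalDescription/internetMediaType",
    "identifier": "identifier",
    "language": "language/languageTerm",
    "rights": "accessCondition",
}


def map_record(dc_record: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Return (values keyed by MODS path, source fields with no mapping)."""
    mapped: dict[str, str] = {}
    unmapped: list[str] = []
    for field, value in dc_record.items():
        key = field.lower().removeprefix("dc:")
        target = DC_TO_MODS.get(key)
        if target is None:
            unmapped.append(field)  # flag for manual review rather than dropping it
        else:
            mapped[target] = value
    return mapped, unmapped


record = {"dc:title": "Poster", "dc:rights": "Public domain", "dc:coverage": "Bronx"}
print(map_record(record))
```

Reporting unmapped fields instead of silently discarding them gives project staff a list to review with the partner.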
Any digitization project requires copyright considerations and the application of appropriate statements of use for digitized material. All material digitized as part of Culture in Transit had to be made freely accessible online, so each collection had to meet one of the following conditions: the material was in the public domain, the institution controlled the rights to the materials, the institution had permission from the rightsholder, or the institution had assessed the risk and/or made the content available under fair use.
In addition, as all the content we digitized as part of Culture in Transit was contributed to the Digital Public Library of America, we attempted to align the rights statements in our metadata records with RightsStatements.org, DPLA’s set of standardized rights statements for online cultural heritage. However, RightsStatements.org did not launch until late in our project, so not all of the collections digitized by the project align with these rights statements as of this writing.
The process of digitization is labor and time intensive. Keeping detailed statistics can be useful for forward project and resource planning. Three types of statistics evolved in our work with project partners during Culture in Transit - detailed daily statistics, overall project statistics and collection statistics.
Detailed Daily Productivity Statistics
At the outset of the project, a system was devised and implemented to enable daily tracking of the work that would be undertaken with project partners. The type of information tracked was:
Date of work.
Number of scanned images.
Number of photographed images.
Number of master images produced.
Number of RAW camera files converted to TIFFs.
Number of derivatives edited.
Number of metadata records produced.
Number of simple objects (items made up of 1 image).
Number of compound objects (items made up of more than 1 image).
Notes.
Types of objects digitized.
Time taken to complete post-processing tasks.
There were a few core reasons we chose to do this:
Project reconciliation: Detailed tracking like this allowed us to tally the metadata spreadsheet against the image files, ensuring there was the correct number of records and files. There were a few instances of numbers not tallying, and having the daily statistics helped enormously in tracking down where the error had occurred and fixing it.
It demonstrated what was achievable in a set amount of time, with particular equipment. Copy stand work took longer due to the more involved setup and capture process, which was reflected in the project stats.
From understanding what was achievable in a set amount of time, it allowed us to set realistic expectations for projects going forward. In conversations with project partners during the latter part of Culture in Transit, we were able to set approximate numbers of items that could be digitized in the two-week period we had for on site work.
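The reconciliation between the metadata spreadsheet and the image files could be checked along the lines of the sketch below. It assumes each image filename ends in an underscore and a sequence number, so that dropping that suffix yields the item identifier used in the metadata; that convention, the example identifiers and the `reconcile` helper are all hypothetical.

```python
def reconcile(metadata_ids: list[str], image_files: list[str]) -> dict[str, list[str]]:
    """Compare item identifiers in the metadata with those implied by the image files."""

    def item_id(filename: str) -> str:
        stem = filename.rsplit(".", 1)[0]   # drop the file extension
        return stem.rsplit("_", 1)[0]       # drop the trailing sequence number

    ids_with_images = {item_id(name) for name in image_files}
    ids_in_metadata = set(metadata_ids)
    return {
        "records_without_images": sorted(ids_in_metadata - ids_with_images),
        "images_without_records": sorted(ids_with_images - ids_in_metadata),
    }


# Hypothetical data: one record has no images and one item has no record.
records = ["col_001", "col_002", "col_003"]
files = ["col_001_001.tif", "col_001_002.tif", "col_002_001.tif", "col_004_001.tif"]
print(reconcile(records, files))
```

Two empty lists mean the spreadsheet and the image files agree; anything else points to the items to investigate, which the daily statistics can then help locate by date.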
METRO’s Culture in Transit Productivity Statistics
Overall Project Statistics
From the detailed daily statistics, we were able to create a statistics sheet showing productivity throughout the whole project, working with 10 different project partners. The type of information included for each project was:
Days on site at project partner.
Days needed for digitization.
Available hours for digitization (average hours per day).
Kit used (scanner or copy stand).
Number of images digitized.
Number of unique items digitized.
Level of metadata creation.
Any post processing work done on site.
Average number of images digitized per day.
Types of material digitized.
The biggest challenge with digitization can be estimating how long a project will take. This broader set of statistics will hopefully be useful to others thinking of undertaking a digitization project: it shows how much was achievable within a given timeframe, given differences in the equipment used, the amount of metadata that needed to be completed and the types of material digitized.
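The arithmetic behind this kind of estimate is simple enough to sketch. The figures below are invented for illustration and are not Culture in Transit results; the 10-day limit assumes two five-day working weeks, and the function names are hypothetical.

```python
import math


def average_per_day(images_digitized: int, days_digitizing: int) -> float:
    """Average number of images digitized per day of digitization."""
    if days_digitizing <= 0:
        raise ValueError("days_digitizing must be positive")
    return images_digitized / days_digitizing


def estimated_days(images_planned: int, images_per_day: float) -> int:
    """Whole working days needed for a planned number of images, rounded up."""
    if images_per_day <= 0:
        raise ValueError("images_per_day must be positive")
    return math.ceil(images_planned / images_per_day)


# Made-up example: a past project produced 900 images in 6 days.
rate = average_per_day(900, 6)        # 150.0 images per day
needed = estimated_days(1000, rate)   # 7 days for a 1000-image collection
ON_SITE_LIMIT_DAYS = 10               # assumption: two five-day working weeks
print(rate, needed, needed <= ON_SITE_LIMIT_DAYS)
```

A rate taken from one project only transfers to another with similar equipment, material types and metadata needs, so it is best treated as a starting point for conversations with a partner.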
METRO’s Culture in Transit Project Statistics
CIT Collection Information
Once the project partners’ collections were published on DCMNY, we kept separate collection statistics that differed from the types of detail in the other project statistics. These statistics tracked:
If institution is planning to add more items to the collection.
Date collection went live in DCMNY.
Date collection went live in DPLA.
Number of objects in the collection.
Number of compound objects in the collection.
Number of single file objects in the collection.
Name space/sub-domain of collection.
Name space/ collection name.
Keeping this type of data was important primarily for our own internal uses but also for project reporting to our funders.
METRO’s Culture in Transit Collection Information
Project Partner Feedback
For assessment purposes, we developed two feedback forms that we asked project partners to complete:
Initial Feedback Form for Project Partners:
This feedback form was sent to the project partner after the on-site scanning portion of the project was completed. Through this form we gathered information on project communication, execution and experience of hosting the digitization specialist.
Final Feedback Form for Project Partners:
This feedback form was sent to the project partner 2-3 months after the collection was available online in DCMNY and records were harvested and live in DPLA. Through this form we assessed the extent to which the project partner promoted the digitized content internally and externally once online, if the digitized collection had been used in any way by the institution, and if the institution felt it had achieved its goals in participating in the project or had any additional feedback.
Tracking and Reporting
Upon request, METRO provides monthly or quarterly usage reports of the online collections in DCMNY to project partners.
If the digital hosting platform or DAMS can monitor traffic to a particular item or collection, that can be a good way to gauge the use and popularity of material. A generic web analytics tool, such as Google Analytics or Piwik, can be used in place of a built-in analytics tool.
Below is a list of all resources developed and used for institutional scanning. These resources are located throughout this section, but this list provides a quick-glance locator for them.
Questions for Culture in Transit Initial Screening Interview.
METRO’s Culture in Transit Initial Feedback Form for Project Partners.
METRO’s Culture in Transit Final Feedback Form for Project Partners.
Institution Scanning Blog Posts
We used our blog as a way to track progress and record our experiences throughout the project. Below is a list of posts we have written about our institution scanning work that people may find useful.
Wrapping It All Up: Institutional Scanning at Fordham University, July 2016
The Hall of Fame for Great Americans: Institutional Scanning at Bronx Community College, March 2016
Digitizing Key LGBT Material: Institutional Scanning at the LGBT Community Center, December 2015
Soaking Up Some Art and Culture: Institutional Scanning at Hostos Community College, November 2015
CIT Pilot for Host Institution Scanning: Wildlife Conservation Society Archives, July 2015