Fedora and Microservices

In this post, I want to discuss repository architecture philosophies. Although I will focus primarily on Fedora and California Digital Library microservices, there are some generalizations one can pull out of this. It would also be interesting to pull in some very different repository models, like iRODS or a triple-store-backed system, but that’s outside of my expertise.

The basics

This is not a section I really want to write, but I don’t know of a high-level answer to “when we say repository, this is what we mean”. I spent a little time looking around for a summary, but more often than not I found more questions (or, perhaps more useful yet inappropriate for my purposes, technology-based answers rather than use-driven), so I’ve taken a stab at addressing what I believe are some key issues:

Repositories are a collection of services, with well-defined interfaces, for storing and managing data (both content and metadata) in a format-neutral, display-independent manner. Repositories can be used as preservation repositories, as access repositories, as centralized aggregations of far-flung data, etc., and can operate on any scale for any audience. Furthermore, there are existing standards and agreements about what it means to be a certain type of repository (TDR, OAIS, etc.). All of these repositories, however, share some common services, whether implemented as software, external processes, or manual processes.

Some essential repository services are:

  • Identifier services, which may include assignment + registration
  • Storage services (although the content stored may be only pointers to the “actual” content)
  • Content identification, matching identifiers to content items
  • Ingest workflows
  • Access mechanisms

Without these services in place, a repository system would face some difficult obstacles in creating and providing value-added services. Repositories may provide multiple flavors of these services, some of which may be defined in generally accepted standards, models, and specifications.

Other basic services, which operate on top of the above services and are fairly common in most well-developed repository frameworks, include:

  • Dissemination services, to transform repository data into other forms + formats
  • Authorization services

More advanced services may include:

  • preservation services, including checksum (generation + verification), file format migration, support for models like LOCKSS
  • relationship services, using an RDF triplestore or similar, offering SPARQL endpoints, inferencing, etc
  • discovery services, using Lucene/Solr/etc, to provide relevancy, optimized user experience, drill-down faceting
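
As a rough illustration of the checksum generation + verification item above, here is a minimal sketch of a fixity check. The class and method names (Fixity, sha256Hex, verify) are made up for this example and do not come from Fedora or any other repository system; SHA-256 is an assumed choice of algorithm.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of any repository framework.
final class Fixity {

    // Generation: compute a lowercase hex SHA-256 digest of the content.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Verification: recompute the digest and compare it with the stored value.
    static boolean verify(byte[] content, String expectedHex) {
        return sha256Hex(content).equalsIgnoreCase(expectedHex);
    }
}
```

A preservation service would typically record the digest at ingest and re-run the verification on a schedule; both steps are independent of whichever repository application sits on top.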

These more advanced services are likely separate applications in the repository ecosystem and are generally useful utilities independent of any repository system. Repositories generally integrate with these external applications in a modular, mix-and-match manner using well-defined interfaces.

Fedora

One approach to repository services is the “repository-in-a-box” model, where you can install and configure a base set of services provided by a single application. Within this group of services, Fedora provides a very basic implementation of the core repository services (vs a full-stack application like DSpace, which provides production-ready user interfaces). Fedora bills itself as a Flexible, Extensible Digital Object Repository Architecture.

  • Identifier services, through PIDGen which provides sequential identifiers per-namespace
  • Maps HTTP URIs to dereferenceable URIs to files
  • REST + SOAP APIs for Ingest + Delivery
  • Dissemination services using WSDL
  • Authorization using XACML (and authentication using a number of plugins)
  • Integrates with the Mulgara triplestore and a Lucene index (by default)
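
To picture what “sequential identifiers per-namespace” means, here is a small sketch. It is not Fedora’s PIDGen code: the class name SequentialPidGenerator is hypothetical, the counters live only in memory, and the namespace:number shape is simply modelled on the form Fedora PIDs take.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; a real service would persist its counters.
final class SequentialPidGenerator {

    // Last number handed out for each namespace.
    private final Map<String, Long> lastIssued = new HashMap<>();

    // Returns the next identifier in the namespace, e.g. "demo:1", then "demo:2".
    synchronized String next(String namespace) {
        long n = lastIssued.merge(namespace, 1L, Long::sum);
        return namespace + ":" + n;
    }
}
```

Each namespace counts independently, so identifiers are predictable within a namespace but carry no meaning outside the repository that issued them.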

Fedora provides many opportunities for customization and enhancements through custom development:

As services go beyond the basic, common applications present in institutional repositories, enhanced repository services require custom development or supplemental services outside of the repository services. For most, this includes integration with a more advanced search provider (like Solr). At some point, additional services can blur the lines between the repository services and front-end user interfaces (which have to respond to local customization to meet user needs).

Repository-independent services, or third-party services, require some wrapper to make them interoperable with the Fedora APIs, which makes integration with existing technology more difficult. Even Duraspace’s Duracloud offering is (currently) built as separate services with some possibility of storage-level integration. Preservation support services will bypass the repository APIs and provide those services against the file system instead.

Considering the services Fedora doesn’t provide or the obstacles Fedora creates in integration, many ask why they should start using Fedora anyway. The strongest response to this, I believe, is that it provides a common structure to basic repository services, while at the same time not creating major obstacles to future expansion or migration outside Fedora. Out of the box, Fedora provides a set of “training wheels” (ht Mike Giarlo <http://lackoftalent.org/michael/blog/>) for repository services development that can be removed when unnecessary, but in the meantime offers structure for the creation of new repositories and support for repository services as needed.

CDL Microservices

Another approach to repository services is “microservices”, like those designed by the California Digital Library (CDL), which provide standards and specifications for individual repository services. Together these form a structure for standardized, mix-and-match repository services that can integrate, interoperate and take advantage of existing technology independent of a repository application like Fedora. This, conceivably, allows all domain developers to take advantage of these common projects without using a specific technology. CDL provides microservices specifications for:

  • identifier assignment + registration, using NOID, which can act as a CLI tool or a CGI service
  • file-system structures, using the Pairtree convention
  • data exchange and verification, using BagIt
  • access standards, using the ARK URL format
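
To make the BagIt item above slightly more concrete: a bag carries a manifest file (for example manifest-md5.txt) that lists one checksum and one payload path per line. The sketch below parses such a manifest into a map so another tool can verify the payload; the class name BagManifest is hypothetical and this is not CDL’s or the Library of Congress’s implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not taken from any BagIt library.
final class BagManifest {

    // Parses manifest text with one "<checksum> <path>" entry per line
    // and returns a map from payload path to its recorded checksum.
    static Map<String, String> parse(String manifestText) {
        Map<String, String> checksumsByPath = new LinkedHashMap<>();
        for (String line : manifestText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines
            }
            // Split on the first run of whitespace only, so paths may contain spaces.
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed manifest line: " + line);
            }
            checksumsByPath.put(parts[1], parts[0]);
        }
        return checksumsByPath;
    }
}
```

Because the manifest is just a text file next to the data, any tool that can read files can check a bag, with no repository API in between.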

The standards are developed in line with the “UNIX philosophy”:

Write programs that do one thing and do it well. Write programs to work together. — Doug McIlroy

These basic services can be organized and crafted using the existing capabilities in web servers, file systems, etc. More advanced services can act within this structure, using individual standards when needed. While significant development and customization may be required to get a microservices architecture to a usable state, the end result is more flexible and targeted to an institution’s needs.

Flexing Fedora

These two approaches are certainly not incompatible, and Fedora is quite capable of using some of these micro-services standards under the hood (replacing custom developed approaches to these basic services). By taking this approach, Fedora could act as a management application on top of generic repository data, allow both Fedora-based and microservices-based services to operate on the data, and make it easier to reach around Fedora when necessary (or, go so far as to remove it entirely).

What follows is a short summary of on-going work in this area, which mostly focuses on removing the Fedora-centric definitions of /how/ or /where/ services act. The majority of these ideas build on new developments and best practices (since Fedora was initially created) in the repository community as a result of increased adoption or awareness of issues. Where available, I’ve included links to projects in-the-works.

Some of this work is quite easy to do:

Other projects are more involved, and require more work than just creating new modules for Fedora:

More advanced microservices integration is highly involved and would require a major re-work of the application:

  • Two-way messaging queues (or file alteration monitors, or database update hooks) to allow Fedora to receive updates
  • decreased reliance on self-generated registries; I think the situation is getting better, but I’m not sure it’s fully there yet
  • pluggable storage modules with intelligent filtering, routing, multiplexing, and rules mechanisms — the Akubra project may be doing (part of?) this <http://www.fedora-commons.org/confluence/display/AKUBRA/Akubra+Project>
  • workflow support hooks, to allow integration and automation of workflow tools (possibly a result of Hydra?)

Posted in Repository.



PBCore 2.0: What I’d like to see

This is a short writeup of things I would like to see present in PBCore 2.0, which is currently in progress. It reflects my own personal opinions, etc.

One of the biggest challenges that PBCore 2.0 will face is determining how all-encompassing a standard it should be. Media organizations create a large variety of assets through diverse mechanisms for a wide range of purposes with any and all possible skill sets and technologies. Billed as the metadata standard for public broadcasting, it probably needs to respond to everyone’s needs and avoid requiring the impossible or limiting the foreseeable. It is for this reason I believe the most important thing PBCore 2.0 can do is provide a structure and framework for metadata without prescribing “the one true way”. To do this, PBCore 2.0 must be flexible, and more importantly, extendable if it is going to succeed.

These ideas probably fall outside “core” PBCore-compliance, but would enhance the descriptive power of the schema. All it would take are two considerations during the development of PBCore 2.0: a permissive data model and (more importantly) a system and place to document and describe standard extensions, best practices, and implementations.

One of the biggest strengths of PBCore 1.x, as I’ve written earlier, is the vast data dictionary that is the combination of a number of siloed applications full of current data. In PBCore 2.0, I truly hope due consideration is given to linked data and semantic ontologies to provide an easy way for an organization or individual to supplement a core vocabulary with a purpose-driven vocabulary for describing assets (the EBU’s P-META classification schemes have taken the first tentative step into this realm and are well worth a look). This could be done as simply as providing URL-based references to data dictionary values, e.g.:

...

RDF Schema
wikipedia.org

...

This system could be easily extended (in a standardized way) to provide data dictionary descriptions, relational information (sameAs, parentOf, etc) and more, while allowing some level of basic compliance that can ignore the extension.

Other extensions to the schema are probably more complex and would require the PBCore 2.0 schema to be permissive, rather than restrictive. One important (and I’d argue, essential) example of this is temporal + spatial media fragments, which could allow a system to describe, in some level of detail, fragments of an asset. This could be represented like:

...

RDF Schema


...

...

...

(obviously the semantics, describing multiple instantiations, and other issues would need to be worked out..)

I’d like to take this a step further and develop a systematic way of embedding other schemas (presumably designed for describing objects and ideas outside of the core focus of PBCore, such as people and entities, rights metadata, and provenance). By developing some best practices, this could be done in a discoverable and standard way, maybe something like:

...

    Chris Beer

   Chris Beer
   Male
   Mr
   
 
    Rabble-rouser

...

Tools that don’t understand FOAF should be encouraged to ignore these additions, but they provide a rich method of extending the schema in a decentralized and flexible manner.

Again, I’m not calling for the inclusion of advanced (and likely, complicated) features into core PBCore compliance, just hoping that in developing a standard for the future, it remains flexible and extendable to meet the needs of all users while being accessible to all.

Posted in Uncategorized.



Open source happenings

Just some quick notes:

  1. I got a patch into FITS to add some basic video metadata extraction. I’d like to take it further to ensure support for the formats that exiftool supports, but it’s a good start.
  2. Today I pushed out a first release of ave-sync, a media/xml synchronization tool. Also a good start, and should be a starting place to play with the w3c FileAPI in Firefox 3.6.
  3. XForms applications are painful to write, but probably a good choice for XML-based workflows.. more on that later..

Posted in Code.


A Fedora in a Pairtree

The California Digital Library (CDL) has released a number of exciting micro-services specifications for digital libraries. The Fedora repository from DuraSpace takes the opposite approach: it is a monolithic application composed of a number of modules. Because of that modular design, it should be possible to slip micro-services under the hood of Fedora easily.

Here is a first attempt at implementing the Pairtree filesystem hierarchy for Fedora:

package fedora.server.storage.lowlevel;

import java.io.File;
import java.util.Map;

import fedora.server.errors.LowlevelStorageException;

/**
 * @author Chris Beer
 */
class PairtreePathAlgorithm
        extends PathAlgorithm {

    private final String storeBase;

    private static final String SEP = File.separator;

    public PairtreePathAlgorithm(Map configuration) {
        super(configuration);
        storeBase = (String) configuration.get("storeBase");
    }

    @Override
    public final String get(String pid) throws LowlevelStorageException {
        return format(pid);
    }

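    public String format(String pid) throws LowlevelStorageException {
        // toPairtree() returns a path that starts and ends with SEP.
        String pt = toPairtree(pid);
        return storeBase + pt + "obj" + SEP + pid;
    }

    private String toPairtree(String s) {
        StringBuilder pt = new StringBuilder(SEP);
        String src = escape(s);

        // Split the escaped identifier into two-character directory names.
        // An odd-length identifier leaves a final one-character directory;
        // Math.min() keeps substring bounds inside the string in that case.
        for (int i = 0; i < src.length(); i += 2) {
            pt.append(src, i, Math.min(i + 2, src.length())).append(SEP);
        }

        return pt.toString();
    }

    private String escape(String s) {
        /*
         Fedora PIDs do not support non-visible ASCII or the characters below,
         so we skip hex encoding:
         "   hex 22           <   hex 3c           ?   hex 3f
         *   hex 2a           =   hex 3d           ^   hex 5e
         +   hex 2b           >   hex 3e           |   hex 7c
         ,   hex 2c

         What remains are Pairtree's single-character substitutions:
         /  becomes  =        :  becomes  +        .  becomes  ,
         */
        return s.replace("/", "=").replace(":", "+").replace(".", ",");
    }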
}

See also: http://gist.github.com/280020

This basic service replaces the Timestamp Path algorithm for FOXML storage and creates a minimally compliant Pairtree. A better implementation could add:

  • Splitting Fedora datastreams into individual files on the filesystem. A first step would be to implement an appropriate managed content mapper
  • Add the appropriate identifier cleaning specified in §3. Much of this was omitted in this implementation, with the assumption that the repository core would handle identifier validation
  • The implementation should support pairtree initialization (§4). The current assumption is the repository maintainer would pre-establish a pairtree hierarchy for Fedora to populate. To do this properly, I think one would need to override the DefaultLowlevelStorageModule to add an initialization step.
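
For the identifier cleaning mentioned in the list above, here is a standalone sketch that applies both cleaning steps from the Pairtree specification (hex-encoding the reserved characters, then the single-character substitutions) before splitting the result into two-character directories. It is independent of Fedora; the class name PairtreeSketch is hypothetical, and "/" is hard-coded as the separator for readability.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical standalone illustration of Pairtree identifier mapping.
final class PairtreeSketch {

    // Visible ASCII characters that Pairtree hex-encodes as ^hh.
    private static final String HEX_ENCODED = "\"*+,<=>?\\^|";

    // Cleans an identifier: hex-encode reserved and non-visible bytes,
    // then map '/' to '=', ':' to '+', and '.' to ','.
    static String clean(String id) {
        StringBuilder out = new StringBuilder();
        for (byte raw : id.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xff;
            if (b < 0x21 || b > 0x7e || HEX_ENCODED.indexOf(b) >= 0) {
                out.append('^').append(String.format("%02x", b));
            } else if (b == '/') {
                out.append('=');
            } else if (b == ':') {
                out.append('+');
            } else if (b == '.') {
                out.append(',');
            } else {
                out.append((char) b);
            }
        }
        return out.toString();
    }

    // Splits the cleaned identifier into two-character directory names;
    // an odd-length identifier ends in a one-character directory.
    static String toPath(String id) {
        String cleaned = clean(id);
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < cleaned.length(); i += 2) {
            path.append(cleaned, i, Math.min(i + 2, cleaned.length())).append('/');
        }
        return path.toString();
    }
}
```

With this sketch, the identifier ark:/13030/xt12t3 cleans to ark+=13030=xt12t3 and maps to ar/k+/=1/30/30/=x/t1/2t/3/, which matches the example given in the Pairtree specification.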

Posted in Experiments, Repository.



jQuery and SVG (and inline SVG)

If you’re using Keith Wood’s great jQuery SVG plugin, you may find that the .css() function doesn’t work on SVG elements, as in:

var elem = $('#test');
if (elem.css('display') == 'none') {
    elem.css('display', '');
}
This generates an error when CSS properties are written, but not when they are read. To address this, add this code to jquery.svgdom.js:
/* Support CSS on SVG nodes. */
var origCSS = $.fn.css;
$.fn.css = function(name, value, type) {
	if (typeof name === 'string' && value === undefined) {
		var val = origCSS.apply(this, [name, value, type]);
		return (val && val.baseVal ? val.baseVal.valueAsString : val);
	}
	var options = name;
	if (typeof name === 'string') {
		options = {};
		options[name] = value;
	}
	return this.each(function() {
		if (isSVGElem(this)) {
			for (var n in options) {
				this.style[n] = (typeof options[n] == 'function' ? options[n]() : options[n]);
			}
		}
		else {
			origCSS.apply($(this), [name, value, type]);
		}
	});
}

I make no guarantees that it works on all platforms or browsers, but I mimicked the way Keith implemented .attr() for SVG elements, using the style attribute instead, so it hopefully has similar levels of portability. So far it works for me in Firefox 3.5 and Chrome 3.0. I'm going to guess that it works in Safari, because old code that used the style attribute worked there as well. No idea about IE, because my SVG doesn't load in that to begin with...

In addition to this, I needed to use the plugin to modify existing inline SVG, which seemed daunting given that the plugin normally created its own SVG canvas to render on. However, by hacking together a few calls to some of the internal functions I was able to get a jQuery SVG Wrapper with which I could call methods such as circle(), etc.:

var theDiv = $("#svg-container-div")[0];
$.svg._afterLoad(theDiv, $("#svg-root-element"), {});
var svgRoot = $.svg._getSVG($("#svg-container-div"));
var svgDoc = $("#svg-root-element");
svgRoot._svg = svgDoc[0];
svgRoot.circle(svgDoc[0], 270, 150, 25, {'id' : 'testcircle', 'fill' : "#ffffff", 'stroke' : '#ff0000'});

I'm not sure that this method of doing things is entirely acceptable as far as using the plugin correctly, but for me it successfully modified the DOM and I was able to reference the created elements without incident afterwards. At this point there was one last thing that I wanted to change about the plugin: when creating an SVG element I needed to specify a DOM element, not a jQuery wrapper. Usually this just means that the jQuery wrapper must be de-referenced to get the node, i.e. svgDoc[0] above, but I would get annoyed having to remember to add the array de-reference, so I modified jquery.svg.js again, this time changing the definition of _args, which handles argument decoding for all of the svg functions:

_args: function(values, names, optSettings) {
		names.splice(0, 0, 'parent');
		names.splice(names.length, 0, 'settings');
		var args = {};
		var offset = 0;
		var vOffset = 0;
		if (values[0] != null && (typeof values[0] != 'object' || !values[0].nodeName)) {
			if (!(values[0].jquery && values[0][0].nodeName)) {
				args['parent'] = null;
				offset = 1;
				vOffset = 0;
			}
			else {
				args['parent'] = values[0][0];
				offset = 1;
				vOffset = 1;
			}
		}
		for (var i = 0; i < values.length; i++) {
			args[names[i + offset]] = values[i+vOffset];
		}
		if (optSettings) {
			$.each(optSettings, function(i, value) {
				if (typeof args[value] == 'object') {
					args.settings = args[value];
					args[value] = null;
				}
			});
		}
		return args;
	},

This should provide the same behavior but allow the results of $(selector) to be supplied directly as the first argument to any SVG drawing methods (circle, rect, etc.), but only the first of these will be painted to, which is most useful when selecting an id, not a class. In addition the previous method of directly supplying an SVG DOM element will still work, as will falling back to using the default internal reference that is established when the SVG canvas is created.

Posted in Code.



Compiling mod_h264_streaming for lighttpd

In compiling the mod_h264_streaming module for lighttpd on Mac OS 10.4 (Tiger), I hit a few snags following these directions.

From the directions, the first half went smoothly:

wget http://download.lighttpd.net/lighttpd/releases-1.4.x/lighttpd-1.4.25.tar.gz
tar -xvzf lighttpd-1.4.25.tar.gz
cd lighttpd-1.4.25
wget http://h264.code-shop.com/download/lighttpd-1.4.18_mod_h264_streaming-2.2.0.tar.gz
tar -zxvf lighttpd-1.4.18_mod_h264_streaming-2.2.0.tar.gz
cp lighttpd-1.4.18/src/moov.* src
cp lighttpd-1.4.18/src/mod_h264* src

Then add these lines to src/Makefile.am around line 266:
lib_LTLIBRARIES += mod_h264_streaming.la
mod_h264_streaming_la_SOURCES = mod_h264_streaming.c moov.c
mod_h264_streaming_la_LDFLAGS = -module -export-dynamic -avoid-version -no-undefined
mod_h264_streaming_la_LIBADD = $(common_libadd)

./autogen.sh
./configure --prefix=/opt/local

make

At this point, I ran into an error with the plugin:

[...]
moov.c:77:22: error: byteswap.h: No such file or directory
moov.c: In function 'byteswap16':
moov.c:96: warning: implicit declaration of function 'bswap_16'
moov.c: In function 'byteswap32':
moov.c:105: warning: implicit declaration of function 'bswap_32'
moov.c: In function 'esds_read':
moov.c:1941: warning: unused parameter 'size'
[...]

Looking around, it seems to be an incompatibility between Mac OS X and Linux: byteswap.h is a glibc header, and OS X does not ship it. I found a patch for a similar problem in navit, which includes the right header and defines matching function aliases.

Here’s my diff:

--- lighttpd-1.4.18/src/moov.c  2009-06-27 03:58:50.000000000 -0400
+++ src/moov.c  2009-12-07 16:49:50.000000000 -0500
@@ -73,11 +73,12 @@
 #define DIR_SEPARATOR '\\'
 #endif

-#ifndef WIN32
-#include <byteswap.h>
+#include <libkern/OSByteOrder.h>

+#define bswap_16 OSSwapInt16
+#define bswap_32 OSSwapInt32
+#define bswap_64 OSSwapInt64
 #include 
 #define DIR_SEPARATOR '/'
-#endif

 uint64_t atoi64(const char* val)
 {

After that, you can just continue merrily on..

make
make install

The directions for testing the plugin are a little buried on the maintainer’s site, but once I found them, everything seemed in order.

Posted in Uncategorized.





Teaching PBCore, Questions and Notes

The questions below are loosely based on those raised by participants in the introduction to XML workshop presented at the Association of Moving Image Archivists 2009 conference in St. Louis, MO on 3 November.

In general, tangible examples are crucial to the teaching and understanding of PBCore. At present, the PBCore examples are haphazard and follow little logical progression. An improvement in this area would be beneficial to the adoption of PBCore. In addition, tools should be created to support new PBCore-based applications, which would make it easier to distinguish between well-formed XML, valid PBCore, and PBCore that conforms to a community of practice.

- Where are the XML attributes?
After an introduction to XML, which taught the participants about the basic building blocks of XML (elements, entities, and attributes), the lack of attributes in PBCore was confusing. Rather than:

<title type="Program">Jimmy Carter</title>

PBCore requires:

<pbcoreTitle>
<title>Jimmy Carter</title>
<titleType>Program</titleType>
</pbcoreTitle>

As a developer, the additional mechanics to parse each type, each authority, or each role are annoying copy+paste jobs, but it is clear that even those new to XML develop the same expectations. With some of the recent developments from DCMI to make Dublin Core more relevant to the changing metadata landscape, it seems like PBCore has failed to evolve.
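
To show the difference in parsing effort, here is a small sketch using the JDK’s built-in DOM parser that reads the title and its type from each of the two forms above. The class and method names (PbcoreTitleReader, readElementStyle, readAttributeStyle) are hypothetical; only the element and attribute names come from the examples.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; returns {title, titleType}.
final class PbcoreTitleReader {

    // PBCore 1.x style: the type lives in a sibling child element.
    static String[] readElementStyle(String xml) {
        Element root = parse(xml);
        return new String[] {childText(root, "title"), childText(root, "titleType")};
    }

    // Attribute style: the type is an attribute on the title element itself.
    static String[] readAttributeStyle(String xml) {
        Element root = parse(xml);
        return new String[] {root.getTextContent().trim(), root.getAttribute("type")};
    }

    private static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Text of the first descendant element with the given name, or null if absent.
    private static String childText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

The element style needs a lookup helper for every value/type pair (title and titleType, and likewise for the other paired elements), while the attribute style reads both pieces from a single node.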

The reason, as best I can determine, is that the PBCore 1.x schema was based on existing XML exports from a relational database, where that convention was born out of the need for a semantically agnostic schema rather than out of deliberate schema design.

- What is PBCore’s relation to Dublin Core?
PBCore is introduced as being a derivative or extension of Dublin Core, but for some shared element names, there is no obvious relationship. This should either be clarified in future development or dropped.

- What is the difference between the formatPhysical, formatMediaType, and formatGenerations?
These three instantiation-level metadata elements all describe similar problems slightly differently:

* formatPhysical (or formatDigital, perhaps) describes the carrier format, which may be independent of the content on the carrier
* formatMediaType describes the content present on the carrier
* formatGenerations describes the type of content on the carrier

The PBCore value lists could be clarified to remove some of the current (seemingly) redundant information.

- Why are formatPhysical and formatDigital formatted differently? Or, why wouldn’t one use multiple instantiations to express the different formats for which an item is available?

The value list for formatDigital is based on the IANA MIME type registry, while the formatPhysical list is the aggregate of the source elements, which is reflected in the inconsistency of formatting. Could the formatPhysical list become more cohesive and resemble MIME types?

The relation between current instantiations is, at best, unclear and not systematic. The biggest flaw in the current approach is that it is difficult to express the provenance of an instantiation and its relation to the intellectual work. The current situation also breaks the 1:1 correspondence between an instantiation and a carrier/file/etc. Some major restructuring, possibly breaking backwards-compatibility, is necessary to correct these issues. In the meantime, I would recommend creating a new instantiation for each instance and using the pbcoreAnnotation field to supply basic provenance information.

- The PBCore outline graph is confusing.
As is, the outline graph mixes XML elements with conceptual groupings which makes it confusing to someone new to XML or to PBCore. The graphic could be easily revised to use shaded groups to communicate the content classes, rather than tree nodes.

- The PBCore metadata dictionary picklists provide no definitions or best practices
The metadata dictionary, which may be the most important part of PBCore 1.x, is marginalized on the website. The picklists are offered only as lists and fail to provide appropriate definitions for titleType, descriptionType, etc. Without this guidance, each implementor is forced to make determinations without respect to a community of practice. Taking descriptionType as an example, guidance is needed to describe when to use the format-specific types (program, series, etc) vs the generic type labels (abstract, summary).

- The PBCore website conflates schema rules with best practices
The PBCore website presents its recommended best practices and usage guidelines closely interleaved with the schema requirements. This placement is confusing: the best practices are important, even essential, resources, but mixing them with the schema rules makes PBCore harder to understand.

- A schema-validating XML editor complains when the XML document lacks recommended or optional fields
In particular, oXygen indicates to the user that fields like pbcoreGenre are REQUIRED for conformance to PBCore, while the website leads one to believe this is not the case. In fact, this should not be the case because genre is very specific to broadcasting/traffic needs and will likely be missing in general usage.

This leads me to believe that PBCore should examine the approach the TEI community took with regard to modularization. Proper modularization would provide implementors with a relevant set of metadata elements necessary for use, and perhaps make it easier to integrate PBCore with other metadata schemas (for example, a rights schema or technical metadata standard), leaving PBCore responsible for description and rules for aggregation.

- How do you exchange records? Or, how can I put multiple description documents in the same file?

A PBCoreDescriptionDocument, according to the PBCore schema, should have only one document per file, which is common XML practice, but unknown to those new to XML. Participants were attracted to aggregations as a way to deliver contextually complete documents containing metadata records for relations, etc. Other standards (say, Atom or OAI-PMH) handle aggregation independently of the description format, which is probably a sounder approach.

- Extensions are hard, confusing.
Yep.

Posted in Uncategorized.


15 ways to improve PBCore

This is a post describing shortcomings and potential improvements for PBCore, an XML markup for media material interchange. These suggestions try to work within the current confines of PBCore, rather than introducing radical changes (which could bring PBCore more in line with the rest of the XML and linked data worlds). Further, we recognize the strength of PBCore is in descriptive metadata, and these suggestions are primarily to strengthen those components, rather than trying to compete on technical metadata.

  1. Define what all the data dictionary elements mean: “clip”, “element”, “actuality”, “version of”, etc. These need to be defined in order for the community to apply them consistently. Other communities have come up with these already; we just need to determine which ones apply to which elements. For example, the European Broadcasting Union does a nice job of distributing machine-readable XML definitions for its data dictionary.
  2. Enhance semantics of relation types by creating an ontology (using rdfs or similar, like the Fedora RELS-EXT ontology) – eg. instead of simply “version of” allow “derivation of”, “copy of” “identical to” etc.
  3. PBCore only has contextual dates on individual instantiations, but we want an overall date with types for created/issued/etc (e.g. the date an interview was conducted). A similar issue exists for locations. Both of these are different from pbcoreCoverage: coverage is about the content, rather than the context.
  4. Format of the content: whether it is an interview, a panel discussion, a live event, b-roll, beauty shots, etc. formatGenerations provides a piece of this puzzle, but this is ultimately descriptive metadata, which probably doesn’t belong in an instantiation. EBUCore provides for part of this with a controlled vocabulary for editorial formats, but it’s not granular enough (e.g. Discussion/Interview/Debate/Talkshow). Our suggestion is to explore enhancing the genre data dictionary to include archival descriptors like “interview” and “b-roll”, which would solve this in a backwards-compatible way.
  5. Machine parseable rights language; we’re embedding the Open Digital Rights Language (ODRL) as a member of pbcoreRightsSummary, but it would be nice to have a common way to express rights (both rights the publisher has, and rights granted by the publisher to the user). An alternate (and perhaps desirable and necessary) solution would be to at least investigate better ways to combine PBCore with established schemas like ODRL, MODS, etc.
  6. A way to identify the primary title and description of an asset, for use in a discovery interface. Existing solutions, like picking titles based on hierarchy, or using a separate metadata document, are flawed.
  7. A formal way to order, prioritize, and relate instantiations within a record (e.g. programs within a series, provenance/hierarchy of digital instances).
  8. A way to label the type of a pbcoreSubject (e.g. person, organization, place, date, etc), in addition to the existing authority reference.
  9. Authority references should be available in most (if not all) PBCore containers, which could help enable linked data applications. This could be accomplished through new xml attributes, which would be ignored by legacy applications, and perhaps better in line with other standards.
  10. Better handling of “element” level materials, for archival raw footage and similar. Finished programs are handled decently in the existing PBCore, but the data dictionaries aren’t prepared for this level.
  11. Adopt proper RDF relationships for PBCore relations.
  12. Consider adding educational levels and standards. PBCore currently addresses this tangentially with audienceLevel and audienceRating.
  13. Better way to handle metadata about people, whether by enhancing the existing structure, supporting an hCard microformat, or otherwise.
  14. Semantics to deal with thumbnails for discovery interfaces, or how to attach visual representations/facsimiles of a PBCore media instantiation. This is probably a low priority, nice to have change.
  15. Content flags, which include advisory messages about sensitive content, are regularly created for broadcast programs, but PBCore doesn’t provide a way to capture these. Perhaps the best way here is to add time-based metadata to the descriptive material (but, then, what do you base the timecode against? See next.)
  16. BONUS: Add timecode information to instantiations and relationships to identify sections of content, in order to support time-based metadata, content flags, etc.

Posted in Uncategorized.