DCAM Revision/TeleconReport-20120229

2012-02-29 DCAM call - 11:00 EST - report

Present: TomB, JonP, StuartS, DianeH, RichardU, Michaelp, GordonD, DanielL (IRC), CoreyH
This report: http://wiki.dublincore.org/index.php/DCAM_Revision/TeleconReport-20120229
Agenda:      http://wiki.dublincore.org/index.php/DCAM_Revision/TeleconAgenda-20120229
Previous:    http://wiki.dublincore.org/index.php/DCAM_Revision/TeleconReport-20120115
 
Richard: Each of the encoding technologies we may want to use has different
    abilities.  Some things are easier, some are harder, to express.  Defining
    "mandatory/applicable" will depend on underlying capabilities of these encoding
    technologies.

Tom: The original motivation behind DCAM was to make comparisons -- specify
    which of DCAM's features can be expressed using this or that encoding syntax,
    and which cannot.

Richard: At an abstract level, not how DCAM can be expressed in XML, etc.
    Rather, looking at constraints.  Is there some systematic way we could 
    look at Karen and Gordon's examples? For example, the idea of properties 
    being mandatory or not.  How do we enable validation, assuming two different 
    kinds of validation - RDF-type and XML-type.

Corey: Richard, you have hit on the crux of the problem.  That is why we
    separated out the constraint language. But this is very incomplete. I like
    thinking of DCAM/DSP as a single abstraction that talks about how to map
    graph-based metadata into constructs that are more validatable. Creating a new
    concrete syntax on top of RDF/XML that allows validation doesn't seem like a
    good goal.  Rather, how to construct using RelaxNG, or Pellet (which I do not 
    fully understand).
    
Jon: The method of validation shouldn't matter.  Not on top of RDF, but alongside RDF.
    DCAM should define a metadata record in the abstract: what is being described and what 
    properties describe it in a 'closed world'.  What the constraints on those properties 
    are in a 'closed world'.  And how to describe those 'closed world' things to the rest 
    of the world.
    
    Corey: jonphipps++
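
    A minimal sketch of the 'closed world' record Jon describes, assuming a
    hypothetical profile format (the PROFILE dictionary, its "mandatory" and
    "max" keys, and the function name are illustrative and not part of DCAM or
    DSP). It checks one record against constraints on its properties and
    rejects anything the profile does not mention:

```python
# Hypothetical profile: each property the record may carry, with its constraints.
PROFILE = {
    "dcterms:title": {"mandatory": True, "max": 1},
    "dcterms:creator": {"mandatory": False, "max": None},
    "dcterms:date": {"mandatory": True, "max": 1},
}


def validate_record(record, profile=PROFILE):
    """Return a list of closed-world violations for one record.

    `record` maps property names to lists of values.
    """
    errors = []
    for prop, rule in profile.items():
        values = record.get(prop, [])
        if rule["mandatory"] and not values:
            errors.append(f"missing mandatory property: {prop}")
        if rule["max"] is not None and len(values) > rule["max"]:
            errors.append(
                f"too many values for {prop}: {len(values)} > {rule['max']}"
            )
    # Closed world: a property the profile does not mention is an error,
    # whereas an open-world reading would simply accept the extra statement.
    for prop in record:
        if prop not in profile:
            errors.append(f"property not allowed by profile: {prop}")
    return errors
```

    The same record could still be published as open-world RDF; the check only
    says what counts as valid inside the provider's own domain.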

Richard: Harder in an XML environment to do ontology sorts of things.
    Different technologies have different types of validation.  Syntactic versus
    ontological validation: very different things.

Diane: But isn't what you can do in a particular syntax part of how you choose
    what syntax to use?

Tom: DCAM is about the syntax bit. Describing "things in my metadata" versus 
    describing "things in the world".
    
Jon: Both XML schema and OWL ontology describe domain-specific knowledge, even 
    when that knowledge domain is based on a 'community domain'.

    Diane: I get that, I was responding more to the 'be everything to
        everybody' kind of requirement I seem to be hearing

Corey: Look at Singapore Framework Document to reinforce what Tom's saying: 
    http://dublincore.org/documents/2008/01/14/singapore-framework/singapore-framework.png

    Diane: Corey ++

Richard: Perhaps this gets back to my original question about whether Karen's
    list is about DSP or DCAM?  One-to-one is a core *constraint* of DCAM.

Corey: Jon was saying DCAM should define metadata record in the abstract - how
    it should look. If you look at SF, DCAM is the foundation for things on the
    right side of the diagram (diagrams). Ontology is defined to the left (domain
    model). DCAM used to convert ontology into record-like objects. Maybe this
    requires Pellet, maybe XML, etc. My big use cases are SOLR and Lucene.  We want
    to be technology-agnostic.

Jon: You create and validate metadata as records, you distribute and aggregate
    it as either records (XML, MARC21) or collections of statements (RDF, named
    graphs) or both.  DCAM straddles those activities.  That's its value.
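
    The two views Jon contrasts can be sketched as follows, assuming a record
    is a mapping from property names to lists of values and a statement is a
    (subject, property, value) tuple. Both function names and the example URI
    are hypothetical, chosen only for illustration:

```python
def record_to_statements(subject, record):
    """Flatten one record into a list of (subject, property, value) statements."""
    return [
        (subject, prop, value)
        for prop, values in record.items()
        for value in values
    ]


def statements_to_records(statements):
    """Group statements back into records, one per described subject."""
    records = {}
    for subject, prop, value in statements:
        records.setdefault(subject, {}).setdefault(prop, []).append(value)
    return records


record = {"dcterms:title": ["Example"], "dcterms:creator": ["A", "B"]}
statements = record_to_statements("http://example.org/doc1", record)
```

    Grouping the statements by subject recovers the record, but only for the
    statements the aggregator happens to hold, which is where the record
    boundary and the open-world graph part ways.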

Corey: Basically understanding the syntactic needs of a particular application
    (and whatever validations and linting tools you have in that environment) and
    mapping a domain model, defined in a domain model and ontology, into that
    application specific environment.  In short it *has* to be implementation 
    syntax agnostic (but still based on the RDF data *model*).

Richard: Put "application profile" sorts of things into DCAM itself?

Jon: We should work hard at not conflating the requirements of creation/storage
    and the requirements of distribution/aggregation.  RDF by definition exists in
    an 'open world'. This is intrinsic to the RDF model.  DCAM describes a domain model, 
    a 'closed world' by definition. This is intrinsic to the DCAM model.

Diane: If you look back at what libraries have traditionally done - cramming things
    instead of one-to-one. What Jon is saying about where records and aggregations
    fit into this is very important. I have used DCAM in teaching.  It changes the
    way we think about what a record is, and makes it easier to explain.  I like the 
    "nested boxes" diagram of DCAM.

Gordon: Different librarians have different ideas of what constitutes a
    "thing", so Richard's dark pattern example is actually widespread practice.

Richard: How are we using "open" and "closed" worlds here? My understanding of
    these terms is about completeness of reasoning about records, but it seems
    like there is a different sense here?  Or rather about "representations" - it
    is so easy to slip into talking about "records".

Corey: See Roy Tennant's blog post at [1], and especially Ed Summers's response:
    "RDFa + Schema.org" versus "Microdata + Schema.org" is a false dichotomy.
    Conflating carrier with [carried]. Express things built on RDF graph, but publish 
    in microdata. If DCAM also supports this idea - "here's how to construct HTML5 
    microdata" - that's the kind of guidance we need.

    Jon: Corey++
    Aaron: Corey++
    Gordon: Maybe Microdata needs a DC application profile?
        Corey: @GordonD: I'd argue yes. But I'm not sure what that looks like...

    [1] http://www.thedigitalshift.com/2012/02/roy-tennant-digital-libraries/why-microdata-not-rdf-will-power-the-semantic-web/

Jon: It's also about 'validation'. We are talking about records when we talk about DCAM.

Jon: @chrpr Microdata vs. RDFa is an implementation detail to which the DCAM should be 
    agnostic.

Richard: Jon, validating an instance as conforming to some syntax seems closed,
    but I don't think this necessarily closes an "open" world for other kinds of
    reasoning.  Microdata is just another example of representation.

Jon: Richard, metadata exists both within my world and outside my world. Within
    my world _I_ say what's syntactically valid and logically consistent.

Richard: Closed world also seems important to traditional IR and library practices. 
    I.e. being able to infer that if I don't get a search hit, there isn't a 
    resource that meets my query.

    Jon: Outside my world, others say that about metadata in their own context.
        My personal notion of consistent and valid _means nothing_ outside my
        domain, but it means everything to me.

    Richard: Jon, but isn't DCAM/DSP, etc. a way for you to tell me what you 
        think is valid? for the purposes of interoperability?

Tom: With "Dublin Core application profile", are we talking original Dublin Core fifteen? 
    Do these need an application profile of their own? A Microdata-based AP? That raises 
    the question: "are these the right 15"?

Diane: Corey, don't ask questions you don't want an answer to, eg 'the 15'.

Corey: What's the use case of a generic DC-15 application profile? Answer: OAI-PMH.

Diane: We've been talking about a separate AP for the 15 for a long time, should be separate 
    from DCMI Metadata Terms.

Richard: OAI-PMH seems dead.

    Diane: Richard, I don't think OAI-PMH is dead. It never was the Ginsu knife
        people wanted anyway.

    Richard: Diane, what I mean by that is there doesn't seem to be any new
        development going on. I think it will continue to be used for a long time
        out.

    Diane: In a coma maybe.

Corey: Schema.org microdata could use APs built on top. SOLR application
    profiles. Set of good practices how to map triples. Even EAD - how to extract
    triples from.

    Richard: I.e. Corey's suggestion of updating the OAI-PMH DC schemas to include
        Qualified as part of the basic syntax.

Jon: Once we start sharing metadata, then both of us need to be able to apply
    our personal notion of valid and consistent to each other's data.
    We either need to a priori agree (ancient practice) or be able to communicate 
    our domain knowledge about valid and consistent in order to reconcile differences.
    That communication is facilitated by a syntax-agnostic DCAM.

    Richard: Jon, true, but I'd rather not have to guess what you thought was
        valid, even if I apply my own notion.

Tom: Carrying forward the ACTION from Kai, who could not be here today:

        ACTION: Kai and Tom to work on technical part in wiki, e.g.:
            http://wiki.dublincore.org/index.php/DCAM_Revision_Tech
            http://wiki.dublincore.org/index.php/DCAM_Revision
            http://wiki.dublincore.org/index.php/DCAM_Revision_Scratchpad
            http://wiki.dublincore.org/index.php/DCAM_Revision_Graphics

    What about the ongoing action on Richard?
        
        ACTION: Tom and Richard to put placeholder for introductory text into wiki document
            at http://wiki.dublincore.org/index.php/DCAM_Revision_Draft.

Richard: In order to write an introduction, we need a bit more consensus. We
    need a better tie-in with Interoperability Levels.

ACTION: Tom distribute Doodle poll. 

    Note: ACTION completed 2012-02-29 - see poll for next DCAM call in the 12-23 March
        window: http://www.doodle.com/ikx4236dmd7bqdtn

Jon: True. What's valid and consistent to me means nothing to you _even if_
    we've agreed on things like format.  It's a matter of negotiation and
    reconciliation. Applying your notion of valid in your domain to data that may
    have been provided by me.

Corey: jonphipps++ negotiation_and_reconciliation++

Diane: Task-oriented stuff might be better in user documentation.
    In terms of examples in Karen's list, Gordon's too - but not the same as
    user tasks. These are more use cases for things we know we must be able to
    express.
    
Richard: Jon, it seems that trying to get everyone to use fixed approaches
    (i.e. everybody use Dublin Core) just leads to people applying their own
    understanding of DC that is opaque to me as an aggregator.
    Conversations about user tasks sometimes get bogged down.

    Jon: Diane, task orientation and use cases are a good way to drive
        requirements and to determine if the spec satisfies.  You're usually the
        person complaining that practitioners don't have enough say in these
        'abstract models'.

    Diane: Jon, not sure we've identified who the 'users' are yet--I don't
        think they are who we normally think of as users.

Richard: Because a data provider's sense of valid isn't available to me, I need
    to do a lot of work to figure it out from patterns in the instance data.
    I'd like to see DCMI find ways to make it easier for data providers to share 
    their perspectives on metadata.

    Jon: Richard, that's why you need a DCAM from me expressed as a validation
        schema and an ontology.

    Diane: yup, agreed

    Richard: Jon, a DCAM? or an AP (which was built using DCAM)?

    Jon: Richard, I need a schema. You need an ontology. We both could use a single
        implementation-agnostic abstract model (DCAM) to define those things.

    Richard: Thank you, Jon, it seems like a hard task to be agnostic, but I
        appreciate the challenge of trying to think through this.