SECO: Mediation Services for RDF Data Integration

Andreas Harth (DERI/National University of Ireland, Galway)

http://sw.deri.ie/~aharth/

Thanks to #rdfig for inspiration

What is the Problem?

People use RDF and put RDF data online

This is great, but how can we make use of all the data?

The Web of RDF Data

Friend-of-a-Friend (FOAF)

<rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:foaf="http://xmlns.com/foaf/0.1/">

  <foaf:Person>
    <foaf:name>Andreas Harth</foaf:name>
    <foaf:firstName>Andreas</foaf:firstName>
    <foaf:surname>Harth</foaf:surname>
    <foaf:nick>aharth</foaf:nick>
    <foaf:mbox rdf:resource="mailto:andreas.harth@deri.ie"/>
    <foaf:homepage rdf:resource="http://www.harth.org/andreas/"/>
    <foaf:phone rdf:resource="tel:+353-91-512-651"/>
    <foaf:workplaceHomepage rdf:resource="http://www.deri.ie/"/>
    <rdfs:seeAlso rdf:resource="http://www.isi.edu/~aharth/foaf.rdf"/>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Jose Luis Ambite</foaf:name>
        <foaf:mbox rdf:resource="mailto:ambite@isi.edu"/>
        <rdfs:seeAlso rdf:resource="http://www.isi.edu/~ambite/foaf.rdf"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>

RDF Site Summary (RSS 1.0)

<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
 xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">

   ...

   <item rdf:about="http://slashdot.org/article.pl?sid=04/05/21/1245258">
      <title>Internet Grocery Shopping Slowly Gaining Ground</title>
      <link>
        http://slashdot.org/article.pl?sid=04/05/21/1245258
      </link>
      <description>
        bakreule writes "Online grocery shopping, once the laughing stock of the
        internet, has quietly started gaining ground. It seemed that the idea had
        been killed ...
      </description>
      <dc:creator>michael</dc:creator>
      <dc:subject>internet</dc:subject>
      <dc:date>2004-05-21T13:29:00+00:00</dc:date>
      <slash:department>three-bags-of-cheetos</slash:department>
      <slash:section>articles</slash:section>
      <slash:comments>99</slash:comments>
      <slash:hitparade>99,95,84,58,28,12,5</slash:hitparade>
   </item>
</rdf:RDF>

The Web of RDF Data

→ Islands of machine-readable data.

Solution

SECO integrates all kinds of RDF data.

RDF data plus possibly lots of other sources (MySQL databases, IMAP4, search engines...)

Google vs. SECO

Search Engine                                              | Data Integration
documents                                                  | data
crawl documents                                            | access databases
mirror the web                                             | distributed architecture
build index/calculate rankings once a week/month           | integration on demand
specialized indices for web sites, newsgroups, pictures... | schema mapping
keyword search on index                                    | queries

Big Picture

RDF Crawler: Scutter

The Scutter follows rdfs:seeAlso links from one RDF document to the next.
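
The crawl can be pictured as a breadth-first walk over rdfs:seeAlso links. The following is a minimal sketch, not SECO's actual crawler: the class name ScutterSketch and the in-memory map standing in for fetched RDF documents are assumptions made for illustration, and the back-link in the demo data is invented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a Scutter-style crawl; not taken from the SECO source.
class ScutterSketch {

    // seeAlsoLinks maps a document URL to the rdfs:seeAlso targets found in it.
    // A real crawler would fetch and parse each RDF document instead.
    static List<String> crawl(String seed, Map<String, List<String>> seeAlsoLinks) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(seed);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // already crawled; this also breaks link cycles
            }
            for (String next : seeAlsoLinks.getOrDefault(url, List.of())) {
                if (!visited.contains(next)) {
                    queue.add(next);
                }
            }
        }
        return new ArrayList<>(visited); // documents in the order they were crawled
    }

    public static void main(String[] args) {
        // The first link appears in the FOAF example; the back-link is invented for the demo.
        Map<String, List<String>> web = Map.of(
            "http://www.isi.edu/~aharth/foaf.rdf", List.of("http://www.isi.edu/~ambite/foaf.rdf"),
            "http://www.isi.edu/~ambite/foaf.rdf", List.of("http://www.isi.edu/~aharth/foaf.rdf"));
        System.out.println(crawl("http://www.isi.edu/~aharth/foaf.rdf", web));
    }
}
```

The visited set is what keeps the crawl finite when FOAF files point at each other.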

Wrappers/Legacy Data Sources

Provide RDQL query facilities for:

Search engine wrapper

All SECO Components...

... have an RDQL interface and return RDF as result

"A mediator is a software module that exploits encoded knowledge about some sets or subsets of data to create information for a higher layer of applications." (Wiederhold 1992).

Parallel Access
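
Since every component accepts a query and returns RDF, one way to picture parallel access is to send the same query to all sources at once and merge the answers. This is a rough sketch under stated assumptions: ParallelAccessSketch, queryAll, and the modelling of a source as a function from a query string to a list of result statements are hypothetical, and the RDQL string in the demo is only illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of fanning one query out to several sources; not SECO's code.
class ParallelAccessSketch {

    // Each source maps a query string to result statements (plain strings here).
    static List<String> queryAll(String query, List<Function<String, List<String>>> sources) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sources.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Function<String, List<String>> source : sources) {
                tasks.add(() -> source.apply(query));
            }
            List<String> merged = new ArrayList<>();
            // invokeAll blocks until all tasks finish and keeps the task order.
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                merged.addAll(f.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for sources", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a source failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String rdql = "SELECT ?name WHERE (?x, <foaf:name>, ?name) "
                    + "USING foaf FOR <http://xmlns.com/foaf/0.1/>";
        List<Function<String, List<String>>> sources = new ArrayList<>();
        sources.add(q -> List.of("result from crawler index"));      // stand-in source
        sources.add(q -> List.of("result from search engine wrapper")); // stand-in source
        System.out.println(queryAll(rdql, sources));
    }
}
```

With this shape the total waiting time is bounded by the slowest source rather than the sum of all sources.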

Object Consolidation
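
The slides do not spell out how objects are consolidated. One common approach for FOAF data, assumed here purely for illustration, is to merge descriptions that share the same foaf:mbox value. The class ConsolidationSketch and its flat property maps are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: merge descriptions that share a foaf:mbox value.
// This is an assumed strategy, not necessarily what SECO implements.
class ConsolidationSketch {

    static final String KEY = "foaf:mbox";

    static List<Map<String, Set<String>>> consolidate(List<Map<String, String>> descriptions) {
        Map<String, Map<String, Set<String>>> byMbox = new HashMap<>();
        List<Map<String, Set<String>>> result = new ArrayList<>();
        for (Map<String, String> d : descriptions) {
            String mbox = d.get(KEY);
            Map<String, Set<String>> target = (mbox == null) ? null : byMbox.get(mbox);
            if (target == null) {
                // First description with this mailbox, or no mailbox at all.
                target = new LinkedHashMap<>();
                result.add(target);
                if (mbox != null) {
                    byMbox.put(mbox, target);
                }
            }
            // Collect every value seen for each property of the merged object.
            for (Map.Entry<String, String> e : d.entrySet()) {
                target.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).add(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> input = List.of(
            Map.of(KEY, "mailto:andreas.harth@deri.ie", "foaf:name", "Andreas Harth"),
            Map.of(KEY, "mailto:andreas.harth@deri.ie", "foaf:nick", "aharth"),
            Map.of(KEY, "mailto:ambite@isi.edu", "foaf:name", "Jose Luis Ambite"));
        System.out.println(consolidate(input).size() + " objects after consolidation");
    }
}
```

Descriptions without a mailbox are kept as separate objects, since there is nothing to match them on.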

Schema Mapping

Mediated term | Native term
title         | dc:title, rss:title, foaf:name, ical:summary, gg:title
desc          | dc:description, rss:description, foaf:mbox, ical:description, gg:snippet
date          | dc:date, ical:value, gg:date
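
The mapping table can be read as a lookup from native terms to mediated terms. Below is a minimal sketch that applies it to a flat record. The class SchemaMappingSketch and the method toMediated are hypothetical names, and keeping only the first value per mediated term is a simplifying assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch applying the mapping table above; not SECO's actual code.
class SchemaMappingSketch {

    static final Map<String, String> NATIVE_TO_MEDIATED = new LinkedHashMap<>();

    static {
        register("title", List.of("dc:title", "rss:title", "foaf:name", "ical:summary", "gg:title"));
        register("desc", List.of("dc:description", "rss:description", "foaf:mbox",
                                 "ical:description", "gg:snippet"));
        register("date", List.of("dc:date", "ical:value", "gg:date"));
    }

    private static void register(String mediated, List<String> nativeTerms) {
        for (String term : nativeTerms) {
            NATIVE_TO_MEDIATED.put(term, mediated);
        }
    }

    // Rewrites native property names to mediated ones; unmapped properties are dropped.
    static Map<String, String> toMediated(Map<String, String> record) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : record.entrySet()) {
            String mediated = NATIVE_TO_MEDIATED.get(e.getKey());
            if (mediated != null) {
                out.putIfAbsent(mediated, e.getValue()); // keep the first value only
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> item = new LinkedHashMap<>();
        item.put("rss:title", "Internet Grocery Shopping Slowly Gaining Ground");
        item.put("dc:date", "2004-05-21T13:29:00+00:00");
        item.put("slash:section", "articles"); // not in the table, so it is dropped
        System.out.println(toMediated(item));
    }
}
```

Once records from FOAF, RSS, and the other sources share the mediated terms, a single query over title, desc, and date can cover all of them.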

User Interface

Implementation

Implemented in Java.

Summary of Demo

Future Work

Pointers

SECO demo at http://seco.semanticweb.org/

SECO homepage with source under BSD-style licence at http://sw.deri.ie/2003/seco/.

Article in IEEE Intelligent Systems: Semantic Web Challenge.

Thanks to Stefan Decker and Jose Luis Ambite for discussion and support.