Web Tripping

Sarven Capadisli

BFH, Web Tripping, Bern, 2015-04-13 #LinkedData #BFH

Sarven Capadisli https://csarven.ca/#i @csarven

Are you in the right room?

Lecture Stuff

Lecture Overview

  • Web / Linked Data
  • HTTP, URI (the fundamentals)
  • RDF data model (language) and syntaxes
  • SPARQL (querying for RDF)

Technology Context

Figure of Semantic Web Technologies

Illustration by Sandro Hawke

Hypertext Transfer Protocol

HTTP

  • An application protocol at the core of data communication on the WWW
  • Version 1.1 is the most widely used; HTTP/2 is on the way
  • Requests carry a method (e.g., GET, POST, PUT, DELETE)
  • Responses carry a status code (e.g., 200 OK, 301 Moved Permanently, 303 See Other, 404 Not Found) [seeAlso cats]

HTTP Session

  • An HTTP GET retrieves a particular representation of a resource
  • The request carries an HTTP Accept header; the response carries an HTTP Content-Type header
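A minimal sketch of what the request side of such a session looks like on the wire, assuming a plain HTTP/1.1 GET. The helper name `build_get_request` is made up for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlsplit


def build_get_request(url: str, accept: str) -> str:
    """Return the text of an HTTP/1.1 GET request for `url` (illustrative helper)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",
        f"Accept: {accept}",   # which representations the client can handle
        "Connection: close",
        "",                    # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


request_text = build_get_request(
    "http://dbpedia.org/resource/Switzerland", "text/html"
)
print(request_text)
```

The response would answer with a status line and its own headers, among them Content-Type, which names the representation that was actually chosen.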

Content negotiation

The URI http://dbpedia.org/resource/Switzerland says nothing about the nature of the resource it identifies. Can it return HTML? RDF?

  1. Request for http://dbpedia.org/resource/Switzerland: I accept HTML and RDF/XML, prefer HTML
  2. Response: Okay, go to http://dbpedia.org/page/Switzerland
  3. Request for http://dbpedia.org/page/Switzerland: I accept HTML and RDF/XML, prefer HTML
  4. Response: Okay, I have an HTML representation
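The server side of this exchange can be sketched roughly as follows. This is a simplified model, not DBpedia's actual implementation: the `DOCUMENTS` table, the `/data/` path, and the function names are assumptions for illustration, and wildcards such as `*/*` are not handled.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type, q = fields[0], 1.0   # q defaults to 1 when absent
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if media_type:
            prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


# Hypothetical mapping from media type to the document describing the resource.
DOCUMENTS = {
    "text/html": "http://dbpedia.org/page/{name}",
    "application/rdf+xml": "http://dbpedia.org/data/{name}.rdf",  # assumed path
}


def negotiate(name: str, accept: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request to /resource/{name}."""
    for media_type, q in parse_accept(accept):
        if q > 0 and media_type in DOCUMENTS:
            location = DOCUMENTS[media_type].format(name=name)
            return 303, {"Location": location}   # 303 See Other
    return 406, {}                                # 406 Not Acceptable


status, headers = negotiate(
    "Switzerland", "text/html;q=0.9, application/rdf+xml;q=0.8"
)
print(status, headers["Location"])
```

The client then repeats its request against the URL in the Location header, which is steps 3 and 4 of the exchange.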

Dereferencing a URI to HTML

  • curl -iLH "Accept: text/html" http://dbpedia.org/resource/Switzerland (requesting an HTML response)
  • curl -iLH "Accept: text/html;q=0.9, application/rdf+xml;q=0.8" http://dbpedia.org/resource/Switzerland (preferring an HTML response)

Dereferencing a URI to RDF

  • curl -iLH "Accept: application/rdf+xml" http://dbpedia.org/resource/Switzerland (requesting an RDF/XML response)
  • rapper -i rdfxml http://dbpedia.org/resource/Switzerland
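The same request can be made from a program. A sketch using only the Python standard library; the function names are illustrative, and the part that needs network access is defined but not called.

```python
from urllib.request import Request, urlopen


def rdf_request(uri: str, media_type: str = "application/rdf+xml") -> Request:
    """Build a GET request asking for a specific representation of `uri`."""
    return Request(uri, headers={"Accept": media_type})


def dereference(uri: str, media_type: str = "application/rdf+xml") -> tuple[str, bytes]:
    """Fetch `uri`; returns (Content-Type of the response, body).

    urlopen() follows redirects such as 303 See Other, like curl's -L.
    Needs network access, so it is not called here.
    """
    with urlopen(rdf_request(uri, media_type), timeout=10) as response:
        return response.headers.get("Content-Type", ""), response.read()


request = rdf_request("http://dbpedia.org/resource/Switzerland")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `dereference("http://dbpedia.org/resource/Switzerland")` would perform the actual lookup; what comes back depends on the server at that moment.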

URIs / IRIs

  • A URI identifies (refers to or names) a web resource
  • URIs are unique, and each one essentially represents some thing (e.g., document, concept)
  • IRI (Unicode/ISO 10646) is a generalization of URI (ASCII character set)
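To use an IRI where only a URI is accepted, its non-ASCII characters are percent-encoded as UTF-8 bytes. A rough sketch, assuming the non-ASCII characters sit in the path, query, or fragment; internationalized host names need separate handling that is not covered here. The function name is illustrative.

```python
from urllib.parse import quote

# Characters that already have a role in URI syntax and must stay as they are.
RESERVED_AND_PERCENT = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Percent-encode the non-ASCII characters of `iri` as UTF-8 bytes."""
    return quote(iri, safe=RESERVED_AND_PERCENT)


print(iri_to_uri("http://en.wikipedia.org/wiki/Neuchâtel"))
```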

URI Syntax

<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]

More at: http://en.wikipedia.org/wiki/URI_scheme

See also RFC 3986 (URI syntax, obsoletes RFC 2396) and RFC 3987 (IRIs)
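The generic syntax can be explored with Python's standard `urllib.parse.urlsplit`, which splits a URI into scheme, authority (`netloc`), path, query, and fragment. A small sketch; the sample URIs are taken from this deck.

```python
from urllib.parse import urlsplit

examples = [
    "https://csarven.ca/#i",
    "ldap://[2001:db8::7]/c=GB?objectClass?one",
    "mailto:John.Doe@example.com",
    "urn:isbn:9780812696110",
]

for uri in examples:
    parts = urlsplit(uri)
    # netloc and path together make up the hierarchical part
    print(parts.scheme, repr(parts.netloc), repr(parts.path),
          repr(parts.query), repr(parts.fragment))
```

Schemes without an authority, such as mailto and urn, end up with an empty `netloc` and everything after the scheme in `path`.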

Example IRIs

  • http://www.ietf.org/rfc/rfc2396.txt
  • http://en.wikipedia.org/wiki/Neuchâtel
  • mailto:John.Doe@example.com
  • ftp://ftp.is.co.za/rfc/rfc1808.txt
  • ldap://[2001:db8::7]/c=GB?objectClass?one
  • news:comp.infosystems.www.servers.unix
  • irc://irc.freenode.net/csarven,isnick
  • telnet://melvyl.ucop.edu/
  • gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
  • urn:uuid:96550970-f26f-41f0-9dbd-db4c0d522889
  • urn:isbn:9780812696110
  • doi:10.1000/182
  • bitcoin:1H67NnpSGAUrSRA5SkPHMmqNqHqpXHuFGp

HTTP URI design patterns in the wild

  • http://{lang}.wikipedia.org/wiki/{Article}
  • http://dbpedia.org/resource/{Article}
  • http://dbpedia.org/property/{id}
  • https://twitter.com/{username}
  • http://reddit.com/r/{subreddit}
  • http://imgur.com/gallery/{hash}
  • http://www.flickr.com/photos/{username}/{id}
  • http://delicious.com/tag/{id}
  • https://csarven.ca/#i
  • http://worldbank.270a.info/dataset/{id}

URI design patterns in the wild

  • http://creativecommons.org/licenses/{type}/{version}/
  • http://moodle.bfh.ch/course/view.php?id={id}
  • http://www.wirtschaft.bfh.ch/{lang}/{degree}/{id}.html
  • {firstname}.{lastname}@student.bfh.ch
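Patterns like these can be filled in mechanically. A minimal sketch that uses Python's `str.format` as a stand-in for a real URI template processor (it is not an RFC 6570 implementation); the function name `expand` is made up for illustration.

```python
from urllib.parse import quote


def expand(pattern: str, **values: str) -> str:
    """Fill the {placeholders} of `pattern`, percent-encoding each value."""
    encoded = {key: quote(value, safe="") for key, value in values.items()}
    return pattern.format(**encoded)


print(expand("http://dbpedia.org/resource/{Article}", Article="Bern"))
print(expand("http://{lang}.wikipedia.org/wiki/{Article}",
             lang="en", Article="Neuchâtel"))
```

Encoding each value before substitution keeps characters such as "/" or "?" in a value from changing the structure of the resulting URI.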

Cool URIs?

  • Dedicated service (will it still be there in 1, 5, 100 years?)
  • Consistent patterns
  • Re-use existing identifiers
  • Link multiple representations
  • Avoid ownership, versions (usually), and auto-incremented IDs
  • Avoid query strings and file extensions

A must-read is TimBL's "Cool URIs don't change".

See also "10 Rules for Persistent URIs" (treat it as a rough guideline, not a set of musts).

Tools and services

Identifying things

"Now! That should clear up a few things around here." (The Far Side)

“Any resource of significance should be given a URI.”
Tim Berners-Lee, W3C

Data

  • Data is everywhere (personal, government, health, events, ...)
  • Uncovering insights
  • Predictions
  • Making decisions (e.g., where to save energy)
  • Smarter systems

Data: What is it good for?

Absolutely everything!

  • Understanding human societies
  • Health conditions
  • Stable economies
  • How do/should things work?

Technology Flow

Figure of the flow of technologies

Core idea

  • Structured data available on a global scale
  • Connect related data items across multiple sources
  • ?
  • Profit

Why Linked Data?

  • Classical data management vs. distributed Web
  • Development using standards
  • Open things up
  • Gradual and sustainable

Linked Data Design Principles

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
  4. Include links to other URIs, so that more things can be discovered

http://www.w3.org/DesignIssues/LinkedData.html

Linked Open Data

Linked Open Data Cloud

Linked Open Data cloud diagram as of 2011-09

Wikipedia to DBpedia

Transforming Wikipedia data to DBpedia

A grain of rice

Graph of a grain of rice

Linked Data Life Cycles

  • Original data owners
  • Data publishers
  • Data enrichment parties
  • Data consumers

RDF

  • Essentially a language, built on a graph model
  • Similar to the Entity-Attribute-Value (EAV) data model, e.g., Sarven.height = 1.65m
  • Describes concepts and their relationships (many graphs)
  • Uses URIs as names for resources, e.g., https://csarven.ca/#i and http://dbpedia.org/property/height

Graph of subject predicate object

Human languages

How do we express ourselves?

  • Sentences: subjects and predicates (verbs), and sometimes objects
  • Body language meh ;)

Vocabularies

  • SIOC: vocabulary for online communities
  • Schema.org: various - Google/Bing/Yahoo/Yandex initiative
  • Open Graph Protocol: webpages as part of rich objects in social graph - Facebook

What about OWL?

  • Meant to be an Ontology Language for the Web
  • Very powerful: rich semantic meaning
  • .. but it can also be painful (PITA)
  • Need domain experts
  • Usually need to write domain specific applications

RDF Triples

Graph of subject predicate object

<Subject> <Predicate> <Object> .

  • Different ways to write it
  • Serializations: RDFa, Turtle, N-Triples, RDF/XML
  • Easy to transform automatically from one to another
  • Terms in a triple: a URI can be subject, predicate, or object (spo); a blank node (_:bnode) can be subject or object (so); a "literal" can only be an object (o)
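The rule about which kind of term may appear in which position can be written down as a small check. A sketch only; the class and function names are made up for illustration and do not come from any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


def is_valid_triple(s, p, o) -> bool:
    """Check each term against the positions it may occupy."""
    return (
        isinstance(s, (IRI, BNode))               # subject: IRI or blank node
        and isinstance(p, IRI)                    # predicate: IRI only
        and isinstance(o, (IRI, BNode, Literal))  # object: any term
    )


me = IRI("https://csarven.ca/#i")
height = IRI("http://dbpedia.org/property/height")

print(is_valid_triple(me, height, Literal("1.65m")))
print(is_valid_triple(Literal("1.65m"), height, me))
```

The first call describes "Sarven's height is 1.65m" and passes; the second puts a literal in the subject position and is rejected.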

N-Triples

Sarven's height is 1.65m.

<https://csarven.ca/#i> <http://dbpedia.org/property/height> "1.65m" .

N-Triples

Bern is a city.

<http://dbpedia.org/resource/Bern> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/City> .
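N-Triples is line-based: each triple is written on one line and ends with a full stop. A minimal serializer sketch for IRIs and plain string literals only; language tags, datatypes, and blank nodes are left out, and the helper names are illustrative.

```python
def iri(value: str) -> str:
    """Write an IRI in N-Triples form."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Write a plain string literal, escaping backslashes, quotes, and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Join three already-serialized terms into one N-Triples line."""
    return f"{subject} {predicate} {obj} ."


lines = [
    ntriple(iri("https://csarven.ca/#i"),
            iri("http://dbpedia.org/property/height"),
            literal("1.65m")),
    ntriple(iri("http://dbpedia.org/resource/Bern"),
            iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
            iri("http://dbpedia.org/ontology/City")),
]
print("\n".join(lines))
```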

Turtle

Sarven is interested in electronic music and monkeys:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix wikipedia: <http://en.wikipedia.org/wiki/> .

<https://csarven.ca/#i>
    rdf:type foaf:Person ;
    foaf:givenName "Sarven"@en ;
    foaf:interest wikipedia:Electronica , wikipedia:Monkey ;
    foaf:mbox <mailto:info@csarven.ca> ;
    foaf:account <https://twitter.com/csarven> .
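A prefixed name in Turtle is shorthand: the prefix is replaced by its namespace IRI and the local part is appended. A sketch of that expansion, using the three prefixes declared for this example; the function name is illustrative and the full Turtle escaping rules are ignored.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "wikipedia": "http://en.wikipedia.org/wiki/",
}


def expand_prefixed_name(name: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Turn prefix:local into a full IRI by plain concatenation."""
    prefix, _, local = name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local


for name in ("rdf:type", "foaf:Person", "wikipedia:Monkey"):
    print(name, "->", expand_prefixed_name(name))
```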

Turtle

Switzerland has capital Bern. Bern is a City.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix dbp: <http://dbpedia.org/property/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .

dbr:Switzerland dbp:capital dbr:Bern .
dbr:Bern
    rdf:type dbo:City ;
    geo:lat "46.950001"^^xsd:float ;
    geo:long "7.439583"^^xsd:float .

RDFa example

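<div prefix="dbr: http://dbpedia.org/resource/
             dbp: http://dbpedia.org/property/
             dbo: http://dbpedia.org/ontology/
             foaf: http://xmlns.com/foaf/0.1/">

<p about="https://csarven.ca/#i">
    Sarven is
    <span property="dbp:height">1.65m</span></p>

<p about="dbr:Bern"
   typeof="dbo:City">Bern is a city.</p>

<p about="https://csarven.ca/#i">
    Sarven is in <a rel="foaf:based_near"
                    href="http://dbpedia.org/resource/Bern">Bern</a>.</p>

</div>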

RDF/XML

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description
    rdf:about="http://dbpedia.org/resource/Bern">
    <rdf:type
      rdf:resource="http://dbpedia.org/ontology/City"/>
  </rdf:Description>
</rdf:RDF>

RDF/XML

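<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description
    rdf:about="http://dbpedia.org/ontology/City">
    <rdf:type
      rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
    <rdfs:label xml:lang="en">City</rdfs:label>
    <rdfs:label xml:lang="de">Stadt</rdfs:label>
  </rdf:Description>
</rdf:RDF>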

Overview of the RDF formats

  • Turtle: compact, human readable, same triple-pattern syntax as SPARQL
  • N-Triples: verbose, small footprint (inspection, processing), exchange
  • RDF/XML: verbose, existing toolchains, tree model
  • RDFa: (X)HTML page, for humans and machines
  • JSON-LD: usually client-side dev, tree model
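JSON-LD is the one format in this overview without a sample elsewhere in the deck. As a rough illustration, "Bern is a city" can be built as a JSON-LD node object with ordinary JSON tooling; the choice of a `dbo` prefix in the `@context` is an assumption made for this sketch.

```python
import json

# "Bern is a city" as a JSON-LD node object.
bern = {
    "@context": {"dbo": "http://dbpedia.org/ontology/"},
    "@id": "http://dbpedia.org/resource/Bern",
    "@type": "dbo:City",
}

document = json.dumps(bern, indent=2)
print(document)
```

Because the result is plain JSON, it can be handled with standard JSON tooling, which is what makes the format convenient for client-side development.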

SPARQL Protocol and RDF Query Language

  • A way to query for triple patterns (as opposed to searching)
  • SPARQL endpoints (query public datasets)
  • Federated queries (merge patterns from different sources)
  • See also: SPARQL 1.1 Query and SPARQL 1.1 Federated Query specifications.

SPARQL query example

Select all subjects that are cities.

SELECT ?city
WHERE {
    ?city a <http://dbpedia.org/ontology/City> .
}

SPARQL query result example

city
http://dbpedia.org/resource/Bern
http://dbpedia.org/resource/Montreal
...
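What the query engine does with a single triple pattern can be imitated in a few lines. A toy sketch over an in-memory list of triples, not how a real SPARQL engine is built; the sample graph, including its third triple, is made up for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CITY = "http://dbpedia.org/ontology/City"

# A tiny in-memory graph; the third triple is made up for contrast.
GRAPH = [
    ("http://dbpedia.org/resource/Bern", RDF_TYPE, CITY),
    ("http://dbpedia.org/resource/Montreal", RDF_TYPE, CITY),
    ("http://dbpedia.org/resource/Switzerland", RDF_TYPE,
     "http://dbpedia.org/ontology/Country"),
]


def match(pattern, graph):
    """Yield one {variable: value} binding per triple matching `pattern`.

    Terms starting with "?" are variables; everything else must be equal.
    """
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break   # same variable bound to two different values
            elif term != value:
                break
        else:
            yield binding


# ?city a dbo:City  ("a" is shorthand for rdf:type)
cities = [b["?city"] for b in match(("?city", RDF_TYPE, CITY), GRAPH)]
print(cities)
```

Each binding corresponds to one row of the result table, with one column per variable.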

SPARQL query example

People who were born in Bern before 1900:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name ?birth ?person WHERE {
    ?person dbo:birthPlace dbr:Bern .
    ?person dbo:birthDate ?birth .
    ?person foaf:name ?name .
    FILTER (?birth < "1900-01-01"^^xsd:date) .
}
ORDER BY ?name

See results from DBpedia SPARQL Endpoint.
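Sending a query to an endpoint is an ordinary HTTP request: with the SPARQL Protocol's "query via GET", the query text travels URL-encoded in a `query` parameter. A sketch that only builds the request; the endpoint URL is DBpedia's public one, while the `LIMIT 10` and the function name are additions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """\
SELECT ?city WHERE {
    ?city a <http://dbpedia.org/ontology/City> .
} LIMIT 10
"""


def sparql_get(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol 'query via GET' request (not sent here)."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = sparql_get(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would run the query; the Accept header asks for the results in the SPARQL JSON results format.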

SPARQL query

Names of projects in countries that are classified as lower-middle income and lie north of the equator:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX property: <http://worldbank.270a.info/property/>
PREFIX country: <http://worldbank.270a.info/classification/country/>
PREFIX income-level: <http://worldbank.270a.info/classification/income-level/>
PREFIX graph: <http://worldbank.270a.info/graph/>

SELECT ?projectLabel ?countryLabel WHERE {
    GRAPH graph:meta {
        ?country property:income-level income-level:LMC .
        ?country wgs:lat ?latitude .
        FILTER (?latitude > 0)
        ?country skos:prefLabel ?countryLabel . }

    GRAPH graph:world-bank-projects-and-operations {
        ?project property:country ?country ;
                 skos:prefLabel ?projectLabel . }
} ORDER BY ?projectLabel

See results at World Bank Linked Data SPARQL Endpoint.

SPARQL query: Federated

CONSTRUCT { .. } WHERE {
  { SERVICE <http://data.linkedmdb.org/sparql> {
      SELECT * WHERE { .. }
  } }
  UNION
  { SERVICE <http://dbpedia.org/sparql> {
      SELECT * WHERE { .. }
  } }
}

Tools and services

Credits