1. Linked Statistical Dataspaces and Analysis

    CEDAR minisymposium, eHumanities, Amsterdam, 2014-03-27

    #LinkedData

    Sarven Capadisli http://csarven.ca/#i @csarven

  2. Statistical Data

    Hypercube

                      2004-2006        2005-2007        2006-2008
                      Male   Female    Male   Female    Male   Female
    Newport           76.7   80.7      77.1   80.9      77.0   81.5
    Cardiff           78.7   83.3      78.6   83.7      78.7   83.4
    Monmouthshire     76.6   81.3      76.5   81.5      76.6   81.7
    Merthyr Tydfil    75.5   79.1      75.5   79.4      74.9   79.6
    • Dimensions: Time, Location, Sex
    • Measure: values
    • Attributes: unit of measurement, accuracy
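
    A minimal sketch of the hypercube idea, assuming the table on this slide is read as one observation per (Time, Location, Sex) combination. The names `Observation`, `flatten`, and `slice_cube` are illustrative and not from the talk; attributes such as the unit of measurement are left out because the slide does not state them.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    period: str    # dimension: Time
    location: str  # dimension: Location
    sex: str       # dimension: Sex
    value: float   # measure


PERIODS = ["2004-2006", "2005-2007", "2006-2008"]
SEXES = ["Male", "Female"]

# Values copied from the slide's table; each row is ordered by period, then sex.
ROWS = {
    "Newport": [76.7, 80.7, 77.1, 80.9, 77.0, 81.5],
    "Cardiff": [78.7, 83.3, 78.6, 83.7, 78.7, 83.4],
    "Monmouthshire": [76.6, 81.3, 76.5, 81.5, 76.6, 81.7],
    "Merthyr Tydfil": [75.5, 79.1, 75.5, 79.4, 74.9, 79.6],
}


def flatten(rows):
    """Turn the wide table into one Observation per cell."""
    columns = [(period, sex) for period in PERIODS for sex in SEXES]
    observations = []
    for location, values in rows.items():
        for (period, sex), value in zip(columns, values):
            observations.append(Observation(period, location, sex, value))
    return observations


def slice_cube(observations, **fixed):
    """Keep the observations whose dimensions match every fixed value."""
    return [
        obs for obs in observations
        if all(getattr(obs, name) == wanted for name, wanted in fixed.items())
    ]


cube = flatten(ROWS)
cardiff_female = slice_cube(cube, location="Cardiff", sex="Female")
print([obs.value for obs in cardiff_female])
```

    Fixing two of the three dimensions leaves a series along the remaining one (here, Time), which is the sense in which the table is a cube rather than a flat grid.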
  3. Statistical Data on the Web (Characteristics)

    • Decentralized
    • Heterogeneous
    • Structured
    • High volume
    • Formats (e.g., CSV, Excel, PC-Axis, SDMX-ML, XML)
    • Distribution and Access

    Clean? Synchronised? Comparable? Provenance? Trustable? Analyses?

  4. A Linked Dataspace

    from Statistical Linked Dataspaces

  5. Statistical Linked Dataspaces

    .. from Government, IGO, NGO data

  6. Statistical Linked Dataspaces (2010-2011)

  7. Galway City page on DataGovIE

    Screenshot of Galway City page on DataGovIE
  8. School Explorer

    School Explorer Pilot screenshot
  9. Statistical Linked Dataspaces (2012)

  10. Central government debt indicator chart view for some countries

    Screenshot of World Bank indicator GC.DOD.TOTL.GD.ZS for countries CA,US,DE,CH,IE in worldbank.270a.info/view

    See also: http://worldbank.270a.info/view?indicator=GC.DOD.TOTL.GD.ZS&country=CA,US,DE,CH,IE

  11. Statistical Linked Dataspaces (2013)

    Source format? SDMX-ML

  12. Statistical Linked Dataspaces (2014+)

  13. External adoption of Linked SDMX

    • Swiss Federal Statistics Office / Bern University of Applied Sciences (pilot)
    • Italian National Institute of Statistics / SpazioDati (pilot)
    • LOD2 Statistical Workbench
    • ?
  14. Interlinking

  15. Statistical Linked Data vocabularies

    • RDF Data Cube: Data structure definitions, code lists, datasets, ..
    • SKOS: Code lists, and concepts can be reused
    • XKOS: Hierarchical concept schemes
    • VoID: vocabulary for dataset metadata
    • PROV-O: provenance
    • British reference periods, DC Terms, FOAF, ..
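
    To make the vocabulary list concrete, here is a hedged sketch that serialises one cell of the earlier hypercube table as an RDF Data Cube observation in Turtle. The `qb:`, `sdmx-dimension:`, `sdmx-measure:`, and `sdmx-code:` terms are from the published vocabularies; the `ex:` namespace, the resource names, and the helper `observation_turtle` are assumptions made for illustration, and the period is kept as a plain literal rather than a reference-period URI to keep the example short.

```python
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "sdmx-dimension": "http://purl.org/linked-data/sdmx/2009/dimension#",
    "sdmx-measure": "http://purl.org/linked-data/sdmx/2009/measure#",
    "sdmx-code": "http://purl.org/linked-data/sdmx/2009/code#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/",  # hypothetical namespace, not a 270a.info URI
}


def observation_turtle(obs_id, dataset_id, area_id, period, sex_code, value):
    """Return a Turtle document describing a single qb:Observation."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"ex:{obs_id} a qb:Observation ;")
    lines.append(f"    qb:dataSet ex:{dataset_id} ;")
    lines.append(f"    sdmx-dimension:refArea ex:{area_id} ;")
    # Simplification: a string literal instead of a reference-period resource.
    lines.append(f'    sdmx-dimension:refPeriod "{period}" ;')
    lines.append(f"    sdmx-dimension:sex sdmx-code:{sex_code} ;")
    lines.append(f'    sdmx-measure:obsValue "{value}"^^xsd:decimal .')
    return "\n".join(lines)


turtle = observation_turtle(
    obs_id="obs-cardiff-f-2004-2006",
    dataset_id="dataset-example",
    area_id="Cardiff",
    period="2004-2006",
    sex_code="sex-F",
    value=83.3,
)
print(turtle)
```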
  16. Provenance

  17. Provenance

    PROV-O Key Concepts

  18. 270a Cloud (Statistical Linked Dataspaces)

  19. Triples count

  20. Some numbers

    • Over 1 billion triples in the 270a Cloud
    • Regression analysis: over 100 billion (estimate)
    • How about other analysis?
  21. Interesting queries?

    • Number of people born in Bern before 1900
    • Inflation rate in Italy when the prime minister was ...
    • Development projects in low-middle income countries situated above the equator
  22. How about interesting analysis?

    • Statistically significant analysis of GDP and mortality rate
    • Strong correlations
    • Predicting or forecasting
    • Investigating the WHYs
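
    As a small illustration of the kind of analysis meant here, the sketch below computes a Pearson correlation and a least-squares line. It reuses the 2004-2006 Male and Female columns from the hypercube table earlier in the deck purely as sample input; four points are far too few to claim statistical significance, so this only shows the mechanics. The function names are illustrative.

```python
from math import sqrt

# 2004-2006 values for Newport, Cardiff, Monmouthshire, Merthyr Tydfil,
# copied from the hypercube table earlier in the deck.
male = [76.7, 78.7, 76.6, 75.5]
female = [80.7, 83.3, 81.3, 79.1]


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


r = pearson_r(male, female)
slope, intercept = least_squares(male, female)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```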
  23. stats.270a.info

    Citizen-centric interfaces for statistical stuff.

    Intended for data journalists, researchers, non-developers!

    ... and Linked Data friendly.

  24. Analysis user-interface (Plot) 1/3

    http://stats.270a.info/analysis/worldbank:SP.DYN.IMRT.IN/transparency:CPI2009/year:2009

  25. Analysis user-interface (Summary) 2/3

    http://stats.270a.info/analysis/worldbank:SP.DYN.IMRT.IN/transparency:CPI2009/year:2009

  26. Oh yeah?

    Provenance user-interface

  27. Analysis user-interface (Provenance) 3/3

    http://stats.270a.info/provenance/fa698e46868fe348865678884e89ef84b0be6c64

  28. Adding some context?

  29. stats.270a.info Toolkit

    • Shiny server (node)
    • R (Shiny, SPARQL packages)
    • Jena Fuseki
    • Apache
    • Linked Data Pages
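
    The toolkit sits on a SPARQL endpoint (Jena Fuseki), so any client can reach the same data over HTTP. The sketch below only builds the request URL for a query sent via GET, as described in the SPARQL 1.1 Protocol; it does not contact a server. The endpoint address is a placeholder and the counting query is an assumed example, not one taken from the talk.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint address

# Assumed example query: count the Data Cube observations in the store.
COUNT_OBSERVATIONS = """\
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT (COUNT(?obs) AS ?n)
WHERE { ?obs a qb:Observation }
"""


def sparql_get_url(endpoint, query):
    """Build the URL for a SPARQL query sent via HTTP GET (?query=...)."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, COUNT_OBSERVATIONS)
print(url)
```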
  30. Challenges and Goals

    • Federated querying of statistical data
    • UI and URI design
    • Creating statistical (cultural) artefacts
    • Finding purpose
  31. Let's take another step back

  32. Identifying things

    "Now! That should clear up a few things around here." (The Far Side)
  33. Linked Statistical (Cultural) Artefacts

    • Dataset: http://worldbank.270a.info/dataset/world-bank-finances
    • Observation: http://ecb.270a.info/dataset/SEE/A/AT/WBR0/EXT/X/E/2011
    • Dimension: http://oecd.270a.info/dimension/1.0/TIME
    • Measure: http://ecb.270a.info/measure/1.0/OBS_VALUE
    • Attribute: http://transparency.270a.info/classification/attribute/matching-percentiles
    • Concept: http://imf.270a.info/concept/1.0/PGI/REF_AREA
    • Code list: http://fao.270a.info/code/0.1/CL_UN_COUNTRY
    • Hierarchical code list: http://bfs.270a.info/code/1.0/HR_HGDE_HIST
    • Regression Analysis: http://stats.270a.info/analysis/worldbank:GC.DOD.TOTL.GD.ZS/transparency:CPI2009/year:2009

    Cool URIs? 1, 5, 100, 10000 years? Ha!
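
    The regression analysis URI above appears to encode its inputs directly in the path: two `dataset:indicator` pairs followed by a `year:value` segment. The sketch below parses that shape. The structure is inferred from the URIs shown in these slides rather than from any documented scheme, and the keys `first`, `second`, and `period` are hypothetical names, since the slides do not say which variable plays which role.

```python
from urllib.parse import urlsplit


def split_pair(segment):
    """Split 'prefix:local' at the first colon."""
    prefix, sep, local = segment.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"expected 'prefix:value', got {segment!r}")
    return prefix, local


def parse_analysis_uri(uri):
    """Parse /analysis/{a}/{b}/{period} where each part is 'prefix:value'."""
    parts = urlsplit(uri).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "analysis":
        raise ValueError(f"not an analysis URI: {uri!r}")
    _, first, second, period = parts
    return {
        "first": split_pair(first),
        "second": split_pair(second),
        "period": split_pair(period),
    }


parsed = parse_analysis_uri(
    "http://stats.270a.info/analysis/"
    "worldbank:GC.DOD.TOTL.GD.ZS/transparency:CPI2009/year:2009"
)
print(parsed)
```

    Keeping the inputs visible in the path is what lets an analysis be identified, linked to, and revisited like any other resource in the list.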

  34. Linked Statistical Dataspaces and Analysis

    Sarven Capadisli

    http://csarven.ca/#i

    @csarven

  35. Credits

  36. Credits