  1. Statistical Linked Open Data and Analysis

    ConFoo, Montréal, 2014-02-26

    #LinkedData #ConFoo

    Sarven Capadisli, http://csarven.ca/#i, @csarven

  2. Statistical Data

    Data Cube Life expectancy
    • Dimensions: Time, Location, Sex
    • Measure: values
    • Attributes: unit of measurement, accuracy
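
    To make the cube structure concrete, here is a minimal Python sketch of observations keyed by the three dimensions above. The `Observation` class, the `slice_cube` helper, and all values are hypothetical placeholders for illustration, not real life-expectancy figures.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One cell of a data cube."""
    dimensions: dict                                # identify the cell: time, location, sex
    measure: float                                  # the observed value
    attributes: dict = field(default_factory=dict)  # qualify the value: unit, accuracy


def slice_cube(observations, **fixed):
    """Return the observations whose dimensions match every fixed value."""
    return [
        o for o in observations
        if all(o.dimensions.get(name) == value for name, value in fixed.items())
    ]


# Placeholder values only (assumption): these are NOT real statistics.
cube = [
    Observation({"time": "2010", "location": "CA", "sex": "F"}, 1.0, {"unit": "years"}),
    Observation({"time": "2010", "location": "CA", "sex": "M"}, 2.0, {"unit": "years"}),
    Observation({"time": "2010", "location": "IE", "sex": "F"}, 3.0, {"unit": "years"}),
]

canada = slice_cube(cube, location="CA")
canada_female = slice_cube(cube, location="CA", sex="F")
```

    Fixing one or more dimensions yields a slice; the measure and attributes travel with each observation.
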
  3. Statistical Data on the Web (Characteristics)

    • Heterogeneous
    • Decentralized
    • Structured
    • High volume
    • Formats (e.g., CSV, Excel, PC-Axis, SDMX-ML, XML)

    Clean? Synchronised? Comparable? Provenance? Trustworthy? Analyses?

  4. Technology Context

    Figure of Semantic Web Technologies
  5. Linked Data Design Principles

    1. URIs as names for things
    2. HTTP URIs so that people can look up those names
    3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
    4. Include links to other URIs for discovery
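
    Principles 2 and 3 amount to an HTTP lookup with content negotiation. The sketch below only prepares such a request and does not send it. The `build_lookup` helper is hypothetical, and that the server answers `text/turtle` for this URI is an assumption.

```python
import urllib.request


def build_lookup(uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET that asks for an RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")


# The URI is taken from the deck; the preferred media type is an assumption.
request = build_lookup("http://oecd.270a.info/property/TIME")

# Sending it would be: urllib.request.urlopen(request)  (needs network access)
```
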
  6. A Linked Dataspace

    from Statistical Linked Dataspaces

  7. Statistical Linked Dataspaces (2010-2011)

  8. Galway City page on DataGovIE

    Screenshot of Galway City page on DataGovIE
  9. School Explorer

    School Explorer Pilot screenshot
  10. Statistical Linked Dataspaces (2012)

  11. Central government debt indicator chart view for some countries

    Screenshot of World Bank indicator GC.DOD.TOTL.GD.ZS for countries CA,US,DE,CH,IE in worldbank.270a.info/view

    See also: http://worldbank.270a.info/view?indicator=GC.DOD.TOTL.GD.ZS&country=CA,US,DE,CH,IE

  12. Statistical Linked Dataspaces (2013)

    Source format? SDMX-ML

  13. Statistical Linked Dataspaces (2014+)

  14. 270a Cloud (Statistical Linked Dataspaces)

  15. Interesting queries?

    • Number of people born in Bern before 1900
    • Inflation rate in Italy when the prime minister was ...
    • Development projects in low-middle income countries situated above the equator
  16. How about interesting analysis?

    • Statistically significant analysis of GDP and mortality rate
    • Strong correlations
    • Predicting or forecasting possible outcomes
    • Investigating the WHYs
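
    As a rough illustration of what "strong correlations" means computationally, the sketch below computes a Pearson correlation coefficient and a least-squares line in plain Python. It is not the code behind stats.270a.info; the function names are hypothetical and the two series are made-up placeholders, not real indicator values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder series (assumption): perfectly anti-correlated toy data.
indicator_a = [1.0, 2.0, 3.0, 4.0]
indicator_b = [8.0, 6.0, 4.0, 2.0]

r = pearson_r(indicator_a, indicator_b)
slope, intercept = least_squares(indicator_a, indicator_b)
```

    A coefficient near -1 or 1 flags a strong linear relationship; it says nothing about the WHYs.
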
  17. stats.270a.info

    Citizen-centric interfaces for statistical stuff.

    Intended for data journalists, researchers, non-developers!

    ... and Linked Data friendly.

  18. Analysis user-interface (Plot) 1/3

    http://stats.270a.info/analysis/worldbank:SP.DYN.IMRT.IN/transparency:CPI2009/year:2009

  19. Analysis user-interface (Summary) 2/3

    http://stats.270a.info/analysis/worldbank:SP.DYN.IMRT.IN/transparency:CPI2009/year:2009

  20. Oh yeah?

    Provenance user-interface

  21. Analysis user-interface (Provenance) 3/3

    http://stats.270a.info/provenance/fa698e46868fe348865678884e89ef84b0be6c64

  22. stats.270a.info Toolkit

    • Shiny server (node)
    • R (Shiny, SPARQL packages)
    • Jena Fuseki
    • Apache
    • Linked Data Pages
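
    The toolkit pulls data from SPARQL endpoints such as Fuseki. The sketch below shows the shape of such a request using the SPARQL protocol's `query` parameter, with Python standing in for the R SPARQL package. The endpoint URL and the `sparql_get_url` helper are assumptions; the `qb:` prefix is the RDF Data Cube vocabulary and the dataset URI appears elsewhere in this deck.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint location; the deck does not state it.
ENDPOINT = "http://worldbank.270a.info/sparql"

# List a few observations that belong to one dataset.
QUERY = """PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?observation
WHERE {
  ?observation a qb:Observation ;
               qb:dataSet <http://worldbank.270a.info/dataset/world-bank-finances> .
}
LIMIT 10"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL protocol GET URL with the query percent-encoded."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)

# Round-trip check: decoding the URL recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```
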
  23. So What?

    • Strengthening trust
    • Better data journalism?
    • Discovery of interesting correlations
    • Uncovering insights, making predictions, ... decisions
    • Bulk pre-analysis
    • Production of new statistical artefacts
  24. Consider the following

  25. Identifying things

    "Now! That should clear up a few things around here." (The Far Side)
  26. Linked Statistical Artefacts

    • Dataset: http://worldbank.270a.info/dataset/world-bank-finances
    • Observation: http://ecb.270a.info/dataset/SEE/A/AT/WBR0/EXT/X/E/2011
    • Dimension: http://oecd.270a.info/property/TIME
    • Measure: http://ecb.270a.info/property/OBS_VALUE
    • Attribute: http://transparency.270a.info/classification/attribute/matching-percentiles
    • Concept: http://imf.270a.info/code/1.0/CL_AREA/CH
    • Code list: http://fao.270a.info/code/0.1/CL_UN_COUNTRY
    • Hierarchical code list: http://bfs.270a.info/code/1.0/HR_HGDE_HIST
    • Regression Analysis: http://stats.270a.info/analysis/worldbank:GC.DOD.TOTL.GD.ZS/transparency:CPI2009/year:2009

    Cool URIs? 1, 5, 100, 10000 years? Ha!
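
    The analysis URI above packs its inputs into the path. A minimal sketch of reading them back out follows; the `parse_analysis_uri` helper and the labels `x`, `y`, and `period` are hypothetical names chosen for illustration, inferred only from the visible URI shape.

```python
from urllib.parse import urlsplit


def parse_analysis_uri(uri):
    """Split /analysis/<prefix:id>/<prefix:id>/<prefix:value> into labelled pairs."""
    parts = urlsplit(uri).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "analysis":
        raise ValueError("not an analysis URI: " + uri)

    def split_pair(segment):
        # "worldbank:GC.DOD.TOTL.GD.ZS" -> ("worldbank", "GC.DOD.TOTL.GD.ZS")
        prefix, _, local = segment.partition(":")
        return prefix, local

    # The labels below are assumptions, not names used by stats.270a.info.
    return {
        "x": split_pair(parts[1]),
        "y": split_pair(parts[2]),
        "period": split_pair(parts[3]),
    }


analysis = parse_analysis_uri(
    "http://stats.270a.info/analysis/worldbank:GC.DOD.TOTL.GD.ZS/transparency:CPI2009/year:2009"
)
```

    Because every input is visible in the identifier, the analysis itself becomes a citable, linkable artefact.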

  27. What will be your artefacts?

  28. Consider the following

    • Artefacts, Artefacts, Artefacts
    • Citizen-centric interfaces
    • Provenance
    • Discoverability
    • Comparability
  29. Statistical Linked Open Data and Analysis

    Sarven Capadisli

    http://csarven.ca/#i

    @csarven

  30. Credits
