linear models
statistics, probabilistic inference, model construction
references
- https://en.wikipedia.org/wiki/Generalized_linear_model#Link_function
- Gelman2006, Gelman2014, McElreath2019, Barber2012, Bishop2007
test remote jupyter kernel
#+begin_src julia
x = "foo"
y = "bar"
println(x)
println(y)
#+end_src
examples
In most cases we do not need to modify the default =:results output verbatim replace= in the =:header-args:=; see Header arguments in the Org Babel reference card.
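For the occasional block that does need different behaviour, the header arguments can be overridden per block. A hypothetical sketch (the =:session= name and =:exports= value here are illustrative, not ones used elsewhere in these notes):

#+begin_example
,#+begin_src julia :session jl :results output verbatim replace :exports both
println("hello")
,#+end_src
#+end_example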
julia tests
#+begin_src julia
x = "foo"
y = "bar"
println(x)
println(y)
#+end_src

#+RESULTS:
: foo
: bar
Since =jupyter-julia= supports sessions, we can reuse the variables defined in the previous block.

#+begin_src julia
print(join([x, " ", y]))
#+end_src

#+RESULTS:
: foo bar
load common libraries
Load libraries
#+begin_src julia
@time begin
    using Dates
    using Turing
    using CSV, DataFrames, Distributions
    using MCMCChains, StatsFuns, StatsPlots
end
now()
#+end_src

#+RESULTS:
: 35.783676 seconds (79.42 M allocations: 4.116 GiB, 4.30% gc time)
: 2020-10-29T21:39:30.7
Print list of loaded modules
#+begin_src julia
loaded_modules = Base.loaded_modules_array()
println(loaded_modules)
now()
#+end_src
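The raw module array comes back in no particular order; a small sketch (an assumption of mine, not part of the original notes) that prints the names sorted, one per line, which is easier to scan than the flat array:

#+begin_src julia
# Sort loaded module names alphabetically before printing.
loaded_modules = Base.loaded_modules_array()
module_names = sort(string.(loaded_modules))
foreach(println, module_names)
#+end_src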
models
testing
model definition
See load libraries for those used across examples.
#+begin_src julia
@time begin
end
now()
#+end_src

Print the current working directory

#+begin_src julia
println(pwd())
now()
#+end_src

#+RESULTS:
: /Users/crs58/projects/notes/org
: 2020-10-17T11:24:44.226

Define the Turing.jl model

#+begin_src julia
@time begin
    # Define a simple Normal model with unknown mean and variance.
    @model gdemo(x, y) = begin
        s ~ InverseGamma(2, 3)
        m ~ Normal(0, sqrt(s))
        x ~ Normal(m, sqrt(s))
        y ~ Normal(m, sqrt(s))
    end
end
now()
#+end_src

#+RESULTS:
: 0.000082 seconds (85 allocations: 7.407 KiB)
: 2020-10-29T21:39:51.823
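As a quick sanity check of the generative story behind the model, here is a sketch using only Base Julia: x and y are each Normal(m, sqrt(s)), so with s and m fixed by hand (rather than drawn from their InverseGamma and Normal priors) we can simulate one data pair the model could have generated:

#+begin_src julia
using Random
Random.seed!(1)

# Fix s and m by hand; gdemo would instead draw them from the priors.
s, m = 1.5, 1.0
x = m + sqrt(s) * randn()
y = m + sqrt(s) * randn()
println((x, y))
#+end_src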
sampling
Run and profile the sampler
#+begin_src julia
# Run sampler, collect results
@time chn = sample(gdemo(1.5, 2), NUTS(0.65), 1000)
now()
#+end_src

#+RESULTS:
#+begin_example
┌ Info: Found initial step size
│   ϵ = 1.6
└ @ Turing.Inference /Users/crs58/.julia/packages/Turing/RzDvB/src/inference/hmc.jl:625
Sampling: 100%|█████████████████████████████████████████| Time: 0:00:00
 16.546298 seconds (42.02 M allocations: 2.128 GiB, 4.43% gc time)
2020-10-29T21:40:08.642
#+end_example

Print the result summary

#+begin_src julia
# Summarise results (currently requires the master branch from MCMCChains)
display(chn)
# describe(chn; sections = :internals)
# show(stdout, "text/plain", chn)
now()
#+end_src

#+RESULTS:
#+begin_example
Chains MCMC chain (500×14×1 Array{Float64,3}):

Iterations        = 1:500
Thinning interval = 1
Chains            = 1
Samples per chain = 500
parameters        = m, s
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64

           m    1.2898    0.8406     0.0376    0.0795   138.8908    1.0138
           s    1.9378    1.6108     0.0720    0.1375   109.5077    1.0028

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           m   -0.1236    0.7457    1.2822    1.7125    2.9355
           s    0.6235    1.1054    1.5581    2.1717    5.7594

2020-10-29T21:40:39.455
#+end_example
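The =naive_se= and =mcse= columns can be related back to =std= and =ess= with a back-of-the-envelope calculation: the naive SE treats draws as independent (std divided by the square root of the number of draws), while a rough Monte Carlo SE divides by the square root of the effective sample size instead, to account for autocorrelation. A sketch using the values for m from the summary above (MCMCChains' reported mcse uses a different estimator, so only the naive value matches the table exactly):

#+begin_src julia
# Values for parameter m, copied from the summary table above.
m_std, n_draws, m_ess = 0.8406, 500, 138.8908

naive_se = m_std / sqrt(n_draws)   # ≈ 0.0376, matching the table
mcse_rough = m_std / sqrt(m_ess)   # ≈ 0.071; the table reports 0.0795
println((naive_se, mcse_rough))
#+end_src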
plotting
statsplots
Plot the results
#+begin_src julia
# Plot and save results
p = plot(chn)
savefig("data/linear_models/basic-example-plot.pdf")
savefig("data/linear_models/basic-example-plot.png")
#+end_src
arviz
Load additional libraries
#+begin_src julia
using ArviZ, PyPlot, PyCall
now()
#+end_src

#+RESULTS:
: 2020-10-29T21:40:50.04

The following may be useful if the Latin Modern fonts are not installed or set up to work with matplotlib.

#+begin_src julia
mpl = pyimport("matplotlib")
mplfm = pyimport("matplotlib.font_manager")
mplfm.fontManager.addfont("~/Library/Fonts/lmsans10-regular.otf")
mplfm.fontManager.addfont("~/Library/Fonts/lmroman10-regular.otf")
mplfm.findfont("Latin Modern Sans")
mplfm._rebuild()
print(mpl.rcParams)
#+end_src

Make the InferenceData object

#+begin_src julia
idata = from_mcmcchains(chn, library = "Turing")
#+end_src

#+RESULTS:
: InferenceData with groups:
: 	> posterior
: 	> sample_stats

Print posterior and sample_stats from the InferenceData object

#+begin_src julia
print(idata.posterior)
print(idata.sample_stats)
now()
#+end_src

#+RESULTS:
#+begin_example
Dataset (xarray.Dataset)
Dimensions:  (chain: 1, draw: 500)
Coordinates:
  * chain    (chain) int64 0
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 ... 492 493 494 495 496 497 498 499
Data variables:
    m        (chain, draw) float64 1.589 1.183 3.026 ... -0.02606 1.485 -0.3521
    s        (chain, draw) float64 1.985 1.348 5.464 1.353 ... 1.573 1.399 4.237
Attributes:
    created_at:         2020-10-18T18:52:46.016452
    arviz_version:      0.9.0
    inference_library:  Turing
Dataset (xarray.Dataset)
Dimensions:           (chain: 1, draw: 500)
Coordinates:
  * chain             (chain) int64 0
  * draw              (draw) int64 0 1 2 3 4 5 6 ... 493 494 495 496 497 498 499
Data variables:
    energy            (chain, draw) float64 5.718 5.12 8.651 ... 5.826 8.27
    energy_error      (chain, draw) float64 0.115 -0.09808 ... -0.2852 0.4234
    depth             (chain, draw) int64 2 2 2 2 1 2 2 2 2 ... 2 1 2 2 2 1 2 2
    diverging         (chain, draw) bool False False False ... False False False
    log_density       (chain, draw) float64 -5.151 -4.634 -8.2 ... -4.762 -7.393
    max_energy_error  (chain, draw) float64 0.2966 -0.09808 ... -0.2852 0.4626
    nom_step_size     (chain, draw) float64 0.7675 0.7675 ... 0.7675 0.7675
    is_accept         (chain, draw) float64 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    tree_size         (chain, draw) int64 7 3 3 7 1 3 7 3 3 ... 3 1 3 3 3 1 7 3
    lp                (chain, draw) float64 -5.151 -4.634 -8.2 ... -4.762 -7.393
    step_size         (chain, draw) float64 0.7675 0.7675 ... 0.7675 0.7675
    mean_tree_accept  (chain, draw) float64 0.8756 1.0 0.5808 ... 0.985 0.6447
Attributes:
    created_at:         2020-10-18T18:52:46.263700
    arviz_version:      0.9.0
    inference_library:  Turing
2020-10-18T14:55:35.997
#+end_example

Plot the trace

#+begin_src julia
# Plot and save results
p_arviz_trace = plot_trace(idata)
PyPlot.savefig("data/linear_models/normal-model-arviz-trace.pdf")
PyPlot.savefig("data/linear_models/normal-model-arviz-trace.png")
#+end_src