linear models
statistics, probabilistic inference, model construction
references
- https://en.wikipedia.org/wiki/Generalized_linear_model#Link_function
- Gelman2006, Gelman2014, McElreath2019, Barber2012, Bishop2007
test remote jupyter kernel
#+begin_src julia
x = "foo"
y = "bar"
println(x)
println(y)
#+end_src
examples
In most cases we do not need to modify the default =:results output verbatim replace= in the =:header-args:=; see Header arguments in the Org Babel reference card.
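For the occasional block that does need different behaviour, the header arguments can be overridden per block. A hypothetical sketch (the =:session= name and =:exports= value here are illustrative, not ones used elsewhere in these notes):

#+begin_example
,#+begin_src julia :session jl :results output verbatim replace :exports both
println("hello")
,#+end_src
#+end_example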
julia tests
#+begin_src julia
x = "foo"
y = "bar"
println(x)
println(y)
#+end_src

#+RESULTS:
: foo
: bar
Since =jupyter-julia= supports sessions, we can reuse the variables defined in the previous block.

#+begin_src julia
print(join([x, " ", y]))
#+end_src

#+RESULTS:
: foo bar
load common libraries
Load libraries
#+begin_src julia
@time begin
    using Dates
    using Turing
    using CSV, DataFrames, Distributions
    using MCMCChains, StatsFuns, StatsPlots
end
now()
#+end_src

#+RESULTS:
: 35.783676 seconds (79.42 M allocations: 4.116 GiB, 4.30% gc time)
: 2020-10-29T21:39:30.7
Print list of loaded modules
#+begin_src julia
loaded_modules = Base.loaded_modules_array()
println(loaded_modules)
now()
#+end_src
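The raw module array comes back in no particular order; a small sketch (an assumption of mine, not part of the original notes) that prints the names sorted, one per line, which is easier to scan than the flat array:

#+begin_src julia
# Sort loaded module names alphabetically before printing.
loaded_modules = Base.loaded_modules_array()
module_names = sort(string.(loaded_modules))
foreach(println, module_names)
#+end_src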
models
testing
model definition
See load libraries for those used across examples.
#+begin_src julia
@time begin
end
now()
#+end_src

Print the current working directory

#+begin_src julia
println(pwd())
now()
#+end_src

#+RESULTS:
: /Users/crs58/projects/notes/org
: 2020-10-17T11:24:44.226

Define the Turing.jl model

#+begin_src julia
@time begin
    # Define a simple Normal model with unknown mean and variance.
    @model gdemo(x, y) = begin
        s ~ InverseGamma(2, 3)
        m ~ Normal(0, sqrt(s))
        x ~ Normal(m, sqrt(s))
        y ~ Normal(m, sqrt(s))
    end
end
now()
#+end_src

#+RESULTS:
: 0.000082 seconds (85 allocations: 7.407 KiB)
: 2020-10-29T21:39:51.823
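As a quick sanity check of the generative story behind the model, here is a sketch using only Base Julia: x and y are each Normal(m, sqrt(s)), so with s and m fixed by hand (rather than drawn from their InverseGamma and Normal priors) we can simulate one data pair the model could have generated:

#+begin_src julia
using Random
Random.seed!(1)

# Fix s and m by hand; gdemo would instead draw them from the priors.
s, m = 1.5, 1.0
x = m + sqrt(s) * randn()
y = m + sqrt(s) * randn()
println((x, y))
#+end_src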
sampling
Run and profile the sampler
#+begin_src julia
# Run sampler, collect results
@time chn = sample(gdemo(1.5, 2), NUTS(0.65), 1000)
now()
#+end_src

#+RESULTS:
#+begin_example
┌ Info: Found initial step size
│   ϵ = 1.6
└ @ Turing.Inference /Users/crs58/.julia/packages/Turing/RzDvB/src/inference/hmc.jl:625
Sampling: 100%|█████████████████████████████████████████| Time: 0:00:00
 16.546298 seconds (42.02 M allocations: 2.128 GiB, 4.43% gc time)
2020-10-29T21:40:08.642
#+end_example

Print the result summary

#+begin_src julia
# Summarise results (currently requires the master branch from MCMCChains)
display(chn)
# describe(chn; sections = :internals)
# show(stdout, "text/plain", chn)
now()
#+end_src

#+RESULTS:
#+begin_example
Chains MCMC chain (500×14×1 Array{Float64,3}):

Iterations        = 1:500
Thinning interval = 1
Chains            = 1
Samples per chain = 500
parameters        = m, s
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64

           m    1.2898    0.8406     0.0376    0.0795   138.8908    1.0138
           s    1.9378    1.6108     0.0720    0.1375   109.5077    1.0028

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           m   -0.1236    0.7457    1.2822    1.7125    2.9355
           s    0.6235    1.1054    1.5581    2.1717    5.7594

2020-10-29T21:40:39.455
#+end_example
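The =naive_se= and =mcse= columns can be related back to =std= and =ess= with a back-of-the-envelope calculation: the naive SE treats draws as independent (std divided by the square root of the number of draws), while a rough Monte Carlo SE divides by the square root of the effective sample size instead, to account for autocorrelation. A sketch using the values for m from the summary above (MCMCChains' reported mcse uses a different estimator, so only the naive value matches the table exactly):

#+begin_src julia
# Values for parameter m, copied from the summary table above.
m_std, n_draws, m_ess = 0.8406, 500, 138.8908

naive_se = m_std / sqrt(n_draws)   # ≈ 0.0376, matching the table
mcse_rough = m_std / sqrt(m_ess)   # ≈ 0.071; the table reports 0.0795
println((naive_se, mcse_rough))
#+end_src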
plotting
statsplots
Plot the results
#+begin_src julia
# Plot and save results
p = plot(chn)
savefig("data/linear_models/basic-example-plot.pdf")
savefig("data/linear_models/basic-example-plot.png")
#+end_src
arviz
Load additional libraries
#+begin_src julia
using ArviZ, PyPlot, PyCall
now()
#+end_src

#+RESULTS:
: 2020-10-29T21:40:50.04

The following may be useful if the Latin Modern fonts are not installed or set up to work with matplotlib.

#+begin_src julia
mpl = pyimport("matplotlib")
mplfm = pyimport("matplotlib.font_manager")
mplfm.fontManager.addfont("~/Library/Fonts/lmsans10-regular.otf")
mplfm.fontManager.addfont("~/Library/Fonts/lmroman10-regular.otf")
mplfm.findfont("Latin Modern Sans")
mplfm._rebuild()
print(mpl.rcParams)
#+end_src

Make the InferenceData object

#+begin_src julia
idata = from_mcmcchains(chn, library = "Turing")
#+end_src

#+RESULTS:
: InferenceData with groups:
: 	> posterior
: 	> sample_stats

Print posterior and sample_stats from the InferenceData object

#+begin_src julia
print(idata.posterior)
print(idata.sample_stats)
now()
#+end_src

#+RESULTS:
#+begin_example
Dataset (xarray.Dataset)
Dimensions:  (chain: 1, draw: 500)
Coordinates:
  * chain    (chain) int64 0
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 ... 492 493 494 495 496 497 498 499
Data variables:
    m        (chain, draw) float64 1.589 1.183 3.026 ... -0.02606 1.485 -0.3521
    s        (chain, draw) float64 1.985 1.348 5.464 1.353 ... 1.573 1.399 4.237
Attributes:
    created_at:         2020-10-18T18:52:46.016452
    arviz_version:      0.9.0
    inference_library:  Turing
Dataset (xarray.Dataset)
Dimensions:           (chain: 1, draw: 500)
Coordinates:
  * chain             (chain) int64 0
  * draw              (draw) int64 0 1 2 3 4 5 6 ... 493 494 495 496 497 498 499
Data variables:
    energy            (chain, draw) float64 5.718 5.12 8.651 ... 5.826 8.27
    energy_error      (chain, draw) float64 0.115 -0.09808 ... -0.2852 0.4234
    depth             (chain, draw) int64 2 2 2 2 1 2 2 2 2 ... 2 1 2 2 2 1 2 2
    diverging         (chain, draw) bool False False False ... False False False
    log_density       (chain, draw) float64 -5.151 -4.634 -8.2 ... -4.762 -7.393
    max_energy_error  (chain, draw) float64 0.2966 -0.09808 ... -0.2852 0.4626
    nom_step_size     (chain, draw) float64 0.7675 0.7675 ... 0.7675 0.7675
    is_accept         (chain, draw) float64 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    tree_size         (chain, draw) int64 7 3 3 7 1 3 7 3 3 ... 3 1 3 3 3 1 7 3
    lp                (chain, draw) float64 -5.151 -4.634 -8.2 ... -4.762 -7.393
    step_size         (chain, draw) float64 0.7675 0.7675 ... 0.7675 0.7675
    mean_tree_accept  (chain, draw) float64 0.8756 1.0 0.5808 ... 0.985 0.6447
Attributes:
    created_at:         2020-10-18T18:52:46.263700
    arviz_version:      0.9.0
    inference_library:  Turing
2020-10-18T14:55:35.997
#+end_example

Plot the trace

#+begin_src julia
# Plot and save results
p_arviz_trace = plot_trace(idata)
PyPlot.savefig("data/linear_models/normal-model-arviz-trace.pdf")
PyPlot.savefig("data/linear_models/normal-model-arviz-trace.png")
#+end_src