notes

(index)

linear models

statistics, probabilistic inference, model construction

references

test remote jupyter kernel

x = "foo"
y = "bar"
println(x)
println(y)

examples

In most cases we do not need to modify the default :results output verbatim replace setting in :header-args:. See Header arguments in the Org Babel reference card.
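If a single block does need different behaviour, the defaults can be overridden on that block's own header line, which takes precedence over the file-level :header-args: property. A minimal sketch (the session name jl is a placeholder, not from this document):

```org
#+begin_src jupyter-julia :session jl :results output verbatim replace
println("hello")
#+end_src
```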

julia tests

x = "foo"
y = "bar"
println(x)
println(y)
foo
bar

Since jupyter-julia supports sessions, we can reuse the variables defined in the previous block.

print(join([x, " ", y]))
foo bar

load common libraries

Load libraries

@time begin
using Dates
using Turing
using CSV, DataFrames, Distributions
using MCMCChains, StatsFuns, StatsPlots
end
now()
 35.783676 seconds (79.42 M allocations: 4.116 GiB, 4.30% gc time)
2020-10-29T21:39:30.7

Print the list of loaded modules

loaded_modules = Base.loaded_modules_array()
println(loaded_modules)
now()

models

  • testing

    • model definition

See the load common libraries section for the libraries used across examples.


      Print current working directory

      println(pwd())
      now()
      
      /Users/crs58/projects/notes/org
      2020-10-17T11:24:44.226
      

      Define the Turing.jl model

      
      @time begin
      # Define a simple Normal model with unknown mean and variance.
      @model gdemo(x, y) = begin
          s ~ InverseGamma(2, 3)
          m ~ Normal(0, sqrt(s))
          x ~ Normal(m, sqrt(s))
          y ~ Normal(m, sqrt(s))
      end
      end
      now()
      
      
        0.000082 seconds (85 allocations: 7.407 KiB)
      2020-10-29T21:39:51.823
      
    • sampling

      Run and profile the sampler

      #  Run sampler, collect results
      @time chn = sample(gdemo(1.5, 2), NUTS(0.65), 1000)
      now()
      
      ┌ Info: Found initial step size
      │   ϵ = 1.6
      └ @ Turing.Inference /Users/crs58/.julia/packages/Turing/RzDvB/src/inference/hmc.jl:625
      Sampling: 100%|█████████████████████████████████████████| Time: 0:00:00
       16.546298 seconds (42.02 M allocations: 2.128 GiB, 4.43% gc time)
      2020-10-29T21:40:08.642
      

      Print the result summary

      # Summarise results (currently requires the master branch from MCMCChains)
      display(chn)
      # describe(chn; sections = :internals)
      # show(stdout, "text/plain", chn)
      now()
      
      Chains MCMC chain (500×14×1 Array{Float64,3}):
      
      Iterations        = 1:500
      Thinning interval = 1
      Chains            = 1
      Samples per chain = 500
      parameters        = m, s
      internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth
      
      Summary Statistics
        parameters      mean       std   naive_se      mcse        ess      rhat
            Symbol   Float64   Float64    Float64   Float64    Float64   Float64
      
                 m    1.2898    0.8406     0.0376    0.0795   138.8908    1.0138
                 s    1.9378    1.6108     0.0720    0.1375   109.5077    1.0028
      
      Quantiles
        parameters      2.5%     25.0%     50.0%     75.0%     97.5%
            Symbol   Float64   Float64   Float64   Float64   Float64
      
                 m   -0.1236    0.7457    1.2822    1.7125    2.9355
                 s    0.6235    1.1054    1.5581    2.1717    5.7594
      
      2020-10-29T21:40:39.455
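
      Beyond the printed summary, individual statistics can be pulled out of
      the chain programmatically. A minimal sketch, assuming the chn object
      from the sampling step above (indexing a chain by parameter symbol is
      standard MCMCChains usage; the exact numbers depend on the run):

      ```julia
      using Statistics  # for mean and quantile

      # Extract the posterior draws for parameter m as a flat vector.
      m_draws = vec(chn[:m])

      # Posterior mean and a 95% credible interval.
      println(mean(m_draws))
      println(quantile(m_draws, [0.025, 0.5, 0.975]))
      ```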
      
    • plotting

      • statsplots

        Plot the results

        # Plot and save results
        p = plot(chn)
        savefig("data/linear_models/basic-example-plot.pdf")
        savefig("data/linear_models/basic-example-plot.png")
        
      • arviz

        Load additional libraries

        using ArviZ, PyPlot, PyCall
        now()
        
        2020-10-29T21:40:50.04
        

        The following may be useful if the Latin Modern fonts are not installed or set up to work with matplotlib.

        mpl = pyimport("matplotlib")
        mplfm = pyimport("matplotlib.font_manager")
        mplfm.fontManager.addfont("~/Library/Fonts/lmsans10-regular.otf")
        mplfm.fontManager.addfont("~/Library/Fonts/lmroman10-regular.otf")
        mplfm.findfont("Latin Modern Sans")
        mplfm._rebuild()
        print(mpl.rcParams)
        

        Make InferenceData object

        idata = from_mcmcchains(chn, library = "Turing")
        
        InferenceData with groups:
            > posterior
            > sample_stats
        

        Print posterior and sample_stats from InferenceData object

        print(idata.posterior)
        print(idata.sample_stats)
        now()
        
        Dataset (xarray.Dataset)
        Dimensions:  (chain: 1, draw: 500)
        Coordinates:
          * chain    (chain) int64 0
          * draw     (draw) int64 0 1 2 3 4 5 6 7 8 ... 492 493 494 495 496 497 498 499
        Data variables:
            m        (chain, draw) float64 1.589 1.183 3.026 ... -0.02606 1.485 -0.3521
            s        (chain, draw) float64 1.985 1.348 5.464 1.353 ... 1.573 1.399 4.237
        Attributes:
            created_at:         2020-10-18T18:52:46.016452
            arviz_version:      0.9.0
            inference_library:  Turing

        Dataset (xarray.Dataset)
        Dimensions:           (chain: 1, draw: 500)
        Coordinates:
          * chain             (chain) int64 0
          * draw              (draw) int64 0 1 2 3 4 5 6 ... 493 494 495 496 497 498 499
        Data variables:
            energy            (chain, draw) float64 5.718 5.12 8.651 ... 5.826 8.27
            energy_error      (chain, draw) float64 0.115 -0.09808 ... -0.2852 0.4234
            depth             (chain, draw) int64 2 2 2 2 1 2 2 2 2 ... 2 1 2 2 2 1 2 2
            diverging         (chain, draw) bool False False False ... False False False
            log_density       (chain, draw) float64 -5.151 -4.634 -8.2 ... -4.762 -7.393
            max_energy_error  (chain, draw) float64 0.2966 -0.09808 ... -0.2852 0.4626
            nom_step_size     (chain, draw) float64 0.7675 0.7675 ... 0.7675 0.7675
            is_accept         (chain, draw) float64 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
            tree_size         (chain, draw) int64 7 3 3 7 1 3 7 3 3 ... 3 1 3 3 3 1 7 3
            lp                (chain, draw) float64 -5.151 -4.634 -8.2 ... -4.762 -7.393
            step_size         (chain, draw) float64 0.7675 0.7675 ... 0.7675 0.7675
            mean_tree_accept  (chain, draw) float64 0.8756 1.0 0.5808 ... 0.985 0.6447
        Attributes:
            created_at:         2020-10-18T18:52:46.263700
            arviz_version:      0.9.0
            inference_library:  Turing
        
        2020-10-18T14:55:35.997
        

        Plot

        # Plot and save results
        p_arviz_trace = plot_trace(idata)
        PyPlot.savefig("data/linear_models/normal-model-arviz-trace.pdf")
        PyPlot.savefig("data/linear_models/normal-model-arviz-trace.png")
        
  • template

    • model definition

      See the load common libraries section for the libraries used across examples.

      Define the Turing.jl model

      
      @time begin
      # Define a simple Normal model with unknown mean and variance.
      @model gdemo(x, y) = begin
          s ~ InverseGamma(2, 3)
          m ~ Normal(0, sqrt(s))
          x ~ Normal(m, sqrt(s))
          y ~ Normal(m, sqrt(s))
      end
      end
      now()
      
      
        0.000082 seconds (85 allocations: 7.407 KiB)
      2020-10-29T21:39:51.823
      
    • sampling

      Run and profile the sampler

      #  Run sampler, collect results
      @time chn = sample(gdemo(1.5, 2), NUTS(0.65), 1000)
      now()
      
      ┌ Info: Found initial step size
      │   ϵ = 1.6
      └ @ Turing.Inference /Users/crs58/.julia/packages/Turing/RzDvB/src/inference/hmc.jl:625
      Sampling: 100%|█████████████████████████████████████████| Time: 0:00:00
       16.546298 seconds (42.02 M allocations: 2.128 GiB, 4.43% gc time)
      2020-10-29T21:40:08.642
      

      Print the result summary

      # Summarise results (currently requires the master branch from MCMCChains)
      display(chn)
      # describe(chn; sections = :internals)
      # show(stdout, "text/plain", chn)
      now()
      
      Chains MCMC chain (500×14×1 Array{Float64,3}):
      
      Iterations        = 1:500
      Thinning interval = 1
      Chains            = 1
      Samples per chain = 500
      parameters        = m, s
      internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth
      
      Summary Statistics
        parameters      mean       std   naive_se      mcse        ess      rhat
            Symbol   Float64   Float64    Float64   Float64    Float64   Float64
      
                 m    1.2898    0.8406     0.0376    0.0795   138.8908    1.0138
                 s    1.9378    1.6108     0.0720    0.1375   109.5077    1.0028
      
      Quantiles
        parameters      2.5%     25.0%     50.0%     75.0%     97.5%
            Symbol   Float64   Float64   Float64   Float64   Float64
      
                 m   -0.1236    0.7457    1.2822    1.7125    2.9355
                 s    0.6235    1.1054    1.5581    2.1717    5.7594
      
      2020-10-29T21:40:39.455
      
    • plotting

      Load additional libraries

      using ArviZ, PyPlot, PyCall
      now()
      
      2020-10-29T21:40:50.04
      

      Make InferenceData object

      idata = from_mcmcchains(chn, library = "Turing")
      
      InferenceData with groups:
          > posterior
          > sample_stats
      

      Print posterior and sample_stats from InferenceData object

      print(idata.posterior)
      print(idata.sample_stats)
      now()
      
      Dataset (xarray.Dataset)
      Dimensions:  (chain: 1, draw: 500)
      Coordinates:
        * chain    (chain) int64 0
        * draw     (draw) int64 0 1 2 3 4 5 6 7 8 ... 492 493 494 495 496 497 498 499
      Data variables:
          m        (chain, draw) float64 1.589 1.183 3.026 ... -0.02606 1.485 -0.3521
          s        (chain, draw) float64 1.985 1.348 5.464 1.353 ... 1.573 1.399 4.237
      Attributes:
          created_at:         2020-10-18T18:52:46.016452
          arviz_version:      0.9.0
          inference_library:  Turing

      Dataset (xarray.Dataset)
      Dimensions:           (chain: 1, draw: 500)
      Coordinates:
        * chain             (chain) int64 0
        * draw              (draw) int64 0 1 2 3 4 5 6 ... 493 494 495 496 497 498 499
      Data variables:
          energy            (chain, draw) float64 5.718 5.12 8.651 ... 5.826 8.27
          energy_error      (chain, draw) float64 0.115 -0.09808 ... -0.2852 0.4234
          depth             (chain, draw) int64 2 2 2 2 1 2 2 2 2 ... 2 1 2 2 2 1 2 2
          diverging         (chain, draw) bool False False False ... False False False
          log_density       (chain, draw) float64 -5.151 -4.634 -8.2 ... -4.762 -7.393
          max_energy_error  (chain, draw) float64 0.2966 -0.09808 ... -0.2852 0.4626
          nom_step_size     (chain, draw) float64 0.7675 0.7675 ... 0.7675 0.7675
          is_accept         (chain, draw) float64 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
          tree_size         (chain, draw) int64 7 3 3 7 1 3 7 3 3 ... 3 1 3 3 3 1 7 3
          lp                (chain, draw) float64 -5.151 -4.634 -8.2 ... -4.762 -7.393
          step_size         (chain, draw) float64 0.7675 0.7675 ... 0.7675 0.7675
          mean_tree_accept  (chain, draw) float64 0.8756 1.0 0.5808 ... 0.985 0.6447
      Attributes:
          created_at:         2020-10-18T18:52:46.263700
          arviz_version:      0.9.0
          inference_library:  Turing
      
      2020-10-18T14:55:35.997
      

      Plot

      # Plot and save results
      p_arviz_trace = plot_trace(idata)
      PyPlot.savefig("data/linear_models/normal-model-arviz-trace.pdf")
      PyPlot.savefig("data/linear_models/normal-model-arviz-trace.png")
      

Links to this note