The purpose of this case study is to test the parameter estimation capabilities developed in PharmaPy, using a set of reactions corresponding to the thermal isomerization of beta-pinene (A).
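The reaction network, sketched here as reconstructed from the stoichiometric matrix defined later in this section, consists of five first-order steps among species A–E:
$$ A \xrightarrow{k_1} B, \qquad A \xrightarrow{k_2} C, \qquad C \xrightarrow{k_3} D, \qquad C \xrightarrow{k_4} E, \qquad E \xrightarrow{k_5} C $$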
The isothermal reaction data is given by [reference]. First-order kinetics will be used to describe the reaction network for each reaction $i$:
$$ r_i = k_i C_i $$
where $C_i$ is the molar concentration of the reactant in reaction $i$.
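For reference, and assuming constant-volume batch operation, these rates enter the species balances through the stoichiometric matrix $S$ defined below:
$$ \frac{dC_j}{dt} = \sum_{i=1}^{n_{rxns}} S_{ij}\, r_i, \qquad j = 1, \ldots, n_{comp} $$
where $S_{ij}$ is the stoichiometric coefficient of species $j$ in reaction $i$.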
First, some general Python packages are imported:
import numpy as np
import matplotlib.pyplot as plt
Then, PharmaPy packages are loaded. The required classes are LiquidPhase to specify the initial conditions of the reactor contents, BatchReactor to create a reactor object, RxnKinetics to specify seed values for the kinetic parameters, and SimulationExec to create a flowsheet instance, to which the reactor object is attached and which also allows creating and setting up the parameter estimation problem.
from PharmaPy.SimExec import SimulationExec
from PharmaPy.Reactors import BatchReactor
from PharmaPy.Kinetics import RxnKinetics
from PharmaPy.Phases import LiquidPhase
Seeds for the kinetic constants and related reaction information are provided at the creation of a RxnKinetics instance. In this case, the effect of temperature is not evaluated, so a vector of zeros is passed for the activation energies (ea_seed). Dummy values for the heats of reaction are also passed, so the corresponding energy balance has no significance. Moreover, no reaction orders are specified in the kinetic instance; as a consequence, the platform assumes first-order kinetics by default. Additionally, the reaction orders will not be part of the set of kinetic parameters "visible" to the optimizer. If that were the intention, the reaction orders would have to be provided using the params_f argument when creating the kinetic instance (a hypothetical sketch is shown after the next cell).
stoich_matrix = np.array([[-1, 1, 0, 0, 0], # A -->[k_1] B
[-1, 0, 1, 0, 0], # A -->[k_2] C
[0, 0, -1, 1, 0], # C -->[k_3] D
[0, 0, -1, 0, 1], # C -->[k_4] E
[0, 0, 1, 0, -1]]) # E -->[k_5] C
k_seed = [1e-6*60]*5  # seed values for the five rate constants
ea_seed = [0]*5  # zero activation energies (temperature effect not evaluated)
kineticInst = RxnKinetics(stoich_matrix, k_seed, ea_seed, reparam_center=True, tref_hrxn=293.14, delta_hrxn=-1000)
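For completeness, a hypothetical sketch of how reaction orders could be exposed to the optimizer through params_f is shown below; the exact format expected by this argument (one order per reaction, in this sketch) is an assumption and should be checked against the PharmaPy documentation.
rxn_orders_seed = [1, 1, 1, 1, 1]  # hypothetical: one (first-order) seed per reaction
kinetics_with_orders = RxnKinetics(stoich_matrix, k_seed, ea_seed, params_f=rxn_orders_seed)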
The initial reactor load is created as:
pure_path = '../data/process/compounds_pinene.json'
init_c = [1, 0, 0, 0, 0]  # initial concentrations of A, B, C, D and E (only A present)
liquidPhase = LiquidPhase(path_thermo=pure_path, temp=293.14, vol=0.01, concentr=init_c)
where pure_path is the path to a json file that contains the physical properties of the substances of the mixture, such as their formula, molecular weight, liquid density, heat capacity, etc.
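If needed, the contents of this file can be inspected quickly with the standard json module (a minimal sketch; the assumption that the top level of the file is keyed by compound name has not been verified):
import json

with open(pure_path) as f:
    compound_db = json.load(f)

print(list(compound_db.keys()))  # compound names (assumed top-level layout)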
After this, a simulation instance is created, to which a unit operation object is attached directly:
sim = SimulationExec(pure_path)
species_rxn = list('ABCDE')
partic_species = species_rxn  # all species participate in the reactions
sim.R01 = BatchReactor(name_species=species_rxn, partic_species=partic_species)  # R01 is an arbitrary UO name
In the last cell, the reactor was created by specifying a list of strings with the species names, and also by indicating which species participate in the reaction. In this case, all species participate in the reaction (there are no inert compounds or solvents). Now, the kinetics and phase objects can be attached to the reactor:
sim.R01.KinInstance = kineticInst
sim.R01.Phases = (liquidPhase, )
At this point, it is useful to check whether the reactor runs with the specified conditions. For this purpose, the batch reactor model can be solved using its solve_unit method, which requires a run time. A run time of 600 s is tested, as it is the maximum time of the reaction data. After the solution, a report by SUNDIALS is printed by default, displaying important statistics about the numerical integration.
sim.R01.solve_unit(runtime=600)
sim.R01.plot_profiles(fig_size=(8, 3))
Now that the reactor is running, the parameter estimation problem can be created and populated with data. First, the experimental data is imported using numpy
, where the first column is the elapsed time, and the rest are the time-dependent concentrations for each species.
exp_data = np.genfromtxt('../data/reactor/pinene_dataset.csv', delimiter=',', skip_header=1)
t_exp = exp_data[:, 0]
c_exp = exp_data[:, 1:] / 100  # concentrations reported in percent; convert to fractions
A quick inspection of the experimental data is often necessary and useful:
plt.plot(t_exp, c_exp, 'o', markerfacecolor='None')
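Axis labels and a legend can be added to this plot if desired (assuming the data columns follow the same A–E order used for species_rxn; time units depend on the dataset):
plt.xlabel('time')
plt.ylabel('concentration')
plt.legend(species_rxn)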
It can be seen that the model prediction plotted before is far from the experimental data. Now that the experimental data is available, the sim object is used to set up the parameter estimation problem. By default, the kinetic parameter set is $\{k_1, k_2, \ldots, k_{n_{rxns}}, E_{a,1}, E_{a,2}, \ldots, E_{a,n_{rxns}}\}$. In this particular application, only the $k_i$ are subject to optimization; thus, a list of booleans is passed to the SetParamEstimation method as a map of this requirement:
opt_flags = [True]*5 + [False]*5
sim.SetParamEstimation(x_data=t_exp, y_data=c_exp, optimize_flag=opt_flags)
Finally, the parameter estimation is run, from which the estimated parameters, the inverse of the Hessian matrix at the parameter estimates, and an info dictionary are retrieved. It is important to note that PharmaPy internally performs a logarithmic reparametrization of the optimization problem, which is often critical for good performance and convergence.
optim_log, inv_hessian, info = sim.EstimateParams()
optim_params = np.exp(optim_log)
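A quick side-by-side comparison of the seed and converged rate constants can then be printed (a simple sketch using only the objects defined above):
for i, (k0, k_est) in enumerate(zip(k_seed, optim_params), start=1):
    print(f'k_{i}: seed = {k0:.4e}, estimate = {k_est:.4e}')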
Model predictions using both the converged and the initial values can be displayed simultaneously in a plot, where circles represent the experimental data, solid lines the model prediction at the converged parameter values, and dashed lines the model prediction with the seed parameters. For this purpose, the ParamInst object automatically created inside sim by PharmaPy is used.
sim.ParamInst.plot_data_model(fig_size=(8, 6), plot_initial=True)
As a final step, a statistical analysis of the parameter estimates is performed, using the capabilities built into PharmaPy. First, a statistics object is created:
Stats = sim.CreateStatsObject(alpha=0.95)
Then, different methods can be invoked to analyze the statistics. For instance, confidence intervals can be obtained with:
confid_params, correl_mat = Stats.get_intervals()
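If desired, the interval bounds can be mapped back to the original scale with np.exp; note that this sketch assumes confid_params contains the confidence bounds of the $\ln(k_i)$ values, which should be verified:
confid_k = np.exp(np.asarray(confid_params))  # assumed: bounds reported in log space
print(confid_k)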
As mentioned before, the parameter estimation problem is solved in terms of $\ln(k_i)$ instead of the original $k_i$ values. Another useful statistical analysis is a bootstrap analysis, which returns the parameter distributions when sampled randomly around their converged values. The distributions are then typically displayed in pairwise sampling plots, along with the linear (asymptotic) 2D projections of the ellipsoidal confidence regions. Here, the analysis is performed with 500 random samples.
Stats.bootstrap_params(num_samples=500)
Stats.plot_pairwise(fig_size=(8, 8))