The purpose of this case study is to test the parameter estimation capabilities developed in PharmaPy, using a set of reactions corresponding to the thermal isomerization of beta-pinene (A).
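The reaction network, sketched here as reconstructed from the stoichiometric matrix defined later in this section, consists of five first-order steps among species A–E:
$$ A \xrightarrow{k_1} B, \qquad A \xrightarrow{k_2} C, \qquad C \xrightarrow{k_3} D, \qquad C \xrightarrow{k_4} E, \qquad E \xrightarrow{k_5} C $$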
The isothermal reaction data is given by [reference]. First-order kinetics will be used to describe the reaction network for each reaction $i$:
$$ r_i = k_i C_i $$
where $C_i$ is the molar concentration of the reactant in reaction $i$.
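For reference, and assuming constant-volume batch operation, these rates enter the species balances through the stoichiometric matrix $S$ defined below:
$$ \frac{dC_j}{dt} = \sum_{i=1}^{n_{rxns}} S_{ij}\, r_i, \qquad j = 1, \ldots, n_{comp} $$
where $S_{ij}$ is the stoichiometric coefficient of species $j$ in reaction $i$.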
First, some general Python packages are imported:
import numpy as np
import matplotlib.pyplot as plt
Then, PharmaPy packages are loaded. The required classes are LiquidPhase to specify the initial conditions of the reactor contents, BatchReactor to create a reactor object, RxnKinetics to specify seed values for the kinetic parameters, and SimulationExec to create a flowsheet instance, to which the reactor object is attached and which also allows creating and setting up the parameter estimation problem.
from PharmaPy.SimExec import SimulationExec
from PharmaPy.Reactors import BatchReactor
from PharmaPy.Kinetics import RxnKinetics
from PharmaPy.Phases import LiquidPhase
Seeds for the kinetic constants and related reaction information are provided at the creation of a RxnKinetics instance. In this case, the effect of temperature is not evaluated, so a vector of zeros is passed for the activation energies (ea_seed). Dummy values for the heats of reaction are also passed, so the corresponding energy balance has no significance. Moreover, no reaction orders are specified in the kinetic instance; as a consequence, the platform assumes first-order kinetics by default. Additionally, the reaction orders will not be part of the set of kinetic parameters "visible" to the optimizer. If that were the intention, the reaction orders would have to be provided using the params_f argument when creating the kinetic instance (a hypothetical sketch is shown after the next cell).
stoich_matrix = np.array([[-1, 1, 0, 0, 0], # A -->[k_1] B
[-1, 0, 1, 0, 0], # A -->[k_2] C
[0, 0, -1, 1, 0], # C -->[k_3] D
[0, 0, -1, 0, 1], # C -->[k_4] E
[0, 0, 1, 0, -1]]) # E -->[k_5] C
k_seed = [1e-6*60]*5  # seed values for the five rate constants
ea_seed = [0]*5  # zero activation energies (temperature effect not evaluated)
kineticInst = RxnKinetics(stoich_matrix, k_seed, ea_seed, reparam_center=True, tref_hrxn=293.14, delta_hrxn=-1000)
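For completeness, a hypothetical sketch of how reaction orders could be exposed to the optimizer through params_f is shown below; the exact format expected by this argument (one order per reaction, in this sketch) is an assumption and should be checked against the PharmaPy documentation.
rxn_orders_seed = [1, 1, 1, 1, 1]  # hypothetical: one (first-order) seed per reaction
kinetics_with_orders = RxnKinetics(stoich_matrix, k_seed, ea_seed, params_f=rxn_orders_seed)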
The initial reactor load is created as:
pure_path = '../data/process/compounds_pinene.json'
init_c = [1, 0, 0, 0, 0]  # initial concentrations of A, B, C, D and E (only A present)
liquidPhase = LiquidPhase(path_thermo=pure_path, temp=293.14, vol=0.01, concentr=init_c)
where pure_path is the path to a json file that contains the physical properties of the substances of the mixture, such as their formula, molecular weight, liquid density, heat capacity, etc.
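If needed, the contents of this file can be inspected quickly with the standard json module (a minimal sketch; the assumption that the top level of the file is keyed by compound name has not been verified):
import json

with open(pure_path) as f:
    compound_db = json.load(f)

print(list(compound_db.keys()))  # compound names (assumed top-level layout)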
After this, a simulation instance is created, to which a unit operation object is attached directly:
sim = SimulationExec(pure_path)
species_rxn = list('ABCDE')
partic_species = species_rxn  # all species participate in the reactions
sim.R01 = BatchReactor(name_species=species_rxn, partic_species=partic_species)  # R01 is an arbitrary UO name
In the last cell, the reactor was created by specifying a list of strings with the species names, and also by indicating which species participate in the reaction. In this case, all species participate in the reaction (there are no inert compounds or solvents). Now, the kinetics and phase objects can be attached to the reactor:
sim.R01.KinInstance = kineticInst
sim.R01.Phases = (liquidPhase, )
At this point, it is useful to check whether the reactor runs with the specified conditions. For this purpose, the batch reactor model can be solved using its solve_unit method, which requires a run time. A run time of 600 s is tested, as it is the maximum time of the reaction data. After the solution, a report by SUNDIALS is printed by default, displaying important statistics about the numerical integration.
sim.R01.solve_unit(runtime=600)
sim.R01.plot_profiles(fig_size=(8, 3))
Now that the reactor is running, the parameter estimation problem can be created and populated with data. First, the experimental data is imported using numpy
, where the first column is the elapsed time, and the rest are the time-dependent concentrations for each species.
exp_data = np.genfromtxt('../data/reactor/pinene_dataset.csv', delimiter=',', skip_header=1)
t_exp = exp_data[:, 0]
c_exp = exp_data[:, 1:] / 100  # concentrations reported in percent; convert to fractions
A quick inspection of the experimental data is often necessary and useful:
plt.plot(t_exp, c_exp, 'o', markerfacecolor='None')
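Axis labels and a legend can be added to this plot if desired (assuming the data columns follow the same A–E order used for species_rxn; time units depend on the dataset):
plt.xlabel('time')
plt.ylabel('concentration')
plt.legend(species_rxn)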
It can be seen that the model prediction plotted before is far from the experimental data. Now that the experimental data is available, the sim object is used to set up the parameter estimation problem. By default, the kinetic parameter set is $\{k_1, k_2, \ldots, k_{n_{rxns}}, E_{a,1}, E_{a,2}, \ldots, E_{a,n_{rxns}}\}$. In this particular application, only the $k_i$ are subject to optimization; thus, a list of booleans is passed to the SetParamEstimation method as a map of this requirement:
opt_flags = [True]*5 + [False]*5
sim.SetParamEstimation(x_data=t_exp, y_data=c_exp, optimize_flag=opt_flags)
Finally, the parameter estimation is run, from which the estimated parameters, the inverse of the Hessian matrix at the parameter estimates, and an info dictionary are retrieved. It is important to note that PharmaPy internally performs a logarithmic reparametrization of the optimization problem, which is often critical for good performance and convergence.
optim_log, inv_hessian, info = sim.EstimateParams()
optim_params = np.exp(optim_log)
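A quick side-by-side comparison of the seed and converged rate constants can then be printed (a simple sketch using only the objects defined above):
for i, (k0, k_est) in enumerate(zip(k_seed, optim_params), start=1):
    print(f'k_{i}: seed = {k0:.4e}, estimate = {k_est:.4e}')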
Model predictions using both the converged and the initial values can be displayed simultaneously in a plot, where circles represent the experimental data, solid lines the model prediction at the converged parameter values, and dashed lines the model prediction with the seed parameters. For this purpose, the ParamInst object automatically created inside sim by PharmaPy is used.
sim.ParamInst.plot_data_model(fig_size=(8, 6), plot_initial=True)
As a final step, a statistical analysis of the parameter estimates is performed, using the capabilities built into PharmaPy. First, a statistics object is created:
Stats = sim.CreateStatsObject(alpha=0.95)
Then, different methods can be invoked to analyze the statistics. For instance, confidence intervals can be obtained with:
confid_params, correl_mat = Stats.get_intervals()
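If desired, the interval bounds can be mapped back to the original scale with np.exp; note that this sketch assumes confid_params contains the confidence bounds of the $\ln(k_i)$ values, which should be verified:
confid_k = np.exp(np.asarray(confid_params))  # assumed: bounds reported in log space
print(confid_k)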
As mentioned before, the parameter estimation problem is solved in terms of $\ln(k_i)$ instead of the original $k_i$ values. Another useful statistical analysis is a bootstrap analysis, which returns the parameter distributions when sampled randomly around their converged values. The distributions are then typically displayed in pairwise sampling plots, along with the linear (asymptotic) 2D projections of the ellipsoidal confidence regions. Here, the analysis is performed with 500 random samples.
Stats.bootstrap_params(num_samples=500)
Stats.plot_pairwise(fig_size=(8, 8))