This page presents basic models applied to the modeling of the learning process and to decision-making in the classroom.
Simple two-dimensional optimization
The focus in this section is on the decision making of the learner. It has been adapted from Fourer et al. (1990).
A learner must decide how to allocate next week’s studying time. She has to acquire two types of competencies: knowledge and application. The first skill (knowledge) requires that the learner know the terminology, theory, and techniques of a given subject. The second skill (application) requires the learner to apply that knowledge to particular problems or issues: learners have to recognise which terms or concepts apply to a particular problem or situation. To achieve these skills, the learner must memorize the material and do exercises in which the knowledge is applied.
The learner is constrained by the number of pages she can study in an hour: the representative learner cannot memorize more than 20 pages per hour, and she cannot study more than 14 pages of exercises per hour.
The two skills affect the final mark in different ways. We assume that each page memorized yields 25 marks and each page of exercises yields 30 marks.
The following weekly studying amounts are the most that can be achieved in the light of the time constraints:
- Memorization of knowledge: 600 pages
- Exercises: 400 pages
The question that the learner faces is: If 40 hours of studying time are available this week, how many pages of memorization and how many pages of exercises should be studied to bring the greatest total profit or outcome in terms of final marks?
While we are given numeric values for the number of pages that can be studied and the marks that can be achieved per page, the pages of memorization and of exercises to be studied are as yet unknown. These quantities are the decision variables whose values we must determine so as to maximize profits.
The purpose of the linear program is to specify the profits and production limitations as explicit formulas involving the variables, so that the desired values of the variables can be determined systematically.
We will write \(X_M\) for the number of pages of memorization to be studied, and \(X_E\) for pages of exercises. The total hours to study all these pages is then given by:
\begin{equation} (\mbox{hours to study one page of memorization}) \times X_M + (\mbox{hours to study one page of exercises}) \times X_E \nonumber \end{equation}
This number cannot exceed the 40 hours available. Since hours per page is the reciprocal of the pages per hour given above, we have a constraint on the variables:
\begin{equation} \frac{1}{20} X_M + \frac{1}{14} X_E \leq 40 \nonumber \end{equation}
Pages that can be studied per week are also limited:
\begin{equation} 0 \leq X_M \leq 600 \nonumber \\ 0 \leq X_E \leq 400 \nonumber \end{equation}
In this problem, the upper limits were specified, while the lower limits were only assumed: a negative number of pages studied would be meaningless. For computational reasons, however, the lower limits have to be stated explicitly.
By analogy with the formula for total hours, the total outcome in marks must be:
\begin{equation}
(\mbox{outcome per page of memorization}) \times X_M
+ (\mbox{outcome per page of exercises}) \times X_E \nonumber
\end{equation}
That is, our objective is to maximize \(25 X_M + 30 X_E\). Putting this all together, we have the following linear program:
\begin{equation} \max 25 X_M + 30 X_E \nonumber \\ \hspace{1cm}\mbox{s.t.} \frac{1}{20} X_M + \frac{1}{14} X_E \leq 40 \nonumber \\ \hspace{2cm} 0 \leq X_M \leq 600 \nonumber \\ \hspace{2cm} 0 \leq X_E \leq 400 \nonumber \end{equation}
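Before turning to AMPL, this small linear program can be cross-checked with an off-the-shelf solver. The sketch below uses SciPy's `linprog` (an assumption of this example; it is not part of the AMPL/NEOS workflow described next):

```python
# Illustrative sketch: solving the same LP with SciPy's linprog.
from scipy.optimize import linprog

# linprog minimizes, so we negate the profit coefficients to maximize.
res = linprog(
    c=[-25, -30],                 # maximize 25*XM + 30*XE
    A_ub=[[1 / 20, 1 / 14]],      # time constraint: XM/20 + XE/14 <= 40
    b_ub=[40],
    bounds=[(0, 600), (0, 400)],  # page limits on XM and XE
)
XM, XE = res.x
print(XM, XE, -res.fun)  # optimal plan: XM=600, XE=140, objective 19200
```

The solver picks the vertex of the feasible region where the memorization limit and the time constraint intersect, matching the solution reported later by NEOS and JuMP.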
The horizontal line represents the studying limit on memorization, the vertical one the limit on exercises. The diagonal line is the constraint on hours: each point on that line represents a combination of memorization and exercises that requires exactly 40 hours of studying time, and any point down and to the left requires less than 40 hours.
The shaded region bounded by the axes and these three lines corresponds exactly to the feasible solutions - those that satisfy all three constraints. Among all the feasible solutions represented in this region, we seek the one that maximizes the profit.
For this problem, a line of slope \(-\frac{25}{30}\) represents combinations that produce the same profit; for example, the line from (0, 450) to (540, 0) represents combinations that yield 13,500 marks of outcome (since \(30 \times 450 = 25 \times 540 = 13500\)).
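As a quick arithmetic check, both endpoints of that iso-profit line give the same total under the objective:

```python
# Both endpoints of the iso-profit line from (0, 450) to (540, 0)
# yield the same outcome under the objective 25*XM + 30*XE.
def profit(XM, XE):
    return 25 * XM + 30 * XE

print(profit(0, 450))   # 13500
print(profit(540, 0))   # 13500
```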
Different total marks of outcome give different but parallel lines in the figure, with higher total marks of outcome giving lines that are higher and further to the right.
Solving the Model using the AMPL language on NEOS
The following block gives the AMPL code for the previous problem, which should be saved in a separate file (study0.mod):
var XM;
var XE;
maximize Profit: 25 * XM + 30 * XE;
subject to Time: (1/20) * XM + (1/14) * XE <= 40;
subject to M_limit: 0 <= XM <= 600;
subject to E_limit: 0 <= XE <= 400;
Create another file that contains the following instructions (study0.run):
solve;
display XM, XE;
These two files can be uploaded to the NEOS server. Note: you must register on the website (registration is free).
This code can be written in a more general form as follows:
The file that contains the model will look like this (study.mod):
set P;
param a {j in P};
param b;
param c {j in P};
param u {j in P};
var X {j in P};
maximize Total_Profit: sum {j in P} c[j] * X[j];
subject to Time: sum {j in P} ( 1/a[j] ) * X[j] <= b;
subject to Limit {j in P}: 0 <= X[j] <= u[j];
The parameter values can be stored in a separate file (study.dat):
set P := memorization, exercise;
param: a c u :=
memorization 20 25 600
exercise 14 30 400;
param b := 40;
And finally the instructions for the optimizer can be collected in a separate file (study.run):
solve;
display X;
Running this general specification can be done by uploading the three files on the NEOS server.
This will give the following message from NEOS:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Checking ampl.mod for cplex_options...
Checking ampl.com for cplex_options...
Executing AMPL.
processing data.
processing commands.
Executing on prod-exec-5.neos-server.org
Presolve eliminates 2 constraints.
Adjusted problem:
2 variables, all linear
1 constraint, all linear; 2 nonzeros
1 inequality constraint
1 linear objective; 2 nonzeros.
CPLEX 20.1.0.0: threads=4
CPLEX 20.1.0.0: optimal solution; objective 19200
1 dual simplex iterations (0 in phase I)
X [*] :=
exercise 140
memorization 600
;
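The set-based AMPL model above can also be mirrored in a general-purpose language. The sketch below is a hypothetical Python counterpart using SciPy (the function name `solve_study` is illustrative, not from the original workflow):

```python
# Hypothetical Python counterpart of the general AMPL model, using SciPy.
from scipy.optimize import linprog

def solve_study(a, c, u, b):
    """Maximize sum_j c[j]*X[j] s.t. sum_j X[j]/a[j] <= b and 0 <= X[j] <= u[j]."""
    keys = list(a)
    res = linprog(
        c=[-c[j] for j in keys],          # negate: linprog minimizes
        A_ub=[[1 / a[j] for j in keys]],  # single time constraint
        b_ub=[b],
        bounds=[(0, u[j]) for j in keys], # page limits
    )
    return dict(zip(keys, res.x)), -res.fun

plan, marks = solve_study(
    a={"memorization": 20, "exercise": 14},   # pages per hour
    c={"memorization": 25, "exercise": 30},   # marks per page
    u={"memorization": 600, "exercise": 400}, # weekly page limits
    b=40,                                     # available hours
)
print(plan, marks)  # memorization 600, exercise 140, 19200 marks
```

The structure mirrors the AMPL files: the dictionaries play the role of study.dat, and the function body plays the role of study.mod.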
Solving the Model using JuMP in Julia
The same problem can be solved using the JuMP package in Julia (Julia code):
# Launch the script from the Terminal: include("scriptname.jl")
# Source: https://hackernoon.com/linear-optimization-using-julia-40311c0ad98c
using JuMP
using GLPK
# Create model
lpModel = Model(GLPK.Optimizer)
@variable(lpModel, 0 <= XM <= 600)
@variable(lpModel, 0 <= XE <= 400)
# Create an objective
@objective(lpModel, Max, 25*XM + 30*XE)
# Create constraints
@constraint(lpModel, (1/20)*XM+(1/14)*XE <= 40)
# Solve and read the answer
optimize!(lpModel)
println("Value of objective function : $(objective_value(lpModel))" )
println("Value of XM (memorization) : $(value(XM))")
println("Value of XE (exercise) : $(value(XE))")
The previous code will give the following results:
Value of objective function : 19200.0
Value of XM (memorization) : 600.0
Value of XE (exercise) : 140.0
Simple Model: Inference
The focus in this section is on the decision making of the teacher. It has been adapted from Gelman et al. (1995) and Lunn et al. (2013).
When designing a lesson, the teacher has to decide on the ratio of knowledge to application, \(kn/app\), treated in a lesson. The learner’s responses are typically characterized by a dichotomous outcome: for example, "has acquired the competency" or "failed to acquire the competency".
An experimental lesson of this kind gives rise to data of the form \begin{equation} (x_i, n_i, y_i); \quad i=1, \ldots, k, \nonumber \end{equation} where \(x_i\) represents the \(i\)th of \(k\) lessons with different ratios \(kn/app\), \(n_i\) represents the number of learners present in the lesson, and \(y_i\) of them subsequently fail to acquire the competency.
An example is shown in the table below: a sequence of four lessons with ten learners in each lesson has been tried.
| Ratio, \(x_i\) (log(\(kn\)(min)/\(app\)(min))) | Number of learners in the lesson | Number of learners who failed to acquire the competency |
|---|---|---|
| -0.92 | 10 | 2 |
| -0.69 | 10 | 4 |
| -0.36 | 10 | 6 |
| 0.40 | 10 | 8 |
If we assume that 15 min of a 50 min lesson is dedicated to presenting theory and knowledge, and 35 min is dedicated to applying that knowledge, we will have a ratio \(15/35\) or \(3/7\), which would be: log(\(3/7\))=-0.85\(\equiv x_i\).
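That ratio can be checked with a one-liner (natural log, as in the table):

```python
import math

# Ratio of 15 min knowledge to 35 min application: log(15/35) = log(3/7)
print(round(math.log(3 / 7), 2))  # -0.85
```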
We will model the outcomes of the ten learners within each group \(i\) as exchangeable, and it seems reasonable (if we assume that these are different classes) to model them as independent with equal probabilities, which implies that the data points \(y_i\) are binomially distributed:
\begin{equation} y_i | \theta_i \sim Bin(n_i, \theta_i) \nonumber \end{equation}
where \(\theta_i\) is the probability of failing the targeted competency for the learner exposed to a lesson of type \(x_i\).
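To make the binomial model concrete, the probability of observing, say, 2 failures out of 10 learners when the failure probability is 0.2 can be computed directly. This is a sketch using only the Python standard library; the model itself is fitted later with BUGS/Stan:

```python
from math import comb

def binom_pmf(y, n, theta):
    """P(Y = y) for Y ~ Bin(n, theta)."""
    return comb(n, y) * theta**y * (1 - theta)**(n - y)

# e.g. 2 failures among 10 learners at theta = 0.2
print(binom_pmf(2, 10, 0.2))  # ~0.302
```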
It should be noted that a situation in which independence and the binomial model would not be appropriate is if the learners are solving problems in groups, meaning with knowledge transfer from one learner to another.
For the case described above, we assume that there is no groupwork and that the four classes are different classes with different (not overlapping) learners, given the parameters \(\theta_1, \ldots, \theta_4\).
The simplest analysis would treat the four parameters \(\theta_i\) as exchangeable in their prior distribution, perhaps using a noninformative density such as \(p(\theta_1, \ldots, \theta_4) \propto 1\), in which case the parameters \(\theta_i\) would have independent beta posterior distributions. The exchangeable prior model for the \(\theta_i\) parameters has a serious flaw, however; we know the ratio level \(x_i\) for each group \(i\), and one would expect the probability of failing to vary systematically as a function of the ratio.
The simplest model of the ratio-response relation, that is, the relation of \(\theta_i\) to \(x_i\), is linear: \(\theta_i = \alpha + \beta x_i\).
Unfortunately, this model has the flaw that as the ratio becomes very small or very large, \(\alpha + \beta x_i\) approaches \(\pm \infty\), whereas \(\theta_i\), being a probability, must be constrained to lie between 0 and 1.
The standard solution is to use a transformation of the \(\theta_i\)'s, such as a logistic, in the ratio-response relation:
\begin{equation} logit(\theta_i) = \alpha + \beta x_i \end{equation}
where \(logit(\theta_i) = \log( \theta_i / (1-\theta_i) )\). This is called a logistic regression model.
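A minimal Python sketch of the logit transform and its inverse:

```python
import math

def logit(theta):
    """Map a probability in (0, 1) to the whole real line."""
    return math.log(theta / (1 - theta))

def inv_logit(z):
    """Map a real number back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

print(logit(0.5))             # 0.0
print(inv_logit(logit(0.3)))  # ~0.3 (round trip)
```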
One approach to estimating the parameters of this model is to use the BUGS software.
We fit the following logistic regression model:
\begin{equation} y_i \sim Binomial(p_i, n_i) \\ logit(p_i) = \alpha + \beta(x_i - \bar{x}) \end{equation}
with vague \(Normal(0, 100^2)\) priors for \(\alpha\) and \(\beta\).
Note that the covariate (ratio) is centred by subtracting \(\bar{x}=N^{-1} \sum x_i\), because serious MCMC convergence issues arise when the \(x_i\)'s are used directly, due to very high posterior correlation between \(\alpha\) and \(\beta\).
Centering essentially relocates the \(y\)-axis to \(x=\bar{x}\), which, in this case, vastly reduces the dependence of the intercept \(\alpha\) on \(\beta\).
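The effect of centering can be seen directly on the data in the table above: the centered covariate sums to zero, i.e. it is orthogonal to the constant intercept column of the design matrix, which is what weakens the coupling between \(\alpha\) and \(\beta\):

```python
x = [-0.92, -0.69, -0.36, 0.40]
xbar = sum(x) / len(x)
centered = [xi - xbar for xi in x]

# The centered covariate sums to (numerically) zero, i.e. it is
# orthogonal to the intercept column of the design matrix.
print(sum(centered))
```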
The likelihood is specified as follows:
for (i in 1:4) {
y[i] ~ dbin( p[i], n[i] )
logit(p[i]) <- alpha + beta * ( x[i] - mean(x[]) )
}
If we want to assess the model fit visually then we will need to either transform the data onto the same scale as the linear model or transform the model fit onto the same scale as the observations.
We insert the following code in the loop above so that we can examine both:
phat[i] <- y[i]/n[i]
yhat[i] <- n[i]*p[i]
Under the model described earlier, we can write the sampling distribution, or likelihood, for each group \(i\) in terms of the parameters \(\alpha\) and \(\beta\) as
\begin{equation} p(y_i \mid \alpha, \beta, n_i, x_i) \propto [logit^{-1}(\alpha + \beta x_i)]^{y_i} [1-logit^{-1}(\alpha + \beta x_i)]^{n_i-y_i} \end{equation}
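Under the flat prior introduced below, the posterior mode coincides with the maximum-likelihood estimate. A crude grid search over \(\alpha\) and \(\beta\) on the table's data, an illustrative sketch rather than the MCMC approach used by BUGS/Stan, looks like this:

```python
import math

# Data from the table above: ratio, class size, number who failed
x = [-0.92, -0.69, -0.36, 0.40]
n = [10, 10, 10, 10]
y = [2, 4, 6, 8]

def log_post(alpha, beta):
    """Log posterior up to a constant, with p(alpha, beta) proportional to 1."""
    lp = 0.0
    for xi, ni, yi in zip(x, n, y):
        p = 1 / (1 + math.exp(-(alpha + beta * xi)))
        lp += yi * math.log(p) + (ni - yi) * math.log(1 - p)
    return lp

# Crude grid search for the posterior mode (= MLE under the flat prior):
# alpha in [-2, 2], beta in [0, 5], step 0.02
grid = ((a / 50, b / 50) for a in range(-100, 101) for b in range(0, 251))
alpha_hat, beta_hat = max(grid, key=lambda ab: log_post(*ab))
print(alpha_hat, beta_hat)  # beta_hat > 0: failures increase with the ratio
```

The grid ranges are assumptions for illustration; a real analysis would use the samplers described in this section.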
The model (BUGS code) is characterized by the parameters \(\alpha\) and \(\beta\), whose joint posterior distribution is
\begin{equation} p(\alpha, \beta \mid y, n, x) \propto p(\alpha, \beta \mid n, x) p(y \mid \alpha, \beta, n, x) \\ \propto p(\alpha, \beta) \prod_{i=1}^{k} p(y_i \mid \alpha, \beta, n_i, x_i) \end{equation}
We consider the sample sizes \(n_i\) and ratio levels \(x_i\) as fixed for this analysis and suppress the conditioning on \((n, x)\) in subsequent notation.
We present an analysis based on a prior distribution for \((\alpha, \beta)\) that is independent and locally uniform in the two parameters: \(p(\alpha, \beta) \propto 1\). In practice, we might use a uniform prior distribution if we really have no prior knowledge about the parameters, or if we want to present a simple analysis of this experiment alone.
If the analysis using noninformative prior distribution is insufficiently precise, we may consider using other sources of substantive information (for example, from other classroom experiments) to construct an informative prior distribution.
Another tool for solving this problem is the Stan software (cmdstan).
The previous model can be implemented as follows:
data {
  int<lower=0> N;
  array[N] int<lower=0> n;
  array[N] int<lower=0> r;
  vector[N] x;
}
transformed data {
  real mean_x = mean(x);
  vector[N] centered_x = x - mean_x;
}
parameters {
  real alpha_star;
  real beta;
}
transformed parameters {
  vector[N] m = alpha_star + beta * centered_x;
}
model {
  alpha_star ~ normal(0.0, 1.0e4);
  beta ~ normal(0.0, 1.0e4);
  r ~ binomial_logit(n, m);
}
generated quantities {
  real alpha = alpha_star - beta * mean_x;
  array[N] real p;
  array[N] real llike;
  array[N] real rhat;
  for (i in 1:N) {
    p[i] = inv_logit(m[i]);
    llike[i] = r[i] * log(p[i]) + (n[i] - r[i]) * log1m(p[i]);
    rhat[i] = p[i] * n[i]; // fitted values
  }
}
The initial values are given in a separate file:
alpha_star <- 0
beta <- 0
Finally, the data is also stored in a separate file:
x <- c(-0.92, -0.69, -0.36, 0.40)
n <- c(10, 10, 10, 10)
r <- c(2, 4, 6, 8)
N <- 4
Running the previous code in the Terminal can be done as follows:
STAN_HOME = ../../../../..
BOOSTPATH = $(shell find $(STAN_HOME)/lib -path '*lib/boost_*' -regex '.*lib\/boost_[^/]*')
EIGENPATH = $(shell find $(STAN_HOME)/lib -path '*lib/eigen_*' -regex '.*lib\/eigen_[^/]*')
CPPFLAGS = -I $(BOOSTPATH) -I $(EIGENPATH) -I $(STAN_HOME)/src
LIBFLAGS = -L$(STAN_HOME)/bin -lstan
$(STAN_HOME)/bin/stanc --name=study study.stan
g++ -O0 -DNDEBUG $(CPPFLAGS) study_logit.cpp -o study $(LIBFLAGS)
./study data=study-stan-logit-data.R sample
Note: On my Mac, STAN_HOME is located at /Users/…/opt/anaconda3/envs/stan-env/, and the stanc binary is located at /Users/…/opt/anaconda3/envs/stan-env/bin/cmdstan/bin/.
For more details on checking the Stan compiler, see the CmdStan installation guide listed in the references.
To Do
- The data and initial values in the Stan code are in R → convert them into Julia!
- Explain how to read data when running cmdstan in the Terminal
- How to read the initial values when running cmdstan?
References
Fourer, Robert, David M. Gay, and Brian W. Kernighan. "A modeling language for mathematical programming." Management Science 36, no. 5 (1990): 519-554.
Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian data analysis. Chapman and Hall/CRC, 1995.
Hongwei, Jin. A Tutorial of AMPL for Linear Programming
Lunn, David, Christopher Jackson, Nicky Best, Andrew Thomas, and David Spiegelhalter. The BUGS Book: A Practical Introduction to Bayesian Analysis. Chapman & Hall/CRC, 2013.
Stan Development Team. YEAR. Stan Modeling Language Users Guide and Reference Manual, VERSION. https://mc-stan.org
Stan Development Team. stan-dev at Github
StanJulia. CmdStan Installation
Nocedal, Jorge, and Stephen J. Wright. Numerical Optimization. Springer, 1999.