login about faq

Hi Y'all

I am experimenting with an entropy max problem, and I run into the problem that log(0) is not allowed. In the model below, the entropy would be maxed by putting all the whatevers into Stuff[1] and zero into Stuff[2..end], but zero values for Stuff[j] make the objective function undefined.

I might try an equiv formulation, maybe using Stuff2[j] = Stuff[j]+1, but there might be canonical tricks to solving this problem that I don't know about.

Any ideas?

option halt_on_ampl_error 1;

param total; 
param celln;

set J := 1..celln;

var Stuff{j in J} >= 0;
var E {j in J} = (Stuff[j]/total) * log(Stuff[j]/total);
var Ent = -1 * sum {j in J} E[j];

maximize _Ent: Ent;

subject to T1: sum {j in J} Stuff[j] = total;

data;
param total := 100;
param celln := 10;

After much interesting discussion, here is something that works as I expect with Ipopt:

option halt_on_ampl_error 1;

set PLACE;

param total; 
param smin{PLACE};
param smax{PLACE};

var Stuff{p in PLACE} >= 0;
var E {p in PLACE};
var Ent;

## minimize the negative for max entropy
minimize _Ent:  -1 * Ent;

## control total
s.t. T1                 : sum {p in PLACE} Stuff[p] = total;

## min and max stuff
s.t. SMin {p in PLACE}  : Stuff[p] >= smin[p];          
s.t. SMax {p in PLACE}  : Stuff[p] <= smax[p];

## set up entropy equations
s.t. E1 {p in PLACE}    : E[p] = (Stuff[p]/total) * log(Stuff[p]/total);
s.t. Ent1               : Ent = -1.0 * (sum {p in PLACE} E[p]);

data;   ## DATA STARTS HERE ################
param total := 7000;

param: PLACE:    smin     smax  :=
    FF01         1000     1300 
    FF03         1200     1400
    FF05         1000     1300
    FF07         800      900
    FF09         1000     1300
    FF11         1000     1200    ;

asked Dec 14 '11 at 12:39

forkandwait's gravatar image

forkandwait
35314

edited Dec 15 '11 at 12:43

I might be misunderstanding the issue... but have you considered just adding a small constant inside the log to make sure that it doesn't happen?

I guess this is what you mean by Stuff2[j] = Stuff[j]+1, but, don't need to make a new variable or anything, or even use 1.

What is the physical meaning of log(0) in the context of your problem?

(Dec 14 '11 at 12:45) Iain Dunning

I tried the small constant and it works now, at least it compiles...

I am trying to model allocating population (or whatever) based on limited prior knowledge encoded in the constraints. So, we might know that between 10 and 20 people live in census tract #1, zero in #2, but have no idea about #3. The entropy max thing tries to spread around whatever is not constrained equally.

(I am getting weird results when I use Bonmin, but I thought it should be OK, even though there is no integer term. On NEOS' mosek it works fine though... EDIT: Works with Ipopt ...)

(Dec 14 '11 at 13:49) forkandwait

There's more than one way to deal with this situation, e.g.:

  1. Make use of conditional expressions in AMPL (see #Q8)
  2. Follow @Iain's advice (and remember: \(log(x/y) = log(x)-log(y)\))
  3. Re-model the variable assignment as "Big M constraint"
  4. Read the comment below.
link

answered Dec 14 '11 at 13:52

fbahr's gravatar image

fbahr ♦
1.6k37

edited Dec 14 '11 at 14:39

3

Hi, just a few friendly comments. (1) Conditional expressions only work for parameters in AMPL, not variables. (2) In my experience, the problem with logs is their extremely nonlinear behavior near 0. Adding a small constant avoids the singularity but some solvers still don't like the kind of behavior that is produced. As for log(x/y), in this case, the denominator is a constant that behaves like a scaling factor, so leaving it as log(x/y) may actually be numerically favorable. (3) adding integer variables may be a bit of an expensive approach just to get around the singularity issue. ;-)

(Dec 14 '11 at 14:25) Gilead ♦
2

BTW, please don't take my comments the wrong way. It's just that I've spent years and years doing nonlinear modeling and have gone down the wrong path many times (I have white hair to show for it). I just want to share my experience so that others can avoid making the mistakes I did.

(Dec 14 '11 at 14:30) Gilead ♦
1

@fbahr +1 for the funny edit :-)

(Dec 15 '11 at 03:25) Bo Jensen ♦

If you'll allow me to restate your constraint in mathematical terms:

$$ \begin{align} & E_{j} = \frac{S_{j}}{T} \log(\frac{S_{j}}{T}), &\forall j \in J \end{align} $$

You could conceivably rewrite the constraint this way:

$$ \begin{align} & E_{j} = \frac{S_{j}}{T} y_{j}, &\forall j \in J \\ & \exp(y_{j}) = \frac{S_{j}}{T}, &\forall j \in J \\ & y_{L} \leq y_{j} \leq y_{U} , &\forall j \in J \end{align} $$ Reasonable bounds can usually be found for \(y_{j}\).

Update: controversial statement removed because further study is required on my part.

Note: your mileage may vary. This is only one possible formulation. I believe it to be a correct and robust one (for several reasons), but it may not be the best one. As others have noted, it is valid to write a log formulation and restrict its function argument to be greater than a slightly positive number.

link

answered Dec 14 '11 at 14:06

Gilead's gravatar image

Gilead ♦
1.2k28

edited Dec 15 '11 at 02:48

So if I'm understanding it right, his y_U will be 0, and y_L = -M. How would you go about determining a good value for it? I guess c * log(1/total)?

(Dec 14 '11 at 15:34) Iain Dunning

In practice, I rarely specify bounds on \(y\) (I only included it for completeness). I use an IPM solver and even it seems to do the right thing without the bounds. However, it's easy to see that \(S_{j} = T\exp(y_{j})\). Suppose \(\epsilon_{m}\) is the optimizer tolerance, and \(S_{U}\) is the upper bound for \(S_{j}\). Then \(y_{L} = \log(\epsilon_{m}/T)\) and \(y_{U} = \log(S_{U}/T) \). If we consider \(\epsilon_{m} = 1 \times 10^{-6}\) and \(S_{U} = 100\) in this case, you're looking at \(y_{L} = -18.4 \) and \(y_{U} = 0 \), which is a reasonable range.

(Dec 14 '11 at 18:16) Gilead ♦

I strongly disagree with those claiming that log() should be avoided. For instance geometric programming problems are almost always solved on dual form which is same as preferring log() to exp(). Note exp(x) is very badly behaved for even moderately large x.

It is true active set based optimizers have problems with the log() and therefore you should use an interior-point based one. It automatically keeps x away from 0. I suggest you try something like MOSEK (www.mosek.com) on your problem. You can do that using NEOS for instance.

Erling (Working for MOSEK)

link

answered Dec 15 '11 at 01:41

Erling's gravatar image

Erling
3704

Just to respond to your comment about exp(x) being badly behaved; it is true that even modest positive values of x will give rise to very large function values. However, in my work with logs, say for y=log(x), I usually do a scaling on x such that: e <= x <= 100, where e is say, 1e-6. This means, exp(y)=x and therefore log(e) <= y <= log(100), or -13.8 <= y <= 4.6, which to me seems to be a reasonable range for both x and y.

(Dec 15 '11 at 01:54) Gilead ♦

As for geometric programming, I have to defer to your expertise there because I have no experience at all with that class of problems. But I do see that I need to tone down my suggestion that logs should be avoided. I need to rethink this.

(Dec 15 '11 at 01:59) Gilead ♦

Note also that the so-called barrier function in interior-point methods is sum_j ln(x_j). Hence, you even introduce the ln(x) in your method :-). Btw since ln(x)>=t is the same as x >=exp(t) then the two functions are closely related.

(Dec 15 '11 at 02:39) Erling

Thanks to everyone, here is a new model which (1) seems to work (at least in Ipopt), (2) uses the more idiomatic method of stuffing the equalities into constraints, rather than "variables", and (3) fixes the logarithm problem.

Now I will start to work on the data section to better describe the problem (allocating equally into 10 bins isn't all that exciting ....)

The one thing I haven't figured out is why when I switch signs on my objective function I don't get a different allocation -- seems like with such a switch we should get a clustered distribution all in one bin.

Thanks to all! Fun and instructive!

 
option halt_on_ampl_error 1;

param total; 
param celln;

set J := 1..celln;

var Stuff{j in J} >= 0;
var E {j in J};
var Y {j in J};
var Ent;

minimize _Ent:  Ent;

s.t. T1            : sum {j in J} Stuff[j] = total;
s.t. E1 {j in J}   : E[j] = (Stuff[j]/total) * Y[j];
s.t. Y1 {j in J}   : exp(Y[j]) = Stuff[j]/total;
s.t. Ent1          : Ent = +1.0 * (sum {j in J} Y[j]);

data;   ## DATA STARTS HERE ################
param total := 100;
param celln := 10;
link

answered Dec 14 '11 at 16:28

forkandwait's gravatar image

forkandwait
35314

2

Just a note: You may want to append your updates to the original question. The way these Q&A sites work is just slightly different from forums in that each reply is really an answer. As for your model, I think the Y[j] term in the RHS of Ent should really be E[j] because essentially, Y[j] = log(Stuff[j]/total).

(Dec 14 '11 at 18:45) Gilead ♦
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or __italic__
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Tags:

×10
×1
×1

Asked: Dec 14 '11 at 12:39

Seen: 690 times

Last updated: Dec 15 '11 at 12:43

OR-Exchange! Your site for questions, answers, and announcements about operations research.