From Wikipedia, the free encyclopedia
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks. This article has been rated as Low-priority on the project's priority scale.
What is the "trick" in the section "log-sum-exp trick for log-domain calculations"? I had to read the sentence "Like multiplication operation in linear-scale becoming simple addition in log-scale; an addition operation in linear-scale becomes the LSE in the log-domain." three times for it to sort of make sense. I'll try and fix it, assuming log-scale and log-domain are the same thing. --WiseWoman (talk) 20:56, 14 March 2020 (UTC)
The trick is to replace
$$\mathrm{LSE}(x_{1},\ldots,x_{n})$$
by
$$\mathrm{LSE}(x_{1}-x_{\max},\ldots,x_{n}-x_{\max})+x_{\max},$$
which is numerically more stable (e.g. when used in a computer program). I think the text is clear (perhaps it has changed since you commented). --80.129.163.20 (talk) 14:39, 20 January 2022 (UTC)
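The replacement above can be sketched in Python (the language and the names `logsumexp` and `logsumexp_naive` are illustrative choices, not taken from the article). Subtracting the maximum keeps every exponent at or below zero, so nothing overflows, while the direct form fails as soon as an input exceeds about 709.78, where `math.exp` overflows a double.

```python
import math
from typing import Sequence


def logsumexp_naive(xs: Sequence[float]) -> float:
    """Direct ln(sum(exp(x))): math.exp raises OverflowError once x exceeds about 709.78."""
    return math.log(math.fsum(math.exp(x) for x in xs))


def logsumexp(xs: Sequence[float]) -> float:
    """LSE(x_1, ..., x_n) evaluated as LSE(x_1 - x_max, ..., x_n - x_max) + x_max."""
    if len(xs) == 0:
        raise ValueError("LogSumExp of an empty sequence is undefined")
    x_max = max(xs)
    if math.isinf(x_max):
        # all entries are -inf (result -inf) or some entry is +inf (result +inf);
        # shifting by an infinite x_max would otherwise produce nan
        return x_max
    # each shifted term lies in [0, 1] and the term for x_max is exactly 1, so the sum is >= 1
    return x_max + math.log(math.fsum(math.exp(x - x_max) for x in xs))


print(logsumexp([1000.0, 1000.0]))  # 1000 + ln 2, about 1000.693
try:
    logsumexp_naive([1000.0, 1000.0])
except OverflowError as err:
    print("naive evaluation fails:", err)
```

The same shift also protects against underflow: for inputs such as `[-1000.0, -1000.0]` the direct form ends up taking `log(0.0)`, whereas the shifted form returns about -999.307.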
I stumbled on that sentence and the math too. Apparently, the point is that applying LogSumExp to a vector of variables transformed to, or taken to be in, log space (which I agree is not obviously defined, since log can in principle output any number) is equivalent to taking the log of the sum of the untransformed variables. Equivalence is symmetric, so LSE can also be thought of as a way to notate, represent, or compute the logarithm of a sum. Whether and how it (the trick and the whole function) is useful is another question, perhaps not sufficiently answered by this article.
NB: The other reply explains another section of the article. Elias (talk) 09:59, 10 March 2023 (UTC)
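To make the log-domain reading concrete, here is a hedged Python sketch (the helper names `log_add` and `log_mul` are hypothetical). When numbers are stored as their logarithms, multiplication becomes ordinary addition, and addition becomes a two-argument LSE, since $\mathrm{LSE}(\ln a, \ln b) = \ln(a+b)$ for $a, b > 0$.

```python
import math


def log_add(log_a: float, log_b: float) -> float:
    """Return ln(a + b) from ln(a) and ln(b), for a, b >= 0; this is LSE(ln a, ln b)."""
    hi, lo = max(log_a, log_b), min(log_a, log_b)
    if lo == -math.inf:          # one operand is exactly 0, so the sum is the other operand
        return hi
    return hi + math.log1p(math.exp(lo - hi))


def log_mul(log_a: float, log_b: float) -> float:
    """Return ln(a * b) from ln(a) and ln(b): multiplication becomes plain addition."""
    return log_a + log_b


# e^-1000 and e^-1001 both underflow to 0.0 as ordinary floats,
# but their sum is easy to represent through its logarithm.
print(math.exp(-1000.0))            # 0.0
print(log_add(-1000.0, -1001.0))    # about -999.687
```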
I think the LSE acronym is misleading, as it can be read as Least Square Error. I'd use LogSumExp consistently across the text. User:misssperovaz (Preceding unsigned comment added by Missperovaz (talk • contribs) 04:57, 14 January 2021 (UTC))
In the case of two real-valued variables, it is possible to approximate the function as:
$$\ln(e^{x}+e^{y})\approx\begin{cases}x+\ln(2), & x=y\\[6pt] \dfrac{x\,e^{\frac{x}{\ln(2)}}-y\,e^{\frac{y}{\ln(2)}}}{e^{\frac{x}{\ln(2)}}-e^{\frac{y}{\ln(2)}}}, & \text{otherwise}\end{cases}$$
which shows how strongly nonlinear the function really is.
I hope someone can fact-check this and later add it to the main text. 45.181.122.234 (talk) 15:29, 13 September 2024 (UTC)
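As a starting point for the requested fact-check, here is a minimal Python comparison against the exact value (the helper names are hypothetical). It relies on one algebraic step that is easy to verify by hand: for $x \ge y$, dividing numerator and denominator by $e^{y/\ln 2}$ turns the second branch into $x + d/(e^{d/\ln 2}-1)$ with $d = x - y$. Its limit as $d \to 0$ is $x + \ln 2$, so the two branches join continuously.

```python
import math

LN2 = math.log(2.0)


def lse2_exact(x: float, y: float) -> float:
    """Exact ln(e^x + e^y), evaluated stably as max(x, y) + ln(1 + e^-|x - y|)."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def lse2_rational(x: float, y: float) -> float:
    """The two-branch approximation above, rewritten as max(x, y) + d / (e^(d/ln 2) - 1)."""
    m, d = max(x, y), abs(x - y)
    if d == 0.0:
        return m + LN2                      # the x = y branch
    t = d / LN2
    # d / (e^t - 1) == -d * e^(-t) / expm1(-t); this form cannot overflow for large d
    return m - d * math.exp(-t) / math.expm1(-t)


for d in (0.0, 0.5, 1.0, 2.5, 5.0, 10.0):
    exact, approx = lse2_exact(d, 0.0), lse2_rational(d, 0.0)
    print(f"x - y = {d:4.1f}   exact = {exact:.6f}   approx = {approx:.6f}   diff = {approx - exact:+.2e}")
```

On a grid $0 \le x - y < 20$ the largest gap in this check is a little under 0.01, near $x - y \approx 2.4$.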
Later I found a less accurate but more intuitive approximation:
$$\ln(e^{x}+e^{y})\approx\frac{x+y}{2}+\frac{1}{2}\sqrt{(x-y)^{2}+(2\ln(2))^{2}}$$
Both functions have the property $f_{x}+f_{y}=1$, where $f_{x}$ and $f_{y}$ denote the partial derivatives with respect to $x$ and $y$. 45.181.122.234 (talk) 17:21, 12 July 2025 (UTC)
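A similar hedged check for this second approximation (the helper names are hypothetical). The exact partial derivatives of $\ln(e^x+e^y)$ are the two softmax weights, which sum to 1. Differentiating the approximation by hand gives $\tfrac12 \pm \tfrac{x-y}{2\sqrt{(x-y)^2+(2\ln 2)^2}}$, which also sum to 1.

```python
import math
from typing import Tuple

LN2 = math.log(2.0)


def lse2_exact(x: float, y: float) -> float:
    """Exact ln(e^x + e^y), evaluated stably."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def lse2_hyperbola(x: float, y: float) -> float:
    """Approximation above: (x + y)/2 + (1/2) sqrt((x - y)^2 + (2 ln 2)^2)."""
    return 0.5 * (x + y) + 0.5 * math.hypot(x - y, 2.0 * LN2)


def grad_lse2_exact(x: float, y: float) -> Tuple[float, float]:
    """Partial derivatives of ln(e^x + e^y): the two softmax weights."""
    m = max(x, y)
    ex, ey = math.exp(x - m), math.exp(y - m)
    return ex / (ex + ey), ey / (ex + ey)


def grad_lse2_hyperbola(x: float, y: float) -> Tuple[float, float]:
    """Partial derivatives of the approximation, differentiated by hand."""
    s = math.hypot(x - y, 2.0 * LN2)
    return 0.5 + 0.5 * (x - y) / s, 0.5 - 0.5 * (x - y) / s


for x, y in [(0.0, 0.0), (1.0, -2.0), (5.0, 4.5)]:
    print((x, y), "f_x + f_y:", sum(grad_lse2_exact(x, y)), sum(grad_lse2_hyperbola(x, y)))

worst = max(abs(lse2_hyperbola(0.05 * i, 0.0) - lse2_exact(0.05 * i, 0.0)) for i in range(400))
print("largest gap for 0 <= x - y < 20:", worst)
```

In this check the largest gap is about 0.1, near $x - y \approx 3$. That is roughly ten times the gap of the previous approximation, consistent with it being described as less accurate.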
Another one, less accurate but useful, since it works better than a second-order Taylor expansion while keeping the same components.
The second-order Taylor expansion (about $X=Y=0$) is given by:
$$Z=\ln(e^{X}+e^{Y})\overset{\text{2nd-order Taylor}}{\approx}\ln(2)+\frac{Y+X}{2}+\frac{(Y-X)^{2}}{8}$$
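To see why a better approximation is wanted, here is a short Python comparison of the second-order expansion with the exact value (the helper names are hypothetical). Since $\ln(e^X+e^Y)=\frac{X+Y}{2}+\ln 2+\ln\cosh\frac{Y-X}{2}$ and $\ln\cosh u\le u^2/2$, the expansion never underestimates. Its error grows without bound as $|Y-X|$ increases, because the exact function grows only linearly.

```python
import math

LN2 = math.log(2.0)


def lse2_exact(x: float, y: float) -> float:
    """Exact ln(e^x + e^y), evaluated stably."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def lse2_taylor2(x: float, y: float) -> float:
    """Second-order expansion from above: ln 2 + (x + y)/2 + (y - x)^2 / 8."""
    return LN2 + 0.5 * (x + y) + (y - x) ** 2 / 8.0


for d in (0.0, 0.5, 1.0, 2.0, 4.0, 8.0):
    exact, taylor = lse2_exact(0.0, d), lse2_taylor2(0.0, d)
    print(f"Y - X = {d:3.1f}   exact = {exact:8.4f}   Taylor = {taylor:8.4f}   error = {taylor - exact:+.4f}")
```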
Taking the expected value then only requires finding these terms:
$$E[Z]\approx\ln(2)+\frac{E[Y]+E[X]}{2}+\frac{E[(Y-X)^{2}]}{8}$$
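The only second-order quantity needed is $E[(Y-X)^2]$. It splits as $(E[Y]-E[X])^2+\mathrm{Var}[Y-X]$, with $\mathrm{Var}[Y-X]=\mathrm{Var}[X]+\mathrm{Var}[Y]-2\,\mathrm{Cov}[X,Y]$. A minimal Python sketch of the resulting formula, taking means, variances and covariance as inputs (the function name and signature are hypothetical):

```python
import math


def expected_lse2_taylor(mu_x: float, mu_y: float, var_x: float, var_y: float,
                         cov_xy: float = 0.0) -> float:
    """E[Z] ~= ln 2 + (E[X] + E[Y])/2 + E[(Y - X)^2]/8, from first and second moments only."""
    var_diff = var_x + var_y - 2.0 * cov_xy              # Var[Y - X]
    second_moment_diff = (mu_y - mu_x) ** 2 + var_diff   # E[(Y - X)^2]
    return math.log(2.0) + 0.5 * (mu_x + mu_y) + second_moment_diff / 8.0


# Hypothetical inputs: uncorrelated X, Y with means 0 and 0.5 and variance 0.25 each.
print(expected_lse2_taylor(0.0, 0.5, 0.25, 0.25))
```

No distributional assumption enters here beyond the existence of these moments.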
This can be improved considerably while using the same terms. The improvement comes from the classic *small-angle approximation* for the cosine function,
$$\cos(x)\approx 1-\frac{x^{2}}{2},$$
applied the other way around (replacing $1-\frac{x^{2}}{2}$ by $\cos(x)$ instead of simplifying), and then applying Isserlis's theorem:
$$
\begin{aligned}
Z=\ln(e^{X}+e^{Y}) &\approx \ln(2)+\frac{Y+X}{2}+\frac{(Y-X)^{2}}{8}+1-1\\
&= 1+\ln(2)+\frac{Y+X}{2}-\underbrace{\left[1-\frac{1}{2}\left(\frac{|Y-X|}{2}\right)^{2}\right]}_{\text{small-angle approximation in reverse}}\\
&\approx 1+\ln(2)+\frac{Y+X}{2}-\underbrace{\cos\left(\frac{|Y-X|}{2}\right)}_{\text{cosine is an even function}}\\
&= 1+\ln(2)+\frac{Y+X}{2}-\underbrace{\cos\left(\frac{Y-X}{2}\right)}_{\text{complex-valued expansion}}\\
\Rightarrow Z=\ln(e^{X}+e^{Y}) &\approx \begin{cases}1+\ln(2)+\dfrac{Y+X}{2}-\dfrac{1}{2}\left[e^{-i\frac{Y-X}{2}}+e^{-i\frac{X-Y}{2}}\right], & \dfrac{(Y-X)^{2}}{8}\le 3\\[6pt] \max\{X,Y\}, & \text{otherwise}\end{cases}
\end{aligned}
$$
and since, when the variables are very different, the approximation reduces to $\max\{X,Y\}$, I just have:
$$E[Z]\approx\begin{cases}E[X], & (X>Y)\wedge\left(\dfrac{(Y-X)^{2}}{8}>3\right)\\[6pt] E[Y], & (Y>X)\wedge\left(\dfrac{(Y-X)^{2}}{8}>3\right)\end{cases}$$
What I really care about is computing accurately when $\frac{(Y-X)^{2}}{8}\le 3$; there, taking the expected value together with Isserlis's theorem leads to:
$$
\begin{aligned}
E[Z]=E\left[\ln\left(e^{X}+e^{Y}\right)\right]\Biggr|_{\frac{(Y-X)^{2}}{8}\le 3} &\approx 1+\ln(2)+\frac{E[X]+E[Y]}{2}-\frac{1}{2}\left[e^{-\frac{1}{2}E\left[\left(\frac{Y-X}{2}\right)^{2}\right]}+e^{-\frac{1}{2}E\left[\left(\frac{X-Y}{2}\right)^{2}\right]}\right]\\
&= 1+\ln(2)+\frac{E[X]+E[Y]}{2}-\frac{1}{2}\Biggl[\underbrace{e^{-\frac{1}{8}E\left[(Y-X)^{2}\right]}+e^{-\frac{1}{8}E\left[(Y-X)^{2}\right]}}_{\text{identical}}\Biggr]
\end{aligned}
$$
$$\Rightarrow E[Z]\approx 1+\ln(2)+\frac{E[X]+E[Y]}{2}-e^{-\frac{1}{8}E\left[(Y-X)^{2}\right]},\quad \frac{(Y-X)^{2}}{8}\le 3$$
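The step from $E\left[e^{-i(Y-X)/2}\right]$ to $e^{-E[(Y-X)^2]/8}$ is exact when $Y-X$ is a zero-mean Gaussian. It is the Gaussian characteristic function, which is also what summing the cosine series with the moments from Isserlis's theorem gives. Below is a hedged Monte Carlo check in Python. It assumes independent Gaussian $X$ and $Y$ with equal means; the parameter values and the function name are illustrative only. With these spreads, $(Y-X)^2/8\le 3$ holds for virtually every sample.

```python
import math
import random

LN2 = math.log(2.0)


def expected_lse2_cosine_zero_mean(mu_x: float, mu_y: float, second_moment_diff: float) -> float:
    """Formula above: 1 + ln 2 + (E[X] + E[Y])/2 - exp(-E[(Y - X)^2] / 8)."""
    return 1.0 + LN2 + 0.5 * (mu_x + mu_y) - math.exp(-second_moment_diff / 8.0)


rng = random.Random(0)                # fixed seed so the run is reproducible
n = 100_000
mu, sd_x, sd_y = 1.0, 0.6, 0.8        # illustrative: X ~ N(1, 0.6^2), Y ~ N(1, 0.8^2), independent
cos_total = z_total = 0.0
for _ in range(n):
    x, y = rng.gauss(mu, sd_x), rng.gauss(mu, sd_y)
    cos_total += math.cos((y - x) / 2.0)
    z_total += max(x, y) + math.log1p(math.exp(-abs(x - y)))   # exact ln(e^x + e^y)
mc_cos, mc_z = cos_total / n, z_total / n
second_moment_diff = sd_x ** 2 + sd_y ** 2   # E[(Y - X)^2], since E[Y - X] = 0 here

print("E[cos((Y-X)/2)]:", mc_cos, "vs exp(-E[(Y-X)^2]/8) =", math.exp(-second_moment_diff / 8.0))
print("E[Z]:", mc_z, "vs formula", expected_lse2_cosine_zero_mean(mu, mu, second_moment_diff))
```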
Note that the bound $\frac{(X-Y)^{2}}{8}<3$ is quite well fitted, since it comes from the fact that
$$\ln(e^{X}+e^{Y})=X+\ln(1+e^{Y-X})$$
and that:
$$1+\ln(2)+\frac{Y+X}{2}-\cos\left(\frac{Y-X}{2}\right)=1+\ln(2)+X+\frac{Y-X}{2}-\cos\left(\frac{Y-X}{2}\right)$$
by matching both and setting $Y-X=D$, I get:
$$\cancel{X}+\ln(1+e^{D})=1+\ln(2)+\cancel{X}+\frac{D}{2}-\cos\left(\frac{D}{2}\right)\Rightarrow D\approx\pm 3.225$$
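The crossing point can be located numerically. Here is a minimal bisection sketch in Python (the helper names are hypothetical). It uses the equation exactly as written above, with $\ln(1+e^D)$ evaluated stably. $D = 0$ is always a trivial root, and the difference between the two sides is even in $D$, so only positive $D$ is searched.

```python
import math
from typing import Callable

LN2 = math.log(2.0)


def crossing_gap(d: float) -> float:
    """ln(1 + e^D) minus the cosine approximation 1 + ln 2 + D/2 - cos(D/2)."""
    exact = max(d, 0.0) + math.log1p(math.exp(-abs(d)))   # stable ln(1 + e^D)
    approx = 1.0 + LN2 + d / 2.0 - math.cos(d / 2.0)
    return exact - approx


def bisect_root(f: Callable[[float], float], lo: float, hi: float, iters: int = 80) -> float:
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


root = bisect_root(crossing_gap, 1.0, 6.0)   # the gap is negative at D = 1 and positive at D = 6
print("nonzero crossing: D =", root, "  (Y - X)^2 / 8 =", root ** 2 / 8.0)
print("gap at the quoted D = 3.225:", crossing_gap(3.225))
```

Run on the equation as written, this search puts the nonzero crossing near $|D|\approx 4.93$. That gives $(Y-X)^2/8\approx 3.04$, which lines up with the bound $\frac{(Y-X)^2}{8}\le 3$ used above. At $D = 3.225$ the two sides still differ by roughly 0.08, so that quoted value may be worth rechecking.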
Summarizing, the approximation is given by:
$$\ln(e^{X}+e^{Y})\approx\begin{cases}1+\ln(2)+\dfrac{Y+X}{2}-\cos\left(\dfrac{Y-X}{2}\right), & \dfrac{(Y-X)^{2}}{8}\le 3\\[6pt] \max\{X,Y\}, & \text{otherwise}\end{cases}$$
45.181.122.234 (talk) 18:43, 14 July 2025 (UTC)
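Here is a hedged Python sketch of the summarized piecewise approximation (the helper names are hypothetical). It compares the approximation with the exact value on $0 \le Y - X \le 20$, and also with the second-order Taylor expansion from earlier in this thread over the range where $\frac{(Y-X)^2}{8}\le 3$.

```python
import math

LN2 = math.log(2.0)


def lse2_exact(x: float, y: float) -> float:
    """Exact ln(e^x + e^y), evaluated stably."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def lse2_cosine(x: float, y: float) -> float:
    """Piecewise approximation summarized above."""
    d = y - x
    if d * d / 8.0 <= 3.0:
        return 1.0 + LN2 + 0.5 * (x + y) - math.cos(0.5 * d)
    return max(x, y)


def lse2_taylor2(x: float, y: float) -> float:
    """Second-order Taylor expansion, for comparison."""
    return LN2 + 0.5 * (x + y) + (y - x) ** 2 / 8.0


grid = [0.01 * i for i in range(2001)]   # values of Y - X from 0 to 20, with X = 0
worst_cosine = max(abs(lse2_cosine(0.0, d) - lse2_exact(0.0, d)) for d in grid)
inside = [d for d in grid if d * d / 8.0 <= 3.0]
worst_taylor = max(abs(lse2_taylor2(0.0, d) - lse2_exact(0.0, d)) for d in inside)
print("largest error, cosine approximation:", worst_cosine)
print("largest error, Taylor expansion where (Y-X)^2/8 <= 3:", worst_taylor)
```

In this check the cosine version stays within about 0.1 of the exact value everywhere on the grid. Near the edge of the same range, the plain Taylor expansion is off by more than 1.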
I realized that the expected-value formula is wrong if $Y-X$ does not have zero mean, but it can be fixed as follows, where $V[\cdot]$ denotes the variance:
$$E[Z]\approx\ln(2)+\frac{E[X]+E[Y]}{2}+1-\cos\left(\frac{E[Y]-E[X]}{2}\right)\cdot e^{-\frac{1}{8}V[Y-X]},\quad \frac{(Y-X)^{2}}{8}\le 3$$
45.181.122.234 (talk) 23:05, 22 July 2025 (UTC)
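For a Gaussian $Y-X$ with mean $\mu$ and variance $\sigma^2$, $E[\cos((Y-X)/2)]=\cos(\mu/2)\,e^{-\sigma^2/8}$, which is where the cosine factor in the corrected formula comes from. Below is a hedged Monte Carlo check in Python. It assumes independent Gaussian $X$ and $Y$ with different means; the parameter values and the function name are illustrative only.

```python
import math
import random

LN2 = math.log(2.0)


def expected_lse2_cosine(mu_x: float, mu_y: float, var_diff: float) -> float:
    """Corrected formula: ln 2 + (E[X] + E[Y])/2 + 1 - cos((E[Y] - E[X])/2) * exp(-V[Y - X]/8)."""
    return LN2 + 0.5 * (mu_x + mu_y) + 1.0 - math.cos(0.5 * (mu_y - mu_x)) * math.exp(-var_diff / 8.0)


rng = random.Random(1)                # fixed seed so the run is reproducible
n = 100_000
mu_x, mu_y, sd = 0.0, 1.5, 0.5        # illustrative: X ~ N(0, 0.25), Y ~ N(1.5, 0.25), independent
var_diff = 2 * sd ** 2                # V[Y - X] for independent X and Y
cos_total = z_total = 0.0
for _ in range(n):
    x, y = rng.gauss(mu_x, sd), rng.gauss(mu_y, sd)
    cos_total += math.cos((y - x) / 2.0)
    z_total += max(x, y) + math.log1p(math.exp(-abs(x - y)))   # exact ln(e^x + e^y)
mc_cos, mc_z = cos_total / n, z_total / n

print("E[cos((Y-X)/2)]:", mc_cos, "vs", math.cos(0.5 * (mu_y - mu_x)) * math.exp(-var_diff / 8.0))
print("E[Z]:", mc_z, "vs corrected formula", expected_lse2_cosine(mu_x, mu_y, var_diff))
```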
In the properties section, you need to specify that t is positive; otherwise it is misleading. 94.180.181.63 (talk) 10:13, 8 December 2024 (UTC)