#!/usr/bin/env python
# coding: utf-8

# In[2]:

from IPython.display import Image
Image('../../../python_for_probability_statistics_and_machine_learning.jpg')

# [Python for Probability, Statistics, and Machine Learning](https://www.springer.com/fr/book/9783319307152)

# In[3]:

from pprint import pprint
import textwrap
import sys, re

# # Useful Inequalities
#
# In practice, few quantities can be calculated analytically. Some knowledge
# of bounding inequalities helps find the ballpark for potential solutions. This
# section discusses three key inequalities that are important for
# probability, statistics, and machine learning.
#
# ## Markov's Inequality
#
# Let $X$ be a non-negative random variable
# and suppose that $\mathbb{E}(X) < \infty$. Then,
# for any $t>0$,
# $$
# \mathbb{P}(X>t)\leq \frac{\mathbb{E}(X)}{t}
# $$
# This is a foundational inequality that is
# used as a stepping stone to other inequalities. It is easy
# to prove. Because $X\ge 0$, we have the following,
# $$
# \begin{align*}
# \mathbb{E}(X)&=\int_0^\infty x f_x(x)dx =\underbrace{\int_0^t x f_x(x)dx}_{\text{omit this}}+\int_t^\infty x f_x(x)dx \\\
# &\ge\int_t^\infty x f_x(x)dx \ge t\int_t^\infty f_x(x)dx = t \mathbb{P}(X>t)
# \end{align*}
# $$
# The step that establishes the inequality is the part where the
# $\int_0^t x f_x(x)dx$ term is omitted. For a particular $f_x(x)$ that may be
# concentrated around the $[0,t]$ interval, this could be a lot to throw out.
# For that reason, the Markov Inequality is considered a *loose* inequality,
# meaning that there is a substantial gap between both sides of the inequality.
# For example, as shown in [Figure](#fig:ProbabilityInequalities_001), the
# $\chi^2$ distribution has a lot of its mass on the left, which would be omitted
# in the Markov Inequality. [Figure](#fig:ProbabilityInequalities_002) shows
# the two curves established by the Markov Inequality.
# The gray shaded region is
# the gap between the two terms and indicates the looseness of the bound
# (a fatter shaded region means a looser bound) for this case.
#
#
#
#
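# As a quick numerical check of how loose the bound is for the $\chi_1^2$ case, we can
# compare the exact tail probability against the Markov bound. This is a minimal sketch
# that assumes `scipy.stats` is available (the rest of this section uses Sympy instead):

```python
from scipy.stats import chi2

# X ~ chi-squared with 1 degree of freedom, so E(X) = 1
E_X = 1.0
for t in [1.0, 2.0, 4.0, 8.0]:
    exact = chi2.sf(t, df=1)   # P(X > t), the survival function
    bound = E_X / t            # Markov bound E(X)/t
    print(t, exact, bound)     # the exact tail sits well below the bound
```

# The gap between the two printed columns is the shaded region in the figure.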
#
# The $\chi_1^2$ density has much of its weight on the left, which is excluded in the establishment of the Markov Inequality.
#
#
#
#
#
#
#
#
#
# The shaded area shows the region between the curves on either side of the Markov Inequality.
#
#
#
#
#
# ## Chebyshev's Inequality
#
# Chebyshev's Inequality drops out directly from the Markov Inequality: apply
# Markov's Inequality to the non-negative random variable $(X-\mu)^2$. Let
# $\mu=\mathbb{E}(X)$ and $\sigma^2=\mathbb{V}(X)$. Then, we have
# $$
# \mathbb{P}(\vert X-\mu\vert \ge t) = \mathbb{P}((X-\mu)^2 \ge t^2) \le \frac{\mathbb{E}((X-\mu)^2)}{t^2} = \frac{\sigma^2}{t^2}
# $$
# Note that if we normalize so that $Z=(X-\mu)/\sigma$, we
# have $\mathbb{P}(\vert Z\vert \ge k) \le 1/k^2$. In particular,
# $\mathbb{P}(\vert Z\vert \ge 2) \le 1/4$. We can illustrate this
# inequality using the Sympy statistics module,
# In[4]:
import sympy
import sympy.stats as ss
t = sympy.symbols('t', real=True)
x = ss.ChiSquared('x', 1)
# To get the left side of the Chebyshev inequality, we
# have to write this out as the following conditional probability,
# In[5]:
r = ss.P((x-1) > t, x > 1) + ss.P(-(x-1) > t, x < 1)
# This is because of certain limitations in the statistics module at
# this point in its development regarding the absolute value function. We could
# take the above expression, which is a function of $t$ and attempt to compute
# the integral, but that would take a very long time (the expression is very long
# and complicated, which is why we did not print it out above). This is because
# Sympy is a pure-Python module that does not utilize any C-level optimizations
# under the hood. In this situation, it's better to use the built-in cumulative
# distribution function as in the following (after some rearrangement of the terms),
# In[6]:
w = (1-ss.cdf(x)(t+1)) + ss.cdf(x)(1-t)
# To plot this, we can evaluate it at a variety of `t` values by using
# the `.subs` substitution method, but it is more convenient to use the
# `lambdify` function to convert the expression into a Python function.
# In[7]:
fw = sympy.lambdify(t, w)
# Then, we can evaluate this function using something like
# In[8]:
list(map(fw, [0, 1, 2, 3, 4]))
# to produce the following [Figure](#fig:ProbabilityInequalities_003).
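# The same comparison can be sanity-checked numerically. For $X\sim\chi_1^2$ we have
# $\mu=1$ and $\sigma^2=2$, so the left side of Chebyshev's Inequality is
# $\mathbb{P}(X>1+t)+\mathbb{P}(X<1-t)$. A minimal sketch, assuming `scipy.stats`
# is available in place of the Sympy expression above:

```python
from scipy.stats import chi2

# For X ~ chi-squared(1): mean mu = 1, variance sigma^2 = 2
mu, sigma2 = 1.0, 2.0
for t in [1.0, 2.0, 3.0]:
    # P(|X - mu| > t) = P(X > 1 + t) + P(X < 1 - t)
    lhs = chi2.sf(1 + t, df=1) + chi2.cdf(1 - t, df=1)
    rhs = sigma2 / t**2        # Chebyshev bound
    print(t, lhs, rhs)         # lhs stays below the bound
```

# The gap between the two printed columns is the shaded region in the figure below.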
#
#
#
#
#
# The shaded area shows the region between the curves on either side of the Chebyshev Inequality.
#
#
#
#
#
# **Programming Tip.**
#
# Note that we cannot use vectorized inputs for the function produced by `lambdify`
# here because the expression contains embedded functions that are only available in
# Sympy. Otherwise, we could have used `lambdify(t, w, 'numpy')` to specify the
# corresponding Numpy functions to use for the expression.
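# As a sketch of that vectorized pattern on an expression built only from functions
# that Numpy also provides (the symbol `s` and the Gaussian-like expression here are
# hypothetical, chosen just for illustration):

```python
import numpy as np
import sympy

s = sympy.symbols('s', real=True)     # hypothetical symbol for this sketch
expr = sympy.exp(-s**2)               # uses only functions Numpy also provides
f = sympy.lambdify(s, expr, 'numpy')  # sympy.exp is mapped to numpy.exp
out = f(np.array([0.0, 1.0, 2.0]))    # vectorized evaluation works here
```

# When every function in the expression has a Numpy counterpart, the resulting
# function accepts arrays directly, which is much faster than mapping over scalars.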
#
#
#
# ## Hoeffding's Inequality
#
#
# Hoeffding's Inequality is similar to, but less loose than, Markov's Inequality.
# Let $X_1,\ldots,X_n$ be iid observations such that $\mathbb{E}(X_i)=\mu$ and
# $a\le X_i \le b$. Then, for any $\epsilon>0$, we have
# $$
# \mathbb{P}(\vert \overline{X}_n -\mu\vert \ge \epsilon) \le 2 \exp(-2 n\epsilon^2/(b-a)^2)
# $$
# where $\overline{X}_n = \tfrac{1}{n}\sum_{i=1}^n X_i$. Note that we
# further assume that the individual random variables are bounded.
#
# **Corollary.** If $X_1,\ldots,X_n$ are independent with $\mathbb{P}(a\le X_i\le b)=1$
# and all with $\mathbb{E}(X_i)=\mu$, then, with probability at least $1-\delta$,
# $$
# \vert\overline{X}_n-\mu\vert \le \sqrt{\frac{c}{2 n}\log \frac{2}{\delta}}
# $$
# where $c=(b-a)^2$. This follows by setting the right-hand side of Hoeffding's
# Inequality equal to $\delta$ and solving for $\epsilon$. We will see this inequality again in the machine
# learning chapter. [Figure](#fig:ProbabilityInequalities_004) shows the Markov
# and Hoeffding bounds for the case of ten identically and uniformly distributed
# random variables, $X_i \sim \mathcal{U}[0,1]$. The solid line shows
# $\mathbb{P}(\vert \overline{X}_n - 1/2 \vert > \epsilon)$. Note that the
# Hoeffding Inequality is tighter than the Markov Inequality and that both of
# them merge when $\epsilon$ gets big enough.
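# A Monte Carlo sketch of this comparison, assuming Numpy is available: we estimate
# $\mathbb{P}(\vert \overline{X}_n - 1/2 \vert > \epsilon)$ for $n=10$ uniform
# variables by simulation and compare it against the Hoeffding bound (with
# $(b-a)^2 = 1$ for $\mathcal{U}[0,1]$),

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 100_000
# each row is one sample mean of n uniform draws on [0, 1]
xbar = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)
for eps in [0.1, 0.2, 0.3]:
    empirical = np.mean(np.abs(xbar - 0.5) > eps)  # estimated tail probability
    hoeffding = 2 * np.exp(-2 * n * eps**2)        # Hoeffding bound, (b-a)^2 = 1
    print(eps, empirical, hoeffding)               # empirical tail <= bound
```

# The simulated tail probability sits below the bound for every $\epsilon$, and the
# gap shrinks as $\epsilon$ grows, matching the merging of the curves in the figure.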
#
#
#
#
#
# This shows the Markov and Hoeffding bounds for the case of ten identically and uniformly distributed random variables.
#
#
#