Bayesian Methods for Hackers

1. The Philosophy of Bayesian Inference

  • Introduction
    • Prior probability: the strength of our belief in an event A
      $P(A)$
    • Posterior probability: the probability of A given the evidence X
      $P(A|X)$
  • The Bayesian framework
    • Bayes' theorem
      $P(A|X) = \dfrac{P(X|A)\,P(A)}{P(X)}$
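
      A quick numeric illustration of Bayes' theorem (all numbers here are made up for illustration):
      # Hypothetical prior and likelihoods, chosen only to illustrate the formula
      p_A = 0.05              # prior P(A)
      p_X_given_A = 0.90      # likelihood P(X|A)
      p_X_given_not_A = 0.10  # likelihood P(X|~A)
      # total probability: P(X) = P(X|A)P(A) + P(X|~A)P(~A)
      p_X = p_X_given_A * p_A + p_X_given_not_A * (1 - p_A)
      p_A_given_X = p_X_given_A * p_A / p_X  # Bayes' theorem
      print(p_A_given_X)  # ~0.32: the evidence X raises our belief in A from 0.05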
  • Probability distributions
    • If the random variable Z is discrete, its probability mass function is
      $P(Z=k) = \dfrac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots$
      $Z \sim \text{Poi}(\lambda)$
      $E[Z|\lambda] = \lambda$

    • If the random variable Z is continuous, its probability density function is
      $f_Z(z|\lambda) = \lambda e^{-\lambda z}, \quad z \ge 0$
      $Z \sim \text{Exp}(\lambda)$
      $E[Z|\lambda] = 1/\lambda$
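
      These formulas can be checked quickly with scipy (a small sketch; only stats.poisson and stats.expon are used):
      import numpy as np
      from scipy import stats
      from scipy.special import factorial

      lam = 1.5
      k = np.arange(4)
      # Poisson pmf matches lambda^k e^{-lambda} / k!
      print(stats.poisson.pmf(k, lam))
      print(lam**k * np.exp(-lam) / factorial(k))  # same values
      # scipy parameterizes the exponential by scale = 1/lambda; its mean is 1/lambda
      print(stats.expon.mean(scale=1/lam))  # 0.666...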

  • Bayesian inference with computers
    • Bayesian analysis with PyMC
      %matplotlib inline
      import numpy as np
      from matplotlib import pyplot as plt
      from IPython.core.pylabtools import figsize

      figsize(12.5, 3.5)
      count_data = np.loadtxt("data/txtdata.csv")
      n_count_data = len(count_data)
      plt.bar(np.arange(n_count_data), count_data, color="#348ABD")
      plt.xlabel("Time (days)")
      plt.ylabel("count of text-msgs received")
      plt.title("Did the user's texting habits change over time?")
      plt.xlim(0, n_count_data);
      import pymc3 as pm

      with pm.Model() as model:
          alpha = 1.0 / count_data.mean()  # Recall count_data is the
                                           # variable that holds our txt counts
          lambda_1 = pm.Exponential("lambda_1", alpha)
          lambda_2 = pm.Exponential("lambda_2", alpha)
          tau = pm.DiscreteUniform("tau", lower=0, upper=n_count_data - 1)

      with model:
          idx = np.arange(n_count_data)  # Index
          lambda_ = pm.math.switch(tau > idx, lambda_1, lambda_2)

      with model:
          observation = pm.Poisson("obs", lambda_, observed=count_data)

      with model:
          step = pm.Metropolis()
          trace = pm.sample(10000, tune=5000, step=step, return_inferencedata=False)

      lambda_1_samples = trace['lambda_1']
      lambda_2_samples = trace['lambda_2']
      tau_samples = trace['tau']
      figsize(12.5, 10)
      # histogram of the samples:
      ax = plt.subplot(311)
      ax.set_autoscaley_on(False)
      plt.hist(lambda_1_samples, histtype='stepfilled', bins=30, alpha=0.85,
               label=r"posterior of $\lambda_1$", color="#A60628", density=True)
      plt.legend(loc="upper left")
      plt.title(r"""Posterior distributions of the variables
          $\lambda_1,\;\lambda_2,\;\tau$""")
      plt.xlim([15, 30])
      plt.xlabel(r"$\lambda_1$ value")

      ax = plt.subplot(312)
      ax.set_autoscaley_on(False)
      plt.hist(lambda_2_samples, histtype='stepfilled', bins=30, alpha=0.85,
               label=r"posterior of $\lambda_2$", color="#7A68A6", density=True)
      plt.legend(loc="upper left")
      plt.xlim([15, 30])
      plt.xlabel(r"$\lambda_2$ value")

      plt.subplot(313)
      w = 1.0 / tau_samples.shape[0] * np.ones_like(tau_samples)
      plt.hist(tau_samples, bins=n_count_data, alpha=1,
               label=r"posterior of $\tau$",
               color="#467821", weights=w, rwidth=2.)
      plt.xticks(np.arange(n_count_data))
      plt.legend(loc="upper left")
      plt.ylim([0, .75])
      plt.xlim([35, len(count_data) - 20])
      plt.xlabel(r"$\tau$ (in days)")
      plt.ylabel("probability");
      figsize(12.5, 5)
      # tau_samples, lambda_1_samples, lambda_2_samples contain
      # N samples from the corresponding posterior distribution
      N = tau_samples.shape[0]
      expected_texts_per_day = np.zeros(n_count_data)
      for day in range(0, n_count_data):
          # ix is a bool index of all tau samples corresponding to
          # the switchpoint occurring prior to value of 'day'
          ix = day < tau_samples
          # Each posterior sample corresponds to a value for tau.
          # For each day, that value of tau indicates whether we're "before"
          # (in the lambda_1 "regime") or "after" (in the lambda_2 "regime")
          # the switchpoint. By taking the posterior sample of lambda_1/2
          # accordingly, we can average over all samples to get an expected
          # value for lambda on that day. As explained, the "message count"
          # random variable is Poisson distributed, and therefore lambda (the
          # Poisson parameter) is the expected value of "message count".
          expected_texts_per_day[day] = (lambda_1_samples[ix].sum()
                                         + lambda_2_samples[~ix].sum()) / N

      plt.plot(range(n_count_data), expected_texts_per_day, lw=4, color="#E24A33",
               label="expected number of text-messages received")
      plt.xlim(0, n_count_data)
      plt.xlabel("Day")
      plt.ylabel("Expected # text-messages")
      plt.title("Expected number of text-messages received")
      plt.ylim(0, 60)
      plt.bar(np.arange(len(count_data)), count_data, color="#348ABD", alpha=0.65,
              label="observed texts per day")
      plt.legend(loc="upper left");
  • Conclusion
  • Appendix
  • Exercises
  • References

2. A Little More on PyMC

  • Introduction
    • Stochastic variables: variables whose values are not yet determined
    • Deterministic variables: variables that are not random once all of their parents are known
      • In PyMC2 these are marked with the @pymc.deterministic Python wrapper (decorator); a PyMC3 sketch is shown below
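        A minimal PyMC3 sketch of the same distinction (assuming the pm.Deterministic API used elsewhere in these notes):
        import pymc3 as pm

        with pm.Model() as model:
            lambda_1 = pm.Exponential("lambda_1", 1.0)  # stochastic
            lambda_2 = pm.Exponential("lambda_2", 1.0)  # stochastic
            # deterministic: fixed once its parents lambda_1 and lambda_2 are known
            total = pm.Deterministic("total", lambda_1 + lambda_2)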
  • How to model
    • Observed frequencies differ from the true underlying frequency
    • Bernoulli distribution
      $X \sim \text{Ber}(p)$ : X is 1 with probability p and 0 with probability 1-p
    • Binomial distribution
      $X \sim \text{Bin}(N, p)$
      $P(X=k) = \binom{N}{k} p^k (1-p)^{N-k}$
      Expected value: $Np$
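
      A quick check of the pmf and the mean with scipy (a small sketch using stats.binom):
      from scipy import stats

      # Bin(10, 0.3): P(X = 3) and the expected value N*p
      print(stats.binom.pmf(3, n=10, p=0.3))  # ~0.2668
      print(stats.binom.mean(n=10, p=0.3))    # 3.0 == N*p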
    • Example: importing data
      # Data Import Option
      test_data = np.genfromtxt("data.csv", skip_header=1, usecols=[1, 2], missing_values="NA", delimiter=",")
      # Remove NA rows
      test_data = test_data[~np.isnan(test_data[:, 1])]
    • Logistic Function
      from IPython.core.pylabtools import figsize
      from matplotlib import pyplot as plt
      import numpy as np

      figsize(12, 3)

      def logistic(x, beta, alpha=0):
          return 1.0 / (1.0 + np.exp(np.dot(beta, x) + alpha))

      x = np.linspace(-4, 4, 100)
      plt.plot(x, logistic(x, 1), label=r"$\beta = 1$", ls="--", lw=1)
      plt.plot(x, logistic(x, 3), label=r"$\beta = 3$", ls="--", lw=1)
      plt.plot(x, logistic(x, -5), label=r"$\beta = -5$", ls="--", lw=1)
      plt.plot(x, logistic(x, 1, 1), label=r"$\beta = 1, \alpha = 1$", color="#348ABD")
      plt.plot(x, logistic(x, 3, -2), label=r"$\beta = 3, \alpha = -2$", color="#A60628")
      plt.plot(x, logistic(x, -5, 7), label=r"$\beta = -5, \alpha = 7$", color="#7A68A6")
      plt.xlabel("$x$")
      plt.ylabel("Logistic function")
      plt.title(r"Logistic functions for different $\alpha$ and $\beta$", fontsize=14)
      plt.legend(loc="lower left");
    • Normal distribution
      Normal random variable: $X \sim N(\mu, 1/\tau)$
      Probability density function: $f(x \mid \mu, \tau) = \sqrt{\dfrac{\tau}{2\pi}} \exp\left(-\dfrac{\tau}{2}(x - \mu)^2\right)$
      Expected value: $E[X \mid \mu, \tau] = \mu$
      Variance: $\text{Var}(X \mid \mu, \tau) = \dfrac{1}{\tau}$
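
      A quick check of the precision parameterization with scipy (a small sketch; note that scipy's norm takes a standard deviation, not a precision):
      import numpy as np
      from scipy import stats

      mu, tau = 2.0, 0.5                               # precision tau = 1/variance
      dist = stats.norm(loc=mu, scale=1/np.sqrt(tau))  # convert tau to a std. dev.
      print(dist.mean(), dist.var())                   # 2.0, 2.0 == mu and 1/tau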
  • Is our model appropriate?
  • Conclusion
  • Appendix
  • Exercises
  • References

3. Opening the Black Box of MCMC

  • The Bayesian landscape
  • Diagnosing convergence
    import numpy as np
    from matplotlib import pyplot as plt
    from IPython.core.pylabtools import figsize
    import pymc3 as pm

    figsize(12.5, 4)

    x_t = np.random.normal(0, 1, 200)
    x_t[0] = 0
    y_t = np.zeros(200)
    for i in range(1, 200):
        y_t[i] = np.random.normal(y_t[i - 1], 1)

    plt.plot(y_t, label=r"$y_t$", lw=3)
    plt.plot(x_t, label=r"$x_t$", lw=3)
    plt.xlabel("time, $t$")
    plt.legend();

    def autocorr(x):
        # from http://tinyurl.com/afz57c4
        result = np.correlate(x, x, mode='full')
        result = result / np.max(result)
        return result[result.size // 2:]

    colors = ["#348ABD", "#A60628", "#7A68A6"]
    x = np.arange(1, 200)
    plt.bar(x, autocorr(y_t)[1:], width=1, label=r"$y_t$",
            edgecolor=colors[0], color=colors[0])
    plt.bar(x, autocorr(x_t)[1:], width=1, label=r"$x_t$",
            color=colors[1], edgecolor=colors[1])
    plt.legend(title="Autocorrelation")
    plt.ylabel("measured correlation \nbetween $y_t$ and $y_{t-k}$.")
    plt.xlabel("k (lag)")
    plt.title("Autocorrelation plot of $y_t$ and $x_t$ for differing $k$ lags.");

    max_x = 200 // 3 + 1
    x = np.arange(1, max_x)
    plt.bar(x, autocorr(y_t)[1:max_x], edgecolor=colors[0],
            label="no thinning", color=colors[0], width=1)
    plt.bar(x, autocorr(y_t[::2])[1:max_x], edgecolor=colors[1],
            label="keeping every 2nd sample", color=colors[1], width=1)
    plt.bar(x, autocorr(y_t[::3])[1:max_x], width=1, edgecolor=colors[2],
            label="keeping every 3rd sample", color=colors[2])
    plt.autoscale(tight=True)
    plt.legend(title="Autocorrelation plot for $y_t$", loc="lower left")
    plt.ylabel("measured correlation \nbetween $y_t$ and $y_{t-k}$.")
    plt.xlabel("k (lag)")
    plt.title("Autocorrelation of $y_t$ (no thinning vs. thinning) \
    at differing $k$ lags.");

    # `trace` below comes from the chapter's mixture-model (clustering) example,
    # which is not reproduced in these notes
    pm.plots.traceplot(trace=trace, varnames=["centers"])
    pm.plots.plot_posterior(trace=trace["centers"][:, 0])
    pm.plots.plot_posterior(trace=trace["centers"][:, 1])
    pm.plots.autocorrplot(trace=trace, varnames=["centers"]);
  • Useful tips for MCMC
  • Conclusion
  • References

4. The Greatest Theorem Never Told

  • Introduction
  • The Law of Large Numbers
    $\dfrac{1}{N} \sum_{i=1}^{N} Z_i \to E[Z], \quad N \to \infty$
    • Intuition
      $\dfrac{1}{N} \sum_{i=1}^{N} Z_i = \dfrac{1}{N} \left( \sum_{Z_i = c_1} c_1 + \sum_{Z_i = c_2} c_2 \right)$
      $= c_1 \sum_{Z_i = c_1} \dfrac{1}{N} + c_2 \sum_{Z_i = c_2} \dfrac{1}{N}$
      $= c_1 \times (\text{approximate frequency of } c_1) + c_2 \times (\text{approximate frequency of } c_2)$
      $\to c_1 \times P(Z = c_1) + c_2 \times P(Z = c_2) = E[Z]$
      The expected distance from the true value (here 4.5) after N samples:
      $D(N) = \sqrt{E\left[\left(\dfrac{1}{N} \sum_{i=1}^{N} Z_i - 4.5\right)^2\right]}$
      $Y_k = \left(\dfrac{1}{N} \sum_{i=1}^{N} Z_i - 4.5\right)^2$ (the k-th realization)
      $\dfrac{1}{N_Y} \sum_{k=1}^{N_Y} Y_k \approx E[Y_k] = E\left[\left(\dfrac{1}{N} \sum_{i=1}^{N} Z_i - 4.5\right)^2\right]$, so its square root approximates $D(N)$
      $\dfrac{1}{N} \sum_{i=1}^{N} (Z_i - \mu)^2 \to E[(Z - \mu)^2] = \text{Var}(Z)$
      %matplotlib inline
      import numpy as np
      from IPython.core.pylabtools import figsize
      import matplotlib.pyplot as plt

      figsize(12.5, 5)
      sample_size = 100000
      expected_value = lambda_ = 4.5
      poi = np.random.poisson
      N_samples = range(1, sample_size, 100)

      for k in range(3):
          samples = poi(lambda_, sample_size)
          partial_average = [samples[:i].mean() for i in N_samples]
          plt.plot(N_samples, partial_average, lw=1.5,
                   label="average of n samples; seq. %d" % k)

      plt.plot(N_samples, expected_value * np.ones_like(partial_average),
               ls="--", label="true expected value", c="k")
      plt.ylim(4.35, 4.65)
      plt.title("Convergence of the average of \n random variables to its expected value")
      plt.ylabel("average of n samples")
      plt.xlabel("# of samples, n")
      plt.legend();

      figsize(12.5, 4)
      N_Y = 250  # use this many to approximate D(N)
      N_array = np.arange(1000, 50000, 2500)  # use this many samples in the approx. to the variance
      D_N_results = np.zeros(len(N_array))
      lambda_ = 4.5
      expected_value = lambda_  # for X ~ Poi(lambda), E[X] = lambda

      def D_N(n):
          """
          This function approximates D_n, the average variance of using n samples.
          """
          Z = poi(lambda_, (n, N_Y))
          average_Z = Z.mean(axis=0)
          return np.sqrt(((average_Z - expected_value)**2).mean())

      for i, n in enumerate(N_array):
          D_N_results[i] = D_N(n)

      plt.xlabel("N")
      plt.ylabel("expected squared-distance from true value")
      plt.plot(N_array, D_N_results, lw=3,
               label="expected distance between\nexpected value and \naverage of N random variables.")
      plt.plot(N_array, np.sqrt(expected_value) / np.sqrt(N_array), lw=2, ls="--",
               label=r"$\sqrt{\lambda}/\sqrt{N}$")
      plt.legend()
      plt.title("How 'fast' is the sample average converging?");
  • The disorder of small numbers
    # adding a number to the end of the %run call will get the ith top post.
    %run top_showerthoughts_submissions.py 2

    print("Post contents: \n")
    print(top_post)

    """
    contents: an array of the text from the last 100 top submissions to a subreddit
    votes: a 2d numpy array of upvotes, downvotes for each submission.
    """
    n_submissions = len(votes)
    submissions = np.random.randint(n_submissions, size=4)
    print("Some Submissions (out of %d total) \n-----------" % n_submissions)
    for i in submissions:
        print('"' + contents[i] + '"')
        print("upvotes/downvotes: ", votes[i, :], "\n")

    import pymc3 as pm

    def posterior_upvote_ratio(upvotes, downvotes, samples=20000):
        """
        This function accepts the number of upvotes and downvotes a particular
        submission received, and the number of posterior samples to return to
        the user. Assumes a uniform prior.
        """
        N = upvotes + downvotes
        with pm.Model() as model:
            upvote_ratio = pm.Uniform("upvote_ratio", 0, 1)
            observations = pm.Binomial("obs", N, upvote_ratio, observed=upvotes)
            trace = pm.sample(samples, step=pm.Metropolis())
        burned_trace = trace[int(samples/4):]
        return burned_trace["upvote_ratio"]

    figsize(11., 8)
    posteriors = []
    colours = ["#348ABD", "#A60628", "#7A68A6", "#467821", "#CF4457"]
    for i in range(len(submissions)):
        j = submissions[i]
        posteriors.append(posterior_upvote_ratio(votes[j, 0], votes[j, 1]))
        plt.hist(posteriors[i], bins=10, density=True, alpha=.9,
                 histtype="step", color=colours[i % 5], lw=3,
                 label='(%d up:%d down)\n%s...' % (votes[j, 0], votes[j, 1], contents[j][:50]))
        plt.hist(posteriors[i], bins=10, density=True, alpha=.2,
                 histtype="stepfilled", color=colours[i], lw=3)
    plt.legend(loc="upper left")
    plt.xlim(0, 1)
    plt.title("Posterior distributions of upvote ratios on different submissions");

    N = posteriors[0].shape[0]
    lower_limits = []
    for i in range(len(submissions)):
        j = submissions[i]
        plt.hist(posteriors[i], bins=20, density=True, alpha=.9,
                 histtype="step", color=colours[i], lw=3,
                 label='(%d up:%d down)\n%s...' % (votes[j, 0], votes[j, 1], contents[j][:50]))
        plt.hist(posteriors[i], bins=20, density=True, alpha=.2,
                 histtype="stepfilled", color=colours[i], lw=3)
        v = np.sort(posteriors[i])[int(0.05 * N)]
        # plt.vlines(v, 0, 15, color="k", alpha=1, linewidths=3)
        plt.vlines(v, 0, 10, color=colours[i], linestyles="--", linewidths=3)
        lower_limits.append(v)
    plt.legend(loc="upper left")
    plt.title("Posterior distributions of upvote ratios on different submissions");
    order = np.argsort(-np.array(lower_limits))
    print(order, lower_limits)

    def intervals(u, d):
        a = 1. + u
        b = 1. + d
        mu = a / (a + b)
        std_err = 1.65 * np.sqrt((a * b) / ((a + b)**2 * (a + b + 1.)))
        return (mu, std_err)

    print("Approximate lower bounds:")
    posterior_mean, std_err = intervals(votes[:, 0], votes[:, 1])
    lb = posterior_mean - std_err
    print(lb)
    print("\n")
    print("Top 40 Sorted according to approximate lower bounds:")
    print("\n")
    order = np.argsort(-lb)
    ordered_contents = []
    for i in order[:40]:
        ordered_contents.append(contents[i])
        print(votes[i, 0], votes[i, 1], contents[i])
        print("-------------")

    r_order = order[::-1][-40:]
    plt.errorbar(posterior_mean[r_order], np.arange(len(r_order)),
                 xerr=std_err[r_order], capsize=0, fmt="o",
                 color="#7A68A6")
    plt.xlim(0.3, 1)
    plt.yticks(np.arange(len(r_order) - 1, -1, -1),
               list(map(lambda x: x[:30].replace("\n", ""), ordered_contents)));
  • Conclusion
  • Appendix
  • Exercises
  • References

5. Would You Rather Lose an Arm or a Leg?

  • Introduction
  • Loss functions
    Squared-error loss: $L(\theta, \hat{\theta}) = (\theta - \hat{\theta})^2$
    Asymmetric squared-error loss: $L(\theta, \hat{\theta}) = \begin{cases} (\theta - \hat{\theta})^2 & \hat{\theta} < \theta \\ c(\theta - \hat{\theta})^2 & \hat{\theta} \ge \theta, \; 0 < c < 1 \end{cases}$
    Absolute loss: $L(\theta, \hat{\theta}) = |\theta - \hat{\theta}|$
    Log-loss: $L(\theta, \hat{\theta}) = -\theta \log(\hat{\theta}) - (1 - \theta)\log(1 - \hat{\theta}), \quad \theta \in \{0, 1\}, \; \hat{\theta} \in [0, 1]$
    Expected loss: $l(\hat{\theta}) = E_{\theta}[L(\theta, \hat{\theta})]$
    $\dfrac{1}{N} \sum_{i=1}^{N} L(\theta_i, \hat{\theta}) \approx E_{\theta}[L(\theta, \hat{\theta})] = l(\hat{\theta})$
    %matplotlib inline
    import scipy.stats as stats
    from IPython.core.pylabtools import figsize
    import numpy as np
    import matplotlib.pyplot as plt

    figsize(12.5, 9)
    norm_pdf = stats.norm.pdf

    plt.subplot(311)
    x = np.linspace(0, 60000, 200)
    sp1 = plt.fill_between(x, 0, norm_pdf(x, 35000, 7500),
                           color="#348ABD", lw=3, alpha=0.6,
                           label="historical total prices")
    p1 = plt.Rectangle((0, 0), 1, 1, fc=sp1.get_facecolor()[0])
    plt.legend([p1], [sp1.get_label()])

    plt.subplot(312)
    x = np.linspace(0, 10000, 200)
    sp2 = plt.fill_between(x, 0, norm_pdf(x, 3000, 500),
                           color="#A60628", lw=3, alpha=0.6,
                           label="snowblower price guess")
    p2 = plt.Rectangle((0, 0), 1, 1, fc=sp2.get_facecolor()[0])
    plt.legend([p2], [sp2.get_label()])

    plt.subplot(313)
    x = np.linspace(0, 25000, 200)
    sp3 = plt.fill_between(x, 0, norm_pdf(x, 12000, 3000),
                           color="#7A68A6", lw=3, alpha=0.6,
                           label="Trip price guess")
    plt.autoscale(tight=True)
    p3 = plt.Rectangle((0, 0), 1, 1, fc=sp3.get_facecolor()[0])
    plt.legend([p3], [sp3.get_label()]);

    import pymc3 as pm

    data_mu = [3e3, 12e3]
    data_std = [5e2, 3e3]
    mu_prior = 35e3
    std_prior = 75e2
    with pm.Model() as model:
        true_price = pm.Normal("true_price", mu=mu_prior, sd=std_prior)
        prize_1 = pm.Normal("first_prize", mu=data_mu[0], sd=data_std[0])
        prize_2 = pm.Normal("second_prize", mu=data_mu[1], sd=data_std[1])
        price_estimate = prize_1 + prize_2
        logp = pm.Normal.dist(mu=price_estimate, sd=(3e3)).logp(true_price)
        error = pm.Potential("error", logp)
        trace = pm.sample(50000, step=pm.Metropolis())
        burned_trace = trace[10000:]

    price_trace = burned_trace["true_price"]

    figsize(12.5, 4)
    import scipy.stats as stats

    x = np.linspace(5000, 40000)
    plt.plot(x, stats.norm.pdf(x, 35000, 7500), c="k", lw=2,
             label="prior dist. of suite price")
    _hist = plt.hist(price_trace, bins=35, density=True, histtype="stepfilled")
    plt.title("Posterior of the true price estimate")
    plt.vlines(mu_prior, 0, 1.1 * np.max(_hist[0]), label="prior's mean",
               linestyles="--")
    plt.vlines(price_trace.mean(), 0, 1.1 * np.max(_hist[0]),
               label="posterior's mean", linestyles="-.")
    plt.legend(loc="upper left");

    figsize(12.5, 7)

    # numpy-friendly showdown_loss
    def showdown_loss(guess, true_price, risk=80000):
        loss = np.zeros_like(true_price)
        ix = true_price < guess
        loss[~ix] = np.abs(guess - true_price[~ix])
        close_mask = abs(true_price - guess) <= 250
        loss[close_mask] = -2 * true_price[close_mask]
        loss[ix] = risk
        return loss

    guesses = np.linspace(5000, 50000, 70)
    risks = np.linspace(30000, 150000, 6)
    expected_loss = lambda guess, risk: showdown_loss(guess, price_trace, risk).mean()

    for _p in risks:
        results = [expected_loss(_g, _p) for _g in guesses]
        plt.plot(guesses, results, label="%d" % _p)

    plt.title("Expected loss of different guesses, \nvarious risk-levels of \
    overestimating")
    plt.legend(loc="upper left", title="Risk parameter")
    plt.xlabel("price bid")
    plt.ylabel("expected loss")
    plt.xlim(5000, 30000);

    import scipy.optimize as sop

    ax = plt.subplot(111)
    for _p in risks:
        _color = next(ax._get_lines.prop_cycler)
        _min_results = sop.fmin(expected_loss, 15000, args=(_p,), disp=False)
        _results = [expected_loss(_g, _p) for _g in guesses]
        plt.plot(guesses, _results, color=_color['color'])
        plt.scatter(_min_results, 0, s=60,
                    color=_color['color'], label="%d" % _p)
        plt.vlines(_min_results, 0, 120000, color=_color['color'], linestyles="--")
        print("minimum at risk %d: %.2f" % (_p, _min_results))

    plt.title("Expected loss & Bayes actions of different guesses, \n \
    various risk-levels of overestimating")
    plt.legend(loc="upper left", scatterpoints=1, title="Bayes action at risk:")
    plt.xlabel("price guess")
    plt.ylabel("expected loss")
    plt.xlim(7000, 30000)
    plt.ylim(-1000, 80000);
  • Machine learning via Bayesian methods
    $R_i(x) = \alpha_i + \beta_i x + \epsilon$
    where $\epsilon \sim \text{Normal}(0, \sigma_i)$ and $i$ indexes our posterior samples.
    $\arg\min_r \; E_{R(x)}[L(R(x), r)]$
    • Example: financial prediction
      figsize(12.5, 4)

      def stock_loss(true_return, yhat, alpha=100.):
          if true_return * yhat < 0:
              # opposite signs, not good
              return alpha * yhat**2 - np.sign(true_return) * yhat \
                     + abs(true_return)
          else:
              return abs(true_return - yhat)

      true_value = .05
      pred = np.linspace(-.04, .12, 75)
      plt.plot(pred, [stock_loss(true_value, _p) for _p in pred],
               label="Loss associated with\n prediction if true value = 0.05", lw=3)
      plt.vlines(0, 0, .25, linestyles="--")
      plt.xlabel("prediction")
      plt.ylabel("loss")
      plt.xlim(-0.04, .12)
      plt.ylim(0, 0.25)

      true_value = -.02
      plt.plot(pred, [stock_loss(true_value, _p) for _p in pred], alpha=0.6,
               label="Loss associated with\n prediction if true value = -0.02", lw=3)
      plt.legend()
      plt.title("Stock returns loss if true value = 0.05, -0.02");

      ## Code to create artificial data
      N = 100
      X = 0.025 * np.random.randn(N)
      Y = 0.5 * X + 0.01 * np.random.randn(N)

      ls_coef_ = np.cov(X, Y)[0, 1] / np.var(X)
      ls_intercept = Y.mean() - ls_coef_ * X.mean()

      plt.scatter(X, Y, c="k")
      plt.xlabel("trading signal")
      plt.ylabel("returns")
      plt.title("Empirical returns vs trading signal")
      plt.plot(X, ls_coef_ * X + ls_intercept, label="Least-squares line")
      plt.xlim(X.min(), X.max())
      plt.ylim(Y.min(), Y.max())
      plt.legend(loc="upper left");

      import pymc3 as pm

      with pm.Model() as model:
          std = pm.Uniform("std", 0, 100)
          beta = pm.Normal("beta", mu=0, sd=100)
          alpha = pm.Normal("alpha", mu=0, sd=100)
          mean = pm.Deterministic("mean", alpha + beta * X)
          obs = pm.Normal("obs", mu=mean, sd=std, observed=Y)
          trace = pm.sample(100000, step=pm.Metropolis())
          burned_trace = trace[20000:]

      pm.plots.traceplot(trace=burned_trace, varnames=["std", "beta", "alpha"])
      pm.plot_posterior(trace=burned_trace, varnames=["std", "beta", "alpha"], kde_plot=True);

      figsize(12.5, 6)
      from scipy.optimize import fmin

      def stock_loss(price, pred, coef=500):
          """vectorized for numpy"""
          sol = np.zeros_like(price)
          ix = price * pred < 0
          sol[ix] = coef * pred**2 - np.sign(price[ix]) * pred + abs(price[ix])
          sol[~ix] = abs(price[~ix] - pred)
          return sol

      std_samples = burned_trace["std"]
      alpha_samples = burned_trace["alpha"]
      beta_samples = burned_trace["beta"]

      N = std_samples.shape[0]
      noise = std_samples * np.random.randn(N)
      possible_outcomes = lambda signal: alpha_samples + beta_samples * signal + noise

      opt_predictions = np.zeros(50)
      trading_signals = np.linspace(X.min(), X.max(), 50)
      for i, _signal in enumerate(trading_signals):
          _possible_outcomes = possible_outcomes(_signal)
          tomin = lambda pred: stock_loss(_possible_outcomes, pred).mean()
          opt_predictions[i] = fmin(tomin, 0, disp=False)

      plt.xlabel("trading signal")
      plt.ylabel("prediction")
      plt.title("Least-squares prediction vs. Bayes action prediction")
      plt.plot(X, ls_coef_ * X + ls_intercept, label="Least-squares prediction")
      plt.xlim(X.min(), X.max())
      plt.plot(trading_signals, opt_predictions, label="Bayes action prediction")
      plt.legend(loc="upper left");
  • Conclusion
  • References

6. Getting Our Priorities Straight

  • Introduction
  • Subjective vs. objective priors
    $\mu_p = \dfrac{1}{N} \sum_{i=0}^{N} X_i$
  • Useful priors to know about
    • Gamma distribution
      $\text{Exp}(\beta) \sim \text{Gamma}(1, \beta)$
      $f(x \mid \alpha, \beta) = \dfrac{\beta^{\alpha} x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}$
    • Beta distribution
      $f_X(x \mid \alpha, \beta) = \dfrac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}$
      Observation: $X \sim \text{Binomial}(N, p)$
      Posterior: $\text{Beta}(1 + X, 1 + N - X)$
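
      This conjugacy can be verified numerically with scipy (a small sketch; the numbers are made up):
      from scipy import stats

      # Uniform Beta(1, 1) prior on p; observe X = 7 successes in N = 10 trials.
      # The conjugate posterior is Beta(1 + X, 1 + N - X) = Beta(8, 4).
      N, X = 10, 7
      posterior = stats.beta(1 + X, 1 + N - X)
      print(posterior.mean())          # ~0.667
      print(posterior.interval(0.95))  # a 95% credible interval for p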
  • Example: Bayesian Multi-Armed Bandits (MAB)
    rand = np.random.rand

    class Bandits(object):
        """
        This class represents N bandit machines.

        parameters:
            p_array: a (n,) Numpy array of probabilities >0, <1.

        methods:
            pull( i ): return the results, 0 or 1, of pulling
                       the ith bandit.
        """
        def __init__(self, p_array):
            self.p = p_array
            self.optimal = np.argmax(p_array)

        def pull(self, i):
            # i is which arm to pull
            return np.random.rand() < self.p[i]

        def __len__(self):
            return len(self.p)

    class BayesianStrategy(object):
        """
        Implements an online, learning strategy to solve
        the Multi-Armed Bandit problem.

        parameters:
            bandits: a Bandit class with .pull method

        methods:
            sample_bandits(n): sample and train on n pulls.

        attributes:
            N: the cumulative number of samples
            choices: the historical choices as a (N,) array
            bb_score: the historical score as a (N,) array
        """
        def __init__(self, bandits):
            self.bandits = bandits
            n_bandits = len(self.bandits)
            self.wins = np.zeros(n_bandits)
            self.trials = np.zeros(n_bandits)
            self.N = 0
            self.choices = []
            self.bb_score = []

        def sample_bandits(self, n=1):
            bb_score = np.zeros(n)
            choices = np.zeros(n)

            for k in range(n):
                # sample from the bandits's priors, and select the largest sample
                choice = np.argmax(np.random.beta(1 + self.wins, 1 + self.trials - self.wins))

                # sample the chosen bandit
                result = self.bandits.pull(choice)

                # update priors and score
                self.wins[choice] += result
                self.trials[choice] += 1
                bb_score[k] = result
                self.N += 1
                choices[k] = choice

            self.bb_score = np.r_[self.bb_score, bb_score]
            self.choices = np.r_[self.choices, choices]
            return

    import scipy.stats as stats

    figsize(11.0, 10)
    beta = stats.beta
    x = np.linspace(0.001, .999, 200)

    def plot_priors(bayesian_strategy, prob, lw=3, alpha=0.2, plt_vlines=True):
        ## plotting function
        wins = bayesian_strategy.wins
        trials = bayesian_strategy.trials
        for i in range(prob.shape[0]):
            y = beta(1 + wins[i], 1 + trials[i] - wins[i])
            p = plt.plot(x, y.pdf(x), lw=lw)
            c = p[0].get_markeredgecolor()
            plt.fill_between(x, y.pdf(x), 0, color=c, alpha=alpha,
                             label="underlying probability: %.2f" % prob[i])
            if plt_vlines:
                plt.vlines(prob[i], 0, y.pdf(prob[i]),
                           colors=c, linestyles="--", lw=2)
            plt.autoscale(tight=True)
            plt.title("Posteriors After %d pull" % bayesian_strategy.N +
                      "s" * (bayesian_strategy.N > 1))
            plt.autoscale(tight=True)
        return

    # Start Main Code
    hidden_prob = np.array([0.85, 0.60, 0.75])
    bandits = Bandits(hidden_prob)
    bayesian_strat = BayesianStrategy(bandits)

    draw_samples = [1, 1, 3, 10, 10, 25, 50, 100, 200, 600]
    for j, i in enumerate(draw_samples):
        plt.subplot(5, 2, j + 1)
        bayesian_strat.sample_bandits(i)
        plot_priors(bayesian_strat, hidden_prob)
        # plt.legend()
        plt.autoscale(tight=True)
    plt.tight_layout()
  • Eliciting priors from domain experts
    $w_{\text{opt}} = \arg\max_w \dfrac{1}{N} \sum_{i=0}^{N} \left( \mu_i^T w - \dfrac{\lambda}{2} w^T \Sigma_i w \right)$
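    A minimal numpy sketch of optimizing this objective over posterior samples (all names and numbers here are made up for illustration):
    import numpy as np

    rng = np.random.default_rng(0)
    n_samples, lam = 500, 1.0
    # hypothetical posterior samples of the mean-return vector and the
    # covariance matrix for two assets
    mus = rng.normal([0.01, 0.03], 0.005, size=(n_samples, 2))
    sigmas = np.array([np.diag([0.02, 0.04])] * n_samples)

    def objective(w):
        # average over posterior samples of mu_i^T w - (lambda/2) w^T Sigma_i w
        return np.mean(mus @ w - 0.5 * lam * np.einsum('j,ijk,k->i', w, sigmas, w))

    # brute-force search over long-only weights that sum to 1
    candidates = [np.array([a, 1 - a]) for a in np.linspace(0, 1, 101)]
    w_opt = max(candidates, key=objective)
    print(w_opt, objective(w_opt))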

  • Conjugate priors
  • The Jeffreys prior
  • The effect of the prior as N increases
    $p(\theta \mid X) \propto \underbrace{p(X \mid \theta)}_{\text{likelihood}} \cdot \underbrace{p(\theta)}_{\text{prior}}$
    $\log(p(\theta \mid X)) = c + L(\theta; X) + \log(p(\theta))$
    • Even when we start from different priors, the posteriors converge as the sample size grows
      figsize(12.5, 15)

      p = 0.6
      beta1_params = np.array([1., 1.])
      beta2_params = np.array([2, 10])
      beta = stats.beta

      x = np.linspace(0.00, 1, 125)
      data = stats.bernoulli.rvs(p, size=500)

      plt.figure()
      for i, N in enumerate([0, 4, 8, 32, 64, 128, 500]):
          s = data[:N].sum()
          plt.subplot(8, 1, i + 1)
          params1 = beta1_params + np.array([s, N - s])
          params2 = beta2_params + np.array([s, N - s])
          y1, y2 = beta.pdf(x, *params1), beta.pdf(x, *params2)
          plt.plot(x, y1, label=r"flat prior", lw=3)
          plt.plot(x, y2, label="biased prior", lw=3)
          plt.fill_between(x, 0, y1, color="#348ABD", alpha=0.15)
          plt.fill_between(x, 0, y2, color="#A60628", alpha=0.15)
          plt.legend(title="N=%d" % N)
          plt.vlines(p, 0.0, 7.5, linestyles="--", linewidth=1)
          # plt.ylim(0, 10)
  • Conclusion
  • Appendix
  • References

7. Bayesian A/B Testing

  • Introduction
  • Overview of conversion testing
    visitors_to_A = 1300
    visitors_to_B = 1275

    conversions_from_A = 120
    conversions_from_B = 125

    from scipy.stats import beta

    alpha_prior = 1
    beta_prior = 1

    posterior_A = beta(alpha_prior + conversions_from_A,
                       beta_prior + visitors_to_A - conversions_from_A)
    posterior_B = beta(alpha_prior + conversions_from_B,
                       beta_prior + visitors_to_B - conversions_from_B)

    samples = 20000  # We want this to be large to get a better approximation.
    samples_posterior_A = posterior_A.rvs(samples)
    samples_posterior_B = posterior_B.rvs(samples)

    print((samples_posterior_A > samples_posterior_B).mean())

    %matplotlib inline
    from IPython.core.pylabtools import figsize
    from matplotlib import pyplot as plt
    import matplotlib
    import numpy as np

    figsize(12.5, 4)
    plt.rcParams['savefig.dpi'] = 300
    plt.rcParams['figure.dpi'] = 300

    x = np.linspace(0, 1, 500)
    plt.plot(x, posterior_A.pdf(x), label='posterior of A')
    plt.plot(x, posterior_B.pdf(x), label='posterior of B')
    plt.xlabel('Value')
    plt.ylabel('Density')
    # plt.title("Posterior distributions of the conversion rates of pages A and B")
    plt.legend();

    plt.plot(x, posterior_A.pdf(x), label='posterior of A')
    plt.plot(x, posterior_B.pdf(x), label='posterior of B')
    plt.xlim(0.05, 0.15)
    plt.xlabel('Value')
    plt.ylabel('Density')
    # plt.title("Posterior distributions of the conversion rates of pages A and B (zoomed in)")
    plt.legend();
  • Adding a linear loss function
    from numpy.random import dirichlet

    N = 1000
    N_79 = 10
    N_49 = 46
    N_25 = 80
    N_0 = N - (N_79 + N_49 + N_25)  # remaining visitors who bought nothing

    observations = np.array([N_79, N_49, N_25, N_0])

    prior_parameters = np.array([1, 1, 1, 1])
    posterior_samples = dirichlet(prior_parameters + observations, size=10000)

    print("Two random samples from the posterior:")
    print(posterior_samples[0])
    print(posterior_samples[1])

    for i, label in enumerate(['p_79', 'p_49', 'p_25', 'p_0']):
        ax = plt.hist(posterior_samples[:, i], bins=50, label=label, histtype='stepfilled')
    plt.xlabel('Value')
    plt.ylabel('Density')
    # plt.title("Posterior distributions of the different price choices")
    plt.legend();

    def expected_revenue(P):
        return 79 * P[:, 0] + 49 * P[:, 1] + 25 * P[:, 2] + 0 * P[:, 3]

    posterior_expected_revenue = expected_revenue(posterior_samples)
    plt.hist(posterior_expected_revenue, histtype='stepfilled',
             label='expected revenue', bins=50)
    plt.xlabel('Value')
    plt.ylabel('Density')
    # plt.title("Posterior distribution of the expected revenue")
    plt.legend();

    N_A = 1000
    N_A_79 = 10
    N_A_49 = 46
    N_A_25 = 80
    N_A_0 = N_A - (N_A_79 + N_A_49 + N_A_25)
    observations_A = np.array([N_A_79, N_A_49, N_A_25, N_A_0])

    N_B = 2000
    N_B_79 = 45
    N_B_49 = 84
    N_B_25 = 200
    N_B_0 = N_B - (N_B_79 + N_B_49 + N_B_25)
    observations_B = np.array([N_B_79, N_B_49, N_B_25, N_B_0])

    prior_parameters = np.array([1, 1, 1, 1])

    posterior_samples_A = dirichlet(prior_parameters + observations_A, size=10000)
    posterior_samples_B = dirichlet(prior_parameters + observations_B, size=10000)

    posterior_expected_revenue_A = expected_revenue(posterior_samples_A)
    posterior_expected_revenue_B = expected_revenue(posterior_samples_B)

    plt.hist(posterior_expected_revenue_A, histtype='stepfilled',
             label='expected revenue of A', bins=50)
    plt.hist(posterior_expected_revenue_B, histtype='stepfilled',
             label='expected revenue of B', bins=50, alpha=0.8)
    plt.xlabel('Value')
    plt.ylabel('Density')
    # plt.title("Posterior distributions of the expected revenue of A and B")
    plt.legend();

    p = (posterior_expected_revenue_B > posterior_expected_revenue_A).mean()
    print("Probability that page B has a higher revenue than page A: %.3f" % p)

    posterior_diff = posterior_expected_revenue_B - posterior_expected_revenue_A
    plt.hist(posterior_diff, histtype='stepfilled', color='#7A68A6',
             label='difference in revenue between B and A', bins=50)
    plt.vlines(0, 0, 700, linestyles='--')
    plt.xlabel('Value')
    plt.ylabel('Density')
    # plt.title("Posterior distribution of the delta between the expected revenues of A and B")
    plt.legend();
  • Beyond conversion rates: t-tests (a minimal sketch below)
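    When the quantity of interest is a group mean rather than a conversion rate, the same machinery applies with a Normal likelihood. A minimal PyMC3 sketch of a Bayesian two-group comparison; the data and the priors here are made up for illustration:
    import numpy as np
    import pymc3 as pm

    # made-up data: e.g., time-on-page for two page variants
    rng = np.random.RandomState(0)
    data_A = rng.normal(30, 4, size=250)
    data_B = rng.normal(26, 7, size=250)

    with pm.Model() as model:
        mu_A = pm.Normal("mu_A", mu=data_A.mean(), sd=50)  # wide priors on the means
        mu_B = pm.Normal("mu_B", mu=data_B.mean(), sd=50)
        std_A = pm.Uniform("std_A", 0.1, 100)              # vague priors on the spreads
        std_B = pm.Uniform("std_B", 0.1, 100)
        pm.Normal("obs_A", mu=mu_A, sd=std_A, observed=data_A)
        pm.Normal("obs_B", mu=mu_B, sd=std_B, observed=data_B)
        trace = pm.sample(10000, step=pm.Metropolis())

    diff = trace["mu_A"] - trace["mu_B"]
    print((diff > 0).mean())  # posterior probability that A's mean exceeds B's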
  • Estimating the increase
    figsize(12, 4)

    visitors_to_A = 1275
    visitors_to_B = 1300

    conversions_from_A = 22
    conversions_from_B = 12

    alpha_prior = 1
    beta_prior = 1

    posterior_A = beta(alpha_prior + conversions_from_A,
                       beta_prior + visitors_to_A - conversions_from_A)
    posterior_B = beta(alpha_prior + conversions_from_B,
                       beta_prior + visitors_to_B - conversions_from_B)

    samples = 20000
    samples_posterior_A = posterior_A.rvs(samples)
    samples_posterior_B = posterior_B.rvs(samples)

    # helper assumed by these notes (not defined in the original): a histogram shortcut
    def _hist(data, label, **kwargs):
        return plt.hist(data, bins=40, label=label, alpha=0.4, **kwargs)

    _hist(samples_posterior_A, 'A')
    _hist(samples_posterior_B, 'B')
    plt.xlabel('Value')
    plt.ylabel('Density')
    # plt.title("Posterior distributions of the conversion rates of pages A and B")
    plt.legend();

    def relative_increase(a, b):
        return (a - b) / b

    posterior_rel_increase = relative_increase(samples_posterior_A,
                                               samples_posterior_B)
    plt.xlabel('Value')
    plt.ylabel('Density')
    # plt.title("Posterior distribution of the relative increase of page A's conversion rate over page B's")
    _hist(posterior_rel_increase, 'relative increase', color='#7A68A6');

    print((posterior_rel_increase > 0.2).mean())
    print((posterior_rel_increase > 0.5).mean())

    mean = posterior_rel_increase.mean()
    median = np.percentile(posterior_rel_increase, 50)
    conservative_percentile = np.percentile(posterior_rel_increase, 30)

    _hist(posterior_rel_increase, '', color='#7A68A6');
    plt.vlines(mean, 0, 2500, linestyles='-.', label='mean')
    plt.vlines(median, 0, 2500, linestyles=':', label='median', lw=3)
    plt.vlines(conservative_percentile, 0, 2500, linestyles='--',
               label='30th percentile')
    plt.xlabel('Value')
    plt.ylabel('Density')
    # plt.title("Different summary statistics of the posterior distribution of the relative increase")
    plt.legend();
  • Conclusion
  • References

Appendix A

  • Python, PyMC
  • Jupyter Notebook
  • Hands-on with Reddit

References
