{{Infobox
|title = Lecture 13: Scoring and Optimization
|image = [[File:features.png|200px]]
|label1 = Lecture video:
|data1 = [http://example.com web '''TODO'''] [https://www.youtube.com/watch?v=oxhc0Nv_ySw&index=11&list=PLpiLOsNLsfmbeH-b865BwfH15W0sat02V Youtube]}}

{{#ev:youtube|https://www.youtube.com/watch?v=oxhc0Nv_ySw&index=11&list=PLpiLOsNLsfmbeH-b865BwfH15W0sat02V|800|center}}

== Features of MT Models ==

So far we haven't fully described the actual model (most commonly) used in phrase-based and syntactic MT, the '''log-linear model'''. For MT, it can be formulated as follows:

<math>P(e|f) \propto \exp \sum_i w_i f_i(e,f)</math>

Essentially, the ''probability'' (or, less ambitiously, the ''score'') of a translation is a weighted sum of features <math>f_i</math>. Feature functions can look at the translation and the source and they output a number. We introduce the common types of features in the following subsections.

Our goal is then to find such a translation hypothesis that maximizes this score, formally:

<math>e^* = \text{argmax}_e P(e|f)</math>

Typically, feature functions are evaluated on ''partial translations'' during the search. That means that each partial translation has a score associated with it and we gradually add the values of features for each extension of the partial translation.

We describe how to obtain the weights <math>w_i</math> in the last section of this lecture.

=== Phrase Translation Probabilities ===

Phrase translation probabilities are calculated from occurrences of phrase pairs extracted from the parallel training data. Usually, MT systems work with the following two conditional probabilities:

* <math>P(\mathbf{e}|\mathbf{f})</math>
* <math>P(\mathbf{f}|\mathbf{e})</math>

These probabilities are estimated by simply counting how many times (for the first formula) we saw <math>\mathbf{e}</math> aligned to <math>\mathbf{f}</math> and how many times we saw <math>\mathbf{f}</math> in total. For example, based on the following excerpt from (sorted) extracted phrase pairs, we estimate that <math>P(\text{naznačena v programu} | \text{estimated in the programme}) = 3/9</math>.

estimated in the programme ||| naznačena v programu
estimated in the programme ||| naznačena v programu
estimated in the programme ||| naznačena v programu
estimated in the programme ||| odhadován v programu
estimated in the programme ||| odhadovány v programu
estimated in the programme ||| odhadovány v programu
estimated in the programme ||| předpokládal program
estimated in the programme ||| v programu uvedeným
estimated in the programme ||| v programu uvedeným

=== Lexical Weights ===

Lexical weights are a method for smoothing the phrase table. Infrequent phrases have unreliable
probability estimates; for instance many long phrases occur together only once
in the corpus, resulting in <math>P(\mathbf{e}|\mathbf{f}) = P(\mathbf{f}|\mathbf{e})
= 1</math>. Several methods exist for computing lexical weights. The most common one
is based on word alignment inside the phrase. The
probability of each ''foreign'' word <math>f_j</math> is estimated as the average of
lexical translation probabilities <math>w(f_j, e_i)</math> over the English words aligned
to it. Thus for the phrase <math>(\mathbf{e},\mathbf{f})</math> with the set of alignment
points <math>a</math>, the lexical weight is:

<math>
\text{lex}(\mathbf{f}|\mathbf{e},a) = \prod_{j=1}^{l_f}
\frac{1}{|{i|(i,j) \in a}|} \sum_{\forall(i,j) \in a}w(f_j, e_i)
</math>

=== Language Model ===

The task of language modeling in machine translation is to estimate how likely a
sequence of words <math>\mathbf{w} = (w_1, \ldots, w_l)</math> is in the target language.

When translating, the decoder generates translation hypotheses which are
probable according to the translation model (i.e. the phrase table). The
language model then scores these hypotheses according to how probable (common,
fluent) they are in the target language. The final translation is then something like a compromise -- the
sentence that is both fluent and a good translation of the input.

Similarly to the translation model, sequence probabilities are learned from data
using maximum likelihood estimation. For language modeling, only monolingual
data are needed (a resource available in much larger amounts than parallel texts).

Naturally, the prediction of the whole sequence <math>\mathbf{e}</math> has to be
decomposed, so that it can be reliably estimated. The most common approach are
''n-gram'' language models which build upon the Markov assumption: a word
depends only on a limited, fixed number of preceding words. The decomposition is
done as follows:

<math>
\begin{align}
P(\mathbf{w}) & = P(w_1)P(w_2|w_1)P(w_3|w_1,w_2) \ldots P(w_l|w_1,\ldots,w_{l-1}) \\
& \approx P(w_1)P(w_2|w_1) \ldots P(w_l|w_{l-n}, \ldots, w_{l-1})
\end{align}
</math>

The first equality follows from the chain rule and the second from ''n''-th order
Markov assumption. Each word is then modeled by at most ''n'' preceding words and
the probability of the whole sequence is the product of probabilities of
individual words. Smoothing is further used to supply probability estimates to unseen n-grams.

A great introduction to language modeling is the [http://videolectures.net/hltss2010_eisner_plm/ video lecture] by Jason Eisner. LMs are covered in more depth in the Stanford NLP lectures on [https://www.coursera.org/course/nlp Coursera]; videos from the Coursera course can be found on [https://www.youtube.com/playlist?list=PLaRKlIqjjguC-20Glu7XVAXm6Bd6Gs7Qi YouTube].

=== Word and Phrase Penalty ===

For each word and for each phrase produced, the decoder pays a constant cost. Tweaking the word penalty can lead to either very short or very long output sentences (the "penalty" can also be negative -- a reward). Changes to the phrase penalty can lead to outputs consisting of word-by-word translations (small or negative phrase penalty -- use as many phrases as possible) or on the other hand, to outputs consisting of very long phrases (as is usually desirable).

=== Distortion Penalty ===

The distortion penalty is the cost which the MT system pays for shuffling words (or phrases) around. There are many definitions possible, the following is commonly used: for each phrase, its value is
the distance (measured in words) between its beginning and the end of the preceding phrase. This '''distance-based''' reordering can be replaced by more sophisticated models, such as [http://www.statmt.org/moses/?n=Advanced.Models#ntoc1 lexicalized reordering].

== Decoding ==

=== Phrase-Based Search ===

We have [[Phrase-based Model#Decoding|already described]] the decoding algorithm for phrase-based MT. Here we discuss how feature values are calculated in the search.

Some of the feature functions that we have described are '''local''', i.e. their value only depends on the current phrase pair. For example, lexical weights, phrase translation probabilities or word penalty are local (word penalty is simply the count of words in the target phrase). As we build the translation, we simply add the scores of these local feature functions to the current translation score.

The most prominent example of a '''non-local''' feature is the language model. If we have a 4-gram LM, for example, we cannot score our new target phrase <math>\mathbf{e} = (e_1,\ldots,e_K)</math> without knowing the three words that precede it in our translation. The reason is that we need to compute the probability of the first word in that phrase (<math>e_1</math>) ''given'' the previous context.

Phrase-based search uses '''hypothesis recombination''' to reduce the number of possible translations. The basic idea is that when we have two partial hypotheses with an identical coverage vector (they have translated identical portions of the source sentence), we can discard the lower-scoring hypothesis '''if''' no future feature function can distinguish between them. Local features do not look outside the current phrase pair so we only need to worry about non-local features: e.g. a 4-gram LM which will consider the partial hypotheses identical only if their last three words do not differ.

This is where the notion of locality comes into play: it complicates recombination during search because partial translations need to maintain a '''state''' -- information for the non-local features (e.g. last three words for the LM). We can then only safely recombine hypotheses which have an identical coverage vector and state.

=== Decoding in SCFG ===

With syntactic MT, the situation is more complicated because hypotheses are not constructed left-to-right. That means that while there was only a single boundary between the current partial translation and its extension, SCFG rules can apply anywhere and we may need to look at words both preceding and following the target-side of the rule. This makes state tracking more complicated than in PBMT.

== Optimization of Feature Weights ==

We now focus on how to find a good set of weights <math>w_i</math> for the features. There are many methods for tuning model parameters in MT, such as MERT (Minimum Error Rate Training, described here), PRO (Pairwise Ranked Optimization), or MIRA (Margin Infused Relaxed Algorithm, a general online optimization algorithm applied successfully to MT).

TODO references to papers!

All of them require a tuning set (development set, held-out set) -- a small parallel corpus separated from the training data on which the performance of the proposed weights is evaluated. Choosing a suitable tuning set is black magic (as are many decisions in MT system development). As a general guideline, it should be as similar to the expected test data as possible and the larger, the better (too large tuning sets can take too long to tune on, though).

Minimum Error Rate Training (MERT) and has become a de-facto standard algorithm for tuning. The tuning process is
iterative:

# Set all weights to some initial values.
# Translate the tuning set using the current weights; for each sentence, output ''n'' best translations and their feature scores.
# Run one iteration of MERT to get a new set of weights.
# If the n-best lists are identical to the previous iteration, return the current weights and exit. Else go back to 2.

The input for MERT is a set of '''n-best lists''' -- the ''n'' best translations
for each sentence in the tuning set. A vector of feature scores is associated
with each sentence.

First, each translation is scored by the objective function (such as BLEU). In
each n-best list, the sentence with the best score is assumed to be the best
translation. The goal of MERT then is to find a set of weights that will
maximize the overall score, i.e. move good translations to the top of the n-best
lists.

MERT addresses the dimensionality of the weight space (the space is effectively
<math>\mathbb{R}^n</math> for ''n'' weights) by optimizing each weight separately.

While the line search is globally optimal (in the one dimension), overall, the
procedure is likely to reach a local optimum. MERT is therefore usually run from
a number of different starting positions and the best set of weights is used.

After convergence (or reaching a pre-set maximum number of iterations), the
weights for log-linear model are known and the system training is finished.

Note that there have even been shared tasks in model optimization. One, by invitation only, in [http://www.statmt.org/wmt11/tunable-metrics-task.html 2011] and one in 2015: [http://www.statmt.org/wmt15/tuning-task/ WMT15 Tuning Task].

== See Also ==

* Bojar, O. 2012. [http://www.cupress.cuni.cz/ink2_ext/index.jsp?include=podrobnosti&id=224545 Čeština a stojový překlad]. Ústav formální a aplikované lingvistiky MFF UK 2012.

Scoring and Optimization

2015-11-10T12:48:57Z

Tamchyna:

{{Infobox
|title = Lecture 13: Scoring and Optimization
|image = [[File:features.png|200px]]
|label1 = Lecture video:
|data1 = [http://example.com web '''TODO'''] [https://www.youtube.com/watch?v=oxhc0Nv_ySw&index=11&list=PLpiLOsNLsfmbeH-b865BwfH15W0sat02V Youtube]}}

{{#ev:youtube|https://www.youtube.com/watch?v=oxhc0Nv_ySw&index=11&list=PLpiLOsNLsfmbeH-b865BwfH15W0sat02V|800|center}}

== Features of MT Models ==

So far we haven't fully described the actual model (most commonly) used in phrase-based and syntactic MT, the '''log-linear model'''. For MT, it can be formulated as follows:

<math>P(e|f) \propto \exp \sum_i w_i f_i(e,f)</math>

Essentially, the ''probability'' (or, less ambitiously, the ''score'') of a translation is a weighted sum of features <math>f_i</math>. Feature functions can look at the translation and the source and they output a number. We introduce the common types of features in the following subsections.

Our goal is then to find such a translation hypothesis that maximizes this score, formally:

<math>e^* = \text{argmax}_e P(e|f)</math>

Typically, feature functions are evaluated on ''partial translations'' during the search. That means that each partial translation has a score associated with it and we gradually add the values of features for each extension of the partial translation.

We describe how to obtain the weights <math>w_i</math> in the last section of this lecture.

=== Phrase Translation Probabilities ===

Phrase translation probabilities are calculated from occurrences of phrase pairs extracted from the parallel training data. Usually, MT systems work with the following two conditional probabilities:

* <math>P(\mathbf{e}|\mathbf{f})</math>
* <math>P(\mathbf{f}|\mathbf{e})</math>

These probabilities are estimated by simply counting how many times (for the first formula) we saw <math>\mathbf{e}</math> aligned to <math>\mathbf{f}</math> and how many times we saw <math>\mathbf{f}</math> in total. For example, based on the following excerpt from (sorted) extracted phrase pairs, we estimate that <math>P(\text{naznačena v programu} | \text{estimated in the programme}) = 3/9</math>.

estimated in the programme ||| naznačena v programu
estimated in the programme ||| naznačena v programu
estimated in the programme ||| naznačena v programu
estimated in the programme ||| odhadován v programu
estimated in the programme ||| odhadovány v programu
estimated in the programme ||| odhadovány v programu
estimated in the programme ||| předpokládal program
estimated in the programme ||| v programu uvedeným
estimated in the programme ||| v programu uvedeným

=== Lexical Weights ===

Lexical weights are a method for smoothing the phrase table. Infrequent phrases have unreliable
probability estimates; for instance many long phrases occur together only once
in the corpus, resulting in <math>P(\mathbf{e}|\mathbf{f}) = P(\mathbf{f}|\mathbf{e})
= 1</math>. Several methods exist for computing lexical weights. The most common one
is based on word alignment inside the phrase. The
probability of each ''foreign'' word <math>f_j</math> is estimated as the average of
lexical translation probabilities <math>w(f_j, e_i)</math> over the English words aligned
to it. Thus for the phrase <math>(\mathbf{e},\mathbf{f})</math> with the set of alignment
points <math>a</math>, the lexical weight is:

<math>
\text{lex}(\mathbf{f}|\mathbf{e},a) = \prod_{j=1}^{l_f}
\frac{1}{|{i|(i,j) \in a}|} \sum_{\forall(i,j) \in a}w(f_j, e_i)
</math>

=== Language Model ===

The task of language modeling in machine translation is to estimate how likely a
sequence of words <math>\mathbf{w} = (w_1, \ldots, w_l)</math> is in the target language.

When translating, the decoder generates translation hypotheses which are
probable according to the translation model (i.e. the phrase table). The
language model then scores these hypotheses according to how probable (common,
fluent) they are in the target language. The final translation is then something like a compromise -- the
sentence that is both fluent and a good translation of the input.

Similarly to the translation model, sequence probabilities are learned from data
using maximum likelihood estimation. For language modeling, only monolingual
data are needed (a resource available in much larger amounts than parallel texts).

Naturally, the prediction of the whole sequence <math>\mathbf{e}</math> has to be
decomposed, so that it can be reliably estimated. The most common approach are
''n-gram'' language models which build upon the Markov assumption: a word
depends only on a limited, fixed number of preceding words. The decomposition is
done as follows:

<math>
\begin{align}
P(\mathbf{w}) & = P(w_1)P(w_2|w_1)P(w_3|w_1,w_2) \ldots P(w_l|w_1,\ldots,w_{l-1}) \\
& \approx P(w_1)P(w_2|w_1) \ldots P(w_l|w_{l-n}, \ldots, w_{l-1})
\end{align}
</math>

The first equality follows from the chain rule and the second from ''n''-th order
Markov assumption. Each word is then modeled by at most ''n'' preceding words and
the probability of the whole sequence is the product of probabilities of
individual words. Smoothing is further used to supply probability estimates to unseen n-grams.

A great introduction to language modeling is the [http://videolectures.net/hltss2010_eisner_plm/ video lecture] by Jason Eisner. LMs are covered in more depth in the Stanford NLP lectures on [https://www.coursera.org/course/nlp Coursera]; videos from the Coursera course can be found on [https://www.youtube.com/playlist?list=PLaRKlIqjjguC-20Glu7XVAXm6Bd6Gs7Qi YouTube].

=== Word and Phrase Penalty ===

For each word and for each phrase produced, the decoder pays a constant cost. Tweaking the word penalty can lead to either very short or very long output sentences (the "penalty" can also be negative -- a reward). Changes to the phrase penalty can lead to outputs consisting of word-by-word translations (small or negative phrase penalty -- use as many phrases as possible) or on the other hand, to outputs consisting of very long phrases (as is usually desirable).

=== Distortion Penalty ===

The distortion penalty is the cost which the MT system pays for shuffling words (or phrases) around. There are many definitions possible, the following is commonly used: for each phrase, its value is
the distance (measured in words) between its beginning and the end of the preceding phrase. This '''distance-based''' reordering can be replaced by more sophisticated models, such as [http://www.statmt.org/moses/?n=Advanced.Models#ntoc1 lexicalized reordering].

== Decoding ==

=== Phrase-Based Search ===

We have [[Phrase-based Model#Decoding|already described]] the decoding algorithm for phrase-based MT. Here we discuss how feature values are calculated in the search.

Some of the feature functions that we have described are '''local''', i.e. their value only depends on the current phrase pair. For example, lexical weights, phrase translation probabilities or word penalty are local (word penalty is simply the count of words in the target phrase). As we build the translation, we simply add the scores of these local feature functions to the current translation score.

The most prominent example of a '''non-local''' feature is the language model. If we have a 4-gram LM, for example, we cannot score our new target phrase <math>\mathbf{e} = (e_1,\ldots,e_K)</math> without knowing the three words that precede it in our translation. The reason is that we need to compute the probability of the first word in that phrase (<math>e_1</math>) ''given'' the previous context.

Phrase-based search uses '''hypothesis recombination''' to reduce the number of possible translations. The basic idea is that when we have two partial hypotheses with an identical coverage vector (they have translated identical portions of the source sentence), we can discard the lower-scoring hypothesis '''if''' no future feature function can distinguish between them. Local features do not look outside the current phrase pair so we only need to worry about non-local features: e.g. a 4-gram LM which will consider the partial hypotheses identical only if their last three words do not differ.

This is where the notion of locality comes into play: it complicates recombination during search because partial translations need to maintain a '''state''' -- information for the non-local features (e.g. last three words for the LM). We can then only safely recombine hypotheses which have an identical coverage vector and state.

=== Decoding in SCFG ===

With syntactic MT, the situation is more complicated because hypotheses are not constructed left-to-right. That means that while there was only a single boundary between the current partial translation and its extension, SCFG rules can apply anywhere and we may need to look at words both preceding and following the target-side of the rule. This makes state tracking more complicated than in PBMT.

== Optimization of Feature Weights ==

We now focus on how to find a good set of weights <math>w_i</math> for the features. There are many methods for tuning model parameters in MT, such as MERT (Minimum Error Rate Training, described here), PRO (Pairwise Ranked Optimization), or MIRA (Margin Infused Relaxed Algorithm, a general online optimization algorithm applied successfully to MT).

TODO references to papers!

All of them require a tuning set (development set, held-out set) -- a small parallel corpus separated from the training data on which the performance of the proposed weights is evaluated. Choosing a suitable tuning set is black magic (as are many decisions in MT system development). As a general guideline, it should be as similar to the expected test data as possible and the larger, the better (too large tuning sets can take too long to tune on, though).

Minimum Error Rate Training (MERT) and has become a de-facto standard algorithm for tuning. The tuning process is
iterative:

# Set all weights to some initial values.
# Translate the tuning set using the current weights; for each sentence, output ''n'' best translations and their feature scores.
# Run one iteration of MERT to get a new set of weights.
# If the n-best lists are identical to the previous iteration, return the current weights and exit. Else go back to 2.

The input for MERT is a set of '''n-best lists''' -- the ''n'' best translations
for each sentence in the tuning set. A vector of feature scores is associated
with each sentence.

First, each translation is scored by the objective function (such as BLEU). In
each n-best list, the sentence with the best score is assumed to be the best
translation. The goal of MERT then is to find a set of weights that will
maximize the overall score, i.e. move good translations to the top of the n-best
lists.

MERT addresses the dimensionality of the weight space (the space is effectively
<math>\mathbb{R}^n</math> for ''n'' weights) by optimizing each weight separately.

While the line search is globally optimal (in the one dimension), overall, the
procedure is likely to reach a local optimum. MERT is therefore usually run from
a number of different starting positions and the best set of weights is used.

After convergence (or reaching a pre-set maximum number of iterations), the
weights for log-linear model are known and the system training is finished.

Note that there have even been shared tasks in model optimization. One, by invitation only, in [http://www.statmt.org/wmt11/tunable-metrics-task.html 2011] and one in 2015: [http://www.statmt.org/wmt15/tuning-task/ WMT15 Tuning Task].

=== See Also ===

* Bojar, O. 2012. [http://www.cupress.cuni.cz/ink2_ext/index.jsp?include=podrobnosti&id=224545 Čeština a stojový překlad]. Ústav formální a aplikované lingvistiky MFF UK 2012.

MT Talks

2015-10-10T14:21:15Z

Tamchyna:

[[File:banner.png]]

MT Talks is a series of mini-lectures on machine translation.

Our goal is to hit just the right level of detail and technicality to make the talks interesting and attractive to people who are not yet familiar with the field but mix in new observations and insights so that even old pals will have a reason to watch us.

MT Talks and the expanded notes on this wiki will never be the ultimate resource for MT, but we would be very happy to serve as an ultimate commented ''directory'' of good pointers.

By the way, this is indeed a Wiki, so your contributions are very welcome! Please register and feel free to add comments, corrections or links to useful resources.

== Our Talks ==

01 '''[[Intro]]''': Why is MT difficult, approaches to MT.

02 '''[[MT that Deceives]]''': Serious translation errors even for short and simple inputs.

03 '''[[Pre-processing]]''': Normalization and other technical tricks bound to help your MT system.

04 '''[[MT Evaluation in General]]''': Techniques of judging MT quality, dimensions of translation quality, number of possible translations.

05 '''[[Automatic MT Evaluation]]''': Two common automatic MT evaluation methods: PER and BLEU

06 '''[[Data Acquisition]]''': The need and possible sources of training data for MT. And the diminishing utility of the new data additions due to Zipf's law.

07 '''[[Sentence Alignment]]''': An introduction to the Gale & Church sentence alignment algorithm.

08 '''[[Word Alignment]]''': Cutting the chicken-egg problem.

09 '''[[Phrase-based Model]]''': Copy if you can.

10 '''[[Constituency Trees]]''': Divide and conquer.

11 '''[[Dependency Trees]]''': Trees with gaps.

12 '''[[Rich Vocabulary]]''': Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz.

13 '''[[Scoring and Optimization]]''': Features your model features.

14 '''[[Deep Syntax]]''': Prague Family Jewels.

== CodEx – Coding Exercises ==

* [https://codex3.ms.mff.cuni.cz/codex-trans/ Log in to CodEx] and solve programming exercises that complement our talks.
* [[CodEx-Introduction|Brief description of CodEx]]: how to get an account and submit a solution.
* [[CodEx - Important Notes|Important Notes]] on technical issues

== Contributing ==

Due to spamming, we had to restrict permissions for editing the Wiki. If you're interested in contributing, please write an email to '''tamchyna -at- ufal.mff.cuni.cz''' to obtain a username.

== Other Videolectures on MT ==

* [http://www.upc.edu/learning/courses/mooc/2014-2015/approaches-to-machine/approaches-to-machine Approaches to Machine Translation: Rule-Based, Statistical, Hybrid] (an online course on MT by UPC Barcelona)
* [https://www.coursera.org/course/nlangp Natural Language Processing at Coursera] by Michael Collins, includes lectures on word-based and phrase-based models. [http://www.cs.columbia.edu/~mcollins/notes-spring2013.html Further notes]
* [https://www.youtube.com/playlist?list=PLVjXYOjST-AokmIxpCr4GexcdtpeOliBc TAUS Machine Translation and Moses Tutorial] (a series of commented slides, MT overview and practical aspects of the Moses Toolkit)

== Acknowledgement ==

The work on this project has been supported by the grant FP7-ICT-2011-7-288487 ([http://www.statmt.org/mosescore/ MosesCore]).

Admin RootPage

2015-10-07T09:47:31Z

Tamchyna:

0x : How to get started with CodEx MT exercises

14 '''[[Deep Syntax]]''': Prague Family Jewels.

Our [https://www.youtube.com/playlist?list=PLpiLOsNLsfmbeH-b865BwfH15W0sat02V YouTube playlist] -- shows some total number of views, although different from individual video views.

[[CodEx - Important Notes]]

Admin RootPage

2015-10-07T09:47:25Z

Tamchyna:

0x : How to get started with CodEx MT exercises

13 '''[[Deep Syntax]]''': Prague Family Jewels.

Our [https://www.youtube.com/playlist?list=PLpiLOsNLsfmbeH-b865BwfH15W0sat02V YouTube playlist] -- shows some total number of views, although different from individual video views.

[[CodEx - Important Notes]]

Admin RootPage

2015-08-25T11:47:52Z

Tamchyna:

Scoring and Optimization

2015-08-25T11:47:01Z

Tamchyna:

Scoring and Optimization

2015-08-25T11:46:21Z

Tamchyna:

Scoring and Optimization

2015-08-25T11:46:08Z

Tamchyna:

Scoring and Optimization

2015-08-25T11:45:45Z

Tamchyna: