Bayesian Noise Reduction: Contextual Symmetry Logic Utilizing Pattern Consistency Analysis
2004-01-14
Language: English
Abstract
Modern language classification relies on machine learning, which in turn depends heavily on the training input it is given. Most of today's algorithms (Bayes, Chi-Square, etc.) are inherently sound and accurate; regardless of which algorithm is used, however, much of its accuracy depends directly on the quality of the data provided: the Garbage In, Garbage Out rule. Bayesian Noise Reduction (BNR) is a statistical approach to evaluating coherence that instantiates a series of machine-generated contexts to serve as a means of contrast. This makes it possible to identify out-of-context text using a form of pattern consistency checking. BNR attempts to solve the problem commonly referred to as "Bayesian Noise", which, in its simplest definition, refers to irrelevant data present in a message being classified. BNR dubs out irrelevant text in order to provide cleaner classification and is implemented as a pre-filter to existing language classification functions.
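
The pre-filter idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that each token already has a learned probability, that a "context" is a sliding window of banded token probabilities with its own learned value, and that a token is dubbed out when it disagrees strongly with a decisive context. The names `band_index`, `reduce_noise`, and `pattern_prob`, along with the band width, window size, and threshold, are hypothetical choices made for illustration only.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[int, ...]

def band_index(p: float, width: float = 0.05) -> int:
    """Map a probability to an integer band (assumed band width: 0.05)."""
    return int(round(p / width))

def reduce_noise(
    tokens: List[str],
    token_prob: Dict[str, float],
    pattern_prob: Dict[Pattern, float],
    window: int = 3,          # assumed context size
    threshold: float = 0.25,  # assumed disagreement / decisiveness cutoff
    neutral: float = 0.5,     # value used for unknown tokens
) -> List[str]:
    """Return the tokens that survive the (hypothetical) consistency check."""
    probs = [token_prob.get(t, neutral) for t in tokens]
    keep = [True] * len(tokens)
    for i in range(len(tokens) - window + 1):
        # Build the machine-generated context for this window.
        pattern = tuple(band_index(p) for p in probs[i:i + window])
        ctx = pattern_prob.get(pattern)
        if ctx is None or abs(ctx - neutral) < threshold:
            continue  # unknown or indecisive context: nothing to contrast against
        for j in range(i, i + window):
            # A token far from its context's value is treated as noise.
            if abs(probs[j] - ctx) > threshold:
                keep[j] = False
    return [t for t, k in zip(tokens, keep) if k]

# Hypothetical learned values, for illustration only.
token_prob = {"cheap": 0.95, "pills": 0.99, "meeting": 0.10, "now": 0.90, "free": 0.97}
pattern_prob = {(19, 20, 2): 0.98}
tokens = ["cheap", "pills", "meeting", "now", "free"]
filtered = reduce_noise(tokens, token_prob, pattern_prob)
```

In this sketch the surviving tokens would then be handed to the existing classifier unchanged, which is what makes the step a pre-filter and not a replacement for the classification function.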
