Naive Bayes with Correlation Factor for Text Classification Problem

Jiangning Chen; Zhibo Dai; Juntao Duan; Heinrich Matzinger; Ionel Popescu

arXiv:1905.06115·cs.IR·May 16, 2019

TL;DR

This paper introduces a modified Naive Bayes classifier that incorporates a correlation factor to improve text classification accuracy, especially with small training datasets.

Contribution

It proposes a novel Naive Bayes-based method with a correlation factor to enhance performance on limited data.

Findings

1. Improved accuracy over traditional Naive Bayes on real-world data
2. Effective handling of small training datasets
3. Correlation factor enhances class distinction

Abstract

The Naive Bayes estimator is widely used in text classification problems. However, it does not perform well with small training datasets. We propose a new method based on the Naive Bayes estimator to address this problem. A correlation factor is introduced to incorporate the correlation among different classes. Experimental results show that our estimator achieves better accuracy than traditional Naive Bayes on real-world data.

Equations

w \cdot d - b = 0,

\mathrm{label}(d) = \operatorname*{argmax}_{j} P(C_{j}) P(d \mid C_{j}).

P(w, d)

C = \{C_{1}, C_{2}, \dots, C_{k}\}.

d = \{x_{1}, x_{2}, \dots, x_{v}\}.

\hat{y}(d) = f(d; \theta) = (f_{1}(d; \theta), f_{2}(d; \theta), \dots, f_{k}(d; \theta))

\mathrm{label}(d)

\log f_{i}(d; \theta) = \log P(C_{i}) + \sum_{j=1}^{v} x_{j} \log \theta_{i_{j}}, \quad 1 \leq i \leq k
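The scoring rule and the argmax labeling rule listed above can be illustrated with a minimal Python sketch. The function names `log_scores` and `label`, the list-based data layout, and all numbers in the toy example are assumptions made for illustration; they are not taken from the paper.

```python
import math

def log_scores(x, priors, theta):
    """Return log f_i(d; theta) = log P(C_i) + sum_j x_j * log(theta[i][j]) for each class i.

    x      : word counts x_1..x_v of one document (hypothetical layout)
    priors : class priors P(C_1)..P(C_k)
    theta  : theta[i][j] = probability of word j under class i
    """
    scores = []
    for prior, theta_i in zip(priors, theta):
        s = math.log(prior)
        for x_j, t_ij in zip(x, theta_i):
            if x_j > 0:  # absent words contribute nothing to the sum
                s += x_j * math.log(t_ij)
        scores.append(s)
    return scores

def label(x, priors, theta):
    """Index of the class with the largest log score (the argmax rule)."""
    scores = log_scores(x, priors, theta)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example with k = 2 classes and v = 3 words (made-up numbers).
priors = [0.5, 0.5]
theta = [[0.7, 0.2, 0.1],
         [0.1, 0.2, 0.7]]
print(label([3, 1, 0], priors, theta))  # 0
print(label([0, 1, 3], priors, theta))  # 1
```

Working in log space avoids underflow when many small probabilities are multiplied. A word that occurs in the document but has probability zero under a class would raise an error in `math.log`, so this sketch assumes strictly positive parameters for observed words.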

L(C_{i}, \theta)

\log L(C_{i}, \theta) = \sum_{d \in C_{i}} \sum_{j=1}^{v} x_{j} \log \theta_{i_{j}}.

\max

subject to:

\hat{\theta}_{i_{j}} = \frac{\sum_{d \in C_{i}} x_{j}}{\sum_{d \in C_{i}} \sum_{j=1}^{v} x_{j}}.

\hat{\theta}_{i_{j}} = \frac{\sum_{d \in C_{i}} x_{j}}{\sum_{d \in C_{i}} m} = \frac{\sum_{d \in C_{i}} x_{j}}{|C_{i}| m}.
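The maximum-likelihood estimate above is a ratio of word counts within one class. A minimal sketch follows, assuming each document is stored as a list of counts; the name `mle_theta` and the toy documents are hypothetical.

```python
def mle_theta(docs):
    """Estimate the word probabilities of one class C_i from its documents.

    docs : list of count vectors, one per document d in C_i
    Returns theta_hat[j] = (sum_d x_j) / (sum_d sum_j x_j).
    """
    v = len(docs[0])
    word_totals = [sum(d[j] for d in docs) for j in range(v)]
    grand_total = sum(word_totals)  # equals |C_i| * m when every document has m words
    return [c / grand_total for c in word_totals]

# Two toy documents with m = 4 words each (made-up counts).
docs_c1 = [[2, 1, 1], [3, 0, 1]]
theta_hat = mle_theta(docs_c1)
print(theta_hat)  # [0.625, 0.125, 0.25]
```

Here the denominator is 8, which matches |C_i| m = 2 × 4 from the second form of the estimator.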

E[\hat{\theta}_{i_{j}}] = \frac{\sum_{d \in C_{i}} m \theta_{i_{j}}}{|C_{i}| m} = \theta_{i_{j}}.

E[|\hat{\theta}_{i_{j}} - \theta_{i_{j}}|^{2}] = E[\hat{\theta}_{i_{j}}^{2}] - 2 \theta_{i_{j}} E[\hat{\theta}_{i_{j}}] + \theta_{i_{j}}^{2} = E[\hat{\theta}_{i_{j}}^{2}] - \theta_{i_{j}}^{2}.

\hat{\theta}_{i_{j}}^{2} = \frac{\left(\sum_{d \in C_{i}} x_{j}\right)^{2}}{|C_{i}|^{2} m^{2}} = \frac{\sum_{d \in C_{i}} x_{j}^{2} + \sum_{d \neq d' \in C_{i}} x_{j}^{d} x_{j}^{d'}}{|C_{i}|^{2} m^{2}},

E\left[\frac{\sum_{d \in C_{i}} x_{j}^{2}}{|C_{i}|^{2} m^{2}}\right] = \frac{\theta_{i_{j}} (1 - \theta_{i_{j}} + m \theta_{i_{j}})}{|C_{i}| m},

E\left[\frac{\sum_{d \neq d' \in C_{i}} x_{j}^{d} x_{j}^{d'}}{|C_{i}|^{2} m^{2}}\right] = \frac{(|C_{i}| - 1) \theta_{i_{j}}^{2}}{|C_{i}|}.

E[\hat{\theta}_{i_{j}}^{2}] = \frac{\theta_{i_{j}} (1 - \theta_{i_{j}})}{|C_{i}| m} + \theta_{i_{j}}^{2},
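The pieces listed above combine into the mean squared error of the maximum-likelihood estimate. The worked step below assumes, as the listed expectations do, that every document has m words; it adds the diagonal and cross-term expectations and then subtracts the squared parameter.

```latex
\begin{aligned}
E[\hat{\theta}_{i_{j}}^{2}]
  &= \frac{\theta_{i_{j}} (1 - \theta_{i_{j}} + m \theta_{i_{j}})}{|C_{i}| m}
   + \frac{(|C_{i}| - 1) \theta_{i_{j}}^{2}}{|C_{i}|} \\
  &= \frac{\theta_{i_{j}} (1 - \theta_{i_{j}})}{|C_{i}| m}
   + \frac{\theta_{i_{j}}^{2}}{|C_{i}|}
   + \frac{(|C_{i}| - 1) \theta_{i_{j}}^{2}}{|C_{i}|}
   = \frac{\theta_{i_{j}} (1 - \theta_{i_{j}})}{|C_{i}| m} + \theta_{i_{j}}^{2}, \\
E\left[|\hat{\theta}_{i_{j}} - \theta_{i_{j}}|^{2}\right]
  &= E[\hat{\theta}_{i_{j}}^{2}] - \theta_{i_{j}}^{2}
   = \frac{\theta_{i_{j}} (1 - \theta_{i_{j}})}{|C_{i}| m}.
\end{aligned}
```

The error shrinks like 1 / (|C_i| m), so a class with few training documents has a noisy estimate, which is the small-data weakness the abstract describes.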

L_{1}(C_{i}, \theta)

\log L_{1}(C_{i}, \theta) = \sum_{d \in S} \left[ (y_{i}(d) + t) \sum_{j=1}^{v} x_{j} \log \theta_{i_{j}} \right].

\max

subject to:

G_{i} = 1 - \sum_{j=1}^{v} \theta_{i_{j}},

\begin{cases} \frac{\partial \log(L_{1})}{\partial \theta_{i_{j}}} + \lambda_{i} \frac{\partial G_{i}}{\partial \theta_{i_{j}}} = 0, & \forall\, 1 \leq i \leq k,\ \forall\, 1 \leq j \leq v \\ \sum_{j=1}^{v} \theta_{i_{j}} = 1, & \forall\, 1 \leq i \leq k \end{cases}

\begin{cases} \sum_{d \in S} \frac{(y_{i}(d) + t) x_{j}}{\theta_{i_{j}}} - \lambda_{i} = 0, & \forall\, 1 \leq i \leq k,\ \forall\, 1 \leq j \leq v \\ \sum_{j=1}^{v} \theta_{i_{j}} = 1, & \forall\, 1 \leq i \leq k \end{cases}

\hat{\theta}_{i_{j}}^{L_{1}} = \frac{\sum_{d \in S} (y_{i}(d) + t) x_{j}}{\sum_{j=1}^{v} \sum_{d \in S} (y_{i}(d) + t) x_{j}} = \frac{\sum_{d \in S} (y_{i}(d) + t) x_{j}}{m (|C_{i}| + t |S|)}
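The estimator with the correlation factor weights every training document by y_i(d) + t, so documents outside class i still contribute with weight t. A minimal sketch follows, assuming y_i(d) is a 0/1 indicator derived from a single class label per document; the name `corr_theta` and the toy data are hypothetical.

```python
def corr_theta(docs, labels, i, t):
    """Estimate the word probabilities of class i with correlation factor t.

    docs   : count vectors for every document d in the training set S
    labels : labels[n] is the class index of docs[n], so y_i(d) = 1 if labels[n] == i else 0
    t      : correlation factor; t = 0 reduces to the plain per-class count ratio
    """
    v = len(docs[0])
    weighted = [0.0] * v
    for x, c in zip(docs, labels):
        weight = (1.0 if c == i else 0.0) + t  # y_i(d) + t
        for j in range(v):
            weighted[j] += weight * x[j]
    total = sum(weighted)  # equals m * (|C_i| + t * |S|) when every document has m words
    return [w / total for w in weighted]

# Three toy documents with m = 4 words each; the first two belong to class 0.
docs = [[2, 1, 1], [3, 0, 1], [0, 1, 3]]
labels = [0, 0, 1]
print(corr_theta(docs, labels, 0, 0.0))  # [0.625, 0.125, 0.25]
print(corr_theta(docs, labels, 0, 0.5))
```

With t = 0.5 the normalizer in this example is 14, which agrees with m (|C_i| + t |S|) = 4 × (2 + 0.5 × 3).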

E[\hat{\theta}_{i_{j}}^{L_{1}}]

E[|\hat{\theta}_{i_{j}}^{L_{1}} - \theta_{i_{j}}|]
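The expectation of the correlation-factor estimate can be worked out from its closed form. The sketch below rests on assumptions: every document has m words, a document in class C_l satisfies E[x_j] = m θ_{l_j}, and each document belongs to exactly one class, so that |S| is the sum of the class sizes. The index l over classes is introduced here for illustration.

```latex
\begin{aligned}
E[\hat{\theta}_{i_{j}}^{L_{1}}]
  &= \frac{\sum_{d \in S} (y_{i}(d) + t)\, E[x_{j}]}{m (|C_{i}| + t |S|)}
   = \frac{|C_{i}| \theta_{i_{j}} + t \sum_{l=1}^{k} |C_{l}| \theta_{l_{j}}}{|C_{i}| + t |S|}, \\
E[\hat{\theta}_{i_{j}}^{L_{1}}] - \theta_{i_{j}}
  &= \frac{t \sum_{l=1}^{k} |C_{l}| (\theta_{l_{j}} - \theta_{i_{j}})}{|C_{i}| + t |S|}.
\end{aligned}
```

Under these assumptions the estimate is biased whenever t > 0 and the classes differ, and the bias vanishes at t = 0, where the estimate coincides with the maximum-likelihood one.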

Full text

Naive Bayes with Correlation Factor for Text Classification Problem

Jiangning Chen

School of Mathematics