Logic could be learned from images

Qian Guo; Yuhua Qian; Xinyan Liang; Yanhong She; Deyu Li; Jiye Liang

arXiv:1908.01931·cs.CV·June 30, 2021


TL;DR

This paper introduces a new task called LiLi for learning and reasoning logic relations directly from images without predefined patterns, demonstrating the potential and challenges of neural networks in data-driven logic reasoning.

Contribution

The study proposes the LiLi task, creates six datasets for logic reasoning from images, and develops a novel divide and conquer neural network framework to improve reasoning accuracy.

Findings

01

Standard neural networks perform poorly on complex logic tasks.

02

Adding label information significantly improves model performance.

03

The LiLi datasets serve as benchmarks for visual logic reasoning.


Tables

Table 1: The hyper-parameter settings of all models.

Model	Hyper-parameters
CNN-LSTM	Conv(32,(5,5),l2(1.e-4)) -> BatchNormalization() -> MaxPooling((2,2)) ->
	Conv(64,(3,3),l2(1.e-4)) -> BatchNormalization() -> MaxPooling((2,2)) ->
	LSTM(1024, dropout=0.5)
MLP	Dense(256) -> Dense(256) -> Dense(256)
CNN-MLP	Conv(32,(5,5)) -> BatchNormalization() -> MaxPooling((2,2)) ->
	Conv(64,(3,3)) -> BatchNormalization() -> MaxPooling((2,2)) ->
	Dense(4096)
Autoencoder	Conv(32,(5,5)) -> MaxPooling((2,2)) -> Conv(64,(5,5)) -> MaxPooling((2,2))
	Conv(64,(5,5)) -> UpSampling((2,2)) -> Conv(32,(5,5)) -> UpSampling((2,2))
	Cropping2D(((0,1),(0,0))) -> Conv(1,(5,5))

Table 2: Test accuracies on the Bitwise And, Bitwise Or, Bitwise Xor, Addition, Subtraction and Multiplication data sets with 10,000 training samples (⋆, ⋆⋆ and ⋆⋆⋆ mark the three difficulty levels).

Model	Bitwise And (⋆)	Bitwise Or (⋆)	Bitwise Xor (⋆)	Addition (⋆⋆)	Subtraction (⋆⋆)	Multiplication (⋆⋆⋆)
CNN-LSTM	100%	100%	100%	0.07%	0.38%	0.10%
MLP	100%	100%	100%	0.21%	0.21%	0.08%
CNN-MLP	100%	100%	100%	96.33%	98.69%	0.07%
Autoencoder	100%	100%	100%	96.78%	97.34%	0.08%
ResNet18	99.96%	98.52%	99.80%	99.86%	99.49%	0.10%
ResNet50	99.92%	99.86%	99.69%	99.14%	99.64%	0.10%
ResNet152	100%	100%	100%	98.74%	98.93%	0.14%

Table 3: Test accuracies on the Bitwise And, Bitwise Or, Bitwise Xor, Addition, Subtraction and Multiplication data sets with 150,000 training samples (⋆, ⋆⋆ and ⋆⋆⋆ mark the three difficulty levels).

Model	Bitwise And (⋆)	Bitwise Or (⋆)	Bitwise Xor (⋆)	Addition (⋆⋆)	Subtraction (⋆⋆)	Multiplication (⋆⋆⋆)
CNN-LSTM	100%	100%	100%	84.21%	79.22%	0.20%
MLP	100%	100%	100%	98.79%	97.39%	0.16%
CNN-MLP	100%	100%	100%	99.96%	99.96%	0.35%
Autoencoder	100%	100%	100%	98.17%	98.66%	0.16%
ResNet18	100%	100%	100%	99.50%	99.50%	0.24%
ResNet50	100%	100%	100%	99.56%	99.79%	0.26%
ResNet152	100%	100%	100%	99.98%	99.87%	0.24%

Table 4: Test accuracies of CNN2-MLP on the Bitwise And, Bitwise Or, Bitwise Xor, Addition, Subtraction and Multiplication data sets.

# training samples	Bitwise And (⋆)	Bitwise Or (⋆)	Bitwise Xor (⋆)	Addition (⋆⋆)	Subtraction (⋆⋆)	Multiplication (⋆⋆⋆)
150,000	100%	100%	100%	67.47%	62.92%	0.28%
10,000	100%	100%	100%	0.24%	0.20%	0.05%

Table 5: Test accuracy of each subtask (network branch) of DCM with 150,000 training examples.

Operation	Carry subtask	Operation without carry subtask	Synthetic subtask
Multiplication	86.25%	98.38%	84.46%

Equations

W^{*} = \arg\min_{W} \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}\big(LPN_{W}(x_{i}), y_{i}\big),

\begin{array}{lcr}\text{Antecedent 1:} & A_{1} & \longrightarrow B_{1}\\ \text{Antecedent 2:} & A_{2} & \longrightarrow B_{2}\\ \vdots & \vdots & \\ \text{Antecedent } n: & A_{n} & \longrightarrow B_{n}\\ \text{Antecedent } *: & A_{*} & \\ \hline \text{Consequence:} & & B_{*},\end{array}

B_{*} = R_{z}(A, B) \circ A_{*},

\begin{array}{ll}\text{Antecedent 1:} & \text{If the input is } x_{1} \text{ then the output is } y_{1}\\ \text{Antecedent 2:} & \text{If the input is } x_{2} \text{ then the output is } y_{2}\\ \vdots & \vdots\\ \text{Antecedent } n: & \text{If the input is } x_{n} \text{ then the output is } y_{n}\\ \text{Antecedent } n+1: & \text{If the input is } x_{n+1}\\ \text{Antecedent } n+2: & \text{If the input is } x_{n+2}\\ \vdots & \vdots\\ \text{Antecedent } n+m: & \text{If the input is } x_{n+m}\\ \hline \text{Consequence } n+1: & \text{The output is } y_{n+1}\\ \text{Consequence } n+2: & \text{The output is } y_{n+2}\\ \vdots & \vdots\\ \text{Consequence } n+m: & \text{The output is } y_{n+m},\end{array}

\begin{array}{ll}\text{Antecedent 1:} & \text{If the input sequence is } (x_{1}^{1},x_{1}^{2},\ldots,x_{1}^{m_{I}})\\ & \text{then the output sequence is } (y_{1}^{1},y_{1}^{2},\ldots,y_{1}^{m_{O}})\\ \text{Antecedent 2:} & \text{If the input sequence is } (x_{2}^{1},x_{2}^{2},\ldots,x_{2}^{m_{I}})\\ & \text{then the output sequence is } (y_{2}^{1},y_{2}^{2},\ldots,y_{2}^{m_{O}})\\ \vdots & \vdots\\ \text{Antecedent } n: & \text{If the input sequence is } (x_{n}^{1},x_{n}^{2},\ldots,x_{n}^{m_{I}})\\ & \text{then the output sequence is } (y_{n}^{1},y_{n}^{2},\ldots,y_{n}^{m_{O}})\\ \text{Antecedent } n+1: & \text{If the input sequence is } (x_{n+1}^{1},x_{n+1}^{2},\ldots,x_{n+1}^{m_{I}})\\ \text{Antecedent } n+2: & \text{If the input sequence is } (x_{n+2}^{1},x_{n+2}^{2},\ldots,x_{n+2}^{m_{I}})\\ \vdots & \vdots\\ \text{Antecedent } n+m: & \text{If the input sequence is } (x_{n+m}^{1},x_{n+m}^{2},\ldots,x_{n+m}^{m_{I}})\\ \hline \text{Consequence } n+1: & \text{The output sequence is } (y_{n+1}^{1},y_{n+1}^{2},\ldots,y_{n+1}^{m_{O}})\\ \text{Consequence } n+2: & \text{The output sequence is } (y_{n+2}^{1},y_{n+2}^{2},\ldots,y_{n+2}^{m_{O}})\\ \vdots & \vdots\\ \text{Consequence } n+m: & \text{The output sequence is } (y_{n+m}^{1},y_{n+m}^{2},\ldots,y_{n+m}^{m_{O}}),\end{array}

\begin{array}{lcc}\text{Training antecedent:} & (x_{1}^{1},x_{1}^{2},\ldots,x_{1}^{m_{I}}) & \longrightarrow (y_{1}^{1},y_{1}^{2},\ldots,y_{1}^{m_{O}})\\ & (x_{2}^{1},x_{2}^{2},\ldots,x_{2}^{m_{I}}) & \longrightarrow (y_{2}^{1},y_{2}^{2},\ldots,y_{2}^{m_{O}})\\ & \vdots & \vdots\\ & (x_{n}^{1},x_{n}^{2},\ldots,x_{n}^{m_{I}}) & \longrightarrow (y_{n}^{1},y_{n}^{2},\ldots,y_{n}^{m_{O}})\\ \text{Testing antecedent:} & (x_{n+1}^{1},x_{n+1}^{2},\ldots,x_{n+1}^{m_{I}}) & \\ & (x_{n+2}^{1},x_{n+2}^{2},\ldots,x_{n+2}^{m_{I}}) & \\ & \vdots & \\ & (x_{n+m}^{1},x_{n+m}^{2},\ldots,x_{n+m}^{m_{I}}) & \\ \hline \text{Consequence:} & & (y_{n+1}^{1},y_{n+1}^{2},\ldots,y_{n+1}^{m_{O}})\\ & & (y_{n+2}^{1},y_{n+2}^{2},\ldots,y_{n+2}^{m_{O}})\\ & & \vdots\\ & & (y_{n+m}^{1},y_{n+m}^{2},\ldots,y_{n+m}^{m_{O}}),\end{array}

\begin{array}{rcc}\text{Training antecedent set:} & I_{train} & \longrightarrow O_{train}\\ \text{Testing antecedent set:} & I_{test} & \\ \hline \text{Consequence set:} & & O_{test},\end{array}

O_{test} = R(I_{train}, O_{train}) \circ I_{test},

W^{*} = \arg\min_{W} \frac{1}{N}\sum_{i=1}^{N} \mathrm{MSE}\big(f(LPN_{W}(x_{i}^{1}, x_{i}^{2})), y_{i}\big) = \arg\min_{W} \frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} \big(f(LPN_{W}(x_{i}^{1}, x_{i}^{2}))_{k} - y_{i,k}\big)^{2},

\begin{array}{ll}\text{Antecedent 1:} & \text{If two input images are } x_{1}^{1} \text{ and } x_{1}^{2} \text{ then the output image is } y_{1}\\ \text{Antecedent 2:} & \text{If two input images are } x_{2}^{1} \text{ and } x_{2}^{2} \text{ then the output image is } y_{2}\\ \vdots & \vdots\\ \text{Antecedent } n: & \text{If two input images are } x_{n}^{1} \text{ and } x_{n}^{2} \text{ then the output image is } y_{n}\\ \text{Antecedent } n+1: & \text{If two input images are } x_{n+1}^{1} \text{ and } x_{n+1}^{2}\\ \text{Antecedent } n+2: & \text{If two input images are } x_{n+2}^{1} \text{ and } x_{n+2}^{2}\\ \vdots & \vdots\\ \text{Antecedent } n+m: & \text{If two input images are } x_{n+m}^{1} \text{ and } x_{n+m}^{2}\\ \hline \text{Consequence } n+1: & \text{The output image is } y_{n+1}\\ \text{Consequence } n+2: & \text{The output image is } y_{n+2}\\ \vdots & \vdots\\ \text{Consequence } n+m: & \text{The output image is } y_{n+m},\end{array}

\begin{array}{lcl}\text{Training antecedent:} & (x_{1}^{1},x_{1}^{2}) & \longrightarrow y_{1}\\ & (x_{2}^{1},x_{2}^{2}) & \longrightarrow y_{2}\\ & \vdots & \\ & (x_{n}^{1},x_{n}^{2}) & \longrightarrow y_{n}\\ \text{Testing antecedent:} & (x_{n+1}^{1},x_{n+1}^{2}) & \\ & (x_{n+2}^{1},x_{n+2}^{2}) & \\ & \vdots & \\ & (x_{n+m}^{1},x_{n+m}^{2}) & \\ \hline \text{Consequence:} & & y_{n+1}\\ & & y_{n+2}\\ & & \vdots\\ & & y_{n+m},\end{array}

\begin{array}{rcc}\text{Training antecedent set:} & I_{train} & \longrightarrow O_{train}\\ \text{Testing antecedent set:} & I_{test} & \\ \hline \text{Consequence set:} & & O_{test},\end{array}

O_{test} = R(I_{train}, O_{train}) \circ I_{test},

H > f(h_{1}, h_{2}, \ldots, h_{k}),


Full text


Qian Guo

Yuhua Qian (corresponding author)

Xinyan Liang

Yanhong She

Deyu Li

Jiye Liang

Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, Shanxi, China

Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi, China

School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China

College of Science, Xi’an Shiyou University, Xi’an 710065, Shaan’xi, China

Abstract

Logic reasoning is a significant ability of human intelligence and also an important task in artificial intelligence. Existing logic reasoning methods often require reasoning patterns to be designed beforehand. This raises an interesting question: can logic reasoning patterns be learned directly from given data? We term this problem data concept logic. In this study, we first propose a task of learning logic from images, called the LiLi task. The task is to learn and reason about the logical relation in images without presetting any reasoning patterns. As a preliminary exploration, we design six LiLi data sets (Bitwise And, Bitwise Or, Bitwise Xor, Addition, Subtraction and Multiplication), in which each image is embedded with an n-digit number. Note that a learning model does not know beforehand the meaning of the n-digit numbers embedded in the images or the relation between the input images and the output image. To tackle the task, we apply many typical neural network models and obtain fruitful results. However, these models perform poorly on the difficult logic task. To further address this task, we design a novel network framework, called the divide and conquer model, which adds some label information and achieves a high testing accuracy.

keywords:

Logic reasoning, data concept logic, LiLi task, reasoning patterns

MSC:

[2010] 00-01, 99-00

1 Introduction

Human intelligence integrates cognitive functions such as perception, learning, memory, problem solving and logic reasoning [1]. Among them, logic reasoning is a significant ability of human intelligence. Through reasoning, humans discover rules hidden in complex phenomena and can even forecast unknown events. One of the goals of artificial intelligence is to mimic human cognitive functions as closely as possible, and as one of these functions, logic reasoning is also an important task in artificial intelligence [2].

Many logic reasoning methods, such as fuzzy reasoning [3, 4, 5, 6], formal concept analysis (FCA) [7, 8, 9, 10], probabilistic reasoning [11, 12, 13, 14], evidential reasoning [15, 16], Bayesian reasoning [17, 18] and rough reasoning [19, 20, 21, 22], have been proposed. However, these methods often require reasoning patterns to be designed beforehand. In FCA, for example, one first obtains a formal context using domain expert knowledge, then computes the concept lattice from the formal context, and finally performs knowledge reasoning with disjunction and conjunction operations. This process is time-consuming and depends heavily on domain expertise. Yet humans can reason directly from given data without mastering special domain knowledge beforehand. For example, without any prior knowledge of 3D reconstruction, people can mentally reconstruct a 3D model of an unseen 2D image after observing, and reasoning about, many 2D images and the corresponding 3D scenes in the real world. This leads to an interesting research question: can machines directly learn logic reasoning patterns from given data? We term such logical patterns data concept logic (DCL).

As a preliminary exploration, in this study we design a DCL task called learning logic from images, i.e., the LiLi task shown in Fig. 1(7). Unlike logical operations defined by humans (LOH) using domain expert knowledge, a LiLi task is to learn and reason about the relation between two input images and one output image without any reasoning patterns given beforehand; that is, LiLi knows nothing about the reasoning pattern $R$ in advance. In summary, a LiLi task differs from LOH in the following ways.

$\bullet$

For LiLi, nothing is known about the reasoning pattern $R$ except a given data set, whereas LOH focuses on how to define a reasonable logical operation and always draws on substantial domain knowledge about $R$.

$\bullet$

The LPN (logical pattern network, Sect. 2) induced by a LiLi task models an abstract, low-level logical relation in terms of pixel values, whereas existing logical operations model a semantic, high-level logical relation in terms of numbers or symbols.

$\bullet$

The LPN induced by a LiLi task is a data-driven way to model the logical relation, whereas LOH is expert-driven.

The LiLi task is also an important computer vision task. Unfortunately, to the best of our knowledge, there is little work on tasks like the LiLi task shown in Fig. 1(7). Ref. [23] mined logical patterns from the Fashion-Logic data sets in a data-driven way, and Zhou et al. [24] proposed the abductive learning framework, which learns perception and reasoning modules concurrently. In contrast, models based on deep convolutional neural networks (CNNs) have achieved state-of-the-art, and on some tasks even superhuman, performance on common computer vision tasks such as object recognition [25, 26, 27], object detection [28, 29], semantic segmentation [30, 31], image captioning [32, 33], visual question answering (VQA) [34, 35] and image generation [36, 37] (see Fig. 1). Logic reasoning is widely regarded as one of the abilities that general (strong) artificial intelligence must possess. Among existing computer vision tasks, image captioning and VQA seem to require some reasoning ability, especially VQA, which draws on more knowledge: the image itself, common sense, domain knowledge, and so on. In practice, however, because of shortcomings of existing benchmark data sets (described in Sect. 3.1), systems can answer the questions correctly without reasoning [2, 38, 39]. Hence, a new task such as the LiLi task is needed to test the reasoning ability of models.

Our contributions are as follows:

1. The data concept logic (DCL) is proposed to learn concept logical patterns directly from given data.

2. We propose the LiLi task, in which the abstract, low-level logical relation between two input images and one output image must be learned and reasoned about without any reasoning patterns given beforehand.

3. We provide an inference form of the LiLi task that is consistent with the form of classical propositional calculus.

4. Six LiLi data sets with three difficulty levels are provided: Bitwise And, Bitwise Or, Bitwise Xor, Addition, Subtraction and Multiplication.

5. Unlike a semantic, high-level logical relation defined by humans, an abstract, low-level logical relation is expressed by a novel data-driven method called LPN.

6. The performance of typical neural networks (CNN-LSTM, MLP, CNN-MLP, Autoencoder and ResNets) is tested on the six LiLi data sets.

7. The divide and conquer model (DCM) is proposed, using a decomposition strategy to solve the difficult Multiplication task and achieving better performance than the typical neural networks used in this paper.

The remainder of this paper is organized as follows: Sect. 2 proposes the DCL. Sect. 3 proposes six LiLi data sets, the LiLi task and its inference form. Sect. 4 presents the performance evaluation of the typical neural networks on six LiLi data sets. In Sect. 5, the DCM is devised to solve the difficult logic task Multiplication. Finally, we draw conclusions in Sect. 6.

2 DCL

In this section, we first detail the DCL proposed in this paper, and then provide an inference form of DCL.

2.1 DCL

Data concept logic (DCL) is a data-driven tool for learning logic concepts directly from a given data set. Using the learned concepts, it can output the logical relations among the input data. Note that DCL uses only the raw data and cannot access other information in advance, such as the meaning of the symbols or numbers in the data. DCL can be formalized as follows.

Definition 1.

A data concept logic is a triple $\mathcal{R}=(I,R,O)$, where $I=\{x_{i}\mid x_{i}=(x_{i}^{1},x_{i}^{2},\ldots,x_{i}^{m_{I}}),\ i=1,2,\ldots,N\}$ is a set of input sequences, each of length $m_{I}$; $O=\{y_{i}\mid y_{i}=(y_{i}^{1},y_{i}^{2},\ldots,y_{i}^{m_{O}}),\ i=1,2,\ldots,N\}$ is a set of output sequences, each of length $m_{O}$; and $R:I\rightarrow O$ is a reasoning pattern (relation mapping) from the input $I$ to the output $O$.
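To make Definition 1 concrete, the following Python sketch stores the triple $(I,R,O)$ and checks that every $y_i$ equals $R(x_i)$. It is only an illustration under our own assumptions: strings stand in for images, the names `DataConceptLogic` and `addition` are hypothetical, and Addition on zero-padded 7-digit decimal strings serves as an example relation with $m_I = 2$ and $m_O = 1$. In DCL a learner would see only `inputs` and `outputs`, never `relation`.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A sequence of "images"; plain strings stand in for images in this sketch.
Seq = Tuple[str, ...]


@dataclass
class DataConceptLogic:
    """Hypothetical container for the triple (I, R, O) of Definition 1."""
    inputs: List[Seq]               # I: N input sequences, each of length m_I
    relation: Callable[[Seq], Seq]  # R: I -> O, hidden from the learner in DCL
    outputs: List[Seq]              # O: N output sequences, each of length m_O

    @property
    def m_I(self) -> int:
        return len(self.inputs[0])

    @property
    def m_O(self) -> int:
        return len(self.outputs[0])

    def is_consistent(self) -> bool:
        """True if all sequences have the stated lengths and y_i = R(x_i) for every i."""
        return (
            len(self.inputs) == len(self.outputs)
            and all(len(x) == self.m_I for x in self.inputs)
            and all(len(y) == self.m_O for y in self.outputs)
            and all(self.relation(x) == y for x, y in zip(self.inputs, self.outputs))
        )


def addition(x: Seq) -> Seq:
    """Assumed example relation (m_I = 2, m_O = 1): add two 7-digit decimal strings."""
    a, b = x
    return (str(int(a) + int(b)).zfill(7),)


dcl = DataConceptLogic(
    inputs=[("0000123", "0000045"), ("0400000", "0012345")],
    relation=addition,
    outputs=[("0000168",), ("0412345",)],
)
print(dcl.m_I, dcl.m_O, dcl.is_consistent())  # 2 1 True
```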

The aim of DCL is to learn $R$ from the input $I$ to the output $O$. In this paper, we propose a deep learning network framework, the Logical Pattern Network ($LPN$), parameterized by $W$, to learn $R$. The model is learned by solving the following optimization problem.

$$W^{*} = \arg\min_{W} \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}\big(LPN_{W}(x_{i}), y_{i}\big), \tag{1}$$

where $\mathcal{L}$ is a loss function, and $N$ is the number of the training samples.

The universal approximation theorem states that, under mild conditions, neural networks can approximate any measurable function to arbitrary precision [40]. In principle, therefore, the logical pattern $R$ can be represented by a neural network. In DCL, $R$ is hidden in the LPN, and mining $R$ from data can be regarded as iteratively optimizing the parameters $W$ of the LPN. At each iteration, $W$ is updated in a direction that decreases the loss $\mathcal{L}$. When the loss is small enough, the iteration stops and $R$ is obtained.
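The optimization loop described above can be sketched as plain gradient descent. This is a toy under loud assumptions, not the paper's LPN or training setup: `lpn` is a hypothetical linear map with parameters $W=(w_1,w_2,b)$ rather than a deep network, the loss $\mathcal{L}$ is the squared error, the inputs are scalars instead of images, and the learning rate and stopping tolerance are arbitrary choices. It shows only the steps the paragraph names: evaluate $\frac{1}{N}\sum_i \mathcal{L}(LPN_W(x_i),y_i)$, update $W$ in a direction that lowers it, and stop once the loss is small enough.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, float], float]


def lpn(w: Sequence[float], x: Tuple[float, float]) -> float:
    """Hypothetical stand-in for LPN_W: a linear map w1*x1 + w2*x2 + b."""
    return w[0] * x[0] + w[1] * x[1] + w[2]


def empirical_risk(w: Sequence[float], data: List[Sample]) -> float:
    """(1/N) * sum_i L(LPN_W(x_i), y_i), with the squared error as the loss L."""
    return sum((lpn(w, x) - y) ** 2 for x, y in data) / len(data)


def train(data: List[Sample], lr: float = 0.5, tol: float = 1e-10,
          max_iter: int = 10_000) -> Tuple[List[float], float, int]:
    """Full-batch gradient descent on W = (w1, w2, b) until the risk drops below tol."""
    w = [0.0, 0.0, 0.0]
    n = len(data)
    for it in range(max_iter):
        loss = empirical_risk(w, data)
        if loss < tol:  # the loss is small enough: stop and keep W
            return w, loss, it
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in data:
            err = lpn(w, (x1, x2)) - y
            grad[0] += 2.0 * err * x1 / n
            grad[1] += 2.0 * err * x2 / n
            grad[2] += 2.0 * err / n
        # Step against the gradient, i.e. in a direction that lowers the loss.
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w, empirical_risk(w, data), max_iter


# Toy data: scaled addition, y = x1 + x2 on a 10 x 10 grid.
data = [((i / 10, j / 10), (i + j) / 10) for i in range(10) for j in range(10)]
w, loss, iters = train(data)
print(f"w1={w[0]:.4f} w2={w[1]:.4f} b={w[2]:.4f} after {iters} iterations")
```

On this scaled Addition data the parameters approach $w_1=w_2=1$, $b=0$, so the hidden relation $y=x_1+x_2$ ends up encoded in $W$. A linear model suffices here only because the toy relation is linear; the image-based tasks in this paper use the deep models listed in Table 1.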

The workflow of a DCL task is illustrated in Fig. 2, where $I$ is the set of input data, $O$ is the set of ground-truth output data, $\hat{O}$ denotes the set of logical relation patterns inferred by $f(LPN_{W}(x_{i}^{1},x_{i}^{2},\ldots,x_{i}^{m_{I}}))$, $O/I$ is the ground-truth logical relation set for a given input set $I$, $\hat{O}/I$ is the predicted logical relation set for $I$ obtained with the $LPN$, and the loss measures the difference between $O/I$ and $\hat{O}/I$. $LPN$ denotes the logical pattern network.

2.2 Inference form of DCL

In daily life, humans often make inferences from known antecedents. This process can be formalized as follows [4].

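$$\begin{array}{lcr}\text{Antecedent 1:} & A_{1} & \longrightarrow B_{1}\\ \text{Antecedent 2:} & A_{2} & \longrightarrow B_{2}\\ \vdots & \vdots & \\ \text{Antecedent } n: & A_{n} & \longrightarrow B_{n}\\ \text{Antecedent } *: & A_{*} & \\ \hline \text{Consequence:} & & B_{*},\end{array} \tag{2}$$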

Formula 2 is exactly the mathematical model of classical propositional calculus [4], in which the consequence of antecedent $*$ is inferred from the $n$ known antecedents. Many methods address this task. For example, Zadeh [41] provided an inference rule called the ‘compositional rule of inference’ (CRI) for inferences whose antecedents and consequences contain fuzzy concepts. Specifically, the implication $A\rightarrow B$ is first translated into a fuzzy relation $R_{z}(A,B)$ from $A$ to $B$; then $B_{*}$ is inferred by composing $R_{z}$ with $A_{*}$ as follows.

$$B_{*} = R_{z}(A, B) \circ A_{*}, \tag{3}$$

where $R_{z}:[0,1]^{2}\rightarrow[0,1]$, defined beforehand by human experts, is a binary function, and $\circ$ denotes a composition operator.
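The two CRI steps above (translate $A\rightarrow B$ into $R_z(A,B)$, then compose with $A_*$) can be sketched on finite universes, where a fuzzy set is a list of membership degrees. The text leaves $R_z$ to human experts; this sketch assumes the Mamdani relation $R_z(a,b)=\min(a,b)$ and the sup-min composition for $\circ$, a common pairing in fuzzy control that is not necessarily the one used in [41]. The names `build_relation`, `compose` and `mamdani` are hypothetical.

```python
from typing import Callable, List

FuzzySet = List[float]  # membership degrees over a finite universe


def mamdani(a: float, b: float) -> float:
    """Assumed choice of R_z: the Mamdani min relation, R_z(a, b) = min(a, b)."""
    return min(a, b)


def build_relation(A: FuzzySet, B: FuzzySet,
                   r_z: Callable[[float, float], float] = mamdani) -> List[List[float]]:
    """Translate the implication A -> B into a fuzzy relation R[u][v] = r_z(A(u), B(v))."""
    return [[r_z(a, b) for b in B] for a in A]


def compose(A_star: FuzzySet, R: List[List[float]]) -> FuzzySet:
    """Sup-min composition: B*(v) = max over u of min(A*(u), R[u][v])."""
    return [max(min(a, row[v]) for a, row in zip(A_star, R)) for v in range(len(R[0]))]


A = [1.0, 0.5, 0.0]  # antecedent A over a 3-point universe U
B = [0.0, 0.5, 1.0]  # consequent B over a 3-point universe V
R = build_relation(A, B)
print(compose(A, R))                # [0.0, 0.5, 1.0]: A* = A recovers B
print(compose([0.5, 1.0, 0.0], R))  # [0.0, 0.5, 0.5]
```

With this choice, feeding $A_*=A$ recovers $B$ exactly. Another operator can change the result: with the Łukasiewicz implication $\min(1, 1-a+b)$ and the same sup-min composition, this example yields $[0.5, 0.5, 1.0]$. Choosing $R_z$ is precisely the expert design decision that DCL aims to replace with learning.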

Inspired by fuzzy reasoning [4], a DCL task can be written in the following inference form based on IF-THEN rules.

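$$\begin{array}{ll}\text{Antecedent 1:} & \text{If the input is } x_{1} \text{ then the output is } y_{1}\\ \text{Antecedent 2:} & \text{If the input is } x_{2} \text{ then the output is } y_{2}\\ \vdots & \vdots\\ \text{Antecedent } n: & \text{If the input is } x_{n} \text{ then the output is } y_{n}\\ \text{Antecedent } n+1: & \text{If the input is } x_{n+1}\\ \text{Antecedent } n+2: & \text{If the input is } x_{n+2}\\ \vdots & \vdots\\ \text{Antecedent } n+m: & \text{If the input is } x_{n+m}\\ \hline \text{Consequence } n+1: & \text{The output is } y_{n+1}\\ \text{Consequence } n+2: & \text{The output is } y_{n+2}\\ \vdots & \vdots\\ \text{Consequence } n+m: & \text{The output is } y_{n+m},\end{array} \tag{4}$$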

where $x_{i}$ is the input of the LPN and $y_{i}$ is its output.

Note that $x_{i}$ and $y_{i}$ can be many kinds of objects; for example, they can be fuzzy concepts, like the antecedents and consequences in Formula 2.

In this paper, $x_{i}$ and $y_{i}$ are images. Specifically, $x_{i}$ is an input sequence of length $m_{I}$ and $y_{i}$ is an output sequence of length $m_{O}$. On this basis, Formula 4 can be written in the following form.

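$$\begin{array}{ll}\text{Antecedent 1:} & \text{If the input sequence is } (x_{1}^{1},x_{1}^{2},\ldots,x_{1}^{m_{I}})\\ & \text{then the output sequence is } (y_{1}^{1},y_{1}^{2},\ldots,y_{1}^{m_{O}})\\ \text{Antecedent 2:} & \text{If the input sequence is } (x_{2}^{1},x_{2}^{2},\ldots,x_{2}^{m_{I}})\\ & \text{then the output sequence is } (y_{2}^{1},y_{2}^{2},\ldots,y_{2}^{m_{O}})\\ \vdots & \vdots\\ \text{Antecedent } n: & \text{If the input sequence is } (x_{n}^{1},x_{n}^{2},\ldots,x_{n}^{m_{I}})\\ & \text{then the output sequence is } (y_{n}^{1},y_{n}^{2},\ldots,y_{n}^{m_{O}})\\ \text{Antecedent } n+1: & \text{If the input sequence is } (x_{n+1}^{1},x_{n+1}^{2},\ldots,x_{n+1}^{m_{I}})\\ \text{Antecedent } n+2: & \text{If the input sequence is } (x_{n+2}^{1},x_{n+2}^{2},\ldots,x_{n+2}^{m_{I}})\\ \vdots & \vdots\\ \text{Antecedent } n+m: & \text{If the input sequence is } (x_{n+m}^{1},x_{n+m}^{2},\ldots,x_{n+m}^{m_{I}})\\ \hline \text{Consequence } n+1: & \text{The output sequence is } (y_{n+1}^{1},y_{n+1}^{2},\ldots,y_{n+1}^{m_{O}})\\ \text{Consequence } n+2: & \text{The output sequence is } (y_{n+2}^{1},y_{n+2}^{2},\ldots,y_{n+2}^{m_{O}})\\ \vdots & \vdots\\ \text{Consequence } n+m: & \text{The output sequence is } (y_{n+m}^{1},y_{n+m}^{2},\ldots,y_{n+m}^{m_{O}}),\end{array} \tag{5}$$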

where $(x_{i}^{1},x_{i}^{2},\ldots,x_{i}^{m_{I}})$ is the input data fed into the LPN, and $(y_{i}^{1},y_{i}^{2},\ldots,y_{i}^{m_{O}})$ is the output data expressing the relation among the input data.

In Formula 5, antecedents 1 to $n$ constitute the training set and are used to train the LPN inference model, while antecedents $n+1$ to $n+m$ constitute the testing set and are used to test the inference ability of the LPN. On this basis, Formula 5 can be further simplified into the following form.

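$$\begin{array}{lcc}\text{Training antecedent:} & (x_{1}^{1},x_{1}^{2},\ldots,x_{1}^{m_{I}}) & \longrightarrow (y_{1}^{1},y_{1}^{2},\ldots,y_{1}^{m_{O}})\\ & (x_{2}^{1},x_{2}^{2},\ldots,x_{2}^{m_{I}}) & \longrightarrow (y_{2}^{1},y_{2}^{2},\ldots,y_{2}^{m_{O}})\\ & \vdots & \vdots\\ & (x_{n}^{1},x_{n}^{2},\ldots,x_{n}^{m_{I}}) & \longrightarrow (y_{n}^{1},y_{n}^{2},\ldots,y_{n}^{m_{O}})\\ \text{Testing antecedent:} & (x_{n+1}^{1},x_{n+1}^{2},\ldots,x_{n+1}^{m_{I}}) & \\ & (x_{n+2}^{1},x_{n+2}^{2},\ldots,x_{n+2}^{m_{I}}) & \\ & \vdots & \\ & (x_{n+m}^{1},x_{n+m}^{2},\ldots,x_{n+m}^{m_{I}}) & \\ \hline \text{Consequence:} & & (y_{n+1}^{1},y_{n+1}^{2},\ldots,y_{n+1}^{m_{O}})\\ & & (y_{n+2}^{1},y_{n+2}^{2},\ldots,y_{n+2}^{m_{O}})\\ & & \vdots\\ & & (y_{n+m}^{1},y_{n+m}^{2},\ldots,y_{n+m}^{m_{O}}),\end{array} \tag{6}$$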

Formula 6 can be further simplified into the following form by letting $I_{train}=\{(x_{1}^{1},x_{1}^{2},\ldots,x_{1}^{m_{I}}),(x_{2}^{1},x_{2}^{2},\ldots,x_{2}^{m_{I}}),\ldots,(x_{n}^{1},x_{n}^{2},\ldots,x_{n}^{m_{I}})\}$, $O_{train}=\{(y_{1}^{1},y_{1}^{2},\ldots,y_{1}^{m_{O}}),(y_{2}^{1},y_{2}^{2},\ldots,y_{2}^{m_{O}}),\ldots,(y_{n}^{1},y_{n}^{2},\ldots,y_{n}^{m_{O}})\}$, $I_{test}=\{(x_{n+1}^{1},x_{n+1}^{2},\ldots,x_{n+1}^{m_{I}}),(x_{n+2}^{1},x_{n+2}^{2},\ldots,x_{n+2}^{m_{I}}),\ldots,(x_{n+m}^{1},x_{n+m}^{2},\ldots,x_{n+m}^{m_{I}})\}$ and $O_{test}=\{(y_{n+1}^{1},y_{n+1}^{2},\ldots,y_{n+1}^{m_{O}}),(y_{n+2}^{1},y_{n+2}^{2},\ldots,y_{n+2}^{m_{O}}),\ldots,(y_{n+m}^{1},y_{n+m}^{2},\ldots,y_{n+m}^{m_{O}})\}$:

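$$\begin{array}{rcc}\text{Training antecedent set:} & I_{train} & \longrightarrow O_{train}\\ \text{Testing antecedent set:} & I_{test} & \\ \hline \text{Consequence set:} & & O_{test},\end{array} \tag{7}$$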

In fact, Formula 7 contains three implications, i.e., $(I_{train}\rightarrow O_{train})\rightarrow(I_{test}\rightarrow O_{test})$. The consequence $O_{test}$ of the antecedent $I_{test}$ can be obtained by translating the three implications into the following form.

$$O_{test} = R(I_{train}, O_{train}) \circ I_{test}, \tag{8}$$

where $R(I_{train},O_{train})$ is a high-dimensional function learned with a data-driven method.
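The inference form $O_{test}=R(I_{train},O_{train})\circ I_{test}$ can be illustrated with a deliberately simple data-driven $R$. This Python sketch is a hypothetical stand-in, not the LPN: it works on binary strings rather than pixels and assumes that each output bit depends only on the two input bits at the same position, a structural prior that the LiLi task does not grant. Under that assumption, $R$ is estimated from $(I_{train},O_{train})$ as a majority-vote truth table and then applied to $I_{test}$. The names `learn_relation` and `apply_relation` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Tuple

Pair = Tuple[str, str]                 # (A, B) as equal-length binary strings
TruthTable = Dict[Tuple[str, str], str]


def learn_relation(I_train: List[Pair], O_train: List[str]) -> TruthTable:
    """Estimate R(I_train, O_train) as a 2-input truth table.

    Assumes each output bit depends only on the two input bits at the same
    position; every aligned (input bit, input bit, output bit) triple casts one vote.
    """
    votes: DefaultDict[Tuple[str, str], Counter] = defaultdict(Counter)
    for (a, b), e in zip(I_train, O_train):
        for p, q, r in zip(a, b, e):
            votes[(p, q)][r] += 1
    return {key: c.most_common(1)[0][0] for key, c in votes.items()}


def apply_relation(R: TruthTable, I_test: List[Pair]) -> List[str]:
    """Compute O_test = R(I_train, O_train) o I_test bit by bit.

    Raises KeyError if a test bit pair never occurred in training.
    """
    return ["".join(R[(p, q)] for p, q in zip(a, b)) for a, b in I_test]


# One Bitwise Xor training pair that happens to cover all four bit combinations.
R = learn_relation([("0011", "0101")], ["0110"])
print(apply_relation(R, [("1100", "1010"), ("1111", "0000")]))  # ['0110', '1111']
```

A single training pair fixes the Bitwise Xor table here because it happens to contain all four bit combinations; the learned $R$ then generalizes to unseen test pairs. Without the positional prior, an image-based learner has to discover this structure itself, which is what the LiLi task tests.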

The above analysis shows that DCL has an inference form consistent with classical propositional calculus. The comparison of DCL and LOH is illustrated in Fig. 3, which shows that a fundamental task of both DCL and LOH is to obtain the relation $R$. They differ clearly in how $R$ is obtained: for LOH, $R$ must be defined beforehand by experts, whereas for DCL, $R$ is learned from a given data set.

Based on the above analysis, a human-free, data-driven method that learns the reasoning pattern directly from given data is desirable. In this study, we explore this problem by proposing the LiLi task, which is detailed and formalized in what follows.

3 A LiLi task

In this section, we first construct six LiLi data sets, then detail the LiLi task proposed in this paper, and finally provide its inference form consistent with the classical propositional calculus form.

3.1 LiLi data sets

Existing logic reasoning data sets such as CLEVR [2] and VQA [42] have made outstanding contributions to testing the logic reasoning ability of machines, but they also have some shortcomings. (1) Because of biases in the data sets, some questions can be answered by directly perceiving the images rather than by reasoning [2, 38, 39]. For example, the answer to the question “What color is the object in the given image?” can be obtained directly from the image through perception. (2) The existing logic reasoning data sets may seem complex, but the results of typical neural networks on them suggest that the logic embedded in these data sets is relatively simple for machines, so more difficult logic data sets should be designed. (3) Some questions in the existing data sets have multiple answers, so it is not easy to judge whether an answer is correct. These shortcomings make it difficult to assess the reasoning abilities of machines with these data sets.

Therefore, we construct the LiLi data sets to overcome these shortcomings. In this paper, the following logical relations are selected to construct the LiLi data sets: Bitwise And, Bitwise Or, Bitwise Xor, Addition, Subtraction and Multiplication. (1) Questions can be answered only if a model has both perception and reasoning abilities. (2) Typical neural networks are almost powerless on the logic of multiplication (see Sect. 4), which indicates that the LiLi task is worth studying. (3) We control the construction process of the LiLi data sets, and each sample has exactly one correct answer, so the correctness of an answer is easy to evaluate.

We construct the LiLi data sets to verify the performance of the proposed LPN model. Note that the LPN model does not know the logical relations hidden in the images beforehand. The numbers in the bitwise data sets are binary, while those in the arithmetic data sets are decimal. For the Bitwise And, Bitwise Or and Bitwise Xor data sets, the image size is set to 15 $\times$ 120, so the number embedded in one image has at most 14 digits. For the Addition, Subtraction and Multiplication data sets, the image size is set to 15 $\times$ 60, so the number embedded in one image has at most 7 digits. This ensures that the numbers used for training cover only a very small fraction of all possible combinations. Each sample consists of two input images, $x_{i}^{1}$ and $x_{i}^{2}$, each containing an integer drawn from a pre-specified range as detailed below, and an output image $y_{i}$ generated from the result of the operation on the two input numbers. The numbers embedded in images $x_{i}^{1}$, $x_{i}^{2}$ and $y_{i}$ are A, B and E, respectively.

The details about these data sets are here:

$\bullet$

Bitwise And: For each sample, both A and B have 14 binary digits, and E is the bitwise and of A and B. For example, if A and B are “00111101110111” and “10010101110000”, respectively, then E is “00010101110000”. The sample is shown in Fig. 4(a).

$\bullet$

Bitwise Or: For each sample, both A and B have 14 binary digits, and E is the bitwise or of A and B. For example, if A and B are “10001111100010” and “10110100101110”, respectively, then E is “10111111101110”. The sample is shown in Fig. 4(b).

$\bullet$

Bitwise Xor: For each sample, both A and B have 14 binary digits, and E is the bitwise xor of A and B. For example, if A and B are “00110101010110” and “00111101110000”, respectively, then E is “00001000100110”. The sample is shown in Fig. 4(c).

$\bullet$

Addition: For each sample, A and B range over 0 $\sim$ 4999999, and E is the sum of A and B. For example, if A and B are “646724” and “4087801”, respectively, then E is “4734525”. The sample is shown in Fig. 4(d).

$\bullet$

Subtraction: For each sample, A and B range over 0 $\sim$ 9999999, and E is the difference between A and B. To ensure a non-negative result, A is chosen to be greater than or equal to B. For example, if A and B are “6740693” and “3502317”, respectively, then E is “3238376”. The sample is shown in Fig. 4(e).

$\bullet$

Multiplication: For each sample, A and B range over 0 $\sim$ 3160, and E is the product of A and B. For example, if A and B are “1257” and “1377”, respectively, then E is “1730889”. The sample is shown in Fig. 4(f).
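The six generation rules above can be summarised in a short Python sketch that produces the number triple (A, B, E) for one sample. It is an illustration under stated assumptions, not the authors' generator: operands are drawn uniformly from the stated ranges, subtraction reorders the pair so that A ≥ B, bitwise numbers are zero-padded to 14 binary digits as in the examples, and rendering the numbers into 15 × 120 or 15 × 60 images is omitted. The helper name `make_sample` and the operation labels are hypothetical.

```python
import random

BIT_WIDTH = 14  # binary digits per number in the bitwise data sets


def make_sample(op, rng):
    """Return (A, B, E) as strings for one sample.

    `op` is one of the hypothetical labels 'and', 'or', 'xor', 'add', 'sub', 'mul'.
    """
    if op in ("and", "or", "xor"):
        a = rng.randrange(2 ** BIT_WIDTH)
        b = rng.randrange(2 ** BIT_WIDTH)
        e = {"and": a & b, "or": a | b, "xor": a ^ b}[op]
        fmt = f"0{BIT_WIDTH}b"  # zero-padded 14-digit binary, as in the examples
        return format(a, fmt), format(b, fmt), format(e, fmt)
    if op == "add":
        a, b = rng.randint(0, 4999999), rng.randint(0, 4999999)
        e = a + b
    elif op == "sub":
        a, b = rng.randint(0, 9999999), rng.randint(0, 9999999)
        a, b = max(a, b), min(a, b)  # assumption: reorder so that A >= B
        e = a - b
    elif op == "mul":
        a, b = rng.randint(0, 3160), rng.randint(0, 3160)
        e = a * b
    else:
        raise ValueError(f"unknown operation: {op!r}")
    return str(a), str(b), str(e)
```

For instance, `make_sample("mul", random.Random(0))` returns three decimal strings with at most 4, 4 and 7 digits; the bitwise examples in the list can be reproduced with expressions such as `format(int(a, 2) & int(b, 2), "014b")`.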

According to the difficulty of the logical relations embedded in them, these data sets are divided into three levels: one-star ($\star$, easy), two-star ($\star\star$, intermediate) and three-star ($\star\star\star$, difficult).

Bitwise And, Bitwise Or and Bitwise Xor data sets ($\star$): (1) The value of each digit of E is determined only by the values at the same position in A and B. For example, in Fig. 4(a), the value at the 2nd position of E (the rightmost position being the 1st) is determined only by the values at the 2nd position of A and B, so it is “0” (1&0=0). (2) Each digit of E is either 0 or 1.

Addition and Subtraction data sets ($\star\star$): (1) The value of each digit of E is determined by the carry or borrow and by the values at the same position in A and B. For example, in Fig. 4(d), the value at the 2nd position of E is determined by the carry from the sum at the 1st position of A and B and by the values at the 2nd position of A and B. (2) Since the carry or borrow is 0 or 1, each digit of E (except the rightmost one) has two possible values in 0 $\sim$ 9, and the carry or borrow decides which of the two is the result. For example, in Fig. 4(d), the carry from the 1st position of A and B is “0” and the sum of the values at the 2nd position is “2” (2+0=2), so the value at the 2nd position of E is “2” (0+2=2).

Multiplication data set ($\star\star\star$): (1) The value at a given position of E is determined by the values at that position and at all lower positions (to its right) in A and B. For example, in Fig. 4(f), the value at the 2nd position of E is determined by the values at the 1st and 2nd positions of A and B. (2) Each digit of E (except the rightmost one) has more possible values on the Multiplication data set than on the other LiLi data sets.
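The position dependencies behind the three difficulty levels can be checked empirically with a small perturbation probe, sketched below in Python as an illustration. For a target position m of E, it changes one digit of A at a time and records which positions j ever alter the m-th digit of E. The function names, the 4-digit operand width and the number of trials are arbitrary choices, not part of the paper's setup; Subtraction is omitted because changing a digit could violate A ≥ B.

```python
import random

# Base and operation for each probed data set (labels are hypothetical).
OPS = {
    "and": (2, lambda a, b: a & b),
    "or": (2, lambda a, b: a | b),
    "xor": (2, lambda a, b: a ^ b),
    "add": (10, lambda a, b: a + b),
    "mul": (10, lambda a, b: a * b),
}


def digit_at(v, m, base):
    """Digit of v at position m in the given base (m = 1 is the rightmost position)."""
    return (v // base ** (m - 1)) % base


def influencing_positions(op, m, width=4, trials=500, seed=0):
    """Positions j of A whose change altered digit m of E in at least one random trial."""
    base, compute = OPS[op]
    rng = random.Random(seed)
    found = set()
    for _ in range(trials):
        a = rng.randrange(base ** width)
        b = rng.randrange(base ** width)
        e = compute(a, b)
        for j in range(1, width + 1):
            d = digit_at(a, j, base)
            new_d = (d + rng.randrange(1, base)) % base  # a different digit value
            a2 = a + (new_d - d) * base ** (j - 1)       # A with digit j replaced
            if digit_at(compute(a2, b), m, base) != digit_at(e, m, base):
                found.add(j)
    return found
```

For the bitwise operations the probe should return only {m}; for addition and multiplication it returns a subset of {1, ..., m}, since digits to the left of position m can never affect it. The probe only shows which positions matter; the larger number of possible values per digit, which separates Multiplication from Addition, is a separate effect.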

3.2 LiLi task

In this paper, we focus on the setting in which a model directly learns, and reasons about, the relation between two input images and one output image, without any predefined reasoning patterns. In this task, we first generate three images, two for the input and one for the output, where the output image expresses the relation between the two input images. In addition, the n-digit numbers embedded in the images are not explicitly given, which means that neither the meaning of the contents embedded in the images nor the relation between the two input images and the output image is known. For example, if the numbers embedded in the two input images are “234” and “432” and the output image contains “666”, the logical relation between the input images and the output image is addition. The task can be formalized as follows.

A data concept logic system is given as a triple $\mathcal{R}=(I,R,O)$, where $I=\{x_{i}\mid x_{i}=(x_{i}^{1},x_{i}^{2}),\ i=1,2,\ldots,N\}$ is the input sequence, $O=\{y_{i}\}_{i=1}^{N}$ is the output sequence, $x_{i}^{1}$, $x_{i}^{2}$ and $y_{i}$ are three images with $K$ pixels each, as shown in Fig. 4, and $R$ denotes the logical relation between the image pair $x_{i}\in I$ and $y_{i}\in O$.

At the semantic (high) level, $R$ is Bitwise And, Bitwise Or, Bitwise Xor, Addition, Subtraction or Multiplication, denoted by $\&,|,\wedge,+,-$ or $\times$, which are easily understood by human beings. At the abstract (low) level, however, $R$ may be a high-dimensional mapping that is extremely difficult for humans to define; in this paper, $R:[-1,1]^{2K}\rightarrow\{0,1\}^{K}$. Hence, a novel method is needed to express such an abstract, low-level logical relation.

In this task, we are given a data set $D=\{(x_{i},y_{i})\}_{i=1}^{N}$, where $y_{i}$ is the output image expressing the logical relation between the pair of images $x_{i}^{1}$ and $x_{i}^{2}$. When drawing these images, we use pixel value 0 for black and pixel value 1 for white. For the input images, every pixel value is scaled into $[-1,1]$ by subtracting the mean, so $x_{i}^{1},x_{i}^{2}\in[-1,1]^{K}$, while the output image satisfies $y_{i}\in\{0,1\}^{K}$. The task can thus be viewed as learning a mapping from the input space $I=\{x_{i}\}_{i=1}^{N}$ to the output space $O=\{y_{i}\}_{i=1}^{N}$ with a supervised learning strategy. In this study, it is cast as a regression problem with the Mean Square Error (MSE) loss, i.e. $\mathcal{L}$ is $MSE$, and is solved via the following optimization problem.

[TABLE]

where $f$ is a sigmoid function that maps $LPN_{W}(x_{i}^{1},x_{i}^{2})$ into $[0,1]$, i.e. $f(LPN_{W}(x_{i}^{1},x_{i}^{2}))\in[0,1]^{K}$, and LPN is parameterized by $W$. Formula 9 is differentiable with respect to $W$ and can be solved efficiently by gradient descent.
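The input scaling and the sigmoid-plus-MSE objective described above can be sketched with NumPy as follows. This is a minimal sketch, not the authors' implementation: the network output is represented by raw `logits`, the mean is assumed to be taken per image, and averaging the squared error over both samples and pixels is an assumption about the normalisation of Formula 9. All function names are hypothetical.

```python
import numpy as np


def preprocess_input(img):
    """Scale a {0, 1} image (0 = black, 1 = white) into [-1, 1] by subtracting its mean.

    Using the per-image mean (rather than a data-set mean) is an assumption.
    """
    img = np.asarray(img, dtype=np.float64)
    return img - img.mean()


def sigmoid(z):
    """The squashing function f that maps LPN outputs into [0, 1]."""
    return 1.0 / (1.0 + np.exp(-z))


def lili_mse(logits, targets):
    """MSE between f(LPN_W(x1, x2)) and binary ground-truth images.

    logits, targets: arrays of shape (N, K). Averaging over both N and K is an
    assumed normalisation.
    """
    pred = sigmoid(np.asarray(logits, dtype=np.float64))
    return float(np.mean((pred - np.asarray(targets, dtype=np.float64)) ** 2))
```

Because both the sigmoid and the squared error are smooth, the loss is differentiable in the network parameters, which is what allows gradient descent to be used.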

Based on the above analysis and discussion, the workflow of the LiLi task is illustrated in Fig. 5, where $I$ is the set of input images, $O$ is the set of ground-truth output images, $\hat{O}$ is the set of logical relation patterns reasoned by $f(LPN_{W}(x_{i}^{1},x_{i}^{2}))$, $O/I$ is the ground-truth logical relation set for a given input image set $I$, $\hat{O}/I$ is the logical relation set predicted by $LPN$ for $I$, and Loss evaluates the difference between $O/I$ and $\hat{O}/I$. $LPN$ denotes the logical pattern network, which is implemented in this paper using CNN-LSTM, MLP, CNN-MLP, Autoencoder, ResNet18, ResNet50, ResNet152 and DCM, respectively. More implementation details about LPN are given in Sects. 4.1 and 5.

From Formula 9 and Fig. 5, one observes that the LPN only needs to be given some training data to learn the logical patterns between a pair of given images automatically, without any reasoning patterns being provided beforehand. This is a purely data-driven strategy for mining the logical patterns hidden in data.

3.3 Inference form of a LiLi task

Based on the inference form of the DCL in Sect. 2.2, a LiLi task can be written in the following inference form based on the IF-THEN rule.

[TABLE]

where $x_{i}^{1}$ and $x_{i}^{2}$ are the input images, $y_{i}$ is the output image expressing the relation between two input images.

In Formula 10, the $n$ antecedents from 1 to $n$, which constitute the training set, are used to train the LPN inference model, and the $m$ antecedents from $n+1$ to $n+m$, which constitute the testing set, are used to test the inference ability of LPN. On this basis, Formula 10 can be further simplified into the following form.

[TABLE]

Formula 11 can be further simplified into the following form by letting $I_{train}=\{(x_{1}^{1},x_{1}^{2}),(x_{2}^{1},x_{2}^{2}),\ldots,(x_{n}^{1},x_{n}^{2})\}$, $O_{train}=\{y_{1},y_{2},\ldots,y_{n}\}$, $I_{test}=\{(x_{n+1}^{1},x_{n+1}^{2}),(x_{n+2}^{1},x_{n+2}^{2}),\ldots,(x_{n+m}^{1},x_{n+m}^{2})\}$ and $O_{test}=\{y_{n+1},y_{n+2},\ldots,y_{n+m}\}$.

[TABLE]

The consequence $O_{test}$ of the antecedent $I_{test}$ can be obtained by rewriting the three implications $(I_{train}\rightarrow O_{train})\rightarrow(I_{test}\rightarrow O_{test})$ contained in Formula 12 into the following form.

[TABLE]

where $R(I_{train},O_{train}):[-1,1]^{2K}\rightarrow\{0,1\}^{K}$ is a high-dimensional mapping function learned using a data-driven method.

According to the above analysis, on the one hand, the LiLi task has an inference form consistent with the classical propositional calculus; on the other hand, the two differ in the following aspects.

$\bullet$

$R_{z}:[0,1]^{2}\rightarrow[0,1]$ is a bivariate (two-argument) function, whereas $R(I_{train},O_{train}):[-1,1]^{2K}\rightarrow\{0,1\}^{K}$ is a complex high-dimensional function ($K$ is 1800 or 900 in this paper).

$\bullet$

$R_{z}$ needs to be defined beforehand by experts, while $R$ is learned from a given data set, because it is almost impossible for humans to define this function beforehand.

In the real world, there exist many complex relations that cannot be specified beforehand by human beings. In such situations, the classical propositional calculus works poorly or not at all. Hence, it is desirable to design a human-free, data-driven method to learn an unknown relation function. This is our main motivation.

4 Experiments

In this section, we compare the performance of several typical deep neural networks on the six LiLi data sets. We first detail the models used and the experimental setup.

4.1 Models and experimental setup

For all models, two images are fed in as input, and one output image is compared with the ground-truth image. Each model is trained to produce an output image in which the correct number is embedded, by optimising a mean square error (MSE) loss with the ADAM or SGD optimiser. Early stopping is used, and the optimiser and hyper-parameters with the smallest loss on the validation set are chosen. The batch size is set to 32. The hyper-parameter settings and further details of all models are given in Table 1. Finally, performance is reported on the testing set.
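The early-stopping and model-selection rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `patience` and `min_delta` parameters and both function names are hypothetical, since the text only states that training stops when the validation loss no longer decreases.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Index of the best epoch, stopping after `patience` epochs without improvement."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss no longer decreases
    return best_epoch


def select_configuration(runs, patience=5):
    """Pick the (optimiser, hyper-parameter) setting with the smallest early-stopped
    validation loss. `runs` maps a setting name to its per-epoch validation losses."""
    def best_loss(name):
        losses = runs[name]
        return losses[early_stopping_epoch(losses, patience)]
    return min(runs, key=best_loss)
```

Note that with a finite patience, a late improvement after a long plateau is never seen; the patience value therefore trades training time against the risk of stopping too early.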

$\bullet$

CNN-LSTM: We develop the model using a standard LSTM module [43]. Since LSTMs are designed to process inputs sequentially, we first pass images sequentially and independently through a 2-layer CNN, and the resulting sequence is handed over to the LSTM. The final hidden state of the LSTM is passed through a fully-connected layer with sigmoid activation function. The model is trained using batch normalization after each convolutional layer and dropout is applied to the LSTM hidden state.

$\bullet$

MLP: The MLP is implemented following [44]. The model has three hidden layers, each with 256 nodes and ReLU activations, and one output layer with sigmoid activation. All nodes in adjacent layers are fully connected.

$\bullet$

CNN-MLP: Inspired by [45], we implement a 2-layer CNN with ReLU activation functions and batch normalization. The input images are treated as a set of separate greyscale input feature maps (channels) of the CNN. The convolutional output is passed through two fully-connected layers, the first with a ReLU activation function and the second with a sigmoid activation function.

$\bullet$

Autoencoder: A simple autoencoder network is implemented following the idea of [46]. A 2-layer CNN serves as the encoder and a 2-layer upsampling network as the decoder. Finally, a convolutional layer with a sigmoid activation is used as the output layer.

$\bullet$

ResNet: We use the ResNet architecture described in [26], replacing the softmax activation on the last layer with a sigmoid activation. In this paper, we train ResNet-18, ResNet-50 and ResNet-152 on all LiLi data sets and obtain similar performance.
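As a concrete reference point for the simplest of these baselines, the MLP described above can be sketched as a NumPy forward pass. The sketch assumes that the two flattened input images are concatenated into one 2K-dimensional vector and uses He-style random initialisation; both are illustrative choices, and Table 1 remains the reference for the actual settings. Training (MSE loss, ADAM or SGD, batch size 32) is not shown, and the function names are hypothetical.

```python
import numpy as np


def init_mlp(in_dim, out_dim, hidden=(256, 256, 256), seed=0):
    """Random weights for an MLP with three 256-unit hidden layers (He-style init assumed)."""
    rng = np.random.default_rng(seed)
    dims = [in_dim, *hidden, out_dim]
    return [(rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]


def mlp_forward(params, x1, x2):
    """x1, x2: (N, K) flattened input images -> (N, K) predicted output image in [0, 1]."""
    h = np.concatenate([x1, x2], axis=1)  # the two images form one 2K-dim input vector
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # fully connected layer + ReLU
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))  # sigmoid output layer
```

With K = 15 × 60 = 900 for the arithmetic data sets, `init_mlp(2 * 900, 900)` builds a network mapping two images to one 900-pixel output image. Because every hidden unit sees every pixel of both images, the MLP has no built-in notion of which positions are adjacent.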

4.2 Experiments and analysis on LiLi data sets

In this subsection, we test several typical deep neural networks on the LiLi data sets. Each data set consists of 10,000 training samples, 10,000 validation samples and 20,000 testing samples, and the testing samples do not overlap with the training or validation samples. All models are trained on each training set and stopped when the validation loss no longer decreases. We use OCR software [47] to recognize the numbers embedded in the predicted images and then compare them with the ground-truth numbers. A predicted image is counted as correct only when all of its digits equal the ground-truth digits. The accuracies on the Bitwise And, Bitwise Or, Bitwise Xor, Addition, Subtraction and Multiplication data sets are shown in Table 2.
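The all-digits-correct criterion can be written as a short function over the strings produced by the OCR step. The sketch assumes that OCR output is available as one string per image, with `None` for images that could not be read; the OCR tool itself [47] is not modelled, and the function name is hypothetical.

```python
def exact_match_accuracy(predicted, ground_truth):
    """Fraction of samples whose recognised number matches the ground truth in every digit.

    `predicted` holds the strings returned by the OCR step (None if recognition failed).
    """
    if len(predicted) != len(ground_truth) or not ground_truth:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p is not None and p == t for p, t in zip(predicted, ground_truth))
    return correct / len(ground_truth)
```

Under this metric, a prediction that differs in a single digit, or that the OCR step cannot read, counts as fully wrong, which makes it strict for long outputs such as 7-digit products.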

From Table 2, one observes that all models perform well on the Bitwise And, Bitwise Or and Bitwise Xor data sets, but only CNN-MLP, Autoencoder and ResNets perform well on the Addition and Subtraction data sets. Unfortunately, all models fail on the Multiplication data set.

The validation loss curves on the Bitwise And, Bitwise Or and Bitwise Xor data sets are shown in Fig. 6(a), Fig. 6(b) and Fig. 6(c). Because of early stopping, the models are trained for different numbers of epochs. These figures show that all models converge to small losses, and that MLP, CNN-MLP and Autoencoder converge faster than CNN-LSTM and ResNets. The validation loss curves on the Addition and Subtraction data sets are shown in Fig. 6(d) and Fig. 6(e). Here, the losses of CNN-MLP, Autoencoder and ResNets are smaller than those of the other models, and both CNN-MLP and Autoencoder converge faster than ResNets. The validation loss curve on the Multiplication data set is shown in Fig. 6(f), from which one can see that all models still have very large losses at convergence.

Next, we examine whether increasing the data set size improves model performance. In this setting, all models are trained on 150,000 training samples and stopped when the validation loss no longer decreases. The accuracies of all models on the six LiLi data sets are shown in Table 3.
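To make concrete the earlier claim that the training numbers cover only a small fraction of all combinations, the Python snippet below counts the distinct operand pairs (A, B) allowed by the ranges of Sect. 3.1 and compares them with the 10,000 and 150,000 training samples. It is our illustration, not a computation reported in the paper, and it treats the ratios as upper bounds on coverage because random sampling may repeat pairs. For Multiplication, for example, 150,000 samples correspond to at most about 1.5% of the 9,991,921 possible pairs, and 10,000 samples to about 0.1%.

```python
# Number of distinct (A, B) operand pairs per LiLi data set, from the stated ranges.
N_BITS = 14
pair_counts = {
    "Bitwise And/Or/Xor": (2 ** N_BITS) ** 2,        # A, B: 14-bit binary numbers
    "Addition": 5_000_000 ** 2,                      # A, B in 0..4999999
    "Subtraction": 10**7 * (10**7 + 1) // 2,         # A, B in 0..9999999 with A >= B
    "Multiplication": 3161 ** 2,                     # A, B in 0..3160
}


def coverage(n_train, n_pairs):
    """Upper bound on the fraction of operand pairs seen in training (duplicates ignored)."""
    return min(n_train, n_pairs) / n_pairs


for name, n_pairs in pair_counts.items():
    print(f"{name:20s} {n_pairs:>20,d}  "
          f"{coverage(10_000, n_pairs):.2e}  {coverage(150_000, n_pairs):.2e}")
```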

From Table 3, one observes that most models perform well on the Bitwise And, Bitwise Or, Bitwise Xor, Addition and Subtraction data sets. This means that model performance can be improved by increasing the size of the data sets, which provides a strategy for solving difficult logic learning problems.

The validation loss curves are shown in Fig. 7, from which one observes that most models converge to smaller losses than before. The curves on the Bitwise And, Bitwise Or and Bitwise Xor data sets are shown in Fig. 7(a) to 7(c); they show that CNN-LSTM and ResNets converge faster than before. The curves on the Addition, Subtraction and Multiplication data sets are shown in Fig. 7(d), Fig. 7(e) and Fig. 7(f), respectively. From Fig. 7(d) and Fig. 7(e), one observes that the losses of all models are smaller than before. From Fig. 7(f), however, all models still have very large losses at convergence on the Multiplication data set, although these losses are also smaller than before.

We conjecture that spatial position plays a significant role in learning logical patterns. Notably, CNN-LSTM achieves only about 80% accuracy on the Addition and Subtraction data sets even with larger data sets, yet it achieves 100% accuracy on the Bitwise And, Bitwise Or and Bitwise Xor data sets. The reason is that CNN-LSTM is fed the input images one by one and learns their features separately, so it barely accounts for carries or borrows in addition and subtraction. Each digit of the result of addition and subtraction is affected by the adjacent positions (through carries or borrows), while each digit of the result of bitwise and, bitwise or and bitwise xor is not. To achieve high accuracy on the Addition and Subtraction data sets, a model should therefore process the two input images a and b simultaneously. To verify this idea, we develop a model called CNN2-MLP that is similar to CNN-MLP. The two models have the same structure and hyper-parameter settings, except that CNN2-MLP learns features of each of the two input images separately. Their structures are shown in Fig. 8.
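The architectural difference between CNN-MLP and CNN2-MLP can be illustrated with a toy single-filter convolution in NumPy. This is a simplified sketch (one filter, one layer, no batch normalization or fully connected head, hypothetical function names), not the models of Fig. 8. In the joint variant the two images are stacked as input channels, so every feature already combines pixels of both images at the same location; in the separate variant each feature sees only one image, and the two are combined only in later fully connected layers.

```python
import numpy as np


def conv2d_valid(x, k):
    """Single-output-channel 'valid' cross-correlation. x: (C, H, W), k: (C, kh, kw)."""
    _, h, w = x.shape
    _, kh, kw = k.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sum over all input channels: each output mixes every channel locally.
            out[i, j] = np.sum(x[:, i:i + kh, j:j + kw] * k)
    return out


def joint_features(x1, x2, k):
    """CNN-MLP style: both images are input channels of one convolution (k: (2, kh, kw))."""
    return conv2d_valid(np.stack([x1, x2]), k)


def separate_features(x1, x2, k1, k2):
    """CNN2-MLP style: each image has its own convolution; features meet only later."""
    return conv2d_valid(x1[None], k1[None]), conv2d_valid(x2[None], k2[None])
```

Perturbing a single pixel of the second image changes the joint feature map near that location, but leaves the first separate feature map untouched: only the joint variant can relate aligned digits of the two images at the convolutional stage.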

The validation loss curves of CNN-MLP and CNN2-MLP on the six LiLi data sets are shown in Fig. 9. On the Bitwise And, Bitwise Or and Bitwise Xor data sets, both converge to small losses. On the Addition and Subtraction data sets, the validation loss of CNN2-MLP is large with 10,000 training samples; as the training set grows, it becomes smaller but remains larger than that of CNN-MLP. On the Multiplication data set, both converge to large losses. The test accuracies of CNN2-MLP on the Bitwise And, Bitwise Or, Bitwise Xor, Addition, Subtraction and Multiplication data sets are shown in Table 4. CNN2-MLP does not perform well on the Addition and Subtraction data sets but still works well on the Bitwise And, Bitwise Or and Bitwise Xor data sets. These experimental results verify that spatial position plays a significant role in learning logical patterns.

As the data set grows, the MLP tends to perform well on the Addition and Subtraction data sets. Each digit of the result of addition and subtraction is affected by the adjacent positions in both input images. Because the MLP relates every position of the two images to every other, it cannot focus on the exact relations among adjacent positions when the data set is small; as the data set grows, this defect is compensated for.

From what has been discussed above, we can divide these models into three categories:

1. CNN-LSTM: This model is appropriate for tasks in which each digit of the result is affected only by the same position of the input numbers (e.g. the Bitwise And, Bitwise Or and Bitwise Xor data sets).

2. MLP: This model is appropriate for tasks in which each digit of the result is affected by all positions of the input numbers (the MLP is more appropriate than the other models on the Multiplication data set). If the data set is large enough, the MLP can also focus on the same or adjacent positions of the input numbers (e.g. the Bitwise And, Bitwise Or, Bitwise Xor, Addition and Subtraction data sets).

3. CNN-MLP, Autoencoder and ResNets: These models are appropriate for tasks in which each digit of the result is affected by the same or adjacent positions of the input numbers (e.g. the Bitwise And, Bitwise Or, Bitwise Xor, Addition and Subtraction data sets).

Next, the models are compared in terms of visual results, by showing the predicted images of the models that perform poorly. On the Addition and Subtraction data sets, only CNN-LSTM and MLP perform poorly; on the Multiplication data set, all models perform poorly.

For the Addition and Subtraction data sets, the visual results are shown in Fig. 10 and Fig. 11. From Fig. 10(a) and Fig. 11(a), one observes that most models render the first and last digits clearly in the output images but the other digits obscurely. As the training set grows, Fig. 10(b) and Fig. 11(b) show that most models render most digits clearly. For the Multiplication data set, the visual results are shown in Fig. 12. From Fig. 12(a), most models render only the first and last digits clearly and the other digits obscurely. As the training set grows, Fig. 12(b) shows that most models render more digits clearly than before, though most digits remain obscure. Predicted digits can be judged incorrect for several reasons. Some are very obscure (e.g., $p_{1}$ in Fig. 11(b)); some resemble other digits (e.g., $p_{2}$ in Fig. 10(b)); and some are correct but cannot be recognized by the OCR software (e.g., $p_{3}$ in Fig. 11(b)). Hence, the actual accuracies may be higher than those reported.

The above experimental results show that these models cannot solve the difficult LiLi task, Multiplication. In the next section, an effective solution is provided by dividing this task into a few easier subtasks.

5 Divide and conquer model for Multiplication data set

Although increasing the size of the data set helps with difficult logic learning problems, all models still perform poorly on the Multiplication data set. Many problems are complex and difficult to solve directly but become easier when decomposed [48, 49, 50, 51, 52], and human-designed algorithm decomposition can effectively reduce the difficulty of learning [53]. Inspired by this, we propose the divide and conquer model (DCM), which addresses complex tasks by adopting a decomposition strategy.

Through the DCM, we decompose a complex task into $k$ subtasks, and the decomposition criterion is that the combined difficulty of the subtasks is lower than the difficulty of the complex task.

[TABLE]

where $H$ is the difficulty of the complex task, $h_{i}$ is the difficulty of the $i^{th}$ subtask, and $f$ is the combined difficulty of the subtasks, which is determined by all of them.

As Fig. 6(f) and Fig. 7(f) show, the MLP is more robust and converges to a smaller loss than the other models. For Multiplication, the value at a given position of E is determined by the values at that position and at all lower positions in A and B, and the MLP is better suited to this setting than the other models. We therefore select the MLP as the decomposition module of the DCM.

In this experiment, the Multiplication data set is regenerated with additional information. In the training set, each sample consists of four images a, b, c and d, each containing a single integer, and an output image e generated from the result of the multiplication. The numbers embedded in images $a,b,c,d$ and $e$ are $A,B,C,D$ and $E$. In the testing set, only images $a$, $b$ and $e$ are generated. For each sample, A and B range over 0 $\sim$ 3160, and E is the product of A and B. A carry occurs when the digit products summed at one position reach ten or more; C records the carry part and D the non-carry part. The multiplication is thus divided into a carry part and a non-carry part, so that the sum of C and D equals E. For example, if A and B are “2261” and “584”, respectively, then C, D and E are “1256300”, “64124” and “1320424”, respectively. The calculation procedure is shown in Fig. 13.
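The split of E into C and D is not spelled out in full above. One reading that reproduces both worked examples in this section (2261 × 584 and 2490 × 2644) is to sum the digit products $a_i b_j$ that fall on the same position m, keep that column sum modulo 10 at position m in D, and move its tens part one position to the left in C. The Python sketch below implements this reading; it is our reconstruction rather than the authors' generator, so details may differ, and the function name is hypothetical.

```python
def carry_noncarry_split(a, b):
    """Split a * b into a carry part C and a non-carry part D with C + D == a * b.

    For each position m (0-based from the right), s_m is the sum of the digit
    products a_i * b_j with i + j == m. D keeps s_m mod 10 at position m, and C
    places floor(s_m / 10) at position m + 1.
    """
    da = [int(ch) for ch in reversed(str(a))]  # digits of a, least significant first
    db = [int(ch) for ch in reversed(str(b))]
    col = [0] * (len(da) + len(db))
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            col[i + j] += x * y  # column sums of the schoolbook multiplication
    c = sum((s // 10) * 10 ** (m + 1) for m, s in enumerate(col))
    d = sum((s % 10) * 10 ** m for m, s in enumerate(col))
    return c, d
```

For example, `carry_noncarry_split(2261, 584)` returns `(1256300, 64124)`, and `carry_noncarry_split(123, 124)` returns `(1110, 14142)`, whose sum is 15252. The identity C + D = E holds because each column sum equals ten times its tens part plus its remainder.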

The DCM consists of three subtasks: a carry subtask, a non-carry subtask and a synthetic subtask. The carry and non-carry subtasks learn the carry part and the non-carry part of the multiplication, respectively, and the synthetic subtask then learns how to combine the two. The networks of the three subtasks have similar structures but different parameters.

1. Carry subtask: During training, images a and b are used as the input and image c as the ground-truth result. The network is fully connected, with ReLU activations in the hidden layers and a sigmoid in the output layer. It has 5 hidden layers of 256 units each.

2. Non-carry subtask: During training, images a and b are used as the input and image d as the ground-truth result. The network is fully connected, with ReLU activations in the hidden layers and a sigmoid in the output layer. It has 5 hidden layers of 256 units each.

3. Synthetic subtask: During training, images c and d are used as the input and image e as the ground-truth result. The network is fully connected, with ReLU activations in the hidden layers and a sigmoid in the output layer. It has 3 hidden layers of 256 units each.

We denote a ground-truth image by x (x can be c, d or e) and the corresponding predicted image by x’. The goal is for the number embedded in the predicted image e’ to equal the number embedded in the ground-truth image e, i.e., E’ = E.

1. Training: During training, images a and b are used as the input, e as the ground-truth result and e’ as the output. Note that images c and d serve as both inputs and ground-truth results: they are the ground-truth images for the carry and non-carry subtasks, but the input images for the synthetic subtask. The training procedure is illustrated in Fig. 14(a) with the multiplication “2490 $\times$ 2644 = 6583560”, for which A, B, C, D and E are “2490”, “2644”, “2575300”, “4008260” and “6583560”, respectively. The carry, non-carry and synthetic subtasks are trained separately. For the carry and non-carry subtasks, images a and b are the inputs, images c and d are the ground-truth images and images c’ and d’ are the outputs, respectively. For the synthetic subtask, images c and d are the input, image e is the ground-truth image and image e’ is the output. The smaller the differences between the predicted images c’, d’ and e’ and the ground-truth images c, d and e, the better the DCM performs.

2. Testing: In the testing procedure, the DCM is an end-to-end model. The testing procedure is illustrated in Fig. 14(b) with the multiplication “123 $\times$ 124 = 15252”, for which A and B are “123” and “124”, respectively. At test time, the DCM takes only images a and b as inputs and directly produces a predicted image e’ at the output of the synthetic subtask. Specifically, the inputs are first passed through the carry and non-carry subtasks to obtain a carry prediction layer and a non-carry prediction layer, respectively. The two prediction layers are then concatenated and passed through the synthetic subtask to obtain the final prediction E’. Here E’ is “15252”, equal to E, which shows that the DCM correctly found the relation between images a and b using only visual information.
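The separate training targets and the end-to-end test-time composition can be sketched in NumPy as below. The sketch only mirrors the structure described in this section (5-hidden-layer carry and non-carry networks, a 3-hidden-layer synthetic network, 256 units per layer, ReLU hidden layers, sigmoid outputs, and concatenation of the two intermediate predictions). Concatenating a and b as the network input, the initialisation and all function names are assumptions, and training with SGD (momentum 0.9, batch size 256, initial learning rate 0.8) is not shown.

```python
import numpy as np

K = 15 * 60  # pixels per Multiplication image (15 x 60)


def init_subtask_mlp(in_dim, out_dim, n_hidden, width=256, seed=0):
    """Random weights for one fully connected subtask network (He-style init assumed)."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [width] * n_hidden + [out_dim]
    return [(rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]


def forward(params, x):
    """ReLU hidden layers followed by a sigmoid output layer."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(x @ W + b)))


carry_net = init_subtask_mlp(2 * K, K, n_hidden=5, seed=1)      # (a, b) -> c'
noncarry_net = init_subtask_mlp(2 * K, K, n_hidden=5, seed=2)   # (a, b) -> d'
synthetic_net = init_subtask_mlp(2 * K, K, n_hidden=3, seed=3)  # (c', d') -> e'


def training_pairs(a, b, c, d, e):
    """(input, target) pairs for the three separately trained subtasks.
    The synthetic subtask is trained on the ground-truth images c and d."""
    ab = np.concatenate([a, b], axis=1)
    return {"carry": (ab, c), "noncarry": (ab, d),
            "synthetic": (np.concatenate([c, d], axis=1), e)}


def dcm_predict(a, b):
    """End-to-end inference: a, b are (N, K) flattened images; returns e' of shape (N, K)."""
    ab = np.concatenate([a, b], axis=1)
    c_pred = forward(carry_net, ab)
    d_pred = forward(noncarry_net, ab)
    return forward(synthetic_net, np.concatenate([c_pred, d_pred], axis=1))
```

The design choice worth noting is that training uses ground-truth c and d as the synthetic network's inputs, whereas testing feeds it the predictions c’ and d’; errors of the first two subtasks therefore propagate into e’ only at test time.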

The DCM is trained using stochastic gradient descent with momentum 0.9, optimising a mean square error (MSE) loss with a fixed batch size of 256. The learning rate starts at 0.8 and is reduced gradually when the loss plateaus. Training of each of the carry, non-carry and synthetic subtasks terminates when its loss no longer decreases.

The accuracy of each subtask of the DCM is shown in Table 5. The DCM achieves a surprisingly high accuracy of $84.5\%$, higher than that of the MLP on the Multiplication data set. Some visual results from testing are shown in Fig. 15. In Fig. 15(a), both DCM and MLP produce correct predicted images. In Fig. 15(b), the DCM produces the correct image but the MLP does not. In Fig. 15(c), both predict wrong images: for the MLP, the first two and last two digits are predicted correctly but the remaining three central digits are uncertain, whereas for the DCM only one digit of the predicted number is uncertain. That is, the DCM determines more digits than the MLP.

This is owing to the special structure of the DCM, which divides a complex task into three simple subtasks (carry, non-carry and synthetic), each of which learns only one aspect of the task. This helps to reduce the uncertainty of each predicted digit embedded in image e’. To explain the effectiveness of the DCM, we first introduce some notation. The goal of visual logic learning of arithmetic operations is to compute the value of number 3 in a formula such as “number 1 operation number 2 = number 3”. We denote the digit of number n at the m-th position (the rightmost position being the 1st) by $d_{n}^{m}$. The complexity of the task is determined by the degree of uncertainty (the number of possible values of each digit) in learning the logical relation between the input images and the output image. For Addition, $d_{3}^{m}$ has only two possibilities, $(d_{1}^{m}+d_{2}^{m}) \bmod 10$ or $(d_{1}^{m}+d_{2}^{m}+1) \bmod 10$, and the case of Subtraction is similar. The degree of uncertainty of multiplication, however, is much higher than that of addition and subtraction: $d_{3}^{m}$ has ten possibilities.
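The difference in uncertainty between Addition and Multiplication can be checked by brute force on two-digit operands. The Python sketch below (a hypothetical helper, not part of the DCM) fixes the 2nd digits $d_1^2$ and $d_2^2$, lets the rightmost digits vary over 0 to 9, and collects the possible values of $d_3^2$.

```python
from itertools import product


def candidate_digits(op, p, q):
    """Possible values of the 2nd (tens) digit of number 3 when the 2nd digits of the
    two-digit operands are fixed to p and q and their 1st digits vary over 0..9."""
    values = set()
    for x, y in product(range(10), repeat=2):
        a, b = 10 * p + x, 10 * q + y
        if op == "add":
            result = a + b
        elif op == "mul":
            result = a * b
        else:
            raise ValueError(f"unsupported operation: {op!r}")
        values.add((result // 10) % 10)
    return values
```

For Addition, `candidate_digits("add", 3, 4)` gives `{7, 8}`, the two values $(d_1^2+d_2^2) \bmod 10$ and $(d_1^2+d_2^2+1) \bmod 10$, and every fixed pair yields exactly two candidates. For Multiplication, `candidate_digits("mul", 1, 1)` already covers all ten digits: with $d_1^1 = 0$, the tens digit simply equals $d_2^1$.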

Consider a formula such as $d_{1}^{2}d_{1}^{1}\times d_{2}^{2}d_{2}^{1}=d_{3}^{4}d_{3}^{3}d_{3}^{2}d_{3}^{1}$, or $d_{1}^{2}d_{1}^{1}\times d_{2}^{2}d_{2}^{1}=d_{3}^{3}d_{3}^{2}d_{3}^{1}$ if $d_{3}^{4}=0$. Each digit $d_{3}^{m}$ except the rightmost one has a wide range of possible values, whereas the rightmost digit is always uniquely determined as $(d_{1}^{1}\times d_{2}^{1}) \bmod 10$. The DCM reduces the uncertainty of the predicted number 3. For example, $d_{3}^{2}$ is determined by the carry and non-carry parts of the multiplication. In the MLP, $d_{3}^{2}$ ranges over 0 $\sim$ 9, while the carry at the 2nd position ranges over 0 $\sim$ 8. Determining the carry at the 2nd position therefore amounts to choosing one value in 0 $\sim$ 8 out of the range 0 $\sim$ 9, and determining the non-carry at the 2nd position amounts to choosing one value in 0 $\sim$ 9. Hence, there are in fact 900 possibilities ($C_{10}^{9}C_{9}^{1}C_{10}^{1}$) for $d_{3}^{2}$, which the MLP computes directly. In contrast, our method first computes the carry and non-carry parts separately and then synthesizes the results of these two subtasks. Since the carry at the 2nd position ranges over 0 $\sim$ 8, the carry subtask only needs to determine which value in 0 $\sim$ 8 is correct, and the non-carry subtask chooses one value in 0 $\sim$ 9. Hence, there are only 90 possibilities ($C_{9}^{1}C_{10}^{1}$) for $d_{3}^{2}$. The DCM thus reduces the number of possible values from 900 to 90, which is why it determines more digits than the MLP when the predictions of both models are wrong.
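The arithmetic behind the two counts above, reading $C_n^k$ as the binomial coefficient $\binom{n}{k}$, is:

$$
\begin{aligned}
\text{MLP:}\quad & C_{10}^{9}\,C_{9}^{1}\,C_{10}^{1} = 10 \times 9 \times 10 = 900,\\
\text{DCM:}\quad & C_{9}^{1}\,C_{10}^{1} = 9 \times 10 = 90,
\end{aligned}
$$

so, under this counting, the decomposition removes the factor $C_{10}^{9}=10$ contributed by having to identify the admissible carry range.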

6 Conclusion

In this study, we have explored an interesting and important research question: can logic reasoning patterns be learned directly from given data? As a preliminary exploration, this question has been investigated through a task called LiLi, which learns logic directly from a training set of images. Several typical neural network models have been applied to the LiLi task and perform well on the easy and intermediate logic data sets. To solve the difficult task, a new network framework called DCM has been developed, which uses a decomposition strategy and adds some label information. This idea can also be applied to other complex logic learning tasks. For example, since it is difficult to compute a bitwise operation directly on decimal numbers, we can first convert the decimal numbers to binary and then compute the bitwise operation on the binary digits. The DCM thus provides a strategy for solving difficult logic reasoning tasks by combining domain expert knowledge with a data-driven model.
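The decimal-to-binary decomposition mentioned in the conclusion can be sketched as follows. This is an illustrative Python example, not the DCM itself: the function names are hypothetical, and the per-bit operation is passed in as a plain function standing in for whatever a learned subtask would compute. The decimal operands are split into binary digits, the operation is applied bit by bit, and the bits are recombined into a decimal number.

```python
from typing import Callable, List


def to_bits(x: int, width: int) -> List[int]:
    """Binary digits of a non-negative integer x, least significant bit first."""
    return [(x >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    """Inverse of to_bits: recombine least-significant-first bits into an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def decimal_bitwise(a: int, b: int, bit_op: Callable[[int, int], int]) -> int:
    """Decompose a bitwise operation on decimal numbers into independent per-bit subproblems."""
    width = max(a.bit_length(), b.bit_length(), 1)
    result_bits = [bit_op(x, y) for x, y in zip(to_bits(a, width), to_bits(b, width))]
    return from_bits(result_bits)


print(decimal_bitwise(12, 10, lambda x, y: x & y))  # 1100 AND 1010 = 1000, i.e. 8
print(decimal_bitwise(12, 10, lambda x, y: x | y))  # 1100 OR 1010 = 1110, i.e. 14
print(decimal_bitwise(12, 10, lambda x, y: x ^ y))  # 1100 XOR 1010 = 0110, i.e. 6
```

Each bit position is an independent subproblem with only four input combinations, which mirrors the idea of splitting a hard mapping into simpler ones.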

This work is only a preliminary exploration towards learning logic from data. Several issues along this direction are worth investigating, such as mining visual functional relations among multiple variables and learning rules directly from data. These issues are challenging and meaningful. To address them, more logic reasoning data sets containing complex formulas embedded in images, and more effective models for solving logical reasoning tasks, need to be designed.

Acknowledgments

This work was supported by National Key R&D Program of China (No. 2018YFB1004300), National Natural Science Fund of China (No. 61672332, 61432011, 61502289), Key R&D Program (International Science and Technology Cooperation Project) of Shanxi Province, China (No. 201903D421003), Program for the Young San Jin Scholars of Shanxi (No. 2016769), Young Scientists Fund of the National Natural Science Foundation of China (No. 61802238, 61906115, 61603228, 62006146, 61906114), Shanxi Province Science Foundation for Youths (No. 201901D211169, 201901D211170, 201901D211171), Research Project Supported by Shanxi Scholarship Council of China (No. HGKY2019001), and Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (No. 2020L0036).

References

[1]

R. Colom, S. Karama, R. E. Jung, R. J. Haier, Human intelligence and brain networks, Dialogues in Clinical Neuroscience 12 (4) (2010) 489.

[2]

J. Johnson, B. Hariharan, L. V. D. Maaten, F. F. Li, C. L. Zitnick, R. Girshick, CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning, in: IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017, pp. 1988–1997.

[3]

G. Wang, Fuzzy reasoning and fuzzy logic, in: Soft Computing in Intelligent Systems and Information Processing. Proceedings of the 1996 Asian Fuzzy Systems Symposium, Kenting, China, 1996, pp. 478–483.

[4]

M. Mizumoto, Comparison of fuzzy reasoning methods, Fuzzy Sets and Systems 8 (3) (1982) 253–283.

[5]

J. Yen, Fuzzy logic-a modern perspective, IEEE Transactions on Knowledge and Data Engineering 11 (1) (1999) 153–165.

[6]

D. W. Pei, On the strict logic foundation of fuzzy reasoning, Soft Computing 8 (8) (2004) 539–545.

[7]

R. Wille, Restructuring lattice theory: an approach based on hierarchies of concepts, in: I. Rival (Ed.), Ordered sets, Springer, 1982, pp. 445–470.

[8]

J. Tadrat, V. Boonjing, P. Pattaraintakorn, A new similarity measure in formal concept analysis for case-based reasoning, Expert Systems with Applications 39 (1) (2012) 967–972.

[9]

J. Golińska-Pilarek, E. Orłowska, Relational reasoning in formal concept analysis, in: IEEE International Fuzzy Systems Conference, London, UK, 2007.

[10]

M. W. Shao, M. M. Lv, K. W. Li, C. Z. Wang, The construction of attribute (object)-oriented multi-granularity concept lattices, International Journal of Machine Learning and Cybernetics 11 (4) (2020) 1017–1032.

[11]

N. J. Nilsson, Probabilistic logic, Artificial Intelligence 28 (1) (1986) 71–87.

[12]

N. J. Nilsson, Probabilistic logic revisited, Artificial Intelligence 59 (1-2) (1993) 39–42.

[13]

Y. She, X. He, Y. Qian, W. Xu, J. Li, A quantitative approach to reasoning about incomplete knowledge, Information Sciences 451-452 (2018) 100–111.

[14]

S. Y. Li, L. M. Tam, H. K. Chen, C. S. Chen, A novel-designed fuzzy logic control structure for control of distinct chaotic systems, International Journal of Machine Learning and Cybernetics (11) (2020) 2391–2406.

[15]

J. Pearl, Evidential reasoning using stochastic simulation of causal models, Artificial Intelligence 32 (2) (1987) 245–257.

[16]

S.-M. Chen, S.-H. Cheng, C.-H. Chiou, Fuzzy multiattribute group decision making based on intuitionistic fuzzy sets and evidential reasoning methodology, Information Fusion 27 (2016) 215–227.

[17]

Z. Yang, S. Bonsall, J. Wang, Fuzzy rule-based Bayesian reasoning approach for prioritization of failures in FMEA, IEEE Transactions on Reliability 57 (3) (2008) 517–528.

[18]

J. B. Tenenbaum, T. L. Griffiths, C. Kemp, Theory-based Bayesian models of inductive learning and reasoning, Trends in Cognitive Sciences 10 (7) (2006) 309–318.

[19]

Y. Qian, X. Liang, W. Qi, J. Liang, L. Bing, A. Skowron, Y. Yao, J. Ma, C. Dang, Local rough set: A solution to rough data analysis in big data, International Journal of Approximate Reasoning 97 (2018) 38–63.

[20]

Y. She, X. He, H. Shi, Y. Qian, A multiple-valued logic approach for multigranulation rough set model, International Journal of Approximate Reasoning 82 (2017) 270–284.

[21]

Y. Lin, J. Li, A. Tan, J. Zhang, Granular matrix-based knowledge reductions of formal fuzzy contexts, International Journal of Machine Learning and Cybernetics (11) (2020) 643–656.

[22]

M. Li, M. Chen, W. Xu, Double-quantitative multigranulation decision-theoretic rough fuzzy set model, International Journal of Machine Learning and Cybernetics 10 (5) (2019) 3225–3244.

[23]

Q. Guo, Y. Qian, X. Liang, Mining logic patterns from visual data, in: International Conference on Data Mining Workshops, Beijing, China, 2019.

[24]

W. Z. Dai, Q. Xu, Y. Yu, Z. H. Zhou, Bridging machine learning and logical reasoning by abductive learning, in: Advances in Neural Information Processing Systems, Vancouver, Canada, 2019.

[25]

G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017, pp. 4700–4708.

[26]

K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 770–778.

[27]

X. Liang, Q. Guo, Y. Qian, W. Ding, Q. Zhang, Evolutionary deep fusion method and its application in chemical structure recognition, IEEE Transactions on Evolutionary Computation (2021) 1–1, doi:10.1109/TEVC.2021.3064943.

[28]

S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6) (2017) 1137–1149.

[29]

K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN, in: IEEE International Conference on Computer Vision, Venice, Italy, 2017, pp. 2961–2969.

[30]

E. Shelhamer, J. Long, T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (4) (2017) 640–651.

[31]

L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848.

[32]

O. Vinyals, A. Toshev, S. Bengio, D. Erhan, Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (4) (2016) 652–663.

[33]

J. Johnson, A. Karpathy, L. Fei-Fei, DenseCap: Fully convolutional localization networks for dense captioning, in: IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 4565–4574.

[34]

Z. Yang, X. He, J. Gao, L. Deng, A. Smola, Stacked attention networks for image question answering, in: IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 21–29.

[35]

Q. Wu, C. Shen, P. Wang, A. Dick, A. van den Hengel, Image captioning and visual question answering based on attributes and external knowledge, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (6) (2018) 1367–1381.

[36]

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in Neural Information Processing Systems, Montréal, Canada, 2014, pp. 2672–2680.

[37]

S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, H. Lee, Generative adversarial text to image synthesis, in: International Conference on Machine Learning, New York City, USA, 2016, pp. 1060–1069.

[38]

R. Hu, J. Andreas, M. Rohrbach, T. Darrell, K. Saenko, Learning to reason: End-to-end module networks for visual question answering, in: IEEE International Conference on Computer Vision, Venice, Italy, 2017, pp. 804–813.

[39]

P. Zhang, Y. Goyal, D. Summers-Stay, D. Batra, D. Parikh, Yin and yang: Balancing and answering binary visual questions, in: IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 5014–5022.

[40]

K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks 2 (5) (1989) 359–366.

[41]

L. A. Zadeh, Outline of a new approach to the analysis of complex systems and decision processes, IEEE Transactions on Systems, Man and Cybernetics SMC-3 (1) (1973) 28–44.

[42]

S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Parikh, VQA: Visual question answering, International Journal of Computer Vision 123 (1) (2015) 4–31.

[43]

S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780.

[44]

Y. Hoshen, S. Peleg, Visual learning of arithmetic operation, in: Association for the Advancement of Artificial Intelligence, Phoenix, USA, 2016, pp. 3733–3739.

[45]

Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436.

[46]

G. E. Hinton, R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.

[47]

R. Smith, An overview of the Tesseract OCR engine, in: Ninth International Conference on Document Analysis and Recognition, Vol. 2, Curitiba, Brazil, 2007, pp. 629–633.

[48]

Y. Qian, J. Liang, Y. Yao, C. Dang, MGRS: A multi-granulation rough set, Information Sciences 180 (6) (2010) 949–970.

[49]

L. Ke, Q. Zhang, R. Battiti, Hybridization of decomposition and local search for multiobjective optimization, IEEE Transactions on Cybernetics 44 (10) (2014) 1808–1820.

[50]

J. Liang, J. Fadili, G. Peyré, A multi-step inertial forward-backward splitting method for non-convex optimization, in: Advances in Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 4035–4043.

[51]

Y. Qian, J. Liang, W. Pedrycz, C. Dang, Positive approximation: An accelerator for attribute reduction in rough set theory, Artificial Intelligence 174 (2010) 597–618.

[52]

A. Tan, W. Z. Wu, S. Shia, S. Zhao, Granulation selection and decision making with multigranulation rough set over two universes, International Journal of Machine Learning and Cybernetics 10 (9) (2019) 2501–2513.

[53]

L. Chen, P. Huang, Y. Li, Z. Meng, Edge-dependent efficient grasp rectangle search in robotic grasp detection, IEEE/ASME Transactions on Mechatronics (2020) 1–1, doi:10.1109/TMECH.2020.3048441.
