Semi-Supervised Learning Detector for MU-MIMO Systems with One-bit ADCs

Seonho Kim; Song-Nam Hong

arXiv:1902.00866·cs.IT·February 5, 2019

TL;DR

This paper introduces a semi-supervised learning detector for MU-MIMO systems with one-bit ADCs, reducing pilot data requirements while maintaining high detection performance.

Contribution

It proposes a semi-supervised learning approach that uses the EM algorithm to estimate system parameters with less labeled data than existing supervised methods.

Findings

01 The SSL detector achieves performance similar to that of the SL detector

02 Significantly reduces pilot-overhead

03 Effective in MU-MIMO systems with one-bit ADCs

Equations

$$\frac{1}{m}\sum_{i=0}^{m-1}|s_{i}|^{2}={\sf SNR}.$$

$$\tilde{x}_{k}[t]={\cal M}(w_{k}[t])\in{\cal S},$$

$$\tilde{{\bf r}}[t]={\bf\tilde{H}}\tilde{{\bf x}}[t]+\tilde{{\bf z}}[t],$$

$${\bf r}[t]=\mbox{sign}({\bf H}{\bf x}({\bf w}[t])+{\bf z}[t]),$$

$${\bf H}=\left[\begin{array}{cc}{\rm Re}({\bf\tilde{H}})&-{\rm Im}({\bf\tilde{H}})\\ {\rm Im}({\bf\tilde{H}})&{\rm Re}({\bf\tilde{H}})\end{array}\right]\in\mathbb{R}^{N\times 2K},$$

$${\bf c}_{j}=\left[\mbox{sign}({\bf h}_{1}^{{\sf T}}{\bf x}(g(j))),\ldots,\mbox{sign}({\bf h}_{N}^{{\sf T}}{\bf x}(g(j)))\right]^{{\sf T}}$$

$${\bf q}=f({\bf w},{\bf H})={\bf c}_{j},$$

$$\mathbb{P}(r_{n}[t]\,|\,q_{n}=c_{j,n})=\begin{cases}\epsilon_{j,n}&\mbox{if }r_{n}[t]\neq c_{j,n}\\ 1-\epsilon_{j,n}&\mbox{if }r_{n}[t]=c_{j,n}\end{cases}$$

$$\epsilon_{j,n}\stackrel{{\scriptstyle\Delta}}{{=}}Q\left(|{\bf h}_{n}^{{\sf T}}{\bf x}(g(j))|\right),$$

$$p({\bf r}[t]|j,\boldsymbol{\theta}_{j})=\prod_{n=1:\,r_{n}[t]\neq c_{j,n}}^{N}\epsilon_{j,n}\prod_{n=1:\,r_{n}[t]=c_{j,n}}^{N}(1-\epsilon_{j,n})$$

$${\cal L}=\{({\bf r}[1],0),\ldots,({\bf r}[T],0),\ldots,({\bf r}[T_{t}],m^{K}-1)\},$$

$$j_{t}\stackrel{{\scriptstyle\Delta}}{{=}}\lfloor(t-1)/T\rfloor\in[0:m^{K}-1],$$

$$\hat{c}_{j,n}$$

$$\hat{\epsilon}_{j,n}$$

$$\hat{j}=\operatornamewithlimits{argmax}_{j\in[0:m^{K}-1]}p({\bf r}[t]|j,\boldsymbol{\theta}_{j}).$$

$${\cal U}=\{{\bf r}[T_{t}+1],{\bf r}[T_{t}+2],\ldots,{\bf r}[T_{t}+T_{u}]\}.$$

$$\hat{\boldsymbol{\theta}}=\operatornamewithlimits{argmax}_{\boldsymbol{\theta}}\log\mathbb{P}({\cal D}|\boldsymbol{\theta}).$$

$$\begin{aligned}\log\mathbb{P}({\cal D}|\boldsymbol{\theta})&=\log\prod_{t=1}^{T_{t}}\mathbb{P}({\bf r}[t],g^{-1}({\bf w}[t])=j_{t}|\boldsymbol{\theta}_{j_{t}})\prod_{t=T_{t}+1}^{T_{t}+T_{u}}\mathbb{P}({\bf r}[t]|\boldsymbol{\theta})\\ &=\sum_{t=1}^{T_{t}}\log\mathbb{P}(j_{t}|\boldsymbol{\theta}_{j_{t}})p({\bf r}[t]|j_{t},\boldsymbol{\theta}_{j_{t}})\\ &\quad+\sum_{t=T_{t}+1}^{T_{t}+T_{u}}\log\sum_{j=0}^{m^{K}-1}p({\bf r}[t],j|\boldsymbol{\theta}_{j}),\end{aligned}$$

$$\gamma_{j}[t]$$

$$\gamma_{j}[t]={\bf 1}_{\{j=j_{t}\}}.$$

$$\gamma_{j}[t]=\frac{p({\bf r}[t]|j,\boldsymbol{\theta}_{j}^{i})}{\sum_{j'=0}^{m^{K}-1}p({\bf r}[t]|j',\boldsymbol{\theta}_{j'}^{i})}.$$

$$\boldsymbol{\theta}^{i+1}=\operatornamewithlimits{argmax}_{\boldsymbol{\theta}}\psi(\boldsymbol{\theta}|\boldsymbol{\theta}^{i}),$$

$$\begin{aligned}\psi(\boldsymbol{\theta}|\boldsymbol{\theta}^{i})&\stackrel{{\scriptstyle\Delta}}{{=}}\sum_{t=1}^{T_{t}+T_{u}}\sum_{j=0}^{m^{K}-1}\gamma_{j}[t]\log\mathbb{P}({\bf r}[t],g^{-1}({\bf w}[t])=j|\boldsymbol{\theta}_{j})\\ &=\sum_{t=1}^{T_{t}+T_{u}}\sum_{j=0}^{m^{K}-1}\gamma_{j}[t]\left(\log p({\bf r}[t]|j,\boldsymbol{\theta}_{j})-K\log m\right),\end{aligned}$$

$$\begin{aligned}\psi(\boldsymbol{\theta}|\boldsymbol{\theta}^{i})&=-K\log m\sum_{t=1}^{T_{t}+T_{u}}\sum_{j=0}^{m^{K}-1}\gamma_{j}[t]\\ &\quad+\sum_{j=0}^{m^{K}-1}\sum_{t=1}^{T_{t}+T_{u}}\sum_{n=1}^{N}\Big(\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]\neq c_{j,n}\}}\log\epsilon_{j,n}+\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]=c_{j,n}\}}\log(1-\epsilon_{j,n})\Big).\end{aligned}$$

$$(\hat{\boldsymbol{\epsilon}}^{i+1},\hat{{\bf c}}^{i+1})=\operatornamewithlimits{argmax}_{(\boldsymbol{\epsilon},{\bf c})}\sum_{j=0}^{m^{K}-1}\sum_{n=1}^{N}\sum_{t=1}^{T_{t}+T_{u}}\Big(\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]\neq c_{j,n}\}}\log\epsilon_{j,n}+\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]=c_{j,n}\}}\log(1-\epsilon_{j,n})\Big).$$

$$(\hat{\epsilon}_{j,n}^{i+1},\hat{c}_{j,n}^{i+1})=\operatornamewithlimits{argmax}_{(\epsilon_{j,n},c_{j,n})}\sum_{t=1}^{T_{t}+T_{u}}\Big(\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]\neq c_{j,n}\}}\log\epsilon_{j,n}+\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]=c_{j,n}\}}\log(1-\epsilon_{j,n})\Big).$$

Taxonomy

Topics: Advanced MIMO Systems Optimization · Analog and Mixed-Signal Circuit Design · Distributed Sensor Networks and Detection Algorithms

Full text

Semi-Supervised Learning Detector for MU-MIMO Systems with One-bit ADCs

Seonho Kim and Song-Nam Hong

Ajou University, Suwon, Korea,

email: {kimsh1005 and snhong}@ajou.ac.kr

Abstract

We study an uplink multiuser multiple-input multiple-output (MU-MIMO) system with one-bit analog-to-digital converters (ADCs). For such a system, a supervised-learning (SL) detector has recently been proposed by modeling the non-linear end-to-end system function as a parameterized Bernoulli-like model. Despite its attractive performance, the SL detector requires a large amount of labeled data (i.e., pilot signals) to estimate the parameters of the underlying model accurately. This is because the number of parameters grows exponentially with the number of users. To overcome this drawback, we propose a semi-supervised learning (SSL) detector in which both pilot signals (i.e., labeled data) and some part of the data signals (i.e., unlabeled data) are used to estimate the parameters via the expectation-maximization (EM) algorithm. Via simulation results, we demonstrate that the proposed SSL detector can achieve the performance of the existing SL detector with significantly lower pilot-overhead.

Index Terms:

Massive MIMO, one-bit ADC, MIMO detection, Machine Learning, Semi-Supervised Learning, EM Algorithm

I Introduction

Massive multiple-input multiple-output (MIMO) is a promising technology for beyond-5G cellular systems, in which a large number of antennas at the base station (BS) is used to improve capacity and energy efficiency [1]. However, it can significantly increase the hardware cost and the radio-frequency (RF) circuit power consumption [2]. In particular, high-resolution analog-to-digital converters (ADCs) are the major problem, as the power consumption of an ADC increases exponentially with the number of quantization bits and linearly with the baseband bandwidth [3]. To overcome this challenge, the use of low-resolution ADCs (e.g., 1 $\sim$ 3 bits) for massive MIMO systems has received increasing attention over the past years. The one-bit ADC is particularly attractive as there is no need for an automatic gain controller, which reduces the hardware complexity significantly [4]. In this case, simple zero-threshold comparators quantize the in-phase and quadrature components of the continuous-valued received signals separately. Although low-resolution ADCs provide these advantages, they give rise to numerous technical challenges in channel estimation and MIMO detection.

For uplink MU-MIMO systems with one-bit ADCs, numerous channel estimation methods have been developed, such as a least-squares (LS) based method [5], a maximum likelihood (ML) method [6], a zero-forcing (ZF) type method [6], and a Bussgang decomposition based method [7]. Regarding MIMO detection, the optimal ML detector was developed in [6], and low-complexity methods were presented in [8, 9]. Inspired by coding theory, the MIMO detection problem has been reformulated as an equivalent coding problem [10]. Using the resulting model, weighted minimum distance (wMD) decoding (i.e., an alternative expression of the ML detector) was presented. Very recently, supervised-learning (SL) detectors were proposed in [11, 12, 13] for the considered communication system with one-bit quantized signals. In particular, in our prior work [12], we proposed a generative model, called the Bernoulli-like model, that accounts for the characteristics of one-bit quantized signals. Despite its attractive performance, the SL detector in [12] requires a large pilot overhead to estimate the model parameters accurately. Thus, the pilot overhead must be reduced before the SL detector can be used in practical systems.

In this paper, we study an uplink MU-MIMO system with one-bit ADCs where $K$ single-antenna users communicate with one BS equipped with $N_{\rm r}$ receive antennas. As in practical communication systems, the BS is assumed to have no channel state information (CSI) and needs to estimate it using pilot signals during the training phase (see Fig. 1). A block-fading channel is assumed in which the channel is static during the coherence time $T_{c}$ and changes independently from block to block. We assign the first $T_{t}<T_{c}$ time slots to the channel training phase, and the remaining $T_{d}=T_{c}-T_{t}$ time slots are dedicated to the data transmission phase, as shown in Fig. 1. Inspired by semi-supervised learning [14], we propose a semi-supervised learning (SSL) detector for such a system, which can significantly reduce the pilot-overhead of the existing SL detector in [12]. The main idea of the proposed SSL detector is to use both pilot signals (i.e., labeled data) and some part of the data signals (i.e., unlabeled data) to estimate the parameters of the underlying Bernoulli-like model via an efficient expectation-maximization (EM) algorithm. Via simulation results, we demonstrate that the proposed SSL detector can achieve the same performance as the SL detector with a significantly reduced pilot-overhead (e.g., $50\%$ overhead reduction).

This paper is organized as follows. In Section II, we describe an uplink MU-MIMO system with one-bit ADCs and the equivalent parallel binary discrete memoryless channels from a coding-theoretic viewpoint. In Section III, we briefly review the SL detector for the considered system. In Section IV, we propose a novel SSL detector with parameter update rules built on the EM algorithm. Section V provides simulation results to verify the superiority of the proposed SSL detector. Finally, conclusions are provided in Section VI.

Notation: Lowercase and uppercase boldface letters represent column vectors and matrices, respectively. Let $[a:b]\stackrel{{\scriptstyle\Delta}}{{=}}\{a,a+1,\ldots,b\}$ for any integers $a$ and $b>a$ , and when $a=1$ , it is further shortened to $[b]$ . For any $k\in[0:m^{K}-1]$ , we let $g(k)=[b_{0},b_{1},\ldots,b_{K-1}]^{{\sf T}}$ represent the $m$ -ary expansion of $k$ where $k=b_{0}m^{0}+\cdots+b_{K-1}m^{K-1}$ for $b_{i}\in[0:m-1]$ . We also let $g^{-1}(\cdot)$ denote its inverse function. For a vector, $g(\cdot)$ is applied element-wise. Likewise, if a scalar function is applied to a vector, it is applied element-wise. ${\rm Re}({\bf a})$ and ${\rm Im}({\bf a})$ represent the real and imaginary parts of a complex vector ${\bf a}$ , respectively.
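The index map $g(\cdot)$ and its inverse can be illustrated with a minimal sketch. The function names `g` and `g_inv` and the explicit arguments `m` and `K` are choices made here for illustration; the digit order (least significant digit first) follows the expansion $k=b_{0}m^{0}+\cdots+b_{K-1}m^{K-1}$.

```python
def g(k: int, m: int, K: int) -> list[int]:
    """Return the m-ary expansion [b_0, ..., b_{K-1}] of k, least significant digit first."""
    if not 0 <= k < m ** K:
        raise ValueError("k must lie in [0 : m^K - 1]")
    digits = []
    for _ in range(K):
        k, b = divmod(k, m)  # peel off the lowest-order digit
        digits.append(b)
    return digits


def g_inv(digits: list[int], m: int) -> int:
    """Inverse map: k = b_0 m^0 + ... + b_{K-1} m^{K-1}."""
    return sum(b * m ** i for i, b in enumerate(digits))
```

For instance, with $m=2$ and $K=3$ the index $6$ maps to the message vector $[0,1,1]$, since $6=0\cdot 1+1\cdot 2+1\cdot 4$.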

II Preliminaries

In this section, we describe the system model and define the equivalent $N$ parallel binary discrete memoryless channels (B-DMCs).

II-A System model

We consider a single-cell uplink MU-MIMO system in which $K$ single-antenna users communicate with one BS equipped with an array of $N_{\rm r}>K$ antennas. We denote by $w_{k}\in\mathcal{W}=[0:m-1]$ the message of user $k$ for $k\in[K]$ , each of which contains $\log_{2}{m}$ information bits. Also, let ${\cal S}=\{s_{0},...,s_{m-1}\}$ denote the $m$ -ary constellation set with the power constraint

$$\frac{1}{m}\sum_{i=0}^{m-1}|s_{i}|^{2}={\sf SNR}. \tag{1}$$

At time slot $t$ , the user $k$ transmits the symbol ${\tilde{x}}_{k}[t]$ as

$$\tilde{x}_{k}[t]={\cal M}(w_{k}[t])\in{\cal S}, \tag{2}$$

where ${\cal M}:\mathcal{W}\rightarrow{\cal S}$ denotes a modulation function. When all the $K$ users transmit the symbols ${\tilde{\bf x}[t]}=[\tilde{x}_{1}[t],\ldots,\tilde{x}_{K}[t]]^{{\sf T}}$ , the BS receives the discrete-time complex-valued baseband signal vector ${\bf\tilde{r}[t]}\in\mathbb{C}^{N_{\rm r}}$ , given by

$$\tilde{{\bf r}}[t]={\bf\tilde{H}}\tilde{{\bf x}}[t]+\tilde{{\bf z}}[t], \tag{3}$$

where ${\bf\tilde{H}}\in\mathbb{C}^{N_{\rm r}\times K}$ is the channel matrix between the BS and the $K$ users; that is, the $i$ -th row of ${\bf\tilde{H}}$ is the channel vector between the $i$ -th receive antenna at the BS and the $K$ users. Also, ${\bf\tilde{z}}[t]=[{\tilde{z}}_{1}[t],\ldots,{\tilde{z}}_{N_{\rm r}}[t]]^{{\sf T}}\in\mathbb{C}^{N_{\rm r}}$ denotes the noise vector whose elements are distributed as circularly symmetric complex Gaussian random variables with zero mean and unit variance, i.e., ${{\tilde{z}}_{i}}[t]\sim{\cal C}{\cal N}(0,1)$ .

In the MIMO system with one-bit ADCs, each receive antenna of the BS is equipped with an RF chain followed by two one-bit ADCs, one applied to the real part and one to the imaginary part. We define $\mbox{sign}(\cdot):\mathbb{R}\rightarrow\{-1,1\}$ as the one-bit ADC quantizer function with $\mbox{sign}(u)=1$ if $u\geq 0$ , and $\mbox{sign}(u)=-1$ otherwise. Then, the BS observes the quantized output vectors $\hat{{\bf r}}_{\rm R}[t]=\mbox{sign}({\rm Re}({\bf\tilde{r}}[t]))$ and $\hat{{\bf r}}_{\rm I}[t]=\mbox{sign}({\rm Im}({\bf\tilde{r}}[t]))$ . For ease of representation, we rewrite the complex input-output relationship in (3) in the equivalent real representation

$${\bf r}[t]=\mbox{sign}({\bf H}{\bf x}({\bf w}[t])+{\bf z}[t]), \tag{4}$$

where ${\bf r}[t]=[\hat{{\bf r}}_{\rm R}^{{\sf T}}[t],\hat{{\bf r}}_{\rm I}^{{\sf T}}[t]]^{{\sf T}}$ , ${\bf x}({\bf w}[t])=[{\rm Re}(\tilde{{\bf x}}[t])^{{\sf T}},{\rm Im}(\tilde{{\bf x}}[t])^{{\sf T}}]^{{\sf T}}$ , ${\bf z}[t]=[{\rm Re}(\tilde{{\bf z}}[t])^{{\sf T}},{\rm Im}(\tilde{{\bf z}}[t])^{{\sf T}}]^{{\sf T}}\in\mathbb{R}^{N}$ , and

$${\bf H}=\left[\begin{array}{cc}{\rm Re}({\bf\tilde{H}})&-{\rm Im}({\bf\tilde{H}})\\ {\rm Im}({\bf\tilde{H}})&{\rm Re}({\bf\tilde{H}})\end{array}\right]\in\mathbb{R}^{N\times 2K},$$

where $N=2N_{\rm r}$ . This real system representation will be used in the sequel.
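The equivalence between the complex model and its stacked real form can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names (`to_real_channel`, `to_real_vector`, `one_bit`) and the toy dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np


def to_real_channel(H_c: np.ndarray) -> np.ndarray:
    """Stack a complex N_r x K channel into the real 2N_r x 2K matrix H."""
    return np.block([[H_c.real, -H_c.imag],
                     [H_c.imag, H_c.real]])


def to_real_vector(v_c: np.ndarray) -> np.ndarray:
    """Stack real and imaginary parts: [Re(v); Im(v)]."""
    return np.concatenate([v_c.real, v_c.imag])


def one_bit(v: np.ndarray) -> np.ndarray:
    """Zero-threshold comparator: +1 if v >= 0, else -1."""
    return np.where(v >= 0, 1, -1)


rng = np.random.default_rng(0)
N_r, K = 4, 2  # toy sizes, not taken from the paper
H_c = (rng.standard_normal((N_r, K)) + 1j * rng.standard_normal((N_r, K))) / np.sqrt(2)
x_c = np.array([1 + 1j, -1 + 1j]) / np.sqrt(2)
z_c = (rng.standard_normal(N_r) + 1j * rng.standard_normal(N_r)) / np.sqrt(2)

# Path 1: quantize real and imaginary parts of the complex received signal.
r_c = H_c @ x_c + z_c
r_complex_path = np.concatenate([one_bit(r_c.real), one_bit(r_c.imag)])

# Path 2: quantize the stacked real-valued model directly.
r_real_path = one_bit(to_real_channel(H_c) @ to_real_vector(x_c) + to_real_vector(z_c))
```

Both paths yield the same length-$N$ vector of $\pm 1$ entries, which is why the remainder of the derivation can work entirely with real quantities.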

II-B Equivalent N parallel B-DMCs

In [10], it was shown that the real system representation (4) can be transformed into an equivalent $N$ parallel B-DMCs from a coding-theoretic viewpoint. In the resulting $N$ parallel B-DMCs, the channel input/output and the channel transition probabilities are defined as follows.

Auto-encoding function: Given ${\bf H}$ , we can create a spatial-domain code ${\cal C}=[{\bf c}_{0},\ldots,{\bf c}_{m^{K}-1}]$ , whose codewords are given by

$${\bf c}_{j}=\left[\mbox{sign}({\bf h}_{1}^{{\sf T}}{\bf x}(g(j))),\ldots,\mbox{sign}({\bf h}_{N}^{{\sf T}}{\bf x}(g(j)))\right]^{{\sf T}}, \tag{5}$$

where ${\bf h}_{n}^{{\sf T}}$ denotes the $n$ -th row of ${\bf H}$ . Note that each codeword of ${\cal C}$ can be regarded as a noiseless channel output in (4). In Fig. 2, the channel input ${\bf q}$ of the equivalent channel is determined by the auto-encoding function $f(\cdot)$ as

$${\bf q}=f({\bf w},{\bf H})={\bf c}_{j}, \tag{6}$$

for $j=g^{-1}({\bf w})\in[0:m^{K}-1]$ .
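As an illustration of the auto-encoding function, the sketch below enumerates all $m^{K}$ message vectors and records the noiseless quantized output for each. It assumes NumPy; `build_codebook`, the random real channel, and the two-point constellation are hypothetical stand-ins, not values from the paper.

```python
import numpy as np


def build_codebook(H: np.ndarray, constellation: np.ndarray) -> np.ndarray:
    """Return C of shape (m^K, N); row j is c_j = sign(H x(g(j)))."""
    N, two_K = H.shape
    K, m = two_K // 2, len(constellation)
    C = np.empty((m ** K, N), dtype=int)
    for j in range(m ** K):
        # m-ary expansion of j, least significant digit first: w = g(j)
        w = [(j // m ** i) % m for i in range(K)]
        x_c = constellation[w]                      # one complex symbol per user
        x = np.concatenate([x_c.real, x_c.imag])    # real representation x(w)
        C[j] = np.where(H @ x >= 0, 1, -1)          # noiseless one-bit output
    return C


rng = np.random.default_rng(1)
H = rng.standard_normal((8, 4))        # hypothetical real channel with N = 8, K = 2
bpsk = np.array([1 + 0j, -1 + 0j])     # hypothetical constellation with m = 2
C = build_codebook(H, bpsk)
```

With an antipodal constellation, flipping every user's symbol negates the noiseless output, so the codewords for $j=0$ and $j=m^{K}-1$ are negatives of each other in this toy setup.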

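Effective channel: As shown in Fig. 2, the effective channel consists of $N$ parallel binary symmetric channels (BSCs) with channel input ${\bf q}$ and channel output ${\bf r}$ . This channel is specified by the following channel transition probabilities: for the $n$ -th BSC, the transition probability, which depends on the users’ messages ${\bf w}=g(j)$ and the corresponding codeword ${\bf c}_{j}$ , is defined as

$$\mathbb{P}(r_{n}[t]\,|\,q_{n}=c_{j,n})=\begin{cases}\epsilon_{j,n}&\mbox{if }r_{n}[t]\neq c_{j,n}\\ 1-\epsilon_{j,n}&\mbox{if }r_{n}[t]=c_{j,n},\end{cases} \tag{7}$$

where the error probability of the $n$ -th BSC is computed as

$$\epsilon_{j,n}\stackrel{{\scriptstyle\Delta}}{{=}}Q\left(|{\bf h}_{n}^{{\sf T}}{\bf x}(g(j))|\right), \tag{8}$$

where $Q(x)=\frac{1}{\sqrt{2\pi}}\int_{x}^{\infty}\exp\left(-u^{2}/2\right)du$ .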
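The crossover probability can be evaluated with the standard identity $Q(x)=\frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$. The sketch below uses only the Python standard library; the function names are hypothetical, and the noise scaling follows the expression for $\epsilon_{j,n}$ exactly as stated above.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def crossover_probability(h_n, x) -> float:
    """epsilon_{j,n} = Q(|h_n^T x|) for one receive branch."""
    inner = sum(h * xi for h, xi in zip(h_n, x))
    return q_function(abs(inner))
```

Because the argument of $Q$ is non-negative, every $\epsilon_{j,n}$ lies in $(0, 0.5]$, and a branch whose noiseless input is close to zero behaves almost like a fair coin.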

The purpose of this paper is to design a decoding function in Fig. 2 which decodes $\hat{{\bf w}}[t]$ from an observation ${\bf r}[t]$ , by leveraging the equivalent effective channel (i.e., the channel transition probabilities in (7)). We remark that the parameters of the transition probabilities are not known a priori and should be estimated with pilot signals during the training phase.

III The Overview of SL Detector

In this section, we briefly review the supervised-learning (SL) detector proposed in [12] under the assumption that the channel matrix ${\bf H}$ is not known. In the SL detector, we thus need to estimate the parameters ${\cal C}$ and $\epsilon_{j,n}$ using pilot signals, as in parametric supervised learning. From (7), we can define the generative model of ${\bf r}[t]$ , named the Bernoulli-like model, which is fully described by the parameter vector $\hbox{\boldmath$ \theta $}=[\hbox{\boldmath$ \theta $}_{0},\ldots,\hbox{\boldmath$ \theta $}_{m^{K}-1}]$ where $\hbox{\boldmath$ \theta $}_{j}=[{\bf c}_{j},\hbox{\boldmath$ \epsilon $}_{j}]$ , as

$$p({\bf r}[t]|j,\boldsymbol{\theta}_{j})=\prod_{n=1:\,r_{n}[t]\neq c_{j,n}}^{N}\epsilon_{j,n}\prod_{n=1:\,r_{n}[t]=c_{j,n}}^{N}(1-\epsilon_{j,n}) \tag{9}$$

for $j\in[0:m^{K}-1]$ . We remark that each class $j$ has its own probability distribution parameterized by $\hbox{\boldmath$ \theta $}_{j}=[{\bf c}_{j},\hbox{\boldmath$ \epsilon $}_{j}]$ .

The SL detector in [12] operates in the following two phases during each coherence time $T_{c}$ .

Parameter Estimation: In this phase, the parameter vector $\hbox{\boldmath$ \theta $}$ is estimated using $T_{t}$ pilot signals. We first obtain the labeled data ${\cal L}$ as

$${\cal L}=\{({\bf r}[1],0),\ldots,({\bf r}[T],0),\ldots,({\bf r}[T_{t}],m^{K}-1)\}, \tag{10}$$

where $({\bf r}[t],j_{t})$ represents the pilot signal corresponding to the label $j_{t}$ . Since $T$ pilot signals are transmitted for each codeword, the overall pilot-overhead is equal to $T_{t}=T\cdot{m^{K}}$ . Also, for $t\in[T_{t}]$ , the labels are determined as

$$j_{t}\stackrel{{\scriptstyle\Delta}}{{=}}\lfloor(t-1)/T\rfloor\in[0:m^{K}-1], \tag{11}$$

where $\lfloor\cdot\rfloor$ denotes the floor function. In [12], from the labeled data ${\cal L}$ , the parameter vector $\hbox{\boldmath$ \theta $}$ is determined via the optimal maximum-likelihood (ML) estimation as

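$$\hat{c}_{j,n}=\mbox{sign}\left(\sum_{t=jT+1}^{(j+1)T}r_{n}[t]\right), \tag{12}$$

$$\hat{\epsilon}_{j,n}=\frac{1}{T}\sum_{t=jT+1}^{(j+1)T}{\bf 1}_{\{r_{n}[t]\neq\hat{c}_{j,n}\}} \tag{13}$$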

for $n\in[N]$ and $j\in[0:m^{K}-1]$ .
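The parameter-estimation phase can be sketched in a few lines, under the assumption that the ML estimates take the form of a per-class majority vote for $\hat{c}_{j,n}$ and an empirical mismatch frequency for $\hat{\epsilon}_{j,n}$ (the text calls the latter an empirical transition probability). The function name `sl_estimate`, the array layout, and the tie-breaking rule are choices made here, not taken from [12].

```python
import numpy as np


def sl_estimate(R_pilot: np.ndarray, T: int, num_classes: int):
    """Estimate (C_hat, eps_hat) from pilots.

    R_pilot: (T * num_classes, N) observations in {-1, +1}, ordered so that
    time slot t (1-based) carries label j_t = floor((t - 1) / T).
    """
    N = R_pilot.shape[1]
    blocks = R_pilot.reshape(num_classes, T, N)        # block j holds the T pilots of class j
    C_hat = np.where(blocks.sum(axis=1) >= 0, 1, -1)   # majority vote; ties broken toward +1
    eps_hat = (blocks != C_hat[:, None, :]).mean(axis=1)  # fraction of pilots disagreeing
    return C_hat, eps_hat


# Toy pilots: T = 3 repetitions for each of 2 classes, N = 2 branches.
R_pilot = np.array([[1, -1], [1, -1], [-1, -1],
                    [-1, 1], [-1, 1], [-1, 1]])
C_hat, eps_hat = sl_estimate(R_pilot, T=3, num_classes=2)
```

With few pilots per class, many entries of `eps_hat` come out as exactly zero, which makes the subsequent likelihood degenerate; this is one way to see why the SL detector needs a large $T$.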

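Data Detection: From the Bernoulli-like model parameterized by (12) and (13), ML detection is performed as

$$\hat{j}=\operatornamewithlimits{argmax}_{j\in[0:m^{K}-1]}p({\bf r}[t]|j,\boldsymbol{\theta}_{j}). \tag{14}$$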
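The detection rule amounts to evaluating the Bernoulli-like likelihood for every class and picking the largest. A minimal sketch follows, assuming NumPy and crossover probabilities strictly inside $(0,1)$; `log_likelihood` and `ml_detect` are hypothetical names, and working in the log domain is a numerical convenience rather than part of the original description.

```python
import numpy as np


def log_likelihood(r: np.ndarray, C: np.ndarray, eps: np.ndarray) -> np.ndarray:
    """log p(r | j, theta_j) for every class j.

    r: (N,) in {-1, +1}; C: (J, N) codewords; eps: (J, N) crossover probabilities in (0, 1).
    """
    mismatch = C != r                                   # (J, N), True where r_n != c_{j,n}
    return np.where(mismatch, np.log(eps), np.log1p(-eps)).sum(axis=1)


def ml_detect(r: np.ndarray, C: np.ndarray, eps: np.ndarray) -> int:
    """Return the class index j maximizing p(r | j, theta_j)."""
    return int(np.argmax(log_likelihood(r, C, eps)))


C = np.array([[1, 1, 1], [-1, -1, 1]])
eps = np.full(C.shape, 0.1)
j_hat_a = ml_detect(np.array([1, 1, 1]), C, eps)
j_hat_b = ml_detect(np.array([-1, -1, 1]), C, eps)
```

The detected index is mapped back to the users' messages through $g(\hat{j})$.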

IV The Proposed SSL Detector

Despite its superior performance, the SL detector proposed in [12] suffers from a heavy pilot-overhead, because a large number of pilot signals is required for the empirical transition probability in (13) to be close to the true transition probability in (8). Moreover, this overhead becomes larger as the number of users $K$ increases, because the number of parameters to be estimated increases exponentially with $K$ (see (12) and (13)). To address this problem, we propose a semi-supervised learning (SSL) detector in which the parameter vector $\hbox{\boldmath$ \theta $}$ is estimated by leveraging both data signals (i.e., unlabeled data ${\cal U}$ ) and pilot signals (i.e., labeled data ${\cal L}$ ). Here, the unlabeled data ${\cal U}$ is collected during $T_{u}$ time slots (see Fig. 1) as

$${\cal U}=\{{\bf r}[T_{t}+1],{\bf r}[T_{t}+2],\ldots,{\bf r}[T_{t}+T_{u}]\}. \tag{15}$$

Also, we let ${\cal D}={\cal L}\cup{\cal U}$ denote the observed data to be used for parameter-estimation in the proposed SSL detector.

Parameter Estimation: In this phase, the parameter vector $\hbox{\boldmath$ \theta $}=[\hbox{\boldmath$ \theta $}_{0},\ldots,\hbox{\boldmath$ \theta $}_{m^{K}-1}]$ is updated from the given data ${\cal D}$ so that the likelihood of the observations (i.e., the received binary signals) is maximized. This ML estimation is formulated as

$$\hat{\boldsymbol{\theta}}=\operatornamewithlimits{argmax}_{\boldsymbol{\theta}}\log\mathbb{P}({\cal D}|\boldsymbol{\theta}). \tag{16}$$

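Note that from the Bernoulli-like model, we know the probability distribution $p({\bf r}[t]|j,\hbox{\boldmath$ \theta $}_{j})$ defined in (9) for the given parameter $\hbox{\boldmath$ \theta $}_{j}$ , which will be used below. Also, the labels of the labeled data are given as $\{j_{t}=\lfloor(t-1)/T\rfloor:t\in[T_{t}]\}$ in (11).

For any fixed parameter $\hbox{\boldmath$ \theta $}$ , the objective function in (16) is represented as

$$\begin{aligned}\log\mathbb{P}({\cal D}|\boldsymbol{\theta})&=\log\prod_{t=1}^{T_{t}}\mathbb{P}({\bf r}[t],g^{-1}({\bf w}[t])=j_{t}|\boldsymbol{\theta}_{j_{t}})\prod_{t=T_{t}+1}^{T_{t}+T_{u}}\mathbb{P}({\bf r}[t]|\boldsymbol{\theta})\\ &=\sum_{t=1}^{T_{t}}\log\mathbb{P}(j_{t}|\boldsymbol{\theta}_{j_{t}})p({\bf r}[t]|j_{t},\boldsymbol{\theta}_{j_{t}})\\ &\quad+\sum_{t=T_{t}+1}^{T_{t}+T_{u}}\log\sum_{j=0}^{m^{K}-1}p({\bf r}[t],j|\boldsymbol{\theta}_{j}),\end{aligned} \tag{17}$$

where recall that $p({\bf r}[t]|j,\hbox{\boldmath$ \theta $}_{j})$ is defined in (9), and $\mathbb{P}(j_{t}|\hbox{\boldmath$ \theta $}_{j_{t}})=1/m^{K}$ since the users’ messages are assumed to be generated uniformly at random. Clearly, the above objective function is non-convex, in particular due to the second term caused by the unlabeled data, and thus the optimization problem in (16) is difficult to solve directly. We therefore solve it using the expectation-maximization (EM) algorithm [15].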

The EM algorithm consists of the following two steps, named the expectation step (E-step) and the maximization step (M-step): given the current parameter vector $\hbox{\boldmath$ \theta $}^{i}$ , it finds the updated parameter vector $\hbox{\boldmath$ \theta $}^{i+1}$ .

E-step: In this step, we compute the following probability distribution using the latest parameter vector $\hbox{\boldmath$ \theta $}^{i}$ :

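$$\gamma_{j}[t]\stackrel{{\scriptstyle\Delta}}{{=}}\mathbb{P}(g^{-1}({\bf w}[t])=j\,|\,{\bf r}[t],\boldsymbol{\theta}^{i}). \tag{18}$$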

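This is specified separately for the labeled and unlabeled data as follows:

• (Labeled Data) For $t\in[T_{t}]$ and $j\in[0:m^{K}-1]$ ,

$$\gamma_{j}[t]={\bf 1}_{\{j=j_{t}\}}. \tag{19}$$

• (Unlabeled Data) For $t\in[T_{t}+1:T_{t}+T_{u}]$ and $j\in[0:m^{K}-1]$ ,

$$\gamma_{j}[t]=\frac{p({\bf r}[t]|j,\boldsymbol{\theta}_{j}^{i})}{\sum_{j'=0}^{m^{K}-1}p({\bf r}[t]|j',\boldsymbol{\theta}_{j'}^{i})}. \tag{20}$$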
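The two E-step rules can be sketched as a single routine that first normalizes the class likelihoods for every time slot and then overwrites the pilot rows with one-hot labels. This assumes NumPy and crossover probabilities strictly inside $(0,1)$; the name `e_step`, the array layout (pilots first, data after), and the max-subtraction trick for numerical stability are choices made here. The uniform prior $1/m^{K}$ cancels in the ratio, so it does not appear.

```python
import numpy as np


def e_step(R: np.ndarray, labels: np.ndarray, C: np.ndarray, eps: np.ndarray) -> np.ndarray:
    """Return responsibilities gamma[t, j].

    R: (T_total, N) observations in {-1, +1}; the first len(labels) rows are pilots.
    labels: (T_t,) integer pilot labels j_t; C, eps: (J, N) current parameters.
    """
    mismatch = R[:, None, :] != C[None, :, :]                      # (T_total, J, N)
    log_p = np.where(mismatch, np.log(eps), np.log1p(-eps)).sum(axis=2)
    log_p -= log_p.max(axis=1, keepdims=True)                      # avoid underflow
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                      # unlabeled-data rule
    T_t = len(labels)
    gamma[:T_t] = 0.0
    gamma[np.arange(T_t), labels] = 1.0                            # labeled-data rule
    return gamma


C = np.array([[1, 1], [-1, -1]])
eps = np.full(C.shape, 0.1)
R = np.array([[1, 1], [-1, -1], [1, 1], [1, -1]])   # two pilots, then two data slots
gamma = e_step(R, labels=np.array([0, 1]), C=C, eps=eps)
```

In this toy case the third observation matches the first codeword in both branches and receives weight $0.81/0.82$ for class 0, while the fourth observation is equidistant from both codewords and is split evenly.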

M-step: In this step, we find an updated parameter vector $\hbox{\boldmath$ \theta $}^{i+1}$ using the $\gamma_{j}[t]$ in the above as follows:

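$$\boldsymbol{\theta}^{i+1}=\operatornamewithlimits{argmax}_{\boldsymbol{\theta}}\psi(\boldsymbol{\theta}|\boldsymbol{\theta}^{i}), \tag{21}$$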

where the objective function is defined as

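$$\begin{aligned}\psi(\boldsymbol{\theta}|\boldsymbol{\theta}^{i})&\stackrel{{\scriptstyle\Delta}}{{=}}\sum_{t=1}^{T_{t}+T_{u}}\sum_{j=0}^{m^{K}-1}\gamma_{j}[t]\log\mathbb{P}({\bf r}[t],g^{-1}({\bf w}[t])=j|\boldsymbol{\theta}_{j})\\ &=\sum_{t=1}^{T_{t}+T_{u}}\sum_{j=0}^{m^{K}-1}\gamma_{j}[t]\left(\log p({\bf r}[t]|j,\boldsymbol{\theta}_{j})-K\log m\right),\end{aligned} \tag{22}$$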

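where the second equality follows from Bayes’ rule and (9). Note that $\gamma_{j}[t]$ in the above is constant with respect to $\hbox{\boldmath$ \theta $}_{j}$ . Also, from the Bernoulli-like model in (9), the objective function in (22) can be specified as

$$\begin{aligned}\psi(\boldsymbol{\theta}|\boldsymbol{\theta}^{i})&=-K\log m\sum_{t=1}^{T_{t}+T_{u}}\sum_{j=0}^{m^{K}-1}\gamma_{j}[t]\\ &\quad+\sum_{j=0}^{m^{K}-1}\sum_{t=1}^{T_{t}+T_{u}}\sum_{n=1}^{N}\Big(\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]\neq c_{j,n}\}}\log\epsilon_{j,n}+\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]=c_{j,n}\}}\log(1-\epsilon_{j,n})\Big).\end{aligned}$$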

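Since the first term in the above is constant with respect to $\hbox{\boldmath$ \theta $}$ , the parameter vector $\hbox{\boldmath$ \theta $}$ can be optimized by maximizing only the second term as follows:

$$(\hat{\boldsymbol{\epsilon}}^{i+1},\hat{{\bf c}}^{i+1})=\operatornamewithlimits{argmax}_{(\boldsymbol{\epsilon},{\bf c})}\sum_{j=0}^{m^{K}-1}\sum_{n=1}^{N}\sum_{t=1}^{T_{t}+T_{u}}\Big(\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]\neq c_{j,n}\}}\log\epsilon_{j,n}+\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]=c_{j,n}\}}\log(1-\epsilon_{j,n})\Big).$$

Since this objective is a sum of terms that each depend on a single pair $(j,n)$ , maximizing it is equivalent to maximizing the individual terms: for each fixed $j$ and $n$ , we have

$$(\hat{\epsilon}_{j,n}^{i+1},\hat{c}_{j,n}^{i+1})=\operatornamewithlimits{argmax}_{(\epsilon_{j,n},c_{j,n})}\sum_{t=1}^{T_{t}+T_{u}}\Big(\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]\neq c_{j,n}\}}\log\epsilon_{j,n}+\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]=c_{j,n}\}}\log(1-\epsilon_{j,n})\Big).$$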

To solve the above problem, we introduce the following useful lemma.

Lemma 1

Suppose $a_{\ell}\geq 0$ for $1\leq{\ell}\leq{n}$ with $\sum_{i=1}^{n}{a_{i}}>0$ . Then $\sum_{\ell=1}^{n}{a_{\ell}}\log{p_{\ell}}$ is maximized over all probability vectors $p=(p_{1},\ldots,p_{n})$ by $p_{\ell}=\frac{a_{\ell}}{\sum_{i=1}^{n}{a_{i}}}$ . $\blacksquare$
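The lemma is stated without proof. One standard argument, sketched here under the assumptions $A=\sum_{i}a_{i}>0$ and the convention $0\log 0=0$, uses the inequality $\log x\leq x-1$:

```latex
% Proof sketch of Lemma 1 (not taken from the paper).
Let $q_{\ell} = a_{\ell}/A$ with $A = \sum_{i=1}^{n} a_{i} > 0$. For any probability vector $p$,
\begin{align*}
\sum_{\ell=1}^{n} a_{\ell}\log p_{\ell} - \sum_{\ell=1}^{n} a_{\ell}\log q_{\ell}
  &= A \sum_{\ell:\,q_{\ell}>0} q_{\ell}\log\frac{p_{\ell}}{q_{\ell}} \\
  &\le A \sum_{\ell:\,q_{\ell}>0} q_{\ell}\left(\frac{p_{\ell}}{q_{\ell}} - 1\right)
     && (\log x \le x - 1) \\
  &= A\Big(\sum_{\ell:\,q_{\ell}>0} p_{\ell} - 1\Big) \le 0,
\end{align*}
with equality if and only if $p = q$.
```

For the per-$(j,n)$ problem preceding the lemma, this is applied with $n=2$, $p=(\epsilon_{j,n},1-\epsilon_{j,n})$, $a_{1}=\sum_{t}\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]\neq c_{j,n}\}}$ and $a_{2}=\sum_{t}\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]=c_{j,n}\}}$, so the maximizing error probability is $a_{1}/(a_{1}+a_{2})$.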

First of all, we observe that the optimal $c_{j,n}$ should satisfy the following constraint for any $\epsilon_{j,n}<0.5$ :

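$$\sum_{t=1}^{T_{t}+T_{u}}\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]=c_{j,n}\}}\geq\sum_{t=1}^{T_{t}+T_{u}}\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]\neq c_{j,n}\}}.$$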

Also, we can see that this constraint is satisfied by assigning

• $\hat{c}_{j,n}^{i+1}=1$ if $\sum_{t=1}^{T_{t}+T_{u}}{r_{n}{[t]}}{{\gamma_{j}}[t]}>0$ ;

• $\hat{c}_{j,n}^{i+1}=-1$ if $\sum_{t=1}^{T_{t}+T_{u}}{r_{n}{[t]}}{{\gamma_{j}}[t]}<0$ .

Equivalently, we obtain that

$$\hat{c}_{j,n}^{i+1}=\mbox{sign}\left(\sum_{t=1}^{T_{t}+T_{u}}r_{n}[t]\gamma_{j}[t]\right).$$

Next, applying Lemma 1 to the per- $(j,n)$ maximization above with $c_{j,n}=\hat{c}_{j,n}^{i+1}$ , the error-probability $\epsilon_{j,n}^{i+1}$ is optimized as

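$$\hat{\epsilon}_{j,n}^{i+1}=\frac{\sum_{t=1}^{T_{t}+T_{u}}\gamma_{j}[t]{\bf 1}_{\{r_{n}[t]\neq\hat{c}_{j,n}^{i+1}\}}}{\sum_{t=1}^{T_{t}+T_{u}}\gamma_{j}[t]}.$$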

Finally, we can compute the log-likelihood (17) using the updated parameter vector $\hbox{\boldmath$ \theta $}^{i+1}$ as

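$$\log\mathbb{P}({\cal D}|\boldsymbol{\theta}^{i+1})=\sum_{t=1}^{T_{t}}\log\mathbb{P}(j_{t}|\boldsymbol{\theta}_{j_{t}}^{i+1})p({\bf r}[t]|j_{t},\boldsymbol{\theta}_{j_{t}}^{i+1})+\sum_{t=T_{t}+1}^{T_{t}+T_{u}}\log\sum_{j=0}^{m^{K}-1}p({\bf r}[t],j|\boldsymbol{\theta}_{j}^{i+1}),$$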

which is used to check the convergence of the EM algorithm. The overall procedure is summarized in Fig. 3 and Algorithm 1, where $\varepsilon\geq 0$ denotes the pre-determined threshold for the stopping criterion.
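The overall EM procedure can be sketched as follows. This is an illustrative reading of the steps above, not a transcription of Algorithm 1: it assumes the M-step reduces to a $\gamma$-weighted majority vote for each codeword bit and a $\gamma$-weighted mismatch frequency for each error probability (what Lemma 1 yields), and it adds a clipping floor on the error probabilities to avoid $\log 0$, which is a safeguard introduced here. The names `em_fit`, `tol`, `max_iter`, and `floor`, and the synthetic data, are hypothetical.

```python
import numpy as np


def _log_p(R, C, eps):
    """log p(r[t] | j, theta_j) for all t and j; shape (T_total, J)."""
    mismatch = R[:, None, :] != C[None, :, :]
    return np.where(mismatch, np.log(eps), np.log1p(-eps)).sum(axis=2)


def em_fit(R, labels, C0, eps0, tol=1e-6, max_iter=50, floor=1e-3):
    """Run EM from (C0, eps0); return (C, eps, gamma, log-likelihood history)."""
    T_t, J = len(labels), C0.shape[0]
    C, eps, history = C0.copy(), np.clip(eps0, floor, 0.5), []
    for _ in range(max_iter):
        # E-step: normalized likelihoods for data slots, one-hot rows for pilots.
        lp = _log_p(R, C, eps)
        gamma = np.exp(lp - lp.max(axis=1, keepdims=True))
        gamma /= gamma.sum(axis=1, keepdims=True)
        gamma[:T_t] = np.eye(J)[labels]
        # M-step: S[j, n] = sum_t gamma_j[t] r_n[t], W[j] = sum_t gamma_j[t].
        S = gamma.T @ R
        W = gamma.sum(axis=0)[:, None]
        C = np.where(S >= 0, 1, -1)
        # Weighted mismatch mass equals (W - |S|) / 2 once c = sign(S).
        eps = np.clip((W - np.abs(S)) / (2 * W), floor, 0.5)
        # Log-likelihood with the updated parameters (uniform prior 1 / J).
        lp = _log_p(R, C, eps)
        top = lp[T_t:].max(axis=1, keepdims=True)
        ll = (lp[np.arange(T_t), labels].sum()
              + (top[:, 0] + np.log(np.exp(lp[T_t:] - top).sum(axis=1))).sum()
              - len(R) * np.log(J))
        history.append(ll)
        if len(history) > 1 and abs(history[-1] - history[-2]) <= tol:
            break
    return C, eps, gamma, history


# Synthetic example: 2 classes, N = 6 branches, 2 pilots per class, 40 data slots.
rng = np.random.default_rng(0)
C_true = np.array([[1, 1, 1, 1, 1, 1], [-1, -1, -1, 1, 1, 1]])
labels = np.array([0, 0, 1, 1])
classes = np.concatenate([labels, rng.integers(0, 2, size=40)])
flips = np.where(rng.random((len(classes), 6)) < 0.1, -1, 1)
R = C_true[classes] * flips
C0 = np.where(np.stack([R[:2].sum(axis=0), R[2:4].sum(axis=0)]) >= 0, 1, -1)
C, eps, gamma, history = em_fit(R, labels, C0, np.full(C0.shape, 0.1))
detected = gamma[len(labels):].argmax(axis=1)   # decisions for the unlabeled slots
```

Each class must appear at least once among the pilots so that the weights `W` stay positive. The returned `gamma` holds the responsibilities from the last E-step, which is what the data-detection rule for the unlabeled slots relies on.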

Data Detection: For $t\in[T_{t}+1:T_{t}+T_{u}]$ , the SSL detector performs detection using the latest $\gamma_{j}[t]$ in (18) as

$$\hat{j}[t]=\operatornamewithlimits{argmax}_{j\in[0:m^{K}-1]}\gamma_{j}[t].$$

Also, for $t\in[T_{t}+T_{u}+1:T_{c}]$ , the detection process of the SSL detector is equivalent to that of the SL detector in Section III. We remark that the performance-complexity tradeoff of the proposed SSL detector is controlled by the choice of $T_{u}$ .

V Simulation Results

We evaluate the average bit-error rate (BER) performance of the proposed SSL detector and the conventional SL detector. For the simulations, a Rayleigh fading channel is considered where each element of the channel matrix ${\bf\tilde{H}}$ is drawn as an independent and identically distributed (i.i.d.) circularly symmetric complex Gaussian random variable with zero mean and unit variance. Each user is assumed to send binary data ( $m=2$ ) and QPSK modulation is applied. The block fading duration (i.e., coherence time interval) is set as $T_{d}=512$ , with $T_{u}=10\cdot T_{t}$ and $T_{t}=T\cdot{m^{K}}$ .

Fig. 4 shows the BER performance of the SSL detector, the SL detector, and maximum likelihood detection (MLD) with channel state information at the receiver (CSIR) for various training durations. Notably, the proposed SSL detector outperforms the conventional SL detector over the entire SNR range when the pilot-overhead is the same. In particular, for $T=1$ , the proposed SSL detector almost achieves the performance of the SL detector with $T=4$ . This implies that the SSL detector considerably reduces the training span ( $T_{t}$ ) without degrading performance, by exploiting information from the generative model and the data signals. Also, the comparison with MLD with CSIR shows that the proposed method allows the empirical conditional probability to converge to the true conditional probability without increasing the number of pilots.

VI Conclusion

In this paper, we presented a novel semi-supervised learning (SSL) detector. Specifically, the proposed SSL detector updates the parameters using data signals through maximum likelihood estimation under the Bernoulli-like model. Such parameter updates can significantly reduce the pilot-overhead, which is an issue in the existing SL detector. The simulation results demonstrated that the SSL detector almost achieves the performance of the SL detector, even with a much lower pilot-overhead. We would like to emphasize that an SSL detector would be a strong practical framework for machine-learning-based detection, in that data signals are fairly cheap to obtain compared with pilot signals. As ongoing work, we are developing more practical SSL detectors that have low complexity or are suited to time-varying channels.

Acknowledgement

This work was supported by Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFC-IT1702-00.

Bibliography

[1] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: Benefits and challenges,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 742–758, 2014.
[2] H. Yang and T. L. Marzetta, “Total energy efficiency of cellular large scale antenna system multiple access mobile networks,” in Online Conference on Green Communications (GreenCom), 2013 IEEE. IEEE, 2013, pp. 27–32.
[3] A. Mezghani and J. A. Nossek, “Modeling and minimization of transceiver power consumption in wireless networks,” in Smart Antennas (WSA), 2011 International ITG Workshop on. IEEE, 2011, pp. 1–8.
[4] S. Hoyos, B. M. Sadler, and G. R. Arce, “Monobit digital receivers for ultrawideband communications,” IEEE Transactions on Wireless Communications, vol. 4, no. 4, pp. 1337–1344, 2005.
[5] C. Risi, D. Persson, and E. G. Larsson, “Massive MIMO with 1-bit ADC,” arXiv preprint arXiv:1404.7736, 2014.
[6] J. Choi, J. Mo, and R. W. Heath, “Near maximum-likelihood detector and channel estimator for uplink multiuser massive MIMO systems with one-bit ADCs,” IEEE Transactions on Communications, vol. 64, no. 5, pp. 2005–2018, 2016.
[7] Y. Li, C. Tao, G. Seco-Granados, A. Mezghani, A. L. Swindlehurst, and L. Liu, “Channel estimation and performance analysis of one-bit massive MIMO systems,” IEEE Transactions on Signal Processing, vol. 65, no. 15, pp. 4075–4089, 2017.
[8] C. Mollén, J. Choi, E. G. Larsson, and R. W. Heath, “One-bit ADCs in wideband massive MIMO systems with OFDM transmission,” in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 3386–3390.