# Parallel Architecture Prototype for 60 GHz High Data Rate Wireless Single Carrier Receiver

Tatjana Chavdarova, Aristotel Tentov, Marija Kalendar

SS. Cyril and Methodius University - Faculty of Electrical Engineering and Information Technologies, Karpos II bb, PO Box 574, 1000 Skopje, Macedonia

E-mail: {tatjana, toto, marijaka}@feit.ukim.edu.mk

*Abstract:* The potential of the 60 GHz frequency band for wireless communications is attracting considerable attention from academia and research teams. The band opens possibilities for a wide variety of applications that are yet to be implemented, but these applications also pose major implementation challenges. One example is building a high data rate transceiver that at the same time has very low power consumption. In this paper we present a prototype of a Single Carrier (SC) transceiver system, give a brief overview of the baseband design, and emphasize the most important design decisions. We also briefly review the possible approaches to implementing the equalizer, the most complex module in the SC transceiver. The main focus of the paper is a parallel architecture for the receiver in a Single Carrier communication system, which raises the achievable data rate at the price of higher power consumption. The proposed receiver architecture is illustrated, and the results of its implementation are compared with those of the corresponding serial implementation.

*Keywords:* 60 GHz, millimeter wave, single carrier, VHDL implementation, frequency domain equalization, wireless personal area network.

## I. INTRODUCTION

Although the wireless technologies have been significantly improved and widely used in the past few years, the demand for high-definition (HD) video streaming, file transfer, wireless docking station, gaming, short-range backhaul, wireless desktop, and wireless Gigabit Ethernet has also increased.

To serve such a broad application domain, a complete communication system with very high data rates (above 5 Gb/s) has to be designed. Comparing the desired data rates with those achieved in the 2.4 GHz and 5 GHz frequency bands shows that a different approach has to be considered. Particularly attractive and promising candidates for multi-Gb/s wireless transceiver systems are the millimeter-wave technologies, which provide up to 7 GHz of unlicensed spectrum available in many countries worldwide. The frequency bandwidth at 60 GHz is one of the largest unlicensed bandwidths ever allocated, offering great potential in terms of capacity and flexibility [1]. At higher frequencies the free space path loss increases (it is 21.6 dB worse at 60 GHz than at 5 GHz), so the propagation range of the carrier wave is very short and low-power transmissions will not propagate very far. However, since this confines 60 GHz operation to a single room in an indoor environment, it reduces the likelihood of co-channel interference and thus opens the possibility of a higher frequency re-use density, which is considered an advantage [2]. The huge bandwidth available for 60 GHz and UWB systems also simplifies the system design of these technologies: a system with much lower spectral efficiency can be designed to deliver Gb/s transmission with low cost and simple implementation.
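The 21.6 dB figure quoted above follows directly from the free space path loss formula. The short Python sketch below reproduces it; the function name `fspl_db` and the 10 m example distance are illustrative assumptions (the distance cancels in the difference).

```python
import math

def fspl_db(freq_hz, dist_m):
    """Free space path loss in dB: 20*log10(4*pi*d*f/c)."""
    c = 299_792_458.0  # speed of light in m/s
    return 20 * math.log10(4 * math.pi * dist_m * freq_hz / c)

# Extra loss at 60 GHz relative to 5 GHz at the same (assumed) distance.
delta_db = fspl_db(60e9, 10.0) - fspl_db(5e9, 10.0)
print(f"{delta_db:.1f} dB")  # 21.6 dB
```

Because the two losses differ only in frequency, the result is simply 20·log10(60/5), independent of the chosen distance.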

The compact size of 60 GHz radio also permits multiple-antenna solutions at the user terminal that are otherwise difficult, if not impossible, at lower frequencies. The form factor of 60 GHz systems is approximately 140 times smaller and thus can be integrated into consumer electronic products. These advantages led to the development of 60 GHz millimeter-wave wireless standards, including IEEE 802.15.3c, IEEE 802.11ad, ECMA-387, and ISO/IEC 13156. These standards exploit the 60 GHz frequency band to provide high data rate media streaming, as well as rapid data transfer.

Despite the various advantages offered, 60 GHz communications suffer from a number of critical problems that must be solved, which makes the implementation of a low-cost, low-power 60 GHz system a huge challenge.

Two modulations are mainly used within the 60 GHz frequency band: Orthogonal Frequency Division Multiplexing (OFDM) and Single Carrier (SC) modulation. With OFDM, conceptually, multiple carriers are sent in parallel at a given moment, each occupying a narrow frequency band. In contrast, the SC modulation technique transmits a single carrier onto which the symbols are coded at a high symbol rate. These two modulations are preferred choices due to their high bandwidth efficiency, at the cost of higher implementation complexity. Each technique has important advantages and disadvantages over the other, but in summary SC modulation is recommended when the system will mainly be deployed in line-of-sight (LOS) cases with small delay spread, whereas OFDM is recommended when the delay spread of the channel is longer. In this paper our focus is on the Single Carrier modulation technique.



Fig. 1. Main component modules of SC-FDE transceiver.

This paper is organized as follows. The second section gives an overview of the main components of a Single Carrier transceiver. Both the transmitter and the receiver component modules are described, with more emphasis on the SC receiver modules, since the receiver has a more complex baseband design than the transmitter. The third section lists the possible types of implementation for SC equalizers, emphasizing the advantages and disadvantages of each. In the fourth section we present a parallel baseband architecture of the receiver, choosing one concrete type among the equalizer implementations described earlier. Within this section we also compare the results of the implemented parallel architecture with those of the corresponding serial hardware implementation of the receiver. In the following section we conclude the comparison between the two examined architectures. At the end of the paper we list our plans for future improvement of the presented architecture.

## II. SINGLE CARRIER TRANSCEIVER

In this paper we focus on SC baseband modulation with frequency-domain equalization at 60 GHz. The transceiver is designed according to the IEEE 802.11ad standard, implemented in VHDL, and has so far been tested on a Xilinx Virtex-5 FPGA development board.

In the following sub-sections, each of the conceptual modules of the system (illustrated in Fig. 1) is described, emphasizing its function in the whole system and the important implementation aspects.

### A. SC transmitter

In an SC modulated system, the transmitter has a relatively simple design. Its main functions are to perform symbol modulation and to form the frame according to the IEEE 802.11ad standard.

#### 1) Symbol modulation

Fig. 2. Main component modules of SC receiver.

On the physical layer, the stream of bits coming from the upper layers is grouped into multiple bits that form a single symbol. For this purpose we use the 64-QAM modulation technique, which transforms 6 incoming bits into one symbol. Each symbol has two components, the in-phase and the quadrature component, to which we refer as I/Q components [11]. These components are needed for transmitting the information, since they give the amplitude and phase changes of the signal that carries the information. Although the standard recommends using fewer bits per symbol when SC modulation is combined with QAM (because of the increasing Peak to Average Power Ratio, PAPR, of the system), we further explore 64-QAM modulation with the intention of conveying more information in one symbol, while suggesting a different approach that reduces the processing needed during de-mapping at the receiver (explained in a later section).

For this purpose, instead of mapping the bits sequentially, the symbols in the constellation diagram are mapped using Gray codes [12]. This enables us to de-map each received symbol at bit level, as illustrated later.

#### 2) SC frame format

The format of the frame composed at the transmitting side must be followed at the receiving side. Therefore, it is the transmitting side that, besides the field for frame synchronization, adds to each frame a preamble with additional information needed at the receiving side. This includes the Golay complementary sequences that comprise the fields of the SC preamble. The main fields included in each SC frame are shown in Fig. 3. Each packet payload consists of 52 symbols. According to the IEEE 802.11ad standard recommendation, the Cyclic Prefix length should be ¼ of the packet length. Thus, we choose a CP length of 12 symbols, as explained in [6].

### B. SC receiver

As illustrated in Figure 6, at the receiving side the received analog carrier signal first has to be converted into a digital signal, using an A/D converter with a high sampling rate. In order to get valid results, the sampling rate of the A/D converter should be higher than twice the highest frequency that the system under test can pass with significant gain. An even higher sampling rate is recommended in order to avoid aliasing distortion [9].

Afterwards, using the synchronization fields prepended to the frame that is currently being received, the receiver determines the start of a frame. Then, using the separate fields in the preamble of the frame, the receiver is able to perform channel estimation. This information is needed when performing the equalization, in order to decrease the bit error rate (BER), which, among other factors, increases as a result of the inter-symbol interference that occurs in the wireless channel. After equalization, the symbol de-mapping can be performed, forming a stream of bits for the upper layers.



Fig. 3. Structure of an SC frame and preamble according to the IEEE 802.11ad standard [6]. The SYNC field is used for frame synchronization and consists of 14 repetitions of the GCS with length 128. The SFD field is used for establishing frame timing and consists of 4 GCSs with length 128. The CES field is used for channel estimation and consists of 2 GCSs with length 128 and 4 GCSs with length 256. The exact GCSs are given in [7].

The functions of some of the modules illustrated in Fig. 2, as well as their implementation aspects, are explained in more detail below.

#### 1) Golay channel estimator module

A system that is more robust to channel noise can be implemented if the equalization uses information about the channel distortion that is obtained dynamically, while the system is running. This makes the system more adaptable to the environment and, moreover, it can be used to drive the system into a different operational mode in order to minimize power consumption. For example, if the channel impulse response is relatively short in the time domain, we could use a technique that requires less processing and performs only minimal channel equalization. Dynamic determination of the characteristics of the wireless channel is therefore of great importance and has great potential for improving the transceiver system.

For this purpose the so-called Golay channel estimator has been widely used. The estimator is based on Golay complementary sequences (GCS). A GCS pair consists of two sequences whose aperiodic autocorrelations sum to zero at all out-of-phase positions, i.e. at every shift except zero [5]. They are widely applied mainly because the false peaks cancel each other exactly, and because their autocorrelations can be computed in parallel using a single, hardware-efficient and fast correlator [10].

If we have two Golay Complementary sequences, 'a' and 'b', each of the sequences is convolved with the channel impulse response after sending it through the channel H. When receiving the particular sequence, a Golay correlator is being used, which for known input sequences yields the autocorrelation of one of the sequences which is convolved with the channel impulse response. This is done for both of the sequences. Summing the results of the previous steps would yield the impulse response h(t) of the channel [9].
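The estimation principle described above can be sketched in a few lines of Python. The sketch is illustrative only: the recursive pair construction, the function names, the 4-tap example channel, and the assumption that each sequence is received in isolation (no neighbouring preamble fields, no noise) are choices made here, not details taken from the standard or from [9].

```python
def golay_pair(n_iter):
    """Build a complementary pair of length 2**n_iter by concatenation."""
    a, b = [1], [1]
    for _ in range(n_iter):
        a, b = a + b, a + [-x for x in b]
    return a, b

def convolve(x, h):
    """Full linear convolution, modelling the sequence passing the channel."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def correlate(r, s, lags):
    """Cross-correlation c[k] = sum_n r[n+k] * s[n] for k = 0..lags-1."""
    return [sum(r[n + k] * s[n] for n in range(len(s)) if n + k < len(r))
            for k in range(lags)]

def estimate_cir(r_a, r_b, a, b, taps):
    """Sum both correlator outputs; the side lobes cancel, leaving 2N*h[k]."""
    c_a = correlate(r_a, a, taps)
    c_b = correlate(r_b, b, taps)
    return [(x + y) / (2 * len(a)) for x, y in zip(c_a, c_b)]

a, b = golay_pair(7)                      # two sequences of length 128
h_true = [1.0, 0.5, -0.25, 0.125]         # assumed example channel
h_est = estimate_cir(convolve(a, h_true), convolve(b, h_true), a, b, len(h_true))
print(h_est)
```

In this noiseless setting the estimate matches the assumed channel taps, because the sum of the two autocorrelations is nonzero only at zero shift.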

#### 2) Channel equalization

In the 60 GHz frequency band, the inter-symbol interference (ISI) that occurs in the wireless channel is much more severe than in the 2.4 and 5 GHz frequency bands [13], [14]. More specifically, ISI at high frequencies spans several tens to hundreds of symbol periods. Equalization is therefore an unavoidable part of the baseband processing, and fundamental for reliable data transmission over such channels. The main purpose of the equalization is to compensate for the distortion that occurs in the channel, which includes ISI. Moreover, the equalization is designed according to the characteristics of the wireless channel, i.e. the channel impulse response.

#### 3) Symbol de-mapping

TABLE I

Results of the Performed Synthesis of the 64-QAM De-mapper

| Parameter                          | Value                                       | Usage |
|------------------------------------|---------------------------------------------|-------|
| Number of Slice LUTs               | 24 out of 69120                             | 0%    |
| Number of LUT Flip Flop pairs used | 24                                          | /     |
| Number of IOs                      | 70                                          | /     |
| Number of bonded IOBs              | 66 out of 640                               | 10%   |
| Maximum combinational path delay   | 5.403 ns (4.094 ns logic, 1.309 ns route)   | /     |

Summary of the performed synthesis of the 64-QAM de-mapper, where the sequence of bits is de-mapped by comparing I and Q values at bit level. The de-mapper was implemented in VHDL. The FPGA platform used is Xilinx Virtex-5, with selected device 5vlx110tff1136 and speed grade -1.

Since each symbol is mapped to a 6-bit binary number that represents a Gray code, we are able to de-map the received symbol into a string of bits at bit level, as suggested in [12]. This approach differs from the conventional maximum likelihood (ML) algorithm, which is usually used as a decision rule. The ML algorithm is impractical in this case, since it requires many multiplications, additions, subtractions and comparisons. Therefore, assuming that all bits are equally likely in the constellation plane, and using the Gray codes for each constellation point (as described above), we can implement a low-complexity demodulator.

This means that with simple comparisons of the bits of the I and Q values we can determine the sequence of bits that was sent. When using 64-QAM, the worst case is a total of 14 comparisons, several of which can be performed simultaneously. Therefore, the power consumption of this demodulator is very small, and the de-mapping requires only a few clock cycles.
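A minimal Python sketch of comparison-based de-mapping for a Gray-coded 64-QAM constellation is given below. The level set {-7, ..., 7}, the bit ordering (upper three bits on I, lower three on Q) and the threshold form are assumptions chosen for illustration; the exact mapping and comparison scheme of [12] may differ.

```python
LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]   # assumed amplitude levels per axis

def gray(i):
    """Binary-reflected Gray code of index i."""
    return i ^ (i >> 1)

# Lookup from 3-bit Gray code to amplitude level (transmitter side).
GRAY_TO_LEVEL = {gray(i): LEVELS[i] for i in range(8)}

def map_symbol(bits6):
    """Map a 6-bit value to (I, Q): upper 3 bits on I, lower 3 bits on Q."""
    return GRAY_TO_LEVEL[bits6 >> 3], GRAY_TO_LEVEL[bits6 & 0b111]

def demap_axis(x):
    """Recover 3 Gray-coded bits of one axis using only threshold comparisons."""
    b2 = 1 if x > 0 else 0                            # sign decides the first bit
    b1 = 1 if -4 < x < 4 else 0                       # inner half of the axis
    b0 = 1 if (2 < x < 6) or (-6 < x < -2) else 0     # middle band on either side
    return (b2 << 2) | (b1 << 1) | b0

def demap_symbol(i, q):
    """De-map one received I/Q pair into a 6-bit value."""
    return (demap_axis(i) << 3) | demap_axis(q)

recovered = [demap_symbol(*map_symbol(s)) for s in range(64)]
print(recovered == list(range(64)))  # True
```

Written with signed thresholds, each axis needs seven comparisons, and the I and Q axes are independent, so they can be evaluated at the same time in hardware.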

The simplicity of the de-mapping module, when implemented in VHDL, is demonstrated by the results of the synthesis report, summarized in Table 1. The results show that the utilization of the device's slice logic is below 1%.

## III. CHANNEL EQUALIZATION

When designing a Single Carrier transceiver system, close attention should be paid to the type of equalizer that the system employs, because the equalizer is the most time-consuming and power-consuming module of the system. At the same time, the reliability of the system depends on the equalizing technique employed, considering that retransmissions can slow down the system. If the power consumption of the system is also very important, designing an equalizer with optimal performance can be especially challenging.

There are numerous possibilities when implementing an equalizer for SC modulation. This section gives a brief overview of the possible approaches. The main decision to be taken is whether the equalization should be performed in the time domain or in the frequency domain. The latter implies using a Fast Fourier Transform (FFT) and an Inverse Fast Fourier Transform (IFFT) where the equalization is performed, i.e. at the receiver.



Fig. 4. Block diagram of Decision feedback equalizer - DFE.

### A. Time-domain equalization

Since the carrier in an SC transceiver is emitted in the time domain, it is intuitive to consider time-domain equalization. An equalizer implemented in the time domain often comprises two conceptual parts. The first part deals with the precursor ISI and is referred to as the linear equalizer (LE). The output of the LE is a linear combination (weighted sum) of the received signal and a finite number of previous input values. The cursor is the symbol with the highest amplitude, and its position defines the channel delay [4]. The second part performs adaptive equalization and is conceptually based on the DFE, explained in a later subsection.
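The weighted-sum structure of the linear equalizer can be illustrated with a short Python sketch. The tap values and the two-tap example channel are assumptions chosen so that the arithmetic is easy to follow; a practical equalizer would adapt its taps to the estimated channel.

```python
def linear_equalizer(received, taps):
    """Each output is a weighted sum of the current and previous inputs."""
    out = []
    for n in range(len(received)):
        acc = 0.0
        for k, w in enumerate(taps):
            if n - k >= 0:
                acc += w * received[n - k]
        out.append(acc)
    return out

# Assumed example: symbols [1, -1, 1, 1] sent through the channel h = [1, 0.5].
received = [1.0, -0.5, 0.5, 1.5]
# Truncated inverse of the channel, 1/(1 + 0.5 z^-1) ~ 1 - 0.5 z^-1 + 0.25 z^-2.
equalized = linear_equalizer(received, [1.0, -0.5, 0.25])
print(equalized)  # [1.0, -1.0, 1.0, 1.125]
```

The residual error in the last output (1.125 instead of 1) comes from truncating the inverse filter to three taps, which hints at why long channel impulse responses make this approach costly.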

Many implementations show that the traditional SC system with adaptive time-domain equalization has very high signal processing complexity. This is more pronounced when the length of the channel impulse response exceeds 20 data symbols, which is often the case in the 60 GHz band.

The biggest advantage of this approach is that it does not need the FFT and IFFT required for frequency-domain equalization, which reduces the power consumption of the system. If this aspect is very important, time-domain equalization becomes a valuable candidate to consider.

### B. Frequency-domain equalization

As emphasized in [6], the complexity of an equalizer implemented in the time domain is very high. Since the resulting mathematical equations that need to be implemented in hardware are very complex, and thus consume a lot of processing time, the time-domain approach is often avoided in practice for SC systems. Moreover, to get more accurate results the distance between the transmitter and the receiver should be known in advance. This is impractical, as it conflicts with the practical advantages of wireless networks from the users' perspective. It is therefore often considered worthwhile to go into the frequency domain, perform simple one-tap equalization, and then go back into the time domain and perform the symbol de-mapping.

Fig. 5. Diagram of example implementation of a hybrid equalizer, [3].

In practice, it is usual to implement the equalizer in frequency domain, in which case, the equalizing operation is reduced to simple multiplication of the incoming data with coefficients that are previously calculated and stored in buffers. When the equalizer is implemented in the frequency domain, its complexity is comparable with the one of an OFDM system. As aforementioned, this implies that an FFT and IFFT are both used at the receiver (Fig. 1), where in between a complex number multiplication is being performed (this multiplication can be optimized as illustrated in [6]). The values of the coefficients participating in the multiplication are directly connected with the characteristics of the 60 GHz channel.
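The FFT, per-bin multiplication and IFFT chain can be sketched as follows. This is the textbook variant in which the cyclic prefix is discarded before the transform and the channel is assumed to be known exactly; the design discussed in this paper instead keeps the CP inside the FFT window, so the sketch only illustrates the one-tap principle. The block length, the channel taps, the function names and the plain O(N²) DFT are all assumptions made for brevity.

```python
import cmath

def dft(x, inverse=False):
    """Plain O(N^2) discrete Fourier transform (stand-in for an FFT core)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(s / n if inverse else s)
    return out

def add_cyclic_prefix(block, cp_len):
    """Prepend the last cp_len symbols of the block."""
    return block[-cp_len:] + block

def channel(tx, h):
    """Linear convolution with the channel, truncated to the input length."""
    return [sum(h[j] * tx[i - j] for j in range(len(h)) if i - j >= 0)
            for i in range(len(tx))]

def fde_one_tap(rx_block, h):
    """Equalize one block: DFT, divide each bin by the CFR, inverse DFT."""
    n = len(rx_block)
    cfr = dft(list(h) + [0] * (n - len(h)))
    spectrum = dft(rx_block)
    return dft([r / hk for r, hk in zip(spectrum, cfr)], inverse=True)

block = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j, 1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
h = [1.0, 0.5, 0.25]          # assumed channel, shorter than the prefix + 1
cp_len = 2
rx = channel(add_cyclic_prefix(block, cp_len), h)
equalized_block = fde_one_tap(rx[cp_len:], h)
max_err = max(abs(e - s) for e, s in zip(equalized_block, block))
print(max_err < 1e-9)  # True
```

The prefix turns the linear convolution of the channel into a circular one over the block, which is what allows a single complex multiplication per frequency bin to undo the channel.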

### C. Decision Feedback equalization

To improve the accuracy of a frequency-domain equalizer (FDE), Decision Feedback Equalizers (DFEs) are very often used. They use a feedback link that starts from the detected symbol and affects the values used in the multiplication, as illustrated in Fig. 4. The idea of the DFE is simple: supposing that the detected symbol (using a specific demodulation technique) is correct, we can "predict" its effect on the symbols that are yet to be equalized (hopefully corrected) and demodulated.

Since the symbols are being detected after the IFFT operation (thus in time domain) and the equalization is performed in frequency domain, very often in practice, a third FFT module is used in order to improve the precision of the equalizer. The output of the third FFT is used to dynamically change the values of the coefficients that are used to perform the equalization in the frequency domain.

If the power consumption of the system is important too, a different approach is often considered, which is illustrated in [6]. Instead of using a third FFT module, intermediate values are dynamically calculated, and then subtracted from the already equalized (in the first part of the equalizer) values of the received signal. Consequently, the calculation of these values consumes additional energy, but this is proven to be essential, since the part of the equalizer that deals only with the precursor ISI ([13]) does not have good performance in practice, in terms of efficient equalization.

This type of equalizer has lower hardware complexity than TDEs, and is thus often a preferred choice for an SC system. In contrast to the linear equalizer, it does not amplify the channel noise, so errors are less likely to occur. However, if a symbol is incorrectly detected, the error propagates much further, leading to an increased number of bit errors.
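The decision-feedback idea can be illustrated with a deliberately simplified time-domain sketch in Python. It assumes a known channel, BPSK symbols and no noise, and it is not the frequency-domain structure of Fig. 4 or of [6]; all names and values are illustrative.

```python
def convolve_truncated(x, h):
    """Channel model: linear convolution truncated to the input length."""
    return [sum(h[j] * x[i - j] for j in range(len(h)) if i - j >= 0)
            for i in range(len(x))]

def bpsk_slicer(v):
    """Hard decision for BPSK symbols (+1 / -1)."""
    return 1.0 if v >= 0 else -1.0

def dfe_detect(received, h, slicer):
    """Subtract the ISI predicted from past decisions, then decide."""
    decisions = []
    for n, r in enumerate(received):
        # Post-cursor ISI reconstructed from already detected symbols.
        isi = sum(h[k] * decisions[n - k] for k in range(1, len(h)) if n - k >= 0)
        decisions.append(slicer((r - isi) / h[0]))
    return decisions

symbols = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
h = [1.0, 0.6, 0.3]                      # assumed channel with post-cursor taps
detected = dfe_detect(convolve_truncated(symbols, h), h, bpsk_slicer)
print(detected == symbols)  # True
```

Each decision depends on the previous ones, which is the serial dependency and the source of error propagation: a single wrong decision would corrupt the ISI estimate for the following symbols.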

### D. Hybrid equalizers

Considering the various disadvantages and advantages of the aforementioned types of equalizers, very often in practice a hybrid type of equalization is implemented.

An example implementation of such an equalizer is shown in Fig. 5 [3]. The equalizer consists of a linear equalizer (LE) and two decision-feedback equalizers: a main DFE and a sub-DFE. To cancel the pre-cursor ISI, this implementation uses an LE with a limited number of taps. The sub-DFE is used to compensate for the latency of the loop by limiting the feedback delay to one symbol period.



Fig. 7. SC frequency domain equalizer without DFE. The figure illustrates the modules of the FDE, where the coefficients for performing the equalization are dynamically calculated using the CES field of the SC preamble. First, the CIR is calculated, then the CFR is calculated using the FFT, and the results are stored in buffers (the values are re-calculated before storing, depending on the equalization algorithm being applied). After the FFT operation on the CP and the payload of the frame, the complex number multiplication can be performed with the stored FIR values.

## IV. PARALLEL ARCHITECTURE OF SC RECEIVER

The results presented in [6] show data rates that are below the goals of our project. Thus, to achieve higher throughput of the SC transceiver, in this paper we propose a different architecture, where the leading idea for performance improvement is hardware parallelization. The aim is to process data at high frequencies in order to accomplish multi-gigabit data rates. On an FPGA platform, however, such high clock frequencies are not achievable, even with more advanced FPGA boards. Thus, we propose a receiver design for a 60 GHz SC communication system with a 4-parallel hardware architecture, as illustrated in Fig. 6.

Since we present 4 parallel lines of processing, the clock frequency of the processing modules can be set to $\frac{1}{4}$ of the input symbol rate. In other words, the clock signal of the ADC (the input symbol sampling rate) is 4 times faster than the signal clocking the rest of the modules. This way, by setting each of the modules to use the maximal available frequency, we would achieve 4 times higher throughput of the receiver. This can be implemented if the input symbol sampling rate is not smaller than 4 times the maximal frequency. In the remaining cases we should use $\frac{1}{4}$ of the input symbol rate, as stated above.
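The relation between the module clock, the number of parallel lines and the raw symbol rate can be written out as a small Python helper. The function name is hypothetical, and the example inputs are assumptions: the clock is the maximum frequency reported by the synthesis tool in Table II, and 6 bits per symbol corresponds to 64-QAM. The result is an idealized upper bound on raw symbol bits that ignores preamble and CP overhead; it is not a measured data rate.

```python
def parallel_throughput(f_clk_hz, lanes, bits_per_symbol):
    """Return (symbol rate, raw bit rate) for `lanes` parallel processing lines."""
    symbol_rate = f_clk_hz * lanes          # ADC must sample `lanes` times faster
    return symbol_rate, symbol_rate * bits_per_symbol

# Assumed example inputs: synthesis clock estimate, 4 lanes, 64-QAM.
symbol_rate, raw_bit_rate = parallel_throughput(229.609e6, 4, 6)
print(f"{symbol_rate / 1e6:.3f} Msym/s, {raw_bit_rate / 1e9:.2f} Gb/s raw")
```

With a single processing line the same clock would support only one quarter of this symbol rate, which is the motivation for the 4-parallel design.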

The values converted by the ADC, which works 4 times faster than the rest of the modules, are stored in a reserved buffer. This buffer copies the values from the A/D converter when there are enough values for computing (4 frames). Then the synchronization is performed according to the SYNC field in the preamble of each frame, for which purpose we use 4 separate synchronization modules. From this point on, the four frames are processed in parallel, and at each stage the corresponding field of the preamble is removed, as demonstrated in Fig. 6.

### A. Type of equalizer for a parallel SC receiver

As illustrated in the previous section, there are many types of equalizers that can be implemented. Although time-domain equalizers should also be investigated further when the power consumption of the system is important, in this paper we focus on the implementation of a frequency-domain equalizer, continuing our previous work [6].

The DFE approach is by nature a serial processing of the data, because the output of the equalizer is fed back to the input of the second part of the equalizer (the part that deals with the post-cursor ISI). This contradicts our intent to increase the data throughput of the system by hardware parallelization. We should also note the other disadvantage of this type of equalizer, its long error propagation.



Fig. 6. Quad-parallel SC receiver architecture.

Consequently, in this paper we suggest equalization performed in the frequency domain with coefficients that are dynamically calculated. This topic was already introduced earlier, together with the Golay complementary sequences and their application in communication systems. By using the GCS, we can dynamically determine the multipath channel impulse response (CIR). Knowing the CIR, we can calculate the multipath channel frequency response (CFR) and multiply the output of the FFT with these values (as illustrated in Fig. 7).

$$H(4k+j) = \frac{1}{2} \left( FC'_{ra}(4k+j) + FC'_{rb}(4k+j) \right) \quad (1)$$

The CFR calculation can be performed as demonstrated in [15]. The Golay correlator, based on the GCSs that make up the preamble of each frame, calculates the cross-correlation values $C'_{ra}(4n+j)$ and $C'_{rb}(4n+j)$, $j = 0, \dots, 3$. At each stage in the Golay correlator the output is shifted by 1 bit, and $C'_{ra}(4n+j)$ and $C'_{rb}(4n+j)$ are sent to the FFT to calculate their frequency-domain equivalents, $FC'_{ra}(4k+j)$ and $FC'_{rb}(4k+j)$. The final CFR values are the average of the two at each frequency index (as illustrated in equation 1). Once the CFR is calculated, the linear equalization algorithm can be performed, yielding the final equalizing coefficients $W_l$, where $l = 0, 1, \dots, 63$.

### B. Parallel data processing

For the four frames that are processed in parallel, the CES fields are first removed from their preambles. The CES field is used by the Golay correlator to determine the channel impulse response. The equalizer described above has two phases. First, the FFT is used for the calculation of the needed $W_l$ coefficients. If the Minimum Mean Square Error (MMSE) equalization algorithm is applied, the FIR values should be normalized and inverted before storing them [18]. Otherwise, if the Zero Forcing (ZF) algorithm is used, the receiver is able to perform the equalization directly.
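For orientation, the textbook forms of the ZF and MMSE one-tap coefficients are sketched below in Python. These are the standard formulas, given as an assumption about what the per-bin coefficients look like; the exact normalization used in [18] is not reproduced here, and the function names and the `snr_linear` parameter are hypothetical.

```python
def zf_coefficients(cfr):
    """Zero Forcing: invert the channel frequency response bin by bin."""
    return [1 / h for h in cfr]

def mmse_coefficients(cfr, snr_linear):
    """MMSE: conj(H) / (|H|^2 + 1/SNR), which limits noise gain in weak bins."""
    return [h.conjugate() / (abs(h) ** 2 + 1 / snr_linear) for h in cfr]

cfr = [1 + 0j, 0.5j, 2 + 0j]              # assumed example CFR values
w_zf = zf_coefficients(cfr)
w_mmse = mmse_coefficients(cfr, snr_linear=1.0)
print(w_zf)
print(w_mmse)
```

At high SNR the MMSE coefficients approach the ZF ones, while at low SNR they are smaller in magnitude, so bins where the channel is weak amplify the noise less.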

In order to complete the equalization, the FFT first takes the rest of the frame (the CP and the payload) and performs the FFT operation, at which point we have the frequency-domain equivalent of the received frame. As emphasized in [6], the CP must not be removed before the FFT operation is done when working in SC mode. The final step of the calculation of the $W_l$ coefficients can be performed simultaneously with the FFT calculation of the CP and the payload.

TABLE II

Results of the Performed Synthesis of the Implemented Parallel Architecture

| Parameter                        | Value       |
|----------------------------------|-------------|
| Maximum frequency                | 229.609 MHz |
| Maximum combinational path delay | 0.55 ns     |

The results refer to the FFT, the optimized multiplication of complex numbers [6], and the IFFT of the illustrated parallel architecture.

TABLE III

Comparison Between Serial and Parallel Architecture Based on the Synthesis Report

| Parameter                                | Serial architecture [6] | Parallel architecture |
|------------------------------------------|-------------------------|-----------------------|
| Maximum frequency                        | 229.609 MHz             | 229.609 MHz           |
| Maximum combinational path delay         | 0.55 ns                 | 0.55 ns               |
| Minimum input arrival time before clock  | 2.291 ns                | 3.189 ns              |
| Maximum output required time after clock | 3.259 ns                | 3.259 ns              |
| Total number of paths / destination ports | 197005 / 19625         | 563772 / 70204        |
| Delay                                    | 4.355 ns                | 4.355 ns              |
| Cell:in->out -> fanout                   | 13                      | 24                    |
| Cell:in->out -> Gate Delay               | 0.818 ns                | 0.818 ns              |
| Cell:in->out -> Net Delay                | 0.546 ns                | 0.832 ns              |

Summary comparison between the serial and parallel architectures. For this purpose only the FFT, the complex number multiplication and the IFFT of both types of architecture are compared. The implementation was performed using VHDL and a Virtex-5 FPGA board, with 5vlx110tff1136 as the selected device and speed grade -1.

When the FFT operation is done, the multiplication of the $W_l$ coefficients with the output of the FFT can be performed. Since this is a multiplication of complex numbers, the operation can be optimized as illustrated in [6]. The result of the multiplication is converted back into the time domain and stored in buffers.
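One common way to reduce the cost of a complex multiplication in hardware is to trade one of the four real multiplications for extra additions. The Python sketch below shows this three-multiplier form as an illustration only; whether the optimization in [6] takes this form is not stated here, so the technique and the function name are assumptions.

```python
def complex_mul_3(a, b, c, d):
    """Compute (a + jb) * (c + jd) with three real multiplications."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2     # (real part, imaginary part)

real, imag = complex_mul_3(1, 2, 3, 4)
print(real, imag)  # -5 10
```

When one operand is a stored coefficient, the sums `d - c` and `c + d` can be precomputed along with it, leaving three multiplications and three additions per product at run time.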

The equalized data can then be de-modulated using four de-modulators that perform the corresponding demodulation technique. By this operation from each complex pair of I/Q values we get the equivalent of the detected symbol. In case of 64-QAM, this would yield a sequence of 6 bits, that is stored in a buffer.

### C. Performance analysis of the parallel architecture

As expected, the parallelization improves the performance of the system in terms of the data processed per second (the throughput of the system), at the price of increased power consumption. The synthesis results for the part of the parallel architecture that consists of the FFT, the optimized multiplication of complex numbers [6] and the IFFT are presented in Table 2.

In order to compare the parallel architecture with its serial counterpart demonstrated in [6], we only consider the receiver where the equalization is performed. Both architectures represent different types of equalizers, which have in common that they perform the equalization in the frequency domain. Therefore, the comparison only makes sense if we compare only the part of the equalizers consisting of the FFT, the complex number multiplication and the IFFT.

The proposed architecture was implemented in VHDL on a Xilinx Virtex-5 FPGA based platform. The synthesis results for both architectures are summarized in Table 3. For both architectures the selected device was 5vlx110tff1136 with a speed grade of -1.

The device utilization of the serial and the parallel architecture is compared in Fig. 8:

Device utilization summary. Selected Device: 5vlx110tff1136-1 (both architectures).

| Resource | Serial: used | Serial: available | Serial: utilization | Parallel: used | Parallel: available | Parallel: utilization |
|---|---|---|---|---|---|---|
| **Slice Logic Utilization** | | | | | | |
| Number of Slice Registers | 11749 | 69120 | 16% | 25292 | 69120 | 36% |
| Number of Slice LUTs | 10688 | 69120 | 15% | 21236 | 69120 | 30% |
| Number used as Logic | 7471 | 69120 | 10% | 17828 | 69120 | 25% |
| Number used as Memory | 3217 | 17920 | 17% | 3408 | 17920 | 19% |
| Number used as RAM | 680 | | | 1184 | | |
| Number used as SRL | 2537 | | | 2224 | | |
| **Slice Logic Distribution** | | | | | | |
| Number of LUT Flip Flop pairs used | 13591 | | | 22016 | | |
| Number with an unused Flip Flop | (illegible) | 13591 | 13% | (illegible) | 22016 | -14% |
| Number with an unused LUT | (illegible) | (illegible) | 21% | (illegible) | (illegible) | (illegible) |
| Number of fully used LUT-FF pairs | (illegible) | 13591 | 65% | (illegible) | 22016 | 111% (*) |
| Number of unique control sets | 293 | | | 416 | | |
| **IO Utilization** | | | | | | |
| Number of IOs | 130 | | | 514 | | |
| Number of bonded IOBs | 130 | 640 | 20% | 514 | 640 | 80% |
| **Specific Feature Utilization** | | | | | | |
| Number of BUFG/BUFGCTRLs | 1 | 32 | 3% | 1 | 32 | 3% |
| Number of DSP48Es | 22 | 64 | 34% | 152 | 64 | 237% (*) |

Parallel report: WARNING:Xst:1336 - (*) More than 100% of Device resources are used

Fig. 8. Device utilization comparison of the synthesis reports for the serial (on the left side) and the parallel (on the right side) receiver architecture, respectively.

As can be seen, many of the resources do not grow by a factor of four after the synthesis. The numbers of used slice registers and LUTs are only about twice those of the serial architecture, instead of four times higher as was initially expected.
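The growth factors can be checked directly from the counts reported in Fig. 8. The short script below is only a convenience for that arithmetic; the dictionary keys and the function name are hypothetical labels, while the numbers are the ones listed in the synthesis reports.

```python
# Resource counts taken from the synthesis reports in Fig. 8.
SERIAL = {"slice_registers": 11749, "slice_luts": 10688,
          "bonded_iobs": 130, "dsp48e": 22}
PARALLEL = {"slice_registers": 25292, "slice_luts": 21236,
            "bonded_iobs": 514, "dsp48e": 152}

def growth_factors(serial, parallel):
    """Ratio of parallel to serial usage for every listed resource."""
    return {name: parallel[name] / serial[name] for name in serial}

for name, factor in growth_factors(SERIAL, PARALLEL).items():
    print(f"{name}: x{factor:.2f}")
```

Slice registers and slice LUTs grow by factors of about 2.15 and 1.99 respectively. The other listed resources scale differently: bonded IOBs by about 3.95 and DSP48Es by about 6.91.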

## V. CONCLUSION

The 60 GHz band is a promising basis for building multi-Gb/s wireless systems for indoor communications. SC and OFDM modulations are widely used with millimeter-wave technologies. In this paper, we focus on the SC modulation technique, in which the carrier occupies the whole frequency band and thus more information is conveyed.

In this paper, we enumerate the modules of which an SC transceiver consists, focusing on the equalizer as the most challenging module in the design of an SC system. The possible ways of implementing the equalizer are briefly listed.

To achieve high data rates, we propose a quad-parallel hardware architecture for the receiver. The synthesis results of this architecture, which was implemented on an FPGA board, are also presented and compared with those of the corresponding serial implementation.

An important conclusion that can be drawn from the presented results is that the used resources do not increase by a factor of four, as was expected. Therefore, we suggest that further improvement of the presented design is worth considering even in cases where power consumption is important. Furthermore, the parallel architecture is essential for achieving data rates of 5 Gb/s and beyond.

Since the final transceiver would potentially employ both modulations (OFDM and SC), and since the FFT is an essential part of the receiver for OFDM modulation, we consider implementing the equalization in the frequency domain to be quite reasonable. This is based on the idea of module reutilization: the design can be such that the transceiver uses the same modules, whenever possible, whether it works in SC or in OFDM mode.

## VI. FUTURE PLANS

Although this approach potentially offers great advantages, which are crucial for achieving one of the goals of our project, the high data rates, it also introduces potential disadvantages. This implies that it should be further measured, compared with other implementations, and improved accordingly.

Our future plans for the architecture suggested above include using an FFT module that performs the transform with a parallel architecture within the module itself. This is a promising approach, since a lot of research in this area shows significant improvements in FFT performance [16], [17].

The tradeoff between the power consumption and the gain in data processing speed when an additional FFT is implemented should also be investigated. The presented architecture has two phases: one in which the equalizing coefficients are calculated, and another in which the equalization is performed. The performance can therefore be improved if an additional FFT is used. During the calculation of the FIR (from the CIR) and its normalized and inverted values, the other FFT can calculate the frequency equivalent of the CP and the payload of the frame. After the two FFT modules finish the conversion, we can perform the second phase, the complex number multiplication, and thus decrease the processing time.
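The expected gain from a second FFT can be illustrated with a simple latency model. This is a rough sketch under assumptions, not a measurement: the function names are hypothetical, the model ignores buffering and control overhead, and the cycle counts in the usage lines are invented purely to show the arithmetic.

```python
def latency_single_fft(t_coeff, t_fft_data, t_mul):
    """One FFT: coefficient calculation, data FFT and multiplication run in sequence."""
    return t_coeff + t_fft_data + t_mul

def latency_dual_fft(t_coeff, t_fft_data, t_mul):
    """Two FFTs: the data FFT overlaps with the coefficient calculation,
    and the multiplication starts once both have finished."""
    return max(t_coeff, t_fft_data) + t_mul

# Invented cycle counts, for illustration only.
single = latency_single_fft(t_coeff=600, t_fft_data=512, t_mul=64)
dual = latency_dual_fft(t_coeff=600, t_fft_data=512, t_mul=64)
saved = single - dual
```

Under this model the saving equals the shorter of the two overlapped stages, so the benefit of the extra FFT is bounded by whichever of the coefficient calculation and the data transform finishes first. This saving has to be weighed against the resources and power of the additional module.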

Since the results presented in this paper are promising, our future plans include increasing the level of parallelization. Instead of a quad-parallel architecture, we can implement a higher degree of parallelization and achieve higher data rates. Here, an important limitation is the frequency of the ADC.

As mentioned, an important consideration is the reutilization of the implemented modules. Each of the two modulations has advantages and disadvantages depending on the scenario in which the system works. Dynamic switching between the two modulation techniques offers a great improvement of the system. Thus, our future plans include the implementation of a design able to share the modules between the two supported modulations.

## ACKNOWLEDGMENT

This work was partially supported by the ERC Starting Independent Researcher Grant VISION (Contract n. 240555).

## REFERENCES

- [1] S. Yong, P. Xia, and A. Valdes-Garcia, "60 GHz technology for Gbps WLAN and WPAN: from theory to practice," ISBN 9780470972939, 2011.
- [2] Agilent Technologies, "Wireless LAN at 60 GHz IEEE 802.11ad Explained : Application Note," 5990-9697EN, USA, 2013.
- [3] J. Park, B. Richards, and B. Nikolic, "A 2-Gb/s 5.6-mW Digital Equalizer for a LOS/NLOS Receiver in the 60GHz Band," IEEE Asian Solid-State Circuits Conference, China, Nov. 2010.
- [4] S. Binggeli, "Fractionally Spaced Equalizer for NLOS Receiver in the 60 GHz Band," Master Thesis, Berkeley University of California, 2011.
- [5] M. G. Parker, K. G. Paterson, and C. Tellambura, "Golay Complementary Sequences," Jan. 2004.
- [6] T. Chavdarova, G. Jakimovski, B. Jovanov, A. Tentov, and M. Malenko, "Analysis and implementation of frequency domain equalizer for single carrier system in the 60 GHz band," presented at 9th Annual International Joint Conferences on Computer, Information, Systems Sciences, and Engineering, CISSE Online E-conference, Dec. 12-14 2013 (to be published).
- [7] IEEE P802.11 Wireless LANs, "Complete Proposal for 802.11ad," doc: IEEE 802.11-10-0499-02-00ad, May 2010.
- [8] IEEE 802.15.3c (2009), "IEEE Standard for information technology Telecommunications and information exchange between systems – Local and metropolitan area networks – Specific requirements. Part 15.3: Wireless medium access control (MAC) and physical layer (PHY) specifications for high rate wireless personal area networks (WPANs) Amendment 2: Millimeter-wave-based alternative physical layer extension," [Online]. Available: http://www.ieee802.org/15/
- [9] Scott Foster, "Impulse Response Measurement Using Golay Codes," Trans, ICASSP '86, pp. 929-932, 1986. [Online]. Available: http://www.researchgate.net/publication/224737472_Impulse_response_measurement_using_Golay_codes
- [10] A. Hernández, J. Ureña, D. Hernanz, J. J. García, M. Mazo, J. Dérutin, J. Serot, and S. E. Palazuelos, "Real-time implementation of an efficient Golay correlator (EGC) applied to ultrasonic sensorial systems," March 27 2003. [Online]. Available: http://www.geintrauah.org/system/files/private/a2003.pdf
- [11] National Instruments, "What is I/Q Data?" Sep 12, 2013. [Online]. Available: http://www.ni.com/white-paper/4805/en/
- [12] Fung, Andy W. Y., "Suboptimal soft-bit level demapper for M-QAM-OFDM systems," Hong Kong, Dec. 2010.
- [13] C. Langton, "Inter Symbol Interference (ISI) and raised cosine filtering," 2002.
- [14] H. Xu, V. Kukshya, and T. S. Rappaport, "Spatial and temporal characteristics of 60-GHz indoor channels," IEEE J. Selected Areas in Communications, vol. 20, pp. 620-630, Mar. 2002.
- [15] F. Hsiao, A. Tang, D. Yang, M. Pham, M. F. Chang, "A 7Gb/s SC-FDE/OFDM MMSE Equalizer for 60GHz Wireless Communications," IEEE Asian Solid-State Circuits Conference, Korea, Nov. 2011.
- [16] Miloš Krstić, Maxim Piz, Marcus Ehrig, Eckhard Grass, "OFDM Datapath Baseband Processor for 1 Gbps Datarate," IHP, Im Technologiepark 25, Frankfurt, Germany.
- [17] Somasundaram Meiyappan, "Implementation and performance evaluation of parallel FFT algorithms," Matriculation No.: HT023601A, National University of Singapore, Singapore.
- [18] K. Takeda and F. Adachi, "Frequency-Domain MMSE Channel Estimation for Frequency-Domain Equalization of DS-CDMA Signals," IEICE Trans. Communications, vol. E90-B, pp. 1746-1753, July 2007.