A Simple Modelling Tool for Fast Combined Simulation of Interconnections, Inter-Symbol Interference and Equalization in High-Speed Serial Interfaces for Chip-to-Chip Communications

Article history: Received: 14 January, 2020 Accepted: 20 February, 2020 Online: 10 April, 2020


Introduction
This paper extends the work presented at the 42nd International Conference on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2019) [1] and describes a simple and efficient tool for fast system-level simulation of high-speed serial interfaces, a topic that has received much attention in the past two decades due to its relevance in modern electronic systems. As the miniaturization of CMOS integrated circuits (ICs) keeps following the path described by Moore's law [2], the number of components integrated onto single devices, the number of functionalities available on single ICs and their speed increase significantly every few years. Over the last decades, evidence has arisen that the major bottleneck in performance has shifted from computational capabilities and the associated power consumption towards communication between different ICs [3]. In fact, many applications such as modern microprocessors, servers, micro-controllers, FPGAs, even portable devices and, recently, automotive systems require High-Speed I/O (HSIO) modules capable of handling data rates up to 128 Gb/s with energies per bit as low as 1 pJ [4-9]. Moreover, since in many applications chip area and pin availability pose strict design constraints, the aforementioned devices cannot support parallel I/O that would reduce the data rate of individual channels, implying that such communications need to be implemented as high-speed serial interfaces (HSSI). In the present paper, the terms HSIO and HSSI will be used interchangeably to denote high-speed serial communication devices.
At multi-Gb/s data rates, performance is highly affected by impedance discontinuities in the interconnections due to PCB characteristics, presence of vias and package features; by non-perfect impedance matching due to fabrication imperfections or poor compatibility between different devices; and by the dispersive nature of the transmissive medium at high frequencies [4,10]. All these phenomena concur in causing Inter-Symbol Interference (ISI), which manifests itself as a smoothing and widening of the pulses sent along the channel, so that they are superimposed on other symbols transmitted in the neighbouring unit intervals (UIs), thus increasing the Bit-Error Rate (BER) at the receiver and dramatically impairing the quality of the transmission [4,10,11].
In order to cope with this, HSIOs are required to implement complex equalization strategies, either at the transmitter or at the receiver, and in the analog, mixed-signal or digital domains [10,12-15]. Such techniques include Feed-Forward Equalization (FFE) at the transmitter, Continuous-Time Linear Equalization (CTLE) and Decision-Feedback Equalization (DFE) at the receiver [10]. FFE uses an FIR filter that applies a pre-distortion to the transmitted pulses in order to preemptively compensate for the channel distortion; CTLE comprises a peaking amplifier mainly employed to compensate for the high-frequency attenuation of the channel and possibly provide additional gain control at low frequency; in DFE the history of recently received bits is stored in a shift register and used to correct the received analog signal in order to cancel ISI at the input of the slicer, either through FIR or IIR filters.
One of the main challenges in the design and implementation of such equalization techniques is the fact that HSIOs are supposed to operate on a variety of channels whose features are unknown at design time. Thus, the optimal parameters of the equalizers cannot be precisely known and set a priori during the design phase, unless the resulting suboptimal performance can be tolerated, when it does not completely impede communication. Even in such rare cases where the transmissive medium is well known, the design itself is intrinsically dependent on process, voltage and temperature (PVT) variations and technology corners, all of which need to be counteracted by the equalizers. Therefore, calibration and adaptation strategies are required in order to find the optimal equalization parameters for the actual channel [13,15-17]. Full adaptation automatically performs such a task, and is usually implemented in the form of Sign-Sign Least-Mean Squares (SS-LMS) algorithms due to the short time required to adjust the equalization parameters and the simplicity of their realization [11,12,15,16,18-21].
Moreover, HSIOs are also equipped with algorithms for clock and data recovery (CDR), and even performance monitoring, hence making up very complex electronic systems [12,22]. Such complexity cannot be conveniently handled through transistor-level descriptions because of the extremely long simulation times that they require. Therefore, various system-level models have been proposed in the last decades to aid the design of HSSIs, mainly using statistical techniques [19,23-26]. Such tools are very important for the initial system-level assessment in the design of chip-to-chip HSIOs: they help to select design specifications such as the number of equalization taps and the amount of high-frequency content that needs to be equalized, and to evaluate the Signal-to-Noise Ratio (SNR) and the overall jitter that can be tolerated without degrading the BER.
The design and analysis of HSIOs comprising a variety of complex equalization techniques require efficient system-level models capable of producing fast and accurate predictions of the system behaviour. Extending the work presented in [1] and relating it with contributions from [19,27], this paper shows how fast system-level simulations of high-speed serial interfaces can be performed with a simple modular model. The paper proceeds as follows: Section 2, starting from the architecture of a generic HSSI, describes the numerical model, how it evaluates performance accounting for jitter and how fully-adaptive equalization is computed; Section 3 shows some sample simulation results and comparisons with post-layout transistor-level simulations, demonstrating the capabilities of the proposed approach; finally, conclusions are drawn in Section 4.

Figure 1: Scheme of a generic high-speed serial interface with equalization: $h_{tot,i}$ is the overall pulse response of the TX (FFE+driver) + channel + RX (amplifier+CTLE+DFE) system; $c$ is a generic equalization parameter (e.g. a filter tap), which may be either statically set or automatically adapted [19].

Architecture of the Transceiver
In order to accurately model the system performance of a generic HSIO device, the general model depicted in Figure 1 and extensively described in [19] is considered. Denoting by the subscript $i$ the sampling instant $t_i = iT_b$ (where $T_b$ corresponds to a bit period, i.e. one Unit Interval, UI), the data sequence $d_i$ is sent by the differential transmitter TX at a bit rate $f_b = 1/T_b$, optionally implementing FFE; the channel, whose sampled pulse response is $h_{ch,i}$, can be modelled either as two independent single-ended lines or as a coupled differential line; the receiver RX contains an amplifier, a CTLE and a DFE, and produces the analog voltage $y_i$; the slicer makes decisions on such a voltage ($\hat{d}_i = \mathrm{sign}(y_i)$, i.e. $\hat{d}_i = 1$ if $y_i > 0$ V and $-1$ otherwise) and its sampling point can be modified with respect to the one determined by the CDR in order to perform optimal sampling [12,19]; moreover, when performing full adaptation, the analog voltage $y_i$ is compared with a reference voltage dLev [15] to determine the error $e_i$ (i.e. the distance of the actual sample from dLev, usually defined as the desired voltage level corresponding to a '1' bit), and this information is used to perform adaptation. Assuming that the BER is small (either because the channel has low loss or because it is well equalized), the reconstructed data $\hat{d}_i$ is equal to the transmitted data $d_i$.

Numerical Model of the Transceiver
The numerical model implemented in Matlab exploits a fast approach for the modelling of ISI and equalization in HSIOs, the flowchart of which is here summarised in Figure 2 and detailed in the following paragraphs.

Transmitter
The idealised transmitted waveform is modelled as the trapezoidal pulse $v_{pulse}(t)$, shown in Figure 3a and characterised by its duration, amplitude and by the slope of its edges (rise and fall times $t_{rise} = t_{fall}$); such a waveform is easily Fourier-transformed, giving $V_{pulse}(f)$, to which FFE is applied by summing weighted delayed versions of the transformed pulse itself:

$$V_{tx}(f) = \sum_{n=0}^{N_{ffe}-1} w_n\, V_{pulse}(f)\, e^{-i 2\pi f n T_b}, \tag{1}$$

where $w_n$ are the weights of the $N_{ffe}$ FFE taps, subject to the constraint $\sum_{n=0}^{N_{ffe}-1} |w_n| = 1$ due to the fact that the power available in the driver is limited. The effect of such an operation is shown in the frequency domain in Figure 3b. The proposed approach considers a transmitter impedance that is kept constant and does not change at high or low outputs, as is the case in Input-Output Buffer Information Specification (IBIS) models [28,29]. Moreover, the pulse shape used in the proposed method (trapezoidal shape with $t_{rise} = t_{fall}$) is chosen in order to exploit the channel's linearity and hence use the channel's pulse response in the various computations instead of its step response [26]; in other words, a sequence of pulses $v_{pulse}(t)$ having the same amplitude results in a constant voltage level, which is not the case with other pulse shapes, see [30].

Figure 2: Diagram of the procedure implemented to obtain the received pulse response [1]. $F$ and $F^{-1}$ stand for the Fourier and the inverse Fourier transforms, respectively. (Diagram inputs: microstrip geometry and parameters; package, PCB, etc.; CTLE's poles and zeroes.)
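The two properties stated for the transmitter (consecutive equal-amplitude trapezoids add up to a constant level, and FFE is a weighted sum of delayed pulses) can be checked with a short time-domain sketch. This is only an illustration, not the authors' Matlab code: the function names, the normalised bit period of 1 and the convention that the 50 % crossings of the pulse fall at 0 and `t_bit` are assumptions; the tap weights are the ones quoted later in the Results section.

```python
def trapezoid(t, t_bit=1.0, t_edge=0.2, amp=1.0):
    """Trapezoidal pulse with equal rise/fall time t_edge (assumed convention:
    the 50 % crossings of the edges are at t = 0 and t = t_bit)."""
    half = t_edge / 2.0
    if t <= -half or t >= t_bit + half:
        return 0.0
    if t < half:                                  # rising edge
        return amp * (t + half) / t_edge
    if t <= t_bit - half:                         # flat top
        return amp
    return amp * (t_bit + half - t) / t_edge      # falling edge


def pulse_train(t, n_bits, t_bit=1.0, t_edge=0.2):
    """Sum of n_bits identical pulses sent in consecutive unit intervals."""
    return sum(trapezoid(t - n * t_bit, t_bit, t_edge) for n in range(n_bits))


def ffe_pulse(t, weights, t_bit=1.0, t_edge=0.2):
    """Time-domain counterpart of the FFE: each tap adds a copy of the pulse
    delayed by one more bit period and scaled by its weight."""
    return sum(w * trapezoid(t - n * t_bit, t_bit, t_edge)
               for n, w in enumerate(weights))


# Tap weights quoted in the Results section; their absolute values sum to 1.
weights = (-0.25, 0.75)
level_mid_train = pulse_train(2.0, 5)   # sampled on an edge between two bits
```

Because the falling edge of one pulse and the rising edge of the next are complementary, `pulse_train` stays flat across bit boundaries, which is the property that lets the pulse response replace the step response.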

Communication Channel
The transmissive medium is generally modelled as a differential line, made up either of two independent microstrips (to simulate lines placed at some distance from each other so as to minimise interactions) or of a single coupled microstrip excited with an odd mode (to realistically reproduce differential signalling). This is then used to reproduce the salient features of any other type of transmission line, such as the target attenuation at a certain frequency or its characteristic impedance.
The microstrip features are computed from the line's geometry and material parameters following the approach defined in [31-36] for single-ended lossy microstrips, or extended to the coupled case according to [37,38], and then used to extract its per-unit-length parameters $r(f)$, $l(f)$, $c(f)$, $g(f)$ considering dielectric losses and skin effect, all of which are among the main contributors to ISI [4].
Such a result can then be combined with $H_{ext}(f)$, a transfer function representing the socket or package and incorporating notch filters, which can be used to take into account discontinuities, vias, etc. in order to provide a complete description of realistic channels.
$H_{ext}(f)$ can be calculated with a model of the package, e.g. in terms of parasitic resistance, inductance and capacitance, which allows a straightforward evaluation of its transfer function in terms of poles and zeroes, while the contributions due to impedance discontinuities or vias can be taken into account by fitting the features of actual measurements of the transmission line's S parameters to the transfer function of notch filters of the form (2), where $\xi$ is the filter's damping factor and $f_0$ is its notch frequency. Using as additional parameters the driver's and the receiver's termination impedances ($Z_{tx}$ and $Z_{rx}$, respectively), the transmission line transfer function is computed from the telegrapher's equations as in (3), where $\gamma = \sqrt{(r + i\omega l)(g + i\omega c)}$ is the propagation coefficient, $L$ is the line length and $\Gamma_{tx/rx} = \frac{Z_{tx/rx} - Z_0(f)}{Z_{tx/rx} + Z_0(f)}$ are the reflection coefficients corresponding to the transmitter and the receiver. Note that, due to the inclusion of $Z_{tx/rx}$, (3) takes into account possible non-perfect matching among driver, transmission line and receiver, which is shown as an example in Figure 4: the mismatch produces a reflection that contributes to ISI.
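As an illustration of how termination mismatch enters the channel response, the sketch below evaluates a textbook solution of the telegrapher's equations for a line driven through $Z_{tx}$ and loaded by $Z_{rx}$. It is an assumption that this matches the normalisation used in the paper; the notch expression is likewise just one common second-order form built from $\xi$ and $f_0$, and all function names are hypothetical.

```python
import cmath
import math


def gamma_z0(f, r, l, g, c):
    """Propagation coefficient and characteristic impedance at frequency f
    (f > 0) from per-unit-length r, l, g, c."""
    w = 2.0 * math.pi * f
    series = complex(r, w * l)
    shunt = complex(g, w * c)
    return cmath.sqrt(series * shunt), cmath.sqrt(series / shunt)


def line_transfer(f, length, r, l, g, c, z_tx, z_rx):
    """Voltage at the receiver termination divided by the driver's source
    voltage (textbook form, assumed here; includes multiple reflections)."""
    gamma, z0 = gamma_z0(f, r, l, g, c)
    g_tx = (z_tx - z0) / (z_tx + z0)      # reflection coefficient at the driver
    g_rx = (z_rx - z0) / (z_rx + z0)      # reflection coefficient at the receiver
    e = cmath.exp(-gamma * length)        # one-way propagation factor
    return (1 - g_tx) * (1 + g_rx) * e / (2 * (1 - g_tx * g_rx * e * e))


def notch(f, f0, xi):
    """Assumed second-order notch: unit gain at DC, zero at f0, width set by xi."""
    return (f0 ** 2 - f ** 2) / complex(f0 ** 2 - f ** 2, 2.0 * xi * f0 * f)


# Lossless 50-ohm line (l/c = 2500 ohm^2), 10 cm long, evaluated at 1 GHz.
h_matched = line_transfer(1e9, 0.1, 0.0, 250e-9, 0.0, 100e-12, 50.0, 50.0)
h_mismatch = line_transfer(1e9, 0.1, 0.0, 250e-9, 0.0, 100e-12, 40.0, 60.0)
```

With perfect matching the magnitude reduces to one half (the usual source/line voltage divider), whereas the mismatched case picks up a frequency-dependent ripple from the term in the denominator, which is the reflection that contributes to ISI.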

Receiver
The CTLE is modelled as a rational function $H_{CTLE}(f)$ characterised by the CTLE's poles and zeroes; optionally, an extraction of the CTLE's transfer function from simulations of the transistor-level HSIO can be used to reproduce a realistic implementation more accurately (and the frequency response of other analog blocks in the receiver can be similarly taken into account). The received signal associated with the transmitted trapezoidal pulse has spectrum $V_{rx}(f) = V_{tx}(f) H_{ch}(f) H_{CTLE}(f)$; an example of this is shown in Figure 3c with some of its sub-components.
Figure 5: Procedure used by the model to apply the DFE correction to the received analog pulse response $h_{rx}(t)$ to obtain $h(t)$ and eventually its sampled version $h_{eq}$. The DFE taps are rectangular pulses 1 UI wide and centred on the sampling point determined by the CDR.
The received analog pulse response $h_{rx}(t)$ is then obtained through the inverse Fourier transform of $V_{rx}(f)$ using the procedure in [39] (an example of $h_{rx}(t)$ is shown in Figure 3d). The DFE correction is applied as shown in Figure 5, i.e. by subtracting from $h_{rx}(t)$ rectangular pulses with amplitude equal to the tap weights $a_i$ and centred on the sampling point determined by the CDR, which yields the pulse response $h(t)$ at the slicer's input.
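The DFE step lends itself to a compact sketch: on a time grid with a fixed number of samples per UI, each tap removes a 1-UI-wide rectangle centred one more bit period after the sampling point. The code below is a minimal illustration under stated assumptions (a toy pulse response built by linear interpolation between made-up cursor values, the data sample taken at the peak, hypothetical function names), not the tool's actual implementation.

```python
def apply_dfe(h_rx, taps, i_samp, spb):
    """Subtract one rectangle per DFE tap from the sampled-in-time pulse
    response h_rx. Tap k (k = 1, 2, ...) is 1 UI (spb samples) wide and is
    centred k bit periods after the sampling index i_samp."""
    h = list(h_rx)
    for k, a in enumerate(taps, start=1):
        lo = i_samp + k * spb - spb // 2
        for n in range(max(lo, 0), min(lo + spb, len(h))):
            h[n] -= a
    return h


def sample_ui(h, i_samp, spb, n_pre, n_post):
    """Sample h once per UI around i_samp: pre-cursors, cursor, post-cursors."""
    return [h[i_samp + m * spb] for m in range(-n_pre, n_post + 1)]


# Toy pulse response (illustrative numbers only): 8 samples per UI.
spb = 8
cursors = [0.0, 0.05, 0.6, 0.25, 0.1, 0.03, 0.0]
h_rx = []
for n in range((len(cursors) - 1) * spb):
    k, frac = divmod(n, spb)
    h_rx.append(cursors[k] + (cursors[k + 1] - cursors[k]) * frac / spb)

i_samp = h_rx.index(max(h_rx))            # data sample at the peak of h_rx
taps = [0.25, 0.1]                        # two taps set to the first two post-cursors
h_eq = sample_ui(apply_dfe(h_rx, taps, i_samp, spb), i_samp, spb, 1, 3)
```

In this toy case the first two post-cursors of `h_eq` vanish while the pre-cursor and the third post-cursor are left untouched, since the DFE only acts on the positions covered by its taps.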
The procedure above implicitly assumes that the CDR has reached its steady state and is locked. Its impact on the behaviour of the HSSI is twofold: it determines the sampling point for data, error and edge samples (which is related to the position of the "rectangles" associated with the DFE, as mentioned above), while the jitter at its output is responsible for a degradation of the BER (as will be explained in Section 2.4). As far as the sampling point is concerned, we can simply assume that the data samples correspond to the maximum of the pulse response $h(t)$; alternatively, an Alexander CDR [40] can be emulated by determining the time instants corresponding to $h_{rx,-0.5}$ and $h_{rx,0.5}$ (which are the positions of the edge samples of the CDR in a real implementation [27]) and then assuming that the data sample lies exactly in between. On top of this, an algorithm for optimal sampling may be used to determine a shift from the output of the CDR, which results in an improved sampling position [12,19]. Any of the above can be selected, and all of them aim at sampling as close to the centre of the eye as possible in order to reduce the probability of error.
Sampling at the bit period $T_b$ is eventually performed in order to obtain the sampled pulse response $h_{eq}$.

Evaluating the HSSI Performance
The performance of the transceiver is determined mainly by computing the eye diagram, constructed by folding the received signal over a time length of 1 UI, which makes it possible to observe all the transitions that take place during operation of the serial link and their density; and by calculating the bathtub plot, which shows the cumulative distribution function of the received errors over the same time span as the eye diagram, indicating the sampling positions that result in an increased BER [4]. Both metrics require probabilistic calculations in order to keep computation times low [19,23].
From the sampled pulse response $h_{eq}$ one can compute all possible values of the voltage $y_i$ at the samplers, due to all the possible sequences of bits that can be sent, as

$$L = P\, h_{eq}, \tag{4}$$

where $L$ is a column vector containing such voltage levels and $P$ is a permutation matrix which contains all the possible bit sequences of a certain length that can be transmitted. In fact, $P$ is structured as a truth table: it features a number of columns equal to $n_{pre} + 1 + n_{post}$, where $n_{pre}$ and $n_{post}$ are the number of pre- and post-cursors, respectively, which can be chosen according to their relevance in the pulse response; while the number of rows is $2^{n_{pre}+1+n_{post}}$, i.e. the number of all possible sequences composed of $n_{pre} + 1 + n_{post}$ bits. In other words, $L$ considers all the possible ways in which the samples of the pulse response can combine due to ISI, hence simulating observation of the received analog voltage $y_i$ over a sufficiently long time span. Moreover, $L$ implicitly depends on the choice of the sampling instant $t_s$ through the sampled pulse response $h_{eq}$. By assuming that the eye is vertically symmetric (the transceiver behaviour when transmitting a '1' bit is the same as though a '0' was sent, just with a sign reversal), only the cases in which $\hat{d} = 1$ are useful for the purpose of computing the HSIO performance. By coding the '1' and '0' bits as 1 and $-1$ values, respectively, and keeping only the non-redundant rows, e.g. for one pre- and two post-cursors such a reduced matrix (denoted by $'$) reads

$$P' = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & -1 \\ 1 & 1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ -1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \\ -1 & 1 & -1 & 1 \\ -1 & 1 & -1 & -1 \end{bmatrix}, \tag{5}$$

where the second column, corresponding to the main cursor, is fixed to 1.

Equation (4) provides an easy way to compute the eye diagram and the bathtub plot. The eye diagram can be computed by sweeping the sampling instant $t_s$ across the bit period $T_b$ (1 UI) and creating histograms $eye_1(V, t_s)$ of the corresponding $L(t_s)$; due to the fact that the eye for the '0' bit is just the flipped version of the one for the '1' bit (as follows from the assumption of symmetry), they can be combined to obtain the overall eye diagram

$$eye(V, t_s) = eye_1(V, t_s) + eye_0(V, t_s), \qquad eye_0(V, t_s) = eye_1(-V, t_s). \tag{6}$$

The bathtub plot, i.e. the BER corresponding to a voltage threshold equal to 0 V as a function of the sampling instant $t_s$, is then given by the probability that a '1' bit is misinterpreted as a '0' (which is the same as the probability that a '0' bit is misinterpreted as a '1', due to the above assumption of symmetry):

$$BER(t_s) = \Pr\left[\, y(t_s) < 0 \mid d = 1 \,\right]. \tag{7}$$

The overall procedure for computing the eye diagram and the bathtub plot is shown in Figure 6, summarising the flow described in this Section and in the following.
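The truth-table construction can be sketched in a few lines: enumerate every combination of the interfering bits with the cursor bit fixed to 1, multiply by the sampled pulse response, and count how many of the resulting levels fall below the 0 V threshold. This is a noise-free illustration that assumes equiprobable bit sequences; the function names and the two sample pulse responses are made up for the example.

```python
from itertools import product


def reduced_p(n_pre, n_post):
    """Rows of the reduced truth table: all +/-1 patterns of the pre- and
    post-cursor bits, with the cursor bit fixed to +1 (a '1' is received)."""
    rows = []
    for bits in product((1, -1), repeat=n_pre + n_post):
        rows.append(bits[:n_pre] + (1,) + bits[n_pre:])
    return rows


def levels(h_eq, n_pre, n_post):
    """All voltages at the sampler for a transmitted '1' (matrix-vector product)."""
    return [sum(b * h for b, h in zip(row, h_eq))
            for row in reduced_p(n_pre, n_post)]


def ber_zero_threshold(lv):
    """Fraction of levels on the wrong side of 0 V, assuming equiprobable
    bit sequences and no noise or jitter."""
    return sum(1 for v in lv if v < 0) / len(lv)


# One pre-cursor, cursor, two post-cursors (illustrative numbers only).
open_eye = levels([0.05, 0.6, 0.25, 0.1], 1, 2)
closed_eye = levels([0.1, 0.4, 0.35, 0.2], 1, 2)
```

Repeating the calculation for each sampling instant across the UI, and histogramming the levels instead of only thresholding them, gives the eye diagram and the bathtub curve described above.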

Including the Effect of Jitter
The effect of jitter on the receiver can be optionally taken into account by simply convolving the single $eye_{1/0}$ of (6) with the probability density function $pdf_x(t_s)$ of the sampling time $t_s$ corresponding to the jitter component of interest [41]. As an example, considering an oscillator in the receiver affected by random jitter, the period jitter of which has standard deviation $\sigma_{pj}$ (in other words, a clock with phase noise going as $1/f^2$, which means that the jitter values in different periods are uncorrelated) and a CDR having a bandwidth $BW_{cdr}$, the variance $\sigma_{rj}^2$ of the absolute jitter present in the recovered clock can be easily shown to be given by (8). The simple example shown in Figure 7 assumes a jitter characterised by a Gaussian distribution described as

$$pdf_{rj}(t_s) = \frac{1}{\sqrt{2\pi}\,\sigma_{rj}} \exp\left(-\frac{t_s^2}{2\sigma_{rj}^2}\right), \tag{9}$$

where $\sigma_{rj}$ is the standard deviation of the random jitter affecting the recovered clock as per (8), which in general may include other sources than random jitter of the clock alone.
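A minimal sketch of the jitter step follows: the eye is stored as one voltage histogram per time column, and each voltage bin is convolved along the time axis with a discretised Gaussian. Treating the time axis as periodic over 1 UI, truncating the kernel at four standard deviations and the function names are assumptions made for this illustration.

```python
import math


def gaussian_kernel(sigma, dt, n_sigma=4.0):
    """Discretised zero-mean Gaussian pdf on a grid of step dt, truncated at
    n_sigma standard deviations and normalised so that the weights sum to 1."""
    half = int(math.ceil(n_sigma * sigma / dt))
    w = [math.exp(-0.5 * (k * dt / sigma) ** 2) for k in range(-half, half + 1)]
    total = sum(w)
    return [x / total for x in w]


def add_jitter(eye, kernel):
    """Convolve every voltage bin of the eye along the time axis with the
    jitter kernel. eye[t][v] is the histogram value at time column t and
    voltage bin v; the time axis wraps around (assumed periodic over 1 UI)."""
    n_t, n_v = len(eye), len(eye[0])
    half = len(kernel) // 2
    out = [[0.0] * n_v for _ in range(n_t)]
    for t in range(n_t):
        for k, wk in enumerate(kernel):
            src = (t - (k - half)) % n_t
            for v in range(n_v):
                out[t][v] += wk * eye[src][v]
    return out


# Toy eye: 32 time columns per UI, 3 voltage bins, one populated cell.
n_cols = 32
eye = [[0.0, 0.0, 0.0] for _ in range(n_cols)]
eye[16][1] = 1.0
kernel = gaussian_kernel(sigma=0.05, dt=1.0 / n_cols)   # sigma in UI
jittered = add_jitter(eye, kernel)
```

The populated cell is smeared horizontally into a bell shape while the total histogram mass is preserved, which is how jitter narrows the horizontal eye opening and lifts the sides of the bathtub curve.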

Including Fully-Adaptive Equalization
As briefly mentioned in the Introduction, the problem concerning the optimal settings of the equalizers' parameters is not trivial, and often one must resort to full adaptation in order to automatically find optimal equalization parameters. The implementation of fully-adaptive techniques based on an SS-LMS algorithm in the simulation approach described in this paper is relatively straightforward and was thoroughly described in [19]. Briefly, it involves computing quantities in the form

$$c^{(k+1)} = c^{(k)} + \mu_c\, \mathrm{sign}(\hat{d}_i)\, \mathrm{sign}(e_j), \tag{10}$$

where $\mu_c$ is the step size, $\hat{d}_i$ is the data sample received at time $i$, $e_j$ is the error at time $j = i + k$ between the analog voltage $y_j$ at the samplers and the desired voltage dLev corresponding to a '1' bit, and $c^{(k)}$ is the $k$-th iteration on a generic parameter that can be adapted (the tap amplitudes of FFE or DFE, the positions of poles and zeroes in a CTLE modelled e.g. as $H_{ctle}(s) = c_0 + c_1 s$, the sampling phase as well as dLev itself). Note that $\mathrm{sign}(\hat{d}_i)\,\mathrm{sign}(e_j)$ corresponds to correlating the error made at a certain time with the bit received at possibly another time in the past or in the future, where obviously the latter can be considered only when data and errors are parallelized before computation of the fully-adaptive algorithm [8,17,27,42-44]. Such correlations provide information on whether to increase or decrease the corresponding parameter $c$ and bring it to convergence, and are usually collected and averaged over time [12,27]. Such a correlation can be easily evaluated from $h_{eq}$ by multiplication with a matrix similar to $P'$ of (5) (further mathematical details are given in [19]). By applying equalization to the pulse response of the transceiver plus channel, as mentioned in Section 2.2, iteration of (4) and (10) provides the evolution over time of the adaptation procedure until the equalization parameters converge in the neighbourhood of the optimum [19].
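To make the sign-sign update concrete, the sketch below adapts dLev and the DFE taps on a toy sampled pulse response. Unlike the statistical evaluation used in the tool (which works on $h_{eq}$ through the truth-table matrix), this illustration simply runs the update bit by bit on random data; the function name, step size, iteration count and pulse-response values are assumptions.

```python
import random


def sign(x):
    return 1 if x >= 0 else -1


def ss_lms(h_eq, n_iter=5000, mu=0.002, seed=0):
    """Bit-by-bit sign-sign LMS adaptation of dLev and of the DFE taps.
    h_eq = [h0, h1, ..., hN]: main cursor followed by N post-cursors
    (no pre-cursor in this toy case)."""
    rng = random.Random(seed)
    n_taps = len(h_eq) - 1
    taps = [0.0] * n_taps
    d_lev = 0.0
    past = [1] * n_taps        # transmitted bits d[i-1], d[i-2], ...
    past_hat = [1] * n_taps    # decided bits used by the DFE feedback
    for _ in range(n_iter):
        d = rng.choice((-1, 1))
        # Voltage at the slicer: cursor + channel post-cursor ISI - DFE feedback.
        y = (h_eq[0] * d
             + sum(h * b for h, b in zip(h_eq[1:], past))
             - sum(a * b for a, b in zip(taps, past_hat)))
        d_hat = sign(y)
        s = sign(y - d_hat * d_lev)          # sign of the error w.r.t. dLev
        d_lev += mu * s * d_hat              # correlate error with current bit
        for k in range(n_taps):
            taps[k] += mu * s * past_hat[k]  # correlate error with older bits
        past = ([d] + past)[:n_taps]
        past_hat = ([d_hat] + past_hat)[:n_taps]
    return d_lev, taps


d_lev, taps = ss_lms([0.6, 0.25, 0.1])
```

In this toy setting dLev settles near the main cursor and each tap near its post-cursor, then keeps dithering by a few step sizes around those values, which is the "neighbourhood of the optimum" behaviour expected from a sign-sign update.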

Results
As an example of the power and versatility of the proposed approach, we consider here the fully-adaptive equalization of an HSIO transmitting at 20 Gb/s with rise/fall times equal to 20 % of the UI and ±0.25 V differential voltage swing on a channel attenuating 12 dB at 10 GHz. A fixed 2-tap FFE with a pre-emphasis of approximately 6 dB ($w_{-1} = -0.25$, $w_0 = 0.75$) was applied at the transmit side to reduce the first pre-cursor. The package was described by an LC π-network with L = 2 nH and C = 100 fF, while an impedance discontinuity was modelled by adding a notch filter with $\xi = 0.1$ centred at $f_0 = 27$ GHz; both features were included through the function $H_{ext}(f)$ mentioned in Section 2.2.2. Figure 8 shows the resulting simultaneous adaptation of dLev, CTLE's zeroes, DFE taps and sampling phase as a function of the number of iterations performed by the fully-adaptive algorithm described above. As expected, dLev converges to the average value corresponding to the '1' bit (i.e. the peak value of $h(t)$, $h_0$) and, as it approaches such a value, the other equalization parameters start to adapt and eventually converge: the DFE taps reach the values of the corresponding post-cursors ($h_1$, $h_2$ and $h_3$), while the phase is shifted w.r.t. the position determined by the CDR to a value that zeroes the first pre-cursor. Frequency representations of all relevant signals and transfer functions of the transceiver are plotted in Figure 9a and the unequalised eye diagram is depicted in Figure 9b; the channel pulse response is shown in Figure 9c prior to equalization, after the fixed FFE and after full adaptation, and the eye diagram corresponding to the fully adapted case is shown in Figure 9d.
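The quoted pre-emphasis figure can be cross-checked from the two tap weights. The sketch assumes the common definition of pre-emphasis as the ratio between the FFE gain at the Nyquist frequency (where taps of alternating sign add in magnitude) and its gain at DC; the function name is hypothetical.

```python
import math


def pre_emphasis_db(weights):
    """Ratio, in dB, between the Nyquist-frequency gain and the DC gain of an
    FIR whose taps alternate in sign (assumed definition of pre-emphasis)."""
    nyquist_gain = sum(abs(w) for w in weights)
    dc_gain = abs(sum(weights))
    return 20.0 * math.log10(nyquist_gain / dc_gain)


ffe_weights = (-0.25, 0.75)          # w_-1 and w_0 from the example above
boost_db = pre_emphasis_db(ffe_weights)
```

With these weights the Nyquist gain is 1 and the DC gain is 0.5, i.e. a factor of two, consistent with the "approximately 6 dB" stated in the text.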
In order to validate the proposed numerical model, a comparison was carried out in [27] between the eye diagram obtained with the model itself and that obtained through post-layout transistor-level simulations. An HSSI for automotive applications implemented in 28 nm planar CMOS technology was simulated at 12 Gb/s at transistor level with full adaptation enabled; the numerical model was then used as a comparison in terms of performance and behaviour of the SS-LMS adaptive algorithm when the HSIO was used to communicate over a realistic, high-loss channel (−33 dB at 6 GHz) representing a transmission line as will likely be defined by the MIPI A-PHY standard. The results of Figure 10 show a good degree of accuracy in reproducing the transistor-level simulations when relevant features of the post-layout transistor-level implementation (chiefly, the transfer functions of the CTLE and of the Variable-Gain Amplifiers in the receiver) were extracted and used in the tool, as explained in Section 2.2.3. For comparison, in order to observe convergence of the fully-adaptive algorithm and be able to compute an eye diagram containing enough UIs, transistor-level simulations ran at a speed of about 6 h per µs of simulation for at least 1 µs, whereas the proposed method takes about 5 s to provide the results. Preliminary tests employing behavioural models for the HSSI indicate a simulation speed of about 7.3 s per µs of simulation (not shown). The above is meant to be just a rough comparison, mainly because the various models do not necessarily implement all the components of an HSIO (e.g. the transistor-level simulation does not consider the digital part of the system), but it still provides some figures to consider when dealing with this kind of simulation.

Conclusions
We have presented a fast tool exploiting a simple modelling approach to evaluate the performance of high-speed serial interfaces for chip-to-chip communications. An efficient probabilistic algorithm was developed to evaluate the eye diagram with low computational effort. Sharing the same motivations as other similar models developed in the past in the literature, such an approach represents a powerful alternative to time-domain simulations, since complex systems working with BER as low as $10^{-15}$ require simulating very large numbers of bit periods, which may be very time consuming. The effects of equalization techniques, of jitter and of full adaptation of the equalizers can be easily included in the model, so that the proposed simulation approach can be used for the system-level assessment of high-speed interfaces that need to comply with various standards. As examples of the capabilities of the proposed approach, we reported results from two interfaces: one transmitting at 20 Gb/s over a relatively low-loss channel (−12 dB at Nyquist frequency) and another operating at 12 Gb/s over a high-loss MIPI A-PHY line (−33 dB at Nyquist frequency). Both cases show that the combination of various equalization techniques is required to obtain suitable BERs, and that the proposed approach provides results that are comparable with much longer, time-domain post-layout transistor-level simulations, thus demonstrating the power of our model to evaluate the performance of realistic high-speed serial interfaces.

Conflict of Interest
The authors declare no conflict of interest.