Design and Implementation of Reconfigurable Neuro-Inspired Computing Model on a FPGA

Article history: Received: 23 July, 2020; Accepted: 12 September, 2020; Online: 17 September, 2020

Abstract
In this paper we design a large-scale reconfigurable digital bio-inspired computing model. We consider the reconfigurable and event-driven parameters in the developed field-programmable neuromorphic computing system. Intellectual Property (IP) cores are developed for modules such as Block RAM, Differential Clock, Floating Point, and First In First Out (FIFO) for the design of the neuron model in Xilinx ISE, with exploration of register transfer level (RTL) and hardware synthesis using Verilog code. The device-level architecture offers the best possible design tradeoff for specific processor architectures and development choices. We perform the algorithmic design of a large-scale reconfigurable logical bio-inspired computing model, and the proposed algorithm is implemented on a Field Programmable Gate Array (FPGA) to develop a neuron model for use in neuromorphic computing systems.


Introduction
This manuscript extends work originally presented at the International Conference on Artificial Intelligence and Signal Processing [1]. Bio-inspired computing is built from the core building blocks of neuromorphic engineering, mainly circuits and systems, and has been proposed in the form of spin-device structures [2]. This direction opens a path toward developing bio-inspired algorithms and the computing systems that implement them [3]. In nature, synapses play the crucial role in learning and memory. Synapses are plastic intercellular connections between neurons, and together these biological structures form the basic building blocks of neural networks [4]. Synapses can change their state based on the neural activity of the coupled neurons. Mimicking the functionality of neurons and synapses in hardware using very large scale integration (VLSI) technology plays a key role in the design of neuromorphic computing systems [5]. The pathway to efficient neuromorphic systems is to encode the neural and synaptic functionality in electron spin. This approach has the potential to exploit the energy efficiency, performance, reliability, electric-field control of magnetization, and enhanced memory density of spintronic memory devices [6]. Bio-inspired computing systems are presented with the aim of establishing a framework for interaction between natural-system computation and artificial-system computation [7].
In in-memory computing with emerging memory devices, memory and logic are not separated, which overcomes the von Neumann bottleneck. In-memory computing devices are also designed with zero off-state power, which gives them the distinct advantage of a non-volatile state [8]. In-memory computing is combined with a high gate or synapse density, which enables cross-bar arrays in the device that can be easily integrated with CMOS at high density. Such devices operate at high current and voltage and therefore consume high dynamic power; their long switching times limit their speed, and their endurance is limited. On the other hand, cross-bar in-memory computing is highly parallel, low power, and low cost [9]. The verticals of cross-bar in-memory computing are bio-inspired computing, deep learning, in-memory logic, chip/data security, architecture, and device modeling [10]. Static random access memory (SRAM) latches and capacitors are used in VLSI devices. The architecture depicted in Figure 1 implements distributed memory elements in this in-memory computing device and is aimed at supporting the use of memristive devices as digital and synapse-like memory elements.
The key contributions of the research manuscript are:
• Algorithmic design of a large-scale reconfigurable logical bio-inspired computing model.

Background
Emerging research architectures support memory-based computing and exponential performance scaling, which enables mixed-mode technology solutions. Bridging natural and artificial computation is one of the motivations for bio-inspired computation in artificial systems [11]. The paradox of programming a bio-inspired computer is that a new class of algorithms must be worked out while some very important basic concepts are still missing. Intelligent computational systems start from Boolean logic or functions, pass through a logical phase, a semiconductor technology phase, and a computational complexity phase, and end at an experimental computation phase [12]. The lesson is that each stage of dealing with intelligence differs from its counterpart in computation. The first stage probes evolution, complexity, and thermodynamics, which are not equivalent to Boolean logic or functions. The second stage requires novel electronics technology, which is not equivalent to the electronics technology defined during the computation phase. In the next stage, implementation complexity is not equivalent to computational complexity, and the final stage, practical intelligence, is not equivalent to the practical computation stage defined during the process of computation [13].
Synaptic plasticity is the factor that determines the magnitude of the synaptic weights. Plasticity is also called the learning of the synaptic junctions, and it gives the bio-inspired architectures their cognitive abilities [14]. For analysis, consider an experiment on a circuit with four access transistors that decouple the read and write current paths, with peripheral circuits for the timing window, aimed at implementing spike-timing-dependent plasticity (STDP). Apart from these four transistors, one more transistor, named MSTDP, is connected to the pre-charge line in the circuit. This transistor is responsible for implementing STDP and is biased in the sub-threshold saturation regime. The gate voltage of the MSTDP transistor is called the PRE voltage, and it starts increasing linearly when the pre-neuron spikes [15]. When the post-neuron is triggered, the POST signal is activated and current flows through the device. This current, known as the programming current, is a write current of 1 ns duration whose magnitude is exponentially related to the delay between the pre-neuron and post-neuron spikes. STDP is measured as the percentage change in synaptic weight against the spike timing difference (in ms), so the synaptic weight update depends on the difference in timing between the post-neuron and pre-neuron spikes [16].
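The exponential dependence of the weight update on spike timing can be illustrated with a minimal Python sketch. It uses the textbook pair-based STDP rule, not a model of the MSTDP transistor circuit; the amplitudes `a_plus` and `a_minus`, the time constant `tau_ms`, and the depression branch for negative timing differences are all assumptions made for illustration and are not given in the source.

```python
import math

def stdp_weight_change(delta_t_ms, a_plus=1.0, a_minus=1.0, tau_ms=20.0):
    """Weight change (arbitrary units) for delta_t_ms = t_post - t_pre.

    Assumed rule: potentiation when the post-neuron fires after the
    pre-neuron, depression otherwise, both decaying exponentially
    with the magnitude of the timing difference.
    """
    if delta_t_ms >= 0:
        return a_plus * math.exp(-delta_t_ms / tau_ms)
    return -a_minus * math.exp(delta_t_ms / tau_ms)

# Sweep the timing difference, as in an STDP measurement curve.
curve = [(dt, stdp_weight_change(dt)) for dt in range(-40, 41, 20)]
```

The sweep at the end mirrors how an STDP curve is usually plotted: weight change on the vertical axis against the spike timing difference in ms on the horizontal axis.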

Algorithm Design
This section describes the algorithm design for an FPGA-based, large-scale, logical reconfigurable neuron model. The spiking neural functionality is realized using leaky-integrate magnetization dynamics [17]. Abstracting the magnetic functions as stochastic spiking neurons requires thermal noise, which is prevalent in nano-magnets at non-zero temperatures [18]. The reconfigurable neuron model consists of two units: a finite state machine (FSM) unit and a communication unit. Algorithm 1 gives an abstract view of the top-level model hierarchy. The IP core module described in Figure 2(a) emulates a layer of the network: it loads data (weight or input) and then processes the data to obtain the synapse weight output depicted in Figure 2(b). A flag signal is controlled so that the FSM halts while the other sub-programs are running [19]. The layer of the network is mimicked by reading the weight information from the weight RAM, loading the input data from the input RAM, and processing the data as a simple neuron model [20]. Parameter data with a width of 8 bits is applied as input to the FPGA from the data floating unit of the USB module, and from the FPGA the parameter data is passed to the FIFO block. The input data is connected to din of the input RAM, the row index is connected to the address of the input RAM, and the read data is connected to the input of the outgoing FIFO [21]. The output of the last add operation is wired directly to the resultant RAM. The data packets are segmented at this stage, and the packet information is available at the next stage, where the packet data is split into the respective component registers. The input and weight data are written to padded variables when the flag is high. Figure 3 illustrates this design process as a read and write memory sub-module with the FSM unit (with block RAM, transmit FIFO, and receive FIFO) and the universal asynchronous receiver and transmitter (UART) unit. It also shows the communication link between the FSM unit and the UART unit. Here the data sent over the universal serial bus (USB) is transformed to parallel data and then sent to the FIFO for processing [22]. Figure 3 can be regarded as the register transfer level (RTL) schematic of the top-layer module.
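The data path described above (receive FIFO, RAM write, RAM read, transmit FIFO, with a flag that halts the FSM) can be sketched as a small behavioral model in Python. This is an illustrative model under stated assumptions, not the authors' Verilog: the class name `FsmUnit`, the state names `IDLE`, `WRITE`, and `READ`, and the one-byte-per-step timing are hypothetical.

```python
from collections import deque

class FsmUnit:
    """Behavioral sketch: pop bytes from a receive FIFO into a RAM,
    then stream the RAM contents into a transmit FIFO."""

    def __init__(self, depth):
        self.ram = [0] * depth      # stands in for the block RAM
        self.rx_fifo = deque()      # bytes arriving from the UART side
        self.tx_fifo = deque()      # bytes going back to the UART side
        self.state = "IDLE"         # hypothetical state names
        self.row = 0                # row index used as RAM address
        self.halt = False           # flag signal: FSM waits while high

    def step(self):
        """Advance the FSM by one clock tick."""
        if self.halt:
            return
        if self.state == "IDLE":
            if self.rx_fifo:
                self.row = 0
                self.state = "WRITE"
        elif self.state == "WRITE":
            if self.rx_fifo:
                # Pop order defines how the data is interpreted.
                self.ram[self.row] = self.rx_fifo.popleft() & 0xFF
                self.row += 1
                if self.row == len(self.ram):
                    self.row = 0
                    self.state = "READ"
        elif self.state == "READ":
            self.tx_fifo.append(self.ram[self.row])
            self.row += 1
            if self.row == len(self.ram):
                self.state = "IDLE"

fsm = FsmUnit(depth=4)
fsm.rx_fifo.extend([1, 2, 3, 4])
for _ in range(10):
    fsm.step()
```

Masking each popped value with `0xFF` reflects the 8-bit parameter data width mentioned in the text; the RAM depth of 4 in the usage lines is arbitrary.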

Implementation
This section describes the implementation details of the large-scale reconfigurable digital bio-inspired computing model. The algorithm described above is implemented in an FPGA hardware environment that satisfies the requirements of hardware-software co-design. The hardware used is an FPGA ALTERA DE2 board with a Cyclone chip.
Figure 4 shows the topology of the FSM for the RAM read/write process, with FIFO data popped in sequence. The complete architecture of the bio-inspired computing system consists of a system controller based on an advanced reduced instruction set computing (RISC) machine (ARM) processor, a two-dimensional core array, and a UART controller. The interpretation of the data depends on the order in which the data is popped. The sub-module MATMUL contains two dot product operations in parallel, built from floating-point IPs, as shown in Figure 5. The implementation of a single core consists of 16 inputs and 4 outputs, implying that the weight RAM is 64 rows in depth, as described in Figure 7. The bio-inspired computing system is implemented on the Altera Cyclone IV FPGA that is part of the ALTERA DE2 board. Verilog was used to program the bio-inspired framework, which was compiled in the Xilinx ISE platform on an x86 64-bit CPU running the Linux Ubuntu 16.04 operating system.
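The relationship between the core dimensions and the weight RAM depth (16 inputs times 4 outputs gives 64 rows) and the two parallel dot products in MATMUL can be sketched in Python as follows. The row-major weight layout (row address = output index times 16 plus input index) and the function name `core_forward` are assumptions for illustration; the source does not specify the addressing scheme.

```python
N_INPUTS = 16
N_OUTPUTS = 4
WEIGHT_RAM_DEPTH = N_INPUTS * N_OUTPUTS  # 64 rows, as stated in the text

def core_forward(weight_ram, input_ram):
    """Compute the 4 outputs of one core from its 16 inputs.

    Two dot products are accumulated per pass, mirroring the two
    parallel dot product operations in MATMUL. The weight layout is
    an assumed row-major one: row = output_index * 16 + input_index.
    """
    if len(weight_ram) != WEIGHT_RAM_DEPTH or len(input_ram) != N_INPUTS:
        raise ValueError("unexpected RAM size")
    result_ram = [0.0] * N_OUTPUTS
    for pair in range(0, N_OUTPUTS, 2):
        acc_a = 0.0  # accumulator of the first dot product unit
        acc_b = 0.0  # accumulator of the second dot product unit
        for i in range(N_INPUTS):
            acc_a += weight_ram[pair * N_INPUTS + i] * input_ram[i]
            acc_b += weight_ram[(pair + 1) * N_INPUTS + i] * input_ram[i]
        result_ram[pair] = acc_a
        result_ram[pair + 1] = acc_b
    return result_ram

# Example: output k uses the constant weight k + 1 on all-ones input.
weights = [float(row // N_INPUTS + 1) for row in range(WEIGHT_RAM_DEPTH)]
outputs = core_forward(weights, [1.0] * N_INPUTS)
```

With two dot product units, the four outputs need two passes over the input RAM, which is why the outer loop advances in steps of two.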

Results Obtained
This section presents the results obtained with the Intellectual Property (IP) cores developed for modules such as Block RAM, Differential Clock, Floating Point, and First In First Out (FIFO) for the design of the neuron model in Xilinx ISE, with exploration of register transfer level (RTL) and hardware synthesis using Verilog code.
Figure 8(a) depicts the complete setup of the bio-inspired computing system implementation, and Figure 8(b) shows the JTAG adapter connection to the FPGA ALTERA DE2 board. Figure 9 illustrates the behavioral simulation of the developed reconfigurable bio-inspired computing architecture, obtained in the Xilinx ISE environment. The output RAM data-out signal can be inspected in simulation to double-check that the data is correct. The execution is based on vectorless activity propagation with peak memory and is carried out in Vivado v2014.2. The FPGA has considerable static power consumption, but efficient power is normally measured as the difference between the idle state and real-time data processing for the machine. The expression used to obtain the parameters is given in Equation (1).
\[ \%\ \text{Synaptic Pruning} = \frac{\text{Number of Neurons pruned}}{\text{Size of Network} \times \text{Accuracy} \times \text{Energy}} \tag{1} \]
Figure 9 Behavioral simulation of the developed reconfigurable bio-inspired computing architecture
On the other hand, poor input/output behavior reflects a system glitch in which, in the worst case, the glue logic on the FPGA DE2 board is affected as the system changes the incoming events dynamically. Table 1 lists the architecture considerations for design at device level and offers the best possible design tradeoff for specific processor architectures and development choices. The parameters in Table 1, such as computational efficiency, energy consumption, throughput, accuracy, and entropy, compare the existing neuron model from previous work with the proposed neuron model. It is noteworthy that in the proposed design all parameters except LUT have the same value for logic utilization post synthesis and post implementation, because the designed system is largely device specific. In idle mode, the device does not process events and therefore performs no computation. The ADC14DS065/080/095/105 converts the analog data into 14-bit words, but it outputs the data on 1 or 2 serial data lines per channel. The digital outputs operate at LVCMOS voltage levels, except for the serial data and clock outputs, which are LVDS signals.
These devices operate at up to 65 million samples per second (MSPS) in single-lane mode, while higher data rates use dual-lane mode, in which each lane operates at half the data rate to keep the required clock frequencies from being excessive. Using this technique, the FPGA interface can support the highest data rate of 105 MSPS with a high throughput, as shown in Table 1. The FPGA then combines the two data streams appropriately to create the correct signals.
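The recombination of the two half-rate lanes can be illustrated with a short Python sketch. It assumes that consecutive 14-bit samples alternate between the two lanes (even-indexed samples on lane 0, odd-indexed samples on lane 1); the source does not state how the ADC distributes data across lanes, so this split and the function names are hypothetical.

```python
WORD_MASK = 0x3FFF  # 14-bit sample width

def split_dual_lane(samples):
    """Assumed lane assignment: even samples on lane 0, odd on lane 1.
    Each lane then carries half the sample rate."""
    return samples[0::2], samples[1::2]

def combine_dual_lane(lane0, lane1):
    """Interleave the two half-rate lanes back into one sample stream,
    as the FPGA would after deserializing both lanes."""
    combined = []
    for a, b in zip(lane0, lane1):
        combined.append(a & WORD_MASK)
        combined.append(b & WORD_MASK)
    if len(lane0) > len(lane1):
        # Odd sample count: lane 0 holds the final sample.
        combined.append(lane0[-1] & WORD_MASK)
    return combined

samples = [0, 1, 0x3FFF, 5, 7]
lane0, lane1 = split_dual_lane(samples)
restored = combine_dual_lane(lane0, lane1)
```

Under this assumption, a 105 MSPS stream becomes two lanes of 52.5 MSPS each, which is the half-rate property the text describes.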
The serial data bus uses less board space for the signals, is easier to route, and achieves data rates similar to a parallel interface with fewer wires for the data bus. The parallel data bus from the ADC14155 can be connected to the FPGA using an I/O bank configured for 1.8 V LVCMOS inputs. The data rate of this bus is 5-155 MHz, which is well within the I/O capabilities of the FPGA. The designed FPGA module further consists of the blocks required for interfacing an ADC with the FPGA, with 3.2x enhanced energy efficiency and 2x enhanced computational efficiency, as described in Table 1. For increased flexibility, a combination of serial interface registers and parallel pin controls (CTRL1 to CTRL3) was used to configure the device. To enable this option, the RESET pin was configured low, and the parallel interface control pins CTRL1 to CTRL3 were available. After power up, the device is automatically configured according to the voltage settings on these pins. In the ADC bus test for the Cyclone-V FPGA device, the selected component bus is the 12-channel ADC_RAM component. The data is captured on a single clock; the signals include the clock signals clk_p and clk_m, the reset signal, the 14-bit input and output data vectors, and a data-type register that is a 14-bit vector.
The spike timings are analyzed through a cross-correlation of the spike timings over the FPGA implementation. It is observed that 93% and 92% of the spikes are correlated with zero lag in the 32-bit and 16-bit implementations, respectively. The 8-bit implementation is slightly different, with 87% of the spikes shifted between 0 and 3 ms, centered on an average shift of 1.5 ms, with an entropy of ~12% for 100 neurons and ~2% performance improvement, as described in Table 1. The analog-to-digital converter is configured through the serial mode, with the signals seen in the spike cross-correlation simulation result depicted in Figure 11. The signals are clock, reset, serial clock, adc_reset, serial data enable, serial data, and the state of the system; the address data is a 16-bit vector. These parameters are calculated in terms of the rate of change of Block-RAM generation in the Cyclone-V FPGA, as depicted in Figure 12, and the differential clock simulation time in the FPGA-based bio-inspired computing module is illustrated in Figure 13.
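The kind of lag statistic reported above (percentage of spikes at zero lag, and the average shift of the remaining spikes) can be approximated with a simple nearest-spike comparison in Python. This is a simplified stand-in for a full cross-correlogram and is not the authors' analysis; the function names, the nearest-spike pairing, and the example spike times are assumptions made for illustration.

```python
def spike_lags(ref_times_ms, impl_times_ms):
    """Lag (implementation minus reference, in ms) from each reference
    spike to its nearest spike in the implementation under test."""
    lags = []
    for t in ref_times_ms:
        nearest = min(impl_times_ms, key=lambda s: abs(s - t))
        lags.append(nearest - t)
    return lags

def percent_zero_lag(lags, tol_ms=0.0):
    """Percentage of spikes whose lag is within tol_ms of zero."""
    hits = sum(1 for lag in lags if abs(lag) <= tol_ms)
    return 100.0 * hits / len(lags)

def mean_shift(lags, tol_ms=0.0):
    """Average lag of the spikes that are not at zero lag."""
    shifted = [lag for lag in lags if abs(lag) > tol_ms]
    return sum(shifted) / len(shifted) if shifted else 0.0

# Hypothetical spike trains: one of four spikes is shifted by 1.5 ms.
reference = [10.0, 20.0, 30.0, 40.0]
reduced_precision = [10.0, 20.0, 31.5, 40.0]
lags = spike_lags(reference, reduced_precision)
```

Applied to a full-precision reference and a reduced-precision implementation, `percent_zero_lag` corresponds to the correlated-spike percentages quoted in the text, and `mean_shift` to the average shift reported for the 8-bit case.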

Figure 1: Block Diagram of FPGA based Bio-Inspired Computing System

Figure 2(a) IP-core module (b) Details of synapse weight information

Figure 4 FSM Topology of Read/Write Process

Figure 7 Implementation of a single core with 16 inputs and 4 outputs, implying that the weight RAM is 64 rows in depth.

Figure 8 (a) The bio-inspired computing system setup (b) FPGA ALTERA DE2 Board setup with Xilinx JTAG adapter

Figure 13 Differential Clock simulation time (in ms) for 1 ms real step in Cyclone-V