Fully customized high-speed low-power 64-bit accumulator in TSMC 65nm
• Implemented a modified Sklansky tree adder based topology in Cadence design tools
• Completed schematic level optimization
• Fine-tuned layout placement and routing for delay and power optimization
• Achieved logic delay of 150ps in post-layout simulation, best performance in class
Accumulator design is a heavily studied field because the accumulator often lies on the critical path of a computing system, mainly because it contains a binary adder.
The proposed accumulator includes a converter, an adder, two registers, and a synchronous up counter. The system takes 16-bit sign-magnitude numbers as input and sums them in 2’s complement.
The first 64 outputs are 0. The next 64 outputs hold the sum of the first 64 inputs, the following 64 outputs hold the sum of the second 64 inputs, and the final 64 outputs are 0 again.
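The output sequence described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit itself: the function name `accumulate_windows` is hypothetical, inputs are taken as already-decoded signed integers, and the input stream is assumed to be padded with zeros after the second group of 64 samples.

```python
from typing import Iterable, List


def accumulate_windows(samples: Iterable[int], window: int = 64) -> List[int]:
    """Return the held output for every clock cycle (behavioral sketch)."""
    held = 0  # models the output register
    acc = 0   # models the accumulation register
    outputs = []
    for cycle, sample in enumerate(samples):
        # Assumption: at each window boundary the running sum is handed
        # to the output register and the accumulation restarts.
        if cycle % window == 0:
            held = acc
            acc = 0
        outputs.append(held)
        acc += sample
    return outputs


# Two groups of 64 inputs followed by zero padding (an assumption).
stream = [1] * 64 + [-2] * 64 + [0] * 128
trace = accumulate_windows(stream)
```

With this stream, `trace` holds 0 for the first 64 cycles, then 64, then -128, then 0 again. The 22-bit width is sufficient because the largest 15-bit magnitude is 32767 and 64 × 32767 = 2,097,088, which is below 2^21.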
High-performance adders of 16 bits and larger typically use a prefix tree to compute group generate and propagate signals before calculating the sums from the generate prefixes. To balance performance against ease of layout, a modified 22-bit Sklansky tree adder is implemented with a fanout of [8:1:1:1] for the lower 16 bits, which yields better performance with less delay.
A 6-bit synchronous up counter is implemented within this project to count 64 cycles. It consists of a 6-bit shift register with the output fed back to the input. Once reset, the first bit is initialized to 1 and the others are initialized to 0. The output signal toggles once every 64 cycles and serves as the CLK signal for the registers.
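The prefix-tree idea behind the adder can be sketched in software. The example below models a textbook Sklansky tree only; it does not reproduce the modified [8:1:1:1] fanout of the proposed design, and the function name `sklansky_add` is an assumption made for illustration.

```python
def sklansky_add(a: int, b: int, width: int = 22) -> int:
    """Return (a + b) mod 2**width using a textbook Sklansky prefix tree."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # Per-bit generate (both inputs are 1) and propagate (exactly one is 1).
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    gg, gp = list(g), list(p)  # group generate / group propagate
    level = 0
    while (1 << level) < width:
        span = 1 << level
        for i in range(width):
            if i & span:
                # Bit i is in the upper half of its block: merge it with
                # the top bit j of the lower half of the same block.
                j = ((i >> level) << level) - 1
                gg[i] |= gp[i] & gg[j]
                gp[i] &= gp[j]
        level += 1
    # Sum bit i is the bit propagate XOR the carry out of bit i - 1.
    total = 0
    for i in range(width):
        carry_in = gg[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total
```

For example, `sklansky_add(5, 7)` returns 12. Because the result is taken modulo 2^22, 2’s complement operands add correctly without special handling. A 22-bit word needs five prefix levels, since 2^5 = 32 is the first power of two that is at least 22.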
A clock is supplied to each register for conditional counting. To convert the 16-bit sign-magnitude input to a 22-bit 2’s complement output, a converter is implemented within this accumulator. If the sign bit (the left-most bit) is 0, the magnitude bits are kept and zero-extended to 22 bits. If the sign bit is 1, which indicates a negative number, the magnitude bits are inverted, 1 is added, and the result is sign-extended to 22 bits.
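The conversion rule can be checked with a short software model. This is a minimal sketch, assuming the input word carries the sign in bit 15 and the magnitude in bits 14 to 0; the function name `sign_magnitude_to_twos_complement` is hypothetical.

```python
def sign_magnitude_to_twos_complement(word: int, in_bits: int = 16,
                                      out_bits: int = 22) -> int:
    """Convert a sign-magnitude word to an out_bits-wide 2's complement word."""
    sign = (word >> (in_bits - 1)) & 1
    magnitude = word & ((1 << (in_bits - 1)) - 1)
    out_mask = (1 << out_bits) - 1
    if sign == 0:
        # Non-negative: keep the magnitude and zero-extend it.
        return magnitude
    # Negative: invert the zero-extended magnitude and add 1; masking to
    # out_bits leaves the upper bits set, which is the sign extension.
    return (~magnitude + 1) & out_mask
```

For example, the input 0x8005 (sign 1, magnitude 5) maps to 0x3FFFFB, the 22-bit encoding of -5. In this model the input 0x8000 (negative zero) maps to 0, the same as positive zero.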
Two 22-bit registers are used in this design: one holds the output, and the other is used for accumulation. Each register module comprises 22 flip-flops; data is captured on the rising edge of the clock, and the output is captured at the 65th clock cycle.
The proposed accumulator was synthesized with a 65nm standard CMOS library. This paper investigates the implementation of the accumulator in terms of critical-path delay and power consumption. Section II presents the implementation of the whole system. Section III analyzes the results, and Section IV summarizes the work and discusses future work.