Friday, June 7, 2013

System-on-Chip Architecture


SoC design covers many topics:

– processor: pipelined, superscalar, VLIW, array, vector
– storage: cache, embedded and external memory
– interconnect: buses, network-on-chip
– impact: time, area, power, reliability, configurability
– customisability: specialized processors, reconfiguration
– productivity/tools: model, explore, re-use, synthesise, verify
– examples: crypto, graphics, media, networking, communications, security
– future: autonomous SOC, self-optimising/verifying design
THE NEED FOR FORMAL VERIFICATION

In the VLSI design flow, verification of the chip RTL and synthesis of the chip RTL are done in parallel by different teams. At some point in the design cycle the RTL is frozen, which means the chip RTL will not be re-synthesized after that point.

Verification activities, however, continue as usual, and they may uncover bugs or connectivity issues between different blocks of the chip.

The fixes for these bugs and connectivity issues must then be implemented both in the RTL code and in the already-synthesized design (the netlist).

There are tools for doing this, commonly called ECO (engineering change order) tools. They add gates, wires, flip-flops, etc. directly to the synthesized netlist to implement each bug fix or connectivity fix.

Formal verification (here, more precisely, equivalence checking) is the process of verifying that there is no mismatch between the synthesized netlist and the updated/corrected RTL code, i.e. that the two are functionally equivalent.

Because the updated/corrected RTL has been approved by the design verification team, formal verification is a reliable way to confirm that the same changes have been implemented in the synthesized netlist, which is what is finally handed to the physical design team for implementation of the final chip.
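The idea behind equivalence checking can be illustrated with a toy example. This is a minimal sketch, assuming a hypothetical three-input block whose late bug fix changes the output from a AND b to (a AND b) OR c; the function names rtl_fixed, netlist_frozen and netlist_eco are illustrative only. It compares the functions exhaustively over every input vector, which is only feasible for tiny logic cones; production equivalence checkers use BDD- or SAT-based proofs instead.

```python
from itertools import product

def rtl_fixed(a, b, c):
    # Hypothetical corrected RTL: the late bug fix makes y = (a AND b) OR c.
    return (a & b) | c

def netlist_frozen(a, b, c):
    # Hypothetical netlist from the frozen synthesis run: y = a AND b (pre-fix).
    n1 = a & b          # AND2 cell
    return n1

def netlist_eco(a, b, c):
    # The same netlist after an ECO that inserts one OR2 gate.
    n1 = a & b          # original AND2 cell, untouched
    return n1 | c       # ECO-inserted OR2 cell

def equivalent(f, g, n_inputs):
    """Compare two combinational functions on every input vector.

    Returns (True, None) if they always agree, else (False, first_mismatch)."""
    for vec in product((0, 1), repeat=n_inputs):
        if f(*vec) != g(*vec):
            return False, vec
    return True, None

print(equivalent(rtl_fixed, netlist_frozen, 3))  # (False, (0, 0, 1))
print(equivalent(rtl_fixed, netlist_eco, 3))     # (True, None)
```

The counterexample (a=0, b=0, c=1) is exactly the kind of information an equivalence checker reports when the ECO patch is missing or wrong.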

Thursday, May 23, 2013

Some Limitations of Static Timing Analysis


Practical Approach to Static Timing Analysis



STATIC TIMING ANALYSIS CAN BE PERFORMED AT DIFFERENT STAGES OF THE VLSI DESIGN FLOW, AS SHOWN IN THE DIAGRAM BELOW:







Saturday, March 2, 2013

Interesting Digital Design Problems


1.  Design a digital circuit (with minimum logic) to count the number of 1s in an 8-bit input vector.
     Then, try to generalize the circuit to any n-bit input vector.
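One common starting point for Problem 1 is a balanced adder tree. The Python model below is a sketch of that structure only, not the author's intended answer: each level adds neighbouring pairs, so an 8-bit input needs 3 levels (four 1-bit adders, two 2-bit adders, one 3-bit adder), and an n-bit input needs ceil(log2 n) levels. A tree built from full adders (3:2 compressors) can use less logic; that refinement is left to the reader.

```python
def popcount_tree(bits):
    """Count the 1s in a list of 0/1 values with a balanced adder tree.

    Each loop iteration models one level of adders in hardware.
    """
    level = list(bits)
    while len(level) > 1:
        if len(level) % 2:          # odd count: pad with a constant 0
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0

def popcount_int(value, width):
    """Split an integer into 'width' bits (LSB first) and count the 1s."""
    bits = [(value >> i) & 1 for i in range(width)]
    return popcount_tree(bits)

print(popcount_int(0b1011_0110, 8))  # 5
```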

2.  Design a digital circuit, with the minimum possible logic, to detect the pattern "1010111100000..."
     (up to 100 bits; take any pattern).

3.  Design a digital circuit with the following specification:

    INPUTS:
    Data_in[7:0]     : 8-bit data stream arriving continuously, clocked at 10 MHz
    Master_Key[23:0] : 24-bit key (fixed for the entire operation of the circuit)

    OUTPUTS:
    Data_out[7:0]    : 8-bit data stream leaving continuously, clocked at 10 MHz

    For the first input byte at Data_in[7:0]:
      if (Data_in[7:0] >= Master_Key[23:16]) then
          Data_out[7:0] = Data_in[7:0] xor Master_Key[23:16]
      else
          Data_out[7:0] = Data_in[7:0] xor Master_Key[15:8]
      end

    For the second input byte at Data_in[7:0]:
      if (Data_in[7:0] >= Master_Key[23:16]) then
          temp_var[7:0] = Data_in[first byte] xor Master_Key[15:8]
          Data_out[7:0] = temp_var[7:0] xor Data_out[first byte]
      else
          temp_var[7:0] = Data_in[first byte] xor Master_Key[15:8]
          Data_out[7:0] = temp_var[7:0] xor Master_Key[7:0]
      end

    For the third input byte at Data_in[7:0]:
      if (Data_in[7:0] >= Master_Key[23:16]) then
          temp_var[7:0] = Data_in[second byte] xor Master_Key[15:8]
          Data_out[7:0] = temp_var[7:0] xor Data_out[second byte]
      else
          temp_var[7:0] = Data_in[second byte] xor Master_Key[15:8]
          Data_out[7:0] = temp_var[7:0] xor Data_out[first byte]
      end

    For all input bytes from the fourth byte onwards (fourth byte included):
      if (Data_in[7:0] >= Master_Key[23:16]) then
          temp_var[7:0] = Data_in[previous byte] xor Master_Key[15:8]
          Data_out[7:0] = temp_var[7:0] xor Data_out[previous byte]
      else
          temp_var[7:0] = Data_in[previous byte] xor Master_Key[15:8]
          Data_out[7:0] = temp_var[7:0] xor Data_out[pre-previous byte]
      end

Here, "pre-previous" means the output byte before the previous output byte.
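Before designing the hardware, it helps to have a reference model to check the RTL against. The Python function below is a sketch that assumes the comparison is unsigned and that exactly one byte arrives per clock; problem3_reference is a hypothetical name. It treats Master_Key[7:0] as a virtual "output byte 0", so the second byte's else-branch falls out of the general "pre-previous" rule. In hardware, this suggests three 8-bit registers (previous input, previous output, pre-previous output), a first-byte flag, one 8-bit comparator and XOR gates.

```python
def problem3_reference(data_in, master_key):
    """Reference model of the Problem 3 specification (one byte per clock).

    data_in    : sequence of ints in 0..255
    master_key : 24-bit int; k_hi = [23:16], k_mid = [15:8], k_lo = [7:0]
    Assumption: '>=' is an unsigned 8-bit comparison.
    """
    k_hi = (master_key >> 16) & 0xFF
    k_mid = (master_key >> 8) & 0xFF
    k_lo = master_key & 0xFF

    out = []
    prev_in = None          # Data_in of the previous byte (None = first byte)
    # Treat k_lo as a virtual "output byte 0": the second byte's else-branch
    # (xor with Master_Key[7:0]) then follows the general pre-previous rule.
    prev_out = k_lo
    pre_prev_out = None
    for d in data_in:
        if prev_in is None:                         # first input byte
            y = d ^ (k_hi if d >= k_hi else k_mid)
        else:                                       # second byte onwards
            temp_var = prev_in ^ k_mid
            y = temp_var ^ (prev_out if d >= k_hi else pre_prev_out)
        out.append(y)
        pre_prev_out, prev_out = prev_out, y        # shift the output history
        prev_in = d
    return out

print([hex(b) for b in problem3_reference([0x90, 0x10, 0xA0, 0x20], 0x8055AA)])
# ['0x10', '0x6f', '0x2a', '0x9a']
```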

                 

Friday, March 1, 2013

Static Timing Analysis

Introduction

Static timing analysis (STA) is an important step in the VLSI design flow for analyzing the
performance of a digital design. STA is a technique for estimating the delays and the maximum
operating frequency of a digital circuit, and for finding any timing violations in it, without
simulation: it checks that every timing path (for example, every register-to-register path)
meets the setup and hold time requirements of the flip-flop that captures it. An accurate and
efficient static timing analysis has many benefits, such as providing quick and efficient
information to enhance the design performance and easing the design debugging procedure.
The basic concepts required for understanding the complete static timing analysis are
discussed first: setup and hold time violations, false paths, multicycle paths and clock skew.
These concepts are then used to understand the calculation of path delay, the maximum
operating frequency and the requirements for correct operation of the digital circuit.

Setup and Hold Time Violations

It is a fundamental design principle that timing must satisfy every flip-flop's setup and hold time
requirements; otherwise the flip-flop may go into a metastable state, where its output is
unpredictable. Setup time is the time period for which the data at the data input of the flip-flop
must remain stable before the clock edge that triggers the flip-flop.

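Setup violations occur when the data path is too slow compared with the clock period: the data
does not arrive early enough before the capturing clock edge. The usual fix is to reduce the delay
in the data path (or, if the specification allows, to lengthen the clock period).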

Hold time is the time period for which the data at the data input of the flip-flop
must remain stable after the clock edge that triggers the flip-flop.


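Hold violations occur when the data path is too fast: new data launched by a clock edge reaches
the capturing flip-flop before that flip-flop's hold window for the same edge has closed. Because
this check does not involve the clock period, slowing down the clock does not fix a hold violation;
it is fixed by adding delay to the data path (for example, buffers).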
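The two checks can be written as slack equations for a single register-to-register path. This sketch assumes all times in ns and defines skew as the capture-clock arrival minus the launch-clock arrival; the function names and numbers are hypothetical. Note that the clock period appears only in the setup check, which is why hold violations are fixed with delay rather than with a slower clock.

```python
def setup_slack(t_clk, t_cq, t_logic_max, t_setup, skew=0.0):
    """Setup slack (ns) of one register-to-register path; negative = violation.

    skew = capture-clock arrival minus launch-clock arrival.
    """
    arrival = t_cq + t_logic_max            # latest data arrival at D of FF2
    required = t_clk + skew - t_setup       # must settle t_setup before next edge
    return required - arrival

def hold_slack(t_cq, t_logic_min, t_hold, skew=0.0):
    """Hold slack (ns); negative = violation. The clock period does not appear."""
    arrival = t_cq + t_logic_min            # earliest new data at D of FF2
    required = skew + t_hold                # data must not change before this time
    return arrival - required

# Hypothetical numbers (ns):
print(round(setup_slack(10.0, 0.5, 8.0, 0.3), 3))           # 1.2  -> setup met
print(round(hold_slack(0.5, 0.1, 0.2, skew=0.6), 3))        # -0.2 -> hold violated
print(round(hold_slack(0.5, 0.1 + 0.3, 0.2, skew=0.6), 3))  # 0.1 after adding 0.3 ns delay
```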

False paths

A false path is a path that exists structurally in the chip but is never exercised during the chip's operation. Because STA tools have no knowledge of the circuit's function, they analyze every structural path and can therefore report violations on false paths. The STA tool must be told about the false paths in the circuit (for example, with the SDC set_false_path command) so that it does not report violations on them. A path that is never activated (sensitized) cannot contribute to the real delay of the circuit, so any path that never changes or never affects the operation of the circuit should be labeled as a false path.
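A classic example is two multiplexers in series driven by the same select s: MUX1 passes a slow block A when s=1, while MUX2 passes MUX1's output only when s=0 (and a third input C when s=1). The path A -> MUX1 -> MUX2 would need s=1 and s=0 at the same time, so it is never sensitized. The sketch below is a hypothetical model of that circuit (path names and delays are made up) and shows how the structural worst case differs from the real one; it uses a simple static-sensitization check, which is only one of several ways real tools reason about false paths.

```python
# Each path: (name, total delay in ns, list of (select_signal, required_value))
PATHS = [
    ("A -> MUX1(s=1) -> MUX2(s=0) -> Y", 9.0, [("s", 1), ("s", 0)]),
    ("B -> MUX1(s=0) -> MUX2(s=0) -> Y", 4.0, [("s", 0), ("s", 0)]),
    ("C -> MUX2(s=1) -> Y",              3.0, [("s", 1)]),
]

def is_sensitizable(conditions):
    """A path can be exercised only if its side-input requirements agree."""
    required = {}
    for sig, val in conditions:
        if required.setdefault(sig, val) != val:
            return False
    return True

structural_worst = max(d for _, d, _ in PATHS)       # what a naive STA sees
true_worst = max(d for _, d, c in PATHS if is_sensitizable(c))
false_paths = [n for n, _, c in PATHS if not is_sensitizable(c)]

print(structural_worst, true_worst)  # 9.0 4.0
print(false_paths)                   # ['A -> MUX1(s=1) -> MUX2(s=0) -> Y']
```

Without a false-path declaration, the STA tool would time the circuit against the 9.0 ns path even though the circuit can never exercise it.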

Multi-Cycle Paths

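A multicycle path in a design is a register-to-register path through some combinational logic where, after the source register changes, the result needs N clock cycles (N > 1) to propagate to the destination register. The destination therefore only needs to capture the result N cycles later, and the STA tool must be told so (in SDC, with set_multicycle_path); otherwise it checks the path against a single clock cycle and reports a setup violation that cannot actually occur.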
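The effect of a multicycle declaration on the setup check can be shown with the same slack arithmetic used for a single-cycle path, with the capture edge moved N periods after launch. This is a sketch with hypothetical numbers: a 10 ns clock and a 14 ns block whose result the design only samples every second cycle (for example, through a clock enable).

```python
def setup_slack_multicycle(t_clk, n_cycles, t_cq, t_logic_max, t_setup):
    """Setup slack (ns) when capture happens n_cycles periods after launch.

    n_cycles = 1 is the default single-cycle check.
    """
    required = n_cycles * t_clk - t_setup
    arrival = t_cq + t_logic_max
    return required - arrival

# Hypothetical numbers (ns):
print(setup_slack_multicycle(10.0, 1, 0.5, 14.0, 0.3))  # negative: fails as a single-cycle path
print(setup_slack_multicycle(10.0, 2, 0.5, 14.0, 0.3))  # positive: passes with N = 2
```

In SDC, declaring a setup multicycle of N also moves the default hold check, so designers usually add a hold multicycle of N-1 to restore it to the launch edge.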



Clock Skew

Clock skew is the phenomenon in synchronous circuits in which the clock signal arrives
at different flip-flops at different times. It can be caused by many things, such as
interconnect wire length, clock gating, temperature variations and differences in the input
capacitance of the clock inputs of the devices using the clock. Clock skew can be positive
or negative depending on how the clock tree is built: taking skew as the capture clock's
arrival time minus the launch clock's arrival time, positive skew gives the data path more
time (helping setup) but makes hold harder to meet. Clock skew plays an important role in
determining the maximum operating frequency of the circuit.

Timing Constraints

Timing constraints are how the designer tells the STA tool about the timing behavior
of the ASIC. The three minimum constraints define the clock, the input delay and the
output delay. There are four types of timing paths:

  • Input to Register (synchronous),
  • Register to Register (synchronous),
  • Register to Output (synchronous) and
  • Input to Output (asynchronous: a purely combinational path through no register).

Each path has a start point and an end point.

When the clocks are defined, all register-to-register paths are assumed to be
constrained to one clock cycle. A path starts at either an input port or a
register clock pin, and ends at either an output port or a register data
pin. All start points and end points must be timing constrained.


Calculation of path delays
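In STA, the delay of a path is the sum of the cell and net delays along it, and the tool finds the worst (latest) arrival time at every node by propagating through the netlist in topological order. The sketch below assumes a hypothetical toy netlist given as a list of (from, to, delay) edges, where each delay lumps together one gate and its output wire; a real tool also tracks slews, rise/fall delays and required times.

```python
from collections import defaultdict, deque

def arrival_times(edges, startpoints):
    """Latest arrival time at every node of a timing graph (must be a DAG).

    edges       : list of (from_node, to_node, delay_ns)
    startpoints : dict node -> launch time (e.g. clock-to-Q of a register)
    """
    fanout = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set(startpoints)
    for u, v, d in edges:
        fanout[u].append((v, d))
        indegree[v] += 1
        nodes.update((u, v))

    arrival = {n: startpoints.get(n, float("-inf")) for n in nodes}
    ready = deque(n for n in nodes if indegree[n] == 0)
    while ready:                                  # Kahn's topological order
        u = ready.popleft()
        for v, d in fanout[u]:
            arrival[v] = max(arrival[v], arrival[u] + d)   # keep the longest path
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return arrival

# Hypothetical netlist between FF1 and FF2 (delays in ns).
edges = [
    ("FF1.Q", "U1", 1.2), ("FF1.Q", "U2", 0.8),
    ("U1", "U3", 1.0), ("U2", "U3", 0.9),
    ("U3", "FF2.D", 0.4),
]
arr = arrival_times(edges, {"FF1.Q": 0.5})      # 0.5 ns = tCQ of FF1
print(round(arr["FF2.D"], 3))  # 3.1 (via U1: 0.5 + 1.2 + 1.0 + 0.4)
```

For hold analysis, the same propagation is done with min instead of max to find the earliest arrival.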






Calculation of Maximum operating frequency

For a digital circuit with a single clock domain, the maximum operating frequency is determined by the
longest register-to-register delay; in a multi-clock design, this is done separately for each clock domain.



The delays in the figure are as follows:
  1. tCQ1: the clock-to-output delay of the first flip-flop.
  2. tRDQ1: the propagation delay from the output of the first flip-flop to the input of the second flip-flop.
  3. tCK2: the clock skew, i.e. the difference between the arrival times of the clock edge at the clock inputs of the two flip-flops (second minus first).

The short-path (hold) problem occurs when

tCK2 > tCQ1 + tRDQ1 - tHOLD2

where tHOLD2 is the hold-time requirement of the sink (second) flip-flop.
The tool lists the paths for each clock domain selected by the user. For each clock domain, it picks the longest register-to-register path (clock-to-Q plus logic delay), adds the setup time requirement of the destination register, adjusts for clock skew, and takes the result as the minimum clock period; the maximum clock frequency is the reciprocal of this period.

The user can apply constraints on the clock frequency. Based on the user's clock-period requirement, the tool calculates the maximum allowed register-to-register path delay using the following equation:

max reg-to-reg path delay = clock period requirement - setup time requirement + clock skew
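Both directions of this calculation fit in a few lines. This sketch assumes times in ns, skew defined as capture-clock arrival minus launch-clock arrival (consistent with the equation above), and a hypothetical list of register-to-register paths for one clock domain; the slowest path sets the minimum period.

```python
def max_reg_to_reg_delay(t_clk, t_setup, skew=0.0):
    """Largest allowed clock-to-Q + logic delay for a target period (ns)."""
    return t_clk - t_setup + skew

def fmax_mhz(reg_paths):
    """reg_paths: (t_cq, t_logic, t_setup, skew) tuples in ns for one clock domain.

    Minimum period = t_cq + t_logic + t_setup - skew of the critical path.
    """
    t_min = max(t_cq + t_logic + t_setup - skew
                for t_cq, t_logic, t_setup, skew in reg_paths)
    return 1000.0 / t_min          # 1 / ns = GHz, so x1000 gives MHz

# Hypothetical paths of one clock domain.
paths = [
    (0.5, 7.25, 0.25, 0.0),        # critical path: 8.0 ns
    (0.5, 6.0, 0.25, 0.5),         # 6.25 ns
    (0.25, 5.0, 0.25, -0.5),       # 6.0 ns
]
print(fmax_mhz(paths))                       # 125.0
print(max_reg_to_reg_delay(10.0, 0.25, 0.5)) # 10.25
```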

Metastability

There are two parameters associated with every flip-flop: setup time and hold time.
Whenever a flip-flop's setup or hold time is violated, it can enter a state where its output is unpredictable (somewhere between 0 and 1); this state is known as the metastable (quasi-stable) state. At the end of the metastable state, the flip-flop settles to either '1' or '0'. This whole process is known as metastability. The diagram below illustrates it:






But why does a flip-flop go into the metastable state whenever its setup or hold time is violated?

To answer this, we need to understand the internal working of the flip-flop at the transistor level.

Consider the flip-flop in the figure below. Assume that the clock is low, node A is at 1, and input D changes from 0 to 1. As a result, node A is falling and node B is rising. When the clock rises, it disconnects the input from node A and closes the A-B loop. If A and B happen to be around their metastable levels at that moment, it takes them a long time to diverge toward legal digital values. In fact, one popular definition says that if the output of a flip-flop changes later than the nominal clock-to-Q propagation delay, the flip-flop must have been metastable.




Some solutions to the problem of metastability:

1. Using faster flip-flops, which have smaller setup and hold times and therefore a narrower
time window in which the flip-flop is vulnerable to metastability.

2. Using 2-flop or 3-flop synchronizers, which give a metastable first stage one (or two) extra clock periods to resolve before the value is used by the rest of the design.
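Why extra synchronizer stages help so much can be seen from the classic MTBF estimate, MTBF = exp(t_r / tau) / (T0 * f_clk * f_data), where t_r is the time the first stage is given to resolve, tau is the flip-flop's resolution time constant and T0 is a technology-dependent window. Every parameter value below is a hypothetical placeholder, not data for any real process; the point is that each additional stage adds roughly one clock period to t_r and so multiplies the MTBF by an exponential factor.

```python
import math

def synchronizer_mtbf_s(t_resolve_ns, tau_ns, t0_ns, f_clk_hz, f_data_hz):
    """MTBF (seconds) = exp(t_r / tau) / (T0 * f_clk * f_data)."""
    t0_s = t0_ns * 1e-9
    return math.exp(t_resolve_ns / tau_ns) / (t0_s * f_clk_hz * f_data_hz)

# Hypothetical technology and system parameters.
TAU_NS, T0_NS = 0.2, 0.1          # resolution time constant, metastability window
F_CLK, F_DATA = 200e6, 10e6       # destination clock and input toggle rates
T_CLK_NS = 1e9 / F_CLK            # 5 ns clock period
OVERHEAD_NS = 0.5                 # assumed clock-to-Q + setup per stage

# 2-flop: the first flop has about one period (minus overhead) to resolve.
mtbf_2ff = synchronizer_mtbf_s(T_CLK_NS - OVERHEAD_NS, TAU_NS, T0_NS, F_CLK, F_DATA)
# 3-flop: roughly one more period (minus overhead) of resolution time.
mtbf_3ff = synchronizer_mtbf_s(2 * (T_CLK_NS - OVERHEAD_NS), TAU_NS, T0_NS, F_CLK, F_DATA)
print(f"2-flop MTBF ~ {mtbf_2ff:.3g} s, 3-flop MTBF ~ {mtbf_3ff:.3g} s")
```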

Thursday, February 28, 2013

VHDL: Rules for Variables and Variable Use

Variables are objects used to store intermediate values between sequential VHDL statements.
Variables are allowed only in processes, functions and procedures and are always local to them.
Variables are much like variables in conventional software programming languages.
They immediately take on and store the value assigned to them.

Variables are commonly misunderstood and are therefore not used much, but they can be
very powerful when used correctly. Here we explain how to use them properly. Variables are meant to carry combinatorial signals within a process (when used improperly, they can also infer sequential logic). Variables are updated differently from signals, in both simulation and synthesis.

In simulation, a variable is updated immediately, as soon as an assignment is made. This differs from how signals are updated: a signal is not updated until all processes scheduled to run in the current delta cycle have executed. A variable can be used to carry a combinatorial signal within both a clocked process and a combinatorial process.
Synthesis tools treat variables the same way, as intended combinatorial signals, but the way the code is written can change the synthesis result. In particular, the order in which signal and variable assignments are made makes the difference.

The figure below shows how to use a variable correctly. In this case, the variable v keeps its combinatorial intent: a simple two-input AND gate that drives one input of an OR gate for both the a and b registers.

Figure: Correct Use of Variables



In the figure below, the variable incorrect_v is read before it is assigned. Thus incorrect_v uses its previous value, which infers a register, because that is the only way to keep the previous value available inside a clocked (sequential) process. Had this been a combinatorial process, a latch would have been inferred.



Figure: Incorrect Use of Variables


Conclusion/rule for variable usage:
Always assign a variable before it is read. Otherwise, variables will infer either latches (in combinatorial processes) or registers (in clocked processes) to hold their previous value. The primary intent of a variable is to carry a combinatorial signal.
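The rule can be illustrated by modelling the simulation semantics in Python rather than VHDL. This is a toy sketch, not a VHDL simulator: the class keeps the process variable between clock edges (as a VHDL process variable does) and applies signal assignments only after the process body finishes. The inputs x, y, p, q and the OR targets are hypothetical stand-ins for the circuit in the figures. Reading the variable before assigning it makes the outputs lag by one clock, and that extra cycle of state is exactly the register synthesis infers.

```python
class ClockedProcess:
    """Toy model of one clocked VHDL process.

    - var_v persists between activations, like a VHDL process variable.
    - Signal assignments go into 'pending' and take effect only after the
      whole process body has run, like VHDL signal updates.
    """

    def __init__(self):
        self.var_v = 0
        self.a = 0
        self.b = 0

    def edge_correct(self, x, y, p, q):
        pending = {}
        self.var_v = x & y              # v := x and y;   (assigned first)
        pending["a"] = self.var_v | p   # a <= v or p;
        pending["b"] = self.var_v | q   # b <= v or q;
        self.a, self.b = pending["a"], pending["b"]

    def edge_incorrect(self, x, y, p, q):
        pending = {}
        pending["a"] = self.var_v | p   # read BEFORE assignment: previous edge's x and y
        pending["b"] = self.var_v | q
        self.var_v = x & y              # incorrect_v := x and y;   (assigned last)
        self.a, self.b = pending["a"], pending["b"]

def run(edge_name, stimulus):
    """Apply one (x, y, p, q) tuple per rising edge and record signal a."""
    proc = ClockedProcess()
    trace = []
    for x, y, p, q in stimulus:
        getattr(proc, edge_name)(x, y, p, q)
        trace.append(proc.a)
    return trace

stimulus = [(1, 1, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]
print(run("edge_correct", stimulus))    # [1, 0, 0]
print(run("edge_incorrect", stimulus))  # [0, 1, 0]: one extra cycle of delay
```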

Friday, February 22, 2013

FIFO DESIGN

FIFO design is one of the trickiest and most critical design problems.
The first question is:
Q. Why do we need FIFOs in our design?
A. We need them wherever multi-bit data has to be transferred from one clock domain to another,
     mostly between asynchronous clock domains, i.e. wherever an asynchronous clock-domain crossing
     is involved; otherwise there can be issues like data incoherency or data loss.

     Thus, a FIFO is basically used to take in n-bit data (Data In) from one block at the Clock A rate
     and then give out the n-bit data (Data Out) to another block at the Clock B rate, as shown below.


     Now, what are the FIFO Full and FIFO Empty signals doing in the figure above?
     ............................................................
     ..............think ...........think ..................

     As we are writing data into this FIFO at one rate and reading it at another rate, there is always
     a possibility of the FIFO filling up completely. In that case there must be some way to
     tell the writing block not to send any more data; the FIFO Full signal serves exactly that purpose.
     Similarly, FIFO Empty tells the receiving block that there is no more data to be read for now.

     Now, how do we generate these FIFO Full and FIFO Empty signals correctly?
     ## This is the trickiest part of FIFO design. :)
     ## Try to work it out yourself; it will open up many things for you.

     For those who are eager to know right away,
     consider the diagram below for a precise understanding of the FIFO design:
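One widely used way to generate the flags is to give each pointer one extra MSB and to compare Gray-coded pointers. Gray code changes only one bit per increment, so a synchronizer that samples a pointer mid-change sees either the old or the new value, never a corrupted one. The Python model below is a sketch under a simplifying assumption: each side compares against the other pointer as if it were already synchronized. In hardware, the pointer crosses through a 2-flop synchronizer and arrives a few cycles late, which only makes the flags conservative. The class name AsyncFifoFlags is hypothetical.

```python
def bin_to_gray(b: int) -> int:
    return b ^ (b >> 1)

class AsyncFifoFlags:
    """Full/empty generation for a FIFO with 2**addr_bits entries using
    (addr_bits + 1)-bit pointers compared in Gray code (addr_bits >= 1)."""

    def __init__(self, addr_bits: int):
        self.addr_bits = addr_bits
        self.ptr_mask = (1 << (addr_bits + 1)) - 1   # extra MSB marks wrap-around
        self.wptr = 0                                # binary write pointer
        self.rptr = 0                                # binary read pointer

    def empty(self) -> bool:
        # Identical pointers: everything written has been read.
        return bin_to_gray(self.wptr) == bin_to_gray(self.rptr)

    def full(self) -> bool:
        # Writer is exactly one lap ahead: in Gray code the two MSBs are
        # inverted and all lower bits are equal.
        top_two = 0b11 << (self.addr_bits - 1)
        return bin_to_gray(self.wptr) == (bin_to_gray(self.rptr) ^ top_two)

    def write(self) -> bool:
        if self.full():
            return False                             # blocked by FIFO Full
        self.wptr = (self.wptr + 1) & self.ptr_mask
        return True

    def read(self) -> bool:
        if self.empty():
            return False                             # blocked by FIFO Empty
        self.rptr = (self.rptr + 1) & self.ptr_mask
        return True

fifo = AsyncFifoFlags(addr_bits=2)           # 4 entries, 3-bit pointers
print([fifo.write() for _ in range(5)])      # [True, True, True, True, False]
print(fifo.full(), fifo.empty())             # True False
```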

   

   
   

Tuesday, February 19, 2013

State Machine Design

Let's consider some sample problems that are best solved by state machine designs:

1. Design digital logic to detect the sequence "10" in the input bit stream.
2. If Master M wants to communicate with Slave S, Slave S can recognize
    Master M among many masters by detecting some pattern.
    Let that pattern be hex "F687".
3. Detect an even/odd number of 1s in the input bit stream.
4. Given, say, 5-bit input data, detect the event "the sum of the 5-bit input data
    is divisible by 5/7/9, etc."
5. State machines can also be used to generate source data at a rate such that it is stable for at least
    one complete cycle of the destination clock.

There can be many such problems in real chips that are solved using state machine design concepts, either alone or in addition to other concepts.

Now, the question is:

How is state machine design done?

1. Step 1 is to identify the number of states required to implement the state machine.
    This depends on the kind of state machine you are going to build. For example, for
    detecting a bit sequence like "10" in the input bit stream,
    how many states will you require?
    ....................................
    ..................................
    Think about it  ...........
     ..................................
     Think about whether you can implement it with fewer states.
     ...................................
     ..................................
2.  Now start from the first state. This is the state the system is in when it starts after reset.
     Let the first state's name be "initial".
     Consider both 1 and 0 inputs in this state and decide what should happen
     on receiving each of them.
     In general, if we are moving closer to fully detecting the pattern (i.e. if we have
     partially detected some part [a few bits] of the full pattern), we move to
     the next state; otherwise we either stay in the same state or
     move back one, two or three states, depending on
     the complete pattern to be detected.
     A few examples and some practice will make this concept clear.

     Another important aspect of state machine design is whether we want the
     output (indicating PATTERN DETECTED) to be high during a transition between
     particular states, or only in a particular state.
     This leads us to two types of state machines: Mealy and Moore, respectively.
      .........................
      Can you find which type of state machine will have more states?
      ...........................
      ...........................
     
       For "10" pattern detection, below is our first state machine:













      Similarly, you can try creating both Mealy and Moore machines for the examples
      given at the beginning of this topic.
      ## Remember: practice makes perfect.

       Let's discuss these two types of state machines a bit more:

       Mealy Machine:

       - In a Mealy machine, the outputs are a function of the present state and the
          current inputs, as shown in the figure above.
       - Accordingly, the outputs may change asynchronously in response to any
         change in the inputs.
          Why asynchronously? Think...
          Can you think of the kind of circuit the Mealy machine will have?
           .....................................
        Moore Machine:

        - In a Moore machine, the outputs depend only on the present state, as shown in
           the figure below.
        - A Moore machine usually needs more states than a Mealy machine, but its output is synchronous with the state.
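Both "10" detectors can be written as transition tables and run on the same bit stream. This is a sketch, not the author's figure: the state names INITIAL, GOT1 and GOT10 are hypothetical (only "initial" comes from the text), and overlapping patterns are allowed. The Mealy machine needs two states and asserts its output during the transition on the "0"; the Moore machine needs a third state to hold the "detected" output, and in hardware that output appears one clock later because it is decoded from the registered state.

```python
# Mealy "10" detector: 2 states; output depends on state AND current input.
MEALY = {  # (state, bit) -> (next_state, output)
    ("INITIAL", 0): ("INITIAL", 0), ("INITIAL", 1): ("GOT1", 0),
    ("GOT1", 0):    ("INITIAL", 1), ("GOT1", 1):    ("GOT1", 0),
}

# Moore "10" detector: 3 states; output depends on state only.
MOORE_NEXT = {  # (state, bit) -> next_state
    ("INITIAL", 0): "INITIAL", ("INITIAL", 1): "GOT1",
    ("GOT1", 0):    "GOT10",   ("GOT1", 1):    "GOT1",
    ("GOT10", 0):   "INITIAL", ("GOT10", 1):   "GOT1",
}
MOORE_OUT = {"INITIAL": 0, "GOT1": 0, "GOT10": 1}

def run_mealy(bits, state="INITIAL"):
    outs = []
    for b in bits:
        state, out = MEALY[(state, b)]
        outs.append(out)               # valid in the same cycle as the input bit
    return outs

def run_moore(bits, state="INITIAL"):
    outs = []
    for b in bits:
        state = MOORE_NEXT[(state, b)]
        outs.append(MOORE_OUT[state])  # visible only after the clock edge
    return outs

stream = [1, 1, 0, 1, 0, 0, 1]
print(run_mealy(stream))  # [0, 0, 1, 0, 1, 0, 0]
print(run_moore(stream))  # [0, 0, 1, 0, 1, 0, 0], each one clock later in hardware
```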
Questions to think about:

          1. How can we map the states to actual digital logic design elements?
          2. Do more states mean that more logic is required?