What is Computer Architecture?
Computer architecture is a specification detailing how a set of software and hardware technology standards interact to form a computer system or platform. In short, computer architecture refers to how a computer system is designed and what technologies it is compatible with.
A computer system has five basic units that help it perform operations, which are given below:
Input Unit
Output Unit
Storage Unit
Arithmetic Logic Unit
Control Unit

What is the technical difference between 32-bit and 64-bit operating systems? [BUET madrasha board-2018]
The terms 32-bit and 64-bit refer to the way a computer's processor (also called a CPU) handles information. The 64-bit version of Windows handles large amounts of random access memory (RAM) more effectively than a 32-bit system.
Difference between 32-bit and 64-bit
operating systems
In computing, there are two types of processors, i.e., 32-bit and 64-bit. The bit width tells us how much memory the processor can access from a CPU register. For instance:
A 32-bit system can access 2^32 memory addresses, i.e., 4 GB of RAM or physical memory.
A 64-bit system can access 2^64 memory addresses, i.e., about 17 billion GB (16 exabytes) of RAM. In short, any amount of memory greater than 4 GB can be easily handled by it.
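The address-space arithmetic can be checked directly (a minimal sketch, assuming byte-addressable memory, i.e., one address per byte):

```python
# Addressable memory for a given address width, assuming byte-addressable
# memory (one address per byte).
def addressable_bytes(bits):
    return 2 ** bits

GB = 2 ** 30
print(addressable_bytes(32) // GB)  # 4 GB for a 32-bit system
print(addressable_bytes(64) // GB)  # 17179869184 GB (~17 billion GB) for 64-bit
```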
Advantages of 64-bit over
32-bit
§
Using 64-bit one can do a lot of multi-tasking; the user can easily switch between various applications without any window-hanging problems.
§
Gamers can easily play highly graphical games like Modern Warfare or GTA V, and use high-end software like Photoshop or CAD, which takes a lot of memory, since 64-bit makes multi-tasking with big software easy and efficient. However, upgrading the video card instead of getting a 64-bit processor may be more beneficial.
How many cores do Core i3, Core i5 and Core i7 processors have?
Different processor families
have different levels of efficiency, so how much they get done with each clock
cycle is more important than the GHz number itself.
Intel’s current Core processors are divided into three ranges (Core i3, Core i5 and Core i7), with several models in each range. The differences between these ranges aren’t the same on laptop chips as on desktops. Desktop chips follow a more logical pattern than laptop chips, but many of the technologies and terms we are about to discuss, such as cache memory, the number of cores, Turbo Boost and Hyper-Threading, are the same. Laptop processors have to balance power efficiency with performance, a constraint that doesn’t really apply to desktop chips. The same applies to mobile processors.
Let’s start differentiating the processors on the basis of the concepts
discussed below!
Model           | Core i3 | Core i5 | Core i7
Number of cores | 2       | 4       | 4
Hyper-threading | Yes     | No      | Yes
Turbo Boost     | No      | Yes     | Yes
K model         | No      | Yes     | Yes
What determines whether a microprocessor is 16-bit or 32-bit? Justify your answer. [AME BB-2017]
The bit size (8-bit, 16-bit, 32-bit) of a microprocessor is
determined by the hardware, specifically the width of the data bus. The Intel
8086 is a 16-bit processor because it can move 16 bits at a time over the data
bus. The Intel 8088 is an 8-bit processor even though it has an identical
instruction set. This is similar to the Motorola 68000 and 68008 processors.
The bit size is not determined by the programmer's view (the register width and
the address range).
DMA operation
Block transfer DMA: this is the most common type of DMA used with microprocessors. As mentioned before, in this type of DMA the peripheral device requests the DMA transfer via the DMA request line, which is connected directly, or through a DMA controller chip, to the microprocessor. The microprocessor completes the current instruction and sends a DMA acknowledge (DMACK) to the peripheral device to indicate that the bus can be used for the DMA operation. The DMA controller chip then completes the DMA transfer and transfers control of the bus back to the microprocessor.


Computer Organization and Architecture | Pipelining
(Execution, Stages and Throughput)
To improve the performance
of a CPU we have two options:
1) Improve the hardware by introducing faster circuits.
2) Arrange the hardware such that more than one operation can be performed at
the same time.
Since there is a limit on the speed of hardware and the cost of faster circuits is quite high, we have to adopt the 2nd option.
Pipelining: Pipelining is the arrangement of the hardware elements of the CPU such that its overall performance is increased. Simultaneous execution of more than one instruction takes place in a pipelined processor.
Without pipelining (each instruction completes all three stages before the next begins; assume one minute per stage):
I F S | | | | | |
| | | I F S | | |
| | | | | | I F S   (9 minutes)
With pipelining, the average time per instruction = 5/3 minutes = 1.67 minutes:
I F S | |
| I F S |
| | I F S   (5 minutes)
Thus, pipelined operation increases the efficiency of a system.
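The totals above (9 without pipelining, 5 with) can be checked with a small sketch; the rule it encodes is standard pipeline timing, with one cycle (here, one "minute") per stage:

```python
# Total time for n instructions on a k-stage pipeline, one cycle per stage.
# Without pipelining each instruction runs start-to-finish alone (n * k);
# with pipelining the first takes k cycles and each later one adds 1 cycle.
def total_cycles(n, k, pipelined):
    return k + n - 1 if pipelined else n * k

print(total_cycles(3, 3, pipelined=False))  # 9, as in the first diagram
print(total_cycles(3, 3, pipelined=True))   # 5, as in the second diagram
```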
Design of a basic pipeline
§ In a pipelined processor, a pipeline has two
ends, the input end and the output end. Between these ends, there are multiple
stages/segments such that output of one stage is connected to input of next
stage and each stage performs a specific operation.
§ Interface registers are used to hold the
intermediate output between two stages. These interface registers are also
called latch or buffer.
§ All the stages in the pipeline along with the
interface registers are controlled by a common clock.
Execution in a pipelined processor
Execution sequence of instructions in a pipelined processor can be visualized
using a space-time diagram. For example, consider a processor having 4 stages
and let there be 2 instructions to be executed. We can visualize the execution
sequence through the following space-time diagrams:
Non overlapped execution:
STAGE / CYCLE | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8
S1            | I1 |    |    |    | I2 |    |    |
S2            |    | I1 |    |    |    | I2 |    |
S3            |    |    | I1 |    |    |    | I2 |
S4            |    |    |    | I1 |    |    |    | I2
Total time = 8 cycles
Overlapped execution:
STAGE / CYCLE | 1  | 2  | 3  | 4  | 5
S1            | I1 | I2 |    |    |
S2            |    | I1 | I2 |    |
S3            |    |    | I1 | I2 |
S4            |    |    |    | I1 | I2
Total time = 5 cycles
Pipeline Stages
A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC instruction set. The following are the 5 stages of the RISC pipeline with their respective operations:
§ Stage 1 (Instruction Fetch)
In this stage the CPU reads instructions from the address in the memory whose
value is present in the program counter.
§ Stage 2 (Instruction Decode)
In this stage, instruction is decoded and the register file is accessed to get
the values from the registers used in the instruction.
§ Stage 3 (Instruction Execute)
In this stage, ALU operations are performed.
§ Stage 4 (Memory Access)
In this stage, memory operands are read from, or written to, the memory address present in the instruction.
§ Stage 5 (Write Back)
In this stage, computed/fetched value is written back to the register present
in the instruction.
Performance of a pipelined processor
Consider a ‘k’ segment pipeline with clock cycle time as ‘Tp’. Let there be ‘n’
tasks to be completed in the pipelined processor. Now, the first instruction is
going to take ‘k’ cycles to come out of the pipeline but the other ‘n – 1’
instructions will take only ‘1’ cycle each, i.e, a total of ‘n – 1’ cycles. So,
time taken to execute ‘n’ instructions in a pipelined processor:
ETpipeline = k + n – 1 cycles
= (k + n – 1) Tp
In the same case, for a non-pipelined processor,
execution time of ‘n’ instructions will be:
ETnon-pipeline = n * k * Tp
So, speedup (S) of the pipelined processor over non-pipelined
processor, when ‘n’ tasks are executed on the same processor is:
S = Performance of
pipelined processor /
Performance of
Non-pipelined processor
As the performance of a processor is inversely
proportional to the execution time, we have,
S = ETnon-pipeline / ETpipeline
=> S = [n * k * Tp] / [(k + n – 1) * Tp]
S = [n * k] / [k + n
– 1]
When the number of tasks ‘n’ is significantly larger than k, that is, n >> k:
S = n * k / n
S = k
where ‘k’ is the number of stages in the pipeline.
Also, Efficiency = Given speed
up / Max speed up = S / Smax
We know that, Smax = k
So, Efficiency = S / k
Throughput = Number of instructions / Total time to complete the
instructions
So, Throughput = n / [(k + n – 1) * Tp]
Note: The cycles per instruction (CPI) value of
an ideal pipelined processor is 1
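The formulas above can be collected into one small sketch; the n = 100 tasks, k = 4 stages and Tp = 10 ns figures in the example call are made-up values for illustration:

```python
# Pipelined vs non-pipelined execution time, speedup, efficiency and
# throughput, following the formulas above. Tp is the clock cycle time.
def pipeline_metrics(n, k, Tp):
    et_pipe = (k + n - 1) * Tp
    et_nonpipe = n * k * Tp
    speedup = et_nonpipe / et_pipe      # S = n*k / (k + n - 1)
    efficiency = speedup / k            # since Smax = k
    throughput = n / et_pipe            # instructions per unit time
    return et_pipe, et_nonpipe, speedup, efficiency, throughput

# 100 tasks on a 4-stage pipeline with a 10 ns clock cycle:
et_p, et_np, s, e, tput = pipeline_metrics(n=100, k=4, Tp=10e-9)
print(s)  # speedup approaches k = 4 as n grows
```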
Hazards : Pipeline hazards are situations that prevent the next
instruction in the instruction stream from executing during its designated
clock cycles.
Any condition that causes a stall in the
pipeline operations can be called a hazard.
There are primarily three types of hazards:
i. Data Hazards
ii. Structural Hazards
iii. Control Hazards (or Instruction Hazards)
i. Data Hazards: Data hazards occur when there is a data dependency between instructions, i.e., whenever there are two instructions, one of which depends on the data produced by the other.
A=3+A
B=A*4
For
the above sequence, the second instruction needs the value of ‘A’ computed in
the first instruction.
Thus
the second instruction is said to depend on the first.
If
the execution is done in a pipelined processor, it is highly likely that the
interleaving of these two instructions can lead to incorrect results due to
data dependency between the instructions. Thus the pipeline needs to be stalled
as and when necessary to avoid errors.
Another example:
I1: ADD R1, R2, R3 ; R1 = R2 + R3
I2: LD R5, 10(R1)  ; load R5 from memory address 10 + R1
Now we need to wait for I1 to execute, to know what R1 will be, before I2 can move forward. This is called a Read After Write (RAW) hazard, also called a true dependency. The other types of data hazards are Write After Write (WAW) and Write After Read (WAR).
ii. Structural Hazards:
In a space-time diagram like the one above, instruction I1 (in its memory-access stage) and instruction I4 (in its fetch stage) may both need access to memory at the same time, leading to a structural hazard. So structural hazards happen when there is a resource conflict.
iii. Control hazards:
The
instruction fetch unit of the CPU is responsible for providing a stream of
instructions to the execution unit. The instructions fetched by the fetch unit
are in consecutive memory locations and they are executed.
Control hazards occur when a conditional instruction takes the branch and moves the instruction pointer to a different location. This causes previously fetched instructions to be invalid, and the pipeline needs to be flushed for proper operation.
Why is the Pentium a superscalar processor? [BUET M.SC Admission -2015]
A superscalar CPU can execute more than one instruction per clock cycle. However, most CISC-based processors (such as the Intel Pentium) now include some RISC architecture as well, which enables them to execute instructions in parallel. Nearly all processors developed after 1998 are superscalar.
Why is Pentium a superscalar processor
but 80386 is not?
Because the Pentium could
issue more than one instruction per cycle and the 80386 could not.
superscalar
A superscalar CPU can execute more than one instruction per clock
cycle. Because processing speeds are measured in clock cycles per second (megahertz), a superscalar processor will be faster than a scalar
processor rated at the same megahertz.
A superscalar architecture includes parallel execution
units, which can execute instructions simultaneously. This parallel
architecture was first implemented in RISC processors, which use short and simple instructions
to perform calculations. Because of their superscalar capabilities, RISC
processors have typically performed better than CISC processors running at the same megahertz.
Advantages of microcontroller over microprocessor? [BUET M.SC Admission -2015]
Microcontrollers and microprocessors may
seem like very different devices; however, it is important to note that all
microcontrollers contain microprocessors. The key difference between a
microcontroller and a multifunctional PC microprocessor is the overall level of
complexity. Microcontroller processors are designed to fill a smaller, more
focused variety of roles while making use of less expensive and less complex
circuitry. The main advantage of a microcontroller is that it allows electronic
automation in situations where a full-sized computer is not needed.
What is the difference between computer architecture and computer organization? [EGCB-2018]
I
am going to summarize the differences between computer architecture and
computer organization in an easy to memorize tabular form as shown below:
Computer Organization | Computer Architecture
Often called microarchitecture (low level) | Computer architecture (a bit higher level)
Transparent to the programmer (e.g., a programmer does not worry much about how addition is implemented in hardware) | Programmer's view (i.e., the programmer has to be aware of which instruction set is used)
Physical components (circuit design, adders, signals, peripherals) | Logic (instruction set, addressing modes, data types, cache optimization)
How to do it? (implementation of the architecture) | What to do? (instruction set)
Architecture
1. interface between hardware and software
2. abstract model and is programmer's view in
terms of instructions,addressing modes and registers
3. describes what computer does
4. while designing computer system architecture
is considered first
5. it deals with high level design issues
eg : is there a multiplication instruction??
Organisation
1. deals with the components of a system and their connections
2. expresses the realization of architecture
3. describes how computer does a task
4. organization is done on the basis of
architecture
5. deals with low level design issues
Computer Architecture
and Computer Organization Examples
§ Intel and AMD make X86 CPUs, where X86 refers to the computer architecture used. X86 is an example of a CISC architecture (CISC stands for Complex Instruction Set Computer). CISC instructions are complex and may take multiple CPU cycles to execute. As you can see, there is one architecture (X86) but two different computer organizations (the Intel and AMD flavors).
§ nVidia and Qualcomm, on the other hand, make GPUs (graphics processing units, as opposed to a CPU, central processing unit). These GPUs are based on the ARM (Advanced RISC Machines) architecture. ARM is an example of a RISC architecture (RISC stands for Reduced Instruction Set Computer). Instructions in an ARM architecture are relatively simple and typically execute in one clock cycle. Similarly, ARM here is the computer architecture while both nVidia and Qualcomm develop their own flavor of computer organization (i.e., architecture implementation).
What’s the difference between CPU Cache and TLB?
Both
CPU Cache and TLB are hardware used in microprocessors but what’s the
difference, especially when someone says that TLB is also a type of Cache?
First things first. CPU Cache is a fast memory which is used to improve the latency of
fetching information from Main memory (RAM) to CPU registers. So CPU Cache sits
between Main memory and CPU. And this cache stores information temporarily so that
the next access to the same information is faster. A CPU cache used to store executable instructions is called an Instruction Cache (I-Cache); a CPU cache used to store data is called a Data Cache (D-Cache). So the I-Cache and D-Cache speed up fetch time for instructions and data respectively. A
modern processor contains both I-Cache and D-Cache. For completeness, let us
discuss the D-Cache hierarchy as well. D-Cache is typically organized in a hierarchy, i.e., Level 1 data cache, Level 2 data cache, etc. It should be noted
that L1 D-Cache is faster/smaller/costlier as compared to L2 D-Cache. But the
basic idea of ‘CPU cache‘ is to speed up instruction/data fetch time from Main
memory to CPU.
Translation
Lookaside Buffer (i.e. TLB) is required only if Virtual Memory is used by a
processor. In short, TLB speeds up translation of virtual address to physical
address by storing page-table in a faster memory. In fact, TLB also sits
between CPU and Main memory. Precisely speaking, the TLB is used by the MMU when a virtual address needs to be translated to a physical address. By keeping this
improves. It should be noted that page-table (which itself is stored in RAM)
keeps track of where virtual pages are stored in the physical memory. In that
sense, TLB also can be considered as a cache of the page-table.
But
the scope of operation for TLB and CPU Cache is different. TLB is about
‘speeding up address translation for Virtual memory’ so that the page-table needn’t be accessed for every address. CPU Cache is about ‘speeding up main memory
access latency’ so that RAM isn’t accessed always by CPU. TLB operation comes
at the time of address translation by MMU while CPU cache operation comes at
the time of memory access by CPU. In fact, any modern processor deploys all
I-Cache, L1 & L2 D-Cache and TLB.
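As a rough illustration of the TLB's role, here is a toy model of virtual-to-physical translation. The 4 KB page size is a common choice, but the page-table contents and names are invented for the example; a real MMU does this in hardware:

```python
# A toy model of address translation with a TLB (illustrative only; page
# size, table contents and names are assumptions, not a real MMU).
PAGE_SIZE = 4096
page_table = {0: 7, 1: 3, 2: 9}  # virtual page -> physical frame (in RAM)
tlb = {}                          # small, fast cache of recent translations

def translate(vaddr):
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage in tlb:              # TLB hit: no page-table walk needed
        frame = tlb[vpage]
    else:                         # TLB miss: walk the page table, then cache it
        frame = page_table[vpage]
        tlb[vpage] = frame
    return frame * PAGE_SIZE + offset

print(translate(4100))  # virtual page 1, offset 4 -> 3*4096 + 4 = 12292
```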
Difference between RISC and CISC processors on the basis of address and data bus. [EGCB-2018]
Different Types of RAM (Random Access Memory )
RAM (Random Access Memory) is the part of the computer's main memory which is directly accessible by the CPU. RAM is used to read and write data, which is accessed by the CPU randomly. RAM is volatile in nature, meaning that if the power goes off, the stored information is lost. RAM is used to store the data that is currently being processed by the CPU. Most of the programs and data that are modifiable are stored in RAM.
Integrated RAM chips are available in two forms:
SRAM(Static RAM)
DRAM(Dynamic RAM)
The block
diagram of RAM chip is given below.
SRAM
The SRAM memories consist of
circuits capable of retaining the stored information as long as the power is
applied. That means this type of memory requires constant power. SRAM memories
are used to build Cache Memory.
DRAM
DRAM stores binary information in the form of electric charges applied to capacitors. The charge stored on the capacitors tends to leak away over a period of time, and thus the capacitors must be periodically refreshed to retain the data. Main memory is generally made up of DRAM chips.
Types of
DRAM
There are
mainly 5 types of DRAM:
Asynchronous DRAM (ADRAM):
The DRAM described above is the asynchronous type DRAM. The timing of the
memory device is controlled asynchronously. A specialized memory controller
circuit generates the necessary control signals to control the timing. The CPU
must take into account the delay in the response of the memory.
Synchronous DRAM (SDRAM):
These RAM chips’ access speed is directly synchronized with the CPU’s clock.
For this, the memory chips remain ready for operation when the CPU expects them
to be ready. These memories operate at the CPU-memory bus without imposing wait
states. SDRAM is commercially available as modules incorporating multiple SDRAM
chips and forming the required capacity for the modules.
Double-Data-Rate SDRAM (DDR SDRAM): This faster version of SDRAM performs its operations on
both edges of the clock signal; whereas a standard SDRAM performs its
operations on the rising edge of the clock signal. Since they transfer data on
both edges of the clock, the data transfer rate is doubled. To access the data
at high rate, the memory cells are organized into two groups. Each group is
accessed separately.
Rambus DRAM (RDRAM):
The RDRAM provides a very high data transfer rate over a narrow CPU-memory bus.
It uses various speedup mechanisms, like synchronous memory interface, caching
inside the DRAM chips and very fast signal timing. The Rambus data bus width is
8 or 9 bits.
Cache DRAM (CDRAM):
This memory is a special type DRAM memory with an on-chip cache memory (SRAM)
that acts as a high-speed buffer for the main DRAM.
Difference
between SRAM and DRAM
Below
table lists some of the differences between SRAM and DRAM:

Read Only Memory (ROM) –
Stores crucial information essential to operate the system, like the program essential to boot the computer. It is not volatile; it always retains its data.
Used in embedded systems or where the programming needs no change.
Used in calculators and peripheral devices.
ROM is further classified into four types: MROM (masked ROM), PROM, EPROM, and EEPROM.
Types
of Read Only Memory (ROM) –
PROM (Programmable Read-Only Memory) – It can be programmed by the user. Once programmed, the data and instructions in it cannot be changed.
EPROM (Erasable Programmable Read-Only Memory) – It can be reprogrammed. To erase data from it, expose it to ultraviolet light. To reprogram it, erase all the previous data first.
EEPROM (Electrically Erasable Programmable Read-Only Memory) – The data can be erased by applying an electric field; there is no need for ultraviolet light. We can erase only portions of the chip.
Difference between RAM and ROM

Cache Memory
Cache memory is a special, very high-speed memory. It is used to speed up the CPU and keep pace with it. Cache memory is costlier than main memory or disk memory but more economical than CPU registers. It is an extremely fast memory type that acts as a buffer between RAM and the CPU. It holds frequently requested data and instructions so that they are immediately available to the CPU when needed.
Cache
memory is used to reduce the average time to access data from the Main memory.
The cache is a smaller and faster memory which stores copies of the data from
frequently used main memory locations. There are various independent caches in a CPU, which store instructions and data.
Levels
of memory:
Level 1 or Registers –
A type of memory in which data is stored and accepted immediately by the CPU. The most commonly used registers are the accumulator, program counter, address register, etc.
Level 2 or Cache memory –
The fastest memory, with the shortest access time, where data is temporarily stored for faster access.
Level 3 or Main Memory –
The memory on which the computer currently works. It is small in size, and once power is off, data no longer stays in this memory.
Level 4 or Secondary Memory –
External memory, which is not as fast as main memory, but where data stays permanently.
Cache Performance:
When
the processor needs to read or write a location in main memory, it first checks
for a corresponding entry in the cache.
If the processor finds that the memory location is in the cache, a cache hit has occurred and the data is read from the cache.
If
the processor does not find the memory location in the cache, a cache miss has
occurred. For a cache miss, the cache allocates a new entry and copies in data
from main memory, then the request is fulfilled from the contents of the cache.
The
performance of cache memory is frequently measured in terms of a quantity
called Hit ratio.
Hit
ratio = hit / (hit + miss) = no. of
hits/total accesses
We can improve cache performance by using a larger cache block size and higher associativity, and by reducing the miss rate, the miss penalty, and the time to hit in the cache.
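The hit-ratio formula above in code (the 950 hits and 50 misses are made-up figures for illustration):

```python
# Hit ratio = hits / (hits + misses) = hits / total accesses.
def hit_ratio(hits, misses):
    return hits / (hits + misses)

print(hit_ratio(950, 50))  # 0.95, i.e., 95% of accesses were served by the cache
```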
Cache
Mapping:
There are three different types of
mapping used for the purpose of cache memory which are as follows: Direct
mapping, Associative mapping, and Set-Associative mapping. These are explained
as following below.
1. Direct Mapping – The simplest technique, known as direct mapping, maps each block of main memory into only one possible cache line. Each memory block is assigned to a specific line in the cache; if that line is already occupied when a new block needs to be loaded, the old block is discarded.
2. Associative Mapping – In this type of mapping, associative memory is used to store both the content and the address of the memory word. Any block can go into any line of the cache.
3. Set-Associative Mapping – This form of mapping is an enhanced form of direct mapping in which the drawbacks of direct mapping are removed. Set-associative mapping addresses the problem of possible thrashing in the direct mapping method.
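A sketch of how a block number picks its cache location under direct and set-associative mapping (the block and line counts are illustrative assumptions; a real cache would also store tags to detect hits):

```python
# Which cache location a main-memory block maps to, using the modulo rule.
def direct_mapped_line(block, num_lines):
    return block % num_lines          # exactly one possible line per block

def set_associative_set(block, num_sets):
    return block % num_sets           # block may go in any way of that set

print(direct_mapped_line(block=37, num_lines=8))  # 5
print(set_associative_set(block=37, num_sets=4))  # 1 (2-way: 8 lines in 4 sets)
```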
Application
of Cache Memory –
Usually, the cache memory can store
a reasonable number of blocks at any given time, but this number is small
compared to the total number of blocks in the main memory.
The correspondence between the main
memory blocks and those in the cache is specified by a mapping function.
Types
of Cache –
Primary
Cache
A primary cache is always located
on the processor chip. This cache is small and its access time is comparable to
that of processor registers.
Secondary
Cache
Secondary cache is placed between
the primary cache and the rest of the memory. It is referred to as the level 2
(L2) cache. Often, the Level 2 cache is also housed on the processor chip.
Machine
Control Instruction
These types of instructions control machine functions such as Halt, Interrupt, or do nothing, and alter the different operations executed in the processor.
The following are the machine control instructions:
1. NOP (No operation)
2. HLT (Halt)
3. DI (Disable interrupts)
4. EI (Enable interrupts)
5. SIM (Set interrupt mask)
6. RIM (Reset interrupt mask)
NOP
(No operation) –
Opcode- NOP
Operand- None
Length- 1 byte
M-Cycles- 1
T-states- 4
Hex code- 00
It is used when no operation is to be performed. No flags are affected during the execution of NOP. The instruction is used to fill in a time delay, or to delete and insert instructions while troubleshooting.
HLT
(Halt and enter wait state) –
Opcode- HLT
Operand- None
Length- 1 byte
M-Cycles- 2 or more
T-states- 5 or more
Hex code- 76
The Microprocessor finishes
executing the current instruction and halts any further execution. The contents
of the registers are unaffected during the HLT state.
DI
(Disable interrupts) –
Opcode- DI
Operand- None
Length- 1 byte
M-Cycles- 1
T-states- 4
Hex code- F3
Disable interrupt is used when the
execution of a code sequence cannot be interrupted. For example, in critical
time delays, this instruction is used at the beginning of the code and the
interrupts are enabled at the end of the code. The 8085 TRAP cannot be
disabled.
EI
(Enable interrupts) –
Opcode- EI
Operand- None
Length- 1 byte
M-Cycles- 1
T-states- 4
Hex code- FB
After a system reset or the acknowledgement of an interrupt, the interrupt enable flip-flop is reset, thus disabling the interrupts. The EI instruction sets the flip-flop again, re-enabling the interrupts.
SIM
(Set interrupt mask) –
Opcode- SIM
Operand- None
Length- 1 byte
M-Cycles- 1
T-states- 4
Hex code- 30
The SIM instruction is used to mask or unmask the 8085 maskable interrupts RST 7.5, 6.5 and 5.5, and also for serial data output. It does not affect the TRAP interrupt.
RIM
(Reset interrupt mask) –
Opcode- RIM
Operand- None
Length- 1 byte
M-Cycles- 1
T-states- 4
Hex code- 20
This is a multipurpose instruction used to read the status of the 8085 interrupts 7.5, 6.5 and 5.5, and to read the serial data input bit.
How are negative numbers stored in memory?
Prerequisite – Base conversions,
1’s and 2’s complement of a binary number, 2’s complement of a binary string
Consider the following fragment of code: int a = -34; How will this be stored in memory? Here is the complete theory. Whenever a number with a minus sign is encountered, the number (ignoring the minus sign) is converted to its binary equivalent. Then the two's complement of the number is calculated. That two's complement is kept at the place allocated in memory, and the sign bit is 1 because the binary being kept is that of a negative number. Whenever that value is accessed, the sign bit is checked first; if the sign bit is 1, the binary is two's complemented, converted to the equivalent decimal number, and represented with a minus sign.
Let us take an example:
Example –
int a = -2056;
Binary of 2056 will be calculated
which is:
00000000000000000000100000001000
(32-bit representation, according to the storage of an int in C)
2’s complement of the above binary
is:
11111111111111111111011111111000.
So finally the above binary will be
stored at memory allocated for variable a.
When it comes to accessing the value of variable a, the above binary is retrieved from the memory location, and its sign bit, i.e., the leftmost bit, is checked. As it is 1, the binary number is that of a negative number, so it is 2's complemented again, which gives back the binary of 2056:
00000000000000000000100000001000
The above binary number is converted to its decimal equivalent, which is 2056, and as the sign bit was 1, the decimal number gained from the binary is represented with a minus sign: in our case, -2056.
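Python's arbitrary-precision integers make it easy to reproduce this example; masking with 2^32 - 1 yields the stored 32-bit two's-complement pattern:

```python
# Two's-complement storage and retrieval of a 32-bit integer, as described
# above: masking keeps the low 32 bits of the (negative) value.
def to_twos_complement(value, bits=32):
    return format(value & ((1 << bits) - 1), f'0{bits}b')

def from_twos_complement(bit_string):
    bits = len(bit_string)
    value = int(bit_string, 2)
    # If the sign bit is 1, the pattern represents value - 2**bits.
    return value - (1 << bits) if bit_string[0] == '1' else value

stored = to_twos_complement(-2056)
print(stored)                        # 11111111111111111111011111111000
print(from_twos_complement(stored))  # -2056
```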
Memory mapped I/O and Isolated I/O
A CPU needs to communicate with the various memory and input-output (I/O) devices; data between the processor and these devices flows with the help of the system bus. There are three ways in which the system bus can be allotted to them:
Separate sets of address, control and data buses for I/O and memory.
A common bus (data and address) for I/O and memory, but separate control lines.
A common bus (data, address and control) for I/O and memory.
Isolated I/O –
In isolated I/O we have a common bus (data and address) for I/O and memory, but separate read and write control lines for I/O.
Memory-Mapped I/O –
In this case every bus is common, due to which the same set of instructions works for both memory and I/O.
Differences between memory
mapped I/O and isolated I/O –
ISOLATED I/O | MEMORY-MAPPED I/O
Memory and I/O have separate address spaces | Both share the same address space
All addresses can be used by the memory | Due to the addition of I/O, less addressable space remains for memory
Separate instructions control read and write operations in I/O and memory | The same instructions can control both I/O and memory
I/O addresses are called ports | Normal memory addresses are used for both
More efficient due to separate buses | Less efficient
Larger in size due to more buses | Smaller in size
Single
Accumulator based CPU organization
The computers present in the early days of computer history had accumulator-based CPUs. In this type of CPU organization, the accumulator register is used implicitly for processing all instructions of a program, and the results are stored in the accumulator. The instruction format used by this CPU organization has one address field; due to this, the CPU is known as a One Address Machine.
The main points about Single Accumulator based CPU
Organisation are:
The accumulator is the default (implicit) operand; thus, after data manipulation, the results are stored in the accumulator.
One address instruction is used in this type of
organization.
The format of instruction is: Opcode + Address
Opcode indicates the type of operation to be
performed.
Mainly two types of operation are performed in
single accumulator based CPU organization:
Data
transfer operation –
In this type of operation, the data is transferred
from a source to a destination.
For ex: LOAD X, STORE Y
Here LOAD is a memory read operation: data is transferred from memory to the accumulator. STORE is a memory write operation: data is transferred from the accumulator to memory.
ALU
operation –
In this type of operation, arithmetic operations are
performed on the data.
For ex: MULT X
where X is the address of the operand. The MULT
instruction in this example performs the operation,
AC <-- AC * M[X]
AC is the Accumulator and M[X] is the memory word
located at location X.
This type of CPU organization was first used in the PDP-8 processor and was used for process control and laboratory applications. It has been totally replaced by the newer general-register-based CPUs.
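A minimal sketch of such a one-address machine in code, reproducing AC <-- AC * M[X]; the opcode set and memory labels here are invented for illustration, not any particular processor's ISA:

```python
# A toy one-address (accumulator-based) machine: every instruction names a
# single memory operand, and the accumulator AC is the implicit second
# operand and destination.
memory = {'X': 5, 'Y': 0}
AC = 0

def run(program):
    global AC
    for opcode, addr in program:
        if opcode == 'LOAD':
            AC = memory[addr]         # AC <- M[addr]
        elif opcode == 'MULT':
            AC = AC * memory[addr]    # AC <- AC * M[addr]
        elif opcode == 'STORE':
            memory[addr] = AC         # M[addr] <- AC

run([('LOAD', 'X'), ('MULT', 'X'), ('STORE', 'Y')])
print(memory['Y'])  # 25, i.e., X squared
```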
Advantages
–
One of the operands is always held by the
accumulator register. This results in short instructions and less memory space.
Instruction cycle takes less time because it saves
time in instruction fetching from memory.
Disadvantages –
When complex expressions are computed, program size
increases because many short instructions are needed to execute them; thus
memory usage increases.
As the number of instructions in a program increases,
the execution time increases.
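The one-address model above can be sketched as a toy interpreter. This is a hypothetical machine for illustration (the mnemonics LOAD, STORE, MULT and ADD follow the examples in the text, not the actual PDP-8 instruction set):

```python
# Toy one-address (single accumulator) machine: every instruction
# names at most one memory operand; the accumulator is implicit.
def run(program, memory):
    ac = 0  # the accumulator, implicit source and destination
    for opcode, addr in program:
        if opcode == "LOAD":     # memory read:  AC <- M[addr]
            ac = memory[addr]
        elif opcode == "STORE":  # memory write: M[addr] <- AC
            memory[addr] = ac
        elif opcode == "MULT":   # ALU op: AC <- AC * M[addr]
            ac = ac * memory[addr]
        elif opcode == "ADD":    # ALU op: AC <- AC + M[addr]
            ac = ac + memory[addr]
    return memory

# Compute M[2] = M[0] * M[1] using only one-address instructions.
mem = {0: 6, 1: 7, 2: 0}
run([("LOAD", 0), ("MULT", 1), ("STORE", 2)], mem)
print(mem[2])  # 42
```

Note how even this tiny expression needs three instructions: with only one address field per instruction, every intermediate value must pass through the accumulator, which is exactly the program-size disadvantage described above.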
Performance of Computer
Computer performance is the amount of work
accomplished by a computer system. The word performance in computer performance
means “How well is the computer doing the work it is supposed to do?”. It
basically depends on response time, throughput and execution time of a computer
system.
Response time is the time from start to completion
of a task. This also includes:
Operating system overhead.
Waiting for I/O and other processes.
Accessing disk and memory.
Time spent executing on the CPU, i.e. execution time.
Throughput is the total amount of work done in a
given time.
CPU execution time is the total time a CPU spends
computing on a given task, excluding time for I/O or running other
programs. This is also referred to as simply CPU time.
Performance is determined by execution time as
performance is inversely proportional to execution time.
Performance = (1 / Execution time)
And,(Performance of A / Performance of B)
= (Execution Time of B / Execution Time of A)
If given that Processor A is faster than processor
B, that means execution time of A is less than that of execution time of B.
Therefore, performance of A is greater than that of performance of B.
Example –
Machine A runs a program in 100 seconds, Machine B
runs the same program in 125 seconds
(Performance of A /
Performance of B)
= (Execution Time of B / Execution Time of A)
= 125 / 100 = 1.25
That means machine A is 1.25 times faster than Machine
B.
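The arithmetic above can be checked in a few lines (a trivial sketch; the 100 s and 125 s figures are the ones from the example):

```python
# Relative performance from execution times (performance = 1 / time).
time_a = 100.0  # seconds for Machine A
time_b = 125.0  # seconds for Machine B

# Performance(A) / Performance(B) = Time(B) / Time(A)
speedup = time_b / time_a
print(speedup)  # 1.25 -> Machine A is 1.25 times faster than Machine B
```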
And the time to execute a given program can be
computed as:
Execution time = CPU clock cycles x clock cycle time
Since clock cycle time and clock rate are reciprocals:
Execution time = CPU clock cycles / clock rate
The number of CPU clock cycles can be determined by:
CPU clock cycles = (Instructions / Program) x (Clock cycles / Instruction)
= Instruction Count x CPI
Which gives:
Execution time = Instruction Count x CPI x clock cycle time
= Instruction Count x CPI / clock rate
The units for CPU execution time are:
seconds / program = (instructions / program) x (clock cycles / instruction) x (seconds / clock cycle)
How to Improve Performance?
To improve performance you can either:
Decrease the CPI (clock cycles per instruction) by
using new hardware.
Decrease the clock cycle time (i.e. increase the clock rate)
by reducing propagation delays or by using pipelining.
Decrease the number of required cycles by improving the
ISA or the compiler.
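As a worked sketch of the execution-time formula above (the instruction count, CPI and clock rate below are made-up illustration values, not measurements):

```python
# Execution time = Instruction Count x CPI / clock rate
instruction_count = 2_000_000  # instructions in the program (assumed)
cpi = 1.5                      # average clock cycles per instruction (assumed)
clock_rate = 2e9               # 2 GHz, i.e. cycles per second (assumed)

cycles = instruction_count * cpi   # total CPU clock cycles = IC x CPI
exec_time = cycles / clock_rate    # seconds of CPU time
print(exec_time)  # 0.0015 seconds

# Any of the three levers above (lower CPI, higher clock rate,
# fewer instructions) reduces exec_time proportionally.
```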
Difference between CALL and JUMP instructions
CALL instruction is used to call a
subroutine. Subroutines are often used to perform tasks that need to be
performed frequently. The JMP instruction
is used to cause the PLC to skip over rungs.
The differences Between CALL and JUMP instructions
are:
JUMP | CALL |
Program control is transferred to a memory location which is in the main program | Program control is transferred to a memory location which is not a part of the main program |
Immediate addressing mode | Immediate addressing mode + register indirect addressing mode |
Initialisation of SP (Stack Pointer) is not mandatory | Initialisation of SP (Stack Pointer) is mandatory |
Value of the Program Counter (PC) is not transferred to the stack | Value of the Program Counter (PC) is transferred to the stack |
After JUMP, there is no return instruction | After CALL, there is a return instruction |
Value of SP does not change | Value of SP is decremented by 2 |
3 machine cycles are required to execute this instruction | 5 machine cycles are required to execute this instruction |
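The table rows about the stack can be sketched as a toy simulation. This abstracts away instruction lengths (a real CALL pushes the address of the *next* instruction) and models memory and the stack with plain Python structures:

```python
# JMP only rewrites PC; CALL also saves the return address on the
# stack (decrementing SP by 2, as on the 8085) so RET can resume.
def jmp(state, target):
    state["PC"] = target                 # no stack activity at all

def call(state, target):
    state["SP"] -= 2                     # SP is decremented by 2
    state["stack"].append(state["PC"])   # return address saved on stack
    state["PC"] = target

def ret(state):
    state["PC"] = state["stack"].pop()   # resume where CALL left off
    state["SP"] += 2

s = {"PC": 0x0100, "SP": 0xFFFF, "stack": []}
call(s, 0x2000)   # enter subroutine, remembering 0x0100
ret(s)            # come back: PC = 0x0100, SP restored to 0xFFFF
jmp(s, 0x3000)    # one-way transfer: nothing recorded to return to
print(hex(s["PC"]), hex(s["SP"]), s["stack"])  # 0x3000 0xffff []
```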
Memory based vs Register based addressing modes
Prerequisite – Addressing Modes
The operation field of an instruction specifies the operation
to be performed. That operation must be executed on data which
is stored in computer registers or in memory. The way operands are chosen
during program execution depends on the addressing mode of the
instruction. “The addressing mode specifies a rule for interpreting or
modifying the address field of the instruction before the operand is actually
referenced.” Basically, how we interpret the operand given in
the instruction is known as the addressing mode.
Addressing modes depend very much on the type of CPU
organisation. There are three types of CPU organisation:
1. Single accumulator organisation
2. General register organisation
3. Stack organisation
Addressing modes are used for one or both of the following
purposes, which can also be seen as the advantages of using addressing modes:
they give programming versatility to the user (pointers to memory, counters for
loop control, indexing of data, program relocation), and they reduce the number
of bits in the address field of the instruction.
MEMORY BASED ADDRESSING MODES vs REGISTER BASED ADDRESSING MODES
MEMORY BASED ADDRESSING MODES | REGISTER BASED ADDRESSING MODES |
The operand is present in memory and its address is given in the instruction itself. This addressing mode takes proper advantage of memory addressing, e.g., Direct addressing mode | An operand is held in one of the registers and the register number is provided in the instruction. With the register number present in the instruction, the operand is fetched, e.g., Register mode |
The content of the base register is added to the address part of the instruction to obtain the effective address. The base register is assumed to hold a base address and the address field of the instruction gives the displacement relative to that base address, e.g., Base register addressing mode | If we have a table of data and our program needs to access all the values one by one, we need something which decrements a register holding the base address. Since a register is decremented, this is a register based addressing mode, e.g., Auto decrement mode |
The content of the index register is added to the address part given in the instruction to obtain the effective address. Index mode is used to access an array whose elements are in successive memory locations, e.g., Indexed addressing mode | If we have a table of data and our program needs to access all the values one by one, we need something which increments a register holding the base address, e.g., Auto increment mode |
The content of the program counter is added to the address part of the instruction to obtain the effective address. The address part in this case is usually a signed number, either positive or negative, e.g., Relative addressing mode | Instructions used for initializing registers to a constant value use a register based addressing mode, and this is a very useful approach, e.g., Immediate mode |
Addressing Modes
Addressing Modes – The term
addressing modes refers to the way in which the operand of an instruction is
specified. The addressing mode specifies a rule for interpreting or modifying
the address field of the instruction before the operand is actually referenced.
Addressing modes for 8086
instructions are divided into two categories:
1) Addressing modes for data
2) Addressing modes for branch
The 8086 memory addressing modes
provide flexible access to memory, allowing you to easily access variables,
arrays, records, pointers, and other complex data types. The key to good assembly language programming
is the proper use of memory addressing modes.
An assembly language program instruction consists of two parts:
the opcode and the operand.
The memory address of an operand
consists of two components:
IMPORTANT TERMS
· Starting address of memory segment.
· Effective address or Offset: an offset is determined by adding any
combination of three address elements: displacement, base and index.
- Displacement: an 8-bit or 16-bit immediate value given in the instruction.
- Base: contents of the base register, BX or BP.
- Index: contents of the index register, SI or DI.
According to the different ways of specifying an operand, the
8086 microprocessor uses several addressing modes, discussed below:
· Implied mode: In implied addressing the operand is specified
implicitly in the definition of the instruction itself; no address field is
needed. Zero-address instructions are designed with implied addressing mode.
Example: CMC (complement the carry flag; the operand, the carry flag, is
implied by the opcode)
· Immediate addressing mode (symbol #): In this mode data is present in the
address field of the instruction; the data is 8 bits or 16 bits long and is
part of the instruction, designed like a one-address instruction format.
Example: MOV AL, 35H (move the data 35H into the AL register)
Note: a limitation of the immediate mode is that the range of constants is
restricted by the size of the address field.
· Register mode: In register addressing the operand is placed in one
of the 8-bit or 16-bit general purpose registers. The data is in the register
that is specified by the instruction.
Here one register reference is
required to access the data.
Example: MOV AX, CX (move the
contents of the CX register to the AX register)
· Register Indirect mode: In this addressing the operand’s offset
is placed in any one of the registers BX, BP, SI or DI, as specified in the
instruction. The effective address of the data is in the base register or an
index register that is specified by the instruction.
Here one register reference and one memory reference are
required to access the data.
The 8086 CPUs let you access memory
indirectly through a register using the register indirect addressing modes.
Example: MOV AX, [BX] (move the contents of the
memory location addressed by the register BX to the
register AX)
· Auto Indexed (increment mode): The effective address of the operand
is the contents of a register specified in the instruction. After accessing the
operand, the contents of this register are automatically incremented to point
to the next consecutive memory location: (R1)+.
Here one register reference, one
memory reference and one ALU operation are required to access the data.
Example:
Add R1, (R2)+ // i.e.
R1 = R1 + M[R2]
R2 = R2 + d
Useful for stepping through arrays in a loop, where R2 holds the start of the
array and d is the size of an element.
· Auto indexed (decrement mode): The effective address of the operand
is the contents of a register specified in the instruction. Before accessing
the operand, the contents of this register are automatically decremented to
point to the previous consecutive memory location: -(R1).
Here one register reference, one
memory reference and one ALU operation are required to access the data.
Example:
Add R1, -(R2) // i.e.
R2 = R2 - d
R1 = R1 + M[R2]
Auto decrement mode works like auto increment mode, except that the register is
decremented before the access. Both can be used to implement a stack via push
and pop; auto increment and auto decrement modes are useful for implementing
“Last-In-First-Out” data structures.
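The stack connection can be sketched as follows. This is a minimal model assuming a word size of d = 1 list slot, with a plain Python list standing in for memory and a global variable standing in for the register R2:

```python
# Auto-decrement as PUSH and auto-increment as POP: how -(R2) and
# (R2)+ together implement a LIFO stack that grows downward in memory.
memory = [0] * 16
R2 = len(memory)        # R2 plays the stack pointer; d = 1 here

def push(value):
    global R2
    R2 -= 1             # -(R2): decrement first, then use as address
    memory[R2] = value

def pop():
    global R2
    value = memory[R2]  # (R2)+: use as address first, then increment
    R2 += 1
    return value

push(10); push(20)
print(pop(), pop())  # 20 10 -- last in, first out
```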
· Direct addressing / Absolute addressing mode (symbol [ ]):
The operand’s offset is given in the instruction as an 8-bit or 16-bit
displacement element. In this addressing mode the 16-bit effective address of
the data is part of the instruction.
Here only one memory reference
operation is required to access the data.
Example: ADD AL, [0301] // add the contents of offset address 0301 to AL
· Indirect addressing mode (symbol @ or ( )): In this mode the address
field of the instruction contains the address of the effective address.
Two references are required:
1st reference to get the effective address.
2nd reference to access the data.
Based on where the effective address is held, indirect mode is of two kinds:
1. Register Indirect: In this mode the effective address is in a register,
and the corresponding register name is held in the address field of the
instruction.
Here one register reference and one
memory reference are required to access the data.
2. Memory Indirect: In this mode the effective address is in memory, and
the corresponding memory address is held in the address field of the
instruction.
Here two memory references are
required to access the data.
· Indexed addressing mode: The operand’s offset is the sum of the
content of an index register (SI or DI) and an 8-bit or 16-bit displacement.
Example: MOV AX, [SI+05]
· Based Indexed addressing: The operand’s offset is the sum of the
content of a base register (BX or BP) and an index register (SI or DI).
Example: ADD AX, [BX+SI]
Based on transfer of control, the addressing modes are:
PC relative addressing mode: PC relative addressing mode is
used to implement intra-segment transfer of control. In this mode the effective
address is obtained by adding a displacement to the PC.
EA = PC + address field value
PC = PC + relative value
Base register addressing mode: Base register addressing mode is
used to implement inter-segment transfer of control. In this mode the effective
address is obtained by adding the base register value to the address field
value.
EA = base register + address field value
PC = base register + relative value
Note:
1. Both the PC relative and based register addressing modes are suitable for
program relocation at runtime.
2. The based register addressing mode is best suited to writing position
independent code.
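The offset calculations described above can be sketched in a few functions. The register contents below are made-up example values, and only the offset (effective address within a segment) is computed, not the full segment:offset physical address:

```python
# Sketch of 8086 effective-address (offset) computation for a few of
# the data addressing modes above. Register values are assumptions
# chosen for illustration, not anything architectural.
regs = {"BX": 0x1000, "BP": 0x2000, "SI": 0x0010, "DI": 0x0020}

def ea_direct(disp):                # ADD AL, [0301]  -> EA = 0301
    return disp

def ea_register_indirect(reg):      # MOV AX, [BX]    -> EA = BX
    return regs[reg]

def ea_indexed(index, disp):        # MOV AX, [SI+05] -> EA = SI + 05
    return regs[index] + disp

def ea_based_indexed(base, index):  # ADD AX, [BX+SI] -> EA = BX + SI
    return regs[base] + regs[index]

print(hex(ea_direct(0x0301)))            # 0x301
print(hex(ea_register_indirect("BX")))   # 0x1000
print(hex(ea_indexed("SI", 0x05)))       # 0x15
print(hex(ea_based_indexed("BX", "SI"))) # 0x1010
```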
Interaction of a Program with Hardware
When a programmer writes a program, how is it fed to the computer and how does
it actually work?
This article is about how the program code that we write in a text editor is
fed to the computer, given that, as we all know, a computer works on only two
numbers: 0 and 1.
We write code in a text editor using a language such as C++, Java or Python.
This code is given to the compiler, which converts it to assembly code.
Assembly is very close to the machine hardware, as it depends on the
instruction set; it is then converted to binary, i.e. 0s and 1s, which
represent the digital voltages fed to the transistors inside the chip.
Now we have the voltages that are actually required to run the hardware. These
voltages connect the correct circuitry inside the chip and perform the specific
task, for example addition or subtraction. At a low level, all these operations
are done by combinations of tiny transistors: flip-flops are combinations
of gates, and gates are combinations of transistors. So it all started with the
invention of the transistor.
The chip has lots of circuits inside it to perform various tasks, such as
arithmetic and logical operations.
The computer hardware also contains RAM, another chip which stores data
temporarily, and a hard disk which stores data permanently.
The operating system is also responsible for routing the software to the right
hardware, such as the keyboard, mouse and screen.
Branch Prediction in Pentium
Why do we need branch prediction?
The gain produced by pipelining can
be reduced by the presence of program transfer instructions, e.g. JMP, CALL,
RET etc.
They change the execution sequence, making all the instructions that entered
the pipeline after the program transfer instruction invalid.
Thus no work is done while the pipeline stages are reloaded.
Branch prediction logic:
To avoid this problem, Pentium uses
a scheme called Dynamic Branch Prediction. In this scheme, a prediction is made
for the branch instruction currently in the pipeline. The prediction will
either be taken or not taken. If the prediction is true then the pipeline will
not be flushed and no clock cycles will be lost. If the prediction is false
then the pipeline is flushed and starts over with the current instruction.
It is implemented using a 4-way set-associative cache with 256 entries, called
the Branch Target Buffer (BTB). The directory entry for each line consists of:
Valid bit: indicates whether the entry is valid or not.
History bits: track how often the branch has been taken.
Source memory address: the address the branch instruction was fetched from.
If the directory entry is valid, the target address of the branch is stored in
the corresponding data entry in the BTB.
Working of Branch Prediction:
The BTB is a lookaside cache that sits to the side of the Decode Instruction
(DI) stage of the two pipelines and monitors for branch instructions.
The first time a branch instruction enters the pipeline, the BTB uses the
instruction's source memory address to perform a lookup in the cache.
Since the instruction has never been seen before, this is a BTB miss, and the
prediction is that the branch will not be taken, even if it is an unconditional
jump instruction.
When the instruction reaches the
EU(execution unit), the branch will either be taken or not taken. If taken, the
next instruction to be executed will be fetched from the branch target address.
If not taken, there will be a sequential fetch of instructions.
When a branch is taken for the first time, the execution unit provides feedback
to the branch prediction logic. The branch target address is sent back and
recorded in the BTB.
A directory entry is made containing the source memory address, and the history
bits are set to strongly taken.
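The BTB behaviour described above can be sketched as follows. This is a simplification: a dict stands in for the real 4-way set-associative 256-entry cache, and the history update is the common 2-bit saturating counter (the actual Pentium jumps straight to strongly taken on the first taken branch):

```python
# Predict "not taken" on a BTB miss; otherwise predict from a 2-bit
# saturating history counter (0..3, where >= 2 means predict taken).
btb = {}  # source address -> {"target": addr, "history": 0..3}

def predict(source):
    entry = btb.get(source)
    if entry is None:                 # BTB miss: predict not taken
        return False, None
    return entry["history"] >= 2, entry["target"]

def update(source, taken, target):
    entry = btb.setdefault(source, {"target": target, "history": 0})
    if taken:                         # saturate toward strongly taken (3)
        entry["history"] = min(3, entry["history"] + 1)
        entry["target"] = target      # execution unit feeds target back
    else:                             # saturate toward strongly not taken (0)
        entry["history"] = max(0, entry["history"] - 1)

# A loop branch at address 0x40: taken 3 times, then falls through once.
for outcome in (True, True, True, False):
    update(0x40, outcome, target=0x10)

print(predict(0x40))  # history sits at 2, so: predict taken, target 0x10
```

The single not-taken outcome only weakens the entry from 3 to 2, so one loop exit does not flip the prediction, which is exactly why two history bits beat a single bit for loop branches.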
What is RISC?
A reduced instruction set computer is a computer which uses only simple
instructions, each achieving a low-level operation within a single clock cycle,
as its name suggests: “Reduced Instruction Set”.
RISC Architecture
The term RISC stands for “Reduced Instruction Set Computer”. It is a CPU design
strategy based on simple instructions that execute quickly.
RISC Architecture
This is a small or reduced set of instructions. Every instruction is expected
to perform a very small job. The instructions are modest and simple, and more
complex operations are composed from them. Each instruction is of the same
length; instructions are strung together to get compound tasks done. Most
instructions complete in one machine cycle, and pipelining is a crucial
technique used to speed up RISC machines.
What is CISC?
A complex instruction set computer is a computer in which a single instruction
can perform numerous low-level operations (such as a load from memory, an
arithmetic operation, and a memory store), accomplished by multi-step processes
or addressing modes within single instructions, as its name suggests: “Complex
Instruction Set”.
CISC Architecture
The term CISC stands for “Complex Instruction Set Computer”. It is a CPU design
strategy based on single instructions that are capable of executing multi-step
operations.
CISC Architecture
CISC computers have small programs. They have a huge number of compound
instructions, which take a long time to perform. A single instruction is
executed in several steps, and an instruction set can have more than 300
separate instructions. Most instructions complete in two to ten machine cycles.
In CISC, instruction pipelining is not easily implemented.
Difference between RISC and CISC Architecture
RISC | CISC |
1. RISC stands for Reduced Instruction Set Computer. | 1. CISC stands for Complex Instruction Set Computer. |
2. RISC processors have simple instructions taking about one clock cycle. The average clock cycles per instruction (CPI) is 1.5. | 2. CISC processors have complex instructions that take multiple clock cycles to execute. The average clock cycles per instruction (CPI) is in the range of 2 to 15. |
3. Performance is optimized with more focus on software | 3. Performance is optimized with more focus on hardware. |
4. It has no memory unit and uses separate hardware to implement instructions. | 4. It has a memory unit to implement complex instructions. |
5. It has a hard-wired unit of programming. | 5. It has a microprogramming unit. |
6. The instruction set is reduced i.e. it has only a few instructions in the instruction set. Many of these instructions are very primitive. | 6. The instruction set has a variety of different instructions that can be used for complex operations. |
7. It has few addressing modes. | 7. CISC has many different addressing modes and can thus be used to represent higher-level programming language statements more efficiently. |
8. Complex addressing modes are synthesized using the software. | 8. CISC already supports complex addressing modes |
9. Multiple register sets are present | 9. Only has a single register set |
10. RISC processors are highly pipelined | 10. They are normally not pipelined or less pipelined |
11. The complexity of RISC lies with the compiler that executes the program | 11. The complexity lies in the microprogram |
12. Execution time is very low | 12. Execution time is very high |
13. Code expansion can be a problem | 13. Code expansion is not a problem |
14. Decoding of instructions is simple. | 14. Decoding of instructions is complex |
15. It does not require external memory for calculations | 15. It requires external memory for calculations |