What is Computer Architecture?
Computer architecture is a specification detailing how a set of software and hardware technology standards interact to form a computer system or platform. In short, computer architecture refers to how a computer system is designed and what technologies it is compatible with.
A computer system has five basic units that help it perform operations, which are given below:
Input Unit
Output Unit
Storage Unit
Arithmetic Logic Unit
Control Unit

What is the technical difference between 32-bit and 64-bit operating systems? [BUET madrasha board-2018]
The terms 32-bit and 64-bit refer to the way a computer's processor (also called a CPU) handles information. The 64-bit version of Windows handles large amounts of random access memory (RAM) more effectively than a 32-bit system.
Difference between 32-bit and 64-bit
operating systems
In computing, there are two types of processors, i.e., 32-bit and 64-bit. The bit width tells us how much memory the processor can access from a CPU register. For instance:
A 32-bit system can access 2^32 memory addresses, i.e., 4 GB of RAM or physical memory.
A 64-bit system can access 2^64 memory addresses, i.e., about 17 billion GB (16 exabytes) of RAM. In short, any amount of memory greater than 4 GB can be easily handled by it.
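The address-space arithmetic can be checked directly (a minimal sketch, assuming byte-addressable memory, i.e., one address per byte):

```python
# Addressable memory for a given address width, assuming byte-addressable
# memory (one address per byte).
def addressable_bytes(bits):
    return 2 ** bits

GB = 2 ** 30
print(addressable_bytes(32) // GB)  # 4 GB for a 32-bit system
print(addressable_bytes(64) // GB)  # 17179869184 GB (~17 billion GB) for 64-bit
```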
Advantages of 64-bit over
32-bit
§
Using 64-bit one can do a lot of multi-tasking; the user can easily switch between various applications without any window-hanging problems.
§
Gamers can easily play highly graphical games like Modern Warfare or GTA V, and use high-end software like Photoshop or CAD, which takes a lot of memory, since 64-bit makes multi-tasking with big software easy and efficient. However, upgrading the video card instead of getting a 64-bit processor may be more beneficial.
How many cores do Core i3, Core i5 and Core i7 processors have?
Different processor families
have different levels of efficiency, so how much they get done with each clock
cycle is more important than the GHz number itself.
Intel’s current Core processors are divided into three ranges (Core i3, Core i5 and Core i7), with several models in each range. The differences between these ranges aren’t the same on laptop chips as on desktops. Desktop chips follow a more logical pattern than laptop chips, but many of the technologies and terms we are about to discuss, such as cache memory, the number of cores, Turbo Boost and Hyper-Threading, are the same. Laptop processors have to balance power efficiency with performance, a constraint that doesn’t really apply to desktop chips. The same applies to mobile processors.
Let’s start differentiating the processors on the basis of the concepts
discussed below!
Model           | Core i3 | Core i5 | Core i7
Number of cores | 2       | 4       | 4
Hyper-threading | Yes     | No      | Yes
Turbo Boost     | No      | Yes     | Yes
K model         | No      | Yes     | Yes
What determines whether a microprocessor is 16-bit or 32-bit? Justify your answer. [AME BB-2017]
The bit size (8-bit, 16-bit, 32-bit) of a microprocessor is
determined by the hardware, specifically the width of the data bus. The Intel
8086 is a 16-bit processor because it can move 16 bits at a time over the data
bus. The Intel 8088 is an 8-bit processor even though it has an identical
instruction set. This is similar to the Motorola 68000 and 68008 processors.
The bit size is not determined by the programmer's view (the register width and
the address range).
DMA operation
Block transfer DMA: this is the most common type of DMA used with microprocessors. As mentioned before, in this type of DMA the peripheral device requests the DMA transfer via the DMA request line, which is connected directly, or through a DMA controller chip, to the microprocessor. The microprocessor completes the current instruction and sends a DMA acknowledge (DMACK) to the peripheral device to indicate that the bus can be used for the DMA operation. The DMA controller chip then completes the DMA transfer and transfers control of the bus back to the microprocessor.


Computer Organization and Architecture | Pipelining
(Execution, Stages and Throughput)
To improve the performance
of a CPU we have two options:
1) Improve the hardware by introducing faster circuits.
2) Arrange the hardware such that more than one operation can be performed at
the same time.
Since there is a limit on the speed of hardware and the cost of faster circuits is quite high, we have to adopt the 2nd option.
Pipelining: Pipelining is the arrangement of the hardware elements of the CPU such that its overall performance is increased. Simultaneous execution of more than one instruction takes place in a pipelined processor.
Without pipelining (each instruction completes all three stages before the next begins; assume one minute per stage):
I F S | | | | | |
| | | I F S | | |
| | | | | | I F S   (9 minutes)
With pipelining, the average time per instruction = 5/3 minutes = 1.67 minutes:
I F S | |
| I F S |
| | I F S   (5 minutes)
Thus, pipelined operation increases the efficiency of a system.
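The totals above (9 without pipelining, 5 with) can be checked with a small sketch; the rule it encodes is standard pipeline timing, with one cycle (here, one "minute") per stage:

```python
# Total time for n instructions on a k-stage pipeline, one cycle per stage.
# Without pipelining each instruction runs start-to-finish alone (n * k);
# with pipelining the first takes k cycles and each later one adds 1 cycle.
def total_cycles(n, k, pipelined):
    return k + n - 1 if pipelined else n * k

print(total_cycles(3, 3, pipelined=False))  # 9, as in the first diagram
print(total_cycles(3, 3, pipelined=True))   # 5, as in the second diagram
```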
Design of a basic pipeline
§ In a pipelined processor, a pipeline has two
ends, the input end and the output end. Between these ends, there are multiple
stages/segments such that output of one stage is connected to input of next
stage and each stage performs a specific operation.
§ Interface registers are used to hold the
intermediate output between two stages. These interface registers are also
called latch or buffer.
§ All the stages in the pipeline along with the
interface registers are controlled by a common clock.
Execution in a pipelined processor
Execution sequence of instructions in a pipelined processor can be visualized
using a space-time diagram. For example, consider a processor having 4 stages
and let there be 2 instructions to be executed. We can visualize the execution
sequence through the following space-time diagrams:
Non overlapped execution:
STAGE / CYCLE | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8
S1            | I1 |    |    |    | I2 |    |    |
S2            |    | I1 |    |    |    | I2 |    |
S3            |    |    | I1 |    |    |    | I2 |
S4            |    |    |    | I1 |    |    |    | I2
Total time = 8 cycles
Overlapped execution:
STAGE / CYCLE | 1  | 2  | 3  | 4  | 5
S1            | I1 | I2 |    |    |
S2            |    | I1 | I2 |    |
S3            |    |    | I1 | I2 |
S4            |    |    |    | I1 | I2
Total time = 5 cycles
Pipeline Stages
A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC instruction set. The following are the 5 stages of the RISC pipeline with their respective operations:
§ Stage 1 (Instruction Fetch)
In this stage the CPU reads instructions from the address in the memory whose
value is present in the program counter.
§ Stage 2 (Instruction Decode)
In this stage, instruction is decoded and the register file is accessed to get
the values from the registers used in the instruction.
§ Stage 3 (Instruction Execute)
In this stage, ALU operations are performed.
§ Stage 4 (Memory Access)
In this stage, memory operands are read from, or written to, the memory address present in the instruction.
§ Stage 5 (Write Back)
In this stage, computed/fetched value is written back to the register present
in the instruction.
Performance of a pipelined processor
Consider a ‘k’ segment pipeline with clock cycle time as ‘Tp’. Let there be ‘n’
tasks to be completed in the pipelined processor. Now, the first instruction is
going to take ‘k’ cycles to come out of the pipeline but the other ‘n – 1’
instructions will take only ‘1’ cycle each, i.e, a total of ‘n – 1’ cycles. So,
time taken to execute ‘n’ instructions in a pipelined processor:
ETpipeline = k + n – 1 cycles
= (k + n – 1) Tp
In the same case, for a non-pipelined processor,
execution time of ‘n’ instructions will be:
ETnon-pipeline = n * k * Tp
So, speedup (S) of the pipelined processor over non-pipelined
processor, when ‘n’ tasks are executed on the same processor is:
S = Performance of
pipelined processor /
Performance of
Non-pipelined processor
As the performance of a processor is inversely
proportional to the execution time, we have,
S = ETnon-pipeline / ETpipeline
=> S = [n * k * Tp] / [(k + n – 1) * Tp]
S = [n * k] / [k + n
– 1]
When the number of tasks ‘n’ is significantly larger than k, that is, n >> k:
S = n * k / n
S = k
where ‘k’ is the number of stages in the pipeline.
Also, Efficiency = Given speed
up / Max speed up = S / Smax
We know that, Smax = k
So, Efficiency = S / k
Throughput = Number of instructions / Total time to complete the
instructions
So, Throughput = n / [(k + n – 1) * Tp]
Note: The cycles per instruction (CPI) value of
an ideal pipelined processor is 1
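The formulas above can be collected into one small sketch; the n = 100 tasks, k = 4 stages and Tp = 10 ns figures in the example call are made-up values for illustration:

```python
# Pipelined vs non-pipelined execution time, speedup, efficiency and
# throughput, following the formulas above. Tp is the clock cycle time.
def pipeline_metrics(n, k, Tp):
    et_pipe = (k + n - 1) * Tp
    et_nonpipe = n * k * Tp
    speedup = et_nonpipe / et_pipe      # S = n*k / (k + n - 1)
    efficiency = speedup / k            # since Smax = k
    throughput = n / et_pipe            # instructions per unit time
    return et_pipe, et_nonpipe, speedup, efficiency, throughput

# 100 tasks on a 4-stage pipeline with a 10 ns clock cycle:
et_p, et_np, s, e, tput = pipeline_metrics(n=100, k=4, Tp=10e-9)
print(s)  # speedup approaches k = 4 as n grows
```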
Hazards : Pipeline hazards are situations that prevent the next
instruction in the instruction stream from executing during its designated
clock cycles.
Any condition that causes a stall in the
pipeline operations can be called a hazard.
There are primarily three types of hazards:
i. Data Hazards
ii. Structural Hazards
iii. Control Hazards (or Instruction Hazards)
i. Data Hazards: Data hazards occur when there is a data dependency between instructions, i.e., whenever there are two instructions, one of which depends on the data produced by the other.
A=3+A
B=A*4
For
the above sequence, the second instruction needs the value of ‘A’ computed in
the first instruction.
Thus
the second instruction is said to depend on the first.
If
the execution is done in a pipelined processor, it is highly likely that the
interleaving of these two instructions can lead to incorrect results due to
data dependency between the instructions. Thus the pipeline needs to be stalled
as and when necessary to avoid errors.
Another example:
I1: ADD R1, R2, R3 ; R1 = R2 + R3
I2: LD R5, 10(R1)  ; load R5 from memory address 10 + R1
Now we need to wait for I1 to execute, to know what R1 will be, before I2 can move forward. This is called a Read After Write (RAW) hazard, also called a true dependency. The other types of data hazards are Write After Write (WAW) and Write After Read (WAR).
ii. Structural Hazards:
In a space-time diagram like the one above, instruction I1 (in its memory-access stage) and instruction I4 (in its fetch stage) may both need access to memory at the same time, leading to a structural hazard. So structural hazards happen when there is a resource conflict.
iii. Control hazards:
The
instruction fetch unit of the CPU is responsible for providing a stream of
instructions to the execution unit. The instructions fetched by the fetch unit
are in consecutive memory locations and they are executed.
Control hazards occur when a conditional instruction takes the branch and moves the instruction pointer to a different location. This causes previously fetched instructions to be invalid, and the pipeline needs to be flushed for proper operation.
Why is the Pentium a superscalar processor? [BUET M.SC Admission -2015]
A superscalar CPU can execute more than one instruction per clock cycle. However, most CISC-based processors (such as the Intel Pentium) now include some RISC architecture as well, which enables them to execute instructions in parallel. Nearly all processors developed after 1998 are superscalar.
Why is Pentium a superscalar processor
but 80386 is not?
Because the Pentium could
issue more than one instruction per cycle and the 80386 could not.
superscalar
A superscalar CPU can execute more than one instruction per clock
cycle. Because processing speeds are measured in clock cycles per second (megahertz), a superscalar processor will be faster than a scalar
processor rated at the same megahertz.
A superscalar architecture includes parallel execution
units, which can execute instructions simultaneously. This parallel
architecture was first implemented in RISC processors, which use short and simple instructions
to perform calculations. Because of their superscalar capabilities, RISC
processors have typically performed better than CISC processors running at the same megahertz.
Advantages of microcontroller over microprocessor? [BUET M.SC Admission -2015]
Microcontrollers and microprocessors may
seem like very different devices; however, it is important to note that all
microcontrollers contain microprocessors. The key difference between a
microcontroller and a multifunctional PC microprocessor is the overall level of
complexity. Microcontroller processors are designed to fill a smaller, more
focused variety of roles while making use of less expensive and less complex
circuitry. The main advantage of a microcontroller is that it allows electronic
automation in situations where a full-sized computer is not needed.
What is the difference between computer architecture and computer organization? [EGCB-2018]
I
am going to summarize the differences between computer architecture and
computer organization in an easy to memorize tabular form as shown below:
Computer Organization | Computer Architecture
Often called microarchitecture (low level) | Computer architecture (a bit higher level)
Transparent to the programmer (e.g., a programmer does not worry much about how addition is implemented in hardware) | Programmer's view (i.e., the programmer has to be aware of which instruction set is used)
Physical components (circuit design, adders, signals, peripherals) | Logic (instruction set, addressing modes, data types, cache optimization)
How to do it? (implementation of the architecture) | What to do? (instruction set)
Architecture
1. interface between hardware and software
2. abstract model and is programmer's view in
terms of instructions,addressing modes and registers
3. describes what computer does
4. while designing computer system architecture
is considered first
5. it deals with high level design issues
eg : is there a multiplication instruction??
Organisation
1. deals with the components of a system and their connections
2. expresses the realization of architecture
3. describes how computer does a task
4. organization is done on the basis of
architecture
5. deals with low level design issues
Computer Architecture
and Computer Organization Examples
§ Intel and AMD make X86 CPUs, where X86 refers to the computer architecture used. X86 is an example of a CISC architecture (CISC stands for Complex Instruction Set Computer). CISC instructions are complex and may take multiple CPU cycles to execute. As you can see, there is one architecture (X86) but two different computer organizations (the Intel and AMD flavors).
§ nVidia and Qualcomm, on the other hand, make GPUs (graphics processing units, as opposed to a CPU, central processing unit). These GPUs are based on the ARM (Advanced RISC Machines) architecture. ARM is an example of a RISC architecture (RISC stands for Reduced Instruction Set Computer). Instructions in an ARM architecture are relatively simple and typically execute in one clock cycle. Similarly, ARM here is the computer architecture while both nVidia and Qualcomm develop their own flavor of computer organization (i.e., architecture implementation).
What’s the difference between CPU Cache and TLB?
Both
CPU Cache and TLB are hardware used in microprocessors but what’s the
difference, especially when someone says that TLB is also a type of Cache?
First things first. CPU Cache is a fast memory which is used to improve the latency of
fetching information from Main memory (RAM) to CPU registers. So CPU Cache sits
between Main memory and CPU. And this cache stores information temporarily so that
the next access to the same information is faster. A CPU cache used to store executable instructions is called an Instruction Cache (I-Cache); a CPU cache used to store data is called a Data Cache (D-Cache). So the I-Cache and D-Cache speed up fetch time for instructions and data respectively. A
modern processor contains both I-Cache and D-Cache. For completeness, let us
discuss the D-Cache hierarchy as well. D-Cache is typically organized in a hierarchy, i.e., Level 1 data cache, Level 2 data cache, etc. It should be noted
that L1 D-Cache is faster/smaller/costlier as compared to L2 D-Cache. But the
basic idea of ‘CPU cache‘ is to speed up instruction/data fetch time from Main
memory to CPU.
Translation
Lookaside Buffer (i.e. TLB) is required only if Virtual Memory is used by a
processor. In short, TLB speeds up translation of virtual address to physical
address by storing page-table in a faster memory. In fact, TLB also sits
between CPU and Main memory. Precisely speaking, the TLB is used by the MMU when a virtual address needs to be translated to a physical address. By keeping this
improves. It should be noted that page-table (which itself is stored in RAM)
keeps track of where virtual pages are stored in the physical memory. In that
sense, TLB also can be considered as a cache of the page-table.
But
the scope of operation for TLB and CPU Cache is different. TLB is about
‘speeding up address translation for Virtual memory’ so that the page-table needn’t be accessed for every address. CPU Cache is about ‘speeding up main memory
access latency’ so that RAM isn’t accessed always by CPU. TLB operation comes
at the time of address translation by MMU while CPU cache operation comes at
the time of memory access by CPU. In fact, any modern processor deploys all
I-Cache, L1 & L2 D-Cache and TLB.
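As a rough illustration of the TLB's role, here is a toy model of virtual-to-physical translation. The 4 KB page size is a common choice, but the page-table contents and names are invented for the example; a real MMU does this in hardware:

```python
# A toy model of address translation with a TLB (illustrative only; page
# size, table contents and names are assumptions, not a real MMU).
PAGE_SIZE = 4096
page_table = {0: 7, 1: 3, 2: 9}  # virtual page -> physical frame (in RAM)
tlb = {}                          # small, fast cache of recent translations

def translate(vaddr):
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage in tlb:              # TLB hit: no page-table walk needed
        frame = tlb[vpage]
    else:                         # TLB miss: walk the page table, then cache it
        frame = page_table[vpage]
        tlb[vpage] = frame
    return frame * PAGE_SIZE + offset

print(translate(4100))  # virtual page 1, offset 4 -> 3*4096 + 4 = 12292
```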
Difference between RISC and CISC processors on the basis of address and data bus. [EGCB-2018]
Different Types of RAM (Random Access Memory )
RAM (Random Access Memory) is the part of the computer's main memory which is directly accessible by the CPU. RAM is used to read and write data, which is accessed by the CPU randomly. RAM is volatile in nature, meaning that if the power goes off, the stored information is lost. RAM is used to store the data that is currently being processed by the CPU. Most of the programs and data that are modifiable are stored in RAM.
Integrated RAM chips are available in two forms:
SRAM(Static RAM)
DRAM(Dynamic RAM)
The block
diagram of RAM chip is given below.
SRAM
The SRAM memories consist of
circuits capable of retaining the stored information as long as the power is
applied. That means this type of memory requires constant power. SRAM memories
are used to build Cache Memory.
DRAM
DRAM stores binary information in the form of electric charges applied to capacitors. The charge stored on the capacitors tends to leak away over a period of time, and thus the capacitors must be periodically refreshed to retain the data. Main memory is generally made up of DRAM chips.
Types of
DRAM
There are
mainly 5 types of DRAM:
Asynchronous DRAM (ADRAM):
The DRAM described above is the asynchronous type DRAM. The timing of the
memory device is controlled asynchronously. A specialized memory controller
circuit generates the necessary control signals to control the timing. The CPU
must take into account the delay in the response of the memory.
Synchronous DRAM (SDRAM):
These RAM chips’ access speed is directly synchronized with the CPU’s clock.
For this, the memory chips remain ready for operation when the CPU expects them
to be ready. These memories operate at the CPU-memory bus without imposing wait
states. SDRAM is commercially available as modules incorporating multiple SDRAM
chips and forming the required capacity for the modules.
Double-Data-Rate SDRAM (DDR SDRAM): This faster version of SDRAM performs its operations on
both edges of the clock signal; whereas a standard SDRAM performs its
operations on the rising edge of the clock signal. Since they transfer data on
both edges of the clock, the data transfer rate is doubled. To access the data
at high rate, the memory cells are organized into two groups. Each group is
accessed separately.
Rambus DRAM (RDRAM):
The RDRAM provides a very high data transfer rate over a narrow CPU-memory bus.
It uses various speedup mechanisms, like synchronous memory interface, caching
inside the DRAM chips and very fast signal timing. The Rambus data bus width is
8 or 9 bits.
Cache DRAM (CDRAM):
This memory is a special type DRAM memory with an on-chip cache memory (SRAM)
that acts as a high-speed buffer for the main DRAM.
Difference
between SRAM and DRAM
Below
table lists some of the differences between SRAM and DRAM:

Read Only Memory (ROM) –
Stores crucial information essential to operate the system, like the program essential to boot the computer. It is not volatile; it always retains its data.
Used in embedded systems or where the programming needs no change.
Used in calculators and peripheral devices.
ROM is further classified into four types: MROM (masked ROM), PROM, EPROM, and EEPROM.
Types
of Read Only Memory (ROM) –
PROM (Programmable Read-Only Memory) – It can be programmed by the user. Once programmed, the data and instructions in it cannot be changed.
EPROM (Erasable Programmable Read-Only Memory) – It can be reprogrammed. To erase data from it, expose it to ultraviolet light. To reprogram it, erase all the previous data first.
EEPROM (Electrically Erasable Programmable Read-Only Memory) – The data can be erased by applying an electric field; there is no need for ultraviolet light. We can erase only portions of the chip.
Difference between RAM and ROM

Cache Memory
Cache memory is a special, very high-speed memory. It is used to speed up the CPU and keep pace with it. Cache memory is costlier than main memory or disk memory but more economical than CPU registers. It is an extremely fast memory type that acts as a buffer between RAM and the CPU. It holds frequently requested data and instructions so that they are immediately available to the CPU when needed.
Cache
memory is used to reduce the average time to access data from the Main memory.
The cache is a smaller and faster memory which stores copies of the data from
frequently used main memory locations. There are various independent caches in a CPU, which store instructions and data.
Levels
of memory:
Level 1 or Registers –
A type of memory in which data is stored and accepted immediately by the CPU. The most commonly used registers are the accumulator, program counter, address register, etc.
Level 2 or Cache memory –
The fastest memory, with the shortest access time, where data is temporarily stored for faster access.
Level 3 or Main Memory –
The memory on which the computer currently works. It is small in size, and once power is off, data no longer stays in this memory.
Level 4 or Secondary Memory –
External memory, which is not as fast as main memory, but where data stays permanently.
Cache Performance:
When
the processor needs to read or write a location in main memory, it first checks
for a corresponding entry in the cache.
If the processor finds that the memory location is in the cache, a cache hit has occurred and the data is read from the cache.
If
the processor does not find the memory location in the cache, a cache miss has
occurred. For a cache miss, the cache allocates a new entry and copies in data
from main memory, then the request is fulfilled from the contents of the cache.
The
performance of cache memory is frequently measured in terms of a quantity
called Hit ratio.
Hit
ratio = hit / (hit + miss) = no. of
hits/total accesses
We can improve cache performance by using a larger cache block size and higher associativity, and by reducing the miss rate, the miss penalty, and the time to hit in the cache.
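The hit-ratio formula above in code (the 950 hits and 50 misses are made-up figures for illustration):

```python
# Hit ratio = hits / (hits + misses) = hits / total accesses.
def hit_ratio(hits, misses):
    return hits / (hits + misses)

print(hit_ratio(950, 50))  # 0.95, i.e., 95% of accesses were served by the cache
```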
Cache
Mapping:
There are three different types of
mapping used for the purpose of cache memory which are as follows: Direct
mapping, Associative mapping, and Set-Associative mapping. These are explained
as following below.
1. Direct Mapping – The simplest technique, known as direct mapping, maps each block of main memory into only one possible cache line. Each memory block is assigned to a specific line in the cache; if that line is already occupied when a new block needs to be loaded, the old block is discarded.
2. Associative Mapping – In this type of mapping, associative memory is used to store both the content and the address of the memory word. Any block can go into any line of the cache.
3. Set-Associative Mapping – This form of mapping is an enhanced form of direct mapping in which the drawbacks of direct mapping are removed. Set-associative mapping addresses the problem of possible thrashing in the direct mapping method.
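A sketch of how a block number picks its cache location under direct and set-associative mapping (the block and line counts are illustrative assumptions; a real cache would also store tags to detect hits):

```python
# Which cache location a main-memory block maps to, using the modulo rule.
def direct_mapped_line(block, num_lines):
    return block % num_lines          # exactly one possible line per block

def set_associative_set(block, num_sets):
    return block % num_sets           # block may go in any way of that set

print(direct_mapped_line(block=37, num_lines=8))  # 5
print(set_associative_set(block=37, num_sets=4))  # 1 (2-way: 8 lines in 4 sets)
```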
Application
of Cache Memory –
Usually, the cache memory can store
a reasonable number of blocks at any given time, but this number is small
compared to the total number of blocks in the main memory.
The correspondence between the main
memory blocks and those in the cache is specified by a mapping function.
Types
of Cache –
Primary
Cache
A primary cache is always located
on the processor chip. This cache is small and its access time is comparable to
that of processor registers.
Secondary
Cache
Secondary cache is placed between
the primary cache and the rest of the memory. It is referred to as the level 2
(L2) cache. Often, the Level 2 cache is also housed on the processor chip.
Machine
Control Instruction
These types of instructions control machine functions such as Halt, Interrupt, or do nothing, and alter the different operations executed in the processor.
The following are the machine control instructions:
1. NOP (No operation)
2. HLT (Halt)
3. DI (Disable interrupts)
4. EI (Enable interrupts)
5. SIM (Set interrupt mask)
6. RIM (Reset interrupt mask)
NOP
(No operation) –
Opcode- NOP
Operand- None
Length- 1 byte
M-Cycles- 1
T-states- 4
Hex code- 00
It is used when no operation is to be performed. No flags are affected during the execution of NOP. The instruction is used to fill in a time delay, or to delete and insert instructions while troubleshooting.
HLT
(Halt and enter wait state) –
Opcode- HLT
Operand- None
Length- 1 byte
M-Cycles- 2 or more
T-states- 5 or more
Hex code- 76
The Microprocessor finishes
executing the current instruction and halts any further execution. The contents
of the registers are unaffected during the HLT state.
DI
(Disable interrupts) –
Opcode- DI
Operand- None
Length- 1 byte
M-Cycles- 1
T-states- 4
Hex code- F3
Disable interrupt is used when the
execution of a code sequence cannot be interrupted. For example, in critical
time delays, this instruction is used at the beginning of the code and the
interrupts are enabled at the end of the code. The 8085 TRAP cannot be
disabled.
EI
(Enable interrupts) –
Opcode- EI
Operand- None
Length- 1 byte
M-Cycles- 1
T-states- 4
Hex code- FB
After a system reset or the acknowledgement of an interrupt, the interrupt enable flip-flop is reset, thus disabling the interrupts. The EI instruction sets the flip-flop again, re-enabling the interrupts.
SIM
(Set interrupt mask) –
Opcode- SIM
Operand- None
Length- 1 byte
M-Cycles- 1
T-states- 4
Hex code- 30
The SIM instruction is used to mask or unmask the 8085 maskable interrupts RST 7.5, 6.5 and 5.5, and also for serial data output. It does not affect the TRAP interrupt.
RIM
(Reset interrupt mask) –
Opcode- RIM
Operand- None
Length- 1 byte
M-Cycles- 1
T-states- 4
Hex code- 20
This is a multipurpose instruction used to read the status of the 8085 interrupts 7.5, 6.5 and 5.5, and to read the serial data input bit.
How are negative numbers stored in memory?
Prerequisite – Base conversions,
1’s and 2’s complement of a binary number, 2’s complement of a binary string
Consider the following fragment of code: int a = -34; How will this be stored in memory? Here is the complete theory. Whenever a number with a minus sign is encountered, the number (ignoring the minus sign) is converted to its binary equivalent. Then the two's complement of the number is calculated. That two's complement is kept at the place allocated in memory, and the sign bit is 1 because the binary being kept is that of a negative number. Whenever that value is accessed, the sign bit is checked first; if the sign bit is 1, the binary is two's complemented, converted to the equivalent decimal number, and represented with a minus sign.
Let us take an example:
Example –
int a = -2056;
Binary of 2056 will be calculated
which is:
00000000000000000000100000001000
(32-bit representation, according to the storage of an int in C)
2’s complement of the above binary
is:
11111111111111111111011111111000.
So finally the above binary will be
stored at memory allocated for variable a.
When it comes to accessing the value of variable a, the above binary is retrieved from the memory location, and its sign bit, i.e., the leftmost bit, is checked. As it is 1, the binary number is that of a negative number, so it is 2's complemented again, which gives back the binary of 2056:
00000000000000000000100000001000
The above binary number is converted to its decimal equivalent, which is 2056, and as the sign bit was 1, the decimal number gained from the binary is represented with a minus sign: in our case, -2056.
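Python's arbitrary-precision integers make it easy to reproduce this example; masking with 2^32 - 1 yields the stored 32-bit two's-complement pattern:

```python
# Two's-complement storage and retrieval of a 32-bit integer, as described
# above: masking keeps the low 32 bits of the (negative) value.
def to_twos_complement(value, bits=32):
    return format(value & ((1 << bits) - 1), f'0{bits}b')

def from_twos_complement(bit_string):
    bits = len(bit_string)
    value = int(bit_string, 2)
    # If the sign bit is 1, the pattern represents value - 2**bits.
    return value - (1 << bits) if bit_string[0] == '1' else value

stored = to_twos_complement(-2056)
print(stored)                        # 11111111111111111111011111111000
print(from_twos_complement(stored))  # -2056
```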
Memory mapped I/O and Isolated I/O
A CPU needs to communicate with the various memory and input-output (I/O) devices; data between the processor and these devices flows with the help of the system bus. There are three ways in which the system bus can be allotted to them:
Separate sets of address, control and data buses for I/O and memory.
A common bus (data and address) for I/O and memory, but separate control lines.
A common bus (data, address and control) for I/O and memory.
Isolated I/O –
In isolated I/O we have a common bus (data and address) for I/O and memory, but separate read and write control lines for I/O.
Memory-Mapped I/O –
In this case every bus is common, due to which the same set of instructions works for both memory and I/O.
Differences between memory
mapped I/O and isolated I/O –
ISOLATED I/O | MEMORY-MAPPED I/O
Memory and I/O have separate address spaces | Both share the same address space
All addresses can be used by the memory | Due to the addition of I/O, less addressable space remains for memory
Separate instructions control read and write operations in I/O and memory | The same instructions can control both I/O and memory
I/O addresses are called ports | Normal memory addresses are used for both
More efficient due to separate buses | Less efficient
Larger in size due to more buses | Smaller in size
Single
Accumulator based CPU organization
The computers present in the early days of computer history had accumulator-based CPUs. In this type of CPU organization, the accumulator register is used implicitly for processing all instructions of a program, and the results are stored in the accumulator. The instruction format used by this CPU organization has one address field; due to this, the CPU is known as a One Address Machine.
The main points about Single Accumulator based CPU
Organisation are:
The accumulator is the default (implicit) operand; thus, after data manipulation, the results are stored in the accumulator.
One address instruction is used in this type of
organization.
The format of instruction is: Opcode + Address
Opcode indicates the type of operation to be
performed.
Mainly two types of operation are performed in
single accumulator based CPU organization:
Data
transfer operation –
In this type of operation, the data is transferred
from a source to a destination.
For ex: LOAD X, STORE Y
Here LOAD is a memory read operation: data is transferred from memory to the accumulator. STORE is a memory write operation: data is transferred from the accumulator to memory.
ALU
operation –
In this type of operation, arithmetic operations are
performed on the data.
For ex: MULT X
where X is the address of the operand. The MULT
instruction in this example performs the operation,
AC <-- AC * M[X]
AC is the Accumulator and M[X] is the memory word
located at location X.
This type of CPU organization was first used in the PDP-8 processor and was used for process control and laboratory applications. It has been totally replaced by the newer general-register-based CPUs.
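A minimal sketch of such a one-address machine in code, reproducing AC <-- AC * M[X]; the opcode set and memory labels here are invented for illustration, not any particular processor's ISA:

```python
# A toy one-address (accumulator-based) machine: every instruction names a
# single memory operand, and the accumulator AC is the implicit second
# operand and destination.
memory = {'X': 5, 'Y': 0}
AC = 0

def run(program):
    global AC
    for opcode, addr in program:
        if opcode == 'LOAD':
            AC = memory[addr]         # AC <- M[addr]
        elif opcode == 'MULT':
            AC = AC * memory[addr]    # AC <- AC * M[addr]
        elif opcode == 'STORE':
            memory[addr] = AC         # M[addr] <- AC

run([('LOAD', 'X'), ('MULT', 'X'), ('STORE', 'Y')])
print(memory['Y'])  # 25, i.e., X squared
```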
Advantages
–
One of the operands is always held by the
accumulator register. This results in short instructions and less memory space.
Instruction cycle takes less time because it saves
time in instruction fetching from memory.
Disadvantages –
When complex expressions are computed, program size
increases because many short instructions are needed to execute them; thus
memory usage increases.
As the number of instructions in a program increases,
the execution time increases.
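The one-address model above can be sketched as a toy interpreter. This is a hypothetical machine for illustration (the mnemonics LOAD, STORE, MULT and ADD follow the examples in the text, not the actual PDP-8 instruction set):

```python
# Toy one-address (single accumulator) machine: every instruction
# names at most one memory operand; the accumulator is implicit.
def run(program, memory):
    ac = 0  # the accumulator, implicit source and destination
    for opcode, addr in program:
        if opcode == "LOAD":     # memory read:  AC <- M[addr]
            ac = memory[addr]
        elif opcode == "STORE":  # memory write: M[addr] <- AC
            memory[addr] = ac
        elif opcode == "MULT":   # ALU op: AC <- AC * M[addr]
            ac = ac * memory[addr]
        elif opcode == "ADD":    # ALU op: AC <- AC + M[addr]
            ac = ac + memory[addr]
    return memory

# Compute M[2] = M[0] * M[1] using only one-address instructions.
mem = {0: 6, 1: 7, 2: 0}
run([("LOAD", 0), ("MULT", 1), ("STORE", 2)], mem)
print(mem[2])  # 42
```

Note how even this tiny expression needs three instructions: with only one address field per instruction, every intermediate value must pass through the accumulator, which is exactly the program-size disadvantage described above.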
Performance of Computer
Computer performance is the amount of work
accomplished by a computer system. The word performance in computer performance
means “How well is the computer doing the work it is supposed to do?”. It
basically depends on response time, throughput and execution time of a computer
system.
Response time is the time from start to completion
of a task. This also includes:
Operating system overhead.
Waiting for I/O and other processes.
Accessing disk and memory.
Time spent executing on the CPU, i.e. execution time.
Throughput is the total amount of work done in a
given time.
CPU execution time is the total time a CPU spends
computing on a given task, excluding time for I/O or running other
programs. This is also referred to as simply CPU time.
Performance is determined by execution time as
performance is inversely proportional to execution time.
Performance = (1 / Execution time)
And,(Performance of A / Performance of B)
= (Execution Time of B / Execution Time of A)
If given that Processor A is faster than processor
B, that means execution time of A is less than that of execution time of B.
Therefore, performance of A is greater than that of performance of B.
Example –
Machine A runs a program in 100 seconds, Machine B
runs the same program in 125 seconds
(Performance of A /
Performance of B)
= (Execution Time of B / Execution Time of A)
= 125 / 100 = 1.25
That means machine A is 1.25 times faster than Machine
B.
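The arithmetic above can be checked in a few lines (a trivial sketch; the 100 s and 125 s figures are the ones from the example):

```python
# Relative performance from execution times (performance = 1 / time).
time_a = 100.0  # seconds for Machine A
time_b = 125.0  # seconds for Machine B

# Performance(A) / Performance(B) = Time(B) / Time(A)
speedup = time_b / time_a
print(speedup)  # 1.25 -> Machine A is 1.25 times faster than Machine B
```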
And the time to execute a given program can be
computed as:
Execution time = CPU clock cycles x clock cycle time
Since clock cycle time and clock rate are reciprocals:
Execution time = CPU clock cycles / clock rate
The number of CPU clock cycles can be determined by:
CPU clock cycles = (Instructions / Program) x (Clock cycles / Instruction)
= Instruction Count x CPI
Which gives:
Execution time = Instruction Count x CPI x clock cycle time
= Instruction Count x CPI / clock rate
The units for CPU execution time are:
seconds / program = (instructions / program) x (clock cycles / instruction) x (seconds / clock cycle)
How to Improve Performance?
To improve performance you can either:
Decrease the CPI (clock cycles per instruction) by
using new hardware.
Decrease the clock cycle time (i.e. increase the clock rate)
by reducing propagation delays or by using pipelining.
Decrease the number of required cycles by improving the
ISA or the compiler.
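As a worked sketch of the execution-time formula above (the instruction count, CPI and clock rate below are made-up illustration values, not measurements):

```python
# Execution time = Instruction Count x CPI / clock rate
instruction_count = 2_000_000  # instructions in the program (assumed)
cpi = 1.5                      # average clock cycles per instruction (assumed)
clock_rate = 2e9               # 2 GHz, i.e. cycles per second (assumed)

cycles = instruction_count * cpi   # total CPU clock cycles = IC x CPI
exec_time = cycles / clock_rate    # seconds of CPU time
print(exec_time)  # 0.0015 seconds

# Any of the three levers above (lower CPI, higher clock rate,
# fewer instructions) reduces exec_time proportionally.
```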
Difference between CALL and JUMP instructions
CALL instruction is used to call a
subroutine. Subroutines are often used to perform tasks that need to be
performed frequently. The JMP instruction
is used to cause the PLC to skip over rungs.
The differences Between CALL and JUMP instructions
are:
JUMP | CALL |
Program control is transferred to a memory location which is in the main program | Program control is transferred to a memory location which is not a part of the main program |
Immediate addressing mode | Immediate addressing mode + register indirect addressing mode |
Initialisation of SP (Stack Pointer) is not mandatory | Initialisation of SP (Stack Pointer) is mandatory |
Value of the Program Counter (PC) is not transferred to the stack | Value of the Program Counter (PC) is transferred to the stack |
After JUMP, there is no return instruction | After CALL, there is a return instruction |
Value of SP does not change | Value of SP is decremented by 2 |
3 machine cycles are required to execute this instruction | 5 machine cycles are required to execute this instruction |
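The table rows about the stack can be sketched as a toy simulation. This abstracts away instruction lengths (a real CALL pushes the address of the *next* instruction) and models memory and the stack with plain Python structures:

```python
# JMP only rewrites PC; CALL also saves the return address on the
# stack (decrementing SP by 2, as on the 8085) so RET can resume.
def jmp(state, target):
    state["PC"] = target                 # no stack activity at all

def call(state, target):
    state["SP"] -= 2                     # SP is decremented by 2
    state["stack"].append(state["PC"])   # return address saved on stack
    state["PC"] = target

def ret(state):
    state["PC"] = state["stack"].pop()   # resume where CALL left off
    state["SP"] += 2

s = {"PC": 0x0100, "SP": 0xFFFF, "stack": []}
call(s, 0x2000)   # enter subroutine, remembering 0x0100
ret(s)            # come back: PC = 0x0100, SP restored to 0xFFFF
jmp(s, 0x3000)    # one-way transfer: nothing recorded to return to
print(hex(s["PC"]), hex(s["SP"]), s["stack"])  # 0x3000 0xffff []
```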
Memory based vs Register based addressing modes
Prerequisite – Addressing Modes
The operation field of an instruction specifies the operation
to be performed. That operation must be executed on data which
is stored in computer registers or in memory. The way operands are chosen
during program execution depends on the addressing mode of the
instruction. “The addressing mode specifies a rule for interpreting or
modifying the address field of the instruction before the operand is actually
referenced.” Basically, how we interpret the operand given in
the instruction is known as the addressing mode.
Addressing modes depend very much on the type of CPU
organisation. There are three types of CPU organisation:
1. Single accumulator organisation
2. General register organisation
3. Stack organisation
Addressing modes are used for one or both of the following
purposes, which can also be seen as the advantages of using addressing modes:
they give programming versatility to the user (pointers to memory, counters for
loop control, indexing of data, program relocation), and they reduce the number
of bits in the address field of the instruction.
MEMORY BASED ADDRESSING MODES vs REGISTER BASED ADDRESSING MODES
MEMORY BASED ADDRESSING MODES | REGISTER BASED ADDRESSING MODES |
The operand is present in memory and its address is given in the instruction itself. This addressing mode takes proper advantage of memory addressing, e.g., Direct addressing mode | An operand is held in one of the registers and the register number is provided in the instruction. With the register number present in the instruction, the operand is fetched, e.g., Register mode |
The content of the base register is added to the address part of the instruction to obtain the effective address. The base register is assumed to hold a base address and the address field of the instruction gives the displacement relative to that base address, e.g., Base register addressing mode | If we have a table of data and our program needs to access all the values one by one, we need something which decrements a register holding the base address. Since a register is decremented, this is a register based addressing mode, e.g., Auto decrement mode |
The content of the index register is added to the address part given in the instruction to obtain the effective address. Index mode is used to access an array whose elements are in successive memory locations, e.g., Indexed addressing mode | If we have a table of data and our program needs to access all the values one by one, we need something which increments a register holding the base address, e.g., Auto increment mode |
The content of the program counter is added to the address part of the instruction to obtain the effective address. The address part in this case is usually a signed number, either positive or negative, e.g., Relative addressing mode | Instructions used for initializing registers to a constant value use a register based addressing mode, and this is a very useful approach, e.g., Immediate mode |
Addressing Modes
Addressing Modes – The term
addressing modes refers to the way in which the operand of an instruction is
specified. The addressing mode specifies a rule for interpreting or modifying
the address field of the instruction before the operand is actually referenced.
Addressing modes for 8086
instructions are divided into two categories:
1) Addressing modes for data
2) Addressing modes for branch
The 8086 memory addressing modes
provide flexible access to memory, allowing you to easily access variables,
arrays, records, pointers, and other complex data types. The key to good assembly language programming
is the proper use of memory addressing modes.
An assembly language program instruction consists of two parts:
the opcode and the operand.
The memory address of an operand
consists of two components:
IMPORTANT TERMS
· Starting address of memory segment.
· Effective address or Offset: an offset is determined by adding any
combination of three address elements: displacement, base and index.
- Displacement: an 8-bit or 16-bit immediate value given in the instruction.
- Base: contents of the base register, BX or BP.
- Index: contents of the index register, SI or DI.
According to the different ways of specifying an operand, the
8086 microprocessor uses several addressing modes, discussed below:
· Implied mode: In implied addressing the operand is specified
implicitly in the definition of the instruction itself; no address field is
needed. Zero-address instructions are designed with implied addressing mode.
Example: CMC (complement the carry flag; the operand, the carry flag, is
implied by the opcode)
· Immediate addressing mode (symbol #): In this mode data is present in the
address field of the instruction; the data is 8 bits or 16 bits long and is
part of the instruction, designed like a one-address instruction format.
Example: MOV AL, 35H (move the data 35H into the AL register)
Note: a limitation of the immediate mode is that the range of constants is
restricted by the size of the address field.
· Register mode: In register addressing the operand is placed in one
of the 8-bit or 16-bit general purpose registers. The data is in the register
that is specified by the instruction.
Here one register reference is
required to access the data.
Example: MOV AX, CX (move the
contents of the CX register to the AX register)
· Register Indirect mode: In this addressing the operand’s offset
is placed in any one of the registers BX, BP, SI or DI, as specified in the
instruction. The effective address of the data is in the base register or an
index register that is specified by the instruction.
Here one register reference and one memory reference are
required to access the data.
The 8086 CPUs let you access memory
indirectly through a register using the register indirect addressing modes.
Example: MOV AX, [BX] (move the contents of the
memory location addressed by the register BX to the
register AX)
· Auto Indexed (increment mode): The effective address of the operand
is the contents of a register specified in the instruction. After accessing the
operand, the contents of this register are automatically incremented to point
to the next consecutive memory location: (R1)+.
Here one register reference, one
memory reference and one ALU operation are required to access the data.
Example:
Add R1, (R2)+ // i.e.
R1 = R1 + M[R2]
R2 = R2 + d
Useful for stepping through arrays in a loop, where R2 holds the start of the
array and d is the size of an element.
· Auto indexed (decrement mode): The effective address of the operand
is the contents of a register specified in the instruction. Before accessing
the operand, the contents of this register are automatically decremented to
point to the previous consecutive memory location: -(R1).
Here one register reference, one
memory reference and one ALU operation are required to access the data.
Example:
Add R1, -(R2) // i.e.
R2 = R2 - d
R1 = R1 + M[R2]
Auto decrement mode works like auto increment mode, except that the register is
decremented before the access. Both can be used to implement a stack via push
and pop; auto increment and auto decrement modes are useful for implementing
“Last-In-First-Out” data structures.
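The stack connection can be sketched as follows. This is a minimal model assuming a word size of d = 1 list slot, with a plain Python list standing in for memory and a global variable standing in for the register R2:

```python
# Auto-decrement as PUSH and auto-increment as POP: how -(R2) and
# (R2)+ together implement a LIFO stack that grows downward in memory.
memory = [0] * 16
R2 = len(memory)        # R2 plays the stack pointer; d = 1 here

def push(value):
    global R2
    R2 -= 1             # -(R2): decrement first, then use as address
    memory[R2] = value

def pop():
    global R2
    value = memory[R2]  # (R2)+: use as address first, then increment
    R2 += 1
    return value

push(10); push(20)
print(pop(), pop())  # 20 10 -- last in, first out
```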
· Direct addressing / Absolute addressing mode (symbol [ ]):
The operand’s offset is given in the instruction as an 8-bit or 16-bit
displacement element. In this addressing mode the 16-bit effective address of
the data is part of the instruction.
Here only one memory reference
operation is required to access the data.
Example: ADD AL, [0301] // add the contents of offset address 0301 to AL
· Indirect addressing mode (symbol @ or ( )): In this mode the address
field of the instruction contains the address of the effective address.
Two references are required:
1st reference to get the effective address.
2nd reference to access the data.
Based on where the effective address is held, indirect mode is of two kinds:
1. Register Indirect: In this mode the effective address is in a register,
and the corresponding register name is held in the address field of the
instruction.
Here one register reference and one
memory reference are required to access the data.
2. Memory Indirect: In this mode the effective address is in memory, and
the corresponding memory address is held in the address field of the
instruction.
Here two memory references are
required to access the data.
· Indexed addressing mode: The operand’s offset is the sum of the
content of an index register (SI or DI) and an 8-bit or 16-bit displacement.
Example: MOV AX, [SI+05]
· Based Indexed addressing: The operand’s offset is the sum of the
content of a base register (BX or BP) and an index register (SI or DI).
Example: ADD AX, [BX+SI]
Based on transfer of control, the addressing modes are:
PC relative addressing mode: PC relative addressing mode is
used to implement intra-segment transfer of control. In this mode the effective
address is obtained by adding a displacement to the PC.
EA = PC + address field value
PC = PC + relative value
Base register addressing mode: Base register addressing mode is
used to implement inter-segment transfer of control. In this mode the effective
address is obtained by adding the base register value to the address field
value.
EA = base register + address field value
PC = base register + relative value
Note:
1. Both the PC relative and based register addressing modes are suitable for
program relocation at runtime.
2. The based register addressing mode is best suited to writing position
independent code.
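The offset calculations described above can be sketched in a few functions. The register contents below are made-up example values, and only the offset (effective address within a segment) is computed, not the full segment:offset physical address:

```python
# Sketch of 8086 effective-address (offset) computation for a few of
# the data addressing modes above. Register values are assumptions
# chosen for illustration, not anything architectural.
regs = {"BX": 0x1000, "BP": 0x2000, "SI": 0x0010, "DI": 0x0020}

def ea_direct(disp):                # ADD AL, [0301]  -> EA = 0301
    return disp

def ea_register_indirect(reg):      # MOV AX, [BX]    -> EA = BX
    return regs[reg]

def ea_indexed(index, disp):        # MOV AX, [SI+05] -> EA = SI + 05
    return regs[index] + disp

def ea_based_indexed(base, index):  # ADD AX, [BX+SI] -> EA = BX + SI
    return regs[base] + regs[index]

print(hex(ea_direct(0x0301)))            # 0x301
print(hex(ea_register_indirect("BX")))   # 0x1000
print(hex(ea_indexed("SI", 0x05)))       # 0x15
print(hex(ea_based_indexed("BX", "SI"))) # 0x1010
```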
Interaction of a Program with Hardware
When a programmer writes a program, how is it fed to the computer and how does
it actually work?
This article is about how the program code that we write in a text editor is
fed to the computer, given that, as we all know, a computer works on only two
numbers: 0 and 1.
We write code in a text editor using a language such as C++, Java or Python.
This code is given to the compiler, which converts it to assembly code.
Assembly is very close to the machine hardware, as it depends on the
instruction set; it is then converted to binary, i.e. 0s and 1s, which
represent the digital voltages fed to the transistors inside the chip.
Now we have the voltages that are actually required to run the hardware. These
voltages connect the correct circuitry inside the chip and perform the specific
task, for example addition or subtraction. At a low level, all these operations
are done by combinations of tiny transistors: flip-flops are combinations
of gates, and gates are combinations of transistors. So it all started with the
invention of the transistor.
The chip has lots of circuits inside it to perform various tasks, such as
arithmetic and logical operations.
The computer hardware also contains RAM, another chip which stores data
temporarily, and a hard disk which stores data permanently.
The operating system is also responsible for routing the software to the right
hardware, such as the keyboard, mouse and screen.
Branch Prediction in Pentium
Why do we need branch prediction?
The gain produced by pipelining can
be reduced by the presence of program transfer instructions, e.g. JMP, CALL,
RET etc.
They change the execution sequence, making all the instructions that entered
the pipeline after the program transfer instruction invalid.
Thus no work is done while the pipeline stages are reloaded.
Branch prediction logic:
To avoid this problem, Pentium uses
a scheme called Dynamic Branch Prediction. In this scheme, a prediction is made
for the branch instruction currently in the pipeline. The prediction will
either be taken or not taken. If the prediction is true then the pipeline will
not be flushed and no clock cycles will be lost. If the prediction is false
then the pipeline is flushed and starts over with the current instruction.
It is implemented using a 4-way set-associative cache with 256 entries, called
the Branch Target Buffer (BTB). The directory entry for each line consists of:
Valid bit: indicates whether the entry is valid or not.
History bits: track how often the branch has been taken.
Source memory address: the address the branch instruction was fetched from.
If the directory entry is valid, the target address of the branch is stored in
the corresponding data entry in the BTB.
Working of Branch Prediction:
The BTB is a lookaside cache that sits to the side of the Decode Instruction
(DI) stage of the two pipelines and monitors for branch instructions.
The first time a branch instruction enters the pipeline, the BTB uses the
instruction's source memory address to perform a lookup in the cache.
Since the instruction has never been seen before, this is a BTB miss, and the
prediction is that the branch will not be taken, even if it is an unconditional
jump instruction.
When the instruction reaches the
EU(execution unit), the branch will either be taken or not taken. If taken, the
next instruction to be executed will be fetched from the branch target address.
If not taken, there will be a sequential fetch of instructions.
When a branch is taken for the first time, the execution unit provides feedback
to the branch prediction logic. The branch target address is sent back and
recorded in the BTB.
A directory entry is made containing the source memory address, and the history
bits are set to strongly taken.
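The BTB behaviour described above can be sketched as follows. This is a simplification: a dict stands in for the real 4-way set-associative 256-entry cache, and the history update is the common 2-bit saturating counter (the actual Pentium jumps straight to strongly taken on the first taken branch):

```python
# Predict "not taken" on a BTB miss; otherwise predict from a 2-bit
# saturating history counter (0..3, where >= 2 means predict taken).
btb = {}  # source address -> {"target": addr, "history": 0..3}

def predict(source):
    entry = btb.get(source)
    if entry is None:                 # BTB miss: predict not taken
        return False, None
    return entry["history"] >= 2, entry["target"]

def update(source, taken, target):
    entry = btb.setdefault(source, {"target": target, "history": 0})
    if taken:                         # saturate toward strongly taken (3)
        entry["history"] = min(3, entry["history"] + 1)
        entry["target"] = target      # execution unit feeds target back
    else:                             # saturate toward strongly not taken (0)
        entry["history"] = max(0, entry["history"] - 1)

# A loop branch at address 0x40: taken 3 times, then falls through once.
for outcome in (True, True, True, False):
    update(0x40, outcome, target=0x10)

print(predict(0x40))  # history sits at 2, so: predict taken, target 0x10
```

The single not-taken outcome only weakens the entry from 3 to 2, so one loop exit does not flip the prediction, which is exactly why two history bits beat a single bit for loop branches.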
What is RISC?
A reduced instruction set computer is a computer which uses only simple
instructions, each achieving a low-level operation within a single clock cycle,
as its name suggests: “Reduced Instruction Set”.
RISC Architecture
The term RISC stands for “Reduced Instruction Set Computer”. It is a CPU design
strategy based on simple instructions that execute quickly.
RISC Architecture
This is a small or reduced set of instructions. Every instruction is expected
to perform a very small job. The instructions are modest and simple, and more
complex operations are composed from them. Each instruction is of the same
length; instructions are strung together to get compound tasks done. Most
instructions complete in one machine cycle, and pipelining is a crucial
technique used to speed up RISC machines.
What is CISC?
A complex instruction set computer is a computer in which a single instruction
can perform numerous low-level operations (such as a load from memory, an
arithmetic operation, and a memory store), accomplished by multi-step processes
or addressing modes within single instructions, as its name suggests: “Complex
Instruction Set”.
CISC Architecture
The term CISC stands for “Complex Instruction Set Computer”. It is a CPU design
strategy based on single instructions that are capable of executing multi-step
operations.
CISC Architecture
CISC computers have small programs. They have a huge number of compound
instructions, which take a long time to perform. A single instruction is
executed in several steps, and an instruction set can have more than 300
separate instructions. Most instructions complete in two to ten machine cycles.
In CISC, instruction pipelining is not easily implemented.
Difference between RISC and CISC Architecture
RISC | CISC |
1. RISC stands for Reduced Instruction Set Computer. | 1. CISC stands for Complex Instruction Set Computer. |
2. RISC processors have simple instructions taking about one clock cycle. The average clock cycles per instruction (CPI) is 1.5. | 2. CISC processors have complex instructions that take multiple clock cycles to execute. The average clock cycles per instruction (CPI) is in the range of 2 to 15. |
3. Performance is optimized with more focus on software | 3. Performance is optimized with more focus on hardware. |
4. It has no memory unit and uses separate hardware to implement instructions. | 4. It has a memory unit to implement complex instructions. |
5. It has a hard-wired unit of programming. | 5. It has a microprogramming unit. |
6. The instruction set is reduced i.e. it has only a few instructions in the instruction set. Many of these instructions are very primitive. | 6. The instruction set has a variety of different instructions that can be used for complex operations. |
7. It has few addressing modes. | 7. CISC has many different addressing modes and can thus be used to represent higher-level programming language statements more efficiently. |
8. Complex addressing modes are synthesized using the software. | 8. CISC already supports complex addressing modes |
9. Multiple register sets are present | 9. Only has a single register set |
10. RISC processors are highly pipelined | 10. They are normally not pipelined or less pipelined |
11. The complexity of RISC lies with the compiler that executes the program | 11. The complexity lies in the microprogram |
12. Execution time is very low | 12. Execution time is very high |
13. Code expansion can be a problem | 13. Code expansion is not a problem |
14. Decoding of instructions is simple. | 14. Decoding of instructions is complex |
15. It does not require external memory for calculations | 15. It requires external memory for calculations |