## Maths

#### Permutation & Combination in Depth

#### Probability

#### Set Theory

#### Propositional Logic

### Definition :

All possible arrangements of a collection of things, where the order is important.
* ab and ba are different arrangement in permutation.
* All combination of a,b,c => ab , ba , cb , bc , ac , ca.

### Formulas :

1) No of permutation -> N diff thing taken R at time
NPR = N! / (N-R)!
2) N things taken all at a time - p,q and r alike
= N! / P!Q!R!
3) N things taken R at a time - each thing repeated any number of time.
= NR
4) N things taken R at a time - but all can not be given to a single person.
= NR - n
5) N things taken R at a time - when particular thing is to included in each arrangement.
= R * NPR-1
6) N things taken R at a time - when particular thing never included in each arrangement.
= N-1PR
7) N different things - all m things come together.
= (N-m+1)!m!

### Circular Permutation :

1) Clockwies and anticlockwise are different
Permutation = (N-1)!
2) Clockwies and anticlockwise are same
Permutation = (N-1)! / 2

### Definition :

A collection of things, in which the order does not matter.
* Selecting 2 items from ab => ab & ba are same selection.

### Formulas :

1) No of permutation -> N diff thing taken R at time
NCR = N! / R! * (N-R)!
2) When k objects occur in a selection
=N-kCR-k
3) When k objects are never selected
=N-kCR
4) N different things - selecting at least 1 of them
= NC1 + NC2 + NC3 + ....... + NCN = 2N - 1
5) q,p,r,t where p ,q & r alike , t different
Total combination for atleast one to be selected
= (p+1)*(q+1)*(r+1)*2t - 1
6) Dividing into two different group of m and name
= (m+n)! / m!n!
7) Dividing N different things into R different group suppose x,y,z are N different things to b divided into R groups. x+y+z=N
= N+R-1CR

### Definition :

The binomial coefficient is the number of ways of picking unordered outcomes from possibilities, also known as a combination or combinatorial number. The symbols and are used to denote a binomial coefficient, and are sometimes read as " choose ."

### Formulas :

1) ∑ nCr
= 2n
2) Coefficient of expansion (X+Y)n
= ∑nr=0   nCr Xr Yn-r
3) Coefficient of XaYb
NCa
4)Symmetry
nCr = nCn-r
5) Pascal's identity
nCr = n-1Cr + n-1Cr-1
6) Newton's identity
nCr * nCk = nCk * n-kCr-k

### Definition :

the extent to which an event is likely to occur, measured by the ratio of the favourable cases to the whole number of cases possible.

### Formulas :

1) Equally likely
P(A) = P(B)
2) Mutually exclusive
P(A) ∩ P(B) = ∅
P(A U B) = P(A) + P(B)
3) Collectively Exhaustive
P(A U B ) = P(E) = 1
4) Independent
1 event do not depend on the other
P(A) ∩ P(B) = P(A) * P(B)
5) Multually exclusive & Collectively Exhaustive
P(A) + P(B) = 1
6) Conditional probability
An event A happen given event E already happened
P(A)/P(E) = P(A ∩ E) / P(E) => If they are dependent
P(A/E) = P(A ∩ E)/P(E) = P(A) => If they are independent
7) Pigeon hole principle
the pigeonhole principle states that if n items are put into m containers, with n > m, then at least one container must contain more than one item.
N=k *(r-1)+1 where k is the number of possible outcomes and r is the number of repetitions we want.

### Formulas :

1) Number of Reflexive relation
=2n2-n
2) Number of Irreflexive relation
=2n2-n
3) Number of Symmetric relation
=2n*(n+1)/2
4) Number of Antisymmetric relation
=2n * 3n*(n-1)/2
5) Number of Asymmetric reltion
=2n2-n

### Formulas :

1) Intermediate theorem : -
If f(a) = +ive & f(b) = -ive & if f(x) is CONTINUOUS => Roots lie between f(a) & f(b)
f(a) * f(b) < 0
2) Bisection Method : -
*A simple bisection procedure for iteratively converging on a solution which is known to lie inside some interval [a,b] proceeds by evaluating the function in question at the midpoint of the original interval x=(a+b)/2 and testing to see in which of the subintervals [a,(a+b)/2] or [(a+b)/2,b] the solution lies. The procedure is then repeated with the new interval as often as needed to locate the solution to the desired accuracy.
*Let an and bn be the endpoints at the nth iteration (with a1=a and b1=b) and let rn be the nth approximate solution. Then the number of iterations required to obtain an error smaller than epsilon is found by noting that
Order : 1
Convergence : Linear
bn - an = b-a / 2n-1
3) False position method : -
*An algorithm for finding roots which retains that prior estimate for which the function value has opposite sign from the function value at the current best estimate of the root. In this way, the method of false position keeps the root bracketed
Using the two-point form of the line
Order : 1
Convergence : Linear
4) Secant method :-
*A root-finding algorithm which assumes a function to be approximately linear in the region of interest. Each improvement is taken as the point where the approximating line crosses the axis. The secant method retains only the most recent estimate, so the root does not necessarily remain bracketed.
Order : 1.62
Convergence : Non Linear
5) Newton Rapson Method : -
*It is a method for finding successively better approximations to the roots (or zeroes) of a real-valued function.
*Given a function ƒ defined over the reals x, and its derivative ƒ', we begin with a first guess x0 for a root of the function f. Provided the function satisfies all the assumptions made in the derivation of the formula, a better approximation x1.The process is repeated as :
Order : 2
6) Trapezoidal rule :-
*The trapezoidal rule (also known as the trapezoid rule or trapezium rule) is a technique for approximating the definite integral
7)Simpson's rule :- In numerical analysis, Simpson's rule is a method for numerical integration, the numerical approximation of definite integrals. Specifically, it is the following approximation:

### Some important points & Formulas :

1) Conjunction
= p ∧ q
2) Disjunction
= p ∨ q
3) Negation
= ¬ p
4) Implies
p => q ≅ ¬ p ∨ q
5) if p,q
p => q
6) p if q
q => p
7) p only if q
p => q
8) p sufficient for q
p => q
9) p necessary for q
q => p
10) p if and only if q
p ⇔ q
11) Quantifiers
(a) Universal Quantifiers
for all x , P(x) is true
∀x , P(x)
(b) Existential Quantifiers
at least 1 value for which P(x) is true
∃ x , P(x)
12) Properties
(a) ∀x ¬ (x) ↔ ¬ ∃x P(x)
(b) ∃ x ¬ P(X) ↔ ¬ ∀x P(x)
(c) ∀ x P(x) ↔ ¬ ∃x ¬ (x)
(d) ∃ x P(x) ↔ ¬ ∀x ¬ P(x)
13) Distributive Property
(a) ∀x ( P(x) ∧ Q(x) ) ↔ ∀x P(x) ∧ ∀x Q(x)
(b) ∀x P(x) ∨ ∀x Q(x) → ∀x ( P(x) ∨ Q(x) )
(c) ∃x ( P(x) ∨ Q(x) ) ↔ ∃x P(x) ∨ ∃x Q(x)
(d) ∃x ( P(x) ∧ Q(x) ) → ∃x P(x) ∧ ∃x Q(x)

#### Linear Algebra

1) Types of Matrix
Row matrix - A row matrix is a matrix with only one row.
Column matrix - A column matrix is a matrix with only one column.
Zero matrix - A zero matrix or a null matrix is a matrix that has all its elements zero.
Square matrix - A square matrix is a matrix with an equal number of rows and columns.
Diagonal matrix - A diagonal matrix is a square matrix that has all its elements zero except for those in the diagonal from top left to bottom right; which is known as the leading diagonal of the matrix.
Unit matrix - A unit matrix is a diagonal matrix whose elements in the diagonal are all ones.
Symmetric matrix - A square matrix for which transpose of Matrix A is equal to matrix A.
Eigen Values of Symmetric matrix are REAL
AT = A

Skew Symmetric matrix - A square matrix for which matrix A is said to be skew-symmetric if transpose of matrix A is equal to negative of Matrix A.
Note that all the main diagonal elements in skew-symmetric matrix are zero.
Eigen Values of Skew Symmetric matrix are purely IMAGINARY or ZERO
AT = -A

Orthagonal matrix - A matrix is ortagonal if its transpose is equal to its inverse.
Eigen Values of Orthagonal matrix are 1 or -1
AT = A-1
A*AT = I

Idempotent matrix - An idempotent matrix is a matrix which, when multiplied by itself, yields itself
A * A = A
A2 = A

Involutory matrix - an involutory matrix is a matrix that is its own inverse. That is, multiplication by matrix A is an involution if and only if A2 = I. Involutory matrices are all square roots of the identity matrix.
A2 = I
2) Rank of a matrix
Number of linearly independent rows
If Determinant(A) ≠ 0 => Rank = n

## Graph Theory

#### Graphs and coloring

1) Graph terminologies
a) Default Graph :- Simple , undirected & unweighed graph is known as Default Graph.
b) Simple Graph :- No parallel edges no self loop.
N vertices & k components ⇒ Edges = (n-k)(n-k+1)/2
c) Regular Graph :- All vertices of same degree.
Degree = nk / 2
d) Self loop :- Edge on same vertex.
e) Parallel Edges :- 2 edge on same vertices.
f) Degree :- Number of edges incident on a vertex.
g) Order of Graph :- Number of vertices.
h) Size of Graph :- Number of edges.
i) Isolated vertex :- Degree of the vertex is zero.

j) Pendent Vertex :- Degree of the vertex is zero.

2) Complete Graph
A complete graph is a graph in which each pair of graph vertices is connected by an edge.
a) Number of edges
= nC2 = n(n-1)/2
b) Maximum degree of vertex
= n-1
Minimum degree of vertex
= 1
Sum of degree of complete graph
= n(n-1)

2) Walk & Path
Walk :- Alternative sequence of vertices and edges that no EDGE appear mode than once.
Closed walk - Walk terminating at original vertex.
Open walk - End and start vertices are different.
Path :- Alternative sequence of vertices and edges that no EDGE as well as VERTEX appear mode than once.
Path is a type of walk.
Closed Path - Path terminating at original vertex.
Open Path - End and start vertices are different.

3) Connected Compenent
There is a path from one vertex to every another vetrex.

3) Euler Graph
Closed walk running through every edge just once.Graph having euler line is euler graph.
Note - In euler graph All vertices are of even degree

4) Unicursal Graph
It is an open Euler line.
Note - In Unicursal graph All vertices except 2 are of even degree

5) Hamiltonian Path & Circuit
A Hamiltonian path is a path in an undirected or directed graph that visits each vertex exactly once.
A Hamiltonian cycle (or Hamiltonian circuit) is a Hamiltonian path that is a cycle. Determining whether such paths and cycles exist in graphs is the Hamiltonian path problem, which is NP-complete.

6) Cut Set
Minimal set of edges whose removal from G leaves G disconnected,provided removal of no proper subset of these edges disconnected G.
Every cutset in a connected graph G must contain atleast 1 branch of any spanning tree.
n vertices tree = (n-1) cutsets

7) Connectivity
Edge connectivity :- Number of edges in the smallest cutset.
Vertex connectivity :- Minimum number of vertices whose removal disconnect remaining graph.

8) Isomorphism
One to one relationship between 2 graphs.
Adjacency preserved & equal number of vertices and edges.

9) Planar Graph
Graph is said to be planar if there existes some geometric representation of G which can be drawn on plane without intersection.
Adjacency preserved & equal number of vertices and edges.
Planar graph should not contain Kuratowski's non planar graph as subset of the graph
Kuratowski's 2 non planar graph
10) Regions
Set of edges forming a boundary.
In any graph with f regions , n vertices and e edges :-
f = e - n + 2
a) e ≤ 3n - 6         b)e ≥ 3f/2

11) Bipartite Graph
Bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint sets and (that is, and are each independent sets) such that every edge connects a vertex in to one in .
No odd length cycle.

12) Colouring
Painting all vertices with no 2 adjacent same colour.
K-chromaticGraph requiring k different colour.
Graph with 1 vertex
⇒ k=1
Graph with atleast 1 edge
⇒ k ≥ 2
Circuit of even vertices
⇒ k = 2
Circuit of odd vertices
⇒ k = 3
Complete graph of n vertices
⇒ k=n

13) Chromatic Partitioning
Independent set of vertices such as no 2 vertices in set are adjacent.
Number of vertices in largest independent set B(G) is called Independent number.
B(G) ≥ n/k {where n is number of verticecs & k is chromatic number}

14) Vertex Cover
Subset of vertices that include atleast 1 vertex incident on every edge of a graph.
Minimum Vertex Cover = 1
Maximum Vertex Cover = In complete graph

14) Edge Cover
Subset of edges so that all vertices are covered.
Minimum Edge Cover = n/2
Maximum Edge Cover = n-1

14) Important points
*Self complemetary
a) G is self complementary ⇒ n(G)=4x or 4x + 1 where x is a constant.
b) G is self complementary ⇒ e = n(n-1)/4
*Undirected graph
a) n-k  ≤  e  ≤  (n-k+1)(n-k)/2
b) K(G) ≤ λ(G)  ≤  δ  ≤  ⌈2e/n⌉  ≤  Δ
where K(G) = Vertex cover
λ(G) = Edge cover
δ = Minimum degree
Δ = Maximum degree
* Perect Matching

## Automata

#### Overview

#### Regular Languages(DFA's and NFA's)

#### CFGs

#### Decidability & TMs

#### Regular Language

1)Deterministic Finite Automata DFA
* A finite state machine is called a deterministic finite automaton (DFA), if
Each of its transitions is uniquely determined by its source state and input symbol, and
reading an input symbol is required for each state transition.
2)Non eterministic Finite Automata DFA
* A nondeterministic finite automaton (NFA), or non deterministic finite state machine, needn't obey the restrictions given in DFA. In particular, every DFA is also a NFA.
An NFA is represented formally by a 5-tuple, (Q, ∑, Δ, q0, F), consisting of
a finite set of states Q
a finite set of input symbols ∑
a transition function Δ : Q * ∑ → P(Q).
an initial (or start) state q0 ∈ Q
a set of states F distinguished as accepting (or final) states F ⊆ Q.
3) Regular language
* Regular languages can also be defined, from the empty set and from some finite number of singleton sets, by the operations of union, composition, and Kleene closure. Specifically, consider any alphabet ∑.

Closure Property :-
Closed under
* Union , Intersection , Concatenation , Defference , Reversal.
Regular languages are closed under everything.

Decisive Property :-
Decidable
* Regular language is empty.
* Membership property.
* Regular language generate infinte number of strings.
* Two Regular languages are equivalent.
* A Regular languages is a subset of another.

#### Context free Language

1)CFL
* A context-free grammar (CFG) is a set of recursive rewriting rules (or productions) used to generate patterns of strings. A CFG consists of the following components: a set of terminal symbols, which are the characters of the alphabet that appear in the strings generated by the grammar.

Closure Property :-
Closed under
* Everything other than Complement , Intersection and Difference.

Decisive Property :-
Decidable {L1 and L2 are two CFL & R is a regular language }
* Emptiness : Is L1 = ∅
* Membership : A sring belongs to a CFL
* Finiteness : A CFL is a finite or not.
* L1 ∪ L2 = CFL
* L1.L2 = CFL
* R ∪ L1 = CFL & R ∩ L1 = CFL
Undecidable
* Equivalence : Is L1 and L2 are equal.
* Intersection : L1 ∩ L2 = CFL.
* Intersection Emptiness : L(G1) ∩ L(G2) = ∅
* Containment : A CFL is subset of another CFL.
* Universality : L1 is a subset of ∑*.
* ∑* - L1 = CFL
* A CFL is subset of another CFL.
* A CFL is ambiguous.

2) Deterministic CFL
* Deterministic context-free languages (DCFL) are a proper subset of context-free languages. They are the context-free languages that can be accepted by a deterministic pushdown automaton. DCFLs are always unambiguous, meaning that they admit an unambiguous grammar.

Closure Property :-
Closed under
* Complement , Inverse , Intersection with regular , Union with regular.
Not Closed under
* Union , Intersection , Reversal , Concatenation , Difference.

#### Recursive Language

Recursive Language
* Formally known as Decidable or Tuning-Decidable
A formal language is recursive if for a sring w :
Turing machine halts in a final state if w belongs to the language.
Turing machine halts in a non final state if w doest not belong to the language.

Closure Property :-
* Closed under everything.

Decisive Property :-
Decidable {R is a recursive language}
* Membership : A string belongs to a R.

#### Recursively Enumerable Language

RE
A language is recursively enumerable if some Turing machine accepts it
then halts on a final state.
or halts in a non-final state or loops forever.

Closure Property :-

* Not closed under Complement and Difference.

* L ≠ RE and L` is RE ⇒ L is recursive.
* L is RE but L` is not RE ⇒ L is not Recursive.
* Neither L nor L` is RE ⇒ L is not RE.
* L is not RE ⇒ L` is Recursive.

#### Turing Machine

Turing Machine TM
A language is recursively enumerable if some Turing machine accepts it
then halts on a final state.
or halts in a non-final state or loops forever.

Decisive Property :-
Decidable
* Whether TM has 5 states.
* Whether TM make a left move or right move.
* Whether TM scans a given symbol more than once on a given input in a given number of moves.

Undecidable
* Halting problem.
* Whether TM accpets emoty language.
* Whether TM accepts finite or infinite language.
* Whether TM accepts regular language or CFL.
* Whether TM scans a given symbol more than once on a given input where number of moves are not given.

## Operating System

#### Process Management and Synchronisation

#### Critical Section Problem

#### Memory Management

### Definition :

Process is an instance of a computer program that is being executed.

1) Process state diagram.
2) Scheduler
a) Short term scheduler
*Determine which process is to move from Ready to Running state.
* It is invoked frequently and must br fast.
* Dispatcher - It executes the decision of short term scheduler.
b) Medium term scheduler
* Determine which process to swap in and which process to swap out based on priority.
c) Long term scheduler
* Decide which process to load into Memory.
* Determine the degree of Multiprogramming.

### Definition :

a deadlock is a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does.

a) Mutual exclusion
The resources involved must be unshareable; otherwise, the processes would not be prevented from using the resource when necessary.
b) Hold and wait
The processes must hold the resources they have already been allocated while waiting for other (requested) resources. If the process had to release its resources when a new resource or resources were requested, deadlock could not occur because the process would not prevent others from using resources that it controlled.
c) No preemption
The processes must not have resources taken away while that resource is being used. Otherwise, deadlock could not occur since the operating system could simply take enough resources from running processes to enable any process to finish.
d) Circular wait
A circular chain of processes, with each process holding resources which are currently being requested by the next process in the chain, cannot exist. If it does, the cycle theorem (which states that "a cycle in the resource graph is necessary for deadlock to occur") indicated that deadlock could occur.

a) Mutual exclusion do not exist.
b) No hold and wait.
c) Preempting resources.
d) No circular wait.

Deadlock can be avoided if certain information about processes are available to the operating system before allocation of resources, such as which resources a process will consume in its lifetime. For every resource request, the system sees whether granting the request will mean that the system will enter an unsafe state, meaning a state that could result in deadlock. The system then only grants requests that will lead to safe states.[1] In order for the system to be able to determine whether the next state will be safe or unsafe, it must know in advance at any time:
* resources currently available.
* resources currently allocated to each process.
* resources that will be required and released by these processes in the future.
One known algorithm that is used for deadlock avoidance is the Banker's algorithm, which requires resource usage limit to be known in advance. However, for many systems it is impossible to know in advance what every process will request. This means that deadlock avoidance is often impossible.

### Definition :

Process Synchronization means sharing system resources by processes in a such a way that, Concurrent access to shared data is handled thereby minimizing the chance of inconsistent data. Maintaining data consistency demands mechanisms to ensure synchronized execution of cooperating processes.

1) Critical Section Problem
A Critical Section is a code segment that accesses shared variables and has to be executed as an atomic action. It means that in a group of cooperating processes, at a given point of time, only one process must be executing its critical section. If any other process also wants to execute its critical section, it must wait until the first one finishes.
2) Solution to Critical Section
A solution to the critical section problem must satisfy the following three conditions :
a) Mutual exclusion
Out of a group of cooperating processes, only one process can be in its critical section at a given point of time.
b) Progress
If no process is in its critical section, and if one or more threads want to execute their critical section then any one of these threads must be allowed to get into its critical section.
c) Bounded Waiting
After a process makes a request for getting into its critical section, there is a limit for how many other processes can get into their critical section, before this process's request is granted. So after the limit is reached, system must grant the process permission to get into its critical section.
3) Semaphores
Semaphore is basically a synchronizing tool and is accessed only through two low standard atomic operations, wait and signal designated by P() and V() respectively.
The classical definition of wait and signal are :
Wait : decrement the value of its argument S as soon as it would become non-negative.
Signal : increment the value of its argument, S as an individual operation.

Properties of Semaphores
a. Simple
b. Works with many processes
c. Can have many different critical sections with different semaphores
d. Each critical section has unique access semaphores
e. Can permit multiple processes into the critical section at once, if desirable.

Types of Semaphores
Semaphores are mainly of two types:
a. Binary Semaphore
It is a special form of semaphore used for implementing mutual exclusion, hence it is often called Mutex. A binary semaphore is initialized to 1 and only takes the value 0 and 1 during execution of a program.
b. Counting Semaphores
These are used to implement bounded concurrency.

Limitations of semaphore
a. Priority Inversion is a big limitation os semaphores.
b. Their use is not enforced, but is by convention only.
b. With improper use, a process may block indefinitely. Such a situation is called Deadlock. We will be studying deadlocks in details in coming lessons.

### Definition :

Main Memory refers to a physical memory that is the internal memory to the computer. The word main is used to distinguish it from external mass storage devices such as disk drives. Main memory is also known as RAM. The computer is able to change only data that is in main memory. Therefore, every program we execute and every file we access must be copied from a storage device into main memory.
A certain part or routine of the program is loaded into the main memory only when it is called by the program, this mechanism is called Dynamic Loading, this enhance the performance.
At times one program is dependent on some other program. In such a case, rather than loading all the dependent programs, CPU links the dependent programs to the main executing program when its required. This mechanism is known as Dynamic Linking.

1. Swapping
A process needs to be in memory for execution. But sometimes there is not enough main memory to hold all the currently active processes in a timesharing system. So, excess process are kept on disk and brought in to run dynamically. Swapping is the process of bringing in each process in main memory, running it for a while and then putting it back to the disk.

2. Contiguous Memory Allocation
In contiguous memory allocation each process is contained in a single contiguous block of memory. Memory is divided into several fixed size partitions. Each partition contains exactly one process. When a partition is free, a process is selected from the input queue and loaded into it. The free blocks of memory are known as holes. The set of holes is searched to determine which hole is best to allocate.

3. Memory Protection
Memory protection is a phenomenon by which we control memory access rights on a computer. The main aim of it is to prevent a process from accessing memory that has not been allocated to it.

4. Memory Allocation
Memory allocation is a process by which computer programs are assigned memory or space. It is of three types :
a) First Fit -
The first hole that is big enough is allocated to program.
b) Best Fit -
The smallest hole that is big enough is allocated to program.
c) Worst Fit -
The largest hole that is big enough is allocated to program.

5. Fragmentation
Fragmentation occurs in a dynamic memory allocation system when most of the free blocks are too small to satisfy any request. It is generally termed as inability to use the available memory.
a) External Fragmentation.
In such situation processes are loaded and removed from the memory. As a result of this, free holes exists to satisfy a request but is non contiguous i.e. the memory is fragmented into large no. Of small holes. This phenomenon is known as External Fragmentation.
b) Internal fragmentation
Also, at times the physical memory is broken into fixed size blocks and memory is allocated in unit of block sizes. The memory allocated to a space may be slightly larger than the requested memory. The difference between allocated and required memory is known as Internal fragmentation i.e. the memory that is internal to a partition but is of no use.

6. Paging
A solution to fragmentation problem is Paging. Paging is a memory management mechanism that allows the physical address space of a process to be non-contagious. Here physical memory is divided into blocks of equal size called Pages. The pages belonging to a certain process are loaded into available memory frames.
Paging Performance

* Estimated memory access time : { memory access time = m}
EMAT = 2*m
* If TLB is used {TBL acess time = c , TLB hit ratio = x}
Estimated memory access time :
EMAT = x * [ c + m ] + ( 1 - x )[ c + 2m ]
* In Multi Level Paging
Estimated memory access time :
EMAT [2 level page table] = x * [ c + m ] + ( 1 - x )[ c + 3m ]
EMAT [N level page table] = x * [ c + m ] + ( 1 - x )[ c + ( n + 1 )m ]
* If TLB and CACHE is used
Estimated memory access time : {memory acess time using cache = mc}
EMAT = x * [ c + mc ] + ( 1 - x )[ c + 2m + mc ]

6. Segmentation
Segmentation is another memory management scheme that supports the user-view of memory. Segmentation allows breaking of the virtual address space of a single process into segments that may be placed in non-contiguous areas of physical memory.

7. Segmentation with paging
Both paging and segmentation have their advantages and disadvantages, it is better to combine these two schemes to improve on each. The combined scheme is known as 'Page the Elements'. Each segment in this scheme is divided into pages and each segment is maintained in a page table. So the logical address is divided into following 3 parts :
Segment numbers(S)
Page number (P)
The displacement or offset number (D)

### Definition :

Virtual Memory is a space where large programs can store themselves in form of pages while their execution and only the required pages or portions of processes are loaded into the main memory. This technique is useful as large virtual memory is provided for user programs when a very small physical memory is there.
* Large programs can be written, as virtual space available is huge compared to physical memory.
* Less I/O required, leads to faster and easy swapping of processes.
* More physical memory available, as programs are stored on virtual memory, so they occupy very less space on actual physical memory.

1. Demand Paging
The basic idea behind demand paging is that when a process is swapped in, its pages are not swapped in all at once. Rather they are swapped in only when the process needs them(On demand). This is termed as lazy swapper, although a pager is a more accurate term.
Initially only those pages are loaded which will be required the process immediately.
The pages that are not moved into the memory, are marked as invalid in the page table. For an invalid entry the rest of the table is empty. In case of pages that are loaded in the memory, they are marked as valid along with the information about where to find the swapped out page.

2. Page Fault
When the process requires any of the page that is not loaded into the memory, a page fault trap is triggered and following steps are followed.
a) The memory address which is requested by the process is first checked, to verify the request made by the process.
b) If its found to be invalid, the process is terminated.
c) In case the request by the process is valid, a free frame is located, possibly from a free-frame list, where the required page will be moved.
d) A new operation is scheduled to move the necessary page from disk to the specified memory location. ( This will usually block the process on an I/O wait, allowing some other process to use the CPU in the meantime. )
e) When the I/O operation is complete, the process's page table is updated with the new frame number, and the invalid bit is changed to valid.
f) The instruction that caused the page fault must now be restarted from the beginning.
There are cases when no pages are loaded into the memory initially, pages are only loaded when demanded by the process by generating page faults. This is called Pure Demand Paging.

*Page falut service time :
EMAT = p * PFST + ( 1 - p) * m
where p = miss ratio/Page faltu rate
PFST = page fault service time
m = memory access time
* Page fault service time when TLB is given :
EMAT = x *[ t + m ] + ( 1 - x )*[ t + p * PFST + ( 1 - p) * m ]
where x = TLB hit ration
t = TLB hit time
m = memory access time

3. Page Replacement
As studied in Demand Paging, only certain pages of a process are loaded initially into the memory. This allows us to get more number of processes into the memory at the same time. but what happens when a process requests for more pages and no free memory is available to bring them in.

4. Page Replacement Algorithms
Page replacement algorithms decide which memory pages to page out (swap out, write to disk) when a page of memory needs to be allocated.
a) FIFO
As new pages are requested and are swapped in, they are added to tail of a queue and the page which is at the head becomes the victim.
Its not an effective way of page replacement but can be used for small systems.
b) LRU Page Replacement
LRU works on the idea that pages that have been most heavily used in the past few instructions are most likely to be used heavily in the next few instructions too.
While LRU can provide near-optimal performance in theory , it is rather expensive to implement in practice.

5. Thrashing
A process that is spending more time paging than executing is said to be thrashing. In other words it means, that the process doesn't have enough frames to hold all the pages for its execution, so it is swapping pages in and out very frequently to keep executing. Sometimes, the pages which will be required in the near future have to be swapped out.
Initially when the CPU utilization is low, the process scheduling mechanism, to increase the level of multiprogramming loads multiple processes into the memory at the same time, allocating a limited amount of frames to each process. As the memory fills up, process starts to spend a lot of time for the required pages to be swapped in, again leading to low CPU utilization because most of the proccesses are waiting for pages. Hence the scheduler loads more processes to increase CPU utilization, as this continues at a point of time the complete system comes to a stop.

## Compilers

#### Overview

#### Lexer

#### Parser

#### Rest of the important topics

### Definition :

A compiler is a computer program (or set of programs) that transforms source code written in a programming language into another computer language.

1) Phases of Compiler
a) Lexical Analysis
The first phase of scanner works as a text scanner. This phase scans the source code as a stream of characters and converts it into meaningful lexemes. Lexical analyzer represents these lexemes in the form of tokens as:
< token-name, attribute-value >
b) Syntax Analysis
The next phase is called the syntax analysis or parsing. It takes the token produced by lexical analysis as input and generates a parse tree (or syntax tree). In this phase, token arrangements are checked against the source code grammar, i.e. the parser checks if the expression made by the tokens is syntactically correct.
c) Semantic Analysis
Semantic analysis checks whether the parse tree constructed follows the rules of language. For example, assignment of values is between compatible data types, and adding string to an integer. Also, the semantic analyzer keeps track of identifiers, their types and expressions; whether identifiers are declared before use or not etc. The semantic analyzer produces an annotated syntax tree as an output.
d) Intermediate Code Generation
After semantic analysis the compiler generates an intermediate code of the source code for the target machine. It represents a program for some abstract machine. It is in between the high-level language and the machine language. This intermediate code should be generated in such a way that it makes it easier to be translated into the target machine code.
e) Code Optimization
The next phase does code optimization of the intermediate code. Optimization can be assumed as something that removes unnecessary code lines, and arranges the sequence of statements in order to speed up the program execution without wasting resources (CPU, memory).
f) Code Generation
In this phase, the code generator takes the optimized representation of the intermediate code and maps it to the target machine language. The code generator translates the intermediate code into a sequence of (generally) re-locatable machine code. Sequence of instructions of machine code performs the task as the intermediate code would do.

### Definition :

The lexical analysis is the first phase of compiler. Its main task is to read the input characteristic and produce as output a sequence of tokens that the parser uses for syntax analysis.

Functions of Lexical Analyser
* Dividing program into token.
* Eliminating comment line.
* Eliminating white space character. Eg : Tab , Backslash , Newline
* It will help in giving error messages by providing row number , column number.

Tokens
A token is a set of characters.The following constructs are treated as tokens : keywords, operators, identifiers, constants, literal strings and puctuation symbols such as parenthesis ,commas and semicolons.

Pattern
A pattern is a rule describing the set of lexemes that can represent a particular token in source program.

Lexemes
A lexeme is a sequence of characters in the sequence of characters in the source program that is matched by the pattern for a token.
Tokenization
x = y + 12 ;
This string is passed to lexical analyser.

x - identifier - < id,1 >
= - operator
y - identifier - < id,2 >
+ - operator
12 - int constant
- delimiter

Longest Match Rule
When the lexical analyzer read the source-code, it scans the code letter by letter; and when it encounters a whitespace, operator symbol, or special symbols, it decides that a word is completed.

### Definition :

Syntax analysis or parsing is the second phase of a compiler.A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams. The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree.

Derivation
A derivation is basically a sequence of production rules, in order to get the input string.
To decide which non-terminal to be replaced with production rule, we can have two options.
Left-most Derivation
If the sentential form of an input is scanned and replaced from left to right, it is called left-most derivation.
Right-most Derivation
If we scan and replace the input with production rules, from right to left, it is known as right-most derivation.

Syntax tree
The tokenized string from Lexical analyser is passed to syntax analyser to form a syntax tree.

Limitation of Syntax Analyzers
* it cannot determine if a token is valid.
* it cannot determine if a token is declared before it is being used.
* it cannot determine if a token is initialized before it is being used.
* it cannot determine if an operation performed on a token type is valid or not.
These tasks are accomplished by the semantic analyzer,

### Definition :

Parsing or syntactic analysis is the process of analysing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar.
Types of Parser
There are 2 types of parser
a) Bottom Up Parser
b) Top Down Parser

Top Down Parser :
Top-down parser parses the input, and starts constructing a parse tree from the root node gradually moving down to the leaf nodes.

Bottom Up Parser :
Top-down parser parses the input, and starts constructing a parse tree from the leaf nodes gradually moving up to the root node.
Recursive descent parsing :
It is a common form of top-down parsing. It is called recursive as it uses recursive procedures to process the input. Recursive descent parsing suffers from backtracking.
Backtracking :
It means, if one derivation of a production fails, the syntax analyzer restarts the process using different rules of same production. This technique may process the input string more than once to determine the right production

Operator Precedence Parser :
An operator precedence parser is a bottom-up parser that interprets an operator-precedence grammar.
There should be no NULL and no 2 adjacent variables on RHS.

Relations between parsing algorithms :
LR(0) ⊂ SLR ⊂ LALR ⊂ LR(1)
LL(1) ⊂ LR(1)

### Definition :

Semantics of a language provide meaning to its constructs, like tokens and syntax structure. Semantics help interpret symbols, their types, and their relations with each other.

Semantic Errors
* Type mismatch
* Undeclared variable
* Reserved identifier misuse.
* Multiple declaration of variable in a scope.
* Accessing an out of scope variable.
* Actual and formal parameter mismatch.

### Definition :

By runtime, we mean a program in execution. Runtime environment is a state of the target machine, which may include software libraries, environment variables, etc., to provide services to the processes running in the system.

Activation Record
Activation Record are also called frames.It is Information(memory) needed by a single execution of a procedure.
A general activation record is :
We assume that the program control flows in a sequential manner and when a procedure is called, its control is transferred to the called procedure. When a called procedure is executed, it returns the control back to the caller. This type of control flow makes it easier to represent a series of activations in the form of a tree, known as the activation tree.

Storage Allocation
It id divided into following :
a) Static Storage Allocation
* A memory created at compiler time is created only ONCE in the STATIC area.
* Recursion is not supported - Even if we try but we do not know the number of times the function is evaluated during run time.
* Binding done at COMPILE time and cannot change at RUN time.
b) Stack Storage Allocation
* When a function is called the activation record is created and pushed onto the stack.
* When the function is completed the records are popped.
* Therefore Recursion is supported.
c) Heap Storage Allocation
* It is a DYNAMIC data structure allocation technique.
* Recursion is also supported.
* Allocation and deallocation can be done using different keywords in different languages such as new delete etc.

Parameter Passing
Some of the common parameter-passing variants supported by various programming languages are:
a)Pass by value :
This is the only mechanism supported by C and Decaf. Values of parameters are copied into called routine. Given the routine has its own copies, changes made to those values are not seen back in the calling function.
b) Pass by reference :
No copying is done, but a reference (usually implemented as a pointer) is given to the value. In addition to allowing the called routine to change the values, it is also efficient means for passing large variables.
c) Pass by name :
This rather unusual mechanism acts somewhat like C preprocessor macros and was introduced in Algol. Rather than evaluating the parameter value, the name or expression is actually substituted into the calling sequence and each access to the parameter in the calling body re-evaluates it. This is not particularly efficient.
d) Copy-restore :
A hybrid between call-by-value and call-by-reference.The actual parameters are evaluated and its r-values are passed to the called procedure as in call-by-value.When the control returns, the r-value of the formal parameters are copied back into the l-value of the actuals.

### Representation of Intermediate code :

THREE-ADDRESS CODES are a form of IR similar to assembler for an imaginary machine. Each three address code instruction has the form
x := y op z
Complicated arithmetic expressions are represented as a sequence of 3-address statements, using temporaries for intermediate values.
For instance the expression x + y + z becomes
t1 := y + z
t2 := x + t1
* Three address code is a linearized representation of a syntax tree, where the names of the temporaries correspond to the nodes.
* The use of names for intermediate values allows three-address code to be easily rearranged which is convenient for optimization.
* The reason for the term three-address code is that each statement usually contain three addresses, two for the operands and one for the result.

### Definition :

Optimization is a program transformation technique, which tries to improve the code by making it consume less resources and deliver high speed.

## Digital Logic

### Definition :

Boolean algebra is the branch of algebra in which the values of the variables are the truth values true and false, usually denoted 1 and 0 respectively. Instead of elementary algebra where the values of the variables are numbers, and the main operations are addition and multiplication, the main operations of Boolean algebra are the conjunction and, denoted ∧, the disjunction or, denoted ∨, and the negation not, denoted ¬. It is thus a formalism for describing logical relations in the same way that ordinary algebra describes numeric relations.

### Laws of Boolean Algebra :

1) Commutative Law
(a) A + B = B + A
(b) A B = B A
2) Associative Law
(a) (A + B) + C = A + (B + C)
(b) (A B) C = A (B C)
3) Distributive Law
(a) A (B + C) = A B + A C
(b) A + (B C) = (A + B) (A + C)
4) Identity Law
(a) A + A = A
(b) A A = A
5)
(a) AB + AB` = A
(b) (A + B)(A + B`) = A
6) Redundance Law
(a) A + A B = A
(b) A (A + B) = A
7)
(a) 0 + A = A
(b) 0 A = 0
8)
(a) 1 + A = 1
(b) 1 A = A
9)
(a) A` + A = 1
(b) A`A = 0
10)
(a) A + A`B = A + B
(b) A(A` + B) = AB
11) De Morgan's Law
(a) ( A + B )` = A` B`
(b) (A B)` = A` + B`

1) OR
2) AND
3) NOT
4) NOR
5) NAND
6) XOR
7)EX-OR
8) SUMMARY

### Definition :

In digital electronics how information is represented is key and there are different radices, i.e. number bases, that a numbering system can use.

### Bases :

Number of Base 2 - BINARY
Binary has only two values 0 and 1. If larger values than 1 are needed, extra columns are added to the left. Each column value is now twice the value of the column to its right.
(4)10 = (100)2
Number of Base 10 - DECIMAL
Decimal has ten values 0 to 9. If larger values than 9 are needed, extra columns are added to the left. Each column value is ten times the value of the column to its right.
(1011)2 = (11)10
Number of Base 8 - OCTAL
Octal has eight values 0 to 7. If larger values than 7 are needed, extra columns are added to the left. Each column value is now 8 times the value of the column to its right.
(100)2 = (64)8
Number of Base 16 - HEXADECIMAL
Hexadecimal has sixteen values 0 to 15, but to keep all these values in a single column, the 16 values (0 to 15) are written as 0 to F, using the letters A to F to represent numbers 10 to 15.
(100)2 = (256)16

### Complement :

In digital system complement is used to find subtraction of number base system.If R be the base of a number system then that number system can have two complements respectively R’s and (R-1)’s complement.
R's Complement
Let 'N' is a number and R is its base where R>1 and in N, 'n' is the number of digits before its decimal point then we can write
R's Complement = (Rn)10-N
R-1's Complement
Let 'N' is a number and R is its base where R>1 and in N, 'n' is the number of digits before its decimal point then we can write
R-1's Complement = {(Rn)10 - 1 } - N

### Definition :

A combinational circuit consists of an interconnection of logic gates. Combinational logic gates react to the values of the signals at their inputs and produce the value of the output signal, transforming binary information from the given input data to a required output data.
A combinational circuit that performs the addition of two bits is called a half adder .This circuit has two outputs carry and sum.
S = x`y+ xy`
C = xy
A combinational circuit that performs the addition of three bits (two significant bits and a previous carry) is a full adder .
S = x`y`z + x`yz` + xy`z` + xyz
C = xy + xz + yz
Decoder
A decoder is a combinational circuit that converts binary information from n input lines to a maximum of 2n unique output lines.
Encoder
An encoder is a digital circuit that performs the inverse operation of a decoder. An encoder has 2n (or fewer) input lines and n output lines.
Priority Encoder
A priority encoder is an encoder circuit that includes the priority function. The operation of the priority encoder is such that if two or more inputs are equal to 1 at the same time, the input having the highest priority will take precedence.
Multiplexer
A multiplexer is a combinational circuit that selects binary information from one of many input lines and directs it to a single output line. The selection of a particular input line is controlled by a set of selection lines. Normally, there are 2n input lines and n selection lines whose bit combinations determine which input is selected.
Demultiplexer
A demultiplexer performs the reverse operation of a multiplexer i.e. it receives one input and distributes it over several outputs. It has only one input, n outputs, m select input. At a time only one output line is selected by the select lines and the input is transmitted to the selected output line.

### Definition :

Sequential Circuit consists of a combinational circuit to which storage elements are connected to form a feedback path. The storage elements are devices capable of storing binary information.

### Latches :

Storage elements that operate with signal levels (rather than signal transitions) are referred to as latches.

SR - Latch
The SR latch is a circuit with two cross-coupled NOR gates or two cross-coupled NAND gates, and two inputs labeled S for set and R for reset.

D - Latch
One way to eliminate the undesirable condition of the indeterminate state in the SR latch is to ensure that inputs S and R are never equal to 1 at the same time. This is done in the D latch.This latch has only two inputs: D (data) and En(enable).
The D latch receives that designation from its ability to hold data in its internal storage. It is suited for use as a temporary storage for binary information between a unit and its environment.

### Flip Flops:

Flip flop is a sequential circuit which generally samples its inputs and changes its outputs only at particular instants of time and not continuously. Flip flop is said to be edge sensitive or edge triggered rather than being level triggered like latches.

D - Flip Flop
This is a D flip-flop with two D latches and an inverter.The first latch is called the master and the second the slave.
The value that is produced at the output of the flip-flop is the value that was stored in the master stage immediately before the negative edge occurred .

T - Flip Flop
When the clock triggers, the value remembered by the flip-flop either toggles or remains the same depending on whether the T input (Toggle) is 1 or 0.

JK - Flip Flop
When the clock triggers, the value remembered by the flip-flop toggles if the J and K inputs are both 1, remains the same if they are both 0; if they are different, then the value becomes 1 if the J (Jump) input is 1 and 0 if the K (Kill) input is 1.

### Characteristic Equation & Characteristic Truth Table of Flip Flops:

D - Flip Flop
Characteristic Equation Of D Flip Flop is :
Q(t + 1) = D

T - Flip Flop
Characteristic Equation Of T Flip Flop is :
Q(t + 1) = T ⊕ Q = TQ' + T'Q

JK - Flip Flop
Characteristic Equation Of JK Flip Flop is :
Q(t + 1) = JQ`+ K`Q

### Definition :

A counter is a digital sequential logic device that will go through a certain predefined states (for example counting up or down) based on the application of the input pulses. They are utilized in almost all computers and digital electronics systems

Types of Counter
* Asynchronous (ripple) counter - Changing state bits are used as clocks to subsequent state flip-flops
* Synchronous counter - All state bits change under control of a single clock
* Decade counter - Counts through ten states per stage
* Up/down counter - Counts both up and down, under command of a control input
* Ring counter - Formed by a shift register with feedback connection in a ring
* Johnson counter - A twisted ring counter

## Computer Networks

#### Overview

#### Delays and flow control protocols

#### Access Control

### Definition :

In a computer network, each layer may perform one or more of the following generic set of tasks:
Error control - which makes the logical channel between the layers in two peer network elements more reliable.
Flow control - which avoids overwhelming a slower peer with PDUs.
Segmentation and Reassembly - which at the transmitting side divides large data chunks into smaller pieces; and at the receiving side reassembles the smaller pieces into the original large chunk.
Multiplexing - which allows several higher-level sessions to share a single lower-level connection.
Connection setup - which provides the handshaking with a peer.

### Physical Layer :

Responsible for movement of individual bits from one terminal to the next.
Applications & Responsibilities :
* Synchronization of bits and transmission mode {simplex,half or full duplex}.
* Start/End flag
* Manchester Encoding
* Define pin outs in a connector used to attach a network layer.
* Maintaining connection semantics b/w 2 directly connected nodes.
* Mechanical & electrical specification of interface.
* Transmission rate.
* Synchronization & Line configuration.
* Topology.

Monitoring frames from one hop to another.
Frames - Collection of bits packaged according to size of medium.
Applications & Responsibilities :
* Medium access control {shared medium without collision}.
* Error detection & correction.
* Flow control {synchronization otherwise buffer overflow}.
* Reliable transport of data over physical point to point link.
* Recovering lost packets between two directly connected nodes.
* Arbitrating between multiple nodes attached to single medium.
* Generating error correction codes for packet error correction.
* Point to point.

### Network Layer :

Applications & Responsibilities :
* Source to destination delivery of packet.
* Finding shortcut path between 2 nodes separeted by multiple nodes.
* Route data from 1 node to the next.
* BGP - border gateway protocol.

### Transport Layer :

Process to process delivery of entire message.
Applications & Responsibilities :
* Segmentation & reassembly.
* End to end flow control.
* Process to process error control.
* Reliable.
* TCP - Reliable , Connection oriented.
* UDP - Unreliable , Connectionless.
* Recovering lost packet between nodes separated by multiple nodes.

### Session Layer :

It is a network dialog controller & it establish , maintain and synchronize interaction among communication devices.
Applications & Responsibilities :
* Dialog controller - Allow 2 system to enter into dialog.Communication between 2 take place in half or full duplex.
* Synchronization by adding checkpoints in stream of data.

### Presentation Layer :

Synatx and semantic of information exchanged between 2 systems.
Applications & Responsibilities :
* Translation into bit stream before transmission .
* Encryption to ensure privacy.
* Compression reduces number of bits contained in the info.
* Translation between EBCDIC to ASCII
* Translating between text message from one language to another

### Application Layer :

It enable user to access internet , provide user interface and supports HTTP , POP3, MIME, SMTP, TELNET , DNS, etc.
Applications & Responsibilities :
* Network virtual terminal - allow to log on to remote host.
* File transfer access and management.
* Mail service.
* Directory service - provide distributed DB source access for global info.
* The PDU(Protocol Data Unit) in internet stack is called message.

### Definition :

Physical layer in the OSI model plays the role of interacting with actual hardware and signaling mechanism. Physical layer is the only layer of OSI network model which actually deals with the physical connectivity of two different stations.

### Switching :

Switching :
Switching is a mechanism by which data/information sent from source towards destination which are not directly connected. Networks have interconnecting devices, which receives data from directly connected sources, stores data, analyze it and then forwards to the next interconnecting device closest to the destination.
Circuit switching
Circuit switching network establishes a fixed bandwidth circuit (channel) between nodes before the users may communicate, as if the nodes were physically connected with an electrical circuit. The bit delay is constant during the connection,
Packet switching
Packet switching is a communications paradigm in which packets (discrete blocks of data) are routed between nodes over data links shared with other traffic. The term "packets" refers to the fact that the data stream from your computer is broken up into packets of about 200 bytes (on average), which are then sent out onto the network. Each packet contains a "header" with information necessary for routing the packet from source to destination. Each packet in a data stream is independent.
Message switching
Message switching was the precursor of packet switching, where messages were routed in their entirety, one hop at a time. Nowadays, message switching systems are mostly implemented over packet-switched or circuit-switched data networks. E-mail is example of a message switching system.

Delay in Switching
There are four major types of delays on each node of a packet-switched network:
1) Processing Delay : When a packet reaches a router, the router reads the header, locates its final destination, and decides which outbound link to send it on. It also may do some transmission error checking. These account for the processing delay.
2) Queueing Delay : Most routers utilize a first-come-first-serve queue for packet traffic. If traffic on the router is busy, then the packet will have to wait in a queue for its turn to be transmitted by the router. This accounts for the queueing delay.
3) Transmission Delay : The amount of time it takes a router to push out the next packet on to the link is the transmission delay. This delay is a function of the size of the packet and the transmission rate of the link.
ttrans = Packet size / Bandwidth = L/R
4) Propagation Delay : The amount of time it takes to propagate the packet from the beginning of the link to the next link is the propagation delay. It is a function of the length of the link and the speed of the link.
tpd = distance b/w 2 hops / speed = d/s

If message is broken into n parts :
total delay = (no of packets + routers)*ttrans + (router + 1)*tpd

Definition :
the data link layer takes the packets it gets from the network layer and encapsulates them into frames for transmission.
1) FRAMING
Data link layer is responsible for making physical link reliable and, to do so, it breaks up network layer data stream into small blocks, a process called segmentation, and adds header and frame flag to each block to form a frame, a process called encapsulation.
2) Error Correction & Detection
At the receiver, data contained in frame is an arbitrary bit string. This means that the receiver is not allowed to know which bit strings correspond to “legal” data and which don’t (in accordance with layered encapsulation principles) ∴ Therefore some additional bits must be added to the frame to allow errors to be detected(and possibly corrected)

a) Error-Correcting Codes
Hamming distance :
* The number of bit positions in which two code words differ is called the Hamming distance (d).
* The minimum Hamming distance (or “minimum distance”) of the scheme as the smallest number of bit errors that changes one valid codeword into another
* If the minimum distance of an error-handling scheme is D, this scheme can detect any combination of = D-1 bit errors and correct any combination of strictly less than D/2 bit errors
Hamming Code :
Hamming code is used for error correction and detection. In Hamming codes the bits of the codeword are numbered consecutively, starting with bit 1 at the left end, bit 2 to its immediate right, and so on. The bits that are powers of 2 (1, 2, 4, 8, 16, etc.) are parity bits. The rest (3, 5, 6, 7, 9, etc.) are filled up with the m data bits.

b) Error-Detecting Codes
Parity bit
Checksums
Cyclic Redundancy Checks (CRCs)

#### Data Link Layer - Medium Access Control

Medium Access Control :
The Media Access Control layer is one of two sublayers of the Data Link Control layer and is concerned with sharing the physical connection to the network among several computers. Each computer has its own unique MAC address. Ethernet is an example of a protocol that works at the Media Access Control layer level.

a) Aloha
It is a random access method.
i) Pure Aloha -
* Just transit when you have a frame.
* If collision wait for a random time and retransmit.
Problem :
* Aloha result in too many collision.
* It is used in the place where less number of stations are there.
Contention Period :
Range of time during which collision can take place.
CP = 2 * frame transmission time.
Poisson's Distribution :
Let frame are generated using this:
Pr[k] = Gk e -G / K!
where G = average frame / frame time / Load
Pr[k] = Probability k frames generated in a frame time.
Throughput
S = Ge-2G
Smax = 0.184 & G = 1/2

ii) Slotted Aloha
*In it time is divided into slots.
* Stations can transmit only in the beginning.
* Contention period is 1 slot.
Throughput
S = Ge-G
Smax = 1/e = 0.37 & G = 1

NOTE : Slotted alloha Smax = 2 * Smax of Pure Aloha

Carrier Sensed Multiple Access (CSMA)
CSMA is a network access method used on shared network topologies such as Ethernet to control access to the network. Devices attached to the network cable listen (carrier sense) before transmitting. If the channel is in use, devices wait before transmitting. MA (Multiple Access) indicates that many devices can connect to and share the same network. All devices have equal access to use the network when it is clear. Even though devices attempt to sense whether the network is in use, there is a good chance that two stations will attempt to access it at the same time. On large networks, the transmission time between one end of the cable and another is enough that one station may access the cable even though another has already just accessed it. There are two methods for avoiding these so-called collisions, listed here :
i) CSMA/CD (Carrier Sense Multiple Access/Collision Detection)
ii) CSMA/CA (Carrier Sense Multiple Access/Collision Avoidance)

Data-link layer is responsible for implementation of point-to-point flow and error control mechanism.

### Flow Control :

When a data frame (Layer-2 data) is sent from one host to another over a single medium, it is required that the sender and receiver should work at the same speed. That is, sender sends at a speed on which the receiver can process and accept the data. What if the speed (hardware/software) of the sender or receiver differs? If sender is sending too fast the receiver may be overloaded, (swamped) and data may be lost.
Two types of mechanisms can be deployed to control the flow:
Stop and Wait
This flow control mechanism forces the sender after transmitting a data frame to stop and wait until the acknowledgement of the data-frame sent is received. Stop and Wait
Sliding Window
In this flow control mechanism, both sender and receiver agree on the number of data-frames after which the acknowledgement should be sent. As we learnt, stop and wait flow control mechanism wastes resources, this protocol tries to make use of underlying resources as much as possible.

### Error Control :

When data-frame is transmitted, there is a probability that data-frame may be lost in the transit or it is received corrupted. In both cases, the receiver does not receive the correct data-frame and sender does not know anything about any loss.In such case, both sender and receiver are equipped with some protocols which helps them to detect transit errors such as loss of data-frame. Hence, either the sender retransmits the data-frame or the receiver may request to resend the previous data-frame.
Requirements for error control mechanism:
Error detection - The sender and receiver, either both or any, must ascertain that there is some error in the transit.
Positive ACK - When the receiver receives a correct frame, it should acknowledge it.
Negative ACK - When the receiver receives a damaged frame or a duplicate frame, it sends a NACK back to the sender and the sender must retransmit the correct frame.
Retransmission - The sender maintains a clock and sets a timeout period. If an acknowledgement of a data-frame previously transmitted does not arrive before the timeout the sender retransmits the frame, thinking that the frame or it’s acknowledgement is lost in transit.
There are three types of techniques available which Data-link layer may deploy to control the errors by Automatic Repeat Requests (ARQ):

Stop-and-wait ARQ
The following transition may occur in Stop-and-Wait ARQ:
* The sender maintains a timeout counter.
* When a frame is sent, the sender starts the timeout counter.
* If acknowledgement of frame comes in time, the sender transmits the next frame in queue.
* If acknowledgement does not come in time, the sender assumes that either the frame or its acknowledgement is lost in transit. Sender retransmits the frame and starts the timeout counter.
* If a negative acknowledgement is received, the sender retransmits the frame.

Utilization
i) u = 1/(1 + 2a)
where a = tpd / ttrans
ii) For noisy Channel → u = (1 - p)/ (1 + 2a) * 100%
where p = probability of single frame in error.
Throughput
i) = ( 1 / ttrans + 2 * tpd ) frames/sec or
ii) = ( L / ttrans + 2 * tpd) bits/sec or
iii) = R/ (1 + 2a ) bits/sec
where R = bandwidth & a = tpd / ttrans

Go-Back-N ARQ
Stop and wait ARQ mechanism does not utilize the resources at their best.When the acknowledgement is received, the sender sits idle and does nothing. In Go-Back-N ARQ method, both sender and receiver maintain a window.
The sending-window size enables the sender to send multiple frames without receiving the acknowledgement of the previous ones. The receiving-window enables the receiver to receive multiple frames and acknowledge them. The receiver keeps track of incoming frame’s sequence number. When the sender sends all the frames in window, it checks up to what sequence number it has received positive acknowledgement. If all frames are positively acknowledged, the sender sends next set of frames. If sender finds that it has received NACK or has not receive any ACK for a particular frame, it retransmits all the frames after which it does not receive any positive ACK.
Utilization
i) u = w/(1 + 2a)
where a = tpd / ttrans & w = window size.
ii) If w ≤ 1 + 2a → u = w / (1 + 2a)
iii) If w ≥ 1 + 2a → u = 100%
Throughput

i) If w ≤ 1 + 2a → w / ttrans + 2 * tpd or
ii) = w * R / (1 + 2a)
iii) If w ≥ 1 + 2a → 1/ttransframes/sec

Selective Repeat ARQ
In Go-back-N ARQ, it is assumed that the receiver does not have any buffer space for its window size and has to process each frame as it comes. This enforces the sender to retransmit all the frames which are not acknowledged.
In Selective-Repeat ARQ, the receiver while keeping track of sequence numbers, buffers the frames in memory and sends NACK for only frame which is missing or damaged.
The sender in this case, sends only packet for which NACK is received.

### Definition :

The network layer is concerned with getting packets from the source all the way to the destination. The packets may require to make many hops at the intermediate routers while reaching the destination. This is the lowest layer that deals with end to end transmission.

### Routing protocol :

1) Distance Vector Routing
Each node maintains a vector of minimum distance to every node.
a) Initialization: Every node know only distance between itself and immediate neighbour.

b) Sharing: If we share between A and B then A comes to know about C.

* Each node shares routing tablewith its immediate neighbour.
* At any particular time period router update routing tabel & informs its neighbour abut the change.

Problem with DVR :

(a) Count to infinity
If any link is down , it keeps on updating the table upto infinity.
(b) Too many iterations to stabilize
In a larger network a lot of data traffic.

* It is a global and dynamic routing algorithm.
* Each node broadcasts information about its neighbour to the entire network.
* Link State Routing uses Dijkastra's Algorithms to find the shortest path.
* Age of packet is specified so that only recent packets should update not the old ones.
* And the packets are dropped after completing a fixed length path.
* These packets are known as LSP packers which contains the complete information about the network.
* If LSP with less recent data is received , new LSP is updated and send back only over to the node from where it came from.
* If LSP is not in databse , data is stored and forwarded to all ecept the sorce.

Problem with LSR :
Too much network traffic due to broadcasting.

#### Network Layer - IP

IP addresses are of 4 bytes and consist of :

The IP specifications divide addresses into the following classes :

(i) Limited Broadcast : It consists of all 1's, i.e., the address is 255.255.255.255 . It is used only on the LAN, and not for any external network.
(ii)Directed Broadcast : It consists of the network number + all other bits as 1's. It reaches the router corresponding to the network number, and from there it broadcasts to all the nodes in the network.
b) Network ID = 0
It means we are referring to this network and for local broadcast we make the host ID zero.
c) Host ID = 0
This is used to refer to the entire network in the routing table.
Here we have addresses of the type 127.x.y.z It goes down way upto the IP layer and comes back to the application layer on the same host. This is used to test network applications before they are used commercially.

Subnetting
Subnetting divides larger network into small networks.Subnetting depends on the class of the network address.Subnetting improves the performance of the network.IP address is 32 bit long,in that one part of the address is used to indicate the net id and other part is used to indicate the host id.
Supernetting
.Supernetting is also known as classless addressing.It is complementary to subnet addressing.Two or more networks can be combined into a single IP network.A group of routers is combined into a single router using supernetting.This reduces entries in routing table that increases the speed.

### Packet Structure :

Version :
IPV4 / IPV6
Header will always be a multiple of 4bytes and so we can have a maximum length of the field as 15, so the maximum size of the header is 60 bytes ( 20 bytes are mandatory ).
Types of service :
This helps the router in taking the right routing decisions.
Total Length :
It includes the IP header and everything that comes after it.
Identification :
The source and ID field together will represent the fragments of a unique packet. So each fragment will have a different ID.
Offset :
It is a 13 bit field that represents where in the packet, the current fragment starts. Each bit represents 8 bytes of the packet. So the packet size can be at most 64 kB.
TTL :
Using this field, we can set the time within which the packet should be delivered or else destroyed.
Protocol :
This specifies the module to which we should hand over the packet ( UDP or TCP ).
This is the usual checksum field used to detect errors. Since the TTL field is changing at every router so the header checksum ( upto the options field ) is checked and recalculated at every router.
Source :
It is the IP address of the source node
Destination :
It is the IP address of the destination node.
IP Options :
The options field was created in order to allow features to be added into IP as time passes and requirements change. Currently 5 options are specified although not all routers support them.

### Definition :

Transport layer helps in process to process communication.

UDP - User Datagram Protocol
The user datagram protocol (UDP), is an unreliable transport protocol with no sessions or flow control and optional error checking. UDP just sends packets as soon as requested and forgets about them. It is faster than TCP.
Applications
* Video chat and streaming.
* Domain name server.
* Remote procedure call.
Source Port : Source Port identifies the UDP process which sent the segment
Destination Port : Destination Port identifies the UDP process which will handle the application data
UDP packet length : Packet length specifies the length of the segment, including the header, in bytes
UDP checksum : Checksum is optional.

TCP - Transmission Control Protocol
The transmission control protocol (TCP) is used for applications in which reliable connections between hosts are necessary.
* Connection oriented.
* TCP uses IP layer for delivering packets.
* Reliable , Flow control , Error control , Retransmission.
* Byte to Byte stream not packet to packet.
Source Port : Source Port identifies the UDP process which sent the segment
Destination Port : Destination Port identifies the UDP process which will handle the application data
Header length : Header length (4 bits) is the header length in 32-bit words the flag field contains 6 1-bit flags
Reciever window : Receive window identifies how much buffer space is available for incoming data.

TCP flags :
URG flag indicates that the sender has marked some data as urgent in this case, the Urgent data pointer contains an offset into the TCP data stream marking the last byte of urgent data.
ACK flag indicates that the acknowledgement number field is valid (i.e. the segment is an acknowledgement).
PSH flag indicates that should be delivered immediately (PUSHed) and not buffered.
RST flag is used to reset a connection, i.e. a confused or refused connection.
SYN flag is used to establish a connection.
FIN flag is used to terminate a connection.

### TCP Flow Control :

TCP Connection Establishment
Connections are established by means of a three-way handshake :
1) The sender sends a TCP segment with the SYN flag ON.
2) The recipient sends a segment with both SYN and ACK flags ON.
3) The sender replies with ACK.

Connection Establishment

* Sender opens the connection.
* Receiver accepts the connect request by sending a TCP segment which acknowledges host 1's request & sets acknowledgement number to initial sequence number + 1.
* Sender acknowledges this request
* SYN flag "consumes" one byte of sequence space

Connection Establishment

* A three-way handshake is also used to terminate a connection.
* Sender terminates the connection by transmitting a segment with the FIN flag set containing optional data.
* Receiver acknowledges this and sets its own FIN flag.
* The third and last segment contains Sender's acknowledgement of Receiver's FIN flag.

### TCP Congetion Control :

It contains the following 3 phases :
i) Slow start algo
ii) Congestion avoidance
iii) Congestion Detection.

Slow start / Exponential Algorithm
* Slow-start algorithm is part of the congestion control in TCP, designed to avoid sending more data than the network is capable of transmitting.
* Slow-start algorithm works by increasing the TCP Window by one segment for each acknowledged segment. This behavior effectively doubles the TCP Window size each round trip of the network.
* The algorithm continues increasing until this "congestion window" (cwnd) reaches the threshold.

* In congestion avoidance algorithm increase of sender window is based on RTT(Round trip time).
* In this algorithm sender window increases linearlly.
* If data is lost & after 3 duplicate ACKs if it is accepted it is treated as weak possibility of congestion.
* If data is lost continuously until global timer expires it is treated as strong possibility of congestion.

Congestion Detection / Multiplicative Decrease
* Weak Possibility
Sender window = 1/2 * present window
Threshold = 1/2 * present window size
Then start CONGESTION AVOIDANCE
* Strong Possibility
Sender window = 1 MSS
Threshold = 1/2 * present window size
Then start SLOW START ALGORITHM

### Definition :

Application Layer is responsible or communication between process over systems.

Client-server architecture
Server :
* always-on host     * permanent IP address     * server farms for scaling.
Clients :
* communicate with server     * may be intermittently connected     * may have dynamic IP addresses     * do not communicate directly with each other

Pure P2P architecture
* no always-on server
* arbitrary end systems directly communicate
* peers are intermittently connected and change IP addresses
* example: Gnutella

Different protocols supported by Application Layer :

HTTP
HTTP is “stateless” protocol server maintains no information about past client requests
Procedure :
* The client initiates TCP connection (creates socket) to server, port 80.
* The server accepts TCP connection from client.
* HTTP messages exchanged between browser (HTTP client) and Web server (HTTP server).
* TCP connection closed.

HTTP connections

Non Persistent HTTP
At most one object is sent over a TCP connection.
Persistent HTTP
Multiple objects can be sent over single TCP connection between client and server.

FTP : File Transfer Protocol
* FTP is used to transfer file to/from remote host.
* It has client/server model
- client: side that initiates transfer (either to/from remote)
- server: remote host

FTP: separate control, data connections

Procedure :
* FTP client contacts FTP server at port 21, specifying TCP as transport protocol.
* Client obtains authorization over control connection.
* Client browses remote directory by sending commands over control connection.
* When server receives a command for a file transfer, the server opens a TCP data connection to client.
* After transferring one file, server closes connection.
* Server opens a second TCP data connection to transfer another file.
* Control connection: “out of band”
* FTP server maintains “state”: current directory, earlier authentication.

Electronic Mail
Three major components:
* user agents
* mail servers
* simple mail transfer protocol: SMTP
User Agent
* It is also known as “mail reader”
* composing, editing, reading mail messages
* e.g. Outlook, elm, Netscape Messenger
* Outgoing, incoming messages stored on server
User Agent
* mailbox contains incoming messages for user
* message queue of outgoing (to be sent) mail messages
* SMTP protocol between mail servers to send email messages

Electronic Mail: SMTP
* SMTP uses TCP to reliably transfer email message from client to server, port 25
* Three phases of transfer (a) handshaking (greeting) (b) transfer of messages (c) closure
* SMTP uses persistent connections & requires message (header & body) to be in 7-bit ASCII

Mail access protocols :
* SMTP: delivery/storage to receiver’s server
* Mail access protocol: retrieval from server
(b) IMAP: Internet Mail Access Protocol - more features & manipulation of stored message on server.
(c) HTTP: Hotmail , Yahoo! Mail, etc.

POP3 protocol :
* It is used to download mails that can be viewed on local machine using software such as Outlook, Thunderbird etc.

IMAP :
* Keep all messages in one place: the server
* Allows user to organize messages in folders
* IMAP keeps user state across sessions.

DNS: Domain Name System
DNS is used for Hostname to IP address translation, Host aliasing - Canonical and alias names, Mail server aliasing, Load distribution - replicated Web servers: set of IP addresses for one canonical name

Distributed, Hierarchical Database
Procedure :
* Client queries a root server to find com DNS server.
* Client queries com DNS server to get prepjunkie.com DNS erver.
* Client queries amazon.com DNS server to get IP address for www.prepjunkie.com.
DNS: Root name servers
Root name servers are contacted by local name server that can not resolve name
Root name servers :
* contacts authoritative name server if name mapping not known
* Gets mapping
* Returns mapping to local name server
* There are 13 root name servers worldwide
TLD and Authoritative Servers
Top-level domain (TLD) servers: responsible for com, org, net, edu, etc, and all top-level country domains uk, fr, ca, jp.
* Network solutions maintains servers for com TLD
* Educause for edu TLD
Authoritative DNS servers: organization’s DNS servers, providing authoritative hostname to IP mappings for organization’s servers
* Can be maintained by organization or service provider
Local Name Server
It does not strictly belong to hierarchy.
Each ISP (residential ISP, company, university)has one Also called “default name server”
When a host makes a DNS query, query is sent to its local DNS server

## DBMS

### Definition :

Database is a collection of data. DBMS is a software used to manage DB in efficient way.

### Flat File System

A flat file database is a type of database that stores data in a single table. Flat file databases are generally in plain-text form, where each line holds only one record. The fields in the record are separated using delimiters such as tabs and commas.

1) Data Security : The data stored in the flat files can be easily accessible and hence it is not secure.
2) Data Redundancy : In this storage model, the same information may get duplicated in two or more files. This may lead to to higher storage and access cost. it also may lead to data inconsistency.
3) Data Isolation : Data Isolation means that all the related data is not available in one file. Usually the data is scattered in various files having different formats. Hence writing new application programs to retrieve the appropriate data is difficult.
4) Lack of Flexibility : If we need unanticipated data, huge programming effort is needed to make the information available.
5) Concurrent Access Anomalies : Many traditional systems allow multiple users to access and update the same piece of data simultaneously & this concurrent updates may result in inconsistent data.
These difficulties lead to the development of database systems.

### Database Management System

Using a DBMS to manage data has many advantages:

1) Efficient Data Access
2) Data Integrity and Security
3) Concurrent Access and Crash Recovery
4) Reduced Application Development Time

### Levels of Abstraction

Physical level : Describes how a record is stored
Logical level : Describes data stored in database, and the relationships among the data
View level : A way to hide details of data types and information for security purposes.

### Relational RDBMS

In RDBMS data must be stored in table format.
* Arity : Number of columns of databse table.
* Cardinality : Number of records of database table
* Relational Schema : Definition of database table
* Relational Instance : Set of record / Record set of database table

KEYS in DBMS
In RDBMS keys are used to establish and identify relation between tables.
Super Key
Super Key is defined as a set of attributes within a table that uniquely identifies each record within a table. Super Key is a superset of Candidate key
Candidate Key
Candidate keys are defined as the set of fields from which primary key can be selected. It is an attribute or set of attribute that can act as a primary key for a table to uniquely identify each record in that table.
Primary Key
Primary key is a candidate key that is most appropriate to become main key of the table. It is a key that uniquely identify each record in a table.
Foreign Key
Foreign key is a field in one table that uniquely identifies a row of another table.
Composite Key
Key that consist of two or more attributes that uniquely identify an entity occurance is called Composite key. But any attribute that makes up the Composite key is not a simple key in its own.
Secondary or Alternative key
The candidate key which are not selected for primary key are known as secondary keys or alternative keys
Non-key Attribute
Non-key attributes are attributes other than candidate key attributes in a table.
Non-prime Attribute
Non-prime Attributes are attributes other than Primary attribute.

Key Constraints
1) Two distinct tuples in a legal instance cannot have identical values in all the fields of a key.
2) No subset of the set of fields in a key is a unique identifier for a tuple.

Foreign Key Constraints
1) Every non null value of FK or referencing field (X) must belong to value of referenced field (Y).
2) This property should be satisfied even after Insertion ,Deletion or Updation.

Referential Integrity Constraints
a) Referred Relation :
1) Inserion into parent table :- Not a problem
2) Deletion :- May cause violation
a) ON DELETE "No Action "
If parent data used in child table → data is not allowed to delete i.e. Deletion Restricted.
If the referred tupleis delete then it forces to ndelete referring tuple also.
c) ON DELETE "Set NULL"
If the referring attribute is allowed to set values then referred records are allwed to delete.
NOTE : Not allowed to delete → if FK is PK or a not NULL attribute.
3) Updation :- May cause violation
a) ON UPDATE "No Action "
Not allowed.
Update child also.
c) ON UPDATE "Set NULL"
Set child value to NULL.

b) Referring Relation :
1) Inserion :- If causes problem → NOT ALLOWED.
2) Deletion :- NO violation
3) Updation :- If causes problem → NOT ALLOWED.

### Definition

Normalization helps in eliminating / reducing redundancy.
Redundancy can occur in DB table if 2 or more independent relation or data are stored in single RDBMS table.

Goals of the normalization
* Eliminate redundant data i.e. storing the same data in more than one table.
* Ensure data dependencies make sense i.e. only storing related data in a table.

Functional Dependencies
Describes the relationship between attributes in a relation.If A and B are attributes of a relation R & B is functionally dependent on A
A → B
It means for each value of A attribute there should be a single B value.
Anomaly in Databse
Update Anomaly : Occurs when a change of a single attribute in one record requires changes in multiple records.
Insertion Anomaly: Occurs when there does not appear to be any reasonable place to assign attribute values to records in the database. Probably have overlooked a critical entity.

### The Normal Forms

Four most commonly used normal forms are first (1NF), second (2NF) and third (3NF) normal forms, and Boyce–Coddnormal form (BCNF).

FIRST NORMAL FORM ( 1NF )
Relational Schema Ris in 1NF iff no multivalued attributes are present in relation R.
OR
Every attribute of Relation R must be a single valued attribute.

Some important points :
* In ER diagram multivalued attributes are allowed but in RDBMS table they are not allowed. Therefore they are in 1NF by default.
* If multivalued attribute exist in a relation every CK of relation is COMPOUND.
* If atleast 1 simple CK in a relation then no multivalued attribute exist in a table.
* Redundancy due to FD is reduced further is eliminated in higher level.

SECOND NORMAL FORM ( 2NF )
A relation that is in 1NF and every non-primary key attribute is fully functionally dependent on the primary key.
OR
R is in 2NF iff there is no PARTIAL DEPENDENCY in R.
Some important points :
* If every CK in R is SIMPLE → R is always in 2NF.
* If every CK in R is SIMPLE → Partial Dependencies does not exist.
* R is in 2NF if no Partial Dependencies in 2NF.

THIRD NORMAL FORM ( 3NF )
A relation that is in 1NF and 2NF, and in which no non-primary key attribute is transitively dependent on the primary key.
OR
A relation R is in 3NF if for every non trivial FD x → y in R :
a) X should b SUPER KEY
OR
b) Y must be a prime attribute.
3NF Inference Rule
1. All non prime attributes must be determined by Super Key.
2. Prime attributes may or may not be determined by super key.

Some important points :
* R is in 3 NF if no transitive dependency.
* If every attribute of R is prime → always satisfies 3NF.

Boyce-Codd Normal Form ( BCNF )
A relation is in BCNF if and only if every determinant is a candidate key.
OR
A relation R is in 3NF if for every non trivial FD in R :
a) x → y in R , X must be SUPER KEY
NOTE : BCNF is a stronger form of 3NF
BCNF => 3NF
3NF ≠> BCNF

Some important points :
* In no non-trivila FD in R , then there is no redundancy in R → R is in BCNF.
* If R is in 3NF and only 1 compound candidate key → R is in BCNF.
* If R contains only 2 attributes → R is in BCNF.

### Relational Algebra

A language based on operators and a domain of values. Operators map values taken from the domain into other domain values

Basic Operators
Select , Project , union , set difference , Cartesian product.

Derived Operators
set intersection, division, join

SELECT Operator
Produces table containing subset of columns of argument table
πattribute list ( relation )

Project Operator
Produce table containing subset of rows of argument table satisfying condition
σcondition ( relation )

Set Operators
Relation is a set of tuples, so set operations should apply: *cup;, *cap;, - (set difference)
Scope of set operations limited to union compatible relations
Union Compatible Relations
* Two relations are union compatible if
– Both have same number of columns
– Names of attributes are the same in both
– Attributes with the same name in both relation have the same domain
* Union compatible relations can be combined using union, intersection, and set difference

Cartesian Product
If R and S are two relations, R X S is the set of all concatenated tuples < x,y >, where x is a tuple in R and y is a tuple in S
R X S is expensive to compute.

Derived Operation: Join

* Theta Join : Theta join combines tuples from different relations provided they satisfy the theta condition. The join condition is denoted by the symbol θ.
* Natural Join : Natural join does not use any comparison operator. It is same as cartesian product.Natural join acts on those matching attributes where the values of attributes in both the relations are same.
* Outer Join
a) Left Outer Join : All the tuples from the Left relation, R, are included in the resulting relation.
b) Right Outer Join : All the tuples from the Right relation, S, are included in the resulting relation.
c) Full Outer Join : All the tuples from both participating relations are included in the resulting relation. If there are no matching tuples for both relations, their respective unmatched attributes are made NULL.

### SQL

Basic SQL clauses and order of its execution.

Aggregate functions in SQL
AVG() : Returns the average value.
COUNT() : Returns the number of rows.
FIRST() : Returns the first value.
LAST() : Returns the last value.
MAX() : Returns the largest value.
MIN() : Returns the smallest value.
SUM() : Returns the sum.

### Definition :

A transaction is a unit of program execution that accesses and possibly updates various data items
Properties
* A transaction must see a consistent database
* During transaction execution the database may be temporarily inconsistent
* When the transaction completes successfully & is committed, the database must be consistent
* Multiple transactions can execute in parallel
ACID Properties :
Atomicity : Either all operations of the transaction are properly reflected in the database or none are
Consistency : Execution of a transaction in isolation preserves the consistency of the database
Isolation : Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions
Durability : After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures
Transaction States :
Active : Either all operations of the transaction are properly reflected in the database or none are
Partially committed : Execution of a transaction in isolation preserves the consistency of the database
Failed : Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions
Aborted : After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures Committed : After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures
Serializability
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule.
Different forms of schedule equivalence give rise to the notions of:
1. "conflict serializability"
2. "view serializability
Conflict Equivalence
Two schedules would be conflicting if they have the following properties -
* Both belong to separate transactions.
* Both accesses the same data item.
* At least one of them is "write" operation.
NOTE : All conflict serializable schedules are view serializable too.

Conflict Equivalence
Two schedules would be view equivalence if the transactions in both the schedules perform similar actions in a similar manner.
Example :-
* If T reads the initial data in S1, then it also reads the initial data in S2.
* If T reads the value written by J in S1, then it also reads the value written by J in S2.
* If T performs the final write on the data value in S1, then it also performs the final write on the data value in S2.

Test of Serializability
* A schedule is serializable if and only if its precedence graph is acyclic.
* Cycle-detection algorithms exist which take order n2 time, where n is the number of vertices in the graph.
* If precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph.