There are four stages of front-end analysis. They are **lexical analysis**, **syntax analysis**, **semantic analysis** and **intermediate code generation**.

Frequent access to a **symbol table** is required during front-end analysis. A symbol table is a data structure that stores the name and attributes, such as the data type and the scope, of every identifier.

The first stage of front-end analysis is **lexical analysis**. Lexical analysis separates each line of source code into **tokens**, also known as **lexemes**. Each token consists of a token name and a token value. All identifiers must be entered into the **symbol table**.

For example, the following declaration statement:

`Var Num : integer;`

will be separated into five tokens:

`Var`

, `Num`

, `:`

, `integer`

, `;`

Examples of token names and token values:

Token Name | Token Value |
---|---|

Identifier | `count` , `x` , `colour` |

Keyword | `while` , `if` , `return` |

Separater | `(` , `{` , `;` |

Operator | `+` , `=` , `<` |

Literal | `123` , `"blue"` , `true` |

Comment | `// Here` , `/* Here */` |

**Syntax analysis**, also known as **parsing**, is the process of analysing a sequence of tokens according to the formal grammar of the programming language. The result of the syntax analysis is the construction of a **syntax tree** or **parse tree**.

For example, the following assignment statement:

`y := 2 * x + 4`

will be parsed into a tree:

**Semantic analysis** is a process which gathers necessary logical information from the source code. It performs tasks such as type checking, object binding, etc. An annotated abstract syntax tree is constructed to record these information.

An **intermediate code** is generated from the annotated abstract syntax tree. The intermediate code is a completely accurate representation of the source code that is designed to be useful for further processing such as optimisation and translation.

**Transport Layer Security** (**TLS**) protocol is the successor to the **Secure Sockets Layer** (**SSL**) protocol which has been prohibited from use by the Internet Engineering Task Force (IETF) due to its vulnerabilities to attacks.

They are both protocol suites that have been implemented to function as an additional layer of security between the transport layer and the application layer.

When a TLS/SSL protocol is implemented, HTTP (Hyper-text Transfer Protocol) becomes HTTPS - S for Secure.

The Handshake protocol of the TLS/SSL protocol suite is implemented to establish a secure communication session between the client and the server. Once the handshake process is finished, secure message exchange is available using a shared symmetric key.

participant Client
Client->Server: "client hello" + list of supported cipher suites\n and TLS/SSL versions
Server->Client: "server hello" + ciper suite and TLS/SSL version\n chosen + digital certificate
Note left of Client: extract public key
Note left of Client: generate pre-master key*
Client->Server: encrypted pre-master key
Note right of Server: decrypt pre-master key
Note left of Client: generate shared secret**
Note right of Server: generate shared secret
Client->Server: test message encrypted with shared secret
Note right of Server: message decrypted
Server->Client: test message encrypted with shared secret
Note left of Client: message decrypted
Client->Server: "client finished"
Server->Client: "server finished"
Client-->Server: start secure message exchange
Server-->Client:

*** shared secret: symmetric session key*

**Hashing**: uses a mathematical algorithm that takes a string as input and outputs a universally unique hash

**Data Integrity**: data is not modified or corrupted

- A public key and private key's function can be reversed: a public key can be used to decrypt a message encrypted by its corresponding private key.
- A hashing algorithm must in principle:
- produce a fixed-length output hash
- produce a completely different output even for the slightest change in the input
- be impossible to reverse (derive the input from the output hash)

Because of the reversible nature of the public-private key pair, a private key can be used as a proof of identity. Only the sender owns the private key, so, if a message encrypted by the private key is successfully decrypted by the corresponding public key, that verifies the identity of the sender.

participant Sender
Note left of Sender: encrypt message\n with private key
Sender->Receiver: encrypted message
Note right of Receiver: decrypt message\n with public key

However it can take a long time to encrypt longer messages, so, a one-way hash function is used. The body of the message is taken as input by the hash function and it outputs a hash that can be encrypted much more quickly. The output is called a message **digest**. The message digest encrypted by a private key is called the **digital signature**.

participant Sender
Note left of Sender: hash message to produce digest
Note left of Sender: encrypt digest with private key\n to produce digital signature
Sender->Receiver: message
Sender->Receiver: digital signature
Note right of Receiver: hash message to produce digest
Note right of Receiver: use public key to decrypt\n digital signature

The receiver receives the **message** and the **encrypted digest**. Then, the receiver:

- runs the
**message**through the hash function and produce the message digest - decrypts the
**encrypted digest**with the public key to produce the message digest - compares the two message digests, if they are the same, it proves:
- the sender's identity (otherwise, the public key would be able to decrypt the message)
- the message's integrity after the transmission (otherwise, the digest produced by the receiver would not be the same as the digest encrypted by the sender)

However, the receiver might not even have the public key of the real sender. Someone else might have claimed to be the real sender and given the receiver his/her own public key.

The **digital certificate** is designed for this very reason. A trusted Certification Authority or CA acts as the middle man between the sender and the receiver:

- It verifies the identify of the sender and takes his/her public key.
- It creates the digital certificate containing the sender's public key.
- It encrypts the digital certificate with its private key

Now the senders can use the digital certificate as the proof of identity.

**Plaintext**: original data before encryption

**Cyphertext**: data after applying encryption algorithms

**Key**: a string that is used to encrypt or decrypt data

**Public key**: the key of the asymmetric key pair that is shared with the public

**Private key**: the key of the asymmetric key pair that is kept by the receiver

Symmetric cryptography uses the same key to encrypt and decrypt data, hence the name 'symmetric', meaning the same on both sides.

Note left of Sender: Generates key
Note left of Sender: Encrypts message with key
Sender->Receiver: Sends key
Sender->Receiver: Sends encrypted message
Note right of Receiver: Decrypts message with key

Asymmetric cryptography generates a public-private key pair where the public key is used for senders to encrypt messages and the private key is used for the receivers to decrypt messages, hence the name 'asymmetric', meaning different on the two sides.

participant Sender
Note right of Receiver: Generates Public-Private Key Pair
Receiver->Sender: Sends Public Key
Note left of Sender: Encrypts message with Public Key
Sender->Receiver: Sends encrypted message
Note right of Receiver: Decrypts message with Private Key

]]>**Bubble Sort:** Repeatedly loops through the list, comparing each pair of adjacent items and swapping them if needed, until it is sorted

**Insertion Sort:** Builds a sorted list by comparing the next unsorted item with each item of the sorted list and inserting it at the appropriate place (initially, the sorted list consists of only the first item of the list)

As you can see, their performance are identical for best-case, worst-case and average-case scenarios. The only difference is that insertion sort performs better for a substantially sorted list with very few inversions.

Type | Best-case performance | Worst-case performance | Average-case performance |
---|---|---|---|

Bubble | O(n) for an already sorted list | O(n^2) for a list sorted in reverse | O(n^2) |

Insertion | O(n) for an already sorted list | O(n^2) for a list sorted in reverse | O(n^2) |

Compared with most other algorithms, like quicksort, both bubble sort and insertion sort fall short on worst-case and average-case performance. However, they do have the fastest performance for an already sorted list.

Therefore, bubble sort is often avoided as the only advantage it possesses is shared with insertion sort - which has a better performance for substantially sorted lists.

```
unsorted = [1, 5, 2, 3, 10, 7, 8] #unsorted array of numbers
def bubble(toSort):
needsSwap = TRUE
while needsSwap:
needsSwap = FALSE
for i in range(1, len(toSort)):
if toSort[i-1] > toSort[i]:
toSort[i-1], toSort[i] = toSort[i], toSort[i-1]
needsSwap = TRUE
return toSort
```

**Line 4-5**: start a `while`

loop that ends when `needsSwap`

becomes `FALSE`

**Line 6**: set `needsSwap`

to `FALSE`

, because no swapping has occurred at this point

**Line 7-9**: step through the list with a `for`

loop, comparing each adjacent pair of items and if the first item is larger than the second, swap the two

**Line 10**: if there has been any swapping, set `needsSwap`

to `FALSE`

and the loop starts over

```
unsorted = [1, 5, 2, 3, 10, 7, 8] #unsorted array of numbers
def insertion(toSort):
currentPosition = 1
while currentPosition < len(toSort):
i = currentPosition
while i > 0 and toSort[i-1] > toSort[i]:
toSort[i-1], toSort[i] = toSort[i], toSort[i-1]
i -= 1
currentPosition += 1
return toSort
```

**Line 4**: set `currentPosition`

to 1 which is the second element of the array

**Line 5**: stay in a `while`

loop until `currentPosition`

is at the end of the array

**Line 6**: set index `i`

to the current position number

**Line 7-9**: compare the value of the current position `toSort[i]`

with the previous item and swapping them if the previous item is larger, until `tosort[i]`

is larger than its previous item or it is at the start of the list

**Line 10**: increment `currentPosition`

, the comparison starts over for the next item on the unsorted list

- write algorithms/program code to process array data including:
- sorting using a bubble sort

- write an algorithm to implement an insertion sort
- write an algorithm to implement a bubble sort

(Updated for the 2017-2019 CIE Syllabus)

]]>**Linear Search:** checking each element of an array in turn for a required value

**Binary Search:** checking the middle of a sorted array and discarding the half that does not contain the required value

Type | Requirement | Performance |
---|---|---|

Linear Search | no requirement | O(n); Time proportional to array length |

Binary Search | must be sorted | O(log n); Time proportional to log of array length |

As you can see from the graph above, the average search time of an O(n) algorithm (linear search) increases linearly while the average search time of an O(log n) algorithm (binary search) increases logarithmically, meaning the increase in search time tends towards 0.

```
List = [4, 3, 10, 7, 2, 1]
Found = False
Index = 0
SearchValue = input(int("Enter a value to search: "))
while Found == False and Index < len(List):
if List[Index] == SearchValue:
Found = True
Index += 1
if Found == True:
print("Value found at position:", Index)
else:
print("Value not found")
```

```
List = [1, 2, 5, 6, 9, 10, 11] #Must be ordered
Found = False
SearchFailed = False
Lower = 0 #Position of minimum value
Upper = len(List) - 1 #Position of maximum value
SearchValue = input(int("Enter the value to be searched: "))
while !Found and !SearchFailed:
Middle = (Lower + Upper) // 2 #Use // to return a whole number
if List[Middle] = SearchValue:
Found = True
else:
if Lower >= Upper:
SearchFailed = True
else:
if List[Middle] < SearchValue:
Lower = Middle + 1
else:
Upper = Middle - 1
if Found:
print("Value found at:", Middle)
else:
print("Value not found")
```

- Write algorithms/program code to process array data including
- searching using a linear search

- write a binary search algorithm to solve a particular problem
- show understanding of the conditions necessary for the use of a binary search
- show understanding of how the performance of a binary search varies according to the number of data items

(Updated for the 2017-2019 CIE Syllabus)

]]>