Learning to write like a computer
Binary, ASCII and how computers represent data
Computers and electronic systems store information in a variety of formats, but they all share one fundamental construction. At the most basic level, data is stored as the on/off state of a very large number of individual locations in physical memory. In a Solid State Drive (SSD), for example, each on/off state is held by a tiny individual transistor.
We normally label these two states 0 and 1, and a single piece of data that can take one of these two values has a technical name: it is called a bit.
When you learned to count as a child you memorized the numbers and words for the integers in order:
\[ \text{Zero, One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Eleven, Twelve,} \ldots \]
\[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, \ldots \]
You’ve probably already forgotten how strange it was to you as a child that after NINE there wasn’t a new digit to represent TEN; instead we re-use the 1 and 0 digits. This is because we use what is known as Base Ten (also called Decimal), where the 1 digit shifted left by one place represents TEN. Interestingly, in Base Ten there are ten different digits used (0, 1, 2, 3, 4, 5, 6, 7, 8 and 9) but none of them actually represents ten itself!
Since computers fundamentally work by manipulating 0s and 1s, it makes enormous sense, for efficiency, for them to work in Base Two (also known as Binary).
Task One:
- Investigate how to count to ten in binary. You may find this webpage useful (wikiHow link) or the first six minutes of this video (Intro to Binary), or find one yourself online!
- In Decimal the numbers \(1, 11, 111, 1111,\) etc. differ by ten, one hundred, one thousand, etc. What do the binary numbers \(1, 11, 111, 1111,\) etc. differ by?
- What happens when you add 1 to 999999 in Base Ten? What happens when you add 1 to 111111 in Base Two?
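Once you have tried these questions by hand, a short Python sketch such as the one below can be used to check your answers. The helper name `to_binary` is just an illustrative choice; it relies on Python's built-in `format` and `int` functions.

```python
def to_binary(n: int) -> str:
    """Return the Base Two digits of a non-negative integer, with no prefix."""
    return format(n, "b")


# Count from zero to ten, showing decimal and binary side by side.
for n in range(11):
    print(n, to_binary(n))

# Read 111111 as a Base Two number, add 1, and show the result in binary.
print(to_binary(int("111111", 2) + 1))
```

Changing the `range(11)` to a larger value lets you see how the pattern of digits continues.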
Divisibility is an interesting topic, especially in computing because when you ask a computer to divide 4 by 3 you can quickly get into difficulty with how to even store the answer! (Can you see why?)
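As a rough illustration of the storage problem, the Python sketch below divides 4 by 3 and then inspects the value that was actually stored. It assumes the usual case where Python stores decimals as binary floating-point numbers; the variable names are purely illustrative.

```python
from fractions import Fraction

quotient = 4 / 3              # stored as a binary floating-point number
print(quotient)               # a rounded value, not the endless 1.333...

stored = Fraction(quotient)   # the exact value the computer is holding
exact = Fraction(4, 3)        # the true answer, kept as a fraction

print(stored == exact)        # False: the stored value is only an approximation
print(stored.denominator)     # a power of two, because the storage is binary
```

Keeping the answer as a fraction (numerator and denominator) is one way around the problem, at the cost of storing two whole numbers instead of one.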
When doing arithmetic it’s often important to be able to tell whether a number divides exactly by 2, 3, 4, 5, etc., i.e. whether it’s even, a multiple of 3, a multiple of 4, and so on.
Task Two:
- How can you tell if a number in Base Ten is even? i.e. is it in the 2-times-table?
- How can you tell if a number in Base Ten is exactly divisible by four? i.e. is it in the 4-times-table? (Search online for a good rule if you don’t know one)
Now for Base Two:
Try to answer the same two questions, but in binary rather than Base Ten. Along the way, see if you can answer the two questions below.
- Is even-ness harder or easier to determine than in Base Ten?
- What about for divisibility by four?
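If you would like some examples to look at while hunting for a rule, the sketch below prints the numbers 1 to 16 in decimal and binary, together with whether each one is even and whether it is a multiple of four. It does not state the rule; the function name `divisibility_row` is a made-up helper for this illustration.

```python
def divisibility_row(n: int) -> str:
    """Format one row: decimal, five-digit binary, then the two divisibility checks."""
    return f"{n:>2}  {n:05b}  even={n % 2 == 0}  multiple_of_4={n % 4 == 0}"


# Print a small table so the binary digits can be compared with the checks.
for n in range(1, 17):
    print(divisibility_row(n))
```

Look down the binary column for the rows marked `True` and see which digits they have in common.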
Now that you’ve seen the basics of binary, we shall learn a little about how computers store more than just numbers. In order to store letters, words, and symbols on computers, all that is needed is an agreed code which matches numbers to letters. The best-known code for doing this is ASCII (pronounced ass-kee). In order to store the word Hello on a computer, all you need to do is first tell the computer that you are about to provide an ASCII code, and then provide the five numbers which represent H, then e, then l, then l, then o, in that order.
For clarity, many files on computers begin with some information which tells the computer what format/code (e.g. ASCII) is used for the upcoming data. They then contain a long list of numbers, which must be converted according to the named format/code to turn them into something readable. To encode the word Cat you go to the ASCII table and discover that C, a and t equal 67, 97 and 116 respectively, then you store these three numbers. However, since computers only store binary values, these numbers first need to be converted into binary before storage.
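The Cat example can be sketched in Python as follows. The built-in `ord` function looks up the code number for a character, and `format(..., "08b")` writes a number as eight binary digits; the variable names are illustrative only.

```python
word = "Cat"

# Look up the ASCII code number for each letter in turn.
codes = [ord(letter) for letter in word]
print(codes)    # [67, 97, 116]

# Write each code number as an eight-digit binary number, ready for storage.
binary = [format(code, "08b") for code in codes]
print(binary)   # ['01000011', '01100001', '01110100']
```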
Task Three:
- Use the table of ASCII values (below) to convert the word Hello into ASCII. What five decimal numbers are needed (and in what order)? Hint: the important columns are the ‘Decimal’ and ‘Char’ (Character) columns. You need to go beyond row 60 to reach the letter characters.
- Use the binary column of the ASCII table to convert these five numbers into binary. You should have five eight-digit binary numbers, for a total of 40 bits of storage space required.
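If a printed ASCII table is not to hand, the sketch below shows one way to generate the rows you need. It uses Python's built-in `chr` function to turn a code number back into its character; `ascii_rows` is a hypothetical helper name, and the column layout is simply one possible choice.

```python
def ascii_rows(start: int, stop: int) -> list[str]:
    """Return table rows (decimal, eight-digit binary, character) for codes start..stop-1."""
    return [f"{code:>3}  {code:08b}  {chr(code)}" for code in range(start, stop)]


# Codes 48 to 126 cover the digits, the letters, and the symbols in between.
for row in ascii_rows(48, 127):
    print(row)
```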
Finally, for a few deeper questions which you might have some ideas about.
Task Four:
- Look carefully at the location of the letters A, B, C, …, Z in ASCII. What’s special about these numbers when written in binary?
- What about the same question for the letters a, b, c, …, z?
- Can you see how letters and their capital versions are related? Is the relationship easier to spot in decimal or binary?
- Look carefully at the ASCII table and notice that :;<=>? and @ are placed after 9 but before A. Then [\]^_ and ` are inserted after Z but before a. Can you guess why?
Extension:
As the Internet developed, standards were needed to allow for a much wider range of characters than just the default English letters (properly called Latin characters). Eight binary digits are clearly not enough to cope with the letters and symbols of all the world’s alphabets.
Investigate online what role Unicode had as an extension to ASCII, and then how UTF-8 was introduced as a way of storing Unicode characters while still preserving the underlying ASCII format.
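As a starting point for the investigation, the short Python sketch below compares how the same text is stored under ASCII and under UTF-8, using the built-in `str.encode` method. The accented letter é is just one example of a character outside ASCII; the variable names are illustrative.

```python
# Plain English text is stored identically under ASCII and UTF-8.
ascii_bytes = "Hello".encode("ascii")
utf8_bytes = "Hello".encode("utf-8")
print(ascii_bytes == utf8_bytes)    # True

# A character outside ASCII (here é, written as \u00e9) needs more than one byte.
accented = "\u00e9".encode("utf-8")
print(len(accented))                # 2
print([format(byte, "08b") for byte in accented])
```

Compare the leading binary digit of the bytes for é with the leading digit of every ASCII code you have seen so far.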