Language Processing: Do Computers Do It So Differently?

When we think of our computers and the language they speak, we tend to jump to the most colloquially known version, the one we see on TV. But we’ll take it a step past NCIS…

It’s not so simple. (And yes, two people typing on the same keyboard is just ridiculous.) Just like human language has evolved, so have the languages computers speak and read. Let’s draw some parallels!

Binary

An image of long strings of 0’s and 1’s probably just appeared in your head, right?

Okay, we’ll start here. Your processor (the CPU) works entirely in binary. Binary matters because the hardware stores each value as one of two voltage levels, HIGH or LOW, which map neatly onto 1 and 0. It’s the rawest form of computer data and processing, and it’s what the logic gates I mentioned last post operate on. But most of that data gets interpreted as something more meaningful than raw bits: characters, numbers, colors, and instructions.

A basic character

Here’s some necessary info for the next parts, so I’ll make this quick…

Characters in our computers are stored using two main encodings: ASCII and Unicode.

We’ll focus on ASCII here. Each character is stored in one 8-bit byte (ASCII itself defines only 128 characters, so strictly it needs just 7 of those bits). The number those bits represent determines which character it is, and you can look it up in an ASCII table. For example, 65 is an uppercase A.

Hexadecimal

If we use only 0’s and 1’s, we need a lot of them to represent larger values. Eight binary digits give 256 different combinations, so they can represent the numbers 0 through 255. But what if we want to store a much larger number? What if we want to do it in a way that makes more sense to a human?

Human readability is important. No one programs these days with only 0’s and 1’s, although some may joke about it…

A magnetized needle… would only be useful to flip individual bits x__x (0 -> 1, 1 -> 0)

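Instead, it’s often useful to go a step up. Colors are a good example: for colors we use hexadecimal, which is base 16, counting with the digits 0 through 9 and then the letters A through F. You may recognize this format from web pages (where it’s written with a # in front, as in #FF0000), since it codes for RGB values: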

Red: 0xFF0000

Green: 0x00FF00

Blue: 0x0000FF

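How did we get to F from only 0’s and 1’s? Each hex digit stands for a group of four bits: 0 is 0000 and F is 1111 (15 in decimal). So FF is 11111111, or 255, the largest value a single byte can hold, and 0xFF0000 means red at full strength with no green or blue. There’s a second layer, too: when a color like this is typed into a web page, it’s stored as text, so each digit becomes an ASCII character. A 0 is stored as 48, and an F as 70 (or 102 for a lowercase f), all of which you can look up in an ASCII table.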

Just like we use a series of 8 bits to represent a character, we use a series of hex digits, two per color channel, to represent a color. We could keep going up step by step, but that would take a while… Let’s get into the more interesting parts.

Some Code

When a programmer wants to get a computer to accomplish something, she’ll open up her favorite editor and type something like this…

```c
#include <stdio.h>  /* declares printf */

int main(void)
{
    printf("hello world\n");
    return 0;
}
```

main is the function every C program has, and execution always starts there. The code that runs lies between the { and the }. There’s only one statement here: a call to the function printf. You might be able to guess that this code prints the message “hello world” to the screen. It’s a classic proof-of-concept program that many programmers write the first time they try a language. It’s become somewhat of an inside joke 🙂

In English, we have grammar rules that we (formally) must abide by. The computer has these too; in programming they’re called syntax. In C and many other languages, for example, the end of a statement is marked by a semicolon. If you forget it, the compiler can’t make sense of the code and stops with a syntax error instead of building the program.

Something a Little More Readable: SQL

Some languages use more natural terms, like full English words. Check out this SQL statement:

```sql
SELECT Name, ProductNumber, ListPrice AS Price
FROM Production.Product
WHERE ProductLine = 'R'
  AND DaysToManufacture < 4
ORDER BY Name ASC;
```

This queries a database for data about products and how they’re manufactured. A lot of it probably makes sense already. We look in the table called Production.Product and keep only the rows whose product line is ‘R’ and that take fewer than 4 days to manufacture. From those rows, we select just three columns: the name, the product number, and the list price (renamed to Price in the results by AS). Finally, ORDER BY Name ASC sorts the results alphabetically by name. SQL is designed to read like English, so it looks a little friendlier.

Since this is a big subject, I’ll pick up next week with natural language processing, a fascinating topic in its own right. See you then!
