Unit 1 - Notebooks and Python

Notebook basics

You are working within a Jupyter notebook. Notebooks have two types of cells: text and code. This cell contains text in Markdown format. If you want to edit it, double-click on it and you’ll see the Markdown source.

The other type of cell is a code block. Code blocks contain code that you can execute (when you run the notebook locally). To run one, click the “Run” button in the menu, click the “play” button to the left of the cell, or press Shift+Enter.

The next cell is your first code block. Run it and see the output. If you are browsing the book online, then you can show the output.

There are many reasons people use notebooks. I like them for teaching because we can keep explanations, code, and output all in one file, allowing you to follow the explanations.

print("Hello world!")
Output
Hello world!

Intro to Python

Python is a programming language. While R is also quite popular, Python tends to be used in computational linguistics, natural language processing, and applications that deal with text data (which is ultimately language data).

There are different ways to write programs in Python, and there are conventions for how to write good code. In this class, you’ll mostly learn how to use existing packages and functionalities so that you can do some cool text processing.

By now, you should have installed Jupyter and created a GitHub account so that you can get this notebook from the class repository on GitHub. The best way to do so is through GitHub Desktop. See the links in Canvas for all the information on how to get set up. Once you are set up, come back here and make sure you can run this notebook on your own computer.

Variables and data types

  • Variables are places where you store information. Each variable has a name, and you assign a value to it with an equals sign (=).

  • Data types are the different kinds of values a variable can hold. We’ll work mostly with numbers (integers and floats), strings, and lists.

In the code block below, we assign different data types to several variables. Then, we use type() to figure out the data type for each. Note that the first code block does not produce any output. We are simply assigning variables.

a_number = 380
a_string = "Ling"
a_list = ["is", "a", "fun", "course"]
type(a_number)
Output
int
type(a_string)
Output
str
type(a_list)
Output
list
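
The variables above cover an integer, a string, and a list, but not a float. As a minimal sketch, the same check could be run on a float. The variable name a_float and its value are made up for illustration and are not part of the original notebook.

```python
# a_float is a hypothetical variable, added here only for illustration
a_float = 3.5

# type() returns the data type; __name__ gives its short name as a string
print(type(a_float).__name__)

# Dividing two integers with / gives a float in Python 3
print(type(380 / 2).__name__)
```

Both print statements should report float, which shows that a number with a decimal point, or the result of a division with /, is stored as a float rather than an int.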

To understand the difference between a variable and its value, think of what the two print statements below will print. Try and predict it before you run the cell!

Below you also see the use of comments. Python ignores anything after a hash sign (#) up to the end of the line. Comments are useful if you want to leave notes to yourself, your future self, or others.

# Wait, did you think before you ran this cell? 

print(a_number)
print("a_number") # What data type is this?
print(a_list) # Note that a list is printed in square brackets with the elements separated by a comma
Output
380
a_number
['is', 'a', 'fun', 'course']
# Now we can put all the variables together and print a sentence, sort of

print(a_string, a_number, a_list)
Output
Ling 380 ['is', 'a', 'fun', 'course']
# You can also combine hard-coded strings and variables in a print statement

print(a_string, a_number, "is a one-credit course")
Output
Ling 380 is a one-credit course
# An f-string is often easier to read and write when you mix variables and literal text

print(f"{a_string} {a_number} is a one-credit course")
Output
Ling 380 is a one-credit course
# A better way to store and print a sentence, or a series of sentences, is as a single string

a_sentence = "Ling 380 is a special topics course. This semester, the topic is 'Python for text analysis'."
print(a_sentence)
Output
Ling 380 is a special topics course. This semester, the topic is 'Python for text analysis'.

There are also some conventions for naming your variables. Note that you cannot use spaces (variable name) or hyphens (variable-name) in variable names.

Some people prefer underscores (variable_name), while others use “camel case”, where each new word starts with an upper-case letter (variableName).

The main thing is that you should have useful names. a_number was actually not a very useful name!
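
As a rough sketch of what more descriptive names might look like, the earlier variables could be renamed as below. The names course_prefix, course_number, and courseNumber are illustrative choices, not names the notebook itself uses.

```python
# Underscore style: words separated by underscores
course_prefix = "Ling"
course_number = 380

# Camel case: each new word starts with an upper-case letter
courseNumber = 380

# Descriptive names make the print statement easier to read
print(f"{course_prefix} {course_number}")
```

Either style works in Python. What matters is picking one and using it consistently, so that the name tells you what the variable holds.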

Functions

Functions are blocks of code that allow you to do things repeatedly and more efficiently. Some functions, like print() and type(), are already pre-defined in Python. But you can also write your own functions. They have the following structure:

  • name of the function (head)

  • parentheses after the name (())

  • colon after name and parentheses (:)

  • indent after the head. Use the Tab key on your keyboard to insert the indent

head():
    function_line(s)

The head often has one or more arguments, which you put in the parentheses in the head. The function lines often include something that you return (often a manipulation of the arguments). To create a new function, you first have to define it with def.

# Create a function

def printing_function():
    print("This function prints this statement.")
# Use (call) a function

printing_function()
Output
This function prints this statement.

This is actually a pretty useless function, because Python already has a print() function!
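
The function above takes no arguments and returns nothing. A slightly more useful sketch, assuming we want to build the course sentence from earlier, takes two arguments and returns a string. The function name describe_course and its parameter names are hypothetical, chosen only for this example.

```python
# describe_course is a hypothetical function, not part of the original notebook
def describe_course(prefix, number):
    # Build a sentence from the two arguments and hand it back to the caller
    return f"{prefix} {number} is a one-credit course"

# Call the function and store the returned string in a variable
description = describe_course("Ling", 380)
print(description)
```

Because the function returns the string instead of printing it, you can store the result in a variable and reuse it later, or call the function again with a different prefix and number.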

Useful commands

One of the most annoying things you’ll encounter is figuring out where your files are and making sure you are reading and writing to the right directory on your computer.

  • pwd (short for “print working directory”) gives you the current working directory. If you run the next cell, you should see the path where this notebook is located.

pwd
Output
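
Note that pwd on its own is a notebook shortcut rather than ordinary Python, so it will not work in a regular Python script. A minimal sketch of a plain Python equivalent, using the standard os module, is shown below. The variable name current_directory is an illustrative choice.

```python
import os

# os.getcwd() returns the current working directory as a string
current_directory = os.getcwd()
print(current_directory)
```

The path you see depends on where you started Jupyter or Python, so it will differ from one computer to another.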