Python for text analysis

This book is an introduction to Python through practical applications in text analysis, especially for the humanities and social sciences. It is part of a course at Simon Fraser University:
Ling 380 - Special Topics in Linguistics (Python for text analysis)
Department of Linguistics, Simon Fraser University
Instructor: Maite Taboada
The book can be read on its own, but it is designed to be used together with the course materials in SFU’s learning management system, which are available to students enrolled in the course.
Course objectives
The course introduces basic concepts and tools for text analysis using the Python programming language. It addresses data capture and manipulation, data cleaning and preprocessing, and text analysis for linguistics and other social sciences.
By the end of the course, students will have learnt the basics of Python programming and will understand how to process language data for various analyses.
More specifically, students will:
Learn core concepts of programming (variables, functions, objects)
Learn to install and use basic packages for text analysis (NLTK, spaCy)
Be able to collect and store a dataset using existing Python packages
Clean and normalize language data
Perform natural language processing analysis on language data
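As an illustration of the "clean and normalize" objective above, here is a minimal sketch that uses only the Python standard library. The function names `normalize` and `tokenize` and the sample sentence are assumptions made for this example, not code from the course; the course units work with packages such as NLTK and spaCy, which provide far more robust tokenizers.

```python
import re
import unicodedata
from collections import Counter


def normalize(text):
    """Unify Unicode forms, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split already-normalized text into simple word tokens.

    This is a deliberately naive rule (ASCII letters, digits, apostrophes);
    it is an assumption for illustration, not a linguistic tokenizer.
    """
    return re.findall(r"[a-z0-9']+", text)


# Hypothetical sample text with inconsistent spacing and capitalization
sample = "The  cat sat.\nThe CAT   slept!"

tokens = tokenize(normalize(sample))
counts = Counter(tokens)

print(tokens)
print(counts.most_common(2))
```

Keeping normalization and tokenization in separate functions makes it easy to swap the naive tokenizer for one from NLTK or spaCy later without touching the cleaning step.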
Copyright
Made available under a Creative Commons CC BY-NC-SA 4.0 License, Attribution-NonCommercial-ShareAlike 4.0.
BY: Credit must be given to the creator.
NC: Only noncommercial uses of the work are permitted.
SA: Adaptations must be shared under the same terms.
Acknowledgements
Some of the units contain code and ideas from other sources, which are referenced in those units.
Logo by Greg Holoboff, CEE at SFU
Built with Jupyter Book
Suggested citation
Taboada, Maite (2025) Python for text analysis. Version 1. https://maitetaboada.github.io/python_text_analysis