MDS Computational Linguistics | UBC Master of Data Science (2022)

UBC’s Master of Data Science in Computational Linguistics is the credential to set you apart. Offered at the Vancouver campus, this unique degree is tailored to those with a passion for language and data. Over 10 months, the program combines foundational data science courses with advanced computational linguistics courses, equipping graduates with the skills to turn language-related data into knowledge and to build AI that can interpret human language.

Are you passionate about language?

Are you passionate about language and curious about data? UBC’s new Master of Data Science in Computational Linguistics specialization was designed for you. This accelerated, 10-month, full-time program gets you into a career faster.

Program Benefits

Highlights Across All MDS Programs:

  • 10-month, full-time, accelerated program offers a short-term commitment for long-term gain
  • Condensed one-credit courses allow for in-depth focus on a limited set of topics at one time
  • Capstone project gives students an opportunity to apply their skills
  • Real-world data sets are integrated in all courses to provide practical experience across a range of domains

Highlights Specific To Computational Linguistics:

  • Courses are taught by a combination of arts (linguistics), computer science, and statistics faculty members, giving students access to key experts within each field of study
  • Students learn fundamental data science skills, techniques, and tools with the core Master of Data Science cohort, then branch off into more specialized courses, experiencing the benefits of a large program and a small program in one
  • UBC’s Vancouver campus offers students the unrivaled experience of a top-40 university, surrounded by remarkable natural beauty, at the edge of a cosmopolitan city
  • Strong connections with industry partners in the public and private sectors, start-ups, and leading tech companies offer a wide range of networking and career opportunities


The program structure includes 24 one-credit courses offered in four-week segments. Courses are lab-oriented and delivered in-person with some blended online content.

After the six segments, students complete an eight-week, six-credit capstone project, applying their newly acquired knowledge to real-life data sets while working alongside other students. Please note that instructors are subject to change.

* Subject to change at the discretion of the MDS Computational Linguistics program.

Fall: September - December

Block 1 (4 weeks, 4 credits)

Programming for Data Science | DSCI 511

Program design and data manipulation with Python. Overview of data structures, iteration, flow control, and program design relevant to data exploration and analysis. When and how to exploit pre-existing libraries.


Arman Ahmadi

Computing Platforms for Data Science | DSCI 521

How to install, maintain, and use the data science software stack. The Unix shell, version control, and problem-solving strategies. Literate programming documents.


Programming for Data Manipulation | DSCI 523

Program design and data manipulation with R. Organizing, filtering, sorting, grouping, reformatting, converting, and cleaning data to prepare it for further analysis.


Gittu George

Descriptive Statistics and Probability for Data Science | DSCI 551

Fundamental concepts in probability including conditional, joint, and marginal distributions. Statistical view of data coming from a probability distribution.


Alexi Rodríguez-Arelis

Block 2 (4 weeks, 4 credits)

Algorithms & Data Structures | DSCI 512

How to choose and use appropriate algorithms and data structures to help solve data science problems. Key concepts such as recursion and algorithmic complexity (e.g., efficiency, scalability).


Jungyeul Park

Data Visualization I | DSCI 531

Exploratory data analysis. Design of effective static visualizations. Plotting tools in R and Python.


Joel Östblom

Statistical Inference and Computation I | DSCI 552

The statistical and probabilistic foundations of inference. Large sample results. The frequentist paradigm.


Alexi Rodríguez-Arelis

Supervised Learning I | DSCI 571

Introduction to supervised machine learning. Basic machine learning concepts such as generalization error and overfitting. Various approaches such as k-NN, decision trees, and linear classifiers.


Varada Kolhatkar, Florencia D'Andrea

Block 3 (4 weeks, 4 credits)

Corpus Linguistics | COLX 521

Basic processing of text corpora using Python. Includes string manipulation, corpus readers, linguistic comparison of corpora, structured text formats, and text preprocessing tools.


Garrett Nicolai, Jungyeul Park

Databases & Data Retrieval | DSCI 513

How to work with data stored in relational database systems. Storage structures and schemas, data relationships, and ways to query and aggregate such data.


Arman Ahmadi

Regression I | DSCI 561

Linear models for a quantitative response variable, with multiple categorical and/or quantitative predictors. Matrix formulation of linear regression. Model assessment and prediction.


Feature and Model Selection | DSCI 573

How to evaluate and select features and models. Cross-validation, ROC curves, feature engineering, and regularization.


Varada Kolhatkar, Joel Östblom

Winter: January - April

Block 4 (4 weeks, 4 credits)

Parsing for Computational Linguistics | COLX 535

The identification of syntactic structure in natural language. Parsing algorithms for popular grammar formalisms, application of statistical information to parsing, parser evaluation, and extraction of parse features.


Miikka Silfverberg, Jungyeul Park

Computational Semantics | COLX 561

How meaning is represented by computers. An overview of popular semantic resources, and techniques for building new resources from unstructured text data.


Garrett Nicolai, Jungyeul Park

Unsupervised Learning | DSCI 563

How to find groups and other structure in unlabeled, possibly high-dimensional data. Dimension reduction for visualization and data analysis. Clustering, association rules, and model fitting via the EM algorithm.


Garrett Nicolai, Jungyeul Park

Supervised Learning II | DSCI 572

Introduction to optimization. Gradient descent and stochastic gradient descent. Roundoff error and finite differences. Neural networks and deep learning.


Jian Zhu, Jungyeul Park

Block 5 (4 weeks + 1 week break, 4 credits)

Advanced Corpus Linguistics | COLX 523

Text corpora collection and curation. How to pull representative datasets from internet sources. Techniques for efficient and reliable annotation.


Garrett Nicolai, Jungyeul Park

Computational Morphology | COLX 525

Approaches to sub-word phenomena in language processing. Automatic morphological analysis of diverse languages, part-of-speech tagging, word segmentation, and character-level neural network models.


Miikka Silfverberg, Jungyeul Park

Machine Translation | COLX 531

Key methodologies for automatic translation between languages, with a focus on statistical and neural machine translation approaches. Applying machine translation (MT) architectures to analogous monolingual tasks. MT evaluation.


Jian Zhu, Jungyeul Park

Sentiment Analysis | COLX 565

Identification and analysis of opinion, especially in social media. Text polarity and emotion classification, fine-grained (e.g., aspect-based) opinion mining, argumentation mining, and sentiment in social networks.

Block 6 (4 weeks, 4 credits)

Advanced Computational Semantics | COLX 563

Application of machine learning to various semantic tasks. Likely topics include: information extraction, semantic role labelling, semantic parsing, discourse parsing, question answering, summarization, and natural language inference.


Miikka Silfverberg, Jungyeul Park

Natural Language Processing for Low-Resource Languages | COLX 581

Building automatic language tools when data is scarce. Rule-based and hybrid systems, semi-supervised learning, active learning. Knowledge transfer from other (related) languages.


Miikka Silfverberg, Jungyeul Park

Trends in Computational Linguistics | COLX 585

Cutting-edge techniques in natural language processing. For this iteration, the latest innovations in neural network architectures.


Jian Zhu, Jungyeul Park

Privacy, Ethics & Security | DSCI 541

The legal, ethical, and security issues concerning data, including aggregated data. Proactive compliance with rules and, in their absence, principles for the responsible management of sensitive data. Case studies.


Garrett Nicolai, Jungyeul Park

Spring: May - June

Capstone Project (8–10 weeks, 6 credits)

Capstone Project | COLX 595

A mentored group project based on real data and questions from a partner within or outside the university. Students will formulate questions and design and execute a suitable analysis plan. The group will work collaboratively to produce a project report, presentation, and possibly other products, such as a web application.


MDS-CL Staff

Meet Amy

Although Amy found the MDS Computational Linguistics program intensive and accelerated, it was actually a better fit for her needs. Amy felt the most important thing she learned was how to solve problems: once you can see a clear picture of the data, you feel a sense of achievement.

Article information

Author: Manual Maggio

Last Updated: 12/28/2022