“Accessible Investigative Journalism: Navigating Canada’s Largest Corpus of Government Documents”
Friday 20 September 2024, noon (EDT)
On Friday 20 September 2024, noon (EDT), we host Sana Shams and Waris Bhatia, both of the University of British Columbia, on “Accessible Investigative Journalism: Navigating Canada’s Largest Corpus of Government Documents”.
All welcome: https://utoronto.zoom.us/j/84277066292
Sana is a research fellow through UBC’s Data Science for Social Good Program. She is currently pursuing a BSc in cognitive systems with a minor in data science and is passionate about the intersection of technological development and ethical compliance, particularly in the fields of data science and machine learning.
Waris is a Data Science Intern with the University of British Columbia’s (UBC) Data Science Institute. He is a senior undergraduate studying Computer Science at UBC.
“Open By Default” (OBD) is a dataset from the Investigative Journalism Foundation and Canada’s largest collection of government documents, comprising over 4.5 million pages of Access to Information and Privacy (ATIP) requests and corresponding government documentation. This project enhanced data capture using optical character recognition (OCR), improved search performance through Large Language Model (LLM) vectorization, and applied topic modelling to reveal the high-level subject matter represented in the OBD dataset. The project’s final component was a Retrieval Augmented Generation (RAG) LLM pipeline, which enables a chatbot to provide tailored, context-rich responses to user queries, paired with follow-up research directions.
Upcoming
Friday 27 September 2024, noon (EDT)
Annie Collins, GivingTuesday
Annie Collins is a Data Scientist at GivingTuesday, a US-based nonprofit focused on researching generosity and charitable giving behaviours. Beyond GivingTuesday, Annie has spent several years in data management and research roles within the Canadian nonprofit sector. She holds a Bachelor of Science in applied mathematics and statistics from the University of Toronto, and uses her experience to provide data for the public good and support a more data-driven social sector worldwide.
Friday 4 October 2024, noon (EDT)
Sean Taylor, Motif
Sean Taylor is a data scientist, social scientist, statistician, and software developer. He mostly specializes in methods for solving causal inference and business decision problems, and is particularly interested in building tools for practitioners working on real-world problems. He is a co-founder and chief scientist at Motif.
Friday 11 October 2024, noon (EDT)
TBA
Friday 18 October 2024, noon (EDT)
Xiaojun Su, Unilever
Xiaojun Su is a Machine Learning Lead at Horizon 3 Labs, Unilever, where she leads cross-functional teams of data engineers, software developers, data scientists, postgraduate researchers, and third-party vendors to launch in-house models that drive significant return on investment. She holds an M.Sc. from the University of Toronto.
Friday 25 October 2024, noon (EDT)
Jay Alammar, Cohere
“Hands-On Large Language Models: Language Understanding and Generation”
Jay Alammar is Director and Engineering Fellow at Cohere (a pioneering provider of large language models as an API). In this role, he advises and educates enterprises and the developer community on using language models for practical use cases.
Friday 1 November 2024, noon (EDT)
TBA
Friday 8 November 2024, noon (EST)
Jacob Baldwin, Pro Football Focus (PFF)
Jacob Baldwin is a Senior Data Scientist at PFF. He holds an online M.S. degree in Applied Mathematics from the University of Washington, and graduated from Clarkson University with a B.S. in Physics, a B.S. in Applied Mathematics, and a minor in Computer Science.
Friday 15 November 2024, noon (EST)
TBA
Friday 22 November 2024, noon (EST)
Yiqin Fu, Stanford University
Born and raised in China, Yiqin Fu spent many of her formative years in the U.S. and the U.K. Yiqin (pronounced ee-ching) is studying towards a Ph.D. in political science at Stanford University, after having worked as a research associate at Yale Law School’s Paul Tsai China Center in New Haven, Connecticut and Beijing, China. She holds a B.A. in Philosophy, Politics, and Economics from the University of Oxford and is broadly interested in innovation, U.S.-China relations, and comparative political and electoral systems.
Friday 29 November 2024, noon (EST)
Caroline Weis, gsk.ai
Caroline Weis is a Senior AI/ML Engineer and team lead at gsk.ai. In 2021, she completed her PhD in Machine Learning for Computational Biology and Healthcare at ETH Zurich. Her research interests lie in the development of personalized healthcare through data analysis and machine learning on medical and biological data.