Tutorials – Roberto Rocha

Pair programming with LLMs: putting 5 leading models to the test

March 26, 2024 (updated March 27, 2024) | Roberto | 0 comments

Wanna skip the blabla and get right to the code? Access the Colab notebook here. I recently took a short course on DeepLearning.AI called Pair Programming with LLMs, where you learn how to use Google’s PaLM 2 language model to help write, debug, and explain code within a Jupyter Notebook environment. Well, PaLM is old news, […]

How to use ChatGPT Vision to turn handwritten forms into data

October 30, 2023 | Roberto | 0 comments

Takeaways: ChatGPT can turn handwritten forms into data, even with sloppy handwriting. Defining a schema of the desired output helps. It makes mistakes. Output still needs to be validated and possibly fixed by hand. Can’t be automated with API yet. Still need to manually upload images to web application. Limit of four images per upload […]
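Since the output still needs to be validated, here is a minimal sketch of what checking one extracted record against a schema could look like. The schema, its field names, and the `validate_record` helper are all hypothetical and invented for illustration. They are not taken from the post.

```python
import json

# Hypothetical schema: these field names and types are assumptions made
# for illustration, not the schema used in the original post.
SCHEMA = {"name": str, "age": int, "city": str}


def validate_record(raw_json, schema=SCHEMA):
    """Return a list of problems found in one extracted record."""
    record = json.loads(raw_json)
    problems = []
    # Check that every expected field is present and has the expected type.
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            found = type(record[field]).__name__
            problems.append(f"wrong type for {field}: {found}")
    # Flag anything the model returned that the schema did not ask for.
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


# A record where the age came back as text instead of a number.
print(validate_record('{"name": "Ana", "age": "42", "city": "Montreal"}'))
```

An empty list means the record matches the schema. Anything else is a candidate for fixing by hand.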

How to extract entities from raw text with Spacy: 3 approaches using Canadian data

November 7, 2022 | Roberto | 0 comments

TL;DR: Use the en_core_web_trf transformer model with Spacy to get much more accurate named entity recognition with multilingual text. Entity recognition is one of the marvels of current technology, at least from a journalist’s perspective. There was a time journalists had to read through hundreds, maybe thousands of documents, highlight names of people, companies and […]

4 ways to make self-updating Datawrapper charts

August 10, 2021 (updated August 11, 2021) | Roberto | 0 comments

Datawrapper is right now the best tool for creating quick and simple charts. It’s so useful and feature-rich that news organizations that had their own in-house charting tool are switching over. One of its best features is the ability to connect a CSV file hosted on the web as a data source. This enables users […]

Setting up a Selenium web scraper on AWS Lambda with Python

April 29, 2018 (updated February 3, 2022) | Roberto | 110 Comments

IMPORTANT UPDATE: This post is outdated now that AWS Lambda allows users to create and distribute layers with all sorts of plugins and packages, including Selenium and chromedriver. This simplifies a lot of the process. Here’s a post on how to make such a layer. And here’s a list of useful pre-packaged layers. This post […]

How I made the Montreal street history map

August 6, 2015 (updated August 7, 2015) | Roberto | 4 Comments

Click here to see the map at Huffington Post Québec. First of all, a clarification. I did not really make that map. I adapted the code from Noah Veltman’s San Francisco history map, and made one for Montreal. Compare both maps, and you’ll see they are very similar in many ways. That said, the data sources […]

The best PyCon 2015 videos for journalists

July 6, 2015 (updated July 7, 2015) | Roberto | 0 comments

PyCon is the world’s biggest conference for Python programmers, with great talks for both veterans and newcomers. And every year, organizers publish videos of talks and workshops for free for all to enjoy. Here is my selection of videos from this year’s conference in Montreal that I believe are of value for journalists who use […]

Making an interactive choropleth map with Jqvmap and Geostats

May 23, 2015 | Roberto | 0 comments

See the live map here. Jqvmap is an excellent JavaScript library for making sleek D3-style interactive vector maps quickly and easily. However, its suggested technique for visualizing data (see the library’s documentation) leaves something to be desired. This post suggests another way, using Geostats, another excellent JS library. To be clear: jqvmap’s dataviz technique works, […]

Using Python’s calendar module for scraping date-based data

March 10, 2015 (updated March 18, 2015) | Roberto | 3 Comments

I’ve recently fallen in love with Python’s standard calendar module. It has lots of functions to make handling dates a breeze. And for scraping data based on dates, it couldn’t be more convenient. Take Environment Canada’s historical hourly data for Montreal. Each page has 24 hours of data in a single day. If I want […]
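To give a flavour of the idea, here is a minimal sketch that uses `calendar.monthrange` to build one URL per day of a month. The URL template is a placeholder: the real Environment Canada query string is not shown in this excerpt, so `URL_TEMPLATE` and the `daily_urls` helper are assumptions for illustration only.

```python
import calendar

# Hypothetical URL template: a stand-in for the real data page, whose
# query parameters are not given in the excerpt above.
URL_TEMPLATE = "https://example.com/hourly?year={year}&month={month}&day={day}"


def daily_urls(year, month):
    """Return one URL per calendar day of the given month."""
    # monthrange returns (weekday of the first day, number of days in month),
    # so leap years and 30/31-day months are handled for us.
    _, days_in_month = calendar.monthrange(year, month)
    return [
        URL_TEMPLATE.format(year=year, month=month, day=day)
        for day in range(1, days_in_month + 1)
    ]


urls = daily_urls(2015, 2)
print(len(urls))  # 28, since February 2015 is not in a leap year
print(urls[0])
```

Each URL in the list can then be passed to whatever scraping code fetches and parses a single day's page.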