Montreal, Québec, Canada

Roberto Rocha

Data storyteller and educator

text data

How to extract entities from raw text with spaCy: 3 approaches using Canadian data

November 7, 2022, by Roberto

TL;DR: Use the en_core_web_trf transformer model with spaCy to get much more accurate named entity recognition with multilingual text. Entity recognition is one of the marvels of current technology, at least from a journalist’s perspective. There was a time when journalists had to read through hundreds, maybe thousands of documents, highlight names of people, companies and […]
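As a rough illustration of the TL;DR, here is a minimal sketch of loading en_core_web_trf with spaCy and grouping the entities it finds by label. The helper names (`load_pipeline`, `entities_by_label`, `demo`) and the sample sentence are made up for this example and are not taken from the full post; it also assumes spaCy and the en_core_web_trf model are already installed.

```python
from collections import defaultdict


def load_pipeline(model_name="en_core_web_trf"):
    """Load a spaCy pipeline by name (hypothetical helper for this sketch)."""
    # Imported lazily so entities_by_label() can be used without spaCy installed.
    # Assumes: pip install spacy, then python -m spacy download en_core_web_trf
    import spacy

    return spacy.load(model_name)


def entities_by_label(doc):
    """Group the unique entity strings in a processed Doc by entity label."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        # ent.label_ is the string label (e.g. PERSON, ORG, GPE).
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


def demo():
    """Run the pipeline on an invented sample sentence and print the entities."""
    nlp = load_pipeline()
    # Invented sentence for illustration only, not from the post's data.
    doc = nlp("Hydro-Québec signed a contract with Bombardier in Montreal.")
    for label, names in entities_by_label(doc).items():
        print(label, names)
```

Calling `demo()` prints one line per entity label. Which labels and spans come back depends on the model version, so treat any particular output as something to check rather than assume.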

Posted in Data Journalism, Tutorials Tags NLP, Python, spaCy, text data