Using ChatGPT to clean data: an experiment

One of the most annoying parts of data work is dealing with inconsistent entities: names of the same person spelled differently. Company names that rebranded, merged, or have varying suffixes like “Ltd.” and “Limited”. Standardizing data for accurate analysis can take days, sometime weeks, even with powerful tools like OpenRefine and Dedupe, which were made […]

Getting tabular data from unstructured text with GPT-3: an ongoing experiment

One of the most exciting applications of AI in journalism is the creation of structured data from unstructured text. Government reports, legal documents, emails, memos… these are rich with content like names, organizations, dates, and prices. But to get them into a format that can be analyzed and counted, like a spreadsheet, usually involves days […]