Takeaways:

  • ChatGPT can turn handwritten forms into data, even with sloppy handwriting.
  • Defining a schema of the desired output helps.
  • It makes mistakes. Output still needs to be validated and possibly fixed by hand.
  • Can’t be automated with the API yet. You still need to manually upload images to the web application.
  • Limit of four images per upload could make the job very long.

Ever since OpenAI rolled out ChatGPT Vision, giving the AI chatbot the ability to interpret images, the internet has documented all sorts of uses, from suggesting recipes based on a picture of ingredients on a counter, to explaining a complex diagram.

Here’s one I haven’t seen yet: you can ask it to turn handwritten forms into structured data.

At the Investigative Journalism Foundation, where I currently work, we are building a database of the declared financial assets of every politician in Canada, federal and provincial (if you want to support this project, consider subscribing).

Not all these declarations come as tidy data, ready to be dumped in a single file. Some come as cellphone photos of handwritten forms. Like this one from Nova Scotia:

These forms are five pages each, and we have hundreds of them. Turning them into digital data would involve days of tedious manual data entry, which is prone to mistakes.

Or we could combine OCR, to turn the forms into digital text, with parsing tools like Amazon Textract. But that would still involve a lot of manual work later to transfer the digital text from a text file to a database.

I asked ChatGPT Vision to do it for me. Because some questions can have multiple answers, I thought JSON would be an appropriate output.

Test 1: Minimal instructions

First, I asked the chatbot if this is something it can do:

I have photos of pages of handwritten forms with the public disclosure of assets held by a politician in Nova Scotia. I need you to transform them into JSON data with an item for each filled field. The key should be the field’s prompt and the value the handwritten response. Can you do that?

To which it responded:

Yes, I can assist with that. Please upload the images of the handwritten forms, and I will do my best to extract the relevant information and convert it into a JSON format. Please note that the accuracy of information extraction from handwritten text can vary depending on the legibility of the writing.

So I just uploaded four pages and told it to go ahead. To my great surprise, this was the result, despite the bare-bones instructions:

Here’s the extracted information in JSON format based on the provided images:

{
  "Statement Information": {
    "This disclosure statement is filed on behalf of": "Barbara Adams",
    "This statement is an": "initial statement (to be filed within 30 days of becoming a member)"
  },
  "Information about Member": {
    "Electoral district you represent": "Eastern Passage",
    "List of any appointments held by you under [the] Executive Council Act": "Minister of Seniors & Long Term Care",
    "Your office address": {
      "street": "1488 Main Rd",
      "municipality": "Eastern Passage NS",
      "postal code": "B3G 1M5"
    },
    "Your telephone numbers": {
      "constituency office": "902-406-0656",
      "caucus office": "902-424-2731",
      "ministerial office": "902-424-0170"
    },
    "Name of your spouse": "NA",
    "Names of your dependent children": "NA"
  },
  "Exclusions": "Not required to disclose any of the following in this disclosure statement (subsection 11(3) of the Conflict of Interest Act)",
  "Disclosures": {
    "a. name of every individual or organization that in any manner whatsoever remunerates or contributes to you": {
      "Individual or organization": "None",
      "Civic address": "NA",
      "Capacity in which received": "NA"
    },
    "b. the issuer or name of any bonds, debentures, holdings in investment funds, mutual funds, investment trusts or similar securities you have": "NA",
    "c. the nature and location of any interest you have in or with respect to real property, whether situate within or outside the Province": {
      "Location of property": "NA",
      "Nature of interest": "NA"
    },
    "d. unless excluded in item b, the name of every corporation, or subsidiary of a corporation, in which you hold a beneficial interest, a share warrant or purchase option": "NA",
    "e. the nature of any interest of yours in property of a kind required to be disclosed in item b, c or d, and over which you, your spouse or your dependent children have a general power of appointment or an expectancy under a trust": {
      "Nature of interest in disclosed property": "NA"
    },
    "f. any trust established for you, and the names of the trustees and the contributors to the trust": {
      "Name of trust": "NA",
      "Contributors to trust": "NA",
      "Trustees": "NA",
      "Address for trustees": "NA"
    },
    "g. the name of every individual or organization to which you are financially indebted": "NA"
  }
}

It followed the instructions to the letter. Each question was the JSON key, and the answers were the values. It understood the cursive handwriting. It even created nested key-value pairs for the components of the office address.

The only mistake it made was interpreting a 7 as a 1 in the third phone number.

Even though this was incredible, I thought I could do better.

Test 2: Defining a schema

This already saved someone a lot of mind-numbing work. But the JSON isn’t ideal. The keys are too long, and would use up unnecessary memory. And there’s no guarantee that it would maintain the same structure on future requests with other forms.
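One way to see whether the structure drifts between requests is to compare the key paths of two outputs. The following is only a minimal sketch: the helper name `key_paths` and the two shortened sample outputs are made up for illustration and are not actual ChatGPT responses.

```python
import json

def key_paths(value, prefix=""):
    """Return the set of key paths found in a parsed JSON value."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix} > {key}" if prefix else key
            paths.add(path)
            paths |= key_paths(child, path)
    elif isinstance(value, list):
        # Treat every list item as sharing one structure, marked with "[]".
        for item in value:
            paths |= key_paths(item, f"{prefix}[]")
    return paths

# Two invented, shortened outputs standing in for two separate responses.
first = json.loads(
    '{"Your office address": {"street": "1488 Main Rd", "postal code": "B3G 1M5"}}'
)
second = json.loads(
    '{"Office address": {"street": "1 Example St", "postal_code": "A1A 1A1"}}'
)

only_in_first = sorted(key_paths(first) - key_paths(second))
only_in_second = sorted(key_paths(second) - key_paths(first))
print("Only in first:", only_in_first)
print("Only in second:", only_in_second)
```

If both lists come back empty, the two outputs share the same keys. Any difference means the data would need to be reshaped before it goes into one table.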

So I defined a schema for the output JSON and included it in my prompt. To make it easier for the software to understand, I kept the numbers and letters of each question in the keys, and listed the data type I was expecting as the value, like str (for string or text), or a nested object.

I have several photos of hand-written forms. Your task is to convert the information in the forms into JSON. The full form is five pages long, but not all images will have all pages. In these cases, extract the information in the available pages. If the responses are blank or have “NA” as an answer, make the values “NA”. Use the following schema for the JSON.
{
"1.statement_information": {
"name": str,
"statement_type": str
},
"2.information_about_member": {
"electoral_district": str,
"appointments": list of str,
"office_address":
{ "street": str,
"municipality": str,
"postal_code": str },
"telephone_numbers":
{ "constituency_office": str,
"caucus_office": str,
"ministerial_office": str },
"dependent_children": [str array]
},
"4.disclosures": {
"a.renumerating_persons_or_organizations": object array with following structure:
[
{"individual": str,
"civic_address": str,
"capacity": str}
],
"b.issuers_of_holdings": [str array],
"c.real_estate_property": object array with following structure:
[
{"location": str,
"nature_of_interest": str }
],
"d.interests_in_corporations": [str array],
"e.nature_of_interest_in_disclosed_property": [str array],
"f.trusts": object array with following structure:
[
{ "name_of_trust": str,
"contributors_to_trust": str,
"trustees": str,
"address_for_trustees": str}
],
"g.financial_creditors": object array with following structure:
[
{"individual_or_organization": str,
"address": str}
],
"h.financial_debtors": object array with following structure:
[
{"individual_or_organization": str,
"address": str}
],
"i.transfers_and_settlements": object array with following structure:
[
{"individual_or_organization": str,
"address": str,
"asset_or_liability": str}
],
"j.payments_from_parties_or_associations": object array with following structure:
[
{"payment_source": str,
"amount": str,
"date": str}
],
"k.gifts_greater_200": object array with following structure:
[
{"gift": str,
"received_from": str,
"value": str}
]
},
"date_filed": str
}

After this message, I will upload the photos for you to parse.

Then I uploaded these four pages, which have actual answers in the boxes rather than just “NA”:

And this was the result:

{
  "1.statement_information": {
    "name": "Jill Balser",
    "statement_type": "annual statement (to be filed on or before June 30 each year)"
  },
  "2.information_about_member": {
    "electoral_district": "Digby Annapolis",
    "appointments": ["Minister of Labour, Skills and Immigration", "Treasury Board"],
    "office_address": {
      "street": "138 Hwy 303, Unit 2",
      "municipality": "Conway Digby",
      "postal_code": "B0V 1A0"
    },
    "telephone_numbers": {
      "constituency_office": "902-308-1992",
      "caucus_office": "902-424-7313",
      "ministerial_office": "902-424-5301"
    },
    "dependent_children": ["NA"]
  },
  "4.disclosures": {
    "a.renumerating_persons_or_organizations": [
      {
        "individual": "Inspiring Communities",
        "civic_address": "39 Warwick St, Digby, NS B0V 1A0",
        "capacity": "Direct Deposit"
      },
      {
        "individual": "Government of Nova Scotia",
        "civic_address": "One Government Place, Barrington St, Halifax, NS",
        "capacity": "Direct Deposit"
      }
    ],
    "b.issuers_of_holdings": ["RBC - RRSP", "Sunlife - Life Insurance"],
    "c.real_estate_property": [
      {
        "location": "Saint John, NB",
        "nature_of_interest": "Rental Property"
      }
    ],
    "d.interests_in_corporations": ["NA"],
    "e.nature_of_interest_in_disclosed_property": ["Potential rental property to manage"],
    "f.trusts": [],
    "g.financial_creditors": [],
    "h.financial_debtors": [],
    "i.transfers_and_settlements": [],
    "j.payments_from_parties_or_associations": [],
    "k.gifts_greater_200": []
  },
  "date_filed": "NA"
}

It followed the schema perfectly.

It only made two mistakes: the last four digits of the caucus office number are wrong, and it didn’t fill anything in for item g. This could be because the text for the key I gave it was too different from the form’s actual text. The key would need to be tweaked, and the output would still need to be double-checked.
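Part of that double-checking can be scripted. Here is a rough sketch, assuming the output has been pasted into a Python dictionary. The names `EXPECTED`, `check` and `check_phones` are my own, and only part of the schema is reproduced to keep it short. A check like this catches missing keys and oddly formatted phone numbers, but not a wrong digit in a well-formed number, so proofreading against the photo is still needed.

```python
import copy
import re

# Part of the schema from the prompt, written as Python types.
# str is a text value; [str] is a list of text values.
EXPECTED = {
    "1.statement_information": {"name": str, "statement_type": str},
    "2.information_about_member": {
        "electoral_district": str,
        "appointments": [str],
        "telephone_numbers": {
            "constituency_office": str,
            "caucus_office": str,
            "ministerial_office": str,
        },
        "dependent_children": [str],
    },
    "date_filed": str,
}

# Assumption: phone numbers are written as 902-424-7313.
PHONE = re.compile(r"^\d{3}-\d{3}-\d{4}$")

def check(data, expected, path=""):
    """Return a list of structural problems; an empty list means a match."""
    problems = []
    if isinstance(expected, dict):
        if not isinstance(data, dict):
            return [f"{path or 'root'}: expected an object"]
        for key, sub in expected.items():
            where = f"{path}/{key}"
            if key not in data:
                problems.append(f"{where}: missing")
            else:
                problems += check(data[key], sub, where)
    elif isinstance(expected, list):
        if not isinstance(data, list):
            problems.append(f"{path}: expected a list")
        else:
            for i, item in enumerate(data):
                problems += check(item, expected[0], f"{path}[{i}]")
    elif not isinstance(data, expected):
        problems.append(f"{path}: expected {expected.__name__}")
    return problems

def check_phones(data):
    """Flag phone numbers that are neither "NA" nor in the assumed format."""
    phones = data.get("2.information_about_member", {}).get("telephone_numbers", {})
    return [
        f"{name}: unusual phone format {number!r}"
        for name, number in phones.items()
        if number != "NA" and not PHONE.match(str(number))
    ]

# Trimmed copy of the output shown above.
sample = {
    "1.statement_information": {
        "name": "Jill Balser",
        "statement_type": "annual statement (to be filed on or before June 30 each year)",
    },
    "2.information_about_member": {
        "electoral_district": "Digby Annapolis",
        "appointments": ["Minister of Labour, Skills and Immigration", "Treasury Board"],
        "telephone_numbers": {
            "constituency_office": "902-308-1992",
            "caucus_office": "902-424-7313",
            "ministerial_office": "902-424-5301",
        },
        "dependent_children": ["NA"],
    },
    "date_filed": "NA",
}

# A deliberately damaged copy, to show what gets flagged.
damaged = copy.deepcopy(sample)
del damaged["date_filed"]
damaged["2.information_about_member"]["telephone_numbers"]["caucus_office"] = "902-424-731"

print(check(sample, EXPECTED) + check_phones(sample))
print(check(damaged, EXPECTED) + check_phones(damaged))
```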

Drawbacks

Although the time-saving potential is enormous, there’s no way to automate this yet. If you have hundreds of forms, each with many pages, the job would still be very long.

OpenAI’s API does not yet allow file uploads, so you have to use the ChatGPT website. And you can only upload four images at a time, so you’d need to make multiple requests for each form, ask the bot to update the JSON, and copy-paste the output each time.
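An alternative to asking the bot to update the JSON is to save the output of each upload and merge the pieces locally. The sketch below is one possible approach, not something I tested on the real forms: `merge` and `is_blank` are hypothetical helpers, and the two partial outputs contain placeholder values. A later value replaces an earlier one unless the later value is blank.

```python
def is_blank(value):
    """Treat "NA", empty strings, empty lists and ["NA"] as unanswered."""
    return value in ("NA", "", None, [], ["NA"])

def merge(base, update):
    """Merge two partial outputs for the same form, keeping filled-in values."""
    if isinstance(base, dict) and isinstance(update, dict):
        merged = dict(base)
        for key, value in update.items():
            merged[key] = merge(base[key], value) if key in base else value
        return merged
    # For plain values and lists, the later answer wins unless it is blank.
    return base if is_blank(update) else update

# Placeholder outputs standing in for two uploads of the same five-page form.
first_upload = {
    "1.statement_information": {"name": "Jill Balser"},
    "4.disclosures": {"k.gifts_greater_200": []},
    "date_filed": "NA",
}
second_upload = {
    "1.statement_information": {"name": "NA"},
    "4.disclosures": {
        "k.gifts_greater_200": [
            {"gift": "EXAMPLE GIFT", "received_from": "EXAMPLE DONOR", "value": "NA"}
        ]
    },
    "date_filed": "EXAMPLE DATE",
}

combined = merge(first_upload, second_upload)
print(combined)
```

This keeps the name found in the first batch of pages and fills in the fields that only appear in the later batch.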

It’s a matter of time until we get there, I’m sure. Rather than manual data entry, or stitching together different and imperfect processes, this could change the job to just uploading and proofreading. It would still be tedious and repetitive, to be sure, but much more efficient than the alternative.

Have you used ChatGPT Vision creatively for transforming data? Please share in the comments.
