Annotate.io

Cleaning up data is painful and wastes time. Send us examples of what you have and what you want, and Annotate will learn to transform the original into the cleaned version. Then send through new messy data and you'll get cleaned data back automatically.

Your messy data is quickly turned into clean, rich data, leaving you free to solve more valuable problems with it.

Sign up for alpha notification

DEMO: see the Python demo on GitHub, then try posting your own data.

You provide a small set of examples and we build a transformer that you can use online to clean up large amounts of data robustly, without building your own natural language processing pipeline.
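To make "learning a mapping from examples" concrete, here is a toy Python sketch of one simple approach: an exhaustive search over short chains of primitive string operations, keeping the first chain that reproduces every example. It is an illustration only and does not describe how Annotate itself learns; the primitive set and the names `PRIMITIVES`, `learn_pipeline` and `apply_pipeline` are hypothetical. The examples are the company names from further down this page.

```python
import unicodedata
from itertools import product

def to_ascii(s):
    # NFKD splits accented letters into base letter + combining mark; drop the marks
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def drop_plc(s):
    # Remove any "PLC" token (any case); split/join also collapses whitespace
    return " ".join(tok for tok in s.split() if tok.lower() != "plc")

# Hypothetical library of primitive transforms the search can chain together
PRIMITIVES = {
    "strip": str.strip,
    "lower": str.lower,
    "to_ascii": to_ascii,
    "drop_plc": drop_plc,
}

def apply_pipeline(pipeline, s):
    # Apply each named primitive in order
    for name in pipeline:
        s = PRIMITIVES[name](s)
    return s

def learn_pipeline(examples, max_len=4):
    # Try every chain of primitives, shortest first, and return the first one
    # that maps every "have" string to its "want" string
    names = list(PRIMITIVES)
    for length in range(1, max_len + 1):
        for pipeline in product(names, repeat=length):
            if all(apply_pipeline(pipeline, have) == want for have, want in examples):
                return pipeline
    return None  # no chain up to max_len fits all the examples

EXAMPLES = [
    ("Accenture PLC", "accenture"),
    ("ACCENTURE", "accenture"),
    ("Lancôme", "lancome"),
    ("  Lancôme", "lancome"),
    ("Société Générale", "societe generale"),
]

pipeline = learn_pipeline(EXAMPLES)
print(pipeline)                                # ('lower', 'to_ascii', 'drop_plc')
print(apply_pipeline(pipeline, "Nestlé PLC"))  # nestle
```

Brute force is only workable for tiny primitive sets: the number of candidate chains grows exponentially with chain length, so a practical system needs a richer set of operations and a smarter search.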

Normalise job-advert salary fields:

Job adverts are often filled in by hand, so fields such as salary are written in a variety of forms. We can learn the mapping required to normalise your examples into a consistent format:

What you have               What you want
"To 53K w/benefits"         "53000"
"30000 OTE plus bonus"      "30000"
"£55000 salary"             "55000"
"Forty two thousand GBP"    "42000"

You submit these examples and we learn a mapping (convert text to numbers, then expand "K" to "000", then extract the number). You then apply this mapping to your data.
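The three steps of that mapping can be written by hand as a rough Python sketch. Annotate learns the mapping from your examples; the function names below (`words_to_digits`, `expand_k`, `extract_number`, `normalise_salary`) are hypothetical, and the code only covers simple cases like those in the table (it does not handle ranges such as "30-35K", and it ignores currency symbols rather than interpreting them).

```python
import re

# Hypothetical word lists for spelled-out numbers
ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()
UNITS = {word: value for value, word in enumerate(ONES)}
UNITS.update({word: 10 * (i + 2) for i, word in enumerate(TENS)})
SCALES = {"thousand": 1_000, "million": 1_000_000}

def words_to_digits(text):
    # Step 1: replace runs of number words ("Forty two thousand") with digits
    out, total, current, in_number = [], 0, 0, False
    for token in re.split(r"[\s-]+", text.strip()):
        word = token.lower()
        if word in UNITS:
            current += UNITS[word]
            in_number = True
        elif word == "hundred" and in_number:
            current *= 100
        elif word in SCALES and in_number:
            total += current * SCALES[word]
            current = 0
        else:
            if in_number:  # a non-number word ends the current number
                out.append(str(total + current))
                total, current, in_number = 0, 0, False
            out.append(token)
    if in_number:
        out.append(str(total + current))
    return " ".join(out)

def expand_k(text):
    # Step 2: "53K" -> "53000", "53.5k" -> "53500"
    return re.sub(r"(\d+(?:\.\d+)?)\s*[kK]\b",
                  lambda m: str(round(float(m.group(1)) * 1000)), text)

def extract_number(text):
    # Step 3: drop thousands separators, then return the first run of digits
    text = re.sub(r"(?<=\d),(?=\d{3}\b)", "", text)
    match = re.search(r"\d+", text)
    return match.group(0) if match else None

def normalise_salary(raw):
    return extract_number(expand_k(words_to_digits(raw)))

for raw in ["To 53K w/benefits", "30000 OTE plus bonus",
            "£55000 salary", "Forty two thousand GBP"]:
    print(repr(raw), "->", normalise_salary(raw))
```

Every new format you meet means another hand-written rule like these, which is the maintenance work that learning from examples is meant to remove.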

Normalise company names:

Company names are text strings that can contain poorly formatted Unicode, redundant tokens, superfluous whitespace and unhelpful capitalisation:

What you have          What you want
"Accenture PLC"        "accenture"
"ACCENTURE"            "accenture"
"Lancôme"              "lancome"
"  Lancôme"            "lancome"
"Société Générale"     "societe generale"

We'll learn a mapping (fix badly encoded Unicode, strip "PLC" and whitespace, lowercase, convert Unicode to ASCII). You can then send in lots of new data and it will be cleaned immediately.
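For comparison, a hand-written Python version of that mapping might look like the sketch below. It assumes the bad encoding is UTF-8 text that was wrongly decoded as Latin-1 (the usual source of strings such as "LancÃ´me"), treats "plc" as the only redundant token, and uses hypothetical function names. A dedicated library such as ftfy handles far more mis-encoding cases than this one-line heuristic.

```python
import unicodedata

REDUNDANT_TOKENS = {"plc"}  # assumption: extend with other tokens as needed

def fix_mojibake(s):
    # Heuristic: undo UTF-8 bytes that were wrongly decoded as Latin-1
    # ("LancÃ´me" -> "Lancôme"); leave the string alone if that fails
    try:
        return s.encode("latin-1").decode("utf-8")
    except UnicodeError:
        return s

def to_ascii(s):
    # NFKD splits "é" into "e" + a combining accent; drop the accents, then
    # drop anything still non-ASCII (lossy for letters such as "ß")
    decomposed = unicodedata.normalize("NFKD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", "ignore").decode("ascii")

def normalise_company(name):
    text = to_ascii(fix_mojibake(name)).lower()
    # split() also removes leading, trailing and repeated whitespace
    tokens = [tok for tok in text.split() if tok not in REDUNDANT_TOKENS]
    return " ".join(tokens)

for raw in ["Accenture PLC", "ACCENTURE", "Lancôme", "  Lancôme",
            "Société Générale", "LancÃ´me"]:
    print(repr(raw), "->", repr(normalise_company(raw)))
```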

Alpha in January 2015:

Sign up for alpha notification