Annotate.io

Cleaning up text is painful. Send us examples of what you have and what you want, and we'll learn how to transform the original into the cleaned version. Alpha available.

You need clean, rich data rather than big data. This service learns from the examples you provide, and you don't need to install any libraries on your machine.

Sign up for alpha notification

You provide a small set of examples and we build a transformer that you can use online to clean up large amounts of data robustly, without building your own natural language processing pipeline.

Normalise job-advert salary fields:

Job adverts are often filled in by humans, so fields such as salary are written in a variety of forms. We can learn the mapping required to normalise your examples into a consistent format:

What you have:              What you want:
"To 53K w/benefits"         "53000"
"30000 OTE plus bonus"      "30000"
"£55000 salary"             "55000"
"Forty two thousand GBP"    "42000"
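
For comparison, here is a minimal sketch of the kind of hand-written rule set you would otherwise maintain yourself to cover just these four examples. It is not how the service works internally; the function name `normalise_salary` and the small number-word vocabulary are assumptions made for illustration, and formats such as "55,000" or "£40-45k" are deliberately left unhandled.

```python
import re
from typing import Optional

# Illustrative number-word vocabulary (an assumption, not exhaustive).
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def normalise_salary(raw: str) -> Optional[str]:
    """Return the salary as a plain digit string, or None if nothing is found."""
    text = raw.lower()

    # Case 1: digits, optionally followed by "k" for thousands ("53K" -> 53000).
    match = re.search(r"(\d+)\s*(k)?\b", text)
    if match:
        value = int(match.group(1))
        return str(value * 1000 if match.group(2) else value)

    # Case 2: spelled-out numbers ("forty two thousand" -> 42000).
    total, current, found = 0, 0, False
    for word in re.findall(r"[a-z]+", text):
        if word in UNITS:
            current += UNITS[word]
            found = True
        elif word in TENS:
            current += TENS[word]
            found = True
        elif word == "hundred":
            current *= 100
        elif word == "thousand":
            total += current * 1000
            current = 0
    return str(total + current) if found else None


for example in ["To 53K w/benefits", "30000 OTE plus bonus",
                "£55000 salary", "Forty two thousand GBP"]:
    print(repr(example), "->", normalise_salary(example))
```

Every new way of writing a salary needs another rule in code like this, which is the maintenance burden that learning the mapping from examples is meant to remove.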

Normalise company names:

Company names are examples of text strings that can contain poorly formatted Unicode, redundant tokens, superfluous whitespace and unhelpful capitalisation:

What you have:          What you want:
"Accenture PLC"         "accenture"
"ACCENTURE"             "accenture"
"Lancôme"               "lancome"
"  Lancôme"             "lancome"
"Société Générale"      "societe generale"
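
As a rough illustration of the four problems named above (accented Unicode, redundant tokens, extra whitespace, capitalisation), the mapping in this table could be approximated by hand with Python's standard `unicodedata` module. This is only a sketch of a manual baseline, not the service's method; the function name `normalise_company_name` and the `REDUNDANT_TOKENS` list are hypothetical.

```python
import unicodedata

# Hypothetical list of legal-form tokens to drop; a real list would be longer.
REDUNDANT_TOKENS = {"plc", "ltd", "inc"}


def normalise_company_name(raw: str) -> str:
    # Decompose accented characters (ô -> o + combining mark),
    # then drop the combining marks to leave plain letters.
    decomposed = unicodedata.normalize("NFKD", raw)
    unaccented = "".join(ch for ch in decomposed
                         if not unicodedata.combining(ch))
    # Lowercase, then split on whitespace; split() with no arguments also
    # discards leading, trailing and repeated whitespace.
    tokens = unaccented.lower().split()
    # Remove redundant tokens and rejoin with single spaces.
    return " ".join(t for t in tokens if t not in REDUNDANT_TOKENS)


for example in ["Accenture PLC", "ACCENTURE", "Lancôme",
                "  Lancôme", "Société Générale"]:
    print(repr(example), "->", repr(normalise_company_name(example)))
```

The token list is where a hand-written approach gets brittle: each new suffix ("GmbH", "S.A.", "Limited") has to be added by hand.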

Alpha in January 2015:

Sign up for alpha notification