Cleaning up text is painful. Send us examples of what you have and what you want, and we'll learn how to transform the original into the cleaned version. An alpha will follow in early 2015.
You provide a small set of examples and we build a converter that you can use online to clean up large amounts of data robustly, without building your own natural language processing and machine learning pipeline. You'll save time (no need to fuss with regular expressions!), and we can handle one-off or long-running conversion jobs. You get to focus on pulling value out of your data rather than spending weeks cleaning it up.
Job adverts are often filled in by humans, so fields such as salary are written in a variety of forms. We can learn the mapping required to normalise your examples into a consistent format:
|Original|Cleaned|
|---|---|
|"To 53k w/benefits"|"53000"|
|"30000 OTE plus bonus"|"30000"|
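
To make the what-you-have / what-you-want idea concrete, here is a minimal Python sketch of the salary examples above expressed as input/output pairs, checked against a hand-written baseline. The names `SALARY_EXAMPLES` and `normalise_salary` are hypothetical, and the regular expression is only a stand-in for illustration: it is not the learned converter, just the kind of rule you would otherwise have to write and maintain yourself.

```python
import re

# Example pairs copied from the salary examples above: (what you have, what you want).
SALARY_EXAMPLES = [
    ("To 53k w/benefits", "53000"),
    ("30000 OTE plus bonus", "30000"),
]


def normalise_salary(text):
    """Hand-written baseline (an assumption for illustration only).

    Takes the first number in the text and expands a trailing 'k' into thousands.
    Returns None when no number is found.
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s*(k\b)?", text, flags=re.IGNORECASE)
    if match is None:
        return None
    value = float(match.group(1))
    if match.group(2):
        value *= 1000
    return str(int(value))


# Check the baseline against every example pair.
for raw, wanted in SALARY_EXAMPLES:
    assert normalise_salary(raw) == wanted, (raw, wanted)
```

A rule like this covers the two examples shown but would need extending for every new way a human writes a salary, which is the maintenance burden a learned mapping is meant to remove.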
eCommerce pages are often scraped from a variety of database sources, and each product may be described using different units or common synonyms. We transform these into an easily understood format:
|Original|Cleaned|
|---|---|
|32 inch widescreen|32"|
|Thirty-three inch beautiful widescreen|33"|
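
The screen-size examples above can be sketched the same way. This is a hand-written Python baseline under stated assumptions, not the learned converter: `SCREEN_EXAMPLES`, `NUMBER_WORDS`, and `normalise_screen_size` are hypothetical names, and the word-to-number lookup only handles simple values below 100 such as "thirty-three".

```python
import re

# Example pairs copied from the screen-size examples above.
SCREEN_EXAMPLES = [
    ("32 inch widescreen", '32"'),
    ("Thirty-three inch beautiful widescreen", '33"'),
]

# Small lookup of number words (illustrative, deliberately incomplete).
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def normalise_screen_size(text):
    """Return the size before the word 'inch' as digits followed by a double quote.

    Accepts digits ("32") or hyphenated number words ("Thirty-three").
    Returns None when the size cannot be parsed.
    """
    match = re.search(r"([\w-]+)\s*inch", text, flags=re.IGNORECASE)
    if match is None:
        return None
    token = match.group(1).lower()
    if token.isdigit():
        size = int(token)
    else:
        parts = token.split("-")
        if not all(part in NUMBER_WORDS for part in parts):
            return None
        # Summing the parts works for simple forms like "thirty" + "three".
        size = sum(NUMBER_WORDS[part] for part in parts)
    return f'{size}"'


# Check the baseline against every example pair.
for raw, wanted in SCREEN_EXAMPLES:
    assert normalise_screen_size(raw) == wanted, (raw, wanted)
```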