Since launching WatchCharts in March of last year, one continual challenge has been turning unstructured forum sale posts into structured listings that we can filter, sort, and analyze. This means extracting the brand, collection, model, price, and whether the watch has sold, among other facets. In this week's update to the community, I'll discuss two ways we're improving the accuracy of our data.
Recently, we've received some feedback (including a bug report on our Discord server) about inaccuracies in our listing data, and even about fraudulent listings posted from accounts whose credentials had been stolen.
In response, we've launched a new feature that lets users report inaccuracies, fraud, or any other issue in our listings. Each report helps improve the accuracy of our data and our value to the community.
For example, below is a listing with an error in its extracted price:
As you can see, we predicted the price to be $50,000, but the description shows that the actual price is $8,850. Such errors are uncommon, but they can happen, particularly when the description mentions multiple prices.
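To see why multiple prices make extraction ambiguous, here is a minimal Python sketch that collects every dollar amount in a description. It is not our actual extractor; the regular expression, the `find_price_candidates` function, and the sample text are all hypothetical.

```python
import re

# Hypothetical pattern: "$" followed by digits with optional thousands
# separators and optional cents, e.g. "$9,750" or "$12,500.00".
PRICE_PATTERN = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{2}))?")


def find_price_candidates(text: str) -> list[float]:
    """Return every dollar amount in the text, in order of appearance."""
    candidates = []
    for match in PRICE_PATTERN.finditer(text):
        whole = match.group(1).replace(",", "")  # drop thousands separators
        cents = match.group(2) or "00"
        candidates.append(float(f"{whole}.{cents}"))
    return candidates


# Made-up description (not the listing above) mentioning two prices.
description = "Bought new for $12,500 in 2021; asking $9,750 shipped."
print(find_price_candidates(description))  # prints [12500.0, 9750.0]
```

Both amounts are valid prices on their own; deciding which one is the asking price requires context (words like "asking" or "bought"), and that is where a heuristic can pick the wrong one.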
To report an error, click the red "Report Listing" button above the image gallery and select the appropriate reason.
Selecting a reason will take you to the Site Feedback forum and auto-create a draft with the report details (shown below). Note that you must be logged in to make a report. Also, I've already fixed the price error in the listing above, so no report is needed for that one.
One barrier to improving our market analytics is that our data can be skewed by non-watch listings included in our calculations. We see this particularly with vintage models such as the Rolex GMT-Master 1675, where the first thing you'll notice is the extremely wide price range. Price variance is generally higher for vintage models because their condition varies more, but here the data is further skewed by listings for 1675 casebacks, bezels, clasps, dials, and a variety of other parts and accessories.
We already apply some basic pre-processing to remove such invalid data points. However, these techniques have limited effectiveness, and we need better ones before we can provide more meaningful analytics.
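To illustrate why simple rules fall short, here is a hypothetical keyword filter on listing titles. It is not our actual pre-processing; the keyword list and the `looks_like_part` function are made up for this example.

```python
import re

# Hypothetical terms suggesting a listing is for a part or accessory, not a watch.
PART_KEYWORDS = ("caseback", "bezel", "clasp", "dial", "bracelet", "insert", "crown")

# Case-insensitive whole-word match on any keyword, optionally plural ("bezels").
PART_RE = re.compile(r"\b(?:" + "|".join(PART_KEYWORDS) + r")s?\b", re.IGNORECASE)


def looks_like_part(title: str) -> bool:
    """Return True if the listing title mentions any part/accessory keyword."""
    return PART_RE.search(title) is not None


titles = [
    "Rolex GMT-Master 1675 caseback",            # a part: correctly flagged
    "Rolex GMT-Master 1675, 1970, full set",     # a watch: correctly kept
    "Rolex GMT-Master 1675 with original dial",  # a watch: wrongly flagged
]
for title in titles:
    print(looks_like_part(title), title)
```

The third title is a complete watch that merely mentions its original dial, yet the filter flags it; conversely, a parts listing that avoids these words slips straight through. Brittleness like this is one reason rule-based filtering has limited effectiveness.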
The most sensible approach is to exclude all non-watch listings from our analytics calculations. However, identifying those listings automatically is no easy feat. At the moment, our most promising option is artificial intelligence: specifically, we are looking into image classification techniques to eliminate listings that do not prominently feature a photo of an actual watch.
Once this feature is in place, we hope to compute market prices and price ranges more meaningfully using a weighted moving average and a weighted standard deviation. I'll explain the reasoning behind weighted metrics and go into these techniques in more detail in a future post (hopefully when we announce these updates).
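As a rough illustration only, here is a minimal Python sketch of a recency-weighted mean and standard deviation, using the standard definitions: mean = Σ wᵢxᵢ / Σ wᵢ and std = √(Σ wᵢ(xᵢ − mean)² / Σ wᵢ). The exponential decay, the 30-day half-life, the 90-day window, and the names `Sale`, `recency_weight`, and `weighted_stats` are hypothetical choices for this example, not the weighting WatchCharts will actually use.

```python
import math
from dataclasses import dataclass


@dataclass
class Sale:
    price: float     # sale price in USD
    age_days: float  # how many days before the evaluation date the sale occurred


def recency_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Hypothetical weighting: a sale's weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def weighted_stats(sales: list[Sale], window_days: float = 90.0,
                   half_life_days: float = 30.0) -> tuple[float, float]:
    """Return the weighted mean and weighted (population) standard deviation
    of prices for sales inside a trailing window of `window_days`."""
    recent = [s for s in sales if 0 <= s.age_days <= window_days]
    if not recent:
        raise ValueError("no sales inside the window")
    weights = [recency_weight(s.age_days, half_life_days) for s in recent]
    total = sum(weights)
    mean = sum(w * s.price for w, s in zip(weights, recent)) / total
    variance = sum(w * (s.price - mean) ** 2 for w, s in zip(weights, recent)) / total
    return mean, math.sqrt(variance)


# Hypothetical sales data, for illustration only; the 120-day-old sale
# falls outside the 90-day window and is ignored.
sales = [Sale(10_000, 0), Sale(9_000, 30), Sale(8_000, 60), Sale(20_000, 120)]
mean, std = weighted_stats(sales)
print(f"weighted mean ~ ${mean:,.0f}, weighted std ~ ${std:,.0f}")
```

Recomputing `weighted_stats` as the evaluation date advances (that is, as each sale's `age_days` grows) is what makes the average "moving": recent sales count more than older ones, and sales outside the window drop out entirely.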
We feel there are a ton of exciting things we can do in the watch space using statistics and artificial intelligence, and we'd love to hear your feedback on what we're doing. Please feel free to leave your thoughts in the comments below.