WatchCharts Forum

Improving the accuracy of our data (Part 1)

Since we launched WatchCharts in March of last year, one continual challenge has been turning unstructured forum sale posts into structured listings that we can filter, sort, and analyze. This means extracting the brand, collection, model, price, and whether the watch is sold or not, among other facets. In this week’s update to the community, I’ll discuss how we’re improving our data accuracy in two different ways.

  1. Allowing users to report listings
  2. Filtering non-watch sales

Also, check out last week’s update if you’re interested in the motivations behind our recent Price Guide redesign.

New Feature: Report a Listing

Recently, we’ve been getting a bit of feedback (including a bug report on our Discord server) regarding inaccuracies in our listing data, or even fraudulent listings from accounts whose credentials had been stolen.

In response, we’ve launched a new feature that lets users report such inaccuracies, fraud, or any other issue that might be present in our listings. By making a report, you can help improve the accuracy of our data and our value to the community.

For example, replicated below is a listing where there is an inaccuracy in the extracted price:

As you can see, we’ve predicted the price to be $50,000, but the description makes clear that the actual price is $8,850. While such errors are uncommon, they can happen, particularly when there are multiple prices in the listing description.
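To illustrate why multiple prices are a problem, here is a minimal sketch in Python. It is not our actual extraction code: the regular expression, the function name `find_price_candidates`, and the sample description are all invented for this example. It only shows that a description mentioning two dollar amounts yields two candidates, so an extractor has to guess which one is the asking price.

```python
import re

# Hypothetical pattern: a dollar sign followed by an amount, with or
# without thousands separators, and optional cents.
PRICE_PATTERN = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?")


def find_price_candidates(text: str) -> list:
    """Return every whole-dollar amount mentioned in the text, in order."""
    return [int(m.group(1).replace(",", "")) for m in PRICE_PATTERN.finditer(text)]


# Invented description, not the text of the real listing.
description = "Comparable pieces are insured at $50,000. Asking $8,850 shipped."

candidates = find_price_candidates(description)

# More than one candidate means the price cannot be chosen with confidence,
# so a listing like this is a natural one to double-check.
needs_review = len(candidates) > 1
```

A listing flagged this way is exactly the kind where a user report is most useful.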

To report the error, you can click the red “Report Listing” button above the image gallery, and select the appropriate reason.


Selecting a reporting reason will take you to the Site Feedback forum and automatically create a draft with the report details. Note that you must be logged in to make a report. Also, I’ve already gone ahead and fixed the price error in the aforementioned listing, so no report is necessary for that one.

Filtering non-watch sales

One barrier to improving our market analytics is that our data is often skewed by the inclusion of non-watch listings in our calculations. We see this particularly with vintage models, such as the Rolex GMT-Master 1675. The first thing you’ll notice is the incredibly high variation in the price range. While price variance is generally higher for vintage models due to greater variance in condition, our data is being further skewed by listings for Rolex 1675 casebacks, bezels, clasps, dials, and a variety of other parts or accessories.

We already apply some primitive pre-processing techniques to try to remove such invalid data points. However, these techniques are limited in their effectiveness, and further improvement is necessary before we can provide more meaningful analytics.
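As an illustration of what a primitive filter can look like, and why it falls short, here is a small Python sketch. This is an assumption for the sake of example, not a description of our actual pre-processing: the keyword list (taken from the parts mentioned above), the function name `looks_like_part`, and the sample titles are all hypothetical.

```python
import re

# Hypothetical keyword filter built from the part names mentioned above.
PART_PATTERN = re.compile(r"\b(caseback|bezel|clasp|dial)s?\b", re.IGNORECASE)


def looks_like_part(title: str) -> bool:
    """Return True if the title mentions a part keyword as a whole word."""
    return PART_PATTERN.search(title) is not None


# Invented titles for illustration only.
titles = [
    "FS: Rolex 1675 bezel insert",
    "FS: Rolex GMT-Master 1675 full set",
    "FS: Rolex GMT-Master 1675, gilt dial, serviced",
]

flags = [looks_like_part(title) for title in titles]
```

The first title is correctly flagged and the second correctly kept, but the third is a complete watch that gets flagged only because its title describes the dial. Text rules like this trade missed parts against wrongly excluded watches.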

Fundamentally, the most reasonable approach seems to be to eliminate non-watch listings from our analytics calculations altogether. However, identifying them automatically is no easy feat. At the moment, our most promising prospect is artificial intelligence. Specifically, we are looking into applying image classification techniques to eliminate listings that do not prominently feature a photo of an actual watch.

After implementing this feature, we hope to be able to apply more meaningful market price and range calculation algorithms using a weighted moving average and a weighted standard deviation. I’ll talk about the reasoning for using weighted metrics, and go into these techniques in more detail, in a future post (hopefully when we announce these updates).
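Ahead of that post, here is a rough Python sketch of the two weighted metrics. The weighting scheme is purely an assumption for illustration: it weights each sale by recency with an exponential half-life, which may not be what we end up using. The function names, the 30-day half-life, and the sample prices are all hypothetical.

```python
import math


def recency_weights(ages_in_days, half_life_days=30.0):
    """Assumed scheme: a sale's weight halves every `half_life_days`."""
    return [0.5 ** (age / half_life_days) for age in ages_in_days]


def weighted_mean(values, weights):
    """Sum of weight * value, divided by the total weight."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_std(values, weights):
    """Square root of the weighted average squared deviation from the weighted mean."""
    mean = weighted_mean(values, weights)
    total = sum(weights)
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return math.sqrt(variance)


# Invented sale prices and how many days ago each sale happened.
prices = [9000, 9500, 10000]
ages = [60, 30, 0]

weights = recency_weights(ages)            # 0.25, 0.5, 1.0
market_price = weighted_mean(prices, weights)
spread = weighted_std(prices, weights)
```

With these sample numbers the weighted price sits closer to the most recent sale than a plain average of 9,500 would, which is the general motivation for weighting.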

We feel that there are a ton of exciting things we can do in the watch space using statistics and artificial intelligence, and we’d love to hear your feedback on what we’re doing. Please feel free to leave your thoughts in the comments below.

These changes will really make a difference to the user experience of this site. It’s a significant challenge to autonomously capture all the applicable data given that no two listing formats are alike, so a feedback feature like this is essential. Equally, telling USD, AUD and SGD apart when the seller has just used the symbol $ must be nearly impossible, given that it is not always clear even when reading the listing itself!

I’ve often noticed listings posted on Reddit/WUS/eBay that do not appear on WatchCharts, and I thought it’d be useful to have a feature whereby we can notify you of any that slip through the net (presumably you employ a bot that searches for new listings and collates the details on your site). A function that allows us to report uncaptured listings would improve the listings/sales data collated on your site and give you the opportunity to fine-tune the algorithm where needed. The recently added feedback feature only works on existing listings and hence does not allow a user to report a missing one.

Here is one example of a WUS listing that does not appear on WatchCharts (Grand Seiko SBGH267):

Ah, so that listing is missing because it was posted to the WUS dealer sales forum, while we only retrieve listings from the private sales forum. Perhaps it would be a good idea to retrieve from the dealer sales forum as well, but we initially decided not to because we feel that dealer prices are higher and less representative of market value than private sales.

As far as I know, we should not be missing any listings from the WUS private sales forum or Reddit, though please let me know if you see any.

Reporting any missing listings could also be done by making a post on our #site-feedback forum.