1. Objective:
A. We want to detect outliers in the quantity of purchases at a supermarket.
B. We want to detect outliers in the whole transaction at a supermarket.
2. License: Free to use but requires citation of the following paper: Pennacchioli, D., Coscia, M., Rinzivillo, S., Pedreschi, D. and Giannotti, F., Explaining the Product Range Effect in Purchase Data. In BigData, 2013.
3. Data Source: http://www.michelecoscia.com/?page_id=379
4. DataSet Info: This dataset comes from Coop, one of the largest Italian retail distribution companies. The original dataset contains around 25 million purchase records from January 2007 to December 2011. We merged the three separate files that come with the original dataset and included only the first 100000 purchases.
5. Field Meanings:
A. customer_id: Unique customer ID.
B. shop_id: Unique shop ID.
C. product_id: Unique product ID.
D. quantity: Quantity in which the product was purchased.
E. price: Product price.
F. distance: Distance between the customer's house and the shop location, in meters.
G. probable_cause: Field that has the most influence on making a transaction an outlier.
H. isOutlier: 1 (outlier) / 0 (normal)
6. Parameter Selection:
A. Dashboard Usage: Detect Numerical Outlier
Settings:
1)Search command: | inputlookup supermarket.csv | head 1000
2)Field to analyze: quantity
3)Threshold method: Standard Deviation
4)Threshold multiplier: 5
5)Sliding window: N/A
B. Dashboard Usage: Detect Categorical Outlier
Settings:
1)Search command: | inputlookup supermarket.csv
2)Field(s) to analyze: customer_id, shop_id, product_id, quantity, price, distance
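
Illustration of the settings in 6.A: the sketch below shows one common reading of a standard-deviation threshold with multiplier 5, in which a quantity is flagged when it falls outside mean +/- 5 * standard deviation. This is an assumption about the method, not the dashboard's confirmed calculation; the function name flag_numeric_outliers and the sample quantities are hypothetical and are not taken from supermarket.csv.

```python
from statistics import mean, stdev


def flag_numeric_outliers(values, multiplier=5.0):
    """Return one 1/0 flag per value: 1 if the value lies outside
    mean +/- multiplier * sample standard deviation, else 0.

    Assumption: this mirrors a generic standard-deviation threshold;
    the dashboard's exact formula is not given in this document.
    """
    if len(values) < 2:
        # A standard deviation needs at least two points; flag nothing.
        return [0] * len(values)
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    lower = mu - multiplier * sigma
    upper = mu + multiplier * sigma
    return [1 if (v < lower or v > upper) else 0 for v in values]


# Hypothetical quantities: 99 ordinary purchases and one very large one.
quantities = [1] * 99 + [50]
flags = flag_numeric_outliers(quantities, multiplier=5.0)
```

Here flags uses the same 1/0 convention as the isOutlier field. With a multiplier as high as 5, only values far from the bulk of the data are flagged, so lowering the multiplier would mark more purchases as outliers.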