Congratulations, your business is up and running! You have an attractive array of services with happy customers ready to pay for them - and very few of them are scamming you! But being the go-getter that you are, you're striving for more - to reach for the eternal dream, the Elysian Fields where even fewer customers are committing fraud. Let's figure out how you can find instances of fraud.

When you have a large amount of complex data to categorize, such as labeling a database of customer transactions as "fraudulent" or "not fraudulent", you'll likely want to employ a machine learning solution. Human intervention is slow and costly, while bespoke "expert system" solutions are expensive to build, questionably accurate, and need to be manually updated any time the behavior behind your data changes. Meanwhile, if built well, machine learning approaches will improve in accuracy and adapt to systemic changes if you throw more and newer data at them. They also scale well with both absolute data volume and data throughput - you don't want this system to become a transaction bottleneck as you expand to serve more customers.

Great - this is the point where you pick out some models and begin training. Unfortunately, this is also the point where you run into a classic bootstrapping problem. All the models you'll find need to train on a labeled dataset - and if you had an easy way of labeling your data, you wouldn't be looking for a model in the first place. Unless you're working with a longstanding company that already has a large backlog of fraud cases, it'll start to look like you won't be able to train your model to find fraud at all.

Fortunately, rather than solving this problem directly, you can cheat. You can trust the Panglossian assumption that fraud is rare*, and rather than teaching the model to find fraud, you'll teach it to find strange and anomalous occurrences - which we can then cynically assume are most likely fraud. This process does rely on the assumption that data collected from honest and fraudulent transactions will follow different distributions. If there actually is no detectable difference between the two, then nothing will be able to find the fraud, and you'll need to start collecting better data. Luckily, anomaly detection is a commonly used fraud detection method with a great track record.

> I always say the absence of evidence is not the evidence of absence. Simply because you don't have evidence that something does exist does not mean you have evidence of something that doesn't exist. What I'm saying is that there are known knowns and that there are known unknowns. But there are also unknown unknowns - things we don't know that we don't know.
>
> - Gin Rummy, The Boondocks

A comprehensive review of all anomaly detection methods is well beyond the scope of this blog post, so we will be focusing on one effective (and broadly applicable) method: autoencoder reconstruction error. We will also make available the accompanying Python notebook on the Shakudo sandbox under ~/gitrepo/anomaly_fraud_detection/unsupervised_anomaly_detection.ipynb. Sign up now for a free account and test it out yourself!

An autoencoder is a neural net with two parts: one to compress (encode) data into a more compact representation, and a second to decompress (decode) it back into the original form. This works for many different types of data, as the information content of a real-world dataset is often smaller than what its feature space is capable of containing. To train an autoencoder, you'll create a neural net whose input and output layers both have a number of nodes equal to the size of your feature set, with a smaller hidden layer somewhere in between.
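To make that architecture concrete, here is a minimal sketch of such a net in Keras. The framework choice, layer sizes, and stand-in data are illustrative assumptions on our part - the accompanying notebook is the authoritative version.

```python
# A minimal autoencoder sketch in Keras (illustrative; layer sizes are
# assumptions, not values from the accompanying notebook).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 30    # assumed width of the feature set
bottleneck = 8     # the smaller hidden layer that forces compression

autoencoder = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(16, activation="relu"),            # encoder
    layers.Dense(bottleneck, activation="relu"),    # compact representation
    layers.Dense(16, activation="relu"),            # decoder
    layers.Dense(n_features, activation="linear"),  # output size == input size
])

# The training target is the input itself: the net learns to reconstruct
# whatever it is shown.
autoencoder.compile(optimizer="adam", loss="mse")

X_train = np.random.rand(1024, n_features).astype("float32")  # stand-in data
autoencoder.fit(X_train, X_train, epochs=5, batch_size=64, verbose=0)
```

Because the bottleneck is narrower than the input, the net can't simply memorize everything; it has to learn the regularities of the training distribution - which is what makes it useful for anomaly detection.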
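Continuing from the sketch above, scoring by reconstruction error might look like the following. The per-row mean squared error and the 99th-percentile cutoff are illustrative choices, not prescriptions from the notebook.

```python
# Continuing from the sketch above: score new transactions by how badly the
# trained autoencoder reconstructs them. Rows unlike anything in the training
# distribution come back distorted, i.e. with high reconstruction error.
X_new = np.random.rand(512, n_features).astype("float32")  # stand-in data

reconstructed = autoencoder.predict(X_new, verbose=0)
errors = np.mean((X_new - reconstructed) ** 2, axis=1)  # per-row MSE

# Flag the strangest 1% for review (the threshold here is an assumption).
threshold = np.percentile(errors, 99)
flagged = errors > threshold
print(f"Flagged {flagged.sum()} of {len(X_new)} transactions as anomalous")
```

In practice you'd tune the threshold against whatever labeled cases you do have, or against the review capacity of your fraud team.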