Building AML Regulatory Platforms For The Big Data Era

Big data technologies can help financial services firms comply with a myriad of regulations, including US anti-money laundering requirements.

Banking is increasingly a global business, and leading US banks are beginning to generate substantial revenue in non-US markets. A natural consequence of this growth is not just the proliferation of newer financial products tailored to local markets but also increased integration between commercial and investment banking operations.

Combined with regulatory mandates, including those in the USA PATRIOT Act, that require banks to put effective compliance programs in place, these trends continue to increase regulatory pressure on Wall Street. The PATRIOT Act requires all FINRA member firms to develop and implement anti-money laundering (AML) compliance programs that comply with the Bank Secrecy Act (BSA).

US AML legislation dates back to the Bank Secrecy Act of 1970, and the USA PATRIOT Act of 2001 significantly expanded it. This legislation targets money laundering and requires financial institutions to help the authorities investigate any suspicious transactions occurring in their customers' accounts.

Implementing and re-engineering AML processes has been a focus for banks, especially as they adopt technologies around enterprise middleware, cloud, analytics and Big Data.

When it comes to AML, the global challenges for IT organizations are fivefold:
1. Monitoring potentially every transaction for illicit activity, such as money laundering
2. Gleaning insight from existing data sources while integrating new volumes of data from unstructured or semi-structured feeds
3. Presenting information that matters to the right users as part of a business workflow
4. Creating and changing AML policies and procedures on the fly as business requirements evolve
5. Enforcing compliance and policy control around business processes and the underlying data in an integrated way, as more regulation is added over time

In order to address these challenges, financial services organizations need to meet the following business requirements:

1. Integrate and cleanse data to get a complete view of any transaction that could signal potential fraud
2. Assess client risk at specific points in the banking lifecycle, such as account opening or transactions above a certain monetary value. These data points could signal potentially illegitimate activity based on any number of features associated with such transactions, and any such transaction could also lead to the filing of a suspicious activity report (SAR)
3. Help aggregate such customer transactions across multiple geographies for pattern detection and reporting purposes
4. Create business rules that capture a natural-language description of the policies, conditions and identifying features of activities such as those that resemble terrorist financing, money laundering, identity theft, etc. These rules trigger downstream workflows that allow human investigation of such transactions
5. Alert bank personnel when they need to complete customer identification procedures for cross-border accounts
6. Track these events end-to-end from both a tactical and a strategic perspective
7. Combine transaction data with long-term historical data and mine it to uncover previously undetected patterns in the underlying data
8. Help build and refine profiles of customers and related entities
9. Provide appropriate and easy-to-use dashboards for compliance officers, auditors, government agencies and other personnel
10. Above all, implement automated business operations that not only meet the regulatory mandate but are also transparent to business process owners, auditors and the authorities.
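Requirements 2 and 4 describe rules that flag transactions at specific points (account opening, high-value transfers) and hand them to human investigators. The Python sketch below shows that shape under stated assumptions: the `Transaction` fields, the threshold, the placeholder country codes and the `open_review_case` helper are all hypothetical, and a production platform would express such rules in its rules engine rather than in application code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transaction:
    # Hypothetical fields; real feeds carry far more attributes.
    txn_id: str
    customer_id: str
    amount: float
    country: str
    is_new_account: bool = False


@dataclass
class Rule:
    name: str
    description: str  # natural-language statement of the policy
    condition: Callable[[Transaction], bool]


# Illustrative values only, not regulatory thresholds or real country codes.
HIGH_VALUE_THRESHOLD = 10_000.0
HIGH_RISK_COUNTRIES = {"XX", "YY"}

RULES = [
    Rule("high_value",
         "Flag any transaction at or above the high-value threshold.",
         lambda t: t.amount >= HIGH_VALUE_THRESHOLD),
    Rule("new_account_high_risk_geo",
         "Flag activity on a newly opened account involving a high-risk country.",
         lambda t: t.is_new_account and t.country in HIGH_RISK_COUNTRIES),
]


def evaluate(txn: Transaction, rules: List[Rule] = RULES) -> List[str]:
    """Return the names of every rule the transaction triggers."""
    return [r.name for r in rules if r.condition(txn)]


def open_review_case(txn: Transaction, fired: List[str]) -> dict:
    """Hypothetical stand-in for a downstream workflow: queue a human review."""
    return {"txn_id": txn.txn_id, "rules": fired, "status": "PENDING_REVIEW"}


def process(txns, rules=RULES):
    cases = []
    for txn in txns:
        fired = evaluate(txn, rules)
        if fired:
            cases.append(open_review_case(txn, fired))
    return cases


cases = process([
    Transaction("t1", "c1", 12_500.0, "US"),
    Transaction("t2", "c2", 300.0, "XX", is_new_account=True),
    Transaction("t3", "c3", 50.0, "US"),
])
print([c["txn_id"] for c in cases])  # ['t1', 't2']
```

Keeping the natural-language `description` next to each condition is one way to let business owners and auditors read exactly what a rule is supposed to catch.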

AML Platform Technologies

Building a regulatory platform is never purely a technology project. There are many other issues, including business structure, complex local market requirements, multiple audiences (both internal and external) with differing reporting needs, already complex business processes, program governance and SLAs. That said, our focus here is to examine some of the key technical aspects of building such a solution.

The key technology components that provide the scaffolding for a large, enterprise-grade implementation are described below. Keep in mind that this is only a best-practice recommendation and that most mature architectures already have a good portion of such a solution set in house. Throwing away what you have and rebuilding from scratch, or any other rip-and-replace strategy, is not recommended. These days, such a product stack can readily be assembled from open source or proprietary technology.

The platform, at a minimum, is composed of the following four tiers (starting with the bottom tier):

The Data and Business Systems tier is the repository of data in the overall information architecture. This data is produced by business interactions (stored in OLTP systems) and comes from legacy data systems, mainframes, packaged applications, data warehouses, NoSQL databases and other Big Data-oriented sources (Hadoop, columnar/MPP databases, etc.). The data tier is also where core data processing and transformation happen, and where a variety of analyses and algorithms can be run to assess the different types of risk associated with AML programs, namely:

Client Risk
Business Risk
Geographic Risk

These data silos hold data flowing into the enterprise architecture from Big Data or unstructured sources as a byproduct of business operations, data already present in house in data warehouses and columnar data stores, and other unstructured data.
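The three risk dimensions above are often rolled up into a single rating. As a minimal sketch, assuming a simple weighted model (the weights, the low/medium/high scale and the band cut-offs are hypothetical and not taken from any regulation), a composite score might be computed like this:

```python
# Hypothetical weights (in percent) for the three AML risk dimensions.
RISK_WEIGHTS = {"client": 50, "business": 20, "geographic": 30}
RATING_POINTS = {"low": 1, "medium": 2, "high": 3}


def composite_risk(ratings):
    """Weighted score on a 100-300 scale; integers keep the arithmetic exact."""
    missing = RISK_WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing risk dimensions: {sorted(missing)}")
    return sum(w * RATING_POINTS[ratings[dim]] for dim, w in RISK_WEIGHTS.items())


def risk_band(score):
    # Illustrative cut-offs; a real program would calibrate these.
    if score >= 250:
        return "high"
    if score >= 175:
        return "medium"
    return "low"


score = composite_risk({"client": "high", "business": "low", "geographic": "high"})
print(score, risk_band(score))  # 260 high
```

Keeping the weights explicit in one place makes them easy to review and adjust, which matters when auditors ask how a rating was derived.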

The Data Virtualization tier sits atop the data and business systems tier and transforms data into actionable information so that it can be fed into the integration and business process tiers above it. Most financial institutions struggle to provide timely operational and analytical insights because they cannot effectively use data trapped in disparate applications and technology silos. In essence, the Data Virtualization tier makes data spread across physically distinct systems appear as a set of tables in a local database (a virtual data view). It connects to any type of data source, including RDBMS (SQL), analytical cubes (MDX), XML, web services and flat files. When users submit a query (SQL, XML, XQuery or procedural), this tier calculates the optimal way to fetch and join the data on remote, heterogeneous systems. It then performs the necessary joins and transformations to compose the virtual data view and delivers the results to users via JDBC, ODBC or web services as a Virtual Data Service, all on the fly and without developers or users needing to know the true location of the data or the mechanisms required to access or merge it.
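To make the idea of a virtual data view concrete, here is a small Python sketch that joins a relational source and a flat file on the fly. It illustrates only two ideas from the paragraph above: pushing a filter down to the source, and composing the joined view at query time without copying data into a new store. The table, columns and CSV content are hypothetical, an in-memory SQLite database and a `StringIO` buffer stand in for real remote systems, and a real data virtualization engine would also do cost-based planning across many source types.

```python
import csv
import io
import sqlite3

# Source 1: a relational system of record (in-memory SQLite stands in for an RDBMS).
core = sqlite3.connect(":memory:")
core.execute("CREATE TABLE txns (txn_id TEXT, customer_id TEXT, amount REAL)")
core.executemany("INSERT INTO txns VALUES (?, ?, ?)",
                 [("t1", "c1", 15000.0), ("t2", "c2", 200.0), ("t3", "c1", 9500.0)])

# Source 2: a flat file of customer reference data (hypothetical content).
KYC_CSV = "customer_id,name,country\nc1,Acme Ltd,GB\nc2,Jane Doe,US\n"


def fetch_txns(min_amount):
    # Predicate pushdown: the source filters, so only matching rows are returned.
    cur = core.execute(
        "SELECT txn_id, customer_id, amount FROM txns "
        "WHERE amount >= ? ORDER BY txn_id", (min_amount,))
    return cur.fetchall()


def fetch_customers():
    reader = csv.DictReader(io.StringIO(KYC_CSV))
    return {row["customer_id"]: row for row in reader}


def large_txns_view(min_amount):
    """Virtual view: composed on each call; nothing is persisted elsewhere."""
    customers = fetch_customers()
    for txn_id, cust_id, amount in fetch_txns(min_amount):
        cust = customers.get(cust_id, {})
        yield {"txn_id": txn_id, "customer": cust.get("name"),
               "country": cust.get("country"), "amount": amount}


print([row["txn_id"] for row in large_txns_view(9000)])  # ['t1', 't3']
```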

This tier also comprises tools, components and services for creating and executing bi-directional data services. Through abstraction and federation, data is accessed and integrated in real time across distributed data sources without being copied or otherwise moved from its system of record. Data can also be written back through a variety of commonly supported interfaces (ODBC/JDBC, web services such as SOAP or REST, or any custom interface that conforms to an API). The intention is to be polyglot at this level as well.

This tier uses data provisioning, management and federation capabilities to expose actionable, unified information to an SOA/BPM/ESB layer in a few easy steps:

1. Connect: Access data from multiple, heterogeneous data sources.
2. Compose: Easily create reusable, business-friendly logical data models and views by combining and transforming data.
3. Consume: Make unified data easily consumable through open standard interfaces.
4. Compliance: This tier also improves data quality via centralized access control, a robust security infrastructure and a reduction in physical copies of data, thus reducing risk. A metadata repository catalogs enterprise data locations and the relationships between the data elements located in various data stores, thus enabling transparency and visibility.

Figure: Data Virtualization layer of a Big Data Platform
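The Connect, Compose, Consume and Compliance steps of the virtualization tier can be sketched as a toy Python class. Everything here is hypothetical: the `VirtualLayer` class, its methods and the role names are illustrative only and do not correspond to any product's API. Sources are registered as reader functions, views are composed from them, every query goes through a single access check, and a small catalog records which sources each view depends on.

```python
class VirtualLayer:
    """Toy data-virtualization layer; all names are hypothetical."""

    def __init__(self):
        self.sources = {}   # source name -> zero-argument reader function
        self.views = {}     # view name -> builder function
        self.grants = {}    # view name -> set of roles allowed to query it
        self.catalog = {}   # view name -> names of the sources it reads

    def connect(self, name, reader):
        """Connect: register a source without copying its data."""
        self.sources[name] = reader

    def read(self, name):
        return list(self.sources[name]())

    def compose(self, view, depends_on, builder, roles):
        """Compose: define a logical view and record its lineage."""
        self.views[view] = builder
        self.catalog[view] = list(depends_on)
        self.grants[view] = set(roles)

    def query(self, view, role):
        """Consume and Compliance: one central access check per query."""
        if role not in self.grants.get(view, set()):
            raise PermissionError(f"role {role!r} may not read view {view!r}")
        return self.views[view](self)


def account_owners(layer):
    # Join two sources at query time.
    names = {c["id"]: c["name"] for c in layer.read("customers")}
    return [{"account": a["id"], "owner": names.get(a["owner"])}
            for a in layer.read("accounts")]


layer = VirtualLayer()
layer.connect("accounts", lambda: [{"id": "a1", "owner": "c1"}])
layer.connect("customers", lambda: [{"id": "c1", "name": "Acme Ltd"}])
layer.compose("account_owners", ["accounts", "customers"], account_owners,
              roles={"compliance_officer"})
print(layer.query("account_owners", "compliance_officer"))
# [{'account': 'a1', 'owner': 'Acme Ltd'}]
```

Querying as `"teller"` would raise `PermissionError`, and `layer.catalog["account_owners"]` answers the lineage question an auditor might ask.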

The Integration tier serves as the primary means of integrating applications, data, services and devices with the regulatory platform. The integration platform uses popular technologies to provide transformation, routing and protocol-matching services; examples include messaging standards such as JMS, AMQP and STOMP.

The core technologies of the Integration tier are a messaging subsystem, a mediation framework that supports the most common enterprise integration patterns, and an Enterprise Service Bus (ESB) to interconnect applications. Based on proven integration design patterns, this layer handles the plumbing issues of application interconnects, financial format exchange and transformation, and reliable messaging, so that software architects can direct more of their attention toward solving business problems.

Figure: Popular Message Exchange Patterns at the Integration Tier
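Two well-known enterprise integration patterns, the Normalizer and the Content-Based Router, capture much of what the integration tier does with incoming financial feeds. The sketch below is a plain-Python illustration under stated assumptions: the two feed formats, the field names and the destination names are hypothetical, and in-memory lists stand in for the broker queues that would normally sit behind JMS, AMQP or STOMP.

```python
import json
from collections import defaultdict


def from_legacy(line):
    # Hypothetical pipe-delimited legacy feed: "TYPE|TXN_ID|AMOUNT|CCY"
    kind, txn_id, amount, ccy = line.split("|")
    return {"type": kind.lower(), "txn_id": txn_id,
            "amount": float(amount), "currency": ccy}


def from_json(payload):
    # Hypothetical JSON feed that uses different field names.
    msg = json.loads(payload)
    return {"type": msg["kind"], "txn_id": msg["id"],
            "amount": float(msg["value"]), "currency": msg["ccy"]}


NORMALIZERS = {"legacy": from_legacy, "json": from_json}
ROUTES = {"wire": "aml.wires", "cash": "aml.cash"}  # hypothetical destinations


class Router:
    def __init__(self):
        self.queues = defaultdict(list)  # stands in for broker destinations

    def on_message(self, fmt, raw):
        canonical = NORMALIZERS[fmt](raw)                  # Normalizer
        dest = ROUTES.get(canonical["type"], "aml.other")  # Content-Based Router
        self.queues[dest].append(canonical)
        return dest


router = Router()
print(router.on_message("legacy", "WIRE|t1|25000.00|USD"))  # aml.wires
print(router.on_message("json", '{"kind": "cash", "id": "t2", "value": "9900", "ccy": "USD"}'))  # aml.cash
```

Converting every feed to one canonical shape first means the routing rule, and every downstream consumer, only has to understand a single message format.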

The BPM/Rules tier is where AML business processes, policies and rules are defined, and where their effectiveness is measured against the resulting business activity. The BPM/Rules tier can optionally host a Complex Event Processing (CEP) layer as an embeddable, independent software module that is still fully integrated with the rest of the platform.

CEP allows the architecture to process multiple business events with the goal of identifying the meaningful ones. This process involves:

Detection of specific business events
Correlation of multiple discrete events based on causality, event attributes, and timing as defined by the business via a friendly user interface
Abstraction into higher-level (i.e. complex or composite) events

It is this ability to detect, correlate and determine business relevance that powers a truly active decision-making capability and makes this tier the heart of a successful implementation.
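The detect, correlate and abstract steps can be illustrated with a classic AML pattern, structuring: splitting cash deposits so that each one stays just below a reporting threshold. This is a minimal Python sketch, not a CEP engine. It assumes events for each customer arrive in timestamp order, and the threshold, margin, window length and event count are illustrative assumptions rather than regulatory values.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposit:
    customer_id: str
    amount: int
    ts: int  # event time in seconds


# Illustrative parameters, not regulatory guidance.
REPORT_THRESHOLD = 10_000
NEAR_FRACTION = 0.9          # "just below" = at least 90% of the threshold
WINDOW_SECONDS = 24 * 3600
MIN_EVENTS = 3


class StructuringDetector:
    def __init__(self):
        self.windows = defaultdict(deque)  # customer_id -> recent candidate deposits

    def on_event(self, d):
        # 1. Detection: keep only deposits just below the threshold.
        if not (REPORT_THRESHOLD * NEAR_FRACTION <= d.amount < REPORT_THRESHOLD):
            return None
        # 2. Correlation: same customer, within the sliding time window.
        window = self.windows[d.customer_id]
        window.append(d)
        while d.ts - window[0].ts > WINDOW_SECONDS:
            window.popleft()
        # 3. Abstraction: emit one composite event once the pattern completes.
        if len(window) >= MIN_EVENTS:
            event = {"type": "PossibleStructuring", "customer_id": d.customer_id,
                     "count": len(window), "total": sum(x.amount for x in window)}
            window.clear()
            return event
        return None


detector = StructuringDetector()
stream = [Deposit("c1", 9500, 0), Deposit("c1", 9800, 3600),
          Deposit("c2", 9700, 3600), Deposit("c1", 9900, 7200)]
for dep in stream:
    event = detector.on_event(dep)
    if event:
        print(event)
# {'type': 'PossibleStructuring', 'customer_id': 'c1', 'count': 3, 'total': 29200}
```

A production CEP engine would add out-of-order handling, persistence and a business-friendly way to edit these parameters; the point here is only the three-stage shape.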

Overall flow

The above architectural tiers can then be brought together as outlined below:

1. Information sources send data into the enterprise architecture via standard interfaces. These feeds can be batch-oriented or the result of real-time human interactions. This data is simultaneously fed into the data tier, which holds the golden image of all data in the architecture and may present predefined or dynamic views via the virtualization tier.
2. A highly scalable messaging system in the integration tier brings these feeds into the architecture, normalizes them and sends them on to the BPM tier for further processing.
3. The BPM/Rules/CEP tier processes these feeds at scale to understand the relationships among them.
4. When specific patterns that indicate potential red flags are met, business rule process workflows are instantiated dynamically. These workflows follow a well-defined process that is predefined and modeled by the business. Different dashboards can be provided based on the nature of the user accessing the system. For instance, executives can track the total number.
5. Data that has business relevance and needs to be kept for offline or batch processing can be handled using a data grid, a columnar database or a storage platform. The idea is to deploy Hadoop-oriented workloads (MapReduce, Hive, Pig or machine learning) to understand and learn from compliance patterns as they emerge over time.
6. A cloud-based, scale-out deployment model is the preferred approach, because it helps the system keep up as the loads placed on it increase over time.
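Step 4 of the flow describes workflows that follow a predefined, business-modeled process. One minimal way to sketch such a process is an explicit state machine with an audit trail. The states, transitions and the `Case` class below are hypothetical and only illustrate the idea; in practice the process would be modeled in the BPM tier's own tooling.

```python
# Hypothetical case states and allowed transitions.
TRANSITIONS = {
    "OPEN": {"UNDER_INVESTIGATION", "CLOSED_NO_ACTION"},
    "UNDER_INVESTIGATION": {"ESCALATED", "CLOSED_NO_ACTION"},
    "ESCALATED": {"SAR_FILED", "CLOSED_NO_ACTION"},
    "SAR_FILED": set(),
    "CLOSED_NO_ACTION": set(),
}


class Case:
    def __init__(self, case_id, trigger):
        self.case_id = case_id
        self.trigger = trigger               # e.g. the composite event that opened the case
        self.state = "OPEN"
        self.history = [("OPEN", "system")]  # audit trail: (state, actor)

    def move(self, new_state, actor):
        # Reject any step the modeled process does not allow.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state} -> {new_state} is not allowed")
        self.state = new_state
        self.history.append((new_state, actor))


def counts_by_state(cases):
    """A simple roll-up that a dashboard could display."""
    counts = {}
    for case in cases:
        counts[case.state] = counts.get(case.state, 0) + 1
    return counts


case = Case("k1", {"type": "PossibleStructuring", "customer_id": "c1"})
case.move("UNDER_INVESTIGATION", "analyst1")
case.move("ESCALATED", "analyst1")
case.move("SAR_FILED", "officer1")
print(case.state, len(case.history))  # SAR_FILED 4
```

Recording who moved a case and when is what makes the automated process transparent to auditors, which is the key requirement the article closes its list of business requirements with.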

As Chief Architect of Red Hat's Financial Services Vertical, Vamsi Chemitiganti is responsible for driving Red Hat's technology vision from a client standpoint. The breadth of these areas ranges from Platform, Middleware and Storage to Big Data and Cloud (IaaS and PaaS). The ...