Domain Context
The vocabulary and stakes for both target domains — fraud & identity verification and multimodal sensor AI. Enough to sound credible in interviews and ask good questions back.
A · Fraud & identity (fraud-domain-flavored)
Fraud/identity platforms sit between credit applications and lenders, scoring identity authenticity. Knowing the domain language is part of the interview loop.
Vocabulary
| Term | Meaning |
|---|---|
| Synthetic identity | An identity manufactured from real and fake components — e.g., a real SSN with a fabricated name and DOB. |
| First-party fraud | A real person applies in their own name, intending to default. "Bust-out fraud" is the canonical pattern — pay down small balances, then max out and abscond. |
| Third-party fraud | Identity theft — someone else's data used to obtain credit. |
| Account takeover (ATO) | Adversary gains access to a real customer's account. |
| KYC | Know Your Customer: regulated identity verification, driven in the US by the USA PATRIOT Act and CIP rules. |
| AML | Anti-Money Laundering — broader regulatory framework targeting transaction patterns indicating laundering. |
| SAR | Suspicious Activity Report — filing to FinCEN required when a bank detects suspected illegal activity. |
| Bureau | Credit bureau (Experian, Equifax, TransUnion) — historical credit data. |
| eCBSV | Electronic Consent-Based SSN Verification: an SSA service that confirms whether an SSN, name, and DOB combination matches SSA records. Some identity platforms were among the first to go live on it. |
| OFAC | Office of Foreign Assets Control — US sanctions enforcement; institutions must screen counterparties against the SDN list. |
| Chargeback | A reversal of a card transaction, typically initiated by the issuer at the customer's request. For modeling, it acts as a delayed fraud label: it arrives well after the transaction it refers to. |
| Bust-out | Fraud pattern: build credit history, max out, vanish. Often a synthetic identity executed over months. |
| CIP | Customer Identification Program — required US bank policy specifying how new accounts verify identity. |
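Because a chargeback arrives after the transaction it refers to, a recent transaction with no chargeback is not yet a confirmed non-fraud example. The following is a minimal sketch of holding out immature labels. The 90-day window, the field names, and the sample records are all hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta

# Hypothetical maturity window: how long to wait before treating
# "no chargeback yet" as a usable non-fraud label.
MATURITY_DAYS = 90

def label_transactions(transactions, as_of, maturity_days=MATURITY_DAYS):
    """Return (txn_id, label) pairs.

    label is 1 (chargeback received), 0 (old enough, no chargeback),
    or None (too recent to trust the absence of a chargeback).
    """
    cutoff = as_of - timedelta(days=maturity_days)
    labeled = []
    for txn in transactions:
        if txn["chargeback"]:
            label = 1
        elif txn["date"] <= cutoff:
            label = 0
        else:
            label = None
        labeled.append((txn["id"], label))
    return labeled

# Illustrative records only.
transactions = [
    {"id": "t1", "date": date(2024, 1, 5), "chargeback": True},
    {"id": "t2", "date": date(2024, 1, 10), "chargeback": False},
    {"id": "t3", "date": date(2024, 5, 20), "chargeback": False},
]
labels = label_transactions(transactions, as_of=date(2024, 6, 1))
print(labels)
```

Rows labeled `None` would typically be excluded from training and evaluation until they mature, rather than counted as clean.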
Types of fraud relevant to identity-verification products
Synthetic identity fraud
Modeled by identity-verification platforms at the application layer. The hard problem: a synthetic identity can have real bureau data (because the SSN was issued and may be associated with light real history). The signals are usually structural: implausible age vs SSN issuance window, address-history sparsity, device or behavioral anomalies during application.
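One of the structural signals above, age versus SSN issuance window, can be sketched as follows. This assumes a hypothetical upstream lookup that supplies an estimated issuance year range for the SSN; such estimates only apply to numbers issued before SSA began randomizing assignment in 2011. The function name and the one-year slack are illustrative, not a description of any vendor's logic.

```python
from datetime import date

def issuance_is_implausible(dob, issuance_window, slack_years=1):
    """Flag an SSN that appears to have been issued before the applicant was born.

    issuance_window is a (earliest_year, latest_year) estimate from a
    hypothetical lookup; slack_years absorbs imprecision in that estimate.
    """
    _, latest_year = issuance_window
    return latest_year < dob.year - slack_years

# Claimed DOB in 1995, SSN estimated as issued 1978-1980: issued before birth.
print(issuance_is_implausible(date(1995, 3, 1), (1978, 1980)))
# Claimed DOB in 1975 with the same window: plausible.
print(issuance_is_implausible(date(1975, 3, 1), (1978, 1980)))
```

A flag like this is one input feature, not a verdict: it says nothing about SSNs issued after randomization, and the reverse case (issuance long after birth) is common for legitimate applicants.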
First-party fraud (intent)
Hardest to model because the applicant uses real, accurate data — only their intent is fraudulent. Signals are behavioral: bust-out patterns, geographic anomalies in transaction history, application-to-credit-pull timing.
Identity theft
A real victim's data is used. This is easier to detect in some ways: the legitimate person and their patterns exist on record, and the fraudster's behavior often diverges from them (different device, geography, application velocity).
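Application velocity, one of the divergence signals mentioned above, can be computed with a sliding window per device. This is a minimal sketch: the 24-hour window, the threshold of three applications, and the record fields are hypothetical values for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

def velocity_flags(applications, window=timedelta(hours=24), max_per_window=3):
    """Return (app_id, count_in_window, flagged) for each application.

    count_in_window is the number of applications seen from the same device
    within `window` up to and including this one.
    """
    recent = defaultdict(deque)  # device_id -> timestamps still inside the window
    results = []
    for app in sorted(applications, key=lambda a: a["ts"]):
        seen = recent[app["device_id"]]
        while seen and app["ts"] - seen[0] > window:
            seen.popleft()
        seen.append(app["ts"])
        results.append((app["id"], len(seen), len(seen) > max_per_window))
    return results

# Illustrative records only: four applications from device d1, one from d2.
base = datetime(2024, 1, 1, 9, 0)
applications = [
    {"id": "a1", "device_id": "d1", "ts": base},
    {"id": "a2", "device_id": "d1", "ts": base + timedelta(hours=1)},
    {"id": "a3", "device_id": "d1", "ts": base + timedelta(hours=2)},
    {"id": "a4", "device_id": "d1", "ts": base + timedelta(hours=3)},
    {"id": "b1", "device_id": "d2", "ts": base + timedelta(minutes=30)},
]
results = velocity_flags(applications)
flagged_ids = [app_id for app_id, _, flagged in results if flagged]
print(flagged_ids)
```

The same structure works keyed on address or IP instead of device; the key and threshold are design choices that depend on the lender's traffic.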
Identity verification
The pipeline at a typical lender:
1. Applicant fills out an application.
2. Identity verification: name + SSN + DOB + address validated via bureau and eCBSV.
3. Fraud scoring: a fraud score on the identity itself.
4. Credit decisioning: combine identity confidence with credit history → approve / decline / manual review.
5. If reviewed: a human investigator looks at the application, decides, and often files documentation.
The fraud/identity score lives in step 3, and its output influences whether step 4 trusts the input.
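The way the fraud score gates credit decisioning can be sketched as a small routing function. Everything here is an assumption for illustration: the score scale (0 to 1, higher means more suspicious), the thresholds, and the credit-score cutoff are hypothetical, and a real lender's policy would be considerably richer.

```python
def route_application(identity_fraud_score, credit_score,
                      fraud_decline=0.90, fraud_review=0.60, min_credit=620):
    """Route an application to 'approve', 'decline', or 'manual_review'.

    The fraud score is checked first: credit history is only trusted
    once the identity behind it looks credible.
    """
    if identity_fraud_score >= fraud_decline:
        return "decline"
    if identity_fraud_score >= fraud_review:
        return "manual_review"
    if credit_score < min_credit:
        return "decline"
    return "approve"

print(route_application(0.95, 750))  # high fraud score overrides good credit
print(route_application(0.70, 750))  # ambiguous identity goes to a human
print(route_application(0.10, 580))  # credible identity, weak credit
print(route_application(0.10, 700))  # credible identity, adequate credit
```

Checking the fraud score before the credit score reflects the ordering in the pipeline: a strong credit file attached to a doubtful identity is not evidence of creditworthiness.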
Regulatory context
Knowing these names lets you discuss the stakes credibly:
- Bank Secrecy Act (BSA): foundational US AML law. Requires SARs, CTRs (Currency Transaction Reports), and CIP.
- FFIEC: Federal Financial Institutions Examination Council — issues guidance banks follow.
- FinCEN: Financial Crimes Enforcement Network — collects SARs and CTRs.
- OCC, FDIC, Federal Reserve: bank regulators. Examine compliance programs periodically.
- CFPB: Consumer Financial Protection Bureau — consumer protection in financial services. Fair-lending implications for AI models.
- FCRA: Fair Credit Reporting Act — governs use of consumer reports for credit decisions. Models that use bureau data must comply.
- ECOA / Reg B: Equal Credit Opportunity Act and its implementing regulation, Regulation B. Prohibits discrimination in credit transactions, which makes fair-lending model validation a core requirement.
A staff DS in fraud/identity doesn't need to be a compliance expert, but should know these names exist and how they shape model design (specifically: adverse-action notices, fair-lending considerations, model explainability for regulator review).
B · Multimodal sensor AI (Solutions-Engineering-flavored)
Multimodal-sensor AI platforms build foundation models for physical-world AI, fusing video, sensor traces, and time-series data. The customer base spans industrial monitoring, mobility, retail analytics, and IoT applications.
Vocabulary
| Term | Meaning |
|---|---|
| Multimodal | A model that processes multiple input types (image + text + sensor) jointly. |
| Sensor fusion | Combining inputs from multiple sensors to produce a more reliable estimate than any single sensor. |
| Lens | (Solutions-Engineering-specific) a configurable analytical operation on Newton — e.g., "count people," "detect anomalies in equipment vibration." |
| Edge inference | Model runs on a device (camera, sensor, gateway) rather than the cloud. |
| Inertial sensors / IMU | Inertial Measurement Unit — accelerometer + gyroscope + sometimes magnetometer. |
| Computer vision tasks | Detection, segmentation, tracking, action recognition, depth estimation. |
| Time-series classification | Predict a label from a window of time-series data (activity recognition, equipment failure). |
| Anomaly detection | Flag observations that don't fit the normal pattern — common ask in industrial monitoring. |
| Synchronization | Aligning multiple sensor streams to a common time base. |
| Foundation model (multimodal) | Large pretrained model handling multiple modalities — CLIP, GPT-4V, Gemini, etc. Multimodal-AI platforms position themselves in this category. |
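The sensor fusion entry claims a fused estimate can be more reliable than any single sensor. A classic, simple instance is inverse-variance weighting, sketched below under the assumption that the sensors measure the same quantity with independent, unbiased noise of known variance. This is a textbook technique shown for illustration, not a description of how any particular platform fuses modalities.

```python
def fuse(estimates):
    """Inverse-variance weighted fusion.

    estimates: list of (value, variance) pairs from independent sensors.
    Returns (fused_value, fused_variance).
    """
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, estimates)) / total
    fused_variance = 1.0 / total
    return fused_value, fused_variance

# Illustrative readings: a noisy sensor (variance 4.0) and a better one (variance 1.0).
fused_value, fused_variance = fuse([(10.0, 4.0), (12.0, 1.0)])
print(fused_value, fused_variance)
```

The fused variance is lower than the variance of the better sensor alone, which is the sense in which fusion is "more reliable". The result degrades if the independence or known-variance assumptions do not hold.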
Use cases
Common applications for multimodal sensor AI platforms:
- Industrial monitoring: predictive maintenance from vibration, temperature, acoustic sensors on machines.
- Retail analytics: foot traffic, dwell time, queue lengths from cameras.
- Mobility: driver behavior, fleet management, traffic flow analysis.
- Safety: PPE compliance, fall detection, perimeter monitoring.
- Process optimization: throughput analysis on assembly lines, anomalies in operations.
- Healthcare: patient monitoring, gait analysis, sleep tracking.
What makes these hard
- Customer data is heterogeneous — every customer's sensors and conditions are different.
- Labels are scarce and expensive — manual annotation of video / sensor traces.
- Edge cases are operationally critical (the rare event you must catch) but rare in data.
- Customer expectations vary — some want point estimates, some want intervals, some want narratives.
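The point about rare but critical events has a concrete consequence for alert quality: at low prevalence, even a good detector produces mostly false alarms. A short worked calculation, with purely illustrative numbers (1 event per 1,000 windows, 95% recall, 1% false-positive rate):

```python
def precision_at_prevalence(prevalence, recall, false_positive_rate):
    """Fraction of alerts that are true events, given the event base rate."""
    true_positives = prevalence * recall
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

p = precision_at_prevalence(prevalence=0.001, recall=0.95, false_positive_rate=0.01)
print(round(p, 3))  # about 0.087: fewer than 1 in 10 alerts is a real event
```

This is why evaluation on such problems usually reports precision and recall at the deployment base rate rather than accuracy alone.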
Interview probes
Probe 1: "What's synthetic identity fraud, and why is it hard to detect?"
An identity manufactured by combining real elements (often a real SSN, sometimes belonging to a child) with fabricated elements (name, DOB, contact info). Hard because synthetic identities can have legitimate bureau records — the fraudster builds credit slowly first. Detection signals are structural: implausible age vs SSN issuance window, address-history sparsity, application velocity at the same device or address, behavioral anomalies during application (typing patterns, copy-paste of fields).
Probe 2: "What is eCBSV?"
Electronic Consent-Based SSN Verification — an SSA service banks can use, with the applicant's consent, to confirm SSN matches the name and DOB. Replaces older indirect methods.
Probe 3: "Why is fair-lending validation a big deal for fraud models?"
Even though fraud models aren't credit-decision models, they affect who gets credit. If a fraud model declines or flags applicants from a protected class at disproportionate rates, that's a fair-lending issue (ECOA / Reg B). Staff DS work includes disparate-impact analysis, often comparing decline rates across protected-class groups (race, sex, age, and so on), and identifying features that may proxy for protected attributes (ZIP code is the classic example, because geography correlates with race).
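A first-pass disparate-impact check compares outcome rates between groups. The sketch below computes an adverse impact ratio: each group's approval rate divided by a reference group's. The 0.8 threshold is the "four-fifths" rule of thumb borrowed from employment-selection guidance; treating it as a screening heuristic here is an assumption, not a legal standard for lending. The group labels and counts are made up.

```python
def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs. Returns {group: approval_rate}."""
    counts = {}
    for group, approved in decisions:
        total, ok = counts.get(group, (0, 0))
        counts[group] = (total + 1, ok + int(approved))
    return {group: ok / total for group, (total, ok) in counts.items()}

def adverse_impact_ratios(rates, reference_group):
    """Each group's approval rate relative to the reference group's rate."""
    reference = rates[reference_group]
    return {group: rate / reference for group, rate in rates.items()}

# Illustrative data only: 100 applicants per group.
decisions = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 60 + [("B", False)] * 40
)
rates = approval_rates(decisions)
ratios = adverse_impact_ratios(rates, reference_group="A")
needs_review = [group for group, ratio in ratios.items() if ratio < 0.8]
print(ratios, needs_review)
```

Here group B's ratio is about 0.75, so it falls below the heuristic threshold and would prompt a closer look at which features drive the gap.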
Probe 4: "What's a 'lens' at a multimodal-AI company, in your understanding?"
A configurable analytical operation on the multimodal-AI platform: essentially a templated combination of prompts and parameters that performs a specific task ("count people in a region," "detect machinery anomalies," "classify activities"). The lens is what Solutions Engineers and customers actually invoke. The DS role configures lens parameters and the prompts driving them for each proof of concept (POC).
Probe 5: "What makes multimodal sensor AI hard compared to a single-modality model?"
Three sources of difficulty. (1) Synchronization — aligning streams from sensors with different rates, clocks, and reliabilities. (2) Modality fusion strategy — how to combine an image with a vibration trace meaningfully. (3) Data heterogeneity — every deployment has different sensors, conditions, and labels, making transfer between deployments hard. The strongest practitioners reduce these to common scaffolding: standardized preprocessing, modality-agnostic feature extraction where possible, and prompt-based composition over a flexible foundation model.
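The synchronization difficulty can be made concrete with a minimal resampling sketch: a sensor stream sampled at its own rate is linearly interpolated onto another stream's timestamps. It assumes both streams already share a clock (offset and drift corrected upstream) and that timestamps are sorted; the stream names and values are hypothetical.

```python
from bisect import bisect_left

def resample(timestamps, values, target_times):
    """Linearly interpolate (timestamps, values) onto target_times.

    Returns None for target times outside the observed range rather
    than extrapolating.
    """
    resampled = []
    for t in target_times:
        if t < timestamps[0] or t > timestamps[-1]:
            resampled.append(None)
            continue
        i = bisect_left(timestamps, t)
        if timestamps[i] == t:
            resampled.append(values[i])
            continue
        t0, t1 = timestamps[i - 1], timestamps[i]
        v0, v1 = values[i - 1], values[i]
        resampled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return resampled

# Illustrative streams: camera frames every 0.5 s, vibration samples every 0.4 s.
frame_times = [0.0, 0.5, 1.0, 1.5]
vibration_times = [0.0, 0.4, 0.8, 1.2]
vibration_values = [0.0, 4.0, 8.0, 12.0]
aligned = resample(vibration_times, vibration_values, frame_times)
print(aligned)
```

After alignment, each camera frame has a vibration value at the same instant (or `None` where the sensor has no coverage), which is the precondition for any fusion step. Linear interpolation is only one choice; nearest-sample matching or windowed aggregation may suit high-rate signals better.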