Engineer, Data & Machine Learning
About jhana
jhana is an early-stage, seed-funded startup building intelligent practice tools for the law across research, drafting, and document management. Our first product, India’s first AI paralegal, is live in an open-access beta at [jhana.ai](https://jhana.ai). We have ongoing POCs for legal workflow and document generation automation products that passively adapt to a firm's or company's data and norms. We hold fellowships and honors at nonprofit, academic, and technical programs. More details are available to candidates.
About the role
The vertical
Our products thrive on high-fidelity, clean, and richly annotated datasets. Your mission: sourcing, curating, and enhancing these datasets. This role involves scraping, crawling, and transforming raw data into structured gold mines. You’ll engage in sophisticated data quality processes and feature engineering, enhancing our dataset’s value with creative supplementation: finding cross-references, extracting named entities (NER), and discerning nuanced legal concepts like 'ratio' and 'obiter dicta.' Moreover, you’ll construct dynamic graphs that elucidate the complex interrelations of precedents.
The day-to-day
This role demands a seamless blend of engineering prowess, data due diligence, and innovative problem-solving. Here’s a glimpse into the challenges you’ll tackle—
Data Acquisition Pipeline Plumbing
- Develop and implement cutting-edge methods for web scraping and crawling, ensuring automated ingestion of vast legal texts.
- Create robust metadata tables to structure raw datasets, ensuring clarity and accessibility.
- Maintain clusters of cloud cores, managing job schedulers and queues to optimize resource utilization.
- Implement automation for data updating processes, ensuring datasets are consistently current and reliable.
Data Quality & Feature Engineering
- Construct resilient pipelines for data cleaning, ensuring high fidelity and reliability.
- Engineer advanced features, such as identifying equivalent citations across journals and extracting intricate legal principles.
- Conduct thorough data due diligence, including rigorous backchecks to validate data integrity.
Data Exploration & Visualization
- Utilize clustering techniques to explore and visualize data, uncovering hidden patterns and insights.
- Develop tools for visualizing complex data structures, enhancing understanding and usability for legal professionals.
Graph Construction
- Design and implement sophisticated algorithms to map legal precedents, constructing cites-cited-by and reiterated-overruled graphs.
- Enhance graph-based representations to improve the contextual understanding of legal documents.
About the Team
We are a public benefit corporation headquartered in Bangalore, operating within rapidly evolving legal systems. Our goal is to influence beneficence and alignment within the technological systems augmenting human institutions. Our team is diverse, with backgrounds spanning physics, mathematics, law, and public policy. We are small, fast-moving, horizontally flat, and deeply collaborative, with a high emphasis on code that reaches production quickly.
About you
You might resonate with some of these traits—
- Proficient in leveraging DOM parsing and HTTP/HTTPS protocols for data extraction from remote servers and providers.
- Familiar with regular expressions, Optical Character Recognition (OCR), parsing structured data, and state-of-the-art text corpus visualization techniques.
- Experienced with BS4, Playwright/Selenium, and similar tools.
- Skilled in maintaining cloud infrastructure, including lambda functions, Celery, and other schedulers and queues.
- Proficient in Python, with interest in tensor math and linear algebra.
- Thorough in and excited by data due diligence, e.g., conducting rigorous backchecks to ensure data integrity.
- Eager to write code that ships and drives real-world applications.
Miscellany
The expected compensation range is INR 20-35 lakhs per annum (US $24,000-42,000) and may include equity. Compensation is negotiable based on levels and mutual excitement.
This role will start ASAP and requires in-person presence at our Bangalore HQ. Full remote is negotiable for superlative candidates who are not India-based.
Come as you are: We are a diverse team whose members come from different backgrounds in nationality, religion, caste, gender, and sexual orientation. We sincerely and wholeheartedly welcome diverse individuals.