Python Intelligence Strategist

Build the data intelligence backbone of Kautilya.

Turn unstructured chaos into searchable, structured knowledge powering AI pipelines at scale.

Impact

Every dataset processed makes democracies smarter. Your pipelines feed LLMs and define how truth is quantified and retrieved.

What you'll do

  • Design data pipelines for ingestion, transformation, and retrieval
  • Build ETL systems that clean and deduplicate at scale
  • Implement vector search, embeddings, and RAG pipelines
  • Manage indexing, caching, and semantic retrieval
  • Work with AI teams on data architecture and freshness

What we value

  • Solid grasp of Python fundamentals and async processing
  • Understanding of data cleaning and deduplication at scale
  • Familiarity with vector databases and embeddings
  • Comfort with FastAPI, background workers, and APIs
  • Bias for speed and efficiency in data processing

Tech Stack

  • Python 3.10+, FastAPI
  • Pandas, Pydantic
  • PostgreSQL, Redis
  • LangChain, LlamaIndex

First 90 Days

  • Build Kautilya Data Pipeline v1
  • Implement embedding generation
  • Integrate semantic search & RAG
  • Create monitoring & analytics

Why join

  • AI-first product
  • Scale + structure
  • Ownership from day one
  • Category-defining problem

Access Challenge

Optimize this Python data processing pipeline. This challenge tests algorithm efficiency, performance optimization, and data engineering intuition — essential skills for our intelligence strategist.

Python Code Optimizer

Optimize this document processing script step by step. Apply Python performance best practices to achieve sub-10 second execution time. Each optimization reveals the next level.

Baseline metrics:

  • Execution Time: 120.5s
  • Memory Usage: 850MB
  • Accuracy: 87.3%
```python
def process_documents(docs):
    results = []
    for doc in docs:
        cleaned = []
        for word in doc.split():
            if len(word) > 2:
                cleaned.append(word.lower())
        results.append(' '.join(cleaned))
    return results
```

Basic nested loops processing 10k documents. Slow and memory intensive.
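As a sketch of a first tightening pass (one possible rework, not the official solution), the inner loop and its `append()` calls can collapse into a single comprehension per document, cutting interpreter overhead and intermediate list growth:

```python
def process_documents(docs):
    # Filter short words and lowercase in one pass per document,
    # avoiding the intermediate `cleaned` list entirely.
    return [' '.join(word.lower() for word in doc.split() if len(word) > 2)
            for doc in docs]
```

The behavior is identical to the baseline; only the per-iteration bookkeeping changes.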

Hint: Start with list comprehensions, then regex, vectorization, parallel processing, and finally pandas. Each step builds on the last for maximum performance gains.
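For the pandas step the hint mentions, one possible shape (an illustrative sketch, assuming the 10k documents fit in memory as a `Series`; `process_documents_pd` is a hypothetical name) is to vectorize the lowercasing and splitting, then apply the length filter per row:

```python
import pandas as pd

def process_documents_pd(docs):
    # Vectorized string ops handle the lowercase + tokenize steps in C;
    # the per-row filter then joins only words longer than 2 characters.
    tokens = pd.Series(docs).str.lower().str.split()
    return tokens.apply(
        lambda words: ' '.join(w for w in words if len(w) > 2)
    ).tolist()
```

Whether this beats the pure-Python comprehension depends on document count and length; it is the kind of trade-off the challenge expects you to measure rather than assume.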

Ready to architect data intelligence?

Optimize the pipeline challenge above to unlock the application form and join our data mission.