Terminologies

This page lists the terminology, nomenclature, and nuances that we use in the Data Science lifecycle.

GIT

  • staging branch
  • main/master branch
  • test branch
  • push/pull/merge
  • commit
  • checkout
  • HEAD

Environment

  • Test environment
  • preproduction environment
  • production environment
  • staging environment
  • Development server
  • Deployment server

CMD

  • PuTTY
  • PowerShell
  • Command Prompt

Project

  • SME- Subject matter expert
  • SPOC- Single point of contact
  • MVP- Minimum viable product
  • UAT- User acceptance testing
  • SOW- Statement of Work (also expanded as Scope of Work)

Deployment

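  • Synchronous (one request at a time)- serial request handler; each request finishes before the next one starts
  • Asynchronous (multiple requests in flight at a time)- concurrent request handler; a request does not wait for earlier ones to finish
  • latency (time taken to complete a single request)
  • throughput (number of requests handled per unit of time)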
  • Deploy/Ship
  • logging
  • monitoring
  • metrics
  • model explainability
  • hyperparameter tuning
  • data drift
  • Concept drift (the relationship between inputs and the target changes over time, e.g. the condition used to decide churn is no longer the same)
  • model drift
  • Reproducibility
  • trigger-based model retraining
  • fixed window size data (e.g. from last year to the current day)
  • dynamic window size (e.g. a moving window, as in a moving average)
  • Migrate (e.g. from PyTorch to PyTorch Lightning)
  • Integrate(combine all modules)
  • feasibility study
  • Scoping
  • Requirements
  • MVP(Minimum viable product)
  • UAT(user acceptance testing)
  • Model metric(classification & regression)
  • Dependency
  • Resource allocation
  • sprint
  • scrum master
  • work breakdown structure
  • code review
  • release management
  • PIP- performance improvement programme.
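
The synchronous/asynchronous, latency, and throughput entries in the list above can be illustrated with a minimal sketch. It assumes each request spends a fixed time waiting on I/O (the hypothetical `WORK_SECONDS`), and it uses Python's `asyncio`, which interleaves waiting requests on a single thread rather than running them in truly parallel threads. All function names here are illustrative.

```python
import asyncio
import time

WORK_SECONDS = 0.05  # hypothetical time one request spends waiting on I/O


def serve_synchronously(n_requests: int) -> float:
    """Handle requests one after another; return total elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n_requests):
        time.sleep(WORK_SECONDS)  # blocks until this request is done
    return time.perf_counter() - start


async def _handle_request() -> None:
    await asyncio.sleep(WORK_SECONDS)  # yields control while waiting


async def _serve_all(n_requests: int) -> None:
    # Start every request, then wait for all of them to finish.
    await asyncio.gather(*(_handle_request() for _ in range(n_requests)))


def serve_asynchronously(n_requests: int) -> float:
    """Handle requests concurrently; return total elapsed seconds."""
    start = time.perf_counter()
    asyncio.run(_serve_all(n_requests))
    return time.perf_counter() - start


def throughput(n_requests: int, elapsed_seconds: float) -> float:
    """Requests handled per second."""
    return n_requests / elapsed_seconds


if __name__ == "__main__":
    n = 5
    sync_elapsed = serve_synchronously(n)
    async_elapsed = serve_asynchronously(n)
    print(f"sync : {sync_elapsed:.3f}s, {throughput(n, sync_elapsed):.1f} req/s")
    print(f"async: {async_elapsed:.3f}s, {throughput(n, async_elapsed):.1f} req/s")
```

In this sketch the latency of a single request is about the same in both modes; what changes is the total time for the batch, and therefore the throughput.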

Business

  • SLA (Service level agreement)- the service commitments we promise to deliver to the client.
  • Business metric (e.g. increase in email open rate)
  • KPI(key performance indicators)-Quantifiable measures used to evaluate the success or performance of an organization, project, or individual against specific objectives or goals.
  • ROI- return on investment.
  • SWOT- strengths, weaknesses, opportunities and threats.
  • B2C/B2B- business models (business-to-consumer / business-to-business)
  • QA- Quality assurance
  • RFP- Request for proposal (a Request for quote is usually abbreviated RFQ).
  • SOP- standard operating procedure.
  • CRM- Customer relationship management
  • ERP- Enterprise resource planning.
  • Stakeholder

Models

Knowledge distillation involves training a smaller model (student) to mimic the behavior of a larger, pre-trained model (teacher) like BERT. DistilBERT, for example, is a smaller and faster distilled version of BERT. The compact model learns not just the teacher's final predictions but also its confidence, that is, the full probability distribution the teacher assigns across classes. This approach is particularly useful when deploying BERT-like models on resource-constrained devices.
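
A minimal sketch of one common way to write a distillation loss for a single example is shown here. It blends the usual cross-entropy on the true label with a KL-divergence term that pulls the student's temperature-softened probabilities toward the teacher's. The function names, `temperature`, and `alpha` are illustrative assumptions, and this is not a reproduction of how DistilBERT was actually trained.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """alpha * hard-label loss + (1 - alpha) * soft-label (teacher) loss."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Scaling by temperature^2 keeps the soft term comparable across temperatures.
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    # Standard cross-entropy of the student against the true label.
    hard_loss = -math.log(softmax(student_logits)[true_label])
    return alpha * hard_loss + (1 - alpha) * soft_loss


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]        # hypothetical teacher logits
    close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
    far_student = [0.5, 3.0, 1.0]    # disagrees with the teacher
    print(distillation_loss(close_student, teacher, true_label=0))
    print(distillation_loss(far_student, teacher, true_label=0))
```

A student whose logits track the teacher's gets a lower loss than one that disagrees, even when both are scored against the same true label.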

  • latency- delay between input and output.
  • throughput- number of predictions a model can handle in a given time.
  • zero, one, few shot learning
  • stateless training (from Sebastian Raschka's Q&A book)
  • stateful training
  • data-centric ai(focus on data to improve performance.)
  • model-centric ai(focus on model to improve performance.)
  • model efficiency:
    • pin memory
    • num workers
    • gradient checkpoint
    • Model pruning
    • Model quantization
    • Learning rate schedule
  • label smoothing
  • transfer learning
  • Mixed precision training
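
Of the entries in the list above, label smoothing is easy to show in a few lines. The sketch below assumes the common formulation in which a fraction `epsilon` of the probability mass is spread uniformly over all classes; the function name and the default value of `epsilon` are illustrative.

```python
def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Return a smoothed target distribution instead of a hard one-hot vector."""
    uniform_share = epsilon / num_classes  # mass given to every class
    return [
        (1.0 - epsilon) + uniform_share if k == true_class else uniform_share
        for k in range(num_classes)
    ]


if __name__ == "__main__":
    # With 4 classes and epsilon = 0.1 the true class gets about 0.925
    # and each other class about 0.025.
    print(smooth_labels(true_class=0, num_classes=4, epsilon=0.1))
```

Training against these softer targets discourages the model from becoming over-confident in a single class.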

Startup

  • bootstrapped- self-funded, with no external funding
  • Vertical SaaS- building a single use case in depth, end to end
  • Horizontal SaaS- building multiple use cases in parallel, like detection, classification, and OCR all at once.

Others

  • Mutually exclusive: two events cannot happen at the same time.
  • Non-mutually exclusive: two events can occur at the same time.
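
The two definitions above can be checked with a small sketch, assuming a single roll of a fair six-sided die. Events are modelled as sets of outcomes, and the helper names are illustrative.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))  # outcomes of one fair die roll


def probability(event):
    """Probability of an event given equally likely outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def mutually_exclusive(a, b):
    """Two events are mutually exclusive when they share no outcome."""
    return not (a & b)


def probability_of_either(a, b):
    """P(A or B) = P(A) + P(B) - P(A and B); the last term is 0 if exclusive."""
    return probability(a) + probability(b) - probability(a & b)


if __name__ == "__main__":
    even = {2, 4, 6}
    odd = {1, 3, 5}
    high = {4, 5, 6}
    print(mutually_exclusive(even, odd), probability_of_either(even, odd))
    print(mutually_exclusive(even, high), probability_of_either(even, high))
```

Rolling an even number and rolling an odd number are mutually exclusive, so their probabilities simply add; rolling an even number and rolling a number above 3 overlap on 4 and 6, so the overlap is subtracted once.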