Terminologies
This page lists the terminology and nomenclature that we use across the data science lifecycle.
Git
- staging branch
- main/master branch
- test branch
- push/pull/merge
- commit
- checkout
- HEAD
Environment
- Test environment
- Pre-production environment
- production environment
- staging environment
- Development server
- Deployment server
CMD
- PuTTY
- PowerShell
- Command Prompt
Project
- SME: Subject matter expert
- SPOC: Single point of contact
- MVP: Minimum viable product
- UAT: User acceptance testing
- SOW: Statement of Work (also called Scope of Work)
Deployment
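- Synchronous (one request at a time): serial request handler; each request must finish before the next one starts
- Asynchronous (multiple requests in flight at a time): concurrent request handler; it does not block while one request is waiting
- Latency (time taken to complete a single request)
- Throughput (number of requests handled per unit of time)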
- Deploy/Ship
- logging
- monitoring
- metrics
- Model explainability
- Hyperparameter tuning
- data drift
- Concept drift (the relationship between inputs and the target changes over time, e.g. the conditions that used to indicate churn no longer do)
- model drift
- Reproducibility
- Trigger-based model retraining
- Fixed window size (e.g. data from the last year up to the current day)
- Dynamic window size (e.g. a moving window, as used for a moving average)
- Migrate (e.g. from PyTorch to PyTorch Lightning)
- Integrate (combine all modules)
- feasibility study
- Scoping
- Requirements
- MVP (Minimum viable product)
- UAT (User acceptance testing)
- Model metrics (classification and regression)
- Dependency
- Resource allocation
- sprint
- scrum master
- work breakdown structure
- code review
- release management
- PIP: Performance improvement plan (also called a performance improvement programme).
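The synchronous/asynchronous, latency, and throughput entries in the Deployment list can be illustrated with a small sketch. It assumes an I/O-bound request simulated with `asyncio.sleep`; the handler name, the request count, and the 0.05 s wait are all made up for illustration and do not describe any particular serving stack.

```python
import asyncio
import time

REQUEST_SECONDS = 0.05  # assumed time each simulated request spends waiting on I/O
NUM_REQUESTS = 4        # assumed number of incoming requests


async def handle_request(request_id: int) -> int:
    """Simulate one I/O-bound request (hypothetical handler)."""
    await asyncio.sleep(REQUEST_SECONDS)
    return request_id


async def serve_serially() -> float:
    """Synchronous style: finish each request before starting the next."""
    start = time.perf_counter()
    for request_id in range(NUM_REQUESTS):
        await handle_request(request_id)
    return time.perf_counter() - start


async def serve_concurrently() -> float:
    """Asynchronous style: keep all requests in flight at once."""
    start = time.perf_counter()
    await asyncio.gather(*(handle_request(i) for i in range(NUM_REQUESTS)))
    return time.perf_counter() - start


serial_seconds = asyncio.run(serve_serially())
concurrent_seconds = asyncio.run(serve_concurrently())

# Throughput = requests completed per second of wall-clock time.
serial_throughput = NUM_REQUESTS / serial_seconds
concurrent_throughput = NUM_REQUESTS / concurrent_seconds

print(f"serial:     {serial_seconds:.3f}s total, {serial_throughput:.1f} req/s")
print(f"concurrent: {concurrent_seconds:.3f}s total, {concurrent_throughput:.1f} req/s")
```

In this sketch the latency of a single request stays at roughly the same value in both cases; what changes is throughput, because the concurrent version overlaps the waiting time of the requests.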
Business
- SLA (Service level agreement): the level of service we commit to delivering to the client.
- Business metric (e.g. increase in email open rate)
- KPI (Key performance indicator): a quantifiable measure used to evaluate the success or performance of an organization, project, or individual against specific objectives or goals.
- ROI: Return on investment.
- SWOT: Strengths, weaknesses, opportunities, and threats.
- B2C/B2B: Business-to-consumer and business-to-business models.
- QA: Quality assurance.
- RFP/RFQ: Request for proposal / Request for quote.
- SOP: Standard operating procedure.
- CRM: Customer relationship management.
- ERP: Enterprise resource planning.
- Stakeholder
Models
Knowledge distillation trains a smaller model (the student) to mimic the behavior of a larger, pre-trained model (the teacher) such as BERT. DistilBERT, for example, is a smaller and faster distilled version of BERT. The student learns not just the teacher’s predicted labels but also its confidence across classes, that is, the full output probability distribution. This approach is particularly useful when deploying BERT-style models on resource-constrained devices.
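As a rough illustration of how a student can learn from the teacher's confidence, the sketch below computes one common form of distillation loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. The logits, the temperature value, and the function names are assumptions made for this example; real training would use a deep learning framework and usually combines this term with the ordinary hard-label loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(teacher_probs, student_probs)
    )
    return kl * temperature ** 2


# Made-up logits for a 3-class example (illustrative values only).
teacher = [4.0, 1.5, 0.2]
close_student = [3.8, 1.4, 0.3]  # roughly agrees with the teacher
far_student = [0.2, 1.5, 4.0]    # ranks the classes in the opposite order

print("close student loss:", distillation_loss(close_student, teacher))
print("far student loss:  ", distillation_loss(far_student, teacher))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two distributions diverge, which is what pushes the student toward the teacher's confidence pattern rather than only its top prediction.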
- Latency: delay between input and output.
- Throughput: number of predictions a model can handle in a given amount of time.
- Zero-shot, one-shot, and few-shot learning
- Stateless training (from Sebastian Raschka's Q&A book)
- Stateful training
- Data-centric AI (focus on the data to improve performance)
- Model-centric AI (focus on the model to improve performance)
- model efficiency:
- pin memory
- num workers
- Gradient checkpointing
- Model pruning
- Model quantization
- Learning rate schedule
- Label smoothing
- transfer learning
- Mixed precision training
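Of the techniques listed above, label smoothing is the easiest to show in a few lines. The sketch below uses the common formulation in which the one-hot target is mixed with a uniform distribution over the classes; the function name and the `epsilon` value of 0.1 are assumptions for illustration.

```python
def smooth_labels(class_index, num_classes, epsilon=0.1):
    """Return a smoothed target vector.

    Every class receives epsilon / num_classes, and the true class
    additionally receives the remaining 1 - epsilon of the probability mass.
    """
    uniform_share = epsilon / num_classes
    target = [uniform_share] * num_classes
    target[class_index] += 1.0 - epsilon
    return target


one_hot = smooth_labels(class_index=2, num_classes=4, epsilon=0.0)
smoothed = smooth_labels(class_index=2, num_classes=4, epsilon=0.1)

print("one-hot: ", one_hot)
print("smoothed:", smoothed)
```

With `epsilon=0.0` the function returns the plain one-hot target; a small positive `epsilon` moves a little probability mass to the other classes, which discourages the model from becoming overconfident.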
Startup
- Bootstrapped: self-funded, with no external investment.
- Vertical SaaS: building for a single use case in depth, end to end.
- Horizontal SaaS: building for multiple use cases in parallel, such as detection, classification, and OCR all at once.
Others
- Mutually exclusive: two events cannot happen at the same time.
- Non-mutually exclusive: two events can occur at the same time.
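The difference between the two cases shows up when adding probabilities. The sketch below uses a fair six-sided die as an assumed example: for mutually exclusive events P(A or B) = P(A) + P(B), while for non-mutually exclusive events the overlap P(A and B) has to be subtracted. The event names are chosen only for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # faces of a fair six-sided die


def probability(event):
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    return Fraction(len(event & outcomes), len(outcomes))


even = {2, 4, 6}
odd = {1, 3, 5}
greater_than_three = {4, 5, 6}

# Mutually exclusive: "even" and "odd" share no outcomes,
# so P(even or odd) = P(even) + P(odd).
exclusive_sum = probability(even) + probability(odd)
exclusive_union = probability(even | odd)

# Non-mutually exclusive: "even" and "greater than three" share {4, 6},
# so the overlap is subtracted once to avoid double counting.
overlap = probability(even & greater_than_three)
inclusive_sum = probability(even) + probability(greater_than_three) - overlap
inclusive_union = probability(even | greater_than_three)

print("even or odd:", exclusive_union)
print("even or greater than three:", inclusive_union)
```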