Estimating NLP/ML Model Creation Costs
By Tyler Logtenberg
December 2024 | 7 min. read
Estimating the cost of creating and managing an NLP/ML classifier or model involves three key elements: the human resources required (manpower), the infrastructure costs, and the ongoing maintenance costs to sustain the new capability.
Estimating Resource Costs
While the complexity of NLP/ML classifier models varies widely by use case, this estimate is based on the creation of a semi-complex NLP classifier. Examples include sentiment extraction and entity detection.
The effort to create a semi-complex NLP or ML classifier varies, but it can often be estimated at 8 sprints. A sprint is a fixed block of time that an engineering team dedicates to specific stories, and it generally follows a 2-week cycle. This brings the estimated duration to 16 weeks from planning to production release. The most commonly seen team composition and costs are laid out below:
Resource | Monthly Estimate | Count |
---|---|---|
Data Scientist | $13,333 | 1 |
Data Engineer | $8,830 | 1 |
ML Ops Engineer | $9,182 | 1 |
Resource Cost | $31,345 | 3 |
Using an estimated 3 months of full-time effort, the resource cost of the NLP/ML classifier and model would be $94,035 ($31,345 × 3). This figure does not include documentation, product marketing, QA, or project management costs.
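The arithmetic behind the resource figure can be sketched in a few lines of Python. The monthly amounts are the ones in the table above; the function name `resource_cost` and the `months` parameter are illustrative, not part of any tool the article relies on.

```python
# Monthly cost per role in USD, taken from the resource table above.
MONTHLY_COST = {
    "Data Scientist": 13_333,
    "Data Engineer": 8_830,
    "ML Ops Engineer": 9_182,
}


def resource_cost(monthly_costs: dict[str, int], months: int) -> int:
    """Total team cost for `months` of full-time effort."""
    return sum(monthly_costs.values()) * months


monthly_total = sum(MONTHLY_COST.values())
project_total = resource_cost(MONTHLY_COST, months=3)

print(f"Monthly team cost:     ${monthly_total:,}")
print(f"3-month resource cost: ${project_total:,}")
```

Changing `months` or adding a role to the dictionary gives a quick way to see how the estimate moves with team size and schedule.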
Infrastructure Estimated Costs
In addition to the resource costs, there are supporting costs across infrastructure and supporting teams.
The estimate below illustrates many of the regular costs, but it does not include the cost of acquiring training data or of external API integrations.
Infrastructure | Monthly Estimate | Ongoing |
---|---|---|
Model Training | $50 | Yes |
Inference Costs | $1,700* | Yes |
Model Storage | $0.80 | Yes |
MLOps Tools | $1,000 | Yes |
Pipeline Setup | $5,659** | No |
Infrastructure Cost | $7,357 |
Using this estimated 3-month duration of dedicated effort, the support costs of the NLP/ML classifier and model would be $10,755.
*If you are building a simpler solution that relies on low-dimensional data, you may get by with four virtual CPUs running on one to three nodes. Processing mid-to-large volumes of web data generally requires a GPU-based server (pricing from GCP).
** Integrating a simple data pipeline and the APIs needed to connect a model to the overall platform takes around 100 development hours. This does not account for documentation, QA, or external API integrations.
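One way to combine ongoing and one-time infrastructure items over a project window is sketched below. The function name `infrastructure_cost` and the split into two dictionaries are assumptions made for illustration. The line items are entered exactly as they appear in the table, so the result is a plain line-item sum and is not meant to restate the totals quoted in the text.

```python
# Ongoing items are billed every month (USD, as listed in the table).
RECURRING_MONTHLY = {
    "Model Training": 50.00,
    "Inference Costs": 1_700.00,
    "Model Storage": 0.80,
    "MLOps Tools": 1_000.00,
}

# One-time items are paid once, regardless of project length.
ONE_TIME = {
    "Pipeline Setup": 5_659.00,
}


def infrastructure_cost(
    recurring: dict[str, float], one_time: dict[str, float], months: int
) -> float:
    """Recurring items multiplied by months, plus one-time items counted once."""
    total = sum(recurring.values()) * months + sum(one_time.values())
    return round(total, 2)


for months in (1, 3, 12):
    cost = infrastructure_cost(RECURRING_MONTHLY, ONE_TIME, months)
    print(f"{months:>2} month(s): ${cost:,.2f}")
```

Keeping the one-time setup cost out of the monthly multiplier matters most for longer projects, where it would otherwise be counted repeatedly.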
Estimated Maintenance Costs & Summary
According to a study conducted by Dimensional Research, businesses commit 25% to 75% of the initial resources to maintaining ML algorithms. Because we have assumed the use of MLOps tooling and other resources, the lower end of that range (25%) is used to estimate annual costs.
Cost Item | Monthly Estimate | Commit % |
---|---|---|
Human Resources | $653 | 25% |
Inference Costs | $1,700 | Full |
Model Storage | $0.80 | Full |
MLOps Tools | $250 | 25% |
Pipeline Setup | $94 | 20% |
Maintenance Cost | $2,698 |
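The monthly maintenance figure can be reproduced with a short sketch. Two points are assumptions inferred from the numbers rather than stated in the article: the Human Resources and Pipeline Setup amounts are treated as annual commitments spread over 12 months (which yields the $653 and $94 rows), and MLOps Tools is taken as 25% of its $1,000 monthly cost ($250), the value consistent with the $2,698 total. The `MaintenanceItem` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceItem:
    name: str
    base_cost: float    # USD, from the resource and infrastructure tables
    commit: float       # fraction of the base cost committed to maintenance
    spread_months: int  # assumption: 12 = annual amount spread over a year, 1 = already monthly

    def monthly(self) -> float:
        return self.base_cost * self.commit / self.spread_months


ITEMS = [
    MaintenanceItem("Human Resources", 31_345, 0.25, 12),
    MaintenanceItem("Inference Costs", 1_700, 1.00, 1),
    MaintenanceItem("Model Storage", 0.80, 1.00, 1),
    MaintenanceItem("MLOps Tools", 1_000, 0.25, 1),
    MaintenanceItem("Pipeline Setup", 5_659, 0.20, 12),
]

total = sum(item.monthly() for item in ITEMS)

for item in ITEMS:
    print(f"{item.name:<16} ${item.monthly():>9,.2f}")
print(f"{'Total':<16} ${total:>9,.2f}")
```

Raising the `commit` values toward the 75% upper bound from the study shows how sensitive the monthly figure is to that single assumption.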
The total costs for an NLP/ML model are best separated into the initial project costs and the ongoing maintenance costs.
This brings us to the total estimated costs below, as confirmed by market research from Datastreamer, Dimensional Research, UpsilonIT, and ITRex Group.
NLP/ML Classifiers and Model Creation Costs | |
---|---|
Initial Model Creation | Ongoing Monthly Maintenance |
$116,108 USD | $2,698 USD |