Estimating the Cost of Adding Web Data Sources
By Tyler Logtenberg
December 2024 | 7 min. read
To estimate the cost of integrating a new data source into your product, there are three crucial elements to factor in: the human resources required, the infrastructure costs, and the ongoing maintenance costs to sustain the new capability.
Estimating Resource Costs
Data sources vary wildly, so to keep the estimate simple and fair, a general web data source is used. Examples of a source matching these criteria would be a news provider, blog network, or mid-sized social network.
This estimate does not include the cost of data source acquisition, licensing, or the high-level enrichments required with web data. For an idea of the estimated costs to create an NLP/ML classifier or model, we created another page that dives into the details in a similar way. There is also a substantial amount of education around internal tooling and frameworks required, as the provided SDKs are sometimes not robust or well documented.
The effort to integrate a web data source can vary in size, but it can often be estimated at 3.5 “Sprints.” A Sprint is a unit of time that engineering teams dedicate to specific stories, generally aligned with 2-week cycles. This brings our estimated duration to 7 weeks from planning to production release. The most commonly seen team composition and costs are laid out below.
Resource | Monthly Estimate | Count |
---|---|---|
Software Engineer | $12,442 | 2 |
DevOps Engineer | $13,939 | 1 |
Resource Cost | $38,823 | 3 |
Using this estimated 7-week duration (1.75 months at 4 weeks per month), the Resource Costs of the web data integration would be $67,940. This does not include documentation, product marketing, QA, or project management costs.
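The arithmetic behind this figure can be sketched as follows. This is a minimal illustration rather than a costing tool: the function names are hypothetical, and the 4-weeks-per-month conversion is an assumption chosen because it reproduces the figure above.

```python
# Assumption (not stated in the article): a month is treated as 4 weeks,
# which reproduces the published $67,940 figure.
WEEKS_PER_SPRINT = 2
WEEKS_PER_MONTH = 4

# Monthly estimate and headcount per role, taken from the table above.
TEAM = {
    "Software Engineer": (12_442, 2),
    "DevOps Engineer": (13_939, 1),
}


def monthly_team_cost(team):
    """Sum of monthly rate times headcount across all roles."""
    return sum(rate * count for rate, count in team.values())


def prorated_cost(monthly_cost, sprints):
    """Prorate a monthly cost over a project measured in sprints."""
    weeks = sprints * WEEKS_PER_SPRINT
    return monthly_cost * weeks / WEEKS_PER_MONTH


monthly = monthly_team_cost(TEAM)
resource_cost = prorated_cost(monthly, sprints=3.5)
print(f"Monthly team cost: ${monthly:,}")               # $38,823
print(f"7-week resource cost: ${resource_cost:,.0f}")   # $67,940
```

Changing the headcount or the sprint count shows how sensitive the estimate is to team size and duration.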
Infrastructure Estimated Costs
In addition to the resource costs, there are supporting costs across infrastructure and supporting teams that may apply.
The estimate below illustrates many of the regular costs but does not include any data enrichment costs beyond data structuring and schema unification.
Infrastructure | Monthly Estimate | Ongoing |
---|---|---|
Transform Costs | $150 | Yes |
Extraction Costs | $120 | Yes |
Data Storage* | $414 | Yes |
DevOps Tools | $1,000 | Yes |
Infrastructure Cost | $1,684 |
Using this estimated 7-week duration of effort, the supporting costs of the data source during the initial integration project would be $2,947.
*Data storage options vary, but the most common choice is a search-focused database service such as BigQuery, Elasticsearch, or others. The estimate assumes 100 GB per month on a 3-month rolling cycle (300 GB retained) at $1.38 per GB.
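As a rough sketch of how the storage line and the infrastructure total fit together, the snippet below recomputes them from the figures above. The function and variable names are hypothetical, and the 7-week project is again assumed to equal 7/4 of a month.

```python
from decimal import Decimal


def storage_monthly(gb_per_month, retention_months, price_per_gb):
    """Monthly storage cost for a rolling retention window."""
    return gb_per_month * retention_months * price_per_gb


# 100 GB/month kept for 3 months at $1.38 per GB (figures from the footnote).
storage = storage_monthly(100, 3, Decimal("1.38"))

# Monthly infrastructure line items from the table above.
INFRA_MONTHLY = {
    "Transform Costs": Decimal(150),
    "Extraction Costs": Decimal(120),
    "Data Storage": storage,
    "DevOps Tools": Decimal(1000),
}

infra_monthly = sum(INFRA_MONTHLY.values())

# Assumption: 7 project weeks at 4 weeks per month.
infra_project = infra_monthly * 7 / 4

print(f"Data storage per month: ${storage:,.0f}")          # $414
print(f"Infrastructure per month: ${infra_monthly:,.0f}")  # $1,684
print(f"Infrastructure for 7 weeks: ${infra_project:,.0f}")  # $2,947
```

`Decimal` is used here only to keep the per-GB price exact; plain integers in cents would work equally well.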
Estimated Maintenance Costs & Summary
Software Engineers working with external web data see a new release update every 6 weeks. Web data sources change frequently and the market moves quickly. As a side effect, each source sees a breaking change roughly every 18 months that requires extensive refactoring. In addition to the roughly 15% maintenance costs, budget should be set aside for this refactoring every 18 months.
Maintenance Item | Monthly Estimate | Commit % |
---|---|---|
Human Resources | $486 | 15% |
Transform Costs | $150 | Full |
Extraction Costs | $120 | Full |
Data Storage | $414 | Full |
DevOps Tools | $100 | 10% |
Maintenance Cost | $1,269 |
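The maintenance table can be recomputed with a short sketch like the one below. The names are hypothetical. The Human Resources amount is taken from the table as published, since the base that the 15% applies to is not stated here. The listed line items add up to $1,270, one dollar more than the published $1,269 total, which is presumably a rounding difference in the underlying figures.

```python
# Full monthly infrastructure costs and the share committed to maintenance,
# both taken from the tables above.
INFRA_MONTHLY = {
    "Transform Costs": 150,
    "Extraction Costs": 120,
    "Data Storage": 414,
    "DevOps Tools": 1_000,
}
COMMIT_PCT = {
    "Transform Costs": 100,
    "Extraction Costs": 100,
    "Data Storage": 100,
    "DevOps Tools": 10,
}

# Published amount at a 15% commitment; the base is not stated in the
# article, so the figure is used as-is rather than derived.
HUMAN_RESOURCES_MONTHLY = 486


def maintenance_lines(infra, commit_pct, human_resources):
    """Return each maintenance line item as a monthly dollar amount."""
    lines = {"Human Resources": human_resources}
    for name, amount in infra.items():
        lines[name] = amount * commit_pct[name] // 100
    return lines


lines = maintenance_lines(INFRA_MONTHLY, COMMIT_PCT, HUMAN_RESOURCES_MONTHLY)
maintenance_total = sum(lines.values())

for name, amount in lines.items():
    print(f"{name}: ${amount:,}")
print(f"Sum of line items: ${maintenance_total:,}")  # $1,270
```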
The total costs for a web data source integration are best separated into the initial project costs and the ongoing monthly maintenance.
Estimated Web Data Integration Costs | |
---|---|
Initial Data Source Integration | Ongoing Monthly Maintenance |
$70,887 USD | $1,269 USD |
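The summary figures combine as in the sketch below. The function names are hypothetical, and the cumulative projection is an illustrative extension rather than part of the estimate: it excludes the refactoring budget recommended every 18 months, for which no dollar figure is given.

```python
# Figures from the sections above.
RESOURCE_COST = 67_940        # 7-week team cost
INFRA_PROJECT_COST = 2_947    # 7-week infrastructure cost
MONTHLY_MAINTENANCE = 1_269   # published ongoing monthly maintenance


def initial_cost(resource, infra):
    """One-time cost of the initial integration project."""
    return resource + infra


def cumulative_cost(initial, monthly_maintenance, months):
    """Initial cost plus maintenance for a number of months in production.

    Excludes the periodic refactoring budget, which the article
    recommends but does not quantify.
    """
    return initial + monthly_maintenance * months


initial = initial_cost(RESOURCE_COST, INFRA_PROJECT_COST)
print(f"Initial integration: ${initial:,}")  # $70,887

# Hypothetical planning horizon of 12 months after release.
print(f"After 12 months: ${cumulative_cost(initial, MONTHLY_MAINTENANCE, 12):,}")
```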