Estimating the Cost of Adding Web Data Sources

By Tyler Logtenberg

December 2024 | 7 min. read

To estimate the cost of integrating a new data source into your product, factor in three elements: the human resources required, the infrastructure costs, and the ongoing maintenance needed to sustain the new capability.

Estimating Resource Costs

Data sources vary widely. To keep the estimate simple and fair, we use a general web data source as the baseline. Examples of sources matching these criteria are a news provider, a blog network, or a mid-sized social network.

This estimate does not include the cost of data source acquisition, licensing, or the high-level enrichments that web data requires. For the estimated cost of creating an NLP/ML classifier or model, we created a separate page that goes into a similar level of detail. Teams should also expect a substantial amount of learning around internal tooling and frameworks, because the provided SDKs are sometimes not robust or well documented.

The effort to integrate a web data source can vary in size, but it can often be estimated at 3.5 “Sprints.” A Sprint is a fixed period that an engineering team dedicates to specific stories, generally aligned with 2-week cycles. This brings our estimated duration to 7 weeks from planning to production release. The most commonly seen team composition and costs are laid out below.

| Resource | Monthly Estimate | Count |
| --- | --- | --- |
| Software Engineer | $12,442 | 2 |
| DevOps Engineer | $13,939 | 1 |
| Resource Cost | $38,823 | 3 |

Using this estimated 7-week duration, the resource cost of the web data integration would be $67,940. This figure does not include documentation, product marketing, QA, or project management costs.
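The arithmetic behind that figure can be sketched as follows. The rates and headcounts come from the table above. The 4-weeks-per-month conversion is an assumption inferred from the published total, and the function names are illustrative only.

```python
# Monthly cost per role and headcount, taken from the resource table.
TEAM = {
    "Software Engineer": (12_442, 2),
    "DevOps Engineer": (13_939, 1),
}

SPRINTS = 3.5
WEEKS_PER_SPRINT = 2
# Assumption: one month is treated as 4 weeks; this reproduces the $67,940 total.
WEEKS_PER_MONTH = 4


def monthly_resource_cost(team):
    """Sum of monthly rate times headcount across all roles."""
    return sum(rate * count for rate, count in team.values())


def project_resource_cost(team, sprints=SPRINTS):
    """Pro-rate the monthly team cost over the project duration."""
    weeks = sprints * WEEKS_PER_SPRINT      # 3.5 sprints -> 7 weeks
    months = weeks / WEEKS_PER_MONTH        # 7 weeks -> 1.75 months
    return monthly_resource_cost(team) * months


monthly = monthly_resource_cost(TEAM)
project = project_resource_cost(TEAM)
print(f"Monthly: ${monthly:,}")      # Monthly: $38,823
print(f"Project: ${project:,.0f}")   # Project: $67,940
```

Changing `SPRINTS` or the headcounts in `TEAM` gives a quick sensitivity check for teams that differ from the composition above.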

Infrastructure Estimated Costs

In addition to the resource costs, supporting costs across infrastructure and supporting teams may apply.

The estimate below illustrates many of the regular costs, but it does not include any data enrichment costs beyond data structuring and schema unification.

| Infrastructure | Monthly Estimate | Ongoing |
| --- | --- | --- |
| Transform Costs | $150 | Yes |
| Extraction Costs | $120 | Yes |
| Data Storage* | $414 | Yes |
| DevOps Tools | $1,000 | Yes |
| Infrastructure Cost | $1,684 | |

Using this estimated 7-week duration of effort, the supporting costs of the data source during the initial integration project would be $2,947.

*Data storage options vary, but the most common choice is a search-focused database service such as BigQuery or Elasticsearch. The estimate assumes 100 GB per month on a 3-month rolling cycle, priced at $1.38 per GB.
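As a rough sketch, the storage line and the infrastructure total can be reproduced from the stated inputs. Two assumptions are baked in: storage is priced at its steady-state volume (three months of retained data) from day one, and a month is treated as 4 weeks, which matches the published $2,947. The function name is hypothetical.

```python
from decimal import Decimal

GB_INGESTED_PER_MONTH = 100
RETENTION_MONTHS = 3               # 3-month rolling cycle
PRICE_PER_GB = Decimal("1.38")     # per GB, per month


def monthly_storage_cost(gb_per_month, retention_months, price_per_gb):
    """Cost of the data retained at steady state (assumed from month one)."""
    stored_gb = gb_per_month * retention_months   # 100 GB x 3 = 300 GB
    return stored_gb * price_per_gb


storage = monthly_storage_cost(GB_INGESTED_PER_MONTH, RETENTION_MONTHS, PRICE_PER_GB)

# Transform + Extraction + DevOps Tools from the table, plus derived storage.
infra_monthly = Decimal(150 + 120 + 1000) + storage

# 7 weeks at an assumed 4 weeks per month.
infra_project = infra_monthly * 7 / 4

print(f"Storage per month:        ${storage:,.2f}")        # $414.00
print(f"Infrastructure per month: ${infra_monthly:,.0f}")  # $1,684
print(f"Infrastructure, 7 weeks:  ${infra_project:,.0f}")  # $2,947
```

`Decimal` is used here so that the per-GB price multiplies out to exactly $414 rather than a floating-point approximation.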

Estimated Maintenance Costs & Summary

Software engineers working with external web data see a new release update roughly every 6 weeks. Web data sources change frequently and the market around them moves quickly. As a side effect, each source sees a breaking change about every 18 months that requires extensive refactoring. In addition to the roughly 15% maintenance cost, budget should be set aside for this refactoring every 18 months.

| Cost Item | Monthly Estimate | Commit % |
| --- | --- | --- |
| Human Resources | $486 | 15% |
| Transform Costs | $150 | Full |
| Extraction Costs | $120 | Full |
| Data Storage | $414 | Full |
| DevOps Tools | $100 | 10% |
| Maintenance Cost | $1,269 | |
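The maintenance table can be approximated by applying each commit percentage to the full monthly infrastructure rate, as in the sketch below. The Human Resources figure is taken as published, since the base that the 15% applies to is not stated. Summing the rounded line items gives $1,270, one dollar above the published $1,269. We assume the difference comes from rounding in the underlying inputs, but that is a guess.

```python
# Full monthly rate and commit percentage for each infrastructure item.
# "Full" in the table is represented here as 100.
INFRA_ITEMS = {
    "Transform Costs": (150, 100),
    "Extraction Costs": (120, 100),
    "Data Storage": (414, 100),
    "DevOps Tools": (1000, 10),
}

# Taken directly from the table; the base for the 15% is not published.
HUMAN_RESOURCES_MONTHLY = 486


def committed_cost(full_rate, commit_pct):
    """Portion of a monthly cost that stays committed during maintenance."""
    return full_rate * commit_pct // 100


infra_maintenance = sum(
    committed_cost(rate, pct) for rate, pct in INFRA_ITEMS.values()
)
maintenance_monthly = HUMAN_RESOURCES_MONTHLY + infra_maintenance

print(f"Infrastructure share: ${infra_maintenance:,}")    # $784
print(f"Monthly maintenance:  ${maintenance_monthly:,}")  # $1,270
```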

The total costs of a web data source integration are best separated into the initial project costs and the ongoing maintenance.

Estimated Web Data Integration Costs

| Initial Data Source Integration | Ongoing Monthly Maintenance |
| --- | --- |
| $70,887 USD | $1,269 USD |
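To turn these two figures into a budget over time, a minimal sketch might look like the following. It assumes maintenance begins the month after production release and deliberately leaves out the 18-month refactoring budget, which is not quantified here. The function name is illustrative.

```python
RESOURCE_COST = 67_940         # 7-week resource cost
INFRASTRUCTURE_COST = 2_947    # 7-week infrastructure cost
INITIAL_INTEGRATION = RESOURCE_COST + INFRASTRUCTURE_COST   # $70,887
MONTHLY_MAINTENANCE = 1_269


def cumulative_cost(months_in_production):
    """Initial project cost plus maintenance for the given number of months.

    Excludes the periodic refactoring budget, which has no published figure.
    """
    return INITIAL_INTEGRATION + MONTHLY_MAINTENANCE * months_in_production


for months in (0, 12, 18):
    print(f"{months:>2} months in production: ${cumulative_cost(months):,}")
```

With these inputs, one source costs $86,115 through its first year in production, before any refactoring.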