#DataEthics
A bad data source is one that fails any of these tests: reliable, original, comprehensive, current, and cited.
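A minimal sketch of the five tests as a checklist, assuming each one can be answered with a simple yes/no. The names `SourceCheck`, `failed_criteria`, and `is_bad_source` are made up for illustration and do not come from any library.

```python
from dataclasses import dataclass


@dataclass
class SourceCheck:
    """Yes/no answers to the five tests (hypothetical structure)."""
    reliable: bool
    original: bool
    comprehensive: bool
    current: bool
    cited: bool


def failed_criteria(check: SourceCheck) -> list[str]:
    """Return the names of the tests the source fails, in field order."""
    return [name for name, passed in vars(check).items() if not passed]


def is_bad_source(check: SourceCheck) -> bool:
    """A source is bad if it fails at least one test."""
    return bool(failed_criteria(check))


# Illustrative answers for an imaginary vendor dataset.
vendor = SourceCheck(reliable=True, original=False, comprehensive=True,
                     current=False, cited=True)
print(failed_criteria(vendor))
print(is_bad_source(vendor))
```

Listing the failed tests, not just a pass/fail flag, shows what to ask for when requesting a better source.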
Data sources are usually classified by who collected the data:
- Primary source (i.e. you collected the data yourself)
- Secondary source (i.e. you are getting the data from a party, whether an individual or a company, that collected the data itself)
- Tertiary source (i.e. you do not have direct access to those who collected the data)
In general, the farther you get from the data source, the lower the quality of the data tends to be, because it becomes:
- harder to validate the data
- harder to correct perceived errors
- harder to confirm consent to use
- more likely to lead to lawsuits and ethical problems down the line
Analysis is only as good as the data used, and the decisions based on that analysis are directly affected by the quality of the data examined. As an analyst, your reputation for excellent work depends on ensuring that your data sources are as reliable as possible.
Failing this, it makes a lot more sense to request a better data source, a fresh collection, or different business objectives than to proceed with unreliable data.
Rule of thumb: bad data = unreliable results