Gilbane Advisor 2–21–24 — Common Crawl, data for what

Frank Gilbane
2 min read · Feb 21, 2024

This week we feature articles from Stefan Baack & Mozilla Insights, and Steve Jones.

Additional reading comes from Daniel Tunkelang, Kate Moran, Louis Rosenberg, and Matt Marshall.

News comes from Franz, Optimizely & Writer, Otter.ai, and Grammarly.

All previous issues are available at https://gilbane.com/gilbane-advisor-index

Opinion / Analysis…

Training data for the price of a sandwich

It’s easy to think you know what Common Crawl is, but if you’re using some version of it as a key source of training data for generative AI, you need to do your own due diligence. This article from Stefan Baack and Mozilla Insights is a little repetitive, so it is a quicker read than it first appears, and it is well worth it. (43 min)

https://foundation.mozilla.org/en/research/library/generative-ai-training-data/common-crawl/

Your data warehouse thinking is killing your AI ambitions

I’m not sure how many of you don’t already know the difference, but Steve Jones’ article is a useful explanation for reading, or sharing, with certain colleagues…

“The reality is that traditional data warehousing thinking and its idea of some mythical “perfect” post transactional data set only exists because historically data hasn’t been important, except for finance regulations…

The future is different, AI is critical to that future, AI that works at operational speed and can be engaged in operational decisions. This is the participation of data in business, the day-to-day, minute-to-minute running of the business, not simply reporting after the fact what went on…” (7 min)

https://blog.metamirror.io/your-data-warehouse-thinking-is-killing-your-ai-ambitions-849356dd3679

More Reading

All Gilbane Advisor issues

Content technology news…

Otter.ai announces Meeting GenAI

AI Chat allows users to tap into the collective wisdom from past meetings, no matter which platform.
https://otter.ai

Optimizely integration with Writer now live

Writer’s enterprise-grade, industry-specific generative AI capabilities are now available within the Optimizely Content Marketing Platform.
https://www.optimizely.com
https://writer.com

Grammarly announces general availability of App Actions

New third-party app integrations and actions enable businesses and professionals to simplify workflows and reduce constant context-switching.
https://www.grammarly.com/app-actions

Franz announces Gruff 9

The web-based Knowledge Graph visualization tool offers LLM integration and unique RDF* (RDFStar) features for building AI applications.
https://allegrograph.com/products/gruff/

All content technology news

The Gilbane Advisor is authored by Frank Gilbane and is ad-free, cost-free, and curated for content, computing, web, data, and digital experience technology and information professionals. We publish recommended articles and content technology news weekly. We do not sell or share personal data.


Originally published at https://gilbane.com on February 21, 2024.

Frank Gilbane

Content, computing, web, mobile, digital experience, digital strategy - Gilbane Advisor & curated news — https://www.linkedin.com/in/frankgilbane/