Atlassian enables default data collection to train AI (letsdatascience.com)
502 points by kevcampb 13 hours ago | 115 comments


Atlassian just goes from misstep to misstep. I still use their products quite often. The amount of P0 bugs I experience is absolutely crazy:

- Bitbucket workers are hopelessly out of date (self hosted). We've had to put so many random workarounds in especially for Docker, as they don't keep them up to date enough

- I have had a bug in JIRA for years where I can't reorder a new ticket unless I refresh the page

- Every new feature they introduce into JIRA/Bitbucket over the past couple of years just doesn't work.

- I tried their AI stuff on the free trial, didn't work at all, tried to cancel, can't cancel the free trial online and had to write a load of support tickets (of which the support ticket contact form bugged out multiple times).

Anyone have any insight into why things have got so so dysfunctional? Tech debt? Talent leaving? Both? Even 'bad' enterprise software tends to be able to keep the most basic features running, but Atlassian is a whole new category. If you check their 'community' it is just hundreds/thousands of bugs with workarounds.


> I tried their AI stuff on the free trial, didn't work at all, tried to cancel, can't cancel the free trial online and had to write a load of support tickets (of which the support ticket contact form bugged out multiple times).

Absolutely insane that this is legal. The only reason to do this is to trick and abuse customers. It would be trivially easy to legislate away if our government cared to.

Atlassian seems like a typical entrenched big company, albeit an extreme example. They make money by selling to the bosses of their users and by being the default name brand in many cases. Once a company gets to a certain size and doesn't directly compete much on quality, internal corruption and incompetence can run rampant.


>> internal corruption and incompetence can run rampant

This affliction happens to almost every company, eventually. Nobody seems to have solved this.


Steam seems to be doing pretty well after 20 years

Valve isn't a publicly traded company with shareholders looking to extract every possible nickel out of the company at any cost.

Enshittification

I generally agree with this comment, but what option does a decision maker have here? (apart from similar products that probably will end up doing the same things anyway). Are there equivalent scale/functionality products that can truly serve as an option?

It's explicitly not legal in California and some other places.

Also for business customers? I would expect such regulations to only apply to b2c contexts.

California's law apparently only applied to B2C, but there was an FTC rule that applied to B2B as well which has been paused by a federal appeals court while they consider if the FTC followed the law in making the rule.

I worked there and the answer is the engineering talent isn't great, in addition to being very unfocused, and tons of pointless org churn. Bitbucket pipelines/workers was originally implemented, essentially, by two guys (I know, because I sat 2 rows of desks apart from them!) if that tells you anything. I doubt there was more than one person actively maintaining it for the past decade, if they didn't get laid off recently. That office doesn't even physically exist anymore, and the people are long gone.

Featuritis. Just keep pumping out features with no thought. Today, probably also AI-coded.

Even in mid-sized projects, if you keep pushing only for new features you'll get a similar system. At least that's my experience from 3 or so mid-sized projects I've worked on, where nothing mattered except checking off features from a huge backlog.


The search function in Jira has always been unusable. It’s perhaps the worst part of the entire platform, but nice to see they’re still focused on adding features I will never use.

Half the time I just grep the ticket key in Slack because it's faster than using Jira's own search.

YouTrack's search is one of the main reasons I use it. Nice query language to filter down on any fields, including custom fields, never had an issue finding things. It's great. With the number of useless search functions in so many products, I'm happy that at least my issue tracking does it right.

I've always thought I was the only one experiencing this and felt like I was crazy.

I guess it's "good" to know that I'm not alone.

The number of times I've searched for a ticket that I know is there (because I either have it open in a different tab, or because I just created it) but can't find is just way too many.


The results usually seem completely random to me. It's like the feature never made it out of proof of concept territory. The only advantage of all the email noise Jira sends out is that I can usually search my email for what I'm looking for.

I used JIRA back in 2009 and that is exactly what we did to work around the shitty search function in JIRA.

ironically it's the one place where an agent might be of some use and they created one and it's terrible.

at least they didn't break their pattern of disappointing users. consistency is key.

Jira is buggy as hell these days. Lots of desyncing that forces me to refresh the page. I can have a ticket open on a sprint board and the modal spontaneously closes after a while, forcing me to reopen it frequently. The other week there were tickets that simply refused to show up in their respective sprint board no matter what I did; later the epic magically appeared on the board out of nowhere, then finally the individual tickets themselves reappeared.

Gotta love the value that vibe coding has added to this world.


Atlassian also shut down their self-hosted offerings. I'm not sure which version they were on with their datacenter edition when that got cancelled. Part of it might also simply be a lax approach to QA, now that they don't have to support thousands of installations in on-prem environments. When you can just push out an update, your QA has to be much, much better.

I'm sure Atlassian's shareholders appreciate your sacrifice.

You can add to this list: Every single input field they have in Confluence and Jira is misbehaving or broken. Apparently, we can't just have a text input widget that works well. Also apparently, this billion dollar enterprise cannot afford to write or use a proper markdown parser, and apparently we, the user lowlives, cannot be trusted with the full "power" of basic markdown laugh.

It can't be that hard to just dump/export the entire JIRA in one day and migrate it to something else like linear.app? I was already exporting HTML dumps of the entire JIRA and using them in local tool calls to ground agents as far back as last year, instead of wrestling with the JIRA API to get it to work. This was before Linear became popular.

The migration would take 1-2 engineering man-days, I suppose. But it's money well spent.


How do you export all of Jira? Any tips or GitHub repos?

There is an export button in Jira. https://youtu.be/-wGRKzYmA7o?t=92 was what I used. For the workspace docs there is also an export button that can export all the documentation for the project (the export would be in HTML). I then used a simple script built with an LLM to convert all of it into Markdown files.
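
A minimal sketch of what such a conversion script could look like (not the commenter's actual script). It assumes the export is a folder of plain `.html` files and only handles headings, paragraphs, list items and links. The names `html_to_markdown` and `convert_export` are made up for illustration, and a real export will likely need more tag handling (tables, code blocks, images).

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class MarkdownExtractor(HTMLParser):
    """Tiny HTML -> Markdown converter: headings, paragraphs, lists, links."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []   # output fragments, joined at the end
        self.href = None  # target of the currently open <a>, if any
        self.skip = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in self.HEADINGS:
            self.parts.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        if not self.skip:
            # Collapse whitespace runs but keep single spaces around inline tags.
            self.parts.append(re.sub(r"\s+", " ", data))

    def markdown(self):
        lines = [line.strip() for line in "".join(self.parts).split("\n")]
        text = re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
        return text + "\n"


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return parser.markdown()


def convert_export(src_dir: str, out_dir: str) -> int:
    """Convert every .html file under src_dir to a .md file under out_dir."""
    src, out = Path(src_dir), Path(out_dir)
    count = 0
    for html_file in src.rglob("*.html"):
        target = (out / html_file.relative_to(src)).with_suffix(".md")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(
            html_to_markdown(html_file.read_text(encoding="utf-8")),
            encoding="utf-8",
        )
        count += 1
    return count
```

Calling `convert_export("export/", "markdown/")` (both paths hypothetical) would mirror the folder structure and return the number of files converted.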

Jira has vim-like bindings for navigating tickets on boards and years later the feature barely works. It has bugs like pressing the j/k keys changes the URL and random fields but stays on the same ticket or doesn't render the newly navigated-to ticket, etc.

Here's mine: you cannot link an existing branch to a Jira issue. Maybe this is easier said than done, but I can't find their reasoning anywhere.

Confluence is ok and has improved recently.

Jira is garbage (frontend, backend). Tough but true.


Umm? Is there a single thing Atlassian did right? It's a cancer of software development that the suits force us to swallow, while real development and useful documents live outside their service because it's so stressful to use.

Stack ranking

Until it starts actually affecting their bottom line, how is it a misstep?

You yourself just admitted that you still use their products often.


Sounds like every other SaaS company that was bought by an investment fund to milk it dry until everyone migrates off it.

Not surprised. Quote „…with significant institutional ownership from Vanguard, BlackRock, and others”.


I really wish I could find a better source to link to for this. By default, all free and paid customers are being opted-in to their data being used for AI training.

All your Confluence pages, Jira tickets, etc.

https://support.atlassian.com/security-and-access-policies/d... describes how to disable this, but it also appears that the setting to disable this doesn't exist (it's not visible on any of our instances).


They said the opt out features will be rolled out to the Admin portal in May.

I got this info from an email they sent out

>To give you control over this change, we're introducing new in‑app settings that allow you to manage in‑app data contribution. Initially, these settings will apply to data in Jira, Confluence, and Jira Service Management, including data in your Atlassian Platform apps (Rovo, Home, Teams, Projects, Assets, Goals, Analytics, and Administration). We'll notify you when settings become available for additional apps you own, so you can review them in Atlassian Administration. Between today and May 19, 2026, we'll gradually roll out these settings in Atlassian Administration. We'll send you another email on May 19th as a reminder, so you have time to review and make any adjustments before August 17, 2026.


I also do not see the setting to opt out. I'm at Atlassian Administration > Security, and I do not see Data contribution. I've looked at multiple other settings pages and I do not see it there either.

So, is this an automatic opt-in without the ability to opt-out?


Opt out features will be introduced at a later time

"In-app data covers user-generated content: page titles and bodies in Confluence, Jira issue titles, descriptions, comments, custom emoji names, custom status names, and workflow names" ... damn!!

What about really sensitive stuff, like private tickets that contain customer data, embargoed CVE fixes or even sensitive health-related data? Are they just going to cobble all of that into a model so it can leak out to random people?

There is a bunch of manufacturing related investigation reports written up in jira tickets or confluence pages at the pharma company I work for.

This seems to be the official description of the changes:

https://www.atlassian.com/trust/ai/data-contribution/faqs



Unfortunately that one has a subheading of "From August 17, the outfit will collect customer metadata by default unless you pay for the top tier"

It's not just metadata, it's all "in-app data"


Opt-out at the Org level.

To get value out of Rovo, it needs detail. Your over-subscribed Jira power user/admin can't effectively make it happen. No guarantees Atlassian (Rovo itself) can make it happen either, but the patterns are going to develop and evolve closer and closer to the Agents that make the features.

They have a peculiar definition of Metadata, however. It's a proprietary data product derived from user content. It's a bit shit the way they sell it as metadata. It's a derivation. It's a product of Content, so it's Content - privacy safeguards cannot begin to cover the variation.

"Metadata includes two data types referred to as content attributes and common patterns.

Content attributes are statistical characteristics, numeric fields, and derivatives of your in-app data. Examples of content attributes may include the number of story points assigned to a Jira work item or the complexity of a Confluence page. Common patterns are phrases, keywords, and topics we extract from search queries and results, Rovo Chat (conversations, prompts, and responses), and custom configuration data that are frequently seen across many customers, while omitting rare data that may be unique to your organization. Examples of common patterns may include common words, phrases, or Rovo Chat prompt topics that are frequently used by customers, such as “vacation policy” or “recap team activity.”"



That's insane. Every single one of those things is highly sensitive and confidential information. How could you ever trust them after this? That information is priceless for shorting your company on the stock market.

Not that they'd ever do that of course. Nobody with highly sensitive information about rival companies would ever do that.



"Your available data contribution settings will be available no later than May 19, 2026."

So let me guess, they're hoping that we forget about this by then, so that they can scoop up our data? I can't think of any other reason for it.


Rumors that Anthropic is in talks to buy Atlassian, presumably for the training data. Data poisoning efforts are underway: https://www.reddit.com/r/PoisonFountain/comments/1sqrq24/atl...

I know at least two companies that won't be able to use Atlassian products anymore if that's the case. They really don't give a shit about privacy and regulatory requirements.

github etc hold source code -> scraped -> so AI may generate any of that.

And the specs become the new source (code).

fast forward..

Atlassian etc hold source specs -> scraped -> so AI may generate any of that.. then any of above..

the new source would be (?what? company missions? get-rich-quick-schemes?)

fast forward..


Hmm, if the stock keeps falling that might really happen.

The opt-out-by-default pattern has been gradually normalizing in enterprise SaaS, but what makes this particularly egregious is the combination of two things: the data scope (not just metadata, but all in-app content per kevcampb's link) and the broken opt-out (the disabling setting not rendering on any instance).

One is a policy decision you can argue about. Both together suggest the friction is intentional.

The data residency point is worth flagging separately - a lot of enterprise buyers treat region-pinning as a privacy guarantee for everything in their contract. It was never that. Residency tells you where data is stored at rest, not who can access it for what purpose.


What makes this extra scummy is this:

“If customers were to right now terminate their contract, the new data contribution settings will not apply to them as these will not be enforced until August 17, 2026,” (from https://www.theregister.com/2026/04/18/atlassians_new_data_c...)

So you can't even take a bit of time to consider your options.


Plenty of other companies enable this by default too, such as GitHub, Figma, Adobe, Vercel. I think it's fair to assume that if you have data stored with any company, they'll use it for training by default.

Maybe this will become The Year of the Self Hosted.

For stuff that I don't particularly care about privacy I've kept on the cloud (e.g. my blog, which is public anyway and as such is probably training bots regardless), but for stuff that I don't want to be used to train their models and/or sell to advertisers I have moved to be self hosted on my own network.


self hosting needs to be easier to set up for that to happen.

we're not far off it being good enough but it's not there yet.


Atlassian made self-hosting 'less easier' on purpose. They even discontinued their on-prem products.

If the rumours of an Anthropic acquisition are true, this makes a lot of sense. Anthropic are probably looking for a clean, high-signal dataset of metadata around business tasks that they can buy.

I'm thinking it would be ideal if Broadcom buys Atlassian instead and pulls another VMware. Problem solved - forever. ;-)

Oh what the.. i can't pay for a 2000$ max sub :/

I know of a company that's stuck on the datacenter edition, because they aren't allowed by some customers to store their data in the cloud. I can't imagine how much they must pay for that.

Until they finish evaluating competitors, and eventually migrate to .... something, they are completely stuck. Jira is at the heart of all of their workflows and they cannot and will not move to cloud. This was an Atlassian partner, but they got screwed over on that part as well.


I doubt data in Atlassian are anywhere close to clean or organic. It was designed by hell to swallow shit to real programmer who does real works outside of Atlassian.

Programmer adjacent data can already be consumed from git repos. Atlassian has PM data.

Sounds very questionable, like boiler room is trying to do a pump n dump. I would not believe these rumors until we hear reputable sources outside of forum speculation

Will Atlassian be harvesting code and content from private Bitbucket repositories? The wording in their policies and FAQs is vague, so I'd like to get a definitive (Yes / No) answer.

I think I looked for this months ago, and my interpretation was that no, they were not doing AI training with it.. but with this announcement, I will be moving all my stuff to my own servers.

cloud repos are handy, but, having to constantly worry if some criminal comes "joinks, its my data now", is not worth it.


If it is vague, then that probably is a very clear answer to your question.

The adage was "If you're not paying for the product, you are the product." Now enterprises are paying to become the product. That's ridiculous.

Just a couple days ago my CTO was saying he was reluctant to clone all our git repos into github because of the AI training possibility. All our code is in bitbucket now, so not sure what our plan now is.

Worth noting that Atlassian's data residency options don't exempt you from this—your data can still be used for training even if you've pinned it to a specific region.

No wonder they wanted to stop supporting the Data Center versions for on prem.

I read this as "Stop using this product" toggle every time a company does this without consent. It has done a good amount of mental and financial improvements to me.

The official Atlassian FAQ on this change:

https://www.atlassian.com/trust/ai/data-contribution/faqs


Microsoft, Amazon, Google, everybody else with both having-business-customers and also data-collecting businesses: "We swear that we absolutely will not collect/train our stuff on business customer data."

Atlassian: "Yolo!"


They are lowering the threshold for this kind of shit for everyone else. We should kill it with fire, before this spreads even further. But I guess most businesses led by non-technical people will simply not care and give their customer data to the AI sharks at no additional cost.

Does anyone know what falls under "other cloud products" mentioned here?

Would that include something like Trello?


I made this a while back to move us off our on-prem Atlassian to Gitlab [1]. Maybe it'll help someone if they want something similar. Fair warning: I haven't tried this recently, so YMMV.

[1] https://gitlab.com/jeremygonyea/jira-to-gitlab-migration-too...


Genuine question: how many agent-hours to rebuild Jira from scratch and migrate 100% of the content out? Split the work, pool our agents, ship by August 17. ;-)

Presumably the government and HIPAA carveouts are for legal obligations. Trade secret theft is illegal so I wonder why they're not considering this.

Maybe if you put your data in Atlassian then you failed to adequately protect your trade secret? IIRC you need to make a reasonable effort to protect the secret.

Establishing MNDAs is considered reasonable effort and this is a policy update that basically says "we are ignoring all MNDAs".

Because nobody will prosecute them for violations

Does this include repos content in BitBucket?

I am wondering, why not just run rsyncrypto on the source code before pushing to the repo?

>rsyncrypto is a utility that encrypts a file (or a directory structure) in a way that ensures that local changes to the plain text file will result in local changes to the cipher text file. This, in turn, ensures that doing rsync to synchronize the encrypted files to another machine will have only a small impact on rsync's wire efficiency.

https://manpages.ubuntu.com/manpages/focal/man1/rsyncrypto.1...


Who wouldn't these days. Just assume if a company has your data it's training AI on it. No company cares about your privacy more than they do their profits. Not a one.

You can thank GitHub for setting this draconian precedent

They're so desperate because their stock went down ~10 times in last 5 years or so

To anyone using a model trained on my company's Jira tickets, I apologize for the regression.

We need to kill SaaS. Apps should be local-first and have peer-to-peer data sync. These companies won't stop until they use your data to replace you and enrich their owners.

Beautiful on paper. But it does not scale outside a certain type of tech people.

What’s the scaling bottleneck? If you made a local-first, P2P version of Figma what would break first? For a company of like 50 people, I doubt you’d have more than 100GB of data so it should fit on everyone’s computers. The P2P syncing part seems solvable, even if you need a centralized handshake server somewhere. And from the user perspective I don’t see why the UX couldn’t be identical, so it’s all the same to them.

It seems like the real bottleneck is something else.
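
To make "the P2P syncing part seems solvable" a bit more concrete, here is a minimal sketch of one common approach: a last-writer-wins map, where every peer keeps a timestamped entry per key and merging two replicas just keeps the newer entry. This is an illustration under assumptions, not how Figma or any particular product works. `Entry`, `merge`, the logical-clock timestamps and the peer IDs are all hypothetical names, and real design tools typically need richer conflict handling than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    value: str
    timestamp: int  # logical clock (e.g. a Lamport counter), not wall time
    peer_id: str    # tie-breaker so every peer picks the same winner


def merge(local: dict[str, Entry], remote: dict[str, Entry]) -> dict[str, Entry]:
    """Merge two replicas; for each key, the entry with the highest
    (timestamp, peer_id) pair wins. Neither input is modified."""
    merged = dict(local)
    for key, theirs in remote.items():
        ours = merged.get(key)
        if ours is None or (theirs.timestamp, theirs.peer_id) > (ours.timestamp, ours.peer_id):
            merged[key] = theirs
    return merged


# Two peers edited the same document while offline.
alice = {"title": Entry("Draft", 1, "alice"), "color": Entry("red", 3, "alice")}
bob = {"title": Entry("Final", 2, "bob"), "size": Entry("L", 1, "bob")}

# The merge order doesn't matter, so peers converge without a central server.
synced = merge(alice, bob)
```

Because the merge is commutative and idempotent, peers can exchange state in any order, any number of times, and still end up with the same result. The trade-off is that a concurrent edit to the same key is silently dropped rather than reconciled.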


> If you made a local-first, P2P version of Figma what would break first?

The guy who has to keep it running day by day, next to the other 30 local-first systems.


What is there to run? There are millions of apps that don’t require maintenance, this was the default before SaaS.

Every app needs maintenance if it's connected to the internet. Security updates at minimum.

AI contributing to rising natural stupidity.

Imagine an AI based on jira tickets. _That's_ the torment nexus.

Bye bye Bitbucket, Jira, Confluence, etc. Seriously, if you're using any Atlassian product other than Statuspage, you deserve to get your data hoovered up for AI.

This is such an obvious conflict of interest. They know Confluence is full of proprietary information. They are violating their client's trust https://www.theregister.com/2026/04/18/atlassians_new_data_c...

Why does Atlassian need to train AI models?

Rumor is they're being bought by Anthropic.

Does this apply to Loom?

Loom isn't mentioned in the Partner materials I have read. That's about all I can say.

Oh another piece of the abysmal tools stack that should bite the dust. Maybe I will still see a software job without terrible tooling in the EU.

I'm really tired of JIRA, to the point where I have expressed it publicly: https://www.embeddedrelated.com/showarticle/1772.php

No surprise here. It's by design.

The only silver lining I can see in this is that if they replace their existing tooling with AI integration, we might actually get search and confluence that works.

I've lost count of how many times I search for a keyword and get no relevant results, but the document I'm looking for, which contains the keyword, is in my automatic pop-up of recent documents visited.


Omg

Yet another opportunity to provide an alternative that keeps data private

I’m building a self-hosted Confluence alternative called Docmost. It’s open-source and can run fully air-gapped.

GitHub: https://github.com/docmost/docmost


genius move.

I don't see it as a misstep at all. The purpose of Stack Overflow is to share expertise.

I am 100% supportive of it being used for training... AI, you, everyone.


Dude, what?

What? Atlassian is not stack overflow.