Searching for Another Earth with Embeddings

When the first exoplanet was confirmed in 1992, it made headlines across the world. Today, the NASA Exoplanet Archive lists almost 6,000 confirmed planets orbiting stars beyond our Sun, with thousands more candidates awaiting confirmation.

The question is no longer whether planets are common. It is which of them might be habitable.

That question has traditionally been answered with formulas. Scientists developed the Earth Similarity Index (ESI) and other measures, each designed to compress a planet’s parameters—mass, radius, orbit, temperature—into a single score that tells us how “Earth-like” it is.

But there’s another way to approach this problem. Instead of reducing planets to a score, what if we treated them as embeddings—high-dimensional vectors that capture their physical and stellar properties—and then asked which of those vectors lies closest to Earth’s?


From Chatbots to the Cosmos

If you work in AI, you’ve probably heard of embeddings. They’re the secret sauce behind semantic search, recommendation systems, and natural language models.

  • Embeddings let search engines match your query to the most relevant document.
  • They allow Spotify to recommend the next song, or Netflix to suggest the next show.
  • They power chatbots, enabling them to understand the meaning behind words instead of just the words themselves.

At their core, embeddings map things into a vector space where closeness equals similarity. A word, a paragraph, an image—anything that can be represented numerically can be embedded.

So why not planets?


Planets as Vectors

Each exoplanet in the NASA archive comes with a rich set of features:

  • Orbital period and distance from the star.
  • Planetary radius and mass.
  • Orbital eccentricity.
  • Equilibrium temperature.
  • Host star temperature, radius, mass, and metallicity.

These numbers can be combined into a vector. Earth, for example, could be represented roughly as:

[365, 1.0, 1.0, 1.0, 0.0167, 255, 5778, 1.0, 1.0, 0.0]

That is: orbital period in days, distance in AU, radius and mass in Earth units, eccentricity, equilibrium temperature in kelvin (about 255 K for Earth; the familiar 288 K is the greenhouse-warmed surface value), then the host star's temperature in kelvin, radius and mass in solar units, and metallicity.

Each number corresponds to one physical or stellar property, and every other planet gets a vector of its own. After scaling each feature to a comparable range (otherwise raw orbital periods and stellar temperatures, in the hundreds and thousands, would swamp eccentricity and metallicity), we can compute cosine similarity: a measure of how close one planet is to another in this multi-feature space.

Ask the system: “Which planets are most like Earth?” and it will return a ranked list—not from a fixed formula, but from the geometry of the data itself.
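To make this concrete, here is a minimal sketch in plain Python. The planet names and feature values below are invented for illustration, not real archive rows; features are min-max scaled first so that no single quantity dominates the similarity:

```python
import math

# Feature order per vector: orbital period (days), distance (AU),
# radius (Earth radii), mass (Earth masses), eccentricity,
# equilibrium temperature (K), stellar temperature (K),
# stellar radius (solar radii), stellar mass (solar masses), metallicity.
# NOTE: names and values are illustrative, not real archive rows.
planets = {
    "Earth":     [365.0, 1.00,  1.00,   1.00, 0.017,  255, 5778, 1.00, 1.00,  0.00],
    "Hot giant": [  3.5, 0.05, 11.00, 300.00, 0.010, 1500, 6100, 1.20, 1.10,  0.10],
    "Temperate": [290.0, 0.85,  1.30,   2.00, 0.050,  260, 5300, 0.90, 0.90, -0.10],
}

def minmax_scale(vectors):
    """Rescale each feature column to [0, 1] across all planets."""
    cols = list(zip(*vectors))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(vec, lo, hi)]
        for vec in vectors
    ]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

names = list(planets)
scaled = dict(zip(names, minmax_scale([planets[n] for n in names])))

# Earth is the query; rank everything else by similarity to it.
query = scaled["Earth"]
ranking = sorted(
    ((cosine(query, scaled[n]), n) for n in names if n != "Earth"),
    reverse=True,
)
for score, name in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy numbers, the temperate world ranks above the hot giant, exactly the kind of ordering a "most like Earth" query should produce.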


Why This Matters

  1. Flexibility
    Instead of a one-size-fits-all index, embeddings let us define habitability in different ways. Do we care most about surface temperature? Orbital stability? Stellar metallicity? We can adjust the weights and re-run the search.
  2. Integration of diverse data
    Embeddings are not limited to numbers. We can add spectral data from telescopes, text descriptions from research papers, even atmospheric observations. A future “habitability embedding” could combine all of these.
  3. Discovery through similarity
    With embeddings, we don’t just rank planets—we can cluster them, find analogues, and identify outliers. It’s a richer exploration than reducing habitability to a single index.
  4. Future-proofing
    As telescopes like JWST and the upcoming ELT give us more detailed exoplanet data, embeddings allow us to extend the space without rewriting the entire system.
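The first point, adjustable weights, is easy to sketch. Everything below is illustrative: the two candidate planets are invented, the features are assumed to be pre-normalized to [0, 1], and a weighted Euclidean distance stands in for whatever similarity measure a real system would use (plain cosine similarity responds less intuitively to re-weighting when all features are positive):

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with per-feature weights (inputs pre-normalized)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Illustrative, already-normalized vectors in [0, 1].
# Feature order: radius, mass, eq. temperature, eccentricity, metallicity.
earth = [0.50, 0.40, 0.55, 0.10, 0.50]
candidates = {
    "A (right size, too hot)":   [0.52, 0.42, 0.95, 0.12, 0.50],
    "B (bigger, right climate)": [0.80, 0.75, 0.56, 0.10, 0.45],
}

def rank(weights):
    """Candidates ordered nearest-to-Earth first under the given weights."""
    return sorted(candidates,
                  key=lambda n: weighted_distance(earth, candidates[n], weights))

uniform    = [1.0, 1.0, 1.0, 1.0, 1.0]   # every feature counts equally
temp_focus = [1.0, 1.0, 8.0, 1.0, 1.0]   # care mostly about temperature

print(rank(uniform))     # size-matched candidate A wins
print(rank(temp_focus))  # climate-matched candidate B wins
```

Changing one line of weights re-ranks the candidates instantly; no index needs to be redefined.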

A Novel Perspective

From my search of the literature, astronomers haven’t widely adopted embedding-based methods for habitability yet. They use indices, clustering, and statistical likelihood models. But the concept of embedding planets into a similarity space and navigating that space—much like we navigate semantic spaces in NLP—is still fresh.

It’s a cross-pollination of fields: AI methods meeting astrophysical data.

Imagine a “cosmic search engine” where Earth is the query and the output is a shortlist of planets worth deeper study. Imagine dynamically adjusting what “habitability” means—whether for liquid water, atmospheric protection, or long-term climate stability—and letting the embeddings re-rank the candidates instantly.


Beyond the Numbers

Embeddings won’t tell us if a planet has life. For that, we need telescopes capable of detecting biosignatures—oxygen, methane, water vapor—in distant atmospheres.

But embeddings can help us get there faster. They can prioritize targets, highlight analogues, and suggest where to point our most precious instruments.

And they remind us of something profound: the tools we build for search engines, chatbots, and recommendation systems are not confined to our screens. They can be turned outward, to the stars.


Conclusion

The journey to find another Earth is not just about rockets and telescopes. It’s about data, interpretation, and imagination.

By representing planets as embeddings, we open up a new way of exploring habitability—one that is flexible, scalable, and deeply aligned with the way AI already helps us navigate information here on Earth.

One day, when the headlines announce “Earth-like planet discovered,” it might be thanks in part to the same mathematical trick that helps your phone autocomplete a sentence.

Embeddings may not just map our language. They may help us map the galaxy.
