A Book from the Sky | Simon Shengyu Meng

The Intersection of Art and AI: Exploring “A Book from the Sky” by Gene Kogan at the Scales of Time, Education, and Culture

Introduction

“A Book from the Sky” by Gene Kogan is a groundbreaking piece that bridges artificial intelligence (AI) and art. Created in 2015, it uses a Deep Convolutional Generative Adversarial Network (DCGAN) to explore the latent space of Chinese handwriting, reflecting on the profound changes AI brings to art, culture, and society. The project marks a pivotal moment in the evolution of AI, the shift from recognition tasks to generative capabilities that began around 2015, and highlights AI’s potential to create novel, meaningful content from learned data.

Gene Kogan, an artist and programmer deeply interested in autonomous systems and generative art, has been instrumental in pushing the boundaries of what AI can achieve in creative fields. His work focuses on the intersection of technology and self-expression, advocating for the use of AI to democratize art and education. Kogan has been actively involved in various educational initiatives, including the “ml4a” (Machine Learning for Artists) project, which provides free resources to help artists integrate machine learning into their practice.

Scale of Time: The Development History of AI Art

Indicating the Turning Point in AI: From Perception to Generation

“A Book from the Sky” occupies a pivotal position in the timeline of contemporary AI development. It marks the transition, under way since 2015, from AI’s focus on recognition to its capacity for generation. This transition is crucial to understanding how AI evolved from a tool primarily used for identifying and interpreting existing data into one that can create entirely new content.

Pre-2015 Focus on Recognition: Before 2015, AI’s primary applications were in recognition tasks. These tasks included object detection, facial recognition, and semantic segmentation, where AI systems were trained to identify and classify objects within images and videos. The technology’s primary role was to interpret and make sense of visual data.

Post-2015 Shift to Generation: The introduction of Generative Adversarial Networks (GANs) around 2014-2015 marked a significant shift in AI capabilities. GANs consist of two neural networks, the generator and the discriminator, working in tandem to produce new, synthetic data that resembles the training data. This innovation allowed AI to move beyond mere recognition to the creation of new images, text, and other media. Public awareness of this generative capability surged only around 2022, with the release of ChatGPT, Stable Diffusion, and Midjourney, although the foundational shift occurred much earlier.

A Fossil of AI Evolution: “A Book from the Sky” serves as a “fossil” indicating the eve of the “Cambrian explosion” in AI generation. It showcases the nascent stage of AI’s creative capabilities, reflecting a moment in time when AI began to transcend its original functions and embark on a path towards generating original art. This piece not only highlights the technical advancements of its time but also provides a historical context for the rapid development and public recognition of AI’s generative abilities in subsequent years.

Positioned within this historical framework, “A Book from the Sky” not only underscores a significant technological milestone but also invites deeper reflection on its educational and cultural implications. It predates the public’s broader recognition of AI’s creative potential by several years, a reminder that the foundational technologies were in place long before the surge of attention around 2022.

Scale of Education: Public Understanding of AI

The Concept and Importance of Word Embedding and Latent Space

Word embedding, popularized in 2013 by models such as word2vec, revolutionized the way text is processed by converting it into structured, computable vectors. This development paved the way for advances in Natural Language Processing (NLP) and was one of the key factors that made the training of Large Language Models (such as the GPT series) possible.

Word Embedding: Converts text into latent vectors, allowing neural networks to encode and cluster text by its vector coordinates. This breakthrough enabled a more nuanced understanding of textual data and more sophisticated AI models that could interpret and generate human language with greater accuracy.

Visualization Challenge: Initially, the complex calculations in the latent space were hard to represent visually, making it difficult for the public to grasp the underlying processes. The abstract nature of these computations often rendered them inaccessible to those without a technical background. Specifically, although new vectors could be created by interpolating between the vectors of existing characters or words, the embedding operation could not be reversed to render such a novel vector as a word or character again.

Word Embedding Visualization: Semantic and coordinate relevance of word embeddings

First-Time Generation of Novel Characters from the Latent Space

By leveraging the principles of word embedding and GANs, Kogan’s work not only creates visually compelling art but also serves as an educational tool that helps the public understand the complex processes underlying AI-generated content. It demonstrates how AI can encode and manipulate textual information in a structured, computable format, making it accessible and visually interpretable for the first time. This project underscores the transformative potential of AI in both understanding and recreating human culture, offering a glimpse into the future of AI-driven creativity.

These “word vectors” support arithmetic such as king − man + woman ≈ queen, even though the model was given no prior knowledge of the words’ meanings.
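The vector arithmetic can be illustrated with a minimal sketch. The two-dimensional embeddings below are invented by hand for this example (real embeddings are learned and have hundreds of dimensions), and the names `EMBEDDINGS`, `cosine_similarity`, and `nearest_word` are hypothetical helpers rather than part of any library.

```python
import numpy as np

# Hypothetical 2-D toy embeddings chosen by hand for illustration only.
# Axis 0 loosely encodes "royalty", axis 1 loosely encodes "gender".
EMBEDDINGS = {
    "king": np.array([0.9, 0.8]),
    "queen": np.array([0.9, -0.8]),
    "man": np.array([0.1, 0.8]),
    "woman": np.array([0.1, -0.8]),
    "apple": np.array([-0.7, 0.0]),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def nearest_word(vector, exclude=()):
    """Return the vocabulary word whose embedding is most similar to `vector`."""
    candidates = {w: v for w, v in EMBEDDINGS.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine_similarity(vector, candidates[w]))


# king - man + woman lands near (0.9, -0.8) in this toy space.
query = EMBEDDINGS["king"] - EMBEDDINGS["man"] + EMBEDDINGS["woman"]
result = nearest_word(query, exclude=("king", "man", "woman"))
print(result)
```

Note that the lookup can only return a word that already exists in the vocabulary. A vector that falls between known words has no direct textual form, which is the visualization gap described above.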

Reversing Word Embeddings: “A Book from the Sky” successfully turned interpolated word embeddings back into novel visual characters for the first time, combining DCGAN with word embedding features to reconstruct text from word vectors. This achievement not only showcased the potential of AI to generate new content but also provided a tangible way to visualize complex data processes, making them more accessible to the public.

Spanning the latent space between characters

The educational value of “A Book from the Sky” lies in its ability to demystify the processes behind AI-generated content. By visualizing the latent space and demonstrating the practical application of GANs and word embeddings, the project bridges the gap between complex technical concepts and public understanding. This work serves as an educational tool, illustrating the potential of AI to transform how we perceive and interact with textual and visual data.

Technical Background: GAN / DCGAN / Latent Space Vectors

Traditional GANs: Traditional Generative Adversarial Networks (GANs) revolutionized the field of AI by introducing a novel approach for generating images. This was achieved by having two neural networks—the generator and the discriminator. The generator creates synthetic images that attempt to mimic the distribution of the training set, while the discriminator evaluates the authenticity of these images, distinguishing between real and generated images. Through this adversarial process, both networks improve iteratively, leading to the generation of increasingly realistic images.
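The adversarial process described above can be sketched through the two losses the networks minimize. This is a minimal illustration, not Kogan's training code: it assumes the discriminator outputs a probability strictly between 0 and 1 that a sample is real, and it uses the common non-saturating form of the generator loss. The function names are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples should score near 1, fakes near 0.

    d_real, d_fake: discriminator probabilities for real and generated samples.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants fakes scored near 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# A confident discriminator has a low loss, and the generator a high one.
confident_d = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
fooled_d = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(confident_d < fooled_d)
print(generator_loss([0.1, 0.2]) > generator_loss([0.5, 0.5]))
```

When the discriminator can no longer tell real from generated samples, it outputs 0.5 everywhere, so its loss settles at 2 ln 2 and the generator's at ln 2.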

Enhancements with DCGANs: DCGANs took this a step further by replacing the fully connected layers in traditional GANs with convolutional neural networks (CNNs). This architectural enhancement improved both the training stability and the quality of the generated images. CNNs are particularly effective at capturing spatial hierarchies in images, making them ideal for generating high-quality visual content. By leveraging CNNs, DCGANs were able to produce more detailed and coherent images, closely matching the characteristics of the training data.
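To make the convolutional generator more concrete, the sketch below computes how a stack of transposed convolutions grows a small feature map into a full image. The kernel size, stride, and padding are assumptions taken from settings commonly used in DCGAN implementations, not values stated for this project, and both function names are hypothetical.

```python
def transposed_conv_output_size(size, kernel=4, stride=2, padding=1):
    """Output side length of a transposed convolution (no output padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel


def generator_resolutions(start=4, layers=4):
    """Side length of the feature map after each upsampling layer."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(transposed_conv_output_size(sizes[-1]))
    return sizes


print(generator_resolutions())  # [4, 8, 16, 32, 64]
```

With these assumed settings each layer doubles the resolution, so four layers turn a 4x4 map into a 64x64 image.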

Latent Space Vectors in GANs: High-dimensional latent space vectors sampled during GAN image generation correlate with the image content. This correlation allows for the exploration of latent space to generate novel and meaningful images that maintain the inherent characteristics of the training data.

The high-dimensional latent space vectors randomly sampled during GAN image generation are correlated with the image content.
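Exploring the latent space usually means walking between two sampled vectors and decoding each intermediate point. The sketch below shows linear and spherical interpolation under some assumptions: a 100-dimensional Gaussian latent space (a common DCGAN choice, not confirmed for this project) and hypothetical function names. Feeding each point to a trained generator is left out, since no generator is available here.

```python
import numpy as np


def lerp(z0, z1, t):
    """Linear interpolation between two latent vectors."""
    return (1.0 - t) * z0 + t * z1


def slerp(z0, z1, t):
    """Spherical interpolation: follows the arc between z0 and z1."""
    u0 = z0 / np.linalg.norm(z0)
    u1 = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        # Nearly parallel vectors: fall back to the linear form.
        return lerp(z0, z1, t)
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)


rng = np.random.default_rng(0)
z_a = rng.standard_normal(100)  # latent vector behind one generated image
z_b = rng.standard_normal(100)  # latent vector behind another

# Eight evenly spaced points from z_a to z_b; each would be decoded by the generator.
path = [slerp(z_a, z_b, t) for t in np.linspace(0.0, 1.0, 8)]
print(len(path), path[0].shape)
```

Spherical interpolation is often preferred for Gaussian latents because intermediate points keep a norm similar to the endpoints, whereas the midpoint of a straight line lies closer to the origin.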

Treating Text as Images in DCGAN: In the “A Book from the Sky” project, Chinese characters were treated as images rather than embedded directly as text. Because Chinese script has pictographic roots, the graphical form of a character often correlates with its semantics. The project organized Chinese characters into a training set and used it to train a custom Deep Convolutional Generative Adversarial Network (DCGAN) model. With this model, it became possible to embed characters as vectors, to traverse and interpolate between those vectors, and to visualize the resulting vectors as characters, the step that had previously been out of reach.

Comparison of Real and Generated Chinese Characters from the Customized DCGAN
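Treating a character as an image amounts to turning its pixels into a numeric array the network can train on. The sketch below is a toy illustration under stated assumptions: the 5x5 bitmap of 十 is drawn by hand, the helper name `glyph_to_tensor` is hypothetical, and scaling to [-1, 1] follows the usual DCGAN convention of matching a tanh output layer rather than any detail documented for this project.

```python
import numpy as np

# Hypothetical hand-drawn 5x5 bitmap of the character 十 ("ten"); 1 = ink.
GLYPH = [
    "00100",
    "00100",
    "11111",
    "00100",
    "00100",
]


def glyph_to_tensor(rows):
    """Convert a binary bitmap into a (channels, height, width) float array in [-1, 1]."""
    pixels = np.array([[int(c) for c in row] for row in rows], dtype=np.float32)
    scaled = pixels * 2.0 - 1.0  # ink -> 1.0, background -> -1.0
    return scaled[np.newaxis, ...]  # add a single grayscale channel


tensor = glyph_to_tensor(GLYPH)
print(tensor.shape)  # (1, 5, 5)
```

A real training set would hold many such arrays at a higher resolution, one per character sample.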

Scale of Culture: The Special Meaning to the History of Chinese Characters

Value for Pictographic Chinese Characters

Chinese characters hold a unique place in the world of writing systems due to their pictographic nature. Unlike alphabetic scripts, where letters are abstract symbols not directly related to their semantic content, Chinese characters often visually represent their meanings. This inherent quality of Chinese characters provides a rich tapestry for exploring the intersection of visual art and textual meaning through artificial intelligence. The original “A Book from the Sky” (1988 book version) by Xu Bing exemplifies this feature, as it includes thousands of fictional glyphs mimicking the traditional Chinese characters used in the woodblock prints of the Song and Ming dynasties. This work served as the main reference for “A Book from the Sky” by Gene Kogan, highlighting the profound visual and semantic dimensions of Chinese characters.

In “A Book from the Sky,” this pictographic quality is harnessed by treating Chinese characters as images rather than mere text. The project leverages the strong correlation between the semantics and graphical form of Chinese characters, organizing them into a comprehensive training set. This training set is then used to train a custom Deep Convolutional Generative Adversarial Network (DCGAN) model. The result is a system capable of embedding Chinese characters as vectors, traversing and interpolating between them, and visualizing the results.

Exploring the latent space by visualizing the corresponding Chinese characters

The approach shows, for the first time, how machines can perceive and recreate human text using visual and technical methods. By clustering texts and demonstrating correlations between characters, “A Book from the Sky” provides a visual and conceptual framework for understanding the latent space of Chinese characters. This not only highlights the aesthetic and cultural dimensions of AI-generated art but also bridges the gap between technology and traditional forms of cultural expression.

Value for the Evolutionary History of Chinese Characters

The evolution of Chinese characters has been marked by variation, simplification, elaboration, and amalgamation. Historically, this evolution reflects changes in writing habits, cultural shifts, and technological advancements. In “A Book from the Sky,” the latent interpolation achieved through neural networks essentially recreates this evolutionary process, offering the public a parallel perspective on the development of Chinese characters.

Historical Evolution: The project mirrors the historical evolution of Chinese characters, showcasing how characters have transformed over time. This is achieved through the latent space interpolation, which visually represents the gradual changes and adaptations in character forms.

The historical evolution of Chinese characters

Creating novel Chinese characters by interpolating radicals between similar characters

Diverse Calligraphy Styles: Even within the same era, Chinese characters exhibit diverse styles such as Clerical Script, Cursive Script, and Running Script. “A Book from the Sky” reflects this diversification by presenting varied representations of the same characters, demonstrating the flexibility and richness of Chinese writing.

The diverse Chinese calligraphy generated by AI

By using AI to present these characters from a machine’s perspective within an extremely short timeframe, “A Book from the Sky” showcases an evolution of writing that would typically unfold over extensive periods and vast geographical scales. This rapid AI “narrative” of Chinese character evolution is both striking and thought-provoking, particularly for audiences familiar with Chinese culture. It emphasizes the dynamic nature of Chinese characters and the cultural significance embedded in their forms.

Conclusion

“A Book from the Sky” by Gene Kogan exemplifies the transformative power of artificial intelligence across time, education, and culture. It marks a pivotal moment in AI’s evolution from recognition to generation, and it makes complex AI processes accessible to a public audience through visual representation. By building on Xu Bing’s 1988 work, it honors traditional Chinese culture while using contemporary AI techniques to explore the rich interplay between visual art and textual meaning. This project not only advances AI-generated art but also fosters a deeper understanding of AI’s potential to shape and reflect human culture and heritage.