AI News and Guides

Explore the best AI News and Guides — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • Key frame

    Key frame

    In animation and filmmaking, a key frame (or keyframe) is a drawing or shot that defines the starting and ending points of a smooth transition. These are called frames because their position in time is measured in frames on a strip of film or on a digital video editing timeline. A sequence of key frames defines which movement the viewer will see, whereas the position of the key frames on the film, video, or animation defines the timing of the movement. Because only two or three key frames over the span of a second do not create the illusion of movement, the remaining frames are filled with "inbetweens". == Use of key frames as a means to change parameters == In software packages that support animation, especially 3D graphics, there are many parameters that can be changed for any one object. One example of such an object is a light. In 3D graphics, lights function similarly to real-world lights. They cause illumination, cast shadows, and create specular highlights. Lights have many parameters, including light intensity, beam size, light color, and the texture cast by the light. Supposing that an animator wants the beam size to change smoothly from one value to another within a predefined period of time, that could be achieved by using key frames. At the start of the animation, a beam size value is set. Another value is set for the end of the animation. Thus, the software program automatically interpolates the two values, creating a smooth transition. == Video editing == In non-linear digital video editing, as well as in video compositing software, a key frame is a frame used to indicate the beginning or end of a change made to a parameter. For example, a key frame could be set to indicate the point at which audio will have faded up or down to a certain level. == Video compression == In video compression, a key frame, also known as an intra-frame, is a frame in which a complete image is stored in the data stream. In video compression, only changes that occur from one frame to the next are stored in the data stream, in order to greatly reduce the amount of information that must be stored. This technique capitalizes on the fact that most video sources (such as a typical movie) have only small changes in the image from one frame to the next. Whenever a drastic change to the image occurs, such as when switching from one camera shot to another or at a scene change, a key frame must be created. The entire image for the frame must be output when the visual difference between the two frames is so great that representing the new image incrementally from the previous frame would require more data than recreating the whole image. Because video compression only stores incremental changes between frames (except for key frames), it is not possible to fast-forward or rewind to any arbitrary spot in the video stream. That is because the data for a given frame only represents how that frame was different from the preceding one. For that reason, it is beneficial to include key frames at arbitrary intervals while encoding video. For example, a key frame may be output once for each 10 seconds of video, even though the video image does not change enough visually to warrant the automatic creation of the key frame. That would allow seeking within the video stream at a minimum of 10-second intervals. The downside is that the resulting video stream will be larger in disk size because many key frames are added when they are not necessary for the frame's visual representation. This drawback, however, does not produce significant compression loss when the bitrate is already set at a high value for better quality (as in the DVD MPEG-2 format).

    Read more →
  • Ontology merging

    Ontology merging

    Ontology merging defines the act of bringing together two conceptually divergent ontologies or the instance data associated to two ontologies. This is similar to work in database merging (schema matching). This merging process can be performed in a number of ways, manually, semi automatically, or automatically. Manual ontology merging although ideal is extremely labour-intensive and current research attempts to find semi or entirely automated techniques to merge ontologies. These techniques are statistically driven often taking into account similarity of concepts and raw similarity of instances through textual string metrics and semantic knowledge. These techniques are similar to those used in information integration employing string metrics from open source similarity libraries.

    Read more →
  • Environmental informatics

    Environmental informatics

    Environmental informatics is the science of information applied to environmental science. As such, it provides the information processing and communication infrastructure to the interdisciplinary field of environmental sciences aiming at data, information and knowledge integration, the application of computational intelligence to environmental data as well as the identification of environmental impacts of information technology. Environmental informatics thus acts as a bridge, providing an interdisciplinary means of analysing, describing and understanding the complex interactions between humans, nature and technology. Since each field of applied computer science has its own subject matter, terminology and methods, specialised disciplines, such as environmental, bio- and geoinformatics have emerged, each of which combines computer science with a specific field of application such as environmental, bio- or geosciences. Environmental informatics, bioinformatics and geoinformatics all deal with computer-based processing of environmental phenomena. However, environmental informatics is the only field that pursues normative goals (e.g., political goals of environmental protection, environmental planning, and sustainability). This also influences the choice of methods. This also distinguishes it from application areas such as numerical weather prediction, which is considered an early and important example of computer simulation of environmental phenomena. The UK Natural Environment Research Council defines environmental informatics as the "research and system development focusing on the environmental sciences relating to the creation, collection, storage, processing, modelling, interpretation, display and dissemination of data and information." Kostas Karatzas defined environmental informatics as the "creation of a new 'knowledge-paradigm' towards serving environmental management needs." Karatzas argued further that environmental informatics "is an integrator of science, methods and techniques and not just the result of using information and software technology methods and tools for serving environmental engineering needs." Environmental informatics emerged in early 1990 in Central Europe. Current initiatives to effectively manage, share, and reuse environmental and ecological data are indicative of the increasing importance of fields like environmental informatics and ecoinformatics to develop the foundations for effectively managing ecological information. Examples of these initiatives are National Science Foundation Datanet projects, DataONE and Data Conservancy. == Subject matter and objectives == The subject of environmental informatics are environmental information systems (EIS). An EIS 'is a computer-based system that integrates and stores data collected about the natural environment and provides powerful methods for accessing and evaluating it.' This allows environmental data to be processed by computers for environmental protection, planning, research and technology. According to Jaeschke and Bossel, environmental informatics has three interrelated objectives: Environmental informatics serves to procure data and information for describing the state and development of the environment. Of particular importance is information that is needed to prevent or limit undesirable changes and to support desirable changes. Based on the evaluation and analysis of data, environmental informatics improves our understanding of the environment and the interactions between nature, technology and society. It thus supports environmentally relevant decisions. This enables the influence of development (system correction), the assessment of the effects and side effects of potential measures, and the creation of tools for the routine planning, implementation and monitoring of measures. == History == The simulation model World3, which formed the basis of the highly acclaimed study The Limits to Growth, is considered the starting point of environmental informatics. It incorporated environmental information, among other things, to calculate scenarios for global development. In the mid-1980s, interest grew in structuring environmental protection as an area of application for computer science. One of the first publications in German was the book Informatik im Umweltschutz. Anwendungen und Perspektiven (Computer science in environmental protection. Applications and perspectives) from 1986. The term 'environmental informatics' did not appear until around 1993, which is why the development of environmental informatics is usually referred to as having taken place in the 1990s. In 1993, the first university chair for environmental informatics was established in Cottbus. In 1994, the anthology Umweltinformatik. Informatikmethoden für Umweltschutz und Umweltforschung (Environmental Informatics: Informatics Methods for Environmental Protection and Environmental Research) was published. The development of environmental informatics was 'primarily initiated by German computer science.' In the English-speaking world, the volume Environmental Informatics was published in 1995, mainly based on the German anthology of 1994. An article in the conference proceedings of the World Computer Congress of the International Federation for Information Processing (IFIP) in Hamburg in 1994 describes the initial situation of environmental informatics as follows: 'On the one hand, we suffer from the huge amount of available data – people sometimes speak of data graveyards – on the other hand, the really relevant data may still be missing.' This statement indicates the need that led to the emergence of environmental informatics as a specialised discipline of applied computer science. Furthermore, the specific characteristics and processing requirements of environmental data necessitated the emergence of environmental informatics. The special features of environmental data include: The data structures required are highly heterogeneous due to specific processes and differing perspectives on environmental aspects (e.g., water protection, emission control, hazardous substances). In addition to the heterogeneity of the data, heterogeneous databases also play a role, as environmental data is often obtained and presented in an interdisciplinary manner. Obligations change frequently as a result of new legislation, whether regional (e.g. state regulations on water protection), national (e.g. federal emission control regulations) or international (e.g. Registration, Evaluation, Authorisation and Restriction of Chemicals|REACH). The objects represented are often multidimensional and, therefore, require complex geometric representation using curves or polygons. It is often necessary to process uncertain, imprecise or incomplete data, which is, for example, the result of extrapolations or forecasts. A new "knowledge paradigm" has emerged to meet the requirements of environmental management. Environmental informatics produces its own concepts, methods and techniques and is not merely the result of using information and communication technology methods and tools to meet environmental requirements. The development of environmental informatics since the 1990s has been significantly influenced by the newly established conferences EnviroInfo, ISESS and ITEE and is documented in the respective proceedings. Aspects of sustainability and sustainable development were increasingly integrated into environmental informatics after 2000, thereby expanding the field. In 2004, the Working Group on Sustainable Information Society of the Gesellschaft für Informatik e. V. (German Informatics Society, GI) published the Memorandum on a Sustainable Information Society, which formulates recommendations for an information society that is compatible with human, social and natural needs. Since 2007, environmental informatics has often been described in more detail as informatics for environmental protection, sustainable development and risk management. The increased focus on sustainability has also contributed to the formation of the research focus Information and Communications Technology for Sustainability (ICT4S) and to the emergence of the international conference ICT4S in 2013. ICT-ENSURE, the European Commission's funding measure for the establishment of a European research area on "ICT for Environmental Sustainability Research" (2008–2010), has also contributed to the structuring of environmental informatics. == Environmental informatics and sustainable development == Efforts to place environmental informatics within the context of sustainable development have been growing since 2000 and were significantly influenced by the Memorandum on a Sustainable Information Society. According to this Memorandum, the information society offers great but unevenly distributed opportunities for education, participation and intercultural understanding. In addition, the Memorandum highlighted the material and energy consumption of inf

    Read more →
  • CENDI

    CENDI

    CENDI (Commerce, Energy, NASA, Defense Information Managers Group) is an interagency group of senior Scientific and Technical Information (STI) managers from 14 United States federal agencies. CENDI managers cooperate by exchanging information and ideas, collaborating to address common issues, and undertaking joint initiatives. CENDI's accomplishments range from impacting federal information policy to educating a broad spectrum of stakeholders on all aspects of federal STI systems, including its value to research and the taxpayer, and to operational improvements in agency and interagency STI operations. == History == CENDI traces its roots to the Committee on Scientific and Technical Information (COSATI) of the Federal Council on Science and Technology. COSATI was established in the early 1960s to coordinate the management of the results from the U.S. government's increasing commitment to scientific research and technology development. The scientific and technical information (STI) managers of the government's major research and development (R&D) agencies worked within COSATI to standardize guidelines for cataloging and indexing technical reports. COSATI ceased formal operations in the early 1970s. To continue the cooperation begun under COSATI, managers of agency STI programs from Commerce (National Technical Information Service), Energy (Office of Scientific and Technical Information), NASA (HQ/STI Division), and Defense (Defense Technical Information Center) began meeting periodically to discuss common topics and stimulate more effective cooperation. In 1985, a Memorandum of Understanding was signed by the four charter agencies and CENDI was established. From this small core of STI managers, CENDI has grown to its current membership, which represents the major science agencies, the national libraries, and agencies involved in the dissemination and long-term management of scientific and technical information. The vision of CENDI is to facilitate cooperative enterprise where capabilities are shared and challenges are faced together so that the sum of the accomplishments is greater than each individual agency can achieve on its own amongst federal STI agencies. The abbreviation CENDI refers to the "Commerce, Energy, NASA, Defense Information Managers Group". == Membership == New members from other federal R&D information organizations may be admitted by unanimous agreement of the members. However, it is the intent of the group that membership in CENDI should remain small and focus on organizations with STI or supporting responsibilities. Each agency provides funding to CENDI. == Members == The members of CENDI are: Defense Technical Information Center (United States Department of Defense) Office of Research and Development and Office of Environmental Information (United States Environmental Protection Agency) Government Printing Office Library of Congress NASA Scientific and Technical Information Program National Agricultural Library (United States Department of Agriculture) National Archives and Records Administration National Library of Education (United States Department of Education) National Library of Medicine (United States Department of Health and Human Services) National Science Foundation National Technical Information Service (United States Department of Commerce) National Transportation Library (United States Department of Transportation) Office of Scientific and Technical Information (United States Department of Energy) USGS/Biological Resources Discipline (United States Department of the Interior) == Mission and operation == CENDI's mission is to help improve the productivity of federal science- and technology-based programs through effective scientific, technical, and related information support systems. In fulfilling its mission, CENDI agencies play an important role in addressing science- and technology-based national priorities and strengthening U.S. competitiveness. === Goals === STI Coordination and Leadership: Provide coordination and leadership for information exchange on important STI policy issues. Improvement of STI Systems: Promote the development of improved STI systems through the productive interrelationship of content and technology. STI Understanding: Promote better understanding of STI and STI management. === Principals and Alternates === CENDI is made up of senior federal STI managers and each organization appoints a Principal representative. This person is the point of contact for that organization within CENDI. Each Principal has an Alternate. The Principals and Alternates comprise the main group that meets on a regular basis, usually every other month. === Secretariat === A Tennessee-based information management company, -- Information International Associates, Inc., currently serves as the CENDI Secretariat. The Secretariat provides day-to-day operations to CENDI. The Secretariat prepares the necessary materials for the Principals' meetings, provides support for the working group and task group meetings, assists in developing papers, and maintains the CENDI files and outreach tools. === Task Groups and Working Groups === The chair(s) of a working group is appointed by the Principals and has the overall responsibility for the group's activities. The Secretariat provides support at the request of the Working Group chair(s). The Working Groups and Task Groups that are currently operating are: Copyright and Intellectual Property Working Group Distribution Markings Task Group Digital Preservation Task Group Digitization Specifications Task Group Image Metadata Task Group Science.gov (see below) STI Policy Working Group Terminology Resources Task Group === Science.gov and Worldwidescience.org === In 2001, in response to the April 2001 workshop on "Strengthening the Public Information Infrastructure for Science", and taking into consideration a request from Firstgov (now USA.gov) to develop specialized topical portals, CENDI formed an alliance to develop an interagency website for access to STI. This website, called Science.gov, is a one-stop source of STI, including both selected, authoritative government websites and deep Web databases of technical reports, journal articles, conference proceedings, and other published materials. Through the volunteer efforts of members and involving over 100 staff, content and architecture is developed for the site. The Science.gov website is hosted by the Department of Energy (DOE) Office of Scientific and Technical Information (OSTI). The site was formally launched in December 2002. As a result of the success of Science.gov, under DOE leadership and in cooperation with the International Council of Scientific and Technical Information, a worldwide coordination across national portals called WorldWideScience was launched in 2008. === Work with non-member organizations === CENDI works with several cooperating non-member organizations on a regular basis. These agencies are in academia, federal government, legal and policy analysis, international, non-governmental, and private organizations.

    Read more →
  • The Master Algorithm

    The Master Algorithm

    The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World is a book by Pedro Domingos released in 2015. Domingos wrote the book in order to generate interest from people outside the field. == Overview == The book outlines five approaches of machine learning: inductive reasoning, connectionism, evolutionary computation, Bayes' theorem and analogical modelling. The author explains these tribes to the reader by referring to more understandable processes of logic, connections made in the brain, natural selection, probability and similarity judgments. Throughout the book, it is suggested that each different tribe has the potential to contribute to a unifying "master algorithm". Towards the end of the book the author pictures a "master algorithm" in the near future, where machine learning algorithms asymptotically grow to a perfect understanding of how the world and people in it work. Although the algorithm doesn't yet exist, he briefly reviews his own invention of the Markov logic network. == In the media == In 2016 Bill Gates recommended the book, alongside Nick Bostrom's Superintelligence, as one of two books everyone should read to understand AI. In 2018 the book was noted to be on Chinese Communist Party general secretary Xi Jinping's bookshelf. === Reception === A computer science educator stated in Times Higher Education that the examples are clear and accessible. In contrast, The Economist agreed Domingos "does a good job" but complained that he "constantly invents metaphors that grate or confuse". Kirkus Reviews praised the book, stating that "Readers unfamiliar with logic and computer theory will have a difficult time, but those who persist will discover fascinating insights." A New Scientist review called it "compelling but rather unquestioning".

    Read more →
  • Holographic algorithm

    Holographic algorithm

    In computer science, a holographic algorithm is an algorithm that uses a holographic reduction. A holographic reduction is a constant-time reduction that maps solution fragments many-to-many such that the sum of the solution fragments remains unchanged. These concepts were introduced by Leslie Valiant, who called them holographic because "their effect can be viewed as that of producing interference patterns among the solution fragments". The algorithms are unrelated to laser holography, except metaphorically. Their power comes from the mutual cancellation of many contributions to a sum, analogous to the interference patterns in a hologram. Holographic algorithms have been used to find polynomial-time solutions to problems without such previously known solutions for special cases of satisfiability, vertex cover, and other graph problems. They have received notable coverage due to speculation that they are relevant to the P versus NP problem and their impact on computational complexity theory. Although some of the general problems are #P-hard problems, the special cases solved are not themselves #P-hard, and thus do not prove FP = #P. Holographic algorithms have some similarities with quantum computation, but are completely classical. == Holant problems == Holographic algorithms exist in the context of Holant problems, which generalize counting constraint satisfaction problems (#CSP). A #CSP instance is a hypergraph G=(V,E) called the constraint graph. Each hyperedge represents a variable and each vertex v {\displaystyle v} is assigned a constraint f v . {\displaystyle f_{v}.} A vertex is connected to an hyperedge if the constraint on the vertex involves the variable on the hyperedge. The counting problem is to compute ∑ σ : E → { 0 , 1 } ∏ v ∈ V f v ( σ | E ( v ) ) , ( 1 ) {\displaystyle \sum _{\sigma :E\to \{0,1\}}\prod _{v\in V}f_{v}(\sigma |_{E(v)}),~~~~~~~~~~(1)} which is a sum over all variable assignments, the product of every constraint, where the inputs to the constraint f v {\displaystyle f_{v}} are the variables on the incident hyperedges of v {\displaystyle v} . A Holant problem is like a #CSP except the input must be a graph, not a hypergraph. Restricting the class of input graphs in this way is indeed a generalization. Given a #CSP instance, replace each hyperedge e of size s with a vertex v of degree s with edges incident to the vertices contained in e. The constraint on v is the equality function of arity s. This identifies all of the variables on the edges incident to v, which is the same effect as the single variable on the hyperedge e. In the context of Holant problems, the expression in (1) is called the Holant after a related exponential sum introduced by Valiant. == Holographic reduction == A standard technique in complexity theory is a many-one reduction, where an instance of one problem is reduced to an instance of another (hopefully simpler) problem. However, holographic reductions between two computational problems preserve the sum of solutions without necessarily preserving correspondences between solutions. For instance, the total number of solutions in both sets can be preserved, even though individual problems do not have matching solutions. The sum can also be weighted, rather than simply counting the number of solutions, using linear basis vectors. === General example === It is convenient to consider holographic reductions on bipartite graphs. A general graph can always be transformed it into a bipartite graph while preserving the Holant value. This is done by replacing each edge in the graph by a path of length 2, which is also known as the 2-stretch of the graph. To keep the same Holant value, each new vertex is assigned the binary equality constraint. Consider a bipartite graph G=(U,V,E) where the constraint assigned to every vertex u ∈ U {\displaystyle u\in U} is f u {\displaystyle f_{u}} and the constraint assigned to every vertex v ∈ V {\displaystyle v\in V} is f v {\displaystyle f_{v}} . Denote this counting problem by Holant ( G , f u , f v ) . {\displaystyle {\text{Holant}}(G,f_{u},f_{v}).} If the vertices in U are viewed as one large vertex of degree |E|, then the constraint of this vertex is the tensor product of f u {\displaystyle f_{u}} with itself |U| times, which is denoted by f u ⊗ | U | . {\displaystyle f_{u}^{\otimes |U|}.} Likewise, if the vertices in V are viewed as one large vertex of degree |E|, then the constraint of this vertex is f v ⊗ | V | . {\displaystyle f_{v}^{\otimes |V|}.} Let the constraint f u {\displaystyle f_{u}} be represented by its weighted truth table as a row vector and the constraint f v {\displaystyle f_{v}} be represented by its weighted truth table as a column vector. Then the Holant of this constraint graph is simply f u ⊗ | U | f v ⊗ | V | . {\displaystyle f_{u}^{\otimes |U|}f_{v}^{\otimes |V|}.} Now for any complex 2-by-2 invertible matrix T (the columns of which are the linear basis vectors mentioned above), there is a holographic reduction between Holant ( G , f u , f v ) {\displaystyle {\text{Holant}}(G,f_{u},f_{v})} and Holant ( G , f u T ⊗ ( deg ⁡ u ) , ( T − 1 ) ⊗ ( deg ⁡ v ) f v ) . {\displaystyle {\text{Holant}}(G,f_{u}T^{\otimes (\deg u)},(T^{-1})^{\otimes (\deg v)}f_{v}).} To see this, insert the identity matrix T ⊗ | E | ( T − 1 ) ⊗ | E | {\displaystyle T^{\otimes |E|}(T^{-1})^{\otimes |E|}} in between f u ⊗ | U | f v ⊗ | V | {\displaystyle f_{u}^{\otimes |U|}f_{v}^{\otimes |V|}} to get f u ⊗ | U | f v ⊗ | V | {\displaystyle f_{u}^{\otimes |U|}f_{v}^{\otimes |V|}} = f u ⊗ | U | T ⊗ | E | ( T − 1 ) ⊗ | E | f v ⊗ | V | {\displaystyle =f_{u}^{\otimes |U|}T^{\otimes |E|}(T^{-1})^{\otimes |E|}f_{v}^{\otimes |V|}} = ( f u T ⊗ ( deg ⁡ u ) ) ⊗ | U | ( f v ( T − 1 ) ⊗ ( deg ⁡ v ) ) ⊗ | V | . {\displaystyle =\left(f_{u}T^{\otimes (\deg u)}\right)^{\otimes |U|}\left(f_{v}(T^{-1})^{\otimes (\deg v)}\right)^{\otimes |V|}.} Thus, Holant ( G , f u , f v ) {\displaystyle {\text{Holant}}(G,f_{u},f_{v})} and Holant ( G , f u T ⊗ ( deg ⁡ u ) , ( T − 1 ) ⊗ ( deg ⁡ v ) f v ) {\displaystyle {\text{Holant}}(G,f_{u}T^{\otimes (\deg u)},(T^{-1})^{\otimes (\deg v)}f_{v})} have exactly the same Holant value for every constraint graph. They essentially define the same counting problem. === Specific examples === ==== Vertex covers and independent sets ==== Let G be a graph. There is a 1-to-1 correspondence between the vertex covers of G and the independent sets of G. For any set S of vertices of G, S is a vertex cover in G if and only if the complement of S is an independent set in G. Thus, the number of vertex covers in G is exactly the same as the number of independent sets in G. The equivalence of these two counting problems can also be proved using a holographic reduction. For simplicity, let G be a 3-regular graph. The 2-stretch of G gives a bipartite graph H=(U,V,E), where U corresponds to the edges in G and V corresponds to the vertices in G. The Holant problem that naturally corresponds to counting the number of vertex covers in G is Holant ( H , OR 2 , EQUAL 3 ) . {\displaystyle {\text{Holant}}(H,{\text{OR}}_{2},{\text{EQUAL}}_{3}).} The truth table of OR2 as a row vector is (0,1,1,1). The truth table of EQUAL3 as a column vector is ( 1 , 0 , 0 , 0 , 0 , 0 , 0 , 1 ) T = [ 1 0 ] ⊗ 3 + [ 0 1 ] ⊗ 3 {\displaystyle (1,0,0,0,0,0,0,1)^{T}={\begin{bmatrix}1\\0\end{bmatrix}}^{\otimes 3}+{\begin{bmatrix}0\\1\end{bmatrix}}^{\otimes 3}} . Then under a holographic transformation by [ 0 1 1 0 ] , {\displaystyle {\begin{bmatrix}0&1\\1&0\end{bmatrix}},} OR 2 ⊗ | U | EQUAL 3 ⊗ | V | {\displaystyle {\text{OR}}_{2}^{\otimes |U|}{\text{EQUAL}}_{3}^{\otimes |V|}} = ( 0 , 1 , 1 , 1 ) ⊗ | U | ( [ 1 0 ] ⊗ 3 + [ 0 1 ] ⊗ 3 ) ⊗ | V | {\displaystyle =(0,1,1,1)^{\otimes |U|}\left({\begin{bmatrix}1\\0\end{bmatrix}}^{\otimes 3}+{\begin{bmatrix}0\\1\end{bmatrix}}^{\otimes 3}\right)^{\otimes |V|}} = ( 0 , 1 , 1 , 1 ) ⊗ | U | [ 0 1 1 0 ] ⊗ | E | [ 0 1 1 0 ] ⊗ | E | ( [ 1 0 ] ⊗ 3 + [ 0 1 ] ⊗ 3 ) ⊗ | V | {\displaystyle =(0,1,1,1)^{\otimes |U|}{\begin{bmatrix}0&1\\1&0\end{bmatrix}}^{\otimes |E|}{\begin{bmatrix}0&1\\1&0\end{bmatrix}}^{\otimes |E|}\left({\begin{bmatrix}1\\0\end{bmatrix}}^{\otimes 3}+{\begin{bmatrix}0\\1\end{bmatrix}}^{\otimes 3}\right)^{\otimes |V|}} = ( ( 0 , 1 , 1 , 1 ) [ 0 1 1 0 ] ⊗ 2 ) ⊗ | U | ( ( [ 0 1 1 0 ] [ 1 0 ] ) ⊗ 3 + ( [ 0 1 1 0 ] [ 0 1 ] ) ⊗ 3 ) ⊗ | V | {\displaystyle =\left((0,1,1,1){\begin{bmatrix}0&1\\1&0\end{bmatrix}}^{\otimes 2}\right)^{\otimes |U|}\left(\left({\begin{bmatrix}0&1\\1&0\end{bmatrix}}{\begin{bmatrix}1\\0\end{bmatrix}}\right)^{\otimes 3}+\left({\begin{bmatrix}0&1\\1&0\end{bmatrix}}{\begin{bmatrix}0\\1\end{bmatrix}}\right)^{\otimes 3}\right)^{\otimes |V|}} = ( 1 , 1 , 1 , 0 ) ⊗ | U | ( [ 0 1 ] ⊗ 3 + [ 1 0 ] ⊗ 3 ) ⊗ | V | {\displaystyle =(1,1,1,0)^{\otimes |U|}\left({\begin{bmatrix}0\\1\end{bmatrix}}^{\otimes 3}+{\begin{bmatrix}1\\0\end{bmatrix}}^{\otimes 3}\right)^{\otimes |V|}} = NAND 2 ⊗ | U | EQUAL 3 ⊗ | V | , {\displaystyle ={\text{NAND}}_{2}^{\otim

    Read more →
  • E-Science librarianship

    E-Science librarianship

    E-Science librarianship refers to a role for librarians in e-Science. == Early scholars == Early references to e-Science and librarianship involve information studies scholars researching cyberinfrastructure and emerging networked information and knowledge communities. Notably Christine Borgman, Professor and Presidential Chair in Information Studies at the University of California, Los Angeles (UCLA) was a key player in bringing e-Science, and the idea of networked knowledge communities, to the attention of the library profession. In 2004, as a visiting fellow at the Oxford Internet Institute, she conducted research and lectured publicly on e-Science, Digital Libraries, and Knowledge Communities. In 2007 Anna K. Gold, formerly of MIT and Cal Poly, San Luis Obispo, authored a series of articles in D-Lib Magazine that opened the door for academic libraries to begin exploring roles, skills, and strategies for engaging in e-Science: Cyberinfrastructure, Data, and Libraries, Part 1: A Cyberinfrastructure Primer for Librarians and Cyberinfrastructure, Data, and Libraries, Part 2: Libraries and the Data Challenge: Roles and Actions for Libraries. == Academic research and health sciences libraries == In 2007, the Association of Research Libraries (ARL) e-Science task force issued its report on e-Science and librarianship. The ARL's report encouraged its member libraries to position themselves to engage with researchers involved in e-Science (eScience) by cultivating new research support strategies and developing their digital scholarship infrastructure. E-Science has multiple attributes; Tony and Jessie Hey framed e-Science for the library community by characterizing it as a research methodology: "e-Science is not a new scientific discipline in its own right: e-Science is shorthand for the set of tools and technologies required to support collaborative, networked science". In addition to academic libraries' interests in providing support for their researchers engaging in e-Science, the health sciences library community also emerged as a major proponent for creating librarian positions for supporting the information needs of large-scale, networked, research collaborations on their campuses. Neil Rambo, current director of NYU's Health Sciences Library and former director of University of Washington Health Sciences Library, was the first to use the term in the Journal of the Medical Library Association, in his 2009 editorial e-Science and the Biomedical Library. Rambo's definition of e-Science highlighted the potential e-Science held for creating data as a research product: "E-science is a new research methodology, fueled by networked capabilities and the practical possibility of gathering and storing vast amounts of data." In response to this article the University of Massachusetts Medical School Lamar Soutter Library and National Network of Libraries of Medicine, New England Region encouraged health sciences libraries to cooperate to identify skills and develop a program for training e-Science Librarians. Then, in 2013, Shannon Bohle, an archivist who was employed in the library at Cold Spring Harbor Laboratory, an NCI-designated basic cancer research facility, used experience gained there and previous papers and presentations about preserving scientific archival materials to expand the traditional definition of e-Science by including the terms, principles, and practices used in archival science. These included in the definition the "long-term storage and accessibility of all materials generated through the scientific process," as well as examples of material types traditionally preserved in archives, like "electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints," as well as library materials ("print and/or electronic publications"). == Roles == Many areas of science are about to be transformed by the availability of vast amounts of new scientific data that can potentially provide insights at a level of detail never before envisaged. However, this new data dominant era brings new challenges for the scientists and they will need the skills and technologies both of computer scientists and of the library community to manage, search and curate these new data resources. Libraries will not be immune from change in this new world of research. Karen Williams identifies roles in the following areas for librarians in the developing world of e-Science. Campus Engagement Content/Collection Development and Management Teaching and Learning Scholarly Communication E-Scholarship and Digital Tools Reference/Help Services Outreach Fund Raising Exhibit and Event Planning Leadership == Challenges for research libraries == E-science tends toward inter- and multidisciplinary approaches that depend on computation and computer science. Research libraries have traditionally been discipline focused and, although increasingly technologically sophisticated, do not have systems of the scale or complexity of the e-science environment. E-science is data intensive, but research libraries have not typically been responsible for scientific data. E-science is frequently conducted in a team context, often distributed across multiple institutions and on a global scale. The primary constituency of libraries generally comprises those affiliated with the local institution. Licenses for electronic content are typically restricted to a particular institutional community, and the infrastructure to move institutional licenses into a multi-institutional environment is not well developed. E-science challenges all these traditional paradigms of research library organization and services. == Skills == Garritano & Carlson were among the first to outline a skill set for librarians seeking to support the data needs of e-Science; they identified five skill categories librarians new to this area should expect to adapt or develop when participating on such projects: Library and information science expertise Subject expertise Partnerships and outreach (both internal and external) Participating in sponsored research Balancing workload An example of librarians reconfiguring traditional librarian skills to meet the needs of researchers engaging in e-Science is Witt & Carlson's adaptation of the traditional reference interview into a "data interview" in order to provide effective data management and e-Science services. This interview consists of ten practical queries necessary for understanding the provenance and expectations for the preservation of datasets typical of e-Science that also help illustrate some of the educational tools and skills needed by a librarian new to e-Science. "What is the story of the data? What form and format are the data in? What is the expected lifespan of the dataset? How could the data be used, reused, and repurposed? How large is the dataset, and what is its rate of growth? Who are the potential audiences for the data? Who owns the data? Does the dataset include any sensitive information? What publications or discoveries have resulted from the data? How should the data be made accessible?" == Resources == In 2009 the Lamar Soutter Library at the University of Massachusetts Medical School (UMMS) and the National Network of Libraries of Medicine, New England Region (NN/LM NER) funded an e-Science program for building the skills highlighted above for librarians. Elaine Russo Martin, Director of Library Services at the Lamar Soutter Library and Director of the NN/LM NER developed this comprehensive e-Science program to build librarians' subject expertise in the sciences, developing their data management skills, and their familiarity with cyberinfrastructure and e-Science. Three major products of this program are the e-Science web portal for librarians, the E-Science Symposium, and the New England Collaborative Data Management Curriculum (NECDMC). This portal includes educational resources for specific tools and subject/discipline tutorials and modules to assist librarians new to e-Science. UMMS and NN/LM NER also publish an open access journal called the Journal of eScience Librarianship.

    Read more →
  • Document capture software

    Document capture software

    Document capture software refers to applications that provide the ability and feature set to automate the process of scanning paper documents or importing electronic documents, often for the purposes of feeding advanced document classification and data collection processes. Most scanning hardware, both scanners and copiers, provides the basic ability to scan to any number of image file formats, including: PDF, TIFF, JPG, BMP, etc. This basic functionality is augmented by document capture software, which can add efficiency and standardization to the process. == Typical features == Typical features of Document Capture Software include: Barcode recognition Patch Code recognition Separation Optical Character Recognition (OCR) Optical Mark Recognition (OMR) Quality Assurance Indexing Migration === Goal for implementation of a document capture solution === The goal for implementing a document capture solution is to reduce the amount of time spent scanning, separating, enhancing, organizing, classifying, normalizing, and collecting information from document collections, and to produce metadata along with an image/PDF file, and/or OCR text. This information is then migrated to a file share, FTP site, database, Document Management or Enterprise Content Management system. These systems often provide a search function, allowing search of the assets based on the produced metadata, and then viewed using document imaging software. == General document capture system solutions == === Integration with document management system === ECM (Enterprise Content management) and their DMS component (Document Management System) are being adopted by many organizations as a corporate document management system for all types of electronic files, e.g. MS word, PDF ... However, much of the information held by organisations is on paper and this needs to be integrated within the same document repository. By converting paper documents into digital format through scanning, organizations convert paper into image formats such as TIF, JPG, and PDF, and also extract valuable index information or business data from the document using OCR technology. Digital documents and associated metadata can easily be stored in the ECM in a variety of formats. The most popular of these formats is PDF which not only provides an accurate representation of the document but also allows all the OCR text in the document to be stored behind the PDF image. This format is known as PDF with hidden text or text-searchable PDF. This allows users to search for documents by using keywords in the metadata fields or by searching the content of PDF files across the repository. ==== Advantages of scanning documents into a ECM/DMS ==== Information held on paper is usually just as valuable to organisations as the electronic documents that are generated internally. Often this information represents a large proportion of the day to day correspondence with suppliers and customers. Having the ability to manage and share this information internally through a document management system such as SharePoint or a CMIS-compatible repository improves collaboration between departments or employees and also eliminates the risk of losing this information through disasters such as floods or fire. Organisations adopting an ECM/DMS often implement electronic workflow which allows the information held on paper to be included as part of an electronic business process and incorporated into a customer record file along with other associated office documents and emails. For business critical documents, such as purchase orders and supplier invoices, digitising documents helps speed up business transactions as well as reduce manual effort involved in keying data into business systems, such as CRM, ERP and Accounting. Scanned invoices can also be routed to managers for payment approval via email or an electronic workflow. == Electronic document capture == In the earlier implementations of Document Capture Software, the technology focused solely on the digitization and capture of information from paper documents. Document images were acquired from document scanners via TWAIN/ISIS drivers. Only image-based file formats like TIF, JPG, and BMP were typically compatible with these solutions. But in recent years, as the volume of electronically-created documents and the number of proprietary file formats continues to increase at exponential rates, the need for handling documents existing in electronic formats has grown. The relevant document capture products have adapted to function with non-image file formats with the end-goal of creating a unified processing workflow capable of handling all incoming documents The ability to import files from a variety of sources is one example of such adaptation. Importing documents from ECM/DMS software solutions, email servers, FTP, and EDI is now as much of a requirement of document capture software as is paper capture. The normalization of output files to text-based PDF format is now another critical factor in long-term archival of proprietary electronic file formats. Normalization expands access and usage of files to users throughout the enterprise, rather than only those that created the original electronic file.

    Read more →
  • Digital sculpting

    Digital sculpting

    Digital sculpting, also known as sculpt modeling or 3D sculpting, is the use of software that offers tools to push, pull, smooth, grab, pinch or otherwise manipulate a digital object as if it were made of a real-life substance such as clay. == Sculpting technology == The geometry used in digital sculpting programs to represent the model can vary; each offers different benefits and limitations. The majority of digital sculpting tools on the market use mesh-based geometry, in which an object is represented by an interconnected surface mesh of polygons that can be pushed and pulled around. This is somewhat similar to the physical process of beating copper plates to sculpt a scene in relief. Other digital sculpting tools use voxel-based geometry, in which the volume of the object is the basic element. Material can be added and removed, much like sculpting in clay. Still other tools make use of more than one basic geometry representation. A benefit of mesh-based programs is that they support sculpting at multiple resolutions on a single model. Areas of the model that are finely detailed can have very small polygons while other areas can have larger polygons. In many mesh-based programs, the mesh can be edited at different levels of detail, and the changes at one level will propagate to higher and lower levels of model detail. A limitation of mesh-based sculpting is the fixed topology of the mesh; the specific arrangement of the polygons can limit the ways in which detail can be added or manipulated. A benefit of voxel-based sculpting is that voxels allow complete freedom over form. The topology of a model can be altered continually during the sculpting process as material is added and subtracted, which frees the sculptor from considering the layout of polygons on the model's surface. After sculpting, it may be necessary to retopologize the model to obtain a clean mesh for use in animation or real-time rendering. Voxels, however, are more limited in handling multiple levels of detail. Unlike mesh-based modeling, broad changes made to voxels at a low level of detail may completely destroy finer details. == Uses == Sculpting can often introduce details to meshes that would otherwise have been difficult or impossible to create using traditional 3D modeling techniques. This makes it preferable for achieving photorealistic and hyperrealistic results, though, many stylized results are achieved as well. Sculpting is primarily used in high poly organic modeling (the creation of 3D models which consist mainly of curves or irregular surfaces, as opposed to hard surface modeling). It is also used by auto manufacturers in their design of new cars. It can create the source meshes for low poly game models used in video games. In conjunction with other 3D modeling and texturing techniques and Displacement and Normal mapping, it can greatly enhance the appearance of game meshes often to the point of photorealism. Some sculpting programs like 3D-Coat, Zbrush, and Mudbox offer ways to integrate their workflows with traditional 3D modeling and rendering programs. Conversely, 3D modeling applications like 3ds Max, Maya and MODO are now incorporating sculpting capability as well, though these are usually less advanced than tools found in sculpting-specific applications. High poly sculpts are also extensively used in CG artwork for movies, industrial design, art, photorealistic illustrations, and for prototyping in 3D printing. == 3D print == Sculptors and digital artists use digital sculpting to create a model (or Digital Twin) to be materialized through CNC technologies including 3D printing. The final sculptures are often called Digital Sculpture or 3D printed art. While digital technologies have emerged in many art disciplines (painting, photography), this is less the case for digital sculpture due to the higher complexity and technology limitations to produce the final sculpture. == Sculpting Process == The best way to learn sculpture is by understanding primary, secondary and tertiary forms. First, break down the object you want to make down its basic shapes, such as a sphere or cube. Focus on making the large, overall shape of the object. After that, work on the bigger shapes on top of or inside the object. These can be protrusions or cut outs. Then, do a final detail pass, such as pores or lines to break up the shape. == Sculpting programs == There are a number of digital sculpting tools available. Some popular tools for creating are: Traditional 3D modeling suites are also beginning to include sculpting capability. 3D modeling programs which currently feature some form of sculpting include the following:

    Read more →
  • Communication-avoiding algorithm

    Communication-avoiding algorithm

    Communication-avoiding algorithms minimize movement of data within a memory hierarchy for improving its running-time and energy consumption. These minimize the total of two costs (in terms of time and energy): arithmetic and communication. Communication, in this context refers to moving data, either between levels of memory or between multiple processors over a network. It is much more expensive than arithmetic. == Formal theory == === Two-level memory model === A common computational model in analyzing communication-avoiding algorithms is the two-level memory model: There is one processor and two levels of memory. Level 1 memory is infinitely large. Level 0 memory ("cache") has size M {\displaystyle M} . In the beginning, input resides in level 1. In the end, the output resides in level 1. Processor can only operate on data in cache. The goal is to minimize data transfers between the two levels of memory. === Matrix multiplication === Corollary 6.2: More general results for other numerical linear algebra operations can be found in. The following proof is from. == Motivation == Consider the following running-time model: Measure of computation = Time per FLOP = γ Measure of communication = No. of words of data moved = β ⇒ Total running time = γ·(no. of FLOPs) + β·(no. of words) From the fact that β >> γ as measured in time and energy, communication cost dominates computation cost. Technological trends indicate that the relative cost of communication is increasing on a variety of platforms, from cloud computing to supercomputers to mobile devices. The report also predicts that gap between DRAM access time and FLOPs will increase 100× over coming decade to balance power usage between processors and DRAM. Energy consumption increases by orders of magnitude as we go higher in the memory hierarchy. United States president Barack Obama cited communication-avoiding algorithms in the FY 2012 Department of Energy budget request to Congress: New Algorithm Improves Performance and Accuracy on Extreme-Scale Computing Systems. On modern computer architectures, communication between processors takes longer than the performance of a floating-point arithmetic operation by a given processor. ASCR researchers have developed a new method, derived from commonly used linear algebra methods, to minimize communications between processors and the memory hierarchy, by reformulating the communication patterns specified within the algorithm. This method has been implemented in the TRILINOS framework, a highly-regarded suite of software, which provides functionality for researchers around the world to solve large scale, complex multi-physics problems. == Objectives == Communication-avoiding algorithms are designed with the following objectives: Reorganize algorithms to reduce communication across all memory hierarchies. Attain the lower-bound on communication when possible. The following simple example demonstrates how these are achieved. === Matrix multiplication example === Let A, B and C be square matrices of order n × n. The following naive algorithm implements C = C + A B: for i = 1 to n for j = 1 to n for k = 1 to n C(i,j) = C(i,j) + A(i,k) B(k,j) Arithmetic cost (time-complexity): n2(2n − 1) for sufficiently large n or O(n3). Rewriting this algorithm with communication cost labelled at each step for i = 1 to n {read row i of A into fast memory} - n2 reads for j = 1 to n {read C(i,j) into fast memory} - n2 reads {read column j of B into fast memory} - n3 reads for k = 1 to n C(i,j) = C(i,j) + A(i,k) B(k,j) {write C(i,j) back to slow memory} - n2 writes Fast memory may be defined as the local processor memory (CPU cache) of size M and slow memory may be defined as the DRAM. Communication cost (reads/writes): n3 + 3n2 or O(n3) Since total running time = γ·O(n3) + β·O(n3) and β >> γ the communication cost is dominant. The blocked (tiled) matrix multiplication algorithm reduces this dominant term: ==== Blocked (tiled) matrix multiplication ==== Consider A, B and C to be n/b-by-n/b matrices of b-by-b sub-blocks where b is called the block size; assume three b-by-b blocks fit in fast memory. for i = 1 to n/b for j = 1 to n/b {read block C(i,j) into fast memory} - b2 × (n/b)2 = n2 reads for k = 1 to n/b {read block A(i,k) into fast memory} - b2 × (n/b)3 = n3/b reads {read block B(k,j) into fast memory} - b2 × (n/b)3 = n3/b reads C(i,j) = C(i,j) + A(i,k) B(k,j) - {do a matrix multiply on blocks} {write block C(i,j) back to slow memory} - b2 × (n/b)2 = n2 writes Communication cost: 2n3/b + 2n2 reads/writes << 2n3 arithmetic cost Making b as large possible: 3b2 ≤ M we achieve the following communication lower bound: 31/2n3/M1/2 + 2n2 or Ω (no. of FLOPs / M1/2) == Previous approaches for reducing communication == Most of the approaches investigated in the past to address this problem rely on scheduling or tuning techniques that aim at overlapping communication with computation. However, this approach can lead to an improvement of at most a factor of two. Ghosting is a different technique for reducing communication, in which a processor stores and computes redundantly data from neighboring processors for future computations. Cache-oblivious algorithms represent a different approach introduced in 1999 for fast Fourier transforms, and then extended to graph algorithms, dynamic programming, etc. They were also applied to several operations in linear algebra as dense LU and QR factorizations. The design of architecture specific algorithms is another approach that can be used for reducing the communication in parallel algorithms, and there are many examples in the literature of algorithms that are adapted to a given communication topology.

    Read more →
  • Information scientist

    Information scientist

    The term information scientist developed in the latter part of the twentieth century by Wm. Hovey Smith to describe an individual, usually with a relevant subject degree (such as one in Information and Computer Science - CIS) or high level of subject knowledge, providing focused information to scientific and technical research staff in industry. It is a role quite distinct from and complementary to that of a librarian. Developments in end-user searching, together with some convergence between the roles of librarian and information scientist, have led to a diminution in its use in this context, and the term information officer or information professional (information specialist) are also now used. The term was, and is, also used for an individual carrying out research in information science. Brian C. Vickery mentions that the Institute of Information Scientists (IIS) was established in London during 1958 and lists the criteria put forward by this institute "Criteria for Information Science" (appendix 1) as well as his own "Areas of study in information science" (appendix 2). The IIS merged with the Library Association in 2002 to form the Chartered Institute of Library and Information Professionals (CILIP). == Notable Information Scientists == See also Award of Merit - Association for Information Science and Technology Marcia Bates David Blair (information technologist) Samuel C. Bradford Michael Buckland John M. Carroll Blaise Cronin Emilia Currás Brenda Dervin Eugene Garfield Paul B. Kantor Frederick Wilfrid Lancaster Calvin Mooers Tefko Saracevic Linda C. Smith Robert Saxton Taylor Brian Campbell Vickery Thomas D. Wilson == Additional reading == Ellis, David and Merete Haugan. (1997) "Modelling the information seeking patterns of engineers and research scientists in an industrial environment" (Journal of Documentation, Volume 53(4): pp. 384–403) Poole, Alex H. (2024). "'There's a big difference between going through life with the wind at your back, and going through life leaning into the wind': Feminism in Post-World War II Information Science". Proceedings of the Association for Information Science and Technology. 61: 300–313. doi:10.1002/pra2.1029. Vickery, Brian Campbell (1988) "Essays presented to B. C. Vickery" (Journal of Documentation, Volume 44, pp. 199–283). Vickery, B. & Vickery, A. (1987) Information Science in theory and practice (London: Bowker-Saur, pp. 361–369)

    Read more →
  • Single source of truth

    Single source of truth

    In information science and information technology, single source of truth (SSOT) architecture, or single point of truth (SPOT) architecture, for information systems is the practice of structuring information models and associated data schemas such that every data element is mastered (or edited) in only one place, providing data normalization to a canonical form (for example, in database normalization or content transclusion). There are several scenarios with respect to copies and updates: The master data is never copied and instead only references to it are made; this means that all reads and updates go directly to the SSOT. The master data is copied but the copies are only read and only the master data is updated; if requests to read data are only made on copies, this is an instance of CQRS. The master data is copied and the copies are updated; this needs a reconciliation mechanism when there are concurrent updates. Updates on copies can be thrown out whenever a concurrent update is made on the master, so they are not considered fully committed until propagated to the master. (many blockchains work that way.) Concurrent updates are merged. (if an automatic merge fails, it could fall back on another strategy, which could be the previous strategy or something else like manual intervention, which most source version control systems do.) The advantages of SSOT architectures include easier prevention of mistaken inconsistencies (such as a duplicate value/copy somewhere being forgotten), and greatly simplified version control. Without a SSOT, dealing with inconsistencies implies either complex and error-prone consensus algorithms, or using a simpler architecture that's liable to lose data in the face of inconsistency (the latter may seem unacceptable but it is sometimes a very good choice; it is how most blockchains operate: a transaction is actually final only if it was included in the next block that is mined). Ideally, SSOT systems provide data that are authentic (and authenticatable), relevant, and referable. Deployment of an SSOT architecture is becoming increasingly important in enterprise settings where incorrectly linked duplicate or de-normalized data elements (a direct consequence of intentional or unintentional denormalization of any explicit data model) pose a risk for retrieval of outdated, and therefore incorrect, information. Common examples (i.e., example classes of implementation) are as follows: In electronic health records (EHRs), it is imperative to accurately validate patient identity against a single referential repository, which serves as the SSOT. Duplicate representations of data within the enterprise would be implemented by the use of pointers rather than duplicate database tables, rows, or cells. This ensures that data updates to elements in the authoritative location are comprehensively distributed to all federated database constituencies in the larger overall enterprise architecture. EHRs are an excellent class for exemplifying how SSOT architecture is both poignantly necessary and challenging to achieve: it is challenging because inter-organization health information exchange is inherently a cybersecurity competence hurdle, and nonetheless it is necessary, to prevent medical errors, to prevent the wasted costs of inefficiency (such as duplicated work or rework), and to make the primary care and medical home concepts feasible (to achieve competent care transitions). Single-source publishing as a general principle or ideal in content management relies on having SSOTs, via transclusion or (otherwise, at least) substitution. Substitution happens via libraries of objects that can be propagated as static copies which are later refreshed when necessary (that is, when refreshing of the copy-paste or import is triggered by a larger updating event). Component content management systems are a class of content management systems that aim to provide competence on this level. == Implementation == === Ontologic interactions === An acknowledged prerequisite (of the notion that any given single source of truth can exist) is that it depends on the ontologic condition that no more than a single truth (about any particular fact or idea) exists, an assertion that is ontologic in both the IT sense and the general sense of that word. In many instances, this presents no problem (for example, within particular namespaces, or even across them, as long as naming collisions or broader name conflicts are adequately handled). The broadest contexts (and thus thorniest, regarding ontologic discrepancies) require adequate epistemic regime comparison and reconciliation (or at least negotiation or transactional exchanges). An archetypal example of this class of reconciliation is that two theological seminary libraries, from two different religions (X and Y), could exchange information with an SSOT architecture, but the unification of truth would reside on the level of the statement that "religion X asserts that God is purple whereas religion Y asserts that God is green", rather than on the level of "God is purple" or "God is green". === Architectures or architectural features === An ideal implementation of SSOT is rarely possible in most enterprises. This is because many organisations have multiple information systems, each of which needs access to data relating to the same entities (e.g., customer). Often these systems are purchased as commercial off-the-shelf products from vendors and cannot be modified in trivial ways. Each of these various systems therefore needs to store its own version of common data or entities, and therefore each system must retain its own copy of a record (hence immediately violating the SSOT approach defined above). For example, an enterprise resource planning (ERP) system (such as SAP or Oracle e-Business Suite) may store a customer record; the customer relationship management (CRM) system also needs a copy of the customer record (or part of it) and the warehouse dispatch system might also need a copy of some or all of the customer data (e.g., shipping address). In cases where vendors do not support such modifications, it is not always possible to replace these records with pointers to the SSOT. For organisations (with more than one information system) wishing to implement a Single Source of Truth (without modifying all but one master system to store pointers to other systems for all entities), some supporting architectures are: Master data management (MDM) Event store and event sourcing (ES) ==== Master data management (MDM) ==== A master data management system typically serves as the source of truth for an organization's metadata, helping to ensure accuracy and consistency throughout that organizations multiple data sources. Typically the MDM acts as a hub for multiple systems, many of which could allow (be the source of truth for) updates to different aspects of information on a given entity. For example, the CRM system may be the "source of truth" for most aspects of the customer, and is updated by a call centre operator. However, a customer may (for example) also update their address via a customer service web site, with a different back-end database from the CRM system. The MDM application receives updates from multiple sources, acts as a broker to determine which updates are to be regarded as authoritative (the golden record) and then syndicates this updated data to all subscribing systems. The MDM application normally requires an ESB to syndicate its data to multiple subscribing systems. ==== Event store and event sourcing (ES) ==== In event oriented architectures, it has become increasingly common to find an implementation of the Event Sourcing pattern which stores the system state as an ordered sequence of state changes. To do this, you need an Event Store, a particular type of database designed to hold all the events that change the state of the system. The event store in an Event Sourcing + Command Query Responsibility Separation + Domain Driven Design + Messaging architecture is in fact a "single source of truth", with the additional advantage that it can also act as an Enterprise Service Bus as it can listen directly to the event store for status changes as everything passes by. In addition, by saving all the events, it also plays the role of Data Warehouse. One last advantage is that through this system the Shared Database pattern can be implemented, another technique not mentioned to obtain a single source of truth. ==== Data warehouse (DW) ==== While the primary purpose of a data warehouse is to support reporting and analysis of data that has been combined from multiple sources, the fact that such data has been combined (according to business logic embedded in the data transformation and integration processes) means that the data warehouse is often used as a de facto SSOT. Generally, however, the data available from the data warehouse are not used to update other systems; rather the DW becomes

    Read more →
  • Oversampled binary image sensor

    Oversampled binary image sensor

    An oversampled binary image sensor is an image sensor with non-linear response capabilities reminiscent of traditional photographic film. Each pixel in the sensor has a binary response, giving only a one-bit quantized measurement of the local light intensity. The response function of the image sensor is non-linear and similar to a logarithmic function, which makes the sensor suitable for high dynamic range imaging. == Working principle == Before the advent of digital image sensors, photography, for the most part of its history, used film to record light information. At the heart of every photographic film are a large number of light-sensitive grains of silver-halide crystals. During exposure, each micron-sized grain has a binary fate: Either it is struck by some incident photons and becomes "exposed", or it is missed by the photon bombardment and remains "unexposed". In the subsequent film development process, exposed grains, due to their altered chemical properties, are converted to silver metal, contributing to opaque spots on the film; unexposed grains are washed away in a chemical bath, leaving behind the transparent regions on the film. Thus, in essence, photographic film is a binary imaging medium, using local densities of opaque silver grains to encode the original light intensity information. Thanks to the small size and large number of these grains, one hardly notices this quantized nature of film when viewing it at a distance, observing only a continuous gray tone. The oversampled binary image sensor is reminiscent of photographic film. Each pixel in the sensor has a binary response, giving only a one-bit quantized measurement of the local light intensity. At the start of the exposure period, all pixels are set to 0. A pixel is then set to 1 if the number of photons reaching it during the exposure is at least equal to a given threshold q. One way to build such binary sensors is to modify standard memory chip technology, where each memory bit cell is designed to be sensitive to visible light. With current CMOS technology, the level of integration of such systems can exceed 109~1010 (i.e., 1 giga to 10 giga) pixels per chip. In this case, the corresponding pixel sizes (around 50~nm ) are far below the diffraction limit of light, and thus the image sensor is oversampling the optical resolution of the light field. Intuitively, one can exploit this spatial redundancy to compensate for the information loss due to one-bit quantizations, as is classic in oversampling delta-sigma converters. Building a binary sensor that emulates the photographic film process was first envisioned by Fossum, who coined the name digital film sensor (now referred to as a quanta image sensor). The original motivation was mainly out of technical necessity. The miniaturization of camera systems calls for the continuous shrinking of pixel sizes. At a certain point, however, the limited full-well capacity (i.e., the maximum photon-electrons a pixel can hold) of small pixels becomes a bottleneck, yielding very low signal-to-noise ratios (SNRs) and poor dynamic ranges. In contrast, a binary sensor whose pixels need to detect only a few photon-electrons around a small threshold q has much less requirement for full-well capacities, allowing pixel sizes to shrink further. == Imaging model == === Lens === Consider a simplified camera model shown in Fig.1. The λ 0 ( x ) {\displaystyle \lambda _{0}(x)} is the incoming light intensity field. By assuming that light intensities remain constant within a short exposure period, the field can be modeled as only a function of the spatial variable x {\displaystyle x} . After passing through the optical system, the original light field λ 0 ( x ) {\displaystyle \lambda _{0}(x)} gets filtered by the lens, which acts like a linear system with a given impulse response. Due to imperfections (e.g., aberrations) in the lens, the impulse response, a.k.a. the point spread function (PSF) of the optical system, cannot be a Dirac delta, thus, imposing a limit on the resolution of the observable light field. However, a more fundamental physical limit is due to light diffraction. As a result, even if the lens is ideal, the PSF is still unavoidably a small blurry spot. In optics, such diffraction-limited spot is often called the Airy disk, whose radius R a {\displaystyle R_{a}} can be computed as R a = 1.22 w f , {\displaystyle R_{a}=1.22\,wf,} where w {\displaystyle w} is the wavelength of the light and f {\displaystyle f} is the F-number of the optical system. Due to the lowpass (smoothing) nature of the PSF, the resulting λ ( x ) {\displaystyle \lambda (x)} has a finite spatial-resolution, i.e., it has a finite number of degrees of freedom per unit space. === Sensor === Fig.2 illustrates the binary sensor model. The s m {\displaystyle s_{m}} denote the exposure values accumulated by the sensor pixels. Depending on the local values of s m {\displaystyle s_{m}} , each pixel (depicted as "buckets" in the figure) collects a different number of photons hitting on its surface. y m {\displaystyle y_{m}} is the number of photons impinging on the surface of the m {\displaystyle m} th pixel during an exposure period. The relation between s m {\displaystyle s_{m}} and the photon count y m {\displaystyle y_{m}} is stochastic. More specifically, y m {\displaystyle y_{m}} can be modeled as realizations of a Poisson random variable, whose intensity parameter is equal to s m {\displaystyle s_{m}} , As a photosensitive device, each pixel in the image sensor converts photons to electrical signals, whose amplitude is proportional to the number of photons impinging on that pixel. In a conventional sensor design, the analog electrical signals are then quantized by an A/D converter into 8 to 14 bits (usually the more bits the better). But in the binary sensor, the quantizer is 1 bit. In Fig.2, b m {\displaystyle b_{m}} is the quantized output of the m {\displaystyle m} th pixel. Since the photon counts y m {\displaystyle y_{m}} are drawn from random variables, so are the binary sensor output b m {\displaystyle b_{m}} . === Spatial and temporal oversampling === If it is allowed to have temporal oversampling, i.e., taking multiple consecutive and independent frames without changing the total exposure time τ {\displaystyle \tau } , the performance of the binary sensor is equivalent to the sensor with same number of spatial oversampling under certain condition. It means that people can make trade off between spatial oversampling and temporal oversampling. This is quite important, since technology usually gives limitation on the size of the pixels and the exposure time. == Advantages over traditional sensors == Due to the limited full-well capacity of conventional image pixel, the pixel will saturate when the light intensity is too strong. This is the reason that the dynamic range of the pixel is low. For the oversampled binary image sensor, the dynamic range is not defined for a single pixel, but a group of pixels, which makes the dynamic range high. == Reconstruction == One of the most important challenges with the use of an oversampled binary image sensor is the reconstruction of the light intensity λ ( x ) {\displaystyle \lambda (x)} from the binary measurement b m {\displaystyle b_{m}} . Maximum likelihood estimation can be used for solving this problem. Fig. 4 shows the results of reconstructing the light intensity from 4096 binary images taken by single photon avalanche diodes (SPADs) camera. A better reconstruction quality with fewer temporal measurements and faster, hardware friendly implementation, can be achieved by more sophisticated algorithms.

    Read more →
  • Object storage

    Object storage

    Object storage (also known as object-based storage or blob storage) is a computer data storage approach that manages data as "blobs" or "objects", as opposed to other storage architectures like file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Each object is typically associated with a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level (object-storage device), the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that are directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity. Object storage systems allow retention of massive amounts of unstructured data in which data is written once and read once (or many times). Object storage is used for purposes such as storing objects like videos and photos on Facebook, songs on Spotify, or files in online collaboration services, such as Dropbox. One of the limitations with object storage is that it is not intended for transactional data, as object storage was not designed to replace NAS file access and sharing; it does not support the locking and sharing mechanisms needed to maintain a single, accurately updated version of a file. == History == === Origins === Jim Starkey coined the term blob working at Digital Equipment Corporation to refer to opaque data entities. The terminology was adopted for Rdb/VMS. Blob is often humorously explained to be an abbreviation for binary large object. According to Starkey, this backronym arose when Terry McKiever, working in marketing at Apollo Computer felt that the term needed to be an abbreviation. McKiever began using the expansion basic large object. This was later eclipsed by the retroactive explanation of blobs as binary large objects. According to Starkey, "Blob don't stand for nothin'." Rejecting the acronym, he explained his motivation behind the coinage, saying, "A blob is the thing that ate Cincinnatti [sic], Cleveland, or whatever", referring to the 1958 science fiction film The Blob. In 1995, research led by Garth Gibson on Network-Attached Secure Disks first promoted the concept of splitting less common operations, like namespace manipulations, from common operations, like reads and writes, to optimize the performance and scale of both. In the same year, a Belgian company – FilePool – was established to build the basis for archiving functions. Object storage was proposed at Gibson's Carnegie Mellon University lab as a research project in 1996. Another key concept was abstracting the writes and reads of data to more flexible data containers (objects). Fine grained access control through object storage architecture was further described by one of the NASD team, Howard Gobioff, who later was one of the inventors of the Google File System. Other related work includes the Coda filesystem project at Carnegie Mellon, which started in 1987, and spawned the Lustre file system. There is also the OceanStore project at UC Berkeley, which started in 1999 and the Logistical Networking project at the University of Tennessee Knoxville, which started in 1998. In 1999, Gibson founded Panasas to commercialize the concepts developed by the NASD team. === Development === Seagate Technology played a central role in the development of object storage. According to the Storage Networking Industry Association (SNIA), "Object storage originated in the late 1990s: Seagate specifications from 1999 Introduced some of the first commands and how operating system effectively removed from consumption of the storage." A preliminary version of the "OBJECT BASED STORAGE DEVICES Command Set Proposal" dated 10/25/1999 was submitted by Seagate as edited by Seagate's Dave Anderson and was the product of work by the National Storage Industry Consortium (NSIC) including contributions by Carnegie Mellon University, Seagate, IBM, Quantum, and StorageTek. This paper was proposed to INCITS T-10 (International Committee for Information Technology Standards) with a goal to form a committee and design a specification based on the SCSI interface protocol. This defined objects as abstracted data, with unique identifiers and metadata, how objects related to file systems, along with many other innovative concepts. Anderson presented many of these ideas at the SNIA conference in October 1999. The presentation revealed an IP Agreement that had been signed in February 1997 between the original collaborators (with Seagate represented by Anderson and Chris Malakapalli) and covered the benefits of object storage, scalable computing, platform independence, and storage management. == Architecture == === Abstraction of storage === One of the design principles of object storage is to abstract some of the lower layers of storage away from the administrators and applications. Thus, data is exposed and managed as objects instead of blocks or (exclusively) files. Objects contain additional descriptive properties which can be used for better indexing or management. Administrators do not have to perform lower-level storage functions like constructing and managing logical volumes to utilize disk capacity or setting RAID levels to deal with disk failure. Object storage also allows the addressing and identification of individual objects by more than just file name and file path. Object storage adds a unique identifier within a bucket, or across the entire system, to support much larger namespaces and eliminate name collisions. === Inclusion of rich custom metadata within the object === Object storage explicitly separates file metadata from data to support additional capabilities. As opposed to fixed metadata in file systems (filename, creation date, type, etc.), object storage provides for full function, custom, object-level metadata in order to: Capture application-specific or user-specific information for better indexing purposes Support data-management policies (e.g. a policy to drive object movement from one storage tier to another) Centralize management of storage across many individual nodes and clusters Optimize metadata storage (e.g. encapsulated, database or key value storage) and caching/indexing (when authoritative metadata is encapsulated with the metadata inside the object) independently from the data storage (e.g. unstructured binary storage) Additionally, in some object-based file-system implementations: The file system clients only contact metadata servers once when the file is opened and then get content directly via object-storage servers (vs. block-based file systems which would require constant metadata access) Data objects can be configured on a per-file basis to allow adaptive stripe width, even across multiple object-storage servers, supporting optimizations in bandwidth and I/O Object-based storage devices (OSD) as well as some software implementations (e.g., DataCore Swarm) manage metadata and data at the storage device level: Instead of providing a block-oriented interface that reads and writes fixed sized blocks of data, data is organized into flexible-sized data containers, called objects Each object has both data (an uninterpreted sequence of bytes) and metadata (an extensible set of attributes describing the object); physically encapsulating both together benefits recoverability. The command interface includes commands to create and delete objects, write bytes and read bytes to and from individual objects, and to set and get attributes on objects Security mechanisms provide per-object and per-command access control === Programmatic data management === Object storage provides programmatic interfaces to allow applications to manipulate data. At the base level, this includes Create, read, update and delete (CRUD) functions for basic read, write and delete operations. Some object storage implementations go further, supporting additional functionality like object/file versioning, object replication, life-cycle management and movement of objects between different tiers and types of storage. Most API implementations are REST-based, allowing the use of many standard HTTP calls. == Implementation == === Cloud storage === The vast majority of cloud storage available in the market leverages an object-storage architecture. Some notable examples are Amazon S3, which debuted in March 2006, Microsoft Azure Blob Storage, IBM Cloud Object Storage, Rackspace Cloud Files (whose code was donated in 2010 to Openstack project and released as OpenStack Swift), and Google Cloud Storage released in May 2010. === Object-based file systems === Some distributed file systems use an object-based architecture, where file metadata is stored in metadata servers and file data is stored i

    Read more →
  • Authoritative Legal Entity Identifier

    Authoritative Legal Entity Identifier

    An Authoritative Legal Entity Identifier (ALEI) is the identifier assigned by a government jurisdiction authorized by statute or decree to create a legal entity and to maintain the authoritative registries of legal entities. ALEIs are used within supply chain data, ERP applications and master data management systems to support accurate and consistent identification of entities in digital records, supply chains, and government databases. ALEIs are described in the international standard ISO 8000-116, which outlines a structured format that makes the locally unique identifier into a globally unique one and ensures global interoperability and data quality. == Structure == An ALEI is composed of three main components: a prefix that identifies the jurisdiction and register, a subdomain element (optional), and the local registration number of the entity. For example, the identifier "US-DE.BER:3031657" refers to an entity registered in the Delaware Business Entity Register in the United States. The standardization of this structure is governed by ISO 8000-116, which is designed to ensure each ALEI is globally unique and resolvable. == Comparison with other identifiers == ALEIs differ from proxy identifiers such as the DUNS number, NCAGE code, or the Legal Entity Identifier (LEI) managed by GLEIF. While proxy identifiers can be issued by institutions that do not create legal entities, ALEIs are created and maintained by public bodies with the authority to form and register legal entities. This authoritative origin makes ALEIs particularly suitable for applications involving legal traceability, government regulation, and international transparency efforts. == Usage == ALEIs are increasingly utilized to identify legal entities in public and private datasets. The identifiers support supply chain accuracy, regulatory compliance, and the unification of master data. The first practical implementation of an ALEI was the International Business Registration Number (IBRN), developed to provide globally unique identifiers for registered business entities. IBRNs are issued by authorized government jurisdictions and are used to verify entities across borders, particularly in the context of trade facilitation and data exchange systems. For instance, business directories and registration systems in U.S. states like Connecticut provide structured registration documents that can be used to verify the ALEIs they issue. The use of ALEIs has been recommended by international organizations such as the Extractive Industries Transparency Initiative (EITI) and Open ownership to improve beneficial ownership registries. == Policy and regulation == ALEIs have been referenced in policy consultations such as those related to the U.S. Financial Data Transparency Act. Federal institutions including the Federal Reserve and FDIC have examined the potential for ALEIs to unify entity identification across regulatory databases.

    Read more →